KR100451940B1

KR100451940B1 - Data Analysis System and Method capable of Managing Customer Relations

Info

Publication number: KR100451940B1
Application number: KR10-2001-0059620A
Authority: KR
Inventors: 최경현
Original assignee: (주)프리즘엠아이텍
Priority date: 2001-09-26
Filing date: 2001-09-26
Publication date: 2004-10-08
Anticipated expiration: 2021-09-26
Also published as: KR20030026575A

Abstract

The present invention relates to a data analysis system having a customer management function, and to a method therefor. When a user enters a few conditions, the system uses the K prototype algorithm to analyze, by customer characteristics, the information on customers who are the marketing targets of a business, and displays the result as graphs or similar output that the user can readily interpret.

The present invention provides a customer management method for a data analysis system that, based on a database to be analyzed and element data entered by a user, uses the K prototype to segment and cluster the contents of the database and displays them visually so that they can be easily distinguished. The method comprises: outputting a data input form for entering the data required for the analysis; temporarily storing the data entered through the data input form; standardizing the data in the selected database; segmenting and clustering the data with the K prototype algorithm based on the entered data; and visually outputting the clustered result.

According to the present invention, the customer characteristics in a customer database are analyzed with only simple input operations by the user, and the result is displayed in an easily recognizable form, so that the user can readily establish a sales strategy based on it.

Description

Data Analysis System and Method Capable of Managing Customer Relations

The present invention relates to a data analysis system having a customer management function and a method therefor. More specifically, it relates to a data analysis system and method in which, when a user enters a few conditions, the information on customers who are the marketing targets of a business is analyzed by customer characteristics using the K prototype, and the result is displayed as graphs or similar output that the user can readily interpret.

Recently, as the business environment has shifted from being supplier-centered to consumer-centered, the success or failure of a business has come to depend on how well it can satisfy the diverse needs of individual consumers.

For business decision making and marketing activities, it is very important to collect and accumulate personal information about target customers as well as purchase and marketing related information. Information about the tendencies and needs of individual customers can be collected by analyzing data such as transaction histories and customer complaints, or through surveys, and information useful for decision making can be derived through appropriate analysis and interpretation of it.

Management decision making requires complex analysis involving the processing and transformation of large amounts of data, and such analysis is difficult to carry out without a foundation in statistics and expensive statistical analysis software.

In recent years there has been a trend toward establishing a company-wide data warehouse or building an integrated data mining environment, and in this context the cost to a company of building a data analysis environment has risen further. Marketers at ordinary companies have therefore recognized the need for a new data analysis solution that lets them perform demanding analyses, such as analyzing their own customer information and predicting potential customers, as easily and cheaply as using an office program such as Excel, without building an expensive data mining system.

In response to this need, more than 100 products have been released worldwide under the name of data mining tools, and their prices vary widely. Well-known products include "Enterprise Miner" from SAS (Statistical Analysis System), "Clementine" from SPSS (Statistical Package for the Social Sciences), "Intelligent Miner" from IBM, and "Darwin" from Oracle.

Although these solutions differ somewhat from one another, they are all built on statistical analysis techniques and provide functions such as database connectivity, derivation of analysis results, filtering and data transformation, and macros. They nevertheless have the following problems.

First, they are difficult to use without the help of a professional statistical analyst. Because data analysis itself is grounded in statistics, it is hard for ordinary marketers or analysts without a statistical background to approach, and the help of experts or external consultants is indispensable. In terms of the interface as well, there is no guide or wizard, so it is difficult for a user to proceed without prior knowledge of the state of the data and of the analysis process.

Second, they focus only on functioning as analysis tools rather than on actual business. Because the user must interpret the analysis process and results provided by the tool and apply them to his or her own problem, the tools fall short when applied to specific business problems. It is not easy to select an analysis model and algorithm suited to resolving the user's business issue, and a great deal of trial and error is required in selecting and optimizing the model and algorithm.

Third, there is the problem of price and deployment time. Typical data mining tools are very large in scale, and the purchase cost alone runs to hundreds of millions of won. In addition to the purchase price of the tool itself, consulting fees must be paid, and engineers and consultants are needed to deploy the tool, so their support is required for a considerable period.

Finally, connecting to an external database or data file is complicated, and the data loading process in which the user enters data directly is cumbersome and involves many detailed steps, so the tools are rather difficult for analysts without computer expertise to use. Moreover, data transformation and standardization must be carried out by combining complex functions that require the user's judgment and insight, so the quality of the analysis varies greatly from user to user and the reliability of the analysis results cannot be guaranteed.

To solve the problems described above, an object of the present invention is to provide a data analysis system having a customer management function that analyzes the contents of a database storing customer information using the K prototype, based on conditions entered by a user, and displays the result.

Another object of the present invention is to provide a customer management data analysis method in which, when a user enters a few conditions, the information on customers who are the marketing targets of a business is analyzed by customer characteristics using the K prototype, and the result is displayed as easily distinguishable graphs or similar output.

According to one object of the present invention, there is provided a data analysis system having a customer management function that analyzes the contents of a database storing customer information using the K prototype, based on conditions entered by a user, and displays the result. The system comprises: a key input unit for entering or selecting data; a data input thread that presents a data input form and passes the data entered through it to a central processing unit; a data conversion thread that converts the entered data into standard data in order to standardize it; a DB matching thread that matches the data in the database to be analyzed to the central processing unit; an algorithm thread that executes the operations of the K prototype algorithm; a result analysis thread that renders the result computed by the algorithm thread in graphical form; a data storage unit for temporarily storing the data entered through the key input unit and the intermediate values computed by the algorithm; a monitor that visually outputs the execution status of the system and the analyzed result of the database; and a central processing unit that, based on the data entered through the key input unit, drives the data input thread, the data conversion thread, and the algorithm thread to read and analyze the contents of the database to be analyzed, and outputs the analysis result to the monitor.

According to another object of the present invention, there is provided a customer management data analysis method that, based on a database to be analyzed and element data entered by a user, uses the K prototype to segment and cluster the contents of the database and displays them visually so that they can be easily distinguished. The method comprises: outputting a data input form for entering the data required for the analysis; temporarily storing the data entered through the data input form; standardizing the data in the database based on the entered data; segmenting and clustering the data with the K prototype algorithm based on the entered data; and visually outputting the clustered result.

FIG. 1 is a block diagram schematically showing the configuration of a data analysis system having a customer management function according to an embodiment of the present invention;

FIG. 2 is a flowchart showing a customer management method of the data analysis system according to an embodiment of the present invention;

FIG. 3 shows a data input form for entering data;

FIG. 4 shows the contents of a database to be analyzed as read by the central processing unit;

FIG. 5a shows the result of an analysis using the K prototype, output as a bar graph; and

FIG. 5b shows the result of an analysis using the K prototype, output as a donut graph.

<Explanation of reference numerals for the main parts of the drawings>

10: key input unit
20: data input thread
21: database input field
22: cluster input field
23: attribute list input field
24: independent variable input field
25: independent variable selection button
26: dependent variable input field
27: dependent variable selection button
30: data conversion thread
40: DB matching thread
50: algorithm thread
60: result analysis thread
70: data storage unit
80: monitor
90: central processing unit (CPU)

Hereinafter, a preferred embodiment of the present invention is described with reference to the accompanying drawings.

First, to aid understanding, the concept of the data analysis system having a customer management function according to the present invention is outlined, and the terms used in the present invention are briefly explained.

The data analysis system having a customer management function according to the present invention is a program in the form of a solution. The user runs the data analysis system program and specifies the file name of the database to be analyzed and, from among the elements in that database, the independent variables and dependent variables to be used in the analysis, and also sets the number of clusters as desired. Here, the number of clusters is the number of homogeneous customer groups the user wants. An independent variable is a variable representing an attribute that serves as a criterion for segmenting the contents of the database, such as a customer's occupation, address, or age. A dependent variable is a variable representing an attribute about which the user wants to learn more for each cluster obtained by segmenting on the independent variables, for example a customer's late fees, delinquency rate, purchased items, or purchase quantity.
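The inputs described above can be pictured as a small record that is checked against the fields of the selected database. The following is a minimal sketch only: the class name `AnalysisRequest`, its field names, the file name, and the validation rules are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisRequest:
    """Hypothetical container for the values the user enters."""
    database_file: str                      # file name of the database to analyze
    cluster_count: int                      # number of homogeneous customer groups wanted
    independent_vars: list = field(default_factory=list)  # segmentation criteria
    dependent_vars: list = field(default_factory=list)    # attributes to inspect per cluster

    def validate(self, available_fields):
        """Raise ValueError if the request cannot apply to the given fields."""
        if self.cluster_count < 1:
            raise ValueError("cluster_count must be at least 1")
        if not self.independent_vars:
            raise ValueError("at least one independent variable is required")
        chosen = self.independent_vars + self.dependent_vars
        unknown = [name for name in chosen if name not in available_fields]
        if unknown:
            raise ValueError(f"unknown fields: {unknown}")


# Example values are illustrative; the field names are assumptions.
request = AnalysisRequest(
    database_file="customers.xls",
    cluster_count=3,
    independent_vars=["occupation", "address", "age"],
    dependent_vars=["purchased_item", "purchase_quantity"],
)
request.validate(
    ["occupation", "address", "age", "purchased_item", "purchase_quantity"]
)
```

Validating against the field list before clustering mirrors the order of the description, in which the attribute list is read from the database first and the variables are then chosen from it.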

Based on the data entered by the user as described above, the data analysis system uses the K prototype (K-Prototype) algorithm to classify and bundle objects having similar attributes into groups. For each group so formed, it statistically processes the proportion of each factor of the dependent and independent variables and outputs the result on the screen, so that the user can grasp the customers' purchasing tendencies and characteristics at a glance. Here, a prototype is an algorithm made to perform a specific function for a given system, and the algorithm that performs the function according to the present invention is defined as the K prototype.

The data analysis system according to the present invention performs its functions through individual threads. If one program is regarded as a process, a thread is a unit of execution within that program, that is, a flow of control within one process that has its own independent program counter. The present invention therefore expresses each task as a thread and allows several such threads to exist, thereby enabling multitasking.
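As an illustration of expressing each task as a thread, the sketch below chains worker threads with queues so that each stage runs as its own flow of control inside one process. This is an assumed arrangement for illustration: the function names `stage` and `run_pipeline` and the use of queues are not described in the patent.

```python
import queue
import threading


def stage(transform, inbox, outbox):
    """Worker loop: apply `transform` to each item until a None sentinel arrives."""
    while True:
        item = inbox.get()
        if item is None:          # sentinel: forward it and stop this thread
            outbox.put(None)
            break
        outbox.put(transform(item))


def run_pipeline(items, transforms):
    """Run each transform in its own thread; items must not be None."""
    queues = [queue.Queue() for _ in range(len(transforms) + 1)]
    threads = [
        threading.Thread(target=stage, args=(t, queues[i], queues[i + 1]))
        for i, t in enumerate(transforms)
    ]
    for worker in threads:
        worker.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(None)
    results = []
    while True:
        out = queues[-1].get()
        if out is None:
            break
        results.append(out)
    for worker in threads:
        worker.join()
    return results


# Two toy stages stand in for, say, a conversion task and an analysis task.
print(run_pipeline([1, 2, 3], [lambda x: x * 2, lambda x: x + 1]))
```

Because each stage has a single worker and the queues are first-in first-out, the results come back in input order.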

FIG. 1 is a block diagram schematically showing the configuration of a data analysis system having a customer management function according to an embodiment of the present invention.

In FIG. 1, reference numeral 10 denotes a key input unit for selecting or entering the name of the database containing the customer data to be analyzed, or for selecting the number of clusters and the independent and dependent variables. Reference numeral 20 denotes a data input thread that presents a data input form for entering the element data needed to analyze the database and passes the element data entered through it to the central processing unit (90) described later. Reference numeral 30 denotes a data conversion thread that converts the entered data into standard data in order to standardize it. Reference numeral 40 denotes a DB matching thread that matches the individual data in the database to be analyzed so that they can be applied to this system.

Reference numeral 50 denotes an algorithm thread that executes the operations of the K prototype algorithm according to the data type. Numeric data are expressed by a Euclidean distance function, and categorical data are expressed as the proportion occupied by the cluster's prototype. The weight of the categorical data relative to the numeric data is set to one half of the standard deviation of the numeric data.

In FIG. 1, reference numeral 60 denotes a result analysis thread that, for each cluster, sorts the result computed by the algorithm thread (50) in descending order of the proportion of each factor of each attribute and renders it as, for example, a bar graph or a donut graph. Reference numeral 70 denotes a data storage unit for temporarily storing the element data selected or entered through the key input unit (10) and the intermediate values computed by the algorithm. Reference numeral 80 denotes a monitor that visually outputs the execution of the system and the analyzed result of the database. Reference numeral 90 denotes a central processing unit (CPU) that reads the data contained in the database to be analyzed, drives the data input thread (20), the data conversion thread (30), and the algorithm thread (50) based on the element data selected or entered through the key input unit (10) to analyze the data, and controls output of the analysis result to the monitor (80).
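The ranking step attributed to the result analysis thread can be sketched as follows: for one attribute within one cluster, count each factor and list the factors from the highest proportion to the lowest. The function name `rank_factors` and the sample values are hypothetical; the patent does not give an implementation.

```python
from collections import Counter


def rank_factors(values):
    """Return (factor, proportion) pairs for one attribute of one cluster,
    sorted from the most frequent factor to the least frequent."""
    counts = Counter(values)
    total = len(values)
    # most_common() yields factors in descending order of count.
    return [(factor, count / total) for factor, count in counts.most_common()]


# Occupation values of the members of one illustrative cluster.
cluster_occupations = ["clerk", "clerk", "student", "clerk"]
for factor, share in rank_factors(cluster_occupations):
    print(f"{factor}: {share:.0%}")
```

The resulting pairs are the kind of data a bar graph or donut graph per cluster would be drawn from.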

Next, the customer management data analysis method according to the present invention is described in relation to the operation of the data analysis system configured as described above, with reference to the flowcharts shown in FIGS. 2a and 2b.

When the user starts the system using the keys of the key input unit (10) or a mouse button (step S2), the central processing unit (90), having recognized the key input for starting the system from the key input unit (10), drives the data input thread (20) to output the data input form shown in FIG. 3 to the monitor (80) (step S4).

The data input form shown in FIG. 3 consists of a database input field (21) for selecting or entering the name of the database to be analyzed or the file name of the database, a cluster input field (22) for entering the number of clusters, an attribute list input field (23) showing the list of field names of the database, an independent variable input field (24) showing the list of independent variables, an independent variable selection button (25) for setting an item in the attribute list as an independent variable, a dependent variable input field (26) showing the list of dependent variables, and a dependent variable selection button (27) for setting an item in the attribute list as a dependent variable.

The user checks the data input form output to the monitor (80) and enters the data required by the input fields using the keys of the key input unit (10).

That is, through the database input field (21) the user selects the database to be analyzed, or selects the name of a file in a format such as Access or Excel. When a database or file has been selected through the database input field (21) (step S6), the central processing unit (90) drives the DB matching thread (40) to read the contents of the selected database as shown in FIG. 4, temporarily stores the contents at a given address in the data storage unit (70), and outputs the list of attributes corresponding to the fields of the database to the attribute list input field (23) (step S8).

The attributes in the data read from the database or file, which for the database contents shown in FIG. 4 are the column or field names such as occupation, address, and age, are shown in the attribute list input field (23) as the list of database attributes.

Next, the user enters the desired number of clusters through the cluster input field (22), and the central processing unit (90) temporarily stores this number in the data storage unit (70). In the data input form of FIG. 3 output to the monitor (80), the user also selects, from the list shown in the attribute list input field (23), the items to be used as independent variables and clicks the independent variable selection button (25) to move them to the independent variable input field (24). Likewise, the user selects the items in the attribute list to be used as dependent variables and clicks the dependent variable selection button (27) to move them to the dependent variable input field (26).

As shown in FIG. 4, the database to be analyzed holds the corresponding data in each field representing an attribute. When the database is designed, a data type such as text, number, or date/time is set for each field, so the central processing unit (90) recognizes from the declared data type whether the data in each field are categorical or numeric. That is, when the central processing unit (90) drives the DB matching thread (40) to read the contents of the database to be analyzed, it recognizes fields such as occupation, address, and purchased item in FIG. 4 as categorical data, and the data in the age and purchase quantity fields as numeric data.
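A rough sketch of recognizing the kind of data from the declared field type follows. The mapping covers only the two cases the text spells out (text fields as categorical, number fields as numeric); how a date/time field is treated is not stated, so the sketch marks it as unspecified. The schema dictionary and all names are hypothetical.

```python
# Mapping from declared database field type to the kind of data used in
# clustering. Only "text" and "number" are stated in the description.
DECLARED_TYPE_TO_KIND = {"text": "categorical", "number": "numeric"}


def classify_fields(schema):
    """Map each field name to 'categorical', 'numeric', or 'unspecified'."""
    return {
        name: DECLARED_TYPE_TO_KIND.get(declared.lower(), "unspecified")
        for name, declared in schema.items()
    }


# Illustrative schema; field names follow the examples given for FIG. 4.
schema = {
    "occupation": "Text",
    "address": "Text",
    "age": "Number",
    "purchased_item": "Text",
    "purchase_quantity": "Number",
}
print(classify_fields(schema))
```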

When entry of the data other than the database, that is, the number of clusters, the independent variables, and the dependent variables, has been completed by the process described above (step S10), the central processing unit (90) temporarily stores the entered data at a given address in the data storage unit (70) (step S12).

The central processing unit (90) then drives the data conversion thread (30) to standardize the numeric data read from the database to values between 0 and 1 (step S14). This standardization is achieved by dividing each value by the largest value among the numeric data, so that the largest value becomes 1 and the smaller values are converted into values between 0 and 1. Even when the numeric values are large, such as 10,000 or 100,000, converting them into relative values between 0 and 1 reduces the disparity with the categorical data.
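The divide-by-maximum standardization described here can be sketched in a few lines. The function name `normalize_by_max` is hypothetical, and the sketch assumes non-negative values with a positive maximum, which the text implies but does not state.

```python
def normalize_by_max(values):
    """Scale values into [0, 1] by dividing each one by the largest value.

    Assumes all values are non-negative and the largest is positive.
    """
    largest = max(values)
    if largest <= 0:
        raise ValueError("the largest value must be positive")
    return [value / largest for value in values]


# Purchase amounts of very different magnitude end up on a common 0..1 scale.
print(normalize_by_max([25, 50, 100]))  # [0.25, 0.5, 1.0]
```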

Next, based on the data entered through the data input form, the central processing unit (90) randomly sets up as many clusters as the number entered in the cluster input field (22), and arbitrarily sets, for each cluster, a prototype that serves as the center of that cluster (step S16).
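One common way to set initial prototypes arbitrarily is to pick distinct records at random, one per cluster. The sketch below does that; choosing existing records as prototypes is an assumption, since the text says only that the prototypes are set arbitrarily, and the name `init_prototypes` and the sample records are hypothetical.

```python
import random


def init_prototypes(records, cluster_count, seed=None):
    """Pick `cluster_count` distinct records at random as initial prototypes."""
    if cluster_count > len(records):
        raise ValueError("more clusters requested than records available")
    rng = random.Random(seed)   # a seed makes the choice repeatable
    return rng.sample(records, cluster_count)


# Illustrative records: (occupation, normalized age).
records = [
    ("clerk", 0.40),
    ("student", 0.25),
    ("farmer", 0.80),
    ("clerk", 0.55),
    ("teacher", 1.00),
]
print(init_prototypes(records, 3, seed=0))
```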

The central processing unit (90) determines whether a numeric attribute exists in each of the clusters so set (step S18). If a numeric attribute exists in the examined cluster (step S20), it drives the algorithm thread (50) to execute the K prototype algorithm as in Equation 1 and computes the numeric data with a Euclidean distance function (step S22).

In Equation 1, j denotes an independent or dependent variable, i indexes the data items, x denotes numeric data, q denotes the prototype within a cluster, l denotes the cluster number, and r is a symbol marking the numeric type. Accordingly, $x^{r}_{ij}$ denotes the value of the j-th numeric attribute of the i-th data item, and $q^{r}_{lj}$ denotes the prototype of the j-th numeric attribute of the l-th cluster. Equation 1 thus gives the distance between the i-th data item and the prototype of the l-th cluster for the j-th numeric attribute.
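The numeric part of the comparison can be sketched as below. The body of Equation 1 is not recoverable from the text, so this sketch assumes the squared Euclidean distance summed over the numeric attributes, which is the usual choice in k-prototypes clustering; whether the patent's formula takes a square root is not known. The function name is hypothetical.

```python
def numeric_distance(record_values, prototype_values):
    """Squared Euclidean distance between a record's numeric attributes
    and a cluster prototype's numeric attributes (assumed form)."""
    if len(record_values) != len(prototype_values):
        raise ValueError("record and prototype must have the same length")
    return sum((x - q) ** 2 for x, q in zip(record_values, prototype_values))


# Normalized (age, purchase quantity) of one record versus one prototype.
print(numeric_distance([0.2, 0.5], [0.5, 0.1]))
```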

If the check in step S20 finds that no numeric attribute exists in the cluster, all the data read from the database are randomly assigned to the clusters (step S24).

Next, the central processing unit (90) drives the algorithm thread (50) and, as in Equation 2, computes for the categorical data the proportion that a given data value occupies, relative to the prototype, within the cluster to which it belongs (step S26).

In Equation 2, the first argument of δ is the value of the j-th categorical attribute of the i-th data item, and the second argument is the prototype of the j-th categorical attribute of the l-th cluster; δ is the function that gives the proportion occupied by the prototype within the cluster. The membership condition in Equation 2 states that the value concerned is one of the factors of the j-th categorical attribute within cluster l. In addition, p denotes the proportion that this value occupies within cluster l, and the remaining two quantities are the number of data items in cluster l and the number of categorical attributes.

In other words, Equation 2 states that the proportion an arbitrary data item occupies within the cluster is obtained by subtracting the proportion occupied by the prototype from the whole (1).
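The "one minus proportion" idea behind Equation 2 can be illustrated with a short sketch. Because the equation is not reproduced in this text, it is not certain which value's share is subtracted from 1, so the sketch takes the value explicitly as a parameter. The name `one_minus_share` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def one_minus_share(value: Hashable, cluster_values: Sequence[Hashable]) -> float:
    """Return 1 minus the proportion of cluster members whose attribute equals `value`.

    `cluster_values` holds one categorical attribute for every record in a cluster.
    """
    if not cluster_values:
        raise ValueError("cluster must contain at least one record")
    share = Counter(cluster_values)[value] / len(cluster_values)
    return 1.0 - share


if __name__ == "__main__":
    # "A" accounts for 2 of 4 records, so the result is 1 - 0.5.
    print(one_minus_share("A", ["A", "A", "B", "C"]))
```

A value that dominates the cluster yields a result near 0, and a value that never occurs yields 1, which matches the role of a dissimilarity for categorical attributes.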

The central processing unit 90 then obtains the homogeneity value between the data in each cluster and the prototype according to Equation 3 (step S28).

In Equation 3, x^r_ij is the value of the j-th numeric attribute of the i-th data item, q^r_lj is the prototype of the j-th numeric attribute of the l-th cluster, and the weight term is the weight of the categorical attributes relative to the numeric attributes of the l-th cluster.

The categorical terms are the value of the j-th categorical attribute of the i-th data item and the prototype of the j-th categorical attribute of the l-th cluster, and δ represents the proportion occupied by the prototype within the cluster.

The weight expresses how much the categorical data count relative to the numeric data, and it is obtained by taking one half of the standard deviation of the numeric data.

That is, Equation 3 states that the homogeneity value between the data in each cluster and the prototype is obtained by adding the numeric value to the weighted categorical value.
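The combination described for Equation 3 can be sketched as below. The text does not say whether the standard deviation is a population or sample statistic, or over which numeric values it is taken, so the sketch assumes a population standard deviation over a flat list of numeric values. The names `categorical_weight` and `homogeneity` are illustrative.

```python
import statistics
from typing import Sequence


def categorical_weight(numeric_values: Sequence[float]) -> float:
    """Weight of the categorical part: one half of the standard deviation.

    Assumption: population standard deviation (statistics.pstdev).
    """
    return 0.5 * statistics.pstdev(numeric_values)


def homogeneity(numeric_part: float, categorical_part: float, weight: float) -> float:
    """Numeric value plus the weighted categorical value."""
    return numeric_part + weight * categorical_part


if __name__ == "__main__":
    w = categorical_weight([0.0, 1.0])  # standard deviation 0.5, so weight 0.25
    print(homogeneity(numeric_part=0.25, categorical_part=0.5, weight=w))
```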

Based on the values obtained above, the central processing unit 90 clusters entities with similar attributes using the segmentation algorithm of Equation 4, namely the K prototype algorithm (step S30).

In Equation 4, x^r_ij is the value of the j-th numeric attribute of the i-th data item, q^r_lj is the prototype of the j-th numeric attribute of the l-th cluster, and the weight term is the weight of the categorical attributes relative to the numeric attributes of the l-th cluster.

The categorical terms are the value of the j-th categorical attribute of the i-th data item and the prototype of the j-th categorical attribute of the l-th cluster, and δ denotes the function that computes the proportion occupied by the prototype within the cluster.

In addition, the membership term indicates whether data item i belongs to cluster l, and the minimum operator selects the smallest value among the clusters from cluster 1 to cluster k. That is, Equation 4 states that, for an arbitrary data item, the sum of the distances over all of its data, categorical and numeric, is evaluated for each cluster, and the cluster for which this value is smallest is obtained.
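Equations 3 and 4 are not reproduced in this text. As a hedged reconstruction from the definitions given for them, they might read as below. The squared-difference numeric term, the symbol names \gamma_l (weight) and y_{il} (membership indicator), the superscript c for categorical attributes, and the attribute counts m_r and m_c are assumptions borrowed from the common K-prototype formulation, not taken from the source.

```latex
% Equation 3 (assumed form): homogeneity of data item i with the prototype of cluster l
d(x_i, q_l) = \sum_{j=1}^{m_r} \left( x^{r}_{ij} - q^{r}_{lj} \right)^{2}
            + \gamma_l \sum_{j=1}^{m_c} \delta\!\left( x^{c}_{ij}, q^{c}_{lj} \right)

% Equation 4 (assumed form): each data item joins the cluster with the smallest value
y_{il} =
\begin{cases}
1, & \text{if } d(x_i, q_l) = \min_{1 \le t \le k} d(x_i, q_t) \\
0, & \text{otherwise}
\end{cases}
```

Here \gamma_l would be one half of the standard deviation of the numeric data, and \delta would be the function of Equation 2.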

The central processing unit 90 assigns each data item to the cluster with the smallest such value, and repeating this for all data produces the clustering.
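The assignment step can be sketched as follows, assuming the combined distance of every data item to every cluster has already been computed and stored in a nested list. The layout `distances[i][l]` and the name `assign_to_nearest` are illustrative choices, not part of the source.

```python
from typing import List, Sequence


def assign_to_nearest(distances: Sequence[Sequence[float]]) -> List[int]:
    """For each data item, return the index of the cluster with the smallest distance.

    distances[i][l] is the combined (numeric + weighted categorical) distance
    of data item i to cluster l. Ties go to the lowest cluster index.
    """
    return [min(range(len(row)), key=row.__getitem__) for row in distances]


if __name__ == "__main__":
    d = [
        [0.40, 0.10, 0.90],  # item 0 is closest to cluster 1
        [0.20, 0.30, 0.25],  # item 1 is closest to cluster 0
    ]
    print(assign_to_nearest(d))  # [1, 0]
```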

Having clustered the entities with similar attributes according to the algorithm of Equation 4, the central processing unit 90 determines whether K prototype clustering has been performed for all configured clusters (step S32). If not, it returns to step S20 for the next cluster and performs K prototype clustering on it (step S34).

Thereafter, the central processing unit 90, having run the algorithm thread 50 to perform K prototype clustering for each cluster, runs the result analysis thread 60 to compute the result as in Equation 5 and displays it on the monitor 80. Numeric attributes are displayed as a bar graph, as shown in FIG. 5A, and categorical attributes are displayed as a donut graph (step S36).

In Equation 5, "cluster i * independent variable j * factor k" denotes factor k of independent variable j among the data belonging to cluster i.

When presenting the clustering result on the monitor 80, the result analysis thread 60 handles numeric attributes with a bar graph that shows, for each factor of the attribute, its proportion within the cluster and its proportion in the whole data set in distinct colors so that the user can compare them easily. For categorical attributes it uses a donut graph in which the proportion of each factor within the cluster is shown on the inner ring and the proportion in the whole data set on the outer ring, so that the user can read the result at a glance.
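The two proportions that the graphs compare can be computed as in the sketch below. Equation 5 is not reproduced in this text, so this is one plausible reading of "proportion within the cluster" and "proportion in the whole", and the name `factor_ratios` is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def factor_ratios(
    values: Sequence[Hashable], labels: Sequence[int], cluster: int
) -> Dict[Hashable, Tuple[float, float]]:
    """Map each factor to (proportion within `cluster`, proportion in all data).

    values[i] is the factor of one attribute for record i; labels[i] is its cluster.
    """
    overall = Counter(values)
    in_cluster = Counter(v for v, lab in zip(values, labels) if lab == cluster)
    n_all = len(values)
    n_cluster = sum(in_cluster.values())
    return {
        factor: (
            in_cluster[factor] / n_cluster if n_cluster else 0.0,
            overall[factor] / n_all,
        )
        for factor in overall
    }


if __name__ == "__main__":
    ratios = factor_ratios(["M", "F", "M", "M"], [0, 0, 1, 1], cluster=0)
    print(ratios)  # {'M': (0.5, 0.75), 'F': (0.5, 0.25)}
```

The first number of each pair would feed the inner ring (or the cluster-colored bar) and the second the outer ring (or the overall bar).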

That is, according to the above embodiment, when the user enters the database to be analyzed together with the independent variables, dependent variables, and number of clusters to use for the analysis, the analysis system clusters entities with similar attributes using the K prototype (K-Prototype) algorithm based on that input, computes for each group the proportions of the factors of the dependent and independent variables, and displays the result on the screen.

Those of ordinary skill in the art will understand that the present invention is not limited to the above embodiment and that various modifications and changes can be made without departing from the technical gist of the present invention.

As described above, according to the present invention, the customer characteristics in a customer database are analyzed with only a simple input operation by the user, and the result is displayed in an easily recognizable form, so that the user can readily establish a sales strategy based on it.

Claims

A customer management data analysis method that, based on a database to be analyzed and element data entered by a user, segments and clusters the contents of the database using a K prototype algorithm and displays the result visually so that it can be easily distinguished, the method comprising:

(a) outputting a data input form to a monitor to receive raw data required for data analysis from a key input unit;

(b) temporarily storing the raw data input through the data input form in a data storage unit;

(c) standardizing, in a central processing unit, the data in the database based on the raw data temporarily stored in step (b) to generate standardized data;

(d) segmenting and clustering data in a database using a K prototype algorithm based on the standardized data to generate clustered data at the central processing unit;

(e) outputting the clustered data to the monitor using a chart,

wherein the K prototype algorithm evaluates, for each cluster, the sum of the distances of all data of an arbitrary data item, including categorical data and numeric data, and determines the cluster for which that value is smallest as the cluster of the data item.

The method of claim 1,

wherein the data required for the data analysis are a database, a number of clusters, independent variables, and dependent variables.

The method of claim 1,

wherein standardizing the data in the database comprises converting, by the central processing unit, the numeric data into values between 0 and 1.

The method of claim 1,

wherein, in the segmenting and clustering using the K prototype algorithm, the central processing unit computes numeric data with a Euclidean distance function and categorical data as the proportion of the prototype within the cluster.

The method of claim 1,

wherein the K prototype algorithm is expressed by an equation in which x^r_ij is the value of the j-th numeric attribute of the i-th data item, q^r_lj is the prototype of the j-th numeric attribute of the l-th cluster, a weight term is the weight of the categorical attributes relative to the numeric attributes of the l-th cluster, the categorical terms are the value of the j-th categorical attribute of the i-th data item and the prototype of the j-th categorical attribute of the l-th cluster, δ is a function that computes the proportion of the prototype within the cluster, a membership term indicates whether data item i belongs to cluster l, and the minimum is taken over the clusters from cluster 1 to cluster k.

The method of claim 4, wherein

the Euclidean distance function is expressed by an equation in which j is an independent or dependent variable, i is the index of the data item, x is numeric data, q is a prototype within the cluster, l is the cluster number, r is a label marking a numeric attribute, x^r_ij is the value of the j-th numeric attribute of the i-th data item, and q^r_lj is the prototype of the j-th numeric attribute of the l-th cluster, the equation giving the distance between the i-th data item and the prototype of the l-th cluster on the j-th numeric attribute.

The method of claim 4, wherein

the proportion of the prototype within the cluster is expressed by an equation in which x^r_ij is the value of the j-th numeric attribute of the i-th data item, q^r_lj is the prototype of the j-th numeric attribute of the l-th cluster, a weight term is the weight of the categorical attributes relative to the numeric attributes of the l-th cluster, the categorical terms are the value of the j-th categorical attribute of the i-th data item and the prototype of the j-th categorical attribute of the l-th cluster, and δ is a function that computes the proportion of the prototype within the cluster, wherein the weight of the categorical data relative to the numeric data is one half of the standard deviation of the numeric data, and the equation adds the numeric value to the weighted categorical value.

A data analysis system having a customer management function that analyzes, using a K prototype algorithm and based on conditions entered by a user, the contents of a database in which customer information is stored and displays the result, the system comprising:

A key input unit for inputting or selecting data;

a data input thread that presents a data input form for data entry and passes the input data to the central processing unit;

a data conversion thread that converts data in the database into standardized data;

a DB matching thread that matches the data in the database to be analyzed to the central processing unit;

an algorithm thread that executes the K prototype algorithm on the standardized data based on the data input from the key input unit;

a result analysis thread that tabulates the results computed by the algorithm thread;

A data storage unit for temporarily storing data inputted through the key input unit or intermediate values of algorithms processed;

A monitor that visually outputs the operation execution status of the data analysis system or the analyzed result of the database;

a central processing unit that, based on the data input through the key input unit, drives the data input thread, the data conversion thread, and the algorithm thread to read and analyze the contents of the database to be analyzed, and outputs the analysis result to the monitor,

wherein the K prototype algorithm evaluates, for each cluster, the sum of the distances of all data of an arbitrary data item, including categorical data and numeric data, and determines the cluster for which that value is smallest as the cluster of the data item.

The system of claim 8,

wherein the algorithm thread computes a Euclidean distance function when the operand is numeric data and computes the proportion of the prototype within the cluster when the operand is categorical data, and applies one half of the standard deviation of the numeric data as the weight of the categorical data relative to the numeric data.

The system of claim 8,

wherein the data input form output by the data input thread includes a database input box, a cluster input box, an independent variable input box, a dependent variable input box, an attribute list input box, an independent variable selection button, and a dependent variable selection button.

The system of claim 8,

wherein the data conversion thread standardizes the numeric data read from the database to values between 0 and 1.

The system of claim 8,

wherein the result analysis thread outputs the analysis result as a bar graph when it is numeric and as a donut graph when it is categorical.

The system of claim 8,

wherein the algorithm thread computes a Euclidean distance function for numeric data and the proportion of the prototype within the cluster for categorical data, according to an equation in which x^r_ij is the value of the j-th numeric attribute of the i-th data item, q^r_lj is the prototype of the j-th numeric attribute of the l-th cluster, a weight term is the weight of the categorical attributes relative to the numeric attributes of the l-th cluster, the categorical terms are the value of the j-th categorical attribute of the i-th data item and the prototype of the j-th categorical attribute of the l-th cluster, δ is a function that computes the proportion of the prototype within the cluster, a membership term indicates whether data item i belongs to cluster l, and the minimum is taken over the clusters from cluster 1 to cluster k.