KR100823853B1

KR100823853B1 - Personalized recommendation method using customer's demographic information and purchase history

Info

Publication number: KR100823853B1
Application number: KR1020060092219A
Authority: KR
Inventors: 이수원; 김완섭
Original assignee: 숭실대학교산학협력단
Priority date: 2006-09-22
Filing date: 2006-09-22
Publication date: 2008-04-21
Also published as: KR20080026952A

Abstract

The present invention relates to a personalized content recommendation method, and more particularly to a method that predicts a specific customer's expected preference for specific content and thereby selects and recommends the content the customer is most likely to prefer.

The personalized content recommendation method of the present invention includes: a first step of selecting a reference value based on a significance level for a chi-square analysis of arbitrary content; a second step of extracting one attribute from the pre-stored customer demographic attributes; a third step of performing the chi-square analysis on the content with respect to that attribute; a fourth step of comparing the result value of the analysis with the reference value; and a fifth step of securing a result value larger than the reference value as the expected preference value of the attribute for the content.

Personalized content recommendation, demographic information, content purchase pattern, preference prediction

Description

Personalized recommendation method using customer's demographic information and purchase history

FIG. 1 is a block diagram showing an embodiment of the personalized content recommendation method according to the present invention;

FIG. 2 is a flowchart showing an embodiment of the content purchase pattern analysis method according to the present invention;

FIG. 3 is a flowchart showing an embodiment of the demographic attribute information based recommendation method according to the present invention;

FIG. 4 is a flowchart showing an embodiment of the purchase probability based collaborative recommendation method according to the present invention.

Here, content means the tangible (books, home appliances, etc.) or intangible (VOD, TV viewing, etc.) goods and services sold in a shopping mall.

As the number of products and customers in online and offline shopping malls grows, methods of recommending personalized content to customers are increasingly required.

A method widely used for personalized recommendation is collaborative recommendation. Based on the content a customer has purchased in the past, it finds other customers whose purchases are similar and recommends the content those customers commonly prefer (have purchased). A slightly modified variant, which recommends by computing the similarity between the content the customer has purchased and other content, is also widely used.

Such collaborative recommendation is also called social filtering; it recommends items the customer has not purchased by taking into account the preference ratings of other people with similar tastes. The method predicts a preference for each content item from the degree of similarity between the target customer and other customers and from the preference ratings entered for items, and then recommends the top content items with the highest predicted preference among those the customer has not purchased.

Collaborative recommendation consists of two main steps: the first computes the similarity between customers, and the second computes the predicted value. The similarity between customers is generally calculated with the cosine or the Pearson correlation coefficient. The cosine-based method expresses the preference (rating) information given by customers as vectors and takes the cosine between those vectors. [Equation 1] calculates the similarity between customer a and customer u by the cosine, and [Equation 2] calculates the similarity between the two customers by the correlation coefficient.

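[Equation 1]

$$w_{a,u} = \cos(\vec{a}, \vec{u}) = \frac{\vec{a} \cdot \vec{u}}{\lVert \vec{a} \rVert \, \lVert \vec{u} \rVert}$$

[Equation 2]

$$w_{a,u} = \frac{\sum_{i} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{\sqrt{\sum_{i} (r_{a,i} - \bar{r}_a)^2} \, \sqrt{\sum_{i} (r_{u,i} - \bar{r}_u)^2}}$$

Here, $\vec{a} = (r_{a,1}, r_{a,2}, \ldots)$ and $\vec{u} = (r_{u,1}, r_{u,2}, \ldots)$ are the vectors of preferences that customers a and u gave to each item, and $\bar{r}_a$ and $\bar{r}_u$ are the averages of the preference values given by customers a and u.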
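The two similarity measures named above can be sketched in a few lines. This is a minimal illustration using the textbook definitions of the cosine and the Pearson correlation coefficient; the function names and the assumption that both customers have rated the same items in the same order are illustrative and not taken from the patent.

```python
import math


def cosine_similarity(a, u):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, u))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_u = math.sqrt(sum(y * y for y in u))
    if norm_a == 0 or norm_u == 0:
        return 0.0  # similarity is undefined for an all-zero vector; treat as 0
    return dot / (norm_a * norm_u)


def pearson_similarity(a, u):
    """Pearson correlation between two equal-length rating vectors."""
    mean_a = sum(a) / len(a)
    mean_u = sum(u) / len(u)
    cov = sum((x - mean_a) * (y - mean_u) for x, y in zip(a, u))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_u = sum((y - mean_u) ** 2 for y in u)
    if var_a == 0 or var_u == 0:
        return 0.0  # a constant rating vector has no defined correlation
    return cov / math.sqrt(var_a * var_u)


# Hypothetical ratings of four items by customers a and u.
ratings_a = [5, 3, 4, 1]
ratings_u = [4, 2, 5, 1]
print(cosine_similarity(ratings_a, ratings_u))
print(pearson_similarity(ratings_a, ratings_u))
```

Both functions are symmetric in their arguments, which is the property the description later identifies as a limitation for purchase data.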

[Equation 3] predicts the target customer's preference for a product, using the similarity between customers obtained by [Equation 1] or [Equation 2] as a weight. Here, $P_{a,i}$ is the predicted preference of customer a for content i, $r_{u,i}$ is the preference value that customer u gave to content i, and $w_{a,u}$ is the similarity between customers a and u, used as the weight. n is the number of all customers other than a.
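As a rough illustration of a similarity-weighted prediction, the sketch below computes a weighted average of the other customers' ratings. The exact form of [Equation 3] is not recoverable from the text, so the normalization by the sum of absolute weights and the name `predict_preference` are assumptions rather than the patent's formula.

```python
def predict_preference(neighbor_ratings, weights):
    """Similarity-weighted average of other customers' ratings for one item.

    neighbor_ratings: dict mapping customer id -> rating given to the item
    weights: dict mapping customer id -> similarity to the target customer
    """
    numerator = sum(weights[u] * r for u, r in neighbor_ratings.items())
    denominator = sum(abs(weights[u]) for u in neighbor_ratings)
    if denominator == 0:
        return 0.0  # no informative neighbors
    return numerator / denominator


# Hypothetical example: two neighbors rated the item 5 and 1.
ratings = {"u1": 5, "u2": 1}
similarities = {"u1": 0.75, "u2": 0.25}
print(predict_preference(ratings, similarities))
```

The more similar neighbor dominates the prediction, which is the behavior the weighting is meant to produce.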

The prior art also includes an item-based collaborative recommendation method built on the collaborative recommendation method described above. Whereas the existing method uses the similarity between customers, this method recommends using the similarity between content items (products). It consists of two parts: measuring the similarity, and recommending with the measured similarity.

A typical problem of the above collaborative recommendation methods is that no recommendation can be made when there is no past purchase record or explicit preference record for the customer. Even when such records exist, recommendations cannot be made properly if they are insufficient. Consequently, a recommendation service cannot be properly provided to new customers. In such cases it is necessary to recommend on the basis of demographic attribute information such as the customer's age, gender, occupation, and place of residence, but using only the existing collaborative method has the drawback that the customer's demographic attributes are not reflected.

Moreover, this method assumes that preference rating information for the user's products is available, but in many practical cases preference information is insufficient and only purchase/non-purchase information exists, so applying the existing equations is not appropriate. In addition, the conventional item-based collaborative recommendation method has the drawback that the similarity measured between two items does not reflect the directionality of purchases at recommendation time.

Accordingly, an object of the present invention is to analyze the purchase pattern of content with respect to demographic attributes using chi-square analysis.

Another object of the present invention is to analyze a specific customer's expected preference for specific content based on the customer's demographic attribute information.

Still another object of the present invention is to analyze a specific customer's expected preference for specific content through a similarity measure based on the purchase probability of content.

Still another object of the present invention is to analyze a specific customer's expected preference for specific content by considering both the customer's demographic attribute information and the customer's past content purchase history.

The object of the present invention is achieved by a personalized content recommendation method including: a first step of selecting a reference value based on a significance level for a chi-square analysis of arbitrary content; a second step of extracting one attribute from the pre-stored customer demographic attributes; a third step of performing the chi-square analysis on the content with respect to that attribute; a fourth step of comparing the result value of the analysis with the reference value; and a fifth step of securing a result value larger than the reference value as the expected preference value of the attribute for the content.

Another object of the present invention is achieved by a personalized content recommendation method including: a first step of extracting, from the attributes of a first customer, at least one feature attribute that affects the purchase of first content; a second step of extracting a similar customer group, which is the set of customers belonging to the at least one feature attribute; a third step of calculating the purchase rate of the similar customer group for the first content; and a fourth step of securing the result value of the third step as the first expected preference value of the first customer for the first content.

Still another object of the present invention is achieved by a personalized content recommendation method for measuring the expected preference of a first customer for first content, including: a first step of extracting, when the first customer has purchased second content, the number of customers who purchased the second content; a second step of extracting the number of customers who purchased both the first content and the second content; a third step of calculating the probability that a customer who purchased the second content purchases the first content; a fourth step of performing the first to third steps for all content purchased by the first customer; and a fifth step of securing the expected preference value of the first customer for the first content using the result values of the fourth step.

Before proceeding, the terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. On the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention.

Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all of its technical idea. It should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing this application.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing an embodiment of the personalized content recommendation method according to the present invention. First, the method of analyzing content purchase patterns through the chi-square ($\chi^2$) analysis unit according to the present invention is described.

The chi-square test is used to verify whether specific content is associated with a demographic attribute. For categorical data, the chi-square test can determine whether attributes are mutually independent or mutually associated.

For example, in an election season it may be used to check whether there is an association between region and support for a candidate.

The chi-square test compares the observed frequencies of the categories with their corresponding expected values. The closer the observed frequencies are to the expected values, the more independent the two categories can be said to be; conversely, the larger the difference from the expected values, the stronger the association. [Equation 4] expresses the chi-square test statistic.

[Equation 4]

$$\chi^2 = \sum_{i=1}^{k} \frac{(n_i - e_i)^2}{e_i}$$

Here, k is the number of categories, $n_i$ is the observed frequency, and $e_i$ is the expected value. The chi-square value calculated by this equation approaches a chi-square distribution with k-1 degrees of freedom.

Therefore, when the null hypothesis $H_0$ (A and B are independent) is set up to verify that two categories A and B are mutually independent, $H_0$ is rejected at significance level α when $\chi^2 > \chi^2_{n,\alpha}$. Here n is the degrees of freedom, determined as k-1, and α is the significance level of the test. That is, if $\chi^2 > \chi^2_{n,\alpha}$, the two attributes A and B can be regarded as associated; conversely, if $\chi^2 < \chi^2_{n,\alpha}$, the two attributes A and B can be regarded as independent.
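The decision rule above can be sketched as a lookup against critical values. This is a minimal illustration that hard-codes the α = 0.05 critical values for 1 to 5 degrees of freedom from the chi-square distribution table ([Table 1]); the dictionary and function names are hypothetical.

```python
# Critical values chi^2_{n, 0.05} for n = 1..5 degrees of freedom,
# copied from the 0.050 column of the chi-square distribution table.
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.07}


def is_associated(chi_square, num_categories, critical=CRITICAL_005):
    """Return True when the independence hypothesis H0 is rejected.

    The degrees of freedom are taken as k - 1, where k is the number
    of categories, as stated in the description.
    """
    degrees_of_freedom = num_categories - 1
    return chi_square > critical[degrees_of_freedom]


# With the two categories {purchase, non-purchase}, k = 2 and n = 1.
print(is_associated(5.555, 2))  # statistic above 3.841
print(is_associated(1.481, 2))  # statistic below 3.841
```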

[Table 1]

| Degrees of freedom (n) \ Significance level (α) | 0.990 | 0.975 | 0.950 | 0.900 | 0.100 | 0.050 | 0.025 | 0.010 |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.000 | 0.001 | 0.004 | 0.016 | 2.706 | 3.841 | 5.024 | 6.635 |
| 2 | 0.020 | 0.051 | 0.103 | 0.211 | 4.605 | 5.991 | 7.378 | 9.210 |
| 3 | 0.115 | 0.216 | 0.352 | 0.584 | 6.251 | 7.815 | 9.348 | 11.34 |
| 4 | 0.297 | 0.484 | 0.711 | 1.064 | 7.779 | 9.488 | 11.14 | 13.28 |
| 5 | 0.554 | 0.831 | 1.145 | 1.610 | 9.236 | 11.07 | 12.83 | 15.09 |
| 6 | 0.872 | 1.237 | 1.635 | 2.204 | 10.64 | 12.59 | 14.45 | 16.81 |
| 7 | 1.239 | 1.690 | 2.167 | 2.833 | 12.02 | 14.07 | 16.01 | 18.48 |
| 8 | 1.646 | 2.180 | 2.733 | 3.490 | 13.36 | 15.51 | 17.54 | 20.09 |
| 9 | 2.088 | 2.700 | 3.325 | 4.168 | 14.68 | 16.92 | 19.02 | 21.67 |
| 10 | 2.558 | 3.247 | 3.940 | 4.865 | 15.99 | 18.31 | 20.48 | 23.21 |

[Table 1] is a chi-square distribution table giving the critical value $\chi^2_{n,\alpha}$ of the test for n degrees of freedom and significance level α. When the chi-square independence test is applied to the association between content and a demographic attribute, the categories are the two values {purchase, non-purchase}, so the statistic follows a chi-square distribution with 1 degree of freedom. [Table 2] shows the distribution of an attribute against content purchase.

[Table 2]

| | Number of buyers | Number of non-buyers | Number of customers |
|---|---|---|---|
| Customers belonging to the demographic attribute | Num(Buy∩Profile) | Num(~Buy∩Profile) | Num(Profile) |
| All customers | Num(Buy∩Total) | Num(~Buy∩Total) | Num(Total) |

When the chi-square test formula is used to verify the association between a customer attribute and content, it can be written as [Equation 5].

[Equation 5]

$$\chi^2 = \frac{(n_{Buy} - e_{Buy})^2}{e_{Buy}} + \frac{(n_{\sim Buy} - e_{\sim Buy})^2}{e_{\sim Buy}}$$

In [Equation 5], Buy is the set of buyers of the content within the customer group belonging to the demographic attribute, and ~Buy is the set of non-buyers of the content within that group; $n_{Buy}$ and $n_{\sim Buy}$ are the observed sizes of these sets. $e_{Buy}$ is the expected number of buyers in the customer group belonging to the demographic attribute, and $e_{\sim Buy}$ is the expected number of non-buyers.

Each content item shows a different purchase pattern depending on its use. For example, casual handbags are purchased mainly by female customers in their 20s, and golf wear mainly by men in their 50s and 60s whose occupation is professional. To find such purchase patterns, the chi-square test formula of [Equation 5] can be applied to each content item to extract the demographic attributes that affect its purchase.

Since the categories are the two values {purchase, non-purchase}, the degrees of freedom is 1. By specifying the significance level α for the independence decision, the association between an attribute and content purchase can be judged against the value of $\chi^2_{1,\alpha}$. In an embodiment of the present invention, the association with every attribute is tested for each content item at a significance level of 0.05 (95% confidence). Attribute information that is not accepted is not used in demographic attribute based recommendation; that is, attributes that do not affect the purchase of the content are not used at recommendation time. For example, if the purchase distribution by gender of content A, a women's handbag, is as shown in [Table 3], the chi-square test makes it possible to identify the purchase pattern of this content by customer demographic attribute.

[Table 3]

| Gender | Female | Male | Unfilled | Total |
|---|---|---|---|---|
| Purchase | 15 | 30 | 5 | 50 |
| Non-purchase | 210 | 170 | 25 | 450 |
| Total | 220 | 200 | 30 | 500 |

[Table 4]

| Gender | Observed buyers | Expected buyers | Observed non-buyers | Expected non-buyers | Chi-square | Meaning |
|---|---|---|---|---|---|---|
| Female | 15 | 22 | 205 | 198 | 2.474 | Non-preferred |
| Male | 30 | 20 | 170 | 180 | 5.555 | Preferred |
| Unfilled | 5 | 3 | 25 | 27 | 1.481 | Uncorrelated |

Applying chi-square analysis to the data of [Table 3] gives the results in [Table 4]. In [Table 4], the expected numbers of buyers and non-buyers are the values calculated by [Equation 6] and [Equation 7], and preference or non-preference is determined by comparing the purchase rate of the group belonging to the demographic attribute value with the purchase rate of all customers for the content. The magnitude of the chi-square value indicates how strongly the attribute value affects purchasing.
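The per-attribute calculation can be sketched as follows. [Equation 6] and [Equation 7] are not reproduced in the text, so the sketch assumes that the expected counts are the group size multiplied by the overall purchase and non-purchase rates (50/500 and 450/500 here); this assumption reproduces the expected counts and chi-square values listed in [Table 4]. The function name is hypothetical.

```python
def attribute_chi_square(group_buyers, group_size, total_buyers, total_size):
    """Chi-square statistic for one attribute value versus purchase.

    Expected counts assume the group buys at the overall purchase rate
    (an assumption standing in for Equations 6 and 7).
    """
    expected_buy = group_size * total_buyers / total_size
    expected_non = group_size * (total_size - total_buyers) / total_size
    observed_non = group_size - group_buyers
    return ((group_buyers - expected_buy) ** 2 / expected_buy
            + (observed_non - expected_non) ** 2 / expected_non)


# Group sizes and buyer counts for the gender attribute of content A.
for label, buyers, size in [("female", 15, 220), ("male", 30, 200), ("unfilled", 5, 30)]:
    print(label, round(attribute_chi_square(buyers, size, 50, 500), 3))
```

Expressing the expected counts through the overall rate keeps the function usable for any attribute, since only the group size and its buyer count change between rows.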

[Table 4] shows the result of performing the chi-square analysis on every value of the gender attribute. Read at the 0.05 significance level, the male group shows a preferring purchase pattern: its chi-square value (5.555) exceeds $\chi^2_{1,0.05}$ (= 3.841) and its purchase rate is above that of all customers. The female group buys at a lower rate than all customers, so its tendency is non-preferring. For the unfilled group the chi-square value is smaller than $\chi^2_{1,0.05}$ (= 3.841), so it can be regarded as unrelated to the purchase of the content. Because the male chi-square value is larger than the female value, the degree of male preference is greater than the degree of female non-preference.

Likewise, if the purchase distribution of content A by purchaser age is as shown in [Table 5], applying chi-square analysis yields the results in [Table 6].

[Table 5]

| Age | 20s | 30s | 40s | 50s | 60s | Total |
|---|---|---|---|---|---|---|
| Purchase | 2 | 8 | 10 | 25 | 5 | 50 |
| Non-purchase | 98 | 142 | 90 | 55 | 65 | 450 |
| Total | 100 | 150 | 100 | 80 | 70 | 500 |

[Table 6]

| Age | Observed buyers | Expected buyers | Observed non-buyers | Expected non-buyers | Chi-square | Meaning |
|---|---|---|---|---|---|---|
| 20s | 2 | 10 | 98 | 90 | 7.111 | Non-preferred |
| 30s | 8 | 15 | 142 | 135 | 3.629 | Uncorrelated |
| 40s | 10 | 10 | 90 | 90 | 0.0 | Uncorrelated |
| 50s | 25 | 8 | 65 | 72 | 36.805 | Preferred |
| 60s | 5 | 7 | 65 | 63 | 0.634 | Uncorrelated |

[Table 6] shows the result of performing the chi-square analysis on every value of the age attribute. The content is preferred by customers in their 50s and strongly non-preferred by customers in their 20s. For customers in their 30s, 40s, and 60s the chi-square value is smaller than $\chi^2_{1,0.05}$ (= 3.841), so age can be regarded as unrelated to the purchase of the content. In this way the purchase pattern of each content item with respect to customer attributes is analyzed and stored as a model, which is then used in the recommendation based on demographic attribute information.
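The labeling described above (preferred, non-preferred, or uncorrelated) can be sketched as a small classifier that combines the chi-square threshold with a comparison of purchase rates. The function name, the label strings, and the dictionary used as the stored model are illustrative assumptions; the chi-square values and counts are those listed for the age attribute.

```python
CRITICAL_DF1_005 = 3.841  # chi^2_{1, 0.05}


def classify(chi_square, group_rate, overall_rate, critical=CRITICAL_DF1_005):
    """Label one attribute value for one content item."""
    if chi_square <= critical:
        return "uncorrelated"
    # Significant association: direction follows the purchase-rate comparison.
    return "preferred" if group_rate > overall_rate else "non-preferred"


# (chi-square value, group purchase rate) per age group for content A.
age_rows = {
    "20s": (7.111, 2 / 100),
    "30s": (3.629, 8 / 150),
    "40s": (0.0, 10 / 100),
    "50s": (36.805, 25 / 80),
    "60s": (0.634, 5 / 70),
}
overall_rate = 50 / 500

# The per-content model keeps one label per attribute value.
model = {age: classify(chi, rate, overall_rate) for age, (chi, rate) in age_rows.items()}
print(model)
```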

Next, the recommendation method based on demographic attribute information according to the present invention is described.

In general, demographic recommendation is a method of predicting the preference for specific content using demographic information such as the customer's age, gender, address, and occupation.

In demographic recommendation, the predicted preference is not given by a fixed formula as it is in collaborative recommendation. The present invention formulates the recommendation process that uses customers' demographic attribute information and applies it to recommendation.

The demographic attribute information based recommendation of the present invention is a recommendation technique that predicts the preference for content from the customer's demographic attribute information, that is, the customer's profile, and it uses the per-content attribute information as its underlying model. To predict a specific customer's preference for target content, the customer's demographic attribute information is taken as input and a similar customer group is selected for that customer and target content. The similar customer group means the customers who share those attribute values of the specific customer that affect the purchase of the target content. The purchase rate of the content within the similar customer group is calculated, and that purchase rate is taken as the customer's expected preference for the content.

[Table 7] shows the purchase attributes extracted by performing the chi-square test on the content 'A golf wear' in department store data.

[Table 7]

| Selected demographic attribute value | Customers with the attribute value | Buyers among those customers | Chi-square | Purchase rate of the group | Meaning |
|---|---|---|---|---|---|
| Age (20s) | 1787 | 17 | 22.02 | 0.0095 | Non-preferred |
| Age (50s) | 1420 | 91 | 69.48 | 0.0640 | Preferred |
| Age (60s) | 492 | 41 | 56.34 | 0.0833 | Preferred |
| Customer type (1) | 2260 | 119 | 51.96 | 0.528 | Preferred |
| Occupation (professional) | 135 | 16 | 41.22 | 0.1185 | Preferred |
| Registered store (main store) | 1287 | 95 | 39.81 | 0.0738 | Preferred |
| Marriage (unmarried) | 3229 | 42 | 26.01 | 0.0130 | Non-preferred |
| Housing (self-owned) | 3207 | 134 | 23.41 | 0.0417 | Preferred |
| Address (Gangnam-gu) | 706 | 39 | 19.77 | 0.0552 | Preferred |

[Table 7] shows that 'A golf wear' is preferred by customers in their 50s and 60s and by customers whose occupation is professional. Being unmarried and being in one's 20s are associated with the purchase of the content, but a comparison of purchase rates shows that these attribute values tend to be non-preferring. Other attributes, such as gender, membership type, and housing type, are found to have little association with the purchase of this content.

Meanwhile, suppose the attribute set of customer C is as given below and the content 'A golf wear' is to be recommended to this customer. Extracting, from this customer's attributes, the feature attributes that affect the purchase of the content yields the set {50s, professional}. That is, {50s, professional} is the intersection of the attribute set of customer C and the feature attribute set of content A.

Attribute set of customer C = {male, 50s, married, professional, Busan store}

Feature attribute set of content A = {age (20s), age (50s), age (60s), customer type (1), occupation (professional), registered store (main store), marriage (unmarried), housing (self-owned), address (Gangnam-gu)}

Next, the customers who have the attribute values {50s, professional} are extracted from all customers. These customers are called the similar customer group, and the similar customer group can reflect the specific customer's preference for the specific content.
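The extraction of feature attributes for customer C amounts to a set intersection. The sketch below encodes each attribute value as an (attribute, value) pair; this encoding and the attribute key names are illustrative choices, while the values themselves are those listed for customer C and content A.

```python
# Attribute set of customer C, one (attribute, value) pair per attribute.
customer_c = {
    ("gender", "male"),
    ("age", "50s"),
    ("marriage", "married"),
    ("occupation", "professional"),
    ("store", "Busan store"),
}

# Feature attribute set of content A, extracted by the chi-square analysis.
content_a_features = {
    ("age", "20s"),
    ("age", "50s"),
    ("age", "60s"),
    ("customer_type", "1"),
    ("occupation", "professional"),
    ("store", "main store"),
    ("marriage", "unmarried"),
    ("housing", "self-owned"),
    ("address", "Gangnam-gu"),
}

# Feature attributes of customer C that affect the purchase of content A.
feature_attributes = customer_c & content_a_features
print(sorted(feature_attributes))
```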

[Equation 8] selects the similar customer group when a specific customer and specific content are given.

[Equation 8]

$$RUG(u,i) = \{\, x \mid (V_{user}(u) \cap V_{content}(i)) \subseteq V_{user}(x) \,\}$$

Here, RUG(u,i) is the similar customer group for the specific customer and the specific content, $V_{user}(u)$ is the set of demographic attributes of the specific customer u, $V_{content}(i)$ is the attribute set of the specific content i extracted by the chi-square analysis, x is any other customer, and $V_{user}(x)$ is the set of demographic attributes of customer x.

That is, the similar customer group RUG(u,i) is the set of customers who satisfy every attribute in the intersection of the specific customer's demographic attributes and the demographic attributes extracted for the specific content.

In the demographic attribute information based recommendation of the present invention, when analyzing a specific customer's expected preference for specific content, a similar customer group that can reflect the customer's preference is extracted, and the purchase rate of that group is used as the customer's expected preference for the content.

[Equation 9] gives the expected preference of the recommendation method based on demographic attributes. Here, BuyingRate(RUG(a,i)) is the purchase rate of content i within the similar customer group RUG(a,i) of customer a.
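The selection of the similar customer group and the resulting expected preference can be sketched together. The data layout (dictionaries keyed by customer id), the function names, and the choice to return 0.0 for an empty group are assumptions made for illustration; the target customer is excluded from the group because the description defines x as any other customer.

```python
def similar_customer_group(target, feature_set, customers):
    """Customers who share every feature attribute the target has for the item.

    customers: dict mapping customer id -> set of (attribute, value) pairs
    feature_set: attribute values extracted for the item by chi-square analysis
    """
    required = customers[target] & feature_set
    # Note: if `required` is empty, every other customer qualifies.
    return {x for x, attrs in customers.items() if x != target and required <= attrs}


def expected_preference(target, item, feature_set, customers, purchases):
    """Purchase rate of the item within the target's similar customer group."""
    group = similar_customer_group(target, feature_set, customers)
    if not group:
        return 0.0
    buyers = sum(1 for x in group if item in purchases.get(x, set()))
    return buyers / len(group)


# Hypothetical toy data.
customers = {
    "c": {("age", "50s"), ("occupation", "professional"), ("gender", "male")},
    "x1": {("age", "50s"), ("occupation", "professional")},
    "x2": {("age", "50s"), ("occupation", "professional"), ("gender", "female")},
    "x3": {("age", "20s"), ("occupation", "professional")},
    "x4": {("age", "50s"), ("occupation", "clerk")},
}
features = {("age", "20s"), ("age", "50s"), ("occupation", "professional")}
purchases = {"x1": {"golf_wear"}, "x3": {"golf_wear"}}

print(similar_customer_group("c", features, customers))
print(expected_preference("c", "golf_wear", features, customers, purchases))
```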

다음으로 본 발명에 따른 구매확률 기반 협력적 추천 방법을 살펴보면 다음과 같다. 한편, 구매확률 기반 협력적 추천 방법은 콘텐츠의 구매확률을 기반으로 유사도를 구하는 부분과 계산된 유사도를 사용하여 특정 콘텐츠를 추천하는 부분으로 나눌 수 있다.Next, look at the probability of purchase based collaborative recommendation method according to the present invention. On the other hand, the collaborative recommendation method based on purchase probability may be divided into a part of obtaining a similarity based on the purchase probability of the content and a part of recommending specific content using the calculated similarity.

First, the purchase probability-based similarity measure is described.

Actual purchase data clearly has a direction. However, the similarity measures used in conventional collaborative filtering, such as the cosine measure of [Equation 1] or the Pearson correlation coefficient of [Equation 2], satisfy S(i,j)=S(j,i) and therefore cannot express the direction of purchases.

Accordingly, the present invention obtains the directional information of purchases through the calculation of [Equation 10].

Here, purchase(i) is the number of customers who purchased content i, and purchase(i∩j) is the number of customers who purchased both content i and content j.

When similarity is measured through [Equation 10] with each of contents i and j as the reference, the result can satisfy S(i,j) ≠ S(j,i). This means that a directional similarity between contents i and j can be measured, which enables more specific recommendations than conventional collaborative filtering.
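
[Equation 10] itself is not reproduced in this text. The sketch below assumes it is the ratio purchase(i∩j) / purchase(i), which matches the surrounding definitions and the description of confidence(i,j) as the probability of purchasing content j given a purchase of content i. The function name, the `buyers` mapping, and the zero fallback for unpurchased content are hypothetical.

```python
from typing import Dict, Set


def confidence(i: str, j: str, buyers: Dict[str, Set[str]]) -> float:
    """Assumed form of Equation 10: purchase(i ∩ j) / purchase(i)."""
    n_i = len(buyers[i])
    if n_i == 0:
        return 0.0  # assumption: undefined ratio is treated as 0
    return len(buyers[i] & buyers[j]) / n_i


# Hypothetical purchase data: content id -> set of customers who bought it.
buyers = {
    "i": {"c1", "c2", "c3", "c4"},
    "j": {"c3", "c4"},
}

s_ij = confidence("i", "j", buyers)  # 2 of the 4 buyers of i also bought j
s_ji = confidence("j", "i", buyers)  # both buyers of j also bought i
print(s_ij, s_ji)
```

Here S(i,j) = 0.5 while S(j,i) = 1.0, so the measure is asymmetric, unlike cosine or Pearson similarity.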

[Equation 11] below is the expected-preference calculation used to recommend a specific content to a specific customer, using the similarity values obtained through [Equation 10].

Here, P_cf(a,j) is the likelihood that customer a will purchase content j; confidence(i,j) is the similarity value obtained through [Equation 10], that is, the probability of purchasing content j given that content i was purchased; and N is the number of contents purchased by customer a.

That is, confidence(i,j) is used in the expected-preference calculation for every content i purchased by customer a, and [Equation 11] averages these confidence(i,j) values while giving greater weight to larger values.
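
[Equation 11] is not reproduced in this text, so its exact form is unknown. As one possible reading of "averaging while giving greater weight to larger values", the sketch below uses a self-weighted mean, in which each confidence value is weighted by itself. This specific weighting is an assumption for illustration only and may differ from the patent's actual formula; the function name is hypothetical.

```python
from typing import Iterable


def self_weighted_mean(confidences: Iterable[float]) -> float:
    """Average in which each value is weighted by itself: sum(c^2) / sum(c).

    Assumed stand-in for Equation 11, not the patent's confirmed formula.
    """
    values = list(confidences)
    total = sum(values)
    if total == 0:
        return 0.0  # no purchases, or all confidences are zero
    return sum(c * c for c in values) / total


# Hypothetical confidence(i, j) values for the N = 3 contents i bought by customer a.
conf = [0.5, 0.25, 0.25]

plain = sum(conf) / len(conf)
weighted = self_weighted_mean(conf)
print(plain, weighted)
```

The self-weighted mean is never smaller than the plain mean for non-negative inputs, so a single strongly associated purchase pulls the estimate up more than it would under simple averaging.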

Next, the recommendation method that combines the demographic attribute information-based recommendation with the purchase probability-based collaborative recommendation is described.

In the present invention, the demographic attribute information-based recommendation and the purchase probability-based collaborative recommendation are combined according to the combining formula of [Equation 12].

The purchase probability-based collaborative recommendation takes a value between 0 and 1, and the demographic attribute information-based recommendation takes a value between -1 and 1. The formula corrects the predicted preference value obtained by the purchase probability-based collaborative recommendation in light of the customer's demographic attributes. When the value of the demographic attribute information-based recommendation is negative, the corrected predicted preference value may also become negative; in that case it is set to 0 to obtain the final predicted preference value.

In [Equation 12], P(a,i) is the predicted preference value of customer a for content i finally obtained by combining the purchase probability-based collaborative recommendation with the demographic attribute information-based recommendation. P_cf(a,i) is the predicted preference value of customer a for content i from the purchase probability-based collaborative recommendation, and P_profile(a,i) is the predicted preference value of customer a for content i from the demographic attribute information-based recommendation. In [Equation 12], the values of P_cf(a,i) and P_profile(a,i) are combined with equal weight.
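
[Equation 12] is not reproduced in this text. The sketch below assumes a simple equal-weight sum, P = P_cf + P_profile, followed by the clipping to 0 that the description states for negative results. The additive form is an assumption chosen because it is consistent with the stated value ranges and with the result possibly becoming negative; the patent's actual formula may differ. The function name `combine` is hypothetical.

```python
def combine(p_cf: float, p_profile: float) -> float:
    """Assumed form of Equation 12: equal-weight sum, with negative results set to 0."""
    if not 0.0 <= p_cf <= 1.0:
        raise ValueError("p_cf is expected in [0, 1]")
    if not -1.0 <= p_profile <= 1.0:
        raise ValueError("p_profile is expected in [-1, 1]")
    corrected = p_cf + p_profile  # demographic value corrects the collaborative value
    return max(0.0, corrected)    # negative predictions are replaced by 0


boosted = combine(0.5, 0.25)     # demographic evidence raises the prediction
suppressed = combine(0.25, -0.5)  # negative correction dominates, so the result is clipped
print(boosted, suppressed)
```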

When the demographic attribute information-based recommendation and the collaborative recommendation are combined as described above, recommendation errors that would arise from applying a single method are reduced by combining the results of the two methods, which lowers the error rate and raises accuracy. Compared with using only one method, the combination has the following advantages.

First, it prevents the loss of recommendation efficiency caused by data sparsity. When collaborative recommendation is applied to sparse data, recommendation accuracy falls because too little information is available. With the combined method, the result is corrected by the demographic attribute-based recommendation, which addresses the sparsity problem.

Second, recommendations can follow the characteristics of the content. Some contents show purchase patterns tied to specific demographic attributes such as age, gender, or occupation, while others are purchased because of a customer's preference for a particular brand, hobbies, and the like. The combined method adapts to these different patterns and can deliver good recommendation efficiency.

FIG. 2 is a flowchart showing an embodiment of the method for analyzing content purchase patterns according to the present invention. Referring to FIG. 2, a reference value is first selected for the specific content to be analyzed. The reference value is determined with 1 degree of freedom and a specific value of the significance level α (S210).

Next, the customer attributes to be analyzed for the specific content are categorized (S230). Attributes are extracted from the customer demographic attribute DB, and the preprocessing unit sets each extracted attribute to a category value representing its category. Categorization is performed when the target attribute is numeric (S220).
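
The categorization of a numeric attribute (S220, S230) can be sketched as follows. The attribute (age), the bin boundaries, and the label format are hypothetical choices for illustration; the patent does not specify how categories are defined.

```python
def categorize_age(age: int) -> str:
    """Map a numeric age to a category value (hypothetical decade bins)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 20:
        return "age=under20"
    if age >= 60:
        return "age=60plus"
    return f"age={age // 10 * 10}s"  # e.g. 27 -> "age=20s"


# Non-numeric attributes such as gender are assumed to be used as-is.
labels = [categorize_age(a) for a in (19, 27, 45, 60)]
print(labels)
```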

Then, one attribute to be subjected to chi-square analysis is extracted from the customer demographic attributes (S240), and a category value of that attribute is extracted (S250).

In the next step, chi-square analysis is performed on the extracted category value as in FIG. 1 (S260). The result value is compared with the reference value (S270). If the result value is smaller than the reference value, it is discarded; if it is larger, it is stored and managed (S280).

This result value represents the preference of the category value for the specific content.
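
The flow of steps S210 to S280 can be sketched in Python as follows. The sketch assumes a 2x2 contingency table (customer has the category value or not, purchased the content or not), which is consistent with 1 degree of freedom, and assumes a significance level of α = 0.05, for which the chi-square critical value at 1 degree of freedom is 3.841. The function name and the counts are hypothetical.

```python
# Reference value (S210): chi-square critical value for df = 1, alpha = 0.05 (assumed alpha).
CRITICAL_VALUE = 3.841


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for a 2x2 table, without continuity correction.

    a: has category value, purchased      b: has category value, did not purchase
    c: lacks category value, purchased    d: lacks category value, did not purchase
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical counts for one category value (e.g. "age=20s") and one content.
stat = chi_square_2x2(30, 20, 10, 40)   # S260
keep = stat > CRITICAL_VALUE            # S270: compare with the reference value
print(round(stat, 3), keep)             # S280: store the value only if keep is True
```

With these counts the statistic is about 16.667, which exceeds the reference value, so the category value would be retained as a feature attribute for the content.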

FIG. 3 is a flowchart showing an embodiment of the demographic attribute information-based recommendation method according to the present invention. Referring to FIG. 3, the preference of each category value for the specific content is first extracted using the chi-square analysis method of FIG. 2.

Next, the feature attribute set that affects the purchase of the specific content is extracted from the attributes of the specific customer (S310). The feature attribute set consists of those category values among the specific customer's demographic attributes that correspond to chi-square analysis result values for the specific content.

Then, the similar customer group, that is, the set of customers who have the attributes in the feature attribute set, is extracted (S320), and the purchase rate of the similar customer group for the specific content is calculated (S330).

The calculated purchase rate of the similar customer group is taken as the expected preference value of the specific customer for the specific content (S340).
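
Steps S330 and S340 can be sketched as below, assuming the purchase rate is the fraction of similar-group members who purchased the content. The function name, the input sets, and the zero fallback for an empty group are hypothetical; the sketch covers only the purchase-rate step and does not claim to reproduce [Equation 9].

```python
from typing import Set


def buying_rate(group: Set[str], content_buyers: Set[str]) -> float:
    """Fraction of the similar customer group that purchased the content (S330)."""
    if not group:
        return 0.0  # assumption: an empty group yields no preference signal
    return len(group & content_buyers) / len(group)


# Hypothetical similar customer group and buyers of the specific content.
similar_group = {"x1", "x2", "x3", "x4"}
content_buyers = {"x1", "x3", "x9"}

# S340: the group's purchase rate is taken as the customer's expected preference.
expected_preference = buying_rate(similar_group, content_buyers)
print(expected_preference)
```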

FIG. 4 is a flowchart showing an embodiment of the purchase probability-based collaborative recommendation method according to the present invention. Referring to FIG. 4, the number of customers who purchased content i is first extracted (S410).

Next, the number of customers who purchased both content i and content j is extracted (S420).

Then, the similarity between the two contents is extracted (S430). Because one of the two contents is taken as the reference, the extracted similarity can carry the directional information of purchases.

The extracted similarity is applied to [Equation 11] to obtain the expected preference of the specific customer for the specific content (S440).

Although the present invention has been shown and described above with reference to preferred embodiments, it is not limited to those embodiments. Those of ordinary skill in the art to which the invention pertains may make various modifications and variations without departing from the spirit of the invention, within the technical idea of the invention and the range of equivalents of the claims set forth below.

Accordingly, by using the demographic attribute information-based recommendation together with the purchase probability-based collaborative recommendation, the personalized content recommendation method of the present invention lowers the error rate of preference estimation and raises its accuracy.

In addition, the present invention prevents the loss of recommendation efficiency caused by sparsity of the underlying data used for analysis.

In addition, the present invention has the advantage of enabling recommendations based on the customer demographic attributes relevant to each content.

In addition, when analyzing a customer's expected preference for content based on purchase probability, the present invention obtains a similarity that includes the directional information of purchases.

Claims

delete

A first step of extracting at least one feature attribute affecting the purchase of the first content among the attributes of the first customer;

A second step of extracting a similar customer group which is a set of customers belonging to the at least one feature attribute;

A third step of calculating a purchase rate of the similar customer group for the first content;

A fourth step of securing the resultant value of the third step as a first expected preference value of the first customer for the first content;

A fifth step of extracting the number of customers who have purchased the second content when the first customer has purchased the second content;

A sixth step of extracting the number of customers who purchased both the first content and the second content;

A seventh step of calculating a probability of purchasing the first content among customers who have purchased the second content;

An eighth step of performing the fifth to seventh steps on all the contents purchased by the first customer; And

A ninth step of securing a second expected preference value of the first customer for the first content by using the result value of the eighth step

A personalized content recommendation method comprising the above steps.

delete

The method of claim 2, wherein the ninth step is performed by the following equation.

Equation:

Here, A is the first content, B is the second content, purchase(B) is the number of customers who purchased the second content, purchase(B∩A) is the number of customers who purchased both the second content and the first content, and N is the number of contents purchased by the first customer.

The method of claim 2, further comprising:

A tenth step of securing a third expected preference value of the first customer for the first content by combining the first expected preference value and the second expected preference value,

wherein the combining is performed by the following equation.

Equation:

Here, a is the first customer, A is the first content, P(a,A) is the third expected preference value, P_cf(a,A) is the second expected preference value, and P_profile(a,A) is the first expected preference value.

In measuring the expected preferences of the first customer for the first content,

A first step of extracting the number of customers who have purchased the second content when the first customer has purchased the second content;

A third step of calculating a probability of purchasing the first content among customers who have purchased the second content;

A fourth step of performing the first to third steps on all content purchased by the first customer; And

A fifth step of securing an expected preference value of the first customer for the first content by using the result value of the fourth step

A personalized content recommendation method comprising the above steps.

The method of claim 6, wherein the fifth step is performed by the following equation.

Equation:

Here, A is the first content, B is the second content, purchase(B) is the number of customers who purchased the second content, purchase(B∩A) is the number of customers who purchased both the second content and the first content, and N is the number of contents purchased by the first customer.