CN110458236A - Advertising copy style recognition method and system - Google Patents

Advertising copy style recognition method and system

Info

Publication number
CN110458236A
Authority
CN
China
Prior art keywords
vector
sample
advertising copy
style
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910748697.3A
Other languages
Chinese (zh)
Inventor
翁永金
李百川
陈第
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Umi-Tech Co Ltd
Original Assignee
Umi-Tech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Umi-Tech Co Ltd
Priority to CN201910748697.3A
Publication of CN110458236A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides an advertising copy style recognition method and system. The method obtains a target advertising copy whose style is to be classified, determines the feature vector of the target advertising copy, and, based on the feature vector, identifies the advertising copy style corresponding to the feature vector from among preset advertising copy styles. The invention thus enables recognition of advertising copy style, so that product operators can grasp the current copy style trends of various companies at a macro level and can also design advertising copy for their own products that matches current trends, which plays a key role in product promotion.

Description

Advertising copy style recognition method and system
Technical field
The present invention relates to the field of data processing, and more specifically to an advertising copy style recognition method and system.
Background
With the rapid development and spread of the mobile Internet, the traditional marketing industry has become increasingly mobile, marketing channels have gradually moved from offline to online, and the volume of advertising copy, an important means of promotion, has grown explosively.
If style recognition can be performed on advertising copy, product operators can grasp the current copy style trends of various companies at a macro level and can also design advertising copy for their own products that matches current trends, which plays a key role in product promotion.
Summary of the invention
In view of this, the present invention provides an advertising copy style recognition method and system to solve the problem that advertising copy needs to be classified by style.
To solve the above technical problem, the present invention adopts the following technical solutions:
An advertising copy style recognition method, comprising:
obtaining a target advertising copy whose style is to be classified;
determining the feature vector of the target advertising copy;
based on the feature vector, identifying the advertising copy style corresponding to the feature vector from among preset advertising copy styles, wherein the preset advertising copy styles are obtained by training on the feature vector samples corresponding to advertising copy samples.
Preferably, the feature vector of the target advertising copy includes at least one of a structural feature vector and a semantic feature vector; the structural feature vector includes at least one of a character feature vector, a lexical feature vector, and a sentence feature vector; the semantic feature vector is a word vector characterizing the advertising copy.
Preferably, the generation process of the preset advertising copy styles includes:
obtaining the feature vector samples corresponding to the advertising copy samples;
selecting cluster center vectors from the feature vector samples;
based on the cluster center vectors, performing style classification on the advertising copy samples to obtain the preset advertising copy styles.
Preferably, selecting cluster center vectors from the feature vector samples comprises:
randomly selecting one feature vector sample from the feature vector samples as an initial cluster center vector;
calculating the distance between each feature vector sample other than the initial cluster center vector and the initial cluster center vector, and, if the distance is greater than a preset distance threshold, taking that feature vector sample as another initial cluster center vector;
calculating the distance between each feature vector sample other than the initial cluster center vectors and the nearest of the initial cluster center vectors, and, if the distance is greater than the preset distance threshold, taking that feature vector sample as another initial cluster center vector, stopping when the number of initial cluster center vectors reaches a preset threshold;
taking each initial cluster center vector as a cluster center vector.
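The threshold-based selection just described can be sketched as follows. This is only an illustrative reading of the steps: the function name `select_centers`, the Euclidean distance, the fixed random seed, and the single pass over the samples are assumptions, not details stated in the text.

```python
import math
import random
from typing import List, Sequence


def select_centers(samples: Sequence[Sequence[float]],
                   dist_threshold: float,
                   k: int,
                   seed: int = 0) -> List[Sequence[float]]:
    """Pick up to k initial centers that lie farther than dist_threshold apart."""
    rng = random.Random(seed)                 # assumption: seeded for repeatability
    first = rng.randrange(len(samples))       # random first initial center
    centers = [samples[first]]
    for idx, x in enumerate(samples):
        if len(centers) >= k:                 # stop once the preset count is reached
            break
        if idx == first:
            continue
        # Distance to the nearest center chosen so far (Euclidean is an assumption).
        nearest = min(math.dist(x, c) for c in centers)
        if nearest > dist_threshold:
            centers.append(x)
    return centers


demo_samples = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
demo_centers = select_centers(demo_samples, dist_threshold=1.0, k=3)
```

With these five toy points, whichever sample is drawn first, the pass ends with three centers that are pairwise more than the threshold apart.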
Preferably, performing style classification on the advertising copy samples based on the cluster center vectors to obtain the preset advertising copy styles comprises:
classifying all the feature vector samples according to the distance between each feature vector sample other than the cluster center vectors and each cluster center vector, and obtaining the number of vector classes, wherein the distance difference between any two feature vector samples in each class is within a preset data range;
recalculating a new cluster center vector for each class of feature vector samples;
if the distance between the new cluster center vector of each class of feature vector samples and the former cluster center vector is within a preset distance threshold range, or the preset number of iterations is reached, determining that the number of advertising copy categories of the feature vector samples is the same as the number of vector classes;
determining the advertising copy style of each class of feature vector samples, and taking it as a preset advertising copy style.
Preferably, determining the advertising copy style of each class of feature vector samples comprises:
for each class of feature vector samples, computing word frequency statistics over the advertising copy samples corresponding to the feature vector samples, and selecting a specified number of top-ranked words;
obtaining the advertising copy style determined from the specified number of top-ranked words.
Preferably, identifying, based on the feature vector, the advertising copy style corresponding to the feature vector from among the preset advertising copy styles comprises:
obtaining an advertisement classification model that covers the preset advertising copy styles;
based on the feature vector and the advertisement classification model, determining the preset advertising copy style with the maximum probability as the advertising copy style corresponding to the feature vector.
An advertising copy style recognition system, comprising:
a copy acquisition module, for obtaining a target advertising copy whose style is to be classified;
a vector determination module, for determining the feature vector of the target advertising copy;
a style recognition module, for identifying, based on the feature vector, the advertising copy style corresponding to the feature vector from among preset advertising copy styles, wherein the preset advertising copy styles are obtained by training on the feature vector samples corresponding to advertising copy samples.
Preferably, the system further includes:
a sample acquisition module, for obtaining the feature vector samples corresponding to the advertising copy samples;
a vector selection module, for selecting cluster center vectors from the feature vector samples;
a style classification module, for performing style classification on the advertising copy samples based on the cluster center vectors to obtain the preset advertising copy styles.
Preferably, the vector selection module includes:
a screening submodule, for randomly selecting one feature vector sample from the feature vector samples as an initial cluster center vector;
a calculation submodule, for calculating the distance between each feature vector sample other than the initial cluster center vector and the initial cluster center vector and, if the distance is greater than a preset distance threshold, taking that feature vector sample as another initial cluster center vector; and for calculating the distance between each feature vector sample other than the initial cluster center vectors and the nearest of the initial cluster center vectors and, if the distance is greater than the preset distance threshold, taking that feature vector sample as another initial cluster center vector, stopping when the number of initial cluster center vectors reaches a preset threshold;
a determination submodule, for taking each initial cluster center vector as a cluster center vector.
Compared with the prior art, the present invention has the following beneficial effects:
The present invention provides an advertising copy style recognition method and system that obtain a target advertising copy whose style is to be classified, determine the feature vector of the target advertising copy, and, based on the feature vector, identify the advertising copy style corresponding to the feature vector from among preset advertising copy styles. The invention thus enables recognition of advertising copy style, so that product operators can grasp the current copy style trends of various companies at a macro level and can also design advertising copy for their own products that matches current trends, which plays a key role in product promotion.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of an advertising copy style recognition method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of another advertising copy style recognition method provided by an embodiment of the present invention;
Fig. 3 is a flowchart of another advertising copy style recognition method provided by an embodiment of the present invention;
Fig. 4 is a flowchart of another advertising copy style recognition method provided by an embodiment of the present invention;
Fig. 5 is a structural schematic diagram of an advertising copy style recognition system provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention provides an advertising copy style recognition method for recognizing advertising copy style. Referring to Fig. 1, the method may include:
S11: Obtain a target advertising copy whose style is to be classified.
For example, the target advertising copy may be "A must-read for plus-size girls", "The things you want to buy have dropped in price in the Tmall big promotion! Grab them now!", and so on.
The target advertising copy may be stored on the server that performs advertising copy style recognition, or it may be stored on another server and obtained through communication.
S12: Determine the feature vector of the target advertising copy.
In practical applications, the feature vector of the target advertising copy includes at least one of a structural feature vector and a semantic feature vector; the structural feature vector includes at least one of a character feature vector, a lexical feature vector, and a sentence feature vector.
The character feature vector is obtained by splitting the target advertising copy into characters and counting the copy length, the proportion of digit characters, the proportion of English letters, and the proportion of other characters, recorded as the character feature vector Vec_char = (v1, v2, v3, v4) with dimension D_char = 4. Here, other characters are all characters other than digit characters and English letters.
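A minimal sketch of the character feature vector, assuming that "digit characters" and "English letters" mean ASCII digits and ASCII letters and that an empty copy maps to the zero vector. The function name `char_feature_vector` and the sample string are illustrative only.

```python
from typing import List


def char_feature_vector(copy: str) -> List[float]:
    """Return [length, digit ratio, English-letter ratio, other-character ratio]."""
    length = len(copy)
    if length == 0:
        # Assumption: an empty copy maps to the zero vector.
        return [0.0, 0.0, 0.0, 0.0]
    digits = sum(1 for ch in copy if ch.isascii() and ch.isdigit())
    letters = sum(1 for ch in copy if ch.isascii() and ch.isalpha())
    others = length - digits - letters        # everything that is neither digit nor letter
    return [float(length), digits / length, letters / length, others / length]


# Illustrative copy: 9 characters, 2 digits, 3 English letters, 4 others.
vec_char = char_feature_vector("买2件8折 VIP")
```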
The lexical feature vector is obtained by segmenting the target advertising copy into words and counting the number of words in the copy, the proportion of unit words (such as "square", "yuan"), the proportion of interjections (such as "Ow"), the proportion of function words, and the proportion of personal pronouns (such as "you", "he"), recorded as the lexical feature vector Vec_word = (v1, v2, v3, v4, v5) with dimension D_word = 5.
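The lexical statistics can be sketched in the same way, assuming the copy has already been segmented into tokens by some word segmenter. The four word lists below are small hypothetical stand-ins (the function-word list in particular is invented for illustration), and `word_feature_vector` is an illustrative name.

```python
from typing import List, Set

# Hypothetical word lists; a real system would use much larger lexicons.
UNIT_WORDS: Set[str] = {"square", "yuan"}
INTERJECTIONS: Set[str] = {"ow"}
FUNCTION_WORDS: Set[str] = {"the", "of"}      # assumption: not given in the text
PRONOUNS: Set[str] = {"you", "he"}


def word_feature_vector(tokens: List[str]) -> List[float]:
    """Return [word count, unit, interjection, function-word, pronoun ratios]."""
    n = len(tokens)
    if n == 0:
        return [0.0] * 5                      # assumption: empty copy maps to zeros

    def ratio(lexicon: Set[str]) -> float:
        return sum(1 for t in tokens if t.lower() in lexicon) / n

    return [float(n), ratio(UNIT_WORDS), ratio(INTERJECTIONS),
            ratio(FUNCTION_WORDS), ratio(PRONOUNS)]


vec_word = word_feature_vector(["you", "pay", "99", "yuan", "Ow"])
```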
The sentence feature vector is obtained by performing part-of-speech tagging and dependency parsing on the target advertising copy and counting the part-of-speech richness (number of part-of-speech categories / number of words), the syntactic structure richness (number of distinct syntactic types / number of syntactic types), and the frequency vector of syntactic structures (including verbal endocentric phrases, subject-predicate relations, verb-object relations, and punctuation), recorded as the sentence feature vector Vec_sentence = (v1, v2, v3(1), v3(2), v3(3), v3(4), v3(5)) with dimension D_sentence = 7.
In addition, the semantic feature vector is a word vector characterizing the advertising copy, used to minimize the effect that the sparsity of short-text features has on clustering. First, a B-dimensional word vector model (B is generally 50 to 300) is trained on a corpus: Vec(w_i) = (v_1, v_2, ..., v_B), where w_i denotes the i-th word in the word vector model and v_B denotes the value of that word in the B-th dimension, within the range [0,1]. Each word in the corpus corresponds to one word vector. B is preferably 100.
The target advertising copy is segmented into words; the feature words that remain after removing modal particles and punctuation are input one by one into the word vector model, and the resulting word vectors are averaged to obtain the final semantic feature vector of the target advertising copy, $V^{(2)} = \frac{1}{n}\sum_{i=1}^{n} Vec(w_i)$, where n is the number of feature words in the target advertising copy.
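The averaging step can be illustrated with a toy lookup table standing in for the trained word vector model. The names `semantic_vector`, `toy_vectors`, and `toy_stop`, the 2-dimensional vectors, and the rule of skipping out-of-vocabulary words are all assumptions made for this sketch.

```python
from typing import Dict, List, Set


def semantic_vector(tokens: List[str],
                    word_vectors: Dict[str, List[float]],
                    stop_tokens: Set[str]) -> List[float]:
    """Average the word vectors of the feature words in one copy."""
    kept = [word_vectors[t] for t in tokens
            if t not in stop_tokens and t in word_vectors]  # assumption: skip unknown words
    if not kept:
        return []                              # assumption: no feature words, no vector
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Toy 2-dimensional "model"; a real model would have B = 50 to 300 dimensions.
toy_vectors = {"sale": [0.25, 0.5], "now": [0.75, 0.0]}
toy_stop = {"!"}                               # stands in for particles and punctuation
vec_semantic = semantic_vector(["sale", "!", "now"], toy_vectors, toy_stop)
```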
Note that after the feature vector is obtained, it may be normalized to reduce the amount of data processing.
In addition, in this embodiment, the feature vector of the target advertising copy preferably includes both the structural feature vector and the semantic feature vector, and the structural feature vector includes all of the character feature vector, the lexical feature vector, and the sentence feature vector.
S13: Based on the feature vector, identify the advertising copy style corresponding to the feature vector from among the preset advertising copy styles.
The preset advertising copy styles are obtained by training on the feature vector samples corresponding to advertising copy samples.
Specifically, in practical applications, step S13 may be implemented as follows:
Obtain an advertisement classification model that covers the preset advertising copy styles and, based on the feature vector and the advertisement classification model, determine the preset advertising copy style with the maximum probability as the advertising copy style corresponding to the feature vector.
Specifically, once the multiple preset advertising copy styles are known, an advertisement classification model can be trained, for example the word vector and text classification tool fastText, a logistic regression model, a Bayesian model, and so on. Inputting the feature vector of the target advertising copy into the advertisement classification model yields the copy style label probabilities P = (p1, p2, …, pk). Optionally, the label with the maximum probability is taken as the final label of the copy; if the maximum probability is less than a set threshold θ (θ may be any value in [0, 0.5], such as 0.2), the copy is instead added to the pending-style copy set XP.
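The labeling rule at the end of step S13 can be sketched independently of the classifier that produces the probabilities. The function name `pick_style`, the use of `None` to signal "send to the pending set XP", and the three style names are illustrative assumptions.

```python
from typing import List, Optional


def pick_style(probs: List[float],
               styles: List[str],
               theta: float = 0.2) -> Optional[str]:
    """Return the most probable style, or None if the copy should go to XP."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < theta:
        return None                            # below threshold: style still to be determined
    return styles[best]


demo_styles = ["must-read", "emphasizing discounts", "creating scarcity"]
confident = pick_style([0.1, 0.7, 0.2], demo_styles)
pending = pick_style([0.4, 0.3, 0.3], demo_styles, theta=0.5)
```

Here `confident` is the second style, while `pending` is `None` because the highest probability (0.4) falls below the chosen threshold of 0.5.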
Optionally, the clustering step is performed on the copy set XP periodically; if the absolute number of copies in some category exceeds a set value, it is manually confirmed whether the category is a new style, and if so it is added to the preset advertising copy style set S.
In this embodiment, a target advertising copy whose style is to be classified is obtained, the feature vector of the target advertising copy is determined, and, based on the feature vector, the advertising copy style corresponding to the feature vector is identified from among the preset advertising copy styles. The invention thus enables recognition of advertising copy style, so that product operators can grasp the current copy style trends of various companies at a macro level and can also design advertising copy for their own products that matches current trends, which plays a key role in product promotion.
The "preset advertising copy styles" were mentioned above; their generation process is now described. Specifically, referring to Fig. 2, the generation process of the preset advertising copy styles may include:
S21: Obtain the feature vector samples corresponding to the advertising copy samples.
The process of determining the feature vector sample of an advertising copy sample is similar to the process of obtaining the feature vector of the target advertising copy; refer to the corresponding discussion above, which is not repeated here.
In another implementation of the present invention, the advertising copy samples are denoted T = (T1, T2, ..., Tm). After the structural feature vectors corresponding to the advertising copy samples are obtained, feature normalization is performed. Specifically, each feature in the character feature vector, the lexical feature vector, and the sentence feature vector is normalized as follows, which guarantees that every value lies in [0,1]:
$v_i' = \frac{v_i - v_i(\min)}{v_i(\max) - v_i(\min)}$
where v_i denotes the i-th feature, v_i' the feature after normalization, v_i(min) the minimum value of the i-th feature, and v_i(max) the maximum value of the i-th feature.
The normalized vectors are concatenated into the following feature vector:
$V^{(1)} = (Vec\_char'; Vec\_word'; Vec\_sentence')$.
For each advertising copy sample, the feature vector sample is $V = V^{(1)}$, $V = V^{(2)}$, or $V = (V^{(1)}; V^{(2)})$. The feature vector samples of all advertising copy samples are collected to obtain XK = (X1, X2, ..., Xn), where each Xn can be any of these three cases, and m equals n.
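A small sketch of the normalization and concatenation, assuming min-max scaling is applied per feature across all samples and that a feature whose minimum equals its maximum is mapped to 0.0 (the text does not say how that case is handled). The names `min_max_normalize` and `concat` are illustrative.

```python
from typing import List


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale each feature (column) to [0, 1] across all samples (rows)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0   # assumption: constant feature -> 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def concat(*parts: List[float]) -> List[float]:
    """Join several feature vectors end to end, e.g. structural then semantic."""
    return [v for part in parts for v in part]


# Three samples with two structural features each (e.g. length and digit ratio).
structural = min_max_normalize([[10.0, 0.0], [20.0, 0.5], [30.0, 1.0]])
combined = concat(structural[1], [0.5, 0.25])       # structural part followed by semantic part
```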
S22: Select cluster center vectors from the feature vector samples.
Multiple cluster center vectors are selected, and the number to select can be preset, for example 10 cluster center vectors.
In practical applications, referring to Fig. 3, step S22 may include:
S31: Randomly select one feature vector sample from the feature vector samples as an initial cluster center vector.
Specifically, a vector (i.e. one feature vector sample) is randomly selected from XK as the first cluster center μ1.
S32: Calculate the distance between each feature vector sample other than the initial cluster center vector and the initial cluster center vector; if the distance is greater than a preset distance threshold, take that feature vector sample as another initial cluster center vector.
S33: Calculate the distance between each feature vector sample other than the initial cluster center vectors and the nearest of the initial cluster center vectors; if the distance is greater than the preset distance threshold, take that feature vector sample as another initial cluster center vector, stopping when the number of initial cluster center vectors reaches a preset threshold.
S34: Take each initial cluster center vector as a cluster center vector.
Specifically, for each feature vector sample X_i in XK, calculate its distance to the nearest of the initial cluster center vectors already selected: $D(X_i) = \min_{r} \sqrt{\sum_{j=1}^{n} (X_{ij} - \mu_{rj})^2}$, where r ranges over the initial cluster center vectors already selected, k is the chosen number of cluster categories (and also the preset threshold), and n is the feature vector sample dimension. Note that before the second cluster center vector is determined, the nearest selected initial cluster center vector is simply the first initial cluster center vector.
Points with larger D(x) are preferentially selected as the next initial cluster center vector. The above steps are repeated until k (the preset threshold) initial cluster center vectors are obtained: {μ1, μ2, ..., μk}.
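The preference for points with larger D(x) can be sketched with the usual K-means++ seeding rule, in which the next center is drawn with probability proportional to D(x) squared. That sampling rule, the Euclidean distance, the seed, and the name `kmeanspp_init` are assumptions; the text only says that larger D(x) is preferred.

```python
import math
import random
from typing import List, Sequence


def kmeanspp_init(samples: Sequence[Sequence[float]],
                  k: int,
                  seed: int = 0) -> List[List[float]]:
    """Choose k initial centers, favoring samples far from the centers chosen so far."""
    rng = random.Random(seed)
    centers = [list(samples[rng.randrange(len(samples))])]   # first center: uniform draw
    while len(centers) < k:
        # Squared distance from each sample to its nearest chosen center.
        d2 = [min(math.dist(x, c) for c in centers) ** 2 for x in samples]
        if sum(d2) == 0:
            break                              # all remaining samples coincide with centers
        nxt = rng.choices(range(len(samples)), weights=d2, k=1)[0]
        centers.append(list(samples[nxt]))
    return centers


toy_samples = [(0.0, 0.0), (0.0, 1.0), (9.0, 9.0), (9.0, 8.0), (-9.0, 9.0)]
init_centers = kmeanspp_init(toy_samples, k=3)
```

Because an already chosen sample has D(x) = 0, it carries zero weight and cannot be drawn again, so the returned centers are distinct samples.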
S23: Based on the cluster center vectors, perform style classification on the advertising copy samples to obtain the preset advertising copy styles.
In practical applications, referring to Fig. 4, step S23 may include:
S41: Classify all the feature vector samples according to the distance between each feature vector sample other than the cluster center vectors and each cluster center vector, and obtain the number of vector classes.
The distance difference between any two feature vector samples in each class is within a preset data range.
S42: Recalculate a new cluster center vector for each class of feature vector samples.
S43: If the distance between the new cluster center vector of each class of feature vector samples and the former cluster center vector is within a preset distance threshold range, or the preset number of iterations is reached, determine that the number of advertising copy categories of the feature vector samples is the same as the number of vector classes.
Specifically, let the maximum number of clustering iterations be N (N is preferably 300); then for e = 1, 2, ..., N, perform the following steps:
Denote the category set as C = {C1, C2, ..., Ck}. The category set only assumes k categories, with k preferably 10; the specific content of each category can only be obtained through subsequent analysis.
For the advertising copy samples i = 1, 2, ..., m, calculate the distance between the feature vector sample X_i of advertising copy sample T_i and each cluster center vector μ_r (r = 1, 2, ..., k): $d_{ir} = \sqrt{\sum_{j=1}^{n} (X_{ij} - \mu_{rj})^2}$, where k is the number of cluster categories and n is the feature vector sample dimension. The category with the smallest d_{ir} is recorded as the category λ_i of advertising copy sample T_i, and the category set is updated: $C_{\lambda_i} = C_{\lambda_i} \cup \{X_i\}$.
For r = 1, 2, ..., k, recalculate the new cluster center vector from all sample points in C_r: $\mu_r = \frac{1}{|C_r|}\sum_{X \in C_r} X$.
If none of the center vectors changes significantly, that is, the distance between the new cluster center vector of each class of feature vector samples and the former cluster center vector is within the preset distance threshold range, or the number of iterations is reached (for example the 300 iterations above), output the final category set C = {C1, C2, ..., Ck}. The number of preset advertising copy styles is then k.
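The assign-and-update loop of steps S41 to S43 can be sketched as a plain K-means iteration. The Euclidean distance, the tolerance value `tol`, the rule of keeping the old center for an empty cluster, and the name `kmeans` are assumptions of this sketch rather than details given in the text.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(samples: Sequence[Sequence[float]],
           centers: Sequence[Sequence[float]],
           max_iter: int = 300,
           tol: float = 1e-6) -> Tuple[List[int], List[List[float]]]:
    """Alternate assignment and center update until centers move at most tol."""
    current = [list(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # S41: assign every sample to its nearest center.
        labels = [min(range(len(current)), key=lambda r: math.dist(x, current[r]))
                  for x in samples]
        # S42: recompute each center as the mean of its members.
        updated = []
        for r in range(len(current)):
            members = [x for x, lab in zip(samples, labels) if lab == r]
            if members:
                dim = len(members[0])
                updated.append([sum(m[d] for m in members) / len(members)
                                for d in range(dim)])
            else:
                updated.append(current[r])     # assumption: keep an empty cluster's center
        # S43: stop when no center moved by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(current, updated))
        current = updated
        if shift <= tol:
            break
    return labels, current


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
point_labels, final_centers = kmeans(points, centers=[(0.0, 0.0), (10.0, 0.0)])
```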
Note that the content of steps S31-S34 and steps S41-S44 constitutes the style discovery method using K-means++ clustering.
S44: Determine the advertising copy style of each class of feature vector samples, and take it as a preset advertising copy style.
In practical applications, step S44 may include:
For each class of feature vector samples, compute word frequency statistics over the advertising copy samples corresponding to the feature vector samples, select a specified number of top-ranked words, and obtain the advertising copy style determined from those top-ranked words.
Specifically, for C = {C1, C2, ..., Ck}, word frequency statistics are computed over the advertising copy samples of each category, and the 5 most frequent words of each category are used as the name of that category.
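The naming step amounts to a word count per cluster. The sketch below assumes the copies in a cluster are already segmented into tokens; `name_cluster` and the toy tokens are illustrative.

```python
from collections import Counter
from typing import List


def name_cluster(tokenized_copies: List[List[str]], top_n: int = 5) -> List[str]:
    """Return the top_n most frequent words across all copies of one cluster."""
    counts = Counter(tok for copy in tokenized_copies for tok in copy)
    return [word for word, _ in counts.most_common(top_n)]


cluster_copies = [["sale", "now"], ["sale", "today", "now"], ["sale"]]
cluster_name = name_cluster(cluster_copies, top_n=2)
```

In practice, stop words and punctuation would probably need to be filtered out before counting, otherwise they would dominate the category names; that filtering is not shown here.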
The category names are then manually proofread, and the style labels are finally determined as S = (S1, S2, …, Sk), with each copy annotated with its corresponding style label. Style labels may be, for example: last appearance, must-play and must-read, emphasizing discounts, creating scarcity, and so on.
For the copy set XP introduced above, steps S31-S34 are performed on XP to obtain copy categories, and style analysis is then carried out manually to confirm in time whether a category is a new style; if so, it is added to the preset advertising copy style set S. In other words, this scheme can discover new advertising copy styles at any time by cluster analysis of the unknown-style copy set XP, continuously optimizing the style system. Compared with the prior art, in which the classifier uses only a fixed set of advertising copy styles, new advertising copy styles can be discovered in time.
In an actual experiment on 200,000 advertising copies, clustering was first performed to determine the category system and label the data, and copy style classification was then carried out. 200*m prediction results (m being the number of copy styles) were randomly sampled and manually checked, and the overall accuracy of the scheme was about 80%. The test shows that the method can discover advertising copy styles and classify copies reasonably well.
In this embodiment, the style discovery method based on K-means++ clustering can discover new advertising copy styles in time.
Optionally, on the basis of the embodiment of above-mentioned Advertising Copy style recognition methods, another embodiment of the present invention A kind of Advertising Copy style identifying system is provided, referring to Fig. 5, may include:
Official documents and correspondence obtains module 11, for obtaining the targeted advertisements official documents and correspondence of pending Advertising Copy genre classification;
Vector determining module 12, for determining the feature vector of the targeted advertisements official documents and correspondence;
Style identification module 13 identifies the spy from default Advertising Copy style for being based on described eigenvector Levy the corresponding Advertising Copy style of vector;The default Advertising Copy style is based on the corresponding feature vector sample of Advertising Copy sample This training obtains.
Further, the feature vector of the targeted advertisements official documents and correspondence includes: in structural eigenvector and semantic feature vector At least one;The structural eigenvector include character feature vector, in lexical feature vector sum sentence characteristics vector at least One;The semantic feature vector is the term vector for characterizing the Advertising Copy.
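One way to picture such a combined feature vector is sketched below. The concrete features (character count, digit count, token count, distinct-token count, sentence count) and the averaging of per-word vectors are hypothetical choices made only for illustration; this passage does not specify them. The copy is assumed to be already tokenized and split into sentences, and `word_vectors` is an assumed lookup table from token to vector.

```python
def structural_features(tokens, sentences):
    """Hypothetical character-, lexical- and sentence-level features."""
    text = "".join(tokens)
    char_feats = [len(text), sum(c.isdigit() for c in text)]
    lexical_feats = [len(tokens), len(set(tokens))]
    sentence_feats = [len(sentences)]
    return [float(x) for x in char_feats + lexical_feats + sentence_feats]


def semantic_features(tokens, word_vectors, dim):
    """Average the word vectors of known tokens (an assumed pooling choice)."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def feature_vector(tokens, sentences, word_vectors, dim):
    """Concatenate the structural and semantic parts into one vector."""
    return structural_features(tokens, sentences) + semantic_features(
        tokens, word_vectors, dim
    )
```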
In this embodiment, the target advertising copy to be classified by style is obtained, its feature vector is determined, and, based on that feature vector, the corresponding advertising copy style is identified from the preset advertising copy styles. The invention thus enables recognition of advertising copy styles, so that product operators can grasp, at a macro level, the current copy style trends of each company and can also design advertising copy for their own products that matches current trends, which plays a key role in product promotion.
It should be noted that, for the working process of each module in this embodiment, reference is made to the corresponding description in the above embodiments; details are not repeated here.
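The three modules can be pictured as a small pipeline. The sketch below is an assumption-laden illustration: the class name `StyleRecognizer`, the injected `vectorize` and `predict_proba` callables, and the dictionary of style probabilities are all hypothetical. Picking the style with the highest probability follows the wording of claim 7.

```python
class StyleRecognizer:
    """Illustrative wiring of the obtaining, vector and identification modules."""

    def __init__(self, vectorize, predict_proba):
        # vectorize: copy text -> feature vector (assumed callable)
        # predict_proba: feature vector -> {style name: probability} (assumed)
        self.vectorize = vectorize
        self.predict_proba = predict_proba

    def obtain(self, raw_copy):
        """Copy obtaining module: here it only trims surrounding whitespace."""
        return raw_copy.strip()

    def recognize(self, raw_copy):
        """Return the preset style with the highest predicted probability."""
        copy = self.obtain(raw_copy)
        vector = self.vectorize(copy)
        probabilities = self.predict_proba(vector)
        return max(probabilities, key=probabilities.get)
```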
Optionally, on the basis of the above embodiment of the advertising copy style recognition system, the system further includes:
Sample obtaining module, for obtaining the feature vector samples corresponding to advertising copy samples;
Vector selection module, for selecting cluster center vectors from the feature vector samples;
Style classification module, for performing style classification on the advertising copy samples based on the cluster center vectors to obtain the preset advertising copy styles.
Further, the vector selection module includes:
Screening submodule, for randomly selecting one feature vector sample from the feature vector samples as an initial cluster center vector;
Calculation submodule, for calculating the distance between each feature vector sample other than the initial cluster center vector and the initial cluster center vector and, if the distance is greater than a preset distance threshold, taking that feature vector sample as another initial cluster center vector; and for calculating the distance between each feature vector sample other than the initial cluster center vectors and the nearest of the initial cluster center vectors and, if the distance is greater than the preset distance threshold, taking that feature vector sample as another initial cluster center vector, stopping when the number of initial cluster center vectors reaches a preset threshold;
Determining submodule, for taking each initial cluster center vector as a cluster center vector.
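The center selection performed by these submodules can be sketched as below. The sketch follows the thresholded rule as worded in the text, which differs from textbook K-means++ seeding (where new centers are sampled with probability proportional to squared distance). The Euclidean metric, the function names, and the in-order scan of the samples are assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_initial_centers(samples, k, distance_threshold, seed=0):
    """Pick up to k centers that are farther than the threshold from each other."""
    rng = random.Random(seed)
    first = rng.randrange(len(samples))
    centers = [samples[first]]  # randomly chosen first center
    for i, sample in enumerate(samples):
        if len(centers) >= k:  # preset number of centers reached
            break
        if i == first:
            continue
        # Distance to the nearest center chosen so far.
        nearest = min(euclidean(sample, c) for c in centers)
        if nearest > distance_threshold:
            centers.append(sample)
    return centers
```

If fewer than `k` samples satisfy the threshold, the function simply returns the centers it found; how the original method handles that case is not stated here.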
Further, the style classification module includes:
Vector classification submodule, for classifying all the feature vector samples according to the distance between each feature vector sample other than the cluster center vectors and each cluster center vector, and obtaining the number of vector classification categories; wherein the difference in distance between any two feature vector samples within each class is within a preset data range;
Center determining submodule, for recalculating a new cluster center vector for each class of feature vector samples;
Quantity determining submodule, for determining that the number of copy categories of the feature vector samples equals the number of vector classification categories if, for every class of feature vector samples, the distance between the new cluster center vector and the original cluster center vector is within a preset distance threshold range, or if the set preset number of iterations is reached;
Style determining submodule, for determining the advertising copy style of each class of feature vector samples and taking it as the preset advertising copy style.
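The assign, recompute, and stop cycle carried out by these submodules resembles a standard K-means iteration, sketched below under stated assumptions: Euclidean distance, mean-based center updates, and the parameter names `tolerance` and `max_iterations` are illustrative choices. The additional constraint on the distance difference between samples within a class is not modeled.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_center(sample, centers):
    """Index of the center closest to the sample."""
    return min(range(len(centers)), key=lambda j: euclidean(sample, centers[j]))


def cluster(samples, centers, tolerance=1e-4, max_iterations=100):
    """Iterate until every center moves at most `tolerance` or the limit is hit."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iterations):
        groups = [[] for _ in centers]
        for s in samples:
            groups[nearest_center(s, centers)].append(s)
        # New center = mean of its group; an empty group keeps its old center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        shift = max(euclidean(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tolerance:
            break
    labels = [nearest_center(s, centers) for s in samples]
    return centers, labels
```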
Further, the style determining submodule includes:
Style determining unit, for performing, for each class of feature vector samples, word frequency statistics on the advertising copy samples corresponding to the feature vector samples, filtering out the specified number of top-ranked words, and obtaining the advertising copy style determined from those top-ranked words.
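The word frequency step can be sketched as follows, assuming each advertising copy sample has already been tokenized and carries a cluster label. The function name `top_words_per_cluster` and the `top_n` parameter are hypothetical; naming the style from the resulting words remains a manual step in this sketch.

```python
from collections import Counter


def top_words_per_cluster(tokenized_copies, labels, top_n=10):
    """Return the top_n most frequent words for each cluster label."""
    counters = {}
    for tokens, label in zip(tokenized_copies, labels):
        counters.setdefault(label, Counter()).update(tokens)
    return {
        label: [word for word, _ in counter.most_common(top_n)]
        for label, counter in counters.items()
    }
```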
In this embodiment, the style discovery method based on K-means++ clustering can discover new advertising copy styles in time.
It should be noted that, for the working process of each module, submodule, and unit in this embodiment, reference is made to the corresponding description in the above embodiments; details are not repeated here.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An advertising copy style recognition method, characterized by comprising:
obtaining target advertising copy to be classified by advertising copy style;
determining a feature vector of the target advertising copy;
based on the feature vector, identifying, from preset advertising copy styles, the advertising copy style corresponding to the feature vector; the preset advertising copy styles being obtained by training on feature vector samples corresponding to advertising copy samples.
2. The advertising copy style recognition method according to claim 1, characterized in that the feature vector of the target advertising copy includes at least one of a structural feature vector and a semantic feature vector; the structural feature vector includes at least one of a character feature vector, a lexical feature vector, and a sentence feature vector; the semantic feature vector is a word vector characterizing the advertising copy.
3. The advertising copy style recognition method according to claim 1, characterized in that the process of generating the preset advertising copy styles includes:
obtaining feature vector samples corresponding to advertising copy samples;
selecting cluster center vectors from the feature vector samples;
based on the cluster center vectors, performing style classification on the advertising copy samples to obtain the preset advertising copy styles.
4. The advertising copy style recognition method according to claim 3, characterized in that selecting cluster center vectors from the feature vector samples comprises:
randomly selecting one feature vector sample from the feature vector samples as an initial cluster center vector;
calculating the distance between each feature vector sample other than the initial cluster center vector and the initial cluster center vector and, if the distance is greater than a preset distance threshold, taking that feature vector sample as another initial cluster center vector;
calculating the distance between each feature vector sample other than the initial cluster center vectors and the nearest of the initial cluster center vectors and, if the distance is greater than the preset distance threshold, taking that feature vector sample as another initial cluster center vector, stopping when the number of initial cluster center vectors reaches a preset threshold;
taking each initial cluster center vector as a cluster center vector.
5. The advertising copy style recognition method according to claim 4, characterized in that performing style classification on the advertising copy samples based on the cluster center vectors to obtain the preset advertising copy styles comprises:
classifying all the feature vector samples according to the distance between each feature vector sample other than the cluster center vectors and each cluster center vector, and obtaining the number of vector classification categories; wherein the difference in distance between any two feature vector samples within each class is within a preset data range;
recalculating a new cluster center vector for each class of feature vector samples;
if, for every class of feature vector samples, the distance between the new cluster center vector and the original cluster center vector is within a preset distance threshold range, or if the set preset number of iterations is reached, determining that the number of copy categories of the feature vector samples equals the number of vector classification categories;
determining the advertising copy style of each class of feature vector samples and taking it as the preset advertising copy style.
6. The advertising copy style recognition method according to claim 5, characterized in that determining the advertising copy style of each class of feature vector samples comprises:
for each class of feature vector samples, performing word frequency statistics on the advertising copy samples corresponding to the feature vector samples, and filtering out the specified number of top-ranked words;
obtaining the advertising copy style determined from the specified number of top-ranked words.
7. The advertising copy style recognition method according to claim 6, characterized in that identifying, based on the feature vector, the advertising copy style corresponding to the feature vector from the preset advertising copy styles comprises:
obtaining an advertisement classification model that includes the preset advertising copy styles;
based on the feature vector and the advertisement classification model, determining the preset advertising copy style with the highest probability as the advertising copy style corresponding to the feature vector.
8. An advertising copy style recognition system, characterized by comprising:
a copy obtaining module, for obtaining target advertising copy to be classified by advertising copy style;
a vector determining module, for determining a feature vector of the target advertising copy;
a style identification module, for identifying, based on the feature vector, the advertising copy style corresponding to the feature vector from preset advertising copy styles; the preset advertising copy styles being obtained by training on feature vector samples corresponding to advertising copy samples.
9. The advertising copy style recognition system according to claim 8, characterized by further comprising:
a sample obtaining module, for obtaining feature vector samples corresponding to advertising copy samples;
a vector selection module, for selecting cluster center vectors from the feature vector samples;
a style classification module, for performing style classification on the advertising copy samples based on the cluster center vectors to obtain the preset advertising copy styles.
10. The advertising copy style recognition system according to claim 9, wherein the vector selection module includes:
a screening submodule, for randomly selecting one feature vector sample from the feature vector samples as an initial cluster center vector;
a calculation submodule, for calculating the distance between each feature vector sample other than the initial cluster center vector and the initial cluster center vector and, if the distance is greater than a preset distance threshold, taking that feature vector sample as another initial cluster center vector; and for calculating the distance between each feature vector sample other than the initial cluster center vectors and the nearest of the initial cluster center vectors and, if the distance is greater than the preset distance threshold, taking that feature vector sample as another initial cluster center vector, stopping when the number of initial cluster center vectors reaches a preset threshold;
a determining submodule, for taking each initial cluster center vector as a cluster center vector.
CN201910748697.3A 2019-08-14 2019-08-14 A kind of Advertising Copy style recognition methods and system Pending CN110458236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910748697.3A CN110458236A (en) 2019-08-14 2019-08-14 A kind of Advertising Copy style recognition methods and system

Publications (1)

Publication Number Publication Date
CN110458236A true CN110458236A (en) 2019-11-15

Family

ID=68486484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910748697.3A Pending CN110458236A (en) 2019-08-14 2019-08-14 A kind of Advertising Copy style recognition methods and system

Country Status (1)

Country Link
CN (1) CN110458236A (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050144158A1 (en) * 2003-11-18 2005-06-30 Capper Liesl J. Computer network search engine
US20060259360A1 (en) * 2005-05-16 2006-11-16 Manyworlds, Inc. Multiple Attribute and Behavior-based Advertising Process
CN101000627A (en) * 2007-01-15 2007-07-18 北京搜狗科技发展有限公司 Method and device for issuing correlation information
US8038060B2 (en) * 2006-09-04 2011-10-18 Seiko Instruments Inc. ID image providing device
CN104156690A (en) * 2014-06-27 2014-11-19 辽宁石油化工大学 Gesture recognition method based on image space pyramid bag of features
US9299081B2 (en) * 2012-09-10 2016-03-29 Yahoo! Inc. Deriving a user profile from questions
CN106484675A (en) * 2016-09-29 2017-03-08 北京理工大学 Fusion distributed semantic and the character relation abstracting method of sentence justice feature
CN106874923A (en) * 2015-12-14 2017-06-20 阿里巴巴集团控股有限公司 A kind of genre classification of commodity determines method and device
CN108596051A (en) * 2018-04-04 2018-09-28 浙江大学城市学院 A kind of intelligent identification Method towards product style image
CN108898165A (en) * 2018-06-12 2018-11-27 浙江大学 A kind of recognition methods of billboard style
CN109447706A (en) * 2018-10-25 2019-03-08 深圳前海微众银行股份有限公司 Advertising Copy generation method, device, equipment and readable storage medium storing program for executing
CN109583952A (en) * 2018-11-28 2019-04-05 深圳前海微众银行股份有限公司 Advertising Copy processing method, device, equipment and computer readable storage medium
CN109831684A (en) * 2019-03-11 2019-05-31 深圳前海微众银行股份有限公司 Video optimized recommended method, device and readable storage medium storing program for executing
CN110880120A (en) * 2018-09-06 2020-03-13 阿里巴巴集团控股有限公司 Advertisement method, advertisement system and identification device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIE LIU et al.: "A Chinese Character Localization Method Based on Intergrating Structure and CC-Clustering for Advertising Images", 2011 INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION *
Chen Jing: "Research on M-Learning Content Design and Evaluation Standards", China Master's Theses Full-text Database (Social Sciences II) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112948538A (en) * 2021-01-29 2021-06-11 北京字节跳动网络技术有限公司 Text processing method and equipment
CN113553499A (en) * 2021-06-22 2021-10-26 杭州摸象大数据科技有限公司 Cheating detection method and system based on marketing fission and electronic equipment
CN113934815A (en) * 2021-09-18 2022-01-14 有米科技股份有限公司 Advertisement and pattern characteristic information identification method and device based on neural network
CN113935765A (en) * 2021-09-18 2022-01-14 有米科技股份有限公司 Neural network-based advertising pattern style identification method and device
CN113934815B (en) * 2021-09-18 2024-10-29 有米科技股份有限公司 Advertisement document characteristic information identification method and device based on neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191115