CN110458236A - Advertising copy style recognition method and system - Google Patents
Advertising copy style recognition method and system
- Publication number
- CN110458236A CN201910748697.3A
- Authority
- CN
- China
- Prior art keywords
- vector
- sample
- advertising copy
- style
- feature vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Strategic Management (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Entrepreneurship & Innovation (AREA)
- General Business, Economics & Management (AREA)
- Game Theory and Decision Science (AREA)
- Marketing (AREA)
- Economics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides an advertising copy style recognition method and system. The method obtains target advertising copy whose style is to be classified, determines a feature vector of the target advertising copy, and, based on the feature vector, identifies the advertising copy style corresponding to the feature vector from among preset advertising copy styles. The invention makes advertising copy style recognition possible, so that product operators can track, at a macro level, the copy style trends of other companies and design advertising copy for their own products that matches current trends, which plays a key role in promoting those products.
Description
Technical field
The present invention relates to the field of data processing and, more specifically, to an advertising copy style recognition method and system.
Background
With the rapid development of the mobile Internet and the spread of mobile terminals, the traditional marketing industry has become increasingly mobile, and marketing channels have gradually moved from offline to online. As an important means of promotion, advertising copy has grown explosively in volume.
If style recognition could be performed on advertising copy, product operators could track, at a macro level, the copy style trends of other companies and design advertising copy for their own products that matches current trends, which would play a key role in promoting those products.
Summary of the invention
In view of this, the present invention provides an advertising copy style recognition method and system to address the need to classify advertising copy by style.
To solve the above technical problem, the present invention adopts the following technical solutions:
An advertising copy style recognition method, comprising:
Obtaining target advertising copy whose style is to be classified;
Determining a feature vector of the target advertising copy;
Based on the feature vector, identifying the advertising copy style corresponding to the feature vector from among preset advertising copy styles, where the preset advertising copy styles are obtained by training on feature vector samples corresponding to advertising copy samples.
Preferably, the feature vector of the target advertising copy includes at least one of a structural feature vector and a semantic feature vector; the structural feature vector includes at least one of a character feature vector, a lexical feature vector, and a sentence feature vector; and the semantic feature vector is a word vector characterizing the advertising copy.
Preferably, the process of generating the preset advertising copy styles includes:
Obtaining feature vector samples corresponding to advertising copy samples;
Selecting cluster center vectors from the feature vector samples;
Based on the cluster center vectors, classifying the advertising copy samples by style to obtain the preset advertising copy styles.
Preferably, selecting cluster center vectors from the feature vector samples includes:
Randomly selecting one feature vector sample from the feature vector samples as an initial cluster center vector;
Calculating the distance between each feature vector sample other than the initial cluster center vector and the initial cluster center vector and, if the distance is greater than a preset distance threshold, taking that feature vector sample as another initial cluster center vector;
Calculating the distance between each feature vector sample other than the initial cluster center vectors and the nearest of the initial cluster center vectors and, if the distance is greater than the preset distance threshold, taking that feature vector sample as another initial cluster center vector, stopping when the number of initial cluster center vectors reaches a preset threshold;
Taking each initial cluster center vector as a cluster center vector.
Preferably, classifying the advertising copy samples by style based on the cluster center vectors to obtain the preset advertising copy styles includes:
Classifying all of the feature vector samples according to the distance between each feature vector sample other than the cluster center vectors and each cluster center vector, and obtaining the number of vector categories, where the distance difference between any two feature vector samples in the same category is within a preset data range;
Recalculating a new cluster center vector for each category of feature vector samples;
If the distance between the new cluster center vector of each category and the previous cluster center vector is within a preset distance threshold range, or a preset number of iterations has been reached, determining that the number of categories of the feature vector samples equals the number of vector categories;
Determining the advertising copy style of each category of feature vector samples and taking it as a preset advertising copy style.
Preferably, determining the advertising copy style of each category of feature vector samples includes:
For each category of feature vector samples, counting word frequencies over the advertising copy samples corresponding to the feature vector samples and selecting a specified number of top-ranked words;
Obtaining the advertising copy style determined from the specified number of top-ranked words.
Preferably, identifying the advertising copy style corresponding to the feature vector from among the preset advertising copy styles based on the feature vector includes:
Obtaining an advertisement classification model that includes the preset advertising copy styles;
Based on the feature vector and the advertisement classification model, determining the preset advertising copy style with the highest probability as the advertising copy style corresponding to the feature vector.
An advertising copy style recognition system, comprising:
A copy acquisition module, configured to obtain target advertising copy whose style is to be classified;
A vector determination module, configured to determine a feature vector of the target advertising copy;
A style recognition module, configured to identify, based on the feature vector, the advertising copy style corresponding to the feature vector from among preset advertising copy styles, where the preset advertising copy styles are obtained by training on feature vector samples corresponding to advertising copy samples.
Preferably, the system further includes:
A sample acquisition module, configured to obtain feature vector samples corresponding to advertising copy samples;
A vector selection module, configured to select cluster center vectors from the feature vector samples;
A style classification module, configured to classify the advertising copy samples by style based on the cluster center vectors to obtain the preset advertising copy styles.
Preferably, the vector selection module includes:
A screening submodule, configured to randomly select one feature vector sample from the feature vector samples as an initial cluster center vector;
A calculation submodule, configured to calculate the distance between each feature vector sample other than the initial cluster center vector and the initial cluster center vector and, if the distance is greater than a preset distance threshold, take that feature vector sample as another initial cluster center vector; and to calculate the distance between each feature vector sample other than the initial cluster center vectors and the nearest of the initial cluster center vectors and, if the distance is greater than the preset distance threshold, take that feature vector sample as another initial cluster center vector, stopping when the number of initial cluster center vectors reaches a preset threshold;
A determination submodule, configured to take each initial cluster center vector as a cluster center vector.
Compared with the prior art, the present invention has the following beneficial effects:
The present invention provides an advertising copy style recognition method and system. The method obtains target advertising copy whose style is to be classified, determines a feature vector of the target advertising copy, and, based on the feature vector, identifies the advertising copy style corresponding to the feature vector from among preset advertising copy styles. The invention makes advertising copy style recognition possible, so that product operators can track, at a macro level, the copy style trends of other companies and design advertising copy for their own products that matches current trends, which plays a key role in promoting those products.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an advertising copy style recognition method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of another advertising copy style recognition method provided by an embodiment of the present invention;
Fig. 3 is a flowchart of another advertising copy style recognition method provided by an embodiment of the present invention;
Fig. 4 is a flowchart of another advertising copy style recognition method provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an advertising copy style recognition system provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention provides an advertising copy style recognition method for recognizing the style of advertising copy. Referring to Fig. 1, the method may include:
S11. Obtain target advertising copy whose style is to be classified.
For example, the target advertising copy may be "A must-see for plus-size girls" or "What you want to buy is marked down in the Tmall big sale! Grab it now!".
The target advertising copy may be stored on the server that performs advertising copy style recognition, or on another server from which it is obtained over a communication link.
S12. Determine the feature vector of the target advertising copy.
In practical applications, the feature vector of the target advertising copy includes at least one of a structural feature vector and a semantic feature vector; the structural feature vector includes at least one of a character feature vector, a lexical feature vector, and a sentence feature vector.
The character feature vector is obtained by splitting the target advertising copy into characters and counting the copy length, the proportion of numeric characters, the proportion of English letters, and the proportion of other characters. It is denoted Vec_char = (v_1, v_2, v_3, v_4), with dimension D_char = 4. Here, other characters are all characters other than numeric characters and English letters.
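The character feature vector described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `char_feature_vector` is hypothetical, "English letters" and "numeric characters" are assumed to mean ASCII letters and digits, and empty copy is assumed to map to the zero vector.

```python
def char_feature_vector(copy):
    """Return (length, digit ratio, English-letter ratio, other-character ratio)."""
    length = len(copy)
    if length == 0:
        # Assumption: empty copy maps to the zero vector.
        return (0.0, 0.0, 0.0, 0.0)
    # Assumption: only ASCII digits and ASCII letters count as numeric/English.
    digits = sum(1 for ch in copy if ch.isascii() and ch.isdigit())
    letters = sum(1 for ch in copy if ch.isascii() and ch.isalpha())
    others = length - digits - letters
    return (float(length), digits / length, letters / length, others / length)


vec_char = char_feature_vector("Sale 50% off")
```

With this choice, Chinese characters, spaces, and punctuation all fall into the "other characters" proportion.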
The lexical feature vector is obtained by segmenting the target advertising copy into words and counting the number of words in the copy, the proportion of unit words (such as "square" and "yuan"), the proportion of interjections (such as "Ow"), the proportion of function words, and the proportion of personal pronouns (such as "you" and "he"). It is denoted Vec_word = (v_1, v_2, v_3, v_4, v_5), with dimension D_word = 5.
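A minimal sketch of the lexical feature vector, assuming the copy has already been segmented into tokens by some word segmenter. The function name and the four word lists passed in are hypothetical placeholders; the source does not specify how these word categories are built.

```python
def lexical_feature_vector(tokens, unit_words, interjections, function_words, pronouns):
    """Return (word count, unit-word ratio, interjection ratio,
    function-word ratio, personal-pronoun ratio) for one segmented copy."""
    n = len(tokens)
    if n == 0:
        # Assumption: copy with no tokens maps to the zero vector.
        return (0.0, 0.0, 0.0, 0.0, 0.0)

    def ratio(vocabulary):
        # Share of tokens that belong to the given word category.
        return sum(1 for t in tokens if t in vocabulary) / n

    return (float(n), ratio(unit_words), ratio(interjections),
            ratio(function_words), ratio(pronouns))


vec_word = lexical_feature_vector(
    ["you", "pay", "5", "yuan", "wow"],
    unit_words={"yuan"},
    interjections={"wow"},
    function_words=set(),
    pronouns={"you"},
)
```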
The sentence feature vector is obtained by performing part-of-speech tagging and dependency parsing on the target advertising copy and counting the part-of-speech richness (number of part-of-speech categories / number of words), the syntactic structure richness (number of distinct syntactic types / number of syntactic types), and a frequency vector of syntactic structures (including verb-centered phrases, subject-predicate relations, verb-object relations, and punctuation). It is denoted Vec_sentence = (v_1, v_2, v_3^(1), v_3^(2), v_3^(3), v_3^(4), v_3^(5)), with dimension D_sentence = 7.
In addition, the semantic feature vector is a word vector characterizing the advertising copy. Its purpose is to minimize the effect that the sparsity of short-text features has on clustering quality. First, a B-dimensional word vector model is trained on a corpus (B is generally 50 to 300): Vec(w_i) = (v_1, v_2, ..., v_B), where w_i denotes the i-th word in the word vector model and v_b denotes the value of that word in the b-th dimension, in the range [0, 1]. Each word in the corpus corresponds to one word vector. Preferably, B is 100.
The target advertising copy is segmented into words; modal particles and punctuation are removed from the result; the remaining feature words are fed one by one into the word vector model; and the resulting word vectors are averaged to obtain the final semantic feature vector of the target advertising copy, $V^{(2)} = \frac{1}{n}\sum_{i=1}^{n} Vec(w_i)$, where n is the number of feature words in the target advertising copy.
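The averaging step can be illustrated with the sketch below. It assumes the word vector model is available as a plain dictionary from word to vector; the names `semantic_feature_vector` and `toy_vectors` are hypothetical, the toy vectors are made up for illustration, and skipping words that are missing from the model is an assumption the source does not state.

```python
def semantic_feature_vector(words, word_vectors, stop_words=frozenset()):
    """Average the word vectors of the feature words in one piece of copy."""
    # Drop modal particles/punctuation (stop_words); skipping out-of-vocabulary
    # words is an assumption made for this sketch.
    feature_words = [w for w in words if w not in stop_words and w in word_vectors]
    if not feature_words:
        return []
    dim = len(word_vectors[feature_words[0]])
    total = [0.0] * dim
    for w in feature_words:
        for b, value in enumerate(word_vectors[w]):
            total[b] += value
    n = len(feature_words)  # number of feature words
    return [t / n for t in total]


# Toy 3-dimensional "model"; a real model would have B = 50 to 300 dimensions.
toy_vectors = {"sale": [0.2, 0.8, 0.4], "today": [0.6, 0.0, 0.4]}
v2 = semantic_feature_vector(["sale", "!", "today"], toy_vectors, stop_words={"!"})
```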
Note that after the feature vector is obtained, it may be normalized to reduce the amount of data processing.
In addition, in this embodiment, the feature vector of the target advertising copy preferably includes both the structural feature vector and the semantic feature vector, and the structural feature vector includes the character feature vector, the lexical feature vector, and the sentence feature vector together.
S13. Based on the feature vector, identify the advertising copy style corresponding to the feature vector from among the preset advertising copy styles.
The preset advertising copy styles are obtained by training on feature vector samples corresponding to advertising copy samples.
Specifically, in practical applications, step S13 may be implemented as follows:
Obtain an advertisement classification model that includes the preset advertising copy styles and, based on the feature vector and the advertisement classification model, determine the preset advertising copy style with the highest probability as the advertising copy style corresponding to the feature vector.
Specifically, once the preset advertising copy styles are known, an advertisement classification model can be trained, for example with the word-vector and text-classification tool fastText, a logistic regression model, or a Bayesian model. Feeding the feature vector of the target advertising copy into the advertisement classification model yields the copy style label probabilities P = (p_1, p_2, ..., p_k). Optionally, the label with the highest probability is taken as the final label of the copy; if, however, the highest probability is below a given threshold θ (θ may be any value in [0, 0.5], for example 0.2), the copy is instead placed in the pending-style copy set XP.
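The decision rule on the label probabilities can be sketched as follows. The classifier itself is out of scope here; the function name `assign_style` and the example style names are hypothetical, and returning `None` is simply this sketch's way of signalling that the copy belongs in the pending set XP.

```python
def assign_style(probabilities, styles, theta=0.2):
    """Return the most probable style, or None when the copy should be
    placed in the pending-style set XP (highest probability below theta)."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] < theta:
        return None
    return styles[best]


styles = ["must-read", "discount", "scarcity"]
confident = assign_style([0.1, 0.7, 0.2], styles)      # clear winner
pending = assign_style([0.1] * 10, ["s%d" % i for i in range(10)])  # all below theta
```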
Optionally, the clustering steps are run periodically on the copy set XP. If the absolute number of samples in some category exceeds a set value, a person confirms whether that category is a new style; if so, it is added to the preset advertising copy style set S.
In this embodiment, target advertising copy whose style is to be classified is obtained, the feature vector of the target advertising copy is determined, and, based on the feature vector, the advertising copy style corresponding to the feature vector is identified from among the preset advertising copy styles. The invention makes advertising copy style recognition possible, so that product operators can track, at a macro level, the copy style trends of other companies and design advertising copy for their own products that matches current trends, which plays a key role in promoting those products.
The "preset advertising copy styles" were mentioned above; how they are generated is now described. Specifically, referring to Fig. 2, the process of generating the preset advertising copy styles may include:
S21. Obtain feature vector samples corresponding to advertising copy samples.
The process of determining the feature vector sample of an advertising copy sample is similar to the process of obtaining the feature vector of the target advertising copy; see the corresponding discussion above, which is not repeated here.
In another implementation of the present invention, the advertising copy samples are denoted T = (T_1, T_2, ..., T_m). After the structural feature vectors corresponding to the advertising copy samples are obtained, feature normalization is applied. Specifically, each feature in the character feature vector, lexical feature vector, and sentence feature vector is normalized as follows so that every value lies in [0, 1]:
$v_i' = \frac{v_i - v_i(\min)}{v_i(\max) - v_i(\min)}$
where v_i denotes the i-th feature, v_i' the feature after normalization, v_i(min) the minimum value of the i-th feature, and v_i(max) the maximum value of the i-th feature.
The normalized vectors are concatenated into the following feature vector:
V^(1) = (Vec_char'; Vec_word'; Vec_sentence').
For each advertising copy sample, the feature vector sample is V = V^(1), V = V^(2), or V = (V^(1); V^(2)), where V^(2) denotes the semantic feature vector. The feature vector samples of all advertising copy samples are collected to give XK = (X_1, X_2, ..., X_n), where each X_n may be any one of the three cases V = V^(1), V = V^(2), or V = (V^(1); V^(2)), and m equals n.
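A minimal sketch of min-max normalization over the structural features, followed by concatenation with a semantic vector. The helper name `min_max_normalize` and the toy numbers are hypothetical, and mapping a constant feature (maximum equal to minimum) to 0 is an assumption, since the source does not say how that case is handled.

```python
def min_max_normalize(rows):
    """Scale every column of `rows` (one row per copy sample) into [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # Assumption: a constant column (hi == lo) is mapped to 0.0.
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Toy structural features for three samples (e.g. copy length, digit ratio).
structural = min_max_normalize([[10, 0.2], [20, 0.4], [30, 0.6]])
# Toy semantic vectors V^(2) for the same three samples.
semantic = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]]
# V = (V^(1); V^(2)): concatenate the two parts sample by sample.
xk = [v1 + v2 for v1, v2 in zip(structural, semantic)]
```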
S22. Select cluster center vectors from the feature vector samples.
Multiple cluster center vectors are selected, and the number to select can be preset, for example 10 cluster center vectors.
In practical applications, referring to Fig. 3, step S22 may include:
S31. Randomly select one feature vector sample from the feature vector samples as an initial cluster center vector.
Specifically, one vector (that is, one feature vector sample) is chosen at random from XK as the first cluster center μ_1.
S32. Calculate the distance between each feature vector sample other than the initial cluster center vector and the initial cluster center vector; if the distance is greater than a preset distance threshold, take that feature vector sample as another initial cluster center vector.
S33. Calculate the distance between each feature vector sample other than the initial cluster center vectors and the nearest of the initial cluster center vectors; if the distance is greater than the preset distance threshold, take that feature vector sample as another initial cluster center vector, stopping when the number of initial cluster center vectors reaches a preset threshold.
S34. Take each initial cluster center vector as a cluster center vector.
Specifically, for each feature vector sample X_i in XK, calculate the distance to the nearest of the initial cluster center vectors selected so far, $D(X_i) = \min_{r} \sqrt{\sum_{j=1}^{n} (X_{ij} - \mu_{rj})^2}$, where r ranges over the initial cluster center vectors already selected, k is the chosen number of cluster categories (and also the preset threshold), and n is the dimension of the feature vector samples. Note that before the second cluster center vector has been determined, the nearest selected initial cluster center vector is simply the first initial cluster center vector.
Points with larger D(x) are preferentially selected as the next initial cluster center vector. The above steps are repeated until k (the preset threshold) initial cluster center vectors have been obtained: {μ_1, μ_2, ..., μ_k}.
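The seeding procedure can be sketched as below, assuming Euclidean distance. The function name `choose_initial_centers` is hypothetical. This sketch uses a simplified greedy rule that always takes the sample with the largest D(x); standard K-means++ instead draws the next center at random with probability proportional to D(x)^2, so this is an illustration of "prefer larger D(x)", not the canonical algorithm.

```python
import math
import random


def choose_initial_centers(samples, k, seed=0):
    """Pick k initial cluster centers from `samples` (lists of floats)."""
    rng = random.Random(seed)
    centers = [rng.choice(samples)]  # S31: the first center is chosen at random
    while len(centers) < k:
        # D(x): distance from x to its nearest already-chosen center.
        # Simplification: take the sample with the largest D(x).
        farthest = max(samples, key=lambda x: min(math.dist(x, c) for c in centers))
        centers.append(farthest)
    return centers


samples = [[0, 0], [0, 1], [10, 10], [10, 11], [-10, 5]]
centers = choose_initial_centers(samples, k=3)
```

On this toy data the three chosen centers fall in the three visibly separate groups regardless of which sample is drawn first.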
S23. Based on the cluster center vectors, classify the advertising copy samples by style to obtain the preset advertising copy styles.
In practical applications, referring to Fig. 4, step S23 may include:
S41. Classify all of the feature vector samples according to the distance between each feature vector sample other than the cluster center vectors and each cluster center vector, and obtain the number of vector categories.
Here, the distance difference between any two feature vector samples in the same category is within a preset data range.
S42. Recalculate a new cluster center vector for each category of feature vector samples.
S43. If the distance between the new cluster center vector of each category and the previous cluster center vector is within a preset distance threshold range, or a preset number of iterations has been reached, determine that the number of categories of the feature vector samples equals the number of vector categories.
Specifically, let the maximum number of clustering iterations be N, preferably 300. For e = 1, 2, ..., N, perform the following steps:
Denote the category set as C = {C_1, C_2, ..., C_k}, where each $C_r = \emptyset$ initially. The category set only assumes k categories, with k preferably 10; the specific content of each category can only be obtained by subsequent analysis.
For each advertising copy sample i = 1, 2, ..., m, calculate the distance between the feature vector sample X_i of advertising copy sample T_i and each cluster center vector μ_r: $d_{ir} = \sqrt{\sum_{j=1}^{n} (X_{ij} - \mu_{rj})^2}$, where r = 1, 2, ..., k, k is the number of cluster categories, and n is the dimension of the feature vector samples. The index r of the smallest d_ir is recorded as the category λ_i of advertising copy sample T_i, and the category set is updated: $C_{\lambda_i} = C_{\lambda_i} \cup \{X_i\}$.
For r = 1, 2, ..., k, recalculate a new cluster center vector from all sample points in C_r: $\mu_r = \frac{1}{|C_r|} \sum_{X \in C_r} X$.
If none of the center vectors changes significantly, that is, the distance between each new cluster center vector and the previous one is within the preset distance threshold range, or the number of iterations has been reached (for example the 300 iterations above), output the final category set C = {C_1, C_2, ..., C_k}. There are then k preset advertising copy styles.
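The assignment and update loop can be sketched as follows, assuming Euclidean distance. The function name `kmeans`, the tolerance parameter `tol` (standing in for the preset distance threshold), and the handling of an empty category (its center is kept unchanged) are assumptions made for this illustration.

```python
import math


def kmeans(samples, centers, max_iter=300, tol=1e-6):
    """Iterate assignment and center update; return (clusters, centers)."""
    centers = [list(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment: start from empty categories and put every sample
        # into the category of its nearest cluster center.
        clusters = [[] for _ in centers]
        for x in samples:
            distances = [math.dist(x, c) for c in centers]
            clusters[distances.index(min(distances))].append(x)
        # Update: each new center is the mean of its category's samples.
        new_centers = []
        for members, old in zip(clusters, centers):
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # assumption: keep an empty category's center
        # Stop once no center moved farther than the threshold.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return clusters, centers


samples = [[0, 0], [0, 2], [10, 10], [10, 12]]
clusters, centers = kmeans(samples, centers=[[0, 0], [10, 10]])
```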
Note that steps S31-S34 and steps S41-S44 together constitute a style discovery method based on K-means++ clustering.
S44. Determine the advertising copy style of each category of feature vector samples and take it as a preset advertising copy style.
In practical applications, step S44 may include:
For each category of feature vector samples, count word frequencies over the advertising copy samples corresponding to the feature vector samples, select a specified number of top-ranked words, and obtain the advertising copy style determined from those words.
Specifically, for C = {C_1, C_2, ..., C_k}, word frequencies are counted over the advertising copy samples of each category, and the 5 most frequent words of each category are used as the name of that category.
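The word-frequency naming step can be sketched with `collections.Counter`. The function name `top_words_per_cluster` is hypothetical, and the input is assumed to be already segmented copy grouped by category.

```python
from collections import Counter


def top_words_per_cluster(clustered_tokens, top_n=5):
    """For each category, return its `top_n` most frequent words.

    `clustered_tokens` holds one list per category; each category is a
    list of copy samples, and each sample is a list of tokens.
    """
    names = []
    for samples in clustered_tokens:
        counts = Counter(word for tokens in samples for word in tokens)
        names.append([word for word, _ in counts.most_common(top_n)])
    return names


clustered = [
    [["must", "see", "sale"], ["must", "read"]],
    [["discount", "today"], ["discount"]],
]
names = top_words_per_cluster(clustered, top_n=1)
```

The resulting word lists are only candidate names; the final style labels are still decided by a person.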
The category names above are then proofread manually, and the style labels are finalized as S = (S_1, S_2, ..., S_k); each piece of copy is labeled with its corresponding style label. Style labels may be, for example, "finally here", "must-play/must-read", "emphasizing discounts", or "creating scarcity".
For the copy set XP introduced above, steps S31-S34 are executed on XP to obtain copy categories, and the styles are then analyzed manually, so that whether a category is a new style can be confirmed promptly; if so, it is added to the preset advertising copy style set S. In other words, this solution can discover new advertising copy styles at any time by clustering the unknown-style copy set XP, continuously refining the style system. Compared with prior-art classifiers, which use only a fixed set of advertising copy styles, it can discover new advertising copy styles in a timely manner.
In an actual experiment on 200,000 pieces of advertising copy, the category system was first determined by clustering and the data were labeled, after which copy style classification was performed. 200*m prediction results (m being the number of copy styles) were randomly sampled and checked manually. The overall accuracy of the solution was about 80%, indicating that the method can discover advertising copy styles and classify copy by them reasonably well.
In this embodiment, the style discovery method based on K-means++ clustering can discover new advertising copy styles in a timely manner.
Optionally, on the basis of the embodiment of above-mentioned Advertising Copy style recognition methods, another embodiment of the present invention
A kind of Advertising Copy style identifying system is provided, referring to Fig. 5, may include:
Official documents and correspondence obtains module 11, for obtaining the targeted advertisements official documents and correspondence of pending Advertising Copy genre classification;
Vector determining module 12, for determining the feature vector of the targeted advertisements official documents and correspondence;
Style identification module 13 identifies the spy from default Advertising Copy style for being based on described eigenvector
Levy the corresponding Advertising Copy style of vector;The default Advertising Copy style is based on the corresponding feature vector sample of Advertising Copy sample
This training obtains.
Further, the feature vector of the targeted advertisements official documents and correspondence includes: in structural eigenvector and semantic feature vector
At least one;The structural eigenvector include character feature vector, in lexical feature vector sum sentence characteristics vector at least
One;The semantic feature vector is the term vector for characterizing the Advertising Copy.
In this embodiment, the target advertising copy to be classified by style is acquired, the feature vector of the target advertising copy is determined, and, based on the feature vector, the advertising copy style corresponding to the feature vector is identified from among the preset advertising copy styles. The present invention thus enables advertising copy styles to be recognized, so that product operators can grasp, at a macro level, the current copy style trends of various companies and can also design advertising copy for their own products that matches current trends, which plays a key role in product promotion.
It should be noted that, for the working process of each module in this embodiment, reference may be made to the corresponding description in the above embodiments; details are not repeated here.
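The identification step of the style recognition module can be sketched as follows, assuming (as in the classification-model variant of the method) that a trained classification model has already produced one probability per preset style. The function name `most_probable_style`, the style names, and the probability values are hypothetical.

```python
def most_probable_style(style_probabilities: dict[str, float]) -> str:
    """Return the preset style whose predicted probability is highest."""
    if not style_probabilities:
        raise ValueError("no preset styles were scored")
    return max(style_probabilities, key=style_probabilities.get)


# Hypothetical classifier output for one target advertising copy.
probabilities = {"promotional": 0.15, "emotional": 0.70, "humorous": 0.15}
print(most_probable_style(probabilities))  # emotional
```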
Optionally, on the basis of the embodiment of the above advertising copy style recognition system, the system further includes:
a sample acquisition module, configured to acquire the feature vector samples corresponding to advertising copy samples;
a vector selection module, configured to select cluster center vectors from the feature vector samples;
a style classification module, configured to perform style classification on the advertising copy samples based on the cluster center vectors to obtain the preset advertising copy styles.
Further, the vector selection module includes:
a screening submodule, configured to randomly select one feature vector sample from the feature vector samples as an initial cluster center vector;
a calculation submodule, configured to calculate the distance between each feature vector sample other than the initial cluster center vector and the initial cluster center vector and, if the distance is greater than a preset distance threshold, take that feature vector sample as another initial cluster center vector; and to calculate the distance between each feature vector sample other than the initial cluster center vectors and the nearest of the initial cluster center vectors and, if the distance is greater than the preset distance threshold, take that feature vector sample as another initial cluster center vector, stopping when the number of initial cluster center vectors reaches a preset threshold;
a determination submodule, configured to take each initial cluster center vector as a cluster center vector.
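The selection of initial cluster centers described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: Euclidean distance is used (the text does not name a metric), and the names `select_initial_centers`, `distance_threshold`, and `max_centers` are hypothetical. Note that it follows the distance-threshold rule given in the text, not the probability-weighted seeding of standard k-means++.

```python
import math
import random


def select_initial_centers(samples, distance_threshold, max_centers, rng=None):
    """Pick initial cluster centers with a distance-threshold rule.

    One sample is chosen at random as the first center. Every other sample
    whose distance to its nearest existing center exceeds the threshold is
    added as a further center, until max_centers centers have been chosen.
    """
    rng = rng or random.Random()
    first = rng.randrange(len(samples))
    centers = [samples[first]]
    for index, sample in enumerate(samples):
        if len(centers) >= max_centers:
            break  # the preset number of centers has been reached
        if index == first:
            continue  # skip the sample already used as the first center
        nearest = min(math.dist(sample, center) for center in centers)
        if nearest > distance_threshold:
            centers.append(sample)
    return centers


samples = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
centers = select_initial_centers(
    samples, distance_threshold=2.0, max_centers=3, rng=random.Random(0)
)
# For this data every random start yields three centers, one per visible group.
print(len(centers))  # 3
```

Because samples are scanned in order, the result depends on both the random first pick and the sample order; the threshold controls how far apart the chosen centers must be.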
Further, the style classification module includes:
a vector classification submodule, configured to classify all the feature vector samples according to the distance between each feature vector sample other than the cluster center vectors and each cluster center vector, and to obtain the number of vector classification categories, where the distance difference between any two feature vector samples in each class is within a preset data range;
a center determination submodule, configured to recalculate a new cluster center vector for each class of feature vector samples;
a quantity determination submodule, configured to determine, if the distance between the new cluster center vector of each class of feature vector samples and the original cluster center vector is within a preset distance threshold range or a set preset number of iterations is reached, that the number of style categories of the feature vector samples is the same as the number of vector classification categories;
a style determination submodule, configured to determine the advertising copy style of each class of feature vector samples and take it as the preset advertising copy styles.
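The classify, recompute, and stop-check loop described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: Euclidean distance and mean-based center updates are assumed, and the names `cluster_samples`, `shift_threshold`, and `max_iterations` are hypothetical.

```python
import math


def cluster_samples(samples, centers, shift_threshold=1e-4, max_iterations=100):
    """Assign samples to the nearest center and update centers until stable.

    Stops when every center moves by at most shift_threshold, or when
    max_iterations is reached. Returns (centers, labels).
    """
    centers = [tuple(center) for center in centers]
    labels = [0] * len(samples)
    for _ in range(max_iterations):
        # Classification step: each sample joins its nearest cluster center.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(sample, centers[k]))
            for sample in samples
        ]
        # Update step: the new center is the mean of the class members.
        new_centers = []
        for k, old_center in enumerate(centers):
            members = [s for s, label in zip(samples, labels) if label == k]
            if members:
                new_centers.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centers.append(old_center)  # keep the center of an empty class
        shift = max(math.dist(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift <= shift_threshold:
            break  # centers moved less than the preset distance threshold
    return centers, labels


samples = [(0, 0), (0, 2), (10, 0), (10, 2)]
centers, labels = cluster_samples(samples, centers=[(0, 0), (10, 0)])
print(centers)  # [(0.0, 1.0), (10.0, 1.0)]
print(labels)   # [0, 0, 1, 1]
```

The number of style categories is then simply `len(centers)`, matching the number of vector classification categories.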
Further, the style determination submodule includes:
a style determination unit, configured to, for each class of feature vector samples, perform word frequency statistics on the advertising copy samples corresponding to the feature vector samples, select a specified number of top-ranked words, and obtain the advertising copy style determined according to those top-ranked words.
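The word frequency step can be sketched as follows. The function name `top_words_per_cluster`, the lowercase whitespace tokenization, and the sample copy are assumptions for illustration; how a style label is finally derived from the top-ranked words is not detailed in the text, so the sketch stops at the ranked words.

```python
from collections import Counter


def top_words_per_cluster(copy_samples, labels, top_n=3):
    """Return the top_n most frequent words of the copy in each cluster."""
    counters = {}
    for text, label in zip(copy_samples, labels):
        # Whitespace tokenization is an assumption; Chinese copy would
        # need a word segmenter here.
        counters.setdefault(label, Counter()).update(text.lower().split())
    return {
        label: [word for word, _ in counter.most_common(top_n)]
        for label, counter in counters.items()
    }


copy_samples = [
    "free gift free shipping",
    "free gift today",
    "luxury design",
    "luxury quality design luxury",
]
labels = [0, 0, 1, 1]
print(top_words_per_cluster(copy_samples, labels, top_n=2))
# {0: ['free', 'gift'], 1: ['luxury', 'design']}
```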
In this embodiment, the style discovery method based on K-means++ clustering can discover new advertising copy styles in a timely manner.
It should be noted that, for the working process of each module, submodule, and unit in this embodiment, reference may be made to the corresponding description in the above embodiments; details are not repeated here.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention.
Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. An advertising copy style recognition method, characterized by comprising:
acquiring the target advertising copy on which advertising copy style classification is to be performed;
determining the feature vector of the target advertising copy;
identifying, based on the feature vector, the advertising copy style corresponding to the feature vector from among preset advertising copy styles, where the preset advertising copy styles are obtained by training on the feature vector samples corresponding to advertising copy samples.
2. The advertising copy style recognition method according to claim 1, characterized in that the feature vector of the target advertising copy includes at least one of a structural feature vector and a semantic feature vector; the structural feature vector includes at least one of a character feature vector, a lexical feature vector, and a sentence feature vector; and the semantic feature vector is a word vector that characterizes the advertising copy.
3. The advertising copy style recognition method according to claim 1, characterized in that the process of generating the preset advertising copy styles comprises:
acquiring the feature vector samples corresponding to advertising copy samples;
selecting cluster center vectors from the feature vector samples;
performing style classification on the advertising copy samples based on the cluster center vectors to obtain the preset advertising copy styles.
4. The advertising copy style recognition method according to claim 3, characterized in that selecting cluster center vectors from the feature vector samples comprises:
randomly selecting one feature vector sample from the feature vector samples as an initial cluster center vector;
calculating the distance between each feature vector sample other than the initial cluster center vector and the initial cluster center vector and, if the distance is greater than a preset distance threshold, taking that feature vector sample as another initial cluster center vector;
calculating the distance between each feature vector sample other than the initial cluster center vectors and the nearest of the initial cluster center vectors and, if the distance is greater than the preset distance threshold, taking that feature vector sample as another initial cluster center vector, stopping when the number of initial cluster center vectors reaches a preset threshold;
taking each initial cluster center vector as a cluster center vector.
5. The advertising copy style recognition method according to claim 4, characterized in that performing style classification on the advertising copy samples based on the cluster center vectors to obtain the preset advertising copy styles comprises:
classifying all the feature vector samples according to the distance between each feature vector sample other than the cluster center vectors and each cluster center vector, and obtaining the number of vector classification categories, where the distance difference between any two feature vector samples in each class is within a preset data range;
recalculating a new cluster center vector for each class of feature vector samples;
if the distance between the new cluster center vector of each class of feature vector samples and the original cluster center vector is within a preset distance threshold range, or a set preset number of iterations is reached, determining that the number of style categories of the feature vector samples is the same as the number of vector classification categories;
determining the advertising copy style of each class of feature vector samples, and taking it as the preset advertising copy styles.
6. The advertising copy style recognition method according to claim 5, characterized in that determining the advertising copy style of each class of feature vector samples comprises:
for each class of feature vector samples, performing word frequency statistics on the advertising copy samples corresponding to the feature vector samples, and selecting a specified number of top-ranked words;
obtaining the advertising copy style determined according to the specified number of top-ranked words.
7. The advertising copy style recognition method according to claim 6, characterized in that identifying, based on the feature vector, the advertising copy style corresponding to the feature vector from among the preset advertising copy styles comprises:
obtaining an advertisement classification model that includes the preset advertising copy styles;
determining, based on the feature vector and the advertisement classification model, the preset advertising copy style with the highest probability as the advertising copy style corresponding to the feature vector.
8. An advertising copy style recognition system, characterized by comprising:
a copy acquisition module, configured to acquire the target advertising copy on which advertising copy style classification is to be performed;
a vector determination module, configured to determine the feature vector of the target advertising copy;
a style recognition module, configured to identify, based on the feature vector, the advertising copy style corresponding to the feature vector from among preset advertising copy styles, where the preset advertising copy styles are obtained by training on the feature vector samples corresponding to advertising copy samples.
9. The advertising copy style recognition system according to claim 8, characterized by further comprising:
a sample acquisition module, configured to acquire the feature vector samples corresponding to advertising copy samples;
a vector selection module, configured to select cluster center vectors from the feature vector samples;
a style classification module, configured to perform style classification on the advertising copy samples based on the cluster center vectors to obtain the preset advertising copy styles.
10. The advertising copy style recognition system according to claim 9, wherein the vector selection module includes:
a screening submodule, configured to randomly select one feature vector sample from the feature vector samples as an initial cluster center vector;
a calculation submodule, configured to calculate the distance between each feature vector sample other than the initial cluster center vector and the initial cluster center vector and, if the distance is greater than a preset distance threshold, take that feature vector sample as another initial cluster center vector; and to calculate the distance between each feature vector sample other than the initial cluster center vectors and the nearest of the initial cluster center vectors and, if the distance is greater than the preset distance threshold, take that feature vector sample as another initial cluster center vector, stopping when the number of initial cluster center vectors reaches a preset threshold;
a determination submodule, configured to take each initial cluster center vector as a cluster center vector.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910748697.3A CN110458236A (en) | 2019-08-14 | 2019-08-14 | Advertising copy style recognition method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110458236A true CN110458236A (en) | 2019-11-15 |
Family
ID=68486484
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910748697.3A Pending CN110458236A (en) | 2019-08-14 | 2019-08-14 | Advertising copy style recognition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110458236A (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050144158A1 (en) * | 2003-11-18 | 2005-06-30 | Capper Liesl J. | Computer network search engine |
US20060259360A1 (en) * | 2005-05-16 | 2006-11-16 | Manyworlds, Inc. | Multiple Attribute and Behavior-based Advertising Process |
CN101000627A (en) * | 2007-01-15 | 2007-07-18 | 北京搜狗科技发展有限公司 | Method and device for issuing correlation information |
US8038060B2 (en) * | 2006-09-04 | 2011-10-18 | Seiko Instruments Inc. | ID image providing device |
CN104156690A (en) * | 2014-06-27 | 2014-11-19 | 辽宁石油化工大学 | Gesture recognition method based on image space pyramid bag of features |
US9299081B2 (en) * | 2012-09-10 | 2016-03-29 | Yahoo! Inc. | Deriving a user profile from questions |
CN106484675A (en) * | 2016-09-29 | 2017-03-08 | 北京理工大学 | Person relation extraction method fusing distributed semantics and sentence-meaning features |
CN106874923A (en) * | 2015-12-14 | 2017-06-20 | 阿里巴巴集团控股有限公司 | Method and device for determining the style classification of a commodity |
CN108596051A (en) * | 2018-04-04 | 2018-09-28 | 浙江大学城市学院 | Intelligent recognition method for product style imagery |
CN108898165A (en) * | 2018-06-12 | 2018-11-27 | 浙江大学 | Billboard style recognition method |
CN109447706A (en) * | 2018-10-25 | 2019-03-08 | 深圳前海微众银行股份有限公司 | Advertising copy generation method, device, equipment, and readable storage medium |
CN109583952A (en) * | 2018-11-28 | 2019-04-05 | 深圳前海微众银行股份有限公司 | Advertising copy processing method, device, equipment, and computer-readable storage medium |
CN109831684A (en) * | 2019-03-11 | 2019-05-31 | 深圳前海微众银行股份有限公司 | Video optimization recommendation method, device, and readable storage medium |
CN110880120A (en) * | 2018-09-06 | 2020-03-13 | 阿里巴巴集团控股有限公司 | Advertisement method, advertisement system and identification device |
Non-Patent Citations (2)
Title |
---|
Jie Liu et al.: "A Chinese Character Localization Method Based on Intergrating Structure and CC-Clustering for Advertising Images", 2011 International Conference on Document Analysis and Recognition * |
Chen Jing: "Research on M-Learning Content Design and Evaluation Standards", China Master's Theses Full-text Database (Social Sciences II) * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112948538A (en) * | 2021-01-29 | 2021-06-11 | 北京字节跳动网络技术有限公司 | Text processing method and equipment |
CN113553499A (en) * | 2021-06-22 | 2021-10-26 | 杭州摸象大数据科技有限公司 | Cheating detection method and system based on marketing fission and electronic equipment |
CN113934815A (en) * | 2021-09-18 | 2022-01-14 | 有米科技股份有限公司 | Advertisement and pattern characteristic information identification method and device based on neural network |
CN113935765A (en) * | 2021-09-18 | 2022-01-14 | 有米科技股份有限公司 | Neural network-based advertising pattern style identification method and device |
CN113934815B (en) * | 2021-09-18 | 2024-10-29 | 有米科技股份有限公司 | Advertisement document characteristic information identification method and device based on neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191115 |