CN106469170A - Method and apparatus for processing text data - Google Patents
- Publication number
- CN106469170A (CN201510509639.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
This application discloses a method and apparatus for processing text data. The method includes: reading the text data of a plurality of target objects that have already been released, where a target object is any one of the following: a film, a TV series, a stage play, a documentary, a speech, or an advertisement; screening the plurality of target objects to generate a target object to be analyzed; performing text preprocessing on the text data of the target object to be analyzed to obtain a plurality of plot segments of the target object to be analyzed; modeling the plot segments to obtain a probabilistic model of the plot development of the target object to be analyzed; and using the probabilistic model to obtain the probability of plot-state development of the target object to be analyzed. The application addresses the technical problem in the prior art that, when target objects are screened, manual reading of their text data is highly subjective and therefore yields inaccurate screening results.
Description
Technical field
This application relates to the field of data processing, and in particular to a method and apparatus for processing text data.
Background
When investing in a target object (for example, a film, TV series, stage play, documentary, or advertisement), the text data of the target object (such as the script of the film, TV series, stage play, documentary, or advertisement) has a decisive effect on whether the investment succeeds. When the text data matches audience tastes, the target object is well liked, brings higher returns to the investor, and the investment is a success. When the text data is not liked by audiences, far fewer people follow the target object, it does not deliver the effect the investor expected, and it cannot bring the investor the desired or higher returns; in that case the investment is a failure.
In the prior art, target objects to be invested in are screened mainly by having people read their text data and judge whether each one is worth selecting. Manual reading, however, is both inefficient and highly subjective: different readers reach different conclusions. As a result, when deciding which target object is more worth selecting, manual reading is slow and the accuracy of the judgment is low.
The problem is described in detail below, taking as an example the scenario in which the target object is a film.
A film and television company, acting as producer, may need to invest in up to ten thousand films every year. When it invests in a film, the quality of the script has a decisive effect on the film's box office. Box office reflects a film's commercial release performance and is one of the key measures of whether a film succeeds, which bears directly on whether the producer's investment succeeds. A plot that matches audience tastes generates higher box office and is therefore more worth investing in.
In the prior art, helping a producer make film investment decisions on the basis of plot usually requires people to read a large number of scripts (most of the time probably only the plot synopses) and judge which films audiences are likely to enjoy, so that the scripts more worth investing in can be picked out of a massive pool and the producer can make more valuable investment decisions. However, manual script reading is inefficient and subjective, so judging which film is more worth investing in is slow and the accuracy of the judgment is low.
Most existing artificial-intelligence patents relating to films are devoted to recommender systems, whose goal is to find, from a massive collection of films, the ones each viewer will like best. Their output is a list of the films a viewer is most likely to enjoy (ranked by the probability of being liked). In other words, these recommender systems only "filter" and "rank" historical film data, recommending favorites to viewers on the basis of a large number of films that have already been shown. Such systems cannot screen a massive pool of scripts, nor can they judge which scripts in that pool are more worth investing in.
No effective solution has yet been proposed for the technical problem in the prior art that, when target objects are screened, manual reading of their text data is highly subjective and therefore yields inaccurate screening results.
Summary of the invention
The embodiments of this application provide a method and apparatus for processing text data, to at least solve the technical problem in the prior art that, when target objects are screened, manual reading of their text data is highly subjective and therefore yields inaccurate screening results.
According to one aspect of the embodiments of this application, a method for processing text data is provided, including: reading the text data of a plurality of target objects that have already been released, where a target object is any one of the following: a film, a TV series, a stage play, a documentary, a speech, or an advertisement; screening the plurality of target objects to generate a target object to be analyzed; performing text preprocessing on the text data of the target object to be analyzed to obtain a plurality of plot segments of the target object to be analyzed; modeling the plurality of plot segments to obtain a probabilistic model of the plot development of the target object to be analyzed, where the probabilistic model characterizes the result of transitions between any two or more of the plot segments; and using the probabilistic model to obtain the probability of plot-state development of the target object to be analyzed, where a plot-state development includes any two or more plot segments.
According to another aspect of the embodiments of this application, an apparatus for processing text data is also provided, including: a reading unit, configured to read the text data of a plurality of target objects that have already been released, where a target object is any one of the following: a film, a TV series, a stage play, a documentary, a speech, or an advertisement; a generating unit, configured to screen the plurality of target objects to generate a target object to be analyzed; a processing unit, configured to perform text preprocessing on the text data of the target object to be analyzed to obtain a plurality of plot segments of the target object to be analyzed; a modeling unit, configured to model the plurality of plot segments to obtain a probabilistic model of the plot development of the target object to be analyzed, where the probabilistic model characterizes the result of transitions between any two or more of the plot segments; and an obtaining unit, configured to use the probabilistic model to obtain the probability of plot-state development of the target object to be analyzed, where a plot-state development includes any two or more plot segments.
In the scheme disclosed in this application, suppose target objects of a certain class are to be screened from a massive pool of target objects that have not yet been released. The text data of a plurality of released target objects is read and screened to obtain a target object to be analyzed that is of the same type as the required target objects. Text preprocessing is then applied to the text data of the target object to be analyzed to obtain its plot segments, and the plot segments are modeled to obtain a probabilistic model of its plot development. The scheme uses this probabilistic model to obtain the probability of plot-state development of the target object to be analyzed, and then screens the required target objects from the unreleased pool according to that probability.
It is easy to see that, when the required target objects are screened from a massive pool of unreleased target objects, it is only necessary to analyze the text data of the released target objects and, through statistical modeling, obtain the probability of plot-state development of the target object to be analyzed that is of the same type as the required target objects. On the basis of the released objects' text data and these objectively grounded probabilities, one can then objectively analyze which target objects of the required type in the unreleased pool audiences are more likely to enjoy. The scheme provided by the embodiments therefore does not require manual reading of the text data of massive numbers of target objects: the general rules of plot-state development for a given class of target object can be mined from the text data of the massive pool of released target objects. This not only allows the required target objects that audiences are more likely to enjoy to be screened accurately and objectively from the unreleased pool according to those general rules; in addition, because screening and text preprocessing reduce the volume of data to be processed while the rules are being mined, the required target objects can be screened from the unreleased pool objectively, accurately, and efficiently.
The scheme provided by this application thus solves the technical problem in the prior art that, when target objects are screened, manual reading of their text data is highly subjective and therefore yields inaccurate screening results.
Brief description of the drawings
The drawings described here are provided for further understanding of this application and form part of it. The illustrative embodiments of this application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a hardware block diagram of a computer terminal for a method for processing text data according to an embodiment of this application;
Fig. 2 is a flowchart of a method for processing text data according to Embodiment 1 of this application;
Fig. 3 is a flowchart of an optional method for processing text data according to Embodiment 1 of this application;
Fig. 4 is a schematic diagram of an apparatus for processing text data according to Embodiment 2 of this application;
Fig. 5 is a schematic diagram of an optional apparatus for processing text data according to Embodiment 2 of this application;
Fig. 6 is a schematic diagram of an optional apparatus for processing text data according to Embodiment 2 of this application;
Fig. 7 is a schematic diagram of an optional apparatus for processing text data according to Embodiment 2 of this application;
Fig. 8 is a schematic diagram of an optional apparatus for processing text data according to Embodiment 2 of this application;
Fig. 9 is a schematic diagram of an optional apparatus for processing text data according to Embodiment 2 of this application;
Fig. 10 is a schematic diagram of an optional apparatus for processing text data according to Embodiment 2 of this application;
Fig. 11 is a schematic diagram of an optional apparatus for processing text data according to Embodiment 2 of this application; and
Fig. 12 is a structural block diagram of a computer terminal according to an embodiment of this application.
Detailed description
To help those skilled in the art better understand the scheme of this application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of this application.
It should be noted that the terms "first", "second", and so on in the description, the claims, and the drawings of this application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "comprising" and "having", and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
First, some of the nouns and terms that appear in the description of the embodiments are explained as follows:
Latent Dirichlet Allocation: abbreviated LDA, a generative topic model for documents, also described as a three-level Bayesian probabilistic model with a word, topic, and document structure. LDA is an unsupervised machine learning technique that can be used to identify hidden topic information in large document collections or corpora.
Semantic model: a new class of data model that adds new data constructors and data-processing primitives to the relational model in order to express complex structures and rich semantics.
Markov chain: a discrete-time stochastic process in mathematics that has the Markov property. In such a process, given the current knowledge or information, the past (the historical states before the present) is irrelevant for predicting the future (the states after the present).
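The Markov property described above can be made concrete with a minimal sketch. It assumes that plot states have already been reduced to short labels; the labels and sequences below are hypothetical and are not taken from the application. Transition probabilities are estimated by simple counting, which is one common way to fit a first-order chain and is not necessarily the estimator used in the embodiments.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Estimate first-order transition probabilities P(next | current) by counting."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def path_probability(path, transitions):
    """Probability of a state path, conditioned on its first state.

    Under the Markov property each step depends only on the current state,
    so the path probability is a product of one-step transition probabilities.
    """
    prob = 1.0
    for current, nxt in zip(path, path[1:]):
        prob *= transitions.get(current, {}).get(nxt, 0.0)
    return prob


# Hypothetical plot-state sequences, for illustration only.
sequences = [
    ["meet", "conflict", "romance", "separation"],
    ["meet", "romance", "separation"],
    ["meet", "conflict", "romance", "reunion"],
]
transitions = estimate_transitions(sequences)
example_path = ["meet", "conflict", "romance", "separation"]
example_probability = path_probability(example_path, transitions)
```

Transitions that never occur in the training sequences receive probability zero in this sketch; a practical system would probably smooth the counts.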
HMM: a hidden Markov model, a statistical model used to describe a Markov process with hidden, unknown parameters. The system being modeled is treated as a Markov process whose states are not directly observed (hidden).
Bayesian model: that is, the Bayesian forecasting model, a time-series forecasting method that takes dynamic models as its object of study. It uses not only earlier data but also information such as the experience and judgment of the decision maker, combining objective and subjective factors, which makes it more flexible in handling abnormal situations.
TF-IDF model: an information retrieval model widely used in practical applications such as search engines. Its main idea is that if a word w occurs with high frequency in a document d and rarely occurs in other documents, then w is considered to have good discriminating power and is suitable for distinguishing d from other documents.
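As an illustration of the TF-IDF idea, the sketch below computes one common unsmoothed variant: term frequency within a document multiplied by the logarithm of the inverse document frequency. The application does not specify which weighting variant is used, so the formula, the function name `tf_idf`, and the toy documents are all assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per tokenized document.

    weight = (count of word in doc / doc length) * log(N / docs containing word)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each word once per document
    weights = []
    for doc in documents:
        term_counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in term_counts.items()
        })
    return weights


# Hypothetical tokenized synopses, for illustration only.
docs = [
    ["campus", "love", "graduation"],
    ["campus", "war", "spy"],
    ["campus", "love", "wedding"],
]
weights = tf_idf(docs)
```

In this toy corpus "campus" appears in every document and so gets weight zero, while "graduation", which appears in only one document, gets the highest weight in the first document.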
Hadoop platform: Hadoop is a way of storing massive data and running distributed analysis applications on a distributed server cluster. The Hadoop platform is a platform for distributed storage and computation suited to big data, and the core of its distributed computation is MapReduce.
MapReduce: a programming model for parallel computation over large data sets (larger than 1 TB). Its main idea is that an operation on a large data set is distributed to the worker nodes managed by a master node, which complete it jointly; the intermediate results of the nodes are then combined to obtain the final result.
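The map, group, and reduce phases described above can be sketched in a single Python process. This only illustrates the programming model; it does not use Hadoop, and the function names and input chunks are hypothetical.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word, 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group the emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the intermediate values of each key into a final result."""
    return {key: sum(values) for key, values in groups.items()}


# Each chunk stands in for the share of data handled by one worker node.
chunks = ["love campus love", "campus war"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
```

In a real cluster the map calls would run on different nodes and the grouping step would move data across the network; here they run sequentially so the data flow is easy to follow.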
Embodiment 1
According to the embodiments of this application, a method embodiment of a method for processing text data is also provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
The method embodiment provided in Embodiment 1 of this application may be executed on a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a computer terminal as an example, Fig. 1 is a hardware block diagram of a computer terminal for a method for processing text data according to an embodiment of this application. As shown in Fig. 1, the computer terminal 10 may include one or more processors 102 (only one is shown in the figure; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication functions. A person of ordinary skill in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the electronic device described above. For example, the computer terminal 10 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
The memory 104 may be used to store software programs and modules of application software, such as the program instructions or modules corresponding to the method for processing text data in the embodiments of this application. By running the software programs and modules stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the method for processing text data described above. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the computer terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations of these.
The transmission device 106 is used to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the computer terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station and thereby communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (Radio Frequency, RF) module, which communicates with the Internet wirelessly.
In the operating environment described above, this application provides the method for processing text data shown in Fig. 2. Fig. 2 is a flowchart of the method for processing text data according to Embodiment 1 of this application.
As shown in Fig. 2, the method may include the following steps:
Step S21: read the text data of a plurality of target objects that have already been released. A target object may be any one of the following: a film, a TV series, a stage play, a documentary, a speech, or an advertisement.
Optionally, the text data of the released target objects may be stored in a database. In step S21, when target objects of a certain class need to be screened from a massive pool of unreleased target objects, the stored text data of the released target objects can be read from the database, so that the required class of target objects can be screened accurately and objectively from the unreleased pool on the basis of the analysis of the released target objects.
In an optional embodiment, the text data of a target object may be text data that characterizes the features of the target object. Optionally, it may include, but is not limited to, the title of the target object, its lead actors and their roles, its genre, its release region, its language, its release date, its plot content (such as a plot synopsis), and its popularity data (for example, box office).
For example, the above embodiment is illustrated with the scenario of screening film scripts to invest in from a massive pool of film scripts, where the target object is a film. The database stores the text data (for example, historical film data) of a plurality of target objects (for example, films from past years). The historical film data includes, but is not limited to, each film's title, lead actors (roles must be indicated), genre, region, language, release date, plot synopsis, and box office. When film scripts to invest in need to be screened from the massive pool, the historical film data stored in the database can be analyzed and the scripts screened according to the analysis results, so that the scripts more worth investing in are identified accurately, the invested scripts are more likely to produce high box office, and the return on investment improves.
First, the historical film data of the released films from past years is read from the database, so that it can subsequently be analyzed and film scripts can be screened according to the analysis results.
Further, the above embodiment is illustrated by taking the reading of the text data of a single target object as an example.
For example, step S21 is illustrated with the film "So Young" as the target object. Reading the text data of "So Young" from the database yields the following fields: title, lead actors (with roles), genre, language, release date, plot synopsis, and box office, where:
1) Title: So Young (literally, "To the Youth We Will Eventually Lose");
2) Lead actors: Yang Zishan (female lead 1), Zhao Youting (male lead 1), Han Geng (male lead 2), Jiang Shuying (female lead 2), Liu Yase (female lead 3), Zhang Yao (female lead 4), Bao Bei'er (male lead 3), Zheng Kai (male lead 4), Wang Jiajia (female lead 5), and others;
3) Genre: romance;
4) Language: Mandarin Chinese;
5) Release date: 2013-04-26;
6) Plot synopsis: Eighteen-year-old Zheng Wei (played by Yang Zishan) finally gets her wish and is admitted to the university next to the one attended by Lin Jing (played by Han Geng), the boy next door she grew up with. She steps onto campus full of anticipation, only to suffer a blow: Lin Jing has gone abroad to study and vanished without a trace. In her dejection, Zheng Wei forms deep friendships with her roommates Ruan Guan (played by Jiang Shuying), Zhu Xiaobei (played by Liu Yase), and Li Weijuan (played by Zhang Yao), and with the older student Lao Zhang (played by Bao Bei'er). Meanwhile, Xu Kaiyang (played by Zheng Kai), the son of a wealthy family, pursues Zheng Wei ardently, while Ruan Guan, popular with the boys, stays steadfastly loyal in her own cool way to her boyfriend Zhao Shiyong (played by Huang Ming). An accidental misunderstanding turns Zheng Wei and Lao Zhang's roommate Chen Xiaozheng (played by Zhao Youting) into sworn enemies. Through round after round of retaliation, Zheng Wei discovers that she has fallen for this top student, cold on the surface but kind at heart, and her fierce counterattacks turn into a relentless pursuit. Chen Xiaozheng finally surrenders under the onslaught, and the bickering pair become a happy couple. As graduation approaches in their fourth year, Zheng Wei's life is tested again: Chen Xiaozheng obtains a study-abroad place from Zeng Yu (played by Wang Jiajia) but keeps putting off telling Zheng Wei, and Zheng Wei, feeling deceived once more, leaves him in pain. Years later, Zheng Wei has become a polished white-collar professional, and she once again tastes the fickleness of fate: Lin Jing and Chen Xiaozheng, both full of regret and love, return to her life at the same time. How will Zheng Wei, once the "Jade-Faced Little Flying Dragon", face the fog and the choices that life and youth place before her ...;
7) Box office: 726000000.
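The record read in step S21 can be pictured as a small data structure. The sketch below is only an assumption about how such a record might be held in memory: the class name `TargetObjectRecord`, the field names, and the English field values are hypothetical, and the application does not prescribe a storage schema. The release date and box office figure are the ones listed above; the synopsis is abbreviated and only three cast entries are shown.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TargetObjectRecord:
    """One released target object, with the fields listed for step S21."""
    title: str
    genre: str
    language: str
    release_date: date
    synopsis: str
    box_office: int
    cast: list = field(default_factory=list)  # (actor, role) pairs
    region: str = ""                          # not given in the example above


record = TargetObjectRecord(
    title="So Young",
    genre="romance",
    language="Mandarin Chinese",
    release_date=date.fromisoformat("2013-04-26"),
    synopsis="Eighteen-year-old Zheng Wei is admitted to university ...",  # abbreviated
    box_office=726_000_000,
    cast=[
        ("Yang Zishan", "female lead 1"),
        ("Zhao Youting", "male lead 1"),
        ("Han Geng", "male lead 2"),
    ],
)
```

Keeping the release date as a `date` rather than a string makes later screening by year straightforward.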
Step S23: screen the plurality of target objects to generate a target object to be analyzed.
Optionally, the plurality of target objects are screened according to the text data that has been read, and the class of target objects obtained by screening is used as the target object to be analyzed, where the target object to be analyzed is of the same type as the target objects that need to be screened.
In an optional embodiment, the plurality of target objects may be screened according to a preset screening rule and the text data that has been read, and subsequent processing is performed on the target object to be analyzed obtained by screening. Optionally, the text data that satisfies the preset screening rule is selected from the text data read, and the target objects to which that text data belongs are used as the target object to be analyzed. Screening thus yields a target object to be analyzed that is of the same type as the target objects that need to be screened.
Optionally, the target objects that have been read may be screened on the basis of the title of the target object, its lead actors and their roles, its genre, its release region, its language, its release date, its plot content (such as a plot synopsis), and its popularity data (for example, box office).
The above embodiment is again illustrated with the scenario of screening film scripts to invest in from a massive pool of film scripts, where the target object is a film. When a producer wants to invest in romance films, romance scripts to invest in need to be screened from the massive pool. Before a script is selected, released romance films can be analyzed to determine which kinds of plot audiences prefer; when scripts are selected, those that audiences are more likely to enjoy are chosen according to the analysis results, so that scripts matching audience tastes are picked out and box office improves. For example, the text data of a plurality of target objects (such as massive historical film data) can be read from the database. Then, according to a preset screening rule (such as selecting popular romance films released from 2012 to 2015, where a film is considered popular when its box office exceeds a preset threshold), the film data satisfying the rule is selected from the historical film data, and the films corresponding to the selected data (such as the popular romance films released from 2012 to 2015) are used as the target object to be analyzed.
For example, after a large volume of film data from past years is read from the database, the romance films that were released from 2012 to 2015 and whose box office exceeds the preset threshold are filtered out of it, and the romance films obtained by the screening are taken as the above target objects to be analyzed.
In step S23 of the present application, a preliminary screening is performed on the massive number of target objects, and the target objects obtained by the screening are taken as the target objects to be analyzed. This selects the target objects that correspond to the actual need and removes target objects unrelated to it, which reduces the amount of data to be processed in subsequent steps and thereby improves data processing efficiency.
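The screening rule described for step S23 can be sketched as a simple filter. This is a minimal illustration only: the `Film` record, its field names, the sample data, and the threshold value are assumptions made for the example and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class Film:
    # Hypothetical record layout; the application does not define one.
    title: str
    genre: str
    release_year: int
    box_office: float  # illustrative unit, e.g. millions


def screen_films(films, genre, first_year, last_year, box_office_threshold):
    """Keep films of one genre, released in [first_year, last_year],
    whose box office exceeds the preset threshold."""
    return [
        f for f in films
        if f.genre == genre
        and first_year <= f.release_year <= last_year
        and f.box_office > box_office_threshold
    ]


films = [
    Film("A", "romance", 2013, 700.0),
    Film("B", "romance", 2011, 900.0),   # outside the year range
    Film("C", "action", 2014, 1200.0),   # wrong genre
    Film("D", "romance", 2014, 50.0),    # below the threshold
]
selected = screen_films(films, "romance", 2012, 2015, 100.0)
print([f.title for f in selected])
```

Only the films that pass all three conditions go on to text preprocessing, which is how the screening reduces the data volume handled by later steps.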
Step S25: perform text preprocessing on the text data of the target objects to be analyzed to obtain multiple segmented plots of the target objects to be analyzed.
Specifically, after the target objects to be analyzed are obtained by screening, text preprocessing is performed on their text data to obtain multiple segmented plots of the target objects to be analyzed. Optionally, text preprocessing may include, but is not limited to, text extraction, sentence splitting, deduplication, and merging.
In an optional embodiment, text preprocessing converts the text data of the target objects to be analyzed into multiple comparable, processable segmented plots, so that the segmented plots obtained by the division can be used for modeling in subsequent processing. Optionally, text preprocessing can be performed on the plot content in the text data of the target objects to be analyzed, converting the plot content into multiple comparable, processable segmented plots. The plot content of the target objects to be analyzed is thereby converted into the individual stages of concrete plot development, which provides a basis for subsequently mining and extracting the general rules of plot development of the target objects to be analyzed.
Taking again the application scenario in which film scripts to invest in are selected from a massive pool of film scripts and the target object is a film, the above embodiment of the present application is described in detail. Suppose the target objects to be analyzed are the popular romance films released from 2013 to 2015 obtained by screening, and text preprocessing comprises text extraction, sentence splitting, deduplication, and merging. Text extraction is performed on the text data of the popular romance films released from 2013 to 2015 to extract the plot content (such as the plot synopsis) from the text data. Sentence splitting and deduplication are then performed on the extracted plot synopsis to remove semantically repeated content from it, and merging combines the remaining clauses into multiple semantically coherent sentences, where each sentence represents a different part of the plot. The sentences obtained by merging serve as the above multiple segmented plots, so that the plot synopses of all popular romance films released from 2013 to 2015 are converted into comparable objects. This allows the plot development of popular romance films to be analyzed subsequently and, according to the analysis result, romance scripts more worth investing in to be selected from the massive pool of film scripts.
In step S25 of the present application, text preprocessing is performed on the text data of the target objects to be analyzed to convert the text data into multiple processable segmented plots, which facilitates the subsequent modeling process.
Step S27: model the multiple segmented plots of the target objects to be analyzed to obtain a probability model of the plot development of the target objects to be analyzed. The probability model characterizes the transition results between any two or more of the segmented plots contained in the multiple segmented plots of the target objects to be analyzed.
Optionally, a predetermined model is used to statistically model the obtained segmented plots and analyze the transition relationships between them, yielding the probability model of the plot development of the target objects to be analyzed, which characterizes the transition results between any two or more segmented plots that the target objects to be analyzed contain.
In an optional embodiment, after text preprocessing is performed on the text data of the target objects to be analyzed, a statistical model is trained on the obtained segmented plots to establish the transition relationships between the segmented plots, yielding a probability model of plot development that characterizes the development trend between segmented plots. Optionally, because this probability model is obtained by analyzing a massive number of target objects to be analyzed, it can serve as a general model of how the plot advances in the target objects to be analyzed and objectively reflects their plot content.
Taking again the application scenario in which film scripts to invest in are selected from a massive pool of film scripts and the target object is a film, the above embodiment of the present application is described in detail. After the target objects to be analyzed (e.g., the popular romance films released from 2013 to 2015) are obtained by screening, text preprocessing converts their text data (such as plot synopses) into multiple distinct segmented plots, so that the plots of the popular romance films released from 2013 to 2015 are converted into the individual stages of plot development. A predetermined statistical model is then trained on all segmented plots of all popular romance films released from 2013 to 2015 to obtain the transition results between segmented plots, for example the transition probability from one segmented plot to another. The size of a transition probability indicates how strongly one segmented plot tends to develop into another: the larger the transition probability, the more likely the first segmented plot is to develop into the second. The probability model of plot development makes it possible to establish the connections among the segmented plots obtained in the above embodiment and to determine the development trend of each segmented plot. Big data techniques can thereby mine the general rules by which the plot advances in popular romance films, and these rules help the producer make investment decisions based on the plot of a film script.
Step S29: use the probability model of the plot development of the target objects to be analyzed to obtain the probability of the plot state developments of the target objects to be analyzed. A plot state development may include any two or more segmented plots.
Optionally, after the probability model of plot development is generated from the segmented plots of the massive number of target objects to be analyzed, the model is used to calculate, based on the segmented plots that a plot state development contains, the probability of that plot state development. The general rules of development of the plot content of target objects of this type can then be analyzed from the probabilities of the individual plot state developments.
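One way to picture step S29 is to score a plot state development by walking through its states and multiplying the transition probabilities along the way. The application does not spell out how the probability is computed, so the product rule below, the state names, and the probability values are all assumptions for illustration.

```python
def development_probability(transitions, states):
    """Probability of following `states` in order, assuming (as an
    illustration) that it is the product of the step-by-step
    transition probabilities. Unknown transitions count as 0."""
    prob = 1.0
    for current, nxt in zip(states, states[1:]):
        prob *= transitions.get(current, {}).get(nxt, 0.0)
    return prob


# Hypothetical transition probabilities between abstract plot states.
transitions = {
    "meet": {"pursue": 0.6, "part": 0.4},
    "pursue": {"accept": 0.5, "part": 0.5},
}

likely = development_probability(transitions, ["meet", "pursue", "accept"])
unseen = development_probability(transitions, ["meet", "accept"])
print(likely, unseen)
```

Under this assumption, a development that never occurred in the analyzed films scores zero, and developments can be ranked against each other by their scores.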
Taking again the application scenario in which film scripts to invest in are selected from a massive pool of film scripts and the target object is a film, the above embodiment of the present application is described in detail. Suppose the target objects to be analyzed are the popular romance films released from 2013 to 2015. After the probability model of plot development is built from the segmented plots converted from the text data (such as plot synopses) of the target objects to be analyzed, the model is used to calculate the probability of each plot state development, where different plot state developments contain different segmented plots. The larger the probability of a plot state development, the more likely audiences are to enjoy that kind of plot, and the more likely a film is to unfold its plot along that plot state development. From the probabilities of the different plot state developments of popular romance films, it is possible to determine which plot state developments audiences prefer in romance films and, based on that result, to select the scripts more worth investing in from the massive pool of new film scripts.
Further, based on the probabilities of plot state developments, the producer can easily learn from the text data how the plots of films from past years developed, and compare this with the new scripts at hand that are awaiting screening. If the probability of the plot state development of a new script is low, that is, audiences are unlikely to enjoy it, the script can be dropped directly. If the probability of the plot state development of a new script is high, that is, it closely matches plots that audiences are likely to enjoy, the producer can work with the screenwriter to refine the plot and obtain a better script, further improving the success rate of the investment.
In the scheme disclosed in Embodiment 1 of the present application, if a certain class of target objects is to be selected from a massive number of target objects that have not yet been broadcast, the text data of multiple target objects that have already been broadcast can be read and screened to obtain target objects to be analyzed that are of the same type as the required target objects. Text preprocessing is then performed on the text data of the target objects to be analyzed to obtain their multiple segmented plots, and the segmented plots are modeled to obtain the probability model of the plot development of the target objects to be analyzed. The scheme can then use this probability model to obtain the probabilities of the plot state developments of the target objects to be analyzed, and select the required target objects from the massive number of target objects not yet broadcast according to the obtained probabilities.
It is easy to see that, when the required target objects are selected from a massive number of target objects not yet broadcast, only the text data of multiple already-broadcast target objects needs to be analyzed. Statistical modeling yields the probabilities of the plot state developments of target objects to be analyzed that are of the same type as the required target objects. Because these probabilities are derived from the text data of already-broadcast target objects and reflect objective reality, they make it possible to analyze objectively which target objects of the required type, among those not yet broadcast, audiences are more likely to enjoy. The scheme provided by this embodiment therefore requires no manual reading of the text data of massive numbers of target objects, and can mine the general rules of plot state development of a class of target objects from the text data of a massive number of already-broadcast target objects. This not only allows the required target objects that audiences are more likely to enjoy to be selected accurately and objectively from the target objects not yet broadcast according to those general rules. In addition, while the general rules are being mined, screening and text preprocessing reduce the amount of data that must be processed. The required target objects can therefore be selected objectively, accurately, and efficiently from the massive number of target objects not yet broadcast.
Thus, the scheme of Embodiment 1 provided by the present application solves the technical problem in the prior art that, when target objects are screened, the screening result is inaccurate because manually reading the text data of target objects is highly subjective.
According to the above embodiment of the present application, step S23, screening the multiple target objects to generate the target objects to be analyzed, may include:
Step S231: classify the multiple target objects using preset theme types to obtain the group of target objects contained in each theme type.
Specifically, using the preset theme types and according to the type recorded in the text data of the multiple target objects that were read, the multiple target objects are classified into multiple groups, each group corresponding to one theme type.
Optionally, the preset theme types may include comedy, tragedy, history, action, romance, crime, horror, suspense, animation, fantasy, family, and other types; the present application does not limit the specific division of theme types.
In an optional embodiment, after the text data of multiple already-broadcast target objects is read, a coarse-grained classification can first be made according to the type recorded in the text data of each target object, and preset theme types (such as theme types generated by LDA, latent Dirichlet allocation) can then be used to divide the target objects further into fine-grained theme types. For example, after a target object (such as a film) is assigned to the romance theme type, it can be further divided into theme types such as youth, marriage, and war.
Continuing with the application scenario in which film scripts to invest in are selected from a massive pool of film scripts and the target object is a film, the above embodiment of the present application is described in detail with reference to the embodiment shown in Fig. 3, using the text data of one target object as an example. For example, as shown in Fig. 3, with the target object being a film, after the text data of multiple already-released target objects (such as film data from past years) is read from the database 30 that stores film data from past years, for one of the target objects (such as "So Young"), the film theme classifier 31 can assign "So Young" to the romance theme type. Further, the topic model generated by LDA can place "So Young" in the youth subcategory.
Step S233: select, from any group of target objects and according to a predetermined rule, the objects whose attention level exceeds a predetermined threshold, to obtain the target objects to be analyzed.
Specifically, after the multiple target objects are classified using the preset theme types to obtain the group of target objects contained in each theme type, popular objects (that is, objects whose attention level exceeds the predetermined threshold) are selected from any group according to the predetermined rule, and the selected objects are taken as the target objects to be analyzed.
Optionally, the predetermined rule may include, but is not limited to: selecting the target objects whose attention level exceeds the predetermined threshold.
In an optional embodiment, if the target object is a film, a stage play, or a speech, the attention level may be the box office; if the target object is a TV series or an advertisement, the attention level may be the audience rating.
Optionally, step S233 is intended to select the popular target objects of each category from the target objects that have already been classified. In the above embodiment of the present application, classification comes first and popular target objects are generated afterward because the attention level of some types of target object is inherently modest. If popular target objects were generated first, target objects whose attention level is inherently modest would never be selected. Taking films as an example, films of some genres, such as art-house films, inherently earn a modest box office; if popular films were generated first, art-house films might not appear in the screening result at all.
Further, in this embodiment, because what is popular in film changes quickly, film data from the most recent three to five years can optionally be used, and the predetermined rule is used to select the films whose box office far exceeds the average as popular films. The predetermined rule may be, but is not limited to, "select the films whose box office exceeds the median box office of their category in a given year".
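The "classify first, then pick popular objects" order can be sketched with the median rule quoted above. The tuple layout, theme names, and box-office figures below are made up for illustration and are not taken from the application.

```python
from collections import defaultdict
from statistics import median


def popular_by_category(films):
    """films: iterable of (title, theme, year, box_office) tuples.
    Returns the films whose box office exceeds the median of their
    own (theme, year) group."""
    groups = defaultdict(list)
    for film in films:
        _, theme, year, _ = film
        groups[(theme, year)].append(film)
    popular = []
    for group in groups.values():
        cutoff = median(f[3] for f in group)
        popular.extend(f for f in group if f[3] > cutoff)
    return popular


films = [
    ("R1", "romance", 2014, 900), ("R2", "romance", 2014, 300),
    ("R3", "romance", 2014, 100),
    ("A1", "art-house", 2014, 30), ("A2", "art-house", 2014, 10),
    ("A3", "art-house", 2014, 5),
]
titles = sorted(f[0] for f in popular_by_category(films))
print(titles)
```

With a single global cutoff (say 100) no art-house film would qualify, whereas the per-category median still keeps the strongest art-house title. That is the reason the text gives for classifying before selecting.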
Continuing with the application scenario in which film scripts to invest in are selected from a massive pool of film scripts and the target object is a film, the above embodiment of the present application is described in detail with reference to the embodiment shown in Fig. 3. For example, as shown in Fig. 3, after the film theme classifier 31 assigns "So Young" to the romance-youth theme type, the popular film generator 32 decides whether to classify "So Young" as a popular film according to whether its box office exceeds the predetermined threshold. If the box office of "So Young" exceeds the predetermined threshold, it is classified as a popular film, and the theme of the film and the film's own information are output in the classified popular film list 33, for example:
1) Title: To Our Youth That Is Fading Away (So Young);
2) Starring: Yang Zishan (Female 1), Zhao Youting (Male 1), Han Geng (Male 2), Jiang Shuying (Female 2), Liu Yase (Female 3), Zhang Yao (Female 4), Bao Bei'er (Male 3), Zheng Kai (Male 4), Wang Jiajia (Female 5), Huang Ming (Male 5);
3) Language: Mandarin Chinese;
4) Release date: 2013-04-26;
5) Plot synopsis: Eighteen-year-old Zheng Wei (played by Yang Zishan) finally gets her wish and is admitted to the school next to that of Lin Jing (played by Han Geng), the boy next door she grew up with. Full of anticipation, she steps onto campus, only to be hit by the news that Lin Jing has gone abroad to study and vanished without a trace. In her time of loss and hardship, Zheng Wei forges a deep friendship with her roommates Ruan Guan (played by Jiang Shuying), Zhu Xiaobei (played by Liu Yase), and Li Weijuan (played by Zhang Yao), and with the senior student Lao Zhang, Zhang Kai (played by Bao Bei'er). Meanwhile Xu Kaiyang (played by Zheng Kai), the son of a wealthy family, pursues Zheng Wei ardently, and Ruan Guan, popular with the boys, guards her loyalty to her beloved Zhao Shiyong (played by Huang Ming) with her characteristic coolness. An accidental misunderstanding makes Zheng Wei and Lao Zhang's roommate Chen Xiaozheng (played by Zhao Youting) sworn enemies. Over round after round of retaliation, Zheng Wei discovers she has fallen in love with this outwardly cold, inwardly kind top student, and her furious counterattack turns into a relentless pursuit. Chen Xiaozheng finally surrenders under the onslaught, and the quarrelsome pair become a happy couple. At graduation in their fourth year, Zheng Wei's life is tested again: Chen Xiaozheng obtains the study-abroad place of Zeng Yu (played by Wang Jiajia) but cannot bring himself to tell Zheng Wei, and Zheng Wei, feeling deceived once more, leaves Chen Xiaozheng in anguish. Many years later, Zheng Wei has become a white-collar beauty in the workplace and once more tastes the fickleness of fate: Lin Jing, full of remorse and love, and Chen Xiaozheng return to her life at the same time! How will Zheng Wei, once the "Jade-Faced Little Flying Dragon", face the fog and the choices that life and youth have handed her ...;
6) Theme type: Romance-youth.
According to the above embodiment of the present application, step S25, performing text preprocessing on the text data of the target objects to be analyzed to obtain multiple segmented plots of the target objects to be analyzed, may include:
Step S251: extract the plot content of the target objects to be analyzed from the text data of the target objects to be analyzed.
Specifically, the plot content of a target object to be analyzed is extracted from the text data of that target object.
Continuing with the application scenario in which film scripts to invest in are selected from a massive pool of film scripts and the target object is a film, the above embodiment of the present application is described in detail with reference to the embodiment shown in Fig. 3. For example, after the plot content (such as the plot synopsis) of a target object to be analyzed (such as the popular romance films of 2013 to 2015) is extracted from its text data (such as film data from past years), the plot synopsis of the film that was read is input to the film synopsis segmenter 34.
Step S252: perform fine-grained or coarse-grained sentence splitting on the plot content of the target objects to be analyzed.
Optionally, punctuation marks (such as commas, periods, and semicolons) can be used to perform fine-grained sentence splitting on the plot content, or punctuation marks (such as periods) can be used to perform coarse-grained sentence splitting on the plot content.
In an optional embodiment, after coarse-grained sentence splitting is performed on the plot content of the target objects to be analyzed, the clauses obtained from the coarse-grained splitting are split further according to a predetermined sentence-splitting rule.
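The two granularities of step S252 can be sketched with a regular-expression split on punctuation. The delimiter sets and the sample sentence are assumptions chosen for illustration; a real synopsis would also need care with decimals and abbreviations, which this sketch ignores.

```python
import re

# Assumed delimiter sets: sentence-final marks for coarse splitting,
# plus commas, semicolons, and colons for fine splitting.
COARSE = "。.!?！？"
FINE = COARSE + "，,;；:："


def split_clauses(text, delimiters):
    """Split text on runs of the given punctuation marks and drop
    empty pieces."""
    pattern = "[" + re.escape(delimiters) + "]+"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


text = ("Zheng Wei enters university, full of hope. "
        "Lin Jing goes abroad; he vanishes.")

# Coarse pass first, then a finer pass over each coarse sentence.
sentences = split_clauses(text, COARSE)
clauses = [c for s in sentences for c in split_clauses(s, FINE)]
print(sentences)
print(clauses)
```

Running the coarse pass first keeps track of sentence boundaries, which later merging rules (for example, not joining clauses across a period) can rely on.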
In an optional embodiment, after step S252 performs fine-grained or coarse-grained sentence splitting on the plot content of the target objects to be analyzed, the above method may further include:
Step S2521: deduplicate the clauses produced by sentence splitting using a semantic model to obtain the deduplicated clauses.
Optionally, a currently mature semantic model can be used to remove semantically repeated clauses, yielding multiple deduplicated clauses.
Step S2523: merge the semantically coherent clauses among the deduplicated clauses.
Optionally, pronouns and conjunctions, supplemented by some rules (for example, clauses separated by a period cannot be joined), can be used to piece the deduplicated clauses of this embodiment back together, so that semantically coherent clauses are merged.
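Steps S2521 and S2523 can be sketched as follows. The application relies on an unspecified semantic model for deduplication; the word-overlap (Jaccard) similarity used here is only a crude stand-in, and the linking-word list, the similarity threshold, and the sample clauses are likewise assumptions.

```python
def jaccard(a, b):
    """Word-set overlap; a crude stand-in for a semantic model."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def deduplicate(clauses, threshold=0.8):
    """Keep a clause only if it is not too similar to one already kept."""
    kept = []
    for clause in clauses:
        if all(jaccard(clause, k) < threshold for k in kept):
            kept.append(clause)
    return kept


# Assumed pronouns and conjunctions that signal a continuation.
LINKING_WORDS = {"he", "she", "they", "it", "and", "but", "then", "so"}


def merge_coherent(clauses):
    """clauses: list of (text, ends_sentence) pairs, where ends_sentence
    is True if the clause was followed by a period. A clause is attached
    to the previous one when it starts with a linking word and the
    previous clause did not end a sentence."""
    merged = []
    prev_ended = True
    for text, ends_sentence in clauses:
        words = text.split()
        first = words[0].lower() if words else ""
        if merged and not prev_ended and first in LINKING_WORDS:
            merged[-1] = merged[-1] + ", " + text
        else:
            merged.append(text)
        prev_ended = ends_sentence
    return merged


unique = deduplicate([
    "Lin Jing goes abroad",
    "lin jing goes abroad",
    "Zheng Wei is heartbroken",
])
merged = merge_coherent([
    ("Zheng Wei enters university", False),
    ("but Lin Jing goes abroad", True),
    ("She makes new friends", True),
])
print(unique)
print(merged)
```

The third clause starts with a pronoun but is not merged, because the clause before it ended with a period; this mirrors the example rule in the text that period-separated clauses cannot be joined.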
Continuing with the application scenario in which film scripts to invest in are selected from a massive pool of film scripts and the target object is a film, the above embodiment of the present application is described in detail with reference to the embodiment shown in Fig. 3. As shown in Fig. 3, the film synopsis segmenter 34 divides the plot synopsis of the input film into several parts. Continuing with "So Young" as the example, the film synopsis segmenter 34 can produce the following parts:
A) Eighteen-year-old Zheng Wei (played by Yang Zishan) finally gets her wish and is admitted to the school next to that of Lin Jing (played by Han Geng), the boy next door she grew up with;
B) Lin Jing goes abroad to study and vanishes without a trace;
C) In her time of loss and hardship, Zheng Wei forges a deep friendship with her roommates Ruan Guan (played by Jiang Shuying), Zhu Xiaobei (played by Liu Yase), and Li Weijuan (played by Zhang Yao), and with the senior student Lao Zhang, Zhang Kai (played by Bao Bei'er);
D) Meanwhile Xu Kaiyang (played by Zheng Kai), the son of a wealthy family, pursues Zheng Wei ardently;
E) Ruan Guan, popular with the boys, guards her loyalty to her beloved Zhao Shiyong (played by Huang Ming) with her characteristic coolness;
F) An accidental misunderstanding makes Zheng Wei and Lao Zhang's roommate Chen Xiaozheng (played by Zhao Youting) sworn enemies;
G) Over round after round of retaliation, Zheng Wei discovers she has fallen in love with this outwardly cold, inwardly kind top student, and her furious counterattack turns into a relentless pursuit;
H) Chen Xiaozheng finally surrenders under the onslaught, and the quarrelsome pair become a happy couple;
I) Chen Xiaozheng obtains the study-abroad place of Zeng Yu (played by Wang Jiajia) but cannot bring himself to tell Zheng Wei;
J) Zheng Wei, feeling deceived once more, leaves Chen Xiaozheng in anguish;
K) Many years later, Zheng Wei has become a white-collar beauty in the workplace and once more tastes the fickleness of fate: Lin Jing, full of remorse and love, and Chen Xiaozheng return to her life at the same time.
Step S253: abstract the plot content after sentence splitting to obtain multiple segmented plots.
Specifically, step S253 is intended to extract the main content from the plot content after sentence splitting, to facilitate subsequent statistical analysis and modeling.
In an optional embodiment, step S253, abstracting the plot content after sentence splitting to obtain multiple segmented plots, may include:
Step S2531: perform word segmentation on the plot content after sentence splitting and remove stop words to obtain multiple segmented plots.
After the multiple segmented plots are obtained, the sentences that meet a preset condition are extracted from each segmented plot according to semantics, and predetermined general words replace the subjects in the sentences that meet the preset condition.
Specifically, after word segmentation is performed on the plot content after sentence splitting, the stop words in it are removed to obtain multiple segmented plots. The sentences that meet the preset condition are then extracted from each segmented plot according to semantics, and predetermined general words replace the subjects in those sentences.
In an optional embodiment, extracting the sentences that meet the preset condition from each segmented plot according to semantics can use a currently mature subject recognition method to extract the main part of each segmented plot.
Continuing with the application scenario in which film scripts to invest in are selected from a massive pool of film scripts and the target object is a film, the above embodiment of the present application is described in detail with reference to the embodiment shown in Fig. 3. As shown in Fig. 3, continuing with "So Young" as the example, after the film synopsis segmenter 34 produces the above parts, the film plot abstractor 35 segments the sentences in those parts into words and removes the stop words, then uses a currently mature subject recognition method to extract the main part of the text that remains, and uses the cast information to abstract the characters into predetermined general words (such as the film terms "Female 1" and "Male 1"). The film plot abstractor 35 can thus convert the above content a) to k) into the following:
A) Female 1 is admitted to Male 2's university;
B) Male 2 goes abroad and falls silent;
C) Female 1 becomes good friends with Female 2, Female 3, Female 4, and Male 3;
D) Male 4 pursues Female 1;
E) Female 2 pursues Male 5;
F) Female 1 and Male 1 accidentally become sworn enemies;
G) Female 1 pursues Male 1;
H) Male 1 accepts Female 1;
I) Male 1 obtains Female 5's study-abroad place and does not tell Female 1;
J) Female 1 leaves Male 1;
K) Female 1 and Male 1 meet again many years later.
Further, as shown in Fig. 3, after the film plot abstractor 35 obtains the above content, the content can be organized further and input to the segmented plot list 36. The organized content is, for example:
1) Plot number: 1; Subject: Female 1; Predicate: is admitted to; Object: Male 2, university;
2) Plot number: 2; Subject: Male 2; Predicate: goes abroad.
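The character abstraction and the structured entries of the segmented plot list can be sketched as below. The application uses an unspecified subject recognition method; this sketch only shows the name-to-role substitution and stop-word removal on English text, and the `SegmentedPlot` class, the stop-word list, and the three-name role table are assumptions for illustration (the role labels follow the cast list in the example).

```python
from dataclasses import dataclass, field

# Character name -> general word, following the cast list in the example.
ROLE_LABELS = {
    "Zheng Wei": "Female 1",
    "Chen Xiaozheng": "Male 1",
    "Lin Jing": "Male 2",
}
STOP_WORDS = {"the", "a", "an", "finally", "then"}  # illustrative only


def abstract_clause(clause, role_labels=ROLE_LABELS, stop_words=STOP_WORDS):
    """Replace character names with general role words, then drop
    stop words. Longer names are replaced first to avoid partial hits."""
    for name in sorted(role_labels, key=len, reverse=True):
        clause = clause.replace(name, role_labels[name])
    words = [w for w in clause.split() if w.lower() not in stop_words]
    return " ".join(words)


@dataclass
class SegmentedPlot:
    # Hypothetical container mirroring the fields of the plot list.
    number: int
    subject: str
    predicate: str
    objects: list = field(default_factory=list)


abstracted = abstract_clause("Zheng Wei finally leaves Chen Xiaozheng")
entry = SegmentedPlot(1, "Female 1", "is admitted to", ["Male 2", "university"])
print(abstracted)
print(entry)
```

Replacing names with role words is what makes plots from different films comparable, since "Female 1 leaves Male 1" can then be counted across films regardless of the characters' names.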
According to the above embodiments of the present application, step S27, a plurality of segmentation plot of destination object to be analyzed is modeled,
Obtain the probabilistic model of the dramatic progression of destination object to be analyzed, can include:
Step S271, is modeled to a plurality of segmentation plot of destination object to be analyzed using any one model following:
Markov chain model, Hidden Markov chain model and bivariate Bayesian hierarchical approach.
Wherein, the transfer that the probabilistic model of the dramatic progression of destination object to be analyzed is included between any two state is general
Rate, state includes at least one segmentation plot.
Optionally, step S271 models the segmented plots with a statistical model to obtain the probability model of plot development. A Markov chain model can be used to calculate the transition probability from one state to another. To make similar plots aggregate more easily into the same state, a hidden Markov chain model or a bivariate Bayesian hierarchical model can be used to model the plurality of segmented plots.
In this embodiment, the output of each of the above models is the set of transition probabilities between states.
Optionally, the transition probability between any two states can be calculated according to the following formula:
transition probability from state x to state y = (number of occurrences of a transition from state x to state y) / (number of occurrences of a transition from state x to any state),
where x and y are natural numbers, and state x and state y denote any two states in the above probability model of plot development.
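The counting formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `transition_probabilities`, the representation of each analyzed object as a list of state numbers, and the sample chains are all hypothetical and do not come from the application.

```python
from collections import Counter

def transition_probabilities(chains):
    """Estimate P(x -> y) = count(x -> y) / count(x -> any state).

    `chains` is a hypothetical input format: one list of state ids per
    analyzed target object, in the order the states occur.
    """
    pair_counts = Counter()  # occurrences of each transition x -> y
    out_counts = Counter()   # occurrences of any transition leaving x
    for chain in chains:
        for x, y in zip(chain, chain[1:]):
            pair_counts[(x, y)] += 1
            out_counts[x] += 1
    return {(x, y): n / out_counts[x] for (x, y), n in pair_counts.items()}

# Made-up state sequences, for illustration only.
chains = [[1, 2, 3, 6], [1, 2, 4, 5, 6], [1, 2, 3, 6], [2, 3, 5]]
probs = transition_probabilities(chains)
```

With these sample chains, three of the four transitions leaving state 2 go to state 3, so `probs[(2, 3)]` is 0.75 and `probs[(2, 4)]` is 0.25.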
Optionally, after the plurality of segmented plots of the target objects to be analyzed have been obtained and modeled using any one of the above models, the probability that state x changes into state y is counted in the resulting probability model and used as the transition probability from state x to state y. Specifically, over all states, the number of occurrences A of the event "state x changes into state y" is counted, together with the number of occurrences B of the event "state x changes into any state" (state y included). The ratio A/B is used as the transition probability from state x to state y; that is, the share of all transitions out of state x that lead to state y characterizes the probability that state x changes into state y.
In an optional embodiment, the number of occurrences of transitions from state x to state y and from state x to all states can be counted with the Hadoop MapReduce tool.
Continuing with the application scenario in which screenplays to invest in are screened from a massive number of screenplays and the target object is a film, the above embodiments of the present application are described in detail with reference to the embodiment shown in Fig. 3. As shown in Fig. 3, the segmented plot list 36 inputs the plurality of segmented plots into the plot development model 37, which models them using the Markov chain model, the hidden Markov chain model or the bivariate Bayesian hierarchical model and obtains multiple states and the transition probabilities between them. For example, when a large share of the segmented plots concern love triangles, the plot development model 37 may produce the following result:
1) State 1: Female 1 and Female 2 are best friends; Female 1 and Female 2 are classmates; Female 1 and Female 2 have known each other for a long time;
2) State 2: Female 1 and Male 1 are a couple; Female 1 has a secret crush on Male 1;
3) State 3: Female 2 meets Male 1 at a party; Female 1, Female 2 and Male 1 usually get along very well;
4) State 4: Female 1 falls ill; Female 1 has a car accident;
5) State 5: Female 2 and Male 1 look after Female 1 together; Female 2 and Male 1 run into each other at the hospital by chance;
6) State 6: Sparks fly between Female 2 and Male 1;
……
Here, the transition probability from state 1 to state 2 is 0.9, from state 2 to state 3 is 0.7, from state 2 to state 4 is 0.3, from state 3 to state 6 is 0.5, from state 4 to state 5 is 0.6, and from state 5 to state 6 is 0.4.
In an optional embodiment, step S29, in which the probability model of the plot development of the target objects to be analyzed is used to obtain the probability of a plot state development of the target objects to be analyzed, may include:
Step S291: obtaining any one or more plot state developments, where a plot state development is a state chain, and a state chain includes at least one state and the transition order of the states.
Step S293: calculating the probability of the plot state development using the transition probabilities between adjacent states.
Optionally, the transition probability between each pair of adjacent states in the one or more state chains is obtained and used to calculate the probability of the state chain (that is, of the plot state development).
In an optional embodiment, the state chain may include a plurality of states and the transition order of the states, and the probability P of the plot state development can be calculated by the following formula:
$P = P(A_1) \times P(A_2) \times \cdots \times P(A_i) \times \cdots \times P(A_n)$, where $P(A_i)$ denotes the transition probability between two adjacent states in the chain and i is a natural number.
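As a worked illustration of this product, the sketch below multiplies the transition probabilities along a state chain, using the example values given above for states 1 to 6. The function name `chain_probability` and the choice to treat a transition that is absent from the table as probability 0 are assumptions made for the example, not part of the application.

```python
import math

def chain_probability(chain, transitions):
    """Multiply the transition probabilities of adjacent states in `chain`.

    Assumption: a transition missing from `transitions` counts as 0.0.
    """
    return math.prod(
        transitions.get((x, y), 0.0) for x, y in zip(chain, chain[1:])
    )

# Example transition probabilities quoted in the description above.
transitions = {
    (1, 2): 0.9, (2, 3): 0.7, (2, 4): 0.3,
    (3, 6): 0.5, (4, 5): 0.6, (5, 6): 0.4,
}

p_a = chain_probability([1, 2, 3, 6], transitions)     # 0.9 * 0.7 * 0.5
p_b = chain_probability([1, 2, 4, 5, 6], transitions)  # 0.9 * 0.3 * 0.6 * 0.4
```

Under these values the chain 1, 2, 3, 6 is more probable than the chain 1, 2, 4, 5, 6, which matches the intuition that the longer detour through states 4 and 5 accumulates more factors below 1.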
Optionally, because there is a massive number of already-released target objects, a single state may contain many segmented plots. So that an investor can see at a glance what a state means, a model needs to be built for the state to serve as its summary. The TF-IDF model, which is currently relatively mature, can be used to find the words that best reflect the state. Before this model is applied, the subjects in the state are removed first, and only the predicates, adverbials and other words that describe the events are fed into the model.
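The summarization step can be illustrated with a small pure-Python TF-IDF sketch. It assumes that each state has already been reduced to a non-empty list of event words with the subjects removed; the function name `top_terms`, the token lists and the alphabetical tie-breaking are hypothetical choices for this example, and the application does not specify a particular TF-IDF variant.

```python
import math
from collections import Counter

def top_terms(state_docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each state.

    `state_docs` maps a state id to its (non-empty) list of event words.
    Uses tf = count / length and idf = log(N / document frequency).
    """
    n = len(state_docs)
    df = Counter()
    for tokens in state_docs.values():
        df.update(set(tokens))  # count each term once per state
    summary = {}
    for state, tokens in state_docs.items():
        tf = Counter(tokens)
        scores = {
            t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()
        }
        # Highest score first; ties are broken alphabetically.
        summary[state] = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return summary

# Made-up event words per state, for illustration only.
state_docs = {
    1: ["best_friends", "classmates", "know", "best_friends"],
    2: ["couple", "crush", "likes", "likes"],
    3: ["meet", "party", "know"],
}
summary = top_terms(state_docs, k=2)
```

Words that appear in several states, such as "know" here, receive a lower weight than words that are specific to one state, which is why they drop out of the summary.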
Continuing with the application scenario in which screenplays to invest in are screened from a massive number of screenplays and the target object is a film, the above embodiments of the present application are described in detail with reference to the embodiment shown in Fig. 3. As shown in Fig. 3, after the plot development model 37 obtains the states, the plot description model 38 generates a state description for each state using the TF-IDF model. Continuing with the film 《致青春》 ("So Young") as the example, a possible output of the plot description model 38 is as follows:
State: 1; Subjects: Female 1, Female 2; Events: on good terms, best friends, good friends;
State: 2; Subjects: Female 1, Male 1; Events: likes, loves, secret crush;
……
After the plot description model 38 outputs its results to the film storyline list 39, the film storyline list 39 calculates the probability of each state chain using the transition probabilities of the plot description model 38. Its output takes the form of a plot state development (that is, a state chain) and the corresponding probability, as follows:
Plot state development: 1; State chain: 1, 2, 3, 5; Probability: 0.85;
Plot state development: 2; State chain: 2, 4, 6, 8; Probability: 0.7;
……
By associating the output of the film storyline list 39 with the state descriptions output by the plot description model 38, the meaning of each plot state development can be understood easily. The probability of a plot state development can be interpreted as the probability that audiences like that storyline.
Through the above embodiments of the present application, a theme-type film plot generation module based on the Markov chain model (the plot development model shown in Fig. 3) can, based on current hot topics and films from past years, use big-data text mining to analyze past films of different theme types and mine the popular plot content in them. The Markov chain model is then used to infer the development of a film's storyline, so that the potential of a new script can be evaluated automatically from the model's results; a new film script can also be generated automatically from the inferred storyline development.
In the embodiment shown in Fig. 3, big-data text mining is used to first break a massive number of films down into several subject categories at a certain granularity, then abstract the different stages of plot development from the popular films in the same category and mine the rules of film plot development. This helps film producers better understand which storylines audiences like and supports their investment decisions. Using the storyline as the point of entry, the above embodiments mine popular films in depth and output abstracted storyline outlines together with the probability that audiences like them. A producer can easily understand these plot developments from the text and compare them with a new script at hand: if the probability that audiences like the script's plot is very low, the script can be dropped directly; if the script largely matches plots that audiences are likely to enjoy, the producer can work with the screenwriter to refine the plot and obtain a better script.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present application.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present application.
Embodiment 2
According to an embodiment of the present application, a text data processing apparatus for implementing the above text data processing method is also provided. As shown in Fig. 4, the apparatus may include a reading unit 41, a generating unit 43, a processing unit 45, a modeling unit 47 and an acquiring unit 49.
The reading unit 41 is configured to read the text data of a plurality of already-released target objects, where a target object is any one of the following: a film, a TV series, a stage play, a documentary, a speech, or an advertisement.
Optionally, the text data of the already-released target objects can be stored in a database. With the reading unit 41, when a certain class of target object needs to be screened from a massive number of unreleased target objects, the stored text data of the already-released target objects can be read from the database, so that the required class can be screened accurately and objectively from the unreleased target objects on the basis of the analysis of the already-released ones.
In an optional embodiment, the text data of a target object is text data that characterizes the features of the target object. Optionally, it may include, but is not limited to, the title of the target object, its leading actors and their roles, its type, its release region, its language, its release date, its plot content (such as a synopsis), and its popularity data (for example, box office).
The generating unit 43 is configured to screen the plurality of target objects and generate the target objects to be analyzed.
Optionally, the plurality of target objects are screened according to their text data as read, and the class of target objects obtained by screening is taken as the target objects to be analyzed, where the target objects to be analyzed are of the same type as the target objects that need to be screened.
In an optional embodiment, the plurality of target objects can be screened according to a preset screening rule and their text data as read, and subsequent processing is performed on the target objects to be analyzed obtained by the screening. Optionally, text data that satisfies the preset screening rule can be selected from the text data read, and the target objects to which that text data belongs are taken as the target objects to be analyzed; that is, screening yields target objects to be analyzed of the same type as the target objects that need to be screened.
Optionally, the plurality of target objects read can be screened on the basis of the title of the target object, its leading actors and their roles, its type, its release region, its language, its release date, its plot content (such as a synopsis), and its popularity data (for example, box office).
In the generating unit 43, a massive number of target objects undergo preliminary screening, and the target objects obtained are taken as the target objects to be analyzed. This keeps the target objects that match the actual need and removes those unrelated to it, which reduces the amount of data to be handled in subsequent processing and thus improves data processing efficiency.
The processing unit 45 is configured to perform text preprocessing on the text data of the target objects to be analyzed to obtain a plurality of segmented plots of the target objects to be analyzed.
Specifically, after the target objects to be analyzed are obtained by screening, text preprocessing is performed on their text data to obtain the plurality of segmented plots. Optionally, text preprocessing may include, but is not limited to, text extraction, sentence splitting, deduplication and merging.
In an optional embodiment, text preprocessing converts the text data of the target objects to be analyzed into a plurality of comparable, processable segmented plots, so that these segmented plots can be used for modeling in subsequent processing. Optionally, text preprocessing can be performed on the plot content in the text data, converting it into a plurality of comparable, processable segmented plots. The plot content of the target objects to be analyzed is thereby converted into the individual stages of the plot's development, which provides the basis for subsequently mining and extracting the general rules of plot development of the target objects to be analyzed.
In the processing unit 45, text preprocessing converts the text data of the target objects to be analyzed into a plurality of processable segmented plots, which facilitates the subsequent modeling.
The modeling unit 47 is configured to model the plurality of segmented plots of the target objects to be analyzed to obtain a probability model of the plot development of the target objects to be analyzed, where the probability model characterizes the transition results between any two or more of the segmented plots.
Optionally, a predetermined model is used to statistically model the segmented plots obtained and analyze the transition relationships between them, yielding the probability model of plot development, which characterizes the transition results between any two or more segmented plots contained in the target objects to be analyzed.
In an optional embodiment, after text preprocessing has been performed on the text data of the target objects to be analyzed, a statistical model is trained on the resulting segmented plots to establish the transition relationships between them, yielding a probability model of plot development that characterizes the development trend between segmented plots. Optionally, because this probability model is obtained by analyzing a massive number of target objects to be analyzed, it can serve as a general model of how the plot advances in such target objects and objectively reflects their plot content.
The acquiring unit 49 is configured to use the probability model of the plot development of the target objects to be analyzed to obtain the probability of a plot state development of the target objects to be analyzed, where a plot state development includes any two or more segmented plots.
Optionally, after the probability model of plot development has been generated from the segmented plots of the massive number of target objects to be analyzed, the model is used to calculate, from the segmented plots that a plot state development contains, the probability of that plot state development. From the probabilities of the individual plot state developments, the general rules governing the plot content of target objects of this type can then be analyzed.
In the solution disclosed in Embodiment 2 of the present application, if a certain class of target object is to be screened from a massive number of unreleased target objects, the text data of a plurality of already-released target objects can be read and screened to obtain target objects to be analyzed of the same type as the required target objects. Text preprocessing is then performed on the text data of the target objects to be analyzed to obtain their segmented plots, and the segmented plots are modeled to obtain the probability model of plot development. With this probability model, the solution obtains the probabilities of the plot state developments of the target objects to be analyzed, and the required target objects are then screened from the massive number of unreleased target objects according to those probabilities.
It is easy to see that, when screening the required target objects from a massive number of unreleased target objects, it is only necessary to analyze the text data of the already-released target objects and, through statistical modeling, obtain the probabilities of the plot state developments of target objects to be analyzed of the same type as the required ones. On the basis of the text data of the already-released target objects and plot state development probabilities that reflect objective reality, it is then possible to analyze objectively which target objects of the required type among the unreleased ones are more likely to be liked by audiences. The solution provided by the embodiments of the present application therefore does not require the text data of a massive number of target objects to be read manually. The general rules of plot state development for a class of target objects can be mined from the text data of the already-released target objects, so that the required target objects that audiences are more likely to enjoy can be screened accurately and objectively from the unreleased ones. In addition, while these general rules are being mined, screening and text preprocessing reduce the amount of data to be processed. The required target objects can thus be screened from the massive number of unreleased target objects objectively, accurately and efficiently.
The solution of Embodiment 2 provided by the present application thus solves the technical problem in the prior art that, when target objects are screened, manually reading their text data is highly subjective and leads to inaccurate screening results.
According to the above embodiments of the present application, as shown in Fig. 5, the generating unit 43 may include a classification module 51 and a screening module 53.
The classification module 51 is configured to classify the plurality of target objects using preset theme types to obtain the group of target objects contained in any one theme type.
Specifically, using the preset theme types, the target objects read are classified into multiple groups according to the type given in their text data, with each group corresponding to one theme type.
Optionally, the preset theme types may include comedy, tragedy, history, action, romance, crime, horror, suspense, animation, fantasy, family and so on; the present application does not limit the specific division of theme types.
In an optional embodiment, after the text data of the already-released target objects has been read, a coarse-grained classification can first be made according to the type in the text data, and preset theme types (such as theme types generated by LDA) can then be used to divide the target objects further into fine-grained theme types. For example, after a target object (such as a film) has been assigned to the romance theme type, it can be further divided into theme types such as youth, marriage and war.
The screening module 53 is configured to screen, according to a predetermined rule, objects whose attention level exceeds a predetermined threshold from any group of target objects to obtain the target objects to be analyzed.
Specifically, after the plurality of target objects have been classified using the preset theme types to obtain the group of target objects contained in any one theme type, popular objects (objects whose attention level exceeds the predetermined threshold) are screened from any group according to the predetermined rule and taken as the target objects to be analyzed.
Optionally, the predetermined rule may include, but is not limited to, selecting target objects whose attention level exceeds the predetermined threshold.
In an optional embodiment, if the target object is a film, stage play or speech, the attention level may be the box office; if the target object is a TV series or advertisement, the attention level may be the audience rating.
Optionally, the screening module 53 selects the popular target objects of each category from the classified target objects. In the above embodiments, classification comes before the selection of popular target objects because some types of target object attract only moderate attention by nature; if popular target objects were selected first, target objects of those types would never be selected. Taking films as an example, films in certain genres, such as art-house films, have inherently modest box office, so if popular films were selected first, art-house films might not appear in the screening results at all.
Further, in this embodiment, because film trends change quickly, film data from the last three to five years can optionally be used, and a predetermined rule is applied to select films whose box office is well above average as popular films. The predetermined rule may be, but is not limited to, "select films whose box office exceeds the median box office of their category for the year".
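The example rule quoted above, selecting films whose box office exceeds the median of their category for the year, might be sketched as follows. The record fields (`title`, `theme`, `year`, `box_office`), the function name `select_popular` and the sample figures are all hypothetical and serve only to illustrate why grouping by category comes first.

```python
from collections import defaultdict
from statistics import median

def select_popular(films):
    """Keep films whose box office exceeds the median of their (theme, year) group."""
    groups = defaultdict(list)
    for f in films:
        groups[(f["theme"], f["year"])].append(f["box_office"])
    medians = {key: median(values) for key, values in groups.items()}
    return [f for f in films if f["box_office"] > medians[(f["theme"], f["year"])]]

# Made-up records, for illustration only.
films = [
    {"title": "A", "theme": "romance", "year": 2014, "box_office": 700},
    {"title": "B", "theme": "romance", "year": 2014, "box_office": 300},
    {"title": "C", "theme": "romance", "year": 2014, "box_office": 100},
    {"title": "D", "theme": "art", "year": 2014, "box_office": 20},
    {"title": "E", "theme": "art", "year": 2014, "box_office": 5},
]
popular = [f["title"] for f in select_popular(films)]
```

Film D is kept even though its box office is far below every romance film, because it is compared only with other films in its own category; a single global threshold would have removed the whole "art" group.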
According to the above embodiments of the present application, as shown in Fig. 6, the processing unit 45 may include an extraction module 61, a sentence-splitting module 63 and an abstraction module 65.
The extraction module 61 is configured to extract the plot content of the target objects to be analyzed from the text data of the target objects to be analyzed.
The sentence-splitting module 63 is configured to perform fine-grained or coarse-grained sentence splitting on the plot content of the target objects to be analyzed.
Optionally, fine-grained splitting of the plot content can be performed on punctuation marks such as commas, full stops and semicolons, and coarse-grained splitting can be performed on punctuation marks such as full stops.
In an optional embodiment, after coarse-grained splitting has been performed on the plot content of the target objects to be analyzed, the resulting sentences can be split further according to a predetermined splitting rule.
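The difference between the two granularities can be illustrated with a short regular-expression sketch. The exact punctuation sets, the function name `split_clauses` and the sample sentence are assumptions made for this example; the application only gives commas, full stops and semicolons as examples.

```python
import re

# Assumed punctuation sets (ASCII and full-width forms).
FINE = r"[,;.!?，；。！？]"   # fine-grained: split on clause-level punctuation too
COARSE = r"[.!?。！？]"       # coarse-grained: split on sentence-ending punctuation

def split_clauses(text, fine=True):
    """Split plot text into clauses at the chosen granularity."""
    pattern = FINE if fine else COARSE
    return [part.strip() for part in re.split(pattern, text) if part.strip()]

text = "Female 1 is admitted to the university, and Male 2 goes abroad. They meet again."
coarse = split_clauses(text, fine=False)
fine = split_clauses(text, fine=True)
```

Here the coarse-grained pass yields two sentences and the fine-grained pass yields three clauses; a rule-based second pass over the coarse output, as described above, would sit between these two extremes.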
In an optional embodiment, as shown in Fig. 7, the above apparatus may further include a deduplication module 71 and a merging module 73.
The deduplication module 71 is configured to, after fine-grained or coarse-grained sentence splitting has been performed on the plot content of the target objects to be analyzed, use a semantic model to deduplicate the resulting clauses and obtain the deduplicated clauses.
Optionally, a currently mature semantic model can be used to remove semantically repeated clauses, leaving multiple deduplicated clauses.
The merging module 73 is configured to merge semantically coherent clauses among the deduplicated clauses.
Optionally, pronouns and conjunctions, supplemented by some rules (for example, clauses separated by a full stop cannot be joined), can be used to piece the deduplicated clauses back together, so that semantically coherent clauses are merged.
The abstraction module 65 is configured to abstract the plot content after sentence splitting to obtain the plurality of segmented plots.
Specifically, the abstraction module 65 extracts the main content from the plot content after sentence splitting, to facilitate subsequent statistical analysis and modeling.
In an optional embodiment, as shown in Fig. 8, the abstraction module 65 may include a word segmentation submodule 81.
The word segmentation submodule 81 is configured to perform word segmentation on the plot content after sentence splitting and remove stop words to obtain the plurality of segmented plots. After the segmented plots are obtained, the sentences in each segmented plot that satisfy a preset condition are extracted according to their semantics, and the subjects of those sentences are replaced with predetermined generic terms.
In an optional embodiment, extracting the sentences that satisfy the preset condition from each segmented plot according to their semantics can be done by using a currently mature subject recognition method to extract the main part of the segmented plot.
According to the above embodiments of the present application, as shown in Fig. 9, the modeling unit 47 may include a modeling module 91.
The modeling module 91 is configured to model the plurality of segmented plots of the target objects to be analyzed using any one of the following models: a Markov chain model, a hidden Markov chain model, and a bivariate Bayesian hierarchical model.
The probability model of the plot development of the target objects to be analyzed includes the transition probability between any two states, and each state includes at least one segmented plot.
Optionally, the modeling module 91 models the segmented plots with a statistical model to obtain the probability model of plot development. A Markov chain model can be used to calculate the transition probability from one state to another. To make similar plots aggregate more easily into the same state, a hidden Markov chain model or a bivariate Bayesian hierarchical model can be used to model the plurality of segmented plots.
In this embodiment, the output of each of the above models is the set of transition probabilities between states.
Optionally, the transition probability between any two states can be calculated according to the following formula:
transition probability from state x to state y = (number of occurrences of a transition from state x to state y) / (number of occurrences of a transition from state x to any state),
where x and y are natural numbers, and state x and state y denote any two states in the above probability model of plot development.
Optionally, after the plurality of plot segments of the target object to be analyzed has been obtained and modeled with any one of the above models, the probability that state x changes to state y is counted in the resulting probabilistic model and used as the transition probability from state x to state y. Specifically, over all states, the number of occurrences A of the event "state x changes to state y" is counted, together with the number of occurrences B of the events in which state x changes to any other state, and the ratio of A to B is used as the transition probability from state x to state y. In other words, the share of transitions from state x to state y among all transitions out of state x characterizes the probability that state x changes to state y.
In an optional embodiment, the number of transitions from state x to state y and the number of transitions from state x to all states may be counted with the Hadoop MapReduce tool.
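The counting described above can be sketched in Python as follows. This is a minimal single-process illustration, not the implementation of the application and not a Hadoop job: the function name `transition_probabilities`, the representation of each target object as an ordered list of state labels, and the sample labels are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def transition_probabilities(state_chains):
    """Estimate P(x -> y) = count(x -> y) / count(x -> any state).

    state_chains: iterable of lists of state labels (hypothetical input
    format; one list per target object, in plot order).
    """
    pair_counts = Counter()      # A: occurrences of x -> y
    outgoing_counts = Counter()  # B: occurrences of x -> any state
    for chain in state_chains:
        for x, y in zip(chain, chain[1:]):
            pair_counts[(x, y)] += 1
            outgoing_counts[x] += 1

    probs = defaultdict(dict)
    for (x, y), a in pair_counts.items():
        probs[x][y] = a / outgoing_counts[x]
    return dict(probs)


# Hypothetical state labels, for illustration only.
chains = [
    ["meet", "conflict", "reconcile"],
    ["meet", "conflict", "separate"],
    ["meet", "reconcile"],
]
probs = transition_probabilities(chains)
```

In a MapReduce setting, the inner loop would correspond to the map step (emitting one record per adjacent pair) and the two counters to the reduce step; that mapping is a suggestion of this sketch rather than something the text specifies.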
In an optional embodiment, as shown in Figure 10, the acquiring unit 49 may include an acquisition module 1001 and a computing module 1003.
The acquisition module 1001 is configured to obtain any one or more plot state developments, where a plot state development is a state chain that includes at least one state and the transition order of the states.
The computing module 1003 is configured to calculate the probability of the plot state development using the transition probabilities between adjacent states.
Optionally, the transition probabilities between adjacent states in one or more state chains are obtained and used, together with the state chain, to calculate the probability of that state chain (that is, of the plot state development).
In an optional embodiment, as shown in Figure 11, the state chain includes a plurality of states and the transition order of each state, and the computing module 1003 may include a calculating submodule 1101 configured to calculate the probability P of the plot state development by the following formula:
$$P = P(A_1) \times P(A_2) \times \cdots \times P(A_i) \times \cdots \times P(A_n)$$
where $P(A_i)$ denotes the transition probability between two adjacent states, and i is a natural number.
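As a minimal sketch of the product formula above, the following Python function multiplies the transition probabilities of adjacent states along a state chain. The function name `chain_probability`, the nested-dictionary layout of the transition probabilities, and the sample values are assumptions made for illustration; treating an unseen transition as probability 0 is likewise a choice of this sketch, not something the text prescribes.

```python
import math


def chain_probability(chain, probs):
    """Multiply the transition probabilities of adjacent states in a chain.

    chain: ordered list of state labels (hypothetical representation).
    probs: nested dict, probs[x][y] = transition probability from x to y.
    A transition that is missing from probs is treated as probability 0
    (an assumption of this sketch).
    """
    factors = [
        probs.get(x, {}).get(y, 0.0)
        for x, y in zip(chain, chain[1:])
    ]
    return math.prod(factors)


# Hypothetical transition probabilities, for illustration only.
probs = {
    "meet": {"conflict": 0.5, "reconcile": 0.5},
    "conflict": {"reconcile": 0.25, "separate": 0.75},
}
p = chain_probability(["meet", "conflict", "separate"], probs)
```

For long chains the product can underflow, so a practical variant might sum the logarithms of the factors instead of multiplying them directly.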
Optionally, because a massive number of target objects have already been broadcast, a single state may contain many plot segments. So that investors can understand at a glance what a state represents, a model needs to be built for the state to serve as its summary. The TF-IDF model, which is currently relatively mature, can be used to find the words that best reflect the state. Before this model is used, the subjects in the state must first be removed, so that only the predicates, adverbials and other words that describe the events are fed into the model.
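The TF-IDF summary of a state could look roughly like the following Python sketch. It assumes that each state has already been reduced to a list of words with the subjects removed, and it uses one common TF-IDF variant (term frequency divided by length, inverse document frequency as log(N / df)); the text does not say which variant is intended. The function name `top_words_by_tfidf` and the sample words are hypothetical.

```python
import math
from collections import Counter


def top_words_by_tfidf(state_words, top_k=3):
    """Rank the words of each state by TF-IDF.

    state_words: dict mapping a state label to a list of words taken from
    its plot segments, with subjects already removed (hypothetical input).
    Uses tf = count / length and idf = log(N / df), one common variant.
    """
    n_states = len(state_words)
    doc_freq = Counter()
    for words in state_words.values():
        doc_freq.update(set(words))

    summary = {}
    for state, words in state_words.items():
        counts = Counter(words)
        scores = {
            w: (c / len(words)) * math.log(n_states / doc_freq[w])
            for w, c in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        summary[state] = ranked[:top_k]
    return summary


# Hypothetical words per state, for illustration only.
state_words = {
    "s1": ["argue", "leave", "angrily", "argue"],
    "s2": ["apologize", "return", "leave"],
}
summary = top_words_by_tfidf(state_words, top_k=2)
```

Words that occur in every state (here "leave") receive a score of zero and fall to the bottom of the ranking, which is the behavior that makes TF-IDF useful for picking words specific to one state.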
Optionally, in this embodiment, the above apparatus for processing text data may be applied in a hardware environment formed by the computer terminal 10 shown in Figure 1. As shown in Figure 1, the computer terminal 10 is connected to other computer terminals through a network, which includes but is not limited to a wide area network, a metropolitan area network, or a local area network.
Embodiment 3
An embodiment of the present application may provide a computer terminal, which may be any computer terminal in a group of computer terminals. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute program code for the following steps of the method for processing text data: reading text data of a plurality of target objects that have been broadcast, where the target objects include any one of the following: films, TV series, stage plays, documentaries, speeches and advertisements; screening the plurality of target objects to generate a target object to be analyzed; performing text preprocessing on the text data of the target object to be analyzed to obtain a plurality of plot segments of the target object to be analyzed; modeling the plurality of plot segments of the target object to be analyzed to obtain a probabilistic model of the plot development of the target object to be analyzed, where the probabilistic model characterizes the transition results of any two or more plot segments among the plurality of plot segments; and using the probabilistic model of the plot development to obtain the probability of a plot state development of the target object to be analyzed, where the plot state development includes any two or more plot segments.
Optionally, Figure 12 is a structural block diagram of a computer terminal according to an embodiment of the present application. As shown in Figure 12, the computer terminal A may include one or more processors 1201 (only one is shown in the figure), a memory 1202, and a transmission device 1203.
The memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for processing text data in the embodiments of the present application. By running the software programs and modules stored in the memory, the processor executes various functional applications and data processing, that is, implements the above method for processing text data. The memory may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and such remote memory may be connected to the terminal A through a network. Examples of such networks include but are not limited to the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor may call, through the transmission device, the information and application programs stored in the memory to perform the following steps: reading text data of a plurality of target objects that have been broadcast, where the target objects include any one of the following: films, TV series, stage plays, documentaries, speeches and advertisements; screening the plurality of target objects to generate a target object to be analyzed; performing text preprocessing on the text data of the target object to be analyzed to obtain a plurality of plot segments of the target object to be analyzed; modeling the plurality of plot segments of the target object to be analyzed to obtain a probabilistic model of the plot development of the target object to be analyzed, where the probabilistic model characterizes the transition results of any two or more plot segments among the plurality of plot segments; and using the probabilistic model of the plot development to obtain the probability of a plot state development of the target object to be analyzed, where the plot state development includes any two or more plot segments.
Optionally, the processor may also execute program code for the following steps: classifying the plurality of target objects by preset theme types to obtain the group of target objects belonging to each theme type; and screening, from any group of target objects according to a predetermined rule, the objects whose attention level exceeds a predetermined threshold, to obtain the target object to be analyzed.
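The classification and screening step can be illustrated with the short Python sketch below. The record layout (keys `title`, `theme`, `attention`), the function name `screen_targets`, and the numeric attention score are all hypothetical; the text does not define how attention is measured or what the predetermined rule is, so a simple "greater than threshold" comparison is assumed here.

```python
from collections import defaultdict


def screen_targets(targets, threshold):
    """Group target objects by theme type, then keep the popular ones.

    targets: list of dicts with hypothetical keys "title", "theme" and
    "attention" (some popularity score; the source does not define it).
    Returns {theme: [titles whose attention exceeds threshold]}.
    """
    by_theme = defaultdict(list)
    for t in targets:
        by_theme[t["theme"]].append(t)

    return {
        theme: [t["title"] for t in group if t["attention"] > threshold]
        for theme, group in by_theme.items()
    }


# Hypothetical records, for illustration only.
targets = [
    {"title": "A", "theme": "romance", "attention": 9.1},
    {"title": "B", "theme": "romance", "attention": 4.0},
    {"title": "C", "theme": "crime", "attention": 7.5},
]
selected = screen_targets(targets, threshold=7.0)
```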
Optionally, the processor may also execute program code for the following steps: extracting the plot content of the target object to be analyzed from the text data of the target object to be analyzed; splitting the plot content of the target object to be analyzed into sentences at a fine or coarse granularity; and abstracting the split plot content to obtain the plurality of plot segments.
Optionally, the processor may also execute program code for the following steps: deduplicating the split sentences using a semantic model to obtain deduplicated sentences; and merging those deduplicated sentences that are semantically coherent.
Optionally, the processor may also execute program code for the following steps: performing word segmentation on the split plot content and removing stop words to obtain the plurality of plot segments; where, after the plurality of plot segments is obtained, the sentences in each plot segment that meet a preset condition are extracted according to their semantics, and the subjects in those sentences are replaced with predetermined generic words.
Optionally, the processor may also execute program code for the following steps: modeling the plurality of plot segments of the target object to be analyzed using any one of the following models: a Markov chain model, a hidden Markov model, and a bivariate Bayesian hierarchical model; where the probabilistic model of the plot development of the target object to be analyzed includes the transition probability between any two states, and each state includes at least one plot segment.
Optionally, the processor may also execute program code for the following steps: obtaining any one or more plot state developments, where a plot state development is a state chain that includes at least one state and the transition order of the states; and calculating the probability of the plot state development using the transition probabilities between adjacent states.
Optionally, the processor may also execute program code for the following steps: where the state chain includes a plurality of states and the transition order of each state, calculating the probability P of the plot state development by the following formula:
$$P = P(A_1) \times P(A_2) \times \cdots \times P(A_i) \times \cdots \times P(A_n)$$
where $P(A_i)$ denotes the transition probability between two adjacent states, and i is a natural number.
In the scheme disclosed in the above embodiments of the present application, when target objects of a certain type are to be screened out of a massive number of target objects that have not been broadcast, the text data of a plurality of target objects that have already been broadcast is read and screened to obtain target objects to be analyzed that are of the same type as the required target objects. The text data of the target objects to be analyzed is then preprocessed to obtain their plurality of plot segments, and the plot segments are modeled to obtain the probabilistic model of the plot development of the target objects to be analyzed. Using this probabilistic model, the scheme obtains the probability of the plot state development of the target objects to be analyzed, and then screens the required target objects out of the massive number of unbroadcast target objects according to that probability.
It is easy to see that, when the required target objects are screened out of the massive number of unbroadcast target objects, it is only necessary to analyze the text data of the plurality of target objects that have been broadcast and, through statistical modeling, obtain the probability of the plot state development of the target objects to be analyzed that are of the same type as the required target objects. Based on the text data of the broadcast target objects and a plot state development probability that reflects objective reality, it is then possible to analyze objectively which target objects of the required type among the unbroadcast ones are more likely to be liked by audiences. The scheme provided by the embodiments of the present application therefore does not require the text data of the massive number of target objects to be read manually. It mines the general rules of plot state development for a certain class of target objects from the text data of the massive number of broadcast target objects, so that the required target objects that audiences are more likely to enjoy can be screened accurately and objectively from the massive number of unbroadcast target objects according to those general rules. In addition, while the general rules are being mined, screening and text preprocessing reduce the amount of data to be processed. The required target objects can thus be screened from the massive number of unbroadcast target objects objectively, accurately and efficiently.
The scheme of the above embodiments provided by the present application therefore solves the technical problem in the prior art that, when target objects are screened, the screening result is inaccurate because manual reading of the text data of the target objects is highly subjective.
A person of ordinary skill in the art will understand that the structure shown in Figure 12 is only illustrative. The computer terminal may also be a terminal device such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD. Figure 12 does not limit the structure of the above electronic apparatus. For example, the computer terminal A may include more or fewer components than shown in Figure 12 (such as a network interface or a display device), or may have a configuration different from that shown in Figure 12.
A person of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments may be completed by a program instructing the hardware of a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
Embodiment 4
An embodiment of the present application also provides a storage medium. Optionally, in this embodiment, the storage medium may be used to store the program code executed by the method for processing text data provided in Embodiment 1 above.
Optionally, in this embodiment, the storage medium may be located in any computer terminal in a group of computer terminals in a computer network, or in any mobile terminal in a group of mobile terminals.
Optionally, in this embodiment, the storage medium is configured to store program code for executing the following steps: reading text data of a plurality of target objects that have been broadcast, where the target objects include any one of the following: films, TV series, stage plays, documentaries, speeches and advertisements; screening the plurality of target objects to generate a target object to be analyzed; performing text preprocessing on the text data of the target object to be analyzed to obtain a plurality of plot segments of the target object to be analyzed; modeling the plurality of plot segments of the target object to be analyzed to obtain a probabilistic model of the plot development of the target object to be analyzed, where the probabilistic model characterizes the transition results of any two or more plot segments among the plurality of plot segments; and using the probabilistic model of the plot development to obtain the probability of a plot state development of the target object to be analyzed, where the plot state development includes any two or more plot segments.
Optionally, the storage medium is further configured to store program code for executing the following steps: classifying the plurality of target objects by preset theme types to obtain the group of target objects belonging to each theme type; and screening, from any group of target objects according to a predetermined rule, the objects whose attention level exceeds a predetermined threshold, to obtain the target object to be analyzed.
Optionally, the storage medium is further configured to store program code for executing the following steps: extracting the plot content of the target object to be analyzed from the text data of the target object to be analyzed; splitting the plot content of the target object to be analyzed into sentences at a fine or coarse granularity; and abstracting the split plot content to obtain the plurality of plot segments.
Optionally, the storage medium is further configured to store program code for executing the following steps: deduplicating the split sentences using a semantic model to obtain deduplicated sentences; and merging those deduplicated sentences that are semantically coherent.
Optionally, the storage medium is further configured to store program code for executing the following steps: performing word segmentation on the split plot content and removing stop words to obtain the plurality of plot segments; where, after the plurality of plot segments is obtained, the sentences in each plot segment that meet a preset condition are extracted according to their semantics, and the subjects in those sentences are replaced with predetermined generic words.
Optionally, the storage medium is further configured to store program code for executing the following steps: modeling the plurality of plot segments of the target object to be analyzed using any one of the following models: a Markov chain model, a hidden Markov model, and a bivariate Bayesian hierarchical model; where the probabilistic model of the plot development of the target object to be analyzed includes the transition probability between any two states, and each state includes at least one plot segment.
Optionally, the storage medium is further configured to store program code for executing the following steps: obtaining any one or more plot state developments, where a plot state development is a state chain that includes at least one state and the transition order of the states; and calculating the probability of the plot state development using the transition probabilities between adjacent states.
Optionally, the storage medium is further configured to store program code for executing the following steps: where the state chain includes a plurality of states and the transition order of each state, calculating the probability P of the plot state development by the following formula:
$$P = P(A_1) \times P(A_2) \times \cdots \times P(A_i) \times \cdots \times P(A_n)$$
where $P(A_i)$ denotes the transition probability between two adjacent states, and i is a natural number.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present application, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
It should be understood that the technical content disclosed in the several embodiments provided in the present application may be implemented in other ways. The apparatus embodiments described above are only illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The above is only the preferred implementation of the present application. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present application, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present application.
Claims (16)
1. A method for processing text data, characterized by comprising:
reading text data of a plurality of target objects that have been broadcast, wherein the target objects include any one of the following: films, TV series, stage plays, documentaries, speeches and advertisements;
screening the plurality of target objects to generate a target object to be analyzed;
performing text preprocessing on the text data of the target object to be analyzed to obtain a plurality of plot segments of the target object to be analyzed;
modeling the plurality of plot segments of the target object to be analyzed to obtain a probabilistic model of the plot development of the target object to be analyzed, wherein the probabilistic model characterizes the transition results of any two or more plot segments among the plurality of plot segments of the target object to be analyzed;
using the probabilistic model of the plot development of the target object to be analyzed to obtain the probability of a plot state development of the target object to be analyzed, wherein the plot state development includes the any two or more plot segments.
2. The method according to claim 1, characterized in that screening the plurality of target objects to generate the target object to be analyzed comprises:
classifying the plurality of target objects by preset theme types to obtain the group of target objects belonging to each theme type;
screening, from any group of target objects according to a predetermined rule, the objects whose attention level exceeds a predetermined threshold, to obtain the target object to be analyzed.
3. The method according to claim 1, characterized in that performing text preprocessing on the text data of the target object to be analyzed to obtain the plurality of plot segments of the target object to be analyzed comprises:
extracting the plot content of the target object to be analyzed from the text data of the target object to be analyzed;
splitting the plot content of the target object to be analyzed into sentences at a fine or coarse granularity;
abstracting the split plot content to obtain the plurality of plot segments.
4. The method according to claim 3, characterized in that, after the plot content of the target object to be analyzed is split into sentences at a fine or coarse granularity, the method further comprises:
deduplicating the split sentences using a semantic model to obtain deduplicated sentences;
merging those deduplicated sentences that are semantically coherent.
5. The method according to claim 3, characterized in that abstracting the split plot content to obtain the plurality of plot segments comprises:
performing word segmentation on the split plot content and removing stop words to obtain the plurality of plot segments;
wherein, after the plurality of plot segments is obtained, the sentences in each plot segment that meet a preset condition are extracted according to their semantics, and the subjects in the sentences that meet the preset condition are replaced with predetermined generic words.
6. The method according to any one of claims 1-5, characterized in that modeling the plurality of plot segments of the target object to be analyzed to obtain the probabilistic model of the plot development of the target object to be analyzed comprises:
modeling the plurality of plot segments of the target object to be analyzed using any one of the following models: a Markov chain model, a hidden Markov model, and a bivariate Bayesian hierarchical model;
wherein the probabilistic model of the plot development of the target object to be analyzed includes the transition probability between any two states, and each state includes at least one plot segment.
7. The method according to claim 6, characterized in that using the probabilistic model of the plot development of the target object to be analyzed to obtain the probability of the plot state development of the target object to be analyzed comprises:
obtaining any one or more plot state developments, wherein the plot state development is a state chain, and the state chain includes at least one state and the transition order of the states;
calculating the probability of the plot state development using the transition probabilities between adjacent states.
8. The method according to claim 7, characterized in that the state chain includes a plurality of states and the transition order of each state, and the probability P of the plot state development is calculated by the following formula:
$$P = P(A_1) \times P(A_2) \times \cdots \times P(A_i) \times \cdots \times P(A_n)$$
wherein $P(A_i)$ denotes the transition probability between the two adjacent states, and i is a natural number.
9. An apparatus for processing text data, characterized by comprising:
a reading unit, configured to read text data of a plurality of target objects that have been broadcast, wherein the target objects include any one of the following: films, TV series, stage plays, documentaries, speeches and advertisements;
a generating unit, configured to screen the plurality of target objects to generate a target object to be analyzed;
a processing unit, configured to perform text preprocessing on the text data of the target object to be analyzed to obtain a plurality of plot segments of the target object to be analyzed;
a modeling unit, configured to model the plurality of plot segments of the target object to be analyzed to obtain a probabilistic model of the plot development of the target object to be analyzed, wherein the probabilistic model characterizes the transition results of any two or more plot segments among the plurality of plot segments of the target object to be analyzed;
an acquiring unit, configured to use the probabilistic model of the plot development of the target object to be analyzed to obtain the probability of a plot state development of the target object to be analyzed, wherein the plot state development includes the any two or more plot segments.
10. The apparatus according to claim 9, characterized in that the generating unit comprises:
a classification module, configured to classify the plurality of target objects by preset theme types to obtain the group of target objects belonging to each theme type;
a screening module, configured to screen, from any group of target objects according to a predetermined rule, the objects whose attention level exceeds a predetermined threshold, to obtain the target object to be analyzed.
11. The apparatus according to claim 9, characterized in that the processing unit comprises:
an extraction module, configured to extract the plot content of the target object to be analyzed from the text data of the target object to be analyzed;
a sentence splitting module, configured to split the plot content of the target object to be analyzed into sentences at a fine or coarse granularity;
an abstraction module, configured to abstract the split plot content to obtain the plurality of plot segments.
12. The apparatus according to claim 11, characterized in that the apparatus further comprises:
a deduplication module, configured to deduplicate the split sentences using a semantic model, after the plot content of the target object to be analyzed has been split into sentences at a fine or coarse granularity, to obtain deduplicated sentences;
a merging module, configured to merge those deduplicated sentences that are semantically coherent.
13. The apparatus according to claim 11, characterized in that the abstraction module comprises:
a word segmentation submodule, configured to perform word segmentation on the split plot content and remove stop words to obtain the plurality of plot segments;
wherein, after the plurality of plot segments is obtained, the sentences in each plot segment that meet a preset condition are extracted according to their semantics, and the subjects in the sentences that meet the preset condition are replaced with predetermined generic words.
14. The device according to any one of claims 9 to 13, characterized in that the modeling unit comprises:
a modeling module, configured to model the plurality of segmented plots of the target object to be analyzed using any one of the following models: a Markov chain model, a hidden Markov chain model, and a bivariate Bayesian hierarchical model;
wherein the probability model of the plot development of the target object to be analyzed includes the transition probability between any two states, and each state includes at least one segmented plot.
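For the Markov chain option, one common way to obtain the transition probabilities is to count observed state-to-state transitions and normalize each row. The sketch below does this; the function name `estimate_transitions` and the use of plain relative frequencies are assumptions, since the claim does not say how the probabilities are estimated.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Estimate first-order Markov transition probabilities from state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every adjacent (current, next) pair in the sequence.
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    # Normalize each row of counts into a probability distribution.
    return {
        state: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for state, nexts in counts.items()
    }
```

Here each state is a label for one or more segmented plots, and each input sequence is the ordered list of states observed in one work.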
15. The device according to claim 14, characterized in that the acquiring unit comprises:
an acquisition module, configured to obtain any one or more plot state developments, wherein each plot state development is a state chain, and the state chain includes at least one state and the transition order of the states; and
a calculation module, configured to calculate the probability of the plot state development using the transition probabilities between adjacent states.
16. The device according to claim 15, characterized in that the state chain includes multiple states and the transition order of each state, and the calculation module comprises: a calculation submodule, configured to calculate the probability $P$ of the plot state development by the following formula:
$P = P(A_1) \times P(A_2) \times \cdots \times P(A_i) \times \cdots \times P(A_n)$, where $P(A_i)$ denotes the transition probability between two adjacent states, and $i$ is a natural number.
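The product formula can be illustrated with a short sketch. It assumes that the transition probabilities are stored as a nested dictionary mapping a state to its successor states, and that a transition absent from the dictionary has probability zero; the function name `chain_probability` and the example state labels are hypothetical.

```python
import math


def chain_probability(state_chain, transitions):
    """Multiply the transition probabilities along consecutive states of a chain."""
    probs = [
        # Assumption: a transition that was never observed has probability 0.
        transitions.get(cur, {}).get(nxt, 0.0)
        for cur, nxt in zip(state_chain, state_chain[1:])
    ]
    # math.prod of an empty list is 1, so a single-state chain has probability 1.
    return math.prod(probs)


transitions = {
    "meet": {"conflict": 0.5, "reunion": 0.5},
    "conflict": {"reunion": 0.4, "breakup": 0.6},
}
p = chain_probability(["meet", "conflict", "breakup"], transitions)
```

In this example the chain has two transitions, so the product has two factors, 0.5 and 0.6.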
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510509639.7A CN106469170B (en) | 2015-08-18 | 2015-08-18 | The treating method and apparatus of text data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510509639.7A CN106469170B (en) | 2015-08-18 | 2015-08-18 | The treating method and apparatus of text data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106469170A (en) | 2017-03-01 |
CN106469170B (en) | 2019-09-10 |
Family
ID=58214848
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510509639.7A Active CN106469170B (en) | 2015-08-18 | 2015-08-18 | The treating method and apparatus of text data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106469170B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107368529A (en) * | 2017-06-13 | 2017-11-21 | 中国传媒大学 | Documentary film data content feature obtains system and tag library |
CN107404671A (en) * | 2017-06-13 | 2017-11-28 | 中国传媒大学 | Movie contents feature obtains system and application system |
CN107577672A (en) * | 2017-09-19 | 2018-01-12 | 网智天元科技集团股份有限公司 | Method and apparatus based on public sentiment setting drama |
CN107766330A (en) * | 2017-10-25 | 2018-03-06 | 西安影视数据评估中心有限公司 | A kind of system and method for carrying out this quality analysis of movie and television play |
CN108460024A (en) * | 2018-03-30 | 2018-08-28 | 掌阅科技股份有限公司 | Generation method, computing device and the computer storage media of e-book plot trend |
CN109063485A (en) * | 2018-07-27 | 2018-12-21 | 东北大学秦皇岛分校 | A kind of vulnerability classification statistical system and method based on loophole platform |
CN109902701A (en) * | 2018-04-12 | 2019-06-18 | 华为技术有限公司 | Image classification method and device |
CN109902169A (en) * | 2019-01-26 | 2019-06-18 | 北京工业大学 | The method for promoting film recommender system performance based on caption information |
CN111382282A (en) * | 2018-12-28 | 2020-07-07 | 北京国双科技有限公司 | Method, device, storage medium and processor for processing data |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101706794A (en) * | 2009-11-24 | 2010-05-12 | 上海显智信息科技有限公司 | Information browsing and retrieval method based on semantic entity-relationship model and visualized recommendation |
CN102831234A (en) * | 2012-08-31 | 2012-12-19 | 北京邮电大学 | Personalized news recommendation device and method based on news content and theme feature |
CN103324712A (en) * | 2013-06-19 | 2013-09-25 | 西北工业大学 | Extraction method for non-redundancy plot rule |
CN103914743A (en) * | 2014-04-21 | 2014-07-09 | 中国科学技术大学先进技术研究院 | On-line serial content popularity prediction method based on autoregressive model |
US20150154246A1 (en) * | 2013-12-03 | 2015-06-04 | International Business Machines Corporation | Recommendation Engine using Inferred Deep Similarities for Works of Literature |
CN104965874A (en) * | 2015-06-11 | 2015-10-07 | 腾讯科技(北京)有限公司 | Information processing method and apparatus |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101706794A (en) * | 2009-11-24 | 2010-05-12 | 上海显智信息科技有限公司 | Information browsing and retrieval method based on semantic entity-relationship model and visualized recommendation |
CN102831234A (en) * | 2012-08-31 | 2012-12-19 | 北京邮电大学 | Personalized news recommendation device and method based on news content and theme feature |
CN103324712A (en) * | 2013-06-19 | 2013-09-25 | 西北工业大学 | Extraction method for non-redundancy plot rule |
US20150154246A1 (en) * | 2013-12-03 | 2015-06-04 | International Business Machines Corporation | Recommendation Engine using Inferred Deep Similarities for Works of Literature |
CN103914743A (en) * | 2014-04-21 | 2014-07-09 | 中国科学技术大学先进技术研究院 | On-line serial content popularity prediction method based on autoregressive model |
CN104965874A (en) * | 2015-06-11 | 2015-10-07 | 腾讯科技(北京)有限公司 | Information processing method and apparatus |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107368529A (en) * | 2017-06-13 | 2017-11-21 | 中国传媒大学 | Documentary film data content feature obtains system and tag library |
CN107404671A (en) * | 2017-06-13 | 2017-11-28 | 中国传媒大学 | Movie contents feature obtains system and application system |
CN107577672A (en) * | 2017-09-19 | 2018-01-12 | 网智天元科技集团股份有限公司 | Method and apparatus based on public sentiment setting drama |
CN107577672B (en) * | 2017-09-19 | 2021-07-06 | 网智天元科技集团股份有限公司 | Public opinion-based script setting method and device |
CN107766330A (en) * | 2017-10-25 | 2018-03-06 | 西安影视数据评估中心有限公司 | A kind of system and method for carrying out this quality analysis of movie and television play |
CN108460024B (en) * | 2018-03-30 | 2019-03-15 | 掌阅科技股份有限公司 | The generation method of e-book plot trend calculates equipment and computer storage medium |
CN108460024A (en) * | 2018-03-30 | 2018-08-28 | 掌阅科技股份有限公司 | Generation method, computing device and the computer storage media of e-book plot trend |
CN109902701A (en) * | 2018-04-12 | 2019-06-18 | 华为技术有限公司 | Image classification method and device |
CN109063485A (en) * | 2018-07-27 | 2018-12-21 | 东北大学秦皇岛分校 | A kind of vulnerability classification statistical system and method based on loophole platform |
CN109063485B (en) * | 2018-07-27 | 2020-08-04 | 东北大学秦皇岛分校 | Vulnerability classification statistical system and method based on vulnerability platform |
CN111382282A (en) * | 2018-12-28 | 2020-07-07 | 北京国双科技有限公司 | Method, device, storage medium and processor for processing data |
CN109902169A (en) * | 2019-01-26 | 2019-06-18 | 北京工业大学 | The method for promoting film recommender system performance based on caption information |
CN109902169B (en) * | 2019-01-26 | 2021-03-30 | 北京工业大学 | Method for improving performance of film recommendation system based on film subtitle information |
Also Published As
Publication number | Publication date |
---|---|
CN106469170B (en) | 2019-09-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106469170A (en) | The treating method and apparatus of text data | |
Smith et al. | Harnessing ai for augmenting creativity: Application to movie trailer creation | |
CN113748439B (en) | Prediction of successful quotient of movies | |
CN107578292B (en) | User portrait construction system | |
CN104102723B (en) | Search for content providing and search engine | |
CN103744928B (en) | A kind of network video classification method based on history access record | |
CN116702737B (en) | Document generation method, device, equipment, storage medium and product | |
CN107357889A (en) | A kind of across social platform picture proposed algorithm based on interior perhaps emotion similitude | |
CN112749608A (en) | Video auditing method and device, computer equipment and storage medium | |
CN107330021A (en) | Data classification method, device and equipment based on multiway tree | |
CN111259154B (en) | Data processing method and device, computer equipment and storage medium | |
CN112153426A (en) | Content account management method and device, computer equipment and storage medium | |
CN111861550B (en) | Family portrait construction method and system based on OTT equipment | |
CN113392331A (en) | Text processing method and equipment | |
CN109816438A (en) | Information-pushing method and device | |
CN110309114A (en) | Processing method, device, storage medium and the electronic device of media information | |
CN110096591A (en) | Long text classification method, device, computer equipment and storage medium based on bag of words | |
CN110489593A (en) | Topic processing method, device, electronic equipment and the storage medium of video | |
CN113688951A (en) | Video data processing method and device | |
CN109978491A (en) | Remind prediction technique, device, computer equipment and storage medium | |
WO2022148108A1 (en) | Systems, devices and methods for distributed hierarchical video analysis | |
CN103324662A (en) | Visual method and equipment for dynamic view evolution of social media event | |
CN106777040A (en) | A kind of across media microblogging the analysis of public opinion methods based on feeling polarities perception algorithm | |
Bello et al. | Reverse engineering the behaviour of twitter bots | |
CN105869058B (en) | A kind of method that multilayer latent variable model user portrait extracts |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20211118; Address after: Room 507, floor 5, building 3, No. 969, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province; Patentee after: ZHEJIANG TMALL TECHNOLOGY Co.,Ltd.; Address before: Box 847, four, Grand Cayman capital, Cayman Islands, UK; Patentee before: ALIBABA GROUP HOLDING Ltd. |