CN106469170A - Method and apparatus for processing text data - Google Patents

Info

Publication number
CN106469170A
Authority
CN
China
Prior art keywords
target object
plot
analyzed
state
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510509639.7A
Other languages
Chinese (zh)
Other versions
CN106469170B (en
Inventor
叶舟
王瑜
赵诚成
李龙
付志嵩
徐季秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Tmall Technology Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201510509639.7A
Publication of CN106469170A
Application granted
Publication of CN106469170B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06F18/295 Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

This application discloses a method and apparatus for processing text data. The method includes: reading the text data of multiple target objects that have already been released, where a target object is any one of the following: a film, a TV series, a stage play, a documentary, a speech, or an advertisement; screening the multiple target objects to generate target objects to be analyzed; performing text preprocessing on the text data of the target objects to be analyzed to obtain multiple plot segments of those target objects; modeling the plot segments to obtain a probability model of the plot development of the target objects to be analyzed; and using that probability model to obtain the probability of the plot state development of the target objects to be analyzed. The application addresses the technical problem in the prior art that, when target objects are screened, manually reading their text data is highly subjective and therefore yields inaccurate screening results.

Description

Method and apparatus for processing text data
Technical field
This application relates to the field of data processing, and in particular to a method and apparatus for processing text data.
Background
When investing in a target object (e.g., a film, TV series, stage play, documentary, or advertisement), the text data of the target object (such as the script of the film, TV series, stage play, documentary, or advertisement) clearly has a decisive influence on whether the investment succeeds. When the text data matches the audience's tastes, the target object is liked by the audience and can bring the investor higher returns, so the investment is clearly a success. When the text data is not liked by the audience, far fewer viewers pay attention to the target object, it does not achieve the effect the investor expected, and it cannot bring the investor the desired or higher returns; in that case the investment is clearly a failure.
In the prior art, target objects to be invested in are screened mainly by manually reading their text data to judge whether they are worth selecting. However, manual reading is not only inefficient but also subjective: different people reach different judgments. As a result, when deciding which target object is more worth selecting, manually reading text data is inefficient and the judgments are of low accuracy.
The above problem is described in detail below, taking the application scenario in which the target object is a film as an example.
A film and television company, as a producer, may need to invest in up to ten thousand films every year. When it invests in a film, the quality of the script clearly has a decisive influence on the film's box office. The box office reflects the film's commercial distribution and is one of the important indicators of whether a film is successful, which bears directly on whether the producer's investment succeeds. A plot that matches the audience's tastes can generate a higher box office and is therefore more worth investing in.
In the prior art, helping a producer make film investment decisions on the basis of plot usually requires processing a large number of scripts by manually reading them (most of the time probably only the plot synopses) and judging which films are likely to be liked by audiences, so as to select the scripts more worth investing in from a massive pool and help the producer make more valuable investment decisions. However, manual script reading is inefficient and subjective, so judging which film is more worth investing in is slow and the judgments are of low accuracy.
In the prior art, most existing artificial-intelligence patents concerning films are devoted to recommender systems, whose purpose is to find, among a massive number of films, those each viewer likes best. Their output is a list of the films a viewer is most likely to enjoy (ordered by the probability of being liked). In other words, these recommender systems only "filter" and "sort" historical film data in order to recommend, from the large number of films already shown, the ones a viewer will like best. However, such recommender systems cannot screen a massive number of scripts, nor can they judge which of those scripts is more worth investing in.
No effective solution has yet been proposed for the technical problem in the prior art that, when target objects are screened, manually reading their text data is highly subjective and leads to inaccurate screening results.
Summary of the invention
The embodiments of this application provide a method and apparatus for processing text data, to at least solve the technical problem in the prior art that, when target objects are screened, manually reading their text data is highly subjective and leads to inaccurate screening results.
According to one aspect of the embodiments of this application, a method for processing text data is provided, including: reading the text data of multiple target objects that have already been released, where a target object is any one of the following: a film, a TV series, a stage play, a documentary, a speech, or an advertisement; screening the multiple target objects to generate target objects to be analyzed; performing text preprocessing on the text data of the target objects to be analyzed to obtain multiple plot segments of the target objects to be analyzed; modeling the multiple plot segments to obtain a probability model of the plot development of the target objects to be analyzed, where the probability model characterizes the transition results between any two or more of the plot segments; and using the probability model to obtain the probability of the plot state development of the target objects to be analyzed, where the plot state development includes any two or more plot segments.
According to another aspect of the embodiments of this application, an apparatus for processing text data is also provided, including: a reading unit, configured to read the text data of multiple target objects that have already been released, where a target object is any one of the following: a film, a TV series, a stage play, a documentary, a speech, or an advertisement; a generating unit, configured to screen the multiple target objects and generate target objects to be analyzed; a processing unit, configured to perform text preprocessing on the text data of the target objects to be analyzed to obtain multiple plot segments of the target objects to be analyzed; a modeling unit, configured to model the multiple plot segments to obtain a probability model of the plot development of the target objects to be analyzed, where the probability model characterizes the transition results between any two or more of the plot segments; and an obtaining unit, configured to use the probability model to obtain the probability of the plot state development of the target objects to be analyzed, where the plot state development includes any two or more plot segments.
In the solution disclosed in this application, if a certain class of target objects is to be screened from a massive number of unreleased target objects, the text data of multiple released target objects can be read and screened to obtain target objects to be analyzed that are of the same type as the required target objects. The text data of the target objects to be analyzed is then preprocessed to obtain their plot segments, and the plot segments are modeled to obtain a probability model of plot development. Using this probability model, the solution obtains the probability of the plot state development of the target objects to be analyzed, and then screens the required target objects from the massive number of unreleased target objects according to the obtained probability.
It is easy to note that, to screen the required target objects from a massive number of unreleased target objects, it is only necessary to analyze the text data of multiple released target objects and, through statistical modeling, obtain the probability of the plot state development of the target objects to be analyzed that are of the same type as the required ones. On the basis of that text data and of probabilities that reflect objective reality, it is then possible to analyze objectively which of the unreleased target objects of the required type are more likely to be liked by audiences. The solution provided by the embodiments therefore does not require manually reading the text data of a massive number of target objects: it mines the general regularities of plot state development for a class of target objects from the text data of a massive number of released target objects. This not only makes it possible to screen, accurately and objectively according to those regularities, the required target objects that audiences are more likely to enjoy from among the unreleased ones; it also reduces the amount of data to be processed during mining, through the screening and text preprocessing steps. The required target objects can thus be screened from a massive number of unreleased target objects objectively, accurately, and efficiently.
The solution provided by this application thus solves the technical problem in the prior art that, when target objects are screened, manually reading their text data is highly subjective and leads to inaccurate screening results.
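The modeling and probability steps summarized above can be illustrated with a minimal sketch. It assumes that each released target object has already been reduced to an ordered sequence of plot-state labels and that transitions are modeled as a first-order Markov chain; the labels, the function names, and the choice of a plain Markov chain are assumptions made for illustration and are not taken from the application.

```python
from collections import Counter, defaultdict


def estimate_transitions(state_sequences):
    """Estimate first-order transition probabilities from state sequences."""
    counts = defaultdict(Counter)
    for seq in state_sequences:
        # Count every adjacent pair (current state -> next state).
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    model = {}
    for state, followers in counts.items():
        total = sum(followers.values())
        model[state] = {nxt: n / total for nxt, n in followers.items()}
    return model


def sequence_probability(transitions, seq):
    """Probability of a state sequence, conditioned on its first state."""
    prob = 1.0
    for current, nxt in zip(seq, seq[1:]):
        # Unseen transitions get probability 0 in this unsmoothed sketch.
        prob *= transitions.get(current, {}).get(nxt, 0.0)
    return prob


# Hypothetical plot-state sequences for three released target objects.
released = [
    ["meet", "conflict", "reunion"],
    ["meet", "conflict", "breakup"],
    ["meet", "romance", "reunion"],
]
model = estimate_transitions(released)
candidate = ["meet", "conflict", "reunion"]
print(sequence_probability(model, candidate))
```

A candidate script whose state sequence receives a higher probability follows transitions that were more common among the released target objects; a practical system would likely add smoothing so that unseen transitions do not force the probability to zero.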
Brief description of the drawings
The accompanying drawings described here are provided for a further understanding of this application and form part of it. The illustrative embodiments of this application and their descriptions are used to explain the application and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a hardware block diagram of a computer terminal for a method for processing text data according to an embodiment of this application;
Fig. 2 is a flowchart of a method for processing text data according to Embodiment 1 of this application;
Fig. 3 is a flowchart of an optional method for processing text data according to Embodiment 1 of this application;
Fig. 4 is a schematic diagram of an apparatus for processing text data according to Embodiment 2 of this application;
Fig. 5 is a schematic diagram of an optional apparatus for processing text data according to Embodiment 2 of this application;
Fig. 6 is a schematic diagram of an optional apparatus for processing text data according to Embodiment 2 of this application;
Fig. 7 is a schematic diagram of an optional apparatus for processing text data according to Embodiment 2 of this application;
Fig. 8 is a schematic diagram of an optional apparatus for processing text data according to Embodiment 2 of this application;
Fig. 9 is a schematic diagram of an optional apparatus for processing text data according to Embodiment 2 of this application;
Fig. 10 is a schematic diagram of an optional apparatus for processing text data according to Embodiment 2 of this application;
Fig. 11 is a schematic diagram of an optional apparatus for processing text data according to Embodiment 2 of this application; and
Fig. 12 is a structural block diagram of a computer terminal according to an embodiment of this application.
Detailed description
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this application, without creative effort, shall fall within the scope of protection of this application.
It should be noted that the terms "first", "second", and so on in the description, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "comprising" and "having", and any variants of them, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
First, some of the nouns and terms that appear in the description of the embodiments are explained as follows:
Latent Dirichlet Allocation: abbreviated LDA, a document topic generation model, also known as a three-level Bayesian probability model, comprising three levels: words, topics, and documents. LDA is an unsupervised machine learning technique that can be used to identify latent topic information in a large document collection or corpus.
Semantic model: a class of new data models that adds new data constructors and data-processing primitives to the relational model in order to express complex structures and rich semantics.
Markov chain: a discrete-event stochastic process in mathematics that has the Markov property. In such a process, given the present knowledge or information, the past (the historical states before the present) is irrelevant to predicting the future (the states after the present).
Hidden Markov model (HMM): a statistical model used to describe a Markov process with hidden, unknown parameters; that is, a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.
Bayesian model: that is, the Bayesian forecasting model, a time-series forecasting method that takes dynamic models as its object of study. A Bayesian model uses not only earlier data but also information such as the decision maker's experience and judgment, combining objective and subjective factors, which gives it more flexibility in handling abnormal situations.
TF-IDF model: an information retrieval model widely used in practical applications such as search engines. Its main idea is that if a word w occurs with high frequency in a document d and rarely occurs in other documents, then w is considered to have good discriminating power and is suitable for distinguishing d from other documents.
Hadoop platform: Hadoop is a way of storing massive data and running distributed analysis applications on a cluster of distributed servers. The Hadoop platform is a distributed storage and computing platform suited to big data, and the core of its distributed computing is MapReduce.
MapReduce: a programming model for parallel computation over large data sets (larger than 1 TB). Its main idea is that an operation on a large data set is distributed to the worker nodes managed by a master node, which complete it jointly; the final result is then obtained by integrating the intermediate results of the nodes.
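As an illustration of the TF-IDF idea defined above, the following is a minimal sketch that uses one common weighting (term frequency multiplied by the logarithm of the inverse document frequency). The exact formula, the function name `tf_idf`, and the sample tokens are assumptions for illustration; the application does not specify a particular variant here.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per non-empty tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized plot synopses.
docs = [
    ["campus", "love", "friendship"],
    ["campus", "exam", "exam"],
    ["campus", "travel"],
]
scores = tf_idf(docs)
print(scores[1])
```

A term such as "campus" that appears in every document receives a weight of zero, while "exam", which is frequent in one document and absent from the others, receives the highest weight in that document. This matches the discriminating-power intuition in the definition.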
Embodiment 1
According to an embodiment of this application, a method embodiment of a method for processing text data is also provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, such as one running a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
The method embodiment provided in Embodiment 1 of this application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a computer terminal as an example, Fig. 1 is a hardware block diagram of a computer terminal for a method for processing text data according to an embodiment of this application. As shown in Fig. 1, the computer terminal 10 may include one or more processors 102 (only one is shown in the figure; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions. A person of ordinary skill in the art will understand that the structure shown in Fig. 1 is merely illustrative and does not limit the structure of the electronic device described above. For example, the computer terminal 10 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
The memory 104 can be used to store software programs and modules of application software, such as the program instructions or modules corresponding to the method for processing text data in the embodiments of this application. By running the software programs and modules stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the method for processing text data described above. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the computer terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations of these.
The transmission device 106 is used to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the computer terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station and thereby communicate with the Internet. In one example, the transmission device 106 may be a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet wirelessly.
In the above operating environment, this application provides the method for processing text data shown in Fig. 2. Fig. 2 is a flowchart of a method for processing text data according to Embodiment 1 of this application.
As shown in Fig. 2, the method may include the following steps:
Step S21: read the text data of multiple target objects that have already been released. A target object may be any one of the following: a film, a TV series, a stage play, a documentary, a speech, or an advertisement.
Optionally, the text data of the multiple released target objects may be stored in a database. In step S21 of this application, when a certain class of target objects needs to be screened from a massive number of unreleased target objects, the stored text data of the multiple released target objects can be read from the database, so that the class of target objects can be screened accurately and objectively from the unreleased ones on the basis of the analysis of the released ones.
In an optional embodiment, the text data of a target object may be text data that characterizes the features of the target object. Optionally, the text data of a target object may include, but is not limited to: the title of the target object, its lead actors and their roles, its genre, the region where it was released, its language, its release date, its plot content (such as a plot synopsis), and its popularity data (e.g., box office).
For example, the above embodiment is illustrated with the application scenario of screening film scripts to invest in from a massive number of film scripts, where the target object is a film. The database stores the text data (e.g., historical film data) of multiple target objects (e.g., films from past years). This historical film data includes, but is not limited to, each film's title, lead actors (the role must be indicated), genre, region, language, release date, plot synopsis, and box office. When film scripts to invest in need to be screened from a massive number of film scripts, the historical film data stored in the database can be analyzed and the scripts screened according to the analysis results, so that the scripts more worth investing in are identified accurately, which helps ensure that the invested scripts can produce a high box office and improves the return on investment.
First, the historical film data of the multiple released films needs to be read from the database, so that the data read can subsequently be analyzed and film scripts screened according to the analysis results.
Further, the above embodiment is illustrated by taking the reading of the text data of one target object as an example. For instance, step S21 is illustrated with the application scenario in which the target object is the film "So Young" (《致青春》). Reading the text data of "So Young" from the database yields the following: title, lead actors (with roles), genre, language, release date, plot synopsis, and box office, where:
1) Title: 《致我们终将逝去的青春》 (English release title: "So Young");
2) Lead actors: Yang Zishan (female lead 1), Zhao Youting (male lead 1), Han Geng (male lead 2), Jiang Shuying (female lead 2), Liu Yase (female lead 3), Zhang Yao (female lead 4), Bao Bei'er (male lead 3), Zheng Kai (male lead 4), Wang Jiajia (female lead 5), etc.;
3) Genre: romance;
4) Language: Mandarin Chinese;
5) Release date: 2013-04-26;
6) Plot synopsis: Eighteen-year-old Zheng Wei (played by Yang Zishan) finally gets her wish and is admitted to the school next to the one attended by Lin Jing (played by Han Geng), the older boy next door she grew up with. She steps onto campus full of expectation, only to suffer a blow: Lin Jing has gone abroad to study and has vanished without a word. Zheng Wei feels deeply lost, but in this difficult time she forms a deep friendship with her roommates Ruan Guan (played by Jiang Shuying), Zhu Xiaobei (played by Liu Yase), and Li Weijuan (played by Zhang Yao), and with the senior student Lao Zhang, Zhang Kai (played by Bao Bei'er). Meanwhile Xu Kaiyang (played by Zheng Kai), the son of a wealthy family, pursues Zheng Wei madly, and Ruan Guan, who is very popular with the boys, guards with her characteristic coolness her unwavering loyalty to her boyfriend Zhao Shiyong (played by Huang Ming). An accidental misunderstanding makes Zheng Wei and Lao Zhang's roommate Chen Xiaozheng (played by Zhao Youting) bitter enemies. In one counterattack after another, Zheng Wei finds that she has fallen in love with this top student who is cold on the surface but kind at heart, and her furious counterattacks turn into a relentless pursuit. Chen Xiaozheng finally surrenders under the onslaught, and the quarrelsome pair become happy lovers. As they graduate in their fourth year, Zheng Wei's life is tested again: Chen Xiaozheng obtains the study-abroad place of Zeng Yu (played by Wang Jiajia) but does not dare to tell Zheng Wei, and Zheng Wei, feeling deceived once more, painfully leaves him. Many years later Zheng Wei has become an elegant white-collar professional, and she once again tastes the fickleness of fate: Lin Jing and Chen Xiaozheng, full of regret and love, return to her life at the same time! How will Zheng Wei, once the "Jade-Faced Little Flying Dragon", face the fog and the choices that life and youth have given her ...;
7) Box office: 726000000.
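The fields read in step S21 can be pictured as one record per target object. The following is a minimal sketch only: the class name `TargetObjectRecord` and its field names are hypothetical, and the application does not prescribe any particular data structure or database schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TargetObjectRecord:
    """Text data of one released target object (field names are illustrative)."""
    title: str
    genre: str
    language: str
    release_date: date
    synopsis: str
    box_office: int
    lead_actors: dict = field(default_factory=dict)  # actor name -> role label


# Values follow the example record listed above; the synopsis is abbreviated.
record = TargetObjectRecord(
    title="So Young",
    genre="romance",
    language="Mandarin Chinese",
    release_date=date(2013, 4, 26),
    synopsis="Eighteen-year-old Zheng Wei ...",
    box_office=726_000_000,
    lead_actors={
        "Yang Zishan": "female lead 1",
        "Zhao Youting": "male lead 1",
    },
)
print(record.title, record.release_date.year, record.box_office)
```

Keeping the release date as a `date` and the box office as an integer makes the later screening by year range and box-office threshold a simple comparison.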
Step S23: screen the multiple target objects to generate the target objects to be analyzed.
Optionally, the multiple target objects are screened according to the text data read for them, and the class of target objects obtained by the screening is taken as the target objects to be analyzed, where the target objects to be analyzed are of the same type as the target objects that are ultimately to be selected.
In an optional embodiment, the multiple target objects may be screened according to a preset screening rule and the text data read for them, and subsequent processing is performed on the target objects to be analyzed obtained by the screening. Optionally, the text data that satisfies the preset screening rule is filtered out of the text data read, and the target objects to which that text data belongs are taken as the target objects to be analyzed. The screening thus yields target objects to be analyzed of the same type as the target objects to be selected.
Optionally, the target objects read may be screened based on the title of the target object, its leading actors and their roles, its type, its region of release, its language, its release date, its plot content (e.g., the synopsis) and its popularity data (e.g., box office).
The above embodiment of the present application is again described using the application scenario in which screenplays to invest in are selected from a massive number of screenplays and the target object is a film. When a producer wants to invest in a romance film, romance screenplays have to be selected from a massive number of screenplays. Before selecting, the romance films already released can be analyzed to find which kinds of plot audiences prefer; when selecting, the screenplays that audiences are more likely to enjoy are chosen according to the analysis result, which improves the box office of the film. For example, the text data of multiple target objects (e.g., a massive amount of film data from past years) can be read from a database. Then, according to a preset screening rule (e.g., select the popular romance films released from 2012 to 2015, where a film is considered popular when its box office exceeds a predetermined threshold), the film data satisfying the rule is filtered out of the film data from past years, and the corresponding films (e.g., the popular romance films released from 2012 to 2015) are taken as the target objects to be analyzed.
For example, after the film data from past years is read from the database, the romance films that were released from 2012 to 2015 and whose box office exceeds the predetermined threshold are filtered out and taken as the above target objects to be analyzed.
In step S23 of the present application, a preliminary screening of the massive number of target objects yields the target objects to be analyzed. This keeps the target objects that correspond to the actual need and removes those unrelated to it, which reduces the amount of data to be processed in subsequent steps and thereby improves data-processing efficiency.
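The screening rule of step S23 can be illustrated with a minimal sketch. The `Film` record, the function `screen_films`, the sample records and the threshold value are all hypothetical names and figures chosen for illustration; the application itself does not prescribe a data layout.

```python
from dataclasses import dataclass


@dataclass
class Film:
    # Hypothetical record layout; the application only lists the kinds of fields.
    title: str
    genre: str
    release_year: int
    box_office: int


def screen_films(films, genre, first_year, last_year, box_office_threshold):
    """Keep films of one genre, released in [first_year, last_year],
    whose box office exceeds the threshold."""
    return [
        f for f in films
        if f.genre == genre
        and first_year <= f.release_year <= last_year
        and f.box_office > box_office_threshold
    ]


# Illustrative records only (not real data).
films = [
    Film("Film A", "romance", 2013, 500_000_000),
    Film("Film B", "romance", 2010, 900_000_000),  # outside the year range
    Film("Film C", "action", 2014, 800_000_000),   # wrong genre
    Film("Film D", "romance", 2014, 50_000_000),   # below the threshold
]
to_analyze = screen_films(films, "romance", 2012, 2015, 100_000_000)
print([f.title for f in to_analyze])
```

Only the records that pass all three conditions remain, which is what reduces the data volume handled by the later steps.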
Step S25: perform text preprocessing on the text data of the target objects to be analyzed to obtain multiple plot segments of the target objects to be analyzed.
Specifically, after the target objects to be analyzed are obtained by screening, text preprocessing is performed on their text data to obtain multiple plot segments. Optionally, the text preprocessing may include, but is not limited to, text extraction, sentence splitting, de-duplication and merging.
In an optional embodiment, text preprocessing converts the text data of the target objects to be analyzed into multiple comparable, processable plot segments, so that these segments can be used for modeling in subsequent processing. Optionally, the text preprocessing may be applied to the plot content in the text data, converting it into multiple comparable, processable plot segments. The plot content of the target objects to be analyzed is thereby converted into the individual stages of plot development, which provides the basis for subsequently mining and extracting the general rules of their plot development.
The above embodiment of the present application is again described using the application scenario in which screenplays to invest in are selected from a massive number of screenplays and the target object is a film. Take the case where the target objects to be analyzed are the popular romance films released from 2013 to 2015 obtained by screening, and the text preprocessing comprises text extraction, sentence splitting, de-duplication and merging. Text extraction is performed on the text data of these films to extract the plot content (e.g., the synopsis). Sentence splitting and de-duplication are then applied to the extracted synopsis to remove semantically repeated content, and merging combines the remaining clauses into multiple semantically coherent sentences, each of which represents a different plot point. The merged sentences are taken as the above plot segments. In this way the synopses of all popular romance films released from 2013 to 2015 are converted into comparable objects, so that the plot development of popular romance films can subsequently be analyzed and the romance screenplays more worth investing in can be selected from the massive number of screenplays according to the analysis result.
In step S25 of the present application, text preprocessing converts the text data of the target objects to be analyzed into multiple processable plot segments, which facilitates the subsequent modeling.
Step S27: model the multiple plot segments of the target objects to be analyzed to obtain a probabilistic model of the plot development of the target objects to be analyzed. The probabilistic model characterizes the transition results between any two or more of the plot segments of the target objects to be analyzed.
Optionally, a predetermined model is used to perform statistical modeling on the plot segments obtained and to analyze the transition relationships between them. This yields the probabilistic model of the plot development of the target objects to be analyzed, which characterizes the transition results between any two or more plot segments that these target objects contain.
In an optional embodiment, after text preprocessing is performed on the text data of the target objects to be analyzed, a statistical model is trained on the plot segments obtained to establish the transition relationships between them, yielding a probabilistic model of plot development that characterizes the development trend between plot segments. Optionally, because this probabilistic model is obtained by analyzing a massive number of target objects to be analyzed, it can serve as a general model of how plots advance in those target objects and objectively reflects their plot content.
The above embodiment of the present application is again described using the application scenario in which screenplays to invest in are selected from a massive number of screenplays and the target object is a film. After the target objects to be analyzed (e.g., the popular romance films released from 2013 to 2015) are obtained by screening, text preprocessing converts their text data (e.g., the synopsis) into multiple different plot segments, so that the plots of these films are converted into the stages of plot development. A predetermined statistical model is then trained on all plot segments of all popular romance films released from 2013 to 2015 to obtain the transition results between plot segments, for example the transition probability from one plot segment to another. The magnitude of the transition probability indicates how strongly one plot segment tends to develop into another: the larger the transition probability, the more likely the development from the one segment to the other. Through the probabilistic model of plot development, the relationships between the plot segments obtained in the above embodiment can be established and the development trend of each plot segment determined. Big-data techniques are thus used to mine the general rules of plot advancement in popular romance films, which helps producers make investment decisions based on the plot of a screenplay.
Step S29: use the probabilistic model of the plot development of the target objects to be analyzed to obtain the probabilities of their plot-state developments. A plot-state development may comprise any two or more plot segments.
Optionally, after the probabilistic model of plot development is generated from the plot segments of the massive number of target objects to be analyzed, this model is used to calculate, based on the plot segments that a plot-state development comprises, the probability of that plot-state development. From the probabilities of the various plot-state developments, the general rules of development of the plot content of target objects of this type can be analyzed.
The above embodiment of the present application is again described using the application scenario in which screenplays to invest in are selected from a massive number of screenplays and the target object is a film. Take the case where the target objects to be analyzed are the popular romance films released from 2013 to 2015. After the probabilistic model of plot development is built from the plot segments converted from the text data (e.g., the synopsis) of the target objects to be analyzed, the model is used to calculate the probabilities of plot-state developments comprising different plot segments. The larger the probability of a plot-state development, the more likely audiences are to enjoy that kind of plot, and the more likely films are to unfold their plot in that way. From the probabilities of the different plot-state developments of popular romance films, it can be determined which plot-state developments audiences prefer in romance films, and the screenplays more worth investing in can be selected from the massive number of new screenplays based on the analysis result.
Further, based on the probabilities of plot-state developments, the producer can readily learn from the text data how the plots of films of past years develop, and can compare this with a new screenplay at hand that is to be screened. If the probability of the plot-state development of the new screenplay is low, i.e., audiences are unlikely to enjoy it, the screenplay can be discarded directly. If the probability is high, i.e., the screenplay matches the plots that audiences are likely to enjoy, the screenwriters can be engaged to refine the plot into a better screenplay, further improving the success rate of the investment.
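As a rough illustration of how a plot-state development could be scored, the sketch below multiplies the transition probabilities along a sequence of states. The product rule is an assumption (it is the usual way to score a path in a first-order Markov chain); the application does not spell out how the probabilities are combined. The function name, state names and probability values are hypothetical.

```python
def development_probability(states, transition_prob):
    """Score a sequence of plot states as the product of its transition
    probabilities (assumed first-order Markov chain)."""
    probability = 1.0
    for current, following in zip(states, states[1:]):
        # A transition never seen in the analyzed films contributes 0.
        probability *= transition_prob.get((current, following), 0.0)
    return probability


# Hypothetical transition probabilities between abstracted plot states.
transition_prob = {
    ("meet", "become enemies"): 0.5,
    ("become enemies", "pursue"): 0.4,
    ("pursue", "become couple"): 0.5,
}

new_script = ["meet", "become enemies", "pursue", "become couple"]
score = development_probability(new_script, transition_prob)
print(score)
```

A producer could compare such scores across candidate screenplays; a screenplay containing a transition that never occurs in the analyzed films scores zero under this simple rule, so some smoothing may be preferable in practice.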
In the scheme disclosed in Embodiment 1 of the present application, when a certain class of target objects is to be selected from a massive number of target objects that have not yet been released, the text data of multiple released target objects is read and screened to obtain target objects to be analyzed of the same type as the required target objects. Text preprocessing is performed on the text data of the target objects to be analyzed to obtain their plot segments, and the plot segments are modeled to obtain the probabilistic model of their plot development. The scheme then uses this probabilistic model to obtain the probabilities of the plot-state developments of the target objects to be analyzed, and selects the required target objects from the massive number of unreleased target objects according to those probabilities.
It is readily noted that, when the required target objects are selected from the massive number of unreleased target objects, only the text data of the multiple released target objects needs to be analyzed. Statistical modeling yields the probabilities of the plot-state developments of target objects to be analyzed of the same type as the required target objects, and from the text data of the released target objects and these objectively grounded probabilities it can be analyzed objectively which unreleased target objects of the required type audiences are more likely to enjoy. The scheme provided by the embodiment of the present application therefore requires no manual reading of the text data of the massive number of target objects, and mines the general rules of plot-state development of a class of target objects from the text data of the massive number of released target objects. This not only allows the required target objects that audiences are more likely to enjoy to be selected accurately and objectively from the massive number of unreleased target objects according to those general rules; in addition, the screening and the text preprocessing reduce the amount of data to be processed during the mining. The required target objects can thus be selected from the massive number of unreleased target objects objectively, accurately and efficiently.
Thus, the scheme of Embodiment 1 provided by the present application solves the technical problem in the prior art that, when target objects are screened, manually reading their text data is highly subjective and makes the screening result inaccurate.
According to the above embodiment of the present application, step S23, screening the multiple target objects to generate the target objects to be analyzed, may include:
Step S231: classify the multiple target objects using preset theme types to obtain the group of target objects contained in each theme type.
Specifically, using the preset theme types, the target objects read are classified into multiple groups according to the type given in their text data, each group corresponding to one theme type.
Optionally, the preset theme types may include comedy, tragedy, history, action, romance, crime, horror, suspense, animation, fantasy, family and other types; the present application does not limit the specific division of theme types.
In an optional embodiment, after the text data of the multiple released target objects is read, a coarse-grained classification may first be made according to the type in the text data, and preset theme types (such as theme types generated by LDA) are then used to assign the target objects to finer-grained theme types. For example, after a target object (e.g., a film) is classified under the theme type of romance film, it can be further divided into theme types such as youth, marriage and war.
The above embodiment of the present application is further described, in conjunction with the embodiment shown in Fig. 3, using the application scenario in which screenplays to invest in are selected from a massive number of screenplays and the target object is a film, with the text data of one target object as the example. As shown in Fig. 3, after the text data of multiple released target objects (e.g., film data from past years) is read from the database 30 that stores film data from past years, the movie theme classifier 31 can assign one of the target objects (e.g., "So Young") to the theme type of romance film; further, the topic model generated by LDA can be used to assign "So Young" to the youth theme.
Step S233: according to a predetermined rule, select from any group of target objects the objects whose attention level exceeds a predetermined threshold, to obtain the target objects to be analyzed.
Specifically, after the multiple target objects are classified using the preset theme types to obtain the group of target objects contained in each theme type, popular objects (i.e., objects whose attention level exceeds the predetermined threshold) are selected from any group according to the predetermined rule and taken as the target objects to be analyzed.
Optionally, the predetermined rule may include, but is not limited to: selecting the target objects whose attention level exceeds the predetermined threshold.
In an optional embodiment, if the target object is a film, a stage play or a speech, the attention level may be the box office; if the target object is a TV series or an advertisement, the attention level may be the audience rating.
Optionally, step S233 is intended to select the popular target objects of each category from the classified target objects. In the above embodiment of the present application, classification is performed before the popular target objects are generated because target objects of some types inherently attract only moderate attention; if popular target objects were generated first, target objects of those types would be screened out. Taking films as an example, films of some genres, such as art-house films, inherently have a moderate box office; if popular films were generated first, art-house films might not appear in the screening result at all.
Further, in this embodiment, because what is popular in films changes quickly, the film data of the most recent three to five years may be used, and the predetermined rule is used to select the films that far exceed the average box office as popular films. The predetermined rule may be, but is not limited to, "select the films whose box office exceeds the median box office of their category in a given year".
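The example rule quoted above ("exceeds the median box office of the category in a given year") can be sketched as follows. The dictionary keys, the function name `select_popular` and the sample figures are hypothetical; the sketch only illustrates why grouping by category before thresholding lets lower-grossing categories keep their own popular titles.

```python
from collections import defaultdict
from statistics import median


def select_popular(films):
    """Keep films whose box office exceeds the median of their
    (theme, year) group. `films` is a list of dicts (hypothetical layout)."""
    groups = defaultdict(list)
    for film in films:
        groups[(film["theme"], film["year"])].append(film["box_office"])
    medians = {key: median(values) for key, values in groups.items()}
    return [
        film for film in films
        if film["box_office"] > medians[(film["theme"], film["year"])]
    ]


# Illustrative figures only (box office in millions).
films = [
    {"title": "R1", "theme": "romance", "year": 2013, "box_office": 100},
    {"title": "R2", "theme": "romance", "year": 2013, "box_office": 300},
    {"title": "R3", "theme": "romance", "year": 2013, "box_office": 700},
    {"title": "A1", "theme": "art-house", "year": 2013, "box_office": 5},
    {"title": "A2", "theme": "art-house", "year": 2013, "box_office": 20},
]
popular = select_popular(films)
print([film["title"] for film in popular])
```

With a single global threshold, the art-house title A2 would have been dropped; with a per-category median it is retained.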
The above embodiment of the present application is further described, in conjunction with the embodiment shown in Fig. 3, using the application scenario in which screenplays to invest in are selected from a massive number of screenplays and the target object is a film. As shown in Fig. 3, after the movie theme classifier 31 assigns "So Young" to the romance-youth theme type, the popular film generator 32 determines whether to classify "So Young" as a popular film according to whether its box office exceeds the predetermined threshold. If the box office of "So Young" exceeds the predetermined threshold, the film is classified as a popular film, and its theme and the film's own information are output in the classified popular film list 33, for example:
1) Title: So Young (literally, "To Our Youth That Will Eventually Pass");
2) Leading actors: Yang Zishan (Female 1), Zhao Youting (Male 1), Han Geng (Male 2), Jiang Shuying (Female 2), Liu Yase (Female 3), Zhang Yao (Female 4), Bao Bei'er (Male 3), Zheng Kai (Male 4), Wang Jiajia (Female 5), Huang Ming (Male 5);
3) Language: Standard Chinese;
4) Release date: 2013-04-26;
5) Synopsis: Eighteen-year-old Zheng Wei (played by Yang Zishan) finally gets her wish and is admitted to the university next to the one attended by Lin Jing (played by Han Geng), the boy next door she grew up with. She steps onto campus full of expectation, only to be hit by the news that Lin Jing has gone abroad to study and left no word. Dejected, Zheng Wei forms deep friendships in this difficult time with her roommates Ruan Guan (played by Jiang Shuying), Zhu Xiaobei (played by Liu Yase) and Li Weijuan (played by Zhang Yao), and with the senior student Lao Zhang, Zhang Kai (played by Bao Bei'er). Meanwhile the rich young man Xu Kaiyang (played by Zheng Kai) pursues Zheng Wei frantically, and Ruan Guan, popular with the boys, guards her loyalty to her boyfriend Zhao Shiyong (played by Huang Ming) with her characteristic coolness. A chance misunderstanding makes Zheng Wei and Lao Zhang's roommate Chen Xiaozheng (played by Zhao Youting) sworn enemies; in one counterattack after another, Zheng Wei finds that she has fallen in love with this top student who is cold on the surface but kind at heart, and her frantic counterattacks turn into a relentless pursuit. Chen Xiaozheng finally surrenders under the onslaught, and the quarrelsome pair become sweethearts. At graduation in the fourth year, Zheng Wei's life is tested again: Chen Xiaozheng obtains the study-abroad place of Zeng Yu (played by Wang Jiajia) but does not dare to tell Zheng Wei, and Zheng Wei, feeling deceived once more, leaves Chen Xiaozheng in pain. Years later, Zheng Wei has become a white-collar beauty in the workplace and once again experiences the fickleness of fate: Lin Jing and Chen Xiaozheng, both full of regret and love, return to her life at the same time! How will Zheng Wei, once the "Jade-Faced Little Flying Dragon", face the fog and the choices that life and youth have given her ...;
6) Theme type: romance-youth.
According to the above embodiment of the present application, step S25, performing text preprocessing on the text data of the target objects to be analyzed to obtain multiple plot segments of the target objects to be analyzed, may include:
Step S251: extract the plot content of the target object to be analyzed from the text data of the target object to be analyzed.
Specifically, the plot content of the target object to be analyzed is extracted from its text data.
The above embodiment of the present application is further described, in conjunction with the embodiment shown in Fig. 3, using the application scenario in which screenplays to invest in are selected from a massive number of screenplays and the target object is a film. For example, after the plot content (e.g., the synopsis) is extracted from the text data (e.g., film data from past years) of the target objects to be analyzed (e.g., the popular romance films of 2013 to 2015), the synopsis of the film that has been read is input to the film synopsis segmenter 34.
Step S252: perform fine-grained or coarse-grained sentence splitting on the plot content of the target object to be analyzed.
Optionally, fine-grained sentence splitting may be performed on the plot content using punctuation marks (e.g., commas, periods and semicolons), or coarse-grained sentence splitting may be performed on the plot content using punctuation marks (e.g., periods).
In an optional embodiment, after coarse-grained sentence splitting is performed on the plot content of the target object to be analyzed, the sentences obtained may be split further according to a predetermined splitting rule.
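The two splitting granularities of step S252 can be sketched with the standard `re` module. The delimiter sets follow the examples given above (commas, periods and semicolons for fine-grained splitting; periods for coarse-grained splitting), with the full-width Chinese forms added; the function name `split_clauses` is hypothetical. A production splitter would also need to handle periods inside numbers or abbreviations, which this sketch ignores.

```python
import re

# Half-width and full-width forms of the punctuation named in the text.
FINE_DELIMITERS = r"[,;.,;。]"   # comma, semicolon, period
COARSE_DELIMITERS = r"[.。]"       # period only


def split_clauses(text, fine=True):
    """Split plot content into clauses at fine or coarse granularity."""
    pattern = FINE_DELIMITERS if fine else COARSE_DELIMITERS
    parts = re.split(pattern, text)
    # Drop empty fragments left by trailing or adjacent delimiters.
    return [part.strip() for part in parts if part.strip()]


text = "A meets B, and they argue. B leaves; A stays."
fine_clauses = split_clauses(text, fine=True)
coarse_clauses = split_clauses(text, fine=False)
print(fine_clauses)
print(coarse_clauses)
```

Coarse-grained output can be passed through `split_clauses(..., fine=True)` again, which mirrors the optional two-stage splitting described above.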
In an optional embodiment, after the fine-grained or coarse-grained sentence splitting of the plot content of the target object to be analyzed in step S252, the above method may further include:
Step S2521: de-duplicate the clauses obtained by sentence splitting using a semantic model, to obtain the de-duplicated clauses.
Optionally, an existing, relatively mature semantic model may be used to remove the semantically repeated clauses, yielding multiple de-duplicated clauses.
Step S2523: merge the semantically coherent clauses among the de-duplicated clauses.
Optionally, pronouns and conjunctions, supplemented by some rules (e.g., clauses separated by a period are not joined), may be used to piece the de-duplicated clauses in this embodiment back together, so that semantically coherent clauses are merged.
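Steps S2521 and S2523 can be sketched as below. Two simplifications are assumptions of this sketch and not part of the application: word-overlap (Jaccard) similarity stands in for the semantic model, and a small hand-written cue list stands in for pronoun and conjunction detection. Each clause carries a flag saying whether it was followed by a period, so that the "do not join across a period" rule can be applied.

```python
# Hypothetical cue words: a clause starting with one of these is treated
# as continuing the previous clause.
LINK_WORDS = {"and", "but", "then", "so", "he", "she", "they", "it"}


def jaccard(a, b):
    """Word-overlap similarity; a stand-in for a real semantic model."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def deduplicate(clauses, threshold=0.8):
    """Drop clauses that are near-duplicates of an earlier clause.
    `clauses` is a list of (text, ends_with_period) pairs."""
    kept = []
    for text, ends_with_period in clauses:
        if all(jaccard(text, earlier) < threshold for earlier, _ in kept):
            kept.append((text, ends_with_period))
    return kept


def merge_coherent(clauses):
    """Join a clause to the previous one when it starts with a cue word
    and the previous clause was not closed by a period."""
    merged = []
    previous_closed = True
    for text, ends_with_period in clauses:
        words = text.split()
        first_word = words[0].lower() if words else ""
        if merged and not previous_closed and first_word in LINK_WORDS:
            merged[-1] = merged[-1] + ", " + text
        else:
            merged.append(text)
        previous_closed = ends_with_period
    return merged


clauses = [
    ("A meets B", False),
    ("and they become enemies", True),
    ("A meets B", False),            # repeated content
    ("then A pursues B", True),
]
segments = merge_coherent(deduplicate(clauses))
print(segments)
```

The third clause is removed as a duplicate, the second is attached to the first, and the last stays separate because the clause before it ended with a period.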
The above embodiment of the present application is further described, in conjunction with the embodiment shown in Fig. 3, using the application scenario in which screenplays to invest in are selected from a massive number of screenplays and the target object is a film. As shown in Fig. 3, the film synopsis segmenter 34 divides the synopsis of the input film into several parts. Continuing with "So Young" as the example, the film synopsis segmenter 34 yields the following parts:
a) Eighteen-year-old Zheng Wei (played by Yang Zishan) finally gets her wish and is admitted to the university next to the one attended by Lin Jing (played by Han Geng), the boy next door she grew up with;
b) Lin Jing goes abroad to study and leaves no word;
c) Dejected, Zheng Wei forms deep friendships in this difficult time with her roommates Ruan Guan (played by Jiang Shuying), Zhu Xiaobei (played by Liu Yase) and Li Weijuan (played by Zhang Yao), and with the senior student Lao Zhang, Zhang Kai (played by Bao Bei'er);
d) Meanwhile the rich young man Xu Kaiyang (played by Zheng Kai) pursues Zheng Wei frantically;
e) Ruan Guan, popular with the boys, guards her loyalty to her boyfriend Zhao Shiyong (played by Huang Ming) with her characteristic coolness;
f) A chance misunderstanding makes Zheng Wei and Lao Zhang's roommate Chen Xiaozheng (played by Zhao Youting) sworn enemies;
g) In one counterattack after another, Zheng Wei finds that she has fallen in love with this top student who is cold on the surface but kind at heart, and her frantic counterattacks turn into a relentless pursuit;
h) Chen Xiaozheng finally surrenders under the onslaught, and the quarrelsome pair become sweethearts;
i) Chen Xiaozheng obtains the study-abroad place of Zeng Yu (played by Wang Jiajia) but does not dare to tell Zheng Wei;
j) Feeling deceived once more, Zheng Wei leaves Chen Xiaozheng in pain;
k) Years later, Zheng Wei has become a white-collar beauty in the workplace and once again experiences the fickleness of fate: Lin Jing and Chen Xiaozheng, both full of regret and love, return to her life at the same time.
Step S253: abstract the plot content obtained by sentence splitting to obtain the multiple plot segments.
Specifically, step S253 is intended to extract the main content from the plot content obtained by sentence splitting, to facilitate the subsequent statistical analysis and modeling.
In an optional embodiment, step S253, abstracting the plot content obtained by sentence splitting to obtain the multiple plot segments, may include:
Step S2531: perform word segmentation on the plot content obtained by sentence splitting and remove stop words, to obtain the multiple plot segments.
After the multiple plot segments are obtained, the sentences that satisfy a preset condition are extracted from each plot segment according to semantics, and the subjects of those sentences are replaced with predetermined generic terms.
Specifically, word segmentation is performed on the plot content obtained by sentence splitting and the stop words are removed to obtain the multiple plot segments; the sentences that satisfy the preset condition are then extracted from each plot segment according to semantics, and their subjects are replaced with predetermined generic terms.
In an optional embodiment, the sentences that satisfy the preset condition may be extracted from each plot segment by using an existing, relatively mature subject-recognition method to extract the main part of the plot segment.
The above embodiment of the present application is further described, in conjunction with the embodiment shown in Fig. 3, using the application scenario in which screenplays to invest in are selected from a massive number of screenplays and the target object is a film. As shown in Fig. 3, continuing with "So Young" as the example, after the film synopsis segmenter 34 yields the above parts, the film plot abstractor 35 segments the sentences of those parts into words and removes the stop words, then uses an existing, relatively mature subject-recognition method to extract the main part of the remaining text, and uses the cast information to abstract the characters into predetermined generic terms (e.g., the common film terms "Female 1", "Male 1"). The film plot abstractor 35 thereby converts items a) to k) above into the following:
a) Female 1 is admitted to Male 2's university;
b) Male 2 goes abroad and falls silent;
c) Female 1 becomes friends with Female 2, Female 3, Female 4 and Male 3;
d) Male 4 pursues Female 1;
e) Female 2 pursues Male 5;
f) Female 1 and Male 1 become enemies by accident;
g) Female 1 pursues Male 1;
h) Male 1 accepts Female 1;
i) Male 1 obtains Female 5's study-abroad place and does not tell Female 1;
j) Female 1 leaves Male 1;
k) Female 1 and Male 1 meet again years later.
Further, as shown in figure 3, after the abstract device of film plot 35 obtains above-mentioned content, can will be above-mentioned Content further arrange, and input to segmentation plot list 36, wherein, the content after arrangement is exemplified below:
1) Plot No.: 1; Subject: Female One; Predicate: is admitted to; Object: Male Two, university;
2) Plot No.: 2; Subject: Male Two; Predicate: goes abroad.
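The organized record above can be sketched as a small data structure. The following is a minimal illustration, assuming a hypothetical cast table (`CAST_ROLES`) and hypothetical character names; the class name, field names and the simple string replacement are assumptions made for this sketch and are not the subject recognition method referred to in the embodiment.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical cast table: character name -> predetermined generic word.
# The character names are invented for illustration only.
CAST_ROLES = {"Alice": "Female One", "Bob": "Male Two"}


@dataclass
class SegmentedPlot:
    """One entry of the segmented plot list: number, subject, predicate, objects."""
    plot_no: int
    subject: str
    predicate: str
    objects: list[str] = field(default_factory=list)


def abstract_roles(text: str, cast_roles: dict[str, str]) -> str:
    """Replace each character name in the text with its generic role word."""
    # Longer names are replaced first so a name containing another is handled whole.
    for name in sorted(cast_roles, key=len, reverse=True):
        text = text.replace(name, cast_roles[name])
    return text


record = SegmentedPlot(
    plot_no=1,
    subject=abstract_roles("Alice", CAST_ROLES),
    predicate="is admitted to",
    objects=[abstract_roles("Bob", CAST_ROLES), "university"],
)
print(record)
```

Keeping subject, predicate and objects in separate fields makes it straightforward to drop the subject later, when only the event words of a state are summarized.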
According to the above embodiment of the present application, step S27, modeling the multiple segmented plots of the target objects to be analyzed to obtain a probability model of the plot development of the target objects to be analyzed, may include:
Step S271: modeling the multiple segmented plots of the target objects to be analyzed using any one of the following models: a Markov chain model, a hidden Markov chain model, or a bivariate Bayesian hierarchical model.
The probability model of the plot development of the target objects to be analyzed includes the transition probability between any two states, and each state includes at least one segmented plot.
Optionally, step S271 is intended to model the segmented plots with a statistical model and thereby obtain the probability model of plot development. For the modeling, a Markov chain model can be used to calculate the transition probability from one state to another. To make similar plots cluster together more easily, a hidden Markov chain model or a bivariate Bayesian hierarchical model can be used to model the segmented plots.
In this embodiment, the output of each of the above models is the set of transition probabilities between states.
Optionally, the transition probability between any two states can be calculated according to the following formula:
Transition probability from state x to state y = (number of transitions from state x to state y) / (number of transitions from state x to all states), where x and y are natural numbers, and state x and state y denote any two states in the probability model of plot development.
Optionally, after the multiple segmented plots of the target objects to be analyzed are obtained and modeled with any one of the above models, the probability of occurrence of a change from state x to state y is counted in the resulting probability model and taken as the transition probability from state x to state y. Specifically, over all states, the number of occurrences A of the event "state x changes to state y" is counted, together with the total number of occurrences B of state x changing to any state, and the ratio of A to B is used as the transition probability from state x to state y. In other words, the share of transitions from state x to state y among all transitions out of state x characterizes the probability that state x changes to state y.
In an optional embodiment, counting the number of transitions from state x to state y and the number of transitions from state x to all states can be implemented with the Hadoop MapReduce tool.
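The counting formula above can be sketched in a few lines. This is a minimal single-machine illustration, assuming each film has already been reduced to an ordered list of state numbers; the function name and the sample sequences are hypothetical, and a MapReduce job would perform the same two counts at scale.

```python
from collections import Counter, defaultdict


def transition_probabilities(state_chains):
    """Estimate P(x -> y) = count(x -> y) / count(x -> any state)."""
    pair_counts = Counter()  # occurrences of each transition x -> y
    out_counts = Counter()   # occurrences of any transition out of x
    for chain in state_chains:
        for x, y in zip(chain, chain[1:]):
            pair_counts[(x, y)] += 1
            out_counts[x] += 1
    probs = defaultdict(dict)
    for (x, y), n in pair_counts.items():
        probs[x][y] = n / out_counts[x]
    return dict(probs)


# Hypothetical state sequences, one per film.
chains = [[1, 2, 3, 6], [1, 2, 4, 5, 6], [1, 2, 3, 6], [1, 3]]
probs = transition_probabilities(chains)
print(probs[1])  # outgoing transition probabilities of state 1
```

By construction, the outgoing probabilities of every state that has at least one observed transition sum to 1.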
The above embodiment of the present application is described below with reference to the embodiment shown in Fig. 3, again taking as an example the application scenario of screening film scripts to invest in from a massive number of film scripts, where the target object is a film. As shown in Fig. 3, the segmented plot list 36 inputs multiple segmented plots into the plot development model 37, which models them using a Markov chain model, a hidden Markov chain model or a bivariate Bayesian hierarchical model and obtains multiple states and the transition probabilities between them. For example, when the segmented plots consist largely of love-triangle plots, the plot development model 37 might produce the following result:
1) State 1: Female One and Female Two are best friends; Female One and Female Two are classmates; Female One and Female Two have known each other for a long time;
2) State 2: Female One and Male One are a couple; Female One has an unrequited love for Male One;
3) State 3: Female Two runs into Male One at a party; Female One, Female Two and Male One usually get along very well;
4) State 4: Female One falls ill; Female One has a traffic accident;
5) State 5: Female Two and Male One look after Female One together; Female Two and Male One meet by chance in the hospital;
6) State 6: sparks fly between Female Two and Male One;
……
Here, the transition probability from state 1 to state 2 is 0.9, from state 2 to state 3 is 0.7, from state 2 to state 4 is 0.3, from state 3 to state 6 is 0.5, from state 4 to state 5 is 0.6, and from state 5 to state 6 is 0.4.
In an optional embodiment, step S29, using the probability model of the plot development of the target objects to be analyzed to obtain the probability of a plot state development of the target objects to be analyzed, may include:
Step S291: obtaining any one or more plot state developments, where a plot state development is a state chain, and the state chain includes at least one state and the transition order of the states.
Step S293: calculating the probability of the plot state development using the transition probabilities between adjacent states.
Optionally, the transition probabilities between adjacent states in one or more state chains are obtained and used to calculate the probability of each state chain (that is, of each plot state development).
In an optional embodiment, the state chain may include multiple states and the transition order of the states, and the probability P of the plot state development can be calculated by the following formula:
P = P(A_1) × P(A_2) × … × P(A_i) × … × P(A_n), where P(A_i) denotes the transition probability between two adjacent states, and i is a natural number.
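The product formula can be checked against the love-triangle transition probabilities given earlier in this embodiment. The sketch below is illustrative only: the function name is hypothetical, and treating a transition that was never observed as probability 0 is an assumption of this sketch, not something the embodiment specifies.

```python
import math

# Transition probabilities from the love-triangle example in this embodiment.
TRANSITIONS = {
    (1, 2): 0.9, (2, 3): 0.7, (2, 4): 0.3,
    (3, 6): 0.5, (4, 5): 0.6, (5, 6): 0.4,
}


def chain_probability(chain, transitions):
    """Multiply the transition probabilities of each adjacent pair in the chain."""
    # Assumption: a transition absent from the table has probability 0.
    return math.prod(transitions.get(pair, 0.0) for pair in zip(chain, chain[1:]))


p_a = chain_probability([1, 2, 3, 6], TRANSITIONS)     # 0.9 * 0.7 * 0.5
p_b = chain_probability([1, 2, 4, 5, 6], TRANSITIONS)  # 0.9 * 0.3 * 0.6 * 0.4
print(p_a, p_b)
```

With these numbers the chain 1, 2, 3, 6 is roughly five times as likely as the chain 1, 2, 4, 5, 6, which is the kind of comparison the state chain probabilities are meant to support.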
Optionally, because a massive number of target objects have already been played, a single state may contain many segmented plots. So that an investor can see at a glance what a state means, a model needs to be built for the state to serve as its summary. A currently mature TF-IDF model can be used to find the words that best reflect the state. Before this model is applied, the subjects in the state are first removed, and only the words that describe events, such as predicates and adverbials, are put into the model.
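The state summary step can be sketched with a hand-written TF-IDF so that no external library is needed. This is a minimal illustration under stated assumptions: each state is treated as one document, the IDF variant is the plain log(N / df), ties are broken alphabetically, and the function name and sample event words are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(state_words, top_k=2):
    """Return the top_k highest-scoring TF-IDF words for each state.

    state_words maps a state number to the event words (predicates,
    adverbials) of its segmented plots, with the subjects already removed.
    """
    n_states = len(state_words)
    doc_freq = Counter()
    for words in state_words.values():
        doc_freq.update(set(words))  # count each word once per state
    keywords = {}
    for state, words in state_words.items():
        tf = Counter(words)
        scores = {
            w: (c / len(words)) * math.log(n_states / doc_freq[w])
            for w, c in tf.items()
        }
        # Highest score first; alphabetical order breaks ties deterministically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[state] = ranked[:top_k]
    return keywords


# Hypothetical event words for two states.
states = {
    1: ["befriend", "befriend", "know", "meet"],
    2: ["love", "love", "like", "meet"],
}
keywords = tfidf_keywords(states)
print(keywords)
```

The word "meet" appears in both states, so its IDF is zero and it is never chosen as a summary word, which is the behaviour the TF-IDF step relies on.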
The above embodiment of the present application is described below with reference to the embodiment shown in Fig. 3, again taking as an example the application scenario of screening film scripts to invest in from a massive number of film scripts, where the target object is a film. As shown in Fig. 3, after the plot development model 37 obtains the states, the plot description model 38 generates a state description for each state using the TF-IDF model. Continuing with the film 《Cause Youth》 as an example, the plot description model 38 might output the following:
State: 1; Subject: Female One, Female Two; Event: get along well, best friends, good friends;
State: 2; Subject: Female One, Male One; Event: like, be fond of, unrequited love;
……
After the plot description model 38 outputs its result to the film storyline list 39, the film storyline list 39 calculates the probability of each state chain using the transition probabilities from the plot description model 38. Its output takes the form of plot state developments (that is, state chains) and their corresponding probabilities, as follows:
Plot state development: 1; State chain: 1, 2, 3, 5; Probability: 0.85;
Plot state development: 2; State chain: 2, 4, 6, 8; Probability: 0.7;
……
By associating the output of the film storyline list 39 with the state descriptions output by the plot description model 38, the meaning of each plot state development can be readily understood. The probability of a plot state development can be interpreted as the probability that audiences like that storyline.
Through the above embodiment of the present application, a theme-type film plot generation module based on a Markov chain model (the plot development model shown in Fig. 3) can, based on current hot topics and films from past years, use big-data text mining to analyze past films of different theme types and mine the popular plot content in them, and use the Markov chain model to infer how film storylines develop. The potential of a new script can thus be evaluated automatically from the model's results, and a new film script can also be generated automatically from the inferred storyline development.
In the embodiment shown in Fig. 3 of the present application, big-data text mining is used to first decompose a massive number of films into several genres at a certain granularity, then abstract the different stages of plot development from popular films of the same genre and mine the rules by which film plots develop. This helps film producers better understand which storylines audiences like and supports their investment decisions. With the above embodiment, popular films are mined in depth from the angle of the storyline, and the output is an abstracted storyline outline together with the probability that audiences like it. A producer can easily understand these plot developments from the text and compare them with a new script at hand: if the probability that audiences like the script's plot is very low, the script can be abandoned directly; if the script largely matches a plot that audiences are highly likely to enjoy, the producer can work with the screenwriters to refine the plot and obtain a better script.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present application.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to execute the methods described in the embodiments of the present application.
Embodiment 2
According to an embodiment of the present application, a text data processing apparatus for implementing the above text data processing method is also provided. As shown in Fig. 4, the apparatus may include: a reading unit 41, a generating unit 43, a processing unit 45, a modeling unit 47 and an acquiring unit 49.
The reading unit 41 is configured to read the text data of multiple target objects that have been played, where the target object includes any one of the following: a film, a TV series, a stage play, a documentary, a speech and an advertisement.
Optionally, the text data of the played target objects may be stored in a database. With the reading unit 41 of the present application, when a certain class of target object needs to be screened from a massive number of unplayed target objects, the stored text data of multiple played target objects can be read from the database, so that the class of target object can be screened accurately and objectively from the unplayed target objects based on the analysis of the played ones.
In an optional embodiment, the text data of a target object may be text data that characterizes the features of the target object. Optionally, the text data may include but is not limited to: the title of the target object, its leading actors and their roles, its type, its release region, its language, its release time, its plot content (such as a plot synopsis) and its popularity data (e.g., box office).
The generating unit 43 is configured to screen the multiple target objects to generate the target objects to be analyzed.
Optionally, the multiple target objects are screened according to the text data read for them, and the class of target objects obtained by the screening is taken as the target objects to be analyzed, where the target objects to be analyzed are of the same type as the target objects that need to be screened.
In an optional embodiment, the multiple target objects can be screened according to a preset screening rule and the text data read for them, and subsequent processing is performed on the target objects to be analyzed obtained by the screening. Optionally, the text data that satisfies the preset screening rule can be filtered out of the text data read, and the target objects to which that text data belongs are taken as the target objects to be analyzed, so that the screening yields target objects to be analyzed of the same type as the target objects that need to be screened.
Optionally, the multiple target objects read can be screened based on the title of the target object, its leading actors and their roles, its type, its release region, its language, its release time, its plot content (such as a plot synopsis) and its popularity data (e.g., box office).
With the generating unit 43 of the present application, a massive number of target objects are preliminarily screened and the result is taken as the target objects to be analyzed. This retains the target objects that correspond to the actual need and removes those unrelated to it, which reduces the amount of data to be handled in subsequent processing and thus improves the efficiency of data processing.
The processing unit 45 is configured to perform text preprocessing on the text data of the target objects to be analyzed to obtain multiple segmented plots of the target objects to be analyzed.
Specifically, after the target objects to be analyzed are obtained by screening, text preprocessing is performed on their text data to obtain their multiple segmented plots. Optionally, the text preprocessing may include but is not limited to text extraction, sentence splitting, deduplication and merging.
In an optional embodiment, text preprocessing converts the text data of the target objects to be analyzed into multiple comparable, processable segmented plots, so that these segmented plots can be used for modeling in subsequent processing. Optionally, text preprocessing can be performed on the plot content in the text data of the target objects to be analyzed to convert it into multiple comparable, processable segmented plots. The plot content is thereby converted into the individual stages of plot development, which provides a basis for subsequently mining and extracting the general rules of plot development of the target objects to be analyzed.
With the processing unit 45 of the present application, text preprocessing converts the text data of the target objects to be analyzed into multiple processable segmented plots, which facilitates the subsequent modeling.
The modeling unit 47 is configured to model the multiple segmented plots of the target objects to be analyzed to obtain a probability model of their plot development, where the probability model characterizes the transitions between any two or more of the segmented plots contained in the target objects to be analyzed.
Optionally, statistical modeling is performed on the segmented plots using a predetermined model, and the transition relationships between the segmented plots are analyzed to obtain the probability model of the plot development of the target objects to be analyzed, which characterizes the transitions between any two or more segmented plots that they contain.
In an optional embodiment, after text preprocessing is performed on the text data of the target objects to be analyzed, a statistical model is trained on the resulting segmented plots to establish the transition relationships between them, yielding a probability model of plot development that characterizes the development trend between segmented plots. Optionally, because this probability model is obtained by analyzing a massive number of target objects to be analyzed, it can serve as a general model of how the plot advances in those target objects and objectively reflects their plot content.
The acquiring unit 49 is configured to use the probability model of the plot development of the target objects to be analyzed to obtain the probability of a plot state development of the target objects to be analyzed, where the plot state development includes any two or more segmented plots.
Optionally, after the probability model of plot development is generated from the segmented plots of a massive number of target objects to be analyzed, the probability model is used to calculate, based on the segmented plots contained in a plot state development, the probability of that plot state development. From the probabilities of the individual plot state developments, the general rules by which the plot content of target objects of this type develops can then be analyzed.
In the scheme disclosed in Embodiment 2 of the present application, if a certain class of target object is to be screened from a massive number of unplayed target objects, the text data of multiple played target objects can be read and screened to obtain target objects to be analyzed that are of the same type as the required target objects. Text preprocessing is then performed on the text data of the target objects to be analyzed to obtain their multiple segmented plots, and these segmented plots are modeled to obtain the probability model of their plot development. The scheme can then use this probability model to obtain the probabilities of the plot state developments of the target objects to be analyzed, and screen the required target objects from the unplayed target objects according to those probabilities.
It is easy to see that, when screening the required target objects from a massive number of unplayed target objects, only the text data of the played target objects needs to be analyzed. Statistical modeling yields the probabilities of the plot state developments of the target objects to be analyzed, which are of the same type as the required target objects. From the text data of the played target objects and these probabilities, which reflect objective reality, it can be objectively determined which unplayed target objects of the required type are more likely to be liked by audiences. With the scheme provided by this embodiment, therefore, there is no need to read the text data of a massive number of target objects manually; the general rules of plot state development for a class of target object can be mined from the text data of the played target objects. This not only makes it possible to screen, accurately and objectively, the required target objects that audiences are more likely to enjoy from the unplayed target objects according to those general rules, but also reduces the amount of data to be processed during mining through the screening and the text preprocessing, so that the required target objects can be screened objectively, accurately and efficiently.
The scheme of Embodiment 2 provided by the present application thus solves the technical problem in the prior art that, when screening target objects, the screening result is inaccurate because manually reading the text data of target objects is highly subjective.
According to the above embodiment of the present application, as shown in Fig. 5, the generating unit 43 may include: a classification module 51 and a screening module 53.
The classification module 51 is configured to classify the multiple target objects using preset theme types to obtain the group of target objects contained in any one theme type.
Specifically, using the preset theme types, the multiple target objects read are classified into multiple groups according to the type given in their text data, with each group corresponding to one theme type.
Optionally, the preset theme types may include comedy, tragedy, history, action, romance, crime, horror, suspense, animation, fantasy, family and other types; the present application does not limit the specific division of theme types.
In an optional embodiment, after the text data of the multiple played target objects is read, a coarse-grained classification can be made according to the type in the text data of each target object, and the preset theme types (such as theme types generated by LDA) can then be used to divide the target objects further into several fine-grained theme types. For example, after a target object (such as a film) is assigned to the theme type of romance film, it can be further divided into theme types such as youth, marriage and war.
The screening module 53 is configured to screen, from any one group of target objects according to a predetermined rule, the objects whose attention level exceeds a predetermined threshold, to obtain the target objects to be analyzed.
Specifically, after the multiple target objects are classified using the preset theme types into the group of target objects contained in each theme type, popular objects (that is, objects whose attention level exceeds the predetermined threshold) are screened from any one group according to the predetermined rule, and the objects obtained are taken as the target objects to be analyzed.
Optionally, the predetermined rule may include but is not limited to: selecting the target objects whose attention level exceeds the predetermined threshold.
In an optional embodiment, if the target object is a film, a stage play or a speech, the attention level may be the box office; if the target object is a TV series or an advertisement, the attention level may be the audience rating.
Optionally, the screening module 53 is intended to pick out the popular target objects of each class from the classified target objects. In the above embodiment of the present application, the reason for classifying first and then selecting popular target objects is that some types of target object inherently attract only moderate attention; if popular target objects were selected first, target objects of such types would be screened out. Taking films as an example, films of some genres, such as art films, inherently have modest box office; if popular films were selected first, art films might not appear in the screening result at all.
Further, in this embodiment, because film trends optionally change quickly, film data from the last three to five years can be used, and the predetermined rule is applied to select films whose box office is well above average as popular films. The predetermined rule may be, but is not limited to, "select the films whose box office exceeds the median box office of their category in a given year".
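The "classify first, then select popular objects" order can be sketched with the median rule quoted above. The following is a minimal illustration; the film records, field names and function name are hypothetical, and grouping by genre and year is this sketch's reading of "category in a given year".

```python
from collections import defaultdict
from statistics import median


def select_popular(films):
    """Keep the films whose box office exceeds the median of their (genre, year) group."""
    groups = defaultdict(list)
    for film in films:
        groups[(film["genre"], film["year"])].append(film["box_office"])
    medians = {key: median(values) for key, values in groups.items()}
    return [f for f in films if f["box_office"] > medians[(f["genre"], f["year"])]]


# Hypothetical records; box office figures are invented for illustration.
films = [
    {"title": "A", "genre": "romance", "year": 2014, "box_office": 700},
    {"title": "B", "genre": "romance", "year": 2014, "box_office": 100},
    {"title": "C", "genre": "romance", "year": 2014, "box_office": 300},
    {"title": "D", "genre": "art", "year": 2014, "box_office": 20},
    {"title": "E", "genre": "art", "year": 2014, "box_office": 5},
]
popular = [f["title"] for f in select_popular(films)]
print(popular)
```

Because the median is taken within each genre, the art film "D" is kept even though its box office is far below every romance film, which is the effect the embodiment aims for by classifying before screening.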
According to the above embodiment of the present application, as shown in Fig. 6, the processing unit 45 may include: an extraction module 61, a sentence splitting module 63 and an abstraction module 65.
The extraction module 61 is configured to extract the plot content of the target objects to be analyzed from their text data.
Specifically, the plot content of each target object to be analyzed is extracted from its text data.
The sentence splitting module 63 is configured to perform fine-grained or coarse-grained sentence splitting on the plot content of the target objects to be analyzed.
Optionally, punctuation marks (such as commas, full stops and semicolons) can be used to split the plot content into fine-grained clauses, or punctuation marks (such as full stops) can be used to split it into coarse-grained sentences.
In an optional embodiment, after coarse-grained sentence splitting is performed on the plot content of the target objects to be analyzed, the resulting sentences can be split further according to a predetermined splitting rule.
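The two splitting granularities can be sketched with regular expressions. This is a minimal illustration: the punctuation sets follow the examples in the text (full stops for coarse splitting; commas, full stops and semicolons for fine splitting), while the function name and sample synopsis are hypothetical. Splitting on every full stop would also break decimals and abbreviations, which a real predetermined splitting rule would need to handle.

```python
import re

COARSE = r"[.。]"       # full stops only (ASCII and CJK)
FINE = r"[.。,，;；]"   # full stops, commas and semicolons


def split_clauses(text, fine_grained=False):
    """Split plot content on punctuation and drop empty fragments."""
    pattern = FINE if fine_grained else COARSE
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


synopsis = (
    "Female One enters university, and she makes friends. "
    "Male Two goes abroad; he stays silent."
)
coarse = split_clauses(synopsis)
fine = split_clauses(synopsis, fine_grained=True)
print(coarse)
print(fine)
```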
In an optional embodiment, as shown in Fig. 7, the above apparatus may further include: a deduplication module 71 and a merging module 73.
The deduplication module 71 is configured to, after fine-grained or coarse-grained sentence splitting is performed on the plot content of the target objects to be analyzed, deduplicate the resulting clauses using a semantic model to obtain the deduplicated clauses.
Optionally, a currently mature semantic model can be used to remove semantically duplicated clauses and obtain the multiple deduplicated clauses.
The merging module 73 is configured to merge the semantically coherent clauses among the deduplicated clauses.
Optionally, pronouns and conjunctions, supplemented by some rules (for example, clauses separated by a full stop cannot be joined), can be used to piece the deduplicated clauses of this embodiment back together, so that clauses that are semantically coherent are merged.
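The merging rule can be sketched as follows. This is only an illustration under stated assumptions: a clause is merged into the previous one when it starts with a pronoun or conjunction and the previous clause did not end with a full stop. The word list `LINK_WORDS`, the function names and the sample text are hypothetical, and deduplication is assumed to have happened already.

```python
import re

# Hypothetical list of pronouns and conjunctions that signal continuation.
LINK_WORDS = {"and", "but", "so", "then", "he", "she", "they", "it"}


def split_with_terminators(text):
    """Split into (clause, terminator) pairs on commas, semicolons and full stops."""
    pairs = re.findall(r"([^,;.]+)([,;.]?)", text)
    return [(clause.strip(), term) for clause, term in pairs if clause.strip()]


def merge_coherent(pairs):
    """Merge a clause into the previous one unless a full stop separates them."""
    merged = []
    prev_term = "."  # nothing precedes the first clause
    for clause, term in pairs:
        first_word = clause.split()[0].lower()
        if merged and prev_term != "." and first_word in LINK_WORDS:
            merged[-1] = merged[-1] + " " + clause
        else:
            merged.append(clause)
        prev_term = term
    return merged


text = (
    "Female One enters university, and she makes friends. "
    "He goes abroad; Male Four pursues Female One."
)
result = merge_coherent(split_with_terminators(text))
print(result)
```

In the sample, "and she makes friends" is joined to the preceding clause, while "He goes abroad" stays separate because a full stop precedes it, matching the example rule given in the text.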
The abstraction module 65 is configured to abstract the plot content after sentence splitting to obtain the multiple segmented plots.
Specifically, the abstraction module 65 is intended to extract the main content from the plot content after sentence splitting, to facilitate subsequent statistical analysis and modeling.
In an optional embodiment, as shown in Fig. 8, the abstraction module 65 may include: a word segmentation submodule 81.
The word segmentation submodule 81 is configured to perform word segmentation on the plot content after sentence splitting and remove stop words to obtain the multiple segmented plots; after the segmented plots are obtained, the sentences in each segmented plot that meet a preset condition are extracted according to semantics, and the subjects in those sentences are replaced with predetermined generic words.
Specifically, after word segmentation is performed on the plot content after sentence splitting, the stop words are removed to obtain the multiple segmented plots; the sentences in each segmented plot that meet the preset condition are then extracted according to semantics, and the subjects in those sentences are replaced with the predetermined generic words.
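Word segmentation and stop-word removal can be sketched as below. This is a minimal illustration: whitespace tokenization stands in for a real word segmenter (an assumption, since the source text would need a dedicated segmenter), and the stop-word list and function names are hypothetical.

```python
# Hypothetical stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "is"}


def tokenize(clause):
    """Split on whitespace, strip trailing punctuation and lowercase each token."""
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return [w.strip(".,;").lower() for w in clause.split()]


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop empty tokens and tokens found in the stop-word list."""
    return [t for t in tokens if t and t not in stop_words]


tokens = remove_stop_words(
    tokenize("Female One is admitted to the university of Male Two.")
)
print(tokens)
```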
In an optional embodiment, extracting, according to semantics, the sentences in each segmented plot that meet the preset condition may be done by using a currently mature subject recognition method to extract the main part of the segmented plot.
According to the above embodiments of the present application, as shown in figure 9, modeling unit 47 can include:MBM 91.
Wherein, MBM 91 is used for using the following a plurality of segmentation feelings to destination object to be analyzed for any one model Section is modeled:Markov chain model, Hidden Markov chain model and bivariate Bayesian hierarchical approach.
Wherein, the transfer that the probabilistic model of the dramatic progression of destination object to be analyzed is included between any two state is general Rate, state includes at least one segmentation plot.
Alternatively, above-mentioned MBM 91 is intended to using statistical models, segmentation plot is modeled, thus obtaining feelings The probabilistic model of section development.In terms of modeling, it is possible to use Markov chain model is calculating by a state to another The transition probability of individual state.Plot in order to make close is easier to condense together, it is possible to use Hidden Markov Chain model or bivariate Bayesian hierarchical approach are modeled to a plurality of segmentation plot.
In this embodiment, the output of each model above-mentioned is the transition probability between state and state.
Alternatively, the transition probability between any two state can calculate according to equation below:
State x to state y transition probability=state x to state y frequency/state x to all states send out Raw number of times.Wherein, x, y are natural number;State x and state y represent in the above-mentioned probabilistic model of dramatic progression Any two state.
Optionally, after the plurality of plot segments of the target object to be analyzed are obtained and modeled with any one of the above models, the probability that state x changes into state y is computed from the resulting probabilistic model and used as the transition probability from state x to state y. Specifically, over all states, the number of occurrences A of the event "state x changes into state y" is counted, together with the number of occurrences B of state x changing into any state (the total number of transitions out of state x). The ratio of A to B is used as the transition probability from state x to state y. In other words, the share of transitions out of state x that lead to state y characterizes the probability of the event "state x changes into state y".
In an optional embodiment, the number of occurrences of the transition from state x to state y and the number of occurrences of transitions from state x to all states may be counted with Hadoop MapReduce.
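The counting formula above can be sketched in a few lines. This is a single-process illustration under stated assumptions, not the Hadoop MapReduce job mentioned in the text: the function name `transition_probabilities`, the input layout (one list of state labels per work), and the state labels themselves are hypothetical.

```python
from collections import Counter

def transition_probabilities(state_chains):
    """Estimate P(x -> y) as count(x -> y) / count(x -> any state)."""
    pair_counts = Counter()   # occurrences of each transition x -> y
    out_counts = Counter()    # occurrences of any transition out of x
    for chain in state_chains:
        for x, y in zip(chain, chain[1:]):
            pair_counts[(x, y)] += 1
            out_counts[x] += 1
    return {(x, y): n / out_counts[x] for (x, y), n in pair_counts.items()}

# Hypothetical state chains, one per already-played work.
chains = [
    ["meet", "quarrel", "reconcile"],
    ["meet", "quarrel", "part"],
    ["meet", "reconcile"],
]
probs = transition_probabilities(chains)
print(probs[("meet", "quarrel")])   # 2 of the 3 transitions out of "meet"
```

In a MapReduce setting the two counters would typically be produced by emitting one key per observed transition and summing per key; the ratio is the same.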
In an optional embodiment, as shown in Figure 10, the acquiring unit 49 may include an acquisition module 1001 and a computing module 1003.
The acquisition module 1001 is configured to obtain any one or more plot state developments, where a plot state development is a state chain, and the state chain includes at least one state and the transfer sequence of the states.
The computing module 1003 is configured to calculate the probability of the plot state development using the transition probabilities between adjacent states.
Optionally, the transition probabilities between adjacent states in one or more state chains are obtained, and the probability of each state chain (that is, of the plot state development) is calculated from them.
In an optional embodiment, as shown in Figure 11, the state chain includes multiple states and the transfer sequence of each state, and the computing module 1003 may include a calculating submodule 1101, configured to calculate the probability P of the plot state development by the following formula:
$P = P(A_1) \times P(A_2) \times \cdots \times P(A_i) \times \cdots \times P(A_n)$, where $P(A_i)$ denotes the transition probability between two adjacent states and $i$ is a natural number.
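As a worked illustration of the product formula, the sketch below multiplies the transition probabilities along a state chain. The function name `chain_probability`, the state labels, and the probability values are hypothetical, and treating an unseen transition as probability 0 is an assumption of this sketch rather than something the text specifies.

```python
import math

def chain_probability(chain, probs):
    """Multiply the transition probabilities of adjacent states in a chain."""
    # Assumption: a transition never observed in the data has probability 0.
    return math.prod(probs.get((x, y), 0.0) for x, y in zip(chain, chain[1:]))

# Hypothetical transition probabilities between states.
probs = {("meet", "quarrel"): 0.5, ("quarrel", "reconcile"): 0.25}

p = chain_probability(["meet", "quarrel", "reconcile"], probs)
print(p)   # 0.5 * 0.25
```

A chain with a single state has no transitions, so the empty product yields 1.0 in this sketch.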
Optionally, because there is a massive number of already-played target objects, a single state may contain many plot segments. So that investors can understand at a glance what a state represents, a model needs to be built for the state to serve as its summary. The TF-IDF model, which is currently relatively mature, may be used to find the words that best reflect the state. Before this model is used, the subjects in the state must first be removed, so that only the predicates, adverbials, and other words describing the event are fed into the model.
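A minimal sketch of such a TF-IDF summary follows, assuming each state has already been reduced to a non-empty list of event words with the subjects removed. The function name `tfidf_summary`, the state names, and the sample words are hypothetical. The weighting used here (term frequency times log(N / document frequency)) is one common TF-IDF variant, chosen for the sketch and not taken from the text.

```python
import math
from collections import Counter

def tfidf_summary(states, top_k=2):
    """Return the top_k highest-scoring TF-IDF words for each state."""
    n = len(states)
    df = Counter()                       # number of states containing each word
    for tokens in states.values():
        df.update(set(tokens))
    summary = {}
    for name, tokens in states.items():
        tf = Counter(tokens)
        scores = {w: (c / len(tokens)) * math.log(n / df[w]) for w, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        summary[name] = ranked[:top_k]
    return summary

# Hypothetical states, each a bag of event words with subjects already removed.
states = {
    "state_1": ["quarrel", "leave", "quarrel", "cry"],
    "state_2": ["reconcile", "hug", "leave"],
}
summary = tfidf_summary(states)
print(summary)
```

Words that occur in every state, such as `leave` here, score zero and drop out of the summary, which is the behavior wanted from a state label.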
Optionally, in this embodiment, the above apparatus for processing text data may be applied in a hardware environment formed by the computer terminal 10 shown in Figure 1. As shown in Figure 1, the computer terminal 10 is connected to other computer terminals through a network, and the network includes, but is not limited to, a wide area network, a metropolitan area network, or a local area network.
Embodiment 3
An embodiment of the present application may provide a computer terminal, which may be any computer terminal in a group of computer terminals. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one of multiple network devices of a computer network.
In this embodiment, the computer terminal may execute program code for the following steps of the text data processing method: reading text data of multiple target objects that have already been played, where each target object is any one of the following: a film, a TV series, a modern drama, a documentary, a speech, or an advertisement; screening the multiple target objects to generate a target object to be analyzed; performing text preprocessing on the text data of the target object to be analyzed to obtain a plurality of plot segments of the target object to be analyzed; modeling the plurality of plot segments of the target object to be analyzed to obtain a probabilistic model of the plot development of the target object to be analyzed, where the probabilistic model characterizes the transition results of any two or more of the plot segments of the target object to be analyzed; and using the probabilistic model of the plot development of the target object to be analyzed to obtain the probability of a plot state development of the target object to be analyzed, where the plot state development includes any two or more plot segments.
Optionally, Figure 12 is a structural block diagram of a computer terminal according to an embodiment of the present application. As shown in Figure 12, the computer terminal A may include one or more processors 1201 (only one is shown in the figure), a memory 1202, and a transmission device 1203.
The memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the text data processing method and apparatus in the embodiments of the present application. By running the software programs and modules stored in the memory, the processor executes various functional applications and data processing, that is, implements the above text data processing method. The memory may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and such remote memory may be connected to the terminal A through a network. Examples of the network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The processor may call, through the transmission device, the information and application programs stored in the memory to execute the following steps: reading text data of multiple target objects that have already been played, where each target object is any one of the following: a film, a TV series, a modern drama, a documentary, a speech, or an advertisement; screening the multiple target objects to generate a target object to be analyzed; performing text preprocessing on the text data of the target object to be analyzed to obtain a plurality of plot segments of the target object to be analyzed; modeling the plurality of plot segments of the target object to be analyzed to obtain a probabilistic model of the plot development of the target object to be analyzed, where the probabilistic model characterizes the transition results of any two or more of the plot segments of the target object to be analyzed; and using the probabilistic model of the plot development of the target object to be analyzed to obtain the probability of a plot state development of the target object to be analyzed, where the plot state development includes any two or more plot segments.
Optionally, the processor may also execute program code for the following steps: classifying the multiple target objects using preset theme types to obtain the group of target objects contained in each theme type; and screening, according to a predetermined rule, the objects whose attention level exceeds a predetermined threshold from any one group of target objects to obtain the target object to be analyzed.
Optionally, the processor may also execute program code for the following steps: extracting the plot content of the target object to be analyzed from the text data of the target object to be analyzed; performing fine-grained or coarse-grained sentence splitting on the plot content of the target object to be analyzed; and performing abstraction on the sentence-split plot content to obtain the plurality of plot segments.
Optionally, the processor may also execute program code for the following steps: deduplicating the clauses obtained by sentence splitting using a semantic model to obtain deduplicated clauses; and merging those deduplicated clauses that are semantically coherent.
Optionally, the processor may also execute program code for the following steps: performing word segmentation on the sentence-split plot content and removing stop words to obtain the plurality of plot segments; where, after the plurality of plot segments are obtained, sentences meeting a preset condition are extracted from each plot segment according to semantics, and the subjects of the sentences meeting the preset condition are replaced with predetermined generic words.
Optionally, the processor may also execute program code for the following steps: modeling the plurality of plot segments of the target object to be analyzed using any one of the following models: a Markov chain model, a hidden Markov chain model, or a bivariate Bayesian hierarchical approach; where the probabilistic model of the plot development of the target object to be analyzed includes the transition probability between any two states, and each state includes at least one plot segment.
Optionally, the processor may also execute program code for the following steps: obtaining any one or more plot state developments, where a plot state development is a state chain, and the state chain includes at least one state and the transfer sequence of the states; and calculating the probability of the plot state development using the transition probabilities between adjacent states.
Optionally, the processor may also execute program code for the following steps: where the state chain includes multiple states and the transfer sequence of each state, calculating the probability P of the plot state development by the following formula: $P = P(A_1) \times P(A_2) \times \cdots \times P(A_i) \times \cdots \times P(A_n)$, where $P(A_i)$ denotes the transition probability between two adjacent states and $i$ is a natural number.
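Of the optional steps listed above, the classification and screening step could be sketched as follows. This is only an illustration under assumptions: the record fields `title`, `theme`, and `attention`, the function names, and the threshold value are hypothetical, and "attention level exceeds a threshold" is used as the entire predetermined rule.

```python
from collections import defaultdict

def classify_by_theme(targets):
    """Group target objects by their preset theme type."""
    groups = defaultdict(list)
    for target in targets:
        groups[target["theme"]].append(target)
    return dict(groups)

def screen_group(group, threshold):
    """Keep the titles whose attention level exceeds the threshold."""
    return [t["title"] for t in group if t["attention"] > threshold]

# Hypothetical already-played target objects.
targets = [
    {"title": "A", "theme": "romance", "attention": 0.9},
    {"title": "B", "theme": "romance", "attention": 0.2},
    {"title": "C", "theme": "crime", "attention": 0.7},
]
groups = classify_by_theme(targets)
to_analyze = screen_group(groups["romance"], threshold=0.5)
print(to_analyze)
```

Screening within one theme group at a time keeps the later plot model specific to the type of target object that is actually wanted.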
If it is desired to screen a certain category of target objects from a massive number of target objects that have not yet been played, the solution disclosed in the above embodiments of the present application can read the text data of multiple target objects that have already been played and screen those target objects to obtain target objects to be analyzed that are of the same type as the required target objects. Text preprocessing is then performed on the text data of the target object to be analyzed to obtain its plurality of plot segments, and the plot segments are modeled to obtain a probabilistic model of the plot development of the target object to be analyzed. Using this probabilistic model, the solution obtains the probability of the plot state development of the target object to be analyzed, and then screens the required target objects from the massive number of unplayed target objects according to the obtained probability.
It is easy to note that, when the required target objects are screened from a massive number of unplayed target objects, it is only necessary to analyze the text data of the multiple target objects that have already been played and to obtain, by statistical modeling, the probability of the plot state development of the target objects to be analyzed that are of the same type as the required target objects. Based on the text data of the played target objects and a plot state development probability that reflects objective reality, it is then possible to analyze objectively which target objects of the required type among the unplayed ones are more likely to be liked by audiences. The solution provided by the embodiments of the present application therefore does not require manual reading of the text data of a massive number of target objects, and it can mine the general rules of plot state development of a certain category of target objects from the text data of the massive number of played target objects. This makes it possible to screen, accurately and objectively according to those general rules, the required target objects that audiences prefer from the massive number of unplayed target objects. In addition, the screening and the text preprocessing reduce the amount of data that must be processed while the rules are mined. The required target objects can thus be screened from the massive number of unplayed target objects objectively, accurately, and efficiently.
Thus, the solution of the above embodiments provided by the present application solves the technical problem in the prior art that, when target objects are screened, the screening result is inaccurate because manually reading the text data of the target objects is highly subjective.
A person of ordinary skill in the art will understand that the structure shown in Figure 12 is only illustrative. The computer terminal may also be a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Devices, MID), or a PAD. Figure 12 does not limit the structure of the above electronic device. For example, the computer terminal A may include more or fewer components than shown in Figure 12 (such as a network interface or a display device), or may have a configuration different from that shown in Figure 12.
A person of ordinary skill in the art will understand that all or part of the steps in the methods of the above embodiments may be completed by a program instructing hardware related to the terminal device. The program may be stored in a computer-readable storage medium, which may include a flash disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or the like.
Embodiment 4
An embodiment of the present application further provides a storage medium. Optionally, in this embodiment, the storage medium may be used to store the program code executed by the text data processing method provided in Embodiment 1 above.
Optionally, in this embodiment, the storage medium may be located in any computer terminal in a group of computer terminals in a computer network, or in any mobile terminal in a group of mobile terminals.
Optionally, in this embodiment, the storage medium is configured to store program code for executing the following steps: reading text data of multiple target objects that have already been played, where each target object is any one of the following: a film, a TV series, a modern drama, a documentary, a speech, or an advertisement; screening the multiple target objects to generate a target object to be analyzed; performing text preprocessing on the text data of the target object to be analyzed to obtain a plurality of plot segments of the target object to be analyzed; modeling the plurality of plot segments of the target object to be analyzed to obtain a probabilistic model of the plot development of the target object to be analyzed, where the probabilistic model characterizes the transition results of any two or more of the plot segments of the target object to be analyzed; and using the probabilistic model of the plot development of the target object to be analyzed to obtain the probability of a plot state development of the target object to be analyzed, where the plot state development includes any two or more plot segments.
Optionally, the storage medium is further configured to store program code for executing the following steps: classifying the multiple target objects using preset theme types to obtain the group of target objects contained in each theme type; and screening, according to a predetermined rule, the objects whose attention level exceeds a predetermined threshold from any one group of target objects to obtain the target object to be analyzed.
Optionally, the storage medium is further configured to store program code for executing the following steps: extracting the plot content of the target object to be analyzed from the text data of the target object to be analyzed; performing fine-grained or coarse-grained sentence splitting on the plot content of the target object to be analyzed; and performing abstraction on the sentence-split plot content to obtain the plurality of plot segments.
Optionally, the storage medium is further configured to store program code for executing the following steps: deduplicating the clauses obtained by sentence splitting using a semantic model to obtain deduplicated clauses; and merging those deduplicated clauses that are semantically coherent.
Optionally, the storage medium is further configured to store program code for executing the following steps: performing word segmentation on the sentence-split plot content and removing stop words to obtain the plurality of plot segments; where, after the plurality of plot segments are obtained, sentences meeting a preset condition are extracted from each plot segment according to semantics, and the subjects of the sentences meeting the preset condition are replaced with predetermined generic words.
Optionally, the storage medium is further configured to store program code for executing the following steps: modeling the plurality of plot segments of the target object to be analyzed using any one of the following models: a Markov chain model, a hidden Markov chain model, or a bivariate Bayesian hierarchical approach; where the probabilistic model of the plot development of the target object to be analyzed includes the transition probability between any two states, and each state includes at least one plot segment.
Optionally, the storage medium is further configured to store program code for executing the following steps: obtaining any one or more plot state developments, where a plot state development is a state chain, and the state chain includes at least one state and the transfer sequence of the states; and calculating the probability of the plot state development using the transition probabilities between adjacent states.
Optionally, the storage medium is further configured to store program code for executing the following steps: where the state chain includes multiple states and the transfer sequence of each state, calculating the probability P of the plot state development by the following formula: $P = P(A_1) \times P(A_2) \times \cdots \times P(A_i) \times \cdots \times P(A_n)$, where $P(A_i)$ denotes the transition probability between two adjacent states and $i$ is a natural number.
The serial numbers of the above embodiments of the present application are for description only and do not represent the relative merits of the embodiments.
In the above embodiments of the present application, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
It should be understood that the technical content disclosed in the several embodiments provided in the present application may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
The above are only preferred implementations of the present application. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements shall also be regarded as falling within the protection scope of the present application.

Claims (16)

1. A method for processing text data, characterized by comprising:
reading text data of multiple target objects that have already been played, wherein each target object is any one of the following: a film, a TV series, a modern drama, a documentary, a speech, or an advertisement;
screening the multiple target objects to generate a target object to be analyzed;
performing text preprocessing on the text data of the target object to be analyzed to obtain a plurality of plot segments of the target object to be analyzed;
modeling the plurality of plot segments of the target object to be analyzed to obtain a probabilistic model of the plot development of the target object to be analyzed, wherein the probabilistic model characterizes the transition results of any two or more of the plot segments of the target object to be analyzed;
using the probabilistic model of the plot development of the target object to be analyzed to obtain the probability of a plot state development of the target object to be analyzed, wherein the plot state development includes the any two or more plot segments.
2. The method according to claim 1, characterized in that screening the multiple target objects to generate a target object to be analyzed comprises:
classifying the multiple target objects using preset theme types to obtain the group of target objects contained in each theme type;
screening, according to a predetermined rule, the objects whose attention level exceeds a predetermined threshold from any one group of target objects to obtain the target object to be analyzed.
3. The method according to claim 1, characterized in that performing text preprocessing on the text data of the target object to be analyzed to obtain a plurality of plot segments of the target object to be analyzed comprises:
extracting the plot content of the target object to be analyzed from the text data of the target object to be analyzed;
performing fine-grained or coarse-grained sentence splitting on the plot content of the target object to be analyzed;
performing abstraction on the sentence-split plot content to obtain the plurality of plot segments.
4. The method according to claim 3, characterized in that, after the fine-grained or coarse-grained sentence splitting is performed on the plot content of the target object to be analyzed, the method further comprises:
deduplicating the clauses obtained by the sentence splitting using a semantic model to obtain deduplicated clauses;
merging those deduplicated clauses that are semantically coherent.
5. The method according to claim 3, characterized in that performing abstraction on the sentence-split plot content to obtain the plurality of plot segments comprises:
performing word segmentation on the sentence-split plot content and removing stop words to obtain the plurality of plot segments;
wherein, after the plurality of plot segments are obtained, sentences meeting a preset condition are extracted from each plot segment according to semantics, and the subjects of the sentences meeting the preset condition are replaced with predetermined generic words.
6. The method according to any one of claims 1-5, characterized in that modeling the plurality of plot segments of the target object to be analyzed to obtain a probabilistic model of the plot development of the target object to be analyzed comprises:
modeling the plurality of plot segments of the target object to be analyzed using any one of the following models: a Markov chain model, a hidden Markov chain model, or a bivariate Bayesian hierarchical approach;
wherein the probabilistic model of the plot development of the target object to be analyzed includes the transition probability between any two states, and each state includes at least one plot segment.
7. The method according to claim 6, characterized in that using the probabilistic model of the plot development of the target object to be analyzed to obtain the probability of a plot state development of the target object to be analyzed comprises:
obtaining any one or more plot state developments, wherein the plot state development is a state chain, and the state chain includes at least one state and the transfer sequence of the states;
calculating the probability of the plot state development using the transition probabilities between adjacent states.
8. The method according to claim 7, characterized in that the state chain includes multiple states and the transfer sequence of each state, and the probability P of the plot state development is calculated by the following formula:
$P = P(A_1) \times P(A_2) \times \cdots \times P(A_i) \times \cdots \times P(A_n)$, wherein $P(A_i)$ denotes the transition probability between the two adjacent states and $i$ is a natural number.
9. An apparatus for processing text data, characterized by comprising:
a reading unit, configured to read text data of multiple target objects that have already been played, wherein each target object is any one of the following: a film, a TV series, a modern drama, a documentary, a speech, or an advertisement;
a generating unit, configured to screen the multiple target objects to generate a target object to be analyzed;
a processing unit, configured to perform text preprocessing on the text data of the target object to be analyzed to obtain a plurality of plot segments of the target object to be analyzed;
a modeling unit, configured to model the plurality of plot segments of the target object to be analyzed to obtain a probabilistic model of the plot development of the target object to be analyzed, wherein the probabilistic model characterizes the transition results of any two or more of the plot segments of the target object to be analyzed;
an acquiring unit, configured to use the probabilistic model of the plot development of the target object to be analyzed to obtain the probability of a plot state development of the target object to be analyzed, wherein the plot state development includes the any two or more plot segments.
10. The apparatus according to claim 9, characterized in that the generating unit comprises:
a classification module, configured to classify the multiple target objects using preset theme types to obtain the group of target objects contained in each theme type;
a screening module, configured to screen, according to a predetermined rule, the objects whose attention level exceeds a predetermined threshold from any one group of target objects to obtain the target object to be analyzed.
11. The apparatus according to claim 9, characterized in that the processing unit comprises:
an extraction module, configured to extract the plot content of the target object to be analyzed from the text data of the target object to be analyzed;
a sentence-splitting module, configured to perform fine-grained or coarse-grained sentence splitting on the plot content of the target object to be analyzed;
an abstraction module, configured to perform abstraction on the sentence-split plot content to obtain the plurality of plot segments.
12. The apparatus according to claim 11, characterized in that the apparatus further comprises:
a deduplication module, configured to deduplicate, using a semantic model, the clauses obtained by the sentence splitting after the fine-grained or coarse-grained sentence splitting is performed on the plot content of the target object to be analyzed, to obtain deduplicated clauses;
a merging module, configured to merge those deduplicated clauses that are semantically coherent.
13. The device according to claim 11, characterized in that the abstraction module comprises:
A word-segmentation submodule, configured to perform word segmentation on the plot content after the sentence splitting and to remove stop words, to obtain the plurality of segmented plots;
Wherein, after the plurality of segmented plots is obtained, the sentences satisfying a preset condition are extracted from each segmented plot according to semantics, and the subject of each sentence satisfying the preset condition is replaced with a predetermined general word.
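A minimal sketch of the abstraction in claim 13, under several assumptions: whitespace tokenization stands in for a real word segmenter (which Chinese text would require), the stop-word list is a toy one, the first remaining token is treated as the subject, and the name-to-general-word mapping `GENERIC_SUBJECTS` is hypothetical.

```python
# Toy stop-word list (assumption; a real system would use a full list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to"}

# Hypothetical mapping from specific character names to general words.
GENERIC_SUBJECTS = {"alice": "PROTAGONIST", "bob": "RIVAL"}


def abstract_clause(clause):
    """Tokenize a clause, drop stop words, and generalize its subject.

    The subject is assumed to be the first remaining token, which is a
    simplification of proper grammatical analysis.
    """
    tokens = [t for t in clause.lower().split() if t not in STOP_WORDS]
    if tokens and tokens[0] in GENERIC_SUBJECTS:
        tokens[0] = GENERIC_SUBJECTS[tokens[0]]
    return tokens


result = abstract_clause("Alice confronts Bob and the truth emerges")
```

Replacing the subject with a general word lets clauses from different works map to the same abstract plot segment, which is what makes them comparable as states in the later modeling step.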
14. The device according to any one of claims 9 to 13, characterized in that the modeling unit comprises:
A modeling module, configured to model the plurality of segmented plots of the target object to be analyzed using any one of the following models: a Markov chain model, a hidden Markov chain model, and a bivariate Bayesian hierarchical approach;
Wherein the probabilistic model of the plot development of the target object to be analyzed comprises the transition probability between any two states, and each state comprises at least one segmented plot.
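For the Markov chain option of claim 14, the transition probabilities between states can be estimated by counting how often one state follows another. The sketch below assumes a first-order chain fitted by simple frequency counts, and the state labels and sequences are invented for the example; the claim itself does not prescribe an estimation procedure.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Estimate first-order transition probabilities from state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every adjacent (current, next) pair in the sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical plot-state sequences, one per analyzed work.
sequences = [
    ["meet", "conflict", "reconcile"],
    ["meet", "conflict", "separate"],
    ["meet", "reconcile"],
]
transitions = estimate_transitions(sequences)
```

In this toy data, "meet" is followed by "conflict" in two of three sequences, so the estimated transition probability from "meet" to "conflict" is 2/3.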
15. The device according to claim 14, characterized in that the acquiring unit comprises:
An acquisition module, configured to obtain any one or more plot state developments, wherein each plot state development is a state chain, and the state chain comprises at least one state and the transition order of the states;
A computing module, configured to calculate the probability of the plot state development using the transition probabilities between adjacent states.
16. The device according to claim 15, characterized in that the state chain comprises a plurality of states and the transition order of the states, and the computing module comprises: a calculation submodule, configured to calculate the probability P of the plot state development by the following formula:
$P = P(A_1) \times P(A_2) \times \cdots \times P(A_i) \times \cdots \times P(A_n)$, where $P(A_i)$ denotes the transition probability between two adjacent states and $i$ is a natural number.
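The product formula of claim 16 can be sketched as a walk along the state chain that multiplies the transition probability of each adjacent pair. The transition table below is hypothetical, and treating an unseen transition as probability 0 is an assumption of this sketch, not something the claim states. `math.prod` requires Python 3.8 or later.

```python
import math


def chain_probability(chain, transitions):
    """Multiply the transition probabilities of adjacent states in the chain.

    Transitions missing from the table are treated as probability 0.0
    (an assumption). A chain with a single state yields 1.0, the empty product.
    """
    return math.prod(
        transitions.get(a, {}).get(b, 0.0) for a, b in zip(chain, chain[1:])
    )


# Hypothetical transition probabilities between plot states.
transitions = {
    "meet": {"conflict": 0.5, "reconcile": 0.5},
    "conflict": {"reconcile": 0.4, "separate": 0.6},
}

p = chain_probability(["meet", "conflict", "separate"], transitions)
```

For the chain meet, conflict, separate, the two factors are the transition probabilities 0.5 and 0.6, so `p` is their product.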
CN201510509639.7A 2015-08-18 2015-08-18 The treating method and apparatus of text data Active CN106469170B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510509639.7A CN106469170B (en) 2015-08-18 2015-08-18 The treating method and apparatus of text data

Publications (2)

Publication Number Publication Date
CN106469170A true CN106469170A (en) 2017-03-01
CN106469170B CN106469170B (en) 2019-09-10

Family

ID=58214848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510509639.7A Active CN106469170B (en) 2015-08-18 2015-08-18 The treating method and apparatus of text data

Country Status (1)

Country Link
CN (1) CN106469170B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706794A (en) * 2009-11-24 2010-05-12 上海显智信息科技有限公司 Information browsing and retrieval method based on semantic entity-relationship model and visualized recommendation
CN102831234A (en) * 2012-08-31 2012-12-19 北京邮电大学 Personalized news recommendation device and method based on news content and theme feature
CN103324712A (en) * 2013-06-19 2013-09-25 西北工业大学 Extraction method for non-redundancy plot rule
CN103914743A (en) * 2014-04-21 2014-07-09 中国科学技术大学先进技术研究院 On-line serial content popularity prediction method based on autoregressive model
US20150154246A1 (en) * 2013-12-03 2015-06-04 International Business Machines Corporation Recommendation Engine using Inferred Deep Similarities for Works of Literature
CN104965874A (en) * 2015-06-11 2015-10-07 腾讯科技(北京)有限公司 Information processing method and apparatus

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368529A (en) * 2017-06-13 2017-11-21 中国传媒大学 Documentary film data content feature obtains system and tag library
CN107404671A (en) * 2017-06-13 2017-11-28 中国传媒大学 Movie contents feature obtains system and application system
CN107577672A (en) * 2017-09-19 2018-01-12 网智天元科技集团股份有限公司 Method and apparatus based on public sentiment setting drama
CN107577672B (en) * 2017-09-19 2021-07-06 网智天元科技集团股份有限公司 Public opinion-based script setting method and device
CN107766330A (en) * 2017-10-25 2018-03-06 西安影视数据评估中心有限公司 A kind of system and method for carrying out this quality analysis of movie and television play
CN108460024B (en) * 2018-03-30 2019-03-15 掌阅科技股份有限公司 The generation method of e-book plot trend calculates equipment and computer storage medium
CN108460024A (en) * 2018-03-30 2018-08-28 掌阅科技股份有限公司 Generation method, computing device and the computer storage media of e-book plot trend
CN109902701A (en) * 2018-04-12 2019-06-18 华为技术有限公司 Image classification method and device
CN109063485A (en) * 2018-07-27 2018-12-21 东北大学秦皇岛分校 A kind of vulnerability classification statistical system and method based on loophole platform
CN109063485B (en) * 2018-07-27 2020-08-04 东北大学秦皇岛分校 Vulnerability classification statistical system and method based on vulnerability platform
CN111382282A (en) * 2018-12-28 2020-07-07 北京国双科技有限公司 Method, device, storage medium and processor for processing data
CN109902169A (en) * 2019-01-26 2019-06-18 北京工业大学 The method for promoting film recommender system performance based on caption information
CN109902169B (en) * 2019-01-26 2021-03-30 北京工业大学 Method for improving performance of film recommendation system based on film subtitle information

Also Published As

Publication number Publication date
CN106469170B (en) 2019-09-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right
Effective date of registration: 20211118
Address after: Room 507, floor 5, building 3, No. 969, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province
Patentee after: ZHEJIANG TMALL TECHNOLOGY Co.,Ltd.
Address before: Box 847, four, Grand Cayman capital, Cayman Islands, UK
Patentee before: ALIBABA GROUP HOLDING Ltd.