
CN109743589A - Article generation method and device - Google Patents


Info

Publication number
CN109743589A
Authority
CN
China
Prior art keywords
paragraph
sentence
adjacent
words
time difference
Prior art date
Legal status
Granted
Application number
CN201811600339.XA
Other languages
Chinese (zh)
Other versions
CN109743589B (en)
Inventor
陈杰
张玉东
杨宏生
Current Assignee
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811600339.XA
Publication of CN109743589A
Application granted
Publication of CN109743589B
Legal status: Active
Anticipated expiration


Landscapes

  • User Interface Of Digital Computer (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides an article generation method and device. The method includes: obtaining a video and its corresponding speech, and recognizing the speech to obtain individual sentences; obtaining feature information of each sentence, and dividing the sentences into paragraphs according to the feature information to obtain a paragraph sequence; for each paragraph in the paragraph sequence, obtaining the key sentence of the paragraph; obtaining the time period corresponding to the key sentence, and selecting a key video frame from the video segment corresponding to that period as the picture for the paragraph; and generating an article from the paragraphs in the paragraph sequence and their corresponding pictures. Because the article contains every paragraph together with its picture, it conveys the video content effectively, so that users can easily pick the videos they want to watch, which improves video playback efficiency.

Description

Article generation method and device
Technical field
The present invention relates to the technical field of video processing, and in particular to an article generation method and device.
Background
Currently, a video may be analyzed and processed before it is published, and one frame of the video is selected as its thumbnail, so that after the video is published users can first get a sense of its content from the thumbnail and then decide whether to watch it. However, a thumbnail shows very little content and can hardly convey what the video is about. Users therefore find it difficult to pick the videos they want to watch, and often abandon a video partway through after choosing one they did not actually want, which reduces video playback efficiency.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present invention is to propose an article generation method, to solve the problem in the prior art that video thumbnails can hardly convey video content effectively, resulting in low video playback efficiency.
A second object of the present invention is to propose an article generation device.
A third object of the present invention is to propose another article generation device.
A fourth object of the present invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the present invention is to propose a computer program product.
To achieve the above objects, an embodiment of the first aspect of the present invention proposes an article generation method, including:
Obtaining a video and its corresponding speech, and recognizing the speech to obtain individual sentences;
Obtaining feature information of each sentence, and dividing the sentences into paragraphs according to the feature information to obtain a paragraph sequence;
For each paragraph in the paragraph sequence, obtaining the key sentence in the paragraph;
Obtaining the time period corresponding to the key sentence, and selecting a key video frame from the video segment of the video corresponding to the time period as the picture corresponding to the paragraph;
Generating an article according to the paragraphs in the paragraph sequence and their corresponding pictures.
Further, recognizing the speech to obtain individual sentences includes:
Recognizing the speech to obtain each word and the timestamp corresponding to each word;
For any two adjacent words, calculating the time difference between the two adjacent words according to their corresponding timestamps;
Judging whether the time difference is greater than or equal to a first difference threshold;
If the time difference is less than the first difference threshold, dividing the two adjacent words into the same sentence;
If the time difference is greater than or equal to the first difference threshold, dividing the two adjacent words into different sentences, thereby obtaining the individual sentences.
Further, the feature information includes the middle timestamp of the sentence and whether the sentence contains a conjunction;
Dividing the sentences into paragraphs according to the feature information to obtain a paragraph sequence includes:
For any two adjacent sentences, calculating the time difference between the two adjacent sentences according to their middle timestamps;
Judging whether the time difference is greater than or equal to a second difference threshold, and whether the later of the two adjacent sentences contains a conjunction;
If the time difference is less than the second difference threshold, or the later of the two adjacent sentences contains a conjunction, dividing the two adjacent sentences into the same paragraph;
If the time difference is greater than or equal to the second difference threshold and the later of the two adjacent sentences contains no conjunction, dividing the two adjacent sentences into different paragraphs, thereby obtaining the paragraph sequence.
Further, the second difference threshold is determined as follows:
Generating a time difference set from the time differences of all pairs of adjacent sentences;
Calculating the standard deviation of the time difference set;
Determining the product of the standard deviation and a preset coefficient as the second difference threshold.
Further, after obtaining the feature information of each sentence and dividing the sentences into paragraphs according to the feature information to obtain the paragraph sequence, the method further includes:
For each paragraph in the paragraph sequence, obtaining the word count of the paragraph;
Judging whether the word count of the paragraph is less than a preset word-count threshold;
If the word count of the paragraph is less than the preset word-count threshold, merging the paragraph with the adjacent following paragraph, until the word count of the merged paragraph is greater than or equal to the preset word-count threshold.
Further, obtaining the key sentence in each paragraph of the paragraph sequence includes:
Obtaining the title of the video;
Inputting all sentences in the paragraph sequence together with the title into a preset keyword model to obtain keywords and their corresponding weights, and generating a keyword set;
For each paragraph in the paragraph sequence, querying the keyword set with each sentence in the paragraph to obtain the keywords contained in each sentence;
Determining the weight of each sentence according to the keywords it contains and their corresponding weights;
Determining the sentence with the largest weight in the paragraph as the key sentence of the paragraph.
Further, obtaining the time period corresponding to the key sentence includes:
Obtaining the middle timestamp of the key sentence;
Determining the time period corresponding to the key sentence according to its middle timestamp and a preset threshold, where the start time of the period is the middle timestamp minus the preset threshold, and the end time of the period is the middle timestamp plus the preset threshold.
In the article generation method of the embodiments of the present invention, a video and its corresponding speech are obtained and the speech is recognized to obtain individual sentences; feature information of each sentence is obtained and the sentences are divided into paragraphs according to it, yielding a paragraph sequence; for each paragraph in the paragraph sequence, the key sentence is obtained; the time period corresponding to the key sentence is obtained, and a key video frame is selected from the corresponding video segment as the picture for the paragraph; and an article is generated from the paragraphs in the paragraph sequence and their pictures. Because the article contains every paragraph and its picture, it conveys the video content effectively, so that users can easily pick the videos they want to watch, improving video playback efficiency.
To achieve the above objects, an embodiment of the second aspect of the present invention proposes an article generation device, including:
An acquisition module, configured to obtain a video and its corresponding speech, and recognize the speech to obtain individual sentences;
A division module, configured to obtain feature information of each sentence, and divide the sentences into paragraphs according to the feature information to obtain a paragraph sequence;
The acquisition module is further configured to obtain, for each paragraph in the paragraph sequence, the key sentence in the paragraph;
A selection module, configured to obtain the time period corresponding to the key sentence, and select a key video frame from the video segment of the video corresponding to the time period as the picture corresponding to the paragraph;
A generation module, configured to generate an article according to the paragraphs in the paragraph sequence and their corresponding pictures.
Further, the acquisition module is specifically configured to:
Recognize the speech to obtain each word and the timestamp corresponding to each word;
For any two adjacent words, calculate the time difference between the two adjacent words according to their corresponding timestamps;
Judge whether the time difference is greater than or equal to a first difference threshold;
If the time difference is less than the first difference threshold, divide the two adjacent words into the same sentence;
If the time difference is greater than or equal to the first difference threshold, divide the two adjacent words into different sentences, thereby obtaining the individual sentences.
Further, the feature information includes the middle timestamp of the sentence and whether the sentence contains a conjunction;
The division module is specifically configured to:
For any two adjacent sentences, calculate the time difference between the two adjacent sentences according to their middle timestamps;
Judge whether the time difference is greater than or equal to a second difference threshold, and whether the later of the two adjacent sentences contains a conjunction;
If the time difference is less than the second difference threshold, or the later of the two adjacent sentences contains a conjunction, divide the two adjacent sentences into the same paragraph;
If the time difference is greater than or equal to the second difference threshold and the later of the two adjacent sentences contains no conjunction, divide the two adjacent sentences into different paragraphs, thereby obtaining the paragraph sequence.
Further, the second difference threshold is determined as follows:
Generating a time difference set from the time differences of all pairs of adjacent sentences;
Calculating the standard deviation of the time difference set;
Determining the product of the standard deviation and a preset coefficient as the second difference threshold.
Further, the device further includes a judgment module and a merging module;
The acquisition module is further configured to obtain, for each paragraph in the paragraph sequence, the word count of the paragraph;
The judgment module is configured to judge whether the word count of the paragraph is less than a preset word-count threshold;
The merging module is configured to, when the word count of the paragraph is less than the preset word-count threshold, merge the paragraph with the adjacent following paragraph, until the word count of the merged paragraph is greater than or equal to the preset word-count threshold.
Further, the acquisition module is specifically configured to:
Obtain the title of the video;
Input all sentences in the paragraph sequence together with the title into a preset keyword model to obtain keywords and their corresponding weights, and generate a keyword set;
For each paragraph in the paragraph sequence, query the keyword set with each sentence in the paragraph to obtain the keywords contained in each sentence;
Determine the weight of each sentence according to the keywords it contains and their corresponding weights;
Determine the sentence with the largest weight in the paragraph as the key sentence of the paragraph.
Further, the selection module is specifically configured to:
Obtain the middle timestamp of the key sentence;
Determine the time period corresponding to the key sentence according to its middle timestamp and a preset threshold, where the start time of the period is the middle timestamp minus the preset threshold, and the end time of the period is the middle timestamp plus the preset threshold.
In the article generation device of the embodiments of the present invention, a video and its corresponding speech are obtained and the speech is recognized to obtain individual sentences; feature information of each sentence is obtained and the sentences are divided into paragraphs according to it, yielding a paragraph sequence; for each paragraph in the paragraph sequence, the key sentence is obtained; the time period corresponding to the key sentence is obtained, and a key video frame is selected from the corresponding video segment as the picture for the paragraph; and an article is generated from the paragraphs in the paragraph sequence and their pictures. Because the article contains every paragraph and its picture, it conveys the video content effectively, so that users can easily pick the videos they want to watch, improving video playback efficiency.
To achieve the above objects, an embodiment of the third aspect of the present invention proposes another article generation device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the article generation method described above.
To achieve the above objects, an embodiment of the fourth aspect of the present invention proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the article generation method described above.
To achieve the above objects, an embodiment of the fifth aspect of the present invention proposes a computer program product; when the instructions in the computer program product are executed by a processor, the article generation method described above is implemented.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the following description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of an article generation method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an article generation device provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of another article generation device provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of another article generation device provided by an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, where the same or similar reference numerals throughout denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present invention, and should not be construed as limiting it.
The article generation method and device of the embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of an article generation method provided by an embodiment of the present invention. As shown in Fig. 1, the article generation method includes the following steps:
S101: obtain a video and its corresponding speech, and recognize the speech to obtain individual sentences.
The article generation method provided by the present invention is executed by an article generation device, which may be a hardware device such as a terminal device or a server, or software installed on such a hardware device. In this embodiment, the video may be, for example, a video that is about to be published.
In this embodiment, people pause while speaking, and in particular there is usually a pause between one sentence and the next, so the individual sentences in the speech corresponding to the video can be determined from the timestamp of each word. Accordingly, the article generation device may recognize the speech and obtain the sentences as follows: recognize the speech to obtain each word and its corresponding timestamp; for any two adjacent words, calculate the time difference between them from their timestamps; judge whether the time difference is greater than or equal to a first difference threshold; if the time difference is less than the first difference threshold, divide the two adjacent words into the same sentence; if it is greater than or equal to the first difference threshold, divide the two adjacent words into different sentences, thereby obtaining the individual sentences. The timestamp may be the start timestamp, middle timestamp, or end timestamp of the word. The first difference threshold may be, for example, 0.2 seconds or 0.3 seconds.
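A minimal Python sketch of the word-to-sentence step, assuming the speech recognizer already returns one timestamp per word (start, middle, or end, as the description allows). The `Word` class, the `split_sentences` function, and the 0.3-second default are illustrative choices, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    timestamp: float  # seconds; one timestamp per word (start, middle or end)


def split_sentences(words: List[Word], first_threshold: float = 0.3) -> List[List[Word]]:
    """Group recognized words into sentences by the time difference between adjacent words."""
    if not words:
        return []
    sentences: List[List[Word]] = [[words[0]]]
    for prev, cur in zip(words, words[1:]):
        gap = cur.timestamp - prev.timestamp
        if gap >= first_threshold:
            sentences.append([cur])  # pause at or above the threshold: start a new sentence
        else:
            sentences[-1].append(cur)  # shorter pause: same sentence
    return sentences


words = [Word("hello", 0.0), Word("world", 0.25), Word("next", 1.0), Word("one", 1.2)]
print([[w.text for w in s] for s in split_sentences(words)])
# [['hello', 'world'], ['next', 'one']]
```

Using the same kind of timestamp for both words keeps the comparison consistent; measuring from one word's end to the next word's start would be a different (also plausible) reading of "pause".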
S102: obtain the feature information of each sentence, and divide the sentences into paragraphs according to the feature information to obtain a paragraph sequence.
In this embodiment, the feature information may include the middle timestamp of the sentence, whether the sentence contains a conjunction, and so on. Conjunctions are words such as "and" and "still". Accordingly, the article generation device may perform step S102 as follows: for any two adjacent sentences, calculate the time difference between them from their middle timestamps; judge whether the time difference is greater than or equal to a second difference threshold, and whether the later of the two adjacent sentences contains a conjunction; if the time difference is less than the second difference threshold, or the later sentence contains a conjunction, divide the two adjacent sentences into the same paragraph; if the time difference is greater than or equal to the second difference threshold and the later sentence contains no conjunction, divide the two adjacent sentences into different paragraphs, thereby obtaining the paragraph sequence.
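The paragraph rule can be sketched in Python as follows. This is an illustration under stated assumptions: `Sentence`, `CONJUNCTIONS`, `has_conjunction`, and `divide_paragraphs` are hypothetical names, the conjunction list holds only the two examples from the text, and whitespace tokenization stands in for the word segmentation a Chinese transcript would need.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical conjunction list; the description only gives "and" and "still" as examples.
CONJUNCTIONS = {"and", "still"}


@dataclass
class Sentence:
    text: str
    mid: float  # middle timestamp of the sentence, in seconds


def has_conjunction(text: str) -> bool:
    # Naive whitespace tokenization with punctuation stripped (assumption).
    tokens = (t.strip(".,!?;:").lower() for t in text.split())
    return any(t in CONJUNCTIONS for t in tokens)


def divide_paragraphs(sentences: List[Sentence], second_threshold: float) -> List[List[Sentence]]:
    if not sentences:
        return []
    paragraphs: List[List[Sentence]] = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        gap = cur.mid - prev.mid
        if gap >= second_threshold and not has_conjunction(cur.text):
            paragraphs.append([cur])  # long gap and no conjunction: new paragraph
        else:
            paragraphs[-1].append(cur)  # short gap or a conjunction: same paragraph
    return paragraphs


sents = [Sentence("We start here.", 1.0), Sentence("Then we continue.", 3.0),
         Sentence("A new topic begins.", 10.0), Sentence("And it goes on.", 18.0)]
print([[s.text for s in p] for p in divide_paragraphs(sents, 5.0)])
```

The 8-second gap before "And it goes on." would normally start a new paragraph, but the conjunction keeps the sentence attached to the preceding one, which is the point of checking the later sentence.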
The second difference threshold may be determined as follows: generate a time difference set from the time differences of all pairs of adjacent sentences; calculate the standard deviation of the time difference set; and take the product of the standard deviation and a preset coefficient as the second difference threshold. The second difference threshold is greater than the first difference threshold. The preset coefficient may be, for example, N, where N may be 2.
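A short Python sketch of the threshold computation, assuming the coefficient N = 2 from the example above. The description does not say whether the population or sample standard deviation is meant; this sketch assumes the population form (`statistics.pstdev`), and the function name is illustrative.

```python
import statistics
from typing import Sequence


def second_difference_threshold(mid_timestamps: Sequence[float], coefficient: float = 2.0) -> float:
    """Coefficient times the standard deviation of the gaps between adjacent middle timestamps."""
    if len(mid_timestamps) < 2:
        raise ValueError("need at least two sentences to form a time difference")
    gaps = [b - a for a, b in zip(mid_timestamps, mid_timestamps[1:])]
    # Assumption: population standard deviation; the description does not specify which.
    return coefficient * statistics.pstdev(gaps)


print(second_difference_threshold([0.0, 2.0, 4.0, 10.0]))  # about 3.771
```

Here the gaps are 2, 2 and 6 seconds, so only a pause well above the typical spacing between sentences exceeds the threshold, which is what makes the threshold adapt to each speaker's pace.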
Further, on the basis of the above embodiment, since a paragraph generally contains at least a certain number of words, in order to divide paragraphs more accurately, after step S102 the method may further include the following steps: for each paragraph in the paragraph sequence, obtain the word count of the paragraph; judge whether the word count is less than a preset word-count threshold; if so, merge the paragraph with the adjacent following paragraph, until the word count of the merged paragraph is greater than or equal to the preset word-count threshold.
For example, if the word count of the first paragraph is less than the preset word-count threshold, the first and second paragraphs are merged into one paragraph; it is then judged whether the word count of the merged paragraph is less than the threshold, and if so, the merged paragraph is further merged with the third paragraph. If the word count of the paragraph after this second merge is greater than or equal to the threshold, merging stops for that paragraph; the fourth paragraph is then taken, and it is judged whether its word count is less than the preset word-count threshold.
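The merge loop in the example can be sketched in Python as below. Two assumptions are made explicit in the code: the "word count" is taken as a character count (common for Chinese text), and a short final paragraph, which has no following neighbour to merge with, is kept as it is. The names are hypothetical.

```python
from typing import List

Paragraph = List[str]  # a paragraph represented as a list of sentence strings


def word_count(paragraph: Paragraph) -> int:
    # Assumption: count characters, as is common for Chinese text.
    return sum(len(sentence) for sentence in paragraph)


def merge_short_paragraphs(paragraphs: List[Paragraph], min_count: int) -> List[Paragraph]:
    result: List[Paragraph] = []
    pending: Paragraph = []
    for paragraph in paragraphs:
        pending = pending + paragraph  # merge with the following paragraph
        if word_count(pending) >= min_count:
            result.append(pending)  # long enough: close it and move on to the next one
            pending = []
    if pending:
        # Assumption: a short final paragraph has no following neighbour, so keep it as is.
        result.append(pending)
    return result


print(merge_short_paragraphs([["abc"], ["de"], ["fghij"], ["klmnop"], ["q"]], 5))
# [['abc', 'de'], ['fghij'], ['klmnop'], ['q']]
```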
S103: for each paragraph in the paragraph sequence, obtain the key sentence in the paragraph.
In this embodiment, the key sentence of a paragraph is the sentence that best reflects the central idea of the paragraph. The article generation device may perform step S103 as follows: obtain the title of the video; input all sentences in the paragraph sequence together with the title into a preset keyword model to obtain keywords and their corresponding weights, and generate a keyword set; for each paragraph in the paragraph sequence, query the keyword set with each sentence in the paragraph to obtain the keywords contained in each sentence; determine the weight of each sentence from the keywords it contains and their weights; and determine the sentence with the largest weight in the paragraph as the key sentence of the paragraph.
A keyword may be a word that occurs frequently across all sentences, or a word that reflects the central idea of all the paragraphs. The keyword model may be, for example, a neural network model, and may be trained on training texts and the keyword sets corresponding to those texts.
In this embodiment, the weight of each sentence may be determined from the keywords it contains and their weights as follows: for each sentence, obtain the keywords it contains, the number of occurrences of each keyword, and the weight of each keyword; multiply the number of occurrences of each keyword by its weight; and sum these products over all keywords contained in the sentence to obtain the weight of the sentence.
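The sentence-weight rule and the key-sentence choice can be sketched in Python like this. The sketch assumes the keyword model has already produced a keyword-to-weight mapping (the values below are made up) and that sentences are already tokenized; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def sentence_weight(tokens: List[str], keyword_weights: Dict[str, float]) -> float:
    """Sum over the keywords in a sentence of (occurrences x keyword weight)."""
    counts = Counter(t for t in tokens if t in keyword_weights)
    return sum(n * keyword_weights[k] for k, n in counts.items())


def key_sentence_index(paragraph: List[List[str]], keyword_weights: Dict[str, float]) -> int:
    """Index of the highest-weight sentence; ties go to the earliest sentence."""
    weights = [sentence_weight(s, keyword_weights) for s in paragraph]
    return max(range(len(weights)), key=weights.__getitem__)


# Hypothetical keyword-model output: keyword -> weight.
kw = {"video": 0.5, "article": 0.8}
para = [["a", "video", "clip"], ["the", "article", "and", "video"], ["video", "video"]]
print(key_sentence_index(para, kw))  # 1
```

The three sentences score 0.5, 1.3 and 1.0, so the second sentence is chosen even though the third repeats "video" twice.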
S104: obtain the time period corresponding to the key sentence, and select a key video frame from the video segment of the video corresponding to that period as the picture for the paragraph.
In this embodiment, the article generation device may obtain the time period corresponding to the key sentence as follows: obtain the middle timestamp of the key sentence; determine the time period from the middle timestamp and a preset threshold, where the start time of the period is the middle timestamp minus the preset threshold, and the end time of the period is the middle timestamp plus the preset threshold.
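In symbols, with middle timestamp t_mid and preset threshold Δ, the period is [t_mid − Δ, t_mid + Δ]. A Python sketch follows; the optional clamping to the video's length is an added assumption, not something the description states, and the function name is illustrative.

```python
from typing import Optional, Tuple


def key_sentence_period(mid: float, preset_threshold: float,
                        video_duration: Optional[float] = None) -> Tuple[float, float]:
    """Period [mid - threshold, mid + threshold] around the key sentence's middle timestamp."""
    start = mid - preset_threshold
    end = mid + preset_threshold
    if video_duration is not None:
        # Assumption (not in the description): clamp the period to the video's length.
        start = max(0.0, start)
        end = min(video_duration, end)
    return start, end


print(key_sentence_period(12.0, 3.0))       # (9.0, 15.0)
print(key_sentence_period(1.0, 3.0, 20.0))  # (0.0, 4.0)
```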
The time period corresponding to the key sentence may lie within the period determined by the start time and end time of the key sentence. In this embodiment, the article generation device may select the most complete video frame in the video segment as the key video frame.
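The description does not define what makes a frame "most complete", so this Python sketch leaves the scoring rule to a caller-supplied function and only shows the selection logic: keep the frames whose timestamps fall inside the key sentence's period and pick the highest-scoring one. The `select_key_frame` name and the score values are hypothetical placeholders for a real frame-quality measure.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def select_key_frame(frames: Sequence[Tuple[float, Frame]], start: float, end: float,
                     score: Callable[[Frame], float]) -> Optional[Frame]:
    """Return the highest-scoring frame whose timestamp lies in [start, end], or None."""
    candidates = [frame for t, frame in frames if start <= t <= end]
    if not candidates:
        return None
    return max(candidates, key=score)


# Hypothetical "completeness" scores standing in for a real frame-quality measure.
scores = {"a": 9, "b": 2, "c": 5, "d": 10}
frames = [(8.0, "a"), (9.5, "b"), (12.0, "c"), (16.0, "d")]
print(select_key_frame(frames, 9.0, 15.0, scores.__getitem__))  # c
```

Frames "a" and "d" score higher but lie outside the period, so they are never considered.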
S105: generate an article according to the paragraphs in the paragraph sequence and their corresponding pictures.
In this embodiment, taking a paragraph sequence of three paragraphs as an example, the article may contain, in order: the content of the first paragraph, the picture for the first paragraph, the content of the second paragraph, the picture for the second paragraph, the content of the third paragraph, and the picture for the third paragraph.
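A minimal Python sketch of the interleaving order, assuming HTML as the output format (the description does not name one) and that each picture is already available as an image path or URL. `build_article` is a hypothetical name.

```python
import html
from typing import List, Tuple


def build_article(sections: List[Tuple[str, str]]) -> str:
    """Interleave each paragraph with its picture: paragraph 1, picture 1, paragraph 2, ..."""
    parts: List[str] = []
    for text, image_src in sections:
        parts.append(f"<p>{html.escape(text)}</p>")
        parts.append(f'<img src="{html.escape(image_src)}">')
    return "\n".join(parts)


print(build_article([("Hello & welcome", "p1.jpg"), ("Second", "p2.jpg")]))
```

Escaping the transcript text matters because recognized speech can contain characters such as "&" or "<" that would otherwise be read as markup.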
In this embodiment, after generating the article, the article generation device may generate a link address for the article. When the video is published, the article's link address is displayed on the video's publishing page, so that before watching the video a user can first browse the article through the link, judge whether this is a video they want to watch, and then decide from the article whether to watch it.
In addition, after generating the article, the article generation device may generate a link address for the video and display it on the page where the article is located, so that a user who has read the article and is interested in its content can click the link directly to watch the video.
In the article generation method of the embodiments of the present invention, a video and its corresponding speech are obtained and the speech is recognized to obtain individual sentences; feature information of each sentence is obtained and the sentences are divided into paragraphs according to it, yielding a paragraph sequence; for each paragraph in the paragraph sequence, the key sentence is obtained; the time period corresponding to the key sentence is obtained, and a key video frame is selected from the corresponding video segment as the picture for the paragraph; and an article is generated from the paragraphs in the paragraph sequence and their pictures. Because the article contains every paragraph and its picture, it conveys the video content effectively, so that users can easily pick the videos they want to watch, improving video playback efficiency.
Fig. 2 is a schematic structural diagram of an article generation device provided by an embodiment of the present invention. As shown in Fig. 2, the device includes an acquisition module 21, a division module 22, a selection module 23, and a generation module 24.
The acquisition module 21 is configured to obtain a video and its corresponding speech, and recognize the speech to obtain individual sentences;
The division module 22 is configured to obtain feature information of each sentence, and divide the sentences into paragraphs according to the feature information to obtain a paragraph sequence;
The acquisition module 21 is further configured to obtain, for each paragraph in the paragraph sequence, the key sentence in the paragraph;
The selection module 23 is configured to obtain the time period corresponding to the key sentence, and select a key video frame from the video segment of the video corresponding to the time period as the picture corresponding to the paragraph;
The generation module 24 is configured to generate an article according to the paragraphs in the paragraph sequence and their corresponding pictures.
The article generation device provided by the present invention may be a hardware device such as a terminal device or a server, or software installed on such a hardware device. In this embodiment, the video may be, for example, a video that is about to be published.
In this embodiment, people pause while speaking, and in particular there is usually a pause between one sentence and the next, so the individual sentences in the speech corresponding to the video can be determined from the timestamp of each word. Accordingly, the acquisition module 21 may be specifically configured to: recognize the speech to obtain each word and its corresponding timestamp; for any two adjacent words, calculate the time difference between them from their timestamps; judge whether the time difference is greater than or equal to a first difference threshold; if the time difference is less than the first difference threshold, divide the two adjacent words into the same sentence; if it is greater than or equal to the first difference threshold, divide the two adjacent words into different sentences, thereby obtaining the individual sentences. The timestamp may be the start timestamp, middle timestamp, or end timestamp of the word. The first difference threshold may be, for example, 0.2 seconds or 0.3 seconds.
In this embodiment, the feature information may include the middle timestamp of the sentence, whether the sentence contains a conjunction, and so on. Conjunctions are words such as "and" and "still". Accordingly, the division module 22 may be specifically configured to: for any two adjacent sentences, calculate the time difference between them from their middle timestamps; judge whether the time difference is greater than or equal to a second difference threshold, and whether the later of the two adjacent sentences contains a conjunction; if the time difference is less than the second difference threshold, or the later sentence contains a conjunction, divide the two adjacent sentences into the same paragraph; if the time difference is greater than or equal to the second difference threshold and the later sentence contains no conjunction, divide the two adjacent sentences into different paragraphs, thereby obtaining the paragraph sequence.
The second difference threshold may be determined as follows: generate a set of time differences from the time differences of every pair of adjacent sentences; calculate the standard deviation of this set; and take the product of the standard deviation and a preset coefficient as the second difference threshold. The second difference threshold is greater than the first difference threshold.
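The paragraph division rule and the standard-deviation threshold can be sketched together. This is a hedged illustration under stated assumptions: sentence text is whitespace-tokenized (Chinese text would need a word segmenter), the conjunction set `CONJUNCTIONS` is a hypothetical stand-in for whatever list an implementation uses, "contains a conjunction" is read as any token matching, and the population standard deviation (`statistics.pstdev`) is chosen because the text does not say population or sample.

```python
import statistics
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass
class Sentence:
    text: str
    mid_time: float  # middle timestamp of the sentence, in seconds


# Hypothetical conjunction set for illustration only.
CONJUNCTIONS: FrozenSet[str] = frozenset({"and", "or"})


def has_conjunction(sentence: Sentence, conjunctions: FrozenSet[str] = CONJUNCTIONS) -> bool:
    # Assumes whitespace tokenization; punctuation is stripped from each token.
    return any(tok.strip(",.!?").lower() in conjunctions for tok in sentence.text.split())


def second_difference_threshold(sentences: Sequence[Sentence], coefficient: float) -> float:
    # Time differences between the middle timestamps of every pair of adjacent sentences.
    gaps = [b.mid_time - a.mid_time for a, b in zip(sentences, sentences[1:])]
    if not gaps:
        raise ValueError("need at least two sentences")
    # Population standard deviation is an assumption; the text does not specify.
    return statistics.pstdev(gaps) * coefficient


def split_paragraphs(sentences: Sequence[Sentence], threshold: float) -> List[List[Sentence]]:
    paragraphs: List[List[Sentence]] = []
    current: List[Sentence] = []
    for s in sentences:
        if current:
            gap = s.mid_time - current[-1].mid_time
            # Break only if the pause is long AND the later sentence has no conjunction.
            if gap >= threshold and not has_conjunction(s):
                paragraphs.append(current)
                current = []
        current.append(s)
    if current:
        paragraphs.append(current)
    return paragraphs
```

A caller would compute the threshold once over all sentences and then pass it to `split_paragraphs`; since the description requires the second threshold to exceed the first, the preset coefficient should be chosen (or checked) with that in mind.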
Further, on the basis of the above embodiments, since a paragraph generally contains a certain number of words, the device may, in order to divide paragraphs more accurately and with reference to Fig. 3, further include a judgment module 25 and a merging module 26, wherein:
the acquisition module 21 is further configured to obtain, for each paragraph in the paragraph sequence, the word count of the paragraph;
the judgment module 25 is configured to determine whether the word count of the paragraph is less than a preset word-count threshold; and
the merging module 26 is configured to, when the word count of the paragraph is less than the preset word-count threshold, merge the paragraph with the following adjacent paragraph until the word count of the merged paragraph is greater than or equal to the preset word-count threshold.
For example, if the word count of the first paragraph is less than the preset word-count threshold, the first and second paragraphs are merged into one paragraph. If the word count of the merged paragraph is still less than the threshold, the merged paragraph is merged with the third paragraph. If the word count of the paragraph after this second merge is greater than or equal to the threshold, no further merging is applied to it; the fourth paragraph is then taken, and it is determined whether its word count is less than the preset word-count threshold.
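The merging procedure in the example above can be sketched as a single pass that keeps a pending short paragraph and appends following paragraphs to it until it reaches the threshold. This is a minimal sketch: paragraphs are plain strings, `word_count` counts whitespace tokens (for Chinese text the character count would be the natural choice), and keeping a short final paragraph unmerged is an assumption, since the description does not cover that case.

```python
from typing import List, Optional


def word_count(text: str) -> int:
    # Whitespace tokens; for Chinese, len(text) (characters) would be more natural.
    return len(text.split())


def merge_short_paragraphs(paragraphs: List[str], min_words: int) -> List[str]:
    merged: List[str] = []
    pending: Optional[str] = None
    for para in paragraphs:
        # Append the next paragraph to a short one that is still waiting to be merged.
        pending = para if pending is None else pending + " " + para
        if word_count(pending) >= min_words:
            merged.append(pending)
            pending = None
    if pending is not None:
        # Assumption: a short final paragraph with nothing after it is kept as-is.
        merged.append(pending)
    return merged
```

For instance, with a threshold of 4, `["a b", "c", "d e f", "g h i j"]` merges the first three paragraphs into one and leaves the fourth untouched, mirroring the first/second/third/fourth paragraph walk-through.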
In this embodiment, the key sentence of a paragraph is the sentence in the paragraph that best reflects the paragraph's central idea. Accordingly, the acquisition module 21 may be specifically configured to: obtain the title of the video; input all sentences in the paragraph sequence together with the title into a preset keyword model to obtain keywords and their corresponding weights, thereby generating a keyword set; for each paragraph in the paragraph sequence, query the keyword set with each sentence in the paragraph to obtain the keywords contained in each sentence; determine the weight of each sentence from the keywords it contains and their weights; and determine the sentence with the largest weight in the paragraph as the key sentence of the paragraph.
A keyword may be a word that appears frequently across all sentences, or a word that reflects the central idea of all paragraphs. The keyword model may be, for example, a neural network model, and may be trained on training texts and the keyword sets corresponding to those texts.
In this embodiment, the weight of each sentence may be determined from the keywords it contains and their weights as follows: for each sentence, obtain the keywords contained in the sentence, the number of occurrences of each keyword, and the weight of each keyword; multiply each keyword's number of occurrences by its weight; and sum these products over all keywords contained in the sentence to obtain the weight of the sentence.
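The sentence-weight rule (occurrences times keyword weight, summed) and the key-sentence choice can be sketched as below. The keyword model itself is not reproduced: the sketch assumes its output is already available as a hypothetical `Dict[str, float]` of lower-case keywords to weights, and it counts occurrences over lower-cased whitespace tokens.

```python
from collections import Counter
from typing import Dict, List


def sentence_weight(sentence: str, keywords: Dict[str, float]) -> float:
    # Occurrence count of each token in the sentence (lower-cased whitespace tokens).
    counts = Counter(sentence.lower().split())
    # Sum of (occurrences x keyword weight); Counter returns 0 for keywords not present.
    return sum(counts[kw] * weight for kw, weight in keywords.items())


def key_sentence(paragraph: List[str], keywords: Dict[str, float]) -> str:
    # The sentence with the largest weight; max() keeps the first such sentence on ties.
    return max(paragraph, key=lambda s: sentence_weight(s, keywords))


keywords = {"video": 2.0, "article": 1.5}
paragraph = ["we make a video", "article from video video", "nothing here"]
print(key_sentence(paragraph, keywords))
```

Here the second sentence scores 1 x 1.5 + 2 x 2.0 = 5.5 and is selected; a real implementation would match keywords against segmented words rather than whitespace tokens.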
In this embodiment, the acquisition module 21 may obtain the time period corresponding to the key sentence as follows: obtain the middle timestamp of the key sentence, and determine the time period from this middle timestamp and a preset threshold. The start time point of the period is the middle timestamp minus the preset threshold, and the end time point of the period is the middle timestamp plus the preset threshold.
The time period corresponding to the key sentence may lie within the period determined by the start time point and the end time point of the key sentence. In this embodiment, the article generating device may select the most complete video frame from the video segment as the key video frame.
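The period computation and the frame selection can be sketched as follows. The period is the middle timestamp plus or minus the preset threshold, as described; clamping the start at 0 is an added assumption. The description does not define "most complete" frame, so `score` is a hypothetical, caller-supplied quality function (for example a sharpness measure), and frames are assumed to arrive as `(timestamp, frame)` pairs.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

F = TypeVar("F")  # any frame representation, e.g. a decoded image array


def key_sentence_period(mid_time: float, preset: float) -> Tuple[float, float]:
    # [mid - preset, mid + preset]; clamping the start at 0 is an added assumption.
    return max(0.0, mid_time - preset), mid_time + preset


def select_key_frame(
    frames: Sequence[Tuple[float, F]],
    start: float,
    end: float,
    score: Callable[[F], float],
) -> Optional[F]:
    # Keep only frames whose timestamp falls inside the key sentence's period.
    candidates = [frame for t, frame in frames if start <= t <= end]
    if not candidates:
        return None
    # "Most complete" is undefined in the text; score() is a hypothetical stand-in.
    return max(candidates, key=score)
```

Keeping the scoring function pluggable separates the timing logic, which the description specifies, from the frame-quality criterion, which it leaves open.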
In this embodiment, after the article is generated, the article generating device may generate a link address for the article. When the video is published, the article's link address is displayed on the video's publishing page, so that before watching the video, the user can first browse the article through the link, judge whether the video is one they want to watch, and then decide, based on the article, whether to watch it.
In addition, after the article is generated, the article generating device may generate a link address for the video and display it on the page containing the article, so that a user who is interested in the article's content after reading it can click the link directly to watch the video.
The article generating device of this embodiment of the present invention obtains a video and its corresponding speech and recognizes the speech to obtain individual sentences; obtains feature information for each sentence and divides the sentences into paragraphs according to the feature information to obtain a paragraph sequence; obtains, for each paragraph in the paragraph sequence, the key sentence of the paragraph; obtains the time period corresponding to the key sentence and selects a key video frame from the segment of the video corresponding to that period as the picture for the paragraph; and generates an article from each paragraph in the paragraph sequence and its picture, so that the article contains each paragraph and its corresponding picture. The article thus effectively reflects the video content, allowing users to easily choose the videos they want to watch and improving video viewing efficiency.
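The final step, generating an article from each paragraph and its picture, can be sketched as simple interleaving. The output format is not specified in the description, so the Markdown image syntax, the picture-before-paragraph ordering, and the `build_article` name are all hypothetical choices for illustration; pictures are assumed to have been saved already and referenced by path.

```python
from typing import List, Sequence


def build_article(paragraphs: Sequence[str], pictures: Sequence[str]) -> str:
    # One picture per paragraph, in paragraph-sequence order.
    if len(paragraphs) != len(pictures):
        raise ValueError("each paragraph needs exactly one picture")
    parts: List[str] = []
    for text, picture in zip(paragraphs, pictures):
        # Hypothetical layout: the key frame image precedes its paragraph.
        parts.append(f"![]({picture})")
        parts.append(text)
    return "\n\n".join(parts)


print(build_article(["First paragraph.", "Second paragraph."], ["p1.jpg", "p2.jpg"]))
```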
Fig. 4 is a schematic structural diagram of another article generating device according to an embodiment of the present invention. The article generating device includes:
a memory 1001, a processor 1002, and a computer program stored in the memory 1001 and executable on the processor 1002.
When executing the program, the processor 1002 implements the article generation method provided in the above embodiments.
Further, the article generating device also includes:
a communication interface 1003 for communication between the memory 1001 and the processor 1002.
The memory 1001 is configured to store the computer program executable on the processor 1002.
The memory 1001 may include high-speed RAM, and may further include non-volatile memory, for example at least one magnetic disk storage.
The processor 1002 is configured to implement the article generation method described in the above embodiments when executing the program.
If the memory 1001, the processor 1002, and the communication interface 1003 are implemented independently, the communication interface 1003, the memory 1001, and the processor 1002 may be connected to and communicate with one another via a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented by only one thick line in Fig. 4, which does not mean that there is only one bus or only one type of bus.
Optionally, in a specific implementation, if the memory 1001, the processor 1002, and the communication interface 1003 are integrated on a single chip, they may communicate with one another through an internal interface.
The processor 1002 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
The present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the article generation method described above.
The present invention also provides a computer program product; when instructions in the computer program product are executed by a processor, the article generation method described above is implemented.
In the description of this specification, references to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples", and the like mean that specific features, structures, materials, or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine different embodiments or examples described in this specification, as well as the features of those different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features concerned. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, for example two or three, unless otherwise specifically defined.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention pertain.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions that may be considered to implement logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus, or device). For the purposes of this specification, a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber optic device, and portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If, as in another embodiment, they are implemented in hardware, they may be implemented using any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.
Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that these embodiments are exemplary and shall not be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to them within the scope of the present invention.

Claims (17)

1. An article generation method, comprising:
obtaining a video and corresponding speech, and recognizing the speech to obtain individual sentences;
obtaining feature information of each sentence, and dividing the sentences into paragraphs according to the feature information to obtain a paragraph sequence;
for each paragraph in the paragraph sequence, obtaining a key sentence in the paragraph;
obtaining a time period corresponding to the key sentence, and selecting a key video frame from a video segment of the video corresponding to the time period as a picture corresponding to the paragraph; and
generating an article according to each paragraph in the paragraph sequence and the corresponding picture.
2. The method according to claim 1, wherein recognizing the speech to obtain individual sentences comprises:
recognizing the speech to obtain each word and a timestamp corresponding to each word;
for any two adjacent words, calculating a time difference between the two adjacent words according to their corresponding timestamps;
determining whether the time difference is greater than or equal to a first difference threshold;
if the time difference is less than the first difference threshold, assigning the two adjacent words to the same sentence; and
if the time difference is greater than or equal to the first difference threshold, assigning the two adjacent words to different sentences, thereby obtaining the individual sentences.
3. The method according to claim 1, wherein the feature information comprises a middle timestamp corresponding to a sentence and whether the sentence contains a conjunction; and
dividing the sentences into paragraphs according to the feature information to obtain a paragraph sequence comprises:
for any two adjacent sentences, calculating a time difference between the two adjacent sentences according to their corresponding middle timestamps;
determining whether the time difference is greater than or equal to a second difference threshold, and whether the later of the two adjacent sentences contains a conjunction;
if the time difference is less than the second difference threshold, or the later of the two adjacent sentences contains a conjunction, assigning the two adjacent sentences to the same paragraph; and
if the time difference is greater than or equal to the second difference threshold and the later of the two adjacent sentences contains no conjunction, assigning the two adjacent sentences to different paragraphs, thereby obtaining the paragraph sequence.
4. The method according to claim 3, wherein the second difference threshold is determined by:
generating a time difference set from the time differences of every two adjacent sentences;
calculating a standard deviation of the time difference set; and
determining a product of the standard deviation and a preset coefficient as the second difference threshold.
5. The method according to claim 1 or 3, further comprising, after obtaining the feature information of each sentence and dividing the sentences into paragraphs according to the feature information to obtain the paragraph sequence:
for each paragraph in the paragraph sequence, obtaining a word count of the paragraph;
determining whether the word count of the paragraph is less than a preset word-count threshold; and
if the word count of the paragraph is less than the preset word-count threshold, merging the paragraph with the following adjacent paragraph until the word count of the merged paragraph is greater than or equal to the preset word-count threshold.
6. The method according to claim 1, wherein obtaining, for each paragraph in the paragraph sequence, the key sentence in the paragraph comprises:
obtaining a title of the video;
inputting all sentences in the paragraph sequence and the title into a preset keyword model to obtain keywords and corresponding weights, thereby generating a keyword set;
for each paragraph in the paragraph sequence, querying the keyword set according to each sentence in the paragraph to obtain the keywords contained in each sentence;
determining a weight of each sentence according to the keywords contained in the sentence and the corresponding weights of the keywords; and
determining the sentence with the largest weight in the paragraph as the key sentence in the paragraph.
7. The method according to claim 1, wherein obtaining the time period corresponding to the key sentence comprises:
obtaining a middle timestamp corresponding to the key sentence; and
determining the time period corresponding to the key sentence according to the middle timestamp of the key sentence and a preset threshold, wherein a start time point of the time period is the difference between the middle timestamp and the preset threshold, and an end time point of the time period is the sum of the middle timestamp and the preset threshold.
8. An article generating device, comprising:
an acquisition module configured to obtain a video and corresponding speech, and recognize the speech to obtain individual sentences;
a division module configured to obtain feature information of each sentence, and divide the sentences into paragraphs according to the feature information to obtain a paragraph sequence;
the acquisition module being further configured to obtain, for each paragraph in the paragraph sequence, a key sentence in the paragraph;
a selection module configured to obtain a time period corresponding to the key sentence, and select a key video frame from a video segment of the video corresponding to the time period as a picture corresponding to the paragraph; and
a generation module configured to generate an article according to each paragraph in the paragraph sequence and the corresponding picture.
9. The device according to claim 8, wherein the acquisition module is specifically configured to:
recognize the speech to obtain each word and a timestamp corresponding to each word;
for any two adjacent words, calculate a time difference between the two adjacent words according to their corresponding timestamps;
determine whether the time difference is greater than or equal to a first difference threshold;
if the time difference is less than the first difference threshold, assign the two adjacent words to the same sentence; and
if the time difference is greater than or equal to the first difference threshold, assign the two adjacent words to different sentences, thereby obtaining the individual sentences.
10. The device according to claim 8, wherein the feature information comprises a middle timestamp corresponding to a sentence and whether the sentence contains a conjunction; and
the division module is specifically configured to:
for any two adjacent sentences, calculate a time difference between the two adjacent sentences according to their corresponding middle timestamps;
determine whether the time difference is greater than or equal to a second difference threshold, and whether the later of the two adjacent sentences contains a conjunction;
if the time difference is less than the second difference threshold, or the later of the two adjacent sentences contains a conjunction, assign the two adjacent sentences to the same paragraph; and
if the time difference is greater than or equal to the second difference threshold and the later of the two adjacent sentences contains no conjunction, assign the two adjacent sentences to different paragraphs, thereby obtaining the paragraph sequence.
11. The device according to claim 10, wherein the second difference threshold is determined by:
generating a time difference set from the time differences of every two adjacent sentences;
calculating a standard deviation of the time difference set; and
determining a product of the standard deviation and a preset coefficient as the second difference threshold.
12. The device according to claim 8 or 10, further comprising a judgment module and a merging module, wherein:
the acquisition module is further configured to obtain, for each paragraph in the paragraph sequence, a word count of the paragraph;
the judgment module is configured to determine whether the word count of the paragraph is less than a preset word-count threshold; and
the merging module is configured to, when the word count of the paragraph is less than the preset word-count threshold, merge the paragraph with the following adjacent paragraph until the word count of the merged paragraph is greater than or equal to the preset word-count threshold.
13. The device according to claim 8, wherein the acquisition module is specifically configured to:
obtain a title of the video;
input all sentences in the paragraph sequence and the title into a preset keyword model to obtain keywords and corresponding weights, thereby generating a keyword set;
for each paragraph in the paragraph sequence, query the keyword set according to each sentence in the paragraph to obtain the keywords contained in each sentence;
determine a weight of each sentence according to the keywords contained in the sentence and the corresponding weights of the keywords; and
determine the sentence with the largest weight in the paragraph as the key sentence in the paragraph.
14. The device according to claim 8, wherein the selection module is specifically configured to:
obtain a middle timestamp corresponding to the key sentence; and
determine the time period corresponding to the key sentence according to the middle timestamp of the key sentence and a preset threshold, wherein a start time point of the time period is the difference between the middle timestamp and the preset threshold, and an end time point of the time period is the sum of the middle timestamp and the preset threshold.
15. An article generating device, comprising:
a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the article generation method according to any one of claims 1 to 7.
16. A non-transitory computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the article generation method according to any one of claims 1 to 7.
17. A computer program product, wherein instructions in the computer program product, when executed by a processor, implement the article generation method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811600339.XA CN109743589B (en) 2018-12-26 2018-12-26 Article generation method and device

Publications (2)

Publication Number Publication Date
CN109743589A true CN109743589A (en) 2019-05-10
CN109743589B CN109743589B (en) 2021-12-14

Family

ID=66359996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811600339.XA Active CN109743589B (en) 2018-12-26 2018-12-26 Article generation method and device

Country Status (1)

Country Link
CN (1) CN109743589B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110239107A1 (en) * 2010-03-29 2011-09-29 Phillips Michael E Transcript editor
CN104794104A (en) * 2015-04-30 2015-07-22 努比亚技术有限公司 Multimedia document generating method and device
CN106134216A (en) * 2014-04-11 2016-11-16 三星电子株式会社 Broadcast receiver and method for clip Text service
CN106982344A (en) * 2016-01-15 2017-07-25 阿里巴巴集团控股有限公司 video information processing method and device
CN107305541A (en) * 2016-04-20 2017-10-31 科大讯飞股份有限公司 Speech recognition text segmentation method and device

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245339A (en) * 2019-06-20 2019-09-17 北京百度网讯科技有限公司 Article generation method, device, equipment and storage medium
CN111883136A (en) * 2020-07-30 2020-11-03 潘忠鸿 Rapid writing method and device based on artificial intelligence
CN111966839A (en) * 2020-08-17 2020-11-20 北京奇艺世纪科技有限公司 Data processing method and device, electronic equipment and computer storage medium
CN112733654A (en) * 2020-12-31 2021-04-30 支付宝(杭州)信息技术有限公司 Method and device for splitting video strip
CN113286173A (en) * 2021-05-19 2021-08-20 北京沃东天骏信息技术有限公司 Video editing method and device
CN113286173B (en) * 2021-05-19 2023-08-04 北京沃东天骏信息技术有限公司 Video editing method and device

Also Published As

Publication number Publication date
CN109743589B (en) 2021-12-14

Similar Documents

Publication Publication Date Title
CN109743589A (en) Article generation method and device
US10140368B2 (en) Method and apparatus for generating a recommendation page
CN109862432A (en) Clicking rate prediction technique and device
CN105138568B (en) Search result shows method, apparatus and search engine
CN104182481B (en) Resource recommendation method and device
CN110188350A (en) Text coherence calculation method and device
CN106844341A (en) News in brief extracting method and device based on artificial intelligence
CN109286850A (en) A kind of video labeling method and terminal based on barrage
CN109511015A (en) Multimedia resource recommended method, device, storage medium and equipment
CN106101846A (en) A kind of information processing method and device, terminal
CN103076950B (en) A kind of management method of threads of conversation list
CN104699696A (en) File recommendation method and device
CN109582882A (en) Search result shows method, apparatus and electronic equipment
TW201717067A (en) System, method and computer readable recording media for issue display
CN107748802A (en) Polymerizable clc method and device
US20200151220A1 (en) Interactive representation of content for relevance detection and review
CN104216885A (en) Recommending system and method with static and dynamic recommending reasons automatically combined
US20240330581A1 (en) Method for automatically generating responsive media
CN109710773A (en) The generation method and its device of event body
CN108874674A (en) page debugging method and device
CN106970985A (en) Information flow channel classification exchange method, device and the server guided based on demand
CN109739367A (en) Candidate word list generation method and device
CN104657480B (en) Caricature searching method and device
CN106055688A (en) Search result display method and device and mobile terminal
CN109710840A (en) The appraisal procedure and device of article content depth

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant