
CN110245334B - Method and device for outputting information - Google Patents


Info

Publication number
CN110245334B
CN110245334B (application CN201910552619.6A; earlier publication CN110245334A)
Authority
CN
China
Prior art keywords
word
connection probability
training
article
connection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910552619.6A
Other languages
Chinese (zh)
Other versions
CN110245334A
Inventor
蒋帅
陈思姣
梁海金
罗雨
卞东海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910552619.6A
Publication of CN110245334A
Application granted
Publication of CN110245334B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/5866 Retrieval characterised by using metadata using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/12 Use of codes for handling textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/12 Use of codes for handling textual entities
    • G06F 40/137 Hierarchical processing, e.g. outlines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/189 Automatic justification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present disclosure disclose a method and apparatus for outputting information. One embodiment of the method comprises: acquiring audio information to be converted; converting the audio information into text information; performing word segmentation on the text information to obtain a word sequence; for each word in the word sequence, querying a word connection probability table, obtained from a pre-trained word connection probability model, for the connection probability between the word and the next word and the connection probabilities between the word and various punctuation marks, and determining the word's connection target based on the queried probabilities; and connecting each word in the word sequence with its connection target to generate and output a punctuated article. This embodiment can automatically turn audio into punctuated articles.

Description

Method and device for outputting information
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method and apparatus for outputting information.
Background
In the field of automatic article generation, few articles are generated automatically from multimedia transcription; most are generated from structured text data, so the data sources are narrow and the generated articles are neither rich nor broad. Manually edited multimedia articles are time-consuming and cumbersome to produce, incurring unnecessary labor and financial costs. The conventional approach is mainly manual: related audio is transcribed into text by hand, related pictures are searched on the web according to the audio topic, and the converted text and pictures are finally assembled by hand.
The main problems of the manual approach are: (1) audio conversion: manual transcription is time-consuming and labor-intensive, and its accuracy is not guaranteed; (2) picture selection: choosing pictures relevant to the topic by manual search consumes a great deal of labor; (3) article organization and rendering: the related text and pictures must be organized into a readable, well-presented article.
Disclosure of Invention
Embodiments of the present disclosure propose methods and apparatus for outputting information.
In a first aspect, embodiments of the present disclosure provide a method for outputting information, comprising: acquiring audio information to be converted; converting the audio information into text information; performing word segmentation on the text information to obtain a word sequence; for each word in the word sequence, querying a word connection probability table, obtained from a pre-trained word connection probability model, for the connection probability between the word and the next word and the connection probabilities between the word and various punctuation marks, and determining the word's connection target based on the queried probabilities; and connecting each word in the word sequence with its connection target to generate and output a punctuated article.
In some embodiments, the word connection probability table is obtained by: acquiring a training sample set, wherein each training sample comprises a sentence containing punctuation; training a word connection probability model by feeding the sentences of the training samples into an LSTM model; and generating the word connection probability table from the probabilities between words and between words and punctuation obtained during the intermediate stages of training the word connection probability model.
In some embodiments, obtaining a training sample set includes: obtaining a sample article and splitting it at the granularity of large sentences to obtain a sample sentence set, wherein a large sentence is a sentence ending with a period, question mark, or exclamation mark; and for each sample sentence in the set, performing word segmentation and generating word vectors as a training sample.
In some embodiments, the method further comprises: the article is divided into at least one paragraph.
In some embodiments, the method further comprises: determining the subject and entity of the article; acquiring images matched with the subjects and the entities of the articles; and generating graphic and text information according to the images and the articles.
In some embodiments, the method further comprises: and typesetting and optimizing the image-text information.
In a second aspect, embodiments of the present disclosure provide an apparatus for outputting information, comprising: an acquisition unit configured to acquire audio information to be converted; a conversion unit configured to convert the audio information into text information; a word segmentation unit configured to segment the text information into a word sequence; a judging unit configured to query, for each word in the word sequence, a word connection probability table obtained from a pre-trained word connection probability model for the connection probability between the word and the next word and the connection probabilities between the word and various punctuation marks, and to determine the word's connection target based on the queried probabilities; and a connection unit configured to connect each word in the word sequence with its connection target to generate and output a punctuated article.
In some embodiments, the apparatus further comprises a training unit configured to: acquiring a training sample set, wherein the training sample comprises sentences containing punctuation; taking sentences of training samples in the training sample set as the input of the LSTM model, and training to obtain a word connection probability model; and generating a word connection probability table according to the probability between each word and each punctuation obtained in the middle process of training the word connection probability model.
In some embodiments, the training unit is further configured to: obtaining a sample article, and segmenting the sample article according to the granularity of a large sentence to obtain a sample sentence set, wherein the large sentence is a sentence ending with a period, a question mark or an exclamation mark; and for sample sentences in the sample sentence set, generating word vectors as training samples after word segmentation is carried out on the sample sentences.
In some embodiments, the apparatus further comprises a segmentation unit configured to: the article is divided into at least one paragraph.
In some embodiments, the apparatus further comprises a mapping unit configured to: determining the subject and entity of the article; acquiring images matched with the subjects and the entities of the articles; and generating graphic and text information according to the images and the articles.
In some embodiments, the apparatus further comprises a typesetting unit configured to: and typesetting and optimizing the image-text information.
In a third aspect, embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method as in any of the first aspects.
In a fourth aspect, embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements a method as in any of the first aspects.
The method and apparatus for outputting information provided by the embodiments of the present disclosure can link and segment sentences according to text content parsed from audio, match pictures according to the text's topical content, and finally typeset and optimize the text and pictures to generate an article. Compared with a conventional article generation system, the data is richer and more varied and the sources are wider. Compared with conventional manually written articles, the method offers better timeliness and coverage while saving labor and time costs.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings:
FIG. 1 is an exemplary system architecture diagram in which an embodiment of the present disclosure may be applied;
FIG. 2 is a flow chart of one embodiment of a method for outputting information according to the present disclosure;
FIG. 3 is a schematic illustration of one application scenario of a method for outputting information according to the present disclosure;
FIG. 4 is a flow chart of yet another embodiment of a method for outputting information according to the present disclosure;
fig. 5a, 5b are network structure diagrams of LSTM model of a method for outputting information according to the present disclosure.
FIG. 6 is a schematic structural diagram of one embodiment of an apparatus for outputting information according to the present disclosure;
fig. 7 is a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the methods of the present disclosure for outputting information or apparatuses for outputting information may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as an audio-to-text application, a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices having a microphone and a display screen and supporting audio-to-text conversion, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above, implemented either as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is made herein.
The server 105 may be a server providing various services, such as a background editing server providing support for text displayed on the terminal devices 101, 102, 103. The background editing server may analyze and process the received data such as audio, and feed back the processing result (e.g., an article generated according to the audio) to the terminal device.
The server may be hardware or software. When the server is hardware, the server may be implemented as a distributed server cluster formed by a plurality of servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules (e.g., a plurality of software or software modules for providing distributed services), or as a single software or software module. The present invention is not particularly limited herein.
It should be noted that the method for outputting information provided by the embodiments of the present disclosure may be performed by the terminal devices 101, 102, 103, or may be performed by the server 105. Accordingly, the means for outputting information may be provided in the terminal devices 101, 102, 103 or in the server 105. The present invention is not particularly limited herein.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method for outputting information according to the present disclosure is shown. The method for outputting information comprises the following steps:
in step 201, audio information to be converted is acquired.
In the present embodiment, an execution subject of the method for outputting information (e.g., the server shown in fig. 1) may receive, through a wired or wireless connection, audio information from a terminal into which the user dictates. The audio information may be an audio file in any of various formats and contains a large number of sentences. The title of the audio may be contained in the name of the audio file.
Step 202, converting the audio information into text information.
In this embodiment, the audio information may be converted into a whole passage of text by existing automatic speech recognition (ASR) techniques. Text obtained from ASR has no sentence breaks, so it must still be cut and linked according to its semantics and marked with punctuation.
And 203, word segmentation is carried out on the text information to obtain a word sequence.
In this embodiment, a word segmentation operation is performed on the whole text based on the lexical structure of Chinese or English to obtain the word sequence of the entire audio. The segmentation method may be maximum reverse matching or another common approach. The language of the audio, e.g., Chinese, English, or another language, may be identified first, and word segmentation then performed according to that language's lexical structure.
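As a concrete illustration of the maximum reverse (backward) matching mentioned above, the following sketch segments a short Chinese string against a toy dictionary. The dictionary contents, the maximum word length, and the single-character fallback are illustrative assumptions, not part of the patent:

```python
# Backward maximum matching: scan from the end of the text, greedily taking
# the longest dictionary word that ends at the current position.
# TOY_DICT and MAX_WORD_LEN are made-up for illustration.

TOY_DICT = {"我", "爱", "中国", "因为", "天气", "很好"}
MAX_WORD_LEN = 2  # longest entry in the toy dictionary

def backward_max_match(text, dictionary=TOY_DICT, max_len=MAX_WORD_LEN):
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # we collected words right-to-left
    return words

print(backward_max_match("我爱中国"))  # → ['我', '爱', '中国']
```

A production system would use a full dictionary (or a statistical segmenter) rather than this toy lookup, but the greedy right-to-left scan is the core of the reverse-matching idea.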
Step 204, for the word in the word sequence, inquiring the connection probability between the word and the next word of the word and the connection probability between the word and various punctuations according to the word connection probability table obtained by the pre-trained word connection probability model, and determining the connection target of the word based on the inquired connection probability.
In this embodiment, according to the word connection probability table generated by the word connection probability model, the probability of each word connecting to the next word and to each punctuation mark is looked up, and the word or punctuation with the largest probability value is taken for linking. The word connection probability table characterizes the probabilities between words and between words and various punctuation marks. Each word is treated independently; that is, punctuation may potentially follow any word. For the current word, the probabilities of connecting to each punctuation mark and to the next word are obtained, and the candidate with the highest probability is chosen for linking: if the next word has the highest probability, the words are joined directly without punctuation; if a punctuation mark has the highest probability, that mark is appended after the word. Applying this step to all words finally yields punctuated sentences. For example, for the word sequence "我" (I), "爱" (love), "中国" (China), "因为" (because), the connection probability between "我" and "爱" is queried along with the probabilities of "我" connecting to periods, commas, and other punctuation. The probability between "我" and "爱" is the greatest, so no punctuation is placed between them. The connection probability between "中国" and the period far exceeds that between "中国" and "因为" or any other punctuation, so a period is added after "中国". The word connection probability model is the key point of this subsystem: a model must be trained to obtain the occurrence probabilities between words, from which the word connection probability table is generated; the candidate with the highest probability then becomes each word's connection target. Generation of the word connection probability table is described in steps 401-403.
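The greedy selection of each word's connection target can be sketched as follows. The probability values in `prob_table` are made-up illustrative numbers standing in for a real table produced by the trained model:

```python
# For each word, pick the successor (next word or punctuation) with the
# highest connection probability; append punctuation when punctuation wins.
# The table entries below are invented for illustration.

PUNCTUATION = {"，", "。"}

prob_table = {
    "我":   {"爱": 0.9, "，": 0.05, "。": 0.05},
    "爱":   {"中国": 0.85, "，": 0.1, "。": 0.05},
    "中国": {"因为": 0.1, "，": 0.2, "。": 0.7},
}

def connect_words(words, table):
    out = []
    for w in words:
        out.append(w)
        cands = table.get(w)
        if cands:
            best = max(cands, key=cands.get)  # highest connection probability
            if best in PUNCTUATION:           # punctuation wins: append it
                out.append(best)
    return "".join(out)

print(connect_words(["我", "爱", "中国"], prob_table))  # → 我爱中国。
```

When the argmax is a word rather than punctuation, nothing is inserted and the next word simply follows, matching the "direct word addition" case described above.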
And 205, connecting each word in the word sequence with a corresponding connection target to generate a punctuation article for output.
In this embodiment, according to the result of step 204, each word in the word sequence is connected with the corresponding connection target to generate the article with the punctuation, and output.
In some optional implementations of the present embodiment, the method further includes: the article is divided into at least one paragraph. The articles may be semantically analyzed and then segmented according to semantics. Literal content of the same semantic meaning is classified into one segment.
In some optional implementations of the present embodiment, the method further includes: determining the topics and entities of the article; acquiring images matching the article's topics and entities; and generating image-text information from the images and the article. From the text produced by the audio conversion module, the entities in the text (here relatively fine-grained, covering people such as celebrities and things such as banks) and its topics (such as finance, entertainment, sports, and so on) are mined; related entity pictures are then retrieved from an entity picture library and related topic pictures from a topic picture library. These pictures are relevant to the text and can be used directly to illustrate the article.
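The picture lookup by entity and topic can be sketched with toy picture libraries. The galleries and the pre-mined entity/topic lists below are assumed inputs, since the mining step itself is not implemented here:

```python
# Look up entity pictures first, then topic pictures, skipping misses.
# ENTITY_GALLERY and TOPIC_GALLERY are toy stand-ins for the entity and
# topic picture libraries described in the text.

ENTITY_GALLERY = {"银行": "bank.jpg"}
TOPIC_GALLERY = {"财经": "finance_cover.jpg"}

def match_images(entities, topics):
    images = [ENTITY_GALLERY[e] for e in entities if e in ENTITY_GALLERY]
    images += [TOPIC_GALLERY[t] for t in topics if t in TOPIC_GALLERY]
    return images

print(match_images(["银行"], ["财经"]))  # → ['bank.jpg', 'finance_cover.jpg']
```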
In some optional implementations of the present embodiment, the method further includes: typesetting and optimizing the image-text information. Each picture is automatically inserted at a reasonable position in the article, and its size is adjusted so that the area ratio of text content to picture reaches a preset value.
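The size adjustment can be sketched as solving for a uniform scale factor so the text-to-picture area ratio hits the preset value. The target ratio and the area formula are illustrative assumptions:

```python
# Scale a picture, preserving aspect ratio, so that
# text_area / picture_area == target_ratio.

import math

def scale_picture(text_area, img_w, img_h, target_ratio=4.0):
    desired_area = text_area / target_ratio
    factor = math.sqrt(desired_area / (img_w * img_h))  # uniform scale
    return img_w * factor, img_h * factor

w, h = scale_picture(text_area=8000, img_w=400, img_h=200, target_ratio=4.0)
print(round(w * h))  # → 2000
```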
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for outputting information according to the present embodiment. In the application scenario of fig. 3, the server receives an audio file sent by the terminal, in which the user verbally describes the feng shui of Yunnan University. The audio file is parsed into whole-passage text by ASR techniques. After the whole passage is segmented into words, the connection probabilities between words and punctuation are queried, and the word or punctuation with the highest probability is taken as the connection target. The words are connected to generate a punctuated article. Pictures are searched according to the text content to find suitable illustrations, and the article is divided into paragraphs according to the text's semantics. Finally, the retrieved pictures are inserted into the article and the layout is rendered.
According to the method provided by the embodiment of the disclosure, sentence linking and segmentation can be performed according to text content analyzed by audio, then mapping is performed according to the text subject content, and finally typesetting optimization is performed on texts and pictures to generate articles. Compared with the traditional article generation system, the system has the advantages that the data is more abundant and various, and the source is wider; compared with the traditional small-sized handwriting articles, the method has higher timeliness and coverage, and simultaneously saves labor cost and time cost.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for outputting information is shown. The flow 400 of the method for outputting information comprises the steps of:
step 401, a training sample set is obtained.
In this embodiment, the execution subject (e.g., the server shown in fig. 1) of the method for outputting information may acquire a training sample set through a wired connection manner or a wireless connection manner, where the training sample includes a sentence including punctuation. The execution body of flow 400 may be the same as the execution body of flow 200 or may be a different execution body. The third party server may execute the process 400 to generate a word connection probability table for use by the executing entity of the process 200.
Normal news texts or articles are used as training data.
First, the article is cut into sentences at the granularity of one large sentence, where a large sentence is a sentence ending with a period, question mark, or exclamation mark. Each large sentence serves as one piece of data.
Then, each sentence is segmented into words according to the English or Chinese lexical structure.
Finally, word encoding is performed: each word is embedded, yielding an embedding representation of each sentence as a training sample. Words here include punctuation marks.
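The three preparation steps above (large-sentence splitting, word segmentation, encoding) can be sketched as follows. Character-level tokens and integer IDs stand in for a real word segmenter and embedding layer, which are assumptions for illustration:

```python
# Split an article into "large sentences" ending with 。？！, then encode
# each token (here: each character, punctuation included) as an integer ID.

import re

def split_large_sentences(article):
    """One large sentence per period/question/exclamation mark."""
    return re.findall(r"[^。？！]+[。？！]", article)

def encode(sentence, vocab):
    """Map each character/punctuation token to an integer ID, growing vocab."""
    return [vocab.setdefault(ch, len(vocab)) for ch in sentence]

article = "我爱中国。天气很好！"
vocab = {}
samples = [encode(s, vocab) for s in split_large_sentences(article)]
print(len(samples))  # → 2
```

In a real pipeline the integer IDs would index into a learned embedding matrix rather than serving as the representation themselves.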
Step 402, taking sentences of training samples in the training sample set as the input of the LSTM model, and training to obtain a word connection probability model.
In this embodiment, LSTM (Long Short-Term Memory) is a recurrent neural network suited to processing and predicting events with relatively long intervals and delays in a time series. The hidden layer of the original RNN has only one state (fig. 5a), which is very sensitive to short-term inputs. LSTM adds a second state (fig. 5b) to preserve long-term information.
LSTM likewise has a repeating-module structure, but each repeated module differs from a plain RNN cell: instead of a single neural network layer, there are four, interacting in a very specific way.
At time t, the LSTM has three inputs: the network's input value at the current moment, the LSTM's output value at the previous moment, and the cell state at the previous moment. It has two outputs: the LSTM's output value at the current moment and the cell state at the current moment.
The key to LSTM is how it controls the long-term state. The idea is to use three control switches: the first switch controls how the long-term state is preserved; the second controls how the instant state is written into the long-term state; and the third controls whether the long-term state is emitted as the current LSTM output.
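The three switches correspond to the forget, input, and output gates of a standard LSTM cell. The following scalar, pure-Python sketch with illustrative weights shows a single step; it is a didactic toy, not the patent's trained model:

```python
# One LSTM step on scalars. Forget gate = first switch (preserve long-term
# state), input gate = second switch (write instant state into long-term
# state), output gate = third switch (expose long-term state as output).

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """W holds (w_x, w_h, b) per gate; all values are scalars."""
    def gate(name, squash):
        w_x, w_h, b = W[name]
        return squash(w_x * x + w_h * h_prev + b)
    f = gate("forget", sigmoid)        # first switch
    i = gate("input", sigmoid)         # second switch
    g = gate("candidate", math.tanh)   # instant (candidate) state
    o = gate("output", sigmoid)        # third switch
    c = f * c_prev + i * g             # new long-term (cell) state
    h = o * math.tanh(c)               # current LSTM output
    return h, c

W = {k: (0.5, 0.5, 0.0) for k in ("forget", "input", "candidate", "output")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, W=W)
print(0.0 < h < 1.0)  # → True
```

Real implementations vectorize this over hidden dimensions and learn the weights by backpropagation; the gate arithmetic is the same.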
Step 403, generating a word connection probability table according to the probabilities between each word and each punctuation and the probabilities between each word obtained in the middle process of training the word connection probability model.
In this embodiment, the embedded sentences are used as input to the LSTM model, and the model is trained. The intermediate outputs of the model are extracted to obtain the connection probabilities between words. Statistical analysis of these connection probabilities yields the word connection probability table, which can then be consulted to look up the connection probability between a word and the word before it.
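As a simplified stand-in for pulling probabilities out of the LSTM's intermediate outputs, the sketch below derives a connection table from bigram counts over punctuated training sentences. This count-based substitution is an assumption for illustration only:

```python
# Build {word: {successor: probability}} from bigram counts over
# tokenized, punctuation-bearing training sentences.

from collections import Counter, defaultdict

def build_connection_table(tokenized_sentences):
    counts = defaultdict(Counter)
    for toks in tokenized_sentences:
        for a, b in zip(toks, toks[1:]):
            counts[a][b] += 1
    table = {}
    for a, succ in counts.items():
        total = sum(succ.values())
        table[a] = {b: n / total for b, n in succ.items()}  # normalize
    return table

data = [["我", "爱", "中国", "。"], ["我", "爱", "天气", "。"]]
table = build_connection_table(data)
print(table["爱"]["中国"])  # → 0.5
```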
With further reference to fig. 6, as an implementation of the method shown in the foregoing figures, the present disclosure provides an embodiment of an apparatus for outputting information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 6, the apparatus 600 for outputting information of the present embodiment includes: an acquisition unit 601, a conversion unit 602, a word segmentation unit 603, a judgment unit 604, and a connection unit 605. Wherein, the obtaining unit 601 is configured to obtain audio information to be converted; a conversion unit 602 configured to convert the audio information into text information; a word segmentation unit 603 configured to segment the text information to obtain a word sequence; a judging unit 604 configured to query, for a word in the word sequence, a word connection probability table obtained by a word connection probability model trained in advance, a connection probability between the word and a next word of the word and a connection probability between the word and various punctuations, and determine a connection target of the word based on the queried connection probability; the connection unit 605 is configured to connect each word in the word sequence with a corresponding connection target to generate a punctuation article for output.
In the present embodiment, specific processes of the acquisition unit 601, the conversion unit 602, the word segmentation unit 603, the judgment unit 604, and the connection unit 605 of the apparatus 600 for outputting information may refer to step 201, step 202, step 203, step 204, and step 205 in the corresponding embodiment of fig. 2.
In some optional implementations of the present embodiment, the apparatus 600 further includes a training unit (not shown in the drawings) configured to: acquiring a training sample set, wherein the training sample comprises sentences containing punctuation; taking sentences of training samples in the training sample set as the input of the LSTM model, and training to obtain a word connection probability model; and generating a word connection probability table according to the probability between each word and each punctuation obtained in the middle process of training the word connection probability model.
In some optional implementations of this embodiment, the training unit is further configured to: obtaining a sample article, and segmenting the sample article according to the granularity of a large sentence to obtain a sample sentence set, wherein the large sentence is a sentence ending with a period, a question mark or an exclamation mark; and for sample sentences in the sample sentence set, generating word vectors as training samples after word segmentation is carried out on the sample sentences.
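The large-sentence segmentation described above can be sketched as follows. This is an illustrative sketch on English text with ASCII marks; for Chinese text the full-width marks (。？！) would be added to the character class.

```python
# Illustrative sketch of large-sentence segmentation: split only at
# sentence-final marks (period, question mark, exclamation mark);
# commas and other marks stay inside a large sentence.
import re

def split_large_sentences(article):
    """Return the large sentences of an article, each keeping its
    terminating mark."""
    return [s.strip() for s in re.findall(r"[^.?!]+[.?!]", article)]

print(split_large_sentences("It rains. Really? Yes, a lot!"))
# → ['It rains.', 'Really?', 'Yes, a lot!']
```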
In some optional implementations of the present embodiment, the apparatus 600 further includes a segmentation unit (not shown in the drawings) configured to: divide the article into at least one paragraph.
In some optional implementations of the present embodiment, the apparatus 600 further includes a mapping unit (not shown in the drawings) configured to: determine the topic and the entity of the article; acquire images matching the topic and the entity of the article; and generate image-text information according to the images and the article.
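The mapping unit's image lookup might look like the following sketch. The two "gallery" dicts, their contents, and the function name are hypothetical stand-ins; a real system would query an image library or search service by topic and entity.

```python
# Hypothetical sketch of the mapping unit: look up a topic image and
# entity images, then bundle them with the article as image-text info.
# The gallery dicts and file names are illustrative assumptions.

TOPIC_GALLERY = {"weather": "weather_banner.png"}
ENTITY_GALLERY = {"Beijing": "beijing_skyline.png"}

def build_image_text_info(article, topic, entities):
    """Pair the article with a topic image and any entity images found."""
    images = []
    if topic in TOPIC_GALLERY:
        images.append(TOPIC_GALLERY[topic])
    images += [ENTITY_GALLERY[e] for e in entities if e in ENTITY_GALLERY]
    return {"article": article, "images": images}

info = build_image_text_info("It is raining in Beijing.", "weather", ["Beijing"])
print(info["images"])  # → ['weather_banner.png', 'beijing_skyline.png']
```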
In some optional implementations of the present embodiment, the apparatus 600 further includes a typesetting unit (not shown in the drawings) configured to: perform typesetting optimization on the image-text information.
Referring now to fig. 7, a schematic diagram of an electronic device (e.g., server in fig. 1) 700 suitable for use in implementing embodiments of the present disclosure is shown. The server illustrated in fig. 7 is merely an example, and should not be construed as limiting the functionality and scope of use of the embodiments of the present disclosure in any way.
As shown in fig. 7, the electronic device 700 may include a processing means (e.g., a central processor, a graphics processor, etc.) 701, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage means 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device 700 are also stored. The processing device 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
In general, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication means 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 shows an electronic device 700 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 7 may represent one device or a plurality of devices as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via communication device 709, or installed from storage 708, or installed from ROM 702. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 701. It should be noted that, the computer readable medium according to the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In an embodiment of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In contrast, in embodiments of the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire audio information to be converted; convert the audio information into text information; segment the text information to obtain a word sequence; for each word in the word sequence, query a word connection probability table obtained through a pre-trained word connection probability model for the connection probability between the word and the next word and the connection probabilities between the word and various punctuation marks, and determine a connection target for the word based on the queried connection probabilities; and connect each word in the word sequence with its corresponding connection target to generate a punctuated article for output.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments described in the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The described units may also be provided in a processor, for example, described as: a processor comprising an acquisition unit, a conversion unit, a word segmentation unit, a judgment unit, and a connection unit. In some cases, the names of these units do not constitute a limitation on the units themselves; for example, the acquisition unit may also be described as "a unit that acquires audio information to be converted".
The foregoing description is merely illustrative of the preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention referred to in the present disclosure is not limited to technical solutions formed by the specific combination of the features described above, and also covers other technical solutions formed by any combination of those features or their equivalents without departing from the inventive concept, for example, technical solutions formed by mutually replacing the features described above with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims (12)

1. A method for outputting information, comprising:
acquiring audio information to be converted;
converting the audio information into text information;
identifying the language of the audio, and segmenting the text information into words according to the lexical structure of the language to obtain a word sequence;
for each word in the word sequence, querying a word connection probability table obtained through a pre-trained word connection probability model for the connection probability between the word and the next word and the connection probabilities between the word and various punctuation marks, and determining a connection target for the word based on the queried connection probabilities, wherein the word connection probability table is obtained by statistically analyzing the probabilities between words and the probabilities between words and each punctuation mark obtained in the intermediate process of training the word connection probability model;
connecting each word in the word sequence with a corresponding connection target to generate an article with punctuation and outputting the article;
determining the topic and the entity of the article;
searching an entity graph library for an entity graph related to the entity, and searching a topic graph library for a topic graph related to the topic;
and generating image-text information according to the entity graph, the topic graph and the article.
2. The method of claim 1, wherein the word connection probability table is obtained by:
acquiring a training sample set, wherein the training sample comprises sentences containing punctuation;
taking sentences of training samples in the training sample set as the input of an LSTM model, and training to obtain a word connection probability model;
and generating the word connection probability table according to the probabilities between words and the probabilities between words and each punctuation mark obtained in the intermediate process of training the word connection probability model.
3. The method of claim 2, wherein the acquiring a set of training samples comprises:
obtaining a sample article, and segmenting the sample article according to the granularity of a large sentence to obtain a sample sentence set, wherein the large sentence is a sentence ending with a period, a question mark or an exclamation mark;
and for the sample sentences in the sample sentence set, generating word vectors as training samples after word segmentation is carried out on the sample sentences.
4. The method of claim 1, wherein the method further comprises:
dividing the article into at least one paragraph.
5. The method of claim 1, wherein the method further comprises:
performing typesetting optimization on the image-text information.
6. An apparatus for outputting information, comprising:
an acquisition unit configured to acquire audio information to be converted;
a conversion unit configured to convert the audio information into text information;
a word segmentation unit configured to identify the language of the audio and segment the text information into words according to the lexical structure of the language to obtain a word sequence;
a judgment unit configured to, for each word in the word sequence, query a word connection probability table obtained through a pre-trained word connection probability model for the connection probability between the word and the next word and the connection probabilities between the word and various punctuation marks, and determine a connection target for the word based on the queried connection probabilities, wherein the word connection probability table is obtained by statistically analyzing the probabilities between words and the probabilities between words and each punctuation mark obtained in the intermediate process of training the word connection probability model;
a connection unit configured to connect each word in the word sequence with a corresponding connection target to generate an article with punctuation and output the article;
a mapping unit configured to: determine the topic and the entity of the article; search an entity graph library for an entity graph related to the entity, and search a topic graph library for a topic graph related to the topic; and generate image-text information according to the entity graph, the topic graph and the article.
7. The apparatus of claim 6, wherein the apparatus further comprises a training unit configured to:
acquiring a training sample set, wherein the training sample comprises sentences containing punctuation;
taking sentences of training samples in the training sample set as the input of an LSTM model, and training to obtain a word connection probability model;
and generating the word connection probability table according to the probabilities between words and the probabilities between words and each punctuation mark obtained in the intermediate process of training the word connection probability model.
8. The apparatus of claim 7, wherein the training unit is further configured to:
obtaining a sample article, and segmenting the sample article according to the granularity of a large sentence to obtain a sample sentence set, wherein the large sentence is a sentence ending with a period, a question mark or an exclamation mark;
and for the sample sentences in the sample sentence set, generating word vectors as training samples after word segmentation is carried out on the sample sentences.
9. The apparatus of claim 6, wherein the apparatus further comprises a segmentation unit configured to:
divide the article into at least one paragraph.
10. The apparatus of claim 6, wherein the apparatus further comprises a typesetting unit configured to:
perform typesetting optimization on the image-text information.
11. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-5.
CN201910552619.6A 2019-06-25 2019-06-25 Method and device for outputting information Active CN110245334B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910552619.6A CN110245334B (en) 2019-06-25 2019-06-25 Method and device for outputting information


Publications (2)

Publication Number Publication Date
CN110245334A CN110245334A (en) 2019-09-17
CN110245334B true CN110245334B (en) 2023-06-16

Family

ID=67889231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910552619.6A Active CN110245334B (en) 2019-06-25 2019-06-25 Method and device for outputting information

Country Status (1)

Country Link
CN (1) CN110245334B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111078831B (en) * 2019-11-06 2023-05-30 广州荔支网络技术有限公司 Optimizing method for converting text from text reading audio content
CN112634876B (en) * 2021-01-04 2023-11-10 北京有竹居网络技术有限公司 Speech recognition method, device, storage medium and electronic equipment
CN113297824B (en) * 2021-05-11 2024-08-16 北京字跳网络技术有限公司 Text display method, text display device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951413A (en) * 2017-03-24 2017-07-14 北京百度网讯科技有限公司 Segmenting method and device based on artificial intelligence
CN107679039A (en) * 2017-10-17 2018-02-09 北京百度网讯科技有限公司 The method and apparatus being intended to for determining sentence
CN107944032A (en) * 2017-12-13 2018-04-20 北京百度网讯科技有限公司 Method and apparatus for generating information
CN108920611A (en) * 2018-06-28 2018-11-30 北京百度网讯科技有限公司 article generation method, device, equipment and storage medium
CN108932220A (en) * 2018-06-29 2018-12-04 北京百度网讯科技有限公司 article generation method and device
CN108959556A (en) * 2018-06-29 2018-12-07 北京百度网讯科技有限公司 Entity answering method, device and terminal neural network based
CN109657041A (en) * 2018-12-04 2019-04-19 南京理工大学 The problem of based on deep learning automatic generation method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8825639B2 (en) * 2004-06-30 2014-09-02 Google Inc. Endorsing search results
US8713028B2 (en) * 2011-11-17 2014-04-29 Yahoo! Inc. Related news articles
CN107221330B (en) * 2017-05-26 2020-11-27 北京搜狗科技发展有限公司 Punctuation adding method and device and punctuation adding device
CN107767870B (en) * 2017-09-29 2021-03-23 百度在线网络技术(北京)有限公司 Punctuation mark adding method and device and computer equipment
CN109213997B (en) * 2018-08-16 2021-11-19 昆明理工大学 Chinese word segmentation method based on bidirectional long-time and short-time memory network model
CN109241330A (en) * 2018-08-20 2019-01-18 北京百度网讯科技有限公司 The method, apparatus, equipment and medium of key phrase in audio for identification


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Research on Chinese-Vietnamese Bilingual News Topic Discovery"; Hou Jiaying; China Masters' Theses Full-text Database, Information Science and Technology; I138-1937 *
"An Abstractive Summarization Model Incorporating Lexical Features"; Jiang Yuehua et al.; Journal of Hebei University of Science and Technology; pp. 152-158 *

Also Published As

Publication number Publication date
CN110245334A (en) 2019-09-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant