WO2021139242A1 - Presentation file generation method, apparatus, and device and storage medium - Google Patents
Presentation file generation method, apparatus, and device and storage medium
- Publication number
- WO2021139242A1 PCT/CN2020/118349 CN2020118349W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- file
- processed
- presentation file
- text
- keywords
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/43—Querying
- G06F16/438—Presentation of query results
- G06F16/4387—Presentation of query results by the use of playlists
- G06F16/4393—Multimedia presentations, e.g. slide shows, multimedia albums
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/186—Templates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/189—Automatic justification
Definitions
- This application relates to the field of computer technology, and in particular to a method, apparatus, device, and storage medium for generating a presentation.
- Presentations are widely used in all aspects of social life, for example in work reports, corporate publicity, product promotion, wedding celebrations, project bidding, management consulting, and education and training.
- The inventor found that presentations are mainly produced by manually filling pictures, text, and other elements into preset templates.
- This approach incurs high labor costs, and in some cases the template and the content do not integrate well, so users must repeatedly adjust the template to achieve the desired effect, resulting in low efficiency of presentation generation.
- The embodiments of the present application provide a presentation generation method, apparatus, device, and storage medium, which can improve the efficiency of generating presentations.
- An embodiment of the present application provides a presentation generation method.
- The method includes: obtaining at least two keywords in a file to be processed and characteristic attribute information of the file to be processed, where the characteristic attribute information includes at least one of the field to which the file to be processed belongs, the number of keywords in the file to be processed, and the subject of the file to be processed; dividing the file to be processed according to the at least two keywords to obtain at least two text fragments, one text fragment corresponding to at least one keyword; identifying, from a presentation file template library, a target presentation file template matching the characteristic attribute information of the file to be processed; and importing the at least two text fragments into the target presentation file template to obtain a target presentation file.
- An embodiment of the present application provides a presentation generation apparatus, which includes: an acquisition module for acquiring at least two keywords in a file to be processed and characteristic attribute information of the file to be processed, the characteristic attribute information including at least one of the field to which the file to be processed belongs, the number of keywords in the file to be processed, and the subject of the file to be processed; a division module for dividing the file to be processed according to the at least two keywords to obtain at least two text fragments, one text fragment corresponding to at least one keyword; a recognition module for identifying, from the presentation file template library, a target presentation file template matching the characteristic attribute information of the file to be processed; and an import module for importing the at least two text fragments into the target presentation file template to obtain a target presentation file.
- An embodiment of the present application provides an electronic device, which includes: a processor adapted to implement one or more instructions; and a computer storage medium that stores one or more instructions, the one or more instructions being suitable for being loaded by the processor to execute the following steps: acquiring at least two keywords in the file to be processed and characteristic attribute information of the file to be processed, the characteristic attribute information including at least one of the field to which the file to be processed belongs, the number of keywords in the file to be processed, and the subject of the file to be processed; dividing the file to be processed according to the at least two keywords to obtain at least two text fragments, one text fragment corresponding to at least one keyword; identifying, from the presentation file template library, the target presentation file template matching the characteristic attribute information of the file to be processed; and importing the at least two text fragments into the target presentation file template to obtain the target presentation file.
- An embodiment of the present application provides a computer-readable storage medium that stores one or more instructions, the one or more instructions being suitable for being loaded by a processor to execute the following steps: acquiring at least two keywords in the file to be processed and characteristic attribute information of the file to be processed, the characteristic attribute information including at least one of the field to which the file to be processed belongs, the number of keywords in the file to be processed, and the subject of the file to be processed; dividing the file to be processed according to the at least two keywords to obtain at least two text fragments, one text fragment corresponding to at least one keyword; identifying, from the presentation file template library, a target presentation file template matching the characteristic attribute information of the file to be processed; and importing the at least two text fragments into the target presentation file template to obtain the target presentation file.
- In the embodiments of the present application, at least two keywords in the file to be processed and the characteristic attribute information of the file to be processed are acquired; the file to be processed is divided according to the at least two keywords to obtain at least two text fragments; a target presentation file template matching the characteristic attribute information of the file to be processed is identified from the presentation file template library; and the at least two text fragments are imported into the target presentation file template to obtain the target presentation file.
- Dividing the file to be processed according to the at least two keywords to obtain at least two text fragments facilitates generating a presentation corresponding to each text fragment.
- Identifying the target presentation file template matching the characteristic attribute information of the file to be processed and importing the at least two text fragments into it yields the target presentation file; that is, the target presentation file includes the presentation corresponding to each text fragment.
- The whole process of generating the target presentation file does not require human involvement, which can improve the efficiency and flexibility of presentation generation and ensure the accuracy and relevance of the presentation.
- FIG. 1 is a schematic flowchart of a method for generating a presentation provided by an embodiment of the present application.
- FIG. 2 is a schematic diagram of a method for importing at least two text fragments into a target presentation file template according to an embodiment of the present application.
- FIG. 3 is a schematic flowchart of another method for generating a presentation provided by an embodiment of the present application.
- FIG. 4 is a schematic structural diagram of a presentation generating device provided by an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
- Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
- Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
- Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
- Computer vision (CV) is a science that studies how to make machines "see". More specifically, it refers to using cameras and computers instead of human eyes to identify, track, and measure targets, and to further process the resulting images so that they become more suitable for human observation or for transmission to instruments for detection.
- Computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can obtain information from images or multi-dimensional data.
- Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also includes common biometric recognition technologies such as face recognition and fingerprint recognition.
- This application relates to image recognition technology in artificial intelligence.
- The image recognition technology is used to automatically convert images into presentations without manual participation, which can improve the efficiency and accuracy of generating presentations. This application can be applied to fields such as smart government affairs and smart education, and is conducive to promoting the construction of smart cities.
- FIG. 1 is a schematic flowchart of a method for generating a presentation provided by an embodiment of the present application.
- The embodiment of the present application may be executed by an electronic device.
- The method for generating a presentation includes the following steps S101 to S104.
- S101 Acquire at least two keywords in a file to be processed and characteristic attribute information of the file to be processed.
- The file to be processed is a text file provided by the user for making the target presentation file, and it contains the content information corresponding to each presentation page in the target presentation file.
- The keywords in the file to be processed can be extracted with an LDA model to obtain at least two keywords, and the title and content of the file to be processed can be analyzed to obtain its characteristic attribute information.
- The LDA (Latent Dirichlet Allocation) model is a document topic generation model used to infer the topic distribution of a document; text classification can be performed according to the topic distribution.
- The characteristic attribute information of the file to be processed includes at least one of the field to which the file to be processed belongs, the number of keywords in the file to be processed, and the subject of the file to be processed. The field to which the file to be processed belongs includes technology, education, political parties, finance, tourism, and so on.
- The subject of the file to be processed includes work summary, marriage and love, graduation defense, induction training, and so on.
- S102 Divide the file to be processed according to the above at least two keywords to obtain at least two text fragments.
- The paragraph in which a keyword appears in the file to be processed can be used as the text fragment corresponding to that keyword; alternatively, the file to be processed can be divided according to the positions of the keywords in the file to obtain at least two text fragments.
- The text fragments can be of the same length or of different lengths. A text fragment corresponds to at least one keyword; that is, the keyword corresponding to a text fragment can indicate the topic of that fragment, and different text fragments can correspond to different keywords.
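The paragraph-based division described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes paragraphs are separated by blank lines and that a keyword "appears" in a paragraph when it occurs as a case-insensitive substring. The function name `split_by_keywords` and the dictionary layout are hypothetical.

```python
def split_by_keywords(text, keywords):
    """Split text into fragments, starting a new fragment at each paragraph
    that contains a keyword not yet assigned to an earlier fragment."""
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    fragments = []   # each item: {"keywords": [...], "text": "..."}
    seen = set()     # keywords already assigned to a fragment
    pending = []     # leading paragraphs that contain no keyword yet
    for para in paragraphs:
        lowered = para.lower()
        hits = [k for k in keywords if k not in seen and k.lower() in lowered]
        if hits:
            # Open a new fragment; fold any leading keyword-free paragraphs in,
            # so every fragment corresponds to at least one keyword.
            body = "\n\n".join(pending + [para])
            pending = []
            fragments.append({"keywords": hits, "text": body})
            seen.update(hits)
        elif fragments:
            # No new keyword: the paragraph continues the current fragment.
            fragments[-1]["text"] += "\n\n" + para
        else:
            pending.append(para)
    return fragments
```

Each returned fragment carries the keywords that opened it, which later steps (ordering, template import) can reuse.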
- Optionally, step S102 includes: obtaining the similarity between every two adjacent keywords among the at least two keywords, and placing the paragraphs of keywords whose similarity is greater than a similarity threshold into the same text fragment, thereby obtaining at least two text fragments. Two adjacent keywords can be located in adjacent paragraphs or in the same paragraph.
- Obtaining the similarity between every two adjacent keywords includes: using a distance algorithm to obtain the distance between every two adjacent keywords, and determining the similarity between them from that distance. The greater the distance between two adjacent keywords, the smaller their similarity; the smaller the distance, the greater their similarity.
- The distance algorithm may include at least one of Minkowski distance, Manhattan distance, and Chebyshev distance.
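A minimal sketch of the distance-based grouping, under stated assumptions: each keyword is represented by a numeric vector (the text does not say how keywords are vectorized, so `vectors` is hypothetical input), and similarity is taken as `1 / (1 + distance)`, which is just one monotonically decreasing mapping consistent with the description. Minkowski distance with `p=1` gives Manhattan distance, and `p=math.inf` gives Chebyshev distance.

```python
import math

def minkowski(u, v, p=2.0):
    """Minkowski distance; p=1 is Manhattan, p=inf is Chebyshev."""
    if math.isinf(p):
        return max(abs(a - b) for a, b in zip(u, v))
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def similarity(u, v, p=2.0):
    # Assumed mapping: larger distance -> smaller similarity, in (0, 1].
    return 1.0 / (1.0 + minkowski(u, v, p))

def group_adjacent(keywords, vectors, threshold, p=2.0):
    """Put adjacent keywords whose similarity exceeds the threshold into the
    same group; each group then maps to one text fragment."""
    if not keywords:
        return []
    groups = [[keywords[0]]]
    for i in range(1, len(keywords)):
        if similarity(vectors[i - 1], vectors[i], p) > threshold:
            groups[-1].append(keywords[i])   # same fragment as the previous keyword
        else:
            groups.append([keywords[i]])     # start a new fragment
    return groups
```

The paragraphs belonging to the keywords of one group would then be concatenated into a single text fragment.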
- Optionally, the file to be processed may be divided according to the at least two keywords to obtain at least two candidate text fragments, and the at least two text fragments are generated from the at least two candidate text fragments.
- A BERT model can be used to determine whether each candidate text fragment is a new topic sentence, according to the text meaning of the candidate text fragment and the word meaning of its corresponding keyword.
- BERT is a method of pre-training language representations.
- That is, a general-purpose "language understanding" model is trained on a large text corpus (such as Wikipedia), and based on this model the candidate text fragments can be classified to judge whether each one is a new topic sentence.
- The text vector of each candidate text fragment can be obtained, and each keyword together with the text vector of its corresponding candidate text fragment can be input into the BERT model to obtain a 1/0 indication result.
- The text vector of each candidate text fragment can be obtained during the training of the BERT model, and it describes the global semantic information of that fragment.
- The BERT model determines whether each candidate text fragment is a new topic sentence based on the meaning of each keyword and the text vector of the corresponding candidate text fragment.
- The BERT model outputs a 1/0 indication result for each candidate text fragment. If the BERT model outputs 1, the candidate text fragment is a new topic sentence and is determined to be a text fragment; if the BERT model outputs 0, the candidate text fragment is not a new topic sentence, so its keyword is reacquired and the fragment is judged again with the reacquired keyword.
- S103 Identify a target presentation file template matching the characteristic attribute information of the file to be processed from the presentation file template library.
- The presentation file template library includes a variety of presentation file templates. Each presentation file template includes multiple presentations, and the number of presentations can differ between templates; and/or the color information and layout information of different templates can differ. Therefore, different presentation file templates suit different fields or different themes, or suit generating presentation files from text files of different lengths.
- After the characteristic attribute information of the file to be processed is obtained, the target presentation file template matching it is identified in the presentation file template library, and the target presentation file corresponding to the file to be processed is generated according to the target presentation file template.
- Optionally, when the characteristic attribute information includes the number of keywords in the file to be processed, the number of text fragments among the at least two text fragments can be determined according to the number of keywords; the number of presentations included in each presentation file template in the library is obtained; and the presentation file template whose number of presentations equals the number of text fragments is determined as the target presentation file template.
- That is, the target presentation file template can be determined in the presentation file template library according to the number of keywords in the file to be processed. First, the number of text fragments is determined from the number of keywords; at least one keyword corresponds to one text fragment, that is, the sentences corresponding to the at least one keyword are divided into one text fragment. Then the number of presentations included in each presentation file template in the library is obtained, and the template whose number of presentations equals the number of text fragments is determined as the target presentation file template.
- If more than one presentation file template in the library includes a number of presentations equal to the number of text fragments, one of them can be determined as the target presentation file template based on the text content of the file to be processed, user designation, or random selection.
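The count-based selection can be sketched as below. The template records (`name`, `slide_count`) and the function name are hypothetical; only two of the tie-breaking options mentioned above are shown (user designation, then random selection), and content-based tie-breaking is omitted.

```python
import random

def select_by_count(templates, fragment_count, preferred=None, rng=random):
    """Return a template whose number of presentations equals the number of
    text fragments, or None when no template matches."""
    candidates = [t for t in templates if t["slide_count"] == fragment_count]
    if not candidates:
        return None
    if preferred is not None:
        # Tie-break 1: a template designated by the user, if it is a candidate.
        for t in candidates:
            if t["name"] == preferred:
                return t
    # Tie-break 2: random selection among the remaining candidates.
    return rng.choice(candidates)
```

Passing a seeded `random.Random` instance as `rng` makes the random tie-break reproducible.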
- Optionally, when the characteristic attribute information includes the theme of the file to be processed, the theme of each presentation file template in the presentation file template library can be obtained; the matching degree between the theme of each template and the theme of the file to be processed is determined; and the presentation file template with the greatest matching degree is selected from the library as the target presentation file template.
- The theme of the file to be processed, such as work summary, marriage and love, graduation defense, or induction training, can be obtained from its text content; that is, the title and content of the file to be processed are analyzed to obtain its theme.
- Then the theme of each presentation file template in the presentation file template library is obtained, and the matching degree between the theme of each template and the theme of the file to be processed is determined.
- The presentation file template whose theme has the greatest matching degree with the theme of the file to be processed is used as the target presentation file template. Presentation file templates for multiple themes can be stored in the library in advance, with one template corresponding to one theme.
- Optionally, when the characteristic attribute information includes the field to which the file to be processed belongs, the attribute information of the presentation file corresponding to the file to be processed can be predicted according to that field.
- The attribute information of the presentation file corresponding to the file to be processed includes the typesetting information and color information of that presentation file. The attribute information of each presentation file template in the library is also obtained; it includes the typesetting information and color information of each template.
- The presentation file template whose attribute information has the greatest matching degree with the predicted attribute information of the presentation file is determined as the target presentation file template.
- That is, the target presentation file template corresponding to the file to be processed can be determined in the presentation file template library according to the field to which the file to be processed belongs.
- The attribute information of the presentation file corresponding to the file to be processed can be predicted according to the field to which the file to be processed belongs.
- This attribute information includes the typesetting information and color information of the presentation file corresponding to the file to be processed.
- The field of the file to be processed can be science and technology, education, political parties, finance, tourism, and so on.
- For example, for a file to be processed in the tourism field, the typesetting information of the corresponding presentation file may be folio typesetting, in which half of the page displays the scenery and the other half introduces it, or a multi-picture layout, and so on.
- The color information of the presentation file corresponding to a file to be processed in the tourism category should be relatively fresh, such as sky blue or green.
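The field-based selection can be sketched in two steps: predict the layout and colors from the field, then pick the template with the greatest matching degree. Everything beyond the tourism example is an assumption for illustration: the `FIELD_STYLE` table (including its finance entry), the template record layout, and the scoring rule (1 point for an equal layout plus the Jaccard overlap of the color sets) are hypothetical, since the text does not define how the matching degree is computed.

```python
# Hypothetical field -> predicted attribute table; only the tourism row
# follows the example in the text, the finance row is purely illustrative.
FIELD_STYLE = {
    "tourism": {"layout": "folio", "colors": {"sky blue", "green"}},
    "finance": {"layout": "single-column", "colors": {"navy", "gray"}},
}

def match_degree(predicted, template):
    """Assumed score: layout equality (0 or 1) plus color-set Jaccard overlap."""
    layout_score = 1.0 if template["layout"] == predicted["layout"] else 0.0
    union = predicted["colors"] | template["colors"]
    overlap = predicted["colors"] & template["colors"]
    color_score = len(overlap) / len(union) if union else 0.0
    return layout_score + color_score

def select_by_field(field, templates):
    """Predict attributes from the field, then return the best-matching template."""
    predicted = FIELD_STYLE[field]
    return max(templates, key=lambda t: match_degree(predicted, t))
```

A learned predictor could replace the lookup table without changing `select_by_field`.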
- S104 Import the above-mentioned at least two text fragments into the target presentation file template to obtain the target presentation file.
- The file to be processed is divided according to its at least two keywords to obtain at least two text fragments; after the target presentation file template is obtained, the at least two text fragments are imported into the target presentation file template to obtain the target presentation file.
- The target presentation file template includes multiple presentation templates. The at least two text fragments can be imported into the target presentation file template in a preset order to obtain the target presentation file; that is, following the order of the text fragments, each of the at least two text fragments is imported into its corresponding presentation template in the target presentation file template, the presentation corresponding to each text fragment is obtained, and the target presentation file is generated from these presentations.
- The preset order may be obtained from the position information, in the file to be processed, of the keyword of each of the at least two text fragments, and the at least two text fragments may be imported into the target presentation file template in that order. For example, if there are three text fragments, namely text fragment 1, text fragment 2, and text fragment 3, they are sorted so that text fragment 1 precedes text fragment 2 and text fragment 2 precedes text fragment 3.
- The three text fragments can then be imported into the target presentation file template in that order: text fragment 1 is imported into the first presentation template in the target presentation file template, text fragment 2 into the second presentation template, and text fragment 3 into the third presentation template.
- FIG. 2 is a schematic diagram of a method for importing at least two text fragments into a target presentation file template provided by an embodiment of the present application.
- The method provided by an embodiment of the present application for importing at least two text fragments into a target presentation file template includes steps S21 to S23.
- S21 Acquire location information in the file to be processed of the keyword of each of the at least two text fragments.
- The location information, in the file to be processed, of the keyword corresponding to each of the at least two text fragments can be obtained and recorded for each text fragment.
- S22 Sort at least two text fragments according to the position information of the keywords in the file to be processed.
- Each presentation in the target presentation file template has a fixed sequence, and the at least two sorted text fragments are sequentially imported into the target presentation file template to obtain the target presentation file.
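The position-based ordering and sequential import can be sketched as follows. The data shapes are assumptions for illustration: each fragment is a dictionary with `keywords` and `text`, each presentation template is a dictionary with a `layout` field, and a keyword's position is taken as the index of its first occurrence in the source text.

```python
def import_fragments(source_text, fragments, slide_templates):
    """Sort fragments by where their keywords first occur in the source text,
    then fill the template's presentations in their fixed order."""
    def first_position(fragment):
        positions = [source_text.find(k) for k in fragment["keywords"]]
        positions = [p for p in positions if p >= 0]
        # Fragments whose keywords are not found sort to the end.
        return min(positions) if positions else len(source_text)

    ordered = sorted(fragments, key=first_position)
    if len(ordered) > len(slide_templates):
        raise ValueError("template has fewer presentations than text fragments")
    # The i-th sorted fragment goes into the i-th presentation template.
    return [
        {"layout": slide["layout"], "body": fragment["text"]}
        for slide, fragment in zip(slide_templates, ordered)
    ]
```

Because `sorted` is stable, fragments with equal positions keep their original relative order.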
- Optionally, each text fragment is processed according to its preprocessing mode to obtain at least two processed text fragments, and the processed text fragments are imported into the target presentation file template to obtain the target presentation file.
- The text feature information corresponding to each of the at least two text fragments can be acquired, and the preprocessing mode of each text fragment can be determined according to its text feature information.
- The text feature information of a text fragment includes its text length information, its hierarchical title information, information on whether its text contains errors, and so on.
- The preprocessing mode includes at least one of simplification, splitting, correction, and typesetting. Each text fragment is processed according to its preprocessing mode to obtain at least two processed text fragments.
- Pointer-Generator Network pointer generation network
- The text information in each text fragment is also corrected: wrong characters and wrong punctuation in each text fragment are fixed. The processed at least two text fragments are then imported into the target presentation file template to obtain the target presentation file.
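How a preprocessing mode might be chosen from the text feature information can be sketched as below. The thresholds `max_chars` and `max_subtitles`, the feature dictionary keys, and the rule that typesetting is always applied are assumptions made for illustration; the text only lists the features and the available modes, not the decision rule.

```python
def choose_preprocessing(features, max_chars=300, max_subtitles=4):
    """Map text feature information to a list of preprocessing modes.

    features: {"length": int, "subtitle_count": int, "has_errors": bool}
    """
    modes = []
    if features["length"] > max_chars:
        modes.append("simplify")   # long text: shorten it
    if features["subtitle_count"] > max_subtitles:
        modes.append("split")      # too many subtitles for one presentation
    if features["has_errors"]:
        modes.append("correct")    # fix wrong characters and punctuation
    modes.append("typeset")        # assumed to apply to every fragment
    return modes
```

Each mode name would dispatch to its own processing routine, for example a summarization model for `simplify`.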
- A suitable presentation can be selected in the target presentation file template according to the typesetting information and text content of each text fragment. For example, if a text fragment contains four subtitles, a presentation containing four subtitles is selected in the target presentation file template.
- In addition, the opening page is set according to the theme of the file to be processed, the directory page is set according to the content of each text fragment in the file to be processed, and the end page is set, completing the target presentation file.
- The target presentation file is presented to the user, and the user's instructions to adjust the color, font, and content of each presentation in the target presentation file can be accepted.
- Each presentation in the target presentation file can also accept personalized adjustments to icon shapes, wireframe styles, and so on, finally yielding the final version of the target presentation file.
- The electronic device in this application can be any node device in a blockchain.
- A blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer (P2P) transmission, consensus mechanisms, and encryption algorithms.
- It is essentially a decentralized database; a blockchain can be composed of multiple serial transaction records (also called blocks) that are linked and protected by cryptography.
- The distributed ledger linked by the blockchain allows multiple parties to effectively record transactions, and the transactions can be permanently checked (they cannot be tampered with).
- The consensus mechanism refers to the mathematical algorithm that establishes trust between different nodes and governs the acquisition of rights and interests in the blockchain network; that is to say, the consensus mechanism is a mathematical algorithm recognized by all network nodes of the blockchain.
- This application can use the consensus mechanism of the blockchain when generating the target presentation file from the file to be processed, which can improve the accuracy of the generated target presentation file.
- If each node device in the blockchain performs consensus verification on the execution results of steps S101 to S104 and the execution result of every step passes the consensus verification, it can be determined that the accuracy of the generated target presentation file is relatively high; if the execution result of any step does not pass the consensus verification, it can be determined that the accuracy is relatively low, and the node device may perform steps S101 to S104 again to obtain the target presentation file again.
- Alternatively, each node device in the blockchain can perform consensus verification on the target presentation file only (that is, only on the execution result of step S104).
- If the target presentation file does not pass the consensus verification, the node device may perform the above steps S101 to S104 again to obtain the target presentation file again.
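The verify-then-retry flow can be illustrated with a deliberately simplified stand-in. This is not a real blockchain consensus protocol: it assumes each node's result is a string, compares SHA-256 digests of those results, and treats a two-thirds majority as "passing" verification. The quorum value, the function names, and `run_pipeline` (a callable standing in for steps S101 to S104 on one node) are all hypothetical.

```python
import hashlib
from collections import Counter

def digest(result):
    """Fingerprint a node's result so results can be compared cheaply."""
    return hashlib.sha256(result.encode("utf-8")).hexdigest()

def majority(node_results):
    """Return the most common result and the share of nodes that produced it."""
    digests = [digest(r) for r in node_results]
    top, votes = Counter(digests).most_common(1)[0]
    return node_results[digests.index(top)], votes / len(node_results)

def generate_with_consensus(run_pipeline, node_count, quorum=2 / 3, max_attempts=3):
    """Re-run the pipeline on all nodes until enough nodes agree on the result."""
    for _ in range(max_attempts):
        results = [run_pipeline() for _ in range(node_count)]
        result, share = majority(results)
        if share >= quorum:
            return result   # verification passed
    return None             # no agreement within the attempt budget
```

Verifying only the final file, as in the second variant above, corresponds to `run_pipeline` returning just the output of step S104.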
- Dividing the file to be processed according to the at least two keywords to obtain at least two text fragments facilitates generating a presentation corresponding to each text fragment.
- The content of each text fragment is simplified, split, corrected, or typeset, which improves the accuracy of the generated target presentation file.
- The target presentation file includes the presentation corresponding to each text fragment.
- The entire process of generating the target presentation file does not require human involvement, and the generated results are directly output, which can improve the efficiency and flexibility of presentation generation and ensure the accuracy and relevance of the presentation.
- FIG. 3 is a schematic flowchart of another method for generating a presentation provided by an embodiment of the present application, which is executed by the electronic device in the embodiment of the present application.
- this other method for generating a presentation includes the following steps S201 to S206.
- S201 Acquire at least two keywords in a file to be processed and characteristic attribute information of the file to be processed.
- S202 Divide the file to be processed according to at least two keywords to obtain at least two text fragments.
- S203 Identify a target presentation file template matching the characteristic attribute information of the file to be processed from the presentation file template library.
- for the content of steps S201 to S203 of this other method for generating a presentation, refer to the content shown in FIG. 1, which will not be repeated in this embodiment of the present application.
- S205 Sort at least two text fragments according to the affiliation between the keywords of every two text fragments.
- the affiliation relationship between keywords corresponding to each of the at least two text fragments can be obtained according to the BERT model, and the affiliation relationship may refer to the containment relationship and the sequence relationship between the keywords.
- the sequence between the at least two text segments can be determined according to the subordination relationship between the keywords of every two text segments, and the at least two text segments can be sorted according to the sequence.
- for example, the document to be processed is about tourism promotion of a certain place. If the keyword corresponding to text fragment 1 is the historical culture of the place, and the keyword corresponding to text fragment 2 is the story of the place during the Republic of China, then the keyword of text fragment 2 covers the culture of one particular period and is contained in the keyword of text fragment 1; therefore, the keyword corresponding to text fragment 1 is ordered before the keyword corresponding to text fragment 2.
- according to the subordination relationship between the keywords of every two text fragments, the at least two text fragments are sorted, and the sorted text fragments are sequentially imported into the target presentation file template to obtain the target presentation file, which can improve the accuracy of the presentation.
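The ordering step above can be sketched as a topological sort. This is only an illustration under stated assumptions: the containment relation between keywords is taken as already known (in the source it would come from the BERT model, which is not reproduced here), and the names `order_fragments` and `parent_of` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def order_fragments(fragments, parent_of):
    """Order text fragments so that a containing keyword precedes the keywords it contains.

    fragments: dict mapping keyword -> fragment text.
    parent_of: dict mapping a contained keyword -> the keyword that contains it
               (hypothetical stand-in for the relation a model would produce).
    """
    sorter = TopologicalSorter()
    for keyword in fragments:
        sorter.add(keyword)  # register every keyword, even unrelated ones
    for child, parent in parent_of.items():
        sorter.add(child, parent)  # the parent must be emitted before the child
    return [fragments[keyword] for keyword in sorter.static_order()]


fragments = {
    "Republic-era stories": "Stories of the place during the Republic of China.",
    "historical culture": "An overview of the historical culture of the place.",
}
parent_of = {"Republic-era stories": "historical culture"}
ordered = order_fragments(fragments, parent_of)
```

Here the fragment for "historical culture" is placed first because its keyword contains the other one. Every keyword named in `parent_of` is assumed to also be a key of `fragments`.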
- the file to be processed is divided according to the above at least two keywords to obtain at least two text fragments, which is beneficial to generating a presentation corresponding to each file fragment.
- the content in each text segment will be simplified, split, corrected, or typeset to simplify and correct the content of each text segment, and improve the accuracy of generating the target presentation file.
- the target presentation file includes the presentation corresponding to each text fragment.
- the at least two text fragments will be sorted according to the affiliation between the keywords of every two text fragments, and the sorted text fragments are then imported into the target presentation file template, so that any two presentations in the target presentation file have a reasonable sequence, which improves the accuracy of the target presentation file generation.
- the generated result is directly output, which can improve the efficiency and flexibility of presentation generation, and ensure the accuracy and relevance of the presentation.
- the data processing device includes the following: an obtaining module 11, configured to obtain at least two keywords in the file to be processed and characteristic attribute information of the file to be processed, the characteristic attribute information including at least one of the field to which the file to be processed belongs, the number of keywords in the file to be processed, and the subject of the file to be processed;
- a dividing module, configured to divide the file to be processed according to the at least two keywords to obtain at least two text fragments, where one text fragment corresponds to at least one keyword;
- the identification module 13 is used to identify, from the presentation file template library, the target presentation file template that matches the characteristic attribute information of the file to be processed; wherein the identification module 13 includes: a first determining unit, a first acquiring unit, and a second determining unit.
- the first determining unit is configured to determine the number of text fragments in the at least two text fragments according to the number of keywords in the file to be processed; the first acquiring unit is configured to acquire the number of presentations included in each presentation file template in the presentation file template library; the second determining unit is configured to determine, as the target presentation file template, the presentation file template in the presentation file template library whose number of presentations is the same as the number of text fragments.
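A minimal sketch of this count-based selection, assuming each template simply records how many presentations (slides) it contains. The `Template` class and `match_by_slide_count` function are hypothetical names introduced for illustration, not part of the source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Template:
    name: str
    slide_count: int  # number of presentations (slides) the template contains


def match_by_slide_count(templates: List[Template], fragment_count: int) -> Optional[Template]:
    """Return the first template whose slide count equals the number of text fragments."""
    for template in templates:
        if template.slide_count == fragment_count:
            return template
    return None  # no template with a matching count; the caller must fall back


library = [Template("brief", 3), Template("report", 5)]
chosen = match_by_slide_count(library, 5)
```

Returning `None` when nothing matches is a design choice of this sketch; the source does not say what happens in that case.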
- the identification module 13 further includes: a second acquisition unit, a third determination unit, and a selection unit.
- the second obtaining unit is used to obtain the theme of each presentation file template in the presentation file template library; the third determining unit is used to separately determine the degree of matching between the theme of each presentation file template in the presentation file template library and the theme of the file to be processed; the selection unit is used to select the presentation file template with the largest matching degree from the presentation file template library as the target presentation file template.
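The source does not specify how the matching degree between themes is computed. The sketch below swaps in a simple Jaccard overlap of lowercase words purely as an assumed stand-in; `jaccard` and `select_template_by_theme` are hypothetical names.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity, used here as an assumed matching degree."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def select_template_by_theme(template_themes: dict, file_theme: str) -> str:
    """Return the name of the template whose theme best matches the file's theme."""
    return max(template_themes, key=lambda name: jaccard(template_themes[name], file_theme))


template_themes = {
    "travel": "tourism travel culture",
    "finance": "finance annual report",
}
best = select_template_by_theme(template_themes, "local tourism culture promotion")
```

Any other similarity measure, for example one based on sentence embeddings, could replace `jaccard` without changing the selection logic.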
- the identification module 13 further includes: a prediction unit, a third acquisition unit, and a fourth determination unit.
- the prediction unit is configured to predict the attribute information of the presentation file corresponding to the file to be processed according to the field to which the file to be processed belongs, and the attribute information of the presentation file corresponding to the file to be processed includes the presentation corresponding to the file to be processed Typesetting information and color information of the file;
- the third obtaining unit is used to obtain the attribute information of each presentation file template in the presentation file template library, and the attribute information of each presentation file template includes the typesetting information and color information of that presentation file template.
- the fourth determining unit is used to determine, as the target presentation file template, the presentation file template in the presentation file template library whose attribute information has the greatest degree of match with the attribute information of the presentation file corresponding to the file to be processed.
- the import module 14 is configured to import the at least two text fragments into the target presentation file template to obtain the target presentation file.
- the import module 14 includes: a fourth acquisition unit, a first sorting unit, and a first import unit.
- the fourth obtaining unit is configured to obtain the position information, in the file to be processed, of the keyword of each text fragment in the at least two text fragments; the first sorting unit is configured to sort the at least two text fragments according to the position information; the first import unit is configured to import the sorted at least two text fragments into the target presentation file template in order to obtain the target presentation file.
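Position-based sorting can be sketched as follows, assuming the position of a keyword is the character offset of its first occurrence in the file to be processed. The function name `sort_by_keyword_position` and the `(keyword, text)` tuple layout are illustrative assumptions.

```python
def sort_by_keyword_position(document: str, fragments: list) -> list:
    """Sort (keyword, text) pairs by where each keyword first appears in the document."""

    def position(item):
        index = document.find(item[0])
        # Keywords that do not occur in the document are placed last (an assumption).
        return index if index >= 0 else len(document)

    return sorted(fragments, key=position)


document = "History of the town. Food to try. Getting there."
fragments = [("Getting", "c"), ("History", "a"), ("Food", "b")]
ordered = sort_by_keyword_position(document, fragments)
```

Because `sorted` is stable, fragments whose keywords share the same position keep their original relative order.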
- the import module 14 further includes: a fifth acquisition unit, a second sorting unit, and a second import unit.
- the fifth acquiring unit is configured to acquire the affiliation relationship between the keywords of every two text fragments in the at least two text fragments; the second sorting unit is configured to sort the at least two text fragments according to the affiliation relationship; the second import unit is used to import the sorted at least two text fragments into the target presentation file template in sequence to obtain the target presentation file.
- the import module 14 further includes: a sixth acquisition unit, a fifth determination unit, a processing unit, and a third import unit.
- the sixth acquiring unit is configured to acquire the text feature information corresponding to each of the at least two text fragments; the fifth determining unit is configured to determine the preprocessing method of each text fragment according to the text feature information corresponding to that text fragment.
- the preprocessing method for a text fragment includes at least one of simplification processing, split processing, correction processing, and typesetting processing; the processing unit is configured to process each text fragment according to its preprocessing method to obtain at least two processed text fragments; the third import unit is configured to import the processed at least two text fragments into the target presentation file template to obtain the target presentation file.
- the file to be processed is divided according to the above at least two keywords to obtain at least two text fragments, which is beneficial to generating a presentation corresponding to each file fragment.
- the content in each text segment will be simplified, split, corrected, or typeset to simplify and correct the content of each text segment, and improve the accuracy of generating the target presentation file.
- the target presentation file includes the presentation corresponding to each text fragment.
- the at least two text fragments will be sorted according to the affiliation between the keywords of every two text fragments, and the sorted text fragments are then imported into the target presentation file template, so that any two presentations in the target presentation file have a reasonable sequence, which improves the accuracy of the target presentation file generation.
- the generated result is directly output, which can improve the efficiency and flexibility of presentation generation, and ensure the accuracy and relevance of the presentation.
- FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
- the electronic device in this embodiment may include: one or more processors 21, one or more input devices 22, one or more output devices 23, and a memory 24.
- the aforementioned processor 21, input device 22, output device 23, and memory 24 are connected by a bus 25.
- the processor 21 may be a central processing unit (CPU); it may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
- the input device 22 may include a touch panel, a fingerprint sensor (used to collect user fingerprint information and fingerprint orientation information), a microphone, etc.
- the output device 23 may include a display (LCD, etc.), a speaker, etc., and the output device 23 may output the data sheet after calibration processing.
- the memory 24 may include a read-only memory and a random access memory, and provides instructions and data to the processor 21. A part of the memory 24 may also include a non-volatile random access memory.
- the memory 24 is used to store a computer program.
- the computer program includes program instructions.
- the processor 21 is used to execute the program instructions stored in the memory 24 to carry out a method for generating a presentation, that is, to perform the following operations: acquiring at least two keywords in a file to be processed and characteristic attribute information of the file to be processed, where the characteristic attribute information includes at least one of the field to which the file to be processed belongs, the number of keywords in the file to be processed, and the subject of the file to be processed; dividing the file to be processed according to the at least two keywords to obtain at least two text fragments, where one text fragment corresponds to at least one keyword; identifying, from the presentation file template library, the target presentation file template matching the characteristic attribute information of the file to be processed; and importing the at least two text fragments into the target presentation file template to obtain the target presentation file.
- the processor 21 is configured to execute program instructions stored in the memory 24 to perform the following operations: determine the number of text fragments in the at least two text fragments according to the number of keywords in the file to be processed; acquire the number of presentations included in each presentation file template in the presentation file template library; determine, as the target presentation file template, the presentation file template in the presentation file template library whose number of presentations is the same as the number of text fragments.
- the processor 21 is configured to execute program instructions stored in the memory 24 to perform the following operations: obtain the theme of each presentation file template in the presentation file template library; separately determine the matching degree between the theme of each presentation file template in the presentation file template library and the theme of the file to be processed; select the presentation file template with the largest matching degree from the presentation file template library as the target presentation file template.
- the processor 21 is configured to execute program instructions stored in the memory 24 to perform the following operations: predict the attribute information of the presentation file corresponding to the file to be processed according to the field to which the file to be processed belongs, where the attribute information of the presentation file corresponding to the file to be processed includes the typesetting information and color information of that presentation file; acquire the attribute information of each presentation file template in the presentation file template library, where the attribute information of each presentation file template includes the typesetting information and color information of that template; determine, as the target presentation file template, the presentation file template in the presentation file template library whose attribute information has the greatest degree of match with the attribute information of the presentation file corresponding to the file to be processed.
- the processor 21 is configured to execute program instructions stored in the memory 24 to perform the following operations: obtain the position information, in the file to be processed, of the keyword of each text fragment in the at least two text fragments; sort the at least two text fragments according to the position information; import the sorted at least two text fragments into the target presentation file template in sequence to obtain the target presentation file.
- the processor 21 is configured to execute program instructions stored in the memory 24 to perform the following operations: obtain the affiliation relationship between the keywords of every two text fragments in the at least two text fragments; sort the at least two text fragments according to the affiliation relationship; import the sorted at least two text fragments into the target presentation file template in sequence to obtain the target presentation file.
- the processor 21 is configured to execute program instructions stored in the memory 24 to perform the following operations: obtain the text feature information corresponding to each of the at least two text fragments; determine the preprocessing mode of each text fragment according to the text feature information corresponding to that text fragment, where the preprocessing mode includes at least one of simplification processing, split processing, correction processing, and typesetting processing; process each text fragment according to its preprocessing mode to obtain at least two processed text fragments; import the processed at least two text fragments into the target presentation file template to obtain the target presentation file.
- the processor 21, the input device 22, and the output device 23 described in the embodiments of this application can execute the implementations described in the first embodiment and the second embodiment of the presentation generation method provided in the embodiments of this application, and can also execute the implementation of the electronic device described in the embodiments of the present application, which will not be repeated here.
- the file to be processed is divided according to the above at least two keywords to obtain at least two text fragments, which is beneficial to generating a presentation corresponding to each file fragment.
- the content in each text segment will be simplified, split, corrected, or typeset to simplify and correct the content of each text segment, and improve the accuracy of generating the target presentation file.
- the target presentation file includes the presentation corresponding to each text fragment.
- the at least two text fragments will be sorted according to the affiliation between the keywords of every two text fragments, and the sorted text fragments are then imported into the target presentation file template, so that any two presentations in the target presentation file have a reasonable sequence, which improves the accuracy of the target presentation file generation.
- the generated result is directly output, which can improve the efficiency and flexibility of presentation generation, and ensure the accuracy and relevance of the presentation.
- An embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and the computer program includes program instructions.
- when the program instructions are executed by a processor, the presentation generation method shown in the embodiments of FIG. 1 and FIG. 3 is implemented.
- the computer-readable storage medium may be non-volatile or volatile.
- the computer-readable storage medium may be an internal storage unit of the electronic device described in any of the foregoing embodiments, such as a hard disk or a memory of a control device.
- the computer-readable storage medium may also be an external storage device of the control device, such as a plug-in hard disk equipped on the control device, a smart memory card (Smart Media Card, SMC), or a secure digital (Secure Digital, SD) card, flash card (Flash Card), etc.
- the computer-readable storage medium may also include both an internal storage unit of the control device and an external storage device.
- the computer-readable storage medium is used to store the computer program and other programs and data required by the control device.
- the computer-readable storage medium can also be used to temporarily store data that has been output or will be output.
- the program instructions stored in the above-mentioned computer-readable storage medium may be deployed for execution on one computer device, on multiple computer devices located in one location, or on multiple computer devices that are distributed in multiple locations and interconnected by a communication network.
- multiple computer devices distributed in multiple locations and interconnected by a communication network can form a blockchain network.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Processing Or Creating Images (AREA)
Abstract
A presentation file generation method and apparatus, a device, and a storage medium. The method comprises: obtaining at least two keywords in a file to be processed, as well as feature attribute information of the file (S101); dividing the file according to the at least two keywords, so as to obtain at least two text fragments (S102); identifying in a presentation file template library a target presentation file template that matches the feature attribute information of said file (S103); and importing the at least two text fragments into the target presentation file template so as to obtain a target presentation file (S104). In the method, a target presentation file may be generated according to the text information inputted by the user, increasing presentation file generation efficiency. The present method relates to image recognition technology in artificial intelligence. The method is suitable for fields such as smart government services and smart education, and is thus conducive to promoting the construction of smart cities.
Description
This application claims priority to the Chinese patent application filed with the China Patent Office on July 16, 2020, with application number 202010686330.6 and the application name "A drug discovery method, equipment, server and readable storage medium", the entire contents of which are incorporated in this application by reference.
This application relates to the field of computer technology, and in particular to a presentation generation method, apparatus, device, and storage medium.
With the widespread adoption of office software, presentations are used in all aspects of social life, for example in work reports, corporate publicity, product promotion, wedding celebrations, project bidding, management consulting, and education and training. At present, the inventor has found that presentations are mainly produced by manually filling pictures, text, and other elements into preset templates. This approach incurs high labor costs, and in some cases the template and the content do not fit together well, so the user has to adjust the template repeatedly to achieve the desired effect, which makes presentation generation inefficient.
Existing ways of generating presentations incur high labor costs, and in some cases the template and the content do not fit together well, so the user has to adjust the template repeatedly to achieve the desired effect, which makes presentation generation inefficient. To address this problem, the embodiments of the present application provide a presentation generation method, apparatus, device, and storage medium, which can improve the efficiency of generating presentations.
In a first aspect, an embodiment of the present application provides a presentation generation method. The method includes: obtaining at least two keywords in a file to be processed and characteristic attribute information of the file to be processed, where the characteristic attribute information includes at least one of the field to which the file to be processed belongs, the number of keywords in the file to be processed, and the subject of the file to be processed; dividing the file to be processed according to the at least two keywords to obtain at least two text fragments, where one text fragment corresponds to at least one keyword; identifying, from a presentation file template library, a target presentation file template matching the characteristic attribute information of the file to be processed; and importing the at least two text fragments into the target presentation file template to obtain a target presentation file.
In a second aspect, an embodiment of the present application provides a presentation generation apparatus. The apparatus includes: an acquisition module, configured to obtain at least two keywords in a file to be processed and characteristic attribute information of the file to be processed, where the characteristic attribute information includes at least one of the field to which the file to be processed belongs, the number of keywords in the file to be processed, and the subject of the file to be processed; a dividing module, configured to divide the file to be processed according to the at least two keywords to obtain at least two text fragments, where one text fragment corresponds to at least one keyword; an identification module, configured to identify, from a presentation file template library, a target presentation file template matching the characteristic attribute information of the file to be processed; and an import module, configured to import the at least two text fragments into the target presentation file template to obtain a target presentation file.
In a third aspect, an embodiment of the present application provides an electronic device. The device includes: a processor, adapted to implement one or more instructions; and a computer storage medium storing one or more instructions, the one or more instructions being adapted to be loaded by the processor to execute the following steps: obtaining at least two keywords in a file to be processed and characteristic attribute information of the file to be processed, where the characteristic attribute information includes at least one of the field to which the file to be processed belongs, the number of keywords in the file to be processed, and the subject of the file to be processed; dividing the file to be processed according to the at least two keywords to obtain at least two text fragments, where one text fragment corresponds to at least one keyword; identifying, from a presentation file template library, a target presentation file template matching the characteristic attribute information of the file to be processed; and importing the at least two text fragments into the target presentation file template to obtain a target presentation file.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium. The computer storage medium stores one or more instructions, the one or more instructions being adapted to be loaded by a processor to execute the following steps: obtaining at least two keywords in a file to be processed and characteristic attribute information of the file to be processed, where the characteristic attribute information includes at least one of the field to which the file to be processed belongs, the number of keywords in the file to be processed, and the subject of the file to be processed; dividing the file to be processed according to the at least two keywords to obtain at least two text fragments, where one text fragment corresponds to at least one keyword; identifying, from a presentation file template library, a target presentation file template matching the characteristic attribute information of the file to be processed; and importing the at least two text fragments into the target presentation file template to obtain a target presentation file.
In the embodiments of the present application, at least two keywords in the file to be processed and the characteristic attribute information of the file to be processed are obtained; the file to be processed is divided according to the at least two keywords to obtain at least two text fragments; a target presentation file template matching the characteristic attribute information of the file to be processed is identified from the presentation file template library; and the at least two text fragments are imported into the target presentation file template to obtain the target presentation file. In this solution, dividing the file to be processed according to the at least two keywords yields at least two text fragments, which is beneficial to generating a presentation corresponding to each file fragment. The target presentation file template matching the characteristic attribute information of the file to be processed is then identified, and the at least two text fragments are imported into it to obtain the target presentation file; that is, the target presentation file includes the presentation corresponding to each text fragment. The whole process of generating the target presentation file requires no human involvement, which can improve the efficiency and flexibility of presentation generation and ensure the accuracy and relevance of the generated presentation.
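The dividing and importing steps summarized above can be sketched in a few lines. This is a simplified illustration under assumptions, not the method as claimed: a fragment is taken to run from the first occurrence of one keyword to the first occurrence of the next, and a template is modeled as a plain list of slide layout names. The names `split_by_keywords` and `import_fragments` are hypothetical.

```python
def split_by_keywords(document: str, keywords: list) -> list:
    """Divide the document into (keyword, text) fragments at each keyword's first occurrence."""
    positions = sorted((document.find(k), k) for k in keywords if k in document)
    fragments = []
    for i, (start, keyword) in enumerate(positions):
        # A fragment ends where the next keyword begins, or at the end of the document.
        end = positions[i + 1][0] if i + 1 < len(positions) else len(document)
        fragments.append((keyword, document[start:end].strip()))
    return fragments


def import_fragments(template_slides: list, fragments: list) -> list:
    """Fill one slide per fragment, pairing slide layouts and fragments in order."""
    return [
        {"layout": layout, "title": keyword, "body": text}
        for layout, (keyword, text) in zip(template_slides, fragments)
    ]


document = "History: old town. Food: noodles. Transport: rail."
fragments = split_by_keywords(document, ["Food", "History", "Transport"])
slides = import_fragments(["title_and_body"] * 3, fragments)
```

In this sketch any text that precedes the first keyword is dropped, and `zip` silently ignores surplus fragments or slides, so a real implementation would need to decide how to handle both cases.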
FIG. 1 is a schematic flowchart of a method for generating a presentation provided by an embodiment of the present application.
FIG. 2 is a schematic diagram of a method for importing at least two text fragments into a target presentation file template according to an embodiment of the present application.
FIG. 3 is a schematic flowchart of another method for generating a presentation provided by an embodiment of the present application.
FIG. 4 is a schematic structural diagram of a presentation generating device provided by an embodiment of the present application.
FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
人工智能技术是一门综合学科,涉及领域广泛,既有硬件层面的技术也有软件层面的技术。人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等几大方向。Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Computer vision (CV) is the science of how to make machines "see". More specifically, it refers to using cameras and computers in place of human eyes to identify, track, and measure targets, and to further process the resulting images so that they are better suited to human observation or to transmission to instruments for inspection. As a scientific discipline, computer vision studies the related theories and technologies and attempts to build artificial intelligence systems that can obtain information from images or multi-dimensional data. Computer vision technologies usually include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition technologies such as face recognition and fingerprint recognition.
This application relates to image recognition technology within artificial intelligence. Image recognition is used to convert images into presentations automatically, without manual participation, which can improve the efficiency and accuracy of presentation generation. The application is applicable to fields such as smart government affairs and smart education, and is conducive to the construction of smart cities.
Refer to FIG. 1, which is a schematic flowchart of a presentation generation method provided by an embodiment of this application. The method may be executed by an electronic device and includes the following steps S101 to S104.
S101: Acquire at least two keywords in a file to be processed, and the characteristic attribute information of the file to be processed.
In the embodiments of this application, the file to be processed is a text file that the user provides for producing the target presentation file; it contains the content information for each presentation page in the target presentation file. The keywords in the file may be extracted with an LDA model to obtain at least two keywords, and the title and content of the file may be analyzed to obtain its characteristic attribute information. The LDA (Latent Dirichlet Allocation) model is a document topic generation model that infers the topic distribution of a document, and text can be classified according to that distribution. The characteristic attribute information includes at least one of the field to which the file belongs, the number of keywords in the file, and the theme of the file. Fields include, for example, technology, education, political parties, finance, and tourism; themes include, for example, work summary, wedding and romance, graduation defense, and induction training.
S102: Divide the file to be processed according to the at least two keywords to obtain at least two text fragments.
The paragraph in which a keyword appears in the file to be processed may be used as the text fragment corresponding to that keyword; alternatively, the file may be divided according to the positions of the keywords in the file to obtain at least two text fragments. The text fragments may have the same or different lengths. Each text fragment corresponds to at least one keyword; that is, the keyword corresponding to a text fragment may indicate the topic of that fragment, and different text fragments may correspond to different keywords.
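The position-based division described above can be sketched as follows. This is a minimal illustration, not the method claimed in the source: the function name `split_by_keywords`, the blank-line paragraph separator, and the rule that a fragment starts at the first paragraph containing a not-yet-used keyword are all assumptions made for the example.

```python
def split_by_keywords(document: str, keywords: list[str]) -> list[str]:
    """Split a document into fragments, starting a new fragment at each
    paragraph that contains a keyword not yet used (illustrative rule)."""
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    fragments: list[list[str]] = []
    remaining = list(keywords)
    for para in paragraphs:
        hit = next((k for k in remaining if k in para), None)
        if hit is not None or not fragments:
            # A new keyword (or the very first paragraph) opens a fragment.
            fragments.append([para])
            if hit is not None:
                remaining.remove(hit)
        else:
            # Paragraphs without a new keyword stay with the current fragment.
            fragments[-1].append(para)
    return ["\n\n".join(f) for f in fragments]
```

With this rule the fragments may have different lengths, and each fragment is associated with the keyword that opened it.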
Further, step S102 includes: obtaining the similarity between every two adjacent keywords among the at least two keywords, and assigning the paragraphs of the file to be processed that correspond to keywords whose similarity is greater than a similarity threshold to the same text fragment, thereby obtaining at least two text fragments. Two adjacent keywords may be located in adjacent paragraphs or in the same paragraph. Dividing the file according to keyword similarity can improve the accuracy of the division.
Obtaining the similarity between every two adjacent keywords includes: using a distance algorithm to obtain the distance between every two adjacent keywords, and determining the similarity between them from that distance. The greater the distance between two adjacent keywords, the lower their similarity; the smaller the distance, the higher their similarity. The distance algorithm may include at least one of Minkowski distance, Manhattan distance, and Chebyshev distance.
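As a sketch of the distance-based similarity, the example below assumes each keyword has already been mapped to a numeric vector (the source does not say how), converts a distance d into a similarity with 1 / (1 + d) (an assumed monotone-decreasing mapping), and groups adjacent keywords whose similarity exceeds the threshold. All function names are hypothetical.

```python
def minkowski(u, v, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def manhattan(u, v):
    """Manhattan distance (Minkowski with p = 1)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def chebyshev(u, v):
    """Chebyshev distance (largest coordinate difference)."""
    return max(abs(a - b) for a, b in zip(u, v))

def similarity(distance):
    # Assumed mapping: larger distance gives smaller similarity, in (0, 1].
    return 1.0 / (1.0 + distance)

def group_adjacent(vectors, threshold, dist=manhattan):
    """Return groups of keyword indices; adjacent keywords whose similarity
    is greater than the threshold fall into the same group."""
    if not vectors:
        return []
    groups = [[0]]
    for i in range(1, len(vectors)):
        if similarity(dist(vectors[i - 1], vectors[i])) > threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups
```

Each resulting group would correspond to one text fragment made of the paragraphs that contain the grouped keywords.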
Optionally, the file to be processed may be divided according to the at least two keywords to obtain at least two candidate text fragments, and the at least two text fragments are then generated from the candidate text fragments. After the candidate text fragments are obtained, each one is checked to determine whether it is a new topic sentence. If a candidate text fragment is a new topic sentence, it is determined to be a text fragment; if it is not, its keyword is re-acquired and the check is repeated with the new keyword. The check may be based on the textual meaning of each candidate text fragment and the word meaning of its corresponding keyword, for example by using a BERT model. BERT is a method of pre-training language representations: a general "language understanding" model is trained on a large text corpus (Wikipedia), and the model can then be used to classify candidate text fragments and to judge whether a candidate text fragment is a new topic sentence. The text vector of each candidate text fragment is obtained, and each keyword together with the text vector of its corresponding candidate text fragment is input into the BERT model, which outputs a 1/0 indication. The text vector of each candidate text fragment may be obtained during training of the BERT model and describes the global semantic information of the fragment. An output of 1 indicates that the candidate text fragment is a new topic sentence, and the candidate is determined to be a text fragment; an output of 0 indicates that it is not, in which case the keyword of the candidate text fragment is re-acquired and the check is repeated.
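The confirm-or-retry loop around the 1/0 classifier might look like the sketch below. The classifier and the keyword re-extraction step are passed in as plain callables (`is_new_topic`, `reextract`), since the source does not specify how the BERT model is invoked; the retry limit `max_rounds` and the `unresolved` list are also assumptions, as the source does not say what happens if a candidate is never confirmed.

```python
from typing import Callable

def confirm_fragments(
    candidates: list[tuple[str, str]],
    is_new_topic: Callable[[str, str], int],
    reextract: Callable[[str], str],
    max_rounds: int = 3,
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Return (confirmed, unresolved) lists of (keyword, text) pairs."""
    confirmed, unresolved = [], []
    for keyword, text in candidates:
        for _ in range(max_rounds):
            # Hypothetical classifier: 1 means "new topic sentence", 0 means not.
            if is_new_topic(keyword, text) == 1:
                confirmed.append((keyword, text))
                break
            # Not confirmed: re-acquire a keyword for this candidate and retry.
            keyword = reextract(text)
        else:
            # Assumption: give up after max_rounds and report the candidate.
            unresolved.append((keyword, text))
    return confirmed, unresolved
```

In practice `is_new_topic` would wrap a fine-tuned sequence-pair classifier; here any function returning 1 or 0 can stand in for it.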
S103: Identify, in the presentation file template library, a target presentation file template that matches the characteristic attribute information of the file to be processed.
The presentation file template library includes multiple presentation file templates, each of which includes multiple presentation pages. Different templates contain different numbers of presentation pages, and/or the color information and layout information of different templates may differ. Accordingly, different templates suit different fields, different themes, or the generation of presentation files from text files of different lengths.
After the characteristic attribute information of the file to be processed is obtained, the target presentation file template matching that information is identified in the presentation file template library, and the target presentation file corresponding to the file to be processed is generated from that template.
Optionally, the characteristic attribute information includes the number of keywords in the file to be processed. The number of text fragments among the at least two text fragments may be determined from the number of keywords; the number of presentation pages included in each presentation file template in the library is obtained; and the template whose number of presentation pages equals the number of text fragments is determined to be the target presentation file template.
The target presentation file template may be determined in the presentation file template library according to the number of keywords in the file to be processed. First, the number of text fragments is determined from the number of keywords: at least one keyword corresponds to one text fragment, that is, the sentences corresponding to at least one keyword are grouped into one text fragment. Next, the number of presentation pages included in each template in the library is obtained, and the template whose number of presentation pages equals the number of text fragments is determined to be the target presentation file template. If several templates have the same number of presentation pages as the number of text fragments, one of them may be chosen as the target presentation file template according to the text content of the file to be processed, by user designation, or at random.
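A minimal sketch of selection by page count follows. The `Template` record and the `preferred` argument (standing in for user designation) are hypothetical; when several templates match and no preference is given, the sketch simply returns the first match, which is only one of the tie-breaking options the text allows.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Template:
    name: str
    page_count: int  # number of presentation pages in the template

def select_by_page_count(
    templates: list[Template],
    fragment_count: int,
    preferred: Optional[str] = None,
) -> Optional[Template]:
    """Pick a template whose page count equals the number of text fragments."""
    matches = [t for t in templates if t.page_count == fragment_count]
    if not matches:
        return None  # the source does not describe a fallback for this case
    if preferred is not None:
        for t in matches:
            if t.name == preferred:
                return t  # user-designated template among the matches
    return matches[0]  # assumption: first match as a deterministic tie-break
```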
Optionally, the characteristic attribute information includes the theme of the file to be processed. The theme of each presentation file template in the library is obtained; the matching degree between the theme of each template and the theme of the file to be processed is determined; and the template with the highest matching degree is selected from the library as the target presentation file template.
After the file to be processed is obtained, its theme (for example work summary, wedding and romance, graduation defense, or induction training) may be obtained from its text content by analyzing the title and content of the file. The theme of each presentation file template in the library is then obtained, and the matching degree between each template's theme and the theme of the file to be processed is determined. The template whose theme has the highest matching degree with the theme of the file is used as the target presentation file template. Templates for multiple themes may be stored in the library in advance, with one template corresponding to one theme.
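The source does not define how the matching degree between themes is computed. The sketch below assumes, purely for illustration, that themes are short text labels and that the matching degree is the Jaccard overlap of their word sets; `matching_degree` and `select_by_theme` are hypothetical names.

```python
def matching_degree(theme_a: str, theme_b: str) -> float:
    """Assumed matching degree: Jaccard overlap of lower-cased word sets."""
    words_a = set(theme_a.lower().split())
    words_b = set(theme_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

def select_by_theme(template_themes: dict[str, str], file_theme: str) -> str:
    """Return the name of the template whose theme best matches the file."""
    return max(
        template_themes,
        key=lambda name: matching_degree(template_themes[name], file_theme),
    )
```

Any other similarity measure (for example one based on text embeddings) could replace `matching_degree` without changing the selection step.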
Optionally, the characteristic attribute information includes the field to which the file to be processed belongs. The attribute information of the presentation file corresponding to the file to be processed, which includes layout information and color information, may be predicted from that field. The attribute information of each presentation file template in the library, which likewise includes layout information and color information, is obtained. The template whose attribute information has the highest matching degree with the predicted attribute information is determined to be the target presentation file template.
The target presentation file template may be determined in the presentation file template library according to the field to which the file to be processed belongs. The attribute information of the corresponding presentation file, including its layout information and color information, is predicted from that field. The field may be, for example, technology, education, political parties, finance, or tourism. If the field is tourism, the predicted layout of the corresponding presentation file might be a split layout, in which half of the page shows a scenic view and the other half describes it, or a multi-image layout; the predicted colors would be relatively fresh, such as sky blue or green. The attribute information of each template in the library, including its layout information and color information, is then obtained and matched against the predicted attribute information, and the template with the highest matching degree is used as the target presentation file template.
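The field-based selection can be sketched in two steps: predict layout and color attributes from the field, then score each template against the prediction. The lookup table `PREDICTED`, the attribute names, and the scoring rule (one point for a layout match plus the fraction of predicted colors present) are assumptions; only the tourism entry reflects the example given in the text.

```python
# Assumed field-to-attribute table; only "tourism" follows the text's example.
PREDICTED = {
    "tourism": {"layout": "half-image-half-text", "colors": {"sky blue", "green"}},
}

def attribute_match(template: dict, predicted: dict) -> float:
    """Assumed score: 1 for a layout match plus the share of predicted colors."""
    layout_score = 1.0 if template["layout"] == predicted["layout"] else 0.0
    colors = predicted["colors"]
    color_score = len(template["colors"] & colors) / len(colors) if colors else 0.0
    return layout_score + color_score

def select_by_field(templates: dict[str, dict], field: str) -> str:
    """Return the name of the template best matching the predicted attributes."""
    predicted = PREDICTED[field]
    return max(templates, key=lambda name: attribute_match(templates[name], predicted))
```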
S104: Import the at least two text fragments into the target presentation file template to obtain the target presentation file.
After the file to be processed has been divided into at least two text fragments according to its keywords and the target presentation file template has been obtained, the text fragments are imported into the template to obtain the target presentation file. The target presentation file template includes templates for multiple presentation pages. The text fragments may be imported in a preset order: following the order of the text fragments, each fragment is imported into the corresponding presentation page template, yielding the presentation page for that fragment, and the target presentation file is generated from these pages. The preset order may be derived from the position in the file to be processed of the keyword of each text fragment. For example, given text fragment 1, text fragment 2, and text fragment 3, ordered so that fragment 1 precedes fragment 2 and fragment 2 precedes fragment 3, the fragments are imported in turn: fragment 1 into the first presentation page template, fragment 2 into the second, and fragment 3 into the third. Alternatively, a presentation page template may be chosen within the target presentation file template according to the layout information and text information of each text fragment; for example, if text fragment 4 contains four subheadings, a presentation page template with a four-subheading style may be chosen.
FIG. 2 is a schematic diagram of a method for importing at least two text fragments into a target presentation file template provided by an embodiment of this application. As shown in FIG. 2, the method includes steps S21 to S23.
S21: Acquire the position information, in the file to be processed, of the keyword of each of the at least two text fragments.
If the text content of the file to be processed was assembled by the user in the order in which the content is to be shown, the position in the file of the keyword corresponding to each of the at least two text fragments can be obtained and recorded.
S22: Sort the at least two text fragments according to the position information of the keywords in the file to be processed.
S23: Import the sorted text fragments into the target presentation file template in sequence to obtain the target presentation file.
The order of the at least two keywords is obtained from their position information in the file to be processed. That order determines the order of the at least two text fragments corresponding to the keywords, and the text fragments are sorted accordingly.
The presentation pages in the target presentation file template have a fixed order, so the sorted text fragments are imported into the template one after another to obtain the target presentation file.
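Steps S21 to S23 can be sketched as below. The representation of a fragment as a `(keyword, text)` pair, the use of the first occurrence of the keyword as its position, and the dictionary returned for each page are assumptions made for the example; a real implementation would write into an actual presentation file format.

```python
def order_fragments(document: str, fragments: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """S21 and S22: sort (keyword, text) fragments by keyword position."""
    def position(item: tuple[str, str]) -> int:
        index = document.find(item[0])  # first occurrence of the keyword
        # Assumption: keywords not found in the document sort last.
        return index if index >= 0 else len(document)
    return sorted(fragments, key=position)

def fill_template(page_names: list[str], ordered: list[tuple[str, str]]) -> list[dict]:
    """S23: import sorted fragments into the template pages in sequence."""
    # zip stops at the shorter list, so extra pages or fragments are ignored.
    return [
        {"page": page, "keyword": keyword, "body": text}
        for page, (keyword, text) in zip(page_names, ordered)
    ]
```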
Optionally, the text feature information corresponding to each of the at least two text fragments is acquired; the preprocessing mode for each text fragment is determined from its text feature information, where the preprocessing mode includes at least one of simplification, splitting, correction, and layout processing; each text fragment is processed according to its preprocessing mode to obtain at least two processed text fragments; and the processed text fragments are imported into the target presentation file template to obtain the target presentation file.
The text feature information of a text fragment includes, for example, its text length, its hierarchical heading information, and whether its text contains errors. The preprocessing mode for each fragment, which includes at least one of simplification, splitting, correction, and layout processing, is determined from this information, and each fragment is processed accordingly to obtain the processed text fragments. For example, based on text length, an overly long text fragment may be simplified with the TextRank (keyword extraction) text summarization method and some redundant wording deleted, so that the content of each fragment is not cumbersome and its readability improves. Overly long sentences within a fragment may be split so that each sentence reads more smoothly. In addition, based on the hierarchical heading information of the fragment, the overall content of the fragment, and the overall content under each heading, seq2seq and a Pointer-Generator Network may be used to generate a heading for each passage in the fragment. Seq2seq is a family of machine learning algorithms for natural language processing, mainly applied to machine translation, image captioning, conversational models, and text summarization; the Pointer-Generator Network is likewise applied to text summarization. The text of each fragment is also corrected, fixing wrong characters and wrong punctuation. The processed text fragments are then imported into the target presentation file template to obtain the target presentation file.
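The decision of which preprocessing modes apply to a fragment can be sketched as a simple rule set. The thresholds `max_chars` and `max_sentence_chars`, the sentence-splitting regular expression, and the repeated-punctuation check used as a stand-in for error detection are all assumptions; the summarization and heading-generation models named in the text are not implemented here.

```python
import re

def choose_preprocessing(
    text: str, max_chars: int = 200, max_sentence_chars: int = 80
) -> list[str]:
    """Return the preprocessing modes suggested by simple text features."""
    steps = []
    # Text length feature: overly long fragments are candidates for simplification.
    if len(text) > max_chars:
        steps.append("simplify")
    # Sentence length feature: overly long sentences are candidates for splitting.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    if any(len(s) > max_sentence_chars for s in sentences):
        steps.append("split")
    # Crude error feature (assumption): runs of punctuation such as ",,".
    # Note that this also flags legitimate ellipses.
    if re.search(r"[,.;:!?]{2,}", text):
        steps.append("correct")
    return steps
```

Each returned mode would then dispatch to the corresponding processing routine before the fragment is imported into the template.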
When the at least two text fragments are imported into the target presentation file template, a suitable presentation page may be selected in the template according to the layout information and text content of each text fragment. For example, if a text fragment contains four subheadings, a presentation page with four subheadings is selected. In addition to the presentation pages corresponding to the text fragments, an opening page is set according to the theme of the file to be processed, a table-of-contents page is set according to the content of each text fragment, and a closing page is set, completing the target presentation file. After the target presentation file is complete, it is presented to the user, who may issue instructions to adjust the color, font, and content of each presentation page; each page also accepts personalized adjustments such as the shape of icons and the style of outlines. The final version of the target presentation file is then obtained.
Optionally, the electronic device in this application may be any node device in a blockchain. A blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer (P2P) transmission, consensus mechanisms, and encryption algorithms; in essence it is a decentralized database. A blockchain consists of a series of transaction records (also called blocks) that are linked together and protected by cryptography. The distributed ledger formed by the chain lets multiple parties record transactions effectively, and the transactions can be verified permanently because they cannot be tampered with. The consensus mechanism is a mathematical algorithm, recognized by all network nodes of the blockchain, that establishes trust and allocates rights among different nodes. This application may use the consensus mechanism of the blockchain when generating the target presentation file from the file to be processed, which can improve the accuracy of the generated target presentation file.
For example, the node devices in the blockchain perform consensus verification on the execution results of steps S101 to S104. If the execution result of every step passes consensus verification, the accuracy of the generated target presentation file can be considered relatively high. If the execution result of any step fails consensus verification, the accuracy can be considered relatively low, and the node device may execute steps S101 to S104 again to regenerate the target presentation file. Alternatively, the node devices may perform consensus verification only on the target presentation file, that is, only on the execution result of step S104. If the verification passes, the accuracy of the target presentation file is considered relatively high; if it fails, the accuracy is considered relatively low, and the node device may execute steps S101 to S104 again to regenerate the target presentation file.
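The source does not specify which consensus rule is used. As a minimal sketch, the example below assumes each node reports its own result for the step, compares results by SHA-256 digest, and treats verification as passed when a strict majority of nodes agree; the function names and the `quorum` parameter are hypothetical.

```python
import hashlib
from collections import Counter

def digest(result: str) -> str:
    """Fingerprint a node's result so that results can be compared cheaply."""
    return hashlib.sha256(result.encode("utf-8")).hexdigest()

def consensus_passed(node_results: list[str], quorum: float = 0.5) -> bool:
    """Assumed rule: pass when more than `quorum` of nodes report the same result."""
    if not node_results:
        return False
    counts = Counter(digest(r) for r in node_results)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(node_results) > quorum
```

If `consensus_passed` returns `False`, the node would rerun steps S101 to S104 as the text describes.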
In this embodiment of the application, at least two keywords in the file to be processed and the characteristic attribute information of the file to be processed are acquired; the file to be processed is divided according to the at least two keywords to obtain at least two text fragments; a target presentation file template matching the characteristic attribute information of the file to be processed is identified from the presentation file template library; and the at least two text fragments are imported into the target presentation file template to obtain the target presentation file. Dividing the file to be processed according to the at least two keywords yields at least two text fragments, which facilitates generating a presentation for each text fragment. The content of each text fragment is simplified, split, corrected, or typeset, which simplifies and corrects that content and improves the accuracy of the generated target presentation file. Because the target presentation file template is matched to the characteristic attribute information of the file to be processed, the resulting target presentation file includes the presentation corresponding to each text fragment. The entire process of generating the target presentation file requires no human involvement and outputs the result directly, which improves the efficiency and flexibility of presentation generation and helps ensure the accuracy and relevance of the generated presentation.
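As a non-limiting illustration, the division of the file to be processed into text fragments may be sketched as follows. The rule used here, cutting the text at the first occurrence of each keyword, and the name `split_by_keywords` are assumptions of the sketch; this section does not state how the fragment boundaries are chosen.

```python
def split_by_keywords(text: str, keywords: list[str]) -> dict[str, str]:
    """Divide `text` into one fragment per keyword.

    Assumed rule: each fragment starts at the first occurrence of its
    keyword and runs up to the start of the next fragment. Text that
    precedes the first keyword is not assigned to any fragment, and
    keywords that do not occur in the text yield no fragment.
    """
    starts = sorted((text.find(k), k) for k in keywords if k in text)
    fragments: dict[str, str] = {}
    for i, (pos, keyword) in enumerate(starts):
        end = starts[i + 1][0] if i + 1 < len(starts) else len(text)
        fragments[keyword] = text[pos:end].strip()
    return fragments
```

Each returned fragment corresponds to exactly one keyword in this sketch, which is one way of satisfying the condition that a text fragment corresponds to at least one keyword.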
Please refer to FIG. 3, which is a schematic flowchart of another presentation generation method provided by an embodiment of this application. The method is executed by the electronic device of the embodiment and includes the following steps S201 to S206.
S201: Acquire at least two keywords in a file to be processed, and characteristic attribute information of the file to be processed.
S202: Divide the file to be processed according to the at least two keywords to obtain at least two text fragments.
S203: Identify, from a presentation file template library, a target presentation file template matching the characteristic attribute information of the file to be processed.
In this embodiment, for the content of steps S201 to S203, reference may be made to the content shown in FIG. 1, which is not repeated here.
S204: Acquire the subordination relationship between the keywords of every two text fragments among the at least two text fragments.
S205: Sort the at least two text fragments according to the subordination relationship between the keywords of every two text fragments.
S206: Import the sorted at least two text fragments into the target presentation file template in sequence to obtain the target presentation file.
The subordination relationship between the keywords corresponding to every two text fragments among the at least two text fragments may be obtained by means of a BERT model. The subordination relationship may refer to the containment relationship and the sequential relationship between the keywords. The order of the at least two text fragments may be determined according to the subordination relationship between the keywords of every two text fragments, and the at least two text fragments are sorted in that order. For example, suppose the file to be processed is tourism publicity for a certain place, the keyword corresponding to text fragment 1 is the history and culture of the place, and the keyword corresponding to text fragment 2 is stories of the place during the Republic of China period. Because history and culture contains the culture of any particular period, the keyword corresponding to text fragment 1 is ordered before the keyword corresponding to text fragment 2. The at least two text fragments are sorted according to the subordination relationship between the keywords of every two text fragments, and the sorted text fragments are imported into the target presentation file template in sequence to obtain the target presentation file, which improves the accuracy of the presentation.
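As a non-limiting illustration, the sorting step may be sketched as a topological sort over the containment relationships. The sketch assumes the relationships have already been obtained (for example from a BERT model, which is not reproduced here) and are supplied as `(containing keyword, contained keyword)` pairs; the name `order_fragments` and this input format are assumptions.

```python
from graphlib import TopologicalSorter


def order_fragments(
    fragments: dict[str, str], contains: list[tuple[str, str]]
) -> list[str]:
    """Order fragments so that a containing keyword precedes the contained one.

    `fragments` maps each keyword to its text fragment. `contains` lists
    (parent, child) pairs meaning "parent keyword contains child keyword".
    Pairs that mention unknown keywords are ignored. graphlib raises
    CycleError if the relationships are circular.
    """
    sorter: TopologicalSorter = TopologicalSorter()
    for keyword in fragments:
        sorter.add(keyword)
    for parent, child in contains:
        if parent in fragments and child in fragments:
            sorter.add(child, parent)  # parent must come before child
    return [fragments[keyword] for keyword in sorter.static_order()]
```

With the tourism example above, passing the pair `("history and culture", "Republic-era stories")` places text fragment 1 before text fragment 2.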
In this embodiment of the application, at least two keywords in the file to be processed and the characteristic attribute information of the file to be processed are acquired; the file to be processed is divided according to the at least two keywords to obtain at least two text fragments; a target presentation file template matching the characteristic attribute information of the file to be processed is identified from the presentation file template library; and the at least two text fragments are imported into the target presentation file template to obtain the target presentation file. Dividing the file to be processed according to the at least two keywords yields at least two text fragments, which facilitates generating a presentation for each text fragment. The content of each text fragment is simplified, split, corrected, or typeset, which simplifies and corrects that content and improves the accuracy of the generated target presentation file. The resulting target presentation file includes the presentation corresponding to each text fragment. When the at least two text fragments are imported into the target presentation file template, they are first sorted according to the subordination relationship between the keywords of every two text fragments, and the sorted text fragments are then imported, so that any two presentations in the target presentation file follow a reasonable order, which improves the accuracy of the generated target presentation file. The entire process of generating the target presentation file requires no human involvement and outputs the result directly, which improves the efficiency and flexibility of presentation generation and helps ensure the accuracy and relevance of the generated presentation.
Please refer to FIG. 4, which is a schematic structural diagram of a presentation generating apparatus provided by an embodiment of this application. The presentation generating apparatus of this embodiment may be provided in the electronic device mentioned above. In this embodiment, the apparatus includes the following. An acquisition module 11 is configured to acquire at least two keywords in the file to be processed and characteristic attribute information of the file to be processed, the characteristic attribute information including at least one of the field to which the file to be processed belongs, the number of keywords in the file to be processed, and the theme of the file to be processed. A division module 12 is configured to divide the file to be processed according to the at least two keywords to obtain at least two text fragments, one text fragment corresponding to at least one keyword. An identification module 13 is configured to identify, from the presentation file template library, a target presentation file template matching the characteristic attribute information of the file to be processed. The identification module 13 includes a first determining unit, a first acquiring unit, and a second determining unit. The first determining unit is configured to determine the number of text fragments among the at least two text fragments according to the number of keywords in the file to be processed. The first acquiring unit is configured to acquire the number of presentations included in each presentation file template in the presentation file template library. The second determining unit is configured to determine, as the target presentation file template, the presentation file template in the presentation file template library whose number of presentations equals the number of text fragments.
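As a non-limiting illustration, the count-based matching performed by the first determining unit, the first acquiring unit, and the second determining unit may be sketched as follows. The `Template` record, its field names, and the choice to return `None` when no template matches are assumptions of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Template:
    """Hypothetical record for one entry of the presentation file template library."""
    name: str
    presentation_count: int  # number of presentations the template contains


def match_by_count(
    templates: list[Template], fragment_count: int
) -> Optional[Template]:
    """Return the first template whose presentation count equals the number
    of text fragments, or None if the library has no such template."""
    for template in templates:
        if template.presentation_count == fragment_count:
            return template
    return None
```

Here `fragment_count` stands for the number of text fragments, which the first determining unit derives from the number of keywords in the file to be processed.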
The identification module 13 further includes a second acquiring unit, a third determining unit, and a selection unit.
The second acquiring unit is configured to acquire the theme of each presentation file template in the presentation file template library. The third determining unit is configured to determine, for each presentation file template in the presentation file template library, the matching degree between the theme of that template and the theme of the file to be processed. The selection unit is configured to select, from the presentation file template library, the presentation file template with the greatest matching degree as the target presentation file template.
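As a non-limiting illustration, selecting the template with the greatest theme matching degree may be sketched as follows. The application does not define how the matching degree is computed; the word-overlap (Jaccard) measure used here, and the names `theme_matching_degree` and `match_by_theme`, are stand-in assumptions.

```python
def theme_matching_degree(theme_a: str, theme_b: str) -> float:
    """Stand-in matching degree: Jaccard overlap of lower-cased words."""
    words_a = set(theme_a.lower().split())
    words_b = set(theme_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def match_by_theme(template_themes: dict[str, str], file_theme: str) -> str:
    """Return the name of the template whose theme best matches `file_theme`.

    `template_themes` maps template names to their themes and must not be
    empty (max() raises ValueError on an empty library).
    """
    return max(
        template_themes,
        key=lambda name: theme_matching_degree(template_themes[name], file_theme),
    )
```

Any other similarity measure could be substituted for `theme_matching_degree` without changing the selection logic.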
The identification module 13 further includes a prediction unit, a third acquiring unit, and a fourth determining unit.
The prediction unit is configured to predict, according to the field to which the file to be processed belongs, the attribute information of the presentation file corresponding to the file to be processed, the attribute information including typesetting information and color information of that presentation file. The third acquiring unit is configured to acquire the attribute information of each presentation file template in the presentation file template library, the attribute information of each presentation file template including the typesetting information and color information of that template. The fourth determining unit is configured to determine, as the target presentation file template, the presentation file template in the presentation file template library whose attribute information has the greatest matching degree with the attribute information of the presentation file corresponding to the file to be processed.
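As a non-limiting illustration, attribute-based matching may be sketched as follows. How the attribute information is predicted from the field is not described in this section; the lookup table `PREDICTED_ATTRIBUTES`, its entries, and the scoring by number of equal attributes are all hypothetical.

```python
# Hypothetical prediction table: field of the file -> expected attributes.
PREDICTED_ATTRIBUTES: dict[str, dict[str, str]] = {
    "finance": {"typesetting": "charts", "color": "blue"},
    "tourism": {"typesetting": "image-heavy", "color": "green"},
}


def match_by_attributes(templates: dict[str, dict[str, str]], field: str) -> str:
    """Return the name of the template whose typesetting and color
    attributes agree most with the attributes predicted for `field`.

    The matching degree is assumed to be the count of equal attributes.
    """
    wanted = PREDICTED_ATTRIBUTES[field]

    def matching_degree(name: str) -> int:
        attributes = templates[name]
        return sum(attributes.get(key) == value for key, value in wanted.items())

    return max(templates, key=matching_degree)
```

In practice the prediction could come from a trained model rather than a fixed table; the table only keeps the sketch self-contained.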
An import module 14 is configured to import the at least two text fragments into the target presentation file template to obtain the target presentation file.
The import module 14 includes a fourth acquiring unit, a first sorting unit, and a first import unit.
The fourth acquiring unit is configured to acquire position information, in the file to be processed, of the keyword of each text fragment among the at least two text fragments. The first sorting unit is configured to sort the at least two text fragments according to the position information. The first import unit is configured to import the sorted at least two text fragments into the target presentation file template in sequence to obtain the target presentation file.
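As a non-limiting illustration, sorting by keyword position may be sketched as follows. The sketch assumes the position information is the character offset of the first occurrence of each keyword; the name `order_by_position` is hypothetical.

```python
def order_by_position(text: str, fragments: dict[str, str]) -> list[str]:
    """Sort fragments by where their keyword first appears in `text`.

    `fragments` maps each keyword to its text fragment. Keywords that do
    not occur in `text` are placed last.
    """
    def position(keyword: str) -> int:
        index = text.find(keyword)
        return index if index >= 0 else len(text)

    return [fragments[keyword] for keyword in sorted(fragments, key=position)]
```

The returned list can then be imported into the template in sequence, so that the presentations follow the order of the original file.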
The import module 14 further includes a fifth acquiring unit, a second sorting unit, and a second import unit.
The fifth acquiring unit is configured to acquire the subordination relationship between the keywords of every two text fragments among the at least two text fragments. The second sorting unit is configured to sort the at least two text fragments according to the subordination relationship. The second import unit is configured to import the sorted at least two text fragments into the target presentation file template in sequence to obtain the target presentation file.
The import module 14 further includes a sixth acquiring unit, a fifth determining unit, a processing unit, and a third import unit.
The sixth acquiring unit is configured to acquire the text feature information corresponding to each text fragment among the at least two text fragments. The fifth determining unit is configured to determine the preprocessing mode of each text fragment according to the text feature information corresponding to that text fragment, the preprocessing mode including at least one of simplification, splitting, correction, and typesetting. The processing unit is configured to process each text fragment according to its preprocessing mode to obtain at least two processed text fragments. The third import unit is configured to import the at least two processed text fragments into the target presentation file template to obtain the target presentation file.
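As a non-limiting illustration, choosing and applying a preprocessing mode per text fragment may be sketched as follows. The text features used (fragment length and irregular whitespace), the `max_chars` limit, and the function names are assumptions; only splitting and a simple form of typesetting are implemented, while simplification and correction are left out of the sketch.

```python
import textwrap


def choose_preprocessing(fragment: str, max_chars: int = 200) -> list[str]:
    """Pick preprocessing modes from simple, assumed text features."""
    modes = []
    if fragment != " ".join(fragment.split()):
        modes.append("typeset")  # irregular whitespace needs normalizing
    if len(fragment) > max_chars:
        modes.append("split")  # too long for a single presentation
    return modes


def preprocess(fragment: str, max_chars: int = 200) -> list[str]:
    """Apply the chosen modes and return one or more processed pieces."""
    modes = choose_preprocessing(fragment, max_chars)
    text = fragment
    if "typeset" in modes:
        text = " ".join(text.split())  # collapse runs of whitespace
    if "split" in modes:
        # Break at word boundaries into pieces of at most max_chars characters.
        return textwrap.wrap(text, width=max_chars)
    return [text]
```

Each returned piece would then be imported into the target presentation file template as processed text.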
In this embodiment of the application, at least two keywords in the file to be processed and the characteristic attribute information of the file to be processed are acquired; the file to be processed is divided according to the at least two keywords to obtain at least two text fragments; a target presentation file template matching the characteristic attribute information of the file to be processed is identified from the presentation file template library; and the at least two text fragments are imported into the target presentation file template to obtain the target presentation file. Dividing the file to be processed according to the at least two keywords yields at least two text fragments, which facilitates generating a presentation for each text fragment. The content of each text fragment is simplified, split, corrected, or typeset, which simplifies and corrects that content and improves the accuracy of the generated target presentation file. The resulting target presentation file includes the presentation corresponding to each text fragment. When the at least two text fragments are imported into the target presentation file template, they are first sorted according to the subordination relationship between the keywords of every two text fragments, and the sorted text fragments are then imported, so that any two presentations in the target presentation file follow a reasonable order, which improves the accuracy of the generated target presentation file. The entire process of generating the target presentation file requires no human involvement and outputs the result directly, which improves the efficiency and flexibility of presentation generation and helps ensure the accuracy and relevance of the generated presentation.
Please refer to FIG. 5, which is a schematic structural diagram of an electronic device provided by an embodiment of this application. As shown in FIG. 5, the electronic device in this embodiment may include one or more processors 21, one or more input devices 22, one or more output devices 23, and a memory 24. The processor 21, the input device 22, the output device 23, and the memory 24 are connected by a bus 25.
The processor 21 may be a central processing unit (CPU). The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or it may be any conventional processor.
The input device 22 may include a touch panel, a fingerprint sensor (used to collect the user's fingerprint information and fingerprint direction information), a microphone, and the like. The output device 23 may include a display (such as an LCD), a speaker, and the like, and the output device 23 may output the corrected data table.
The memory 24 may include read-only memory and random access memory, and provides instructions and data to the processor 21. A part of the memory 24 may also include non-volatile random access memory. The memory 24 is configured to store a computer program that includes program instructions, and the processor 21 is configured to execute the program instructions stored in the memory 24 so as to perform a presentation generation method, that is, to perform the following operations: acquiring at least two keywords in a file to be processed and characteristic attribute information of the file to be processed, the characteristic attribute information including at least one of the field to which the file to be processed belongs, the number of keywords in the file to be processed, and the theme of the file to be processed; dividing the file to be processed according to the at least two keywords to obtain at least two text fragments, one text fragment corresponding to at least one keyword; identifying, from a presentation file template library, a target presentation file template matching the characteristic attribute information of the file to be processed; and importing the at least two text fragments into the target presentation file template to obtain a target presentation file.
Optionally, the processor 21 is configured to execute the program instructions stored in the memory 24 to perform the following operations: determining the number of text fragments among the at least two text fragments according to the number of keywords in the file to be processed; acquiring the number of presentations included in each presentation file template in the presentation file template library; and determining, as the target presentation file template, the presentation file template in the presentation file template library whose number of presentations equals the number of text fragments.
Optionally, the processor 21 is configured to execute the program instructions stored in the memory 24 to perform the following operations: acquiring the theme of each presentation file template in the presentation file template library; determining, for each presentation file template in the presentation file template library, the matching degree between the theme of that template and the theme of the file to be processed; and selecting, from the presentation file template library, the presentation file template with the greatest matching degree as the target presentation file template.
Optionally, the processor 21 is configured to execute the program instructions stored in the memory 24 to perform the following operations: predicting, according to the field to which the file to be processed belongs, the attribute information of the presentation file corresponding to the file to be processed, the attribute information including typesetting information and color information of that presentation file; acquiring the attribute information of each presentation file template in the presentation file template library, the attribute information of each presentation file template including the typesetting information and color information of that template; and determining, as the target presentation file template, the presentation file template in the presentation file template library whose attribute information has the greatest matching degree with the attribute information of the presentation file corresponding to the file to be processed.
Optionally, the processor 21 is configured to execute the program instructions stored in the memory 24 to perform the following operations: acquiring position information, in the file to be processed, of the keyword of each text fragment among the at least two text fragments; sorting the at least two text fragments according to the position information; and importing the sorted at least two text fragments into the target presentation file template in sequence to obtain the target presentation file.
Optionally, the processor 21 is configured to execute the program instructions stored in the memory 24 to perform the following operations: acquiring the subordination relationship between the keywords of every two text fragments among the at least two text fragments; sorting the at least two text fragments according to the subordination relationship; and importing the sorted at least two text fragments into the target presentation file template in sequence to obtain the target presentation file.
Optionally, the processor 21 is configured to execute the program instructions stored in the memory 24 to perform the following operations: acquiring the text feature information corresponding to each text fragment among the at least two text fragments; determining the preprocessing mode of each text fragment according to the text feature information corresponding to that text fragment, the preprocessing mode including at least one of simplification, splitting, correction, and typesetting; processing each text fragment according to its preprocessing mode to obtain at least two processed text fragments; and importing the at least two processed text fragments into the target presentation file template to obtain the target presentation file.
The processor 21, the input device 22, and the output device 23 described in the embodiments of this application can carry out the implementations described in the first and second embodiments of the presentation generation method provided by the embodiments of this application, and can also carry out the implementation of the electronic device described in the embodiments of this application, which is not repeated here.
In this embodiment of the application, at least two keywords in the file to be processed and the characteristic attribute information of the file to be processed are acquired; the file to be processed is divided according to the at least two keywords to obtain at least two text fragments; a target presentation file template matching the characteristic attribute information of the file to be processed is identified from the presentation file template library; and the at least two text fragments are imported into the target presentation file template to obtain the target presentation file. Dividing the file to be processed according to the at least two keywords yields at least two text fragments, which facilitates generating a presentation for each text fragment. The content of each text fragment is simplified, split, corrected, or typeset, which simplifies and corrects that content and improves the accuracy of the generated target presentation file. The resulting target presentation file includes the presentation corresponding to each text fragment. When the at least two text fragments are imported into the target presentation file template, they are first sorted according to the subordination relationship between the keywords of every two text fragments, and the sorted text fragments are then imported, so that any two presentations in the target presentation file follow a reasonable order, which improves the accuracy of the generated target presentation file. The entire process of generating the target presentation file requires no human involvement and outputs the result directly, which improves the efficiency and flexibility of presentation generation and helps ensure the accuracy and relevance of the generated presentation.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program that includes program instructions, and when the program instructions are executed by a processor, the presentation generation method shown in the embodiments of FIG. 1 and FIG. 3 is implemented. The computer-readable storage medium may be non-volatile or volatile.
The computer-readable storage medium may be an internal storage unit of the electronic device described in any of the foregoing embodiments, such as a hard disk or memory of the device. The computer-readable storage medium may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the device. Further, the computer-readable storage medium may include both an internal storage unit of the device and an external storage device. The computer-readable storage medium is used to store the computer program and other programs and data required by the device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
As an example, the above computer-readable storage medium may be deployed for execution on one computer device, on multiple computer devices located at one site, or on multiple computer devices that are distributed across multiple sites and interconnected by a communication network. Multiple computer devices distributed across multiple sites and interconnected by a communication network may form a blockchain network.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (20)
- A presentation generation method, comprising: acquiring at least two keywords in a file to be processed and characteristic attribute information of the file to be processed, the characteristic attribute information including at least one of the field to which the file to be processed belongs, the number of keywords in the file to be processed, and the theme of the file to be processed; dividing the file to be processed according to the at least two keywords to obtain at least two text fragments, each text fragment corresponding to at least one keyword; identifying, from a presentation file template library, a target presentation file template that matches the characteristic attribute information of the file to be processed; and importing the at least two text fragments into the target presentation file template to obtain a target presentation file.
- The method according to claim 1, wherein the characteristic attribute information includes the number of keywords in the file to be processed, and the identifying, from a presentation file template library, a target presentation file template that matches the characteristic attribute information of the file to be processed comprises: determining the number of text fragments in the at least two text fragments according to the number of keywords in the file to be processed; acquiring the number of presentations included in each presentation file template in the presentation file template library; and determining, as the target presentation file template, a presentation file template in the presentation file template library whose number of included presentation texts is the same as the number of text fragments.
- The method according to claim 1, wherein the characteristic attribute information includes the theme of the file to be processed, and the identifying, from a presentation file template library, a target presentation file template that matches the characteristic attribute information of the file to be processed comprises: acquiring the theme of each presentation file template in the presentation file template library; determining, for each presentation file template in the presentation file template library, the degree of matching between the theme of that presentation file template and the theme of the file to be processed; and selecting, from the presentation file template library, the presentation file template with the greatest degree of matching as the target presentation file template.
- The method according to claim 1, wherein the characteristic attribute information includes the field to which the file to be processed belongs, and the identifying, from a presentation file template library, a target presentation file template that matches the characteristic attribute information of the file to be processed comprises: predicting, according to the field to which the file to be processed belongs, attribute information of the presentation file corresponding to the file to be processed, the attribute information of the presentation file corresponding to the file to be processed including typesetting information and color information of that presentation file; acquiring attribute information of each presentation file template in the presentation file template library, the attribute information of each presentation file template including typesetting information and color information of that presentation file template; and determining, as the target presentation file template, the presentation file template in the presentation file template library whose attribute information has the greatest degree of matching with the attribute information of the presentation file corresponding to the file to be processed.
- The method according to claim 1, wherein the importing the at least two text fragments into the target presentation file template to obtain a target presentation file comprises: acquiring position information, in the file to be processed, of the keyword of each of the at least two text fragments; sorting the at least two text fragments according to the position information; and importing the sorted at least two text fragments into the target presentation file template in sequence to obtain the target presentation file.
- The method according to claim 1, wherein the importing the at least two text fragments into the target presentation file template to obtain a target presentation file comprises: acquiring the affiliation between the keywords of every two text fragments in the at least two text fragments; sorting the at least two text fragments according to the affiliation; and importing the sorted at least two text fragments into the target presentation file template in sequence to obtain the target presentation file.
- The method according to claim 1, wherein the importing the at least two text fragments into the target presentation file template to obtain a target presentation file comprises: acquiring text feature information corresponding to each of the at least two text fragments; determining a preprocessing manner for each text fragment according to the text feature information corresponding to that text fragment, the preprocessing manner including at least one of simplification processing, splitting processing, correction processing, and typesetting processing; processing each text fragment according to its preprocessing manner to obtain at least two processed text fragments; and importing the at least two processed text fragments into the target presentation file template to obtain the target presentation file.
- The method according to claim 1, wherein the dividing the file to be processed according to the at least two keywords to obtain at least two text fragments comprises: acquiring the similarity between every two adjacent keywords in the at least two keywords; and dividing the paragraphs in the file to be processed that correspond to keywords whose similarity is greater than a similarity threshold into the same text fragment, so as to obtain the at least two text fragments, wherein the two adjacent keywords may be located in adjacent paragraphs or in the same paragraph of the file to be processed.
- The method according to claim 8, wherein the acquiring the similarity between every two adjacent keywords in the at least two keywords comprises: calculating the distance between every two adjacent keywords in the at least two keywords, and determining the similarity between every two adjacent keywords in the at least two keywords according to the distance.
- A presentation generation apparatus, comprising: an acquiring module, configured to acquire at least two keywords in a file to be processed and characteristic attribute information of the file to be processed, the characteristic attribute information including at least one of the field to which the file to be processed belongs, the number of keywords in the file to be processed, and the theme of the file to be processed; a dividing module, configured to divide the file to be processed according to the at least two keywords to obtain at least two text fragments, each text fragment corresponding to at least one keyword; an identifying module, configured to identify, from a presentation file template library, a target presentation file template that matches the characteristic attribute information of the file to be processed; and an importing module, configured to import the at least two text fragments into the target presentation file template to obtain a target presentation file.
- An electronic device, comprising: a processor adapted to implement one or more instructions; and a computer-readable storage medium storing one or more instructions, the one or more instructions being adapted to be loaded by the processor to execute the following steps: acquiring at least two keywords in a file to be processed and characteristic attribute information of the file to be processed, the characteristic attribute information including at least one of the field to which the file to be processed belongs, the number of keywords in the file to be processed, and the theme of the file to be processed; dividing the file to be processed according to the at least two keywords to obtain at least two text fragments, each text fragment corresponding to at least one keyword; identifying, from a presentation file template library, a target presentation file template that matches the characteristic attribute information of the file to be processed; and importing the at least two text fragments into the target presentation file template to obtain a target presentation file.
- The electronic device according to claim 11, wherein the characteristic attribute information includes the number of keywords in the file to be processed, and the processor is configured to: determine the number of text fragments in the at least two text fragments according to the number of keywords in the file to be processed; acquire the number of presentations included in each presentation file template in the presentation file template library; and determine, as the target presentation file template, a presentation file template in the presentation file template library whose number of included presentation texts is the same as the number of text fragments.
- The electronic device according to claim 11, wherein the characteristic attribute information includes the theme of the file to be processed, and the processor is configured to: acquire the theme of each presentation file template in the presentation file template library; determine, for each presentation file template in the presentation file template library, the degree of matching between the theme of that presentation file template and the theme of the file to be processed; and select, from the presentation file template library, the presentation file template with the greatest degree of matching as the target presentation file template.
- The electronic device according to claim 11, wherein the characteristic attribute information includes the field to which the file to be processed belongs, and the processor is configured to: predict, according to the field to which the file to be processed belongs, attribute information of the presentation file corresponding to the file to be processed, the attribute information of the presentation file corresponding to the file to be processed including typesetting information and color information of that presentation file; acquire attribute information of each presentation file template in the presentation file template library, the attribute information of each presentation file template including typesetting information and color information of that presentation file template; and determine, as the target presentation file template, the presentation file template in the presentation file template library whose attribute information has the greatest degree of matching with the attribute information of the presentation file corresponding to the file to be processed.
- The electronic device according to claim 11, wherein the processor is configured to: acquire position information, in the file to be processed, of the keyword of each of the at least two text fragments; sort the at least two text fragments according to the position information; and import the sorted at least two text fragments into the target presentation file template in sequence to obtain the target presentation file.
- The electronic device according to claim 11, wherein the processor is configured to: acquire the affiliation between the keywords of every two text fragments in the at least two text fragments; sort the at least two text fragments according to the affiliation; and import the sorted at least two text fragments into the target presentation file template in sequence to obtain the target presentation file.
- The electronic device according to claim 11, wherein the processor is configured to: acquire text feature information corresponding to each of the at least two text fragments; determine a preprocessing manner for each text fragment according to the text feature information corresponding to that text fragment, the preprocessing manner including at least one of simplification processing, splitting processing, correction processing, and typesetting processing; process each text fragment according to its preprocessing manner to obtain at least two processed text fragments; and import the at least two processed text fragments into the target presentation file template to obtain the target presentation file.
- The electronic device according to claim 11, wherein the processor is configured to: acquire the similarity between every two adjacent keywords in the at least two keywords; and divide the paragraphs in the file to be processed that correspond to keywords whose similarity is greater than a similarity threshold into the same text fragment, so as to obtain the at least two text fragments, wherein the two adjacent keywords may be located in adjacent paragraphs or in the same paragraph of the file to be processed.
- The electronic device according to claim 18, wherein the processor is configured to: calculate the distance between every two adjacent keywords in the at least two keywords, and determine the similarity between every two adjacent keywords in the at least two keywords according to the distance.
- A computer-readable storage medium, wherein the computer-readable storage medium stores one or more instructions, the one or more instructions being adapted to be loaded by a processor to execute the following steps: acquiring at least two keywords in a file to be processed and characteristic attribute information of the file to be processed, the characteristic attribute information including at least one of the field to which the file to be processed belongs, the number of keywords in the file to be processed, and the theme of the file to be processed; dividing the file to be processed according to the at least two keywords to obtain at least two text fragments, each text fragment corresponding to at least one keyword; identifying, from a presentation file template library, a target presentation file template that matches the characteristic attribute information of the file to be processed; and importing the at least two text fragments into the target presentation file template to obtain a target presentation file.
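The division and template-matching steps recited in the claims above can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: keywords are represented by toy numeric vectors, the distance is Euclidean, and the similarity is taken as `1 / (1 + distance)`. The claims only state that similarity is determined from a distance, so these choices, along with the function names and the `templates` dictionary, are hypothetical.

```python
import math


def similarity(vec_a, vec_b):
    """Map a Euclidean distance to a similarity in (0, 1] (assumed formula)."""
    return 1.0 / (1.0 + math.dist(vec_a, vec_b))


def split_into_fragments(paragraphs, threshold):
    """Group paragraphs whose adjacent keywords are similar enough.

    paragraphs: list of (keyword, keyword_vector, text) in document order.
    Returns a list of fragments, each with its keywords and merged text.
    """
    fragments = []
    previous_vector = None
    for keyword, vector, text in paragraphs:
        if previous_vector is not None and similarity(previous_vector, vector) > threshold:
            # Similar to the preceding keyword: extend the current fragment.
            fragments[-1]["keywords"].append(keyword)
            fragments[-1]["text"] += "\n" + text
        else:
            # Dissimilar (or first paragraph): start a new fragment.
            fragments.append({"keywords": [keyword], "text": text})
        previous_vector = vector
    return fragments


def pick_template(templates, fragment_count):
    """Return the first template whose slide count equals the fragment count.

    templates: hypothetical dict mapping a template name to its slide count.
    Returns None when no template matches.
    """
    for name, slide_count in templates.items():
        if slide_count == fragment_count:
            return name
    return None


paragraphs = [
    ("revenue", (1.0, 0.0), "Revenue grew this quarter."),
    ("profit", (1.0, 0.2), "Profit followed the same trend."),
    ("hiring", (5.0, 5.0), "Ten engineers joined the team."),
]
fragments = split_into_fragments(paragraphs, threshold=0.5)
template = pick_template({"three_slides": 3, "two_slides": 2}, len(fragments))
print(len(fragments), template)
```

Because the paragraphs are processed in document order, the resulting fragments are already ordered by the position of their keywords in the file, which corresponds to the position-based ordering variant.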
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010686330.6A CN111930976B (en) | 2020-07-16 | 2020-07-16 | Presentation generation method, device, equipment and storage medium |
CN202010686330.6 | 2020-07-16 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021139242A1 (en) | 2021-07-15 |
Family
ID=73313954
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/118349 WO2021139242A1 (en) | 2020-07-16 | 2020-09-28 | Presentation file generation method, apparatus, and device and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111930976B (en) |
WO (1) | WO2021139242A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114118041A (en) * | 2021-11-01 | 2022-03-01 | 深圳前海微众银行股份有限公司 | Text generation method and device and storage medium |
CN114254610A (en) * | 2021-12-17 | 2022-03-29 | 广州金山移动科技有限公司 | Slide processing method and device and electronic equipment |
CN114547474A (en) * | 2022-04-21 | 2022-05-27 | 北京泰迪熊移动科技有限公司 | Data searching method, system, electronic equipment and storage medium |
CN114912425A (en) * | 2022-05-17 | 2022-08-16 | 中国银行股份有限公司 | Presentation generation method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107491426A (en) * | 2016-10-25 | 2017-12-19 | 宝沃汽车(中国)有限公司 | The automatic generation method and system of PPT reports |
CN110390091A (en) * | 2018-04-18 | 2019-10-29 | 成都野望数码科技有限公司 | A kind of typesetting document structure tree method, device and equipment |
CN110489735A (en) * | 2018-05-14 | 2019-11-22 | 成都野望数码科技有限公司 | A kind of PowerPoint generation method, device and equipment |
WO2020000812A1 (en) * | 2018-06-29 | 2020-01-02 | 平安科技(深圳)有限公司 | Method and apparatus for generating ppt document, storage medium, and computer device |
Family Cites Families (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003208083A (en) * | 2002-01-11 | 2003-07-25 | Nippon Telegr & Teleph Corp <Ntt> | Method and device for generating teaching material, teaching material generating program, and storage medium with the teaching material generating program stored therein |
US20050223314A1 (en) * | 2004-03-31 | 2005-10-06 | Satyam Computer Services Inc. | System and method for automatic generation of presentations based on agenda |
KR100841066B1 (en) * | 2007-12-26 | 2008-06-25 | 천지인커뮤니케이션주식회사 | Method for working multimedia presentation document |
US20110196862A1 (en) * | 2010-02-05 | 2011-08-11 | International Business Machines Corporation | Outline-based composition and search of presentation material |
WO2012095971A1 (en) * | 2011-01-13 | 2012-07-19 | 三菱電機株式会社 | Classification rule generation device, classification rule generation method, classification rule generation program and recording medium |
US20120210351A1 (en) * | 2011-02-11 | 2012-08-16 | Microsoft Corporation | Presentation of customized digital media programming |
US20160092419A1 (en) * | 2014-09-30 | 2016-03-31 | Microsoft Technology Licensing, Llc | Structured Sample Authoring Content |
CN105528332A (en) * | 2014-10-23 | 2016-04-27 | 珠海金山办公软件有限公司 | A presentation file processing method and device |
CN105740213B (en) * | 2014-12-10 | 2018-11-16 | 珠海金山办公软件有限公司 | A kind of PowerPoint template provider method and device |
CN105786784B (en) * | 2014-12-23 | 2018-10-16 | 珠海金山办公软件有限公司 | A kind of PowerPoint template replacing method and device |
CN105701082A (en) * | 2016-01-13 | 2016-06-22 | 刘敏 | Automatic typesetting method and system for presentation document |
US11481550B2 (en) * | 2016-11-10 | 2022-10-25 | Google Llc | Generating presentation slides with distilled content |
US10380259B2 (en) * | 2017-05-22 | 2019-08-13 | International Business Machines Corporation | Deep embedding for natural language content based on semantic dependencies |
CN108287927B (en) * | 2018-03-05 | 2019-10-22 | 北京百度网讯科技有限公司 | For obtaining the method and device of information |
US10977484B2 (en) * | 2018-03-19 | 2021-04-13 | Microsoft Technology Licensing, Llc | System and method for smart presentation system |
CN108509405A (en) * | 2018-04-11 | 2018-09-07 | 北京深度智耀科技有限公司 | A kind of generation method of PowerPoint, device and equipment |
US11003638B2 (en) * | 2018-10-29 | 2021-05-11 | Beijing Jingdong Shangke Information Technology Co., Ltd. | System and method for building an evolving ontology from user-generated content |
CN109885816A (en) * | 2019-01-18 | 2019-06-14 | 泰康保险集团股份有限公司 | Production method, device and the storage medium of PowerPoint |
CN111291210B (en) * | 2020-01-14 | 2024-04-12 | 广州视源电子科技股份有限公司 | Image material library generation method, image material recommendation method and related devices |
- 2020
  - 2020-07-16: CN CN202010686330.6A (CN111930976B), Active
  - 2020-09-28: WO PCT/CN2020/118349 (WO2021139242A1), Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN111930976A (en) | 2020-11-13 |
CN111930976B (en) | 2024-05-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20911734; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: PCT application non-entry in European phase | Ref document number: 20911734; Country of ref document: EP; Kind code of ref document: A1 |