CN118586375A - Text generation method, device and system, computing device, and readable storage medium - Google Patents
- Publication number
- CN118586375A (application number CN202410625183.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/186—Templates
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Description
Technical Field
The embodiments of this specification relate to the field of computer technology, and in particular to a text generation method. They also relate to a text generation device, a text generation system, a computing device, a computer-readable storage medium, and a computer program product.
Background Art
With the development of computer technology, many tasks in people's lives and work can be performed by computers.
For example, a computer can analyze the basic data that a piece of writing depends on and generate text that meets the writing requirements. Usually, a text generation template is set for the text to be generated. The template may include multiple pieces of configuration information, which may indicate the topic of the text to be generated and the content it must contain. The computer can analyze the basic data according to the template to generate text that conforms to it.
However, the efficiency of current text generation still needs to be improved.
Summary of the Invention
The embodiments of this specification provide a text generation method that can improve text generation efficiency. One or more embodiments of this specification also relate to a text generation device, a text generation system, a computing device, a computer-readable storage medium, and a computer program product.
According to one aspect of the embodiments of this specification, a text generation method is provided, the method comprising:
acquiring target data to be analyzed, and a text generation template to be followed when generating text based on the target data;
determining, based on each piece of configuration information in the text generation template, a plurality of tasks to be executed on the target data;
topologically sorting the plurality of tasks based on the dependencies among them, and determining, among the plurality of tasks, target tasks that support parallel execution;
executing the target tasks in parallel on the target data, and integrating the execution results of the target tasks to generate text that conforms to the text generation template.
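The four steps above can be sketched end to end as follows. This is only an illustrative sketch, not the claimed implementation: the function name `generate_text`, the shape of `tasks` and `deps`, and the use of a thread pool are all assumptions made for the example. It relies on Python's standard `graphlib.TopologicalSorter` (Python 3.9+), which hands out every task whose dependencies are already complete, so tasks that become ready together are submitted in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def generate_text(target_data, tasks, deps, output_order):
    """Run tasks in dependency order, in parallel where possible.

    tasks:        hypothetical mapping of task name -> callable(target_data, upstream) -> str
    deps:         hypothetical mapping of task name -> set of task names it depends on
    output_order: names of the tasks whose results appear in the final text, in template order
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    results = {}
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            # Every task returned here has all of its dependencies finished,
            # so the whole batch can be executed in parallel.
            ready = list(sorter.get_ready())
            futures = {
                name: pool.submit(
                    tasks[name],
                    target_data,
                    {d: results[d] for d in deps.get(name, set())},
                )
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    # Integrate the per-task results into one text, following the template order.
    return "\n".join(results[name] for name in output_order)
```

In this sketch a task receives the results of the tasks it depends on, which is one simple way to model a dependency; how results are actually passed between tasks is not specified by the text above.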
According to another aspect of the embodiments of this specification, a text generation device is provided, comprising:
an acquisition module, configured to acquire target data to be analyzed, and a text generation template to be followed when generating text based on the target data;
a first determination module, configured to determine, based on each piece of configuration information in the text generation template, a plurality of tasks to be executed on the target data;
a second determination module, configured to topologically sort the plurality of tasks based on the dependencies among them, and to determine, among the plurality of tasks, target tasks that support parallel execution;
a text generation module, configured to execute the target tasks in parallel on the target data, and to integrate the execution results of the target tasks to generate text that conforms to the text generation template.
According to yet another aspect of the embodiments of this specification, a text generation system is provided, comprising a terminal and a server;
the terminal is configured to send to the server a text generation instruction for target data and a text generation template to be followed when generating text based on the target data;
the server is configured to execute the above method in response to the text generation instruction, generate text that conforms to the text generation template, and send the generated text to the terminal.
According to a further aspect of the embodiments of this specification, a computing device is provided, comprising a memory and a processor;
the memory is configured to store a computer program/instructions, and the processor is configured to execute the computer program/instructions, which, when executed by the processor, implement the steps of the above method.
According to yet another aspect of the embodiments of this specification, a computer-readable storage medium is provided, which stores a computer program/instructions that, when executed by a processor, implement the steps of the above method.
According to a further aspect of the embodiments of this specification, a computer program product is provided, comprising a computer program/instructions that, when executed by a processor, implement the steps of the above method.
In one embodiment of this specification, the target data to be analyzed and the text generation template to be followed are acquired. While generating text from the target data according to the template, a plurality of tasks to be executed on the target data are determined based on each piece of configuration information in the template. These tasks are topologically sorted according to their dependencies to identify the target tasks that support parallel execution, and those tasks are executed in parallel. The execution results of the tasks are then integrated to generate text that conforms to the template. This makes the process of generating patterned text (such as paradigm writing) more flexible, and executing the target tasks in parallel where appropriate improves text generation efficiency.
Brief Description of the Drawings
FIG. 1 is a simplified flow chart of template-based text generation provided by the related art;
FIG. 2 is a schematic structural diagram of a text generation system provided by an embodiment of this specification;
FIG. 3 is a flow chart of a text generation method provided by an embodiment of this specification;
FIG. 4 is a schematic diagram of the result of topologically sorting multiple tasks, provided by an embodiment of this specification;
FIG. 5 is a schematic diagram of the result of topologically sorting multiple subtasks within a target task, provided by an embodiment of this specification;
FIG. 6 is a schematic structural diagram of a text generation device provided by an embodiment of this specification;
FIG. 7 is a structural block diagram of a computing device provided by an embodiment of this specification.
Detailed Description
Many specific details are set out in the following description to facilitate a full understanding of this specification. However, this specification can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its substance, so this specification is not limited to the specific implementations disclosed below.
The terms used in one or more embodiments of this specification are for the purpose of describing particular embodiments only and are not intended to limit one or more embodiments of this specification. The singular forms "a", "said" and "the" used in one or more embodiments of this specification and in the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used in one or more embodiments of this specification refers to and includes any and all possible combinations of one or more of the associated listed items. In one or more embodiments of this specification, the term "at least one" means "one or more", and "a plurality of" means "two or more". The term "comprising" is open-ended and should be understood as "including but not limited to"; other content may be included in addition to what has been described.
It should be understood that although the terms "first", "second", and so on may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of this specification, "first" may also be referred to as "second", and similarly "second" may also be referred to as "first". Depending on the context, the word "if" as used here may be interpreted as "at the time of", "when", or "in response to determining".
In addition, the user information (including but not limited to user device information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in one or more embodiments of this specification are information and data authorized by the user or fully authorized by all parties. The collection, use, and processing of such data must comply with the relevant standards and requirements, and a corresponding operation entry is provided for users to grant or refuse authorization.
In one or more embodiments of this specification, a large model is a deep learning model with large-scale parameters, typically containing hundreds of millions, tens of billions, hundreds of billions, trillions, or even more than ten trillion parameters. A large model may also be called a foundation model. It is pre-trained on large-scale unlabeled corpora to produce a pre-trained model with more than one hundred million parameters. Such a model can adapt to a wide range of downstream tasks and generalizes well. Examples include large language models (LLMs) and multi-modal pre-training models.
In practice, a large model can be applied to different tasks by fine-tuning the pre-trained model with only a small number of samples. Large models can be widely used in fields such as natural language processing (NLP) and computer vision. Specifically, they can be applied to computer vision tasks such as visual question answering (VQA), image captioning (IC), and image generation, and to natural language processing tasks such as text-based sentiment classification, text summarization, and machine translation. The main application scenarios of large models include digital assistants, intelligent robots, search, online education, office software, e-commerce, and intelligent design. The models involved in the embodiments of this specification include but are not limited to the large models described above and may be any type of machine learning model, for example a convolutional neural network (CNN) model or a Transformer model.
With the development of computer technology and the digitization of information, the volume of data keeps growing and the requirements for data-processing efficiency keep rising. Many tasks that were originally performed manually can be performed by computers to reduce labor costs and improve task execution efficiency. For example, in many fields existing data must be analyzed and the analysis results written up as text according to a set template. This process can be called paradigm writing. In the financial field, for instance, a financial institution needs to analyze its financial report to obtain a review text for that report. The types of content the review text must include can be indicated by a set template. For example, the template may indicate that the review text must cover the financial institution under review, the authenticity and completeness of the financial report, and the earnings trend of the financial institution.
Natural language models are now used more and more widely and can be applied to writing scenarios to generate text that meets writing requirements. Such a natural language model may be, for example, a large language model (LLM). Large language models have demonstrated powerful capabilities in natural language processing, but they still face challenges in practical applications: they can hallucinate, their knowledge is updated slowly, and the information behind their replies has low transparency. Retrieval-augmented generation (RAG) is commonly used to mitigate these problems. RAG means retrieving relevant information from an external knowledge base before the LLM answers a question, so that the LLM answers by combining that information with the data it would originally need to analyze. RAG can significantly improve the accuracy of the LLM's answers and reduce hallucination, with the effect being especially pronounced for knowledge-intensive tasks. With RAG, the LLM can show the cited sources alongside its answer, which lets users verify the answer's accuracy and increases trust in the model's output. RAG also facilitates knowledge updates and the introduction of domain-specific knowledge, allowing the LLM to combine its parametric knowledge with a non-parametric external knowledge base effectively and helping to ensure that it works reliably.
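The retrieve-then-answer flow of RAG described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: retrieval is a toy word-overlap score rather than a real retriever, the prompt wording is invented, and `llm` stands for any hypothetical callable that maps a prompt string to an answer string.

```python
def retrieve(question, knowledge_base, top_k=2):
    """Toy retriever: rank passages by how many question words they share."""
    q_terms = set(question.lower().split())
    scored = [
        (len(q_terms & set(doc.lower().split())), i, doc)
        for i, doc in enumerate(knowledge_base)
    ]
    # Highest overlap first; ties keep the original passage order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(i, doc) for score, i, doc in scored[:top_k] if score > 0]


def build_prompt(question, passages):
    """Put the retrieved passages in front of the question, numbered for citation."""
    sources = "\n".join(f"[{i}] {doc}" for i, doc in passages)
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"Sources:\n{sources}\n"
        f"Question: {question}"
    )


def answer(question, knowledge_base, llm):
    """Retrieve first, then let the (hypothetical) llm callable answer."""
    return llm(build_prompt(question, retrieve(question, knowledge_base)))
```

Numbering the sources in the prompt is one simple way to let the answer point back to where each statement came from, which is the source-citation property mentioned above.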
Paradigm writing can also be implemented with an LLM based on RAG. For example, for the target data that the paradigm writing must analyze, the LLM can query external data sources to obtain information related to the target data and then answer the corresponding question (query) or generate text. Because the LLM's answer is then grounded in the retrieved information, the accuracy and relevance of the written content can improve significantly. In addition, the text structure and generation rules indicated by a predefined template can be used to convert the LLM's output into text that meets the paradigm's requirements.
FIG. 1 is a simplified flow chart of template-based text generation provided by the related art. As shown in FIG. 1, the text generation device first obtains the template to be followed and then produces the natural language text based on the configuration information in the template. If the text to be generated is a financial report review, the template is a template for writing financial report reviews. Each line of configuration information in the template corresponds to one kind of content that the generated text must include. The text generation device processes the lines of configuration information in turn, rewriting each into queries so that each piece of configuration information is mapped to a series of query statements. A query statement is the retrieval condition used with the natural language model.
Based on the resulting query statements, the text generation device recalls information from a material library to obtain multiple pieces of information associated with each query statement. Recall can be performed, for example, with a vector recall method or the BM25 (Best Matching 25) algorithm. The material library may include the financial report to be reviewed and historical data related to it, such as annual reports and announcements of the same company. The text generation device then ranks the recalled results, for example by how well they match the query statement, to obtain multiple ranked materials.
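To make the recall-and-rank step concrete, here is a small self-contained sketch of BM25 scoring. It is not the related art's implementation: whitespace tokenization, the parameter values `k1=1.5` and `b=0.75`, and the smoothed IDF variant used below are common choices picked for this example, and the function name `bm25_rank` is hypothetical. The material list is assumed to be non-empty.

```python
import math
from collections import Counter


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with BM25 and return them best first."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency, saturated by k1 and normalized by document length via b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)

    order = sorted(range(n_docs), key=lambda i: -scores[i])
    return [(docs[i], scores[i]) for i in order]
```

Sorting by descending score corresponds to ranking the recalled materials by how well they match the query statement.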
After obtaining the template, the text generation device may also trigger an instruction to invoke the large language model, so that once the ranked materials are available the large language model generates text from them, yielding a passage of review text for each query statement. The text generation device may further post-process each passage of review text to obtain the final review text. For example, the text generation device may verify the review text; or determine whether the review text meets the requirements and regenerate it if it does not; or perform other processing to remove problems that the large language model itself may introduce.
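The post-processing described above (clean up the draft, check it, regenerate if it does not meet the requirements) can be sketched as a bounded retry loop. The names `generate`, `clean`, and `is_acceptable` are hypothetical callables supplied by the caller, and the limit of three attempts is an arbitrary assumption.

```python
def generate_with_postprocess(generate, clean, is_acceptable, max_attempts=3):
    """Generate a passage, post-process it, and regenerate until it passes the check.

    generate:      hypothetical callable(attempt_number) -> raw draft text
    clean:         hypothetical callable(draft) -> cleaned draft
    is_acceptable: hypothetical callable(draft) -> bool
    """
    for attempt in range(1, max_attempts + 1):
        draft = clean(generate(attempt))
        if is_acceptable(draft):
            return draft
    raise ValueError(f"no acceptable text after {max_attempts} attempts")
```

For example, `clean` could strip whitespace and `is_acceptable` could reject empty drafts; real checks would depend on the writing requirements.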
However, the text generation process in the related art described above is highly linear and rigid, which makes text generation, and therefore paradigm writing, inefficient; it also lacks systematization. This specification provides a text generation method that uses a different text generation process from the related art and can improve both the efficiency and the quality of text generation. This specification also relates to a text generation device, a text generation system, a computing device, a computer-readable storage medium, and a computer program product. The text generation device and the computing device may be a server-side device or a terminal device with relatively strong computing power.
FIG. 2 is a schematic structural diagram of a text generation system provided by an embodiment of this specification. As shown in FIG. 2, the text generation system 10 may include a server 101 and a terminal 102, and the terminal 102 may establish a communication connection with the server 101. For example, the server 101 may be a cloud server or a server cluster, and the terminal 102 may be a smartphone, a desktop computer, a laptop computer, a tablet computer, or a smart wearable device.
The text generation method provided by this specification can be applied to the server 101. The terminal 102 can interact with the server 101, for example by sending data to the server 101 or obtaining data from it. For example, the terminal 102 can send the target data to be analyzed and the text generation template that the generated text must follow to the server 101. The server 101 can process the received target data, for example by executing the text generation method provided by the embodiments of this specification, to obtain text that conforms to the template, and return the generated text to the terminal 102. The terminal 102 can send a text generation request to the server 101, and the request can carry the target data and the text generation template. The template may exist as a document, or the terminal 102 may send the template's content directly to the server 101 in the form of a dialogue.
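As an illustration of a text generation request that carries both the target data and the template, the following sketch serializes such a request as JSON. The class name `TextGenerationRequest`, its field names, and the choice of JSON are assumptions made for this example; the specification does not define a wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TextGenerationRequest:
    """Hypothetical request sent from the terminal to the server."""
    target_data: str      # the data to be analyzed, e.g. the text of a financial report
    template_lines: list  # one entry per piece of configuration information


def encode_request(request):
    """Serialize the request on the terminal side."""
    return json.dumps(asdict(request), ensure_ascii=False)


def decode_request(payload):
    """Rebuild the request on the server side."""
    return TextGenerationRequest(**json.loads(payload))
```

Keeping the template as a list of lines mirrors the idea that each piece of configuration information is a separate unit of the template.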
在一些实施方式中,若终端102的计算能力较强,则终端102也可以不与服务端101连接,由终端102直接执行本说明书实施例提供的文本生成方法。In some implementations, if the computing capability of the terminal 102 is relatively strong, the terminal 102 may not be connected to the server 101 , and the terminal 102 may directly execute the text generation method provided in the embodiments of this specification.
图3是本说明书一实施例提供的一种文本生成方法的流程图,下面以该方法应用于文本生成装置,且以该文本生成装置为服务端为例进行介绍。如图3所示,该方法可以包括如下步骤:FIG3 is a flow chart of a text generation method provided in an embodiment of the present specification. The following is an introduction to the method applied to a text generation device, and the text generation device is taken as a server. As shown in FIG3, the method may include the following steps:
步骤302、获取待分析的目标数据,以及基于目标数据生成文本所需依照的文本生成模板。Step 302: Obtain target data to be analyzed and a text generation template required to generate text based on the target data.
本说明书实施例中,文本生成装置可以针对一定的目标数据进行分析,以基于该目标数据生成所需的文本。该目标数据可以由终端指定,用户可以通过终端向文本生成装置发送待分析的目标数据。示例地,目标数据可以为财报,针对该目标数据待生成的文本可以为财报的点评文本。目标数据也可以为其他领域的数据,如企业的项目执行过程中的相关数据,待生成的文本可以为针对该项目执行过程的总结文本等。In the embodiments of this specification, the text generation device can analyze certain target data to generate the required text based on the target data. The target data can be specified by the terminal, and the user can send the target data to be analyzed to the text generation device through the terminal. For example, the target data can be a financial report, and the text to be generated for the target data can be a comment text of the financial report. The target data can also be data in other fields, such as relevant data in the project execution process of the enterprise, and the text to be generated can be a summary text of the project execution process, etc.
文本生成装置针对目标数据生成文本可以依照一定的文本生成模板,该文本生成模板可以用于规定待生成的文本中所需包含的内容。示例地,文本生成模板可以包括文本的主题,评价指标,以及需要回答的问题等。该文本生成模板也可以由终端指定,如由终端在发送目标数据时一并发送。可选地,该文本生成模板也可以存储在文本生成装置中,终端可以仅指示待生成的文本类型,由文本生成装置基于该文本类型确定对应的文本生成模板。The text generation device can generate text for the target data according to a certain text generation template, and the text generation template can be used to specify the content required to be included in the text to be generated. For example, the text generation template can include the theme of the text, evaluation indicators, and questions that need to be answered. The text generation template can also be specified by the terminal, such as being sent by the terminal when sending the target data. Optionally, the text generation template can also be stored in the text generation device, and the terminal can only indicate the type of text to be generated, and the text generation device determines the corresponding text generation template based on the text type.
可选地,终端也可以不直接向文本生成装置发送目标数据和文本生成模板,而向文本生成装置发送目标数据的指示信息,文本生成装置可以基于该指示信息从其他数据存储设备中获取目标数据和文本生成模板。Optionally, the terminal may not directly send the target data and the text generation template to the text generation device, but may send indication information of the target data to the text generation device, and the text generation device may obtain the target data and the text generation template from other data storage devices based on the indication information.
Step 304: Determine, based on each piece of configuration information in the text generation template, multiple tasks to be executed on the target data.
Different pieces of configuration information in the text generation template may correspond to different content in the text to be generated. For example, each piece of configuration information may occupy one line of the template and serve as the template's basic unit, corresponding to one basic passage of the final text.
The text generation device may execute at least one task for each piece of configuration information in order to generate the text corresponding to it. Accordingly, at least one task is determined from each piece of configuration information, and every such task is a task to be executed on the target data.
In the embodiments of this specification, the text generation device may also need to execute tasks that are unrelated to the text generation template in order to generate the required text. For example, whatever text is being generated, the device may always need to execute certain fixed tasks, such as data formatting tasks and index creation tasks. The device may treat these as tasks to be executed on the target data as well.
Step 306: Topologically sort the multiple tasks based on their dependencies, and determine the target tasks among them that support parallel execution.
Among the tasks to be executed on the target data, some depend on one another and some do not. A dependency is a constraint on execution order: if executing task A requires the result of task B, then task A depends on task B.
After determining the tasks to be executed on the target data, the text generation device may analyze their dependencies, topologically sort the tasks based on those dependencies, and generate a directed acyclic graph (DAG) of the tasks. The result of the topological sort makes the execution flow explicit: it identifies the tasks that must be executed one after another, and the target tasks that have no ordering constraint between them and therefore support parallel execution.
For example, the text generation device can retrieve from the target data only after the target data has been formatted, so the formatting task and the retrieval task must be executed in that order. The formatting of different data, by contrast, has no required order and supports parallel execution.
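The sorting step can be sketched with Python's standard `graphlib` module. This is only an illustration under assumptions: the task names and the helper `parallel_batches` are hypothetical and are not taken from the specification, which does not prescribe a particular sorting routine.

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies):
    """Group tasks into batches whose members have no unmet dependencies.

    `dependencies` maps each task name to the set of tasks it depends on.
    Tasks inside one batch do not depend on each other and may run in parallel.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical task names following the formatting/retrieval example above.
deps = {
    "format_a": set(),
    "format_b": set(),
    "retrieve_a": {"format_a"},
    "retrieve_b": {"format_b"},
    "integrate": {"retrieve_a", "retrieve_b"},
}
print(parallel_batches(deps))
```

With these assumed dependencies the two formatting tasks form the first batch, the two retrieval tasks the second, and the integration task the third.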
Step 308: Execute the target tasks in parallel on the target data, and integrate the execution results of the target tasks to generate text that conforms to the text generation template.
After topologically sorting the tasks and determining their execution order, the text generation device may execute the tasks on the target data in that order. Tasks with an ordering constraint are executed one after another, while target tasks that support parallel execution are executed in parallel to improve efficiency. For every task, the text generation device starts the task only after all the tasks it depends on have finished.
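One way to realize "start each task only after its dependencies finish" is sketched below with `asyncio`. The function `run_dag`, the task names and the toy actions are assumptions made for illustration, and the sketch assumes the dependency graph is acyclic (a cycle would make it wait forever).

```python
import asyncio


async def run_dag(dependencies, actions):
    """Run every task once, starting it only after its dependencies finish.

    dependencies: task name -> list of task names it depends on.
    actions: task name -> async callable taking a dict of dependency results.
    """
    futures = {}

    async def run_one(name):
        dep_names = list(dependencies.get(name, ()))
        # Wait for all dependencies; unrelated tasks keep running meanwhile.
        dep_results = await asyncio.gather(*(futures[d] for d in dep_names))
        return await actions[name](dict(zip(dep_names, dep_results)))

    # Schedule everything first; bodies only start once this coroutine awaits.
    for name in actions:
        futures[name] = asyncio.create_task(run_one(name))
    results = await asyncio.gather(*futures.values())
    return dict(zip(futures, results))


async def demo():
    order = []

    def make_action(name):
        async def action(inputs):
            await asyncio.sleep(0.01)  # stand-in for real work
            order.append(name)
            return f"{name}({', '.join(sorted(inputs))})"
        return action

    deps = {"format_a": [], "format_b": [], "query": ["format_a", "format_b"]}
    actions = {name: make_action(name) for name in deps}
    results = await run_dag(deps, actions)
    return order, results
```

Calling `asyncio.run(demo())` runs the two hypothetical formatting tasks concurrently and the query task only after both have completed.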
Because the target tasks executed in parallel produce independent results while a single complete text is ultimately required, the text generation device may integrate the results of the target tasks according to the relationships among the target tasks, and between each target task and the generation of the overall text, to produce text that conforms to the text generation template.
In some implementations, the step that produces the complete text depends only on multiple target tasks executed in parallel, so integrating their execution results in the configured manner directly yields the final text. In other implementations, further tasks must be executed after the parallel target tasks; in that case, after the results of the target tasks are integrated, processing continues on the integrated result until the step that produces the complete text has run, and only then is text conforming to the template obtained. A text generation process may contain multiple groups of target tasks. Each group may include multiple target tasks that can be executed in parallel; different groups may depend on one another, or may themselves be executed in parallel.
The text generation method of the embodiments of this specification in effect splits the paradigm writing process into multiple independent tasks and, based on the dependencies among them, processes in parallel the target tasks that support parallel processing. This streamlines the processing flow and improves the efficiency of paradigm writing.
In some implementations, executing the target tasks in parallel on the target data in step 308 includes: using a coroutine pool to invoke multiple coroutines that execute the target tasks in parallel on the target data. A coroutine runs on top of a thread; when one coroutine finishes executing, it may voluntarily yield so that another coroutine runs on the same thread. For example, one coroutine may be invoked for each target task, and the target task is executed on the thread that the coroutine runs on. Coroutines make it possible to control the concurrent execution of multiple tasks effectively. The embodiments of this specification may also use a coroutine pool to limit and reuse coroutines, which avoids creating coroutines without bound and wasting resources.
Coroutines do not increase the number of threads; multiple coroutines simply share a thread through time-division multiplexing. Coroutine switching is completed in user mode, so its cost is far lower than that of a thread switch, which requires a transition from user mode to kernel mode.
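A coroutine pool of the kind described above might look like the following minimal `asyncio` sketch, in which a fixed number of worker coroutines are reused across jobs. The name `run_with_pool` and its parameters are assumptions for illustration; the specification does not name a concrete pool implementation.

```python
import asyncio


async def run_with_pool(jobs, pool_size):
    """Run zero-argument async callables on a fixed set of worker coroutines."""
    queue = asyncio.Queue()
    for index, job in enumerate(jobs):
        queue.put_nowait((index, job))
    results = [None] * len(jobs)

    async def worker():
        # Each worker is reused for many jobs, so at most `pool_size`
        # jobs are in flight no matter how many jobs were submitted.
        while True:
            try:
                index, job = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            results[index] = await job()

    await asyncio.gather(*(worker() for _ in range(pool_size)))
    return results
```

All workers share a single event-loop thread, which matches the point made above that coroutines multiplex a thread rather than add threads.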
In some implementations, a maximum number of concurrent tasks may be set. Accordingly, executing the target tasks in parallel on the target data in step 308 may include: based on the configured number of concurrent tasks, controlling the execution so that no more than that number of target tasks run in parallel on the target data.
For example, if the number of target tasks that support parallel execution exceeds the configured number of concurrent tasks m, the text generation device may first execute m target tasks in parallel and then execute the remaining target tasks. Alternatively, the device may divide the target tasks into multiple groups whose sizes differ only slightly and are all less than or equal to m, then execute the groups one after another while executing the target tasks within each group in parallel.
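The grouping variant can be sketched as follows. The helpers `balanced_groups` and `run_in_groups` are hypothetical names; the split shown (group sizes differing by at most one) is just one way to satisfy "sizes differ only slightly and are at most m".

```python
import asyncio
import math


def balanced_groups(tasks, m):
    """Split tasks into ceil(n / m) groups whose sizes differ by at most one."""
    if m < 1:
        raise ValueError("m must be at least 1")
    if not tasks:
        return []
    group_count = math.ceil(len(tasks) / m)
    base, extra = divmod(len(tasks), group_count)
    groups, start = [], 0
    for i in range(group_count):
        size = base + (1 if i < extra else 0)  # first `extra` groups get one more
        groups.append(tasks[start:start + size])
        start += size
    return groups


async def run_in_groups(jobs, m):
    """Run groups one after another; jobs inside a group run concurrently."""
    results = []
    for group in balanced_groups(jobs, m):
        results.extend(await asyncio.gather(*(job() for job in group)))
    return results
```

For ten tasks and m = 4 this yields groups of sizes 4, 3 and 3. For the first variant (run m tasks, then the rest), an `asyncio.Semaphore(m)` guarding each task would be a common alternative.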
In the embodiments of this specification, text generation may be implemented with a natural language model (such as a large language model, LLM), which correspondingly improves the efficiency of LLM-based paradigm writing. For example, step 306 may determine the target tasks that the natural language model can execute in parallel, and step 308 may invoke the natural language model to execute one or more tasks on the target data.
In summary, the text generation method provided by the embodiments of this specification obtains the target data to be analyzed and the text generation template that the generated text must follow. While generating text from the target data according to the template, the method determines multiple tasks to be executed on the target data from the pieces of configuration information in the template, topologically sorts those tasks based on their dependencies to identify the target tasks that support parallel execution, executes those target tasks in parallel, and integrates the execution results of the tasks to generate text that conforms to the template. The process of generating patterned text (such as paradigm writing) is thereby made more flexible, and executing the target tasks in parallel where appropriate improves the efficiency of text generation.
In some implementations, determining in step 304 the multiple tasks to be executed on the target data based on each piece of configuration information in the text generation template includes:
determining, based on each piece of configuration information in the text generation template, at least one data query task corresponding to each piece of configuration information, and an integration task for the query results of the data query tasks;
determining, for each data source in the target data, a data formatting task and at least one index building task corresponding to that data source; and
obtaining the multiple tasks to be executed on the target data from the data query tasks, the integration task, the data formatting tasks and the index building tasks.
In the embodiments of this specification, the tasks to be executed during text generation may include data formatting (Format Data) tasks and index building (Build Index) tasks for the target data, data query tasks for the target data, and an integration (combination) task for the query results of the data query tasks. The integration task may be one of the tasks of the post-processing stage of text generation.
Each piece of configuration information in the text generation template may correspond to at least one data query task, that is, a task that queries the required data from the target data and its related data (for example, from the material library described above). A data query task may be executed with a natural language model. The text generation device may integrate the execution results of the data query tasks to generate the required text; accordingly, an integration task that depends on the data query tasks is determined.
The target data to be analyzed may be heterogeneous. It may include data from multiple data sources, and the data formats of different sources may differ. In the embodiments of this specification, data from different data sources may be data collected from different devices or data collected from different functional modules of the same device, and may differ in content, type or form. To make the target data easier to analyze (for example, to query), the text generation device may format the target data before executing the data query tasks, converting it into a unified format. This formatting corresponds to the data formatting task. The device may format each data source separately, so each data source has its own data formatting task.
After formatting the data of the different data sources, the text generation device may build indexes over the data of each source so that the data needed for analysis can later be located quickly. Accordingly, index building tasks are set up for each data source. A data source may have a single index building task; alternatively, because one data source may contain different types of data, a separate index building task may be set up for each type so that data of that type can be located directly through its index.
The data formatting task, index building task, data query task and integration task for the same data source depend on one another in that order, so these tasks for the same data source cannot be executed in parallel with each other, whereas the corresponding tasks for different data sources can. Accordingly, topologically sorting the multiple tasks based on their dependencies and determining the target tasks that support parallel execution in step 306 may include: topologically sorting the multiple tasks based on their dependencies, and determining that the data formatting tasks, index building tasks and data query tasks for different data sources are target tasks that support parallel execution.
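As a rough illustration of how step 304 could derive a task list with these dependencies, the sketch below builds a dependency mapping from a list of configuration items and a description of the data sources. The function `build_tasks`, the naming scheme and the simplification that every query waits for every index are assumptions, not details from the specification.

```python
def build_tasks(config_items, data_sources):
    """Return {task name: set of task names it depends on}.

    config_items: one entry per piece of configuration information.
    data_sources: source name -> list of index kinds to build for that source.
    """
    deps = {}
    index_tasks = []
    for source, index_kinds in data_sources.items():
        format_task = f"format:{source}"
        deps[format_task] = set()  # formatting has no prerequisites
        for kind in index_kinds:
            index_task = f"index:{source}:{kind}"
            deps[index_task] = {format_task}  # index only after formatting
            index_tasks.append(index_task)

    query_tasks = []
    for item in config_items:
        query_task = f"query:{item}"
        # Simplifying assumption: each query may consult any index.
        deps[query_task] = set(index_tasks)
        query_tasks.append(query_task)

    deps["integrate"] = set(query_tasks)  # integration needs all query results
    return deps
```

For instance, `build_tasks(["revenue"], {"finance": ["table"], "pdf": ["table", "text"]})` produces two formatting tasks, three index building tasks, one query task and one integration task.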
FIG. 4 is a schematic diagram of the result of topologically sorting multiple tasks according to an embodiment of this specification. As shown in FIG. 4, the target data may include data from two data sources; for example, one source may provide financial data and the other may provide other descriptive information. A data formatting task may be executed on each of the two kinds of data, yielding the standard-format financial data and the PDF parsing file (also referred to below as the block parsing file) shown in FIG. 4. Index building tasks may then be executed on each of the two. In the example of FIG. 4, one index building task is executed on the financial data to obtain a financial data index, and two different index building tasks are executed on the block parsing file to obtain an index of table data and an index of text data, respectively. Optionally, the text generation device may also obtain indexes beyond those of the target data, such as an index of external data related to the target data.
Based on the indexes of the data, the text generation device may invoke multiple data query instances to execute the data query tasks. In the example of FIG. 4, the data query instances correspond to a financial data query task, a question query task and a feature query task. The data targeted by different query tasks may intersect or overlap. In FIG. 4, the financial data query task may query based on the financial data index and the table data index; the question query task may query based on the text data index and the external data index; and the feature query task may query based on the table data index, the text data index and the external data index. After the data query tasks, post-processing may be performed, such as integrating the query results of the data query tasks.
For example, FIG. 4 shows that the data formatting and index building tasks for the two kinds of data can be executed in parallel, and that the data query tasks can be executed in parallel with one another once the index building tasks have completed. The task of obtaining the external data index can also be executed in parallel with the data formatting and index building tasks.
In some implementations, topologically sorting the multiple tasks based on their dependencies and determining the target tasks that support parallel execution in step 306 may include: topologically sorting the multiple tasks based on their dependencies, and determining that different index building tasks for the same data source are target tasks that support parallel execution. For example, as shown in FIG. 4, the two index building tasks executed on the block parsing file are target tasks that support parallel execution: once data formatting has produced the block parsing file, the two index building tasks can be executed in parallel.
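The dependencies described for FIG. 4 can be written down as data and checked mechanically. In the sketch below the task names are hypothetical labels for the boxes described in the text, and `stage_of` is an assumed helper that assigns each task the earliest stage at which it could start (the length of its longest dependency chain); tasks in the same stage are candidates for parallel execution.

```python
from functools import lru_cache

# Hypothetical names for the tasks described for FIG. 4.
FIG4_DEPS = {
    "format_financial": (),
    "format_pdf": (),
    "fetch_external_index": (),
    "index_financial": ("format_financial",),
    "index_table": ("format_pdf",),
    "index_text": ("format_pdf",),
    "query_financial": ("index_financial", "index_table"),
    "query_question": ("index_text", "fetch_external_index"),
    "query_feature": ("index_table", "index_text", "fetch_external_index"),
    "postprocess": ("query_financial", "query_question", "query_feature"),
}


def stage_of(deps):
    """Map each task to the earliest stage in which it can start (0-based)."""

    @lru_cache(maxsize=None)
    def depth(task):
        # A task with no dependencies is in stage 0; otherwise it follows
        # its deepest dependency by one stage.
        return 1 + max((depth(d) for d in deps[task]), default=-1)

    return {task: depth(task) for task in deps}


stages = stage_of(FIG4_DEPS)
```

Under these assumptions the two index building tasks on the parsed file land in the same stage, as do the three query tasks.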
In the embodiments of this specification, tasks may be nested within one another. For example, any task in the text generation process may include multiple subtasks, and these subtasks may be executed in parallel in the same way as the tasks described above. Taking a target task as an example, the text generation device may topologically sort the subtasks of the target task based on their dependencies and determine the target subtasks that support parallel execution. The device may then execute the target subtasks in parallel and integrate their execution results to obtain the execution result of the target task.
For example, the target task may be a data query task, which may be a task determined from the configuration information in the text generation template. It may also be a query for data of another configured type, such as a directly configured task that queries data satisfying certain feature conditions. The target task may include a query rewriting subtask, a data recall subtask, an analysis subtask and a post-processing subtask, which are executed in that order.
Executing the target task on the target data in step 308 may include:
Step s2: For each piece of configuration information in the text generation template, execute the query rewriting subtask to obtain prompt information for the natural language model.
Each piece of configuration information in the text generation template may indicate content that the generated text must contain. The text generation device may map the configuration information to the prompt information (prompt) on which the natural language model bases its text generation, for example by executing the query rewriting subtask.
Step s4: Execute, on the target data, the recall subtask for data relevant to the prompt information.
For any prompt information, the text generation device may recall the data relevant to that prompt information from the material library, obtaining multiple pieces of material. The relevant data may be determined by how well it matches the prompt information. For the recall subtask, refer to the description of information recall given with FIG. 1, which is not repeated here. For example, when the target data includes multiple data sources, the recall subtask for data relevant to the prompt information may be executed in parallel across the different data sources.
Optionally, the text generation device may recall relevant data based on data indexes, recalling from different storage locations for different indexes. The recall subtasks for different data indexes may be executed in parallel.
Step s6: Use the natural language model to execute the analysis subtask in parallel on the multiple pieces of relevant data, obtaining the relevance of each piece of relevant data to the prompt information.
After recalling multiple pieces of data relevant to the prompt information, the text generation device may use the natural language model to analyze each piece and generate, from that piece, text corresponding to the prompt information. For this process, refer to the description of generating review text given with FIG. 1. The device may also assess the accuracy of the analysis result (such as the generated text) in order to determine the relevance of that piece of data to the prompt information. The relevance may serve as the score of the relevant data.
Step s8: Execute the post-processing subtask based on the pieces of relevant data and their relevance to the prompt information to obtain the result text corresponding to the prompt information, and determine the execution result of the data query task based on that result text.
The text generation device may execute the post-processing subtask, for example by integrating the analysis results of the pieces of relevant data, or by further processing or adjusting those analysis results, to obtain the result text corresponding to the prompt information.
FIG. 5 is a schematic diagram of the result of topologically sorting the subtasks of a target task according to an embodiment of this specification, using the question query task of FIG. 4 as the target task. As shown in FIG. 5, the text generation device rewrites the configuration information to obtain the corresponding prompt information. Recall subtasks may then be executed separately on the text data index and the external data index to recall data relevant to the prompt information from the text data and from the external data, for example file 1 and file 2 from the text data and file 3 and file 4 from the external data. For each file, the large language model (LLM) service may be used to execute the analysis subtask and obtain an analysis result, and each file is scored based on its analysis result. Post-processing may then be performed on the analysis results of the files to obtain the result text generated by the target task.
For the target task shown in FIG. 5, the recall subtasks based on the text data index and on the external data index can be executed in parallel, and the calls to the LLM service that analyze the individual recalled files can also be executed in parallel.
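The FIG. 5 flow (rewrite, parallel recall, parallel analysis, post-processing) can be sketched with `asyncio.gather`. Everything below is a stand-in: `rewrite`, `recall` and `analyze` are hypothetical stubs that use simple word overlap in place of a real prompt template, retrieval backend and LLM service, and the threshold filter is only one possible post-processing rule.

```python
import asyncio


async def rewrite(config_item):
    # Stub: a real system would turn the configuration item into an LLM prompt.
    return config_item.lower()


async def recall(prompt, index):
    # Stub retrieval: keep documents sharing at least one word with the prompt.
    words = set(prompt.split())
    return [doc for doc in index if words & set(doc.lower().split())]


async def analyze(prompt, document):
    # Stub for the LLM call: returns (analysis text, relevance score in [0, 1]).
    words = set(prompt.split())
    overlap = words & set(document.lower().split())
    return f"[{document}]", len(overlap) / len(words)


async def run_query_task(config_item, indexes, threshold=0.5):
    prompt = await rewrite(config_item)
    # Recall from every index concurrently.
    recalled = await asyncio.gather(*(recall(prompt, idx) for idx in indexes))
    documents = [doc for batch in recalled for doc in batch]
    # Analyze every recalled document concurrently.
    analyses = await asyncio.gather(*(analyze(prompt, doc) for doc in documents))
    # Post-processing: keep analyses whose score exceeds the threshold.
    return " ".join(text for text, score in analyses if score > threshold)
```

Because `asyncio.gather` returns results in the order of its arguments, the integrated text keeps a stable order even though the calls overlap in time.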
In the embodiments of this specification, the text generation device may provide a retry mechanism so that any task whose execution fails can be executed again. Step s8 may include:
executing the post-processing subtask based on the pieces of relevant data and their relevance to the prompt information;
when the relevance of the processing result of the post-processing subtask to the prompt information does not satisfy the relevance condition, re-executing the recall subtask, the analysis subtask and the post-processing subtask; and
when the relevance of the processing result of the post-processing subtask to the prompt information satisfies the relevance condition, determining the processing result to be the result text corresponding to the prompt information.
For the post-processing subtask, refer to the description of post-processing given with FIG. 1. As one example, while executing the post-processing subtask the text generation device may integrate the analysis results of the relevant data whose relevance exceeds a threshold to obtain the processing result of the post-processing subtask. As another example, the device may further adjust the analysis results of the relevant data whose relevance exceeds the threshold to obtain the processing result.
The text generation device may also determine whether the processing result of the post-processing subtask meets the text generation requirement associated with the prompt information, for example by checking whether the relevance of the processing result to the prompt information satisfies the relevance condition (such as exceeding a relevance threshold). If the condition is not satisfied, the recall subtask and the analysis subtask may be executed again based on the prompt information; if it is satisfied, the post-processing result is determined to be the result text corresponding to the prompt information, that is, the execution result of the target task. As shown in FIG. 5, when the post-processing result is invalid, a retry returns to the part of the flow that follows the rewriting step.
Optionally, a number of retries and a retry interval may be configured for the retry mechanism. The text generation device re-executes a task, such as the recall and analysis subtasks above, only when the number of retries for that task is below the configured limit and the time elapsed since the task was last executed exceeds the configured retry interval.
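A retry policy with a retry limit and a minimum interval could be sketched as below. The helper `run_with_retry` and its parameters are hypothetical; sleeping for `interval` seconds before each retry is one simple way to guarantee that at least the configured interval has passed since the previous attempt.

```python
import asyncio


async def run_with_retry(attempt, is_acceptable, max_retries=2, interval=0.0):
    """Run `attempt` and retry while its result is not acceptable.

    attempt: zero-argument async callable (e.g. recall + analysis + post-processing).
    is_acceptable: predicate on the result (e.g. relevance above a threshold).
    Returns (last result, whether it was acceptable).
    """
    result = None
    for retries_used in range(max_retries + 1):
        if retries_used:
            # Wait out the retry interval before trying again.
            await asyncio.sleep(interval)
        result = await attempt()
        if is_acceptable(result):
            return result, True
    return result, False
```

The first call is not counted as a retry, so `max_retries=2` allows up to three attempts in total.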
In the embodiments of this specification, the text generation device may invoke a DAG engine to implement steps 306 and 308. The DAG engine may define a Runnable interface, a base task (Task Base) class and a DAG class; the base task class and the DAG class correspond to executable entities.
The Runnable interface defines the basic structure of all executable entities. Serving as an abstract base class, it ensures that different kinds of executable entities follow a uniform pattern of behavior. The base task class inherits from the Runnable interface and may provide the basic implementation of task execution together with additional functions such as the retry mechanism. The DAG class is the core of the DAG engine. It extends the Runnable interface so that it can act both as a standalone unit of task execution and as a collection of tasks. The DAG class supports nesting, which allows one DAG to be added as a subtask to a larger DAG. This nesting provides considerable flexibility and organizing power for building complex, multi-level task execution flows.
For example, during text generation the text generation device may first define subclasses of the base task class and determine the specific task logic that each subclass instance implements. It may also create DAG instances and add the subtasks to be executed to them, nesting DAGs where needed. It may then call the asynchronous execution method of the DAG class, which topologically sorts the tasks and manages the execution order and concurrency of the subtasks in the DAG instance. DAG nesting makes it possible to build complex task structures and to optimize the organization and execution of the text generation flow.
In the embodiments of this specification, the text generation device may split the whole paradigm-writing flow into several base task instances and DAG instances. The base task instances may include data formatting task instances, index creation task instances, and post-processing task instances. The DAG instances may include the data query task instances.
In some implementations, corresponding attributes and methods may be defined for each of the above interfaces and classes, and an instance follows these attributes and methods when it is called to execute a task. For example, the attributes of the Runnable interface may include: an identifier (name, used to identify and track the instance), the execution result (output), the list of tasks that this task depends on (dependencies), the other tasks that depend on this task (dependents), the number of unresolved dependencies (dependency_count), and the current status (status, for example not started, running, completed, or failed).
The methods of the Runnable interface may include: initializing the basic attributes, for example with a function initialize(); adding dependency tasks, for example with a function add_dependencies(tasks); checking whether the task is ready, for example with a function is_ready(); and defining the asynchronous execution logic, for example with a function async_run().
The attributes of the base task class may include: the task name, the task input parameters, the task output result, the list of tasks it depends on, the list of other tasks that depend on it, the number of retries (max_retries) and the retry interval (retry_delay), and the current task status. The methods of the base task class may include: adding a group of dependency tasks (add_dependencies), checking whether the task is ready to execute (is_ready), executing the task asynchronously (including handling the retry logic and updating the status of dependent tasks), and declaring the specific task logic that subclasses must implement (execute).
The attributes of the DAG class may include: the maximum number of concurrently executing tasks (max_workers), the list of all tasks in the DAG, and an attribute indicating whether the tasks have already been topologically sorted (Sorted). The methods of the DAG class may include: adding a single task (add_task) or a list of tasks (add_tasks) to the DAG, and asynchronously executing all subtasks in the DAG.
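The interface and the two classes described above could be laid out roughly as follows. This is a hedged sketch only: the attribute and method names name, output, dependencies, dependents, dependency_count, status, add_dependencies, is_ready, async_run, execute, max_retries, retry_delay, max_workers, add_task and add_tasks come from the description, while the use of Python's `asyncio`, the status strings, the wave-by-wave scheduling (which takes the place of an explicit topological sort and a Sorted flag), and all default values are assumptions.

```python
import asyncio
from abc import ABC, abstractmethod


class Runnable(ABC):
    """Common shape of every executable entity (a task or a DAG)."""

    def __init__(self, name):
        self.name = name
        self.output = None
        self.dependencies = []     # entities this one waits for
        self.dependents = []       # entities waiting for this one
        self.dependency_count = 0  # unresolved dependencies
        self.status = "not_started"

    def add_dependencies(self, tasks):
        for task in tasks:
            self.dependencies.append(task)
            task.dependents.append(self)
            self.dependency_count += 1

    def is_ready(self):
        return self.status == "not_started" and self.dependency_count == 0

    def _release_dependents(self):
        for task in self.dependents:
            task.dependency_count -= 1

    @abstractmethod
    async def async_run(self):
        ...


class TaskBase(Runnable):
    def __init__(self, name, max_retries=2, retry_delay=0.0):
        super().__init__(name)
        self.max_retries = max_retries
        self.retry_delay = retry_delay

    async def execute(self):
        raise NotImplementedError  # subclasses supply the task logic

    async def async_run(self):
        self.status = "running"
        for attempt in range(self.max_retries + 1):
            try:
                self.output = await self.execute()
                break
            except Exception:
                if attempt == self.max_retries:
                    self.status = "failed"
                    raise
                await asyncio.sleep(self.retry_delay)
        self.status = "completed"
        self._release_dependents()
        return self.output


class DAG(Runnable):
    def __init__(self, name, max_workers=4):
        super().__init__(name)
        self.max_workers = max_workers
        self.tasks = []

    def add_task(self, task):
        self.tasks.append(task)

    def add_tasks(self, tasks):
        self.tasks.extend(tasks)

    async def async_run(self):
        self.status = "running"
        limit = asyncio.Semaphore(self.max_workers)

        async def guarded(task):
            async with limit:  # cap the number of concurrent subtasks
                return await task.async_run()

        pending = list(self.tasks)
        while pending:
            # Every entity whose dependencies are resolved runs in this wave.
            ready = [t for t in pending if t.is_ready()]
            if not ready:
                raise RuntimeError("cycle or unmet dependency in DAG")
            await asyncio.gather(*(guarded(t) for t in ready))
            pending = [t for t in pending if t not in ready]
        self.status = "completed"
        self.output = {t.name: t.output for t in self.tasks}
        self._release_dependents()
        return self.output
```

Because `DAG` is itself a `Runnable`, a DAG instance can be passed to `add_task` of another DAG, which is one way to realize the nesting described in the text.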
In the embodiments of this specification, a new text generation mode is adopted that optimizes the generation pipeline for paradigm writing, which can fill a technical gap and make the text generation process more efficient and controllable. Optionally, the text generation device may also use task scheduling algorithms or parallel processing frameworks other than the DAG engine to optimize the text generation flow.
In summary, in the text generation method provided by the embodiments of this specification, the target data to be analyzed and the text generation template that the generated text must follow can be obtained. While generating text from the target data according to the template, multiple tasks to be executed on the target data can be determined from the pieces of configuration information in the template. These tasks are topologically sorted based on their dependencies to determine the target tasks that support parallel execution, which are then executed in parallel. The execution results of the tasks are then integrated to generate text that conforms to the text generation template. In this way, the process of generating patterned text (such as paradigm writing) becomes more flexible, and executing the target tasks in parallel where appropriate improves text generation efficiency.
Corresponding to the above method embodiments, this specification also provides an embodiment of a text generation device, which may be a server. FIG. 6 is a schematic structural diagram of a text generation device provided by an embodiment of this specification. As shown in FIG. 6, the text generation device includes:
an acquisition module 602, configured to acquire target data to be analyzed and a text generation template that text generated from the target data must follow;
a first determination module 604, configured to determine, based on each piece of configuration information in the text generation template, multiple tasks to be executed on the target data;
a second determination module 606, configured to topologically sort the multiple tasks based on their dependencies and determine the target tasks among them that support parallel execution;
a text generation module 608, configured to execute the target tasks in parallel on the target data and integrate the execution results of the target tasks to generate text that conforms to the text generation template.
Optionally, the first determination module 604 is configured to:
determine, based on each piece of configuration information in the text generation template, at least one data query task corresponding to each piece of configuration information, and an integration task for the query results of the data query tasks;
determine, for each data source in the target data, a data formatting task and at least one index building task corresponding to that data source;
obtain the multiple tasks to be executed on the target data from the data query tasks, the integration task, the data formatting tasks, and the index building tasks.
Optionally, the data formatting task, index building task, data query task, and integration task for the same data source depend on one another in that order; the second determination module 606 is configured to:
topologically sort the multiple tasks based on their dependencies, and determine that the data formatting tasks, index building tasks, and data query tasks for different data sources are target tasks that support parallel execution.
Optionally, the second determination module 606 is configured to:
topologically sort the multiple tasks based on their dependencies, and determine that different index building tasks for the same data source are target tasks that support parallel execution.
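One way to picture how a topological sort exposes the parallelizable target tasks is to group tasks into levels with Kahn's algorithm, so that all tasks in a level have no unresolved dependencies on each other. The sketch below is illustrative only; the function `parallel_levels` and the task names in the sample graph are hypothetical and do not appear in the specification.

```python
from collections import defaultdict


def parallel_levels(dependencies):
    """Group tasks into levels with Kahn's algorithm.

    `dependencies` maps each task name to the names it depends on; every
    dependency must itself be a key. Tasks in one level may run in parallel.
    """
    indegree = {task: len(deps) for task, deps in dependencies.items()}
    dependents = defaultdict(list)
    for task, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(task)

    levels = []
    ready = sorted(task for task, count in indegree.items() if count == 0)
    while ready:
        levels.append(ready)
        next_ready = []
        for task in ready:
            for child in dependents[task]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)
    if sum(len(level) for level in levels) != len(dependencies):
        raise ValueError("dependency cycle or unknown dependency detected")
    return levels


# Hypothetical graph: two data sources; source A has two index building tasks.
deps = {
    "format_A": [],
    "format_B": [],
    "index_A_keyword": ["format_A"],
    "index_A_vector": ["format_A"],
    "index_B": ["format_B"],
    "query_A": ["index_A_keyword", "index_A_vector"],
    "query_B": ["index_B"],
    "integrate": ["query_A", "query_B"],
}
levels = parallel_levels(deps)
```

In this sample graph, the formatting tasks of the two data sources share a level, and so do the two index building tasks of source A, which mirrors the two optional cases described above.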
Optionally, the target tasks include a data query task, and the text generation module 608 is configured to:
execute, for each piece of configuration information in the text generation template, a query rewriting subtask to obtain prompt information for a natural language model;
execute, on the target data, a recall subtask that retrieves data related to the prompt information;
use the natural language model to execute analysis subtasks in parallel on the multiple pieces of related data, obtaining the relevance of each piece of related data to the prompt information;
execute a post-processing subtask based on the pieces of related data and their relevance to the prompt information to obtain the result text corresponding to the prompt information, and determine the execution result of the data query task based on the result text.
Optionally, the text generation module 608 is configured to:
when the target data includes multiple data sources, execute the recall subtask for data related to the prompt information in parallel across the different data sources.
Optionally, the text generation module 608 is configured to:
execute the post-processing subtask based on the pieces of related data and their relevance to the prompt information;
when none of the processing results of the post-processing subtask has a relevance to the prompt information that meets the relevance condition, re-execute the recall subtask, the analysis subtask, and the post-processing subtask;
when the relevance between a processing result of the post-processing subtask and the prompt information meets the relevance condition, determine that processing result as the result text corresponding to the prompt information.
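The recall, parallel analysis, and post-processing subtasks, together with re-execution when the relevance condition is not met, can be sketched as a loop. Everything below is an assumption made for illustration: the function `run_query`, its `threshold` and `max_rounds` parameters, and the stub subtasks in `demo` (which use toy strings and a keyword check in place of a natural language model) are not taken from the specification.

```python
import asyncio


async def run_query(prompt, recall, analyze, postprocess,
                    threshold=0.5, max_rounds=3):
    """Recall -> parallel analysis -> post-processing, repeated until relevant."""
    for _ in range(max_rounds):
        candidates = await recall(prompt)
        # Score every recalled item against the prompt concurrently.
        scores = await asyncio.gather(
            *(analyze(prompt, item) for item in candidates)
        )
        text, relevance = await postprocess(prompt, list(zip(candidates, scores)))
        if relevance >= threshold:  # relevance condition met
            return text
    return None  # no round produced a sufficiently relevant result


async def demo():
    calls = {"recall": 0}

    async def recall(prompt):
        calls["recall"] += 1
        # The first round returns only an unrelated item to force a re-run.
        if calls["recall"] == 1:
            return ["note about weather"]
        return ["note about revenue", "note about weather"]

    async def analyze(prompt, item):
        return 1.0 if "revenue" in item else 0.0  # stand-in for a model call

    async def postprocess(prompt, scored):
        best_item, best_score = max(scored, key=lambda pair: pair[1])
        return best_item, best_score

    text = await run_query("summarize revenue", recall, analyze, postprocess)
    return text, calls["recall"]
```

Bounding the loop with `max_rounds` is a design choice of this sketch that pairs naturally with a retry count threshold; an unbounded loop would never terminate if no recalled data is relevant.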
Optionally, the text generation module 608 is configured to:
control, for the target data and based on a set number of concurrent tasks, the parallel execution of a number of target tasks that is less than or equal to the set number of concurrent tasks.
Optionally, the text generation module 608 is configured to:
use, for the target data, a coroutine pool to call multiple coroutines that execute the target tasks in parallel.
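A coroutine pool with a cap on concurrent tasks can be approximated with a fixed number of worker coroutines that pull jobs from a shared queue. This is a minimal sketch under the assumption that Python's `asyncio` is used; the function name `run_with_pool` and its parameters are hypothetical.

```python
import asyncio


async def run_with_pool(jobs, max_concurrency):
    """Run zero-argument coroutine functions, at most `max_concurrency` at once.

    Results are returned in the same order as `jobs`.
    """
    queue = asyncio.Queue()
    for index, job in enumerate(jobs):
        queue.put_nowait((index, job))
    results = [None] * len(jobs)

    async def worker():
        while True:
            try:
                index, job = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # no work left for this worker
            results[index] = await job()

    # The pool size bounds how many target tasks are in flight together.
    workers = [asyncio.create_task(worker()) for _ in range(max_concurrency)]
    await asyncio.gather(*workers)
    return results
```

Since each worker handles one job at a time, the number of jobs running at any moment never exceeds `max_concurrency`, which corresponds to the set number of concurrent tasks.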
In summary, in the text generation device provided by the embodiments of this specification, the acquisition module can acquire the target data to be analyzed and the text generation template that the generated text must follow. While text is generated from the target data according to the template, the first determination module can determine, from the pieces of configuration information in the template, multiple tasks to be executed on the target data. The second determination module topologically sorts these tasks based on their dependencies to determine the target tasks that support parallel execution, which are then executed in parallel. The text generation module then integrates the execution results of the tasks to generate text that conforms to the text generation template. In this way, the process of generating patterned text (such as paradigm writing) becomes more flexible, and executing the target tasks in parallel where appropriate improves text generation efficiency.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the text generation device embodiment is substantially similar to the text generation method embodiment, it is described relatively briefly; for related details, refer to the corresponding description of the text generation method embodiment.
FIG. 7 is a structural block diagram of a computing device provided by an embodiment of this specification. The components of the computing device 700 include, but are not limited to, a memory 710 and a processor 720. The processor 720 is connected to the memory 710 through a bus 730, and a database 750 is used to store data.
The computing device 700 also includes an access device 740, which enables the computing device 700 to communicate via one or more networks 760. Examples of these networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 740 may include one or more of any type of wired or wireless network interface (for example, a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, or a near field communication (NFC) interface.
In one embodiment of this specification, the above components of the computing device 700 and other components not shown in FIG. 7 may also be connected to one another, for example through a bus. It should be understood that the structural block diagram of the computing device shown in FIG. 7 is provided only as an example and does not limit the scope of this specification. Those skilled in the art may add or replace components as needed.
The computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (for example, a tablet computer, personal digital assistant, laptop computer, notebook computer, or netbook), a mobile phone (for example, a smartphone), a wearable computing device (for example, a smart watch or smart glasses) or another type of mobile device, or a stationary computing device such as a desktop computer or personal computer (PC). The computing device 700 may also be a mobile or stationary server.
The processor 720 is configured to execute a computer program/instructions which, when executed by the processor, implement the steps of the text generation method described above.
Because the computing device embodiment is substantially similar to the text generation method embodiment, it is described relatively briefly; for related details, refer to the corresponding description of the text generation method embodiment.
An embodiment of this specification also provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the text generation method described above. The computer instructions include computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content covered by the computer-readable storage medium may be expanded or reduced as appropriate according to the requirements of legislation and patent practice in a jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable storage media do not include electrical carrier signals and telecommunication signals.
An embodiment of this specification also provides a computer program product, including a computer program/instructions which, when executed by a processor, implement the steps of the text generation method described above.
Because the computer-readable storage medium embodiment and the computer program product embodiment are substantially similar to the text generation method embodiment, they are described relatively briefly; for related details, refer to the corresponding description of the text generation method embodiment.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
It should also be noted that those skilled in the art should understand that the embodiments described in this specification are all preferred embodiments, and that the actions and modules involved are not necessarily all required by the embodiments of this specification.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
The preferred embodiments of this specification disclosed above are intended only to help explain this specification. The optional embodiments do not exhaustively describe all details, nor do they limit the invention to the specific implementations described. Obviously, many modifications and variations can be made based on the content of the embodiments of this specification. These embodiments were selected and described in detail in order to better explain the principles and practical applications of the embodiments of this specification, so that those skilled in the art can understand and make good use of this specification.
Claims (14)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410625183.XA CN118586375A (en) | 2024-05-20 | 2024-05-20 | Text generation method, device and system, computing device, and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410625183.XA CN118586375A (en) | 2024-05-20 | 2024-05-20 | Text generation method, device and system, computing device, and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118586375A true CN118586375A (en) | 2024-09-03 |
Family
ID=92523440
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410625183.XA Pending CN118586375A (en) | 2024-05-20 | 2024-05-20 | Text generation method, device and system, computing device, and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118586375A (en) |
-
2024
- 2024-05-20 CN CN202410625183.XA patent/CN118586375A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3244312B1 (en) | A personal digital assistant | |
CN116775183A (en) | Task generation method, system, equipment and storage medium based on large language model | |
CN118093801A (en) | Information interaction method, device and electronic device based on large language model | |
CN117725191B (en) | Guide information generation method and device of large language model and electronic equipment | |
CN117520514A (en) | Question-answering task processing method, device, equipment and readable storage medium | |
CN117494814A (en) | Prompt word full life cycle management method, system, electronic equipment and storage medium | |
WO2019217214A1 (en) | Personal history recall | |
EP4165541A1 (en) | Systems and methods for identification of repetitive language in document using linguistic analysis and correction thereof | |
CN118503396B (en) | ERP system large model calling method, device and medium based on open prompt words | |
CN112000813B (en) | Knowledge base construction method and device | |
CN118585619A (en) | A question-answering method, device, equipment and medium | |
CN118569390A (en) | Scenario-based AIGC content generation method, system and storage medium | |
CN118861560A (en) | A medical task execution method and computer device | |
CN111931034A (en) | Data searching method, device, equipment and storage medium | |
CN117194620A (en) | Information processing method, device, equipment and storage medium | |
CN117875274A (en) | Method for generating article content and electronic equipment | |
CN118586375A (en) | Text generation method, device and system, computing device, and readable storage medium | |
CN113934450B (en) | Method, apparatus, computer device and medium for generating annotation information | |
CN113778541A (en) | Applet generation method and device, electronic equipment and storage medium | |
CN118349661B (en) | Data processing method, device and equipment based on large language model | |
CN118070925B (en) | Model training method, device, electronic equipment, storage medium and program product | |
CN119377363A (en) | Dialogue data processing method, system and computer device | |
CN119415661A (en) | User problem information response method, device, equipment, medium and program product | |
CN118916068A (en) | Code repairing method, electronic device and computer readable storage medium | |
CN118551057A (en) | Task execution method, device, equipment, medium and program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||