
CN116932733B - Information recommendation method and related device based on large language model - Google Patents


Info

Publication number: CN116932733B
Authority: CN (China)
Application number: CN202310370259.4A
Other languages: Chinese (zh)
Other versions: CN116932733A
Prior art keywords: information, user, natural language, target, language model
Inventors: 黄际洲, 王少磊, 孙一博
Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Legal status: Active (granted)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure provides an information recommendation method and a related device based on a generative large language model, and relates to the technical fields of generative models, large language models, information recommendation, and planning-based controlled generation. The method comprises the following steps: acquiring a user's natural language input and the user's personalized information; feeding the natural language input and the personalized information as input data into a target generative large language model trained with planning-based controlled generation as its optimization objective, where planning-based controlled generation means constraining the generation of output data with a preset planner matched to the input data, the preset planner recording a processing-procedure plan for fulfilling the information acquisition requirement expressed by the input data; and returning the natural language output of the target generative large language model to the user as recommendation information. With this method, the model's replies better match the user's preferences and are more interpretable.

Description

Information recommendation method and related device based on large language model
Technical Field
The disclosure relates to the field of data processing, in particular to artificial intelligence technologies such as generative models, large language models, information recommendation, and planning-based controlled generation, and specifically to an information recommendation method based on a generative large language model, together with a matching apparatus, electronic device, computer-readable storage medium, and computer program product.
Background
Large language models (LLM, Large Language Model) are essentially generative models, and are therefore also called generative large language models. They can generate human-like, fluent responses for many downstream tasks, such as task-oriented dialogue and question answering.
When a user issues a request to acquire certain information, a generative large language model can understand the meaning of the request and reply, but the reply often fails to incorporate the user's personalized information (such as preferences), and the returned result carries no explanatory reason, so the user cannot tell how the model actually analyzed the request and may question the accuracy of the result. That is, the answers that current generative large language models give to user questions lack a structure or hierarchy that reflects a chain of thought (Chain of Thought, CoT) or chain of analysis.
Disclosure of Invention
Embodiments of the disclosure provide an information recommendation method and apparatus based on a generative large language model, an electronic device, a computer-readable storage medium, and a computer program product.
In a first aspect, an embodiment of the present disclosure provides an information recommendation method based on a generative large language model, including: acquiring a user's natural language input and the user's personalized information; feeding the natural language input and the personalized information as input data into a target generative large language model trained with planning-based controlled generation as its optimization objective, where planning-based controlled generation means constraining the generation of output data with a preset planner matched to the input data, the preset planner recording a processing-procedure plan for fulfilling the information acquisition requirement expressed by the input data; and returning the natural language output of the target generative large language model to the user as recommendation information.
In a second aspect, an embodiment of the present disclosure provides an information recommendation apparatus based on a generative large language model, including: an input data acquisition unit configured to acquire a user's natural language input and the user's personalized information; a model calling unit configured to feed the natural language input and the personalized information as input data into a target generative large language model trained with planning-based controlled generation as its optimization objective, where planning-based controlled generation means constraining the generation of output data with a preset planner matched to the input data, the preset planner recording a processing-procedure plan for fulfilling the information acquisition requirement expressed by the input data; and a recommendation information return unit configured to return the natural language output of the target generative large language model to the user as recommendation information.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to implement the information recommendation method based on the generative large language model as described in the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a non-transitory computer-readable storage medium storing computer instructions for enabling a computer to implement the method for information recommendation based on a generative large language model as described in the first aspect.
In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, is capable of implementing the steps of the information recommendation method based on a generative large language model as described in the first aspect.
According to the information recommendation method based on a generative large language model provided by embodiments of the present disclosure, the natural language input describing the user's information acquisition requirement and the user's personalized information serve as input data and are fed into a target generative large language model trained with planning-based controlled generation (Controlled Generation via Planning) as its optimization objective. Because planning-based controlled generation constrains the generation of output data with a preset planner matched to the input data, and the preset planner records the processing-procedure plan for fulfilling the corresponding information acquisition requirement, the natural language output produced by the generative large language model can embody the thinking, analysis, and processing procedure behind that requirement, and candidate results are screened against the personalized information, so that the final natural language output better meets the user's personalized needs and has higher interpretability and credibility.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings:
FIG. 1 is an exemplary system architecture in which the present disclosure may be applied;
FIG. 2 is a flowchart of an information recommendation method based on a generative large language model according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of a method for constructing the training samples used to train the target generative large language model, provided by an embodiment of the present disclosure;
FIG. 4 is a flowchart of a method for returning natural language output to a user as recommendation information according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an information recommendation device based on a large language model generated according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suited to performing the information recommendation method based on a generative large language model according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness. It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information all comply with relevant laws and regulations, and do not violate public order and good customs.
FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the information recommendation methods, apparatus, electronic devices, and computer-readable storage media based on a generative large language model of the present disclosure may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various applications for implementing information communication between the terminal devices 101, 102, 103 and the server 105, such as an information recommendation application, a voice interaction application, an instant messaging application, and the like, may be installed on the terminal devices.
The terminal devices 101, 102, 103 and the server 105 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with display screens, including but not limited to smartphones, tablets, laptop and desktop computers, etc.; when the terminal devices 101, 102, 103 are software, they may be installed in the above-listed electronic devices, which may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module, which is not particularly limited herein. When the server 105 is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or may be implemented as a single server; when the server is software, the server may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module, which is not particularly limited herein.
The server 105 can provide various services through various built-in applications. Taking an information recommendation application that provides an information recommendation service as an example, the server 105 can achieve the following effects when running it: first, receiving, through the network 104, the natural language input passed in by the user and the user's personalized information; then, taking the natural language input and the personalized information as input data and feeding them into a target generative large language model trained with planning-based controlled generation as its optimization objective, where planning-based controlled generation means constraining the generation of output data with a preset planner matched to the input data, the preset planner recording a processing-procedure plan for fulfilling the information acquisition requirement expressed by the input data; finally, returning the natural language output of the target generative large language model as recommendation information, again through the network 104, to the terminal devices 101, 102, and 103, for presentation to the users in front of them.
It should be noted that the natural language input describing the information acquisition requirement and the personalized information may, in addition to being acquired in real time from the terminal devices 101, 102, 103 through the network 104, also be stored in advance in the server 105 in various ways (particularly the user's personalized information). Thus, when the server 105 detects that such data is already stored locally, it may choose to obtain the data directly from local storage, in which case the exemplary system architecture 100 may omit the terminal devices 101, 102, 103 and the network 104.
Because information recommendation combined with a user's personalized information requires considerable computing resources and computing power, the information recommendation method based on a generative large language model provided in subsequent embodiments of the present disclosure is generally executed by the server 105, which has stronger computing power and more computing resources; accordingly, the information recommendation apparatus based on the generative large language model is also generally deployed in the server 105. However, when the terminal devices 101, 102, 103 also possess the required computing power and resources, they may complete, through the information recommendation application installed on them, the operations otherwise performed by the server 105, and output the same result. In particular, when multiple terminal devices with different computing capabilities exist at the same time and the information recommendation application determines that its host terminal has strong computing power and ample idle resources, the terminal may be allowed to perform the computation, appropriately relieving the computing pressure on the server 105; accordingly, the information recommendation apparatus based on the generative large language model may also be provided in the terminal devices 101, 102, 103. In this case, the exemplary system architecture 100 may omit the server 105 and the network 104.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring to fig. 2, fig. 2 is a flowchart of an information recommendation method based on a generated large language model according to an embodiment of the disclosure, wherein the flowchart 200 includes the following steps:
Step 201: acquiring natural language input of a user and personalized information of the user;
This step has the execution subject of the information recommendation method based on a generative large language model (e.g., server 105 shown in fig. 1) acquire a user's natural language input and the user's personalized information passed in from a user terminal (e.g., terminal devices 101, 102, 103 shown in fig. 1).
Natural language input means that the user describes an information acquisition requirement in natural language, represented in a suitable modality (e.g., natural language text, or natural language speech). To facilitate subsequent processing, non-text natural language input can be converted into easily processed natural language text by corresponding conversion technologies. The user's personalized information refers to the user's preferences, habits, and characteristics, such as liking to visit places with a low density of tourists, having two children, or keeping pets. It can be determined by analyzing the user's historical selection preferences for, or feedback on, various types of information, extracted from historical query sessions, or filled in or uploaded by the user.
Specifically, the user's personalized information can also be determined through user interest modeling and user portrait analysis, and a recall system can be built on top of it:
User interest modeling builds an interest model of the user by analyzing the user's historical behavior, search records, purchase records, ratings, and similar information; the interest model then serves as an input to the recall system, improving its personalized recall capability.
User portrait analysis builds a user portrait by analyzing the user's personal information, including age, gender, occupation, hobbies, and so on. This information can be passed to the LLM generation model (via an API call) to help the LLM better understand the user's needs and thus generate more accurate recommendations.
The personalized recall system combines the user interest model and the portrait information into a per-user ("a thousand faces for a thousand users") personalized recall system. Its inputs include the output of the user interest model, item information, and session information (the user's recent search keywords, browsing history, shopping cart contents, etc.). Its output includes two parts: 1) a list of the TopK recommended results; 2) the recommendation reason for each result.
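As a concrete illustration, the recall interface just described can be sketched as follows (a minimal sketch; the classes, fields, and scoring logic are illustrative assumptions, not taken from the patent):

from dataclasses import dataclass, field

@dataclass
class RecallInput:
    interest_model_output: dict                       # output of the user interest model
    item_info: list                                   # candidate item information
    session_info: dict = field(default_factory=dict)  # recent searches, browsing history, cart

@dataclass
class RecallOutput:
    items: list    # 1) the list of the TopK recommended results
    reasons: list  # 2) the recommendation reason for each result

def personalized_recall(inp: RecallInput, k: int = 10) -> RecallOutput:
    # Score each candidate against the user's interest tags; a real system would use
    # a learned ranking model here rather than simple tag overlap.
    scored = []
    for item in inp.item_info:
        overlap = sorted(set(item.get("tags", [])) & set(inp.interest_model_output.get("tags", [])))
        scored.append((len(overlap), item, overlap))
    scored.sort(key=lambda t: t[0], reverse=True)
    top = scored[:k]
    return RecallOutput(
        items=[item for _, item, _ in top],
        reasons=["matches your interests: " + (", ".join(tags) if tags else "general popularity")
                 for _, _, tags in top],
    )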
Step 202: feeding the natural language input and the personalized information as input data into a target generative large language model trained with planning-based controlled generation as its optimization objective;
Building on step 201, this step has the execution subject combine the user's natural language input and personalized information into input data and feed it into the target generative large language model obtained after training with an optimization objective built on the planning-based controlled generation concept.
The planning-based controlled generation concept adopted in the present disclosure as the model's optimization objective means that, for each piece of input data, a preset planner matched to that input data constrains the generation of the output data. Specifically, the preset planner records a processing-procedure plan for fulfilling the information acquisition requirement expressed by the input data; that is, the requirement is understood, analyzed, and processed step by step through the plan recorded in the preset planner, so that every part of the finally generated output data is reasonably ordered and appropriately and accurately expressed. In short, the processing-procedure plan recorded in the preset planner amounts to a set of constraints on the output data, whose objects can be simply understood as the ordering, revision, and phrasing of each part of the reply.
Specifically, the preset planner may include a plurality of processing steps for handling the information acquisition requirement expressed by the input data, each obtained by understanding and planning that requirement step by step. Since the input data includes the user's personalized information, the steps should include at least a personalized screening step, which screens candidate results matching the personalized information, and a reason generation step, which generates, for each screened result, a recommendation reason matching the personalized information. Screening candidates against the personalized information improves user satisfaction, while the reason generation step tells the user how the provided candidates match their needs, improving the interpretability and credibility of the results.
Further, in addition to the personalized screening step and the reason generation step above, the planner may include at least one of the following processing steps:
1) A requirement type analyzing step for analyzing the type of the information acquisition requirement corresponding to the natural language input;
For example, for the natural language input "My child just finished the college entrance exam; I want to take him out for a trip. Can you recommend a few places?", this step can determine that the requirement belongs to the point-of-interest (Point of Interest, POI) recommendation type; for the natural language input "Recommend some songs suitable for listening to on a road trip", this step can determine that the requirement belongs to the song recommendation type; for the natural language input "My skin is rather dry; are there any good skin care products to recommend?", this step can determine that the requirement belongs to the product recommendation type; and so on.
This step analyzes and determines the type of the information acquisition requirement expressed by the natural language input, so that, with the question type determined, a suitable analysis and processing procedure can be adopted to obtain suitable recommendation information.
2) A result query step for querying all candidate results according to the analyzed requirement type;
Building on the requirement type analysis step in 1), this step performs the corresponding query operation according to the analyzed requirement type, initially obtaining all candidate results.
3) A requirement restatement step for re-summarizing the information acquisition requirement expressed by the natural language input;
In this step, the information acquisition requirement expressed by the natural language input is summarized anew, so that the user can confirm from the restated requirement that the model has truly understood what was expressed.
For example, for the natural language input "My child just finished the college entrance exam; I want to take him out for a trip. Can you recommend a few places?", the restated requirement obtained through this step might read: "Your child just took the college entrance exam, which is a moment worth celebrating. During this time, you could consider taking him to attractions that relax the body and mind and also broaden his horizons." This restatement clearly shows that the model has captured the characteristic requirements behind the user's search for suitable places to visit, namely "taking the child out for a trip" and "the child just finished the exam", and therefore recommends attractions that relax the body and mind and broaden the child's horizons.
This step can clearly raise the user's trust in the reply content the model outputs afterwards.
4) A fusion step for merging associated but distinct pieces of information obtained in the preceding steps;
This step usually exists as an intermediate step and merges associated but distinct pieces of information obtained earlier, improving the cohesion of related information.
5) A deduplication step for removing duplicate content.
This step is generally a post-processing step: by removing repeated parts of the previously generated content, it improves the readability and conciseness of the final content.
Depending on the actual situation, the preset planner may include one or more of the above five processing steps, and beyond those mentioned it may also include other processing steps that achieve similar purposes or effects, which are not enumerated one by one here.
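To make the idea concrete, a preset planner can be represented as an ordered list of named processing steps executed over a shared state; the sketch below is illustrative only (the Planner class and step functions are assumptions, not the patent's actual implementation):

from typing import Callable, Dict, List

class Planner:
    """A preset planner: an ordered processing-procedure plan that constrains generation."""
    def __init__(self, steps: List[Callable[[Dict], Dict]]):
        self.steps = steps

    def run(self, state: Dict) -> Dict:
        # Execute every planned step in order; each step reads and extends the shared
        # state, which constrains the ordering and content of the final reply.
        for step in self.steps:
            state = step(state)
        return state

# Hypothetical step functions corresponding to the processing steps described above.
def analyze_requirement_type(state): state["req_type"] = "poi_recommendation"; return state
def query_candidates(state): state["candidates"] = ["Universal Beijing Resort", "Shanghai Disneyland"]; return state
def restate_requirement(state): state["restatement"] = "You want relaxing places to visit with your child."; return state
def personalized_screening(state): state["screened"] = state["candidates"]; return state
def generate_reasons(state): state["reasons"] = ["suitable for children"] * len(state["screened"]); return state
def deduplicate(state): state["screened"] = list(dict.fromkeys(state["screened"])); return state

planner = Planner([analyze_requirement_type, query_candidates, restate_requirement,
                   personalized_screening, generate_reasons, deduplicate])
result = planner.run({"query": "recommend a few places for a trip with my child"})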
Step 203: and returning the natural language output of the target generation type large language model to the user as recommendation information.
Building on step 202, this step has the execution subject return the natural language output of the target generative large language model to the user as recommendation information, so that the user can plan and arrange subsequent activities accordingly.
According to the information recommendation method based on a generative large language model provided by this embodiment, the natural language input describing the user's information acquisition requirement and the user's personalized information serve as input data and are fed into the target generative large language model trained with planning-based controlled generation as its optimization objective. Because planning-based controlled generation constrains the generation of output data with a preset planner matched to the input data, and the preset planner records the processing-procedure plan for fulfilling the corresponding information acquisition requirement, the natural language output produced by the generative large language model can embody the thinking, analysis, and processing procedure behind that requirement, and candidate results are screened against the personalized information, so that the final natural language output better meets the user's personalized needs and has higher interpretability and credibility.
Referring to fig. 3, fig. 3 is a flowchart of a method for constructing the training samples used to train the target generative large language model, provided by an embodiment of the present disclosure, where the flow 300 includes the following steps:
Step 301: clustering the natural language inputs of different users by text and intent similarity to obtain a plurality of clustering results;
This step has the execution subject cluster the natural language inputs of different users by text and intent similarity, so that different natural language inputs with the same intent are grouped under the same cluster center; each cluster center corresponds to one clustering result, so a plurality of clustering results are typically obtained.
Step 302: extracting a first number of target natural language inputs from each clustering result, and obtaining the preset planners labeled for those target natural language inputs;
Building on step 301, this step has the execution subject extract a small number of target natural language inputs from each clustering result and obtain the preset planner that annotators label for each target natural language input.
Only a small number of target natural language inputs are extracted from each clustering result, in consideration of the difficulty and time of labeling the matching preset planners, so as to save labeling time as much as possible. To make this small labeled set as useful as possible, natural language inputs that are clearly expressed and strongly representative should be preferred when extracting the targets.
Step 303: for each clustering result, using the real sample pairs formed by the target natural language inputs under that clustering result and their corresponding preset planners as few-shot prompts, and generating a second number of incremental sample pairs through a generative large language model with code generation capability;
Building on step 302, this step has the execution subject use, for each clustering result, the real sample pairs (i.e., "target natural language input plus preset planner") as few-shot prompts and generate a second number of incremental sample pairs through a generative large language model with code generation capability.
The following is an example of such few-shot prompts:
"I want you to act as a software developer. I will provide some APIs (Application Programming Interface) with specific functions and details; your job is to generate the final user reply by writing a program (an instruction template, one representation of a preset planner) that calls the provided APIs. The APIs specifically include: parse_query (with an introduction to the API's function), YY, ZZ, etc. I provide several query-to-program examples: example 1: XXX; example 2: YYY; example 3: ZZZ. Next, given the query 'please recommend a horror film', please generate multiple programs."
This example sets the generative large language model to act as a software developer and, on the basis of a number of given APIs, asks the model to learn the query-to-program correspondence given in the examples and automatically generate multiple new programs for a given query, so that the query and the new programs form incremental sample pairs.
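A minimal sketch of assembling such a few-shot prompt from real sample pairs (the wording paraphrases the example above; the helper itself is hypothetical):

def build_few_shot_prompt(api_descriptions, sample_pairs, new_query, n_programs=3):
    # sample_pairs: list of (query, program) real sample pairs from one clustering result
    examples = "\n".join(
        f"Example {i + 1}: query: {q}\nprogram: {program}"
        for i, (q, program) in enumerate(sample_pairs)
    )
    return (
        "I want you to act as a software developer. Generate the final user reply "
        "by writing a program that calls the provided APIs.\n"
        f"APIs: {api_descriptions}\n"
        f"{examples}\n"
        f'Next, given the query "{new_query}", please generate {n_programs} programs.'
    )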
Step 304: constructing training samples based on the real sample pairs and the incremental sample pairs.
Building on steps 302 and 303, this step has the execution subject construct training samples based on the real sample pairs and the incremental sample pairs: the real sample pairs guarantee the quality of the training samples, and the incremental sample pairs guarantee their quantity, so that together they yield training samples of suitable quality and quantity.
This embodiment provides, through steps 301 to 304, a concrete way of constructing training samples: a large number of users' natural language inputs are clustered by text and intent similarity into multiple clustering results; from each clustering result a small number of target natural language inputs are selected and labeled with preset planners; and a generative large language model with code generation capability, prompted with the small set of real sample pairs in few-shot fashion, generates a large number of incremental sample pairs. This ensures that the training samples have sufficient quality and quantity, and hence that the model trained on them has the corresponding capability. A sketch of this pipeline follows.
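A minimal sketch of the whole sample-construction pipeline, using TF-IDF plus k-means as a stand-in for the patent's text-and-intent clustering and placeholder callables for human labeling and the code-capable LLM (all names are illustrative assumptions):

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def build_training_samples(queries, label_planner, llm_generate, per_cluster=3, n_clusters=20):
    # 1) Cluster user queries; TF-IDF similarity stands in for text/intent similarity.
    vectors = TfidfVectorizer().fit_transform(queries)
    labels = KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(vectors)

    real_pairs, incremental_pairs = [], []
    for c in range(n_clusters):
        members = [q for q, l in zip(queries, labels) if l == c]
        # 2) Label a small "first number" of representative queries with preset planners.
        pairs = [(q, label_planner(q)) for q in members[:per_cluster]]
        real_pairs.extend(pairs)
        # 3) Use the real pairs as few-shot prompts to a code-capable LLM to generate
        #    a "second number" of incremental (query, program) pairs.
        incremental_pairs.extend(llm_generate(few_shot=pairs, n_new=10))

    # 4) Real pairs guarantee quality; incremental pairs guarantee quantity.
    return real_pairs + incremental_pairs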
To deepen the understanding of how the preset planner embodies the planning-based controlled generation concept, two specific implementations are further illustrated here:
Implementation one:
An instruction template (i.e., a program) in the form of coded instructions serves as the preset planner. The instruction template comprises a plurality of processing instructions for handling the information acquisition requirement expressed by the input data; each processing instruction is obtained by understanding and planning the corresponding requirement step by step, and the processing instructions include execution instructions for the corresponding application programming interfaces (APIs).
Example 1: for the input data "My child just finished the college entrance exam; I want to take him out for a trip. Can you recommend a few places? user_id=12345" (the user ID here stands in for the user's personalized information), the corresponding program can be expressed as:

def recommend_poi(user_query, user_personal_info):
    # Step 1: call the query-parsing API
    query_attributes = query_parser_api.parse_query(user_query)
    # Step 2: call the API that performs the concrete query task
    location_recommendations, recommendation_reasons = \
        query_executor_location_api.execute_query(query_attributes, user_personal_info)
    # Step 3: merge the results of step 2 and remove duplicate items
    merged_location_recommendations, merged_reasons = \
        query_executor_location_api.merge_results(location_recommendations, recommendation_reasons)
    # Step 4: call the query-interpretation API to generate an interpretation of the
    # query to present to the user
    query_interpretation = \
        query_interpreter_api.generate_interpretation(user_query, user_personal_info)
    # Step 5: call the large-language-model-based response-generation API with the merged
    # results and the query; rank the concrete recommended places and polish the
    # recommendation reasons according to the user's personalized information
    response = response_generator_api.generate_response(
        merged_location_recommendations, merged_reasons, user_personal_info, user_query)
    # Step 6: call the summarization API to generate a summary of the query to present to the user
    query_summary = summary_api.generate_summary(user_query, merged_location_recommendations)
    # Step 7: merge the stepwise-generated content and display it to the user
    return chat_response_generate(query_interpretation, response, query_summary)
After the program is executed, a reply like the following can be obtained:
"Your child just took the college entrance exam, which is a moment worth celebrating. During this time, you could consider taking him to attractions that relax the body and mind and also broaden his horizons. The following are some attractions recommended according to your situation:
Universal Beijing Resort: a theme park integrating films, animation, and special effects, with attractions on seven different themes such as Transformers, Kung Fu Panda, and Harry Potter; it suits thrill rides experienced together with children.
Shanghai Disneyland: with six themed lands such as Mickey Avenue, Fantasyland, and Adventure Isle, it suits the whole family, and its many shows and exhibitions let children broaden their horizons and relax while they play.
Xi'an Datang Everbright City: an antique-style architectural complex themed on Tang culture, located in the Yanta district of Xi'an, Shaanxi Province; you can take your child shopping along the surrounding pedestrian streets and tasting local delicacies while soaking in the rich historical and cultural atmosphere.
All in all, a relaxed trip suits the time right after the college entrance exam; remember to plan the itinerary in advance and bring your personal belongings."
Example 2: for the input data "My skin is rather dry; are there any good skin care products to recommend? user_id=12345", the corresponding program can be expressed as:

def recommend_product(user_query, user_personal_info):
    # Step 1: call the query-parsing API
    query_attributes = query_parser_api.parse_query(user_query)
    # Step 2: call the API that performs the concrete query task
    product_recommendations, recommendation_reasons = \
        query_executor_product_api.execute_query(query_attributes, user_personal_info)
    # Step 3: merge the results of step 2 and remove duplicate items
    merged_product_recommendations, merged_reasons = \
        query_executor_product_api.merge_results(product_recommendations, recommendation_reasons)
    # Step 4: call the query-interpretation API to generate an interpretation of the
    # query to present to the user
    query_interpretation = \
        query_interpreter_poi.generate_interpretation(user_query, user_personal_info)
    # Step 5: call the large-language-model-based response-generation API with the merged
    # results and the query; rank the concrete recommended products and polish the
    # recommendation reasons according to the user's personalized information
    response = response_generator_api.generate_response(
        merged_product_recommendations, merged_reasons, user_personal_info, user_query)
    # Step 6: call the summarization API to generate a summary of the query to present to the user
    query_summary = summary_api.generate_summary(user_query, merged_product_recommendations)
    # Step 7: merge the stepwise-generated content and display it to the user
    return chat_response_generate(query_interpretation, response, query_summary)
After the program is executed, a reply like the following can be obtained:
"The following reasonably priced skin care products suitable for dry skin are recommended for you:
Avocado eye cream: rich in plant squalane, it deeply moisturizes the skin, locks in moisture, and improves skin tone;
Moisturizing face cream: containing premium moisturizing and repair ingredients, it strongly repairs cell damage on the skin surface and restores skin elasticity; suitable for all skin types;
Honey mask: rich in honey, it helps moisturize the skin, strengthens its elasticity and luster, and also helps relieve skin problems such as acne;
A friendly reminder: moisturizing is the most essential skin care step for dry skin; choosing skin care products suited to your skin type will keep your skin moist and healthy, achieving twice the result with half the effort."
Implementation two:
A prompt template in the form of sub-prompts serves as the preset planner. The prompt template comprises a plurality of sub-prompts obtained by splitting the overall information acquisition requirement expressed by the input data; each sub-prompt is obtained by understanding and planning the corresponding requirement step by step, and the sub-prompts include execution instructions for the corresponding application programming interfaces (APIs).
Example 1: for the input data "My child just finished the college entrance exam; I want to take him out for a trip. Can you recommend a few places? user_id=12345", the corresponding sub-prompts can be expressed as:
Context + query: My child just finished the college entrance exam; I want to take him out for a trip. Can you recommend a few places?
Sub prompt 1: Given the user query Q and the preferences "self-driving, high-income group, ...". Do you need to call an API?
Master LLM output 1: parse_query(Q)
API output: query attractions: type: leisure; tag: suitable for visiting with children
Sub prompt 2: Given the user query Q and the preferences "self-driving, high-income group, ...". Query attractions: type: leisure; tag: suitable for visiting with children. Do you need to call an API?
Master LLM output 2: execution_query("query attractions: type: leisure; tag: suitable for visiting with children", "self-driving, high-income group")
API output: Universal Beijing Resort: XXX; the Palace Museum: YYY
Sub prompt 3: Given the user query Q and the preferences "self-driving, high-income group, ...". Query attractions: type: leisure; tag: suitable for visiting with children. Query results: Universal Beijing Resort - XXX; the Palace Museum - YYY. Do you need to call an API?
Master LLM output 3: merge_results(query results: Universal Beijing Resort - XXX; the Palace Museum - YYY)
API output: merged results: Universal Beijing Resort; the Palace Museum
Sub prompt 4: Given the user query Q and the preferences "self-driving, high-income group, ...". Query attractions: type: leisure; tag: suitable for visiting with children. Query results: Universal Beijing Resort - XXX; the Palace Museum - YYY. Merged results: Universal Beijing Resort; the Palace Museum. Do you need to call an API?
Master LLM output 4: None
Final prompt 5: Given the user query Q and the preferences "self-driving, high-income group, ...". Query attractions: type: leisure; tag: suitable for visiting with children. Query results: Universal Beijing Resort - XXX; the Palace Museum - YYY. Merged results: Universal Beijing Resort; the Palace Museum. Please generate the user-facing reply.
LLM chat response: "Your child just took the college entrance exam ..."
After execution according to the sub-prompts finishes, the same reply can likewise be obtained:
"Your child just took the college entrance exam, which is a moment worth celebrating. During this time, you could consider taking him to attractions that relax the body and mind and also broaden his horizons. The following are some attractions recommended according to your situation:
Universal Beijing Resort: a theme park integrating films, animation, and special effects, with attractions on seven different themes such as Transformers, Kung Fu Panda, and Harry Potter; it suits thrill rides experienced together with children.
Shanghai Disneyland: with six themed lands such as Mickey Avenue, Fantasyland, and Adventure Isle, it suits the whole family, and its many shows and exhibitions let children broaden their horizons and relax while they play.
Xi'an Datang Everbright City: an antique-style architectural complex themed on Tang culture, located in the Yanta district of Xi'an, Shaanxi Province; you can take your child shopping along the surrounding pedestrian streets and tasting local delicacies while soaking in the rich historical and cultural atmosphere.
All in all, a relaxed trip suits the time right after the college entrance exam; remember to plan the itinerary in advance and bring your personal belongings."
Example 2: for the input data "My skin is rather dry; are there any good skin care products to recommend? user_id=12345", the corresponding sub-prompts can be expressed as:
Context + query: My skin is rather dry; are there any good skin care products to recommend?
Sub prompt 1: Given the user query Q and the preferences "middle-income group, ...". Do you need to call an API?
Master LLM output 1: parse_query(Q)
API output: query products: name: skin care product; tag: dry skin
Sub prompt 2: Given the user query Q and the preferences "middle-income group, ...". Query products: name: skin care product; tag: dry skin. Do you need to call an API?
Master LLM output 2: execution_query("query products: name: skin care product; tag: dry skin", "middle-income group")
API output: avocado eye cream: XXX; moisturizing face cream: YYY
Sub prompt 3: Given the user query Q and the preferences "middle-income group, ...". Query products: name: skin care product; tag: dry skin. Query results: avocado eye cream - XXX; moisturizing face cream - YYY. Do you need to call an API?
Master LLM output 3: merge_results(query results: avocado eye cream - XXX; moisturizing face cream - YYY)
API output: merged results: avocado eye cream; moisturizing face cream
Sub prompt 4: Given the user query Q and the preferences "middle-income group, ...". Query products: name: skin care product; tag: dry skin. Query results: avocado eye cream - XXX; moisturizing face cream - YYY. Merged results: avocado eye cream; moisturizing face cream. Do you need to call an API?
Master LLM output 4: None
Final prompt 5: Given the user query Q and the preferences "middle-income group, ...". Query products: name: skin care product; tag: dry skin. Query results: avocado eye cream - XXX; moisturizing face cream - YYY. Merged results: avocado eye cream; moisturizing face cream. Please generate the user-facing reply.
LLM chat response: "The following products suitable for dry skin are recommended for you: ..."
After execution according to the sub-prompts finishes, the same reply can likewise be obtained:
"The following reasonably priced skin care products suitable for dry skin are recommended for you:
Avocado eye cream: rich in plant squalane, it deeply moisturizes the skin, locks in moisture, and improves skin tone;
Moisturizing face cream: containing premium moisturizing and repair ingredients, it strongly repairs cell damage on the skin surface and restores skin elasticity; suitable for all skin types;
Honey mask: rich in honey, it helps moisturize the skin, strengthens its elasticity and luster, and also helps relieve skin problems such as acne;
A friendly reminder: moisturizing is the most essential skin care step for dry skin; choosing skin care products suited to your skin type will keep your skin moist and healthy, achieving twice the result with half the effort."
As the two examples provided for each implementation show, whether for the point-of-interest recommendation requirement or the cosmetics recommendation requirement, the program or the sub-prompts contain a requirement type analysis step, a query step, a requirement restatement step, an information fusion step, a personalized screening step, a recommendation reason generation step, and a deduplication step, arranged in a suitable execution order and each realized by constructing and executing a corresponding API, so that the result finally presented to the user is more credible and convincing and can quickly and accurately satisfy the user's information acquisition requirement.
The program-based instruction template implementation has: 1) a clear structure with distinct levels, flexibly adjustable as needed; 2) strong extensibility, because an LLM with code generation capability has strong intrinsic logic, so new functions can gain zero-shot capability merely by adding the API name and description. The sub-prompt-based prompt template implementation is: 1) more intuitive and easier for annotators to label; 2) achievable with a purely text-based LLM, placing lower requirements on the LLM.
Which of the two implementations to adopt can be chosen flexibly according to the actual requirements of the application scenario.
In reinforcement learning from human feedback, an RM (Reward Model) is usually needed to better align the recommendation results with the user's requirements; that is, the RM rewards different replies to the same question, so that the finally trained target generative large language model can select, from the TopK returned by the underlying personalized recall system, the TopM items (candidate results) that best match the user's preferences to recommend to the user as final candidates. The RM objective should therefore achieve two goals:
1) as far as possible, the ranked TopM are items the user is really interested in;
2) within the TopM, further optimization ranks the items the user is really interested in higher.
Based on the optimization objective described above, RM data comes mainly from two sources, online and offline:
Offline data refers to the large amount of user preference information accumulated historically, such as user searches, clicks, and favorites. For a user's historical search behavior, the model is used to simulate generation: the higher an item the user is more interested in appears in the generated ranking, the better the output satisfies the user's requirement, and it can serve as a positive example; otherwise it serves as a negative example.
Online data refers to real feedback data collected from online users, specifically including: the online system can set feedback buttons (e.g., like and dislike buttons) for the user to evaluate the dialogue experience within a session; the user actively accepting the recommendation (e.g., choosing to play the music the system recommended); the user replying with dissatisfied sentences (e.g., "bad review, needs improvement"); and so on. A sketch of turning such logs into training pairs follows.
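A minimal sketch of turning offline logs into the positive/negative pairs the RM trains on (the log format and helper name are assumptions for illustration):

def build_rm_pairs(click_logs, max_negatives=5):
    # Each log entry: {"query": ..., "shown": [item, ...], "clicked": [item, ...]}
    pairs = []
    for log in click_logs:
        clicked = set(log["clicked"])
        positives = [i for i in log["shown"] if i in clicked]      # items the user engaged with
        negatives = [i for i in log["shown"] if i not in clicked]  # shown but ignored
        for p in positives:
            for n in negatives[:max_negatives]:  # cap negatives to limit imbalance
                pairs.append((log["query"], p, n))
    return pairs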
RM training: the RM is trained in a session-based manner, i.e., by directly evaluating the quality of a whole session, so that the trained LLM RM model has the ability to evaluate session quality. To achieve this goal, the loss function can be set as the sum of two parts (the pairwise term is written here in a standard hinge form; other pairwise formulations are equally possible):

$$\mathcal{L} = \frac{1}{M}\sum_{m=1}^{M}\left(y_m-\hat{y}_m\right)^2 + \frac{1}{N}\sum_{n=1}^{N}\max\left(0,\; d(x,y_n)-d(x,y_p)\right)$$

The first part handles feedback such as online user ratings and uses an MSE (mean squared error) loss, where $M$ is the total number of collected samples containing user feedback, $y_m$ is the actual user rating (normalized from likes, dislikes, etc.), and $\hat{y}_m$ is the model's predicted score. This rating mechanism scores the user's overall dialogue experience, thereby optimizing system capability globally.
The second part handles the user's search history, online clicks, and the like, and uses a pairwise loss (commonly used in ranking problems: the model is trained by comparing positive-negative pairs so as to minimize the gap between the predicted and the true ranking), where $N$ is the number of collected pairs, $y_p$ denotes a positive example, $y_n$ a negative example, and $d(x,y_p)$ and $d(x,y_n)$ are the model's predicted scores for the positive and negative examples. Through this loss function, the RM can achieve the following two capabilities:
1) for the result list recalled by the personalized recall module, the more results the user is interested in appear in the final recommendation list (selectively generated by the LLM), the higher the reward score, so that the reinforcement-learned model surfaces more results the user is interested in;
2) for the finally displayed result list, the higher the results the user is really interested in are ranked, the higher the reward score, so that the user sees the results of real interest as early as possible, reducing reading pressure.
In brief, the loss function employed for training the reward model used to derive the target generative large language model may include two parts:
a first loss function constructed from the user's online feedback, and a second loss function constructed from click behavior in the user's search history, where the first loss function is constructed based on a mean squared error function and the second is constructed based on a pairwise loss function. A small sketch follows.
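A minimal PyTorch sketch of this two-part loss, assuming the margin-free hinge form shown above (function and parameter names are illustrative, not from the patent):

import torch

def reward_model_loss(pred_scores, true_scores, pos_scores, neg_scores):
    # First part: MSE over explicit online feedback (likes/dislikes normalized to scores).
    mse = torch.mean((true_scores - pred_scores) ** 2)
    # Second part: pairwise hinge loss over (positive, negative) pairs mined from search
    # and click history; it penalizes a negative example outscoring its positive partner.
    pairwise = torch.mean(torch.clamp(neg_scores - pos_scores, min=0.0))
    return mse + pairwise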
On the basis of any of the above embodiments, considering that the finally generated natural language output may contain multiple formats and varying amounts of information, it should be presented to the user in a suitable manner so that the user can conveniently obtain the corresponding information. This embodiment shows, through fig. 4, an implementation of returning the natural language output to the user as recommendation information; its flow 400 includes the following steps:
Step 401: acquiring the natural language output generated by the target generative large language model;
Step 402: determining a matched recommended information presentation form according to the information format and/or the information amount of the natural language output;
This step aims at the execution body determining the matched recommended information presentation form according to the information format and/or the information amount of the natural language output.
The recommended information presentation forms may include: a plain text presentation form, a plain image presentation form, a plain voice presentation form, and a mixed presentation form in which at least two of text, image, and voice are mixed.
Different information formats and information amounts correspond to different presentation forms; for example, a natural language output containing only a small amount of information can be presented directly in pure voice, so that the user can quickly obtain the information through voice playback without looking at a display screen.
Step 403: presenting the natural language output to the user in the recommended information presentation form.
Following step 402, this step aims at the execution body presenting the natural language output to the user in the determined recommended information presentation form.
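As a rough illustration of steps 402 and 403, the sketch below shows one possible mapping from the output's format and amount of information to a presentation form; the length threshold and the format flags are assumptions introduced for this example and are not prescribed by the embodiment:

```python
def choose_presentation(has_image: bool, text: str, short_text_limit: int = 50) -> str:
    """Pick a recommended information presentation form (step 402)."""
    if has_image and text:
        return "mixed"        # text and image presented together
    if has_image:
        return "image-only"
    if len(text) <= short_text_limit:
        # A small amount of information can be played back as pure voice,
        # so the user need not look at a display screen.
        return "voice-only"
    return "text-only"
```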
With further reference to fig. 5, as an implementation of the methods shown in the foregoing figures, the present disclosure provides an embodiment of an information recommendation apparatus based on a generative large language model; the apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 5, the information recommendation apparatus 500 based on a generative large language model of this embodiment may include: an input data acquisition unit 501, a model calling unit 502, and a recommendation information return unit 503. The input data acquisition unit 501 is configured to acquire a user's natural language input and the user's personalized information. The model calling unit 502 is configured to take the natural language input and the personalized information as input data and feed them into a target generative large language model trained with the planning-based controlled-generation idea as an optimization objective; the planning-based controlled-generation idea means that a preset planner matched with the input data is adopted to constrain generation of the output data, the preset planner recording a processing procedure plan for fulfilling the information acquisition requirement corresponding to the input data. The recommendation information return unit 503 is configured to return the natural language output of the target generative large language model to the user as recommendation information.
In this embodiment, for the specific processing of the input data acquisition unit 501, the model calling unit 502, and the recommendation information return unit 503 in the information recommendation apparatus 500 based on a generative large language model, and for their technical effects, reference may be made to the relevant descriptions of steps 201 to 203 in the embodiment corresponding to fig. 2, which are not repeated here.
In some optional implementations of this embodiment, the preset planner includes a plurality of processing steps for handling the information acquisition requirement corresponding to the input data, each processing step being obtained through stepwise understanding and planning of the corresponding information acquisition requirement, and the processing steps at least include a personalized screening step for screening candidate results matching the personalized information and a reason generation step for generating, for the screened results, recommendation reasons matching the personalized information.
In some optional implementations of this embodiment, the processing steps may further include at least one of:
a requirement type analyzing step for analyzing the type of the information acquisition requirement corresponding to the natural language input;
a result query step for querying all candidate results according to the analyzed requirement type;
a requirement review step for re-summarizing the information acquisition requirement corresponding to the natural language input;
a fusion step for fusing associated but distinct pieces of information obtained in preceding steps;
a deduplication step for removing duplicate content.
In some optional implementations of this embodiment, the information recommendation apparatus 500 based on a generative large language model may further include a training sample construction unit configured to construct the training samples used to train the target generative large language model; the training sample construction unit may be further configured to (a code sketch follows this list):
clustering natural language inputs of different users according to text and intent similarity to obtain a plurality of clustering results;
extracting a first number of target natural language inputs from each clustering result respectively, and obtaining the preset planner annotated for each target natural language input (with the target natural language input as the annotation object);
for each clustering result, taking the real sample pairs formed by the target natural language inputs under that clustering result and their corresponding preset planners as few-shot prompts, and generating a second number of incremental sample pairs through a generative large language model with code generation capability, wherein the second number is substantially greater than the first number;
Training samples are constructed based on the true sample pairs and the incremental sample pairs.
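As a sketch of this sample construction flow, assuming a sentence embedding function for text and intent similarity, a human annotation step, and a client for a code-capable generative large language model (none of which are specified by the embodiment), one possible implementation is:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_training_samples(inputs, embed, annotate_planner, llm_generate_pairs,
                           n_clusters=8, first_number=5, second_number=100):
    # Cluster natural language inputs by text/intent similarity via embeddings.
    vectors = np.stack([embed(x) for x in inputs])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)

    samples = []
    for c in range(n_clusters):
        cluster_inputs = [x for x, lab in zip(inputs, labels) if lab == c]
        # A first (small) number of target inputs per cluster is annotated by hand.
        real_pairs = [(x, annotate_planner(x)) for x in cluster_inputs[:first_number]]
        # The real pairs serve as few-shot prompts; a code-capable generative LLM
        # then synthesizes a much larger second number of incremental sample pairs.
        few_shot = "\n\n".join(f"Input: {x}\nPlanner: {p}" for x, p in real_pairs)
        samples.extend(real_pairs)
        samples.extend(llm_generate_pairs(few_shot, n=second_number))
    return samples
```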
In some optional implementations of the present embodiment, the preset planner may include:
an instruction template in the form of coded instructions; the instruction template comprises a plurality of processing instructions for handling the information acquisition requirement corresponding to the input data, each processing instruction is obtained through stepwise understanding and planning of the corresponding information acquisition requirement, and the processing instructions include execution instructions for the corresponding application programming interfaces.
In some optional implementations of the present embodiment, the preset planner may include:
a prompt template in the form of sub-prompts; the prompt template comprises a plurality of sub-prompts obtained by splitting the overall information acquisition requirement corresponding to the input data, each sub-prompt is obtained through stepwise understanding and planning of the corresponding information acquisition requirement, and the sub-prompts include execution instructions for the corresponding application programming interfaces.
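For concreteness, here is a hypothetical preset planner in the sub-prompt form, written for a request like "recommend some music for my drive home"; every API name in it (analyze_need, search_candidates, filter_by_profile, generate_reason) is invented for illustration and is not defined by the embodiment:

```python
# Each sub-prompt handles one split of the overall information acquisition
# requirement and carries an execution instruction for an (assumed) API.
PRESET_PLANNER_SUB_PROMPTS = [
    "Step 1: analyze the requirement type of the query -> CALL analyze_need(query)",
    "Step 2: query all candidate results for that type -> CALL search_candidates(need_type)",
    "Step 3: screen candidates with the user's personalized information -> CALL filter_by_profile(candidates, profile)",
    "Step 4: generate a recommendation reason matching the profile for each kept result -> CALL generate_reason(result, profile)",
    "Step 5: deduplicate and fuse associated information, then compose the final natural language output",
]
```

The coded-instruction form described above would express the same plan as executable processing instructions rather than natural language sub-prompts.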
In some optional implementations of this embodiment, the loss function employed for training the reward model used to obtain the target generative large language model includes:
a first loss function constructed based on the user's online feedback, and a second loss function constructed based on click-behavior feedback in the user's search history; wherein the first loss function is constructed based on a mean square error function and the second loss function is constructed based on a pairwise loss function.
In some optional implementations of the present embodiment, the recommendation information returning unit 503 may be further configured to:
acquiring the natural language output generated by the target generative large language model;
determining a matched recommended information presentation form according to the information format and/or the information amount of the natural language output; the recommended information presentation form includes: a plain text presentation form, a plain image presentation form, a plain voice presentation form, and a mixed presentation form in which at least two of text, image, and voice are mixed;
and presenting the natural language output to the user according to the recommended information presentation form.
This embodiment is the apparatus counterpart of the foregoing method embodiments. The information recommendation apparatus based on a generative large language model provided in this embodiment takes the user's natural language input describing an information acquisition requirement, together with the user's personalized information, as input data, and feeds them into a target generative large language model trained with the planning-based controlled-generation idea as an optimization objective. Generation of the output data is fully constrained by the preset planner matched with the input data, which records the processing procedure plan for fulfilling the information acquisition requirement corresponding to the input data. As a result, the natural language output generated by the generative large language model embodies the thinking, analysis, and processing procedure behind the corresponding information acquisition requirement, and candidate results are screened in combination with the personalized information, so that the final natural language output better satisfies the user's personalized needs with higher interpretability and reliability.
According to an embodiment of the present disclosure, the present disclosure further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor so that the at least one processor can implement the information recommendation method based on a generative large language model described in any of the above embodiments.
According to an embodiment of the present disclosure, there is also provided a readable storage medium storing computer instructions for enabling a computer to implement the information recommendation method based on a generative large language model described in any of the above embodiments.
According to an embodiment of the present disclosure, there is also provided a computer program product which, when executed by a processor, implements the steps of the information recommendation method based on a generative large language model described in any of the above embodiments.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the respective methods and processes described above, for example, an information recommendation method based on a generative large language model. For example, in some embodiments, the information recommendation method based on the generative large language model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the information recommendation method based on the generative large language model described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the information recommendation method based on the generative large language model in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general purpose programmable processor capable of receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network; their relationship arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that remedies the drawbacks of difficult management and weak service scalability found in traditional physical hosts and Virtual Private Server (VPS) services.
According to the technical solution of the embodiments of the present disclosure, the natural language input describing a user's information acquisition requirement and the user's personalized information are taken together as input data and fed into the target generative large language model trained with the planning-based controlled-generation idea as an optimization objective. Generation of the output data is constrained by the preset planner matched with the input data, which records the processing procedure plan for fulfilling the information acquisition requirement corresponding to the input data. The natural language output generated by the generative large language model can thus embody the thinking, analysis, and processing procedure behind the corresponding information acquisition requirement, and candidate results are screened in combination with the personalized information, so that the final natural language output better satisfies the user's personalized needs with higher interpretability and reliability.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (17)

1. An information recommendation method based on a generative large language model comprises the following steps:
Acquiring natural language input of a user and personalized information of the user;
taking the natural language input and the personalized information as input data, and inputting the input data into a target generative large language model trained with a planning-based controlled-generation idea as an optimization objective, wherein the planning-based controlled-generation idea means that a preset planner matched with the input data is adopted to constrain generation of output data, the preset planner records a processing procedure plan for fulfilling the information acquisition requirement corresponding to the input data, the processing procedure plan comprises a plurality of processing steps each obtained through stepwise understanding and planning of the corresponding information acquisition requirement, and the processing steps at least comprise a personalized screening step for screening candidate results matching the personalized information and a reason generation step for generating, for the screened results, recommendation reasons matching the personalized information;
and returning natural language output of the target generative large language model to the user as recommendation information.
2. The method of claim 1, wherein the processing step further comprises at least one of:
A requirement type analyzing step for analyzing the type of the information acquisition requirement corresponding to the natural language input;
a result query step for querying all candidate results according to the analyzed requirement type;
A requirement review step for re-summarizing the information acquisition requirement corresponding to the natural language input;
a fusion step for fusing associated but distinct pieces of information obtained in preceding steps;
A deduplication step for removing duplicate content.
3. The method of claim 1, further comprising: constructing training samples for training to obtain the target generative large language model, wherein the constructing training samples for training to obtain the target generative large language model comprises:
clustering natural language inputs of different users according to text and intent similarity to obtain a plurality of clustering results;
extracting a first number of target natural language inputs from each clustering result respectively, and obtaining the preset planner annotated for each target natural language input (with the target natural language input as the annotation object);
for each clustering result, taking real sample pairs formed by the target natural language inputs under the corresponding clustering result and their corresponding preset planners as few-shot prompts, and generating a second number of incremental sample pairs through a generative large language model with code generation capability; wherein the second number is substantially greater than the first number;
The training samples are constructed based on the real sample pairs and the incremental sample pairs.
4. A method according to any one of claims 1-3, wherein the preset planner comprises:
an instruction template in the form of coded instructions; the instruction template comprises a plurality of processing instructions for handling the information acquisition requirement corresponding to the input data, each processing instruction is obtained through stepwise understanding and planning of the corresponding information acquisition requirement, and the processing instructions include execution instructions for the corresponding application programming interfaces.
5. A method according to any one of claims 1-3, wherein the preset planner comprises:
a prompt template in the form of sub-prompts; the prompt template comprises a plurality of sub-prompts obtained by splitting the overall information acquisition requirement corresponding to the input data, each sub-prompt is obtained through stepwise understanding and planning of the corresponding information acquisition requirement, and the sub-prompts include execution instructions for the corresponding application programming interfaces.
6. The method of claim 1, wherein the loss function employed for training the reward model used to obtain the target generative large language model comprises:
a first loss function constructed based on the user's online feedback, and a second loss function constructed based on click-behavior feedback in the user's search history; wherein the first loss function is constructed based on a mean square error function and the second loss function is constructed based on a pairwise loss function.
7. The method of claim 1, wherein the returning the natural language output of the target generative large language model to the user as recommendation information comprises:
acquiring the natural language output generated by the target generative large language model;
determining a matched recommended information presentation form according to the information format and/or the information amount of the natural language output; wherein the recommended information presentation form includes: a plain text presentation form, a plain image presentation form, a plain voice presentation form, and a mixed presentation form in which at least two of text, image, and voice are mixed;
And presenting the natural language output to the user according to the recommended information presentation form.
8. An information recommendation device based on a generative large language model, comprising:
an input data acquisition unit configured to acquire natural language input of a user and personalized information of the user;
a model calling unit configured to take the natural language input and the personalized information as input data and input the input data into a target generative large language model, wherein the target generative large language model is trained with a planning-based controlled-generation idea as an optimization objective, the planning-based controlled-generation idea means that a preset planner matched with the input data is adopted to constrain generation of output data, the preset planner records a processing procedure plan for fulfilling the information acquisition requirement corresponding to the input data, the processing procedure plan comprises a plurality of processing steps each obtained through stepwise understanding and planning of the corresponding information acquisition requirement, and the processing steps at least comprise a personalized screening step for screening candidate results matching the personalized information and a reason generation step for generating, for the screened results, recommendation reasons matching the personalized information;
and a recommendation information return unit configured to return natural language output of the target generative large language model to the user as recommendation information.
9. The apparatus of claim 8, wherein the processing step further comprises at least one of:
A requirement type analyzing step for analyzing the type of the information acquisition requirement corresponding to the natural language input;
a result query step for querying all candidate results according to the analyzed requirement type;
A requirement review step for re-summarizing the information acquisition requirement corresponding to the natural language input;
a fusion step for fusing associated but distinct pieces of information obtained in preceding steps;
A deduplication step for removing duplicate content.
10. The apparatus of claim 8, further comprising: a training sample construction unit configured to construct the training samples used to train the target generative large language model, the training sample construction unit being further configured to:
clustering natural language inputs of different users according to text and intent similarity to obtain a plurality of clustering results;
extracting a first number of target natural language inputs from each clustering result respectively, and obtaining the preset planner annotated for each target natural language input (with the target natural language input as the annotation object);
for each clustering result, taking real sample pairs formed by the target natural language inputs under the corresponding clustering result and their corresponding preset planners as few-shot prompts, and generating a second number of incremental sample pairs through a generative large language model with code generation capability; wherein the second number is substantially greater than the first number;
The training samples are constructed based on the real sample pairs and the incremental sample pairs.
11. The apparatus of any of claims 8-10, wherein the preset planner comprises:
an instruction template in the form of coded instructions; the instruction template comprises a plurality of processing instructions for handling the information acquisition requirement corresponding to the input data, each processing instruction is obtained through stepwise understanding and planning of the corresponding information acquisition requirement, and the processing instructions include execution instructions for the corresponding application programming interfaces.
12. The apparatus of any of claims 8-10, wherein the preset planner comprises:
a prompt template in the form of sub-prompts; the prompt template comprises a plurality of sub-prompts obtained by splitting the overall information acquisition requirement corresponding to the input data, each sub-prompt is obtained through stepwise understanding and planning of the corresponding information acquisition requirement, and the sub-prompts include execution instructions for the corresponding application programming interfaces.
13. The apparatus of claim 8, wherein the loss function employed for training the reward model used to obtain the target generative large language model comprises:
a first loss function constructed based on the user's online feedback, and a second loss function constructed based on click-behavior feedback in the user's search history; wherein the first loss function is constructed based on a mean square error function and the second loss function is constructed based on a pairwise loss function.
14. The apparatus of claim 8, wherein the recommendation information return unit is further configured to:
acquiring the natural language output generated by the target generative large language model;
determining a matched recommended information presentation form according to the information format and/or the information amount of the natural language output; wherein the recommended information presentation form includes: a plain text presentation form, a plain image presentation form, a plain voice presentation form, and a mixed presentation form in which at least two of text, image, and voice are mixed;
And presenting the natural language output to the user according to the recommended information presentation form.
15. An electronic device, comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the information recommendation method based on a generative large language model of any one of claims 1-7.
16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the information recommendation method based on a generative large language model of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the information recommendation method based on a generative large language model according to any of claims 1 to 7.


