
CN118211065A - Training method and device for large language model, electronic equipment and storage medium - Google Patents


Info

Publication number
CN118211065A
CN118211065A
Authority
CN
China
Prior art keywords
training, model, text, language model, current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410275666.1A
Other languages
Chinese (zh)
Inventor
叶忻
林梓佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202410275666.1A
Publication of CN118211065A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure relates to a training method and apparatus for a large language model, an electronic device, and a storage medium. The training method comprises the following steps: in the current stage of training the large language model, classifying training texts in a current training data set through a proxy model trained in the previous stage to obtain a first classification result; performing current-stage training on the large language model trained in the previous stage according to the first classification result; continuing to train the previous-stage proxy model according to the first classification result to obtain the proxy model of the current stage; classifying the training texts through this proxy model to obtain a second classification result; proofreading the second classification result to obtain a proofreading result; retraining the proxy model according to the proofreading result to obtain a retrained proxy model for the current stage; and taking the next stage as the current stage and iteratively executing the above steps until training of the large language model is completed. The present disclosure can improve training efficiency and reduce computing-resource consumption.

Description

Training method and device for large language model, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of natural language processing, and in particular relates to a training method and device for a large language model, electronic equipment and a storage medium.
Background
In today's information age, large language models have become a core technology in natural language processing and are widely used in business and scientific research scenarios. These models automatically learn the grammar, semantics, and context of natural language, which makes them excel at tasks such as text generation, translation, and question answering. However, training a large language model requires a massive corpus, and growing demands for personalization and customization make training expensive and time-consuming in many cases.
In the related art, large language models are typically trained with a conventional pre-training and fine-tuning pipeline. In the pre-training phase the model is exposed to a large corpus to learn language knowledge; in the fine-tuning phase it is further trained on task-specific data. The training process can be monitored through fixed metrics, but labels cannot be updated autonomously during training. This consumes large amounts of time, computing resources, and human effort, and after long black-box training it remains unclear how the model will behave in use.
Disclosure of Invention
The disclosure provides a training method and apparatus for a large language model, an electronic device, and a storage medium, so as to at least solve the problems of low training efficiency and resource waste when training large language models in the related art. The technical solution of the present disclosure is as follows:
According to a first aspect of embodiments of the present disclosure, there is provided a training method of a large language model, including:
In a current stage of training a large language model, classifying training texts in a current training data set according to text attributes of training texts through a proxy model trained in a previous stage to obtain a first classification result, wherein the current training data set is a data set for training the large language model in the current stage, and the first classification result comprises a first corresponding relation between the training texts and category labels;
according to the first classification result, inputting the training text into the large language model trained in the previous stage, predicting a dialogue output text corresponding to the training text through the large language model, and training the large language model in the current stage according to the training text and the dialogue output text;
continuing to train the proxy model trained in the previous stage according to the first classification result to obtain the proxy model of the current stage;
classifying the training texts in the current training data set according to the text attribute of the training texts through the proxy model of the current stage to obtain a second classification result;
proofreading the second classification result to obtain a proofreading result;
retraining the proxy model of the current stage according to the proofreading result to obtain a retrained proxy model of the current stage, wherein the retrained proxy model of the current stage is used for classifying training texts in the next stage;
and when training of the large language model is not completed, taking the next stage as the current stage of the large language model and iteratively executing the above steps until training of the large language model is completed.
Optionally, the continuing to train the proxy model trained in the previous stage according to the first classification result includes:
extracting training texts from the current training data set, and continuing to train the proxy model trained in the previous stage based on the extracted training texts and the class labels of the extracted training texts in the first classification result.
Optionally, inputting the training text into the large language model trained in the previous stage according to the first classification result, predicting a dialogue output text corresponding to the training text through the large language model, and performing current stage training on the large language model according to the training text and the dialogue output text, where the method includes:
according to a preset class proportion and class labels of each training text in the first classification result, training texts in the current training data set are formed into a plurality of training batches, and each training batch comprises a plurality of training texts;
And sequentially inputting the training texts of each training batch into the large language model trained in the previous stage, predicting dialogue output texts corresponding to the training texts in the training batch through the large language model, and training the large language model trained in the previous stage according to the training texts and the dialogue output texts in the training batch.
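The batch-composition scheme in the optional embodiment above can be sketched as follows. This is an illustrative assumption: the embodiment fixes no concrete allocation rule, and the function and proportion values are hypothetical.

```python
from collections import defaultdict

def build_batches(first_result, proportions, batch_size):
    """Compose training batches whose per-label text counts follow a preset
    category proportion. `first_result` maps training text -> class label.
    Illustrative sketch only; the embodiment specifies no exact scheme."""
    pools = defaultdict(list)
    for text, label in first_result.items():
        pools[label].append(text)
    # Per-batch quota for each label, derived from the preset proportions.
    quota = {lbl: max(1, round(batch_size * p)) for lbl, p in proportions.items()}
    batches = []
    while all(len(pools[lbl]) >= q for lbl, q in quota.items()):
        batch = []
        for lbl, q in quota.items():
            batch.extend(pools[lbl][:q])
            del pools[lbl][:q]
        batches.append(batch)
    return batches

texts = {f"qa text {i}": "qa" for i in range(4)}
texts.update({f"news text {i}": "news" for i in range(4)})
batches = build_batches(texts, {"qa": 0.5, "news": 0.5}, batch_size=4)
```

With a 50/50 proportion and a batch size of 4, each batch here contains two "qa" texts and two "news" texts.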
Optionally, the inputting the training text into the large language model trained in the previous stage according to the first classification result, predicting a dialogue output text corresponding to the training text through the large language model, and performing the training in the current stage on the large language model according to the training text and the dialogue output text, and further includes:
After training the training batch of the large language model according to each training batch, determining the verification index value of the large language model trained by the training batch according to the verification data set.
Optionally, the second classification result includes a second correspondence between the training text and a class label, and a classification feature of each class label; the classification features of the class labels are obtained by extracting features of training texts under the class labels in the second classification result;
the proofreading result comprises the proofread second correspondence between each training text and its class label, and the proofread classification features of each class label.
Optionally, retraining the proxy model of the current stage according to the proofreading result to obtain a retrained proxy model of the current stage includes:
clustering the proofread classification features of each class label with the classification features determined in the previous stage to obtain updated classification features of each class label;
classifying the training texts in the current training data set according to the updated classification characteristics to obtain a third classification result;
And retraining the proxy model of the current stage according to the third classification result to obtain the retrained proxy model of the current stage.
Optionally, in the current stage of training the large language model, classifying the training text in the current training dataset according to the text attribute of the training text by using the proxy model trained in the previous stage to obtain a first classification result, including:
And when the current stage is the first stage, classifying the training texts in the current training data set through the initial classification characteristics of each class label to obtain the first classification result.
Optionally, the classifying, by the proxy model of the current stage, the training text in the current training dataset to obtain a second classification result includes:
randomly sampling training texts in the current training data set;
And classifying the sampled training texts through the proxy model of the current stage to obtain the second classification result.
According to a second aspect of embodiments of the present disclosure, there is provided a training apparatus of a large language model, including:
The first classification module is configured to, in the current stage of training the large language model, classify the training texts in the current training data set according to their text attributes through the proxy model trained in the previous stage to obtain a first classification result, where the current training data set is used for training the large language model in the current stage, and the first classification result comprises a first correspondence between the training texts and class labels;
The large model training module is configured to, according to the first classification result, input the training text into the large language model trained in the previous stage, predict the dialogue output text corresponding to the training text through the large language model, and perform current-stage training on the large language model according to the training text and the dialogue output text;
The proxy model training module is configured to continue training the proxy model trained in the previous stage according to the first classification result to obtain the proxy model of the current stage;
The second classification module is configured to perform classification on the training texts in the current training data set according to the text attributes of the training texts through the proxy model of the current stage to obtain a second classification result;
The proofreading module is configured to proofread the second classification result to obtain a proofreading result;
The proxy model retraining module is configured to retrain the proxy model of the current stage according to the proofreading result to obtain a retrained proxy model of the current stage, where the retrained proxy model of the current stage is used for classifying training texts in the next stage;
and the iteration control module is configured to, when training of the large language model is not completed, take the next stage as the current stage and iteratively execute the above steps until training of the large language model is completed.
Optionally, the proxy model training module is configured to:
extract training texts from the current training data set, and continue training the proxy model trained in the previous stage based on the extracted training texts and the class labels of the extracted training texts in the first classification result.
Optionally, the large model training module includes:
A training batch construction unit configured to execute training texts in the current training dataset according to a preset category proportion and category labels of each training text in the first classification result to form a plurality of training batches, wherein each training batch comprises a plurality of training texts;
and the large model training unit is configured to sequentially input the training texts of each training batch into the large language model trained in the previous stage, predict the dialogue output text corresponding to each training text in the training batch through the large language model, and perform current-stage training on the large language model trained in the previous stage according to the training texts and dialogue output texts in the training batch.
Optionally, the large model training module further includes:
and the verification module is configured to, after the large language model is trained on each training batch, determine the verification index value of the large language model trained on that batch according to the verification data set.
Optionally, the second classification result includes a second correspondence between the training text and a class label, and a classification feature of each class label; the classification features of the class labels are obtained by extracting features of training texts under the class labels in the second classification result;
the proofreading result comprises the proofread second correspondence between each training text and its class label, and the proofread classification features of each class label.
Optionally, the proxy model retraining module includes:
A feature clustering unit configured to cluster the proofread classification features of each class label with the classification features determined in the previous stage, to obtain updated classification features of each class label;
The data classification unit is configured to perform classification on the training texts in the current training data set according to the updated classification characteristics to obtain a third classification result;
And the retraining unit is configured to retrain the proxy model of the current stage according to the third classification result to obtain the retrained proxy model of the current stage.
Optionally, the first classification module includes:
and the initial classification unit is configured to perform classification on the training text in the current training data set according to the initial classification characteristics of each class label when the current stage is the first stage, so as to obtain the first classification result.
Optionally, the second classification module includes:
a sampling unit configured to perform random sampling of training text in the current training data set;
and the second classification unit is configured to perform classification on the training text obtained by sampling through the proxy model of the current stage, and obtain the second classification result.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
A processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the training method of the large language model as described in the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the training method of the large language model as described in the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program or computer instructions which, when executed by a processor, implement the training method of the large language model according to the first aspect.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
According to the embodiments of the disclosure, in the current stage of training the large language model, the training texts are classified according to their text attributes by the proxy model trained in the previous stage; current-stage training is then performed on the large language model trained in the previous stage based on the first classification result, while the previous-stage proxy model is continually trained on the same result to obtain the proxy model of the current stage. The training texts in the current training data set are then classified by this proxy model to obtain a second classification result, the second classification result is proofread to obtain a proofreading result, and the proxy model of the current stage is retrained based on the proofreading result so that it can classify training texts in the next stage. Because the classification results of the proxy model are proofread and the proxy model is retrained in every stage, active learning of the large language model is realized through proofreading and the proxy model. The cumbersome conventional pipeline of pre-training, fine-tuning, and reinforcement learning is merged into pre-training alone, more real-time data are introduced during pre-training, and training on redundant and irrelevant data is reduced, thereby improving training efficiency, shortening the training period, and reducing computing-resource consumption and time cost.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a flowchart illustrating a method of training a large language model, according to an example embodiment;
FIG. 2 is a training schematic of a large language model in an embodiment of the present disclosure;
FIG. 3 is a block diagram of a training apparatus for a large language model, shown in accordance with an exemplary embodiment;
fig. 4 is a block diagram of an electronic device, according to an example embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
Fig. 1 is a flowchart illustrating a training method of a large language model according to an exemplary embodiment. As shown in Fig. 1, the training method may be used in an electronic device such as a server and includes the following steps.
In step S11, in a current stage of training a large language model, classifying training texts in a current training dataset according to text attributes of the training texts by using a proxy model trained in a previous stage, to obtain a first classification result, where the current training dataset is a dataset for training the large language model in the current stage, and the first classification result includes a first correspondence between the training texts and class labels.
When training a large language model, the training can be performed in multiple stages, and during each stage the training process can be manually intervened in to correct the corresponding categories, so as to realize personalized and customized model training. Large language models such as GPT-3 are powerful natural language processing tools that perform well on a variety of natural language tasks, including text generation, translation, and sentiment analysis; these models are typically composed of deep neural networks.
During each training stage of the large language model, a small-scale proxy model (with far fewer network parameters) is trained simultaneously, and the proxy model obtained by retraining in one stage is used to classify the training texts in the next stage. Each stage is trained on different training texts.
In the current stage of training the large language model, part of the training texts not used in previous stages are acquired from the large-scale training corpus to form the current training data set, and the proxy model retrained in the previous stage classifies the training texts in this data set according to their text attributes, yielding the first classification result of the current training data set. The first classification result covers several preset category labels and gives the first correspondence between training texts and category labels, i.e. the training texts included under each category label. Category labels may be distinguished by the source, type, difficulty, etc. of the training text, and the text attributes may likewise include the source, type, and difficulty of the training text.
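As a minimal sketch of step S11 — with hypothetical names throughout, since the disclosure does not specify the proxy model's concrete form — the proxy model can be viewed as a function from a text's attributes to one of the preset category labels:

```python
from dataclasses import dataclass

@dataclass
class TrainingText:
    text: str
    source: str      # text attribute: origin of the text, e.g. "qa_forum"
    difficulty: str  # text attribute: e.g. "easy" or "hard"

def first_classification(dataset, proxy_model):
    """Step S11: classify each training text by its attributes; return the
    first correspondence between training texts and category labels."""
    return {t.text: proxy_model(t) for t in dataset}

# Stand-in proxy model; a real one is a small trained classifier network.
def toy_proxy(t: TrainingText) -> str:
    return f"{t.source}/{t.difficulty}"

dataset = [TrainingText("What is NLP?", "qa_forum", "easy"),
           TrainingText("Prove the central limit theorem.", "books", "hard")]
first_result = first_classification(dataset, toy_proxy)
```

Here `first_result` plays the role of the first classification result: a mapping from each training text to its category label.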
In step S12, according to the first classification result, the training text is input into the large language model trained in the previous stage, the dialogue output text corresponding to the training text is predicted through the large language model, and the training in the current stage is performed on the large language model according to the training text and the dialogue output text.
The large language model can be used in question-answering dialogue scenarios, among others. The large language model trained in the previous stage is trained further in the current stage using the training texts and their class labels from the first classification result: the training text is input into the large language model trained in the previous stage, the dialogue output text corresponding to the training text is predicted by the model, and the network parameters of the model are adjusted according to the training text and the dialogue output text. After the current-stage training is completed, the network parameters of the large language model are frozen and are updated again only in the next training stage.
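The parameter-adjustment step of S12 can be illustrated with a token-level loss on the predicted dialogue output followed by a gradient update. The disclosure does not name a loss function, so the cross-entropy below is an assumption, and all names are illustrative:

```python
import math

def token_cross_entropy(pred_probs, target_ids):
    """Average negative log-likelihood of the target dialogue-output tokens
    under the model's predicted distributions (one distribution per position)."""
    nll = -sum(math.log(pred_probs[i][t]) for i, t in enumerate(target_ids))
    return nll / len(target_ids)

def sgd_step(params, grads, lr=0.1):
    """One plain gradient step on the (unfrozen) current-stage parameters."""
    return [p - lr * g for p, g in zip(params, grads)]

# Two positions, vocabulary of size 3; target token ids are 0 and 2.
loss = token_cross_entropy([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]], [0, 2])
new_params = sgd_step([1.0, -2.0], [0.5, -0.5])
```

Freezing the parameters between stages then simply means skipping such updates until the next stage begins.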
In step S13, the proxy model trained in the previous stage is continually trained according to the first classification result to obtain the proxy model of the current stage.
The Proxy Model (Proxy Model) is a classification Model, and is used for classifying training texts, and the Proxy Model can divide the training texts into a plurality of categories, namely categories corresponding to a plurality of category labels in the first classification result.
The training texts in the first classification result are input into the proxy model trained in the previous stage, which processes them and outputs the predicted category each text belongs to; the parameters of the proxy model are adjusted based on the predicted category and the class label of the training text in the first classification result, and this training process is iterated until the continual training is complete, yielding the proxy model of the current stage.
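The continual-training loop of step S13 can be sketched with a deliberately tiny stand-in classifier. A per-label word-count model is an assumption made purely for illustration; the real proxy model is a small neural classifier:

```python
from collections import Counter, defaultdict

class ToyProxyModel:
    """Stand-in for the small proxy classifier: per-label word counts.
    `continue_training` mimics step S13's continual training on the
    (text, label) pairs of the first classification result."""
    def __init__(self):
        self.word_counts = defaultdict(Counter)

    def continue_training(self, labelled_texts):
        # Absorb the first classification result: text -> class label.
        for text, label in labelled_texts.items():
            self.word_counts[label].update(text.lower().split())

    def predict(self, text):
        # Predicted category: label whose counts best cover the text's words.
        words = text.lower().split()
        return max(self.word_counts,
                   key=lambda lbl: sum(self.word_counts[lbl][w] for w in words))

proxy = ToyProxyModel()
proxy.continue_training({"how do i reset my password": "qa",
                         "stock markets fell sharply today": "news"})
```

After this continual training, the updated `proxy` is the proxy model of the current stage used in step S14.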
In step S14, classifying the training texts in the current training dataset according to the text attribute of the training texts through the proxy model of the current stage, so as to obtain a second classification result.
After the proxy-model training of the current stage is completed, the proxy model classifies the training texts in the current training data set according to their text attributes to obtain a second classification result. A portion of the training texts may be extracted from the current training data set and input into the proxy model to obtain their class labels in the second classification result. The class label of a training text in the second classification result may be the same as, or different from, its class label in the first classification result.
The second classification result comprises a second corresponding relation between the training text and the class labels and classification characteristics of each class label; the classification features of the class labels are obtained by extracting features from training texts under the class labels in the second classification result. After classifying the training texts through the proxy model to obtain class labels corresponding to the training texts, extracting classification features of all training texts under the class labels in the second classification result according to each class label to obtain classification features of the class labels. The classification features may include features such as keywords.
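The keyword-style feature extraction described above can be sketched as follows; treating "classification features" as the most frequent words of each label's texts is one simple reading, and `top_k` is an illustrative parameter:

```python
from collections import Counter

def extract_label_features(second_result, top_k=2):
    """For each class label in the second classification result
    (text -> label), keep the top_k most frequent words of its texts as
    that label's classification features (keyword features)."""
    per_label = {}
    for text, label in second_result.items():
        per_label.setdefault(label, Counter()).update(text.lower().split())
    return {label: [w for w, _ in counts.most_common(top_k)]
            for label, counts in per_label.items()}

second_result = {"reset password now": "qa",
                 "password reset steps": "qa",
                 "markets rally again": "news"}
features = extract_label_features(second_result)
```

The resulting per-label keyword lists are what the target person later corrects, supplements, or deletes during proofreading.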
In step S15, the second classification result is checked to obtain a checked result.
The training texts in the current training data set are classified by the proxy model trained in the current stage, and after the second classification result is obtained, it is proofread to obtain a proofreading result. One way to proofread the second classification result is to send it to a target person for review: the target person may correct the class labels of training texts that were mislabeled in the second classification result, and may correct, supplement, or delete the classification features corresponding to each class label, so as to calibrate the class labels. After the target person finishes proofreading, the proofreading result is acquired. The proofreading result comprises the proofread second correspondence between each training text and its class label, and the proofread classification features of each class label.
The manual instructions are thus mapped onto the class labels of the training texts and can influence training in real time, so the entire pre-training process of the large language model can be dynamically adjusted. Compared with the conventional approach, the functions of fine-tuning and RLHF (Reinforcement Learning from Human Feedback) are both covered within the pre-training model, which improves the utilization efficiency of various resources; with only weak manual intervention in the training process, model training becomes more efficient, takes effect in real time, and supports faster update iterations.
In step S16, the proxy model of the current stage is retrained according to the proofreading result to obtain a retrained proxy model of the current stage, where the retrained proxy model is used to classify training texts in the next stage.
The proxy model of the current stage is retrained based on the proofreading result: a training text from the proofreading result is input into the current-stage proxy model, the output of the proxy model is obtained, the parameters of the proxy model are adjusted based on that output and the class label of the training text in the proofreading result, and this training process is iterated until retraining of the current-stage proxy model is completed. Alternatively, the current training data set may be reclassified based on the updated classification features in the proofreading result, and the current-stage proxy model retrained on the reclassified training texts to obtain the retrained proxy model of the current stage.
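The alternative path of step S16 — merging proofread features with previous-stage features and reclassifying to obtain the third classification result — can be sketched as below. Reducing the clustering step to a merge-and-deduplicate of keyword lists is a deliberate simplification for illustration:

```python
def update_features(proofread_feats, previous_feats):
    """Merge each label's proofread classification features with those
    determined in the previous stage (a degenerate stand-in for the
    clustering step of this embodiment)."""
    labels = set(proofread_feats) | set(previous_feats)
    return {lbl: sorted(set(proofread_feats.get(lbl, []))
                        | set(previous_feats.get(lbl, [])))
            for lbl in labels}

def reclassify(texts, label_features):
    """Third classification result: assign each text the label whose
    updated features overlap its words the most."""
    result = {}
    for text in texts:
        words = set(text.lower().split())
        result[text] = max(label_features,
                           key=lambda lbl: len(words & set(label_features[lbl])))
    return result

updated = update_features({"qa": ["password"]},
                          {"qa": ["reset"], "news": ["markets"]})
third = reclassify(["reset my password", "markets rally"], updated)
```

The proxy model is then retrained on `third`, the reclassified (text, label) pairs.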
Through the above proofreading of the second classification result and the retraining of the agent model based on the proofreading result, active learning is realized in the training process of the large language model, and label calibration can be performed during training. Active learning is a machine learning method in which a model selectively requests labels or answers in order to improve its performance; this approach can be used for data selection, helping the model achieve better performance within a limited data budget.
In step S17, it is determined whether the training of the large language model is completed, if yes, the training is ended, if not, the next stage is taken as the current stage of the large language model, and steps S11 to S17 are iteratively executed until the training of the large language model is completed.
The condition for completing the training of the large language model may be that the parameters of the large language model converge, or that the number of training iterations reaches a target number, and so on.
Whether the training of the large language model is completed is judged based on this condition. If the training is completed, the training ends, and the large language model trained in the current stage is taken as the final large language model. If the training is not completed, the stage following the current stage is taken as the new current stage of large language model training, and steps S11 to S17 are executed iteratively until the training of the large language model is finished.
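The staged loop of steps S11 to S17 can be sketched as follows, with both models abstracted as injected callables; every name and the callable interface here are assumptions made for illustration:

```python
import random

def staged_training_loop(dataset, n_stages, classify, train_llm, train_proxy,
                         proofread, sample_size=2, seed=0):
    """classify(text) -> class label; train_llm / train_proxy consume a
    {text: label} mapping; proofread returns the manually corrected mapping."""
    rng = random.Random(seed)
    for _ in range(n_stages):
        first = {t: classify(t) for t in dataset}    # first classification result
        train_llm(first)                             # current-stage LLM training
        train_proxy(first)                           # proxy model of the current stage
        sampled = rng.sample(dataset, min(sample_size, len(dataset)))
        second = {t: classify(t) for t in sampled}   # second classification result
        train_proxy(proofread(second))               # retrain on the proofread labels
    return n_stages
```

In the method itself the loop terminates on convergence or a target iteration count rather than a fixed `n_stages`; the fixed count is used here only to keep the sketch self-contained.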
According to the training method of the large language model provided by this embodiment, in the current stage of training the large language model, the training texts are classified according to their text attributes by the agent model trained in the previous stage; the large language model trained in the previous stage is continue-trained in the current stage based on the first classification result; the agent model trained in the previous stage is also continue-trained based on the first classification result to obtain the agent model of the current stage; the training texts in the current training data set are classified by the trained agent model to obtain a second classification result; the second classification result is proofread to obtain a proofreading result; the agent model of the current stage is then retrained based on the proofreading result, and the retrained agent model classifies the training texts in the next stage. Because the agent model of the current stage can be calibrated and retrained, active learning of the large language model is realized on the basis of the calibration and the agent model, and the complicated conventional pipeline of pre-training, fine-tuning and reinforcement learning for large language models is merged: all of these functions are realized within pre-training alone, more real-time feedback is introduced, and the training on redundant and irrelevant data is reduced, thereby improving training efficiency, shortening the training period, and reducing the consumption of computing resources and time cost.
On the basis of the above technical solution, the training the agent model trained in the previous stage according to the first classification result includes: and extracting training texts from the current training data set, and continuously training the agent model trained in the previous stage based on the extracted training texts and class labels of the extracted training texts in the first classification result.
A part of the training texts is extracted from the current training data set, and the agent model trained in the previous stage is continue-trained with the extracted training texts: the extracted training texts are input into the agent model trained in the previous stage, the extracted training texts are classified by that agent model to obtain the prediction class corresponding to each extracted training text, the parameters of the agent model are adjusted based on the prediction classes and the class labels of the extracted training texts in the first classification result, and this training process is executed iteratively until the continued training of the agent model trained in the previous stage is completed.
Because the number of parameters of the agent model is small, its training can be completed with only part of the training texts. Moreover, since the agent model is trained based on the first classification result, the agent model can learn the classification features in the first classification result, so that the classification features of the second classification result obtained from the agent model are the same as the classification features in the first classification result; this facilitates the proofreading of the agent model's classification and the control of the training direction of the large language model.
On the basis of the above technical solution, the inputting the training text into the large language model trained in the previous stage according to the first classification result, predicting the dialogue output text corresponding to the training text by the large language model, and performing the training of the current stage on the large language model according to the training text and the dialogue output text includes: according to a preset class proportion and class labels of each training text in the first classification result, training texts in the current training data set are formed into a plurality of training batches, and each training batch comprises a plurality of training texts; and sequentially inputting the training texts of each training batch into the large language model trained in the previous stage, predicting dialogue output texts corresponding to the training texts in the training batch through the large language model, and training the large language model trained in the previous stage according to the training texts and the dialogue output texts in the training batch.
According to the amount of data required by one training batch and the preset class proportion, the amount of data corresponding to each class label in each training batch is determined; training texts of the corresponding amount for each class label are then acquired from the current training data set based on the class labels of the training texts in the first classification result, forming one training batch (Batch). A plurality of training batches required by one training stage, for example three training batches Batch1, Batch2 and Batch3, are formed in this manner. The large language model trained in the previous stage is then continue-trained with the training texts of each training batch: the training texts of each training batch are sequentially input into the large language model trained in the previous stage, the dialogue output texts corresponding to the training texts in the training batch are predicted by the large language model, and the large language model trained in the previous stage is trained according to the training texts and the dialogue output texts in the training batch.
The training batches are formed based on the preset class proportion, and the large language model is trained in the current stage based on the training batches, so that the large language model learns the characteristics of training texts of different classes in one training batch, and the training effect of the model is improved.
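A minimal sketch of composing training batches under a preset class proportion might look as follows; the function name, the `{text: label}` shape of the first classification result, and the proportion dictionary are all assumptions of this sketch:

```python
from collections import defaultdict

def build_batches(first_result, proportions, batch_size, n_batches):
    """first_result: {training_text: class_label};
    proportions: {class_label: fraction of each batch}, summing to 1."""
    by_label = defaultdict(list)
    for text, label in first_result.items():
        by_label[label].append(text)
    batches = []
    for _ in range(n_batches):
        batch = []
        for label, frac in proportions.items():
            take = int(batch_size * frac)          # per-label quota in this batch
            batch.extend(by_label[label][:take])
            by_label[label] = by_label[label][take:]
        batches.append(batch)
    return batches
```

For example, with two labels at a 50/50 proportion and a batch size of 4, each batch would contain two texts of each class, matching the Batch1/Batch2/Batch3 example above.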
In an exemplary embodiment, the inputting the training text into the large language model trained in the previous stage according to the first classification result, predicting a dialogue output text corresponding to the training text through the large language model, and performing current stage training on the large language model according to the training text and the dialogue output text, and further includes: after training the training batch of the large language model according to each training batch, determining the verification index value of the large language model trained by the training batch according to the verification data set.
After the large language model is trained on a training batch, verification index monitoring is performed on the trained model with a verification data set: the verification texts in the verification data set are input into the large language model trained on that training batch, the dialogue output texts corresponding to the verification texts are output by the large language model, the verification index value of the large language model is determined, and whether the verification index value meets the requirements is determined. In this way the large language model is monitored through the verification index, ensuring its normal operation. The verification data set may be a manually labeled data set, a data set obtained by classifying text data based on the determined classification features, or a data set whose class labels are determined in other manners.
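The verification-index monitoring after each batch can be sketched as a simple averaged metric over the verification data set; the callable interface below is an assumption, not the method's fixed API:

```python
def monitor_validation(generate, verification_set, metric):
    """generate(text) -> predicted dialogue output text (the model under test);
    verification_set: list of (verification_text, reference_output);
    metric(prediction, reference) -> score, e.g. in [0, 1]."""
    scores = [metric(generate(text), ref) for text, ref in verification_set]
    return sum(scores) / len(scores)   # the verification index value
```

In practice the metric could be perplexity, exact match, or any task-specific score; the sketch only shows where the check sits in the loop.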
Based on the above technical solution, retraining the proxy model of the current stage according to the collation result to obtain a retrained proxy model of the current stage, including: clustering the classified features after each class label is checked and the classified features determined in the previous stage to obtain updated classified features of each class label; classifying the training texts in the current training data set according to the updated classification characteristics to obtain a third classification result; and retraining the proxy model of the current stage according to the third classification result to obtain the retrained proxy model of the current stage.
Wherein the classification characteristic determined by one classification label in the previous stage is the classification characteristic of the target person after the classification label is checked in the training of the previous stage.
Clustering the proofread classification features of each class label with the classification features determined in the previous stage means integrating, for one class label, its proofread classification features with its classification features from the previous stage to obtain all classification features corresponding to that class label; these are the updated classification features of the class label. The training texts in the current training data set are then classified again based on the updated classification features of the class labels to obtain a third classification result; that is, if a training text contains an updated classification feature of a class label, that class label is determined to be the class label of the training text in the third classification result. The proxy model of the current stage is then retrained based on the third classification result to obtain the retrained proxy model of the current stage.
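One simple reading of this step reduces the "clustering" to a per-label union of feature sets, followed by reclassification; the names and the keyword-style features below are illustrative assumptions, and a real implementation could use an actual clustering algorithm over feature embeddings:

```python
def update_features(proofread_feats, previous_feats):
    """Integrate, per class label, the proofread features with the previous
    stage's features to obtain the updated classification features."""
    labels = set(proofread_feats) | set(previous_feats)
    return {lab: set(proofread_feats.get(lab, ())) | set(previous_feats.get(lab, ()))
            for lab in labels}

def third_classification(texts, updated_features):
    """A text receives a class label if it contains any of that label's
    updated classification features (word-level match, for illustration)."""
    result = {}
    for text in texts:
        words = set(text.lower().split())
        for label, feats in updated_features.items():
            if words & feats:
                result[text] = label
                break
    return result
```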
By clustering the classification features of the class labels and retraining the agent model based on the updated classification features, the way the training texts are divided can be continuously updated according to the requirements, so that the large language model can adapt to continuously changing tasks and requirements during training; in addition, the accuracy and quality of the training texts can be improved through data correction, so that the model can better understand the training texts.
On the basis of the technical scheme, in the current stage of training the large language model, classifying the training texts in the current training dataset according to the text attributes of the training texts through the proxy model trained in the previous stage to obtain a first classification result, wherein the method comprises the following steps: and when the current stage is the first stage, classifying the training texts in the current training data set through the initial classification characteristics of each class label to obtain the first classification result.
The initial classification features can be used as seeds to guide the preliminary training process: the required class labels and the initial classification features corresponding to each class label are preset based on the source, type and difficulty of the training texts.
In the first stage of training of the large language model, the training texts in the current training data set are classified based on the initial classification features of each class label: the classification features in each training text are extracted, and if the classification features of a training text include an initial classification feature of a class label, it can be determined that the training text belongs to the class corresponding to that class label. The class label corresponding to each training text is thereby obtained, yielding the first classification result.
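A sketch of this first-stage seed classification follows; the concrete class labels and seed features are invented for illustration only:

```python
# Hypothetical seed: preset class labels with initial classification features.
SEED_FEATURES = {
    "encyclopedic": {"history", "definition", "born"},
    "code":         {"def", "class", "import"},
}

def first_stage_classify(texts, seed=SEED_FEATURES, default="other"):
    """Label each text by the first class whose seed features it contains."""
    result = {}
    for text in texts:
        tokens = set(text.lower().split())
        result[text] = next((lab for lab, feats in seed.items() if tokens & feats),
                            default)
    return result
```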
Through the initial classification characteristic, a clear initial direction is provided for training of a large language model, the uncertainty of the model can be reduced, and the training efficiency is improved.
The method introduces a dynamic data partitioning strategy: a part of seed data (text data classified based on the initial classification features) can be initialized according to different data sources, data types and data difficulties; the partitioning of the training texts is then gradually updated from the seed data (through the retraining of the proxy model), and the continued training (continue training) of the large language model is carried out stage by stage, so that the large language model can dynamically adjust its training texts according to the specific tasks and requirements, achieving better adaptability and performance.
On the basis of the above technical solution, the classifying, by the proxy model in the current stage, the training text in the current training dataset according to the text attribute of the training text to obtain a second classification result includes: randomly sampling training texts in the current training data set; and classifying the sampled training texts through the proxy model of the current stage to obtain the second classification result.
And randomly sampling training texts in a current training data set used for training the large language model of the current stage, and classifying the sampled training texts by using the proxy model of the current stage to obtain a second classification result.
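Sketched with assumed names, the sampling-then-classifying step that produces the second classification result is simply:

```python
import random

def sample_and_classify(current_dataset, proxy_predict, k, seed=0):
    """Randomly sample k training texts and classify them with the current-stage
    proxy model, yielding a small second classification result that is easy to
    proofread manually."""
    rng = random.Random(seed)
    sampled = rng.sample(current_dataset, min(k, len(current_dataset)))
    return {text: proxy_predict(text) for text in sampled}
```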
By classifying only the sampled training texts through the agent model to obtain the second classification result, the amount of data in the second classification result is small, which makes proofreading, and in particular manual proofreading, convenient, and can improve the training efficiency of the model; in addition, the accuracy and quality of the training texts can be improved through the label calibration of the agent model and the proofreading.
FIG. 2 is a schematic diagram of training a large language model in an embodiment of the disclosure. As shown in FIG. 2, pre-training initialization is performed on the large language model; the training texts in a large-scale training corpus are classified based on the initial classification features (the data partitioning manner) to obtain a first classification result; training batches are constructed based on the first classification result, and the first-stage training of the large language model is performed based on the training batches. A part of the training texts is extracted from the first classification result to train the proxy model; the training texts are classified by the trained proxy model; label calibration is performed on the classification result of the proxy model through manual intervention; and the pre-training of subsequent stages is performed with the calibrated training texts. Through this cyclic repetition, the model can converge better.
In the embodiments of the disclosure, more active-learning ideas are introduced into the pre-training process of the large language model, allowing manual intervention and the agent model to intervene in each stage of training. More accurate model training can thus be provided without relying on conventional reinforcement learning or automated methods, and pre-training can even achieve almost real-time strong manual feedback: manual intervention can directly influence the learning of the model so that the model better fits specific tasks and requirements, each gradient of the model can be back-propagated in the direction of the label differences, and batch training can be performed continuously. The embodiments of the disclosure break the strong restriction of data labels in the pre-training process and allow personalized and customized training texts to be provided to the model according to a broad partitioning manner, so that the model can better adapt to different business requirements and tasks, improving the practical applicability of the model. Through dynamic data partitioning and label calibration, each stage of continued training (continue train) in pre-training can be updated in the direction of the optimal gradient, which improves the quality of the training texts, enables the model to understand and learn the text data more accurately, and improves the performance and accuracy of the model.
FIG. 3 is a block diagram of a training apparatus for a large language model, according to an example embodiment. Referring to FIG. 3, the apparatus includes a first classification module 31, a large model training module 32, a proxy model training module 33, a second classification module 34, a collation module 35, a proxy model retraining module 36, and an iteration control module 37.
The first classification module 31 is configured to perform classification on training texts in a current training dataset according to text attributes of training texts by a proxy model trained in a previous stage in a current stage of training the large language model, so as to obtain a first classification result, wherein the current training dataset is a dataset for training the large language model in the current stage, and the first classification result comprises a first corresponding relation between the training texts and class labels;
The big model training module 32 is configured to input the training text into the large language model trained in the previous stage according to the first classification result, predict the dialogue output text corresponding to the training text through the large language model, and perform the current-stage training on the large language model according to the training text and the dialogue output text;
The proxy model training module 33 is configured to perform continuous training on the proxy model trained in the previous stage according to the first classification result, so as to obtain a proxy model in the current stage;
The second classification module 34 is configured to perform classification of the training text in the current training dataset according to the text attribute of the training text through the proxy model of the current stage, so as to obtain a second classification result;
the collation module 35 is configured to execute collation of the second classification result, resulting in a collation result;
The proxy model retraining module 36 is configured to perform retraining of the proxy model of the current stage according to the collation results to obtain a retrained proxy model of the current stage, the retrained proxy model of the current stage being used to classify training text in a next stage;
The iteration control module 37 is configured to take the next stage as the current stage of the large language model when the large language model is not trained, and to iteratively perform the above steps until the large language model is trained.
Optionally, the proxy model training module is configured to perform:
And extracting training texts from the current training data set, and continuously training the agent model trained in the previous stage based on the extracted training texts and class labels of the extracted training texts in the first classification result.
Optionally, the large model training module includes:
a training batch construction unit configured to form the training texts in the current training data set into a plurality of training batches according to a preset class proportion and the class label of each training text in the first classification result, wherein each training batch comprises a plurality of training texts;
and a large model training unit configured to sequentially input the training texts of each training batch into the large language model trained in the previous stage, predict the dialogue output text corresponding to each training text in the training batch through the large language model, and perform the current-stage training on the large language model trained in the previous stage according to the training texts and the dialogue output texts in the training batch.
Optionally, the large model training module further includes:
and a verification module configured to determine, after the large language model is trained on each training batch, the verification index value of the large language model trained on that training batch according to a verification data set.
Optionally, the second classification result includes a second correspondence between the training text and a class label, and a classification feature of each class label; the classification features of the class labels are obtained by extracting features of training texts under the class labels in the second classification result;
the proofreading result comprises a second corresponding relation between each training text and the class label after proofreading, and classification characteristics after proofreading of each class label.
Optionally, the proxy model retraining module includes:
A feature clustering unit configured to perform clustering on the classification feature of each class label after the calibration and the classification feature determined in the previous stage, to obtain an updated classification feature of each class label;
The data classification unit is configured to perform classification on the training texts in the current training data set according to the updated classification characteristics to obtain a third classification result;
And the retraining unit is configured to retrain the proxy model of the current stage according to the third classification result to obtain the retrained proxy model of the current stage.
Optionally, the first classification module includes:
and the initial classification unit is configured to perform classification on the training text in the current training data set according to the initial classification characteristics of each class label when the current stage is the first stage, so as to obtain the first classification result.
Optionally, the second classification module includes:
a sampling unit configured to perform random sampling of training text in the current training data set;
and the second classification unit is configured to perform classification on the training text obtained by sampling through the proxy model of the current stage, and obtain the second classification result.
The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method, and will not be described in detail here.
Fig. 4 is a block diagram of an electronic device, according to an example embodiment. For example, the electronic device 400 may be provided as a server or the like. Referring to fig. 4, electronic device 400 includes a processing component 422 that further includes one or more processors, and memory resources represented by memory 432, for storing instructions, such as applications, executable by processing component 422. The application program stored in memory 432 may include one or more modules each corresponding to a set of instructions. In addition, the processing component 422 is configured to execute instructions to perform the training method of the large language model described above.
The electronic device 400 may also include a power component 426 configured to perform power management of the electronic device 400, a wired or wireless network interface 450 configured to connect the electronic device 400 to a network, and an input/output (I/O) interface 458. The electronic device 400 may operate based on an operating system stored in the memory 432, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a computer-readable storage medium is also provided, such as the memory 432 comprising instructions executable by the processing component 422 of the electronic device 400 to perform the training method of the large language model described above. Alternatively, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program product is also provided, comprising a computer program or computer instructions which, when executed by a processor, implement the above-described training method of a large language model.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (12)

1. A method for training a large language model, comprising:
In a current stage of training a large language model, classifying training texts in a current training data set according to text attributes of training texts through a proxy model trained in a previous stage to obtain a first classification result, wherein the current training data set is a data set for training the large language model in the current stage, and the first classification result comprises a first corresponding relation between the training texts and category labels;
according to the first classification result, inputting the training text into the large language model trained in the previous stage, predicting a dialogue output text corresponding to the training text through the large language model, and training the large language model in the current stage according to the training text and the dialogue output text;
Continuously training the agent model trained in the previous stage according to the first classification result to obtain an agent model in the current stage;
classifying the training texts in the current training data set according to the text attribute of the training texts through the proxy model of the current stage to obtain a second classification result;
checking the second classification result to obtain a checking result;
retraining the proxy model of the current stage according to the checking result to obtain a retrained proxy model of the current stage, wherein the retrained proxy model of the current stage is used for classifying training texts in the next stage;
And when the large language model is not trained, taking the next stage as the current stage of the large language model, and iteratively executing the steps until the large language model is trained.
2. The method of claim 1, wherein the continuing training of the previously trained proxy model based on the first classification result comprises:
And extracting training texts from the current training data set, and continuously training the agent model trained in the previous stage based on the extracted training texts and class labels of the extracted training texts in the first classification result.
3. The method according to claim 1, wherein the inputting the training text into the large language model trained in the previous stage according to the first classification result, predicting the dialogue output text corresponding to the training text through the large language model, and performing the training of the current stage on the large language model according to the training text and the dialogue output text includes:
according to a preset class proportion and class labels of each training text in the first classification result, training texts in the current training data set are formed into a plurality of training batches, and each training batch comprises a plurality of training texts;
And sequentially inputting the training texts of each training batch into the large language model trained in the previous stage, predicting dialogue output texts corresponding to the training texts in the training batch through the large language model, and training the large language model trained in the previous stage according to the training texts and the dialogue output texts in the training batch.
4. The method according to claim 3, wherein the inputting the training text into the large language model trained in the previous stage according to the first classification result, predicting the dialogue output text corresponding to the training text through the large language model, and performing the training in the current stage on the large language model according to the training text and the dialogue output text, further comprises:
After training the training batch of the large language model according to each training batch, determining the verification index value of the large language model trained by the training batch according to the verification data set.
5. The method of any one of claims 1-4, wherein the second classification result comprises a second correspondence of the training text to category labels, and a classification feature for each of the category labels; the classification features of the class labels are obtained by extracting features of training texts under the class labels in the second classification result;
the proofreading result comprises a second corresponding relation between each training text and the class label after proofreading, and classification characteristics after proofreading of each class label.
6. The method of claim 5, wherein retraining the proxy model of the current stage based on the collation results to obtain a retrained proxy model of the current stage, comprises:
Clustering the classified features after each class label is checked and the classified features determined in the previous stage to obtain updated classified features of each class label;
classifying the training texts in the current training data set according to the updated classification characteristics to obtain a third classification result;
And retraining the proxy model of the current stage according to the third classification result to obtain the retrained proxy model of the current stage.
7. The method according to any one of claims 1-4, wherein in the current stage of training the large language model, classifying the training text in the current training dataset according to the text attribute of the training text by the proxy model trained in the previous stage to obtain a first classification result, including:
And when the current stage is the first stage, classifying the training texts in the current training data set through the initial classification characteristics of each class label to obtain the first classification result.
8. The method according to any one of claims 1-4, wherein classifying the training text in the current training dataset according to the text attribute of the training text by the proxy model of the current stage to obtain a second classification result, including:
randomly sampling training texts in the current training data set;
And classifying the sampled training texts through the proxy model of the current stage to obtain the second classification result.
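The sampling-then-classifying step of claim 8 can be sketched as below; `proxy_classify` is a hypothetical callable standing in for the current-stage proxy model, and the fixed seed is only for reproducibility of the sketch.

```python
import random

def sample_and_classify(train_texts, proxy_classify, k, seed=0):
    """Randomly sample k training texts from the current training data set
    and classify the sample with the current-stage proxy model, forming
    the second classification result for subsequent proofreading."""
    rng = random.Random(seed)
    sampled = rng.sample(train_texts, min(k, len(train_texts)))
    return {text: proxy_classify(text) for text in sampled}
```

Classifying only a random sample keeps the human-in-the-loop proofreading of the second classification result tractable on a large training data set.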
9. A training device for a large language model, comprising:
The first classification module is configured to classify, in a current stage of training the large language model, the training texts in a current training data set according to text attributes of the training texts through the proxy model trained in the previous stage to obtain a first classification result, wherein the current training data set is used for training the large language model in the current stage, and the first classification result comprises a first correspondence between the training texts and class labels;
the large model training module is configured to input the training texts into the large language model trained in the previous stage according to the first classification result, predict dialogue output texts corresponding to the training texts through the large language model, and train the large language model in the current stage according to the training texts and the dialogue output texts;
the proxy model training module is configured to continue training the proxy model trained in the previous stage according to the first classification result to obtain the proxy model of the current stage;
the second classification module is configured to classify the training texts in the current training data set according to the text attributes of the training texts through the proxy model of the current stage to obtain a second classification result;
the proofreading module is configured to proofread the second classification result to obtain a proofreading result;
the proxy model retraining module is configured to retrain the proxy model of the current stage according to the proofreading result to obtain the retrained proxy model of the current stage, the retrained proxy model of the current stage being used for classifying training texts in the next stage;
and the iteration control module is configured to, when training of the large language model is not completed, take the next stage as the current stage of training the large language model and iteratively execute the above steps until training of the large language model is completed.
10. An electronic device, comprising:
A processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the training method of the large language model of any one of claims 1 to 8.
11. A computer readable storage medium storing instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the training method of the large language model of any one of claims 1 to 8.
12. A computer program product comprising a computer program which when executed by a processor implements a method of training a large language model as claimed in any one of claims 1 to 8.
CN202410275666.1A 2024-03-11 2024-03-11 Training method and device for large language model, electronic equipment and storage medium Pending CN118211065A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410275666.1A CN118211065A (en) 2024-03-11 2024-03-11 Training method and device for large language model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410275666.1A CN118211065A (en) 2024-03-11 2024-03-11 Training method and device for large language model, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN118211065A true CN118211065A (en) 2024-06-18

Family

ID=91456690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410275666.1A Pending CN118211065A (en) 2024-03-11 2024-03-11 Training method and device for large language model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118211065A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118711168A (en) * 2024-08-29 2024-09-27 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Multi-mode pre-training method and system based on non-annotation network video data


Similar Documents

Publication Publication Date Title
CN111026842B (en) Natural language processing method, natural language processing device and intelligent question-answering system
US20180158449A1 (en) Method and device for waking up via speech based on artificial intelligence
CN112905795A (en) Text intention classification method, device and readable medium
CN107562863A (en) Chat robots reply automatic generation method and system
CN112069801B (en) Sentence trunk extraction method, device and readable storage medium based on dependency syntax
CN112508334A (en) Personalized paper combining method and system integrating cognitive characteristics and test question text information
CN111259647A (en) Question and answer text matching method, device, medium and electronic equipment based on artificial intelligence
WO2023137911A1 (en) Intention classification method and apparatus based on small-sample corpus, and computer device
CN115225516B (en) LSSVM network flow prediction method based on improved ABC-VMD
CN117151338A (en) Multi-unmanned aerial vehicle task planning method based on large language model
CN118093834B (en) AIGC large model-based language processing question-answering system and method
CN111126552A (en) Intelligent learning content pushing method and system
CN118211065A (en) Training method and device for large language model, electronic equipment and storage medium
CN117291185A (en) Task processing method, entity identification method and task processing data processing method
CN111737438B (en) Data processing method and device based on text similarity and electronic equipment
CN118170668A (en) Test case generation method, device, storage medium and equipment
CN114048296A (en) Semantic gate-based chatting type multi-round conversation method, system, medium and equipment
CN118228694A (en) Method and system for realizing industrial industry number intelligence based on artificial intelligence
CN114117069B (en) Semantic understanding method and system for intelligent knowledge graph questions and answers
CN116401364A (en) Language model training method, electronic device, storage medium and product
CN115658921A (en) Open domain scientific knowledge discovery method and device based on pre-training language model
CN116991982B (en) Interactive dialogue method, device, equipment and storage medium based on artificial intelligence
CN118195032B (en) Large model automatic evolution system and method with active learning capability
CN116306917B (en) Task processing method, device, equipment and computer storage medium
CN118444789B (en) AI intelligent robot interaction method and system based on large language model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination