CN117094361A - Method for selecting a parameter-efficient fine-tuning module - Google Patents
Method for selecting a parameter-efficient fine-tuning module
- Publication number
- CN117094361A (Application No. CN202311352064.3A)
- Authority
- CN
- China
- Prior art keywords
- hidden state
- input sample
- parameter
- final
- efficient fine
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 38
- 238000012512 characterization method Methods 0.000 claims abstract description 12
- 238000009966 trimming Methods 0.000 claims description 2
- 230000008569 process Effects 0.000 abstract description 5
- 230000009466 transformation Effects 0.000 description 4
- 230000000694 effects Effects 0.000 description 3
- 230000006870 function Effects 0.000 description 2
- 239000011159 matrix material Substances 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
Abstract
The invention provides a method for selecting a parameter-efficient fine-tuning module, which comprises the following steps: acquiring a candidate parameter-efficient fine-tuning module and a final hidden state of an input sample; constructing a parameter-efficient fine-tuning super-network and obtaining a final characterization of the input sample according to the final hidden state of the input sample, where the final characterization of the input sample is the product of a learning coefficient and the final hidden state of the input sample; and judging whether the learning coefficient is larger than a threshold value: if so, the candidate parameter-efficient fine-tuning module is selected; if not, it is discarded. The invention solves the problems of limited flexibility and high training cost of large-scale language model fine-tuning methods in the prior art.
Description
Technical Field
The invention relates to the technical field of language models, and in particular to a method for selecting a parameter-efficient fine-tuning module.
Background
Existing large-scale language models, though increasingly powerful, exhibit a certain general learning capability: by observing a few groups of examples, they can complete tasks they have never seen before to some extent. However, to support the different needs of customers in different application scenarios, it may not be possible to run exactly the same model for everyone, so the model may need to be customized on customer data. For example, a client may have privacy-protection requirements, so the input data is dialogue data that has been encrypted and whose text may look completely different; the model then needs to be customized in order to understand the dialogue content and formulate reply content. In order to use one large model base to meet the requirements of different customization tasks, a parameter-efficient fine-tuning (PEFT) method is needed.
Existing methods use only a single parameter-efficient fine-tuning method to fine-tune a large model for a given task, and the training process is computationally expensive.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a method for selecting a parameter-efficient fine-tuning module, solving the problems of limited flexibility and high training cost of large-scale language model fine-tuning methods in the prior art.
In order to achieve the above object, the present invention provides the following solutions:
A method of selecting a parameter-efficient fine-tuning module, comprising:
acquiring a candidate parameter-efficient fine-tuning module and a final hidden state of an input sample;
constructing a parameter-efficient fine-tuning super-network and obtaining a final characterization of the input sample according to the final hidden state of the input sample, where the final characterization of the input sample is the product of a learning coefficient and the final hidden state of the input sample;
judging whether the learning coefficient is larger than a threshold value: if so, the candidate parameter-efficient fine-tuning module is selected; if not, it is discarded.
Preferably, acquiring the candidate parameter-efficient fine-tuning module and the final hidden state of the input sample includes:
acquiring a first hidden state of the input sample;
obtaining a second hidden state of the input sample according to the first hidden state and a partial operation of the transformer layer;
obtaining a third hidden state of the input sample according to the second hidden state and the candidate parameter-efficient fine-tuning module;
and obtaining the final hidden state of the input sample according to the remaining operation of the transformer layer and the third hidden state of the input sample.
Preferably, the final hidden state of the input sample is expressed as:

$h'_i = f_2\big(m_i(f_1(h))\big)$

where $m_i$ is the candidate parameter-efficient fine-tuning module, $h$ is the first hidden state, $h'_i$ is the final hidden state, $f_1$ is the first functional expression (the transformer-layer operations before the module), and $f_2$ is the second functional expression (the remaining transformer-layer operations).
Preferably, the final characterization of the input sample is given by:

$\tilde{h}_i = a_i \cdot h'_i$

where $a_i$ is the learning coefficient.
Preferably, the learning coefficient is determined by a bernoulli random number, and the bernoulli random number is 0 or 1, wherein the probability of the bernoulli random number being 1 is 0.5.
According to the specific embodiments provided by the invention, the invention discloses the following technical effects:
The invention provides a method for selecting a parameter-efficient fine-tuning module. By setting up a parameter-efficient fine-tuning super-network, multiple parameter-efficient fine-tuning methods are allowed to be used at the same time, and redundant parameter-efficient fine-tuning modules are eliminated according to the final hidden state of the input sample and a judgment on the learning coefficients. This reduces the training cost of the language model and lets different transformer layers select their own parameter-efficient fine-tuning modules, thereby obtaining better results.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings needed in the embodiments are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
FIG. 1 is a flowchart of a method for selecting a parameter-efficient fine-tuning module according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of three known parameter-efficient fine-tuning methods according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without inventive effort fall within the scope of protection of the present invention.
The invention aims to provide a method for selecting a parameter-efficient fine-tuning module, which solves the problems of limited flexibility and high training cost of large-scale language model fine-tuning methods in the prior art.
In order that the above objects, features, and advantages of the present invention can be more readily understood, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the present invention provides a method for selecting a parameter-efficient fine-tuning module, which includes:
Step 100: acquiring a candidate parameter-efficient fine-tuning module and a final hidden state of an input sample;
Step 200: constructing a parameter-efficient fine-tuning super-network and obtaining a final characterization of the input sample according to the final hidden state of the input sample, where the final characterization of the input sample is the product of a learning coefficient and the final hidden state of the input sample;
Step 300: judging whether the learning coefficient is larger than a threshold value: if so, the candidate parameter-efficient fine-tuning module is selected; if not, it is discarded.
Specifically, assume that we already have a large-scale language model. This model may be one we pre-trained ourselves or an open-source model such as ChatGLM-6B; such models typically contain billions of parameters or more. When the model is trained, the pre-trained backbone is left unchanged, and only the parameters of the attached PEFT modules are updated.
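For illustration, a minimal PyTorch-style sketch of this setup is given below; the `peft` name filter and the `freeze_backbone` helper are assumptions introduced here for clarity, not part of the disclosed method.

```python
import torch

def freeze_backbone(model: torch.nn.Module, peft_keyword: str = "peft") -> None:
    """Freeze the pre-trained backbone; keep only (assumed) PEFT parameters trainable."""
    for name, param in model.named_parameters():
        # Parameters whose names contain the assumed PEFT keyword stay trainable;
        # everything belonging to the pre-trained backbone is frozen.
        param.requires_grad = peft_keyword in name

# Usage sketch: only the trainable (PEFT) parameters are handed to the optimizer.
# freeze_backbone(model)
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```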
Further, acquiring the candidate parameter-efficient fine-tuning module and the final hidden state of the input sample includes:
acquiring a first hidden state of the input sample;
obtaining a second hidden state of the input sample according to the first hidden state and a partial operation of the transformer layer;
obtaining a third hidden state of the input sample according to the second hidden state and the candidate parameter-efficient fine-tuning module;
and obtaining the final hidden state of the input sample according to the remaining operation of the transformer layer and the third hidden state of the input sample.
Specifically, assume that for one candidate parameter-efficient fine-tuning module $m_i$, the hidden state of the input sample before entering the transformer layer is $h$ (the first hidden state). The operations of the transformer layer that are applied before reaching the position where $m_i$ is inserted are recorded as a function $f_1$; passing $h$ through $f_1$ yields the second hidden state $h_1 = f_1(h)$. The module $m_i$ then transforms $h_1$ into the third hidden state $h_2 = m_i(h_1)$. The remaining operations of the transformer layer, denoted as a function $f_2$, finally produce the final hidden state $h'_i = f_2(h_2)$.
The final hidden state of the input sample is therefore expressed as:

$h'_i = f_2\big(m_i(f_1(h))\big)$

where $m_i$ is the candidate parameter-efficient fine-tuning module, $h$ is the first hidden state, $h'_i$ is the final hidden state, $f_1$ is the first functional expression (the transformer-layer operations before the module), and $f_2$ is the second functional expression (the remaining transformer-layer operations).
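The hidden-state flow described above can be sketched as follows; the function names `f1` and `f2` and the `layer_with_peft` wrapper are illustrative assumptions consistent with the formula, not a verbatim implementation of the embodiment.

```python
import torch

def layer_with_peft(h: torch.Tensor, f1, f2,
                    peft_module: torch.nn.Module) -> torch.Tensor:
    """Compute h' = f2(m(f1(h))) for one candidate PEFT module m.

    f1: transformer-layer operations before the insertion point,
    f2: remaining transformer-layer operations.
    """
    h1 = f1(h)              # second hidden state
    h2 = peft_module(h1)    # third hidden state, produced by the candidate module
    return f2(h2)           # final hidden state
```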
This embodiment considers three known parameter-efficient fine-tuning methods, as shown in Fig. 2: LoRA, Prefix tuning, and Adapter tuning. LoRA modifies a parameter matrix through a trainable low-rank update; Prefix tuning splices randomly initialized trainable vectors onto the vector characterization of the sample; Adapter tuning modifies the hidden state output by each transformer layer.
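As a concrete example of one of these methods, a minimal LoRA-style module might look like the sketch below; the rank, scaling factor, and initialization are illustrative choices and are not fixed by this disclosure.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer W plus a trainable low-rank update B @ A (LoRA sketch)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # the pre-trained weight stays frozen
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W x  +  (B A) x * scaling, with only A and B trainable.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling
```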
Specifically, we first set up a parameter-efficient fine-tuning super-network; that is, we allow all six parameter-efficient fine-tuning modules in Fig. 2 (two Adapter positions, two Prefix positions, and two LoRA positions) to be used at the same time. The final characterization of the input sample (the final representation of the sample after passing through the transformer layer) is then:

$\tilde{h} = \sum_i a_i \cdot h'_i$

where $a_i$ is a learning coefficient whose value lies between 0 and 1 (a real-valued parameter passed through a sigmoid), i.e., each candidate module's contribution to the final representation is weighted by its learning coefficient. What we finally want to achieve is: given a threshold $\tau$, if $a_i$ is greater than $\tau$, the module $m_i$ is kept; otherwise $m_i$ is removed. Thus, by adjusting the threshold, redundant parameter-efficient fine-tuning modules can be eliminated.
The transformer layers of the pre-trained model act as a backbone, and the parameter-efficient fine-tuning modules are add-ons attached to that backbone. Both process and transform the hidden state of the sample so that it becomes better suited to producing useful outputs, such as the category of a sentence or how the next sentence should be written.
Further, the learning coefficient is determined by a Bernoulli random number, the Bernoulli random number is 0 or 1, and the probability of the Bernoulli random number being 1 is 0.5.
Specifically, consider the learning parameters $a_i$. These parameters are treated as part of the model parameters and are learned together with the parameters of the parameter-efficient fine-tuning modules; after training, a value of $a_i$ close to 0 indicates that the corresponding PEFT module is less important for the customization task. Although the $a_i$ parameters can be learned directly in this way, there is a problem: we ultimately want to screen out the less important PEFT modules, but the combination of PEFT modules used during training differs from the combination used in the end, so training $a_i$ directly leaves some of the modules insufficiently trained. As a result, the importance learned by the $a_i$ parameters may be inaccurate. To ensure that the $a_i$ parameters can truly express the importance of the PEFT modules, we propose the following regularization method. On each forward propagation, for each parameter $a_i$ we randomly sample a Bernoulli random number $z_i$; this random number takes the value 0 or 1, with probability $p = 0.5$ of being 1. The effective coefficient is then the product $z_i \cdot a_i$, so whether module $m_i$ is used in that forward pass is determined by $z_i$. Because of the randomness of the Bernoulli numbers, each forward propagation activates a different set of parameter-efficient fine-tuning modules and therefore produces a different result. The hidden states obtained from two different forward propagations are denoted $\tilde{h}^{(1)}$ and $\tilde{h}^{(2)}$. We believe the super-network must guarantee that the sample semantic representation it provides is sufficiently stable when different PEFT modules are used, because different transformer layers at different positions of the model should be able to select different PEFT modules. In this way the different PEFT modules are trained sufficiently, the coefficients $a_i$ are learned jointly through the regularization term described below and the training-data loss, and any module whose final $a_i$ is small is removed. We therefore add an additional regularization term on top of the training loss:
the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Is a regular term
That is, the requirement isAnd->Is as small as possible. By the constraint of this regularization term, part of the modules in our PEFT super-network can be trained sufficiently so that the corresponding +.>The parameters may truly reflect their importance.
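A sketch of the stochastic masking and the consistency regularizer is shown below; the squared-distance form of the regularizer and the `lambda_reg` weighting are assumptions, since the disclosure only requires the difference between the two hidden states to be small.

```python
import torch

def masked_gates(alpha: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """Effective gates: a_i = sigmoid(alpha_i) multiplied by a Bernoulli(p) mask z_i."""
    z = torch.bernoulli(torch.full_like(alpha, p))
    return z * torch.sigmoid(alpha)

def consistency_regularizer(h_run1: torch.Tensor, h_run2: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between hidden states from two stochastic forward passes."""
    return (h_run1 - h_run2).pow(2).mean()

# Training-loss sketch (lambda_reg is an assumed weighting hyper-parameter):
# loss = task_loss + lambda_reg * consistency_regularizer(h_run1, h_run2)
```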
The beneficial effects of the invention are as follows:
the invention provides a method for selecting a parameter efficient fine-tuning module, which allows a plurality of parameter fine-tuning methods to be used by setting a parameter efficient fine-tuning super network, and eliminates redundant parameter efficient fine-tuning modules according to the final hidden state of an input sample and judgment of learning parameters, so that training consumption of a language model is reduced, and different transducer layers select corresponding parameter efficient fine-tuning modules to obtain better effects.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
Specific examples are used herein to explain the principles and implementations of the present invention; the above description of the embodiments is only intended to help understand the method of the present invention and its core ideas. Meanwhile, a person of ordinary skill in the art may make modifications to the specific implementations and application scope in light of the ideas of the present invention. In view of the foregoing, the content of this specification should not be construed as limiting the invention.
Claims (5)
1. A method of selecting a parameter-efficient fine-tuning module, comprising:
acquiring a candidate parameter-efficient fine-tuning module and a final hidden state of an input sample;
constructing a parameter-efficient fine-tuning super-network and obtaining a final characterization of the input sample according to the final hidden state of the input sample, the final characterization of the input sample being the product of a learning coefficient and the final hidden state of the input sample;
judging whether the learning coefficient is larger than a threshold value: if so, selecting the candidate parameter-efficient fine-tuning module, and if not, discarding the candidate parameter-efficient fine-tuning module.
2. The method of claim 1, wherein acquiring the candidate parameter-efficient fine-tuning module and the final hidden state of the input sample comprises:
acquiring a first hidden state of the input sample;
obtaining a second hidden state of the input sample according to the first hidden state and a partial operation of a transformer layer;
obtaining a third hidden state of the input sample according to the second hidden state and the candidate parameter-efficient fine-tuning module;
and obtaining the final hidden state of the input sample according to a remaining operation of the transformer layer and the third hidden state of the input sample.
3. The method of claim 1, wherein the final hidden state of the input sample is expressed as:

$h'_i = f_2\big(m_i(f_1(h))\big)$

wherein $m_i$ is the candidate parameter-efficient fine-tuning module, $h$ is the first hidden state, $h'_i$ is the final hidden state, $f_1$ is a first functional expression (the transformer-layer operations before the module), and $f_2$ is a second functional expression (the remaining transformer-layer operations).
4. The method of claim 3, wherein the final characterization of the input sample is given by:

$\tilde{h}_i = a_i \cdot h'_i$

wherein $a_i$ is the learning coefficient.
5. The method of claim 1, wherein the learning coefficient is determined by a Bernoulli random number, the Bernoulli random number takes the value 0 or 1, and the probability of the Bernoulli random number being 1 is 0.5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311352064.3A CN117094361B (en) | 2023-10-19 | 2023-10-19 | Method for selecting parameter efficient fine adjustment module |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311352064.3A CN117094361B (en) | 2023-10-19 | 2023-10-19 | Method for selecting parameter efficient fine adjustment module |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117094361A (en) | 2023-11-21
CN117094361B (en) | 2024-01-26
Family
ID=88772147
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311352064.3A Active CN117094361B (en) | 2023-10-19 | 2023-10-19 | Method for selecting parameter efficient fine adjustment module |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117094361B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200364574A1 (en) * | 2019-05-16 | 2020-11-19 | Samsung Electronics Co., Ltd. | Neural network model apparatus and compressing method of neural network model |
CN110580543A (en) * | 2019-08-06 | 2019-12-17 | 天津大学 | Power load prediction method and system based on deep belief network |
KR20220124389A (en) * | 2021-03-03 | 2022-09-14 | 에스케이 주식회사 | Method and provision system for finetuned model service using pretrain model |
CN114357172A (en) * | 2022-01-07 | 2022-04-15 | 北京邮电大学 | Rumor detection method based on ERNIE-BiGRU-Attention |
CN114676234A (en) * | 2022-02-22 | 2022-06-28 | 华为技术有限公司 | Model training method and related equipment |
CN116882474A (en) * | 2023-07-18 | 2023-10-13 | 平安科技(深圳)有限公司 | Fine tuning method, device, equipment and medium of pre-training model |
Non-Patent Citations (6)
Title |
---|
JIAMING HAN 等: "ImageBind-LLM: Multi-modality Instruction Tuning", 《ARXIV》, pages 1 - 24 * |
TAO JIANG 等: "Gaseous emission during the composting of pig feces from Chinese Ganqinfen system", 《CHEMOSPHERE》, vol. 90, no. 4, pages 1545 - 1551 * |
ZINIU LI 等: "ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models", 《ARXIV》, pages 1 - 20 * |
徐峰 等: "综合模块选择, 资源共享与任务调度的SoC设计方案搜索算法", 《计算机辅助设计与图形学学报》, vol. 21, no. 7, pages 1005 - 1010 * |
王亮亮 等: "基于深度学习的快速车辆检测算法", 《中国优秀硕士学位论文全文数据库工程科技Ⅱ辑(月刊)》, no. 1, pages 034 - 1013 * |
羽林小王子: "一文读懂:LoRA实现大模型LLM微调", pages 1 - 10, Retrieved from the Internet <URL:https://developer.aliyun.com/article/1257855> * |
Also Published As
Publication number | Publication date |
---|---|
CN117094361B (en) | 2024-01-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |