WO2023077819A1 - Data processing system, method and apparatus, and device, storage medium, computer program and computer program product - Google Patents

Data processing system, method and apparatus, and device, storage medium, computer program and computer program product

Info

Publication number
WO2023077819A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
neural network
target
training
training data
Application number
PCT/CN2022/099715
Other languages
French (fr)
Chinese (zh)
Inventor
邵婧
李阳光
王坤
尹榛菲
陈思禹
何逸楠
黄耿石
滕家宁
刘丰刚
孙庆宏
梁鼎
吴一超
高梦雅
刘宇
宋广录
刘吉豪
Original Assignee
上海商汤智能科技有限公司
Application filed by 上海商汤智能科技有限公司
Publication of WO2023077819A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • The present disclosure relates to, but is not limited to, the technical field of artificial intelligence, and in particular to a data processing system and method, and an apparatus, device, storage medium, computer program, and computer program product.
  • General artificial intelligence technology is an important topic in the field of artificial intelligence research. Taking computer vision as an example, a general visual neural network built with general artificial intelligence technology can break through the limitation of a single model targeting a specific computer vision task, and can thus be widely applied to various computer vision tasks, such as image classification, object detection, semantic segmentation, and depth estimation.
  • a method for generating a general visual neural network is provided in the related art.
  • The method uses a general data set to train a classification network, so as to learn a general visual representation through the classification task.
  • Embodiments of the present disclosure at least provide a data processing system and method, and an apparatus, device, storage medium, computer program, and computer program product.
  • An embodiment of the present disclosure provides a data processing system, including a data collection module, a network generation module, and a network training module, where the data collection module, the network generation module, and the network training module are communicatively connected in sequence;
  • the data collection module is configured to obtain a training data set and at least two network composition modules for forming a target neural network;
  • the network generation module is configured to generate at least one target neural network based on the obtained training data set and the at least two network composition modules; each of the target neural networks is used to perform a corresponding target task;
  • The network training module is configured to, when at least two of the target neural networks have been trained, perform joint training on the at least two target neural networks to obtain a trained joint neural network; the joint neural network is used to be migrated to downstream business scenarios to perform the target tasks.
  • With this data processing system, when the training data set and the at least two network composition modules for forming the target neural network are obtained, at least one target neural network can be generated based on the acquired training data set and the at least two network composition modules.
  • Then, when at least two target neural networks have been trained, joint training can be performed on the at least two target neural networks, so as to obtain a trained joint neural network.
  • Based on the basic network composition modules, the present disclosure can generate target neural networks suitable for various target tasks; through joint training, it can then generate a joint neural network adapted to downstream business scenarios, with good versatility and accuracy.
  • An embodiment of the present disclosure also provides a data processing method, including:
  • obtaining a training data set and at least two network composition modules for forming a target neural network; and generating at least one target neural network based on the acquired training data set and the at least two network composition modules, where each target neural network is used to perform a corresponding target task.
  • An embodiment of the present disclosure also provides a data processing device, including:
  • an acquisition module configured to acquire a training data set and at least two network composition modules for forming a target neural network;
  • a generation module configured to generate at least one target neural network based on the acquired training data set and at least two of the network composition modules, where each of the target neural networks is used to perform a corresponding target task.
  • An embodiment of the present disclosure also provides an electronic device, including: a processor, a memory, and a bus.
  • the memory stores machine-readable instructions executable by the processor.
  • The processor and the memory communicate with each other through the bus, and when the machine-readable instructions are executed by the processor, the steps of the data processing method described in any one of the second aspect and its implementations are executed.
  • An embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the data processing method described in any one of the second aspect and its implementations are executed.
  • An embodiment of the present disclosure provides a computer program; the computer program includes computer-readable code, and when the computer-readable code is read and executed by a computer, some or all of the steps of the method in any embodiment of the present disclosure are implemented.
  • An embodiment of the present disclosure provides a computer program product; the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and when the computer program is read and executed by a computer, some or all of the steps of the method in any embodiment of the present disclosure are implemented.
  • FIG. 1 shows a schematic diagram of a data processing system provided by an embodiment of the present disclosure
  • FIG. 2(a) shows a schematic diagram of a downsampling mode in the data processing system provided by an embodiment of the present disclosure;
  • FIG. 2(b) shows a schematic diagram of another downsampling mode in the data processing system provided by an embodiment of the present disclosure;
  • FIG. 2(c) shows a schematic diagram of yet another downsampling mode in the data processing system provided by an embodiment of the present disclosure;
  • FIG. 3 shows a schematic diagram of searching candidate search paths in the data processing system provided by an embodiment of the present disclosure
  • FIG. 4 shows a schematic diagram of pre-training of the target neural network in the data processing system provided by an embodiment of the present disclosure
  • FIG. 5 shows a flowchart of the joint neural network training method in the data processing system provided by an embodiment of the present disclosure;
  • FIG. 6(a) shows a schematic diagram of a connection mode of the first connection layer in the data processing system provided by an embodiment of the present disclosure;
  • FIG. 6(b) shows a schematic diagram of a connection mode of the second connection layer in the data processing system provided by an embodiment of the present disclosure
  • FIG. 6(c) shows a schematic diagram of a connection mode of the third connection layer in the data processing system provided by an embodiment of the present disclosure
  • FIG. 7 shows a schematic diagram of codebook training in the data processing system provided by an embodiment of the present disclosure
  • FIG. 8 shows a flowchart of a data processing method provided by an embodiment of the present disclosure
  • FIG. 9 shows a schematic diagram of a data processing device provided by an embodiment of the present disclosure.
  • FIG. 10 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
  • At present, the construction of general-purpose visual neural networks has not yet produced a set of effective procedures or reliable results.
  • Many existing computer vision technologies are constrained by various factors, making it difficult to achieve the goal of a general-purpose visual neural network.
  • The method uses a general data set to train a classification network, so as to learn a general visual representation through the classification task.
  • In view of this, the present disclosure provides a scheme that searches candidate network paths based on reinforcement learning to realize neural network generation; the scheme achieves remarkable results in both network performance and network versatility.
  • As shown in FIG. 1, the data processing system includes a data acquisition module 101, a network generation module 102 and a network training module 103; the data acquisition module 101, the network generation module 102 and the network training module 103 are communicatively connected in sequence;
  • the data collection module 101 is configured to obtain a training data set and at least two network composition modules for forming a target neural network;
  • the network generation module 102 is configured to generate at least one target neural network based on the acquired training data set and at least two network composition modules; each target neural network is used to perform a corresponding target task;
  • The network training module 103 is configured to, when at least two of the target neural networks have been trained, perform joint training on the at least two target neural networks to obtain a trained joint neural network; the joint neural network is used to be migrated to downstream business scenarios to perform the target task.
  • The data processing system in the embodiments of the present disclosure can be applied to the vision field; for example, the generated target neural network is applied to scenarios such as object detection, image classification, and depth estimation.
  • the embodiments of the present disclosure provide a data processing system that generates a target neural network based on network component modules, and then obtains a joint neural network adapted to various target tasks through joint training.
  • The target neural network here may be determined based on the result of a search, performed by the reinforcement learning network, over at least two candidate search paths associated with the at least two network composition modules.
  • Each candidate search path here corresponds to a specific combination mode; based on this combination mode, the corresponding network composition modules can be combined, and a target neural network that meets the requirements can then be obtained.
  • the network composition module in the embodiment of the present disclosure may include a feature map extraction unit for feature map extraction, and may also include a downsampling unit for downsampling the feature map output by the feature map extraction unit.
  • As shown in FIG. 2(a), the downsampling (Down Sampling Module, DSM) unit may include local downsampling (Local DSM, L-DSM), where the hidden layer 201 in L-DSM is a convolutional layer with a stride of 2.
  • As shown in FIG. 2(b), the DSM unit may also include local-global downsampling (Local-Global DSM, LG-DSM), where the hidden layer 202 in LG-DSM is a two-dimensional convolutional layer with a stride of 2, and the hidden layer 203 is a multi-head attention layer (Multi-Head Attention).
  • As shown in FIG. 2(c), the DSM unit may also include global downsampling (Global DSM, G-DSM), where the hidden layer 204 in G-DSM is a one-dimensional convolutional layer with a stride of 2, and the hidden layer 205 is a multi-head attention layer (Multi-Head Attention).
  • the multi-head attention layer can be used to determine the query vector (Q), key vector (K) and value vector (V).
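  • As an editorial illustration of the three DSM variants above, the following is a minimal PyTorch-style sketch; the class names, channel sizes and head counts are assumptions, not the disclosed implementation.

```python
import torch
import torch.nn as nn

class LocalDSM(nn.Module):
    """L-DSM: purely local downsampling via a stride-2 convolution (hidden layer 201)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)

    def forward(self, x):                        # x: (B, C, H, W)
        return self.conv(x)                      # -> (B, C', H/2, W/2)

class LocalGlobalDSM(nn.Module):
    """LG-DSM: stride-2 2D convolution (hidden layer 202) followed by
    multi-head attention (hidden layer 203) for global context."""
    def __init__(self, in_ch: int, out_ch: int, num_heads: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.attn = nn.MultiheadAttention(out_ch, num_heads, batch_first=True)

    def forward(self, x):
        x = self.conv(x)                         # (B, C', H/2, W/2)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)       # (B, H*W/4, C') token sequence
        out, _ = self.attn(seq, seq, seq)        # Q, K, V all come from the same tokens
        return out.transpose(1, 2).reshape(b, c, h, w)

class GlobalDSM(nn.Module):
    """G-DSM: tokens are downsampled with a stride-2 1D convolution (hidden
    layer 204), then processed by multi-head attention (hidden layer 205)."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, stride=2, padding=1)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens):                   # tokens: (B, N, C)
        x = self.conv(tokens.transpose(1, 2)).transpose(1, 2)  # (B, N/2, C)
        out, _ = self.attn(x, x, x)
        return out
```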
  • a unified search space may be constructed to search for candidate search paths based on the unified search space.
  • a unified search space 301 may be determined by a network composition model 302 , a downsampling unit 303 and a network size 304 .
  • the network composition model 302 may also be called general operations (General Operations, GOP), and may include a convolutional network (Convolution), a natural language processing model (Transformer), and a multilayer perceptron (Multilayer Perceptron, MLP).
  • the downsampling unit (DSM) 303 may include L-DSM, LG-DSM and G-DSM.
  • The network size (Size) 304 may include the number of repeats (Repeats), the number of channels (Channels), the expansion ratio (Expansion), and the like. In the process of searching for candidate search paths, multiple searches may be performed, for example, the first search (N1), the second search (N2), ..., the fifth search (N5), and so on.
  • The above-mentioned feature map extraction unit may be implemented based on a convolution operation, a Transformer architecture, a multi-layer perceptron, or other related units with a feature extraction function, which is not limited here; the above-mentioned downsampling unit may be implemented based on a convolution operation, a multi-head attention mechanism, or other related units with a sampling function, which is also not limited here.
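  • To make the unified search space concrete, the following toy sketch treats it as the product of operator, downsampling and size choices and samples one candidate search path; the value ranges are illustrative assumptions.

```python
import random

# Hypothetical unified search space: operator family (GOP), downsampling
# unit (DSM), and network size parameters. The value ranges are assumptions.
SEARCH_SPACE = {
    "gop":       ["Convolution", "Transformer", "MLP"],
    "dsm":       ["L-DSM", "LG-DSM", "G-DSM"],
    "repeats":   [1, 2, 3, 4],
    "channels":  [64, 128, 256, 512],
    "expansion": [1, 2, 4],
}

def sample_candidate_path(num_stages: int = 4):
    """Sample one candidate search path: a combination mode that fixes,
    stage by stage, which modules are used and how they are sized."""
    return [
        {key: random.choice(values) for key, values in SEARCH_SPACE.items()}
        for _ in range(num_stages)
    ]

path = sample_candidate_path()
print(path[0])  # e.g. {'gop': 'MLP', 'dsm': 'G-DSM', 'repeats': 2, ...}
```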
  • The training data set in the embodiments of the present disclosure may include various training data: for example, training data corresponding to different target tasks, first training data having at least two image-text pairs, second training data having at least two images, or other training data, which is not limited here; the corresponding data may be selected based on different requirements. The above-mentioned various training data may be pre-split, so that the corresponding training data can be quickly extracted for the corresponding network training.
  • the training data set provided in the embodiments of the present disclosure may be high-quality network data screened out based on an active learning network.
  • the network data here may be acquired based on the network input interface, and in some embodiments, the network data may be automatically acquired from the network input interface by means of a web crawler.
  • the training data here may be high-quality network data screened out after evaluating the quality of the acquired network data, and the high-quality training data can ensure the accuracy of network training.
  • The training data set in the embodiments of the present disclosure may also have initial labeling results, and in order to adapt to the training requirements of various networks, the labeling results may be extended. That is to say, the embodiments of the present disclosure provide a large-scale labeling system, in which a knowledge graph structure can be used to expand the initial labeling results to obtain expanded labeling results.
  • the initial labeling results of the training dataset can be extended based on the knowledge graph structure.
  • The extended labeling results can provide more realistic supervision signals for network training, which to a certain extent ensures the accuracy of the trained network.
  • The embodiments of the present disclosure can also determine the final training data and corresponding labeling results through data reorganization and label cleaning; such a labeling system is better suited to the demands of computer vision tasks.
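  • As a toy editorial sketch of the label-extension idea, the snippet below expands an initial label along a small, invented knowledge-graph hierarchy; the graph contents and function names are hypothetical.

```python
# Hypothetical knowledge graph: child label -> parent labels.
KNOWLEDGE_GRAPH = {
    "husky": ["dog"],
    "dog":   ["animal"],
    "sedan": ["car"],
    "car":   ["vehicle"],
}

def expand_labels(initial_labels):
    """Expand initial labeling results by walking up the knowledge graph,
    so an image labeled 'husky' also supervises 'dog' and 'animal'."""
    expanded = set(initial_labels)
    frontier = list(initial_labels)
    while frontier:
        label = frontier.pop()
        for parent in KNOWLEDGE_GRAPH.get(label, []):
            if parent not in expanded:
                expanded.add(parent)
                frontier.append(parent)
    return expanded

print(expand_labels({"husky"}))  # {'husky', 'dog', 'animal'}
```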
  • At least two target neural networks can be jointly trained, so that the trained joint neural network has the task characteristics of each target neural network and can thus better adapt to downstream business scenarios.
  • The downstream business scenarios here can be scenarios related to computer vision, for example, the access control field, the unmanned driving field, and the like.
  • For each target neural network, training can be performed based on the training data corresponding to the current task; for example, a detection neural network can be trained for a detection task, and a classification neural network can be trained for a classification task.
  • In this way, when the training data set and the network composition modules used to form the target neural network are obtained, at least one target neural network can be generated based on the obtained training data set and the network composition modules.
  • Then, when multiple target neural networks have been trained, the multiple target neural networks can be jointly trained to obtain a trained joint neural network.
  • Based on the basic network composition modules, the present disclosure can generate target neural networks suitable for various target tasks; through joint training, it can then generate a joint neural network adapted to downstream business scenarios, with good versatility and accuracy.
  • the network generation module 102 can generate the target neural network for performing the corresponding target task according to the following steps:
  • Step 1: Determine at least two candidate search paths associated with the at least two network composition modules, where each candidate search path corresponds to a combination mode, and the combination mode is used to represent the operational relationship between the network composition modules;
  • Step 2: Use the training data corresponding to the target task and the reinforcement learning network to perform at least one search over the at least two candidate search paths, and obtain a reward score after each search;
  • Step 3: Combine the network composition modules according to the combination mode corresponding to a candidate search path whose reward score meets the preset requirement, to obtain the target neural network for performing the target task.
  • one or more path searches can be performed based on the learning of the reinforcement learning network, and a searched candidate search path can be obtained for each path search.
  • As the number of training iterations of the reinforcement learning network increases, its learning ability becomes stronger, its search ability improves accordingly, and more and better candidate search paths can be screened out.
  • the reinforcement learning network can be used to search the multiple candidate search paths, so as to generate the target neural network according to the reward score after each search.
  • The candidate search paths in this disclosure are characterized by the operational relationships between the network composition modules, and the number of candidate search paths that can be determined from these operational relationships is large; the reinforcement learning network can therefore be used to search, among a large number of candidate search paths, for those that yield neural networks with better performance, so that the target neural network generated from the searched candidate search path has better versatility and accuracy.
  • The standard environment for a reinforcement learning network includes State, Action and Reward.
  • The update takes the following form: an action is input at the current moment, and the environment obtains the state and reward at that moment through a single-step operation.
  • The state policy function can then compute the input action for the next moment, and the reward is used to update the weight parameters of the policy.
  • the input action at the current moment may point to searching for the next candidate search path, and the state at this moment may point to the selection probability of selecting the corresponding candidate search path.
  • Any candidate search path can be used as the initial state information of the reinforcement learning network; the candidate search path selected by the first search can be determined based on the initial state information, and the reward score after the first search and the selection probability of selecting the corresponding candidate search path can be determined based on the candidate search path selected by the first search and the training data corresponding to the target task.
  • The candidate search path selected by the second search can then be determined based on the reward score after the first search and the selection probability of the corresponding candidate search path, yielding the reward score after the second search and the selection probability of selecting the corresponding candidate search path; in this way, the reward scores of the third search, the fourth search, and so on up to the last search can each be determined based on the preceding search.
  • n searches may be performed.
  • Here, n may be an integer greater than 1, for example, 100 or 1000.
  • the value of n may be determined in combination with requirements of different application scenarios, and there is no limitation thereto.
  • In this way, the reward score after each search and the selection probability of selecting the corresponding candidate search path can be determined.
  • Search strategies with relatively high reward scores can then be selected automatically, so that the obtained candidate search paths are more reliable.
  • The execution of the search can be terminated based on a network cut-off condition, where the network cut-off condition may be that the number of iterations is large enough, that the number of candidate search paths meeting the preset requirements is large enough, or other conditions; the embodiments of the present disclosure do not limit this.
  • A candidate search path with a relatively high reward score may be selected from the searched candidate search paths.
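  • The search loop above can be sketched, for illustration only, as a simple policy-gradient procedure: a policy holds a selection probability per candidate path, a path is sampled, the resulting candidate network's accuracy is fed back as the reward score, and high-reward paths are reinforced. The REINFORCE update and the evaluation stub below are assumptions, not the disclosed algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
num_paths = 16                  # assumed number of candidate search paths
logits = np.zeros(num_paths)    # policy parameters over candidate paths
lr = 0.1

def evaluate_path(path_id: int) -> float:
    """Placeholder: build the candidate network from this path's combination
    mode, train it on the task's training data, and return its accuracy."""
    return rng.random()         # stand-in for the measured network accuracy

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

baseline = 0.0
for step in range(100):                        # n searches, e.g. n = 100
    probs = softmax(logits)                    # selection probabilities (state)
    path = rng.choice(num_paths, p=probs)      # action: choose the next path
    reward = evaluate_path(path)               # reward score after this search
    baseline = 0.9 * baseline + 0.1 * reward   # running baseline reduces variance
    grad = -probs
    grad[path] += 1.0                          # d log pi(path) / d logits
    logits += lr * (reward - baseline) * grad  # reinforce high-reward paths

print("most promising path:", int(np.argmax(softmax(logits))))
```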
  • The labeling results of the training data can be labeling results expanded based on the knowledge graph structure.
  • The extended labeling results can provide more realistic supervision signals for network training, which to a certain extent ensures the accuracy of the trained network.
  • the reward score after each search may be determined based on the network accuracy of the candidate neural network.
  • For a candidate neural network with higher network accuracy, it can be concluded to a certain extent that its training performance is better.
  • In this case, positive feedback can be used to encourage the execution of similar path searches.
  • For a candidate search path that yields a network with lower accuracy, it can be concluded to a certain extent that the training performance of the candidate neural network is poorer.
  • In this case, negative feedback can be used to reduce the execution of similar path searches.
  • the reward score may be determined based on the network accuracy with respect to the candidate neural network.
  • The network performance of a candidate neural network with higher network accuracy can better meet the needs of various fields, and such a network can therefore be given a higher reward score; under this scoring mechanism, more and better candidate search paths can be obtained.
  • For each piece of training data, the difference between the output result and the labeling result can be determined.
  • The smaller the difference, the higher the network accuracy of the candidate neural network.
  • The network accuracy of the candidate neural network can be determined jointly from the comparison results corresponding to at least two pieces of training data; for example, the final network accuracy can be the average of the network accuracies corresponding to the individual pieces of training data.
  • In some embodiments, when the network generation module 102 generates a target neural network, pre-training on large-scale multi-modal data can be performed by the network training module 103 to improve the training performance of the target neural network.
  • the target neural network here includes a backbone network layer for feature extraction and other network layers for feature processing.
  • the above-mentioned network training module 103 can train the target neural network according to the following steps:
  • Step 1: Use the first training data to train the backbone network layer included in the target neural network to be trained, to obtain a trained backbone network layer;
  • Step 2: While keeping the network parameter values of the trained backbone network layer unchanged, use the second training data to train the other network layers included in the target neural network to be trained, to obtain trained other network layers.
  • the backbone network layer included in the target neural network to be trained may be trained using the acquired first training data including image-text pairs.
  • the backbone network layer and other network layers of the target neural network can be trained respectively by using different training data, so as to further improve the training performance of the corresponding network layer.
  • While the network parameter values of the backbone network layer remain unchanged, the second training data including images can be used to train the other network layers included in the target neural network, so as to further improve the training performance of the target neural network.
  • the training process of the backbone network layer can be implemented through the following steps:
  • Step 1: Input the first training data into the target neural network to be trained, and obtain image feature information and text feature information respectively corresponding to the image and text in the image-text pairs included in the first training data;
  • Step 2: Determine the first loss function value based on the feature similarity between the image feature information and the text feature information;
  • Step 3: When the current round of training does not meet the iteration cut-off condition, adjust the network parameter values of the backbone network layer based on the first loss function value, and perform the next round of training based on the adjusted backbone network layer, until the iteration cut-off condition is satisfied.
  • The first loss function value can be determined based on the feature similarity between the image feature information and the text feature information corresponding to the image and the text in an image-text pair.
  • The smaller the first loss function value, the stronger the correlation between the two pieces of feature information.
  • Inputting the first training data into the untrained target neural network may include: using the untrained target neural network to perform feature extraction on the first training data.
  • the above-mentioned image-text pairs may be crawled from the Internet, and the number of crawled image-text pairs is huge.
  • the embodiments of the present disclosure can use self-supervision technology to find more supervision information from large-scale noisy image-text pairs, so as to ensure better training performance.
  • The iteration cut-off condition may be that the number of iterations is sufficient, that the first loss function value is small enough, or that the first training data has been fully traversed, and the like; there is no limitation here.
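  • One common way to realize this image-text pre-training, sketched here for illustration under the assumption of a CLIP-style objective, is a bidirectional contrastive loss over the matrix of inner products; the encoders and feature sizes are stand-ins.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_feats, text_feats, temperature=0.07):
    """First loss based on feature similarity between matched image-text pairs.

    image_feats, text_feats: (B, D) features for B image-text pairs;
    pair i is the positive match for row/column i.
    """
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    # Feature matrix of inner products: entry (i, j) = sim(image_i, text_j).
    sim = image_feats @ text_feats.t() / temperature
    targets = torch.arange(sim.size(0), device=sim.device)
    # Rows classify texts for each image; columns classify images for each text.
    return 0.5 * (F.cross_entropy(sim, targets) +
                  F.cross_entropy(sim.t(), targets))

# Usage with stand-in features (gradients would flow into the backbone):
loss = contrastive_loss(torch.randn(8, 512, requires_grad=True),
                        torch.randn(8, 512, requires_grad=True))
loss.backward()
```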
  • As shown in FIG. 4, the pre-training of the target neural network may include acquiring image data 401 and text data 403.
  • The image encoder 402 is used to extract the image features of the image data 401, and the text encoder 404 is used to extract the text features of the text data 403; the inner products over all image features and text features can then be computed to obtain a feature matrix 405.
  • The row direction of the feature matrix 405 acts as a classifier, and the column direction of the feature matrix 405 also acts as a classifier.
  • Here, the original supervision (Original Supervision) method can be used for training.
  • The pre-training of the target neural network may also include acquiring at least two pieces of image data 406 and at least two pieces of text data 407.
  • The image features of the image data 406 and the text features of the text data 407 are extracted by the corresponding image and text encoders, and a feature queue 410 is maintained; the inner products over all image features, text features and the feature queue 410 are computed respectively to obtain a feature matrix 411.
  • Here, original supervision, self-supervision, multi-view supervision and nearest-neighbor supervision can be used for training.
  • The feature matrix 411 obtained through training can be used to further pre-train the backbone network layer 412 for tasks such as target detection or target segmentation 416, with the parameters of the backbone network layer then fixed; a selective object contrastive learning (Selective Object COntrastive learning, SOCO) mode 413 can be used to pre-train the feature pyramid network (Feature Pyramid Networks, FPN) 414 and the detection head (Head) network layer 415.
  • Step 1: Input the second training data into the target neural network to be trained, and obtain the output results of the other network layers included in the target neural network;
  • Step 2: Determine a second loss function value based on the output results and the labeling results of the images included in the second training data;
  • Step 3: When the current round of training does not meet the iteration cut-off condition, adjust the network parameter values of the other network layers based on the second loss function value, and perform the next round of training based on the adjusted network layers, until the iteration cut-off condition is satisfied.
  • The second loss function value can be determined based on the degree of matching between the output results of the other network layers and the labeling results of the images included in the second training data; the smaller the second loss function value, the closer the two results are, which is the purpose of training the other network layers.
  • Inputting the second training data into the target neural network to be trained may include: performing feature extraction on the second training data by using an untrained target neural network.
  • In some embodiments, the images included in the second training data may not be labeled; instead, self-supervision is used to train the other network layers, so as to further improve the training performance of the target neural network.
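  • The two-stage scheme above (a trained, frozen backbone plus trainable task layers) can be sketched as follows; the module shapes and the cross-entropy stand-in for the second loss are assumptions.

```python
import torch
import torch.nn as nn

# Placeholder target neural network: a trained backbone plus a task head.
backbone = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10))

# Keep the backbone's network parameter values unchanged ...
for p in backbone.parameters():
    p.requires_grad = False

# ... and train only the other network layers on the second training data.
optimizer = torch.optim.SGD(head.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()        # stand-in for the second loss function

images = torch.randn(4, 3, 32, 32)       # second training data (images)
labels = torch.randint(0, 10, (4,))      # labeling results for the images

loss = criterion(head(backbone(images)), labels)  # second loss function value
optimizer.zero_grad()
loss.backward()                           # only the head receives gradients
optimizer.step()
```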
  • The embodiments of the present disclosure provide a training method for a joint neural network that balances multi-task performance; the method can be implemented by the network training module 103 and, in some embodiments, may include the following steps:
  • Step 1: Use the at least two target neural networks to extract features from the training data in the training data set, and obtain the feature information output by the backbone network layer included in each target neural network;
  • Step 2: Determine the loss function value of the joint neural network to be trained based on the feature information output by the backbone network layer included in each target neural network, where the joint neural network is composed of the at least two target neural networks and connection layers between the backbone network layers of the target neural networks;
  • Step 3: Perform at least one round of network training on the joint neural network to be trained based on the loss function value, to obtain a trained joint neural network.
  • The feature information output by the backbone network layer included in each target neural network can be combined to determine the loss function value of the joint neural network to be trained. Since there is a connection layer between the backbone network layers included in the target neural networks, this connection layer can fuse the feature information output by the backbone network layers, so that the trained joint neural network has the task characteristics of each target neural network.
  • One target neural network among the at least two target neural networks is used as the main neural network of the joint neural network, and the other target neural networks are used as secondary neural networks of the joint neural network.
  • the network training module 103 can determine the loss function value of the joint neural network to be trained according to the following steps:
  • Step 1: Based on the first feature information output by the first backbone network layer included in the secondary neural network, update the second feature information output by the second backbone network layer included in the main neural network, to obtain updated second feature information;
  • Step 2: Input the updated second feature information into the other network layers included in the main neural network, and obtain the output results of those network layers;
  • Step 3: Determine the loss function value of the joint neural network based on the output results of those network layers and the labeling results under the task corresponding to the main neural network.
  • That is, the loss function value of the joint neural network can be determined based on the comparison between the output results of the other network layers, determined from the updated second feature information, and the labeling results under the task corresponding to the main neural network.
  • For the main neural network, feature extraction is performed on the updated second feature information by using the other network layers included in the main neural network, so as to obtain the output results of those network layers.
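  • One plausible form of the connection layer T, sketched here for illustration, projects the secondary backbone's first feature information into the main backbone's feature space and adds it through a learned gate; this particular fusion rule is an assumption, not the disclosed design.

```python
import torch
import torch.nn as nn

class ConnectionLayerT(nn.Module):
    """Connection layer T: fuses the secondary network's first feature
    information into the main network's second feature information."""
    def __init__(self, sub_ch: int, main_ch: int):
        super().__init__()
        # A 1x1 conv projects secondary features into the main feature space.
        self.proj = nn.Conv2d(sub_ch, main_ch, kernel_size=1)
        self.gate = nn.Parameter(torch.zeros(1))  # learned fusion weight

    def forward(self, feat_main, feat_sub):
        # Updated second feature information = main features + gated projection.
        return feat_main + self.gate * self.proj(feat_sub)

# Usage: feature maps from the two backbones at the same spatial size.
t = ConnectionLayerT(sub_ch=128, main_ch=256)
feat_sub = torch.randn(2, 128, 14, 14)    # first feature information
feat_main = torch.randn(2, 256, 14, 14)   # second feature information
updated = t(feat_main, feat_sub)          # then fed to the main network's heads
```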
  • the main neural network and the auxiliary neural network can correspond to heterogeneous tasks, for example, they can be detection tasks and classification tasks respectively.
  • The main neural network can be obtained by training with isomorphic data, where isomorphic data refers to data for performing similar tasks.
  • For example, the main neural network here can be used to perform detection task 1 for pedestrian detection and detection task 2 for vehicle detection.
  • the sub-neural network can also be trained using isomorphic data.
  • the sub-neural network here can be used to perform classification task 1 and classification task 2 of image classification.
  • Similar tasks can share the same hidden layers of the network, but the network near the output layer starts to fork to do different tasks, such as the above-mentioned detection task 1 and detection task 2.
  • Different tasks, that is, heterogeneous tasks, learn some common low-level abstract features by sharing several hidden layers at the bottom of the network, and the parameters shared by the bottom layers can be exactly the same.
  • each task can design its own task-specific layer to learn features with a higher level of abstraction. All tasks can share some related hidden layers while retaining task-specific output layers.
  • Each task (e.g., classification, detection) has a backbone network layer with a parameter space of the same size.
  • the feature information output by the backbone network layer of the sub-neural network corresponding to the sub-task can be fused with the feature information output by the backbone network layer of the main neural network corresponding to the main task.
  • In this way, the trained joint neural network can have multi-task characteristics, and can then be used more generally in various task scenarios in subsequent downstream applications.
  • FIG. 5 is a schematic diagram of training two target neural networks according to an embodiment of the present disclosure.
  • the training of the two target neural networks may include a mixed share (Mixed Share) manner, and the mixed share may include three branches.
  • The network of the first branch may include a backbone network layer (Stage) 501, a first detection task (Head1) 502 and a second detection task (Head2) 503; the network of the second branch may include a backbone network layer 501, a first classification task (Head3) 504 and a second classification task (Head4) 505; and so on.
  • Soft sharing includes two branches, the left branch corresponds to the main neural network, and the right branch corresponds to the secondary neural network.
  • the main neural network may include a backbone network layer 501 and a first detection task 502 and a second detection task 503
  • the secondary neural network may include a backbone network layer 501 and a first classification task 504 and a second classification task 505 .
  • Stage corresponds to the backbone network layer of the neural network
  • Head corresponds to the other task-related network layers; for example, Head1 and Head2 correspond to detection task 1 and detection task 2 respectively, and Head3 and Head4 correspond to classification task 1 and classification task 2 respectively.
  • The Stage included in the main neural network can extract the corresponding second feature information, and the Stage included in the secondary neural network can extract the corresponding first feature information; fusion between the first feature information and the second feature information can then be realized through the connection layer T.
  • The above loss function value can be determined based on the second feature information output by the main neural network and the labeling results under the two tasks corresponding to the main neural network.
  • The connection modes of the connection layer T in the embodiments of the present disclosure may vary, as shown in FIG. 6(a) to FIG. 6(c): FIG. 6(a) shows that the connection layer 602 can be placed in the middle of a specific backbone network layer 601, with feature migration performed on feature layers of the same size; FIG. 6(b) shows that the connection layer 604 performs feature migration between backbone network layers 603 of different sizes; and FIG. 6(c) shows that the connection layer 606 performs fusion on blocks 607 in different backbone network layers 605 of the same size.
  • training data here may be image samples related to each task, and the image samples may be training data with labeling results.
  • The above processes of training the target neural network and the joint neural network in the data processing system provided by the embodiments of the present disclosure may be regarded as upstream training.
  • the embodiments of the present disclosure provide a data re-characterization training method to retrain the joint neural network.
  • This can be realized by a network migration module 104 connected with the network training module 103 shown in FIG. 1.
  • the above-mentioned process of training the joint neural network may include the following steps:
  • Step 1: Based on at least two images included in the training data set, determine a codebook for decomposing each image into at least two primitives for characterization;
  • Step 2: When migrating the trained target neural network to a downstream business scenario, characterize the target training data collected in the target business scenario based on the obtained codebook, to obtain re-represented target training data;
  • Step 3: Retrain the joint neural network using the re-represented target training data, to obtain a trained joint neural network for processing the target scene data collected in the target business scenario.
  • In this way, the codebook learned from the upstream training data can be used to re-characterize the downstream target training data, so that the trained joint neural network can be applied to the downstream target business scenario efficiently and accurately.
  • codebook training can be performed using an adversarial network composed of paired encoders and decoders.
  • An image can be input to the untrained encoder to obtain the codebook output by the encoder; the codebook output by the encoder can then be input to the untrained decoder to obtain the image output by the decoder, and it can be verified whether the similarity between the image output by the decoder and the input image is greater than a preset threshold; if it is not greater than the preset threshold, the above process is repeated until it is.
  • the codebook here can be an image coding based on an adversarial network composed of an encoder and a decoder, with high accuracy.
  • That is, the encoder can decompose a picture into a codebook composed of several primitives, and the decoder can then largely restore these primitives back into the picture.
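  • The codebook idea can be sketched in a VQ-style form, for illustration only: the encoder produces a feature grid, each feature is snapped to its nearest codebook primitive, and the decoder reconstructs the image from the primitives. The architecture and sizes below are illustrative assumptions, and a real implementation would also need a straight-through estimator (or VQ losses) because argmin blocks gradients.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodebookAutoencoder(nn.Module):
    def __init__(self, num_primitives: int = 512, dim: int = 64):
        super().__init__()
        self.encoder = nn.Conv2d(3, dim, kernel_size=4, stride=4)   # image -> feature grid
        self.codebook = nn.Embedding(num_primitives, dim)           # learned primitives
        self.decoder = nn.ConvTranspose2d(dim, 3, kernel_size=4, stride=4)

    def forward(self, images):
        z = self.encoder(images)                       # (B, D, H/4, W/4)
        b, d, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, d)    # one feature per location
        # Decompose the image: snap each feature to its nearest primitive.
        codes = torch.cdist(flat, self.codebook.weight).argmin(dim=1)
        quant = self.codebook(codes).reshape(b, h, w, d).permute(0, 3, 1, 2)
        return self.decoder(quant), codes              # primitives -> image

model = CodebookAutoencoder()
images = torch.randn(2, 3, 32, 32)
recon, codes = model(images)
# Train until the reconstruction is similar enough to the input, e.g. while
# the similarity check fails (reconstruction error above a preset threshold).
print(F.mse_loss(recon, images).item())
```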
  • The re-represented downstream data can be used to fine-tune the network. During fine-tuning, the pre-trained parameters of the backbone network layer of the joint neural network can be fixed, and only the parameters of the task-related network layers behind the backbone network layer are adjusted, so as to improve the generalization ability in task scenarios.
  • the above-mentioned target training data may be images, or other training data including images.
  • the original downstream data can also be used for final adjustment, so as to further improve the training performance of the joint neural network.
  • The embodiments of the present disclosure can also retrain the trained target neural network according to the above data re-characterization method, to improve the generalization performance of the target neural network; for the training process, refer to the above description.
  • an encoder 702 can be used to perform training based on upstream data 701 to obtain a codebook 703 .
  • the codebook 703 may be re-represented into the upstream data 701 by using the decoder 704 .
  • In the downstream, the encoder 702 can be used to determine the codebook 706 corresponding to the downstream images (Downstream Images) 705, and the decoder 704 can be used to re-characterize the codebook 706 into output images (Transferred Images) 707.
  • During fine-tuning, the parameters of the pre-training model 708 are fixed (that is, not trainable), and only the non-fixed (that is, trainable) parameters in the networks related to the neck and detection head (Neck&Head) 709 and the task loss (Task Loss) 710 are adjusted.
  • In some embodiments, the parameters in the pre-training model 708, the neck and detection head 709 and the task loss (Task Loss) 710 related networks can be further adjusted based on the downstream images 705; at this time, the parameters of the pre-training model 708 are non-fixed (that is, trainable).
  • The description of each step above does not imply a strict execution order or constitute any limitation on the implementation process; the execution order of the steps should be determined by their functions and possible internal logic.
  • The embodiments of the present disclosure also provide a data processing method and device corresponding to the data processing system. Since the principle by which the method and device in the embodiments of the present disclosure solve the problem is similar to that of the above-mentioned data processing system, the implementation of the method and the device can refer to the implementation of the system, and repeated descriptions are omitted.
  • FIG. 8 is a flowchart of a data processing method provided by an embodiment of the present disclosure, the method includes steps S801 to S802, wherein:
  • S801: Obtain a training data set and at least two network composition modules for forming a target neural network;
  • S802: Generate at least one target neural network based on the acquired training data set and the at least two network composition modules, where each target neural network is used to perform a corresponding target task.
  • At least one target neural network for performing a corresponding target task can be generated based on the acquired training data set and at least two network constituent modules.
  • At least two target neural networks can be jointly trained to obtain a trained joint neural network; the joint neural network is used for migration to downstream business scenarios to perform target tasks.
  • the training process of the joint neural network and the corresponding application process please refer to the above description.
  • The execution subject of the data processing method provided by the embodiments of the present disclosure is generally an electronic device with certain computing capabilities, such as a terminal device, a server, or another processing device; the terminal device may be user equipment (User Equipment, UE), a mobile device, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like.
  • the data processing method may be implemented by a processor invoking computer-readable instructions stored in a memory.
  • the device includes: an acquisition module 901 and a generation module 902; wherein,
  • An acquisition module 901 configured to acquire a training data set and at least two network composition modules for forming a target neural network
  • the generation module 902 is configured to generate at least one target neural network based on the acquired training data set and at least two network composition modules; each target neural network is used to perform a corresponding target task.
  • In some embodiments, the device further includes an execution module configured to perform joint training on at least two of the target neural networks to obtain a trained joint neural network; the joint neural network is used to be migrated to downstream business scenarios to execute the target task.
  • Since the device embodiment basically corresponds to the method embodiment, for related parts, please refer to the description of the method embodiment.
  • The device embodiments described above are only illustrative; the modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules, that is, they may be located in one place or distributed to at least two network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the disclosed solution, which can be understood and implemented by those skilled in the art without creative effort.
  • the embodiment of the present disclosure also provides an electronic device, as shown in FIG. 10 , which is a schematic structural diagram of the electronic device provided by the embodiment of the present disclosure, including: a processor 1001 , a memory 1002 , and a bus 1003 .
  • The memory 1002 stores machine-readable instructions executable by the processor 1001 (for example, execution instructions corresponding to the acquisition module 901 and the generation module 902 in the device in FIG. 9). The processor 1001 and the memory 1002 communicate through the bus 1003, and when the machine-readable instructions are executed by the processor 1001, the following processing is performed:
  • Each candidate search path corresponds to a combination mode, and the combination mode is used to characterize the operational relationship between the network composition modules; the network composition modules are combined according to the combination mode to obtain the target neural network.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is run by a processor, the steps of the data processing method described in the foregoing method embodiments are executed.
  • the computer-readable storage medium may only store the computer program corresponding to the data processing method.
  • a computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device, and may be a volatile storage medium or a nonvolatile storage medium.
  • a computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer-readable storage media include: portable computer disks, hard disks, Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM or Flash memory), Static Random Access Memory (SRAM), Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD), memory sticks, floppy disks, mechanically encoded devices such as punched cards with instructions stored thereon or raised structures in grooves, and any suitable combination of the foregoing.
  • computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or transmitted electrical signals.
  • An embodiment of the present disclosure also proposes a computer program; the computer program includes computer-readable code, and when the computer-readable code is read and executed by a computer, some or all of the steps of the method in any embodiment of the present disclosure are implemented.
  • Embodiments of the present disclosure also provide a computer program product, the computer program product carries a program code, and the instructions included in the program code can be used to execute the steps of the data processing method described in the above-mentioned method embodiment.
  • the above-mentioned computer program product may be realized by hardware, software or a combination thereof.
  • The computer program product may be embodied as a computer storage medium; in another optional embodiment, the computer program product may be embodied as a software product, such as a software development kit (Software Development Kit, SDK), and the like.
  • The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed to at least two network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Provided in the present disclosure are a data processing system, method and apparatus, and a device and a storage medium. The system comprises: a data collection module, a network generation module and a network training module, which are sequentially in communication connection, wherein the data collection module is configured to acquire a training data set, and at least two network composition modules, which are used for constituting a target neural network; the network generation module is used for generating at least one target neural network on the basis of the acquired training data set and at least two network composition modules; each target neural network is used for executing a corresponding target task; the network training module is configured to perform joint training on at least two target neural networks when the at least two target neural networks have been trained, so as to obtain a trained joint neural network; and the joint neural network is used for being migrated to a downstream service scenario to execute a target task. In the present disclosure, a joint neural network that is suitable for a downstream service scenario can be generated by means of joint training, and the universality and accuracy of the joint neural network are both relatively good.

Description

Data Processing System and Method, Apparatus, Device, Storage Medium, Computer Program, and Computer Program Product
Cross-Reference to Related Applications
The present disclosure is based on, and claims priority to, Chinese patent application No. 202111306897.7, filed on November 5, 2021 and entitled "Data Processing System, Method, Apparatus, Device and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to, but is not limited to, the technical field of artificial intelligence, and in particular to a data processing system and method, an apparatus, a device, a storage medium, a computer program, and a computer program product.
Background
General artificial intelligence is an important topic in the field of artificial intelligence research. Taking computer vision as an example, a general visual neural network built with general artificial intelligence technology can break through the limitation of a single model serving only a specific computer vision task, and can therefore be widely applied to various computer vision tasks, such as image classification, object detection, semantic segmentation, and depth estimation.
The related art provides a method for generating a general visual neural network. In an upstream task, the method trains a classification network on a general data set so as to learn a general visual representation through the classification task.
However, because the network trained upstream is limited to a specific classification task, the learned visual representation performs poorly when applied to other downstream tasks such as detection and segmentation.
Summary
Embodiments of the present disclosure provide at least a data processing system and method, an apparatus, a device, a storage medium, a computer program, and a computer program product.
An embodiment of the present disclosure provides a data processing system, including a data collection module, a network generation module, and a network training module, where the data collection module, the network generation module, and the network training module are sequentially in communication connection.
The data collection module is configured to acquire a training data set and at least two network composition modules for constituting a target neural network.
The network generation module is configured to generate at least one target neural network based on the acquired training data set and the at least two network composition modules, where each target neural network is used to perform a corresponding target task.
The network training module is configured to, in a case where at least two target neural networks have been trained, perform joint training on the at least two target neural networks to obtain a trained joint neural network, where the joint neural network is used to be migrated to a downstream business scenario to perform the target task.
With the above data processing system, in a case where the training data and the at least two network composition modules for constituting the target neural network are acquired, at least one target neural network can be generated based on the acquired training data set and the at least two network composition modules. In this way, when at least two target neural networks have been trained, joint training can be performed on the at least two target neural networks to obtain a trained joint neural network. Based on basic network composition modules, the present disclosure can generate target neural networks adapted to various target tasks, and then generate, through joint training, a joint neural network adapted to downstream business scenarios, with good universality and accuracy.
An embodiment of the present disclosure further provides a data processing method, including:
acquiring a training data set and at least two network composition modules for constituting a target neural network; and
generating at least one target neural network based on the acquired training data set and the at least two network composition modules, where each target neural network is used to perform a corresponding target task.
An embodiment of the present disclosure further provides a data processing apparatus, including:
an acquisition module configured to acquire a training data set and at least two network composition modules for constituting a target neural network; and
a generation module configured to generate at least one target neural network based on the acquired training data set and the at least two network composition modules, where each target neural network is used to perform a corresponding target task.
An embodiment of the present disclosure further provides an electronic device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device runs, the processor communicates with the memory through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the data processing method according to any one of the second aspect and its implementations.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when run by a processor, performs the steps of the data processing method according to any one of the second aspect and its implementations.
An embodiment of the present disclosure provides a computer program including computer-readable code, where some or all of the steps of the method in any embodiment of the present disclosure are implemented when the computer-readable code is read and executed by a computer.
An embodiment of the present disclosure provides a computer program product including a non-transitory computer-readable storage medium storing a computer program, where some or all of the steps of the method in any embodiment of the present disclosure are implemented when the computer program is read and executed by a computer.
For descriptions of the effects of the above data processing apparatus, electronic device, computer-readable storage medium, computer program, and computer program product, reference may be made to the description of the above data processing system.
To make the above objects, features, and advantages of the present disclosure more apparent and understandable, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present disclosure more clearly, the drawings required in the embodiments are briefly introduced below. The drawings herein are incorporated into and constitute a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings only show some embodiments of the present disclosure and therefore should not be regarded as limiting the scope; a person of ordinary skill in the art may derive other related drawings from these drawings without creative effort.
Fig. 1 is a schematic diagram of a data processing system provided by an embodiment of the present disclosure;
Fig. 2(a) is a schematic diagram of a downsampling manner in the data processing system provided by an embodiment of the present disclosure;
Fig. 2(b) is a schematic diagram of a downsampling manner in the data processing system provided by an embodiment of the present disclosure;
Fig. 2(c) is a schematic diagram of a downsampling manner in the data processing system provided by an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of searching candidate search paths in the data processing system provided by an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of pre-training a target neural network in the data processing system provided by an embodiment of the present disclosure;
Fig. 5 is a flowchart of a joint neural network training method in the data processing system provided by an embodiment of the present disclosure;
Fig. 6(a) is a schematic diagram of a first connection manner of a connection layer in the data processing system provided by an embodiment of the present disclosure;
Fig. 6(b) is a schematic diagram of a second connection manner of a connection layer in the data processing system provided by an embodiment of the present disclosure;
Fig. 6(c) is a schematic diagram of a third connection manner of a connection layer in the data processing system provided by an embodiment of the present disclosure;
Fig. 7 is a schematic diagram of codebook training in the data processing system provided by an embodiment of the present disclosure;
Fig. 8 is a flowchart of a data processing method provided by an embodiment of the present disclosure;
Fig. 9 is a schematic diagram of a data processing apparatus provided by an embodiment of the present disclosure;
Fig. 10 is a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present disclosure provided in the drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort shall fall within the protection scope of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings.
Herein, the term "and/or" merely describes an association relationship and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A alone, both A and B, and B alone. In addition, the term "at least one" herein indicates any one of multiple items or any combination of at least two of multiple items; for example, including at least one of A, B, and C may indicate including any one or at least two elements selected from the set consisting of A, B, and C.
Research has found that the construction of general visual neural networks has not yet formed an effective workflow or achieved reliable results. Many existing computer vision technologies are constrained by various factors, making it difficult to achieve the goal of a general visual neural network. Taking a method for generating a general visual neural network provided in the related art as an example, in an upstream task, the method trains a classification network on a general data set so as to learn a general visual representation through the classification task.
However, because the network trained upstream is limited to a specific classification task, the learned visual representation performs poorly when applied to other downstream tasks such as detection and segmentation. In addition, most existing data sets are limited in size, have incomplete label systems, and are labeled inefficiently; they cannot meet the needs of a general visual model and are difficult to scale.
Based on the above research, the present disclosure provides a solution that searches candidate search paths based on a reinforcement learning network to realize neural network generation; this solution achieves remarkable results in both network performance and network universality.
To facilitate understanding of this embodiment, a data processing system disclosed in an embodiment of the present disclosure is first introduced in detail. Fig. 1 is a schematic diagram of the data processing system provided by an embodiment of the present disclosure. The data processing system includes a data collection module 101, a network generation module 102, and a network training module 103, where the data collection module 101, the network generation module 102, and the network training module 103 are sequentially in communication connection.
The data collection module 101 is configured to acquire a training data set and at least two network composition modules for constituting a target neural network.
The network generation module 102 is configured to generate at least one target neural network based on the acquired training data set and the at least two network composition modules, where each target neural network is used to perform a corresponding target task.
The network training module 103 is configured to, in a case where at least two target neural networks have been trained, perform joint training on the at least two target neural networks to obtain a trained joint neural network, where the joint neural network is used to be migrated to a downstream business scenario to perform the target task.
To facilitate understanding of the data processing system provided by the embodiments of the present disclosure, application scenarios of the system are first briefly described. The data processing system in the embodiments of the present disclosure can be applied to the vision field; for example, the generated target neural networks can be applied to scenarios such as object detection, image classification, and depth estimation.
Considering that many existing computer vision neural networks in the related art are limited to specific computer vision tasks and are difficult to generalize, the embodiments of the present disclosure provide a data processing system that generates target neural networks based on network composition modules and then obtains, through joint training, a joint neural network adapted to various target tasks.
The target neural network here may be determined based on search results, obtained by a reinforcement learning network, of at least two candidate search paths associated with the at least two network composition modules. Each candidate search path here corresponds to a specific combination manner; based on this combination manner, the corresponding network composition modules can be combined to obtain a target neural network that meets the requirements.
The network composition modules in the embodiments of the present disclosure may include a feature map extraction unit for extracting feature maps, and may further include a downsampling unit for downsampling the feature maps output by the feature map extraction unit.
As shown in Fig. 2(a), the down sampling modules (DSM) may include a local DSM (L-DSM), where a hidden layer 201 in the L-DSM is a two-dimensional convolution layer with a convolution stride of 2. As shown in Fig. 2(b), the DSM may further include a local-global DSM (LG-DSM), where a hidden layer 202 in the LG-DSM is a two-dimensional convolution layer with a convolution stride of 2, and a hidden layer 203 is a multi-head attention layer. As shown in Fig. 2(c), the DSM may further include a global DSM (G-DSM), where a hidden layer 204 in the G-DSM is a one-dimensional convolution layer with a convolution stride of 2, and a hidden layer 205 is a multi-head attention layer. The multi-head attention layer may be used to determine a query vector (Q), a key vector (K), and a value vector (V).
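The following is a minimal runnable sketch of the three DSM variants described above, assuming PyTorch; the module names, kernel sizes, head counts, and the token-sequence interface of G-DSM are illustrative assumptions rather than the patent's concrete implementation.

```python
# A minimal sketch of the three downsampling (DSM) variants, assuming PyTorch.
import torch
import torch.nn as nn

class LocalDSM(nn.Module):
    """L-DSM: a stride-2 2D convolution halves the spatial resolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.down = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)

    def forward(self, x):            # x: (B, C, H, W)
        return self.down(x)          # -> (B, C', H/2, W/2)

class LocalGlobalDSM(nn.Module):
    """LG-DSM: stride-2 2D convolution followed by multi-head attention
    over the flattened spatial positions (Q = K = V)."""
    def __init__(self, in_ch, out_ch, num_heads=4):
        super().__init__()
        self.down = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.attn = nn.MultiheadAttention(out_ch, num_heads, batch_first=True)

    def forward(self, x):
        x = self.down(x)                          # (B, C', H/2, W/2)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)        # (B, H*W/4, C')
        out, _ = self.attn(seq, seq, seq)
        return out.transpose(1, 2).reshape(b, c, h, w)

class GlobalDSM(nn.Module):
    """G-DSM: operates on token sequences; a stride-2 1D convolution
    shortens the sequence before multi-head attention."""
    def __init__(self, in_ch, out_ch, num_heads=4):
        super().__init__()
        self.down = nn.Conv1d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.attn = nn.MultiheadAttention(out_ch, num_heads, batch_first=True)

    def forward(self, x):                         # x: (B, C, N) token sequence
        seq = self.down(x).transpose(1, 2)        # (B, ~N/2, C')
        out, _ = self.attn(seq, seq, seq)
        return out.transpose(1, 2)                # (B, C', ~N/2)
```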
In some embodiments, a unified search space may be constructed, and candidate search paths may be searched based on the unified search space. As shown in Fig. 3, a unified search space 301 may be determined by network composition models 302, downsampling units 303, and network sizes 304. The network composition models 302, which may also be called general operations (GOP), may include convolution networks, Transformers, and multilayer perceptrons (MLP). The downsampling units (DSM) 303 may include L-DSM, LG-DSM, and G-DSM. The network size 304 may include the number of repeats, the number of channels, the expansion ratio, and the like. In the process of searching candidate search paths, multiple searches may be performed, for example, the first search (N1), the second search (N2), ..., the fifth search (N5), and so on.
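As an illustration of the unified search space, the sketch below represents the space as a small configuration table from which one candidate search path is sampled; the option lists and the stage count are assumptions based only on the components named above.

```python
# A minimal sketch of a unified search space, assuming Python;
# the option lists mirror the components named above, exact values are assumed.
import random

SEARCH_SPACE = {
    "operator":   ["convolution", "transformer", "mlp"],   # GOP choices
    "downsample": ["L-DSM", "LG-DSM", "G-DSM"],            # DSM choices
    "repeats":    [1, 2, 3, 4],                            # blocks per stage
    "channels":   [64, 128, 256, 512],
    "expansion":  [1, 2, 4],
}

def sample_candidate_path(num_stages=5):
    """Sample one candidate search path: a per-stage combination of
    operator, downsampling unit, and size settings."""
    return [
        {key: random.choice(options) for key, options in SEARCH_SPACE.items()}
        for _ in range(num_stages)
    ]

path = sample_candidate_path()
print(path[0])  # e.g. {'operator': 'transformer', 'downsample': 'G-DSM', ...}
```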
In some embodiments, the feature map extraction unit may be implemented based on convolution operations, based on a Transformer architecture, based on a multilayer perceptron, or based on other related units with a feature extraction function, which is not limited here. The downsampling unit may be implemented based on convolution operations, based on a multi-layer attention mechanism, or based on other related units with a sampling function, which is also not limited here.
The training data set in the embodiments of the present disclosure may include a variety of training data; for example, it may include training data corresponding to different target tasks, first training data including at least two image-text pairs, and second training data including at least two images, and may further include other training data, which is not limited here. Corresponding data may be selected based on different requirements, and the above types of training data may be split in advance, so that when a corresponding network is trained, the corresponding training data can be extracted quickly.
The training data set provided in the embodiments of the present disclosure may be high-quality network data screened out based on an active learning network. The network data here may be acquired via a network input interface; in some embodiments, the network data may be automatically acquired from the network input interface by means of a web crawler. The training data here may be high-quality network data screened out after quality evaluation of the acquired network data, and high-quality training data can ensure the accuracy of network training.
In addition, the training data set in the embodiments of the present disclosure may have initial labeling results, and to adapt to the training requirements of various networks, the labeling results may be expanded. That is, the embodiments of the present disclosure provide a large-scale label system, in which a knowledge graph structure may be used to expand the initial labeling results to obtain expanded labeling results. The expanded labeling results can provide more realistic supervision signals for network training, which ensures, to a certain extent, the accuracy of the determined network.
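A minimal sketch of label expansion over a knowledge graph is given below, assuming the graph is available as a child-to-parent mapping; the toy hierarchy and the ancestor-propagation rule are illustrative assumptions, not the disclosure's concrete expansion method.

```python
# A minimal sketch of expanding initial labels with a knowledge graph,
# assuming the graph is a child -> parent mapping; the toy data is illustrative.
PARENT = {
    "husky": "dog",
    "dog": "canine",
    "canine": "mammal",
    "mammal": "animal",
}

def expand_labels(initial_labels):
    """Augment each label with all of its ancestors in the knowledge graph,
    so a sample labeled 'husky' also supervises 'dog', 'canine', and so on."""
    expanded = set(initial_labels)
    for label in initial_labels:
        node = label
        while node in PARENT:
            node = PARENT[node]
            expanded.add(node)
    return expanded

print(expand_labels({"husky"}))
# {'husky', 'dog', 'canine', 'mammal', 'animal'}
```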
In some embodiments, other automatic linking methods based on semantic parsing may also be used to expand the label system.
It should be noted that, after the label system is expanded by means of automatic linking, the embodiments of the present disclosure may further determine the final training data and the corresponding labeling results through data reorganization and label cleaning; such a label system better meets the needs of computer vision tasks.
To adapt to the application requirements of various target tasks in downstream business scenarios, in a case where at least two target neural networks for performing different target tasks have been generated and trained, joint training may be performed on the at least two target neural networks, so that the trained joint neural network has the task characteristics of each target neural network and can thus be better adapted to downstream business scenarios. The downstream business scenarios here may be scenarios related to computer vision, for example, the access control field and the autonomous driving field.
It should be noted that, in the process of training target neural networks for different tasks, training may be performed based on the training data corresponding to the current task; for example, a detection neural network for a detection task may be trained, and a classification neural network for a classification task may also be trained.
With the above data processing system, in a case where the training data and the multiple network composition modules for constituting the target neural network are acquired, at least one target neural network can be generated based on the acquired training data set and the multiple network composition modules. In this way, when multiple target neural networks have been trained, joint training can be performed on the multiple target neural networks to obtain a trained joint neural network. Based on basic network composition modules, the present disclosure can generate target neural networks adapted to various target tasks, and then generate, through joint training, a joint neural network adapted to downstream business scenarios, with good universality and accuracy.
Considering the key role of target neural network generation in the data processing system provided by the embodiments of the present disclosure, some embodiments are described next. The network generation module 102 here may generate a target neural network for performing a corresponding target task according to the following steps:
Step 1: determining at least two candidate search paths associated with the at least two network composition modules, where each candidate search path corresponds to a combination manner, and the combination manner is used to characterize the operational relationship between the network composition modules;
Step 2: searching the at least two candidate search paths at least once by using the training data corresponding to the target task and a reinforcement learning network, to obtain a reward score after each search; and
Step 3: combining the network composition modules according to the combination manner corresponding to a candidate search path whose reward score meets a preset requirement, to obtain the target neural network for performing the target task.
In the embodiments of the present disclosure, there may be at least two determined candidate search paths. One or more path searches may be performed based on the learning of the reinforcement learning network, and each path search can yield a searched candidate search path. As the number of training iterations of the reinforcement learning network gradually increases, its learning ability becomes stronger and its search ability is gradually enhanced accordingly, so that more and better candidate search paths can be screened out.
In a case where multiple candidate search paths associated with the multiple network composition modules are determined, the reinforcement learning network may be used to search the multiple candidate search paths, so as to generate the target neural network according to the reward score after each search. A candidate search path in the present disclosure may characterize the operational relationship between the network composition modules, and the number of candidate search paths that can be determined based on the various operational relationships is large; the reinforcement learning network can learn to search, from a large number of candidate search paths, the candidate search paths that characterize better neural network performance. In this way, the target neural network generated based on the searched candidate search paths has good universality and accuracy.
To facilitate the description of the process of realizing at least one search by using the reinforcement learning network, the standard environment of a reinforcement learning network is first briefly described. The standard environment of a reinforcement learning network includes a state, an action, and a reward. The update procedure is as follows: an action is input at the current moment, the environment obtains the state and the reward at that moment through a single-step run, the state policy function calculates the input action for the next moment, and the reward is used to update the weight parameters of the policy.
In the embodiments of the present disclosure, the action input at the current moment may point to searching for the next candidate search path, and the state at that moment may point to the selection probability of selecting the corresponding candidate search path.
Any candidate search path may be used as the initial state information of the reinforcement learning network, and the candidate search path selected by the first search may be determined based on the initial state information. Based on the candidate search path selected by the first search and the training data corresponding to the target task, the reward score after the first search and the selection probability of selecting the corresponding candidate search path are determined.
In a case where the reward score after the first search and the selection probability of selecting the corresponding candidate search path are obtained, the candidate search path selected by the second search may be determined based on the reward score after the first search and the selection probability of selecting the corresponding candidate search path, after which the reward score after the second search and the selection probability of selecting the corresponding candidate search path can be obtained. By repeating this cycle, the reward scores after the third search, the fourth search, and so on up to the last search can be determined based on each subsequent search.
In the embodiments of the present disclosure, n searches may be performed, where n may be an integer greater than 1, for example, 100 or 1000. The value of n may be determined in combination with the requirements of different application scenarios, which is not limited here.
Through path search, the reward score after each search and the selection probability of selecting the corresponding candidate search path can be determined. Using the learning principle of the reinforcement learning network, a search strategy with a relatively high reward score can be selected automatically, which makes the obtained candidate search paths more reliable.
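The sketch below illustrates such a reward-driven search loop, assuming PyTorch and a REINFORCE-style policy-gradient update; the policy parameterization, the toy reward table standing in for network accuracy, and the step count are all illustrative assumptions rather than the disclosure's concrete algorithm.

```python
# A minimal sketch of reward-driven path search, assuming PyTorch and a
# REINFORCE-style update; the policy and reward below are illustrative.
import torch
import torch.nn as nn

NUM_PATHS = 8                       # assumed number of candidate search paths

policy_logits = nn.Parameter(torch.zeros(NUM_PATHS))
optimizer = torch.optim.Adam([policy_logits], lr=0.05)

def evaluate_path(path_idx):
    # In practice: build the candidate network for this path, train it on
    # the task data, and return its validation accuracy as the reward score.
    # Here a fixed toy reward table stands in for that evaluation.
    toy_rewards = [0.1, 0.3, 0.2, 0.9, 0.4, 0.5, 0.2, 0.6]
    return toy_rewards[path_idx]

for step in range(100):             # n searches
    probs = torch.softmax(policy_logits, dim=0)   # state: selection probabilities
    dist = torch.distributions.Categorical(probs)
    path_idx = dist.sample()                      # action: pick the next path
    reward = evaluate_path(path_idx.item())       # reward: network accuracy

    # REINFORCE: raise the selection probability of high-reward paths.
    loss = -dist.log_prob(path_idx) * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```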
The execution of the searches may be determined based on a network cut-off condition. The network cut-off condition here may be that a sufficient number of iterations has been performed, that a sufficiently large number of candidate search paths meeting the preset requirement has been obtained, or another condition, which is not limited in the embodiments of the present disclosure.
In the embodiments of the present disclosure, candidate search paths with relatively high reward scores may be selected from the searched candidate search paths. The candidate search path with the highest reward score may be selected; the candidate search paths corresponding to the searches may also be ranked according to their reward scores, and candidate search paths ranked higher than a preset rank may be selected, for example, the top three candidate search paths may be selected as the corresponding combination manners; candidate search paths whose reward scores are higher than a preset threshold may also be selected. The labeling results of the training data here may be labeling results expanded based on the knowledge graph structure; the expanded labeling results can provide more realistic supervision signals for network training, which ensures, to a certain extent, the accuracy of the determined network.
The reward score after each search may be determined based on the network accuracy of the corresponding candidate neural network. For a candidate neural network with higher network accuracy, it can be concluded to a certain extent that its training performance is better; in this case, positive feedback can be used to encourage similar path searches. Conversely, for a candidate search path whose candidate neural network has lower network accuracy, it can be concluded to a certain extent that its training performance is poorer; in this case, negative feedback can be used to reduce similar path searches.
The reward score may thus be determined based on the network accuracy of the candidate neural network. A candidate neural network with higher network accuracy has network performance that better meets the needs of various fields, and can therefore be given a higher reward score. Under such a score assignment mechanism, more and better candidate search paths can be obtained.
After the training data for the target task is input into the constructed candidate neural network, the difference between the output result and the labeling result of the training data can be determined. A larger difference indicates, to a certain extent, a lower network accuracy of the candidate neural network; conversely, a smaller difference indicates, to a certain extent, a higher network accuracy.
In the embodiments of the present disclosure, the network accuracy of a candidate neural network may be jointly determined from the comparison results corresponding to at least two pieces of training data; for example, the final network accuracy may be determined as the average of the network accuracies corresponding to the individual pieces of training data.
In the data processing system provided by the embodiments of the present disclosure, in a case where the network generation module 102 has generated a target neural network, pre-training on large-scale multi-modal data may be performed based on the network training module 103 to improve the training performance of the target neural network. The target neural network here includes a backbone network layer for feature extraction and other network layers for feature processing. In some embodiments, the network training module 103 may train the target neural network according to the following steps:
Step 1: training the backbone network layer included in the target neural network to be trained by using the first training data, to obtain a trained backbone network layer; and
Step 2: in a case where the network parameter values of the trained backbone network layer remain unchanged, training the other network layers included in the target neural network to be trained by using the second training data, to obtain trained other network layers.
To extract a more general visual representation, the backbone network layer included in the target neural network to be trained may be trained by using the acquired first training data including image-text pairs. The backbone network layer and the other network layers of the target neural network can be trained separately with different training data, so as to further improve the training performance of the corresponding network layers.
In a case where the backbone network layer has been trained, the network parameter values of the backbone network layer remain unchanged. At this point, the other network layers included in the target neural network may be trained in a locally self-supervised manner by using the second training data including images, thereby further improving the training performance of the target neural network.
The training process of the backbone network layer may, in some embodiments, be implemented through the following steps:
Step 1: inputting the first training data into the target neural network to be trained, to obtain image feature information and text feature information respectively corresponding to the image and the text in each image-text pair included in the first training data;
Step 2: determining a first loss function value based on the feature similarity between the image feature information and the text feature information; and
Step 3: in a case where the current round of training does not meet an iteration cut-off condition, adjusting the network parameter values of the backbone network layer based on the first loss function value, and performing the next round of training based on the adjusted backbone network layer until the iteration cut-off condition is met.
The first loss function value may be determined based on the feature similarity between the image feature information and the text feature information respectively corresponding to the image and the text in each image-text pair. A smaller first loss function value indicates a closer feature similarity between the two pieces of feature information, which is the purpose of training the backbone network layer. Inputting the first training data into the untrained target neural network may include: performing feature extraction on the first training data by using the untrained target neural network.
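A minimal sketch of an image-text similarity loss of this kind is given below, assuming PyTorch and a CLIP-style symmetric cross-entropy over the pairwise similarity matrix; the feature dimension, temperature, and the use of random features in place of encoder outputs are illustrative assumptions.

```python
# A minimal sketch of an image-text contrastive loss, assuming PyTorch;
# the encoders are stand-ins, and the CLIP-style symmetric loss is assumed.
import torch
import torch.nn.functional as F

def contrastive_loss(image_feats, text_feats, temperature=0.07):
    """image_feats, text_feats: (B, D) features of B paired images and texts.
    Matching pairs sit on the diagonal of the similarity matrix."""
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = image_feats @ text_feats.t() / temperature   # (B, B) similarities
    targets = torch.arange(logits.size(0))                # i-th image <-> i-th text
    # Symmetric cross-entropy: image-to-text and text-to-image directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random features in place of encoder outputs.
img = torch.randn(8, 256)
txt = torch.randn(8, 256)
print(contrastive_loss(img, txt).item())
```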
The image-text pairs may be crawled from the Internet, and the number of crawled image-text pairs can be huge. In this case, the embodiments of the present disclosure can use self-supervision techniques to mine more supervision information from large-scale noisy image-text pairs, thereby ensuring better training performance.
In addition, the iteration cut-off condition may be that a sufficient number of iterations has been performed, that the first loss function value is small enough, that the first training data has been fully traversed, or the like, which is not limited here.
As shown in Fig. 4, in some embodiments, the pre-training of the target neural network may proceed as follows. Image data 401 and text data 403 are acquired; an image encoder 402 is used to extract image features of the image data 401, and a text encoder 404 is used to extract text features of the text data 403. Inner products may be computed between all the image features and text features to obtain a feature matrix 405. For the image data 401, the row direction of the feature matrix 405 serves as a classifier; for the text data 403, the column direction of the feature matrix 405 also serves as a classifier. In this training process, an original supervision manner may be adopted. In the embodiments of the present disclosure, the pre-training of the target neural network may also proceed as follows. At least two pieces of image data 406 and at least two pieces of text data 407 are acquired; an image encoder 408 is used to extract image features of the image data 406, and a text encoder 409 is used to extract text features of the text data 407; feature extraction is then performed on the text features to obtain a feature queue 410. Inner products may be computed among all the image features, the text features, and the feature queue 410 to obtain a feature matrix 411. In this training process, an original supervision manner, a self-supervision manner, a multi-view supervision manner, and a nearest-neighbor supervision manner may be adopted. The feature matrix 411 obtained through training can be used to further pre-train the backbone network layer 412 for tasks such as object detection or object segmentation 416, thereby fixing the parameters of the backbone network layer; a selective object contrastive learning (SOCO) manner 413 may also be used to pre-train a feature pyramid network (FPN) 414 and a detection head (Head) network layer 415.
The training process of the other network layers may, in some embodiments, be implemented through the following steps:
Step 1: inputting the second training data into the target neural network to be trained, to obtain output results of the other network layers included in the target neural network;
Step 2: determining a second loss function value based on the output results and the labeling results of the images included in the second training data; and
Step 3: in a case where the current round of training does not meet the iteration cut-off condition, adjusting the network parameter values of the other network layers based on the second loss function value, and performing the next round of training based on the adjusted other network layers until the iteration cut-off condition is met.
The second loss function value may be determined based on the matching degree between the output results of the other network layers and the labeling results of the images included in the second training data. A smaller second loss function value indicates a closer match between the two results, which is the purpose of training the other network layers. Inputting the second training data into the target neural network to be trained may include: performing feature extraction on the second training data by using the target neural network to be trained.
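The sketch below illustrates training the other network layers while keeping the backbone's parameter values unchanged, assuming PyTorch; the toy backbone, head, loss, and data are illustrative stand-ins rather than the disclosure's concrete networks.

```python
# A minimal sketch of training head layers with a frozen backbone, assuming
# PyTorch; the modules, loss, and data below are illustrative stand-ins.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(16, 10)            # "other network layers" for a toy task

# Keep the trained backbone's parameter values unchanged.
for p in backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(head.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()   # second loss: output vs. image labels

images = torch.randn(4, 3, 32, 32)  # stand-in second training data
labels = torch.randint(0, 10, (4,))

for epoch in range(3):              # until the iteration cut-off condition
    logits = head(backbone(images))
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```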
In addition, in the embodiments of the present disclosure, the images included in the second training data may also be left unlabeled, and the other network layers may instead be trained in a self-supervised manner, thereby further improving the training performance of the target neural network.
The iteration cut-off condition here is similar to the above condition for training the backbone network layer; reference may be made to the above description.
Considering that the target neural networks in the embodiments of the present disclosure may be trained for different task characteristics, and that there are significant heterogeneity problems between different tasks, the embodiments of the present disclosure provide a training method for a joint neural network that balances multi-task performance. The method may be implemented by the network training module 103 in the embodiments of the present disclosure and may, in some embodiments, include the following steps:
Step 1: performing feature extraction on the training data in the training data set by using the at least two target neural networks respectively, to obtain feature information output by the backbone network layer included in each target neural network;
Step 2: determining a loss function value of the joint neural network to be trained based on the feature information output by the backbone network layer included in each target neural network, where the joint neural network is composed of the at least two target neural networks and connection layers between the backbone network layers included in the target neural networks; and
Step 3: performing at least one round of network training on the joint neural network to be trained based on the loss function value, to obtain a trained joint neural network.
The loss function value of the joint neural network to be trained may be determined in combination with the feature information output by the backbone network layer included in each target neural network. Because there are connection layers between the backbone network layers included in the target neural networks, these connection layers can be used to fuse the feature information output by the backbone network layers, further endowing the trained joint neural network with the task characteristics of each target neural network.
In some embodiments, one of the at least two target neural networks serves as the main neural network of the joint neural network, and the other target neural networks serve as secondary neural networks of the joint neural network. The network training module 103 may determine the loss function value of the joint neural network to be trained according to the following steps:
Step 1: updating second feature information output by a second backbone network layer included in the main neural network based on first feature information output by a first backbone network layer included in a secondary neural network, to obtain updated second feature information;
Step 2: inputting the updated second feature information into the other network layers included in the main neural network, to obtain output results of the other network layers; and
Step 3: determining the loss function value of the joint neural network based on the output results of the other network layers and the labeling results under the task corresponding to the main neural network.
The loss function value of the joint neural network may be determined based on the comparison between the output results of the other network layers, determined from the updated second feature information, and the labeling results under the task corresponding to the main neural network.
In some embodiments, feature extraction is performed on the updated second feature information by using the other network layers included in the main neural network, thereby obtaining the output results of the other network layers. The main neural network and the secondary neural network may correspond to heterogeneous tasks; for example, they may correspond to a detection task and a classification task, respectively.
The main neural network may be trained with isomorphic data, where isomorphic data refers to data for performing tasks of the same type; for example, the main neural network here may be used to perform a detection task 1 for pedestrian detection and a detection task 2 for vehicle detection. Similarly, a secondary neural network may also be trained with isomorphic data; for example, the secondary neural network here may be used to perform a classification task 1 and a classification task 2 for image classification.
Tasks of the same type can share the same hidden layers of the network, with the network branching only near the output layer to handle the different tasks, such as the above detection task 1 and detection task 2. Different tasks (i.e., heterogeneous tasks) learn some shared low-level abstract features by sharing several hidden layers at the bottom of the network, and the parameters shared at the bottom may be exactly the same. In addition, according to the characteristics of each task, each task can have its own task-specific layers designed to learn features at a higher level of abstraction. All tasks can share some related hidden layers while retaining task-specific output layers.
In joint neural network training for heterogeneous multi-task learning, each task (e.g., classification, detection) has a backbone network layer of the same parameter space size. The feature information output by the backbone network layer of the secondary neural network corresponding to a secondary task may be fused with the feature information output by the backbone network layer of the main neural network corresponding to the main task. Through the auxiliary training of the main neural network by the secondary neural network, the trained joint neural network can acquire multi-task characteristics and can thus be more universally applicable to various task scenarios in subsequent downstream applications.
In the above feature fusion process, the embodiments of the present disclosure introduce a connection layer T to facilitate feature exchange between the neural networks. Fig. 5 is a schematic diagram of training two target neural networks according to an embodiment of the present disclosure. The training of the two target neural networks may include a mixed share manner, and the mixed share may include three branches. The network of the first branch may include a backbone network layer (Stage) 501, a first detection task (Head1) 502, and a second detection task (Head2) 503; the network of the second branch may include the backbone network layer 501, a first classification task (Head3) 504, a second classification task (Head4) 505, and so on. Soft sharing includes two branches: the left branch corresponds to the main neural network, and the right branch corresponds to the secondary neural network. The main neural network may include the backbone network layer 501, the first detection task 502, and the second detection task 503; the secondary neural network may include the backbone network layer 501, the first classification task 504, and the second classification task 505. Stage corresponds to the backbone network layer of a neural network, and Head corresponds to the other task-related network layers; for example, Head1 and Head2 correspond to detection task 1 and detection task 2 respectively, and Head3 and Head4 correspond to classification task 1 and classification task 2 respectively.
In a case where training data is input into the main neural network and the secondary neural network, the Stage included in the main neural network can extract corresponding second feature information, the Stage included in the secondary neural network can extract corresponding first feature information, and the fusion between the first feature information and the second feature information can be realized through the connection layer T.
It should be noted that, in the process of determining the loss function value of the joint neural network, the loss function value may be determined based on the second feature information output by the main neural network and the labeling results under the two tasks corresponding to the main neural network.
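A minimal sketch of feature fusion through a connection layer T between two backbone stages is given below, assuming PyTorch; the 1x1-convolution form of T and the additive fusion rule are illustrative assumptions, since the disclosure does not fix the internal form of T.

```python
# A minimal sketch of fusing secondary-backbone features into the main
# backbone through a connection layer T, assuming PyTorch; the 1x1-conv
# form of T and the additive fusion are illustrative assumptions.
import torch
import torch.nn as nn

class ConnectionLayer(nn.Module):
    """Connection layer T: projects secondary features into the main
    backbone's channel space before fusion."""
    def __init__(self, sec_ch, main_ch):
        super().__init__()
        self.proj = nn.Conv2d(sec_ch, main_ch, kernel_size=1)

    def forward(self, main_feat, sec_feat):
        # Update the main network's second feature information with the
        # secondary network's first feature information.
        return main_feat + self.proj(sec_feat)

main_stage = nn.Conv2d(3, 64, 3, padding=1)     # Stage of the main network
sec_stage = nn.Conv2d(3, 32, 3, padding=1)      # Stage of the secondary network
connect = ConnectionLayer(sec_ch=32, main_ch=64)

x = torch.randn(2, 3, 56, 56)
fused = connect(main_stage(x), sec_stage(x))    # updated second feature info
print(fused.shape)                              # torch.Size([2, 64, 56, 56])
```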
本公开实施例中的连接层T的摆放位置可以是多样的。如图6(a)~6(c)所示,图6(a)表示的是连接层602可以摆放在特定的骨干网络层601的中间,在尺寸相同的特征层进行特征迁移,6(b)表示的是连接层604在不同的尺寸的骨干网络层603之间进行特征迁移,图6(c)表示的是连接层606在相同尺寸的不同骨干网络层605的连接层607(block)进行融合。The arrangement position of the connection layer T in the embodiments of the present disclosure may be varied. As shown in Figures 6(a) to 6(c), Figure 6(a) shows that the connection layer 602 can be placed in the middle of a specific backbone network layer 601, and feature migration is performed on a feature layer of the same size, 6( b) shows that the connection layer 604 performs feature migration between backbone network layers 603 of different sizes, and Figure 6(c) shows the connection layer 606 of the connection layer 607 (block) in different backbone network layers 605 of the same size Perform fusion.
需要说明的是,这里的训练数据可以是各任务相关的图像样本,该图像样本可以是具有标注结果的训练数据。It should be noted that the training data here may be image samples related to each task, and the image samples may be training data with labeling results.
The process of training the target neural network and the joint neural network in the data processing system provided by embodiments of the present disclosure may take place upstream. To extend the generalization ability to downstream business scenarios, embodiments of the present disclosure provide a data re-representation training scheme for retraining the joint neural network. In some embodiments, this may be implemented by the network migration module 104 shown in FIG. 1, which is communicatively connected to the network training module 103.
In some embodiments, the above process of training the joint neural network may include the following steps:
Step 1: based on at least two images included in the training data set, determine a codebook for decomposing each image into at least two primitives;
Step 2: when migrating the trained target neural network to a downstream business scenario, represent the target training data collected in the target business scenario based on the obtained codebook, to obtain re-represented target training data;
Step 3: retrain the joint neural network using the re-represented target training data, to obtain a trained joint neural network for processing the target scene data collected in the target business scenario.
Here, a codebook may first be learned from the upstream training data and used to re-represent the downstream training data; the joint neural network is then fine-tuned with the re-represented downstream data, and finally fine-tuned once more with the original downstream data. In this way, the generalization performance of the generated joint neural network in downstream business scenarios can be extended. To fully exploit the features embedded in the upstream-trained joint neural network and avoid information loss during migration downstream, the codebook determined from the upstream training data is used to re-represent the downstream target training data, so that the trained joint neural network can be applied to the downstream target business scenario efficiently and accurately.
In practical applications, the codebook may be trained with an adversarial network composed of a paired encoder and decoder. An image is input into the untrained encoder to obtain the codebook output by the encoder; the codebook output by the encoder is input into the untrained decoder to obtain the image output by the decoder; it is then verified whether the similarity between the decoder's output image and the input image is greater than a preset threshold, and if not, the above process is repeated until it is. The codebook here is an image encoding realized by the network composed of the encoder and decoder, and its accuracy is high.
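A minimal sketch of this training loop is shown below, assuming simple convolutional stand-ins for the encoder and decoder and cosine similarity as the verification metric; none of these choices are specified by the disclosure:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Placeholder encoder/decoder pair; the disclosed architectures are not specified.
    encoder = nn.Sequential(nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
                            nn.Conv2d(64, 128, 4, stride=2, padding=1))
    decoder = nn.Sequential(nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
                            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1))
    params = list(encoder.parameters()) + list(decoder.parameters())
    optimizer = torch.optim.Adam(params, lr=1e-4)
    threshold = 0.95  # assumed preset similarity threshold

    def similarity(a, b):
        # Cosine similarity over flattened images, used to verify reconstruction
        return F.cosine_similarity(a.flatten(1), b.flatten(1)).mean().item()

    images = torch.randn(8, 3, 64, 64)  # stand-in for upstream training images
    for step in range(10000):
        codes = encoder(images)   # codebook output by the encoder
        recon = decoder(codes)    # image restored by the decoder from the codebook
        loss = F.mse_loss(recon, images)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if similarity(recon, images) > threshold:
            break  # decoder output is close enough to the input image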
Here, with the trained codebook, the encoder can decompose a picture into a codebook composed of several primitives, and the decoder can substantially restore those primitives back into the picture.
In this way, after the trained codebook is used to re-represent the target training data collected in the downstream business scenario (corresponding to the downstream data), the re-represented downstream data can be used to fine-tune the network. In this step, the parameters of the backbone network layer of the pre-trained joint neural network are kept fixed, and only the parameters of the other task-related network layers behind the backbone are adjusted, so as to improve the generalization ability in the task scenario. The above target training data may be images, or other training data that contains images.
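This freeze-and-fine-tune step can be sketched as follows; the joint_net object with a backbone attribute, the re_represented_loader, and the task_loss function are assumed here for illustration only:

    import torch

    # Freeze the pre-trained backbone; only the task-related layers stay trainable.
    for param in joint_net.backbone.parameters():
        param.requires_grad = False

    trainable = (p for p in joint_net.parameters() if p.requires_grad)
    optimizer = torch.optim.SGD(trainable, lr=1e-3)

    for image, label in re_represented_loader:  # re-represented downstream data
        output = joint_net(image)        # forward pass through frozen backbone + heads
        loss = task_loss(output, label)  # task-specific loss (assumed)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()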
After fine-tuning the joint neural network according to the above method, embodiments of the present disclosure may further perform a final adjustment using the original downstream data, to further improve the training performance of the joint neural network. In addition, for a trained target neural network, embodiments of the present disclosure may likewise perform retraining according to the above data re-representation method to improve the generalization performance of the target neural network; see the above description for the training process.
As shown in FIG. 7, in the priming process of the first step (Stage 1), the encoder 702 may be trained on the upstream data 701 to obtain the codebook 703, and the decoder 704 may reconstruct the upstream data 701 from the codebook 703. In the downstream image re-representation process of the second step (Stage 2), the encoder 702 may be used to determine the codebook 706 corresponding to the downstream images (Downstream Images) 705, and the decoder 704 may be used to reconstruct the codebook 706 into the transferred images (Transferred Images) 707. In the fine-tuning process of the third step (Stage 3), training proceeds on the basis of the transferred images 707 and the pre-trained model (Pretrain Model) 708; in this step, the parameters of the pre-trained model 708 are fixed (i.e., not trainable), and only the non-fixed (i.e., trainable) parameters in the networks related to the collection network layer and detection head (Neck&Head) 709 and the task loss (Task Loss) 710 are adjusted. In the final adjustment process of the fourth step (Stage 4), the parameters in the networks related to the pre-trained model 708, the collection network layer, the detection head 709, and the task loss 710 may be further adjusted based on the downstream images 705; in this step, the parameters of the pre-trained model 708 are non-fixed (i.e., trainable).
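Read end to end, the four stages of FIG. 7 can be summarized in a short sketch, where every function name is an illustrative placeholder rather than an API defined by the disclosure:

    # Stage 1 (Priming): learn a codebook from the upstream data.
    codebook = train_codebook(upstream_images)
    # Stage 2 (Re-representation): re-represent downstream images via the codebook.
    transferred_images = re_represent(downstream_images, codebook)
    # Stage 3 (Fine-tuning): pre-trained backbone frozen; Neck&Head and
    # task-loss-related layers are trained on the transferred images.
    finetune(joint_net, transferred_images, freeze_backbone=True)
    # Stage 4 (Final adjustment): all parameters, including the backbone,
    # are trained on the original downstream images.
    finetune(joint_net, downstream_images, freeze_backbone=False)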
Those skilled in the art can understand that, in the above methods of some embodiments, the order in which the steps are written does not imply a strict execution order and does not constitute any limitation on the implementation process; the execution order of the steps should be determined by their functions and possible internal logic.
Based on the same technical concept, embodiments of the present disclosure also provide a data processing method and apparatus corresponding to the data processing system. Since the principle by which the method and apparatus in the embodiments of the present disclosure solve the problem is similar to that of the above data processing system, the implementation of the method and apparatus may refer to the implementation of the system.
Referring to FIG. 8, which is a flowchart of a data processing method provided by an embodiment of the present disclosure, the method includes steps S801 to S802:
S801: obtain a training data set and at least two network composition modules for constructing a target neural network;
S802: generate at least one target neural network based on the obtained training data set and the at least two network composition modules, each target neural network being used to perform a corresponding target task.
Here, based on the obtained training data set and the at least two network composition modules, at least one target neural network for performing a corresponding target task can be generated.
For the ways of obtaining the training data set and the network composition modules, see the related description in the above system embodiments; for the method of generating the target neural network, see the above description as well.
Here, when at least two target neural networks have been trained, the at least two target neural networks may be jointly trained to obtain a trained joint neural network; the joint neural network is used for migration to downstream business scenarios to perform the target tasks. For the training process of the joint neural network and the corresponding application process, see the above description.
It should be noted that the execution subject of the data processing method provided by embodiments of the present disclosure is generally an electronic device with certain computing capabilities, for example a terminal device, a server, or another processing device; the terminal device may be user equipment (UE), a mobile device, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. In some possible implementations, the data processing method may be implemented by a processor invoking computer-readable instructions stored in a memory.
Referring to FIG. 9, which is a schematic diagram of a data processing apparatus provided by an embodiment of the present disclosure, the apparatus includes an acquisition module 901 and a generation module 902, wherein:
the acquisition module 901 is configured to obtain a training data set and at least two network composition modules for constructing a target neural network;
the generation module 902 is configured to generate at least one target neural network based on the obtained training data set and the at least two network composition modules, each target neural network being used to perform a corresponding target task.
For descriptions of the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the related description in the above method embodiments, which will not be detailed here.
In some embodiments, the apparatus further includes an execution module configured to jointly train at least two of the target neural networks to obtain a trained joint neural network; the joint neural network is used for migration to downstream business scenarios to perform the target task.
As for the apparatus embodiments, since they basically correspond to the method embodiments, reference may be made to the partial description of the method embodiments for related parts. The apparatus embodiments described above are merely illustrative; the modules described as separate components may or may not be physically separated, and components shown as modules may or may not be physical modules, that is, they may be located in one place or distributed over at least two network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the disclosed solution. Those of ordinary skill in the art can understand and implement them without creative effort.
An embodiment of the present disclosure also provides an electronic device. As shown in FIG. 10, which is a schematic structural diagram of the electronic device provided by an embodiment of the present disclosure, the device includes a processor 1001, a memory 1002, and a bus 1003. The memory 1002 stores machine-readable instructions executable by the processor 1001 (for example, execution instructions corresponding to the acquisition module 901 and the generation module 902 in the apparatus of FIG. 9). When the electronic device runs, the processor 1001 and the memory 1002 communicate via the bus 1003, and when the machine-readable instructions are executed by the processor 1001, the following processing is performed:
obtaining at least two candidate search paths associated with at least two network composition modules, wherein each candidate search path corresponds to a combination manner, and the combination manner is used to characterize the operational relationship between the network composition modules;
performing at least one search on the at least two candidate search paths using a reinforcement learning network, to obtain a reward score after each search;
combining the network composition modules according to the combination manner corresponding to a candidate search path whose reward score meets a preset requirement, to obtain the target neural network.
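As one reading of these three steps, the sketch below shows a reward-driven search loop; assemble_network, evaluate_accuracy, and update_probabilities are assumed placeholders, and the probability-weighted sampling is a simplification of the reinforcement learning network described above:

    import random

    def search_architecture(candidate_paths, train_data, num_steps=100):
        # Start from a uniform selection probability over the candidate paths.
        probs = [1.0 / len(candidate_paths)] * len(candidate_paths)
        best_path, best_score = None, float("-inf")
        for _ in range(num_steps):
            # Sample a combination manner according to the current probabilities.
            path = random.choices(candidate_paths, weights=probs, k=1)[0]
            net = assemble_network(path)                # combine the network modules
            score = evaluate_accuracy(net, train_data)  # reward score from accuracy
            probs = update_probabilities(probs, path, score)  # policy update
            if score > best_score:
                best_path, best_score = path, score
        # Combine the modules along the best-scoring path into the target network.
        return assemble_network(best_path)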
Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the data processing method described in the above method embodiments are executed. The computer-readable storage medium may store only the computer program corresponding to the data processing method.
A computer-readable storage medium may be a tangible device capable of holding and storing instructions used by an instruction execution device, and may be a volatile or non-volatile storage medium. It may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or a raised structure in a groove on which instructions are stored, and any suitable combination of the foregoing. The computer-readable storage medium used here is not to be construed as a transient signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
Embodiments of the present disclosure also propose a computer program including computer-readable code; when the computer-readable code is read and executed by a computer, some or all of the steps of the method in any embodiment of the present disclosure are implemented.
Embodiments of the present disclosure also provide a computer program product carrying program code; the instructions included in the program code can be used to execute the steps of the data processing method described in the above method embodiments, for which reference may be made to the above method embodiments.
The above computer program product may be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product may be embodied as a computer storage medium; in another optional embodiment, the computer program product may be embodied as a software product, such as a software development kit (SDK).
Those skilled in the art can clearly understand that, for convenience and brevity of description, some working processes of the systems and apparatuses described above may refer to the corresponding processes in the foregoing method embodiments. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical functional division, and there may be other division manners in actual implementation; as another example, at least two units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some communication interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over at least two network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only some implementations of the present disclosure, used to illustrate rather than limit the technical solutions of the present disclosure, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person familiar with the technical field may, within the technical scope disclosed by the present disclosure, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (26)

  1. A data processing system, comprising: a data collection module, a network generation module, and a network training module, the data collection module, the network generation module, and the network training module being communicatively connected in sequence; wherein
    the data collection module is configured to obtain a training data set and at least two network composition modules for constructing a target neural network;
    the network generation module is configured to generate at least one target neural network based on the obtained training data set and the at least two network composition modules, each target neural network being used to perform a corresponding target task;
    the network training module is configured to, when at least two target neural networks have been trained, jointly train the at least two target neural networks to obtain a trained joint neural network, the joint neural network being used for migration to downstream business scenarios to perform the target tasks.
  2. The system according to claim 1, wherein, when the training data set includes training data corresponding to the target task, the network generation module is configured to generate the target neural network for performing the corresponding target task according to the following steps:
    determining at least two candidate search paths associated with the at least two network composition modules, wherein each candidate search path corresponds to a combination manner, and the combination manner is used to characterize the operational relationship between the network composition modules;
    performing at least one search on the at least two candidate search paths using the training data corresponding to the target task and a reinforcement learning network, to obtain a reward score after each search;
    combining the network composition modules according to the combination manner corresponding to a candidate search path whose reward score meets a preset requirement, to obtain the target neural network for performing the target task.
  3. The system according to claim 2, wherein the network generation module is configured to obtain the reward score after each search according to the following steps:
    performing a first search on the at least two candidate search paths using the reinforcement learning network, and determining, based on the candidate search path selected in the first search and the training data corresponding to the target task, the reward score after the first search and the selection probability of selecting the corresponding candidate search path;
    cyclically performing the following steps until a network cut-off condition is met:
    determining the candidate search path selected in an n-th search based on the reward score after the (n-1)-th search and the selection probability of selecting the corresponding candidate search path, and determining, based on the candidate search path selected in the n-th search and the training data corresponding to the target task, the reward score after the n-th search and the selection probability of selecting the corresponding candidate search path, where n is an integer greater than 1.
  4. The system according to claim 3, wherein the network generation module is configured to determine the reward score after the n-th search according to the following steps:
    constructing a candidate neural network based on the candidate search path selected in the n-th search;
    determining the network accuracy of the constructed candidate neural network based on the training data corresponding to the target task;
    determining the reward score after the n-th search based on the network accuracy of the constructed candidate neural network.
  5. The system according to claim 4, wherein the network generation module is configured to determine the network accuracy of the constructed candidate neural network according to the following steps:
    performing prediction on the training data for the target task using the constructed candidate neural network, to obtain an output result of the candidate neural network;
    comparing the output result with a labeling result for the training data, to determine the network accuracy of the candidate neural network.
  6. The system according to any one of claims 2 to 5, wherein the candidate search path whose reward score meets the preset requirement is selected in one of the following ways:
    selecting the candidate search path with the highest reward score; or
    ranking the candidate search paths corresponding to the searches according to the reward scores, and selecting a candidate search path ranked higher than a preset rank; or
    selecting a candidate search path whose reward score is higher than a preset threshold.
  7. The system according to any one of claims 1 to 6, wherein the data collection module is configured to obtain the training data set according to the following steps:
    obtaining network data through a network input interface;
    performing quality evaluation on the obtained network data based on an active learning network, determining network data whose data quality is higher than a preset threshold, and using the network data whose data quality is higher than the preset threshold as training data in the training data set.
  8. The system according to any one of claims 1 to 7, wherein the data collection module is configured to obtain the training data set according to the following steps:
    obtaining a training data set including initial labeling results;
    expanding the initial labeling results using a knowledge graph structure, to obtain expanded labeling results;
    updating the training data set based on the expanded labeling results.
  9. The system according to any one of claims 1 to 8, wherein the at least two network composition modules include at least a feature map extraction unit and a down-sampling unit for down-sampling the feature map output by the feature map extraction unit.
  10. The system according to any one of claims 1 to 9, wherein the target neural network includes a backbone network layer for feature extraction and other network layers for feature processing; the training data set includes first training data having at least two image-text pairs and second training data having at least two images; and the network training module is configured to train the target neural network according to the following steps:
    training the backbone network layer included in the target neural network to be trained using the first training data, to obtain a trained backbone network layer;
    with the network parameter values of the trained backbone network layer kept unchanged, training the other network layers included in the target neural network to be trained using the second training data, to obtain trained other network layers.
  11. The system according to claim 10, wherein the network training module is configured to obtain the trained backbone network layer according to the following steps:
    performing feature extraction on the first training data using the untrained target neural network, to obtain image feature information and text feature information respectively corresponding to the image and the text in the image-text pairs included in the first training data;
    determining a first loss function value based on the feature similarity between the image feature information and the text feature information;
    when the current round of training does not meet an iteration cut-off condition, adjusting the network parameter values of the backbone network layer based on the first loss function value, and performing the next round of training based on the adjusted backbone network layer until the iteration cut-off condition is met.
  12. The system according to claim 10 or 11, wherein the network training module is configured to obtain the trained other network layers according to the following steps:
    performing feature extraction on the second training data using the untrained target neural network, to obtain output results of the other network layers included in the target neural network;
    determining a second loss function value based on the output results and the labeling results for the images included in the second training data;
    when the current round of training does not meet the iteration cut-off condition, adjusting the network parameter values of the other network layers based on the second loss function value, and performing the next round of training based on the adjusted other network layers until the iteration cut-off condition is met.
  13. The system according to any one of claims 10 to 12, wherein the network training module is configured to obtain the trained joint neural network according to the following steps:
    performing feature extraction on the training data in the training data set using the at least two target neural networks respectively, to obtain feature information output by the backbone network layer included in each target neural network;
    determining a loss function value of the untrained joint neural network based on the feature information output by the backbone network layer included in each target neural network, wherein the joint neural network is composed of the at least two target neural networks and connection layers between the backbone network layers included in each target neural network;
    performing at least one round of network training on the joint neural network to be trained based on the loss function value, to obtain the trained joint neural network.
  14. The system according to claim 13, wherein one of the at least two target neural networks serves as the main neural network of the joint neural network, and the other target neural networks of the at least two target neural networks serve as auxiliary neural networks of the joint neural network; and the network training module is configured to determine the loss function value of the joint neural network to be trained according to the following steps:
    updating second feature information output by a second backbone network layer included in the main neural network based on first feature information output by a first backbone network layer included in the auxiliary neural network, to obtain updated second feature information;
    determining the loss function value of the joint neural network to be trained based on the updated second feature information.
  15. The system according to claim 14, wherein the network training module is configured to determine the loss function value of the joint neural network according to the following steps:
    performing feature extraction on the updated second feature information using the other network layers included in the main neural network, to obtain output results of the other network layers;
    determining the loss function value of the joint neural network based on the output results of the other network layers and the labeling results under the task corresponding to the main neural network.
  16. The system according to any one of claims 1 to 15, wherein, when the training data set includes at least two images, the system further comprises a network migration module communicatively connected to the network training module;
    the network migration module is configured to: determine, based on the at least two images, a codebook for decomposing each image into at least two primitives; when migrating the trained joint neural network to a downstream business scenario, represent the target training data collected in the target business scenario based on the obtained codebook, to obtain re-represented target training data; and retrain the joint neural network using the re-represented target training data, to obtain a trained joint neural network for processing the target scene data collected in the target business scenario.
  17. The system according to claim 16, wherein the network migration module is configured to determine the codebook for decomposing each image into at least two primitives according to the following steps:
    repeatedly performing the following steps until the similarity between the image output by the decoder and the image input into the decoder is greater than a preset threshold:
    encoding the images in the training data set using an untrained encoder, to obtain the codebook output by the encoder; and decoding the codebook output by the encoder using an untrained decoder, to obtain the image output by the decoder.
  18. The system according to claim 16 or 17, wherein the network migration module is configured to obtain the trained joint neural network for processing the target scene data collected in the target business scenario according to the following steps:
    with the network parameter values of the backbone network layer included in the joint neural network kept unchanged, retraining the other network layers included in the joint neural network using the re-represented target training data, to obtain trained other network layers.
  19. A data processing method, comprising:
    obtaining a training data set and at least two network composition modules for constructing a target neural network;
    generating at least one target neural network based on the obtained training data set and the at least two network composition modules, each target neural network being used to perform a corresponding target task.
  20. The method according to claim 19, wherein, when at least two target neural networks have been trained, the method further comprises:
    jointly training the at least two target neural networks to obtain a trained joint neural network, the joint neural network being used for migration to downstream business scenarios to perform the target task.
  21. A data processing apparatus, comprising:
    an acquisition module configured to obtain a training data set and at least two network composition modules for constructing a target neural network;
    a generation module configured to generate at least one target neural network based on the obtained training data set and the at least two network composition modules, each target neural network being used to perform a corresponding target task.
  22. A data processing apparatus, wherein, when at least two target neural networks have been trained, the apparatus further comprises:
    an execution module configured to jointly train the at least two target neural networks to obtain a trained joint neural network, the joint neural network being used for migration to downstream business scenarios to perform the target task.
  23. An electronic device, comprising: a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate via the bus; and when the machine-readable instructions are executed by the processor, the steps of the data processing method according to claim 19 or 20 are performed.
  24. A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is run by a processor, the steps of the data processing method according to claim 19 or 20 are performed.
  25. A computer program comprising computer-readable code, wherein, when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the method according to claim 19 or 20.
  26. A computer program product configured to store computer-readable instructions which, when executed, cause a computer to perform the method according to claim 19 or 20.
PCT/CN2022/099715 2021-11-05 2022-06-20 Data processing system, method and apparatus, and device, storage medium, computer program and computer program product WO2023077819A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111306897.7 2021-11-05
CN202111306897.7A CN114037055A (en) 2021-11-05 2021-11-05 Data processing system, method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023077819A1 true WO2023077819A1 (en) 2023-05-11

Family

ID=80143049

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/099715 WO2023077819A1 (en) 2021-11-05 2022-06-20 Data processing system, method and apparatus, and device, storage medium, computer program and computer program product

Country Status (2)

Country Link
CN (1) CN114037055A (en)
WO (1) WO2023077819A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037055A (en) * 2021-11-05 2022-02-11 北京市商汤科技开发有限公司 Data processing system, method, device, equipment and storage medium
CN114612685B (en) * 2022-03-22 2022-12-23 中国科学院空天信息创新研究院 Self-supervision information extraction method combining depth features and contrast learning
CN117830645A (en) * 2024-02-23 2024-04-05 中国科学院空天信息创新研究院 Feature extraction network training method, device, equipment and medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034210A (en) * 2018-07-04 2018-12-18 国家新闻出版广电总局广播科学研究院 Object detection method based on super Fusion Features Yu multi-Scale Pyramid network
CN110378278A (en) * 2019-07-16 2019-10-25 北京地平线机器人技术研发有限公司 Training method, object search method, apparatus and the electronic equipment of neural network
CN112507943A (en) * 2020-12-18 2021-03-16 华南理工大学 Visual positioning navigation method, system and medium based on multitask neural network
CN114037055A (en) * 2021-11-05 2022-02-11 北京市商汤科技开发有限公司 Data processing system, method, device, equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116660992A (en) * 2023-06-05 2023-08-29 北京石油化工学院 Seismic signal processing method based on multi-feature fusion
CN116660992B (en) * 2023-06-05 2024-03-05 北京石油化工学院 Seismic signal processing method based on multi-feature fusion

Also Published As

Publication number Publication date
CN114037055A (en) 2022-02-11

Similar Documents

Publication Publication Date Title
WO2023077819A1 (en) Data processing system, method and apparatus, and device, storage medium, computer program and computer program product
US11669744B2 (en) Regularized neural network architecture search
CN107423376B (en) Supervised deep hash rapid picture retrieval method and system
US20210027098A1 (en) Weakly Supervised Image Segmentation Via Curriculum Learning
JP7109302B2 (en) Text generation model update method and text generation device
JP7431833B2 (en) Language sequence labeling methods, devices, programs and computing equipment
CN111708876B (en) Method and device for generating information
US11010664B2 (en) Augmenting neural networks with hierarchical external memory
CN111831813B (en) Dialog generation method, dialog generation device, electronic equipment and medium
US20160117574A1 (en) Tagging Personal Photos with Deep Networks
JP2022554068A (en) Video content recognition method, apparatus, program and computer device
US20210234814A1 (en) Human-machine interaction
CN113795851A (en) Large-scale generation neural network model with reasoning for representation learning using antagonistic training
CN113128431B (en) Video clip retrieval method, device, medium and electronic equipment
CN111241285A (en) Method, device, equipment and storage medium for identifying question answer types
CN112016601A (en) Network model construction method based on knowledge graph enhanced small sample visual classification
CN113704460A (en) Text classification method and device, electronic equipment and storage medium
CN115269913A (en) Video retrieval method based on attention fragment prompt
CN116977701A (en) Video classification model training method, video classification method and device
WO2021012040A1 (en) Methods and systems for state navigation
CN115130461A (en) Text matching method and device, electronic equipment and storage medium
CN112668464A (en) Chinese sign language translation model construction method and device fusing scene matching
CN114329006B (en) Image retrieval method, apparatus, device, and computer-readable storage medium
US20240346364A1 (en) Co-attentive Fusion with Unified Label Graph Representation for Low-resource Text Classification
CN117350354B (en) Training method and device for large model, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22888855; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 22888855; Country of ref document: EP; Kind code of ref document: A1)