WO2023077819A1

WO2023077819A1 - Data processing system, method and apparatus, and device, storage medium, computer program and computer program product

Info

Publication number: WO2023077819A1
Application number: PCT/CN2022/099715
Authority: WO
Inventors: 邵婧; 李阳光; 王坤; 尹榛菲; 陈思禹; 何逸楠; 黄耿石; 滕家宁; 刘丰刚; 孙庆宏; 梁鼎; 吴一超; 高梦雅; 刘宇; 宋广录; 刘吉豪
Original assignee: 上海商汤智能科技有限公司
Priority date: 2021-11-05
Filing date: 2022-06-20
Publication date: 2023-05-11
Also published as: CN114037055A

Abstract

Provided in the present disclosure are a data processing system, method and apparatus, and a device and a storage medium. The system comprises: a data collection module, a network generation module and a network training module, which are sequentially in communication connection, wherein the data collection module is configured to acquire a training data set, and at least two network composition modules, which are used for constituting a target neural network; the network generation module is used for generating at least one target neural network on the basis of the acquired training data set and at least two network composition modules; each target neural network is used for executing a corresponding target task; the network training module is configured to perform joint training on at least two target neural networks when the at least two target neural networks have been trained, so as to obtain a trained joint neural network; and the joint neural network is used for being migrated to a downstream service scenario to execute a target task. In the present disclosure, a joint neural network that is suitable for a downstream service scenario can be generated by means of joint training, and the universality and accuracy of the joint neural network are both relatively good.

Description

Data processing system and method, device, device, storage medium, computer program, computer program product

Cross application of related applications

This disclosure is based on the Chinese patent application with the application number 202111306897.7, the application date is November 05, 2021, and the application name is "data processing system, method, device, equipment and storage medium", and claims the priority of the Chinese patent application , the entire content of this Chinese patent application is hereby incorporated by reference into this disclosure.

technical field

The present disclosure relates to but not limited to the technical field of artificial intelligence, and in particular relates to a data processing system and method, device, device, storage medium, computer program, and computer program product.

Background technique

General artificial intelligence technology is an important topic in the field of artificial intelligence research. Taking the field of computer vision as an example, the general visual neural network built with general artificial intelligence technology can break through the limitations of a single model for specific computer vision tasks, and thus can be widely used in various computer tasks, such as image classification, object detection, Semantic segmentation, depth estimation, etc.

A method for generating a general visual neural network is provided in the related art. In the upstream task, the method uses a general data set to train a classification network so as to train a general visual representation through the classification task.

However, since the network trained upstream is limited to specific classification tasks, this results in poor performance when applying the trained visual representations to other downstream tasks such as detection and segmentation.

Contents of the invention

Embodiments of the present disclosure at least provide a data processing system and method, device, device, storage medium, computer program, and computer program product.

An embodiment of the present disclosure provides a data processing system, including: a data collection module, a network generation module, and a network training module; the data collection module, the network generation module, and the network training module are sequentially connected by communication;

The data collection module is configured to obtain a training data set and at least two network composition modules for forming a target neural network;

The network generation module is configured to generate at least one target neural network based on the obtained training data set and the at least two network composition modules; each of the target neural networks is used to perform a corresponding target task;

The network training module is configured to perform joint training on at least two of the target neural networks when at least two of the target neural networks have been trained to obtain a trained joint neural network; the joint neural network uses Perform the target tasks in the migration to downstream business scenarios.

Using the above-mentioned data processing system, when the training data and at least two network component modules used to form the target neural network are obtained, based on the acquired training data set and the at least two network component modules, at least A target neural network. In this way, if at least two target neural networks have been trained, joint training can be performed on the at least two target neural networks, so as to obtain a trained joint neural network. The basic network composition modules of the present disclosure can generate target neural networks suitable for various target tasks, and then through joint training, can generate a joint neural network suitable for downstream business scenarios, which has good versatility and accuracy.

An embodiment of the present disclosure also provides a data processing method, including:

Obtain a training data set and at least two network building blocks for forming a target neural network;

At least one target neural network is generated based on the acquired training data set and at least two network components; each target neural network is used to perform a corresponding target task.

An embodiment of the present disclosure also provides a data processing device, including:

An acquisition module configured to acquire a training data set and at least two network composition modules for forming a target neural network;

The generating module is configured to generate at least one target neural network based on the acquired training data set and at least two of the network composition modules; each of the target neural networks is used to perform a corresponding target task.

An embodiment of the present disclosure also provides an electronic device, including: a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device is running, the processor and the The memory communicates with each other through a bus, and when the machine-readable instructions are executed by the processor, the steps of the data processing method described in any one of the second aspect and its implementation manners are executed.

An embodiment of the present disclosure also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the data described in any one of the second aspect and its implementation manners is executed. The steps of the processing method.

An embodiment of the present disclosure provides a computer program, the computer program includes computer readable code, and when the computer readable code is read and executed by a computer, a part or part of the method in any embodiment of the present disclosure is realized. All steps.

An embodiment of the present disclosure provides a computer program product, the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and when the computer program is read and executed by a computer, any embodiment of the present disclosure is realized Some or all of the steps in the method.

For the effect description of the above-mentioned data processing device, electronic equipment, computer-readable storage medium, computer program and computer program product, please refer to the description of the above-mentioned data processing system.

In order to make the above-mentioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments will be described in detail below together with the accompanying drawings.

Description of drawings

In order to illustrate the technical solutions of the embodiments of the present disclosure more clearly, the following will briefly introduce the accompanying drawings used in the embodiments. The accompanying drawings here are incorporated into the specification and constitute a part of the specification. The drawings show the embodiments consistent with the present disclosure, and are used together with the description to explain the technical solutions of the present disclosure. It should be understood that the following drawings only show some embodiments of the present disclosure, and therefore should not be regarded as limiting the scope. For those skilled in the art, they can also make From these drawings other related drawings are obtained.

FIG. 1 shows a schematic diagram of a data processing system provided by an embodiment of the present disclosure;

FIG. 2(a) shows a schematic diagram of the down-sampling mode in the data processing system provided by the embodiment of the present disclosure;

Fig. 2(b) shows a schematic diagram of the downsampling mode in the data processing system provided by the embodiment of the present disclosure;

FIG. 2(c) shows a schematic diagram of the down-sampling mode in the data processing system provided by the embodiment of the present disclosure;

FIG. 3 shows a schematic diagram of searching candidate search paths in the data processing system provided by an embodiment of the present disclosure;

FIG. 4 shows a schematic diagram of pre-training of the target neural network in the data processing system provided by an embodiment of the present disclosure;

Fig. 5 shows the flow chart of the joint neural network training method in the data processing system provided by the embodiment of the present disclosure; Fig. 6(a) shows the first connection in the data processing system provided by the embodiment of the present disclosure Schematic diagram of the connection mode of the layers;

FIG. 6(b) shows a schematic diagram of a connection mode of the second connection layer in the data processing system provided by an embodiment of the present disclosure;

FIG. 6(c) shows a schematic diagram of a connection mode of the third connection layer in the data processing system provided by an embodiment of the present disclosure;

FIG. 7 shows a schematic diagram of codebook training in the data processing system provided by an embodiment of the present disclosure;

FIG. 8 shows a flowchart of a data processing method provided by an embodiment of the present disclosure;

FIG. 9 shows a schematic diagram of a data processing device provided by an embodiment of the present disclosure;

Fig. 10 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.

Detailed ways

In order to make the purpose, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only It is a part of the embodiments of the present disclosure, but not all of them. The components of the disclosed embodiments generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative effort shall fall within the protection scope of the present disclosure.

It should be noted that like numerals and letters denote similar items in the following figures, therefore, once an item is defined in one figure, it does not require further definition and explanation in subsequent figures.

The term "and/or" in this article only describes an association relationship, which means that there can be three kinds of relationships, for example, A and/or B can mean: there is A alone, A and B exist at the same time, and B exists alone. situation. In addition, the term "at least one" herein means any one of a variety or any combination of at least two of the more, for example, including at least one of A, B, and C, which may mean including from A, Any one or at least two elements selected from the set formed by B and C.

After research, it is found that the construction of a general-purpose visual neural network has not yet formed a set of effective procedures and reliable results. Many existing computer vision technologies are constrained by various factors, making it difficult to achieve the goal of a general-purpose visual neural network. Taking a method for generating a general visual neural network provided in the related art as an example, in the upstream task, the method uses a general data set to train a classification network to train a general visual representation through the classification task.

However, since the network trained upstream is limited to specific classification tasks, this results in poor performance when applying the trained visual representations to other downstream tasks such as detection and segmentation. In addition, most of the existing data sets are limited in number, incomplete labeling systems, and inefficient labeling, which cannot meet the needs of general visual models and are difficult to scale.

Based on the above research, the present disclosure provides a scheme for network search candidate search paths based on reinforcement learning to realize neural network generation, and the scheme has achieved remarkable results in both network performance and network versatility.

To facilitate understanding of this embodiment, a data processing system disclosed in an embodiment of the present disclosure is first introduced in detail. As shown in Figure 1, it is a schematic diagram of a data processing system provided by an embodiment of the present disclosure, the data processing system includes: a data acquisition module 101, a network generation module 102 and a network training module 103; a data acquisition module 101, a network generation module 102 and a network The training module 103 is sequentially connected by communication;

The data collection module 101 is configured to obtain a training data set and at least two network composition modules for forming a target neural network;

The network generation module 102 is configured to generate at least one target neural network based on the acquired training data set and at least two network composition modules; each target neural network is used to perform a corresponding target task;

The network training module 103 is configured to perform joint training on at least two target neural networks when at least two of the target neural networks have been trained to obtain a trained joint neural network; the joint neural network is used for migration to downstream services Perform the target task in the scene.

In order to facilitate the understanding of the data processing system provided by the embodiment of the present disclosure, the application scenario of the method will be briefly described below. The data processing system in the embodiments of the present disclosure can be applied to the field of vision, for example, the neural network based on the generated object is applied to scenarios such as object detection, image classification, and depth estimation.

Considering that many existing computer vision neural networks in related technologies are only limited to specific computer vision tasks, it is difficult to form generalization. Based on this, the embodiments of the present disclosure provide a data processing system that generates a target neural network based on network component modules, and then obtains a joint neural network adapted to various target tasks through joint training.

The target neural network here may be determined based on a search result of at least two candidate search paths associated with at least two network constituent modules by the reinforcement learning network. The candidate search paths here correspond to a set of specific combination methods. Based on this combination method, the corresponding network components can be combined, and then the target neural network that meets the requirements can be obtained.

The network composition module in the embodiment of the present disclosure may include a feature map extraction unit for feature map extraction, and may also include a downsampling unit for downsampling the feature map output by the feature map extraction unit.

As shown in Fig. 2(a), the down sampling (Down Sampling Modules, DSM) unit can include local down sampling (Local DSM, L-DSM), and the hidden layer 201 in L-DSM is convolution stride (stride) is 2. A two-dimensional convolutional layer. As shown in Figure 2(b), the DSM unit can also include local-global downsampling (Local-global DSM, LG-DSM), and the hidden layer 202 in LG-DSM has a convolution step size of 2 and a dimension of two dimensions The convolutional layer, the hidden layer 203 is a multi-head attention layer (Multi-Head Attention). As shown in Figure 2(c), the DSM unit can also include global downsampling (Global DSM, G-DSM), and the hidden layer 204 in G-DSM is a convolutional layer with a convolution step size of 2 and a dimension of one dimension , the hidden layer 205 is a multi-head attention layer (Multi-Head Attention). Among them, the multi-head attention layer can be used to determine the query vector (Q), key vector (K) and value vector (V).

In some embodiments, a unified search space (Unified Search Space) may be constructed to search for candidate search paths based on the unified search space. As shown in FIG. 3 , a unified search space 301 may be determined by a network composition model 302 , a downsampling unit 303 and a network size 304 . The network composition model 302 may also be called general operations (General Operations, GOP), and may include a convolutional network (Convolution), a natural language processing model (Transformer), and a multilayer perceptron (Multilayer Perceptron, MLP). The downsampling unit (DSM) 303 may include L-DSM, LG-DSM and G-DSM. The network size (Size) 304 may include the number of iterations (Repeats), the number of channels (Channels), the number of expansions (Expansion), and the like. In the process of searching for candidate search paths, multiple searches may be performed, for example, the first time (N ₁ ), the second time (N ₂ )...the fifth time (N ₅ ) and so on.

In some embodiments, the above-mentioned feature map extraction unit can be implemented based on convolution operation, can also be based on Transformer architecture, can also be implemented based on multi-layer perceptron, and can also be based on other features with feature extraction functions It is implemented by a related unit, and there is no limitation here; the above-mentioned down-sampling unit can be implemented based on a convolution operation, or based on a multi-layer attention mechanism, or based on other related units with sampling functions. This is also not limited.

The training data set in the embodiment of the present disclosure may include various training data, for example, may include training data corresponding to different target tasks, and may also include the first training data having at least two image-text pairs, and The second training data having at least two images may also include other training data, which is not limited here, and the corresponding data may be selected based on different requirements, and the above-mentioned various training data may be pre-split, so as to facilitate In the case of corresponding network training, the corresponding training data is quickly extracted.

The training data set provided in the embodiments of the present disclosure may be high-quality network data screened out based on an active learning network. The network data here may be acquired based on the network input interface, and in some embodiments, the network data may be automatically acquired from the network input interface by means of a web crawler. The training data here may be high-quality network data screened out after evaluating the quality of the acquired network data, and the high-quality training data can ensure the accuracy of network training.

In addition, the training data set in the embodiment of the present disclosure may also have an initial labeling result, and in order to adapt to the training requirements of various networks, the labeling result may be extended here. That is to say, the embodiment of the present disclosure provides a large-scale labeling system, where the knowledge map structure can be used to expand the initial labeling results to obtain the expanded labeling results. Here, the initial labeling results of the training dataset can be extended based on the knowledge graph structure. The extended annotation results can provide more realistic supervision signals for network training, which ensures the accuracy of the determined network accuracy to a certain extent.

In some embodiments, other automatic linking methods based on semantic analysis can also be used to extend the tag system.

It should be noted that, after expanding the label system by means of automatic linking, the embodiments of the present disclosure can also determine the final training data and corresponding labeling results through data reorganization and label cleaning. Such a label system is more suitable for computer vision tasks. demand.

In order to adapt to the application requirements for various target tasks in downstream business scenarios, here, in the case of generating and training at least two target neural networks for performing different target tasks, at least two target neural networks can be combined to perform Joint training, so that the trained joint neural network has the task characteristics of each target neural network, so that it can better adapt to downstream business scenarios. The downstream business scenarios here can be scenarios related to computer vision, for example, the access control field, Unmanned driving field, etc.

It should be noted that in the process of training the target neural network under different tasks, training can be performed based on the training data corresponding to the current task. For example, the detection neural network related to the detection task can be trained, and the related classification task can also be trained. Classification neural network.

Using the above data processing system, when the training data and a plurality of network components used to form the target neural network are obtained, at least one target can be generated based on the obtained training data set and the plurality of network components. Neural Networks. In this way, in the case of training multiple target neural networks, the multiple target neural networks can be jointly trained to obtain a trained joint neural network. The basic network composition modules of the present disclosure can generate target neural networks suitable for various target tasks, and then through joint training, can generate a joint neural network suitable for downstream business scenarios, which has good versatility and accuracy.

Considering that the generation of the target neural network plays a key role in the data processing system provided by the embodiments of the present disclosure, some embodiments may be described next. The network generation module 102 here can generate the target neural network for performing the corresponding target task according to the following steps:

Step 1. Determine at least two candidate search paths associated with at least two network constituent modules, wherein each candidate search path corresponds to a combination mode, and the combination mode is used to represent the operational relationship between each network constituent module;

Step 2, using the training data corresponding to the target task and the reinforcement learning network to search at least two candidate search paths at least once, and obtain the reward score after each search;

Step 3: According to the combination method corresponding to the candidate search paths whose return scores meet the preset requirements, the inner network components are combined to obtain the target neural network for performing the target task.

In the embodiment of the present disclosure, there may be at least two determined candidate search paths. Here, one or more path searches can be performed based on the learning of the reinforcement learning network, and a searched candidate search path can be obtained for each path search. As the training times of the reinforcement learning network gradually increase, its learning ability becomes stronger and stronger, and the search ability will gradually increase with the increase of training times, and then more and better candidate search paths can be screened out.

Here, when multiple candidate search paths associated with multiple network constituent modules are determined, the reinforcement learning network can be used to search the multiple candidate search paths, so as to generate the target neural network according to the reward score after each search. The candidate search paths in this disclosure can be characterized by the operational relationship between various network components, and the number of candidate search paths that can be determined based on various operational relationships is also large, and the reinforcement learning network can be used to learn from a large number of In the candidate search path, a candidate search path that represents better performance of the neural network is searched, so that the target neural network generated based on the searched candidate search path has better versatility and accuracy.

In order to facilitate the description of the process of using the reinforcement learning network to realize at least one search, the standard environment of the reinforcement learning network is briefly described next. The standard environment for reinforcement learning networks includes State, Action and Reword. The update form is to input an action at the current moment, and the environment obtains the state and reward at that moment through a single-step operation. The state policy function can calculate the input action at the next moment, and the reward is used to update the weight parameters of the strategy.

In the embodiment of the present disclosure, the input action at the current moment may point to searching for the next candidate search path, and the state at this moment may point to the selection probability of selecting the corresponding candidate search path.

Here, any candidate search path can be used as the initial state information of the reinforcement learning network, and the candidate search path selected by the first search can be determined based on the initial state information, and the candidate search path selected based on the first search and corresponding to the target The training data of the task determines the reward score after the first search and the selection probability of selecting the corresponding candidate search path.

In the case of obtaining the return score after the first search and the selection probability of selecting the corresponding candidate search path, the choice of the second search can be determined based on the return score after the first search and the selection probability of the corresponding candidate search path Candidate search paths, and then you can get the return score after the second search and the selection probability of selecting the corresponding candidate search path, and in this way, you can determine the third, fourth, and last search based on the next search. Return score.

In the embodiment of the present disclosure, n searches may be performed. n may be an integer greater than 1, for example, may be 100 times, 1000 times, etc. The value of n may be determined in combination with requirements of different application scenarios, and there is no limitation thereto.

Here, through the path search, the return score after the corresponding search and the selection probability of selecting the corresponding candidate search path can be determined. Using the learning principle of the reinforcement learning network, the search strategy with a relatively high return score can be automatically selected, so that the obtained candidate search Paths are more reliable.

The execution of the search can be determined based on the network cut-off condition, where the network cut-off condition can be that the number of iterations is large enough, the number of candidate search paths that meet the preset requirements is large enough, or other conditions, Embodiments of the present disclosure do not limit this.

In the embodiment of the present disclosure, a candidate search path with a relatively high return score may be selected from the searched candidate search paths. Here, you can select the candidate search path with the highest return score; you can also rank the candidate search paths corresponding to each search according to the return score, and select the candidate search path with a higher ranking than the preset ranking, for example, you can select the top three Candidate search paths are used as corresponding combinations; candidate search paths whose reward scores are higher than a preset threshold can also be selected. Here, the labeling result of the training data can be the labeling result based on the expanded knowledge map structure. The extended annotation results can provide more realistic supervision signals for network training, which ensures the accuracy of the determined network accuracy to a certain extent.

Wherein, the reward score after each search may be determined based on the network accuracy of the candidate neural network. For a candidate neural network with higher network precision, it can be explained to a certain extent that the training performance of the candidate neural network is better. At this time, positive feedback can be used to motivate the execution of similar path searches. On the contrary, for the network with lower precision As far as the candidate search path is concerned, it can be explained to a certain extent that the training performance of the candidate neural network is poor. At this time, negative feedback can be used to reduce the execution of similar path searches.

Here, the reward score may be determined based on the network accuracy with respect to the candidate neural network. The network performance of the candidate neural network with higher network accuracy can better meet the needs of various fields, and then can be given a higher return score. Under such a scoring mechanism, more and better candidate search paths can be obtained.

Here, after inputting the training data for the target task into the constructed candidate neural network, the difference between the output result and the labeling result for the training data can be determined. The larger the difference, the better the candidate neural network is. The lower the accuracy, on the contrary, the smaller the difference, to a certain extent, the higher the network accuracy of the candidate neural network.

In the embodiment of the present disclosure, the network accuracy of the candidate neural network can be jointly determined through the comparison results corresponding to at least two training data, for example, the final network accuracy can be determined based on the average network accuracy obtained from the network accuracy corresponding to each training data .

In the data processing system provided by the embodiment of the present disclosure, when the network generation module 102 generates a target neural network, pre-training of large-scale multi-modal data can be performed based on the network training module 103 to improve the training of the target neural network performance. Wherein, the target neural network here includes a backbone network layer for feature extraction and other network layers for feature processing. In some embodiments, the above-mentioned network training module 103 can train the target neural network according to the following steps:

Step 1, using the first training data to train the backbone network layer included in the target neural network to be trained to obtain the trained backbone network layer;

Step 2: When the network parameter values of the trained backbone network layer remain unchanged, use the second training data to train other network layers included in the target neural network to be trained to obtain other trained network layers.

Here, in order to extract a more general visual representation, the backbone network layer included in the target neural network to be trained may be trained using the acquired first training data including image-text pairs. Here, the backbone network layer and other network layers of the target neural network can be trained respectively by using different training data, so as to further improve the training performance of the corresponding network layer.

In the case of the trained backbone network layer, the network parameter values of the backbone network layer remain unchanged. At this time, based on local self-supervision, the second training data including images can be used to perform training on other network layers included in the target neural network. training to further improve the training performance of the target neural network.

Among them, the training process of the backbone network layer, in some embodiments, can be implemented through the following steps:

Step 1. Input the first training data into the target neural network to be trained, and obtain image feature information and text feature information respectively corresponding to the image and text in the image-text pair included in the first training data;

Step 2, based on the feature similarity between the image feature information and the text feature information, determine the first loss function value;

Step 3. When the current round of training does not meet the iteration cut-off condition, adjust the network parameter value of the backbone network layer based on the first loss function value, and perform the next round of training based on the adjusted backbone network layer until the iteration is satisfied. Deadline.

Here, the first loss function value can be determined based on the feature similarity between the image feature information corresponding to the image and the text in the image-text pair and the text feature information. The smaller the first loss function value, the better the relationship between the two feature information. The closer the feature similarity is, this is also the purpose of training the backbone network layer. Inputting the first training data into the untrained target neural network may include: using the untrained target neural network to perform feature extraction on the first training data.

Wherein, the above-mentioned image-text pairs may be crawled from the Internet, and the number of crawled image-text pairs is huge. In this case, the embodiments of the present disclosure can use self-supervision technology to find more supervision information from large-scale noisy image-text pairs, so as to ensure better training performance.

In addition, the iteration cut-off condition may be that the number of iterations is sufficient, the value of the first loss function is small enough, or the condition that the first training data has been traversed, etc., and there is no limitation here.

As shown in FIG. 4 , in some embodiments, the pre-training of the target neural network may include acquiring image data 401 and text data 403 . Using the image encoder 402 to extract the image features of the image data 401, using the text encoder 404 to extract the text features of the text data 403, the inner product can be determined based on all the image features and text features to obtain a feature matrix 405. For the image data 401, the row direction of the feature matrix 405 is a classifier, and for the text data 403, the column direction of the feature matrix 405 is also a classifier. Here, in the training process, the original supervision (Original Supervision) method can be used for training. In the embodiment of the present disclosure, the pre-training of the target neural network may include acquiring at least two image data 406 and at least two text data 407 . Image features of image data 406 are extracted by image encoder 407 , text features of text data 408 are extracted by text encoder 408 , feature extraction is performed on text features, and feature queue 410 is obtained. All image features, text features and feature queues 410 can be determined by inner product respectively to obtain a feature matrix 411 . Here, in the training process, initial supervision, self-supervision, multi-view supervision and nearest-neighbor supervision can be used for training. The feature matrix 411 obtained through training can be used to further perform pre-training on the backbone network layer 412 for tasks such as target detection or target segmentation 416, thereby fixing the parameters of the backbone network layer; Contrastive Learning, SOCO) mode 413, pre-training feature pyramid network (Feature Pyramid Networks, FPN) 414 and detection head (Hesd) network layer 415.

Regarding the training process of other network layers, in some embodiments, it can be realized through the following steps:

Step 1, inputting the second training data into the target neural network to be trained, and obtaining output results of other network layers included in the target neural network;

Step 2. Determine a second loss function value based on the output result and the labeling result of the image included in the second training data;

Step 3. When the current round of training does not meet the iteration cut-off condition, adjust the network parameter values of other network layers based on the second loss function value, and perform the next round of training based on the adjusted other network layers until the iteration is satisfied. Deadline.

Here, the second loss function value can be determined based on the matching degree between the output results of other network layers and the labeling results of the images included in the second training data. The smaller the second loss function value, the closer the two results are. This is also the purpose of training other network layers. Inputting the second training data into the target neural network to be trained may include: performing feature extraction on the second training data by using an untrained target neural network.

In addition, the embodiments of the present disclosure may not label the images included in the second training data, but use self-supervision to train other network layers, so as to further improve the training performance of the target neural network.

In addition, the relevant iteration cut-off conditions here are similar to the above-mentioned conditions for training the backbone network layer, and you can refer to the above description.

Considering that the target neural network in the embodiment of the present disclosure can be obtained by training for different task characteristics, and there is a large heterogeneity problem among different tasks, based on this, the embodiment of the present disclosure provides a balanced The training method of the joint neural network of multi-task performance, this method can be realized by the network training module 103 in the embodiment of the present disclosure, in some embodiments, can include steps:

Step 1, using at least two target neural networks to extract features from the training data in the training data set, and obtain the feature information output by the backbone network layer included in each target neural network;

Step 2. Determine the loss function value of the joint neural network to be trained based on the feature information output by the backbone network layer included in each target neural network, wherein the joint neural network consists of at least two target neural networks and the backbone of each target neural network. The connection layer composition between the network layers;

Step 4: Perform at least one round of network training on the joint neural network to be trained based on the value of the loss function to obtain a trained joint neural network.

Here, the characteristic information output by the backbone network layer included in each target neural network can be combined to determine the loss function value of the joint neural network to be trained. Since there is a connection layer between the backbone network layers included in each target neural network, use this connection The layer can fuse the feature information output by the backbone network layer included in each target neural network, and further make the trained joint neural network have the task characteristics of each target neural network.

In some embodiments, one target neural network in the at least two target neural networks is used as the main neural network of the joint neural network, and other target neural networks in the at least two target neural networks are used as the secondary neural network of the joint neural network. Here, The network training module 103 can determine the loss function value of the joint neural network to be trained according to the following steps:

Step 1. Based on the first feature information output by the first backbone network layer included in the secondary neural network, the second feature information output by the second backbone network layer included in the main neural network is updated to obtain updated second feature information;

Step 2, inputting the updated second characteristic information into other network layers included in the main neural network, and obtaining output results of other network layers;

Step 3: Determine the loss function value of the joint neural network based on the output results of other network layers and the labeling results of the corresponding task of the main neural network.

Here, the loss function value of the joint neural network can be determined based on the output results of other network layers determined by the updated second feature information and the comparison results of the labeling results under the corresponding task of the main neural network.

In some embodiments, by using other network layers included in the main neural network, feature extraction is performed on the updated second feature information, so as to obtain output results of other network layers. The main neural network and the auxiliary neural network can correspond to heterogeneous tasks, for example, they can be detection tasks and classification tasks respectively.

Among them, the main neural network can be obtained by using isomorphic data training, and the isomorphic data here can point to the data that performs similar tasks. For example, the main neural network here can be used to perform the detection task 1 of pedestrian detection and for Execute detection task 2 for vehicle detection. Similarly, the sub-neural network can also be trained using isomorphic data. For example, the sub-neural network here can be used to perform classification task 1 and classification task 2 of image classification.

Among them, similar tasks can share the same hidden layers of the network, but the network near the output layer starts to fork to do different tasks, such as the above-mentioned detection task 1 and detection task 2. Different tasks (that is, heterogeneous tasks) learn some common low-level abstraction features by sharing several hidden layers at the bottom of the network, and the parameters shared by the bottom layer can be exactly the same. In addition, according to the characteristics of each task, each task can design its own task-specific layer to learn features with a higher level of abstraction. All tasks can share some related hidden layers while retaining task-specific output layers.

In joint neural network training for heterogeneous multi-task learning, each task (e.g., classification, detection) has a backbone network layer with the same size as the parameter space. Here, the feature information output by the backbone network layer of the sub-neural network corresponding to the sub-task can be fused with the feature information output by the backbone network layer of the main neural network corresponding to the main task. Here, through the auxiliary training of the main neural network by the auxiliary neural network, the trained joint neural network can have the task characteristics of multi-task, and then it can be more commonly used in various task scenarios in subsequent downstream applications.

In the above process of feature fusion, the embodiment of the present disclosure introduces a connection layer T to help feature communication between neural networks. FIG. 5 is a schematic diagram of training two target neural networks according to an embodiment of the present disclosure. The training of the two target neural networks may include a mixed share (Mixed Share) manner, and the mixed share may include three branches. The network of the first branch may include a backbone network layer (Stage) 501, a first detection task (Head1) 502 and a second detection task (Head2) 503, and the network of the second branch may include a backbone network layer 501, a first classification task ( Head3) 504 and the second classification task (Head4) 505, etc. Soft sharing includes two branches, the left branch corresponds to the main neural network, and the right branch corresponds to the secondary neural network. The main neural network may include a backbone network layer 501 and a first detection task 502 and a second detection task 503 , and the secondary neural network may include a backbone network layer 501 and a first classification task 504 and a second classification task 505 . Stage corresponds to the backbone network layer of the neural network, and Head corresponds to other network layers related to tasks. For example, Head1 and Head2 correspond to detection task 1 and detection task 2 respectively, and Head3 and Head4 correspond to classification task 1 and classification respectively. Task 2.

In the case of inputting training data into the main neural network and the sub-neural network, the stage included in the main neural network can extract the corresponding second feature information, and the stage included in the sub-neural network can extract the corresponding first feature information, through the connection layer T can realize fusion between the first feature information and the second feature information.

It should be noted that in the process of determining the loss function value of the joint neural network, the above loss function can be determined based on the second feature information output by the main neural network and the labeling results under the two tasks corresponding to the main neural network. value.

The arrangement position of the connection layer T in the embodiments of the present disclosure may be varied. As shown in Figures 6(a) to 6(c), Figure 6(a) shows that the connection layer 602 can be placed in the middle of a specific backbone network layer 601, and feature migration is performed on a feature layer of the same size, 6( b) shows that the connection layer 604 performs feature migration between backbone network layers 603 of different sizes, and Figure 6(c) shows the connection layer 606 of the connection layer 607 (block) in different backbone network layers 605 of the same size Perform fusion.

It should be noted that the training data here may be image samples related to each task, and the image samples may be training data with labeling results.

The process of training the target neural network and the joint neural network in the data processing system provided by the embodiments of the present disclosure may be obtained through upstream training. In order to expand the generalization capability in downstream business scenarios, the embodiments of the present disclosure provide a data re-characterization training method to retrain the joint neural network. In some embodiments, the network training shown in FIG. 1 can be used. The network migration module 104 connected with the module 103 is realized.

Wherein, the above-mentioned process of training the joint neural network, in some embodiments, may include the following steps:

Step 1. Based on at least two images included in the training data set, determine a codebook for decomposing each image into at least two primitives for characterizing;

Step 2. In the case of migrating the trained target neural network to a downstream business scenario, characterize the target training data collected in the target business scenario based on the obtained codebook, and obtain the represented target training data;

Step 3: retraining the joint neural network by using the represented target training data to obtain a trained joint neural network for processing the target scene data collected in the target business scene.

Here, you can first use the upstream training data to learn a codebook, use this codebook to re-represent the downstream training data, then use the re-represented downstream data to fine-tune the joint neural network, and finally use the original downstream data for final Fine-tuning, so that the generalization performance of the generated joint neural network in downstream business scenarios can be extended. In order to fully exploit the features contained in the joint neural network trained upstream and avoid the problem of information loss when migrating to the downstream, here, the codebook determined by the upstream training data can be used to re-characterize the downstream target training data to efficiently and accurately Apply the trained joint neural network to the downstream target business scenario.

In practical applications, codebook training can be performed using an adversarial network composed of paired encoders and decoders. Here, the image can be input to the untrained encoder to obtain the codebook output by the encoder; the codebook output by the encoder can be input to the untrained decoder to obtain the image output by the decoder, and then the image output by the decoder can be verified Whether the similarity with the input image is greater than the preset threshold, if not greater than the preset threshold, repeat the above process until it is greater than the preset threshold. The codebook here can be an image coding based on an adversarial network composed of an encoder and a decoder, with high accuracy.

Here, by using the trained codebook, an encoder can decompose the picture into a codebook composed of several primitives, and then the decoder can basically restore these primitives to the picture.

In this way, in the case of using the trained codebook to re-represent the target training data (corresponding to downstream data) collected in the downstream business scenario, the re-represented downstream data can be used to fine-tune the network. In this step, the pre-training can be fixed The parameters of the backbone network layer of the joint neural network only adjust the parameters of other network layers related to the task behind the backbone network layer to improve the generalization ability in task scenarios. Wherein, the above-mentioned target training data may be images, or other training data including images.

In the embodiment of the present disclosure, after the joint neural network is fine-tuned according to the above method, the original downstream data can also be used for final adjustment, so as to further improve the training performance of the joint neural network. In addition, the embodiment of the present disclosure can also retrain the trained target neural network according to the above data re-characterization method to improve the generalization performance of the target neural network. For the training process, refer to the above description.

As shown in FIG. 7 , in the initialization (Priming) process of the first step (Stage1), an encoder 702 can be used to perform training based on upstream data 701 to obtain a codebook 703 . Wherein, the codebook 703 may be re-represented into the upstream data 701 by using the decoder 704 . In the downstream image re-characterization process of the second step (Stage2), the encoder 702 can be used to determine the codebook 706 corresponding to the downstream images (Downstream Images) 705, and the decoder 704 can be used to re-characterize the codebook 706 into an output image (Transferred Images)707. In the fine-tuning process of the third step (Stage3), it can be based on the output image (Transferred Images) 707 and the pretraining model (Pretrain Model) 708. In this step, the parameters of the pretraining model 708 are fixed (that is, training), adjust the parameters in the collection network layer, detection head (Neck&Head) 709 and task loss (Task Loss) 710 related network of non-fixed parameters (that is, trainable). In the final adjustment process of the fourth step (Stage4), the parameters in the pre-training model 708, collection network layer, detection head 709 and task loss (TaskLoss) 710 related network can be further adjusted based on the downstream image 705. This step In , the parameters of the pre-training model 708 are non-fixed (that is, not trainable).

Those skilled in the art can understand that in the above methods in some embodiments, the writing order of each step does not imply a strict execution order and constitutes any limitation on the implementation process. The execution order of each step should be based on its function and possible internal Logically OK.

Based on the same technical idea, the embodiment of the present disclosure also provides a data processing method and device corresponding to the data processing system. Since the principle of solving the problem of the method and device in the embodiment of the present disclosure is similar to that of the above-mentioned data processing system in the embodiment of the present disclosure, Therefore, the implementation of the method and the device can refer to the implementation of the method.

Referring to FIG. 8 , which is a flowchart of a data processing method provided by an embodiment of the present disclosure, the method includes steps S801 to S802, wherein:

S801: Obtain a training data set and at least two network composition modules for forming a target neural network;

S802: Generate at least one target neural network based on the acquired training data set and at least two network constituent modules; each target neural network is used to perform a corresponding target task.

Here, at least one target neural network for performing a corresponding target task can be generated based on the acquired training data set and at least two network constituent modules.

For the acquisition methods of the training data set and the network component modules, refer to the relevant descriptions in the above-mentioned system embodiments, and for the method of generating the target neural network, please refer to the above-mentioned descriptions.

Here, if at least two target neural networks have been trained, at least two target neural networks can be jointly trained to obtain a trained joint neural network; the joint neural network is used for migration to downstream business scenarios to perform target tasks. For the training process of the joint neural network and the corresponding application process, please refer to the above description.

It should be noted that the execution subject of the data processing method provided by the embodiments of the present disclosure is generally an electronic device with certain computing capabilities, such as: a terminal device or a server or other processing device, and the terminal device may be a user device ( User Equipment, UE), mobile devices, cellular phones, cordless phones, personal digital assistants (Personal Digital Assistant, PDA), handheld devices, computing devices, in-vehicle devices, wearable devices, etc. In some possible implementation manners, the data processing method may be implemented by a processor invoking computer-readable instructions stored in a memory.

Referring to FIG. 9 , which is a schematic diagram of a data processing device provided by an embodiment of the present disclosure, the device includes: an acquisition module 901 and a generation module 902; wherein,

An acquisition module 901 configured to acquire a training data set and at least two network composition modules for forming a target neural network;

The generation module 902 is configured to generate at least one target neural network based on the acquired training data set and at least two network composition modules; each target neural network is used to perform a corresponding target task.

For the description of the processing flow of each module in the device and the interaction flow between the modules, reference may be made to the relevant description in the above method embodiment, and details will not be described here.

In some embodiments, the device further includes: an execution module configured to perform joint training on at least two of the target neural networks to obtain a trained joint neural network; the joint neural network is used for migration to downstream business scenarios execute the target task.

As for the device embodiment, since it basically corresponds to the method embodiment, for related parts, please refer to the part description of the method embodiment. The device embodiments described above are only illustrative, and the modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules, that is, they may be located in One place, or it can be distributed to at least two network modules. Part or all of the modules can be selected according to actual needs to achieve the purpose of the disclosed solution. It can be understood and implemented by those skilled in the art without creative effort.

The embodiment of the present disclosure also provides an electronic device, as shown in FIG. 10 , which is a schematic structural diagram of the electronic device provided by the embodiment of the present disclosure, including: a processor 1001 , a memory 1002 , and a bus 1003 . The memory 1002 stores machine-readable instructions executable by the processor 1001 (for example, execution instructions corresponding to the acquisition module 901 and the generation module 902 in the device in FIG. Communication between them through the bus 1003, when the machine-readable instructions are executed by the processor 1001, the following processing is performed:

Acquiring at least two candidate search paths associated with at least two network constituent modules, wherein each candidate search path corresponds to a combination mode, and the combination mode is used to characterize the operation relationship between each network constituent module;

performing at least one search on at least two candidate search paths by using a reinforcement learning network to obtain a return score after each search;

According to the combination method corresponding to the candidate search path whose return score meets the preset requirements, each network component module is combined to obtain the target neural network.

Embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is run by a processor, the steps of the data processing method described in the foregoing method embodiments are executed. Wherein, the computer-readable storage medium may only store the computer program corresponding to the data processing method.

A computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device, and may be a volatile storage medium or a nonvolatile storage medium. A computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer disks, hard disks, Random Access Memory (RAM), Read-Only Memory (ROM), computer Erasable Programmable Read-Only Memory (EPROM or Flash), Static Random Access Memory (SRAM), Compact Disk Read-Only Memory (CD-ROM), Digital Versatile Disk (DVD), Memory Stick, Floppy Disk, Mechanically Encoded Devices , such as a punched card with instructions stored thereon, or a raised structure in a groove, and any suitable combination of the foregoing. As used herein, computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or transmitted electrical signals.

An embodiment of the present disclosure also proposes a computer program, the computer program includes computer readable code, and when the computer readable code is read and executed by a computer, part of the method in any embodiment of the present disclosure is implemented or all steps.

Embodiments of the present disclosure also provide a computer program product, the computer program product carries a program code, and the instructions included in the program code can be used to execute the steps of the data processing method described in the above-mentioned method embodiment. For details, please refer to the above-mentioned method implementation example.

Wherein, the above-mentioned computer program product may be realized by hardware, software or a combination thereof. In an optional embodiment, the computer program product may be embodied as a computer storage medium, and in another optional embodiment, the computer program product may be embodied as a software product, such as a software development kit (Software Development Kit, SDK), etc. wait.

Those skilled in the art can clearly understand that for the convenience and brevity of description, some working processes of the above-described systems and devices can refer to the corresponding processes in the foregoing method embodiments. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. The device embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods. For example, at least two units or components can be combined Or it can be integrated into another system, or some features can be ignored, or not implemented. In another point, the mutual coupling or direct coupling or communication connection shown or discussed may be through some communication interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to at least two network units . Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.

If the functions are realized in the form of software function units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present disclosure is essentially or the part that contributes to the prior art or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make an electronic device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in various embodiments of the present disclosure. The aforementioned storage medium includes: various media capable of storing program codes such as U disk, mobile hard disk, read-only memory, random access memory, magnetic disk or optical disk.

Finally, it should be noted that: the above-described embodiments are only some implementations of the present disclosure, and are used to illustrate the technical solutions of the present disclosure, rather than to limit them. The protection scope of the present disclosure is not limited thereto, although referring to the aforementioned The embodiments have described the present disclosure in detail, and those skilled in the art should understand that any person familiar with the technical field can still modify the technical solutions described in the foregoing embodiments within the technical scope disclosed in the present disclosure Changes can be easily imagined, or equivalent replacements can be made to some of the technical features; and these modifications, changes or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and should be included in this disclosure. within the scope of protection. Therefore, the protection scope of the present disclosure should be defined by the protection scope of the claims.

Claims

A data processing system, comprising: a data collection module, a network generation module and a network training module; the data collection module, the network generation module and the network training module are sequentially connected by communication;

The data collection module is configured to obtain a training data set and at least two network composition modules for forming a target neural network;

The network generation module is configured to generate at least one target neural network based on the acquired training data set and at least two of the network composition modules; each of the target neural networks is used to perform a corresponding target task;

The network training module is configured to perform joint training on at least two of the target neural networks when at least two of the target neural networks have been trained to obtain a trained joint neural network; the joint neural network uses Perform the target tasks in the migration to downstream business scenarios.
The system according to claim 1, wherein, in the case that the training data set includes training data corresponding to the target task, the network generation module is configured to generate The target neural network for the task:

Determining at least two candidate search paths associated with at least two of the network component modules, wherein each of the candidate search paths corresponds to a combination method, and the combination method is used to characterize the relationship between each of the network component modules. operation relationship;

performing at least one search on at least two of the candidate search paths by using the training data corresponding to the target task and the reinforcement learning network, and obtaining a reward score after each search;

According to the combination manner corresponding to the candidate search paths whose reward scores meet the preset requirements, each of the network constituent modules is combined to obtain a target neural network for performing the target task.
The system according to claim 2, wherein the network generation module is configured to obtain the reward score after each search according to the following steps:

Use the reinforcement learning network to perform a first search on at least two of the candidate search paths, and based on the candidate search paths selected by the first search and the training data corresponding to the target task, determine the first The return score after the search and the selection probability of selecting the corresponding candidate search path;

Perform the following steps in a loop until the network cut-off condition is met:

Based on the return score after the n-1th search and the selection probability of selecting the corresponding candidate search path, determine the candidate search path selected by the nth search, and based on the candidate search path selected by the nth search and the corresponding The training data of the target task determines the reward score after the nth search and the selection probability of selecting the corresponding candidate search path, where n is an integer greater than 1.
The system according to claim 3, wherein the network generation module is configured to determine the return score after the nth search according to the following steps:

Constructing a candidate neural network based on the candidate search path selected by the nth search;

Based on the training data corresponding to the target task, determine the network accuracy of the candidate neural network constructed;

Based on the network accuracy of the constructed candidate neural network, the reward score after the nth search is determined.
The system according to claim 4, wherein the network generation module is configured to determine the network accuracy of the candidate neural network constructed according to the following steps:

Predicting the training data for the target task by using the constructed candidate neural network to obtain an output result of the candidate neural network;

The output result is compared with the labeling result for the training data to determine the network accuracy of the candidate neural network.
The system according to any one of claims 2 to 5, wherein the candidate search path whose return score meets the preset requirements is selected according to one of the following methods:

selecting the candidate search path with the highest score for said reward; or,

Ranking the candidate search paths corresponding to each search according to the reward scores, and selecting the candidate search paths whose ranking is higher than a preset ranking; or,

Selecting candidate search paths whose reward scores are higher than a preset threshold.
The system according to any one of claims 1 to 6, wherein the data collection module is configured to obtain the training data set according to the following steps:

Use the network input interface to obtain network data;

Perform quality assessment on the acquired network data based on the active learning network, determine network data whose data quality is higher than a preset threshold, and use the network data whose data quality is higher than the preset threshold as the training data set training data.
The system according to any one of claims 1 to 7, wherein the data collection module is configured to obtain the training data set according to the following steps:

Obtain a training data set including initial labeling results;

Extending the initial labeling result by using the knowledge graph structure to obtain the expanded labeling result;

The training data set is updated based on the expanded labeling result.
The system according to any one of claims 1 to 8, wherein the at least two network composition modules include at least a feature map extraction unit and a downsampling unit for downsampling the feature map output by the feature map extraction unit.
The system according to any one of claims 1 to 9, wherein the target neural network includes a backbone network layer for feature extraction and other network layers for feature processing; the training data set includes at least two The first training data of image-text pairs, and the second training data with at least two images; the network training module is configured to train the target neural network according to the following steps:

Using the first training data to train the backbone network layer included in the target neural network to be trained to obtain the trained backbone network layer;

Under the condition that the network parameter value of the trained backbone network layer remains unchanged, the second training data is used to train other network layers included in the target neural network to be trained to obtain other trained network layers.
The system according to claim 10, wherein the network training module is configured to obtain a trained backbone network layer according to the following steps:

Using an untrained target neural network to perform feature extraction on the first training data, to obtain image feature information and text feature information respectively corresponding to the image and text in the image-text pair included in the first training data;

determining a first loss function value based on the feature similarity between the image feature information and the text feature information;

In the case that the current round of training does not meet the iteration cut-off condition, the network parameter value of the backbone network layer is adjusted based on the first loss function value, and the next round of training is performed based on the adjusted backbone network layer until the condition is satisfied. The iteration cutoff condition.
The system according to claim 10 or 11, wherein the network training module is configured to obtain other trained network layers according to the following steps:

Using the untrained target neural network to perform feature extraction on the second training data to obtain output results of other network layers included in the target neural network;

determining a second loss function value based on the output result and the labeling result of the image included in the second training data;

When the current round of training does not meet the iteration cut-off condition, adjusting the network parameter values of the other network layers based on the second loss function value, and performing the next round of training based on the adjusted other network layers, until the iteration cut-off condition is met.
The system according to any one of claims 10 to 12, wherein the network training module is configured to obtain a trained joint neural network according to the following steps:

Using the at least two target neural networks to perform feature extraction on the training data in the training data set, to obtain the feature information output by the backbone network layer included in each target neural network;

Based on the feature information output by the backbone network layer included in each of the target neural networks, determine the loss function value of the untrained joint neural network, wherein the joint neural network consists of at least two of the target neural networks, and each The connection layer between the backbone network layers included in the target neural network;

Perform at least one round of network training on the joint neural network to be trained based on the loss function value to obtain the trained joint neural network.
The system according to claim 13, wherein one of the at least two target neural networks serves as the main neural network of the joint neural network, and the other target neural networks of the at least two target neural networks As the secondary neural network of the joint neural network; the network training module is configured to determine the loss function value of the joint neural network to be trained according to the following steps:

Based on the first feature information output by the first backbone network layer included in the secondary neural network, the second feature information output by the second backbone network layer included in the main neural network is updated to obtain updated second feature information ;

Based on the updated second feature information, determine a loss function value of the joint neural network to be trained.
The system according to claim 14, wherein the network training module is configured to determine the loss function value of the joint neural network according to the following steps:

Using other network layers included in the main neural network to perform feature extraction on the updated second feature information to obtain output results of other network layers;

The loss function value of the joint neural network is determined based on the output results of the other network layers and the labeling results of the corresponding tasks of the main neural network.
The system according to any one of claims 1 to 15, wherein, in the case where the training data set includes at least two images, the system further comprises a network migration module; the network migration module and the network training module communication connection;

The network migration module is configured to determine, based on at least two of the images, a codebook for decomposing each of the images into at least two primitives; when migrating the trained joint neural network to a downstream business scenario In the case of , based on the obtained codebook, characterize the target training data collected in the target business scenario to obtain the characterized target training data; use the characterized target training data to perform the joint neural network Train again to obtain a trained joint neural network for processing the target scene data collected in the target business scene.
The system according to claim 16, wherein the network migration module is configured to determine a codebook for decomposing each of the images into at least two primitives according to the following steps:

Repeat the following steps until the similarity between the image output by the decoder and the image input to the decoder is greater than a preset threshold:

Using an untrained encoder to encode the images in the training data set to obtain a codebook output by the encoder; using an untrained decoder to decode the codebook output by the encoder to obtain The image output by the decoder.
The system according to claim 16 or 17, wherein the network migration module is configured to obtain the trained joint neural network for processing the target scene data collected under the target business scene according to the following steps :

Under the condition that the network parameter values of the backbone network layers included in the joint neural network remain unchanged, the other network layers included in the joint neural network are retrained using the characterized target training data to obtain the trained other network layers.
A data processing method, comprising:

Obtain a training data set and at least two network building blocks for forming a target neural network;

At least one target neural network is generated based on the acquired training data set and at least two network components; each target neural network is used to perform a corresponding target task.
The method according to claim 19, wherein, where at least two of the target neural networks have been trained, the method further comprises:

Perform joint training on at least two of the target neural networks to obtain a trained joint neural network; the joint neural network is used to migrate to a downstream business scenario to perform the target task.
A data processing device, comprising:

An acquisition module configured to acquire a training data set and at least two network composition modules for forming a target neural network;

The generating module is configured to generate at least one target neural network based on the acquired training data set and at least two of the network composition modules; each of the target neural networks is used to perform a corresponding target task.
A data processing device, wherein, in the case of having trained at least two of the trained multiple target neural networks, the device further includes:

The execution module is configured to jointly train at least two of the target neural networks to obtain a trained joint neural network; the joint neural network is used to migrate to a downstream business scenario to perform the target task.
An electronic device, comprising: a processor, a memory, and a bus, the memory stores machine-readable instructions executable by the processor, and when the electronic device is running, the processor communicates with the memory through the bus , when the machine-readable instructions are executed by the processor, the steps of the data processing method according to claim 19 or 20 are executed.
A computer-readable storage medium, on which a computer program is stored, and when the computer program is run by a processor, the steps of the data processing method as claimed in claim 19 or 20 are executed.
A computer program comprising computer readable code, executed by a processor in the device for implementing the method of claim 19 or 20, when the computer readable code is run on the device.
A computer program product configured to store computer readable instructions which, when executed, cause a computer to perform the method of claim 19 or 20.