
CN115906935A - Parallel differentiable neural network architecture searching method - Google Patents

Parallel differentiable neural network architecture searching method

Info

Publication number
CN115906935A
Authority
CN
China
Prior art keywords
network
neural network
units
unit
basic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211299553.2A
Other languages
Chinese (zh)
Other versions
CN115906935B (en)
Inventor
张秀伟
王文娜
尹翰林
邢颖慧
崔恒飞
张艳宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202211299553.2A priority Critical patent/CN115906935B/en
Publication of CN115906935A publication Critical patent/CN115906935A/en
Application granted granted Critical
Publication of CN115906935B publication Critical patent/CN115906935B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Analysis (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a parallel differentiable neural network architecture search method. The method first constructs a dual-path super network with binary gates; then relaxes the search space to be continuous using a sigmoid function; then optimizes the super network by gradient descent to obtain the optimal basic units, comprising a normal unit and a reduction unit; and finally stacks the obtained basic units into the required deep neural network, which is retrained until it converges. By designing a fast, parallel differentiable neural network architecture search method, the speed and performance of neural network architecture search are significantly improved.

Description

Parallel differentiable neural network architecture searching method
Technical Field
The invention belongs to the technical field of deep learning, and particularly relates to a parallel differentiable neural network architecture searching method.
Background
The rapid development of deep learning has established its dominant position in the field of artificial intelligence. Thanks to the diligent efforts of researchers, the performance of deep neural networks keeps improving. However, manually designing a neural network requires a continuous trial-and-error process and relies heavily on the design experience of experts, so creating neural network structures by hand is time-consuming and resource-intensive. To reduce manpower and cost, Neural Network Architecture Search (NAS) techniques have been proposed. NAS automatically searches for a neural network architecture by means of an algorithm to meet the requirements of different tasks, and has become a research hotspot in the field of automated machine learning.
The core of a NAS method is to construct a huge search space and then use an efficient search algorithm to explore that space, finding the optimal architecture under given training data and constraints. Early work was mainly based on reinforcement learning and evolutionary algorithms, which showed great potential in finding high-performance neural network architectures. However, NAS methods based on reinforcement learning and evolutionary algorithms usually carry a heavy computational burden, which seriously hinders the wide application and study of NAS. To reduce this burden, weight-sharing algorithms have been proposed: they formulate the search space as an over-parameterized super network and evaluate sampled architectures without additional optimization. By sharing weights, NAS is sped up by several orders of magnitude.
One particular type of weight-sharing method is the differentiable neural architecture search technique proposed in the document "DARTS: Differentiable architecture search". This technique first defines the search space as a super network stacked from basic units (a normal unit and a reduction unit) and finds the optimal neural network architecture by searching these basic units. DARTS then relaxes the discrete choice of operations into a weighted sum over a fixed set of operations, so the super network can be trained by a gradient-based bi-level optimization method. This gives NAS greater potential to explore optimal network architectures in a large architecture search space. Nevertheless, the prior art still has limitations: it must bear the huge computational cost brought by the enormous super network and its redundant search space, which restricts the wide application and study of NAS.
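For context, the DARTS relaxation described above (a softmax-weighted sum over a fixed set of operations on each edge) can be sketched as follows. This is a minimal PyTorch illustration under assumed names, not the patented method; the toy candidate operations are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """DARTS-style edge: a softmax-weighted sum over all candidate operations."""
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)                     # candidate operations o in O
        self.alpha = nn.Parameter(torch.zeros(len(ops)))  # architecture parameters for this edge

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)            # relax the discrete choice to continuous weights
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# Example: one edge with three toy candidate operations on 16-channel feature maps.
edge = MixedOp([nn.Conv2d(16, 16, 3, padding=1),
                nn.MaxPool2d(3, stride=1, padding=1),
                nn.Identity()])
y = edge(torch.randn(2, 16, 32, 32))
```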
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a parallel differentiable neural network architecture search method. The method first constructs a dual-path super network with binary gates; then relaxes the search space to be continuous using a sigmoid function; then optimizes the super network by gradient descent to obtain the optimal basic units, comprising a normal unit and a reduction unit; and finally stacks the obtained basic units into the required deep neural network, which is retrained until it converges. By designing a fast, parallel differentiable neural network architecture search method, the speed and performance of neural network architecture search are significantly improved.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
Step 1: constructing a dual-path super network with binary gates;
the super network is formed by stacking L basic units;
the basic units comprise a normal unit and a reduction unit; both are directed acyclic graphs of 7 nodes, comprising 2 input nodes, 4 intermediate nodes and 1 output node, where an edge between two nodes represents a candidate operation, and the connection relationships between nodes differ between the normal unit and the reduction unit;
Step 1-1: let the operation pool be O, containing 8 basic operators: sep-conv-3×3, sep-conv-5×5, dil-conv-3×3, dil-conv-5×5, max-pool-3×3, avg-pool-3×3, skip-connection and none;
the operation pool O is decomposed into two operator subsets O by random sampling 1 And O 2 In which O is 1 And O 2 Satisfy | O 1 |=|O 2 |,|O 1 |+|O 2 I = O and
Figure BDA0003903563870000021
O 1 and O 2 Respectively used for constructing two sub-networks;
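A minimal sketch of how the operation pool might be split into two equal, disjoint operator subsets by random sampling, as described in step 1-1; the operator strings mirror the list above and the helper name is an assumption.

```python
import random

OPERATION_POOL = ["sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3", "dil_conv_5x5",
                  "max_pool_3x3", "avg_pool_3x3", "skip_connect", "none"]

def split_operation_pool(pool, seed=None):
    """Randomly split the pool into two disjoint halves O1 and O2 of equal size."""
    rng = random.Random(seed)
    shuffled = pool[:]          # copy so the original pool is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

O1, O2 = split_operation_pool(OPERATION_POOL)
assert len(O1) == len(O2) and not set(O1) & set(O2)   # |O1| = |O2|, O1 ∩ O2 = ∅
```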
two groups of channels are sampled from the input channels and adopted by the two sub-networks respectively, and the two sub-networks are finally merged into one by an addition operation; for two different nodes x_i and x_j in a basic unit of the super network, the information propagation from x_i to x_j is described as:
f^1_{i,j}(x_i) = Σ_{o∈O1} α^1_o · o(M^1_{i,j} * x_i) + (1 − M^1_{i,j}) * x_i
f^2_{i,j}(x_i) = Σ_{o∈O2} α^2_o · o(M^2_{i,j} * x_i) + (1 − M^2_{i,j}) * x_i
x_j = Σ_{i<j} ( f^1_{i,j}(x_i) + f^2_{i,j}(x_i) )
where x_i and x_j denote different nodes with 0 ≤ i < j ≤ 5; α^1_o and α^2_o denote the weights of the different operations in O1 and O2 respectively; M^1_{i,j} and M^2_{i,j} are two groups of channel sampling masks consisting only of 0s and 1s; M^k_{i,j} * x_i and (1 − M^k_{i,j}) * x_i denote the selected and unselected channels respectively; the two groups of selected channels M^1_{i,j} * x_i and M^2_{i,j} * x_i are adopted by the two operator subsets simultaneously;
the super network thus covers all candidate architectures in the form of two parallel paths;
Step 1-2: during training of the super network, binary gates are used to selectively activate each path to participate in training;
for two nodes x_i and x_j in a basic unit, the binary-gated super network is described as:
x_j = Σ_{i<j} ( gate1 · f^1_{i,j}(x_i) + gate2 · f^2_{i,j}(x_i) )
where gate1 and gate2 each take the value 0 or 1, excluding the case in which gate1 and gate2 are both 0; the binary gates take their values by random sampling, so that the corresponding paths are selectively activated to participate in training;
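A small sketch of the binary gating in step 1-2: (gate1, gate2) is drawn from the valid combinations, excluding (0, 0). Uniform sampling over the three remaining combinations is an assumption; the patent only states that the gates are set by random sampling.

```python
import random

def sample_binary_gates(rng=random):
    """Sample (gate1, gate2) in {0, 1} x {0, 1}, excluding the all-zero case."""
    gate1, gate2 = rng.choice([(0, 1), (1, 0), (1, 1)])
    return gate1, gate2

# During each training step, only the activated path(s) are computed,
# which is what reduces memory consumption.
gate1, gate2 = sample_binary_gates()
```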
Step 2: relaxing the search space to be continuous with a sigmoid function and redefining the two sub-networks as:
f^1_{i,j}(x_i) = Σ_{o∈O1} δ(α^1_o) · o(M^1_{i,j} * x_i) + (1 − M^1_{i,j}) * x_i
f^2_{i,j}(x_i) = Σ_{o∈O2} δ(α^2_o) · o(M^2_{i,j} * x_i) + (1 − M^2_{i,j}) * x_i
where δ(·) denotes the sigmoid function, calculated as:
δ(x) = 1 / (1 + e^(−x))
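To illustrate why step 2 uses the sigmoid: softmax normalizes each operator subset on its own, so the resulting weights of the two subsets are not directly comparable, whereas the sigmoid scores every operator independently. A hedged PyTorch sketch with illustrative values:

```python
import torch
import torch.nn.functional as F

alpha1 = torch.tensor([1.2, -0.3, 0.5, 0.1])   # architecture parameters for O1 (illustrative values)
alpha2 = torch.tensor([0.8,  0.9, 0.2, -1.0])  # architecture parameters for O2

# Softmax: each subset is normalized separately, so a weight of 0.4 in O1
# and 0.4 in O2 need not reflect the same operator strength.
w1_softmax, w2_softmax = F.softmax(alpha1, dim=0), F.softmax(alpha2, dim=0)

# Sigmoid: every operator gets an independent score in (0, 1),
# so operators from the two subsets can be compared on a common scale.
w1_sigmoid, w2_sigmoid = torch.sigmoid(alpha1), torch.sigmoid(alpha2)
```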
Step 3: optimizing the super network by gradient descent to obtain the optimal basic units, namely a normal unit and a reduction unit;
the optimal α is found by jointly optimizing the network weights w and the architecture parameters α, which determines the optimal basic units:
min_α L_val(w*(α), α)
s.t. w*(α) = argmin_w L_train(w, α)
where L_train is the training loss and L_val is the validation loss; cross-entropy loss is adopted for both;
after obtaining the architecture parameters α, the strongest operation on each edge is taken according to a one-hot selection:
o^(i,j) = argmax_{o∈O} α_o^(i,j)
and for each intermediate node of the basic unit, the two operations with the largest α values are selected as its inputs;
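A schematic sketch of the joint optimization in step 3 and the subsequent discretization. The first-order alternating scheme, the weight_parameters/arch_parameters accessors and the data loaders are assumptions for illustration, not the exact procedure of the invention.

```python
import torch

def search(supernet, train_loader, val_loader, epochs=50, lr_w=0.025, lr_alpha=3e-4):
    """Alternately update network weights w on the training set and
    architecture parameters alpha on the validation set (first-order scheme)."""
    # weight_parameters() / arch_parameters() are assumed accessors on the super network.
    w_opt = torch.optim.SGD(supernet.weight_parameters(), lr=lr_w, momentum=0.9)
    a_opt = torch.optim.Adam(supernet.arch_parameters(), lr=lr_alpha)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for (x_tr, y_tr), (x_val, y_val) in zip(train_loader, val_loader):
            # alpha step: minimize the validation loss L_val
            a_opt.zero_grad()
            loss_fn(supernet(x_val), y_val).backward()
            a_opt.step()
            # w step: minimize the training loss L_train
            w_opt.zero_grad()
            loss_fn(supernet(x_tr), y_tr).backward()
            w_opt.step()

def top2_inputs(alphas_per_edge):
    """For one intermediate node, keep the two incoming edges whose strongest
    operation has the largest alpha value (DARTS-style discretization).
    alphas_per_edge: list over incoming edges, each a list of per-operation alphas."""
    ranked = sorted(((max(a), i, a.index(max(a))) for i, a in enumerate(alphas_per_edge)),
                    reverse=True)
    return ranked[:2]   # [(alpha value, edge index, operation index), ...]
```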
Step 4: stacking the basic units obtained in step 3 to obtain the required deep neural network, and retraining this network until it converges.
Preferably, the super network optimized in step 3 is formed by stacking 6 normal units and 2 reduction units, where the 2 reduction units are located at 1/3 and 2/3 of the total network depth respectively.
Preferably, the deep neural network required in step 4 is a deep neural network for CIFAR-10, formed by stacking 20 basic units, comprising 2 reduction units and 18 normal units.
Preferably, the deep neural network required in step 4 is a deep neural network for ImageNet, formed by stacking 12 basic units, comprising 2 reduction units and 12 normal units, where the 2 reduction units are located at 1/3 and 2/3 of the total network depth respectively.
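A minimal sketch of the stacking rule used in the preferred configurations above, placing the reduction units at 1/3 and 2/3 of the depth; the cell factories are placeholders.

```python
def build_stack(num_cells, make_normal, make_reduction):
    """Place reduction cells at 1/3 and 2/3 of the depth, normal cells elsewhere."""
    reduction_positions = {num_cells // 3, 2 * num_cells // 3}
    return [make_reduction() if i in reduction_positions else make_normal()
            for i in range(num_cells)]

# e.g. 8 cells (6 normal + 2 reduction) for the search super network,
# or 20 cells (18 normal + 2 reduction) for the CIFAR-10 evaluation network.
search_cells = build_stack(8,  lambda: "normal", lambda: "reduction")
eval_cells   = build_stack(20, lambda: "normal", lambda: "reduction")
```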
The invention has the following beneficial effects:
the invention provides a rapid and parallel differentiable neural network architecture searching technology, which reduces the memory consumption in the training process and improves the neural network architecture searching speed by constructing a dual-path super network with a binary gate. At the same time, considering that softmax is used to select the best input for the intermediate nodes of the two operator subsets, unfair problems may be encountered. In order to solve the problem, a sigmoid function is introduced, and the performance of each operation operator is measured under the condition of no normalization, so that the performance of the neural network architecture search technology is ensured. The invention obviously improves the speed and the performance of the neural network architecture search.
Drawings
Fig. 1 is a diagram of implementation steps of a fast parallel search method for a differentiable neural network architecture according to the present invention.
Fig. 2 illustrates the search method of the present invention, taking one basic unit as an example.
FIG. 3 is a structural diagram of a basic unit searched on CIFAR-10 according to the present invention: wherein (a) is a normal unit and (b) is a reduction unit.
FIG. 4 is a diagram of the structure of the basic unit searched on ImageNet in the present invention: wherein (a) is a normal unit and (b) is a reduction unit.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
A parallel differentiable neural network architecture searching method comprises the following steps:
Step 1: constructing a dual-path super network with binary gates;
the super network is formed by stacking L basic units;
the basic units comprise a normal unit and a reduction unit; both are directed acyclic graphs of 7 nodes, comprising 2 input nodes, 4 intermediate nodes and 1 output node, where an edge between two nodes represents a candidate operation, and the connection relationships between nodes differ between the normal unit and the reduction unit;
Step 1-1: let the operation pool be O, containing 8 basic operators: sep-conv-3×3, sep-conv-5×5, dil-conv-3×3, dil-conv-5×5, max-pool-3×3, avg-pool-3×3, skip-connection and none;
the operation pool O is decomposed into two operator subsets O by random sampling 1 And O 2 In which O is 1 And O 2 Satisfy | O 1 |=|O 2 |,|O 1 |+|O 2 I = O and
Figure BDA0003903563870000051
O 1 and O 2 Respectively used for constructing two sub-networks;
two groups of channels are sampled from the input channels and adopted by the two sub-networks respectively, and the two sub-networks are finally merged into one by an addition operation; for two different nodes x_i and x_j in a basic unit of the super network, the information propagation from x_i to x_j is described as:
f^1_{i,j}(x_i) = Σ_{o∈O1} α^1_o · o(M^1_{i,j} * x_i) + (1 − M^1_{i,j}) * x_i
f^2_{i,j}(x_i) = Σ_{o∈O2} α^2_o · o(M^2_{i,j} * x_i) + (1 − M^2_{i,j}) * x_i
x_j = Σ_{i<j} ( f^1_{i,j}(x_i) + f^2_{i,j}(x_i) )
where x_i and x_j denote different nodes with 0 ≤ i < j ≤ 5; α^1_o and α^2_o denote the weights of the different operations in O1 and O2 respectively; M^1_{i,j} and M^2_{i,j} are two groups of channel sampling masks consisting only of 0s and 1s; M^k_{i,j} * x_i and (1 − M^k_{i,j}) * x_i denote the selected and unselected channels respectively; the two groups of selected channels M^1_{i,j} * x_i and M^2_{i,j} * x_i are adopted by the two operator subsets simultaneously;
the super network thus covers all candidate architectures in the form of two parallel paths;
Step 1-2: during training of the super network, binary gates are used to selectively activate each path to participate in training;
for two nodes x_i and x_j in a basic unit, the binary-gated super network is described as:
x_j = Σ_{i<j} ( gate1 · f^1_{i,j}(x_i) + gate2 · f^2_{i,j}(x_i) )
where gate1 and gate2 each take the value 0 or 1, excluding the case in which gate1 and gate2 are both 0; the binary gates take their values by random sampling, so that the corresponding paths are selectively activated to participate in training;
Step 2: relaxing the search space to be continuous with a sigmoid function and redefining the two sub-networks as:
f^1_{i,j}(x_i) = Σ_{o∈O1} δ(α^1_o) · o(M^1_{i,j} * x_i) + (1 − M^1_{i,j}) * x_i
f^2_{i,j}(x_i) = Σ_{o∈O2} δ(α^2_o) · o(M^2_{i,j} * x_i) + (1 − M^2_{i,j}) * x_i
where δ(·) denotes the sigmoid function, calculated as:
δ(x) = 1 / (1 + e^(−x))
the super network optimization process adopts a network formed by stacking 6 normal units and 2 reduction units, where the 2 reduction units are located at 1/3 and 2/3 of the total network depth respectively;
Step 3: optimizing the super network by gradient descent to obtain the optimal basic units, namely a normal unit and a reduction unit;
the optimal α is found by jointly optimizing the network weights w and the architecture parameters α, which determines the optimal basic units:
min_α L_val(w*(α), α)
s.t. w*(α) = argmin_w L_train(w, α)
where L_train is the training loss and L_val is the validation loss; cross-entropy loss is adopted for both;
after obtaining the architecture parameters α, the strongest operation on each edge is taken according to a one-hot selection:
o^(i,j) = argmax_{o∈O} α_o^(i,j)
and for each intermediate node of the basic unit, the two operations with the largest α values are selected as its inputs;
Step 4: stacking the basic units obtained in step 3 to obtain the required deep neural network, and retraining this network until it converges.
The deep neural network for CIFAR-10 is formed by stacking 20 basic units, comprising 2 reduction units and 18 normal units.
The deep neural network for ImageNet is formed by stacking 12 basic units, comprising 2 reduction units and 12 normal units, where the 2 reduction units are located at 1/3 and 2/3 of the total network depth respectively.
The specific embodiment is as follows:
the fast parallel differentiable neural network architecture searching method of the embodiment specifically comprises the following steps:
S1: constructing a super network, namely a dual-path super network with binary gates;
S2: relaxing the search space to be continuous using a sigmoid function;
S3: optimizing the super network by gradient descent to obtain the optimal basic units (a normal unit and a reduction unit);
S4: stacking the basic units obtained in S3 to obtain the required deep neural network, and retraining it until convergence.
By adopting this technical scheme, the memory consumption of architecture search is reduced by constructing a super network containing two parallel paths and a gating operation, so the search is accelerated. Using softmax to select the best inputs for the intermediate nodes across the two operator subsets may run into an unfairness problem. To solve this problem, a sigmoid function is introduced to relax the search space to be continuous; the sigmoid measures the strength of each operation without normalization. The invention significantly improves both the speed and the performance of neural network architecture search. In step S1, the super network is formed by stacking L basic units. The basic units comprise a normal unit and a reduction unit. Both are directed acyclic graphs of 7 nodes, comprising 2 input nodes, 4 intermediate nodes and 1 output node, where an edge between two nodes represents a candidate operation (e.g. a 3×3 convolution). The reduction unit uses convolutions with stride 2, so the spatial resolution of the feature map is halved. To increase the search speed, the invention constructs a dual-path super network with binary gates. The specific construction process of this dual-path super network is as follows:
s11: assuming that the whole operation pool is O, the operation pool O contains 8 basic operation operators, which are: sep-conv-3X 3, sep-conv-5X 5, dil-conv-3X 3, dil-conv-5X 5, max-pool-3X 3, avg-pool-3X 3, identity (skip-connection) and none. The operation pool O is decomposed into two smaller operator subsets O 1 And O 2 In which O is 1 And O 2 Satisfy | O 1 |=|O 2 |,|O 1 |+|O 2 I = O and
Figure BDA0003903563870000071
O 1 and O 2 Respectively for constructing two smaller sub-networks. In order to reduce the computational burden and to make the search space cover all possible architectures, the invention designs a dual-path super network with binary gates. First, the present invention employs a partial connection strategy. In particular, two groups of channels are sampled from the whole input channel, which are respectively employed by the two sub-networks. The two sub-networks are combined into one by addition, so that the super-network appears in a parallel fashion, as node x i To node x j For example, the super network may be described as follows:
f^1_{i,j}(x_i) = Σ_{o∈O1} α^1_o · o(M^1_{i,j} * x_i) + (1 − M^1_{i,j}) * x_i
f^2_{i,j}(x_i) = Σ_{o∈O2} α^2_o · o(M^2_{i,j} * x_i) + (1 − M^2_{i,j}) * x_i
x_j = Σ_{i<j} ( f^1_{i,j}(x_i) + f^2_{i,j}(x_i) )
where x_i and x_j denote different nodes with 0 ≤ i < j ≤ 5; α^1_o and α^2_o denote the weights of the different operations in O1 and O2 respectively; M^1_{i,j} and M^2_{i,j} are two groups of channel sampling masks consisting only of 0s and 1s; M^k_{i,j} * x_i and (1 − M^k_{i,j}) * x_i denote the selected and unselected channels respectively; and the two groups of selected channels are adopted by the two operator subsets simultaneously. This design brings an intuitive advantage: the super network can cover all possible architectures in the form of two parallel paths;
s12: in the process of training the super network, each path is selectively activated to participate in training by using binary gating. With node x i To node x j For example, the super network may be described as:
Figure BDA0003903563870000081
wherein the values of gate1 and gate2 are 0 or 1. In the actual operation, the case where gate1=0 and gate2=0 is excluded. Binary gating operation is performed in a random sampling mode to selectively activate corresponding paths to participate in training, so that the memory cost is greatly reduced.
In step S2, the sigmoid function is adopted to relax the search space to be continuous, and the two sub-networks are redefined as:
f^1_{i,j}(x_i) = Σ_{o∈O1} δ(α^1_o) · o(M^1_{i,j} * x_i) + (1 − M^1_{i,j}) * x_i
f^2_{i,j}(x_i) = Σ_{o∈O2} δ(α^2_o) · o(M^2_{i,j} * x_i) + (1 − M^2_{i,j}) * x_i
where δ(·) denotes the sigmoid function, calculated as:
δ(x) = 1 / (1 + e^(−x))
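Putting S11, S12 and S2 together, the sketch below shows one edge of the dual-path super network: two channel groups are sampled, each is processed by one operator subset with sigmoid-weighted operations, the unselected channels bypass the operators, and binary gates decide which paths contribute. The class, the channel sampling ratio and the way the two paths are merged are illustrative assumptions rather than the patent's exact implementation.

```python
import random
import torch
import torch.nn as nn

class DualPathEdge(nn.Module):
    """One edge of a dual-path super network with partial channels and binary gates (sketch)."""
    def __init__(self, ops1, ops2, channels, sample_ratio=0.25):
        super().__init__()
        self.ops1, self.ops2 = nn.ModuleList(ops1), nn.ModuleList(ops2)   # operator subsets O1, O2
        self.alpha1 = nn.Parameter(1e-3 * torch.randn(len(ops1)))         # architecture params for O1
        self.alpha2 = nn.Parameter(1e-3 * torch.randn(len(ops2)))         # architecture params for O2
        k = max(1, int(channels * sample_ratio))                          # assumed sampling ratio
        perm = torch.randperm(channels)
        mask = torch.zeros(2, channels)
        mask[0, perm[:k]] = 1.0            # M1: channels routed to the O1 path
        mask[1, perm[k:2 * k]] = 1.0       # M2: channels routed to the O2 path
        self.register_buffer("masks", mask)

    def _path(self, x, ops, alpha, mask):
        m = mask.view(1, -1, 1, 1)
        selected = x * m                                                   # selected channels go through the operators
        mixed = sum(torch.sigmoid(a) * op(selected) for a, op in zip(alpha, ops))
        return mixed + x * (1.0 - m)                                       # unselected channels bypass

    def forward(self, x):
        gate1, gate2 = random.choice([(0, 1), (1, 0), (1, 1)])             # never both zero
        out = 0.0
        if gate1:
            out = out + self._path(x, self.ops1, self.alpha1, self.masks[0])
        if gate2:
            out = out + self._path(x, self.ops2, self.alpha2, self.masks[1])
        return out

# Example usage with two toy operator subsets on 16-channel feature maps:
C = 16
ops1 = [nn.Conv2d(C, C, 3, padding=1), nn.Identity()]
ops2 = [nn.MaxPool2d(3, stride=1, padding=1), nn.Conv2d(C, C, 5, padding=2)]
edge = DualPathEdge(ops1, ops2, channels=C)
y = edge(torch.randn(2, C, 32, 32))
```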
In step S3, the super network is optimized: the optimal α is found by jointly optimizing the network weights w and the architecture parameters α, which determines the optimal basic units:
min_α L_val(w*(α), α)
s.t. w*(α) = argmin_w L_train(w, α)
where L_train is the training loss and L_val is the validation loss; cross-entropy loss is adopted for both. After obtaining the architecture parameters α, the strongest operation on each edge is taken according to a one-hot selection:
o^(i,j) = argmax_{o∈O} α_o^(i,j)
and for each intermediate node of the basic unit, the two operations with the largest α values are selected as its inputs.
In the super network optimization process of step S3, a network formed by stacking 6 normal units and 2 reduction units is adopted, where the 2 reduction units are located at 1/3 and 2/3 of the total network depth respectively.
In step S4, the deep neural network for CIFAR-10 is formed by stacking 20 basic units (2 reduction units and 18 normal units), and the deep neural network for ImageNet is constructed by stacking 12 basic units (2 reduction units and 12 normal units), where the 2 reduction units are located at 1/3 and 2/3 of the total network depth respectively.
The deep neural network constructed in step S4 is evaluated on the corresponding dataset to test its performance. On CIFAR-10, a classification accuracy of 97.47% is achieved using only 0.08 GPU-days. Compared with the result of the reference "DARTS" (97.24% accuracy in 1 GPU-day), both the search speed and the network performance are greatly improved; the search speed is 12.5 times higher. Moreover, because the search is fast, the method supports searching directly on ImageNet: using only 2.44 GPU-days on ImageNet, it achieves 76.1% Top-1 and 92.8% Top-5 classification accuracy.

Claims (4)

1. A parallel differentiable neural network architecture searching method is characterized by comprising the following steps:
Step 1: constructing a dual-path super network with binary gates;
the super network is formed by stacking L basic units;
the basic units comprise a normal unit and a reduction unit; both are directed acyclic graphs of 7 nodes, comprising 2 input nodes, 4 intermediate nodes and 1 output node, where an edge between two nodes represents a candidate operation, and the connection relationships between nodes differ between the normal unit and the reduction unit;
Step 1-1: let the operation pool be O, containing 8 basic operators: sep-conv-3×3, sep-conv-5×5, dil-conv-3×3, dil-conv-5×5, max-pool-3×3, avg-pool-3×3, skip-connection and none;
the operation pool O is decomposed into two operator subsets O by random sampling 1 And O 2 In which O is 1 And O 2 Satisfy | O 1 |=|O 2 |,|O 1 |+|O 2 I = O and
Figure FDA00039035638600000113
O 1 and O 2 Respectively used for constructing two sub-networks;
two groups of channels are sampled from the input channels and adopted by the two sub-networks respectively, and the two sub-networks are finally merged into one by an addition operation; for two different nodes x_i and x_j in a basic unit of the super network, the information propagation from x_i to x_j is described as:
f^1_{i,j}(x_i) = Σ_{o∈O1} α^1_o · o(M^1_{i,j} * x_i) + (1 − M^1_{i,j}) * x_i
f^2_{i,j}(x_i) = Σ_{o∈O2} α^2_o · o(M^2_{i,j} * x_i) + (1 − M^2_{i,j}) * x_i
x_j = Σ_{i<j} ( f^1_{i,j}(x_i) + f^2_{i,j}(x_i) )
where x_i and x_j denote different nodes with 0 ≤ i < j ≤ 5; α^1_o and α^2_o denote the weights of the different operations in O1 and O2 respectively; M^1_{i,j} and M^2_{i,j} are two groups of channel sampling masks consisting only of 0s and 1s; M^k_{i,j} * x_i and (1 − M^k_{i,j}) * x_i denote the selected and unselected channels respectively; the two groups of selected channels M^1_{i,j} * x_i and M^2_{i,j} * x_i are adopted by the two operator subsets simultaneously;
the super network thus covers all candidate architectures in the form of two parallel paths;
Step 1-2: during training of the super network, binary gates are used to selectively activate each path to participate in training;
for two nodes x_i and x_j in a basic unit, the binary-gated super network is described as:
x_j = Σ_{i<j} ( gate1 · f^1_{i,j}(x_i) + gate2 · f^2_{i,j}(x_i) )
where gate1 and gate2 each take the value 0 or 1, excluding the case in which gate1 and gate2 are both 0; the binary gates take their values by random sampling, so that the corresponding paths are selectively activated to participate in training;
Step 2: relaxing the search space to be continuous with a sigmoid function and redefining the two sub-networks as:
f^1_{i,j}(x_i) = Σ_{o∈O1} δ(α^1_o) · o(M^1_{i,j} * x_i) + (1 − M^1_{i,j}) * x_i
f^2_{i,j}(x_i) = Σ_{o∈O2} δ(α^2_o) · o(M^2_{i,j} * x_i) + (1 − M^2_{i,j}) * x_i
where δ(·) denotes the sigmoid function, calculated as:
δ(x) = 1 / (1 + e^(−x))
Step 3: optimizing the super network by gradient descent to obtain the optimal basic units, namely a normal unit and a reduction unit;
the optimal α is found by jointly optimizing the network weights w and the architecture parameters α, which determines the optimal basic units:
min_α L_val(w*(α), α)
s.t. w*(α) = argmin_w L_train(w, α)
where L_train is the training loss and L_val is the validation loss; cross-entropy loss is adopted for both;
after obtaining the architecture parameters α, the strongest operation on each edge is taken according to a one-hot selection:
o^(i,j) = argmax_{o∈O} α_o^(i,j)
and for each intermediate node of the basic unit, the two operations with the largest α values are selected as its inputs;
Step 4: stacking the basic units obtained in step 3 to obtain the required deep neural network, and retraining this network until it converges.
2. The parallel differentiable neural network architecture searching method according to claim 1, wherein the super network optimized in step 3 is formed by stacking 6 normal units and 2 reduction units, and the 2 reduction units are located at 1/3 and 2/3 of the total network depth respectively.
3. The method according to claim 1, wherein the deep neural network required in step 4 is a deep neural network for CIFAR-10, formed by stacking 20 basic units, comprising 2 reduction units and 18 normal units.
4. The parallel differentiable neural network architecture searching method according to claim 1, wherein the deep neural network required in step 4 is a deep neural network for ImageNet, formed by stacking 12 basic units, comprising 2 reduction units and 12 normal units, and the 2 reduction units are located at 1/3 and 2/3 of the total network depth respectively.
CN202211299553.2A 2022-10-23 2022-10-23 Parallel differentiable neural network architecture searching method Active CN115906935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211299553.2A CN115906935B (en) 2022-10-23 2022-10-23 Parallel differentiable neural network architecture searching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211299553.2A CN115906935B (en) 2022-10-23 2022-10-23 Parallel differentiable neural network architecture searching method

Publications (2)

Publication Number Publication Date
CN115906935A true CN115906935A (en) 2023-04-04
CN115906935B CN115906935B (en) 2024-10-29

Family

ID=86490625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211299553.2A Active CN115906935B (en) 2022-10-23 2022-10-23 Parallel differentiable neural network architecture searching method

Country Status (1)

Country Link
CN (1) CN115906935B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117953296A (en) * 2024-02-01 2024-04-30 华东交通大学 Neural network architecture searching method for remote sensing image classification

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361680A (en) * 2020-03-05 2021-09-07 华为技术有限公司 Neural network architecture searching method, device, equipment and medium
CN114359109A (en) * 2022-01-12 2022-04-15 西北工业大学 Twin network image denoising method, system, medium and device based on Transformer
WO2022121100A1 (en) * 2020-12-11 2022-06-16 华中科技大学 Darts network-based multi-modal medical image fusion method

Also Published As

Publication number Publication date
CN115906935B (en) 2024-10-29

Similar Documents

Publication Publication Date Title
CN110473592A (en) The multi-angle of view mankind for having supervision based on figure convolutional network cooperate with lethal gene prediction technique
CN115906935A (en) Parallel differentiable neural network architecture searching method
CN106647272A (en) Robot route planning method by employing improved convolutional neural network based on K mean value
CN111738477A (en) Deep feature combination-based power grid new energy consumption capability prediction method
CN113297429A (en) Social network link prediction method based on neural network architecture search
CN109800517A (en) Improved reverse modeling method for magnetorheological damper
CN110110447B (en) Method for predicting thickness of strip steel of mixed frog leaping feedback extreme learning machine
CN110972174A (en) Wireless network interruption detection method based on sparse self-encoder
CN113095479B (en) Multi-scale attention mechanism-based extraction method for ice underlying structure
CN117633512A (en) Reservoir porosity prediction method based on one-dimensional convolution gating circulation network
Tahmasebi et al. Comparison of optimized neural network with fuzzy logic for ore grade estimation
CN107256453B (en) Capillary quality forecasting method based on improved ELM algorithm
CN114897139B (en) Bearing fault diagnosis method for ordered stable simplified sparse quantum neural network
CN115953902A (en) Traffic flow prediction method based on multi-view space-time diagram convolution network
CN115062759A (en) Fault diagnosis method based on improved long and short memory neural network
CN115438784A (en) Sufficient training method for hybrid bit width hyper-network
CN116206304A (en) Tomato leaf disease identification method based on improved convolutional neural network
CN112801264B (en) Dynamic differentiable space architecture searching method and system
Hu et al. A classification surrogate model based evolutionary algorithm for neural network structure learning
Zhao et al. Optimizing radial basis probabilistic neural networks using recursive orthogonal least squares algorithms combined with micro-genetic algorithms
CN112860882A (en) Book concept front-rear order relation extraction method based on neural network
CN112348275A (en) Regional ecological environment change prediction method based on online incremental learning
CN118196600B (en) Neural architecture searching method and system based on differential evolution algorithm
Guo et al. Extracting fuzzy rules based on fusion of soft computing in oil exploration management
Li et al. Lasso regression based channel pruning for efficient object detection model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant