Fine-tuning | All parameters of the pre-trained model are updated during tuning. This method has long been regarded as an effective way to achieve state-of-the-art performance on many vision benchmark datasets. However, as vision models continue to scale up, full fine-tuning becomes less practical due to its storage and training overhead (see the fine-tuning sketch below). | CNN: VGGNet [205], Inception [215], ResNet [80], EfficientNet [217], C3D [222], I3D [21], S3D [250], X3D [59] Transformer: ViT [49], DeiT [221], TNT [73], T2T [271], PVT [237], Swin-ViT [146], Video Swin Transformer [147], CPVT [39] CNN and Transformer: Shuffle [90], CMT [71], VOLO [272] |
Prompt Tuning | Prompt tuning recasts downstream tasks in the form of the pre-training task by designing a specific template, thereby fully exploiting the capabilities of foundation models. It usually learns only a small number of prompt parameters and keeps the pre-trained model frozen. The core mechanism of visual prompts is to unlock the potential of the upstream pre-trained model so that it can perform downstream tasks as well as possible with little or even no labeled data (see the prompt-tuning sketch below). | Vision-driven Prompt: VPT [94], S-Prompting [239], DePT [64], ZegCLIP [298], ACT [48], PViT [83], TeCoA [156], EVP [247], ProSFDA [89], APT [11], PAT [267], LPT [47], PointCLIP [282], P2P [242], PromptGen [245], NOAH [288], PGN [148], FPTrans [278], FRPT [235], RePro [62], ViLD [68], LION [231] Language-driven Prompt: CoOp [296], SubPT [153], MPA [28], ZegOT [109], X-CLIP [164], ProGrad [299], Berg et al. [8], PTP [285], LANIT [170], SgVA-CLIP [175], LASP [17], DualCoOp [210], PLOT [25], CPL [82], DeFo [230], GALIP [218], CoCoOp [295], PointCLIP V2 [300] Vision-language Prompt: UPT [275], DPT [252], MaPLe [106], MVLPT [200], MetaPrompt [292], TPT [203] |
Adapter Tuning | Adapter tuning is a class of techniques that inserts small trainable modules into a frozen pre-trained model to facilitate learning on downstream tasks. Its advantages are its lightweight nature and plug-and-play insertion into the intermediate layers of a pre-trained network, which makes it widely applicable to many vision tasks (see the adapter sketch below). | Sequential Adapter: Res-adapt [186], EPM [187], DAN [192], LST [212], Conv-Adapter [26], Polyhistor [145], Pro-tuning [165], AMixer [185], Fit [204], TINA [158], RepAdapter [150], BDTL [123], ViTDet [122], Florence [270], SND [233], MK-Adapter [280], ADA [55], AIM [261], ST-Adapter [166], PEA [199], CAOA [224], HA [108], CLIP-Adapter [63], Tip-Adapter [281], BALLAD [154], MAGMA [53], VL-Adapter [213], Hierarchical3D [169], HyperPELT [290], SVL-Adapter [168], LAVISH [129], CrossModal-Adapter [95], MV-Adapter [277] Parallel Adapter: ViT-Adapter [36], PESF-KD [184], AdaptMLP [31], Convpass [98], AMA [268], UniAdapter [149] Mix Adapter: Consolidator [75], ETT [253], PATT [81], PALT [227], TVG [202], VQT [225] |
Parameter Tuning | Parameter tuning directly modifies the model parameters (i.e., weights and biases). These methods can be grouped into three categories according to the part they modify: the bias part, the weight part, or both. Common modification schemes include adding new parameters, decomposing existing weights, or directly tuning a subset of the parameters without introducing extra ones. Representative methods are bias tuning, LoRA, and Compacter (see the LoRA/BitFit sketch below). | Bias Part: BitFit [274], Side Adapter [255], AdapterBias [60], DP-BiTFiT [15] Weight Part: LoRA [87], MoSA [111], DyLoRA [227], DnA [96], Compacter [102], KAdaptation [81], PHM [276], PHNNs [67], TARP [85], FacT [99], KronA [52], DLDR [121], Aurora [232] Weight and Bias: SSF [125] |
Remapping Tuning | Remapping-based tuning transfers the learned knowledge of a pre-trained model to a new downstream model. This technique has shown promising results in improving the performance of downstream models and can be categorized into three types according to how the pre-trained model is used (see the distillation sketch below). | Knowledge Distillation: KD [84], Fitnet [191], Student [27], DFA [69], AdaIN [259], Normalized KD [254], Heterogeneous KD [172], DeiT [221], Manifold KD [76], Paraphrasing KD [107], RKD [171], AKDNet [141], SemCKD [23], HKD [297], Review [30], DKD [291] Weight Remapping: Net2Net [32], EAS [18], N2N Learning [5], NASH [54], Path-level EAS [19], FNA [57], FNA++ [58] Architecture Remapping: DARTS [131], DATA [22], DATA-GS [283], P-DARTS [34], DARTS+ [126], SGAS [118], SNAS [251], MiLeNAS [77], DARTS- [40] |
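To make the contrast between these paradigms concrete, the following minimal PyTorch sketches illustrate each one; the backbones, hyperparameters, and helper names are our own illustrative choices, not the implementations of the cited works. First, full fine-tuning simply makes every pre-trained weight trainable, which is why its optimizer state and checkpoint footprint grow with model size (the torchvision ResNet-50 and the 10-class head here are just example choices).

```python
# Minimal full fine-tuning sketch (assumed torchvision ResNet-50 backbone and a
# hypothetical 10-class downstream task): every pre-trained weight is updated.
import torch
import torchvision

model = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V2
)
model.fc = torch.nn.Linear(model.fc.in_features, 10)  # replace the classification head

# Full fine-tuning: all parameters receive gradients and per-parameter optimizer
# state, so storage and training cost scale with the full model size.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable / 1e6:.1f}M")  # the entire network
```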
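Prompt tuning instead freezes the backbone and learns only a handful of prompt tokens. The sketch below follows the shallow-prompt idea of VPT-style methods, assuming a timm ViT whose patch_embed, cls_token, pos_embed, and blocks attributes follow the standard VisionTransformer layout; the class name and arguments such as num_prompts are illustrative.

```python
# Minimal VPT-style visual prompt tuning sketch (assumes a standard timm ViT
# layout; VisualPromptViT and num_prompts are illustrative names).
import torch
import torch.nn as nn
import timm

class VisualPromptViT(nn.Module):
    def __init__(self, num_prompts=10, num_classes=10):
        super().__init__()
        self.vit = timm.create_model("vit_base_patch16_224", pretrained=True)
        for p in self.vit.parameters():            # keep the backbone frozen
            p.requires_grad = False
        dim = self.vit.embed_dim
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, dim) * 0.02)
        self.head = nn.Linear(dim, num_classes)    # only prompts + head are trained

    def forward(self, x):
        x = self.vit.patch_embed(x)                                  # patch tokens
        cls = self.vit.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1) + self.vit.pos_embed
        prompts = self.prompts.expand(x.shape[0], -1, -1)
        x = torch.cat([x[:, :1], prompts, x[:, 1:]], dim=1)          # insert prompts after [CLS]
        x = self.vit.norm(self.vit.blocks(x))
        return self.head(x[:, 0])                                    # classify on [CLS]
```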
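Adapter tuning inserts small bottleneck modules into the frozen network. The sketch below shows a sequential, Houlsby-style bottleneck placed after each transformer MLP; the zero-initialized up-projection makes each adapter start as an identity mapping. It again assumes a timm-style ViT, and the module names and reduction factor are illustrative.

```python
# Minimal sequential (Houlsby-style) bottleneck adapter sketch.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter with a residual connection: dim -> dim//r -> dim."""
    def __init__(self, dim, reduction=16):
        super().__init__()
        self.down = nn.Linear(dim, dim // reduction)
        self.act = nn.GELU()
        self.up = nn.Linear(dim // reduction, dim)
        nn.init.zeros_(self.up.weight)   # zero init: adapter starts as identity
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

def add_adapters(vit):
    """Freeze the backbone and insert an adapter after each block's MLP."""
    for p in vit.parameters():
        p.requires_grad = False
    for blk in vit.blocks:
        blk.mlp = nn.Sequential(blk.mlp, Adapter(vit.embed_dim))  # plug-and-play
    return vit
```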
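Parameter tuning edits the weights or biases themselves. The toy sketch below illustrates two of the schemes named above: a LoRA-style low-rank update W + (alpha/r)BA added to a frozen linear layer, and a BitFit-style pass that leaves only bias terms trainable. Neither is the official implementation of the cited methods.

```python
# Toy parameter-tuning sketch: LoRA-style low-rank update and BitFit-style
# bias-only tuning (illustrative code, not the official implementations).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = x W^T + b + (alpha / r) * x A^T B^T, with W and b frozen."""
    def __init__(self, linear: nn.Linear, rank=4, alpha=8):
        super().__init__()
        self.base = linear
        for p in self.base.parameters():
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, linear.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(linear.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

def bitfit(model: nn.Module):
    """BitFit: freeze everything except bias parameters."""
    for name, p in model.named_parameters():
        p.requires_grad = name.endswith("bias")
    return model

layer = LoRALinear(nn.Linear(768, 768), rank=4)  # drop-in replacement for a frozen projection
```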
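Remapping via knowledge distillation transfers a teacher's knowledge through its soft predictions. The sketch below is the classic temperature-scaled KD loss, blending a soft-target KL term with hard-label cross-entropy; the teacher/student models and the alpha and T values are placeholders.

```python
# Minimal knowledge-distillation sketch using the classic temperature-scaled KD loss.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend soft-target KL divergence (teacher) with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # rescale so gradient magnitudes stay comparable across T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Typical use: the teacher stays frozen while the student is trained.
# with torch.no_grad():
#     teacher_logits = teacher(images)
# loss = kd_loss(student(images), teacher_logits, labels)
```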