Warning
The codebase is undergoing a refactoring. Use with caution.
Set up the development environment:
git clone https://github.com/crisostomi/mass.git
cd mass
uv sync
and you are ready to go! Fast, right?
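To sanity-check the setup, you can try importing the package inside the project environment (a minimal check, assuming the package is importable as mass, matching the src/mass layout shown below):
uv run python -c "import mass"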
We provide scripts to replicate all the experiments in the paper, as well as the baselines.
Warning
We ran our experiments on an NVIDIA A100 with 64 GB of memory. This hardware is not strictly necessary to replicate all the experiments, but it is highly recommended, especially for the ViT-L-14 encoder-based experiments. With the exception of WeMoE, to the best of our knowledge an NVIDIA GeForce RTX 2080 Ti with 11 GB of VRAM is sufficient for the ViT-B-32-based experiments of all other supported methods.
Below is the folder structure for the scripts provided in this repository, along with a brief description of their purpose:
src/mass/scripts/
├── evaluate_pipeline.py # MASS evaluation script
├── train_router.py # Train an MLP router for task classification
├── evaluate_smile.py # SMILE evaluation script
├── evaluate_we_moe.py # WeMoE evaluation script
├── finetune.py # Fine-tuning checkpoints script
├── interpret.py # Interpretability analysis of task vectors
├── evaluate_static_merging.py # Evaluate traditional merging methods
└── evaluate_task_classification.py # Task classification evaluation
To run the scripts, use
uv run src/mass/scripts/<script_name>.py
from the root folder. However, you might want to check the current configuration in the Configuration Structure section first and adapt it to your needs.
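For example, a sketch of launching the router training script with Hydra overrides (the override values b32 and n8 come from the config groups listed in the Configuration Structure section, but this particular combination is only illustrative):
uv run src/mass/scripts/train_router.py nn/encoder=b32 benchmark=n8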
💡 Examples of commands for experiments in the paper:
To run and test our method, MASS, on the 8-task benchmark with a ViT-B-32 encoder:
uv run evaluate benchmark=n8 nn/module=mass nn/encoder=b32
To try out a static merging method, e.g. TSV-M, using the ViT-B-16 encoder on the 20-task benchmark:
uv run static_merge merger=tsv nn/encoder=b16 benchmark=n20
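Following the same override pattern, running MASS on the 20-task benchmark with the larger ViT-L-14 encoder should look like this (an illustrative combination built from the config groups below, not a command we have verified verbatim):
uv run evaluate benchmark=n20 nn/module=mass nn/encoder=l14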
You don't have to worry about anything! Models will be downloaded automatically from Donato Crisostomi's Hugging Face page, while datasets will be downloaded from FusionBench's page. For more details, please refer to their paper, cited at the end of this document.
Note that we produced new checkpoints for all the models used in the paper (CLIP architectures). We did so to fix an annoying bug affecting the DTD checkpoint, which had probably been trained on an unknown split. For consistency with previous literature, we kept the per-dataset hyperparameters suggested in the original paper.
We welcome contributions to this project! If you have suggestions for improvements, bug fixes, or new features, please open an issue or submit a pull request.
conf/
├── benchmark/ # 8-, 14-, and 20-task benchmarks
│ ├── n8.yaml
│ ├── n14.yaml
│ └── n20.yaml
├── hydra/ # Hydra configurations
├── merger/ # Static merging options
├── dataset/
├── nn/
│ ├── encoder/ # b32, b16, l14 Encoders
│ │ ├── b32.yaml
│ │ ├── b16.yaml
│ │ └── l14.yaml
│ └── module/ # Available methods
│ ├── aggregator/ # Deprecated, to be removed
│ ├── router/ # Router options
│ ├── smile.yaml
│ ├── mass.yaml # MASS model config
│ ├── task.yaml
│ └── we_moe.yaml
├── train/ # Training configurations
├── static_merging.yaml # Static merging config
├── task_vectors.yaml # Main Config
└── train_router.yaml # Train MLP router config
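Since the scripts are Hydra-based, you can likely sweep several of these config groups in one go with Hydra's multirun mode; a sketch, assuming the evaluate entry point exposes standard Hydra multirun behavior:
uv run evaluate -m benchmark=n8,n14,n20 nn/encoder=b32,b16,l14 nn/module=mass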
@misc{tang2024fusionbenchcomprehensivebenchmarkdeep,
title={FusionBench: A Comprehensive Benchmark of Deep Model Fusion},
author={Anke Tang and Li Shen and Yong Luo and Han Hu and Bo Du and Dacheng Tao},
year={2024},
eprint={2406.03280},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2406.03280}
}
- Models: Pre-trained and fine-tuned models are available on Donato Crisostomi's Hugging Face page
- Datasets: Evaluation datasets are sourced from FusionBench's Hugging Face page