Instruction tuning, a successful paradigm, enhances the ability of LLMs to follow natural language instructions and exhibit robust generalization across general tasks. We introduce parameter-efficient sparsity crafting (PESC), which crafts dense models into sparse models using the mixture-of-experts (MoE) architecture and helps dense models learn knowledge from different fields (including code and math).
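One way to sketch this idea: each expert in the crafted MoE layer shares the frozen weights of the original dense feed-forward network and is differentiated only by a small trainable adapter, with a trainable router selecting the top-k experts per token. The PyTorch snippet below is an illustrative sketch under these assumptions, not the authors' implementation; the names `Adapter`, `PESCMoELayer`, `num_experts`, `top_k`, and `bottleneck` are hypothetical.

```python
# Minimal sketch of a PESC-style MoE layer: experts share one frozen FFN
# and differ only by lightweight bottleneck adapters (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x):
        return x + self.up(F.gelu(self.down(x)))

class PESCMoELayer(nn.Module):
    def __init__(self, dense_ffn: nn.Module, d_model: int,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.ffn = dense_ffn                       # shared FFN, kept frozen
        for p in self.ffn.parameters():
            p.requires_grad = False
        self.adapters = nn.ModuleList(             # one small adapter per expert
            Adapter(d_model) for _ in range(num_experts))
        self.router = nn.Linear(d_model, num_experts)  # trainable gate
        self.top_k = top_k

    def forward(self, x):                           # x: (tokens, d_model)
        shared = self.ffn(x)                        # single pass through frozen FFN
        gates = F.softmax(self.router(x), dim=-1)   # routing probabilities
        weights, idx = gates.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(shared)
        for k in range(self.top_k):                 # apply the selected adapters
            for e in range(len(self.adapters)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * self.adapters[e](shared[mask])
        return out
```

In this reading, only the adapters and the router (plus any other parameter-efficient modules) receive gradients during instruction tuning, which is what keeps the trainable parameter count small.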
Instruction tuning of the sparse models crafted with PESC still requires more GPU memory and computation time than tuning the original dense models.
Relative to updating the full expert weights of a conventional MoE model, however, the method significantly reduces computational costs and GPU memory requirements, facilitating model capacity expansion through a minimal increase in parameters.
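To see why the parameter increase is minimal, consider a rough comparison between duplicating a full FFN per expert and adding only a bottleneck adapter per expert. The layer sizes below are illustrative assumptions, not figures from the paper.

```python
# Back-of-the-envelope comparison (illustrative numbers, not from the paper).
d_model, d_ff = 4096, 11008            # assumed Llama-like layer sizes
num_experts, bottleneck = 8, 64

ffn_params = 3 * d_model * d_ff                            # gate/up/down projections of one FFN
full_moe_extra = (num_experts - 1) * ffn_params            # copying the FFN for every expert
adapter_extra = num_experts * (2 * d_model * bottleneck)   # one bottleneck adapter per expert

print(f"extra params, full expert copies:   {full_moe_extra / 1e6:.0f}M")
print(f"extra params, PESC-style adapters:  {adapter_extra / 1e6:.1f}M")
```

Under these assumed sizes, full expert copies add on the order of hundreds of millions of parameters per layer, while adapter-based experts add only a few million.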
Using PESC during instruction tuning, the best sparse model outperforms other sparse and dense models and exhibits superior general capabilities.
We implement the PESC method for instruction tuning across general tasks, achieving strong performance on various benchmarks, and develop the Camelidae models.