Abstract
Conversational large language models (LLMs) such as ChatGPT and GPT-4 have recently exhibited remarkable capabilities across various domains, capturing widespread public attention. To facilitate this line of research, in this paper we report the development of MOSS, an open-sourced conversational LLM that contains 16 billion parameters and can follow a variety of instructions in multi-turn interactions with humans. The base model of MOSS is pre-trained on large-scale unlabeled English, Chinese, and code data. To optimize the model for dialogue, we generate 1.1 million synthetic conversations based on user prompts collected through earlier versions of our model API. We then perform preference-aware training on preference data annotated from AI feedback. Evaluation results on real-world use cases and academic benchmarks demonstrate the effectiveness of the proposed approaches. In addition, we present an effective practice for augmenting MOSS with several external tools. Through the development of MOSS, we have established a complete technical roadmap for large language models, spanning pre-training, supervised fine-tuning, and alignment, verifying the feasibility of building a ChatGPT-style model under resource-limited conditions and providing a reference for both the academic and industrial communities. Model weights and code are publicly available at https://github.com/OpenMOSS/MOSS.
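Since the abstract points readers to the public release, the following is a minimal usage sketch, not taken from the paper, of how one might load a released MOSS checkpoint with Hugging Face Transformers for a single-turn query. The checkpoint name fnlp/moss-moon-003-sft and the <|Human|>/<eoh>/<|MOSS|> prompt markup are assumptions based on the public repository and may differ from the release you use; a GPU is assumed for half-precision inference.

# Minimal sketch; checkpoint name and prompt markup are assumptions, see note above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "fnlp/moss-moon-003-sft"  # assumed checkpoint identifier
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
model = model.half().cuda() if torch.cuda.is_available() else model.float()

# Single-turn prompt in the assumed MOSS dialogue format.
prompt = "<|Human|>: Hello, what can you do?<eoh>\n<|MOSS|>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Print only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))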
Change history
17 September 2024
An Erratum to this paper has been published: https://doi.org/10.1007/s11633-024-1527-z
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 62022027). We also extend our gratitude to the Shanghai Artificial Intelligence Laboratory, China, for providing the computational resources.
Ethics declarations
The authors declare that they have no conflicts of interest related to this work.
Additional information
Colored figures are available in the online version at https://link.springer.com/journal/11633
Tianxiang Sun received the B. Eng. degree in software engineering from Xidian University, China in 2019. He is currently a Ph.D. degree candidate at the School of Computer Science, Fudan University, China.
His research interests include natural language processing and deep learning.
Xiaotian Zhang received the B. Eng. degree in civil engineering from Tongji University, China in 2021. He received the M. Eng. degree in computer science and technology from Fudan University, China in 2024, under the supervision of Professor Xipeng Qiu.
His research interest is natural language processing.
Zhengfu He received the B. Sc. degree in computer science from Fudan University, China in 2023. He is a Ph.D. degree candidate at Fudan University, China, supervised by Professor Xipeng Qiu.
His research interests include mechanistic interpretability and large language models.
Peng Li received the B. Eng. degree in data science from East China Normal University, China in 2020. He is now a master student at Fudan University, China, supervised by Professor Xipeng Qiu.
His research interest is foundation models.
Qinyuan Cheng received the B. Eng. degree in computer science from Sun Yat-Sen University, China in 2020. He is a Ph.D. degree candidate at Fudan University, China, supervised by Professor Xipeng Qiu.
His research interest is large language models.
Xiangyang Liu received the B. Eng. degree in intelligence science and technology from Xidian University, China in 2020. He is now a Ph.D. degree candidate at Fudan University, China, supervised by Professor Xipeng Qiu.
His research interests include language model training, efficient methods and AI alignment.
Hang Yan received the B. Eng. degree in electrical engineering and automation from Fudan University, China in 2015, and the M. Eng. degree in electrical engineering from Columbia University, USA in 2017. He is a Ph.D. degree candidate in computer science at Fudan University, China, under the supervision of Professor Xipeng Qiu.
His research interests include large model training, information extraction, and open-source software development.
Yunfan Shao received the B. Sc. and M. Sc. degrees in computer science from Fudan University, China in 2019 and 2022, respectively. He is a Ph.D. degree candidate at Fudan University, China.
His research interest is large language models.
Qiong Tang received the B. Sc. degree in data science from East China Normal University, China in 2022. She is a master student at Fudan University, China, supervised by Professor Xipeng Qiu.
Her research interest is large language models.
Shiduo Zhang received the B. Eng. degree in software engineering from Tongji University, China in 2023. He is now a master student at Fudan University, China, supervised by Professor Xipeng Qiu.
His research interests include foundation models and embodied AI.
Xingjian Zhao received the B. Sc. degree in artificial intelligence from Fudan University, China in 2024. He is now a master student in computer science at Fudan University, China.
His research interest is large language models.
Ke Chen is an open-source contributor to the open-moss project and the MOSS backend, with an interest in system software. He is now pursuing the Bachelor's degree in computer science at Fudan University, China.
His research interests include natural language processing and artificial intelligence.
Yining Zheng received the B. Sc. degree in computer science from Fudan University, China in 2019. He is now a Ph.D. degree candidate at Fudan University, China, supervised by Professor Xipeng Qiu.
His research interests include large language model training and efficient methods.
Zhejian Zhou received the B. Sc. degree in electronic and information science and technology from the School of Electronics Engineering and Computer Science, Peking University, China. He was a visiting student at the Fudan NLP Group. He is currently a Ph.D. degree candidate in computer science at University of Southern California, USA.
His research interests include artificial intelligence and natural language processing.
Ruixiao Li received the B. Sc. degree in computer science from Fudan University, China in 2024. He is now a Ph.D. degree candidate in computer science at Fudan University, China.
His research interest is large language models.
Jun Zhan received the B. Eng. degree in software engineering from Huazhong University of Science and Technology, China in 2022, and is currently a master student in computer science at Fudan University, China.
His research interest is large language models.
Yunhua Zhou received the M. Sc. and Ph.D. degrees in computer science from Fudan University, China in 2019 and 2024, respectively. Currently, he is a researcher at the Shanghai Artificial Intelligence Laboratory, China.
His research interest is large language models.
Linyang Li received the B. Eng. degree in electronic engineering from Fudan University, China in 2019. He is a Ph.D. degree candidate in computer science at Fudan University, China, under the supervision of Professor Xipeng Qiu.
His research interests include large model training and AI safety of large language models.
Xiaogui Yang received the B. Sc. and M. Eng. degrees in computer science from Fudan University, China in 2021 and 2024, respectively. Currently, he is an engineer at the Shanghai Artificial Intelligence Laboratory, China.
His research interest is large language models.
Lingling Wu received the B. Sc. degree in computer science from Shanghai Jiao Tong University, China in 2021, and the M. Eng. degree in computer science from Fudan University, China in 2024.
Her research interest is natural language processing.
Zhangyue Yin received the B. Sc. degree in data science from East China Normal University, China in 2021. He is now a Ph.D. degree candidate at Fudan University, China, supervised by Professor Xipeng Qiu and Professor Xuanjing Huang.
His research interests include large language models and machine reasoning.
Xuanjing Huang received the Ph.D. degree in computer science from Fudan University, China in 1998. She is currently a professor at the School of Computer Science, Fudan University, China.
Her research interests include natural language processing and information retrieval, with a particular emphasis on sentiment analysis, information extraction, pre-trained language models, and the robustness and interpretability of NLP.
Yu-Gang Jiang received the Ph.D. degree in computer science from City University of Hong Kong, China in 2009. He is Vice President of Fudan University, China, and a Chang Jiang Scholar Distinguished Professor of Computer Science. He is a Fellow of IEEE and IAPR.
His research interests include multimedia, computer vision, and trustworthy AGI.
Xipeng Qiu received the B. Sc. and Ph.D. degrees in computer science from Fudan University, China in 2001 and 2006, respectively. Currently, he is a professor at the School of Computer Science, Fudan University, China.
His research interests include natural language processing and deep learning.
Rights and permissions
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The original version of this article was revised due to a retrospective Open Access order
About this article
Cite this article
Sun, T., Zhang, X., He, Z. et al. MOSS: An Open Conversational Large Language Model. Mach. Intell. Res. 21, 888–905 (2024). https://doi.org/10.1007/s11633-024-1502-8
DOI: https://doi.org/10.1007/s11633-024-1502-8