Aya 23: New Open-Source Multilingual Language Models by Cohere

Introduction

Multilingual Language Models (MLLMs) are pioneering a new frontier in artificial intelligence, transforming how we interact across the globe’s many languages and making AI more accessible to people around the world. Their journey has been marked by significant advancements, particularly in Natural Language Processing (NLP). These models are designed to understand, interpret, and generate text in multiple languages, breaking down language barriers and fostering global communication.

Yet the path of progress is not without its challenges. One of the most pressing issues is the performance disparity across languages, particularly for those less commonly spoken. Most progress in large language modeling has been English-centric, leading to models that perform poorly outside of a handful of languages. This remains a significant hurdle in the advancement of multilingual language models.

Aya 23 emerges as a beacon in this landscape, addressing these challenges head-on. It is a product of Cohere for AI, the non-profit research arm of the Canadian enterprise AI startup Cohere, whose mission is to democratize language AI and make it accessible and useful across industries. The development of Aya 23 was driven by the vision of a model that not only understands but also generates language with high accuracy and fluency across multiple languages. The goal was to build a powerful multilingual large language model that could serve a significant portion of the world’s population, thus propelling the AI field forward.

What is Aya 23?

Aya 23 is a sophisticated family of multilingual language models (MLLMs) that serves 23 languages, thereby expanding the horizons of language modeling to nearly half of the world’s population.

Model Variants

The Aya 23 family comprises two main variants, each designed to cater
to different needs:

1. Aya-23-8B: Tailored for the everyday developer, this variant features 8 billion parameters. It is optimized for generating accurate, contextually relevant text across the supported languages and requires fewer resources than the larger model (a rough memory estimate follows this list).
2. Aya-23-35B: With 35 billion parameters, this variant offers enhanced performance for complex multilingual tasks, maintaining consistency and coherence in the generated text.
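As a rough, back-of-the-envelope illustration of what "fewer resources" means in practice, weight memory in half precision is about two bytes per parameter. The figures below are estimates for the weights alone (activations and the KV cache add more) and are not official Cohere hardware requirements.

def fp16_weight_memory_gb(num_params):
    # Rough estimate: 2 bytes per parameter in fp16/bf16; ignores activations and KV cache
    return num_params * 2 / 1e9

print(fp16_weight_memory_gb(8e9))   # ~16 GB for Aya-23-8B
print(fp16_weight_memory_gb(35e9))  # ~70 GB for Aya-23-35B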

Key Features of Aya 23

Aya 23 boasts several unique features that set it apart:

● It is designed to significantly enhance multilingual capabilities in NLP.
● It supports 23 languages: Arabic, Chinese (simplified & traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese.
● It outperforms its predecessor, Aya 101, as well as other widely used models such as Gemma, Mistral, and Mixtral on an extensive range of discriminative and generative tasks.

[Figure: benchmark comparison of Aya 23 against other models; source - technical report document]

● It features an optimized transformer architecture and Instruction Fine-Tuning (IFT), enabling it to follow human instructions effectively and generate text with high accuracy and coherence.

Capabilities/Use Case of Aya 23

Aya 23 is not just a multilingual language model; it’s a tool that can
revolutionize various sectors with its high precision and extensive
linguistic coverage. Here are some plausible use cases:


● Advanced Translation Services: With its ability to understand and generate text in 23 languages, Aya 23 can be used to build advanced translation services. It can provide more accurate and contextually relevant translations than traditional models, making cross-language communication seamless (a brief usage sketch follows this list).
● Customer Support: Aya 23 can be integrated into customer
support systems to provide multilingual support. It can understand
customer queries in various languages and generate appropriate
responses, improving the efficiency and effectiveness of customer
service.
● Language Learning Applications: Aya 23 can be used in
language learning applications to provide accurate translations and
language exercises. It can help users learn new languages more
effectively.
● Multilingual Chatbots: Aya 23 can power chatbots that can
interact with users in multiple languages. This can enhance user
experience and make the chatbots more user-friendly.
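To make the translation use case concrete, here is a minimal sketch of prompting Aya-23-8B through the Hugging Face transformers library. The prompt text and generation settings are illustrative choices rather than recommendations from Cohere, and running the model this way assumes a GPU with enough memory and a transformers release recent enough to include the Cohere model classes.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-23-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Illustrative translation request; any of the 23 supported languages could be used
messages = [{"role": "user", "content": "Translate to German: The weather is beautiful today."}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=100, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))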

How does Aya 23 work? / Architecture / Design

Aya 23 surpasses its predecessor on many tasks. While Aya 101 was a generative language model proficient in 101 languages, Aya 23 adopts a more focused strategy: it prioritizes depth, dedicating greater computational power to a smaller set of languages during the pre-training phase. This approach not only enhances the model’s performance but also synergizes with the Aya collection to form a robust multilingual large language model.

The core architecture of Aya 23 is a refined version of the decoder-only Transformer design. When predicting each new token, this architecture weighs every preceding token in the context, enabling Aya 23 to deliver responses with higher accuracy than models based on older methodologies. The decoder-only Transformer is integral to Aya 23, empowering it to comprehend and generate text fluently across many languages.
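For readers unfamiliar with the decoder-only design, the short sketch below shows the causal self-attention step at its heart: each position can attend only to itself and earlier positions, which is what lets the model generate text one token at a time. It is a generic, single-head PyTorch illustration, not Cohere’s actual implementation, which includes further optimizations.

import math
import torch

def causal_self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model); single head, no batching, purely for illustration
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / math.sqrt(k.shape[-1])
    # Causal mask: position i may only attend to positions j <= i
    mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Tiny usage example with random weights
d = 16
x = torch.randn(5, d)
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
print(causal_self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([5, 16])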

Training and Fine-tuning

All Aya 23 base models are trained with Fax, a JAX-based distributed training framework, on TPU v4 chips. A combination of parallelism strategies is used to ensure high training throughput.

The pre-trained models are fine-tuned using multilingual instruction data. The fine-tuning datasets combine a range of approaches to improve data availability, including multilingual templates, human annotations, translated data, and synthetic data. The models are fine-tuned for 13,200 update steps with an 8192-token context length and data packing enabled.
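"Data packing" here means concatenating several short training examples into one fixed-length sequence so the 8192-token context is not wasted on padding. The snippet below is a simplified, greedy illustration of the idea, not Cohere's actual pipeline; real implementations also track example boundaries so packed examples do not attend to one another.

def pack_sequences(tokenized_examples, max_len=8192):
    # Greedily pack lists of token ids into sequences of at most max_len tokens
    packed, current = [], []
    for ids in tokenized_examples:
        ids = ids[:max_len]  # illustrative: truncate any single over-long example
        if len(current) + len(ids) > max_len:
            packed.append(current)
            current = []
        current.extend(ids)
    if current:
        packed.append(current)
    return packed

# Example: three short token-id lists packed into a single training sequence
print(pack_sequences([[1, 2, 3], [4, 5], [6]], max_len=8192))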

The examples used to instruction-tune Aya 23 are formatted with special tokens that carry extra information. The same formatting is used both during instruction-tuning and at inference time.
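The exact special tokens are defined by the model's chat template on Hugging Face. One way to inspect the formatting without hard-coding any token names is to let the tokenizer render a conversation, as in this small sketch; it assumes a transformers release recent enough to ship the Cohere tokenizer and its chat template.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("CohereForAI/aya-23-8B")
messages = [{"role": "user", "content": "Quel temps fait-il à Paris ?"}]

# Render the conversation as a plain string to see the role and turn tokens
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)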

Performance Evaluation with Other Models

Numerous evaluations were conducted to assess the performance of Aya 23.

For discriminative tasks, as illustrated in the graph above under the 'Key Features' section, the Aya 23 models were tested on challenges such as XWinograd, XCOPA, and XStoryCloze. The larger Aya-23-35B variant demonstrated its superiority, achieving an average accuracy of 70.8%, while the Aya-23-8B variant led its category with an average accuracy of 67.6%.

[Table: multilingual MMLU results; source - technical report document]

For general language comprehension, the models were evaluated on the multilingual MMLU benchmark. As shown in the table above, the Aya-23-8B model stood out among its peers, recording an average accuracy of 48.2% across languages, while the Aya-23-35B model edged out Mixtral-8x7B-Instruct with an average accuracy of 58.2% versus 57.1%.

[Table: MGSM multilingual mathematical reasoning results; source - technical report document]

Multilingual mathematical reasoning was assessed with the MGSM benchmark, shown in the table above, where both Aya 23 variants outshone their respective baselines. The Aya-23-8B scored an average of 36.6 across seven languages, and the Aya-23-35B surpassed Mixtral-8x7B-Instruct-v0.1 with a score of 53.7.

In generative tasks such as translation and summarization, the Aya 23 models excelled. The Aya-23-8B variant achieved an average spBleu score of 37.2 in translation and a RougeL score of 27.5 in summarization. The Aya-23-35B variant outperformed Mixtral-8x7B by a margin of 7.8 spBleu in translation (40.4 versus 32.6) and by 23.8 RougeL in summarization (30.9 versus 7.1).

The Multilingual Edge: Aya-23-8B’s Superior Performance

When we examine Aya-23-8B alongside Mistral-7B-Instruct-v0.2 and Gemma-1.1-7B-it, distinct differences become apparent. Mistral-7B-Instruct-v0.2 is an instruction fine-tuned iteration of the Mistral-7B-v0.2 model, featuring a 32k context window and incorporating Grouped-Query Attention and a byte-fallback BPE tokenizer, but it does not use Sliding-Window Attention. Gemma-1.1-7B-it, by contrast, is an instruction fine-tuned model that leverages the architectures, data, and training methodologies of the Gemini models. It was trained on 6T tokens from web documents, mathematics, and code, predominantly in English, and is characterized as a lightweight, decoder-only large language model trained with a novel RLHF method.

In contrast, Aya-23-8B stands out as a multilingual instruction-tuned language model that supports 23 languages, drawing on Cohere’s Command model framework. Performance-wise, Aya-23-8B surpasses both Mistral-7B-Instruct-v0.2 and Gemma-1.1-7B-it across a broad spectrum of discriminative and generative tasks. Remarkably, it achieves this despite its relatively small size, outperforming larger models in over half of the languages it supports.

Aya-23-8B distinguishes itself with its multilingual capabilities and its proficiency across diverse tasks. By catering to 23 languages, it extends state-of-the-art language modeling to nearly half of the global population, which positions it as a formidable asset for multilingual language processing. Its distinctive features and capabilities underscore its role as a pivotal development in AI and multilingual language models.

Access and Usage

The model is open-source, with weights available on Hugging Face for both the 8B and 35B variants. It can be used locally or tried online via the demo links, offering flexibility for developers and researchers.
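As a quick illustration of local use, the weights can be fetched with the huggingface_hub library. This is a minimal sketch; it assumes you have a Hugging Face account, have accepted any license terms on the model page, and are logged in via huggingface-cli login. Swap the repo id for CohereForAI/aya-23-35B to fetch the larger variant.

from huggingface_hub import snapshot_download

# Download the Aya-23-8B weights into the local Hugging Face cache
local_dir = snapshot_download(repo_id="CohereForAI/aya-23-8B")
print("Weights stored at:", local_dir)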

If you are interested in learning more about this AI model, all relevant links are provided in the 'Source' section at the end of this article.

Limitations and Future Work

While Aya 23 is a significant advancement in multilingual language models, it is important to acknowledge its limitations. The model supports 23 languages, a modest segment of the roughly 7,000 languages spoken worldwide. Its coverage is confined to the languages included during pre-training and skews towards languages predominantly spoken in certain geographical areas, leaving many Asian and African languages under-represented.

As the team builds on the groundwork established by the original Aya model, future work will aim to broaden the linguistic reach and improve performance for the many languages not yet covered.

Conclusion

Aya 23 exemplifies the power of MLLMs to transcend language barriers, envisioning a future where AI-facilitated communication is as effortless and intuitive as conversing in one’s native language. By prioritizing depth over breadth, it delivers precise and contextually appropriate text generation across 23 languages.

Source
Cohere technical paper page: https://cohere.com/research/papers/aya-command-23-8b-and-35b-technical-report-2024-05-23
Technical report document: https://drive.google.com/file/d/1YKBPo61pnl97C1c_1C2ZVOnPhqf7MLSc/view
Aya 23 demo on Hugging Face: https://huggingface.co/spaces/CohereForAI/aya-23
Weights for Aya-23-35B: https://huggingface.co/CohereForAI/aya-23-35B
Weights for Aya-23-8B: https://huggingface.co/CohereForAI/aya-23-8B

Disclaimer - This article is intended purely for informational purposes. It is not sponsored or endorsed by any company or
organization, nor does it serve as an advertisement or promotion for any product or service. All information presented is based
on publicly available resources and is subject to change. Readers are encouraged to conduct their own research and due
diligence.

To read more such articles, please visit our blog https://socialviews81.blogspot.com/
