Day 1
Brief Contents
• A brief overview of Natural Language Processing (NLP).
• What are large language models (or LLMs)? Traditional language models.
• Brief history of language models.
• Current state-of-the-art in LLMs. Architectures of some popular LLMs. What is pre-training, fine-tuning and prompting?
• Live demo of a 13 billion parameter model.
• Conclusions and future directions. What are some of its major shortcomings? What solutions can solve those problems?
• Question answering session. How can AI be a potential danger for the society? What steps can we take?
Natural Language Processing (NLP)
A field of artificial intelligence (AI) that focuses on the interaction between computers and natural language. It involves the development of
• algorithms,
• models,
• techniques
that enable computers to
• understand,
• classify,
• extract,
• generate natural language.
Tasks in NLP
• Text Understanding: Entailment; extracting dependency graphs based on grammar.
• Text Classification: Sentiment analysis; detecting spam messages.
• Information Extraction: Keyword extraction; entity extraction (dates, locations, persons, organizations, objects).
• Text Generation: Question answering; abstractive summarization; dialogue systems; creative writing.
Language Models (LMs)
Traditionally used statistical models:
• TF-IDF on n-grams.
• HMM (Hidden Markov Models).
• CRF (Conditional Random Fields).
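The statistical approach estimates word probabilities directly from counts. A minimal sketch of a count-based bigram language model (the toy corpus and the unsmoothed probabilities are illustrative assumptions, not from the slides):

```python
from collections import Counter, defaultdict

def train_bigram_lm(tokens):
    """Estimate P(next | current) from raw bigram counts (no smoothing)."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }

corpus = "the cat sat on the mat the cat ate".split()
lm = train_bigram_lm(corpus)
print(lm["the"])  # distribution over words observed after "the"
```

Real systems add smoothing (e.g. Kneser-Ney) so that unseen bigrams do not get zero probability.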
Brief history of language models -1
Brief history of language models -2
I grew up in France.
• Recurrent Neural Networks (RNNs) When I was younger,
• LSTM (Long Short-term memory units) in 1997. I used to regularly run
to the shops to buy
• GRU (Gated recurrent units) in 2014. fresh baguettes for breakfast.
• Working principle: I speak fluent __?__.
1. https://dl.acm.org/doi/10.1162/neco.1997.9.8.1735 7
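The recurrent working principle can be sketched in a few lines (toy dimensions and random weights are illustrative assumptions): the same hidden state is updated at every step, which is how an early word like "France" can influence a prediction many tokens later.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 4-dim one-hot "words", 3-dim hidden state.
W_x = rng.normal(scale=0.5, size=(3, 4))   # input-to-hidden weights
W_h = rng.normal(scale=0.5, size=(3, 3))   # hidden-to-hidden weights

def rnn_forward(inputs):
    """Run a plain (Elman) RNN over a sequence, returning the final hidden state."""
    h = np.zeros(3)
    for x in inputs:
        # The hidden state mixes the current input with everything seen so far.
        h = np.tanh(W_x @ x + W_h @ h)
    return h

sequence = [np.eye(4)[i] for i in [0, 2, 1, 3]]  # four one-hot tokens
print(rnn_forward(sequence))
```

LSTMs and GRUs replace this plain update with gated updates so that gradients survive over long distances.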
Brief history of language models -3
Brief history of language models -4
• Transformers are the final answer, introduced in 2017.
• Initially built for translation.
• BERT (Bidirectional Encoder Representations from Transformers).
• GPT (Generative Pretrained Transformers).
• Working principle:
  • The size of the hidden representation is fixed.
  • The key component is the attention mechanism.
  • This allows for a much larger context.
• Problems:
  • Limited input sequence length.
Vaswani et al. 2017, https://papers.neurips.cc/paper/7181-attention-is-all-you-need
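The attention mechanism can be sketched as scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, following Vaswani et al.; the random matrices below are only illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # similarity of each query to every key
    weights = softmax(scores, axis=-1)    # each row is a distribution over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))  # 5 positions, d_k = 8
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out, w = attention(Q, K, V)
print(out.shape)  # every position gets a weighted mix of all value vectors
```

Because every position attends to every other position directly, there is no fixed-length bottleneck as in RNNs; the cost is that compute grows quadratically with sequence length, which is the source of the limited-input-length problem above.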
Transformer Architecture
https://jalammar.github.io/illustrated-transformer/
BERT
1. BERT – Bidirectional Encoder Representations from Transformers. In 2019.
2. It uses a stack of 12 encoder layers.
https://towardsdatascience.com/understanding-masked-language-models-mlm-and-causal-language-models-clm-in-nlp-194c15f56a5
https://www.scaler.com/topics/nlp/bert-next-sentence-prediction/
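BERT's masked-language-model pre-training can be sketched as input corruption: hide some tokens and train the model to recover them from context on both sides. This is a simplification; real BERT selects ~15% of tokens and replaces 80% of those with [MASK], 10% with random tokens, and leaves 10% unchanged.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=42):
    """Simplified BERT-style corruption: replace a fraction of tokens
    with [MASK] and record the originals as prediction targets."""
    rng = random.Random(seed)
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets[i] = tok       # the loss is computed only at these positions
        else:
            corrupted.append(tok)
    return corrupted, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
corrupted, targets = mask_tokens(tokens)
print(corrupted, targets)
```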
Architecture of GPT
1. GPT – Generative Pretrained Transformers. In 2018.
2. It uses a stack of 12 transformer decoder layers.
3. GPT is also pre-trained first and then fine-tuned for a task.
4. GPT does not have bidirectional context.
5. The pre-training task is Causal Language Modeling (CLM).
6. GPT-2, in addition to pre-training, employs meta-learning, where it is fine-tuned on several data-rich tasks.
HuggingFace, https://huggingface.co/blog/large-language-models
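GPT's lack of bidirectional context comes from causal masking: before the softmax, attention scores to future positions are set to minus infinity, so each position can only attend to itself and earlier positions. A minimal sketch:

```python
import numpy as np

def causal_mask(n):
    """Lower-triangular boolean mask: position i may attend only to j <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

def masked_scores(scores):
    """Set disallowed (future) positions to -inf so that, after the softmax,
    their attention weight is exactly zero."""
    n = scores.shape[-1]
    return np.where(causal_mask(n), scores, -np.inf)

scores = np.zeros((4, 4))
print(masked_scores(scores))  # upper triangle becomes -inf
```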
Prompting and Prompt-Engineering
Types of prompts (zero-shot)
Example of zero-shot prompts
Liu et al. https://dl.acm.org/doi/full/10.1145/3560815
Types of prompts (one-shot)
Types of prompts (Chain-of-thought)
Types of prompts (Auto-prompting)
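The prompt types above can be illustrated with short template strings; the wording of every prompt below is invented for demonstration, not taken from the slides:

```python
# Zero-shot: the task is described, no examples are given.
zero_shot = (
    "Classify the sentiment of this review as positive or negative.\n"
    "Review: The battery died within a week.\nSentiment:"
)

# One-shot: a single worked example precedes the real query.
one_shot = (
    "Review: I loved every minute of it.\nSentiment: positive\n\n"
    "Review: The battery died within a week.\nSentiment:"
)

# Chain-of-thought: the example demonstrates intermediate reasoning steps.
chain_of_thought = (
    "Q: A shop has 3 boxes with 12 apples each. It sells 10 apples. "
    "How many apples remain?\n"
    "A: Let's think step by step. 3 * 12 = 36 apples in total. "
    "36 - 10 = 26 apples remain. The answer is 26."
)

for name, p in [("zero-shot", zero_shot), ("one-shot", one_shot),
                ("chain-of-thought", chain_of_thought)]:
    print(f"--- {name} ---\n{p}\n")
```

Auto-prompting methods search over such templates automatically instead of writing them by hand.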
History of LLMs
• Public leak of Meta's LLaMA model.
[Chart: LLM performance, April 2023. Relative performance over 8 tasks compared to GPT-4 (leading commercial LLM), shown as a percentage of ChatGPT; bars range from 68% to 100%.]
MythicAI, https://mythic.ai/
Quantization of Large LMs
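The idea behind quantizing large LMs can be illustrated with the simplest scheme, symmetric per-tensor linear quantization; real 4-bit methods such as GPTQ use per-group scales and error compensation, so this is only a minimal sketch:

```python
import numpy as np

def quantize(weights, bits=4):
    """Symmetric linear quantization: map float weights to signed integers
    in [-(2^(bits-1) - 1), 2^(bits-1) - 1] with one scale for the tensor."""
    qmax = 2 ** (bits - 1) - 1                    # 7 for 4-bit
    scale = np.abs(weights).max() / qmax
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
q, scale = quantize(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())  # bounded by scale / 2
```

Storing 4-bit integers instead of 16-bit floats cuts weight memory roughly 4x, which is what makes a 13-billion-parameter model runnable on a single consumer GPU.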
LIVE demo of VICUNA 4bit quantized
WebUI: https://github.com/oobabooga/text-generation-webui
Vicuna Model: https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g
LIVE demo of VICUNA 4bit quantized
(Already asked some questions)
• Define large language models.
• Large Language Models (LLMs) are neural network architectures designed for natural language processing tasks such as text generation, question
answering, and machine translation. These models are typically trained on vast amounts of data, often in the billions of words or more, which
allows them to capture complex patterns and relationships within language. LLMs use techniques like deep learning and recurrent neural
networks to analyze and generate text, enabling them to understand context and produce coherent, human-like responses. Some popular
examples of LLMs include GPT-3, BERT, and XLNet.
• Does himalayas have the highest peak in the world? And in which country is the peak located in?
• Yes, Mount Everest, which is located in Nepal, is considered the tallest mountain in the world and is part of the Himalayan range. It stands at a
height of 8,848 meters (29,029 feet).
• What does Indian youth need to follow to make the country the new global economic leader?
• To help India become a new global economic leader, its youth should focus on developing skills in areas such as technology, innovation,
entrepreneurship, and creativity. They should also stay up-to-date with emerging trends and technologies, build strong networking connections
both domestically and internationally, and be open to collaborations and partnerships. Additionally, they should work towards building a strong
understanding of economics, finance, and business principles, and strive to create sustainable solutions for the future. Finally, they must maintain a
positive attitude, be resilient in the face of challenges, and always look for opportunities to grow and contribute positively to society.
Conclusions (1)
Conclusions (2)
Conclusions (3)
Conclusions (4)
Website: https://sites.google.com/view/arpan-mandal/