2024
GoT: Effective Graph-of-Thought Reasoning in Language Models
Yao Yao
|
Zuchao Li
|
Hai Zhao
Findings of the Association for Computational Linguistics: NAACL 2024
With the widespread use of language models (LMs) in NLP tasks, researchers have discovered the potential of Chain-of-thought (CoT) to assist LMs in accomplishing complex reasoning tasks by generating intermediate steps. However, human thought processes are often non-linear, rather than simply sequential chains of thoughts. Therefore, we propose Graph-of-Thought (GoT) reasoning, which models human thought processes not only as a chain but also as a graph. By representing thought units as nodes and connections between them as edges, our approach captures the non-sequential nature of human thinking and allows for a more realistic modeling of thought processes. GoT adopts a two-stage framework with an additional GoT encoder for thought graph representation and fuses the graph representation with the original input representation through a gated fusion mechanism. We evaluate GoT’s performance on a text-only reasoning task (AQUA-RAT) and a multimodal reasoning task (ScienceQA). Our model achieves significant improvement over the strong CoT baseline on the AQUA-RAT test set and boosts accuracy from 85.19% to 87.59% using the T5-base model over the state-of-the-art Multimodal-CoT on the ScienceQA test set. Our code is publicly available at https://github.com/Zoeyyao27/Graph-of-Thought
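The gated fusion step described in this abstract can be pictured with a minimal PyTorch sketch; the hidden size, tensor names, and the convex-combination form of the gate below are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch of a gated fusion of text and thought-graph representations.
# Hidden size and tensor shapes are assumed, not taken from the paper's code.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # The gate is computed from the concatenated text and graph states.
        self.gate_proj = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, h_text: torch.Tensor, h_graph: torch.Tensor) -> torch.Tensor:
        # h_text, h_graph: (batch, seq_len, hidden_size)
        gate = torch.sigmoid(self.gate_proj(torch.cat([h_text, h_graph], dim=-1)))
        # Convex combination controlled by the learned gate.
        return gate * h_text + (1.0 - gate) * h_graph

# Usage: fuse the two encoder outputs before decoding.
fusion = GatedFusion(hidden_size=768)
h_text = torch.randn(2, 16, 768)   # output of the original input encoder
h_graph = torch.randn(2, 16, 768)  # output of the GoT (thought-graph) encoder
fused = fusion(h_text, h_graph)    # (2, 16, 768)
```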
GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment
Yao Yao
|
Zuchao Li
|
Hai Zhao
Findings of the Association for Computational Linguistics: ACL 2024
The burgeoning size of Large Language Models (LLMs) has led to enhanced capabilities in generating responses, albeit at the expense of increased inference times and elevated resource demands. Existing methods of acceleration, predominantly hinged on knowledge distillation, generally necessitate fine-tuning of considerably large models, such as Llama-7B, posing a challenge for average users. Furthermore, present techniques for expediting inference and reducing costs operate independently. To address these issues, we introduce a novel and intuitive Guidance-based Knowledge Transfer (GKT) framework. This approach leverages a larger LLM as a "teacher" to create guidance prompts, paired with a smaller "student" model to finalize responses. Remarkably, GKT requires no fine-tuning and does not require the teacher and student models to share the same vocabulary, allowing for extensive batch generation to accelerate the process while ensuring user customization. GKT can be seamlessly integrated into cloud-edge collaboration architectures, and is versatile enough for plug-and-play application across various models. It excels in both efficiency and affordability, epitomizing a "cheap and cheerful" solution. GKT achieves a maximum accuracy improvement of 14.18%, along with a 10.72 times speed-up on GSM8K, and an accuracy improvement of 14.00% along with a 7.73 times speed-up on CSQA. When utilizing ChatGPT as the teacher model and Llama2-70B as the student model, we can achieve 95.00% of ChatGPT’s performance at 52% of the cost. The results highlight substantial enhancements in accuracy and processing speed on the GSM8K and CSQA datasets, surpassing the performance of using either the student or teacher models in isolation.
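The guidance-then-completion flow can be sketched with Hugging Face transformers; the model names, guidance-token budget, and text-space hand-off below are assumptions used for illustration, not the paper's exact setup.

```python
# Rough sketch of guidance-based generation: a larger "teacher" drafts a short
# guidance prefix, and a smaller "student" completes the answer. Model names,
# the guidance-token budget, and the prompt wiring are placeholders, not the
# paper's configuration.
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_name = "meta-llama/Llama-2-70b-hf"  # placeholder teacher
student_name = "meta-llama/Llama-2-7b-hf"   # placeholder student

teacher_tok = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name, device_map="auto")
student_tok = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name, device_map="auto")

def answer(question: str, guidance_tokens: int = 32, answer_tokens: int = 256) -> str:
    # 1) Teacher produces only a short guidance prefix (cheap: few new tokens).
    t_inputs = teacher_tok(question, return_tensors="pt").to(teacher.device)
    t_out = teacher.generate(**t_inputs, max_new_tokens=guidance_tokens, do_sample=False)
    guidance = teacher_tok.decode(t_out[0, t_inputs["input_ids"].shape[1]:],
                                  skip_special_tokens=True)
    # 2) Student continues from question + guidance. Handing off in text space
    #    means the two models need not share a vocabulary.
    s_inputs = student_tok(question + guidance, return_tensors="pt").to(student.device)
    s_out = student.generate(**s_inputs, max_new_tokens=answer_tokens, do_sample=False)
    return student_tok.decode(s_out[0, s_inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True)
```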
SirLLM: Streaming Infinite Retentive LLM
Yao Yao
|
Zuchao Li
|
Hai Zhao
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
As Large Language Models (LLMs) become increasingly prevalent in various domains, their ability to process inputs of any length and maintain a degree of memory becomes essential. However, the one-off input of overly long texts is limited, as studies have shown that when input lengths exceed the LLMs’ pre-trained text length, there is a dramatic decline in text generation capabilities. Moreover, simply extending the length of pre-training texts is impractical due to the difficulty in obtaining long text data and the substantial memory consumption costs this would entail for LLMs. Recent efforts have employed streaming inputs to alleviate the pressure of excessively long text inputs, but this approach can significantly impair the model’s long-term memory capabilities. Motivated by this challenge, we introduce Streaming Infinite Retentive LLM (SirLLM), which allows LLMs to maintain longer memory during infinite-length dialogues without the need for fine-tuning. SirLLM utilizes the Token Entropy metric and a memory decay mechanism to filter key phrases, endowing LLMs with both long-lasting and flexible memory. We designed three distinct tasks and constructed three datasets to measure the effectiveness of SirLLM from various angles: (1) DailyDialog; (2) Grocery Shopping; (3) Rock-Paper-Scissors. Our experimental results robustly demonstrate that SirLLM can achieve stable and significant improvements across different LLMs and tasks, compellingly proving its effectiveness. When having a conversation, “A sir could forget himself,” but SirLLM never does! Our code is publicly available at https://github.com/Zoeyyao27/SirLLM
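The two mechanisms named in the abstract, a per-token entropy score and a memory decay applied over conversation turns, can be sketched as follows; the decay rate, keep budget, and cache interface are assumptions, not SirLLM's exact implementation.

```python
# Illustrative sketch: score cached token positions by entropy, decay the
# scores of older tokens, and keep only the highest-scoring positions.
# Numbers and the selection rule are assumptions for illustration.
import torch

def token_entropy(logits):
    # logits: (seq_len, vocab_size) -> per-position entropy, shape (seq_len,)
    log_probs = torch.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)

def select_tokens_to_keep(entropies, age_in_turns=None, decay=0.9, keep=512):
    # Decay the score of older tokens so memory fades unless a token was
    # informative (high entropy) when it entered the cache.
    if age_in_turns is not None:
        entropies = entropies * (decay ** age_in_turns)
    k = min(keep, entropies.numel())
    idx = torch.topk(entropies, k=k).indices
    return torch.sort(idx).values  # preserve original order for the KV cache

# Example: score 1,000 cached token positions and keep at most 512 of them.
logits = torch.randn(1000, 32000)             # stand-in for per-token logits
ages = torch.randint(0, 10, (1000,)).float()  # age of each token in turns
kept = select_tokens_to_keep(token_entropy(logits), age_in_turns=ages)
```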
2023
Learning Event-aware Measures for Event Coreference Resolution
Yao Yao
|
Zuchao Li
|
Hai Zhao
Findings of the Association for Computational Linguistics: ACL 2023
Researchers are witnessing knowledge-inspired natural language processing shift its focus from the entity level to the event level, where event coreference resolution is one of the core challenges. This paper proposes a novel model for within-document event coreference resolution. Taking events rather than entities as the basic units, our model learns and integrates multiple representations from both individual events and event pairs. For the former, we introduce multiple linguistically motivated event-alone features for more discriminative event representations. For the latter, we consider multiple similarity measures to capture the distinctions between event pairs. Our proposed model achieves a new state of the art on the ACE 2005 benchmark, demonstrating the effectiveness of our proposed framework.
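One way to picture how event-alone features and multiple pairwise similarity measures could feed a coreference decision is the sketch below; the particular features, measures, and scorer are illustrative placeholders rather than the paper's design.

```python
# Illustrative sketch: combine two event representations, their event-alone
# features, and several similarity measures into a single coreference score.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EventPairScorer(nn.Module):
    def __init__(self, event_dim: int = 256, feat_dim: int = 32):
        super().__init__()
        # Input: both event vectors, their element-wise product, the two
        # event-alone feature vectors, and two scalar similarity measures.
        in_dim = 3 * event_dim + 2 * feat_dim + 2
        self.mlp = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, e1, e2, f1, f2):
        # Two simple similarity measures; a real system may use more.
        cos = F.cosine_similarity(e1, e2, dim=-1).unsqueeze(-1)
        l2 = torch.norm(e1 - e2, dim=-1, keepdim=True)
        pair = torch.cat([e1, e2, e1 * e2, f1, f2, cos, l2], dim=-1)
        return self.mlp(pair).squeeze(-1)  # higher = more likely coreferent

scorer = EventPairScorer()
e1, e2 = torch.randn(4, 256), torch.randn(4, 256)  # event mention embeddings
f1, f2 = torch.randn(4, 32), torch.randn(4, 32)    # event-alone features
scores = scorer(e1, e2, f1, f2)                    # (4,)
```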
2018
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation
Stephen Politzer-Ahles
|
Yu-Yin Hsu
|
Chu-Ren Huang
|
Yao Yao
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation
Changing against tone merging trends in community? The case of C. Y. Leung
Ziqi Chen
|
Yao Yao
|
Alan C. L. Yu
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 25th Joint Workshop on Linguistics and Language Processing
Stephen Politzer-Ahles
|
Yu-Yin Hsu
|
Chu-Ren Huang
|
Yao Yao
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 25th Joint Workshop on Linguistics and Language Processing
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation
Stephen Politzer-Ahles
|
Yu-Yin Hsu
|
Chu-Ren Huang
|
Yao Yao
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation
2017
Multi-dimensional Meanings of Subjective Adverbs - Case Study of Mandarin Chinese Adverb Pianpian
Mi Zhou
|
Yao Yao
|
Chu-Ren Huang
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation
2015
Create a Manual Chinese Word Segmentation Dataset Using Crowdsourcing Method
Shichang Wang
|
Chu-Ren Huang
|
Yao Yao
|
Angel Chan
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing
A Review of Corpus-based Statistical Models of Language Variation
Yao Yao
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation
Mechanical Turk-based Experiment vs Laboratory-based Experiment: A Case Study on the Comparison of Semantic Transparency Rating Data
Shichang Wang
|
Chu-Ren Huang
|
Yao Yao
|
Angel Chan
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation
2014
Predicting the Use of BA construction in Mandarin Chinese Discourse: A Modeling Study with Two Verbs
Yao Yao
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing
Exploring Mental Lexicon in an Efficient and Economic Way: Crowdsourcing Method for Linguistic Experiments
Shichang Wang
|
Chu-Ren Huang
|
Yao Yao
|
Angel Chan
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)
Building a Semantic Transparency Dataset of Chinese Nominal Compounds: A Practice of Crowdsourcing Methodology
Shichang Wang
|
Chu-Ren Huang
|
Yao Yao
|
Angel Chan
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing
2010
A Working Report on Statistically Modeling Dative Variation in Mandarin Chinese
Yao Yao
|
Feng-hsi Liu
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)