2024
pdf
bib
abs
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
Yan Zeng
|
Hanbo Zhang
|
Jiani Zheng
|
Jiangnan Xia
|
Guoqiang Wei
|
Yang Wei
|
Yuchen Zhang
|
Tao Kong
|
Ruihua Song
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Recent advancements in GPT-4V have displayed remarkable multi-modal capabilities in processing image inputs and following open-ended instructions. Despite these advancements, there is considerable scope for enhancing open-source multi-modal LLMs, especially in terms of multi-modal understanding accuracy and instruction-following proficiency. In this paper, we conduct a comprehensive study on training GPT4-style models. We introduce Lynx a multi-modal LLM developed through a series of controlled experiments comparing various model variants. This process allowed us to identify and implement an optimal training strategy tailored for multi-modal LLMs. In addition to our model development, we propose a plug-and-play technique designed to augment the instruction-following capabilities of multi-modal LLMs. We have validated the performance of Lynx on multiple benchmarks. Results demonstrate that Lynx not only achieves strong image understanding accuracy but also excels in instruction-following tasks, paving the path for ongoing enhancements in multi-modal LLMs.
2019
pdf
bib
abs
Incorporating External Knowledge into Machine Reading for Generative Question Answering
Bin Bi
|
Chen Wu
|
Ming Yan
|
Wei Wang
|
Jiangnan Xia
|
Chenliang Li
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Commonsense and background knowledge is required for a QA model to answer many nontrivial questions. Different from existing work on knowledge-aware QA, we focus on a more challenging task of leveraging external knowledge to generate answers in natural language for a given question with context. In this paper, we propose a new neural model, Knowledge-Enriched Answer Generator (KEAG), which is able to compose a natural answer by exploiting and aggregating evidence from all four information sources available: question, passage, vocabulary and knowledge. During the process of answer generation, KEAG adaptively determines when to utilize symbolic knowledge and which fact from the knowledge is useful. This allows the model to exploit external knowledge that is not explicitly stated in the given text, but that is relevant for generating an answer. The empirical study on public benchmark of answer generation demonstrates that KEAG improves answer quality over models without knowledge and existing knowledge-aware models, confirming its effectiveness in leveraging knowledge.
2018
pdf
bib
abs
Jiangnan at SemEval-2018 Task 11: Deep Neural Network with Attention Method for Machine Comprehension Task
Jiangnan Xia
Proceedings of the 12th International Workshop on Semantic Evaluation
This paper describes our submission for the International Workshop on Semantic Evaluation (SemEval-2018) shared task 11– Machine Comprehension using Commonsense Knowledge (Ostermann et al., 2018b). We use a deep neural network model to choose the correct answer from the candidate answers pair when the document and question are given. The interactions between document, question and answers are modeled by attention mechanism and a variety of manual features are used to improve model performance. We also use CoVe (McCann et al., 2017) as an external source of knowledge which is not mentioned in the document. As a result, our system achieves 80.91% accuracy on the test data, which is on the third place of the leaderboard.