ITElec2 Act (Finals)
IT ELEC2
Activity (Finals)
Literature Review # 1:
Research Problem:
The research problem identified in the document "Application of Artificial Intelligence in the Organization of
Knowledge in Libraries" is centered on the need for modern libraries to effectively organize and manage vast
amounts of information using artificial intelligence (AI). The traditional methods of information organization and
retrieval are becoming inadequate due to the exponential growth of information and the increasing demand for
efficient and accurate information retrieval systems. The paper addresses the challenges libraries face in
organizing their resources and highlights the potential benefits of incorporating AI to improve cataloguing,
classification, indexing, and overall information retrieval processes.
Conclusions:
In conclusion, the application of Artificial Intelligence in libraries in Nigeria has been seen as a new driving force
for intelligent library management. Librarians have begun to adopt AI technology in some specific areas
of their respective libraries to meet current global trends. The novel trends in the application of Artificial
Intelligence in library operations are the following: expert systems in cataloguing, classification, indexing, and
information retrieval, as well as AI-based natural language processing, pattern recognition, and robotics in
library activities. Therefore, applying AI to library services takes over complex and stressful work, so that
humans encounter fewer errors and defects; it aids access to research works, although it also reduces the
human touch and replaces some human involvement.
Republic of the Philippines
POLYTECHNIC UNIVERSITY OF THE PHILIPPINES
STO. TOMAS CAMPUS
Sto. Tomas, Batangas
Reaction:
The application of Artificial Intelligence (AI) in the organization of knowledge in libraries is a timely and
essential development. As libraries evolve from physical spaces to digital repositories, the sheer volume of
information necessitates advanced tools for efficient management. The research highlights the importance of
AI in enhancing information retrieval, cataloguing, classification, and indexing processes. AI can significantly
improve the accessibility and usability of library resources, ensuring that users can find relevant information
quickly and accurately. This transformation is crucial for maintaining the relevance and utility of libraries in the
modern information age.
However, the adoption of AI in libraries also presents several challenges. These include the need for significant
investment in AI technologies, the requirement for staff training, and the potential for resistance to change
among library personnel. Additionally, there are ethical considerations regarding data privacy and the biases
that AI systems might introduce. Despite these challenges, the benefits of AI in streamlining library operations
and improving service delivery make it a worthwhile pursuit.
Literature Review # 2:
Research Problem:
1. What are the current reinforcement learning techniques used in generative AI?
2. How do these techniques apply to generative models?
3. What are the challenges and potential future directions in the integration of reinforcement learning with
generative AI?
This paper provides a comprehensive overview of the intersection of reinforcement learning and generative AI,
highlighting both the current achievements and the substantial challenges that lie ahead. The detailed
discussion on model-free and model-based approaches offers valuable insights into the strengths and
limitations of these techniques. The emphasis on future research directions, particularly the integration of
different methods and the need for better computational efficiency, underscores the dynamic and evolving
nature of this field. The study is a crucial read for anyone looking to understand the state-of-the-art in
generative AI and the role of reinforcement learning in advancing this technology.
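The model-free techniques surveyed share one core loop: act, observe a reward, and improve value estimates from samples alone, with no model of the environment. A minimal sketch of that loop, using a hypothetical two-armed bandit rather than any example from the paper:

```python
import random

random.seed(0)

# Hypothetical two-armed bandit: arm 1 pays more on average.
def pull(arm):
    return random.gauss(1.0 if arm == 1 else 0.2, 0.1)

q = [0.0, 0.0]             # per-arm value estimates
alpha, epsilon = 0.1, 0.1  # step size and exploration rate

for _ in range(2000):
    # Model-free control: pick an arm from the estimates alone
    # (epsilon-greedy), observe a sampled reward, and nudge the
    # chosen arm's estimate toward it.
    arm = random.randrange(2) if random.random() < epsilon else q.index(max(q))
    reward = pull(arm)
    q[arm] += alpha * (reward - q[arm])
```

After enough steps the estimate for the better arm dominates; reward-based fine-tuning of generative models scales up this same trial-and-error principle.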
Research Problem:
1. What are the different types of machine learning algorithms and their core principles?
2. How are these machine learning algorithms applied in real-world applications?
3. What are the future research directions and challenges in the field of machine learning?
The paper discusses the key machine learning techniques, categorizing them into supervised, unsupervised,
semi-supervised, and reinforcement learning. Supervised learning is noted for its use in tasks like classification
and regression, while unsupervised learning is highlighted for clustering and anomaly detection. Semi-
supervised learning combines elements of both supervised and unsupervised learning to improve learning from
limited labeled data. Reinforcement learning is explored in the context of optimization problems in dynamic
environments. The paper also reviews real-world applications across various domains such as cybersecurity,
healthcare, finance, and more. It identifies several challenges, including the need for large datasets,
computational resources, and the interpretability of models.
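The categories above can be made concrete in a few lines of code. The sketch below uses hypothetical one-dimensional data (not from the paper): a supervised nearest-centroid classifier learns from labeled points, while an unsupervised two-means loop groups unlabeled points:

```python
# Supervised: fit a centroid per class from labeled examples, then
# classify a new point by its nearest centroid.
labeled = [(0.0, "a"), (0.2, "a"), (0.9, "b"), (1.1, "b")]
centroids = {}
for cls in {"a", "b"}:
    vals = [x for x, c in labeled if c == cls]
    centroids[cls] = sum(vals) / len(vals)

def classify(x):
    return min(centroids, key=lambda c: abs(x - centroids[c]))

# Unsupervised: a few rounds of 2-means on unlabeled data,
# starting the two means at the extremes.
data = [0.1, 0.15, 0.95, 1.05]
m1, m2 = min(data), max(data)
for _ in range(5):
    g1 = [x for x in data if abs(x - m1) <= abs(x - m2)]
    g2 = [x for x in data if abs(x - m1) > abs(x - m2)]
    m1, m2 = sum(g1) / len(g1), sum(g2) / len(g2)
```

Semi-supervised methods combine both ingredients, and reinforcement learning replaces the fixed dataset with rewards from an environment.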
Conclusions:
In this paper, we have conducted a comprehensive overview of machine learning algorithms for intelligent data
analysis and applications. According to our goal, we have briefly discussed how various types of machine
learning methods can be used
for making solutions to various real-world issues. A successful machine learning model depends on both the
data and the performance of the learning algorithms. The sophisticated learning algorithms then need to be
trained through the collected real-world data and knowledge related to the target application before the system
can assist with intelligent decision-making. We also discussed several popular application areas based on
machine learning techniques to highlight their applicability in various real-world issues. Finally, we have
summarized and discussed the challenges faced and the potential research opportunities and future directions
in the area. Therefore, the challenges that are identified create promising research opportunities in the field
which must be addressed with effective solutions in various application areas. Overall, we believe that our study
on machine learning-based solutions opens up a promising direction and can be used as a reference guide for
potential research and applications for both academia and industry professionals as well as for decision-
makers, from a technical point of view.
Reaction:
This paper does an excellent job of demystifying the vast landscape of machine learning by categorizing
various techniques and highlighting their applications in real-world scenarios. It's particularly insightful in
identifying the pressing challenges of data availability, computational demands, and the need for interpretable
models. The recommendations to invest in high-quality datasets, improve algorithmic efficiency, and prioritize
explainable AI are spot-on and crucial for the next advancements in this field. Additionally, fostering
interdisciplinary collaborations and ensuring ethical AI development are imperative steps that align well with
current industry trends. This comprehensive review serves as a valuable resource for anyone looking to grasp
the current state and future direction of machine learning.
Abstract:
the advection-diffusion framework for vertically-integrated dust transport, with enhanced dust radial
(pseudo-)diffusion up to an effective α_eff ∼ 10⁻² for strongly coupled dust, even when background turbulence
is weak (α < 10⁻⁴). Dust radial drift is also modestly enhanced in the second scenario. We provide a general
analytical theory that accurately reproduces our simulation results, thus establishing a framework to model
global dust transport that realistically incorporates vertical gas flow structures. We also note that the theory is
equally applicable to the transport of chemical species.
Research Problem:
Conclusions:
The overview concludes that recent advancements in deep learning have led to substantial improvements in
various applications, but significant challenges remain. Addressing issues related to data efficiency,
computational resource demands, and the interpretability of models will be crucial for the continued progress of
deep learning research.
Reaction:
This paper provides an in-depth look at the cutting-edge advancements in deep learning, clearly illustrating
how innovations in neural network architectures and optimization methods are pushing the boundaries of what
AI can achieve. It's fascinating to see how these techniques are being applied in diverse fields like image
recognition, natural language processing, and autonomous systems, highlighting the transformative potential of
deep learning. However, the paper also rightly points out that challenges such as the need for large datasets,
high computational costs, and the complexity of models still need to be tackled. Moving forward, research
should focus on creating more efficient data usage strategies, developing cost-effective computational
methods, and enhancing model interpretability. By addressing these areas, we can ensure that the benefits of
deep learning continue to expand and are accessible across various domains.
Abstract:
Federated Learning is a distributed machine learning approach which enables model training on a large corpus
of decentralized data. We have built a scalable production system for Federated Learning in the domain of
mobile devices, based on TensorFlow. In this paper, we describe the resulting high-level design, sketch some
of the challenges and their solutions, and touch upon the open problems and future directions.
Research Problem:
The study aims to answer the following research questions:
1. What are the primary methods used in federated learning?
2. What challenges are associated with implementing federated learning?
3. What future research directions could help overcome these challenges?
Conclusions:
The review concludes that federated learning holds significant promise for privacy-preserving machine learning
but faces substantial challenges that need to be addressed. Future research should focus on improving
communication efficiency, handling data heterogeneity, and developing robust privacy-preserving techniques.
These advancements will be critical to the broader adoption and effectiveness of federated learning.
Reaction:
This paper offers a compelling look into the emerging field of federated learning, highlighting its potential to
revolutionize privacy-preserving machine learning. The detailed discussion on the challenges of data
heterogeneity and communication overhead is particularly insightful, reflecting real-world complexities that
need innovative solutions. The emphasis on privacy-preserving techniques, such as differential privacy and
secure multi-party computation, underscores the importance of protecting user data in decentralized learning
environments. Moving forward, research should prioritize adaptive communication strategies and robust
aggregation methods to tackle the issue of non-IID data. By addressing these areas, federated learning can
become a more viable and widely adopted solution for collaborative model training without compromising data
privacy.
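The aggregation step at the heart of federated learning can be sketched in miniature. Assuming a hypothetical one-parameter linear model y = w·x and two toy clients (none of this comes from the paper's TensorFlow system), a FedAvg-style round looks like:

```python
# Each client trains locally on its own data; the server then averages
# the client models weighted by client dataset size (the FedAvg step).
def local_train(w, data, lr=0.1, epochs=20):
    for _ in range(epochs):
        # full-batch gradient of mean squared error for y = w * x
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

clients = [
    [(1.0, 2.0), (2.0, 4.0)],   # exactly consistent with w = 2
    [(1.0, 2.2), (3.0, 5.8)],   # noisy, roughly w = 2
]

global_w = 0.0
for _ in range(5):              # communication rounds
    local = [local_train(global_w, d) for d in clients]
    sizes = [len(d) for d in clients]
    global_w = sum(w * n for w, n in zip(local, sizes)) / sum(sizes)
```

Only model parameters cross the network, never raw client data, which is the privacy argument; the non-IID and communication-cost issues discussed above arise when clients' data distributions diverge far more than in this toy.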
Abstract:
In the last few years, Artificial Intelligence (AI) has achieved a notable momentum that, if harnessed
appropriately, may deliver the best of expectations over many application sectors across the field. For this to
occur shortly in Machine Learning, the entire community stands in front of the barrier of explainability, an
inherent problem of the latest techniques brought by sub-symbolism (e.g. ensembles or Deep Neural
Networks) that were not present in the last hype of AI (namely, expert systems and rule based models).
Paradigms underlying this problem fall within the so-called eXplainable AI (XAI) field, which is widely
acknowledged as a crucial feature for the practical deployment of AI models. The overview presented in this
article examines the existing literature and contributions already done in the field of XAI, including a prospect
toward what is yet to be reached. For this purpose we summarize previous efforts made to define explainability
in Machine Learning, establishing a novel definition of explainable Machine Learning that covers such prior
conceptual propositions with a major focus on the audience for which the explainability is sought. Departing
from this definition, we propose and discuss about a taxonomy of recent contributions related to the
explainability of different Machine Learning models, including those aimed at explaining Deep Learning
methods for which a second dedicated taxonomy is built and examined in detail. This critical literature analysis
serves as the motivating background for a series of challenges faced by XAI, such as the interesting
crossroads of data fusion and explainability. Our prospects lead toward the concept of Responsible Artificial
Intelligence, namely, a methodology for the large-scale implementation of AI methods in real organizations with
fairness, model explainability and accountability at its core. Our ultimate goal is to provide newcomers to the
field of XAI with a thorough taxonomy that can serve as reference material in order to stimulate future research
advances, but also to encourage experts and professionals from other disciplines to embrace the benefits of AI
in their activity sectors, without any prior bias for its lack of interpretability.
Reaction:
This paper offers a thorough exploration of explainable AI, a critical area of study as AI systems become
increasingly integrated into our daily lives and critical decision-making processes. The discussion on
techniques like LIME and SHAP provides valuable insights into how we can make complex AI models more
understandable and transparent. The applications in sensitive fields like healthcare and finance highlight the
real-world importance of XAI, showing how it can enhance trust and accountability in AI systems. The
challenges outlined, particularly the balance between accuracy and interpretability, resonate with ongoing
concerns in the AI community.
Moving forward, it is essential to focus on developing XAI techniques that are not only effective but also
accessible to non-experts who rely on AI-driven decisions. Standardizing evaluation metrics for XAI will help in
objectively assessing the quality of explanations and ensuring consistency across different models and
applications. By addressing these challenges, we can make significant strides in creating AI systems that are
both powerful and transparent, fostering greater trust and wider adoption across various fields.
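Techniques such as LIME and SHAP are perturbation-based: they probe a black-box model with modified inputs and attribute output changes to features. The simplest member of that family, permutation importance, can be sketched as follows (toy model and data, assumed purely for illustration):

```python
import random

random.seed(0)

# A "black-box" model that depends strongly on feature 0 and
# ignores feature 1 entirely.
def model(x):
    return 3.0 * x[0] + 0.0 * x[1]

X = [[random.random(), random.random()] for _ in range(200)]
y = [model(x) for x in X]

def mse(preds):
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

baseline = mse([model(x) for x in X])  # zero here: model fits y exactly

importance = []
for j in range(2):
    shuffled = [row[:] for row in X]
    col = [row[j] for row in shuffled]
    random.shuffle(col)                # scramble feature j only
    for row, v in zip(shuffled, col):
        row[j] = v
    # The error increase when feature j is scrambled is its importance.
    importance.append(mse([model(x) for x in shuffled]) - baseline)
```

Scrambling the ignored feature leaves the error unchanged, so its importance is zero, which is exactly the kind of transparent attribution XAI seeks for models far less inspectable than this one.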
Abstract:
Deep supervised learning has achieved great success in the last decade. However, its defects of heavy
dependence on manual labels and vulnerability to attacks have driven people to find other paradigms. As an
alternative, self-supervised learning (SSL) attracts many researchers for its soaring performance on
representation learning in the last several years. Self-supervised representation learning leverages input data
itself as supervision and benefits almost all types of downstream tasks. In this survey, we take a look into new
self-supervised learning methods for representation in computer vision, natural language processing, and
graph learning. We comprehensively review the existing empirical methods and summarize them into three
main categories according to their objectives: generative, contrastive, and generative-contrastive (adversarial).
We further collect related theoretical analysis on self-supervised learning to provide deeper thoughts on why
self-supervised learning works. Finally, we briefly discuss open problems and future directions for self-
supervised learning. An outline slide for the survey is provided.
Conclusions:
Self-supervised learning is the present and future of deep learning due to its supreme ability to utilize Web-scale unlabeled data to train
feature extractors and context generators efficiently. Despite the diversity of algorithms, we categorize all self-
supervised methods into three classes: generative, contrastive, and generative contrastive according to their
essential training objectives. We introduce typical and representative methods in each category and sub-
categories. Moreover, we discuss the pros and cons of each category and their unique application scenarios.
Finally, fundamental problems and future directions of self-supervised learning are listed.
Reaction:
This paper provides a thorough examination of self-supervised learning, highlighting its transformative potential
in reducing the reliance on labeled data while achieving high performance in both generative and discriminative
tasks. The exploration of various techniques, such as contrastive learning and masked input modeling,
showcases the innovative approaches being developed to harness unlabeled data effectively. The practical
applications discussed, from image generation to natural language processing, demonstrate the broad
applicability of SSL and its ability to handle complex tasks with minimal human intervention. However, the
challenges identified, including the need for large datasets and significant computational resources, are critical
barriers that need to be overcome.
As we look to the future, it is essential to focus on making SSL methods more scalable and efficient,
developing better pretext tasks that can generalize across different domains, and enhancing the interpretability
of the models. Addressing these areas will not only make SSL more accessible but also unlock its full potential,
leading to more robust and versatile AI systems. This paper provides valuable insights and a clear roadmap for
future research, emphasizing the importance of overcoming the current limitations to fully leverage the
capabilities of self-supervised learning.
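Contrastive learning, one of the three categories, trains representations so that two views of the same input score higher similarity than views of different inputs. A minimal InfoNCE-style computation for a single anchor, using hypothetical embeddings:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical embeddings: an anchor, its positive (another "view" of
# the same input), and two negatives from different inputs.
anchor = [1.0, 0.1]
positive = [0.9, 0.2]
negatives = [[-0.8, 0.4], [0.1, -1.0]]

tau = 0.5  # temperature
sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
exps = [math.exp(s / tau) for s in sims]
# InfoNCE: negative log-probability of picking the positive out of
# all candidates; training minimizes this over many anchors.
loss = -math.log(exps[0] / sum(exps))
```

Because the labels here are "free" (which pairs are views of the same input), no human annotation is needed, which is the central appeal of SSL discussed above.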
Abstract:
Transformer, first applied to the field of natural language processing, is a type of deep neural network mainly
based on the self-attention mechanism. Thanks to its strong representation capabilities, researchers are
looking at ways to apply transformer to computer vision tasks. In a variety of visual benchmarks, transformer-
based models perform similar to or better than other types of networks such as convolutional and recurrent
neural networks. Given its high performance and less need for vision-specific inductive bias, transformer is
receiving more and more attention from the computer vision community. In this paper, we review these vision
transformer models by categorizing them in different tasks and analyzing their advantages and disadvantages.
The main categories we explore include the backbone network, high/mid-level vision, low-level vision, and
video processing. We also include efficient transformer methods for pushing transformer into real device-based
applications. Furthermore, we also take a brief look at the self-attention mechanism in computer vision, as it is
the base component in transformer. Toward the end of this paper, we discuss the challenges and provide
several further research directions for vision transformers.
Research Problem:
1. What are the current visual transformer models and their architectures?
2. How do these models perform in various computer vision tasks compared to traditional convolutional
neural networks (CNNs)?
3. What challenges and potential future research directions exist for visual transformer models?
The paper categorizes and discusses various visual transformer models, such as Vision Transformers (ViTs),
Swin Transformers, and DeiT (Data-efficient Image Transformers). These models leverage the self-attention
mechanism to capture long-range dependencies in image data, offering advantages over traditional CNNs in
terms of handling global context and scalability. The survey highlights significant improvements in image
classification, object detection, and segmentation tasks achieved by these models. However, it also identifies
challenges, such as the high computational cost, the need for large-scale training data, and difficulties in model
interpretability. The discussion includes potential solutions like hybrid architectures that combine transformers
with CNNs and the development of more efficient transformer variants.
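The self-attention mechanism at the base of these models can be sketched directly. The toy example below (three two-dimensional "patch" embeddings, with identity projections standing in for the learned query/key/value matrices) computes one attention step:

```python
import math

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Three "patch" embeddings of dimension 2; for brevity the queries,
# keys, and values all equal the input (identity projections).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Q, K, V = X, X, X
d = 2

scores = matmul(Q, [list(c) for c in zip(*K)])   # Q Kᵀ
weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
out = matmul(weights, V)  # each patch: attention-weighted mix of all patches
```

Every output row mixes information from every patch, which is how transformers capture the global context that a small convolution kernel cannot.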
Conclusions:
Transformer is becoming a hot topic in the field of computer vision due to its competitive performance and
tremendous potential compared with CNNs. To discover and utilize the power of transformer, as summarized
in this survey, a number of methods have been proposed in recent years. These methods show excellent
performance on a wide range of visual tasks, including backbone, high/mid-level vision, low-level vision, and
video processing. Nevertheless, the potential of transformer for computer vision has not yet been fully
explored, meaning that several challenges still need to be resolved. In this section, we discuss these
challenges and provide insights on the future prospects.
Reaction:
This paper offers a fascinating and thorough examination of visual transformer models, emphasizing their
revolutionary impact on computer vision tasks. The detailed analysis of different transformer architectures,
such as Vision Transformers and Swin Transformers, demonstrates their potential to surpass traditional CNNs
by capturing global context through self-attention mechanisms. The significant performance improvements in
tasks like image classification and object detection are particularly impressive, showcasing the transformative
capabilities of these models. However, the challenges highlighted, especially the high computational costs and
data requirements, are critical hurdles that need to be addressed for these models to be more widely adopted.
Moving forward, the research community should prioritize the development of more efficient visual transformer
models that can operate effectively with less computational power and training data. Exploring hybrid
architectures that combine the strengths of transformers and CNNs could be a promising direction.
Additionally, enhancing the interpretability of these models will be crucial for their application in sensitive and
critical domains. By focusing on these recommendations, we can unlock the full potential of visual
transformers, making them a powerful tool for advancing the field of computer vision. This paper provides
valuable insights and a clear path for future research, emphasizing the need to overcome current limitations to
fully leverage the capabilities of visual transformers.
Abstract:
estimation and summarize the typically used evaluation methodology, including public noisy datasets and
evaluation metrics. Finally, we present several promising research directions that can serve as a guideline for
future studies.
Research Problem:
1. How do noisy labels degrade the generalization performance of deep neural networks?
2. What categories of deep learning methods exist for learning from noisy labels, and what are their weaknesses?
3. How are these methods evaluated, and what directions remain for future research?
Conclusions:
DNNs easily overfit to false labels owing to their high capacity in totally memorizing all noisy training samples.
This overfitting issue still remains even with various conventional regularization techniques, such as dropout
and batch normalization, thereby significantly decreasing their generalization performance. Even worse, in
real-world applications, the difficulty in labeling renders the overfitting issue more severe. Therefore, learning
from noisy labels has recently become one of the most active research topics. In this survey, we presented a
comprehensive
understanding of modern deep learning methods to address the negative consequences of learning from noisy
labels. All the methods were grouped into five categories according to their underlying strategies and described
along with their methodological weaknesses. Furthermore, a systematic comparison was conducted using six
popular properties used for evaluation in the recent literature. According to the comparison results, there is no
ideal method that supports all the required properties; the supported properties varied depending on the
category to which each method belonged. Several experimental guidelines were also discussed, including
noise rate estimation, publicly available datasets, and evaluation metrics. Finally, we provided insights and
directions for future research in this domain.
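One strategy family such surveys cover, sample selection, exploits the memorization effect described above: because networks fit clean examples before noisy ones, low-loss samples early in training are likely clean. A minimal sketch of this small-loss selection, with hypothetical per-sample losses:

```python
# Small-loss selection: keep the fraction of samples with the lowest
# current loss and treat the rest as likely mislabeled.
# (Hypothetical per-sample losses; in practice these come from the DNN,
# and the noise rate is estimated rather than assumed.)
losses = [0.05, 2.31, 0.10, 0.08, 1.97, 0.12, 0.07, 2.45]
noise_rate = 0.375  # assumed fraction of noisy labels

keep = int(round(len(losses) * (1 - noise_rate)))
ranked = sorted(range(len(losses)), key=lambda i: losses[i])
clean_idx = sorted(ranked[:keep])  # indices used for the next update
```

The quality of this filter depends directly on the noise-rate estimate, which is why the survey treats noise rate estimation as its own experimental guideline.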
Reaction:
This paper provides a comprehensive and insightful overview of graph neural networks, shedding light on their
significant impact on analyzing and processing graph-structured data. The detailed categorization of different
GNN architectures, such as Graph Convolutional Networks and Graph Attention Networks, offers a clear
understanding of how these models operate and their respective strengths. The diverse applications, from
social network analysis to molecular biology, underscore the versatility and broad applicability of GNNs. The
challenges identified, particularly in terms of scalability and model interpretability, are critical areas that need
innovative solutions to fully realize the potential of GNNs.
Looking ahead, it is crucial for researchers to focus on developing more scalable GNN models that can
efficiently handle large graphs without compromising performance. Addressing the over-smoothing issue and
enhancing the interpretability of these models will also be vital for their application in more complex and
sensitive domains. By prioritizing these areas, we can unlock the full potential of GNNs, making them
indispensable tools in various fields. This paper not only highlights the current state of GNN research but also
provides a valuable roadmap for future advancements, emphasizing the need to overcome existing challenges
to fully harness the capabilities of graph neural networks.
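The propagation rule behind the graph convolutional networks mentioned above can be sketched simply: each node replaces its feature with a degree-normalized average over itself and its neighbors (a simplification of the symmetric normalization real GCNs use, with the learned weights and nonlinearity omitted). A toy path graph with scalar features:

```python
# One graph-convolution propagation step on a 4-node path graph.
edges = [(0, 1), (1, 2), (2, 3)]
feats = [1.0, 0.0, 0.0, 1.0]

neighbors = {i: {i} for i in range(len(feats))}  # self-loops
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)

# Each node averages the features of itself and its neighbors.
new_feats = [sum(feats[j] for j in neighbors[i]) / len(neighbors[i])
             for i in range(len(feats))]
```

Stacking many such layers makes every node's feature an average of ever-larger neighborhoods, which is precisely the over-smoothing issue the reaction flags as an open problem.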
Abstract:
Adversarial attacks and defenses in machine learning and deep neural network have been gaining significant
attention due to the rapidly growing applications of deep learning in the Internet and relevant scenarios. This
survey provides a comprehensive overview of the recent advancements in the field of adversarial attack and
defense techniques, with a focus on deep neural network-based classification models. Specifically, we conduct
a comprehensive classification of recent adversarial attack methods and state-of-the-art adversarial defense
techniques based on attack principles, and present them in visually appealing tables and tree diagrams. This is
based on a rigorous evaluation of the existing works, including an analysis of their strengths and limitations.
We also categorize the methods into counterattack detection and robustness enhancement, with a specific
focus on regularization-based methods for enhancing robustness. New avenues of attack are also explored,
including search-based, decision-based, drop-based, and physical-world attacks, and a hierarchical
classification of the latest defense methods is provided, highlighting the challenges of balancing training costs
with performance, maintaining clean accuracy, overcoming the effect of gradient masking, and ensuring
method transferability. At last, the lessons learned and open challenges are summarized with future research
opportunities recommended.
Research Problem:
The research problem addressed in this paper is the vulnerability of machine learning models to adversarial
attacks, which can significantly compromise their reliability and performance. The paper aims to review and
evaluate various defense strategies proposed to mitigate these vulnerabilities and enhance the robustness of
machine learning systems.
Conclusions:
optimization. Moreover, the transferability of adversarial attacks has been thoroughly investigated, providing
deeper insights into the workings of deep learning models. It is expected that this survey serves as a
foundation for future research in this rapidly evolving field, and provides useful information for researchers and
security practitioners.
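The canonical attack in this literature, the fast gradient sign method, perturbs each input coordinate by a small step in the direction that increases the loss. A sketch on a hypothetical linear classifier, where the gradient is available in closed form rather than via backpropagation:

```python
# FGSM sketch on a linear classifier score(x) = w . x, where score > 0
# means class 1. The adversarial input is x + eps * sign(d loss / d x).
def sign(v):
    return (v > 0) - (v < 0)

w = [2.0, -1.0, 0.5]
x = [1.0, 1.0, 1.0]

score = sum(wi * xi for wi, xi in zip(w, x))  # positive: classified as 1
# For true label 1, a loss that penalizes high scores has gradient -w
# with respect to x, so the attack steps along -sign(w).
eps = 0.6
x_adv = [xi + eps * sign(-wi) for xi, wi in zip(x, w)]
adv_score = sum(wi * xi for wi, xi in zip(w, x_adv))  # flipped negative
```

A coordinate-wise change of only 0.6 flips the decision, illustrating why the defenses reviewed here (adversarial training, detection, regularization) are needed at all.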
Reaction:
The review paper "Adversarial Machine Learning: A Comprehensive Review of Defenses Against Adversarial
Attacks" provides an insightful examination of the current landscape of defenses against adversarial attacks in
machine learning. The authors have thoroughly explored a range of defensive strategies, highlighting the
ongoing struggle to balance effectiveness with computational efficiency. The discussion on adversarial training
and detection methods is particularly notable, shedding light on the incremental progress achieved and the
persistent challenges that researchers face. The paper’s comprehensive nature offers a valuable resource for
both academics and practitioners striving to understand and improve the robustness of machine learning
models.
While the review is comprehensive, it could benefit from a deeper exploration of emerging threats and novel
defense mechanisms beyond the traditional approaches discussed. For future research, it would be
advantageous to include a section dedicated to recent advancements in adversarial machine learning, such as
the integration of novel machine learning paradigms and the impact of these new approaches on existing
defense strategies. Additionally, fostering interdisciplinary collaborations could enhance the development of
more innovative and practical defenses against adversarial attacks.
Abstract:
ChatGPT has the ability to generate grammatically flawless and seemingly-human replies to different types of
questions from various domains. The number of its users and of its applications is growing at an
unprecedented rate. Unfortunately, use and abuse come hand in hand. In this paper, we study whether a
machine learning model can be effectively trained to accurately distinguish between original human and
seemingly human (that is, ChatGPT-generated) text, especially when this text is short. Furthermore, we
employ an explainable artificial intelligence framework to gain insight into the reasoning behind the model
trained to differentiate between ChatGPT-generated and human-generated text. The goal is to analyze the model’s
decisions and determine if any specific patterns or characteristics can be identified. Our study focuses on short
online reviews, conducting two experiments comparing human-generated and ChatGPT-generated text. The
first experiment involves ChatGPT text generated from custom queries, while the second experiment involves
text generated by rephrasing original human-generated reviews. We fine-tune a Transformer-based model and
use it to make predictions, which are then explained using SHAP. We compare our model with a perplexity
score-based approach and find that disambiguation between human and ChatGPT-generated reviews is more
challenging for the ML model when using rephrased text. However, our proposed approach still achieves an
accuracy of 79%. Using explainability, we observe that ChatGPT’s writing is polite, without specific details,
using fancy and atypical vocabulary, impersonal, and typically it does not express feelings.
Research Problem:
The primary research problem addressed in this paper is the development of effective machine learning
models that can accurately distinguish between text generated by ChatGPT and text written by humans.
Additionally, the paper explores methods for explaining the decisions made by these models to enhance
transparency.
The study also underscores the complexities involved in detecting AI-generated text, especially as models like
ChatGPT continue to evolve and produce increasingly sophisticated outputs. The identified linguistic features
that differentiate human and AI text may become less distinct over time, necessitating continuous updates to
the detection models. This ongoing challenge highlights the dynamic nature of AI research and the need for
adaptive and robust methodologies.
Abstract:
Large language models, which are often trained for hundreds of thousands of compute days, have shown
remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are
difficult to replicate without significant capital. For the few that are available through APIs, no access is granted
to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a
suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully
and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while
requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the
infrastructure challenges we faced, along with code for experimenting with all of the released models.
Research Problem:
The problem addressed in this paper is the need for a high-performance, open-source transformer language
model that can democratize access to advanced AI technology. The paper aims to bridge the gap between
proprietary models like GPT-3 and the broader research community by providing an alternative that is both
powerful and freely available.
Conclusions:
In this technical report, we introduced OPT, a collection of auto-regressive language models ranging in size
from 125M to 175B parameters. Our goal was to replicate the performance and sizes of the GPT-3 class of
models, while also applying the latest best practices in data curation and training efficiency. We described
training details, evaluated performance in a number of NLP and dialogue settings, and characterized behaviors
with respect to bias, toxicity and hate speech. We also described many other limitations the models have, and
discussed a wide set of considerations for responsibly releasing the models. We believe the entire AI
community would benefit from working together to develop guidelines for responsible LLMs, and we hope that
broad access to these types of models will increase the diversity of voices defining the ethical considerations of
such technologies.
Reaction:
The development and release of the OPT model by Meta AI mark a crucial advancement in the field of natural
language processing. The model's open-source nature addresses a significant barrier in AI research, where
access to powerful models is often restricted by proprietary limitations. By offering a competitive alternative to
GPT-3, OPT not only democratizes access but also encourages a spirit of collaboration and innovation. This
openness can lead to diverse applications and improvements, as researchers worldwide can contribute to and
benefit from the advancements in transformer-based language models. The introduction of OPT also brings to
light the challenges associated with managing and curating open-source AI technologies. Ensuring that these
powerful tools are used ethically and responsibly is paramount. The potential for misuse of such models
necessitates stringent guidelines and proactive measures to prevent harm. Furthermore, the success of OPT in
the broader AI community will depend on continuous updates and active maintenance, requiring a committed
effort from both Meta AI and the larger research community.
Abstract:
Modeling relational data is challenging as it requires modeling both a “parent” table and its relationships across tables.
We introduce REaLTabFormer (Realistic Relational and Tabular Transformer), a tabular and relational
synthetic data generation model. It first creates a parent table using an autoregressive GPT-2 model, then
generates the relational dataset conditioned on the parent table using a sequence-to-sequence (Seq2Seq)
model. We implement target masking to prevent data copying and propose the Qδ statistic and statistical
bootstrapping to detect overfitting. Experiments using real-world datasets show that REaLTabFormer captures
the relational structure better than a baseline model. REaLTabFormer also achieves state-of-the-art results on
prediction tasks, “out-of-the-box”, for large non-relational datasets without needing fine-tuning.
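The abstract's overfitting check can be illustrated with a generic bootstrap sketch. This is not the paper's Qδ statistic or its exact procedure; it is a hypothetical 1-D toy with illustrative names (`copying_score`, `nearest_distance`) showing the underlying idea: if synthetic samples sit systematically closer to the training records than a held-out split does, the generator is likely copying data.

```python
import random

def nearest_distance(x, reference):
    """Distance from x to its closest record in the reference set (1-D toy)."""
    return min(abs(x - r) for r in reference)

def copying_score(samples, train, n_boot=100, seed=0):
    """Fraction of bootstrap splits in which the synthetic samples sit
    closer to the training data than a held-out half does; values near
    1.0 suggest the generator is copying training records."""
    rng = random.Random(seed)
    half = len(train) // 2
    closer = 0
    for _ in range(n_boot):
        shuffled = train[:]
        rng.shuffle(shuffled)
        ref, held = shuffled[:half], shuffled[half:]
        d_syn = sum(nearest_distance(s, ref) for s in samples) / len(samples)
        d_held = sum(nearest_distance(h, ref) for h in held) / len(held)
        closer += d_syn < d_held
    return closer / n_boot

data_rng = random.Random(1)
train = [data_rng.gauss(0, 1) for _ in range(60)]
copied = train[:15]                                # memorized training records
fresh = [data_rng.gauss(0, 1) for _ in range(15)]  # genuinely new samples
print(copying_score(copied, train) > copying_score(fresh, train))  # → True
```

Memorized samples find themselves (distance zero) in roughly half the splits, so their score approaches 1.0, while genuinely new samples score near 0.5.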
Research Problem:
The problem addressed is the lack of effective models for generating realistic relational and tabular synthetic
data. Existing methods often fall short in preserving the complex relationships and structures inherent in such
data, limiting their utility in practical applications.
The study presents REaLTabFormer as a solution to this problem, demonstrating its ability to generate
synthetic data that closely mirrors real-world datasets. The model was evaluated on several benchmark
datasets, showing significant improvements in data quality and usability compared to traditional methods. The
discussion highlights how REaLTabFormer can facilitate various applications, such as privacy-preserving data
sharing and robust machine learning model training, by providing high-quality synthetic datasets.
Conclusions:
We presented REaLTabFormer, a framework capable of generating high-quality non-relational tabular data
and relational datasets. This work extends the application of sequence-to-sequence models to modeling and
generating relational datasets. We introduced target masking as a component in the model to mitigate data
copying and to safeguard against potentially sensitive data leaking from the training data. We proposed a
statistical method and the Qδ statistic for detecting overfitting in model training. This statistical method may be
adapted to other generative model training. We showed that our proposed model generates realistic synthetic
tabular data that can be a proxy for real-world data in machine learning tasks. REaLTabFormer’s ability to
model relational datasets accurately compared with existing open-sourced alternatives contributes to solving
existing gaps in generative models for realistic relational datasets. Finally, this work can be extended and
applied to data imputation, cross-survey imputation, and upsampling for machine learning with imbalanced
data. A BERT-like encoder can be used instead of GPT-2 with the REaLTabFormer for modeling relational
datasets. We also see opportunities to improve privacy protection strategies and the development of more
components like target masking embedded into synthetic data generation models to prevent sensitive data
exposure.
Reaction:
The introduction of REaLTabFormer is a notable development in the realm of synthetic data generation. By
addressing the shortcomings of existing methods, this model offers a powerful tool for researchers and
practitioners who require high-quality synthetic data for training and testing machine learning models. The
ability of REaLTabFormer to generate realistic relational and tabular data can significantly alleviate data
scarcity issues, promoting more robust and comprehensive research. Moreover, its application in privacy-
preserving data sharing is particularly commendable, as it offers a viable solution to the challenges of data
privacy and security.
However, the implementation of REaLTabFormer also presents certain challenges. Ensuring the ethical use of
synthetic data and maintaining the fidelity of generated data to real-world scenarios are critical considerations.
As synthetic data becomes more prevalent, it is essential to establish clear guidelines and standards to prevent
misuse and ensure that the synthetic data is as representative and unbiased as possible. Additionally,
continuous improvement and validation of the model against diverse datasets will be crucial in maintaining its
relevance and effectiveness in different applications.
Research Problem:
The primary research problem addressed in this paper is the challenge of generating high-quality videos from
text descriptions using limited training data. Traditional methods for text-to-video generation often require
extensive datasets and computational resources, which Tune-A-Video aims to mitigate by using a one-shot
tuning approach.
The study demonstrates that Tune-A-Video can generate coherent and high-quality videos from text
descriptions using significantly less training data compared to conventional methods. The results show that the
model can handle various video generation tasks, including object movement, scene transitions, and complex
interactions. The discussion emphasizes the efficiency of the one-shot tuning approach and its potential to
make text-to-video generation more accessible and practical for broader applications.
Conclusions:
In this paper, we introduce a new task for T2V generation called One-Shot Video Tuning. This task involves
training a T2V generator using only a single text-video pair and pretrained T2I models. We present Tune-A-
Video, a simple yet effective framework for text-driven video generation and editing. To generate continuous
videos, we propose an efficient tuning strategy and structural inversion that enable generating temporally-
coherent videos. Extensive experiments demonstrate the remarkable results of our method spanning a wide
range of applications.
Reaction:
The introduction of Tune-A-Video is a remarkable step forward in the realm of text-to-video generation. This
method addresses a critical bottleneck in the field: the need for extensive datasets and computational power.
By employing a one-shot tuning approach, Tune-A-Video demonstrates that it is possible to achieve high-
quality video outputs with significantly less data. This innovation could democratize video generation
technologies, making them accessible to a wider range of users and applications. The efficiency and versatility
of Tune-A-Video highlight the potential for creative and practical uses, from educational content creation to
entertainment and beyond.
Research Problem:
The problem addressed is the inefficiency of sharing and reusing machine learning experiments and components. Traditional
methods often involve manual and time-consuming processes, which can hinder collaboration and slow down
the pace of innovation. PyGlove aims to streamline these processes, enabling more effective and seamless
exchange of ideas and code among researchers.
Symbolic patching is a powerful tool for sharing machine learning ideas when all experiments use the same
symbolic representations, which is typically the case within a shared codebase. In this paper, we have focused
on this single-codebase scenario, which can be found within industry, for example. Yet, this scenario does not
apply to most of academia or to the ML community as a whole, where different teams tend to build their own
codebases fully independently. Scaling to the multi-codebase scenario poses some problems. In particular,
participant teams would need to agree on a high-level interface. This requires community effort and usage of
shared best practices in software design, and is beyond the scope of this paper. Nevertheless, we speculate
that PyGlove offers new tools to help attain a shared high-level ML interface across codebases: PyGlove works
through code annotation rather than direct editing, which naturally permits building up the interfaces
incrementally. It is possible to annotate a codebase to only accommodate a particular rule patch of interest
(e.g. one just published by another team). PyGlove removes the need for the common paradigm of
“configuration objects” (see Section 4). Configuration objects have led to debate as to their format and
conventions, preventing codebase convergence. Symbolic ML objects, on the other hand, are editable directly,
eliminating the need for separate configuration. The symbolic approach allows for compositionality, which in
turn permits multi-level interfaces. For example, it is easy for the interface to simultaneously expose an image
classifier as a whole, each layer within it, and the inputs and outputs within each layer. The publication of
useful patches can provide a strong incentive for codebase annotation, which may in turn encourage more
patches, leading to positive reinforcement. We therefore hope that, in addition to sharing code within a
codebase, PyGlove may also have a positive impact on collaboration across ML codebases in the future.
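The idea of encapsulating the step from setup A to setup B as a reusable patching rule can be sketched in plain Python. This is not PyGlove's actual API; it is a hypothetical dict-based stand-in (`patch`, `use_gelu` are illustrative names) showing how one rule, written once, applies across differently shaped experiment configurations.

```python
def patch(config: dict, rule) -> dict:
    """Recursively apply a patching rule to every (key, value) pair,
    returning a new config; a toy stand-in for rule-based symbolic patching."""
    patched = {}
    for key, value in config.items():
        if isinstance(value, dict):
            value = patch(value, rule)
        patched[key] = rule(key, value)
    return patched

# A reusable rule: swap every activation setting to 'gelu'.
use_gelu = lambda key, value: "gelu" if key == "activation" else value

# Two independently structured setups; the same rule patches both.
setup_a = {"model": {"layers": 12, "activation": "relu"}, "lr": 1e-3}
setup_b = {"encoder": {"activation": "tanh"}, "decoder": {"activation": "relu"}}

print(patch(setup_a, use_gelu)["model"]["activation"])  # → gelu
print(patch(setup_b, use_gelu)["decoder"]["activation"])  # → gelu
```

The point mirrored here is the network effect: a published rule needs no knowledge of each codebase's configuration layout, only of the annotated names it targets.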
Conclusions:
Machine learning is hindered by the inability to apply conceptual rules to different ML setups in a scalable
manner. Even if the code and experiment definition of a paper are open-sourced, obtaining an ML setup from
the paper and referential codebase is neither straightforward nor reusable, as it needs to be manually parsed
and replicated on other experiments. In this paper, we have expanded PyGlove [15]’s symbolic capabilities
from AutoML to ML, to address this problem. Our proposal encapsulates the process of evolving from ML
setup A to setup B into a list of patching rules. These rules can be reused across ML setups, creating network
effects among the experiments and rules. In addition to patching, we have also demonstrated how symbolic
programming can serve the entire lifecycle of ML effectively. We have also compared PyGlove’s solution with
existing solutions. Through real-world research projects [7; 25] that heavily rely on PyGlove’s patching
capability, we have seen the potential of how PyGlove can change the way ML programs are developed,
organized, and shared. We have open-sourced PyGlove and look forward to it being extensively tested by the
machine learning community.
Reaction:
The introduction of PyGlove is a significant advancement for the machine learning community, addressing a
critical need for more efficient collaboration and idea exchange. The ability to share and reuse code
seamlessly can greatly enhance productivity and accelerate the pace of research. PyGlove's modular and
flexible framework empowers researchers to build upon existing work, fostering a more collaborative and
innovative environment. This tool can potentially transform how machine learning research is conducted,
enabling faster development of new models and solutions by reducing the duplication of effort and streamlining
workflows.
While PyGlove presents numerous benefits, its adoption also requires the community to embrace a more open
and collaborative mindset. Ensuring the compatibility and integration of different components within PyGlove's
framework is crucial to its success. Additionally, providing thorough documentation and support will be
essential to help researchers transition to using this new tool effectively. As with any collaborative platform, the
quality and maintenance of shared code will be vital to sustaining its utility and effectiveness.
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection
Biyang Guo, Xin Zhang, Ziyuan Wang, Minqi Jiang, Jinran Nie, Yuxuan Ding, Jianwei Yue, Yupeng Wu
Abstract:
The introduction of ChatGPT has garnered widespread attention in both academic and industrial communities.
ChatGPT is able to respond effectively to a wide range of human questions, providing fluent and
comprehensive answers that significantly surpass previous public chatbots in terms of security and usefulness.
On one hand, people are curious about how ChatGPT is able to achieve such strength and how far it is from
human experts. On the other hand, people are starting to worry about the potential negative impacts that large
language models (LLMs) like ChatGPT could have on society, such as fake news, plagiarism, and social
security issues. In this work, we collected tens of thousands of comparison responses from both human
experts and ChatGPT, with questions ranging from open-domain, financial, medical, legal, and psychological
areas. We call the collected dataset the Human ChatGPT Comparison Corpus (HC3). Based on the HC3
dataset, we study the characteristics of ChatGPT's responses, the differences and gaps from human experts,
and future directions for LLMs. We conducted comprehensive human evaluations and linguistic analyses of
ChatGPT-generated content compared with that of humans, where many interesting results are revealed. After
that, we conduct extensive experiments on how to effectively detect whether a certain text is generated by
ChatGPT or humans. We build three different detection systems, explore several key factors that influence
their effectiveness, and evaluate them in different scenarios.
Research Problem:
The research problem addressed in this paper is determining how closely ChatGPT's performance aligns with that of human experts in various
domains. The study aims to evaluate the effectiveness of ChatGPT using a systematic comparison and to
develop reliable methods for detecting AI-generated text.
The study's results indicate that ChatGPT can perform comparably to human experts in several tasks,
particularly those involving information retrieval, summarization, and conversational interactions. However, it
also identifies areas where ChatGPT falls short, such as nuanced reasoning, context understanding, and
creativity. The discussion emphasizes the importance of these findings for understanding the current
capabilities and limitations of AI, as well as for improving AI models. Additionally, the paper discusses the
effectiveness of different detection methods, noting that while some are quite successful, others require further
refinement.
Conclusions:
In this work, we propose the HC3 (Human ChatGPT Comparison Corpus) dataset, which consists of nearly
40K questions and their corresponding human/ChatGPT answers. Based on the HC3 dataset, we conduct
extensive studies including human evaluations, linguistic analysis, and content detection experiments. The
human evaluations and linguistic analysis provide us with insights into the implicit differences between humans
and ChatGPT, which motivate our thoughts on LLMs’ future directions. The ChatGPT content detection
experiments illustrate some important conclusions that can provide beneficial guides to the research and
development of AIGC-detection tools.
Reaction:
The paper "How Close is ChatGPT to Human Experts?" provides a thorough and insightful analysis of
ChatGPT's performance relative to human experts. It is impressive to see how far AI has come, particularly in
tasks such as information retrieval and summarization. The development of a comparison corpus and specific
evaluation metrics is a significant contribution, offering a structured way to measure and understand AI
capabilities. However, the findings also highlight the inherent limitations of current AI models, particularly in
areas requiring deep contextual understanding and creativity. These limitations remind us that while AI can
assist in many tasks, it is not yet a replacement for human expertise in complex and nuanced domains.
On the other hand, the study's exploration of detection methods for AI-generated content is equally crucial. As
AI-generated text becomes more prevalent, the ability to distinguish it from human-generated text is essential
for maintaining trust in various fields, including academia, journalism, and customer service. The progress in
detection methods is promising, but the study also underscores the need for continuous refinement. This is
particularly important as AI models become more sophisticated and their outputs harder to distinguish from
those of humans.
Abstract:
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder
Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-
train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context
in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to
create state-of-the-art models for a wide range of tasks, such as question answering and language inference,
without substantial task-specific architecture modifications.
BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural
language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement),
MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2
(1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
Research Problem:
The primary research problem addressed in this paper is the need for a robust pre-training model that can
better understand the context of language by considering both directions (left and right) simultaneously.
Traditional models often process text in a unidirectional manner, limiting their ability to fully grasp the
contextual nuances of language.
The results of the study show that BERT significantly outperforms previous models on several NLP
benchmarks, including the General Language Understanding Evaluation (GLUE) benchmark, the Stanford
Question Answering Dataset (SQuAD), and others. The discussion highlights that BERT's bidirectional training
method enables a deeper understanding of language context, leading to improved performance on tasks such
as question answering, named entity recognition, and sentiment analysis. The authors also discuss the
importance of fine-tuning pre-trained models on specific tasks to achieve optimal results.
Conclusions:
Recent empirical improvements due to transfer learning with language models have demonstrated that rich,
unsupervised pre-training is an integral part of many language understanding systems. In particular, these
results enable even low-resource tasks to benefit from deep unidirectional architectures. Our major contribution
is further generalizing these findings to deep bidirectional architectures, allowing the same pre-trained model to
successfully tackle a broad set of NLP tasks.
Reaction:
The introduction of BERT marks a transformative moment in the field of natural language processing. By
adopting a bidirectional training approach, BERT addresses a fundamental limitation in previous models that
processed text unidirectionally. This innovation allows for a more comprehensive understanding of context,
significantly enhancing performance across a variety of NLP tasks. The paper’s findings that BERT achieves
state-of-the-art results on multiple benchmarks underscore the model's robustness and versatility. This
advancement not only sets a new standard for NLP models but also opens up new possibilities for applications
that require nuanced language understanding.
Abstract:
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks
in an encoder-decoder configuration. The best performing models also connect the encoder and decoder
through an attention mechanism. We propose a new simple network architecture, the Transformer, based
solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two
machine translation tasks show these models to be superior in quality while being more parallelizable and
requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German
translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT
2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score
of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from
the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to
English constituency parsing both with large and limited training data.
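The Transformer's core operation is scaled dot-product attention, softmax(QKᵀ/√d_k)V. A minimal pure-Python sketch (real implementations use batched matrix libraries and multiple heads):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q·K^T / sqrt(d_k)) · V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# One query attending over two key/value pairs; the query matches the
# first key more strongly, so the output is pulled toward values[0].
out = attention(queries=[[1.0, 0.0]],
                keys=[[1.0, 0.0], [0.0, 1.0]],
                values=[[10.0, 0.0], [0.0, 10.0]])
print(out[0][0] > out[0][1])  # → True
```

Because every query attends to every key independently, the whole computation parallelizes across positions, which is the source of the training-speed advantage the abstract reports.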
Research Problem:
The research problem addressed is the reliance of dominant sequence transduction models on recurrent or
convolutional networks, whose inherently sequential computation limits parallelization and slows training. The
authors demonstrate that the Transformer, built solely on attention, achieves state-of-the-art performance on
machine translation benchmarks while being highly parallelizable, leading to faster training times than previous
models.
Conclusions:
In this work, we presented the Transformer, the first sequence transduction model based entirely on attention,
replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-
attention. For translation tasks, the Transformer can be trained significantly faster than architectures based on
recurrent or convolutional layers. On both WMT 2014 English-to-German and WMT 2014 English-to-French
translation tasks, we achieve a new state of the art. In the former task our best model outperforms even all
previously reported ensembles. We are excited about the future of attention-based models and plan to apply
them to other tasks. We plan to extend the Transformer to problems involving input and output modalities other
than text and to investigate local, restricted attention mechanisms to efficiently handle large inputs and outputs
such as images, audio and video. Making generation less sequential is another research goal of ours.
Reaction:
'Attention Is All You Need' presents a compelling argument for the transformative power of self-attention
mechanisms in deep learning. By removing the sequential bottleneck inherent in traditional approaches, the
Transformer model not only achieves superior performance but also introduces a paradigm shift in how we
conceptualize and implement sequence-to-sequence tasks. The emphasis on attention mechanisms not only
enhances the model's ability to capture long-range dependencies but also significantly boosts its efficiency,
marking a pivotal advancement in the field of natural language processing. As a result, researchers and
practitioners alike are now equipped with a more robust toolset for tackling complex tasks that demand
nuanced understanding and synthesis of textual data.
Abstract:
We propose a new framework for estimating generative models via an adversarial process, in which we
simultaneously train two models: a generative model G that captures the data distribution, and a discriminative
model D that estimates the probability that a sample came from the training data rather than G. The training
procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a
minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G
recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined
by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any
Markov chains or unrolled approximate inference networks during either training or generation of samples.
Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the
generated samples.
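The claim that the unique solution has D equal to 1/2 everywhere follows from the known form of the optimal discriminator for a fixed generator, D*(x) = p_data(x) / (p_data(x) + p_g(x)). A worked 1-D check (the Gaussian densities here are an illustrative choice, not from the paper):

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def optimal_discriminator(x, p_data, p_g):
    """For a fixed generator, the best discriminator is p_data / (p_data + p_g)."""
    d, g = p_data(x), p_g(x)
    return d / (d + g)

p_data = lambda x: gauss_pdf(x, 0.0, 1.0)
p_bad_g = lambda x: gauss_pdf(x, 2.0, 1.0)  # generator is off-target

# While G mismatches the data, D can do better than chance near the data mode...
print(optimal_discriminator(0.0, p_data, p_bad_g) > 0.5)  # → True
# ...but once G recovers the data distribution, D collapses to 1/2 everywhere.
print(optimal_discriminator(0.0, p_data, p_data))  # → 0.5
```

Training alternates gradient steps on D and G toward this equilibrium; the sketch only evaluates the fixed-point condition itself.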
Research Problem:
The main issue addressed is the generation of realistic data samples from a latent space distribution without
explicit probabilistic modeling. Traditional generative models struggled with producing high-fidelity outputs, and
GANs aimed to overcome these limitations through adversarial training.
The authors show through experiments that GANs can generate synthetic data that closely resemble real samples
in various domains, particularly images. They discuss the challenges of training stability and mode
collapse while highlighting the potential of GANs to revolutionize generative modeling by leveraging adversarial
learning dynamics.
Conclusions:
This framework admits many straightforward extensions:
1. A conditional generative model p(x | c) can be obtained by adding c as input to both G and D.
2. Learned approximate inference can be performed by training an auxiliary network to predict z given x. This
is similar to the inference net trained by the wake-sleep algorithm but with the advantage that the inference net
may be trained for a fixed generator net after the generator net has finished training.
3. One can approximately model all conditionals p(x_S | x_{∖S}), where S is a subset of the indices of x, by training
a family of conditional models that share parameters. Essentially, one can use adversarial nets to implement a
stochastic extension of the deterministic MP-DBM.
4. Semi-supervised learning: features from the discriminator or inference net could improve performance of
classifiers when limited labeled data is available.
5. Efficiency improvements: training could be accelerated greatly by devising better methods for coordinating G
and D or determining better distributions to sample z from during training. This paper has demonstrated the
viability of the adversarial modeling framework, suggesting that these research directions could prove useful.
Reaction:
'Generative Adversarial Nets' introduces a captivating concept where two neural networks engage in a
competitive dance to improve the generation of synthetic data. This adversarial framework not only challenges
traditional generative models but also promises to redefine how we perceive and create artificial data
representations. By pitting a generator against a discriminator in a continuous feedback loop, GANs excel in
capturing intricate patterns and nuances present in real-world datasets, leading to outputs that mimic the
complexity of natural data sources.
Moving forward, the application of GANs could be expanded beyond image and text generation into domains
such as healthcare, finance, and environmental sciences, where realistic synthetic data can drive simulations
and predictive analytics. Further research should focus on enhancing GANs' stability during training,
addressing issues like mode collapse, and exploring techniques for improving diversity and control over
generated outputs. As GANs continue to evolve, their potential to innovate across various fields remains
profound, offering new avenues for creativity and problem-solving in the realm of artificial intelligence.
Research Problem:
The problem addressed is the inefficiency of traditional machine translation models in handling long sentences
and capturing dependencies between words in different languages. The paper proposes an attention-based
mechanism to address these challenges and enhance translation accuracy.
Conclusions:
The conventional approach to neural machine translation, called an encoder–decoder approach, encodes a
whole input sentence into a fixed-length vector from which a translation will be decoded. We conjectured that
the use of a fixed-length context vector is problematic for translating long sentences, based on a recent
empirical study reported by Cho et al. (2014b) and Pouget-Abadie et al. (2014). In this paper, we proposed a
novel architecture that addresses this issue. We extended the basic encoder–decoder by letting a model
(soft-)search for a set of input words, or their annotations computed by an encoder, when generating each
target word. This frees the model from having to encode a whole source sentence into a fixed-length vector,
and also lets the model focus only on information relevant to the generation of the next target word. This has a
major positive impact on the ability of the neural machine translation system to yield good results on longer
sentences. Unlike with the traditional machine translation systems, all of the pieces of the translation system,
including the alignment mechanism, are jointly trained towards a better log-probability of producing correct
translations.
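The (soft-)search described above can be sketched numerically. The snippet below scores each encoder annotation against the current decoder state with a dot product (the paper actually uses a small feed-forward alignment network) and normalizes with a softmax; the vectors are toy values chosen for illustration.

```python
import math

def align(decoder_state, annotations):
    """Soft-search: score each source annotation against the current
    decoder state (a dot product here; the paper uses a small
    feed-forward network), then softmax into alignment weights."""
    scores = [sum(s * h for s, h in zip(decoder_state, ann)) for ann in annotations]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(sc - m) for sc in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three source-word annotations (toy 2-D vectors).
annotations = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# Different decoder states concentrate their weight on different source
# positions, so no single fixed-length vector must carry the whole sentence.
w_first = align([2.0, 0.0], annotations)
w_second = align([0.0, 2.0], annotations)
print(max(range(3), key=lambda i: w_first[i]))   # → 0
print(max(range(3), key=lambda i: w_second[i]))  # → 1
```

The context vector fed to the decoder at each step is the weight-averaged annotation, so attention shifts along the source sentence as target words are emitted.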
Reaction:
'Neural Machine Translation by Jointly Learning to Align and Translate' introduces a breakthrough by allowing
translation models to dynamically focus on relevant parts of the input sentence, enhancing both accuracy and
coherence in translations. This attention mechanism not only mirrors human cognitive processes but also
reflects a deeper understanding of how languages interplay during translation tasks. By enabling the model to
'learn' where to look in the source text, the research not only improves the technical aspects of machine
translation but also underscores the potential for AI to simulate and augment human language capabilities. As
this research continues to evolve, it's crucial to explore ways to generalize attention mechanisms beyond
translation tasks, potentially applying them to broader NLP applications like summarization, question
answering, and sentiment analysis. Moreover, refining attention mechanisms to handle noisy or ambiguous
language constructs could further enhance the robustness and reliability of AI-driven language processing
systems. Emphasizing interpretability and user-centric design in future developments will also be key to
fostering trust and adoption of advanced AI technologies in real-world applications.