ITElec2 Act (Finals)
IT ELEC2
Activity (Finals)
Literature Review # 1:
Research Problem:
The research problem identified in the document "Application of Artificial Intelligence in the Organization of
Knowledge in Libraries" is centered on the need for modern libraries to effectively organize and manage vast
amounts of information using artificial intelligence (AI). The traditional methods of information organization and
retrieval are becoming inadequate due to the exponential growth of information and the increasing demand for
efficient and accurate information retrieval systems. The paper addresses the challenges libraries face in
organizing their resources and highlights the potential benefits of incorporating AI to improve cataloguing,
classification, indexing, and overall information retrieval processes.
Conclusions:
In conclusion, the application of Artificial Intelligence in libraries in Nigeria has been seen as a new driving force
for intelligent library management. Librarians have begun to adopt AI technology in some specific areas
of their respective libraries to meet current global trends. The novel trends in the application of Artificial
Intelligence in library operations are the following: expert systems in cataloguing, classification, indexing, and
information retrieval, as well as AI-based natural language processing, pattern recognition, and robotics in
library activities. Therefore, applying AI to library services takes over complex and stressful work, so that
humans encounter fewer errors and defects; it aids access to research works, although it also reduces the
human touch and replaces some human involvement.
Republic of the Philippines
POLYTECHNIC UNIVERSITY OF THE PHILIPPINES
STO. TOMAS CAMPUS
Sto. Tomas, Batangas
Reaction:
The application of Artificial Intelligence (AI) in the organization of knowledge in libraries is a timely and
essential development. As libraries evolve from physical spaces to digital repositories, the sheer volume of
information necessitates advanced tools for efficient management. The research highlights the importance of
AI in enhancing information retrieval, cataloguing, classification, and indexing processes. AI can significantly
improve the accessibility and usability of library resources, ensuring that users can find relevant information
quickly and accurately. This transformation is crucial for maintaining the relevance and utility of libraries in the
modern information age.
However, the adoption of AI in libraries also presents several challenges. These include the need for significant
investment in AI technologies, the requirement for staff training, and the potential for resistance to change
among library personnel. Additionally, there are ethical considerations regarding data privacy and the biases
that AI systems might introduce. Despite these challenges, the benefits of AI in streamlining library operations
and improving service delivery make it a worthwhile pursuit.
Literature Review # 2:
Research Problem:
1. What are the current reinforcement learning techniques used in generative AI?
2. How do these techniques apply to generative models?
3. What are the challenges and potential future directions in the integration of reinforcement learning with
generative AI?
This paper provides a comprehensive overview of the intersection of reinforcement learning and generative AI,
highlighting both the current achievements and the substantial challenges that lie ahead. The detailed
discussion on model-free and model-based approaches offers valuable insights into the strengths and
limitations of these techniques. The emphasis on future research directions, particularly the integration of
different methods and the need for better computational efficiency, underscores the dynamic and evolving
nature of this field. The study is a crucial read for anyone looking to understand the state-of-the-art in
generative AI and the role of reinforcement learning in advancing this technology.
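The model-free techniques surveyed share one core loop: act, observe a reward, and improve value estimates from samples alone, with no model of the environment. A minimal sketch of that loop, using a hypothetical two-armed bandit rather than any example from the paper:

```python
import random

random.seed(0)

# Hypothetical two-armed bandit: arm 1 pays more on average.
def pull(arm):
    return random.gauss(1.0 if arm == 1 else 0.2, 0.1)

q = [0.0, 0.0]             # per-arm value estimates
alpha, epsilon = 0.1, 0.1  # step size and exploration rate

for _ in range(2000):
    # Model-free control: pick an arm from the estimates alone
    # (epsilon-greedy), observe a sampled reward, and nudge the
    # chosen arm's estimate toward it.
    arm = random.randrange(2) if random.random() < epsilon else q.index(max(q))
    reward = pull(arm)
    q[arm] += alpha * (reward - q[arm])
```

After enough steps the estimate for the better arm dominates; reward-based fine-tuning of generative models scales up this same trial-and-error principle.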
Research Problem:
1. What are the different types of machine learning algorithms and their core principles?
2. How are these machine learning algorithms applied in real-world applications?
3. What are the future research directions and challenges in the field of machine learning?
The paper discusses the key machine learning techniques, categorizing them into supervised, unsupervised,
semi-supervised, and reinforcement learning. Supervised learning is noted for its use in tasks like classification
and regression, while unsupervised learning is highlighted for clustering and anomaly detection. Semi-
supervised learning combines elements of both supervised and unsupervised learning to improve learning from
limited labeled data. Reinforcement learning is explored in the context of optimization problems in dynamic
environments. The paper also reviews real-world applications across various domains such as cybersecurity,
healthcare, finance, and more. It identifies several challenges, including the need for large datasets,
computational resources, and the interpretability of models.
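The categories above can be made concrete in a few lines of code. The sketch below uses hypothetical one-dimensional data (not from the paper): a supervised nearest-centroid classifier learns from labeled points, while an unsupervised two-means loop groups unlabeled points:

```python
# Supervised: fit a centroid per class from labeled examples, then
# classify a new point by its nearest centroid.
labeled = [(0.0, "a"), (0.2, "a"), (0.9, "b"), (1.1, "b")]
centroids = {}
for cls in {"a", "b"}:
    vals = [x for x, c in labeled if c == cls]
    centroids[cls] = sum(vals) / len(vals)

def classify(x):
    return min(centroids, key=lambda c: abs(x - centroids[c]))

# Unsupervised: a few rounds of 2-means on unlabeled data,
# starting the two means at the extremes.
data = [0.1, 0.15, 0.95, 1.05]
m1, m2 = min(data), max(data)
for _ in range(5):
    g1 = [x for x in data if abs(x - m1) <= abs(x - m2)]
    g2 = [x for x in data if abs(x - m1) > abs(x - m2)]
    m1, m2 = sum(g1) / len(g1), sum(g2) / len(g2)
```

Semi-supervised methods combine both ingredients, and reinforcement learning replaces the fixed dataset with rewards from an environment.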
Conclusions:
In this paper, we have conducted a comprehensive overview of machine learning algorithms for intelligent data
analysis and applications. According to our goal, we have briefly discussed how various types of machine
learning methods can be used
for making solutions to various real-world issues. A successful machine learning model depends on both the
data and the performance of the learning algorithms. The sophisticated learning algorithms then need to be
trained through the collected real-world data and knowledge related to the target application before the system
can assist with intelligent decision-making. We also discussed several popular application areas based on
machine learning techniques to highlight their applicability in various real-world issues. Finally, we have
summarized and discussed the challenges faced and the potential research opportunities and future directions
in the area. Therefore, the challenges that are identified create promising research opportunities in the field
which must be addressed with effective solutions in various application areas. Overall, we believe that our study
on machine learning-based solutions opens up a promising direction and can be used as a reference guide for
potential research and applications for both academia and industry professionals as well as for decision-
makers, from a technical point of view.
Reaction:
This paper does an excellent job of demystifying the vast landscape of machine learning by categorizing
various techniques and highlighting their applications in real-world scenarios. It's particularly insightful in
identifying the pressing challenges of data availability, computational demands, and the need for interpretable
models. The recommendations to invest in high-quality datasets, improve algorithmic efficiency, and prioritize
explainable AI are spot-on and crucial for the next advancements in this field. Additionally, fostering
interdisciplinary collaborations and ensuring ethical AI development are imperative steps that align well with
current industry trends. This comprehensive review serves as a valuable resource for anyone looking to grasp
the current state and future direction of machine learning.
Abstract:
the advection-diffusion framework for vertically-integrated dust transport, with enhanced dust radial
(pseudo-)diffusion up to an effective α_eff ∼ 10⁻² for strongly coupled dust, even when background turbulence
is weak (α < 10⁻⁴). Dust radial drift is also modestly enhanced in the second scenario. We provide a general
analytical theory that accurately reproduces our simulation results, thus establishing a framework to model
global dust transport that realistically incorporates vertical gas flow structures. We also note that the theory is
equally applicable to the transport of chemical species.
Research Problem:
Conclusions:
The overview concludes that recent advancements in deep learning have led to substantial improvements in
various applications, but significant challenges remain. Addressing issues related to data efficiency,
computational resource demands, and the interpretability of models will be crucial for the continued progress of
deep learning research.
Reaction:
This paper provides an in-depth look at the cutting-edge advancements in deep learning, clearly illustrating
how innovations in neural network architectures and optimization methods are pushing the boundaries of what
AI can achieve. It's fascinating to see how these techniques are being applied in diverse fields like image
recognition, natural language processing, and autonomous systems, highlighting the transformative potential of
deep learning. However, the paper also rightly points out that challenges such as the need for large datasets,
high computational costs, and the complexity of models still need to be tackled. Moving forward, research
should focus on creating more efficient data usage strategies, developing cost-effective computational
methods, and enhancing model interpretability. By addressing these areas, we can ensure that the benefits of
deep learning continue to expand and are accessible across various domains.
Abstract:
Federated Learning is a distributed machine learning approach which enables model training on a large corpus
of decentralized data. We have built a scalable production system for Federated Learning in the domain of
mobile devices, based on TensorFlow. In this paper, we describe the resulting high-level design, sketch some
of the challenges and their solutions, and touch upon the open problems and future directions.
Research Problem:
The study aims to answer the following research questions:
1. What are the primary methods used in federated learning?
2. What challenges are associated with implementing federated learning?
3. What future research directions could help overcome these challenges?
Conclusions:
The review concludes that federated learning holds significant promise for privacy-preserving machine learning
but faces substantial challenges that need to be addressed. Future research should focus on improving
communication efficiency, handling data heterogeneity, and developing robust privacy-preserving techniques.
These advancements will be critical to the broader adoption and effectiveness of federated learning.
Reaction:
This paper offers a compelling look into the emerging field of federated learning, highlighting its potential to
revolutionize privacy-preserving machine learning. The detailed discussion on the challenges of data
heterogeneity and communication overhead is particularly insightful, reflecting real-world complexities that
need innovative solutions. The emphasis on privacy-preserving techniques, such as differential privacy and
secure multi-party computation, underscores the importance of protecting user data in decentralized learning
environments. Moving forward, research should prioritize adaptive communication strategies and robust
aggregation methods to tackle the issue of non-IID data. By addressing these areas, federated learning can
become a more viable and widely adopted solution for collaborative model training without compromising data
privacy.
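The aggregation step at the heart of federated learning can be sketched in miniature. Assuming a hypothetical one-parameter linear model y = w·x and two toy clients (none of this comes from the paper's TensorFlow system), a FedAvg-style round looks like:

```python
# Each client trains locally on its own data; the server then averages
# the client models weighted by client dataset size (the FedAvg step).
def local_train(w, data, lr=0.1, epochs=20):
    for _ in range(epochs):
        # full-batch gradient of mean squared error for y = w * x
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

clients = [
    [(1.0, 2.0), (2.0, 4.0)],   # exactly consistent with w = 2
    [(1.0, 2.2), (3.0, 5.8)],   # noisy, roughly w = 2
]

global_w = 0.0
for _ in range(5):              # communication rounds
    local = [local_train(global_w, d) for d in clients]
    sizes = [len(d) for d in clients]
    global_w = sum(w * n for w, n in zip(local, sizes)) / sum(sizes)
```

Only model parameters cross the network, never raw client data, which is the privacy argument; the non-IID and communication-cost issues discussed above arise when clients' data distributions diverge far more than in this toy.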
Abstract:
In the last few years, Artificial Intelligence (AI) has achieved a notable momentum that, if harnessed
appropriately, may deliver the best of expectations over many application sectors across the field. For this to
occur shortly in Machine Learning, the entire community stands in front of the barrier of explainability, an
inherent problem of the latest techniques brought by sub-symbolism (e.g. ensembles or Deep Neural
Networks) that were not present in the last hype of AI (namely, expert systems and rule based models).
Paradigms underlying this problem fall within the so-called eXplainable AI (XAI) field, which is widely
acknowledged as a crucial feature for the practical deployment of AI models. The overview presented in this
article examines the existing literature and contributions already done in the field of XAI, including a prospect
toward what is yet to be reached. For this purpose we summarize previous efforts made to define explainability
in Machine Learning, establishing a novel definition of explainable Machine Learning that covers such prior
conceptual propositions with a major focus on the audience for which the explainability is sought. Departing
from this definition, we propose and discuss about a taxonomy of recent contributions related to the
explainability of different Machine Learning models, including those aimed at explaining Deep Learning
methods for which a second dedicated taxonomy is built and examined in detail. This critical literature analysis
serves as the motivating background for a series of challenges faced by XAI, such as the interesting
crossroads of data fusion and explainability. Our prospects lead toward the concept of Responsible Artificial
Intelligence, namely, a methodology for the large-scale implementation of AI methods in real organizations with
fairness, model explainability and accountability at its core. Our ultimate goal is to provide newcomers to the
field of XAI with a thorough taxonomy that can serve as reference material in order to stimulate future research
advances, but also to encourage experts and professionals from other disciplines to embrace the benefits of AI
in their activity sectors, without any prior bias for its lack of interpretability.
Reaction:
This paper offers a thorough exploration of explainable AI, a critical area of study as AI systems become
increasingly integrated into our daily lives and critical decision-making processes. The discussion on
techniques like LIME and SHAP provides valuable insights into how we can make complex AI models more
understandable and transparent. The applications in sensitive fields like healthcare and finance highlight the
real-world importance of XAI, showing how it can enhance trust and accountability in AI systems. The
challenges outlined, particularly the balance between accuracy and interpretability, resonate with ongoing
concerns in the AI community.
Moving forward, it is essential to focus on developing XAI techniques that are not only effective but also
accessible to non-experts who rely on AI-driven decisions. Standardizing evaluation metrics for XAI will help in
objectively assessing the quality of explanations and ensuring consistency across different models and
applications. By addressing these challenges, we can make significant strides in creating AI systems that are
both powerful and transparent, fostering greater trust and wider adoption across various fields.
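Techniques such as LIME and SHAP are perturbation-based: they probe a black-box model with modified inputs and attribute output changes to features. The simplest member of that family, permutation importance, can be sketched as follows (toy model and data, assumed purely for illustration):

```python
import random

random.seed(0)

# A "black-box" model that depends strongly on feature 0 and
# ignores feature 1 entirely.
def model(x):
    return 3.0 * x[0] + 0.0 * x[1]

X = [[random.random(), random.random()] for _ in range(200)]
y = [model(x) for x in X]

def mse(preds):
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

baseline = mse([model(x) for x in X])  # zero here: model fits y exactly

importance = []
for j in range(2):
    shuffled = [row[:] for row in X]
    col = [row[j] for row in shuffled]
    random.shuffle(col)                # scramble feature j only
    for row, v in zip(shuffled, col):
        row[j] = v
    # The error increase when feature j is scrambled is its importance.
    importance.append(mse([model(x) for x in shuffled]) - baseline)
```

Scrambling the ignored feature leaves the error unchanged, so its importance is zero, which is exactly the kind of transparent attribution XAI seeks for models far less inspectable than this one.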
Abstract:
Deep supervised learning has achieved great success in the last decade. However, its defects of heavy
dependence on manual labels and vulnerability to attacks have driven people to find other paradigms. As an
alternative, self-supervised learning (SSL) attracts many researchers for its soaring performance on
representation learning in the last several years. Self-supervised representation learning leverages input data
itself as supervision and benefits almost all types of downstream tasks. In this survey, we take a look into new
self-supervised learning methods for representation in computer vision, natural language processing, and
graph learning. We comprehensively review the existing empirical methods and summarize them into three
main categories according to their objectives: generative, contrastive, and generative-contrastive (adversarial).
We further collect related theoretical analysis on self-supervised learning to provide deeper thoughts on why
self-supervised learning works. Finally, we briefly discuss open problems and future directions for self-
supervised learning. An outline slide for the survey is provided.
Conclusions:
Self-supervised learning is the present and future of deep learning due to its supreme ability to utilize Web-scale unlabeled data to train
feature extractors and context generators efficiently. Despite the diversity of algorithms, we categorize all self-
supervised methods into three classes: generative, contrastive, and generative contrastive according to their
essential training objectives. We introduce typical and representative methods in each category and sub-
categories. Moreover, we discuss the pros and cons of each category and their unique application scenarios.
Finally, fundamental problems and future directions of self-supervised learning are listed.
Reaction:
This paper provides a thorough examination of self-supervised learning, highlighting its transformative potential
in reducing the reliance on labeled data while achieving high performance in both generative and discriminative
tasks. The exploration of various techniques, such as contrastive learning and masked input modeling,
showcases the innovative approaches being developed to harness unlabeled data effectively. The practical
applications discussed, from image generation to natural language processing, demonstrate the broad
applicability of SSL and its ability to handle complex tasks with minimal human intervention. However, the
challenges identified, including the need for large datasets and significant computational resources, are critical
barriers that need to be overcome.
As we look to the future, it is essential to focus on making SSL methods more scalable and efficient,
developing better pretext tasks that can generalize across different domains, and enhancing the interpretability
of the models. Addressing these areas will not only make SSL more accessible but also unlock its full potential,
leading to more robust and versatile AI systems. This paper provides valuable insights and a clear roadmap for
future research, emphasizing the importance of overcoming the current limitations to fully leverage the
capabilities of self-supervised learning.
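Contrastive learning, one of the three categories, trains representations so that two views of the same input score higher similarity than views of different inputs. A minimal InfoNCE-style computation for a single anchor, using hypothetical embeddings:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical embeddings: an anchor, its positive (another "view" of
# the same input), and two negatives from different inputs.
anchor = [1.0, 0.1]
positive = [0.9, 0.2]
negatives = [[-0.8, 0.4], [0.1, -1.0]]

tau = 0.5  # temperature
sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
exps = [math.exp(s / tau) for s in sims]
# InfoNCE: negative log-probability of picking the positive out of
# all candidates; training minimizes this over many anchors.
loss = -math.log(exps[0] / sum(exps))
```

Because the labels here are "free" (which pairs are views of the same input), no human annotation is needed, which is the central appeal of SSL discussed above.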
Abstract:
Transformer, first applied to the field of natural language processing, is a type of deep neural network mainly
based on the self-attention mechanism. Thanks to its strong representation capabilities, researchers are
looking at ways to apply transformer to computer vision tasks. In a variety of visual benchmarks, transformer-
based models perform similar to or better than other types of networks such as convolutional and recurrent
neural networks. Given its high performance and less need for vision-specific inductive bias, transformer is
receiving more and more attention from the computer vision community. In this paper, we review these vision
transformer models by categorizing them in different tasks and analyzing their advantages and disadvantages.
The main categories we explore include the backbone network, high/mid-level vision, low-level vision, and
video processing. We also include efficient transformer methods for pushing transformer into real device-based
applications. Furthermore, we also take a brief look at the self-attention mechanism in computer vision, as it is
the base component in transformer. Toward the end of this paper, we discuss the challenges and provide
several further research directions for vision transformers.
Research Problem:
1. What are the current visual transformer models and their architectures?
2. How do these models perform in various computer vision tasks compared to traditional convolutional
neural networks (CNNs)?
3. What challenges and potential future research directions exist for visual transformer models?
The paper categorizes and discusses various visual transformer models, such as Vision Transformers (ViTs),
Swin Transformers, and DeiT (Data-efficient Image Transformers). These models leverage the self-attention
mechanism to capture long-range dependencies in image data, offering advantages over traditional CNNs in
terms of handling global context and scalability. The survey highlights significant improvements in image
classification, object detection, and segmentation tasks achieved by these models. However, it also identifies
challenges, such as the high computational cost, the need for large-scale training data, and difficulties in model
interpretability. The discussion includes potential solutions like hybrid architectures that combine transformers
with CNNs and the development of more efficient transformer variants.
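The self-attention mechanism at the base of these models can be sketched directly. The toy example below (three two-dimensional "patch" embeddings, with identity projections standing in for the learned query/key/value matrices) computes one attention step:

```python
import math

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Three "patch" embeddings of dimension 2; for brevity the queries,
# keys, and values all equal the input (identity projections).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Q, K, V = X, X, X
d = 2

scores = matmul(Q, [list(c) for c in zip(*K)])   # Q Kᵀ
weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
out = matmul(weights, V)  # each patch: attention-weighted mix of all patches
```

Every output row mixes information from every patch, which is how transformers capture the global context that a small convolution kernel cannot.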
Conclusions:
Transformer is becoming a hot topic in the field of computer vision due to its competitive performance and
tremendous potential compared with CNNs. To discover and utilize the power of transformer, as summarized
in this survey, a number of methods have been proposed in recent years. These methods show excellent
performance on a wide range of visual tasks, including backbone, high/mid-level vision, low-level vision, and
video processing. Nevertheless, the potential of transformer for computer vision has not yet been fully
explored, meaning that several challenges still need to be resolved. In this section, we discuss these
challenges and provide insights on the future prospects.
Reaction:
This paper offers a fascinating and thorough examination of visual transformer models, emphasizing their
revolutionary impact on computer vision tasks. The detailed analysis of different transformer architectures,
such as Vision Transformers and Swin Transformers, demonstrates their potential to surpass traditional CNNs
by capturing global context through self-attention mechanisms. The significant performance improvements in
tasks like image classification and object detection are particularly impressive, showcasing the transformative
capabilities of these models. However, the challenges highlighted, especially the high computational costs and
data requirements, are critical hurdles that need to be addressed for these models to be more widely adopted.
Moving forward, the research community should prioritize the development of more efficient visual transformer
models that can operate effectively with less computational power and training data. Exploring hybrid
architectures that combine the strengths of transformers and CNNs could be a promising direction.
Additionally, enhancing the interpretability of these models will be crucial for their application in sensitive and
critical domains. By focusing on these recommendations, we can unlock the full potential of visual
transformers, making them a powerful tool for advancing the field of computer vision. This paper provides
valuable insights and a clear path for future research, emphasizing the need to overcome current limitations to
fully leverage the capabilities of visual transformers.
Abstract:
estimation and summarize the typically used evaluation methodology, including public noisy datasets and
evaluation metrics. Finally, we present several promising research directions that can serve as a guideline for
future studies.
Research Problem:
1. How do noisy labels degrade the generalization performance of deep neural networks?
2. What categories of deep learning methods exist for learning from noisy labels, and what are their weaknesses?
3. How are these methods evaluated, and what directions remain for future research?
Conclusions:
DNNs easily overfit to false labels owing to their high capacity in totally memorizing all noisy training samples.
This overfitting issue still remains even with various conventional regularization techniques, such as dropout
and batch normalization, thereby significantly decreasing their generalization performance. Even worse, in
real-world applications, the difficulty in labeling renders the overfitting issue more severe. Therefore, learning
from noisy labels has recently become one of the most active research topics. In this survey, we presented a
comprehensive
understanding of modern deep learning methods to address the negative consequences of learning from noisy
labels. All the methods were grouped into five categories according to their underlying strategies and described
along with their methodological weaknesses. Furthermore, a systematic comparison was conducted using six
popular properties used for evaluation in the recent literature. According to the comparison results, there is no
ideal method that supports all the required properties; the supported properties varied depending on the
category to which each method belonged. Several experimental guidelines were also discussed, including
noise rate estimation, publicly available datasets, and evaluation metrics. Finally, we provided insights and
directions for future research in this domain.
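One strategy family such surveys cover, sample selection, exploits the memorization effect described above: because networks fit clean examples before noisy ones, low-loss samples early in training are likely clean. A minimal sketch of this small-loss selection, with hypothetical per-sample losses:

```python
# Small-loss selection: keep the fraction of samples with the lowest
# current loss and treat the rest as likely mislabeled.
# (Hypothetical per-sample losses; in practice these come from the DNN,
# and the noise rate is estimated rather than assumed.)
losses = [0.05, 2.31, 0.10, 0.08, 1.97, 0.12, 0.07, 2.45]
noise_rate = 0.375  # assumed fraction of noisy labels

keep = int(round(len(losses) * (1 - noise_rate)))
ranked = sorted(range(len(losses)), key=lambda i: losses[i])
clean_idx = sorted(ranked[:keep])  # indices used for the next update
```

The quality of this filter depends directly on the noise-rate estimate, which is why the survey treats noise rate estimation as its own experimental guideline.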
Reaction:
This paper provides a comprehensive and insightful overview of graph neural networks, shedding light on their
significant impact on analyzing and processing graph-structured data. The detailed categorization of different
GNN architectures, such as Graph Convolutional Networks and Graph Attention Networks, offers a clear
understanding of how these models operate and their respective strengths. The diverse applications, from
social network analysis to molecular biology, underscore the versatility and broad applicability of GNNs. The
challenges identified, particularly in terms of scalability and model interpretability, are critical areas that need
innovative solutions to fully realize the potential of GNNs.
Looking ahead, it is crucial for researchers to focus on developing more scalable GNN models that can
efficiently handle large graphs without compromising performance. Addressing the over-smoothing issue and
enhancing the interpretability of these models will also be vital for their application in more complex and
sensitive domains. By prioritizing these areas, we can unlock the full potential of GNNs, making them
indispensable tools in various fields. This paper not only highlights the current state of GNN research but also
provides a valuable roadmap for future advancements, emphasizing the need to overcome existing challenges
to fully harness the capabilities of graph neural networks.
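The propagation rule behind the graph convolutional networks mentioned above can be sketched simply: each node replaces its feature with a degree-normalized average over itself and its neighbors (a simplification of the symmetric normalization real GCNs use, with the learned weights and nonlinearity omitted). A toy path graph with scalar features:

```python
# One graph-convolution propagation step on a 4-node path graph.
edges = [(0, 1), (1, 2), (2, 3)]
feats = [1.0, 0.0, 0.0, 1.0]

neighbors = {i: {i} for i in range(len(feats))}  # self-loops
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)

# Each node averages the features of itself and its neighbors.
new_feats = [sum(feats[j] for j in neighbors[i]) / len(neighbors[i])
             for i in range(len(feats))]
```

Stacking many such layers makes every node's feature an average of ever-larger neighborhoods, which is precisely the over-smoothing issue the reaction flags as an open problem.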
Abstract:
Adversarial attacks and defenses in machine learning and deep neural network have been gaining significant
attention due to the rapidly growing applications of deep learning in the Internet and relevant scenarios. This
survey provides a comprehensive overview of the recent advancements in the field of adversarial attack and
defense techniques, with a focus on deep neural network-based classification models. Specifically, we conduct
a comprehensive classification of recent adversarial attack methods and state-of-the-art adversarial defense
techniques based on attack principles, and present them in visually appealing tables and tree diagrams. This is
based on a rigorous evaluation of the existing works, including an analysis of their strengths and limitations.
We also categorize the methods into counterattack detection and robustness enhancement, with a specific
focus on regularization-based methods for enhancing robustness. New avenues of attack are also explored,
including search-based, decision-based, drop-based, and physical-world attacks, and a hierarchical
classification of the latest defense methods is provided, highlighting the challenges of balancing training costs
with performance, maintaining clean accuracy, overcoming the effect of gradient masking, and ensuring
method transferability. At last, the lessons learned and open challenges are summarized with future research
opportunities recommended.
Research Problem:
The research problem addressed in this paper is the vulnerability of machine learning models to adversarial
attacks, which can significantly compromise their reliability and performance. The paper aims to review and
evaluate various defense strategies proposed to mitigate these vulnerabilities and enhance the robustness of
machine learning systems.
Conclusions:
optimization. Moreover, the transferability of adversarial attacks has been thoroughly investigated, providing
deeper insights into the workings of deep learning models. It is expected that this survey serves as a
foundation for future research in this rapidly evolving field, and provides useful information for researchers and
security practitioners.
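The canonical attack in this literature, the fast gradient sign method, perturbs each input coordinate by a small step in the direction that increases the loss. A sketch on a hypothetical linear classifier, where the gradient is available in closed form rather than via backpropagation:

```python
# FGSM sketch on a linear classifier score(x) = w . x, where score > 0
# means class 1. The adversarial input is x + eps * sign(d loss / d x).
def sign(v):
    return (v > 0) - (v < 0)

w = [2.0, -1.0, 0.5]
x = [1.0, 1.0, 1.0]

score = sum(wi * xi for wi, xi in zip(w, x))  # positive: classified as 1
# For true label 1, a loss that penalizes high scores has gradient -w
# with respect to x, so the attack steps along -sign(w).
eps = 0.6
x_adv = [xi + eps * sign(-wi) for xi, wi in zip(x, w)]
adv_score = sum(wi * xi for wi, xi in zip(w, x_adv))  # flipped negative
```

A coordinate-wise change of only 0.6 flips the decision, illustrating why the defenses reviewed here (adversarial training, detection, regularization) are needed at all.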
Reaction:
The review paper "Adversarial Machine Learning: A Comprehensive Review of Defenses Against Adversarial
Attacks" provides an insightful examination of the current landscape of defenses against adversarial attacks in
machine learning. The authors have thoroughly explored a range of defensive strategies, highlighting the
ongoing struggle to balance effectiveness with computational efficiency. The discussion on adversarial training
and detection methods is particularly notable, shedding light on the incremental progress achieved and the
persistent challenges that researchers face. The paper’s comprehensive nature offers a valuable resource for
both academics and practitioners striving to understand and improve the robustness of machine learning
models.
While the review is comprehensive, it could benefit from a deeper exploration of emerging threats and novel
defense mechanisms beyond the traditional approaches discussed. For future research, it would be
advantageous to include a section dedicated to recent advancements in adversarial machine learning, such as
the integration of novel machine learning paradigms and the impact of these new approaches on existing
defense strategies. Additionally, fostering interdisciplinary collaborations could enhance the development of
more innovative and practical defenses against adversarial attacks.
Abstract:
ChatGPT has the ability to generate grammatically flawless and seemingly-human replies to different types of
questions from various domains. The number of its users and of its applications is growing at an
unprecedented rate. Unfortunately, use and abuse come hand in hand. In this paper, we study whether a
machine learning model can be effectively trained to accurately distinguish between original human and
seemingly human (that is, ChatGPT-generated) text, especially when this text is short. Furthermore, we
employ an explainable artificial intelligence framework to gain insight into the reasoning behind the model
trained to differentiate between ChatGPT-generated and human-generated text. The goal is to analyze the model’s
decisions and determine if any specific patterns or characteristics can be identified. Our study focuses on short
online reviews, conducting two experiments comparing human-generated and ChatGPT-generated text. The
first experiment involves ChatGPT text generated from custom queries, while the second experiment involves
text generated by rephrasing original human-generated reviews. We fine-tune a Transformer-based model and
use it to make predictions, which are then explained using SHAP. We compare our model with a perplexity
score-based approach and find that disambiguation between human and ChatGPT-generated reviews is more
challenging for the ML model when using rephrased text. However, our proposed approach still achieves an
accuracy of 79%. Using explainability, we observe that ChatGPT’s writing is polite, without specific details,
using fancy and atypical vocabulary, impersonal, and typically it does not express feelings.
Research Problem:
The primary research problem addressed in this paper is the development of effective machine learning
models that can accurately distinguish between text generated by ChatGPT and text written by humans.
Additionally, the paper explores methods for explaining the decisions made by these models to enhance
transparency.
The study also underscores the complexities involved in detecting AI-generated text, especially as models like
ChatGPT continue to evolve and produce increasingly sophisticated outputs. The identified linguistic features
that differentiate human and AI text may become less distinct over time, necessitating continuous updates to
the detection models. This ongoing challenge highlights the dynamic nature of AI research and the need for
adaptive and robust methodologies.
Abstract:
Large language models, which are often trained for hundreds of thousands of compute days, have shown
remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are
difficult to replicate without significant capital. For the few that are available through APIs, no access is granted
to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a
suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully
and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while
requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the
infrastructure challenges we faced, along with code for experimenting with all of the released models.
Research Problem:
The problem addressed in this paper is the need for a high-performance, open-source transformer language
model that can democratize access to advanced AI technology. The paper aims to bridge the gap between
proprietary models like GPT-3 and the broader research community by providing an alternative that is both
powerful and freely available.
Conclusions:
In this technical report, we introduced OPT, a collection of auto-regressive language models ranging in size
from 125M to 175B parameters. Our goal was to replicate the performance and sizes of the GPT-3 class of
models, while also applying the latest best practices in data curation and training efficiency. We described
training details, evaluated performance in a number of NLP and dialogue settings, and characterized behaviors
with respect to bias, toxicity and hate speech. We also described many other limitations the models have, and
discussed a wide set of considerations for responsibly releasing the models. We believe the entire AI
community would benefit from working together to develop guidelines for responsible LLMs, and we hope that
broad access to these types of models will increase the diversity of voices defining the ethical considerations of
such technologies.
Reaction:
The development and release of the OPT model by Meta AI mark a crucial advancement in the field of natural
language processing. The model's open-source nature addresses a significant barrier in AI research, where
access to powerful models is often restricted by proprietary limitations. By offering a competitive alternative to
GPT-3, OPT not only democratizes access but also encourages a spirit of collaboration and innovation. This
openness can lead to diverse applications and improvements, as researchers worldwide can contribute to and
benefit from the advancements in transformer-based language models. The introduction of OPT also brings to
light the challenges associated with managing and curating open-source AI technologies. Ensuring that these
powerful tools are used ethically and responsibly is paramount. The potential for misuse of such models
necessitates stringent guidelines and proactive measures to prevent harm. Furthermore, the success of OPT in
the broader AI community will depend on continuous updates and active maintenance, requiring a committed
effort from both Meta AI and the larger research community.
Abstract:
Modeling relational data is challenging as it requires modeling both a “parent” table and its relationships across tables.
We introduce REaLTabFormer (Realistic Relational and Tabular Transformer), a tabular and relational
synthetic data generation model. It first creates a parent table using an autoregressive GPT-2 model, then
generates the relational dataset conditioned on the parent table using a sequence-to-sequence (Seq2Seq)
model. We implement target masking to prevent data copying and propose the Qδ statistic and statistical
bootstrapping to detect overfitting. Experiments using real-world datasets show that REaLTabFormer captures
the relational structure better than a baseline model. REaLTabFormer also achieves state-of-the-art results on
prediction tasks, “out-of-the-box”, for large non-relational datasets without needing fine-tuning.
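The abstract's overfitting check can be illustrated with a generic bootstrap sketch. This is not the paper's Qδ statistic or its exact procedure; it is a hypothetical 1-D toy with illustrative names (`copying_score`, `nearest_distance`) showing the underlying idea: if synthetic samples sit systematically closer to the training records than a held-out split does, the generator is likely copying data.

```python
import random

def nearest_distance(x, reference):
    """Distance from x to its closest record in the reference set (1-D toy)."""
    return min(abs(x - r) for r in reference)

def copying_score(samples, train, n_boot=100, seed=0):
    """Fraction of bootstrap splits in which the synthetic samples sit
    closer to the training data than a held-out half does; values near
    1.0 suggest the generator is copying training records."""
    rng = random.Random(seed)
    half = len(train) // 2
    closer = 0
    for _ in range(n_boot):
        shuffled = train[:]
        rng.shuffle(shuffled)
        ref, held = shuffled[:half], shuffled[half:]
        d_syn = sum(nearest_distance(s, ref) for s in samples) / len(samples)
        d_held = sum(nearest_distance(h, ref) for h in held) / len(held)
        closer += d_syn < d_held
    return closer / n_boot

data_rng = random.Random(1)
train = [data_rng.gauss(0, 1) for _ in range(60)]
copied = train[:15]                                # memorized training records
fresh = [data_rng.gauss(0, 1) for _ in range(15)]  # genuinely new samples
print(copying_score(copied, train) > copying_score(fresh, train))  # → True
```

Memorized samples find themselves (distance zero) in roughly half the splits, so their score approaches 1.0, while genuinely new samples score near 0.5.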
Research Problem:
The problem addressed is the lack of effective models for generating realistic relational and tabular synthetic
data. Existing methods often fall short in preserving the complex relationships and structures inherent in such
data, limiting their utility in practical applications.
The study presents REaLTabFormer as a solution to this problem, demonstrating its ability to generate
synthetic data that closely mirrors real-world datasets. The model was evaluated on several benchmark
datasets, showing significant improvements in data quality and usability compared to traditional methods. The
discussion highlights how REaLTabFormer can facilitate various applications, such as privacy-preserving data
sharing and robust machine learning model training, by providing high-quality synthetic datasets.
Conclusions:
We presented REaLTabFormer, a framework capable of generating high-quality non-relational tabular data
and relational datasets. This work extends the application of sequence-to-sequence models to modeling and
generating relational datasets. We introduced target masking as a component in the model to mitigate data
copying and to safeguard against potentially sensitive data leaking from the training data. We proposed a
statistical method and the Qδ statistic for detecting overfitting in model training. This statistical method may be
adapted to other generative model training. We showed that our proposed model generates realistic synthetic
tabular data that can be a proxy for real-world data in machine learning tasks. REaLTabFormer’s ability to
model relational datasets accurately compared with existing open-sourced alternatives contributes to solving
existing gaps in generative models for realistic relational datasets. Finally, this work can be extended and
applied to data imputation, cross-survey imputation, and upsampling for machine learning with imbalanced
data. A BERT-like encoder can be used instead of GPT-2 with the REaLTabFormer for modeling relational
datasets. We also see opportunities to improve privacy protection strategies and the development of more
components like target masking embedded into synthetic data generation models to prevent sensitive data
exposure.
Reaction:
The introduction of REaLTabFormer is a notable development in the realm of synthetic data generation. By
addressing the shortcomings of existing methods, this model offers a powerful tool for researchers and
practitioners who require high-quality synthetic data for training and testing machine learning models. The
ability of REaLTabFormer to generate realistic relational and tabular data can significantly alleviate data
scarcity issues, promoting more robust and comprehensive research. Moreover, its application in privacy-
preserving data sharing is particularly commendable, as it offers a viable solution to the challenges of data
privacy and security.
However, the implementation of REaLTabFormer also presents certain challenges. Ensuring the ethical use of
synthetic data and maintaining the fidelity of generated data to real-world scenarios are critical considerations.
As synthetic data becomes more prevalent, it is essential to establish clear guidelines and standards to prevent
misuse and ensure that the synthetic data is as representative and unbiased as possible. Additionally,
continuous improvement and validation of the model against diverse datasets will be crucial in maintaining its
relevance and effectiveness in different applications.
Research Problem:
The primary research problem addressed in this paper is the challenge of generating high-quality videos from
text descriptions using limited training data. Traditional methods for text-to-video generation often require
extensive datasets and computational resources, which Tune-A-Video aims to mitigate by using a one-shot
tuning approach.
The study demonstrates that Tune-A-Video can generate coherent and high-quality videos from text
descriptions using significantly less training data compared to conventional methods. The results show that the
model can handle various video generation tasks, including object movement, scene transitions, and complex
interactions. The discussion emphasizes the efficiency of the one-shot tuning approach and its potential to
make text-to-video generation more accessible and practical for broader applications.
Conclusions:
In this paper, we introduce a new task for T2V generation called One-Shot Video Tuning. This task involves
training a T2V generator using only a single text-video pair and pretrained T2I models. We present Tune-A-
Video, a simple yet effective framework for text-driven video generation and editing. To generate continuous
videos, we propose an efficient tuning strategy and structural inversion that enable generating temporally-
coherent videos. Extensive experiments demonstrate the remarkable results of our method spanning a wide
range of applications.
Reaction:
The introduction of Tune-A-Video is a remarkable step forward in the realm of text-to-video generation. This
method addresses a critical bottleneck in the field: the need for extensive datasets and computational power.
By employing a one-shot tuning approach, Tune-A-Video demonstrates that it is possible to achieve high-
quality video outputs with significantly less data. This innovation could democratize video generation
technologies, making them accessible to a wider range of users and applications. The efficiency and versatility
of Tune-A-Video highlight the potential for creative and practical uses, from educational content creation to
entertainment and beyond.
Research Problem:
The problem addressed is the inefficiency of sharing and reusing machine learning experiments and components. Traditional
methods often involve manual and time-consuming processes, which can hinder collaboration and slow down
the pace of innovation. PyGlove aims to streamline these processes, enabling more effective and seamless
exchange of ideas and code among researchers.
Symbolic patching is a powerful tool for sharing machine learning ideas when all experiments use the same
symbolic representations, which is typically the case within a shared codebase. In this paper, we have focused
on this single-codebase scenario, which can be found within industry, for example. Yet, this scenario does not
apply to most of academia or to the ML community as a whole, where different teams tend to build their own
codebases fully independently. Scaling to the multi-codebase scenario poses some problems. In particular,
participant teams would need to agree on a high-level interface. This requires community effort and usage of
shared best practices in software design, and is beyond the scope of this paper. Nevertheless, we speculate
that PyGlove offers new tools to help attain a shared high-level ML interface across codebases: PyGlove works
through code annotation rather than direct editing, which naturally permits building up the interfaces
incrementally. It is possible to annotate a codebase to only accommodate a particular rule patch of interest
(e.g. one just published by another team). PyGlove removes the need for the common paradigm of
“configuration objects” (see Section 4). Configuration objects have led to debate as to their format and
conventions, preventing codebase convergence. Symbolic ML objects, on the other hand, are editable directly,
eliminating the need for separate configuration. The symbolic approach allows for compositionality, which in
turn permits multi-level interfaces. For example, it is easy for the interface to simultaneously expose an image
classifier as a whole, each layer within it, and the inputs and outputs within each layer. The publication of
useful patches can provide a strong incentive for codebase annotation, which may in turn encourage more
patches, leading to positive reinforcement. We therefore hope that, in addition to sharing code within a
codebase, PyGlove may also have a positive impact on collaboration across ML codebases in the future.
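The idea of encapsulating the step from setup A to setup B as a reusable patching rule can be sketched in plain Python. This is not PyGlove's actual API; it is a hypothetical dict-based stand-in (`patch`, `use_gelu` are illustrative names) showing how one rule, written once, applies across differently shaped experiment configurations.

```python
def patch(config: dict, rule) -> dict:
    """Recursively apply a patching rule to every (key, value) pair,
    returning a new config; a toy stand-in for rule-based symbolic patching."""
    patched = {}
    for key, value in config.items():
        if isinstance(value, dict):
            value = patch(value, rule)
        patched[key] = rule(key, value)
    return patched

# A reusable rule: swap every activation setting to 'gelu'.
use_gelu = lambda key, value: "gelu" if key == "activation" else value

# Two independently structured setups; the same rule patches both.
setup_a = {"model": {"layers": 12, "activation": "relu"}, "lr": 1e-3}
setup_b = {"encoder": {"activation": "tanh"}, "decoder": {"activation": "relu"}}

print(patch(setup_a, use_gelu)["model"]["activation"])  # → gelu
print(patch(setup_b, use_gelu)["decoder"]["activation"])  # → gelu
```

The point mirrored here is the network effect: a published rule needs no knowledge of each codebase's configuration layout, only of the annotated names it targets.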
Conclusions:
Machine learning is hindered by the inability to apply conceptual rules to different ML setups in a scalable
manner. Even if the code and experiment definition of a paper are open-sourced, obtaining an ML setup from
the paper and referential codebase is neither straightforward nor reusable, as it needs to be manually parsed
and replicated on other experiments. In this paper, we have expanded PyGlove [15]’s symbolic capabilities
from AutoML to ML, to address this problem. Our proposal encapsulates the process of evolving from ML
setup A to setup B into a list of patching rules. These rules can be reused across ML setups, creating network
effects among the experiments and rules. In addition to patching, we have also demonstrated how symbolic
programming can serve the entire lifecycle of ML effectively. We have also compared PyGlove’s solution with
existing solutions. Through real-world research projects [7; 25] that heavily rely on PyGlove’s patching
capability, we have seen the potential of how PyGlove can change the way ML programs are developed,
organized, and shared. We have open-sourced PyGlove and look forward to it being extensively tested by the
machine learning community.
Reaction:
The introduction of PyGlove is a significant advancement for the machine learning community, addressing a
critical need for more efficient collaboration and idea exchange. The ability to share and reuse code
seamlessly can greatly enhance productivity and accelerate the pace of research. PyGlove's modular and
flexible framework empowers researchers to build upon existing work, fostering a more collaborative and
innovative environment. This tool can potentially transform how machine learning research is conducted,
enabling faster development of new models and solutions by reducing the duplication of effort and streamlining
workflows.
While PyGlove presents numerous benefits, its adoption also requires the community to embrace a more open
and collaborative mindset. Ensuring the compatibility and integration of different components within PyGlove's
framework is crucial to its success. Additionally, providing thorough documentation and support will be
essential to help researchers transition to using this new tool effectively. As with any collaborative platform, the
quality and maintenance of shared code will be vital to sustaining its utility and effectiveness.
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection
Biyang Guo, Xin Zhang, Ziyuan Wang, Minqi Jiang, Jinran Nie, Yuxuan Ding, Jianwei Yue, Yupeng Wu
Abstract:
The introduction of ChatGPT has garnered widespread attention in both academic and industrial communities.
ChatGPT is able to respond effectively to a wide range of human questions, providing fluent and
comprehensive answers that significantly surpass previous public chatbots in terms of security and usefulness.
On one hand, people are curious about how ChatGPT is able to achieve such strength and how far it is from
human experts. On the other hand, people are starting to worry about the potential negative impacts that large
language models (LLMs) like ChatGPT could have on society, such as fake news, plagiarism, and social
security issues. In this work, we collected tens of thousands of comparison responses from both human
experts and ChatGPT, with questions ranging from open-domain, financial, medical, legal, and psychological
areas. We call the collected dataset the Human ChatGPT Comparison Corpus (HC3). Based on the HC3
dataset, we study the characteristics of ChatGPT's responses, the differences and gaps from human experts,
and future directions for LLMs. We conducted comprehensive human evaluations and linguistic analyses of
ChatGPT-generated content compared with that of humans, where many interesting results are revealed. After
that, we conduct extensive experiments on how to effectively detect whether a certain text is generated by
ChatGPT or humans. We build three different detection systems, explore several key factors that influence
their effectiveness, and evaluate them in different scenarios.
Research Problem:
The research problem addressed in this paper is determining how closely ChatGPT's performance aligns with that of human experts in various
domains. The study aims to evaluate the effectiveness of ChatGPT using a systematic comparison and to
develop reliable methods for detecting AI-generated text.
The study's results indicate that ChatGPT can perform comparably to human experts in several tasks,
particularly those involving information retrieval, summarization, and conversational interactions. However, it
also identifies areas where ChatGPT falls short, such as nuanced reasoning, context understanding, and
creativity. The discussion emphasizes the importance of these findings for understanding the current
capabilities and limitations of AI, as well as for improving AI models. Additionally, the paper discusses the
effectiveness of different detection methods, noting that while some are quite successful, others require further
refinement.
Conclusions:
In this work, we propose the HC3 (Human ChatGPT Comparison Corpus) dataset, which consists of nearly
40K questions and their corresponding human/ChatGPT answers. Based on the HC3 dataset, we conduct
extensive studies including human evaluations, linguistic analysis, and content detection experiments. The
human evaluations and linguistic analysis provide us with insights into the implicit differences between humans
and ChatGPT, which motivate our thoughts on LLMs’ future directions. The ChatGPT content detection
experiments illustrate some important conclusions that can provide beneficial guides to the research and
development of AIGC-detection tools.
Reaction:
The paper "How Close is ChatGPT to Human Experts?" provides a thorough and insightful analysis of
ChatGPT's performance relative to human experts. It is impressive to see how far AI has come, particularly in
tasks such as information retrieval and summarization. The development of a comparison corpus and specific
evaluation metrics is a significant contribution, offering a structured way to measure and understand AI
capabilities. However, the findings also highlight the inherent limitations of current AI models, particularly in
areas requiring deep contextual understanding and creativity. These limitations remind us that while AI can
assist in many tasks, it is not yet a replacement for human expertise in complex and nuanced domains.
On the other hand, the study's exploration of detection methods for AI-generated content is equally crucial. As
AI-generated text becomes more prevalent, the ability to distinguish it from human-generated text is essential
for maintaining trust in various fields, including academia, journalism, and customer service. The progress in
detection methods is promising, but the study also underscores the need for continuous refinement. This is
particularly important as AI models become more sophisticated and their outputs harder to distinguish from
those of humans.
Abstract:
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder
Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-
train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context
in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to
create state-of-the-art models for a wide range of tasks, such as question answering and language inference,
without substantial task-specific architecture modifications.
BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural
language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement),
MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2
(1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
Research Problem:
The primary research problem addressed in this paper is the need for a robust pre-training model that can
better understand the context of language by considering both directions (left and right) simultaneously.
Traditional models often process text in a unidirectional manner, limiting their ability to fully grasp the
contextual nuances of language.
The results of the study show that BERT significantly outperforms previous models on several NLP
benchmarks, including the General Language Understanding Evaluation (GLUE) benchmark, the Stanford
Question Answering Dataset (SQuAD), and others. The discussion highlights that BERT's bidirectional training
method enables a deeper understanding of language context, leading to improved performance on tasks such
as question answering, named entity recognition, and sentiment analysis. The authors also discuss the
importance of fine-tuning pre-trained models on specific tasks to achieve optimal results.
Conclusions:
Recent empirical improvements due to transfer learning with language models have demonstrated that rich,
unsupervised pre-training is an integral part of many language understanding systems. In particular, these
results enable even low-resource tasks to benefit from deep unidirectional architectures. Our major contribution
is further generalizing these findings to deep bidirectional architectures, allowing the same pre-trained model to
successfully tackle a broad set of NLP tasks.
Reaction:
The introduction of BERT marks a transformative moment in the field of natural language processing. By
adopting a bidirectional training approach, BERT addresses a fundamental limitation in previous models that
processed text unidirectionally. This innovation allows for a more comprehensive understanding of context,
significantly enhancing performance across a variety of NLP tasks. The paper’s findings that BERT achieves
state-of-the-art results on multiple benchmarks underscore the model's robustness and versatility. This
advancement not only sets a new standard for NLP models but also opens up new possibilities for applications
that require nuanced language understanding.
Abstract:
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks
in an encoder-decoder configuration. The best performing models also connect the encoder and decoder
through an attention mechanism. We propose a new simple network architecture, the Transformer, based
solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two
machine translation tasks show these models to be superior in quality while being more parallelizable and
requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German
translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT
2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score
of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from
the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to
English constituency parsing both with large and limited training data.
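The Transformer's core operation is scaled dot-product attention, softmax(QKᵀ/√d_k)V. A minimal pure-Python sketch (real implementations use batched matrix libraries and multiple heads):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q·K^T / sqrt(d_k)) · V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# One query attending over two key/value pairs; the query matches the
# first key more strongly, so the output is pulled toward values[0].
out = attention(queries=[[1.0, 0.0]],
                keys=[[1.0, 0.0], [0.0, 1.0]],
                values=[[10.0, 0.0], [0.0, 10.0]])
print(out[0][0] > out[0][1])  # → True
```

Because every query attends to every key independently, the whole computation parallelizes across positions, which is the source of the training-speed advantage the abstract reports.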
Research Problem:
The research problem addressed is the reliance of dominant sequence transduction models on recurrent or
convolutional networks, whose inherently sequential computation limits parallelization and slows training. The
authors demonstrate that the Transformer, built solely on attention, achieves state-of-the-art performance on
machine translation benchmarks while being highly parallelizable, leading to faster training times than previous
models.
Conclusions:
In this work, we presented the Transformer, the first sequence transduction model based entirely on attention,
replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-
attention. For translation tasks, the Transformer can be trained significantly faster than architectures based on
recurrent or convolutional layers. On both WMT 2014 English-to-German and WMT 2014 English-to-French
translation tasks, we achieve a new state of the art. In the former task our best model outperforms even all
previously reported ensembles. We are excited about the future of attention-based models and plan to apply
them to other tasks. We plan to extend the Transformer to problems involving input and output modalities other
than text and to investigate local, restricted attention mechanisms to efficiently handle large inputs and outputs
such as images, audio and video. Making generation less sequential is another research goal of ours.
Reaction:
'Attention Is All You Need' presents a compelling argument for the transformative power of self-attention
mechanisms in deep learning. By removing the sequential bottleneck inherent in traditional approaches, the
Transformer model not only achieves superior performance but also introduces a paradigm shift in how we
conceptualize and implement sequence-to-sequence tasks. The emphasis on attention mechanisms not only
enhances the model's ability to capture long-range dependencies but also significantly boosts its efficiency,
marking a pivotal advancement in the field of natural language processing. As a result, researchers and
practitioners alike are now equipped with a more robust toolset for tackling complex tasks that demand
nuanced understanding and synthesis of textual data.
Abstract:
We propose a new framework for estimating generative models via an adversarial process, in which we
simultaneously train two models: a generative model G that captures the data distribution, and a discriminative
model D that estimates the probability that a sample came from the training data rather than G. The training
procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a
minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G
recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined
by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any
Markov chains or unrolled approximate inference networks during either training or generation of samples.
Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the
generated samples.
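The claim that the unique solution has D equal to 1/2 everywhere follows from the known form of the optimal discriminator for a fixed generator, D*(x) = p_data(x) / (p_data(x) + p_g(x)). A worked 1-D check (the Gaussian densities here are an illustrative choice, not from the paper):

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def optimal_discriminator(x, p_data, p_g):
    """For a fixed generator, the best discriminator is p_data / (p_data + p_g)."""
    d, g = p_data(x), p_g(x)
    return d / (d + g)

p_data = lambda x: gauss_pdf(x, 0.0, 1.0)
p_bad_g = lambda x: gauss_pdf(x, 2.0, 1.0)  # generator is off-target

# While G mismatches the data, D can do better than chance near the data mode...
print(optimal_discriminator(0.0, p_data, p_bad_g) > 0.5)  # → True
# ...but once G recovers the data distribution, D collapses to 1/2 everywhere.
print(optimal_discriminator(0.0, p_data, p_data))  # → 0.5
```

Training alternates gradient steps on D and G toward this equilibrium; the sketch only evaluates the fixed-point condition itself.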
Research Problem:
The main issue addressed is the generation of realistic data samples from a latent space distribution without
explicit probabilistic modeling. Traditional generative models struggled with producing high-fidelity outputs, and
GANs aimed to overcome these limitations through adversarial training.
The authors show through experiments that GANs can generate synthetic data that closely resemble real samples
in various domains, particularly images. They discuss the challenges of training stability and mode
collapse while highlighting the potential of GANs to revolutionize generative modeling by leveraging adversarial
learning dynamics.
Conclusions:
This framework admits many straightforward extensions:
1. A conditional generative model p(x | c) can be obtained by adding c as input to both G and D.
2. Learned approximate inference can be performed by training an auxiliary network to predict z given x. This
is similar to the inference net trained by the wake-sleep algorithm but with the advantage that the inference net
may be trained for a fixed generator net after the generator net has finished training.
3. One can approximately model all conditionals p(x_S | x_{∖S}), where S is a subset of the indices of x, by training
a family of conditional models that share parameters. Essentially, one can use adversarial nets to implement a
stochastic extension of the deterministic MP-DBM.
4. Semi-supervised learning: features from the discriminator or inference net could improve performance of
classifiers when limited labeled data is available.
5. Efficiency improvements: training could be accelerated greatly by devising better methods for coordinating G
and D or determining better distributions to sample z from during training. This paper has demonstrated the
viability of the adversarial modeling framework, suggesting that these research directions could prove useful.
Reaction:
'Generative Adversarial Nets' introduces a captivating concept where two neural networks engage in a
competitive dance to improve the generation of synthetic data. This adversarial framework not only challenges
traditional generative models but also promises to redefine how we perceive and create artificial data
representations. By pitting a generator against a discriminator in a continuous feedback loop, GANs excel in
capturing intricate patterns and nuances present in real-world datasets, leading to outputs that mimic the
complexity of natural data sources.
Moving forward, the application of GANs could be expanded beyond image and text generation into domains
such as healthcare, finance, and environmental sciences, where realistic synthetic data can drive simulations
and predictive analytics. Further research should focus on enhancing GANs' stability during training,
addressing issues like mode collapse, and exploring techniques for improving diversity and control over
generated outputs. As GANs continue to evolve, their potential to innovate across various fields remains
profound, offering new avenues for creativity and problem-solving in the realm of artificial intelligence.
Research Problem:
The problem addressed is the inefficiency of traditional machine translation models in handling long sentences
and capturing dependencies between words in different languages. The paper proposes an attention-based
mechanism to address these challenges and enhance translation accuracy.
Conclusions:
The conventional approach to neural machine translation, called an encoder–decoder approach, encodes a
whole input sentence into a fixed-length vector from which a translation will be decoded. We conjectured that
the use of a fixed-length context vector is problematic for translating long sentences, based on a recent
empirical study reported by Cho et al. (2014b) and Pouget-Abadie et al. (2014). In this paper, we proposed a
novel architecture that addresses this issue. We extended the basic encoder–decoder by letting a model
(soft-)search for a set of input words, or their annotations computed by an encoder, when generating each
target word. This frees the model from having to encode a whole source sentence into a fixed-length vector,
and also lets the model focus only on information relevant to the generation of the next target word. This has a
major positive impact on the ability of the neural machine translation system to yield good results on longer
sentences. Unlike with the traditional machine translation systems, all of the pieces of the translation system,
including the alignment mechanism, are jointly trained towards a better log-probability of producing correct
translations.
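The (soft-)search described above can be sketched numerically. The snippet below scores each encoder annotation against the current decoder state with a dot product (the paper actually uses a small feed-forward alignment network) and normalizes with a softmax; the vectors are toy values chosen for illustration.

```python
import math

def align(decoder_state, annotations):
    """Soft-search: score each source annotation against the current
    decoder state (a dot product here; the paper uses a small
    feed-forward network), then softmax into alignment weights."""
    scores = [sum(s * h for s, h in zip(decoder_state, ann)) for ann in annotations]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(sc - m) for sc in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three source-word annotations (toy 2-D vectors).
annotations = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# Different decoder states concentrate their weight on different source
# positions, so no single fixed-length vector must carry the whole sentence.
w_first = align([2.0, 0.0], annotations)
w_second = align([0.0, 2.0], annotations)
print(max(range(3), key=lambda i: w_first[i]))   # → 0
print(max(range(3), key=lambda i: w_second[i]))  # → 1
```

The context vector fed to the decoder at each step is the weight-averaged annotation, so attention shifts along the source sentence as target words are emitted.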
Reaction:
'Neural Machine Translation by Jointly Learning to Align and Translate' introduces a breakthrough by allowing
translation models to dynamically focus on relevant parts of the input sentence, enhancing both accuracy and
coherence in translations. This attention mechanism not only mirrors human cognitive processes but also
reflects a deeper understanding of how languages interplay during translation tasks. By enabling the model to
'learn' where to look in the source text, the research not only improves the technical aspects of machine
translation but also underscores the potential for AI to simulate and augment human language capabilities. As
this research continues to evolve, it's crucial to explore ways to generalize attention mechanisms beyond
translation tasks, potentially applying them to broader NLP applications like summarization, question
answering, and sentiment analysis. Moreover, refining attention mechanisms to handle noisy or ambiguous
language constructs could further enhance the robustness and reliability of AI-driven language processing
systems. Emphasizing interpretability and user-centric design in future developments will also be key to
fostering trust and adoption of advanced AI technologies in real-world applications.