1 Institute for Data Science, Cloud Computing and IT-Security (IDACUS), Hochschule Furtwangen University,
78120 Furtwangen im Schwarzwald, Germany; philipp.ruf@hs-furtwangen.de (P.R.);
manav.madan@hs-furtwangen.de (M.M.)
2 Institut de Recherche en Informatique, Mathématiques, Automatique et Signal (IRIMAS),
Université de Haute-Alsace, 61 rue Albert Camus, 68093 Mulhouse, France; djaffar.ould-abdeslam@uha.fr
* Correspondence: christoph.reich@hs-furtwangen.de
† These authors contributed equally to this work.
Abstract: Nowadays, machine learning projects have become more and more relevant to various
real-world use cases. The success of complex Neural Network models depends upon many factors,
as the requirement for structured and machine learning-centric project development management
arises. Due to the multitude of tools available for different operational phases, responsibilities
and requirements become more and more unclear. In this work, Machine Learning Operations
(MLOps) technologies and tools for every part of the overall project pipeline, as well as involved
roles, are examined and clearly defined. With the focus on the inter-connectivity of specific tools and
comparison by well-selected requirements of MLOps, model performance, input data, and system
quality metrics are briefly discussed. By identifying aspects of machine learning which can be reused
from project to project, open-source tools which help in specific parts of the pipeline, and possible
combinations, an overview of support in MLOps is given. Deep learning has revolutionized the field
of image processing, and building an automated machine learning workflow for object detection
is of great interest for many organizations. For this, a simple MLOps workflow for object detection
with images is portrayed.

Keywords: MLOps; tool comparison; workflow automation; quality metrics

Citation: Ruf, P.; Madan, M.; Reich, C.; Ould-Abdeslam, D. Demystifying MLOps and Presenting a
Recipe for the Selection of Open-Source Tools. Appl. Sci. 2021, 11, 8861. https://doi.org/10.3390/app11198861
quality while capturing defects such as scratches and broken edges. Building an MLOps
workflow for object detection is a laborious task. The challenges one encounters when
adopting MLOps are directly linked with the complexity due to the high dimensionality of
the data involved in the process.
Data are a core part of any ML system, and generally, they can be divided into three
categories, i.e., structured, semi-structured, and unstructured data. Vision data, i.e., images
and videos, are considered as unstructured data [4]. Analyzing images and videos by
deep neural networks (DNNs) like FastMask has gained much popularity in the recent
past [5]. Nevertheless, processing images with ML is a resource-intensive and complex
process. As the training of DNNs requires a considerable amount of data, automated data
quality assessment is a critical step in an MLOps workflow. It is known that if the quality
of data is compromised beyond a certain level, the system becomes more prone to failure [6].
For images, it is challenging to define metrics for automated quality assessment because
metrics such as completeness, consistency, etc., cannot be universally defined for image
data sets. On the other hand, structured data are easier to process as one can clearly define
data types, quality ratings (set by domain experts), and quality assessment tests.
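By way of contrast, a rough sketch of such rule-based checks for a structured data batch is given
below; the column names, value ranges, and the pandas-based representation are assumptions for
illustration only, not part of any specific workflow.

```python
import pandas as pd

def validate_batch(df: pd.DataFrame) -> dict:
    """Run simple, domain-defined quality checks on a structured data batch."""
    report = {
        # completeness: no missing values in mandatory columns
        "complete": bool(df[["sensor_id", "temperature"]].notna().all().all()),
        # consistency: correct dtype and plausible value range (set by domain experts)
        "typed": bool(pd.api.types.is_numeric_dtype(df["temperature"])),
        "in_range": bool(df["temperature"].between(-40.0, 120.0).all()),
        # uniqueness: no duplicated measurement identifiers
        "unique_ids": bool(~df["measurement_id"].duplicated().any()),
    }
    report["passed"] = all(report.values())
    return report

batch = pd.DataFrame({
    "measurement_id": [1, 2, 3],
    "sensor_id": ["a", "a", "b"],
    "temperature": [21.5, 22.0, 19.8],
})
print(validate_batch(batch))
```

Comparable universal rules are hard to state for raw image data, which is why automated quality
assessment is considerably more difficult there.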
Moreover, several tools are available for building an automated MLOps workflow.
These tools can be used to achieve the best outcomes, from the development of a model to its
deployment and maintenance. In most cases, creating an MLOps workflow requires multiple tools
that collaborate to fulfill individual parts of the pipeline.
There is also a high overlap of functionalities provided by many of these tools. The
selection of tools for an optimal MLOps pipeline is a tedious process. The recipe for the
selection of these tools is the requirements that each step of the workflow introduces. This
recipe is also dependent on the maturity level of the machine learning workflow adopted
in an organization and the capabilities of integration of each tool or ingredient. Therefore,
we address these challenges in the work on hand. The main contributions of this paper
are as follows:
• A holistic analysis and depiction of the need for principles of MLOps.
• An intensive consideration of related roles in MLOps workflows and their responsibilities.
• A comprehensive comparison of different supportive open-source MLOps tools, to
allow organizations to make an informed decision.
• An example workflow of object detection with deep learning that shows how a sim-
ple GitFlow-based software development process can be extended for MLOps with
selected MLOps tools.
Demystifying MLOps and the procedure for selecting tools is of utmost importance as the
complexity of ML systems grows with time. Clarification of stages, roles, and tools is
required so that everyone involved in the workflow can understand their part in the
development of the whole system.
This paper is structured as follows. In Section 2, a quick overview of work mentioning
and using MLOps techniques is given and is followed by the depiction of workflows in
MLOps in Section 3. With a description of the various roles in Section 4 and a comparison
of MLOps supporting tools and applicable monitoring metrics in Section 5, a use case
for automating the workflow for object detection with deep neural nets is discussed in
Section 6. With potential future research directions, this work is concluded in Section 7.
recovery scenario were given. As the monitoring and CI/CD aspects are an inevitable part
of MLOps, the work on hand is not concerned with disaster recovery. A definition of
measuring data quality dimensions as well as challenges while applying monitoring tools
was outlined by Coleman in [15]. By translating user-defined constraints into metrics, a
framework for unit tests for data was proposed by Schelter et al. in [16]. Using a declarative
Application Programming Interface (API) which consumes user-defined validation codes,
the incremental- and batch-based assessment of data quality was evaluated on growing
data sets. Regarding the data completeness, consistency, and statistics, the authors pro-
posed a set of constraints and the respective computable quality metrics. Although the
principles of a declarative quality check for datasets are applicable during an MLOps
workflow, the enumeration of this approach is a surrogate for the definition of quality
check systems and services.
Taking the big data value chain into consideration, there are similar requirements to
the domain of ML quality. Various aspects of demands on quality in big data were surveyed
by Taleb et al. in [17] and answered by a proposed quality management framework. While
considering the stated demands on data quality during the various sections of the work on
hand, no specific framework or quality management model for big data value chains, as
introduced by the authors, is proposed. Barrak et al. empirically analyzed 391 open-source
projects which used Data Version Control (DVC) techniques with respect to coupling of
software and DVC artifacts and their complexity evolution in [18]. Their empirical study
concludes that using DVC versioning tools becomes a growing practice, even though
there is a maintenance overhead. As DVC is a part of the work on hand, it neither ex-
clusively focuses on versioning details nor takes repository and DVC-specific statistics
into consideration. Practical aspects of performing ML model evaluation were given by
Ramasubramanian et al. in [19] concerning a real life dataset. In describing selected met-
rics, the authors introduced a typical model evaluation process. While the utilization of
well-known datasets is out of scope for the work on hand, the principles of ML model
evaluation are picked up during the various subsequent sections. A variety of concept
drift detection techniques and approaches were evaluated by Mehmood et al. in [20] with
respect to time series data. While the authors gave a detailed overview of adaptation algo-
rithms and challenges during the implementation, aspects of monitoring the appearance
of concept drift are picked up in the work on hand. The various methods, systems, and
challenges of Automated Machine Learning (AutoML) were outlined in [21]. Concerning
hyperparameter optimization, meta-learning, and neural architecture search, the common
foundation of AutoML frameworks is described. Subsequently, established and popular
frameworks for automating ML tasks were discussed. As the automation of the various
parts in an ML pipeline becomes more mature, and the framework landscape for specific
problems grows, the inclusion of ML-related automation in MLOps tasks becomes more
attractive. The area of AutoML was surveyed by Zöller et al. in [22]. A comprehensive
overview of techniques for pipeline structure creation, Combined Algorithm Selection
and Hyperparameter optimization (CASH) strategies, and feature generation is discussed
by the authors. As the AutoML paradigm has the potential of performance loss while
training model candidates, different patterns of improving the utilization of such sys-
tems are introduced. By discussing the shortcomings and challenges of AutoML, a broad
overview of this promising discipline was given by the authors. Although we refer to
specific aspects of ML automation in the work on hand, the deployment of models and their
integration into the target application are treated more intensely. Concerning research data sets,
Peng et al. provided international guidelines on sharing and reusing quality information
in [23]. Based on the Findable, Accessible, Interoperable, and Reusable (FAIR) principles,
different data lifecycle stages and quality dimensions help in systematically processing
and organizing data sets and their respective quality information.
monitoring the quality of datasets and processed data structures are vital to different oper-
ations in MLOps, the work on hand does not address the sharing of preprocessed records.
A systematic approach for utilizing MLOps principles and techniques was proposed by
Raj in [24]. With a focus on monitoring ML model quality, end-to-end traceability, and
continuous integration and delivery, a holistic overview of tasks in MLOps projects is
given, and real-world projects are introduced. As the authors focused on practical tips for
managing and implementing MLOps projects using the technologies Azure in combination
with MLflow, the work on hand considers a broader selection of supportive frameworks
and tools. Considering automation in MLOps, various roles and actors’ main tasks are
supported by interconnected tooling for the dedicated phases. Wang et al. surveyed the
degree of automation required by various actors (e.g., 239 employees of an international
organization) in defining a human-centric AutoML framework in [25]. By visualizing the
survey answers, an overview of the different actors’ thoughts on automating the various
phases of end-to-end ML life cycles was given, underlining the authors’ assumption that
processes in such complex and error-prone projects should only be partly automated.
Additionally, the landscape of MLOps tools has evolved massively in the last few
years. There has been an emergence of high-quality software solutions in terms of both
open-source and commercial options. The commercial platforms and tools available in the
MLOps landscape make ML systems development more manageable. One such example
is the AWS MLOps framework [26]. The framework is one of the easiest ways to get
started with MLOps. The framework is built on two primary components, i.e., first, the
orchestrator, and second, the AWS CodePipeline instance. It is an extendable framework
that can initialize a preconfigured pipeline through a simple API call. Users are notified by
email about the status of the pipeline. There are certain disadvantages to using commercial
platforms for MLOps. The development process requires multiple iterations, and one
might end up spending much money on a solution that is of no use, since many training runs
do not produce any substantial outcome. To have a flexible and evolving workflow, it is
essential that the workflow is fully transparent, and with commercial solutions, this cannot
be completely ensured. In general, open-source tools are more modular and often offer
higher quality than their counterparts. This is the reason why only open-source tools are
benchmarked in this paper.
Figure 1. (a,b) Difference between Waterfall and DevOps software development life cycle. (c) A manual ML Pipeline.
3.3. DevOps
Traditionally, the developers would wait until the release date to pass the newly
developed code (patch) to the operations team. The operations team would then oversee that
the developed code is deployed with additional infrastructure abstraction, management,
and monitoring tasks. In contrast, DevOps aimed at bridging the gap between the two
branches: Dev and Ops. It responds to the agile need by combining cultural philosophies,
practices, and tools that focus on increasing the delivery of new features in production. It
emphasizes communication, collaboration, and integration between Software Developers
and Operations team [27]. An example of a DevOps workflow is depicted in Figure 1b. As
seen from the Figure, the customer and project manager can redirect the development team
on short notice if there are any changes in the specification. The different phases of DevOps
can be implemented in a shorter duration such that new features can be deployed rapidly.
The prominent actors involved in the DevOps process are also depicted in Figure 1b.
DevOps has two core practices: Continuous Integration (CI) and Continuous Delivery
(CD). Continuous Integration is a software practice that focuses on automating the process
of code integration from multiple developers. In this practice, the contributors are encour-
aged to merge their code into the main repository more frequently. This enables shorter
development cycles and improves quality, as flaws are identified very early in the process.
The core of this process is a version control system and automated software building and
testing process. Continuous Delivery is a practice in which the software is built in a manner
that is always in a production-ready state. This ensures that changes could be released on
demand quickly and safely. The goal of CD is to get the new features developed to the end
user as soon as possible [27,28].
There is also another practice known as Continuous Deployment, which is often con-
fused with CD. Continuous deployment is a practice in which every change is deployed in
production automatically. However, some organizations have external approval processes
for checking what should be released to the user. In such cases, Continuous delivery is
considered a must, but Continuous deployment is an option that can be left out.
3.6. Operations in ML
Building an ML model is a role designated for a data scientist with the support of a
Domain expert, and it does not intersect with how the business value is produced with
that model. In a traditional ML development life cycle, the operations team is responsible
for deploying, monitoring, and managing the model in production. Once the data scientist
implements the suitable model, it is handed over to the operations team.
There are different techniques in which a trained model is deployed. The two most
common techniques are “Model as a Service” and “Embedded model” [30]. In “Model as a
Service”, the model is exposed as Representational state transfer (REST) API endpoints,
i.e., deploying the model on a web server so that it can interact through REST API and
any application can obtain predictions by transferring the input data through an API call.
The web server could run locally or in the cloud. On the other hand, in the “Embedded
model” case, the model is packaged into an application, which is then published. This use
case is practical when the model is deployed on an edge device. Note that how an ML
model should be deployed is wholly based on the final user’s interaction with the output
generated by the model.
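As a sketch of the “Model as a Service” variant, the following example wraps a pickled model behind
a REST endpoint using Flask; the file name, port, and input format are assumptions. In the
“Embedded model” case, the same model file would instead be packaged directly into the target
application.

```python
import pickle

import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the previously trained model once at startup (the path is a placeholder).
with open("model.pkl", "rb") as fh:
    model = pickle.load(fh)

@app.route("/predict", methods=["POST"])
def predict():
    # Any application can obtain predictions by sending {"features": [[...], ...]}.
    payload = request.get_json(force=True)
    features = np.asarray(payload["features"], dtype=float)
    predictions = model.predict(features)
    return jsonify({"predictions": predictions.tolist()})

if __name__ == "__main__":
    # The web server could run locally or in the cloud.
    app.run(host="0.0.0.0", port=8080)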
Figure 2. Common actors in MLOps and their responsibilities throughout the workflow.
important. By comparing scenarios with the problem on hand, possible workarounds, and
the determination of human expertise required for solving them, a common understanding
of how the ML system is applied in the end is generated. As the specifics of ML projects
become more apparent within this process, the first draft of technologies required for suc-
cessful implementation can be outlined. Next to data quality attributes, Key Performance
Indicator (KPI)s of the resulting deployment, and the feasibility of the infrastructural
design choices or demands on re-training of models must be identified. Additionally, much
effort is required to fix the current business need for the ML solution. Business metrics
differ significantly from the traditional ML metrics, and a high-performing model does not
always guarantee that a higher business value would be generated.
problem and subsequent plans for implementing the ML model also requires interaction
with data scientist and domain experts.
tools. This area of ML was started to let non-technical domain experts somewhat configure
the model training instead of having to implement the individual steps manually. In as-
sembling different ML concepts and techniques, the development of feature preprocessing,
model selection, and hyperparameter optimization are automated [44]. The process of
selecting an algorithm, as well as its hyperparameters, is often implemented in one singular
step and is referred to as CASH. In order to improve the performance in automated model
training and hyperparameter optimization, techniques like k-fold cross-validation can be
applied for early stopping the training when reaching a particular threshold [22].
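The following sketch illustrates the CASH idea with scikit-learn; the search space and dataset are
purely illustrative. The choice of algorithm is treated as just another hyperparameter, and every
candidate configuration is scored with k-fold cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Combined Algorithm Selection and Hyperparameter optimization (CASH):
# the estimator in the "clf" step is itself part of the search space.
pipeline = Pipeline([("clf", SVC())])
search_space = [
    {"clf": [SVC()], "clf__C": [0.1, 1.0, 10.0]},
    {"clf": [RandomForestClassifier()], "clf__n_estimators": [50, 200]},
]

# k-fold cross-validation scores each candidate configuration.
search = GridSearchCV(pipeline, search_space, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Real AutoML frameworks replace the exhaustive grid with far more elaborate optimizers, but the
structure of the problem stays the same.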
governance in ML. By storing each model iteration’s configuration (e.g., chosen hyperpa-
rameters, data version, and quality demands) and persisting the actual versions of trained
(and well-performing) models, a comparable history of solutions to specific input data is
created. Furthermore, the combination of model and data version makes sharing and
re-validating results in a community easy.
5. Tooling in MLOps
In recent years, many different tools have emerged which help in automating the
ML pipeline. The choice of tools for MLOps is based on the context of the respective ML
solution and the operations setup [24].
In this section, an overview of the different requirements which these tools fulfill is
discussed. Note that different tools automate different phases of the ML workflow. There
is no single open-sourced tool currently known to us that can create a fully automated
MLOps workflow. Specific tasks are more difficult to automate than others, such as data
validation for images. Furthermore, some of the standard Full Reference-based Image
Quality Assessment (FR-IQA) metrics are listed. These metrics could be applied for
automating the data validation part. After discussing typical demands on such tools,
26 open-source tools are benchmarked according to these requirements. A ranking is
provided indicating their fulfillment.
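To hint at how one of these FR-IQA metrics could be used in an automated data validation step, a
small sketch using the SSIM implementation from scikit-image is given below; the file paths,
grayscale conversion, and acceptance threshold are assumptions, and both images are expected to
have identical dimensions.

```python
from skimage import io
from skimage.color import rgb2gray
from skimage.metrics import structural_similarity

def passes_quality_gate(image_path: str, reference_path: str, threshold: float = 0.7) -> bool:
    """Compare an incoming RGB image against a reference image using SSIM (an FR-IQA metric)."""
    img = rgb2gray(io.imread(image_path))
    ref = rgb2gray(io.imread(reference_path))
    # rgb2gray yields floats in [0, 1], hence data_range=1.0.
    score = structural_similarity(img, ref, data_range=1.0)
    return score >= threshold
```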
TP-3: Tracking
ML is intrinsically approximate, and the development process is iterative. As the
number of iterations increases, the information necessary for reproducing that experiment
also grows. Tracking in ML is a process through which all modeling-related information,
i.e., the algorithm, its hyperparameters, and the observed performance based on some
metrics, could be saved automatically in a database. This collection of meta-information
helps compare algorithms, runs, and performance metrics.
Tracking the training runs, configured hyperparameters, and the corresponding model
quality metrics forms the overall foundation of the model’s quality. Especially when imple-
menting multiple different ML models as a solution to a problem in parallel, comparing
the results of measurements may help in selecting feasible model candidates. ML experiment
tracking tools are an essential part of a data scientist’s tool kit.
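As a sketch of such tracking, the MLflow API (one of the tools compared below) can log parameters,
metrics, and the resulting model from within a training script; the model type and values shown
here are illustrative only.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

params = {"n_estimators": 100, "max_depth": 5}

with mlflow.start_run(run_name="rf-baseline"):
    # Hyperparameters, metrics, and the model itself are persisted for later comparison.
    mlflow.log_params(params)
    model = RandomForestClassifier(**params).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, artifact_path="model")
```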
Serving latency: The metric measures the response time. Low latency is one of the critical aspects of
a good user experience. In order to create an expressive measurement, the responses of every tier
must be taken into consideration. Next to the average time to answer a request, other aspects like
perceived page load time may influence the metering [14].

Throughput: Average predictions returned in one second. High throughput is representative of a
healthy system. Additionally, measurements regarding the throughput could be impacted by
additional metrics, e.g., total hits per week, average hit size, or the occurring error rate [14].

Disk utilization: Monitoring free disk space helps prevent data loss and avoid server downtime.
The statistics of the respective server’s filesystem may also help in determining and understanding
the application’s health status [13].

System’s uptime: System uptime is an important measure of the reliability of the system. It accounts
for the availability of the service. Therefore, this performance metric type is especially useful when
serving web-based applications [14].

CPU/GPU usage: Monitoring GPU/CPU usage helps identify whether some inputs result in longer
computation time. This information is beneficial for tuning the model for a better user experience.
While fine-tuning the capacity and configuration of infrastructure, these resources may be resized in
order to fit the required serving capability [14].

Number of API calls: While serving the model to many users, monitoring the number of API calls is
a good metric for approximating the maximum load and the system’s availability.
Data schema: This metric aims to check whether the data schema in the inference stage is identical
to what was there during training. It is an important measure of data integrity. By assuring such
referential integrity, a certain degree of data quality, as well as assumptions on data completeness,
can be observed [15].

Data consistency: The metric checks if the data encountered in production is within the expected
range. It could also be used for catching data type or format errors. By comparing the amount of
records to previously known states of the dataset, another consistency factor can be measured [15].

Data quality: The metric checks if the quality of data is similar to what the model was trained on. If
the quality is changed, then that will have a direct effect on the model’s performance. Therefore, the
different data quality dimensions must be assured in every stage of the data lifecycle [17].

Data drift: The metric aims to examine changes in the distribution of data. It checks whether or not
the input data’s statistical structure (changes in input feature space) has changed over time.

Outliers: Many algorithms are sensitive to outliers and perform poorly on such data points. Outliers
could bring additional knowledge about the use case and therefore should be investigated in detail.
When comparing the incoming information to a previously known reference or mathematical rule,
the validity of data can be measured [15].

Training/Serving skew: The aim is to check if there is a mismatch between the data acquired for
training and what is received in production.
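A rough sketch of automating two of these checks, schema comparison and drift detection via a
two-sample Kolmogorov–Smirnov test, is given below; the significance level and the pandas-based
data representation are assumptions.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

def schema_matches(train_df: pd.DataFrame, live_df: pd.DataFrame) -> bool:
    """Check that serving data uses the same columns and dtypes as the training data."""
    return list(train_df.dtypes.items()) == list(live_df.dtypes.items())

def drifted_features(train_df: pd.DataFrame, live_df: pd.DataFrame, alpha: float = 0.01) -> list:
    """Flag numeric features whose distribution changed according to a KS test."""
    drifted = []
    for column in train_df.select_dtypes(include=np.number).columns:
        _, p_value = ks_2samp(train_df[column], live_df[column])
        if p_value < alpha:
            drifted.append(column)
    return drifted
```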
Table 4. The different metrics that can be used for monitoring ML systems.

Output distribution: The distribution shift of the predicted outputs between data from the training
and production phase indicates model degradation. Metrics such as the Population Stability Index
or Divergence Index could help in alerting when such shifts happen. Especially in time-series-based
datasets, this metric is essential in order to determine the credibility of existing models on the
new data [19].

Model’s performance: This particular metric is dependent on the task at hand. It is useful for
checking whether the current model is good enough or not. Examples are AUC score, precision,
recall, FPR, TPR, and the confusion matrix for classification tasks. Such metrics are difficult to
calculate as there might be long latency between the output and the true label. In general, a model’s
performance can be evaluated by its accuracy (e.g., right predictions), gains (comparison to other
models), and accreditation (credibility of resulting predictions) [19].

Feature importance change: The model’s performance might not degrade, but there could be a
change in feature importance based on which the predictions are made. This could be an indication
of drift and possibly a point where data should be collected again for retraining.

Concept drift: Concept drift is a major source of model performance degradation. It represents that
the pattern which the model has learned in the past is not valid anymore. For a comparative
discussion of active and passive concept drift detection in distributed environments, we refer to the
work in [20].

Bias/Fairness: The trained model could be biased towards one class or group. Accuracy difference
(AD) can be measured for each class to check for model bias. Another possibility is that the model
can be evaluated on specific slices of data for obtaining an in-depth review of the model’s
performance.

Numerical stability: Numeric values that are invalid or wrong are possibly learned during model
training. These do not directly trigger any explicit errors but can produce significantly wrong
predictions in production. Checking for numerical stability ensures robustness.
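As an illustration, the Population Stability Index named above can be computed from binned
prediction histograms with a few lines of numpy; the binning scheme and the commonly used alert
threshold of roughly 0.2 are assumptions rather than fixed rules.

```python
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between training-time (expected) and production (actual) prediction distributions."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Avoid division by zero / log of zero for empty bins.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# A PSI above roughly 0.2 is commonly treated as a sign of significant shift.
rng = np.random.default_rng(0)
print(population_stability_index(rng.normal(0, 1, 5000), rng.normal(0.3, 1, 5000)))
```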
GR-1: Scalability
The scalability of supportive tools for data science projects is a basic requirement.
Depending on the specific task, various aspects like the required processing power, storage
amount, or service latency must be flexible.
GR-2: APIs
By controlling a tool via its served APIs, many tasks can be realized programmatically,
potentially strengthening the robustness of the MLOps pipeline by automating specific
tasks and eliminating errors of manual utilization or configuration of tools.
5.2.1. MLflow
MLflow [59] is a Machine Learning Platform that can be used for partial automation
for small- to medium-sized data science teams. The tool has four parts—Tracking, Models,
Projects, and Registry—each targeting a specific phase of the ML lifecycle. User
management is missing from the tool. It is also not fully customizable (e.g., it cannot group
experiments under one architecture).
5.2.2. Polyaxon
Polyaxon [60] is a tool that focuses on two things, i.e., first, it promotes collaboration
among ML teams, and second, it focuses on the complete life cycle management of ML
projects. Unlike MLflow, it deals with user management and offers code and data version-
ing support. The platform is not fully open-sourced, and it needs Kubernetes. Therefore, it
is not fully architecture agnostic. It provides an interface to run any job on a Kubernetes
cluster. The platform requires greater setup time and might be unnecessary for small teams.
5.2.3. Kedro
The main focus of Kedro [61] is pipeline construction and providing a basic project
structure for ML projects (scaffolding). It offers the capabilities of defining pipelines by
using a list of functions, and it also provides inbuilt file and database access wrappers.
It also offers code versioning functionality with pipeline visualization (with an extra
python package, ‘Kedro-Viz’). However, the pipeline visualization feature offers no real-time
progress monitoring.
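A rough sketch of what a pipeline definition looks like, assuming Kedro’s node/Pipeline API (as of
v0.17) and hypothetical dataset names from the project’s data catalog:

```python
import pandas as pd
from kedro.pipeline import Pipeline, node

def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    # Placeholder preprocessing step.
    return raw.dropna()

def train(data: pd.DataFrame) -> dict:
    # Placeholder for the actual model fitting logic.
    return {"rows_used": len(data)}

# A pipeline is a list of plain functions wired together by catalog dataset names.
data_science_pipeline = Pipeline(
    [
        node(preprocess, inputs="raw_data", outputs="clean_data", name="preprocess"),
        node(train, inputs="clean_data", outputs="model_report", name="train"),
    ]
)
```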
Table 5. Comparison of the selected open-source MLOps tools and the requirements they address.

• Mlflow (v1.14.1) [59]: An open-sourced tool kit for ML experiment tracking, registry, packaging,
and lifecycle management. Addressed requirements: TP-3, TP-4, MM-1, MM-2, DP-1, GR-2, GR-3, GR-4.
• Polyaxon (v1.9.5) [60]: An enterprise-grade platform for building, training, and monitoring
large-scale deep learning applications. Addressed requirements: DPR-5, TP-2, TP-3, TP-4, TP-5,
MM-1, MM-2, DP-1, OMP-1, GR-1, GR-2, GR-3.
• * Kedro (v0.17.3) [61]: A python-based tool for creating reproducible, maintainable, and modular
data science code. It offers CLI and UI to visualize data and ML pipelines. Addressed requirements:
TP-4, MM-1, DP-1, GR-3, GR-4.
• TFX (v0.30.0) [62]: TFX is an end-to-end platform for deploying production ML pipelines.
Addressed requirements: DPR-2, DPR-3, DPR-4, DPR-5, TP-3, MM-1, MM-2, DP-1, OMP-2, OMP-3,
GR-1, GR-3.
• * ZenML (v0.3.8) [63]: A lightweight tool for creating reproducible ML pipelines. It provides the
ability to shift between cloud and on-premises environments rapidly. Addressed requirements:
DPR-1, DPR-5, TP-3, TP-4, TP-5, MM-1, MM-2, DP-1, GR-2, GR-3, GR-4.
• H2O [64]: A fully open-source, distributed in-memory machine learning platform with linear
scalability. Addressed requirements: DPR-1, DPR-2, TP-1, TP-2, MM-1, GR-1, GR-2, GR-4.
• * Kubeflow (v1.3.0) [65]: A Kubernetes-native platform formed out of a collection of different
components that focuses on orchestrating, developing, and deploying scalable ML workloads.
Comes with a GUI and CLI. Addressed requirements: DPR-2, DPR-5, TP-1, TP-2, TP-3, TP-4, TP-5,
MM-1, DP-1, OMP-1, GR-1, GR-3.
• Great Expectations (v0.13.21) [71]: An open-source python framework for profiling, validating,
and documenting data. Addressed requirements: DPR-3, OMP-2, GR-3, GR-4.
• GitLFS [72]: An open-sourced project which works as an extension to Git for handling large files.
Addressed requirements: DPR-5, GR-2, GR-3, GR-4.
• ** CML (v0.5.0) [73]: A CLI tool (a library of functions) focused on Continuous Machine Learning.
Addressed requirements: GR-4.
• ** GitHub actions [74]: A functionality of GitHub that helps in automating software development
workflows. Not free of charge for private repositories. Addressed requirements: GR-4.
• ** CircleCI [75]: A cloud-hosted platform for creating CI/CD workflows. Not free of charge for
private repositories. Addressed requirements: GR-1, GR-2, GR-3.
• ** GoCD [76]: An open-source Continuous Integration and Continuous Delivery system.
Addressed requirements: GR-1, GR-2, GR-3, GR-4.
• Cortex (v1.9.0) [77]: A multi-framework tool with CLI for deploying, managing, and scaling ML
models. Addressed requirements: OMP-1, GR-1, GR-2.
• Seldon Core [78]: A framework that simplifies and accelerates ML model deployment on
Kubernetes on a large scale. Addressed requirements: MM-1, DP-1, OMP-1, OMP-2, OMP-3, GR-1,
GR-2, GR-3.
• BentoML [79]: A flexible, high-performance framework for serving, managing, and deploying
machine learning models. The tool has both CLI and Web UI. Addressed requirements: MM-1,
MM-2, DP-1, OMP-1, GR-2, GR-3, GR-4.
• Prometheus [80]: A toolkit for monitoring systems and generating alerts. Addressed requirements:
OMP-1, OMP-2, OMP-3, GR-2, GR-3, GR-4.
• * Kibana [81]: A framework that is an essential part of the ELK stack (Elasticsearch, Logstash, and
Kibana). Kibana is used mainly for visualizing machine logs. Addressed requirements: GR-1, GR-3.
• * Grafana [82]: A tool that can be used to visualize and analyze any machine-readable data.
Addressed requirements: GR-2, GR-3, GR-4.
• Label Studio [83]: An open-sourced web application platform for data labeling, which offers
labeling solutions for multiple data types such as text, images, video, audio, and time series.
Addressed requirements: DPR-6, GR-2, GR-3.
• Make Sense [84]: A browser-based platform for labeling images. Addressed requirements: DPR-6,
GR-1, GR-3, GR-4.
* Indicates that tool fulfills other requirements of the ML lifecycle such as pipeline orchestration or visualization. ** Indicates fulfillment of CI/CD tools for the ML lifecycle.
5.2.4. TFX
TensorFlow Extended (TFX) [62] is an end-to-end platform for deploying production-
ready ML pipelines. The pipeline is formed as a sequence of components that implement
an ML system. To benefit from TFX, one has to use the TF framework libraries for building
individual components of the pipeline. The pipeline can be orchestrated using common
orchestrators like Apache Beam, Apache Airflow, and Kubeflow pipelines. It has higher
integration costs and might not be ideal for small projects where new models are required
less frequently.
5.2.5. ZenML
ZenML [63] is a lightweight tool for creating reproducible ML Pipelines. It provides
the capability to shift between cloud and on-premises environments rapidly. It is focused on
pipeline construction formed through a combination of steps by using Apache Beam. It has
a UI for visualizing pipelines and comparing different pipelines in one Zenml repository.
However, it does not offer support for scheduling pipelines currently. It has powerful, out-
of-the-box integrations to various backends like Kubernetes, Dataflow, Cortex, Sagemaker,
Google AI Platform, and more.
5.2.6. H2O
The H2O platform [64] is a part of a large stack of tools where not every tool is
open-sourced. It comes with a web UI. It is more like an analytics platform, but it is also
regarded as a library due to the different APIs it offers. The tool is used mainly for running
predefined models with AutoML capabilities, but it does not give the freedom to integrate
state-of-the-art custom models. It has leading AutoML functionality. It is Java-based and
requires Java 7 or later. Additionally, it lacks the capabilities of model governance.
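A minimal sketch of H2O’s AutoML usage, assuming the h2o Python package and a hypothetical
CSV file with a categorical “label” column:

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()

# Hypothetical training data; the target column is converted to a factor for classification.
frame = h2o.import_file("data/train.csv")
frame["label"] = frame["label"].asfactor()

automl = H2OAutoML(max_models=10, seed=1)
automl.train(y="label", training_frame=frame)

# The leaderboard ranks the candidate models found during the AutoML run.
print(automl.leaderboard.head())
```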
5.2.7. Kubeflow
Kubeflow [65] is a specialized tool for orchestrating ML workflows. One disadvantage
of the tool is that it is tightly coupled with Kubernetes and therefore is more difficult to
configure. The main goal of the tool is to make the ML lifecycle and the interaction of its
components (as a workflow manager) easy on Kubernetes.
5.2.8. Flyte
The focus of this tool is on ML and data processing workflow orchestration. The
Flyte [66] platform is modular and flexible by design. It is directly built on Kubernetes
and therefore gets all the benefits that containerization provides but is also difficult to
configure. It uses “workflow” as a core concept and a “task” as the most basic unit of
computation. It has Grafana [82] and Prometheus [80] integration for monitoring. It can
generate cross-cloud portable pipelines. There are SDKs for Python, Java, and Scala.
5.2.9. Airflow
Airflow [67] is a workflow management platform that offers functionalities for man-
aging and scheduling tasks (depicted as DAGs). Airflow is not limited to Kubernetes like
the Kubeflow tool; rather, it is designed to be integrated into the Python ecosystem. It is not
intuitive by design and has a steep learning curve for new users. One can define tasks in
python and orchestrate a simple pipeline. It is also quite difficult to integrate Airflow for
ML projects which are already under development.
5.2.10. DVC
DVC is a Version Control System for ML Projects [68]. The tool is capable of tracking
models, datasets (including large datasets of 10–100 GB, compared to the 2 GB limit on GitHub), and pipelines.
It is cloud- and storage-agnostic, i.e., the datasets can be stored, accessed, and versioned
locally or on some cloud platform. It is git compatible and can track experiments but does
not offer a dashboard. It works by creating the file ‘.dvc’ in the project root, where meta
information such as storage location and format of the data is stored. For unstructured
data, the changes are tracked as new versions by themselves, which requires high storage
capacity. It is a very lightweight tool which offers great features with the minimum effort
required for integrating it into any Gitflow workflow.
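As a sketch of how a DVC-versioned dataset can be consumed programmatically, DVC’s Python API
can open a file at a specific Git revision; the repository URL, file path, and tag below are placeholders.

```python
import pandas as pd
import dvc.api

# Read a DVC-tracked dataset exactly as it was at the tagged revision "v1.0".
with dvc.api.open(
    "data/train_labels.csv",
    repo="https://github.com/example/object-detection",  # placeholder repository
    rev="v1.0",
) as fh:
    labels = pd.read_csv(fh)

print(labels.shape)
```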
5.2.11. Pachyderm
Pachyderm [69] is another tool that runs on top of Kubernetes and Docker. It helps
configure continuous integration with container images. It has a git-like immutable file
system. It has a steep learning curve due to dependency on the Kubernetes cluster if using
the free version. Compared to DVC, it offers many options for data engineering, such as
creating a data lineage by tracking the sources of data. The web dashboard is not available
in the free version.
5.2.14. GitLFS
It is an extension created by Atlassian for Git. It introduced the concept of tracking
pointers instead of the actual files. Unlike DVC, Git-lfs requires installing a dedicated
server. The servers cannot scale as DVC does. The framework is used for storing large files,
but with GitHub, the file size is limited to 2 GB (a limitation of GitHub’s hosting of Git LFS files).
5.2.15. CML
The tool is for implementing CI/CD/CT functionality in ML projects. It is ideal if one
is using GitFlow for data science, i.e., using the Git ecosystem and DVC for the ML development
lifecycle. Every time some code or data is changed, the ML model can be automatically
retrained, tested, or deployed. It relies on GitHub actions and GitLab CI but offers an
abstraction on the top for better usability. It is intended to save model metrics and other
artifacts as comments in GitHub/Lab.
5.2.17. CircleCI
CircleCI [75] is a cloud-hosted platform for creating CI/CD workflows that is not free
of charge for private repositories. It is an alternative to CML but with more functionality. It
has a dashboard and can generate detailed reports. It is a complete continuous integration
system. The tool was originally developed for DevOps and not intended for ML purposes.
5.2.18. GoCD
The GoCD tool is focused on Continuous Delivery (CD). It can be used for tracing
(visualizing) the changes in the workflow and also for creating one. It has native Docker
and Kubernetes support. For the open-source version, you have to set up and maintain
the GoCD server (acting as central management endpoint). Different options for the
automation of triggers are also available.
5.2.19. Cortex
The main purpose of the tool is ML Model serving (as a web service) and Monitoring.
It is built on top of Kubernetes and has the capability of autoscaling. It requires a lot of
effort for setting up and currently supports AWS only (therefore, the tool can be seen as
an open-source alternative to Amazon’s Sagemaker). Cortex includes Prometheus [80] for
metrics collection and Grafana [82] for visualization.
5.2.21. BentoML
The framework is focused on building and shipping high-performance prediction
services, i.e., models serving the purpose of solving specific problems. With just a few
lines of code, the model can be converted into a production REST API endpoint. Similar to
MLflow models, it offers a unified model packaging format. It acts as a wrapper around the
ML model for converting it into a prediction service. This approach is also followed by
the tool Seldon Core. It offers an extensible API. There is support available for almost every
major ML framework (Pytorch, Tensorflow, etc.). It currently does not handle horizontal
scaling, i.e., spinning up more nodes for prediction service when required.
5.2.22. Prometheus
It is a metric-based monitoring and alerting system which scrapes metrics and saves
them to a database as time-series data. Applications do not send data to Prometheus;
rather, the tool pulls the data itself. Each target is scraped at a fixed interval. It is highly
customizable and modular. The only disadvantage is that with time, managing the database
and the server becomes difficult. The tool is mostly used together with Grafana [82].
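A minimal sketch of instrumenting a Python prediction service with the prometheus_client library is
shown below; metric names, the simulated inference delay, and the port are assumptions.
Prometheus then scrapes the exposed metrics endpoint at its configured interval.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Serving latency and throughput, as introduced in Table 4 and the preceding metric lists.
PREDICTION_LATENCY = Histogram("prediction_latency_seconds", "Time spent answering a prediction request")
PREDICTION_REQUESTS = Counter("prediction_requests_total", "Number of prediction API calls")

@PREDICTION_LATENCY.time()
def handle_request() -> float:
    PREDICTION_REQUESTS.inc()
    time.sleep(random.uniform(0.01, 0.05))  # stand-in for real model inference
    return random.random()

if __name__ == "__main__":
    start_http_server(8000)  # metrics are pulled by Prometheus from port 8000
    while True:
        handle_request()
```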
5.2.23. Kibana
Kibana is a data visualization and exploration tool. The user has to connect to Elas-
ticsearch (a popular analytics and search engine). It can be run on-premise or on cloud
platforms and has options for creating and saving browser-based dashboards. The tool has
a steep learning curve for new users, especially for the initial setup. If one does not use
Elasticsearch, then the tool is of no use.
5.2.24. Grafana
Grafana [82] is a metric-based tool that supports various data sources such as the
Prometheus database, Elasticsearch, etc., generally used with Prometheus. It comes with
an in-built alert system. It can be deployed on-premises, but this increases the maintenance
effort. By default, there are fewer options for customizing the dashboard, and extra plugins
have to be installed, which may increase complexity.
Designing an automated MLOps workflow follows a nearly identical recipe for any
domain. What changes from one domain to another are the metrics involved in the data quality
assessment, model evaluation, and system evaluation stages. For example, when working
with object detection using images, one may use IoU as a metric for model evaluation.
However, the same metric cannot be used for a natural language processing task. The
requirements on which the different tools are benchmarked are chosen so that the choice of
the domain becomes irrelevant. Even if an MLOps workflow is designed using the tools
mentioned in Table 5, the workflow will still have to be tweaked to fit a particular use case.
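Since IoU (Intersection over Union) is named here as an evaluation metric for object detection, a
small self-contained sketch of computing it for two axis-aligned boxes in [x1, y1, x2, y2] format is
given below.

```python
def intersection_over_union(box_a, box_b) -> float:
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

print(intersection_over_union([0, 0, 10, 10], [5, 5, 15, 15]))  # approx. 0.143
```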
Concisely, the MLOps workflow for Image Processing mainly with data-driven ap-
proaches follows a similar pattern as explained in Section 3. However, it has more com-
plexity when compared to workflows for structured datasets. The reasons for this high
complexity are as follows:
• High dimensionality of the data involved.
• Interpretation and understanding of visual data is a complex task.
• Image data sets are generally enormous, which requires extra storage resources.
• Training of data-driven models for image processing is a time-consuming task that
requires more extensive computational resources.
• The quality assessment of images is a challenging task. The image quality is dependent
on various factors such as external sources (e.g., poor lighting conditions) or the
camera’s hardware itself.
A suitable starting point for developing the workflow is thinking of a manual pipeline
for object detection and automating each stage individually. From the manual pipeline
shown in Figure 1c, we need to automate five stages, i.e., Data analysis, Data preparation,
Model training, Model testing, and, finally, deployment. The particular tools which
automate the stages mentioned above are listed below.
• Data analysis: Git and DVC
• Data preparation: GitHub actions
• Model training, Model testing, and deployment: MLflow
For simplification, we only focus on the design and omit the reasoning for selecting
these particular tools. Additionally, in this example, there is no monitoring of the model or
the workflow. The model serving is just limited to saving ML models as a python function
at a fixed location. Furthermore, these tools do not require any significant additional
installations, and most of them are already commonly used for software development. This
ensures reusability, and in order to automate an old model training script, one only has to
integrate MLflow API and set up a new DVC repository.
Regarding the automation level of a workflow, also note that an MLOps workflow
can be distinguished based on the maturity level of the data science process followed in
an organization and also on how many end users interact with the ML model. A lower
maturity level represents a small team working on data science with manual ML workflow
execution (in most cases, an individual data scientist). In such scenarios, a model is not
deployed to multiple users, and there is no monitoring of the deployed service. The data
science team hands over the trained model to the operations team for further usage, like its
integration into the target application.
Interestingly, most organizations start with a data-centric approach and then work their
way to a model-centric and, then, a pipeline-centric approach when adopting MLOps [29].
The data-centric approach entails that initially, some models are created to check whether or
not business value can be created from the data gathered at the organization. This is the first
attempt at applying ML to data (low maturity level) where simple modeling approaches
are preferred. Once the proof of concept is ready, the organization shifts to more complex
state-of-the-art algorithms for solving the problem at hand (increasing maturity level). The
final stage of this characterization is the pipeline-centric approach which represents that an
organization has ML models which are critical for generating business value and is looking
for ways to scale them for achieving continuous development (highest level of maturity).
Figure 3. Basic MLOps workflow with automated training, AutoML, model packaging, and model registry.
for this task. Furthermore, there is also an expansion of the skills required by different
actors in the ML lifecycle to create any real business value. For example, earlier, the role
of data scientist was limited to experimentation and modeling. However, it can be seen
from Figure 2 that close collaboration with operation or software engineers is needed for
the successful deployment of an ML model.
In future work, the potentials of AutoML capabilities for MLOps workflows and
their deep integration and orchestration within such operations could be investigated. By
revealing the combinations of state-of-the-art software, their infrastructural restrictions
and requirements could be focused on. As the tool variety for supporting the different
phases in ML projects is constantly evolving, many operational tools are expected to be
refined in the future. Exciting and novel offerings for interconnection capabilities with
other tools and a more holistic metric support may be worth further investigation. The
implementation of expert interviews regarding their experience with interconnecting and
integrating various tools for creating complete MLOps pipelines and ecosystems is of
interest, too. As concepts for assuring data quality constantly emerge, pursuing their
integration into MLOps tools and workflows appears to be worthwhile.
Author Contributions: Conceptualization, P.R. and M.M.; methodology, P.R. and M.M.; formal
analysis, P.R. and M.M.; investigation, P.R. and M.M.; resources, P.R. and M.M.; data curation,
P.R. and M.M.; writing—original draft preparation, P.R. and M.M.; writing—review and editing,
P.R., M.M., C.R. and D.O.-A.; supervision, C.R. and D.O.-A.; project administration, C.R.; funding
acquisition, C.R. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the Ministry of Science, Research and the Arts of the State
of Baden-Württemberg (MWK BW) under reference number 32-7547.223-6/12/4.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Acknowledgments: The contents of this publication are taken from the research project “(Q-AMeLiA)—
Quality Assurance of Machine Learning Applications”, which is supervised by Hochschule Furtwangen
University (Christoph Reich, IDACUS).
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Sculley, D.; Holt, G.; Golovin, D.; Davydov, E.; Phillips, T.; Ebner, D.; Chaudhary, V.; Young, M.; Crespo, J.F.; Dennison, D. Hidden
Technical Debt in Machine Learning Systems. In Advances in Neural Information Processing Systems; Cortes, C., Lawrence, N., Lee,
D., Sugiyama, M., Garnett, R., Eds.; Curran Associates, Inc.: Cambridge, MA, USA, 2015; Volume 28.
2. Goyal, A. Machine Learning Operations. In International Journal of Information Technology Insights & Transformations [ISSN:
2581-5172 (Online)]; Eureka Journals: Pune, India, 2020; Volume 4.
3. Raj, E.; Westerlund, M.; Espinosa-Leal, L. Reliable Fleet Analytics for Edge IoT Solutions. arXiv 2021, arXiv:2101.04414.
4. Rai, R.K. Intricacies of unstructured data. EAI Endorsed Trans. Scalable Inf. Syst. 2017. [CrossRef]
5. Mohammadi, B.; Fathy, M.; Sabokrou, M. Image/Video Deep Anomaly Detection: A Survey. arXiv 2021, arXiv:2103.01739.
6. Shrivastava, S.; Patel, D.; Zhou, N.; Iyengar, A.; Bhamidipaty, A. DQLearn: A Toolkit for Structured Data Quality Learning.
In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020;
pp. 1644–1653.
7. Tamburri, D.A. Sustainable MLOps: Trends and Challenges. In Proceedings of the 2020 22nd International Symposium on
Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), Timisoara, Romania, 1–4 September 2020; pp. 17–23.
8. Fursin, G.; Guillou, H.; Essayan, N. CodeReef: an open platform for portable MLOps, reusable automation actions and
reproducible benchmarking. arXiv 2020, arXiv:2001.07935.
9. Granlund, T.; Kopponen, A.; Stirbu, V.; Myllyaho, L.; Mikkonen, T. MLOps Challenges in Multi-Organization Setup: Experiences
from Two Real-World Cases. arXiv 2021, arXiv:2103.08937.
10. Zhao, Y. Machine Learning in Production: A Literature Review. 2021. Available online: https://staff.fnwi.uva.nl/a.s.z.belloum/
LiteratureStudies/Reports/2021-LiteratureStudy-report-Yizhen.pdf (accessed on 27 July 2021).
11. Muralidhar, N.; Muthiah, S.; Butler, P.; Jain, M.; Yu, Y.; Burne, K.; Li, W.; Jones, D.; Arunachalam, P.; McCormick, H.S.; et al. Using
AntiPatterns to avoid MLOps Mistakes. arXiv 2021, arXiv:2107.00079.
12. Silva, L.C.; Zagatti, F.R.; Sette, B.S.; dos Santos Silva, L.N.; Lucrédio, D.; Silva, D.F.; de Medeiros Caseli, H. Benchmarking
Machine Learning Solutions in Production. In Proceedings of the 2020 19th IEEE International Conference on Machine Learning
and Applications (ICMLA), Miami, FL, USA, 14–17 December 2020; pp. 626–633.
13. Sureddy, M.R.; Yallamula, P. A Framework for Monitoring Data Warehousing Applications. Int. Res. J. Eng. Technol. 2020,
7, 7023–7029.
14. Shivakumar, S.K. Web Performance Monitoring and Infrastructure Planning. In Modern Web Performance Optimization: Methods,
Tools, and Patterns to Speed Up Digital Platforms; Apress: Berkeley, CA, USA, 2020; pp. 175–212. [CrossRef]
15. Sebastian-Coleman, L. Chapter 4—Data Quality and Measurement. In Measuring Data Quality for Ongoing Improvement; Sebastian-
Coleman, L., Ed.; MK Series on Business Intelligence; Morgan Kaufmann: Boston, MA, USA, 2013; pp. 39–53. doi:10.1016/B978-0-
12-397033-6.00004-3. [CrossRef]
16. Schelter, S.; Lange, D.; Schmidt, P.; Celikel, M.; Biessmann, F.; Grafberger, A. Automating large-scale data quality verification.
Proc. VLDB Endow. 2018, 11, 1781–1794. [CrossRef]
17. Taleb, I.; Serhani, M.A.; Dssouli, R. Big data quality: A survey. In Proceedings of the 2018 IEEE International Congress on Big
Data (BigData Congress), San Francisco, CA, USA, 2–7 July 2018; pp. 166–173.
18. Barrak, A.; Eghan, E.E.; Adams, B. On the Co-evolution of ML Pipelines and Source Code - Empirical Study of DVC Projects. In
Proceedings of the 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Honolulu,
HI, USA, 9–12 March 2021; pp. 422–433. [CrossRef]
19. Ramasubramanian, K.; Singh, A. Machine learning model evaluation. In Machine Learning Using R; Apress: Berkeley, CA, USA;
2017; pp. 425–464. [CrossRef]
20. Mehmood, H.; Kostakos, P.; Cortes, M.; Anagnostopoulos, T.; Pirttikangas, S.; Gilman, E. Concept drift adaptation techniques in
distributed environment for real-world data streams. Smart Cities 2021, 4, 349–371. [CrossRef]
21. Hutter, F.; Kotthoff, L.; Vanschoren, J. (Eds.) Automated Machine Learning: Methods, Systems, Challenges; Springer: Berlin, Germany,
2018; in press. Available online: http://automl.org/book (accessed on 27 July 2021).
22. Zöller, M.A.; Huber, M.F. Survey on automated machine learning. arXiv 2019, arXiv:1904.12054.
23. Peng, G.; Lacagnina, C.; Downs, R.R.; Ramapriyan, H.; Ivánová, I.; Ganske, A.; Jones, D.; Bastin, L.; Wyborn, L.; Bastrakova,
I.; et al. International Community Guidelines for Sharing and Reusing Quality Information of Individual Earth Science Datasets.
OSF Preprints, 16 April 2021. Available online: https://osf.io/xsu4p (accessed on 27 August 2021).
24. Raj, E. Engineering MLOps: Rapidly Build, Test, and Manage Production-Ready Machine Learning Life Cycles at Scale; Packt Publishing:
Birmingham, UK, 2021.
25. Wang, D.; Liao, Q.V.; Zhang, Y.; Khurana, U.; Samulowitz, H.; Park, S.; Muller, M.; Amini, L. How Much Automation Does a Data
Scientist Want? arXiv 2021, arXiv:2101.03970.
26. AWS MLOps Framework. Available online: https://aws.amazon.com/solutions/implementations/aws-mlops-framework/
(accessed on 14 September 2021).
27. Sharma, S. The DevOps Adoption Playbook: A Guide to Adopting DevOps in a Multi-Speed IT Enterprise; John Wiley & Sons: Hoboken,
NJ, USA, 2017.
28. Karamitsos, I.; Albarhami, S.; Apostolopoulos, C. Applying DevOps practices of continuous automation for machine learning.
Information 2020, 11, 363. [CrossRef]
29. Mäkinen, S.; Skogström, H.; Laaksonen, E.; Mikkonen, T. Who Needs MLOps: What Data Scientists Seek to Accomplish and
How Can MLOps Help? arXiv 2021, arXiv:2103.08942.
30. Treveil, M.; Omont, N.; Stenac, C.; Lefevre, K.; Phan, D.; Zentici, J.; Lavoillotte, A.; Miyazaki, M.; Heidmann, L. Introducing
MLOps; O’Reilly Media: Sebastopol, CA, USA, 2020.
31. Lu, J.; Liu, A.; Dong, F.; Gu, F.; Gama, J.; Zhang, G. Learning under concept drift: A review. IEEE Trans. Knowl. Data Eng. 2018,
31, 2346–2363. [CrossRef]
32. Baylor, D.; Haas, K.; Katsiapis, K.; Leong, S.; Liu, R.; Menwald, C.; Miao, H.; Polyzotis, N.; Trott, M.; Zinkevich, M. Continuous
Training for Production ML in the TensorFlow Extended (TFX) Platform. In Proceedings of the 2019 USENIX Conference on
Operational Machine Learning (OpML 19), Santa Clara, CA, USA, 20 May 2019; pp. 51–53.
33. Google. MLOps: Continuous Delivery and Automation Pipelines in Machine Learning. Available online: https://cloud.google.
com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning/ (accessed on 3 May 2021).
34. Maydanchik, A. Data Quality Assessment; Technics Publications: Basking Ridge, NJ, USA, 2007.
35. Verheul, I.; Imming, M.; Ringerma, J.; Mordant, A.; Ploeg, J.L.V.D.; Pronk, M. Data Stewardship on the Map: A study of Tasks and
Roles in Dutch Research Institutes. 2019. Available online: https://zenodo.org/record/2669150#.YUw2BH0RVPY (accessed on
27 August 2021).
36. Wende, K. A model for data governance–Organising accountabilities for data quality management. In Proceedings of the
Australasian Conference on Information Systems (ACIS), Toowoomba, Australia, 5–7 December 2007.
37. Pergl, R.; Hooft, R.; Suchánek, M.; Knaisl, V.; Slifka, J. “Data Stewardship Wizard”: A tool bringing together researchers, data
stewards, and data experts around data management planning. Data Sci. J. 2019, 18, 59. [CrossRef]
38. Peng, G.; Ritchey, N.A.; Casey, K.S.; Kearns, E.J.; Privette, J.A.; Saunders, D.; Jones, P.; Maycock, T.; Ansari, S. Scientific
Stewardship in the Open Data and Big Data Era-Roles and Responsibilities of Stewards and Other Major Product Stakeholders.
2016. Available online: https://www.dlib.org/dlib/may16/peng/05peng.html (accessed on 27 August 2021).
39. Mons, B. Data Stewardship for Open Science: Implementing FAIR Principles; CRC Press: Boca Raton, FL, USA, 2018.
40. Mons, B.; Neylon, C.; Velterop, J.; Dumontier, M.; da Silva Santos, L.O.B.; Wilkinson, M.D. Cloudy, increasingly FAIR; revisiting
the FAIR Data guiding principles for the European Open Science Cloud. Inf. Serv. Use 2017, 37, 49–56. [CrossRef]
41. Zubair, N.; Hebbar, K.; Simmhan, Y. Characterizing IoT data and its quality for use. arXiv 2019, arXiv:1906.10497.
42. Gudivada, V.; Apon, A.; Ding, J. Data quality considerations for big data and machine learning: Going beyond data cleaning and
transformations. Int. J. Adv. Softw. 2017, 10, 1–20.
43. Dong, X.L.; Rekatsinas, T. Data integration and machine learning: A natural synergy. In Proceedings of the 2018 International
Conference on Management of Data, Houston, TX, USA, 10–15 June 2018; pp. 1645–1650.
44. Kißkalt, D.; Mayr, A.; Lutz, B.; Rögele, A.; Franke, J. Streamlining the development of data-driven industrial applications by
automated machine learning. Procedia CIRP 2020, 93, 401–406. [CrossRef]
45. Lee, Y.; Scolari, A.; Chun, B.G.; Weimer, M.; Interlandi, M. From the Edge to the Cloud: Model Serving in ML. NET. IEEE Data
Eng. Bull. 2018, 41, 46–53.
46. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: from error visibility to structural similarity. IEEE
Trans. Image Process. 2004, 13, 600–612. [CrossRef] [PubMed]
47. Li, C.; Bovik, A.C. Content-partitioned structural similarity index for image quality assessment. Signal Process. Image Commun.
2010, 25, 517–526. [CrossRef]
48. Wang, Z.; Li, Q. Information content weighting for perceptual image quality assessment. IEEE Trans. Image Process. 2010,
20, 1185–1198. [CrossRef] [PubMed]
49. Li, C.; Bovik, A.C. Three-component weighted structural similarity index. In Proceedings of the Image Quality and System
Performance VI. International Society for Optics and Photonics, 19–21 January 2009, San Jose, CA, USA; Volume 7242, p. 72420Q.
50. de Freitas Zampolo, R.; Seara, R. A comparison of image quality metric performances under practical conditions. In Proceedings
of the IEEE International Conference on Image Processing 2005, Genoa, Italy, 11–14 September 2005; Volume 3, pp. 1192–1195.
[CrossRef]
51. Sheikh, H.R.; Bovik, A.C. Image information and visual quality. IEEE Trans. Image Process. 2006, 15, 430–444. [CrossRef] [PubMed]
52. Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A feature similarity index for image quality assessment. IEEE Trans. Image
Process. 2011, 20, 2378–2386. [CrossRef] [PubMed]
53. Xue, W.; Zhang, L.; Mou, X.; Bovik, A.C. Gradient magnitude similarity deviation: A highly efficient perceptual image quality
index. IEEE Trans. Image Process. 2013, 23, 684–695. [CrossRef]
54. Egiazarian, K.; Astola, J.; Ponomarenko, N.; Lukin, V.; Battisti, F.; Carli, M. New full-reference quality metrics based on HVS. In
Proceedings of the Second International Workshop on Video Processing and Quality Metrics, Scottsdale, AZ, USA, 22–24 January
2006; Volume 4.
55. Larson, E.C.; Chandler, D.M. Most apparent distortion: full-reference image quality assessment and the role of strategy. J. Electron.
Imaging 2010, 19, 011006.
56. Lee, D.; Plataniotis, K.N. Towards a full-reference quality assessment for color images using directional statistics. IEEE Trans.
Image Process. 2015, 24, 3950–3965.
57. Zhang, L.; Shen, Y.; Li, H. VSI: A visual saliency-induced index for perceptual image quality assessment. IEEE Trans. Image
Process. 2014, 23, 4270–4281. [CrossRef]
58. Mattson, P.; Cheng, C.; Coleman, C.; Diamos, G.; Micikevicius, P.; Patterson, D.; Tang, H.; Wei, G.Y.; Bailis, P.; Bittorf, V.; et al.
Mlperf training benchmark. arXiv 2019, arXiv:1910.01500.
59. MLflow. MLflow. Available online: https://mlflow.org/ (accessed on 27 July 2021).
60. Polyaxon—Machine Learning at Scale. Available online: https://polyaxon.com/ (accessed on 27 July 2021).
61. Kedro: A Python Framework for Creating Reproducible, Maintainable and Modular Data Science Code. Available online:
https://github.com/quantumblacklabs/kedro (accessed on 27 July 2021).
62. Baylor, D.; Breck, E.; Cheng, H.T.; Fiedel, N.; Foo, C.Y.; Haque, Z.; Haykal, S.; Ispir, M.; Jain, V.; Koc, L.; et al. TFX: A TensorFlow-
based production-scale machine learning platform. In Proceedings of the 23rd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 1387–1395.
63. ZenML. Available online: https://zenml.io/ (accessed on 27 July 2021).
64. H2O: Fully Open Source, Distributed in-Memory Machine Learning Platform. Available online: https://www.h2o.ai/products/
h2o/ (accessed on 2 September 2021).
65. Kubeflow: The Machine Learning Toolkit for Kubernetes. Available online: https://www.kubeflow.org/ (accessed on 27 July 2021).
66. Flyte: The Workflow Automation Platform for Complex, Mission-Critical Data and ML Processes at Scale. Available online:
https://flyte.org/ (accessed on 27 July 2021).
67. Apache Airflow, a Platform Created by the Community to Programmatically Author, Schedule and Monitor Workflows. Available
online: https://airflow.apache.org/ (accessed on 27 July 2021).
68. DVC: Open-Source Version Control System for Machine Learning Projects. Available online: https://dvc.org/ (accessed on
27 July 2021).
69. The Data Foundation for Machine Learning. Available online: https://www.pachyderm.com/ (accessed on 27 July 2021).
70. Quilt. Available online: https://quiltdata.com/ (accessed on 27 July 2021).
71. Great Expectations. Available online: https://greatexpectations.io/ (accessed on 27 July 2021).
72. Git Large File Storage (LFS). Available online: https://git-lfs.github.com/ (accessed on 27 July 2021).
73. Continuous Machine Learning (CML). Available online: https://cml.dev/ (accessed on 27 July 2021).
74. GitHub Actions. Available online: https://github.com/features/actions (accessed on 27 July 2021).
75. circleci. Available online: https://circleci.com/ (accessed on 27 July 2021).
76. gocd. Available online: https://www.gocd.org/ (accessed on 27 July 2021).
77. Cortex. Available online: https://www.cortex.dev/ (accessed on 27 July 2021).
78. Seldon Core. Available online: https://github.com/SeldonIO/seldon-core (accessed on 27 July 2021).
79. BentoML. Available online: https://github.com/bentoml/BentoML (accessed on 27 July 2021).
80. Prometheus—Monitoring System and Time Series Database. Available online: https://prometheus.io/ (accessed on 27 July 2021).
81. Kibana. Available online: https://www.elastic.co/kibana/ (accessed on 27 July 2021).
82. Grafana: The Open Observability Platform. Available online: https://grafana.com (accessed on 27 July 2021).
83. Label Studio. Available online: https://labelstud.io/ (accessed on 27 July 2021).
84. Make Sense. Available online: https://www.makesense.ai/ (accessed on 27 July 2021).
85. Tao, X.; Zhang, D.; Ma, W.; Liu, X.; Xu, D. Automatic metallic surface defect detection and recognition with convolutional neural
networks. Appl. Sci. 2018, 8, 1575. [CrossRef]
86. Ruff, L.; Kauffmann, J.R.; Vandermeulen, R.A.; Montavon, G.; Samek, W.; Kloft, M.; Dietterich, T.G.; Müller, K.R. A unifying
review of deep and shallow anomaly detection. Proc. IEEE 2021. [CrossRef]
87. Ultralytics. YOLOv5. Available online: https://github.com/ultralytics/yolov5 (accessed on 22 July 2021).
88. Bergstra, J.; Yamins, D.; Cox, D. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for
vision architectures. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013;
pp. 115–123.
89. Ahmed, M.; Hashmi, K.A.; Pagani, A.; Liwicki, M.; Stricker, D.; Afzal, M.Z. Survey and Performance Analysis of Deep Learning
Based Object Detection in Challenging Environments. Sensors 2021, 21, 5116. [CrossRef]