Research Papers

Enhancement of a Transformation Algorithm to Migrate SQL Database into NoSQL Graph Database

Authors

Hira Lal Bhandari
Roshan Chitrakar

Abstract

With the proliferation of data, relational database management systems face the challenge of managing big volumes of data because of the absence of dynamic data models, high functioning, high scalability, high availability, and low cost. The NoSQL technology has the features which RDBMS lacks. Therefore, many organizations are migrating from RDBMS to NoSQL databases. The migration from RDBMS to NoSQL is not always effective if the data is lost, if the data is replicated while transmission, and if the performance is degraded. For this reason, enhancement of transformation algorithm proposed by Shabana Ramzan et al. has been performed to implement the methodology which is used to successfully migrate data from structured and normalized SQL database MySQL to NoSQL graph database Neo4j. SchemaCrawler engine has been used to metamodeling from source to destination database and execute the transformation rules. This study has also evaluated and validated the performance of the enhanced algorithm after successful migration of data. The NoSQL graph database Neo4j handles the most common form of data that are unstructured, semi-structured, and structured as well as they support the ACID property of relational databases. This feature is not supported by other NoSQL databases such as Oracle NoSQL, MongoDB, and HBase databases.

Keywords:

Year: 2024

Volume 23

Page/Article: 35

DOI: 10.5334/dsj-2024-035

Submitted on Nov 21, 2023

Accepted on May 18, 2024

Published on Jun 10, 2024

Peer Reviewed

CC Attribution 4.0

List of Abbreviations

CSV	Comma-separated values
DBMS	Database Management System
IDE	Integrated Development Environment
JSON	JavaScript Object Notation
Neo4j	Network Exploration and Optimization “for” Java
NOSQL	Not Only Structure Query Language
RDBMS	Relational Database Management System
SQL	Structured Query Language

1. Introduction

In NoSQL, databases are not built principally on tables, and commonly do not have the concept of using Structured Query Language for data manipulation () unlike a Relational Database Management System. Nowadays, non-relational model NoSQL database systems are fast prominence as an alternative model for handling the data due to many advantages () which are: easily scalable, high performance, highly available, dynamic data model, fast development cycle, low cost, and so on. NoSQL databases are in different types according to their database modeling ways. Different forms of databases are: key-value, document, column oriented and graph oriented ().

A graph data storage is a system that incorporates graph constructions as nodes and edges, to get representation and storage of data. The mostly preferred graph data model is considered to be labeled property archetypal (). The NoSQL graph database nowadays is gaining more popularity due to its flexible structure of representing almost all kinds of databases such as organized, semi-organized and unorganized. There is a growing status of data generation and almost all of them are connected because of rapid expansion of data from web and social networks, and therefore, most of the companies compulsorily need to migrate/transform their large volume of data systems into new data management systems which system enables to handle a large volume of highly connected data.

Ramzan et al. () proposed an algorithm to transfer data from SQL database to NoSQL graph database. They also presented a framework that implements a transformation algorithm using SiTra tool to automatically map metadata and transfer database of SQL server to Oracle NoSQL. The algorithm implemented has used schema and object modeling between SQL database, that is, SQL server and NoSQL database, that is, Oracle NoSQL and transformed dataset from SQL to NoSQL. But we have identified a few problems viz. (1) The instantiation of table from source database to destination database Oracle NoSQL cannot be useful for that of MySQL to Neo4j graph database; (2) Schema mapping and data modeling of database from SQL server to Oracle NoSQL has different in that of MySQL to Neo4j; (3) Mapping column from MySQL Server to Oracle NoSQL and MySQL to Neo4j have different rules; and (4) Implementation of transformation algorithm using SiTra tool requires different transformations rules instead of using SchemaCrawler for the data migration.

And hence, this research work aims to enhance the transformation algorithm proposed by Ramzan et al. This study also presents the evaluation of the efficiency with different key matrices and validation of database using data manipulation language (DDL) and data definition language (DML).

Hira Lal Bhandari*, and Roshan Chitrakar () have introduced a conversation, investigation, and correlation among seven distinct methods of information movement from SQL data set to NoSQL data set. Using SysGauge, a system tool, the migration was carried out with the appropriate tools and frameworks for each method, and the outcomes were evaluated, analyzed, and verified. Different parameters like speed, execution time, maximum CPU usage, and maximum memory usage have been used in the analysis and comparison of databases.

Rahma S. Al Mahruqi, Manar H. Alalfi, and Thomas R. Dean () demonstrates a semi-automated framework for the analysis and migration of three existing web applications in order to migrate the PHP code to interact with the migrated database and extract, classify, translate, migrate, and optimize the queries. He describes the four phases of the strategy: The schema is moved in the first stage, and the data from the SQL database is moved into the new NoSQL schema in the second stage. The migration and optimization of the SQL queries into MongoDB actions that are equivalent to them is the next step. In the last stage, the adaption of the application code utilizes the deciphered questions. This approach gives the accompanying commitments: assessing and executing existing pattern relocation calculations proposed by Arora et al. and Jia et al. to move the mapping from MySQL to NoSQL and relocating and improving the implanted SQL questions to communicate with the new data set framework and changing the application code to utilize the deciphered inquiries.

J. Christina* and S. Saranya** () have implemented the extended form of the existing algorithm from data migration from MySQL to MongoDB and presented the comparative study on their performances. The existing algorithm proposed by Liana et al. unfilled a structure which apparatuses an algorithm with automatic mapping from MySQL to MongoDB. The algorithm has used the MySQL INFORMATION_SCHEMA that gives entree to database metadata and represents the entity relationship in MongoDB using embedding and linking strategies (structure mode).

Jaejun Yoo et al. () have presented the naive migration method based on table-level de-normalization, novel migration method based on column-level de-normalization and atomic aggregates in their paper. They have also performed comparative study on the performance of those three migrating concepts. They have used separate algorithms for respective concepts.

Ying-Ti Liaoa, Jiazheng Zhou, et al. () have utilized a data adapter system that is layered between applications and databases and highly modularized. Database transformation and queries can be carried out simultaneously from the application. To access both a relational and a NoSQL database, the system makes it easy to use a SQL interface that parses query statements. They enable applications to conduct queries regarding whether or not target data are being transformed and provide a mechanism for controlling the process of database migration. There is a patch mechanism for synchronizing inconsistent tables after data transformation.

Fatima Jamal AI Shekh Yassin () presents research by simulating a comparison of the performance of retrieving data from various MongoDB representations. The conclusion is based on some instances in which it is preferable to keep all of the data in one document rather than using multiple collections to represent massive amounts of data with intricate relationships. The MySQL server database is made up of five tables that hold information about employees and have various kinds of relationships between them. Using NoSQL DB Mongo, which stores data as documents, the data was represented. The MySQL DB and three distinct Mongo designs were compared in terms of data reading performance. The three designs were chosen to cover the different relationships between documents in MongoDB: reference relationships between two documents, each of which has a different level of embedding, fully reference relationships between five documents, and fully embedded relationships between two documents.

Alza A. Mahmood () has presented an automated algorithm and developed a prototype for the structural type conversion from SQL data form to NoSQL data form with the automatic transfiguration features available in an algorithm. Different structure has been defined under different choices, such as Auto, Embedded and Linking.

Shabana Ramzan et al. () have proposed a method that has two building blocks: data transformation and data cleansing module. This study has been performed in two phases: (i) Migration of SQL database to Oracle NoSQL database with model-model conversion and (ii) Improvement of data quality through data cleansing ability to prepare it for big data analysis. Experiment has been performed to show the successfully transformation from structured database to non-SQL database and improving the data excellence.

Roberto De Virgilio et al. () have provided a tool for automatically converting a SQL database into a graph database. The application’s persistent layer can be converted using this tool from a relational to a graph format, which can be very helpful for the flexible and effective management of graph-based data like social and semantic Web data. The Tinkerpop1 framework is used to implement the abstract interface over graph databases in R2G, which was developed in Java. There are four main components in R2G: i) the Metadata Analyzer, which checks the schema and constraints of the source database and creates the corresponding schema graph; ii) the Data Mapper (DM), which creates the data mapping; iii) the Query Mapper (QM), who is in charge of translating queries; and iv) the Graph Manager (GM), which actually moves the data and runs queries over the target database using mappings created by the DM and QM. The output of MA is utilized by the DM and the QM.

A method for migrating data from a Relational database to a NoSQL graph database was proposed in the paper by Mehak Gupta and Dr. (Mrs.) Rinkle Rani (). Data can be transformed from a relational database to a graph database using foreign key relations in a relational database. A procedure for switching inquiries from SQL over completely to encode question language has likewise been proposed. A similar examination of SQL and CQL is performed based on the execution time in milliseconds.

3. Methodology

The work presented in this work is the enhancement of the transformation algorithm proposed by Shabana Ramzan et al. () to renovate data from MySQL server to Neo4j NoSQL diagram database. Automatic transformation of data from MySQL server to Neo4j is presented in the study. Model-model transformation method has been taken into implication. It is a method for creating software that uses models. The inputting model, the renovation explanation, the conversion mechanism, and the translation procedures are all necessary components for model transformation. Figure 1 depicts the entire model transformation framework.

Figure 1

Model Transformation from Relational to NoSQL Graph Database.

Here we have presented the diagrammatic representation of migration procedure. The diagram clearly shows the process to convert and migrate the database from MySQL to Neo4j graph database.

From Figure 2 we can summarize the process flow of the application as per mentioned below:

Figure 2

Migration of Database from Relational Database MySQL Server to NoSQL Graph Database Neo4j.

SchemaCrawler engine extracts metadata and capture data interacting with SQL database.
Metamodeling of the source database i.e. MySQL is performed.
Now SchemaCrawler engine executes the transformation rule with the incorporation of enhanced algorithm.
SchemaCrawler establishes the connectivity with Neo4j using Java driver.
SchemaCrawler maps metadata from source model and convert data into respective elements as mentioned in the transformation rules.
Through metamodeling data is mapped and converted to destination database model, that is, the NoSQL graph model.

If we want to get efficient ETL (Extract, Transform, Load) from SQL database to NoSQL graph database, each table data and properties should be accessed as minimum as possible. Therefore, we inspect metadata before transformation of migration step forward. We should also identify relationships between tables, recursive relationship, primary key, foreign and other attributes that are not part of primary key or foreign key. SchemaCrawler is used to achieve all those aforementioned tasks (). Figures 3 and 4 clearly depicts meta modeling structure of relational and non-relational graph databases respectively.

Figure 3

Relational Metamodel.

Figure 4

Neo4j Metamodel.

Here in the existing model to model transformation technique, SchemaCrawler tool has been used as engine to reduce the time for metadata and data extraction from MySQL and then migration of that data to NoSQL graph database Neo4j. Due to some major limitations of SiTra as mentioned by D. H. Akehurst et al. (), the tool ‘SchemaCrawler’ is worth considered for the efficiency of the migrating process. The limitations are:

SiTra can’t be utilized when there is more than one decision that ought to guide to a similar objective item. This is due to the fact that SiTra cannot be used to determine which rule should map to the same target object.
Additional constraint concerns the recursive invocation of procedures.
SiTra and ETL give a direct hint of what rules and data sources have made what yields.

Although different tools available such as SchemaSpy online, SchemaCrawler is selected for meta modeling of MySQL database and Neo4j graph database as well as executing the transformation rules to transform the dataset due to support of following features:

Schemacrawler provides extracted information about database schemas in various formats (e.g., text, json, csv, html5) ().
The API allows to access to SQL database in a homogeneous way ().
SchemaCrawler is not only a diagramming tool, but also a tool for discovering and understanding database schemas, with dominant scripting capabilities using JavaScript, Groovy, Ruby or Python ().
It is highly customizable where more use cases can be achieved using the boilerplate code provided in the example distribution ().
For programmers, there is a rich Java API for database object metadata and integration with Maven repositories ().

There are different ways to analyze the metadata of a schema in a database, including SchemaCrawler. SchemaCrawler is portrayed as ‘Open-source Java API that produces working with database metadata as simple as working with plain ancient Java objects’ and is an app (). There are three choices to SchemaCrawler for Windows, Linux and Mac. The leading SchemaCrawler elective is DbSchema. It is not free. A free option is the SchemaSpy tool. We here demonstrate some typical differences so that we can remark the important key points on why we choose SchemaCrawler instead of DbSchema and SchemaSpy. Table 1 illustrates the differences clearly.

Table 1

Differences between Meta Analyzer Tools.


Criteria	SchemaCrawler	DbSchema	SchemaSpy

Type	Free	Commercial	Free

Export type	CSV, HTML, JSON, Plain text	HTML, MS Excel, PDF	HTML

Runs on	Linux, Mac OS, Windows	Windows	Linux, Mac OS, Windows

Most popularly used two different tools are SchemaCrawler and SchemaSpy in metadata modeling and reengineering tool for the visualization of metadata of database schema. SchemaCrawler engine is used to map schema of SQL database and NoSQL graph database and then SchemaCrawler execute the transformation rules to migrate data set from MySQL server to Neo4j NoSQL graph database due to the following reasons ():

SchemaCrawler provides support of almost any database that has JDBC driver.
SchemaCrawler can be utilized with any working framework that upholds Java SE 8, CompactProfile 2, or later.
SchemaCrawler is as well Java API, which makes it as easy to work with database metadata as it does for plain old Java objects.
SchemaCrawler can be utilized with any working framework that upholds Java SE 8, CompactProfile 2, or later.

With the assistance of various Java APIs, SchemaCrawler is integrated with the same Java applications. In order for us to select from or even make changes to your database, a live connection to the script context is established. Working with database metadata is made as simple as working with standard Java objects with SchemaCrawler, a Java API. When programmatically determining the capabilities of a particular RDBMS and locating the names and types of tables and columns in the database, we require access to the metadata of the database in order to dynamically generate SQL statements. Thus we can make the implementation of the MDD (Model Driven Development) approach to migrate data by incorporating these two main aspects.

The tool ‘SchemaCrawler’ is used programmatically using spring boot framework.
Plain old Java object (POJOs) as well as Java APIs are used for database mapping and instantiation for Meta modeling.

3.1 Enhancement of Transformation Algorithm

Improvement for transformation algorithm proposed by Shabana Ramzan et al. () has been done as per the table shown below. Improvement has been done in this way so as to implement the algorithm to accomplish the conversion. The technique to perform translation of dataset from normalized structure of MySQL to NoSQL graph database Neo4j is same that was used to implement a transformation algorithm proposed by Shabana Ramzan et al. (). Table 2 shows how enhancement is done to implement the migration process.

Table 2

Enhancing an Algorithm Proposed by Shabana et al.


ALGORITHM PROPOSED BY SHABANA RAMZAN ET AL. TO MIGRATE DATA FROM MS SQL SERVER TO ORACLE NOSQL	ALGORITHM AFTER MODIFICATION TO MIGRATE DATA FROM MYSQL SERVER TO NEO4J GRAPH DATABASE

Construct: make a table object of the source. Construct: Make a lean of the instances in the target tabular from those in the source table. Representing: Target tabular instances map to objects in the source table. Construct: make an article for every section of the source table. Mapping: Map from target field object to source column object. If value is true Pick the rule for the transformation. // the standard is appropriate or not when more than one rule is free for a similar sort of item. endif Construct: The object for the target table is created. Construct: Every column’s target object is made. Select primary key. Keep a background marked by changed objects // to prevent the creation of target objects in duplicate.	Construct: make a tabular form of the source. Construct: Creating a rundown of targeted nodes from the source table’s items. // Table object of SQL table is addressed as a named hub in a diagram data set Mapping: Target node objects map to objects in the source table. // A node in a graph database has been used to map the source table’s schema. Construct: make an article for every section of the source table. Mapping: Target node property to object map from source column. // Oracle’s NoSQL source column is mapped with a table object, whereas the graph database’s source column is mapped with a node property. If value is true Pick the rule for the transformation.// transformation rule that has been modified in accordance with the data set must be mapped and transformed from a SQL to a NoSQL graph database, whereas the previous algorithm utilized transformation rule that was transferred from an Oracle non-structured database to a relational database. // the standard is appropriate or not when more than one rule is free for a similar sort of item. endif Create: The instance with the targeted labeled node is created.// In a graph database, the object that represents the source table is labeled a node. Construct: target hub property of every section is made. // the section is planned and changed as hub property in a diagram data set. Select primary key. Keep a background marked by changed objects // to prevent the creation of target objects in duplicate.

We have come across a crucial improvements in the algorithm () as shown in the right column of the above table. The improvements have been done from the following important aspects:

Data modeling
Schema mapping and data mapping
Transformation rules

The enhanced algorithm uses this transformation rule below shown. This rule has an association with target model and the source model. Transformation rule has been pinpointed below:

Change rules

ER-Normal Models: Rule of Transformation
1. Table become nodes
2. Table name as node label
3. Segments are changed over completely to hub properties
4. If necessary, convert the values
5. Keys from other countries n, n: 1) into connections, segment name into relationship-type
Relationships are represented by join tables.
1. Additionally, two FKs and additional tables without domain identity (PK)
2. Sections transforms into relationship properties
ER-Normal Models: Rules for Cleanup
1. Eliminate specialized IDs (auto-increasing PKs)
2. Maintain domain IDs like ISBNs
3. Include lookup field indexes.
4. Change the labels for the property Name, REL_TYPE, and Label.
The Import Tool for Relational Database Management Systems consists of:
1. JDBC for database connection independent of vendor
2. Database metadata can be extracted using SchemaCrawler.
3. Transform a graph model import using rules
4. A way to change the default behavior that is optional
5. Simultaneously scans tables for relationships and nodes.

3.2 Data Description

The research is basically the improvement of transformation algorithm proposed by Shabana Ramzan et al. () move the informational collection from MySQL to Neo4j. The data set used to migrate from MySQL to Neo4j is structured. All the datasets of MySQL are structured. Here in the Table 3, we have 5 different data sets for the migration purpose:

Table 3

Data Source.


SN	DATA SET	SOURCE	FORMAT	SIZE

1	Classicmodels	https://www.mysqltutorial.org/mysql-sample-database.aspx	.sql	205 KB

2	Sampledb	https://sample-videos.com/download-sample-sql.php	.sql	4,340 KB

3	Northwind	https://en.wikiversity.org/wiki/Database_Examples/Northwind/	.sql	64 KB

4	Hrdb	https://www.sqltutorial.org/sql-sample-database/	.sql	23 KB

5	chinook	https://github.com/lerocha/chinook-database/releases?fbclid=IwAR34vMltOsbXO_bBu2Rkmw3dvYvWTqdaP1BlF1c9JPrWSjCQ2ENyyBYvT_8	.sql	1,827 KB

The structured databases sampledb, northwind, hard, and classic models are made up of a number of tables, relationships, and constraints. The CMS as a model is discussed in this section. The relationship between the various entities, primary and foreign keys, and data types is depicted in the Figure 5.

Figure 5

Retailer of Scale Models of Classic Cars Database.

The database for the classic model is essentially a sample database of scale car model retailers. customers, products, sales orders, and sales order line items, among other things, are included ().

Database: sampledb

Schematic view of a single table, user_details, with fifty thousand records, as shown in the Table 4.

Table 4

Schematic View of sampledb Database.


FIELD	TYPE	NULL	KEY	DEFAULT	EXTRA

user_id	int	NO	PRI	NULL	auto_increment

Username	varchar(255)	YES		NULL

first_name	varchar(50)	YES		NULL

last_name	varchar(50)	YES		NULL

Gender	varchar(10)	YES		NULL

Password	varchar(50)	YES		NULL

Status	tinyint	YES		NULL

Database: northwind

Figure 6 shows relationship of data with different types of relations.

Figure 6

Entity Relationship Diagram for northwind Database.

Database: hrdb

Database hrdb has different 7 tables. The database schematic view is presented in the Figure 7.

Figure 7

Entity Relationship Diagram for hrdb Database.

Database: chinook

Database chinook has different 11 tables. The database details has been presented in the Figure 8 in Entity Relationship Diagram form.

Figure 8

Entity Relationship Diagram for chinook Database.

Briefly discussed are the fundamental components of the proposed method for transferring data from the MySQL server to Neo4j:

SchemaCrawler: SchemaCrawler is an open-source Java API that makes it simple to work with database metadata and regular Java objects. In addition to documenting tools for the schema, SchemaCrawler is a tool for locating and comprehending database schema. It is intended that the output be distinguished from other database schemas. It is bundled with drivers for some commonly used RDBMS systems for convenience, but it supports nearly all databases that have a JDBC driver. It works with any working framework that upholds Java.

Transformation module: A crucial component of the model-driven engineering paradigm is model transformation. The domain’s conceptual models can be created and played with in this automated manner. For this reason, the social Meta model is looked and separated, and planned with the Meta model of the NoSQL diagram information base so relocation can be performed from the SQL data set to the NoSQL chart data set.

3.4 Transformation Evaluation Metrics

Different cost metrics have been used to evaluate the DBMS’s performance. Additionally, these metrics are utilized to assess and verify the validity of database transformation procedures. The matrices used by Management and Url () and Shabana Ramzan et al. () are (a) schema information preservation, (b) data equivalence, and (c) system performance. These metrics are also used to verify the process of checking that software achieves its goal without any bugs. Performance evaluation with these metrics ensures whether the product that is developed is in right track or not. That means the process of evaluating metrics verifies whether the developed product meets the requirements that are pinpointed in the ‘Statement of the Problem’.

1) Schema information preservation: Schema information is compared between source data store and target data store as mentioned in Fahrner & Vossen () and Ramzan et al. (). The comparison is performed with semantics used in both kinds of store. Assuming the linguistic structure and semantics of the source model idea have been utilized accurately, then the diagram will be right. The execution of migration tool from MySQL to Neo4j developed has proved the syntactically and semantically migration process is valid. Migration from MySQL to Neo4j includes conversion and transformation of syntax and semantics from relational database MySQL into Neo4j graph database correctly. A formal foundation for accurate schema transformations has been developed by Pasteur, Koch and Robert in the Xiaolei Qian Computer Science Laboratory (). The formalization of schema transformations as signature interpretations, the formalization of accurate schema transformations as information-preserving signature interpretations, and the sanctification of schemas as abstract data types all fall under this category.

Data conversion and schema mapping are depicted in Figure 9. Schema matching is the first step after selecting the source and target schemas. Experts specify correspondences between various schema elements in this step. Tools that are only partially automated will probably help with this. Schema mapping is the outcome of this step. The subsequent step deciphers this planning as an information change. Typically, this transformation is carried out using the relevant data model (SQL, CQL). The target schema is created by transforming the source data using querying. Transformation queries can be used to hypothetically rewrite the query against the target schema or to populate the target database using a materialized approach. Casually, the understanding of diagram planning faces a few challenges.

Figure 9

The Process of Schema Mapping for Data Transformation ().

Source schema interpretation: During transformation, it is necessary to recognize and preserve associations between data elements (relationships, nesting, and foreign keys) in the source schema.

Target schema conformance: The result of conversion must match the target schema.

User intention: Professional users only provide informal correspondences (‘arrows’) between schemas. Even under the requirement that all correspondence must be included in the interpretation of such mappings, there are usually several alternatives. Correctly guessing what you meant is only possible based on good heuristics.

Migration correction is checked with the help of queries so that all records from source database to destination are migrated successfully as used in Queiroz et al. ().

We find out whether all records of database from MySQL to Neo4j are migrated by executing query on both SQL and Cypher in MySQL and Neo4j respectively. The SQL query and cypher query for four different databases has been used to evaluate the correctness of schema and data set from MySQL to Neo4j.

To count entire set of records in a specific database of MySQL.
SQL Query:
SELECT SUM(TABLE_ROWS) FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = ‘DATABASE_NAME’;
To count entire set of nodes in a specific database of Neo4j.
Cypher Query:
MATCH (n) RETURN count(*)

When we execute the data migration tool from MySQL to Neo4j, we have come into the following types of transformations:

Table has been transformed into labelled nodes in Neo4j.
Source table object is mapped target node objects.
Column object is mapped to target node property.
Source table object is mapped to target labelled node object.
Each record from all table(s) in database has been converted into respective nodes in Neo4j; that is, all tables, relationship types, columns from MySQL to Neo4j are mapped and hence migrated all data into Neo4j.

2) Data equivalence: The source and target databases will be accessed in order to evaluate the results. The information returned by the questions will be contrasted with approving the produced patterns. With the use of set of queries such as insertion, deletion, retrieval and updating for a MySQL; create and match for a Neo4j), we evaluate datasets as well as the primary features of the database.

Using a set of predefined queries on databases, we evaluated the proposed system: classicmodels, sampledb, northwind and hrdb. Results are physically surveyed to guarantee information identicalness.

Query roles are defined as follows:

Query 1: INSERT record
In MySQL, the INSERT query has been used to add a row of data. The create operation in Neo4j is used to create or insert a new node.
SQL Query:
INSERT INTO table_name VALUES(val1, val2,…..,valN);
Cypher Query:
CREATE (n:node_name {attribute1 :val1, attribute2: val2, attribute3: val3,…………, attributeN:valN}) RETURN n
Query 2: DELETE
This inquiry is utilized to eliminate the single line. The delete command is what Neo4j uses to get rid of a specific node.
SQL Query:
ALTER TABLE TABLE_NAME DROP FOREIGN KEY TABLE_FK_KEY;
DROP INDEX INDEX_NAME ON TABLE_NAME;
DELETE FROM TABLE_NAME WHERE KEY_FIELD=’VALUE’;
Cypher Query:
MATCH (e:NODE_LABEL_NAME {KEY:VALUE}) DETACH DELETE e
Query 3: RETRIEVE
The single row is retrieved by this query. Data was retrieved by Neo4j using the MATCH operation and the WHERE conditional.
SQL Query:
SELECT * FROM TABLE_NAME WHERE KEY_FIELD=VAL;
Cypher Query:
MATCH (c:NODE_LABEL_NAME) WHERE c.NODE_LABEL_NAME =VALUE RETURN c
Query 4: MULTIPLE-RETRIEV
This question recovers numerous lines. Neo4j utilized the MATCH order to recover different records immediately.
SQL Query:
SELECT * FROM TABLE_NAME;
Cypher Query:
MATCH (n) WHERE n:NODE_LABEL_NAME RETURN n
Query 5: RETRIEVE WITH SINGLE JOIN
This query pulls records from a relational database as a join query. Neo4j has a parent-youngster relationship that serves as a join rather than a go-along. This query includes both the crucial key from the parent table column and the crucial key from the child table.
SQL Query:
SELECT * FROM TABLE1, TABLE2 where TABLE1.PK=TABLE2.FK;
Cypher Query:
MATCH(n1:NODE_LABEL_NAME)-[r]->(n2: NODE_LABEL_NAME) RETURN n1,n2, r
Query 6: RETRIEVE WITH MULTIPLE JOIN
This query reads rows from multiple child tables in addition to the parent table.
SQL Query:
SELECT DISTINCT c.TABLE_NAME, c. TABLE_NAME FROM TABLE_NAME AS c
        JOIN TABLE_NAME AS o ON (o.COLUMN_VALUE = c. COLUMN_VALUE)
        JOIN COLUMN_VALUE AS p ON (p. COLUMN_VALUE = c. COLUMN_VALUE);
        WHERE c. COLUMN_VALUE =VAL;
Cypher Query:
        MATCH (c: NODE_LABEL_NAME), (o: NODE_LABEL_NAME), (p:
        NODE_LABEL_NAME) WHERE
        c.KEY=VALUE return DISTINCT c.FIELD_NAME, c.FIELD_NAME
Query 7: UPDATE
The row is updated by this query. Neo4j uses a set operation to update the row based on the key value in the node.
SQL Query:
UPDATE TABLE_NAME SET COLUMN_NAME=’VALUE’;
Cypher Query:
MATCH (c: NODE_LABEL_NAME {FIELD_NAME: “VALUE”}) SET c.FIELD_NAME = “VALUE”;

3) System Performance Comparison: By evaluating the system’s performance, the quality of the conversion or translation is guaranteed. Question tumbled by time, the times that inquiry takes to execute, is utilized as an exhibition assessment metric by the vast majority of the DBMS. The exhibition and various parts of the usefulness of chart information stores are inspected and estimated through the created set of questions. By measuring the time spent on each query, the queries are used to assess the system’s performance. Windows 11 and an i3 CPU operating at 1.80 GHz were the subjects of our testing. sampledb, northwind, classicdb and hardb are the respective names of database. The execution times (in milliseconds) of each query for the source and destination databases are shown in Tables 6, 7, 8, 9, 10.

The measurement is the average amount of time that has passed since we ran each query three times. There is a plausible of reserving results from the data set, so we have changed the worth in the predicate prior to presenting the question each time. The assessment results are displayed in Figures 10, 11, 12, 13, 14.

Figure 10

Query Execution Time for classicmodels Database.

Figure 11

Query Execution Time for sampledb Database.

Figure 12

Query Execution Time for northwind Database.

Figure 13

Query Execution Time for hrdb Database.

Figure 14

Query Execution Time for chinook Database.

4. Experiment and Analysis

This section presents the RDBMS model utilized for the relocation process. This section likewise portrays the information demonstrating and execution of both MySQL and Neo4j data sets. This section presents the results of the experiment. With qualitative and quantitative analysis, we observe the data migration process. As shown in Figure 2, the qualitative analysis is a proof of concept that involves migrating an existing relational database to a NoSQL graph database with the assistance of the SchemaCrawler Engine in accordance with the rule of Model Transformation from Relational to NoSQL graph database. The qualitative analysis aims to determine whether the migrated data have an effect on the applications’ ability to carry out a variety of SQL operations. In the quantitative examination, our goal is to comprehend and foresee conduct by the utilization of numerical estimations, factual displaying, and research. There are a number of uses for quantitative analysis, including efficiency, performance evaluation, and measurement. To play out this appraisal, we carried out an exploratory set-up with information and assessed the complete movement time for the various informational indexes. The size of the MySQL database, the operating system, the hardware configuration, and the related complexity of the tables all influence the data migration process.

For migrating database we have taken different dummy databases available online. The data set belongs to .sql format. We have imported those datasets in the MySQL server before migrating them from MySQL to Neo4j. Migration is performed in these two ways:

In the first phase, we map and extract Metadata from MySQL and retrieve database from with SchemaCrawler engine.
In the second phase, SchemaCrawler extract data set from MySQL and transfer that data into Neo4j with Model to Model Migration process.

To perform the complete migration procedure using enhanced form of the Shabana et al. () Transformation algorithm, first we come through the process of meta modeling and extracting data set from MySQL to SchemaCrawler using existing plain Java code as well as JDBC drivers. To perform the migration of SQL data sets from MySQL to Neo4j has been implemented with plain Java code as mentioned in the APPENDIX A. SchemaCrawler engine has been presented as intermediary for the data migration. It executes the transformation rules with certain directed transformation rules so that the data set from MySQL to Neo4j is performed.

4.1 Execution of Springboot Project with Intellij

SchemaCrawler engine (tool) has been used to implement model driven approach through model driven development. Model driven development is in the implementation with plain Java code to develop the automatic migration tool (software) to migrate and convert relational database from MySQL to Neo4j. The approach has been come into full-fledged implementation using springboot framework. Implementation of the tool maps schema and then extracting metadata and data from MySQL and directly transfer the relational database into Neo4j. Program code snipset has been placed in the APPENDIX A and the details of execution has been explained and clearly depicted in the APPENDIX B.

There are four different databases of MySQL. These relational databases of MySQL have been converted and successfully migrated to Neo4j. Details of the databases to be migrated from MySQL and results of the transformation time of respective database has been shown in Table 5.

Table 5

Data Conversion and Migration.


DATABASE	NUMBER OF RECORDS	TRANSFORMATION TIME (IN SECONDS)	RELATION (YES/NO)

Classicmodels	3,864	11 sec.	Yes

Sampled	50,000	12 sec.	No

Northwind	932	10 sec.	Yes

Hrdb	136	8 sec.	Yes

chinook	15,587	8 sec.	Yes

After the successful execution of the MySQL to Neo4j migrator tool with the help of SchemeCrawler engine, we come to the successful migration from the following important points:

There are different four databases of MySQL databases, for example, classicmodels, sampledb, northwind and hrdb used for migrating purpose.
Migration of each database in .sql format from MySQL to Neo4j has been performed successfully.
Resultant data values after transformation and conversion from MySQL to Neo4j is consistent and accurate.
Evaluation of consistency and accuracy of data transformation procedure is performed and explained in section 4.2.

4.2. Evaluation of Migrating Process

Model to Model transformation is evaluated by using different key metrics as per the quantitative as well qualitative evaluation methods ().

1) Schema information preservation

Migration correction is checked with the help of queries so that all records from source database to destination is migrated successfully as used in ().

We find out whether all records of database from MySQL to Neo4j are migrated by executing query on both SQL and Cypher in MySQL and Neo4j respectively. The SQL query and cypher query for three different database has been performed as shown below:

Database: classicmodels
SQL Query:
SELECT SUM(TABLE_ROWS) FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = ‘classicmodels’;
Cypher Query: MATCH (n) RETURN count(*)
Database: sampledb
SQL Query:
SELECT SUM(TABLE_ROWS) FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = ‘sampledb’;
Cypher Query: MATCH (n) RETURN count(*)
Database: northwind
SQL Query:
SELECT SUM(TABLE_ROWS) FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = ‘northwind’;
Cypher Query: MATCH (n) RETURN count(*)
Database: hrdb
SQL Query:
SELECT SUM(TABLE_ROWS) FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = ‘hrdb’;
Cypher Query: MATCH (n) RETURN count(*)

2) Data equivalence

By running queries against the source and destination databases, the migration process’ outcome has been assessed. The produced schemas have been verified against the data that the queries returned. With the aid of a set of queries (insert, delete, retrieve, update for relational databases, and build and match for NoSQL graph databases), we have assessed datasets and the essential characteristics of databases.

Using a series of predefined queries on the databases classicmodels, sampledb, northwind, and hrdb, we tested the suggested solution. To make sure the data are equivalent, the results have been manually examined as shown in the Tables 6, 7, 8, 9, 10.

Table 6

Query Elapsed Time in Milliseconds for Database classicmodels.


NO.	QUERY	ELAPSED TIME MYSQL	ELAPSED TIME NEO4J

1.	Insert	32 ms	4.33 ms

2.	Delete	703 ms	74 ms

3.	Retrieve	20 ms	46.67 ms

4.	Multiple retrieve	3 ms	18.33 ms

5.	Retrieval with single join	47 ms	11.33 ms

6.	Retrieval with multiple join	15 ms	1360 ms

7.	Update	47 ms	51.33 ms

Table 7

Query Elapsed Time in Milliseconds for Database Sampled.


NO.	QUERY	ELAPSED TIME MYSQL	ELAPSED TIME NEO4J

1.	Insert	0.0 ms	52 ms

2.	Delete	16 ms	6 ms

3.	Retrieve	16 ms	1209 ms

4.	Multiple retrieve	47 ms	1900 ms

5.	Retrieval with single join	No join	No join

6.	Retrieval with multiple join	No join	No join

7.	Update	0.0 ms	287 ms

Table 8

Query Elapsed Time in Milliseconds for Database northwind.


NO.	QUERY	ELAPSED TIME MYSQL	ELAPSED TIME NEO4J

1.	Insert	46 ms	216 ms

2.	Delete	31 ms	28 ms

3.	Retrieve	0.0 ms	215 ms

4.	Multiple retrieve	31 ms	156 ms

5.	Retrieval with single join	47 ms	199 ms

6.	Retrieval with multiple join	130 ms	212 ms

7.	Update	63 ms	330 ms

Table 9

Query Elapsed Time in Milliseconds for Database hrdb.


NO.	QUERY	ELAPSED TIME MYSQL	ELAPSED TIME NEO4J

1.	Insert	16 ms	56 ms

2.	Delete	381 ms	60 ms

3.	Retrieve	0.0 ms	16 ms

4.	Multiple retrieve	0.0 ms	32 ms

5.	Retrieval with single join	0.0 ms	17 ms

6.	Retrieval with multiple join	0.0 ms	11 ms

7.	Update	78 ms	143 ms

Table 10

Query Elapsed Time in Milliseconds for Database Chinook.


NO.	QUERY	ELAPSED TIME MYSQL	ELAPSED TIME NEO4J

1.	Insert	63 ms	305Ms

2.	Delete	15 ms	61 Ms

3.	Retrieve	0.0 ms	160 ms

4.	Multiple retrieve	0.0 ms	124 ms

5.	Retrieval with single join	16 ms	183ms

6.	Retrieval with multiple join	16 ms	156036 ms

7.	Update	16 ms	294 ms

From the above five results of query elapsed time in MySQL and Neo4j, we come to the following conclusion:

Query elapsed time for Insert, Delete and Retrieval with single join in Neo4j is better than in MySQL
Query elapsed time for single record retrieval, multiple record retrieval, update and retrieval with multiple join in MySQL is better than in comparison to Neo4j.

3) System Performance Comparison

Quality has been assessed by using evaluation of system performance. Query execution time has been implemented as a performance assessment metrics by many of the database systems. A set of queries has been developed to examine and measure the performance and various aspects of the functionality of graph data stores.

The query statements have been executed to assess the system functioning by assessing the execution time of each query. We have implemented tests on Core i3 1.80 GHz processor with 8 GB RAM and Windows 10. Each query for sourced and targeted databases is shown in for databases: classicmodels, sampledb, northwind, hrdb and chinook.

In the above diagrams, query elapsed time for insert, delete, retrieve, multiple retrieve, retrieval with single join, retrieval with multiple join and update have been plotted for every database: classicmodels, sampledb, northwind, hrdb and chinook. From the above presented plotted diagrams, the summary of the conclusion is:

Performance of cypher query is better for sing record insertion, single record deletion and record retrieval with single join in comparison to MySQL SQL query.
On the other hand, performance of SQL query for single record retrieval, multiple record retrieval, update and retrieval with multiple in better in that of cypher query in neo4j.

4) Other validation techniques

Here we have summarized the validation techniques from different studies to migrate SQL to NoSQL so that the work we have performed is validated and verified.

The conference paper presented by () proposed the method to migrate databases successfully from relational database to a graph database. The conversion accuracy has been measured through the analysis and comparison of data integrity, time cost, result validity. This has shown that the integrity and operability of the database before and after conversion are consistent.
The paper by () has validated the conversion results by comparing the total number of records in the SQL database with the number of nodes and edges created in the graph database. The nodes properties are validated for consistency using a probabilistic data structure.

Finally we come to the conclusion that the migrated result from source database, that is, MySQL, to a graph database, that is, Neo4j, is validated. The above-shown metrics 1, 2 and 3 have proven the integrity and consistency of the result while migrating and converting the database from SQL to NoSQL graph database.

5. Conclusion and Future Work

This research has worked on the enhancement of a transformation algorithm proposed by Shabana Ramzan et al. to convert and transfer databases from the relational database MySQL Server to the NoSQL graph database Neo4j. The migration performed is Model to Model data transformation. The migration model uses four basic components: input model, transformation description, and transformation engine and transformation rules to convert and migrate the database. The transformation module has automatically executed. The developed system completed the migration by mapping schema and extraction of data from MySQL Server database to Neo4j graph database. SchemaCrawler Tool has been used to execute transformation rules and perform automatic migration from MySQL to Neo4j. This migrating model has performed the migrating data, there is need of metamodeling from source database to target database. Future work of this study will be developing a transformation tool to migrate huge database from relational database to Neo4j using SchemaCrawler by solving the problem of heap memory issues while converting database into Neo4j.

The limitation of the research work is that if the data migration process is interrupted by any condition, it cannot be resumed from that point. The embed document limit (nested documents) is applied at once for all table which enforces all the tables to have same nested documents. The data modeling for nested documents is not available as a result for embedding documents it embeds all the data from particular table. The number of query limit and threads is handled automatically by system. The data migration is done within a system only, that is, it cannot migrate the data from one system to another system. This system that has been developed with an enhanced form of transformation algorithm proposed by Shabana Ramzan et al. smoothly works for small set of SQL data sets. The major limits of this research study are pointed below:

Migration can be performed for small datasets of SQL database.
Neo4j used in this study is Community version i.e. Neo4j 3.1.3

SchemaCrawler engine is used to execute transformation rule and map schema and extract data.

Additional Files

The additional files for this article can be found as follows:

Appendix A

Code Snippets. DOI: https://doi.org/10.5334/dsj-2024-035.s1

Appendix B

Execution of Code to Migrate Data from MySQL to SchemaCrawler. DOI: https://doi.org/10.5334/dsj-2024-035.s2

Competing Interests

The authors have no competing interests to declare.

References

Aftab, Z et al. 2020 Automatic NoSQL to relational database transformation with dynamic schema mapping. Scientific Programming, 2020. DOI: https://doi.org/10.1155/2020/8813350
Akehurst, D H et al. 2006 SiTra: Simple transformations in Java. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 4199 LNCS, pp. 351–364. DOI: https://doi.org/10.1007/11880240_25
Ba, V A C, Di, M P and Ba, V A C 2013 DATA MIGRATION FROM STANDARD SQL TO NoSQL, pp. 1–22. Thesis.
Christina, A J and Saranya, S 2018 Extending the Existing Data Migration Algorithm for MySQL To MongoDB and Analyzing Its Performance, 6(8), 440–446.
Database Schema Drawing tools – Database Processing n.d. Available at http://www.consuminglinkeddata.org/database-schema-drawing-tools [Last accessed 3 August 2022].
De Virgilio, R, Maccioni, A and Torlone, R 2014 R2G: A tool for migrating relations to graphs. Advances in Database Technology – EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings, pp. 640–643. DOI: https://doi.org/10.5441/002/edbt.2014.63.
Explore Your Database Schema with SchemaCrawler – DEV Community n.d. Available at https://dev.to/sualeh/explore-your-database-schema-with-schemacrawler-5341 [Last accessed 13 August 2022].
Fahrner, C and Vossen, G 1995 A survey of database design transformations based on the Entity-Relationship model. Data and Knowledge Engineering, 15(3): 213–250. DOI: https://doi.org/10.1016/0169-023X(95)00006-E
Feng, H and Huang, M 2022 An approach to converting relational database to graph database: From MySQL to Neo4j. 2022 IEEE 2nd International Conference on Power, Electronics and Computer Applications, ICPECA 2022, pp. 674–680. DOI: https://doi.org/10.1109/ICPECA53709.2022.9719151
Gowrisangar, K n.d. What is SchemaCrawler and Why would you need it? Geek Culture. Available at: https://medium.com/geekculture/what-is-schemacrawler-and-why-would-you-need-it-b9c496054533 [Last accessed 13 August 2022].
Gupta, M 2014 A novel approach to transform relational database into graph database using Neo4j. (June): 56. http://dspace.thapar.edu:8080/dspace/handle/10266/2800, Thesis.
Journal, A S, Bhandari, H L and Chitrakar, R 2020 Comparison of data migration techniques from SQL database to NoSQL database. Journal of Computer Engineering & Information Technology, 9(6): 1–10.
Khare, A 2016 A review of NoSQL databases, types and comparison with relational database. International Journal of Engineering Science and Computing, 3(7): 4963–4966.
Legler, F and Naumann, F 2007 A classification of schema mappings and analysis of mapping tools. Lecture Notes in Informatics (LNI), Proceedings – Series of the Gesellschaft fur Informatik (GI), P-103, pp. 449–463.
Liao, Y T et al. 2016 Data adapter for querying and transformation between SQL and NoSQL database. Future Generation Computer Systems, 65: 111–121. DOI: https://doi.org/10.1016/j.future.2016.02.002
Management, I M and Url, E 2018 Northumbria Research Link, pp. 25–35.
Mohamed, W, Abdel-Fattah, M A and Abdelgaber, S 2021 A comprehensive approach for converting relational to graph database using spark. Journal of Theoretical and Applied Information Technology, 99(12): 2972–2990.
Moniruzzaman, A B M and Hossain, S A 2013 NoSQL database: New era of databases for big data analytics – Classification, characteristics and comparison. (June). http://arxiv.org/abs/1307.0191.
MySQL Sample Database n.d. Available at https://www.mysqltutorial.org/mysql-sample-database.aspx [Last accessed 17 August 2021].
Pasteur, L and Koch, R 1941 Correct Schema Transformations*, 74(1934): 535–546.
Pokorný, J et al. 2017 Graph databases: Their power and limitations, pp. 58–69. DOI: https://doi.org/10.1007/978-3-319-24369-6_5
Putrama, I M and Martinek, P 2022 An automated graph construction approach from relational databases to Neo4j. IEEE Joint 22nd International Symposium on Computational Intelligence and Informatics and 8th International Conference on Recent Achievements in Mechatronics, Automation, Computer Science and Robotics, CINTI-MACRo 2022 – Proceedings, pp. 131–136. DOI: https://doi.org/10.1109/CINTI-MACRo57952.2022.10029438
Queiroz, J S et al. 2022 AMANDA: A middleware for automatic migration between different database paradigms. Applied Sciences (Switzerland), 12(12). DOI: https://doi.org/10.3390/app12126106
Rahma S. Al Mahruqi et al. 2020 Migrating webs applications from SQL TO NoSQL databases, (January). Thesis.
Ramzan, S et al. 2019 Intelligent data engineering for migration to NoSQL based secure environments. IEEE Access, 7, pp. 69042–69057. DOI: https://doi.org/10.1109/ACCESS.2019.2916912
SchemaCrawler – Free database schema discovery and comprehension tool n.d. Available at https://www.schemacrawler.com/ [Last accessed 15 March 2022].
SchemaCrawler Alternatives and Similar Software|AlternativeTo n.d. Available at https://alternativeto.net/software/schemacrawler/?fbclid=iwar2axgn-v7momm76l1qee9s5t4hroxtpap6dd0vmc3zdphngfa7sgqukrne [Last accessed 10 February 2024].
Yassine, F and Awad, M A 2019 Migrating from SQL to NOSQL database: Practices and analysis. Proceedings of the 2018 13th International Conference on Innovations in Information Technology, IIT 2018, pp. 58–62. DOI: https://doi.org/10.1109/INNOVATIONS.2018.8606019
Yoo, J, Lee, K H and Jeon, Y H 2018 Migration from RDBMS to NoSQL using column-level denormalization and atomic aggregates. Journal of Information Science and Engineering, 34(1): 243–259. DOI: https://doi.org/10.6688/JISE.2018.34.1.15.