Nothing Special   »   [go: up one dir, main page]

ALL Metrics
-
Views
-
Downloads
Get PDF
Get XML
Cite
Export
Track
Research Article

Exploratory graph analysis of the network data of the Ethereum blockchain

[version 1; peer review: 2 not approved]
PUBLISHED 09 Sep 2021
Author details Author details
OPEN PEER REVIEW
REVIEWER STATUS

This article is included in the Research Synergy Foundation gateway.

Abstract

Background: This research uses exploratory graph analysis to analyze the transaction data of the Ethereum network. This is achieved through network visualization and mathematical and statistical modelling of the network data.
Methods: The dataset used in this study was extracted from the Ethereum in the BigQuery public dataset, specifically selected transactions in July 2019. The transactions were firstly modelled as network graphs and then visualized using the Kamada-Kawai and force-directed graphs layouts. Further modelling was explored with classical random graph and network block, with emphasis on network cohesion, hierarchical clustering and community membership.
Results: Looking at the network visualization and hierarchical clustering of the data, the network shows 170 clusters, the largest having 135 members. Through random graph modelling the optimum number of clusters is shown to be 95. Referring to the generated dendrograms, notable large transactions center around the DRINK token, the Maximine Exchange, the Upbit2 Exchange and the IDEX Exchange, identified through public disclosure of their Ethereum addresses. The network graphs tend to go towards the DRINK smart contract and the Maximine Exchange, indicating deposit actions, while it is the opposite for the IDEX Exchange. Further analysis also shows a different number of communities than the expected number. Falling short of the expected 170 clusters, the model is not able to capture additional mechanism that may be present at the density and social interaction distribution level of the network. On the other hand, network block modelling shows only four major clusters out of the 170 expected clusters, an indication that the model is not able to capture the network sufficiently.
Conclusions: The study was able to capture and model the interconnectedness of the system with its notion of elements, in this case, the transactions on the network.

Keywords

Ethereum, graph analysis, blockchain, network modelling

Introduction

A blockchain is a distributed and replicated data structure in which a linked chain of digital information known as a block, is stored as a public database or ledger. Each of the blocks in the chain may contain zero or more records of transactions or exchanges. If another exchange happens on the network, a copy of the transaction is added to the record of every member on the network.1

Blockchain has several characteristics that make it so compelling. Firstly, blockchain is immutable. It cannot be corrupted, changed, or altered.2 A blockchain is also decentralized if it is a public one. There is no governing authority overseeing the network. In addition, information on the blockchain is cryptographically hashed, thus preserving privacy and integrity of the data that is stored.2 Ethereum is an open-source blockchain, known for its decentralized smart contract platform. It is programmable and allows for the fabrication and distribution of decentralized applications (DApps).3

This research investigates the transaction data of the Ethereum network, through a graph analysis approach. This is achieved through network visualization and mathematical and statistical modelling of the network data. The work presented here provides a targeted analysis of the Ethereum network, capturing the interconnectedness of the system with its notion of elements, in this case, the transactions on the network.

Related works

Small-world models were proposed by Watts et al.4 These creators were interested in the way that numerous networks in the genuine work show significant levels of clustering, yet with the little distance between most vertices. They recommended rather starting with a network graph with a cross-section structure, and afterward arbitrarily 'reworking' a little the level of the edges. Assuming we have many N vertices that are instigated on an intermittent style, each of the vertex joins its neighbors to their respective side.4

Barabási et al. proposed a preferential attachment model for modeling networks.5 The more associated a vertex is, the more probable it is to get new connections. A vertex with a further extent has a more grounded capacity to get joins added to the network.6

Exponential random graph models (ERGMs) have relations to generalized linear models. However, the appropriate specification and fitting of exponential random graph models can be more unobtrusive than that of a standard generalized linear model. Furthermore, a significant part of the standard inferential foundation accessible for generalized linear models, based upon asymptotic approximations to fitting chi-square distributions, is not in the consideration for exponential random graph models.7

Stochastic block models (SBM) are a class of models in the statistical data analysis of the network or graph data, and they can be utilized to find or comprehend the structure and the latent of a network graph for clustering. This model will in general deliver a graph containing networks, subsets described by being associated with edge densities.7

Methodology

Dataset

The dataset of transaction records of Ethereum was taken from the BigQuery public dataset (Ethereum in BigQuery).8 The attributes from_address and to_address will be used as nodes (vertices) and the relationship between them will be the edges of the nodes in this study. The attributes and their descriptions are shown in Table 1. In this exploratory study, only arbitrarily selected 1000 contiguous rows of transactions in July 2019 are considered.15

Table 1. Attributes and description of the dataset (Ethereum in BigQuery).

AttributesDescriptions
hashHash of the transaction
nonceThe number of transactions made by the sender prior to this one
transaction_indexInteger of the transactions index position in the block
from_addressAddress of the sender
to_addressAddress of the receiver. Null when it’s a contract creation transaction
valueValue transferred in Wei
gasGas provided by the sender
gas_priceGas price provided by the sender in Wei
inputThe data sent along with the transaction
receipt_cumulative_gas_usedThe total amount of gas used when this transaction was executed in the block

Software for analysis

The analysis is performed in R version 4.0.2, utilising the igraph R package for network analysis. The source code used can be found at Zenodo.16

Network graph

A graph G=VE is a set of E of links (edges), a set V of nodes (vertices), and where components of E are unordered pairs uv of distinct vertices u,vV . The number of vertices Nv=Vand the number of edgesNe=V are also called the order and size of the graph G, respectively.7

Visualizing the network

To visualize a large network having over 1000 vertex and edge attributes, few methods exist. Kamada, et al. proposed a technique to draw general graphs.9 This technique tends to be broadly utilized in the network structures that are managed by the system. The mathematical displacement between the node in the drawing can be identified by graph-theoretic displacement. The spring algorithm proposed by Kamada, et al. has good properties such as symmetric drawings, edge crossing with a relatively small number of edges, and isomorphic graphs with almost congruent drawings.9

The DrL method provides two-dimensional visualizations of exceptionally huge theoretical chart structures.10 The graph is drawn employing a force-directed algorithm based on recreated strengthening. This clustering is utilized to create a coarsened graph (less vertex) which is at that point redrawn. This handle is rehashed until an adequately small graph is produced.10

Charactering network cohesion

One way to deal with characterizing the network cohesion of a specific network graph is through the identification of the subgraph of interest. The standard case is that of finished subgraphs and henceforth are subgroups of vertices that are completely strong, as in all vertices inside the subgroup are associated by edges. Large groups of subgraphs are considered rare in practice because they need the required network graph itself to be dense.7

dⅇnH=EHVHVH12 is the general formula of the density of a subgraph H=VHEH, that the recurrence of acknowledged edges is comparative with possible edges, in an undirected graph G which has no self-loops and various edges. In the situation that if G is a directed graph, VHVH1 will be replaced with the denominator of the equation above. The value of zero and one in denH is to justify how near the subgraph H is to be becoming a clique in the network.7

In graph clustering, the relative frequency also can be one of the methods to defining a cluster. The standard use of the term clustering coefficient normally means the quantityclTG=3τΔGτ3G , where τΔG is the number of triangles in the graph G, and τ3G is the number of connected triples. The value of clTG is the transitivity of the graph and is also referred to as the fraction of transitive triples A graph G is fully connected if each node is reachable from all other nodes, and that a graph’s connected component is a maximally connected subgraph.7

Hierarchical clustering

Various methods have been proposed for clustering, varying in essentially by the way they assess the nature of the proposed clustering and quality of the enhancement brought about by the algorithms.11 These strategies adopt a greedy strategy to looking through the states of all potential partitions C, through altering progressive applicant partitions iteratively.

One of the most popular measures is modularity.10 Let C=C1Ck as a given candidate partition and define fij=fijC to be the fraction of edges in the original network that connect vertices in C1 with vertices in Cj. The modularity of C is the value modC=k=1kfkkCfkk2, where fkk is the expected value of fkk under some model of random edge assignment.10

Classical random graph modelling

Graph modelling refers to a collection θG,Gg:θω, where g is a collection of possible graphs, θ is a probability distribution on g, and θ is a vector of parameters, ranging over possible values in ω. A collection g and a uniform probability · over g is usually used to refer to a random graph model.7 Erdos et al. established a series of the seminal paper on the classical theory of random graph models.8 Erdos et al.’s model specifies a collection of gNv,Ne of all graph G=VE with V=Nv and E=Ne, and assigns probability G=NNe1 to each GgNv,Ne, where N=Nv2 is the total number of distinct vertex pairs.8

Network block models

In a random graph, G, let G=VE. The graph can be categorized into one of Q classes. Each element of the block model for a particular graph can be specified as each element of the adjacency matrix Y, labelled as q and r of vertices i and j, respectively, with a probability of Bernoulli random variable. Snijders, et al. introduced a parametric limit of the form.13

Results and discussions

The network graph (Figure 1) shows a total number of 170 clusters among 1000 rows of data. The top three clusters inside the network graph consist of the size of 135 vertices, followed by 55 vertices, and 39 vertices. There are 103 clusters having a pair of vertices, that is the smallest cluster size in this network.

5783d25e-af65-481b-875a-5afaa23dbf47_figure1.gif

Figure 1. Network graph.

The Kamada-Kawai layout uses a combination of spring model and the technique local minimization of global energy.15 The graph is shown in Figure 2.

5783d25e-af65-481b-875a-5afaa23dbf47_figure2.gif

Figure 2. Kamada-Kawai graph.

Clauset proposed a fast and greedy approach to optimization through an agglomerative type of hierarchical clustering algorithm.11 Figure 3 shows the outputs from the clustering, while Figure 4 shows the dendogram from the clustering. The clusters are the same as the ones shown from the network graph (Figure 1). There are also many two-member communities in this network, which may indicate disposable addresses after single use.

5783d25e-af65-481b-875a-5afaa23dbf47_figure3.gif

Figure 3. Output of hierarchical clustering.

5783d25e-af65-481b-875a-5afaa23dbf47_figure4.gif

Figure 4. Dendogram from the hierarchical clustering.

Looking into the clusters, Figures 5 and 6 show the membership of the largest cluster, with 135 addresses. We notice that all the addresses point toward address 0x0089659f609933d16a5cd6c2be1a5dca1abe24ad. Looking at the Ethereum blockchain Explorer, Etherscan,14 the address belongs to the DrinkChain (DRINK) token.

5783d25e-af65-481b-875a-5afaa23dbf47_figure5.gif

Figure 5. DrinkChain cluster.

5783d25e-af65-481b-875a-5afaa23dbf47_figure6.gif

Figure 6. Dendogram of the DrinkChain cluster.

Figures 7 and 8 show the membership of the second-largest cluster which contains a total number of 55 addresses. Members of the clusters point toward the address 0x8e766f57f7d16ca50b4a0b90b88f6468a09b0439.

5783d25e-af65-481b-875a-5afaa23dbf47_figure7.gif

Figure 7. Maximine Coin cluster.

5783d25e-af65-481b-875a-5afaa23dbf47_figure8.gif

Figure 8. Dendogram of the Maximine Coin cluster.

This address belongs to the token, Maximine Coin (MXM).

Figures 9 and 10 show a cluster with 39 addresses, they all point to the address 0xBA826fEc90CEFdf6706858E5FbaFcb27A290Fbe0.

5783d25e-af65-481b-875a-5afaa23dbf47_figure9.gif

Figure 9. Upbit2 Coin cluster.

5783d25e-af65-481b-875a-5afaa23dbf47_figure10.gif

Figure 10. Dendogram of the Upbit2 cluster.

This address belongs to Upbit2, a digital asset exchange in South Korea. This cluster is most probably showing withdrawals as the network graphs are pointing out from the center.

Figures 11 and 12 show a cluster with 20 addresses, they all point to the address 0x2a0c0DBEcC7E4D658f48E01e3fA353F44050c208.

5783d25e-af65-481b-875a-5afaa23dbf47_figure11.gif

Figure 11. IDEX cluster.

5783d25e-af65-481b-875a-5afaa23dbf47_figure12.gif

Figure 12. Dendogram of the IDEX cluster.

This address belongs to IDEX, a decentralized exchange.

From hierarchical clustering, it is shown there are 170 communities.

To examine this further, network modelling was considered. In this case, two assumptions were made. The first one, the same order of vertex and size, 800 and 1000 respectively, was maintained. The second is the same as the first, but with a fixed degree of sequence.

In random graph modelling, Monte Carlo methods were used to generate approximations quickly over 1000 trials. For each of the trials, community detection was employed. Figure 13 shows the number of communities. It is shown both assumptions detected 24 and 95 communities, respectively. This is nowhere close to the actual 170 number of communities. There could be additional mechanism at work which complicate the density and the social interaction distribution in modeling the network.

5783d25e-af65-481b-875a-5afaa23dbf47_figure13.gif

Figure 13. Relative frequency vs number of communities.

Figure 14 shows the integrated conditional likelihood (ICL), with its corresponding classes Q. It is shown that the fitted model has 4 classes, which suggests a suitable latitude of 4 for this model.

5783d25e-af65-481b-875a-5afaa23dbf47_figure14.gif

Figure 14. Integrated conditional likelihood (ICL) vs classes.

To investigate further, we estimated the posterior probability of each of the class members. We noticed that the evidence for the class membership assignment shows a very strong relationship between vertices with a maximum posterior probability of 99%. From the reorganized adjacency matrix (Figure 15), 0.57894737 and 0.38596491 will be the larger proportions and the rest the smaller class proportions.

5783d25e-af65-481b-875a-5afaa23dbf47_figure15.gif

Figure 15. Reorganized adjacency matrix.

Figure 16 shows the visual summary of the class connection probabilities and the concordance of the 4 classes. Falling short of the expected 170 clusters, the modelling effort was not able to capture additional mechanism that may be present at the density and social interaction distribution level of the network.

5783d25e-af65-481b-875a-5afaa23dbf47_figure16.gif

Figure 16. Summary of class connection probabilities.

Conclusions

The paper presented a targeted analysis of the Ethereum network, capturing the interconnectedness of the system with its notion of elements, in this case, the transactions on the network. The study takes into account only selected transactions in July 2019, which may not be generally representative of the Ethereum network. This will be addressed in future work.

Data availability

Underlying data

Zenodo: Extracted Ethereum transactions for July 2019 from the Ethereum in BigQuery public dataset. https://doi.org/10.5281/zenodo.5263166.15

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Extended data

Analysis code available from: https://github.com/tzenvun/exploratory-network-eth.

Archived analysis code as at time of publication: https://doi.org/10.5281/zenodo.5336085.16

License: GNU General Public License v3.0.

Comments on this article Comments (0)

Version 1
VERSION 1 PUBLISHED 09 Sep 2021
Comment
Author details Author details
Competing interests
Grant information
Copyright
Download
 
Export To
metrics
Views Downloads
F1000Research - -
PubMed Central
Data from PMC are received and updated monthly.
- -
Citations
CITE
how to cite this article
Yap TTV, Ho TF, Ng H and Goh VT. Exploratory graph analysis of the network data of the Ethereum blockchain [version 1; peer review: 2 not approved]. F1000Research 2021, 10:908 (https://doi.org/10.12688/f1000research.73141.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.
track
receive updates on this article
Track an article to receive email alerts on any updates to this article.

Open Peer Review

Current Reviewer Status: ?
Key to Reviewer Statuses VIEW
ApprovedThe paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations A number of small changes, sometimes more significant revisions are required to address specific details and improve the papers academic merit.
Not approvedFundamental flaws in the paper seriously undermine the findings and conclusions
Version 1
VERSION 1
PUBLISHED 09 Sep 2021
Views
2
Cite
Reviewer Report 21 May 2024
Satpal Singh Kushwaha, Manipal University Jaipur, Jaipur, Rajasthan, India 
Not Approved
VIEWS 2
The paper titled "Exploratory Graph Analysis of the Network Data of the Ethereum Blockchain [version 1; peer review: 1 not approved]" presents some notable findings; however, there are significant issues with the dataset and references that undermine the credibility of ... Continue reading
CITE
CITE
HOW TO CITE THIS REPORT
Kushwaha SS. Reviewer Report For: Exploratory graph analysis of the network data of the Ethereum blockchain [version 1; peer review: 2 not approved]. F1000Research 2021, 10:908 (https://doi.org/10.5256/f1000research.76770.r274737)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Views
5
Cite
Reviewer Report 04 Mar 2022
Barbara Guidi, Department of Computer Science, University of Pisa, Pisa, Italy 
Not Approved
VIEWS 5
The paper proposes a graph analysis of the Ethereum blockchain. The paper could be interesting but the main issue is the size of the dataset, and its age - the data are too old. It is not clear why the ... Continue reading
CITE
CITE
HOW TO CITE THIS REPORT
Guidi B. Reviewer Report For: Exploratory graph analysis of the network data of the Ethereum blockchain [version 1; peer review: 2 not approved]. F1000Research 2021, 10:908 (https://doi.org/10.5256/f1000research.76770.r121271)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.

Comments on this article Comments (0)

Version 1
VERSION 1 PUBLISHED 09 Sep 2021
Comment
Alongside their report, reviewers assign a status to the article:
Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations - A number of small changes, sometimes more significant revisions are required to address specific details and improve the papers academic merit.
Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions
Sign In
If you've forgotten your password, please enter your email address below and we'll send you instructions on how to reset your password.

The email address should be the one you originally registered with F1000.

Email address not valid, please try again

You registered with F1000 via Google, so we cannot reset your password.

To sign in, please click here.

If you still need help with your Google account password, please click here.

You registered with F1000 via Facebook, so we cannot reset your password.

To sign in, please click here.

If you still need help with your Facebook account password, please click here.

Code not correct, please try again
Email us for further assistance.
Server error, please try again.