Keywords
Ethereum, graph analysis, blockchain, network modelling
This article is included in the Research Synergy Foundation gateway.
Ethereum, graph analysis, blockchain, network modelling
A blockchain is a distributed and replicated data structure in which a linked chain of digital information known as a block, is stored as a public database or ledger. Each of the blocks in the chain may contain zero or more records of transactions or exchanges. If another exchange happens on the network, a copy of the transaction is added to the record of every member on the network.1
Blockchain has several characteristics that make it so compelling. Firstly, blockchain is immutable. It cannot be corrupted, changed, or altered.2 A blockchain is also decentralized if it is a public one. There is no governing authority overseeing the network. In addition, information on the blockchain is cryptographically hashed, thus preserving privacy and integrity of the data that is stored.2 Ethereum is an open-source blockchain, known for its decentralized smart contract platform. It is programmable and allows for the fabrication and distribution of decentralized applications (DApps).3
This research investigates the transaction data of the Ethereum network, through a graph analysis approach. This is achieved through network visualization and mathematical and statistical modelling of the network data. The work presented here provides a targeted analysis of the Ethereum network, capturing the interconnectedness of the system with its notion of elements, in this case, the transactions on the network.
Small-world models were proposed by Watts et al.4 These creators were interested in the way that numerous networks in the genuine work show significant levels of clustering, yet with the little distance between most vertices. They recommended rather starting with a network graph with a cross-section structure, and afterward arbitrarily 'reworking' a little the level of the edges. Assuming we have many N vertices that are instigated on an intermittent style, each of the vertex joins its neighbors to their respective side.4
Barabási et al. proposed a preferential attachment model for modeling networks.5 The more associated a vertex is, the more probable it is to get new connections. A vertex with a further extent has a more grounded capacity to get joins added to the network.6
Exponential random graph models (ERGMs) have relations to generalized linear models. However, the appropriate specification and fitting of exponential random graph models can be more unobtrusive than that of a standard generalized linear model. Furthermore, a significant part of the standard inferential foundation accessible for generalized linear models, based upon asymptotic approximations to fitting chi-square distributions, is not in the consideration for exponential random graph models.7
Stochastic block models (SBM) are a class of models in the statistical data analysis of the network or graph data, and they can be utilized to find or comprehend the structure and the latent of a network graph for clustering. This model will in general deliver a graph containing networks, subsets described by being associated with edge densities.7
Dataset
The dataset of transaction records of Ethereum was taken from the BigQuery public dataset (Ethereum in BigQuery).8 The attributes from_address and to_address will be used as nodes (vertices) and the relationship between them will be the edges of the nodes in this study. The attributes and their descriptions are shown in Table 1. In this exploratory study, only arbitrarily selected 1000 contiguous rows of transactions in July 2019 are considered.15
Software for analysis
The analysis is performed in R version 4.0.2, utilising the igraph R package for network analysis. The source code used can be found at Zenodo.16
Network graph
A graph is a set of of links (edges), a set of nodes (vertices), and where components of are unordered pairs of distinct vertices . The number of vertices and the number of edges are also called the order and size of the graph , respectively.7
Visualizing the network
To visualize a large network having over 1000 vertex and edge attributes, few methods exist. Kamada, et al. proposed a technique to draw general graphs.9 This technique tends to be broadly utilized in the network structures that are managed by the system. The mathematical displacement between the node in the drawing can be identified by graph-theoretic displacement. The spring algorithm proposed by Kamada, et al. has good properties such as symmetric drawings, edge crossing with a relatively small number of edges, and isomorphic graphs with almost congruent drawings.9
The DrL method provides two-dimensional visualizations of exceptionally huge theoretical chart structures.10 The graph is drawn employing a force-directed algorithm based on recreated strengthening. This clustering is utilized to create a coarsened graph (less vertex) which is at that point redrawn. This handle is rehashed until an adequately small graph is produced.10
Charactering network cohesion
One way to deal with characterizing the network cohesion of a specific network graph is through the identification of the subgraph of interest. The standard case is that of finished subgraphs and henceforth are subgroups of vertices that are completely strong, as in all vertices inside the subgroup are associated by edges. Large groups of subgraphs are considered rare in practice because they need the required network graph itself to be dense.7
is the general formula of the density of a subgraph , that the recurrence of acknowledged edges is comparative with possible edges, in an undirected graph G which has no self-loops and various edges. In the situation that if is a directed graph, will be replaced with the denominator of the equation above. The value of zero and one in is to justify how near the subgraph is to be becoming a clique in the network.7
In graph clustering, the relative frequency also can be one of the methods to defining a cluster. The standard use of the term clustering coefficient normally means the quantity , where is the number of triangles in the graph , and is the number of connected triples. The value of is the transitivity of the graph and is also referred to as the fraction of transitive triples A graph is fully connected if each node is reachable from all other nodes, and that a graph’s connected component is a maximally connected subgraph.7
Hierarchical clustering
Various methods have been proposed for clustering, varying in essentially by the way they assess the nature of the proposed clustering and quality of the enhancement brought about by the algorithms.11 These strategies adopt a greedy strategy to looking through the states of all potential partitions C, through altering progressive applicant partitions iteratively.
One of the most popular measures is modularity.10 Let as a given candidate partition and define to be the fraction of edges in the original network that connect vertices in with vertices in . The modularity of is the value , where is the expected value of under some model of random edge assignment.10
Classical random graph modelling
Graph modelling refers to a collection , where is a collection of possible graphs, is a probability distribution on , and is a vector of parameters, ranging over possible values in . A collection and a uniform probability over is usually used to refer to a random graph model.7 Erdos et al. established a series of the seminal paper on the classical theory of random graph models.8 Erdos et al.’s model specifies a collection of of all graph with and , and assigns probability to each , where is the total number of distinct vertex pairs.8
Network block models
In a random graph, G, let . The graph can be categorized into one of Q classes. Each element of the block model for a particular graph can be specified as each element of the adjacency matrix Y, labelled as q and r of vertices i and j, respectively, with a probability of Bernoulli random variable. Snijders, et al. introduced a parametric limit of the form.13
The network graph (Figure 1) shows a total number of 170 clusters among 1000 rows of data. The top three clusters inside the network graph consist of the size of 135 vertices, followed by 55 vertices, and 39 vertices. There are 103 clusters having a pair of vertices, that is the smallest cluster size in this network.
The Kamada-Kawai layout uses a combination of spring model and the technique local minimization of global energy.15 The graph is shown in Figure 2.
Clauset proposed a fast and greedy approach to optimization through an agglomerative type of hierarchical clustering algorithm.11 Figure 3 shows the outputs from the clustering, while Figure 4 shows the dendogram from the clustering. The clusters are the same as the ones shown from the network graph (Figure 1). There are also many two-member communities in this network, which may indicate disposable addresses after single use.
Looking into the clusters, Figures 5 and 6 show the membership of the largest cluster, with 135 addresses. We notice that all the addresses point toward address 0x0089659f609933d16a5cd6c2be1a5dca1abe24ad. Looking at the Ethereum blockchain Explorer, Etherscan,14 the address belongs to the DrinkChain (DRINK) token.
Figures 7 and 8 show the membership of the second-largest cluster which contains a total number of 55 addresses. Members of the clusters point toward the address 0x8e766f57f7d16ca50b4a0b90b88f6468a09b0439.
This address belongs to the token, Maximine Coin (MXM).
Figures 9 and 10 show a cluster with 39 addresses, they all point to the address 0xBA826fEc90CEFdf6706858E5FbaFcb27A290Fbe0.
This address belongs to Upbit2, a digital asset exchange in South Korea. This cluster is most probably showing withdrawals as the network graphs are pointing out from the center.
Figures 11 and 12 show a cluster with 20 addresses, they all point to the address 0x2a0c0DBEcC7E4D658f48E01e3fA353F44050c208.
This address belongs to IDEX, a decentralized exchange.
From hierarchical clustering, it is shown there are 170 communities.
To examine this further, network modelling was considered. In this case, two assumptions were made. The first one, the same order of vertex and size, 800 and 1000 respectively, was maintained. The second is the same as the first, but with a fixed degree of sequence.
In random graph modelling, Monte Carlo methods were used to generate approximations quickly over 1000 trials. For each of the trials, community detection was employed. Figure 13 shows the number of communities. It is shown both assumptions detected 24 and 95 communities, respectively. This is nowhere close to the actual 170 number of communities. There could be additional mechanism at work which complicate the density and the social interaction distribution in modeling the network.
Figure 14 shows the integrated conditional likelihood (ICL), with its corresponding classes Q. It is shown that the fitted model has 4 classes, which suggests a suitable latitude of 4 for this model.
To investigate further, we estimated the posterior probability of each of the class members. We noticed that the evidence for the class membership assignment shows a very strong relationship between vertices with a maximum posterior probability of 99%. From the reorganized adjacency matrix (Figure 15), 0.57894737 and 0.38596491 will be the larger proportions and the rest the smaller class proportions.
Figure 16 shows the visual summary of the class connection probabilities and the concordance of the 4 classes. Falling short of the expected 170 clusters, the modelling effort was not able to capture additional mechanism that may be present at the density and social interaction distribution level of the network.
The paper presented a targeted analysis of the Ethereum network, capturing the interconnectedness of the system with its notion of elements, in this case, the transactions on the network. The study takes into account only selected transactions in July 2019, which may not be generally representative of the Ethereum network. This will be addressed in future work.
Zenodo: Extracted Ethereum transactions for July 2019 from the Ethereum in BigQuery public dataset. https://doi.org/10.5281/zenodo.5263166.15
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
Analysis code available from: https://github.com/tzenvun/exploratory-network-eth.
Archived analysis code as at time of publication: https://doi.org/10.5281/zenodo.5336085.16
License: GNU General Public License v3.0.
Views | Downloads | |
---|---|---|
F1000Research | - | - |
PubMed Central
Data from PMC are received and updated monthly.
|
- | - |
Is the work clearly and accurately presented and does it cite the current literature?
No
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
Partly
If applicable, is the statistical analysis and its interpretation appropriate?
Partly
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Blockchain Technology
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
No
If applicable, is the statistical analysis and its interpretation appropriate?
Partly
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Partly
References
1. Zhao Lin, Sen Gupta Sourav, Khan Arijit, Luo Robby: Temporal Analysis of the Entire Ethereum Blockchain Network. Proceedings of the Web Conference 2021. 2021. Publisher Full TextCompeting Interests: No competing interests were disclosed.
Reviewer Expertise: Blockchain, Peer to Peer Systems, Social Media
Alongside their report, reviewers assign a status to the article:
Invited Reviewers | ||
---|---|---|
1 | 2 | |
Version 1 09 Sep 21 |
read | read |
Provide sufficient details of any financial or non-financial competing interests to enable users to assess whether your comments might lead a reasonable person to question your impartiality. Consider the following examples, but note that this is not an exhaustive list:
Sign up for content alerts and receive a weekly or monthly email with all newly published articles
Already registered? Sign in
The email address should be the one you originally registered with F1000.
You registered with F1000 via Google, so we cannot reset your password.
To sign in, please click here.
If you still need help with your Google account password, please click here.
You registered with F1000 via Facebook, so we cannot reset your password.
To sign in, please click here.
If you still need help with your Facebook account password, please click here.
If your email address is registered with us, we will email you instructions to reset your password.
If you think you should have received this email but it has not arrived, please check your spam filters and/or contact for further assistance.
Comments on this article Comments (0)