1 Introduction

It has become common for offshore businesses to use remote or cloud computers located in their home country to achieve higher efficiency, safety and dependability (e.g., recovery from disaster or serious failure). Smooth and efficient web-based video conferences with offices in other countries are also important for the business activity of many multinational enterprises. Because of the cost, such services are usually accessed via the open (ordinary or public) Internet.

Nowadays, the open Internet is no longer transparent due to providers’ differentiation of traffic, restrictions by governmental agencies (authorities or organizations), and network outages caused by cyber-attacks. Three early studies [1–3] reported blocks for BitTorrent [2, 3] or for general traffic based on port numbers [1].

In some countries, international communication channels of the Internet are continuously monitored by governmental agencies and can be suddenly restricted. For instance, the Golden Shield (GS) [4] is a Chinese surveillance project that manages connections to and from foreign countries. A subsystem of the Golden Shield, known as “the Great Firewall”, blocks access to the Internet. Packets coming from foreign IP addresses are prevented from being routed through. “The Great Firewall” consists of so-called standard firewalls (e.g., Windows firewall) and proxy servers located at China’s main Internet gateways. The Great Firewall is also known to selectively engage in Domain Name System (DNS) cache poisoning when particular sites are requested.

Meanwhile, blocks by the Great Firewall or GS apply not only to individual IP addresses, which are targeted by Transmission Control Protocol (TCP) Reset [5], but also to larger groups of IP addresses that include the targets. In other words, a GS block can disable communications with groups of IP addresses and even multiple subnets connected through many international channels related to the targets. GS successively shuts down or blocks international channels connecting the target-related subnets if the destination addresses are close to the target sites or in these subnets. In this way, business sites also get involved and are significantly affected. Because of such limited connection interfaces with other domains, the network within the Great Firewall or GS is often considered a single autonomous routing domain. GS blocks cause business discontinuity for users of application servers located outside this domain. For such (offshore business) users, GS significantly degrades network response during Web-based video conferences, access to cloud or remote computers, etc.

Tunneling by VPN has long been used to circumvent blocks or censorship. Indeed, there are many VPN services in China, and users can manually switch to a VPN if GS blocks occur. However, manual switching means that business users notice significant network latency. It requires time and effort, and may cause loss of business opportunities. Furthermore, manual switching to VPN can itself be discouraged by governmental censorship. Since late 2012, the Great Firewall has been trying to block VPNs: well-known VPN services are reported to be censored and blocked. This problem is alleviated by free volunteer VPN gateway servers and their collaborative detection, using spy lists [6], of governmental attack or censorship. However, volunteer VPN servers are not stable enough for business use, as their number or quality may change, even if they are free. In addition, besides monitoring known VPN servers, the Great Firewall applies entropy-based tests to sampled packets to identify what encrypted VPN traffic looks like and kill suspect VPN connections [7]. Therefore, manual activation of VPNs to static remote addresses is no longer a viable option.

As technologies related to network virtualization or tunneling, there are various overlay network technologies such as peer to peer (P2P) (e.g., BitTorrent, CDN [8]) and STT [9]. P2P aims at achieving efficiency and anonymity through distribution, which is a very different purpose from circumventing intentional blocks. STT is a promising state-of-the-art overlay network or network virtualization technology used in Software-Defined Networks (SDN). It uses IP tunneling to support automatic switching of a physical network among multi-tenant virtual machines (VMs) belonging to virtual data centers [10]. SDN is an emergent computing and networking paradigm, and has become one of the most popular topics in the IT industry. It separates the control and data communication layers. SDN latency control can be exploited for attacks but has mostly been used to enforce uniform latency in a virtual network [11].

In [12], the authors discuss how to provide open APIs to SDN virtual routers to support development of network-as-a-service applications. Although they do not mention latency control explicitly, their proposal would allow writing applications to handle and manage latency according to requirements. The idea of latency control at the cloud periphery is discussed in detail in [13]. Unfortunately, currently available SDN latency control technology cannot manage Internet latency by remapping the IP destination addresses of open Internet connections between remote clients and the virtual data center. Supporting network switching at such peripheries would require special-purpose virtual devices, which are not yet widespread and are costly. Meanwhile, VPN technology is widespread, simple and practical.

In this paper, we propose an automatic VPN bypass method for network virtualization that allows offshore business activities which need to circumvent connectivity blocks to stay connected to computers in the home country. We rely on the notion that the blocking procedure cannot be changed at will, and that an “RTT signature” of the block onset can be identified. By analyzing the phenomenon, we discovered that the Round Trip Time (RTT) of Internet Control Message Protocol (ICMP) echo increases stepwise, in increments of 50–500 ms per step, over a period from 1 h (recently often from 10 min) to several hours during GS blocks. Exploiting this staircase waveform of network delay, our method automatically recognizes the block at an early stage and activates a VPN bypass before users experience significant degradation of network response. Intelligent routers in users’ offices automatically switch the network, modifying the destination address from the open Internet one to the (dynamic) one of a VPN server to preserve network response or QoS during blocks.

Our method confines VPN utilization to the duration of the block in order to (1) keep the VPN server address itself from becoming a target of blocking, and alleviate problems such as the previously mentioned entropy-based identification of VPN traffic by reducing the probability of successful sampling, and (2) reduce cost. Thus, to ensure the prompt release of VPN links at the end of GS blocks, we use asymmetric criteria for deciding when to start and when to finish the bypass: (1) differential values of network latency are used together with absolute threshold values to detect the onset of GS blocks at an early stage; (2) only absolute threshold values are used, together with a continuity time, to identify the removal of the block and go back to the open Internet connection.

Lately, the interval between staircase steps has become shorter, from several hours to tens of minutes. Some steps that have a smaller RTT increase ratio (e.g., 20 %) appear between larger steps. In such cases, the riser of each step does not exceed an RTT increase ratio of 30 %. It is important to remark that the RTT increase ratio (%) of such step risers may also vary a little due to incomplete congestion control on the part of the Internet. Thus, in some cases, the onset of a block may be characterized by successive small RTT increases, each below 30 % in RTT increase ratio. Indeed, such noise may partially mask the abrupt RTT increase caused by GS, making it difficult to predict the onset of blocks.

Even when these diversion techniques are employed, our method, in spite of its simplicity, detects the block’s onset with sufficient accuracy: analysis of over a hundred real blocking data records shows that our method works correctly in more than 97 % of the cases. In this way, business users scarcely notice the blocks in the international channel.

This paper is organized as follows. Section 2 outlines related work and its difference to our approach. Section 3 illustrates practical problems in China including governmental restriction on network access in foreign countries. Section 4 describes our network virtualization method to enable stable and dependable communication services for offshore business. Section 5 evaluates the effects. Section 6 concludes the paper.

2 Related work

In case of blocks caused by unintentional or natural network (including network device) failures, detour routes are automatically activated by ordinary routing algorithms. However, if the attack is performed by network operators or GS, even such detours will be blocked persistently. In our method, data are transferred through paths that are different from the routes provided by the underlying IP network layer. For this reason, we define our method as an overlay network or network virtualization. In other words, our bypass mechanism is performed not by network carriers (who may be involved in setting up the blocks) but by users (i.e., their intelligent routers) who automatically switch to alternatives from paths blocked or deteriorated.

2.1 Overlay network-related work

A typical overlay network is a peer-to-peer (P2P) system where peers communicate with each other by superimposing a protocol layer over IP. In this way, a peer can have a stable identity and neighborhood in the overlay layer while continuously changing its IP address [14]. There exist various P2P implementations, mainly aiming at data distribution efficiency and anonymity.

In file exchange software such as BitTorrent [15], each node of a P2P network holds a file or a piece of it; upon request from a certain node, the data are distributed from the node having better communication (cf. the one “closest”) to the requesting node. This improves communication speed, network traffic, and data distribution. Additionally, anonymity can be attained by interposing many intermediary nodes between data distributors and receivers, as in the Tor Project [16], making it difficult to identify the original information source. A content delivery/distribution network (CDN) is another type of overlay network where users can receive the data efficiently from the mirror server closest to them.

Besides the aforementioned overlay technologies, any Virtual Private Network (VPN), used as a single network to put distant LANs in contact via the Internet, can be seen as an overlay network. Various technologies for VPN are available: standard technologies such as Security Architecture for Internet Protocol (IPSec) [17], which implements a VPN on Layer 3, and L2TP [18] on Layer 2; and Point-to-Point Tunneling Protocol (PPTP) [19], originally a vendor specification but now often used as a de facto standard. There are also VPN products that provide a VPN via Secure Sockets Layer (SSL). Our system automatically switches from the Internet to VPNs to avoid blocks of the public (open) Internet links and preserve QoS or network response for offshore services.

STT [9] is also an overlay network technology used in SDNs (Software-Defined Networks) for network virtualization [10], creating multiple virtual networks to be shared among multi-tenant VMs (Virtual Machines) within a datacenter. Such virtualization is promising but, as we mentioned in the previous section, does not work at the cloud periphery, i.e., on the Internet links between remote clients and the datacenter.

The multilayer overlay network architecture (MON) [20] was proposed for enhancing the availability of IP services against Denial-of-Service (DoS) attacks. In this work, a DDME (DoS Detection and Mitigation Engine) keeps track of all the incoming flows using a flow information monitor (FIM). The FIM records the number of packets observed per flow, and the time at which each flow was initiated. Then, the DDME periodically examines (with a period T) whether the packet rate of any flow has exceeded the configurable threshold for legitimate traffic. If this is the case, packets belonging to this specific flow are “punished” (i.e., delayed or dropped). Unlike our proposed method, DDME uses neither RTT nor its differential value but a threshold on the packet rate. DDME includes an exponential punishment mechanism [21] and a sophisticated attack detection technology that depends on the observation of attacking packets. However, this method is not applicable if attacks are performed on networks that are not under the users’ control, as in the case of GS.
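To make the contrast with our RTT-based detection concrete, the packet-rate check described above can be sketched roughly as follows. This is only an illustrative sketch and not the implementation of [20]: the class and function names, the definition of the rate as packets divided by the time elapsed since the flow started, and the threshold value are all assumptions made here.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List


@dataclass
class FlowRecord:
    """What the FIM is described as keeping: start time and packet count."""
    started_at: float   # seconds, time the flow was first seen
    packets: int = 0    # packets observed so far


class FlowInformationMonitor:
    """Hypothetical FIM: one record per flow identifier."""

    def __init__(self) -> None:
        self.flows: Dict[Hashable, FlowRecord] = {}

    def observe(self, flow_id: Hashable, now: float) -> None:
        # Create the record on the first packet, then count every packet.
        record = self.flows.setdefault(flow_id, FlowRecord(started_at=now))
        record.packets += 1


def flows_over_threshold(fim: FlowInformationMonitor, now: float,
                         max_rate_pps: float) -> List[Hashable]:
    """Periodic check (every period T): flows whose average packet rate
    exceeds the configured threshold. The averaging rule is an assumption."""
    offenders = []
    for flow_id, record in fim.flows.items():
        elapsed = now - record.started_at
        if elapsed > 0 and record.packets / elapsed > max_rate_pps:
            offenders.append(flow_id)
    return offenders


fim = FlowInformationMonitor()
for second in range(10):
    fim.observe("flow-a", now=float(second))        # about 1 packet/s
    for _ in range(100):
        fim.observe("flow-b", now=float(second))    # about 100 packets/s
suspects = flows_over_threshold(fim, now=10.0, max_rate_pps=50.0)
```

Note that such a check needs to observe the offending packets themselves, which is exactly what a client outside the blocking network cannot do.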

Similar approaches include the stateless spread-spectrum paradigm [22]. In this approach, based on a distributed hashing technique, data are converted/encoded into hashes and sent through multiple paths formed by overlay networks. Even if some of them are blocked, the original data can be decoded from hash values sent through other paths. This approach requires multiple and considerably high-performance paths to be constantly active during the period of usage, which is somewhat costly to maintain. Moreover, such a large-scale mechanism can be easily detected. In contrast, our method relies on just one alternative VPN path during the usage period, which is independently selected and easily replaced when it is censored or its performance deteriorates. Meanwhile, in Japan, a dedicated line has been used for setting up a bypass connection to test cloud centers [23]. However, the cost of this solution is much higher.

2.2 Traffic block detection-related work

More recent related works propose systems or methods to counter providers’ blocking or differentiation of some types of traffic, such as large file transfers. For detecting traffic differentiation, NetPolice [24] (previously named NVLens [25]) compares the aggregate loss rates of different flows to infer the presence of “network neutrality violations” in backbone ISPs. NANO [26] uses causal inference to determine whether traversing a particular ISP leads to poorer performance for certain kinds of traffic. Based on active queue management (AQM), such as random early detection (RED) and weighted fair queueing, DiffProbe [27] can detect differentiation leading to a small increase in latency, which complements Glasnost [28] by Dischinger.

Monkey [29] is a TCP replay tool taking a packet-level trace as input and generating a new trace with similar network-level properties, such as latency and bandwidth. More recent work [30] investigates ways to infer higher level protocols from low-level packet traces. Measurement systems or network testbeds, such as PlanetLab [31], RON [32], and NIMI [33] are designed explicitly for researchers. Netalyzr [34] is a web-based measurement tool that provides lay users with an easy-to-use interface and allows them to detect any manipulation of Web content by an HTTP proxy or traffic blocking on some ports.

“Glasnost” [28], and the “Test Your ISP” Project [3] are research products for allowing lay users to monitor Internet blocks and restrictions. They provide a very easy-to-use interface and functions for detecting differentiation. The DIMES Project [35] and the Measurement Lab [36] are also research works toward this kind of measurement. Network Diagnostic Test (NDT) [37] can capture detailed connection statistics.

In more detail, Glasnost [28] can detect an ISP’s differentiation between flows of applications. Many kinds of data are collected to check traffic shaping or restrictions: round trip time (RTT), TCP downstream and upstream throughput, User Datagram Protocol (UDP) jitter and datagram loss, etc. Glasnost compares the throughput of a pair of flows to determine if traffic differentiation exists. However, to overcome the problem of noise interference due to cross traffic on the network, flows are run multiple times back-to-back and along the same network path. Thus, such detection is very costly and subject to censoring.

An integrated measurement tool [38] collects measurement data using iPerf [39] and ping. The iPerf tool measures TCP throughput and bandwidth as well as UDP data loss and jitter. The ping tool, by sending several short packets to a server, provides the minimum, maximum, and average RTT and indicates packet loss. However, it neither provides nor utilizes the differential value of RTT, which distinguishes it from our method.

2.3 Discussion

There are many existing works on detecting traffic blocks or differentiation by providers, including governmental organizations. However, even excellent systems or tools such as Glasnost have problems, including scalability (due to a centralized architecture) and vulnerability to obfuscation by providers or governmental organizations. Furthermore, such systems are usually not integrated in practice with an automatic network bypassing mechanism such as an overlay network. As a result, users experience network deterioration and business activities stop during GS blocks.

Exploiting the first derivative of the RTT measured by routers distributed at each client site, our proposed virtualization method can detect intentional blocks such as those of GS. Then, an automatic switch to a bypass exploiting an overlay network such as a VPN or proxy can be made at once, before users experience network latency. Further, our method uses just one alternative VPN path, and only during GS blocks. This solves the censoring problems as well as the cost problems caused by CDNs (censoring as well as storage cost) and the spread-spectrum paradigm (communication cost), and the much larger cost problems caused by dedicated lines.

3 Restrictions of telecommunication in China

This section describes problems in accessing cloud or remote computers, using network environments in China.

3.1 Hurdle in operating distributed systems in China

Although there are over 500 million Internet users in China, the local Internet suffers from low speed and low quality of peering, i.e., of connections between ISPs, as shown in Fig. 1. This problem becomes noticeable if system components are distributed among multiple offices and access each other via Internet connections to different ISPs. This causes malfunctions of distributed systems that rely on application servers, such as MetaFrame [40].

Fig. 1 Internet exchange in China

Figure 1 illustrates a case where clients in offices connected to multiple ISPs (ISP-A and ISP-B) access the servers at the headquarters, which is connected to ISP-H. This type of implementation is frequently applied without problems in other countries, but is an issue in China because of the peering problem. Internet exchanges between China Telecom and China Unicom, both of which are among the world’s largest carriers, are very narrow. The quality, speed, and dependability of communication over such links are not guaranteed. This problem is sometimes referred to as the “North–South problem”, as it arises when clients of China Unicom in the South have their data center hosted by China Telecom in the North.

In addition, the Golden Shield Project (GS) or DNS poisoning and URL filtering are often used to prevent access to Web sites such as Facebook or YouTube. Such restrictions include blocks of international communication links, which affect business traffic as well as daily life. Such incidents become frequent around the time of large political events like the National People’s Congress.

3.2 Alternative business system

As explained above, the peering problem between ISPs in China causes deterioration of communication quality. This problem is serious for companies that run a distributed system where the clients in multiple offices are linked to servers at data centers.

Most Internet traffic between China and the rest of the world is routed through three gateways: the Beijing–Qingdao–Tianjin area in the north, where fiber cables come in from Japan; Shanghai on the central coast, where they also come from Japan; Guangzhou in the south, where they come from Hong Kong. Other lines include connection from China to Russia via Central Asia, but carry very little traffic.

Under normal conditions, the connection between an ISP in China and one in Japan is often faster than between two nodes both located in China. Thus, client computers in China executing collaborative applications communicate with each other via servers in Japan. In principle, multiple paths to reach Japan are available and clients can choose the best one according to the situation. However, some are inconvenient (e.g., require manual (VPN) bypassing to cope with failures or blocks of international channels) and others are expensive (e.g., require dedicated lines) or can easily be censored.

Figure 2 illustrates the architecture of our business system using offshore cloud or remote computers. We show only two offices in China for simplicity, but in real cases there are many. In China, offices of the same company are often connected to different ISPs. In our example, Office 1 is connected to ISP-A, which serves a local region, while Office 2 is connected to ISP-B that serves another local region. As mentioned, peering among these ISPs is often low bandwidth. Figure 2 shows a scenario where headquarters of companies and data centers are connected to the Internet in Japan, where application software programs are also run. In contrast to virtual terminal environments like MetaFrame, which requires data and programs to be placed inside a single server, our framework allows datacenters to be distributed and connected to the communication infrastructure in Japan, which is considered more stable and dependable.

Fig. 2 Our network virtualization solution

This feature gives flexibility in system deployment because the database of each offshore company can be deployed to a different server in Japan. In addition, copies of the database can be deployed to different servers, creating a distributed data center in Japan shared by multiple branches or companies in China. This provides a more stable and dependable virtualized storage infrastructure, similar to a private cloud system.

However, a serious issue in this architecture is the stability of the international Internet connection. This is a political rather than technological issue. The Golden Shield (GS), operated by the Chinese Ministry of Public Security, can block services such as Twitter, Facebook and YouTube when they are regarded as harmful by the Chinese government, as well as foreign Web sites with political messages against the country. GS can also restrict protocols other than those of the Web, such as e-mail. It cannot be foreseen when a connection will be blocked.

GS applies access restrictions to individual foreign IP addresses by issuing a TCP Reset command to source and destination IP addresses [5], as follows. When a host located in China starts a TCP three-way handshake with a foreign address, GS surveillance computers within China get a copy of the connection request and quickly check a blacklist of forbidden IP sites. If the connection request is trying to reach an address on that blacklist, Chinese international gateways interrupt the transmission by sending an Internet “Reset” command both to the originating host and to the server it is trying to reach.

Although this TCP Reset technique only works on individual IP addresses or TCP traffic, GS can also set up general blocks of even multiple international lines by successively shutting down the routes to groups of IP addresses that include the target. In other words, a GS block can disable communications with groups of IP addresses or a subnet, and even a large number of subnets connected through many international channels related to the targets. In this way, even business sites get involved and are significantly affected by the blocks or successive shutdowns of international channels connecting the target-related subnets, in case the destination address is close to some of the target sites or in the same subnet. Thus, there is a serious risk of business interruption due to involvement in the restrictions by GS.

4 Network virtualization for stable business communication

To solve the problem mentioned above, we present an intelligent bypass method using VPN. Our solution automatically switches between the ordinary Internet and a more stable VPN, contracted with bandwidth and cost in mind. This switch is performed automatically as a form of network virtualization: a virtual or overlay network is constructed without users being aware of it, even under sudden governmental restriction of international communication. A VPN Gate server and/or proxy server, shown at the bottom of Fig. 2, accepts connections from the offices’ intelligent VPN routers. Automatically controlling the IP packet flow, the server relays packets to its counterpart connected to the Internet in Japan, or to offshore remote computers or clouds. To achieve both efficient communication and avoidance of blockage, our intelligent VPN router dynamically changes the route over which it sends IP packets. It ordinarily sends them to a regional ISP (i.e., ISP-A for Office 1). Using the first derivative of the RTT, it detects the onset of governmental network blockage. Then, it changes the path to the VPN Gateway Server. Once the block is over, it switches the path back to the ordinary Internet (ISP-A in this case).

This section concretely explains the proposed method and its rationale. Furthermore, the next section discusses validation using real data.

4.1 VPN bypass for stable communication with offshore computers

On the Internet, traffic among Autonomous Domains (ADs), also called Autonomous Systems (ASs), is routed using the Border Gateway Protocol (BGP), which allows AD border routers to announce which destination networks are reachable through them, together with the path attributes used to choose among routes. When an intentional block is set up by the Golden Shield project (GS), BGP cannot compute low-latency alternative routes for some international destinations, and packets directed there get significantly delayed. This slowdown effect takes place multiple times, as routes are progressively shut down, before significant packet loss can be observed. To circumvent the block, users have to change the destination address of their packets to the communication bypass entry point on the Chinese side.

In our case, the communication bypass entry point is the gateway of a VPN connection using IPSec [17]. Of course, the route from users’ offices to this VPN gateway server (CN-side) may become congested. However, during GS blocks the delay on this route is lower than the delay at routers/IXs located at or close to the few international communication gateways in China. This is confirmed by the RTT waveform data (below 150 ms, as the dashed line “Bypass Link” shows in Figs. 5 and 6).

As shown in Fig. 2, our intelligent VPN routers continuously monitor the behavior of network services and select the path on which to forward packets. To avoid the VPN gateway address being identified and attacked, the intelligent routers of users’ offices switch routes to the VPN gateway only during GS blocks. That is, as described later in detail, if no intentional block is taking place, users should use the (open) Internet connection rather than the VPN.

On the Japanese side of the VPN connection in Fig. 2, a Multi-protocol Proxy (so-called Web Proxy) including an ICMP Proxy [41] provides reverse proxy functionality to allow remote users to access applications. Thanks to the proxy, the service providers can grant remote users located outside the organization (e.g., in China) selective access to application protocols running on servers inside the organization in Japan. The process of making an application available externally is known as publishing. Application publishing enables end users to continue accessing their organization’s applications via the VPN link as they did via the open (i.e., ordinary) Internet connection.

4.2 Intelligent switch to/from VPN

The timing of the switch to VPN is very important for stable communication with offshore cloud computers. Intelligent VPN routers have to recognize the onset of a block before users become aware of network deterioration, which happens abruptly and brings business activity to a stop. According to our experience, the behavior of Internet services differs markedly between unintentional malfunctioning and intentional attack. Disruptions caused by the former initially affect the dropped packet ratio, while the latter gives away some telling signs of the onset in terms of latency (RTT). In the case of GS, network latency increases following a staircase pattern.

Taking these considerations into account, switching to/from VPN bypass is performed by our proposed intelligent VPN routers as follows.

Step 1: Every 15 s, the router sends ICMP Echo Requests to the offshore servers used by application programs in the office, and records the turn-around time (RTT) until the corresponding ICMP Echo Replies return (ICMP is forwarded by the proxy in Fig. 2).

Step 2: The router calculates changes of the RTT. If the RTT increase ratio is higher than a preset threshold of tdf % per tm seconds and the absolute RTT value exceeds t1 milliseconds, GS mode (initially reset) is set, and the communication path to the offshore cloud or remote computers is switched from the Internet to a VPN bypass. Denoting by RTTC the current RTT and by RTTB the RTT measured one tick before, we have

$$\begin{aligned} {\text {RTT increase ratio}}=100\times \frac{( {{\text {RTTC}}-{\text {RTTB}}} )}{\text {RTTB}} \end{aligned}$$

Step 3: The router continues to record the RTT of ICMP Echo packets. When it stays below a predefined threshold of t2 milliseconds for t3 minutes, GS mode is reset and the router changes the communication path from the bypass link back to the Internet.

Typical threshold values used in the above process are tdf = 30 % per tm = 15–360 s with RTT over t1 = 140 ms for Step 2, and t2 = 200 ms over t3 = 30 min for Step 3. These parameters are refined or adjusted in real use, depending on the situation. We remark that in practice this control is enabled only when the network is being used. Intelligent routers check network traffic, and when little or no traffic is observed, e.g., during the night, they disable switching to limit pointless use of bypass routes.
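Steps 1–3 can be read as a small two-state controller. The following is a minimal sketch under stated assumptions, not the code running on our routers: the class name `BypassController`, the method `update`, and the choice of taking RTTB as the oldest sample within the last tm seconds are illustrative assumptions, and the default parameters are simply the typical values quoted above (with tm = 180 s). The RTT samples are assumed to come from pings sent over the open Internet path every 15 s.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional, Tuple


@dataclass
class BypassController:
    """Hypothetical sketch of the asymmetric switching rule of Steps 2 and 3."""
    tdf_percent: float = 30.0     # differential threshold (Step 2)
    tm_seconds: float = 180.0     # comparison window, within the 15-360 s range
    t1_ms: float = 140.0          # absolute RTT threshold for switching to VPN
    t2_ms: float = 200.0          # absolute RTT threshold for switching back
    t3_seconds: float = 30 * 60   # RTT must stay below t2 for this long
    gs_mode: bool = False         # "GS mode", initially reset
    _history: Deque[Tuple[float, float]] = field(default_factory=deque)
    _below_since: Optional[float] = None

    def update(self, now: float, rtt_ms: float) -> str:
        """Feed one RTT sample (Step 1); return 'vpn' or 'internet'."""
        self._history.append((now, rtt_ms))
        # Keep only the samples that are at most tm seconds old.
        while now - self._history[0][0] > self.tm_seconds:
            self._history.popleft()

        if not self.gs_mode:
            # Step 2: differential threshold AND absolute threshold.
            rtt_b = self._history[0][1]   # assumed reading of "RTTB"
            if rtt_b > 0:
                ratio = 100.0 * (rtt_ms - rtt_b) / rtt_b
                if ratio > self.tdf_percent and rtt_ms > self.t1_ms:
                    self.gs_mode = True
                    self._below_since = None
        else:
            # Step 3: absolute threshold only, sustained for t3.
            if rtt_ms < self.t2_ms:
                if self._below_since is None:
                    self._below_since = now
                if now - self._below_since >= self.t3_seconds:
                    self.gs_mode = False
                    self._below_since = None
            else:
                self._below_since = None   # continuity broken, start over

        return "vpn" if self.gs_mode else "internet"
```

A caller would invoke `update(now, rtt_ms)` once per ping and reconfigure the route whenever the returned path changes; disabling the controller during idle hours, as described above, is left out of the sketch.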

A major feature of our network virtualization method is the asymmetric control of path selection between Steps 2 and 3. Namely, differential values and absolute values of RTT are used for switching to the bypass, while only absolute values combined with the elapsed time are used for switching back to the Internet. This design derives from our practical experience with systems operated in China for several years. Below, we provide the rationale of our method.

A straightforward approach would be to switch paths once the RTT becomes larger than a fixed threshold. This does not work in practice because of the routine, comparatively gradual slow-downs of Internet traffic in peak hours. In China, for example, the Internet becomes congested in the evening almost every day. Thus, a simple rule specifying an absolute threshold of turn-around time, such as “if RTT is more than X ms (milliseconds), it is a block”, has the following problems: it leads either to false positives if threshold X is even slightly low (e.g., 200 ms), or to noticing the block or attack too late if threshold X is even slightly high (e.g., 220 ms). Section 5.1 explains this in detail later. To distinguish congestion from blocks or attacks, we look instead at the first derivative of RTT. Governmental blocks (shutdowns) introduce abrupt staircase increases of network latency, while performance deterioration caused by ordinary daily congestion has a lower rate of change. Even if an Internet slowdown is caused by an accident or a natural disaster, it remains local. It does not show the abrupt multiple staircase increase typical of the onset of intentional blocks such as GS. Based on this insight, to recognize or predict the start (onset) of a restriction or intentional barrier, our method uses a threshold on the differential value (first derivative) together with a threshold on the absolute value, rather than thresholding the absolute value only.
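The difference between the two rules can be illustrated on synthetic traces. The sketch below is an illustration only: the two traces (a slow evening ramp and a two-step staircase) are invented for this example and are not measured data, and the function names are hypothetical. Each function returns the index of the first sample at which the rule would raise an alarm, or `None` if it never does.

```python
from typing import List, Optional


def first_alarm_absolute(rtts: List[float], x_ms: float) -> Optional[int]:
    """Naive rule: alarm as soon as RTT exceeds a fixed threshold X."""
    for i, rtt in enumerate(rtts):
        if rtt > x_ms:
            return i
    return None


def first_alarm_differential(rtts: List[float], tdf_percent: float = 30.0,
                             t1_ms: float = 140.0) -> Optional[int]:
    """Rule of Step 2: RTT increase ratio over tdf AND RTT over t1."""
    for i in range(1, len(rtts)):
        previous = rtts[i - 1]
        ratio = 100.0 * (rtts[i] - previous) / previous
        if ratio > tdf_percent and rtts[i] > t1_ms:
            return i
    return None


# Synthetic traces (assumed values, one sample per tick).
congestion = [120.0 + i for i in range(91)]             # slow ramp 120 -> 210 ms
block = [120.0] * 10 + [180.0] * 10 + [300.0] * 10      # staircase onset

low_x_on_congestion = first_alarm_absolute(congestion, 200.0)    # false positive
high_x_on_congestion = first_alarm_absolute(congestion, 220.0)   # no alarm
high_x_on_block = first_alarm_absolute(block, 220.0)             # only at 2nd step
diff_on_congestion = first_alarm_differential(congestion)        # no alarm
diff_on_block = first_alarm_differential(block)                  # at 1st step
```

On these traces, X = 200 ms fires during ordinary congestion, X = 220 ms reacts only at the second step of the staircase, and the combined differential rule ignores the ramp while reacting at the first step.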

On the other hand, no such tendency appears at the end of GS blocks. In our practical experience, the RTT does not decrease at the end of a GS block as abruptly as it increases at the onset. We believe this happens because a large amount of traffic flows into each ordinary Internet route when it is re-announced and re-opened at the end of a GS block. This congests the ordinary (open) Internet, so the RTT does not decrease rapidly. In practice, switching communication paths back from the VPN bypass to the Internet is usually not as urgent as avoiding the shutdown or blockage of international channels. Therefore, we simply use the RTT itself, instead of its differential value, to switch back to the Internet.
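The asymmetric control described above could be sketched as a two-state selector. This is a minimal Python illustration under stated assumptions, not the implementation running on the intelligent routers: the class `PathSelector`, its method `update`, and the choice to compare each RTT sample with the sample one window earlier are all hypothetical. The RTT passed in is assumed to be that of the open Internet path, which is probed even while the bypass is in use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathSelector:
    """Hypothetical two-state selector: open Internet <-> VPN bypass."""
    tdf: float = 0.30        # differential threshold (Step 2)
    t1_ms: float = 140.0     # absolute threshold for switching to bypass
    t2_ms: float = 200.0     # absolute threshold for switching back (Step 3)
    t3_s: float = 1800.0     # time the RTT must stay below t2 (30 min)
    on_bypass: bool = False
    below_since: Optional[float] = None

    def update(self, now_s: float, rtt_ms: float, prev_rtt_ms: float) -> bool:
        """Feed one open-Internet RTT sample; return True while on bypass."""
        if not self.on_bypass:
            # Switching to bypass: differential AND absolute conditions.
            increase = (rtt_ms - prev_rtt_ms) / prev_rtt_ms
            if increase >= self.tdf and rtt_ms > self.t1_ms:
                self.on_bypass = True
                self.below_since = None
        elif rtt_ms < self.t2_ms:
            # Switching back: absolute value plus elapsed time only.
            if self.below_since is None:
                self.below_since = now_s
            if now_s - self.below_since >= self.t3_s:
                self.on_bypass = False
                self.below_since = None
        else:
            # RTT rose above t2 again, so restart the 30-minute clock.
            self.below_since = None
        return self.on_bypass


if __name__ == "__main__":
    sel = PathSelector()
    print(sel.update(0, 130, 128))      # False: no abrupt increase
    print(sel.update(360, 180, 130))    # True: +38 % and above 140 ms
    print(sel.update(2000, 180, 250))   # True: below 200 ms, clock starts
    print(sel.update(3900, 170, 180))   # False: below 200 ms for over 30 min
```

The derivative appears only in the first branch; the return branch deliberately ignores how fast the RTT falls, which mirrors the rationale given above.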

The rationale for using the round trip time of ICMP Echo (RTT) rather than other parameters such as the packet-loss rate is as follows. At the beginning of an intentional network block by GS, some routes are shut down, but packets can be re-routed to the routes that are still open and are not dropped. Indeed, pings (ICMP Echo packets) sent from our intelligent routers to the open Internet (not to the VPN) are not dropped and come back. Thus, the onset of a GS block can easily be missed if packet loss is used. In short, the packet-loss ratio cannot reveal the onset of a GS block, while the RTT increase ratio reflects it much better.

Figure 3 shows the change of the RTT on an ordinary day, which is factored into our system. On this day, no block was observed. The RTT increased and the quality of Internet communication decreased as the day went on, but no large step-shaped increase (beyond 250 ms) was detected. Communication used the open Internet for the whole day without significant delay, and never the VPN bypass. At times (around 9:00–12:00 and in the afternoon and evening around 15:00–22:00, as shown in Fig. 3), the RTT did become larger than the absolute threshold (e.g., 140 ms as mentioned in Step 2). However, the switch to the VPN bypass did not occur, since the differential threshold of 30 % per 3 min was not exceeded.

Fig. 3 RTT (turn-around time of ICMP echo) on an ordinary day

Some readers may wonder why automatic switching based on RTT outperforms using the VPN link all the time. There are two reasons. First, permanently using the VPN is costly (our VPN bypass partly uses a leased line when the Internet is congested). Second, and more importantly, limited use helps avoid identification of, and attacks on, the VPN gateway address; for example, it alleviates problems such as automatic entropy-based identification of VPN traffic by reducing the probability of successful sampling. Thus, we use the VPN bypass only in emergencies.

Each solid line in Figs. 4, 5, and 6 shows the RTT (round trip time) through the Internet during various real blocks by the Golden Shield (GS). When GS was activated, communication was switched to the bypass. The resulting RTT (i.e., the response time) through the bypass remained stable (less than 150 ms), as shown by the dashed line labeled Bypass Link in the lower part of Figs. 5 and 6.

Owing to the differential calculation of RTT, our tool (Step 2) recognized the onset of GS at the first step of the staircase in Figs. 4, 5, and 6 (and also at the first step of the second attack, namely at the 7th step around 20:00 in Fig. 4). It immediately and automatically switched from the Internet to the stable bypass. Once the block was removed, our method (Step 3) switched back to the Internet.

Fig. 4 RTT on a typical GS with two attacks

Fig. 5 RTT on a typical but long GS

Fig. 6 RTT on a single-step GS

5 Evaluation

5.1 Quantitative evaluation

This section describes a quantitative evaluation of our proposed method, especially of the rules that use differential calculation to detect the onset of GS and switch to the bypass before clients experience significant network latency. Here, Rule 1 (cf. Step 2 in Sect. 4.2) for switching to the VPN is: (1A) the RTT increase ratio reaches (tdf \(=\)) 30 % per (tm \(=\)) 360 s, and (1B) the RTT is more than (t1 \(=\)) 140 ms. Rule 2 (Step 3 in Sect. 4.2) for returning to the Internet is: (2A) the RTT stays below (t2 \(=\)) 200 ms, and (2B) it does so for more than 30 min. The validity of these rules is evaluated using typical examples of real GS (Golden Shield) blocks, as shown in Figs. 4, 5, and 6. Such a staircase, or step-shaped, RTT increase is confirmed by data from more than one hundred real GS blocks.
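Rules 1 and 2 can be written compactly as follows. This is one possible formalization rather than a definition taken from the system: \(r(t)\) denotes the open-Internet RTT of the ICMP Echo probe at time \(t\), and measuring the increase ratio against the sample one window tm earlier is an assumption.

\[
\text{Rule 1 (switch to VPN at time } t\text{):}\quad
\frac{r(t) - r(t - \mathrm{tm})}{r(t - \mathrm{tm})} \ge \mathrm{tdf}
\;\wedge\;
r(t) > \mathrm{t1}
\]

\[
\text{Rule 2 (return to the Internet at time } t\text{):}\quad
r(\tau) < \mathrm{t2} \quad \text{for all } \tau \in [\,t - \mathrm{t3},\; t\,]
\]

Here tdf \(=\) 30 %, tm \(=\) 360 s, t1 \(=\) 140 ms, t2 \(=\) 200 ms, and t3 \(=\) 30 min. Only Rule 1 contains a difference quotient; Rule 2 depends solely on the absolute RTT and the elapsed time.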

Figure 5 shows the RTT graph of a day when a typical GS attack happened. The first sudden delay can be observed at 4:36 a.m., the second starts at 6:06 a.m., and the third at 8:00 a.m. Table 1 shows the RTT increases at these three times. Thanks to Rule 1, our system detected the GS attack at 4:36:00, well in time to avoid a heavy response delay such as the RTT of 335 ms reached at 6:06:00, and seamlessly switched to the VPN bypass. Owing to this network virtualization, users could keep their business going without even noticing the network deterioration or what had happened in Internet communications. Indeed, Fig. 5 shows four or five such sudden changes on this day. However, the network was successfully virtualized, and none of the RTT increases caused by GS blocks led to any deterioration for the users. Users would have experienced such deterioration had the system waited for the RTT to exceed an absolute (not differential) threshold, even one well adjusted to be stable (e.g., 220 ms).

Table 1 RTT increase on GS of Fig. 5 (“–” in RTT increase ratio means below 30 %)

Fig. 6 shows the RTT graph of another day with “just” one GS attack (of the single-step type). Again by Rule 1, at 10:48 a.m. our system detected the problem from the sudden change (an RTT increase ratio of 35 %, which is more than tdf: 30 %) at an RTT of 175 ms, which is more than 140 ms (but below 250 ms, where users experience deterioration), and the bypass was activated. After switching, the RTT was less than 150 ms, as shown by the (blue) Bypass Link line in the lower part of Fig. 6. Again, users could continue doing business without experiencing any deterioration of Internet response.

As for the time of return to the Internet, Rule 2 gives 18:12:00 in Fig. 5 and 12:42:00 in Fig. 6. Namely, the bypassing time is confined to roughly 13 h 30 min even for the long (multiple-step) GS block of Fig. 5, and to 1 h 54 min for the short (one-step) GS block of Fig. 6.

Meanwhile, Table 3 shows the three largest increases of RTT on a usual day without GS (Fig. 3). None of them has an RTT increase ratio of more than 30 %. So, on such a normal day without GS, no bypassing occurred, and the RTT did not exceed 250 ms, the level at which users experience network latency.

5.2 Discussion

Figure 4 shows the staircase waveform of RTT during a typical GS. The latency increases significantly and abruptly every 1–4 h (lately, often every 10–100 min), creating the step’s rises, while only small increases of RTT are observed between the abrupt increases (the steps’ treads). The significant (abrupt) RTT increases are evidently triggered by intentional blocks. The small increases, in contrast, are mostly caused by congested packets queued in routers of the currently used, still unblocked channel; such congestion is resolved by the routing mechanism of the open (ordinary) Internet, after which the small RTT increase diminishes or the RTT even starts decreasing. The amplitude of this small oscillatory delay becomes larger during Internet peak hours. Fortunately, this oscillatory delay helps our method detect GS blocks, in that even such a small RTT increase is added to the step’s rise at the beginning of a GS block. Owing to this, the increase ratio more easily exceeds tdf (30 %), and the switch to the bypass readily occurs. This is different from the following case.

On the other hand, the height of an RTT increase sometimes decreases as the capacity of international channels increases and/or the temporal or periodic (e.g., every midnight) traffic volume decreases. For example, in Fig. 5 the third abrupt increase of RTT is small (50 ms or less), about half of an ordinary step rise, which is more than 100 ms. If such small step rises occur in succession, as in Fig. 7, our intelligent router may not be able to detect the onset of GS. For example, if RTT increases as follows: 135 ms → 175 ms → 220 ms → 280 ms, the increase ratio never exceeds tdf (30 %) and the switch to the VPN cannot occur.
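The arithmetic behind this example can be checked with a few lines of Python. The RTT sequence is the one quoted above; the variable names are chosen for this illustration, and treating each arrow as one measurement window is an assumption.

```python
steps_ms = [135, 175, 220, 280]   # RTT sequence quoted in the text
tdf = 0.30                        # differential threshold of Rule 1

# Relative increase of each sample over the previous one.
ratios = [(cur - prev) / prev for prev, cur in zip(steps_ms, steps_ms[1:])]
for prev, cur, ratio in zip(steps_ms, steps_ms[1:], ratios):
    print(f"{prev} ms -> {cur} ms: {ratio:.1%}")
# 135 ms -> 175 ms: 29.6%
# 175 ms -> 220 ms: 25.7%
# 220 ms -> 280 ms: 27.3%

triggered = any(r >= tdf for r in ratios)
total = (steps_ms[-1] - steps_ms[0]) / steps_ms[0]
print(f"differential condition met: {triggered}, overall increase: {total:.1%}")
# differential condition met: False, overall increase: 107.4%
```

Each individual step stays just under 30 %, so the differential condition is never met even though the RTT more than doubles overall.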

Fig. 7 A case in which a switching error occurs

Such cases can happen when neighboring international channels are successively blocked and the international channel currently in use is abruptly, but only briefly or partially, (congested and) delayed by packets from the neighbors. However, such errors (misclassifications) occurred only 4 times out of 159 GS blocks last year. Thus, the error rate of our method was 2.5 %. The network is therefore virtualized with very high probability, so that clients do not experience significant latency that would stop their business activities.

GS occurs around 3 times per week. Figs. 4, 5, and 6 are similar and typical in their staircase shapes, though the staircase of Fig. 6 has only one step. As shown quantitatively in Tables 1, 2 and 3, our approach, focusing on the abrupt staircase increase in RTT, was effective enough to quickly predict network blocks in their early phase. The RTT increases of over 30 % during GS in Tables 1 and 2, and of below 30 % on a usual day in Table 3, show that our method quickly predicted the onset of GS, usually at the first step of the stepwise RTT increase, and kept business communications stable.

Table 2 RTT increase on GS in Fig. 6 (“–” in RTT increase ratio means below 30 %)
Table 3 RTT increase on a usual day in Fig. 3 (“–” in RTT increase ratio means below 0 %)

Our evaluation showed that network virtualization was successfully achieved by the differentially switched VPN. Thus, offshore clients can continue doing business without experiencing a decrease in Internet speed. The effect of our differential prediction, combined with the immediate switch to the VPN bypass on real blocks by the Golden Shield (GS), has also been validated by successful use in more than fifty offices over 2 years.

Even in countries outside China, RTT signatures such as abrupt staircases causing an RTT increase of over 30 % can exist whenever there are multiple international channels for the open (public) Internet and the speed of each channel is different. The reason is that, under blocking, traffic is successively switched to slower channels, so the RTT increases rapidly (e.g., by over 30 %) and forms a stepped staircase. As shown above, our method can recognize such an RTT signature. Thus, network virtualization can be successful in countries other than China as well.

6 Conclusion

We proposed an intelligent bypass method using a VPN that is capable of alleviating the negative impact of intentional network blocks such as China’s GS (Golden Shield). To keep communications to cloud or remote servers active, our method applies asymmetric criteria to decide whether to switch to and from a VPN bypass. Namely, differential values of network latency (the increase of the ICMP Echo packet’s RTT) are used for detecting the onset of intentional network blocks, while absolute thresholds (with different values) are used in determining both the start and the end. While more sophisticated methods such as frequency-domain spectral analysis of the RTT waveform could conceivably be used [42], our differential method is simple as well as effective at predicting successive stepwise increases of response time at the first step. This network virtualization method by differentially switched VPN was developed for stable and dependable business communications with offshore computers, drawing on our experience in managing practical distributed systems in China. Its effects were also validated through successful use by more than fifty offices over 2 years.

Our approach can be extended beyond block detection and bypass. While the Golden Shield does not appear to systematically examine all Internet content, Deep Packet Inspection (DPI) capabilities are available and are selectively used to monitor traffic. While the absolute RTT change introduced by such monitoring is very low (around 10 ms [43]), each packet must be “frozen” during DPI execution. The corresponding steepness is therefore high, as are jitter and jerk. In our future work, we plan to use measurements at the TCP stack level to enable jitter and jerk analysis of the RTT waveform.
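As a rough indication of what such an analysis might compute, the following Python sketch derives simple waveform features from evenly spaced RTT samples. The definitions are assumptions made for this illustration only: steepness is taken as the first finite difference, jitter as the mean absolute first difference, and jerk as the third finite difference by analogy with kinematics. The function names and sample values are invented, and nothing here reflects TCP-stack-level measurements.

```python
from typing import Dict, List, Sequence


def diff(xs: Sequence[float]) -> List[float]:
    """Finite difference between consecutive samples."""
    return [b - a for a, b in zip(xs, xs[1:])]


def rtt_waveform_features(rtts_ms: Sequence[float]) -> Dict[str, float]:
    """Assumed feature definitions; needs at least four evenly spaced samples."""
    if len(rtts_ms) < 4:
        raise ValueError("need at least four RTT samples")
    slope = diff(rtts_ms)      # first difference: steepness per sample
    accel = diff(slope)        # second difference
    jerk = diff(accel)         # third difference (kinematic analogy)
    return {
        "max_steepness": max(abs(s) for s in slope),
        "jitter": sum(abs(s) for s in slope) / len(slope),
        "max_jerk": max(abs(j) for j in jerk),
    }


if __name__ == "__main__":
    # Invented samples: a brief 10 ms bump on an otherwise flat RTT.
    print(rtt_waveform_features([100, 100, 110, 100, 100]))
    # {'max_steepness': 10, 'jitter': 5.0, 'max_jerk': 30}
```

In this toy series, a bump of only 10 ms produces a third difference three times that size, which illustrates why higher-order differences may expose short delays that an absolute threshold would not register.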