Scaling in the space-time of the Internet

The Internet on the router level, is a complex network embedded in a geographical space. We provide experimental evidences suggesting that the average travel time for a message, with fixed length, increases roughly as the square root of the geographical distance. To understand this scaling law and other measurable topological properties of the Internet as a graph, we introduce and study a simple network model. The model is based on a few realistic socio-economic facts/assumptions and qualitatively reproduces the experimentally observed stylized facts.

assumptions. The model is able to reproduce statistically the known topological properties that can be measured by large scale "traceroute" experiments 15 . The present work offers thus contribution in this research field from two direction. First, as an experimental result we present an interesting and universally valid statistical law for the data transmission speed on the Internet. Second, we consider a simple modelling approach to the Internet as a graph, trying to understand in a unified context all stylised facts (including the one reported here) known for the Internet.
The present study does not aim at ending the debates on the Internet's topological properties nor at giving solutions related to its' rapid growth. We only intend to emphasise another stylised fact and a simple explanation for it, that might have had less attention recently and could be of importance in future engineering and optimisation problems for the Internet.

Average Round trip time as a Function of Distance
We consider both a large-scale "ping" experiment and use also the freely available results of the CAIDA UCSD IPv4 Routed/24 Topology Dataset 15 obtained with "traceroute". These experiments prove that the average response time (RTT) is increasing with the square root of the distance (measured on geodesic lines) between the considered routers. The Caida dataset offers also information on the topology of the Internet as a graph. The "ping" experiments were conducted around the globe starting the command from two fixed locations in Europe (see targets on Fig. 1a), while the "traceroute" measurements aimed to reveal also the network topology, is restricted to a well-delimited area of Europe (see locations on Fig. 1b). ping experiment. The experiment is based on the Internet Control Message Protocol (ICMP) echo request packet sending and receiving 19 . We relied on the "ping" command 20 , which is used on most platforms to test the ability of a source computer to reach a specified destination computer. It operates by sending an operating system based ICMP echo request to a specified IP address, receiving also the response time elapsed from sending the request and getting back the echo from the destination. This time is usually measured in ms. In total 24700 destination computers were selected from various locations on the Earth (see Fig. 1a). Their geographical coordinates (GPS coordinates) were determined from the IP addresses using the home page: IP2LOCATION 21 . Two source locations (from where the ping command were generated) were considered: one in Budapest (Hungary) and one in Cluj-Napoca (Romania). By using the GPS coordinates we calculated the geodesic distance (d) between the source and target routers.
The destination routers' ability to respond was tested for weeks on a 24 hour basis. Response times were averaged on the number of times the target was reached. Experiments were conducted between September 2013 and March 2015. On Fig. 2 we plot the averaged RTT values against the distance d, for all pinged router pairs. Using a logarithmic binning method (exponentially increasing bin-sizes) one gets the averaged trend plotted with black dots on Fig. 3. From Figs 2 and 3 we draw the first qualitative conclusion that the response time scales roughly as ∝ d RTT .
traceroute experiment. "Traceroute" operates in a somehow similar manner as "ping" does. In this case the ICMP echo request maps the RTT round trip time, but with the difference that it repeats itself with increasing life time (or TTL -time to live) settings, revealing the intermediate hops as well. This process roughly follows the steps: (1) the sender sends out a sequence of three datagrams to an invalid port address with a TTL set to 1; (2) the first router to receive the packets will send it back with TTL set to 0; (3) the sender now will send the message again with TTL set to 2; (4) the packet will be received back now after two hops; (5…) the process is repeated incrementing TTL with unity until the message reaches its final destination; (n) since the packages were sent to an inexistent port the final destination replies with an ICMP Port Unreachable message. For this experiment we used the already available data from the IPv4 Routed/24 Topology Dataset 15 , which contains summaries of "traceroute" probing with the Paris Traceroute protocol since the year 2007 15 . Our aim was to use recent data, so we limited our study to files only from 2017. The traceroute dataset allows to reveal also the network topology on router level, since we can map all first-order neighbours of the nodes (routers in our case). The most active monitors at that time were from Netherlands and Switzerland. As a consequence, for an easier Before focusing on the unveiled graph topology and compare the observed features with the ones given by models, we present here our results for the RTT versus distance (d) scaling, as it is revealed by these experiments. In this case the source for the traceroute experiment were located inside the region shown in Fig. 1b and the target routers were spread all over the World. On Fig. 3 we plot together the logarithmically binned data 18 for the "ping" and "traceroute" experiments. Apart of the fact that the data for the traceroute experiments scatter in a larger manner due to a poorer statistics, both data is consistent with a power law trend having the scaling exponent of 1/2.
Considering a fit of the experimental results in the form of = ⋅ a d RTT 1/2 the obtained R 2 coefficients of determination were = .
R 0 98 2 for the ping data and = . R 0 88 2 for the traceroute data. In such a view the assumed scaling can be considered also quantitatively acceptable.

Model
In order to explain the non-trivial scaling found in the previous section we consider a network model based on a simple wiring rule. We will argue in the next section that the model is successful not only in reproducing the RTT versus distance scaling, but it is successfull also in generating the right network topology.
Within the model the nodes of the graph represent cities while links are wiring channels between them (network cables). In the simplest approximation N nodes are uniformly distributed in the Euclidean space. The considered territory is a square with edges of unit size.
Recent results indicate that the "connectivity" of a city usually depends on it's Gross Domestic Product (GDP) 22,23 . An equivalent formulation of this is that the probability for a person in the given city to be "wired" depends on the GDP per capita. In the studied region however, the city's GDP is roughly proportional with the total population, since we considered a territory where the economic development is rather homogeneous. In Fig. 4, we illustrate this proportionality for some large cities in Germany and France. Data for Germany are taken  www.nature.com/scientificreports www.nature.com/scientificreports/ from the Internet 24 , while the data for France is obtained form a discussion forum on the Internet. The linear fit for the cities in Germany has a coefficient of determination = .
R 0 89 2 , which is a reasonable value for admitting our hypothesis.
In such a view the population of the cities (W i ) will determine the weight (ω i ) of the nodes used in the wiring process (see later), and we consider the W i values distributed according to a Tsallis-Pareto type distribution with exponent α = 1. Such city-size distribution is realistic for not too small settlements (in general W i > 10 3 − 10 4 inhabitants) and is confirmed in many geographical areas 25,26 . For smaller settlements a better statistics is given by the log-normal distribution 25,26 , taking into account however that the wired Internet network interconnects mainly the larger cities, we consider that the Tsallis-Pareto distribution is justified in our model. For generating the Tsallis-Pareto distribution of the W i values we used the implemented generator from the scipy package of Python 27 . In order to simplify the scaling properties of our model the generated W i values were renormalized, so that the maximal value to be always one.
We assume the relation between W i and ω i in the form: The values of ω i are defining the "radius of connectivity" for each settlement. The "wealth or GDP" of each city is assumed to be proportional with its population, therefore we assume that the area around it that can be supported with links is proportional with W i . This leads us to the considered hypothesis: it's "radius of connectivity" should be proportional with the square-root of W i . In the proposed kernel β is a proportionality factor.
For wiring the nodes first we determine for each pair of nodes the ij i j ij fraction, where d ij is the geodesic distance between the two cities. In case > f 1 ij we connect the two nodes, otherwise they remain unconnected. This wiring condition models the obvious fact that the total wealth of the cities govern their socio-economic ability to build a direct link between them. Small cities at large distances are unlikely to be connected, while large cities at short distances become connected. Also, when connecting two settlement one has to take contribution from both by adding their radius of connectivity. The main elements of the model are sketched in Fig. 5.  www.nature.com/scientificreports www.nature.com/scientificreports/ The model in it's actual form has two parameters: N and β. Changing the value of N affects the density of the nodes and implicitly the average distance between them. However, due to the rescaling of the W i values, the imposed Pareto-type distribution and the connectivity condition one might expect that the statistical properties of the graph will not be seriously affected by changing N. This guess has to be however proved by computer simulations. In contrast with the parameter N, we expect that the β parameter influences strongly the obtained graphs giant component.
Simulations for a relatively large system ( = N 2400 and = N 8000) and several β values were carried out. We searched for the best value of β that is able to reproduce the experimentally observed scaling laws and the average number of links/node (8.68) for the giant component. For each parameter set the computer simulation results were averaged over 100 independent realizations of the graph with fixed N and β parameters.

Model versus Reality. Results and Discussion
In order to validate the previously introduced model we compare the statistical properties of the network unveiled by the traceroute experiments and the one generated by the model. The main advantage of trace-route experiments is that one can reveal also the routers on which the information passed. By this one can reconstruct the links in the studied network.
Network topology. The largest component of the experimentally studied network plotted in the real geographical space is shown in Fig. 6a. The topological representation of the largest component of this graph with the Fruchterman-Reingold layout is given on Fig. 6b. For = N 2400 we obtained that the experimentally observed features are best reproduced by the model if one considers β ≈ . 0 4. A realization of the networks giant component generated with these parameters is shown in Fig. 7. In agreement with our initial guess, we found that changing the value of N does not affect in a considerable manner the giant components statistical properties. Simulation for = N 8000 and β = . 0 4 yields giant components with statistically similar graph characteristics. Some properties are summarized and compared in Table 1.
The average degree of the nodes mapped in our subnetwork is 〈 〉 = . k 8 68. The degree-distribution for the largest component is fitted with a Paretto-Tsallis (or Lomax II) distribution: The experimental results obtained for this subnetworks degree distribution is plotted with black dots on Fig. 8, and the Paretto-Tsallis fit (2) with α = .
1 23 is indicated with the dashed line. We recall here that such degree distributions with a scale-free tail was modelled mostly in the framework of a preferential growth model with an exponential dilution 25,28 .
One can also determine the degree distribution for the graph generated by the model. Comparison between the experimentally measured degree distribution and the one given by the model (Fig. 8) suggests a reasonable agreement (see also Table 1). In Table 1 we summarize the estimated topological properties for the Internet on router level obtained with different methods 29 , estimations from our traceroute experiments and the results of our model for = N 2400 and = N 8000 (β = . 0 4). The obtained structural characteristics of the studied subnetwork are in agreement with previous Internet mapping measurements 29 . A more detailed presentation of the experimental results can be found in the master-thesis of Mounir Afifi 30 . www.nature.com/scientificreports www.nature.com/scientificreports/ Round trip time statistics. The observed scaling can be associated with the routing properties of the Internet. The largest contribution to RTT is due to the waiting times encountered at the routers, so one can assume that the measured mean RTT should increase with the number of encountered routers (hops) H, up to the target. Indeed, the experimental traceroute results suggests that ∝ γ H RTT (Fig. 9) with γ ≈ 3/4. By considering γ = 3/4, the statistics of the fit gives a coefficient of determination = .
R 0 98 2 . In case one would assume a linear dependence between the RTT and the number of hops (γ = 1) the fit would be less performant, and the best coefficient of determination achieved is = .
0 73 which would increase the value of R 2 with less than 0.001. In such manner, for the sake of simplicity, the γ = 3/ = .
4 0 75 choice is justified. As a result, we can also imply that the number of encountered hops is scaling with the distance, the scaling exponent being: (1/2)/(3/4) = 2/3. Results plotted on Fig. 10 suggest the validity of such scaling. For the traceroute data the coefficient of determination for such a regression is = .
R 0 98 2 . The fact that the average RTT values are not directly proportional with the H values suggests that a simple delay that is in average constant on all routers cannot completely account for the observed ∝ d RTT 1/2 scaling. For the elaborated model one can study the hops number versus geodesic distance. By using the Breadth-first search algorithm implemented in Igraph's Python package 31 one can determine the shortest topological path between the nodes. Since the ICMP echo request protocol is programmed to go on routes with the minimal number of hops, we can assume that any message will follow the shortest topological path. In such manner one can study again the scaling for the number hops (H) as a function of the distance between the nodes. Results averaged on several realizations with = N 2400 and = N 8000 (β = . 0 4) values are presented in comparison with the experimental ones on Fig. 10. One can realize, that both the experimentally observed trend and the one given by the model is a power-law, consistent with an exponent 2/3 and visibly different from 1/2. The trend similar to the experimental results suggests that this simple model captures the origin of the non-trivial scaling between the travel time and geodesic distance.

Conclusions
From comparison between the model and experimental data we conclude that our simple one parameter wiring model implemented in a geometric space is capable of qualitatively reproducing the observed statistical features of the Internet network at router level. In this view the nontrivial scaling for the average travel time of a message as a function of the geodesic distance is the result of the specific network topology. However, the difference between the scaling exponent for the number of hops as a function of distance (≈2/3), and for the scaling exponent of RTT  (Fig. 1b), while target routers were spread all over the world. www.nature.com/scientificreports www.nature.com/scientificreports/ as a function of distance (≈1/2) suggests that a constant average delay on routers cannot account totally for this scaling. Apart of network topology presumably other elements has to be taken into account for building a more realistic model. This is somehow similar with what we have learned in 18 when the nontrivial scaling of the human travel time as a function of the geodesic distance was investigated.

Data Availability
The datasets generated/analyzed during the current study (others than the ones that are publicly available on the indicated links) are available in the FIGHSAHRE repository under the link: https://figshare.com/s/ ce595c23e93aad2d7f14.