Roads and cities of 18th century France

The evolution of infrastructure networks such as roads and streets are of utmost importance to understand the evolution of urban systems. However, datasets describing these spatial objects are rare and sparse. The database presented here represents the road network at the french national level described in the historical map of Cassini in the 18th century. The digitization of this historical map is based on a collaborative methodology that we describe in detail. This dataset can be used for a variety of interdisciplinary studies, covering multiple spatial resolutions and ranging from history, geography, urban economics to network science.


Background & Summary
Triggered by recent, powerful digitization techniques, there is a huge interest in historical data, in particular when they allow to track temporal changes at different spatial scales. Such projects comprise for example the NYPL initiative 1 , the digitization of the road network of a region in Italy 2 , of Paris over 200 years 3 , and the digitization of ancient French forests 4,5 . New historical datasets extracted from maps allow researchers to study the time evolution of urban systems, to extract stylized facts, and for the first time to test theoretical ideas and models. Historical datasets of road networks allow to study territorial evolutions at different scales and to build tools to accurately answer theoretical questions. In particular, one can ask about the impact of the road network on subsequent urbanization, the correlation between the location of an entity (such as a city, town, etc.) and socio-economical indicators such as population or importance in the trade network, immigration, etc. More generally, such historical datasets are of interest to a wide variety of scientists comprising historians, geographers, mathematicians, archeologists, geohistorians, geomaticians, and computer scientists [6][7][8][9] . The digitization of historical sources is usually done locally by researchers for their immediate research needs without sharing their work and results with others. In contrast, we believe that it is essential to build a platform to share our work, but also to have a collective control over the production process of the data, its transformation and its analysis.
Operations such as scanning, georeferencing and digitization of historical sources imply several and delicate choices that should be documentated and tracked. Historical sources might have deformations originating from aging. Their georeferencing carries its own deformations which have to be minimized in order for the sources to remain legible. Our approach consists in taking these geometric displacements into account after the digitization process using spatial data matching tools 10 to find corresponding entities in consecutive data sets. Such tools should allow researchers to control and take into account the imperfections of the data throughout their analysis 11 . This way, we can reduce the impact of the georeferencing in the matching process and the analysis. Furthermore, opendata and open source tools provide the scientific community with the ability to control, track and reproduce the results at every stage.
With these ideas in mind, we developed a collaborative way to digitize the Cassini map of the 18th century (see Fig. 1 for a visualization of a small subset of the map and the corresponding digitized data). This map is the first one that restitutes with geometrical precision the entire French territory in the second half of the eighteenth century at a scale of 1/86,000. First conceived in the late 17th century, this work was made possible by the development of geodesic triangulation techniques and their generalization. The determination of the Paris meridian and the establishment of a single framework for all triangulations of France (1744) provided the reference needed for putting together several local maps 12 . In 1747 César-François Cassini de Thury was formally commissioned by Louis XV to draw the entire map of France showing the entire kingdom but also finer details. Cassini and his engineers divided the French territory in a grid of 180 rectangles with a size of about 80 km × 50 km which lead to as many maps printed on sheets of size 104 cm × 73 cm. Due to financial difficulties, the Revolution and regime changes, the constitution of this map was delayed and it is not before 1815 that the last sheets were released, under the direction of Jean-Dominique Cassini, son of César-François.
The maps that serve as a basis for our work is the digital copy of the so-called 'Marie-Antoinette' version, commissioned in 1780 by the queen. These maps were completed, corrected and updated in the subsequent years. For example, the map of the Paris region which was initially drawn between 1749 and 1755, and published the first time in 1756, displays clear signs of corrections made during the postrevolution period with the introduction of administrative divisions created during the Republic in 1790. An important part of the project was therefore to analyze each sheet, to give a precise date of its drawing and to provide an assessment of its accuracy. This was done by comparing different printed and dated versions, and many minutes and notes from the National Institute of Geographic and Forest Information (IGN) archives. The main work was however (see Methods) to analyze and vectorize a large number of features of the Cassini map such as roads, water networks, towns and villages, forest and crops, industrial and administrative structures. The digitized data have been made available on a dedicated geohistorical portail 13 . These different features put together under a digital form give us a detailed picture of the french territory in the second half of the eighteenth century.

Methods
The digitization of the Cassini maps and, in particular, of its road network, was achieved in a collaborative way using a shared PostgreSQL 14 database and its spatial extension PostGIS 15 . GIS editing tools such as Quantum GIS 16 were used to remotely digitize the objects using a WMTS (Web Tile Map Service) layer provided by IGN 17 as background. Details on the methods used to produce the georeferenced map are available on a dedicated website 18 . This way, several operators have been able to digitize data simultaneously on the same database. In order to provide consistent data records, data specifications were proposed as a result of an important collaborative work. Nevertheless, as the specifications were enhanced during the digitization process, local variations in the capture of several attributes might be found (the attribute 'bordered' was added after a few months of digitization for instance). Further work will focus on the consistency of the data (both for attributes and geometries).
An important aspect of the Cassini dataset is the fact that the Cassini map was not homogeneously drawn (different sheets might show different levels of detail as seen in Fig. 2) or conceived as a road map 19 . Hence, one has to be careful when studying the road network extracted from it 20 . Specifically, the road network inside most cities was not drawn in the map. An automatic process is therefore proposed to create so-called 'fictive' edges inside cities allowing to link all roads leading them. As shown in Fig. 3, a node representing the city is created at its centroid (or rather at the centroid of the geometry representing its boundary in the map) and edges are created to connect this node to the edges ending in the city. Furthermore, in order to speed up the digitizing process, some roads have been captured as continuous strokes rather than by topological road segments: some users digitized entire roads instead of stopping the capture at each road intersection. We therefore use the PostGIS topology engine 21 to convert the digitized strokes into a topological network. This process uses a distance threshold to merge points closer than the given threshold and thus allows for the correction of minor shifts between points and a second threshold for to collect all nodes in the neighboorhood of a city. The thresholds used in the current export are 10 and 20 meters respectively. The digitized roads and cities are also provided in the export and the code for the topological export is available 22 .

Data Records
The data records contain the roads and cities as captured (the names of the attributes have been translated though) and the topological nodes, edges and faces. We propose five shapefiles (which each

Roads (france_cassini_roads.shp)
This file contains the roads represented in the Cassini maps. It includes the following attributes: • id: the (unique) identifier for each road segment (integer); • geometry: the geometry of the segment (linestring) in RGF93/Lambert-93 (EPSG:2154). • type: the type of road or connexion as represented in the map: either 'red', 'white', 'trail', 'forest', 'bridge', 'ferry' or 'gap'. These values refer respectively to main roads, secondary roads, trails, forest trails, bridges, tubs, and shifts between sheets (string). • name: the name of the segment when it has one (string). • uncertain: whether the nature of the segment is difficult to clearly identify in the map (boolean). • bordered: whether the segment is bordered by trees (boolean). • comments: comments left by our contributors when the object raises specific questions (string).

Cities (france_cassini_cities.shp)
This file describes some of the main types of land use identifiable in the maps.
• id: the (unique) identifier for each object (integer). • geometry: the geometry of the object (multipolygon) in RGF93/Lambert-93 (EPSG:2154). • type: the type of object: 'city', 'town', 'domain', 'fort' (string), respectively for cities, towns, domains and forts. • name: the name of the land element when it has one (string). • fortified: is the city fortified? (boolean). Can only be true if the type is 'city'. • comments: comments left by our contributors when the object raises specific questions (string).
Topological nodes (node.shp) • id: the (unique) identifier for each object (integer). Edges are not oriented so the start and end nodes are arbitrary. Nevertheless, they are consistent with the order of the points in the geometry of the edge (the start node position is the first point of the geometry of the edge). When the edge is built from a road, it holds the identifier of this road. Its type is also given for convenience but is recoverable by join (combining the Edge table with the type from the roads table by using the common identifier road_id). Note that 'fictive' edges do not hold such an identifier. Furthermore, in cases where multiple roads are merged into the same edge, the identifier is arbitrary.
• id: the (unique) identifier for each object (integer). • geom: the geometry of the object (linestring) in RGF93/Lambert-93 (EPSG:2154). • start_node: identifier of the initial node of the edge (from node.shp) • end_node: identifier of the final node of the edge (from node.shp) • road_id: identifier of the road it stems from (from france_cassini_roads.shp) • road_type: type of the road(from france_cassini_roads.shp) • length: length of the edge (meters) • component: the identifier of the connected component the edge belongs to (integer) Topological faces (face.shp) As the resulting network is a planar graph (i.e., a graph that can be embedded in the plane), the faces (i.e., the regions bounded by edges) are also provided.

Simplified topological nodes (node.csv)
This file contains the same nodes as node.shp but in a different easily accessible format. The position of the roads is given in lat/long. This file contains the same edges as edge.shp without the geometry. It is therefore a simplified version. The length of the edge is the cartesian 2D length of the geometry (a linestring, i.e., a sequence of line segments) from edge.shp computed using PostGIS funtion ST_Length.
• id: the (unique) identifier for each object (integer) • start_node: identifier of the initial node of the edge (from node.shp) • end_node: identifier of the final node of the edge (from node.shp) • road_id: identifier of the road it stems from (from france_cassini_roads.shp) • road_type: type of the road(from france_cassini_roads.shp) • length: length of the edge (meters)

Technical Validation
The topology created using PostGIS Topology is first validated by the same tool and the provided function ValidateTopology without error. This function checks for several errors including crossing edges, and mismatching edge/node topology.
Furthermore, we compute the number of input edges corresponding to the edges of the final network. This allows us to identify the duplicated edges, i.e., the edges in the final network which correspond to multiple edges in the input data. These duplicated edges usually correspond to digitization errors and are used to manually validate the digitized data. The latest version (V5) of the topology does not contain any duplicated edge.

Connected components validation
The second validation consists in computing and analysing the connected components of the network. Indeed, such a road network should essentially be connected and small connected components are unlikely (they would mean small 'islands' disconnected from the rest of the network). Our network contains 1,274 connected components. The largest component is about 110,000 kilometers in length (more than 96% of the total length of the network) whereas the smallest is about 100 meters. Figure 4 shows the three largest connected components in the network. Note that the second largest component is at the very edge of the map (in Germany) and is not visually connected to the network in the map. Finally, the third largest component is the Jersey island. Other large components represent other islands but also forests which paths are represented (and thus digitized) but rarely connected to the road network. The smallest components represent isolated features such as bridges. They can also correspond to digitization errors and the connected components can be used as a tool for data correction.

Collaborative validation
Our third validation method is still ongoing work. It was inspired by the 'Building Inspector' 23 , developped by NYPL and used for the validation of buildings automatically vectorized from insurance maps. With the help of NYPL, we adapted this tool to collaboratively validate and correct our digitized data. The resulting application, 'L'Arpenteur Topographe' 24 is being tested on the digitized cities. The code of the application (from NYPL and our contributions) is available online 25 . Further tests should be carried out on other objects in the future. Further work will also focus on better handling the interaction between the collaborative digitization process (using desktop or online GIS tools) and the collaborative validation, correction and enrichment processes such as in 'L'Arpenteur Topographe'.