Early meteorological records from Latin-America and the Caribbean during the 18th and 19th centuries

This paper provides early instrumental data recovered for 20 countries of Latin-America and the Caribbean (Argentina, Bahamas, Belize, Brazil, British Guiana, Chile, Colombia, Costa Rica, Cuba, Ecuador, France (Martinique and Guadeloupe), Guatemala, Jamaica, Mexico, Nicaragua, Panama, Peru, Puerto Rico, El Salvador and Suriname) during the 18th and 19th centuries. The main meteorological variables retrieved were air temperature, atmospheric pressure, and precipitation, but other variables, such as humidity, wind direction, and state of the sky, were retrieved when possible. In total, more than 300,000 early instrumental data were rescued (96% with daily resolution). Special effort was made to document all the available metadata in order to allow further post-processing. The compilation is far from exhaustive, but the dataset will contribute to a better understanding of climate variability in the region, and to enlarging the period of overlap between instrumental data and natural/documentary proxies.

The objective of this paper is to make the EMERLAC dataset available. We have retrieved more than 300,000 meteorological data summarized in 137 series from 20 countries. While we acknowledge that this is not the final word regarding the full extent of data recovery in this area, it is nevertheless a significant step towards improving the availability of the region's early instrumental records.

Methods
Three steps were followed to create the EMERLAC dataset (Data Citation 1): (i) identifying documentary sources with non-retrieved meteorological data, (ii) digitizing the meteorological data, and (iii) correcting non-systematic biases.

Finding early meteorological observations
A great diversity of documentary sources is preserved in the region's archives and libraries 17. The libraries and archives most likely to hold early instrumental measurements are those of administrative, academic, scientific, and military institutions.
The present collection was made through a combination of 'in situ' visits to institutions located in the authors' own countries and consultations of Web-based resources. The institutions visited were: Biblioteca Nacional de España and Archivo Histórico del Real Observatorio de la Armada (Spain); Biblioteca Nacional de Portugal and Biblioteca da Academia das Ciencias (Portugal); Biblioteca Aurelio Espinosa Pólit and Biblioteca del Ministerio de Cultura y Patrimonio del Ecuador (Ecuador); Biblioteca de la Sociedad Económica de Amigos del País, Biblioteca del Instituto de Literatura y Lingüística, Biblioteca Nacional 'José Martí', National Archive and Historical Archive of the Instituto de Meteorología (Cuba); Biblioteca del Servicio Meteorológico Nacional (Argentina); and Biblioteca Nacional del Perú and Instituto Riva Agüero (Peru). Fortunately some institutions provide part of their scanned/imaged holdings online. For instance, the Instituto Histórico e Geográfico Brasileiro (http://www.ihgb.org.br/acervo1.php), Biblioteca Nacional Digital of Brazil (http://bndigital.bn.br/), and Anales de la Universidad de Chile (http://www.anales.uchile.cl/) were especially useful in this research. Initiatives such as Google Books (https://books.google.com/) and the Hathi Trust digital library (https://www.hathitrust.org/) are improving the online availability of documentary sources, and were extensively used in this research.
Next, we describe the types of documentary sources consulted in this research so as to provide a general overview of those sources and to serve as a guide in the selection of documentary sources in future research.
1. Meteorological records published in a monograph by an institution: These are documentary sources produced by military or scientific institutions that collect together instrumental meteorological measurements. These records usually provide metadata about the instruments used and the methods of observation. Sometimes the records are published with daily (or even sub-daily) resolution, while others only present monthly summaries. These series usually keep the methods of observation fixed even when the observers change. The institutions frequently exchanged their meteorological bulletins, so that these documents are often encountered at specialized libraries far from their original country. There is growing accessibility of these records through online repositories. One example is the Abstracts from the meteorological observations taken at the stations of the Royal Engineers in the year 1853-1854 21, which records observations from the Bahamas and Jamaica.
2. Newspapers: Some newspapers during the 18th and 19th centuries contain early instrumental measurements from the previous days or weeks. The usefulness of these records has been examined elsewhere 22. These series are usually short. The observations appear and disappear from the newspapers without apparent reason. Metadata are infrequent, and the observer and the instruments are usually unknown. Frequently, some issues of the newspaper are missing or remain only on paper in a specific library or archive. Thus, it is usually difficult to complete long series from newspapers. An example in the present study was the O Patriota Jornal Literario, Político e Mercantil do Rio de Janeiro (Fig. 1a).
3. Scientific annals or proceedings: These are works presented to an academy, university, or scientific institution. It is quite common for them to provide just summaries of the original meteorological records because of the journal's space restrictions. Such records are usually accompanied by a description of metadata, and some discussion or conclusion inferred from the data. At other times, the original data are presented with regularity but without any metadata or comment. The sources were frequently exchanged with other institutions, which increases the possibility of finding digital copies on the Internet. This is the case of the University of Chile, with all its annals scanned on the website http://www.anales.uchile.cl/ (Fig. 1b). Many of the data rescued for that country were extracted from there.
4. Geographical papers: It was common in the colonies to describe the new territories just discovered. Some of these works provide summaries of the meteorological data recorded or compiled by the author. Metadata are usually scarce. These works usually allow one to identify the earliest observers, and sometimes give clues to finding where the original manuscripts of the early observations are kept. This was the case with the monograph entitled Viajes científicos a los Andes Ecuatoriales 23.
5. Almanacs: These are annual publications with information about the agricultural calendar, astronomical ephemerides, and weather forecasts based on traditions or astronomy. Meteorological measurements are infrequent in their records. However, El conocimiento de los Tiempos presents instrumental measurements from Lima during the period 1754-1856 24. As with the newspapers, metadata are infrequent, and complete series are hard to obtain.
6. Navigation records: Ships' logbooks have also been widely used to understand and reconstruct atmospheric patterns [25][26][27][28][29][30]. We compiled only records from logbooks of stationary ships. As an example, we found a summary from the logbook of the Vapor Hernán Cortés, which was anchored in San Juan (Puerto Rico) (Fig. 1c).
To ensure the traceability and reproducibility of the EMERLAC dataset, we have included a detailed reference of the documentary sources consulted in the headline of each series (see the Data Records section). Additionally, a link to an Internet repository has been included when possible.

Digitization
Due to the high variability in format, layout, typing, and legibility of the documentary sources studied, and taking into account that Optical Character Recognition programs often lead to errors 31,32, we performed the digitization by key entry. We photographed the documentary sources in situ to have the possibility of re-checking the data when digital copies were not available.
All the retrieved data were in numerical or text format. There was no template of a working sheet for the digitization, so each digitizer chose the best option to speed up the process and reduce the reformatting, depending on the format of the original source and taking into account the final structure of the archives (Fig. 2).
The digitizers have climatological knowledge, and are familiar with the variables studied. This allowed for many errors to be corrected during the digitization phase, i.e., number or column transposition, inconsistencies in sequential dates, impossible measurements (e.g., relative humidity above 100%), and the disappearance of (decimal) commas. The metadata of the series, times of observations, methods of observation, observers, precise location of the observatories, and instruments used were also digitized and provided when available in the original sources. We searched historical literature when metadata were not available in the original sources. The documentary source has been identified in the headline of each series, so that the user can re-check the data if needed.

Correction of non-systematic biases
The different typology of the retrieved data (time resolution, length, variables) and the lack of nearby stations at those times made it impossible to perform a systematic and uniform quality control for all the datasets. All the series were subjected to visual inspection to detect transcription mistakes. There are at least two possible types of such mistakes: one introduced in passing from the manuscript to the printed document, and the other during the digitization of these printed sources for the EMERLAC dataset. We detected 488 suspicious values, which is 0.16% of the EMERLAC dataset. We corrected 59 values after re-checking the original source; for the other 429, the correction was rejected because the suspicious value coincided with the original source.
A basic quality control was done for the temperature, pressure, and humidity daily series with more than 5 years of observations. These data represent 30% of the total dataset. Each series and variable provides information at different temporal scales. Some series offer 3 observations per day, i.e., morning, afternoon, and night, while others give the maximum, minimum, and mean values for each day. The quality control comprised three steps, and was done in the original units: (1) Tolerance test: We flagged all values of each variable lying more than three standard deviations above or below the mean. No values were corrected or deleted during this process. In total, 90,516 data were evaluated, and only 257 cases (0.28%) were flagged by putting an asterisk after the value. This approach allows any quality control, homogenization, or interpolation processes to be performed by the final users according to their needs.
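The tolerance test can be illustrated with a short Python sketch. This is not the authors' code: the function name `flag_outliers`, the use of the sample standard deviation, and the choice to compute the mean over the whole series (rather than, say, per calendar month) are all assumptions made for illustration. Only the three-standard-deviation threshold and the trailing asterisk come from the description above.

```python
from statistics import mean, stdev


def flag_outliers(values, n_sd=3.0):
    """Return the values as strings, with '*' appended to flagged ones.

    A value is flagged when it lies more than n_sd standard deviations
    above or below the series mean. Nothing is corrected or deleted.
    Missing observations are passed in as None and returned as ''.
    Assumption: sample standard deviation over the whole series.
    """
    present = [v for v in values if v is not None]
    if len(present) < 2:
        # Too few observations to estimate a standard deviation.
        return ["" if v is None else str(v) for v in values]

    mu = mean(present)
    sd = stdev(present)
    lower, upper = mu - n_sd * sd, mu + n_sd * sd

    flagged = []
    for v in values:
        if v is None:
            flagged.append("")
        elif v < lower or v > upper:
            flagged.append(f"{v}*")  # flag only, keep the original value
        else:
            flagged.append(str(v))
    return flagged


if __name__ == "__main__":
    # Hypothetical temperature series with one suspicious reading.
    series = [20.0] * 19 + [40.0]
    print(flag_outliers(series)[-3:])
```

Because the flag is only a marker on the original value, a user who prefers a different threshold or a seasonal reference period can strip the asterisks and rerun their own test.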

Units used in the dataset
The transformation of old units into their contemporary equivalents is not a straightforward process; it usually requires many decisions that may introduce uncertainties. Consequently, we have provided the data in their original units. In this subsection, we provide all the information required to transform old units into current ones.
Temperature measurements: Three scales were used in the dataset, i.e., Celsius, Réaumur, and Fahrenheit. Equation (1) is the formula to transform Réaumur (°R) to Celsius (°C), and equation (2), Réaumur to Fahrenheit (°F).
$$T_{^\circ\mathrm{C}} = \frac{5}{4}\,T_{^\circ\mathrm{R}} \tag{1}$$
$$T_{^\circ\mathrm{F}} = \frac{9}{4}\,T_{^\circ\mathrm{R}} + 32 \tag{2}$$
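As a minimal sketch, the temperature conversions can be written as small Python functions. The function names are hypothetical, and the Fahrenheit-to-Celsius helper is added here only for convenience; it is the standard relation, not a formula quoted from this paper.

```python
def reaumur_to_celsius(t_r):
    """Convert degrees Réaumur to degrees Celsius (water boils at 80 °R)."""
    return t_r * 5.0 / 4.0


def reaumur_to_fahrenheit(t_r):
    """Convert degrees Réaumur to degrees Fahrenheit."""
    return t_r * 9.0 / 4.0 + 32.0


def fahrenheit_to_celsius(t_f):
    """Convert degrees Fahrenheit to degrees Celsius (standard relation)."""
    return (t_f - 32.0) * 5.0 / 9.0


if __name__ == "__main__":
    # A hypothetical reading of 16 °R expressed in the other two scales.
    print(reaumur_to_celsius(16.0), reaumur_to_fahrenheit(16.0))
```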
Length units: Several variables were recorded in length units, i.e., pressure (height of the mercury column), precipitation, and evaporation. We found five units of length in the dataset: millimetres, king's feet, French (or Paris) inches, English inches, and Castilian (or Spanish) inches. The conversion factors for French, English, and Castilian inches to mm are 27.0696, 25.3995, and 23.2195, respectively. The measurements were frequently expressed in inches and lines (a line being 1/12th of an inch). The king's feet appear in measurements from Colombia retrieved from the Semanario de Granada, with the author giving 1.935 as the conversion factor to mm. This seems to be an error, because 1.935 is actually the conversion factor from Castilian lines to millimetres, and it is more plausible that the precipitation was recorded in Castilian lines.
Time units: Some meteors, e.g., rainfall, snowfall, hail, fog, are measured as the number of days on which the meteor occurred. Others, such as rainfall or sunshine duration, are measured in hours, minutes, and seconds.
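The length conversions described in this subsection can be sketched in Python as follows. The conversion factors are the ones quoted above; the dictionary keys and the function name `inches_lines_to_mm` are hypothetical labels chosen for illustration.

```python
# Millimetres per inch, using the factors given in the text.
MM_PER_INCH = {
    "french": 27.0696,     # French (or Paris) inch
    "english": 25.3995,    # English inch
    "castilian": 23.2195,  # Castilian (or Spanish) inch
}

LINES_PER_INCH = 12.0  # a line is 1/12th of an inch


def inches_lines_to_mm(inches, lines=0.0, unit="english"):
    """Convert a reading in inches and lines to millimetres."""
    factor = MM_PER_INCH[unit]  # KeyError for an unknown unit
    return (inches + lines / LINES_PER_INCH) * factor


if __name__ == "__main__":
    # Hypothetical barometer reading: 27 French inches and 6 lines.
    print(round(inches_lines_to_mm(27, 6, "french"), 3))
    # One Castilian line in mm, close to the 1.935 factor discussed above.
    print(round(inches_lines_to_mm(0, 1, "castilian"), 3))
```

The second call shows why 1.935 is read as the Castilian-line factor: one twelfth of 23.2195 mm is about 1.935 mm.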
Cardinal directions: The wind direction is provided from a compass of 8 or 16 divisions in English or Spanish. Traditional Italianate wind names are not used.
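For users who need wind directions as angles, a 16-point compass label can be mapped to degrees as in the sketch below. This is only an illustration: the exact abbreviations used in each file are not specified here, so the code assumes the conventional English labels (N, NNE, ..., NNW) and the Spanish equivalents in which O (oeste) stands for W. The function name is hypothetical.

```python
# 16-point compass in clockwise order, starting from north.
POINTS_16 = [
    "N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
    "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW",
]


def direction_to_degrees(label):
    """Convert an 8- or 16-point compass label to degrees from north.

    Assumption: Spanish labels differ from English ones only in using
    'O' for west, so 'O' is replaced by 'W' before the lookup.
    Raises ValueError for a label that is not a compass point.
    """
    key = label.strip().upper().replace("O", "W")
    return POINTS_16.index(key) * 22.5


if __name__ == "__main__":
    for lab in ["N", "SO", "ono", "NNW"]:
        print(lab, direction_to_degrees(lab))
```

The 8-point labels are a subset of the 16-point list, so the same lookup serves both compass types.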
Speed: The wind speed is expressed by adjectives (in the original language) or in km/h.
Most of the series are easily converted with the conversion factors provided. Sometimes, however, the units recorded in the documentary source were wrong, e.g., the king's foot. In other cases, the units were not recorded in the documentary sources consulted. We then proposed the most plausible unit in the headline of the archive, by comparing the documentary data with current measurements at the same or a nearby location. Nevertheless, it is important to bear in mind some caveats about this comparison:
- It is possible that the observations were made in non-standard conditions, such as indoors, influenced by a wall, or partially exposed to solar radiation.
- The metadata rarely explain how the means were computed (daily, monthly, seasonal, or annual).
- It is possible that the barometer measurements were not corrected for temperature, elevation, or standard gravity, especially those made in the first half of the nineteenth century. Moreover, the temperature correction was done in different ways at that time, e.g., the Palatine Meteorological Society adopted 10°R as standard, the Royal Society in London adopted 50°F, and Cotte recommended the freezing point as standard.

Data Records
In total, we retrieved 301,778 meteorological records from 20 countries of Latin-America and the Caribbean. The earliest observations retrieved are from Lima in 1754, and the latest from Havana in 1905.
Most of the data are at a daily scale (93.8%), 5.8% of them are monthly, 0.3% annual, and 0.1% seasonal. We recorded all the variables available in every source, the commonest being temperature (104 series), precipitation (67 series), and atmospheric pressure (54 series). Figure 3 shows the locations of the data retrieved, as well as the availability of data per location. It should be noted that most of the information was retrieved from just four countries: Chile, Cuba, Brazil, and Ecuador (83% of the total). All the data have been deposited in an appropriate repository (Data Citation 1). The folders are organized by country. In each folder every .txt file has been identified with an ID of six letters: the first three refer to the country of the observations, and the last three to the city/location (Table 1). The six letters are followed by a number to differentiate series recorded in the same city/location. Currently the lowest numbers represent the earliest measurements, but this might change if future versions of the dataset are created. Table 2 lists all the series retrieved with their respective archival ID, the period covered, the temporal resolution of the data, and the variables recorded.
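The file-naming scheme lends itself to a simple parser. The following Python sketch splits an archive ID into its three parts; the function name, the returned dictionary keys, and the assumption that the letters are upper-case ASCII are illustrative choices, and the meaning of each three-letter code must still be looked up in Table 1.

```python
import re
from pathlib import Path

# Three letters for the country, three for the city/location, then a number.
ID_PATTERN = re.compile(r"^([A-Z]{3})([A-Z]{3})(\d+)$")


def parse_series_id(name):
    """Split an archive ID (with or without '.txt') into its components."""
    stem = Path(name).stem.upper()  # drops a trailing extension such as .txt
    match = ID_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"not a valid series ID: {name!r}")
    country, location, number = match.groups()
    return {"country": country, "location": location, "number": int(number)}


if __name__ == "__main__":
    # CHILLA1 is the archive shown as an example in Figure 2.
    print(parse_series_id("CHILLA1.txt"))
```

Grouping files by the `country` and `location` fields, then sorting by `number`, gives the series for one place in the order they are currently numbered.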
To clarify the data and the organization of each series, each archive has a headline with the following information:
ID: Name of the archive as described above.
Observers: Names of the people who recorded the measurements.
Observatory location: Latitude and longitude of the observatory in WGS84, plus altitude when available. The name of the observatory, or the street where it was located, is given when that location is exactly known. When the precise location is unknown, a probable latitude and longitude is provided.
Meteorological variables: Describes all the meteorological variables recorded, their units, and the corresponding columns in the file.
Data source: The complete reference of the documentary source in which the meteorological record was found.
Descriptive name: A name of the archive that makes reference to the location and the period covered by the series.
Other comments: All the metadata rescued about the observations or the observer. Also, any extreme or rare events recorded by the observer and any other information that could be useful to interpret the series.
After the headline, the first columns give the temporal information of the record (year, season, month, day and hour) and the following columns give the measurements of each meteorological variable. Every column has a short descriptive title. Figure 2 shows an example from the CHILLA1 archive.

Technical Validation
As described in the Methods section, we provide raw data. The only correction done had the goal of avoiding non-systematic biases. One must bear in mind that the data are from documentary sources written with very different objectives. The observers had different reasons for recording meteorological observations (scientific, agricultural, navigation, administrative, …), and not only did the care in this recording vary from one observer to another, but, even more so, a given observer might have had greater interest in some meteorological variable than in others.
It also has to be taken into account that, until the end of the 19th century, the observations were not subject to standard rules, so that, methodologically, each series was different.
For all these reasons, we believe that providing the raw data and all the available metadata is the best way to optimize the use of the database. Every individual user will be able to apply the type of post-processing that is best suited to their needs.
Some examples of uses that have already been made of part of the dataset are a study to describe the impacts of the 1783-1784 Laki eruption in the Southern Hemisphere 5, a study of the earliest known continuous 8-year-long instrumental meteorological series for South America 6, and a study of the earliest known systematic instrumental meteorological observations taken at above 4,000 m a.m.s.l. 7.