Reconstructing the historical synthesis of mauveine from Perkin and Caro: procedure and details

Mauveine, an iconic dye first synthesised in 1856, still has secrets to unveil. If one wanted to prepare Perkin’s original mauveine today, what would the procedure be? It is described in this work and relies on the use of a 1:2:1 (mole) ratio of aniline, p-toluidine and o-toluidine. This ratio was found by comparing a series of products, synthesized from different proportions of these starting materials, with a set of historical samples of mauveine; the comparison was further analysed with two unsupervised chemometrics methods.


Results and Discussion
Recently 8 , a very high percentage of pseudo-mauveine (C 24 ) was found in Caro's mauveine, whereas the C 27 derivative, mauveine B, was the major compound in Perkin's mauveine. This was discovered from a comparison of the chemical profile of the dyes used in a set of lilac postage stamps of the Victorian period (established based on HPLC-DAD-MS analysis 8 ) with historical samples of both Perkin's and Caro's mauveine, both of which were found to have been used in the manufacture of those postage stamps 8 . In order to validate these results, an interpretation that takes simultaneous account of several mauveine chromophores is proposed, facilitating the graphical visualization of different chemical profiles. This was carried out using hierarchical cluster analysis (HCA) and principal component analysis (PCA), as described in the Methods section. In a first approach, both stamps and historical samples were characterized on the basis of the major tracers related to the mauveine structures C 24 (pseudo-mauveine), C 25 , C 26 (mauveine A), C 27 (mauveine B), C 27 B3B4, C 28 (mauveine C), C 29 (mauveine D) and carminic acid (obtained from the cochineal extract). This was performed to confirm the correct source of the mauveine (Perkin's or Caro's mauveine) present in the extracts of the UK Victorian postage stamps. Specifically, the data set contains information on 10 selected postage stamps (labelled according to the Stanley Gibbons catalogue number, SG) and 8 historical samples obtained from the Science Museum, in London, UK (denoted as ScM1-4), the Museum of Science and Industry, in Manchester, UK (denoted as MS1M1 and MS1M2), the Chandler Museum (CM), in Arizona, USA, and a mauveine synthesised by Caro from the Deutsches Museum (DMC), in Munich, Germany (for details see ref. 8).
As previously reported 8 , due to the lower signal of the mauveine chromophores in the extracts of Penny Lilac stamps SG170 to SG174, these were considered as a single object in the analysis, reducing the number of samples to 14. The chemical profile obtained from the postage stamps was established according to the prevalence of the mauveine compounds present in the historical mauveine samples, and attributed on the basis of their m/z and respective fragmentation patterns 8 . Figure S1 in the Electronic Supporting Information (ESI) presents a dendrogram representing the similarity, in terms of the m/z information, between the UK Victorian postage stamps and the historical samples. This particular data structure displays five groups of samples, and it is apparent that the distribution of those groups reflects the C 26 /C 27 ratio, as found in previous studies (see ref. 8). For example, the group containing the SG106 stamp and the MS1M2 and DMC samples (see Fig. S1) includes a C 26 /C 27 ratio from 3.5 to 9.4. The SG170-4 samples present a very distinct chemical profile, with no C 26 or C 27 mauveine structures. As seen from Fig. S2 in the ESI, the relative positioning of the samples can be further interpreted using a biplot representation. It is worth noting that both the mauveine chromophores and the samples are depicted on the same diagram, which allows the mauveine tracers to be interpreted while inspecting the positions of the samples. The mauveine chromophores are depicted in a so-called correlation circle, in which the angle formed by any two chromophores, represented here as vectors, reflects their actual pairwise correlation.
In Fig. S2, the points representing the stamps and the historical samples are mainly distributed along the first component, PC1, which reflects an increase of the C 24 and C 25ab (two isomers) fractions, from left to right, these being strongly present in the SG106 stamp and in the MS1M2 and DMC samples, and an increase of the C 26 , C 27 , C 27 B3 and B4, C 28 and C 29 (mauveine D) fractions, from right to left. The latter structures are dominant in the ScM1-4, CM and MS1M1 samples and in the SG108, SG109 and SG109_2 stamps. The second component is mostly related to the carminic acid (Coch), C 25ab , C 24 and C 26 fractions. A strong relationship between C 28 and C 29 , C 27 and C 27 B3 and B4 and a weak correlation between Cochineal and C 24 , C 25 , C 28 and C 29 are also extracted from the angles between the vectors corresponding to these fractions. The correlation between Cochineal and each of the other fractions is strongly negative. As the C 26 and C 27 fractions have the same loadings in this first component, an additional variable can be naturally estimated reflecting the C 26 /C 27 ratio and can be projected on the second component along which the samples are ranked. The C 26 /C 27 ratio, considered now as a response variable, was not included in the analysis. It should be noted that high values of C 26 /C 27 reflect the process used by Caro 8 . Among the analyzed samples, mauveine B is dominant, with no signal (or only residual traces) of pseudo-mauveine, in ScM1, ScM2, ScM3, ScM4, MS1M1 and CM; only in MS1M2 and DMC is pseudo-mauveine dominant, and these samples show high values of the C 26 /C 27 ratio. The first samples point to the synthetic recipe followed by Perkin, whereas the latter two, MS1M2 and DMC, point to Caro's synthetic procedure. This suggests that there are significant dissimilarities in the distribution of mauveine compounds amongst the samples included in the groups defined in Fig. S1. The main feature of Caro's mauveine is the presence of a high quantity of pseudo-mauveine (C 24 ), in contrast to Perkin's mauveine, which is characterized by the absence of C 24 and a high C 26 /C 27 (mauveine A/mauveine B) ratio. Figure S2 indicates that the SG107, SG109_2, SG108 and SG109 stamps were dyed with mauveine obtained from the process used by Perkin, whereas SG106 was dyed with mauveine from Caro's procedure. The Penny Lilacs SG170-4 stand out from the set, since they contain only carminic acid (Coch). These stamps contain the cochineal extract (carminic acid), and have a null C 26 /C 27 ratio. The SG106 extract stands out with the highest C 26 /C 27 ratio and the highest C 24 and C 25 fractions, and does not present C 28 , C 29 or carminic acid, thus relating to the use of Caro's synthetic procedure. SG108 also stands out with the highest values of the C 29 and C 28 fractions, followed by SG109, SG107 and SG109_2. The latter three have high fractions of C 26 and C 27 B3 and B4 and similar fractions of C 24 and C 25 . This suggests that these stamps were dyed with mauveine from Perkin's recipe.
Figure 1. Chemical structures of aniline, o-toluidine and p-toluidine used in the synthesis of mauveine. The combination of these starting materials leading to mauveine A and B, two of the major components in mauveine samples, is depicted in different colours. Pseudo-mauveine, consisting of a non-methylated structure, is also presented.
It is worth stressing that the synthesis of mauveine requires the use of impure aniline (with o- and p-toluidine). Despite the relevance of this historical dye, and the fact that its original recipe was identified in some extant museum specimens 2, 8 , the synthetic procedure leading to this particular mixture of compounds is yet to be determined. Perkin reported in later investigations that aniline was contaminated with ortho- and para-toluidine, which then became part of the recipe for tuning the red and blue shade of Mauve 14 : "By using aniline containing much larger quantities of toluidine a redder colouring matter was obtained. By taking advantage of this two different products were manufactured, namely a blue shade of mauve prepared from aniline containing but little toluidine, and a red shade from aniline containing large quantities of toluidine". It is known that both Perkin and Caro knew how to make red and blue shades of mauve (here seen as the colour), according to the compositions, or at least the boiling points, of the mixed starting aromatic amines (aniline, o-toluidine, p-toluidine). However, there was poor knowledge at the time of the final composition, and it is unlikely that the shades of mauve were obtained by synthetic procedures other than those using the above-mentioned aromatic amines as starting materials. A possible control of the shade of mauve may lie in the purification step, i.e., in removing other mauveine isomers or tuning the C 26 /C 27 ratio to improve the shade.
While attempting to reproduce the original Perkin recipe for mauveine, Cliffe 13 proposed a mixture of aniline (2 mol.) and o- and p-toluidines (1 mol. each), while others 15 suggested p-toluidine (1.0 equiv.) with aniline and o-toluidine (1.5 equiv. each). As will be shown, these differ from the proportion that we have found to reproduce the composition of the historical samples (ScM1 and CM). Indeed, in the development of our studies towards establishing the exact nature of the dye produced by Perkin 2 , different proportions of aniline, p-toluidine and o-toluidine were combined to reproduce the pioneering synthesis of mauveine. Two starting compositions of 1:10:10 (MV6) and 1:10:5 (MV5) of aniline, p-toluidine and o-toluidine were selected for the synthesis of mauveine, based on the assumption that larger quantities of toluidines will produce a redder colouring matter with the C 26 , C 25 and C 24 structures emerging as the major mauveine tracers, see Tables 1 and 2. Other proportions, including the 1:4:2 (MV3), 1:4:1 (MV2), 1:2:1 (MV1) and 0:1:0 (MV4) mixtures of aniline with smaller quantities of both toluidines, were also tested for producing a more blue-based colouring matter in which the C 26 and C 27 structures are the dominant tracers. A comparative HPLC analysis of the synthesized products, together with a set of historical mauveine samples obtained from museum collections, was carried out and the complex composition of Perkin's mauveine was confirmed. In agreement with previous findings 2, 8 , mauveines A (C 26 ) and B (C 27 ) emerged as dominant components, while the mauveine C 25 and C 27 B2 isomers were identified as important tracers of the original synthetic procedure. HPLC plays an important role in the analysis of historical organic colorants 16 .
Therefore, the analysis (and characterization) of the resulting products was made by HPLC-DAD, followed by a multivariate data analysis procedure based on HCA and PCA. These unsupervised chemometric methods have provided some clear-cut patterns from the HPLC data, and have validated the synthetic route that was likely to have been followed by Perkin.
Scientific Reports | 7: 6806 | DOI:10.1038/s41598-017-07239-z
The HPLC-DAD chromatograms of mauveine samples obtained at λ = 550 nm are depicted in Fig. 2.
Observation of the chromatograms indicates a broad range of mauveine chromophores, with the main structures corresponding to C 24 (pseudo-mauveine), C 25 , C 26 (mauveine A), C 27 (mauveine B), C 28 (mauveine C) and C 29 (mauveine D). The C 25 and C 27 isomers are also identified and are clearly less prevalent. The labelling of the peaks corresponding to the mauveine structures with different numbers of carbons was based on the comparison of the retention times and the relative peak areas with previous data 2 . From the observation of these profiles, it can be seen that one of the synthesized mauveines, identified as MV1, together with the historical samples, are complex mixtures of at least 10 different compounds, all of which contain the 7-amino-5-phenyl-3-(phenylamino)-phenazin-5-ium core. In the second sample from the Museum of Science and Industry of Manchester (MS1M2), pseudo-mauveine (C 24 ), the monomethylated C 25ab isomers and the dimethylated mauveine A (C 26 ) were found as the major components. The C 24 and C 25 isomers are present as minor tracers in the remaining historical samples. The profiles identified for Science Museum 2 (ScM2), Science Museum 3 (ScM3) and Museum of Science and Industry Manchester 1 (MS1M1) are in agreement with the previously published analysis 2,8 . As can be seen from Fig. 2, mauveines A and B are the major components of the mauveine mixture. A clear pattern can be observed: the distribution of the different chromophores in MV1 (Fig. 2) corresponds to the profile of the Science Museum 1 and Chandler Museum samples (ScM1 and CM, respectively). Mauveines A and B are clearly dominant, corresponding to a C 26 /C 27 ratio approximately equal to 1 (see Table 1). Tables 1 and 2 summarize information on the relative percentages and retention times of the main mauveine chromophores isolated from the synthesized and historical collections, at λ = 550 nm.
The ratio between the sum of all C 26 (mauveine A) and all C 27 (mauveines B, B2, B3 and B4) compound percentages is also included.
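The ratio defined above is simple arithmetic on the relative peak percentages, and a minimal Python sketch of it is given here for illustration. The function name, the label convention (keys beginning with "C26" or "C27") and the numbers in `example_profile` are assumptions introduced for this example; they are placeholder values, not entries from Table 1.

```python
def c26_c27_ratio(profile):
    """Return sum(all C26 percentages) / sum(all C27 percentages).

    `profile` maps chromophore labels to relative percentages.
    The label convention (prefix "C26" / "C27") is an assumption
    made for this sketch. Returns None when no C27 compound is
    present, since the ratio is then undefined.
    """
    c26 = sum(v for k, v in profile.items() if k.startswith("C26"))
    c27 = sum(v for k, v in profile.items() if k.startswith("C27"))
    if c27 == 0:
        return None
    return c26 / c27


# Placeholder percentages for illustration only (NOT data from Table 1).
example_profile = {
    "C24": 2.0,
    "C25a": 3.0,
    "C26": 30.0,       # mauveine A
    "C27": 20.0,       # mauveine B
    "C27_B2": 10.0,
    "C27_B3B4": 5.0,
    "C28": 6.0,
}

ratio = c26_c27_ratio(example_profile)
print(f"C26/C27 = {ratio:.2f}")
```

Because the denominator sums mauveine B together with its B2, B3 and B4 isomers, a sample rich in minor C 27 isomers can show a low ratio even when mauveine A is the largest single peak.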
Although the distribution of mauveine compounds amongst the different samples (Table 1) shows significant dissimilarities, the two major chromophores are (with the exception of MS1M2) mauveines A (C 26 ) and B (C 27 ). Other C 27 isomers, mauveines B3 and B4, as well as the C 28 mauveines, C and C1, are also present, with mauveine B2 being the most prevalent of these minor compounds. In ScM3, CM, MV1 and ScM1, mauveine A is present in a larger amount, and contributes ca. 30% to the overall chromophore content. In contrast, when the C 26 /C 27 ratio is considered, in ScM2 and MS1M1 the major chromophores are the C 27 isomers, which is in agreement with previous results 2 . Fig. 3 presents a dendrogram reflecting the similarity between the synthesized samples and the historical samples, considering the relative percentages of the different mauveine compounds identified in Fig. 2. This provides a two-dimensional diagram of the data structure, indicating the merging samples and the merging distances. Five groups of samples are identified at a distance of 30, and reflect the C 26 /C 27 ratio (sum of all C 26 and all C 27 compound percentages). For instance, the group on the top includes the samples MS1M1 and ScM2, with the lowest C 26 /C 27 ratios (0.3 and 0.2, respectively). ScM1, ScM4, CM and MV1 present a C 26 /C 27 ratio of 0.7-1.0. MS1M2 differs significantly from the other historical samples, as the prevalence of C 27 structures is very low. This sample presents the highest percentage of non-methylated mauveine (C 24 ). The group including the MV2-6 samples is also isolated, as their profiles are quite dissimilar (see Fig. 3 and Tables 1 and 2). These samples contain higher percentages of pseudo-mauveine and of mono- and dimethylated mauveines (C 24 , C 25 isomers and C 26 , respectively), while the trimethylated mauveines B, B3 and B4 are minor or even absent compounds.
After establishing the number of clusters, PCA was then directly applied in order to reveal the relationship between the chemical structures in these groups. It is seen that the first two principal components are able to recover more than 72% of the data variability, indicating that a graphical representation based on these two [...]
[...] All samples were dissolved in methanol. The major components identified are labelled as: 1, pseudo-mauveine (C 24 ); 2 and 2′, two monomethylated C 25 isomers (C 25a and C 25b ); 3, one dimethylated mauveine (C 26 ); 4, 4′, 4″ and 5, four trimethylated mauveines (including C 27 and the B2, B3 and B4 isomers); 6 and 7, two tetramethylated mauveines (C 28 C and C1 isomers); 8, one pentamethylated mauveine (C 29 ). For details see refs. 2, 8. Purple labels refer to the mauveine structures identified in each sample; grey labels refer to unidentified compounds. In order to illustrate the correct alignment of the peaks between the samples, and considering three of the most similar profiles, ScM1, MV1 and CM were represented in the same graph.
[...] and pseudo-mauveine (C 24 ) mirror the original synthetic procedure of Caro 8 . The ScM1-4, MS1M1, CM and MV1 samples contain mauveine from the process used by Perkin, whereas MS1M2 possesses mauveine obtained using Caro's procedure. The latter contains the highest content of C 24 and C 25 isomers and lower fractions of C 26 and C 27 . The first component contains essentially information on the C 25 isomers, C 27 B and B2, C 28 C and C1, and C 24 fractions. The second component is mostly related to C 26 , C 24 , and the C 27 isomers B2, B3 and B4. The length of the vectors in Fig. 4 approximates the variances of the chromophore contents. The longer the vector, the higher the variance. A strong relationship between the C 27 B, C 28 C and C1 isomers and C 29 , and also between C 24 and the C 25 isomers, is observed. Additionally, a weak relationship between C 24 , C 26 and the C 27 isomers is confirmed.
With one exception (for the C 25 isomers) the correlation between C 24 and each of the other mauveine chromophores is negative. MS1M2 stands out through possessing the highest content of pseudo-mauveine and the absence of mauveine A and B.
Other studies 17,18 have proposed different starting materials for the synthesis of Perkin's mauveine; however, based on our approach and comparing with the historical samples (ScM1 and CM), a simple mixture of impure aniline with ortho- and para-toluidine in the (in)correct proportions leads to what can be undoubtedly considered Perkin's original recipe. The interpretation of the mauveine traces in terms of fragmentation and chromatogram data must be complemented by keeping in mind the relative proportions of the starting materials and the relative combination of specific mauveine isomers (B2, B3, B4 and C1).
In conclusion, we have developed a procedure aimed at establishing the (historical) synthesis of mauveine. A simple combination of aniline together with o- and p-toluidine leads, in the correct proportion, to the complex chemical profile of Perkin's mauve, containing a mixture of several methyl derivatives with a 7-amino-5-phenyl-3-(phenylamino)phenazin-5-ium core. A clear pattern was identified in which mauveines A (C 26 ) and B (C 27 ) were dominant, while mauveines C 25 and C 27 (the B2 isomer) were found to be important tracers to probe the original synthesis. The complementarity between the HPLC-DAD analysis and the unsupervised characterization of the respective data was explored and allowed the synthetic procedure to be determined. The inspection of the most discriminating chromophores provides a detailed understanding of the chemical profile. The biplot analysis contains a significant amount of information and can be helpful in interpreting relationships between unknown samples and mauveine fingerprints. It is believed that this approach provides a general route for exploring the chemical profiles of a variety of systems.

Methods
Synthesis of Mauveine. The general synthetic procedure used for mauveine consisted of mixing p-toluidine (1.14 mmol, 122 mg), Milli-Q water (2.3 mL) and sulfuric acid (0.6 mL, 2.0 M) in a 25 mL glass vial 13 . The reaction vial was magnetically stirred, and an ultrasound-enhanced dissolution was carried out. The reaction vial was then heated gently using a controlled water bath until the reactants dissolved. After the complete dissolution of p-toluidine, aniline (0.57 mmol, 0.052 mL), o-toluidine (0.55 mmol, 0.06 mL) and 30 mg of potassium dichromate (K 2 Cr 2 O 7 ) were added to 160 mL of water. Soon after the addition of K 2 Cr 2 O 7 , the solution turned purple. This describes the synthesis of mauveine MV1, prepared using a 1:2:1 (mole) ratio of aniline, p-toluidine and o-toluidine, respectively. The other samples followed the same synthetic method, with different proportions of the starting materials, and are labelled as MV2-6: 1:4:1, 1:4:2, 0:1:0, 1:10:5 and 1:10:10, respectively. The reaction mixture was stirred and heated at controlled temperature (water bath) for two hours. At the end of the reaction time, the liquid portion was discarded. The resulting dark solid was filtered using gentle suction filtration, and washed with distilled water (slightly heated) until the washing solution became colorless. The remaining solid was dried in an oven at ~110 °C for 30 min. It was then washed with petroleum ether until the washings became colorless and dried again for 10 min at 110 °C. Finally, the remaining solid was washed with a 25% methanol/water solution until the solution turned colorless. This aqueous/alcoholic solution was then evaporated. After complete evaporation, 0.3 mL of methanol (100%) was added to the remaining solid and the sample was transferred with a filter pipet to a 5 mL vial. The liquid was carefully evaporated until its volume was reduced to ca. 0.03 mL. This final methanol solution contains the ultimate product, mauveine (in ca. 2 mg yield).
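As a worked illustration of how the mole ratios translate into quantities to weigh or pipette, the following Python sketch scales a given aniline : p-toluidine : o-toluidine ratio to a fixed amount of p-toluidine. Several points are assumptions of this sketch rather than statements from the procedure: that the p-toluidine amount (1.14 mmol) is held constant across MV1-6, the molar masses and approximate room-temperature densities (standard handbook values), and the function and variable names.

```python
# Handbook values, not taken from the text above (assumptions of this sketch).
MOLAR_MASS = {            # g/mol
    "aniline": 93.13,
    "p-toluidine": 107.15,
    "o-toluidine": 107.15,
}
DENSITY = {               # g/mL, approximate; p-toluidine is a solid and is weighed
    "aniline": 1.022,
    "o-toluidine": 1.00,
}


def reagent_amounts(ratio, p_toluidine_mmol=1.14):
    """Scale a mole ratio (aniline, p-toluidine, o-toluidine) to amounts.

    Assumes the p-toluidine amount is fixed and the other two amines
    are scaled relative to it. Returns, per reagent, the amount in
    mmol and mg, plus a volume in mL for the two liquid amines.
    """
    aniline, p_tol, o_tol = ratio
    if p_tol <= 0:
        raise ValueError("the p-toluidine share of the ratio must be positive")
    unit = p_toluidine_mmol / p_tol          # mmol per ratio unit
    mmol = {
        "aniline": aniline * unit,
        "p-toluidine": p_tol * unit,
        "o-toluidine": o_tol * unit,
    }
    amounts = {}
    for name, n in mmol.items():
        mass_mg = n * MOLAR_MASS[name]       # mmol * g/mol = mg
        entry = {"mmol": n, "mg": mass_mg}
        if name in DENSITY:
            entry["mL"] = mass_mg / 1000.0 / DENSITY[name]
        amounts[name] = entry
    return amounts


mv1 = reagent_amounts((1, 2, 1))             # the 1:2:1 ratio used for MV1
for name, entry in mv1.items():
    print(name, {k: round(v, 3) for k, v in entry.items()})
```

For the 1:2:1 ratio this reproduces the stated aniline and p-toluidine quantities (0.57 mmol, about 0.052 mL, and 122 mg); an exact 1:2:1 ratio gives 0.57 mmol of o-toluidine, marginally above the 0.55 mmol reported.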
The algorithm for baseline correction and peak identification in chromatograms was developed by the authors using R version 3.3.1 19 and the Alternating Least Squares (ALS) for the Automatic Chemical Exploration of mixtures package 20-22 . This has been implemented and validated for use on data from systems like HPLC-DAD 23 . Baseline estimation was built on Asymmetric Least Squares 23 , a method in which a second-derivative-constrained weighted regression algorithm is used for baseline correction. This was performed using a second derivative constraint of 10, a weighting parameter for positive residuals of 0.01, and a maximum of 20 iterations. For resolved peaks, and for peaks in regions with an approximately linear baseline, this method gives a good estimate of the baseline. However, for unresolved peaks, the first and second derivatives of the original profiles allowed less marked features to be identified. Zeros in the first derivative, approached from positive values, were assigned to definite maxima. Inflection points and inverted peaks given by the second derivative were used for confirmation. As the number of mauveine compounds was previously identified 2, 8, 9 , a method based on fitting chromatographic peaks with a Gaussian profile was used for determining the fitted peak parameters. For peak grouping, it was assumed that peaks in chromatograms having similar retention times originate from the same chemical compound. No further corrections were made for the sample-to-sample variation of the retention times, as no significant shifts were identified between the historical and the newly synthesized samples.
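The first-derivative rule for locating maxima can be illustrated with a short, self-contained Python sketch. This is not the authors' R implementation: it covers only the step in which a maximum is assigned where the first difference changes from positive to non-positive, applied to a synthetic, baseline-free trace built from two Gaussian peaks. The function names, the `min_height` threshold and all peak parameters are hypothetical.

```python
import math


def gaussian(t, height, center, width):
    """Gaussian peak profile evaluated at time t."""
    return height * math.exp(-((t - center) ** 2) / (2.0 * width ** 2))


def find_maxima(signal, min_height=0.0):
    """Return indices of local maxima in a sampled signal.

    A point is reported when the first difference before it is
    positive and the first difference after it is zero or negative,
    i.e. the discrete first derivative crosses zero from positive
    values. Points below `min_height` are ignored.
    """
    maxima = []
    for i in range(1, len(signal) - 1):
        rising = signal[i] - signal[i - 1]    # first difference before point i
        falling = signal[i + 1] - signal[i]   # first difference after point i
        if rising > 0 and falling <= 0 and signal[i] >= min_height:
            maxima.append(i)
    return maxima


# Synthetic trace: two well-resolved peaks on a zero baseline
# (made-up parameters, not chromatographic data).
times = [0.1 * i for i in range(200)]
trace = [gaussian(t, 1.0, 5.0, 0.5) + gaussian(t, 0.6, 12.0, 0.8) for t in times]

peaks = find_maxima(trace, min_height=0.05)
print([round(times[i], 1) for i in peaks])
```

On real chromatograms the trace would first need baseline correction and, usually, smoothing, since noise produces many spurious sign changes in the first difference; shoulders from unresolved peaks do not produce a zero crossing at all, which is why the second derivative is used for confirmation in the procedure described above.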
Multivariate data analysis procedure. The analysis resorts mainly to methods that involve the simultaneous study of several key variables related to the different mauveine chromophores. In this context, multivariate methods allow a detailed exploration into possible patterns, enable inter-relationships between the different chromophores to be represented graphically, and provide ways of simplifying and reducing the dimensionality of the data to predict the proportions of the starting materials, in order to reproduce the historical synthesis of mauveine. The main goal is to provide an interpretation of the different chemical profiles, taking simultaneous account of several mauveine compounds. A similar analysis was performed in a previous work related to specific mauveine profiles found in UK Victorian postage stamps, including a comparison with historical samples from both Perkin's and Caro's recipes. A selection of more than 35 postage stamps was analyzed, covering the period of 1847-1901 8 , to identify the use of mauveine as a dye for those stamps and characterize the corresponding synthesis. The chemical profile was established and classified using standard chemometrics methods, according to the main colorants of the historical mauveine samples, attributed on the basis of their m/z and respective fragmentation patterns. The characterization of mauveine samples was conducted resorting to two well established methods: (i) hierarchical cluster analysis (HCA) for defining the data structure and (ii) principal component analysis (PCA) for data overview and feature selection. HCA is especially suitable for cases in which there is no a priori identification of classes, suggesting a structure of the data based on clusters. These clusters are further detailed and rationalized by PCA with biplot representation. The latter techniques allow the visualization of the data, and, thus, lead directly to observation of the most relevant patterns.
The procedure based on HCA and PCA requires a description of the objects, i.e. points in Euclidean space. In this analysis, each mauveine sample corresponds to one of these points. Both sets (mauveine MV1-6 and historical samples) are described on the basis of the components related to the mauveine structures: C 24 , two C 25 isomers (C 25ab ), C 26 (mauveine A), four C 27 (mauveine B) isomers (C 27 , C 27 B2 and C 27 B3 + B4), C 28 (mauveine C) and its C1 isomer, and C 29 (mauveine D). The data set contains information on 13 analyzed samples: 6 synthesized samples and 7 historical samples obtained from the Science Museum, London, UK (ScM1-4), the Museum of Science and Industry, Manchester, UK (MS1M1-2), and the Chandler Museum, Arizona, USA (CM). More details on the data normalization, data structure and dimensionality reduction procedure can be found in the Electronic Supplementary Information section.
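To make the idea of treating each sample as a point in Euclidean space concrete, a minimal Python sketch of agglomerative hierarchical clustering is given below. It is illustrative only: the linkage rule (average linkage) is an assumption, since the linkage and normalization actually used are described in the ESI and not here; no normalization is applied; and the sample labels and three-component profiles are invented placeholders, not measured data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hca_average(samples):
    """Agglomerative clustering with average linkage (assumed here).

    `samples` maps a label to its profile (list of floats).
    Returns the merge history as (members_a, members_b, distance)
    tuples, which is the information a dendrogram displays.
    """
    clusters = [(label,) for label in samples]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                total = sum(
                    euclidean(samples[a], samples[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical profiles (three made-up components per sample).
profiles = {
    "A": [30.0, 30.0, 2.0],
    "B": [28.0, 32.0, 3.0],
    "C": [5.0, 2.0, 60.0],
    "D": [10.0, 40.0, 1.0],
}

merges = hca_average(profiles)
for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 2))
```

Cutting the merge history at a chosen distance yields the groups, in the same way that the five groups are read off the dendrogram at a distance of 30 in Fig. 3.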