Introduction

Per- and polyfluoroalkyl substances (PFASs) are a large class of hazardous pollutants that are widespread in the environment and have raised growing concern over the last few decades1,2. PFASs such as perfluorooctanoic acid (PFOA) and perfluorooctanesulfonic acid (PFOS) have been shown to be persistent, bioaccumulative and toxic3,4, and pose serious environmental threats worldwide5,6. To date, over 10,000 PFASs have been documented in various lists7,8,9, but the chemical identities of many remain unclear and only a small fraction of them have been found in the environment3. Besides known PFAS pollutants, numerous unknown PFASs are potentially present in environmental and biological matrices such as river water, surface water, and marine mammals10,11,12. Given the global concern over PFAS pollution, it is necessary to screen, identify and comprehensively characterize PFASs in the environment, including well-known, little-known, and unknown ones.

Over the past decade, nontarget and quasi-target analyses have been applied to the screening and identification of environmental pollutants such as current-use chemicals13 and halogenated organic pollutants14,15, which can mitigate the problem posed by the lack of reference standards16,17. Owing to recent advances in chromatography and high-resolution mass spectrometry (HRMS), numerous nontarget and suspect screening methods for environmental pollutants have been developed rapidly18,19,20,21,22. The analytical techniques applied in nontarget and suspect screening mainly include liquid chromatography coupled with quadrupole-Orbitrap HRMS (LC-Q-Orbitrap-HRMS)23,24,25,26,27, gas chromatography coupled with Q-Orbitrap-HRMS (GC-Q-Orbitrap-HRMS)28,29, LC coupled with quadrupole time-of-flight MS (LC-QTOF-MS)30,31,32, GC-QTOF-MS33,34, and Fourier transform ion cyclotron resonance HRMS (FT-ICR-HRMS)35,36. Nontarget and suspect analyses have also been applied to PFASs, and more than 1000 previously unrecognized PFASs have recently been identified in various matrices37, e.g., commercial products1,38,39, environmental matrices40,41,42 and biological samples2,43,44. These studies mainly used LC-Q-Orbitrap-HRMS37,45,46,47 and LC-QTOF-MS39,40,48, with full scan37,49, data-dependent acquisition (DDA)49 or data-independent acquisition (DIA)9,39,46 detection modes. Characteristic fragment ions, e.g., C2F5, C3F5, SO3F, PO3, HSO4 and NSO2, were used to trace quasi-molecular ions and thus identify PFASs37,45. In particular, Liu et al.45 developed and successfully applied an innovative nontarget analysis strategy based on in-source fragmentation flagging, demonstrating that diagnostic fragment ions generated during in-source fragmentation can be used to flag and identify PFASs.
It can therefore be inferred that other in-source fragmentation features of PFASs, such as neutral losses, which have been observed previously45, could facilitate nontarget identification of these substances. In addition, in-source neutral losses may be more compound-specific than neutral losses occurring in collision-induced dissociation (CID), since fragmentation generally occurs less readily in the electrospray ionization (ESI) source than in the CID cell. However, no study has so far used in-source neutral losses to trace and screen quasi-molecular ions of PFASs for identifying these compounds.
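The in-source neutral-loss idea can be illustrated with a pairwise mass-difference search within one full-scan spectrum. The sketch below is a minimal illustration, not the in-house algorithm of this study. It assumes that peaks are (m/z, intensity) tuples from a single scan of co-eluting ions, that all ions are singly charged, that the loss is CF2O (the PFCA neutral loss used for identification in this work) at its monoisotopic mass, and that a 5 ppm tolerance applies. The function name `flag_neutral_loss_pairs` and the tolerance are hypothetical.

```python
# Monoisotopic masses (u) of the elements in the neutral loss.
C, O, F = 12.0, 15.99491462, 18.99840316
CF2O_LOSS = C + 2 * F + O  # about 65.99172 u


def flag_neutral_loss_pairs(peaks, loss=CF2O_LOSS, tol_ppm=5.0):
    """Return (precursor_mz, fragment_mz) pairs whose m/z difference matches
    `loss` within `tol_ppm` (relative to the precursor m/z).

    `peaks` is a list of (mz, intensity) tuples from one scan, so every pair
    found is a co-eluting candidate precursor/in-source fragment pair.
    Assumes singly charged ions (z = 1).
    """
    mzs = sorted(mz for mz, _ in peaks)
    pairs = []
    for i, fragment in enumerate(mzs):
        for precursor in mzs[i + 1:]:
            delta = precursor - fragment
            if abs(delta - loss) <= precursor * tol_ppm * 1e-6:
                pairs.append((precursor, fragment))
    return pairs


# Usage: the fragment m/z is computed from the loss, not a measured value.
peaks = [(412.96643, 1.0e6), (412.96643 - CF2O_LOSS, 2.0e5), (368.97611, 5.0e5)]
print(flag_neutral_loss_pairs(peaks))
```

A quadratic scan is adequate for one spectrum. For whole datasets, sorting once and using a binary search (for example with Python's `bisect`) for `mz + loss` would scale better.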

The LC-HRMS techniques used in nontarget analysis, such as LC-Q-Orbitrap-HRMS and LC-QTOF-MS, are full-information scanning techniques that usually generate a large dataset for each sample, resulting in a laborious and cumbersome data-processing workload50,51,52,53. As a result, screening and identifying compounds of concern in nontarget analysis with these techniques is akin to “finding needles in a haystack”54, and requires computer programs for data processing, e.g., data mining and analysis24,25,52,55. However, few scripting approaches have so far been developed for nontarget analysis of PFASs in environmental matrices using LC-HRMS40. Because in-source neutral losses of PFASs can occur in ESI-MS and produce constant mass differences between precursor and fragment ions, this property could readily be exploited by algorithms that search for PFAS features in LC-HRMS data. In addition, specific carbon and sulfur isotopologue distributions can be used in algorithms for screening and identifying PFASs, as previous studies have already used isotopic patterns to aid nontarget identification of these compounds39,40,45.
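The isotopologue criterion can also be sketched. Fluorine is monoisotopic, so the M+1 peak of a PFAS ion is dominated by 13C, and its intensity ratio to the monoisotopic peak gives a rough carbon count. A 34S peak at M+2 points to a sulfonic acid. This is a minimal sketch under stated assumptions: representative natural abundances of 1.07% 13C, 4.25% 34S and 94.99% 32S; a 5 ppm window; and contributions from 17O, 18O, 33S and 13C2 are neglected. The helper names and the sulfur threshold (half the expected one-sulfur ratio) are hypothetical, not the thresholds used in this study.

```python
C13_SHIFT = 1.0033548        # m(13C) - m(12C), u
S34_SHIFT = 1.9957959        # m(34S) - m(32S), u
C13_RATIO = 0.0107 / 0.9893  # 13C/12C abundance ratio per carbon atom
S34_RATIO = 0.0425 / 0.9499  # 34S/32S abundance ratio per sulfur atom


def find_peak(peaks, target_mz, tol_ppm=5.0):
    """Intensity of the strongest peak within tol_ppm of target_mz, else 0."""
    tol = target_mz * tol_ppm * 1e-6
    hits = [inten for mz, inten in peaks if abs(mz - target_mz) <= tol]
    return max(hits, default=0.0)


def isotope_check(peaks, mono_mz, tol_ppm=5.0):
    """Estimate the carbon number from the 13C (M+1) peak and flag a probable
    sulfur atom from the 34S (M+2) peak. Returns None if no monoisotopic peak."""
    i0 = find_peak(peaks, mono_mz, tol_ppm)
    if i0 == 0:
        return None
    i1 = find_peak(peaks, mono_mz + C13_SHIFT, tol_ppm)
    i2 = find_peak(peaks, mono_mz + S34_SHIFT, tol_ppm)
    n_carbon = round((i1 / i0) / C13_RATIO)
    has_sulfur = (i2 / i0) >= 0.5 * S34_RATIO  # hypothetical threshold
    return n_carbon, has_sulfur


# Usage with a synthetic PFOS [M-H]- pattern built from the abundances above.
mono = 498.9302
peaks = [(mono, 1.0e6),
         (mono + C13_SHIFT, 8 * C13_RATIO * 1.0e6),
         (mono + S34_SHIFT, S34_RATIO * 1.0e6)]
print(isotope_check(peaks, mono))  # (8, True)
```

At the Orbitrap resolutions used here, the 34S isotopologue is separated from the 13C2 and 18O isotopologues at nominal M+2, which is why a narrow ppm window can isolate it.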

With the rapid development of nontarget analysis, comprehensive characterization (i.e., finding and profiling all relevant compounds) of certain groups of compounds has recently been achieved56,57,58, which can effectively depict the pollution signatures of these pollutants from an overall perspective. To date, most reported studies on comprehensive characterization of environmental pollutants have focused on a limited number of compound groups, such as hydrocarbons59, halogenated dioxins60, and atmospheric brown carbon components61. However, studies focusing on comprehensive characterization of PFAS pollutants in environmental matrices or other media are still scarce62.

Hence, in this work, we systematically conducted nontarget, quasi-target and target analyses of PFASs in fluorinated-industrial wastewater, and comprehensively characterized their chemical components and distributions based on the results of identification, quantification and semiquantification. The nontarget analysis, an essential part of this work, was implemented by LC-Q-Orbitrap-HRMS with the aid of in-house algorithms exploiting characteristic in-source neutral losses and specific carbon and sulfur isotopologue distributions of PFASs. The identifications derived from the nontarget and quasi-target analyses were confirmed by DDA and DIA mass spectra. A large number of PFAS formulae were identified and assigned tentative or exact chemical structures, and a much larger number of congeners, including isomers, were found. The components and distributions of PFASs in the wastewater were then characterized comprehensively. This study presents an integrated method for accurate and highly efficient identification of both known and unknown PFASs in complex environmental water samples, and the comprehensive analysis provides an overview of the PFAS signature in the fluorinated-industrial wastewater.

Results and discussion

Method performances

Quantitative target analysis

In this study, quantitative target analysis was performed by LC-MS/MS in multiple reaction monitoring (MRM) mode using the native standards of 21 PFASs (Supplementary Table 1). The accuracies of all the standards in all quality control (QC) samples were in the range of 85.4–114.9%, with relative standard deviations (RSDs) of 0.9–14.7% (Supplementary Table 1), indicating satisfactory accuracy and precision of the target analysis method. Recovery and matrix effects were evaluated with 8 extraction internal standards (EISs). As documented in Supplementary Table 2, the absolute recoveries were 43.5–139.9% with RSDs of 1.0–17.9%, and the matrix effects were 56.8–120.0% with RSDs of 0.5–21.1%. The matrix effects in TE (the total effluent wastewater) were not necessarily better than those in the other samples, even though the TE wastewater had been treated by the subsidiary wastewater treatment plant before sampling. The limits of quantification (LOQs) of the individual targeted PFASs were 0.1 ng mL−1, providing sufficient sensitivity for the quantitative analysis. In the wastewater samples, 8 PFCAs were quantified (Supplementary Fig. 1), with concentrations from 0.2 ng mL−1 to 23.8 μg mL−1 (Supplementary Table 3). The quantified PFCAs generally showed higher concentrations in Post-RO (the wastewater after reverse osmosis treatment) than in Pre-RO (the wastewater before reverse osmosis treatment), which is consistent with reverse osmosis treatment acting as a concentration process for solutes.
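The QC figures above rest on simple ratios. The following is a minimal sketch assuming the common definitions: accuracy as measured/nominal concentration × 100, RSD as sample standard deviation over mean × 100, and matrix effect as the labeled-standard response in the sample matrix relative to that in neat solvent × 100. The text does not spell out its formulas, and the ±15% acceptance window in the hypothetical `passes_qc` helper is an assumption borrowed from common bioanalytical practice, not a criterion stated here.

```python
from statistics import mean, stdev


def accuracy_pct(measured, nominal):
    """Accuracy of each replicate as measured / nominal x 100."""
    return [100.0 * m / nominal for m in measured]


def rsd_pct(values):
    """Relative standard deviation: sample SD / mean x 100."""
    return 100.0 * stdev(values) / mean(values)


def matrix_effect_pct(area_in_matrix, area_in_solvent):
    """Response of a labeled standard in the sample matrix relative to neat
    solvent (100% = no suppression or enhancement; <100% = suppression)."""
    return 100.0 * area_in_matrix / area_in_solvent


def passes_qc(measured, nominal, window=15.0):
    """True if every replicate is within 100 +/- window % of nominal and the
    replicate RSD does not exceed `window` % (assumed acceptance rule)."""
    acc = accuracy_pct(measured, nominal)
    return all(abs(a - 100.0) <= window for a in acc) and rsd_pct(measured) <= window


# Usage with illustrative replicate values at the 0.25 ng/mL LQC level.
print(passes_qc([0.24, 0.26, 0.25], 0.25))
```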

Nontarget analysis

The nontarget analysis of PFASs was conducted by LC-HRMS with the aid of the developed data-processing algorithms. Its performance was evaluated using standard solutions, as well as the native PFCAs and labeled PFCA standards in the wastewater samples. As shown in Supplementary Table 4, all the native standards in the 100 ng mL−1 calibration sample could be filtered and identified with the nontarget analysis method. In addition, some 13C1- and 13C2-substituted PFASs, i.e., naturally occurring carbon isotopologues of the native standards, were found, and some 13C3- and 13C4-labeled standards were also screened and identified. These results demonstrate that the nontarget analysis method developed in this study is reliable and efficient.
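Checking whether the nontarget output recovered a standard and its 13C isotopologues reduces to matching predicted m/z values. The following is a minimal sketch, assuming singly charged ions, a 5 ppm tolerance, and the PFOA [M−H]− ion at m/z 412.9664 as an example. A 13C4-labeled standard and the naturally occurring 13C4 isotopologue have the same elemental and isotopic composition and hence the same m/z, so they are told apart by intensity rather than mass. The helper names and the observed m/z values in the usage line are hypothetical.

```python
C13_SHIFT = 1.0033548  # m(13C) - m(12C), u


def isotopologue_mzs(mono_mz, max_13c=4):
    """Predicted m/z of the all-12C ion and its 13C1...13Cn counterparts
    (singly charged ions assumed)."""
    return {n: mono_mz + n * C13_SHIFT for n in range(max_13c + 1)}


def match_targets(targets, observed_mzs, tol_ppm=5.0):
    """For each predicted isotopologue, report whether any observed m/z falls
    within the ppm tolerance."""
    return {
        n: any(abs(mz - target) <= target * tol_ppm * 1e-6 for mz in observed_mzs)
        for n, target in targets.items()
    }


# Usage: PFOA [M-H]- with hypothetical observed m/z values.
targets = isotopologue_mzs(412.9664)
print(match_targets(targets, [412.9665, 413.9698, 416.9798]))
```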

In addition to standard solutions, the nontarget analysis method was validated with the wastewater samples using the PFCAs quantified by LC-MS/MS target analysis, their 13C1- and 13C2-substituted isotopologues, and the spiked 13C3- and 13C4-labeled internal standards. As listed in Supplementary Table 5, 6 of the 8 PFCAs detected by LC-MS/MS were screened and identified by the nontarget analysis method, along with some of their 13C1- and 13C2-substituted isotopologues. The 13C3- and 13C4-labeled internal standards were also screened and identified. These results further confirm the reliability of the nontarget analysis method.

The features of PFASs in the wastewater samples

Overview of total PFASs

In total, 175 PFAS formulae were identified in the wastewater (Fig. 1), of which 119, 79, and 8 were identified by nontarget analysis, quasi-target analysis and target analysis, respectively (Fig. 1a). Twenty-five formulae were found by both nontarget and quasi-target analyses, and 6 by both nontarget and target analyses. This result indicates that nontarget analysis found most of the PFASs, and that quasi-target analysis is a useful complementary approach for identifying PFASs missed by nontarget analysis. In TE, Pre-RO and Post-RO, 107, 151 and 159 PFAS formulae were found, respectively; 101 of them were found in all the samples, and a further 44 in both Pre-RO and Post-RO (Fig. 1b), suggesting good reproducibility and reliability of the identification approaches. Of all the PFAS formulae, 120 were PFCAs and 55 were PFSAs, accounting for 69% and 31% of the total, respectively, showing that PFCAs were the predominant and more diverse PFASs (Fig. 1c). As illustrated in Fig. 1d, the total concentrations of all the PFASs found in TE, Pre-RO and Post-RO were as high as 5.3, 15.9 and 33.4 μg mL−1, respectively, indicating serious PFAS contamination in all the wastewater samples. Both the number of PFAS species and the total concentrations followed the order TE < Pre-RO < Post-RO (Fig. 1b, d). It is reasonable that Post-RO had more PFAS species and higher concentrations than Pre-RO, because the RO process concentrated the PFASs and higher concentrations improved detectability. TE had fewer PFAS species and lower concentrations than the other samples, implying that the subsidiary wastewater treatment plant removed part of the PFASs from the wastewater.
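The method overlaps in Fig. 1a can be cross-checked by summing the disjoint Venn regions given in the Fig. 1 caption. This minimal sketch assumes, as the caption implies, that no formula was found by all three methods and none by QTA and TA only.

```python
# Disjoint Venn-region counts from the Fig. 1a caption.
ONLY = {"NTA": 88, "QTA": 54, "TA": 2}
SHARED = {("NTA", "QTA"): 25, ("NTA", "TA"): 6}


def per_method_totals(only, shared):
    """Total formulae per method = exclusive region + every region it shares."""
    totals = dict(only)
    for methods, n in shared.items():
        for method in methods:
            totals[method] += n
    return totals


def union_size(only, shared):
    """Size of the union: each disjoint region is counted exactly once."""
    return sum(only.values()) + sum(shared.values())


print(per_method_totals(ONLY, SHARED), union_size(ONLY, SHARED))
# {'NTA': 119, 'QTA': 79, 'TA': 8} 175
print(round(100 * 120 / 175))  # PFCA share of all formulae, in %: 69
```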

Fig. 1: Outline of the PFASs identified in the wastewater.

a Formula numbers found by nontarget analysis (NTA), quasi-target analysis (QTA) and target analysis (TA); 88, 54 and 2 formulae were exclusively found by NTA, QTA and TA, respectively; 25 formulae were identified by both NTA and QTA; 6 formulae were found by both NTA and TA. b Formula numbers found in the three samples, i.e., total effluent (TE), before reverse osmosis treatment (Pre-RO), and after reverse osmosis treatment (Post-RO); 101 formulae were found in all the wastewater samples; 44 formulae were found in both Pre-RO and Post-RO; 1 formula was found in both TE and Pre-RO; 1 formula was found in both TE and Post-RO; 4, 5 and 13 formulae were exclusively found in TE, Pre-RO and Post-RO, respectively. c Formula numbers and proportions of the identified per- and polyfluorocarboxylic acids (PFCAs) and per- and polyfluoroalkanesulfonic acids (PFSAs); 120 formulae were PFCAs, and 55 were PFSAs. d Total concentrations of all the identified PFASs in individual wastewater samples.

Formula characteristics

The identified PFAS formulae were mainly in the mass range of 100–600 u (Supplementary Fig. 2), indicating that the PFASs in the wastewater were mainly low to medium molecular weight compounds (≤600 u). The mass ranges of 150–400, 150–450, and 150–450 u contained the most formulae in TE, Pre-RO and Post-RO, respectively. The formula numbers were approximately normally distributed over molecular weights of 100–650 u. The formula number distributions in Pre-RO and Post-RO were fairly consistent (Supplementary Fig. 2), whereas that in TE differed from the others. This observation supports a common origin of Pre-RO and Post-RO, and implies that the TE wastewater contained the Pre-RO/Post-RO wastewater together with other wastewater of different PFAS composition.

CF2-normalized Kendrick masses were plotted against adjusted Kendrick mass defects (AKMD) for all detected ions used for PFAS screening (Fig. 2a) and for all the identified PFASs (Fig. 2b). As illustrated in Fig. 2a, the masses of the measured ions were mainly in the range of 100–1000 u, and the AKMD values were distributed over 0.4–1.5. Notably, a large number of measured ions fell within the AKMD range of 0.9–1.1, which is characteristic of PFASs, and may therefore encompass many PFASs. Because so many ions fell within this PFAS-characteristic range, screening and identifying PFASs was challenging, and identification could not rely simply on formula assignment for quasi-molecular ions. As shown in Fig. 2b, the CF2-normalized Kendrick masses of the identified PFASs ranged from 100 to 760 u, and their AKMD values fell within 0.94–1.06, with most in a relatively narrow range of 0.98–1.04. These results suggest that the identification outcomes are reasonable and reliable in light of the mass defect feature.
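The CF2-normalized Kendrick mass rescales m/z so that a CF2 unit weighs exactly 50, which makes members of a CF2 homologous series share one mass defect. The sketch below computes only the CF2-normalized Kendrick mass and the conventional Kendrick mass defect (nominal minus exact Kendrick mass). The AKMD plotted in Fig. 2 follows Liu et al.45 and Supplementary Equations 1–4, which are not reproduced here. The [M−H]− masses are computed from monoisotopic element masses plus the electron mass, and the helper names are hypothetical.

```python
# Monoisotopic masses (u) and the electron mass.
C, O, F = 12.0, 15.99491462, 18.99840316
ELECTRON = 0.00054858
CF2 = C + 2 * F  # exact mass of the CF2 repeat unit, about 49.99681 u


def anion_mz(n_c, n_f, n_o):
    """m/z of a CxFyOz- anion (e.g. a deprotonated perfluorocarboxylic acid)."""
    return n_c * C + n_f * F + n_o * O + ELECTRON


def kendrick_mass_cf2(mz):
    """Rescale m/z so that one CF2 unit weighs exactly 50."""
    return mz * 50.0 / CF2


def kendrick_mass_defect(mz):
    """Conventional KMD: nominal (rounded) Kendrick mass minus Kendrick mass."""
    km = kendrick_mass_cf2(mz)
    return round(km) - km


pfhxa = anion_mz(6, 11, 2)  # [M-H]- of PFHxA, C6F11O2-
pfoa = anion_mz(8, 15, 2)   # [M-H]- of PFOA, C8F15O2-
print(round(pfoa, 4), kendrick_mass_defect(pfhxa), kendrick_mass_defect(pfoa))
```

PFHxA and PFOA differ by C2F4 (two CF2 units), so their Kendrick mass defects coincide. Homologues therefore line up horizontally in a CF2-normalized Kendrick plot.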

Fig. 2: Plots of CF2-normalized Kendrick mass vs. adjusted Kendrick mass defect (AKMD) of all detected ions and the identified PFASs, and van Krevelen diagrams of the PFASs in the wastewater samples.

a CF2-normalized Kendrick mass vs. AKMD of all detected ions. b CF2-normalized Kendrick mass vs. AKMD of the identified PFASs. c O/C vs. F/C of the identified PFASs. d O/C vs. (H+F)/C of the identified PFASs. The CF2-normalized Kendrick masses and AKMD were calculated following Liu et al.45; the procedures are also provided in the Supplementary Information (Supplementary Equations 1–4).

In addition to the AKMD plots, van Krevelen diagrams of O/C vs. F/C (Fig. 2c) and O/C vs. (H + F)/C (Fig. 2d) were plotted for the identified PFASs. The van Krevelen diagrams clearly show characteristic patterns of the PFASs, indicating different groups of PFAS species. The O/C vs. (H + F)/C diagram (Fig. 2d) contains fewer distinct points than the O/C vs. F/C diagram (Fig. 2c): replacing a fluorine atom with hydrogen changes F/C but leaves (H + F)/C and O/C unchanged, so hydrogen-substituted species (H-PFASs) overlap with their perfluorinated analogues in Fig. 2d. This indicates that many of the identified PFASs were H-PFASs. As shown in Fig. 2c, d, the PFCA and PFSA groups could be clearly recognized through their characteristic patterns in both diagrams. These observations also support the accuracy and reliability of the nontarget and quasi-target analyses in this work.
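The van Krevelen coordinates are plain atomic ratios. This minimal sketch parses simple formula strings (element symbols followed by optional counts, with no brackets or charges). It shows why an H-for-F substitution collapses onto its perfluorinated analogue in the O/C vs. (H + F)/C plot but not in the O/C vs. F/C plot. It assumes neutral formulae, since whether the acidic hydrogen is counted is a convention the text does not state. C8H2F14O2 serves only as a hypothetical H-substituted analogue of PFOA.

```python
import re


def parse_formula(formula):
    """Parse a simple formula such as 'C8HF15O2' or 'C2HO2ClF2' into counts."""
    counts = {}
    for element, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(num) if num else 1)
    return counts


def van_krevelen(formula):
    """Return (O/C, F/C, (H+F)/C) for one formula."""
    n = parse_formula(formula)
    c = n["C"]
    h, f, o = n.get("H", 0), n.get("F", 0), n.get("O", 0)
    return o / c, f / c, (h + f) / c


pfoa = van_krevelen("C8HF15O2")         # PFOA
h_analogue = van_krevelen("C8H2F14O2")  # hypothetical H-substituted analogue
print(pfoa, h_analogue)  # (0.25, 1.875, 2.0) (0.25, 1.75, 2.0)
```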

PFAS distributions vs. carbon, hydrogen, and oxygen numbers

The identified PFASs contain 2–16 carbon atoms (Supplementary Fig. 3). The predominant PFASs contain 2–9 carbon atoms, and the most abundant are those containing 3, 4, 6, and 8 carbon atoms (Supplementary Fig. 3a). C8-PFASs were the most abundant in all the samples, with concentrations of 2.6, 12.8, and 24.4 μg mL−1 in TE, Pre-RO and Post-RO, respectively. This is attributable to the high concentrations of n-PFOA in the wastewater (1.7–23.8 μg mL−1), which accounted for 31.7–71.2% of ∑PFASs (i.e., all the PFASs found in each wastewater sample), suggesting that the chemical industry park may mainly use n-PFOA-containing fluorinated materials. In contrast, PFASs with larger carbon numbers (10–16) showed much lower concentrations (0.8–20.4 ng mL−1) than those with fewer carbon atoms. In addition, the PFASs in TE contained only 2–10 carbon atoms. The identified formulae mainly contain 4–9 carbon atoms, and the three groups with the most formulae are those with 6–8 carbon atoms (Supplementary Fig. 3b), with 58–83 formulae per sample, accounting for 52.2–54.2% of the total PFAS formulae. C8-containing PFASs had the most formulae, with 25, 33, and 32 in TE, Pre-RO and Post-RO, respectively. PFASs with 2 and 10–16 carbon atoms had markedly fewer formulae than the others. The distributions of congener numbers against carbon number were fairly consistent with the formula distributions (Supplementary Fig. 3b, c). The C4- to C10-PFASs possessed the most congeners, with C8-PFASs having the largest number (Supplementary Fig. 3b, c), whereas PFASs containing 2, 3, and 11–16 carbon atoms had markedly fewer congeners.
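Profiles like those in Supplementary Figs. 3–5 amount to grouping identified formulae by an element count and summing their concentrations. The following is a minimal sketch, assuming each identification is a (formula, concentration) record with a simple formula string. `profile_by_element` is a hypothetical helper, and the records in the usage lines are toy values, not data from this study.

```python
import re
from collections import defaultdict


def element_count(formula, element):
    """Number of atoms of `element` in a simple formula string (0 if absent)."""
    total = 0
    for symbol, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol == element:
            total += int(num) if num else 1
    return total


def profile_by_element(records, element):
    """Group (formula, concentration) records by the number of `element` atoms.

    Returns {count: (n_formulae, summed_concentration, percent_of_total)}.
    """
    n_formulae = defaultdict(int)
    conc = defaultdict(float)
    for formula, c in records:
        k = element_count(formula, element)
        n_formulae[k] += 1
        conc[k] += c
    total = sum(conc.values())
    return {k: (n_formulae[k], conc[k], 100.0 * conc[k] / total) for k in sorted(conc)}


# Usage with toy records (formula, concentration in ng/mL).
records = [("C8HF15O2", 20.0), ("C6HF11O2", 0.7), ("C8HF17O3S", 0.3)]
for n_c, (n_form, total_conc, pct) in profile_by_element(records, "C").items():
    print(n_c, n_form, round(total_conc, 2), round(pct, 1))
```

Calling the same helper with "H" or "O" gives the hydrogen- and oxygen-number profiles discussed below.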

The identified PFASs contained 0–17 hydrogen atoms (Supplementary Fig. 4). As can be seen in Supplementary Fig. 4a, the predominant PFASs were those containing 0, 1, and 3–5 hydrogen atoms. The perfluorinated species (hydrogen number = 0) had the highest concentrations among all the PFASs: 4.0, 13.7, and 26.7 μg mL−1 in TE, Pre-RO and Post-RO, respectively, accounting for 76.2–86.2% of the total PFAS concentrations. PFASs with 2, 6, and 9–17 hydrogen atoms were much less abundant than the others. In TE, only PFASs containing 0–11 hydrogen atoms were found. Interestingly, no PFASs containing 8, 10, 12, 14, or 16 hydrogen atoms were found, which may be related to PFAS transformation pathways. The PFAS abundance distributions in Pre-RO and Post-RO were relatively consistent, but differed from that in TE. The formula numbers of the identified PFASs decreased as the hydrogen number increased from 0 to 17 (Supplementary Fig. 4b). The non-hydrogen-substituted PFASs encompassed the most formulae, with 53–63 formulae per sample, accounting for 39.1–49.5% of the total. Notably, the formula number distributions against hydrogen number were fairly consistent across the three wastewater samples. PFASs with 6–17 hydrogen atoms had markedly fewer formulae than those with other hydrogen numbers. Similar to the formula numbers, the congener numbers generally decreased as the hydrogen number increased from 0 to 17 (Supplementary Fig. 4c). PFASs containing 0–2 hydrogen atoms possessed the most congeners, in total accounting for 68.2–80.5% of the ∑PFASs.

The identified PFASs contained 2–8 oxygen atoms (Supplementary Fig. 5). As shown in Supplementary Fig. 5a, PFASs with two oxygen atoms, i.e., PFCAs containing only two oxygen atoms, were predominant, with concentrations of 4.7–31.2 μg mL−1, accounting for 88.7–94.8% of the total PFAS concentrations in the wastewater. PFASs with four and three oxygen atoms were the second and third most abundant, with similar concentration ranges of 184.2–370.4 and 144.3–316.0 ng mL−1, respectively, accounting for 0.9–3.9% of the total PFAS concentrations. In general, concentrations decreased with increasing oxygen number, and PFASs with 5–8 oxygen atoms had much lower relative abundances than those with 2–4 oxygen atoms, together accounting for 0.1–0.7% of the total PFAS concentrations. With respect to formula numbers, PFASs with 2–4 oxygen atoms possessed the most formulae, making up 83.6–88.8% of the total (Supplementary Fig. 5b). The formula numbers generally decreased with increasing oxygen number. The congener number distributions against oxygen number were roughly similar to the formula number distributions, showing a gradual decrease from 2 to 8 oxygen atoms (Supplementary Fig. 5c).

Distribution signatures of PFCAs and PFSAs

As shown in Fig. 3a, PFCAs were the predominant PFASs, with concentrations overwhelmingly higher than those of PFSAs, accounting for 95.5–96.4% of the ∑PFASs in the wastewater. The concentrations of PFCAs and PFSAs followed the same order across the samples, TE < Pre-RO < Post-RO. With respect to formula numbers, the identified PFCAs had many more formulae than PFSAs (Fig. 3b), making up 66.9–82.2% of the total. The formula numbers of PFCAs and PFSAs in the three samples also followed the order TE < Pre-RO < Post-RO. The congener distributions of PFCAs and PFSAs were similar to the formula distributions, i.e., PFCAs had many more congeners than PFSAs, accounting for 73.0–83.3% of the total (Fig. 3c), and the congener numbers of both groups followed the same order of TE < Pre-RO < Post-RO. These results indicate that the PFASs in the wastewater were mainly PFCAs in terms of abundance, formula number and congener number, suggesting that the fluorinated materials used in the chemical industry park mainly contained PFCAs.

Fig. 3: Distributions of the ΣPFCAs and ΣPFSAs identified in the wastewater samples.

a Distribution of concentrations. b Distribution of formula numbers. c Distribution of congener numbers.

Isomer distributions of representative PFASs

A large number of PFAS congeners, including isomers of individual formulae, were found in the wastewater. Generally, several isomers were observed for individual formulae, particularly for those with relatively large carbon numbers (Supplementary Tables 6–8). For instance, as illustrated in Supplementary Fig. 6, PFHxA and PFOA presented four and five isomers in the wastewater, respectively, which differed only in carbon skeleton, i.e., normal (n-) vs. branched carbon chains. n-PFHxA and n-PFOA showed longer retention times than their respective branched isomers on the C18 column (Supplementary Fig. 6). In this work, we measured the isomeric concentrations of typical PFASs by quantitative and semiquantitative analyses (Supplementary Table 8), and the isomeric concentration distributions of some representative perfluorinated species are illustrated in Fig. 4 and Supplementary Fig. 7. In general, six representative n-PFCAs were much more abundant than their respective branched isomers. Specifically, the concentrations of n-PFHxA and n-PFOA were 172.7–717.1 ng mL−1 (Fig. 4a) and 1.7–23.8 μg mL−1 (Fig. 4b), respectively, accounting for 90.4–92.2% and 65.9–75.0% of the total concentrations of PFHxA and PFOA (Fig. 4). Interestingly, for PFHxA and PFOA, the relative abundances of the isomers gradually increased with retention time (Supplementary Fig. 6), and the abundance distributions in the three samples were similar, implying common sources of PFHxA and PFOA in the samples (Fig. 4 and Supplementary Fig. 7). In addition, except for PFPeA, all the representative PFCAs showed similar isomeric abundance distributions in the three samples, suggesting that many PFASs in the samples may originate from the same sources (Fig. 4 and Supplementary Fig. 7). This observation indicates that isomeric abundance distributions of PFASs may be useful for identifying their sources.
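Isomer profiles can be compared quantitatively once each isomer is expressed as a fraction of the total for its formula. The sketch below illustrates the idea and is not the authors' procedure. It normalizes isomer concentrations listed in retention-time order, so on the C18 column used here the last entry is the n-isomer. It then scores two samples' profiles by cosine similarity, a swapped-in measure that the study itself does not use. Input values in the usage lines are hypothetical.

```python
from math import sqrt


def isomer_fractions(concs):
    """Normalize isomer concentrations (retention-time order) to fractions."""
    total = sum(concs)
    return [c / total for c in concs]


def cosine_similarity(a, b):
    """Cosine similarity of two isomer profiles listing the same isomers in the
    same order; 1.0 means an identical relative shape."""
    if len(a) != len(b):
        raise ValueError("profiles must list the same isomers")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


# Hypothetical five-isomer profiles (ng/mL) of one formula in two samples.
sample_1 = isomer_fractions([5.0, 10.0, 20.0, 40.0, 300.0])
sample_2 = isomer_fractions([10.0, 20.0, 40.0, 80.0, 600.0])
print(round(sample_1[-1], 3), cosine_similarity(sample_1, sample_2))
```

Because the second profile is a scaled copy of the first, its fractions and shape are identical. Differing sources would be expected to lower the similarity score.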

Fig. 4: Isomeric concentration distributions of two typical PFASs detected in the wastewater samples.

a Perfluorohexanoic acid (PFHxA). b Perfluorooctanoic acid (PFOA).

Structural elucidation for PFASs

With the data-processing algorithms developed in this study, most F-containing carboxylic and sulfonic acids could be screened and identified; consequently, every identified PFAS possesses at least one carboxyl or sulfonic acid group. Because the in-source neutral loss of CF2O from PFCAs and the characteristic FSO3 ion of PFSAs were used in the identification process, the carbon bonded to the carboxyl group should carry at least two fluorine atoms, and the carbon adjacent to the sulfonic acid group at least one. Using reference standards, we unambiguously identified the structures of eight PFCAs (Supplementary Table 3). In addition, the structures of some PFASs whose formulae admit only one possible structure were also unambiguously elucidated, such as chlorodifluoroacetic acid (C2O2ClF2), trifluoroacetic acid (C2O2F3), pentafluoroethanesulfonic acid (C2O3F5S), difluoro-hydroxy-acetyl fluoride (C2O4F3S), 3,3,3-trifluoropropanoic acid (C3H2O2F3), fluoropropanate (C3HO2F4), 3-chlorotetrafluoropropionic acid (C3O2ClF4), perfluoropropionic acid (C3O2F5), difluoro(trifluoromethoxy)acetic acid (C3O3F5) and carbonic acid mono-(difluoro-trifluoromethoxy-methyl) ester (C3O4F5) (Supplementary Table 9). Probable structures could be proposed for a number of identified PFASs by searching their formulae in ChemSpider and PubChem; the presence of these PFASs in the databases indicates that they may have been detected and/or synthesized previously. Many other formulae could not be matched with rational structures, or had no relevant structure in the databases; these were assigned tentative structures based on their formulae, categories (PFCAs or PFSAs) and mass spectra. Such PFASs are likely unknown pollutants, even though their exact structures could not be elucidated. Some N-containing PFAS formulae could not be assigned putative structures because of the high complexity and large number of potential structures. Nonetheless, the categories of these PFASs could be ascertained, as their carboxyl/sulfonic acid groups and fluorine numbers could be specified (Supplementary Table 9).

Rarely reported and representative unknown PFASs

In this study, the eight quantified PFCAs and some perfluorinated short-chain PFASs (C2 and C3) are common species reported previously. These short-chain PFASs include trifluoroacetic acid (C2O2F3), pentafluoroethanesulfonic acid (C2O3F5S) and perfluoropropionic acid (C3O2F5). However, most of the identified PFASs contain other elements (e.g., H, Cl and N) beyond those of traditional perfluorinated species, or extra oxygen atoms. These PFASs are potentially little-known or unknown fluorinated pollutants in the environment, but it is challenging to determine their exact structures and difficult to ascertain whether they have been reported previously. In this work, PFAS formulae that could not be matched with rational structures in online chemical databases such as ChemSpider and PubChem, or for which no relevant information was available in these databases (Supplementary Table 9), were regarded as potentially unknown PFASs, even though their exact chemical structures remain undetermined. Using this criterion, 84 potentially unknown PFASs were identified, such as difluoro-fluorocarbonyl-methanesulfonic acid (C2O4F3S) and 2,2,3-trifluoropropanoic acid (C3H2O2F3), and their structures were tentatively proposed (Supplementary Table 9). These PFASs may be of high research significance and warrant further in-depth investigation.

Most importantly, three iodinated PFSAs (I-PFSAs), i.e., C6H6OF6I-SO3H (two isomers) and C6H4F8I-SO3H (Supplementary Tables 7, 9), were discovered in this work. To the best of our knowledge, I-PFSAs have not been reported previously, and they are not listed in any available database. As shown in Supplementary Table 7, the three I-PFSAs were found in both Pre-RO and Post-RO, with concentrations of 1.2–20.9 ng mL−1. The total concentrations of the three I-PFSAs in Pre-RO and Post-RO were 11.9 and 23.7 ng mL−1, respectively, accounting for 8.2% and 7.7% of the total concentrations of all the identified PFSAs in the respective samples. In particular, the isomers of C6H6OF6I-SO3H were two of the major PFSAs in the samples (Supplementary Table 7). Like some ubiquitous PFAS pollutants, these I-PFSAs may ultimately enter the environment. I-PFSAs may therefore be a non-negligible group of unknown PFASs in the environment that warrants further research and attention.

Implications and prospect

This study presents an integrated method for comprehensive characterization of PFAS pollutants in fluorinated-industrial wastewater that combines nontarget, quasi-target and target analyses using LC-Q-Orbitrap-HRMS and LC-MS/MS. Data-processing algorithms based on characteristic in-source neutral losses and isotopologue distributions were applied to screen and identify PFAS pollutants. Semiquantitative analysis based on the quantification results was employed to determine the concentrations and distributions of PFASs in the wastewater. In this way, the PFASs in the wastewater were comprehensively characterized. A large number of PFAS pollutants (175 formulae and >350 congeners), including traditional, rarely known and unknown species, were found, and their concentrations and distributions were determined. The total PFAS concentrations in the samples ranged from 5.3 to 33.4 μg mL−1, indicating heavy PFAS pollution in the wastewater. A number of potentially unknown PFASs (84 formulae) were identified, which merit further in-depth research. The present study not only provides a highly efficient screening and identification approach for PFASs, but also presents a practicable way to comprehensively delineate the pollution signatures of PFASs in the environment. The developed method, involving instrumental analysis and data-processing algorithms, can be extended to comprehensive analysis of PFAS pollutants in other matrices to reveal the full picture of PFASs in the environment. The identification, semiquantification and quantification outcomes provide crucial insights into environmental pollution caused by PFASs, especially rarely known and unknown PFASs. It will be worthwhile to study the rarely known and newly discovered PFASs in terms of exact structure elucidation, accurate quantification, environmental occurrence and health risk assessment.

Methods

Chemicals and materials

Three standard solutions, i.e., native perfluorinated compound (PFC) stock solution (PFAC-MXC, 21 PFASs), mass-labeled PFC EIS solution (MPFAC-C-ES, 13 13C-labeled PFASs), and mass-labeled PFC injection internal standard (IIS) solution (MPFAC-C-IS, 4 13C-labeled PFASs), were purchased from Wellington Laboratories Inc. (Ontario, Canada). Details of these standard solutions are provided in Supplementary Table 10. Methanol (MeOH) and acetonitrile (ACN) were of chromatographic grade and purchased from Merck Corp. (Darmstadt, Germany). Nylon filters (0.45 μm) were purchased from Jinteng Experiment Equipment Company (Tianjin, China). Chromatographic grade ammonium acetate (NH4Ac) was purchased from Thermo-Fisher Scientific Co., Ltd. (Hampton, NH, USA). Ultrapure water (electrical resistivity: 18.2 MΩ cm) was produced by a Millipore water purification apparatus (Millipore Corporation, Billerica, MA, USA).

Calibration and quality control (QC) working solutions were prepared by serial dilution of the PFAC-MXC solution using MeOH/H2O (1:1, v/v), with concentration ranges of 1–1000 ng mL−1 and 1–800 ng mL−1, respectively. Working solutions of EIS and IIS were prepared by diluting the MPFAC-C-ES and MPFAC-C-IS solutions, respectively, to 500 ng mL−1 with MeOH/H2O (1:1, v/v). Calibration and QC samples were prepared by 10-fold dilution of the corresponding working solutions with MeOH/H2O (1:4, v/v), followed by addition of 10 μL of the EIS working solution and 10 μL of the IIS working solution; the final volume of each sample was 1 mL. The concentrations of the calibration samples were 0.1, 0.5, 1, 5, 10, 50, and 100 ng mL−1 for all the analytes, and those of the QC samples were 0.1 (LLOQ), 0.25 (LQC), 5 (MQC), and 80 ng mL−1 (HQC). These solutions were stored in a freezer at −20 °C before use.
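The nominal concentrations follow directly from the dilution scheme. The following is a minimal check, assuming that the 10-fold dilution means 100 μL of working solution brought to the 1 mL final volume, and that the calibration working levels are 10 times the listed calibration levels (consistent with the stated 1–1000 ng mL−1 range). `final_conc` is a hypothetical helper.

```python
def final_conc(stock_conc, aliquot_ul, final_ul=1000.0):
    """Concentration after bringing `aliquot_ul` of a solution to `final_ul`."""
    return stock_conc * aliquot_ul / final_ul


# Calibration working solutions (ng/mL) spanning 1-1000 ng/mL, diluted 10-fold.
working = [1.0, 5.0, 10.0, 50.0, 100.0, 500.0, 1000.0]
calibrators = [final_conc(c, 100.0) for c in working]

# 10 uL of the 500 ng/mL EIS (or IIS) working solution in the 1 mL sample.
spike = final_conc(500.0, 10.0)
print(calibrators, spike)  # [0.1, 0.5, 1.0, 5.0, 10.0, 50.0, 100.0] 5.0
```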

Sample information and pretreatment

In June 2021, three wastewater samples were collected from a chemical industry park in North China where PFASs were used as production materials. Two of the samples were collected before and after a reverse osmosis (RO) treatment process and labeled Pre-RO and Post-RO, respectively. The third sample was collected from the total effluent (TE) of a subsidiary wastewater treatment plant of the chemical industry park, which included an activated carbon adsorption process, and was labeled TE. The samples were collected in polypropylene bottles and immediately transported to our laboratory for pretreatment. An 800-μL aliquot of each sample was transferred to a 2-mL glass vial, spiked with 10 μL of the EIS working solution and vortex-mixed for 2 min. Then, 180 μL of MeOH was added, and the mixture was vortex-mixed for another 2 min, followed by addition of 10 μL of the IIS working solution (final volume: 1 mL). The mixture was then filtered through a nylon filter and sealed for instrumental analysis. For quantitative target analysis of PFASs present at high concentrations in the wastewater samples using LC-MS/MS, the raw samples were diluted 100-fold with ultrapure water before the pretreatment described above.
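
Because reagents are added to the 800 μL aliquot, the injected vial contains the wastewater diluted to 1 mL. The Python sketch below computes the resulting dilution factors. Applying them to back-calculate raw-sample concentrations is our assumption about how results would be corrected (the text does not state this), the function names are hypothetical, and the 40 ng mL−1 vial reading is a made-up value.

```python
SAMPLE_UL = 800.0
# Volumes added to each 800-uL aliquot during pretreatment, as described above.
ADDITIONS_UL = {"EIS working solution": 10.0, "MeOH": 180.0, "IIS working solution": 10.0}


def pretreatment_factor(pre_dilution: float = 1.0) -> float:
    """Overall dilution of the raw wastewater in the injection vial."""
    final_ul = SAMPLE_UL + sum(ADDITIONS_UL.values())
    return pre_dilution * final_ul / SAMPLE_UL


def raw_concentration(vial_conc: float, pre_dilution: float = 1.0) -> float:
    """Back-calculate the raw-sample concentration from the vial concentration."""
    return vial_conc * pretreatment_factor(pre_dilution)


print(pretreatment_factor())           # direct pretreatment
print(pretreatment_factor(100.0))      # samples pre-diluted 100-fold for LC-MS/MS
print(raw_concentration(40.0, 100.0))  # hypothetical vial reading of 40 ng/mL
```

With these volumes the vial factor is 1.25, rising to 125 for the 100-fold pre-diluted samples, so a hypothetical vial reading of 40 ng mL−1 corresponds to 5000 ng mL−1 (5 μg mL−1) in the raw wastewater.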

Instrumental analysis

Nontarget and quasi-target analyses

The nontarget and quasi-target analyses were conducted on an LC-Q-Orbitrap-HRMS system consisting of a Dionex ultra performance liquid chromatograph and a Q Exactive Plus mass spectrometer equipped with a heated electrospray ionization (HESI) source (Thermo-Fisher Scientific, Waltham, MA, USA). Chromatographic separation was carried out on an Acquity UPLC® BEH C18 column (2.1 × 100 mm, 1.7 µm, Waters, Milford, MA, USA) with a guard column (1 mm) containing the same packing material. Mobile phase A was ultrapure water containing 2 mM NH4Ac, and mobile phase B was ACN. The gradient elution program was as follows: 0–0.2 min, 20% B; 0.2–8 min, linear increase to 80% B; 8–10 min, linear increase to 95% B; 10–12 min, held at 95% B; 12–12.1 min, decrease to 20% B; 12.1–15 min, held at 20% B. The flow rate was 250 µL min−1, the column was kept at room temperature, and the injection volume was 5 µL. The HESI source was operated in negative ion mode with the following parameters: spray voltage, −3200 V; sheath gas flow rate, 45 arb; auxiliary gas flow rate, 10 arb; capillary temperature, 320 °C; auxiliary gas heater temperature, 350 °C. Full scan, DDA, and DIA modes were applied, with mass resolutions of 140,000, 70,000, and 70,000 (at m/z 200), respectively. The full scan ranges were set at m/z 50–750 and m/z 100–1500, and the DIA scan range was m/z 70–1000.
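
The gradient program can be represented as a table of (time, %B) breakpoints with linear interpolation in between. This Python sketch only illustrates the programmed proportion of ACN over the run; it is not an export of the instrument method file, and the names GRADIENT and percent_b are hypothetical.

```python
from bisect import bisect_right

# (time in min, %B) breakpoints of the gradient described above; linear in between.
GRADIENT = [(0.0, 20.0), (0.2, 20.0), (8.0, 80.0), (10.0, 95.0),
            (12.0, 95.0), (12.1, 20.0), (15.0, 20.0)]


def percent_b(t_min: float) -> float:
    """Programmed proportion of mobile phase B (ACN) at time t, in %."""
    times = [t for t, _ in GRADIENT]
    if not times[0] <= t_min <= times[-1]:
        raise ValueError("time is outside the 0-15 min program")
    i = bisect_right(times, t_min)
    if i == len(GRADIENT):  # exactly at the final breakpoint
        return GRADIENT[-1][1]
    (t0, b0), (t1, b1) = GRADIENT[i - 1], GRADIENT[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


for t in (0.0, 4.1, 9.0, 11.0, 13.0):
    print(t, round(percent_b(t), 2))
```

For example, halfway through the 0.2–8 min ramp (4.1 min) the program delivers 50% B, and at 9 min it delivers 87.5% B.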

Target analysis

The target analysis was performed on an LC-MS/MS system consisting of a Waters Acquity H-Class UPLC and a Xevo TQD triple quadrupole tandem mass spectrometer equipped with an ESI source (Waters Corporation). The LC column and working conditions were identical to those of the LC-Q-Orbitrap-HRMS analysis described above. The ESI source was operated in negative ion mode with the following parameters: capillary voltage, −2500 V; source temperature, 150 °C; desolvation temperature, 500 °C; desolvation gas flow rate, 800 L h−1. Data were acquired in multiple reaction monitoring (MRM) mode. The MRM parameters, including ion transitions, cone voltages and collision energies, are listed in Supplementary Table 11.

Data-processing

Nontarget analysis

The data volume generated by LC-Q-Orbitrap-HRMS for each sample was fairly large (~120 MB), which made data processing both crucial and challenging. The chromatogram of each sample (0.5–15 min) was divided into 29 equal segments of 0.5 min, and the full scan mass spectral data of each segment were exported from Xcalibur 4.1 (Thermo-Fisher) to Excel files as plain text.
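
Splitting the 0.5–15 min window into 0.5 min steps gives (15 − 0.5)/0.5 = 29 segments. The Python sketch below reproduces that segmentation. The helper names are hypothetical, and the rule for scans falling exactly on a boundary (assigned to the later segment, except at 15 min) is our assumption; the text does not say how boundaries were handled in Xcalibur.

```python
START_MIN, END_MIN, STEP_MIN = 0.5, 15.0, 0.5


def n_segments(start=START_MIN, end=END_MIN, step=STEP_MIN) -> int:
    """Number of equal-width segments covering [start, end]."""
    return round((end - start) / step)


def segment_bounds(start=START_MIN, end=END_MIN, step=STEP_MIN):
    """(start, end) retention-time bounds of each segment, in minutes."""
    return [(start + i * step, start + (i + 1) * step)
            for i in range(n_segments(start, end, step))]


def segment_index(rt_min, start=START_MIN, end=END_MIN, step=STEP_MIN):
    """0-based segment of a retention time, or None if outside the window.

    Boundary scans go to the later segment, except the final boundary,
    which stays in the last segment (assumed convention).
    """
    if not start <= rt_min <= end:
        return None
    return min(int((rt_min - start) // step), n_segments(start, end, step) - 1)


bounds = segment_bounds()
print(len(bounds), bounds[0], bounds[-1])
print(segment_index(7.3), segment_index(15.0), segment_index(0.2))
```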

PFCAs

The mass spectral data were first screened with a Matlab script for all potential per- and polyfluorocarboxylic acids (PFCAs). The algorithm was based on the premise that each PFCA should generate three ions, namely [M–H]−, [M–CO2–H]− and [M–CF2O–H]−. Taken in order of increasing m/z, the mass difference between the first two ions ([M–CF2O–H]− and [M–CO2–H]−) should be 22.00189 ± 0.001 u (the difference between CF2O and CO2), and that between the last two ions ([M–CO2–H]− and [M–H]−) should be 43.98983 ± 0.001 u (loss of CO2). In addition, based on the mass spectra of the native PFCA standards, the abundance ratios between the first two ions and between the last two ions were set within the ranges of 0.5–100 and 0.1–40, respectively. In this way, all potential PFCAs containing one –COOH group and at least one –CF2– group were retrieved, and the exact masses of their quasi-molecular ions ([M–H]−) were determined. The exact mass of each quasi-molecular ion was then checked with the formula calculation program in Xcalibur for formula assignment, with the general formula preset as C2–30H0–60O2–10N0–2P0–2F2–60. The mass error had to be ≤3 ppm, and the ring and double bond equivalence (RDBE) value had to have a decimal part of 0.5, as expected for an even-electron anion. Formulae whose RDBE values could not correspond to a chemically reasonable molecule were regarded as fragment ions rather than quasi-molecular ions. The quasi-molecular ion of each PFCA candidate, along with its in-source fragment ions [M–CO2–H]− and [M–CF2O–H]−, was checked in the full scan chromatogram, and the extracted ion chromatograms (EICs) of these ions were obtained to determine the chromatographic peaks (including possible isomeric peaks) of the PFCA candidates. Afterwards, the DDA (if detected) and DIA mass spectra of the PFCA candidates were extracted, and their chemical structures were tentatively elucidated by analyzing the characteristic features of the mass spectra and the fragmentation pathways of the ions.
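
The PFCA screening rule can be sketched in Python as a search for ion triplets within one segment's peak list. The constants follow from the ion identities: [M–H]− and [M–CO2–H]− differ by CO2 (43.98983 u), and [M–CO2–H]− and [M–CF2O–H]− differ by CF2O − CO2 (22.00189 u). This is not the authors' Matlab script, the function and variable names are hypothetical, and two details are our assumptions: each abundance ratio is taken as the higher-m/z ion over the lower-m/z ion, and the 0.5–100 range is applied to the [M–CF2O–H]−/[M–CO2–H]− pair and 0.1–40 to the [M–CO2–H]−/[M–H]− pair. The m/z values in the example are those computed for PFOA-derived ions plus one unrelated peak, and the intensities are made up.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity) from one chromatogram segment

D_CO2 = 43.98983       # [M-H]-     minus [M-CO2-H]-
D_CF2O_CO2 = 22.00189  # [M-CO2-H]- minus [M-CF2O-H]-  (CF2O - CO2)


def find_pfca_triplets(peaks: List[Peak], tol: float = 0.001,
                       ratio_co2: Tuple[float, float] = (0.1, 40.0),
                       ratio_cf2o: Tuple[float, float] = (0.5, 100.0)):
    """Return (m/z [M-H]-, m/z [M-CO2-H]-, m/z [M-CF2O-H]-) triplets.

    Ratios are higher-m/z ion / lower-m/z ion within each pair (assumption).
    """
    peaks = [p for p in peaks if p[1] > 0]  # avoid division by zero
    hits = []
    for mz1, i1 in peaks:          # candidate [M-H]-
        for mz2, i2 in peaks:      # candidate [M-CO2-H]-
            if abs((mz1 - mz2) - D_CO2) > tol:
                continue
            if not ratio_co2[0] <= i1 / i2 <= ratio_co2[1]:
                continue
            for mz3, i3 in peaks:  # candidate [M-CF2O-H]-
                if (abs((mz2 - mz3) - D_CF2O_CO2) <= tol
                        and ratio_cf2o[0] <= i2 / i3 <= ratio_cf2o[1]):
                    hits.append((mz1, mz2, mz3))
    return hits


segment = [(168.9894, 5.0e5), (346.9747, 1.0e5), (368.9766, 2.0e6), (412.9664, 1.0e7)]
print(find_pfca_triplets(segment))
```

The triple loop is cubic in the number of peaks, which is acceptable for a sketch. Sorting the peaks and using binary search for each expected m/z would scale better to full Orbitrap segments.
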
For PFCAs with complex, uncommon structures that were difficult to elucidate, the molecular formulae were searched in online chemical databases (e.g., ChemSpider and PubChem) to find matching candidate structures, which facilitated structure identification.
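
The two formula-level filters used for PFCA candidates (mass error ≤3 ppm and an RDBE value ending in .5) can be illustrated with the Python sketch below. It uses standard monoisotopic masses, adds one electron mass for the singly charged anion, and applies the common valence-based RDBE formula, RDBE = 1 + Σ nᵢ(vᵢ − 2)/2, with S counted as divalent and P as trivalent. That convention is our assumption and is not necessarily Xcalibur's exact implementation; the function names are hypothetical.

```python
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462,
        "F": 18.99840320, "P": 30.97376151, "S": 31.97207069}
ELECTRON = 0.00054858
# Lowest common valences used in the usual RDBE formula (assumed convention).
VALENCE = {"C": 4, "H": 1, "N": 3, "O": 2, "F": 1, "P": 3, "S": 2}


def anion_mz(formula: dict) -> float:
    """m/z of a singly charged anion with the given elemental composition."""
    return sum(MONO[el] * n for el, n in formula.items()) + ELECTRON


def ppm_error(measured: float, theoretical: float) -> float:
    return (measured - theoretical) / theoretical * 1e6


def rdbe(formula: dict) -> float:
    """RDBE = 1 + sum(n_i * (v_i - 2)) / 2 over all elements."""
    return 1 + sum(n * (VALENCE[el] - 2) for el, n in formula.items()) / 2


def accept_formula(measured_mz: float, formula: dict, max_ppm: float = 3.0) -> bool:
    """Keep a formula if |mass error| <= max_ppm and its RDBE ends in .5."""
    within_tol = abs(ppm_error(measured_mz, anion_mz(formula))) <= max_ppm
    return within_tol and rdbe(formula) % 1 == 0.5


pfoa_ion = {"C": 8, "F": 15, "O": 2}          # [M-H]- of PFOA
pfos_ion = {"C": 8, "F": 17, "S": 1, "O": 3}  # [M-H]- of PFOS
print(round(anion_mz(pfoa_ion), 5), rdbe(pfoa_ion), rdbe(pfos_ion))
print(accept_formula(412.9664, pfoa_ion), accept_formula(412.9690, pfoa_ion))
```

A measured m/z of 412.9664 lies about 0.06 ppm from the computed PFOA anion mass and passes, whereas 412.9690 is about 6 ppm off and is rejected.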

PFSAs

The mass spectral data were first filtered by a Matlab script for all potential per- and polyfluoroalkanesulfonic acids (PFSAs), based on an algorithm that searches for monosulfur-containing organic compounds via their specific carbon and sulfur isotopologue distributions. For each potential PFSA, at least three isotopologues of the quasi-molecular ion had to be detected, i.e., [M–H]−, [13C1-M–H]− and [34S1-M–H]−, and the mass differences between [M–H]− and [13C1-M–H]− and between [M–H]− and [34S1-M–H]− had to be 1.00335 ± 0.001 u and 1.99580 ± 0.001 u, respectively. In addition, the abundance ratios of [13C1-M–H]− to [M–H]− and of [34S1-M–H]− to [M–H]− had to fall within 0.02–0.3 and 0.024–0.064, respectively. Moreover, the in-source fragment ions SO3− (m/z 79.95736), HSO4− (m/z 96.96010) and/or FSO3− (m/z 98.95576) had to be present in the same chromatogram segment. The quasi-molecular ions of the PFSA candidates retained after filtering were sent to the formula calculation program in Xcalibur for formula assignment and checking, with the general formula preset as C2–30H0–60O3–10N0–2P0–2SF2–60. As for PFCAs, the mass error tolerance was ≤3 ppm, and the RDBE values had to have a decimal part of 0.5. The three quasi-molecular ion isotopologues of each potential PFSA were further checked in the full scan chromatogram, and their EICs were examined to ascertain the chromatographic peak(s), including isomeric peaks, of the PFSA. Thereafter, the DDA (if detected) and DIA mass spectra of each PFSA candidate were extracted and checked for the diagnostic product ions SO3− and FSO3− to confirm the PFSA formula. Finally, the chemical structures of the PFSAs were tentatively elucidated from the MS/MS spectra obtained by DDA/DIA.
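
The PFSA isotopologue filter can be sketched in Python as follows. For each peak, the code looks for a 13C1 partner at +1.00335 u and a 34S1 partner at +1.99580 u, checks both abundance ratios, and requires at least one sulfonate fragment (SO3−, HSO4− or FSO3−) in the same segment. Using the ±0.001 u tolerance for the fragment ions as well, and picking the most intense peak when several match, are our assumptions. The names are hypothetical, and the peak list is made up, loosely modeled on PFOS ([M–H]− m/z 498.9302). It includes a 13C2 peak (+2.00671 u) to show that the tolerance separates it from the 34S1 peak.

```python
D_13C = 1.00335  # 13C - 12C
D_34S = 1.99580  # 34S - 32S
TOL = 0.001      # u
# In-source fragments expected in the same segment: SO3-, HSO4-, FSO3-
SULFONATE_FRAGMENTS = (79.95736, 96.96010, 98.95576)


def most_intense_match(peaks, target, tol=TOL):
    """Most intense (m/z, intensity) peak within tol of target, or None."""
    cands = [p for p in peaks if abs(p[0] - target) <= tol]
    return max(cands, key=lambda p: p[1]) if cands else None


def find_pfsa_candidates(peaks, tol=TOL, r13c=(0.02, 0.3), r34s=(0.024, 0.064)):
    """Return m/z of [M-H]- ions passing the isotopologue and fragment filters."""
    if not any(most_intense_match(peaks, f, tol) for f in SULFONATE_FRAGMENTS):
        return []
    hits = []
    for mz, inten in peaks:
        if inten <= 0:
            continue
        c13 = most_intense_match(peaks, mz + D_13C, tol)
        s34 = most_intense_match(peaks, mz + D_34S, tol)
        if (c13 and s34
                and r13c[0] <= c13[1] / inten <= r13c[1]
                and r34s[0] <= s34[1] / inten <= r34s[1]):
            hits.append(mz)
    return hits


segment = [(79.9574, 2.0e5), (498.9302, 1.0e6), (499.9336, 8.7e4),
           (500.9260, 4.4e4), (500.9369, 3.7e3)]
print(find_pfsa_candidates(segment))
```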

The workflow of the data-processing procedures is outlined in Supplementary Fig. 8. All screening scripts were written in Matlab® R2020a (MathWorks, Inc., Natick, MA, USA), and the data processing was performed in Matlab.

Quasi-target analysis

Based on the structures and mass spectra of the 21 native PFAS reference standards, we devised in silico a large number of analogues of the reference standards by substituting and/or introducing H, O, Cl and Br atoms, with the introduced oxygen atom(s) present as ether, alcoholic hydroxyl or carbonyl groups. The exact masses of the quasi-molecular ions of the devised PFASs were then searched in the full scan chromatograms with a mass error tolerance of ≤3 ppm. If a suspect chromatographic peak of a devised PFAS was found, the plausibility of its retention time was assessed by comparison with that of the corresponding reference standard, taking into account the polarity change caused by the introduced group. If the retention time was plausible, the full scan, DDA (if available) and DIA mass spectra of the peak were extracted and checked for the diagnostic fragment ions of the PFAS. If the mass spectra were consistent with the devised PFAS, the compound was identified with a tentative structure.
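
The mass-search step of the quasi-target analysis can be sketched in Python. Starting from the [M–H]− composition of a reference standard, the code devises a few single-step variants (one F replaced by H, Cl or Br, or one inserted ether O) and looks for observed m/z values within 3 ppm. It covers only a small, hypothetical subset of the modifications described above (hydroxyl and carbonyl variants are omitted), all names are ours, and the retention-time and MS/MS checks that follow in the workflow are not modeled. The observed m/z values are made up.

```python
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462, "F": 18.99840320,
        "S": 31.97207069, "Cl": 34.96885271, "Br": 78.91833760}
ELECTRON = 0.00054858


def anion_mz(formula: dict) -> float:
    """m/z of a singly charged anion with the given elemental composition."""
    return sum(MONO[el] * n for el, n in formula.items()) + ELECTRON


def devise_variants(parent_ion: dict) -> dict:
    """Hypothetical single-step modifications of a parent [M-H]- composition."""
    variants = {}
    if parent_ion.get("F", 0) > 0:
        for new in ("H", "Cl", "Br"):  # replace one F with H, Cl or Br
            f = dict(parent_ion)
            f["F"] -= 1
            f[new] = f.get(new, 0) + 1
            variants["F->" + new] = f
    f = dict(parent_ion)
    f["O"] = f.get("O", 0) + 1  # insert one ether oxygen
    variants["+O"] = f
    return variants


def search_full_scan(observed_mz, parent_ion: dict, max_ppm: float = 3.0) -> dict:
    """Map each devised variant to the observed m/z values within max_ppm."""
    hits = {}
    for label, formula in devise_variants(parent_ion).items():
        theo = anion_mz(formula)
        matches = [mz for mz in observed_mz if abs(mz - theo) / theo * 1e6 <= max_ppm]
        if matches:
            hits[label] = matches
    return hits


pfoa_ion = {"C": 8, "F": 15, "O": 2}  # [M-H]- of PFOA
print(search_full_scan([394.9758, 412.9664, 500.0000], pfoa_ion))
```

Here only the F→H variant (the H-PFOA anion, computed m/z 394.97585) is matched. Any such hit would still need the retention-time plausibility check and diagnostic fragment ions before a tentative structure is assigned.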

Quantitative target analysis and semiquantitative analysis

In this work, 13 PFCAs and 8 PFSAs were quantified by the internal standard method, using linear calibration with a weighting factor of 1/x. For identified PFASs without reference standards, semiquantification was conducted by comparing the MS signal intensities of their quasi-molecular ions with those of structurally similar, quantified PFASs, e.g., hydrogen-substituted PFOA (H-PFOA) vs. PFOA.
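
The quantification step (internal standard calibration with 1/x weighting) and the semiquantification by a structurally similar surrogate can be sketched in Python. The software actually used for the regression is not stated. The closed-form weighted least-squares solution below is a standard formulation. Using the analyte/IS peak-area ratio against nominal concentration, and assuming that a semiquantified PFAS has the same response per ng mL−1 as its surrogate (e.g., H-PFOA vs. PFOA), are our simplifying assumptions. The calibration data are synthetic.

```python
from typing import Sequence, Tuple


def fit_linear_1_over_x(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Weighted least-squares fit of y = a + b*x with weights w_i = 1/x_i."""
    w = [1.0 / xi for xi in x]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)
    a = (swy - b * swx) / sw
    return a, b


def quantify(area_analyte: float, area_is: float, a: float, b: float) -> float:
    """Concentration from the analyte/internal-standard peak-area ratio."""
    return (area_analyte / area_is - a) / b


def semiquantify(intensity: float, surrogate_intensity: float,
                 surrogate_conc: float) -> float:
    """Assumes the unknown and its surrogate have equal response per ng/mL."""
    return surrogate_conc * intensity / surrogate_intensity


# Synthetic, perfectly linear calibration data (not measured values).
levels = [0.1, 0.5, 1, 5, 10, 50, 100]       # ng/mL
ratios = [0.001 + 0.02 * c for c in levels]  # analyte/IS area ratio
a, b = fit_linear_1_over_x(levels, ratios)
print(round(a, 6), round(b, 6))
print(round(quantify(0.201, 1.0, a, b), 6))
print(semiquantify(2.0e6, 1.0e6, 5.0))
```

The 1/x weight gives the low calibration levels more influence on the fit than ordinary least squares would, which helps accuracy near the LLOQ across a 1000-fold calibration range.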

Quality assurance and quality control

Quality assurance and quality control procedures were implemented to ensure accuracy and reliability. The calibration and QC samples were analyzed by both LC-Q-Orbitrap-HRMS and LC-MS/MS. The performance of the nontarget analysis was evaluated using the reference standards in the calibration and QC samples, the 13C-labeled standards spiked into the wastewater samples, and the PFASs detected in the wastewater samples by target analysis using LC-MS/MS; the accuracy and detection rates of these PFASs and reference standards were used to validate the reliability of the nontarget analysis. The quantitative target analysis of the 21 native PFASs was validated with the QC samples in terms of accuracy and precision at low, middle, and high concentration levels. In addition, the recoveries and matrix effects of the PFCAs in the LC-MS/MS analysis were evaluated following the procedures used in our previous study63. Procedure blanks and control blanks (spiked with internal standards only) were prepared and analyzed; in these samples, no analyte was permitted at a signal intensity ≥20% of that in the LLOQ samples. Carryover had to be ≤20% of the analyte signal intensities in the LLOQ samples.
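
The two signal-based acceptance criteria can be expressed as simple checks. This Python sketch mirrors the stated thresholds: a blank fails if the analyte signal is 20% or more of the LLOQ signal, and carryover passes if it is at most 20%. How the carryover signal was measured is not described, so the function only compares numbers supplied to it. The names and peak areas are hypothetical.

```python
from typing import Dict, List

LIMIT = 0.20  # 20% of the analyte signal in the LLOQ sample


def blank_ok(blank_signal: float, lloq_signal: float, limit: float = LIMIT) -> bool:
    """Blank criterion: analyte signal must stay below 20% of the LLOQ signal."""
    return blank_signal < limit * lloq_signal


def carryover_ok(carryover_signal: float, lloq_signal: float, limit: float = LIMIT) -> bool:
    """Carryover criterion: signal must be at most 20% of the LLOQ signal."""
    return carryover_signal <= limit * lloq_signal


def failing_blanks(blank: Dict[str, float], lloq: Dict[str, float]) -> List[str]:
    """Analytes whose blank signal violates the criterion, sorted by name."""
    return sorted(name for name, s in blank.items() if not blank_ok(s, lloq[name]))


lloq = {"PFOA": 1000.0, "PFOS": 1000.0}  # hypothetical LLOQ peak areas
blank = {"PFOA": 120.0, "PFOS": 310.0}   # hypothetical procedure-blank peak areas
print(failing_blanks(blank, lloq))
print(carryover_ok(150.0, lloq["PFOA"]))
```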