Biomarker-indicated extent of oxidation of plant-derived organic carbon (OC) in relation to geomorphology in an arsenic contaminated Holocene aquifer, Cambodia

The poisoning of rural populations in South and Southeast Asia due to high groundwater arsenic concentrations is one of the world’s largest ongoing natural disasters. It is important to consider environmental processes related to the release of geogenic arsenic, including geomorphological and organic geochemical processes. Arsenic is released from sediments when iron-oxide minerals, onto which arsenic is adsorbed or incorporated, react with organic carbon (OC) and the OC is oxidised. In this study we build a new geomorphological framework for Kandal Province, a highly studied arsenic affected region of Cambodia, and tie this into wider regional environmental change throughout the Holocene. Analyses shows that the concentration of OC in the sediments is strongly inversely correlated to grainsize. Furthermore, the type of OC is also related to grain size with the clay containing mostly (immature) plant derived OC and sand containing mostly thermally mature derived OC. Finally, analyses indicate that within the plant derived OC relative oxidation is strongly grouped by stratigraphy with the older bound OC more oxidised than younger OC.

. Schematic sketch showing geomorphological cut crossing relationships of north-eastern Kandal Province, Cambodia, based on data by Papacostas et al. 44 . Also shown are the locations of study sites LR01, LR03, LR05, LR07, LR09, LR10, LR14 (this study and 36 ), SY 12 , KS 38 , DS and SR 14 . Note that wetlands area is approximate and size changes seasonally. This figure was produced using Inkscape 0.91 (https://inkscape.org/ en/download/windows/). 6,000 to 7,000 years the delta has prograded into the South China Sea due to the high sedimentation rate of the Mekong river system [38][39][40][41] .
Many of the circum-Himalayan rivers are dominated by monsoon processes where the monsoon rain causes greater and increased river water levels and thus frequent flooding. However in the Mekong of the Cambodian lowlands whilst severe flooding still occurs regularly it is less pronounced due to the effects of Tonle Sap, a large fluvial lake in the centre of Cambodia, which is connected to the Mekong by the Tonle Sap River. During the rainy season water flow is in a north westerly direction from the Mekong River into the Tonle Sap with water accumulating in the lake, during the dry season flow reverses flowing south east into the Mekong from the lake. This flow regime means that the seasonal difference in water levels in the Mekong is considerably less than in other circum-Himalayan rivers. This feature results in fewer flooding events and has an important effect on sedimentation downstream of Tonle Sap 42 . The onset of this seasonal reversal in flow at Tonle Sap, referred to as the flood-pulse, occurred between 4,000 and 3,600 14 C years BP due to a change in monsoon intensity; the modern system appears to have started 1,660 14 C years BP 43 .
Within the study area, Landsat images and interpretation of sedimentary cross-cutting relationships show that at least four generations of scroll bars exist 44 . The oldest of these is perpendicular to the present day Mekong whilst more recent scroll bars run parallel to the modern Mekong and are probably active features (Fig. 1). The oldest sediments are most likely the floodplain deposits into which subsequent river channels have eroded and deposited sediment.
Representative sites with contrasting geomorphological features were selected for study 36 . In this manuscript the nomenclature "LRXX-##" is used; LRXX refers to the site location and ## refers to the sample depth in metres. The sites from LR01 to LR09 make up the transect T-Sand 36,45 , of which key sites were selected for the analysis reported in this manuscript. T-Sand is a 4.5 km 2D transect running along the oldest generation of scroll bars from the central wetlands to the Bassac river. A pre-sampling electrical resistivity tomography here indicated it was probably a largely layer cake structure consisting of sands and clays 36,46,47 . Previous analyses on single boreholes indicate that this structure is associated with distinctive OC composition distribution 14,16,17 . Site LR14 was selected to provide a sample representative of the flood plain. LR10 was selected to provide a sample representative of younger scroll bars than on T-Sand (Fig. 1). Additional data from other sites (SY, DS, SR and KS) conducted earlier in the same region using similar methods have also been used to give a wider regional setting to this study 12,14,38,41 .

Results
Grainsize distribution, TOC and age of sediments. The Folk and Ward based mean grain size of the sediments (see supplementary information Table 1) ranges from 1.07 mm for the very coarse sand at LR02-27, to 0.005 mm for the fine silt at LR05-45 (Fig. 2). On T-Sand there are three layers; on the top is a silty and clayey cap, a middle sandy layer and a deep silt wedge. The deep silt wedge has grainsizes <0.031 mm (LR05 to LR03, 30-45 m) and the middle sandy sediments are >0.063 mm (Fig. 2). The clay cap extends across the top of the entire transect, with the exception of a sandy window at LR03. At LR09 and LR08 the clay extends to about 21 to 18 m, however across most of T-Sand it is much shallower and it does not extend far below 9 m (Fig. 2). LR10 is predominantly coarse silt and fine sand with grain sizes ranging from 0.038 to 0.25 mm, whereas LR14 is silty, with grain sizes ranging from 0.006 to 0.058 mm.
TOC concentrations in these sediments (supplementary information Table 1) are weakly associated with the sedimentary grain size distribution (R 2 = 0.38, p-value = 5 × 10 −5 , Figs 2 and 3). The TOC concentrations in the deep silty wedge range from 0.7 to 1% (w/w), in the middle sandy layers the concentration is lowest, ranging from <0.05% (w/w) at multiple locations, to 1.0% (w/w) at LR02-30, whilst the clay cap has a medium high TOC content with the lowest values from 0.5 and the highest at 1.2% (w/w) at LR01-6 and LR09-6 ( Fig. 2). Total nitrogen (TN) concentrations range from below detection limit to 0.26% (w/w) and correlate positively with TOC (R 2 = 0.61, p-value = 5 × 10 −13 , supplementary information Table 1). The C/N ratio ranges from 1.51 to 16.5 (supplementary information Table 1).
The radiocarbon age of the OC ranges from 1,500 14 C years BP (83 pmc; LR09-6) to 12,100 14 C years BP (22 pmc; LR05-45), generally sites age with depth. On T-Sand the oldest sediments are in the deep silty wedge with ages >9,000 14 C years while the youngest sediments are in the shallow clay cap deposits with ages <1,500 14 C years with intermediate ages for the middle sandy sediments. The shallow samples from the LR10 and LR14 are significantly older than those on T-Sand with ages of 2,600 14 C years BP (72 pmc) at LR10-6 and 6,400 14 C years BP (45 pmc) at LR14-6. The deeper sediments at 30 m at LR14 are the same age as the deep silty wedge on T-Sand 9,500 14 C years BP (31 pmc). The δ 13 C of OC ranges from −32. Lipid analysis. Lipid analyses of the sediments indicates the presence of low but quantifiable amounts of HMW n-alkanes with concentrations ranging from 6 to 67 ng/g sediment (Table 2) and a distribution that mostly follows the TOC concentration (R 2 = 0.78, p-value 5 × 10 −11 , Fig. 3) with highest concentrations predominantly in the clay layers ( Table 2, Fig. 4, representative chromatograms are presented in supplementary information). The distribution of HMW n-alkanes as a proportion of OC ranges from 2 to 62 mg/g OC, with the highest proportions found in the sandy layers and the lowest in the clay sediments (Table 2). Those with a CPI 21-35 > 2 are mostly restricted to the clay layers (both the clay wedge and the clay cap) with a notable exceptions of LR05-21 which is a sandy sediment with a relatively high CPI 21-35 of 2 and LR05-45, which is in the clay wedge but has a relatively low CPI of 1.4 (Table 2, Fig. 4). Note that LR03-6, a shallow sediment in the sandy window, has a relatively low CPI of 1.5 ( Table 2, Fig. 4). CPI 21-35 of HMW n-alkanes has a very strong correlation with TOC (R 2 = 0.81, p-value 6 × 10 −11 ), indicating that the more thermally immature the OC the more OC is present (Fig. 3). The HMW n-alkanoic acids have a similar magnitude of concentration to the HMW n-alkanes, ranging from 0.18 ng/g sediment at LR03-6 to 202 ng/g of sediment at LR05-30 or 0.06 mg/g of OC at LR03-6 to 25.4 mg/g of OC at LR05-21 ( Table 2). The CPI [22][23][24][25][26][27][28][29][30][31][32][33][34] is consistently high with a range from 3.26 at LR05-45 to 9.96 at LR01-6. The n-alkanoic acid to n-alkane ratio ranges from 0.06 to 0.75. LR14-15 is notable for having an exceptionally high HMW n-alkanoic acid concentration of 914 ng/g sediment. Other than this sample the HMW n-alkanoic concentration on T-Sand and at LR10 and LR14 were similar, with respective mean concentrations of 19 ng/g (n = 20) and 18 ng/g sediment (n = 6). Finally, at LR14-15 the n-alkanoic acid to n-alkane ratio is the highest ratio reported by this study with a value of 0.96.

Discussion
The grainsizes and radiocarbon ages indicate that there are three key ages of deposition. The Early Holocene was dominated by the deposition of clay sediments from about 12,000 years to 6,000 years BP (the early Holocene facies; EHF), followed by sandy deposits from 4000 years to 2000 years BP (mid Holocene facies; MHF) and finally a recent depositional period continuous for the last 2000 years (young Holocene Facies;YHF). The EHF is found at three localities which are the entire depth of LR14, LR05-30 and 45 m on T-Sand and KS below 5 m. The EHF samples at 30 m on T-Sand and LR14 have very similar radiocarbon ages, 9,500 ± 40 14 C years BP, however at LR14 the shallow (6 m) sediments are 6,360 ± 40 14 C years BP which is considerably older than anything on T-Sand at the same depth (1,700 ± 40 14 C years BP; Table 1). The similarity in ages between the deep sediments at LR05 and the age of the sediments at LR14 is strong evidence in favour of the presence of an EHF. Provided that little erosion has taken place at LR14 the 6 m sample represents the latter stages of deposition of the EHF, at around 7,310 to 7,160 cal years BP (equivalent to 6,360 ± 40 14 C years BP), which means that sedimentation of EHF ceased with the cessation of sea-level rise in the area at about 7,000 cal years BP 38,41 .
At KS sedimentary structures indicate that sediments from 12 m to 30 m, deposited during the early Holocene, were tidal deposits 38,41 and the carbon isotopic values are also indicative of marine deposition 30 . Here the early Holocene deposition also ceased with the cessation of sea-level rise at around 7265 to 7019 cal years BP (6,250 ± 40 14 C years BP, 7 m). However, at KS there was a transition from the marine to more terrestrial deposition at about 7675 to 7570 cal years BP (6,760 ± 40 14 C years BP, 12 m) which is recorded in both sedimentary structures and the carbon isotopes 38,41 . The carbon isotopes at LR14 and all other sites sampled by this study indicate that they are entirely terrestrial deposits 30 suggesting that the early Holocene shoreline was located between LR14 and KS (Fig. 5).
The MHF ranges from 4,260 ± 40 14 C years BP to 2,990 ± 40 14 C years BP (Fig. 5). The contacts between the EHF and the MHF represent unconformities with the appearance of an incision channel at about to 4,855 to 4,625 cal years BP (4,220 ± 40 14 C years BP, based on the date from SY-27). The occurrence of this high energy system is concurrent with a major change in monsoon patterns 42 , given an increase in precipitation would increase the load carrying capacity of the Mekong then it is possible that the sandy facies is linked to the change in monsoon, however it could also be a localised geomorphological feature.
The final depositional sequence on T-Sand (and SY) was the shallow clay cap of the YHF. The YHF is not present at LR10 or LR14 or at the high point on T-Sand (LR03) where there is a sandy window through the clay. Given that the YHF is associated with recent flooding events 38,41 , this means that modern flooding events do not affect all areas, LR03 is in raised topography and thus is less susceptible to flooding events.  Fig. 1) and ## refers to the depth in metres. Also, samples from earlier studies labelled $$-## where $$ is the location (see Fig. 1) and ## is depth in metres. n.r. value not reported by original study. b References with the following code: TS = This study, T07 = Tamura et al. 38 , vD08 = van Dongen et al. 14 and R07 = Rowland et al. 12 .
There are two explanations for the stratigraphy present, firstly, that the deposition was autocyclic and the stratigraphy is a result of avulsion of the major river channel. However, secondly, and perhaps more interestingly, deposition of the YHF started at about 1990 to 1820 cal years BP (1,900 ± 40 14 C years BP). These dates coincide with the onset of the modern flood pulse at Tonle Sap which occurred about 1,660 cal years BP and initiated a lower energy period of the Mekong river through the Cambodian lowlands 43 . This would have reduced the load bearing capacity of the river, reducing the possible transportable grainsize. So it is possible that the appearance of the YHF at this study site and the onset of Tonle Sap flood-pulse are connected. However, without a more detailed sedimentary study, it is impossible to conclusively state at this moment which hypothesis is most accurate.
Previous studies have noted the unusual age-depth pattern of some sediments, in that sediments do not always increase in age with depth 14 . At DS sediment age increases with depth down to 27 m but is followed by younger sandy sediments present below 27 m 14 . Similarly, LR09 (this study) has an age of 8,000 ± 40 14 C years BP at 39 m depth but at 45 m the age is only 2,930 ± 40 14 C years BP. It is notable that this occurs at both DS and LR09 which are near rivers and also in the sandy sediments below 30 m.
Such an age-depth profile is not consistent with any model of sedimentation and is therefore likely to be the result of other factors. There are two potential causes of the unusual age-depth profile. Firstly, the unusual age-depth profiles are in deeper (>35 m) sandy sediments near rivers leading to the possibility of infiltration of river POC into the sedimentary OC. Secondly, the possibility of small levels of contamination cannot be ruled out, sandy sediments are less well consolidated than clays and difficult to sample at greater depths. Given that the sedimentary OC concentrations at these sites are extremely low, low levels of infiltration or contamination could have a relatively large effect on the radiocarbon dates. Thus, caution should be given when interpreting data from sandy sediments below 30 m particularly when a site is located near a river and sampling has involved wet drilling. The distribution of OC in the sediments in terms of quantity and type is clearly strongly related to grainsize. The TOC correlates moderately with grainsize, with the highest concentrations in the clay sediments (mostly YHF and EHF). Rapid sedimentation has been associated with the accumulation and preservation of OC in this region 44 and clayey sediments are associated with lower energy clay deposition than higher energy sandy deposition 48 . Given that most POC in the Mekong is clay (<63 μm) 49 it is expected that most is deposited with clay particles in low energy floodplains. There is a small statistical difference between the concentrations of OC  Fig. 1) and ## refers to the depth in metres. Also, samples from earlier studies labeled $$-## where $$ is the location (see Fig. 1) and ## is depth in metres. b Concentration of aforementioned lipid per mass sediment. c Proportion of aforementioned lipid per mass organic carbon. d Carbon preference index of aforementioned lipid. e HMW n-alkanoic acid to Σ(HMW n-alkane ratio + HMW n-alkanoic acid). f References for data with the following codes: TS = This Study, vD08 = van Dongen et al. 14 , HR07 = Rowland et al. 12 .  in the YHF and EHF with mean (±standard deviation) concentrations of 0.46 ± 0.28% (n = 23) and 0.78 ± 0.58% (n = 10) respectively and a Student's p-value of 0.06. This means that to all intents and purposes a similar amount of OC should be considered within both clay facies. The distribution of different types of organic material is consistent with earlier studies 11,12,14,16,17 , showing that thermally mature HMW n-alkanes in these aquifers are largely restricted to the sandy layers: this is probably because these are upwardly migrating petroleum and the sand, unlike the clay, has large pore sizes that can be exploited. The correlation between grainsize and CPI shows that sandy sediments primarily contain thermally mature OC which previous studies have shown to be bioavailable but at relatively low concentrations [12][13][14]16,17,50 . The results shown in Fig. 4 clearly indicate that the plant material is mostly, but not exclusively, associated with clay layers and that the concentration of plant OC is significantly higher than thermally mature OC. This conclusion, that plant derived OC is mostly restricted to the clay layers, is also supported by another proxy, the C/N ratio. Higher C/N ratios indicate that OC is likely to be plant derived 30 and at our site clay layers have higher C/N (see supplementary information Table 1), this is consistent with a study of broadly similar sediments in Vietnam 16,19 .
There appears to be a relationship between stratigraphic position of OC and the relative levels of oxidation. The n-alkanoic acid to n-alkane ratio clearly shows that of the plant OC, that in the EHF has undergone more oxidation than the younger facies (Fig. 6). The levels of oxidation in the EHF are statistically higher than in the YHF (Student's p-value = 0.02) implying that the older plant OC is more degraded than younger OC, this could be due to the longer time available for degradation in the older layers or could be related to differences in bioavailability between the two layers.
These results demonstrate that geomorphological processes which formed deposits at this study site are quite different to those which formed the deposits at the highly detailed core of KS which is located 20 km further south 38,41 . This is important because many studies of arsenic in groundwater at this site have used KS to interpret these sediments due to the similarities in stratigraphy across much of the Kandal Province area and at KS 20,22,23,35,45,[51][52][53][54][55][56] . This means that, despite its quality, we suggest that KS is not a suitable framework for this area of Kandal Province as the site is under a different geomorphological system. Therefore these results indicate that future studies of this area should use a locally appropriate geomorphological framework for reference in future research.
These results clearly show that the concentrations and bioavailability of sedimentary OC at our study are related to stratigraphy and grainsize which in turn is controlled by wider regional environmental and climatic change throughout the Holocene. Previous studies have already shown that thermally mature (petroleum-derived) OC exists in the sands and is bioavailable, yet the concentrations are very low [11][12][13][14]16,17,50 . These results show that a much greater reservoir of plant derived OC exists in the clays both from the early Holocene (>7,000 years BP) and more recent Holocene (<2,000 years BP). The oxidation proxies show that the old plant derived OC is much more oxidised than young plant derived OC. This could be because the older sediments have had more time to oxidise than the younger sediments but it could also relate to differences in the bioavailability between the two layers. If the latter is true then this indicates that variations between sediments and/or the nature of the OC within them could have an effect on the bioavailability of the OC. Understanding what, if any, effect this has had on the release of arsenic is the next important challenge.

Methodology
Sediment collection. Coring was performed by manual rotary drilling using a cutting auger attached to the end of a hollow steel pipe up to a maximum depth of 30 or 45 m. For detailed descriptions of the sampling methods see Richards et al. 45 . About 100 g (though sample size varied greatly) of sediments was collected every 3 m through manual hammering of a custom made stainless steel sampler attached to a thinner drilling rod which was lowered through the middle of the hollow drilling pipe. Sediment cores were captured in an internal acrylic sampling tube (25 mm or 50 mm) capped with a one-way valve made from bottled water caps and removed from the acrylic sampling tube using a purpose built stainless steel sample plunger or manual hammering.
All sediment samples for organic and inorganic analysis were placed in foil bags which had previously been furnaced to 350 °C for 3 hours to remove any organic contaminants. The foil bags were placed in resealable polyethylene bags and flushed with nitrogen to minimise oxidation. A subset of sediment sample for later grain size analysis was stored in polythene bags (without a foil bag) and not flushed with nitrogen. All samples were placed in a polystyrene chiller containing ice packs before transport within a few hours of collection to the local laboratory where they were re-flushed with nitrogen and frozen. Samples were transported back to Manchester by non-temperature controlled air freight and once in Manchester samples were stored in a freezer at −20 °C, prior to analysis.
Inorganic and bulk sediment characterisation. Full method details for all techniques are presented in the supplementary information. Total OC (TOC), reported as % w/w relative to total sediment, was measured in the Faculty of Life Sciences, University of Manchester, using an elemental analyser (Vario EL Cube, Elementar). Grain size analysis was conducted at the British Geological Survey (Keyworth, UK) using laser diffraction (LS 13 320 Laser Diffraction Particle Size Analyzer, Beckman Coulter, UK), enabled with Polarization Intensity Differential Scattering which accounted for non-spherical, sub-micron particles as previously described 36,57 . Data analysis was conducted using Gradistat_v8 58 .
Radiocarbon analysis provides a model estimate of time since a carbon sample was in equilibrium with the atmosphere. All of these samples are bulk sedimentary OC, therefore, this time reflects the average age of the different carbon constituents and is related to the length of burial and lack of disturbance by active plant material. 14 C samples were prepared to graphite at the NERC Radiocarbon Facility, East Kilbride and analysis was carried out at the SUERC AMS Laboratory, East Kilbride using a 5MV accelerator mass spectrometer, National Electrostatics Corporation, Wisconsin, US 59,60 and the data reported in accordance with international practice 61 . Radiocarbon ages are reported as both 14 C years before present (BP) and as calibrated (cal years BP). Radiocarbon calibration was undertaken using OxCal 62,63 using the IntCal13 calibration curve 64 . Calibrated ages presented are the highest probabilities calculated from an analytical confidence of 2 sigmas. Where possible the calibrated ages are presented with a confidence of 95.4% but in some cases where a significantly narrower age range could be calculated from a moderately lower probability, this was accepted (see supplementary information Figures 7 and 8). δ 13 C of sedimentary OC is a bulk proxy commonly used to distinguish the source of different plant groups and marine and terrestrial inputs 19,30 . A sub-sample of CO 2 (from the radiocarbon analysis) was used to measure δ 13 C using a dual-inlet mass spectrometer with a multiple ion beam collection facility (Thermo Fisher Delta V), in order to correct 14 C data to −25‰ δ 13 C VPDB .
Organic extraction, separation and analysis. Lipid biomarkers are used to analyse organic processes and sources of the sedimentary OC. Here, the term HMW refers to a carbon chain length of 21 to 35 for n-alkanes and 20 to 30 for n-alkanoic acids as the general regional pattern 11,12,14,16,17 . The concentration of HMW n-lipids (alkanes and alkanoic acids), both as a proportion of gram sediment and gram OC, is calculated to show the dominance of these compounds within the sediments and OC. The CPI, the proxy for thermal maturity, is calculated for all lipids and is the predominance of odd-over-even in the case of n-alkanes, and even-over-odd in the case of n-alkanoic acids 28,29 . CPI is calculated using the same method as van Dongen et al. 14 , and any value >2 is considered to be thermally immature. The average chain length (ACL) is calculated using the same method as van Dongen et al. 14 . The ratio of HMW n-alkanoic acids to Σ(HMW n-alkanes + HMW n-alkanoic) acids, e.g. the n-alkanoic acid to n-alkane ratio, is used to assess the extent of oxidation of OC 25,26 . This ratio is used because it expresses the end-members of organic oxidation and therefore the clearest separation of reduced and oxidised compounds. However, it is only suitable for those sediments which are immature, e.g. with HMW n-alkane CPI values ≫ 2. This technique assumes that (i) any plant derived organic material will have a constant initial ratio of acids to alkanes and (ii) if different samples oxidise to different levels they will have different ratios of acids to alkanes 27 .
Organic analysis was conducted by extracting the total lipid extract using organic solvents and Soxhlet apparatus, followed by appropriate separation techniques based on earlier studies 11,14,16,26 . All analysis was conducted using Gas-Chromatography Mass Spectrometry (GCMS) using an Agilent 789 A GC interfaced to an Agilent 5975 C MSD. Full details are presented in the supplementary information.
Data analysis and presentation. All statistical analysis was conducted using R 65 . Bulk sedimentary concentrations were kriged to produce geochemical maps using the package geoR 66 (variograms shown in supplementary information). All statistical analyses were plotted using the R package ggplot2 67 .