Both neutrophil-to-lymphocyte ratio (NLR) and platelet-to-lymphocyte ratio (PLR) have been suggested as novel and useful biomarkers for the diagnosis or prognostic prediction of diseases.1, 2 A high NLR level was shown to be an independent predictor of mortality in patients undergoing cardiac revascularization3 and in patients with myocardial infarction.4 Elevated NLR levels were also related to a poor prognosis in various cancers, such as esophageal, pancreatic, lung, ovarian and hepatocellular cancer.5, 6, 7 Like NLR, PLR has been reported as an index for the diagnosis or prognostic prediction of oncologic disorders and inflammatory diseases.8, 9 NLR and PLR may thus serve as biomarkers in patient populations. However, studies of variation in these biomarkers within healthy populations are scarce. Recently, using a twin family epidemiological design, we showed that variation in NLR and PLR levels is partly due to genetic influences, with a broad-sense heritability of 36% for NLR and 64% for PLR.10 Here, we investigate whether these significant heritability estimates can be explained by common single-nucleotide polymorphisms (SNPs) and whether we can identify the genes that play a role in these two blood ratios. We also investigate whether our findings are unique to the two ratios or whether their count components (that is, lymphocyte, neutrophil and platelet counts) show similar results.

No genome-wide association study (GWAS) has yet been published for NLR and PLR. However, GWASs on their subcomponents, the neutrophil, lymphocyte and platelet counts were carried out in different populations including European,11, 12, 13, 14 African–American,15, 16, 17, 18 Korean19, 20 and Japanese populations.21, 22 These GWASs for blood cell count in different cohorts have identified multiple genetic loci for blood cell components. For neutrophil count, the DARC gene promoter at 1q23.3 was identified in African–American populations23 and loci at 20p12 (PLCB4 gene)22 and 7q21.2 (CDK6 gene)21 were found in the Japanese population. The chromosomal region nearby PSMD3 on 17q21 was associated in a GWAS meta-analysis in both Japanese and European ancestry cohorts, but not in African–American cohorts. The variants at AK123889 on 6p21.33 were novel findings in a European ancestry cohort,15 and were also confirmed by meta-analysis.24 For lymphocyte count, two genetic variants nearby EPS15L1 gene on 6p21 and LOC101929772 on 19p13 were identified.18 For platelet count, many loci were identified: SH2B3 on 12q24, ARHGEF3 on 3p14.3, ZBTB9-BAK1 on 6p21.31, KIAA0232 on 4p16.1, EGF on 4q25, PNPLA3 on 22q13.31 in the Korean population.19, 20 ARHGEF3 on 3p14.3, PEAR1 on 1q23.1, BMPR1A on 10q23.2, loci on 6p22, 7q11, 10q21, 11q13, 20q13 were detected in the African–American population16, 17, 25 and over 55 loci including CCDC71L-PIK3CG, ARHGEF3, BAK1 and HBS1L-MYB in the European population.11, 12, 14, 26

Some blood cell count loci show pleiotropy: they influence multiple hematological indices.21, 26, 27, 28, 29 For example, the genetic region nearby AK123889 on 6p21.33 was associated with neutrophil count, lymphocyte count and total white blood cell count18, 24 and the DARC promoter on 1q23.2 was associated with neutrophil count, monocyte count and total number of white blood cells.15, 30 The intergenic HBS1L-MYB variants were associated with total white blood cell count and also with number of neutrophils, lymphocytes, erythrocytes, eosinophils, monocytes and platelets.21, 31 Therefore, we also examined genetic effects across the ratios and constituent cell counts.

We conducted five GWASs to identify genetic variants associated with NLR, PLR and neutrophil, lymphocyte and platelet counts. The discovery cohort consisted of 5901 healthy participants from the Netherlands Twin Register (NTR)32, 33 and replication of top results was sought in the TwinsUK cohort consisting of 2538 participants.34 Furthermore, all top SNPs that showed a significant association with our phenotypes of interest were selected for an expression quantitative trait loci (eQTL) analysis35 to test whether these variants have an effect on the gene expression level. For the ratios, we estimated the proportion of trait variance explained by significant SNPs from the GWAS and the variance explained by SNPs that were associated with lymphocyte, platelet and neutrophil counts.36, 37 Using the summary statistics of the GWAS results, we applied linkage disequilibrium (LD) regression to determine the variance explained by all autosomal SNPs, to examine polygenic effects between NLR and PLR, and to determine the genetic correlation between variants affecting the two ratios, their subcomponents and the published GWASs available in LD-Hub.38, 39, 40

Materials and methods


All participants were registered with the Netherlands Twin Register (NTR) and had taken part in biobank projects conducted between 2004 and 2011.32, 33 After removing outliers (defined as values outside mean ±5 × s.d. for NLR, PLR or their subcomponents), the sample size for PLR and NLR was 9434 individuals from 3411 families. We further excluded individuals who met one or more of these criteria: (1) illness in the week of sample collection (N=539); (2) elevated CRP, with a cutoff of 15 mg/l (N=287); (3) basophil count >0.02 × 10⁹/l (N=151); (4) report of chronic immune disease or cancer (N=83); and (5) use of anti-inflammatory medication, glucocorticoids or iron supplements (N=537). After linking these data to the genetic data, 6112 individuals had both phenotype and genotype data. After exclusion of 211 individuals with non-Dutch ancestry (based on genotype information), the sample size was 5901 individuals. Written informed consent was obtained from all participants and the Medical Ethics Committee of the VU Medical Centre approved the study protocols.
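The outlier rule above (values outside mean ±5 × s.d.) can be illustrated with a minimal Python sketch. The function name `remove_outliers` is hypothetical, and the use of the sample standard deviation is an assumption; the text does not say which estimator was used.

```python
from statistics import mean, stdev


def remove_outliers(values, n_sd=5.0):
    """Keep values inside mean ± n_sd × s.d.

    Assumption: the sample standard deviation is used.
    """
    m = mean(values)
    sd = stdev(values)
    lower, upper = m - n_sd * sd, m + n_sd * sd
    # Values exactly on a bound are kept.
    return [v for v in values if lower <= v <= upper]
```

Note that with very small samples no value can lie 5 s.d. from the mean, so the rule only has an effect in reasonably large cohorts.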

Blood sampling and cell counts

Blood samples were obtained during a home visit, or sometimes a work visit, between 7 and 10 a.m. Participants were instructed to fast overnight and to refrain from heavy physical exertion and medication use (if possible) in the morning before the visit. Smokers were asked to abstain from smoking for at least 1 h prior to the visit. For fertile women without hormonal birth control, an appointment was made, when possible, within the 2nd to 4th day of the menstrual cycle, and women taking hormonal birth control were visited during the pill-free week. Peripheral venous blood samples were collected into multiple anticoagulant vacuum tubes. Within 3–6 h of blood withdrawal, the tubes were transported to the laboratory. During the visit, data were also collected on body composition, the presence of chronic diseases, medication use and smoking history.33

The hematological profile, including the number of neutrophils, lymphocytes and platelets, was obtained from 2 ml EDTA tubes using the Coulter system (Coulter Corporation, Miami, FL, USA). NLR was calculated as the absolute neutrophil count (10⁹/l) divided by the absolute lymphocyte count (10⁹/l) and PLR was calculated as the absolute platelet count (10⁹/l) divided by the absolute lymphocyte count (10⁹/l).
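The two ratio definitions can be written out as a short Python sketch. The function name `blood_ratios` and the zero-count guard are illustrative assumptions; all counts are taken to be in the same unit (10⁹/l), so the ratios are unitless.

```python
def blood_ratios(neutrophils, lymphocytes, platelets):
    """Return (NLR, PLR) from absolute counts given in 10^9/l."""
    if lymphocytes <= 0:
        # Guard added for illustration; a ratio is undefined without lymphocytes.
        raise ValueError("lymphocyte count must be positive")
    nlr = neutrophils / lymphocytes
    plr = platelets / lymphocytes
    return nlr, plr
```

For example, counts of 4.0 (neutrophils), 2.0 (lymphocytes) and 250.0 (platelets) give an NLR of 2.0 and a PLR of 125.0.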

Genotype data

For DNA isolation, we used the GENTRA Puregene DNA isolation kit.41 Genotyping was done on multiple chip platforms, with a number of overlapping participants. In chronological order, the following platforms were used: Affymetrix Perlegen 5.0 (N=1718), Illumina 370 (N=424), Illumina 660 (N=1103), Illumina Omni Express 1 M (N=346) and Affymetrix 6.0 (N=3602). Genotype calls were made with the platform-specific software (Birdsuite, APT-Genotyper, Beadstudio) for each specific array. Quality control was done within and between platforms and subsets. For each platform, the individual SNP markers were lifted over to build 37 (HG19) of the Human reference genome, using the LiftOver tool. The data were then strand aligned with the 1000 Genomes GIANT phase1 release v3 20101123 SNPs INDELS SVS ALL panel. SNPs from each platform were removed if they had ambiguous locations, alleles that mismatched this imputation reference set, or allele frequencies that differed by >0.20 from the reference. From each platform, SNPs were also excluded if they met any of the following criteria: a minor allele frequency (MAF) <1%, Hardy–Weinberg Equilibrium P<0.00001 or call rate <95%. Samples were excluded from the analysis when their expected sex did not match their genotyped sex, the genotype missing rate was above 10% or the PLINK 1.07 F inbreeding value was either >0.10 or <−0.10.
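As an illustration of the per-platform SNP filters listed above, the thresholds can be combined into a single predicate. This is a minimal sketch, not the pipeline actually used (which relied on PLINK and platform-specific tools); the function and argument names are hypothetical.

```python
def passes_snp_qc(maf, hwe_p, call_rate, freq_diff_ref):
    """Return True if a SNP survives the pre-imputation filters.

    maf           -- minor allele frequency (0 to 0.5)
    hwe_p         -- Hardy-Weinberg Equilibrium P-value
    call_rate     -- fraction of samples with a genotype call
    freq_diff_ref -- allele frequency minus reference panel frequency
    """
    return (
        maf >= 0.01                     # exclude MAF < 1%
        and hwe_p >= 0.00001            # exclude HWE P < 0.00001
        and call_rate >= 0.95           # exclude call rate < 95%
        and abs(freq_diff_ref) <= 0.20  # exclude frequency difference > 0.20
    )
```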

After these steps, the data of the individual arrays were merged into a single data set using PLINK 1.07.42 Within the merged set, identity by state sharing was calculated between all possible pairs of participants and compared with the known NTR family structures. Samples were removed if the data did not match their expected identity by state sharing. The concordance rate of DNA samples on multiple platforms for overlapping SNPs generally exceeded 99.0% after data cleaning. The Hardy–Weinberg Equilibrium, MAF and reference allele frequency difference (<0.20) filters were re-applied in the combined data. As a final step, SNPs with C/G and A/T allele combinations were removed when the MAF was between 0.35 and 0.50 to avoid incorrect strand alignment. Phasing of all samples and imputation of SNPs missing across platforms was done with MACH 1.43 The phased data were then imputed with MINIMAC44 in batches of ~500 individuals for the autosomal genome using the above 1000G Phase I integrated reference panel for 561 chromosome chunks obtained by the CHUNKCHROMOSOME program.45 To avoid having SNPs that were partly imputed and partly genotyped across platforms, we took the re-imputed calls for all genotyped SNPs. After imputation of these SNPs, we generally found a high concordance between re-imputed SNPs and the original genotypes (0.9868). The mean imputation quality R² metric was 0.38 (based on all 30 051 533 imputed autosomal SNPs). After imputation, SNPs were filtered based on the Mendelian error rate in families, which was calculated from the best guess genotypes in families (trios or sib-pairs with parents), using first GTOOL to calculate best guess genotypes and then PLINK 1.07 to analyze the data. SNPs were removed if the Mendelian error rate was >0.02, the imputed allele frequency differed by >0.15 from the 1000G reference allele frequency, MAF<0.01 or R²<0.80. Hardy–Weinberg Equilibrium was calculated on the genotype probability counts for the full sample, and SNPs were removed if the P-value was <0.00001. This left 6 010 458 SNPs for the GWAS.
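The strand-ambiguity step described above can be sketched as follows. A/T and C/G SNPs read the same on both strands, so at a MAF close to 0.50 the allele frequency cannot reveal a strand flip. The function names are hypothetical, and treating the MAF bounds as inclusive is an assumption.

```python
# Allele pairs that are their own reverse complement.
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


def is_strand_ambiguous(allele1, allele2):
    """True for A/T and C/G SNPs, regardless of allele order or case."""
    return frozenset((allele1.upper(), allele2.upper())) in AMBIGUOUS_PAIRS


def drop_for_strand_risk(allele1, allele2, maf, low=0.35, high=0.50):
    """True if the SNP should be removed under the rule in the text.

    Assumption: the MAF interval 0.35-0.50 includes both end points.
    """
    return is_strand_ambiguous(allele1, allele2) and low <= maf <= high
```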


Generation of genetic relatedness matrices

GRMs with the values of the identity by state allele sharing for a given set of SNP markers between all possible pairs of individuals were calculated with the GCTA software,46 after removing SNPs that showed significant genotyping differences between platforms (P<0.0001); 6 009 498 SNPs were retained, which is sufficient for GRM estimation.46 The SNP data were transformed to best guess Plink binary format, and subsets were made for each of the 22 chromosomes. We generated 25 GRMs. The first contained only the significant GWAS SNPs for PLR from our own study, and the second contained the SNPs known to be involved in the cell counts. A third GRM was constructed for closely related individuals (identity by state >0.05); pairwise relationship estimates smaller than 0.05 were set to 0 in this matrix.36 This matrix was used as a second covariate matrix in the GWAS and heritability studies to account for the family structure.36 Including family members in the GWAS increases the power to detect genes, and the mixed linear model correction employed in GCTA corrects for the statistical inflation caused by including related individuals.36 Finally, 22 GRMs were made that include all autosomal SNPs except those on the chromosome carrying the SNP being tested in the GWAS: the LOCO (Leave One Chromosome Out) strategy.47 These matrices were used in the GWAS as covariates to remove any remaining statistical inflation due to subsample stratification.
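Two of the steps above lend themselves to a small illustration: zeroing small entries to obtain the close-relatives matrix, and choosing the chromosomes that enter each LOCO matrix. This is a sketch on plain nested lists with hypothetical function names; the actual matrices were computed with GCTA.

```python
def close_relatives_grm(grm, cutoff=0.05):
    """Set pairwise relationship estimates smaller than the cutoff to 0."""
    return [[v if v >= cutoff else 0.0 for v in row] for row in grm]


def loco_chromosomes(tested_chromosome, autosomes=range(1, 23)):
    """Chromosomes contributing SNPs to the GRM used when testing
    a SNP on `tested_chromosome` (Leave One Chromosome Out)."""
    return [c for c in autosomes if c != tested_chromosome]
```

For a SNP on chromosome 6, `loco_chromosomes(6)` lists the other 21 autosomes, so the tested SNP and its neighbours in LD do not also enter the model through the random effect.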


The first three Dutch principal components, generated with the EIGENSOFT software, were used as covariates in the GWAS (Supplementary Figures 1 and 2).48, 49 Additional covariates were age, sex and genotype platform. For NLR and PLR as well as for the three subcomponent counts, we modeled the phenotypes as being influenced by the SNP and these six covariates. Analyses were performed with the GCTA software running a mixed linear association model, including the LOCO GRMs for chromosomes 1–22 and the GRM for closely related individuals.36, 50 For the GWASs, the significance threshold was P<5 × 10⁻⁸.51

GWAS replication

Replication of significant GWAS hits for NLR, PLR or individual blood cell counts that had not been reported previously was examined in TwinsUK. TwinsUK is a UK-based twin registry with a focus on the genetic and environmental etiology of age-related complex traits and diseases.34 Samples from TwinsUK were genotyped using the Illumina Hap317K and Hap610K assays (Illumina, San Diego, USA) following standard procedures. Normalized intensity data were pooled and genotypes were called on the basis of Illumina’s algorithm.52 No calls were assigned if the most likely call had a posterior probability of less than 0.95. SNPs were excluded if they had a low call rate (<95%) and/or a Hardy–Weinberg P-value <10⁻⁴. Subjects were also removed if the sample call rate was <95%, autosomal heterozygosity was outside the expected range, or genotype concordance with another sample was over 97% and the sample had the lower call rate. Imputation of genotypes was carried out using the software IMPUTE.53 The best guess Plink binary format data were used to conduct the replication analysis. The sample size of the TwinsUK data set was 2538 subjects with genetic and phenotypic information, after values outside mean ±5 s.d. in the phenotype of interest were removed. We tested the association with the SNPs using a linear mixed model, in which the traits were regressed on the SNPs while correcting for age and sex as fixed effects.

eQTL analysis

To determine the effects of the genetic variants identified in the GWAS for both ratios as well as the constituent counts, we conducted an eQTL analysis using the NESDA-NTR Conditional eQTL Catalog (accessible online).54 The details of the eQTL analysis are described in the Supplementary Method Material. In brief, eQTL effects were examined with a linear model approach using MatrixeQTL55 with expression level as the dependent variable and SNP genotype values as the independent variable. eQTL effects were defined as cis when probe set–SNP pairs were at a distance of <1 million base pairs (Mb), and as trans when the SNP and the probe set were separated by more than 1 Mb on the genome according to the Human reference genome HG19. To determine whether the observed cis and trans effects may reflect a causal mechanism, we checked the LD of our top SNPs with the top SNPs identified for gene expression in the implicated genes. As gene expression is related to blood composition, we repeated the analysis with and without correction for blood composition components (specifically mean corpuscular volume, red cell distribution width, and neutrophil, lymphocyte, monocyte, eosinophil, basophil and platelet counts).
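The cis/trans definition above can be expressed as a small classifier. This sketch makes two assumptions that the text leaves open: pairs on different chromosomes are called trans, and a distance of exactly 1 Mb is also called trans. The function name is hypothetical.

```python
def classify_eqtl(snp_chrom, snp_pos, probe_chrom, probe_pos, window=1_000_000):
    """Label a probe set-SNP pair as 'cis' or 'trans' (HG19 coordinates).

    Assumptions: different chromosomes count as trans, and a distance
    of exactly `window` base pairs counts as trans.
    """
    if snp_chrom != probe_chrom:
        return "trans"
    return "cis" if abs(snp_pos - probe_pos) < window else "trans"
```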

SNP heritability and genetic correlations

The variance explained by the significant SNPs in our GWAS for PLR was estimated with the GCTA software.46 The variance explained in NLR and PLR by the known loci from the literature for neutrophil, platelet and lymphocyte blood cell counts was also estimated with GCTA. For each analysis we included family members and therefore included the GRM for closely related individuals under the Restricted Maximum Likelihood analysis procedure within GCTA.36 Sex, age, genotype platform and three Dutch principal components were used as covariates. The variance explained by all SNPs was estimated by LD regression between the effect sizes from our GWAS summary statistics and the expected HapMap3 LD.38 To do this, we used the HapMap3 LD scores (NSNPs=1 293 150), computed for each SNP based on the LD observed in European ancestry individuals from the 1000 Genomes project (accessible online). The quality control criteria for SNPs were the LD regression defaults: imputation quality info>0.90 and MAF>0.01. SNPs with invalid P-values (P>=1 or P<0) were excluded. In addition, variants that are not SNPs (for example, insertion–deletions), strand-ambiguous SNPs and SNPs with duplicated RS numbers were also excluded. After quality control, the number of SNPs for these analyses was reduced to 951 097.

Genetic correlations among the ratios and counts were estimated using LD regression.38, 39 The principle of this technique is that the genetic correlation of two traits can be calculated by the slope from the LD regression on the product of effect sizes (z-score) for two phenotypes. The genetic correlations between published GWAS available online and our summary statistics were estimated with LD-Hub.40 For these regression analyses we selected the list of recommended SNPs from the website and extracted those from the GWAS results for the counts and ratios. Note that this list excludes the chromosome 6 MHC region. A total of 1 210 244 SNPs were used in the analyses.40 Finally, phenotypic Pearson correlations between PLR, NLR and the constituent cell counts were calculated with the SPSS 22 program.
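The slope-based principle can be made concrete with a deliberately simplified sketch. Under the LD regression model, the expected χ² statistic of SNP j is N·h²·ℓ_j/M plus an intercept, and the expected product of z-scores for two traits is √(N₁N₂)·ρ_g·ℓ_j/M plus an intercept, where ℓ_j is the LD score, M the number of SNPs and ρ_g the genetic covariance. The code below uses ordinary least squares only; the published method additionally uses regression weights and a block jackknife for standard errors, which are omitted here. All function names are hypothetical.

```python
import math


def ols(x, y):
    """Ordinary least squares fit of y on x; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def snp_heritability(ld_scores, chi2, n_samples, n_snps):
    """h2 = slope * M / N, from regressing chi-square statistics on LD scores."""
    slope, _ = ols(ld_scores, chi2)
    return slope * n_snps / n_samples


def genetic_correlation(ld_scores, z1, z2, n1, n2, n_snps):
    """r_g = rho_g / sqrt(h2_1 * h2_2), with rho_g from the z1*z2 slope."""
    slope, _ = ols(ld_scores, [a * b for a, b in zip(z1, z2)])
    rho_g = slope * n_snps / math.sqrt(n1 * n2)
    h2_1 = snp_heritability(ld_scores, [a * a for a in z1], n1, n_snps)
    h2_2 = snp_heritability(ld_scores, [b * b for b in z2], n2, n_snps)
    return rho_g / math.sqrt(h2_1 * h2_2)
```

Because the intercept is estimated separately from the slope, sample overlap between two GWASs shifts the intercept rather than the genetic correlation estimate, which is what makes the approach usable on summary statistics from the same cohort.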



Summary statistics for the phenotypes of interest are given in Table 1, and GWAS results for NLR and PLR are summarized in the Manhattan and QQ plots in Figures 1 and 2, respectively. The GWAS inflation factors (λ) were 0.9963005 for NLR and 1.020995 for PLR, indicating that no hidden stratification remained in the GWAS sample. For NLR, no loci reached the genome-wide significance level. For PLR, 20 SNPs located between the HBS1L and MYB genes on chromosome 6q23.3 were significantly associated with the phenotype (Manhattan plot in Figure 2, descriptives in Table 2 and regional plot in Figure 3). The C allele of the top SNP at this locus, rs9376092, significantly increases PLR level (β=5.48, P=2.75 × 10⁻⁹). This SNP was also significantly associated with platelet count (β=6.98, P=4.05 × 10⁻⁸), but not with lymphocyte count at the genome-wide threshold (β=−0.039, P=0.008). In the TwinsUK sample, rs9376092 replicated with a similar effect for PLR (β=4.766, P=0.004) as well as for platelet count (β=6.053, P=0.002). Here again, the SNP was not associated with lymphocyte count (β=0.014, P=0.49) (Table 3).
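The inflation factor λ reported above is conventionally computed as the median of the observed 1-degree-of-freedom χ² association statistics divided by the median of the χ² distribution with 1 degree of freedom (about 0.4549). The sketch below assumes that definition, which the text does not spell out, and takes χ² statistics (for example squared z-scores) as input; the function name is hypothetical.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Lambda = median observed chi-square / expected median under the null."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

A value close to 1, as found for both ratios, means the bulk of the test statistics follows the null distribution.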

Table 1 Summary statistics of neutrophil–lymphocyte ratio (NLR) and platelet–lymphocyte ratio (PLR), the constituent blood cell count phenotypes and age in males and females
Figure 1 (a) Manhattan and (b) QQ plot for the neutrophil–lymphocyte ratio (NLR) GWAS results with SNPs having a MAF>0.01. A full color version of this figure is available at the Journal of Human Genetics journal online.

Figure 2 (a) Manhattan and (b) QQ plot for the platelet–lymphocyte ratio (PLR) GWAS results with SNPs having a MAF>0.01.

Table 2 The significant SNPs associated with PLR in our study and the P-values for the platelet and lymphocyte counts. These SNPs are all located in the known intergenic HBS1L-MYB region
Figure 3 Regional plot for the rs9376092 association with PLR level.

Table 3 The top SNP rs9376092 GWAS results in the NTR data and TwinsUK data

Manhattan and QQ plots for the GWAS of neutrophil, lymphocyte and platelet counts are given in Figures 4–6. For neutrophil count, we found significant associations (P<5 × 10⁻⁸) for 65 SNPs in LD in the PSMD3 locus (Table 4). For lymphocyte count, we did not detect any significant genetic associations. For platelet count, a locus in CCDC71L-PIK3CG on 7q22.3 showed the strongest signal in our study (P=3.45 × 10⁻¹⁰). We also detected genetic variants for platelet count within ARHGEF3, BAK1 and HBS1L-MYB. In Supplementary Table 1, we report the known genetic variants from the literature for the three blood cell counts of interest and their significance levels as reported previously, together with the P-values obtained from our GWAS. For neutrophil count, we replicated the PSMD3 locus, which also showed an indication of association with PLR (P<1.0 × 10⁻³). The AK123889 locus showed a similar pattern for PLR (P=3.67 × 10⁻⁴), and this locus also had a P-value of 0.001 for lymphocyte count. For lymphocyte count, the known locus rs25224079 was marginally significant (P=3.02 × 10⁻⁴), while this locus showed a stronger association with PLR (P=6.56 × 10⁻⁵). We did not detect an association for lymphocyte count with the other known locus, EPS15L1 (P=0.107). For platelet count, our top hit CCDC71L-PIK3CG was a replication of earlier studies and it was also associated with mean platelet volume.11, 26 We also replicated the loci at ARHGEF3, BAK1 and HBS1L-MYB, with the latter being associated with PLR as well. Furthermore, five loci showed some signal (P<1.0 × 10⁻³) for platelet count: PDIA5, MEF2C, JMJD1C, rs7149242 and TAOK1. Other platelet count loci showed some association (P<1.0 × 10⁻³) with related phenotypes: RCL1, JMJD1C, rs7149242 and SNORD7-AP2B1 with PLR, and MICA with lymphocyte count.

Figure 4 (a) Manhattan and (b) QQ plot for the neutrophil count GWAS results with SNPs having a MAF>0.01 (λ=1.011742).

Figure 6 (a) Manhattan and (b) QQ plot for the platelet count GWAS results with SNPs having a MAF>0.01 (λ=1.018586).

Table 4 Significant and known loci associated with neutrophil and platelet blood cell count within the NTR study

eQTL effects for significant SNPs

Whole blood cis and trans eQTL analysis was performed for the top significant SNPs per locus identified in the GWAS for PLR (Table 3) and blood cell counts (Table 4), with and without correcting for blood composition. The eQTL results are shown in Supplementary Table 2. Information on the function of the genes and the pathways involved was retrieved from the GeneCards website (accessible online). Cis effects were found for rs8081692: it increases GSDMB, MSL1 and KRT23 gene expression and decreases GSDMA expression. However, after correction for blood composition, only the effect of rs8081692 on GSDMA gene expression remained. The locus rs169738 was found to increase HLA-DPB1 and decrease TAPBP and HLA-DPA1 expression; these effects remained after correcting for blood composition. For rs9376090, we detected a significant negative association with ALDH8A1 gene expression, but this SNP is not in LD with rs4646871, the top SNP for the ALDH8A1 gene.

Trans effects were found for both rs9376090 and rs9376092, which increase TMEM158 and HBE1 gene expression; although these trans effects were attenuated when correcting for blood composition, they remained significant. In addition, some eQTLs for genes involved in platelet activation, signaling and aggregation pathways were present for the uncorrected expression results but disappeared when correcting for blood composition: GNAS (for rs9376090), AQP9 and CREB5 (for rs8081692). The top SNP rs11925835 nearby the ARHGEF3 gene was found to regulate several sets of genes involved in: (1) platelet activation, signaling and aggregation (ITGB3, PPBP, ITGA2B, PF4, GP1BA, PRKAR2B, C6orf25, SELP, THBS1, GNG11, CLU, SPARC, F13A1, VCL, EHD3, CD9, PDGFA, MGLL, GUCY1A3, TBXA2R, MMRN1); (2) immune system (TREML1, CD9, CD226); and (3) metabolism (PTGS1, VS1G2, ELOVL7, MGLL, ALOX12, MFAP3L and NDUFAF3). In addition to these genes, there were several eQTLs for genes that regulate cell division, proliferation and differentiation, such as ABL1M3, LMSM1, c7orf41, FHL1, MAX, RSU1, TSPAN9 and MTPN. Furthermore, some genes play a key role in hematopoietic stem cell differentiation pathways and lineage-specific markers, such as PEAR1 and CD226. For the majority of these genes, the effect was attenuated after correction for blood composition. Some trans effects were no longer present after the correction, such as the effects for TPM1, EHD3, PDLIM1, MGLL, LMNA, SLA2, ELOVL7, TBXA2R, RSU1, MFAP3L, NEXN, CMTM5, ALOX12, PGRMC1, SEPT5, CDK2AP1, CD226, NDUFAF3, MMRN1, TSPAN9 and MTPN.

SNP heritability and correlations among phenotypes

The SNP heritability of NLR and PLR was estimated at 2.39% (s.e.=0.0816) and 14.12% (s.e.=0.0844), respectively, using LD regression (Table 5). With GCTA, the estimated variance explained by the known loci from the literature was 0.52% (s.e.=0.300) for NLR and 3.28% (s.e.=0.700) for PLR within our study. Finally, the significant SNPs for PLR, all located in the single HBS1L-MYB region found in our study, explained 0.50% (s.e.=0.600) of the variance.

Table 5 LD regression results for NLR, PLR and the blood cell counts

Significant positive phenotypic correlations were observed between nearly all pairs of traits. The exceptions were NLR and platelet count, and three pairs that were significantly negatively correlated: NLR and lymphocyte count, PLR and neutrophil count, and PLR and lymphocyte count (Table 6). A significant genetic correlation was found between PLR and platelet count (r=0.4565, P=0.0309) and a nearly significant one between PLR and lymphocyte count (r=−0.4858, P=0.0701). All other genetic correlations were not significant. Supplementary Table 3 presents the genetic correlations between the ratios, the counts and all GWAS phenotypes available from LD-Hub as of May 2017. There are clear genetic correlations between the consortium platelet count GWAS and our PLR, platelet count and neutrophil count. Furthermore, there is a relation between PLR and HDL cholesterol, and for NLR there is a genetic correlation with Crohn’s disease. For the individual counts, assuming a threshold P-value of 0.05, correlations are present with several diabetes-related traits, kidney disease, BMI, (over-)weight, coronary artery disease, autoimmune disease, smoking and lung function.

Table 6 Phenotypic and genetic correlations for NLR and PLR levels and blood cell counts


We studied the genetic architecture of NLR and PLR as well as the genetic relationship between NLR, PLR and the corresponding immune cell counts. The intergenic HBS1L-MYB region is a well-known locus for hematological parameters such as red blood cell count,56 platelet count, hemoglobin level,57 MCHC level58 and blood-related diseases such as myeloproliferative neoplasms,9, 59 beta-thalassemia60 and sickle cell anemia.31 We found this intergenic HBS1L-MYB region to be significantly associated with PLR. HBS1L-MYB intergenic variants reduce transcription factor binding and affect long-range interactions with MYB and MYB expression levels.61 This region was first identified as a quantitative trait locus controlling fetal hemoglobin level and is associated with iron deficiency anemia, beta-thalassemia and sickle cell disease.62, 63 The MYB gene encodes a protein with three HTH DNA-binding domains that functions as a transcription regulator. This protein plays an essential role in the regulation of hematopoiesis and lymphocyte differentiation. The gene can be aberrantly expressed, rearranged or translocated in leukemias and lymphomas, and is thus considered to be a (proto-)oncogene.64, 65, 66 The HBS1L (Hsp70 subfamily B suppressor 1-like) gene encodes a member of the GTP-binding elongation factor family. A single-nucleotide polymorphism in exon 1 of the HBS1L gene is significantly associated with severity in beta-thalassemia/hemoglobin E, as found in a sequencing study67 and verified in several other studies.68, 69 Recently, this gene has been associated with several traits, including erythrocyte and platelet count11, 31, 70 and cholesterol level.71 A pleiotropic association study on a large number of hematological traits found that rs9373124, also in the HBS1L-MYB region, was significantly associated with all of the evaluated hematological traits (P<0.005), including white blood cell count, red blood cell count and platelet count.21

The eQTL results show that some of the GWAS top SNPs for PLR and blood cell counts regulate the expression of genes that are mainly involved in immune system pathways: platelet activation, signaling and aggregation; metabolism; cell division, proliferation and differentiation; and hematopoietic stem cell differentiation pathways and lineage-specific markers. These results provide new genetic targets for immune biomarkers and may inform future functional studies. Our GWAS did not identify SNPs significantly associated with NLR, which is consistent with the small SNP heritability found with the LD regression analyses. Compared with PLR, NLR shows more phenotypic plasticity, because neutrophils are part of the immune response to viral infections, autoimmune diseases, acute-phase reactions and several drugs.72 Furthermore, compared with the longer lifespan of platelets (8–9 days), the lifespan of neutrophils is shorter (a few hours to at most 5 days).73, 74 The phenotype is therefore much more dependent on environmental effects, for example, the time of measurement and the health state of the individual, as also indicated by our own heritability findings.10

By selecting only healthy individuals, we may have precluded the identification of genetic loci that are related both to disorders and to the counts or ratios. Because of this selection, it is also unclear whether the identified SNPs affect the risk of the disorders for which PLR and NLR are proposed as biomarkers. Another question is whether the effect of the SNPs on the ratios and immune response remains the same once a person becomes affected. We have examined the heritability of the ratios in our full sample of individuals, and the point estimates were not different. Although there is currently no direct association between the top SNP or the HBS1L locus and myocardial infarction in the GWAS catalog, there are clear genetic correlations between cardiovascular traits, diabetic traits, HDL cholesterol, weight, smoking, lung function and the examined counts and ratios. The link between HBS1L and cancer was already established due to translocations in the genomic region.64, 65, 66

For both NLR and PLR, a large part of the heritability is still not explained by common SNPs or genetic variants in LD. This may suggest that other genetic variants, such as rare variants and copy number variants, need to be studied. Furthermore, the missing heritability might be high because of non-additive effects and genetic interactions, which are not taken into account by the currently applied statistical models. Epistatic effects of genetic variants on hematological indices have already been reported.25, 75 We thus assume that, especially for immune system phenotypes, gene–gene and gene–environment interactions need to be studied further.

The LD regression results show that polygenic effects, rather than confounding factors, explain NLR and PLR variance in our study. We also demonstrated a significant genetic correlation between PLR and platelet count, but none of the other correlations between ratios and cell counts were large enough to be significant. As we found no SNP effects on NLR, it is not surprising that no genetic overlap between NLR and PLR was detected, although the genetic background of the lymphocyte count is expected to affect both ratios.

In summary, our study found the HBS1L-MYB locus to be associated with PLR level and with platelet count. In addition, we verified three additional known loci for platelet count (rs342213 in CCDC71L-PIK3CG, rs169738 nearby BAK1 and rs11925835 nearby ARHGEF3) and one locus for neutrophil count (rs8081692 nearby PSMD3). We did not identify any locus or any significant SNP heritability for NLR. Although NLR and PLR are both utilized as predictive or prognostic biomarkers for the same diseases, and phenotypic correlations are present, there seems to be no major genetic overlap between the two biomarkers in our healthy population. The NLR and PLR responses associated with these disorders thus likely represent the simultaneous influence of multiple, separate immune genetic pathways.

Figure 5 (a) Manhattan and (b) QQ plot for the lymphocyte count GWAS results with SNPs having a MAF>0.01 (λ=1.022341).