The human reference genome underpins almost all high-throughput resequencing-based biomedical research. In December 2013, the Genome Reference Consortium (GRC) released human reference genome version GRCh38, which, according to the GRC, is the most comprehensive representation of the human genome [1]. Compared to the previous version (GRCh37), GRCh38 is improved, among other features, by the bridging of gaps associated with segmental duplications and by the addition of “missing” sequences, with an emphasis on paralogous sequences. In the next-generation sequencing era, it is well understood that duplicated and paralogous sequences can lead to erroneous alignments when short-read sequencing methods are used, especially when the sequence similarity between the duplicated regions is high. These alignment errors can in turn lead to the misidentification of paralogous sequence variants as allelic sequence variants, and when paralogous sequences encompass clinically relevant genes, this may have a critical impact on diagnosis and variant classification.

This seems to be the case for KCNE1, a gene that encodes the regulatory subunit of a voltage-gated potassium channel and is associated with both the Jervell and Lange-Nielsen (MIM#612347) and Romano-Ward (LQT5, MIM#613695) forms of long QT syndrome (LQTS). In the previous genome assembly (GRCh37), it was represented as a unique sequence located in the 21q22.11-22.12 region. In GRCh38, in addition to the previously known sequence, a paralogous sequence, KCNE1B, which is highly similar to KCNE1 (coding sequence similarity 98.45%), is located in the 21p11.2 region (Fig. 1a). Within the coding region, KCNE1B differs from KCNE1 by an alternative start codon, which results in three extra amino acids at the beginning of the protein, and by two nucleotide substitutions corresponding to the SNPs rs1805127 (NM_000219.5:c.112A>G, p.(Ser38Gly)) and rs1805128 (NM_000219.5:c.253G>A, p.(Asp85Asn)) observed in the KCNE1 gene (Fig. 1b). This degree of sequence similarity between the two genes renders the population frequencies reported for these substitutions (rs1805127: 0.6437 and rs1805128: 0.009302 in the Genome Aggregation Database), along with all other SNP frequencies previously reported for the KCNE1 gene, disputable, as pointed out by Schneider et al. [1] in their evaluation study of GRCh38.

Fig. 1

(a) Location of KCNE1 and its paralogue KCNE1B on GRCh38 chromosome 21. (b) Browser view of SNP rs1805128, with its presence in the paralogous KCNE1B sequence marked
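As an illustration of how a similarity figure between two paralogous coding sequences can be obtained, the following minimal sketch computes percent identity over a pairwise alignment. It is not the method used to derive the 98.45% value quoted above, which the text does not specify. The function name `percent_identity`, the choice to skip gapped columns, and the toy sequences are all assumptions made for the example; the sequences are not the real KCNE1 or KCNE1B sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gapped columns ('-') are ignored; other conventions count
    gaps as mismatches and would give a lower value.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if base_a == base_b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: the second sequence has three extra leading
# bases (mimicking an alternative start codon) and one substitution.
toy_a = "---ATGATCCTG"
toy_b = "ATGATGGTCCTG"
print(f"{percent_identity(toy_a, toy_b):.2f}% identity")
```

Whether gaps are skipped or penalized changes the result, so any reported similarity should be read together with the convention used to compute it.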

In GRCh38, KCNE1B is part of a 122,046 bp segment (CU633980.13) derived from the sequence analysis of a BAC clone (CH507-396I9) that was placed in the 21p11.2 region but is flanked by unknown sequences. This entire contig (with the exception of 1844 bp) displays > 99.9% sequence identity with the corresponding region of GRCh37, where the KCNE1 gene is located. Such a degree of sequence similarity over an extended region could be attributed either to a very recent duplication event or to a historical duplication maintained under functional and structural constraints, but it also raises suspicions about the correct placement of the contig. To address this problem, we designed two primer pairs targeting the coding regions of both KCNE1 and KCNE1B and performed a qPCR assay (suppl. file) in three individuals, in which the targets were quantified relative to the diploid albumin gene, as described previously [2]. In all three samples, KCNE1 was found to be in a diploid state, with a ratio of approximately 1 when compared to albumin, suggesting that KCNE1B may in fact represent a misplaced clone (Fig. 2). However, the copy number differences observed for several duplicated genes among genomes of different ethnicity, or even among individuals of the same ethnic origin [2], together with the limited number of samples we tested, do not permit us to rule out the existence of KCNE1B.

Fig. 2

qPCR-based copy number genotyping. The amplification ratio obtained with two sets of primers targeting both KCNE1 and KCNE1B, relative to the diploid albumin gene, is plotted for the three individuals tested. Error bars show one standard error
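The logic behind the ratio plotted in Fig. 2 can be sketched as follows. If the primers amplify both KCNE1 and KCNE1B and both loci are present, the target would be expected at roughly four copies per diploid genome, giving a ratio near 2 against albumin; a ratio near 1 points to two copies in total. The sketch below is a hypothetical illustration, not the analysis from the supplementary file: it assumes a ΔCt-style calculation with a perfect amplification efficiency of 2 for both target and reference, and the function names and Ct values are invented for the example.

```python
from math import sqrt
from statistics import mean, stdev


def relative_ratio(ct_target: float, ct_reference: float,
                   efficiency: float = 2.0) -> float:
    """Target/reference amount from Ct values.

    Assumption: target and reference amplify with the same efficiency
    (2.0 means perfect doubling per cycle).
    """
    return efficiency ** (ct_reference - ct_target)


def summarize(ct_pairs, reference_copies: int = 2):
    """Return (mean ratio, standard error, estimated target copy number).

    ct_pairs holds (ct_target, ct_reference) tuples from replicate wells.
    The reference (here, albumin) is assumed to be diploid.
    """
    ratios = [relative_ratio(target, ref) for target, ref in ct_pairs]
    avg = mean(ratios)
    sem = stdev(ratios) / sqrt(len(ratios)) if len(ratios) > 1 else 0.0
    return avg, sem, reference_copies * avg


# Hypothetical replicate Ct values for one individual.
replicates = [(25.1, 25.0), (24.9, 25.0), (25.0, 25.1)]
avg, sem, copies = summarize(replicates)
print(f"ratio = {avg:.2f} ± {sem:.2f} (SE), estimated copies = {copies:.1f}")
```

In practice, primer efficiencies are measured from standard curves rather than assumed, and unequal efficiencies between the target and albumin assays would bias the ratio.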

In this light, the presence and the functional importance of the two paralogues remain to be elucidated in further studies, especially since both genes are reported to be expressed mainly in heart and kidney tissues [3]. Furthermore, variant classification in LQT5 patients needs to be revisited, an effort recently undertaken by Giudicessi et al. [4] and Lane et al. [5]. In their work, they describe a mild LQTS phenotype, which they denote LQT5-lite, and they propose that common variants, such as the p.(Asp85Asn) variant present in the paralogous KCNE1B sequence, confer a proarrhythmic state. In this state, patients carrying the p.(Asp85Asn) variant may experience QT interval prolongation when affected by a second ‘hit’, either endogenous (such as the genetic background or the effect of female hormones) or exogenous (QT-prolonging medications or electrolyte abnormalities). Moreover, although Lane et al. [5] recognize the discordance between the genotype prevalence of KCNE1 variants and the estimated prevalence of the LQT5 subtype, they attribute it, possibly, to an oligogenic model of disease without considering the potential existence of the paralogous gene KCNE1B.

In conclusion, the ambiguity over the presence of two potentially active genes, KCNE1 and KCNE1B, needs to be resolved. Overall, the inheritance pattern of LQT5 seems to be more complicated than initially thought, supporting the notion that the genetic basis of LQTS, depending on the observed variants, can shift from the traditionally assumed monogenic form to a more oligogenic one [6].