Main

Patients with classic Marfan syndrome (MFS) tend to be tall, with long limbs and fingers, as described by Marfan1 in 1896. The syndrome’s description has since evolved considerably, however, and a variety of clinical manifestations are now well established. The genetic background, that is, mutations in the fibrillin-1 gene (FBN1),2 was discovered much later but is now a cornerstone in the diagnosis of MFS.

The definition of MFS is based on several sets of criteria, the latest being the Ghent II nosology.3 MFS is considered an autosomal-dominant disorder associated with a mutation in FBN1 and phenotypical manifestations of MFS. The Ghent II criteria highlight the importance of an FBN1 mutation test; for this reason, the focus on genetic testing has increased considerably when either diagnosing or excluding MFS.4

With the introduction of next-generation sequencing, the cost and turnaround time of sequencing have been reduced considerably, and the resulting volume of genetic data that needs to be evaluated has grown enormously. Moreover, access to genetic testing in the clinical setting is becoming more widespread, and the number of patients who are genetically tested is rapidly increasing.

The vast majority of genetic variants found are benign and represent part of normal genetic variation, but some (perhaps just one in a given patient) may be pathogenic and the cause of a given disease. Although sequencing has become easier and more accessible, the evaluation of sequencing data has not kept pace with the technical development of next-generation sequencing. This is a considerable problem, given that a patient must receive the correct genetic diagnosis.

The majority of variants are single-nucleotide variants. In practice, the genotype–phenotype correlation is essential to determining the pathogenicity of an FBN1 variant in MFS.

A range of tools for evaluating variants has been developed, but the majority of these tools are not exact and can be used only for guidance when evaluating genetic variants. In addition, a number of databases exist with data collected for published and nonpublished variants. Entries in these databases are often incorrect and include many misinterpretations of published data.5,6

Yang et al.6 recently presented an evaluation of common variants in the National Heart, Lung, and Blood Institute GO Exome Sequencing Project (ESP) classified as “disease causing” in the widely used Human Gene Mutation Database (HGMD).7 Yang et al. expected to find a maximum of two patients with disease-causing FBN1 variants in the ESP database but found 100 individuals with 23 different variants, indicating a misinterpretation of variants in the HGMD database. The aim of this study was to evaluate the quality of variant databases as they relate to these 23 likely benign variants.

Materials and Methods

Yang et al.6 identified 23 FBN1 variants (Supplementary Table S1 online) in ESP that were classified as disease-causing in HGMD. We searched the HGMD professional database (Supplementary Table S1 online), the UMD-FBN1 database8 (Supplementary Table S2 online), the ClinVar database9 (Supplementary Table S3 online), and the UniProt database10 (Supplementary Table S4 online) for the 23 variants. In each database we identified reference material in as much detail as possible (Supplementary Tables S1–S4 online). Published peer-reviewed articles were all identified via PubMed searches, and all material was accessible. The UMD-FBN1 database also contained data classified as “personal communication,” which is not published in the literature and therefore accessible only via the UMD-FBN1 database homepage (www.umd.be/FBN1/).

Each variant was evaluated according to published information (Supplementary Table S5 online). The accessible data were evaluated according to the Ghent II nosology, and each variant was classified as “not MFS,” “maybe MFS,” or “inconclusive” (Table 1). “Not MFS” indicates that the variant probably does not cause MFS because the majority of reported phenotypes do not fulfill the Ghent II nosology. “Maybe MFS” indicates that the variant could cause MFS, but there is not full documentation for a genotype–phenotype association. “Inconclusive” indicates that the data were insufficient to evaluate the variant’s effect on the phenotype. The term “MFS” was also intended to describe variants that cause an evident MFS phenotype, but none of the variants had a clear genotype–phenotype association with an MFS phenotype fulfilling the Ghent II nosology. The manual evaluation of the variants was then compared with the database conclusions (Table 2).

Table 1 Summary of articles reviewed and the diagnostic conclusion as assessment for the documented phenotype
Table 2 Summary of conclusions in databases and conclusions of the manual evaluation of background material in this study

Results

Only the HGMD database contained all variants. The UMD-FBN1 database contained all but one variant, ClinVar contained eight variants, and UniProt contained five variants (Table 2). The majority of references overlapped among the databases. None of the databases covered all references, and each of the four databases had unique references that were not recorded in the other three.

As expected, the HGMD database classified all 23 variants as “disease-causing mutations” and associated all the variants with MFS. The UMD-FBN1 database had records for 22 variants, all classified as “mutation,” but one variant was classified as a “polymorphism” in a single subrecord. This specific variant was still classified as a mutation in the overall database, and only by analyzing the specific input record of the patient with X-linked Lujan-Fryns syndrome was it clear that the record had been classified as a polymorphism. A polymorphism is historically defined as a variant with a frequency above 1% in the background population, but the term is seldom used in the modern literature. In some records in the UMD-FBN1 database, the variant was associated with a variety of syndromes and characteristic phenotypes. In many cases the same variant was associated with more than one syndrome or phenotype. Even syndromes not associated with FBN1, such as Lujan-Fryns syndrome, were found in the UMD-FBN1 database.

ClinVar had records of eight variants, of which four were classified under the term “clinical significance” as “pathogenic/likely pathogenic”; two were classified as being of “uncertain significance” and two were classified as “conflicting data from submitters.” The six variants classified as either “pathogenic/likely pathogenic” or “conflicting data from submitters” were connected with MFS, whereas the “uncertain significance” variants were labeled as “all highly penetrant.”

The UniProt database had records of five variants, all of which were associated with MFS.

Manual evaluation of the database references did not find evidence of any of the 23 variants being associated with MFS. Of the 23 variants, only 3 variants were classified as “maybe MFS,” indicating that the variant could result in the MFS phenotype but none of the identified references provided full documentation of a genotype–phenotype association. We classified 14 variants as “inconclusive,” indicating that the accessible literature on the variant precluded any definite conclusions concerning genotype–phenotype relations. Six variants were classified as “not MFS” because they most likely do not cause MFS based on the fact that the majority of the reported patients with this variant do not fulfill the Ghent II criteria.

Discussion

The evaluation of 23 variants shows that the databases do contain misleading information on variants and their genotype–phenotype associations. The databases therefore cannot be used as a direct source for diagnostics, but only as a tool for seeking additional information about a specific variant. An MFS diagnosis is based on a rather complex set of diagnostic criteria, which have changed over time. Interpretation of FBN1 variants for defining MFS is rather difficult, and the analyst must have expert knowledge about MFS phenotypes and the diagnostic criteria. Descriptions such as “classic MFS” or “fulfilling the MFS criteria” are not precisely defined and not useful for classifying a given phenotype. MFS can be considered a diagnosis about which knowledge is constantly evolving, and the diagnostic criteria therefore have to change over time.3,11,12 Describing specific phenotypical characteristics in the databases, such as aortic dilatation, ectopia lentis, or scoliosis, is necessary because these characteristics do not change with new diagnostic criteria; recording them would make it possible to reevaluate the MFS diagnosis if the nosology changes.

MFS is a rare disease, but the exact incidence is not fully known. Different estimates have been reported, ranging from 4.6 per 100,000 (ref. 14) to 17.2 per 100,000 (ref. 13), but 10 in 100,000 is widely quoted.15 Precisely calculating the expected number of disease-causing FBN1 mutations in the ESP cohort is difficult. ESP contains 6,503 samples from a cohort with heart, lung, and blood disorders. One small subcohort of 29 subjects, annotated as “thoracic aortic aneurysms leading to acute aortic dissections,” is noteworthy because aortic dissection is highly associated with MFS. The ESP phenotype data are not publicly available, and verifying in which cohort each variant was detected is not possible. In the general ESP cohort one would expect 0.3 to 1.1 patients with MFS, but this could be supplemented with up to 29 extra patients because of the aortic dissection cohort. An expected maximum of around 30 patients compared with the actual 100 patients indicates an overrepresentation of benign variants among the 23 variants found in ESP and HGMD.
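The arithmetic behind this estimate can be reproduced with a short sketch. It uses only the figures quoted in the text (6,503 samples, estimates of 4.6 and 17.2 per 100,000, a subcohort of 29, and 100 observed individuals). The function and variable names are illustrative, and counting every member of the aortic dissection subcohort as a potential MFS patient is the same deliberately generous assumption made in the text.

```python
# Figures are taken from the text; names are illustrative, not from any database API.
ESP_SAMPLES = 6503        # total samples in the ESP cohort
AORTIC_SUBCOHORT = 29     # subjects with thoracic aortic aneurysm/dissection
OBSERVED_CARRIERS = 100   # individuals carrying one of the 23 variants


def expected_cases(cohort_size: int, rate_per_100k: float) -> float:
    """Expected number of affected individuals for a given rate per 100,000."""
    return cohort_size * rate_per_100k / 100_000


low = expected_cases(ESP_SAMPLES, 4.6)     # lowest reported estimate
high = expected_cases(ESP_SAMPLES, 17.2)   # highest reported estimate

# Generous upper bound: assume the whole aortic subcohort could have MFS.
upper_bound = high + AORTIC_SUBCOHORT

print(f"Expected in general cohort: {low:.1f} to {high:.1f}")
print(f"Generous upper bound: {upper_bound:.1f}")
print(f"Observed / upper bound: {OBSERVED_CARRIERS / upper_bound:.1f}")
```

Even under this generous upper bound, the observed count is roughly three times larger than expected.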

The introduction of next-generation sequencing into clinical diagnostics obviously will provide an increased amount of genotype versus phenotype information, but much of this information will never be reported in the databases because the main source of data in the databases is published peer-reviewed articles. Currently, publication of new or reconfirmed variants is not prioritized by most scientific journals. For this reason, publication bias is inevitable. Because the diagnostic strength of each variant is correlated with the amount of data collected about genotype versus phenotype, the future of MFS diagnostics is dependent on data collection from genetic testing and input directly from laboratories, not via published, peer-reviewed articles. Variant databases need to accept data on an individual level, but providing phenotype information as well is crucial.

The UMD-FBN1 database does allow “personal communication” entries on variants, but such entries are not published in the literature and in many cases do not contain specific information about the phenotype. The ClinVar database does provide data from a few laboratories, but these also do not provide phenotype data.

In a broad perspective, the FBN1 databases do not seem to be ready to incorporate the benefit of the high output of data from next-generation sequencing. The many daily analyses are not incorporated in the databases, and the databases are not ready to “learn” from new data input. Only the ClinVar database provides information on “conflicting data from submitters.” The UMD-FBN1 database registers all variants as mutations even though the variants may be associated with a variety of (up to seven different) phenotypes. It is necessary for variant databases to be able both to handle many data inputs with different and even conflicting data and to present these data in a way that provides the user with a clear understanding of the currently accessible information on the specific variant.

The evaluation of these 23 highly selected and likely benign variants in the FBN1 gene shows that not all data in the databases are correctly classified. It is worrying that the UMD-FBN1 (ref. 16), HGMD,17 and ClinVar18 databases cite references whose only connection to MFS is the mention of a variant in the FBN1 gene. Even a reference based on a publication about incidental findings of FBN1 variants is, for some reason, associated with MFS in the HGMD database.17 UniProt contains only a minority of references compared with HGMD and UMD-FBN1, and this might be the reason why this database does not have references to any of these articles. Among the five UniProt variants labeled “MFS,” our evaluation classified three as “not MFS,” one as “inconclusive,” and one as “maybe MFS.”

Yang et al.6 concluded that the genotype prevalence of MFS was 1:65 but questioned the causality of some of these variants and suggested “that these variants may not be the monogenic cause of MFS.” We think that their study shows that some researchers tend to use the databases rather uncritically.
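For orientation, the 1:65 figure follows directly from the counts reported in this article, assuming that each of the 100 individuals is counted once among the 6,503 ESP samples. The sketch below recomputes it and compares it with the widely quoted estimate of 10 per 100,000; the variable names are illustrative only.

```python
# Counts are taken from the text; names are illustrative.
ESP_SAMPLES = 6503
OBSERVED_CARRIERS = 100
QUOTED_RATE_PER_100K = 10   # widely quoted MFS estimate

# "1 in N" form of the observed carrier frequency.
one_in_n = ESP_SAMPLES / OBSERVED_CARRIERS

# Same frequency expressed per 100,000, for comparison with the quoted rate.
observed_per_100k = OBSERVED_CARRIERS / ESP_SAMPLES * 100_000
fold_excess = observed_per_100k / QUOTED_RATE_PER_100K

print(f"Observed carrier frequency: about 1 in {one_in_n:.0f}")
print(f"Excess over quoted estimate: about {fold_excess:.0f}-fold")
```

A carrier frequency more than a hundred times the quoted disease estimate is difficult to reconcile with all 23 variants being disease-causing.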

Conclusion

The genetic diagnosis of MFS cannot be made reliably using only variant databases; it must be made through time-consuming evaluation of the background material in the databases and by combining these data with expert knowledge on MFS. Because the databases do not provide a reliable interpretation of variants, there is a substantial possibility of misdiagnosing MFS.

Disclosure

The authors declare no conflict of interest.