Rare diseases affect an estimated 3.5-5.9% of the global population. Although this proportion may seem small, it corresponds to 263-446 million people worldwide [1]. The total number of patients is therefore far from negligible, and further research is needed to enable quick and accurate diagnosis.
A rare disease is defined as a condition affecting fewer than 200,000 people in the United States [2] and fewer than 1 in 2,000 people in Europe [3]. In Korea, a rare disease is defined as a condition affecting fewer than 20,000 people. In 2020, a total of 52,069 rare disease cases were reported in Korea.
Patients with rare diseases wait an average of 6-8 years to receive an accurate diagnosis [4]. According to statistics in Korea, almost 80% of patients visited two or more hospitals before receiving a rare disease diagnosis, which illustrates how difficult it is to obtain a quick and accurate diagnosis [5]. Patients in whom a causative mutation is identified can receive economic support such as the Exempted Calculation of Health Insurance; however, a significant number of patients remain undiagnosed even after many years.
Genetic causes are known to account for 80% of patients with rare diseases [6,7]. Currently, targeted sequencing and whole-exome sequencing (WES) are the main technologies used to detect mutations that cause rare diseases. However, WES covers only protein-coding regions, which comprise less than 2% of the genome, so it is clearly limited in detecting mutations in intronic and intergenic regions, large structural variants, and repeat expansions. As a result, the diagnostic yield of WES is only about 25-41% [8]. To overcome this limitation, additional omics technologies such as whole-genome sequencing (WGS), RNA sequencing (RNA-seq), bisulfite sequencing, and assay for transposase-accessible chromatin using sequencing (ATAC-seq) have been adopted (Fig. 1). As continuous advances in next-generation sequencing (NGS) have reduced the cost and time of these omics techniques [9], they can now be applied more readily to the diagnosis and study of rare diseases. In this review, we introduce several multi-omics techniques and recent studies that have applied them to rare diseases.
Whole-exome sequencing (WES) sequences protein-coding regions, which account for roughly 2% of the human genome, and has been widely used for the diagnosis of rare diseases. However, WES is limited in detecting mutations in non-coding regions, such as intronic and intergenic variants and deep intronic splicing variants, as well as complex structural variants. To overcome these limitations, recent efforts have introduced WGS to the diagnosis of rare diseases.
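To make the coverage limitation concrete, the following minimal Python sketch checks whether a variant position falls inside a set of exon capture intervals. The interval coordinates, variant names, and the helper `in_wes_target` are hypothetical; a real pipeline would read the capture targets from the BED file supplied with the exome kit and handle every chromosome.

```python
from bisect import bisect_right

# Hypothetical exon capture intervals on one chromosome:
# 0-based, half-open (start, end), sorted and non-overlapping.
EXON_TARGETS = [(1000, 1200), (5000, 5150), (9000, 9300)]


def in_wes_target(pos, targets=EXON_TARGETS):
    """Return True if a 0-based position lies inside a captured exon interval."""
    starts = [start for start, _ in targets]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < targets[i][1]


# Hypothetical variant positions illustrating coding vs non-coding sites.
variants = {"coding_snv": 1100, "deep_intronic": 3000, "intergenic": 20000}
for name, pos in variants.items():
    status = "inside WES targets" if in_wes_target(pos) else "missed by WES"
    print(f"{name}: {status}")
```

Only the coding variant falls inside a captured interval; the other two would be visible only to an approach such as WGS that sequences the whole genome.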
A recent study by Burdick et al. [10] reported that 15 of 54 (28%) diagnoses among Undiagnosed Diseases Network participants could not be made by WES and required WGS or other omics techniques, because WES failed to identify pathogenic non-coding variants, copy number variations, and repeat expansions. The UK100K project also identified novel pathogenic non-coding variants disrupting the transcription of disease-associated genes such as
RNA-seq is a technology for analyzing gene expression patterns using NGS [13]. Compared with conventional microarray-based methods, it quantifies gene expression more precisely and at base-pair resolution [14]. RNA-seq can also detect alternative splicing patterns and gene fusions, which are difficult to identify with WES and WGS. Although gene expression is tissue-specific, so some disease-relevant genes may not be expressed in blood, recent efforts have used RNA-seq data from blood samples to diagnose and analyze rare diseases.
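As an illustration of how expression levels are derived from RNA-seq read counts, the sketch below computes transcripts per million (TPM), one widely used normalization. It assumes reads have already been aligned and assigned to genes; the gene names, counts, and lengths are made up, and the `tpm` helper is hypothetical rather than part of any specific tool.

```python
def tpm(counts, lengths_bp):
    """Convert raw per-gene read counts to transcripts per million (TPM)."""
    # Step 1: reads per kilobase (RPK) corrects for gene/transcript length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Step 2: rescale so that the RPK values of the sample sum to one million.
    scale = sum(rpk.values()) / 1e6
    return {gene: value / scale for gene, value in rpk.items()}


# Hypothetical counts and lengths for three genes in one sample.
counts = {"GENE_A": 500, "GENE_B": 1000, "GENE_C": 250}
lengths = {"GENE_A": 2000, "GENE_B": 4000, "GENE_C": 500}
print(tpm(counts, lengths))
```

Because the length correction is applied before the library-size scaling, TPM values always sum to one million per sample, which makes relative expression comparable across samples.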
Frésard et al. [15] analyzed RNA-seq data from 94 individuals with undiagnosed rare diseases and compared them with publicly available RNA-seq data from healthy individuals and tissues to identify genes with outlier expression that are potentially implicated in rare diseases. They found that 1) under-expression outliers were enriched in genes intolerant to loss-of-function mutations, 2) patients carried more splicing outliers than controls, and 3) a large number of rare variants showed allele-specific expression (ASE) biased toward the deleterious allele.
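A simplified version of the outlier idea can be expressed as a z-score of a patient's normalized expression against a reference cohort of healthy samples. This is an illustrative sketch, not the pipeline used by Frésard et al.: the absolute z-score cutoff of 3, the reference values, and the function names are assumptions, and real analyses typically correct for technical and hidden confounders before scoring outliers.

```python
from statistics import mean, stdev


def expression_zscore(patient_value, reference_values):
    """Z-score of one patient's normalized expression versus a reference cohort."""
    mu = mean(reference_values)
    sd = stdev(reference_values)  # sample standard deviation (n - 1)
    return (patient_value - mu) / sd


def call_outlier(z, threshold=3.0):
    """Label a z-score as an under-expression outlier, over-expression outlier, or neither."""
    if z <= -threshold:
        return "under-expression outlier"
    if z >= threshold:
        return "over-expression outlier"
    return "not an outlier"


# Hypothetical log-scale expression of one gene in healthy reference samples.
reference = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9]
z = expression_zscore(3.0, reference)
print(f"z = {z:.1f}: {call_outlier(z)}")
```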
Ferraro et al. [16] integrated more than 800 genomes with matched RNA-seq data from multiple tissues to characterize transcriptomic abnormalities, including aberrant gene expression, ASE, and alternative splicing, and developed a statistical model that predicts the functional impact of rare variants. They reported that individuals with outlier gene expression, ASE, or splicing patterns were more likely to carry a rare, potentially pathogenic variant near the corresponding gene.
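ASE is commonly assessed by comparing the RNA-seq reads supporting each allele at a heterozygous site against the 50/50 ratio expected without imbalance. The sketch below implements an exact two-sided binomial test using only the Python standard library; the read counts are hypothetical, and production ASE analyses usually also account for reference mapping bias and overdispersion (for example, with beta-binomial models), which this sketch ignores.

```python
from math import comb


def ase_binomial_pvalue(ref_reads, alt_reads):
    """Exact two-sided binomial test of allelic balance (null: 50/50)."""
    n = ref_reads + alt_reads
    k = min(ref_reads, alt_reads)
    # Under the null, each read comes from either allele with probability 0.5,
    # so P(X <= k) = sum_{i=0..k} C(n, i) / 2**n. The null distribution is
    # symmetric, so the two-sided p-value is twice the smaller tail, capped at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * lower_tail)


# Hypothetical read counts at a heterozygous SNP.
for ref, alt in [(20, 20), (5, 35)]:
    p = ase_binomial_pvalue(ref, alt)
    print(f"ref={ref} alt={alt} alt_fraction={alt / (ref + alt):.2f} p={p:.2e}")
```

A balanced site gives p = 1, whereas the strongly skewed site gives a very small p-value, flagging it as a candidate for allelic imbalance.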
Furthermore, a recent study by Oliver et al. [17] analyzed 47 individuals with undiagnosed rare genetic diseases using RNA-seq and reported 11 potentially pathogenic fusion transcripts such as
In addition to genetic mutations, epigenomic changes can also cause rare diseases. In particular, because mutations in DNA methyltransferase genes have been reported in various rare diseases such as Heyn–Sproul–Jackson syndrome and immunodeficiency-centromeric instability-facial anomalies syndrome 1 (ICF1), it is important to determine how these mutations affect genome-wide methylation patterns. Various techniques have been developed to profile genome-wide DNA methylation, and most of them rely on bisulfite treatment, which deaminates unmethylated cytosines to uracil while leaving methylated cytosines unconverted [18]. After bisulfite conversion and PCR amplification, uracils are read as thymines, so NGS can distinguish unmethylated cytosines (read as T) from methylated ones (read as C).
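The bisulfite logic above can be illustrated with a toy simulation: unmethylated C is reported as T, methylated C remains C, and the methylation level at a cytosine is the fraction of reads that still show C. The sketch assumes ungapped reads already aligned to the reference top strand at identical coordinates, and it ignores the reverse strand, sequencing errors, and C/T SNPs; the sequence and the helper names are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment plus PCR on one strand of DNA.

    Unmethylated C is deaminated to U, which is read as T after PCR;
    C at a 0-based index in methylated_positions is protected and stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def methylation_level(reads, ref_pos):
    """Fraction of reads showing C (methylated) among reads showing C or T at ref_pos."""
    n_c = sum(1 for read in reads if read[ref_pos] == "C")
    n_t = sum(1 for read in reads if read[ref_pos] == "T")
    return n_c / (n_c + n_t) if n_c + n_t else float("nan")


ref = "ACGTCGAC"  # hypothetical; CpG cytosines at 0-based positions 1 and 4
reads = [
    bisulfite_convert(ref, {1}),     # CpG at 1 methylated, CpG at 4 unmethylated
    bisulfite_convert(ref, {1, 4}),  # both CpGs methylated
    bisulfite_convert(ref, set()),   # neither CpG methylated
]
for pos in (1, 4):
    print(pos, round(methylation_level(reads, pos), 2))
```

The non-CpG cytosine at position 7 is converted in every read, which mirrors why bisulfite sequencing reads appear depleted of C outside methylated sites.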
Sun et al. [19] interrogated genome-wide DNA methylation by whole-genome bisulfite sequencing of hereditary sensory and autonomic neuropathy type 1 with dementia and hearing loss (HSAN1E) patients with
Gatto et al. [20] interrogated the effects of
Chromatin accessibility is highly dynamic and is a key epigenomic feature defining cellular identity, because gene expression is regulated by the physical accessibility of regulatory elements such as enhancers, promoters, and insulators [21]. Genome-wide profiles of chromatin accessibility can be generated with various techniques, including DNase I hypersensitive site sequencing [22], formaldehyde-assisted isolation of regulatory elements followed by sequencing [23], and ATAC-seq. Among them, ATAC-seq is the most recently developed and is the fastest and most sensitive of the available assays [24].
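ATAC-seq relies on a hyperactive Tn5 transposase that inserts sequencing adapters preferentially into accessible chromatin, so the 5' ends of reads mark insertion sites. The sketch below applies the +4/-5 bp shift commonly used to convert read ends into Tn5 insertion sites and then counts insertions per peak, scaled to counts per million. The `Read` tuple, the coordinate convention (0-based, end-exclusive), the peak names, and the scaling choice are assumptions for illustration; exact offsets and normalization differ between tools.

```python
from collections import namedtuple

# Hypothetical minimal aligned read: 0-based start, end-exclusive end, strand.
Read = namedtuple("Read", ["start", "end", "strand"])


def tn5_insertion_site(read):
    """Estimate the Tn5 insertion position from the read's 5' end.

    Plus-strand reads have their 5' end at start and are shifted +4 bp;
    minus-strand reads have their 5' end at end - 1 and are shifted -5 bp.
    """
    if read.strand == "+":
        return read.start + 4
    return (read.end - 1) - 5


def accessibility_cpm(reads, peaks):
    """Count Tn5 insertion sites in each peak, scaled to counts per million."""
    sites = [tn5_insertion_site(r) for r in reads]
    total = len(sites)
    scores = {}
    for name, (start, end) in peaks.items():
        n_in_peak = sum(1 for s in sites if start <= s < end)
        scores[name] = n_in_peak / total * 1e6 if total else 0.0
    return scores


reads = [Read(100, 150, "+"), Read(180, 230, "-"), Read(5000, 5050, "+")]
peaks = {"peak_1": (90, 300), "peak_2": (4990, 5100)}  # hypothetical peaks
print(accessibility_cpm(reads, peaks))
```

Comparing such normalized per-peak scores between conditions is the starting point for calling differentially accessible regions, although dedicated tools add replicate-aware statistics on top.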
A recent study by Luperchio et al. [25] adopted ATAC-seq to investigate shared epigenetic alterations in mouse models of Kabuki type 1 and 2 and Rubinstein-Taybi type 1 syndromes. They found that disruption of chromatin accessibility at promoters frequently dysregulates downstream gene expression, and a considerable number of dysregulated genes were shared among the three rare disease mouse models, which may explain the shared disease manifestations.
With the recent rapid development of NGS technology, causal variants have been identified for many rare diseases. However, for a significant number of rare diseases, pathogenic variants have not yet been discovered, and their underlying mechanisms remain poorly understood. In this review, we have introduced recent efforts that harness multi-omics approaches to improve diagnostic yield and to better understand the molecular mechanisms of rare diseases.
WGS can detect various genomic variants, such as non-coding mutations, structural variants, and repeat expansions, that cannot be accurately covered by WES. RNA-seq is also useful, not only for understanding the downstream impact of genomic variants on gene expression profiles but also for detecting additional events implicated in the pathogenesis of rare diseases, such as aberrant splicing and gene fusions. Because transcriptomic features can be heavily affected by epigenomic features such as DNA methylation, histone modification, and chromatin accessibility, additional epigenomic approaches such as bisulfite sequencing and ATAC-seq can help elucidate the mechanisms by which pathogenic variants act (Table 1, Fig. 1) [26-35].
Overall, integrating and analyzing these omics techniques is expected to enable more precise identification of disease-associated variants and a better understanding of pathogenesis, thereby increasing diagnostic yield and ultimately contributing to the development of novel therapies.
The authors declare that they do not have any conflicts of interest.
This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2020M3E5D7115320). This research was also partly supported by Basic Science Research Program through the NRF funded by the Ministry of Education (NRF-2021R1A6A3A13045998 to Y.C.).
Conception and design: SL. Acquisition of data: YC, DWYC. Analysis and interpretation of data: YC, DWYC. Drafting the article: YC, DWYC, SL. Critical revision of the article: YC, DWYC, SL. Final approval of the version to be published: SL.
Summary table of sequencing techniques and their applications
Sequencing technique | Description | Detectable variants or features | Reference |
---|---|---|---|
Whole-exome sequencing (WES) | Covers exonic (protein-coding) regions; much lower cost than WGS; higher sequencing depth than WGS; faster sequencing and bioinformatic analysis than WGS | Single nucleotide variants (SNVs) and short insertions/deletions (indels) in exonic regions; copy number variants in exonic regions | Suwinski et al. [26]; Rabbani et al. [27] |
Whole-genome sequencing (WGS) | Covers the entire genome, including intronic and intergenic regions; identifies more complex genomic variants than WES; higher cost than WES | SNVs and indels across the entire genome, including non-coding regions; copy number variants; complex structural variants; repeat expansions | Austin-Tse et al. [28]; Ng and Kirkness [29] |
RNA sequencing (RNA-seq) | Covers the transcriptome; quantifies RNA expression levels; identifies differentially expressed genes | Gene/isoform expression; allele-specific expression; alternative splicing patterns; gene fusions | Stark et al. [30]; Hong et al. [31] |
Bisulfite sequencing (BS-seq) | Detects methylated cytosines in genomic DNA at single-base resolution | Genome-wide DNA methylation profiles; hyper- and hypomethylated CpG islands | Feng and Lou [32]; Wreczycka et al. [33] |
Assay for transposase-accessible chromatin using sequencing (ATAC-seq) | Detects chromatin accessibility across the genome; identifies differentially accessible regions | Genome-wide chromatin accessibility profiles; enriched transcription factor binding sites | Yan et al. [34]; Grandi et al. [35] |