Copy number variations (CNVs) are segments of DNA whose copy number differs when different genomes are compared. CNVs can manifest as gains (insertions or duplications) or losses (deletions or null genotypes) relative to a reference genome. Exonic CNVs are deletions and duplications at the exon level within specific genes, and they affect genes differently depending on the degree of overlap. Large CNVs, especially those involving multiple genes, are relatively easy to detect, but associating them with syndromes and establishing genotype-phenotype correlations is challenging. This review explores the relationship between exon-level CNVs in well-characterized genes and rare genetic diseases. It also examines less familiar but recurrently reported CNVs, focusing on CNVs enriched in East Asians.
The subsequent sections delve into specific genes with recurrent exonic CNVs, such as
CNV denotes segments of DNA where variations in copy number are detected when two or more genomes are compared. In the absence of additional annotation, the term CNV implies nothing about a variant's frequency or its phenotypic impact. These structural variants can manifest as gains in genomic copy number (insertions or duplications) or losses (deletions or null genotypes) relative to a specified reference genome sequence. Here, exonic CNV refers specifically to deletions and duplications at the exon level within specific genes. CNVs affect genes in different ways depending on the degree of overlap with them. Some CNVs cover entire genes (hereafter, whole gene CNVs), others overlap part of the coding sequence but not the whole gene (exonic CNVs), and others lie within purely intronic regions (intronic CNVs), overlapping no exon of any annotated isoform.
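The three overlap categories defined above can be expressed as a short interval check. The following is a minimal sketch, not a published tool: the function names, the half-open coordinate convention, and the example coordinates are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: half-open [start, end) coordinates on a single chromosome.
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True when two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_cnv(cnv: Interval, gene: Interval, exons: List[Interval]) -> str:
    """Classify a CNV relative to one gene.

    `exons` is assumed to hold the exons of every annotated isoform,
    so "intronic" means the CNV touches no exon of any isoform.
    """
    if not overlaps(cnv, gene):
        return "no overlap"
    if cnv[0] <= gene[0] and cnv[1] >= gene[1]:
        return "whole gene"
    if any(overlaps(cnv, exon) for exon in exons):
        return "exonic"
    return "intronic"


# Hypothetical gene with three exons.
gene = (1000, 5000)
exons = [(1000, 1200), (3000, 3200), (4800, 5000)]
print(classify_cnv((500, 6000), gene, exons))   # covers the whole gene
print(classify_cnv((2900, 3100), gene, exons))  # clips the second exon
print(classify_cnv((1500, 2500), gene, exons))  # lies inside the first intron
```

In practice the exon list would come from a gene annotation file, and the choice of isoforms to include changes which CNVs count as intronic.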
Large CNVs involving multiple genes (greater than 100 kb) are relatively straightforward to detect. However, large CNVs are associated with syndromes that cause multiple anomalies, which makes establishing a genotype-phenotype correlation comparatively challenging. Therefore, in this review, we explore the relationship between exon-level CNVs in well-characterized genes, whose phenotypes are known, and rare genetic diseases.
Numerous well-established genes, such as
Recently, the exon 1-4 deletion of the
Pseudohypoparathyroidism type Ib (PHP-1b) is a rare imprinting disorder, characterized by renal parathyroid hormone resistance, but the absence of physical features of Albright hereditary osteodystrophy. A common heterozygous 3-kb deletion of
Laminin α2-related muscular dystrophy (LAMA2 MD) is a rare autosomal-recessive genetic disorder, impacting an estimated 0.7 to 2.5 individuals per 100,000 in predominantly European cohorts. An NGS-based CNV profiling was conducted on 114 individuals clinically diagnosed with
Cystic fibrosis stands out as one of the most prevalent autosomal recessively inherited disorders among Caucasians, attributed to pathogenic variants in the
Diagnosing exonic CNVs, particularly those encompassing a single exon, is challenging. Several diagnostic methods are available, and they are best combined strategically: cross-checking results with more than one technique helps ensure an accurate diagnosis, given how difficult CNVs are to detect at the exon level. This multi-method approach makes the identification and characterization of exonic CNVs more reliable and supports more precise clinical assessment.
Multiplex ligation-dependent probe amplification (MLPA) is highly sensitive and can detect small changes in DNA copy number, making it effective for identifying both small and large genomic alterations. It analyzes multiple targets simultaneously in a single reaction, so several genomic loci can be assessed in a cost-effective and time-efficient manner. It reports the copy number of specific DNA sequences, can be applied to genomic DNA extracted from blood, tissues, or other biological samples, and is a well-established technique widely used in clinical diagnostics and research for identifying and characterizing genomic imbalances. However, MLPA is a targeted method designed for specific genomic regions, so it does not give an overview of the entire genome; whole-genome approaches may be more suitable for global copy number analysis. Its success depends on the design of specific probes, and variants outside the targeted regions can be missed. Because probes must be designed in advance from known genomic sequences, MLPA may be difficult to apply to novel genes or poorly characterized loci. Finally, although MLPA yields quantitative data, the measurement is relative (semi-quantitative) and may be less precise than some other quantitative techniques.
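The relative nature of MLPA quantification can be illustrated with a dosage quotient calculation: each probe signal is first normalized to reference probes within the sample and then compared with the same value in control samples. This is a minimal sketch under stated assumptions. The probe names, peak values, and the 0.7 and 1.3 cut-offs are illustrative, not validated thresholds, and the code does not reproduce any vendor's analysis software.

```python
from statistics import mean
from typing import Dict, List


def normalize(peaks: Dict[str, float], reference_probes: List[str]) -> Dict[str, float]:
    """Intra-sample step: divide every peak by the mean reference-probe peak."""
    ref_mean = mean(peaks[p] for p in reference_probes)
    return {probe: height / ref_mean for probe, height in peaks.items()}


def dosage_quotients(
    sample: Dict[str, float],
    controls: List[Dict[str, float]],
    reference_probes: List[str],
) -> Dict[str, float]:
    """Inter-sample step: compare each target probe with the control mean."""
    sample_norm = normalize(sample, reference_probes)
    control_norms = [normalize(c, reference_probes) for c in controls]
    return {
        probe: sample_norm[probe] / mean(c[probe] for c in control_norms)
        for probe in sample
        if probe not in reference_probes
    }


def interpret(dq: float, loss: float = 0.7, gain: float = 1.3) -> str:
    """Assumed illustrative cut-offs; a laboratory would validate its own."""
    if dq < loss:
        return "deletion"
    if dq > gain:
        return "duplication"
    return "normal"


# Hypothetical peak heights: exon1 shows about half the expected signal.
refs = ["ref1", "ref2"]
patient = {"ref1": 1000.0, "ref2": 1000.0, "exon1": 500.0, "exon2": 1000.0}
controls = [
    {"ref1": 1000.0, "ref2": 1000.0, "exon1": 1000.0, "exon2": 1000.0},
    {"ref1": 2000.0, "ref2": 2000.0, "exon1": 2000.0, "exon2": 2000.0},
]
dq = dosage_quotients(patient, controls, refs)
print({probe: interpret(value) for probe, value in dq.items()})
```

Because every value is a ratio against reference probes and control samples, the result depends on the quality of both, which is one reason a single-probe deviation is usually confirmed with a second method.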
Droplet digital PCR (ddPCR) is a molecular biology technique for the absolute quantification of nucleic acid targets. This is one of its major strengths: it measures target concentrations precisely and accurately without the need for standard curves [21,22]. ddPCR is highly sensitive and can detect low-abundance targets, which suits applications such as detecting rare pathogenic variants or monitoring minimal residual disease [23,24]. It can detect and quantify multiple targets in the same reaction, which improves efficiency and reduces the amount of sample needed. Because each reaction is partitioned into thousands of droplets, the impact of reaction inhibitors is reduced and results are robust. However, ddPCR may have a narrower dynamic range than quantitative PCR (qPCR), which can limit its ability to quantify targets across a wide range of concentrations. The initial investment in ddPCR instrumentation can be relatively high, making it less accessible to some laboratories than more traditional PCR methods. Data analysis can be more complex than for conventional PCR, particularly for users unfamiliar with the digital nature of the technique. Designing and optimizing ddPCR assays may also require more effort than for qPCR, and there may be less flexibility to modify assays on the fly.
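Absolute quantification without a standard curve rests on Poisson statistics: if a fraction p of droplets is positive, the mean number of target copies per droplet is λ = −ln(1 − p). The sketch below applies this to estimate copy number from a target assay and a reference assay. The droplet counts are invented for illustration, and the assumption that the reference locus is present in two copies is stated in the code.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean copies per droplet: lambda = -ln(1 - p)."""
    if total <= 0 or not 0 <= positive < total:
        # A fully positive well is saturated and cannot be quantified.
        raise ValueError("need 0 <= positive < total and total > 0")
    return -math.log(1.0 - positive / total)


def copy_number(
    target_positive: int,
    target_total: int,
    ref_positive: int,
    ref_total: int,
    ref_copies: int = 2,  # assumption: diploid reference locus
) -> float:
    """Target copy number relative to a reference locus of known copy number."""
    target = copies_per_droplet(target_positive, target_total)
    reference = copies_per_droplet(ref_positive, ref_total)
    return ref_copies * target / reference


# Hypothetical counts: the target assay has roughly half the reference signal.
estimate = copy_number(2212, 20000, 4210, 20000)
print(round(estimate, 2))
```

The saturation check also illustrates the dynamic range limit noted above: once nearly all droplets are positive, the estimate becomes unstable, so highly concentrated samples need dilution.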
New algorithms for CNV analysis of next-generation sequencing (NGS) data continue to emerge [26-28]. As NGS technologies progress, researchers and bioinformaticians are developing and refining algorithms to improve the accuracy and efficiency of CNV detection, with the aim of better understanding genetic variation and advancing precision medicine and diagnostics. The main challenge is detecting germline CNVs from targeted NGS data, especially single-exon and multi-exon alterations. One study evaluated five CNV calling tools (DECoN, CoNVaDING, panelcn.MOPS, ExomeDepth, and CODEX2) on four genetic diagnostics datasets totaling 495 samples with 231 validated CNVs. The evaluation, run with both default and sensitivity-optimized parameters, found that most tools achieve high sensitivity and specificity, although performance varies with the dataset. In the diagnostic scenario, DECoN and panelcn.MOPS proved effective for CNV screening, with DECoN showing superior specificity. The study highlights the importance of tool selection and parameter optimization for accurate CNV detection in genetic diagnostics. Nevertheless, the field still lacks a standardized approach.
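The common principle behind read-depth calling from targeted NGS data can be sketched as follows: per-exon coverage is normalized within each sample and compared with a set of reference samples. This is a deliberately simplified illustration and is not the algorithm of DECoN, CoNVaDING, panelcn.MOPS, ExomeDepth, or CODEX2, which use more elaborate statistical models. The exon labels, depths, and thresholds are assumptions.

```python
from statistics import median
from typing import Dict, List


def normalized_depth(depth: Dict[str, float]) -> Dict[str, float]:
    """Scale each exon's depth by the sample total to remove library-size effects."""
    total = sum(depth.values())
    return {exon: d / total for exon, d in depth.items()}


def depth_ratios(
    test: Dict[str, float], references: List[Dict[str, float]]
) -> Dict[str, float]:
    """Ratio of the test sample's normalized depth to the reference median."""
    test_norm = normalized_depth(test)
    ref_norms = [normalized_depth(r) for r in references]
    return {
        exon: test_norm[exon] / median(r[exon] for r in ref_norms)
        for exon in test
    }


def call_exons(
    ratios: Dict[str, float], loss: float = 0.7, gain: float = 1.3
) -> Dict[str, str]:
    """Flag exons outside assumed, illustrative ratio thresholds."""
    calls = {}
    for exon, ratio in ratios.items():
        if ratio < loss:
            calls[exon] = "deletion"
        elif ratio > gain:
            calls[exon] = "duplication"
    return calls


# Hypothetical panel of ten exons; the test sample has half coverage on e3.
exon_names = [f"e{i}" for i in range(1, 11)]
reference_samples = [{e: 100.0 for e in exon_names} for _ in range(3)]
test_sample = {e: 100.0 for e in exon_names}
test_sample["e3"] = 50.0
print(call_exons(depth_ratios(test_sample, reference_samples)))
```

A single exon contributes only one noisy ratio, which helps explain why single-exon events are the hardest case for these tools and why parameter choice matters.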
Whole-genome sequencing (WGS) is a powerful method for detecting CNVs across the entire genome. Because it uses high-throughput sequencing, WGS provides a comprehensive view of genomic alterations and can identify CNVs of various sizes. Structural variant analysis tools such as Delly and Manta are commonly applied to WGS data to identify complex rearrangements, including insertions, deletions, and inversions [29,30]. These tools give a more complete picture of genomic architecture and support accurate CNV detection, extending the ability of WGS to resolve structural variation at the nucleotide level.
However, accurately calling CNVs from WGS remains challenging, and no consensus approach exists. Another study explores practical calling options, highlighting the complementary results obtained from callers that rely on different signals (paired-end reads, split reads, coverage depth). The authors propose a combined approach using four selected callers (Manta, Delly, ERDS, CNVnator) and a regenotyping tool (SV2), and demonstrate that it is applicable in routine practice in terms of computation time and interpretation. The study shows that these approaches outperform array-based comparative genomic hybridization (aCGH), particularly in the resolution of breakpoint definition and the detection of potentially relevant CNVs. The findings are confirmed on benchmark and clinically validated genomes, suggesting that WGS is a timely and economically viable alternative to the combination of aCGH and whole-exome sequencing.
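One common way to compare calls from several callers is reciprocal overlap: two calls are treated as the same event when each covers at least a chosen fraction of the other. The sketch below counts which callers support a given call. It is an illustration of the general idea only and does not reproduce the combination or regenotyping procedure of the study described above; the caller labels, coordinates, and the 50% threshold are assumptions.

```python
from typing import Dict, List, NamedTuple


class Call(NamedTuple):
    chrom: str
    start: int
    end: int
    kind: str  # "DEL" or "DUP"


def reciprocal_overlap(a: Call, b: Call) -> float:
    """Smaller of the two overlap fractions; 0.0 if the calls are incompatible."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def supporting_callers(
    call: Call, calls_by_caller: Dict[str, List[Call]], min_overlap: float = 0.5
) -> List[str]:
    """Names of callers reporting an event matching `call` (assumed 50% threshold)."""
    return sorted(
        name
        for name, calls in calls_by_caller.items()
        if any(reciprocal_overlap(call, other) >= min_overlap for other in calls)
    )


# Hypothetical output of three callers for one sample.
calls_by_caller = {
    "caller_a": [Call("chr1", 1000, 2000, "DEL")],
    "caller_b": [Call("chr1", 1100, 2100, "DEL")],
    "caller_c": [Call("chr1", 5000, 6000, "DEL")],
}
query = calls_by_caller["caller_a"][0]
print(supporting_callers(query, calls_by_caller))
```

Callers based on read depth tend to report imprecise breakpoints, so a strict overlap threshold can split one event into several; the threshold is a tuning choice rather than a fixed rule.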
Confirmatory PCR is the most reliable method for confirming deletions and duplications. However, it requires a relatively precise understanding of the CNV range and carries the inconvenience of designing new primers for each experiment. Once the CNV range is precisely defined, specific primers can be designed for gap PCR.
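The reason gap PCR needs a well-defined CNV range can be seen from simple amplicon arithmetic: primers are placed on either side of the deletion, so the deleted allele yields a short product while the intact allele yields a product that may be too long to amplify. The sketch below is illustrative only; the coordinates and the 5 kb amplification limit are assumptions, and real limits depend on the polymerase and conditions used.

```python
from typing import Dict, Optional


def gap_pcr_products(
    fwd_start: int,
    rev_end: int,
    del_start: int,
    del_end: int,
    max_amplifiable: int = 5000,  # assumed illustrative limit in base pairs
) -> Dict[str, Optional[int]]:
    """Expected product sizes for primers flanking a deletion.

    Coordinates are reference positions; the deletion spans [del_start, del_end).
    A value of None means the product is assumed too long to amplify.
    """
    if not fwd_start < del_start < del_end < rev_end:
        raise ValueError("primers must flank the deletion")
    wild_type = rev_end - fwd_start
    deleted = wild_type - (del_end - del_start)
    return {
        "wild_type": wild_type if wild_type <= max_amplifiable else None,
        "deletion": deleted if deleted <= max_amplifiable else None,
    }


# Hypothetical 10 kb deletion with primers 500 bp outside each breakpoint.
print(gap_pcr_products(fwd_start=1000, rev_end=12000, del_start=1500, del_end=11500))
```

If the breakpoints are misplaced by even a few hundred bases, a primer may fall inside the deleted segment and no product is obtained, which is why the CNV range must be known first.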
Large CNVs are typically identified using chromosomal microarray (CMA). For small CNVs, however, knowing their prevalence in specific genes is crucial for choosing methods such as MLPA or long-read sequencing. Some genes may have hotspots for exonic CNVs, and certain populations may show enrichment of these CNVs, possibly due to founder effects [8-10,33]. For such CNVs, detection methods such as MLPA, qPCR, and ddPCR are preferred over traditional methods like CMA or karyotyping. WGS can also be considered, especially if the target gene is well defined. When NGS data are available, various algorithms can be used to detect exonic CNVs; however, these algorithms may be less robust at identifying single-exon deletions or duplications. More broadly, smaller CNVs such as exonic CNVs may represent genetic variation that has so far gone undetected. Identifying and therapeutically targeting these CNVs, some of which may be population-specific, could substantially advance the diagnosis and treatment of rare diseases. Continued efforts to discover and understand these subtle CNVs are therefore important, as they may eventually lead to ethnicity-specific treatments for rare diseases.
No funding to declare.