Analysis and benchmarking of small and large genomic variants across tandem repeats

Levinson, G. & Gutman, G. A. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4, 203–221 (1987).

Fan, H. & Chu, J.-Y. A brief review of short tandem repeat mutation. Genom. Proteom. Bioinform. 5, 7–14 (2007).

Shriver, M. D., Jin, L., Chakraborty, R. & Boerwinkle, E. VNTR allele frequency distributions under the stepwise mutation model: a computer simulation approach. Genetics 134, 983–993 (1993).

Wright, J. M. Mutation at VNTRs: are minisatellites the evolutionary progeny of microsatellites? Genome 37, 345–347 (1994).

Willems, T. et al. The landscape of human STR variation. Genome Res. 24, 1894–1904 (2014).

Ren, J., Gu, B. & Chaisson, M. J. P. vamos: variable-number tandem repeats annotation using efficient motif sets. Genome Biol. 24, 175 (2023).

Noyes, M. D. et al. Familial long-read sequencing increases yield of de novo mutations. Am. J. Hum. Genet. 109, 631–646 (2022).

DeJesus-Hernandez, M. et al. Expanded GGGGCC hexanucleotide repeat in noncoding region of C9ORF72 causes chromosome 9p-linked FTD and ALS. Neuron 72, 245–256 (2011).

Depienne, C. & Mandel, J.-L. 30 years of repeat expansion disorders: what have we learned and what are the remaining challenges? Am. J. Hum. Genet. 108, 764–785 (2021).

Mirceta, M., Shum, N., Schmidt, M. H. M. & Pearson, C. E. Fragile sites, chromosomal lesions, tandem repeats, and disease. Front. Genet. 13, 985975 (2022).

Hannan, A. J. Repeat DNA expands our understanding of autism spectrum disorder. Nature 589, 200–202 (2021).

Hannan, A. J. Tandem repeats mediating genetic plasticity in health and disease. Nat. Rev. Genet. 19, 286–298 (2018).

Stanley, U. et al. Forensic DNA profiling: autosomal short tandem repeat as a prominent marker in crime investigation. Malays. J. Med. Sci. 27, 22–35 (2020).

Hall, C. L. et al. Accurate profiling of forensic autosomal STRs using the Oxford Nanopore Technologies MinION device. Forensic Sci. Int. Genet. 56, 102629 (2022).

Warner, J. P. et al. A general method for the detection of large CAG repeat expansions by fluorescent PCR. J. Med. Genet. 33, 1022–1026 (1996).

Jeffreys, A. J., Wilson, V. & Thein, S. L. Hypervariable ‘minisatellite’ regions in human DNA. Nature 314, 67–73 (1985).

Dolzhenko, E. et al. ExpansionHunter: a sequence-graph based tool to analyze variation in short tandem repeat regions. Bioinformatics 35, 4754–4756 (2019).

Willems, T. et al. Genome-wide profiling of heritable and de novo STR variations. Nat. Methods 14, 590–592 (2017).

Mousavi, N., Shleizer-Burko, S., Yanicky, R. & Gymrek, M. Profiling the genome-wide landscape of tandem repeat expansions. Nucleic Acids Res. 47, e90 (2019).

Dolzhenko, E. et al. Characterization and visualization of tandem repeats at genome scale. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-02057-3 (2024).

Chiu, R., Rajan-Babu, I.-S., Friedman, J. M. & Birol, I. Straglr: discovering and genotyping tandem repeat expansions using whole genome long-read sequences. Genome Biol. 22, 224 (2021).

Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).

Aganezov, S. et al. A complete reference genome improves analysis of human genetic variation. Science 376, eabl3533 (2022).

Rhie, A. et al. The complete sequence of a human Y chromosome. Nature 621, 344–354 (2023).

Olson, N. D. et al. Variant calling and benchmarking in an era of complete human genome sequences. Nat. Rev. Genet. 24, 464–483 (2023).

Majidian, S., Agustinho, D. P., Chin, C.-S., Sedlazeck, F. J. & Mahmoud, M. Genomic variant benchmark: if you cannot measure it, you cannot improve it. Genome Biol. 24, 221 (2023).

Wagner, J. et al. Benchmarking challenging small variants with linked and long reads. Cell Genom. 2, 100128 (2022).

Zook, J. M. et al. A robust benchmark for detection of germline large deletions and insertions. Nat. Biotechnol. 38, 1347–1355 (2020).

Wagner, J. et al. Curated variation benchmarks for challenging medically relevant autosomal genes. Nat. Biotechnol. 40, 672–680 (2022).

English, A. C., Menon, V. K., Gibbs, R. A., Metcalf, G. A. & Sedlazeck, F. J. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol. 23, 271 (2022).

Yang, J. & Chaisson, M. J. P. TT-Mars: structural variants assessment based on haplotype-resolved assemblies. Genome Biol. 23, 110 (2022).

Audano, P. A. & Beck, C. R. Small polymorphisms are a source of ancestral bias in structural variant breakpoint placement. Genome Res. 34, 7–19 (2024).

Fu, Y., Mahmoud, M., Muraliraman, V. V., Sedlazeck, F. J. & Treangen, T. J. Vulcan: improved long-read mapping and structural variant calling via dual-mode alignment. GigaScience 10, giab063 (2021).

Gelfand, Y., Rodriguez, A. & Benson, G. TRDB—the Tandem Repeats Database. Nucleic Acids Res. 35, D80–D87 (2007).

Halman, A., Dolzhenko, E. & Oshlack, A. STRipy: a graphical application for enhanced genotyping of pathogenic short tandem repeats in sequencing data. Hum. Mutat. 43, 859–868 (2022).

Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).

Saini, S., Mitra, I., Mousavi, N., Fotsing, S. F. & Gymrek, M. A reference haplotype panel for genome-wide imputation of short tandem repeats. Nat. Commun. 9, 4397 (2018).

Benson, G. Tandem Repeats Finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).

Smit, A., Hubley, R. & Green, P. RepeatMasker. http://www.repeatmasker.org (2013).

Wlodzimierz, P., Hong, M. & Henderson, I. R. TRASH: tandem repeat annotation and structural hierarchy. Bioinformatics 39, btad308 (2023).

Novák, P., Neumann, P. & Macas, J. Global analysis of repetitive DNA from unassembled sequence reads using RepeatExplorer2. Nat. Protoc. 15, 3745–3776 (2020).

Delucchi, M., Näf, P., Bliven, S. & Anisimova, M. TRAL 2.0: tandem repeat detection with circular profile hidden Markov models and evolutionary aligner. Front. Bioinform. 1, 691865 (2021).

El-Sawy, M. & Deininger, P. Tandem insertions of Alu elements. Cytogenet. Genome Res. 108, 58–62 (2004).

Moretti, T. R. et al. Population data on the expanded CODIS core STR loci for eleven populations of significance for forensic DNA analyses in the United States. Forensic Sci. Int. Genet. 25, 175–181 (2016).

Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020).

Stevanovski, I. et al. Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing. Sci. Adv. 8, eabm5386 (2022).

Pellerin, D. et al. Deep intronic FGF14 GAA repeat expansion in late-onset cerebellar ataxia. N. Engl. J. Med. 388, 128–141 (2022).

Tan, D. et al. CAG repeat expansion in THAP11 is associated with a novel spinocerebellar ataxia. Mov. Disord. 38, 1282–1293 (2023).

Mukamel, R. E. et al. Protein-coding repeat polymorphisms strongly shape diverse human phenotypes. Science 373, 1499–1505 (2021).

Liu, Z. et al. Inconsistent genotyping call at DYS389 locus and implications for interpretation. Int. J. Legal Med. 132, 1043–1048 (2018).

White, P. S., Tatum, O. L., Deaven, L. L. & Longmire, J. L. New, male-specific microsatellite markers from the human Y chromosome. Genomics 57, 433–437 (1999).

Vinces, M. D., Legendre, M., Caldara, M., Hagihara, M. & Verstrepen, K. J. Unstable tandem repeats in promoters confer transcriptional evolvability. Science 324, 1213–1216 (2009).

Sulovari, A. et al. Human-specific tandem repeat expansion and differential gene expression during primate evolution. Proc. Natl Acad. Sci. USA 116, 23243–23253 (2019).

Annear, D. J. et al. Abundancy of polymorphic CGG repeats in the human genome suggest a broad involvement in neurological disease. Sci. Rep. 11, 2515 (2021).

Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).

Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021).

Garg, S. et al. Chromosome-scale, haplotype-resolved assembly of human genomes. Nat. Biotechnol. 39, 309–312 (2021).

Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).

Jarvis, E. D. et al. Semi-automated assembly of high-quality diploid human reference genomes. Nature 611, 519–531 (2022).

Dunn, T. & Narayanasamy, S. vcfdist: accurately benchmarking phased small variant calls in human genomes. Nat. Commun. 14, 8149 (2023).

Cleary, J. G. et al. Comparing variant call files for performance benchmarking of next-generation sequencing variant calling pipelines. Preprint at bioRxiv https://doi.org/10.1101/023754 (2015).

Tan, A., Abecasis, G. R. & Kang, H. M. Unified representation of genetic variants. Bioinformatics 31, 2202–2204 (2015).

Marco-Sola, S., Moure, J. C., Moreto, M. & Espinosa, A. Fast gap-affine pairwise alignment using the wavefront algorithm. Bioinformatics 37, btaa777 (2020).

Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).

Park, J., Kaufman, E., Valdmanis, P. N. & Bafna, V. TRviz: a Python library for decomposing and visualizing tandem repeat sequences. Bioinform. Adv. 3, vbad058 (2023).

Krause, A. et al. Junctophilin 3 (JPH3) expansion mutations causing Huntington disease like 2 (HDL2) are common in South African patients with African ancestry and a Huntington disease phenotype. Am. J. Med. Genet. B 168, 573–585 (2015).

Wieben, E. D. et al. A common trinucleotide repeat expansion within the transcription factor 4 (TCF4, E2-2) gene predicts Fuchs corneal dystrophy. PLoS ONE 7, e49083 (2012).

Jam, H. Z. et al. A deep population reference panel of tandem repeat variation. Nat. Commun. 14, 6711 (2023).

Bakhtiari, M., Shleizer-Burko, S., Gymrek, M., Bansal, V. & Bafna, V. Targeted genotyping of variable number tandem repeats with adVNTR. Genome Res. 28, 1709–1719 (2018).

Sonay, T. B. et al. Tandem repeat variation in human and great ape populations and its impact on gene expression divergence. Genome Res. 25, 1591–1599 (2015).

Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

Howe, K. L. et al. Ensembl 2021. Nucleic Acids Res. 49, D884–D891 (2020).

English, A. Project Adotto tandem-repeat regions and annotations. Zenodo 10.5281/zenodo.8387564 (2022).

Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience 10, giab008 (2021).

English, A. Project Adotto whole-genome variants. Zenodo 10.5281/zenodo.6975244 (2022).

Li, H. et al. A synthetic-diploid benchmark for accurate variant-calling evaluation. Nat. Methods 15, 595–597 (2018).

Chin, C.-S. et al. A diploid assembly-based benchmark for variants in the major histocompatibility complex. Nat. Commun. 11, 4794 (2020).

Wootton, J. C. & Federhen, S. Statistics of local complexity in amino acid sequences and sequence databases. Comput. Chem. 17, 149–163 (1993).

Šošić, M. & Šikić, M. Edlib: a C/C++ library for fast, exact sequence alignment using edit distance. Bioinformatics 33, btw753 (2016).

Bonfield, J. K. et al. HTSlib: C library for reading/writing high-throughput sequencing data. GigaScience 10, giab007 (2021).

Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).

English, A. et al. GIAB TandemRepeats benchmark v1.0. https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/TandemRepeats_v1.0 (2023).

English, A. et al. GIAB TR comparison VCFs. Zenodo 10.5281/zenodo.10724503 (2024).

English, A. et al. Working space for the GIAB TR benchmarking project. GitHub https://github.com/ACEnglish/adotto (2023).

English, A. Structural variant toolkit for VCFs. GitHub https://github.com/ACEnglish/truvari (2023).

English, A. et al. Library for variant benchmarking stratification. GitHub https://github.com/ACEnglish/laytr (2023).

Olson, N. A snakemake based pipeline to build Adotto TR databases. GitHub https://github.com/nate-d-olson/adotto-smk (2023).

English, A. A rust implementation of regioneR for interval overlap permutation testing. GitHub https://github.com/ACEnglish/regioners (2023).

Copyright for syndicated content belongs to the linked source: Nature.com – https://www.nature.com/articles/s41587-024-02225-z