Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-18481
Full metadata record
DC Field | Value | Language
dc.contributor.author | Tørresen, Ole K | -
dc.contributor.author | Star, Bastiaan | -
dc.contributor.author | Mier, Pablo | -
dc.contributor.author | Andrade-Navarro, Miguel A | -
dc.contributor.author | Bateman, Alex | -
dc.contributor.author | Jarnot, Patryk | -
dc.contributor.author | Gruca, Aleksandra | -
dc.contributor.author | Grynberg, Marcin | -
dc.contributor.author | Kajava, Andrey V | -
dc.contributor.author | Promponas, Vasilis J | -
dc.contributor.author | Anisimova, Maria | -
dc.contributor.author | Jakobsen, Kjetill S | -
dc.contributor.author | Linke, Dirk | -
dc.date.accessioned | 2019-10-18T09:00:29Z | -
dc.date.available | 2019-10-18T09:00:29Z | -
dc.date.issued | 2019-10-04 | -
dc.identifier.issn | 0305-1048 | de_CH
dc.identifier.issn | 1362-4962 | de_CH
dc.identifier.uri | https://digitalcollection.zhaw.ch/handle/11475/18481 | -
dc.description.abstract | The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where misannotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others. | de_CH
dc.language.iso | en | de_CH
dc.publisher | Oxford University Press | de_CH
dc.relation.ispartof | Nucleic Acids Research | de_CH
dc.rights | http://creativecommons.org/licenses/by/4.0/ | de_CH
dc.subject | Genomics | de_CH
dc.subject | Bioinformatics | de_CH
dc.subject.ddc | 572: Biochemistry | de_CH
dc.title | Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases | de_CH
dc.type | Article in scientific journal | de_CH
dcterms.type | Text | de_CH
zhaw.departement | Life Sciences and Facility Management | de_CH
zhaw.organisationalunit | Institute of Computational Life Sciences (ICLS) | de_CH
dc.identifier.doi | 10.1093/nar/gkz841 | de_CH
dc.identifier.doi | 10.21256/zhaw-18481 | -
dc.identifier.pmid | 31584084 | de_CH
zhaw.funding.eu | info:eu-repo/grantAgreement/EC/H2020/823886//Repeat protein Function Refinement, Annotation and Classification of Topologies/REFRACT | de_CH
zhaw.issue | 21 | de_CH
zhaw.originated.zhaw | Yes | de_CH
zhaw.pages.end | 11006 | de_CH
zhaw.pages.start | 10994 | de_CH
zhaw.publication.status | publishedVersion | de_CH
zhaw.volume | 47 | de_CH
zhaw.publication.review | Peer review (publication) | de_CH
zhaw.funding.snf | 174836 | de_CH
zhaw.webfeed | Computational Genomics | de_CH
zhaw.funding.zhaw | Discovering evolutionary innovations by assessing variation and natural selection in protein tandem repeats | de_CH
zhaw.author.additional | No | de_CH
Appears in collections: Publications Life Sciences and Facility Management

Files in This Item:
2019Toerresen_tandem-repeats-lead-to-sequence-assembly-errors_NucleidAcidsResearch.pdf (916.98 kB, Adobe PDF)
Tørresen, O. K., Star, B., Mier, P., Andrade-Navarro, M. A., Bateman, A., Jarnot, P., Gruca, A., Grynberg, M., Kajava, A. V., Promponas, V. J., Anisimova, M., Jakobsen, K. S., & Linke, D. (2019). Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Research, 47(21), 10994–11006. https://doi.org/10.1093/nar/gkz841
Tørresen, O.K. et al. (2019) ‘Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases’, Nucleic Acids Research, 47(21), pp. 10994–11006. Available at: https://doi.org/10.1093/nar/gkz841.
O. K. Tørresen et al., “Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases,” Nucleic Acids Research, vol. 47, no. 21, pp. 10994–11006, Oct. 2019, doi: 10.1093/nar/gkz841.
TØRRESEN, Ole K, Bastiaan STAR, Pablo MIER, Miguel A ANDRADE-NAVARRO, Alex BATEMAN, Patryk JARNOT, Aleksandra GRUCA, Marcin GRYNBERG, Andrey V KAJAVA, Vasilis J PROMPONAS, Maria ANISIMOVA, Kjetill S JAKOBSEN and Dirk LINKE, 2019. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Research. 4 October 2019. Vol. 47, No. 21, pp. 10994–11006. DOI 10.1093/nar/gkz841
Tørresen, Ole K, Bastiaan Star, Pablo Mier, Miguel A Andrade-Navarro, Alex Bateman, Patryk Jarnot, Aleksandra Gruca, et al. 2019. “Tandem Repeats Lead to Sequence Assembly Errors and Impose Multi-Level Challenges for Genome and Protein Databases.” Nucleic Acids Research 47 (21): 10994–1006. https://doi.org/10.1093/nar/gkz841.
Tørresen, Ole K., et al. “Tandem Repeats Lead to Sequence Assembly Errors and Impose Multi-Level Challenges for Genome and Protein Databases.” Nucleic Acids Research, vol. 47, no. 21, Oct. 2019, pp. 10994–1006, https://doi.org/10.1093/nar/gkz841.
