Pre-Print Research Highlights: The Entire Human Genome Has Been Sequenced



The Entire Human Genome Has Been Sequenced

  • The International Human Genome Sequencing Consortium and Celera Genomics published the initial drafts of the human genome in 2001.
  • The initial human genome publication revolutionized the field of genomics.
  • The drafts and the follow-up updates covered the euchromatic part of the genome.
  • However, many other complex regions as well as the heterochromatin were left incomplete or incorrect.
  • Euchromatin, as opposed to heterochromatin, is loosely packed chromatin that is genetically active and usually undergoing transcription.
  • The incomplete or erroneous part comprises an estimated 8% of the genome.
  • The Telomere-to-Telomere (T2T) Consortium has now finished the first complete sequence of a human genome.
  • The research is still a pre-print and has not yet been peer-reviewed.
  • The new, improved human reference genome is a 3.055 billion base pair sequence.
  • The new human reference genome includes gapless assemblies for all 22 autosomes plus the X chromosome.
  • The new reference also corrects a number of errors and introduces almost 200 million base pairs of new sequence containing 2,226 paralogous gene copies.
  • Paralogous genes are genes descended from the same ancestral gene through gene duplication during evolution.[2]
  • Of the paralogous gene copies, 115 are predicted to code for proteins.
  • The new complete regions include all centromeric satellite arrays and the short arms of all five acrocentric chromosomes.
  • Satellite arrays play significant roles in heterochromatin formation, genome stability, reproductive isolation, dosage compensation, and evolution.[3]
  • An acrocentric chromosome has its centromere close to one end, so its short arm is very small.[4]
  • For the first time, the new data open these complex regions of the genome to functional and variational studies.
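
To put the figures quoted above in proportion, here is a minimal arithmetic sketch in Python. It uses only the numbers from the highlights; the "almost 200 million" value is rounded, so the resulting share should be read as an approximate upper figure. The variable and function names are illustrative and do not come from the pre-print.

```python
# Figures quoted in the highlights above.
GENOME_BP = 3_055_000_000        # length of the new reference sequence
NEW_SEQUENCE_BP = 200_000_000    # "almost 200 million" bp: rounded, approximate
PARALOG_COPIES = 2_226           # paralogous gene copies in the new sequence
PROTEIN_CODING_COPIES = 115      # copies predicted to code for proteins
UNFINISHED_FRACTION = 0.08       # estimated share previously incomplete or erroneous


def share(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of the reference that is newly introduced sequence.
new_sequence_pct = share(NEW_SEQUENCE_BP, GENOME_BP)

# Share of the paralogous copies predicted to be protein coding.
protein_coding_pct = share(PROTEIN_CODING_COPIES, PARALOG_COPIES)

# Size implied by the estimated 8% that was previously incomplete or erroneous.
unfinished_bp = UNFINISHED_FRACTION * GENOME_BP

print(f"New sequence: about {new_sequence_pct:.1f}% of the reference")
print(f"Protein-coding paralog copies: about {protein_coding_pct:.1f}%")
print(f"8% of the reference: about {unfinished_bp / 1e6:.0f} million bp")
```

On these rounded inputs, the new sequence is roughly 6.5% of the reference and about 5% of the paralogous copies are predicted to be protein coding.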

Sergey Nurk, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V. Bzikadze, Alla Mikheenko, Mitchell R. Vollger, Nicolas Altemose, Lev Uralsky, Ariel Gershman, Sergey Aganezov, Savannah J. Hoyt, Mark Diekhans, Glennis A. Logsdon, Michael Alonge, Stylianos E. Antonarakis, Matthew Borchers, Gerard G. Bouffard, Shelise Y. Brooks, Gina V. Caldas, Haoyu Cheng, Chen-Shan Chin, William Chow, Leonardo G. de Lima, Philip C. Dishuck, Richard Durbin, Tatiana Dvorkina, Ian T. Fiddes, Giulio Formenti, Robert S. Fulton, Arkarachai Fungtammasan, Erik Garrison, Patrick G.S. Grady, Tina A. Graves-Lindsay, Ira M. Hall, Nancy F. Hansen, Gabrielle A. Hartley, Marina Haukness, Kerstin Howe, Michael W. Hunkapiller, Chirag Jain, Miten Jain, Erich D. Jarvis, Peter Kerpedjiev, Melanie Kirsche, Mikhail Kolmogorov, Jonas Korlach, Milinn Kremitzki, Heng Li, Valerie V. Maduro, Tobias Marschall, Ann M. McCartney, Jennifer McDaniel, Danny E. Miller, James C. Mullikin, Eugene W. Myers, Nathan D. Olson, Benedict Paten, Paul Peluso, Pavel A. Pevzner, David Porubsky, Tamara Potapova, Evgeny I. Rogaev, Jeffrey A. Rosenfeld, Steven L. Salzberg, Valerie A. Schneider, Fritz J. Sedlazeck, Kishwar Shafin, Colin J. Shew, Alaina Shumate, Yumi Sims, Arian F. A. Smit, Daniela C. Soto, Ivan Sović, Jessica M. Storer, Aaron Streets, Beth A. Sullivan, Françoise Thibaud-Nissen, James Torrance, Justin Wagner, Brian P. Walenz, Aaron Wenger, Jonathan M. D. Wood, Chunlin Xiao, Stephanie M. Yan, Alice C. Young, Samantha Zarate, Urvashi Surti, Rajiv C. McCoy, Megan Y. Dennis, Ivan A. Alexandrov, Jennifer L. Gerton, Rachel J. O’Neill, Winston Timp, Justin M. Zook, Michael C. Schatz, Evan E. Eichler, Karen H. Miga, Adam M. Phillippy
bioRxiv 2021.05.26.445798; doi:




Keywords: human genome project, genome sequence, human genome, gene sequencing