Complete Khoisan and Bantu genomes from southern Africa: A data analysis supplement

Stephan C. Schuster1*, Webb Miller1*, Aakrosh Ratan1, Lynn P. Tomsho1, Belinda Giardine1, Lindsay R. Kasson1,
Robert S. Harris1, Desiree C. Petersen2, Fangqing Zhao1, Ji Qi1, Can Alkan3, Jeffrey M. Kidd3, Yazhou Sun1,
Daniela I. Drautz1, Pascal Bouffard4, Donna M. Muzny5, Jeffrey G. Reid5, Lynne V. Nazareth5, Qingyu Wang1,
Richard Burhans1, Cathy Riemer1, Nicola E. Wittekindt1, Priya Moorjani6, Elizabeth A. Tindall2,7, Charles G. Danko8,
Wee Siang Teo2,7, Anne M. Buboltz1, Zhenhai Zhang1, Qianyi Ma1, Arno Oosthuysen9, Abraham W. Steenkamp10,
Hermann Oostuisen11, Philippus Venter12, John Gajewski1, Yu Zhang1, B. Franklin Pugh1, Kateryna D. Makova1,
Anton Nekrutenko1, Elaine R. Mardis13, Nick Patterson14, Tom H. Pringle15, Francesca Chiaromonte1,
James C. Mullikin16, Evan E. Eichler3, Ross C. Hardison1, Richard A. Gibbs5, Timothy T. Harkins4 & Vanessa M. Hayes2,7*

*These authors contributed equally to this work

Image by Stephan Schuster

Why a data analysis supplement?

Projects of this scope usually submit their data to public repositories. However, because the datasets are large and must be downloaded, reformatted, and further transformed in a variety of ways before use, most biomedical researchers cannot utilize them effectively. In this project we wanted to make the data available to researchers in an immediately useful form. This is why, in addition to submitting our data to standard repositories, we provide datasets through Galaxy, an integrative genomic analysis platform. Galaxy makes it easy to perform analyses interactively through the web on arbitrarily large datasets. With hundreds of tools provided, there are few limits on what can be done.

Where is the data?

The data is served via three complementary mechanisms:

  1. Interactive analysis. Summary datasets, including the locations of all SNPs, indels, microsatellites, and retroposons identified in the personal genomes, are available through the Galaxy Library system. Here you can download and analyze the data without ever leaving your web browser (but before you do, we recommend that you watch the two short videos listed in the "How to download and analyze the data?" section below).
  2. Genome browser. Datasets can be viewed through the Penn State Genome Browser. It is also possible to obtain and visualize data from the Penn State browser using Galaxy, and to view your Galaxy data on the browser.
  3. FTP site. Mapped reads in BAM format can be downloaded from ftp://ftp.bx.psu.edu/data/bushman/ (if you are downloading with an FTP client, set both the username and the password to anon).
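
For scripted downloads, the FTP access described in item 3 can be sketched in Python with the standard ftplib module. This is only a minimal sketch: the URL and the anon credentials come from the text above, while the helper names (split_ftp_url, only_bam, list_bam_files, download_file) are hypothetical, and the assumptions that BAM files carry a ".bam" suffix and sit directly in the top-level directory have not been checked against the server.

```python
from ftplib import FTP
from urllib.parse import urlparse

# URL and credentials as stated in the text above.
BUSHMAN_FTP_URL = "ftp://ftp.bx.psu.edu/data/bushman/"
FTP_USER = "anon"
FTP_PASSWORD = "anon"


def split_ftp_url(url):
    """Return (host, directory) for an ftp:// URL."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError("expected an ftp:// URL, got %r" % url)
    return parts.hostname, parts.path


def only_bam(names):
    """Keep names that look like BAM files (assumption: '.bam' suffix)."""
    return [name for name in names if name.lower().endswith(".bam")]


def list_bam_files(url=BUSHMAN_FTP_URL, user=FTP_USER, password=FTP_PASSWORD):
    """List BAM file names in the directory named by the URL.

    Assumption: the files sit directly in that directory; if the server
    uses subdirectories, this returns an empty list and you would need
    to descend into them.
    """
    host, directory = split_ftp_url(url)
    with FTP(host, timeout=60) as ftp:
        ftp.login(user, password)
        ftp.cwd(directory)
        return only_bam(ftp.nlst())


def download_file(name, url=BUSHMAN_FTP_URL, user=FTP_USER, password=FTP_PASSWORD):
    """Download one file from the directory into the current working directory."""
    host, directory = split_ftp_url(url)
    with FTP(host, timeout=60) as ftp:
        ftp.login(user, password)
        ftp.cwd(directory)
        with open(name, "wb") as out:
            # Binary mode is required for BAM, which is a compressed format.
            ftp.retrbinary("RETR " + name, out.write)
```

Calling list_bam_files() and passing one of the returned names to download_file() would fetch a single file. BAM files of mapped reads for whole genomes are large, so check the available disk space first.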

How to download and analyze the data?

The Galaxy Team likes to provide documentation in the form of very short movies called Galactic Quickies. This supplement is no exception. Please watch the two clips below to get an idea of how to:

  1. Access data in Galaxy (2 min 03 sec) - an overview of accessing and downloading Bushman data from Galaxy.
  2. Analyze data with Galaxy (8 min 20 sec) - an example analysis showing identification of novel exonic SNPs in the genome of Archbishop Tutu.
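
For readers who prefer to see the logic outside Galaxy, the idea behind finding novel exonic SNPs can be sketched in Python. This is an illustration only, not the workflow shown in the clip, which uses Galaxy tools. It assumes SNPs are given as (chromosome, position) pairs with 0-based positions, exons as half-open (chromosome, start, end) intervals, and previously known SNPs as another list of (chromosome, position) pairs. All function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_exon_index(exons):
    """Merge overlapping exons per chromosome and return sorted start/end lists."""
    by_chrom = defaultdict(list)
    for chrom, start, end in exons:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_exon(index, chrom, pos):
    """Return True if pos falls inside a merged half-open exon interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged interval whose start is <= pos, if any.
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def novel_exonic_snps(snps, exons, known_snps):
    """Keep SNPs that lie in an exon and are absent from the known set."""
    index = build_exon_index(exons)
    known = set(known_snps)
    return [
        (chrom, pos)
        for chrom, pos in snps
        if (chrom, pos) not in known and in_exon(index, chrom, pos)
    ]
```

Merging overlapping exons first means a single binary search per SNP is enough, which matters when a personal genome contains millions of SNPs. Inputs in other coordinate conventions (for example 1-based positions) would need converting before use.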

What if something does not work?

Send an e-mail to our bug report list and we will get back to you.

Author Affiliations

  1. Pennsylvania State University, Center for Comparative Genomics and Bioinformatics, 310 Wartik Lab, University Park, Pennsylvania 16802, USA.
  2. Cancer Genetics Group, Children’s Cancer Institute Australia for Medical Research, C25 Lowy Cancer Research Centre, University of New South Wales, High Street, New South Wales 2031, Australia.
  3. University of Washington, Department of Genome Sciences, and Howard Hughes Medical Institute, Foege S-413-C, Box 355065, Seattle, Washington 98195-5065, USA.
  4. Roche Diagnostics Corporation, Indianapolis, Indiana 46250-0414, USA.
  5. The Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA.
  6. Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA.
  7. University of New South Wales, Randwick, New South Wales 2031, Australia.
  8. Department of Biological Statistics and Computational Biology, 101 Biotechnology Building, Cornell University, Ithaca, New York 14853, USA.
  9. PO Box 1899, Tsumeb, Namibia.
  10. PO Box 180, Arnos, Namibia.
  11. PO Box 1077, Grootfontein, Namibia.
  12. University of Limpopo, Turfloop Campus, P/Bag X1106, 0727 Sovenga, South Africa.
  13. Washington University in St Louis, School of Medicine, The Genome Center, 4444 Forest Park Boulevard, St Louis, Missouri 63108, USA.
  14. Broad Institute of MIT (Massachusetts Institute of Technology) and Harvard University, Cambridge Center, Cambridge, Massachusetts 02142, USA.
  15. Sperling Foundation, Eugene, Oregon 97405, USA.
  16. National Human Genome Research Institute, National Institutes of Health, 5625 Fishers Lane, Room 5N-01Q, MSC 9400, Rockville, Maryland 20892-9400, USA.