GenBank is the NIH genetic sequence database, an annotated collection of
all publicly available DNA sequences.
As of April 2011, there are approximately 135,440,924 sequence records in the
traditional GenBank divisions and 62,715,288 sequence records in the WGS
(Whole Genome Shotgun) division.
GenBank is part of the International Nucleotide Sequence Database Collaboration,
which comprises the DNA DataBank of Japan (DDBJ), the European
Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three
organizations exchange data on a daily basis.
Many journals require submission of sequence information to a
database prior to publication so that an accession number may appear in
the paper. There are several options for submitting data to GenBank:
- BankIt, a WWW-based submission tool for convenient and quick submission of sequence data.
- Sequin, NCBI's stand-alone submission software for Mac, PC, and UNIX platforms, available by FTP. When using Sequin, send the output files for direct submission to GenBank by e-mail.
- tbl2asn, a command-line program that automates the creation of sequence records for submission to GenBank using many of the same functions as Sequin. It is used primarily for submission of complete genomes and large batches of sequences.
- Barcode Submission Tool, a WWW-based tool for the submission of GenBank sequences and trace data for Barcode of Life projects. Currently, only mitochondrial cytochrome c oxidase subunit I (COI) genes are accepted with this tool. To submit loci other than COI, use either BankIt or Sequin.
Revisions or updates to GenBank entries can be made by the submitters at any time.
Send updates and revisions by e-mail to gb-admin@ncbi.nlm.nih.gov or through the
UpdateMacroSend form. Be sure to give the accession number of the sequence to be
updated in the subject line.
There are several ways to search and retrieve data from GenBank.
a) Search GenBank for sequence identifiers and annotations with Entrez Nucleotide,
which is divided into three divisions: CoreNucleotide (the main collection),
dbEST (Expressed Sequence Tags), and dbGSS (Genome Survey Sequences).
b) Search and align GenBank sequences to a query sequence using BLAST (Basic
Local Alignment Search Tool). BLAST searches CoreNucleotide, dbEST, and dbGSS
independently; see BLAST info for more information about the numerous BLAST databases.
c) Search, link, and download sequences programmatically using NCBI e-utilities.
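As a rough illustration of option (c), the sketch below builds an EFetch request URL for one record in GenBank flat-file format. The helper name `efetch_url` is hypothetical, the accession U49845 is only an example, and the parameter values (`db=nuccore`, `rettype=gb`, `retmode=text`) are assumptions that should be checked against the current e-utilities documentation before use.

```python
from urllib.parse import urlencode

# Base address of the NCBI e-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(accession, db="nuccore", rettype="gb", retmode="text"):
    """Build an EFetch URL for one record (hypothetical helper, not an NCBI API).

    The defaults are assumed to request a plain-text GenBank flat file.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Example accession chosen for illustration only.
url = efetch_url("U49845")
print(url)
```

The URL can then be passed to any HTTP client, for example `urllib.request.urlopen(url)`. The sketch deliberately stops short of making the request so that it runs without network access.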
GenBank format (GenBank Flat File Format) consists of an annotation section and a sequence section.
The start of the annotation section is marked by a line beginning with the word "LOCUS".
The start of the sequence section is marked by a line beginning with the word "ORIGIN",
and the end of the section is marked by a line containing only "//".
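The three markers described above are enough to split a record into its two sections. The following is a minimal sketch, not a full GenBank parser: the function name `split_genbank_record` is hypothetical, the sample record is made up for illustration, and the sketch assumes that sequence lines hold only position numbers, spaces, and residue letters.

```python
def split_genbank_record(text):
    """Split one flat-file record into (annotation, sequence).

    Relies only on the markers "LOCUS", "ORIGIN", and "//".
    """
    annotation_lines = []
    sequence_parts = []
    in_annotation = False
    in_sequence = False
    for line in text.splitlines():
        if line.strip() == "//":          # end of the record
            break
        if line.startswith("ORIGIN"):     # sequence section starts here
            in_annotation = False
            in_sequence = True
            continue
        if line.startswith("LOCUS"):      # annotation section starts here
            in_annotation = True
        if in_annotation:
            annotation_lines.append(line)
        elif in_sequence:
            # Drop the leading position number and the spaces between blocks.
            sequence_parts.append("".join(ch for ch in line if ch.isalpha()))
    return "\n".join(annotation_lines), "".join(sequence_parts)


# Made-up record, for illustration only.
SAMPLE = """\
LOCUS       EXAMPLE1                  20 bp    DNA     linear   UNA 01-JAN-2000
DEFINITION  Made-up record for illustration only.
ORIGIN
        1 gatcctccat atacaacggt
//
"""

annotation, sequence = split_genbank_record(SAMPLE)
print(sequence)
```

For real work, an established parser such as Biopython's `SeqIO` module is likely a better choice, since it also interprets the feature table and other annotation fields.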
The GenBank database is designed to provide and encourage access within the scientific community to the most up-to-date and comprehensive DNA sequence information. Therefore, NCBI places no restrictions on the use or distribution of the GenBank data. However, some submitters may claim patent, copyright, or other intellectual property rights in all or a portion of the data they have submitted.