Document Detail

The 2010 Nucleic Acids Research Database Issue and online Database Collection: a community of data resources.
Jump to Full Text
MedLine Citation:
PMID:  19965766     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
The current issue of Nucleic Acids Research includes descriptions of 58 new and 73 updated data resources. The accompanying online Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, now lists 1230 carefully selected databases covering various aspects of molecular and cell biology. While most data resource descriptions remain very brief, the issue includes several longer papers that highlight recent significant developments in such databases as Pfam, MetaCyc, UniProt, ELM and PDBe. The databases described in the Database Issue and Database Collection, however, are far more than a distinct set of resources; they form a network of connected data, concepts and shared technology. The full content of the Database Issue is available online at the Nucleic Acids Research web site (http://nar.oxfordjournals.org/).
Authors:
Guy R Cochrane; Michael Y Galperin
Related Documents :
21749296 - Research progress of rna quadruplex.
9697226 - Development of software tools at bioinformatics centre (bic) at the national university...
2771636 - Overview of the limb database.
17331896 - A comprehensive crop genome research project: the superhybrid rice genome project in ch...
15359036 - Similarities and differences in experiences of hope.
15932006 - Pharmaceutical research and development and the patent system.
Publication Detail:
Type:  Journal Article; Research Support, N.I.H., Intramural; Research Support, Non-U.S. Gov't     Date:  2009-12-03
Journal Detail:
Title:  Nucleic acids research     Volume:  38     ISSN:  1362-4962     ISO Abbreviation:  Nucleic Acids Res.     Publication Date:  2010 Jan 
Date Detail:
Created Date:  2009-12-22     Completed Date:  2010-02-01     Revised Date:  2011-01-07    
Medline Journal Info:
Nlm Unique ID:  0411011     Medline TA:  Nucleic Acids Res     Country:  England    
Other Details:
Languages:  eng     Pagination:  D1-4     Citation Subset:  IM    
Affiliation:
EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. cochrane@ebi.ac.uk
Export Citation:
APA/MLA Format     Download EndNote     Download BibTex
MeSH Terms
Descriptor/Qualifier:
Animals
Computational Biology / methods*,  trends
Databases, Genetic*
Databases, Nucleic Acid*
Databases, Protein
Gene Expression Profiling / methods
Humans
Information Storage and Retrieval / methods
Internet
Software
Comments/Corrections

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

Full Text
Journal Information
Journal ID (nlm-ta): Nucleic Acids Res
Journal ID (publisher-id): nar
Journal ID (hwp): nar
ISSN: 0305-1048
ISSN: 1362-4962
Publisher: Oxford University Press
Article Information
Download PDF
? The Author(s) 2009. Published by Oxford University Press.
creative-commons:
Received Day: 16 Month: 10 Year: 2009
Revision Received Day: 2 Month: 11 Year: 2009
Accepted Day: 3 Month: 11 Year: 2009
collection publication date: Month: 1 Year: 2010
Print publication date: Month: 1 Year: 2010
Electronic publication date: Month: 1 Year: 2010
pmc-release publication date: Day: 3 Month: 12 Year: 2009
Volume: 38 Issue: Database issue
First Page: D1 Last Page: D4
ID: 2808992
PubMed Id: 19965766
DOI: 10.1093/nar/gkp1077
Publisher Id: gkp1077

The 2010 Nucleic Acids Research Database Issue and online Database Collection: a community of data resources
Guy R. Cochrane1*
Michael Y. Galperin2
1EMBL?European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK and 2National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
Correspondence: *To whom correspondence should be addressed. Tel: +44 1223 492 564; Fax: +44 1223 494 468; Email: cochrane@ebi.ac.uk

COMMENTARY

This Database Issue of Nucleic Acids Research (NAR) includes descriptions of 58 new data resources and updates to 73 previously published data resources. The online Database Collection that accompanies the issue holds 1230 data resources, a growth of 5% over last year (http://www.oxfordjournals.org/nar/database/a/).

Continuing a decade-long tradition, the Database Issue and Database Collection serve two functions: (i) to introduce molecular and cell biologists that make up the regular readership of NAR to the databases that might be useful to them and (ii) to provide database developers a venue to publish articles to promote their resources and introduce their work to the community that might benefit from it. Based on a number of measures (such as the numbers of downloads, literature citations and web links from outside sources), the NAR Database Issue and Database Collection have been extremely successful. Despite rather strict acceptance criteria (1), the number of submitted articles greatly exceeds the capacity of a single annual issue. In order to accommodate this, Oxford University Press, the publisher of NAR, has recently launched the new journal Database: The Journal of Biological Databases and Curation (http://database.oxfordjournals.org/). We hope that the availability of this new journal, as well as that of our other sister journal, Bioinformatics, will provide a publication venue for databases that could not be accepted in the NAR Database Issue because of their limited scope, absence of manual curation or orientation to a limited readership.

The data resources of the Database Issue and the Database Collection make up an invaluable infrastructure upon which much of life science has come to rely. Far more than a collection of distinct information sources, the resources form an extensive and evolving network of connected data, common concepts and shared technologies, driven forward by the collective efforts of developers, curators and database managers. While there is no moderator of this network and no overall controller of its growth, through peer review and editorial processes, the Database Issue and Database Collection provide a valuable quality assurance service to the reader.

In this editorial, we first outline some of the new and updated databases that will be of interest to readers of the Database Issue. While individually these databases offer great utility, it is perhaps as part of the community of people, data and technology that the resources offer up some of their richest uses; we complete our introduction to the Database Issue with a commentary on this community.


NEW AND UPDATED DATABASES

In addition to the usual updates on the database services at the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI), this issue includes a comprehensive listing of Japanese databases provided by the Japanese National BioResource Project [http://www.nbrp.jp (2)]. The geographic distribution of the featured databases continues to grow; the phiSITE (http://www.phisite.org), a database of gene regulation in bacteriophages (3), is the first database in the list from Slovakia.

Several articles in this issue feature updates on the status of databases that have been included in the Database Collection after being described first in other journals. These include the Eukaryotic Linear Motif database [ELM, http://elm.eu.org/ (4)], the Catalogue Of Somatic Mutations In Cancer [COSMIC, http://www.sanger.ac.uk/genetics/CGP/cosmic/ (5)], MicrobesOnline [http://www.MicrobesOnline.org (6)], the Immune Epitope Database [http://www.immuneepitope.org/ (7)] and PDBselect [http://bioinfo.tg.fh-giessen.de/pdbselect/ (8)]. Several other articles describe updated features of such popular resources as the Comprehensive Microbial Resource [CMR, http://cmr.jcvi.org/ (9)], PrimerBank [http://pga.mgh.harvard.edu/primerbank/ (10)] and the Therapeutic Target Database [http://xin.cz3.nus.edu.sg/group/cjttd/ttd.asp (11)], which have been last described in NAR several years ago.

In previous issues, update articles were permitted only brief descriptions of the latest changes in the respective resource. We felt that this limitation was unnecessary and that readers might benefit from more extensive and detailed descriptions of key database resources. For this issue, we have invited the authors responsible for several popular data resources to submit extended papers to provide a deeper insight into the organization and goals of their respective resources and would put the recent changes in these resources into a broader context. We are very happy with several excellent papers that resulted from this initiative, including comprehensive descriptions of the recent changes in Pfam [http://pfam.sanger.ac.uk/ (12)], MetaCyc [http://metacyc.org/ (13)], UniProt [http://www.uniprot.org/ (14)], IntAct [http://www.ebi.ac.uk/intact/ (15)], the Eukaryotic Linear Motif database [ELM, http://elm.eu.org/ (4)], the Comprehensive Microbial Resource [CMR, http://cmr.jcvi.org/ (9)] and the Integrated Microbial Genomes system [IMG, http://img.jgi.doe.gov/ (16)].

In addition, we have included extensive descriptions of three key databases recently unveiled by the EBI: the Gene Expression Atlas [http://www.ebi.ac.uk/gxa, (17)], Ensembl Genomes [http://www.ensemblgenomes.org, (18)] and the Protein Data Bank in Europe [PDBe, http://www.ebi.ac.uk/pdbe/ (19)]. We expect to continue this approach with longer articles next year; database authors who would like to submit such descriptions of their resources are encouraged to contact Michael Galperin at nardatabase@gmail.com in advance.


A COMMUNITY OF DATA RESOURCES

Some of the greatest efforts that have brought connectivity between the records of distinct resources were conceived not to serve pre-defined sets of users, but rather as open-ended initiatives that would provide broad utility across multiple domains. Perhaps the best known of these is the Gene Ontology (GO) project [http://www.geneontology.org/ (20)], which finds itself at the heart of the community of resources described in this Database Issue and Database Collection, with many of them providing GO annotations. Annotation of a common GO term to data objects in distinct resources allows the user to infer a conceptual relationship between the objects that is described by the term; by extension, more distant relationships can be inferred by using ontological relationships within GO to reach terms common to distinct data objects.

A shared approach to the development of data models is a theme in the Database Collection. For example, many model organism databases, such as Flybase [http://flybase.org/ (21)], Beetlebase [http://beetlebase.org/ (22)], ParameciumDB [http://paramecium.cgm.cnrs-gif.fr/db/index (23)] and wFleaBase [http://wfleabase.org/ (24)], have adopted the Chado database schema as underlying data structure for their resources. Chado, delivered and maintained by the GMOD community, provides database and interface technology for a broad range of information typically required by users of a model organism database, including genomic data, expression data, phenotypic information and literature collections (25). Centred on ontologies and controlled vocabularies, the schema is extensible through its system of domain-specific modules. Model organism databases that adopt Chado benefit from reduced development costs, as they are immediately able to use the substantial body of technologies (such as genome browsers, gene pages, search tools) that are freely available. In addition, users of these databases can apply their knowledge and experience of Chado-based interfaces to all model organism databases that have adopted the schema.

A key strategy for many resources in the Database Collection is partnering to exchange data, as a means of achieving comprehensive coverage and to reduce overall effort in data management. Successful data exchange relies not least on agreement to structure information in compatible ways. In 1982, the databases of the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org) established the Feature Table Definitions and continue to maintain the definitions to this day. The document defines a formalised text format in which biological features and their sequence coordinates can be described. INSDC feature table format, as defined in the Feature Table Definitions, remains the file format under which INSDC annotation data are kept in daily global synchrony.

An unsung hero, perhaps, in this community is the taxonomic backbone upon which almost all of its resources hang. The NCBI Taxonomy project (http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=handbook&part=ch4) was established in 1991 and has been adopted as the de facto standard taxonomic classification of biomolecular data. While there are minor deviations and while for many resources with limited taxonomic range (such as the model organism databases), a model of taxonomy is not implicitly required, those resources that deal with multiple taxa all share this one system. The heaviest users are the large generalist resources [INSDC, PDBe (19), UniProt (14)], but even specialist resources adopt the system. The benefit to the user of the resource, of course, is the ability to approach, filter and link biomolecular information for any given taxon or set of taxa.

Finally, there are the direct relationships between items of information in the resources. These take many forms: perhaps molecular sequences are similar between objects, the same genes are described, a functional correspondence between objects exists or common technology for presenting data binds two resources together. Some of these relationships are derived computationally, some manually curated from the literature and some exchanged between databases in reciprocal exchanges. The strengths of these relationships also vary; some resources share data directly leading to exact mappings between their objects, others report common attributes of their objects. Relationships can be asserted explicitly in cross-references, implied through common references to objects in tertiary resources or the user can be left to inject his/her own knowledge and creativity to bring a specific area of the network into sharper focus.

For those generating and interpreting data to be fed into the resources of the Database Collection, shared technologies and concepts need to be used. For this to happen, they need to be understood and readily available. Community standardisation initiatives, with their repertoire of minimal reporting standards and technology development initiatives to support the use of their standards, make their contribution here. Seminal work in the microarray field under the Microarray Gene Expression Data (MGED) consortium led to the development and adoption of MIAME?Minimal Information about a Microarray Experiment?a checklist of items of information required to render microarray data reusable beyond the initial analysis of those who generated the data (26). Since this time, a whole host of minimal reporting standards have been developed to better the usability of data, across genomics and environmental sequencing [the MIGS, MIMS and MIENS standards of the Genomics Standards Consortium and the emerging MINSEQE standard from MGED (27), http://gensc.org/, http://www.mged.org/minseqe/], proteomics [the MIAPE standard (28)], cell-based assays (MIACA; http://miaca.sourceforge.net/), phylogenetics [MIAPA (29)], systems biology [MIRIAM (30)] and many more.

Plenty of additional cases of such collective investment exist, but from the examples of vocabulary, taxonomic backbone, shared data models, common file formats and cross-references development initiatives alone, the value to the user is already clear. A continued attention to collective effort in such areas remains key to optimising the utility of our community of resources.

As then, we prepare grant proposals to generate, interpret and present our latest and greatest data, we make reference to interoperability, shared effort to develop technologies and the need for cooperation and collaboration. The community of resources described in this Database Issue and Database Collection provides a compelling example of the many successes that can be won with investment in interoperability and shared effort. Curators, developers, managers and users of life science databases know well the ongoing importance of developing and maintaining connectivity between our resources and cooperation between those involved.


FUNDING

European Molecular Biology Laboratory (to G.R.C.); Intramural Research Program of the US National Institutes of Health (to M.Y.G.). Funding for open access charge: Oxford University Press.

Conflict of interest statement. None declared.


ACKNOWLEDGEMENTS

The authors thank Scott Federhen (NCBI) and Emily Dimmer and Cath Brooksbank (EBI) for providing background information that helped in the preparation of this editorial; Sir Richard Roberts and Alex Bateman for many helpful comments; Patricia Anderson, Martine Bernardes-Silva, Karen Otto and Gail Welsh for excellent editorial assistance; and the Oxford University Press teams lead by Claire Bird and Radha Dutia for their help in compiling this issue.


REFERENCES
1. Galperin M.Y.,Cochrane G.R.. Nucleic Acids Research annual Database Issue and the NAR online Molecular Biology Database Collection in 2009Nucleic Acids Res.Year: 200937D1D419033364
2. Yamazaki Y.,Akashi R.,Banno Y.,Endo T.,Ezura H.,Fukami-Kobayashi K.,Inaba K.,Isa T.,Kamei K.,Kasai F.,et al. NBRP databases: databases of biological resources in JapanNucleic Acids Res.Year: 201038D26D3219934255
3. Klucar L.,Stano M.,Hajduk M.. phiSITE: Database of gene regulation in bacteriophagesNucleic Acids Res.Year: 201038D366D37019900969
4. Gould C.M.,Diella F.,Via A.,Puntervoll P.,Gem?nd C.,Chabanis-Davidson S.,Michael S.,Sayadi A.,Bryne J.C.,Chica C.,et al. ELM: the status of the 2010 eukaryotic linear motif resourceNucleic Acids Res.Year: 201038D167D18019920119
5. Forbes S.,Tang G.,Bindahl N.,Bamford S.,Dawson E.,Cole C.,Kok C.Y.,Jia M.,Ewing R.,Menzies A.,et al. COSMIC (the catalogue of somatic mutations in cancer): a resource to investigate acquired mutations in human cancerNucleic Acids Res.Year: 201038D652D65719906727
6. Dehal P.S.,Joachimiak M.P.,Price M.N.,Bates J.T.,Baumohl J.K.,Chivian D.,Huang K.H.,Keller K.,Novichkov P.S.,Dubchak I.L.,et al. MicrobesOnline: an integrated portal for comparative and functional genomicsNucleic Acids Res.Year: 201038D396D40019906701
7. Vita R.,Zarebski L.,Greenbaum J.,Emami H.,Hoof I.,Salimi N.,Damle R.,Sette A.,Peters B.. The Immune Epitope Database 2.0Nucleic Acids Res.Year: 201038D854D86219906713
8. Griep S.,Hobohm U.. PDBselect 1992-2009 and PDBfilter-selectNucleic Acids Res.Year: 201038D318D31919783827
9. Davidsen T.,Beck E.,Ganapathy A.,Montgomery R.,Zafar N.,Yang Q.,Madupu R.,Goetz P.,Galinsky K.,White O.,et al. The Comprehensive Microbial Resource (CMR)Nucleic Acids Res.Year: 201038D340D34519892825
10. Spandidos A.,Wang X.,Wang H.,Seed B.. PrimerBank: a resource of human and mouse PCR primer pairs for gene expression detection and quantificationNucleic Acids Res.Year: 201038D792D79919906719
11. Zhu F.,Han B.C.,Kumar P.,Liu X.H.,Ma X.H.,Wei X.N.,Huang L.,Guo Y.F.,Han L.Y.,Zheng C.J.,et al. Update of TTD: Therapeutic Target DatabaseNucleic Acids Res.Year: 201038D787D79119933260
12. Finn R.D.,Mistry J.,Tate J.,Coggill P.,Heger A.,Pollington J.E.,Gavin O.L.,Ceric G.,Forslund K.,Holm L.,et al. The Pfam protein families databaseNucleic Acids Res.Year: 201038D211D22219920124
13. Caspi R.,Altman T.,Dale J.M.,Dreher K.,Fulcher C.A.,Gilham F.,Kaipa P.,Karthikeyan A.S.,Kothari A.,Krummenacker M.,et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databasesNucleic Acids Res.Year: 201038D473D47919850718
14. The UniProt ConsortiumThe Universal Protein Resource (UniProt) in 2010Nucleic Acids ResYear: 201038D142D14819843607
15. Aranda B.,Achuthan P.,Alam-Faruque Y.,Armean I.,Bridge A.,Derow C.,Feuermann M.,Ghanbarian A.T.,Kerrien S.,Khadake J.,et al. The IntAct molecular interaction database in 2010Nucleic Acids Res.Year: 201038D525D53119850723
16. Markowitz V.M.,Chen I.A.,Palaniappan K.,Chu K.,Szeto E.,Grechkin Y.,Ratner A.,Anderson I.,Lykidis A.,Mavromatis K.,et al. The integrated microbial genomes (IMG) system: an expanding comparative analysis resourceNucleic Acids Res.Year: 201038D382D39019864254
17. Kapushesky M.,Emam I.,Holloway E.,Kurnosov P.,Zorin A.,Malone J.,Rustici G.,Williams E.,Parkinson H.,Brazma A.. Gene Expression Atlas at the European Bioinformatics InstituteNucleic Acids Res.Year: 201038D690D69819906730
18. Kersey P.J.,Lawson D.,Birney E.,Derwent P.S.,Haimel M.,Herrero J.,Keenan S.,Kerhornou A.,Koscielny G.,K?h?ri A.,et al. Ensembl Genomes - Extending Ensembl across the taxonomic spaceNucleic Acids Res.Year: 201038D563D56919884133
19. Velankar S.,Best C.,Beuth B.,Boutselakis C.H.,Cobley N.,Sousa da Silva A.W.,Dimitropoulos D.,Golovin A.,Hirshberg M.,John M.,et al. PDBe: Protein Data Bank in EuropeNucleic Acids Res.Year: 201038D308D31719858099
20. The Gene Ontology ConsortiumThe Gene Ontology enters its second decadeNucleic Acids Res.Year: 201038D331D33519920128
21. Tweedie S.,Ashburner M.,Falls K.,Leyland P.,McQuilton P.,Marygold S.,Millburn G.,Osumi-Sutherland D.,Schroeder A.,Seal R.,et al. FlyBase: enhancing Drosophila Gene Ontology annotationsNucleic Acids Res.Year: 200937D555D55918948289
22. Kim H.S.,Murphy T.,Xia J.,Caragea D.,Park Y.,Beeman R.W.,Lorenzen M.D.,Butcher S.,Manak J.R.,Brown S.J.. BeetleBase in 2010: revisions to provide comprehensive genomic information for Tribolium castaneumNucleic Acids Res.Year: 201038D437D44219820115
23. Arnaiz O.,Cain S.,Cohen J.,Sperling L.. ParameciumDB: a community resource that integrates the Paramecium tetraurelia genome sequence with genetic dataNucleic Acids ResYear: 200735D439D44417142227
24. Colbourne J.K.,Singan V.R.,Gilbert D.G.. wFleaBase: the Daphnia genome databaseBMC BioinformaticsYear: 200564515752432
25. Mungall C.J.,Emmert D.B.. A Chado case study: an ontology-based modular schema for representing genome-associated biological informationBioinformaticsYear: 200723i337i34617646315
26. Brazma A.,Hingamp P.,Quackenbush J.,Sherlock G.,Spellman P.,Stoeckert C.,Aach J.,Ansorge W.,Ball C.A.,Causton H.C.,et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray dataNat. Genet.Year: 20012936537111726920
27. Field D.,Garrity G.,Gray T.,Morrison N.,Selengut J.,Sterk P.,Tatusova T.,Thomson N.,Allen M.J.,Angiuoli S.V.,et al. The minimum information about a genome sequence (MIGS) specificationNat. Biotechnol.Year: 20082654154718464787
28. Taylor C.F.,Paton N.W.,Lilley K.S.,Binz P.A.,Julian R.K. Jr,Jones A.R.,Zhu W.,Apweiler R.,Aebersold R.,Deutsch E.W.,et al. The minimum information about a proteomics experiment (MIAPE)Nat. Biotechnol.Year: 20072588789317687369
29. Leebens-Mack J.,Vision T.,Brenner E.,Bowers J.E.,Cannon S.,Clement M.J.,Cunningham C.W.,dePamphilis C.,deSalle R.,Doyle J.J.,et al. Taking the first steps towards a standard for reporting on phylogenies: minimum information about a phylogenetic analysis (MIAPA)OmicsYear: 20061023123716901231
30. Le Novere N.,Finney A.,Hucka M.,Bhalla U.S.,Campagne F.,Collado-Vides J.,Crampin E.J.,Halstead M.,Klipp E.,Mendes P.,et al. Minimum information requested in the annotation of biochemical models (MIRIAM)Nat. Biotechnol.Year: 2005231509151516333295

Article Categories:
  • Articles


Previous Document:  A calibrated diversity assay for nucleic acid libraries using DiStRO--a Diversity Standard of Random...
Next Document:  Dr1 (NC2) is present at tRNA genes and represses their transcription in human cells.