Life in BIOINFORMATICS......: Databases

Monday, October 19, 2009

UCSC Genome Browser

This site contains the reference sequence and working draft assemblies for a large collection of genomes. It also provides a portal to the ENCODE project. The UCSC Genome Browser is developed and maintained by the Genome Bioinformatics Group, a cross-departmental team within the Center for Biomolecular Science and Engineering (CBSE) at the University of California Santa Cruz (UCSC).

This application includes the following features and software applications:

Monday, October 12, 2009

What Are Databases?

Databases

At the beginning of the "genomic revolution," a central bioinformatics concern was the creation and maintenance of databases to store biological information, such as nucleotide and amino acid sequences. Developing this type of database involved not only design issues, but also the development of complex interfaces through which researchers could both access existing data and submit new or revised data.

Ultimately, however, all of this information must be combined to form a comprehensive picture of normal cellular activities so that researchers may study how these activities are altered in different disease states. Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures. The actual process of analysing and interpreting data is referred to as computational biology. Important sub-disciplines within bioinformatics and computational biology include:

The development and implementation of tools that enable efficient access to, and use and management of, various types of information;

The development of new algorithms (mathematical formulas) and statistics with which to assess relationships among members of large data sets, such as methods to locate a gene within a sequence, predict protein structure and/or function, and cluster protein sequences into families of related sequences.

Biological databases

A biological database is a large, organised body of persistent data, usually associated with computerised software designed to update, query, and retrieve components of the data stored within the system. A simple database might be a single file containing many records, each of which includes the same set of information. For example, a record associated with a nucleotide sequence database typically contains information such as contact name; the input sequence with a description of the type of molecule; the scientific name of the source organism from which it was isolated; and, often, literature citations associated with the sequence. For researchers to benefit from the data stored in a database, two additional requirements must be met:

Easy access to the information;

A method for extracting only that information needed to answer a specific biological question.
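As a minimal sketch of the "single file containing many records" idea, the Python below parses a small tab-separated flat file in which every record carries the same fields, then extracts only the records that answer one specific question. The field names, the sample rows, and the helper names `load_records` and `query` are illustrative assumptions, not the format of any real sequence database.

```python
import csv
import io

# Hypothetical flat-file "database": one record per line, same fields in each.
# The accessions, contacts, and sequences are made up for illustration only.
FLAT_FILE = """accession\tcontact\tmolecule\torganism\tsequence
X00001\tA. Researcher\tDNA\tHomo sapiens\tATGGCC
X00002\tB. Scientist\tmRNA\tMus musculus\tAUGGCU
X00003\tA. Researcher\tDNA\tMus musculus\tATGAAA
"""

def load_records(text):
    """Parse the tab-separated text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def query(records, **criteria):
    """Return only the records whose fields match every given criterion."""
    return [r for r in records
            if all(r.get(field) == value for field, value in criteria.items())]

records = load_records(FLAT_FILE)

# Question: which DNA records come from mouse?
mouse_dna = query(records, organism="Mus musculus", molecule="DNA")
print([r["accession"] for r in mouse_dna])
```

Real databases add indexing, updates, and cross-references on top of this, but the two requirements listed above (access, and extraction of only the relevant records) are already visible in `load_records` and `query`.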

Entrez

At the NCBI site, many of the databases are linked through a single search and retrieval system called Entrez. Entrez allows a user not only to access and retrieve specific information from a single database, but also to access integrated information from many NCBI databases. For example, the Entrez protein database is cross-linked to the Entrez taxonomy database, which allows a researcher to find taxonomic information for the protein of interest. An overview of the most important databases is given in the Databases section of this site.
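The protein-to-taxonomy cross-link can also be requested programmatically. The sketch below is not from the original post: it assumes NCBI's E-utilities ELink service as the programmatic route into Entrez, and it only builds the request URL without contacting the server. The function name `protein_to_taxonomy_url` and the identifier `12345` are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities ELink service.
ELINK_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

def protein_to_taxonomy_url(protein_id):
    """Build an ELink URL asking for taxonomy records linked to a protein record."""
    params = {
        "dbfrom": "protein",   # database the identifier comes from
        "db": "taxonomy",      # database to follow links into
        "id": protein_id,      # placeholder identifier supplied by the caller
    }
    return ELINK_BASE + "?" + urlencode(params)

url = protein_to_taxonomy_url("12345")  # "12345" is a made-up ID for illustration
print(url)
```

Fetching that URL would return the linked taxonomy identifiers as XML. The network call is left out here so the example stays self-contained.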

UniProt

UniProt is a joint initiative of three major protein database groups:

Protein Information Resource (PIR), hosted by Georgetown University
Swiss Institute of Bioinformatics (SIB)
European Bioinformatics Institute (EBI), part of the European Molecular Biology Laboratory (EMBL)

EMBL (European Molecular Biology Laboratory)

Sunday, October 11, 2009

PROTEIN DATABANKS: Protein Information Resource (PIR)

The Protein Information Resource (PIR) is an integrated public bioinformatics resource to support genomic, proteomic and systems biology research and scientific studies (Wu et al., 2003).

PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) as a resource to assist researchers in the identification and interpretation of protein sequence information. Prior to that, the NBRF compiled the first comprehensive collection of macromolecular sequences in the Atlas of Protein Sequence and Structure, published from 1965 to 1978 under the editorship of Margaret O. Dayhoff. Dr. Dayhoff and her research group pioneered the development of computer methods for the comparison of protein sequences, for the detection of distantly related sequences and duplications within sequences, and for the inference of evolutionary histories from alignments of protein sequences.

Dr. Winona Barker and Dr. Robert Ledley assumed leadership of the project after the untimely death of Dr. Dayhoff in 1983. In 1999 Dr. Cathy H. Wu joined NBRF, and later Georgetown University Medical Center (GUMC), to head the bioinformatics efforts of PIR, and has served first as Principal Investigator and, since 2001, as Director.

For over four decades, beginning with the Atlas of Protein Sequence and Structure, PIR has provided protein databases and analysis tools freely accessible to the scientific community, including the Protein Sequence Database (PSD).

In 2002 PIR, along with its international partners, EBI (European Bioinformatics Institute) and SIB (Swiss Institute of Bioinformatics), was awarded a grant from NIH to create UniProt, a single worldwide database of protein sequence and function, by unifying the PIR-PSD, Swiss-Prot, and TrEMBL databases.

In 2009 Dr. Wu accepted the Edward G. Jefferson Chair of Bioinformatics and Computational Biology at the University of Delaware (UD).

Today, PIR maintains staff at UD and GUMC and continues to offer world-leading resources to assist with proteomic and genomic data integration and the propagation and standardization of protein annotation.

The image above shows the main database centres among which nucleotide (DNA/RNA) and protein sequence data are exchanged and updated.

DNA DATA BANK OF JAPAN (DDBJ)

DNA Data Bank of Japan (DDBJ) is the sole nucleotide sequence data bank in Asia that is officially certified to collect nucleotide sequences from researchers and to issue internationally recognized accession numbers to data submitters. Because DDBJ exchanges the collected data with EMBL-Bank/EBI (European Bioinformatics Institute) and GenBank/NCBI (National Center for Biotechnology Information) on a daily basis, the three data banks share virtually the same data at any given time. This virtually unified database is called the International Nucleotide Sequence Database (INSD). DDBJ collects sequence data mainly from Japanese researchers, but it also accepts data from, and issues accession numbers to, researchers in any other country.
DDBJ is organized by the Center for Information Biology and DNA Data Bank of Japan (CIB-DDBJ) of the National Institute of Genetics (NIG) with the endorsement of the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT). 99% of INSD data from Japanese researchers are submitted through DDBJ.
The principal purpose of DDBJ operations is to improve the quality of INSD as a public-domain resource. When researchers release their data to the public through INSD so that it is shared worldwide, DDBJ aims to describe the data as richly as possible, according to the unified rules of INSD, and to make submission through DDBJ as stress-free as possible.

Saturday, October 10, 2009

NCBI (National Center For Biotechnology Information)

The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health. The NCBI is located in Bethesda, Maryland (38.994994°N 77.099339°W) and was founded in 1988 through legislation sponsored by Senator Claude Pepper. The NCBI houses genome sequencing data in GenBank and an index of biomedical research articles in PubMed Central and PubMed, as well as other information relevant to biotechnology. All these databases are available online through the Entrez search engine.

The NCBI is directed by David Lipman, one of the original authors of the BLAST sequence alignment program and a widely respected figure in bioinformatics. He also leads an intramural research program, including groups led by Stephen Altschul (another BLAST co-author), David Landsman, and Eugene Koonin (a prolific author on comparative genomics).

Wednesday, September 23, 2009

NCBI (National Center For Biotechnology Information)

http://www.ncbi.nlm.nih.gov/
Bioinformatics was first started as a field of study under the department of Biotechnology at the National Center for Biotechnology Information...Estb

interrelations between the various databases in NCBI

http://www.ncbi.nlm.nih.gov/Database/datamodel/index.html
