
Current release


The entire OMA database is available for download in several formats. It is also possible to download each group separately. This option is available in the group view. Please read our terms and conditions before integrating OMA data into your own research or database.



Orthology Relationships

The orthology relationships are available in two forms: groups of orthologs and pairs of orthologs. All entries are given as OMA identifiers (of the form HUMAN04376).

OMA groups: Text format, OrthoXML format
Hierarchical orthologous groups (HOGs): OrthoXML format
Species phylogeny of HOGs: PhyloXML format, Newick format
Pairwise orthologs: Text format
Pairs between two species: Genome Pair View
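As a small illustration of working with the identifiers used in these files, the sketch below splits an OMA identifier into a species code and an entry number. The pattern (a five-character species code followed by digits) is an assumption inferred from the single example HUMAN04376, and `split_oma_id` is a hypothetical helper, not part of any OMA tool.

```python
import re

# Assumption: five-character species code followed by a numeric entry index,
# inferred from the example identifier HUMAN04376.
OMA_ID = re.compile(r"^([A-Z][A-Z0-9]{4})(\d+)$")

def split_oma_id(oma_id):
    """Split an identifier such as 'HUMAN04376' into (species_code, entry_number)."""
    match = OMA_ID.match(oma_id)
    if match is None:
        raise ValueError(f"not an OMA-style identifier: {oma_id!r}")
    return match.group(1), int(match.group(2))

print(split_oma_id("HUMAN04376"))
```

Rejecting strings that do not match keeps malformed lines from being silently treated as valid identifiers when a text file is processed.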

Sequences

All sequences, labeled with their OMA identifiers, can be downloaded as FASTA files. All proteins are in a single file, while the coding DNA is split into two files, one for eukaryotes and one for prokaryotes.

Protein sequences: FASTA format, SeqXML format
cDNA Eukaryotes: FASTA format
cDNA Prokaryotes: FASTA format
Protein Annotations: Text format
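A downloaded FASTA file can be indexed by OMA identifier with a few lines of Python. The sketch below is a minimal reader, assuming that the first whitespace-separated token of each header line is the OMA identifier; the records shown are made up for illustration and `read_fasta` is a hypothetical helper.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# Made-up records; real headers may carry additional fields.
example = io.StringIO(">HUMAN04376\nMKT\nAYI\n>HUMAN04377\nMSS\n")

# Assumption: the first token of the header is the OMA identifier.
sequences = {header.split()[0]: seq for header, seq in read_fasta(example)}
print(sequences["HUMAN04376"])
```

For a real download, the `io.StringIO` object would be replaced by an open file handle; the generator reads one line at a time, so the whole file is never held in memory unless the results are collected into a dictionary as here.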

Identifier Mapping

Mappings of the OMA identifier to various other databases are available. Mappings to UniProt, RefSeq and Entrez Gene IDs are based on exact sequence matches; other cross-references come directly from the source genome files.

Mapping to UniProt: Text format
Mapping to Ensembl: Text format
Mapping to RefSeq ACs: Text format
Mapping to Entrez Gene IDs: Text format
Mapping to NCBI GIs: Text format
Mapping to NCBI GenBank IDs: Text format
Mapping to WormBase: Text format
Mapping to JGI: Text format
Mapping to GO: Text format
Plant mappings: Text format
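The mapping files can be loaded into a lookup table once their layout is known. The sketch below assumes a tab-separated layout with the OMA identifier in the first column, the external identifier in the second, and comment lines starting with `#`; this layout is an assumption and should be checked against the downloaded file. The cross-reference values and the `read_mapping` helper are hypothetical.

```python
import io
from collections import defaultdict

def read_mapping(handle):
    """Collect external identifiers per OMA identifier from a two-column text stream."""
    mapping = defaultdict(list)
    for line in handle:
        line = line.rstrip("\n")
        # Assumption: '#' starts a comment or header line.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip lines that do not have both columns
        mapping[fields[0]].append(fields[1])
    return dict(mapping)

# Made-up content; XREF_A and XREF_B are placeholders, not real accessions.
example = io.StringIO(
    "# OMA ID\texternal ID\n"
    "HUMAN04376\tXREF_A\n"
    "HUMAN04376\tXREF_B\n"
)
xrefs = read_mapping(example)
print(xrefs["HUMAN04376"])
```

A list is kept per OMA identifier because one protein may map to several entries in the target database.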

Other files

OMA Groups/Sequences in COGs format: COG format
Species information (taxon IDs, scientific names, genome sources): Text format
Group descriptions: Text format
Close OMA Groups: Text format

OMA ID History

Mappings of the OMA identifiers of updated genomes from one release to another are available. Only proteins whose amino acid sequences are identical between releases are tracked.