Automatic annotation

Last modified July 16, 2020

UniProt's Automatic Annotation pipeline enriches the unreviewed records in UniProtKB with automatic classification and annotation.

Automatic classification and domain annotation

UniProt uses InterPro to classify sequences at superfamily, family and subfamily levels and to predict the occurrence of functional domains and important sites. InterPro integrates predictive models of protein function, so-called 'signatures', from a number of member databases. InterPro matches are automatically annotated to UniProtKB entries as database cross-references with every InterPro release.

In UniProtKB/TrEMBL entries, domains from the InterPro member databases PROSITE, SMART or Pfam are predicted and annotated automatically, and their evidence/source labels indicate "InterPro annotation".
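The way member-database signature matches roll up into InterPro cross-references can be pictured with a small sketch. It assumes a simple lookup table from (member database, signature) to an InterPro entry; every accession, class and function name below (`SignatureMatch`, `interpro_xrefs`, `PF_EXAMPLE1`, `IPR_EXAMPLE_A`, ...) is hypothetical and chosen for illustration only. It is not UniProt's actual pipeline code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignatureMatch:
    """One hit of a member-database signature on a sequence (hypothetical model)."""
    member_db: str      # e.g. "Pfam", "SMART", "PROSITE"
    signature_id: str   # member-database accession (placeholder values below)
    start: int
    end: int


def interpro_xrefs(matches, mapping):
    """Group signature matches by the InterPro entry that integrates them.

    `mapping` is an assumed table {(member_db, signature_id): interpro_id};
    signatures not integrated into any InterPro entry are skipped.
    """
    grouped = {}
    for m in matches:
        ipr = mapping.get((m.member_db, m.signature_id))
        if ipr is None:
            continue  # unintegrated signature: no InterPro cross-reference
        grouped.setdefault(ipr, set()).add(f"{m.member_db}:{m.signature_id}")
    return [
        {
            "database": "InterPro",
            "id": ipr,
            "signatures": sorted(sigs),
            "evidence": "InterPro annotation",
        }
        for ipr, sigs in sorted(grouped.items())
    ]


if __name__ == "__main__":
    # Hypothetical mapping: two member signatures integrated into one InterPro entry.
    mapping = {
        ("Pfam", "PF_EXAMPLE1"): "IPR_EXAMPLE_A",
        ("SMART", "SM_EXAMPLE1"): "IPR_EXAMPLE_A",
    }
    matches = [
        SignatureMatch("Pfam", "PF_EXAMPLE1", 10, 120),
        SignatureMatch("SMART", "SM_EXAMPLE1", 12, 118),
        SignatureMatch("PROSITE", "PS_UNINTEGRATED", 200, 230),
    ]
    for xref in interpro_xrefs(matches, mapping):
        print(xref)
```

Grouping by InterPro entry reflects the idea that overlapping signatures from different member databases describing the same family or domain collapse into a single integrated cross-reference.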

Automatic annotation

UniProt has developed two prediction systems, UniRule and the Association-Rule-Based Annotator (ARBA), to annotate UniProtKB/TrEMBL automatically in an efficient and scalable manner with a high degree of accuracy.

The rules that make up these two prediction systems can be browsed and queried in dedicated sections of the UniProt website.
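Conceptually, a rule pairs a set of conditions with annotations that are propagated to every unreviewed entry satisfying those conditions. The sketch below assumes only two kinds of condition, required InterPro matches and a taxonomic constraint; real UniRule and ARBA rules can be richer. All rule identifiers, field names and values are hypothetical placeholders, not UniProt's rule format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A simplified annotation rule (hypothetical structure)."""
    rule_id: str                       # placeholder identifier
    required_signatures: frozenset     # InterPro entries that must all match
    taxon: str                         # lineage constraint, e.g. "Bacteria"
    annotations: tuple                 # (field, value) pairs to propagate


def apply_rules(signatures, lineage, rules):
    """Return (field, value, rule_id) for every rule whose conditions hold.

    `signatures` is the set of InterPro entries matching the sequence and
    `lineage` is the list of taxa of the source organism.
    """
    results = []
    for rule in rules:
        # All required signatures present AND the organism falls in the taxon.
        if rule.required_signatures <= signatures and rule.taxon in lineage:
            for field, value in rule.annotations:
                results.append((field, value, rule.rule_id))
    return results


if __name__ == "__main__":
    rules = [
        Rule("RULE_EXAMPLE_1", frozenset({"IPR_EXAMPLE_A"}), "Bacteria",
             (("protein_name", "Example protein"), ("keyword", "Example keyword"))),
        Rule("RULE_EXAMPLE_2", frozenset({"IPR_EXAMPLE_A", "IPR_EXAMPLE_B"}), "Bacteria",
             (("function", "Example function"),)),
    ]
    # Only RULE_EXAMPLE_1 fires: IPR_EXAMPLE_B is not matched.
    print(apply_rules({"IPR_EXAMPLE_A"}, ["cellular organisms", "Bacteria"], rules))
```

Recording the rule identifier with each propagated annotation mirrors the general practice of attributing automatic annotations to their source, so that a prediction can be traced back to the rule that produced it.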

We also use a suite of Sequence Analysis Methods (SAM) to enrich the unreviewed UniProtKB/TrEMBL records with additional sequence-specific information. Predictions of sequence features such as signal peptides, transmembrane regions and coiled-coil regions are generated using software from external providers.

UniProt is an ELIXIR core data resource
Main funding by: National Institutes of Health
