Annotation and Enrichment Analysis

Biostatistical Methods


BioXpedia is proud to offer data analysis using annotation and enrichment analysis.

This data analysis uses gene set enrichment analysis (GSEA) to determine whether a class of genes or proteins (e.g. a pathway) is over-represented in a large set of genes or proteins.

The data analysis includes the following components:

  • Detailed PDF report.
  • Data handling.
  • Application of GSEA.
  • Visualization of enriched genes or proteins.

Read below for more information on annotation and enrichment analysis:

Annotation of genes is a crucial part of any gene expression analysis. The raw output of an expression analysis is usually not easy to interpret because it does not include gene names. By performing annotation, the expression measurements are matched with gene names in databases. This couples the results with existing biological knowledge to aid in the interpretation.
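As a minimal sketch of the matching step described above, the Python snippet below attaches gene symbols to expression measurements through a lookup table. The table `ANNOTATION`, the function `annotate` and the expression values are hypothetical and invented for illustration; a real analysis would query an annotation database rather than a hard-coded dictionary.

```python
# Hypothetical lookup table mapping Ensembl gene IDs to gene symbols.
# In a real analysis this mapping would come from an annotation database.
ANNOTATION = {
    "ENSG00000141510": "TP53",
    "ENSG00000012048": "BRCA1",
    "ENSG00000146648": "EGFR",
}


def annotate(measurements, annotation):
    """Attach a gene symbol to each (feature_id, value) measurement.

    Features missing from the annotation table get the symbol None,
    so they stay visible instead of being silently dropped.
    """
    annotated = []
    for feature_id, value in measurements:
        symbol = annotation.get(feature_id)
        annotated.append({"id": feature_id, "symbol": symbol, "value": value})
    return annotated


# Invented expression values, used only to show the shape of the data.
measurements = [
    ("ENSG00000141510", 2.31),
    ("ENSG00000012048", -1.07),
    ("ENSG00000000000", 0.42),  # made-up ID with no entry in the table
]

annotated = annotate(measurements, ANNOTATION)
for row in annotated:
    print(row["id"], row["symbol"], row["value"])
```

Keeping unmatched features in the output with an empty symbol is a design choice in this sketch: it makes gaps in the annotation easy to spot during interpretation.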

If the experiment is based on RNA-seq, microarray or any other method that covers a nearly exhaustive number of genes or proteins, the dataset becomes very large, so large that interpretation can be overwhelming even after annotation. Enrichment analysis is a useful tool in this process (Subramanian et al., 2005).

Expression analysis often includes a test for differential expression to investigate which genes are expressed at different levels in cases and controls. With a long list of differentially expressed genes, Gene Set Enrichment Analysis (GSEA) (Mootha et al., 2003) can help to identify what these genes have in common. GSEA can be used to test whether a list of genes is enriched for genes encoding proteins in a certain pathway, genes from a certain chromosomal region, or genes that encode proteins which fall in the same category, e.g. membrane transport. The test checks whether the list contains more genes from a given pathway or category than would be expected by chance. The assumptions of the test require the number of genes to be very large, so enrichment analysis is only suitable for experiments using methods like RNA-seq or microarray, as mentioned above. Annotation, however, is useful for any number of genes (Reimand et al., 2019).
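The "more genes than expected by chance" comparison can be illustrated with a one-sided hypergeometric test, a common way to test over-representation of a gene set in a gene list. This is a sketch under stated assumptions, not the exact procedure used in the service: the GSEA method of Subramanian et al. (2005) uses a rank-based running-sum statistic instead, and the function names and counts below are hypothetical.

```python
from math import comb


def expected_overlap(n_universe, n_in_set, n_selected):
    """Number of gene-set members expected in the list by chance alone."""
    return n_selected * n_in_set / n_universe


def enrichment_p_value(n_universe, n_in_set, n_selected, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    n_universe: all genes measured in the experiment
    n_in_set:   genes in the pathway or category of interest
    n_selected: genes in the differentially expressed list
    n_overlap:  genes that are in both the list and the set
    """
    max_overlap = min(n_in_set, n_selected)
    total = comb(n_universe, n_selected)
    # Sum the probabilities of every outcome at least as extreme as observed.
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Invented counts for illustration: 20,000 measured genes, a pathway of
# 100 genes, 500 differentially expressed genes, 10 of them in the pathway.
expected = expected_overlap(20000, 100, 500)
p_value = enrichment_p_value(20000, 100, 500, 10)
print(f"expected overlap by chance: {expected}")
print(f"one-sided p-value: {p_value:.3g}")
```

In practice this test is run for many gene sets at once, so the resulting p-values would normally be corrected for multiple testing before interpretation.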




  1. Mootha, V., Lindgren, C., Eriksson, K. et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34, 267–273 (2003).

  2. Reimand, J. et al. Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap. Nat Protoc 14, 482–517 (2019).

  3. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102, 15545–15550 (2005).