GSEAPreranked GenePattern Module Documentation

A GenePattern module for running the GSEA Preranked method

GSEAPreranked (v7.4.x)

Runs the gene set enrichment analysis against a user-supplied ranked list of genes.

Author: Chet Birger, David Eby; Broad Institute

Contact:

See the GSEA forum for GSEA questions.

Contact the GenePattern team for GenePattern issues.

GSEA Version: 4.3.x

Introduction

GSEAPreranked runs Gene Set Enrichment Analysis (GSEA) against a user-supplied, ranked list of genes. It determines whether a priori defined sets of genes show statistically significant enrichment at either end of the ranking. A statistically significant enrichment indicates that the biological activity (e.g., biomolecular pathway) characterized by the gene set is correlated with the user-supplied ranking.

Details

Gene Set Enrichment Analysis (GSEA) is a powerful analytical method for interpreting gene expression data. It evaluates cumulative changes in the expression of groups of multiple genes defined based on prior biological knowledge.

The GSEAPreranked module can be used to conduct gene set enrichment analysis on data that do not conform to the typical GSEA scenario. For example, it can be used when the ranking metric choices provided by the GSEA module are not appropriate for the data, or when a ranked list of genomic features deviates from traditional microarray expression data (e.g., GWAS results, ChIP-Seq, RNA-Seq, etc.).

The user provides GSEAPreranked with a pre-ranked gene list. Paired with each gene in the list is a numeric ranking statistic, which GSEAPreranked uses to sort the genes in descending order. GSEAPreranked calculates an enrichment score for each gene set. A gene set’s enrichment score reflects how often members of that gene set occur at the top or bottom of the ranked data set (for example, in expression data, in either the most highly expressed genes or the most underexpressed genes).

The ranked list must not contain duplicate ranking values.

Genes that share a ranking value are ordered arbitrarily, which can lead to erroneous results. Check the ranked list for ties before running the module.
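A pre-flight check for tied ranking values can be sketched in a few lines of Python. This is an illustration only and is not part of the module; the function name `find_tied_scores` and the example genes and values are hypothetical.

```python
from collections import defaultdict


def find_tied_scores(ranked):
    """Group gene names by ranking value; keep only values shared by 2+ genes."""
    by_score = defaultdict(list)
    for gene, score in ranked:
        by_score[score].append(gene)
    return {score: genes for score, genes in by_score.items() if len(genes) > 1}


# Hypothetical ranked list of (gene, ranking statistic) pairs.
example = [("TP53", 2.31), ("MYC", 1.75), ("EGFR", 1.75), ("BRCA1", -0.42)]
ties = find_tied_scores(example)
for score, genes in ties.items():
    print(f"ranking value {score} is shared by: {', '.join(genes)}")
```

How ties should be resolved (for example, with a secondary statistic) depends on the data; the sketch only reports them.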

Permutation test

In GSEAPreranked, permutations are always done by gene set. In standard GSEA, you can choose to set the parameter Permutation type to phenotype (the default) or gene set, but GSEAPreranked does not provide this option.
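The idea of permuting by gene set can be sketched as follows: random gene sets of the same size as the real one are drawn from the ranked list, and the set-level statistic is recomputed for each draw to build a null distribution. This is a minimal sketch, not the module's implementation. The function names, the `statistic` callback, and the `mean_score` stand-in (used here instead of the enrichment score to keep the example short) are all assumptions made for illustration.

```python
import random


def gene_set_permutation_null(ranked, gene_set, statistic, n_perm=1000, seed=0):
    """Return (observed statistic, null values from random same-size gene sets)."""
    genes = [g for g, _ in ranked]
    members = set(gene_set) & set(genes)
    if not members:
        raise ValueError("gene set has no members in the ranked list")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistic(ranked, members)
    null = [
        statistic(ranked, set(rng.sample(genes, len(members))))
        for _ in range(n_perm)
    ]
    return observed, null


def mean_score(ranked, members):
    """Stand-in statistic (NOT the GSEA enrichment score): mean ranking value."""
    values = [s for g, s in ranked if g in members]
    return sum(values) / len(values)


# Hypothetical data: 20 genes with distinct ranking values.
ranked = [(f"G{i}", 10.5 - i) for i in range(20)]
observed, null = gene_set_permutation_null(
    ranked, {"G0", "G1", "G2"}, mean_score, n_perm=500, seed=1
)
# Fraction of random gene sets scoring at least as high as the real one.
frac = sum(1 for v in null if v >= observed) / len(null)
```

Because the ranking itself is never shuffled, this scheme needs only the ranked list, which is why it is the only option available when no phenotype labels are supplied.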

Understand and keep in mind how GSEAPreranked computes enrichment scores.

The GSEA PNAS 2005 paper introduced a method where a running sum statistic is incremented by the absolute value of the ranking metric when a gene belongs to the set. This method has proven to be efficient and facilitates intuitive interpretation of ranking metrics that reflect correlation of gene expression with phenotype. In the case of GSEAPreranked, you should make sure that this weighted scoring scheme applies to your choice of ranking statistic. If in doubt, we recommend using a more conservative scoring approach by setting the scoring scheme parameter to classic. Note, however, that the parameter’s default value is weighted, the same default used by the GSEA module. Please refer to the GSEA PNAS 2005 paper for further details.
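The running sum described above can be sketched in Python. This is a simplified illustration based on the description in the PNAS 2005 paper, not the module's Java code. The function name and the exponent parameter `p` are hypothetical; the sketch assumes that `p=1.0` corresponds to the weighted scheme and `p=0.0` to the classic scheme, in which every hit counts equally.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Maximum deviation from zero of a weighted running sum.

    ranked   : list of (gene, ranking value) pairs
    gene_set : collection of gene names
    p        : exponent applied to |ranking value| for genes in the set
    """
    ordered = sorted(ranked, key=lambda pair: pair[1], reverse=True)
    members = set(gene_set)
    hit_weights = [abs(s) ** p if g in members else 0.0 for g, s in ordered]
    n_hits = sum(1 for g, _ in ordered if g in members)
    n_miss = len(ordered) - n_hits
    total_hit = sum(hit_weights)
    if n_hits == 0 or n_miss == 0 or total_hit == 0:
        raise ValueError("need hits, misses, and a non-zero total hit weight")

    running = 0.0
    best = 0.0
    for (gene, _), weight in zip(ordered, hit_weights):
        if gene in members:
            running += weight / total_hit  # step up on a hit
        else:
            running -= 1.0 / n_miss        # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list with one set member at the top and one mid-list.
ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0), ("F", -3.0)]
weighted = enrichment_score(ranked, {"A", "D"}, p=1.0)
classic = enrichment_score(ranked, {"A", "D"}, p=0.0)
```

For this toy list the weighted score is 0.75 and the classic score is 0.5: the large ranking value of gene A dominates under the weighted scheme. This is why the ranking statistic's magnitude should be meaningful before the weighted scheme is used.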

References

Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS. 2005;102(43):15545-15550.

Mootha VK, Lindgren CM, Eriksson K-F, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003;34:267-273.

GSEA User Guide: http://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html

GSEA website: http://www.gsea-msigdb.org/

This version of the module is based on the GSEA v4.3.x code base. See the Release Notes for new features and other notable changes.

Parameters

NOTE: Certain parameters are considered to be “advanced”; that is, they control details of the GSEAPreranked algorithm that are typically not changed. You should not override the default values unless you are conversant with the algorithm. These parameters are marked “Advanced” in the parameter descriptions.

* = required

Input Files

  1. ranked list: RNK file

This file contains the rank ordered gene (or feature) list.

  2. gene sets database file: GMT, GMX, or GRP file

Gene set files, either your own or from the listed MSigDB files.

  3. chip platform: an optional CHIP file may be provided if you do not select a chip platform from the drop-down
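For readers preparing their own inputs, the two main file types can be read with a short Python sketch. It assumes the commonly documented layouts: an RNK file is tab-delimited with a feature name and a ranking value per line, and a GMT file is tab-delimited with a gene set name, a description, and then the member genes on each line. Treating lines that start with `#` as header or comment lines in RNK files is an assumption of this sketch. The function names and the example contents are hypothetical.

```python
import io


def read_rnk(handle):
    """Parse RNK-style text into (gene, value) pairs sorted in descending order."""
    ranked = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines (assumption)
        name, value = line.split("\t")[:2]
        ranked.append((name, float(value)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


def read_gmt(handle):
    """Parse GMT-style text into {gene set name: [genes]}."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # need a name, a description, and at least one gene
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


# Hypothetical file contents, held in memory for the example.
rnk_text = "#gene\tscore\nMYC\t1.75\nTP53\t2.31\nBRCA1\t-0.42\n"
gmt_text = "SET_A\thypothetical description\tTP53\tMYC\nSET_B\tna\tBRCA1\tEGFR\tKRAS\n"

ranked = read_rnk(io.StringIO(rnk_text))
gene_sets = read_gmt(io.StringIO(gmt_text))
```

With real files, pass an open file handle (for example `open(path, encoding="utf-8")`) in place of the `io.StringIO` objects.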

Output Files

  1. Enrichment Report archive: ZIP

ZIP file containing the result files. For more information on interpreting these results, see Interpreting GSEA Results in the GSEA User Guide. Note that in prior versions the ZIP bundle was created as the only output file. This behavior has been changed to give direct access to the results without the need for a download.

  2. Enrichment Report: HTML and PNG images

The GSEA Enrichment Report. As above, see the GSEA User Guide for more info.

  3. Optional SVG images (compressed)

Identical to the PNGs in the Enrichment Report, but in SVG format for higher resolution. These are GZ compressed to reduce space usage; they can be decompressed using ‘gunzip’ on Mac or Linux and 7-Zip on Windows.
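Where neither tool is at hand, the compressed SVG files can also be unpacked with Python's standard library. The sketch below is one possible approach; the helper name `gunzip_file` and the file name `example_plot.svg.gz` are hypothetical and do not reflect the names the module actually produces.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_file(gz_path):
    """Decompress 'name.svg.gz' to 'name.svg' next to it and return the new path."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # drops only the trailing '.gz'
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


# Self-contained demonstration using a temporary, hypothetical file.
with tempfile.TemporaryDirectory() as tmp:
    gz_file = Path(tmp) / "example_plot.svg.gz"
    with gzip.open(gz_file, "wb") as fh:
        fh.write(b"<svg xmlns='http://www.w3.org/2000/svg'/>")
    svg_file = gunzip_file(gz_file)
    restored_name = svg_file.name
    restored = svg_file.read_bytes()
```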

Platform Dependencies

Task Type:
Gene List Selection

CPU Type:
any

Operating System:
any

Language:
Java

Version Comments

Copyright © 2003-2022 Broad Institute, Inc., Massachusetts Institute of Technology, and Regents of the University of California. All rights reserved.