| Title: | Performing Comprehensive Overlap Assessments |
|---|---|
| Description: | The implementation of a statistical framework for performing overlap assessments on lists comprising sets of strings (such as lists of gene sets) described in Stoica (2023) <https://ora.ox.ac.uk/objects/uuid:b0847284-a02f-47ee-88e3-a3c4e0cdb8b1>. It can assess overlaps of pairs of sets of strings selected either from the same universe or from different universes, and overlaps of triplets of sets of strings selected from the same universe. Designed for single-cell RNA-sequencing data analysis applications, but suitable for other purposes as well. |
| Authors: | Andrei-Florian Stoica [aut, cre] (ORCID: <https://orcid.org/0000-0002-5253-0826>) |
| Maintainer: | Andrei-Florian Stoica <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.8.0 |
| Built: | 2026-05-25 03:05:50 UTC |
| Source: | https://github.com/andrei-stoica26/listo |
This function builds a Seurat marker list ready to be used by LISTO. Requires Seurat (not automatically installed with LISTO).
buildSeuratMarkerList(seuratObj, col, logFCThr = 1, minPct = 0.2, ...)
seuratObj |
A Seurat object. |
col |
Seurat metadata column used for grouping. |
logFCThr |
Fold change threshold for testing. |
minPct |
The minimum fraction of in-cluster cells in which tested genes need to be expressed. |
... |
Additional arguments passed to |
A list consisting of data frames generated with
Seurat::FindMarkers.
seuratPath <- system.file('extdata', 'seuratObj.qs2', package='LISTO')
seuratObj <- qs2::qs_read(seuratPath)
a <- buildSeuratMarkerList(seuratObj, 'Cell_Cycle', logFCThr=0.1)
This function generates the prime factor decomposition of n factorial.
factorialPrimePowers(n)
n |
A positive integer. |
A vector in which positions represent prime numbers (that is, the first position corresponds to 2, the second position corresponds to 3, the third position corresponds to 5, etc.) and values represent their exponents in the factorial decomposition.
factorialPrimePowers(8)
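As a rough illustration of how such a decomposition can be obtained, the base R sketch below applies Legendre's formula: the exponent of a prime p in n! is the sum of floor(n / p^i) over i >= 1. The helpers `primesUpTo` and `legendrePrimePowers` are hypothetical names introduced here, and this is not necessarily how factorialPrimePowers is implemented.

```r
# Hypothetical helper (not part of the package): primes up to n by trial division.
primesUpTo <- function(n) {
    if (n < 2) return(integer(0))
    cand <- 2:n
    isPrime <- vapply(cand, function(p) {
        divisors <- seq_len(floor(sqrt(p)))[-1]  # candidate divisors 2..floor(sqrt(p))
        all(p %% divisors != 0)
    }, logical(1))
    cand[isPrime]
}

# Hypothetical helper: exponents of 2, 3, 5, ... in n! via Legendre's formula.
legendrePrimePowers <- function(n) {
    vapply(primesUpTo(n), function(p) {
        e <- 0
        q <- p
        while (q <= n) {
            e <- e + n %/% q  # multiples of p, p^2, p^3, ... not exceeding n
            q <- q * p
        }
        e
    }, numeric(1))
}

exps <- legendrePrimePowers(8)
exps  # 7 2 1 1
```

For n = 8 the exponents 7, 2, 1, 1 correspond to 2^7 * 3^2 * 5 * 7 = 40320 = 8!.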
This function orders a data frame based on a column of p-values, performs multiple testing correction on the column, and filters the data frame based on the adjusted p-values.
mtCorrectDF(df, mtMethod = c("BY", "holm", "hochberg", "hommel", "bonferroni", "BH", "fdr", "none"), colStr = "pval", newColStr = "pvalAdj", pvalThr = 0.05, doOrder = TRUE, nComp = nrow(df))
df |
A data frame with a p-values column. |
mtMethod |
Multiple testing correction method. Choices are 'BY' (default), 'holm', 'hochberg', 'hommel', 'bonferroni', 'BH', 'fdr' and 'none'. |
colStr |
Name of the column of p-values. |
newColStr |
Name of the column of adjusted p-values that will be created. |
pvalThr |
p-value threshold used for filtering. If |
doOrder |
Whether to sort the data frame in increasing order of the adjusted p-values. |
nComp |
Number of comparisons. In most situations, this parameter should not be changed. |
A data frame with the p-value column corrected for multiple testing.
df <- data.frame(elem = c('A', 'B', 'C', 'D', 'E'), pval = c(0.032, 0.001, 0.0045, 0.051, 0.048))
mtCorrectDF(df)
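The three documented steps (adjust, order, filter) can be sketched in base R as follows. The method names accepted by mtMethod coincide with those of stats::p.adjust, but whether the package calls p.adjust internally, and whether the threshold comparison is strict, are assumptions made for this sketch.

```r
df <- data.frame(elem = c('A', 'B', 'C', 'D', 'E'),
                 pval = c(0.032, 0.001, 0.0045, 0.051, 0.048))

# Step 1: adjust the p-values (Benjamini-Yekutieli, the documented default).
df$pvalAdj <- p.adjust(df$pval, method = "BY")

# Step 2: sort in increasing order of the adjusted p-values.
df <- df[order(df$pvalAdj), ]

# Step 3: keep rows below the threshold (strict comparison assumed here).
res <- df[df$pvalAdj < 0.05, ]
res
```

With these inputs only elements B and C remain after filtering at 0.05.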
This function performs multiple testing correction on a vector of p-values.
mtCorrectV(pvals, mtMethod = c("BY", "holm", "hochberg", "hommel", "bonferroni", "BH", "fdr", "none"), mtStat = c("identity", "median", "mean", "max", "min"), nComp = length(pvals))
pvals |
A numeric vector. |
mtMethod |
Multiple testing correction method. Choices are 'BY' (default), 'holm', 'hochberg', 'hommel', 'bonferroni', 'BH', 'fdr' and 'none'. |
mtStat |
A statistic to be optionally computed. Choices are 'identity' (no statistic will be computed and the adjusted p-values will be returned as such), 'median', 'mean', 'max' and 'min'. |
nComp |
Number of comparisons. In most situations, this parameter should not be changed. |
If mtStat is 'identity' (as default), a numeric vector of
p-values corrected for multiple testing. Otherwise, a statistic based on
these corrected p-values defined by mtStat.
pvals <- c(0.032, 0.001, 0.0045, 0.051, 0.048)
mtCorrectV(pvals)
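To make the role of the number of comparisons concrete, the sketch below uses stats::p.adjust directly, with the Bonferroni method chosen only because its arithmetic is easy to follow. Treating nComp as equivalent to the `n` argument of p.adjust is an assumption, as is the way the summary statistic is applied.

```r
pvals <- c(0.032, 0.001, 0.0045, 0.051, 0.048)

# Default: the number of comparisons equals the number of p-values (5).
adjDefault <- p.adjust(pvals, method = "bonferroni")

# Assumed analogue of nComp = 10: correct as if 10 comparisons had been made.
adjTen <- p.adjust(pvals, method = "bonferroni", n = 10)
adjTen  # 0.320 0.010 0.045 0.510 0.480

# Assumed analogue of mtStat = 'max': summarize the adjusted p-values.
worst <- max(adjTen)
```

Raising the number of comparisons makes the correction more conservative: each Bonferroni-adjusted value is the raw p-value multiplied by the number of comparisons, capped at 1.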
This function computes the probability that two subsets A and B, drawn from sets M and N respectively, intersect in exactly k points. Intersection sizes (M with N, A with N and B with M) must be provided.
probCounts2MN(intMN, intAN, intBM, k)
intMN |
Number of elements in the intersection of sets M and N. |
intAN |
Number of elements in the intersection of sets A (subset of M) and N. |
intBM |
Number of elements in the intersection of sets B (subset of N) and M. |
k |
Number of elements in the intersection of sets A and B. |
A numeric value in [0, 1] representing the probability that two subsets of sets M and N intersect in k points.
probCounts2MN(8, 6, 4, 2)
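One natural way to model this quantity, sketched below, rests on the observation that any element shared by A and B must lie in the intersection of M and N. If the parts of A and B that fall inside that intersection are treated as uniformly random subsets of it, the overlap size follows a hypergeometric distribution. This is an illustrative model built on stats::dhyper; it is an assumption that probCounts2MN computes the same quantity.

```r
intMN <- 8  # size of the shared part of M and N
intAN <- 6  # elements of A that fall in the shared part
intBM <- 4  # elements of B that fall in the shared part
k <- 2      # overlap size of interest

# Assumed model: draw intBM items from intMN, of which intAN are "marked";
# the probability of exactly k marked items is hypergeometric.
probK <- dhyper(k, intAN, intMN - intAN, intBM)
probK
```

Under this model the value equals choose(6, 2) * choose(2, 2) / choose(8, 4) = 15 / 70.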
This function computes the probability that three subsets of given sizes intersect in k points.
probCounts3N(a, b, c, n, k)
a |
Size of the first subset. |
b |
Size of the second subset. |
c |
Size of the third subset. |
n |
Size of the set. |
k |
Size of the intersection. |
A numeric value in [0, 1] representing the probability that three subsets of given sizes intersect in k points.
probCounts3N(8, 6, 10, 20, 3)
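A possible derivation of this probability, assuming the three subsets are drawn independently and uniformly from the same set, conditions on the size j of the intersection of the first two subsets: the size of that pairwise intersection is hypergeometric, and so is the overlap of the third subset with it. The helper `probThree` is a hypothetical name, and the sketch is not claimed to match the internals of probCounts3N.

```r
# Hypothetical helper: P(triple intersection has exactly k elements) for
# independent uniform subsets of sizes lenA, lenB, lenC of a set of size n.
probThree <- function(lenA, lenB, lenC, n, k) {
    j <- 0:min(lenA, lenB)                     # possible sizes of the pairwise intersection
    pPair <- dhyper(j, lenA, n - lenA, lenB)   # P(first two subsets share j elements)
    pThird <- dhyper(k, j, n - j, lenC)        # P(third subset hits k of those j)
    sum(pPair * pThird)
}

probThree(8, 6, 10, 20, 3)

# Sanity check: the probabilities over all feasible k add up to 1.
total <- sum(vapply(0:6, function(k) probThree(8, 6, 10, 20, k), numeric(1)))
```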
This function computes the probability that two subsets A and B of sets M and N intersect in at least k points.
pvalCounts2MN(intMN, intAN, intBM, k)
intMN |
Number of elements in the intersection of sets M and N. |
intAN |
Number of elements in the intersection of sets A (subset of M) and N. |
intBM |
Number of elements in the intersection of sets B (subset of N) and M. |
k |
Number of elements in the intersection of sets A and B. |
A numeric value in [0, 1] representing the probability that two subsets of sets M and N intersect in at least k points.
pvalCounts2MN(300, 23, 24, 6)
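An "at least k" probability is an upper tail. Under the assumption that the overlap inside the intersection of M and N follows a hypergeometric distribution (an illustrative model, not a confirmed description of pvalCounts2MN), the tail can be read off stats::phyper or obtained by summing point probabilities, as sketched below.

```r
intMN <- 300; intAN <- 23; intBM <- 24; k <- 6

# Upper tail P(X >= k): phyper gives P(X > q), so q = k - 1.
pTail <- phyper(k - 1, intAN, intMN - intAN, intBM, lower.tail = FALSE)

# Same quantity as an explicit sum of point probabilities from k upwards.
pSum <- sum(dhyper(k:min(intAN, intBM), intAN, intMN - intAN, intBM))

c(pTail, pSum)
```

The `k - 1` offset matters: passing k itself would give the probability of strictly more than k shared elements.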
This function computes the probability that three subsets of a set intersect in at least k points.
pvalCounts3N(lenA, lenB, lenC, n, k)
lenA |
Size of the first subset. |
lenB |
Size of the second subset. |
lenC |
Size of the third subset. |
n |
Size of the set comprising the subsets. |
k |
Size of the intersection. |
A numeric value in [0, 1] representing the probability that three subsets of a set intersect in at least k points.
pvalCounts3N(300, 200, 250, 400, 180)
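For three subsets, an "at least k" probability can be sketched by conditioning on the size of the intersection of the first two subsets and taking a hypergeometric upper tail for the third. The sketch assumes independent, uniformly drawn subsets; `tailThree` is a hypothetical name and the package may compute this quantity differently (for example, through prime factor decompositions).

```r
# Hypothetical helper: P(triple intersection has at least k elements).
tailThree <- function(lenA, lenB, lenC, n, k) {
    j <- 0:min(lenA, lenB)                                      # pairwise intersection sizes
    pPair <- dhyper(j, lenA, n - lenA, lenB)                    # P(pairwise size is j)
    pTail <- phyper(k - 1, j, n - j, lenC, lower.tail = FALSE)  # P(third subset hits >= k of j)
    sum(pPair * pTail)
}

tailThree(300, 200, 250, 400, 180)

# The tail probability cannot increase as k grows.
tails <- vapply(0:7, function(k) tailThree(8, 6, 10, 20, k), numeric(1))
```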
This function assesses the statistical significance of the overlap of two or three objects (character vectors, or data frames having a numeric column).
pvalObjects(obj1, obj2, obj3 = NULL, universe1, universe2 = NULL, numCol = NULL, isHighTop = TRUE, maxCutoffs = 500, mtMethod = c("BY", "holm", "hochberg", "hommel", "bonferroni", "BH", "fdr", "none"), nCores = 1, type = c("2N", "2MN", "3N"))
obj1 |
Either 1) a data frame having items as row names and a numeric column or 2) a character vector. |
obj2 |
Either 1) a data frame having items as row names and a numeric column or 2) a character vector. |
obj3 |
Either 1) a data frame having items as row names and a numeric column or 2) a character vector. |
universe1 |
The set from which the items stored
in |
universe2 |
The set from which the items stored
in |
numCol |
The name of the numeric column used for data frame ordering. |
isHighTop |
Whether higher values in the numeric column correspond to better-ranked items. Ignored if all provided objects are character vectors. |
maxCutoffs |
Maximum number of cutoffs. If the input data frames
contain more cutoffs than this value, only |
mtMethod |
Multiple testing correction method. |
nCores |
Number of cores. If performing an overlap assessment between sets belonging to the same universe, it is recommended not to use parallelization (that is, leave this parameter as 1). |
type |
Type of overlap assessment. Choose between: two sets belonging to the same universe ('2N'), two sets belonging to different universes ('2MN'), three sets belonging to the same universe ('3N'). |
A numeric value in [0, 1] representing the p-value of the overlap of the two or three objects.
pvalObjects(LETTERS[seq(2, 7)], LETTERS[seq(3, 19)], universe1=LETTERS)
This function computes the p-value of intersection of two subsets of sets M and N.
pvalSets2MN(a, b, m, n)
a |
A character vector. |
b |
A character vector. |
m |
Set from which |
n |
Set from which |
A thin wrapper around pvalCounts2MN.
A numeric value in [0, 1] representing the p-value of intersection of two subsets of sets M and N.
pvalSets2MN(LETTERS[seq(4, 10)], LETTERS[seq(7, 15)], LETTERS[seq(19)], LETTERS[seq(6, 26)])
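Since this function is described as a thin wrapper around pvalCounts2MN, the main work is presumably turning sets into the four counts that pvalCounts2MN documents. The sketch below derives those counts with base R set operations for the example above; the exact mapping used inside the package is an assumption.

```r
a <- LETTERS[seq(4, 10)]   # subset of m
b <- LETTERS[seq(7, 15)]   # subset of n
m <- LETTERS[seq(19)]
n <- LETTERS[seq(6, 26)]

# Counts matching the documented arguments of pvalCounts2MN (assumed mapping).
counts <- c(intMN = length(intersect(m, n)),
            intAN = length(intersect(a, n)),
            intBM = length(intersect(b, m)),
            k     = length(intersect(a, b)))
counts
```

For this example the counts are 14, 5, 9 and 4, so the call would reduce to an assessment on those four numbers.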
This function calculates the p-value of intersection for two sets.
pvalSets2N(a, b, n)
a |
A character vector. |
b |
A character vector. |
n |
Set from which |
A thin wrapper around stats::phyper.
A numeric value in [0, 1] representing the p-value of intersection for two sets.
pvalSets2N(LETTERS[seq(4, 10)], LETTERS[seq(7, 15)], LETTERS)
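The standard hypergeometric overlap test for two sets drawn from one universe can be written directly with stats::phyper, as in the sketch below. The argument mapping shown is the usual one for this test; that pvalSets2N uses exactly these arguments is an assumption.

```r
a <- LETTERS[seq(4, 10)]
b <- LETTERS[seq(7, 15)]
universe <- LETTERS

shared <- length(intersect(a, b))   # observed overlap size

# P(overlap >= shared) when b is a uniform draw of length(b) items from the universe,
# of which length(a) are "marked". The -1 turns P(X > q) into P(X >= shared).
pval <- phyper(shared - 1, length(a), length(universe) - length(a),
               length(b), lower.tail = FALSE)
pval
```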
This function computes the p-value of intersection of three subsets.
pvalSets3N(a, b, c, n)
a |
A character vector. |
b |
A character vector. |
c |
A character vector. |
n |
Set from which |
A thin wrapper around pvalCounts3N.
A numeric value in [0, 1] representing the p-value of intersection of three subsets.
pvalSets3N(LETTERS[seq(4, 10)], LETTERS[seq(7, 15)], LETTERS[seq(19)], LETTERS)
This function assesses the overlap of two or three lists of objects (character vectors, or data frames having at least one numeric column).
runLISTO(list1, list2, list3 = NULL, universe1, universe2 = NULL, numCol = NULL, isHighTop = TRUE, maxCutoffs = 500, mtMethod = c("BY", "holm", "hochberg", "hommel", "bonferroni", "BH", "fdr", "none"), pvalThr = NULL, nCores = 1, verbose = TRUE, ...)
list1 |
A list containing either 1) data frames having items as row names and a numeric column or 2) character vectors. |
list2 |
A list containing either 1) data frames having items as row names and a numeric column or 2) character vectors. |
list3 |
A list containing either 1) data frames having items as row names and a numeric column or 2) character vectors. |
universe1 |
Character vector; the set from which the items
corresponding to the elements in |
universe2 |
Character vector; the set from which the items
corresponding to the elements in |
numCol |
The name of the numeric column used for data frame ordering. |
isHighTop |
Whether higher values in the numeric column correspond to better-ranked items. Ignored if all provided objects are character vectors. |
maxCutoffs |
Maximum number of cutoffs. If the input data frames
contain more cutoffs than this value, only |
mtMethod |
Multiple testing correction method. |
pvalThr |
Threshold to filter the results based on the adjusted
p-values. If |
nCores |
Number of cores. If performing an overlap assessment between sets belonging to the same universe, it is recommended not to use parallelization (that is, leave this parameter as 1). |
verbose |
Logical; whether the output should be verbose. |
... |
Additional arguments passed to |
A data frame listing the p-value and adjusted p-value for each
overlap. Combinations of overlaps are represented through the first two
(or three if list3 is not NULL) columns, while the penultimate
column records the overlap p-values and the last column records the adjusted
overlap p-values.
donorPath <- system.file('extdata', 'donorMarkers.qs2', package='LISTO')
donorMarkers <- qs2::qs_read(donorPath)[seq(3)]
labelPath <- system.file('extdata', 'labelMarkers.qs2', package='LISTO')
labelMarkers <- qs2::qs_read(labelPath)[seq(3)]
universe1Path <- system.file('extdata', 'universe1.qs2', package='LISTO')
universe1 <- qs2::qs_read(universe1Path)
res <- runLISTO(donorMarkers, labelMarkers, universe1=universe1, numCol='avg_log2FC')
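The overall shape of such an analysis, for the simplest case of two lists of character vectors from one universe, can be sketched in base R: test every pair, then adjust the p-values across all pairs. The toy lists, the column names `Group1` and `Group2`, and the use of phyper and p.adjust are all assumptions made for illustration; the sketch does not reproduce the handling of data frames, cutoffs or parallelization in runLISTO.

```r
# Toy inputs (hypothetical data, not shipped with the package).
list1 <- list(x1 = c("A", "B", "C", "D"), x2 = c("E", "F", "G"))
list2 <- list(y1 = c("B", "C", "D", "H"), y2 = c("F", "G", "I", "J"))
universe <- LETTERS

# One row per pair of elements from the two lists.
res <- expand.grid(Group1 = names(list1), Group2 = names(list2),
                   stringsAsFactors = FALSE)

# Hypergeometric overlap p-value for each pair.
res$pval <- mapply(function(g1, g2) {
    a <- list1[[g1]]
    b <- list2[[g2]]
    phyper(length(intersect(a, b)) - 1, length(a),
           length(universe) - length(a), length(b), lower.tail = FALSE)
}, res$Group1, res$Group2)

# Adjust across all pairs and sort by adjusted p-value.
res$pvalAdj <- p.adjust(res$pval, method = "BY")
res <- res[order(res$pvalAdj), ]
res
```

The result mirrors the documented output layout: identifying columns first, then the raw p-value, then the adjusted p-value.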
This function computes the prime factor decomposition of the binomial coefficient.
vChoose(n, k)
n |
Total number of elements. |
k |
Number of selected elements. |
A vector in which positions represent prime numbers (that is, the first position corresponds to 2, the second position corresponds to 3, the third position corresponds to 5, etc.) and values represent their exponents in the prime factor decomposition of the binomial coefficient.
vChoose(8, 4)
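Because choose(n, k) = n! / (k! (n - k)!), its prime exponents can be obtained by subtracting the exponent vectors of the two factorials in the denominator from that of n!. The sketch below shows this for n = 8 and k = 4 with a hard-coded list of primes; `legendre` and `expsFactorial` are hypothetical helpers, and the approach is an assumption about how such a decomposition may be computed.

```r
primes <- c(2, 3, 5, 7)  # primes up to 8, hard-coded for this illustration

# Hypothetical helper: exponent of prime p in n! (Legendre's formula).
legendre <- function(n, p) {
    e <- 0
    q <- p
    while (q <= n) {
        e <- e + n %/% q
        q <- q * p
    }
    e
}

# Hypothetical helper: exponent vector of n! over the fixed primes.
expsFactorial <- function(n) vapply(primes, function(p) legendre(n, p), numeric(1))

# Exponents of choose(8, 4) = 8! / (4! * 4!).
expsChoose <- expsFactorial(8) - expsFactorial(4) - expsFactorial(4)
expsChoose  # 1 0 1 1
```

The exponents 1, 0, 1, 1 give 2 * 5 * 7 = 70 = choose(8, 4). In this representation, products and quotients of binomial coefficients reduce to adding and subtracting exponent vectors.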
This function adds numeric vectors of different lengths by filling shorter vectors with zeroes.
vSum(...)
... |
Numeric vectors. |
A numeric vector.
vSum(c(1, 4), c(2, 8, 6), c(1, 7), c(10, 4, 6, 7))
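Zero-padded addition differs from R's default recycling, which would repeat the shorter vector instead. A minimal base R sketch of the padding approach is shown below; `padSum` is a hypothetical name and not the package's implementation.

```r
# Hypothetical helper: add vectors of unequal length after right-padding with zeroes.
padSum <- function(...) {
    vecs <- list(...)
    len <- max(lengths(vecs))  # target length: the longest input
    padded <- lapply(vecs, function(v) c(v, rep(0, len - length(v))))
    Reduce(`+`, padded)
}

total <- padSum(c(1, 4), c(2, 8, 6), c(1, 7), c(10, 4, 6, 7))
total  # 14 23 12 7
```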