Omics analysis¶
Barnacle was initially developed for unsupervised analysis of metatranscriptomics. What does that mean?
Omics data¶
Metatranscriptomics is a category of “omics” analysis that aims to catalog all of the RNA molecules (transcripts) in a sample. The “meta” part of the term indicates that the sample contains multiple taxa. As such, metatranscriptomes measures the gene expression of a community of organisms, such as a microbiome.
Other types of omics data focus on different biological molecules, either on the level of a single organism, or a whole community (meta-omics):
Genomics –> DNA
Proteomics –> proteins
Lipidomis –> lipids
Metabolomics –> metabolites
Omics datasets are often characterized by certain properties that complicate their analysis, including high dimensionality, and technical noise. Meta-omics are further complicated by properties such as overdispersion, variable community composition, and pervasive zero values [10]. Dealing with this complexity in order to uncover insights from omics data requires specially equipped analytical tools.
Tensor decomposition for omics analysis¶
Tensor decomposition models have previously been applied to omics datasets as a means of dealing with some of the challenges mentioned above, and to uncover patterns in the data. For example, a non-negative tensor decomposition method was applied to a human gene expression dataset to reveal patterns associated with particular diseases across tissue types and demographics [11].
One of the main advantages of using tensor decomposition models to analyze omics data is that it is an unsupervised technique. This allows researchers to discover patterns independent of their own pre-conceived notions. The unsupervised nature of the analysis also allows un-characterized genes to be analyzed alongside annotated genes, whereas other analyses tend to throw out these data. In meta-omics this functionality is especially important because in many datasets over half of the genes have never been previously observed, much less functionally characterized [12]. Tensor decomposition can help generate inferences about this “functional dark matter” based on the association of uncharacterized genes with better characterized co-variates.
The sparse tensor decomposition model presented in Barnacle demonstrates that sparsity constraints may be a useful addition to tensor decomposition models applied to omics datasets. In the case of transcriptomics, the sparse components output by the model can be interpreted as clusters of co-expressed genes, which may be functionally related. Similarly, in other omics datasets Barnacle clusters could help identify groups of other biological molecules with common abundance profiles across samples and conditions.
Example analysis¶
For an in-depth example of using Barnacle for analyzing metatranscriptomics data, please see our research paper: Simultaneous acclimation to nitrogen and iron scarcity in open ocean cyanobacteria revealed by sparse tensor decomposition of metatranscriptomes [1]. All of the scripts used to generate the analyses in the paper are available in the manuscript repository. In particular, check out the following scripts and notebooks:
Normalization and tensorization
Uses sctransform [13] for normalization
-
Performs cross-validated grid search with bootstrapping
-
Compiles component bootstraps for models with best fit parameters
-
Generates summary statistics and profile visualizations for each model component