Meta-interaction analysis

Modified on Fri, 2 Aug at 2:17 PM

TABLE OF CONTENTS

Introduction
1 Creating a plot
2 Performing the meta-interaction analysis
3 Meta-interaction analysis interactive plot page

Introduction

This module allows you to identify correlations in gene expression between cell types across multiple datasets, for example datasets from different tissues, time points or conditions. This way you can assess whether the expression of a gene in one cell type is associated with the expression of genes in another cell type.

Scenario: You want to investigate the signaling pathways activated in bone marrow precursor cells by granulocyte colony-stimulating factor (GCSF) secreted by macrophages. Before proceeding with expensive lab experiments, you decide to assess whether the expression level of the GCSF gene in macrophages is correlated with the expression level of any gene in bone marrow precursor cells. You then collect data from publicly available single-cell RNA-seq studies performed on bone marrow. You are now ready to use the meta-interaction analysis module!

While setting up the meta-interaction analysis, keep the following key concepts in mind:

Sender group: A group of cells identified through scRNA-seq analysis that exhibits distinct gene expression patterns, signaling, or other molecular characteristics. This cluster is referred to as the "sender" cluster because it is hypothesized to play a role in transmitting signals or information to other cell clusters.
Receiver group: Another group of cells identified through scRNA-seq analysis that is distinct from the sender cluster. The receiver cluster is the target or recipient of signals or interactions initiated by the sender cluster. The analysis aims to understand how gene expression or molecular activities in the sender cluster influence the receiver cluster.
Independent variable: This is the variable that is manipulated or selected to determine its effect on the dependent variable. It is also sometimes called the predictor variable. In correlation analysis, the term "independent variable" is used less often than in regression analysis, but it essentially refers to the variable that is expected to influence or predict changes in another variable. In this module, the independent variable can be either the number of cells in the sender group or an original feature (i.e., a gene from the sender group).
- Sender gene: A specific gene within the sender cluster that is of particular interest in the context of interaction analysis. Researchers may focus on this gene because of its known involvement in signaling pathways, regulatory functions, or other biological processes. These insights can be obtained by using the gene metadata table module. Investigating the behavior of this gene helps elucidate its role in intercellular communication.
Dependent variable: This is the variable that is observed or measured in response to changes in the independent variable. It is the outcome variable, or the variable of interest, and is expected to be influenced by the independent variable. Regardless of whether the chosen independent variable is the sender gene or the number of cells in the sender group, it is correlated with each gene in the receiver group.
- Responding genes: Genes within the receiver cluster that exhibit changes in expression or activity in response to signals or interactions from the sender cluster. These genes are considered responsive to the signaling or influence of the sender cluster. In other words, these are the genes whose expression in the receiver group is correlated with the independent variable, i.e., the expression of the sender gene in the sender group or the number of cells in the sender group. Identifying and characterizing these responding genes can provide insights into the molecular mechanisms underlying the communication between different cell types.
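To make the roles of the independent and dependent variables concrete, the sketch below reduces one dataset to a single sender value and one value per receiver gene. The article does not say how per-cell measurements are summarized per dataset; this sketch assumes the mean normalized expression over the cells of each group, and the data layout and the function name `summarize_dataset` are hypothetical. `CSF3` is the gene encoding G-CSF; the expression values are toy numbers.

```python
from statistics import mean

def summarize_dataset(cells, sender_group, receiver_group, sender_gene):
    """Reduce one dataset to (independent value, {receiver gene: dependent value}).

    Assumption: each group is summarized by the mean expression over its cells.
    Genes absent from a cell's record are treated as zero expression.
    """
    sender = [c["expr"].get(sender_gene, 0.0)
              for c in cells if c["cell_type"] == sender_group]
    receivers = [c["expr"] for c in cells if c["cell_type"] == receiver_group]
    if not sender or not receivers:
        raise ValueError("both the sender and the receiver group need at least one cell")
    genes = sorted({gene for expr in receivers for gene in expr})
    receiver_means = {gene: mean(expr.get(gene, 0.0) for expr in receivers)
                      for gene in genes}
    return mean(sender), receiver_means

# Hypothetical toy dataset: a metadata label plus normalized expression per cell.
cells = [
    {"cell_type": "macrophage", "expr": {"CSF3": 2.0}},
    {"cell_type": "macrophage", "expr": {"CSF3": 4.0}},
    {"cell_type": "precursor", "expr": {"CSF3R": 1.0, "MPO": 3.0}},
    {"cell_type": "precursor", "expr": {"CSF3R": 3.0}},
]
x, y = summarize_dataset(cells, "macrophage", "precursor", "CSF3")
print(x, y)  # 3.0 {'CSF3R': 2.0, 'MPO': 1.5}
```

Repeating this for every selected dataset gives one (x, y) point per dataset for each receiver gene, and the correlations are computed over those points.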

1 Creating a plot

As a first step of the analysis, a plot must be created by clicking on the create plot icon in your analysis track.

This leads to the create plot page. First, enter the plot name and fill in the plot template to provide the proper context for this analysis.

You can then choose the "Meta-interaction analysis" algorithm from the "Choose algorithm to run your analysis" menu.

The next step is to choose the data to analyze. This module accepts multiple normalized scRNA-seq datasets from the data pretreatment step.

For this module to work, at least three datasets must be selected as input.
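One likely reason for the three-dataset minimum: each selected dataset contributes a single point to every correlation (one dot per dataset on the scatter plot), and with only two points the coefficient carries no information, because any two points with distinct values lie exactly on a line. The small sketch below, a plain Pearson correlation written from its textbook definition (not the module's code), illustrates this.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)

# Two datasets: the two points always lie on a line, so r is always -1 or +1.
print(pearson([1.0, 2.0], [5.0, 3.0]))            # -1.0
print(pearson([1.0, 2.0], [1.0, 4.0]))            # 1.0
# Three datasets: r can take intermediate values and starts to carry information.
print(pearson([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))  # 0.5
```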

Finally, we must specify the parameters of the analysis. In the first menu tab, you can select the independent variable, which can be either an original feature (i.e., a sender gene) or the number of cells in the sender group.

If you select the original feature as an independent variable, you will get to choose a specific sender gene from the sender group.

Alternatively, you can choose the number of cells in the sender group as the independent variable. This will correlate the expression of responding genes with the relative number of cells in the sender group.

In the Correlation method tab, you can pick one of the following correlation methods:

Spearman: The default correlation method. Detects monotonic trends (not only linear ones) in a robust way by applying the Pearson correlation coefficient to the ranks rather than to the raw values. Less prone to be influenced by outliers.
Pearson: Classical correlation coefficient that detects linear association between two variables. It is sensitive to outliers.
Kendall: Compares every pair of samples and counts whether the two variables rank the pair concordantly (+1) or discordantly (-1). The final coefficient indicates whether the two variables are in agreement (positive coefficient) or rank the samples in reverse order (negative coefficient). It is less sensitive to ties, but it is slower for large sample sizes (>1000).
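The differences between the three coefficients can be seen in a minimal pure-Python sketch written from their textbook definitions. It is not the module's implementation: handling ties in Spearman with average ranks and computing Kendall's coefficient as the tau-b variant are assumptions about what the module uses. The nested loop in `kendall_tau_b` compares every pair of samples, which is why this coefficient becomes slow for large sample sizes.

```python
from math import sqrt

def pearson(x, y):
    """Classical Pearson coefficient: covariance scaled by both spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman(x, y):
    """Pearson correlation applied to ranks instead of raw values."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    """Kendall's tau-b: concordant minus discordant pairs, with a tie correction."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):              # every pair of samples: O(n^2) comparisons
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue            # tied in both variables: ignored
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1     # both variables order the pair the same way
            else:
                discordant += 1     # the variables order the pair in reverse
    denom = sqrt((concordant + discordant + tied_x_only)
                 * (concordant + discordant + tied_y_only))
    if denom == 0:
        raise ValueError("correlation is undefined when all pairs are tied")
    return (concordant - discordant) / denom

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 50]        # monotonic, but with an outlier at the end
print(pearson(x, y))        # about 0.74: pulled away from 1 by the outlier
print(spearman(x, y))       # 1.0: the ranks agree perfectly
print(kendall_tau_b(x, y))  # 1.0: every pair is concordant
```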

If the independent variable is an original feature, you can set the following parameters for each dataset that you have selected as input:

Name: Here you can change the name of the input. By default it has the same name as the plot used as input.
Select metadata variable: Specify the metadata variable used to define the sender and receiver groups. Most likely this will be the metadata variable where you have annotated the cell types in your data.
Select sender group: Here you can define the sender group.
Select receiver group: Here you can define the receiver group.

If the independent variable is the number of cells, for each dataset that you have selected as input you can set the following parameters:

Name: Here you can change the name of the input. By default it has the same name as the plot used as input.
Select metadata variable: Specify the metadata variable used to define the sender and receiver groups. Most likely this will be the metadata variable where you have annotated the cell types in your data.
Groups to include: Here you can select which groups to include. The number of cells in the sender group will be normalized to the total number of cells in this selection. If you leave this menu empty, all available groups will be selected by default.
Select sender group: Here you can define the sender group.
Select receiver group: Here you can define the receiver group.
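The normalization described under "Groups to include" is a simple fraction. The sketch below assumes one metadata label per cell and requires the sender group to be among the included groups; the function name `relative_sender_abundance` and the cell counts are hypothetical.

```python
from collections import Counter

def relative_sender_abundance(cell_labels, sender_group, groups_to_include=None):
    """Fraction of cells in sender_group among the cells of the included groups.

    cell_labels: metadata value (e.g. cell type) of every cell in one dataset.
    groups_to_include: None or empty means all groups, mirroring the menu default.
    """
    counts = Counter(cell_labels)
    included = set(groups_to_include) if groups_to_include else set(counts)
    if sender_group not in included:
        raise ValueError("the sender group must be among the included groups")
    total = sum(counts[group] for group in included)
    if total == 0:
        raise ValueError("the included groups contain no cells")
    return counts[sender_group] / total

# Hypothetical dataset: 20 macrophages, 50 precursor cells and 30 T cells.
labels = ["macrophage"] * 20 + ["precursor"] * 50 + ["T cell"] * 30
print(relative_sender_abundance(labels, "macrophage"))  # 0.2
print(round(relative_sender_abundance(labels, "macrophage",
                                      ["macrophage", "precursor"]), 3))  # 0.286
```

Restricting the selection changes the denominator, so the same sender group can have a different relative abundance depending on which groups are included.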

2 Performing the meta-interaction analysis

When all parameters are set up, click the "Run" button to compute the meta-interaction analysis results. As soon as the results are computed, an interactive plot will appear in the track. Clicking "View interactively" opens the results of the meta-interaction analysis in the interactive plot page.

3 Meta-interaction analysis interactive plot page

The main results are the correlations between the gene of interest as expressed in macrophages (the sender group) and all genes as expressed in precursor cells (the receiver group). These correlations can be visualized as a scatter plot, a bar plot or a rank table.

3.1 Scatter plot

A scatter plot with a correlation line is a visual representation of the relationship between two variables. This type of plot is commonly used to explore and illustrate the strength and direction of the association between two quantitative variables; in our case, the expression of the sender gene and the expression of a selected responding gene across multiple datasets.

On the scatter plot, the previously selected sender gene is always displayed on the x-axis. On the y-axis, you can choose to plot any of the responding genes or the number of cells in the receiver group. Each dot represents one dataset.

The correlation line, also known as the regression line or trend line, is a straight line that summarizes the overall trend or pattern in the scatter plot. The line is often determined using linear regression, which finds the best-fitting line by minimizing the sum of squared differences between the observed data points and the values predicted by the line. The sign of the slope indicates the direction of the relationship: a positive slope indicates a positive correlation (both variables increase together), while a negative slope indicates a negative correlation (one variable increases as the other decreases). The steepness of the slope also depends on the units of the two variables, so the strength of the relationship is better judged by how tightly the points cluster around the line: the tighter the cluster, the stronger the relationship.
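The least-squares fit and the point about units can be checked with a minimal sketch of ordinary linear regression, corresponding to the "Linear regression" model type (the polynomial and local models fit differently and are not shown). Rescaling the y-values, as a change of expression units would, multiplies the slope but leaves R² (the share of variance explained by the line) unchanged. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares versus total sum of squares around the mean of y.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    if ss_tot == 0:
        raise ValueError("y must not be constant")
    return slope, intercept, 1 - ss_res / ss_tot

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]
print(fit_line(x, y))                    # slope ~1.9, intercept ~0.0, R² ~0.963
print(fit_line(x, [10 * v for v in y]))  # slope ~19, intercept ~0.0, same R²
```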

You can use the following line options to visualize and quantify the relationship between the data points:

Model type - determines the regression model used to produce the regression line and statistics. Currently available models are:
- Global regression models:
  - Linear regression
  - Second degree polynomial
  - Third degree polynomial
  - Fourth degree polynomial
- Local regression models:
  - Local linear regression
  - Local polynomial regression
  - Weighted moving average
Show statistics - plots the regression statistics
Relative x-axis position of the statistics label - controls the position of the statistics label relative to the x-axis
Relative y-axis position of the statistics label - controls the position of the statistics label relative to the y-axis
Show 95% confidence interval - shows or hides the 95% confidence interval around the regression line
Confidence interval color - determines the color of the confidence interval
Line color - determines the color of the regression line
Line style - determines the style of the regression line
Line width - determines the width of the regression line

3.2 Bar plot

On the bar plot, you can plot the correlation coefficients of multiple responding genes with the sender gene. Each plotted responding gene is listed on the y-axis, and its correlation coefficient, plotted on the x-axis, determines the length of its bar. The correlation coefficient is a numerical measure of the strength and direction of the relationship between the two variables (linear for Pearson, monotonic for Spearman and Kendall). It ranges from -1 to 1, with -1 indicating a perfect negative correlation, 1 indicating a perfect positive correlation, and 0 indicating no correlation.

3.3 Rank table

The results of the meta-interaction analysis are also displayed in the rank table. Each responding gene is ranked according to its correlation coefficient with the sender gene. By default, the most highly correlated genes (i.e., those with a correlation coefficient closest to 1) are listed first.
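Put together, the rank table amounts to correlating the independent variable with every receiver gene across datasets and sorting the results from highest to lowest coefficient. The sketch below uses Spearman correlation (the module's default) with average ranks for ties; the input layout (one summarized value per dataset), the function names and the gene names `GENE_B` and `GENE_C` are hypothetical, and the values are toy numbers.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)

def rank_table(sender_values, receiver_genes):
    """Correlate the independent variable with each receiver gene, highest first.

    sender_values: one value per dataset.
    receiver_genes: {gene: [one value per dataset, in the same order]}.
    """
    rows = [(gene, spearman(sender_values, values))
            for gene, values in receiver_genes.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows

sender = [0.5, 1.2, 2.0, 3.1]  # e.g. CSF3 in macrophages, one value per dataset
receivers = {
    "CSF3R": [0.3, 0.8, 1.1, 1.9],
    "GENE_B": [2.0, 1.5, 1.7, 0.4],
    "GENE_C": [1.0, 0.9, 1.3, 1.2],
}
for gene, rho in rank_table(sender, receivers):
    print(gene, round(rho, 2))
# CSF3R 1.0
# GENE_C 0.6
# GENE_B -0.8
```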