The following workflow was used to analyze the data and generate the figures for our paper “Broad immune activation underlies shared set point signatures for vaccine responsiveness in healthy individuals and disease activity in lupus patients” published in Nature Medicine in 2020.
To avoid inavailability or version change in R packages we recommend to run the workflow using our Singularity container.
Before running the workflow prepare the working directory with the following subfloders:
data
figure_generation
generated_data
R
data
folderR
folder..Rprofile
file into the working directory itself. Modify the PROJECT_DIR
variable with the full path to the working directory.If you are using Singularity on NIAID HPC, follow these steps to start the container and run R
$ cd /hpcdata/sg/sg_data/singularity/baseline
$ singularity shell -B \
/full_path_to_the_working_dir/:/var/workflow1 \
baseline_rstudio_r_3.4.1.img
$ cd /var/workflow1
$ R
For outside usage check that Singularity is installed and works properly. Singularity installation and usage guide can be found at https://sylabs.io/docs/.
If Singularity is not available, it is possible to use standalone R but we cannot guarantee the error-free flow and the exact results.
Our pipeline requires the following R packages:
# CRAN:
install.packages(c("plyr", "tidyverse", "data.table", "pROC", "lme4", "limma", "tmod", "MASS", "latex2exp", "gplots", "GGally", "effects", "car", "effsize", "cowplot", "circlize"))
# Bioconductor:
source("http://bioconductor.org/biocLite.R")
biocLite()
biocLite(c("WGCNA", "tmod", "fgsea", "ComplexHeatmap", "DEseq2", "hgu133plus2.db"))
# older version of MetaDE
install.packages("https://cran.r-project.org/src/contrib/Archive/MetaDE/MetaDE_1.0.5.tar.gz", repos = NULL, type = "source")
# GitHub (ImmuneSpaceR and ImmSig2 packages have to be installed before ImmuneSignatures. Currently the latest development versions of these packages are required to download HIPC data). We do not include this packages in the Singularity container, since this version once outdated most probably will not not work with ImmuneSpace server.
install.packages("devtools")
devtools::install_github("rglab/ImmuneSpaceR")
devtools::install_github("ehfhcrc/ImmSig2")
devtools::install_github("rglab/ImmuneSignatures", build_vignettes = FALSE)
If while running some scripts you get X11 forwarding related error, check if you have X11 forwarding turned on (in Putty it is in Connection-SSH-X11) and turn it off. Sometime an error may appear due to a conflict between R packages. In this case restart R and rerun the script that previously returned the error.
Input data are stored in data folder:
All data except flow are available on the CHI data portal (https://chi.niaid.nih.gov/web/new/data.html)
```r
source(\R/chi_signature_analysis/cd38_10gene_random_sig_FIGURE.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
Output will be saved to ./generated_data/CHI/.
<br/>
## Step 2. Calculate and plot correlation between titer response and percent of CD38 high cells for 3 baseline time points
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9jaGlfc2lnbmF0dXJlX2FuYWx5c2lzL2NkMzhfMjBnZW5lc19zaW5nbGUuZ2VuZS5hdWNfYmFycGxvdF9GSUdVUkUuclxcKVxuYGBgXG5gYGAifQ== -->
```r
```r
source(\R/chi_signature_analysis/cd38_20genes_single.gene.auc_barplot_FIGURE.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>It will generate the following figure (Ext. Data Fig. 1b)
./figure_generation/CHI_flow_vs_respones_3time_p.pdf
<br/>
## Step 3. Compare newly gated cell populations with previous gates by predictive power (AUC) and temporal stability:
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9jaGlfc2lnbmF0dXJlX2FuYWx5c2lzL2NkMzhfMTBnZW5lX3NpZ19ybV9nZW5lc19GSUdVUkVTLnJcXClcbmBgYFxuYGBgIn0= -->
```r
```r
source(\R/chi_signature_analysis/cd38_10gene_sig_rm_genes_FIGURES.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>The script generates two plots for Figure 1b.
./figure_generation/CHI_flow_selected_gates_AUC.pdf ./figure_generation/CHI_flow_selected_gates_ISV.pdf
<br/>
## Step 4. Calculate inter-subject variation (ISV) of expression of individual genes to quantify their stability across baseline time points (days -7, 0, and 70):
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9jaGlfc2lnbmF0dXJlX2FuYWx5c2lzL2NkMzhfMTBnZW5lX3NpZ19hbmFseXNpc19GSUdVUkVTX3dpdGhfbWlkLnJcXClcbmBgYFxuYGBgIn0= -->
```r
```r
source(\R/chi_signature_analysis/cd38_10gene_sig_analysis_FIGURES_with_mid.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>The script generates a table of gene stability metrics:
./generated_data/CHI/CHI_genes_stability.txt
<br/>
## Step 5. Analysis of correlation between gene expression and percentage of CD38 high cells:
Correlation using stable genes only
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9oaXBjX2RhdGFwcmVwL2hpcGNfZGF0YV9mcm9tX0lTX3ByZWxvYWRlZC5yXFwpXG5gYGBcbmBgYCJ9 -->
```r
```r
source(\R/hipc_dataprep/hipc_data_from_IS_preloaded.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Correlation using all genes to generate correlations with CD20+CD38++ B cells for futher test of random signatures
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9oaXBjX2RhdGFwcmVwL2hpcGNfZGF0YXByZXAuclxcKVxuYGBgXG5gYGAifQ== -->
```r
```r
source(\R/hipc_dataprep/hipc_dataprep.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>The scripts adds percent of CD38++ cells to sample information table
./generated_data/CHI/CHI_sample_info_2_CD38hi.txt
and generates the following files with robust correlation results:
./generated_data/CHI/robust_corr_genes.txt ./generated_data/CHI/robust_corr_all.genes.txt
<br/>Testing correlations using 500 random signatures computed by permuting subject labels for flow data. At each iteration the correlation computed exactly the same way as for the real data. **Long computation!**
source(“R/chi_signature_analysis/rand.sig_generate.r”)
The output file with correlation data for each iteration is store as ```./generated_data/CHI/gene_sig_random_500_from_all.txt```.
Instead you can download pre-computed random signatures generated previously and put it into the data folder.
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi95Zl9kYXRhcHJlcC95Zl9kYXRhX2Zyb21fYXB0LnJcXClcbnNvdXJjZShcXFIveWZfZGF0YXByZXAveWZfcHJvYmVzMmdlbmVzLnJcXClcbnNvdXJjZShcXFIveWZfZGF0YXByZXAveWZfc2FtcGxlX2ZpbHRlcmluZy5yXFwpXG5gYGBcbmBgYCJ9 -->
```r
```r
source(\R/yf_dataprep/yf_data_from_apt.r\)
source(\R/yf_dataprep/yf_probes2genes.r\)
source(\R/yf_dataprep/yf_sample_filtering.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
The scripts generate a table and AUC table, as well as a plot for Ext. Data Fig. 2e:
./generated_data/CHI/chi_random.sig.500_3times.rds ./generated_data/figure_generation
<br/>
## Step 6. Determine the optimal number of genes as expression surrogate signature for CD20+CD38++ B cells.
Evaluate prediction power of titer response for top 20 individual genes (Fig. 1d):
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi95Zl9zaWduYXR1cmVfYW5hbHlzaXMveWZfY2QzOF8xMGdlbmVfc2lnX2FuYWx5c2lzX0ZJR1VSRVNfd2l0aF9taWQuclxcKVxuYGBgXG5gYGAifQ== -->
```r
```r
source(\R/yf_signature_analysis/yf_cd38_10gene_sig_analysis_FIGURES_with_mid.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Evaluate prediction power of titer response for different number of genes in the signature (Ext. Data Fig. 2c):
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9zbGVfZGF0YXByZXAvc2xlX2RhdGFwcmVwLnJcXClcbnNvdXJjZShcXFIvc2xlX2RhdGFwcmVwL3NsZV9wcm9iZXMyZ2VuZXMuclxcKVxuc291cmNlKFxcUi9zbGVfZGF0YXByZXAvc2xlX2FkZF9EQV9hbmRfUEcuclxcKVxuc291cmNlKFxcUi9zbGVfZGF0YXByZXAvc2xlX3NhbXBsZV9maWx0ZXJpbmcuclxcKVxuc291cmNlKFxcUi9zbGVfZGF0YXByZXAvc2xlX2dlLm1lYW5fYnlfc3ViamVjdC5yXFwpXG5zb3VyY2UoXFxSL3NsZV9kYXRhcHJlcC9zbGVfc2xlZGFpLmxvd0RBLm1lYW5fYnlfc3ViamVjdC5yXFwpXG5gYGBcbmBgYCJ9 -->
```r
```r
source(\R/sle_dataprep/sle_dataprep.r\)
source(\R/sle_dataprep/sle_probes2genes.r\)
source(\R/sle_dataprep/sle_add_DA_and_PG.r\)
source(\R/sle_dataprep/sle_sample_filtering.r\)
source(\R/sle_dataprep/sle_ge.mean_by_subject.r\)
source(\R/sle_dataprep/sle_sledai.lowDA.mean_by_subject.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Generate ROC for titer response prediction based on 10-gene signature, and box plot comparing low- and high- responders based on TGSig for day 0, day -7 and day 70 (Fig. 1e-f). Output the TGSig scores for low and high responders for these time points.
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9zbGVfZXhhbXBsZV9wcm9maWxlL1NMRV9TTEVEQUlfVElNRV9wcm9maWxlcy5yXFwpXG5gYGBcbmBgYCJ9 -->
```r
```r
source(\R/sle_example_profile/SLE_SLEDAI_TIME_profiles.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Effect on AUC after removal of individual genes in TGSig (Ext. Data Fig. 2g):
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9zbGVfZGF0YXByZXAvc2xlX2dlbmVfc3RhYmlsaXR5LnJcXClcbmBgYFxuYGBgIn0= -->
```r
```r
source(\R/sle_dataprep/sle_gene_stability.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Check stability of relative rank of CD38++ signature genes applying different ISV threshold (Ext. Data Fig. 2f):
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9zbGVfc2lnbmF0dXJlX2FuYWx5c2lzL3NsZV9sb3dEQV9jZDM4X3Njb3JlLnJcXClcbmBgYFxuYGBgIn0= -->
```r
```r
source(\R/sle_signature_analysis/sle_lowDA_cd38_score.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Save the final gene lists for both TGSig and plasmablast signature:
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9zbGVfc2lnbmF0dXJlX2FuYWx5c2lzL3NsZV9sb3dEQV9QQl9zY29yZS5yXFwpXG5gYGBcbmBgYCJ9 -->
```r
```r
source(\R/sle_signature_analysis/sle_lowDA_PB_score.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Generate box plots comparing low-, middle- and high- responders based on TGSig score for day 0 (Ext. Data Fig. 3b):
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9zbGVfc2lnbmF0dXJlX2FuYWx5c2lzL3NsZV9QQl9zY29yZV9mcm9tX21peGVkX21vZGVsLnJcXClcbmBgYFxuYGBgIn0= -->
```r
```r
source(\R/sle_signature_analysis/sle_PB_score_from_mixed_model.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Generate box plots comparing male and female subject based on TGSig score for day 0 (Ext. Data Fig. 10b-e):
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9zbGVfc2lnbmF0dXJlX2FuYWx5c2lzL3NsZV9QQi5TTEVEQUkuY29ycl92c19DRDM4Lm1lYW4uc2NvcmVfRklHVVJFLnJcXClcbnNvdXJjZShcXFIvc2xlX3NpZ25hdHVyZV9hbmFseXNpcy9zbGVfUEIuU0xFREFJLmNvcnJfdnNfQ0QzOC5tZWFuLnNjb3JlX3RhYmxlLnJcXClcbmBgYFxuYGBgIn0= -->
```r
```r
source(\R/sle_signature_analysis/sle_PB.SLEDAI.corr_vs_CD38.mean.score_FIGURE.r\)
source(\R/sle_signature_analysis/sle_PB.SLEDAI.corr_vs_CD38.mean.score_table.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Generated data files and figures are saved in ```./generated_data/CHI``` and ```./figure_generation```, respectively. The signature files are saved in ```./generated_data/signatures```.
<br/>
## Step 7. Analyze blood transcription module genes enrichment in genes correlated with percentage of CD38-high cells varying selection threshold for gene inter-subject variation.
Run GSEA on “LI” and “DC” modules separately and save the results in RDS file.
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9zbGVfc2lnbmF0dXJlX2FuYWx5c2lzL3NsZV9QQi5TTEVEQUkuY29ycl92c19QQi5sb3dEQS5tZWFuLnNjb3JlX0ZJR1VSRS5yXFwpXG5gYGBcbmBgYCJ9 -->
```r
```r
source(\R/sle_signature_analysis/sle_PB.SLEDAI.corr_vs_PB.lowDA.mean.score_FIGURE.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
Generated RDS files are saved in ```./generated_data/fgsea_with_btm_modules```.
<br/>
---
# HIPC data analysis
## Input data
HAI expression and titer data were downloaded from [ImmuneSpace](http://immunespace.org) and pre-processed using part of the R script to reproduce figures for HIPC meta-analysis paper [Multicohort analysis reveals baseline transcriptional predictors of influenza vaccination responses. Sci Immunol. 2017 Aug 25;2(14)](https://immunology.sciencemag.org/content/2/14/eaal4656). The script downloads the data, corrects the labels for swapped samples, and calculates adjMFC titers.
For SDY404 we use locally computed adjMFC in ```./data/SDY404_young_hai_titer_table_2016.txt```.
**Required file from previous analysis**
./generated_data/signatures/CD38_ge_sig.txt
## Step 1. Data formatting and preparation
Load the data preloaded from ImmuneSpace
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9zbGVfc2lnbmF0dXJlX2FuYWx5c2lzL3NsZV9CVE1fc2NvcmVfZnJvbV9EQS5kZWx0YS5yXFwpXG5gYGBcbmBgYCJ9 -->
```r
```r
source(\R/sle_signature_analysis/sle_BTM_score_from_DA.delta.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
To load the data directly from ImmuneSpace we run this script. The script requires creating account and configuration file to access Immunespace. We cannot guarantee that the script will continue working in the future.
source(“R/hipc_dataprep/hipc_data_from_IS.r”)
<br/>Prepare data files for young subjects with titer response
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9QQi1zY29yZS12cy1XR0NOQS1tb2R1bGVzL3NjcmlwdC5SXFwpXG5gYGBcbmBgYCJ9 -->
```r
```r
source(\R/PB-score-vs-WGCNA-modules/script.R\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Mapping probes to genes
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9mZ3NlYV93aXRoX3dnY25hX21vZHVsZXMvY2hpX0NEMzhfY29yX2J5X0lTVl9mZ3NlYV9sZWFkaW5nRWRnZS5yXFwpXG5gYGBcbmBgYCJ9 -->
```r
```r
source(\R/fgsea_with_wgcna_modules/chi_CD38_cor_by_ISV_fgsea_leadingEdge.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Filter baseline (day 0) samples and subjects with low and high response
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9mZ3NlYV93aXRoX3dnY25hX21vZHVsZXMvY2hpX0NEMzhfY29yX2J5X0lTVl9mZ3NlYV9GSUdVUkUuclxcKVxuYGBgXG5gYGAifQ== -->
```r
```r
source(\R/fgsea_with_wgcna_modules/chi_CD38_cor_by_ISV_fgsea_FIGURE.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Generated data files are saved in ```./generated_data/HIPC```.
<br/>
## Step 2. Validation of prediction power of CD38 signature in HIPC datasets (young subjects only)
<br/>Calculate TGSig scores using data from day 0
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9mZ3NlYV93aXRoX3dnY25hX21vZHVsZXMvYnJvd24tbGVhZGluZy1lZGdlLUJUTS10YWJsZS5SXFwpXG5gYGBcbmBgYCJ9 -->
```r
```r
source(\R/fgsea_with_wgcna_modules/brown-leading-edge-BTM-table.R\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Generate ROC for titer response prediction based on 10-gene signature, and box plot comparing low- and high- responders based on TGSig:
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9XR0NOQS1tb2R1bGVzLWZyb20tU0xFLWxvdy1EQS10bW9kL3NjcmlwdC5SXFwpXG5gYGBcbmBgYCJ9 -->
```r
```r
source(\R/WGCNA-modules-from-SLE-low-DA-tmod/script.R\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Generate box plots comparing low-, middle- and high- responders based on TGSig:
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9NZXRhREUtZmx1LXRpdGVyLXJlc3BvbnNlLWZyb20tZXhwcmVzc2lvbi1pbi1mb3VyLWNvaG9ydHMvc2NyaXB0LlJcXClcbmBgYFxuYGBgIn0= -->
```r
```r
source(\R/MetaDE-flu-titer-response-from-expression-in-four-cohorts/script.R\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Generated data files and figures (Fig. 2a and Ext. Data Fig. 3b) are saved in ```./generated_data/HIPC``` and ```./figure_generation```, respectively.
<br/>
---
# Emory data analysis
## Input data
* Microarray (downloaded from GEO) and antibody titer data wre proprocessed outside of the workflow
* data/Emory/GSE29619.RData
* data/Emory/GSE74817_healthy.RData
<br/>
## Testing prediction power of CD38 signature
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9NZXRhREVfZm9yZXN0X3Bsb3QvRmx1X21ldGFfZm9yZXN0X3Bsb3QuclxcKVxuYGBgXG5gYGAifQ== -->
```r
```r
source(\R/MetaDE_forest_plot/Flu_meta_forest_plot.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
Output figure is saved in ```figure_generation```.
<br/>
---
# Yellow Fever data analysis
## Input data
* Sample information from GEO including Neutralization antibody titer data
* data/YF/expression/sample_info.txt
* Expression data from GEO processed by RMA-sketch (Affymetrix Power Tools)
* data/YF/expression/rma-sketch.summary.txt
* Probeset-gene mapping (for reproducibility, originally obtained using hgu133plus2.db R package)
* data/YF/expression/YF_probe_map.txt
<br/>
## Step 1. Data formatting and preparation
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9ZRjEtZWZmZWN0LXNpemVzL3NjcmlwdC5SXFwpXG5zb3VyY2UoXFxSL01ldGFERV9mb3Jlc3RfcGxvdC9ZRjFfZWZmZWN0X3NpemVfcGxvdC5yXFwpXG5gYGBcbmBgYCJ9 -->
```r
```r
source(\R/YF1-effect-sizes/script.R\)
source(\R/MetaDE_forest_plot/YF1_effect_size_plot.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>
# Step 2. Validation of prediction power of CD38 signature in yellow fever datasets
Calculate TGSig scores using data from day 0
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9HU0VBLXVzaW5nLWdlbmVzLWNvbW1vbi10by1zdGFibGUtU0xFLWxvdy1EQS1hbmQtNGZsdS9zY3JpcHQuUlxcKVxuYGBgXG5gYGAifQ== -->
```r
```r
source(\R/GSEA-using-genes-common-to-stable-SLE-low-DA-and-4flu/script.R\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Generate ROC for titer response prediction based on 10-gene signature, and box plot comparing low- and high- responders based on TGSig (trials 1 for Figure 2b, and trial 2 for Ext. Data Fig. 4a):
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9icm93bi1tb2R1bGUtbGVhZGluZy1nZW5lcy1sb3ctbWlkLWhpZ2gtZWlnZW5nZW5lL3NjcmlwdC5SXFwpXG5gYGBcbmBgYCJ9 -->
```r
```r
source(\R/brown-module-leading-genes-low-mid-high-eigengene/script.R\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Generate box plot comparing low-, middle- and high- responders based on TGSig in trial 1 (Ext. Data Fig. 3d):
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9yZWNhbGMtYW5kLWNoZWNrLTEwZ2VuZS1DRDM4cGx1c1NpZy9zY3JpcHQuUlxcKVxuYGBgXG5gYGAifQ== -->
```r
```r
source(\R/recalc-and-check-10gene-CD38plusSig/script.R\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Generated data files and figures are saved in ```./generated_data/YF``` and ```./figure_generation```, respectively.
<br/>
---
# SLE data analysis
## Input data
* Expression data downloaded from http://websle.com (Data tab) -- Batch-corrected, probe filtered, latest probe-gene annotation.
* data/SLE/expression/SLE_Longitudinal_972_eset.RData
* Subjects assignment to patient groups (from the SLE paper, figure 6A)
* data/SLE/SLE_SUBJECT_PG.txt
<br/>
## Step 1. Data formatting and preparation
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9icm93bi1tb2QtbWludXMtbGVhZGluZy1lZGdlL3NjcmlwdC5SXFwpXG5gYGBcbmBgYCJ9 -->
```r
```r
source(\R/brown-mod-minus-leading-edge/script.R\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Generate figure of SLEDAI change at different visit for a single patient (Fig. 2d):
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9icm93bi1tb2QtbWludXMtbGVhZGluZy1lZGdlLWNvbWJpbmVkL3NjcmlwdC5SXFwpXG5gYGBcbmBgYCJ9 -->
```r
```r
source(\R/brown-mod-minus-leading-edge-combined/script.R\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>
## Step 2. Calculate inter-subject variation (ISV) of expression of individual genes to quantify their stability across visits with low disease activity
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9JRk5fc2lnbmF0dXJlX2FuYWx5c2lzL3NldHVwLnJcXClcbnNvdXJjZShcXFIvSUZOX3NpZ25hdHVyZV9hbmFseXNpcy9jaGlfSUZOMjZfc2lnX2FuYWx5c2lzX0ZJR1VSRVMyLnJcXClcbnNvdXJjZShcXFIvSUZOX3NpZ25hdHVyZV9hbmFseXNpcy9oaXBjX2QwX0lGTjI2X3Njb3JlLnJcXClcbnNvdXJjZShcXFIvSUZOX3NpZ25hdHVyZV9hbmFseXNpcy9zbGVfbG93REFfSUZOMjZfc2NvcmUuclxcKVxuc291cmNlKFxcUi9JRk5fc2lnbmF0dXJlX2FuYWx5c2lzL2xvZ2lzdGljX3JlZ3JfNGZsdS5yXFwpXG5gYGBcbmBgYCJ9 -->
```r
```r
source(\R/IFN_signature_analysis/setup.r\)
source(\R/IFN_signature_analysis/chi_IFN26_sig_analysis_FIGURES2.r\)
source(\R/IFN_signature_analysis/hipc_d0_IFN26_score.r\)
source(\R/IFN_signature_analysis/sle_lowDA_IFN26_score.r\)
source(\R/IFN_signature_analysis/logistic_regr_4flu.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Generated data files and figures are saved in ```./generated_data/SLE``` and ```./figure_generation```, respectively.
<br/>
## Step 3. Calculate TGSig and Plasmablast signature scores
<br/>Calculate TGSig scores using data from low-disease-activity samples
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9STkFzZXEvQmNlbGxfY29tcGFyaXNpb25fREVzZXEuclxcKVxuYGBgXG5gYGAifQ== -->
```r
```r
source(\R/RNAseq/Bcell_comparision_DEseq.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Calculate Plasmablast signature scores using data from low-disease-activity samples
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuYGBgclxuc291cmNlKFxcUi9STkFzZXEvYnJvd24tbW9kLWxlYWRpbmctZWRnZS1lbnJpY2htZW50LnJcXClcbmBgYFxuYGBgIn0= -->
```r
```r
source(\R/RNAseq/brown-mod-leading-edge-enrichment.r\)
<!-- rnb-source-end -->
<!-- rnb-chunk-end -->
<!-- rnb-text-begin -->
<br/>Calculate DaCP using mixed-effect model
<!-- rnb-text-end -->
<!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxuc291cmNlKFwiUi9zbGVfc2lnbmF0dXJlX2FuYWx5c2lzL3NsZV9QQl9zY29yZV9mcm9tX21peGVkX21vZGVsLnJcIilcbmBgYCJ9 -->
```r
source("R/sle_signature_analysis/sle_PB_score_from_mixed_model.r")
Generate scatter plot between DaCP and average TGSig scores for low-disease-activity samples (Fig. 2e) and the table of correlation data
source("R/sle_signature_analysis/sle_PB.SLEDAI.corr_vs_CD38.mean.score_FIGURE.r")
source("R/sle_signature_analysis/sle_PB.SLEDAI.corr_vs_CD38.mean.score_table.r")
Generate scatter plot between DaCP and average Plasmablast signature scores for low-disease-activity samples (Fig. 2f)
source("R/sle_signature_analysis/sle_PB.SLEDAI.corr_vs_PB.lowDA.mean.score_FIGURE.r")
Generate scatter plot between DaCP and average Plasmablast signature scores for low-disease-activity samples analyzing correlation at different DaCP thresholds (Ext. Data Fig. 5c-d)
source("R/sle_signature_analysis/sle_PB.SLEDAI.corr_vs_CD38.mean.score_thresholds.r")
Generated data files and figures are saved in ./generated_data/SLE
and ./figure_generation
, respectively.
Correlation between DaCP and delta in Plasmablast score (Ext. Data Fig. 5b):
source("R/sle_signature_analysis/sle_DaCP_vs_dPB.r")
Heatmaps of blood gene expression signatures association with disease activity/flares (Fig. 2c and Ext. Data Fig. 5a):
source("R/sle_signature_analysis/sle_BTM_score_from_DA.delta.r")
Generated plots are stored in ./figure_generation
.
Gene modules discovery by WGCNA
source("R/WGCNA-modules-from-SLE-low-DA/script.R")
Generate Eigengene heatmap
source("R/WGCNA-eigengene-heatmap/script.R")
The generated figures (Fig. 3a) are saved in ./figure_generation/SLE_Sig
.
source("R/PB-score-vs-WGCNA-modules/script.R")
The generated figures (Fig. 3b-c) are saved in ./figure_generation/SLE-Sig
. Table with correlation data for all modules is saved in ./generated_data/PB-score-vs-WGCNA-modules
.
source("R/fgsea_with_wgcna_modules/chi_CD38_cor_by_ISV_fgsea_leadingEdge.r")
Generate a barplot (Fig. 3e) to show enrichment of brown module (with the enrichment plot) and blood transcriptome modules in genes ranked by correlation of their intensity with CD38++ cell frequency. The genes were filtered with ISV >= 0.5. The previously generated data (step 7 in CHI data analysis) were used for GSEA results of BTM enrichment.
source("R/fgsea_with_wgcna_modules/chi_CD38_cor_by_ISV_fgsea_FIGURE.r")
Generate table of three BTM module genes in SLE-Sig (Fig. 3f)
source("R/fgsea_with_wgcna_modules/brown-leading-edge-BTM-table.R")
Output files: ./generated_data/fgsea_with_wgcna_modules/
The generated figure is saved as ./figure_generation/SLE-Sig
source("R/WGCNA-modules-from-SLE-low-DA-tmod/script.R")
The generated figure is saved as ./figure_generation/SLE-Sig/WGCNA-BTM-enrichment-p.adjust.pdf
.
Output files are saved in ./generated_data/WGCNA-modules-from-SLE-low-DA-tmod
Output table to Fig. 3f is saved as ./generated_data/fgsea_with_wgcna_modules/brown-mod-m75-m150-m165-genes-short.csv
source("R/MetaDE-flu-titer-response-from-expression-in-four-cohorts/script.R")
Output files are saved in ./generated_data/MetaDE
.
source("R/MetaDE_forest_plot/Flu_meta_forest_plot.r")
The generated figure is saved in ./figure_generation/MetaDE
.
source("R/YF1-effect-sizes/script.R")
source("R/MetaDE_forest_plot/YF1_effect_size_plot.r")
The output files are saved in ./generated_data/MetaDE
. The generated figure is saved in ./figure_generation/MetaDE
.
source("R/GSEA-using-genes-common-to-stable-SLE-low-DA-and-4flu/script.R")
The output files are saved in ./generated_data/MetaDE
. The generated figure (Fig. 3d) is saved in ./figure_generation/MetaDE
.
source("R/brown-module-leading-genes-low-mid-high-eigengene/script.R")
Output files are saved in ./generated_data/brown-module-leading-genes-low-mid-high-eigengene
.
source("R/recalc-and-check-10gene-CD38plusSig/script.R")
Output figures (Ext. Data Fig. 6 a-b, e) are saved in ./figure_generation/SLE-Sig
.
Test removing SLE-Sig genes from brown module
source("R/brown-mod-minus-leading-edge/script.R")
Output files are saved in ./generated_data/brown-mod-minus-leading-edge/
.
source("R/brown-mod-minus-leading-edge-combined/script.R")
Output figure (Ext. Data Fig. 6d) is saved in ./figure_generation/SLE-Sig
.
source("R/IFN_signature_analysis/setup.r")
source("R/IFN_signature_analysis/chi_IFN26_sig_analysis_FIGURES2.r")
source("R/IFN_signature_analysis/hipc_d0_IFN26_score.r")
source("R/IFN_signature_analysis/sle_lowDA_IFN26_score.r")
source("R/IFN_signature_analysis/logistic_regr_4flu.r")
Output files are saved in ./generated_data/IFN26
.
Output figures (Ext. Data Fig. 6c,f) are saved in ./figure_generation/IFN26
.
Compare CD20+CD38++ B cells with CD20+ B cells with DEseq2:
source("R/RNAseq/Bcell_comparision_DEseq.r")
Output files are saved in ./generated_data/RNAseq
. Output figures (incl. Ext. Data Fig. 7b) are saved in ./figure_generation/RNAseq
.
Enrichment analysis of SLE-Sig genes and genes correlated with CD20+CD38++ B cells (at different TSM) against genes ranked by differential expression from RNAseq:
source("R/RNAseq/brown-mod-leading-edge-enrichment.r")
Output files are saved in ./generated_data/RNAseq
. Output figures (Ext. Data Fig. 7c-e) are saved in ./figure_generation/RNAseq
.