Description

Identifying the human host factors that distinguish severe from mild COVID-19 in infected individuals has become a topic of growing interest. Mining large numbers of public gene expression datasets is an effective way to identify genes that contribute to a given phenotype. Combining RNA-sequencing (RNA-seq) data with clinical metadata on disease severity can enable earlier identification of patients at higher risk of developing severe COVID-19. We therefore identified 356 public human RNA-seq transcriptome samples in the Gene Expression Omnibus (GEO) database that were annotated with disease severity. We processed these samples with a robust RNA-seq workflow to quantify gene expression in each sample, using Salmon to map reads to the reference transcriptome, edgeR to identify significantly differentially expressed genes, and Camera for gene ontology enrichment analysis. We then applied a machine learning algorithm to the read count data to identify the features that best distinguished samples by COVID-19 severity. The result is a list of genes ranked by Gini importance that includes GIMAP7 and S1PR2, which are associated with immunity and inflammation, respectively. We expect these results to lay the groundwork for improved prognostics for severe COVID-19.
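The abstract does not name the machine learning model, but Gini importance (mean decrease in impurity) is the quantity that tree-based classifiers such as random forests report for each feature. The sketch below shows how such a ranking can be computed, assuming a single depth-limited CART-style decision tree in pure Python rather than an ensemble; the gene names, counts, and severity labels are hypothetical toy values, not data from the study, and this is not the authors' pipeline. In practice a library implementation (for example scikit-learn's `RandomForestClassifier.feature_importances_`) would typically be used.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, threshold, impurity_decrease) for the best split, or None."""
    n = len(labels)
    parent = gini(labels)
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            decrease = parent - child
            if best is None or decrease > best[2]:
                best = (f, t, decrease)
    return best


def accumulate_importance(rows, labels, importance, n_total, depth, max_depth):
    """Grow the tree recursively, crediting each split's impurity decrease to its feature."""
    if depth >= max_depth or gini(labels) == 0.0:
        return
    split = best_split(rows, labels)
    if split is None or split[2] <= 0.0:
        return
    f, t, decrease = split
    # Weight the decrease by the fraction of all samples that reach this node.
    importance[f] += len(labels) / n_total * decrease
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    for part in (left, right):
        accumulate_importance([r for r, _ in part], [y for _, y in part],
                              importance, n_total, depth + 1, max_depth)


def gini_importance(rows, labels, max_depth=3):
    """Per-feature Gini importance, normalized to sum to 1 when any split occurs."""
    importance = [0.0] * len(rows[0])
    accumulate_importance(rows, labels, importance, len(labels), 0, max_depth)
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance


def rank_genes(genes, rows, labels, max_depth=3):
    """Pair each gene with its importance and sort from most to least important."""
    scores = gini_importance(rows, labels, max_depth)
    return sorted(zip(genes, scores), key=lambda gs: gs[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: one row per sample, one column per gene.
    genes = ["GENE_A", "GENE_B", "GENE_C"]
    rows = [[10, 5, 100], [12, 7, 90], [11, 6, 95],
            [50, 5, 92], [55, 6, 98], [60, 7, 91]]
    labels = ["mild"] * 3 + ["severe"] * 3
    print(rank_genes(genes, rows, labels))
    # [('GENE_A', 1.0), ('GENE_B', 0.0), ('GENE_C', 0.0)]
```

Here GENE_A alone separates the mild and severe toy samples, so the root split uses it and both children are pure; it therefore receives all of the importance. A random forest averages this quantity over many trees grown on bootstrapped samples and random feature subsets, which makes the ranking more stable than a single tree.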

Disciplines

Health Services Research | Medical Sciences | Medicine and Health Sciences

Prediction of Human Transcriptional Biomarkers for Severe Infection with SARS-CoV-2