A. Before you begin

A1. Download R and RStudio

⏰ timing: 1 hour

✅ 1. R is a free software environment for statistical computing and graphics. It runs on UNIX, Windows, and macOS.

  • To download and install R, go to https://www.r-project.org/ (R Core Team, 2013). This pipeline was run with R version 4.2.2.

✅ 2. RStudio is an integrated development environment (IDE) for R. It makes it easy to run R code, plot graphics, and manage the workspace in a multi-panel interface.

A2. Download required packages

⏰ timing: 1 hour

✅ 3. Users must first install the required packages (listed in the key resources table). They can be installed through Bioconductor, which provides tools for the analysis and comprehension of high-throughput genomic data. BiocManager::install() is the recommended command for installing packages (for details on why BiocManager::install() is preferred over the standard R package installation, see https://www.bioconductor.org/install/#whybiocmanagerinstall):

  • Open RStudio and set the working directory. You may choose any directory, for example “F:/advbioinfor_test/lecture_03”.
# Show the working directory, then create a "data" subdirectory if it is missing.
print(getwd())
## [1] "F:/winServer_G/ABI/ABI-Project-01"
if (!dir.exists("data")) dir.create("data")
  • Install the packages needed for the analysis.
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

# DT is included because DT::datatable() is called in later steps.
pkgs <- c("GEOquery", "affy", "simpleaffy", "arrayQualityMetrics", "DT")

for (p in pkgs) {
  if (!requireNamespace(p, quietly = TRUE))
    BiocManager::install(p)
}
  • Once all packages are installed, they need to be loaded:
library(GEOquery)
library(affy)
library(simpleaffy)
library(arrayQualityMetrics)
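
Before moving on, it can help to confirm that every package is actually available, since a failed installation only surfaces later as a library() error. The following is a minimal sketch using base R only; the variable names `needed`, `installed`, and `missing.pkgs` are illustrative and not part of the original protocol.

```r
# Bioconductor packages loaded with library() above.
needed <- c("GEOquery", "affy", "simpleaffy", "arrayQualityMetrics")

# TRUE for each package that can be loaded, FALSE otherwise.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Report the packages that still need to be installed, if any.
missing.pkgs <- names(installed)[!installed]
if (length(missing.pkgs) > 0) {
  message("Not installed: ", paste(missing.pkgs, collapse = ", "))
}
```

Any package reported as missing can be passed to BiocManager::install() again before continuing.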

A3. Dataset selection

⏰ timing: 2 days

✅ 4. When using datasets from public repositories, the key step is to identify a dataset (or datasets) that meets the eligibility criteria and contains the sample information required for the analysis.

  • We suggest browsing the Gene Expression Omnibus (GEO: https://www.ncbi.nlm.nih.gov/gds; Barrett et al., 2012) and ArrayExpress (https://www.ebi.ac.uk/arrayexpress/; Athar et al., 2019) repositories because they gather multiple high-throughput genomics datasets.

  • In this project, publicly available microarray gene expression datasets for asthma were retrieved from the Gene Expression Omnibus Database (GEO) (http://www.ncbi.nlm.nih.gov/geo/) using the keyword “asthma”. The raw datasets were manually checked, and only those that met the following criteria were included in the subsequent analysis: 1) gene expression profiling in asthmatics and controls, 2) cell type: airway epithelial cells, but not nasal epithelium, 3) gene expression data generated on a single-channel microarray platform (Affymetrix or Agilent chips), 4) availability of raw CEL or TXT files, 5) samples with detailed descriptions, and 6) sample size > 80.

  • Based on the above criteria, two datasets were identified: GSE63142 and GSE67472. Next, we demonstrate how to analyze Affymetrix DNA microarray data (i.e., GSE67472, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE67472).

Figure 1. Dataset GSE67472 in GEO database

B. Materials & Equipment

For this bioinformatics analysis we used a laptop with an 8th-generation Intel Core i5 processor, 32 GB of RAM, and Windows 10 Pro. No high-performance computing cluster was needed to analyze the data. An internet connection is required for downloading R packages and data matrices.

C. Step-by-step Methods

The flow chart for data processing is included in Figure 1.

C1. Download and prepare the data matrix for analysis

⏰ timing: 2 hours

You can download the experiment information and clinical data directly from GEO using the GEOquery package:

✅ 5. The series matrix file is a tab-delimited text file that contains, for each sample, the phenotypic/clinical and experimental data of a given dataset. On the GEO web page, the link to it is labelled “Series Matrix File(s)”. To download the series matrix file directly into the R environment, use the getGEO command:

options(timeout=1000) 
#library(GEOquery)
#gse <- getGEO("GSE67472")
#print(gse)
#gsm <- gse[[1]]$geo_accession
#print(gsm)

✅ 6. Alternatively, you can download the raw data from the GEO database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE67472), as shown in Figure 2.

Figure 2. Download the raw data of GSE67472

  • To download it programmatically, run the following R script:
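setwd("./data")
if (file.exists("GSE67472_RAW.tar")) {
  # The archive is already present; check that it is complete (> 600 MB).
  s <- file.info("GSE67472_RAW.tar")
  if (s$size > 600000000) print("Dataset downloading successfully!")
} else {
  # Raise the download timeout (in seconds); see getOption("timeout").
  options(timeout = 1000)
  # makeDirectory = FALSE saves GSE67472_RAW.tar directly in ./data,
  # where the check above and the later untar() call expect it.
  test <- try(getGEOSuppFiles("GSE67472",
                              makeDirectory = FALSE,
                              baseDir = getwd(),
                              fetch_files = TRUE,
                              filter_regex = NULL),
              silent = TRUE)
  # try() returns an object of class "try-error" when the call fails.
  if (inherits(test, "try-error")) print("Dataset downloading error!")
}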
## [1] "Dataset downloading successfully!"
setwd("..")
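
The "skip the download if a complete file is already there" logic can also be wrapped in small reusable functions. The sketch below is only an illustration in base R: `is_download_complete()` and `fetch_if_needed()` are hypothetical helper names, and a temporary file stands in for the GEO archive so that the example runs without a network connection.

```r
# Hypothetical helper: TRUE if `path` exists and is larger than `min.bytes`.
is_download_complete <- function(path, min.bytes) {
  file.exists(path) && file.size(path) > min.bytes
}

# Hypothetical wrapper: call `download.fun()` only when the file is missing
# or too small, and turn any error into FALSE instead of stopping the script.
fetch_if_needed <- function(path, min.bytes, download.fun) {
  if (is_download_complete(path, min.bytes)) return(TRUE)
  ok <- tryCatch({
    download.fun()
    TRUE
  }, error = function(e) {
    message("Download failed: ", conditionMessage(e))
    FALSE
  })
  ok && is_download_complete(path, min.bytes)
}

# Demonstration with a temporary 2 kB file instead of the GEO archive.
demo.file <- tempfile(fileext = ".tar")
writer <- function() writeBin(raw(2048), demo.file)
first.call <- fetch_if_needed(demo.file, 1000, writer)
# The file now exists, so the download function is not called again.
second.call <- fetch_if_needed(demo.file, 1000, function() stop("not called"))
print(c(first.call, second.call))
```

In this protocol, `path` would be "GSE67472_RAW.tar", `min.bytes` would be 600000000, and `download.fun` would wrap the getGEOSuppFiles() call.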

C2. Import the data into the R environment

⏰ timing: 10~30 mins

✅ 7. Import the downloaded data into R using the following code.

setwd("./data")
# Extract the archive (GSE67472_RAW.tar) into a directory named GSE67472.
untar("GSE67472_RAW.tar", exdir = "GSE67472")
print(getwd())
## [1] "F:/winServer_G/ABI/ABI-Project-01/data"
# Import all *.cel files into R environment.
setwd("GSE67472")
library(affy)
dat <- ReadAffy()
setwd("..")
unlink("GSE67472", recursive = TRUE)
print(dat)
## AffyBatch object
## size of arrays=1164x1164 features (66 kb)
## cdf=HG-U133_Plus_2 (54675 affyids)
## number of samples=105
## number of genes=54675
## annotation=hgu133plus2
## notes=
setwd("..")

C3. Quality Assessment for DNA microarray

✅ 8. Quality assessment (QA) is an optional but important part of preprocessing.

#. if (!dir.exists("QC")) dir.create("QC")
#. setwd("./QC")
#. library(arrayQualityMetrics)
#. err.pos <- arrayQualityMetrics(expressionset = dat, 
#.                                outdir = "QA_before", 
#.                                force = TRUE)
#. err.cel <- which(err.pos$arrayTable == "x", arr.ind = TRUE)[, 1]
#. print(err.cel)
#. setwd("..")

C4. Microarray data normalization

⏰ timing: 30 mins

✅ 9. Many algorithms are used for DNA microarray data normalization, such as RMA, MAS5.0, GCRMA, PLIER, and VSN. Here, we adopt RMA.

eset <- rma(dat)
## Background correcting
## Normalizing
## Calculating Expression
print(eset)
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 54675 features, 105 samples 
##   element names: exprs 
## protocolData
##   sampleNames: GSM1647628_08.Fahy_12GOBMUC2_A.CEL.gz
##     GSM1647629_49.Fahy_13GOBMUC2_A.CEL.gz ... GSM1647732_239_new.CEL.gz
##     (105 total)
##   varLabels: ScanDate
##   varMetadata: labelDescription
## phenoData
##   sampleNames: GSM1647628_08.Fahy_12GOBMUC2_A.CEL.gz
##     GSM1647629_49.Fahy_13GOBMUC2_A.CEL.gz ... GSM1647732_239_new.CEL.gz
##     (105 total)
##   varLabels: sample
##   varMetadata: labelDescription
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation: hgu133plus2
eset <- exprs(eset)
cl.name <- sapply(colnames(eset), 
                  function(x) strsplit(x, "_")[[1]][1])
colnames(eset) <- cl.name
dim(eset)
## [1] 54675   105
DT::datatable(eset[1:100, 1:4], 
              extensions = c('Buttons','FixedColumns','RowGroup'), 
              options = list(dom = 'Bfrtip', 
                             buttons = c('copy', 
                                         'csv', 
                                         'excel', 
                                         'pdf', 
                                         'print')
                             ))
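
The rma() output above lists three stages: background correction, normalization, and expression calculation. The normalization stage of RMA is quantile normalization, which forces every array to share the same intensity distribution. The following is a toy base-R sketch of that single stage, not the implementation used inside the affy package; the function name `quantile_normalize()` and the numbers are made up for illustration, and ties are broken by order of appearance rather than averaged.

```r
# Quantile-normalize the columns of a numeric matrix (no missing values).
quantile_normalize <- function(x) {
  # Rank of every value within its own column (ties broken by order).
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the k-th smallest values across columns.
  ref <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value with the same rank.
  out <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(x)
  out
}

# Toy log2 intensities: 4 probes x 3 arrays (made-up numbers).
toy <- matrix(c(5, 2, 3, 4,
                4, 1, 4, 2,
                3, 4, 6, 8),
              nrow = 4,
              dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))
toy.norm <- quantile_normalize(toy)
print(toy.norm)
# After normalization, every array contains the same set of values.
print(apply(toy.norm, 2, sort))
```

Each probe keeps its rank within its array, but the values themselves now come from a common reference distribution, which makes arrays directly comparable.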

✅ 10. Download the annotation file for the hgu133plus2 platform and save it as “GPL570-hgu133plus2.txt”.

hgu133plus2 <- read.csv("GPL570-hgu133plus2.txt", 
                        sep = "\t", 
                        skip = 16, 
                        header = TRUE)
dim(hgu133plus2)
## [1] 54675    16

✅ 11. The annotation file is then processed to extract the information needed for this study.

### Annotation information for hgu133plus2 (GSE67472).
### 1) Find the probes that map to multiple genes or to no gene.
o2m.p.133p2 <- NULL
len.stat <- NULL
for (i in 1:nrow(hgu133plus2)) {
  a <- as.character(hgu133plus2$ENTREZ_GENE_ID[i])
  # Multiple Entrez IDs for one probe are separated by " /// ".
  tmp <- strsplit(a, " /// ")[[1]]
  len.stat <- c(len.stat, length(tmp))
  # Flag probes with more than one gene ID or with none at all.
  if (length(tmp) > 1 || length(tmp) == 0) {
    o2m.p.133p2 <- c(o2m.p.133p2, i)
  }
}
# Note: this count also includes the probes without any gene ID.
o2m.p <- length(o2m.p.133p2)
word1 <- paste("There are",
               o2m.p,
               "probes which matched more than one genes!")
print(word1)
## [1] "There are 12841 probes which matched more than one genes!"
# sum(table(len.stat)) - 41834
# table(is.na(hgu133plus2$ENTREZ_GENE_ID))
### Remove the flagged probes and keep columns 1, 11, and 12 of the annotation
### table (these include ID and ENTREZ_GENE_ID, which are used below).
anno.133p2 <- hgu133plus2[-o2m.p.133p2, c(1, 11, 12)]
# table(unique(anno.133p2$ID) == anno.133p2$ID)

# 2) Find the genes that are matched by exactly one probe,
# i.e., one probe, one gene.
o2o <- table(anno.133p2$ENTREZ_GENE_ID) == 1
o2o.133p2 <-  names(table(anno.133p2$ENTREZ_GENE_ID))[o2o]
length(o2o.133p2)
## [1] 9792
# 3) Find the genes that are matched by more than one probe,
# i.e., multiple probes, one gene.
m2o <- table(anno.133p2$ENTREZ_GENE_ID) > 1
m2o.133p2 <-  names(table(anno.133p2$ENTREZ_GENE_ID))[m2o]
length(m2o.133p2)
## [1] 10694
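
The three probe classes used above (unannotated or ambiguous, one probe per gene, several probes per gene) are easier to follow on a small table. The sketch below applies the same " /// " splitting rule to a made-up annotation table, using a vectorized form instead of the loop; the table `toy.anno` and its contents are hypothetical.

```r
# Toy annotation table (made-up probe IDs and gene IDs).
toy.anno <- data.frame(
  ID = c("p1", "p2", "p3", "p4", "p5", "p6"),
  ENTREZ_GENE_ID = c("10", "10 /// 20", "", "30", "40", "40"),
  stringsAsFactors = FALSE
)

# Number of gene IDs per probe: 0 = unannotated, 1 = unique, >1 = ambiguous.
n.genes <- lengths(strsplit(toy.anno$ENTREZ_GENE_ID, " /// ", fixed = TRUE))
print(n.genes)

# Keep only the probes that map to exactly one gene.
keep <- toy.anno[n.genes == 1, ]

# Count the retained probes per gene and split genes into the two classes.
probes.per.gene <- table(keep$ENTREZ_GENE_ID)
one2one <- names(probes.per.gene)[probes.per.gene == 1]
many2one <- names(probes.per.gene)[probes.per.gene > 1]
print(one2one)
print(many2one)
```

Here probes p2 and p3 are dropped, genes "10" and "30" each have a single probe, and gene "40" has two probes that must later be reduced to one.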

✅ 12. Extract (or prepare) the gene expression matrix from GSE67472.

### 1) Extracting one2one gene expression levels. 
mat.67472 <- NULL
probe.67472 <- rownames(eset)
for (s in o2o.133p2) {
  s.pos <- which(anno.133p2$ENTREZ_GENE_ID == s) 
  if (length(s.pos) > 1) print(s.pos)
  s.tmp <- eset[which(probe.67472 == anno.133p2$ID[s.pos]), ] 
  # rownames(mat.tmp) <- s
  mat.67472 <- rbind(mat.67472, s.tmp)
}

### 2) Extracting more2one gene expression levels. 
for (m in m2o.133p2) {
  m.pos <- which(anno.133p2$ENTREZ_GENE_ID == m) 
  m.tmp <- eset[match(anno.133p2$ID[m.pos], probe.67472), ] 
  # if (nrow(m.tmp) > 1) print(m.tmp)
  # Keep the probe with the largest interquartile range (IQR) across samples.
  tmp <- m.tmp[which.max(apply(m.tmp, 1, IQR)), ]
  mat.67472 <- rbind(mat.67472, tmp)
}
rownames(mat.67472) <- c(o2o.133p2, m2o.133p2)

### 3) Previewing the gene expression matrix.
head(mat.67472)
##           GSM1647628 GSM1647629 GSM1647630 GSM1647631 GSM1647632 GSM1647633
## 1           3.493539   3.815563   3.610844   3.594420   3.515792   3.598956
## 10          4.221160   3.994366   4.081684   4.095276   3.995225   4.061557
## 100048912   4.411851   4.262043   4.288609   4.715922   4.240065   4.113609
## 10007       7.675279   7.129691   7.091532   7.330977   7.301560   8.066077
## 100093698   4.756881   4.745557   5.290200   4.797684   4.655890   4.643768
## 1001        8.085951   7.931672   7.757693   7.567498   7.942103   8.091432
##           GSM1647634 GSM1647635 GSM1647636 GSM1647637 GSM1647638 GSM1647639
## 1           3.524817   3.786830   3.607734   3.624177   3.631540   3.980450
## 10          4.045187   3.731648   4.031600   3.922030   3.959078   3.965483
## 100048912   4.634525   4.083764   4.424771   4.278938   4.286333   4.130990
## 10007       7.237745   7.778026   7.022828   7.480860   8.087107   7.756052
## 100093698   4.625860   4.654817   4.520235   4.581960   4.500682   4.694741
## 1001        8.008257   7.731630   7.966563   8.083884   7.962252   8.129002
##           GSM1647640 GSM1647641 GSM1647642 GSM1647643 GSM1647644 GSM1647645
## 1           3.614938   3.889815   3.471951   3.426553   3.458802   3.406155
## 10          4.130205   3.834808   4.040178   4.307509   4.044788   4.274490
## 100048912   4.326898   4.374988   4.255556   4.772894   4.296362   4.833295
## 10007       7.434535   7.238472   6.926840   7.466005   7.452425   7.042389
## 100093698   5.005437   4.833614   5.016033   4.813633   4.671746   4.730833
## 1001        8.083623   8.091432   8.386591   8.154212   8.047155   8.094366
##           GSM1647646 GSM1647647 GSM1647648 GSM1647649 GSM1647650 GSM1647651
## 1           3.731419   3.823126   3.847240   3.805505   3.767520   3.427415
## 10          4.544955   3.853246   3.558827   3.650856   3.690891   4.033876
## 100048912   4.328221   4.073886   4.581152   4.275477   4.367421   4.198209
## 10007       8.113792   8.240947   7.480237   7.654124   7.626536   7.306358
## 100093698   4.858869   4.530557   4.640295   4.639624   4.574224   4.850072
## 1001        8.071839   8.524349   8.211024   8.465392   8.263971   7.901734
##           GSM1647652 GSM1647653 GSM1647654 GSM1647655 GSM1647656 GSM1647657
## 1           3.538655   3.792681   3.854631   3.592468   3.784979   3.447886
## 10          3.777804   3.719733   3.788754   4.022531   3.925838   3.704985
## 100048912   4.101644   4.341932   4.172638   4.393942   4.232531   4.320004
## 10007       7.766085   7.775982   7.792051   7.784038   7.324192   7.465378
## 100093698   4.822784   4.322000   4.780470   4.712794   4.808868   4.678359
## 1001        8.530143   8.403850   8.613862   8.763135   8.072464   8.464986
##           GSM1647658 GSM1647659 GSM1647660 GSM1647661 GSM1647662 GSM1647663
## 1           3.548694   3.624177   3.778054   3.749435   4.092182   3.493683
## 10          3.914622   3.751118   3.658723   3.715779   3.767556   4.172536
## 100048912   4.248941   4.086963   4.118574   4.296706   3.988716   4.167928
## 10007       7.777732   7.470334   7.577234   8.249810   7.762185   7.582286
## 100093698   4.599880   4.501640   4.746848   4.494668   4.667586   5.054394
## 1001        8.208811   8.258622   8.275777   8.781770   8.421109   8.423450
##           GSM1647664 GSM1647665 GSM1647666 GSM1647667 GSM1647668 GSM1647669
## 1           3.728554   3.691676   3.672567   3.518454   3.728377   3.553892
## 10          3.681612   3.870919   3.822458   3.785636   3.979369   4.038572
## 100048912   4.303938   4.470891   4.265020   4.229603   4.474944   4.739768
## 10007       7.356018   7.832507   7.374706   7.541572   7.298703   7.512672
## 100093698   4.674376   4.582640   5.014345   4.748786   4.796670   4.971267
## 1001        8.254545   8.140062   7.960394   8.500167   8.557788   8.588560
##           GSM1647670 GSM1647671 GSM1647672 GSM1647673 GSM1647674 GSM1647675
## 1           3.604163   3.377187   3.541051   3.701559   3.687964   3.462925
## 10          4.052307   4.031013   4.166329   3.949714   3.847567   3.884303
## 100048912   4.391817   4.309714   4.324954   4.601820   4.449960   4.404119
## 10007       7.368099   7.943735   7.567371   7.592621   8.079317   6.977886
## 100093698   4.646662   4.734808   4.626728   4.515904   4.673637   5.007907
## 1001        7.769916   8.775533   8.323765   8.075811   7.963822   8.021878
##           GSM1647676 GSM1647677 GSM1647678 GSM1647679 GSM1647680 GSM1647681
## 1           3.487839   3.730179   3.627973   3.495416   3.586580   3.648914
## 10          3.654517   4.056240   4.254719   4.304213   3.929474   4.140867
## 100048912   4.475781   4.201370   4.317208   4.457723   4.412154   4.322228
## 10007       7.117548   7.591061   7.033449   7.653792   7.278171   7.193045
## 100093698   4.898230   5.057406   4.781666   5.120903   4.471379   4.719116
## 1001        8.025430   7.773447   8.505299   7.982408   8.222151   8.184908
##           GSM1647682 GSM1647683 GSM1647684 GSM1647685 GSM1647686 GSM1647687
## 1           3.491328   3.840231   3.624177   3.447938   3.651987   3.548095
## 10          3.823714   3.933194   3.779992   4.176949   3.931765   3.846544
## 100048912   4.484474   4.354049   4.437525   4.536114   4.047968   4.734792
## 10007       7.782200   7.692376   7.471291   7.238200   7.713349   7.271928
## 100093698   4.673863   4.728424   4.717438   4.662934   4.773808   4.716867
## 1001        8.229269   8.504173   8.249935   8.269294   8.021421   8.557898
##           GSM1647688 GSM1647689 GSM1647690 GSM1647691 GSM1647692 GSM1647693
## 1           3.559386   3.589201   3.503411   3.756853   3.782884   3.544267
## 10          4.271602   3.859686   3.582452   5.548169   3.618996   3.612497
## 100048912   4.243510   4.455239   4.177011   4.443936   3.994784   4.121800
## 10007       7.208901   7.493661   6.318035   6.887358   7.234137   6.609267
## 100093698   4.681159   4.839381   4.094268   4.583141   4.482587   4.559766
## 1001        8.488396   7.813515   7.454245   7.070134   7.772381   7.119365
##           GSM1647694 GSM1647695 GSM1647696 GSM1647697 GSM1647698 GSM1647699
## 1           3.854844   3.391394   3.938420   3.616746   3.543546   3.357637
## 10          4.041660   3.524700   3.584794   3.842314   3.618981   3.703531
## 100048912   4.277661   4.331131   3.998356   3.990968   3.971352   4.001425
## 10007       7.212859   6.849776   6.513698   6.602003   7.467341   6.435926
## 100093698   4.404763   4.674050   4.639142   4.036403   4.134684   4.300547
## 1001        7.908496   8.184520   7.717381   8.011796   8.536115   7.995198
##           GSM1647700 GSM1647701 GSM1647702 GSM1647703 GSM1647704 GSM1647705
## 1           3.404336   3.619980   3.764452   3.575531   3.583176   3.628200
## 10          3.627778   3.557135   3.623480   3.631437   3.485017   3.652029
## 100048912   4.039981   4.044260   4.848737   4.376736   4.546645   4.439595
## 10007       6.977143   6.606321   6.746771   6.919508   6.553946   6.950588
## 100093698   3.906094   4.417674   4.488039   4.183569   4.447541   4.403258
## 1001        7.725240   7.856125   8.083761   7.589999   7.812311   7.395116
##           GSM1647706 GSM1647707 GSM1647708 GSM1647709 GSM1647710 GSM1647711
## 1           3.582888   3.582700   3.647314   3.397082   3.554474   3.351084
## 10          3.397724   3.524158   3.519721   3.668817   3.343952   3.391633
## 100048912   5.023137   4.414605   4.954801   4.596287   4.463078   3.820420
## 10007       7.080405   6.588667   6.753694   7.286492   6.965241   6.674601
## 100093698   4.337309   4.356547   4.574294   4.369669   4.144505   4.375722
## 1001        8.122574   7.367379   7.518406   7.207111   8.496733   7.318378
##           GSM1647712 GSM1647713 GSM1647714 GSM1647715 GSM1647716 GSM1647717
## 1           3.719210   3.526593   3.372690   3.501530   3.727421   3.639791
## 10          3.456361   3.428795   3.487687   3.549424   3.534387   3.830634
## 100048912   4.046192   4.066871   4.271311   4.172623   3.882176   4.053331
## 10007       6.658031   6.857437   6.572512   6.828444   6.993781   7.075091
## 100093698   4.443090   4.336848   4.058928   4.307881   4.431165   4.249286
## 1001        8.383271   8.752761   8.033930   8.108315   9.022741   8.387307
##           GSM1647718 GSM1647719 GSM1647720 GSM1647721 GSM1647722 GSM1647723
## 1           3.399587   3.624177   3.752995   3.322396   3.590592   3.512920
## 10          3.605670   3.552359   3.396808   3.946787   3.681840   3.732090
## 100048912   4.586746   4.243077   4.157660   3.816541   4.717015   4.258321
## 10007       6.894933   6.995748   7.141518   7.025541   6.860344   6.814057
## 100093698   4.238214   4.440895   4.292646   4.275602   4.220167   4.341084
## 1001        7.976506   8.426613   8.142116   8.305765   8.098517   7.805309
##           GSM1647724 GSM1647725 GSM1647726 GSM1647727 GSM1647728 GSM1647729
## 1           3.716713   4.027583   3.505757   3.991416   3.825900   3.905370
## 10          3.679192   3.758264   3.618333   3.468145   3.883723   3.369822
## 100048912   3.973574   3.918572   3.920508   4.345485   4.236662   3.986152
## 10007       7.171236   6.884750   7.150438   6.706999   6.180664   6.015538
## 100093698   4.134684   4.085823   4.370084   4.077464   4.532269   4.376569
## 1001        8.196194   7.437380   7.640051   7.268189   7.603088   7.232151
##           GSM1647730 GSM1647731 GSM1647732
## 1           3.661197   3.478313   3.976525
## 10          3.487539   3.717022   3.721855
## 100048912   4.259137   3.727160   3.918848
## 10007       7.060676   6.260667   6.922737
## 100093698   4.287670   4.342175   4.548165
## 1001        7.801146   8.303764   7.379277
tail(mat.67472)
##      GSM1647628 GSM1647629 GSM1647630 GSM1647631 GSM1647632 GSM1647633
## 9987   8.797270   9.543499  10.196649   9.676556   9.926735   8.229028
## 999    5.087675   4.777227   4.619528   4.563001   4.851726   4.342094
## 9990   4.749497   4.817312   4.763305   4.536702   4.934358   4.470848
## 9991   3.489513   3.847950   3.519479   3.465999   3.720478   3.571195
## 9993   5.993702   6.638650   6.401174   6.405897   6.580947   6.278414
## 9994   4.715041   3.873419   4.268575   4.505541   3.731421   4.320570
##      GSM1647634 GSM1647635 GSM1647636 GSM1647637 GSM1647638 GSM1647639
## 9987   9.628917   9.833227   9.693346   8.992126   8.708899   8.838605
## 999    4.814504   4.686905   4.635453   4.611680   4.501259   4.539497
## 9990   4.797551   5.157497   4.736082   4.530546   4.791117   4.714043
## 9991   3.510455   3.749444   3.634873   3.565544   3.502093   3.364446
## 9993   6.336191   6.338804   6.183747   6.486892   6.318802   6.415717
## 9994   4.215959   3.658543   4.321291   4.310333   4.197945   4.094475
##      GSM1647640 GSM1647641 GSM1647642 GSM1647643 GSM1647644 GSM1647645
## 9987   8.519494   9.642169   9.600863  10.070196   9.844676   9.721972
## 999    4.151984   4.924871   4.421486   4.537612   4.454581   4.993588
## 9990   4.749003   4.718259   4.679408   4.681002   4.715246   4.466099
## 9991   3.555019   3.411676   3.526558   3.501498   3.445067   3.739113
## 9993   6.471316   6.469016   6.318802   6.065829   6.316447   5.990558
## 9994   4.510645   4.172083   4.261934   4.665155   4.682810   4.801920
##      GSM1647646 GSM1647647 GSM1647648 GSM1647649 GSM1647650 GSM1647651
## 9987   8.900714   9.185092  10.038580   8.906396  10.208416   9.201599
## 999    4.550740   5.029535   4.749776   4.930668   4.887648   5.069506
## 9990   4.898078   4.781393   5.092634   4.689468   4.594047   4.770097
## 9991   3.566156   3.730246   3.797230   3.695209   3.519872   3.499086
## 9993   6.106470   6.263873   6.437010   6.651857   6.364111   6.644035
## 9994   4.236391   3.955137   4.111297   3.895333   3.983710   4.446326
##      GSM1647652 GSM1647653 GSM1647654 GSM1647655 GSM1647656 GSM1647657
## 9987   8.975572   8.908338   9.027172   8.706801   9.687480  10.177435
## 999    4.902590   5.110962   4.477884   4.400306   4.940268   4.406582
## 9990   5.093683   4.851102   4.655348   4.333250   4.922592   4.816731
## 9991   3.821349   3.636542   3.522028   3.811687   3.531415   3.742637
## 9993   6.376458   6.683031   6.287875   6.121229   6.297613   6.342503
## 9994   3.829943   4.000390   4.115873   5.085913   4.197945   3.504014
##      GSM1647658 GSM1647659 GSM1647660 GSM1647661 GSM1647662 GSM1647663
## 9987   9.596874  10.060879   9.823678   9.164008   9.350353   9.633823
## 999    4.750011   4.608157   4.705335   4.578296   4.537404   4.372189
## 9990   4.818815   4.874244   5.264185   4.745079   4.731024   4.606107
## 9991   3.547773   4.135978   3.586132   3.753409   3.456251   3.541659
## 9993   6.311116   6.302858   6.906638   6.337772   6.653700   6.251163
## 9994   4.278376   3.685006   3.960065   3.922632   4.009117   4.301791
##      GSM1647664 GSM1647665 GSM1647666 GSM1647667 GSM1647668 GSM1647669
## 9987   9.904486   9.570970  10.081523   9.765201   9.055918   9.257624
## 999    4.769400   4.484010   4.694268   5.342281   4.413512   4.498139
## 9990   4.324302   4.632187   4.878395   4.794198   4.683746   4.767858
## 9991   3.481019   3.514511   3.658261   3.847191   3.330420   3.516446
## 9993   6.416057   6.227119   6.169913   6.302684   6.490340   6.776257
## 9994   4.379528   4.035962   4.447716   3.802357   4.298185   4.476881
##      GSM1647670 GSM1647671 GSM1647672 GSM1647673 GSM1647674 GSM1647675
## 9987   9.073088   8.828343   8.894181   8.800846   8.700348  10.279342
## 999    4.903564   4.764284   4.457028   4.740606   4.512574   4.679920
## 9990   4.798875   4.851903   4.704414   4.619785   4.918170   4.957715
## 9991   3.762041   3.749075   3.676415   3.677416   3.870908   3.799202
## 9993   6.382595   6.165056   6.017437   6.632966   6.119678   6.725781
## 9994   4.050898   3.986944   4.617908   3.834929   4.568451   3.735107
##      GSM1647676 GSM1647677 GSM1647678 GSM1647679 GSM1647680 GSM1647681
## 9987  10.169723   9.347962   9.776060  10.145354  10.182200   9.900812
## 999    4.594516   4.214567   4.252756   4.626332   4.584692   4.589770
## 9990   4.710790   4.847243   4.502457   4.818572   4.563963   4.585053
## 9991   3.656685   3.600700   3.649638   3.642221   3.696986   3.499682
## 9993   6.417382   6.458502   6.294467   6.162869   6.158380   6.122428
## 9994   4.475781   4.703668   4.831550   4.493391   4.268756   4.181568
##      GSM1647682 GSM1647683 GSM1647684 GSM1647685 GSM1647686 GSM1647687
## 9987   9.040678   9.675271   9.663450   9.398040   9.701226   9.859115
## 999    4.301822   4.746444   4.524282   4.700323   4.739564   4.193090
## 9990   4.713128   4.631045   5.164010   4.702815   4.720601   4.762992
## 9991   3.457692   3.520906   3.548973   3.362627   3.615327   3.519613
## 9993   6.251369   6.409668   6.205957   6.413504   6.671587   6.174700
## 9994   3.977842   4.370924   3.631346   4.524520   3.794920   4.963665
##      GSM1647688 GSM1647689 GSM1647690 GSM1647691 GSM1647692 GSM1647693
## 9987  10.387270   8.982582  12.545066  12.299644  12.391114  12.468412
## 999    4.489886   4.550368   8.659643   9.035696   8.365179   8.695121
## 9990   4.629061   4.723761   9.370547   9.352484   9.205883   9.150237
## 9991   3.666695   3.460645  10.087780   9.397239   9.378504   9.884493
## 9993   6.382622   6.349075   6.257728   6.138301   6.615222   6.638248
## 9994   4.302590   4.863930   3.329453   3.117792   3.581132   3.556022
##      GSM1647694 GSM1647695 GSM1647696 GSM1647697 GSM1647698 GSM1647699
## 9987  12.321481  12.444026  12.469580  12.464626  12.446148  12.329810
## 999    8.610234   9.073496   8.585342   9.123837   8.817750   8.698458
## 9990   9.110475   8.943346   9.057649   9.203340   9.405636   9.266758
## 9991   9.489726  10.096217   9.927865  10.296162  10.117756  10.113108
## 9993   6.602443   6.510552   6.091101   6.385542   6.542367   6.405951
## 9994   3.603829   3.690442   3.853179   3.340139   3.657396   3.738596
##      GSM1647700 GSM1647701 GSM1647702 GSM1647703 GSM1647704 GSM1647705
## 9987  12.357389  12.389274  12.469010  12.456117  12.393919  12.199283
## 999    8.355022   8.669765   8.371930   8.155059   8.716816   8.213573
## 9990   9.228869   8.943264   8.706569   9.273165   9.141237   8.760095
## 9991   9.618103  10.067136   9.782571   9.916169   9.990994   9.949034
## 9993   5.970204   6.324345   6.291997   5.774860   6.200433   6.431080
## 9994   3.425047   3.619653   3.358793   3.265579   3.575744   3.411245
##      GSM1647706 GSM1647707 GSM1647708 GSM1647709 GSM1647710 GSM1647711
## 9987  12.468018  12.466848  12.332214  12.381347  12.308812  12.408019
## 999    8.476079   8.160227   8.020776   7.767112   8.177750   8.397695
## 9990   9.207019   9.216705   9.260052   9.503812   9.243718   9.487315
## 9991   9.893183  10.014857  10.019207   9.930516   9.874120   9.956008
## 9993   5.909816   6.450221   6.038726   5.886792   6.364923   6.088311
## 9994   3.401832   3.624383   3.200030   3.750955   3.427719   3.400561
##      GSM1647712 GSM1647713 GSM1647714 GSM1647715 GSM1647716 GSM1647717
## 9987  12.435812  12.332821  12.419882  12.349310  12.376765  12.217223
## 999    8.231136   8.529670   8.600379   8.082034   7.664927   8.609067
## 9990   9.445479   9.311809   9.286401   9.484943   9.218387   9.406477
## 9991  10.030680  10.183490  10.075673   9.988534  10.136182  10.209932
## 9993   5.599545   6.401653   6.256615   5.882119   6.619720   6.187790
## 9994   3.307463   3.653609   3.469419   3.611405   3.387477   3.352760
##      GSM1647718 GSM1647719 GSM1647720 GSM1647721 GSM1647722 GSM1647723
## 9987  12.350784  12.400903  12.411422  12.349919  12.343119  12.313529
## 999    7.799913   7.816289   7.900371   8.123323   7.627430   7.502731
## 9990   9.299853   9.476335   9.392429   9.321778   9.213561   9.137480
## 9991   9.943423   9.624970   9.734943  10.215482   9.833705   9.997739
## 9993   5.889117   6.024456   6.412208   5.929917   6.395975   6.085076
## 9994   3.568028   3.319400   3.592305   3.520880   3.648228   3.334641
##      GSM1647724 GSM1647725 GSM1647726 GSM1647727 GSM1647728 GSM1647729
## 9987  12.312746  12.315157  12.307455  12.356493  12.359289  12.326813
## 999    7.553576   7.783808   7.951572   8.845998   8.632151   8.730998
## 9990   9.053719   8.683569   9.328236   9.242972   9.204422   9.159040
## 9991   9.940263  10.174905  10.053122   9.861581   9.593428   9.574328
## 9993   6.496909   6.379911   5.899576   6.733012   6.503043   6.017495
## 9994   3.471100   3.715923   3.351870   3.605524   3.404443   3.415473
##      GSM1647730 GSM1647731 GSM1647732
## 9987  12.326307  12.374970  12.308382
## 999    8.392537   9.141722   8.752968
## 9990   9.485381   8.946582   9.219892
## 9991   9.564925   9.968836   9.582460
## 9993   6.094405   5.963519   6.651468
## 9994   3.222852   3.426816   3.556945
dim(mat.67472)
## [1] 20486   105
print(getwd())
## [1] "F:/winServer_G/ABI/ABI-Project-01"
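
Step 12 reduces the probe-level matrix to one row per gene by keeping, for each gene with several probes, the probe with the largest IQR. The following toy sketch shows the same rule without growing a matrix inside a loop; all object names (`toy.expr`, `probe2gene`, `best.probe`, `toy.gene`) and values are made up for illustration.

```r
# Toy expression matrix: 4 probes x 4 samples (made-up values).
toy.expr <- matrix(c(1, 2, 3, 4,
                     5, 5, 5, 5,
                     2, 8, 2, 8,
                     7, 7, 8, 8),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(c("p1", "p2", "p3", "p4"),
                                   paste0("s", 1:4)))
# Probe-to-gene map: gene "A" has one probe, gene "B" has three.
probe2gene <- c(p1 = "A", p2 = "B", p3 = "B", p4 = "B")

# Interquartile range of every probe across samples.
probe.iqr <- apply(toy.expr, 1, IQR)

# For each gene, the name of its probe with the largest IQR.
best.probe <- vapply(split(names(probe.iqr), probe2gene[names(probe.iqr)]),
                     function(p) p[which.max(probe.iqr[p])],
                     character(1))

# One row per gene, labelled with the gene identifier.
toy.gene <- toy.expr[best.probe, , drop = FALSE]
rownames(toy.gene) <- names(best.probe)
print(toy.gene)
```

For gene "B" the most variable probe (p3) is kept, while the constant probe p2 and the less variable probe p4 are discarded.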

C5. Differential gene expression analysis across different conditions

⏰ timing: 1 day

✅ 13. Before identifying the differentially expressed genes, you must provide the sample information (sample labels, or groups).

sample.url <- paste("https://www.ebi.ac.uk/biostudies/files", 
                    "E-GEOD-67472/E-GEOD-67472.sdrf.txt", 
                    sep = "/")
sample.info <- read.csv(sample.url, sep = "\t", header = TRUE)
sample.info <- sample.info[, c(33, 46, 47, 50, 53)]
names(sample.info) <- c("Assay", "Age", "State", "Sex", "Group")
DT::datatable(sample.info)
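
Once each sample has a group label, a gene-wise comparison between groups can be run. The protocol does not show which test it uses, so the following is only one possible sketch: a Welch t-test per gene with a log2 fold change and Benjamini-Hochberg adjusted p values, run on simulated data. The group factor, the matrix `expr`, and the spiked-in effect are all assumptions made for the example; in the protocol the labels would presumably come from the Group column of sample.info and the expression values from mat.67472.

```r
set.seed(1)
# Simulated log2 expression: 50 genes x 12 samples (not real data).
n.genes <- 50
group <- factor(rep(c("control", "asthma"), each = 6),
                levels = c("control", "asthma"))
expr <- matrix(rnorm(n.genes * length(group), mean = 8, sd = 0.5),
               nrow = n.genes,
               dimnames = list(paste0("gene", seq_len(n.genes)), NULL))
# Add a shift of 2 log2 units to the first five genes in the asthma group.
expr[1:5, group == "asthma"] <- expr[1:5, group == "asthma"] + 2

# Per gene: difference of group means (log2 FC), Welch t statistic, p value.
de <- t(apply(expr, 1, function(y) {
  tt <- t.test(y[group == "asthma"], y[group == "control"])
  c(log2FC = mean(y[group == "asthma"]) - mean(y[group == "control"]),
    t = unname(tt$statistic),
    p.value = tt$p.value)
}))
de <- as.data.frame(de)
# Benjamini-Hochberg correction for multiple testing.
de$adj.p <- p.adjust(de$p.value, method = "BH")
print(head(de[order(de$p.value), ]))
```

Because the data are already on the log2 scale, the fold change is computed as a difference of means rather than a ratio.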

C6. Correlation analysis

  • To be available …

⏰ timing: 1 day

C7. Patient segregation based on gene expression

⏰ timing: 1 day

D. About Statistical Analysis

The eligibility criteria, statistical tests, and software used in this protocol are described in the “Before you begin” and “Step-by-step Methods” sections.

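  • The statistics used in this project include the mean, standard deviation (sd), coefficient of variation (CV), t statistic, fold change (FC), p value, and so on. For example, the standard deviation of N values x_i with mean μ is: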

\[ \sigma = \sqrt{ \frac{1}{N} \sum_{i=1}^N (x_i -\mu)^2} \]
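
A short numerical check may help here, because R's built-in sd() divides by N - 1 (sample standard deviation), whereas the formula above divides by N. The sketch below uses made-up values and takes the CV as the standard deviation divided by the mean, which is the usual definition; the variable names are illustrative.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # toy values, not from the dataset

mu <- mean(x)
# Population standard deviation: divides by N, as in the formula above.
sigma.pop <- sqrt(mean((x - mu)^2))
# R's sd() divides by N - 1 (sample standard deviation).
sigma.sample <- sd(x)
# Coefficient of variation, here based on the population sd.
cv <- sigma.pop / mu

print(c(mean = mu, sd.pop = sigma.pop, sd.sample = sigma.sample, cv = cv))
```

For these values the mean is 5, the population sd is 2, and the CV is 0.4; sd(x) is slightly larger because of the N - 1 denominator.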

E. Reference

📰 Nie X, Wei J, Hao Y, et al. Consistent biomarkers and related pathogenesis underlying asthma revealed by systems biology approach. International Journal of Molecular Sciences, 2019, 20(16): 4037.