⏰ timing: 1 hour
✅ 1. R is a free software environment for statistical computing and graphics. It runs on UNIX, Windows and MacOS.
✅ 2. RStudio is an integrated development environment (IDE) for R. It makes it easy to run R code, plot graphics, and manage the workspace in a multipanel interface.
⏰ timing: 1 hour
✅ 3. Users must first install the required packages (listed in the key resources table). They can be installed through Bioconductor, which provides tools for the analysis and comprehension of high-throughput genomic data. BiocManager::install() is the recommended command to install packages (for details on why BiocManager::install() is preferred to the standard R package installation, see https://www.bioconductor.org/install/#whybiocmanagerinstall):
# Create a "data" directory under the current working directory.
print(getwd())
## [1] "F:/winServer_G/ABI/ABI-Project-01"
if (!dir.exists("data")) dir.create("data")
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
pkgs <- c("GEOquery", "affy", "simpleaffy", "arrayQualityMetrics")
for (p in pkgs) {
  # Install each package only if it is not already available.
  if (!requireNamespace(p, quietly = TRUE))
    BiocManager::install(p)
}
library(GEOquery)
library(affy)
library(simpleaffy)
library(arrayQualityMetrics)
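Before loading the packages, it can help to list any that are still missing. The following is a minimal sketch in base R; `missing_packages` is a hypothetical helper name introduced here for illustration and is not part of BiocManager.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

pkgs <- c("GEOquery", "affy", "simpleaffy", "arrayQualityMetrics")
still_missing <- missing_packages(pkgs)
if (length(still_missing) > 0) {
  message("Not installed: ", paste(still_missing, collapse = ", "))
}
```

An empty result means every package in `pkgs` can be loaded with library().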
⏰ timing: 2 days
✅ 4. When using datasets from public repositories, the key step is to identify a dataset (or datasets) that complies with the eligibility criteria and contains the sample information required for the analysis.
We suggest browsing the Gene Expression Omnibus (GEO: https://www.ncbi.nlm.nih.gov/gds, (Barrett et al., 2012)) and ArrayExpress (https://www.ebi.ac.uk/arrayexpress/, (Athar et al., 2019)) repositories because they gather multiple high-throughput genomics datasets.
In this project, publicly available microarray gene expression datasets for asthma were retrieved from the Gene Expression Omnibus Database (GEO) (http://www.ncbi.nlm.nih.gov/geo/) using the keyword “asthma”. The raw datasets were manually checked, and only those that met the following criteria were included for subsequent analysis: 1) gene expression profiling in asthmatics and controls, 2) cell type: airway epithelial cell, but not nasal epithelium, 3) gene expression data generated by a single-channel microarray platform (Affymetrix or Agilent chips), 4) availability of raw CEL or TXT files, 5) samples with detailed descriptions, and 6) sample size > 80.
According to the above criteria, two datasets were identified: GSE63142 and GSE67472. Next, we demonstrate how to analyze the Affymetrix DNA microarray dataset (i.e., GSE67472, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE67472).
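The machine-checkable part of the screening criteria can be expressed as a filter over a table of candidate series. The sketch below is illustrative only: the table, its column names, and the series labels (`SERIES_A` and so on) are placeholders invented for this example, not real GEO records. Criteria 1 and 5 still need manual review.

```r
# Toy candidate table; every row is a placeholder, not a real GEO series.
candidates <- data.frame(
  series    = c("SERIES_A", "SERIES_B", "SERIES_C", "SERIES_D"),
  tissue    = c("airway epithelium", "nasal epithelium",
                "airway epithelium", "airway epithelium"),
  channels  = c(1, 1, 2, 1),
  has_raw   = c(TRUE, TRUE, TRUE, FALSE),
  n_samples = c(105, 150, 90, 120),
  stringsAsFactors = FALSE
)

# Keep only the rows that satisfy criteria 2, 3, 4 and 6.
eligible <- subset(
  candidates,
  tissue == "airway epithelium" &  # criterion 2: airway, not nasal
    channels == 1 &                # criterion 3: single-channel platform
    has_raw &                      # criterion 4: raw files available
    n_samples > 80                 # criterion 6: sample size > 80
)
print(eligible$series)
```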
For this bioinformatics analysis we used a laptop with an 8th-generation Intel Core i5 processor, 32 GB of RAM and Windows 10 Pro. No high-performance computing cluster was needed for the analysis. An Internet connection is required for downloading R packages and data matrices.
The flow chart for data processing is included in Figure 1.
⏰ timing: 2 hours
You can download the experiment information and clinical data directly from GEO using the GEOquery package:
✅ 5. The series matrix file is a text file that includes a tab-delimited value-matrix for each sample containing the phenotypic/clinical and experimental data of a given dataset. In the GEO webtool, there is a hyperlink to the series matrix, called ‘‘Series Matrix File(s)’’. To download the series matrix file directly to the R environment use the getGEO command:
options(timeout=1000)
#library(GEOquery)
#gse <- getGEO("GSE67472")
#print(gse)
#gsm <- gse[[1]]$geo_accession
#print(gsm)
✅ 6. Alternatively, you can download the raw data from the GEO database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE67472), as shown in Figure 2.
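setwd("./data")
if (file.exists("GSE67472_RAW.tar")) {
  # The archive is already present: check that its size looks complete.
  s <- file.info("GSE67472_RAW.tar")
  if (s$size > 600000000) print("Dataset downloading successfully!")
} else {
  # getOption('timeout')
  options(timeout=1000)
  # makeDirectory = FALSE keeps the archive in ./data, where step 7 expects it.
  test <- try(getGEOSuppFiles("GSE67472",
                              makeDirectory = FALSE,
                              baseDir = getwd(),
                              fetch_files = TRUE,
                              filter_regex = NULL),
              silent=TRUE)
  # try() returns an object of class "try-error" when the download fails.
  if (inherits(test, "try-error")) print("Dataset downloading error!")
}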
## [1] "Dataset downloading successfully!"
setwd("..")
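The size check above can be wrapped in a small reusable function. This is a sketch under stated assumptions: `is_complete_download` is a hypothetical helper name, and the byte threshold is whatever lower bound you consider plausible for the archive (the protocol uses 600000000 for GSE67472_RAW.tar).

```r
# Hypothetical helper: TRUE if `path` exists and holds at least `min_bytes` bytes.
is_complete_download <- function(path, min_bytes) {
  file.exists(path) && file.info(path)$size >= min_bytes
}

# Demonstration on a temporary 1024-byte file.
tmp <- tempfile(fileext = ".tar")
writeBin(raw(1024), tmp)
print(is_complete_download(tmp, 1000))  # large enough
print(is_complete_download(tmp, 2000))  # too small
unlink(tmp)
```

A size threshold only catches truncated downloads; it does not verify the archive contents.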
⏰ timing: 10~30 mins
✅ 7. Import the downloaded data into R using the following code.
setwd("./data")
# Extract the archive (GSE67472_RAW.tar) into a directory named GSE67472.
untar("GSE67472_RAW.tar", exdir = "GSE67472")
print(getwd())
## [1] "F:/winServer_G/ABI/ABI-Project-01/data"
# Import all *.cel files into R environment.
setwd("GSE67472")
library(affy)
dat <- ReadAffy()
setwd("..")
unlink("GSE67472", recursive = TRUE)
print(dat)
## AffyBatch object
## size of arrays=1164x1164 features (66 kb)
## cdf=HG-U133_Plus_2 (54675 affyids)
## number of samples=105
## number of genes=54675
## annotation=hgu133plus2
## notes=
setwd("..")
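As an alternative to changing the working directory, the CEL files can be located by path first. The sketch below uses only base R; `find_cel_files` is a hypothetical helper name, and the commented ReadAffy() call at the end is shown as an untested suggestion that assumes the affy package is installed.

```r
# Hypothetical helper: list CEL files (optionally gzipped) inside `dir`.
find_cel_files <- function(dir) {
  list.files(dir, pattern = "\\.cel(\\.gz)?$",
             ignore.case = TRUE, full.names = TRUE)
}

# Demonstration on a temporary directory with dummy file names.
demo_dir <- file.path(tempdir(), "cel_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("a.CEL.gz", "b.cel", "notes.txt")))
cel_files <- find_cel_files(demo_dir)
print(basename(cel_files))
unlink(demo_dir, recursive = TRUE)

# With real data (not run here):
# dat <- affy::ReadAffy(filenames = find_cel_files("data/GSE67472"))
```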
✅ 8. Optionally, run a quality assessment (QA) of the raw arrays, which is an important part of preprocessing.
#. if (!dir.exists("QC")) dir.create("QC")
#. setwd("./QC")
#. library(arrayQualityMetrics)
#. err.pos <- arrayQualityMetrics(expressionset = dat,
#. outdir = "QA_before",
#. force = TRUE)
#. err.cel <- which(err.pos$arrayTable == "x", arr.ind = TRUE)[, 1]
#. print(err.cel)
#. setwd("..")
⏰ timing: 30 mins
✅ 9. Many algorithms are used for DNA microarray data normalization, such as RMA, MAS5.0, GCRMA, PLIER, and VSN. Here, we adopt RMA.
eset <- rma(dat)
## Background correcting
## Normalizing
## Calculating Expression
print(eset)
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 54675 features, 105 samples
## element names: exprs
## protocolData
## sampleNames: GSM1647628_08.Fahy_12GOBMUC2_A.CEL.gz
## GSM1647629_49.Fahy_13GOBMUC2_A.CEL.gz ... GSM1647732_239_new.CEL.gz
## (105 total)
## varLabels: ScanDate
## varMetadata: labelDescription
## phenoData
## sampleNames: GSM1647628_08.Fahy_12GOBMUC2_A.CEL.gz
## GSM1647629_49.Fahy_13GOBMUC2_A.CEL.gz ... GSM1647732_239_new.CEL.gz
## (105 total)
## varLabels: sample
## varMetadata: labelDescription
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation: hgu133plus2
# Extract the expression matrix and shorten the column names to GSM accessions.
eset <- exprs(eset)
cl.name <- sapply(colnames(eset),
                  function(x) strsplit(x, "_")[[1]][1])
colnames(eset) <- cl.name
dim(eset)
## [1] 54675 105
DT::datatable(eset[1:100, 1:4],
              extensions = c('Buttons','FixedColumns','RowGroup'),
              options = list(dom = 'Bfrtip',
                             buttons = c('copy',
                                         'csv',
                                         'excel',
                                         'pdf',
                                         'print')
              ))
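The "Normalizing" step reported by rma() is quantile normalization, which forces every array to share the same empirical distribution. The idea can be illustrated on a toy matrix in base R. This sketch is for illustration only and is not the code that affy runs; in particular, ties are broken here by order of appearance, and `quantile_normalize` is a name introduced for this example.

```r
# Illustrative quantile normalization: rows are probes, columns are arrays.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each array
  sorted <- apply(x, 2, sort)                          # sort each array
  ref    <- rowMeans(sorted)                           # mean of each quantile
  out    <- apply(ranks, 2, function(r) ref[r])        # map ranks to the means
  dimnames(out) <- dimnames(x)
  out
}

toy <- matrix(c(5, 2, 3, 4,
                4, 1, 4, 2,
                3, 4, 6, 8),
              nrow = 4,
              dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))
toy_qn <- quantile_normalize(toy)

# After normalization every array contains the same set of values.
print(apply(toy_qn, 2, sort))
```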
✅ 10. Download the annotation file for the hgu133plus2 platform (GPL570) and save it as “GPL570-hgu133plus2.txt”.
hgu133plus2 <- read.csv("GPL570-hgu133plus2.txt",
                        sep = "\t",
                        skip = 16,
                        header = TRUE)
dim(hgu133plus2)
## [1] 54675 16
✅ 11. Process the annotation file to extract the information needed in this study.
### For hgu133plus2, annotation information, GSE67472.
### 1) Find out the probes which match multiple genes, or are unknown.
o2m.p.133p2 <- NULL
len.stat <- NULL
for (i in 1:nrow(hgu133plus2)) {
  a <- as.character(hgu133plus2$ENTREZ_GENE_ID[i])
  # Multiple Entrez IDs for one probe are separated by " /// ".
  tmp <- strsplit(a, " /// ")[[1]]
  len.stat <- c(len.stat, length(tmp))
  if (length(tmp) > 1 | length(tmp) == 0) {
    o2m.p.133p2 <- c(o2m.p.133p2, i)
  }
}
o2m.p <- length(o2m.p.133p2)
word1 <- paste("There are",
               o2m.p, "probes which matched more than one genes!")
print(word1)
## [1] "There are 12841 probes which matched more than one genes!"
# sum(table(len.stat)) - 41834
# table(is.na(hgu133plus2$ENTREZ_GENE_ID))
# which(o2m.p.95av2 == "4721")
### If you want, you can remove the o2m probes from the annotation file.
anno.133p2 <- hgu133plus2[-o2m.p.133p2, c(1, 11, 12)]
# table(unique(anno.133p2$ID) == anno.133p2$ID)
# 2) Find the genes that are matched by exactly one probe.
# i.e., one probe, one gene.
o2o <- table(anno.133p2$ENTREZ_GENE_ID) == 1
o2o.133p2 <- names(table(anno.133p2$ENTREZ_GENE_ID))[o2o]
length(o2o.133p2)
## [1] 9792
# 3) Find the genes that are matched by more than one probe.
# i.e., many probes, one gene.
m2o <- table(anno.133p2$ENTREZ_GENE_ID) > 1
m2o.133p2 <- names(table(anno.133p2$ENTREZ_GENE_ID))[m2o]
length(m2o.133p2)
## [1] 10694
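The same probe classification can be written without an explicit loop. The following is a minimal sketch on a toy annotation table; the probe IDs and Entrez IDs are made up for illustration, and only the two column names (`ID`, `ENTREZ_GENE_ID`) are taken from the protocol.

```r
# Toy annotation; all values are invented for illustration.
anno <- data.frame(
  ID = c("p1", "p2", "p3", "p4", "p5", "p6"),
  ENTREZ_GENE_ID = c("101", "101", "202", "303 /// 404", "", "505"),
  stringsAsFactors = FALSE
)

# Number of genes per probe: 0 = unannotated, 1 = unique, >1 = ambiguous.
n_genes <- lengths(strsplit(anno$ENTREZ_GENE_ID, " /// ", fixed = TRUE))
ambiguous_or_unknown <- which(n_genes != 1)

# Keep the unambiguous probes, then split genes by how many probes hit them.
anno_clean <- anno[n_genes == 1, ]
probes_per_gene <- table(anno_clean$ENTREZ_GENE_ID)
one_probe_genes  <- names(probes_per_gene)[probes_per_gene == 1]
multi_probe_genes <- names(probes_per_gene)[probes_per_gene > 1]

print(ambiguous_or_unknown)
print(one_probe_genes)
print(multi_probe_genes)
```

Because lengths() and table() operate on the whole column at once, this form avoids growing vectors inside a loop over all 54675 probes.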
✅ 12. Extract (or prepare) the gene expression matrix from GSE67472.
### 1) Extracting one2one gene expression levels.
mat.67472 <- NULL
probe.67472 <- rownames(eset)
for (s in o2o.133p2) {
  s.pos <- which(anno.133p2$ENTREZ_GENE_ID == s)
  if (length(s.pos) > 1) print(s.pos)
  s.tmp <- eset[which(probe.67472 == anno.133p2$ID[s.pos]), ]
  # rownames(mat.tmp) <- s
  mat.67472 <- rbind(mat.67472, s.tmp)
}
### 2) Extracting more2one gene expression levels.
for (m in m2o.133p2) {
  m.pos <- which(anno.133p2$ENTREZ_GENE_ID == m)
  m.tmp <- eset[match(anno.133p2$ID[m.pos], probe.67472), ]
  # if (nrow(m.tmp) > 1) print(m.tmp)
  # Among the probes of one gene, keep the probe with the largest IQR.
  tmp <- m.tmp[which.max(apply(m.tmp, 1, IQR)), ]
  mat.67472 <- rbind(mat.67472, tmp)
}
rownames(mat.67472) <- c(o2o.133p2, m2o.133p2)
### 3) Previewing the gene expression matrix.
head(mat.67472)
## GSM1647628 GSM1647629 GSM1647630 GSM1647631 GSM1647632 GSM1647633
## 1 3.493539 3.815563 3.610844 3.594420 3.515792 3.598956
## 10 4.221160 3.994366 4.081684 4.095276 3.995225 4.061557
## 100048912 4.411851 4.262043 4.288609 4.715922 4.240065 4.113609
## 10007 7.675279 7.129691 7.091532 7.330977 7.301560 8.066077
## 100093698 4.756881 4.745557 5.290200 4.797684 4.655890 4.643768
## 1001 8.085951 7.931672 7.757693 7.567498 7.942103 8.091432
## GSM1647634 GSM1647635 GSM1647636 GSM1647637 GSM1647638 GSM1647639
## 1 3.524817 3.786830 3.607734 3.624177 3.631540 3.980450
## 10 4.045187 3.731648 4.031600 3.922030 3.959078 3.965483
## 100048912 4.634525 4.083764 4.424771 4.278938 4.286333 4.130990
## 10007 7.237745 7.778026 7.022828 7.480860 8.087107 7.756052
## 100093698 4.625860 4.654817 4.520235 4.581960 4.500682 4.694741
## 1001 8.008257 7.731630 7.966563 8.083884 7.962252 8.129002
## GSM1647640 GSM1647641 GSM1647642 GSM1647643 GSM1647644 GSM1647645
## 1 3.614938 3.889815 3.471951 3.426553 3.458802 3.406155
## 10 4.130205 3.834808 4.040178 4.307509 4.044788 4.274490
## 100048912 4.326898 4.374988 4.255556 4.772894 4.296362 4.833295
## 10007 7.434535 7.238472 6.926840 7.466005 7.452425 7.042389
## 100093698 5.005437 4.833614 5.016033 4.813633 4.671746 4.730833
## 1001 8.083623 8.091432 8.386591 8.154212 8.047155 8.094366
## GSM1647646 GSM1647647 GSM1647648 GSM1647649 GSM1647650 GSM1647651
## 1 3.731419 3.823126 3.847240 3.805505 3.767520 3.427415
## 10 4.544955 3.853246 3.558827 3.650856 3.690891 4.033876
## 100048912 4.328221 4.073886 4.581152 4.275477 4.367421 4.198209
## 10007 8.113792 8.240947 7.480237 7.654124 7.626536 7.306358
## 100093698 4.858869 4.530557 4.640295 4.639624 4.574224 4.850072
## 1001 8.071839 8.524349 8.211024 8.465392 8.263971 7.901734
## GSM1647652 GSM1647653 GSM1647654 GSM1647655 GSM1647656 GSM1647657
## 1 3.538655 3.792681 3.854631 3.592468 3.784979 3.447886
## 10 3.777804 3.719733 3.788754 4.022531 3.925838 3.704985
## 100048912 4.101644 4.341932 4.172638 4.393942 4.232531 4.320004
## 10007 7.766085 7.775982 7.792051 7.784038 7.324192 7.465378
## 100093698 4.822784 4.322000 4.780470 4.712794 4.808868 4.678359
## 1001 8.530143 8.403850 8.613862 8.763135 8.072464 8.464986
## GSM1647658 GSM1647659 GSM1647660 GSM1647661 GSM1647662 GSM1647663
## 1 3.548694 3.624177 3.778054 3.749435 4.092182 3.493683
## 10 3.914622 3.751118 3.658723 3.715779 3.767556 4.172536
## 100048912 4.248941 4.086963 4.118574 4.296706 3.988716 4.167928
## 10007 7.777732 7.470334 7.577234 8.249810 7.762185 7.582286
## 100093698 4.599880 4.501640 4.746848 4.494668 4.667586 5.054394
## 1001 8.208811 8.258622 8.275777 8.781770 8.421109 8.423450
## GSM1647664 GSM1647665 GSM1647666 GSM1647667 GSM1647668 GSM1647669
## 1 3.728554 3.691676 3.672567 3.518454 3.728377 3.553892
## 10 3.681612 3.870919 3.822458 3.785636 3.979369 4.038572
## 100048912 4.303938 4.470891 4.265020 4.229603 4.474944 4.739768
## 10007 7.356018 7.832507 7.374706 7.541572 7.298703 7.512672
## 100093698 4.674376 4.582640 5.014345 4.748786 4.796670 4.971267
## 1001 8.254545 8.140062 7.960394 8.500167 8.557788 8.588560
## GSM1647670 GSM1647671 GSM1647672 GSM1647673 GSM1647674 GSM1647675
## 1 3.604163 3.377187 3.541051 3.701559 3.687964 3.462925
## 10 4.052307 4.031013 4.166329 3.949714 3.847567 3.884303
## 100048912 4.391817 4.309714 4.324954 4.601820 4.449960 4.404119
## 10007 7.368099 7.943735 7.567371 7.592621 8.079317 6.977886
## 100093698 4.646662 4.734808 4.626728 4.515904 4.673637 5.007907
## 1001 7.769916 8.775533 8.323765 8.075811 7.963822 8.021878
## GSM1647676 GSM1647677 GSM1647678 GSM1647679 GSM1647680 GSM1647681
## 1 3.487839 3.730179 3.627973 3.495416 3.586580 3.648914
## 10 3.654517 4.056240 4.254719 4.304213 3.929474 4.140867
## 100048912 4.475781 4.201370 4.317208 4.457723 4.412154 4.322228
## 10007 7.117548 7.591061 7.033449 7.653792 7.278171 7.193045
## 100093698 4.898230 5.057406 4.781666 5.120903 4.471379 4.719116
## 1001 8.025430 7.773447 8.505299 7.982408 8.222151 8.184908
## GSM1647682 GSM1647683 GSM1647684 GSM1647685 GSM1647686 GSM1647687
## 1 3.491328 3.840231 3.624177 3.447938 3.651987 3.548095
## 10 3.823714 3.933194 3.779992 4.176949 3.931765 3.846544
## 100048912 4.484474 4.354049 4.437525 4.536114 4.047968 4.734792
## 10007 7.782200 7.692376 7.471291 7.238200 7.713349 7.271928
## 100093698 4.673863 4.728424 4.717438 4.662934 4.773808 4.716867
## 1001 8.229269 8.504173 8.249935 8.269294 8.021421 8.557898
## GSM1647688 GSM1647689 GSM1647690 GSM1647691 GSM1647692 GSM1647693
## 1 3.559386 3.589201 3.503411 3.756853 3.782884 3.544267
## 10 4.271602 3.859686 3.582452 5.548169 3.618996 3.612497
## 100048912 4.243510 4.455239 4.177011 4.443936 3.994784 4.121800
## 10007 7.208901 7.493661 6.318035 6.887358 7.234137 6.609267
## 100093698 4.681159 4.839381 4.094268 4.583141 4.482587 4.559766
## 1001 8.488396 7.813515 7.454245 7.070134 7.772381 7.119365
## GSM1647694 GSM1647695 GSM1647696 GSM1647697 GSM1647698 GSM1647699
## 1 3.854844 3.391394 3.938420 3.616746 3.543546 3.357637
## 10 4.041660 3.524700 3.584794 3.842314 3.618981 3.703531
## 100048912 4.277661 4.331131 3.998356 3.990968 3.971352 4.001425
## 10007 7.212859 6.849776 6.513698 6.602003 7.467341 6.435926
## 100093698 4.404763 4.674050 4.639142 4.036403 4.134684 4.300547
## 1001 7.908496 8.184520 7.717381 8.011796 8.536115 7.995198
## GSM1647700 GSM1647701 GSM1647702 GSM1647703 GSM1647704 GSM1647705
## 1 3.404336 3.619980 3.764452 3.575531 3.583176 3.628200
## 10 3.627778 3.557135 3.623480 3.631437 3.485017 3.652029
## 100048912 4.039981 4.044260 4.848737 4.376736 4.546645 4.439595
## 10007 6.977143 6.606321 6.746771 6.919508 6.553946 6.950588
## 100093698 3.906094 4.417674 4.488039 4.183569 4.447541 4.403258
## 1001 7.725240 7.856125 8.083761 7.589999 7.812311 7.395116
## GSM1647706 GSM1647707 GSM1647708 GSM1647709 GSM1647710 GSM1647711
## 1 3.582888 3.582700 3.647314 3.397082 3.554474 3.351084
## 10 3.397724 3.524158 3.519721 3.668817 3.343952 3.391633
## 100048912 5.023137 4.414605 4.954801 4.596287 4.463078 3.820420
## 10007 7.080405 6.588667 6.753694 7.286492 6.965241 6.674601
## 100093698 4.337309 4.356547 4.574294 4.369669 4.144505 4.375722
## 1001 8.122574 7.367379 7.518406 7.207111 8.496733 7.318378
## GSM1647712 GSM1647713 GSM1647714 GSM1647715 GSM1647716 GSM1647717
## 1 3.719210 3.526593 3.372690 3.501530 3.727421 3.639791
## 10 3.456361 3.428795 3.487687 3.549424 3.534387 3.830634
## 100048912 4.046192 4.066871 4.271311 4.172623 3.882176 4.053331
## 10007 6.658031 6.857437 6.572512 6.828444 6.993781 7.075091
## 100093698 4.443090 4.336848 4.058928 4.307881 4.431165 4.249286
## 1001 8.383271 8.752761 8.033930 8.108315 9.022741 8.387307
## GSM1647718 GSM1647719 GSM1647720 GSM1647721 GSM1647722 GSM1647723
## 1 3.399587 3.624177 3.752995 3.322396 3.590592 3.512920
## 10 3.605670 3.552359 3.396808 3.946787 3.681840 3.732090
## 100048912 4.586746 4.243077 4.157660 3.816541 4.717015 4.258321
## 10007 6.894933 6.995748 7.141518 7.025541 6.860344 6.814057
## 100093698 4.238214 4.440895 4.292646 4.275602 4.220167 4.341084
## 1001 7.976506 8.426613 8.142116 8.305765 8.098517 7.805309
## GSM1647724 GSM1647725 GSM1647726 GSM1647727 GSM1647728 GSM1647729
## 1 3.716713 4.027583 3.505757 3.991416 3.825900 3.905370
## 10 3.679192 3.758264 3.618333 3.468145 3.883723 3.369822
## 100048912 3.973574 3.918572 3.920508 4.345485 4.236662 3.986152
## 10007 7.171236 6.884750 7.150438 6.706999 6.180664 6.015538
## 100093698 4.134684 4.085823 4.370084 4.077464 4.532269 4.376569
## 1001 8.196194 7.437380 7.640051 7.268189 7.603088 7.232151
## GSM1647730 GSM1647731 GSM1647732
## 1 3.661197 3.478313 3.976525
## 10 3.487539 3.717022 3.721855
## 100048912 4.259137 3.727160 3.918848
## 10007 7.060676 6.260667 6.922737
## 100093698 4.287670 4.342175 4.548165
## 1001 7.801146 8.303764 7.379277
tail(mat.67472)
## GSM1647628 GSM1647629 GSM1647630 GSM1647631 GSM1647632 GSM1647633
## 9987 8.797270 9.543499 10.196649 9.676556 9.926735 8.229028
## 999 5.087675 4.777227 4.619528 4.563001 4.851726 4.342094
## 9990 4.749497 4.817312 4.763305 4.536702 4.934358 4.470848
## 9991 3.489513 3.847950 3.519479 3.465999 3.720478 3.571195
## 9993 5.993702 6.638650 6.401174 6.405897 6.580947 6.278414
## 9994 4.715041 3.873419 4.268575 4.505541 3.731421 4.320570
## GSM1647634 GSM1647635 GSM1647636 GSM1647637 GSM1647638 GSM1647639
## 9987 9.628917 9.833227 9.693346 8.992126 8.708899 8.838605
## 999 4.814504 4.686905 4.635453 4.611680 4.501259 4.539497
## 9990 4.797551 5.157497 4.736082 4.530546 4.791117 4.714043
## 9991 3.510455 3.749444 3.634873 3.565544 3.502093 3.364446
## 9993 6.336191 6.338804 6.183747 6.486892 6.318802 6.415717
## 9994 4.215959 3.658543 4.321291 4.310333 4.197945 4.094475
## GSM1647640 GSM1647641 GSM1647642 GSM1647643 GSM1647644 GSM1647645
## 9987 8.519494 9.642169 9.600863 10.070196 9.844676 9.721972
## 999 4.151984 4.924871 4.421486 4.537612 4.454581 4.993588
## 9990 4.749003 4.718259 4.679408 4.681002 4.715246 4.466099
## 9991 3.555019 3.411676 3.526558 3.501498 3.445067 3.739113
## 9993 6.471316 6.469016 6.318802 6.065829 6.316447 5.990558
## 9994 4.510645 4.172083 4.261934 4.665155 4.682810 4.801920
## GSM1647646 GSM1647647 GSM1647648 GSM1647649 GSM1647650 GSM1647651
## 9987 8.900714 9.185092 10.038580 8.906396 10.208416 9.201599
## 999 4.550740 5.029535 4.749776 4.930668 4.887648 5.069506
## 9990 4.898078 4.781393 5.092634 4.689468 4.594047 4.770097
## 9991 3.566156 3.730246 3.797230 3.695209 3.519872 3.499086
## 9993 6.106470 6.263873 6.437010 6.651857 6.364111 6.644035
## 9994 4.236391 3.955137 4.111297 3.895333 3.983710 4.446326
## GSM1647652 GSM1647653 GSM1647654 GSM1647655 GSM1647656 GSM1647657
## 9987 8.975572 8.908338 9.027172 8.706801 9.687480 10.177435
## 999 4.902590 5.110962 4.477884 4.400306 4.940268 4.406582
## 9990 5.093683 4.851102 4.655348 4.333250 4.922592 4.816731
## 9991 3.821349 3.636542 3.522028 3.811687 3.531415 3.742637
## 9993 6.376458 6.683031 6.287875 6.121229 6.297613 6.342503
## 9994 3.829943 4.000390 4.115873 5.085913 4.197945 3.504014
## GSM1647658 GSM1647659 GSM1647660 GSM1647661 GSM1647662 GSM1647663
## 9987 9.596874 10.060879 9.823678 9.164008 9.350353 9.633823
## 999 4.750011 4.608157 4.705335 4.578296 4.537404 4.372189
## 9990 4.818815 4.874244 5.264185 4.745079 4.731024 4.606107
## 9991 3.547773 4.135978 3.586132 3.753409 3.456251 3.541659
## 9993 6.311116 6.302858 6.906638 6.337772 6.653700 6.251163
## 9994 4.278376 3.685006 3.960065 3.922632 4.009117 4.301791
## GSM1647664 GSM1647665 GSM1647666 GSM1647667 GSM1647668 GSM1647669
## 9987 9.904486 9.570970 10.081523 9.765201 9.055918 9.257624
## 999 4.769400 4.484010 4.694268 5.342281 4.413512 4.498139
## 9990 4.324302 4.632187 4.878395 4.794198 4.683746 4.767858
## 9991 3.481019 3.514511 3.658261 3.847191 3.330420 3.516446
## 9993 6.416057 6.227119 6.169913 6.302684 6.490340 6.776257
## 9994 4.379528 4.035962 4.447716 3.802357 4.298185 4.476881
## GSM1647670 GSM1647671 GSM1647672 GSM1647673 GSM1647674 GSM1647675
## 9987 9.073088 8.828343 8.894181 8.800846 8.700348 10.279342
## 999 4.903564 4.764284 4.457028 4.740606 4.512574 4.679920
## 9990 4.798875 4.851903 4.704414 4.619785 4.918170 4.957715
## 9991 3.762041 3.749075 3.676415 3.677416 3.870908 3.799202
## 9993 6.382595 6.165056 6.017437 6.632966 6.119678 6.725781
## 9994 4.050898 3.986944 4.617908 3.834929 4.568451 3.735107
## GSM1647676 GSM1647677 GSM1647678 GSM1647679 GSM1647680 GSM1647681
## 9987 10.169723 9.347962 9.776060 10.145354 10.182200 9.900812
## 999 4.594516 4.214567 4.252756 4.626332 4.584692 4.589770
## 9990 4.710790 4.847243 4.502457 4.818572 4.563963 4.585053
## 9991 3.656685 3.600700 3.649638 3.642221 3.696986 3.499682
## 9993 6.417382 6.458502 6.294467 6.162869 6.158380 6.122428
## 9994 4.475781 4.703668 4.831550 4.493391 4.268756 4.181568
## GSM1647682 GSM1647683 GSM1647684 GSM1647685 GSM1647686 GSM1647687
## 9987 9.040678 9.675271 9.663450 9.398040 9.701226 9.859115
## 999 4.301822 4.746444 4.524282 4.700323 4.739564 4.193090
## 9990 4.713128 4.631045 5.164010 4.702815 4.720601 4.762992
## 9991 3.457692 3.520906 3.548973 3.362627 3.615327 3.519613
## 9993 6.251369 6.409668 6.205957 6.413504 6.671587 6.174700
## 9994 3.977842 4.370924 3.631346 4.524520 3.794920 4.963665
## GSM1647688 GSM1647689 GSM1647690 GSM1647691 GSM1647692 GSM1647693
## 9987 10.387270 8.982582 12.545066 12.299644 12.391114 12.468412
## 999 4.489886 4.550368 8.659643 9.035696 8.365179 8.695121
## 9990 4.629061 4.723761 9.370547 9.352484 9.205883 9.150237
## 9991 3.666695 3.460645 10.087780 9.397239 9.378504 9.884493
## 9993 6.382622 6.349075 6.257728 6.138301 6.615222 6.638248
## 9994 4.302590 4.863930 3.329453 3.117792 3.581132 3.556022
## GSM1647694 GSM1647695 GSM1647696 GSM1647697 GSM1647698 GSM1647699
## 9987 12.321481 12.444026 12.469580 12.464626 12.446148 12.329810
## 999 8.610234 9.073496 8.585342 9.123837 8.817750 8.698458
## 9990 9.110475 8.943346 9.057649 9.203340 9.405636 9.266758
## 9991 9.489726 10.096217 9.927865 10.296162 10.117756 10.113108
## 9993 6.602443 6.510552 6.091101 6.385542 6.542367 6.405951
## 9994 3.603829 3.690442 3.853179 3.340139 3.657396 3.738596
## GSM1647700 GSM1647701 GSM1647702 GSM1647703 GSM1647704 GSM1647705
## 9987 12.357389 12.389274 12.469010 12.456117 12.393919 12.199283
## 999 8.355022 8.669765 8.371930 8.155059 8.716816 8.213573
## 9990 9.228869 8.943264 8.706569 9.273165 9.141237 8.760095
## 9991 9.618103 10.067136 9.782571 9.916169 9.990994 9.949034
## 9993 5.970204 6.324345 6.291997 5.774860 6.200433 6.431080
## 9994 3.425047 3.619653 3.358793 3.265579 3.575744 3.411245
## GSM1647706 GSM1647707 GSM1647708 GSM1647709 GSM1647710 GSM1647711
## 9987 12.468018 12.466848 12.332214 12.381347 12.308812 12.408019
## 999 8.476079 8.160227 8.020776 7.767112 8.177750 8.397695
## 9990 9.207019 9.216705 9.260052 9.503812 9.243718 9.487315
## 9991 9.893183 10.014857 10.019207 9.930516 9.874120 9.956008
## 9993 5.909816 6.450221 6.038726 5.886792 6.364923 6.088311
## 9994 3.401832 3.624383 3.200030 3.750955 3.427719 3.400561
## GSM1647712 GSM1647713 GSM1647714 GSM1647715 GSM1647716 GSM1647717
## 9987 12.435812 12.332821 12.419882 12.349310 12.376765 12.217223
## 999 8.231136 8.529670 8.600379 8.082034 7.664927 8.609067
## 9990 9.445479 9.311809 9.286401 9.484943 9.218387 9.406477
## 9991 10.030680 10.183490 10.075673 9.988534 10.136182 10.209932
## 9993 5.599545 6.401653 6.256615 5.882119 6.619720 6.187790
## 9994 3.307463 3.653609 3.469419 3.611405 3.387477 3.352760
## GSM1647718 GSM1647719 GSM1647720 GSM1647721 GSM1647722 GSM1647723
## 9987 12.350784 12.400903 12.411422 12.349919 12.343119 12.313529
## 999 7.799913 7.816289 7.900371 8.123323 7.627430 7.502731
## 9990 9.299853 9.476335 9.392429 9.321778 9.213561 9.137480
## 9991 9.943423 9.624970 9.734943 10.215482 9.833705 9.997739
## 9993 5.889117 6.024456 6.412208 5.929917 6.395975 6.085076
## 9994 3.568028 3.319400 3.592305 3.520880 3.648228 3.334641
## GSM1647724 GSM1647725 GSM1647726 GSM1647727 GSM1647728 GSM1647729
## 9987 12.312746 12.315157 12.307455 12.356493 12.359289 12.326813
## 999 7.553576 7.783808 7.951572 8.845998 8.632151 8.730998
## 9990 9.053719 8.683569 9.328236 9.242972 9.204422 9.159040
## 9991 9.940263 10.174905 10.053122 9.861581 9.593428 9.574328
## 9993 6.496909 6.379911 5.899576 6.733012 6.503043 6.017495
## 9994 3.471100 3.715923 3.351870 3.605524 3.404443 3.415473
## GSM1647730 GSM1647731 GSM1647732
## 9987 12.326307 12.374970 12.308382
## 999 8.392537 9.141722 8.752968
## 9990 9.485381 8.946582 9.219892
## 9991 9.564925 9.968836 9.582460
## 9993 6.094405 5.963519 6.651468
## 9994 3.222852 3.426816 3.556945
dim(mat.67472)
## [1] 20486 105
print(getwd())
## [1] "F:/winServer_G/ABI/ABI-Project-01"
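The probe-to-gene collapsing in step 12 (one row per gene, keeping the probe with the largest interquartile range when several probes map to the same gene) can be sketched as a single function. This is an illustrative rewrite on toy data, not the code used to produce the matrix above; `collapse_by_max_iqr` and its arguments are hypothetical names.

```r
# expr:       numeric matrix, rows = probes, columns = samples
# probe_gene: named character vector mapping probe ID -> gene ID
collapse_by_max_iqr <- function(expr, probe_gene) {
  genes <- probe_gene[rownames(expr)]
  iqr   <- apply(expr, 1, IQR)
  # For each gene, pick the row index of its most variable probe.
  best <- tapply(seq_len(nrow(expr)), genes,
                 function(idx) idx[which.max(iqr[idx])])
  out <- expr[as.integer(best), , drop = FALSE]
  rownames(out) <- names(best)
  out
}

# Toy data: p1 and p2 map to gene g1, p3 maps to gene g2.
expr <- matrix(c(1, 2, 3, 4,
                 1, 5, 9, 13,
                 2, 2, 2, 2),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("p1", "p2", "p3"), paste0("s", 1:4)))
probe_gene <- c(p1 = "g1", p2 = "g1", p3 = "g2")

gene_expr <- collapse_by_max_iqr(expr, probe_gene)
print(gene_expr)
```

Using `drop = FALSE` keeps the result a matrix even when only one gene remains, and selecting rows by index avoids repeated rbind() calls.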
⏰ timing: 1 day
✅ 13. Before identifying the differentially expressed genes, you must provide the sample information (sample labels, or groups).
sample.url <- paste("https://www.ebi.ac.uk/biostudies/files",
                    "E-GEOD-67472/E-GEOD-67472.sdrf.txt",
                    sep = "/")
sample.info <- read.csv(sample.url, sep = "\t", header = TRUE)
# Keep only the columns needed for the analysis and give them short names.
sample.info <- sample.info[, c(33, 46, 47, 50, 53)]
names(sample.info) <- c("Assay", "Age", "State", "Sex", "Group")
DT::datatable(sample.info)
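Before any group comparison, the rows of the sample table should be in the same order as the columns of the expression matrix. The sketch below shows one way to enforce this with match(); the toy matrix, the sample IDs and the helper name `align_samples` are assumptions made for illustration, and only the column names `Assay` and `Group` come from the step above.

```r
# Reorder `info` so that its rows follow the column order of `expr`.
align_samples <- function(expr, info, id_col = "Assay") {
  idx <- match(colnames(expr), info[[id_col]])
  if (anyNA(idx)) {
    stop("Samples without annotation: ",
         paste(colnames(expr)[is.na(idx)], collapse = ", "))
  }
  info[idx, , drop = FALSE]
}

# Toy example with invented sample IDs.
toy_expr <- matrix(1:6, nrow = 2,
                   dimnames = list(c("g1", "g2"), c("S1", "S2", "S3")))
toy_info <- data.frame(Assay = c("S3", "S1", "S2"),
                       Group = c("asthma", "control", "asthma"),
                       stringsAsFactors = FALSE)

aligned <- align_samples(toy_expr, toy_info)
group <- factor(aligned$Group)
print(data.frame(sample = colnames(toy_expr), group = group))
```

Stopping on unmatched samples makes a labeling mismatch visible immediately instead of silently shifting group labels.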
⏰ timing: 1 day
Eligibility criteria, statistical tests and software used for this protocol are properly described in the ‘‘before you begin’’ and ‘‘step-by-step methods details’’ sections.
\[ \sigma = \sqrt{ \frac{1}{N} \sum_{i=1}^N (x_i -\mu)^2} \]
📰 Nie X, Wei J, Hao Y, et al. Consistent biomarkers and related pathogenesis underlying asthma revealed by systems biology approach[J]. International journal of molecular sciences, 2019, 20(16): 4037.