How to Get (or Load) NCBI GEO Microarray Data into R using GEOquery Package from Bioconductor
By Güngör Budak
R, especially with its many Bioconductor packages, provides good tools to load, manage, and analyze microarray data. If you are trying to load NCBI GEO data into R, use the GEOquery package. Here I’ll describe how to get started with it, and I may cover more in future posts.
Installation
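# Install BiocManager if needed, then use it to install GEOquery
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("GEOquery")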
Usage
library(GEOquery)
gds <- getGEO("GDS5072")
or, to load a previously downloaded SOFT file:
library(GEOquery)
gds <- getGEO(filename="path/to/GDS5072.soft.gz")
The getGEO function returns an object of the S4 class GDS, which contains the complete dataset. For example, to obtain the sample organism:
organism <- gds@header$sample_organism
print(organism)
[1] "Homo sapiens"
Obtain sample IDs:
sample <- gds@dataTable@columns$sample
print(sample)
[1] GSM1095883 GSM1095886 GSM1095877 GSM1095878 GSM1095879 GSM1095880
[7] GSM1095881 GSM1095882 GSM1095884 GSM1095885 GSM1095876
11 Levels: GSM1095876 GSM1095877 GSM1095878 GSM1095879 ... GSM1095886
Obtain gene IDs:
genes <- gds@dataTable@table$IDENTIFIER
head(genes)
[1] DDR1 RFC2 HSPA6 PAX8 GUCA1A UBA7
31596 Levels: ADAM32 AFG3L1P AK9 ALG10 ARMCX4 ATP6V1E2 ...
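The slots used so far can also be reached through GEOquery's accessor functions `Meta()`, `Columns()`, and `Table()`, which avoids depending on the internal slot layout. The following is a minimal sketch, assuming the same GDS5072 dataset and a working network connection. The variable names are illustrative only.

```r
library(GEOquery)

# Same dataset as above; this call downloads the file from NCBI GEO
gds <- getGEO("GDS5072")

# Meta() returns the header list (the content of gds@header)
organism <- Meta(gds)$sample_organism

# Columns() returns the per-sample description table (gds@dataTable@columns)
samples <- Columns(gds)$sample

# Table() returns the full data table (gds@dataTable@table)
genes <- Table(gds)$IDENTIFIER

print(organism)
print(length(samples))
```

Both styles should give the same values. The accessors are simply the more conventional way to work with S4 objects.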
Obtain expression levels for the first sample:
sample_1 <- gds@dataTable@table$GSM1095883
head(sample_1)
[1] "3362.6" "400" "70.9" "362.8" "5" "849"
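Note that the values printed above are character strings, so they need to be converted before any arithmetic. Here is a minimal base R sketch that uses only the six values shown in the output above. The variable names are illustrative and not part of GEOquery.

```r
# The six values copied from the head(sample_1) output above
sample_1_head <- c("3362.6", "400", "70.9", "362.8", "5", "849")

# as.numeric() parses the strings into doubles; any entry that is not
# a valid number would become NA (with a warning)
levels_num <- as.numeric(sample_1_head)

# Once numeric, the usual summaries and transformations work
levels_mean <- mean(levels_num)
levels_log2 <- log2(levels_num)

print(levels_num)
```

With the real data, the same conversion would be `as.numeric(sample_1)`. If the column is already numeric in your GEOquery version, the call leaves it unchanged.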
GEOquery loads every part of the dataset into a structured object, so you can start your analysis right away. For more information, see the vignettes in the package’s GitHub repository.
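As one possible starting point for analysis, the GDS object can be converted to a Biobase ExpressionSet with `GDS2eSet()`. This is a sketch, assuming the same GDS5072 dataset and network access. The conversion may also download the platform annotation, and the variable names are illustrative.

```r
library(GEOquery)
library(Biobase)  # provides exprs() and pData()

# Same dataset as above; downloaded from NCBI GEO
gds <- getGEO("GDS5072")

# Convert to an ExpressionSet; set do.log2 = TRUE to log2-transform
# the values during conversion
eset <- GDS2eSet(gds, do.log2 = FALSE)

expr_matrix <- exprs(eset)  # matrix: probes in rows, samples in columns
sample_info <- pData(eset)  # data frame with one row per sample

print(dim(expr_matrix))
print(head(sample_info))
```

Many Bioconductor analysis packages take an ExpressionSet as input, so this conversion is often more convenient than working with the raw slots.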