
This page contains links to data sets (and supporting documentation) that have been curated for loading into tranSMART by members of the community.  In most cases there will be several files for different types of data (clinical, expression, etc.) that must be downloaded.  The files can then be loaded into your tranSMART instance by following the instructions given below.

If you have data that you would like to contribute, or know of a dataset that you would like to see here, please contact a member of the Content Committee.

 

Curated Datasets 

The Content Committee has been working diligently to obtain curated data in tranSMART-ready format. The Foundation has organized these datasets on the tranSMART Foundation library server, library.transmartfoundation.org. Descriptions of the datasets are in progress and are listed below. New data is being prepared and will be posted on the wiki shortly. You can load these datasets into your tranSMART instance using the directions below.

Loading the data

These datasets can be loaded using transmart-data (available on GitHub: https://github.com/tranSMART-Foundation/transmart-data/tree/release-16.1).

Run the loading commands from the top-level transmart-data directory on your system. On Oracle, replace 'samples/postgres' with 'samples/oracle'.

Select a dataset from the library server and install each datatype in turn, starting with the clinical data (if any) and any platform annotation. Loading the ref_annotation target identifies the platform annotation and installs it if it is not already in the database. Finally, load the remaining available datatype(s) by specifying the Load Target shown in the table for each dataset (the Download Label or Download column).

 For example, to load the complete set of data for study GSE13168 (Load Target RanchoGSE13168):

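cd transmart-data
source ./vars
make update_datasets
# Load the clinical data first
make -C samples/postgres load_clinical_RanchoGSE13168
# Install the platform annotation if it is not already in the database
make -C samples/postgres load_ref_annotation_RanchoGSE13168
# Load the remaining datatype(s)
make -C samples/postgres load_expression_RanchoGSE13168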
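The same sequence applies to any Load Target. The following is a minimal sketch, assuming the make targets follow the load_<datatype>_<Load Target> pattern used in the example above. The function name load_study and the DRY_RUN variable are illustrative and are not part of transmart-data. By default the function only prints the commands so they can be reviewed before anything is loaded.

```shell
# Sketch only: load_study and DRY_RUN are hypothetical names, not part of
# transmart-data. Run from the top-level transmart-data directory after
# 'source ./vars' and 'make update_datasets'.
load_study() {
    local target="$1"                                           # Load Target, e.g. RanchoGSE13168
    local db="${2:-postgres}"                                   # 'postgres' or 'oracle'
    local datatypes="${3:-clinical ref_annotation expression}"  # clinical goes first
    local datatype
    for datatype in $datatypes; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run (the default): print the command instead of running it
            echo "make -C samples/${db} load_${datatype}_${target}"
        else
            make -C "samples/${db}" "load_${datatype}_${target}" || return 1
        fi
    done
}

# Prints the three make commands from the example above
load_study RanchoGSE13168
```

For a dataset that only lists the clinical datatype, pass the datatype list explicitly, for example `load_study RanchoGSE34466 postgres clinical`. Setting `DRY_RUN=0` runs the commands instead of printing them.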

The files are available to download as .tar.xz files from the tranSMART Foundation library server. Each .tar.xz file contains all the required data files plus a .params file that defines the parameters needed by the loading scripts. In general you need several of these files (e.g. clinical, annotation, expression) to load the complete study, and you must load the clinical data first. This information should be sufficient to load the datasets using any other loader of your choice.
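If you plan to use another loader, it can help to look inside a downloaded archive first. The following is a small sketch, assuming a tar that supports xz through the -J option (as GNU tar does). The helper name list_params is hypothetical, and the archive path is whatever file you downloaded.

```shell
# Sketch only: list_params is a hypothetical helper name.
# Prints the path of every .params file inside a .tar.xz archive.
list_params() {
    tar -tJf "$1" | grep '\.params$'
}

# Usage: pass the path of a downloaded archive as the first argument
if [ -f "${1:-}" ]; then
    list_params "$1"
fi
```

To unpack the archive into the current directory, use `tar -xJf` with the same file.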

Scripts are also provided that load all targets for a given study. Each script checks that it is running from the top-level transmart-data directory and then loads all targets for the selected study. See the Script column in the tables below.
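The directory check can be pictured as follows. This is an illustrative sketch, not the scripts' actual code: it assumes that the presence of the vars file and the samples directory (both used in the commands above) is enough to recognise the top level, and the function name in_transmart_data_top is hypothetical.

```shell
# Sketch only: in_transmart_data_top is a hypothetical name, and the real
# load_all_*.sh scripts may perform their check differently.
in_transmart_data_top() {
    local dir="${1:-$PWD}"
    # The loading commands rely on ./vars and samples/<database>
    [ -f "$dir/vars" ] && [ -d "$dir/samples" ]
}

if in_transmart_data_top; then
    echo "OK: running from the top-level transmart-data directory"
else
    echo "Run this from the top-level transmart-data directory" >&2
fi
```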

Curated GEO (Gene Expression Omnibus) datasets

| Study ID | Study Title | Author (Published) | Disease | Subjects | Platform | Description | Source | Datatype | Download Label | Script |
|---|---|---|---|---|---|---|---|---|---|---|
| GSE10024 | Key Regulatory Molecules of Cartilage Destruction in Rheumatoid Arthritis: An in vitro Study | Andreas (2008) | Rheumatoid Arthritis | 6 | GPL96 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE10024 | load_all_EtriksGSE10024.sh |
| GSE10500 | Gene expression in rheumatoid arthritis synovial macrophages | Yarilina (2008) | Rheumatoid Arthritis | 8 | GPL8300 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE10500 | load_all_EtriksGSE10500.sh |
| GSE11575 | Mutual Antagonistic Relationship Between Prostaglandin E2 and Interferon-(gamma): Implications for Rheumatoid Arthritis | Mathieu (2008) | Rheumatoid Arthritis | 12 | GPL4372 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE11575 | load_all_EtriksGSE11575.sh |
| GSE11827 | Identification of markers of responsiveness in rheumatoid arthritis patient | Bansard (2011) | Rheumatoid Arthritis | 49 | GPL5215 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE11827 | load_all_EtriksGSE11827.sh |
| GSE11903 | Effective treatment of psoriasis with etanercept is linked to suppression of IL17 signaling, not immediate response TNF | Suarez-Farinas (2009) | Psoriasis | 89 | GPL571 | GEO Public Studies with clinical and gene expression data | Thomson Reuters | clinical, ref_annotation, expression | RanchoGSE11903 | load_all_RanchoGSE11903.sh |
| GSE12051 | Microarray predictor of response to infliximab in rheumatoid arthritis (RA) patients | Julia (2009) | Rheumatoid Arthritis | 44 | GPL2507 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE12051 | load_all_EtriksGSE12051.sh |
| GSE12653 | Gene Expression Analysis in Peripheral Blood of Patients with Rheumatoid Arthritis | Nishimoto (2008) | Rheumatoid Arthritis | 72 | GPL7220 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE12653 | load_all_EtriksGSE12653.sh |
| GSE13026 | Early and long-standing rheumatoid arthritis: molecular signatures identified by gene expression profiling in synovium | Lequerre (2009) | Rheumatoid Arthritis | 45 | GPL5215 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE13026 | load_all_EtriksGSE13026.sh |
| GSE13168 | Effects of glucocorticoids and Protein Kinase A on growth factor- and 1beta- regulated gene | Misior (2009) | Asthma | 54 | GPL96 | GEO Public Studies with clinical and gene expression data. Asthma dataset which has fairly complex structure. Used to test ETL procedures and time series workflow. | Rancho | clinical, ref_annotation, expression | RanchoGSE13168 | load_all_RanchoGSE13168.sh |
| GSE13732 | CIS (multiple sclerosis) (case-control) (time-series) | Baranzini (2008) | Multiple Sclerosis | 113 | GPL570 | GEO Public Studies with clinical and gene expression data | | clinical, ref_annotation, expression | RanchoGSE13732 | load_all_RanchoGSE13732.sh |
| GSE1456 | Gene expression of breast cancer tissue in a large population-based cohort of Swedish patients | Pawitan (2005) | Breast Cancer | 159 | GPL96, GPL97 | GEO Public Studies with clinical and gene expression data | | clinical, ref_annotation, expression | RanchoGSE1456 | load_all_RanchoGSE1456.sh |
| GSE14468 | Gene expression profiling of CEBPA double and single mutant and CEBPA wild type AML | Verhaak (2009) | Acute Myeloid Leukemia | 526 | GPL570 | GEO Public Studies with clinical and gene expression data | Elevada | clinical, ref_annotation, expression | ElevadaGSE14468 | |
| GSE15245 | Prediction of acute multiple sclerosis relapses by transcription levels of peripheral blood cells | Gurevich (2009) | Multiple Sclerosis | 94 | GPL96, GPL571 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | RanchoGSE15245, EtriksGSE15245 | load_all_RanchoGSE15245.sh, load_all_EtriksGSE15245.sh |
| GSE15258 | Whole blood transcript profiling of rheumatoid arthritis patients | Bienkowska | Rheumatoid Arthritis | 86 | GPL570 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | RanchoGSE15258, EtriksGSE15258 | load_all_RanchoGSE15258.sh, load_all_EtriksGSE15258.sh |
| GSE15316 | Differential expression of rituximab responders vs. non responders on 3 different blood cell types | Julia (2009) | Rheumatoid Arthritis | 23 | GPL2507 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE15316 | load_all_EtriksGSE15316.sh |
| GSE15573 | Immunity and Defense Genes in Peripheral Blood Mononuclear Cells of Rheumatoid Arthritis patients | Teixeira (2009) | Rheumatoid Arthritis | 33 | GPL6102 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE15573 | load_all_EtriksGSE15573.sh |
| GSE15602 | Differential gene expression in RA synovial biopsies from responders versus non-responders to adalimumab therapy | Badot (2009) | Rheumatoid Arthritis | 11 | GPL570 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE15602 | load_all_EtriksGSE15602.sh |
| GSE15615 | Differential effects of TNFalpha and IL1beta on FLS global gene expression profile | Badot (2009) | Rheumatoid Arthritis | 6 | GPL570 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE15615 | load_all_EtriksGSE15615.sh |
| GSE16879 | Mucosal expression profiling in patients with inflammatory bowel disease before and after first infliximab treatment | Arijs (2009) | Inflammatory Bowel Disease | 73 | GPL570 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | RanchoGSE16879, EtriksGSE16879 | load_all_RanchoGSE16879.sh, load_all_EtriksGSE16879.sh |
| GSE17755 | Human peripheral blood cells: autoimmune diseases vs. healthy individuals | Nishimoto (2011) | Autoimmune Diseases | 244 | GPL1291 | GEO Public Studies with clinical and gene expression data | | clinical, ref_annotation, expression | RanchoGSE17755 | load_all_RanchoGSE17755.sh |
| GSE19821 | Rheumatoid arthritis and anti-TNF treatment | van Baarsen (2010) | Rheumatoid Arthritis | 30 | GPL9869, GPL9870, GPL9871 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE19821 | load_all_EtriksGSE19821.sh |
| GSE20141 | Expression analysis of laser-dissected SNpc neurons in Parkinson's disease | Zheng (2010) | Parkinsons Disease | 18 | GPL570 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE20141 | load_all_EtriksGSE20141.sh |
| GSE20146 | Expression analysis of dissected GPi in Parkinson's disease | Middleton (2010) | Parkinsons Disease | 20 | GPL570 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE20146 | load_all_EtriksGSE20146.sh |
| GSE20153 | Expression analysis of lymphoblast cells lines in Parkinson's disease | Zheng (2010) | Parkinsons Disease | 16 | GPL570 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE20153 | load_all_EtriksGSE20153.sh |
| GSE20163 | Systematic meta-analysis and replication of genome-wide expression studies of Parkinson's disease: 2 | Zheng (2010) | Parkinsons Disease | 12 | GPL96 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE20163 | load_all_EtriksGSE20163.sh |
| GSE20164 | Systematic meta-analysis and replication of genome-wide expression studies of Parkinson's disease: 3 | Zheng (2010) | Parkinsons Disease | 11 | GPL96 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE20164 | load_all_EtriksGSE20164.sh |
| GSE20168 | Transcriptional analysis of prefrontal area 9 in Parkinson's disease | Zhang (2005) | Parkinsons Disease | 29 | GPL96 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE20168 | load_all_EtriksGSE20168.sh |
| GSE20194 | MAQC-II Project: human breast cancer (BR) data set | Popovici (2010) | Breast Cancer | 278 | GPL96 | GEO Public Studies with clinical and gene expression data | | clinical, ref_annotation, expression | RanchoGSE20194 | load_all_EtriksGSE20194.sh |
| GSE20291 | Transcriptional analysis of putamen in Parkinson's disease | Zhang (2005) | Parkinsons Disease | 35 | GPL96 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE20291 | load_all_EtriksGSE20291.sh |
| GSE20292 | Transcriptional analysis of whole substantia nigra in Parkinson's disease | Zhang (2005) | Parkinsons Disease | 29 | GPL96 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE20292 | load_all_EtriksGSE20292.sh |
| GSE20295 | Transcriptional analysis of multiple brain regions in Parkinson's disease | Zhang (2005) | Parkinsons Disease | 93 | GPL96 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE20295 | load_all_EtriksGSE20295.sh |
| GSE20314 | Systematic meta-analysis and replication of genome-wide expression studies of Parkinson’s disease: 4 | Zheng (2010) | Parkinsons Disease | 8 | GPL96 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE20314 | load_all_EtriksGSE20314.sh |
| GSE20333 | Gene expression profiling of parkinsonian substantia nigra | Edna | Parkinsons Disease | 12 | GPL201 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE20333 | load_all_EtriksGSE20333.sh |
| GSE20690 | Efficacy of anti-TNF biologic agent, infliximab, for RA patients using transcriptome analysis of white blood cells | Takeuchi (2009) | Rheumatoid Arthritis | 68 | GPL4133 | GEO Public Studies with clinical and gene expression data | Thomson Reuters, eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | RanchoGSE20690, EtriksGSE20690 | load_all_RanchoGSE20690.sh, load_all_EtriksGSE20690.sh |
| GSE2125 | isolated alveolar macrophages | Woodruff (2005) | Asthma | 45 | GPL570 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE2125 | load_all_EtriksGSE2125.sh |
| GSE21537 | The relationship between the gene expression profile in the synovium of RA patients at baseline and the clinical response to infliximab treatment | Lindberg (2010) | Rheumatoid Arthritis | 62 | GPL7768 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE21537 | load_all_EtriksGSE21537.sh |
| GSE21959 | Transcriptional response to hypoxia of normal and rheumatoid arthritis synovial fibroblasts | Del Rey (2010) | Rheumatoid Arthritis | 36 | GPL4133 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE21959 | load_all_EtriksGSE21959.sh |
| GSE22324 | Mapping of disease-associated expression polymorphisms in primary peripheral blood CD4+ lymphocytes | Raby (2010) | Asthma | 200 | GPL6104 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE22324 | load_all_EtriksGSE22324.sh |
| GSE23611 | Lung biopsies from asthmatics pre- and post- steroid treatment and from healthy control | Choy (2011) | Asthma | 62 | GPL6480 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE23611 | load_all_EtriksGSE23611.sh |
| GSE23676 | Expression data from advanced Parkinson's disease (PD) patients leukocytes - prior to and following deep brain stimulation (DBS) treatment in on and off stimulation conditions, and matched healthy control (HC) subjects | Soreq (2012) | Parkinsons Disease | 27 | GPL5188 | GEO Public Studies with clinical and expression data. Very large platform with 40 million probes. ETL can stall on smaller systems. | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE23676 | load_all_EtriksGSE23676.sh |
| GSE24060 | Peripheral Blood Cells Gene Expression Profiles from Discordant Monozygotic Twins with Systemic Autoimmune disease (SAIDs) | O'Hanlon (2011) | Autoimmune Diseases | 80 | GPL7264 | GEO Public Studies with clinical and gene expression data | Rancho | clinical, ref_annotation, expression | RanchoGSE24060 | load_all_EtriksGSE24060.sh |
| GSE25066 | Genomic predictor of response and survival following neoadjuvant taxane-anthracycline chemotherapy in breast cancer | Hatzis (2011) | Breast Cancer | 508 | GPL96 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | RanchoGSE25066, EtriksGSE25066 | load_all_RanchoGSE25066.sh, load_all_EtriksGSE25066.sh |
| GSE25160 | Combination of peripheral blood gene expression profiles and clinical parameters predicts response for tocilizumab (anti-IL6) treatment in rheumatoid arthritis | Mesko (2012) | Rheumatoid Arthritis | 26 | GPL6244 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE25160 | load_all_EtriksGSE25160.sh |
| GSE27390 | Human bone marrow-derived mononuclear cells (BMMC): rheumatoid arthritis vs. osteoarthritis | Lee (2011) | Rheumatoid Arthritis | 19 | GPL570 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE27390 | load_all_EtriksGSE27390.sh |
| GSE27831 | Syntenin-1 is expressed in uveal melanoma and correlates with metastatic progression | Gangemi (2012) | Melanoma Uveal | 29 | GPL570 | GEO Public Studies with clinical and gene expression data. General oncology set to test main functionality. | Rancho | clinical, ref_annotation, expression | RanchoGSE27831 | load_all_EtriksGSE27831.sh |
| GSE30662 | Comparison of human Rheumatoid arthritis patient peripheral whole blood mRNAs just before and after Leukocytapheresis | Kusaoi (2011) | Rheumatoid Arthritis | 8 | GPL5639 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE30662 | load_all_EtriksGSE30662.sh |
| GSE31773 | Comparison of mRNA expression in circulating T-cells from patients with severe asthma | Tsitsiou (2012) | Asthma | (eT) 40, (CH) 20 | GPL570 | GEO Public Studies with clinical and gene expression data. ConvergeHealth clinical data used to test Across Trial with GSE34466. | ConvergeHealth (clinical only), eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | RanchoGSE31773 (clinical only), EtriksGSE31773 | load_all_RanchoGSE31773.sh (clinical only), load_all_EtriksGSE31773.sh |
| GSE32583 | Expression data from lupus NZB/W, NZM2410, NZW/BXSB mouse kidneys prenephritic and nephritic. | Berthier (2012) | Lupus Nephritis | 57 | GPL7546 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | RanchoGSE32583, EtriksGSE32583 | load_all_RanchoGSE32583.sh, load_all_EtriksGSE32583.sh |
| GSE32591 | Expression data from human with lupus nephritis (LN) | Berthier (2012) | Lupus Nephritis | 47 | GPL14663 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | RanchoGSE32591, EtriksGSE32591 | load_all_RanchoGSE32591.sh, load_all_EtriksGSE32591.sh |
| GSE33377 | Expression profiling of rheumatoid arthritis patients treated with anti-TNF | Toonen (2012) | Rheumatoid Arthritis | 42 | GPL5175 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE33377 | load_all_EtriksGSE33377.sh |
| GSE3365 | Comparison of PBMCs in Inflammatory Bowel Disease | Burczynski (2006) | Inflammatory Bowel Disease | 127 | GPL96 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | RanchoGSE3365, EtriksGSE3365 | load_all_RanchoGSE3365.sh, load_all_EtriksGSE3365.sh |
| GSE34466 | Airway Epithelial miRNA Expression in Asthma | Barczak (2012) | Asthma | 63 | GPL15016 | GEO Public Studies with clinical and gene expression data. ConvergeHealth clinical data used to test Across Trial with GSE31773. | ConvergeHealth | clinical | RanchoGSE34466 | load_all_RanchoGSE34466.sh |
| GSE34577 | Routine use of microarray-based gene expression profiling to identify patients with low cytogenetic risk acute myeloid leukemia: accurate results can be obtained even with suboptimal samples (training samples) | Guardiola (2012) | Acute Myeloid Leukemia | 89 | GPL6947 | GEO Public Studies with clinical and gene expression data | Elevada | clinical | ElevadaGSE34577 | load_all_ElevadaGSE34577.sh |
| GSE34714 | Routine use of microarray-based gene expression profiling to identify patients with low cytogenetic risk acute myeloid leukemia: accurate results can be obtained even with suboptimal samples. (test samples) | Guardiola (2012) | Acute Myeloid Leukemia | 117 | GPL6947 | GEO Public Studies with clinical and gene expression data | Elevada | clinical | ElevadaGSE34714 | load_all_ElevadaGSE34714.sh |
| GSE35642 | Transcriptome analysis of a chronic in vitro model of Parkinsonism | Cabeza-Arvelaiz (2012) | Parkinsons Disease | 18 | GPL96 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE35642 | load_all_EtriksGSE35642.sh |
| GSE35643 | Expression data from human bronchial airway smooth muscle (ASM) cells | Gounni (2012) | Asthma | 12 | GPL6244 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE35643 | load_all_EtriksGSE35643.sh |
| GSE3592 | Expression profiling of responsiveness to infliximab in rheumatoid arthritis patients | Lequerre (2006) | Rheumatoid Arthritis | 44 | GPL3064 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE3592 | load_all_EtriksGSE3592.sh |
| GSE36757 | Human Rheumatoid Arthritis Synovial Fibroblasts: Control vs. Tenascin-C Stimulated | Asano (2012) | Rheumatoid Arthritis | 1 | GPL9425 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE36757 | load_all_EtriksGSE36757.sh |
| GSE37642 | Prognostic gene signature for AML | Herold (2013) | Acute Myeloid Leukemia | 562 | GPL97 | GEO Public Studies with clinical and gene expression data | Elevada | clinical | ElevadaGSE37642 | load_all_ElevadaGSE37642.sh |
| GSE40240 | Expression data from peripheral blood - blood draws at Pre and Post time points of Allergen inhalation challenge (ER and DR) | Tebbutt (2012) | Asthma | 28 | GPL6244 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE40240 | load_all_EtriksGSE40240.sh |
| GSE41861 | Upper airway gene expression is an effective surrogate biomarker for Th2-driven inflammation in the lower airway | Cheng (2012) | Asthma | 138 | GPL570 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE41861 | load_all_EtriksGSE41861.sh |
| GSE41862 | Nasal scrape gene expression profiling in asthmatics | Cheng (2012) | Asthma | 116 | GPL570 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE41862 | load_all_EtriksGSE41862.sh |
| GSE41863 | Sputum gene expression profiling in asthmatics | Cheng (2012) | Asthma | 56 | GPL570 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE41863 | load_all_EtriksGSE41863.sh |
| GSE4271 | Molecular subclasses of high-grade glioma: prognosis, disease progression, and neurogenesis | Phillips (2006) | Brain Cancer | 77 | GPL96, GPL97 | GEO Public Studies with clinical and gene expression data | Rancho | clinical, ref_annotation, expression | RanchoGSE4271 | load_all_RanchoGSE4271.sh |
| GSE4302 | Genome-Wide Profiling of Airway Epithelial Cells in Asthmatics, Smokers and Healthy Controls | Woodruff (2007) | Asthma | 118 | GPL570 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE4302 | load_all_EtriksGSE4302.sh |
| GSE43696 | To identify differentially expressed genes between normal control (NC), mild-moderate asthmatic (MMA) and severe asthmatic (SA) patients | Milosevic (2013) | Asthma | 109 | GPL6480 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE43696 | load_all_EtriksGSE43696.sh |
| GSE4382 | Repeated observation of breast tumor subtypes in independent gene expression data sets | Sorlie (2003) | Breast Cancer | 167 | GSE4382PDM | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | RanchoGSE4382, EtriksGSE4382 | load_all_RanchoGSE4382.sh, load_all_EtriksGSE4382.sh |
| GSE44037 | Expression data from airway epithelial cells from patients with asthma, rhinitis, and healthy controls | Wagener (2013) | Asthma | 34 | GPL13158 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE44037 | load_all_EtriksGSE44037.sh |
| GSE45111 | A sputum gene expression signature of six biomarkers identifies asthma inflammatory phenotype and steroid responsiveness | Baines (2013) | Asthma | 47 | GPL6104 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE45111 | load_all_EtriksGSE45111.sh |
| GSE45847 | To identify gene markers which differentiate between Aspirin-exacerbated respiratory disease (AERD) and aspirin-tolerant asthma (ATA) with a high discriminative power. | Shin (2013) | Asthma | 42 | GPL16979 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE45847 | load_all_EtriksGSE45847.sh |
| GSE46171 | Asthmatics with exacerbation during acute respiratory illness exhibit unique transcriptional signatures | McErlean (2013) | Asthma | 91 | GPL16981 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE46171 | load_all_EtriksGSE46171.sh |
| GSE4698 | Molecular characterization of very early relapsed childhood ALL | Kirschner, Schwabe (2006) | Acute Lymphoblastic Leukemia | 60 | GPL96 | GEO Public Studies with clinical and gene expression data | Rancho | clinical, ref_annotation, expression | RanchoGSE4698 | load_all_RanchoGSE4698.sh |
| GSE4922 | Genetic Reclassification of Histologic Grade Delineates New Clinical Subtypes of Breast Cancer | Ivshina (2006) | Breast Cancer | 289 | GPL96, GPL97 | GEO Public Studies with clinical and gene expression data | Rancho | clinical, ref_annotation, expression | RanchoGSE4922 | load_all_RanchoGSE4922.sh |
| GSE52074 | Human rhinovirus infection causes different DNA methylation changes in nasal epithelial cells from healthy and asthmatic subjects | McErlean (2013) | Asthma | 18 | GPL13534 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE52074 | load_all_EtriksGSE52074.sh |
| GSE5327 | Breast cancer relapse free survival and lung metastasis free survival | Minn (2007) | Breast Cancer | 58 | GPL96 | GEO Public Studies with clinical and gene expression data. General oncology set to test main functionality. | Rancho | clinical, ref_annotation, expression | RanchoGSE5327 | load_all_RanchoGSE5327.sh |
| GSE54282 | Systems-based analyses of brain regions functionally impacted in Parkinson's disease reveals underlying causal mechanisms | Riley (2014) | Parkinsons Disease | 33 | GPL17047 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE54282 | load_all_EtriksGSE54282.sh |
| GSE54605 | Human rhinovirus infection causes different DNA methylation changes in nasal epithelial cells from healthy and asthmatic subjects | Xiang (2014) | Asthma | 10 | GPL1162 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE54605 | load_all_EtriksGSE54605.sh |
| GSE56553 | Short-term exposure to diesel exhaust is associated with dynamic changes in DNA methylation of circulating mononuclear cells in asthmatics | Kober (2014) | Asthma | 96 | GPL13534 | GEO Public Studies with clinical and gene expression data. Very large platform with 30 million probes. ETL can stall on smaller systems. | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE56553 | load_all_EtriksGSE56553.sh |
| GSE59339 | Characteristic DNA methylation profiles in peripheral blood monocytes are associated with inflammatory phenotypes of asthma | Baines (2014) | Asthma | 62 | GPL8490 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE59339 | load_all_EtriksGSE59339.sh |
| GSE61225 | Gene expression changes in blood RNA after swimming in a pool | Sumoy (2014) | Asthma | 74 | GPL19169 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE61225 | load_all_EtriksGSE61225.sh |
| GSE63270 | Expression profiles of normal hematopoietic stem and progenitor cells and acute myeloid leukemia sub-populations | Gentles (2015) | Acute Myeloid Leukemia | 104 | GPL17810 | GEO Public Studies with clinical and gene expression data | Elevada | clinical, ref_annotation, expression | ElevadaGSE63270 | load_all_ElevadaGSE63270.sh |
| GSE63383 | Expression data from asthmatic and healthy airway smooth muscle cells | Faiz (2014) | Asthma | 24 | GPL6244 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE63383 | load_all_EtriksGSE63383.sh |
| GSE65163 | The Nasal Methylome and Childhood Atopic Asthma | Schwartz (2015) | Asthma | 72 | GPL13534 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE65163 | load_all_EtriksGSE65163.sh |
| GSE65204 | The Nasal Gene Expression and Childhood Atopic Asthma | Schwartz (2015) | Asthma | 69 | GPL14550 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE65204 | load_all_EtriksGSE65204.sh |
| GSE65205 | Childhood Atopic Asthma | Schwartz (2015) | Asthma | 141 | GPL13534 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE65205 | load_all_EtriksGSE65205.sh |
| GSE6613 | Parkinson's disease vs. controls, whole blood | Scherzer (2007) | Parkinsons Disease | 105 | GPL96 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE6613 | load_all_EtriksGSE6613.sh |
| GSE67472 | Airway epithelial gene expression in asthma versus healthy controls | Christenson (2015) | Asthma | 105 | GPL16311 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE67472 | load_all_EtriksGSE67472.sh |
| GSE7390 | Strong Time Dependence of the 76-Gene Prognostic Signature | Desmedt (2007) | Breast Cancer | 198 | GPL96 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | RanchoGSE7390, EtriksGSE7390 | load_all_RanchoGSE7390.sh, load_all_EtriksGSE7390.sh |
| GSE74075 | Changes in Expression of Genes Regulating Airway Inflammation Following a High-Fat Mixed Meal in Asthmatics | Li (2016) | Asthma | 16 | GPL6883 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE74075 | load_all_EtriksGSE74075.sh |
| GSE7621 | Expression data of substantia nigra from postmortem human brain of Parkinson's disease patients (PD) | Lesnick (2007) | Parkinsons Disease | 25 | GPL570 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE7621 | load_all_EtriksGSE7621.sh |
| GSE8581 | Human Chronic Obstructive Pulmonary Disorder (COPD) Biomarker | Bhattacharya (2009) | COPD | 58 | GPL570 | GEO Public Studies with clinical and gene expression data | Rancho | clinical, ref_annotation, expression | RanchoGSE8581 | load_all_RanchoGSE8581.sh |
| GSE8650 | Blood Leukocyte Microarrays to Diagnose Systemic Onset Juvenile Idiopathic Arthritis and Follow IL-1 blockade | Allantaz (2007) | Juvenile Idiopathic Arthritis | 115 | GPL96, GPL97 | GEO Public Studies with clinical and gene expression data | Rancho | clinical, ref_annotation, expression | RanchoGSE8650 | load_all_RanchoGSE8650.sh |
| GSE9329 | Expression data of synovial cells isolated from patients with rheumatoid arthritis | Kabuyama (2008) | Rheumatoid Arthritis | 2 | GPL201 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSE9329 | load_all_EtriksGSE9329.sh |
GSE9782 | Gene expression profiling and correlation with outcome in clinical trials of the proteasome inhibitor bortezomib | Mulligan (2007) | Multiple Myeloma | 264 | GPL96, GPL97 | GEO Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | RanchoGSE9872, EtriksGSE9872 | load_all_RanchoGSE9872.sh, load_all_EtriksGSE9872.sh
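
Each row above pairs a Load Target with the datatypes available for it. As a minimal sketch, assuming the make targets follow the load_<datatype>_<Load Target> pattern of the GSE13168 example in the loading instructions, the helper below prints the commands for one row without running them. The function name print_load_commands is invented here for illustration and is not part of transmart-data. The commands are meant to be run from the top-level transmart-data directory after `source ./vars` and `make update_datasets`; on Oracle, replace samples/postgres with samples/oracle.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) the make commands for one table row.
# Assumption: targets are named load_<datatype>_<LoadTarget>, as in the
# GSE13168 example; check the transmart-data makefiles before relying on it.
print_load_commands() {
    local target="$1"
    shift
    local datatype
    # Datatypes are emitted in the order given, so pass clinical first.
    for datatype in "$@"; do
        echo "make -C samples/postgres load_${datatype}_${target}"
    done
}

# Row GSE8581: Load Target RanchoGSE8581, datatypes as listed in the table.
print_load_commands RanchoGSE8581 clinical ref_annotation expression
```

Printing the commands first makes it easy to review the target names against the table before anything is written to the database.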

 

Curated TCGA (The Cancer Genome Atlas) datasets

 

Study ID | Study Title | Author (Published) | Disease | Subjects | Platform | Description | Source | Datatype | Download | Script
Breast | Breast |  | Breast invasive carcinoma | 849 | Agi_G4502A | TCGA Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksTCGAbreast | load_all_EtriksTCGAbreast.sh
Colorectal | CRC |  | Colon adenocarcinoma | 276 | Agi_UNC_244 | TCGA Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksTCGAcolorectal | load_all_EtriksTCGAcolorectal.sh
Endomet | Endo |  | Uterine corpus endometrial carcinoma | 373 | Affy_microarray | TCGA Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksTCGAendomet | load_all_EtriksTCGAendomet.sh
Ovarian | OV |  | Ovarian serous cystadenocarcinoma | 488 | Agi_Affy_HuEX_U133A | TCGA Public Studies with clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksTCGAovarian | load_all_EtriksTCGAovarian.sh

 

Other curated datasets

 

Study ID | Study Title | Author (Published) | Disease | Subjects | Platform | Description | Source | Datatype | Download | Script
CCLE | Cancer Cell-Line Encyclopedia |  | Cancers | 1070 | GPL570 modified | Cancer cell-line encyclopedia clinical and gene expression data | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksCCLE | load_all_EtriksCCLE.sh
GSK Cell Lines | Glaxo SmithKline Cell Lines |  | (many) | 950 | GPL570 | Clinical data and gene expression data for cell line catalog | eTRIKS (University of Luxembourg) | clinical, ref_annotation, expression | EtriksGSKCellLines | load_all_EtriksGSKCellLines.sh


Test Datasets contributed by Sanofi for datatypes and serial data

These datasets are used for testing tranSMART releases. Data may be changed for testing purposes.

Study ID | Study Title | Author (Published) | Disease | Subjects | Platform | Description | Source | Datatypes | Download | Script
GSE19976 | Gene expression analysis of lung biopsies from patients with two different forms of pulmonary sarcoidosis | Ho (2010) | Lung Cancer | 15 | GPL6244 | GEO Public Studies with clinical and gene expression data | Sanofi | clinical, ref_annotation, expression | SanofiGSE19976 | load_all_SanofiGSE19976.sh
GSE38642 | Expression data from human pancreatic islets | Taneera (2012) | Diabetes | 63 | GPL6244 | GEO Public Studies with clinical and gene expression data | Sanofi | clinical, ref_annotation, expression | SanofiGSE38642 | load_all_SanofiGSE38642.sh
TEST_MANNHEIM | Test Mannheim with samples; Test data for serial clinical (low-dimensional) data | Test data | Diabetes | 30 | - | Serial Loading of data and Analysis on longitudinal studies | Sanofi | clinical | SanofiMannheim | load_all_SanofiMannheim.sh
TEST_GSE18938 | Data derived from GSE18938; Effect of EGF and/or HER2 on the growth of MCF10A cells on extracellular matrix: time course. | Pradeep (2012) | Breast Cancer | 2 | GPL6244 | Serial Loading of expression data and analysis on longitudinal studies | Sanofi | clinical, ref_annotation, expression | SanofiSerialHighdim | load_all_SanofiSerialHighdim.sh
TEST_SERIAL_RICERCA | Test data for serial expression |  |  |  | GPL570 | Serial Loading of expression data | Sanofi | clinical, ref_annotation, expression | SanofiSerialRicerca | load_all_SanofiSerialRicerca.sh
TEST_GSE37424 | miRNA qPCR data derived from GSE37424; miRNAs expression in spinal cords of experimental autoimmune encephalomyelitis (EAE) mice | Shen | Autoimmune Diseases |  |  | Test all HD Analysis for qPCR miRNA data type; Sample IDs can be viewed in Grid view; Biomarker search based on miRNA id from miRBase | Sanofi | clinical, ref_annotation, mirnaqpcr | SanofiMirnaGSE37424 | load_all_SanofiMirnaGSE37424.sh
TEST_GSE37425 | miRNA qPCR data derived from GSE37425; miRNAs expression in synovial tissues of rheumatoid arthritis (RA) individuals | Shen | Rheumatoid Arthritis |  |  | Test all HD Analysis for qPCR miRNA data type; Sample IDs can be viewed in Grid view; Biomarker search based on miRNA id from miRBase | Sanofi | clinical, ref_annotation, mirnaqpcr | SanofiMirnaGSE37425 | load_all_SanofiMirnaGSE37425.sh
TEST_GSE42468 | miRNA qPCR data derived from GSE42468; Differentially expressed miRNAs in MDA-MB-231 over-expressing GATA3 | Chou | Breast Cancer |  |  | Test all HD Analysis for qPCR miRNA data type; Sample IDs can be viewed in Grid view; Biomarker search based on miRNA id from miRBase | Sanofi | clinical, ref_annotation, mirnaqpcr | SanofiMirnaGSE42468 | load_all_SanofiMirnaGSE42468.sh
TEST_GSE49520 | miRNA qPCR data derived from GSE49520; microRNA Dysregulation in Human Prostate Cancer Cell Model (M12 and P69) | Budd | Prostate Cancer |  |  | Test all HD Analysis for qPCR miRNA data type; Sample IDs can be viewed in Grid view; Biomarker search based on miRNA id from miRBase | Sanofi | ref_annotation, mirnaqpcr (no clinical) | SanofiMirnaGSE49520 | load_all_SanofiMirnaGSE49520.sh
TEST_WITTEN_2010 | miRNAseq test data | Witten (2010) | Test Data | 31 | WITTEN (test data) | Test all HD Analysis for sequence based miRNA data type; Sample IDs can be viewed in Grid view; Biomarker search based on miRNA id from miRBase | Sanofi | clinical, ref_annotation, mirnaseq | SanofiMirnaseqWitten | load_all_SanofiMirnaseqWitten.sh
TEST_GSE48213 | RNAseq test data derived from GSE48213; Transcriptional profiling of a breast cancer cell line panel using RNAseq technology | Heiser (2013) | Breast Cancer | 56 | GPL10999 | Test all Analysis for rnaseq data type; Sample IDs can be viewed in Grid view; Biomarker search based on transcript IDs/Gene Symbol from Ensembl and Refseq | Sanofi | clinical, ref_annotation, rnaseq | SanofiRnaseqGSE48213 | load_all_SanofiRnaseqGSE48213.sh
TEST_METABOLOMICS | Metabolomics test data | Test data | Test Data | 24 | METABOLOMICS_ANNOT (test data) | Test all Analysis for metabolomics data type; Sample IDs can be viewed in Grid view; Biomarker search based on pathway name and super pathway names | Sanofi | clinical, ref_annotation, metabolomics | SanofiMetabolomics | load_all_SanofiMetabolomics.sh
TEST_MASS_SPEC_PROTEOMICS | Mass-spec proteomics test data | Test data | Test Data | 17 | PROTEOMICS (test data) | Test all Analysis for Proteomics data type; Sample IDs can be viewed in Grid view; Biomarker search based on Uniprot ID, Uniprot name and Gene Symbol | Sanofi | clinical, ref_annotation, msproteomics | SanofiProteomics | load_all_SanofiProteomics.sh
TEST_RBM_ADNI | RBM proteomics test data | Alzheimers Disease | Test Data | 158 | DISCOVERY_MAP (test data) | Test all Analysis for RBM data type; Sample IDs can be viewed in Grid view; Biomarker search based on Uniprot ID, Uniprot name, Gene symbol and analyte name | Sanofi | clinical, ref_annotation, rbm | SanofiRbmAdni | load_all_SanofiRbmAdni.sh
TEST_GSE4382_INC | Test data derived from GSE4382; Repeated observation of breast tumor subtypes in independent gene expression data sets | Breast Cancer | Test Data | 143 + 22 | - | Clinical test data loaded in 4 incremental steps: (1) initial data load; (2) adding new nodes; (3) replacing data for the new nodes; (4) adding 22 new subjects | Sanofi | clinical, clinical, clinical, clinical | SanofiGSE4382Inc, SanofiGSE4382Inc2, SanofiGSE4382Inc3, SanofiGSE4382Inc4 | load_all_SanofiGSE4382Inc.sh
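
The four incremental steps of TEST_GSE4382_INC are meant to be applied in sequence, one Load Target after the other. A minimal sketch, assuming each step is loaded with a load_clinical_<Load Target> make target as in the GSE13168 example and assuming a PostgreSQL install: by default the loop only prints the commands. RUN is a switch invented here for illustration; it is not a transmart-data variable.

```shell
#!/usr/bin/env bash
# Sketch only: apply the four incremental clinical loads in table order.
# Assumptions: a load_clinical_<LoadTarget> target exists for each step and
# the database is PostgreSQL (use samples/oracle on Oracle).
# RUN is a hypothetical switch; unless RUN=1 the commands are only printed.
incremental_load() {
    local step cmd
    for step in SanofiGSE4382Inc SanofiGSE4382Inc2 SanofiGSE4382Inc3 SanofiGSE4382Inc4; do
        cmd="make -C samples/postgres load_clinical_${step}"
        if [ "${RUN:-0}" = "1" ]; then
            $cmd || return 1   # stop at the first failed step
        else
            echo "$cmd"
        fi
    done
}

incremental_load
```

Stopping at the first failure matters here because each later step builds on the data loaded by the previous one.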

Cell Line Use Case 

The Cell Line Use Case data is provided by CTMM TraIT for testing multiple datatypes on public data.

A copy of the data is available from the tranSMART Foundation library server.

See also the Tutorial: How to load the Cell Line Use Case dataset with transmart-data.

Data Type | Subset | Subjects | Platform | Description | Source | Load target
clinical | Single set | 19 | - | Clinical data | CTMM TraIT |
aCGH | regions 180k | 5 | 180k region | Copy number variation data | CTMM TraIT |
aCGH | regions 224k | 10 | 224k region | aCGH data | CTMM TraIT |
aCGH | genes 180k | 5 | GPL8687 | aCGH data | CTMM TraIT |
aCGH | genes 224k | 10 | GPL8687 | aCGH data | CTMM TraIT |
expression | Affymetrix exon array | 2 | Affymetrix exon array | expression data | CTMM TraIT |
expression | Agilent 4k mRNA microarray | 16 | Agilent 4k mRNA microarray | expression data | CTMM TraIT |
mirna_qpcr | MicroRNA qPCR data | 2 | Cell-line mirna | miRNA qPCR data | CTMM TraIT |
proteomics | Proteomics data | 8 | Cell-line proteomics | Mass-spec proteomics data | CTMM TraIT |
rnaseq | Illumina GA II | 6 | Illumina | RNAseq data | CTMM TraIT |
rnaseq | Illumina HiSeq2000 Untreated 1 | 1 | Illumina | RNAseq data | CTMM TraIT |
rnaseq | Illumina HiSeq2000 Untreated 2 | 1 | Illumina | RNAseq data | CTMM TraIT |
rnaseq | Illumina HiSeq2000 DMSO control 1 | 1 | Illumina | RNAseq data | CTMM TraIT |
rnaseq | Illumina HiSeq2000 DMSO control 2 | 1 | Illumina | RNAseq data | CTMM TraIT |
rnaseq | Illumina HiSeq2000 siNon-Targeting 1 | 1 | Illumina | RNAseq data | CTMM TraIT |
rnaseq | Illumina HiSeq2000 siNon-Targeting 2 | 1 | Illumina | RNAseq data | CTMM TraIT |
vcf | Complete genomics DNAseq | 2 | - | Genomic variation data | CTMM TraIT |
vcf | Illumina GA II RNAseq | 6 | - | Variation data from RNAseq | CTMM TraIT |
vcf | Illumina HiSeq2000 RNAseq Untreated 1 | 1 | - | Variation data from RNAseq | CTMM TraIT |
vcf | Illumina HiSeq2000 RNAseq Untreated 2 | 1 | - | Variation data from RNAseq | CTMM TraIT |
vcf | Illumina HiSeq2000 RNAseq DMSO control 1 | 1 | - | Variation data from RNAseq | CTMM TraIT |
vcf | Illumina HiSeq2000 RNAseq DMSO control 2 | 1 | - | Variation data from RNAseq | CTMM TraIT |
vcf | Illumina HiSeq2000 RNAseq siNon-Targeting 1 | 1 | - | Variation data from RNAseq | CTMM TraIT |
vcf | Illumina HiSeq2000 RNAseq siNon-Targeting 2 | 1 | - | Variation data from RNAseq | CTMM TraIT |

GWAS example data

Contributed by Pfizer

Dataset | Disease | Application | Source | Download
MAGIC_2hrGlucose_AdjustedForBMI |  | GWAS | Pfizer | SNP Annotations and GWAS Data
MAGIC_FastingGlucose |  | GWAS | Pfizer |
MAGIC_ln_FastingInsulin |  | GWAS | Pfizer |
MAGIC_ln_HOMA-B |  | GWAS | Pfizer |
MAGIC_ln_HOMA-IR |  | GWAS | Pfizer |

Dictionary data

Loaders are available for the most useful dictionaries for searching and linking data in tranSMART.

Some datasets (e.g. KEGG) can only be used with an appropriate license.

The scripts on the test servers use the loader in tranSMART-ETL which has been tested on both Postgres and Oracle.

Dictionary | Description | Source | Load script on test servers | Download
Entrez Gene Info | Entrez gene names and IDs | Imperial | load-entrez.sh | Human and mouse genes
MeSH | Medline search terms | Imperial | load-mesh.sh | MeSH terms
KEGG | KEGG Pathways | Imperial | load-kegg.sh | Last public release
UniProt | UniProt / SwissProt human | Imperial | load-uniprot.sh, load-uniprot-sprot.sh | Human swissprot
MiRBase | MiRBase | Imperial | load-mirbase.sh | MiRBase
Gene Ontology | Gene Ontology | Imperial | load-go.sh | Gene Ontology
HMDB | Human metabolite database | Imperial | load-hmdb.sh | HMDB
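
Because KEGG can only be used with an appropriate license, a site that scripts its dictionary loads may want to gate that one loader explicitly. The following is a minimal sketch that only lists the loader script names from the table in order; it does not run them, since this page does not say where the scripts live or what arguments they take. HAVE_KEGG_LICENSE and list_dictionary_loaders are names invented here for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: list the dictionary loaders named in the table, skipping KEGG
# unless a license has been confirmed.
# HAVE_KEGG_LICENSE is a hypothetical variable, not a tranSMART setting.
list_dictionary_loaders() {
    local script
    for script in load-entrez.sh load-mesh.sh load-kegg.sh load-uniprot.sh \
                  load-uniprot-sprot.sh load-mirbase.sh load-go.sh load-hmdb.sh; do
        if [ "$script" = "load-kegg.sh" ] && [ "${HAVE_KEGG_LICENSE:-no}" != "yes" ]; then
            continue   # KEGG data requires an appropriate license
        fi
        echo "$script"
    done
}

list_dictionary_loaders
```

Setting HAVE_KEGG_LICENSE=yes before the call includes load-kegg.sh in the list.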

Other contributions

Dataset | Disease | Application | Source | Download
Cancer Cell Line Encyclopedia (CCLE) with SNP |  | Subset of original Study loaded for testing purposes - 30 Subjects out of 947 Subjects. Uploaded Clinical, Gene Expression, RNASEQ-RPKM, and SNP data for 30 Subjects. | Thomson Reuters |
VCF Test (VCF Browser needed for functionality testing) |  | Truncated VCF file for testing, located in the 'Test' Folder in Dataset Explorer | Thomson Reuters |
aCGH Data |  |  | CTMM TraIT (The Hyve) |
RNAseq (Other than Sanofi) |  |  | CTMM TraIT (The Hyve) |
VCF (Genome Browser needed for functionality testing) |  |  | CTMM TraIT (The Hyve) |

 

 

Original sources

Rancho Biosciences

Rancho Biosciences provides a selection of studies for testing early tranSMART releases, and has collected and curated all available public studies from the 2015 testathons.

The original Rancho curated studies are at: http://ranchobiosciences.com/free-downloads/

Pfizer GWAS

The tranSMART GWAS code repository has an "External Dep" folder that includes all of the dependencies (database as well as ETL). README.md lists the configuration changes.

https://github.com/transmart/transmart-gwas-plugin/tree/pfizer/External%20Dep

Janssen

GSE8581 was one of the three GEO datasets provided by Janssen for early testing. The data is now loaded from transmart-data.

CTMM TraIT - Cell Line use case

CTMM TraIT has brought together a diverse set of data from 6 different cell lines under the Cell Line use case, to test tranSMART on public data without complicated privacy regulations.  A detailed tutorial can be found here: Loading data with transmart-data (Cell Line Use Case dataset).

eTRIKS

The eTRIKS project has over 50 curated studies loaded on the public eTRIKS server: https://public.etriks.org/. This includes both clinical and mRNA gene expression data from GEO studies plus studies from The Cancer Genome Atlas (TCGA) and other sources.

The datasets are curated by the University of Luxembourg eTRIKS partners, and have been made available to the tranSMART community.

Thomson Reuters

For the first public release of tranSMART, Thomson Reuters released 10 curated GEO studies. These studies are now included in the curated set from Rancho Biosciences.

The original curated files in tranSMART-ready format can be downloaded from https://github.com/eugene-rakhmatulin/tMDataSamples 

ConvergeHEALTH by Deloitte

For the across trials functionality, ConvergeHEALTH contributed alternative data loading for the following studies for early tranSMART testing.

The original set is available on GitHub (https://github.com/transmart/tranSMART-ETL/tree/master/V1.2_Hackathon/Data):

  • Asthma_Tsitsiou_GSE31773
  • Asthma_Woodruff_GSE34466

 

 
