The CTMM TraIT project recently added the Cell Line Use Case (CLUC) to tranSMART. The CLUC is a collection of data on colorectal and prostate cell lines from an exceptionally broad set of platforms, as shown in the table below.
This diverse set is used to:
By incorporating the same platforms as used for ongoing research projects, this cell line set gives a representative test set comparable to real patient data, without the legal burden of handling personal data. The TraIT Cell Line Use Case transmart-ready files are available under the CC0 license for download here.
Please use the following citation when making use of this dataset: Bierkens, Mariska & Bijlard, Jochem "The TraIT cell line use case." Manuscript in preparation. More information can also be found on the Bio-IT World Poster "Multi-omics data analysis in tranSMART using the Cell Line Use Case dataset".
Note that the folder structure is very important in the upload process; make sure to structure your data correctly (figure 1). For more detailed information about the data type you wish to load, please refer to the section dedicated to that data type.
It is important to set up the batchdb.properties file, which provides transmart-batch with the location and login information needed to load the data. A detailed explanation of the properties file can be found here.
This tutorial assumes that the data is loaded into a local database with default settings, meaning that the database is on the same machine as the data folders and the ETL pipeline scripts.
Important note: transmart-batch currently does not have a pipeline for VCF data, so this data type has to be loaded with Kettle.
Setting up transmart-batch and general documentation
For the complete documentation on transmart-batch please look here.
To use transmart-batch with 16.1 or 16.2 you can use the V1.0 release. To use the latest version please clone the git repository and build transmart-batch:
git clone https://github.com/thehyve/transmart-batch.git
After building you should see transmart-batch/build/libs/transmart-batch-1.1-SNAPSHOT-capsule.jar
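A quick check can confirm that the build produced the capsule jar. The following is a minimal sketch and not part of transmart-batch: the helper name find_capsule_jar is hypothetical, and it searches the whole build directory so that it does not depend on the exact version number in the file name.

```shell
#!/bin/sh
# Hypothetical helper (not part of transmart-batch): print the path of the
# first capsule jar found under <checkout>/build, or fail if there is none.
find_capsule_jar() {
    checkout_dir="$1"
    # Without a build directory there is nothing to look for.
    [ -d "$checkout_dir/build" ] || return 1
    jar_path=$(find "$checkout_dir/build" -type f -name 'transmart-batch-*-capsule.jar' | head -n 1)
    # An empty result means the build did not produce a capsule jar.
    [ -n "$jar_path" ] || return 1
    printf '%s\n' "$jar_path"
}

# Usage: assumes the repository was cloned into ./transmart-batch.
find_capsule_jar transmart-batch || echo "No capsule jar found; has transmart-batch been built?"
```

If the helper prints a path, that jar is the one to use in the loading steps.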
The properties file contains information such as the location of the database and the username and password that are used to upload the data to the database. The properties file is built up of four lines indicating which database is being used (either PostgreSQL or Oracle), the location of the database, and the user.
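For a local PostgreSQL database, the four lines might look like the sketch below, written here as a shell heredoc that creates the file. The key names are assumed to match the sample properties file in the transmart-batch repository, and the host, port, database name, user, and password are placeholder assumptions for a default local installation; check all of them against the detailed explanation of the properties file before use.

```shell
# Write a batchdb.properties file for a local PostgreSQL database.
# Every value below is an assumption for a default local setup; adjust as needed.
cat > batchdb.properties <<'EOF'
BATCH_JDBC_URL=jdbc:postgresql://localhost:5432/transmart
BATCH_JDBC_DRIVER=org.postgresql.Driver
BATCH_JDBC_USER=tm_cz
BATCH_JDBC_PASSWORD=tm_cz
EOF
```

For Oracle, the driver and URL lines would change accordingly.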
Data structure and loading the data
To load the data properly, the scripts need to know where the data is located; for this reason the data structure is more or less fixed. For the data (available here), the only thing you have to do is extract the files and you are ready to load. The following figure gives an overview of the data types and the way the folder structure is built up. More details about particular data types can be found in their respective sections.
Getting the data to the server
If you want to upload the data to a server, you first need to get the data onto the server. The easiest way to do this is by opening a terminal window and connecting to the server.
When the connection is made, open a new terminal window (do not close the window in which you connected to the server) and navigate to the study you want to copy. From the folder that contains the study, run the following command:
scp -r study_name username@serverAddress:~
Here ~ is the default destination, your home folder on the server; replace it with the folder on the server where the data should be put.
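The connect and copy steps can be sketched as a small script. This is an illustration only: the variable names and their values are placeholders, ssh is assumed as the way to connect to the server, and the commands are printed rather than run so that they can be checked first.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with your own account, server, and study.
REMOTE_USER="username"
SERVER="serverAddress"
STUDY_DIR="study_name"
# "~" is kept literal here so that it refers to the home folder on the server.
DEST_DIR="~"

# Command for the first terminal window: connect to the server (ssh is assumed).
ssh_cmd="ssh ${REMOTE_USER}@${SERVER}"
# Command for the second terminal window: copy the study folder recursively.
scp_cmd="scp -r ${STUDY_DIR} ${REMOTE_USER}@${SERVER}:${DEST_DIR}"

# Print the commands for review instead of executing them.
echo "$ssh_cmd"
echo "$scp_cmd"
```

Once the printed commands look right, run them by hand from the folder that contains the study.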
Loading the data
To load the data, transmart-batch needs three files.
The data to be loaded.
A params file; this can be the data type params file or the annotation platform params file.