
GUIDE TO INSTALL TRANSMART v1.2 ON IBM POWER8 SYSTEM (PostgreSQL)

Table of Contents
Part I. INTRODUCTION
Part II. SOFTWARE PREREQUISITES
1. Compiler and tools:
2. R and Bioconductor
3. Web application development tools
4. PostgreSQL database
Part III. TRANSMART INSTALLATION
1. Download tranSMART Data
2. Install tranSMART Data
Part IV. BUILD AND START TRANSMART WEB SERVER
1. Building tranSMART from source
2. Start tranSMART application server
3. Check tranSMART is correctly started
PART V. EXTRA: PostgreSQL data directory migration to GPFS
1. Fresh install data directory modification:
2. Change existing data directory:
3. Migrate tablespaces to GPFS
PART VI. EXTRA: Enable tranSMART DB on Power8 Running RHEL 7.1

 

Part I. INTRODUCTION

Translational medical research and personalized health have become prominent topics in the life sciences in recent years, driven by the availability and capability of next generation sequencing tools and clinical data. TranSMART (www.transmartfoundation.org) is an open source platform for knowledge management of translational research data, including next generation sequencing (NGS) data such as DNA, RNA and protein sequences, along with clinical information such as clinical trials, patient demographics and disease conditions. It gives bioinformatics scientists an environment in which to develop and refine research hypotheses by investigating correlations between genomic sequences and phenotypic data, and to assess their analytical results in the context of the published literature.
IBM Power8 is a data-centric system designed for big data handling, including data extract, transform and load (ETL) and analysis, which are bottlenecks on the tranSMART platform. This guide provides step-by-step instructions to install tranSMART version 1.2 on an IBM Power8 system running Ubuntu 14.04.2. Tips for system and application tuning and optimization are also included.

Part II. SOFTWARE PREREQUISITES

Before installing the tranSMART package, make sure the GCC and Java compilers and tools are available on the system. tranSMART also has the following prerequisites:

Compiler and tools:

Make sure GCC, Java 7, Ant, Git and make are installed on the system. If they are not:
$ sudo apt-get install openjdk-7-jdk
$ sudo apt-get install icedtea-7-plugin
$ sudo apt-get install make
$ sudo apt-get install ant
$ sudo apt-get install git
Set ANT_HOME and JAVA_HOME in your user profile (.profile or .bash_profile) and the PATH to the locations where these packages are installed.
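
The profile settings described above might look like the following sketch. It assumes that `javac` and `ant` are already on the PATH and that each binary lives in a `bin` directory directly under its installation home; `home_from_binary` is a hypothetical helper, not part of any package. Check the resulting paths on your own system before relying on them.

```shell
# Sketch of ~/.profile additions (assumption: tools are installed as
# <home>/bin/<tool>, possibly reached through symlinks such as /usr/bin/javac).

# home_from_binary: hypothetical helper that strips the trailing /bin/<tool>
# component from the real path of a binary and prints what is left.
home_from_binary() {
    dirname "$(dirname "$1")"
}

# Resolve symlinks with readlink -f before deriving each home directory.
if command -v javac >/dev/null 2>&1; then
    JAVA_HOME=$(home_from_binary "$(readlink -f "$(command -v javac)")")
    export JAVA_HOME
    PATH="$JAVA_HOME/bin:$PATH"
fi
if command -v ant >/dev/null 2>&1; then
    ANT_HOME=$(home_from_binary "$(readlink -f "$(command -v ant)")")
    export ANT_HOME
    PATH="$ANT_HOME/bin:$PATH"
fi
export PATH
```

After logging in again, `echo $JAVA_HOME $ANT_HOME` shows the values that were picked up.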

R and Bioconductor

Two ways of installing R and Bioconductor:

  1. Install system R-base and core:

$ sudo apt-get install r-base
$ sudo apt-get install libcairo2-dev
$ sudo apt-get install libxt-dev
$ sudo R
> install.packages("Rserve")
> install.packages("Cairo")
> install.packages("MASS")    # no need for a newer version
> install.packages("stringr")
> install.packages("ggplot2")
> install.packages("plyr")
> install.packages("reshape2")
> install.packages("gplots")
> install.packages("data.table")
> source("http://bioconductor.org/biocLite.R")
> biocLite("impute")
> biocLite("preprocessCore")
> biocLite("AnnotationDbi")
> biocLite("GO.db")
> biocLite("limma")
> install.packages("WGCNA")
> biocLite("multtest")
Tips: to let WGCNA use multiple threads, call allowWGCNAThreads() in R, or set the environment variable before starting R:
$ export ALLOW_WGCNA_THREADS=16
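
The same package list can be installed without typing each line at the R prompt. The sketch below only prints the R commands; `r_vector` and `print_r_install` are hypothetical helpers, and the CRAN mirror URL is an assumption added so that `install.packages` does not stop to ask for a mirror when R runs non-interactively.

```shell
# r_vector: hypothetical helper that prints its arguments as an R character
# vector, for example c("Cairo", "MASS").
r_vector() (
    out=""
    for pkg in "$@"; do
        out="${out:+$out, }\"$pkg\""
    done
    printf 'c(%s)\n' "$out"
)

# Package names are taken from the list in this guide.
CRAN_PKGS="Rserve Cairo MASS stringr ggplot2 plyr reshape2 gplots data.table"
BIOC_PKGS="impute preprocessCore AnnotationDbi GO.db limma multtest"
CRAN_REPO="https://cloud.r-project.org"   # assumption: any CRAN mirror works

# print_r_install: prints the R commands in the order used by this guide.
# WGCNA comes last because it is installed after the Bioconductor packages.
print_r_install() {
    # The package variables are unquoted on purpose so they split into words.
    echo "install.packages($(r_vector $CRAN_PKGS), repos=\"$CRAN_REPO\")"
    echo 'source("http://bioconductor.org/biocLite.R")'
    echo "biocLite($(r_vector $BIOC_PKGS))"
    echo "install.packages(\"WGCNA\", repos=\"$CRAN_REPO\")"
}
```

Running `print_r_install | sudo R --no-save` would feed the commands to R; review the printed commands first.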

  2. Install the latest version of R and optimize it

R 3.2.1 is downloaded from http://cran.cnr.berkeley.edu/src/base/R-3/R-3.2.1.tar.gz and compiled with the GCC and IBM XL Fortran compilers.

  a) Prerequisites

$ sudo apt-get install texinfo
$ sudo apt-get install texlive
$ sudo apt-get install texi2html
$ sudo apt-get install fonts-inconsolata texlive-fonts-extra
$ sudo apt-get install libicu-dev

  b) Configuration

$ cat compile.sh
CC="gcc" \
CFLAGS="-O3 -mvsx -maltivec -mcpu=power8 -mtune=power8" \
CXX="g++" \
CXXFLAGS="-O3 -mvsx -maltivec -mcpu=power8 -mtune=power8" \
F77="xlf_r -qextname -q64" \
F90="xlf90_r -qextname -q64" \
FC="xlf90_r -qextname -q64" \
FFLAGS="-O3 -qaltivec -qarch=pwr8 -qtune=pwr8 -qcache=auto" \
CPICFLAGS="-fPIC" \
FPICFLAGS="-qpic" \
CXXPICFLAGS="-fPIC" \
SHLIB_LDFLAGS="-shared" \
SHLIB_CXXLDFLAGS="-shared" \
LIBS="-L/opt/ibm/xlmass/8.1.2/lib -lmass_simdp8 -lmass -lmassvp8 -lm" \
BLAS_LIBS="-L/usr/lib -lessl -lblas" \
LAPACK_LIBS="-L/usr/lib -lessl -llapack" \
DYLIB_LDFLAGS="-L/usr/lib -lessl" \
./configure --prefix=/home/rchen/R320 \
    --with-blas="-L/usr/lib -lessl" \
    --with-lapack="-L/usr/lib -lessl -llapack -lblas" \
    --enable-R-shlib --with-readline=yes --with-x=yes
Tips: configure should finish without any WARNING message. If warnings appear, additional system software packages probably need to be installed.
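
One way to check for warnings is to save the configure output and count the warning lines. This is a minimal sketch: `count_configure_warnings` is a hypothetical helper, the log file name `configure.out` is an assumption, and the pattern relies on the usual autoconf prefix `configure: WARNING`.

```shell
# count_configure_warnings: hypothetical helper that prints how many lines
# starting with "configure: WARNING" appear in a saved configure log.
count_configure_warnings() {
    # grep -c exits non-zero when the count is 0, so the status is ignored.
    grep -c '^configure: WARNING' "$1" || true
}
```

A possible use is `sh compile.sh 2>&1 | tee configure.out` followed by `count_configure_warnings configure.out`; a result of 0 means no warning lines were found.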

  c) Enable R with the IBM ESSL library

After configuration, modify the files "Makeconf" and "etc/Makeconf":
From:
BLAS_LIBS = -L$(R_HOME)/lib$(R_ARCH) -lRblas
To:
BLAS_LIBS = -L/usr/lib -lessl -L$(R_HOME)/lib$(R_ARCH) -lRblas /opt/ibm/lib/libxlfmath.so.1 /usr/lib/powerpc64le-linux-gnu/libicuuc.so.52 /usr/lib/powerpc64le-linux-gnu/libicui18n.so.52
Tips: add the IBM MASS library to the Fortran flags.
Then build and install:
$ make
$ sudo make install

  d) Start and stop Rserve

$ sudo R
> library(Rserve)
> Rserve()
Alternatively, start Rserve (on port 6311) from the shell:
$ R CMD $path_to/Rserve/libs/Rserve

Web application development tools

1) PHP 5
$ sudo apt-get install php5
$ sudo service apache2 status    # or stop | start | restart
2) Tomcat 7
$ sudo apt-get install tomcat7
$ sudo service tomcat7 status    # or stop | start | restart
Tips: stop Tomcat before copying the tranSMART WAR file to /var/lib/tomcat7/webapps.
3) Grails
$ sudo apt-get install curl    # skip if already installed
$ curl http://get.sdkman.io | bash
Tips: reopen the terminal window or run "source ~/.profile".
$ sdk install grails 2.3.11
Tips: when asked "Do you want grails 2.3.11 to be set as default? (Y/n)", answer Y.
$ sdk install groovy

PostgreSQL database

$ sudo apt-get install postgresql-9.3
$ sudo apt-get install libpg-java
$ sudo apt-get install libpostgresql-jdbc-java
$ sudo apt-get install postgresql-server-dev-9.3
$ sudo -i -u postgres
$ psql
postgres=# \q
$ sudo service postgresql status    # or stop | restart

Part III. TRANSMART INSTALLATION

Download tranSMART Data

  1. transmart-data and tranSMART-ETL

$ git clone https://github.com/transmart/transmart-data.git
$ cd transmart-data
$ git fetch --tags
$ git checkout
$ cd env
$ git clone https://github.com/transmart/tranSMART-ETL.git
$ cd tranSMART-ETL/
$ git fetch --tags
$ git checkout

  2. Download tranSMART Web Application

a) Transmart Main App
$ git clone https://github.com/transmart/transmartApp.git
$ cd transmartApp
$ git fetch --tags
$ git checkout
$ cd ..
b) Core API
$ git clone https://github.com/transmart/transmart-core-api.git
$ cd transmart-core-api
$ git fetch --tags
$ git checkout
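
The clone, fetch and checkout sequence is the same for every repository, so it can be wrapped in a small function. This is a sketch only: `repo_dir_from_url` and `clone_at_tag` are hypothetical helpers, and the tag name is left as a parameter because this guide does not state which tag to use (`git tag -l` inside a clone lists the available ones).

```shell
# repo_dir_from_url: hypothetical helper that prints the directory name that
# "git clone <url>" creates by default (last path component without ".git").
repo_dir_from_url() {
    basename "$1" .git
}

# clone_at_tag: hypothetical helper that clones a repository, enters it,
# fetches its tags and checks out the tag given as the second argument.
# It runs in a subshell, so the caller's working directory is unchanged.
clone_at_tag() (
    url=$1
    tag=$2
    git clone "$url" &&
        cd "$(repo_dir_from_url "$url")" &&
        git fetch --tags &&
        git checkout "$tag"
)
```

For example, `clone_at_tag https://github.com/transmart/transmartApp.git "$TAG"` with `TAG` set to a tag name of your choice.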

Install tranSMART Data

  1. Setup

$ sudo -i -u postgres
$ psql
postgres=# alter user postgres password 'postgres';

  2. Install tranSMART data

$ cd ~/transmart-data
$ sudo make -C env ubuntu_deps_root
$ make -C env ubuntu_deps_regular
$ vi vars
  * set PGPASSWORD=postgres
  * set TSUSER_HOME=$HOME/    # this is for Grails
$ wget http://downloads.sourceforge.net/project/pentaho/Data%20Integration/5.1/pdi-ce-5.1.0.0-752.zip
$ unzip pdi-ce-5.1.0.0-752.zip
$ cp -R data-integration/ ~/transmart-data/env/
$ . ./vars
$ make postgres_drop
$ make -j4 postgres    # create the transmart database instance
Tips: if the build complains about encoding, you may need to add LC_CTYPE = 'en_US.utf8' to transmart-data/ddl/postgres/GLOBAL/Makefile.
$ make -C solr start
$ make -C solr rwg_full_import

  3. Load tranSMART data

Load datasets as in the example below:
$ make -C samples/postgres load_clinical_GSE8581
$ make -C samples/postgres load_ref_annotation_GSE8581
$ make -C samples/postgres load_expression_GSE8581
$ make -C samples/postgres load_analysis_GSE8581
Tips:

  1. Data must be loaded in this order: clinical, ref_annotation, then expression.
  2. To resolve a JDBC "error connection", set KETTLE_HOME in vars:

export KETTLE_HOME=/gpfs/fs1/rchen/transMART/transmart-data/samples/postgres/kettle-home
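
The required loading order can be enforced with a short loop that stops at the first failure. This is a sketch under the assumption that it is run from the transmart-data directory after sourcing vars; `load_targets` and `load_study` are hypothetical helpers, and the target names follow the GSE8581 example in this guide.

```shell
# load_targets: hypothetical helper that prints the make targets for one
# study, in the order this guide uses.
load_targets() {
    for kind in clinical ref_annotation expression analysis; do
        echo "load_${kind}_$1"
    done
}

# load_study: hypothetical helper that runs each target in turn and stops at
# the first failure, so a later step never runs after an earlier one failed.
load_study() {
    for target in $(load_targets "$1"); do
        make -C samples/postgres "$target" || return 1
    done
}
```

`load_study GSE8581` then replaces the four separate make commands.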

  4. tranSMART application configuration

$ cd ~/transmart-data
$ make -C config install

Part IV. BUILD AND START TRANSMART WEB SERVER

Building tranSMART from source

$ cd ~/transmartApp
$ grails clean
$ grails upgrade
$ grails war --plain-output

Start tranSMART application server

  1. Start Solr for database search

$ cd ~/transmart-data
$ . ./vars-ubuntu
$ make -C solr start

  2. Start PostgreSQL database

$ sudo service postgresql start

  3. Start apache2 and tomcat7

$ sudo cp transmart.war /var/lib/tomcat7/webapps
$ sudo cp -r $HOME/.grails /usr/share/tomcat7
$ sudo service apache2 start    # or status
$ sudo service tomcat7 start    # or status

  4. Start R server

$ sudo R
> library("Rserve")
> Rserve()
> q()

Check tranSMART is correctly started

Open a web browser at: http://yourhost.com:8080/transmart
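
The same check can be made from the command line. The sketch below assumes curl is installed; `transmart_url` and `check_transmart` are hypothetical helpers, and the default port 8080 and the /transmart path are taken from the URL above.

```shell
# transmart_url: hypothetical helper that builds the application URL from a
# host name and an optional port (default 8080, as used in this guide).
transmart_url() {
    echo "http://$1:${2:-8080}/transmart"
}

# check_transmart: hypothetical helper that prints the HTTP status code the
# application returns. curl prints 000 when it cannot connect at all.
check_transmart() {
    curl -s -o /dev/null -w '%{http_code}\n' "$(transmart_url "$@")"
}
```

`check_transmart yourhost.com` prints a status code; a 2xx or 3xx value suggests that Tomcat is serving the application.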

PART V. EXTRA: PostgreSQL data directory migration to GPFS

Fresh install data directory modification:

% sudo su - postgres
To check the current data_directory and database version:
% psql -d postgres -U postgres
postgres=# SHOW data_directory;
data_directory
------------------------------
/var/lib/postgresql/9.3/main

  1. Create new data directory in /gpfs/fs1

% mkdir -p /gpfs/fs1/postgres/DB

  2. Run initdb:

% /usr/lib/postgresql/9.3/bin/initdb -D /gpfs/fs1/postgres/DB

  3. Edit config file:

% sudo vi /etc/postgresql/9.3/main/postgresql.conf
Change data_directory = '/gpfs/fs1/postgres/DB'
Tips: if the server reports an SSL error, you may need to set ssl = false

  4. Start the postgres database server

% sudo service postgresql start
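
The data_directory change can also be scripted instead of being made in an editor. This is a minimal sketch: `set_data_directory` is a hypothetical helper, it only rewrites a data_directory line that already exists (commented out or not), and it keeps a `.bak` copy of the original file.

```shell
# set_data_directory: hypothetical helper that rewrites the data_directory
# line of a postgresql.conf-style file and keeps the original as <file>.bak.
# Assumption: the new path does not contain the "|" character.
set_data_directory() (
    conf=$1
    dir=$2
    cp "$conf" "$conf.bak" || exit 1
    # Match "data_directory = ..." with an optional leading "#".
    sed "s|^#\{0,1\}[[:space:]]*data_directory[[:space:]]*=.*|data_directory = '$dir'|" \
        "$conf.bak" > "$conf"
)
```

With the paths used in this guide, the call would be `sudo sh -c` around `set_data_directory /etc/postgresql/9.3/main/postgresql.conf /gpfs/fs1/postgres/DB`, or simply run as root; compare the file with its `.bak` copy before restarting the server.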

Change existing data directory:

Stop the database server, then, as user postgres:

  1. Create new data directory in /gpfs/fs1

% mkdir -p /gpfs/fs1/postgres/DB

  2. Copy existing database data to new directory

% cp -r /var/lib/postgresql/9.3/main/* /gpfs/fs1/postgres/DB
% rm -r /var/lib/postgresql/9.3/main

  3. Link new database to old

% cd /var/lib/postgresql/9.3
% ln -fs /gpfs/fs1/postgres/DB main
(Tips: make sure the directory permission is 700)
Restart the database server.
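
The permission tip can be verified before restarting the server. The sketch below assumes GNU coreutils `stat` (as on Ubuntu); `dir_mode` and `check_pgdata` are hypothetical helpers. The `-L` option follows symbolic links, so the check also works on the linked path.

```shell
# dir_mode: hypothetical helper that prints the octal permission bits of a
# path, following symbolic links (GNU coreutils stat syntax).
dir_mode() {
    stat -L -c '%a' "$1"
}

# check_pgdata: hypothetical helper that succeeds only if the directory has
# mode 700, and otherwise prints the command that would fix it.
check_pgdata() {
    if [ "$(dir_mode "$1")" = "700" ]; then
        echo "ok: $1 is mode 700"
    else
        echo "fix needed: run chmod 700 $1" >&2
        return 1
    fi
}
```

For the layout in this guide, `check_pgdata /gpfs/fs1/postgres/DB` is the relevant call.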

Migrate tablespaces to GPFS

Use the same procedure as in "Change existing data directory".
Tips:

  1. If the server hangs during recovery, set:

hot_standby = on
wal_level = hot_standby

  2. Performance tuning (values in "postgresql.conf"):

checkpoint_segments = 100
checkpoint_timeout = 3600
checkpoint_completion_target = 1.0
checkpoint_warning = 300

  3. kitchen.sh: increase the Java memory setting to improve upload performance
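
The checkpoint settings above also determine how much disk space the write-ahead log can occupy. As a rough sketch, the PostgreSQL 9.3 documentation gives an upper bound of about (2 + checkpoint_completion_target) * checkpoint_segments + 1 segment files during normal operation; the helpers below are hypothetical and assume the default 16 MB segment size.

```shell
# max_wal_segments: hypothetical helper applying the rule of thumb
#   (2 + checkpoint_completion_target) * checkpoint_segments + 1
# $1 = checkpoint_segments, $2 = completion target in tenths (10 means 1.0),
# because POSIX shell arithmetic is integer-only.
max_wal_segments() {
    echo $(( (20 + $2) * $1 / 10 + 1 ))
}

# max_wal_mb: converts the segment count to megabytes, assuming the default
# 16 MB WAL segment size.
max_wal_mb() {
    echo $(( $(max_wal_segments "$1" "$2") * 16 ))
}
```

With the values in this guide, `max_wal_segments 100 10` prints 301 and `max_wal_mb 100 10` prints 4816, so the file system holding pg_xlog should have roughly 5 GB available for WAL files.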

PART VI. EXTRA: Enable tranSMART DB on Power8 Running RHEL 7.1

Install database:
$ yum install postgresql.ppc64le
$ yum install postgresql-server.ppc64le
Start and configure:
$ service postgresql initdb
$ chkconfig postgresql on
$ service postgresql start
Modify vars script:
PGSQL_BIN=/usr/bin/
TABLESPACES=/var/lib/pgsql/tablespaces/
KETTLE_JOBS_PSQL=$HOME/transmart-data/env/tranSMART-ETL/Postgres/GPL-1.0/Kettle/Kettle-ETL/
KITCHEN=$HOME/transmart-data/env/data-integration/kitchen.sh
Changes in pg_hba.conf
local all all trust
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
Create the tranSMART PostgreSQL database:
% sudo -u postgres bash -c "source vars; PGSQL_BIN=/usr/bin/ PGDATABASE=template1 make -C ddl/postgres/GLOBAL tablespaces"
% make postgres
Copy the tranSMART configuration file:
% sudo bash -c "source vars; TSUSER_HOME=~rchen/ make -C config/ install"
% make -C env/ data-integration
% make -C env/ update_etl
After the database is set up and configured, load datasets into PostgreSQL in the same way as on Ubuntu.

 

Acknowledgment to

Ruzhu Chen (ruzhuchen@us.ibm.com)
Senior Software Engineering Consultant, IBM Life Sciences Technical Solutions
