Release 19.0 (May 2020) is a major update to the previous release 16.3.
Release Notes tranSMART 19.0
Version Number
This version is renumbered to reflect the year of release, and to indicate the major effort in rewriting and reorganizing code.
...
Installation instructions for tranSMART 19.0 are in preparation on the wiki page "Install the tranSMART 19.0 beta release".
Test Version
A test version of the full release is available at http://postgres-test.transmartfoundation.org/transmart
Details of the beta test server data and of the features to be tested are in the beta test public instance wiki page
Single repository
TranSMART 19.0 code is reorganized in a single repository https://github.com/tranSMART-Foundation/transmart
...
The directories mirror the original organization of the source code for the tranSMART 17 project to simplify direct comparisons.
transmart-core-db and transmart-core-db-tests
The test code previously under transmart-core-db/transmart-core-db-tests has been relocated to its own top-level directory.
...
The two RNAseq datatypes are better separated in the core code.
transmart-etl
The release 16.3 tranSMART-ETL repository has been renamed to all lower case. Unused legacy directories have been purged from the repository, greatly reducing the size of the zip file generated for each release.
- FCL4tranSMART is superseded by transmart-ICE which has the latest released code from Sanofi for their ICE tool data loader (formerly FCL4tranSMART).
- The V1.2_Hackathon directory contents were unused.
- The loader is now superseded by the copy built from the src directory.
- The src-old directory is clearly redundant.
- The duplicate Kettle scripts under Kettle-GPL, Oracle and Postgres are removed. Since the start of the release 16 series the release copy of these scripts has been in the database-specific directories under the Kettle directory. These are used by the make targets in transmart-data and should be used by any local ETL pipelines.
transmart-extensions
The transmart-extensions plugin in release 16.3 has been split into its three components:
...
This also reflects the rearrangement in the tranSMART 17 project.
Galaxy-export
The old release 16.x 'blend4j-plugin' is renamed galaxy-export-plugin.
Throughout the code the name 'blend4j' has been replaced to make the functionality clear.
SmartR
SmartR was developed by the eTRIKS project and released in tranSMART 16.2 with a set of interactive analysis workflows that supersede many of the functions of the "Advanced Workflows" tab.
...
The Advanced Workflows remain active in this release. We anticipate that users will require them in order to reproduce previous analyses, and they can be used to compare results and encourage migration to SmartR.
Fractalis
Fractalis was developed for i2b2-transmart by the same author as SmartR (Sascha Herzinger at the University of Luxembourg) and supersedes several of the SmartR workflows.
We are working on the full integration of Fractalis into tranSMART 19.0. This is a work in progress, involving new ETL interfaces between tranSMART and Fractalis.
TransmartApp Web Application
Help pages
The revised transmart manual is added as help pages under the default URL /transmartmanual. Links from the web interface bring up the appropriate section in a new tab.
...
Help links have been added to pages where they were missing in previous releases, including the Comparison, Summary, and GridView tabs under Analyze.
Gene Signatures
The Gene Signature tab provides a way to maintain lists of genes, SNPs or probeIDs to define high-dimensional analyses (heatmaps etc.). These have been updated and tested. A new stored procedure is added to load platforms to match gene lists (the platform must be in the right table, with the species defined, for the gene signature to pick up the required genes).
...
A public set of gene signatures can be a helpful addition to a tranSMART installation where a set of markers is of special local interest.
Gene Lists
Gene Lists are a simple version of the Gene Signatures.
...
The safest way is to load from a file of gene names or IDs as this is easier to edit and reload.
Administration
A set of extra pages is available to users logged in as administrators via the Admin tab.
Build Information
Lists the metadata for the tranSMART web application.
Status of Support Connections
Tests that the Solr server is up and reports the number of items under each category.
Configuration Information
The page was added in 16.3 and is extended in 19.0. All configuration values are retrieved, categorized and reported.
...
New in 19.0 is a set of "Manifest" settings. These are the links to all the javascript, stylesheet, image and other files packaged in the transmartApp distribution. By default this table is closed so you only see the total count. Other tables start open but can be closed to make the remaining tables more readable.
Customization
Some customization options have been added to the database in 19.0.
...
Further customization options are planned for future releases. Please contact the developers if you have suggestions for features you would like to see.
JavaScript cleanup
Multiple versions of jQuery have been replaced by a single version across all plugins.
...
Location and timing of the definition for drag-and-drop in Analyze tab sub-pages has been updated.
R and tranSMART
The Rserve service is updated to provide better control for installations that install R in transmart-data.
...
The Rserve service template writes debugging output from R to a logfile, so that error messages can be traced more easily when an analysis has issues.
SolR Server
A template is provided to install a service to launch solr.
...
The solr server also provides help through its administration interface.
Database schemas
tranSMART and i2b2
The database schema has been updated to resolve, as far as possible, differences with the i2b2 1.7.12 schemas.
...
Integers in postgres are defined as type 'int' unless extremely large values are expected. Values that can exceed 1 billion are defined as type 'bigint'.
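The sizing rule above can be sketched in a few lines. This is an illustrative helper, not code from tranSMART: the function name `column_type` is hypothetical, the 1 billion threshold is the one quoted above, and 2,147,483,647 is the upper limit of the 4-byte PostgreSQL `int` type.

```python
# Upper limit of the 4-byte signed PostgreSQL 'int' type.
PG_INT_MAX = 2_147_483_647
# Threshold quoted in these notes: values that can exceed it use 'bigint'.
BIGINT_THRESHOLD = 1_000_000_000


def column_type(max_expected_value):
    """Pick a column type from the largest value the column may hold.

    Hypothetical helper for illustration only.
    """
    return "bigint" if max_expected_value > BIGINT_THRESHOLD else "int"


# The threshold sits below the hard limit, leaving headroom before overflow.
headroom = PG_INT_MAX - BIGINT_THRESHOLD

print(column_type(50_000))
print(column_type(5_000_000_000))
print(headroom)
```

Choosing the type at 1 billion rather than at the hard limit means a column that grows past expectations still has room before an `int` would overflow.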
New schemas
Four additional i2b2 schemas have been included:
...
Though not used (currently) by tranSMART, the aim is to create a database that can be populated and used as an i2b2 platform.
New I2b2 Tablespaces
I2b2 does not define tablespaces by default. A set of tablespace names was agreed with the i2b2-transmart developers for implementation in the tranSMART schema.
...
These reflect the tablespaces TRANSMART and INDX used for the transmart-specific tables and indexes.
Redundant tranSMART tablespaces
Earlier tranSMART releases defined 3 additional tablespaces: biomart, deapp and search_app. These were no longer used, although they were still created with each new database.
...
We are working on including support for tMDataLoader on Oracle and Postgres by default. This involves running the respective installation scripts and incorporating the changes into the Postgres and Oracle schema definitions, including the tMDataLoader specific versions of ETL functions.
Default passwords for database roles
The usernames and passwords for the database roles are defined for Kettle when the database is created. By default the username and password are the same as the schema. While this is simple for developers, production instances should define unique passwords for each role.
By adding these as environment variables when launching the database creation target the Kettle properties files will be populated. We recommend retaining the role names as these are used in many places in the code. The kettle properties files and the transmart-data/vars file should be secured from read access by other users.
Default passwords for transmart login
A limited set of usernames is defined for this release, as for previous releases. These have a default password, usually 'transmart2016' except for username 'admin' with default password 'admin'. These are simple for developers.
...
If using this option, changing to another username (to become an 'admin' user) requires logout through the 'Utilities' menu, or explicitly using the login page URL.
Initial database content
A new database (for Postgres or Oracle) is defined as a target in transmart-data.
...
The data from these sources is included in the searchapp.search_keywords table with synonyms in the searchapp.search_keyword_terms table.
Additional data dictionaries
Further data dictionaries can be loaded, and the data for the above dictionaries can be updated, using the loader utility under transmart-etl.
Data | Source | Date | Table |
---|---|---|---|
Human Pathways | HMDB | | biomart. |
Pathways | KEGG | Last public release | biomart. |
Grails upgrade
Grails version 2.5.4
This release is built using Grails 2.5.4.
...
Earlier releases up to 16.3 were built using Grails 2.3.11 and Grails 2.3.7.
Building individual components
Each directory has a build script.
...
For two external packages mydas and IpaApi, cd to the directory and build with:
mvn install
Component dependencies for transmartApp
The recommended build order to ensure dependencies are satisfied by previously built components is:
Directory | Name | Dependencies | Description |
---|---|---|---|
transmart-core-api | transmart-core-api | - | Core API |
transmart-shared | transmart-shared | - | New in 19.0. Provides generic utilities to return security information about the current user to remove the need to pass the username around in service calls. |
transmart-legacy-db | transmart-legacy-db | - | |
transmart-fractalis | transmart-fractalis | - | New in 19.0. Adds the Fractalis interactive workflow tab, integration is a work in progress. |
mydas | mydas | - | The original DAS code imported into tranSMART because there is no guarantee the original distribution will remain available. A small section of code is intended to be autogenerated, but this is easy to maintain by hand if any of the dependencies change. |
dalliance | dalliance | - | The dalliance genome browser. |
transmart-core-db | transmart-core | transmart-core-api transmart-shared | |
transmart-core-db-tests | transmart-core-db-tests | transmart-core-api transmart-core | Moved to a new directory for 19.0. Tests for the transmart-core-db code, also used by SmartR and transmartApp. |
transmart-mydas | transmart-mydas | transmart-core-api mydas | |
transmart-java | transmart-java | - | Moved to a new directory for 19.0. |
biomart-domain | biomart-domain | transmart-java | Moved to a new directory for 19.0. Domain definitions for the large number of tables in the biomart schema. |
search-domain | search-domain | transmart-shared biomart-domain | Moved to a new directory for 19.0. Domain definitions for tables in the searchapp schema. This covers keyword searching and user, role and access management as both are defined in the searchapp schema. |
transmart-rest-api | transmart-rest-api | transmart-core-api transmart-shared transmart-core transmart-core-db-tests | |
transmart-custom | transmart-custom | transmart-shared search-domain | New in 19.0. Provides services to customize aspects of the user interface using new application tables. Documentation is needed for these new capabilities. |
folder-management-plugin | folder-management | transmart-core search-domain | |
Rmodules | rdc-rmodules | transmart-core-api transmart-shared | Provides all the Advanced Workflows. There is also a dependency on data loaded into the searchapp schema to define the inputs, outputs, parameters and scripts for each workflow and to define their names and the order in which they are presented. This was intended in the pre-open source versions of tranSMART to allow users/administrators to edit these settings, but this makes little sense to control fixed scripts distributed in the transmart.war file. There is no known instance of a site developing their own Advanced Workflows through these mechanisms. |
spring-security-auth0 | spring-security-auth0 | transmart-shared transmart-core search-domain transmart-custom | New in 19.0. A new Auth0 controller and services. Documentation is needed for these new capabilities. |
IpaApi | IpaApi | - | This code provides one third-party SmartR workflow to interface to Ingenuity Pathway Analysis. SmartR includes hooks to load the IpaApi workflow |
SmartR | smart-r | transmart-core-api transmart-core transmart-core-db-tests IpaApi | Several potential new workflows from the eTRIKS project are candidates for inclusion. |
galaxy-export-plugin | galaxy-export-plugin | transmart-shared transmart-legacy-db rdc-rmodules | Directory renamed from blend4j-plugin for 19.0. A version of data export that transfers to an instance of Galaxy for further analysis. The user needs credentials to use the galaxy instance, defined in the server Config.groovy file. |
transmart-metacore-plugin | transmart-metacore-plugin | transmart-shared | |
transmart-xnat-viewer | xnat-viewer | transmart-core-api search-domain | |
transmart-xnat-importer-plugin | transmart-xnat-importer | transmart-shared biomart-domain | |
transmart-gwas-plugin | transmart-gwas | transmart-shared transmart-legacy-db transmart-core search-domain rdc-rmodules folder-management | |
transmart-gwas-plink | transmart-gwas-plink | rdc-rmodules | |
transmartApp | transmart.war | transmart-fractalis dalliance transmart-core-db-tests (test) transmart-mydas transmart-rest-api spring-security-auth0 smart-r galaxy-export-plugin transmart-metacore-plugin xnat-viewer transmart-xnat-importer transmart-gwas transmart-gwas-plink | This is the full transmart server, providing all the functions of the User Interface plus the access methods for the RESTful API to generate and serve authentication tokens and to serve results when these tokens are presented. |
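
A build order that satisfies the dependency column can be derived mechanically. The sketch below is illustrative and not part of the tranSMART build scripts: `build_order` is a hypothetical helper, and only five rows of the table are transcribed to keep the example short.

```python
def build_order(deps):
    """Return component names so that each one follows all its dependencies.

    Repeatedly emits every component whose dependencies are already built
    (a simple form of topological sorting).
    """
    remaining = {name: set(needed) for name, needed in deps.items()}
    order = []
    while remaining:
        built = set(order)
        ready = sorted(n for n, needed in remaining.items() if needed <= built)
        if not ready:
            # Nothing can be built: a cycle, or a dependency that is not listed.
            raise ValueError("circular or missing dependency")
        for name in ready:
            order.append(name)
            del remaining[name]
    return order


# Subset of the table above, keyed by the Name column.
deps = {
    "transmart-core-api": [],
    "transmart-shared": [],
    "transmart-core": ["transmart-core-api", "transmart-shared"],
    "transmart-core-db-tests": ["transmart-core-api", "transmart-core"],
    "transmart-rest-api": [
        "transmart-core-api",
        "transmart-shared",
        "transmart-core",
        "transmart-core-db-tests",
    ],
}

for name in build_order(deps):
    print(name)
```

Any order in which every component follows its dependencies is valid; the table above lists one such order for the full set of components.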
Java version
Grails 2.5.4 uses Java 8.
...
The pivot utility in the Kettle ETLs has been recompiled with Java 8.
Asset Pipeline
Javascript, stylesheets, and other resources are packaged and provided through the asset-pipeline plugin.
...
This upgrade is a major step towards preparing for an upgrade to Grails 3 or the newly released Grails 4 in a future tranSMART release.
Code review
A major code review was conducted by Burt Beckwith at Harvard as part of the inclusion of tranSMART 16.1 and 16.2 code in the i2b2-transmart project.
The planned changes were described in the i2b2-transmart roadmap and are summarized below.
Summary of tranSMART 16-2 changes in i2b2/tranSMART 18.1-beta
optional support for Auth0 authentication
A new directory spring-security-auth0 provides Auth0 services.
Groovy code formatting and consistency
Coding standards have been applied to groovy code:
- leading white space
- code indentation
- positioning of braces:
- if/else blocks: newline after {
- } on new line
- splitting of long lines
- Map and List values on a single line if possible
- replace .each for a Map or List by a loop with a datatype and named variable specified for the value
- Use boolean for all true/false variables
- Use int for integer variables
- Use single quotes for string values except to avoid escaping many single quotes for readability
- Use {} to insert values into strings
updated logging to use Slf4j wrapper and parameterized logging for performance and to help with Grails 3 Logback migration
All source files that used Log4j and calls to log.info:
- import groovy.util.logging.Slf4j
- define Slf4j as 'logger'
- call logger.info (etc.) to write to log output
converted most Config lookups to use Spring's @Value annotation
Replace configuration parameter references with @Value definitions using org.springframework.beans.factory.annotation.Value
use "private @Autowired" for dependency injection to reduce API pollution
Add @Autowired references with org.springframework.beans.factory.annotation.Autowired
replaced many uses of 'def' with actual types, particularly in method signatures
Throughout the code of release 16.3 types were undefined. Adding explicit types wherever possible provides validation of the types actually passed and improves the usefulness of error messages when code breaks.
converted many cases of copy/paste to use methods
A single method can replace a set of identical code segments making testing and maintenance far more robust.
domain class cleanup, removed many unnecessary declarations
Domain classes have been reviewed and matched to the updated database schemas.
simplify access to current username, user id, roles, etc. via new SecurityService
The new SecurityService in transmart-shared provides calls to return information on users and roles for implementing access policies.
removed passing AuthUser to methods that always work on the current authenticated user, moving the lookup to where it's used
A new directory transmart-shared provides utility functions. These include generic checks for the capabilities of the authenticated user, allowing the username to be removed from many method calls where it was being passed down.
upgraded to Grails 2.5.4, Servlet API 3, Java 8
Release 19.0 is built using Grails 2.5.4, which depends only on Java 8.
tests cleanup, converted to Spock
Tests updated in transmart-core-db-tests
One test currently fails. It is testing something that is supposed to fail, but should be trapping the error condition and reporting a test success.
converted controller closures to methods
Closures are defined as methods with a set of parameters, replacing the closures and parameter fetching in earlier releases.
Much of the code remains unchanged within the method aside from parameter handling and other coding standard changes (see above).
controller params simplifications
Parameters are explicitly defined in each method.
This gives cleaner code where it is obvious what parameters are used and what parameters are available to control the result.
some removal of logic in GSPs, moving to controller/services (much more needed)
Groovy Servlet Pages (.gsp files) cleaned up to avoid interpreted code critical to functionality
Standard indenting of HTML within GSP pages.
updated sql queries to include schema+table instead of adding many db synonyms
Especially in Oracle code, tranSMART 16.3 defined synonyms to allow reference to tables without the schema.
...
They also make it possible to derive the permissions needed for functions/procedures to operate across schemas.
removed many unused methods, some unused classes, org.json source classes, duplicate classes (e.g. many in both transmartApp and transmart-java)
Many unused methods were retained because it is difficult to be certain they will never be invoked.
...
However, some removed methods in the i2b2-transmart code were unused on Oracle but were required when running on Postgres and have been reinstated. Examples include handling large objects as strings.
deleted deprecated AccessLog class and changed to use AccessLogEntry and AccessLogService
Simplifying the access logging code
added simple Spring Security role hierarchy
This requires further testing to make sure the required tranSMART functionality is supported.
deleted many lib directory jars and replaced with BuildConfig dependencies
These jar files are removed from the code repository. They are downloaded through code dependencies in BuildConfig.groovy or pom.xml.
Other jar files remain in the repository for now and may be removed later.
where possible annotated classes and methods with @CompileStatic for performance
Many classes and methods now have @CompileStatic. Testing found very few instances where the annotation had to be removed.
StringBuffer -> StringBuilder
StringBuilder is used to create a string and to append to it using the '<<' syntax.
split transmart-extensions into three projects, split out transmart-core-db-tests
See the organization of the new single transmart repository
moved filters classes from grails-app/conf to grails-app/filters
Filters are now in grails-app/filters/org/transmart...
some SQL injection fixes
All SQL statements in the code need careful testing to check they work for both Oracle and Postgres
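The usual remedy for SQL injection is to bind user-supplied values as parameters rather than concatenating them into the statement text. The sketch below illustrates the principle only: it uses Python's built-in sqlite3 module as a stand-in for Oracle or Postgres, and the table `keyword_demo` and function `find_keyword` are hypothetical names, not part of tranSMART.

```python
import sqlite3

# In-memory database used purely as a stand-in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keyword_demo (keyword TEXT)")
conn.executemany(
    "INSERT INTO keyword_demo (keyword) VALUES (?)",
    [("BRCA1",), ("TP53",)],
)


def find_keyword(connection, term):
    """Look up a keyword with the search term bound as a parameter."""
    # Unsafe alternative (do not use): "... WHERE keyword = '" + term + "'"
    cursor = connection.execute(
        "SELECT keyword FROM keyword_demo WHERE keyword = ?", (term,)
    )
    return [row[0] for row in cursor.fetchall()]


print(find_keyword(conn, "BRCA1"))
# A hostile value is treated as plain text and matches no rows.
print(find_keyword(conn, "x' OR '1'='1"))
```

Because the value never becomes part of the SQL text, the same statement behaves identically whatever the user types, which also makes it easier to test on both databases.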
some conversion of simple Java classes to CompileStatic Groovy
Testing is needed to ensure that code functions as in earlier releases.
Oracle support
Oracle is fully supported using the same release (12.1 or 12.2) as previous versions of tranSMART.
Testing relies on an Oracle Docker instance.
Postgres support
This release has been tested on Postgres up to 9.6, and on Postgres 10, Postgres 11 and Postgres 12. No version-specific issues have been identified.
To date no attempt has been made to take advantage of the new partitioning features in more recent Postgres versions. We continue to monitor these and will consider supporting them in a future release. It is likely that legacy support for the current Postgres schema will be continued in tranSMART.
SQLserver support (potential)
TranSMART does not support SQL Server.
...
Upgrading the source code to include support for a third database would require significant work, but would also test and clean many sections of code with obvious benefits to the quality and robustness of tranSMART.
Ubuntu 18 support
Changes have been made to support installation on Ubuntu 18.04, Ubuntu 16.04 and Ubuntu 14.04.
...
Scripts for Ubuntu 18.04 are updated with system libraries installed to cover dependencies for installation of R packages.
Ubuntu 20 support
TranSMART 19 is being tested on Ubuntu 20 (released in Spring 2020).
Fedora support
TranSMART 19 is being tested on Fedora 32 (released end-April 2020). Code has been built and tested on Fedora 31.
ETL with transmart-data load targets
Kettle 8 support
Releases up to 16.3 were tested only with Kettle 4.2.
...
Only minor updates to Kettle scripts were needed to satisfy an additional validation. These had prevented upgrading Kettle in earlier tranSMART releases.
New ETL targets
New targets are in preparation.
Study
Changes introduced into tranSMART 16 supported loading clinical and all high dimensional data in one step through a series of scripts and a new parameter file for the TraIT Cell-Line Use Case project.
...
Should there be a failure at any point, going to the appropriate directory and running the load script there will resume just that part of the load after the issue is resolved.
Browse Tab Program Metadata
A set of utility scripts has been made into a load target to create a new Program under the Browse tab.
The disease and therapeutic area fields are validated on loading.
Browse Tab Study Metadata
A set of utility scripts to load study metadata for the Browse tab can now be invoked as load_browse targets.
...
The program must be loaded before the study.
Browse Tab Assay Metadata
A potential load target can add Assay metadata into the Browse tab using scripts in preparation.
The platform information should be validated against the database ontologies before loading.
Sample Explorer Tab
A set of utility scripts is in preparation to load sample data into the Sample Explorer tab.
...
In GEO, samples have limited information, but this at least includes the sample ID and organism.
Coding standards
Cleanup of SQL source code
- Standard indentation of lines and within SQL statements
- Consistent use of upper and lower case
- Commas in lists of columns/values positioned to ensure the correct row is indicated in SQL error messages
- Changed 'select ... into rtncd' to 'perform' where the rtncd value was ignored.
- Datatypes cleaned up e.g. int v. bigint
- Inputs and output matched to closest equivalents in Kettle
- Added 'explain' blocks for postgres to improve performance of slowest steps
ETL Data Loading
General ETL performance
Loading raw high-dimensional data in earlier versions could take a very long time. A SQL statement testing whether log intensity could be calculated for each raw intensity was creating very large loads on the server.
Preprocessing the raw intensities to identify usable values allowed this step to be simplified. Log intensity values are now calculated on a simple pass through the data using very low resources.
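The single-pass approach can be illustrated as follows. This is a sketch of the idea, not the stored procedure itself: the function name `log_intensities` is hypothetical, and it assumes that a "usable" raw intensity is one that is present and strictly positive, and that the log intensity is taken to base 2.

```python
import math


def log_intensities(raw_values):
    """Compute log2 intensities in one pass over the raw values.

    Values that are missing or not strictly positive have no logarithm,
    so they are reported as None instead of being tested by a separate,
    expensive query.
    """
    result = []
    for value in raw_values:
        if value is not None and value > 0:
            result.append(math.log2(value))
        else:
            result.append(None)
    return result


print(log_intensities([8.0, 0.0, None, -2.5, 1.0]))
```

Each value is examined exactly once, so the cost grows linearly with the number of intensities.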
RNAseq ETL performance
A missing condition in a SQL statement caused RNAseq gene expression to load extremely slowly, and to consume very large memory and tmp space resources. No other datatypes were affected.
High-Dimensional Data Columns
Previous releases loaded high-dimensional data (Microarray mRNA expression, RNAseq counts, etc.) with columns labelled as TISSUE_TYPE, SAMPLE_TYPE and TIMEPOINT.
...
Consistent usage will be applied to tissue types, timepoints, and the sample treatments across these 200+ studies.
Clinical Data Ontologies
The libraries of curated studies at library.transmartfoundation.org
...
These studies will be reviewed to conform to a common set of terms to make it easier to work with multiple studies in tranSMART. Terms in common use (e.g. 'Medical History') should appear in the same place for each study.
Kettle ETL debugging
Previous releases have been hard to debug when loading data using Kettle. A number of issues are addressed in tranSMART 19:
...
The value is passed to Kettle as the -level parameter
ETL Stored Procedure debugging
When ETL procedures run for a very long time (see the notes above for high-dimensional data, but this is also an issue for some very large clinical data loads), it was difficult in earlier tranSMART releases to identify the step causing problems.
...
psql -c "insert into tm_cz.etl_settings (paramname,paramvalue) values ('cleantables','no')"
cz_job_audit message position
In earlier tranSMART versions many messages reported “loading” with a row count that belonged to the previous step.
All such messages should report the end of the step with its row count and description.
Function/stored procedure error checks
Several stored procedures reported errors, and returned an error status, but this was ignored by the calling procedure.
Release 19.0 checks the return status for all calls that have a return value.
Leading zeroes in Kettle job output
The job_id and log_base values were reported with a large number of leading zeroes.
The datatypes used as outputs by stored procedures and the Kettle scripts have been changed and specific formats introduced to report only the integer value.
Cleanup of Kettle scripts
Kettle jobs have been pretty-printed to make the XML easier to read.
The return values from stored procedures have been standardized as zero for success and any other value for a failure. The tests in Kettle have been modified where the meaning of zero and one has been changed.
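The convention of zero for success, with the caller checking every return status, can be sketched as below. The names `run_steps`, `load_platform` and `load_samples` are hypothetical; the real checks are made in the Kettle jobs and in the calling stored procedures.

```python
def run_steps(steps):
    """Run (name, callable) steps in order and stop at the first failure.

    Each callable returns 0 for success and any other value for failure.
    Returns (failed_step_name, status), or (None, 0) if every step succeeds.
    """
    for name, step in steps:
        status = step()
        if status != 0:
            # Do not continue with later steps once one has failed.
            return name, status
    return None, 0


def load_platform():
    return 0  # success


def load_samples():
    return 16  # any non-zero value signals a failure


completed = []


def calculate_zscores():
    completed.append("calculate_zscores")
    return 0


failed, status = run_steps([
    ("load_platform", load_platform),
    ("load_samples", load_samples),
    ("calculate_zscores", calculate_zscores),
])
print(failed, status)
```

Here the third step never runs, because the failure reported by the second step is acted on rather than ignored.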
ETL stored procedure failures
Stored procedures called during ETL by Kettle and other ETL systems have been reviewed and updated.
...
An example was an error in the calculation of log intensities and zscore values for some high-dimensional datatypes failing silently.
Validation of platform
All platform annotation files for all datatypes should include only one platform.
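A check of this kind can be sketched as follows. This is not the validation code used by the loaders: the helper `check_single_platform` is hypothetical, and it assumes a tab-delimited annotation file with a header row in which a column named `GPL_ID` identifies the platform.

```python
import csv
import io


def check_single_platform(annotation_text, column="GPL_ID"):
    """Return the platform ID if the annotation names exactly one platform.

    Raises ValueError when no platform or more than one platform is found.
    """
    reader = csv.DictReader(io.StringIO(annotation_text), delimiter="\t")
    found = {row[column] for row in reader}
    if len(found) != 1:
        raise ValueError("expected one platform, found: %s" % sorted(found))
    return found.pop()


good = "GPL_ID\tPROBE_ID\nGPL570\t1007_s_at\nGPL570\t1053_at\n"
mixed = "GPL_ID\tPROBE_ID\nGPL570\t1007_s_at\nGPL96\t1053_at\n"

print(check_single_platform(good))
try:
    check_single_platform(mixed)
except ValueError as error:
    print("rejected:", error)
```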
RNAseq platforms
RNAseq expression data now uses a named platform. Loading the platform annotation populates gene names and gene IDs as for Expression platforms.
...
Tests for missing gene ID and gene name information are more efficient in this release. In earlier versions loading RNAseq platform data as an anonymized incremental update could take considerable time.
RNAseq multiple procedures
tranSMART has two sets of RNAseq ETL procedures. One is for gene-based expression counts, the other is for expression mapped to chromosome position. They were implemented around the same time for the tranSMART 16.1 release.
The names are defined internally in several places in the source code. They have been made clearer in tranSMART 19. The internal name RNASEQ_COG (developed by Cognizant for Sanofi) is used by RNAseq expression counts.
Renaming/moving a study
The postgresql script tm_cz.i2b2_move_study missed many of the changes needed to rename or move a study. The updated script requires two inputs: the original path of the top node for the study and the new path. Any new nodes are automatically created. The function takes an additional jobId parameter which is NULL when run from the command line.
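The path handling involved in a move can be illustrated with a short sketch. It is not the tm_cz.i2b2_move_study implementation: the helpers `move_path` and `ancestor_nodes` and the study paths are hypothetical, and the backslash-separated path format is an assumption about how concept paths are stored.

```python
SEP = "\\"  # assumed separator for concept paths


def move_path(concept_path, old_top, new_top):
    """Rewrite one concept path from under old_top to under new_top."""
    if not concept_path.startswith(old_top):
        raise ValueError("path is not under the study being moved")
    return new_top + concept_path[len(old_top):]


def ancestor_nodes(top_path):
    """List the parent nodes above top_path, outermost first.

    These are the nodes that would need to exist (or be created)
    before the study can sit at its new location.
    """
    parts = [part for part in top_path.split(SEP) if part]
    return [SEP + SEP.join(parts[:i]) + SEP for i in range(1, len(parts))]


old_top = "\\Public Studies\\GSE1234\\"
new_top = "\\Private Studies\\Oncology\\GSE1234\\"

print(move_path(old_top + "Subjects\\Age\\", old_top, new_top))
for node in ancestor_nodes(new_top):
    print(node)
```

Every path below the original top node keeps its relative position, which is why only the two top-node paths are needed as inputs.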
...