Grails 4.0 and Java 11
The upgrade to Grails 2.5.4 takes tranSMART up to Java 8, but Java 8 itself is no longer commercially supported by Oracle. To upgrade to Java 11 we need to migrate to the latest Grails release, 4.0 (released in July 2019).
Grails 4.0 is the first release to support Java 11. It requires the use of the asset pipeline (already ported in tranSMART 19) and also requires the Groovy and Java source code to be reorganised. This is the same work that was required to port an earlier tranSMART branch (17.1) to Grails 3: code is moved to new locations, the BuildConfig files are rewritten, and a Gradle build script is needed. The result is a far simpler and faster build environment for developers.
Postgres 11, 12 and beyond
TranSMART has supported Postgres releases from 9.2 up to 10 and 11 but has not exploited any of their new features.
Postgres now supports partitioned tables (support started in Postgres 10 and was extended in Postgres 11). So far tranSMART has used a workaround for the large volumes of high-dimensional data, especially mRNA expression and RNAseq.
It is comparatively straightforward to update the Postgres schema to use the officially supported partitioning. The stored procedures basically adopt the same methods as for the existing Oracle procedures. We could maintain the older Postgres schema under another name for any users with legacy installations.
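As a rough illustration of what the officially supported partitioning looks like, the sketch below generates the DDL for a list-partitioned table using the declarative syntax available from Postgres 10. The table name, column names and partition key are assumptions made for the example, not the actual tranSMART schema.

```python
def partition_ddl(parent, key, partition_ids):
    """Build DDL for a list-partitioned table (declarative syntax, Postgres 10+).

    The column list is illustrative only; a real schema would carry its own columns.
    """
    statements = [
        # The parent table stores no rows itself; Postgres routes each row
        # to the partition whose value list contains the row's key.
        f"CREATE TABLE {parent} (\n"
        f"    {key} integer NOT NULL,\n"
        f"    probeset_id bigint,\n"
        f"    patient_id bigint,\n"
        f"    zscore double precision\n"
        f") PARTITION BY LIST ({key});"
    ]
    for pid in partition_ids:
        # One child table per partition id, e.g. one per loaded study or platform.
        statements.append(
            f"CREATE TABLE {parent}_{pid} PARTITION OF {parent} "
            f"FOR VALUES IN ({pid});"
        )
    return statements


# Hypothetical table and key names, used only to show the generated statements.
for statement in partition_ddl("hd_expression_data", "partition_id", [1, 2]):
    print(statement)
```

A loader that previously had to pick the right child table can then insert into the parent table and leave the routing to Postgres.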
I2b2 supports Oracle, Postgres and SQL Server. The database can be on a remote system using any of these. TranSMART initially supported only Oracle. When the open source tranSMART project started in 2012, Postgres was adopted as the database system, and Oracle code was neglected until release 1.1.
There are several benefits to adding SQL Server support. Existing i2b2 installations will be able to use tranSMART 20 together with their existing database to support and analyse clinical and high-dimensional data from in-progress and completed clinical trials. The SQL Server schema can be constructed using the existing transmart-data tools, and the existing comparison scripts can be used to keep all the supported schemas in sync (Oracle, Postgres 11, Postgres 9+, SQL Server). Finally, this will provide an excellent test base for the remaining SQL code to ensure that we find any statements that are specific to only one platform.
Of course there could be some interest from the commercial database providers – Microsoft and Oracle – in supporting their products.
Other database providers
Additional database providers have been suggested:
Clearly SQL Server should be the first, to be compatible with i2b2 and to work through the DBMS-specific areas of code.
MySQL is now owned by Oracle, which is unfortunate given their approach with Java. There is also an open source derivative of MySQL 5.5, MariaDB, that aims to track Oracle’s future changes and is a better candidate for support, though presumably it would be only a small overhead to support both.
Various “big data” databases are worth looking into.
MongoDB is used by code from Sanofi that is integrated in the Browse tab for indexing and retrieval from large files.
In the pharmaceutical industry the practice has been for companies to develop their own extensions to tranSMART. At Sanofi the Browse tab study metadata and a series of new data types were developed, but they continued to maintain their own tranSMART code including support for additional large files in MongoDB, an ETL package (ICE) and alternative authentication mechanisms. At Pfizer extensions to handle GWAS results were added. In both cases the native code only supported Oracle and needed to be ported to Postgres. Some ETL issues remain to be resolved. The Hyve developed new analysis methods and support for aCGH data for the TraIT projects in the Netherlands, but Oracle support for these extensions has gaps (e.g. data loading).
New data types
At the Paris 2017 i2b2/tranSMART European users meeting a group gave a talk describing their integration of flow cytometry data into tranSMART. This could be imported into tranSMART 20.
Other users have imported flow cytometry as clinical data. We need to review the alternatives, describe what can be done using clinical data, and implement a new data type with additional benefits.
Users in Oxford, England have proteomics data for protein clusters, requiring adjustment to the way identifiers are used so they can use the cluster or protein ids in queries and visualise them in results.
Supporting the i2b2 schema is only part of the way to full integration. I2b2 loads all clinical data into a single ontology, while tranSMART loads data by study. More work is needed to establish how to allow both platforms to work with a common set of patient data (for clinical trial cohort selection in i2b2) and study data (for clinical trial analysis in tranSMART).
The schema for tranSMART 19 has been matched to i2b2. A few minor changes can be added post-release: reducing the size of some columns to the i2b2 value unless there is known tranSMART data that needs longer text, and extending ids to 8-byte integers where very large data volumes may be loaded, especially if they may be reloaded.
Certain tables are key to the i2b2 model. These should be made clean so either they are identical or i2b2 has agreed to ignore any additional transmart columns.
Other tables may be used by both platforms. These must each be documented and regularly maintained at release time.
We need to review how queries are processed and stored to check whether there could be a clash if both platforms are active on a common database. We need to check ownership and access rights within the data if a user logs in to tranSMART or to i2b2. They should be consistent at least with the access a guest user would have, and should not grant any additional rights that are otherwise limited in one platform.
Note: because tranSMART started out as i2b2, the i2b2 tables tend to be included in tranSMART. We need to review the code to track any references to these tables and figure out what functionality is involved, and where records may be created, modified or deleted by tranSMART. TranSMART 19 includes all the i2b2 tables, but as only i2b2metadata and i2b2demodata were used by tranSMART 16 we already know we can ignore the other 4 i2b2 schemas. We also know i2b2 will ignore the tranSMART schemas, so only i2b2metadata and i2b2demodata need attention.
In i2b2 other schemas can be created equivalent to i2b2demodata. We need a mechanism for these to be usable by transmart, following something like the route i2b2 uses to find them.
Another key distinction is i2b2’s support for SQL Server. TranSMART started out as Oracle only. The open source projects added Postgres but nobody has addressed any SQL Server issues. Creating the database is relatively simple, as it is very close to the Postgres code. Issues will arise in the remaining SQL code and in translating and testing ETL stored procedures/functions, but as these are repetitive the task is lighter than it appears at first.
TranSMART 19 added code for the new interactive analysis plugin "Fractalis." This project is a successor to SmartR in tranSMART 16.2, which in turn supersedes the Advanced Workflows in tranSMART 16.1 and earlier.
Implementation of Fractalis depends on the development of an Application Programming Interface between Fractalis and tranSMART to export data for analysis. Available prototypes cover only the modified schema developed in the tranSMART Pro project. A fresh development is needed for tranSMART.
Coverage of the available analyses is incomplete. There are Advanced Workflows that are not in SmartR (though part-tested code is available to provide them) and similarly workflows that are not provided by Fractalis.
Testing for tranSMART 16.3 used Geb and Selenium with a Firefox browser driver. We have suggestions for a UI testing framework that may be helpful; it was used with Glowing Bear.
We should also update the current tests to Geb 3.3 (released Jan 2020) and check them against the updated tranSMART 19 codebase and schema.
We need to first set out a detailed list of what needs to be tested, and prerequisites (mainly studies and metadata required).
We need to implement tests in all areas, covering the major features so they can be routinely examined.
We need a test database that can be used for tests that do not need to update the database significantly (so that several test sites can be running against the same database).
We need to test datasets with more than one species, for example cells from tissues with a viral infection.
Possible improvements include adding a species prompt before selecting genes so only genes for the species of interest are displayed. This can be tested with human, mouse and rat genes for selected studies.
There is a capability to internationalize messages and text by defining a language-specific file with the English versions.
Matching files can then be generated for other languages, with the most important features prioritized.
We can make a start by trying to define the English text and adding an alternate language with the text reversed so we can understand what is happening.
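The reversed-text idea can be sketched as a small script that reads key=value message lines and writes each value backwards. This is only a sketch: the key=value layout, the message keys and the {0}-style placeholders are assumptions about how the English file would be organised, and placeholders are kept intact so that parameter substitution still works.

```python
import re

# Numbered placeholders such as {0} must survive reversal unchanged.
PLACEHOLDER = re.compile(r"(\{\d+\})")


def reverse_message(text):
    """Reverse a message so untranslated strings stand out in the UI."""
    parts = PLACEHOLDER.split(text)  # the capture group keeps the placeholders
    return "".join(
        part if PLACEHOLDER.fullmatch(part) else part[::-1]
        for part in reversed(parts)
    )


def reverse_properties(lines):
    """Reverse the value of every key=value line; keep comments and blanks."""
    result = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in line:
            result.append(line)
            continue
        key, value = line.split("=", 1)
        result.append(f"{key}={reverse_message(value)}")
    return result


# Hypothetical message keys, for illustration only.
english = ["# labels", "label.sex=Sex", "search.found=Found {0} genes"]
for line in reverse_properties(english):
    print(line)
```

Any text that still reads normally when the reversed language is selected has not yet been moved into the message file.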
Local customization within the UI
We have several requests to change the terminology in the UI which we avoid because it would differ from the standard i2b2 view. However, it should be reasonable to allow sites to customize these, subject to a warning that the documentation will differ from their instance.
- Display “Navigate Terms” as “Navigate Concepts”
- Display “Sex” as “Gender”
We can add configuration variables for common terms - including others on the same pages for completeness - and check the alternate version is used by defining e.g. “AltSex” and making sure this replaces all occurrences.
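A minimal sketch of this check follows, assuming the terms are held as a simple key-to-label mapping that a site can partly override. The keys and function names are hypothetical. The leftover check matches whole words so that "Sex" inside "AltSex" is not reported as a missed replacement.

```python
import re

# Standard labels; the keys are hypothetical configuration variable names.
DEFAULT_TERMS = {
    "navigateTerms": "Navigate Terms",
    "sex": "Sex",
}


def resolve_terms(site_overrides):
    """Merge site overrides over the defaults, rejecting unknown keys."""
    unknown = set(site_overrides) - set(DEFAULT_TERMS)
    if unknown:
        raise KeyError(f"Unknown UI term keys: {sorted(unknown)}")
    return {**DEFAULT_TERMS, **site_overrides}


def leftover_defaults(rendered_text, site_overrides):
    """List default labels that still appear in a page despite being overridden."""
    leftovers = []
    for key in site_overrides:
        default = DEFAULT_TERMS[key]
        # Whole-word match, so "Sex" within "AltSex" does not count.
        if re.search(rf"\b{re.escape(default)}\b", rendered_text):
            leftovers.append(default)
    return leftovers


overrides = {"sex": "AltSex"}
terms = resolve_terms(overrides)
print(terms["sex"])
print(leftover_defaults("AltSex: F; Sex: M", overrides))
```

Running the leftover check over each rendered page would point to the places where the label is still hard-coded.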
Reducing autocomplete lists
Autocomplete lists can become unwieldy when too many similar results are displayed, for example the gene, pathway and other lists used for high-dimensional data selection. Genes are a particular problem because human and mouse symbols are distinguished only by upper/lower case, and other species may be added.
We should be able to display the values with some prefix to make it clear what is being selected, then use the underlying value within the application.
Alternatively we could add some way to limit the results to applicable terms e.g. by species.
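Both options can be sketched together, assuming each autocomplete entry carries a species alongside its symbol. The data structure, function name and sample entries are illustrative only. Each suggestion pairs a prefixed display label with the underlying value that the application would actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneEntry:
    symbol: str
    species: str


# Small illustrative sample; a real list would come from the database.
GENES = [
    GeneEntry("BRCA1", "human"),
    GeneEntry("Brca1", "mouse"),
    GeneEntry("Brca1", "rat"),
    GeneEntry("BRCA2", "human"),
]


def suggest(prefix, species=None, limit=10):
    """Return (display label, underlying value) pairs for an autocomplete box."""
    needle = prefix.lower()
    matches = [
        gene for gene in GENES
        if gene.symbol.lower().startswith(needle)
        and (species is None or gene.species == species)
    ]
    # The label carries the species prefix; the value stays unchanged.
    return [(f"{gene.species}: {gene.symbol}", gene.symbol) for gene in matches[:limit]]


print(suggest("brca1"))
print(suggest("brca", species="human"))
```

With the prefix, the two "Brca1" entries are no longer indistinguishable, and passing a species shortens the list to the applicable terms.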