R workflows are organized in the following way:

       R files organization

  • A workflow consists of one or more R scripts stored in a single folder.
  • The folder is named after the workflow and resides in web-app/HeimScripts/.
  • A workflow needs at least one script, run.R, whose purpose is to produce the JSON for the D3 visualization.
  • Optionally it can contain summary.R, preprocess.R, downloadData.R and any number of other scripts. The only reserved name is fetchData.R.

    R script structure

  • Each R script needs to contain a main function. This is the only function that is invoked when the ScriptExecutionTask is run.
  • The main function may take any arguments, as long as they are primitive values or lists (strings, numeric, character, lists); custom classes are not supported.
  • The main function's arguments are passed from the frontend when the task is started via a call to RunCommand (grails-app/controllers/smartR.plugin/rest/ScriptExecutionController/RunCommand). See the next paragraph for details.
  • The last statement (or return statement) of the main function is returned to the frontend as JSON.
  • All files written to disk are available to the frontend after the script exits successfully. Their names are returned in the JSON of the status call (see the paragraph on Communication with the front-end).
  • The jsonlite library is loaded automatically into the session during initialization. It should still be loaded explicitly in run.R and in every other script that uses it, so that the scripts remain testable (see the Unit Tests Setup paragraph).
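
The rules above can be illustrated with a minimal sketch of a run.R. The argument names (max_rows, sorting), the stand-in data and the output file name are hypothetical and not taken from any existing workflow. The sketch returns a plain list and leaves open whether the script or the backend performs the final JSON conversion.

```r
# run.R (sketch): argument names and data are hypothetical
main <- function(max_rows = 100, sorting = "value") {
  # Stand-in for data that would normally come from loaded_variables
  values <- data.frame(id = c("a", "b", "c"), value = c(3, 1, 2),
                       stringsAsFactors = FALSE)

  # Sort ascending by value and keep at most max_rows rows
  values <- head(values[order(values$value), ], max_rows)

  # Files written to disk are reported by the status call
  write.table(values, "values.tsv", sep = "\t",
              row.names = FALSE, quote = FALSE)

  # The last expression is what the frontend receives as JSON
  list(sorting = sorting, rows = nrow(values), ids = values$id)
}

result <- main(max_rows = 2)
```

Only primitive values are used as arguments, so the same call can be expressed as the arguments dictionary sent by the frontend.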

      Communication with the front-end

  • Before any script is executed, an R session needs to be initialized for the workflow. This is done by a call to RSessionController/init with the workflow name (exactly as its folder name) as the sole argument.
  • RSessionController/init returns a sessionid. This value is a mandatory argument for run, status and download calls, because all of these calls operate on data within a single R session, which is always dedicated to a single workflow.
  • R script execution is started by a call to RunCommand (grails-app/controllers/smartR.plugin/rest/ScriptExecutionController/RunCommand).
  • RunCommand requires three arguments: sessionid, taskType and an arguments dictionary. All of them need to be passed in a POST call.
  • taskType is either 'fetchData' for the special data fetching task or a script name (without the .R extension).
  • The arguments field inside the RunCommand POST body is used directly for calling main() of the selected R script. Only the fetchData task is handled differently. Example JSON POSTed to RunCommand:
    sessionId: sessionId,
     arguments: arguments,
     taskType : taskType, // for instance 'run', 'preprocess' or 'fetchData'
     workflow : 'heatmap'
  • For fetchData, the arguments field inside RunCommand requires:
    conceptKeys : conceptKeys,
    resultInstanceIds: resultInstanceIds,
    projection: PROJECTION
    • conceptKeys is an array containing the concept keys to be loaded.
    • resultInstanceIds is an array containing two values corresponding to the subset selection. They can be retrieved via extJShelper.js functions.
    • PROJECTION is required only when the concepts include high-dimensional data (HDD). It is the projection in which the data should be retrieved from the tranSMART database, for instance log2.
  • RunCommand initiates an asynchronous job, so it only returns the script executionid. This value is a mandatory argument for status and download calls.
  • Data returned from an R script can be retrieved via a status call to StatusCommand (grails-app/controllers/smartR.plugin/rest/ScriptExecutionController/StatusCommand).
  • Data from the tranSMART database is passed to the R session via a special type of task, the dataFetch task. To initiate it, the same REST call is made as for starting an R script: grails-app/controllers/smartR.plugin/rest/ScriptExecutionController/RunCommand. Instead of an R script name, pass 'fetchData' as the taskType.

      Passing Data Between R Scripts

  • The only time data should be passed between two R scripts is when the preprocess.R script is run.
  • This is done by setting the global variables 'preprocessed' and 'preprocessed_params'.
  • The 'preprocessed' global variable contains the preprocessed data derived from 'loaded_variables'.
  • 'preprocessed_params' contains only the parameters the user selected for preprocessing. Its sole purpose is to export them when downloadData.R is run.
  • The default behavior of all scripts (especially run.R and summary.R) is to use the 'preprocessed' global variable if it exists and to ignore loaded_variables.
  • loaded_variables should never be modified by any script; it should be considered read-only.
  • Files generated by one script are not available to other scripts within the same workflow. This is intentional.
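
A minimal sketch of this hand-over, under stated assumptions: the function names preprocess_main and current_data, the log_transform parameter and the stand-in data are hypothetical. In a real workflow the first function would be the main() of preprocess.R, and loaded_variables would be provided by the backend.

```r
# Stand-in for what the backend provides after a fetchData task
loaded_variables <- list(
  box1_n0_s1 = data.frame(Row.Label = c("p1", "p2"), value = c(4, 16),
                          stringsAsFactors = FALSE)
)

# Body of main() in a hypothetical preprocess.R
preprocess_main <- function(log_transform = TRUE) {
  result <- loaded_variables  # work on a copy; the original stays untouched
  if (log_transform) {
    result <- lapply(result, function(df) {
      df$value <- log2(df$value)
      df
    })
  }
  # Global assignment makes the data visible to scripts that run later
  preprocessed <<- result
  preprocessed_params <<- list(log_transform = log_transform)
  list(finished = TRUE)
}

# Helper that run.R or summary.R could use to pick their input
current_data <- function() {
  if (exists("preprocessed")) preprocessed else loaded_variables
}

before <- current_data()   # falls back to loaded_variables
preprocess_main()
after <- current_data()    # now returns the preprocessed data
```

Because R copies on modification, changing result inside preprocess_main leaves loaded_variables unchanged, which keeps it effectively read-only.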

      Order of Ajax Calls in a Typical Workflow

  • The R session is initialized for the given workflow when the page is loaded. The sessionId is stored for later calls. HTTP POST to
    '/RSession/create'
  • After the user has dragged concepts into the SmartR view and selected the subsets in the Comparison tab, the dataFetch task can be started. HTTP POST to
    '/ScriptExecution/run'
  • This call returns only an executionId, as it starts an asynchronous job. The executionId has to be stored for use in subsequent HTTP GET calls to
    '/ScriptExecution/status'
  • Status returns JSON with the result once fetchData has finished. It should be polled at intervals, for instance with window.setTimeout. The response also contains any exceptions thrown by the backend during task execution.
  • After fetchData has finished successfully, an R script can be executed. Another HTTP POST to
    '/ScriptExecution/run'
  • As with fetchData, any R script execution is asynchronous, so only an executionId is returned. It has to be used for polling via HTTP GET to
    '/ScriptExecution/status'
  • When the task is finished, the /status call returns JSON with the results (the names of files that can be downloaded as well as plain JSON results). Files can be retrieved by HTTP GET to
    '/ScriptExecution/downloadFile?sessionId=sessionId&executionId=executionId&filename=filename'
  • Any number of scripts can be executed after the data has been loaded successfully. summary.R, for instance, can be run after every data manipulation (run.R and preprocess.R). It is safer, however, to prevent any new task from starting before the previous one has finished. Also be mindful of the side effects of data fetching: it wipes loaded_variables and preprocessed before loading new data.

      Clinical and High-Dimensional Data loading

  • All data loaded into an R session exists only for the given workflow.
  • When the application is run locally, this data can also be inspected inside /tmp/heim/<sessionid>/<executionid>
  • All loaded data is accessible to the R scripts via the global variable 'loaded_variables'. It is always a list of R data.frames.
  • Each data.frame in loaded_variables is named. Names strictly follow the convention SOURCE_NODE_SUBSET, where:
    • SOURCE is given by the frontend during the fetchData task call. It describes the HTML input from which the concept comes. It has to be followed by a number, e.g. box1, group1.
    • NODE is given by the frontend during the fetchData task call. It is always the letter 'n' followed by a number, starting with 0: n0 for the first node dropped into the given input, n1 for the next one.
    • SUBSET is given by the backend during data loading. It is always the letter 's' followed by the number 1 or 2. A subset is the selection made in the Comparison tab of transmartApp.
  • The loaded_variables global variable is deleted before every fetchData task.
  • Another global variable available in the R session is 'preprocessed'.
    • It is meant for storing preprocessed data.
    • It is created by the preprocess.R script.
    • The default behaviour of all scripts (run.R and summary.R especially) is to check whether the preprocessed variable is set and to use it instead of loaded_variables.
    • Like loaded_variables, it is deleted before every dataFetch task.
  • The following data types can be loaded:
    • HDD: mRNA, RNASeq, metabolomics and proteomics.
    • Clinical: numeric and categorical
  • HDD always has the following column headers: Row.Label, Bio.marker, SubjectInTrialID1 ... SubjectInTrialIDN
    • Row.Label is the probeID for all currently supported HDD types
    • Bio.marker is the biomarker as retrieved from the tranSMART database
    • Each subsequent column is of type numeric and is named after the SubjectInTrialID to which it belongs
  • Clinical data always has the following column headers: Row.Label, FullConceptKey
    • Row.Label is the AssayID (unverified; this still needs to be checked)
    • The FullConceptKey column contains either character or numeric values: character for categorical and numeric for numeric data types.
  • When two subsets (as selected in the Comparison tab of transmartApp) are chosen, data for each subset is fetched separately and loaded into a separate data.frame. They can be distinguished by the suffixes s1 and s2 of their labels. See above for the naming convention that applies to the data.frames in loaded_variables.
  • All data is loaded using the base R function
    read.csv('data', sep = "\t", header = TRUE, stringsAsFactors = FALSE)
  • This has the intended consequence that strings are loaded as characters rather than factors (factors were the default in R before version 4.0.0). No factors should ever be expected in loaded_variables.
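
The loading behaviour and the naming convention can be sketched as follows. The inline data, the column names S1 and S2 and the helper parse_label are made up for illustration; only the read.csv arguments and the SOURCE_NODE_SUBSET pattern come from the description above.

```r
# Tab-separated HDD-like content (made-up values) read as described above
raw <- "Row.Label\tBio.marker\tS1\tS2\nprobe1\tTP53\t1.5\t2.5\n"
hdd <- read.csv(text = raw, sep = "\t", header = TRUE,
                stringsAsFactors = FALSE)

# Stand-in for the list the backend would build, one entry per subset
loaded_variables <- list(box1_n0_s1 = hdd, box1_n0_s2 = hdd)

# Split a "SOURCE_NODE_SUBSET" label into its three parts
parse_label <- function(label) {
  parts <- strsplit(label, "_", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 3)
  list(source = parts[1], node = parts[2], subset = parts[3])
}

# Keep only the data.frames that belong to subset 1
subset1 <- loaded_variables[grepl("_s1$", names(loaded_variables))]

label <- parse_label("box1_n0_s2")
```

Text columns such as Bio.marker arrive as character vectors, so scripts can compare them directly without converting from factors.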

      Common R functions

  • R scripts cannot be sourced using a simple source(relativePath) statement, because they are copied to either the local or the Docker /tmp folder before execution. An absolute path needs to be provided.
  • The absolute path for sourcing can be built like this:
    utils <- paste(remoteScriptDir, "/workflowname/utils.R", sep="")
    source(utils)
  • remoteScriptDir is a global variable set by the Grails backend before any R script is executed.
  • To make R scripts testable outside the Grails context, the following lines always need to be added to any script that sources other files:
    if (!exists("remoteScriptDir")) {  # Needed for unit-tests
     remoteScriptDir <- "web-app/HeimScripts/heatmap"
    }
  • The default place for common functions shared by all workflows is:
    web-app/HeimScripts/core
  • web-app/HeimScripts/core contains at least the index.R file. This script sources all libraries and all core functions. At the time of writing it sources input.R.
  • input.R contains a few functions for validating and parsing loaded data. parseInputs, for instance, should always be the preferred way to handle loaded_variables.
  • The default place for common functions shared by the scripts of a single workflow is utils.R.
  • utils.R is NOT sourced automatically. It has to be sourced explicitly, as described at the beginning of this paragraph.
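
The sourcing pattern can be tried outside Grails with the following sketch. It simulates the script directory in a temporary folder. The workflow name myworkflow and the function double_it are hypothetical, and the fallback path differs from the one a real script would use.

```r
# Fallback for runs outside the Grails context (illustration only:
# a temporary folder stands in for the real script directory)
if (!exists("remoteScriptDir")) {
  remoteScriptDir <- file.path(tempdir(), "HeimScripts")
}

# Create a tiny utils.R for the hypothetical workflow "myworkflow"
workflow_dir <- file.path(remoteScriptDir, "myworkflow")
dir.create(workflow_dir, recursive = TRUE, showWarnings = FALSE)
writeLines("double_it <- function(x) 2 * x",
           file.path(workflow_dir, "utils.R"))

# Build the path from remoteScriptDir and source it explicitly
utils <- paste(remoteScriptDir, "/myworkflow/utils.R", sep = "")
source(utils)

doubled <- double_it(21)
```

The guard has to come before the first use of remoteScriptDir, otherwise the script fails in a test run where the backend has not set the variable.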

     Unit Tests Setup

  • R scripts should be accompanied by unit tests.
  • The current test setup for R is based on RUnit. To run the existing tests, install the required packages first (only needed once):
    install.packages("RUnit")
    install.packages("jsonlite")
  • Tests are run outside the Grails context, which means that none of the global variables such as loaded_variables, preprocessed or remoteScriptDir are available.
  • The test directory is
    test/runit/tests
  • Tests are run automatically by Travis. See travis.yml for implementation details.
  • Tests can also be run locally using the following two R commands (in RStudio, for instance):
    setwd("~/Code/heim/tmrepo/SmartR/")  # this is just an example path to SmartR, taken from my setup
    source("test/runit/run_tests.R")
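
A test file might look like the sketch below, assuming the RUnit package from CRAN. The stand-in main(), the test function name and the data are hypothetical. Which file and function names are picked up depends on the patterns configured in run_tests.R.

```r
# Globals normally provided by the backend must be created by the test itself
loaded_variables <- list(
  box1_n0_s1 = data.frame(Row.Label = c("a", "b"), value = c(1, 2),
                          stringsAsFactors = FALSE)
)

# Stand-in for the main() of the script under test
main <- function() {
  list(rows = nrow(loaded_variables$box1_n0_s1))
}

# Test function in the RUnit style: checkEquals(target, current)
test.main_counts_rows <- function() {
  RUnit::checkEquals(2, main()$rows)
}
```

Defining the globals inside the test file keeps the test independent of the Grails backend, which matches the constraint described above.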

     Organization of JavaScript files (THIS WILL BE UPDATED)

Boxplot was implemented following a new structure for the JavaScript files, roughly following the MVC strategy.

In MVC, the view is responsible for 1) drawing the user interface, 2) responding to changes in the model and redrawing the user interface accordingly, and 3) responding to user actions. It does the first two directly (or through the view part of a "component", more about that later) by manipulating the DOM; the last it delegates to the controller. Note that, in the current implementation, the user interface is not specified exclusively in the view JavaScript file (and the view part of the "components"); there is also a GSP file that contains the skeleton for the analysis page. Both the controller and the model are injected into the view. The view could also change the model directly, but changes to the model are usually mediated through the controller, as they tend to happen in response to some user action.

The controller is called from the view in response to user actions. It may interact with external services and, in response to that, it updates the model (this may trigger the view to redraw the interface, since the model will then have changed). The controller gets the model injected so it can change it. The controller could also read the model, but we have preferred to have the view pass the controller methods the information they need.

In the components directory, there are several files representing "components". These are not necessarily UI components. They are merely a way to cohesively organize a certain piece of functionality. They can be organized internally in any way, but they must expose an object with one or more of these keys: forController, forModel, forView. The value of each key is itself an object, which is meant to be injected into the analysis controller, model, or view, respectively. The forModel object should not be injected into the view, for instance. If interaction with the forModel object is needed (it probably is, otherwise it wouldn't make sense for the component to expose this object), then it should be mediated through the analysis model, into which the component's forModel object is injected. In practice, the components themselves will frequently be organized in (pre-bound) MVC fashion too (see summaryStats.js).

Every component that has a UI side will likely need to be injected into the view, so that knowledge of the analysis DOM (as specified in the GSP and in the view) is not hardcoded into the component. The main strategy used is to expose an init method in the forView object. In our implementations, these methods take ids, and jQuery is then used inside the components to find the actual elements. It probably would have been a better idea to pass the elements themselves to the components.

     Dependencies

  • All R dependencies are covered by the transmart-data make scripts.
  • External dependencies: zip, libcairo and libpng