
How to record audit metrics from Transmart in an external system

This page describes how to configure Transmart to record audit metrics in an external system.

Overview

Transmart can log a set of metrics that are useful for auditing. Like any other log output, these can be written to log files, or they can be sent to an external program that performs whatever site-specific processing is required. The logs can be formatted as free text, like any other logging output, or as JSON data structures, which are better suited to further programmatic processing.

Setting up audit metrics recording therefore requires configuring the Transmart logging system and, if the external program method is used, writing that program.

Transmart configuration

The metrics system is built on the standard logging system. To configure it, add the appropriate log appenders to the Config.groovy file:

Config.groovy
log4j = {
    /**
     * Configuration for writing audit metrics.
     * This needs to be placed in the out-of-tree Config.groovy, as the log4j config there will override this.
     * (and don't forget to 'import org.apache.log4j.DailyRollingFileAppender',
     * 'import org.transmart.logging.ChildProcessAppender' and 'import org.transmart.logging.JsonLayout'.)
     */
    appenders {
        // default log directory is either the tomcat root directory or the
        // current working directory.
        def catalinaBase = System.getProperty('catalina.base') ?: '.'
        def logDirectory = "${catalinaBase}/logs".toString()

        // Use layout: new JsonLayout(conversionPattern: '%m%n', singleLine: true) to get each message as
        // single-line JSON, the same way ChildProcessAppender sends it.
        appender new DailyRollingFileAppender(
            name: 'fileAuditLogger',
            datePattern: "'.'yyyy-MM-dd",
            fileName: "${logDirectory}/audit.log",
            layout: new JsonLayout(conversionPattern: '%d %m%n')
        )
        // the default layout is a JsonLayout(conversionPattern: '%m%n', singleLine: true)
        appender new ChildProcessAppender(
            name: 'processAuditLogger',
            command: ['/usr/bin/your/command/here', 'arg1', 'arg2']
        )
    }
    trace fileAuditLogger: 'org.transmart.audit'
    trace processAuditLogger: 'org.transmart.audit'
}

This configuration must go in your out-of-tree Config.groovy, unless your in-tree Config.groovy has no log4j block: if both files contain a log4j block, the out-of-tree one overrides the in-tree one. If you already have a log4j block in the out-of-tree file, add this configuration to it.

This block creates two appenders, a DailyRollingFileAppender and a ChildProcessAppender, each of which writes the metric entries to its own destination. The DailyRollingFileAppender is a standard appender that writes to a file which is rotated daily. The ChildProcessAppender starts an external program and passes the metrics to it for handling. Both appenders need a layout, which determines how each metric event is formatted. For processing the metrics programmatically in an external program, JsonLayout is the most useful. It is the default for the ChildProcessAppender, but must be specified explicitly for the DailyRollingFileAppender. JsonLayout has some configurable settings of its own; for programmatic processing the defaults are recommended, except that singleLine should be set to true.

Full documentation of DailyRollingFileAppender is available in the log4j 1.2 API documentation. ChildProcessAppender and JsonLayout are documented below.

ChildProcessAppender starts the external program given, as a list, by the command parameter. The program receives the metrics on its standard input and can process them in any way you need; for site-specific handling, writing a custom script is recommended.

To enable sending the metrics, configure the appender(s) to log at trace level for the org.transmart.audit namespace, as the two trace lines at the end of the Config.groovy example do.

This example configures both a file appender and a process appender, so the metrics are sent both to a file and to an external process. If you only need one destination, configure only the corresponding appender (and its trace line).
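As a starting point for a ChildProcessAppender program, the Python sketch below shows the basic contract: read one JSON message per line from standard input, handle it, and exit cleanly once stdin is closed (the appender closes the pipe rather than killing the child, as described in the ChildProcessAppender documentation). It assumes the default single-line JsonLayout; handle_record is a hypothetical placeholder for your own processing.

```python
import json
import sys


def handle_record(record):
    # Hypothetical placeholder for site-specific processing: here it only
    # prints a one-line summary of each audit event to stdout.
    print("{user}: {event}".format(user=record.get("user"), event=record.get("event")))


def main(stream=sys.stdin):
    # The loop ends when the appender closes the pipe (end of file),
    # so the process then exits instead of lingering.
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except ValueError as exc:
            # Malformed input is reported on stderr and skipped.
            print("skipping invalid JSON: {}".format(exc), file=sys.stderr)
            continue
        handle_record(record)
```

To use it as the appender's command, save it as a script that ends with `if __name__ == "__main__": main()` and point command at it, for example ['/usr/bin/python3', '/path/to/child.py'] (both paths are hypothetical).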

JsonLayout

JsonLayout extends EnhancedPatternLayout, and provides the following properties:

  • conversionPattern (String): Inherited from EnhancedPatternLayout. The serialized JSON will be inserted for the %m conversion character.
  • singleLine (boolean): Default: false. If true, each JSON message is serialized into a single line, without any newline characters. If false, the message is pretty-printed. To get each JSON message on its own single line, you also need to use the (default) conversion pattern "%m%n".
  • dateFormat (String): The date format to use in the JSON, as a java.text.SimpleDateFormat pattern. Default: "yyyy-MM-dd HH:mm:ss.SSSX".
  • printNulls (boolean): Include fields whose value is null. Default: true.
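Why singleLine matters to a consumer can be illustrated with Python's json module standing in for JsonLayout (this is not Transmart code, and the record is made up). A pretty-printed record spans several lines, so a reader that treats each line as one message, like the audit_logger.py script on this page, cannot parse it, while a single-line record round-trips cleanly.

```python
import json

# Made-up sample record using a few of the always-present field names.
record = {"program": "Transmart", "user": "alice", "event": "User Access"}

single_line = json.dumps(record)        # analogous to singleLine: true
pretty = json.dumps(record, indent=2)   # analogous to singleLine: false


def parse_lines(text):
    """Parse newline-delimited JSON, one message per line; return (parsed, failures)."""
    parsed, failures = [], 0
    for line in text.splitlines():
        try:
            parsed.append(json.loads(line))
        except ValueError:
            failures += 1
    return parsed, failures


print(parse_lines(single_line + "\n"))  # one record, 0 failures
print(parse_lines(pretty + "\n"))       # ([], 5): each of the 5 lines fails on its own
```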

ChildProcessAppender documentation

(Copied from the Javadoc)

This appender spawns a process and sends log messages, encoded as JSON, to the process, which can read them on its standard input. If the process dies, or writing to its input pipe fails for some other reason, the process is restarted. Note that ChildProcessAppender does not forcibly kill its child process in such cases; it only closes the stdin pipe. The process is expected to exit when its stdin is closed, but a misbehaving process may keep running.

Process management uses a restart counter and a time window. If the child process needs to be restarted more than a set number of times within the time window, the process is considered broken, and this appender stops trying to restart it and goes into a 'broken' state. If more elaborate process management is needed, configure this appender to start the child under a process manager program.

This appender has the standard properties inherited from AppenderSkeleton, i.e. name, filter and threshold. The properties layout and errorHandler are not used. In addition, it has the following properties:

  • command (List<String>): The command to run to start the external process.
  • restartLimit (int): Default: 15. The number of times the child process will be restarted within restartWindow before this appender decides that the child process configuration is broken. Set to 0 to disable restart limiting. This can cause an infinite restart loop if the child process exits immediately, so it is not recommended for production deployments.
  • restartWindow (int): Default: 1800. The length of the restart window in seconds. If the child process fails more than restartLimit times within this window, this is interpreted as a configuration error for the child: the child is not restarted and this appender goes into the 'broken' state.
  • throwOnFailure (boolean): Default: false. Throw an exception if this appender goes into the 'broken' state, or if it is already broken and is asked to handle new log messages. Enable this if you want to be sure that Transmart fails fast when the child process cannot be restarted.
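The restartLimit/restartWindow policy can be modelled as a sliding window of restart timestamps. The Python class below only illustrates the documented behaviour and is not the Java implementation; its names are hypothetical, and details such as whether the window boundary is inclusive are assumptions.

```python
from collections import deque


class RestartLimiter:
    """Hypothetical model of the restartLimit/restartWindow policy (not the Java code)."""

    def __init__(self, restart_limit=15, restart_window=1800):
        self.restart_limit = restart_limit    # 0 disables limiting
        self.restart_window = restart_window  # seconds
        self.restarts = deque()               # timestamps of recent restarts
        self.broken = False

    def may_restart(self, now):
        """Return True if the child may be restarted at time `now` (in seconds)."""
        if self.broken:
            return False
        if self.restart_limit == 0:
            return True
        # Drop restarts that have fallen out of the sliding window.
        while self.restarts and now - self.restarts[0] >= self.restart_window:
            self.restarts.popleft()
        if len(self.restarts) >= self.restart_limit:
            # Too many restarts within the window: treat the configuration as broken.
            self.broken = True
            return False
        self.restarts.append(now)
        return True


limiter = RestartLimiter(restart_limit=3, restart_window=10)
print([limiter.may_restart(t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
```

Once broken, the limiter stays broken, matching the 'broken' state described above. A real implementation would take `now` from a monotonic clock such as Python's time.monotonic().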

Metrics Processing Scripts

To process the metrics you have two options: read the files written by the DailyRollingFileAppender, or write a script for use with the ChildProcessAppender. The file approach suits batch processing; for real-time processing the process appender is most useful. The /demo directory of TransmartApp contains an example script that transforms the metrics and sends them to a web application; it is reproduced here:

audit_logger.py
#!/usr/bin/env python3
#
# Tested with python 3.4
#
# This script will read json-encoded auditlog messages from Transmart and send them as HTTP POST messages to
# a URL. The input must be provided as json objects separated by newlines. The json objects themselves must not
# contain unencoded newlines.
#
# This script will make up to THREADS requests in parallel to the metrics server. If a request fails this is noted
# on stderr, but the request is not retried so in that case the message is lost.

import sys, os
import logging
import concurrent.futures
import json
from threading import Lock
import urllib.request
import urllib.parse
from urllib.request import Request
# httpagentparser is not in the standard library, use "pip3 install httpagentparser" to install it
import httpagentparser

URL = 'http://metricsrecorder.example.com/recordMetric'
# Maximum timeout when waiting on the metrics server
TIMEOUT = 10 #seconds
# If there are more than this number of unprocessed messages, further messages will be dropped until the worker
# threads catch up.
MAX_QUEUE_LENGTH = 1000
# Max number of parallel requests to the metrics server
THREADS = 20


countlock = Lock()
queuelength = 0
failcount = 0

logging.basicConfig(level=logging.INFO, format='{asctime} {levelname}: {message}', style='{',
                    datefmt="%Y-%m-%d %H:%M:%S%z")
log = logging.getLogger()

def send_auditlog_record(line):
    global failcount, queuelength

    with countlock:
        queuelength -= 1

    # Invalid json input is not an error we can handle
    msg = json.loads(line)
    task = msg['event']
    user = msg['user']
    if task == "User Access":
        action = user
    else:
        action = msg.get('action') or '|'.join(msg.get(x) for x in ('study', 'subset1', 'subset2', 'analysis', 'query', 'facetQuery', 'clientId') if msg.get(x))
    args = dict(action = action,
                application = msg['program'],
                appVersion = msg['programVersion'],
                user = user,
                task = task,
            )
    if msg.get('userAgent'):
        args['browser'] = '<unknown browser>'
        browser = httpagentparser.detect(msg['userAgent']).get('browser')
        if browser:
            args['browser'] = (browser.get('name', '') + ' ' + browser.get('version', '')).strip()
    fullurl = URL + '?' + urllib.parse.urlencode(args)
    try:
        # Use read() rather than readall(): HTTPResponse has no readall() on Python 3.5 and later.
        with urllib.request.urlopen(Request(fullurl, method='POST'), timeout=TIMEOUT) as response:
            response.read()
    except Exception as exc:
        with countlock:
            failcount += 1
        log.error("{e}, url: {url}".format(e=' '.join(str(e) for e in exc.args), url=fullurl))

def process(line):
    try:
        send_auditlog_record(line)
    except BaseException as e:
        # An exception here is a programming error or something serious. As this is not the main thread we can't just
        # let the exception bubble up, so kill ourselves forcefully.
        # exc_info=True (rather than the exception object) also works on Python 3.4.
        log.critical("%s, aborting!", e, exc_info=True)
        os.abort()


with concurrent.futures.ThreadPoolExecutor(max_workers=THREADS) as executor:
    for line in sys.stdin:
        if queuelength > MAX_QUEUE_LENGTH:
            log.error("MAX_QUEUE_LENGTH exceeded, ignoring message {line}".format(line=line.rstrip()))
            continue
        with countlock:
            queuelength += 1
        executor.submit(process, line)

sys.stdin.close()

You will need to adapt this script to your own use cases, or write your own.
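For the batch approach, a script reads the rotated audit.log files instead of stdin. The Python sketch below assumes the file appender was switched to the single-line layout suggested in the configuration comment (JsonLayout with conversionPattern '%m%n' and singleLine true), so that every line holds one JSON record; with the '%d %m%n' pattern from the example, records are date-prefixed and pretty-printed and would need different parsing. Counting events per 'event' value, and the logs/ path, are only illustrations.

```python
import collections
import glob
import json


def count_events(paths):
    """Count audit events per 'event' value across single-line JSON log files."""
    counts = collections.Counter()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    counts[json.loads(line)["event"]] += 1
    return counts


if __name__ == "__main__":
    # audit.log plus its daily rotations (audit.log.yyyy-MM-dd, per the
    # datePattern in the example configuration); the directory is hypothetical.
    for event, n in count_events(sorted(glob.glob("logs/audit.log*"))).most_common():
        print(n, event)
```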

Events

The events that are generated depend on the Transmart version you are using. In the current (April 2016) development release, the following events exist:

Event | Value of 'event' field | Event-specific extra fields
User login | User Access |
Study access | Clinical Data Access | study
Execution of Summary Statistics | Summary Statistics | study, subset1, subset2
Execution of an Advanced Workflow | |
Clinical data export (high dimensional or low dimensional) | Clinical Data Exported - ${exportTypes} | study
Active filter search | Clinical Data Active Filter | query
Export of high dimensional data to Genedata | |
GWAS: study accessed | Gwas Study Access | experiment, analysis, export
GWAS: analysis accessed | Gwas Analysis Access |
GWAS: table view used | Gwas Table View |
GWAS: Gwava used | Gwava |
GWAS: CSV export | Gwas CSV Export |
GWAS: files export | Gwas Files Export |
GWAS: email analysis | Gwas Email Analysis |
GWAS: active filter search | GWAS Active Filter | query, facetQuery
New Gene List signature | New Gene Signature | action, filename, size
New Gene/RsId list | New Gene_RSID List | action, filename, size
OAuth authentication | OAuth authentication |

In the value 'Clinical Data Exported - ${exportTypes}', ${exportTypes} is replaced by the actual export types.

The events include the following properties (if applicable). All values are encoded as strings.

Fields that are always present:

  • program: always "Transmart"
  • programVersion: the current Transmart version
  • user: the user's login name
  • event: description of the event that happened
  • userAgent: the user agent string sent by the client
  • timestamp: the date in ISO format

Event-specific fields:

  • study
  • subset1
  • subset2
  • query
  • experiment
  • analysis
  • export
  • action
  • filename
  • size
  • clientId: the clientId that the authenticating client sent
  • facetQuery
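A consumer can separate the always-present fields from the event-specific ones using the two lists above. The Python function below is a hypothetical sketch: it treats a missing always-present field as an error and returns only those event-specific fields that have a value, since which of them appear depends on the event and JsonLayout includes null fields by default (printNulls).

```python
import json

# Field names as listed above.
ALWAYS_PRESENT = ("program", "programVersion", "user", "event", "userAgent", "timestamp")
EVENT_SPECIFIC = ("study", "subset1", "subset2", "query", "experiment", "analysis",
                  "export", "action", "filename", "size", "clientId", "facetQuery")


def split_fields(line):
    """Split one single-line JSON audit record into (common, specific) dicts."""
    record = json.loads(line)
    missing = [name for name in ALWAYS_PRESENT if name not in record]
    if missing:
        raise ValueError("audit record is missing: " + ", ".join(missing))
    common = {name: record[name] for name in ALWAYS_PRESENT}
    # Skip absent and null-valued fields.
    specific = {name: record[name] for name in EVENT_SPECIFIC
                if record.get(name) is not None}
    return common, specific
```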