How to record audit metrics from Transmart in an external system

This page documents how to configure Transmart to record metrics for auditing to an external system.


Transmart supports logging a set of metrics useful for auditing. These logs can be directed to log files like any other logs, or to an external program that can then do any site-specific processing required. Logs can be formatted as free text just like any other logging output, or as JSON data structures, which is more useful for further programmatic processing.

Setting up audit metrics recording therefore requires the necessary configuration of the Transmart logging system, and if the external program method is used, creation of that program.

Transmart configuration

The metrics system is based on the built-in logging system. To configure it, you define the appropriate log appenders in the Config.groovy file:

log4j = {
    /*
     * Configuration for writing audit metrics.
     * This needs to be placed in the out-of-tree Config.groovy, as the log4j config there will override this.
     * (And don't forget to 'import org.apache.log4j.DailyRollingFileAppender',
     * 'import org.transmart.logging.ChildProcessAppender' and 'import org.transmart.logging.JsonLayout'.)
     */
    appenders {
        // The default log directory is either the tomcat root directory or the
        // current working directory.
        def catalinaBase = System.getProperty('catalina.base') ?: '.'
        def logDirectory = "${catalinaBase}/logs".toString()

        // Use layout: new JsonLayout(conversionPattern: '%m%n', singleLine: true) to get each message as a
        // single line of json, the same way ChildProcessAppender sends it.
        appender new DailyRollingFileAppender(
            name: 'fileAuditLogger',
            datePattern: "'.'yyyy-MM-dd",
            fileName: "${logDirectory}/audit.log",
            layout: new JsonLayout(conversionPattern: '%d %m%n')
        )
        // The default layout is new JsonLayout(conversionPattern: '%m%n', singleLine: true).
        appender new ChildProcessAppender(
            name: 'processAuditLogger',
            command: ['/usr/bin/your/command/here', 'arg1', 'arg2']
        )
    }
    trace fileAuditLogger: 'org.transmart.audit'
    trace processAuditLogger: 'org.transmart.audit'
}

Note that this configuration needs to go in your out-of-tree Config.groovy: if both the in-tree and the out-of-tree file contain a log4j block, the out-of-tree block overrides the in-tree one, so an in-tree configuration would be ignored. If you already have a log4j block in your out-of-tree Config.groovy, merge this configuration into it.

This block creates two appenders, a DailyRollingFileAppender and a ChildProcessAppender, each of which writes the metric entries to a destination. The first is a normal appender that writes to a file which is rotated daily. The ChildProcessAppender starts an external script and passes the metrics to it. Both appenders need a Layout, which determines how each metric event is formatted. To process the metrics programmatically in an external program, the JsonLayout is most useful. For the ChildProcessAppender, JsonLayout is the default; for the DailyRollingFileAppender you need to specify it explicitly. The JsonLayout itself has some configurable settings, but for programmatic processing the defaults are recommended, except for setting singleLine to true.

Full documentation of the DailyRollingFileAppender can be found in the log4j API documentation. See below for the documentation of ChildProcessAppender and JsonLayout.

ChildProcessAppender will start the external program given by the command parameter, specified as a list. The external program receives the metrics on its standard input and can handle them however you need. For site-specific handling it is recommended to write a custom script that processes the metrics as required.

To enable sending the metrics, set the configured appender(s) to log at trace level for the org.transmart.audit namespace as in the last two lines.

This example configures both a file appender and a process appender, sending the metrics both to a file and to an external process. If you don't need both destinations, configure only the appender you need.


JsonLayout documentation

JsonLayout extends EnhancedPatternLayout, and provides the following properties:

  • conversionPattern (String): Inherited from EnhancedPatternLayout. The serialized JSON will be inserted for the %m conversion character.
  • singleLine (boolean): Default: false. If true, each JSON message is serialized into a single line, without any newline characters. If false, the message will be pretty-printed. To get each JSON message formatted on a single line you will also need to use the (default) conversion pattern "%m%n".
  • dateFormat (String): A string describing the date format used in the JSON, according to java.text.SimpleDateFormat. Default: "yyyy-MM-dd HH:mm:ss.SSSX".
  • printNulls (boolean): Include fields whose value is null. Default: true.
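
The reason singleLine matters for programmatic processing is that a consumer can then treat the stream as newline-delimited JSON and parse each line independently. A minimal sketch (the records and field values are made up for illustration):

```python
import json

# Two hypothetical audit records, serialized the way JsonLayout with
# singleLine=true and conversionPattern '%m%n' would emit them: one JSON
# object per line, with no embedded newlines.
records = (
    '{"program": "Transmart", "event": "Summary Statistics", "user": "alice"}\n'
    '{"program": "Transmart", "event": "Clinical Data Access", "user": "bob"}\n'
)

# A line-oriented consumer can simply parse each line on its own.
events = [json.loads(line)['event'] for line in records.splitlines()]
print(events)  # ['Summary Statistics', 'Clinical Data Access']
```

With pretty-printed (multi-line) JSON this per-line parsing would fail, which is why the defaults plus singleLine=true are recommended.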

ChildProcessAppender documentation

(Copied from the Javadoc.)

This appender spawns a process and sends log messages, encoded as JSON, to the process, which can read them on its standard input. If the process dies, or writing to its input pipe fails for some other reason, the process is restarted. Note that ChildProcessAppender does not forcibly kill its child process in such cases; only the stdin pipe is closed. It is expected that the process will exit if its stdin is closed, but a misbehaving process may live on.

Process management is done using a restart counter and a time window. If the child process needs to be restarted more than a set number of times within the time window, the process is considered to be broken and this appender will stop trying to restart it and go into a 'broken' state. If more elaborate process management is needed, you should configure this appender to start the child under a process manager program.

This appender has the standard properties inherited from AppenderSkeleton, i.e. name, filter and threshold. The layout and errorHandler properties are not used. Furthermore there are the following properties:

  • command (List<String>): The command to run to start the external process.
  • restartLimit (int): Default: 15. The number of times the child process will be restarted within the restartWindow before this appender decides that the child process configuration is broken. Set to 0 to disable the restart limiting feature. Doing so can result in an infinite restart loop if the child process exits immediately, so it is not recommended for production deployments.
  • restartWindow (int): Default: 1800. The length of the restart window, in seconds. If the child process fails more than restartLimit times within this window, this is interpreted as a configuration error for the child. The child is not restarted and this appender goes into a 'broken' state.
  • throwOnFailure (boolean): Default: false. Throw an exception if this appender goes into the 'broken' state, or if it is broken and is asked to handle new log messages. Enable this if you want to be sure Transmart fails fast if the child process cannot be restarted.

Metrics Processing Scripts

To process the metrics you have two options: read the files generated by the DailyRollingFileAppender, or create a script to use with the ChildProcessAppender. The file approach suits batch processing; the process appender is most useful for realtime processing. In the /demo directory in TransmartApp you can find an example script that processes the metrics by applying some transformations and sending them to a web application, reproduced here:
#!/usr/bin/env python3
# Tested with python 3.4
# This script will read json-encoded auditlog messages from Transmart and send them as HTTP POST messages to
# an URL. The input must be provided as json objects separated by newlines. The json objects themselves must not
# contain unencoded newlines.
# This script will make up to THREADS requests in parallel to the metrics server. If a request fails this is noted
# on stderr, but the request is not retried so in that case the message is lost.

import sys, os
import logging
import concurrent.futures
import json
from threading import Lock
import urllib.request
import urllib.parse
from urllib.request import Request
# httpagentparser is not in the standard library, use "pip3 install httpagentparser" to install it
import httpagentparser

URL = ''
# Maximum timeout when waiting on the metrics server
TIMEOUT = 10 #seconds
# If there are more than this number of unprocessed messages, further messages will be dropped until the worker
# threads catch up.
MAX_QUEUE_LENGTH = 100
# Max number of parallel requests to the metrics server
THREADS = 4

countlock = Lock()
queuelength = 0
failcount = 0

logging.basicConfig(level=logging.INFO, format='{asctime} {levelname}: {message}', style='{',
                    datefmt="%Y-%m-%d %H:%M:%S%z")
log = logging.getLogger()

def send_auditlog_record(line):
    global failcount, queuelength

    with countlock:
        queuelength -= 1

    # Invalid json input is not an error we can handle
    msg = json.loads(line)
    task = msg['event']
    user = msg['user']
    if task == "User Access":
        action = user
    else:
        action = msg.get('action') or '|'.join(msg.get(x) for x in ('study', 'subset1', 'subset2', 'analysis',
                                               'query', 'facetQuery', 'clientId') if msg.get(x))
    args = dict(action = action,
                application = msg['program'],
                appVersion = msg['programVersion'],
                user = user,
                task = task,
                )
    if msg['userAgent']:
        args['browser'] = '<unknown browser>'
        browser = httpagentparser.detect(msg['userAgent']).get('browser')
        if browser:
            args['browser'] = browser['name'] + ' ' + browser['version']
    fullurl = URL + '?' + urllib.parse.urlencode(args)
    try:
        #raise Exception('testing')
        urllib.request.urlopen(Request(fullurl, method='POST'), timeout=TIMEOUT).read()
    except Exception as exc:
        with countlock:
            failcount += 1
        log.error("{e}, url: {url}".format(e=' '.join(str(e) for e in exc.args), url=fullurl))

def process(line):
    try:
        send_auditlog_record(line)
    except BaseException as e:
        # An exception here is a programming error or something serious. As this is not the main thread we can't just
        # let the exception bubble up, so kill ourselves forcefully.
        log.fatal(str(e) + ", aborting!", exc_info=e)
        os._exit(1)

with concurrent.futures.ThreadPoolExecutor(max_workers=THREADS) as executor:
    for line in sys.stdin:
        if queuelength > MAX_QUEUE_LENGTH:
            log.error("MAX_QUEUE_LENGTH exceeded, ignoring message {line}".format(line=line))
            continue
        with countlock:
            queuelength += 1
        executor.submit(process, line)


You will need to adapt this script to your own use cases, or write your own.
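
For the batch option mentioned above, a script can instead read the daily-rotated files written by the DailyRollingFileAppender. A minimal sketch, assuming singleLine JSON output and made-up file contents (in practice you would pass an open audit.log file):

```python
import json
from collections import Counter

def count_events(lines):
    """Tally audit records per 'event' field from single-line JSON records."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[json.loads(line)['event']] += 1
    return counts

# Two hypothetical records; with a real rotated log you would use e.g.
# count_events(open(logfile)) where logfile is a path such as
# "<logDirectory>/audit.log.2016-04-01".
sample = [
    '{"event": "Summary Statistics", "user": "alice"}',
    '{"event": "Summary Statistics", "user": "bob"}',
]
print(count_events(sample))  # Counter({'Summary Statistics': 2})
```

Note that this only works if the file appender was configured with singleLine: true; with pretty-printed output you would need a streaming JSON parser instead of line-by-line parsing.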


The events that are generated will depend on the version of Transmart you are using. In the current (April 2016) development release the following events exist:

| Event | Value of 'event' field | Event-specific extra fields |
|---|---|---|
| User login | User Access | |
| Study access | Clinical Data Access | |
| Execution of Summary Statistics | Summary Statistics | study, subset1, subset2 |
| Execution of an Advanced Workflow | | |
| Clinical data export (high dimensional or low dimensional) | Clinical Data Exported - ${exportTypes} (${exportTypes} is replaced by the actual export types) | |
| Active filter search | Clinical Data Active Filter | |
| Export of high dimensional data to Genedata | | |
| GWAS: study accessed | Gwas Study Access | experiment, analysis, export |
| GWAS: analysis accessed | Gwas Analysis Access | |
| GWAS: table view used | Gwas Table View | |
| GWAS: Gwava used | | |
| GWAS: CSV export | Gwas CSV Export | |
| GWAS: files export | Gwas Files Export | |
| GWAS: email analysis | Gwas Email Analysis | |
| GWAS: active filter search | GWAS Active Filter | query, facetQuery |
| New Gene List signature | New Gene Signature | action, filename, size |
| New Gene/RsId list | New Gene_RSID List | action, filename, size |
| OAuth authentication | OAuth authentication | |

The events include the following properties (if applicable). All values are encoded as strings.

Fields that are always present:

  • program (always "Transmart")
  • programVersion: The current Transmart version
  • user: the user's login name
  • event: description of the event that happened
  • userAgent: The user agent string from the client
  • timestamp: the date in ISO format

Event-specific fields:

  • study
  • subset1
  • subset2
  • query
  • experiment
  • analysis
  • export
  • action
  • filename
  • size
  • clientId: the clientId that the authenticating client sent
  • facetQuery
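
A consumer that works with these fields might look like the following sketch. The sample record is made up for illustration; real records carry whichever event-specific fields apply to the event in question:

```python
import json

# A made-up record with the always-present fields plus the extras a
# 'Summary Statistics' event carries, per the tables above. Field values
# (study name, subsets, version) are purely illustrative.
raw = json.dumps({
    "program": "Transmart",
    "programVersion": "16.1",
    "user": "alice",
    "event": "Summary Statistics",
    "userAgent": "Mozilla/5.0",
    "timestamp": "2016-04-01 12:00:00.000Z",
    "study": "STUDY1",
    "subset1": "age > 40",
    "subset2": None,
})

def describe(record):
    """Build a one-line summary from the always-present and optional fields."""
    msg = json.loads(record)
    # With printNulls=true (the default), absent values may arrive as nulls,
    # so filter on None rather than on key presence.
    extras = {k: msg[k] for k in ('study', 'subset1', 'subset2', 'query')
              if msg.get(k) is not None}
    return '{user}: {event} {extras}'.format(user=msg['user'],
                                             event=msg['event'], extras=extras)

print(describe(raw))
```

Since all values are encoded as strings, any numeric handling (for example of the size field) needs an explicit conversion on the consumer side.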