ARS Metadata in Excel
The official CDISC ARS metadata is represented in JSON format, allowing language-agnostic and efficient data transfer. However, there is an unofficial representation of ARS metadata in Excel, which has the benefit of being more easily human readable (and directly editable if needed). An example of such an Excel ARS workbook for the CDISC Pilot study can be found here. This example workbook has been modified (especially by adding AnalysisMethodCodeTemplate code from the cards package for performing operations - more on this later) and is the example ARS Excel file we’ll use for examples in this article.
Reading in Excel ARS metadata - what to expect
The ARS metadata represented in Excel contains all ARS information for the Reporting Event. It is divided into Excel Worksheets, roughly mapped/equivalent to the official JSON ARS representation and its classes (an example can be found here). For the purpose of automating Analysis Results using siera, a subset of the multiple Excel Worksheets contain the key information. These include:
- MainListOfContents (links outputs to their respective analyses)
- OtherListsOfContents (contains Output metadata)
- DataSubsets (filters data for individual analyses)
- AnalysisSets (filters data to get the Population Set for the Output)
- AnalysisGroupings (groups data according to analysis specifications)
- Analyses (contains linking information to all components required for calculation of results)
- AnalysisMethods (describes operations to be performed on data to get results)
- AnalysisMethodCodeTemplate (contains dynamically written R code to perform AnalysisMethod)
- AnalysisMethodCodeParameters (populates AnalysisMethodCodeTemplate with Analysis information)
A note on example code used in AnalysisMethodCodeTemplate: using “cards” package
A powerful aspect of the ARS model, is that dynamic code can be supplied as part of the metadata and be evaluated on either 1. the Output level, 2. the Analysis level, or 3. the AnalysisMethod level (for more information on this, see this PHUSE paper). The latter case is implemented by making use of the AnalysisMethodCodeTemplate and AnalysisMethodCodeParameters classes in the metadata. The user supplies dynamic R code (with the option of making use of parameters referring to metadata pieces), which will be executed to perform the relevant AnalysisMethod operations.
For the example ARS metadata in Excel shipped with this package, the R code used in the AnalysisMethodCodeTemplate is aligned with the cards R package (more information here). The cards package is a reputable R package consisting of wrappers for various statistical functions, and producing results in a consistent ARD format. Below is a simple example of how a function ard_continuous from the cards package is used in the AnalysisMethodCodeTemplate class in the accompanying Excel ARS metadata example shipped with this package:
# example of 'cards' code in AnalysisMethodCodeTemplate, using the ard_continuous function:
Analysis_ARD = ard_continuous(
data = filtered_ADSL,
by = c(byvariables_here),
variables = analysisvariable_here
)
In the above example of AnalysisMethodCodeTemplate code for a particular AnalysisMethod (summary statistics for a continuous variable), the ard_continuous function from the cards package is used. Notice, however, that two parameters are used in the code:
- ‘byvariables_here’, and
- ‘analysisvariable_here’.
These are example placeholder names, to be replaced by actual variables, depending on the context for the Analysis. For example, ‘byvariables_here’ could be replaced by TRT01A, and ‘analysisvariable_here’ could be replaced by AGE, for an ARD with summary statistics on AGE by Treatment. The same code, however, will be reused for summary statistics on e.g. HEIGHT for another Analysis, and the ‘analysisvariable_here’ will be replaced with HEIGHT. All this is dynamically applied within the readARS_xl function when generating R scripts using siera.
In order to facilitate the use of placeholders and their replacement with actual values from the metadata, several “constructs” have been defined in the readARS_xl function, and are available to the user to call from the AnalysisMethodCodeTemplate and AnalysisMethodCodeParameters classes in the metadata. These constructs are either simple variables or datasets defined in the metadata, or complex, dynamic constructs from various metadata pieces, to be inserted in specific cards functions, or pre-processing steps. A list of defined constructs and examples is shipped with this package, and can be accessed with:
ARS_example("cards_constructs.xlsx")
reading in Excel ARS metadata
The readARS_xl function reads in a completed ARS metadata Excel file, and generates R scripts for each Output defined in the file. Below is an example of how this can be done using a ready-to-use ARS metadata Excel file:
# Path to the Excel ARS metadata file:
ARS_path <- ARS_example("Common_Safety_Displays_cards.xlsx")
# Path to a folder which will contain the Output meta-programmed R scripts (recommended to update
# to a more suitable local path)
output_folder <- tempdir()
# Path to the folder containing ADaM datasets in csv format (we will use temporary
# directory tempdir() to make the code run in this vignette, but it's recommended to
# 1. download the ADaMs required (csv ADSL and ADAE available using e.g. ARS_example("ADSL.csv"))
# 2. store it in a folder (the directly downloaded location can be found using dirname(ARS_example("ADSL.csv")), for example. Use this location, or manually store the ADaM somewhere else)
# 3. use the folder path instead of tempdir() below.
ADaM_folder <- tempdir()
# run the readARS function with these 3 parameters. This creates R scripts
# (1 for each Output in output_folder)
readARS_xl(ARS_path, output_folder, ADaM_folder)
Try it out!
If you’ve followed along in the previous section example, you should now have 5 R scripts (named ARD_Out14-1-1.R, ARD_Out14-3-1-1.R, ARD_Out14-3-2-1.R, ARD_Out14-3-3-1a.R, and ARD_Out14-3-3-1b.R) in the folder specified as output_folder. If this is successful, you can execute any of these 5 R scripts as-is (assuming the ADaM required for this script is available in the ADaM_folder), and the result will be an ARD (one result per row format) for each of the scripts. An example of such an auto-generated ARD script is included in the package, and can be used by using the example_ARD_script function. Running this script will look like this:
# Step 1: open ARD_xxx.R file
# Step 2: Confirm the location of ADaM dataset(s) is correct in the code section "Load ADaM".
# For the sake of simplicity, the only update made to the ARD_Out14-1-1.R
# script was to point to the ADaM folder to ARS_example("ADSL.csv")
# Step 3: Run the code and enjoy automated analysis results generation.
# Note: the ARD is contained in the object "df4" (which contains the appended)
# mini-ARDs from all individual Analyses ARDs.
example_ARD_script = ARD_script_example("ARD_Out14-1-1.R")
source(example_ARD_script)
head(df4)
If this was run successfully, you should see an ARD, with one result (in the stat column) per row. Other columns provide context for the result, e.g. grouping variables, as well as columns linking the result to the ARS metadata origin (specifically, AnalysisId and OperationId columns).
Note: each Output requires one or more ADaM datasets to calculate the results from. The code to load these csv ADaMs are automatically added to the ARD_xxx.R script. As mentioned, some ADaMs are available using the ARS_example(“XXXX.csv”) function. However, Output 14-3-3-1a and Output 14-3-3-1b make use of the ADVS dataset, which is not currently shipped with the siera package due to size considerations. The relevant ADVS ADaM dataset can be obtained from the CDISC Pilot Study. All other outputs use either ADSL or ADAE, which is shipped with this package and can be accessed through the ARS_example function.
How do I get ARS metadata?
ARS metadata can be generated automatically by using the point-and-click Shell-designing software, TFL Designer (from Clymb Clinical). There is a free Community Version which can be accessed here, as well as an Enterprise version, with more information here.
What’s next?
Once an ARD is created, it contains all results required for the final Output. This already has many benefits, like making QC easier, reusing results for various reporting events, etc. Of course, however, there might be a need to convert the ARD to an RTF or PDF Output.
Although an ARD consists of results and the context for the results, it does not contain layout metadata (e.g. which columns there should be, what the indentation is, sorting order, etc.). This is outside the scope of the CDISC ARS. In order to convert the ARD to an Output in an R environment, two R packages in particular address this: 1. gtsummary 2. tfrmt
Both these R packages take ARD objects and convert them to tables, adding layout metadata in the process. For some of the usage in these packages, the resulting ARD from running siera needs to be modified slightly, but can be achieved using functions from the tidyverse and aiming for the expected input format for the respective table-creating package.
In summary:
Having started with a completed ARS metadata Excel file, we were able to
- Ingest the Excel file to produce auto-generated R scripts, and
- Run these R scripts to produce ARDs from ADaM datasets
By following these steps and utilizing table-generating R packages (creating tables from ARDs), we have a clear roadmap for end-to-end TFL automation, from Shell Design to final outputs.