Built 2022-02-05 using NMdata 0.0.11.
Please make sure to use the latest version, available here.
This cheat sheet is intended to provide an overview and a reminder of command names. Please refer to the other vignettes for more details on specific topics and to the individual manual pages for details on the functions.
The order of the contents loosely follows a workflow example (except for the configuration part). The steps can be applied in any order and independently of each other.
install.packages("NMdata")
library(NMdata)
We have read some source data files and need to combine them into one data set for NONMEM. Key steps are stacking data sets (like doses, samples, and simulation records) and adding additional information such as covariates. We often use rbind and merge or join operations for these steps. NMdata helps explore how to do these steps and ensure that merge/join results are as expected.
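As a minimal, hypothetical sketch (doses, samples, and covs below stand in for your own source data sets):
## Hypothetical sketch: stack event records, then add covariates
## fill=TRUE pads columns missing from one data set with NA (data.table)
pk <- rbind(doses, samples, fill=TRUE)
pk <- merge(pk, covs, by="ID")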
compareCols - Compare presence and classes of columns across data sets before merging or stacking.
compareCols(covs,covs2)
#> Dimensions:
#> data nrows ncols
#> 1: covs 150 2
#> 2: covs2 150 2
#>
#> Columns that differ:
#> column covs covs2
#> 1: WEIGHTB numeric <NA>
#> 2: race <NA> character
Use the cols.wanted argument to focus the overview on the columns you need in your final data set.
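For example (this selection of columns is illustrative):
compareCols(covs, covs2, cols.wanted=c("ID", "WEIGHTB", "race"))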
mergeCheck(x,y,...) - Merges data and only accepts the result if all that happened was that columns from y were added to x. The row order of x is retained. Arguments are passed to data.table, which does the actual merge. This completely automates the necessary checks when, say, merging covariates onto data.
pk2 <- mergeCheck(pk,covs2,by="ID")
#> The following columns were added: race
We did not get an error from mergeCheck, so we know that the rows in pk2 are exactly identical to those in pk, except for the addition of a column called race. If rows duplicate or disappear, mergeCheck does a good job of telling you where in the data to address the issues.
renameByContents - Keep track of which columns are compatible with NONMEM by renaming columns accordingly. NMisNumeric evaluates whether NONMEM can interpret the contents as numeric (different from is.numeric):
## Example 1: Append an "N" to columns that NONMEM _can_ read (as numeric)
pk <- renameByContents(data=pk,
fun.test = NMisNumeric,
fun.rename = function(x)paste0(x,"N"))
## Example 2: lowercase names of columns that NONMEM _cannot_ read
pk <- renameByContents(data=pk,
fun.test = NMisNumeric,
fun.rename = tolower,
invert.test = TRUE)
flagsAssign - Sequentially assign exclusion flags to a data set based on a set of user-specified exclusion criteria.
flagsCount - Create an overview of the numbers of retained and discarded data points.
Example with only two exclusion flags applied to samples. If time is negative, we assign exclusion flag FLAG=100. If (time is non-negative and) BLQ==1, we assign FLAG=10. If none of these conditions are met, FLAG=0, and the row is to be included in the analysis. (fread is just for row-wise readability.)
dt.flags <- fread(text="FLAG,flag,condition
10,Below LLOQ,BLQ==1
100,Negative time,TIME<0")
pk <- flagsAssign(pk,tab.flags=dt.flags,subset.data="EVID==0")
#> Coding FLAG = 100, flag = Negative time
#> Coding FLAG = 10, flag = Below LLOQ
pk <- flagsAssign(pk,subset.data="EVID==1",flagc.0="Dosing")
flagsCount(pk[EVID==0],tab.flags=dt.flags)[,.( flag, N.left, Nobs.left, N.discard, Nobs.discard)]
#> flag N.left Nobs.left N.discard Nobs.discard
#> 1: All available data 150 1352 NA NA
#> 2: Negative time 150 1350 0 2
#> 3: Below LLOQ 131 755 19 595
#> 4: Analysis set 131 755 NA NA
You may also want to apply a couple of exclusion criteria to dose records (for missing time, or zero or missing amounts) by modifying the steps above and applying them to EVID==1, as sketched below.
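A minimal sketch, assuming the dose records carry TIME and AMT columns (the criteria and flag values are illustrative):
## Hypothetical exclusion criteria for dose records
dt.flags.dos <- fread(text="FLAG,flag,condition
10,Missing or zero dose amount,is.na(AMT)|AMT==0
100,Missing time,is.na(TIME)")
pk <- flagsAssign(pk,tab.flags=dt.flags.dos,subset.data="EVID==1")
flagsCount(pk[EVID==1],tab.flags=dt.flags.dos)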
NMorderColumns - Standardize column order. Columns that can be read by NONMEM are prioritized. The row identifier and standard column names have special priorities.
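In its simplest form:
pk <- NMorderColumns(pk)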
NMcheckData - Extensive data checks for NONMEM compatibility and common issues. Should be run before saving data; see the “Debugging…” section for an example of the output.
NMwriteData - Write data ensuring compatibility with NONMEM. By default, it saves both a csv (for NONMEM) and an rds (for R, retaining factor levels etc.). Text for optional use in the $INPUT and $DATA NONMEM sections is returned. script and args.stamp are optional arguments; see the “Traceability” section for their purpose.
text.nm <- NMwriteData(pk,file="derived/pkdata.csv",
                       script="NMdata-cheat.Rmd",
                       args.stamp=list(Description="PK data for the NMdata Cheatsheet"))
#> Data written to file(s):
#> derived/pkdata.csv
#> derived/pkdata.rds
#> For NONMEM:
#> $INPUT ROW ID NOMTIME TIME EVID CMT AMT DV FLAG STUDY BLQ CYCLE DOSE
#> PART PROFDAY PROFTIME eff0
#> $DATA derived/pkdata.csv
#> IGN=@
#> IGNORE=(FLAG.NE.0)
NMwriteSection - Replace sections of a NONMEM control stream. Can use the text generated by NMwriteData to update NONMEM runs to match the newly generated input data. Update the INPUT section (and not DATA) for all control streams in directory “nonmem” whose file names start with “run1” and end in “.mod” (say “run101.mod” to “run199.mod”):
NMwriteSection(dir="nonmem",
file.pattern="run1.*\\.mod",
list.sections=text.nm["INPUT"])
NMwriteSection has the argument data.file to further limit the scope of files to update based on which data file the control streams use. It only makes sense to use the auto-generated text for control streams that use this data set.
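A sketch of that, assuming the relevant control streams read pkdata.csv:
NMwriteSection(dir="nonmem",
               file.pattern="run1.*\\.mod",
               data.file="pkdata.csv",
               list.sections=text.nm["INPUT"])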
The text for NONMEM is generated by NMgenText. Use that to generate alternative $INPUT sections (e.g. for models that use other columns as dependent variables) without saving the data again. You can tailor the generation of the text to copy (DV=CONC), drop (COL=DROP), rename (DV instead of CONC) and more.
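A sketch of the copy option (illustrative, assuming the data carries a CONC column; see ?NMgenText for the full set of options):
## Hypothetical: generate $INPUT text reading a CONC column as DV (DV=CONC)
text.nm.alt <- NMgenText(pk,copy=c(DV="CONC"))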
NMcheckData can check a data.frame. However, it can also be run on the path to a control stream, in which case it provides a full check of how the data is read by NONMEM: it checks the column names in the $INPUT section against the data file and then runs a full check of the data set as read by NONMEM (according to the column names in $INPUT and ACCEPT/IGNORE statements in $DATA). We suppress the default print to the terminal (quiet=TRUE) and show selected parts of the results here.
res.debug <- NMcheckData(file="nonmem/run201.mod",quiet=TRUE)
## we will only show some of what is available here
names(res.debug)
#> [1] "datafile" "tables" "dataCreate" "input.filters"
#> [5] "input.colnames" "NMcheckData"
## Meta data on input data file:
res.debug$tables
#> source name nrow ncol nid filetype file.mtime
#> 1: input pkdata.csv 1502 23 150 text 2022-02-05 12:58:17
#> file has.col.row has.col.id
#> 1: nonmem/../derived/pkdata.csv TRUE TRUE
In this model we forgot to update the control stream INPUT section after adding a column to data (“off” means that INPUT text can be reorganized to match data file better):
## Comparison of variable naming:
res.debug$input.colnames[c(1:2)]
#> datafile INPUT nonmem result compare
#> 1: ROW ROW ROW ROW OK
#> 2: ID ID ID ID OK
res.debug$input.colnames[c(9:12)]
#> datafile INPUT nonmem result compare
#> 1: FLAG FLAG FLAG FLAG OK
#> 2: STUDY BLQ BLQ BLQ off
#> 3: BLQ CYCLE CYCLE CYCLE off
#> 4: CYCLE DOSE DOSE DOSE off
We have some findings on the data set too. But since res.debug$input.colnames tells us we are reading the data incorrectly, we have to address that before interpreting findings on the data.
res.debug$NMcheckData$summary
#> column check N Nid
#> 1: EVID Subject has no obs 19 19
#> 2: MDV Column not found 1 0
If you are preparing a data set, run NMcheckData directly on the data (using the data argument) instead of on a control stream.
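For example:
res.check <- NMcheckData(data=pk)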
NMscanData - Automatically find NONMEM input and output tables and organize the data. By default, available column names are taken from the NONMEM control stream. Additional column names (columns not read by NONMEM) are taken from the input data file.
res1 <- NMscanData("nonmem/run101.lst")
#> Model: run101
#> Input and output data merged by: ROW
#>
#> Used tables, contents shown as used/total:
#> file rows columns IDs
#> run101_res.txt 905/905 7/7 150/150
#> run101_res_vols.txt 905/905 3/7 150/150
#> run101_res_fo.txt 150/150 1/2 150/150
#> pkdata.rds (input) 905/1502 20/23 150/150
#> (result) 905 31+2 150
#>
#> Distribution of rows on event types in returned data:
#> EVID Output
#> 0 755
#> 1 150
The following plot serves to illustrate that the obtained data set combines output tables (PRED is from a $TABLE statement) with input data (exclusion flags are represented as character variables). Moreover, the “below LLOQ” samples are included in the result even though they were not in the analysis (excluded using IGNORE in the control stream, recovered in NMscanData using recover.rows=TRUE).
## Recover rows that were not read by NONMEM (due to ACCEPT/IGNORE)
res2 <- NMscanData("nonmem/run101.lst",recover.rows=TRUE)
#> Model: run101
#> Input and output data merged by: ROW
#>
#> Used tables, contents shown as used/total:
#> file rows columns IDs
#> run101_res.txt 905/905 7/7 150/150
#> run101_res_vols.txt 905/905 3/7 150/150
#> run101_res_fo.txt 150/150 1/2 150/150
#> pkdata.rds (input) 1502/1502 20/23 150/150
#> (result) 1502 31+2 150
#>
#> Distribution of rows on event types in returned data:
#> EVID Input only Output
#> 0 597 755
#> 1 0 150
library(ggplot2)
res2.plot <- subset(res2,ID==135&EVID==0)
ggplot(res2.plot,aes(TIME))+
geom_point(aes(y=DV,colour=flag))+
geom_line(aes(y=PRED))+
labs(y="Concentration (unit)",subtitle=unique(res2$model))
#> Warning: Removed 2 row(s) containing missing values (geom_path).
Read the messages from NMwriteData and NMscanData carefully and notice that an rds file was written and read. This bypasses the loss of information caused by writing and reading csv, and so we have kept the factor levels from the input data we generated:
levels(res1$trtact)
#> [1] "Placebo" "3 mg" "10 mg" "30 mg" "100 mg" "300 mg"
Use the many options in NMdataConf to tailor NMdata behaviour to your setup and preferences. Make NMdata functions return data.tables or tibbles:
NMdataConf(as.fun=tibble::as_tibble)
NMdataConf(as.fun="data.table")
By default, NMdata functions will look for a unique row identifier in a column called ROW. If you call this column REC, do
NMdataConf(col.row="REC")
By default, NMdata is configured to read files from PSN in which case the input control stream is needed to find the input data. Do this if you don’t use PSN:
NMdataConf(file.mod=identity)
Loosely speaking, NMdataConf changes the default values of NMdata function arguments. Many options can be configured this way, so you don’t have to remember to type in those arguments every time you call an NMdata function.
NMinfo - Get metadata from an NMdata object. This will show where and when the input data was created, when the model was run, the results of consistency checks, what tables were read, how they were combined, and a complete list of data columns and their origin.
A list of the available elements:
names(NMinfo(res1))
#> [1] "details" "datafile" "dataCreate" "input.colnames"
#> [5] "tables" "columns"
The information recorded during saving of the input data:
NMinfo(res1,"dataCreate")
#> $DataCreateScript
#> [1] "NMdata-cheat.Rmd"
#>
#> $CreationTime
#> [1] "2022-02-05 12:58:17 EST"
#>
#> $writtenTo
#> [1] "derived/pkdata.rds"
#>
#> $Description
#> [1] "PK data for the NMdata Cheatsheet"
A full list of all columns in the output and input data is included. The source data file and the column number in the result (COLNUM) are listed.
NMinfo(res1,"columns")[1:8]
#> variable file source level COLNUM
#> 1: ROW run101_res.txt output row 1
#> 2: ID run101_res_vols.txt output row 2
#> 3: NOMTIME pkdata.rds input row 3
#> 4: TIME pkdata.rds input row 4
#> 5: EVID pkdata.rds input row 5
#> 6: CMT pkdata.rds input row 6
#> 7: AMT pkdata.rds input row 7
#> 8: DV run101_res.txt output row 8
We saw earlier that we got “31+2” columns back. We see that the additional two were added by NMscanData (source). DV was already included from another table, so the redundant DV column is omitted.
NMinfo(res1,"columns")[30:33]
#> variable file source level COLNUM
#> 1: flag pkdata.rds input row 30
#> 2: trtact pkdata.rds input row 31
#> 3: model <NA> NMscanData model 32
#> 4: nmout <NA> NMscanData row 33
NMscanTables - Find and read all output data tables based on a NONMEM control stream file. A list of tables is returned.
NMreadTab - Read an output table file from NONMEM based on the path to the output data file.
NMscanInput - Read input data based on a NONMEM control stream and optionally translate column names according to the $INPUT NONMEM section.
NMreadCsv - Read input data formatted for NONMEM. Brief sketches of these four functions follow below.
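Using file paths from the examples above (the assigned object names are illustrative):
tabs <- NMscanTables("nonmem/run101.lst")   ## list of output tables
tab1 <- NMreadTab("nonmem/run101_res.txt")  ## one output table file
inp <- NMscanInput("nonmem/run101.lst")     ## input data as read by the model
raw <- NMreadCsv("derived/pkdata.csv")      ## the csv input file itself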