Transitioning from clinical SAS to R programming means adapting to new tools and methodologies while retaining the core principles of clinical data analysis. One of the most critical tasks in clinical programming is creating analysis-ready datasets that follow the CDISC Analysis Data Model (ADaM), which sets conventions for structure, variable naming, and traceability expected in regulatory submissions.
This chapter will guide you through replicating ADaM-style logic in R, covering data transformations, derivations, and integrity checks. By the end, you’ll be able to:
- Understand the parallels between SAS and R for ADaM dataset creation.
- Apply common derivations for ADaM datasets (e.g., ADSL, ADAE, ADLBC) in R.
- Implement metadata and traceability features.
- Validate datasets for regulatory compliance.
Let’s dive in!
ADaM (Analysis Data Model) datasets are designed to support statistical analysis with clear metadata, traceability, and standard variables. The SAS constructs typically used to build them map onto R as follows:
| Concept | SAS Approach | R Equivalent |
|---|---|---|
| Data Steps | DATA step | dplyr / data.table |
| Merge/Join | PROC SQL / MERGE | merge() / dplyr::left_join() |
| Metadata | PROC CONTENTS | str() / attributes() |
| Variable Labels | LABEL statement | Hmisc::label() |
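To make the mapping concrete, here is a minimal sketch of a keyed merge and a PROC CONTENTS-style inspection in R. The data frames `dm` and `ex` and all of their values are invented for illustration; they only stand in for SDTM-like inputs.

```r
library(dplyr)

# Hypothetical toy inputs standing in for SDTM-like data
dm <- data.frame(USUBJID = c("S01", "S02", "S03"),
                 SEX = c("F", "M", "F"),
                 stringsAsFactors = FALSE)
ex <- data.frame(USUBJID = c("S01", "S02"),
                 TRTSDTC = c("2023-01-10", "2023-01-12"),
                 stringsAsFactors = FALSE)

# SAS: MERGE dm(in=a) ex; BY USUBJID; IF a;  ->  a left join in R
joined_dplyr <- left_join(dm, ex, by = "USUBJID")
joined_base  <- merge(dm, ex, by = "USUBJID", all.x = TRUE)

# SAS: PROC CONTENTS  ->  structure and column classes
str(joined_dplyr)
col_types <- sapply(joined_dplyr, class)
```

Unlike a SAS MERGE, neither `left_join()` nor `merge()` requires the inputs to be sorted by the key first.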
SAS Example:

    DATA ADSL;
        SET SDTM.DM;
        /* Derive AGE */
        AGE = floor((input(RFSTDTC, yymmdd10.) - input(BRTHDTC, yymmdd10.)) / 365.25);
    RUN;
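R Equivalent:

    library(dplyr)
    library(lubridate)

    ADSL <- SDTM_DM %>%
      mutate(
        # Parse into new Date variables so the SDTM character dates stay intact
        RFSTDT = ymd(RFSTDTC),
        BRTHDT = ymd(BRTHDTC),
        # Approximate age in whole years, mirroring the SAS derivation
        AGE = floor(as.numeric(difftime(RFSTDT, BRTHDT, units = "days")) / 365.25)
      )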
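Dividing by 365.25 is an approximation and can be off by one year when the reference date falls on or near a birthday. The sketch below compares it with a calendar-aware alternative based on lubridate's `interval()` and `time_length()`. The dates are invented for illustration, and which rule applies in a given study is an assumption to check against the analysis plan.

```r
library(lubridate)

# Hypothetical birth and reference start dates (ISO 8601 character strings)
brthdtc <- c("2000-03-01", "1980-03-15")
rfstdtc <- c("2001-03-01", "2020-03-14")

brthdt <- ymd(brthdtc)
rfstdt <- ymd(rfstdtc)

# Approximation: elapsed days divided by 365.25
age_approx <- floor(as.numeric(difftime(rfstdt, brthdt, units = "days")) / 365.25)

# Calendar-aware alternative: completed years between the two dates
age_exact <- floor(time_length(interval(brthdt, rfstdt), "years"))

data.frame(brthdtc, rfstdtc, age_approx, age_exact)
```

For the first subject the two methods disagree: the reference date is exactly the first birthday, but 365 elapsed days divided by 365.25 rounds down to 0.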
Common Tasks:
- Flagging treatment-emergent adverse events (TEAEs).
- Calculating duration.
- Adding severity grades.
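R Code Example:

    # Assumes ADSL already carries the treatment start date TRTSDT as a Date
    ADAE <- SDTM_AE %>%
      left_join(ADSL %>% select(USUBJID, TRTSDT), by = "USUBJID") %>%
      mutate(
        AESTDT = ymd(AESTDTC),
        AEENDT = ymd(AEENDTC),  # parse the end date before using it below
        # Flag is NA when the AE start date or TRTSDT is missing
        TEAE_FL = ifelse(AESTDT >= TRTSDT, "Y", "N"),
        # Duration in days (add 1 if both start and end days should be counted)
        AEDUR = as.numeric(AEENDT - AESTDT)
      )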
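The third task in the list, adding severity grades, is not covered by the example above. A minimal sketch using a named lookup vector follows; the toy data, the 1 to 3 coding, and the variable name `AESEVN` are assumptions for illustration, so check them against your study's specifications.

```r
# Hypothetical AE records; AESEV uses MILD / MODERATE / SEVERE terms
ae <- data.frame(
  USUBJID = c("S01", "S01", "S02"),
  AESEV   = c("MILD", "SEVERE", "moderate"),
  stringsAsFactors = FALSE
)

# Lookup from severity text to a numeric code (an assumed 1-3 coding)
sev_map <- c(MILD = 1, MODERATE = 2, SEVERE = 3)

# Normalize case and whitespace before the lookup; unmatched terms become NA
ae$AESEVN <- unname(sev_map[toupper(trimws(ae$AESEV))])

# Worst (maximum) severity per subject
worst <- aggregate(AESEVN ~ USUBJID, data = ae, FUN = max)
```

A lookup vector plays the role of a SAS format or informat here: the mapping lives in one place and unexpected terms surface as missing values rather than being silently recoded.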
Key Steps:
- Calculate baseline values.
- Flag abnormal values (ANRIND).
- Derive shift tables.
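Example:

    ADLBC <- SDTM_LB %>%
      group_by(USUBJID, LBTESTCD) %>%
      mutate(
        # ABLFL is the ADaM baseline record flag; screening is taken as baseline here
        ABLFL = ifelse(VISIT %in% "SCREENING", "Y", NA_character_),
        # Carry the baseline value to every record of the same subject and test
        BASE = first(LBSTRESN[VISIT %in% "SCREENING"]),
        CHG = LBSTRESN - BASE
      ) %>%
      ungroup()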
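The other two key steps, flagging abnormal values and deriving shift tables, can be sketched as follows. The toy lab data, the use of `LBSTNRLO`/`LBSTNRHI` as range limits, and the choice of screening as baseline are assumptions for illustration.

```r
library(dplyr)

# Hypothetical lab records with reference range limits
lb <- data.frame(
  USUBJID  = rep(c("S01", "S02"), each = 2),
  LBTESTCD = "ALT",
  VISIT    = rep(c("SCREENING", "WEEK 4"), times = 2),
  LBSTRESN = c(30, 65, 8, 25),
  LBSTNRLO = 10,
  LBSTNRHI = 40,
  stringsAsFactors = FALSE
)

adlb <- lb %>%
  mutate(
    # Reference range indicator; left missing when the result or a limit is missing
    ANRIND = case_when(
      is.na(LBSTRESN) | is.na(LBSTNRLO) | is.na(LBSTNRHI) ~ NA_character_,
      LBSTRESN < LBSTNRLO ~ "LOW",
      LBSTRESN > LBSTNRHI ~ "HIGH",
      TRUE                ~ "NORMAL"
    )
  ) %>%
  group_by(USUBJID, LBTESTCD) %>%
  # Baseline indicator carried to every record of the subject and test
  mutate(BNRIND = first(ANRIND[VISIT %in% "SCREENING"])) %>%
  ungroup()

# Shift table: baseline category versus post-baseline category
post  <- filter(adlb, VISIT != "SCREENING")
shift <- table(Baseline = post$BNRIND, PostBaseline = post$ANRIND)
shift
```

Converting `BNRIND` and `ANRIND` to factors with fixed levels (LOW, NORMAL, HIGH) before calling `table()` would keep empty categories in the output, which is usually what a shift table needs.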
In SAS:

    LABEL AGE = "Age (Years)";

In R:

    library(Hmisc)
    label(ADSL$AGE) <- "Age (Years)"
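When many variables need labels, it can be easier to drive them from a specification than to label one variable at a time. The sketch below uses only base R and stores each label in the `"label"` attribute, which is the attribute `Hmisc::label()` works with. The dataset `adsl` and the `var_labels` spec are hypothetical.

```r
# Hypothetical ADSL and a small variable-level spec (name -> label)
adsl <- data.frame(USUBJID = c("S01", "S02"), AGE = c(34, 51),
                   stringsAsFactors = FALSE)
var_labels <- c(USUBJID = "Unique Subject Identifier",
                AGE     = "Age (Years)")

# Attach each label to its column, skipping spec entries with no matching column
for (v in intersect(names(var_labels), names(adsl))) {
  attr(adsl[[v]], "label") <- var_labels[[v]]
}

# Collect the labels back into a named vector, similar to a PROC CONTENTS listing
labels_out <- vapply(adsl, function(x) {
  lab <- attr(x, "label")
  if (is.null(lab)) NA_character_ else lab
}, character(1))
labels_out
```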
Store metadata in attributes:

    attr(ADSL, "creation_date") <- Sys.Date()
    attr(ADSL, "programmer") <- "Your Name"
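Attributes only help traceability if they survive storage. As a small sketch, the round trip below shows that `saveRDS()`/`readRDS()` keep dataset-level attributes while a CSV export does not. The `source` attribute and the toy data are hypothetical additions for illustration.

```r
# Hypothetical ADSL with dataset-level metadata
adsl <- data.frame(USUBJID = c("S01", "S02"), AGE = c(34, 51),
                   stringsAsFactors = FALSE)
attr(adsl, "creation_date") <- Sys.Date()
attr(adsl, "programmer")    <- "Your Name"
attr(adsl, "source")        <- "SDTM.DM"  # hypothetical traceability note

# An RDS file stores the full R object, attributes included
rds_path <- tempfile(fileext = ".rds")
saveRDS(adsl, rds_path)
adsl_rds <- readRDS(rds_path)

# A CSV file stores only the tabular values
csv_path <- tempfile(fileext = ".csv")
write.csv(adsl, csv_path, row.names = FALSE)
adsl_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

attr(adsl_rds, "programmer")  # retained
attr(adsl_csv, "programmer")  # NULL: dropped by the CSV round trip
```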
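Basic integrity checks between ADSL and ADAE:

    # Subjects in ADSL with no records in ADAE (not an error, but worth reviewing)
    setdiff(ADSL$USUBJID, ADAE$USUBJID)
    # Number of missing values in each ADSL variable
    sapply(ADSL, function(x) sum(is.na(x)))
    # TRUE only if every ADAE subject also exists in ADSL
    all(ADAE$USUBJID %in% ADSL$USUBJID)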
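Individual checks like these are easier to rerun and review when collected in one function that returns its findings. The following is a minimal sketch; `check_adam()` is a hypothetical helper and the toy datasets are invented for illustration.

```r
# Hypothetical toy datasets
adsl <- data.frame(USUBJID = c("S01", "S02", "S03"), AGE = c(34, NA, 51),
                   stringsAsFactors = FALSE)
adae <- data.frame(USUBJID = c("S01", "S01", "S04"), AESEQ = c(1, 2, 1),
                   stringsAsFactors = FALSE)

# Hypothetical helper: run the checks and return the findings as a named list
check_adam <- function(adsl, adae) {
  list(
    # ADSL should have exactly one record per subject
    dup_subjects = adsl$USUBJID[duplicated(adsl$USUBJID)],
    # ADAE subjects that are missing from ADSL point to a data problem
    orphan_ae = setdiff(adae$USUBJID, adsl$USUBJID),
    # ADSL subjects without any AE records (informational)
    no_ae = setdiff(adsl$USUBJID, adae$USUBJID),
    # Missing-value count per ADSL variable
    na_counts = colSums(is.na(adsl))
  )
}

findings <- check_adam(adsl, adae)
findings
```

Returning a list rather than printing lets a validation script assert on the results, for example `stopifnot(length(findings$orphan_ae) == 0)`.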
Use dplyr for data manipulation (the counterpart of SAS DATA steps), and replace PROC SQL joins and MERGE statements with merge() or a dplyr join such as left_join().
Metadata Matters:
Track variable labels, derivations, and dataset history.
Validation is Critical:
Replicate SAS checks in R with functions such as setdiff() and is.na().
Regulatory Readiness:
Practice converting a full ADaM dataset from SAS to R. Start with ADSL, then move to ADAE or ADLBC. Use real clinical data (anonymized) to build confidence!
In the next chapter, we’ll explore Statistical Outputs (Tables, Listings, and Graphs) in R.
This chapter equips you with foundational skills to create ADaM-style datasets in R. Keep experimenting, and soon the transition will feel seamless!