Transitioning from a Clinical SAS Programmer to an R Programmer can feel daunting, especially when working with clinical trial data, which is often stored in formats like CSV, Excel, XPT, and SAS7BDAT. This chapter will guide you through reading and writing these file types in R, providing a smooth transition for SAS users.
By the end of this chapter, you will:
- Understand how to import/export clinical data in R.
- Compare R functions to their SAS equivalents.
- Work with common file formats and troubleshoot common issues.
Whether you're analyzing clinical trial data for regulatory submissions or exploratory research, mastering these skills is essential.
CSV (Comma-Separated Values) files are one of the most common formats for clinical data storage.
read.csv())data_csv <- read.csv("clinical_data.csv", header = TRUE, stringsAsFactors = FALSE)
header = TRUE means the first row contains variable names. stringsAsFactors = FALSE prevents character strings from converting to factors. readr Package Method (read_csv())library(readr)
data_csv <- read_csv("clinical_data.csv")
read.csv(). PROC IMPORT DATAFILE="clinical_data.csv"
OUT=work.clinical_data
DBMS=CSV REPLACE;
GETNAMES=YES;
RUN;
Excel (.xlsx or .xls) files are widely used, especially for case report forms (CRFs).
readxl Package (read_excel())library(readxl)
data_excel <- read_excel("clinical_data.xlsx", sheet = 1)
sheet = 1 reads the first sheet by default. .xlsx and .xls. openxlsx Packagelibrary(openxlsx)
data_excel <- read.xlsx("clinical_data.xlsx", sheet = 1)
PROC IMPORT DATAFILE="clinical_data.xlsx"
OUT=work.clinical_data
DBMS=XLSX REPLACE;
SHEET="Sheet1";
RUN;
XPT files are standard for CDISC SDTM (Study Data Tabulation Model) submissions.
haven Package (read_xpt())library(haven)
data_xpt <- read_xpt("clinical_data.xpt")
LIBNAME xptfile XPORT "clinical_data.xpt";
DATA work.clinical_data;
SET xptfile.clinical_data;
RUN;
If you need to directly read SAS datasets (.sas7bdat) in R:
haven Package (read_sas())data_sas <- read_sas("clinical_data.sas7bdat")
DATA work.clinical_data;
SET saslib.clinical_data;
RUN;
write.csv(data_csv, "output_data.csv", row.names = FALSE)
row.names = FALSE prevents an extra index column. readr Alternative (write_csv())write_csv(data_csv, "output_data.csv")
write.xlsx(data_excel, "output_data.xlsx")
write_xpt(data_xpt, "output_data.xpt")
| Task | SAS Code | R Equivalent |
|---|---|---|
| Read CSV | PROC IMPORT DBMS=CSV |
read.csv() or read_csv() |
| Read Excel | PROC IMPORT DBMS=XLSX |
read_excel() |
| Read XPT | LIBNAME XPT |
read_xpt() |
| Read SAS7BDAT | DATA SET= |
read_sas() |
| Write CSV | PROC EXPORT DBMS=CSV |
write.csv() or write_csv() |
| Write XPT | LIBNAME XPT |
write_xpt() |
library(readxl)
file <- "patient_data.xlsx"
sheets <- excel_sheets(file)
data_list <- lapply(sheets, function(x) read_excel(file, sheet = x))
names(data_list) <- sheets
# Read two datasets
demographics <- read_csv("demographics.csv")
labs <- read_csv("labs.csv")
# Merge by patient ID
merged_data <- merge(demographics, labs, by = "SUBJID")
| Error | Solution |
|---|---|
| "File not found" | Check file path (use file.exists()). |
| Encoding issues | Use read_csv(..., locale = locale(encoding = "UTF-8")). |
| Missing labels in XPT | Ensure haven package is used. |
| Slow Excel import | Use openxlsx for large files. |
✅ Reading Data
- Use read_csv() for CSV, read_excel() for Excel, read_xpt() for XPT, and read_sas() for SAS datasets.
- The haven package is best for SAS and XPT files.
✅ Writing Data
- Use write_csv() for CSV, write.xlsx() for Excel, and write_xpt() for XPT.
✅ SAS vs. R Comparison
- R functions are more concise but require package installations (readr, haven, readxl).
- SAS has built-in support for clinical data formats, while R relies on external packages.
✅ Best Practices
- Always check for missing values (is.na()).
- Use dplyr for efficient data manipulation.
By mastering these basics, you’ll be well-equipped to handle clinical data in R, making your transition from SAS smoother!
Happy coding! 🚀