🏠 Home
Author: Bhanoji Duppada | 2025-07-26 13:19:16

Reading/Writing Clinical Data (CSV, Excel, XPT, SAS7BDAT)

Introduction

Transitioning from a Clinical SAS Programmer to an R Programmer can feel daunting, especially when working with clinical trial data, which is often stored in formats like CSV, Excel, XPT, and SAS7BDAT. This chapter will guide you through reading and writing these file types in R, providing a smooth transition for SAS users.

By the end of this chapter, you will:
- Understand how to import/export clinical data in R.
- Compare R functions to their SAS equivalents.
- Work with common file formats and troubleshoot common issues.

Whether you're analyzing clinical trial data for regulatory submissions or exploratory research, mastering these skills is essential.


Table of Contents

  1. Reading Data in R
  2. CSV Files
  3. Excel Files
  4. XPT Files (SAS Transport Format)
  5. SAS7BDAT Files (SAS Dataset)
  6. Writing Data in R
  7. Comparison Between SAS and R
  8. Practical Examples
  9. Common Errors and Fixes
  10. Summary & Key Takeaways

Reading Data in R

1. CSV Files

CSV (Comma-Separated Values) files are one of the most common formats for clinical data storage.

Base R Method (read.csv())

data_csv <- read.csv("clinical_data.csv", header = TRUE, stringsAsFactors = FALSE)

readr Package Method (read_csv())

library(readr)
data_csv <- read_csv("clinical_data.csv")

SAS Equivalent

PROC IMPORT DATAFILE="clinical_data.csv"  
    OUT=work.clinical_data  
    DBMS=CSV REPLACE;  
    GETNAMES=YES;  
RUN;  

2. Excel Files

Excel (.xlsx or .xls) files are widely used, especially for case report forms (CRFs).

readxl Package (read_excel())

library(readxl)
data_excel <- read_excel("clinical_data.xlsx", sheet = 1)

openxlsx Package

library(openxlsx)
data_excel <- read.xlsx("clinical_data.xlsx", sheet = 1)

SAS Equivalent

PROC IMPORT DATAFILE="clinical_data.xlsx"  
    OUT=work.clinical_data  
    DBMS=XLSX REPLACE;  
    SHEET="Sheet1";  
RUN;  

3. XPT Files (SAS Transport Format)

XPT files are standard for CDISC SDTM (Study Data Tabulation Model) submissions.

haven Package (read_xpt())

library(haven)
data_xpt <- read_xpt("clinical_data.xpt")

SAS Equivalent

LIBNAME xptfile XPORT "clinical_data.xpt";  
DATA work.clinical_data;  
    SET xptfile.clinical_data;  
RUN;  

4. SAS7BDAT Files (SAS Dataset)

If you need to directly read SAS datasets (.sas7bdat) in R:

haven Package (read_sas())

data_sas <- read_sas("clinical_data.sas7bdat")

SAS Equivalent

DATA work.clinical_data;  
    SET saslib.clinical_data;  
RUN;  

Writing Data in R

1. Exporting to CSV

write.csv(data_csv, "output_data.csv", row.names = FALSE)

readr Alternative (write_csv())

write_csv(data_csv, "output_data.csv")

2. Exporting to Excel

write.xlsx(data_excel, "output_data.xlsx")

3. Exporting to XPT

write_xpt(data_xpt, "output_data.xpt")

Comparison Between SAS and R

Task SAS Code R Equivalent
Read CSV PROC IMPORT DBMS=CSV read.csv() or read_csv()
Read Excel PROC IMPORT DBMS=XLSX read_excel()
Read XPT LIBNAME XPT read_xpt()
Read SAS7BDAT DATA SET= read_sas()
Write CSV PROC EXPORT DBMS=CSV write.csv() or write_csv()
Write XPT LIBNAME XPT write_xpt()

Practical Examples

Example 1: Reading Multiple Excel Sheets

library(readxl)
file <- "patient_data.xlsx"
sheets <- excel_sheets(file)  
data_list <- lapply(sheets, function(x) read_excel(file, sheet = x))
names(data_list) <- sheets  

Example 2: Merging Clinical Datasets

# Read two datasets
demographics <- read_csv("demographics.csv")  
labs <- read_csv("labs.csv")  

# Merge by patient ID
merged_data <- merge(demographics, labs, by = "SUBJID")

Common Errors and Fixes

Error Solution
"File not found" Check file path (use file.exists()).
Encoding issues Use read_csv(..., locale = locale(encoding = "UTF-8")).
Missing labels in XPT Ensure haven package is used.
Slow Excel import Use openxlsx for large files.

Summary & Key Takeaways

Reading Data
- Use read_csv() for CSV, read_excel() for Excel, read_xpt() for XPT, and read_sas() for SAS datasets.
- The haven package is best for SAS and XPT files.

Writing Data
- Use write_csv() for CSV, write.xlsx() for Excel, and write_xpt() for XPT.

SAS vs. R Comparison
- R functions are more concise but require package installations (readr, haven, readxl).
- SAS has built-in support for clinical data formats, while R relies on external packages.

Best Practices
- Always check for missing values (is.na()).
- Use dplyr for efficient data manipulation.

By mastering these basics, you’ll be well-equipped to handle clinical data in R, making your transition from SAS smoother!


Next Steps

Happy coding! 🚀