Nothing Special   »   [go: up one dir, main page]

Skip to content

Latest commit

 

History

History

R

Folders and files

NameName
Last commit message
Last commit date
 
 
 
 
 
 

Analyzing MEPS data using R

Loading R packages

Note: R version 4.0.4 or later is recommended.

To load and analyze MEPS data in R, additional packages are needed. Packages are sets of R functions that are downloaded and installed into the R system. A package only needs to be installed once per R installation. Typically, this is done with the install.packages function to download the package from the internet and store it on your computer. The library function needs to be run every time the R session is re-started. Packages are tailor-made to help perform certain statistical, graphical, or data tasks. Since R is used by many analysts, it is typical for only some packages to be loaded for each analysis.

# Only need to run these once:
  install.packages("foreign")  
  install.packages("devtools")
  install.packages("tidyverse")
  install.packages("readr")
  install.packages("readxl")
  install.packages("haven")
  install.packages("survey")

# Run these every time you re-start R:
  library(foreign)
  library(devtools)
  library(tidyverse)
  library(readr)
  library(readxl)
  library(haven)
  library(survey)

Loading MEPS data

For data years 2017 and later (and also for the 2016 Medical Conditions file), .zip files for multiple file formats are available, including ASCII (.dat), SAS V9 (.sas7bdat), Stata (.dta), and Excel (.xlsx). Prior to 2017, ASCII (.dat) and SAS transport (.ssp) files are provided for all datasets.

Different functions are needed for importing these various file formats into R. The most versatile method is to use the read_MEPS function from the MEPS package, which was created to facilitate loading and manipulation of MEPS PUFs. For users that prefer not to use the MEPS R package to load MEPS public use files, care must be taken to ensure that the correct file format is being imported in accordance with the data year, as detailed in the sections below.

The following table summarizes the recommended functions and R packages needed to load the various file formats into R:

File type Package Function Example
All types MEPS read_MEPS read_MEPS(year=2017, type="DV")
ASCII (.dat) readr read_fwf read_fwf("C:/MEPS/h206b.dat", col_positions, col_types)
Excel (.xlsx) readxl read_excel read_excel("C:/MEPS/h206b.xlsx")
SAS V9 (.sas7bdat) haven read_sas read_sas("C:/MEPS/h206b.sas7bdat")
SAS XPORT (.ssp) foreign read.xport read.xport("C:/MEPS/h188b.ssp")
SAS CPORT (.ssp) (none) N/A N/A
Stata (.dta) haven read_dta read_dta("C:/MEPS/h206b.dta")

All data years: Using the MEPS package

The MEPS R Package was created to facilitate loading and manipulation of MEPS PUFs. It can be installed using the following commands:

library(devtools)

install_github("e-mitchell/meps_r_pkg/MEPS")
library(MEPS)

The read_MEPS function can then be used to import MEPS data into R directly from the MEPS website. This function automatically detects the best file format to import based on the specified data year and file.

In the following example, the 2016 (h188b), 2017 (h197b), and 2018 (h206b) Dental Visits files are automatically downloaded from the MEPS website and imported into R. Either the file name or the year and MEPS data type can be specified:

# Specifying year and MEPS data type
dn2016 = read_MEPS(year = 2016, type = "DV")
dn2017 = read_MEPS(year = 2017, type = "DV")
dn2018 = read_MEPS(year = 2018, type = "DV")

# Specifying MEPS file name
dn2016 = read_MEPS(file = "h188b")
dn2017 = read_MEPS(file = "h197b")
dn2018 = read_MEPS(file = "h206b")

# Access the help page for a full list of allowable "type" values:
help(get_puf_names)

Data years 2017 and later: ASCII, SAS V9, Stata, and XLSX

For users that prefer not to use the MEPS package, multiple data formats are available for MEPS public use files from data years 2017 and later (and also for the 2016 Medical Conditions file). Due to the fast loading speed and simplicity of code, the Stata data format (.dta) is the recommended file format.

IMPORTANT! SAS transport (.ssp) versions of most these files were created using the SAS CPORT engine. These CPORT data files cannot be read directly into R, and alternative file formats must be used instead.

Examples of loading each of the available file types are detailed below. For each example, the 2018 Dental Visits files (h206b) has been downloaded from the MEPS website, unzipped, and saved in the local directory C:/MEPS:

Stata (.dta) files -- recommended

Stata files can be loaded into R using the read_dta function from the haven package. These are the recommended file formats, since Stata files are generally the fastest to load into R among the available formats.

library(haven)
dn2018 = read_dta("C:/MEPS/h206b.dta")

SAS V9 (.sas7bdat) files

SAS V9 files can be loaded into R using the read_sas function from the haven package:

library(haven)
dn2018 = read_sas("C:/MEPS/h206b.sas7bdat")

ASCII (.dat) files

ASCII (.dat) fixed-width files can be loaded into R using the read_fwf function from the readr package. Additional instructions for loading these ASCII files into R can be found in the 'R Programming Statements' text file link provided with each data file release. For example, the R programming statements for the 2018 Dental Visits file can be viewed here: https://meps.ahrq.gov/data_stats/download_data/pufs/h206b/h206bru.txt

Excel (.xlsx) files

Excel files can be read into R using the read_excel function from the readxl package. However, some of the larger files (e.g. FYC, Longitudinal) can require a longer time to load, so this method is not recommended.

library(readxl)
dn2018 = read_excel("C:/MEPS/h206b.xlsx")

Data years 1996-2016: SAS XPORT format

For data years prior to 2017, ASCII and SAS transport (XPORT) file formats were released for the MEPS public use files. Since the required R programming statements are not available for the ASCII data files for these years, the SAS transport (.ssp) formats are the recommended file type (excluding the 2016 Medical Conditions file).

These files can be read into R using the read.xport function from the foreign package. In the following example, the SAS XPORT format of the 2016 Dental Visits file (h188b.ssp) has been downloaded from the MEPS website, unzipped, and saved in the local directory C:/MEPS.

library(foreign)
dn2016 = read.xport("C:/MEPS/h188b.ssp")

Automating file download

Instead of manually downloading, unzipping, and storing MEPS data files in a local directory, it may be beneficial to automatically download MEPS data directly from the MEPS website. This can be accomplished using the download.file and unzip functions. The following code downloads and unzips the 2018 Dental Visits file (h206b), and stores it in a temporary folder (alternatively, the file can be stored permanently by editing the exdir argument). The file can then be loaded into R using the appropriate function (e.g. read_dta for the Stata file format). The following example demonstrates this process for the Stata file (.dta):

# Load 'haven' library for 'read_dta' function
library(haven)

# Download Stata (.dta) zip file
url = "https://meps.ahrq.gov/mepsweb/data_files/pufs/h206b/h206bdta.zip"
download.file(url, temp <- tempfile())

# Unzip and save to temporary folder
meps_file = unzip(temp, exdir = tempdir())

# Alternatively, this will save a permanent copy of the file to the local folder "C:/MEPS/R-downloads"
# meps_file = unzip(temp, exdir = "C:/MEPS/R-downloads")

# Read the .dta file into R 
dn2018 = read_dta(meps_file)

To download additional files programmatically, replace 'h206b' in the above code with the desired filename (see meps_files_names.csv for a list of MEPS file names by data type and year). The full URL of each zip file can be found by right-clicking the 'ZIP' hyperlink on the web page for the data file, selecting 'Copy link address', then pasting into a text editor or code editor.

Screenshot of 'copy link address'

Saving R data (.Rdata)

Once the MEPS data has been loaded into R, it can be saved as a permanent R dataset (.Rdata) for faster loading. In the following code, the 2018 Dental Visits dataset is saved to the C:/ drive (first create the 'MEPS/R/data' folder if needed):

save(dn2018, file = "C:/MEPS/R/data/dn2018.Rdata")

This dataset can then be loaded into subsequent R sessions using the code:

load(file = "C:/MEPS/R/data/dn2018.Rdata")

Survey package in R

To analyze MEPS data using R, the survey package should be used to ensure unbiased estimates. The survey package contains functions for analyzing survey data by defining a survey design object with information about the sampling procedure, then running analyses on that object. Some of the functions in the survey package that are most useful for analyzing MEPS data include:

  • svydesign: define the survey object
  • svytotal: population totals
  • svymean: proportions and means
  • svyquantile: quantiles (e.g. median)
  • svyratio: ratio statistics (e.g. percentage of total expenditures)
  • svyglm: generalized linear regression
  • svyby: run other survey functions by group

To use functions in the survey package, the svydesign function specifies the primary sampling unit, the strata, and the sampling weights for the data frame. The survey.lonely.psu='adjust' option ensures accurate standard error estimates when analyzing subsets. Once the survey design object is defined, population estimates can be calculated using functions from the survey package. As an example, the following code will estimate total dental expenditures in 2018:

library(survey)

options(survey.lonely.psu='adjust')

mepsdsgn = svydesign(
  id = ~VARPSU,
  strata = ~VARSTR,
  weights = ~PERWT18F,
  data = dn2018,
  nest = TRUE)  

svytotal(~DVXP18X, design = mepsdsgn)

R examples

In order to run the example codes, you must download the relevant MEPS files from the MEPS website and save them to your local computer. The codes are written under the assumption that these files are saved in the local directory "C:/MEPS/". However, the programs can be customized to point to an alternate directory as desired.

Workshop exercises

The following codes from previous MEPS workshops and webinars are provided in the workshop_exercises folder:

1. National health care expenses

exercise_1a.R: National health care expenses by age group, 2016
exercise_1b.R: National health care expenses by age group, 2018

2. Prescribed medicine purchases

exercise_2a.R: Purchases and expenses for narcotic analgesics or narcotic analgesic combos, 2016
exercise_2b.R: Purchases and expenses for narcotic analgesics or narcotic analgesic combos, 2018

3. Pooling data files

exercise_3a.R: Pooling MEPS FYC files, 2015 and 2016: Out-of-pocket expenditures for unisured persons ages 26-30 with high income
exercise_3b.R: Pooling longitudinal files, panels 17-19
exercise_3c.R: Pooling MEPS FYC files, 2017 and 2018: People with joint pain, using JTPAIN31 for 2017 and JTPAIN31_M18 for 2018
exercise_3d.R: Pooling MEPS FYC files, 2017-2019: People with joint pain, using Pooled Linkage Variance file for correct standard error calculation (required when pooling before and after 2019)

4. Regression

exercise_4a.R: Logistic regression to identify demographic factors associated with receiving a flu shot in 2018 (using SAQ population)
exercise_4b.R: Logistic regression for persons that delayed medical care because of COVID, 2020

5. Plots

ggplot_example.R: Code to re-create the data and plot for Figure 1 in Statistical brief #491.

6. Linking Medical Conditions and Event Files

cond_pmed_2020.R: Utilization and expenditures for prescribed medicine purchases for hyperlipidemia, 2020
cond_mv_2020.R: Utilization and expenditures for office-based visits for mental health, 2020

Summary tables examples

The following codes provided in the summary_tables_examples folder re-create selected statistics from the MEPS-HC Data Tools:

Accessibility and Quality of Care

Access to Care

care_access_2017.R: Reasons for difficulty receiving needed care, by poverty status, 2017
care_access_2019.R: Number and percent of people who did not receive treatment because they couldn't afford it, by poverty status, 2019

Diabetes Care

care_diabetes_a1c_2016.R: Adults with diabetes receiving hemoglobin A1c blood test, by race/ethnicity, 2016

Quality of Care

care_quality_2016.R: Ability to schedule a routine appointment, by insurance coverage, 2016

Medical conditions

cond_expenditures_2015.R: Utilization and expenditures by medical condition, 2015 -- Conditions defined by collapsed ICD-9/CCS codes
cond_expenditures_2018.R: Utilization and expenditures by medical condition, 2018 -- Conditions defined by collapsed ICD-10/CCSR codes

Health Insurance

ins_age_2016.R: Health insurance coverage by age group, 2016

Prescribed drugs

pmed_prescribed_drug_2016.R: Purchases and expenditures by generic drug name, 2016
pmed_therapeutic_class_2016.R: Purchases and expenditures by Multum therapeutic class, 2016

Use, expenditures, and population

use_events_2016.R: Number of events and mean expenditure per event, for office-based and outpatient events, by source of payment, 2016
use_expenditures_2016.R: Expenditures for office-based and outpatient visits, by source of payment, 2016
use_expenditures_2019.R: Mean expenditure per person, by event type and source of payment, 2019.
use_race_sex_2016.R: Utilization and expenditures by race and sex, 2016