
Introduction to stminsights

A Shiny Application for Inspecting Structural Topic Models

Topic models are widely used statistical models for reducing the dimensionality of textual data. Although the approach is quantitative in nature, selecting a model and validating its results can be quite labor intensive, as both require qualitative inspection of many documents and terms. This is where stminsights comes in: the package enables interactive validation, interpretation, and visualization of one or several Structural Topic Models (stm). If you are not familiar with structural topic models, the stm package vignette is an excellent starting point.

How to Install

Stminsights can be installed from CRAN by running install.packages('stminsights').

You can also download the latest development version of the app by running devtools::install_github('cschwem2er/stminsights').

For Windows users, installing from GitHub requires a proper setup of Rtools.

How to Use

Preparation

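The main part of stminsights is an interactive shiny application, which requires a .RData file as input. This file should include one or several stm objects, one or several estimateEffect objects, and the out object that was used to fit the stm models.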

As an example, the following code uses the quanteda package to prepare the gadarian corpus for structural topic modeling. Afterwards, two models and their effect estimates are computed, and all objects required for stminsights are stored in stm_gadarian.RData:

library(stm)
library(quanteda)

# prepare data
data <- corpus(gadarian, text_field = 'open.ended.response')
docvars(data)$text <- as.character(data)

data <- tokens(data, remove_punct = TRUE) |>
  tokens_wordstem() |>
  tokens_remove(stopwords('english')) |>
  dfm() |>
  dfm_trim(min_termfreq = 2)

out <- convert(data, to = 'stm')

# fit models and effect estimates
gadarian_3 <- stm(documents = out$documents,
                 vocab = out$vocab,
                 data = out$meta,
                 prevalence = ~ treatment + s(pid_rep),
                 K = 3, verbose = FALSE)
prep_3 <- estimateEffect(1:3 ~ treatment + s(pid_rep), gadarian_3,
                        meta = out$meta)
gadarian_5 <- stm(documents = out$documents,
                 vocab = out$vocab,
                 data = out$meta,
                 prevalence = ~ treatment + s(pid_rep),
                 K = 5, verbose = FALSE)
prep_5 <- estimateEffect(1:5 ~ treatment + s(pid_rep), gadarian_5,
                        meta = out$meta)

# save objects in .RData file
save.image('stm_gadarian.RData')
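
Before launching the application, it can be useful to check what actually ended up in the .RData file. The following is a minimal sketch in base R, not part of stminsights: the helper name inspect_rdata is hypothetical, and the class names 'STM' and 'estimateEffect' used for filtering are assumed to be the classes assigned by the stm package.

```r
# List every object stored in an .RData file together with its primary class.
# The file is loaded into a fresh environment, so the current workspace
# is left untouched.
inspect_rdata <- function(path) {
  env <- new.env(parent = emptyenv())
  obj_names <- load(path, envir = env)  # load() returns the restored names
  data.frame(
    object = obj_names,
    class = vapply(
      obj_names,
      function(nm) class(get(nm, envir = env))[1],
      character(1),
      USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )
}

# Usage: show the models and effect estimates found in the example file
# (class names are an assumption, see the note above).
if (file.exists('stm_gadarian.RData')) {
  contents <- inspect_rdata('stm_gadarian.RData')
  print(contents[contents$class %in% c('STM', 'estimateEffect'), ])
}
```

Because save.image() stores every object in the workspace, a check like this also reveals leftovers that are not needed by the application. If the file should contain only selected objects, base R's save() accepts the object names explicitly.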

Interactive application

After preparing the .RData file, the shiny application can be launched with run_stminsights():

library(stminsights)
run_stminsights()

Hovering over UI elements displays tooltips that assist users in navigating through the application. Stminsights is organized as a dashboard with multiple columns that serve different purposes:

Utility functions

Although the shiny application includes several options for exporting and visualizing the output from structural topic models, users may wish to create their own plots in different formats. For such cases stminsights offers three utility functions that can be used outside of the shiny application: