
Abstract

ShinyQDA is an R package and Shiny application designed to fill the gap between traditional qualitative data analysis (QDA) and natural language processing (NLP) techniques such as tokenization, sentiment analysis, and topic modeling. The QDA features include the ability to code text using a flexible coding system as well as rubrics. Tools for evaluating inter-rater reliability, code frequencies, and code co-occurrence are provided. Users can leverage the NLP features to augment the QDA, including the ability to see how these methods apply to individual text documents as well as across the entire dataset. Utility functions are provided for working with the data in R when moving beyond the features available in the Shiny application.

Installation

You can install the ShinyQDA package from Github using the remotes package:

remotes::install_github('jbryer/ShinyQDA', dependencies = c('Depends', 'Imports', 'Suggests'))

Eventually you will be able to install from CRAN using the following command:

install.packages('ShinyQDA', dependencies = c('Depends', 'Imports', 'Suggests'))

Much of the ShinyQDA application is implemented using Shiny modules. The package dependencies for each module are not installed with the default dependencies values (hence specifying 'Suggests' in the commands above). Although a minor inconvenience at installation time, this ensures that ShinyQDA minimizes its number of strict dependencies, which is encouraged by the CRAN check process. If any required packages are not available, the module will display a message in the Shiny application listing the missing packages.
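
A minimal sketch of the kind of guard a module can use follows. This is illustrative only, not ShinyQDA's internal implementation, and the package names are hypothetical examples:

# Check which of a module's dependencies are missing (illustrative sketch;
# 'wordcloud2' and 'topicmodels' are hypothetical example dependencies).
required_pkgs <- c('wordcloud2', 'topicmodels')
missing_pkgs <- required_pkgs[!sapply(required_pkgs, requireNamespace, quietly = TRUE)]
if (length(missing_pkgs) > 0) {
    message('Please install the following packages to use this module: ',
            paste(missing_pkgs, collapse = ', '))
}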

Creating a New Project

To utilize the ShinyQDA application you need to initialize it with some data. At minimum you need a data frame with two columns: a unique identifier and the text data, although you can pass along any other columns and they will be preserved for analysis later. The following example is a small subset from the Diagnostic Assessment and Achievement of College Skills (DAACS), which was analyzed for the paper titled College students’ responses to automated feedback on self-regulated learning (Yu et al., 2023). DAACS assesses key college skills in self-regulated learning, writing, mathematics, and reading comprehension. For the writing assessment, students are asked to identify self-regulated learning strategies they can improve and how they will use them to be successful. Students received scores from a machine learning algorithm and those scores are included to demonstrate the rubric features of ShinyQDA. The daacs_data data frame includes 17 essays along with their writing and self-regulation scores from two coders/raters. This is a subset of the 100 essays coded by five people for the Yu et al. (2023) paper.

You can run this application using the following command.

shiny::runApp(paste0(find.package('ShinyQDA'), '/daacs_demo/'))
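
Equivalently, the base R function system.file() resolves the same path:

shiny::runApp(system.file('daacs_demo', package = 'ShinyQDA'))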

The remainder of this document outlines the steps for setting up ShinyQDA with pre-existing data.

library(ShinyQDA)
data("daacs_data", package = 'ShinyQDA')

ShinyQDA::new_app(name = 'daacs_demo',
                  dir = getwd(),
                  qda_data = daacs_data,
                  id_column = 'id',
                  text_column = 'qda_text',
                  initialize_sentiment_dictionaries = TRUE,
                  run_app = FALSE)

This command will create a directory at dir/name/ (here, daacs_demo/ in the current working directory), copy an app.R Shiny script into it, and initialize a SQLite database with the contents of the qda_data data frame parameter. You can now run this application and begin working with your qualitative data. The following sections will discuss how we can populate the ShinyQDA application with pre-existing data.
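
To confirm the setup, list the contents of the new directory; it should contain at least the app.R script and the qda.sqlite database:

list.files('daacs_demo')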

Note that if initialize_sentiment_dictionaries = TRUE you will be prompted four times to download the four sentiment dictionaries. This is required if you intend to do any sentiment analysis. Please check the licenses for each of these dictionaries before using them in production.
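
If you prefer to handle the downloads ahead of time, the tidytext package can fetch each lexicon directly. Based on the references below, we assume the four dictionaries are the AFINN, Bing, NRC, and Loughran-McDonald lexicons; this is an inference, not documented ShinyQDA behavior:

library(tidytext)
# Most of these prompt for a download/license agreement on first use
# (via the textdata package).
get_sentiments('afinn')     # Nielsen (2011)
get_sentiments('bing')      # Hu & Liu (2004)
get_sentiments('nrc')       # Mohammad & Turney (2013)
get_sentiments('loughran')  # Loughran & McDonald (2011)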

Setting up coding and scoring schemes

The ShinyQDA package uses a SQLite database to store all data. The qda() function is a wrapper around the database, providing functions to select, insert, and delete data using R functions (i.e. no SQL knowledge required!). The following code will create the qda object and load data from the package containing codings, text responses, and rubric scores, respectively.

daacs_qda <- ShinyQDA::qda('daacs_demo/qda.sqlite')

data("daacs_codings", package = 'ShinyQDA')
data("daacs_text_responses", package = 'ShinyQDA')
data("daacs_rubric", package = 'ShinyQDA')

Text Questions

Text questions are questions you want each coder to respond to about each text document. That is, there is one set of questions for each text document. Going to the Setup -> Questions menu will allow you to add questions through the application. ShinyQDA currently supports checkbox questions, radio button questions, and open text fields. These questions show up on the left-hand side when in the coding view.

Screenshot of adding text questions

Alternatively, you can add the text questions manually through the qda object.

daacs_qda$add_text_question(stem = 'Non-responsive to prompt',
                            type = 'checkbox')
daacs_qda$add_text_question(stem = 'Additional comments about the text',
                            type = 'text')

You can get a data frame with all the specified text questions using the get_text_questions() function.

daacs_qda$get_text_questions()

Code Questions

Code questions are similar to text questions except there can be multiple responses per text document. When a user highlights some text and clicks the Add Code button, a dialog box will appear. The first input is a place to enter individual codes (see the next section), followed by an arbitrary set of questions defined either in the Setup -> Questions UI or using the following commands.

daacs_qda$add_code_question(
    stem = 'Content of Essay',
    type = 'checkbox',
    options = c('Definition of concepts',
                'Interpretation_strength',
                'Interpretation_weakness',
                'Interpretation_mixed',
                'Interpretation_medium',
                'Interpretation_strategies awareness',
                'Interpretation_strategies commitment',
                'Interpretation_judgments')
)
daacs_qda$add_code_question(
    stem = 'Does this text represent',
    type = 'checkbox',
    options = c('Strength',
                'Weakness',
                'Mixed',
                'Medium',
                'Strategies Awareness',
                'Strategies Commitment')
)
daacs_qda$add_code_question(
    stem = 'Judgment about survey or feedback',
    type = 'checkbox',
    options = c('Agree',
                'Disagree',
                'Valuable/helpful/useful',
                'Not valuable/helpful/useful',
                'Other')
)
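
As with text questions, you can retrieve the defined code questions as a data frame:

daacs_qda$get_code_questions()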

Codes

Codes are the core functionality for qualitative data analysis. You can define your codes in the application by going to the Setup -> Codebook page. Alternatively, you can add codes through the qda object.

categories <- list(
    'metacognition' = c('planning', 'monitoring', 'evaluation'),
    'motivation' = c('mindset', 'test anxiety', 'mastery orientation'),
    'self efficacy' = c('self efficacy for online learning',
                        'self efficacy for writing',
                        'self efficacy for mathematics',
                        'self efficacy for reading'),
    'strategies' = c('managing environment', 'understanding', 'managing time', 'help seeking'),
    'procrastination' = c()
)
codes <- c(names(categories), unlist(categories)) |> unname()
daacs_qda$add_codes(codes)

The following chunk will loop through the categories, setting the parent parameter for each child code to maintain the coding hierarchy defined in the categories list above.

for(category in names(categories)) {
    for(code in categories[[category]]) {
        daacs_qda$update_code(code, parent = category)
    }
}

Rubrics

In addition to coding text documents, you can specify a rubric that coders/raters can use to evaluate each text document. The daacs_rubric data frame contains the rubric used for scoring (note that students see a different rubric designed to provide feedback in a positive framework).

daacs_qda$add_rubric(
    rubric_name = 'daacs',
    description = 'DAACS Scoring Rubric',
    rubric = daacs_rubric
)
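
The import loop in the Add rubric scores section below relies on the SubCriteria column of daacs_rubric matching score column names in daacs_data. You can list the sub-criteria with:

unique(daacs_rubric$SubCriteria)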

Importing existing data

The following sections provide sample code showing how to prepopulate the ShinyQDA application with pre-existing codings and scores.

Add existing codings

daacs_code_questions <- daacs_qda$get_code_questions()
for(i in seq_len(nrow(daacs_codings))) {
    coder <- daacs_codings[i,]$coder
    # Locate the coded passage within the full text so the start/end
    # positions can be stored with the coding.
    thetext <- daacs_data[daacs_data$id == daacs_codings[i,]$id,]$qda_text
    pos <- gregexpr(daacs_codings[i,]$text, thetext, fixed = TRUE)[[1]]
    pos <- pos[1] # if the passage occurs more than once, use the first match
    coding_id <- daacs_qda$add_coding(
        id = daacs_codings[i,]$id,
        text = daacs_codings[i,]$text,
        codes = strsplit(daacs_codings[i,]$codes, ';')[[1]],
        start = pos,
        end = pos + nchar(daacs_codings[i,]$text),
        coder = coder
    )
    # Add responses for any code questions that have a matching column
    # in the daacs_codings data frame.
    for(j in seq_len(nrow(daacs_code_questions))) {
        stem <- daacs_code_questions[j,]$stem
        if(stem %in% names(daacs_codings)) {
            val <- daacs_codings[i,stem]
            daacs_qda$add_code_question_response(coding_id = coding_id,
                                                 stem = stem,
                                                 answer = val,
                                                 coder = coder)
        }
    }
}

Add existing text questions

for(i in seq_len(nrow(daacs_text_responses))) {
    daacs_qda$add_text_question_response(
        id = daacs_text_responses[i,]$id,
        stem = daacs_text_responses[i,]$stem,
        answer = daacs_text_responses[i,]$answer,
        coder = daacs_text_responses[i,]$coder
    )
}

Add rubric scores

# Rubric scores in daacs_data are stored in columns named after the rubric
# sub-criteria; all of the demo scores are attributed to 'Rater1'.
for(i in seq_len(nrow(daacs_data))) {
    for(j in unique(daacs_rubric$SubCriteria)) {
        daacs_qda$add_rubric_response(
            rubric_name = 'daacs',
            qda_id = daacs_data[i,]$id,
            coder = 'Rater1',
            criteria = j,
            score = daacs_data[i,j,drop=TRUE]
        )
    }
}
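
With the codings, question responses, and rubric scores imported, you can launch the application against the populated database:

shiny::runApp('daacs_demo')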

References

Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 168–177. https://doi.org/10.1145/1014052.1014073
Hvitfeldt, E., & Silge, J. (2022). Text analysis in R. CRC Press.
Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35–65. https://doi.org/10.1111/j.1540-6261.2010.01625.x
Mohammad, S. M., & Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. Computational Intelligence, 29(3), 436–465. https://doi.org/10.1111/j.1467-8640.2012.00460.x
Nielsen, F. Å. (2011). A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. CoRR, abs/1103.2903. http://arxiv.org/abs/1103.2903
Schweinberger, M. (2023). Topic modeling with R (2023.02.05 ed.). The University of Queensland, Australia. School of Languages and Cultures.
Silge, J., & Robinson, D. (2022). Text mining with R. O’Reilly.
Yu, E., Lui, A. M., Andrade, H., Franklin, D., Akhmedjanova, D., & Bryer, J. (2023). College students’ responses to automated feedback on self-regulated learning. American Educational Research Association.