Data Visualization Workshop

with R Shiny and Shiny Assistant


Dr. Xindi (Cindy) Hu, Assistant Professor at GWSPH

03-21-2025


  github.com/xindyhu/reach-data-viz-workshop

Meet Your Instructor 👩

Xindi (Cindy) Hu, ScD

Assistant Professor, George Washington University

  • Environmental Data Scientist and Environmental Health Researcher
  • 2024- Water, Health, Opportunity Lab at GW
  • 2018-2024 Principal Data Scientist at Mathematica, Inc.
  • 2018 ScD in Environmental Health, Harvard T.H. Chan School of Public Health
  • 2014 MS in Environmental Health, Harvard T.H. Chan School of Public Health
  • 2012 BS in Environmental Science, Peking University, China

Meet your classmates 👥

Join at menti.com using code 87160620

Acknowledgements

Much of the content in this section is from John Rauser’s talk on YouTube and JP Helveston’s course on Exploratory Data Analysis

Special thanks to Sayam Mukesh Palrecha, a Master of Science in Data Science candidate at GWU, class of 2026, for his invaluable assistance in preparing the R Markdown and R Shiny code.

Learning objectives

  • Understand the basics of data visualization
  • Learn how to create interactive data visualizations using R Shiny
  • Learn how to use Shiny Assistant to create interactive data visualizations

Outline for today

  1. How humans see data (current section)
  2. Introduction to R Shiny
  3. Introduction to Shiny Assistant
  4. Next steps

Good data visualization is optimized for our visual-memory system

  • Helps us understand trends and patterns

  • Makes data more accessible to different audiences

  • Useful in decision-making and communication

The power of pre-attentive processing

Count all the 5s in the following image

What is pre-attentive processing?

  • Rapid, automatic processing of visual information before conscious attention kicks in.
  • Happens in under 250 milliseconds.
  • Helps identify key patterns without effort.

Not all pre-attentive features are created equal

Feature Type | Example
Color 🔴🔵 | Different colored objects stand out
Size 📏 | Larger objects draw attention first
Orientation ↗ | A tilted line among vertical lines
Shape ◼️ ⬤ | A square among circles

Where is the red dot?

Pre-Attentive vs. Attentive Processing

Feature | Pre-Attentive | Attentive
Speed | Instant (<250 ms) | Slow, deliberate
Effort | Unconscious | Requires focus
Example | Spotting a red dot in a sea of gray | Solving a math problem

🧠 Designing charts with pre-attentive features helps viewers understand data instantly!

Why Does This Matter for Data Visualization?

  • Viewers process visuals before reading text.
  • Using pre-attentive attributes can:
    • Direct focus to key insights.
    • Reduce cognitive load for interpretation.
    • Make data storytelling more effective.

💡 Good data visualization = Less work for the brain!

Cleveland’s three visual operations of pattern perception

🎯 Detection
      Recognizing that a geometric object encodes a physical value.

🧩 Assembly
      Grouping detected graphical elements into patterns.

📏 Estimation
      Visually assessing the relative magnitude of two or more values. (Focus of today!)

Three levels of estimation

Level | Example
1. Discrimination | X = Y, X != Y
2. Ranking | X < Y, X > Y
3. Ratioing | X / Y = ?

📏 We want to get as far down this list as possible with efficiency and accuracy

The most important measurement should exploit the highest ranked encoding possible

Source: Yau, N. (2013). Data Points: Visualization That Means Something. Wiley.

Introducing the coffee ratings dataset

  • These data contain reviews of 1312 arabica and 28 robusta coffee beans from the Coffee Quality Institute’s trained reviewers.
  • It contains detailed information on coffee samples from different countries, focusing on nine attributes like aroma, flavor, aftertaste, acidity, body, balance, uniformity, cup cleanliness, sweetness.
  • Total cup points measures the overall coffee quality.
Rows: 1,337
Columns: 43
$ total_cup_points      <dbl> 90.58, 89.92, 89.75, 89.00, 88.83, 88.83, 88.75,…
$ species               <chr> "Arabica", "Arabica", "Arabica", "Arabica", "Ara…
$ owner                 <chr> "metad plc", "metad plc", "grounds for health ad…
$ country_of_origin     <chr> "Ethiopia", "Ethiopia", "Guatemala", "Ethiopia",…
$ farm_name             <chr> "metad plc", "metad plc", "san marcos barrancas …
$ lot_number            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ mill                  <chr> "metad plc", "metad plc", NA, "wolensu", "metad …
$ ico_number            <chr> "2014/2015", "2014/2015", NA, NA, "2014/2015", N…
$ company               <chr> "metad agricultural developmet plc", "metad agri…
$ altitude              <chr> "1950-2200", "1950-2200", "1600 - 1800 m", "1800…
$ region                <chr> "guji-hambela", "guji-hambela", NA, "oromia", "g…
$ producer              <chr> "METAD PLC", "METAD PLC", NA, "Yidnekachew Dabes…
$ number_of_bags        <dbl> 300, 300, 5, 320, 300, 100, 100, 300, 300, 50, 3…
$ bag_weight            <chr> "60 kg", "60 kg", "1", "60 kg", "60 kg", "30 kg"…
$ in_country_partner    <chr> "METAD Agricultural Development plc", "METAD Agr…
$ harvest_year          <chr> "2014", "2014", NA, "2014", "2014", "2013", "201…
$ grading_date          <chr> "April 4th, 2015", "April 4th, 2015", "May 31st,…
$ owner_1               <chr> "metad plc", "metad plc", "Grounds for Health Ad…
$ variety               <chr> NA, "Other", "Bourbon", NA, "Other", NA, "Other"…
$ processing_method     <chr> "Washed / Wet", "Washed / Wet", NA, "Natural / D…
$ aroma                 <dbl> 8.67, 8.75, 8.42, 8.17, 8.25, 8.58, 8.42, 8.25, …
$ flavor                <dbl> 8.83, 8.67, 8.50, 8.58, 8.50, 8.42, 8.50, 8.33, …
$ aftertaste            <dbl> 8.67, 8.50, 8.42, 8.42, 8.25, 8.42, 8.33, 8.50, …
$ acidity               <dbl> 8.75, 8.58, 8.42, 8.42, 8.50, 8.50, 8.50, 8.42, …
$ body                  <dbl> 8.50, 8.42, 8.33, 8.50, 8.42, 8.25, 8.25, 8.33, …
$ balance               <dbl> 8.42, 8.42, 8.42, 8.25, 8.33, 8.33, 8.25, 8.50, …
$ uniformity            <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00,…
$ clean_cup             <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, …
$ sweetness             <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00,…
$ cupper_points         <dbl> 8.75, 8.58, 9.25, 8.67, 8.58, 8.33, 8.50, 9.00, …
$ moisture              <dbl> 0.12, 0.12, 0.00, 0.11, 0.12, 0.11, 0.11, 0.03, …
$ category_one_defects  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ quakers               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ color                 <chr> "Green", "Green", NA, "Green", "Green", "Bluish-…
$ category_two_defects  <dbl> 0, 1, 0, 2, 2, 1, 0, 0, 0, 4, 1, 0, 0, 2, 2, 0, …
$ expiration            <chr> "April 3rd, 2016", "April 3rd, 2016", "May 31st,…
$ certification_body    <chr> "METAD Agricultural Development plc", "METAD Agr…
$ certification_address <chr> "309fcf77415a3661ae83e027f7e5f05dad786e44", "309…
$ certification_contact <chr> "19fef5a731de2db57d16da10287413f5f99bc2dd", "19f…
$ unit_of_measurement   <chr> "m", "m", "m", "m", "m", "m", "m", "m", "m", "m"…
$ altitude_low_meters   <dbl> 1950.0, 1950.0, 1600.0, 1800.0, 1950.0, NA, NA, …
$ altitude_high_meters  <dbl> 2200.0, 2200.0, 1800.0, 2200.0, 2200.0, NA, NA, …
$ altitude_mean_meters  <dbl> 2075.0, 2075.0, 1700.0, 2000.0, 2075.0, NA, NA, …
Rows: 19
Columns: 3
$ country     <fct> "Ethiopia", "Kenya", "Uganda", "Colombia", "El Salvador", …
$ mean_rating <dbl> 85.48409, 84.30960, 83.45194, 83.10656, 83.05286, 82.92750…
$ n           <int> 44, 25, 36, 183, 21, 16, 51, 32, 20, 132, 40, 75, 181, 73,…

Let’s start from the bottom of the list

  1. Position on a common scale
  2. Position on non-aligned scales
  3. Length
  4. Angle
  5. Area
  6. Volume <> Density <> Color saturation
  7. Color hue

Use color hue to visualize average ratings

Easy: which has higher ratings, Kenya or Indonesia?

Use color hue to visualize average ratings

Hard: which has higher ratings, Indonesia or Costa Rica?

What about now?

Observation: alphabetical ordering of a categorical variable is almost never useful; re-rank by the value you are showing.
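A minimal sketch of re-ranking in R, assuming the first five countries and mean ratings from the summary table shown earlier. The data frame name `ratings` and the choice of a dot plot are illustrative, not taken from the workshop code.

```r
library(ggplot2)

# Mean ratings copied from the country summary shown earlier
ratings <- data.frame(
  country = c("Ethiopia", "Kenya", "Uganda", "Colombia", "El Salvador"),
  mean_rating = c(85.48409, 84.30960, 83.45194, 83.10656, 83.05286)
)

# Without re-ranking, a character column is plotted in alphabetical order
alphabetical_order <- sort(ratings$country)

# reorder() converts country to a factor whose levels follow mean_rating,
# from lowest to highest
ratings$country <- reorder(ratings$country, ratings$mean_rating)
ranked_order <- levels(ratings$country)

# The highest-rated country now appears at the top of the y-axis
p <- ggplot(ratings, aes(x = mean_rating, y = country)) +
  geom_point(size = 3) +
  labs(x = "Mean rating", y = NULL)
```

Printing `p` draws the chart. Wrapping the variable in `reorder()` inside `aes()` works as well and leaves the data frame unchanged.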

Move up one level to color saturation

Use color saturation to visualize average ratings

No legend?

No problem.

Because color saturation has natural ordering.

Color saturation is easier to quantify

The ratio between Mexico and the United States is…

2 or 3

Moving down to the third level of estimation

Move up one level to area

This is a weird graph, but still informative

Move up one level to angle

Use angle to visualize coffee bean varieties

Pie chart uses angle to encode quantitative information

Don’t do this!

Or this!

Pie chart uses angle to encode quantitative information

This is fine

For categorical data, no more than 6 colors is best.

(Source: European Environment Agency)
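One way to stay within that limit is to keep the most frequent categories and lump the rest into "Other". The sketch below assumes the forcats package and uses ggplot2's built-in `mpg` data as a stand-in, since the coffee variety counts are not listed in these slides. The cutoff of five named categories is an illustrative choice.

```r
library(ggplot2)
library(forcats)

# Stand-in data: car manufacturers from ggplot2's built-in mpg dataset
manufacturer <- factor(mpg$manufacturer)

# Keep the 5 most frequent categories and collapse the rest into "Other"
lumped <- fct_lump_n(manufacturer, n = 5)

# Bar chart of the lumped categories, most frequent at the top
p <- ggplot(
  data.frame(manufacturer = lumped),
  aes(y = fct_rev(fct_infreq(manufacturer)), fill = manufacturer)
) +
  geom_bar(show.legend = FALSE) +
  labs(x = "Count", y = NULL)
```

The same pattern should carry over to the coffee data by passing the variety column to `fct_lump_n()`.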

We are so close!

  1. Position on a common scale
  2. Position on non-aligned scales
  3. Length
  4. Angle
  5. Area
  6. Volume <> Density <> Color saturation
  7. Color hue

Wait, I thought there was some difference…

The start-at-zero rule

How to Lie with Statistics (1954)

  • Huff argues that truncating the y-axis can exaggerate differences and mislead the viewer.
  • It creates a false impression of dramatic change where the actual variation is small.

The Visual Display of Quantitative Information (1983)

  • Tufte prioritizes data density and the detection of subtle patterns.
  • He argues that starting at zero can waste valuable space, obscuring meaningful variations.
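Both arguments can be put into numbers. The sketch below assumes three mean ratings from the country summary shown earlier; the truncation point of 80 is a hypothetical choice for illustration. Bars encode value as length, so they keep a zero baseline, while points encode value as position, so the axis can zoom in on the data.

```r
library(ggplot2)

# Mean ratings copied from the country summary shown earlier
ratings <- data.frame(
  country = c("Ethiopia", "Kenya", "Uganda"),
  mean_rating = c(85.48409, 84.30960, 83.45194)
)

# Bars: length encodes the value, so the axis starts at zero (the geom_col default)
p_bar <- ggplot(ratings, aes(x = mean_rating, y = reorder(country, mean_rating))) +
  geom_col() +
  labs(x = "Mean rating", y = NULL)

# Points: position encodes the value, so the axis can fit the range of the data
p_dot <- ggplot(ratings, aes(x = mean_rating, y = reorder(country, mean_rating))) +
  geom_point(size = 3) +
  labs(x = "Mean rating", y = NULL)

# Apparent Ethiopia-to-Uganda ratio under each baseline
ratio_from_zero <- ratings$mean_rating[1] / ratings$mean_rating[3]              # about 1.02
ratio_from_80 <- (ratings$mean_rating[1] - 80) / (ratings$mean_rating[3] - 80)  # about 1.59
```

With a zero baseline the two bars differ in length by about 2%, which is hard to see. If the bars were cut off at 80, Ethiopia's bar would look roughly 1.6 times as long as Uganda's, which overstates the gap.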

Combined MMR vaccination rate, 1994/95 to 2014/15, England

Vaccination levels are consistently high over the last 20 years. So there’s nothing to worry about, right?

Take another look: the axis doesn’t start at zero

An optional axis-break symbol can help draw attention to the fact that the axis doesn’t start at zero. Truncating the axis swaps the message from a “don’t worry” version to a “there is still work to be done” version.

Position, but not a common scale

Position, and a common scale

Re-ranking categorical variables still matters!

Climate stripes: a discussion

Climate stripes are a popular visualization of global temperature trends. They use color hue to represent temperature anomalies.

Source: Ed Hawkins/showyourstripes.info.

Discuss:

  • What are the strengths and weaknesses of the climate stripes visualization?
  • Why do you think the climate stripes have become so popular?
  • How does the emotional impact of the visualization affect its effectiveness?
  • When is it ok to sacrifice data precision for impact?

Outline for today

  1. How humans see data
  2. Introduction to R Shiny (current section)
  3. Introduction to Shiny Assistant
  4. Next steps

What is R Shiny?

  • Shiny is an R package that enables the creation of interactive web applications directly from R.
  • Ideal for data visualization, dashboards, and dynamic reports.
  • No extensive web development experience required.

Why Use Shiny?

From this

(Hu et al, 2016)

To this (Liddie et al, 2023)

Basic Structure of a Shiny App

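A Shiny app comprises two main components: a user interface object (ui) that defines what users see, and a server function that defines how the outputs are computed.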

Your first shiny app

Reactivity in Shiny

  • Reactivity is a core concept in Shiny, where changes in user inputs automatically update the outputs.
  • The server function watches for input changes and updates outputs without requiring explicit user intervention.

A familiar example in spreadsheets

In other words, the output reacts to changes in the input.
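The spreadsheet idea can be sketched in a plain R session. The names `a`, `b` and `total` are hypothetical stand-ins for cells A1, B1 and a formula cell `= A1 + B1`. Outside a running app, `isolate()` is used here to read reactive values from the console.

```r
library(shiny)

# Two "cells" that hold values
a <- reactiveVal(1)
b <- reactiveVal(2)

# A "formula cell" that depends on both
total <- reactive(a() + b())

# Read the formula cell
first_total <- isolate(total())   # 1 + 2

# Change one input cell; the formula cell is now out of date
a(10)

# Reading it again recomputes with the new input
second_total <- isolate(total())  # 10 + 2
```

Inside a Shiny app the same dependency tracking happens without `isolate()`, because render functions already run in a reactive context.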

Back to the toy example

# Global variables can go here
n <- 200

# Define the UI
ui <- bootstrapPage(
  numericInput(inputId = 'n', label = 'Number of obs', value = n),
  plotOutput(outputId = 'myplot')
)

# Define the server code
server <- function(input, output) {
  output$myplot <- renderPlot({
    hist(runif(input$n))
  })
}

# Return a Shiny app object
shinyApp(ui = ui, server = server)
  • numericInput('n', 'Number of obs', n)
    • Creates a numeric input box where users enter a number.
    • 'n' → The input ID (referenced in the server).
    • 'Number of obs' → The label displayed next to the input field.
  • plotOutput('myplot')
    • Reserves space in the UI to display a plot.
    • 'myplot' → The output ID (referenced in the server function).

Back to the toy example

# Global variables can go here
n <- 200

# Define the UI
ui <- bootstrapPage(
  numericInput(inputId = 'n', label = 'Number of obs', value = n),
  plotOutput(outputId = 'myplot')
)

# Define the server code
server <- function(input, output) {
  output$myplot <- renderPlot({
    hist(runif(input$n))
  })
}

# Return a Shiny app object
shinyApp(ui = ui, server = server)
  • output$myplot <- renderPlot({...})
    • Assigns a dynamically generated plot to the UI element plotOutput('myplot').
    • Uses renderPlot(), which is a Shiny function for rendering reactive plots.
  • hist(runif(input$n))
    • Generates a histogram of random uniform numbers.
    • runif(input$n): Produces n random numbers from a uniform distribution between 0 and 1.
    • Each time input$n changes, the histogram updates automatically.

UI Inputs

Rendering functions

Diving deeper into reactive programming

Shiny has three kinds of reactive objects:

  • A reactive input is a user input that comes through the browser interface
  • A reactive output is something that appears in the user’s browser window, such as a plot or a table
  • A reactive expression is a component between an input and an output

1. ui: Add a UI element for the user to select which country of origin they want to plot with selectInput().

selectInput(
  inputId = "country_filter",
  label = "Select country:",
  choices = c("All", sort(unique(coffee_clean$country_of_origin))),
  selected = "All"
)
  • We define an inputId that we’ll use to refer to the input element later in the app
  • We come up with a user facing label
  • We specify the choices users can select from, as well as a default choice

2. server: Filter the data for the chosen country and save the new data frame as a reactive expression.

filtered_data <- reactive({
  data <- coffee_clean
  if (input$country_filter != "All") {
    data <- data %>% filter(country_of_origin == input$country_filter)
  }
  return(data)
})
  • This creates a cached expression that knows it is out of date when its input changes
  • We check the necessity of filtering based on the user input
  • We surround the expression with curly braces

3. server: Use filtered_data (which is reactive) for plotting.

  output$bean_variety <- renderPlotly({
    req(nrow(filtered_data()) > 0)
    p <- ggplot(filtered_data(), aes(x = n, y = reorder(variety, n), fill = n)) +
      geom_col()
    ggplotly(p)
  })
  • This creates a plot using the reactive expression we defined earlier
  • filtered_data is called with (), like a function, to retrieve the reactive expression’s current value
  • filtered_data() is a cached expression, only rerun when inputs change
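The three steps can be assembled into one small app. This is a sketch under stated assumptions, not the workshop app: `coffee_sample` holds four rows copied from the glimpse() output shown earlier, `filter_by_country()` is a hypothetical helper, a counting step is added because the sample has no `n` column, and `renderPlot()` replaces `renderPlotly()` to avoid the plotly dependency.

```r
library(shiny)
library(dplyr)
library(ggplot2)

# Four rows copied from the glimpse() output shown earlier.
# The workshop app uses the full coffee_clean data instead.
coffee_sample <- data.frame(
  country_of_origin = c("Ethiopia", "Ethiopia", "Guatemala", "Ethiopia"),
  variety = c(NA, "Other", "Bourbon", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the filtering logic as a plain function
filter_by_country <- function(data, country) {
  if (country != "All") {
    data <- data %>% filter(country_of_origin == country)
  }
  data
}

# Step 1: the input element and a placeholder for the plot
ui <- fluidPage(
  selectInput(
    inputId = "country_filter",
    label = "Select country:",
    choices = c("All", sort(unique(coffee_sample$country_of_origin))),
    selected = "All"
  ),
  plotOutput("bean_variety")
)

server <- function(input, output, session) {
  # Step 2: reactive expression that depends on input$country_filter
  filtered_data <- reactive({
    filter_by_country(coffee_sample, input$country_filter)
  })

  # Step 3: count varieties and plot; missing varieties are labelled "Unknown"
  output$bean_variety <- renderPlot({
    req(nrow(filtered_data()) > 0)
    counts <- filtered_data() %>%
      mutate(variety = ifelse(is.na(variety), "Unknown", variety)) %>%
      count(variety)
    ggplot(counts, aes(x = n, y = reorder(variety, n), fill = n)) +
      geom_col()
  })
}

# Build the app object; launch it with runApp(app) in an interactive session
app <- shinyApp(ui = ui, server = server)
```

Keeping the filter in a plain function is a design choice: it can be checked in the console, for example `filter_by_country(coffee_sample, "Ethiopia")`, without starting the app.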

Functions vs. reactives

While functions and reactives serve similar goals in terms of not repeating oneself, they differ in implementation.

  • Each time you call a function, R will evaluate it.

  • Reactive expressions, however, are lazy and cached: they run only when their value is requested, and re-run only after one of their inputs has changed.
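The difference can be checked by counting evaluations. In this sketch the names `square_fn`, `square_rx` and the two counters are hypothetical, and `isolate()` is used only so the reactives can be read from a plain R session.

```r
library(shiny)

fn_calls <- 0
rx_calls <- 0
x <- reactiveVal(2)

# A regular function: its body runs on every call
square_fn <- function() {
  fn_calls <<- fn_calls + 1
  isolate(x())^2
}

# A reactive expression: its body runs once, then the value is cached
square_rx <- reactive({
  rx_calls <<- rx_calls + 1
  x()^2
})

# Call each one twice with the same input
square_fn()
square_fn()
isolate(square_rx())
isolate(square_rx())
calls_before_change <- c(fn = fn_calls, rx = rx_calls)  # fn = 2, rx = 1

# Change the input, then ask the reactive for its value again
x(3)
value_after_change <- isolate(square_rx())              # 9
calls_after_change <- c(fn = fn_calls, rx = rx_calls)   # fn = 2, rx = 2
```

The function body ran twice for two calls, while the reactive ran once and only ran again after `x` changed.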

Switch to RStudio!

Outline for today

  1. How humans see data
  2. Introduction to R Shiny
  3. Introduction to Shiny Assistant (current section)
  4. Next steps

What is Shiny Assistant?

https://gallery.shinyapps.io/assistant/

Use Shiny Assistant to recreate the shiny app we just saw

Prompt

Use the tidytuesday coffee ratings dataset, located at https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-07/coffee_ratings.csv to develop a dashboard.

The title should be “Coffee Data Dashboard”, use the shinydashboard format.

On the left, there are three inputs, a drop down menu for species, a checkbox for quality category, and a multiselect box for country, pre-select “Ethiopia” and “India”.

On the right, there should be four plotly graphs. First one is the number of bean variety. Second is the average flavor profile shown in a radar chart. Third is a box plot showing coffee ratings by processing methods. Fourth is a choropleth map showing average coffee ratings by country.

Your turn!

💡 Try one of these built-in datasets to explore with Shiny Assistant!

Dataset | Description
mtcars 🚗 | Car performance data (mpg, cylinders, hp)
iris 🌸 | Iris flower measurements (sepal, petal, species)
diamonds 💎 | Diamond pricing (carat, cut, price, etc.)
faithful 🌋 | Old Faithful geyser eruptions (duration, waiting time)
airquality 🌍 | New York air quality data (Ozone, Temp, Wind)
ToothGrowth 🦷 | Vitamin C & tooth growth in guinea pigs

🛠️ Try it out yourself!

💬 What did you learn? What worked well? Any surprises?

Outline for today

  1. How humans see data
  2. Introduction to R Shiny
  3. Introduction to Shiny Assistant
  4. Next steps (current section)

You can deploy an app for free on shinyapps.io

Follow this guide

  1. Create a shinyapps.io account
  2. Open your tokens, click “Show”, copy the code
  3. Run the code in RStudio
  4. Deploy your app:
library(rsconnect)
deployApp()

How much time do you have?

  • 10 min: Print out this Shiny for R cheatsheet
  • 2.5 hrs: Follow this Posit tutorial
  • 1 week: If you are an EOH student, participate in the 2025 EOH Data Visualization Competition!
  • 6 weeks: Sign up for PUBH6199 Visualizing Data with R this summer!
  • Lifetime: Check out resources like the Shiny Gallery, TidyTuesday, and Mastering Shiny book

2025 EOH Data Visualization Competition

REACH Climate and Health Research Fellowship

PUBH6199 Visualizing Data with R

Further resources

Thank you!

Slides created via Quarto

The template comes from Tom Mock