Data Visualization Workshop

with R shiny and Shiny Assistant

Dr. Xindi (Cindy) Hu, Assistant Professor at GWSPH

03-21-2025

github.com/xindyhu/reach-data-viz-workshop

Meet Your Instructor 👩

Xindi (Cindy) Hu, ScD

Assistant Professor, George Washington University

Environmental Data Scientist and Environmental Health Researcher
2024- Water, Health, Opportunity Lab at GW
2018-2024 Principal Data Scientist at Mathematica, Inc.
2018 ScD in Environmental Health, Harvard T.H. Chan School of Public Health
2014 MS in Environmental Health, Harvard T.H. Chan School of Public Health
2012 BS in Environmental Science, Peking University, China

Meet your classmates 👥

Join at menti.com using code 87160620

Acknowledgements

Much of the content in this section is from John Rauser’s talk on YouTube and JP Helveston’s course on Exploratory Data Analysis

A special thanks to Sayam Mukesh Palrecha, a Master of Science in Data Science candidate at GWU, class of 2026, for his invaluable assistance with preparing the R markdown and the R shiny code.

Learning objectives

Understand the basics of data visualization
Learn how to create interactive data visualization using R Shiny
Learn how to use Shiny Assistant to create interactive data visualization

Outline for today

1. How humans see data

Introduction to R Shiny
Introduction to Shiny Assistant
Next-steps

Good data visualization is optimized for our visual-memory system

Helps us understand trends and patterns
Makes data more accessible to different audiences
Useful in decision-making and communication

The power of pre-attentive processing

Count all the 5s in the following image

What is pre-attentive processing?

Rapid, automatic processing of visual information before conscious attention kicks in.
Happens in under 250 milliseconds.
Helps identify key patterns without effort.

Not all pre-attentive features are created equal

Feature Type	Example
Color	🔴🔵 Different colored objects stand out
Size	📏 Larger objects draw attention first
Orientation	↗ A tilted line among vertical lines
Shape	◼️ ⬤ A square among circles

Where is the red dot?

Pre-Attentive vs. Attentive Processing

Feature	Pre-Attentive	Attentive
Speed	Instant (<250 ms)	Slow, deliberate
Effort	Unconscious	Requires focus
Example	Spotting a red dot in a sea of gray	Solving a math problem

🧠 Designing charts with pre-attentive features helps viewers understand data instantly!

Why Does This Matter for Data Visualization?

Viewers process visuals before reading text.
Using pre-attentive attributes can:
- Direct focus to key insights.
- Reduce cognitive load for interpretation.
- Make data storytelling more effective.

💡 Good data visualization = Less work for the brain!
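
A minimal sketch of the "red dot in a sea of gray" idea, assuming ggplot2 is installed. The data are random points generated only for illustration, and the names `pts` and `p_red_dot` are made up for this example.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed so the layout is reproducible

# 200 random points; exactly one is flagged as the point of interest.
# The flagged row is last so it is drawn on top of the others.
pts <- data.frame(x = runif(200), y = runif(200), highlight = FALSE)
pts$highlight[nrow(pts)] <- TRUE

p_red_dot <- ggplot(pts, aes(x = x, y = y, color = highlight)) +
  geom_point(size = 3) +
  # Gray for context, red for the one point the eye should find
  scale_color_manual(
    values = c(`FALSE` = "gray70", `TRUE` = "red"),
    guide = "none"
  ) +
  theme_void()

print(p_red_dot)
```

Color is the only attribute that differs here, so the highlighted point should stand out without any searching.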

Cleveland’s three visual operations of pattern perception

🎯 Detection
Recognizing that a geometric object encodes a physical value.

🧩 Assembly
Grouping detected graphical elements into patterns.

📏 Estimation

Visually assessing the relative magnitude of two or more values. (Focus of today!)

Three levels of estimation

Level	Example
1. Discrimination	X = Y X != Y
2. Ranking	X < Y X > Y
3. Ratioing	X / Y = ?

📏 We want to get as far down this list as possible with efficiency and accuracy

The most important measurement should exploit the highest-ranked encoding possible

Source: Yau, N. (2013). Data Points: Visualization That Means Something. Wiley.

Introducing the coffee ratings dataset

These data contain reviews of 1312 arabica and 28 robusta coffee beans from the Coffee Quality Institute’s trained reviewers.
They contain detailed information on coffee samples from different countries, focusing on nine attributes: aroma, flavor, aftertaste, acidity, body, balance, uniformity, cup cleanliness, and sweetness.
Total cup points measures the overall coffee quality.

Rows: 1,337
Columns: 43
$ total_cup_points      <dbl> 90.58, 89.92, 89.75, 89.00, 88.83, 88.83, 88.75,…
$ species               <chr> "Arabica", "Arabica", "Arabica", "Arabica", "Ara…
$ owner                 <chr> "metad plc", "metad plc", "grounds for health ad…
$ country_of_origin     <chr> "Ethiopia", "Ethiopia", "Guatemala", "Ethiopia",…
$ farm_name             <chr> "metad plc", "metad plc", "san marcos barrancas …
$ lot_number            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ mill                  <chr> "metad plc", "metad plc", NA, "wolensu", "metad …
$ ico_number            <chr> "2014/2015", "2014/2015", NA, NA, "2014/2015", N…
$ company               <chr> "metad agricultural developmet plc", "metad agri…
$ altitude              <chr> "1950-2200", "1950-2200", "1600 - 1800 m", "1800…
$ region                <chr> "guji-hambela", "guji-hambela", NA, "oromia", "g…
$ producer              <chr> "METAD PLC", "METAD PLC", NA, "Yidnekachew Dabes…
$ number_of_bags        <dbl> 300, 300, 5, 320, 300, 100, 100, 300, 300, 50, 3…
$ bag_weight            <chr> "60 kg", "60 kg", "1", "60 kg", "60 kg", "30 kg"…
$ in_country_partner    <chr> "METAD Agricultural Development plc", "METAD Agr…
$ harvest_year          <chr> "2014", "2014", NA, "2014", "2014", "2013", "201…
$ grading_date          <chr> "April 4th, 2015", "April 4th, 2015", "May 31st,…
$ owner_1               <chr> "metad plc", "metad plc", "Grounds for Health Ad…
$ variety               <chr> NA, "Other", "Bourbon", NA, "Other", NA, "Other"…
$ processing_method     <chr> "Washed / Wet", "Washed / Wet", NA, "Natural / D…
$ aroma                 <dbl> 8.67, 8.75, 8.42, 8.17, 8.25, 8.58, 8.42, 8.25, …
$ flavor                <dbl> 8.83, 8.67, 8.50, 8.58, 8.50, 8.42, 8.50, 8.33, …
$ aftertaste            <dbl> 8.67, 8.50, 8.42, 8.42, 8.25, 8.42, 8.33, 8.50, …
$ acidity               <dbl> 8.75, 8.58, 8.42, 8.42, 8.50, 8.50, 8.50, 8.42, …
$ body                  <dbl> 8.50, 8.42, 8.33, 8.50, 8.42, 8.25, 8.25, 8.33, …
$ balance               <dbl> 8.42, 8.42, 8.42, 8.25, 8.33, 8.33, 8.25, 8.50, …
$ uniformity            <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00,…
$ clean_cup             <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, …
$ sweetness             <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00,…
$ cupper_points         <dbl> 8.75, 8.58, 9.25, 8.67, 8.58, 8.33, 8.50, 9.00, …
$ moisture              <dbl> 0.12, 0.12, 0.00, 0.11, 0.12, 0.11, 0.11, 0.03, …
$ category_one_defects  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ quakers               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ color                 <chr> "Green", "Green", NA, "Green", "Green", "Bluish-…
$ category_two_defects  <dbl> 0, 1, 0, 2, 2, 1, 0, 0, 0, 4, 1, 0, 0, 2, 2, 0, …
$ expiration            <chr> "April 3rd, 2016", "April 3rd, 2016", "May 31st,…
$ certification_body    <chr> "METAD Agricultural Development plc", "METAD Agr…
$ certification_address <chr> "309fcf77415a3661ae83e027f7e5f05dad786e44", "309…
$ certification_contact <chr> "19fef5a731de2db57d16da10287413f5f99bc2dd", "19f…
$ unit_of_measurement   <chr> "m", "m", "m", "m", "m", "m", "m", "m", "m", "m"…
$ altitude_low_meters   <dbl> 1950.0, 1950.0, 1600.0, 1800.0, 1950.0, NA, NA, …
$ altitude_high_meters  <dbl> 2200.0, 2200.0, 1800.0, 2200.0, 2200.0, NA, NA, …
$ altitude_mean_meters  <dbl> 2075.0, 2075.0, 1700.0, 2000.0, 2075.0, NA, NA, …

Rows: 19
Columns: 3
$ country     <fct> "Ethiopia", "Kenya", "Uganda", "Colombia", "El Salvador", …
$ mean_rating <dbl> 85.48409, 84.30960, 83.45194, 83.10656, 83.05286, 82.92750…
$ n           <int> 44, 25, 36, 183, 21, 16, 51, 32, 20, 132, 40, 75, 181, 73,…

(Link to dataset)

Let’s start from the bottom of the list

Position on a common scale
Position on non-aligned scales
Length
Angle
Area
Volume <> Density <> Color saturation
Color hue

Use color hue to visualize average ratings

Easy: which has higher ratings, Kenya or Indonesia?

Use color hue to visualize average ratings

Hard: which has higher ratings, Indonesia or Costa Rica?

What about now?

Observation: alphabetical ordering of a categorical variable is almost never useful. Re-rank it by the value you are showing.
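
One way to re-rank in R is `reorder()`, sketched here with five of the mean ratings from the summary table shown earlier. The data frame name `ratings` and the plot are illustrative, not the workshop's own code.

```r
library(ggplot2)

# Mean ratings for five countries, copied from the summary table above
ratings <- data.frame(
  country = c("Ethiopia", "Kenya", "Uganda", "Colombia", "El Salvador"),
  mean_rating = c(85.48409, 84.30960, 83.45194, 83.10656, 83.05286)
)

# Default behavior: factor levels sort alphabetically
alphabetical <- levels(factor(ratings$country))

# reorder() sorts the levels by another variable (ascending mean_rating)
ratings$country_ranked <- reorder(ratings$country, ratings$mean_rating)
ranked <- levels(ratings$country_ranked)

print(alphabetical)
print(ranked)

# ggplot2 draws the first level at the bottom, so the top-rated country is on top
p_ranked <- ggplot(ratings, aes(x = mean_rating, y = country_ranked)) +
  geom_point(size = 3) +
  labs(x = "Mean rating", y = NULL)
```

Use `reorder(country, -mean_rating)` to reverse the order.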

Move up one level to color saturation

Position on a common scale
Position on non-aligned scales
Length
Angle
Area
Volume <> Density <> Color saturation
Color hue

Use color saturation to visualize average ratings

No legend?

No problem.

Because color saturation has a natural ordering.
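
A sketch of mapping a numeric value to a single-hue gradient, assuming ggplot2. The two hex colors are arbitrary choices for illustration, and the five values come from the summary table shown earlier.

```r
library(ggplot2)

# Mean ratings for five countries, copied from the summary table above
ratings <- data.frame(
  country = c("Ethiopia", "Kenya", "Uganda", "Colombia", "El Salvador"),
  mean_rating = c(85.48409, 84.30960, 83.45194, 83.10656, 83.05286)
)

# One tile per country; the fill runs from a pale to a deep blue,
# so darker tiles read as higher ratings even without a legend
p_saturation <- ggplot(
  ratings,
  aes(x = reorder(country, -mean_rating), y = 1, fill = mean_rating)
) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "#deebf7", high = "#08519c") +
  labs(x = NULL, y = NULL, fill = "Mean rating") +
  theme_minimal() +
  theme(axis.text.y = element_blank(), panel.grid = element_blank())

print(p_saturation)
```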

Color saturation is easier to quantify

The ratio between Mexico and United States is…

2 or 3

Moving down to the third level of estimation

Move up one level to area

Position on a common scale
Position on non-aligned scales
Length
Angle
Area
Volume <> Density <> Color saturation
Color hue

This is a weird graph, but still informative

Move up one level to angle

Position on a common scale
Position on non-aligned scales
Length
Angle
Area
Volume <> Density <> Color saturation
Color hue

Use angle to visualize coffee bean varieties

Pie chart uses angle to encode quantitative information

Don’t do this!

Or this!

Pie chart uses angle to encode quantitative information

This is fine

For categorical data, no more than 6 colors is best.

(Source: European Environment Agency)

We are so close!

Position on a common scale
Position on non-aligned scales
Length
Angle
Area
Volume <> Density <> Color saturation
Color hue

Wait, I thought there was some difference…

The start-at-zero rule

How to Lie with Statistics (1954)

Huff argues that truncating the y-axis can exaggerate differences and mislead the viewer.
It creates a false impression of dramatic change where the actual variation is small.

The Visual Display of Quantitative Information (1983)

Tufte prioritizes data density and the detection of subtle patterns.
He argues that starting at zero can waste valuable space, obscuring meaningful variations.
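
A sketch of the trade-off, assuming ggplot2: the same five mean ratings from earlier, drawn once on an axis forced to include zero and once on an axis fitted to the data. The object names are illustrative.

```r
library(ggplot2)

# Mean ratings for five countries, copied from the summary table above
ratings <- data.frame(
  country = c("Ethiopia", "Kenya", "Uganda", "Colombia", "El Salvador"),
  mean_rating = c(85.48409, 84.30960, 83.45194, 83.10656, 83.05286)
)

base_plot <- ggplot(
  ratings,
  aes(x = mean_rating, y = reorder(country, mean_rating))
) +
  geom_point(size = 3) +
  labs(x = "Mean rating", y = NULL)

# Version 1: force the axis to include zero; differences nearly vanish
p_zero <- base_plot + expand_limits(x = 0)

# Version 2: let the axis fit the data; differences become visible
p_zoom <- base_plot

# The spread that the zoomed axis is magnifying, in rating points
spread <- diff(range(ratings$mean_rating))
print(spread)
```

Neither version is wrong on its own. Which one to use depends on whether a difference of about 2.4 points matters to the audience.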

Combined MMR vaccination rate, 1994/95 to 2014/15, England

Vaccination levels are consistently high over the last 20 years. So there’s nothing to worry about, right?

Take another look: the axis doesn’t start at zero

An optional break symbol can help draw attention to the fact that the axis doesn’t start at zero. Zooming in swaps the “don’t worry” version for a “there is still work to be done” version.

Position, but not a common scale

Position on a common scale
Position on non-aligned scales
Length
Angle
Area
Volume <> Density <> Color saturation
Color hue

Position, and a common scale

Position on a common scale
Position on non-aligned scales
Length
Angle
Area
Volume <> Density <> Color saturation
Color hue

Re-ranking categorical variables still matters!

Climate stripes: a discussion

Climate stripes are a popular visualization of global temperature trends. They use color hue to represent temperature anomalies.

Source: Ed Hawkins/showyourstripes.info.

Discuss:

What are the strengths and weaknesses of the climate stripes visualization?
Why do you think the climate stripes have become so popular?
How does the emotional impact of the visualization affect its effectiveness?
When is it ok to sacrifice data precision for impact?

Outline for today

How humans see data

2. Introduction to R Shiny

Introduction to Shiny Assistant
Next-steps

What is R Shiny?

Shiny is an R package that enables the creation of interactive web applications directly from R.
Ideal for data visualization, dashboards, and dynamic reports.
No extensive web development experience required.

Why Use Shiny?

From this

(Hu et al, 2016)

To this (Liddie et al, 2023)

Basic Structure of a Shiny App

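A Shiny app comprises two main components: a user interface (ui) that defines what users see, and a server function (server) that computes the outputs.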

Your first shiny app

Reactivity in Shiny

Reactivity is a core concept in Shiny, where changes in user inputs automatically update the outputs.
The server function watches for input changes and updates outputs without requiring explicit user intervention.

A familiar example in spreadsheets

In other words, the output reacts to changes in the input.

Back to the toy example

library(shiny)

# Global variables can go here
n <- 200

# Define the UI
ui <- bootstrapPage(
  numericInput(inputId = 'n', label = 'Number of obs', value = n),
  plotOutput(outputId = 'myplot')
)

# Define the server code
server <- function(input, output) {
  output$myplot <- renderPlot({
    hist(runif(input$n))
  })
}

# Return a Shiny app object
shinyApp(ui = ui, server = server)

numericInput('n', 'Number of obs', n)
- Creates a numeric input box where users enter a number.
- 'n' → The input ID (referenced in the server).
- 'Number of obs' → The label displayed next to the input field.

plotOutput('myplot')
- Reserves space in the UI to display a plot.
- 'myplot' → The output ID (referenced in the server function).

output$myplot <- renderPlot({...})
- Assigns a dynamically generated plot to the UI element plotOutput('myplot').
- Uses renderPlot(), which is a Shiny function for rendering reactive plots.
hist(runif(input$n))
- Generates a histogram of random uniform numbers.
- runif(input$n): Produces input$n random numbers from a uniform distribution between 0 and 1.
- Each time input$n changes, the histogram updates automatically.

UI Inputs

Rendering functions

Diving deeper into reactive programming

Shiny has three kinds of reactive objects:

A reactive input is a user input that comes through the browser interface
A reactive output is something that appears in the user’s browser window, such as a plot or a table
A reactive expression is a component between an input and an output
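
A minimal sketch of all three objects in one server function, run without a browser through `shiny::testServer()`. The names `squared` and `result` are made up for this example.

```r
library(shiny)

server <- function(input, output, session) {
  # Reactive input: input$n normally arrives from the browser

  # Reactive expression: sits between the input and the output
  squared <- reactive({
    input$n^2
  })

  # Reactive output: text that would be sent back to the browser
  output$result <- renderText({
    paste("n squared is", squared())
  })
}

# testServer() simulates a session so inputs can be set from code
testServer(server, {
  session$setInputs(n = 3)
  print(output$result)

  session$setInputs(n = 4)
  print(output$result)  # the output follows the input automatically
})
```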

1. ui: Add a UI element for the user to select which country’s coffee beans they want to plot with selectInput().

selectInput(
  inputId = "country_filter",
  label = "Select country:",
  choices = c("All", sort(unique(coffee_clean$country_of_origin))),
  selected = "All"
)

We define an inputId that we’ll use to refer to the input element later in the app
We come up with a user-facing label
We specify the choices users can select from, as well as a default choice

2. server: Filter for the chosen country and save the new data frame as a reactive expression.

filtered_data <- reactive({
  data <- coffee_clean
  if (input$country_filter != "All") {
    data <- data %>% filter(country_of_origin == input$country_filter)
  }
  return(data)
})

This creates a cached expression that knows it is out of date when its input changes
We check the necessity of filtering based on the user input
We surround the expression with curly braces
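
To try the filter outside a running app, here is a sketch using `shiny::testServer()`. The small `coffee_clean` below is a stand-in built from the first four rows of the glimpse shown earlier, not the workshop's real cleaned data.

```r
library(shiny)
library(dplyr)

# Stand-in for coffee_clean: first four rows of two columns from the glimpse
coffee_clean <- data.frame(
  country_of_origin = c("Ethiopia", "Ethiopia", "Guatemala", "Ethiopia"),
  total_cup_points = c(90.58, 89.92, 89.75, 89.00)
)

server <- function(input, output, session) {
  filtered_data <- reactive({
    data <- coffee_clean
    # Only filter when a specific country is chosen
    if (input$country_filter != "All") {
      data <- data %>% filter(country_of_origin == input$country_filter)
    }
    data
  })
}

testServer(server, {
  session$setInputs(country_filter = "All")
  print(nrow(filtered_data()))  # 4: nothing is filtered out

  session$setInputs(country_filter = "Ethiopia")
  print(nrow(filtered_data()))  # 3: only the Ethiopian rows remain
})
```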

3. server: Use filtered_data (which is reactive) for plotting.

  output$bean_variety <- renderPlotly({
    req(nrow(filtered_data()) > 0)
    p <- ggplot(filtered_data(), aes(x = n, y = reorder(variety, n), fill = n)) +
      geom_col()
    ggplotly(p)
  })

This creates a plot using the reactive expression we defined earlier
The () after filtered_data is required: a reactive expression is called like a function to get its current value
filtered_data() is a cached expression, only rerun when inputs change

Functions vs. reactives

While functions and reactives both help you avoid repeating yourself, they differ in implementation.

Each time you call a function, R evaluates it.
Reactive expressions, however, are lazy and cached: they run only when called, and re-run only after one of their inputs has changed.
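
A sketch that makes the caching visible by counting how often the reactive body actually runs. The counter `calls` and the reactive `doubled` are hypothetical names for this illustration.

```r
library(shiny)

calls <- 0  # counts how many times the reactive body is executed

server <- function(input, output, session) {
  doubled <- reactive({
    calls <<- calls + 1
    input$x * 2
  })
}

testServer(server, {
  session$setInputs(x = 1)
  doubled()
  doubled()       # second call returns the cached value
  print(calls)    # 1: the body ran once for two calls

  session$setInputs(x = 5)  # the input changed, so the cache is invalidated
  doubled()
  print(calls)    # 2: the body ran again
})
```

A plain function with the same body would have incremented the counter on every call.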

Switch to RStudio!

Outline for today

How humans see data
Introduction to R Shiny

3. Introduction to Shiny Assistant

Next-steps

What is Shiny Assistant?

https://gallery.shinyapps.io/assistant/

Use Shiny Assistant to recreate the shiny app we just saw

Prompt

Use the tidytuesday coffee ratings dataset, located at https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-07/coffee_ratings.csv to develop a dashboard.

The title should be “Coffee Data Dashboard”, use the shinydashboard format.

On the left, there are three inputs, a drop down menu for species, a checkbox for quality category, and a multiselect box for country, pre-select “Ethiopia” and “India”.

On the right, there should be four plotly graphs. First one is the number of bean variety. Second is the average flavor profile shown in a radar chart. Third is a box plot showing coffee ratings by processing methods. Fourth is a choropleth map showing average coffee ratings by country.

Your turn!

💡 Try one of these built-in datasets to explore with Shiny Assistant!

Dataset	Description
`mtcars` 🚗	Car performance data (mpg, cylinders, hp)
`iris` 🌸	Iris flower measurements (sepal, petal, species)
`diamonds` 💎	Diamond pricing (carat, cut, price, etc.)
`faithful` 🌋	Old Faithful geyser eruptions (duration, waiting time)
`airquality` 🌍	New York air quality data (Ozone, Temp, Wind)
`ToothGrowth` 🦷	Vitamin C & tooth growth in guinea pigs

🛠️ Try it out yourself!

💬 What did you learn? What worked well? Any surprises?

Outline for today

How humans see data
Introduction to R Shiny
Introduction to Shiny Assistant

4. Next-steps

You can deploy an app for free on `shinyapps.io`

Follow this guide

Create a shinyapps.io account
Open your tokens, click “Show”, copy the code
Run the code in RStudio
Deploy your app:

library(rsconnect)
deployApp()
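
Put together, the steps might look like the sketch below. The account name, token, secret, and app folder are placeholders: use the exact command shown on your shinyapps.io Tokens page and your own path.

```r
library(rsconnect)

# One-time setup: register your shinyapps.io account on this computer.
# All three values are placeholders, not real credentials.
setAccountInfo(
  name   = "<ACCOUNT_NAME>",
  token  = "<TOKEN>",
  secret = "<SECRET>"
)

# Deploy the app stored in a folder; "path/to/app" is a hypothetical path.
# With no arguments, deployApp() deploys the current working directory.
deployApp(appDir = "path/to/app")
```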

How much time do you have?

10 min: Print out this Shiny for R cheatsheet
2.5 hrs: Follow this Posit tutorial
1 week: If you are an EOH student, participate in the 2025 EOH Data Visualization Competition!
6 weeks: Sign-up for PUBH6199 Visualizing Data with R this summer!
Lifetime: Check out resources like the Shiny Gallery, TidyTuesday, and Mastering Shiny book

2025 EOH Data Visualization Competition

Must be a current EOH student to participate
Visualizations must be submitted via a publicly accessible URL
Submissions will be evaluated based on:
- Clarity and insight
- Creativity and innovation
- Design and Aesthetics
- Usability and Accessibility
March 21–March 28, 2025
Get inspiration
- Information is Beautiful; TidyTuesday; R shiny gallery
Find datasets
- Kaggle; US Government Open Data; US Census Data; Tableau: Free Public Datasets

REACH Climate and Health Research Fellowship

PUBH6199 Visualizing Data with R

Further resources

Books
- Mastering Shiny, by Hadley Wickham
- R for Data Science, by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund
Podcasts
- Tidy Tuesday, hosted by Jon Harmon
- Data is Plural, hosted by Jeremy Singer-Vine
Decision trees for chart types
- Chart Suggestions – A Thought-Starter, by A. Abela
- From Data to Viz, by Yan Holtz & Conor Healy

Thank you!

Slides created via Quarto

The template comes from Tom Mock
