knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  fig.width = 7,
  fig.height = 4
)
set.seed(2025)

1 Introduction & Data


1.1 Context and Motivation

This project uses Bayesian methods to study attitudes toward U.S. military aid to Ukraine. The data come from a survey by The Economist / YouGov of 1,603 U.S. adults (18+), fielded February 16–18, 2025. Each respondent chose one of four options on U.S. aid:

  • Increase aid
  • Maintain current aid
  • Decrease aid
  • Not sure

The public release reports only aggregated percentages by political affiliation:

  • Democrats
  • Independents
  • Republicans

For each party we know the percentage of respondents in each of the four aid opinion categories. We do not have micro-level respondent data or exact cell counts.

The goal of this project is not to produce a definitive public-opinion study, but rather to:

  • demonstrate Bayesian hierarchical modeling and Gibbs sampling,
  • quantify uncertainty in party-level support for Ukraine aid,
  • and explore how pooled, separate, and hierarchical models behave on a small, grouped dataset.

These goals align with Module 6: Bayesian Computation in MATH 574, where we learn to implement Markov chain Monte Carlo (MCMC) methods and interpret their results.


1.2 The Data Table

We manually input the published percentages below. Each row is a political affiliation and each column is an aid opinion.

aid <- data.frame(
  party = rep(c("Democrat", "Independent", "Republican"), each = 4),
  opinion = rep(c("Increase", "Maintain", "NotSure", "Decrease"), times = 3),
  percent = c(
    35, 39, 16, 10,   # Democrats
    19, 23, 26, 33,   # Independents
    10, 24, 21, 45    # Republicans
  )
)

knitr::kable(aid,
             caption = "Survey results: percentage of respondents in each aid opinion by party.")
Table 1.1: Survey results: percentage of respondents in each aid opinion by party.

party        opinion   percent
-----------  --------  -------
Democrat     Increase       35
Democrat     Maintain       39
Democrat     NotSure        16
Democrat     Decrease       10
Independent  Increase       19
Independent  Maintain       23
Independent  NotSure        26
Independent  Decrease       33
Republican   Increase       10
Republican   Maintain       24
Republican   NotSure        21
Republican   Decrease       45

We treat the percent column as observed outcomes on a 0–100 scale. This is a simplification: in reality, within each party the four percentages are constrained to sum to 100%, and each cell summarizes many underlying respondent-level outcomes.
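A quick check makes this constraint concrete (the Independent row sums to 101 because the published percentages are rounded):

tapply(aid$percent, aid$party, sum)
##    Democrat Independent  Republican 
##         100         101         100

An immediate consequence is that each party's average cell value is approximately 100/4 = 25, a fact that will matter when we interpret the model-based party means in Section 3.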

However, treating percentages as noisy measurements of an underlying “support level” makes the models directly comparable to the Gaussian hierarchical example in Gelman et al. (BDA3, Ch. 11), which the course materials emphasize.


1.3 Why a Bayesian Approach?

A Bayesian analysis is useful here for several reasons:

  1. Small amount of data
    We only have 12 cells (3 parties × 4 opinions). A classical frequentist analysis has limited flexibility with so few aggregated data points. Bayesian methods can combine prior information with observed data to stabilize estimates.

  2. Group structure (parties)
    Democrats, Independents, and Republicans are naturally viewed as three “groups” or “clusters”. Hierarchical Bayesian models allow us to:

    • estimate group-specific means,
    • share information across groups (partial pooling),
    • and quantify the variance between parties.
  3. Uncertainty and interpretation
    Posterior distributions give credible intervals and predictive distributions, which we can interpret directly as degrees of belief about unknown quantities (e.g., mean support levels).

Our analysis will focus on:

  • how the choice of model (pooled vs separate vs hierarchical) affects the conclusions,
  • and how much uncertainty remains given only 12 data points.

1.4 Basic Exploratory View

We begin with simple summaries and a couple of quick plots to anchor intuition before building models.

summary(aid$percent)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.00   18.25   23.50   25.08   33.50   45.00
# Wide-format table for convenience
aid_wide <- reshape(
  aid,
  idvar   = "party",
  timevar = "opinion",
  direction = "wide"
)
aid_wide
library(ggplot2)

ggplot(aid, aes(x = opinion, y = percent, fill = party)) +
  geom_col(position = "dodge") +
  ylab("Percent of respondents") +
  xlab("Opinion on U.S. aid to Ukraine") +
  ggtitle("Percent in each opinion category by party") +
  theme_minimal()

From this plot:

  • Democrats lean more heavily toward “Increase” and “Maintain”.
  • Republicans show a much larger share choosing “Decrease”.
  • Independents sit between the two, with substantial mass in “Not sure”.

The Bayesian modeling that follows will formalize these patterns, quantify uncertainty, and explore how much the three parties differ in their latent mean “support level”.


2 Bayesian Models & Gibbs Samplers


2.1 Modeling Framework

We now build three related Bayesian models for the percent outcome:

  1. Pooled model – assumes all 12 observations share a common mean.
  2. Separate model – allows each party to have its own mean, but treats them independently.
  3. Hierarchical model – assumes party means themselves come from a common population distribution, enabling partial pooling.

To simplify notation, let:

  • \(y_k\) be the \(k\)-th observed percentage (for \(k = 1, \dots, 12\)),
  • \(j(k) \in \{1,2,3\}\) be the party index: 1 = Democrat, 2 = Independent, 3 = Republican.

We assume:

\[ y_k \mid \theta_{j(k)}, \sigma^2 \sim \text{Normal}(\theta_{j(k)}, \sigma^2) \]

where:

  • \(\theta_j\) is the mean support level for party \(j\),
  • \(\sigma^2\) is a common residual variance (within-party variation across the four opinions).

The three models differ in how \(\theta_j\) is structured.

Before writing the Gibbs samplers, we create some convenient objects:

y <- aid$percent
party_index <- as.numeric(factor(aid$party))
parties <- levels(factor(aid$party))
J <- length(parties)        # number of parties (3)
n <- length(y)              # total observations (12)

table(party_index)
## party_index
## 1 2 3 
## 4 4 4
parties
## [1] "Democrat"    "Independent" "Republican"

2.2 Priors and General Choices

We use weakly informative priors centered in a reasonable range for percentages. Specifically:

  • Means are centered loosely around 50, with large variance.
  • Variances \(\sigma^2\) and \(\tau^2\) use inverse-gamma priors with small shape parameters, allowing a wide range of values.

These priors are not meant to encode strong substantive beliefs, but to prevent numerical instability and to regularize the estimates slightly when the data are limited.

We also define a small helper to sample from the inverse-gamma distribution using the gamma sampler:

rinv_gamma <- function(n, shape, rate) {
  1 / rgamma(n, shape = shape, rate = rate)
}
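As a quick sanity check on the helper (a small test of our own, not part of the analysis): if \(X \sim \text{Gamma}(a, b)\) with rate \(b\), then \(1/X \sim \text{Inverse-Gamma}(a, b)\), whose mean is \(b/(a - 1)\) for \(a > 1\).

# Sanity check: Inverse-Gamma(3, 4) has mean 4 / (3 - 1) = 2
mean(rinv_gamma(1e5, shape = 3, rate = 4))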

We will typically run 8,000 iterations per model and discard the first 2,000 as burn-in.

n_iter  <- 8000
burn_in <- 2000
keep_indices <- (burn_in + 1):n_iter
length(keep_indices)
## [1] 6000

2.3 Pooled Model

2.3.1 Specification

The pooled model assumes a single mean \(\mu\) for all observations:

\[ \begin{aligned} y_k \mid \mu, \sigma^2 &\sim N(\mu, \sigma^2), \quad k = 1,\dots,12 \\ \mu &\sim N(m_0, s_0^2) \\ \sigma^2 &\sim \text{Inverse-Gamma}(a_0, b_0). \end{aligned} \]

We choose \(m_0 = 50\), \(s_0^2 = 30^2\), and \(a_0 = b_0 = 0.001\) to be weakly informative.

2.3.2 Gibbs Sampler

Given \(\sigma^2\), the conditional for \(\mu\) is normal. Given \(\mu\), the conditional for \(\sigma^2\) is inverse-gamma; this is standard normal–inverse-gamma conjugacy.
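Explicitly, writing \(\bar{y} = \frac{1}{n}\sum_k y_k\), the full conditionals implemented below are:

\[ \begin{aligned} \mu \mid \sigma^2, y &\sim N\left( \frac{n\bar{y}/\sigma^2 + m_0/s_0^2}{n/\sigma^2 + 1/s_0^2},\ \left(\frac{n}{\sigma^2} + \frac{1}{s_0^2}\right)^{-1} \right), \\ \sigma^2 \mid \mu, y &\sim \text{Inverse-Gamma}\left(a_0 + \frac{n}{2},\ b_0 + \frac{1}{2}\sum_{k=1}^{n}(y_k - \mu)^2\right). \end{aligned} \]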

pooled_gibbs <- function(y, m0 = 50, s0_sq = 30^2,
                         a0 = 0.001, b0 = 0.001,
                         n_iter = 8000) {
  n <- length(y)
  mu     <- numeric(n_iter)
  sigma2 <- numeric(n_iter)

  # Initialize at sample statistics
  mu[1]     <- mean(y)
  sigma2[1] <- var(y)

  for (t in 2:n_iter) {
    # Update mu | sigma2, y
    s_n_sq <- 1 / (n / sigma2[t-1] + 1 / s0_sq)
    m_n    <- s_n_sq * (n * mean(y) / sigma2[t-1] + m0 / s0_sq)
    mu[t]  <- rnorm(1, mean = m_n, sd = sqrt(s_n_sq))

    # Update sigma^2 | mu, y
    a_n <- a0 + n / 2
    b_n <- b0 + 0.5 * sum((y - mu[t])^2)
    sigma2[t] <- rinv_gamma(1, shape = a_n, rate = b_n)
  }

  list(mu = mu, sigma2 = sigma2)
}

post_pooled <- pooled_gibbs(y, n_iter = n_iter)
str(post_pooled)
## List of 2
##  $ mu    : num [1:8000] 25.1 27.3 27.9 26.2 18.8 ...
##  $ sigma2: num [1:8000] 122.6 126.2 81.1 133.1 109.6 ...
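Before relying on these draws, a quick visual convergence check is worthwhile. The sketch below (base graphics) plots the retained draws of \(\mu\); a healthy chain should look like stationary noise around a stable level.

# Trace plot of mu after burn-in: should resemble stationary noise
plot(keep_indices, post_pooled$mu[keep_indices], type = "l",
     xlab = "Iteration", ylab = expression(mu),
     main = "Trace of mu (pooled model, post burn-in)")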

The pooled model is intentionally restrictive: it assumes Democrats, Independents, and Republicans all share the same underlying mean support. It serves as a baseline comparison rather than a realistic substantive model.


2.4 Separate (No-Pooling) Model

2.4.1 Specification

The separate model allows each party to have its own mean \(\theta_j\), but does not link them:

\[ \begin{aligned} y_k \mid \theta_{j(k)}, \sigma^2 &\sim N(\theta_{j(k)}, \sigma^2), \\ \theta_j &\sim N(m_0, s_0^2) \quad \text{independently for } j = 1,2,3, \\ \sigma^2 &\sim \text{Inverse-Gamma}(a_0, b_0). \end{aligned} \]

Each party is estimated independently, sharing only the common variance \(\sigma^2\).

2.4.2 Gibbs Sampler

The full conditional for \(\theta_j\) is normal; the full conditional for \(\sigma^2\) is inverse-gamma based on all residuals.
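Writing \(\bar{y}_j\) for the sample mean of party \(j\)'s \(n_j = 4\) cells, the update for each party mean is:

\[ \theta_j \mid \sigma^2, y \sim N\left( \frac{n_j \bar{y}_j/\sigma^2 + m_0/s_0^2}{n_j/\sigma^2 + 1/s_0^2},\ \left(\frac{n_j}{\sigma^2} + \frac{1}{s_0^2}\right)^{-1} \right). \]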

separate_gibbs <- function(y, party_index,
                           m0 = 50, s0_sq = 30^2,
                           a0 = 0.001, b0 = 0.001,
                           party_names = NULL,
                           n_iter = 8000) {

  J <- length(unique(party_index))
  n <- length(y)

  theta  <- matrix(NA, nrow = n_iter, ncol = J)
  sigma2 <- numeric(n_iter)

  # Initialize means by party-specific sample means
  theta[1, ] <- tapply(y, party_index, mean)
  sigma2[1]  <- var(y)

  for (t in 2:n_iter) {
    # Update each theta_j
    for (j in 1:J) {
      yj <- y[party_index == j]
      nj <- length(yj)
      s_j_sq <- 1 / (nj / sigma2[t-1] + 1 / s0_sq)
      m_j    <- s_j_sq * (nj * mean(yj) / sigma2[t-1] + m0 / s0_sq)
      theta[t, j] <- rnorm(1, mean = m_j, sd = sqrt(s_j_sq))
    }

    # Update sigma^2 using all data
    resid <- y - theta[t, party_index]
    a_n <- a0 + n / 2
    b_n <- b0 + 0.5 * sum(resid^2)
    sigma2[t] <- rinv_gamma(1, shape = a_n, rate = b_n)
  }

  # Label columns with supplied party names (avoids relying on globals)
  if (!is.null(party_names)) colnames(theta) <- party_names
  list(theta = theta, sigma2 = sigma2)
}

post_sep <- separate_gibbs(y, party_index, party_names = parties, n_iter = n_iter)
str(post_sep)
## List of 2
##  $ theta : num [1:8000, 1:3] 25 31.6 29.6 22.3 24.2 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:3] "Democrat" "Independent" "Republican"
##  $ sigma2: num [1:8000] 123 199 136 152 106 ...

This model reflects a complete lack of pooling between parties: if one party’s sample mean is extreme, the model will largely retain that extremity, even if the sample is small.


2.5 Hierarchical Model (Partial Pooling)

2.5.1 Specification

The hierarchical model assumes that party means are related through a common population distribution:

\[ \begin{aligned} y_k \mid \theta_{j(k)}, \sigma^2 &\sim N(\theta_{j(k)}, \sigma^2), \\ \theta_j \mid \mu, \tau^2 &\sim N(\mu, \tau^2), \\ \mu &\sim N(m_0, s_0^2), \\ \sigma^2 &\sim \text{Inverse-Gamma}(a_\sigma, b_\sigma), \\ \tau^2 &\sim \text{Inverse-Gamma}(a_\tau, b_\tau). \end{aligned} \]

Here:

  • \(\mu\) is the overall mean support across parties.
  • \(\tau^2\) is the between-party variance.
  • \(\sigma^2\) is the within-party variance across opinion categories.

This introduces partial pooling: if evidence for a party is weak, its mean will be pulled toward \(\mu\); if evidence is strong, it can differ substantially from \(\mu\).
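The degree of pooling can be made explicit. Conditional on the hyperparameters, the posterior mean of \(\theta_j\) is a precision-weighted compromise between the party's sample mean and the population mean:

\[ E[\theta_j \mid y, \mu, \sigma^2, \tau^2] = w_j \bar{y}_j + (1 - w_j)\,\mu, \qquad w_j = \frac{n_j/\sigma^2}{n_j/\sigma^2 + 1/\tau^2}. \]

When \(\tau^2\) is small relative to \(\sigma^2/n_j\), the weight \(w_j\) approaches 0 and the party means collapse toward \(\mu\); when \(\tau^2\) is large, \(w_j\) approaches 1 and we recover the separate model.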

2.5.2 Gibbs Sampler

The full conditionals are all conjugate:

  • Each \(\theta_j \mid \mu, \tau^2, \sigma^2, y\) is normal.
  • \(\mu \mid \theta, \tau^2\) is normal.
  • \(\sigma^2 \mid \theta, y\) and \(\tau^2 \mid \theta, \mu\) are inverse-gamma.
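For reference, the hyperparameter updates implemented below are:

\[ \begin{aligned} \mu \mid \theta, \tau^2 &\sim N\left( \frac{J\bar{\theta}/\tau^2 + m_0/s_0^2}{J/\tau^2 + 1/s_0^2},\ \left(\frac{J}{\tau^2} + \frac{1}{s_0^2}\right)^{-1} \right), \\ \sigma^2 \mid \theta, y &\sim \text{Inverse-Gamma}\left(a_\sigma + \frac{n}{2},\ b_\sigma + \frac{1}{2}\sum_k \bigl(y_k - \theta_{j(k)}\bigr)^2\right), \\ \tau^2 \mid \theta, \mu &\sim \text{Inverse-Gamma}\left(a_\tau + \frac{J}{2},\ b_\tau + \frac{1}{2}\sum_j (\theta_j - \mu)^2\right), \end{aligned} \]

where \(\bar{\theta} = \frac{1}{J}\sum_j \theta_j\). The update for each \(\theta_j\) has the same form as in the separate model, with \((\mu, \tau^2)\) in place of \((m_0, s_0^2)\).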
hier_gibbs <- function(y, party_index,
                       m0 = 50, s0_sq = 30^2,
                       a_sigma = 0.001, b_sigma = 0.001,
                       a_tau = 0.001,   b_tau = 0.001,
                       party_names = NULL,
                       n_iter = 8000) {

  J <- length(unique(party_index))
  n <- length(y)

  theta  <- matrix(NA, nrow = n_iter, ncol = J)
  mu     <- numeric(n_iter)
  sigma2 <- numeric(n_iter)
  tau2   <- numeric(n_iter)

  # Initialize from simple summaries
  theta[1, ] <- tapply(y, party_index, mean)
  mu[1]      <- mean(theta[1, ])
  sigma2[1]  <- var(y)
  tau2[1]    <- var(theta[1, ])

  for (t in 2:n_iter) {

    # 1) Update theta_j | ...
    for (j in 1:J) {
      yj <- y[party_index == j]
      nj <- length(yj)
      s_j_sq <- 1 / (nj / sigma2[t-1] + 1 / tau2[t-1])
      m_j    <- s_j_sq * (nj * mean(yj) / sigma2[t-1] + mu[t-1] / tau2[t-1])
      theta[t, j] <- rnorm(1, mean = m_j, sd = sqrt(s_j_sq))
    }

    # 2) Update mu | theta, tau2
    s_mu_sq <- 1 / (J / tau2[t-1] + 1 / s0_sq)
    m_mu    <- s_mu_sq * (J * mean(theta[t, ]) / tau2[t-1] + m0 / s0_sq)
    mu[t]   <- rnorm(1, mean = m_mu, sd = sqrt(s_mu_sq))

    # 3) Update sigma^2 | theta, y
    resid <- y - theta[t, party_index]
    a_sig_n <- a_sigma + n / 2
    b_sig_n <- b_sigma + 0.5 * sum(resid^2)
    sigma2[t] <- rinv_gamma(1, shape = a_sig_n, rate = b_sig_n)

    # 4) Update tau^2 | theta, mu
    a_tau_n <- a_tau + J / 2
    b_tau_n <- b_tau + 0.5 * sum((theta[t, ] - mu[t])^2)
    tau2[t] <- rinv_gamma(1, shape = a_tau_n, rate = b_tau_n)
  }

  if (!is.null(party_names)) colnames(theta) <- party_names
  list(theta = theta, mu = mu, sigma2 = sigma2, tau2 = tau2)
}

post_hier <- hier_gibbs(y, party_index, party_names = parties, n_iter = n_iter)
str(post_hier)
## List of 4
##  $ theta : num [1:8000, 1:3] 25 25.2 25.2 25.4 24.9 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:3] "Democrat" "Independent" "Republican"
##  $ mu    : num [1:8000] 25.1 25.2 25.1 25 24.6 ...
##  $ sigma2: num [1:8000] 122.6 70 144.5 146.4 75.5 ...
##  $ tau2  : num [1:8000] 0.02083 0.00983 0.04959 0.38259 0.06952 ...

In the next section, we will summarize and compare the three models, look at the posterior distributions for party means, and construct a predictive distribution for a hypothetical new political affiliation.


3 Results, Interpretation, and Discussion


3.1 Posterior Summaries for Party Means

We start by extracting posterior samples for the quantities of interest.

# Discard burn-in (we keep all post-burn-in draws; no thinning)
theta_sep_post  <- post_sep$theta[keep_indices, ]
theta_hier_post <- post_hier$theta[keep_indices, ]
mu_hier_post    <- post_hier$mu[keep_indices]
tau2_post       <- post_hier$tau2[keep_indices]
sigma2_post     <- post_hier$sigma2[keep_indices]

# Quick summaries
sep_summary <- apply(theta_sep_post, 2, function(x) {
  c(mean = mean(x),
    quantile(x, c(0.025, 0.5, 0.975)))
})

hier_summary <- apply(theta_hier_post, 2, function(x) {
  c(mean = mean(x),
    quantile(x, c(0.025, 0.5, 0.975)))
})

sep_summary
##       Democrat Independent Republican
## mean  26.21374    26.34508   26.13170
## 2.5%  13.19526    13.55415   12.66676
## 50%   26.08960    26.33411   26.13324
## 97.5% 39.67861    39.68228   39.71870
hier_summary
##       Democrat Independent Republican
## mean  25.02483    25.04081   25.02749
## 2.5%  18.17276    18.13495   18.15512
## 50%   24.97307    24.93119   24.96914
## 97.5% 32.07020    32.07443   32.13649

The sep_summary table gives posterior means and 95% credible intervals for each party under the separate model; hier_summary does the same under the hierarchical model.

To make this easier to read, we collect results into a single data frame:

parties <- colnames(theta_sep_post)

summary_df <- data.frame(
  party = rep(parties, times = 2),
  model = rep(c("Separate", "Hierarchical"), each = length(parties)),
  mean  = c(sep_summary["mean", ], hier_summary["mean", ]),
  lower = c(sep_summary["2.5%", ], hier_summary["2.5%", ]),
  upper = c(sep_summary["97.5%",], hier_summary["97.5%",])
)

summary_df
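As a light MCMC diagnostic (a sketch that assumes the coda package is installed; it is not used elsewhere in this report), we can compute effective sample sizes for the hierarchical chains:

# Effective sample sizes for the retained hierarchical draws;
# values near 6000 would indicate low autocorrelation
if (requireNamespace("coda", quietly = TRUE)) {
  draws <- coda::as.mcmc(cbind(mu     = mu_hier_post,
                               tau2   = tau2_post,
                               sigma2 = sigma2_post))
  print(round(coda::effectiveSize(draws)))
}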

3.2 Visual Comparison of Separate vs Hierarchical Estimates

library(ggplot2)

ggplot(summary_df, aes(x = party, y = mean, color = model)) +
  geom_point(position = position_dodge(width = 0.3), size = 3) +
  geom_errorbar(aes(ymin = lower, ymax = upper),
                position = position_dodge(width = 0.3),
                width = 0.15) +
  ylab("Posterior mean percentage") +
  xlab("Party") +
  ggtitle("Posterior means and 95% credible intervals by party") +
  theme_minimal() +
  theme(legend.position = "bottom")

Interpretation:

  • Under both models, the three party means are nearly indistinguishable: every posterior mean sits near 25, and the credible intervals overlap almost entirely.
  • This is largely a consequence of the outcome we chose to model. Because each party's four percentages sum to (approximately) 100, every party's average cell value is about 100/4 = 25, so a model of the mean percentage cannot separate the parties.
  • What does differ is precision: the separate model's 95% intervals span roughly 13–40, while the hierarchical intervals are pulled in to roughly 18–32.

The hierarchical model shrinks each party's mean strongly toward the overall mean \(\mu\). Because we only have four cell values per party, this partial pooling helps avoid over-interpreting small sample fluctuations, but here it also exposes a limitation of the setup: the party differences visible in the raw table live in the allocation across opinion categories, not in the mean percentage level.


3.3 Overall Hyperparameters in the Hierarchical Model

We also examine the overall mean \(\mu\) and variance components.

hyper_df <- data.frame(
  parameter = c("mu (overall mean)", "tau2 (between-party var)", "sigma2 (within-party var)"),
  mean      = c(mean(mu_hier_post),
                mean(tau2_post),
                mean(sigma2_post)),
  lower     = c(quantile(mu_hier_post, 0.025),
                quantile(tau2_post, 0.025),
                quantile(sigma2_post, 0.025)),
  upper     = c(quantile(mu_hier_post, 0.975),
                quantile(tau2_post, 0.975),
                quantile(sigma2_post, 0.975))
)
hyper_df
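Variances are hard to read on the percentage scale, so we also summarize the implied standard deviations:

# Posterior summaries of the standard deviations (percentage points)
sd_summary <- rbind(
  tau   = quantile(sqrt(tau2_post),   c(0.025, 0.5, 0.975)),
  sigma = quantile(sqrt(sigma2_post), c(0.025, 0.5, 0.975))
)
round(sd_summary, 2)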

Interpretation:

  • \(\mu\) gives an “average party” support level across all three affiliations.
  • \(\tau^2\) captures how far apart party means tend to be.
  • \(\sigma^2\) captures variation within parties (across the four aid categories).

The posterior for \(\tau^2\) concentrates near zero rather than at clearly nonzero values: because each party's percentages sum to roughly 100, the party means are all close to 25, so the model detects very little between-party variance at the mean level. At the same time, the upper end of the credible interval remains large relative to the point estimate, since three groups provide very little information about a variance component. The substantive differences between parties are instead absorbed into \(\sigma^2\), the spread across opinion categories within each party.


3.4 Predictive Distribution for a New Political Affiliation

One advantage of the hierarchical model is that we can ask:

If a new political affiliation entered the scene, what mean support level would we expect it to have?

Under the model, a new party’s mean \(\theta_{\text{new}}\) follows:

\[ \theta_{\text{new}} \mid \mu, \tau^2 \sim N(\mu, \tau^2). \]

We approximate this distribution using the posterior draws of \(\mu\) and \(\tau^2\):

theta_new <- rnorm(
  length(mu_hier_post),
  mean = mu_hier_post,
  sd   = sqrt(tau2_post)
)

new_party_summary <- c(
  mean  = mean(theta_new),
  quantile(theta_new, c(0.025, 0.5, 0.975))
)
new_party_summary
##     mean     2.5%      50%    97.5% 
## 25.12772 16.23865 24.97890 34.46476
hist(theta_new, breaks = 30, col = "gray",
     main = "Posterior predictive distribution for new party mean",
     xlab = "theta_new (mean percentage)")
abline(v = new_party_summary["mean"], col = "red", lwd = 2)

Interpretation:

If a new political affiliation behaved like a “typical” party in this data-generating process, its average aid-opinion percentage would lie near the overall mean of about 25, which is essentially forced by the sum-to-100 structure of the table. The wide spread of the predictive distribution reflects the small number of parties and the resulting heavy-tailed uncertainty about \(\tau^2\), more than any genuine differences in party-level means.


3.5 Are These Methods Well-Suited to This Situation?

From a computational Bayesian perspective, the methods are well-suited:

  • The normal / inverse-gamma conjugate structure yields simple Gibbs updates.
  • The hierarchical model cleanly illustrates partial pooling, hyperparameters, and predictive inference.
  • With such a small dataset, Bayesian estimation is advantageous for stabilizing estimates and making uncertainty explicit.

From a substantive public-opinion perspective, there are important caveats:

  • We only use 12 aggregated percentages; we do not know the underlying cell sizes.
  • Percentages are constrained to sum to 100% within each party, but the normal model does not enforce this structural relationship.
  • We ignore respondent-level covariates such as age, education, region, or ideology.

Thus, the numerical results should be viewed as illustrative summaries under a stylized model, rather than as definitive measurements of U.S. opinion on Ukraine aid.


3.6 Concerns About Small Sample and Aggregation

Even though the original survey includes 1,603 respondents, our analysis sees only 12 numbers. This creates several issues:

  • Limited information: Posterior inferences depend more heavily on prior choices than if we had full microdata.
  • Aggregation bias: We cannot see within-party heterogeneity (for example, younger vs older Democrats may have very different views).
  • Sensitivity: With only four observations per party, each cell has a strong influence on the estimated party mean.

The hierarchical model helps mitigate these concerns by pooling information across parties, but it cannot completely overcome the limitations of the data.


3.7 How Much Should We Trust the Findings?

We can trust the analysis to:

  • Correctly implement the specified Bayesian models via Gibbs sampling.
  • Produce coherent posterior distributions for party means and variance components.
  • Provide insight into relative differences among Democrats, Independents, and Republicans under a common modeling framework.

We should be cautious in:

  • Interpreting posterior means and credible intervals as precise estimates of true public opinion.
  • Drawing strong policy conclusions solely from this simplified normal model of percentages.

In short:

  • Qualitatively, the raw percentages show that Democrats are most supportive of Ukraine aid, Republicans are least supportive, and Independents lie between; however, this ordering lives in the allocation across opinion categories, and our Gaussian models of the mean percentage, whose party means are pinned near 25 by the sum-to-100 constraint, cannot express it directly.
  • Quantitatively, the exact numeric values should be interpreted as model-based summaries rather than hard facts about the U.S. population.

3.8 Conclusion

This project applied three Bayesian Gaussian models—pooled, separate, and hierarchical—to a small, aggregated dataset on U.S. attitudes toward military aid to Ukraine.

  • The pooled model serves as a baseline and shows what happens when we ignore group structure.
  • The separate model shows the maximum possible differences between parties, with no pooling.
  • The hierarchical model strikes a balance, allowing differences while stabilizing estimates through partial pooling.

Using Gibbs sampling, we obtained posterior distributions for:

  • party-specific mean support levels,
  • an overall mean across parties,
  • between- and within-party variance components,
  • and the predicted mean support for a hypothetical new affiliation.

From a modeling perspective, this example demonstrates the key ideas of Bayesian hierarchical inference, MCMC computation, and posterior prediction in a compact setting, fulfilling the goals of the course’s computational module.


3.9 References

  1. The Economist / YouGov Survey (2025). U.S. public opinion on Ukraine aid.
    February 16–18, 2025. Data retrieved from publicly released summary tables.

  2. Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013).
    Bayesian Data Analysis (3rd ed.). Chapman and Hall/CRC.

  3. Jamshidi, S. (2025). MATH 574 – Bayesian Computational Statistics:
    Course materials, modules, and lecture notes, Illinois Institute of Technology.

  4. ChatGPT (OpenAI) (2025). Assistance in drafting narrative structure and boilerplate text, reviewed and validated by the author.