Exploring OVB with Snails

This app shows the consequences of different model structures for the analysis of data contaminated by omitted variable bias.

Consider a system (see the system tab) where you are interested in how temperature affects snail abundance. However, temperature is driven by oceanography at a site scale, and oceanography also drives recruitment. You have measured neither oceanography nor recruitment.

In your work, you have measured sites annually (or assessed plots within sites in one year; the models are the same as long as temperature varies between plots) and recorded both temperature and the number of snails. Temperature within one replicate is influenced by both the site-level average temperature and more immediate, varying conditions, i.e., either variation within a site or variation across years.

Snail abundance is influenced by both temperature and site-level recruitment. To remind you, a) we have no measure of site-level recruitment and b) site-level recruitment and temperature are collinear, because both are driven by site-level oceanography.
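The system described above can be sketched as a small simulation. This is a minimal illustration in base R, not the app's own code: the object names (`ocean`, `site_temp`, `recruitment`, `snails`), the effect sizes, the signs, and the Gaussian noise are all assumptions chosen only to make the collinearity visible.

```r
# Hypothetical simulation of the snail system; every name and parameter
# value here is an illustrative assumption, not taken from the app.
set.seed(42)

n_sites <- 20            # number of sites
n_years <- 10            # years (or plots) per site
true_temp_effect <- 1    # assumed true effect of temperature on abundance

# Unmeasured site-level driver
ocean <- rnorm(n_sites)

# Oceanography drives both site-mean temperature and recruitment
# (the opposite signs are an assumption made for illustration)
site_temp   <- 15 + 2 * ocean + rnorm(n_sites, sd = 0.5)
recruitment <- 10 - 3 * ocean + rnorm(n_sites, sd = 0.5)

snails <- data.frame(
  site = factor(rep(seq_len(n_sites), each = n_years)),
  year = rep(seq_len(n_years), times = n_sites)
)
site_idx <- as.integer(snails$site)

# Temperature in a replicate = site-level mean + year (or plot) variation
snails$temp <- site_temp[site_idx] + rnorm(nrow(snails), sd = 1)

# Abundance depends on temperature and on the unmeasured recruitment
snails$abund <- true_temp_effect * snails$temp +
  recruitment[site_idx] + rnorm(nrow(snails), sd = 1)

# Site-level temperature and recruitment are collinear
cor(site_temp, recruitment)
```

Because recruitment never appears in `snails`, any model that uses between-site variation in temperature will fold part of the recruitment effect into the temperature coefficient.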

The models we use to analyze the data are as follows for year (or plot) i in site j:

Naive model:
lm(y ~ x)
$y_{ij} \sim \mathcal{N}(\widehat{y_{ij}}, \sigma^2)$
$\widehat{y_{ij}} = \beta_0 x_{ij} + \beta_1$

RE model:
lmer(y ~ x + (1|site))
$y_{ij} \sim \mathcal{N}(\widehat{y_{ij}}, \sigma^2)$
$\widehat{y_{ij}} = \beta_0 x_{ij} + \beta_1 + \mu_j$
$\mu_j \sim \mathcal{N}(0, \sigma_{site}^2)$

FE Mean Differenced model:
lm(y_dev_from_site_mean ~ x_dev_from_site_mean)
$y_{ij} - \bar{y_j} \sim \mathcal{N}(\widehat{y_{ij} - \bar{y_j}}, \sigma^2)$
$\widehat{y_{ij} - \bar{y_j}} = \beta_0 (x_{ij} - \bar{x_j})$

FE Dummy Variables model:
lm(y ~ x + site)
$y_{ij} \sim \mathcal{N}(\widehat{y_{ij}}, \sigma^2)$
$\widehat{y_{ij}} = \beta_0 x_{1ij} + \sum_{k=1}^{J} \beta_k x_{2kij}$
$x_{2kij} = 1 \text{ if site } j = k, 0 \text{ otherwise, for } k = 1, \ldots, J \text{ sites}$
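The two FE models give the same temperature coefficient, which can be checked numerically. The sketch below uses base R only; the simulated data and all parameter values are assumptions for illustration (true temperature effect set to 1, with an unmeasured site-level term that is correlated with site-mean temperature).

```r
# Illustrative comparison of the naive and the two FE estimators.
# Data and parameter values are assumptions, not the app's own.
set.seed(1)
n_sites <- 30
n_years <- 8
site <- factor(rep(seq_len(n_sites), each = n_years))
j <- as.integer(site)

ocean <- rnorm(n_sites)                              # unmeasured driver
x <- (15 + 2 * ocean)[j] + rnorm(length(j))          # temperature
y <- 1 * x + (10 - 3 * ocean)[j] + rnorm(length(j))  # abundance, true slope = 1

# Naive model: ignores site entirely
naive <- unname(coef(lm(y ~ x))["x"])

# FE mean-differenced model: subtract site means from y and x
x_dev <- x - ave(x, site)   # ave() returns the site mean for each row
y_dev <- y - ave(y, site)
fe_md <- unname(coef(lm(y_dev ~ x_dev))["x_dev"])

# FE dummy variables model: one intercept shift per site
fe_dummy <- unname(coef(lm(y ~ x + site))["x"])

round(c(naive = naive, fe_mean_diff = fe_md, fe_dummy = fe_dummy), 3)
```

The point estimates from the two FE fits agree to numerical precision. Their reported standard errors differ slightly, because `lm()` on the mean-differenced data does not count the estimated site means against the residual degrees of freedom.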

Group Mean Covariate model:
lmer(y ~ x + x_site_mean + (1|site))
$y_{ij} \sim \mathcal{N}(\widehat{y_{ij}}, \sigma^2)$
$\widehat{y_{ij}} = \beta_0 x_{ij} + \beta_1 \bar{x_j} + \beta_2 + \mu_j$
$\mu_j \sim \mathcal{N}(0, \sigma_{site}^2)$

Group Mean Centered model:
lmer(y ~ x_dev_from_site_mean + x_site_mean + (1|site))
$y_{ij} \sim \mathcal{N}(\widehat{y_{ij}}, \sigma^2)$
$\widehat{y_{ij}} = \beta_0 (x_{ij} - \bar{x_j}) + \beta_1 \bar{x_j} + \beta_2 + \mu_j$
$\mu_j \sim \mathcal{N}(0, \sigma_{site}^2)$
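
The Group Mean Covariate and Group Mean Centered models are reparameterizations of one another, which a short rearrangement makes explicit. The superscripts $cov$ and $cen$ are labels introduced here only to tell the two site-mean coefficients apart; they are not the app's notation.

$\beta_0 x_{ij} + \beta_1^{cov} \bar{x_j} = \beta_0 (x_{ij} - \bar{x_j}) + (\beta_0 + \beta_1^{cov}) \bar{x_j}$
$\beta_1^{cen} = \beta_0 + \beta_1^{cov}$

The within-site coefficient $\beta_0$ is therefore the same in both models. Only the coefficient on $\bar{x_j}$ changes meaning: in the Group Mean Centered model it is the between-site slope, while in the Group Mean Covariate model it is the difference between the between-site and within-site slopes.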

First Differences model (assumes one measurement per site per year):
lm(delta_y ~ delta_x)
$\Delta y_{ij} \sim \mathcal{N}(\widehat{\Delta y_{ij}}, \sigma^2)$
$\widehat{\Delta y_{ij}} = \beta_0 \Delta x_{ij} + \beta_1$
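
Differencing consecutive years within a site removes anything constant at the site level, since the site term appears in both years and cancels. A minimal base R sketch of how `delta_y` and `delta_x` might be built is below; the simulated data, parameter values, and the assumption that rows are already sorted by year within site are all illustrative.

```r
# Illustrative first-differences fit. Data and parameters are assumptions.
set.seed(7)
n_sites <- 30
n_years <- 8
site <- rep(seq_len(n_sites), each = n_years)   # rows sorted by year within site

ocean <- rnorm(n_sites)                                     # unmeasured driver
x <- (15 + 2 * ocean)[site] + rnorm(length(site))           # temperature
y <- 1 * x + (10 - 3 * ocean)[site] + rnorm(length(site))   # abundance, true slope = 1

# Year-to-year change within each site; the first year of each site has
# no previous year, so its difference is NA
delta_x <- ave(x, site, FUN = function(v) c(NA, diff(v)))
delta_y <- ave(y, site, FUN = function(v) c(NA, diff(v)))

# lm() drops the NA rows under the default na.action
fd_fit <- lm(delta_y ~ delta_x)
coef(fd_fit)
```

Each site contributes one fewer observation than it has years, so the fit uses `n_sites * (n_years - 1)` rows.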

Code for the app is on GitHub.

These are the coefficients for the term in each model that gives the estimated effect of temperature on snail abundance. For the GMC (Group Mean Centered) model this is $x_{ij}-\bar{x_j}$, and for the panel (First Differences) model this is $\Delta x_{ij}$.

The bias of the coefficients relative to the true value

The difference between the RE estimated temperature effect and that from other models.

The difference between the estimated effect and 0 or the true coefficient value.

Exploring Different Methods for Coping with Omitted Variable Bias in the Relationship Between Snails and Temperature

Site-level Properties

Plot-level Properties