This app shows the consequences of different model structures
for the analysis of data contaminated by omitted variable bias.
Consider a system (see the system tab) where you are interested in how temperature affects snail abundance. However, temperature is driven by oceanography at a site scale, and oceanography also drives recruitment. You have measured neither oceanography nor recruitment.
In your work, you have measured sites annually (or assessed plots within sites in a single year; the same models apply as long as temperature varies between plots), recording both temperature and the number of snails. Temperature within one replicate is influenced by the site-level average temperature and by more immediate, varying conditions: either variation within a site or variation across years.
Snail abundance is influenced by both temperature and site-level recruitment. To remind you: a) we have no measure of site-level recruitment, and b) site-level recruitment and temperature are collinear, because both are driven by site-level oceanography.
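The system described above can be sketched as a small simulation. This is only an illustration, not the app's own code: the object names (`ocean`, `recruit`, `dat`), the sample sizes, and every coefficient (including a true temperature effect of 1) are assumptions chosen for the example.

```r
set.seed(42)
n_sites <- 20   # assumed number of sites
n_years <- 10   # assumed number of years (or plots) per site

# Unmeasured site-level oceanography drives both temperature and recruitment
ocean     <- rnorm(n_sites)
site_temp <- 15 + 2 * ocean   # site-level average temperature
recruit   <- 3 * ocean        # site-level recruitment (never observed)

site <- rep(seq_len(n_sites), each = n_years)
year <- rep(seq_len(n_years), times = n_sites)

# Temperature = site average + year-to-year variation within the site
x <- site_temp[site] + rnorm(n_sites * n_years)

# Snail abundance: assumed true temperature effect of 1, plus recruitment
y <- 10 + 1 * x + recruit[site] + rnorm(n_sites * n_years)

# Only site, year, temperature (x), and snails (y) reach the analyst
dat <- data.frame(site = factor(site), year = year, x = x, y = y)

# Site-mean temperature is strongly correlated with unmeasured recruitment
site_mean_x <- as.vector(tapply(dat$x, dat$site, mean))
cor(site_mean_x, recruit)
```

Because recruitment is left out of `dat`, any model that uses between-site variation in temperature will also pick up part of the recruitment effect.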
The models we use to analyze the data are as follows for year (or plot) i in site j:
Naive model:
lm(y ~ x)
$y_{ij} \sim \mathcal{N}(\widehat{y_{ij}}, \sigma^2)$
$\widehat{y_{ij}} = \beta_0 x_{ij} + \beta_1$
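A quick way to see what the naive model does is to fit it to simulated data where the truth is known. The sketch below assumes a hypothetical data-generating process (true temperature effect of 1, recruitment rising with site temperature); all names and numbers are illustrative.

```r
set.seed(1)
n_sites <- 20; n_years <- 10            # assumed design
site  <- rep(seq_len(n_sites), each = n_years)
ocean <- rnorm(n_sites)                 # unmeasured site-level driver
x <- (15 + 2 * ocean)[site] + rnorm(n_sites * n_years)           # temperature
y <- 10 + 1 * x + (3 * ocean)[site] + rnorm(n_sites * n_years)   # snails, true effect = 1

# Naive model: ignores site structure entirely
naive <- lm(y ~ x)
naive_slope <- unname(coef(naive)["x"])

# Estimated bias: positive here because recruitment rises with temperature
naive_slope - 1
```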
RE model:
lmer(y ~ x + (1|site))
$y_{ij} \sim \mathcal{N}(\widehat{y_{ij}}, \sigma^2)$
$\widehat{y_{ij}} = \beta_0 x_{ij} + \beta_1 + \mu_j$
$\mu_j \sim \mathcal{N}(0, \sigma_{site}^2)$
FE Mean Differenced model:
lm(y_dev_from_site_mean ~ x_dev_from_site_mean)
$y_{ij} - \bar{y_j} \sim \mathcal{N}(\widehat{y_{ij} - \bar{y_j}}, \sigma^2)$
$\widehat{y_{ij} - \bar{y_j}} = \beta_0 (x_{ij} - \bar{x_j})$
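The deviations from site means can be built with base R's `ave()`, which returns each observation's group mean by default. This is a sketch on simulated data; the data-generating process and its true effect of 1 are assumptions, and only the two `*_dev_from_site_mean` names come from the model formula above.

```r
set.seed(1)
n_sites <- 20; n_years <- 10            # assumed design
site  <- rep(seq_len(n_sites), each = n_years)
ocean <- rnorm(n_sites)                 # unmeasured site-level driver
x <- (15 + 2 * ocean)[site] + rnorm(n_sites * n_years)           # temperature
y <- 10 + 1 * x + (3 * ocean)[site] + rnorm(n_sites * n_years)   # snails, true effect = 1
dat <- data.frame(site = factor(site), x = x, y = y)

# Subtract each site's mean from its own observations
dat$x_dev_from_site_mean <- dat$x - ave(dat$x, dat$site)
dat$y_dev_from_site_mean <- dat$y - ave(dat$y, dat$site)

# Anything constant within a site (including recruitment) is removed
fe_md <- lm(y_dev_from_site_mean ~ x_dev_from_site_mean, data = dat)
fe_slope <- unname(coef(fe_md)["x_dev_from_site_mean"])
fe_slope
```

Note that `lm()` does not know the site means were estimated, so its residual degrees of freedom (and hence its standard errors) are too optimistic for this fit.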
FE Dummy Variables model:
lm(y ~ x + site)
$y_{ij} \sim \mathcal{N}(\widehat{y_{ij}}, \sigma^2)$
$\widehat{y_{ij}} = \beta_0 x_{1ij} + \sum_{k=1}^{J} \beta_k x_{2ijk}$
$x_{2ijk} = 1 \text{ if } j = k, 0 \text{ otherwise, for sites } k = 1, \ldots, J$
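The dummy variable and mean differenced formulations give the same temperature coefficient, which can be checked numerically. The sketch below uses simulated data under an assumed data-generating process; the names `x_dev` and `y_dev` are shorthand introduced here.

```r
set.seed(1)
n_sites <- 20; n_years <- 10            # assumed design
site  <- rep(seq_len(n_sites), each = n_years)
ocean <- rnorm(n_sites)                 # unmeasured site-level driver
x <- (15 + 2 * ocean)[site] + rnorm(n_sites * n_years)           # temperature
y <- 10 + 1 * x + (3 * ocean)[site] + rnorm(n_sites * n_years)   # snails, true effect = 1
dat <- data.frame(site = factor(site), x = x, y = y)

# Dummy variable version: one coefficient per site absorbs site-level effects
fe_dummy    <- lm(y ~ x + site, data = dat)
slope_dummy <- unname(coef(fe_dummy)["x"])

# Mean differenced version on the same data
x_dev    <- dat$x - ave(dat$x, dat$site)
y_dev    <- dat$y - ave(dat$y, dat$site)
slope_md <- unname(coef(lm(y_dev ~ x_dev))["x_dev"])

# The two slopes agree up to floating point error
all.equal(slope_dummy, slope_md)
```

By default R fits an intercept plus J - 1 site contrasts rather than J site coefficients; `lm(y ~ 0 + x + site)` gives the parameterization written in the equation, with the same slope for `x`.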
Group Mean Covariate model:
lmer(y ~ x + x_site_mean + (1|site))
$y_{ij} \sim \mathcal{N}(\widehat{y_{ij}}, \sigma^2)$
$\widehat{y_{ij}} = \beta_0 x_{ij} + \beta_1 \bar{x_j} + \beta_2 + \mu_j$
$\mu_j \sim \mathcal{N}(0, \sigma_{site}^2)$
Group Mean Centered model:
lmer(y ~ x_dev_from_site_mean + x_site_mean + (1|site))
$y_{ij} \sim \mathcal{N}(\widehat{y_{ij}}, \sigma^2)$
$\widehat{y_{ij}} = \beta_0 (x_{ij} - \bar{x_j}) + \beta_1 \bar{x_j} + \beta_2 + \mu_j$
$\mu_j \sim \mathcal{N}(0, \sigma_{site}^2)$
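The Group Mean Covariate and Group Mean Centered models are two parameterizations of the same fit, and the relationship between their coefficients can be checked directly. In the sketch below the random intercept `(1|site)` is dropped and `lm()` is used in place of `lmer()`, purely to keep the example in base R; with a balanced design like this one the fixed-effect point estimates should match, but the standard errors will not. The simulated data and its true effect of 1 are assumptions.

```r
set.seed(1)
n_sites <- 20; n_years <- 10            # assumed balanced design
site  <- rep(seq_len(n_sites), each = n_years)
ocean <- rnorm(n_sites)                 # unmeasured site-level driver
x <- (15 + 2 * ocean)[site] + rnorm(n_sites * n_years)           # temperature
y <- 10 + 1 * x + (3 * ocean)[site] + rnorm(n_sites * n_years)   # snails, true effect = 1
dat <- data.frame(site = factor(site), x = x, y = y)

dat$x_site_mean          <- ave(dat$x, dat$site)
dat$x_dev_from_site_mean <- dat$x - dat$x_site_mean

# Simplification: lm() instead of lmer(), so no random site intercept
gm_cov <- lm(y ~ x + x_site_mean, data = dat)
gm_cen <- lm(y ~ x_dev_from_site_mean + x_site_mean, data = dat)
b_cov <- coef(gm_cov)
b_cen <- coef(gm_cen)

# Within-site temperature effect is the same in both models
within_cov <- unname(b_cov["x"])
within_cen <- unname(b_cen["x_dev_from_site_mean"])

# Site-mean coefficient differs: centered model's equals the sum of the
# covariate model's two temperature coefficients
between_cov <- unname(b_cov["x"] + b_cov["x_site_mean"])
between_cen <- unname(b_cen["x_site_mean"])
c(within_cov, within_cen, between_cov, between_cen)
```

In both parameterizations the within-site coefficient is the one that estimates the temperature effect free of site-level confounding; the site-mean coefficient absorbs the recruitment signal.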
First Differences model (assumes one measurement per site per year):
lm(delta_y ~ delta_x)
$\Delta y_{ij} \sim \mathcal{N}(\widehat{\Delta y_{ij}}, \sigma^2)$
$\widehat{\Delta y_{ij}} = \beta_0 \Delta x_{ij} + \beta_1$
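First differences can be computed within each site once the rows are ordered by year. A minimal sketch on simulated data follows; the data-generating process is an assumption, and only `delta_x` and `delta_y` come from the model formula above.

```r
set.seed(1)
n_sites <- 20; n_years <- 10            # assumed design, one row per site-year
site  <- rep(seq_len(n_sites), each = n_years)
year  <- rep(seq_len(n_years), times = n_sites)
ocean <- rnorm(n_sites)                 # unmeasured site-level driver
x <- (15 + 2 * ocean)[site] + rnorm(n_sites * n_years)           # temperature
y <- 10 + 1 * x + (3 * ocean)[site] + rnorm(n_sites * n_years)   # snails, true effect = 1
dat <- data.frame(site = factor(site), year = year, x = x, y = y)

# Differences only make sense between consecutive years of the same site
dat <- dat[order(dat$site, dat$year), ]
first_diff <- function(v) c(NA, diff(v))   # first year of each site has no difference
dat$delta_x <- ave(dat$x, dat$site, FUN = first_diff)
dat$delta_y <- ave(dat$y, dat$site, FUN = first_diff)

# Rows with NA differences are dropped by lm()'s default na.action
fd <- lm(delta_y ~ delta_x, data = dat)
fd_slope <- unname(coef(fd)["delta_x"])
fd_slope
```

Differencing removes anything constant within a site, so recruitment cancels out just as it does under mean differencing, at the cost of one observation per site.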
The code for the app is on GitHub.
These are the coefficients for the relevant term
in each model, which give the estimated effect of
temperature on snail abundance. For the GMC model
this is $x-\bar{x_j}$, and for the panel (First Differences)
model this is $\Delta x$.
The bias of the coefficients relative to the true value
The difference between the RE estimated temperature effect
and that from other models.
The difference between the estimated effect and 0
or the true coefficient value.