Concept
When using FE, we assume that characteristics of an individual (or entity) may affect or bias the predictor or outcome variables, and we need to control for this. This is the rationale for allowing the entity's error term to be correlated with the predictor variables. FE removes the effect of those time-invariant characteristics, so we can assess the net effect of the predictors on the outcome variable.
The FE regression model has n different intercepts, one for each entity. These intercepts can be represented by a set of binary variables, and these binary variables absorb the influences of all omitted variables that differ from one entity to the next but are constant over time.
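The intercept structure described above can be written out as a short sketch. The notation is an assumption made here for illustration (it is not taken from the guide): i indexes entities, t indexes time periods, x_1 and x_2 stand for the two predictors used in the examples, and D_j is a binary variable equal to 1 for entity j.

```latex
% Entity-specific intercepts: one alpha_i per entity
y_{it} = \alpha_i + \beta_1 x_{1,it} + \beta_2 x_{2,it} + u_{it},
\qquad i = 1,\dots,n, \quad t = 1,\dots,T

% Equivalent form with a common constant and n-1 binary variables
% (entity 1 is the omitted base category)
y_{it} = \beta_0 + \beta_1 x_{1,it} + \beta_2 x_{2,it}
       + \sum_{j=2}^{n} \gamma_j D_{j,i} + u_{it},
\qquad \alpha_1 = \beta_0, \quad \alpha_j = \beta_0 + \gamma_j \ (j \ge 2)
```

In both forms, anything that differs across entities but is constant over time is absorbed by the intercepts, so the slope coefficients are identified only from variation over time within each entity.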
Estimation
This guide discusses two different ways to estimate fixed effects models: (i) the within estimator and (ii) the dummy variable estimator.
(i) Within Estimator
This is the more commonly used estimator for fixed effects models. This estimator is called the "within estimator", as it uses time variation within each cross-section.
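To make "time variation within each cross-section" concrete, the following is a minimal sketch of the within transformation done by hand: each variable is expressed as a deviation from its country mean, and the demeaned outcome is regressed on the demeaned predictors. The variable names mean_* and dm_* are hypothetical helpers introduced here for illustration.

```stata
* Load the example dataset used in this guide
use https://dss.princeton.edu/training/Panel101_new.dta, clear

* Subtract each country's mean from y, x1 and x2
* (mean_* and dm_* are hypothetical helper names)
foreach v of varlist y x1 x2 {
    bysort country: egen double mean_`v' = mean(`v')
    generate double dm_`v' = `v' - mean_`v'
}

* OLS on the demeaned data, without a constant
regress dm_y dm_x1 dm_x2, noconstant
```

The slope coefficients from this regression should coincide with those from the fixed effects estimator. The standard errors will not, because regress does not know that the country means were estimated and therefore uses too many residual degrees of freedom; this is one reason to use the built-in command in practice.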
- Use the following dataset (ignore this step if you have already opened the dataset in the previous section)
use https://dss.princeton.edu/training/Panel101_new.dta, clear
- Declare the dataset as a panel using xtset (ignore this step if you have already declared the dataset as a panel)
- Use the following command to estimate your fixed effects model
xtreg y x1 x2, fe
Note: the fe option tells Stata to estimate a fixed effects model.
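The steps above can be collected into a short do-file sketch. The xtset line is an assumption: the output in this guide shows country as the group variable, which is all that xtreg, fe needs, but the name of the time variable is not shown in this section, so it is left out of the command and only mentioned in a comment.

```stata
* Open the example dataset
use https://dss.princeton.edu/training/Panel101_new.dta, clear

* Declare the panel structure; country identifies the entity.
* If you also want to declare the time variable, add its name after
* country (for example "xtset country year", assuming it is called year).
xtset country

* Fixed effects (within) regression of y on x1 and x2
xtreg y x1 x2, fe
```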
Stata will give us the following results:
. xtreg y x1 x2, fe
Fixed-effects (within) regression               Number of obs     =         70
Group variable: country                         Number of groups  =          7
R-squared:                                      Obs per group:
     Within  = 0.0903                                         min =         10
     Between = 0.0546                                         avg =       10.0
     Overall = 0.0000                                         max =         10
                                                F(2,61)           =       3.03
corr(u_i, Xb) = -0.8561                         Prob > F          =     0.0557
------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x1 |   2.23e+09   1.13e+09     1.97   0.053    -2.86e+07    4.50e+09
          x2 |   2.05e+09   2.00e+09     1.02   0.310    -1.95e+09    6.06e+09
       _cons |   1.23e+08   7.99e+08     0.15   0.878    -1.48e+09    1.72e+09
-------------+----------------------------------------------------------------
     sigma_u |  3.070e+09
     sigma_e |  2.794e+09
         rho |  .54680874   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(6, 61) = 3.14                       Prob > F = 0.0095
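The coefficient on x1 indicates how much y changes over time, on average per country, when x1 increases by one unit, holding all other variables constant.
The p-value for x1 (P>|t| = 0.053) tests whether x1 has a statistically significant effect on the dependent variable (y). Because the p-value is < 0.10, the coefficient on x1 is significant at the 10% level.
The p-value of the overall F test (Prob > F = 0.0557) indicates whether the estimated model is statistically significant, that is, whether the coefficients on x1 and x2 are jointly different from zero. Because the p-value is < 0.10, the model is statistically significant at the 10% level. The F test at the bottom of the output (Prob > F = 0.0095) tests whether all the entity-specific effects u_i are zero; because that p-value is < 0.01, the null hypothesis of no fixed effects is rejected at the 1% level.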
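The p-values discussed above can also be recomputed from the results that xtreg stores, which can help when checking which test each number belongs to. This is a sketch that assumes the stored results e(df_m), e(df_r), e(F), e(df_a) and e(F_f) are available after xtreg, fe in your version of Stata; the displayed values should correspond to those in the output table.

```stata
use https://dss.princeton.edu/training/Panel101_new.dta, clear
xtset country
quietly xtreg y x1 x2, fe

* Two-sided p-value for x1, from its t statistic
display "p-value for x1:           " 2*ttail(e(df_r), abs(_b[x1]/_se[x1]))

* Overall F test that the coefficients on x1 and x2 are jointly zero
display "p-value, model F test:    " Ftail(e(df_m), e(df_r), e(F))

* F test that all entity-specific effects u_i are zero
display "p-value, F test all u_i=0: " Ftail(e(df_a), e(df_r), e(F_f))
```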
(ii) Dummy Variable Regression
When there are only a small number of fixed effects to estimate, it is convenient to estimate the FE model by running an ordinary regression that includes a dummy variable for each entity.
- Use the following dataset (ignore this step if you have already opened the dataset for the previous section)
use https://dss.princeton.edu/training/Panel101_new.dta, clear
- Declare the dataset as a panel using xtset (ignore this step if you have already declared the dataset as a panel)
- Use the following command to estimate your fixed effects model
reg y x1 x2 i.country
Stata will give us the following results:
. reg y x1 x2 i.country
      Source |       SS           df       MS      Number of obs   =        70
-------------+----------------------------------   F(8, 61)        =      2.42
       Model |  1.5096e+20         8  1.8870e+19   Prob > F        =    0.0245
    Residual |  4.7634e+20        61  7.8088e+18   R-squared       =    0.2406
-------------+----------------------------------   Adj R-squared   =    0.1411
       Total |  6.2729e+20        69  9.0912e+18   Root MSE        =   2.8e+09
------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x1 |   2.23e+09   1.13e+09     1.97   0.053    -2.86e+07    4.50e+09
          x2 |   2.05e+09   2.00e+09     1.02   0.310    -1.95e+09    6.06e+09
             |
     country |
          B  |  -6.77e+09   4.88e+09    -1.39   0.171    -1.65e+10    2.99e+09
          C  |  -1.44e+09   1.96e+09    -0.74   0.464    -5.36e+09    2.47e+09
          D  |  -2.93e+09   5.24e+09    -0.56   0.578    -1.34e+10    7.55e+09
          E  |  -6.54e+09   5.10e+09    -1.28   0.204    -1.67e+10    3.65e+09
          F  |   6.14e+08   1.38e+09     0.44   0.659    -2.15e+09    3.38e+09
          G  |  -3.32e+08   2.12e+09    -0.16   0.876    -4.56e+09    3.90e+09
             |
       _cons |   2.61e+09   1.94e+09     1.34   0.184    -1.27e+09    6.49e+09
------------------------------------------------------------------------------
Notice that the estimated coefficients for x1 and x2 are the same for both the "Within Estimator" method and the "Dummy Variable Regression" method.
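One way to check this equivalence side by side is to store both sets of estimates and tabulate them. The sketch below is illustrative only; the names fe_within and fe_dummy are hypothetical labels chosen here. It also runs a joint test of the country dummies, which should correspond to the "F test that all u_i=0" reported by xtreg.

```stata
use https://dss.princeton.edu/training/Panel101_new.dta, clear
xtset country

* Within estimator
quietly xtreg y x1 x2, fe
estimates store fe_within

* Dummy variable regression
quietly regress y x1 x2 i.country
estimates store fe_dummy

* Coefficients and standard errors for x1 and x2 from both models
estimates table fe_within fe_dummy, keep(x1 x2) b(%10.3e) se(%10.3e)

* Joint F test of the country dummies (regress is still the active model)
testparm i.country
```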
Notes:
- Including a lagged dependent variable as a regressor in a fixed effects model can introduce bias, a problem often referred to as the "Nickell bias" or "dynamic panel bias." This bias arises because, after the within transformation, the demeaned lagged dependent variable is correlated with the demeaned error term, violating the assumption of strict exogeneity required for consistent estimation of fixed effects models. The problem is most serious when the number of time periods is small. In this case, dynamic panel data models such as the Arellano-Bond estimator, which is based on the generalized method of moments (GMM), can provide consistent estimates.
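As a rough illustration of the syntax only, the sketch below contrasts a fixed effects model that includes a lagged dependent variable with Stata's xtabond command, which implements the Arellano-Bond estimator. It assumes the time variable in the example dataset is named year, which is not shown in this section. With only 7 countries, the example panel is far too small for GMM estimates to be reliable, so the results should not be interpreted.

```stata
use https://dss.princeton.edu/training/Panel101_new.dta, clear

* ASSUMPTION: the time variable is named year.
* A time variable is required for lag operators and for xtabond.
xtset country year

* Fixed effects with a lagged dependent variable
* (subject to Nickell bias when the number of periods is small)
xtreg y L.y x1 x2, fe

* Arellano-Bond estimator with one lag of y as a regressor
xtabond y x1 x2, lags(1)
```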