# Un-biasing standard errors in panel data

Introducing a function to test for group and time effects

When dealing with panel data, one must expect biased standard errors. An excellent paper on this issue is Petersen (2009), on which this post builds. Most commonly, the bias results from either a correlation between different time periods of a group (a group effect) or a correlation across different groups within the same time period (a time effect). The former is often referred to as serial correlation, the latter as cross-sectional dependence.

Usually, when you use a package such as `plm`, you have three different effect specifications to choose from for both `random effect` and `fixed effect` models to account for the one, the other, or both: `individual` (only the group effect), `time` (only the time effect), or `twoways` (both group and time effects). Although this seems easy right away, there may be situations where you cannot simply specify `twoways` effects (for example, if you intend to use the `nested` random effect model) or, more critically in my opinion, where one of the key assumptions of this approach is not valid: namely, that both group and time effects are “fixed.”

“Fixed” means that a group effect stays constant across all your time observations. This means implicitly that you assume that the impact of all the things you do not know or cannot measure for a firm is time-invariant, which is a strong assumption.

Think about the following examples: Imagine you are interested in the money raised by start-ups in several funding rounds. It is well known that the pitch, i.e., the presentation, influences investors (Martens, Jennings, and Jennings 2007). The problem is that often you cannot explicitly measure how “good” the pitch was—you are usually not present and do not have a clear indication of how well investors received it.

As you have observations from multiple funding rounds, you would usually use a dummy for each start-up to control for such unobserved effects. However, that dummy stays the same across all funding rounds—even though a presenter’s performance at the respective pitches might vary enormously (think of someone having a bad day, or of a completely different person presenting in different rounds). Therefore, the “fixed” group dummy may be biased.

For the time effect, the situation is similar. To capture time effects, you would use year dummies indicating the year in which the funding took place. Again, this is “fixed”; in this case, it is the same for all start-ups that looked for funding in a given year. Just think about the pandemic: How realistic is it that funding rounds in 2021 were identical for a start-up working on a covid vaccine and a start-up working on a dating app (which no one can use in a lockdown)? These simplified examples show that such “fixed” time and group effects are a strong assumption.

Petersen (2009) recommends an additional and possibly even superior way of addressing this issue: clustering your standard errors by time and/or group. I used it for one of my projects, and I want to share a small function I wrote to quickly test whether a group and/or time effect is present in your data. Let me illustrate with the `Fatalities` data that comes with the `AER` package.

```
library(plm) #for estimating models
library(AER) #for data
library(lmtest) #for coeftest command
```

```
data(Fatalities)
rdm <- plm(fatal ~ unemp + income + miles + beertax, index = c("state", "year"),
           model = "random", effect = "twoways", data = Fatalities) # run a random effect regression
summary(rdm)
```

```
## Twoways effects Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = fatal ~ unemp + income + miles + beertax, data = Fatalities,
## effect = "twoways", model = "random", index = c("state",
## "year"))
##
## Balanced Panel: n = 48, T = 7, N = 336
##
## Effects:
## var std.dev share
## idiosyncratic 7081.10 84.15 0.009
## individual 744674.57 862.95 0.990
## time 376.31 19.40 0.001
## theta: 0.9632 (id) 0.4693 (time) 0.4693 (total)
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -297.995 -45.673 -5.660 29.962 583.503
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 8.2922e+02 2.3080e+02 3.5928 0.0003271 ***
## unemp -2.3422e+01 5.4540e+00 -4.2944 1.752e-05 ***
## income 2.7880e-02 1.0932e-02 2.5503 0.0107624 *
## miles -6.0498e-04 4.8299e-03 -0.1253 0.9003188
## beertax -2.1565e+02 8.6497e+01 -2.4931 0.0126639 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 3134900
## Residual Sum of Squares: 2492200
## R-Squared: 0.205
## Adj. R-Squared: 0.1954
## Chisq: 85.3536 on 4 DF, p-value: < 2.22e-16
```

What we need to do now is test the standard errors for bias. A good starting point is the `coeftest` function provided by the `lmtest` package. First, we compute so-called robust standard errors, where robust stands for heteroscedasticity-robust. Usually this is done via a method developed by White (which is why the option is called `white1` in the `vcovHC` call).

```
rdm_se_hc <- coeftest(rdm, vcov. = vcovHC(rdm, method = "white1", type = "HC3"))
rdm_se_hc
```

```
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.2922e+02 2.1639e+02 3.8321 0.000152 ***
## unemp -2.3422e+01 5.2501e+00 -4.4612 1.118e-05 ***
## income 2.7880e-02 1.1380e-02 2.4498 0.014810 *
## miles -6.0498e-04 5.3644e-03 -0.1128 0.910276
## beertax -2.1565e+02 7.5794e+01 -2.8451 0.004716 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Note that `white1` and `HC3` are usually the default values for the `vcovHC` command. However, if a `plm` object is analyzed, the default switches to clustered standard errors because of the panel structure, so we have to select the options above manually. We can see from the results that the robust standard errors already differ somewhat from the conventional ones. While robust SEs help against heteroscedasticity, they do not help against serial correlation and cross-sectional dependence. That is what clustering is for, so let us see what it brings at this point:

```
rdm_se_cg <- coeftest(rdm, vcov. = vcovHC(rdm, cluster = "group"))
rdm_se_cg
```

```
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.2922e+02 3.0003e+02 2.7638 0.006033 **
## unemp -2.3422e+01 7.8944e+00 -2.9669 0.003227 **
## income 2.7880e-02 1.7842e-02 1.5626 0.119101
## miles -6.0498e-04 2.8147e-03 -0.2149 0.829951
## beertax -2.1565e+02 1.0040e+02 -2.1478 0.032456 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

These are the SE clustered by group, in our example, the states. As we can see, they have increased a bit, but not too much. Petersen says that when a group effect is present, your SE should increase “3-5 times” (p.476). Here the ratio is much smaller (remember that we already included a random group effect in the model beforehand).
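The ratio behind this comparison is just the sum of the clustered SEs divided by the sum of the robust ones. As a quick sanity check, here is that arithmetic in base R, with the SE columns copied from the two `coeftest` outputs above:

```r
# Second columns (standard errors) of the robust and group-clustered coeftest outputs
robust_se <- c(2.1639e+02, 5.2501e+00, 1.1380e-02, 5.3644e-03, 7.5794e+01)
group_se  <- c(3.0003e+02, 7.8944e+00, 1.7842e-02, 2.8147e-03, 1.0040e+02)
ratio <- sum(group_se) / sum(robust_se)
round(ratio, 2)  # about 1.37, far below Petersen's 3-5 benchmark for a group effect
```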

In a similar manner, one can also cluster by time to look for a time effect. Here the command is:

```
rdm_se_ct <- coeftest(rdm, vcov. = vcovHC(rdm, cluster = "time"))
rdm_se_ct
```

```
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.2922e+02 2.4396e+02 3.3990 0.0007589 ***
## unemp -2.3422e+01 4.7791e+00 -4.9009 1.497e-06 ***
## income 2.7880e-02 7.6184e-03 3.6595 0.0002940 ***
## miles -6.0498e-04 1.3758e-03 -0.4397 0.6604202
## beertax -2.1565e+02 4.7557e+01 -4.5345 8.082e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Here, the SEs are smaller than the robust (HC) ones. Again, this does not speak for a time effect (remember that we included a random time effect using the `twoways` option). Thus, in this example, it seems perfectly fine to use the robust SEs instead of the clustered ones, as there seems to be no time or group effect that is not captured by our random effects.

Here is the small function I wrote. Please note a few things: It compares the clustered SEs with the robust White ones, as suggested by Petersen. It extracts the second column, which contains the standard errors, from the `named num` values that `coeftest` produces. I then sum them up, divide the clustered sum by the robust sum to obtain the ratio, and compare it to the indicative thresholds of 3 (group effect) and 2 (time effect). Take this only as an indication! The change in standard error values is usually not consistent across variables, i.e., it can be stronger or weaker for some. As the function sums up the standard errors of all variables included in the model (including your control variables), the magnitude of the ratio may be partially driven by the scales your variables are measured on. If you want to be sure, you can always single out an individual IV with the code. Do the following:

`rdm_se_ct[,2] #this will select all the standard errors (second column) from your coeftest`

```
## (Intercept) unemp income miles beertax
## 2.439589e+02 4.779086e+00 7.618368e-03 1.375804e-03 4.755659e+01
```

`rdm_se_ct[3,2] # this will select the standard error of the third coefficient (income; don't forget to count the intercept)`

`## [1] 0.007618368`

`rdm_se_ct[3,2]/rdm_se_hc[3,2] # test for a time effect in the income variable (no time effect here)`

`## [1] 0.6694436`

Again, the value is below two, which speaks against the presence of a time effect, or more precisely, an additional time effect that is not controlled for by our random `twoways` effect.

Let’s look at the general function:

```
test_time_effect <- function(x, ...) {
  # robust (White) SEs as the baseline
  cov_w <- coeftest(x, vcov. = vcovHC(x, method = "white1", type = "HC3", ...))
  # time-clustered SEs
  cov_time <- coeftest(x, vcov. = vcovHC(x, cluster = "time", method = "arellano", ...))
  # ratio of summed clustered SEs to summed robust SEs
  result_ti <- sum(cov_time[, 2]) / sum(cov_w[, 2])
  cat("The time clustered errors are:", result_ti, "times larger!")
  ifelse(result_ti > 2, "Time effect likely present", "No time effect present")
}

test_group_effect <- function(x, ...) {
  # robust (White) SEs as the baseline
  cov_w <- coeftest(x, vcov. = vcovHC(x, method = "white1", type = "HC3", ...))
  # group-clustered SEs
  cov_gr <- coeftest(x, vcov. = vcovHC(x, cluster = "group", method = "arellano", ...))
  # ratio of summed clustered SEs to summed robust SEs
  result_gr <- sum(cov_gr[, 2]) / sum(cov_w[, 2])
  cat("The group clustered errors are:", result_gr, "times larger!")
  ifelse(result_gr > 3, "Group effect likely present", "No group effect present")
}
```

Let’s test it:

`test_time_effect(rdm)`

`## The time clustered errors are: 0.9961568 times larger!`

`## [1] "No time effect present"`

`test_group_effect(rdm)`

`## The group clustered errors are: 1.37284 times larger!`

`## [1] "No group effect present"`

All works the same for the fixed effect option:

```
data(Fatalities) # select the built-in dataset
fix <- plm(fatal ~ unemp + income + miles + beertax, index = c("state", "year"),
           model = "within", effect = "twoways", data = Fatalities) # run a fixed effect regression
summary(fix)
```

```
## Twoways effects Within Model
##
## Call:
## plm(formula = fatal ~ unemp + income + miles + beertax, data = Fatalities,
## effect = "twoways", model = "within", index = c("state",
## "year"))
##
## Balanced Panel: n = 48, T = 7, N = 336
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -409.9168 -37.6606 2.0487 36.6331 423.3816
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## unemp -3.2170e+01 5.7049e+00 -5.6390 4.203e-08 ***
## income 3.6515e-02 1.1477e-02 3.1816 0.001630 **
## miles 1.5639e-03 4.7378e-03 0.3301 0.741580
## beertax -2.7983e+02 8.8942e+01 -3.1463 0.001833 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 2644200
## Residual Sum of Squares: 1968500
## R-Squared: 0.25553
## Adj. R-Squared: 0.10288
## F-statistic: 23.8545 on 4 and 278 DF, p-value: < 2.22e-16
```

```
fix_se_hc <- coeftest(fix, vcov. = vcovHC(fix, method = "white1", type = "HC3"))
fix_se_hc
```

```
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## unemp -3.2170e+01 6.1220e+00 -5.2548 2.955e-07 ***
## income 3.6515e-02 1.0742e-02 3.3993 0.0007747 ***
## miles 1.5639e-03 8.8353e-03 0.1770 0.8596331
## beertax -2.7983e+02 1.0712e+02 -2.6124 0.0094785 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
fix_se_cg <- coeftest(fix, vcov. = vcovHC(fix, cluster = "group"))
fix_se_cg
```

```
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## unemp -3.2170e+01 1.0236e+01 -3.1429 0.001854 **
## income 3.6515e-02 1.8091e-02 2.0184 0.044515 *
## miles 1.5639e-03 3.1294e-03 0.4997 0.617651
## beertax -2.7983e+02 1.5102e+02 -1.8530 0.064944 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
fix_se_ct <- coeftest(fix, vcov. = vcovHC(fix, cluster = "time"))
fix_se_ct
```

```
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## unemp -3.2170e+01 4.9493e+00 -6.4998 3.710e-10 ***
## income 3.6515e-02 7.5746e-03 4.8206 2.358e-06 ***
## miles 1.5639e-03 2.2998e-03 0.6800 0.4971
## beertax -2.7983e+02 3.5026e+01 -7.9893 3.633e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

`test_time_effect(fix)`

`## The time clustered errors are: 0.3530503 times larger!`

`## [1] "No time effect present"`

`test_group_effect(fix)`

`## The group clustered errors are: 1.423968 times larger!`

`## [1] "No group effect present"`
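If the tests had pointed to both a group and a time effect, Petersen (2009) also discusses clustering along both dimensions simultaneously. The `plm` package implements this as `vcovDC` (double clustering); a sketch, reusing the `rdm` model estimated at the top of this post:

```r
# Sketch: standard errors double-clustered by group AND time via plm's vcovDC,
# applied to the random effect model rdm from above
rdm_se_dc <- coeftest(rdm, vcov. = vcovDC(rdm))
rdm_se_dc
```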

### References

Martens, Martin L., Jennifer E. Jennings, and P. Devereaux Jennings. 2007. “Do the Stories They Tell Get Them the Money They Need? The Role of Entrepreneurial Narratives in Resource Acquisition.” *Academy of Management Journal* 50 (5): 1107–32. https://doi.org/10.5465/amj.2007.27169488.

Petersen, Mitchell A. 2009. “Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches.” *Review of Financial Studies* 22 (1): 435–80. https://doi.org/10.1093/rfs/hhn053.