Also note the horizontal line drawn at 0 on the y-axis. This is the baseline from which the deviations for wage are shown. These deviations are the residuals of the empirical distribution from the theoretical distribution. Bars above the line (positive residuals) indicate that a wage bin holds more observations than the theoretical distribution predicts, while bars below the line (negative residuals) indicate that it holds fewer.

Normality Tests in Stata

There are also statistical tests that can be run in Stata to test for normality. The histogram with normal and kernel density curves that we produced above showed that the variable was positively skewed, since most observations were clustered toward the left of the distribution, leaving a long right tail.

A skewness value of 0 indicates a symmetric distribution, which is what a normal distribution would show. In fact, any value between 0. In this case, the skewness value of 3.

Kurtosis

Kurtosis values indicate whether a distribution has heavier or lighter tails than a normal distribution. Generally, a kurtosis value of 3 is what a normally distributed variable would show, while anything below 3 suggests a flatter distribution with thin tails and a wider bell shape. Anything above 3 suggests heavier tails.
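The skewness and kurtosis figures discussed here can be reproduced from the command line. The following is a minimal sketch, assuming the wage variable comes from Stata's bundled nlsw88 example dataset. That is an assumption, since the dataset is not named in this section, so substitute your own data as needed.

```stata
* Assumption: wage comes from the bundled nlsw88 example data; swap in your own dataset
sysuse nlsw88, clear

* The detail option adds skewness and kurtosis to the summary table
summarize wage, detail

* The same figures are available as stored results after summarize, detail
display "Skewness = " r(skewness)
display "Kurtosis = " r(kurtosis)
```

Stata reports kurtosis on the scale where a normal distribution scores 3, not as excess kurtosis, so the value can be compared directly with the benchmark of 3.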

Statistical Tests for Normality in Stata

These values for skewness and kurtosis give us an idea about the distribution of a variable, but they are not statistical tests. The three tests mentioned above can be performed in Stata to get a better idea of how close to normal the distribution of a variable is.

Skewness and Kurtosis Normality Test

If we follow the menu options as defined above and select the Skewness and kurtosis normality test, we get the following dialogue box.
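The menu route also has a command-line equivalent. As a minimal sketch, assuming the bundled nlsw88 example data as a stand-in for the dataset used in this tutorial (an assumption, not something stated here):

```stata
* Assumption: nlsw88 stands in for the tutorial's data
sysuse nlsw88, clear

* Skewness and kurtosis test for normality on the wage variable
sktest wage
```

The output reports a p-value for skewness, one for kurtosis, and one for the joint test.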

Simply enter the variable name. As can be seen from the output, the command for this is sktest followed by the variable name. The null hypothesis here is that the data are normally distributed.

Also, it would follow that the error terms within firms are not independent. So I would be inclined to model this with -xtreg, fe- rather than pooled linear regression. And if you do that, by accounting for the fixed effects, you may find the residual distribution becomes closer to normal.
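A rough sketch of the fixed-effects approach, with a check on the residuals afterwards. The original data are not available here, so this assumes Stata's Grunfeld investment panel (loaded with webuse, which needs an internet connection) as a stand-in. The variables invest, mvalue and kstock are placeholders for investment spending and the firm attributes in the actual model.

```stata
* Assumption: the Grunfeld panel stands in for the real data;
* invest, mvalue and kstock are placeholders for the actual variables
webuse grunfeld, clear
xtset company year

* Fixed-effects (within) regression: absorbs time-invariant firm differences
xtreg invest mvalue kstock, fe

* Idiosyncratic residuals, net of the estimated firm effects
predict double e_fe, e

* Compare the residual distribution with a normal curve, then test it
histogram e_fe, normal
sktest e_fe
```

Comparing this histogram with the one for the pooled-regression residuals would show whether accounting for the firm effects brings the distribution closer to normal.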

Going far out on a limb because I know so little about this subject matter, I would think that investment spending could have a very wild and very heavy-tailed distribution, even conditional on all the variables in your model. While Size and Profitability might account for a fair amount of the variation, I would still expect a pretty complicated distribution of investment spending even accounting for those factors.

So another thought is to try to explore the possibility of transformations of Size or Profitability that might better capture the relationship of those attributes to investment spending than just entering them linearly in the model. In addition to making your model a better description of the real world, that approach, too, might leave you with a less disturbing residual distribution.
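As a hedged illustration of what exploring transformations could look like, the sketch below tries a log transform and a quadratic term. It again assumes the Grunfeld panel as a stand-in dataset, with mvalue as a placeholder for a variable such as Size. Which transformation suits the real data, if any, is an open question.

```stata
* Assumption: Grunfeld panel and its variable names are placeholders for the real data
webuse grunfeld, clear
xtset company year

* Option 1: log-transform a right-skewed regressor (ln() is missing for values <= 0)
generate double ln_mvalue = ln(mvalue)
xtreg invest ln_mvalue kstock, fe
predict double e_log, e

* Option 2: allow curvature with a quadratic term via factor-variable notation
xtreg invest c.mvalue##c.mvalue kstock, fe
predict double e_quad, e

* Check whether either specification leaves better-behaved residuals
sktest e_log e_quad
```

The quadratic uses factor-variable notation, so no new variable has to be generated, and margins can later be used to trace out the fitted relationship.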

