This post presents a quick tutorial on how to fill missing values in variables in Stata. The tutorial uses the fillmissing program, which can be downloaded by typing the following command in the Stata command window:
net install fillmissing, from(http://fintechprofessor.com) replace
Important Note: This post does not imply that filling missing values is justified by theory. Users should make their own decisions and follow appropriate theory while filling missing values.
After installing the fillmissing program, we can use it to fill missing values in numeric as well as string variables. The program also allows the bysort prefix, so missing values can be filled within groups. We shall see several examples of using the bysort prefix to perform by-group calculations. But let us first quickly go through the different options of the program.
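Before turning to the options, here is a minimal sketch of what the by-group syntax might look like. The dataset and the variable names symbol and sector are hypothetical and exist only for illustration; the sketch assumes that sector is constant within each symbol and that fillmissing accepts the bysort prefix as described above.

```stata
* Hypothetical data: sector is constant within each symbol, with some blanks
clear all
input str4 symbol str10 sector
"AABS" "Sugar"
"AABS" ""
"AABS" "Sugar"
"AKBL" "Banking"
"AKBL" ""
"AKBL" "Banking"
end

* Fill within each symbol, so a value is never borrowed from another group
bysort symbol: fillmissing sector, with(any)

list, sepby(symbol)
```

Without the bysort prefix, the program would treat all six observations as a single group, which is not what we want when the variable differs across groups.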
The fillmissing program offers the following options to fill missing values
Let us quickly go through these options. Please note that options starting from serial number 6 are applicable only in the case of numerical variables.
with() is used to specify the source from which the missing values will be filled. Option with(any) is the default, so if with() is not specified, the fillmissing program invokes it automatically. This option is best suited to filling missing values of a constant variable, i.e. a variable whose values are all the same but some of which are missing for some reason. Option with(any) will try to fill the missing values from any available non-missing value of the given variable.
Example 1: Fill missing values with(any)
Let us first create a sample dataset of one variable with 10 observations. You can copy and paste the following code into the Stata Do-file Editor to generate the dataset:
clear all
set obs 10
gen symbol = "AABS"
replace symbol = "" in 5
replace symbol = "" in 8
The above dataset has missing values in rows 5 and 8. To fill the missing values from any other available non-missing value, let us type:
fillmissing symbol, with(any)
Since with(any) is the default option of the program, we could also write the above code as
fillmissing symbol
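As a quick check, the sketch below rebuilds the Example 1 data, runs the short form without with(), and counts the missing values before and after. The count and assert commands are standard Stata; the assert line will stop with an error if any value is still blank or differs from AABS.

```stata
* Rebuild the Example 1 dataset
clear all
set obs 10
gen symbol = "AABS"
replace symbol = "" in 5
replace symbol = "" in 8

* Two blanks were created above
count if missing(symbol)

* with(any) is the default, so it can be omitted
fillmissing symbol

* Count again, then confirm every observation now holds the same value
count if missing(symbol)
assert symbol == "AABS"
```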
with(previous) is used to fill the current missing value with the preceding or previous value of the same variable. Please note that if the previous value is also missing, the current value will remain missing. Further, this option does not sort the data, so whatever the current sort of the data is, fillmissing will use that sort and identify the current and previous observation.
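To make the two points above concrete, here is a sketch in base Stata of the behavior just described. It is only an illustration, not the source code of fillmissing. The variables year, price, and prev are hypothetical. The data are sorted first because the fill depends on the current order, and the previous values are copied into a helper variable so that a missing value preceded by another missing value stays missing.

```stata
* Illustration in base Stata only; this is not fillmissing's own code
clear all
input year price
2003 12
2001 10
2002 .
2004 .
2005 .
end

* "Previous" is defined by the current order of the data, so sort first
sort year

* Snapshot of the original previous values, taken before anything is changed
gen double prev = price[_n-1]

* Fill only from the original previous value
replace price = prev if missing(price)
drop prev

list
```

After this runs, 2002 takes the value 10 and 2004 takes the value 12, while 2005 remains missing because its original predecessor (2004) was missing too. Writing replace price = price[_n-1] if missing(price) directly would instead carry values forward through consecutive gaps, because replace works through the observations in order.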
Example 2: Fill missing values with(previous)
Let’s create a dummy dataset first.
clear all
set obs 10
gen symbol = "AABS"
replace symbol = "AKBL" in 1
replace symbol = "" in 2
The dataset looks like this
+--------+
| symbol |
+--------+
| AKBL   |
|        |
| AABS   |
| AABS   |
| AABS   |
| AABS   |
| AABS   |
| AABS   |
| AABS   |
| AABS   |
+--------+
To fill the missing value in observation number 2 with AKBL, i.e. from the previous observation, we would type:
fillmissing symbol, with(previous)
In the next blog post, I shall talk about other options of the fillmissing program. Specifically, I shall discuss the use of bys with the fillmissing program. To catch that post, you may visit the blog section of this site or subscribe to updates from this site.