fillmissing: Fill Missing Values in Stata

  • 9

fillmissing: Fill Missing Values in Stata

Category:Blog Tags : 

This post presents a quick tutorial on how to fill missing values in variables in Stata. This tutorial uses fillmissing program which can be downloaded by typing the following command in Stata command window

net install fillmissing, from(http://fintechprofessor.com) replace

 

Important Note: This post does not imply that filling missing values is justified by theory. Users should make their own decisions and follow appropriate theory while filling missing values.

 

After the installation of the fillmissing program, we can use it to fill missing values in numeric as well as string variables. Also, this program allows the bysort prefix to fill missing values by groups. We shall see several examples of using bysort prefix to perform by-groups calculations. But let us first quickly go through the different options of the program.

 

Program Options

The fillmissing program offers the following options to fill missing values

  1. with(any)
  2. with(previous)
  3. with(next)
  4. with(first)
  5. with(last)
  6. with(mean)
  7. with(max)
  8. with(min)
  9. with(median)

Let us quickly go through these options. Please note that options starting from serial number 6 are applicable only in the case of numerical variables.

 

1. with(any)

Option with() is used to specify the source from where the missing values will be filled. Option with(any) is an optional option and hence if not specified, will automatically be invoked by the fillmissing program. This option is best to fill missing values of a constant variable, i.e. a variable that has all similar values, however, due to some reason, some of the values are missing. Option with(any) will try to fill the missing values from any available non-missing values of the given variable.

Example 1: Fill missing values with(any)

Let us first create a sample dataset of one variable having 10 observations. You can copy-paste the following code to Stata Do editor to generate the dataset

clear all
set obs 10
gen symbol = "AABS"
replace symbol = "" in 5
replace symbol = "" in 8

The above dataset has missing values on row 5 and 8. To fill the missing values from any other available non-missing values, let us use the with(any) option.

fillmissing symbol, with(any)

Since with(any) is the default option of the program, we could also write the above code as

fillmissing symbol

 

2. with(previous)

Option with(previous) is used to fill the current missing value with the preceding or previous value of the same variable. Please note that if the previous value is also missing, the current value will remain missing. Further, this option does not sort the data, so whatever the current sort of the data is, fillmissing will use that sort and identify the current and previous observation.

Example 2: Fill missing values with(previous)

Let’s create a dummy dataset first.

clear all
set obs 10
gen symbol = "AABS" 
replace symbol = "AKBL" in 1
replace symbol = "" in 2 

The dataset looks like this

 
 +--------+
 | symbol |
 +--------+
 |   AKBL |
 |        |
 |   AABS |
 |   AABS |
 |   AABS |
 |   AABS |
 |   AABS |
 |   AABS |
 |   AABS |
 |   AABS |
 +--------+ 

To fill the missing value in observation number 2 with AKBL, i.e. from previous observation, we would type:

fillmissing symbol, with(previous)

 

What’s Next

In the next blog post, I shall talk about other options of the fillmissing program. Specifically, I shall discuss the use of by and bys with fillmissing program. Therefore, you may visit the blog section of this site or subscribe to updates from this site.

 


9 Comments

Syed Ahmad Gillani

December 29, 2019at 1:50 pm

dear dr please check your email I have asked about Corporate governance data.. detail is in my email

Attaullah Shah

December 30, 2019at 11:48 pm

I did not receive your email. Please send it to attahshah15@hotmail.com

Dr. Hassan Raza

January 29, 2020at 2:50 pm

Dear sir, this code is not installing to stata, please help “net install fillmissing, from(http://fintechprofessor.com) replace”

    Attaullah Shah

    January 29, 2020at 7:56 pm

    Dear Dr. Hassan Raz
    I have converted the site to https protocol, therefore, you may try this method.

    net install fillmissing, from(https://fintechprofessor.com) replace

James Kirkbride

March 27, 2020at 2:19 pm

Very useful command, thanks. Would be helpful to have a help file installed along with the package itself for future reference. I can also confirm this works with the “bysort” command (in my Stata 15), which is exactly what I needed it to be able to do.

Aco

March 28, 2020at 10:32 pm

Dear, I have a question when using this fillmissing code in stata.
Example:

by Product pair_id, sort: fillmissing tarrifs, with (mean)

This command uses the average of the group, but I would like to use the average of the previous variable and the posterior variable to replace the missing, keeping the limits within each group.
Example of the database with missing

Year    Rates    Country
2000	5	 USA
2001	. 	USA
2002	. 	USA
2003	 4 	USA
2004	. 	USA
2005	 6 	USA
2000	. 	BRA
2001	 4 	BRA
2002	. 	BRA
2003	. 	BRA
2004	 8 	BRA
2005	. 	BRA

Example that I would like to arrive using the fillmissing code


2000	5	
2001	4.5	
2002	4.25	
2003	4	
2004	5	
2005	6	
2000	4	BRA
2001	4	BRA
2002	6	BRA
2003	7	BRA
2004	8	BRA
2005	8	BRA

I hope you can help me.
My best regards.

Attaullah Shah

March 30, 2020at 11:28 pm

I could not understand the requirements. The data you have posted and the fillmissing command that you have used do not match. Can you please clarify it a bit further on what to use for filling the missing values?

Aco

April 5, 2020at 8:04 pm

Dear,
I am sorry for the lack of clarity in the explanation.

The original database consists of a panel, with more than 100 importing and 100 exporting countries, organized in pairs. The dependent variable is import flow and the dependent variable is tariff.

The following database is similar to the original

year    rates	country
2000		BRA USA
2001	4	BRA USA
2002		BRA USA
2003		BRA USA
2004	8	BRA USA
2005		BRA USA
2000	5	USA BRA
2001		USA BRA
2002		USA BRA
2003	4	USA BRA
2004		USA BRA
2005	6	USA BRA

Command:

by country, sort: fillmissing rates, with (mean)

Result with the above command

year     rates	 country
2000	6	BRA USA
2001	4	BRA USA
2002	6	BRA USA
2003	6	BRA USA
2004	8	BRA USA
2005	6	BRA USA
2000	5	USA BRA
2001	5	USA BRA
2002	5	USA BRA
2003	4	USA BRA
2004	5	USA BRA
2005	6	USA BRA

This command uses the average of the group, but I would like to use the average of the previous variable and the posterior variable to replace the missing, keeping the limits within each group (BRA USA; USA BRA; and so on).

My expected result would be is to arrive to a base of data similar to the base below:

year	rates	country
2000	4	BRA USA
2001	4	BRA USA
2002	6	BRA USA
2003	7	BRA USA
2004	8	BRA USA
2005	8	BRA USA
2000	5	USA BRA
2001	4.5	USA BRA
2002	4.25	USA BRA
2003	4	USA BRA
2004	5	USA BRA
2005	6	USA BRA

The asdoc and fillmissing commands are very useful and help a lot in the job.
Excuse me for the inconvenience.
My best regards.

Attaullah Shah

April 6, 2020at 1:23 pm

I think it is an interesting problem and will need recursive loops. I can do it in the Mata language. Are you willing to make a contribution to support addition of this feature to the fillmissing program ($50 or so)?
A donation can be sent to the PayPal email address attaullah.shah@imsciences.edu.pk

Leave a Reply