Author Archives: Attaullah Shah


How Fama and French June to July Portfolios are Constructed?

Category: Asset Pricing Research, Blog

The description of portfolio construction given in various Fama and French papers is often confusing for researchers, especially those who are new to asset pricing models. The typical language used in Fama and French papers reads like this:

The size breakpoint for year t is the median NYSE market equity at the end of June of year t. BE/ME for June of year t is the book equity for the last fiscal year end in t-1 divided by ME for December of t-1.

This blog post aims at explaining the above paragraph with some examples.

Break-points for Portfolio Construction

The size-breakpoints

As mentioned in the above paragraph, the size breakpoints are based on the market capitalization of firms at the end of June of the current year. This means that, before splitting firms into two size groups:

  1. First, we need to reduce the data to keep the market capitalization of each firm at the end of June.
  2. Also, we need to further reduce the data to keep only firms listed at the NYSE stock exchange.
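
The two data-reduction steps above can be sketched outside Stata as well. The following is a minimal Python illustration, not the code sold on this site: the record layout, the field names (firm, exchange, year, month, me), and the numbers are hypothetical, and treating a firm exactly at the median as "Small" is an assumption.

```python
from statistics import median

# Hypothetical firm-month records; field names and values are illustrative only.
records = [
    {"firm": "A", "exchange": "NYSE",   "year": 2017, "month": 6, "me": 120.0},
    {"firm": "B", "exchange": "NYSE",   "year": 2017, "month": 6, "me": 900.0},
    {"firm": "C", "exchange": "NYSE",   "year": 2017, "month": 6, "me": 300.0},
    {"firm": "D", "exchange": "NASDAQ", "year": 2017, "month": 6, "me": 50.0},
    {"firm": "A", "exchange": "NYSE",   "year": 2017, "month": 5, "me": 110.0},
]


def size_breakpoint(rows, year):
    """Median June market equity of NYSE firms in the given year."""
    june_nyse = [
        r["me"]
        for r in rows
        if r["year"] == year and r["month"] == 6 and r["exchange"] == "NYSE"
    ]
    return median(june_nyse)


def size_group(me, breakpoint):
    """Assign a firm to 'Small' or 'Big' (ties go to 'Small' by assumption)."""
    return "Small" if me <= breakpoint else "Big"


bp = size_breakpoint(records, 2017)

# The NYSE breakpoint is applied to every firm with June data, whatever its exchange.
groups = {
    r["firm"]: size_group(r["me"], bp)
    for r in records
    if r["year"] == 2017 and r["month"] == 6
}
print(bp, groups)
```

Note that only NYSE firms determine the breakpoint, while firms from all exchanges are then sorted against it.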

The BE/ME-breakpoints

The BE/ME variable uses lagged values of the book equity and market equity. However, the lagged values of the two variables are obtained in different ways. The book equity is the last fiscal year’s available book equity. Since we assume that the financial year ends in June, the last June’s book equity is called the book equity for the last fiscal year end in t-1.

Consider the following monthly data where we have observations for a single firm over three years period. The variable year represents the calendar year that starts in January and ends in December. The variable fyear represents the fiscal year, that starts in July and ends in June.

From these observations, we need the ME in December of year t-1. In our dataset, the first December appears in the calendar year 2016, and the ME on that date is 958. For the calendar year 2016, the corresponding BE value for the fiscal year is 467; that is the book equity for the last fiscal year end in t-1.
We can therefore calculate the BE/ME ratio for June 2017 as 467 / 958. This value will be used for finding the breakpoints and making the three BE/ME portfolios, which are then held from July of year t to June of year t+1, as shown in the following snapshot.
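
As a quick check of the arithmetic, the example above can be reproduced in a few lines of Python. The two input values come from the text; the variable names and the helper holding_months() are hypothetical and only illustrate the July-to-June holding window.

```python
# Values taken from the single-firm example in the text.
be_last_fiscal_year_end = 467   # book equity for the last fiscal year end in t-1
me_december = 958               # market equity in December of t-1

be_me = be_last_fiscal_year_end / me_december
print(round(be_me, 4))


def holding_months(t):
    """(year, month) pairs from July of year t to June of year t+1."""
    return [(t, m) for m in range(7, 13)] + [(t + 1, m) for m in range(1, 7)]


months = holding_months(2017)
print(months[0], months[-1], len(months))
```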

The Yearly Portfolios

The portfolios for July of year t to June of t+1 include all NYSE, AMEX, and NASDAQ stocks for which we have market equity data for December of t-1 and June of t, and (positive) book equity data for t-1.


Once the breakpoints for size and

How to Do it Programmatically?

There are more than a dozen steps to fully implement the Fama and French model. Entry-level researchers might try to do all these steps in MS Excel. However, doing these steps in Excel is not only cumbersome but also prone to errors. Further, because the process is manual, it cannot be easily replicated.

We have developed codes in Stata to construct the three factors of the Fama and French model as well as the 25 RHS (right-hand side) portfolios. Our codes generate factors that have over 97% correlation with the Fama and French factors.

Why buy codes for Fama and French Model?

There are several reasons to consider using professionally written code. These include, but are not limited to, the accuracy of the code, quicker learning, replicability of the code in the same or other projects, and validation of your own code if you have written it yourself.



fillmissing: Fill Missing Values in Stata

Category: Blog

This post presents a quick tutorial on how to fill missing values in variables in Stata. This tutorial uses the fillmissing program, which can be downloaded by typing the following command in the Stata command window:

net install fillmissing, from( replace


Important Note: This post does not imply that filling missing values is justified by theory. Users should make their own decisions and follow appropriate theory while filling missing values.


After the installation of the fillmissing program, we can use it to fill missing values in numeric as well as string variables. Also, this program allows the bysort prefix to fill missing values by groups. We shall see several examples of using bysort prefix to perform by-groups calculations. But let us first quickly go through the different options of the program.


Program Options

The fillmissing program offers the following options to fill missing values

  1. with(any)
  2. with(previous)
  3. with(next)
  4. with(first)
  5. with(last)
  6. with(mean)
  7. with(max)
  8. with(min)
  9. with(median)

Let us quickly go through these options. Please note that options starting from serial number 6 are applicable only in the case of numerical variables.


1. with(any)

Option with() is used to specify the source from which the missing values will be filled. Option with(any) is the default, so if no option is specified, the fillmissing program invokes it automatically. This option is best suited to filling missing values of a constant variable, i.e. a variable whose values are all the same but, for some reason, some of them are missing. Option with(any) will try to fill the missing values from any available non-missing value of the given variable.

Example 1: Fill missing values with(any)

Let us first create a sample dataset of one variable with 10 observations. You can copy and paste the following code into the Stata Do-file Editor to generate the dataset:

clear all
set obs 10
gen symbol = "AABS"
replace symbol = "" in 5
replace symbol = "" in 8

The above dataset has missing values in rows 5 and 8. To fill the missing values from any other available non-missing values, let us use the with(any) option.

fillmissing symbol, with(any)

Since with(any) is the default option of the program, we could also write the above code as

fillmissing symbol
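
To make the logic of with(any) concrete, here is a minimal Python sketch of the idea described above. It is not the fillmissing source code: the function name fill_with_any is hypothetical, and picking the first non-missing value is an assumption (for a constant variable, any non-missing value gives the same result).

```python
def fill_with_any(values, missing=""):
    """Replace missing entries with a non-missing value found anywhere in the list.

    Assumption: the first non-missing value is used as the fill value.
    """
    candidates = [v for v in values if v != missing]
    if not candidates:
        # Nothing to fill from, so return the data unchanged.
        return list(values)
    fill = candidates[0]
    return [fill if v == missing else v for v in values]


# Mirror Example 1: ten observations, rows 5 and 8 (1-based) are missing.
symbol = ["AABS"] * 10
symbol[4] = ""
symbol[7] = ""

filled = fill_with_any(symbol)
print(filled)
```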


2. with(previous)

Option with(previous) is used to fill the current missing value with the preceding or previous value of the same variable. Please note that if the previous value is also missing, the current value will remain missing. Further, this option does not sort the data, so whatever the current sort of the data is, fillmissing will use that sort and identify the current and previous observation.

Example 2: Fill missing values with(previous)

Let’s create a dummy dataset first.

clear all
set obs 10
gen symbol = "AABS" 
replace symbol = "AKBL" in 1
replace symbol = "" in 2 

The dataset looks like this

 | symbol |
 |   AKBL |
 |        |
 |   AABS |
 |   AABS |
 |   AABS |
 |   AABS |
 |   AABS |
 |   AABS |
 |   AABS |
 |   AABS |

To fill the missing value in observation number 2 with AKBL, i.e. from previous observation, we would type:

fillmissing symbol, with(previous)
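
The behavior of with(previous) described above, including the rule that a value stays missing when its predecessor is also missing, can be sketched in Python as follows. This reads the description literally and is not the fillmissing source code; the function name fill_with_previous is hypothetical.

```python
def fill_with_previous(values, missing=""):
    """Fill each missing entry from the immediately preceding original value.

    If the preceding value is itself missing, the entry stays missing.
    The data are used in their current order; nothing is sorted.
    """
    out = list(values)
    for i in range(1, len(values)):
        if values[i] == missing and values[i - 1] != missing:
            out[i] = values[i - 1]
    return out


# Mirror Example 2: AKBL in row 1, a missing value in row 2, AABS elsewhere.
symbol = ["AKBL", ""] + ["AABS"] * 8
filled = fill_with_previous(symbol)
print(filled[:3])

# Two consecutive missing values: only the first one can be filled.
print(fill_with_previous(["A", "", "", "B"]))
```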


What’s Next

In the next blog post, I shall talk about other options of the fillmissing program. Specifically, I shall discuss the use of by and bys with fillmissing program. Therefore, you may visit the blog section of this site or subscribe to updates from this site.



Export output of Table command from Stata to Word using asdoc


Exporting tables from the table command was the most challenging part of asdoc programming. Nevertheless, asdoc does a pretty good job of exporting tables from the table command. asdoc accepts almost all options of the table command, except cellwidth(#), stubwidth(#), and csepwidth(#).


7.1 One-way table

Example 54 : One-way table; frequencies shown by default

sysuse auto, clear
asdoc table rep78, title(Table of Freq. for Repairs) replace


Example 55 : One-way table; show count of non-missing observations for mpg

asdoc table rep78, contents(n mpg) replace

Example 56 : One-way table; multiple statistics on mpg requested

asdoc table rep78, c(n mpg mean mpg sd mpg median mpg) replace


Example 57 : Add formatting – 2 decimals

asdoc table rep78, c(n mpg mean mpg sd mpg median mpg) dec(2) replace


7.2 Two-way table

Example 58 : Two-way table; frequencies shown by default

asdoc table rep78 foreign, replace


Example 59 : Two-way table; show means of mpg for each cell

asdoc table rep78 foreign, c(mean mpg) replace


Example 60 : Add formatting

asdoc table rep78 foreign, c(mean mpg) dec(2) center replace


Example 61 : Add row and column totals

asdoc table rep78 foreign, c(mean mpg) dec(2) center row col replace


7.3 Three-way table

Example 62 : Three-way table

webuse byssin, clear
asdoc table workplace smokes race [fw=pop], c(mean prob) replace

7.4 Four-way table

Example 65 : Four-way table with by()

webuse byssin1, clear
asdoc table workplace smokes race [fw=pop], by(sex) c(mean prob) replace


Example 66 : Four-way table with supercolumn, row, and column totals

asdoc table workplace smokes race [fw=pop], by(sex) c(mean prob) sc col row replace


Customized tables using option row() of asdoc – Stata


This is a rather quick example of how to use option row() of asdoc for creating highly customized tables. We are interested in a table that is given below.

* Load example dataset
sysuse auto,clear

* Write the header row of the table with table title
asdoc, row(Dependent variable:domestic or foreign, Domestic mean/frequency, Domestic SD, Foreign mean/frequency, Foreign SD, t-test) title(Summary statistics) save(myfile) replace

* Add the second row : \i, adds an empty cell
asdoc, row( Model independent variables, \i, \i, \i, \i, \i) append

* Use a loop over each variable, i.e. price, mpg, ...
foreach var of varlist price mpg rep78 headroom trunk weight length turn {
  * First summarize each variable for a given sample, that is, if foreign is zero
  qui sum `var' if foreign==0

  * Obtain the mean divided by frequency
  local mf=`r(mean)'/`r(N)'

  * Store the mf and standard deviation in the accum macro
  asdoc, accum(`mf', `r(sd)')

  * Now repeat the same for the second sample, i.e. when foreign is 1
  qui sum `var' if foreign==1
  local mf=`r(mean)'/`r(N)'
  asdoc, accum(`mf', `r(sd)')

  * Conduct a two-sample t-test using foreign as the grouping variable
  ttest `var', by(foreign)

  * Obtain the t-statistic
  local t : di %9.3f = abs(`r(t)')

  * Create significance stars
  if `r(p)'<=0.01 {
    local star "***"
  }
  else if `r(p)'<=0.05 {
    local star "**"
  }
  else if `r(p)'<=0.1 {
    local star "*"
  }
  else {
    local star " "
  }
  local tstar `t'`star'

  * Add the t-value and stars to the accum macro
  asdoc, accum(`tstar')

  * Finally write this complete row where we first write the variable name
  * and then all accumulated values that are present in the $accum macro.
  asdoc, row(`var', $accum)
}


tabstat with asdoc in Stata


asdoc makes some elegant tables when used with the tabstat command. There are several custom-made routines in asdoc that create clean tables from the tabstat command. asdoc fully supports the command structure and options of tabstat. In addition, asdoc allows one extra statistic, the t-statistic, alongside the statistics allowed in tabstat. For reporting purposes, asdoc categorizes tabstat commands into two groups:

(1) stats without a grouping variable

(2) stats over a grouping variable.


Tabstat Without-by

If there are fewer statistics than variables, the table is transposed, i.e. statistics are shown in columns, while variables are shown in rows.


Example 49 : One variable, many stats, including t-statistics

sysuse auto, clear  
asdoc tabstat price , stat(min max mean sd median p1 p99 tstat) replace 


Example 50 : Many variables, one statistic

asdoc tabstat price mpg rep78 headroom trunk weight length foreign , stat( mean) replace


Example 51 : Many variables, many statistics

asdoc tabstat price mpg rep78 headroom trunk weight length foreign , /// 
stat( max mean sd median p1 p99 tstat) replace


Tabstat with-by


Example 52 :

bysort foreign: asdoc tabstat price mpg rep78 headroom trunk weight length, stat(mean) replace


asdoc tabstat price mpg rep78 headroom trunk weight length, ///
stat(mean) by(foreign) replace


Example 53 : By with many variables and many statistics

bysort foreign: asdoc tabstat price mpg rep78 headroom trunk weight length, ///
stat(mean sd p1 p99 tstat) replace



Quick setup of Python with Stata 16

Category: Blog, Stata Programs

With the announcement of Stata 16, Python commands can be executed directly from the Stata command prompt, do files or ado programs. That would definitely expand the possibilities of doing extraordinary things without leaving the Stata environment. However, this integration exposes Stata to all the problems of Python installations and its packages.

First of all, Python does not come as part of the Stata installation. Stata depends on the already installed version of Python. That would definitely make a Stata-Python code less portable. One solution might be the portable version of Python. Only time can tell what will work best in such situations.

In this short post, I am going to outline a few basic steps to get started with Python from Stata. These steps are mentioned below:

1. What Version of Python to Install

A number of options are available to install Python. Over the past 12 months, I found that the installation of Python using Anaconda is the least problematic one. And with Stata 16, this again came out true. The stand-alone version of Python did not work with Stata. Each time I tried to type python from the Stata command prompt, the error message generated by Stata was:

initialized          no

What I did was to uninstall the other version of Python and kept only the Anaconda installation.

2. Set the Installation path

Stata can search for any available Python installation, including the installation through Anaconda. To search and associate python with Stata, I typed the following from the Stata command prompt:

python search 
set python_exec  D:\Anaconda\python.exe, permanently

The first line of code finds the directory path and the Python executable file. The second line of code sets which Python version to use. Option permanently would save this path for future use as well. And that’s all.

3. Using Python

Once the above steps complete without an error, we are ready to use Python. In the Stata command window, we can enter the Python environment by typing python, and the familiar three greater-than symbols (>>>) will appear on the screen:

 . python
 --------- python (type end to exit) ------- 
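
Once inside the Python environment, ordinary Python statements can be typed line by line. As a first sanity check, the following minimal sketch reports which interpreter is running. It uses only the Python standard library, so it assumes nothing about Stata-specific modules; the executable path it reports may or may not match the path set in step 2, depending on how Python is embedded.

```python
import sys

# Report the interpreter that is running this code.
version = "{}.{}.{}".format(*sys.version_info[:3])
executable = sys.executable

print("Python version:", version)
print("Executable reported by Python:", executable)
```

Typing end afterwards returns control to the Stata prompt, as the banner above indicates.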


Fama and MacBeth regression with Shanken correction using asreg


If you are not yet familiar with asreg, here is a quick start. Implementing the Fama and MacBeth regression using asreg is super-fast and easy. Here are a few posts related to this implementation.

FMB regressions with asreg

FMB regression – what, how and where

FMB regressions with 25-portfolios – An example

The Shanken Correction

In applying standard OLS formulas to a cross-sectional regression, we assume that the right-hand variables β are fixed. The β in the cross-sectional regressions are not fixed, of course, but are estimated in the time-series regression.  Therefore, there might be a sampling error in the estimates of β.  Shanken (1992) suggested a correction to the standard errors of the estimates.
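
To give a feel for the size of the adjustment, here is a minimal single-factor Python sketch of one common textbook form of the Shanken correction. It is an illustration under stated assumptions, not a description of what asreg computes internally: the function name, the argument names, and the numbers are hypothetical. The idea is to inflate the sampling variance of the risk premium by (1 + c), where c is the squared premium divided by the factor variance, while leaving the factor-variance term unscaled.

```python
def shanken_adjusted_variance(var_fm, lam, factor_var, n_periods):
    """Single-factor sketch of a Shanken-type variance adjustment.

    var_fm     : Fama-MacBeth variance of the estimated risk premium
    lam        : estimated risk premium (lambda)
    factor_var : variance of the factor
    n_periods  : number of time periods T
    """
    c = lam ** 2 / factor_var        # correction term, lambda^2 / sigma_f^2
    base = factor_var / n_periods    # factor-variance part, not scaled by (1 + c)
    return (1 + c) * (var_fm - base) + base


# Illustrative numbers only.
adjusted = shanken_adjusted_variance(var_fm=0.04, lam=0.5, factor_var=1.0, n_periods=100)
print(adjusted)
```

With several factors, the scalar c becomes a quadratic form in the vector of premia and the inverse factor covariance matrix, which is why a covariance matrix of the factors is needed as an input.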

How to do it?

The focus in this post is on the Fama and MacBeth implementation with the Shanken (1992) correction. As with many other tasks in asreg, the Shanken correction is fairly easy. The following steps are needed:

1. Find the covariance matrix of the right-hand-side variables and write it to a matrix. Suppose the variables in our dataset include rm_rf, smb, and hml; then, to find the covariance and write it to a matrix, we would do the following:

cor rm_rf smb hml, cov
matrix S = r(C)  

2. Find the first stage lambda of the RHS variables.

bys portfolios: asreg excess_returns rm_rf smb hml
* Remove unnecessary variables
drop _Nobs _R2 _adjR2 _b_cons

3. Fama and MacBeth regression: In this last stage, we would use the fmb and shanken option. The shanken option requires the covariance matrix that we created in step 1 above

asreg excess_returns _b_rm_rf _b_smb _b_hml , fmb shanken(S)



The asreg program is freeware and can be downloaded from SSC. The Shanken correction is available for $100 per model, plus $50 for raw data processing (in case the data are not in Stata format and the variables are not already constructed). For further details, please contact us at:




  1. Fama, E. F., & MacBeth, J. D. (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy, 81(3), 607-636.
  2. Shanken, J. (1992). On the estimation of beta-pricing models. The Review of Financial Studies, 5(1), 1-33.


Fama – MacBeth (1973) procedure: What, how and where | asreg in Stata


The Fama and MacBeth (1973) procedure can be used in testing asset pricing models and in other areas. In this post, my primary focus is on its use in testing asset pricing models.

FMB in asset pricing models

It is actually a three-step process. We would divide the time period into three parts.

1. The first step is to find the assets/portfolios betas in the first period. Some researchers would use these betas to classify assets into portfolios.

2. The second step is to find betas of these portfolios in the second period.

3. The third step is to find the portfolio returns in the third period and test whether the betas from the second period can explain these returns. This step involves:
(i) cross-sectional regressions of the portfolio returns on the portfolio betas in each period.
(ii) averaging coefficients from the cross-sectional regressions across time. The standard errors are adjusted for cross-sectional dependence.
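
Steps 3(i) and 3(ii) can be sketched in a few lines of Python for the single-beta case. This is a minimal illustration rather than the asreg implementation: the function names and the data are hypothetical, the betas are held fixed across periods for simplicity, and the standard error is computed from the time series of the period-by-period slopes.

```python
from math import sqrt
from statistics import mean, stdev


def ols_intercept_slope(x, y):
    """Simple OLS of y on a single regressor x; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fama_macbeth(betas, returns_by_period):
    """Step 3(i): one cross-sectional regression per period.
    Step 3(ii): average the slopes; the standard error comes from
    the variation of the slopes over time."""
    slopes = [ols_intercept_slope(betas, r)[1] for r in returns_by_period]
    lam = mean(slopes)
    se = stdev(slopes) / sqrt(len(slopes))
    return lam, se, lam / se


# Hypothetical data: 4 portfolios observed over 3 periods.
betas = [0.5, 1.0, 1.5, 2.0]
returns_by_period = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 1.0, 1.5, 2.0],
    [2.0, 3.5, 5.0, 6.5],
]

lam, se, t_stat = fama_macbeth(betas, returns_by_period)
print(lam, se, t_stat)
```

The averaged slope is the estimated risk premium, and the t-statistic tests whether it differs from zero.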

What does asreg do in the above process

asreg with fmb option performs step 3(i) and 3(ii). 

asreg can also help in step (1) where individual betas need to be calculated for each stock. The command might look like

bys company: asreg returns market_returns if period == 1

This means that, for typical asset pricing tests, the researcher has to do steps (1) and (2) and arrange the data in a panel format, listing portfolio returns and betas as variables in columns, and then use asreg with the fmb option, e.g.

keep if period == 3
xtset company month
asreg returns betas, fmb

Where else can FMB regression be used?

The Fama and MacBeth (1973) procedure (i.e., steps 3(i) and 3(ii)) is also used in areas other than testing asset pricing models. You can see one example in my paper, Table 3, column 8, page 264:

Shah, Attaullah & Shah, Hamid Ali & Smith, Jason M. & Labianca, Giuseppe (Joe), 2017. “Judicial efficiency and capital structure: An international study,” Journal of Corporate Finance, Elsevier, vol. 44(C), pages 255-274.


Export correlation table to Word with stars and significance level using asdoc


The updated version of asdoc can now create a table of correlation with significance levels starred at different levels. The new version can be installed by typing the following line in Stata.

Installation of the new version

net install asdoc, from( replace

An Example

sysuse auto, clear
asdoc pwcorr price mpg rep78 headroom trunk weight length turn , star(all) replace nonum


Just like with any other Stata command, we write asdoc as a prefix to the Stata command. In this case, the Stata command is pwcorr, which is followed by the variable names. After the comma, we added the options nonum, star(all), and replace. These are explained below:

star(all) = This option is used to report stars that signify significance at different levels. These are:

  1. *** to show significance at 1% or below
  2. ** to show significance at 5% or below
  3. * to show significance at 10% or below

nonum = Without this option, asdoc will report numbers as column headers

replace = This option replaces any existing file
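
The star thresholds listed above amount to a simple mapping from a p-value to a label. A minimal Python sketch of that mapping follows; the function name significance_stars is hypothetical, and the sketch illustrates the rule rather than reproducing asdoc's own code.

```python
def significance_stars(p_value):
    """Map a p-value to stars using the 1%, 5%, and 10% thresholds."""
    if p_value <= 0.01:
        return "***"
    if p_value <= 0.05:
        return "**"
    if p_value <= 0.10:
        return "*"
    return ""


# Illustrative p-values only.
labels = [significance_stars(p) for p in (0.004, 0.03, 0.08, 0.25)]
print(labels)
```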

You might be interested in this blog entry, where I show several useful options of asdoc that can be used with correlation tables.


asdoc abbreviates / truncates my variable names and labels | Word to Stata


Stephen Okiya has asked the following question

I notice that the variable names are truncated in spite of using the option abb(100). Do you know why this is the case?


asdoc uses the abbrev() function of Mata. For some reason, the abbrev() function cuts the following sentence short at the same point, no matter which value we set for the abbreviation.

loc vari " Child's age when she/he was first fed something other than breast milk"

. dis abbrev("`vari'", 32)
 Child's age when she/he was firs

 . dis abbrev("`vari'", 100)
 Child's age when she/he was firs

However, if we set the second argument of the abbrev() function to missing, then the full sentence is shown:

. dis abbrev("`vari'", .)

Child's age when she/he was first fed something other than breast milk

Therefore, if we prefer not to abbreviate any name or label, we can simply provide a missing value for the abb() option of asdoc. The following will show all the text:

asdoc sum Q85, label abb(.)