Difference-in-Differences Estimator in Stata

Attaullah Shah | Last updated 2025-02-24

Difference-in-Differences (DiD) estimators

How do researchers measure the real impact of a major event, such as a policy change or a natural disaster, on a specific group? Enter difference-in-differences (DID) regression, a powerful tool for isolating causal effects by comparing changes between two groups over time.

Imagine two groups: one affected by an event (treatment group) and another that remains unchanged (control group). If no event occurred, the gap between these groups would stay consistent. DID checks whether that gap shifts after the event—if it does, that suggests a causal impact.

Think of it as a before-and-after snapshot with a twist. Instead of just analyzing the treatment group’s outcome, DID also tracks changes in the control group to separate true effects from pre-existing differences.
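The arithmetic behind this is a simple double difference. A minimal Python sketch with invented group means (all numbers hypothetical, purely for illustration):

```python
# Hypothetical average outcomes for each group and period
treat_pre, treat_post = 4.0, 5.2    # treatment group, before/after the event
ctrl_pre, ctrl_post = 3.5, 3.9      # control group, before/after the event

# Change within each group
treat_change = treat_post - treat_pre   # includes the common time trend
ctrl_change = ctrl_post - ctrl_pre      # the common time trend alone

# DID estimate: the treated group's change minus the trend the control group reveals
did = treat_change - ctrl_change
print(round(did, 2))   # 0.8
```

Subtracting the control group's change strips out whatever would have happened anyway, leaving only the shift attributable to the event.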

Below, we present an example from the Stata Manual. It demonstrates how to use the commands didregress and xtdidregress.

Fitting a DID model

Let’s say a health provider wants to know if a new hospital admission process makes patients more satisfied. They have patient satisfaction data for each month from January to July. In April, some hospitals started using this new process. Out of 46 hospitals, 18 tried the new process.

To see whether the new process worked, the health provider will use DID regression, comparing patient satisfaction scores in hospitals that adopted the new process with those that did not. Patient satisfaction (satis) is measured on a scale of 0 to 10 (0 = very unsatisfied, 10 = very satisfied). The variable procedure equals 1 if a patient was admitted under the new process after March, and 0 otherwise. To estimate the average treatment effect on the treated (ATET) on the outcome satis, we type

didregress (satis) (procedure), group(hospital) time(month)
When you write the command, the first parentheses are for what you’re measuring – in this case, patient satisfaction (satis). If you had other things you wanted to consider in your model, you’d put them here too, but here it’s just satis.

The second parentheses are for the “procedure” variable – this tells the command which hospitals used the new process.

The words group() and time() tell the command to include “fixed effects.” Think of these as extra checks in your model for hospital groups and time periods. group() is also important because it tells the computer to group the data by hospital when it calculates the standard errors, making sure the results are reliable even if data within the same hospital are related. This is called clustering at the hospital level.
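Conceptually, these group and time fixed effects make the model an ordinary regression of the outcome on unit dummies, period dummies, and the treatment indicator. A self-contained Python sketch with an invented four-unit, two-period panel (no noise, so OLS recovers the built-in effect of 2.0 exactly; this is a toy illustration, not the hospdd data):

```python
# Tiny synthetic panel: 4 units (units 3 and 4 treated), 2 periods.
# y = unit effect + period effect + 2.0 * D, with no noise.
unit_fx = {1: 1.0, 2: 1.5, 3: 2.0, 4: 2.5}   # invented unit fixed effects
period_fx = {0: 0.0, 1: 0.3}                 # invented period fixed effects
treated = {1: 0, 2: 0, 3: 1, 4: 1}

rows, y = [], []
for u in unit_fx:
    for t in (0, 1):
        d = treated[u] * t   # treatment indicator: treated units, post period only
        # columns: intercept, u2, u3, u4, post, D
        rows.append([1.0, float(u == 2), float(u == 3), float(u == 4),
                     float(t), float(d)])
        y.append(unit_fx[u] + period_fx[t] + 2.0 * d)

# OLS via the normal equations X'X b = X'y, solved with Gaussian elimination
k = len(rows[0])
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
for i in range(k):                              # forward elimination with pivoting
    p = max(range(i, k), key=lambda r: abs(xtx[r][i]))
    xtx[i], xtx[p] = xtx[p], xtx[i]
    xty[i], xty[p] = xty[p], xty[i]
    for r in range(i + 1, k):
        f = xtx[r][i] / xtx[i][i]
        xtx[r] = [a - f * b for a, b in zip(xtx[r], xtx[i])]
        xty[r] -= f * xty[i]
beta = [0.0] * k
for i in range(k - 1, -1, -1):                  # back substitution
    beta[i] = (xty[i] - sum(xtx[i][j] * beta[j]
                            for j in range(i + 1, k))) / xtx[i][i]

print(round(beta[-1], 6))   # coefficient on D: 2.0
```

With real data the dummies absorb fixed differences across hospitals and common shocks across months, which is exactly what group() and time() do inside didregress.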

 * Load the hospdd data
use https://www.stata-press.com/data/r18/hospdd

 * Estimate the DID model
didregress (satis) (procedure), group(hospital) time(month)
Treatment and time information

Time variable: month
Control:      procedure = 0
Treatment:    procedure = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
    hospital |        28         18
-------------+---------------------
Time         |
     Minimum |         1          4
     Maximum |         1          4
-----------------------------------

Difference-in-differences regression                   Number of obs = 7,368
Data type: Repeated cross-sectional

                             (Std. err. adjusted for 46 clusters in hospital)
------------------------------------------------------------------------------
             |               Robust
       satis | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
procedure    |
 (New vs Old)|   .8479879   .0321121    26.41   0.000     .7833108     .912665
------------------------------------------------------------------------------
Note: ATET estimate adjusted for group effects and time effects.

The first table in the results tells you about the groups and when the new procedure started. The “Group” part shows you how many hospitals are in each group: 28 used the old procedure (control group) and 18 used the new one (treated group).

The next part of the table shows you when the control group hospitals are first seen in the data, and when the treated hospitals started using the new procedure. In this example, all hospitals that used the new procedure started in April (time period 4). If some hospitals had started later, these times would be different.

The “ATET” number is 0.85. This means that, on average, patient satisfaction increased by almost 1 point because of the new procedure. Another way to think about it is: if those 18 hospitals hadn’t used the new procedure, their patient satisfaction would likely be almost 1 point lower.

Now, we need to check if the patient satisfaction scores were changing in a similar way for both groups before the new procedure started. This is called checking the “parallel trends” assumption, which is important for DID to work correctly. We can visually check this by plotting the average satisfaction scores over time for both groups.

Graphical diagnostics for parallel trends

[Figure: Graphical diagnostics for parallel trends. Two panels plot the average patient satisfaction score (3.4 to 4.4) by month (1 to 7) for the control and treatment groups: observed means and a linear-trends model.]

The graph indicates that the parallel-trends assumption is plausible: prior to the policy implementation, treated and control hospitals followed parallel paths.

We can also use a statistical test to check if the satisfaction trends were parallel before the new procedure. This test adds extra variables to our model to represent how satisfaction changed over time before and after the new procedure for both groups.

This test looks at the difference in trends before the procedure started. If this difference is zero, it means the trends were parallel. If not, our ATET result might be less reliable.

We can run this test using the command estat ptrends. The results are:


. estat ptrends
Parallel-trends test (pretreatment time period)
H0: Linear trends are parallel
F(1, 45) = 0.55
Prob > F = 0.4615

With a p-value of 0.46, we fail to reject the null hypothesis that the linear pretreatment trends are parallel, which supports the DID design. You can find more examples and uses in the Stata Manual.
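The logic behind this test can be sketched numerically: fit a linear trend to each group's pre-treatment means and compare the slopes. A minimal Python illustration with invented numbers (not the hospdd data; estat ptrends performs the formal F-test version of this comparison):

```python
# Invented pre-treatment monthly means (months 1-3) for each group
months = [1, 2, 3]
ctrl_means = [3.50, 3.60, 3.70]    # control group: slope 0.10 per month
treat_means = [3.80, 3.91, 4.00]   # treatment group: slope 0.10 per month

def slope(x, y):
    """OLS slope of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

# Parallel trends means this slope difference is (close to) zero
diff = slope(months, treat_means) - slope(months, ctrl_means)
print(round(diff, 3))   # 0.0 here: the invented trends are parallel
```

If the pre-treatment slope difference were significantly different from zero, the control group's trend would be a poor stand-in for the treated group's counterfactual, and the ATET estimate would be suspect.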

The drdid Package

The drdid package implements the locally efficient, doubly robust difference-in-differences (DiD) estimators for the average treatment effect as proposed by Sant’Anna and Zhao (2020). It also includes inverse probability weighting (IPW) and outcome regression estimators for the DiD average treatment effect on the treated (ATT).

These doubly robust estimators combine IPW and outcome regression to yield more reliable statistical properties. In particular, the estimator remains valid as long as either the treatment propensity model or the outcome regression model is correctly specified.
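Double robustness can be illustrated with a small simulation. The sketch below (pure Python, all data invented) builds a panel-style dataset with a true ATT of 3 and feeds a simplified DR estimator either a deliberately wrong propensity model or a deliberately wrong outcome model; both runs recover the ATT. This is a stylized illustration of the idea, not the drdid implementation:

```python
import random

random.seed(42)
N, TRUE_ATT = 200_000, 3.0

# Simulated panel-style data: x a covariate, d the treatment indicator,
# dy the pre-to-post change in the outcome
x = [random.random() for _ in range(N)]
ps_true = [0.2 + 0.5 * xi for xi in x]               # true propensity score
d = [1 if random.random() < p else 0 for p in ps_true]
dy = [1.0 + 2.0 * xi + TRUE_ATT * di + random.gauss(0, 1)
      for xi, di in zip(x, d)]

def dr_att(ps, mu0):
    """Simplified doubly robust ATT: reweighted comparison of residuals
    from the outcome model. ps: propensity scores; mu0: modeled untreated
    change in the outcome."""
    res = [yi - mi for yi, mi in zip(dy, mu0)]
    w0 = [pi * (1 - di) / (1 - pi) for pi, di in zip(ps, d)]   # IPW weights
    treated = sum(di * r for di, r in zip(d, res)) / sum(d)
    control = sum(w * r for w, r in zip(w0, res)) / sum(w0)
    return treated - control

# Correct outcome model (1 + 2x), deliberately wrong propensity model (0.5):
att_a = dr_att([0.5] * N, [1.0 + 2.0 * xi for xi in x])
# Correct propensity model, deliberately wrong outcome model (zero):
att_b = dr_att(ps_true, [0.0] * N)

print(round(att_a, 2), round(att_b, 2))   # both close to 3.0
```

Either correct model is enough to remove the selection on x, which is the practical payoff of the doubly robust construction.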

Note that the panel-data estimators assume time-invariant covariates. Even if some covariates vary over time, only their pre-treatment values are used when estimating the outcome or propensity-score models.

In contrast, when using repeated cross-section data, you can include covariates measured in both periods. The underlying assumption, however, is that their distribution does not change over time, a condition Sant’Anna and Zhao (2020) refer to as the stationarity assumption.

While it is possible to add time-varying covariates to panel data estimators by including changes in covariates as additional controls alongside the pre-treatment values, this approach may lead to inconsistent results unless the controls are strictly exogenous (which is a strong assumption). Otherwise, the effects that should be captured by the ATT might be absorbed by the varying covariates.

Syntax

drdid depvar [indepvars] [if] [in] [weight], ///
    ivar(panel_id) time(time_var) treatment(treat_var) [options]

Where:

  • depvar: The dependent variable or outcome variable of interest.
  • [indepvars]: A list of independent variables or covariates to be included in your model.
  • [if] [in]: Standard Stata if and in qualifiers for sample restrictions.
  • [weight]: Optional weight variable.
  • ivar(panel_id): Required for panel data. Specifies the panel identifier variable.
  • time(time_var): Specifies the time variable. This variable must be numeric, positive, and regularly spaced.
  • treatment(treat_var): Specifies the treatment indicator variable. Lower values are interpreted as the control group, and higher values as the treated group.
  • [options]: A range of options to customize the estimation, standard errors, and output.

Options

Model Specification Options

  • ivar(varname): Specifies the panel identifier variable for panel data. Crucial for panel data structures.
  • time(varname): Defines the time variable, which should be positive and regularly spaced.
  • treatment(varname): Indicates the binary treatment variable, distinguishing between control (lower values) and treated (higher values) groups.

Estimation Methods Options

The drdid command provides various estimators, selectable via options. Here’s a summary:

Option   Description                                                  Compatibility
-------  -----------------------------------------------------------  -------------
drimp    Default. Improved DR estimator using inverse probability
         tilting and weighted least squares (WLS).                     Panel and RC
dripw    DR estimator with stabilized IPW and ordinary least
         squares (OLS).                                                Panel and RC
reg      Outcome regression (OR) DiD estimator.                        Panel and RC
ipw      Abadie (2005) IPW estimator.                                  Panel and RC
ipwra    IPW regression adjustment (via teffects).                     Panel only
all      Compute all compatible estimators; useful for robustness
         checks across methods.                                        Panel and RC

Standard Errors Options

  • wboot: Requests wild bootstrap standard errors. By default, uses 999 repetitions and Mammen distribution.
  • cluster(clust_var): Computes clustered standard errors. Can be used with asymptotic or bootstrap SEs.
  • rseed(#): Sets the seed for bootstrap reproducibility, ensuring consistent results across runs when using wboot or cluster with bootstrap.
  • level(#): Sets the confidence interval level. Default is 95% (level(95)).
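What wild-bootstrap standard errors do can be sketched for a simple sample mean (a stylized illustration, not drdid's actual influence-function procedure): each repetition perturbs the centered observations with draws from the two-point Mammen distribution, which has mean 0 and variance 1, and recomputes the statistic.

```python
import math
import random

random.seed(1)

# Invented sample whose mean we want a standard error for
data = [random.gauss(10, 2) for _ in range(400)]
n = len(data)
mean = sum(data) / n

# Mammen two-point distribution: values 1 - phi and phi, mean 0, variance 1
phi = (math.sqrt(5) + 1) / 2
p_low = phi / math.sqrt(5)                  # P(v = 1 - phi)
def mammen():
    return 1 - phi if random.random() < p_low else phi

# Multiplier (wild) bootstrap: perturb centered observations, recompute the mean
reps = 999
boot = [mean + sum(mammen() * (xi - mean) for xi in data) / n
        for _ in range(reps)]
bmean = sum(boot) / reps
sd = math.sqrt(sum((b - bmean) ** 2 for b in boot) / (reps - 1))

# Compare with the analytic standard error of the mean
se = math.sqrt(sum((xi - mean) ** 2 for xi in data) / (n - 1)) / math.sqrt(n)
print(round(sd, 3), round(se, 3))   # the two should be close
```

For a simple mean the two agree by construction; the value of the wild bootstrap shows up with clustered or otherwise dependent data, where resampling whole observations is unreliable.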

Other Options

  • stub(str): Allows you to save influence function variables, prefixed with the specified stub(str) (e.g., stub(att) would save variables like att_ipw, att_or etc.).
  • noisily: Displays intermediate estimation steps, providing more detailed output during computation.

When to Use Which Estimator?

Panel Data:

  • For panel data, drimp (the default) and dripw are recommended for their doubly robust properties.
  • Consider ipwra for IPW regression adjustment specifically in panel data settings.
  • It’s generally advisable to avoid ipwra when working with repeated cross-sectional data.

Repeated Cross-Section (RC) Data:

  • When using drimp or dripw with repeated cross-sectional data, append the rc1 option to obtain non-locally efficient estimators, as recommended for RC designs.

Examples

  1. Panel data with the default drimp estimator:

    * Load the Lalonde data
    use "https://friosavila.github.io/playingwithstata/drdid/lalonde.dta", clear

    * Estimate with drdid
    drdid re age educ, ivar(id) time(year) tr(experimental)

  2. Comparing all compatible estimators:

    drdid re age educ, ivar(id) time(year) tr(experimental) all

  3. Wild bootstrap standard errors:

    drdid re age educ, ivar(id) time(year) tr(experimental) wboot reps(500) rseed(123)

  4. Repeated cross-section with clustering:

    drdid re age educ, time(year) tr(experimental) cluster(id) dripw

Common Issues & Solutions

  1. Time-Varying Covariates
    • Panel Data
    • When using panel data, it is crucial to use only time-invariant covariates (e.g., characteristics fixed at baseline). Including time-varying covariates can introduce bias unless they are strictly exogenous.
    • RC Data
    • For repeated cross-sectional data, time-varying covariates are permissible, assuming the stationarity condition holds.
  2. Clustering
    • For panel estimators, clustering is automatically applied at the ivar level. Using the cluster() option adds an additional layer of two-way clustering. Ensure that your ivar variable is properly nested within the specified cluster() variable.
  3. Error Messages
    • “Treatment variable must have 2 groups”: The treatment variable must be coded as binary (e.g., 0 for control, 1 for treated). Recode it accordingly.
    • “Panel ID required”: You are using drdid with panel data but have not specified the panel identifier. Add the ivar(panel_id) option to your command.

Post Estimation

  1. Displaying Results
    drdid_display, bmatrix(e(b)) vmatrix(e(V))
    

    Use drdid_display to present the estimation results in a clear matrix format, showing coefficients and variance-covariance matrix.

  2. Saving Weights and Propensity Scores
    drdid_predict ipw_weights, weight  // Generate IPW weights
    drdid_predict pscores, pscore      // Generate propensity scores
    

    The drdid_predict command allows you to generate and save Inverse Probability Weights (IPW) using the weight option and propensity scores using the pscore option for further analysis or diagnostics.

References & Further Exploration

Sant’Anna, P. H. C., & Zhao, J. (2020). Doubly robust difference-in-differences estimators. Journal of Econometrics, 219(1), 101–122.

Abadie, A. (2005). Semiparametric difference-in-differences estimators. Review of Economic Studies, 72(1), 1–19.

Rios-Avila, F., Sant’Anna, P. H. C., & Naqvi, A. (2021). DRDID: Stata module for doubly robust difference-in-differences estimators.

For related commands and deeper exploration, consider these Stata resources:

csdid: For event-study Difference-in-Differences designs.

didregress, xtdidregress: Alternative Difference-in-Differences commands available in Stata.

For further assistance and community discussion, the Statalist Forum is an invaluable resource.

drdid package authors: Fernando Rios-Avila, Pedro H. C. Sant’Anna, Asjad Naqvi

Last Updated: Feb 2025
