asrol’s Options


asrol has one required option and eight optional options. Details are given below.

1. stat()

Option stat is used to specify the required statistics. Version 4.0 or higher of asrol supports multiple statistics for multiple variables. The following statistics are allowed:

Option        Details
sd            Estimates the standard deviation of non-missing values
mean          Finds the arithmetic mean of non-missing values
gmean         Finds the geometric mean of positive values
sum           Adds all the numbers in a given window
product       Multiplies all the numbers in a given window
median        Returns the median of non-missing values
skewness      Returns the skewness of non-missing values
kurtosis      Returns the kurtosis of non-missing values
count         Counts the number of non-missing observations in a given window
missing       Counts the number of missing values in a given window
min           Returns the smallest value in a given window
max           Returns the largest value in a given window
max2          Returns the second largest value in a given window
max3          Returns the third largest value in a given window
max4          Returns the fourth largest value in a given window
max5          Returns the fifth largest value in a given window
first         Returns the first observation in a given window
last          Returns the last observation in a given window
perc(k)       Returns the k-th percentile of values in a range. This option must be used in combination with stat(median).
add(#)        Adds the value # to each value in a given window before computing the geometric mean or the product of values.
ignorezero    Used with the product and gmean statistics. See section 7 for details.

2. window()

The option window sets the range over which the statistics are calculated. The latest version of asrol accepts up to three arguments in this option, written like this:

window(rangevar #from #upto)

The rangevar is usually a time variable such as date, monthly date, or yearly date. Both #from and #upto are numeric values that mark how far the window stretches from the current observation. Negative values of # mean going back # periods from the current observation. Similarly, positive values mean going ahead # periods from the current observation. If our time variable is year and we want a rolling window of 5 observations (that is, the current observation and the previous 4 observations), then the option window can be written in two ways:

window(year 5)
OR
window(year -5 0)

Rolling window calculations
The default for a rolling window is to calculate the required statistics on the available observations that are within the range. Therefore, the calculations start with one observation at the beginning of the rolling window. As we progress through the data set, the number of observations gradually increases until the maximum length of the rolling window is reached. Consider the following data of 10 observations, where X is the variable of interest for which we would like to calculate the arithmetic mean in a rolling window of 5, and months is the rangevar. To understand the mechanics of the rolling window more clearly, we shall generate three additional statistics: count, first, and last.

bys id: asrol X, window(months 5) stat(count) gen(count)
bys id: asrol X, window(months 5) stat(mean) gen(mean)
bys id: asrol X, window(months 5) stat(first) gen(first)
bys id: asrol X, window(months 5) stat(last) gen(last)
   +----------------------------------------------------------+
   | id    months        X     mean   count    first     last |
   |----------------------------------------------------------|
   |  1   2016m10    .6881    .6881       1    .6881    .6881 |
   |  1   2016m11    .9795    .8338       2    .6881    .9795 |
   |  1   2016m12    .6702    .7792       3    .6881    .6702 |
   |  1    2017m1    .5949    .7331       4    .6881    .5949 |
   |  1    2017m2    .7971    .7459       5    .6881    .7971 |
   |----------------------------------------------------------|
   |  1    2017m3    .7836     .765       5    .9795    .7836 |
   |  1    2017m4    .6546    .7001       5    .6702    .6546 |
   |  1    2017m5    .0968    .5854       5    .5949    .0968 |
   |  1    2017m6    .6885    .6041       5    .7971    .6885 |
   |  1    2017m7    .8725    .6192       5    .7836    .8725 |
   +----------------------------------------------------------+

Explanation
For the first observation, that is 2016m10, the mean value is based on a single observation, as there are no previous data. The same is reflected by the variables count, first, and last. For the second observation, the mean value is based on two observations of X, i.e., (.6881 + .9795) / 2 = .8338. We can also observe such details from the variable count, which has a value of 2; the variable first, which shows that the first value in the rolling window thus far is .6881; and the variable last, which shows that the last value in the rolling window is .9795. As we move down the data points, the rolling window keeps adding observations until the fifth observation, i.e., 2017m2. After this observation, the observations at the start of the rolling window are dropped and more recent observations are added. Users can also restrict the calculations to windows that contain at least a minimum number of observations; see the option minimum for more details.
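The mechanics described above can be mimicked outside Stata. The following Python sketch is only an illustration and not asrol's implementation: the function name rolling_stats is hypothetical, and the sketch assumes consecutive months with no gaps and no missing values, so that list positions can stand in for the rangevar.

```python
# Illustrative sketch only; asrol itself is a Stata command.
# Assumption: one observation per month, no gaps, no missing values.
X = [0.6881, 0.9795, 0.6702, 0.5949, 0.7971,
     0.7836, 0.6546, 0.0968, 0.6885, 0.8725]


def rolling_stats(values, window):
    """Return (mean, count, first, last) for each trailing window."""
    rows = []
    for i in range(len(values)):
        # The current observation plus up to window - 1 earlier ones.
        chunk = values[max(0, i - window + 1): i + 1]
        rows.append((sum(chunk) / len(chunk), len(chunk), chunk[0], chunk[-1]))
    return rows


rows = rolling_stats(X, 5)
for mean, count, first, last in rows:
    print(f"{mean:.4f}  {count}  {first:.4f}  {last:.4f}")
```

The count grows from 1 to 5 over the first five rows and then stays at 5, while first moves forward by one observation per row once the window is full.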

No Window
Since the option window is optional, it can be dropped altogether. In such a case, asrol can be used like gen or egen. When used with the bysort prefix, asrol can closely match the performance of egen in calculating statistics by groups.

3. gen(new_variable_name)

This is an optional option that specifies the name of the new variable, where the variable name is enclosed in parentheses after gen. If we do not specify this option, asrol will automatically generate a new variable with the name format of stat_rollingwindow_varname. When finding multiple statistics, one statistic for multiple variables, or multiple statistics for multiple variables, asrol will automatically assign names to the new variables. Therefore, the option gen() cannot be used in such cases.

4. minimum(#)

The option minimum forces asrol to find the required statistics only when the minimum number of observations is available. If a specific window does not have that many observations, the new variable is set to missing for that window. Please note that # is an integer and should be greater than zero. Therefore, min(0), min(-5), or min(1.5) are treated as illegal commands. Examples of legal commands are min(2), min(10), or min(100).
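The effect of minimum can be sketched in Python as follows. This is an illustration under stated assumptions, not asrol's code: the function name rolling_mean_min is hypothetical, None stands in for a Stata missing value, and the data are assumed to have no gaps.

```python
# Illustrative sketch only; None plays the role of a Stata missing value.
X = [0.6881, 0.9795, 0.6702, 0.5949, 0.7971,
     0.7836, 0.6546, 0.0968, 0.6885, 0.8725]


def rolling_mean_min(values, window, minimum):
    """Rolling mean that is missing until `minimum` observations exist."""
    # Mirror the rule that # must be an integer greater than zero.
    if not isinstance(minimum, int) or minimum <= 0:
        raise ValueError("minimum must be an integer greater than zero")
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk) if len(chunk) >= minimum else None)
    return out


means = rolling_mean_min(X, 5, 3)
print(means[:4])
```

With a minimum of 3, the first two windows hold only one and two observations, so their results are missing; the third window is the first to produce a mean.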

5. by(varlist)

asrol is byable, and hence the required statistics can be calculated using a single variable or multiple variables as the grouping filter. For example, we can find mean profitability for each company in a rolling window of 5 years. Here, we use a single filter, that is company.
Imagine that we have a data set of 40 countries, each one having 60 industries, and each industry has 1000 firms. We might be interested in finding the mean profitability of each industry within each country in a rolling window of 5 years. In that case, we can use either the option by or the bysort prefix. Both of the following commands yield equivalent results. However, the command with the bysort prefix has some speed advantage.

asrol profitability, window(year 5) stat(mean) by(country industry)

bys country industry : asrol profitability, window(year 5) stat(mean)

6. perc(k)

This is an optional option. Without the perc(k) option, stat(median) finds the median value, or the 50th percentile, of the values in a given window. However, if the option perc(k) is specified, then stat(median) will find the k-th percentile of the values in the range. For example, if we are interested in finding the 75th percentile of the values in our desired rolling window, then we have to invoke the option perc(.75) along with the option stat(median). See the following example:

bys country industry : asrol profitability, window(year 5) stat(median) perc(.75)

Note :
The calculation of percentiles follows a method similar to the one that summarize and _pctile use by default. Therefore, the percentile values might be slightly different from the values calculated with centile. For details related to different definitions of percentiles, see Hyndman and Fan (1996).
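To make the note concrete, the default percentile rule documented for summarize and _pctile (without weights) can be sketched in Python. The function name pctile is hypothetical, and the sketch takes the percentile as an integer percentage (75 rather than .75) to avoid floating-point comparisons. Whether asrol matches this rule in every edge case is an assumption, since the text only says the method is similar.

```python
# Illustrative sketch of the unweighted summarize/_pctile default rule.
# Assumption: asrol behaves the same way; the text only says "similar".
def pctile(values, pct):
    """Return the pct-th percentile, with pct an integer strictly between 0 and 100."""
    if not 0 < pct < 100:
        raise ValueError("pct must be strictly between 0 and 100")
    xs = sorted(values)
    n = len(xs)
    # P = n * pct / 100, split into its integer part and remainder.
    whole, rest = divmod(n * pct, 100)
    if rest == 0:
        # P is an integer: average the P-th and (P+1)-th ordered values.
        return (xs[whole - 1] + xs[whole]) / 2
    # Otherwise take the ordered value at 1-based position ceil(P).
    return xs[whole]


print(pctile([1, 2, 3, 4, 5], 75))
print(pctile([1, 2, 3, 4], 75))
```

With five values, P = 3.75 is not an integer, so the fourth ordered value is returned. With four values, P = 3 is an integer, so the third and fourth ordered values are averaged.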

7. Options related to product and gmean : add(#) and ignorezero

This version of asrol improves the calculation of the product of values and the geometric mean. Since both statistics involve multiplication of values in a given window, the presence of missing values and zeros presents a challenge to getting the desired results. The following are the defaults in asrol for dealing with missing values and zeros:

7.1 : Missing values are ignored when calculating the product or the geometric mean of values.

7.2 : To be consistent with Stata’s default for geometric mean calculations (see ameans), the default in asrol is to ignore zeros and negative numbers. So the geometric mean of 0,2,4,6 is 3.6342412, that is [2 * 4 * 6]^(1/3). And the geometric mean of 0,-2,4,6 is 4.8989795, that is [4 * 6]^(1/2).

7.3 : Zeros are considered when calculating the product of values. So the product of 0,2,4,6 is 0.

Two variations are possible when we want to treat zeros differently. These are discussed below:

7.4 Option ignorezero: This option can be used to ignore zeros when calculating the product of values. Therefore, when the zero is ignored, the product of 0,2,4,6 is 48.

7.5 Option add(#) : This option adds a constant # to each value in the range before calculating the product or the geometric mean. Once the required statistic is calculated, the constant is subtracted back. So using option add(1), the product of 0,.2,.4,.6 is 1.6880001, that is [(1+0) * (1+.2) * (1+.4) * (1+.6)] - 1, and the geometric mean is .280434, that is [((1+0) * (1+.2) * (1+.4) * (1+.6))^(1/4)] - 1.

Stata’s ameans command calculates three types of means, including the geometric mean. The difference between asrol’s gmean statistic and Stata’s ameans command lies in the treatment of the option add(#): ameans does not subtract the constant # from the results, whereas asrol does.
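The defaults and options in this section can be checked with a short Python sketch. It is an illustration and not asrol's code: the function names product and gmean are hypothetical, and None stands in for a missing value. The sketch also assumes that zeros and negative numbers are screened out after the constant from add(#) has been applied, which is consistent with the add(1) example above, where the zero is kept.

```python
# Illustrative sketch only; None plays the role of a Stata missing value.
import math


def product(values, add=0.0, ignorezero=False):
    """Product of non-missing values, optionally skipping zeros."""
    xs = [v for v in values if v is not None]  # missing values are ignored
    if ignorezero:
        xs = [v for v in xs if v != 0]
    if not xs:
        return None
    # Add the constant to each value, multiply, then subtract it back.
    return math.prod(v + add for v in xs) - add


def gmean(values, add=0.0):
    """Geometric mean of the positive values, after adding the constant."""
    xs = [v + add for v in values if v is not None]
    # Assumption: zeros and negatives are dropped after `add` is applied.
    xs = [v for v in xs if v > 0]
    if not xs:
        return None
    return math.prod(xs) ** (1 / len(xs)) - add


print(product([0, 2, 4, 6]), product([0, 2, 4, 6], ignorezero=True))
print(gmean([0, 2, 4, 6]), gmean([0, -2, 4, 6]))
print(product([0, .2, .4, .6], add=1), gmean([0, .2, .4, .6], add=1))
```

The printed values reproduce the figures quoted in 7.2 to 7.5, up to floating-point rounding.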

8. xf(excluding focal observation)

The xf is an abbreviation that I use for “excluding focal”. There might be circumstances where we want to exclude the focal observation while calculating the required statistics. asrol allows excluding the focal observation in two flavors. The first one is to exclude only the current observation, while the second one is to exclude all observations of the relevant variable that have the same (duplicate) value of the rangevar elsewhere in the given window. An example will better explain the distinction between the two options. Consider the following data of 5 observations, where X is the variable of interest for which we would like to calculate the arithmetic mean and year is the rangevar. Our calculations do not use any rolling window; therefore, the option window is dropped.

Example A:

asrol X, stat(mean) xf(focal) gen(xfocal)

Example B:

asrol X, stat(mean) xf(year) gen(xfyear)

   +-----------------------------------+
   | year    X     xfocal    xfyear    |
   |-----------------------------------|
   | 2001    100    350    350         |
   | 2002    200    325    325         |
   | 2003    300    300    266.66667   |
   | 2003    400    275    266.66667   |
   | 2004    500    250    250         |
   +-----------------------------------+

Explanation :

In Example A, we invoke the option xf() as xf(focal). asrol generates a new variable xfocal that contains the mean values of the rest of the observations in the given window, excluding the focal observation. Therefore, in the year 2001, xfocal variable has a value of 350, that is the average of the values of X in the years 2002, 2003, 2003, 2004 i.e. (200+300+400+500)/4 = 350. Similarly, the second observation of the xfocal variable is 325, that is (100+300+400+500)/4 = 325. Similar calculations are made when required statistics are estimated in a rolling window.

Example B differs from Example A in the definition of the focal observation(s). In Example B, we invoke the option xf() as xf(year), where year is an existing numeric variable. With this option, the focal observations are the current observation and any other observations that have the same value of the rangevar as the current observation. Our data set has two observations with a duplicate value of the rangevar, i.e., year 2003. Therefore, the mean values are calculated as shown below:

    +----------------------------------------------------+
    | obs 1:    (200 + 300 + 400 + 500)/4 =    350       |
    | obs 2:    (100 + 300 + 400 + 500)/4 =    325       |
    | obs 3:    (100 + 200 + 500 ) /3     =    266.66667 |
    | obs 4:    (100 + 200 + 500 ) /3     =    266.66667 |
    | obs 5:    (100 + 200 + 300 + 400)/4 =    250       |
    +----------------------------------------------------+
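Both flavors of xf can be reproduced with a small Python sketch. It is only an illustration of the arithmetic, not asrol's implementation; the function name mean_excluding and its mode argument are hypothetical, and no rolling window is applied.

```python
# Illustrative sketch only; reproduces the xfocal and xfyear columns.
year = [2001, 2002, 2003, 2003, 2004]
X = [100, 200, 300, 400, 500]


def mean_excluding(values, keys, mode):
    """Mean of the other observations, excluding the focal one(s)."""
    out = []
    for i in range(len(values)):
        if mode == "focal":
            # Drop only the current observation.
            rest = [v for j, v in enumerate(values) if j != i]
        else:
            # Drop every observation that shares the current key (year).
            rest = [v for j, v in enumerate(values) if keys[j] != keys[i]]
        out.append(sum(rest) / len(rest))
    return out


xfocal = mean_excluding(X, year, "focal")
xfyear = mean_excluding(X, year, "rangevar")
print(xfocal)
print(xfyear)
```

The two results differ only for the two observations in 2003, because those are the only observations that share a value of the rangevar.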