# asrol’s Options

asrol has one required option and 8 optional options. Details are given below.

## 1. stat()

Option stat is used to specify the required statistics. Version 4.0 or higher of asrol supports multiple statistics for multiple variables. The following statistics are allowed:

| Option | Details |
| --- | --- |
| sd | Estimates the standard deviation of non-missing values |
| mean | Finds the arithmetic mean of non-missing values |
| gmean | Finds the geometric mean of positive values |
| sum | Adds all the numbers in a given window |
| product | Multiplies all the numbers in a given window |
| median | Returns the median of non-missing values |
| skewness | Returns the skewness of non-missing values |
| kurtosis | Returns the kurtosis of non-missing values |
| count | Counts the number of non-missing observations in a given window |
| missing | Counts the number of missing values in a given window |
| min | Returns the smallest value in a given window |
| max | Returns the largest value in a given window |
| max2 | Returns the second largest value in a given window |
| max3 | Returns the third largest value in a given window |
| max4 | Returns the fourth largest value in a given window |
| max5 | Returns the fifth largest value in a given window |
| first | Returns the first observation in a given window |
| last | Returns the last observation in a given window |
| perc(k) | Returns the k-th percentile of values in a range. This option must be used in combination with stat(median). |
| add(#) | Adds the value # to each value in a given window before computing the geometric mean or the product of values. |
| ignorezero | Used with the product and gmean statistics. See section 7 below for details. |

## 2. window():

A more detailed discussion of the option window is available on a separate page. The latest version of asrol accepts up to three arguments in this option, written like this:

window(rangevar #from #upto)

The *rangevar* is usually a time variable such as date, monthly date, or yearly date. Both *#from* and *#upto* are numeric values that mark how far the window stretches from the current observation. Negative values of # mean going back # periods from the current observation. Similarly, positive values mean going ahead # periods from the current observation. If our time variable is year and we want a rolling window of 5 observations (that is, the current observation and the previous 4 observations), then the option window can be written in two ways:

window(year 5) OR window(year -5 0)

**Rolling window calculations**

The default for a rolling window is to calculate the required statistics on the available observations that are within the range. Therefore, the calculations start with one observation at the beginning of the rolling window. As we progress through the data set, the number of observations gradually increases until the maximum length of the rolling window is reached. Consider the following data of 10 observations, where X is the variable of interest for which we would like to calculate the arithmetic mean in a rolling window of 5, and months is the rangevar. To understand the mechanics of the rolling window more clearly, we shall generate three additional statistics: count, first, and last.

    bys id: asrol X, window(months 5) stat(count) gen(count)
    bys id: asrol X, window(months 5) stat(mean) gen(mean)
    bys id: asrol X, window(months 5) stat(first) gen(first)
    bys id: asrol X, window(months 5) stat(last) gen(last)

    +------------------------------------------------------+
    | id    months       X    mean   count   first    last |
    |------------------------------------------------------|
    |  1   2016m10   .6881   .6881       1   .6881   .6881 |
    |  1   2016m11   .9795   .8338       2   .6881   .9795 |
    |  1   2016m12   .6702   .7792       3   .6881   .6702 |
    |  1    2017m1   .5949   .7331       4   .6881   .5949 |
    |  1    2017m2   .7971   .7459       5   .6881   .7971 |
    |------------------------------------------------------|
    |  1    2017m3   .7836    .765       5   .9795   .7836 |
    |  1    2017m4   .6546   .7001       5   .6702   .6546 |
    |  1    2017m5   .0968   .5854       5   .5949   .0968 |
    |  1    2017m6   .6885   .6041       5   .7971   .6885 |
    |  1    2017m7   .8725   .6192       5   .7836   .8725 |
    +------------------------------------------------------+

**Explanation**

For the first observation, that is 2016m10, the mean value is based on a single observation, as there are no previous data. The same is reflected by the variables count, first, and last. For the second observation, the mean value is based on two observations of X, i.e., (.6881 + .9795) / 2 = .8338. We can also see this in the variable count, which has a value of 2; the variable first, which shows that the first value in the rolling window so far is .6881; and the variable last, which shows that the last value in the rolling window is .9795. As we move down the data points, the rolling window keeps adding observations until the fifth observation, i.e. 2017m2. After this observation, the observations at the start of the rolling window are dropped and more recent observations are added. Users can also restrict the calculation of the required statistics to windows that contain a minimum number of observations; see the option minimum for more details.
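The mechanics described above can be sketched outside Stata. The following Python snippet is only an illustration of the arithmetic, not asrol's implementation; the function name `rolling_stats` is hypothetical, and it assumes consecutive periods with no gaps and no missing values, so that a window of 5 periods is the same as the last 5 rows.

```python
def rolling_stats(x, window):
    """For each row, summarize the current value and up to window-1 earlier values."""
    rows = []
    for i in range(len(x)):
        start = max(0, i - window + 1)   # the window is shorter at the start of the data
        obs = x[start:i + 1]
        rows.append({
            "mean": sum(obs) / len(obs),
            "count": len(obs),
            "first": obs[0],
            "last": obs[-1],
        })
    return rows

# X values from the example data set above
X = [.6881, .9795, .6702, .5949, .7971, .7836, .6546, .0968, .6885, .8725]
rows = rolling_stats(X, 5)
for r in rows:
    print(f"{r['mean']:.4f}  {r['count']}  {r['first']:.4f}  {r['last']:.4f}")
```

The count grows from 1 to 5 over the first five rows and then stays at 5, while first moves forward by one value per row once the window is full.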

**No Window**

Since the option window is optional, it can be dropped altogether. In such a case, asrol can be used like *gen* or *egen*. When used with the bysort prefix, asrol can closely match the performance of *egen* in calculating statistics by groups.

## 3. gen(*new_variable_name*)

This is an optional option to specify the name of the new variable, where the variable name is enclosed in parentheses after gen. If we do not specify this option, asrol will automatically generate a new variable with the name format of *stat_rollingwindow_varname*. When finding multiple statistics, one statistic for multiple variables, or multiple statistics for multiple variables, asrol will automatically assign names to the new variables. Therefore, option *gen()* cannot be used in such cases.

## 4. minimum(#)

The option minimum forces asrol to find the required statistics only when the minimum number of observations is available. If a specific window does not have that many observations, the new variable is set to missing for that window. Please note that # is an integer and should be greater than zero. Therefore, min(0), min(-5), or min(1.5) are treated as illegal commands. Examples of legal commands are min(2), min(10), or min(100).
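As a rough illustration of this rule, the Python sketch below returns a missing value (`None`) whenever a window holds fewer observations than the minimum. It is not asrol's code; the function name `rolling_mean_min` is hypothetical, and the sketch assumes consecutive periods without missing values.

```python
def rolling_mean_min(x, window, minimum=1):
    """Rolling mean that is missing (None) until `minimum` observations are available."""
    # Mirror the rule above: the minimum must be a positive integer
    if not isinstance(minimum, int) or minimum < 1:
        raise ValueError("minimum must be a positive integer")
    out = []
    for i in range(len(x)):
        obs = x[max(0, i - window + 1):i + 1]
        out.append(sum(obs) / len(obs) if len(obs) >= minimum else None)
    return out

means = rolling_mean_min([1, 2, 3, 4, 5, 6], window=3, minimum=3)
print(means)
```

With a window of 3 and a minimum of 3, the first two rows are missing because their windows contain only one and two observations.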

## 5. by( varlist )

asrol is byable, and hence the required statistics can be calculated using a single variable or multiple variables as the grouping filter. For example, we can find mean profitability for each company in a rolling window of 5 years. Here, we use a single filter, that is company.

Imagine that we have a data set of 40 countries, each one having 60 industries, and each industry has 1000 firms. We might be interested in finding the mean profitability of each industry within each country in a rolling window of 5 years. In that case, we can use either the option by or the bysort prefix. Hence both of the following commands yield similar results. However, the command with the bysort prefix has some speed advantage.

asrol profitability, window(year 5) stat(mean) by(country industry)

bys country industry : asrol profitability, window(year 5) stat(mean)

## 6. perc(k)

This is an optional option. Without the perc(*k*) option, stat(median) finds the median, that is, the 50th percentile of the values in a given window. If perc(*k*) is specified, stat(median) instead finds the k-th percentile of the values in the range. For example, to find the 75th percentile of the values in the desired rolling window, we invoke the option perc(.75) along with stat(median). See the following example:

bys country industry : asrol profitability, window(year 5) stat(median) perc(.75)

**Note:**

The calculation of percentiles follows the same method that *summarize* and *_pctile* use by default. Therefore, the percentile values might be slightly different from the values calculated with *centile*. For details related to different definitions of percentiles, see Hyndman and Fan (1996).
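To see why the two definitions can disagree, here is a small Python sketch. It rests on an assumption about the Stata formulas, which should be checked against the Stata manuals: the default *_pctile* rule takes position P = N*p/100 and returns either the next order statistic or the average of two adjacent ones, while the default *centile* rule interpolates linearly at position (N+1)*p/100. The function names are hypothetical, and p is on the 0 to 100 scale, so perc(.75) corresponds to p = 75.

```python
import math

def pctile_default(values, p):
    """Percentile under the rule assumed here for summarize/_pctile (0 < p < 100)."""
    xs = sorted(values)
    n = len(xs)
    pos = n * p / 100
    i = math.floor(pos)
    if pos > i:
        # Fractional position: take the next order statistic (1-based x[i+1])
        return xs[min(i, n - 1)]
    # Whole position: average the 1-based order statistics x[i] and x[i+1]
    return (xs[max(i - 1, 0)] + xs[min(i, n - 1)]) / 2

def pctile_centile(values, p):
    """Percentile under the rule assumed here for centile: interpolate at (N+1)*p/100."""
    xs = sorted(values)
    n = len(xs)
    pos = (n + 1) * p / 100
    i = math.floor(pos)
    if i < 1:
        return xs[0]
    if i >= n:
        return xs[-1]
    return xs[i - 1] + (pos - i) * (xs[i] - xs[i - 1])

data = [1, 2, 3, 4, 5]
print(pctile_default(data, 75), pctile_centile(data, 75))
```

For these five values the first rule picks an observed value for the 75th percentile, while the second interpolates between the fourth and fifth values, so the two results differ.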

## 7. Options related to product and gmean : add(#) and ignorezero

This version of asrol improves the calculation of the product of values and the geometric mean. Since both statistics involve multiplication of values in a given window, the presence of missing values and zeros presents a challenge to getting the desired results. The following are the defaults in asrol for dealing with missing values and zeros:

7.1 : Missing values are ignored when calculating the product or the geometric mean of values.

7.2 : To be consistent with Stata’s default for geometric mean calculations, (see *ameans*), the default in asrol is to ignore zeros and negative numbers. So the geometric mean of 0,2,4,6 is 3.6342412, that is [2 * 4 * 6]^(1/3). And the geometric mean of 0,-2,4,6 is 4.8989795, that is [4 * 6]^(1/2).

7.3 : Zeros are considered when calculating the product of values. So the product of 0,2,4,6 is 0.

Two variations are possible when we want to treat zeros differently. These are discussed below:

7.4 Option ignorezero: This option can be used to ignore zeros when calculating the product of values. Therefore, when the zero is ignored, the product of 0,2,4,6 is 48.

7.5 Option add(#): This option adds a constant # to each value in the range before calculating the product or the geometric mean. Once the required statistic is calculated, the constant is subtracted back. So, using option add(1), the product of 0,.2,.4,.6 is 1.6880001, that is [(1+0) * (1+.2) * (1+.4) * (1+.6)] - 1, and the geometric mean is .280434, that is [((1+0) * (1+.2) * (1+.4) * (1+.6))^(1/4)] - 1.

Stata’s *ameans* command calculates three types of means, including the geometric mean. The difference between asrol’s *gmean()* function and the Stata *ameans* command lies in the treatment of option add(#). *ameans* does not subtract the constant # from the results, whereas asrol does.
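The defaults and the two options in this section can be checked with a short Python sketch that reproduces the worked numbers. This is an illustration of the arithmetic only, not asrol's source code. The function names are hypothetical, missing values are represented by `None`, and two details are assumptions of the sketch: for gmean, the test for a positive value is applied after the constant is added, and for product, ignorezero is applied to the original values.

```python
import math

def product(values, add=0.0, ignorezero=False):
    """Product of non-missing values; optionally skip zeros and/or shift by `add`."""
    vals = [v for v in values if v is not None]      # 7.1: missing values are ignored
    if ignorezero:
        vals = [v for v in vals if v != 0]           # 7.4: drop zeros
    if not vals:
        return None
    return math.prod(v + add for v in vals) - add    # 7.5: add, multiply, subtract back

def gmean(values, add=0.0):
    """Geometric mean of the values that are positive after adding `add`."""
    shifted = [v + add for v in values if v is not None]
    shifted = [s for s in shifted if s > 0]          # 7.2: zeros and negatives are ignored
    if not shifted:
        return None
    return math.prod(shifted) ** (1 / len(shifted)) - add

print(product([0, 2, 4, 6]))                    # zero is kept by default
print(product([0, 2, 4, 6], ignorezero=True))   # zero is dropped
print(gmean([0, 2, 4, 6]), gmean([0, -2, 4, 6]))
print(product([0, .2, .4, .6], add=1), gmean([0, .2, .4, .6], add=1))
```

Skipping the final subtraction of `add` in `gmean` would give the *ameans*-style result described above, in which the constant is not subtracted from the result.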

## 8. xf(excluding focal observation)

xf is an abbreviation that I use for “excluding focal”. There might be circumstances where we want to exclude the focal observation while calculating the required statistics. asrol allows excluding the focal observation in two flavors. The first one is to exclude only the current observation, while the second one is to exclude all observations of the relevant variable that share the same (duplicate) value of the *rangevar* elsewhere in the given window. An example will better explain the distinction between the two options. Consider the following data of 5 observations, where X is the variable of interest for which we would like to calculate the arithmetic mean and year is the rangevar. Our calculations do not use any rolling window; therefore, the option window is dropped.

**Example A:**

    asrol X, stat(mean) xf(focal) gen(xfocal)

**Example B:**

    asrol X, stat(mean) xf(year) gen(xfyear)

    +---------------------------------+
    | year     X   xfocal      xfyear |
    |---------------------------------|
    | 2001   100      350         350 |
    | 2002   200      325         325 |
    | 2003   300      300   266.66667 |
    | 2003   400      275   266.66667 |
    | 2004   500      250         250 |
    +---------------------------------+

**Explanation :**

In Example A, we invoke the option xf() as xf(focal). asrol generates a new variable xfocal that contains the mean values of the rest of the observations in the given window, excluding the focal observation. Therefore, in the year 2001, xfocal variable has a value of 350, that is the average of the values of X in the years 2002, 2003, 2003, 2004 i.e. (200+300+400+500)/4 = 350. Similarly, the second observation of the xfocal variable is 325, that is (100+300+400+500)/4 = 325. Similar calculations are made when required statistics are estimated in a rolling window.

Example B differs from Example A in the definition of the focal observation(s). In Example B, we invoke the option xf() as xf(year), where year is an existing numeric variable. With this option, the focal observations are defined as the current observation plus any other observations that have the same value of the rangevar as the current observation. Our data set has two observations with the same value of the rangevar, i.e., year 2003. Therefore, the mean values are calculated as shown below:

    +----------------------------------------------------+
    | obs 1: (200 + 300 + 400 + 500)/4 = 350             |
    | obs 2: (100 + 300 + 400 + 500)/4 = 325             |
    | obs 3: (100 + 200 + 500)/3 = 266.66667             |
    | obs 4: (100 + 200 + 500)/3 = 266.66667             |
    | obs 5: (100 + 200 + 300 + 400)/4 = 250             |
    +----------------------------------------------------+
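The two flavors of xf() can be reproduced with a few lines of Python. This sketch only mirrors the arithmetic of Examples A and B with no rolling window; the function name `mean_excluding` is hypothetical and is not part of asrol.

```python
def mean_excluding(x, rangevar=None):
    """Mean of the other observations, excluding the focal one(s) for each row.

    rangevar=None  : exclude only the current row (like xf(focal)).
    rangevar=[...] : exclude every row that shares the current row's rangevar
                     value, including the current row (like xf(year)).
    """
    out = []
    for i in range(len(x)):
        if rangevar is None:
            kept = [v for j, v in enumerate(x) if j != i]
        else:
            kept = [v for j, v in enumerate(x) if rangevar[j] != rangevar[i]]
        out.append(sum(kept) / len(kept) if kept else None)
    return out

year = [2001, 2002, 2003, 2003, 2004]
X = [100, 200, 300, 400, 500]
xfocal = mean_excluding(X)          # Example A
xfyear = mean_excluding(X, year)    # Example B
print(xfocal)
print(xfyear)
```

The two results differ only in the rows for 2003, where Example B drops both 2003 observations and averages the remaining three values.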

Ricardo, October 19, 2021 at 9:38 pm:

Hello, thank you for your code.

I am looking to implement asrol to generate a variable indicating the second lowest observation for a given window (756 days) and group(id).

bys id: asrol X, window(days 756) stat(min2) gen(second_lowest)

Is there a way to tweak the code to return this second lowest value?

Thank you,

Attaullah Shah, November 6, 2021 at 10:02 am:

Currently, there is no function in asrol to find the second lowest value in a given range. However, I have added this to my agenda for future updates of asrol.

Rossodia, October 29, 2022 at 6:00 am:

Hello, thanks for the package. However, I just tried the function and it seems xf() yields wrong results when excluding focal observations based on a grouping variable.

my original data is as below:

    ticker  eanndats   analys  anndats      value
    000V    7-May-20   190000  3/15/2020    -0.74
    000V    7-May-20   110000  3/18/2020    -0.70
    000V    7-May-20   72000   3/23/2020    -0.75
    000V    7-May-20   120000  4/2/2020     -0.77
    000V    7-May-20   110000  4/3/2020     0
    000V    7-May-20   110000  4/28/2020    -0.70
    000V    6-Aug-20   72000   5/8/2020     -0.69
    000V    6-Aug-20   120000  5/8/2020     -0.63
    000V    6-Aug-20   190000  5/10/2020    -0.72
    000V    6-Aug-20   110000  6/4/2020     -0.62
    000V    5-Nov-20   120000  8/7/2020     -0.59
    000V    5-Nov-20   190000  8/10/2020    -0.68
    000V    5-Nov-20   110000  8/19/2020    -0.53
    000V    9-Mar-21   110000  11/6/2020    -0.49
    000V    9-Mar-21   120000  11/6/2020    1.25
    000V    9-Mar-21   190000  11/8/2020    1.16
    000V    9-Mar-21   72000   11/9/2020    1.11
    000V    9-Mar-21   72000   11/23/2020   0.96
    000V    9-Mar-21   120000  11/24/2020   1.04
    000V    9-Mar-21   110000  11/25/2020   -0.50

I want to get the mean value (1) before the anndats date (2) excluding all obs from the focal analys ID for each group(ticker eanndats). I tried your package with code:

bys ticker eanndats: asrol value, stat(mean) win(anndats -90 0) xf(analys) gen(mean_ex)

The result is as below, and it is just the same output as excluding the current observation.

    ticker  eanndats   analys  anndats      value   mean_ex
    000V    7-May-20   190000  3/15/2020    -0.74
    000V    7-May-20   110000  3/18/2020    -0.70   -0.74
    000V    7-May-20   72000   3/23/2020    -0.75   -0.72
    000V    7-May-20   120000  4/2/2020     -0.77   -0.73
    000V    7-May-20   110000  4/3/2020     0       -0.74
    000V    7-May-20   110000  4/28/2020    -0.70   -0.59

Thank you so much!

Attaullah Shah, November 4, 2022 at 11:07 am:

Let me check it. You can expect a reply in a week's time or so.