Category Archives: Uncategorized

Fama – MacBeth (1973) procedure: What, how and where | asreg in Stata

Category:Uncategorized

Fama and MacBeth (1973) procedure can be used in testing asset pricing models and other areas. In this post, my primary focus is on its use in testing asset pricing models.

FMB in asset pricing models

It is actually a three-step process. We would divide the time period into three parts.

1. The first step is to find the assets/portfolios betas in the first period. Some researchers would use these betas to classify assets into portfolios.

2. The second step is to find betas of these portfolios in the second period.

3. The third step is to find the portfolio returns in the third period and test whether the betas from the second period can explain these returns? This step involves:
(i) cross-sectional regressions of the portfolio returns on the portfolio betas in each period.
(ii) averaging coefficients from the cross-sectional regressions across time. The standard errors are adjusted for cross-sectional dependence.

What does asreg do in the above process

asreg with fmb option performs step 3(i) and 3(ii).

asreg can also help in step (1) where individual betas need to be calculated for each stock. The command might look like
Code:

`bys company: asreg returns market_returns if period == 1`

This means that for typical asset pricing tests, the researcher has to do step (1) and (2) and arrange the data in a panel format, listing portfolio returns and betas as variables in columns. And then use asreg with fmb option, e.g.

```keep if period == 3
xtset company month
asreg returns betas, fmb```

Where else FMB regression can be used?

Fama and MacBeth (1973) procedure (i.e step 3(i) and (ii)) is also used in areas other than testing the asset pricing models. You can see one example in my paper, Table 3, column 8, page 264

Shah, Attaullah & Shah, Hamid Ali & Smith, Jason M. & Labianca, Giuseppe (Joe), 2017. “Judicial efficiency and capital structure: An international study,” Journal of Corporate Finance, Elsevier, vol. 44(C), pages 255-274.

Export correlation table to Word with stars and significance level using asdoc

Category:Uncategorized

The updated version of asdoc can now create a table of correlation with significance levels starred at different levels. The new version can be installed by typing the following line in Stata.

Installation of the new version

`net install asdoc, from(http://fintechprofessor.com) replace`

An Example

`sysuse auto, clear asdoc pwcorr price mpg rep78 headroom trunk weight length turn , star(all) replace nonum`

Explanation

Just like with any other Stata command, we would write asdoc as a prefix to the Stata command. In this case, the Stata command is pwcorr which is followed by the variable names. After the comma, we added option nonum, star(all) and replace. These are explained bellow:

star(all) = This option is used to report stars to signfy significance at different levels. These are: ***

1. *** to show significance at 1% or bellow
2. ** to show significance at 5% or bellow
3. * to show significance at 10% or bellow

nonum = Without using this option, asdoc will report numeric numbers as column headers

replace = This option replaces any existing file

You would be interested in this blog entry where I show several useful options of asdoc that can be used with correlation tables.

Fama and MacBeth regression over 25 Portfolios using asreg in Stata

Category:Uncategorized

Antonio has asked the following question

Dear Sir,
I was wondering how to run a Fama and MacBeth regression over 25 Portfolios. In accordance with your code, the first variable needs to be the dependent variable while the following variables are considered as independent variables. Basically, I would like to calculate the risk premium of a factor over the 25 value and size-sorted portfolios. Therefore in my case, I would have more dependent variables and just one dependent variable.

`use http://fintechprofessor.com/ff.dta, clear`

So before running the Fama and MacBeth regressions, this is how the data needs to be structured.

The data is in a long format where the portfolios are tracked by a variable, called the panelvar. The portfolio returns are written in a separate variable, in our case, it is named as returns. The panelvar has values from 1, up to 25. The first 10 observations of the portfolios 1 and 2 look like:

```. list in 1/10, noob
+---------------------------------------------+
|    mofd   P   returns       size        MTB |
|---------------------------------------------|
|  1993m6   1      .038    64.0125   5.224508 |
|  1993m7   1     .0539   71.86839   4.505145 |
|  1993m8   1    -.0639   27.82528   1.888283 |
|  1993m9   1    -.0328   20.08383   7.730755 |
| 1993m10   1     .0249   59.34985   8.961844 |
|---------------------------------------------|
| 1993m11   1     .0657   47.42625   3.766557 |
| 1993m12   1     .0408   81.47429   5.148165 |
|  1994m1   1     .0185   42.39914   5.375627 |
|  1994m2   1     .0323   62.36839   4.882884 |
|  1994m3   1   -.00598   64.79323   1.281697 |
+---------------------------------------------+
. list in 101/110, noob
+---------------------------------------------+
|    mofd   P   returns       size        MTB |
|---------------------------------------------|
|  1993m6   2     .0114   41.16883   4.549813 |
|  1993m7   2    -.0158   10.09915   2.136258 |
|  1993m8   2    .00616   73.43023   2.924793 |
|  1993m9   2    -.0141   58.28651   7.608449 |
| 1993m10   2     .0129    63.4972   1.137969 |
|---------------------------------------------|
| 1993m11   2    -.0223    16.1786   1.368057 |
| 1993m12   2     .0322   64.10929   6.226629 |
|  1994m1   2    -.0144   54.48264   7.883276 |
|  1994m2   2     .0388   74.99379   1.362888 |
|  1994m3   2     .0345   68.66164   7.102628 |
+---------------------------------------------+
```

How to run the Fama and MacBeth regression

`ssc install asreg, replace`

asreg can estimate three types of regressions: (1) cross-sectional regressions (2) rolling window regressions and (3) Fama and MacBeth regressions. You can read more details here.

Since our main focus here is on the Fama and MacBeth procedure, the discussion this point onwards will use option fmb of the asreg program. The syntax is:

`asreg depvar indepvars, fmb`

The data must be first declared as panel data with the xtset command. In our dataset, we have P as the panel variable and mofd as the time variable, therefore, to declare the data as panel data, the xtset command would be:

`xtset P mofd`

In our dataset, we have the variable returns as the dependent variable and size and MTB as the two independent variables. The command for the Fama and MacBeth regression would be:

```. asreg returns  size MTB , fmb
```

Explanation

retunrs = The dependent variable

size and MTB = independent variables

asdoc Unicode issue | Stata to MS Word

Category:Uncategorized

For those who are not yet familiar with asdoc, asdoc can be downloaded from SSC and can be used with almost all Stata commands. Here is a short blog post that shows how asdoc can be used with any Stata command. You can also watch several YouTube videos that show the use of asdoc

asdoc installation

``` ssc install asdoc

```

The problem

Since asdoc uses RTF file format, any Unicode character passed from Stata to the RTF file will be incorrectly shown in the output file. For example, a variable has the following value label which is correctly shown on Stata screen, but when written with asdoc, it is distorted.

```Stata view: Don’t know
asdoc view: ```

The solution

If you have Stata 14 or higher, then the good news is that you can use unicode command to change the file encoding. Specifically, the subcommand convertfile of unicode command can be extremely useful here. In the following example, I shall download an example dataset, called unicode.dta from my site, and then tabulate the variable M25 with asdoc, write the results to Myfile.doc and convert the coding and write the results to a new file, Myfile_new.doc

```use http://fintechprofessor.com/unicode.dta, clear
asdoc tab M25, replace```

The file generated by asdoc looks like this, the last line of the file highlights the given issue.

Now lets try the unicode solution.

`unicode convertfile  Myfile.doc  Myfile_new.doc, dstencoding(Windows-1252) replace`

asdoc – custom table title | MS Word | Stata

Category:Uncategorized

For those who are not yet familiar with asdoc, asdoc can be downloaded from SSC and can be used with almost all Stata commands. Here is a short blog post that shows how asdoc can be used with any Stata command. You can also watch several YouTube videos that show the use of asdoc

asdoc installation

` ssc install asdoc`

Customizing the table title

The focus of this blog entry to highlight some of the tips that can be used to customize table titles. asdoc produces some generic table titles with many Stata commands. However, the default table titles can be replaced by using the option title. In the following text, I discuss this and seveal other features of asdoc to modify the table titles:

1. Custom titles

If custom titles are needed, just use option title( title text). See the following example.

```sysuse auto
asdoc tab rep78 , title(My cutom table title) replace
```

2. Title not required

We can use an empty title if the table title is not required. In the following example, I am using the backslash (\) inside the title option. This will produce empty title in the document.

```asdoc tab rep78 , title(\) replace
```

3. Title is needed in italics

We can use many RTF directives inside the title option. So to get the title in italics font, the code would be:

```asdoc tab rep78 , title(\i My custom table title) replace
```

4. Title with bigger font size

The default in asdoc is to report table titles in boldface and font size of 10pt. If bigger or smaller font size is needed, again we can pass the RTF directive \fs# to the document. So to produce a title with 14 pt, we shall type:

`asdoc tab rep78 , title(\fs28 My custom table title) replace`

The Error option matrow() not allowed” when using asdoc with tabulate command

Category:Uncategorized

Ronald asked the following question: I am getting the error when using asdoc with tabulate command.
option matrow() not allowed
r(198);

I asked Ronald to send his dataset for a closer look. Upon inspection, it turned out that Ronald was trying to tabulate values of a string variable, and asdoc had problem with that. As a temporary solution, I suggested the following two steps.

Step 1: Convert the string variable to numeric variable with value labels. The good news is that we can do that in one line of code using the encode command. So assume that the string variable name is country, we shall encode it:

`encode country, gen(country_new)`

where country_new is a new is a new variable with value labels.

Step 2: Now we can use it with asdoc, so to tabulate it and send the output to MS Word, we type:

`asdoc tab country_new, replace`

Reshape data in Stata – An easy to understand tutorial

Category:Blog,Uncategorized Tags :

From wide to long format

Suppose we have the data in the following format

`````` +-------------------------------------------------------+
| id   sex   inc80   inc81   inc82   ue80   ue81   ue82 |
|-------------------------------------------------------|
|  1     0    5000    5500    6000      0      1      0 |
|  2     1    2000    2200    3300      1      0      0 |
|  3     0    3000    2000    1000      0      0      1 |
+-------------------------------------------------------+``````

The above structure is known as the wide format. If we wish to convert it to a long format, such as the one given below,

`` +-----------------------------+  `` | id   year   sex    inc   ue |  `` |-----------------------------|` |  1     80     0   5000    0 | |  1     81     0   5500    1 | |  1     82     0   6000    0 | |  2     80     1   2000    1 | |  2     81     1   2200    0 | |-----------------------------| |  2     82     1   3300    0 | |  3     80     0   3000    0 | |  3     81     0   2000    0 | |  3     82     0   1000    1 |  +-----------------------------+ `

We shall just type

`reshape long inc ue, i(id) j(year)`

Explanation

Since we need to convert the data from a wide format to a long format, this is why the command that we wrote was reshape long. After that, we have to specify the names of the variables which are in the wide format. In our dataset, there are 2 variables which are INC and UE. Both of these variables have a numeric part. That numeric part is what we call the variable J. We specify this J variable in the option j(new variable). In our dataset, there is no variable with the name year, however, we wrote the option j(year) so that a new variable is created for the numeric values of 80, 81, and 82. We also specified option i(id), where option i needs an existing variable that is a unique panel identifier.

To practice the above yourself, here is the source data and code.

`use http://www.stata-press.com/data/r15/reshape1listreshape long inc ue, i(id) j(year)list`

Reshape long to wide

Continuing from the previous example, we can reshape the data back to wide format by

`reshape wide inc ue, i(id) j(year)`

email-subscribers-form id=”{form-id}”

Exporting ttest results from Stata to Word using asdoc

Category:asdoc,Blog,Stata Programs,Uncategorized

asdoc installation

If you have not already studied the features of asdoc, you can visit this page that lists the table of contents of what asdoc can do. You can also read this one paragraph introduction to asdoc. The following line of code will install asdoc from SSC

`ssc install asdochelp asdoc`

Reporting t-tests with asdoc

Before we make the t-test results table for our example data, let us breifly explore the options available in asdoc for making a t-test results table.

Whether it is one-sample t-test or two-sample or other forms, asdoc manages to report the results line by line for each test. asdoc also allows accumulating results from different runs of t-tests. For this purpose, the option rowappend of asdoc really comes handy. With the sub-command ttest , we can use the following options of asdoc to control asdoc behavior.

(1) replace / append

(2) save(filename)

(3) title(text)

(4) fs(#)

(5) hide.

(6) stats()

(7) rowappend.

These options are discussed in detail in Section 1 of asdoc help file. Option stats and rowappend are discussed below:

Option stat()

Without stat() option, asdoc reports the number of observations (obs), mean, standard error, t-value, and p-value with t-tests. However, we can select all or few statistics using the stat option. The following table lists the keywords and their details for reporting the desired statistics.

 keyword details n Number of observations mean Arithmetic mean se Standard error df degrees of freedom obs Number of observations t t-value p p-value sd standard deviation dif difference in means if two-sample t-test

Option rowappned

ttest tables can be constructed in steps by adding results of different t-tests to an existing table one by one using option rowappend. There is only one limitation that the t-tests are performed and asdoc command applied without writing any other results to the file in-between.

An example

Suppose we have the following data set with variables r0, r1, r2, r3, and y. The data can be downloaded into Stata by

`use http://fintechprofessor.com/ttests.dta, clear`

The variables ro-r3 are the numeric variables for which we would like to conduct one-sample ttest whereas variable y is a numeric date variable that tracks years. We wish to conduct a ttest for each of the r0-r3 variables and in each year and make one table from all such tests.

Without using a loop

` asdoc ttest R0==0 if Y==2009, replace title(One Sample t-test Results) asdoc ttest R1==0 if Y==2009, rowappend asdoc ttest R2==0 if Y==2009, rowappend asdoc ttest R3==0 if Y==2009, rowappend asdoc ttest R0==0 if Y==2010, rowappend asdoc ttest R1==0 if Y==2010, rowappend asdoc ttest R2==0 if Y==2010, rowappend asdoc ttest R3==0 if Y==2010, rowappend asdoc ttest R0==0 if Y==2011, rowappend asdoc ttest R1==0 if Y==2011, rowappend asdoc ttest R2==0 if Y==2011, rowappend asdoc ttest R3==0 if Y==2011, rowappend asdoc ttest R0==0 if Y==2012, rowappend asdoc ttest R1==0 if Y==2012, rowappend asdoc ttest R2==0 if Y==2012, rowappend asdoc ttest R3==0 if Y==2012, rowappend`

And appreciate the results

Explanation

1.In the first line of code, we wrote asdoc ttest in the beggining of the line. This is how we use asdoc with Stata commands. We just add asdoc to the beggining of any Stata command and that’s all.

2. We used two options of asdoc in the first line of code: the replace and title(). Replace will replace any existing file with the name Myfile.doc and title will add the specific test as a title to the output file.

3. In the second line of code, we added option rowappend() that will append the results to the existing table in the file Myfile.doc

4. And the process continues untill all ttests are estimated.

asdoc: Cutomizing the regression output | MS Word from Stata | Confidence Interval, adding stars, etc.

Category:asdoc,Blog,Uncategorized

Version 2.3 of asdoc adds the following features for reporting detailed regression tables.

1. Reporting confidence interval

2. Suppressing confidence intervals

3. Suppressing the stars which are used to show significance level

4. Customization of significance level for stars

These features are discussed in details below. If you have not already studied the features of asdoc, you can visit this page that lists the table of contents of what asdoc can do. You can also read this one paragraph introduction to asdoc. The following line of code will install this beta version of asdoc from our website

`net install asdoc, from(http://fintechprofessor.com) replacehelp asdoc`

Details of the new features

The new features related to creating detailed regression tables with asdoc are discussed below with details.

1. Confidence interval

I received several emails and comments on blog posts suggesting the addition of confidence intervals (CI) to the detailed regression tables created by asdoc. In version 2.3 onwards, confidence intervals are shown by default. This means that we do not have to add an additional option to report CI. See the following example.

`sysuse auto, clearasdoc reg price mpg rep78 headroom trunk weight length turn , replace`

2. Suppressing the confidence interval

If confidence intervals are not needed, we can use option noci. For example

`asdoc reg price mpg rep78 headroom trunk weight length turn , replace noci`

3. Suppressing stars

Similarly, if we are not interested in reporting significance stars, we can use option nostars. For example,

4. Setting custom significance level

The default significance levels for reporting stars are set at : *** for p-values <=0.01; ** for p-values <=0 .05, and * for p-values <=0.1. However, now we can set our own levels for statistical significance using option setstars. An example of setstars option looks like:

`setstars(***@.01, **@.05, *@.1)`

As we can see from the above line, setstars separates each argument by a comma. Each argument has three components. The first component is the symbol (in our case it is *) which will be reported for the given significance elve. The second component is the @ sign that connects the significance level with the symbol. And the third component is the value at which the significance level is set. So if we want to report stars such that

`* for p-value .001 or less** for p-value .01 or less*** for p-value .05 or less`

We shall write the option setstars as

`setstars(*@.001, **@.01, ***@.05)`

Continuing with our example, let us use the above option to report our defined level of stars.

`asdoc reg price mpg rep78 headroom trunk weight length turn , replace setstars(*@.001, **@.01, ***@.05)`

Merging datasets in Stata on long strings and less precise matching criterion

Category:Uncategorized

The Problem

Consider the following two datasets where company names are not exactly the same in both the datasets, still we want to merge them using the name variable as the merging criterion. How that can be done in Stata.

 name in dataset 1 name dataset 2 symbol data set2 The Pakistan General Insurance Co. Ltd. The Pakistan General Insurance PKGI Security Leasing Corporation Ltd. Security Leasing Corporation L SLCL NATIONAL BANK OF PAKISTAN NATIONAL BANK OF PAKISTAN NBP THE BANK OF KHYBER THE BANK OF KHYBER BOK THE BANK OF PUNJAB Ltd THE BANK OF PUNJAB BOP SILKBANK LIMITED . SILKBANK LIMITED SILK Awwal Modaraba Comay Awwal Modaraba AWWAL B.F. Modaraba. B.F. Modaraba BFMOD

The above table shows that the company names not only differ in terms of different number of characters but also in terms of capitalization. However, there is a general patterns of similarity in the first few characters, starting from left to right. Given that, we can split the problem into two parts.

1.Extract similar characters

Extract the first few characters that are similar in both the dataset and merge the data using those similar characters. For example, in the second row of the above table, “The Pakistan General Insurance” part is similar in both the tables. If we count these characters, they are 30. So how exactly are we going to find the matching number of characters in each case? We shall not do that. Instead, we shall take the iterative path where:

1. we start with extracting the first n-number of characters from both the key variables in the two datasets and merge using the extracted (truncated) variables. If the merge succeeds, we shall save the merged data separately. Also, we shall delete, from the initial file, those records which were successfully merged, and further process those that did not merge. In this first step, we shall normally start with extracting a large number of characters, for example, up to 30 characters in the case of row 2 of the above table.

Please note that the relevant Stata function is substr() for extracting a given number of characters from a variable. We shall discuss it further as we proceed in this article.

2. In the next iteration, we shall further process those records which did not merge. This time we shall extract one character less than what we used in the preceeding step. The idea is that if two variables did not have the first 30 characters in common, they might have the first 29 characters in common. But this time, we need to be careful as we reduce the extraction of the initial number of characters, the chances of matching incorrect records increases. For example, consider the following two records where the first 12 characters are exactly the same in both the records i.e. “THE BANK OF “, therefore, if we merge the two data sets using only the first 12 characters, we shall incorrectly merge THE BANK OF KHYBER into THE BANK OF PUNJAB

 THE BANK OF KHYBER THE BANK OF PUNJAB

So using the first 29 characters of the two variables, we shall proceed to merge. Again, we shall retain the successfully merged records, append it to the already saved successful merged data, and delete it from the initial dataset, just we did in the first step above.

3. The iterative process will continue in the same fashion, and each time we need to pay more attention to the merged data to identify any incorrect merges as explained in step 2.

4. We shall stop at a point where the data seems to have a reasonable number of similar observations based on the initial characters.

2. Use lower case

The second part of the problem is the dissimilarity in the capitalization of the names. This part of the problem is easy to handle as we can use the function lower() to convert all the names to lower cases and then use it in the merging process.

An Example

Let’s use a simple example to implement what we have read so far. In this example, we are going to use the same data as given in the above table. From that data, we shall create two datasets. One of the dataset will remain in the Stata memory, we shall call it data_memory. The other data set will be saved to a file, we shall call it data_file. The data_file has two variables, name and symbol. We shall merge the data_memory into data_file using variable name as the merging criterion. To create the two dataset, we can copy and paste the following code to Stata do editor and run it.

`* Copy the following code and run from Stata do editor *-------------------------------------------------------------------------- *                        Create the  data_memory *-------------------------------------------------------------------------- clear input str30 name str5 symbol "The Pakistan General Insurance" "PKGI"  "Security Leasing Corporation L" "SLCL"  "NATIONAL BANK OF PAKISTAN"      "NBP"   "THE BANK OF KHYBER"             "BOK"   "THE BANK OF PUNJAB"             "BOP"   "SILKBANK LIMITED"               "SILK"  "Awwal Modaraba"                 "AWWAL" "B.F. Modaraba"                  "BFMOD" endsave data_memory, replace*--------------------------------------------------------------------------*                          Now to create the data_file*-------------------------------------------------------------------------- clear input str39 name "The Pakistan General Insurance Co. Ltd." "Security Leasing Corporation Ltd."       "NATIONAL BANK OF PAKISTAN"               "THE BANK OF KHYBER"                      "THE BANK OF PUNJAB Ltd"                  "SILKBANK LIMITED ."                      "Awwal Modaraba Comay"                    "B.F. Modaraba."                          end save data_file, replace `

Now that we have created the two datasets, let’s start the process step by step. In the following code box, the code perform the first step as discussed above. I shall then explain what actually the code does.

`use data_file, cleargen name2=lower(substr(name,1,30))clonevar name_old = namesave data_file, replaceuse data_memorygen name2=lower(substr(name,1,30))merge 1:1 name2 using data_filesave temporary, replacekeep if _merge == 3save matched, replacelist name name2 symbol+----------------------------------------------------------------------+            name                                    name2              |---------------------------------------------------------------------- |NATIONAL BANK OF PAKISTAN                 national bank of pakistan    |Security Leasing Corporation Ltd.         security leasing corporation | THE BANK OF KHYBER                        the bank of khyber           |The Pakistan General Insurance Co. Ltd.   the pakistan general insurance -----------------------------------------------------------------------use temporary, clearkeep if _merge == 2keep namesave remaining, replacelist  ` +--------------------+`  `|     name           |  `  `+--------------------+  ` |     Awwal Modaraba | |      B.F. Modaraba | |   SILKBANK LIMITED | | THE BANK OF PUNJAB | +--------------------+  `

Explanation

In the above code block, we loaded the first dataset i.e. data_memory and created a truncated variable (name2) from the original variable using the following line of code. The first function in this line is lower() that converts all cases to lower cases. The second function is substr() that extracts bytes 1 to 30 from the string variable name. So this line will convert “Security Leasing Corporation Ltd.” to “security leasing corporation

`gen name2=lower(substr(name,1,30))`

After creating the name2 variable, we saved and replaced the data_file. We then loaded the other dataset, that is, data_memory and created a similar truncated variable as above, and then merged the two datasets using this truncated variable. The merged dataset has all useful information and we saved it in a temporary.dta file for further processing.

The first step in this processing is to isolate successful merges, that is flagged by the _merge == 3 code, therefore, we kept all such observations and saved them to matched.dta file. The list command shows that we have 4 successful merges in this go. We then reload the temporary.dta file to isolate those records that did not merge. Therefore, those records are saved to another file, which we named as remaining.dta.

In our next iteration, we shall take all of the above steps using the remaining.dta file, instead of data_file.dta file.

` *------------------------------------------------------------------------- *                        Iteration 2 - using first 29 characters*-------------------------------------------------------------------------  use data_file, cleardrop name2gen name2=lower(substr(name,1,29))save data_file, replaceuse  remaining, cleargen name2=lower(substr(name,1,29))merge 1:1 name2 using data_filesave temporary, replacekeep if _merge == 3save matched, replacelist name name2 symbol` +--------------------------------------------------+  `` |               name                name2   symbol |  `` |--------------------------------------------------|` |     Awwal Modaraba       awwal modaraba    AWWAL | |      B.F. Modaraba        b.f. modaraba    BFMOD | |   SILKBANK LIMITED     silkbank limited     SILK | | THE BANK OF PUNJAB   the bank of punjab      BOP | +--------------------------------------------------+ save remaining, replace`

Is there a simple way to do this?

smerge Program

The above tutorial is best for a learning the merging process, however, it is too lengthy and is not optimal for quick applications. I have managed to get the code in a standard Stata ado program. I call this program as smerge, that is merge based on sub-string.

Installation

The program can be installed by typing the following line in Stata command window

`net install smerge, from(http://fintechprofessor.com) replace`

The syntax of the program is given below:

` smerge varname, use(file_name) [lower() higher()]`

smerge is the program name, followed by the varname, that is the variable name which should exist in both the files, i.e. the file currently loaded file, i.e. the file in the memory of Stata and the file on disk. After the comma, the program options are listed. Option use(file_name) is a required option. In the brackets, we have to provide the file name with which we want to merge the data. It can be a full directory path with the file name or simply the file name if the file is in the current directory.

options enclosed in brackets [] show optional options in Stata. There are two optional options in smerge program. If these options are not specified, then the default values are used, which are 30 for the higher and 10 for the lower. The higher() option sets the initial value of the number of string bites to be extracted from the varname to make a truncated variable that will be used for the first iteration in the merge process. The initial value is decremented by one in the next iteration until it reaches the limit set by the option lower().

An Example using smerge

Let use the same dataset which we created above. We shall first load the file data_memory and merge it with the file data_file.dta. Both the datasets have the variable name.

`use data_memory.dta, clearsmerge name, use(data_file) lower(10) higher(30)list. list `+---------------------------------------------------------+`` |                                    name   symbol   nstr | `` |---------------------------------------------------------|` | The Pakistan General Insurance Co. Ltd.     PKGI     30 | |       Security Leasing Corporation Ltd.     SLCL     30 | |               NATIONAL BANK OF PAKISTAN      NBP     30 | |                      THE BANK OF KHYBER      BOK     30 | |                  THE BANK OF PUNJAB Ltd      BOP     18 | |---------------------------------------------------------| |                      SILKBANK LIMITED .     SILK     16 | |                    Awwal Modaraba Comay    AWWAL     14 | |                          B.F. Modaraba.    BFMOD     13 | +---------------------------------------------------------+ `

Explanations

Read the second line in the above code block. We typed smerge and then the variable name. The name variable is the key variable that we are using as the merging criterion. Option use(data_file) tells smerge program to merge the current data in memeory with the file data_file. And option higher(30) and lower(10) sets the limits for sub-string extraction for creating the truncated variables that will be used for merging. And that’s it. We just need one line of code to do all the steps that we went through in the earlier tutorial.

The list command shows three variables, one of them is nstr. This variable records the number of strings from left to right of the varname. This number is recorded when the merge is successful. So the list command shows that the four records were merged in the first iteration i.e when we extracted the first 30 characters. The 5th record was merged when we extracted 18 characters i.e. The full entry is THE BANK OF PUNJAB Ltd, and the first 18 characters are THE BANK OF PUNJAB