In this post, I show how to convert string variables to numeric in Stata. String variables are shown in red . If a numeric variable is stored as a string variable in Stata, we have several ways to convert them to numeric variables. Let’s start with the destring command first.
1. The destring command
The destring command might be the first choice for converting string variables to numeric if we have a limited number of non-numeric characters. With this command, we can either generate a new variable or replace the existing one. Here is an example:
*First create a dummy data input str15(price return) "120.25" ".10" "122.25" ".12" end *Now destring the two variables destring price, replace destring return, replace
1. The ignore option of destring
If there are non-numeric characters in our dataset, destring command will show an error
non-numeric characters found
For example, our data might have comma separators, therefore, destring will generate the above error. In such cases, we can use the ignore(“,”) option, which tells Stata that the given character i.e. ” ,” should be ignored. See the following example:
*First create a dummy data input str15(price return) "12,000.25" ".10" "12,200.25" ".12" end *Now destring the two variables destring price, replace ignore(",")
1.2 List all non-numeric characters
We can list all non-numeric characters using the tabulate command and the real() function. Suppose that our variable strvar contains non-numeric values
tabulate strvar if missing(real(strvar ))
Suppose that the above code comes up with a list of the following non-numeric characters.
tabulate strvar if missing(real(strvar)) strvar | Freq. Percent Cum. ------------+----------------------------------- #error | 1 0.00 0.00 #na | 1 0.00 0.00 . | 171,106 100.00 100.00 14* | 1 0.00 100.00 ------------+-----------------------------------
The above tables shows that there are three non-numeric variables in our dataset. These are #error, #na, and *. We can specify these in the ignore option. The following code creates a new variable numvar
destring strvar, ignor("#error" "#na" "*") gen(numvar)
2. The real() function
The destring command is useful in a sense that it does not convert data to missing observations. Instead, it gives you an error message when there are non-numeric characters in the variable. If you are sure that observations with non-numeric characters are not needed , you can use the real()
function with generate
command.
generate newvar = real(strvar)
The above code uses a brute force. It converts string values to numeric values. And if there are non-numeric characters? Those observations are set to missing values.
AOA Sir, I really appreciate your hard working and such kind of knowledge give a potential to do something for country and nations and whole world
Hi there,
I am Abrar Rehman Rather a PhD student. My question is how to destring a variable containing both numeric values and alphabets in stata