How can I find things in a character variable in SAS?

You can find a specific character, a group of letters, or a special character within a character variable by using the index function. For example, suppose that you have a data file with names and other information, and you want to identify only those records for people with "Harvey" in their name. You could use the index function as shown below. First, let's input an example data set and use proc print to check that it was entered correctly.

data temp;
input name $ 1-12 age;
datalines;
Harvey Smith 30
John West    35
Jim Cann     41
James Harvey 32
Harvy Adams  33
;
run;

proc print data = temp;
run;

Obs    name            age

 1     Harvey Smith     30
 2     John West        35
 3     Jim Cann         41
 4     James Harvey     32
 5     Harvy Adams      33

Now, let's use the index function to find the cases with "Harvey" in the name.

data temp1;
set temp;
x = index(name, "Harvey");
run;

proc print data = temp1;
run;

Obs    name            age    x

 1     Harvey Smith     30    1
 2     John West        35    0
 3     Jim Cann         41    0
 4     James Harvey     32    7
 5     Harvy Adams      33    0

The values of the variable x give the position in the variable name at which the first occurrence of "Harvey" begins. In the second observation, John West does not have "Harvey" in his name, so a value of 0 was returned. The fifth observation also returns 0, because "Harvy" is spelled differently and index requires an exact match.
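Because index returns 0 when the string is not found, its result can be used directly to keep only the matching records. The following is a minimal sketch, assuming the data set temp from above has been created; the data set names temp_harvey and temp_anycase are hypothetical and chosen only for illustration.

```sas
data temp_harvey;
  set temp;
  /* Subsetting IF: keep the record only when "Harvey" occurs in name */
  if index(name, "Harvey") > 0;
run;

proc print data = temp_harvey;
run;

/* index is case-sensitive; converting name to upper case first is one way */
/* to also catch spellings such as "harvey" or "HARVEY"                     */
data temp_anycase;
  set temp;
  if index(upcase(name), "HARVEY") > 0;
run;
```

With the example data, both data sets should contain only the records for Harvey Smith and James Harvey.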

Now let's suppose that you want to search for any one of several characters in a string variable. For example, perhaps you want to search for "-", "_" or "X". To accomplish this, you could use the indexc function, which allows you to supply multiple excerpts and returns the position of the first character in the string that appears in any of them. The variable found1 is included to show why you cannot use the index function and supply it with all of the characters for which you are searching: index looks for the entire string "-_X" as a single unit.

data temp3;
input string $ 1-11;
datalines;
4-5 abc XxX
11_ jkl xxx
abc 3-5 jjj
xXx ()1 lll
xxx 344 aaa
;
run;

data temp4;
set temp3;
found = indexc(string, "-", "_", "X");
found1 = index(string, "-_X");
run;

proc print data = temp4;
run;

Obs      string       found    found1

 1     4-5 abc XxX      2         0
 2     11_ jkl xxx      3         0
 3     abc 3-5 jjj      6         0
 4     xXx ()1 lll      2         0
 5     xxx 344 aaa      0         0

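As you can see from the output above, the value of the variable found is the position of the first occurrence of any of the characters listed in the indexc function, or 0 if none of them occurs. Note that the search is case-sensitive: the lowercase "x" characters in the fifth observation do not match "X". The variable found1 is 0 in every observation because the three-character sequence "-_X" never appears in string.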
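The indexc function treats each excerpt as a set of characters, so the three characters can also be passed together in a single excerpt. The sketch below illustrates this and, as an assumption about what you might want, a case-insensitive variant. It assumes the data set temp3 from above; the names temp5, found2 and found3 are hypothetical.

```sas
data temp5;
  set temp3;
  /* One excerpt holding all three characters; this should give the same */
  /* result as indexc(string, "-", "_", "X")                              */
  found2 = indexc(string, "-_X");
  /* Upper-casing the string first makes the search for X case-insensitive */
  found3 = indexc(upcase(string), "-_X");
run;

proc print data = temp5;
run;
```

In the fifth observation, for example, found2 remains 0, while found3 is 1 because the leading "x" now matches.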

