Data Processing in SAS

Libraries
Libraries are directories where your datasets are stored. It is a good idea to declare a local library where you want to store the dataset after you’re done processing it. Libraries are declared with the libname statement. Here I declare the library out. Any dataset stored in this library can be accessed by preceding the dataset name with the library name, in library.dataset format. Read the data step section to see this in action.

libname out 'C:\sasdata';
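
A quick way to check that the library was assigned correctly is to list what it contains. The sketch below assumes the folder C:\sasdata already exists; proc contents with the _all_ keyword and the nods option prints only the directory of datasets in the library, not the details of each one.

```sas
/* Assign the library (the folder must already exist) */
libname out 'C:\sasdata';

/* List the datasets currently stored in the out library */
proc contents data=out._all_ nods;
run;
```

If the library is empty, the listing simply shows no members.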

Data Step
The data step is where new data is introduced into SAS, and it is also one way to create new variables. SAS has many ways to move data into the program and to manipulate it once it’s in place. We will only go into a few.

  • Cards – The first method available is to enter the data directly. Using the cards statement in the data step, we can create the dataset without referencing outside files.
    data out.projdata;
    	input a b;
    	cards;
    	0 1
    	1 2
    	3 2
    	4 1
    	;
    	run;

    Note that in this data step, I am creating the permanent dataset out.projdata, with 4 observations of 2 variables. As a side note, “cards” is a holdover from when punch cards were used to enter data. The “datalines” statement is a synonym and can be used in its place.
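
    As a small illustration, here is the same step written with datalines, followed by a proc print to confirm what was stored. This is only a sketch and assumes the out library has already been assigned with libname.

    ```sas
    /* Same dataset as above, using datalines in place of cards */
    data out.projdata;
    	input a b;
    	datalines;
    0 1
    1 2
    3 2
    4 1
    ;
    run;

    /* Print the dataset to check the 4 observations of a and b */
    proc print data=out.projdata;
    run;
    ```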

  • Set – Set allows us to call existing SAS datasets. Data stored on the machine in the SAS dataset format, .sas7bdat, can be called by name into SAS. Give the folder they reside in a library name, and reference them using the library.dataset format. In the example below, out.projdata2 is a copy of out.projdata with a new variable c, the sum of a and b.
    data out.projdata2;
    	set out.projdata;
    	c = a+b;
    	run;
  • Infile – Infile allows us to reference an external file that’s not stored in SAS’s dataset format. There are many options to this, but in general data stored in .csv or .txt files is relatively easy to import. The key option is the delimiter: the character that separates one column’s value from the next within a row. For instance, if a row reads 2,1,3,4, then the delimiter is a comma.
    data out.projdata;
            infile "C:\sasdata\data.txt" dlm= '|' firstobs=1;
            input a b c d;
            run;

    In this statement, the file I’m reading in uses a pipe, '|', as the delimiter. For tab-delimited files, we have to use the hexadecimal representation of the tab character, '09'x.

    data out.projdata;
            infile "C:\sasdata\data.txt" dlm='09'x firstobs=1;
            input a b c d;
            run;

    The file I’m importing in this data step has 4 variables, which I’m giving the arbitrary names a, b, c, and d.
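
    The comma-delimited case mentioned earlier can be sketched the same way. The file name data.csv is hypothetical, and the sketch assumes the file holds 4 numeric columns and no header row. The dsd option tells SAS to treat two consecutive commas as a missing value and to handle values wrapped in quotes.

    ```sas
    /* data.csv is a hypothetical file: 4 numeric columns, no header row */
    data out.projdata;
            infile "C:\sasdata\data.csv" dlm=',' dsd;
            input a b c d;
            run;
    ```

    If the file did have a header row, firstobs=2 on the infile statement would skip it.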

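  • Proc Import – Not a data step, but another important way to get data into SAS is proc import. It is often more convenient than the infile statement in the data step, since it can read variable names from the first row of the file and work out the variable types on its own.
    proc import datafile="C:\sasdata\data.txt"
            out=out.projdata    /* dataset to create */
            dbms=dlm            /* generic delimited file */
            replace;            /* overwrite out.projdata if it already exists */
            delimiter='|';
            run;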

    This command is even easier if we are importing a .csv. In that case, the dbms option is csv, and the delimiter is assumed to be a comma.
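
    A minimal sketch of the .csv case follows. The file name data.csv is hypothetical, and the sketch assumes the first row of the file holds the variable names. getnames=yes is the default and is written out only to make that assumption visible; use getnames=no for a file without a header row.

    ```sas
    /* data.csv is a hypothetical file whose first row holds variable names */
    proc import datafile="C:\sasdata\data.csv"
            out=out.projdata
            dbms=csv
            replace;        /* overwrite out.projdata if it already exists */
            getnames=yes;   /* take variable names from the first row */
            run;
    ```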