The SUM function can be used within what two statements?
PROC PRINT and DATA
SUM function
SUM(argument1, argument2);
argument = numeric constant, variable, or arithmetic expression
If an argument in a sum function has a missing value, what will be the result of the sum? eg. sum(salary, bonus) where salary is missing value.
The missing value will be read as a zero and the sum will equal to bonus
What’s the difference between:
compensation=sum(salary, bonus)
and
compensation= salary + bonus
Hint: think missing value
The sum function will treat missing value as zero and return a value, while using + when there’s a missing value will return a missing value
what do date functions, like the following, do?
YEAR(SAS-date)
MONTH(SAS-date)
DAY(SAS-date)
WEEKDAY(SAS-date)
QTR(SAS-date)
They return a more familiar value from a SAS date
4 digit year
number 1-12
number 1-31
number 1-7 where 1=sunday
number 1-4
current date functions
TODAY()
DATE()
return current date as a SAS date Value
MDY(month,day,year)
returns SAS Date Value from a specific date
If you drop a variable before a sum function, will the sum function work?
YES, the DROP statement is a compile-only statement. SAS sets a drop flag for the dropped variables, but the variables are in the PDV and are therefore available for processing
IF, THEN statement
IF expression THEN statement;
Why use the ELSE statement rather than several IF statements?
The system would process every single IF statement. The ELSE statement saves processing resources.
ELSE statement
IF expression THEN statement;
ELSE IF expression THEN statement;
What operator can you use if either of 2 conditions must be true to process a statement?
What operator can you use to encompass the “rest” of the observations (multiple) at the end of the IF THEN saga?
OR
ELSE statement;
What can you use if you want to have multiple THEN statements for an IF, THEN condition?
Use a DO group!
IF expression THEN
DO;
END;
ELSE IF expression THEN
DO;
END;
In the following program, what length is the variable Freq assigned? How can you fix a truncated value?
data work.bonus;
set orion.sales;
if Country=’US’ then
do;
Bonus=500;
Freq=’Once a Year’
end;
else if Country=’AU’ then
do;
Bonus=300;
Freq=’Twice a Year’;
end;
run;
It takes the length of the first assigment statement (i.e. ‘Once a Year’)
Use length statement to fix truncated values (length Freq $ 12;). You could also “pad” the first assignment statement with a blank at the end.
How to you concatenate two data sets (i.e. merge vertically)?
DATA newdataset;
SET SASDataSet1 SASDataSet2….;
RUN;
This will show DataSet1 following by data from DataSet2
How to you concatenate two datasets with differently named variables that hold the same information?
RENAME!
DATA NewDataSet;
SET SASDataSet1
SASDataSet2 (RENAME=(oldname1=newname1 oldname2=newname2));
RUN;
How do you merge two data sets horizontally?
You can merge the two data sets based on a common variable
DATA newdataset;
MERGE SASdataset1 SASdataset2;
BY (descending)variable;
RUN;
Note: Data sets must be sorted by the BY variable
How do determine which data set(s) contributed to values in an observation?
Use IN= Data Set Option
data newdataset;
merge sasdataset1 (IN=tempvariable1) sasdataset2 (IN=tempvariable2);
by commonvariable;
run;
SAS will create a temporary variable with value of 0 if data set did not contribute to obs and value of 1 if it did contribute. These variables will NOT appear in the output data set. They’re only available during execution phase
How do you do a match-merge where you omit non-matching obs between data sets?
Use IN= Data Set Option to see which data set contribute info to each observation. Then use IF statement to select only obs where both data sets contribute:
DATA newdataset;
MERGE sasdataset1 (IN=tempvar1) sasdataset2 (IN=tempvar2);
BY commonvar;
IF tempvar1= 1 AND tempvar2=1;
RUN;
Alternate syntax for IF statement to select data sets contributing/not contributing data to an observation
IF tempvar1=1 and tempvar2=1 = IF tempvar1 and tempvar2
IF tempvar1=0 or tempvar2=0 = IF not tempvar1 or not tempvar2
How do you count the frequency of certain observations?
use PROC FREQ
PROC FREQ data=sasdataset;
TABLES variable1 variable2;
RUN;
If you omit the table statement, SAS produces a one-way frequency table for every variable in the data set.
Note: PROC FREQ outputs automatically - doesn’t need proc print step
PROC FREQ automatically displays frequency, percent, cumulative frequency, cumulative percent. How do you suppress some of these statistics?
PROC FREQ data=sasdataset;
TABLES variable1/option1 option2;
RUN;
options:
nocum => suppresses both cumulative frequency and percent
nopercent => supresses both percent and cumulative percent
If do both, the table will only display frequencies
How do you make a frequency table of a continuous variable?
Use PROC FORMAT to define ranges
proc format;
value tiers low-25000=’Tier1’
25000<-50000=’Tier2’;
run;
proc freq data=sasdataset;
tables Salary;
format Salary tiers.;
run;
How do you make a frequency tables across 2 variables? E.g. determine the number of male and female employess in each the US and Australia
PROC FREQ data=sasdataset;
TABLES Gender;
BY Country;
RUN;
(Remember to sort by the “by” variable first)
This produces two tables - one for gender frequency in the US and one for gender frequency in Australia.