1 Foreword This Exam Preparation book is intended for those preparing for the SAS Certified Base Programming for SAS 9 Exam. The Art of Service is an Accredited Training Organization and has been training IT professionals since 1998. The strategies and content in this book is a result of experience and understanding of the SAS programming, and the exam requirements. This book is not a replacement for completing the course. This is a study aid to assist those who have completed an accredited course and preparing for the exam. Do not underestimate the value of your own notes and study aids. The more you have, the more prepared you will be. While it is not possible to pre-empt every question that may be asked in the SAS Certified Base Programmer exam, this book covers the main concepts covered within the SAS Programming discipline. The SAS Certified Base Programmer for SAS 9 is the ideal certification for those relatively new to SAS programming or new to SAS certification. It is also the principle certification for many of the advanced certifications available from SAS. These study notes and sample exam questions in this book will allow you to more easily prepare for an SAS Programming exam. Ivanka Menken Executive Director The Art of Service
1
Copyright The Art of Service Brisbane, Australia │ Email:
[email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Write a review - receive Any Free eBook from our Catalog - $99 Value If you recently bought this book we would love to hear from you! Benefit from receiving a free eBook from our catalog at http://www.emereo.org if you write a review on Amazon (eg the online store where you purchased this book) about your last purchase! How does it work? To post a review on Amazon, just log in to your account and click on the Create your own review (under Customer Reviews) button of the relevant product page. You can find examples of product reviews in Amazon. IF you purchased from another online store, simply follow their procedures. What happens when I submit my review? Once you have submitted your review, send us an email at
[email protected] with the link to your review, and the eBook you'd like as our thank you from http://www.emereo.org. Pick any book you like from the catalog, up to $ 99 RRP. You will receive an email with your eBook as download link. It's that simple.
Copyright The Art of Service Brisbane, Australia │ Email:
[email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
2
2 Table of Contents 1 2 3 4 5 6
Foreword............................................................................ 1 Table of Contents............................................................... 3 SAS Certified Base Programmer for SAS 9 ....................... 6 Exam Specifics .................................................................. 7 Exam Prerequisites............................................................ 7 SAS Fundamentals ............................................................ 8 6.1 SAS Programs ............................................................. 8 6.2 SAS Libraries ............................................................. 10 6.3 SAS Data Sets ........................................................... 14 6.4 SAS System Outputs ................................................. 17 7 Creating SAS Data Sets .................................................. 19 7.1 Raw Data Files........................................................... 19 7.2 The Data Step............................................................ 21 7.3 Processing DATA Steps ............................................ 27 7.4 Debugging DATA Steps............................................. 30 7.5 SAS Data Sets ........................................................... 33 7.6 Combining SAS Data Sets......................................... 37 7.7 Processing Match-Merge ........................................... 41 8 Working with Data............................................................ 44 8.1 Column Input ............................................................. 44 8.2 Reading Free Format Data ........................................ 47 8.3 Reading Data and Time Values ................................. 51 9 Working with Observations .............................................. 52 9.1 Creating One Observation From Multiple Records .... 52 9.2 Creating Multiple Observations From Single Records 54 10 Creating Reports.............................................................. 55 10.1 Basic Reports ......................................................... 55 10.2 Advanced Reporting ............................................... 63 10.3 Descriptive Statistics............................................... 68 10.4 Creating Frequency Tables .................................... 72 10.5 Producing HTML Output ......................................... 75 11 Advanced Work with Variables ........................................ 78 11.1 User-Defined Formatting ........................................ 78 11.2 Using SAS Functions.............................................. 85
3
Copyright The Art of Service Brisbane, Australia │ Email:
[email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
11.3 Expanding DO Loops.............................................. 93 11.4 Process Arrays ....................................................... 95 12 Troubleshooting SAS Programs ...................................... 98 12.1 Types of Errors ....................................................... 98 13 Practice Exam................................................................ 100 14 Answer Guide ................................................................ 111 14.1 Answers to Questions........................................... 111 15 References .................................................................... 118
Copyright The Art of Service Brisbane, Australia │ Email:
[email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
4
Notice of Rights All rights reserved. No part of this book may be reproduced or transmitted in any form by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Notice of Liability The information in this book is distributed on an “As Is” basis without warranty. While every precaution has been taken in the preparation of the book, neither the author nor the publisher shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the instructions contained in this book or by the products described in it. Trademarks Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations appear as requested by the owner of the trademark. All other product names and services identified throughout this book are used in editorial fashion only and for the benefit of such companies with no intention of infringement of the trademark. No such use, or the use of any trade name, is intended to convey endorsement or other affiliation with this book.
5
Copyright The Art of Service Brisbane, Australia │ Email:
[email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
3 SAS Certified Base Programmer for SAS 9 SAS Certified Base Programmer for SAS 9 covers a basic programming knowledge to develop efficient code to meet business requirements. The certification is geared towards individuals who are new to SAS programming or new to SAS certifications. The certification program for Base Programmer for SAS 9 is the basis for all other certifications provided by SAS. A SAS Base Programmer should have some experience in data management and SAS programming, specifically with tasks related to: x manipulating and transforming data x importing and exporting raw data files x combining SAS date sets x creating basic reports x correcting errors in data syntax and programming logic. Certification is valid for three years, at which time a recertification exam must be completed.
Copyright The Art of Service Brisbane, Australia │ Email:
[email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
6
4 Exam Specifics Exams are delivered in a secure environment, proctored, and timed by Prometric. Details about the exam include: x 70 multiple-choice questions x Score of 65% x 2 hour testing time. Tests can be scheduled online. Prometric requires a 24-hour waiting period for first-time registrations of new candidate. The first-time register will receive an email confirmation. If a candidate fails the exam, they must wait a minimum of 30 calendar days before attempting again. Candidates are required to bring two forms of identification to the testing center on the day of their test. They are recommended to arrive 15minutes before the test. If they are more than fifteen minutes late, a seat may not be available, nor a refund guaranteed.
5 Exam Prerequisites There are no prerequisites for SAS Certified Base Programmer for SAS 9.
7
Copyright The Art of Service Brisbane, Australia │ Email:
[email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
6 SAS Fundamentals Key components of SAS x Programs x Libraries x Files x Data sets.
6.1
SAS Programs
SAS programs are used to access, manage, analyze, or present data. Most SAP programs are a combination of DATA and PROC (procedure) steps. DATA steps create or modify data sets, specifically: x Entering data into a SAS data set x Checking for errors in the data x Computation of data x Producing new SAS data sets x Producing customer-designed reports. PROC steps are pre-defined procedures to analyze and process data in a SAS data set. PROC steps are used to manipulate the data typically for reporting purposes. As a result, PROC steps can: x Create new SAS data sets x List, sort, and provide summaries of data x Produce statistical results x Produce plots and charts. 6.1.1
Formatting and Behavior of SAS Programs x Programs consist of statements. x Statements start with a keyword. x Statements end with a semicolon.
Copyright The Art of Service Brisbane, Australia │ Email:
[email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
8
Statements are in free format: x Can begin and end anywhere on a line x Can exist on multiple lines x Can exist with multiple statements on a single line Statements can be in uppercase or lowercase. Case sensitivity is typically resolved using quotation marks. Separation of words is resolved using spaces or special characters. 6.1.2 Processing Programs Upon submission of a SAS program, SAS will read the statements. While statements are being read, error checking is in place. New steps begin with a DATA and PROC statement. The completion of a step is identified by a RUN or QUIT statement. SAS will read the step completely. When a new step is found, SAS stops reading and will execute the step it just read. When a step is executed, the processing activity is logged with the results of the processing. The SAS log is a collection of these messages as well as any errors that occurred. General results of processing: x Creation of a SAS data set x Production of SAS log messages x Creation of a report x Sorting and managing data.
9
Copyright The Art of Service Brisbane, Australia │ Email:
[email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
6.2
SAS Libraries
Libraries are collections of files. Implementation of libraries depends on the operating environment. A library can be a physical or logical collection. Summary of libraries in different operating environments: x Windows – stored in the same directory x UNIX – stored in the same directory x OpenVMS – stored in the same directory x CMS – same file type x z/OS – specially formatted host data Files can be stored temporarily or permanent. When a library name is ‘Work’ or unspecified, files are stored in a temporary SAS data library. Temporary files are deleted at the end of a session. Files are permanently stored in libraries that have a specified name other than ‘Work’. 6.2.1 File Referencing To reference, a two-level name is used. The format of a two-level name is: libref.filename. libref is ‘library reference’, the SAS data library containing the file. filename is the name of the specified file. If there is no libref identified or ‘work’, the file reference is a temporary file. All other files are permanent. Several libraries are defined by default: x ‘Sashelp’ is a permanent read-only library containing sample data and other files that control the behavior of the SAS x ‘Sasuser’ is a permanent library containing SAS files that Copyright The Art of Service Brisbane, Australia │ Email:
[email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
10
x
stores personal settings in the Profile catalog ‘Work’ is a temporary library for files.
Defining libraries is often the first step in setting up a SAS session. Assign a library name (libref): x Use the LIBNAME statement x Format: LIBNAME libref 'SAS-data-library' x libref is 1 to 8 characters long beginning with a letter or underscore and containing only letters, numbers, or underscores x SAS-data-library is the name of the library where files are stored x Multiple LIBNAME statements can be used to assign as many librefs as required. x LIBNAME statements are global and stay in effect until the libref is modified, canceled or the SAS session ends. The files in the library are inaccessible if the library is deleted or the session ends. The files still exist on the operating system. Comparison of physical names of SAS data within operating systems for the library “sample”: x Windows – c:\sample\data x UNIX - /users/month/sample/sasdata x OpenVMS – dua0:[month.fitness] x CMS – b x z\OS – month.sample.sasdata 6.2.2 Referencing Other Formats The LIBNAME statement can be used to reference files created by other software programs. To read or write to foreign files, an engine for the file type must be used. A SAS engine is a set of internal instructions used by SAS. Sometimes, the appropriate engine is brought up automatically; other times, the engine must be specified.
11
Copyright The Art of Service Brisbane, Australia │ Email:
[email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
To specify the engine to be used x Use the LIBNAME statement x Format: LIBNAME libref engine 'SAS-data-library' x libref is 1 to 8 characters long beginning with a letter or underscore and containing only letters, numbers, or underscores x engine is the name of the library engine used in the operating system x SAS-data-library is the name of the library where files are stored Interface library engines support read-only access to BMDP, ORISIS, and SPSS files. Physical filenames associated with a libref are actual filenames (a rule exception). x Example: Using libref = Differ and the SPSS engine for the c:\newdata.dat in the windows operating environment: libname differ spss 'c:\newdata.dat' Other types of data that can be read or written to are dependent on the operating environment and the SAS/ACCESS products that are licensed, including: x Relational Databases ORACLE SYBASE Infomix DB2 for z/OS (OS/390) DB2 for UNIX and PC Oracle Rdb ODBC CA-Openingres x
Non-relational Files ADABAS IMS/DL-I CA-IDMS SYSTEM 2000
Copyright The Art of Service Brisbane, Australia │ Email:
[email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
12
x
PC Files Excel (.xls) Lotus (.wkn) DBF DIF
6.2.3 View Library Contents The CONTENTS procedure creates output to: x describe the contents of a library x describe the descriptor information for a SAS data set. To create SAS output of library contents: x Use the PROC CONTENTS statement x Format: PROC CONTENTS DATA=libref._ALL_NODETAILS; RUN; x libref is the libref assigned to the SAS library x _ALL _ requests a listing of all files in the library x NODETAILS prevents detailed information about each file from being in the output. PROC DATASETS can be used to view the contents of a SAS library or SAS data set with a CONTENTS statement. The difference between the two procedures is the type of statement and the defaults for libref: x PROC DATASETS is an interactive procedure requiring a QUIT statement x The PROC DATASETS default is DATA=option x The PROC CONTENTS default is either Work or User. To create SAS output of library contents: x Use the PROC DATASETS statement with CONTENTS x Format: PROC DATASETS; CONTENTS DATA=libref._ALL_NODETAILS; QUIT; x CONTENTS describes the contents of SAS data sets
13
Copyright The Art of Service Brisbane, Australia │ Email:
[email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
x x x
specified and prints the directory of the library libref is the libref assigned to the SAS library _ALL _ requests a listing of all files in the library NODETAILS prevents detailed information about each file from being in the output.
When viewing descriptor information for a specific data set, the variables are listed alphabetically by default. Descriptor information can be viewed in logical position, or creation order, by using the VARNUM option: x PROC DATASETS; CONTENTS DATA=libref VARNUM; QUIT;
6.3
SAS Data Sets
SAS data sets are one type of SAS file. To process data, it must be in the form of a data set. Processing can consist of: x Accessing data x Analyzing, managing, and presenting data. Data sets must have a name. Valid data set names can: x Be 1 to 32 characters long x Begin with a letter, uppercase or lowercase, or an underscore x Consist of any combination of letters, numbers, and underscores. Data set has two parts: a descriptive portion and a data portion that the data set can locate.
Copyright The Art of Service Brisbane, Australia │ Email:
[email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
14
6.3.1 Descriptive Portion Contains information about the data set, including: x data set name x data type x creation date and time x number of observations x number of variables x number of indexes. Contains information about the variables in the data set, including: x name x type x length x format x informat x label. 6.3.2 Data Portion Collection of data values arranged in a table. Observations refer to the rows in the table. Each row is a collection of data values that relate to a single object. Variables refer to the columns in the table. Each column is a collection of values that describe a particular characteristic. Each variable in a data set has a series of attributes. Variables of the same name can aid in combining SAS data sets. To combine data sets, variables must be the same type. A variable name conforms to SAS naming conventions. Variable names follow the same rules as data set names: x Be 1 to 32 characters long x Begin with a letter, uppercase or lowercase, or an underscore x Consist of any combination of letters, numbers, and underscores.
15
Copyright The Art of Service Brisbane, Australia │ Email:
[email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Variable types are either character or numeric: x Character variables can contain any value x Numeric variables can contain only numeric values x Representation of missing values is determined by variable type: o A blank represents a missing value for character values o A period represents a missing value for numeric values. Variable length identifies the number of bytes used to store the variable. Length is dependent on type: x Character variables can be up to 32,767 bytes long x Numeric variables have a constant default length of 8 with an infinite number of digits possible x Numeric variables have a constant length because they are stored as floating-point numbers x A different length can be specified for numeric variables. Variable format affects the way data values are written. x Formats are provided by SAS software or created by a programmer x The appropriate format must be selected to write value in a given formatting. The maximum width of the written value usually required to be specified x Some formats require specification of the number of decimal places x A format can be permanently assigned to a variable in a SAS data set or temporarily assigned in a PROC step to control output appearance. When data values must be read into a SAS data set, how they are read must be specified. Reading specifications are controlled by informats. Variables can have labels: x Labels are descriptive text up to 256 characters long x By default, variables are identified by their names: for more information on the variables, labels are applied. Copyright The Art of Service Brisbane, Australia │ Email:
[email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
16
6.4
SAS System Outputs
Result formats for SAS formats can be specified: x Traditional SAS listing x HTML document x Both a listing and HTML document. The appearance of the output can be controlled, specifically (for SAS listings): x line size (the maximum width of the log and output) x page size (the number of lines per printed page) x page numbers displayed x data and time displayed. To modify system options, use an OPTIONS statement (a global statement). x OPTIONS: o options identifies one or more system options to be changed o The system options available are dependent on the operating system. Survey of common options: NUMBER | NONUMBER
Page numbers appear by default, so NONUMBER prevents the printing of page numbers.
DATE | NODATE
Date and time appear by default, so NODATE prevents the printing of date and time.
PAGENO=option
Specifies the beginning page number for report. The default is 'page 1'.
PAGESIZE=option
Specifies how many lines on each page of the output.
LINESIZE=option
Specifies the width of the print line. Observations that are unable to fit on a single line will continue on a different line.
TEARCUTOFF=option Fulfills Year 2000 Compliance.
17
Copyright The Art of Service Brisbane, Australia │ Email:
[email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
OBS=
Specifies the last observation to be processed. The default is MAX.
FIRSTOBS=
Specifies the first observation to be processed. The default is 1.
FIRSTOBS = and OBS=
Specifies a range of observation to be processed.
FORMCHAR= 'formatting-character'
Specifies the formatting characters for the output device.
FORMDLIM= 'delimiting-character'
Specifies the character used to delimit page breaks. Default is null.
LABEL | NOLABEL
Can replace variable names with descriptive labels. The default is LABEL.
SOURCE | NOSOURCE
Controls if SAS source statements are written to the log. Default is SOURCE.
FIRSTOBS= and OBS= can be used as data set options instead of system options. A data set option is set in parenthesis following the input data set name. x Format: options firstobs=8 obs=16; proc print data=newdata.set (firstobs=5 obs=22); run; To view the current setting of one or all SAS system options, use the OPTIONS procedure. x Format: PROC OPTIONS
; RUN;
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
18
7 Creating SAS Data Sets
7.1
Raw Data Files
A raw data file is an external text file. Raw data files are nonproprietary. Many software programs use raw data files. The external file contains data values which are organized in fields. Fields may be difficult to locate in raw data format. Fixed fields ensures the values for specific variables are located in the same position for all records. 7.1.1 Creating a Data Set with Fixed Fields 1. The SAS library used to store the data set must be referenced. 2. A DATA step program is written to read the raw data file and create a SAS data set x The following instructions must be provided to SAS by the DATA step in order to read the raw data file: o Name or location of the external file o Name for the new SAS data file o Reference for the external file o Description of the data values to be read x Additional SAS statements can be used to customize the data. 3. A PROC PRINT step can produce a list report to display the data values. Basic statements for construction: LIBNAME statement
Reference a SAS data library.
FILENAME statement
Reference an external file.
DATA statement
Name a SAS data set.
INFILE statement
Identify an external file.
INPUT statement
Describe data.
19
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
RUN statement
Execute DATA step.
PROC PRINT statement
List data.
RUN statement
Execute final program statement.
7.1.2 Creating References To reference a permanent SAS library, a LIBNAME statement is required: x Use the LIBNAME statement x Format: LIBNAME libref 'SAS-data-library' o libref is 1 to 8 characters long beginning with a letter or underscore and containing only letters, numbers, or underscores o SAS-data-library is the name of the library where files are stored. A LIBNAME statement is not required for all situations, such as temporary SAS data set or if the libref has been automatically assigned for the permanent library by SAS. The external file containing the raw data must be referenced using a FILENAME statement: x Use the FILENAME statement x Format: FILENAME fileref 'filename' o fileref is a name associated to an external file and must be 1 to 8 characters long beginning with a letter or underscore and containing only letters, numbers, or underscores o filename is the fully qualified name or location of the file. When a fileref is associated with an individual external file, the fileref is specified in subsequent SAS statements and commands. Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
20
When a fileref is associated with an aggregated storage location, the fileref is specified with the individual filename in parenthesis. When reading raw data, the INFILE statement identifies which file is storing the data. To identify the external file where data is stored: x Use the INFILE statement x Format: INFILE file-specification ; o file-specification takes the form of fileref from a previously defined file reference or 'filename' to point to the actual name and location of the file o options describes the characteristics of the input file and how the raw data should be read. LIBNAME and FILENAME are global statements which are in place until they are changed, canceled, or the SAS session ends.
7.2
The Data Step
7.2.1 Writing a Basic Step Program The start of a DATA step is the DATA statement. Within the DATA statement, the SAS data set to be created is named. To begin a DATA step and name a SAS data set: x Use the DATA statement x Format: DATA SAS-data-set-1 (...SAS-data-set-n>; o SAS-data-set-n is the name of the data set(s) to be created Columns are the common input style specifying actual locations for values. Columns should not be used in all situations, but only when data is: x consists of standard character and numeric values x in fixed fields.
21
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Standard numeric data values contain only: x numbers x decimal points x scientific notations (e-notation) x positive or negative numbers. Nonstandard numeric data values include: x contain special characters such as % or $ x date and time values x fractions x integer binaries x real binaries x hexadecimal numbers. Raw data organized in columns, or fixed format, allow a beginning and ending column for each file to be specified. Raw data that is not organized in columns is considered in free format. Free format cannot be read by column input. 7.2.2 Describing Data The INPUT statement is used to describe the fields of raw data. For each field of raw data to be read, the information required for the INPUT statement includes: x a valid SAS variable name x a variable type (character or numeric) x a range identifying starting and ending columns.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
22
To describe the field of raw data to be read and placed in the SAS data step: x Use the INPUT statement x Format: INPUT variable; <$> startcol-endcol... o variable is the SAS name that you assign to the field o The dollar sign indicates the variable type as character. To indicate numeric, no dollar sign should exist o startcol is the starting column for the variable o endcol is the ending column for the variable. Using column input, you can: x read any or all fields from the file x read the filed in any order x specify only starting columns when values occupy only one column. Variable names conform to SAS naming conventions: x Be 1 to 32 characters long x Begin with a letter, uppercase or lowercase, or an underscore x Consist of any combination of letters, numbers, and underscores. To verify data, use the OBS=option in the infile statement. Using OBS=n allows a portion of the data to be processed. The log will verify that the raw data file was read correctly, typically showing: x The number of records read x The number of observations and variables created. If invalid data occurs, a note in the log will occur which include the variable that erred, a column ruler and actual data line. To view the data processed, use a PROC PRINT step.
23
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
7.2.3 Creating and Modifying Variables To modify existing values or create new variables, an assignment statement can be used in any DATA step. To modify or create variables: x Format: variable=expression; o variable is the name of the new or existing variable o expression is any valid SAS expression. SAS expressions are used to: x transform variables x create new variables x conditionally process variables x calculate new values x assign new values. Expressions are instructions that consist of a sequence of operands and operators: x Operands are variable names or constants in character or numeric format x Operators are special-characters, grouping parenthesis, or functions. Arithmetic operators allow calculations to be performed. Arithmetic operators include: x negative prefix (-) x exponentiation (**) x multiplication (*) x division (/) x addition (+) x subtraction (-)
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
24
Arithmetic operators have priorities associated with them to assist in properly calculating an expression when more than one operator exists in the expression. To use priorities properly: x Higher priorities are performed before lower priorities x If consecutive operators have the same priority to be performed: o from right to left within priority I o from left to right within priorities II and III x The control of operations can be done with parentheses. Comparison operators express a condition. Comparison operators include: x equal to (= or eq) x not equal to (^= or ne) x greater than (> or gt) x less than (< or lt) x greater than or equal to (>= or ge) x less than or equal to (<= or le) x equal to one of a list (in) Logical operators create a link between sequences of expressions. Logical operators (boolean) include: x both and (AND or &) x either or (OR or |) x not (^) Data values can be assigned to variables using date constants. Constants are represented by 'ddmmmyy' or ddmmmyyyy' followed by a D x Format: 'ddmmmyyyy' D
25
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Some situations require processing to be done only when an observation meets a specific condition, which uses a subsetting IF statement to indicate the condition. To express a condition: x Use the IF statement x Format: IF expression; o expression is any valid SAS expression: o If the expression is true the DATA step continues o If the expression is false, no further statements are processed. 7.2.4 Using Instream Data Lines The INFILE statement identifies an external file to read. The DATALINES statement identifies data in the SAS program to be read, and is the last statement in the DATA step. To read instream data lines: x Use the DATALINES statement x Format: DATALINES; Only one DATALINES statement for each DATA step x x
CARDS is an alias for DATALINES and can be used as well If the data contains semicolons, use DATALINES4;;;;
7.2.5 Creating a Raw Data File To create a raw data file: x Use the SET statement x Format: DATA _null_; SET dataset; _null _ allows the DATA step to be used to without creating a SAS data set SET identifies the SAS data set to read Writing observations from the data set into the raw data can be done Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
26
using FILE and PUT statements. x The FILE statement specifies the output file. x The PUT statement how the lines should be written to the file. To specify an output file: x Use the FILE statement x Format: FILE file-specification; o file-specification takes the form of fileref from a previously defined file reference or 'filename' to point to the actual name and location of the file o options describes the characteristics of the input file and how the raw data should be read. To describe the lines written to the raw data: x Use the PUT statement x Format: PUT variable startcol-endcol...; o variable is the name of the variable being written o startcol identifies the line to begin writing the value o endcol identifies the line to end writing the value.
7.3
Processing DATA Steps
When a DATA step is submitted, the SAS processes the DATA step and creates a new SAS data set. The process consists of two phases: x The compilation phase scans each statement for errors. An error will prevent further processing. When complete with no errors, the descriptor portion of the data set is created x The execution phase, the DATA step reads and processes the input data. By default, the DATA step is executed once for each record in the input file.
27
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
7.3.1 Compiling the DATA Step To start the compilation phase, the input buffer is created: x The input buffer is an area of memory used to hold a record from the external file x It is only required when reading raw data x Refers to logical, not physical, memory. The creation of a program data vector begins after the input buffer is created. The program data vector is an area of memory where the SAS will build the data set one observation at a time: x Refers to logical, not physical, memory x Contains two variables automatically which are used for processing but not written to the data set x _N_ is the number of times the DATA step begins execution x _ERROR_ is the occurrence of an error caused by data. The default is 0, identifying no errors. One of more errors existing, the value is 1. Each statement is scanned for syntax errors, which include: x missing or misspelled words x invalid variable names x missing or invalid punctuation x invalid options. INPUT statements add a slot to the program data vector for each variable in the new data set. Variable attributes are identified when the variable is first encountered. At the end of the DATA step, the compilation phase is complete. The descriptor portion of the new data set is created, which includes: x the name of the data set x the number of observations and variables x the name and attributes of the variables.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
28
7.3.2 Executing the DATA Step The execution phase will create the data portion of the data set which contains the data values. The input raw data file is read from the program data vector. Observations are new data that are written out. One observation is created for each record being inputted. At the start of the execution phase, the file is checked for errors. x _N_ is set at value 1. The INFILE statement is read to identify the location of the raw data. Any INPUT statements are read from the input buffer: x an input pointer will track of the reading position x the input statement starts at column 1 by default. At the end of the DATA step: x Values in the program data vector are written as the first observation x The value of _N_ is incremented by 1 and a return to the top of the DATA step x Variable values in the program data vector are re-set to missing. The DATA step becomes a loop with each pass through called an iteration. The execution phase continues until the end-of-file marker in the raw data file is reached.
29
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
7.4
Debugging DATA Steps
7.4.1 Errors During Compilation During the compilation phase, many errors can be detected, including: x misspelled keywords and data set names x missing semicolons x unbalanced quotation marks x invalid options. The SAS will interpret some syntax errors. If the error is not interpreted: x prints the word ERROR in the log followed by the error message x compiles the step, but does not execute x prints a warning - NOTE: The SAS System stopped this step because of errors. 7.4.2 Errors During Execution Errors can occur during the Execution phase. How errors are dealt with is dependent on the type of error. Actions taken can include: x A note warning, or error message in the log x Values stored in the program data vector displayed in the log x Processing continues x Process stops. 7.4.3 Performing Testing DATA steps can be compiled and executed without creating a dataset, by using a NULL keyword. Used to detect common error messages and saves on development time.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
30
To create a NULL Data Set: x Use the DATA statement x Format: DATA _null_; All compilation and execution errors are written to the log, without creating a data set. After correcting errors, the NULL statement can be replaced with the appropriate data set name. The OBS= option in the INFILE statement can limit the number of observations that are read or created during execution. The PUT statement can be used to examine variable values. A personal message can be created in the log. IF-THEN/ELSE statements can be used to conditionally test for values. 7.4.4 Creating Message Strings Several forms of messages can be added to the log using the PUT statement. To specify a character string, enclose the string with quotations marks: x Format: PUT 'NOTE: Character Strings' ; To specify one or more data set variables for a single iteration of the DATA step, add the expression 'code type': x Format: PUT 'NOTE: Data Set Variables:' code type; o This expression will display only the value of the variable. x Format: PUT 'NOTE: Data Set Variables:' code= type=; o This expression will display the name and value of the variable.
31
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
To specify the automatic variables, _N_ and _ERROR_: x Format: PUT 'NOTE: Automatic variables:' code= _n_= _error_=; To specify all variables and automatic variables, use the _all_ expression: x Format: PUT 'NOTE: All variables:' _all_; Conditional processing can be used to flag errors or out of range data, using IF-THEN/ELSE statements: x Format: IF desired option THEN desired action ELSE PUT 'NOTE: Conditional processing';
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
32
7.5
SAS Data Sets
SAS Data Sets are created from either raw data sets or other SAS data sets. 7.5.1 Reading SAS Data Sets To create a data set, the libraries have to be referenced first: x The library where the data is stored x The library where the data will be stored Then DATA and SET statements are used to create a data set using values from another data set. To create a data set from values found in another data set: x Use the DATA statement and SET statement x Format: DATA SAS-data-set SET SAS-data-set RUN; o SAS-data-set in the DATA statement is the libref. filename of the SAS data set to be created o SAS-data-set in the SET statement is the libref. filename of the SAS data set to be read Sometimes, the data to be used needs some manipulation. Some common actions include: IF variable condition THEN action; IF variable limit;
Use a subset of data. Variable conditions can be <, >, <=, >=, Variable limits are typically equalities or inequalities.
DROP variable1, variable2...;
Drop any unwanted variables.
NewVariable = variable condition1;
Create or modify a variable.
LENGTH variable;
Specify a variable's length.
33
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
IF variable condition Conditional statements. THEN action; ELSE alternate action; LABEL variable;
Label a variable.
FORMAT variable;
Format a variable.
DROP= and KEEP+ statements are options that can be used whenever data sets are named, namely either the DATA or SET statements: x If a variable should be processed but not appear in the new data set, use the DROP= option in the DATA statement x If a variable should not be processed nor appear in the new data set, use the DROP= option in the SET statement. A BY statement can be used to group observations: x Datasets can be sorted by the value of one or more variables specified from a BY statement x Two temporary variables can be created: FIRST.variable and LAST.variable. The value for both is either 1 or 0 and they identify the first and last observation in each BY group. Multiple BY variables can be used to assign conditions to data sets. The result of multiple BY variables: x FIRST.variable for each variable is set to 1 for the first occurrence of a new value x A change in the value of a primary BY variable forces LAST.variable to be 1 for secondary BY variables. Observations in the input data set are read as they appear in the physical file, or sequentially. Sequential reading can be bypassed using a POINT= option.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
34
To read observations directly: x Use the POINT= statement x Format: POINT=variable; o variable specifies a temporary numeric variable containing the observation number of the desired observation and must be given a value before executing a SET statement. Reading observations continues until an end-of-file marker is reached: the use of a POINT= option will force the observation reading to never reach the end-of-file marker. The condition causes continuous looping; to resolve: x Use a STOP statement x Use programming logic that checks for an invalid value of the POINT= variable 7.5.2 Writing Observations To override the way the DATA step writes observations to output, an OUTPUT statement can be used. Observations are added to the data set only when the OUTPUT statement is executed. To control the observations written to a data set: x Use the OUTPUT statement x Format: DATA SAS-data-set; SET SAS-data-set; OUTPUT <SAS-data-set(s)>; STOP; o SAS-data-set(s) names the data set(s) where observations will be written o If a data set is not named, the OUTPUT statement impacts all data sets named by the DATA step.
35
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
In some cases, identifying the end of the data set is required to control the observations that are written, such as identifying totals. To identify the last observation in a DATA set: x Use the END= option x Format: END= variable o variable is the creation of a temporary variable containing an end-of-file marker. The variable is initialized at 0 and set to 1 when the SET statement reads the last observation of the data set. The variable specified in the END= option will not be added to the data set. END= and POINT= cannot be used in conjunction. 7.5.3 Running a Data Step Reading SAS data sets is similar to reading raw data, except that SAS retains the values of the variables from each observation. Includes a compilation and execution phase. Compilation steps: x Program data vector is created x Each statement is scanned for syntax errors x Any created variables are added to the program data vector x At the bottom of the DATA step, compilation ends and the descriptor portion of the new SAS data set is created. Execution steps: x DATA step executes for each observation in the input data set x The value of _N_ is 1 and the value of _ERROR_ is 0; all other variables are initialized to missing x The SET statement reads the first observation from the input data set and write the values to the program data vector x Any assignment statements are executed x At the end of the first iteration of the DATA step, the Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
36
x x x
7.6
values in the program data vector are written to the new data set The value of _N_ is set to N+1 The values of variables read from the SAS data set with the SET statement are retained and all other variables set to missing The process continues until all observations are read.
Combining SAS Data Sets
Methods of combining data sets: x One-to-one reading – will create observations that contain all variables from each contributing data set based on relative position in each data set x Concatenating – Appends observations from one data set to another x Interweaving – Intersperses observations from two or more data sets, based on one or more common variables x Match-merging – Matches observations from two or more data sets into a single observation in a new data set based on the values of a common variable. 7.6.1 One-to-One Reading To create a data set combining multiple data sets using one-to-one: x Use the DATA statement and multiple SET statements x Format: DATA output-SAS-data-set; SET SAS-data-set-1; SET SAS-data-set-n; RUN; o output-SAS-data-set names the data set to be created o SAS-data-set-n names the data sets to be read.
37
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
The criteria used by one-to-one readings to select data: x The new data set contains all variables from all input data sets x If data sets have variables of the same name, the values from the last data set overwrite the values read from previous data sets x The number of observations in the new data set will be the same as the number of observations in the smallest original data set x Observations are combined based on their relative position, that is, the first observations from each data set are combined, and so on. 7.6.2 Concatenating To create a data set combining multiple data sets using concatenation: x Use the DATA statement and the SET statement x Format: DATA output-SAS-data-set; SET SAS-data-set-1 SAS-data-set-n; RUN; o output-SAS-data-set names the data set to be created o SAS-data-set-n names the data sets to be read. The criteria used by Concatenation readings to select data: x The new data set contains all variables and observations from all input data sets x All observations are read from the data sets listed in the SET statement x Conditions to watch for: o Instances of a common variable must have the same type attribute, or processing is stopped and an error message issued o Common variables with different attributes other than type will retain the attribute from the first data set containing the variable, including length, label, format, and informat. Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
38
7.6.3 Interweaving To create a data set combining multiple data sets using interweaving: x Use the DATA statement and the SET statement x Format: DATA output-SAS-data-set; SET SAS-data-set-1 SAS-data-set-n; BY variable(s); RUN; o output-SAS-data-set names the data set to be created o SAS-data-set-n names the data sets to be read o variable(s) specifies one or more variables used to interweave. The criteria used by Interweaving readings to select data: x Observations in each BY group in each data set is read sequentially in the order of the data sets and BY variables are listed x The new data set contains all variables from all input data sets and contains the total number of observations from all input data sets x Observations are grouped by a common variable.
39
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
7.6.4 Simple Match-Merging To create a data set combining multiple data sets using matchmerging: x Use the DATA statement and the MERGE statement x Format: DATA output-SAS-data-set; MERGE SAS-data-set-1 SAS-data-set-n; BY variable(s); RUN; o output-SAS-data-set names the data set to be created o SAS-data-set-n names the data sets to be read o variable(s) specifies one or more variables used to merge o DESCENDING identifies if the input data set should be sorted from largest to smallest. The absence of DESCENDING has the input data set sorted in ascending order. The criteria used by Interweaving readings to select data: x The new data set contains all observations from all input data sets, though options can be in place that would reduce the number of observations found in the output x Each observation of each data set is checked sequentially for matching BY values and writes the combined observation to the new data set x If an input data set doesn't have any observations for a particular value for a same-name variable, missing values will be displayed in the output.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
40
7.7
Processing Match-Merge
7.7.1 Match-merge Compilation and Execution To compile data sets used for a merge-match: x Descriptor portions of all data sets listed in the MERGE statement are read x The remaining DATA step program is read for variables and options impacting the merge x A program data vector is created for the merged data set x A tracking pointer is assigned for each data set listed in the MERGE statement. After compilation, the observations are sequentially match-merged by moving the tracking pointers through each observation of each data set and checking for BY values that match: x If a match is found, the observation is written to the program data vector in the order the data sets appear in the MERGE statement. x The value of same-named variable retain the value of the last data set where the variable appears x If a match is not found, the order of the values is determined by SAS and the observation is written to the program data vector. By default, all observations are written to the program data vector, including observations with missing data and no matching BY values. Observations written to the PDV are written to the output data set. If an observation contains missing values for a variable, the observation in the output data set contains missing values. If a BY variable exists, observations that have missing values will appear at the top of the output data set. If an input data set doesn't have a matching BY value, observations in the output data set contains missing values for the variables.
41
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
7.7.2 Renaming Variables Same-named variables can appear in multiple data sets to be merged, which will overwrite each other based on the order the data set appears in the MERGE statement. Sometimes, the name of a variable in one or more data set needs to be renamed to prevent being overwritten. To rename variables: x Use the RENAME-data set option and the MERGE statement x Format: DATA output-SAS-data-set; MERGE SAS-data-set-1(RENAME=old-variablename=new-variable-name)) SAS-data-set-n; BY variable(s); RUN; o output-SAS-data-set names the data set to be created o SAS-data-set-n names the data sets to be read o variable(s) specifies one or more variables used to merge o RENAME=options follows the name of each data set that contains the variable(s) to be renamed o old-variable-name identifies the name of the variable to be renamed o new-variable-name identifies the new name for the variable. Variables can be renamed for data sets specified by the DATA and SET statements. 7.7.3 Excluding Unmatched Observations Match-merging combines all observations in all input data sets. Sometimes, the desired result displays only those observations that match for two or more specific input data sets. Exclusion of unmatched observations can be performed using the IN=data set option and a subsetting IF statement in the DATA step: Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
42
x x
The IN= data set option creates and names the variable that identifies if a data set contributed data to the current observation The subsetting IF statement checks the IN= values and writes to the merged data set only those observations where IN= is specified.
To exclude unmatched observations: x Use the IN= option and the MERGE statement x Format: DATA output-SAS-data-set; MERGE SAS-data-set-1(IN=variable1) SAS-data-set-n (IN=variablen) ; BY variable(s); IF variable1=1 and variablen=1 RUN; o output-SAS-data-set names the data set to be created o SAS-data-set-n names the data sets to be read o variable(s) specifies one or more variables used to merge o IN=option, in parentheses, follows the data set name o variablen names the variable to be created o new-variable-name identifies the new name for the variable o The value of the variable is 1 if the data set is contributing to the current observation and 0 if it is not o IF statement provides the condition that an observation is created only if both conditions are met. Variables can be dropped or kept by using the DROP= and KEEP=data set options.
43
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
8 Working with Data Raw data can be organized in several ways: x arranged in columns, or fixed fields x arranged without columns, or free format. Raw data can contain: x standard data without any special characters x nonstandard data with special characters. SAS has three input styles x column input x formatted input x list input.
8.1
Column Input
Column input allows fields to be read in any order. Character variable values that contain embedded blanks can be read. Missing data does not require a placeholder. A blank field will be read as missing and will not cause other fields to be misread. Fields or parts of a field can be re-read. Fields do not have to be separated by blanks or other delimiters. 8.1.1 Numeric Data When raw data is organized in fixed fields: x use column input to read standard data only x use formatted input to read both standard and nonstandard data.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
44
Standard numeric data values contain: x numbers x decimal points x numbers in scientific notation x positive and negative numbers. Nonstandard numeric data values contain: x special characters x data and time values x fractions x integer binary x real binary x hexadecimal forms. To read both standard and nonstandard data in fixed fields: x Use an INPUT statement x Format: INPUT <pointer-control> variable informat.; o pointer-control places the input pointer in the appropriate column o variable is the name of the variable to be created o informat is the special instruction specifying how SAS reads raw data. Two column pointer controls are utilized: x the @n moves the input pointer to a specific column number x the+n moves the input pointer forward to a column number relative to the current position x Format: INPUT @n variable informat.; x Format: INPUT +n variable informat.;
45
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
@n is an absolute pointer control. @ moves to column n, the first column to be read. When reading a record, the @n pointer can be moved forward and backward. +n moves the input pointer forward to a column number relative to the current position. The notation +(-n) will move the pointer control backwards. 8.1.2 Informats Informats are used to explain how the raw data is read. The $w. informat allows character data to be read. The dollar sign indicates character only data. The w represents the field width, or number of columns, of the data. The period ends the informat. The w.d format allows standard numeric data to be read. The w represents the field width, or number of columns, of the data. If a decimal point exists with the raw data, that acts as one decimal. The period acts as a delimiter. The optional d specifies the number of implied decimal places (not necessary if the value already has decimal places). The COMMAw.d will read nonstandard numeric data, removing any embedded: x blanks x commas x dashes x dollar signs x percent signs x right parentheses x left parentheses. 8.1.3 Record Formats How data is read by column input and formatted input processes can be affected by the record format of the external file. Two common record formats are fixed-length records and variable-length formats. Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
46
A fixed-length record format has a predetermined number of columns. The end-of-record marker exists after the last column. The typical fixed-length record will have a length of 80 columns. A variable-length record format has an end-of-record market after the last field of each record. Variable-length records containing fixed-field data might have values which are shorter than others or values that are missing. With varied data, the end of column can conflict with the end-ofrecord marker causing errors. A PAD option can be used to bypass common problems with reading fixed-field data in variable-length records. The PAD option will add blanks to each record to give all data lines the same length. The default value of the maximum record length is determined by the operating environment. The maximum record length can be changed using the LRECL=option in the INFILE statement.
8.2
Reading Free Format Data
In free format records, the fields are often separated by a delimiter, such as blanks or the pound sign. A List input can read standard and nonstandard data in a free-format record: x Use an INPUT statement x Format: INPUT variable <$>; o variable is the name of the variable to be read o $ identifies the variable as a character variable. By default, List input does not have specified column locations, so: x all fields must be separated by at least one delimiter x fields cannot be skipped or re-read x the order for reading fields is from left to right.
47
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
8.2.1 Processing List Input List input processing will scan for values instead of reading from specific columns. x SAS reads the first field until it encounters a delimiter x The delimiter indicates the end of the field, so the data value is assigned to the first variable in the program data vector x Then the scan continues until the next delimiter is encountered x The data value for the next variable is assigned in the program data vector x This continues until all the fields have been read and values are assigned to variables. The data set can be displayed using the PRINT procedure. Blanks as delimiters are easy to read. Other delimiters need to be explicitly identified. To specify a delimiter: x Use an INFILE statement and DLM-option x Format: INFILE file-specification DLM=delimiter(s); o file-specification takes the form of fileref from a previously defined file reference or 'filename' to point to the actual name and location of the file o delimiter(s) specifies a delimiter for list input. Delimiters can be specified in two ways: x A 'list of delimiting characters' will specify one or more characters (must be enclosed in quotation marks) x character-variable specifies a character variable to be a delimiter. A delimiter should be the same as a character within the field values.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
48
When the variable values are sequential and separated by a delimiter, variables can be expressed and specified in a range. By default several limitations exist on the type of data that can be read using list input: x Character values that are longer than eight characters will be truncated x Data must be in standard numeric or character format x Character values cannot contain embedded delimiters x Missing numeric and character values must be represented by a period or some other character. 8.2.2 Reading Missing Values Missing values can exist anywhere in the data set. If missing values occur at the end of the record, the MISSOVER option in the INFILE statement can be used to read them. The MISSOVER option will prevent the SAS from going to another record if values cannot be found for every specified variable in the current line. MISSOVER only works with missing values at the end of a record. To begin to read missing values in the beginning or middle of the record, the DSD option in the INFILE statement can be used. DSD changes how delimiters are treated when using a list input: x by setting the default delimiter to a comma x treating two consecutive delimiters as a missing value x removing quotation marks from values. If multiple delimiters are used in the original file, they can be identified by using the DLM=option. Modifying List inputs allows it to be more versatile. Two modifiers can be used: x the ampersand is used to read character values that contain embedded blanks x the colon is used to read nonstandard data values and character values longer than eight characters.
49
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
8.2.3 Managing Length The default length of character values is 8. Variables that are longer than 8 are truncated when written to the program data vector. Using a LENGTH statement before the INPUT statement will define the length and type of the variable.
8.2.4 Creating Free-Format Data To create free-format data: x Use a PUT statement x Format: PUT variable <: format>; o variable is the name of the variable to be written o a colon precedes a format o format specifies a format to use when writing variables. Using a DLM=option in the FILE statement can be used to be create a character-delimited raw data file. If special characters are required in the data format, use the DSD=option in the FILE statement, including quoted strings. To create a simple raw data file: x Use a PROC EXPORT statement x Format: PROC EXPORT DATA=SAS-data-set; OUTFILE=filename ; RUN; o SAS-data-set names the input SAS data set o filename identifies the complete path and file name of the output o delimiter identifies the delimiter used to separate columns in the output file.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
50
8.3
Reading Data and Time Values
SAS stores a date value as the number of days from January 1, 1960 to a given date. SAS stores a time value as the number of seconds since midnight. The SAS datetime value is the number of seconds between midnight on January 1, 1960 and a given data and time. Data and Time informats are used to read data and time expressions and convert them to SAS data and time values. Like other informats, they consist of: x a name x a field width x a period delimiter. When specifying an informat: x Use an INPUT statement x Format: INPUT <pointer-control> variable informat.; o pointer-control places the input pointer in the appropriate column o variable is the name of the variable to be created o informat is the special instruction specifying how SAS reads raw data. Some common data informats: x MMDDYYw. - 11/15/09 x Dates. - 15Nov09 x MMDDYYw. - 11-15-09 x YYMMDDw. - 09/11/15 MM will always represent the month and be an integer between 01 and 12. DD will always represent the day and be an integer between 01 and 31. YY is the year, which can be represented YYYY. w is the field width.
51
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Date fields can be separated by blanks, hyphens, and slashes. The field width should be large enough to include all delimiters. x x x x
Time informat is expressed as hh:mm:ss:ss. The hh represents hours and be an integer between 00 and 23. The mm represents minutes and be an integer between 00 and 59. The ss:ss represents seconds and hundredths of seconds and is optional.
The WEEKDATEw. format will display the day of the week, monthname, day and year. The WORDDATEw. format will write the data in the form monthname, day, and year.
9 Working with Observations In the simplest form, the information for a single observation will be found in a single record. In many cases, the information for a single observation can be found in multiple records. To access this information: x write multiple INPUT statement to read each record x write one INPUT statement contain a line pointer control to specify the record(s) being read.
9.1
Creating One Observation From Multiple Records
9.1.1 Line Pointer Controls When SAS reads raw data, its position is tracked using an input pointer. Column pointer controls are used to determine the column position of the input pointer. The input pointer can also be positioned on a specific record using a line pointer control.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
52
Two types of line pointer controls exist: x the forward slash specifies the line location relative to the current one x the #n specifies an absolute number of the line for moving the pointer. 9.1.2 Reading Multiple Records To read multiple records sequentially: x use the forward slash line pointer x the / line pointer control will only move the input pointer forward x the / line pointer control is specified after instructions for reading values in the current record. When a DATA step executes, the values of the first record are read and the input pointer is moved by the line pointer control to the second record. The second record is read and the second line pointer control moves the input pointer to the third record. This continues until the INPUT statement is complete. The values read are written to the data set as an observation. Control returns to the top of the DATA steps and the variable values are reinitialized to missing. The process continues starting where the last iteration left off and continues until all records have been read. The number of records for each observation must be the same. To read records non-sequentially: x The #n line pointer control specifies the absolute number of the line where the input pointer will be moved x This absolute condition allows records to be read in any order x The #n line pointer control must be specified before any instructions for reading values in a record. A SAS program can use the forward slash line pointer control and #n line pointer control together to read records sequentially and nonsequentially.
53
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
9.2
Creating Multiple Observations From Single Records
In some cases, the information for multiple observations can be found in a single record. To manage the control for reading a record, a line-hold specifier is required. SAS provides two line-hold specifiers: x a trailing at sign (@) will hold the input record for the execution of the next INPUT statement x a double trailing at sign (@@) holds the input record for the execution of the next INPUT statement, even across iterations of the DATA step. Normally, each time a DATA step executes, the INPUT statement reads a new record. When @@ is used: x to read multiple SAS observations from single data line x the data line is held in the input buffer across multiple executions of the DATA step x the @ pointer control, column input, nor the MISSOVER option should be used. A record being held by @@ will not be released until: x the input pointer moves past the end of the record x an INPUT statement that has no line-hold specifier executes. With a single @, reading records: x will allow the next INPUT statement to read from the same record x release the current record when a subsequent INPUT statement executes without a line-hold specifier.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
54
10 Creating Reports 10.1 Basic Reports The PROC PRINT statement can be used to create a printout, or report, of the information in a data set. The report can be enhanced with additional statements and options. To print the contents of a SAS data set: x Use the PROC PRINT statement x Format: LIBNAME libref 'SAS-data-library' PROC PRINT data=SAS-data-set; RUN; o libref is 1 to 8 characters long beginning with a letter or underscore and containing only letters, numbers, or underscores o SAS-data-library is the name of the library where files are stored o SAS-data-set is the name of the SAS data set to be printed. Default layout for a report includes: x all observations and variables are printed x observations numbers are listed in the far left column x variables appear in the order they occur in the data set. Statements and options for changing report characteristics SUM variable(s)
Produce column totals for numeric values
VAR variable(s);
Select the variables to include in the report and the order they appear
PROC PRINT
Observation numbers are not displayed
data=libref NOOBS ID variable(s);
55
Identifies one or more variables that will replace the
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
OBS column. If a variable is listed in an ID statement and a VAR statement, the output will have two columns for the same variable WHERE whereexpression
Identifies the conditions for selecting observations in the report. The where-expression can be any SAS expression
LABEL variable=newvariable-name
Change the label name of a variable
10.1.1 WHERE Expressions Any variable found in the SAS data set can be specified in the WHERE statement, not just variables specified in the VAR statement. Either character or numeric variables can be included in an WHERE statement. Conditions based on the value of a character variable must: x be enclosed in quotation marks x must be written exactly as it appears in the data set. The operators, CONTAINS and ?, can be used with the WHERE statement to select observations with a specific substring. Multiple conditions can be used to select observations using logical operators. When testing for multiple values of the same variable: x The variable name is specified in each expression WHERE variable=value1 OR variable=value2; x The use of the IN operator WHERE variable IN (value1, value2); x Control the evaluation of compound expression, use parentheses.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
56
10.1.2 Sorting Data Default reports list observation in the order they appear in the data set. To sort the report based on the values of a variable, use the PROC SORT statement. The SORT procedure will: x rearrange observations in the SAS data set x create a new SAS data set containing the rearranged observations x replace the original SAS data set x sort on multiple variables x sort in ascending or descending order x treats missing values as the smallest value x not generate printed output. To perform a basic sort of variables: x Use the PROC SORT statement x Format: PROC SORT DATA=SAS-data-set ; BY BY-variable(s); RUN; o DATA= specifies the data set to be read o OUT= specifies the output data set that contains the data in sorted order o BY-variable(s) specifies one or more variables used to sort the data o DESCENDING identifies if the input data set should be sorted from largest to smallest. The absence of DESCENDING has the input data set sorted in ascending order. Without the OUT= option, the data set specified is permanently sorted.
57
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
10.1.3 Creating Subtotals Creating column totals for numeric variables is done using SUM statement. To subtotal numeric variables, use a BY statement in conjunction with the SUM statement. To create a total and subtotals for numeric variables: x Use the SUM statement and the BY statement x Format: SUM variable(s); BY < > ; o variable(s) specifies one or more variables, separated by blanks. o BY-variable(s) specifies variables to be used to form BY groups. More than one variable should be separated by blanks o DESCENDING identifies if the input data set should be sorted from largest to smallest. The absence of DESCENDING has the input data set sorted in ascending order o NOTSORTED specifies that the observations are not necessarily sorted in any order. Variables listed in the SUM statement are not required to be specified in the VAR statement. 10.1.4 Creating a Customized Layout Using the ID statement in conjunction with the ID and SUM statements will show the BY variable heading only once: If the variable specified by the IN statement is the same as the BY statement, then: x The OBS column is suppressed x The ID/BY variable is printed in the far left column x Each ID/BY value is printed at the start of each BY group and on the same line as the group's subtotal. Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
58
Each BY group can be printed on a separate page using the PAGEBY statement: x Format: PAGEBY BY-variable; To double space the report, use the DOUBLE option in the PROC PRINT statement. 10.1.5 Titles and Footnotes Reports can be created so they are easy to interpret, including: x adding titles and footnotes x replacing variable names with descriptive labels x formatting variable values. Up to 10 titles with procedure output can be specified using the TITLE statement before the PROC step. Up to 10 footnotes can be specified using the FOOTNOTE statement before the PROC step. x Format: TITLE 'text'; FOOTNOTE 'text'; PROC PRINT SAS-data-set; o n is a number from 1 to 10 which specifies the title or footnote o 'text' is the title or footnote to be displayed. x x x x
The maximum length is dependent on the operating environment and the value of the LINESIZE= option TITLE is like TITLE1; FOOTNOTE is like FOONOTE1 The default tile is The SAS System There is no default footnote.
Blank lines can be created in the title or footnote by skipping title expressions; such as title1, title3.
59
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
When multiple titles occur: x In HTML output, the lines appear consecutively without extra spacing x In SAS listing output, lines are centered by default. When multiple footnotes occur: x The FOOTNOTE statement with the largest number appear on the bottom line x In HTML output, the lines appear consecutively without extra spacing x In SAS listing output, lines are centered by default. TITLE and FOOTNOTE statements are global statements and are in place until they are modified, canceled, or the SAS session ends. When redefining a title or footnote, all higher-numbered titles or footnotes are canceled. A null TITLE or FOOTNOTE statement has no number or text. 10.1.6 Assigning Descriptive Labels Columns can be labeled with descriptive text. To label columns, use: x the LABEL statement to assign a descriptive label to a variable x the LABEL option in the PROC PRINT statement to specify the labels to be displayed x
Format: LABEL variable1='label1'; variable2='label2'; ... ;
Variables can be assigned separate LABEL statements or be grouped with several variables assigned a single LABEL statement.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
60
10.1.7 Assigning Formats To associate formats with variables, use the FORMAT statement. Formats only affect the output of the data values, not the actual data values stored. To format a variable: x Format: FORMAT variable(s) format-name; o variable(s) is a name of one or more variables whose values are to be written according to a particular pattern o format-name specifies a SAS or user-defined format for the values. Variables can be assigned separate FORMAT statements or be grouped with several variables assigned a single FORMAT statement. Common SAS formats: w.
Rounded to the nearest integer in w spaces.
w.d
Rounded to d decimal places in w spaces.
$w.
Character values in w spaces.
COMMAw.d
Contains commas and decimal places.
DOLLARw.d
Contains dollar signs, commas and decimal spaces.
DATEw.
Date values in the form 12/12/10 (DATE7.) or 12DEC2010 (DATE9.)
MMDDYYw.
Date values in the form 12DEC10 (DATE8.) or 12/12/2010 (DATE10.)
All SAS formats specify a total field width (w). The field width includes the numeric digits and commas. Stored Value
4060
Desired Format
COMMAw.d
61
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Displayed Value
4,060
Positions Displayed
5
FORMAT Statement
format variable comma5.0;
The number of decimal places used for a numeric decimal can be specified (D). Stored Value
4060
Desired Format
COMMAw.d
Displayed Value
4,060.00
Positions Displayed
8
FORMAT Statement
format variable comma8.2;
Currency is displayed with appropriate dollar signs, commas, and decimal spaces. Stored Value
4060
Desired Format
DOLLARw.d
Displayed Value
$4,060.00
Positions Displayed
9
FORMAT Statement
format variable dollar9.2;
To temporarily assign a label or format to the data output, use the LABEL or FORMAT statements within the PROC PRINT step. To permanently assign a label or format to the data , use the LABEL or FORMAT statements within the DATA step.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
62
10.2 Advanced Reporting To create advanced forms of reports, use the PROC REPORT statement. Types of customization allow: x Request separate subtotals and grand totals x Calculate columns x Create and store report definitions. The PROC REPORT statement can be used: x In a windowing mode with prompts to guide reporting building x In a windowing mode without the prompts x In a nonwindowing mode. 10.2.1 Default List Reports To create a basic list report: x Use the PROC REPORT statement x Format: PROC PRINT ; RUN; o SAS-data-set is the name of the SAS data set to be used for the report o options provides the report in a window or nonwindowing environment. The COLUMN statement identifies the variables displayed in the report and their order of appearance. x Format: COLUMN variable(s); The WHERE statement identifies the observations, or rows, to be selected based on a condition.
63
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
10.2.2 Defining Variables Reports can be enhanced by: x defining how variables can be used in the report x assigning formats to variables x specifying column headings and widths x justifying the variable values x justifying column headings x changing the order of rows. The DEFINE statement is used to describe the use and display of variables. To define variables: x Use the DEFINE statement x Format: DEFINE variable/ <usage> <justification> <”column heading”>; o variable is the name of the variable to be defined o usage specifies how a variable is to be used. Valid options are ACROSS, ANALYSIS, COMPUTED, DISPLAY, GROUP, and ORDER o attribute(s) specifies the attributes of the variables, including FORMAT=, WIDTH=, and SPACING= o option(s) specifies formatting options, including DESCENDING, NOPRINT, NOZERO, and PAGE o justification specifies the justification of the column (CENTER, LEFT, or RIGHT) o “column heading” specifies the label for the column. Format controls the appearance of data in the report. Any SAS or user-defined format can be used with FORMAT=.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
64
By default, the column width is the length of the variable for character variables or 9 for numeric variables. x The WIDTH=attribute can be used to specify the width for the columns in the report x The value of WIDTH can be 1 to the value of the LINESIZE= system option x WIDTH= attribute has no effect on HTML output. Spacing of columns can allow reports to be easier to read. Defining spacing allows the number of blank characters between the selected column and the column immediately to the left to be specified. x The default spacing is 2 x The spacing can be redefined using SPACING= attribute x The SPACING= attribute has no effect on HTML output. Column headings can be changed by enclosing the heading text in quotation marks. How words break in column headings can be controlled using a split character. To use a split character: x use the default slash (/) x use the SPLIT= option and define a split character. By default, PROC REPORT will left-justify character variables and right-justify numeric variables. To define a different justification, specify the desired option (right, left, center) in the DEFINE statement. The option will justify the column headings and the values within the column width. The appearance of the headings found in the list report can be changed using two options: x HEADLINE underlines all column headings and the spaces between them x HEADSKIP creates a blank line beneath all column headings or after the underline if the HEADLINE option is used. These options have no effect on HTML output.
65
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
10.2.3 Variable Usage How variables are used will change the layout of the report. Each variable can be used: x DISPLAY – default for character variables x ORDER x GROUP x ACROSS x ANALYSIS – default for numeric variables x COMPUTED. Display variables don't affect the order of rows in the report. If one or more display variables are in the report, a detail row for each observation is read from the data set. A detail contains a value for each display variable. Order variables determine the order of rows in the report. The detail rows are ordered according to their formatted values. By default, the order is ascending: to change this, use the DESCENDING option. Summary reports are possible using group variables. A group variable will group the detail rows in a report according to their formatting value. The PROC REPORT will consolidate the relevant observations into one row. To create a summary report, all variables in the report have to be defined as group, analysis, across, or computed. If a variable in the report is not defined or defined as display or group, a group variable is displayed as an order variable. The default statistic for analysis variables is SUM. To associate a statistic with an analysis variable, the statistic should be specified as an attribute in the DEFINE statement.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
66
Possible statistical attributes: CSS
Corrected sum of squares
CV
Coefficient of variation
MAX
Maximum value
MEAN
Average
MIN
Minimum value
N
Number of observations with nonmissing values
NMISS
Number of observations with missing values
PCTN
Percentage of a cell or row frequency to a total frequency
PCTSUM
Percentage of a cell or row sum to a total sum
PRT
Probability of a greater absolution value of student's t
RANGE
Range
STD
Standard deviation
STDERR
Standard error of the mean
SUM
Sum
SUMWGT Sum of the Weight variable values T
Student's t for testing the hypothesis that the population mean is 0
USS
Uncorrected sum of squares
VAR
Variance
Across variables are functionally similar to group variables. Groups are created horizontally, not vertically, with across variables. Analysis variables will have all values for a variable summed when using across variables. Computed variables can be associated to character or numeric variables. Computed variables are not in the input data set, and they are not added to the input data set by the PROC REPORT. The use of a computed variable cannot be changed.
67
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Computed variables can be added in a nonwindowing environment: x Include the computed variable in the COLUMN statement x Define the usage as COMPUTED x Compute the value of the variable in a compute block associated with the variable.
10.3 Descriptive Statistics Descriptive statistics are used to summarize large amounts of data. The type of descriptive statistics required and the SAS procedure used is dependent on what needs to be summarized. Continuous data values or discrete data values can be summarized. 10.3.1 Computing Numeric Variables The MEANS procedure is used to provide mean, minimum, and maximum and other data summarization to control the output of the report. To create a MEANS procedure: x Use PROC MEANS statement x Format: PROC MEANS <statistic-keyword(s)> ; RUN; o SAS-data-set is the name of the SAS data set used o statistic-keyword(s) identifies the statistics to compute o options will control the content, analysis, and appearance of the output.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
68
In its simplest form, PROC MEANS will print for every numeric variable in a data set: x the number of nonmissing values (n-count) x the mean x the standard deviation x the minimum value x the maximum value. To limit the output, the use of statistic keywords will focus the output. DESCRIPTIVE STATISTICS CLM
Two-sided confidence limit for the mean
CSS
Corrected sum of squares
CV
Coefficient of variation
KURTOSIS Kurtosis / KURT LCLM
One-sided confidence limit below the mean
MAX
Maximum value
MEAN
Average
MIN
Minimum value
N
Number of observations with nonmissing values
NMISS
Number of observations with missing values
RANGE
Range
SKEWNES Skewness S /SKEW STDDEV /STD
Standard deviation
STDERR / STDMEAN
Standard error of the mean
SUM
Sum
SUMWGT
Sum of the Weight variable values
UCLM
One-sided confidence limit above the mean
USS
Uncorrected sum of squares
VAR
Variance
69
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
QUANTILE STATISTICS th
MEDIAM / P50
Median or 50 percentile
P1
1 percentile
P5
5 percentile
P10
10 percentile
Q1 / P25
Lower quartile or 25 percentile
Q3 / P75
Upper quartile or 75 percentile
P90
90 percentile
P95
95 percentile
P99
99 percentile
QRANGE
Difference between upper and lower quartiles: Q3-Q1
st
th
st
th th
st st st
HYPOTHESIS TESTING PROBT
Probability of a greater absolute value for the t value
T
Student's t for testing the hypothesis that the population mean is 0
10.3.2 Formatting Statistics By default, PROC MEANS uses the best format to display values which can lead to unnecessary decimal places. MAXDEC= can limit the number of decimal places displayed. By default, statistics are generated for every numeric variable in a data set. To focus on specific variables, a VAR statement can be used. By default, the statistics will be applied to all observations in a data set. To apply statistics to groups of observations, add a CLASS statement. The values of CLASS variables are used only to categorize data, therefore, statistics are not generated for CLASS variables. They can be either character or numeric, but must have a limited number of Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
70
discrete values to be grouped meaningfully. The order of the variables in the CLASS statement determines their order in the output. BY statements can specify variables for categorizing like CLASS statements. Differences include: x BY processing requires that data is already sorted or indexed. The SORT procedure must be used before using PROC MEANS x The layout of group results are different: A CLASS statement will create a single table, while a BY statement will create multiple tables. 10.3.3 Creating Summarized Data Sets A SAS data set can be created that contains only a summarized variable. To create summarized DATA set using PROC MEANS: x Use PROC MEANS statement and OUTPUT statement x Format: PROC MEANS <statistic-keyword(s)> ; OUTPUT OUT=SAS-data-set-out <statistickeyword=variable-name(s)>; RUN; o SAS-data-set is the name of the SAS data set to be used o statistic-keyword(s) identifies the statistics to compute o options will control the content, analysis, and appearance of the output o SAS-data-set-out is the name of the output SAS data set o statistic-keyword identifies the summary statistic to be written out o variable-name(s) identifies the names of the variables created to contain the values of the summary variables.
71
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
When the statistic-keyword= option is not used, the summary statistics produced will be N, MEAN, STD, MIN, and MAX for all numeric variables listed in the VAR statement or all numeric variables is no VAR statement exists. The keyword for the statistics must be specified and then all the variables to be computed. If a VAR statement exists, the variables must be listed in the same order as the variables in the VAR statement. Summarized output data set can be created using the SUMMARY procedure. The PROC SUMMARY uses the same code to produce output data as PROC MEANS. The difference between PROC MEANS and PROC SUMMARY is the use of defaults: PROC MEANS produces a report by default, PROC SUMMARY must include a PRINT option.
10.4 Creating Frequency Tables To describe the data by reporting the distribution of variable values, the FREQ procedure should be used. The FREQ procedure is a descriptive procedure and a statistical procedures. One-way and nway, or crosstabulation, frequency tables can be created. The FREQ procedures can include several statements and options for controlling output. To create a basic frequency report: x Use PROC FREQ statement x Format: PROC FREQ ; RUN; o SAS-data-set is the name of the SAS data set to be used.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
72
By default, a one-way table is created with the frequency, percent, cumulative frequency, and cumulative percent of every value of all variables in a data set. x Frequency is the number of observations with the desired value x Percent is the frequency of the value divided by the total number of observations x Cumulative frequency is the sum of the frequency counts of the value and all other values above it in the table x Cumulative percent is the cumulative frequency of the value divided by the total number of observations. By default, the FREQ procedure will create frequency tables for every variable in the data set. Frequency distributions work best with variables that can be categorized and whose values are best summarized by counts. To identify the variables to be processed by the FREQ procedure: x Use TABLES statement x Format: TABLES variable(s); 10.4.1 Creating N-Way Tables Crosstabulation is used when two distinct, but related, variables are compared. The most basic form of crosstabulation is a two-way table. To create an n-way table: x Use PROC FREQ statement and TABLES statement x Format: PROC FREQ ; TABLES variable-1* variable-2 <*.. variable-n>; RUN; o SAS-data-set is the name of the SAS data set to be used o variable-1 identifies table rows o variable-2 identifies table columns o variable-n identifies a multi-way table.
73
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
When specifying crosstabulation, PROC FREQ produces tables with cells that contain: x cell frequency x cell percentage of total frequency x cell percentage of row frequency x cell percentage of column frequency. When creating n-way tables, a series of two-way tables are produced with a table for each level of the variables. 10.4.2 Table Layout To read n-way tables, the order of variables is an important consideration. The last two variables of the TABLES statement will become the two-way rows and columns. A table will be created for each value of the of the variables preceding the last two variables. A CROSSLIST option will display crosstabulation tables in ODS column format. CROSSLIST creates a table with a customizable table definition. A table definition is customized using the TEMPLATE procedure. x Format: TABLES variable-1* variable-2 <*.. variable-n> /CROSSLIST; A LIST option will allow bulky, complex crosstabulations to be read as a continuous list - great for n-way tables that have three or more variables specified. It eliminates row and column frequencies and percents x Format: TABLES variable-1* variable-2 <*.. variable-n> /LIST; The output of the FREQ procedures can be limited to a few specific statistics: x NOFREQ suppresses cell frequencies x NOPERCENT suppresses cell percentages x NOROW suppresses row percentages x NOCOL suppresses column percentages. Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
74
10.5 Producing HTML Output SAS uses Output Delivery System (ODS) statements to generate HTML output. With ODS, HTML output in any operating environment can be created, customized, and managed through programming statements. 10.5.1 Output Delivery System Output can be created in a variety of formats with ODS: x HTML output x Output data set of procedures results x Traditional SAS listing output. When ODS statements are submitted and the output is created by the SAS program: x ODS creates the output in the form of output objects; each output object containing: o Data component - the results of a procedure or DATA step o Table definition – information about how the results are rendered. x The output object is sent to specified ODS destination(s) and creates the formatted output controlled by the destination. ODS destinations supported include: x HTML – output formatted in HTML x Listing – output formatted like traditional SAS procedure output x Markup Language Family – output formatted in markup languages, such as XML x ODS Document – hierarchy of output objects x Output – SAS data sets x Printer Family – output formatted in PS, PDF, or PCL files x RTF – output formatted in RTF format using Microsoft Word.
75
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
For each type of formatted output created, an ODS statement is required to open the destination and another ODS statement to close the destination. The default destination is Listing. To open and close ODS destinations: x Use ODS statement x Format: ODS open-destination; ODS close-destination CLOSE ; o open-destination is the keyword and any required options for the output type to be created o close-destination is the keyword for the type of output. Since the default destination is Listing, it is considered always open. Best practices would close the listing destination at the beginning of the program. If multiple ODS destinations are open concurrently, they can be closed at the same time using the keyword _ALL_. 10.5.2 Creating Simple HTML To create HTML output: x Use ODS HTML statement x Format: ODS HTML BODY=file-specification; ODS HTML CLOSE ; o file-specification identifies the file containing the HTML output. The specification can be: an HTML filename a fileref a SAS catalog entry. ODS HTML statements can be used to direct the output from multiple procedures to the same HTML file.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
76
To create HTML output with a table of contents: x Use ODS HTML statement x Format: ODS HTML o BODY=body-file-specification; o CONTENTS=contents-file-specification; o FRAME=frame-file-specification; ODS HTML CLOSE ; body-file-specification is the name of the HTML file containing the procedure output contents-file-specification is the name of the HTML file containing a table of contents with links to the procedure output frame-file-specification is the name of the HTML file connecting the table of contents with the body file x x x
A CONTENTS= option is required if FRAME= is used A table of contents will contain a numbered heading for each procedure that creates output Links are generated between files by using HTML filenames specified in the ODS HTML statement.
The URL=suboption in the BODY or CONTENTS file specification will allow an URL to be used for all links that is created to the file. Either relative URLs or absolute URLs can be specified. PATH=option will allow the specification of a location where HTML output can be store and assists in streamlining the ODS HTML statement. STYLE=option can be used to change the appearance of the HTML output, and valid SAS or user-defined style definition can be used.
77
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
11 Advanced Work with Variables 11.1 User-Defined Formatting Raw data or data found in a basic SAS data set may not be easily read in its basic form requiring additional formatting to clarify the information. A common occurrence is the existence of coded information: a set of numerical expressions used to identify meaningful information. PROC FORMAT can be used to replace the code with the information it represents. To define a specific format: x Use PROC FORMAT statement x Format: PROC FORMAT ; options include: LIBRARY=libref identifies the SAS data library which contains the catalog storing user-defined formats FMTLIB will print the contents of a format catalog All user-defined formats are stored in a format catalog. If one doesn't exist, SAS will automatically create one named Work.Formats. Work.Formats is a temporary library that exists until the end of the SAS session. A permanent format catalog named Formats can be used when specified in the LIBRARY=option of the PROC FORMAT statement. To use the Formats library, a LIBNAME statement is required to associate the libref to the permanent library. When a permanent format is associated with a variable, the libref is used to reference the location of the format catalog.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
78
11.1.1 Creating User-Defined Formats To define format for displaying one or more values: x Use VALUE statement x Format: VALUE format-name range1='label1' range2=label2' ...; o format-name is the name of the format that is being created o range specifies one or more variable values and a character string or existing format o label is a text string enclosed in quotation marks. A VALUE format name must: x begin with a dollar sign if the format is applied to character data x not be longer than eight characters x not be the name of an existing SAS formatting x not end in a number x not end in a period when in the VALUE statement. A VALUE range will identify: x a single value x a range of numeric values x a range of character values enclosed in quotation marks x a list of unique values separated by commas. Character values must be enclosed in quotation marks and match the variable's values. The format's name must start with a dollar sign. Numeric values do not need to be enclosed in quotation marks. The format's name does not require a dollar sign.
79
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Non-inclusive ranges can be specified: x using the “less than” symbol to avoid overlapping, as in a
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
80
11.1.3 Displaying User-Defined Formats Top display a list of all the formats in the format catalog with the descriptions of their values, the keyword FNTLIB can be added to the PROC FORMAT statement. The name, range, and label for each format is provided, as well as: x the length of the longest value x number of values defined by this format x version of SAS the format is compatible with x the data and time the format was created. 11.1.4 Creating Variables Anew variable can be created to accumulate the values of multiple variables. To create an accumulator variable: x Use SUM statement x Format: variable+expression; o variable is the name of the accumulator variable o expression is any valid SAS expression. An accumulator variable must be numeric. The variable is automatically set to 0 before reading the first observation. The sum statement will add the results of the expression on the right side of the plus sign to the numeric variable on the left. The numeric variable will retain the new value in the program data vector until the next observation is read. By default, an accumulator variable is initialized to 0. To initialize to a different number, use the RETAIN statement. The RETAIN statement will: x assign an initial value to a retained variable x prevent variables from being initialized with each execution of the DATA step
81
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
To initialize a variable to a value other than 0: x Use RETAIN statement x Format: RETAIN variable initial-value; o variable is the name of the variable whose value will be retained o initial-value is the initial numeric or character value for the variable. Restrictions to the RETAIN statement include: x no effect on variables which are read with SET, MERGE, and UPDATE statements x the retained variable will be initialized to missing of no initial value is specified x the RETAIN statement is a compile-time only statement to create variables that do not already exist. 11.1.5 Grouping Variables Variables can be assigned values conditionally. To conditionally assign a value: x Use IF-THEN statement x Format: IF expression THEN statement; o expression is any valid SAS expression o statement is any executable SAS statement. The assignment statement is executed when the condition is true; if false the value of the variable is missing. Any of the comparison of logical operators can be used in an IF-THEN statement. Logical comparisons are enclosed in parentheses to be evaluated. IF-THEN ELSE statements allow alternative actions to be applied to the condition. Multiple IF-THEN ELSE can be used to provide numerous alternatives. SAS will execute the IF-THEN statements until a true statement is found.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
82
The length of the variable can vary based on the expressions used unless it is explicit specified using the LENGTH statement. x Format: LENGTH variable(s) <$> length; The dollar sign is required if the variable is a character variable. Some observations may be unwanted; to remove them use the DELETE statement. x Format: DELETE ; To conditionally delete an observation: x Use IF-THEN statement and DELETE statement x Format: IF expression THEN DELETE If the expression is: o TRUE, the DELETE statement executes o FALSE, the DELETE statement does not execute and the processing continues. To read and process fields that are not kept in the data set, use the DROP= and KEEP= data set options: x Format: (DROP variable(s)) (KEEP variable(s)) The KEEP option is used when there are more variables dropped than kept. Exclusion of variables can be done using statements instead of data set options; the differences are: x DROP statements cannot be used in the SAS procedure steps x The DROP statement applies to all output data sets named in the DATA statement x To exclude variables in some data sets but not all, use the DROP data set options and associate it to the individual data sets to impact.
83
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Another method for conditionally assigning values is to sue SELECT groups. To conditionally assign values using SELECT: x Use SELECT statement x Format: SELECT <select-expression)>; WHEN-1 (when-expression-1 <..., when-expressionn>) statement; WHEN-n (when-expression-1 <..., when-expressionn>) statement; END; o SELECT begins the SELECT group o select-expression is the SAS expression that is evaluated o WHEN identifies the SAS statements to be executed when a condition is true o when-expression identifies any SAS expression, value. There must be at least one expression o statement is any executable SAS statement o OTHERWISE specifies a statement to be executed when all WHEN statement prove false (optional) o END terminates the SELECT group processing. Conditional processing can be done on groups of statements rather than single SAS statements by creating DO groups. To create a DO group: x Use DO statement x Format: DO; SAS statements END; o DO statement begins the processing o SAS statement are called a DO group and executed as a unit o END terminates the DO group processing.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
84
DO groups can be used within IF/THEN.ELSE statements and SELECT groups. Three forms of DO statements: x The iterative DO statement will execute any statement between DO and END repetitively x The DO UNTIL statement executes statements in a DO loop repetitively until a condition is true x The DO WHILE statement executes statements in a DO loop repetitively while a condition is true
11.2 Using SAS Functions SAS functions are built-in routines enabling data manipulations quickly and easily. Categories of SAS functions include: x Array x Bitwise Logical Operations x Character x Character String Matching x Currency Conversion x Data and Time x Descriptive Statistics x Double Byte Character Set x External Files x Financial x Hyperbolic x Macro x Mathematical x Probability x Quantile x Random Number x SAS File I/O x Special x State and ZIP Code x Trigonometric x Truncation
85
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
x x x
Variable Control Variable Information Web Tools.
With SAS functions: x Calculate sample statistics x Create SAS date values x Convert Zip Codes to state postal codes x Round values x Generate random numbers x Extract a portion of a character value x Convert data from one type to another. 11.2.1 Basics of SAS Functions To use a SAS function: x Format: function-name (argument-1 <,argument-n>); arguments can be variables, constants, or expressions If more than one argument, a comma is used to separate arguments. If the argument is a list or array, the argument must be preceded by the word OF to be interpreted correctly. Target variables are the variables where the result of a function is assigned. The default length of the target variable is dependent on the function, causing more space than necessary in the data set. Add a LENGTH statement for the target variable to control the length of the variable. 11.2.2 Data Conversions The INPUT function will convert character data values to numeric values. The PUT function will convert numeric data values to character values.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
86
When conversion between variable types is required, SAS will automatically attempt the conversion with unpredictable results. Explicit conversions provide better results. Automatic character-to-numeric conversion happens when a character value is: x assigned to a defined numeric variable x used in an arithmetic operation x compared to a numeric value x specified in a function requiring numeric arguments. Automatic conversions: x use the w.d informat x produce a numeric missing value from any character value that does not conform to the standard numeric notation. WHERE statements do not perform automatic conversions. The INPUT statement will explicitly convert characters to numeric values. To explicitly convert character to numeric values: x Format: INPUT (source, informat); o source identifies the character variable, constant, or expression to be converted to a numeric value o informat must be numeric and identified Automatic numeric-to-character conversion happens when a numeric value is: x assigned to a defined character variable x used with an operator that requires character values x specified in a function requiring character arguments.
87
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
The INPUT statement will explicitly convert characters to numeric values. To explicitly convert numeric to character values: x Format: PUT (source, format); o source identifies the numeric variable, constant, or expression to be converted to a character value o format must match the data type of the source being specified. A PUT statement will: x always return a character string x return the source written with a format. The format and the source must agree in type. 11.2.3 Modifying Date Values Many SAS functions work with data and time values. SAS stores a date value as the number of days from January 1, 1960 to a given date. SAS stores a time value as the number of seconds since midnight. The SAS datetime value is the number of seconds between midnight on January 1, 1960 and a given data and time. SAS Date Functions MDY
SAS data (month, day, year)
TODAY DATE
Today's date
TIME
Current time
DAY
Extracts the day value from a SAS data.
QTR
Extracts the QTR value from a SAS data.
WEEKDAY
Represents the days of the week.
MONTH
Extracts the month value from a SAS data.
YEAR
Extracts the year value from a SAS data.
INTCK
Returns the number of time intervals that occur in a given time span.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
88
INTNX
Applies multiples of a given interval to a date, time, or datetime to obtains a result.
DATDIF YRDIF
Calculates the days or years between two SAS dates.
11.2.4 Modifying Character Values Functions which modify character values can: x separate character string variables into multiple smaller variables x replace a portion of a character variable's values x search for a specific string. Character functions include: SCAN
Returns a specified word from a character value.
SUBSTR
Extracts a substring or replaces character values.
TRIM
Trims trailing blanks from character values.
CATX
Concatenates character strings, removing leading and trailing blanks and inserts separators.
INDEX
Searches a character value for a specific string.
FIND
Searches for a specific substring of characters within a character string specified.
UPCASE
Converts all letters in a value to uppercase.
LOWCASE
Converts all letters in a value to lowercase.
PROPCASE Converts all letters in a value to proper case. TRANWRD
Replaces or removes all occurrences of a pattern of characters within a character string.
The SCAN function can be used to separate a character value into words and to return a specified word. Delimiters are used to separate a character string into words. Two or more contiguous delimiters are treated as one delimiter.
89
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Default delimiters include: blanks . < ( + | & ! $ * ) ; ^ - / , % To use a SCAN function with delimiters: x Format: SCAN (argument, n, delimiters); o argument identifies the character variable or expression to be scanned o n identifies which word to read o delimiters are special characters enclosed in parentheses. The SCAN function assigns a length of 200 to every target variable. A LENGTH statement in the DATA step will limit the length. The SUBSTR function allows: x a portion of the character value to be extracted x a portion of the character value to be replaced.
To use a SUBSTR function: x Format: SUBSTR (argument, position, ); o argument identifies the character variable or expression to be scanned o position is the character position to start from o n identifies the number of characters to extract. To use a TRIM function to remove trailing blanks: x Format: TRIM (argument); o argument is any character expression: character variable or character function To use a CATX function for concatenation: x Format: CATX (separator, string-1 <,...string-n>); o separator identifies the character string to be used between concatenated strings o string specifies a SAS character string. Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
90
To use a INDEX function to search for a string within a value: x Format: INDEX (source, excerpt); o source is the character variable or expression to search o excerpt is the character string enclosed in quotation marks. To use a FIND function to search for a string within a character string: x Format: FIND (string, substring<,modifiers>); o string is the character constant, variable, or expression to be searched o substring is the character constant, variable, or expression to be searched for in string o modifiers is the character constant, variable, or expression which specifies one or more modifiers o startpos is an integer specifying the position where the search should start and its direction. To use an UPPERCASE function: x Format: UPCASE (argument); o argument is any character expression: character variable or character constant. To use a LOWERCASE function: x Format: LOWCASE (argument); o argument is any character expression: character variable or character constant. To use a PROPCASE function to convert all words to proper names: x Format: PROPCASE (argument <,delimiter(s)>); o argument is any character expression: character variable or character constant o delimiter(s) identify the separators of the string.
91
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
To use a TRANWRD function to replace or remove all occurrences of a pattern: x Format: TRANWRD (source, target, replacement) o source is the string to translate o target is the string to be searched for in the source o replacement is the string that replaces target. 11.2.5 Modifying Numeric Values To return the integer portion of a numeric value: x Format: INT (argument); o argument is any numeric variable, constant or expression. To round values to the nearest unit: x Format: ROUND (argument,round-off-unit); o argument is any numeric variable, constant or expression o round-off-unit is numeric and nonnegative.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
92
11.3 Expanding DO Loops DO loops allow: x concise DATA steps to be written x data to be generated x statements to be conditionally executed x data to be read. Groups of statements are processes repeatedly, not just once. To construct a DO loop: x Use a DO statement x Format: DO index-variable=start TO stop BY increment; SAS statements END; o index variable will store the value of the current iteration of the DO loop o The start, stop, and increment values: are set when the entering the DO loop cannot be changed during processing can be numbers, variables, or SAS expressions. 11.3.1 Executing DO Loops Initiating DO loops within DATA steps begin with compiling the data. A program data vector is created. When the DATA step is executed, the values are assigned to the variables that will be used within the DO loop. The DO loop executes where the specified calculations are made and the result added to the previous value for the variable in the program data vector. The DO loop continues to execute for as many times as specified in the program. At this point the DO loop ends and finally the DATA step.
93
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
If an observation for each iteration of the DO loop is desired, place an OUTPUT statement inside the loop. The DO loop's index variable can be decremented by specifying a negative value for the BY value of the DO statement. The number of times a DO loop is executed can be specified by listing items in a series. The DO loop will execute for each item listed in the series. The items in the series must be separated by commas. DO loops can be nested in a DO loop. For multiple nested DO loops, a unique index-variable name must be assigned for each iterative DO statement. Each DO loop must have an END statement. The number of times a DO loop is executed can be conditionally determined using the DO UNTIL and DO WHERE statements. DO UNTIL statements are used when DO loops should be executed repeatedly until a condition is met. To execute a DO loop until a condition is met: x Use a DO UNTIL statement x Format: DO UNTIL (expression); more SAS statements END; expression is a valid SAS expression enclosed in parentheses. The DO UNTIL statement will execute at least once. DO WHILE statements are used when DO loops should be executed repeatedly while a condition is met. To execute a DO loop until a condition is met: x Use a DO WHILE statement x Format: DO WHILE (expression); more SAS statements END; expression is a valid SAS expression enclosed in parentheses Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
94
If the expression is false during the first iteration, the DO loop will never execute. DO loops can be used to draw sample observations from a data set. To create sample data, enclose a SET statement in the DO loop. Use the start, stop, and increment values to control which observations to select from. A POINT=option should be associated with the index variable.
11.4 Process Arrays Arrays can be used to: x perform repetitive calculations x create several variables with the same attributes x read data x change variables to observations x change observations to variables x compare variables x perform a table lookup. A SAS array is a temporary grouping of variables under a single name. An array exists for the duration of the DATA step. Arrays are used to reduce the number of statements required for processing variables. To group variables into an array: x Use an ARRAY statement x Format: ARRAY array-name{dimension} <elements>; o array-name specifies the name of the array o dimension shows the number and arrangement of array elements. The default dimension is one. o elements list the variables to include in the array.
95
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
11.4.1 One-dimensional Arrays A dimension for a one-dimensional array can be specified in several ways: x The number of array elements is specified x A range of values is specified x An asterisk will have the SAS count the number of elements x The dimension can be enclosed in parentheses, graces, or brackets. When elements are specified, they need to be separated by a space. Elements can be specified as a variable list. Common variable lists: x a numbered range of variables Var1-Varn x all numeric variables _NUMERIC_ x all character variables CHARACTER x all variables _ALL_ The advantage of arrays is the ability to reference the elements by an index value. An index value is assigned to each array element. An array reference is used to perform actions on an array element during execution. To create an array reference: x Format: array-name{index value} index value: o is enclosed in parentheses, braces, or brackets o specifies an integer, numeric variable, or SAS numeric expression o Is within the lower and upper bounds of the array's dimension. 11.4.2 Executing Arrays During compilation, a program data vector is created. The DATA step is scanned for syntax errors. The index values of the array elements are assigned. Since an ARRAY statement is a compile-time only Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
96
statement, it is ignored during execution. The DATA step executes. The DIM function can be used to return the number of elements in the array. x Format: DIM (array-name) 11.4.3 Advanced Array Functions Variables can be created in an ARRAY statement. Omit the array elements from the ARRAY statement. With no array elements, SAS will automatically create variables and assigns default names to them. To create variables into an array: x Use an ARRAY statement x Format: ARRAY array-name{dimension}; o array-name specifies the name of the array o dimension shows the number and arrangement of array elements. The default dimension is one. Default variable names are created by concatenating the array name and the numeric value of the dimension. An array of character variables can be created by adding a dollar sign after the array dimension. By default, the length of the variable name is 8, but that can be changed by assigning a length after the dollar sign. To assign initial values in an ARRAY statement: x place the values after the array elements x specify an initial value for each corresponding array element x separate each value with a blank or comma x enclose the initial values in parentheses. Temporary array elements for the DATA step process can be done by specifying _TEMPORARY_ after the array name and dimension. Temporary array elements do not appear in the resulting data set.
97
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
11.4.4 Multidimensional Arrays Variables can be grouped into table-like structures, called multidimensional arrays. To define a multidimensional array, the number of elements in each dimension is specified separated by a comma. The first dimension specifies the number of rows. The second dimension specifies the number of columns. Array elements are grouped in the order they are listed in the ARRAY statement. Multidimensional arrays are typically used with nested DO loops. Arrays can also rotate a SAS data set: x changing variables into observations x changing observations into variables.
12 Troubleshooting SAS Programs SAS programs consist of SAS statements. A consistent layout enhances readability. Best practices: x begin DATA and PROC steps in column one x indent statements within a step x include a RUN statement after every DATA or PROC step x begin RUN statements in column one.
12.1 Types of Errors The most common errors: x syntax errors – occur when statements do not conform to language rules x data errors – occur when data values are not appropriate for the specified SAS statements. Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
98
12.1.1 Syntax Errors SAS scans for syntax errors in each step before processing the step. Information, including any errors, is written to the SAS log. Errors are displayed in the SAS log: x beginning with the word ERROR x the possible location of the error x explanation for the error. Programs are edited in the Editor window. After correcting programs, you resubmit it. 12.1.2 Resolving Common Problems Common errors that may require editing: x spelling mistakes x omitted semicolons x unbalanced quotation marks x invalid options specified. Many errors come with messages in the SAS log: x omitted semicolon – log message indicates an error in a statement which seems valid x unbalanced quotations – log message indicates test string is too long or ambiguous x invalid options – log message indicated option is invalid or unrecognized. A missing RUN statement will have an error message “DATA (or PROC) step running” at the top of the active window. Each step in a SAS program is compiled and executed independent of all other steps. A DATA or PROC statement indicated the beginning of the step. A RUN or QUIT statement indicated the end of the step. To resolve, submit a RUN statement.
99
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
13 Practice Exam The following multiple-choice questions are a refresher from the Foundation level as a prelude. Question 1 Which of the following items is not considered a category for SAS functions? A) B) C) D)
Special Descriptive Statistics Arithmetic Operations Variable Control
Question 2 Statements are free format expressions that make up the SAS program. Which of the following are true about statements? A) B) C) D)
Statements must begin at the start of the line Statements are always in lowercase to delineate from keywords Only one statement is allowed any line of programming code Statements always end in a semicolon
Question 3 Which of the following is true about priorities in relation to calculating arithmetic expressions? A) If consecutive operators have the same priority, they are always read from right to left B) Exponential calculations will always be done before multiplication in any simple expression C) Multiplication and addition have a higher priority than subtraction D) Arithmetic operations have to be enclosed by parentheses
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
100
Question 4 Which of the following statements are correct when using the RETAIN statement? A) Variables read with the SET statement will not be affected by the RETAIN statement B) The RETAIN statement is used when a variable needs to be initialized to a value of 0 C) Variables will be initialized to the desired value each time the DATA step is executed D) Variables are created during execution
Question 5 When multiple expressions are used within a function, what delimiter should be used to separate them? A) B) C) D)
Space Semicolon Parentheses Comma
Question 6 Which of the following statements is not correct about crosstabulation of data? A) Frequency calculations for the number of cells that appear will be included in the table B) Crosstabulation is used when comparing three or more distinct variables C) The first variable listed in the TABLES statement will define the rows of the table D) Each level of variables will be represented by a separate two-way table
101
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Question 7 Which of the following statements is not true for list inputs? A) B) C) D)
Fields cannot be skipped or re-read Fields are read from left to right Fields must be separated by a delimiter The MISSOVER option must be used at all times to read missing values
Question 8 Each data set has a descriptive portion providing useful information about the data set. Which of the following items would not be found in the descriptive portion? A) B) C) D)
Creation data and time Set name Information on labels Error messages
Question 9 What would be the correct raw data value of a datatime field for Dec 15, 1986 10:00pm for SAS? A) B) C) D)
10942 10935 15756360 121519862200
Question 10 Which location pointer identifies the exact line number from which to begin reading data from? A) B) C) D)
#n pointer input pointer / pointer Column pointer
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
102
Question 11 What is the purpose of the SUM statement? A) B) C) D)
To combine numeric variables To obtain a total output of multiple values To produce column totals All of the above
Question 12 Which programming statement is used to perform conditional processing on groups of SAS statements? A) B) C) D)
DO BY WHERE IF/THEN
Question 13 Which character function would be used to remove extra spaces from character strings? A) B) C) D)
SUBSTR TRIM FIND CATX
Question 14 Which of the statements below is incorrect? A) B) C) D)
103
SAS will read and execute steps simultaneously A step is completed when the QUIT statement appears All programming steps begin with either a DATA or PROC keyword A step is completed when the RUN statement appears
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Question 15 Which of the following statements is true about column inputs of data? A) B) C) D)
Once a field is read, the data step cannot come back to it Should not be used to read nonstandard numeric data All fields have to be read sequentially Missing data must have a placeholder in the form of a period
Question 16 Which of the following is true about performance testing of DATA steps? A) Performance testing cannot be done on data steps B) A data set is created during performance testing C) The number of observations can be limited using the OBS= in the FILENAME statement D) Conditions can be placed in the performance testing to enable value testing
Question 17 Which of the following conditions should discourage the use of the one-toone readings to combine data sets? A) When a better method is available B) All observations from each data set read should be contained in the final data set C) All variables from each data set read should be contained in the final data set D) Variables have the same name in multiple data sets
Question 18 When is the best time to use the DROP= statement? A) B) C) D)
Only when used with the KEEP= statement Only when a variable should not be processed When the variable should not appear in the data set Never
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
104
Question 19 In order to sequentially read multiple records to create a single observation, what condition must be in place to succeed? A) B) C) D)
The records must be in free format. Variables have to be specified explicitly The #n line pointer has to be specified The number of observations for each record must be the same
Question 20 When would you use a CLASS statement? A) The CLASS statement is used to sort data in a defined structure B) The CLASS statement is used to define and group observations C) The CLASS statement is used when applying a specified statistic to a group of observations D) The CLASS statement is used when a table must be created
Question 21 Which of the following is not considered a character type of variable? A) B) C) D)
Miller Miller12 Miller.2 2miller4u
Question 22 When would the keyword, FILE, be appropriate to use? A) B) C) D)
105
When attempting to describe the file where raw data is being written When referencing the file to be read When creating a raw data file When referencing the file to be written to
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Question 23 What data summarizations are provided by the PROC MEANS statement by default? A) B) C) D)
The minimum value the n-count The standard deviation All of the above
Question 24 Which of the following is a true statement about executing a data step? A) B) C) D)
The input raw data is read from the input buffer Error checking is performed at the beginning of the execution phase Multiple observations can be found in a single fixed field record The iteration for a DATA step refers to the number of INPUT statement in the step
Question 25 When using the VALUE statement to define the format of specific values, which of the following statements is not correct about identifying the range. A) A series of character values can be identified as the range for a VALUE statement B) A single value can be identified as the range for a VALUE statement C) A list of unique values separated by commas can be identified as the range for a VALUE statement D) A series of numeric values can be identified as the range for a VALUE statement
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
106
Question 26 Which of the following describes the programming process of adding observations from one data set to the observations of another data set? A) B) C) D)
Match-merging Interweaving Concatenating None of the above
Question 27 What statement is used to identify the conditions for selecting observations in a report? A) B) C) D)
WHERE IF/THEN ELSE SET INPUT
Question 28 Which of the following items can be expected by the SORT procedure? A) B) C) D)
Replace the original SAS data set Will not generate printed output Rearrange observations in the SAS data set All of the above
Question 29 When is character-to-numeric conversion not automatically performed? A) B) C) D)
107
When used in an arithmetic operation When used within a WHERE statement When compared to a numeric value When assigned a defined numeric variable
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Question 30 Several libraries are created during a SAS session by default. Which of the following is not one of those libraries? A) B) C) D)
Sasuser Libname Sashelp Work
Question 31 When are SAS arrays not used? A) B) C) D)
SAS arrays are used to compare variables SAS arrays are used to change variables to observations SAS arrays are used to group data within the data set SAS arrays are used to perform repetitive calculations
Question 32 Which of the following in a correctly created data set name? A) B) C) D)
_Set.name addAnew_1 0pen_2C1other_rtm*XCX create\B4infinity
Question 33 What must happen for DO Loops to be nested within each other? A) Each DO loop must end in a RUN statement B) A condition must be in place to determine if the DO loop should be done C) An OUTPUT statement is required to ensure the result of the DOloop is retained D) A unique index-variable name must be assigned for each iterative DO statement Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
108
Question 34 Which of the following programming conventions can be used in conjunction with the line-hold specifier, @@? A) B) C) D)
@ pointer control MISSOVER option column input INPUT statement
Question 35 Which of the following is not an error identified during the compilation phase? A) B) C) D)
Quotation marks are unbalances No RUN statement in the step An option is invalid Semicolons are missing in statements
Question 36 Which of the following methods for combining data sets is distinguished for its use of the BY statement? A) B) C) D)
Interweaving Match-merging Concatenation None of the above
Question 37 Which of the following options does not describe how variables are to be used in creating a report? A) B) C) D)
109
DISPLAY ACROSS GROUP NUMERIC
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Question 38 Which of the following is a standard numeric data value? A) B) C) D)
Integer binaries Scientific notation Fractions Hexadecimal numbers
Question 39 Which of the following keywords is used to create a raw data file? A) B) C) D)
filename libname infile set
Question 40 Which of the following is not an attribute of the program data vector? A) B) C) D)
Will scan data for errors Is created after the creation of the input buffer Refers to physical and logical memory available to the SAS program Will expand as the number of variables increase
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
110
14 Answer Guide 14.1 Answers to Questions Question 1 Answer: C Reasoning: There are many categories for SAS functions, but Arithmetic Operations is not one of them. The category is actually Bitwise Logical Operations. Question 2 Answer: D Reasoning: Statements can start anywhere in a line which also allows for multiple statement to exist on a single line. There are no restrictions to the case required for statements. Statements always begin with keywords and end in a semicolon. Question 3 Answer: B Reasoning: Exponential expressions have a higher priority than multiplication expressions, unless parentheses exist to control any calculations. Question 4 Answer: A Reasoning: The RETAIN statement is used when a variable needs to be initialized to a value other than 0. The statement will prevent variables from being initialized. All of this is done during the compilation stage, not the execution stage. Variables that are read with the SET, MERGE, and UPDATE statements are not affected by the RETAIN statement. Question 5 Answer: D Reasoning: Multiple variables, constants, and expressions must be separated by a comma when used in an argument.
111
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Question 6 Answer: B Reasoning: Crosstabulation is used when comparing two distinct variables. More variables may be compared as well. Question 7 Answer: D Reasoning: If missing values are represented by a special character that is not used as a separated of data, then the MISSOVER option is not required. Question 8 Answer: D Reasoning: Error messages are found in the program log, not the descriptive portion of the data set. Question 9 Answer: C Reasoning: The datetime value is the number of seconds from midnight on 1/1/1960. This of course would be a rather large number, in fact, 120 seconds from 10942 days (including 7 extra days from leap years). The last answer is incorrect as it is the date and time written out without any delimiters. Question 10 Answer: A Reasoning: The #n pointer references the absolute number of the line for moving the pointer to read data. Question 11 Answer: C Reasoning: In creating reports for SAS programs, the SUM statement is used to create column totals for numeric values. Question 12 Answer: A Reasoning: The key word in this question is ‘group’. Conditional processing can be performed using some iteration of BY, WHERE, and IF/THEN, but DO provides an opportunity to group statements together. Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
112
Question 13 Answer: D Reasoning: CATX is the most correct answer as it removes both leading and trailing blanks from character strings. TRIM only removes the trailing blanks from character values. Question 14 Answer: A Reasoning: Programming steps begin with DATA or PROC and end with either QUIT or END. SAS steps are completely read before executing. Question 15 Answer: B Reasoning: Column input is used when data fields are fixed. In this situation, the columns can be read in any order and re-read if necessary. Missing data does not require a placeholder either. Unfortunately, only standard data should be read using column input: nonstandard data should utilize the formatted input. Question 16 Answer: D Reasoning: Performance testing can be done on data steps by putting a NULL in the DATA step. This will prevent a data set from being created. The number of observations can be limited by putting the OBS= option in the INFILE statement. Conditions can be set using IF/THEN ELSE statements to test for values. Question 17 Answer: B Reasoning: One-to-one readings are the simplest form of combining data sets which will provide the best option in ensuring the programming code is not overly complex. In this type of combining, all the variables from each of the data sets will exist in the new data set. Variables with the same name does not matter, since the values of the last data set read will overwrite all previous values from existing data sets. If the values should not be overwritten, the variable name can be changed before the output is rendered. Not all observations will appear in the final data set: if this is a requirement than concatenation is the next best choice for combining data sets.
113
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Question 18 Answer: C Reasoning: DROP= and KEEP= can be used independently from each other. Whether the DROP= statement appears in the same statement as the DATA or SET statement will determine if the variable will be processed or not. In all cases though, the variable will not appear in the data set. Question 19 Answer: D Reasoning: The only correct answer is that the number of observations for each record must be the same. Question 20 Answer: C Reasoning: The CLASS statement will allow statistics to be applied to a group of observations. As a result of this statement, the data would be sorted, grouped and displayed in a single table; but is not the primary reason for using the CLASS statement. Question 21 Answer: C Reasoning: SAS naming conventions require variables names to be 1-32 characters long, can begin and consist of any combination of letters, numbers, and underscores. Question 22 Answer: D Reasoning: SET is used to create a raw data file. FILENAME references a file to be read, while FILE is used to reference a file to be written to. The PUT statement will describe the lines written to the raw data. Question 23 Answer: D Reasoning: PROC MEANS will also provide the mean and the maximum value without limiting the output.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
114
Question 24 Answer: B Reasoning: Input raw data is read from the program data vector while INPUT statements are read from the input buffer. In raw data with fixed fields, a single record provides only one observation. Multiple observations can be contained in a single record for free-format files. An iteration refers to the complete cycle through the data step, typically one iteration for each observation found. Question 25 Answer: A Reasoning: C shows that detail is required in the answer. The range can consist of character values which are enclosed by quotation marks. Question 26 Answer: C Reasoning: Match merging will merge several data sets together based on the value of a common variable. Interweaving can do the same as match-weaving, but with more than one common variable. Concatenation will append, or add on, to existing data sets. . Question 27 Answer: A Reasoning: The WHERE statement will identify conditions for identifying variables and values to print in a report. Question 28 Answer: D Reasoning: All of the situations describe are possible with the SORT procedure. Question 29 Answer: B Reasoning: A WHERE statement will not use automatic conversion of any kind.
115
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
Question 30 Answer: B Reasoning: All of the libraries are created by default except libref, which is not a library but a keyword for naming a library. Question 31 Answer: C Reasoning: SAS arrays are a temporary grouping of variables that exist for the duration of a data step. They will not 'group' data within the data set. Question 32 Answer: B Reasoning: Data set names can be 1-32 characters long and consists and begin with alphanumeric characters and underscores. Question 33 Answer: D Reasoning: DO statements must end with an END statement. A condition is not necessary. The OUTPUT statement is used to create observations for each iteration of the DO loop. A unique indexvariable name is required when nesting DO loops. Question 34 Answer: D Reasoning: A, B, C should not be used with the double training at line-hold specifier. Question 35 Answer: B Reasoning: The compilation phase will identify most syntax errors, while the executing phase will identify processing errors. No RUN or QUIT statement will result in processing to continue indefinitely.
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
116
Question 36 Answer: A Reasoning: Match-merging is distinguished by the MERGE statement and concatenation by multiple SET statements. The interweaving method uses BY statements to choose which observations are appropriate from each of the data sets. Though the match-merging method also uses BY statements to focus the observations used, it must have the MERGE statement to perform properly. Question 37 Answer: D Reasoning: There is no numeric options: the default for numeric values is ANALYSIS. Additional options not mentioned include ORDER and COMPUTED. Question 38 Answer: B Reasoning: Standard numeric data values consist of positive and negative numbers, decimal points, and scientific notation. Question 39 Answer: D Reasoning: 'Libname' is used to create a library. 'Filename' will reference an external file where raw data is stored and after it has been referenced once, the 'infile' is used to identify the external file. 'Set' will create a raw data file. Question 40 Answer: C Reasoning: The program data vector refers to logical memory only.
117
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
15 References SAS Certification Prep Guide: Base Programming for SAS 9, www.scripd.com For more information:
www.support.sas.com
Websites www.artofservice.com.au www.theartofservice.org www.theartofservice.com
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
118
INDEX* A absolution value 67, 119 accumulator variable 81, 119 Advanced Reporting 3, 63 Amazon 2 analysis variables 66-7, 119 answer 112, 115, 119 Answer 4, 111-17, 119 answer, correct 113-14 Answer Guide 4, 111, 119 appearance 17, 63-5, 68, 71, 77, 119 Appends observations 37, 119 argument 86, 90-2, 111, 119 Arithmetic Operations 100, 107, 111, 119 Arithmetic operators 24-5, 119 array elements 96-8, 119 arrangement of 95, 97 array-name 95-7, 119 ARRAY statement 96-8, 119 ARRAY statement Format 95, 97, 119 arrays 86, 95-8, 119 multidimensional 98 Art of Service Copyright 1, 119 ascending order 40, 57-8, 119 attributes 15, 28, 38, 64-6, 110, 119 automatic conversions 87, 115, 119 automatic variables 32, 119 B blanks 16, 44, 47-8, 52, 58, 90, 97, 119 BODY 76-7, 119 book 1-2, 5, 119 brackets 96, 119 Brisbane 4-6, 17-20, 32-4, 39, 55-6, 61-2, 67, 69-70, 75, 88-9, 92, 104, 118-19, 126, 129 bytes 16, 119 C candidate 7, 119 catalog 2 CATX 89-90, 103, 113, 119 cells 67, 74, 101, 119 certifications 6, 119 character constant 91, 119
119
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
character data 46, 79, 119 character data values 86, 119 character expression 90-1, 119 Character functions 89-90, 103, 120 character strings 31, 79, 89-91, 103, 113, 120 character value 86, 120 character values 16, 49-50, 61, 79, 82, 86-90, 106, 113, 115, 120 character values 87, 120 character variable 47-8, 56, 65, 83, 87, 90-1, 97, 120 character variables ORDER xf0b7 GROUP 66, 120 characters 11-12, 14-16, 18, 20, 22-4, 46, 48-9, 55-6, 67, 70, 80, 8790, 114, 116, 120 formatting 18 special 9, 22, 44-5, 50, 90, 112 split 65 CLASS statement 70-1, 105, 114, 120 CLASS variables 70, 120 code 31-2, 72, 78, 120 code type 31, 120 coded information 78, 120 collection 9-10, 15, 120 colon 49-50, 120 column frequencies 74, 120 column headings 64-5, 120 column input 3, 22-3, 44, 46, 54, 104, 109, 113, 120 column number 45-6, 120 COLUMN statement 63, 120 COLUMN statement 68, 120 column totals 55, 58, 103, 112, 120 column width 65, 120 columns 15, 21-3, 29, 44-8, 50-1, 56, 60, 64-5, 74, 98, 113, 120 combining data sets 37, 109, 113, 120 combining multiple data sets 37-40, 120 commas 61-2, 79, 86, 94, 98, 101, 106, 111, 120 format variable 62 COMMAw 46, 61-2, 120 compilation 31, 36, 41, 96, 120 compilation phase 28, 30, 109, 116, 120 concatenation 38, 90, 109, 113, 115, 117, 120 Conditional processing 32, 84, 103, 112, 120 conditions 25-6, 34-5, 43, 56, 63, 82, 84-5, 94, 104-5, 107-8, 113, 11516, 120 conjunction 36, 58, 109, 120 CONTENTS DATA 13-14, 120 contents-file-specification, procedure output 77 CONTENTS statement 13, 120 Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
120
Continuous data values 68, 120 control 10, 25, 35-6, 53-4, 68, 71, 86, 95, 111, 121 conversion 87, 121 correct raw data value 102, 121 created data set name 108, 121 CROSSLIST 74, 121 crosstabulation 72-4, 101, 112, 121 D DATA and PROC statement 9, 121 DATA and PROC steps 98, 121 Data and Time informats 51, 121 data output 62, 121 DATA output-SAS-data-set 37-40, 42-3, 121 DATA SAS-data-set 33, 35, 121 data set names 30, 121 data set 28, 37, 121 DATA statement 19, 21, 33-4, 37-40, 83, 121 DATA statement and SET statement Format 33, 121 DATA statement Format 21, 31, 121 data step 3, 21, 36, 104, 106, 113, 115-16, 121 DATA step 8, 19, 21, 24, 26-31, 35-6, 42, 53-4, 62, 75, 90, 93, 95-6, 101, 104, 106 [2] DATA Step 28-9, 121 DATA step Copyright 81, 121 DATA step executes 53-4, 97, 121 DATA step process 97, 121 DATA step program 19, 41, 121 DATA step 54, 121 DATALINES 26, 121 DATALINES statement 26, 121 DATALINES statement Format 26, 121 date 17, 51, 88-9, 112, 121 DATE 17, 61, 88, 121 Date values 51, 61, 88, 121 Create SAS 86 datetime value 112, 121 days 7, 51-2, 88-9, 112, 121 DB2 12, 121 Debugging DATA Steps 3, 30, 121 decimal places 16, 46, 61-2, 70, 121 Default delimiters 49, 90, 121 default names 97 Default reports list observation 57, 121 default value 47 Default variable names 97
121
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
delimiters 44, 46-50, 52, 89-91, 101-2, 112, 121 DESCENDING 40, 57-8, 64, 121 descriptive labels formatting variable values 59, 121 descriptive portion 14-15, 102, 112, 121 Descriptive Statistics 3, 68-9, 100, 122 descriptor information 13-14, 122 descriptor portion 27-8, 36, 122 designations 5, 122 Desired Format 61-2, 122 destination 75-6, 122 default 76 detail rows 66 deviation, standard 67, 69, 106 dimension 95-8, 122 default 95, 97 display 19, 31, 52, 64, 66, 80, 109, 122 display values 70, 122 display variables 66, 122 DLM 49-50, 122 dollar sign 23, 46, 61-2, 79, 83, 97, 122 DOLLARw 61-2, 122 DSD 49-50, 122 E eBook 2 END 36, 84-5, 93-4, 113, 122 end-of-file marker 29, 35-6, 122 end-of-record marker 47, 122 END statement 94, 116, 122 engine 11-12, 122 environment, operating 10, 12, 47, 59, 75 Error messages 30, 38, 99, 102, 112, 122 errors 4, 8-9, 27-30, 32, 36, 47, 98-9, 106, 109-10, 122 common 98-9 syntax 28, 30, 96, 99, 116 exam 1, 7, 122 excerpt 91, 122 Exclusion of unmatched observations 42, 122 execute 9, 20, 82-3, 85, 93-5, 122 Execute DATA step 20, 122 execution 31, 41, 54, 81, 96-7, 101, 122 execution phase 27, 29-30, 36, 106, 122 exporting raw data files xf0b7 6, 122 expression 24-6, 31-2, 56, 81-4, 86-8, 90-2, 94-5, 100, 111, 122 Extracts 88-90, 122 Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
122
F field width 46, 51-2, 61, 122 fields 19, 22-3, 44, 47-8, 102, 104, 122 delimiter 47 fixed 19, 21, 44-5, 115 file name 50, 123 file-specification 21, 27, 48, 76, 123 FILE statement 27, 50, 123 FILE statement Format 27, 123 filename 10, 12, 20-1, 27, 33, 48, 50, 76-7, 110, 117, 123 FILENAME fileref 20, 123 FILENAME references 114, 123 FILENAME statement 19-20, 104, 123 FILENAME statement Format 20, 123 fileref 20-1, 27, 48, 76, 123 files 10-14, 20-2, 27, 29, 48-9, 55, 76-7, 105, 114, 123 external 19-21, 26, 28, 46, 117 free-format 115 input 21, 27 FIRSTOBS 18, 123 fixed-field data 47, 123 floating-point numbers 16, 123 following 100-2, 104-10, 123 FOOTNOTE statement 59-60, 123 format 10-11, 16, 34, 38, 46, 50, 52, 61-2, 64, 72, 75, 78-81, 88, 106, 123, 125 assigning 61, 64 free 9, 22, 44, 105, 129 format catalog 78, 80-1, 123 format name 80, 123 format-name 61, 79, 123 FORMAT statement 61, 80, 123 FORMAT Statement 62, 123 single 61 single PROC 80 FORMAT variable 34, 61, 123 format's name 79, 123 formatted input processes 46, 123 formatted values 66, 123 formatting 8, 16, 78, 120, 123 formatting-character 18, 123 formatting value 66, 123 frame-file-specification, procedure output 77 free-format data 50, 123 free format expressions 100, 123 FREQ procedures 72-4, 123
123
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
frequency 73, 123 cumulative 73 frequency tables 3, 72-3, 123 function 24, 86-7, 89, 91, 101, 123 function-name 86, 123 G group processing 84, 124 group variables 66-7, 95, 124 groups 34, 39, 58-9, 64, 66-7, 70, 84-5, 93, 103, 109, 116, 124, 127 H Hexadecimal numbers hh 52, 124
110, 124
I ID/BY value 58, 124 ID statement 56, 58, 124 IF/THEN 103, 107, 112-13, 124 IF-THEN 82, 124 IF-THEN statement 82, 124 increment values 93, 95, 124 INDEX 89, 91, 119, 124 index value 96, 124 index variable 93, 95, 124 index-variable name 124 unique 94, 108, 116 INFILE statement 19, 21, 26, 29, 31, 47, 49, 113, 124 INFILE statement and DLM-option Format 48, 124 INFILE statement Format 21, 124 infinite number 16, 124 information 5, 13-16, 22, 52, 54-5, 75, 78, 99, 102, 118, 124 informats 16, 38, 45-6, 51, 87, 124 initialize 81-2, 124 input 123, 129, 131 INPUT 28-9, 45, 51, 87, 107, 115, 124 input, formatted 44, 113, 123 input buffer 28-9, 54, 106, 110, 115, 124 input data set name 18, 124 input data sets 34, 36, 38-42, 57-8, 67, 124 input pointer 29, 45-6, 51-4, 102, 124 input record 54 INPUT statement 19, 22, 50, 52-4, 87-8, 106, 109, 124 multiple 52 INPUT statement executes 54, 124 INPUT statement Format 23, 45, 47, 51, 124 Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
124
instructions 5, 11, 19, 24, 53, 124 integer 51-2, 91, 96, 124 time values fractions xf0b7 45 Intersperses observations 37, 124 Interweaving 39, 107, 109, 115, 124 interweaving readings 39-40, 124 iteration 29, 36, 53-4, 93-5, 106, 112, 115-16, 124 J justification K keywords
64-5, 125
8, 72, 76, 80, 100, 105, 110-11, 116, 125
L label 16, 18, 34, 38, 60, 62, 64, 79-81, 102, 125 label columns 60, 125 label name 56, 125 LABEL statement 60, 125 single 60 length 16, 38, 47, 50, 65, 81, 83, 86, 90, 97, 125 LENGTH statement 50, 83, 86, 90, 125 letters 11-12, 14-15, 20, 23, 55, 89, 114, 125 liability 5, 125 libname 12, 21, 108, 110, 117, 125 LIBNAME libref 11, 20, 55, 125 LIBNAME libref engine 12, 125 LIBNAME statement 11, 19-20, 78, 125 LIBNAME statement Format 11-12, 20, 125 libraries 10-12, 20, 33, 55, 78, 80, 108, 116-17, 125 formats 78 permanent 10, 20, 78, 80 temporary 11, 78, 80 library contents 13, 125, 130 library name 10-11, 125 library NODETAILS 13-14 libref 10-14, 20, 33, 55, 78, 116, 125 libref.filename 10, 125 limited number 70, 125 line pointer control 52-3, 125 lines 17, 27, 49, 53, 58, 60, 100, 111-12, 114, 125 links 2, 25, 77, 125 list input 47-9, 102, 125 listing 13-14, 17, 76, 125 location 19-21, 27, 29, 48, 77-8, 99, 125 log 2, 17-18, 23, 30-1, 125
125
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
log message 99, 125 log Values 30, 125 loops 4, 29, 85, 93-5, 98, 108, 116, 125 Lower quartile 70, 125 lowercase 9, 14-15, 23, 89, 100, 125 M match 41-2, 79, 88, 125 Match-merging 42, 107, 109, 117, 125 Matches observations 37, 126 MAX 18, 67, 69, 72, 126 maximum value 67, 69, 114, 126 memory 28, 126 merge 40-3, 82, 111, 115, 126 MERGE SAS-data-set-1 40, 42-3, 126 MERGE statement 41-2, 117, 126 MERGE statement Format 40, 43, 126 Miller 105, 126 minimum value 67, 69, 106, 126 missing value 49, 126 MISSOVER option 49, 54, 102, 109, 112, 126 modifiers 49, 91, 126 Modifying Character Values 89, 126 Modifying Date Values 88, 126 Modifying Numeric Values 92, 126 multiplication 100, 126 N name 10-12, 14-16, 19-21, 24, 27-8, 31, 37-40, 42-3, 47-8, 50-1, 55, 63-4, 71-3, 77, 79, 81-2 [10] negative numbers 22, 45, 117, 126 negative value 94, 126 newdata.dat 12, 126 NODATE 17, 126 NODETAILS 13, 126 Nonstandard numeric data values 22, 45, 126 NONUMBER 17, 126 NOTSORTED 58, 126 NULL statement 31, 126 number 11-12, 14-17, 20, 23, 28, 31, 38, 46, 51, 59-60, 67, 69-70, 88, 93-8, 104-6, 112-14 [11] numbered heading 77, 126 numbered range 96, 126 numeric 16, 22-3, 70, 81, 87-8, 92, 96, 126 numeric data values 86, 126 numeric value 87, 126 Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
126
numeric values 16, 55, 79, 86-8, 92, 97, 106-7, 112, 117, 126 numeric values 21, 79, 126 numeric values Representation 16, 126 numeric variables 16, 56, 58, 65, 67, 69-70, 72, 81, 88, 92, 96, 103, 126 O obs 18, 127 OBS 18, 23, 31, 104, 113, 127 OBS column 56, 58, 127 Observation numbers 35, 55, 127 observations 3, 15, 17-18, 26, 28-9, 34-43, 52-3, 55-6, 66-7, 69-70, 73, 83, 94-5, 104-5, 107-8, 113-17 [6] changing 98 combined 40 first 18, 29, 36, 38, 81 group 34, 105 group of 105, 114 last 18, 34, 36 multiple 3, 54, 106, 115 multiple SAS 54 observations change 95 reading 35, 129 sample 95 selecting 56, 107 single 37, 52, 105 observations 98, 127 rearranged 57 observations change 127 observations number 15, 127 ODS (Output Delivery System) 75-7, 127 ODS column format 74, 127 ODS destinations 75, 127 ODS statement 76, 127 One-dimensional Arrays 96, 127 One-sided confidence limit 69, 127 One-to-One Reading 37-8, 113, 127 online 2 operators 24-5, 56, 87, 127 comparison 25, 120 logical 25, 56, 82 OPTIONS statement 17, 127 OS 10-12, 127 output 3, 13-14, 17, 35, 40, 50, 56, 60-1, 65, 68-9, 71-2, 74-7, 113-14, 127 formatted 75-6
127
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
printed 57, 107 output data 72, 127 output data set 41, 57, 83, 127 Output Delivery System, see ODS output file 27, 50, 127 output objects 75, 127 output objects Output 75, 127 output SAS data set 71, 127 output-SAS-data-set names 37-40, 42-3, 128 OUTPUT statement 35, 94, 108, 116, 128 OUTPUT statement Format 35, 71, 128 output Links 77, 128 P PAD option 47, 128 PAGEBY statement 59, 128 parentheses 18, 21, 25, 43, 82, 90, 94, 96-7, 101, 111, 128 percentile 70, 128 performance 104, 113, 128 physical names 11, 128 pointer 46, 53, 102, 112, 128 position 19, 45-6, 52, 90-1, 128 relative 37-8 predetermined number 47, 128 printing 17, 128 priorities 25, 100, 111, 128 PROC 8, 19-20, 23, 55, 59-60, 62-3, 68-72, 99, 106, 113-14, 128 PROC CONTENTS statement Format 13, 128 PROC DATASETS 13, 128 PROC DATASETS statement 13, 128 PROC EXPORT statement Format 50, 128 PROC FORMAT 78, 128 PROC FORMAT statement 78, 81, 128 PROC FREQ 72-4, 128 PROC REPORT 65-7, 128 PROC REPORT statement 63, 128 PROC REPORT statement Format 63, 128 PROC SORT statement 57, 128 PROC SORT statement Format 57, 128 PROC statement 9, 99, 121, 128 PROC step 98, 128 PROC steps 8, 16, 59, 98, 121, 128 PROC SUMMARY 72, 128 procedure output 120, 123 processing 9, 14, 26-8, 38, 71, 83-4, 99, 116, 128 Processing DATA Steps 3, 27, 128 Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
128
processing errors 116, 128 program data vector 28-30, 36-7, 41, 50, 81, 93, 96, 110, 115, 117, 128 program data vector 36, 48, 128 program statement 128 final 20 Prometric 7, 128 publisher 5, 128 PUT 27, 31-2, 88, 128 PUT statement 27, 31, 88, 114, 128 PUT statement Format 27, 50, 129 Q QTR value 88, 129 qualified name 20, 129 QUIT 13-14, 113, 129 QUIT statement 9, 13, 99, 103, 116, 129 R random numbers 86, 129 range 18, 22, 49, 67, 69, 79, 81, 96, 106, 115, 129 raw data 20-3, 26-7, 29, 36, 44-6, 51-2, 78, 105, 114-15, 117, 129 raw data file 19, 23, 26, 29, 50, 105, 110, 114, 117, 129 character-delimited 50 Raw data files 19, 129 Raw Data Files 3, 19, 129 Raw data files, software programs use 19 Reading Data and Time Values 3, 51, 129 Reading Free Format Data 3, 47, 129 Reading Missing Values 49, 129 Rearrange observations 107, 129 Reasoning 111-17, 129 record formats 46, 129 common 46 fixed-length 47 records 19, 23, 27-8, 46-7, 49, 52-4, 105, 114, 129 fixed-length 46-7 free format 47 free-format 47 multiple 3, 52-3, 105 single fixed field 106 variable-length 47 reference 10, 19-20, 78, 96, 114, 117 RENAME 42, 129 rename variables 42, 129 replacement 1, 92, 129
129
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
report 17, 55-7, 59, 63-6, 68, 72, 107, 109, 112, 115, 129 review 2 round-off-unit 92, 129 Rounded 61, 129 rows 15, 63, 66, 74, 98, 101, 129 order of 64, 66 S Same-named variables 41-2, 130 sample 11, 127, 130 SAS 1, 6, 9-11, 19-20, 28, 30, 36, 41, 44-5, 49, 51-2, 80-2, 84, 93-4, 96-8, 102-3 [7] SAS arrays 95, 108, 116, 130 SAS Certified Base Programmer for SAS 1, 3, 6-7, 130 SAS data 11, 51, 88, 130 SAS-data-library 11-12, 20, 55, 130 SAS data set 8, 13-16, 19, 21, 26-7, 33, 36-7, 55-7, 63, 68, 71-3, 98, 107, 130 SAS-data-set 33, 35, 50, 55, 57, 59, 63, 68, 71-3, 130 SAS-data-set names 50, 130 SAS-data-set-out 71, 130 SAS data set 8, 19, 57, 130 SAS Data Sets 3, 14, 19, 33, 37, 130 SAS data step 23, 130 SAS datetime value 51, 88, 130 SAS expressions 24, 56, 84, 93, 130 valid 24, 26, 81-2, 94 SAS files 10, 14, 130 SAS formats 17, 61, 130 SAS functions 85-6, 100, 111, 130 SAS library 13-14, 130 SAS listing output 60, 130 SAS log 9, 99, 130 SAS name 23, 130 SAS output of library contents 13, 130 SAS procedure output Markup 130 traditional 75 SAS procedure steps 83, 130 SAS programming 1, 6, 130 SAS programs 8-9, 26, 53, 75, 98-100, 110, 112, 130 SAS Programs 3, 8, 130 SAS session 11, 21, 60, 78, 108, 130 SAS statement 84, 130 executable 82, 84 SAS stores 51, 88, 130 SAS System Outputs 3, 17, 130 Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
130
SAS uses Output Delivery System 75, 130 scan 48, 89-90, 130 SCAN function 89-90, 130 search 80, 91, 130 character variable's values 89 Searches 89, 130 seconds 51-2, 88, 112, 130 semicolons 8, 26, 100-1, 109, 111, 130 omitted 99 separators 89-91, 131 sequences 24-5, 131 session 10-11, 131 SET SAS-data-set 33, 35, 131 SET SAS-data-set-1 37-9, 131 SET statement 33-8, 95, 101, 114, 131 SET statement Format 26, 38-9, 131 Set.name 108, 131 sign 46, 54, 81, 131 Single Records 3, 52, 54, 115, 131 source 18, 87-8, 91-2, 131 spaces 9, 61, 65, 86, 96, 101, 131 decimal 61-2 spacing 64-5, 131 extra 60 specifier 54, 109, 116, 131 squares 67, 69, 131-2 ss 52, 131 Standard error 67, 69, 131 Standard numeric data values 22, 45, 110, 117, 131 statement 20, 26-8, 34, 42-3, 55-6, 58-60, 63-6, 71, 77, 81-5, 94, 97101, 103-4, 106-9, 111, 114-15 [3] assignment 24, 82 global 17 infile 23 input 29 multiple 111 programming 103 statement Copyright 108, 131 statement executes 83, 131 statement Format 26, 35, 55, 58, 64, 68, 76-7, 82-4, 93-4, 131 statistic-keyword 68, 71-2, 131 statistics 66, 68, 70-2, 74, 105, 114, 131 summary 71-2 STD 67, 69, 72, 131 STDERR 67, 69, 131 step 8-9, 11, 19, 23, 30, 62, 99, 103, 106, 109, 126, 131
131
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
stop 35, 93, 95, 131 STOP statement 35, 131 Stored Value 61-2, 131 string 31, 89-92, 131 subsetting 26, 42-3, 131 SUBSTR 89-90, 103, 131 SUBSTR function 90, 131 substring 56, 89, 91, 131 subtotals 58, 63, 132 Sum 67, 69, 132 SUM 58, 66-7, 69, 132 sum statement 81, 132 SUM statement 58, 103, 112, 132 SUM variable, changing report characteristics Summarized output data set 72, 132 SUMWGT 67, 69, 132 system options 17-18, 65, 132
55
T table 15, 73-4, 101, 105, 132 n-way 73-4 two-way 73-4 table of contents 77, 132 Table of Contents 3, 132 TABLES statement 74, 101, 132 TABLES statement Format 73, 132 TABLES variable-1 73-4, 132 Target variables 86, 90, 132 Temporary array elements 97, 132 Temporary files 10, 132 temporary variables 34, 36, 132 text 59-60, 132 time informats 51, 121, 132 time values 3, 51, 88, 129, 132 time values fractions 22, 132 time values fractions xf0b7 124 time number 15, 132 title 59-60, 132 higher-numbered 60 TITLE statement 59, 132 tracking pointers 41, 132 trade name 5, 132 trademarks 5, 132 trailing blanks 89-90, 113, 132 TRIM 89-90, 103, 113, 132 Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
132
U Uncorrected sum of squares 67, 69, 132 unmatched observations 42-3, 122, 132 uppercase 9, 14-15, 23, 89, 132 URL 77, 132 usage 64, 68, 132 user-defined formats 61, 64, 78-81, 132 User-Defined Formatting 3, 78, 132 USS 67, 69, 133 V Valid data set names 14, 133 valid SAS variable name 22, 133 VALUE format name 79, 133 VALUE format-name 79, 133 VALUE statement 79-80, 106, 133 VALUE statement Copyright 106, 133 value 126, 133 smallest 57 value number 133 longest 81 value Numeric variables 16 values 15-16, 23-4, 27-9, 31, 33-8, 40-1, 46-9, 53, 56-7, 61-2, 65-8, 70-1, 73-4, 81-2, 91-4, 113 [15] values 24, 86, 133 character variable's 130 VAR 67, 69, 133 VAR statement 56, 58, 70, 72, 133 variable-2 73-4, 133 Variable conditions 33-4, 133 variable informat 45, 51, 133 variable-length record format 47 variable names 133 invalid 28 variable values 64, 133 variablen 43, 133 variablen names 43, 133 variable's values 79, 133 variables number 15, 133 Variance 67, 69, 133 W Weight variable values 67, 69, 133 when-expression-1 84, 133 width 17, 64-5, 133 maximum 16-17
133
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
windowing mode 63, 133 Work.Formats 78, 133 Work‟ 10-11, 133 www.emereo.org 2
Copyright The Art of Service Brisbane, Australia │ Email: [email protected] │Web: http://theartofservice.com │eLearning: http://theartofservice.org Phone: +61 (0)7 3252 2055
134