                inc     Variable
                jmp     ForLoop
ForDone:
To implement this as a set of macros, we need to be able to write a short piece of code that will write the above assembly language statements for us. At first blush, this would seem easy, why not use the following code?

ForLp           macro   Variable, Start, Stop
                mov     ax, Start
                mov     Variable, ax
ForLoop:        mov     ax, Variable
                cmp     ax, Stop
                jg      ForDone
                endm

Next            macro   Variable
                inc     Variable
                jmp     ForLoop
ForDone:
                endm
These two macros would produce correct code – exactly once. However, a problem develops if you try to use these macros a second time. This is particularly evident when using nested loops:

                ForLp   I, 1, 10
                ForLp   J, 1, 10
                 .
                 .
                 .
                Next    J
                Next    I
The macros above emit the following 80x86 code:

                mov     ax, 1           ;The ForLp I, 1, 10
                mov     I, ax           ; macro emits these
ForLoop:        mov     ax, I           ; statements.
                cmp     ax, 10          ; .
                jg      ForDone         ; .

                mov     ax, 1           ;The ForLp J, 1, 10
                mov     J, ax           ; macro emits these
ForLoop:        mov     ax, J           ; statements.
                cmp     ax, 10          ; .
                jg      ForDone         ; .
                 .
                 .
                 .
                inc     J               ;The Next J macro emits these
                jmp     ForLoop         ; statements.
ForDone:
                inc     I               ;The Next I macro emits these
                jmp     ForLoop         ; statements.
ForDone:
The problem, evident in the code above, is that each time you use the ForLp macro you emit the label “ForLoop” to the code. Likewise, each time you use the Next macro, you emit the label “ForDone” to the code stream. Therefore, if you use these macros more than once (within the same procedure), you will get a duplicate symbol error. To prevent this error, the macros must generate unique labels each time you use them.

Unfortunately, the local directive will not work here. The local directive defines a unique symbol within a single macro invocation. If you look carefully at the code above, you’ll see that the ForLp macro emits a symbol that the code in the Next macro references. Likewise, the Next macro emits a label that the ForLp macro references. Therefore, the label names must be global since the two macros reference each other’s labels.

The solution the actual ForLp and Next macros use is to generate globally known labels of the form “$$For” + “variable name” + “some unique number” and “$$Next” + “variable name” + “some unique number”. For the example given above, the real ForLp and Next macros would generate the following code:
                mov     ax, 1           ;The ForLp I, 1, 10
                mov     I, ax           ; macro emits these
$$ForI0:        mov     ax, I           ; statements.
                cmp     ax, 10          ; .
                jg      $$NextI0        ; .

                mov     ax, 1           ;The ForLp J, 1, 10
                mov     J, ax           ; macro emits these
$$ForJ0:        mov     ax, J           ; statements.
                cmp     ax, 10          ; .
                jg      $$NextJ0        ; .
                 .
                 .
                 .
                inc     J               ;The Next J macro emits these
                jmp     $$ForJ0         ; statements.
$$NextJ0:
                inc     I               ;The Next I macro emits these
                jmp     $$ForI0         ; statements.
$$NextI0:
The real question is, “How does one generate such labels?” Constructing a symbol of the form “$$ForI” or “$$NextJ” is pretty easy: just create a symbol by concatenating the string “$$For” or “$$Next” with the loop control variable’s name. The problem occurs when you try to append a numeric value to the end of that string. The actual ForLp and Next code accomplishes this by creating assembly time variable names of the form “$$Forvariable_name” and incrementing this variable for each loop with the given loop control variable name. By calling the macros MakeLbl, jgDone, and jmpLoop, ForLp and Next output the appropriate labels and ancillary instructions.

The ForLp and Next macros are very complex, far more complex than you would typically find in a program. They do, however, demonstrate the power of MASM’s macro facilities. By the way, there are much better ways to create these symbols using macro functions. We’ll discuss macro functions next.
8.14.7 Macro Functions

A macro function is a macro whose sole purpose is to return a value for use in the operand field of some other statement. Although there is the obvious parallel between procedures and functions in a high level language and procedural macros and functional macros, the analogy is far from perfect. Macro functions do not let you create sequences of code that emit some instructions that compute a value when the program actually executes. Instead, macro functions simply compute some value at assembly time that MASM can use as an operand.

A good example of a macro function is the Date function. This macro function packs a five bit day, four bit month, and seven bit year value into 16 bits and returns that 16 bit value as the result. If you needed to create an initialized array of dates, you could use code like the following:

DateArray       word    Date(2, 4, 84)
                word    Date(1, 1, 94)
                word    Date(7, 20, 60)
                word    Date(7, 19, 69)
                word    Date(6, 18, 74)
                 .
                 .
                 .
The Date function would pack the data and the word directive would emit the 16 bit packed value for each date to the object code file. You invoke macro functions by using their name where MASM expects a text expression of some sort. If the macro function requires any parameters, you must enclose them within parentheses, just like the parameters to Date, above.

Macro functions look exactly like standard macros with two exceptions: they do not contain any statements that generate code and they return a text value via an operand to the exitm directive. Note that you cannot return a numeric value with a macro function. If you need to return a numeric value, you must first convert it to a text value. The following macro function implements Date using the 16 bit date format given in Chapter One (see “Bit Fields and Packed Data” on page 28):

Date            macro   month, day, year
                local   Value
Value           =       (month shl 12) or (day shl 7) or year
                exitm   %Value
                endm
The text expansion operator (“%”) is necessary in the operand field of the exitm directive because macro functions always return textual data, not numeric data. The expansion operator converts the numeric value to a string of digits acceptable to exitm.

One minor problem with the code above is that this function returns garbage if the date isn’t legal. A better design would generate an error if the input date is illegal. You can use the “.err” directive and conditional assembly to do this. The following implementation of Date checks the month, day, and year values to see if they are somewhat reasonable:

Date            macro   month, day, year
                local   Value

                if      (month gt 12) or (month lt 1) or \
                        (day gt 31) or (day lt 1) or \
                        (year gt 99) or (year lt 1)

                .err
                exitm   <0>             ;;Must return something!
                endif

Value           =       (month shl 12) or (day shl 7) or year
                exitm   %Value
                endm
With this version, any attempt to specify a totally outrageous date triggers the assembly of the “.err” directive that forces an error at assembly time.
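To make the packed layout concrete, here is a minimal sketch of how a program might unpack one of these words at run time. It assumes the Date macro function defined above; the variable name Birthday is hypothetical, and the fragment assumes the data and code land in suitable segments.

```asm
; Hypothetical data word built at assembly time with the Date macro function.
; (2 shl 12) or (4 shl 7) or 84 = 2000h or 0200h or 0054h = 2254h
Birthday        word    Date(2, 4, 84)

; Unpack the three fields at run time (8086-compatible shifts through cl).
                mov     ax, Birthday
                mov     bx, ax
                and     bx, 7Fh         ;bx = year  (bits 0-6)
                mov     dx, ax
                mov     cl, 7
                shr     dx, cl
                and     dx, 1Fh         ;dx = day   (bits 7-11)
                mov     cl, 12
                shr     ax, cl          ;ax = month (bits 12-15)
```

Note that the packing happens entirely at assembly time; only the unpacking code above costs anything at run time.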
8.14.8 Predefined Macros, Macro Functions, and Symbols

MASM provides four built-in macros and four corresponding macro functions. In addition, MASM also provides a large number of predefined symbols you can access during assembly. Although you would rarely use these macros, functions, and variables outside of moderately complex macros, they are essential when you do need them.
Table 43: MASM Predefined Macros

substr
    Operands:     string, start, length
    Returns:      text data
    Example:      NewStr substr OldStr, 1, 3
    Description:  Returns a string consisting of length characters from the string operand, beginning at position start. The length operand is optional. If it is not present, MASM returns all characters from position start through the end of the string.

instr
    Operands:     start, string, substr
    Returns:      numeric data
    Example:      Pos instr 2, OldStr,
    Description:  Searches for “substr” within “string” starting at position “start.” The starting value is optional. If it is missing, MASM begins searching for the string from position one. If MASM cannot find the substring within the string operand, it returns the value zero.

sizestr
    Operands:     string
    Returns:      numeric data
    Example:      StrSize sizestr OldStr
    Description:  Returns the size of the string in the operand field.

catstr
    Operands:     string, string, ...
    Returns:      text data
    Example:      NewStr catstr OldStr, <$$>
    Description:  Creates a new string by concatenating each of the strings appearing in the operand field of the catstr macro.
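As an illustration of the four predefined macros, the following sketch applies each of them to one text equate. All of the symbol names here (Greeting, First5, Len, WPos, Both) are hypothetical, and the comments show the values one would expect from the descriptions in the table.

```asm
Greeting        textequ <HelloWorld>            ;Ten characters of text.

First5          substr  Greeting, 1, 5          ;First5 is the text Hello
Len             sizestr Greeting                ;Len = 10
WPos            instr   1, Greeting, <World>    ;WPos = 6
Both            catstr  First5, <_>, <World>    ;Both is the text Hello_World
```

None of these statements emits any object code; they only define assembly-time symbols that later statements can use.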
The substr and catstr macros return text data. In some respects, they are similar to the textequ directive since you use them to assign textual data to a symbol at assembly time. The instr and sizestr macros are similar to the “=” directive insofar as they return a numeric value.

The catstr macro can eliminate the need for the MakeLbl macro found in the ForLp macro. Compare the following version of ForLp to the previous version (see “A Sample Macro to Implement For Loops” on page 409).

ForLp           macro   LCV, Start, Stop
                local   ForLoop

                ifndef  $$For&LCV&
$$For&LCV&      =       0
                else
$$For&LCV&      =       $$For&LCV& + 1
                endif

                mov     ax, Start
                mov     LCV, ax

; Due to bug in MASM, this won’t actually work. The idea is sound, though
; Read on for correct solution.

ForLoop         textequ @catstr($$For&LCV&, %$$For&LCV&)
&ForLoop&:
                mov     ax, LCV
                cmp     ax, Stop
                jgDone  $$Next&LCV&, %$$For&LCV&
                endm
MASM also provides macro function forms for catstr, instr, sizestr, and substr. To differentiate these macro functions from the corresponding predefined macros, MASM uses the names @catstr, @instr, @sizestr, and @substr. The following equivalences hold between these operations:

Symbol          catstr  String1, String2, ...
Symbol          textequ @catstr(String1, String2, ...)

Symbol          substr  SomeStr, 1, 5
Symbol          textequ @substr(SomeStr, 1, 5)

Symbol          instr   1, SomeStr, SearchStr
Symbol          =       @instr(1, SomeStr, SearchStr)

Symbol          sizestr SomeStr
Symbol          =       @sizestr(SomeStr)
Table 44: MASM Predefined Macro Functions

@substr
    Parameters:   string, start, length
    Returns:      text data
    Example:      ifidn   @substr(parm, 1, 4), <[bx]>

@instr
    Parameters:   start, string, substr
    Returns:      numeric data
    Example:      if      @instr(parm,)

@sizestr
    Parameters:   string
    Returns:      numeric data
    Example:      byte    @sizestr(SomeStr)

@catstr
    Parameters:   string, string, ...
    Returns:      text data
    Example:      jg      @catstr($$Next&LCV&, %$$For&LCV&)
The last example above shows how to get rid of the jgDone and jmpLoop macros in the ForLp macro. A final, improved, version of the ForLp and Next macros, eliminating the three support macros and working around the bug in MASM might look something like the following:
ForLp           macro   LCV, Start, Stop
                local   ForLoop

                ifndef  $$For&LCV&
$$For&LCV&      =       0
                else
$$For&LCV&      =       $$For&LCV& + 1
                endif

                mov     ax, Start
                mov     LCV, ax

ForLoop         textequ @catstr($$For&LCV&, %$$For&LCV&)
&ForLoop&:
                mov     ax, LCV
                cmp     ax, Stop
                jg      @catstr($$Next&LCV&, %$$For&LCV&)
                endm

Next            macro   LCV
                local   NextLbl
                inc     LCV
                jmp     @catstr($$For&LCV&, %$$For&LCV&)
NextLbl         textequ @catstr($$Next&LCV&, %$$For&LCV&)
&NextLbl&:
                endm
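The following usage sketch shows why the per-variable counter matters: two loops in sequence reuse the same control variable without colliding. It assumes the ForLp and Next macros above assemble as intended; I, Sum, and Count are hypothetical word variables declared elsewhere, and the label names in the comments are what the naming scheme described in the text would be expected to produce.

```asm
; First loop on I: the counter $$ForI starts at zero.
                ForLp   I, 1, 10        ;Expected labels: $$ForI0, $$NextI0
                mov     ax, I
                add     Sum, ax         ;Sum := Sum + I
                Next    I

; Second loop on I: ForLp bumps $$ForI to one, so the labels differ.
                ForLp   I, 1, 5         ;Expected labels: $$ForI1, $$NextI1
                inc     Count
                Next    I
```

The loop body is free to use ax, since the code at the top of each iteration reloads it from the loop control variable.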
MASM also provides a large number of built in variables that return information about the current assembly. The following table describes these built in assembly time variables.
Table 45: MASM Predefined Assembly Time Variables

Date & Time Information

@Date
    Description:  Returns the date of assembly.
    Returns:      Text value

@Time
    Description:  Returns a string denoting the time of assembly.
    Returns:      Text value

Environment Information

@CPU
    Description:  Returns a 16 bit value whose bits determine the active processor directive. Specifying the .8086, .186, .286, .386, .486, and .586 directives enables additional instructions in MASM. They also set the corresponding bits in the @cpu variable. Note that MASM sets all the bits for the processors it can handle at any one given time. For example, if you use the .386 directive, MASM sets bits zero, one, two, and three in the @cpu variable.
    Returns:      Bit 0  - 8086 instrs permissible.
                  Bit 1  - 80186 instrs permissible.
                  Bit 2  - 80286 instrs permissible.
                  Bit 3  - 80386 instrs permissible.
                  Bit 4  - 80486 instrs permissible.
                  Bit 5  - Pentium instrs permissible.
                  Bit 6  - Reserved for 80686 (?).
                  Bit 7  - Protected mode instrs okay.
                  Bit 8  - 8087 instrs permissible.
                  Bit 10 - 80287 instrs permissible.
                  Bit 11 - 80387 instrs permissible (bit 11 is also set for 80486 and Pentium instr sets).

@Environ
    Description:  @Environ(name) returns the text associated with DOS environment variable name. The parameter must be a text value that evaluates to a valid DOS environment variable name.
    Returns:      Text value

@Interface
    Description:  Returns a numeric value denoting the current language type in use. Note that this information is similar to that provided by the opattr attribute. The H.O. bit determines if you are assembling code for MS-DOS/Windows or OS/2. This variable is mainly useful for those using MASM’s simplified segment directives. Since this text does not deal with the simplified directives, further discussion of this variable is unwarranted.
    Returns:      Bits 0-2
                      000 - No language type
                      001 - C
                      010 - SYSCALL
                      011 - STDCALL
                      100 - Pascal
                      101 - FORTRAN
                      110 - BASIC
                  Bit 7
                      0 - MS-DOS or Windows
                      1 - OS/2

@Version
    Description:  Returns a numeric value that is the current MASM version number multiplied by 100. For example, MASM 6.11’s @version variable returns 611.
    Returns:      Numeric value

File Information

@FileCur
    Description:  Returns the current source or include file name, including any necessary pathname information.
    Returns:      Text value

@FileName
    Description:  Returns the current source file name (base name only, no path information). If in an include file, this variable returns the name of the source file that included the current file.
    Returns:      Text value

@Line
    Description:  Returns the current line number in the source file.
    Returns:      Numeric value

Segment Information (a)

@code
    Description:  Returns the name of the current code segment.
    Returns:      Text value

@data
    Description:  Returns the name of the current data segment.
    Returns:      Text value

@FarData?
    Description:  Returns the name of the current far data segment.
    Returns:      Text value

@WordSize
    Description:  Returns two if this is a 16 bit segment, four if this is a 32 bit segment.
    Returns:      Numeric value

@CodeSize
    Description:  Returns zero for Tiny, Small, Compact, and Flat models. Returns one for Medium, Large, and Huge models.
    Returns:      Numeric value

@DataSize
    Description:  Returns zero for Tiny, Small, Medium, and Flat memory models. Returns one for Compact and Large models. Returns two for Huge model programs.
    Returns:      Numeric value

@Model
    Description:  Returns one for Tiny model, two for Small model, three for Compact model, four for Medium model, five for Large model, six for Huge model, and seven for Flat model.
    Returns:      Numeric value

@CurSeg
    Description:  Returns the name of the current segment.
    Returns:      Text value

@stack
    Description:  Returns the name of the current stack segment.
    Returns:      Text value

a. These functions are intended for use with MASM’s simplified segment directives. This chapter does not discuss these directives, so these functions will probably be of little use.
Although there is insufficient space to go into detail about the possible uses for each of these variables, a few examples might demonstrate some of the possibilities. Other uses of these variables will appear throughout the text; however, the most impressive uses will be the ones you discover.

The @CPU variable is quite useful if you want to assemble different code sequences in your program for different processors. The section on conditional assembly in this chapter described how you could create a symbol to determine if you are assembling the code for an 80386 and later processor or a stock 8086 processor. The @CPU variable tells you exactly which instructions are allowable at any given point in your program. The following is a rework of that example using the @CPU variable:

                if      @CPU and 100b   ;Need an 80286 or later processor
                shl     ax, 4           ; for this instruction.
                else                    ;Must be 8086 processor.
                mov     cl, 4
                shl     ax, cl
                endif
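The same test works for any of the processor bits listed in Table 45. As a further sketch, the fragment below checks bit three (80386 instructions permissible) and falls back to 8086 code otherwise; the choice of instructions is only an illustration, and both paths leave the zero-extended byte at [bx] in ax.

```asm
                if      @CPU and 1000b  ;Bit 3: 80386 instrs permissible.
                movzx   ax, byte ptr [bx]
                else                    ;Pre-386 processor: do it by hand.
                mov     al, [bx]
                xor     ah, ah
                endif
```

With the default processor setting the else branch assembles; after a .386 (or later) directive, MASM assembles the movzx version instead.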
You can use the @Line variable to put special diagnostic messages in your code. The following code would print an error message including the line number in the source file of the offending assertion, if it detects an error at run-time:

                mov     ax, ErrorFlag
                cmp     ax, 0
                je      NoError
                mov     ax, @Line       ;Load AX with current line #
                call    PrintError      ;Go print error message and Line #
                jmp     Quit            ;Terminate program.

8.14.9 Macros vs. Text Equates

Macros, macro functions, and text equates all substitute text in a program. While there is some overlap between them, they really do serve different purposes in an assembly language program.

Text equates perform a single text substitution on a line. They do not allow any parameters. However, you can replace text anywhere on a line with a text equate. You can expand a text equate in the label, mnemonic, operand, or even the comment field. Furthermore, you can replace multiple fields, even an entire line, with a single symbol.

Macro functions are legal in the operand field only. However, you can pass parameters to macro functions, making them considerably more general than simple text equates.

Procedural macros let you emit sequences of statements (with text equates you can emit, at most, one statement).
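A side-by-side sketch may help separate the three mechanisms. Every name below (Counter, Double, Push2) is hypothetical, and the macro function follows the same local/exitm pattern as the Date function earlier in this chapter.

```asm
; Text equate: a single parameterless substitution.
Counter         textequ <word ptr [bp-2]>
                inc     Counter         ;Assembles as: inc word ptr [bp-2]

; Macro function: accepts a parameter, returns text for an operand field.
Double          macro   n
                local   Value
Value           =       (n) * 2
                exitm   %Value
                endm
                mov     ax, Double(21)  ;Assembles as: mov ax, 42

; Procedural macro: emits a sequence of statements.
Push2           macro   r1, r2
                push    r1
                push    r2
                endm
                Push2   ax, bx          ;Emits: push ax / push bx
```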
8.14.10 Macros: Good and Bad News

Macros offer considerable convenience. They let you insert several instructions into your source file by simply typing a single command. This can save you an incredible amount of typing when entering huge tables, each line of which contains some bizarre, but repeated calculation. Macros are also useful (in certain cases) for making your programs more readable. Few would argue that ForLp I,1,10 is not more readable than the corresponding 80x86 code. Unfortunately, it's easy to get carried away and produce code that is inefficient, hard to read, and hard to maintain.

A lot of so-called “advanced” assembly language programmers get carried away with the idea that they can create their own instructions via macro definitions and they start creating macros for every imaginable function under the sun. The COPY macro presented earlier is a good example. The 80x86 doesn't support a memory to memory move operation. Fine, we'll create a macro that does the job for us. Soon, the assembly language program doesn't look like 80x86 assembly language at all. Instead, a large number of the statements are macro invocations. Now this may be great for the programmer who has created all these macros and intimately understands their operation. To the 80x86 programmer who isn't familiar with those macros, however, it's all gibberish. Maintaining a program someone else wrote, that contains “new” instructions implemented via macros, is a horrible task. Therefore, you should rarely use macros as a device to create new instructions on the 80x86.

Another problem with macros is that they tend to hide side effects. Consider the COPY macro presented earlier. If you encountered a statement of the form COPY VAR1,VAR2 in an assembly language program, you'd think that this was an innocuous statement that copies VAR2 to VAR1. Wrong! It also destroys the current contents of the ax register, leaving a copy of the value in VAR2 in the ax register. This macro invocation doesn't make this very clear. Consider the following code sequence:

                mov     ax, 5
                copy    Var2, Var1
                mov     Var1, ax
This code sequence copies Var1 into Var2 and then (supposedly) stores five into Var1. Unfortunately, the COPY macro has wiped out the value in ax (leaving the value originally contained in Var1 alone), so this instruction sequence does not modify Var1 at all!

Another problem with macros is efficiency. Consider the following invocations of the COPY macro:

                copy    Var3, Var1
                copy    Var2, Var1
                copy    Var0, Var1

These three statements generate the code:

                mov     ax, Var1
                mov     Var3, ax
                mov     ax, Var1
                mov     Var2, ax
                mov     ax, Var1
                mov     Var0, ax
Clearly, the last two mov ax,Var1 instructions are superfluous. The ax register already contains a copy of Var1; there is no need to reload ax with this value. Unfortunately, this inefficiency, while perfectly obvious in the expanded code, isn't obvious at all in the macro invocations.

Another problem with macros is complexity. In order to generate efficient code, you can create extremely complex macros using conditional assembly (especially ifb, ifidn, etc.), repeat loops (described a little later), and other directives. Unfortunately, these macros are small programs all on their own. You can have bugs in your macros just as you can have bugs in your assembly language program. And the more complex your macros become, the more likely they'll contain bugs that will, of course, become bugs in your program when invoking the macro.

Overusing macros, especially complex ones, produces hard to read code that is hard to maintain. Despite the enthusiastic claims of those who love macros, the unbridled use of macros within a program generally causes more bugs than it helps to prevent. If you're going to use macros, go easy on them.

There is a good side to macros, however. If you standardize on a set of macros and document all your programs as using these macros, they may help make your programs more readable, especially if those macros have easily identifiable names. The UCR Standard Library for 80x86 Assembly Language Programmers uses macros for most library calls. You’ll read more about the UCR Standard Library in the next chapter.
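One way to see the trade-off between hidden side effects and efficiency is to sketch a variant of the copy macro that preserves ax. The name CopyKeepAX is hypothetical, and the two mov instructions simply mirror the expansion of COPY shown in the generated code earlier in this section.

```asm
CopyKeepAX      macro   Dest, Source
                push    ax              ;Save the caller's ax.
                mov     ax, Source
                mov     Dest, ax
                pop     ax              ;Restore it, hiding the side effect.
                endm

                mov     ax, 5
                CopyKeepAX Var2, Var1   ;ax still contains 5 afterwards.
                mov     Var1, ax        ;Now Var1 really does receive 5.
```

This removes the surprise, but it doubles the instruction count of every copy, which is exactly the kind of cost a macro invocation tends to conceal.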
8.15 Repeat Operations

Another macro format (at least by Microsoft's definition) is the repeat macro. A repeat macro is nothing more than a loop that repeats the statements within the loop some specified number of times. There are three types of repeat macros provided by MASM: repeat/rept, for/irp, and forc/irpc. The repeat/rept macro uses the following syntax:

                repeat  expression
                <statements>
                endm

Expression must be a numeric expression that evaluates to an unsigned constant. The repeat directive duplicates all the statements between repeat and endm that many times.

The following code generates a table of 26 bytes containing the 26 uppercase characters:

ASCIICode       =       'A'
                repeat  26
                byte    ASCIICode
ASCIICode       =       ASCIICode+1
                endm
The symbol ASCIICode is assigned the ASCII code for “A”. The loop repeats 26 times, each time emitting a byte with the value of ASCIICode. Also, the loop increments the ASCIICode symbol on each repetition so that it contains the ASCII code of the next character in the ASCII table. This effectively generates the following statements:

                byte    'A'
                byte    'B'
                 .
                 .
                 .
                byte    'Y'
                byte    'Z'
ASCIICode       =       'Z'+1
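The same pattern works for any table whose entries can be computed from an assembly-time counter. As a small sketch, the following builds a ten-entry table of squares; the names Index and Squares are hypothetical.

```asm
Index           =       0
Squares         label   word            ;Name the start of the table.
                repeat  10
                word    Index*Index     ;Emits 0, 1, 4, 9, ..., 81.
Index           =       Index+1
                endm
```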
Note that the repeat loop executes at assembly time, not at run time. Repeat is not a mechanism for creating loops within your program; use it for replicating sections of code within your program. If you want to create a loop that executes some number of times within your program, use the loop instruction. Although the following two code sequences produce the same result, they are not the same:
; Code sequence using a run-time loop:

                mov     cx, 10
AddLp:          add     ax, [bx]
                add     bx, 2
                loop    AddLp

; Code sequence using an assembly-time loop:

                repeat  10
                add     ax, [bx]
                add     bx, 2
                endm
The first code sequence above emits four machine instructions to the object code file. At run time, the 80x86 CPU executes the statements between AddLp and the loop instruction ten times under the control of the loop instruction. The second code sequence above emits 20 instructions to the object code file. At run time, the 80x86 CPU simply executes these 20 instructions sequentially, with no control transfer. The second form will be faster, since the 80x86 does not have to execute the loop instruction every third instruction. On the other hand, the second version is also much larger because it replicates the body of the loop ten times in the object code file.

Unlike standard macros, you do not define and invoke repeat macros separately. MASM emits the code between the repeat and endm directives upon encountering the repeat directive. There isn't a separate invocation phase. If you want to create a repeat macro that can be invoked throughout your program, consider the following:

REPTMacro       macro   Count
                repeat  Count
                <statements>
                endm
                endm
By placing the repeat macro inside a standard macro, you can invoke the repeat macro anywhere in your program by invoking the REPTMacro macro. Note that you need two endm directives, one to terminate the repeat macro, one to terminate the standard macro.

Rept is a synonym for repeat. Repeat is the newer form; MASM supports rept for compatibility with older source files. You should always use the repeat form.
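As a concrete instance of wrapping repeat in a standard macro, the sketch below defines a hypothetical ShlAX macro that shifts ax left by a constant count using only the 8086 shift-by-one form.

```asm
ShlAX           macro   Count
                repeat  Count
                shl     ax, 1
                endm                    ;Terminates the repeat loop.
                endm                    ;Terminates the ShlAX macro.

                ShlAX   3               ;Emits three "shl ax, 1" instructions.
```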
8.16 The FOR and FORC Macro Operations

Another form of the repeat macro is the for macro. This macro takes the following form:

                for     parameter,
                <statements>
                endm

The angle brackets are required around the items in the operand field of the for directive. The braces surround optional items; the braces themselves should not appear in the operand field. The for directive replicates the instructions between for and endm once for each item appearing in the operand field. Furthermore, for each iteration, the first symbol in the operand field is assigned the value of the successive items from the second parameter. Consider the following loop:

                for     value,<0,1,2,3,4,5>
                byte    value
                endm
This loop emits six bytes containing the values zero, one, two, ..., five. It is absolutely identical to the sequence of instructions:

                byte    0
                byte    1
                byte    2
                byte    3
                byte    4
                byte    5

Remember, the for loop, like the repeat loop, executes at assembly time, not at run time. The second operand of for need not be a literal text constant; you can supply a macro parameter, macro function result, or a text equate for this value. Keep in mind, though, that this parameter must expand to a text value with the text delimiters around it.

Irp is an older, obsolete synonym for for. MASM allows irp to provide compatibility with older source code. However, you should always use the for directive.
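The items in a for list need not be numbers; any text that makes sense in the loop body will do. The following sketch, which uses the hypothetical parameter name reg, emits one push instruction per register named in the list.

```asm
                for     reg, <ax, bx, cx, dx>
                push    reg             ;Emits push ax, push bx, push cx, push dx.
                endm
```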
The third form of the loop macro is the forc macro. It differs from the for macro in that it repeats a loop the number of times specified by the length of a character string rather than by the number of operands present. The syntax for the forc directive is

                forc    parameter,<string>
                <statements>
                endm

The statements in the loop repeat once for each character in the string operand. The angle brackets must appear around the string. Consider the following loop:

                forc    value,<012345>
                byte    value
                endm

This loop produces the same code as the example for the for directive above.

Irpc is an old synonym for forc provided for compatibility reasons. You should always use forc in your new code.
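Because forc hands the loop body one character at a time, it combines naturally with the “&” substitution operator to build longer symbols. In this sketch the hypothetical parameter r takes on the characters a, b, c, and d in turn, and r&x pastes each one onto the letter x to form a register name.

```asm
                forc    r, <abcd>
                push    r&x             ;Emits push ax, push bx, push cx, push dx.
                endm
```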
8.17 The WHILE Macro Operation

The while macro lets you repeat a sequence of code in your assembly language file an indefinite number of times. An assembly time expression, which while evaluates before emitting the code for each pass through the loop, determines whether the loop repeats. The syntax for this macro is

                while   expression
                <Statements>
                endm

This macro evaluates the assembly-time expression; if this expression’s value is zero, the while macro ignores the statements up to the corresponding endm directive. If the expression evaluates to a non-zero value (true), then MASM assembles the statements up to the endm directive and reevaluates the expression to see if it should assemble the body of the while loop again.

Normally, the while directive repeats the statements between the while and endm as long as the expression evaluates true. However, you can also use the exitm directive to prematurely terminate the expansion of the loop body. Keep in mind that you need to provide some condition that terminates the loop; otherwise MASM will go into an infinite loop and continually emit code to the object code file until the disk fills up (or it will simply go into an infinite loop if the loop does not emit any code).
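A while loop is the natural choice when the number of repetitions is not known in advance but a stopping condition is. As a minimal sketch (the symbol Power is hypothetical), the following emits a table of the powers of two from 1 through 512, stopping as soon as the value would exceed 1000.

```asm
Power           =       1
                while   Power le 1000
                word    Power           ;Emits 1, 2, 4, ..., 512.
Power           =       Power*2         ;This update is what ends the loop.
                endm
```

Leaving out the statement that updates Power would produce exactly the runaway expansion the text warns about.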
8.18 Macro Parameters

Standard MASM macros are very flexible. If the number of actual parameters (those supplied in the operand field of the macro invocation) does not match the number of formal parameters (those appearing in the operand field of the macro definition), MASM won’t necessarily complain. If there are more actual parameters than formal parameters, MASM ignores the extra parameters and generates a warning. If there are more formal parameters than actual parameters, MASM substitutes the empty string (“<>”) for the extra formal parameters. By using the ifb and ifnb conditional assembly directives, you can test this last condition.

While this parameter substitution technique is flexible, it also leaves open the possibility of error. If you want to require that the programmer supply exactly three parameters and they actually supply fewer, MASM will not generate an error. If you forget to test for the presence of each parameter using ifb, you could generate bad code. To overcome this limitation, MASM provides the ability to specify that certain macro parameters are required. You can also assign a default value to a parameter if the programmer doesn’t supply one. Finally, MASM also provides facilities to allow a variable number of macro arguments.

If you want to require a programmer to supply a particular macro parameter, simply put “:req” after the macro parameter in the macro definition. At assembly time, MASM will generate an error if that particular macro parameter is missing.

Needs2Parms     macro   parm1:req, parm2:req
                 .
                 .
                 .
                endm
                 .
                 .
                 .
                Needs2Parms ax          ;Generates an error.
                Needs2Parms             ;Generates an error.
                Needs2Parms ax, bx      ;Works fine.
Another possibility is to have the macro supply a default value for a macro parameter if it is missing from the actual parameter list. To do this, simply use the “:=” operator immediately after the parameter name in the formal parameter list. For example, the int 10h BIOS function provides various video services. One of the most commonly used video services is the ah=0eh function that outputs the character in al to the video display. The following macro lets the caller specify which function they want to use, and defaults to function 0eh if they don’t specify a parameter:

Video           macro   service := <0eh>
                mov     ah, service
                int     10h
                endm
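A short usage sketch shows both ways of invoking such a macro. It assumes the Video macro defined above; the choice of BIOS function 03h (read cursor position, with the display page in bh) for the explicit case is only an illustration.

```asm
                mov     al, '*'
                Video                   ;Expands to: mov ah, 0eh / int 10h

                mov     bh, 0           ;Display page zero.
                Video   03h             ;Expands to: mov ah, 03h / int 10h
```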
The last feature MASM’s macros support is the ability to process a variable number of parameters. To do this you simply place the operator “:vararg” after the last formal parameter in the parameter list. MASM associates the first n actual parameters with the corresponding formal parameters appearing before the variable argument; it then creates a text equate of all remaining parameters to the formal parameter suffixed with the “:vararg” operator. You can use the for macro to extract each parameter from this variable argument list. For example, the following macro lets you declare an arbitrary number of two dimensional arrays, all the same size. The first two parameters specify the number of rows and columns; the remaining optional parameters specify the names of the arrays:

MkArrays        macro   NumRows:req, NumCols:req, Names:vararg
                for     AryName, <Names>
&AryName&       word    NumRows dup (NumCols dup (?))
                endm
                endm
                 .
                 .
                 .
                MkArrays 8, 12, A, B, X, Y
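For reference, the invocation above would be expected to expand into one declaration per name in the variable argument list, along the lines of the following sketch:

```asm
A               word    8 dup (12 dup (?))
B               word    8 dup (12 dup (?))
X               word    8 dup (12 dup (?))
Y               word    8 dup (12 dup (?))
```

Each array therefore reserves 8*12 = 96 uninitialized words.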
8.19 Controlling the Listing

MASM provides several assembler directives that are useful for controlling the output of the assembler. These directives include echo, %out, title, subttl, page, .list, .nolist, and .xlist. There are several others, but these are the most important.
8.19.1 The ECHO and %OUT Directives

The echo and %out directives simply print whatever appears in their operand fields to the video display during assembly. Some examples of echo and %out appeared in the sections on conditional assembly and macros. Note that %out is an older form of echo provided for compatibility with old source code. You should use echo in all your new code.
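A typical use is to report, during assembly, which conditional path was taken. In the following sketch the symbol Debug is a hypothetical assembly-time flag.

```asm
Debug           =       1               ;Hypothetical assembly-time flag.

                if      Debug
                echo    Assembling with debugging code enabled.
                endif
```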
8.19.2 The TITLE Directive

The title assembler directive assigns a title to your source file. Only one title directive may appear in your program. The syntax for this directive is

                title   text

MASM will print the specified text at the top of each page of the assembled listing.
8.19.3 The SUBTTL Directive

The subttl (subtitle) directive is similar to the title directive, except multiple subtitles may appear within your source file. Subtitles appear immediately below the title at the top of each page in the assembled listing. The syntax for the subttl directive is

                subttl  text

The specified text will become the new subtitle. Note that MASM will not print the new subtitle until the next page eject. If you wish to place the subtitle on the same page as the code immediately following the directive, use the page directive (described next) to force a page ejection.
8.19.4 The PAGE Directive

The page directive performs two functions: it can force a page eject in the assembly listing, and it can set the width and length of the output device. To force a page eject, use the following form of the page directive:

                page

If you place a plus sign, “+”, in the operand field, then MASM performs a page break, increments the section number, and resets the page number to one. MASM prints page numbers using the format

                section-page

If you want to take advantage of the section number facility, you will have to manually insert page breaks (with a “+” operand) in front of each new section.

The second form of the page command lets you set the printer page width and length values. It takes the form:

                page    length, width

where length is the number of lines per page (defaults to 50, but 56-60 is a better choice for most printers) and width is the number of characters per line. The default page width is 80 characters. If your printer is capable of printing 132 columns, you should change this value to 132 so your listings will be easier to read. Note that some printers, even if their carriage is only 8-1/2" wide, will print at least 132 columns across in a condensed mode. Typically some control character must be sent to the printer to place it in condensed mode. You can insert such a control character in a comment at the beginning of your source listing.
8.19.5 The .LIST, .NOLIST, and .XLIST Directives

The .list, .nolist, and .xlist directives can be used to selectively list portions of your source file during assembly. .List turns the listing on; .Nolist turns the listing off. .Xlist is an obsolete form of .Nolist for older code. By sprinkling these three directives throughout your source file, you can list only those sections of code that interest you. None of these directives accept any operands. They take the following forms:

                .list
                .nolist
                .xlist
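The listing directives described in this section are normally used together. The fragment below is a minimal sketch of one way to combine them at the top of a source file. The title and subtitle text and the 58-line page length are arbitrary choices made for illustration, and stdlib.a is assumed to be available on the include path.

```asm
                title   Cross Product Demo      ;Only one title per program.
                subttl  Library Declarations
                page    58, 132                 ;58 lines per page, 132 columns.

                .nolist                         ;Keep the library macros out
                include stdlib.a                ; of the listing.
                .list                           ;Resume listing here.

                subttl  Main Program
                page    +                       ;New section; page number
                                                ; restarts at one.
                echo    Assembling the main program...
```

Placing the second subttl immediately before “page +” means the new subtitle should appear on the very next listing page, since MASM defers a subtitle change until the next page eject.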
8.19.6 Other Listing Directives

MASM provides several other listing control directives that this chapter will not cover. These let you control the output of macros, conditional assembly segments, and so on to the listing file. Please see the appendices for details on these directives.
8.20 Managing Large Programs

Most assembly language programs are not totally stand alone programs. In general, you will call various standard library or other routines which are not defined in your main program. For example, you’ve probably noticed by now that the 80x86 doesn’t provide any instructions like “read”, “write”, or “printf” for doing I/O operations. In fact, the only instructions you’ve seen that do I/O include the 80x86 in and out instructions, which are really just special mov instructions, and the echo/%out directives that perform assembly-time output, not the run-time output you want.

Is there no way to do I/O from assembly language? Of course there is. You can write procedures that perform the I/O operations like “read” and “write”. Unfortunately, writing such routines is a complex task, and beginning assembly language programmers are not ready for such tasks. That’s where the UCR Standard Library for 80x86 Assembly Language Programmers comes in. This is a package of procedures you can call to perform simple I/O operations like “printf”.

The UCR Standard Library contains thousands of lines of source code. Imagine how difficult programming would be if you had to merge these thousands of lines of code into your simple programs. Fortunately, you don’t have to.

For small programs, working with a single source file is fine. For large programs this gets very cumbersome (consider the example above of having to include the entire UCR Standard Library into each of your programs). Furthermore, once you’ve debugged and tested a large section of your code, continuing to assemble that same code when you make a small change to some other part of your program is a waste of time. The UCR Standard Library, for example, takes several minutes to assemble, even on a fast machine. Imagine having to wait five or ten minutes on a fast Pentium machine to assemble a program to which you’ve made a one line change!

As with HLLs, the solution is separate compilation (or separate assembly in MASM’s case). First, you break up your large source files into manageable chunks. Then you assemble the separate files into object code modules. Finally, you link the object modules together to form a complete program. If you need to make a small change to one of the modules, you only need to reassemble that one module; you do not need to reassemble the entire program.

The UCR Standard Library works in precisely this way. The Standard Library is already assembled and ready to use. You simply call routines in the Standard Library and link your code with the Standard Library using a linker program. This saves a tremendous amount of time when developing a program that uses the Standard Library code. Of course, you can easily create your own object modules and link them together with your code. You could even add new routines to the Standard Library so they will be available for use in future programs you write.

“Programming in the large” is a term software engineers have coined to describe the processes, methodologies, and tools for handling the development of large software projects. While everyone has their own idea of what “large” is, separate compilation, together with some conventions for using it, is one of the principal techniques for “programming in the large.” The following sections describe the tools MASM provides for separate compilation and how to effectively employ these tools in your programs.
8.20.1 The INCLUDE Directive

The include directive, when encountered in a source file, switches program input from the current file to the file specified in the parameter list of the include. This allows you to construct text files containing common equates, macros, source code, and other assembler items, and include such a file into the assembly of several separate programs. The syntax for the include directive is

                include filename

Filename must be a valid DOS filename. MASM merges the specified file into the assembly at the point of the include directive. Note that you can nest include statements inside files you include. That is, a file being included into another file during assembly may itself include a third file.

Using the include directive by itself does not provide separate compilation. You could use the include directive to break up a large source file into separate modules and join these modules together when you assemble your file. The following example would include the PRINTF.ASM and PUTC.ASM files during the assembly of your program:

                include printf.asm
                include putc.asm
                 .
                 .
                 .
                end

Now your program will benefit from the modularity gained by this approach. Alas, you will not save any development time. The include directive inserts the source file at the point of the include during assembly, exactly as though you had typed that code in yourself. MASM still has to assemble the code, and that takes time. Were you to include all the files for the Standard Library routines, your assemblies would take forever.

In general, you should not use the include directive to include source code as shown above [16]. Instead, you should use the include directive to insert a common set of constants (equates), macros, external procedure declarations, and other such items into a program. Typically an assembly language include file does not contain any machine code (outside of a macro). The purpose of using include files in this manner will become clearer after you see how the public and external declarations work.
16. There is nothing wrong with this, other than the fact that it does not take advantage of separate compilation.
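As a small sketch of the recommended style, the hypothetical include file below contains only equates and a macro, so including it emits no machine code until the macro is invoked. The file name consts.a and the names MaxLen and ExitPgm are assumptions made for this illustration; they are not part of the UCR Standard Library.

```asm
; consts.a -- hypothetical include file holding equates and macros only.

cr              equ     0Dh             ;ASCII carriage return
lf              equ     0Ah             ;ASCII line feed
MaxLen          equ     128             ;Hypothetical buffer size

; ExitPgm (hypothetical) returns control to DOS.

ExitPgm         macro
                mov     ah, 4Ch         ;DOS terminate-process function
                int     21h
                endm

; Each source file that needs these definitions would then contain:
;
;               include consts.a
```

Because nothing in such a file generates code on its own, including it in many modules costs very little assembly time and keeps the shared definitions in one place.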
8.20.2 The PUBLIC, EXTERN, and EXTRN Directives

Technically, the include directive provides you with all the facilities you need to create modular programs. You can build up a library of modules, each containing some specific routine, and include any necessary modules into an assembly language program using the appropriate include commands. MASM (and the accompanying LINK program) provides a better way: external and public symbols.

One major problem with the include mechanism is that once you've debugged a routine, including it into an assembly wastes a lot of time since MASM must reassemble bug-free code every time you assemble the main program. A much better solution would be to preassemble the debugged modules and link the object code modules together rather than reassembling the entire program every time you change a single module. This is what the public and extern directives provide for you. Extrn is an older directive that is a synonym for extern. It provides compatibility with old source files. You should always use the extern directive in new source code.

To use the public and extern facilities, you must create at least two source files. One file contains a set of variables and procedures used by the second. The second file uses those variables and procedures without knowing how they're implemented. To demonstrate, consider the following two modules:

;Module #1:

                public  Var1, Var2, Proc1

DSEG            segment para public 'data'
Var1            word    ?
Var2            word    ?
DSEG            ends

CSEG            segment para public 'code'
                assume  cs:cseg, ds:dseg
Proc1           proc    near
                mov     ax, Var1
                add     ax, Var2
                mov     Var1, ax
                ret
Proc1           endp
CSEG            ends
                end

;Module #2:

                extern  Var1:word, Var2:word, Proc1:near

CSEG            segment para public 'code'
                 .
                 .
                 .
                mov     Var1, 2
                mov     Var2, 3
                call    Proc1
                 .
                 .
                 .
CSEG            ends
                end
Module #2 references Var1, Var2, and Proc1, yet these symbols are external to module #2. Therefore, you must declare them external with the extern directive. This directive takes the following form:

                extern  name:type {,name:type...}

Name is the name of the external symbol, and type is the type of that symbol. Type may be any of near, far, proc, byte, word, dword, qword, tbyte, abs (absolute, which is a constant), or some other user defined type.

The current module uses this type declaration. Neither MASM nor the linker checks the declared type against the module defining name to see if the types agree. Therefore, you must exercise caution when defining external symbols. The public directive lets you export a symbol's value to external modules. A public declaration takes the form:

                public  name {,name ...}

Each symbol appearing in the operand field of the public statement is available as an external symbol to another module. Likewise, all external symbols within a module must appear within a public statement in some other module.

Once you create the source modules, you should assemble the file containing the public declarations first. With MASM 6.x, you would use a command like

                ML /c pubs.asm

The “/c” option tells MASM to perform a “compile-only” assembly. That is, it will not try to link the code after a successful assembly. This produces a “pubs.obj” object module. Next, assemble the file containing the external definitions and link in the code using the MASM command:

                ML exts.asm pubs.obj

Assuming there are no errors, this will produce a file “exts.exe” which is the linked and executable form of the program.

Note that the extern directive defines a symbol in your source file. Any attempt to redefine that symbol elsewhere in your program will produce a “duplicate symbol” error. This, as it turns out, is the source of problems which Microsoft solved with the externdef directive.

[Figure 8.8: Using a Single Include File for Implementation and Using Modules. Both the implementation module and the using module contain the statement INCLUDE Header.a.]
8.20.3 The EXTERNDEF Directive

The externdef directive is a combination of public and extern all rolled into one. It uses the same syntax as the extern directive; that is, you place a list of name:type entries in the operand field. If MASM does not encounter another definition of the symbol in the current source file, externdef behaves exactly like the extern statement. If the symbol does appear in the source file, then externdef behaves like the public command. With externdef there really is no need to use the public or extern statements unless you feel somehow compelled to do so.

The important benefit of the externdef directive is that it lets you minimize duplication of effort in your source files. Suppose, for example, you want to create a module with a bunch of support routines for other programs. In addition to sharing some routines and some variables, suppose you want to share constants and macros as well. The include file mechanism provides a perfect way to handle this. You simply create an include file containing the constants, macros, and externdef definitions and include this file in the module that implements your routines and in the modules that use those routines (see Figure 8.8).

Note that extern and public wouldn’t work in this case because the implementation module needs the public directive and the using module needs the extern directive. You would have to create two separate header files. Maintaining two separate header files that contain mostly identical definitions is not a good idea. The externdef directive provides a solution.

Within your header files you should create segment definitions that match those in the including modules. Be sure to put the externdef directives inside the same segments in which the symbol is actually defined. This associates a segment value with the symbol so that MASM can properly make appropriate optimizations and other calculations based on the symbol’s full address:

; From "HEADER.A" file:

cseg            segment para public 'code'

                externdef Routine1:near, Routine2:far

cseg            ends

dseg            segment para public 'data'

                externdef i:word, b:byte, flag:byte

dseg            ends
This text adopts the UCR Standard Library convention of using an “.a” suffix for assembly language header files. Other common suffixes in use include “.inc” and “.def”.
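The following sketch shows how the HEADER.A declarations above might be shared by an implementation module and a using module. The file names impl.asm and user.asm, the bodies of Routine1 and Routine2, and the Main procedure are all assumptions made for illustration. Only the symbol names and types come from the header.

```asm
; ----- impl.asm: hypothetical implementation module -----
; The symbols are defined here, so externdef acts like public.

                include header.a

dseg            segment para public 'data'
i               word    0
b               byte    0
flag            byte    0
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

Routine1        proc    near            ;Placeholder body: bump i.
                inc     i
                ret
Routine1        endp

Routine2        proc    far             ;Placeholder body: set flag.
                mov     flag, 1
                ret
Routine2        endp

cseg            ends
                end

; ----- user.asm: hypothetical using module -----
; Nothing is defined here, so externdef acts like extern.

                include header.a

sseg            segment para stack 'stack'
                word    128 dup (?)
sseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

Main            proc    far
                mov     ax, dseg        ;Point DS at the shared data segment.
                mov     ds, ax
                mov     i, 5
                call    Routine1        ;Near call within cseg.
                call    Routine2        ;Far call, as declared in the header.
                mov     ax, 4C00h       ;Return to DOS.
                int     21h
Main            endp

cseg            ends
                end     Main
```

Following the ML commands shown in the previous section, assembling impl.asm with “/c” and then assembling user.asm together with impl.obj should produce the executable. Note that neither module contains a public or extern statement; the single header serves both.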
8.21 Make Files

Although using separate compilation reduces assembly time and promotes code reuse and modularity, it is not without its own drawbacks. Suppose you have a program that consists of two modules: pgma.asm and pgmb.asm. Also suppose that you’ve already assembled both modules so that the files pgma.obj and pgmb.obj exist. Finally, you make changes to pgma.asm and pgmb.asm and assemble pgma.asm but forget to assemble pgmb.asm. The pgmb.obj file will then be out of date, since this object file does not reflect the changes made to the pgmb.asm file. If you link the program’s modules together, the resulting .exe file will only contain the changes to the pgma.asm file; it will not have the updated object code associated with pgmb.asm.

As projects get larger, as they have more modules associated with them, and as more programmers begin working on the project, it gets very difficult to keep track of which object modules are up to date. This complexity would normally cause someone to reassemble (or recompile) all modules in a project, even if many of the .obj files are up to date, simply because it might seem too difficult to keep track of which modules are up to date and which are not. Doing so, of course, would eliminate many of the benefits that separate compilation offers.

Fortunately, there is a tool that can help you manage large projects: nmake. The nmake program, with a little help from you, can figure out which files need to be reassembled and which files have up to date .obj files. With a properly defined make file, you can easily assemble only those modules that absolutely must be assembled to generate a consistent program.

A make file is a text file that lists assembly-time dependencies between files. An .exe file, for example, is dependent on the source code whose assembly produces the executable. If you make any changes to the source code you will (probably) need to reassemble or recompile the source code to produce a new .exe file [17].

Typical dependencies include the following:

•   An executable file (.exe) generally depends only on the set of object files (.obj) that the linker combines to form the executable.

•   A given object code file (.obj) depends on the assembly language source files that were assembled to produce that object file. This includes the assembly language source files (.asm) and any files included during that assembly (generally .a files).

•   The source files and include files generally don’t depend on anything.

17. Obviously, if you only change comments or other statements in the source file that do not affect the executable file, a recompile or reassembly will not be necessary. To be safe, though, we will assume any change to the source file will require a reassembly.
A make file generally consists of a dependency statement followed by a set of commands to handle that dependency. A dependency statement takes the following form:

dependent-file : list of files

Example:

pgm.exe: pgma.obj pgmb.obj

This statement says that “pgm.exe” is dependent upon pgma.obj and pgmb.obj. Any changes that occur to pgma.obj or pgmb.obj will require the generation of a new pgm.exe file.

The nmake.exe program uses a time/date stamp to determine if a dependent file is out of date with respect to the files it depends upon. Any time you make a change to a file, MS-DOS and Windows will update a modification time and date associated with the file. The nmake.exe program compares the modification date/time stamp of the dependent file against the modification date/time stamp of the files it depends upon. If the dependent file’s modification date/time is earlier than one or more of the files it depends upon, or one of the files it depends upon is not present, then nmake.exe assumes that some operation must be necessary to update the dependent file.

When an update is necessary, nmake.exe executes the set of (MS-DOS) commands following the dependency statement. Presumably, these commands would do whatever is necessary to produce the updated file.

The dependency statement must begin in column one. Any commands that must execute to resolve the dependency must start on the line immediately following the dependency statement, and each command must be indented one tab stop. The pgm.exe statement above would probably look something like the following:

pgm.exe: pgma.obj pgmb.obj
        ml /Fepgm.exe pgma.obj pgmb.obj

(The “/Fepgm.exe” option tells MASM to name the executable file “pgm.exe.”)

If you need to execute more than one command to resolve the dependencies, you can place several commands after the dependency statement in the appropriate order. Note that you must indent all commands one tab stop. Nmake.exe ignores any blank lines in a make file. Therefore, you can add blank lines, as appropriate, to make the file easier to read and understand.

There can be more than a single dependency statement in a make file. In the example above, pgm.exe depends upon the pgma.obj and pgmb.obj files. Obviously, the .obj files depend upon the source files that generated them. Therefore, before attempting to resolve the dependencies for pgm.exe, nmake.exe will first check the rest of the make file to see if pgma.obj or pgmb.obj depends on anything. If they do, nmake.exe will resolve those dependencies first. Consider the following make file:

pgm.exe: pgma.obj pgmb.obj
        ml /Fepgm.exe pgma.obj pgmb.obj

pgma.obj: pgma.asm
        ml /c pgma.asm

pgmb.obj: pgmb.asm
        ml /c pgmb.asm

The nmake.exe program will process the first dependency line it finds in the file. However, the files pgm.exe depends upon themselves have dependency lines. Therefore, nmake.exe will first ensure that pgma.obj and pgmb.obj are up to date before attempting to execute MASM to link these files together. So if the only change you’ve made has been to pgmb.asm, nmake.exe takes the following steps (assuming pgma.obj exists and is up to date):
1.  Nmake.exe processes the first dependency statement. It notices that dependency lines for pgma.obj and pgmb.obj (the files on which pgm.exe depends) exist. So it processes those statements first.

2.  Nmake.exe processes the pgma.obj dependency line. It notices that the pgma.obj file is newer than the pgma.asm file, so it does not execute the command following this dependency statement.

3.  Nmake.exe processes the pgmb.obj dependency line. It notes that pgmb.obj is older than pgmb.asm (since we just changed the pgmb.asm source file). Therefore, nmake.exe executes the DOS command on the next line. This generates a new pgmb.obj file that is now up to date.

4.  Having processed the pgma.obj and pgmb.obj dependencies, nmake.exe now returns its attention to the first dependency line. Since nmake.exe just created a new pgmb.obj file, its date/time stamp will be newer than pgm.exe’s. Therefore, nmake.exe will execute the ml command that links pgma.obj and pgmb.obj together to form the new pgm.exe file.
Note that a properly written make file will instruct nmake.exe to assemble only those modules absolutely necessary to produce a consistent executable file. In the example above, nmake.exe did not bother to assemble pgma.asm since its object file was already up to date.

There is one final thing to emphasize with respect to dependencies. Often, object files are dependent not only on the source file that produces the object file, but on any files that the source file includes as well. In the previous example, there (apparently) were no such include files. Often, this is not the case. A more typical make file might look like the following:

pgm.exe: pgma.obj pgmb.obj
        ml /Fepgm.exe pgma.obj pgmb.obj

pgma.obj: pgma.asm pgm.a
        ml /c pgma.asm

pgmb.obj: pgmb.asm pgm.a
        ml /c pgmb.asm

Note that any changes to the pgm.a file will force nmake.exe to reassemble both pgma.asm and pgmb.asm since the pgma.obj and pgmb.obj files both depend upon the pgm.a include file. Leaving include files out of a dependency list is a common mistake programmers make that can produce inconsistent .exe files.

Note that you would not normally need to specify the UCR Standard Library include files nor the Standard Library .lib files in the dependency list. True, your resulting .exe file does depend on this code, but the Standard Library rarely changes, so you can safely leave it out of your dependency list. Should you make a modification to the Standard Library, simply delete any old .exe and .obj files and force a reassembly of the entire system.

Nmake.exe, by default, assumes that it will be processing a make file named “makefile”. When you run nmake.exe, it looks for “makefile” in the current directory. If it doesn’t find this file, it complains and terminates [18]. Therefore, it is a good idea to collect the files for each project you work on into their own subdirectory and give each project its own makefile. Then to create an executable, you need only change into the appropriate subdirectory and run the nmake.exe program.

Although this section discusses the nmake program in sufficient detail to handle most projects you will be working on, keep in mind that nmake.exe provides considerable functionality that this chapter does not discuss. To learn more about the nmake.exe program, consult the documentation that comes with MASM.
18. There is a command line option that lets you specify the name of the makefile. See the nmake documentation in the MASM manuals for more details.
8.22 Sample Program

Here is a single program that demonstrates most of the concepts from this chapter. This program consists of several files, including a makefile, that you can assemble and link using the nmake.exe program. This particular sample program computes “cross products” of various functions. The multiplication table you learned in school is a good example of a cross product; so are the truth tables found in Chapter Two of your textbook. This particular program generates cross product tables for addition, subtraction, multiplication, division, and, optionally, remainder (modulo). In addition to demonstrating several concepts from this chapter, this sample program also demonstrates how to manipulate dynamically allocated arrays. The program asks the user to input the matrix size (row and column sizes) and then computes an appropriate set of cross products for that array.
8.22.1 EX8.MAK

The cross product program contains several modules. The following make file assembles all necessary files to ensure a consistent .EXE file.

ex8.exe: ex8.obj geti.obj getarray.obj xproduct.obj matrix.a
        ml ex8.obj geti.obj getarray.obj xproduct.obj

ex8.obj: ex8.asm matrix.a
        ml /c ex8.asm

geti.obj: geti.asm matrix.a
        ml /c geti.asm

getarray.obj: getarray.asm matrix.a
        ml /c getarray.asm

xproduct.obj: xproduct.asm matrix.a
        ml /c xproduct.asm
8.22.2 Matrix.A

MATRIX.A is the header file containing definitions that the cross product program uses. It also contains all the externdef statements for all externally defined routines.

; MATRIX.A
;
; This include file provides the external definitions
; and data type definitions for the matrix sample program
; in Chapter Eight.
;
; Some useful type definitions:

Integer         typedef word
Char            typedef byte

; Some common constants:

Bell            equ     07      ;ASCII code for the bell character.

; A "Dope Vector" is a structure containing information about arrays that
; a program allocates dynamically during program execution. This particular
; dope vector handles two dimensional arrays. It uses the following fields:
;
;       TTL-    Points at a zero terminated string containing a description
;               of the data in the array.
;
;       Func-   Pointer to function to compute for this matrix.
;
;       Data-   Pointer to the base address of the array.
;
;       Dim1-   This is a word containing the number of rows in the array.
;
;       Dim2-   This is a word containing the number of elements per row
;               in the array.
;
;       ESize-  Contains the number of bytes per element in the array.

DopeVec         struct
TTL             dword   ?
Func            dword   ?
Data            dword   ?
Dim1            word    ?
Dim2            word    ?
ESize           word    ?
DopeVec         ends

; Some text equates the matrix code commonly uses:

Base            textequ <es:[di]>

byp             textequ <byte ptr>
wp              textequ <word ptr>
dp              textequ <dword ptr>

; Procedure declarations.

InpSeg          segment para public 'input'

                externdef geti:far
                externdef getarray:far

InpSeg          ends

cseg            segment para public 'code'

                externdef CrossProduct:near

cseg            ends

; Variable declarations

dseg            segment para public 'data'

                externdef InputLine:byte

dseg            ends

; Uncomment the following equates if you want to turn on the
; debugging statements or if you want to include the MODULO function.

;debug          equ     0
;DoMOD          equ     0
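To show how the dope vector fields and text equates in MATRIX.A might be used together, here is a minimal sketch that fetches one element of a dynamically allocated array. GetElement is a hypothetical routine, not part of the sample program. It assumes row-major storage (consistent with Dim2 being the number of elements per row), zero-based indices, word-sized elements, an array smaller than 64K, and that matrix.a has been included.

```asm
; GetElement-   Hypothetical helper (an assumption, not from the text).
;
;       On Entry:
;               DS:SI points at a DopeVec structure.
;               BX contains the zero-based row index.
;               CX contains the zero-based column index.
;
;       On Exit:
;               AX contains the word at array[Row][Col].
;               DX, DI, and ES are modified.

GetElement      proc    near
                mov     ax, bx                  ;AX = Row
                mul     [si].DopeVec.Dim2       ;AX = Row * (elements per row)
                add     ax, cx                  ;AX = Row * Dim2 + Col
                mul     [si].DopeVec.ESize      ;AX = byte offset of the element
                les     di, [si].DopeVec.Data   ;ES:DI = base address of array
                add     di, ax                  ;ES:DI = address of the element
                mov     ax, wp Base             ;Same as: mov ax, word ptr es:[di]
                ret
GetElement      endp
```

The last instruction relies on the Base and wp text equates expanding to “word ptr es:[di]”; spelling the operand out in full should assemble to the same instruction.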
8.22.3 EX8.ASM

This is the main program. It calls appropriate routines to get the user input, compute the cross product, and print the result.

; Sample program for Chapter Eight.
; Demonstrates the use of many MASM features discussed in Chapter Eight
; including label types, constants, segment ordering, procedures, equates,
; address expressions, coercion and type operators, segment prefixes,
; the assume directive, conditional assembly, macros, listing directives,
; separate assembly, and using the UCR Standard Library.
;
; Include the header files for the UCR Standard Library. Note that the
; "stdlib.a" file defines two segments; MASM will load these segments into
; memory before "dseg" in this program.
;
; The ".nolist" directive tells MASM not to list out all the macros for
; the standard library when producing an assembly listing. Doing so would
; increase the size of the listing by many tens of pages and would tend to
; obscure the real code in this program.
;
; The ".list" directive turns the listing back on after MASM gets past the
; standard library files. Note that these two directives (".nolist" and
; ".list") are only active if you produce an assembly listing using MASM's
; "/Fl" command line parameter.

                .nolist
                include stdlib.a
                includelib stdlib.lib
                .list

; The following statement includes the special header file for this
; particular program. The header file contains external definitions
; and various data type definitions.

                include matrix.a

; The following two statements allow us to use 80386 instructions
; in the program. The ".386" directive turns on the 80386 instruction set,
; the "option" directive tells MASM to use 16-bit segments by default (when
; using 80386 instructions, 32-bit segments are the default). DOS real mode
; programs must be written using 16-bit segments.

                .386
                option  segment:use16

dseg            segment para public 'data'

Rows            integer ?               ;Number of rows in matrices
Columns         integer ?               ;Number of columns in matrices

; InputLine is an input buffer this code uses to read a string of text
; from the user. In particular, the GetWholeNumber procedure passes the
; address of InputLine to the GETS routine that reads a line of text
; from the user and places each character into this array. GETS reads
; a maximum of 127 characters plus the enter key from the user. It zero
; terminates that string (replacing the ASCII code for the ENTER key with
; a zero). Therefore, this array needs to be at least 128 bytes long to
; prevent the possibility of buffer overflow.
;
; Note that the GetArray module also uses this array.

InputLine       char    128 dup (0)

; The following two pointers point at arrays of integers.
; This program dynamically allocates storage for the actual array data
; once the user tells the program how big the arrays should be. The
; Rows and Columns variables above determine the respective sizes of
; these arrays. After allocating the storage with a call to MALLOC,
; this program stores the pointers to these arrays into the following
; two pointer variables.

RowArray        dword   ?               ;Pointer to Row values
ColArray        dword   ?               ;Pointer to column values.

; ResultArrays is an array of dope vectors(*) to hold the results
; from the matrix operations:
;
; [0]- addition table
; [1]- subtraction table
; [2]- multiplication table
; [3]- division table
;
; [4]- modulo (remainder) table -- if the symbol "DoMOD" is defined.
;
; The equate that follows the ResultArrays declaration computes the number
; of elements in the array. "$" is the offset into dseg immediately after
; the last byte of ResultArrays. Subtracting the offset of ResultArrays
; from this value computes the number of bytes in ResultArrays. Dividing
; this by the size of a single dope vector produces the number of elements
; in the array. This is an excellent example of how you can use address
; expressions in an assembly language program.
;
; The IFDEF DoMOD code demonstrates how easy it is to extend this matrix.
; Defining the symbol "DoMOD" adds another entry to this array. The
; rest of the program adjusts for this new entry automatically.
;
; You can easily add new items to this array of dope vectors. You will
; need to supply a title and a function to compute the matrix's entries.
; Other than that, however, this program automatically adjusts to any new
; entries you add to the dope vector array.
;
; (*) A "Dope Vector" is a data structure that describes a dynamically
; allocated array. A typical dope vector contains the maximum value for
; each dimension, a pointer to the array data in memory, and some other
; possible information. This program also stores a pointer to an array
; title and a pointer to an arithmetic function in the dope vector.

ResultArrays    DopeVec {AddTbl,Addition}, {SubTbl,Subtraction}
                DopeVec {MulTbl,Multiplication}, {DivTbl,Division}

                ifdef   DoMOD
                DopeVec {ModTbl,Modulo}
                endif

; Add any new functions of your own at this point, before the following equate:

RASize          =       ($-ResultArrays) / (sizeof DopeVec)

; Titles for each of the four (five) matrices.

AddTbl          char    "Addition Table",0
SubTbl          char    "Subtraction Table",0
MulTbl          char    "Multiplication Table",0
DivTbl          char    "Division Table",0

                ifdef   DoMOD
ModTbl          char    "Modulo (Remainder) Table",0
                endif

; This would be a good place to put a title for any new array you create.

dseg            ends
; Putting PrintMat inside its own segment demonstrates that you can have
; multiple code segments within a program. There is no reason we couldn't
; have put "PrintMat" in CSEG other than to demonstrate a far call to a
; different segment.

PrintSeg        segment para public 'PrintSeg'

; PrintMat-     Prints a matrix for the cross product operation.
;
;               On Entry:
;
;                       DS must point at DSEG.
;                       DS:SI points at the entry in ResultArrays for the
;                       array to print.
;
; The output takes the following form:
;
;       Matrix Title
;
;            <- column matrix values ->
;
;       ^    *------------------------*
;       |    |                        |
;       R    |                        |
;       o    |  Cross Product Matrix  |
;       w    |         Values         |
;            |                        |
;       V    |                        |
;       a    |                        |
;       l    |                        |
;       u    |                        |
;       e    |                        |
;       s    |                        |
;       |    |                        |
;       v    *------------------------*

PrintMat        proc    far
                assume  ds:dseg

; Note the use of conditional assembly to insert extra debugging statements
; if a special symbol "debug" is defined during assembly. If such a symbol
; is not defined during assembly, the assembler ignores the following
; statements:

                ifdef   debug
                print
                char    "In PrintMat",cr,lf,0
                endif

; First, print the title of this table. The TTL field in the dope vector
; contains a pointer to a zero terminated title string. Load this pointer
; into es:di and call PUTS to print that string.

                putcr
                les     di, [si].DopeVec.TTL
                puts

; Now print the column values. Note the use of PUTISIZE so that each
; value takes exactly six print positions. The following loop repeats
; once for each element in the Column array (the number of elements in
; the column array is given by the Dim2 field in the dope vector).

                print                           ;Skip spaces to move past the
                char    cr,lf,lf,"       ",0    ; row values.

                mov     dx, [si].DopeVec.Dim2   ;# times to repeat the loop.
                les     di, ColArray            ;Base address of array.
ColValLp:       mov     ax, es:[di]             ;Fetch current array element.
                mov     cx, 6                   ;Print the value using a
                putisize                        ; minimum of six positions.
                add     di, 2                   ;Move on to next element.
                dec     dx                      ;Repeat this loop DIM2 times.
                jne     ColValLp
                putcr                           ;End of column array output
                putcr                           ;Insert a blank line.

; Now output each row of the matrix. Note that we need to output the
; RowArray value before each row of the matrix.
;
; RowLp is the outer loop that repeats for each row.

                mov     Rows, 0                 ;Repeat for 0..Dim1-1 rows.
RowLp:          les     di, RowArray            ;Output the current RowArray
                mov     bx, Rows                ; value on the left hand side
                add     bx, bx                  ; of the matrix.
                mov     ax, es:[di][bx]         ;ES:DI is base, BX is index.
                mov     cx, 5                   ;Output using five positions.
                putisize
                print
                char    ": ",0

; ColLp is the inner loop that repeats for each item on each row.

                mov     Columns, 0              ;Repeat for 0..Dim2-1 cols.
ColLp:          mov     bx, Rows                ;Compute index into the array
                imul    bx, [si].DopeVec.Dim2   ; index := (Rows*Dim2 +
                add     bx, Columns             ;           columns) * 2
                add     bx, bx

; Note that we only have a pointer to the base address of the array, so we
; have to fetch that pointer and index off it to access the desired array
; element. This code loads the pointer to the base address of the array into
; the es:di register pair.

                les     di, [si].DopeVec.Data   ;Base address of array.
                mov     ax, es:[di][bx]         ;Get array element

; The functions that compute the values for the array store an 8000h into
; the array element if some sort of error occurs. Of course, it is possible
; to produce 8000h as an actual result, but giving up a single value to
; trap errors is worthwhile. The following code checks to see if an error
; occurred during the cross product. If so, this code prints " ****",
; otherwise, it prints the actual value.

                cmp     ax, 8000h               ;Check for error value
                jne     GoodOutput
                print
                char    " ****",0               ;Print this for errors.
                jmp     DoNext

GoodOutput:     mov     cx, 6                   ;Use six print positions.
                putisize                        ;Print a good value.

DoNext:         mov     ax, Columns             ;Move on to next array
                inc     ax                      ; element.
                mov     Columns, ax
                cmp     ax, [si].DopeVec.Dim2   ;See if we're done with
                jb      ColLp                   ; this column.

                putcr                           ;End each column with CR/LF

                mov     ax, Rows                ;Move on to the next row.
                inc     ax
                mov     Rows, ax
                cmp     ax, [si].DopeVec.Dim1   ;Have we finished all the
                jb      RowLp                   ; rows? Repeat if not done.
                ret
PrintMat        endp
PrintSeg        ends
cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

;GetWholeNum-   This routine reads a whole number (an integer greater than
;               zero) from the user. If the user enters an illegal whole
;               number, this procedure makes the user re-enter the data.

GetWholeNum     proc    near
                lesi    InputLine       ;Point es:di at InputLine array.
                gets

                call    Geti            ;Get an integer from the line.
                jc      BadInt          ;Carry set if error reading integer.
                cmp     ax, 0           ;Must have at least one row or column!
                jle     BadInt
                ret

BadInt:         print
                char    Bell
                char    "Illegal integer value, please re-enter",cr,lf,0
                jmp     GetWholeNum
GetWholeNum     endp

; Various routines to call for the cross products we compute.
; On entry, AX contains the first operand, dx contains the second.
; These routines return their result in AX.
; They return AX=8000h if an error occurs.
;
; Note that the CrossProduct function calls these routines indirectly.

addition        proc    far
                add     ax, dx
                jno     AddDone         ;Check for signed arithmetic overflow.
                mov     ax, 8000h       ;Return 8000h if overflow occurs.
AddDone:        ret
addition        endp

subtraction     proc    far
                sub     ax, dx
                jno     SubDone
                mov     ax, 8000h       ;Return 8000h if overflow occurs.
SubDone:        ret
subtraction     endp

multiplication  proc    far
                imul    ax, dx
                jno     MulDone
                mov     ax, 8000h       ;Error if overflow occurs.
MulDone:        ret
multiplication  endp

division        proc    far
                push    cx              ;Preserve registers we destroy.

                mov     cx, dx
                cwd
                test    cx, cx          ;See if attempting division by zero.
                je      BadDivide
                idiv    cx

                mov     dx, cx          ;Restore the munged register.
                pop     cx
                ret

BadDivide:      mov     ax, 8000h
                mov     dx, cx
                pop     cx
                ret
division        endp

; The following function computes the remainder if the symbol "DoMOD"
; is defined somewhere prior to this point.

                ifdef   DoMOD
modulo          proc    far
                push    cx

                mov     cx, dx
                cwd
                test    cx, cx          ;See if attempting division by zero.
                je      BadMod
                idiv    cx
                mov     ax, dx          ;Need to put remainder in AX.
                mov     dx, cx          ;Restore the munged registers.
                pop     cx
                ret

BadMod:         mov     ax, 8000h
                mov     dx, cx
                pop     cx
                ret
modulo          endp
                endif

; If you decide to extend the ResultArrays dope vector array, this is a good
; place to define the function for those new arrays.
; The main program that reads the data from the user, calls the appropriate
; routines, and then prints the results.

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

; Prompt the user to enter the number of rows and columns:

GetRows:        print
                byte    "Enter the number of rows for the matrix:",0

                call    GetWholeNum
                mov     Rows, ax

; Okay, read each of the row values from the user:

                print
                char    "Enter values for the row (vertical) array",cr,lf,0

; Malloc allocates the number of bytes specified in the CX register.
; AX contains the number of array elements we want; multiply this value
; by two since we want an array of words. On return from malloc, es:di
; points at the array allocated on the "heap". Save away this pointer in
; the "RowArray" variable.
;
; Note the use of the "wp" symbol. This is an equate to "word ptr" appearing
; in the "matrix.a" include file. Also note the use of the address
; expression "RowArray+2" to access the segment portion of the double word
; pointer.

                mov     cx, ax
                shl     cx, 1
                malloc
                mov     wp RowArray, di
                mov     wp RowArray+2, es

; Okay, call "GetArray" to read "ax" input values from the user.
; GetArray expects the number of values to read in AX and a pointer
; to the base address of the array in es:di.

                print
                char    "Enter row data:",0

                mov     ax, Rows        ;# of values to read.
                call    GetArray        ;ES:DI still points at array.

; Okay, time to repeat this for the column (horizontal) array.

GetCols:        print
                byte    "Enter the number of columns for the matrix:",0

                call    GetWholeNum     ;Get # of columns from the user.
                mov     Columns, ax     ;Save away number of columns.

; Okay, read each of the column values from the user:

                print
                char    "Enter values for the column (horz.) array",cr,lf,0

; Malloc allocates the number of bytes specified in the CX register.
; AX contains the number of array elements we want; multiply this value
; by two since we want an array of words. On return from malloc, es:di
; points at the array allocated on the "heap". Save away this pointer in
; the "ColArray" variable.

                mov     cx, ax          ;Convert # Columns to # bytes
                shl     cx, 1           ; by multiply by two.
                malloc                  ;Get the memory.
                mov     wp ColArray, di ;Save pointer to the
                mov     wp ColArray+2, es ;columns vector (array).

; Okay, call "GetArray" to read "ax" input values from the user.
; GetArray expects the number of values to read in AX and a pointer
; to the base address of the array in es:di.

                print
                char    "Enter Column data:",0

                mov     ax, Columns     ;# of values to read.
                call    GetArray        ;ES:DI points at column array.

; Okay, initialize the matrices that will hold the cross products.
; Generate RASize copies of the following code.
; The "repeat" macro repeats the statements between the "repeat" and the "endm"
; directives RASize times. Note the use of the Item symbol to automatically
; generate different indexes for each repetition of the following code.
; The "Item = Item+1" statement ensures that Item will take on the values
; 0, 1, 2, ..., RASize-1 on successive repetitions of this loop.
;
; Remember, the "repeat..endm" macro copies the statements multiple times
; within the source file, it does not execute a "repeat..until" loop at
; run time. That is, the following macro is equivalent to making "RASize"
; copies of the code, substituting different values for Item for each
; copy.
;
; The nice thing about this code is that it automatically generates the
; proper amount of initialization code, regardless of the number of items
; placed in the ResultArrays array.

Item            =       0

                repeat  RASize

                mov     cx, Columns     ;Compute the size, in bytes,
                imul    cx, Rows        ; of the matrix and allocate
                add     cx, cx          ; sufficient storage for the
                malloc                  ; array.

                mov     wp ResultArrays[Item * (sizeof DopeVec)].Data, di
                mov     wp ResultArrays[Item * (sizeof DopeVec)].Data+2, es

                mov     ax, Rows
                mov     ResultArrays[Item * (sizeof DopeVec)].Dim1, ax

                mov     ax, Columns
                mov     ResultArrays[Item * (sizeof DopeVec)].Dim2, ax

                mov     ResultArrays[Item * (sizeof DopeVec)].ESize, 2

Item            =       Item+1
                endm

; Okay, we've got the input values from the user,
; now let's compute the addition, subtraction, multiplication,
; and division tables. Once again, a macro reduces the amount of
; typing we need to do at this point as well as automatically handling
; however many items are present in the ResultArrays array.

element         =       0

                repeat  RASize
                lfs     bp, RowArray    ;Pointer to row data.
                lgs     bx, ColArray    ;Pointer to column data.

                lea     cx, ResultArrays[element * (sizeof DopeVec)]
                call    CrossProduct

element         =       element+1
                endm

; Okay, print the arrays down here. Once again, note the use of the
; repeat..endm macro to save typing and automatically handle additions
; to the ResultArrays array.

Item            =       0

                repeat  RASize
                mov     si, offset ResultArrays[Item * (sizeof DopeVec)]
                call    PrintMat
Item            =       Item+1
                endm

; Technically, we don't have to free up the storage malloc'd for each
; of the arrays since the program is about to quit. However, it's a
; good idea to get used to freeing up all your storage when you're done
; with it. For example, were you to add code later at the end of this
; program, you would have that extra memory available to that new code.

                les     di, ColArray
                free
                les     di, RowArray
                free

Item            =       0
                repeat  RASize
                les     di, ResultArrays[Item * (sizeof DopeVec)].Data
                free
Item            =       Item+1
                endm

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main
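
The two assembly-time techniques the main program relies on, repeat..endm with a counter symbol and the "$" size computation, can be tried in isolation. The following is a minimal sketch rather than part of the matrix program; the names Squares, Ndx, and NumSquares are hypothetical and chosen only for illustration.

```asm
; Sketch only: Squares, Ndx, and NumSquares are hypothetical names.

dseg            segment para public 'data'

; REPEAT copies its body eight times at assembly time. Ndx is an
; assembly-time symbol, not a run-time variable, so each copy of the
; WORD directive sees a different constant: 0, 1, 2, ..., 7.

Squares         label   word
Ndx             =       0
                repeat  8
                word    Ndx * Ndx
Ndx             =       Ndx + 1
                endm

; "$" is the offset just past the end of the table, so ($ - Squares)
; is the length of the table in bytes (16). Dividing by two bytes per
; word gives the number of entries (8).

NumSquares      =       ($ - Squares) / 2

dseg            ends
                end
```

Changing the repeat count changes NumSquares with it, which is the same property that lets RASize follow any additions to ResultArrays.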
8.22.4 GETI.ASM

GETI.ASM contains a routine (geti) that reads an integer value from the user.

; GETI.ASM
;
; This module contains the integer input routine for the matrix
; example in Chapter Eight.

                .nolist
                include stdlib.a
                .list

                include matrix.a

InpSeg          segment para public 'input'

; Geti- On entry, es:di points at a string of characters.
; This routine skips any leading spaces and comma characters and then
; tests the first (non-space/comma) character to see if it is a digit.
; If not, this routine returns the carry flag set denoting an error.
; If the first character is a digit, then this routine calls the
; standard library routine "atoi2" to convert the value to an integer.
; It then ensures that the number ends with a space, comma, or zero
; byte.
;
; Returns carry clear and value in AX if no error.
; Returns carry set if an error occurs.
;
; This routine leaves ES:DI pointing at the character it fails on when
; converting the string to an integer. If the conversion occurs without
; an error, the ES:DI points at a space, comma, or zero terminating byte.

geti            proc    far

                ifdef   debug
                print
                char    "Inside GETI",cr,lf,0
                endif

; First, skip over any leading spaces or commas.
; Note the use of the "byp" symbol to save having to type "byte ptr".
; BYP is a text equate appearing in the macros.a file.
; A "byte ptr" coercion operator is required here because MASM cannot
; determine the size of the memory operand (byte, word, dword, etc)
; from the operands. I.e., "es:[di]" and ' ' could be any of these
; three sizes.
;
; Also note a cute little trick here; by decrementing di before entering
; the loop and then immediately incrementing di, we can increment di before
; testing the character in the body of the loop. This makes the loop
; slightly more efficient and a lot more elegant.

                dec     di
SkipSpcs:       inc     di
                cmp     byp es:[di], ' '
                je      SkipSpcs
                cmp     byp es:[di], ','
                je      SkipSpcs

; See if the first non-space/comma character is a decimal digit:

                mov     al, es:[di]
                cmp     al, '-'         ;Minus sign is also legal in integers.
                jne     TryDigit
                mov     al, es:[di+1]   ;Get next char, if "-"

TryDigit:       isdigit
                jne     BadGeti         ;Jump if not a digit.

; Okay, convert the characters that follow to an integer:

ConvertNum:     atoi2                   ;Leaves integer in AX
                jc      BadGeti         ;Bomb if illegal conversion.

; Make sure this number ends with a reasonable character (space, comma,
; or a zero byte):

                cmp     byp es:[di], ' '
                je      GoodGeti
                cmp     byp es:[di], ','
                je      GoodGeti
                cmp     byp es:[di], 0
                je      GoodGeti

                ifdef   debug
                print
                char    "GETI: Failed because number did not end with "
                char    "a space, comma, or zero byte",cr,lf,0
                endif

BadGeti:        stc                     ;Return an error condition.
                ret

GoodGeti:       clc                     ;Return no error and an integer in AX
                ret
geti            endp
InpSeg          ends
                end
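
The decrement-then-increment trick and the need for an explicit size coercion are easier to see without the rest of geti around them. The sketch below assumes nothing from the UCR Standard Library; SkipBlanks is a hypothetical routine name, and it handles only spaces (not commas) to keep the loop as small as possible.

```asm
; Sketch only: SkipBlanks is a hypothetical routine, not part of GETI.ASM.

cseg            segment para public 'code'
                assume  cs:cseg

; On entry:  ES:DI points at a zero terminated string.
; On exit:   ES:DI points at the first character that is not a space.

SkipBlanks      proc    near
                dec     di                      ;Cancels the first INC below.
SkipLp:         inc     di                      ;Advance to the next character.
                cmp     byte ptr es:[di], ' '   ;Size must be given explicitly.
                je      SkipLp
                ret
SkipBlanks      endp

cseg            ends
                end
```

Because the increment sits at the top of the loop, the loop body needs only one branch; the zero terminating byte is not a space, so the loop always stops at or before the end of the string.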
8.22.5 GetArray.ASM

GetArray.ASM contains the GetArray input routine. This reads the data for the array from the user to produce the cross products. Note that GetArray reads the data for a single dimension array (or one row in a multidimensional array). The cross product program reads two such vectors: one for the column values and one for the row values in the cross product. Note: This routine uses subroutines from the UCR Standard Library that appear in the next chapter.

; GETARRAY.ASM
;
; This module contains the GetArray input routine. This routine reads a
; set of values for a row of some array.

                .386
                option  segment:use16

                .nolist
                include stdlib.a
                .list

                include matrix.a

; Some local variables for this module:

localdseg       segment para public 'LclData'

NumElements     word    ?
ArrayPtr        dword   ?

Localdseg       ends

InpSeg          segment para public 'input'
                assume  ds:Localdseg

; GetArray-     Read a set of numbers and store them into an array.
;
;               On Entry:
;
;                       es:di points at the base address of the array.
;                       ax contains the number of elements in the array.
;
;               This routine reads the specified number of array elements
;               from the user and stores them into the array. If there
;               is an input error of some sort, then this routine makes
;               the user reenter the data.

GetArray        proc    far
                pusha                   ;Preserve all the registers
                push    ds              ; that this code modifies
                push    es
                push    fs

                ifdef   debug
                print
                char    "Inside GetArray, # of input values =",0
                puti
                putcr
                endif

                mov     cx, Localdseg   ;Point ds at our local
                mov     ds, cx          ; data segment.

                mov     wp ArrayPtr, di ;Save in case we have an
                mov     wp ArrayPtr+2, es ; error during input.
                mov     NumElements, ax

; The following loop reads a line of text from the user containing some
; number of integer values. This loop repeats if the user enters an illegal
; value on the input line.
;
; Note: LESI is a macro from the stdlib.a include file. It loads ES:DI
; with the address of its operand (as opposed to les di, InputLine that would
; load ES:DI with the dword value at address InputLine).

RetryLp:        lesi    InputLine
                gets                    ;Read input line from user.

                mov     cx, NumElements ;# of values to read.
                lfs     si, ArrayPtr    ;Store input values here.

; This inner loop reads "cx" integers from the input line. If there is
; an error, it transfers control to RetryLp above.

ReadEachItem:   call    geti            ;Read next available value.
                jc      BadGA
                mov     fs:[si], ax     ;Save away in array.
                add     si, 2           ;Move on to next element.
                loop    ReadEachItem    ;Repeat for each element.

                pop     fs              ;Restore the saved registers
                pop     es              ; from the stack before
                pop     ds              ; returning.
                popa
                ret

; If an error occurs, make the user re-enter the data for the entire
; row:

BadGA:          print
                char    "Illegal integer value(s).",cr,lf
                char    "Re-enter data:",0
                jmp     RetryLp
getArray        endp
InpSeg          ends
                end
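
The comment in GetArray distinguishes loading the address of a variable from loading the pointer stored in a variable. The following sketch shows the two cases side by side. The names Buffer, BufPtr, and PtrDemo are hypothetical, and the three mov instructions are only an assumed equivalent of what lesi accomplishes, not the macro's actual expansion.

```asm
; Sketch only: Buffer, BufPtr, and PtrDemo are hypothetical names.

dseg            segment para public 'data'
Buffer          byte    128 dup (0)
BufPtr          dword   Buffer          ;Far pointer holding Buffer's address.
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; Assumes DS already points at dseg on entry.

PtrDemo         proc    near

; Load ES:DI with the address of Buffer itself (the effect the text
; attributes to "lesi Buffer"):

                mov     di, seg Buffer
                mov     es, di
                mov     di, offset Buffer

; Load ES:DI with the four bytes stored at BufPtr. The address of
; BufPtr is not what ends up in ES:DI; its contents are.

                les     di, BufPtr
                ret
PtrDemo         endp

cseg            ends
                end
```

In this sketch both sequences leave the same value in ES:DI only because BufPtr happens to be initialized with Buffer's address; "les di, Buffer" would instead load the first four data bytes of the buffer.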
8.22.6 XProduct.ASM

This file contains the code that computes the actual cross-product.

; XProduct.ASM-
;
; This file contains the cross-product module.

                .386
                option  segment:use16

                .nolist
                include stdlib.a
                includelib stdlib.lib
                .list

                include matrix.a

; Local variables for this module.

dseg            segment para public 'data'
DV              dword   ?
RowNdx          integer ?
ColNdx          integer ?
RowCntr         integer ?
ColCntr         integer ?
dseg            ends

cseg            segment para public 'code'
                assume  ds:dseg

; CrossProduct- Computes the cartesian product of two vectors.
;
;               On entry:
;
;                       FS:BP-  Points at the row matrix.
;                       GS:BX-  Points at the column matrix.
;                       DS:CX-  Points at the dope vector for the destination.
;
;               This code assumes ds points at dseg.
;               This routine only preserves the segment registers.

RowMat          textequ <fs:[bp]>
ColMat          textequ <gs:[bx]>
DVP             textequ <ds:[bx].DopeVec>

CrossProduct    proc    near

                ifdef   debug
                print
                char    "Entering CrossProduct routine",cr,lf,0
                endif

                xchg    bx, cx          ;Get dope vector pointer
                mov     ax, DVP.Dim1    ;Put Dim1 and Dim2 values
                mov     RowCntr, ax     ; where they are easy to access.
                mov     ax, DVP.Dim2
                mov     ColCntr, ax
                xchg    bx, cx

; Okay, do the cross product operation. This is defined as follows:
;
;       for RowNdx := 0 to NumRows-1 do
;           for ColNdx := 0 to NumCols-1 do
;               Result[RowNdx, ColNdx] = Row[RowNdx] op Col[ColNdx];

                mov     RowNdx, -1      ;Really starts at zero.
OutsideLp:      add     RowNdx, 1
                mov     ax, RowNdx
                cmp     ax, RowCntr
                jge     Done

                mov     ColNdx, -1      ;Really starts at zero.
InsideLp:       add     ColNdx, 1
                mov     ax, ColNdx
                cmp     ax, ColCntr
                jge     OutSideLp

                mov     di, RowNdx
                add     di, di
                mov     ax, RowMat[di]

                mov     di, ColNdx
                add     di, di
                mov     dx, ColMat[di]

                push    bx              ;Save pointer to column matrix.
                mov     bx, cx          ;Put ptr to dope vector where we can
                                        ; use it.

                call    DVP.Func        ;Compute result for this guy.

                mov     di, RowNdx      ;Index into array is
                imul    di, DVP.Dim2    ; (RowNdx*Dim2 + ColNdx) * ElementSize
                add     di, ColNdx
                imul    di, DVP.ESize

                les     bx, DVP.Data    ;Get base address of array.
                mov     es:[bx][di], ax ;Save away result.

                pop     bx              ;Restore ptr to column array.
                jmp     InsideLp

Done:           ret
CrossProduct    endp
cseg            ends
                end
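
Both PrintMat and CrossProduct locate an element with the row-major formula (row * columns + column) * element size. As a minimal sketch of just that computation, the routine below takes its inputs in registers instead of a dope vector. ElementOfs and its register conventions are hypothetical; the element size is fixed at two bytes, as it is for the word arrays in the sample program.

```asm
; Sketch only: ElementOfs and its calling convention are hypothetical.

                .386
                option  segment:use16

cseg            segment para public 'code'
                assume  cs:cseg

; On entry:  AX = row index, BX = column index,
;            CX = number of columns in each row.
; On exit:   DI = (row * columns + column) * 2, the byte offset of
;            the element from the base address of the array.
;
; For example, row 2, column 3 of an array with 10 columns lies at
; byte offset (2*10 + 3) * 2 = 46.

ElementOfs      proc    near
                mov     di, ax
                imul    di, cx          ;row * number of columns
                add     di, bx          ; + column
                add     di, di          ; * 2 bytes per word element
                ret
ElementOfs      endp

cseg            ends
                end
```

The two-operand form of imul requires an 80386 or later, which is why the sketch (like XProduct.ASM) begins with the .386 directive.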
8.23 Laboratory Exercises

In this set of laboratory exercises you will assemble various short programs, produce assembly listings, and observe the object code the assembler produces for some simple instruction sequences. You will also experiment with a make file to observe how it properly handles dependencies.

8.23.1 Near vs. Far Procedures

The following short program demonstrates how MASM automatically generates near and far call and ret instructions depending on the operand field of the proc directive (this program is on the companion CD-ROM in the chapter eight subdirectory). Assemble this program with the /Fl option to produce an assembly listing. Look up the opcodes for near and far call and ret instructions in Appendix D. Compare those values against the opcodes this program emits.

For your lab report: describe how MASM figures out which instructions need to be near or far. Include the assembled listing with your report and identify which instructions are near or far calls and returns.

; EX8_1.asm (Laboratory Exercise 8.1)

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

Procedure1      proc    near

; MASM will emit a *far* call to procedure2
; since it is a far procedure.

                call    Procedure2

; Since this return instruction is inside
; a near procedure, MASM will emit a near
; return.

                ret
Procedure1      endp

Procedure2      proc    far

; MASM will emit a *near* call to procedure1
; since it is a near procedure.

                call    Procedure1

; Since this return instruction is inside
; a far procedure, MASM will emit a far
; return.

                ret
Procedure2      endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax

; MASM emits the appropriate call instructions
; to the following procedures.

                call    Procedure1
                call    Procedure2

Quit:           mov     ah, 4ch
                int     21h
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends
                end     Main

8.23.2
Data Alignment Exercises

In this exercise you will assemble two different programs using the MASM “/Fl” command line option so you can observe the addresses MASM assigns to the variables in the program. The first program (Ex8_2a.asm) uses the even directive to align objects on a word boundary. The second program (Ex8_2b.asm) uses the align directive to align objects on different sized boundaries.

For your lab report: Include the assembly listings in your lab report. Describe what the even and align directives are doing in the program and comment on how this produces faster running programs.

; EX8_2a.asm
;
; Example demonstrating the EVEN directive.

dseg            segment

; Force an odd location counter within
; this segment:

i               byte    0

; This word is at an odd address, which is bad!

j               word    0

; Force the next word to align itself on an
; even address so we get faster access to it.

                even
k               word    0

; Note that even has no effect if we're already
; at an even address.

                even
l               word    0
dseg            ends

cseg            segment
                assume  ds:dseg
procedure       proc
                mov     ax, [bx]
                mov     i, al
                mov     bx, ax

; The following instruction would normally lie on
; an odd address. The EVEN directive inserts a
; NOP so that it falls on an even address.

                even
                mov     bx, cx

; Since we're already at an even address, the
; following EVEN directive has no effect.

                even
                mov     dx, ax
                ret
procedure       endp
cseg            ends
                end
; EX8_2b.asm
;
; Example demonstrating the align
; directive.

dseg            segment

; Force an odd location counter
; within this segment:

i               byte    0

; This word is at an odd address,
; which is bad!

j               word    0

; Force the next word to align itself
; on an even address so we get faster
; access to it.

                align   2
k               word    0

; Force odd address again:

k_odd           byte    0

; Align the next entry on a double
; word boundary.

                align   4
l               dword   0

; Align the next entry on a quad
; word boundary:

                align   8
RealVar         real8   3.14159

; Start the following on a paragraph
; boundary:

                align   16
Table           dword   1,2,3,4,5
dseg            ends
                end

8.23.3
Equate Exercise

In this exercise you will discover a major difference between a numeric equate and a textual equate (program Ex8_3.asm on the companion CD-ROM). MASM evaluates the operand field of a numeric equate when it encounters the equate. MASM defers evaluation of a textual equate until it expands the equate (i.e., when you use the equate in a program).

For your lab report: assemble the following program using MASM’s “/Fl” command line option and look at the object code emitted for the two equates. Explain why the instruction operands are different even though the two equates are nearly identical.

; Ex8_3.asm
;
; Comparison of numeric equates with textual equates
; and the differences they produce at assembly time.
;
cseg            segment
equ1            equ     $+2             ;Evaluates "$" at this stmt.
equ2            equ     <$+2>           ;Evaluates "$" on use.
MyProc          proc
                mov     ax, 0
                lea     bx, equ1
                lea     bx, equ2
                lea     bx, equ1
                lea     bx, equ2
MyProc          endp
cseg            ends
                end
8.23.4
IFDEF Exercise

In this exercise, you will assemble a program that uses conditional assembly and observe the results. The Ex8_4.asm program uses the ifdef directive to test for the presence of DEBUG1 and DEBUG2 symbols. DEBUG1 appears in this program while DEBUG2 does not.

For your lab report: assemble this code using the “/Fl” command line parameter. Include the listing in your lab report and explain the actions of the ifdef directives.

; Ex8_4.asm
;
; Demonstration of IFDEF to control
; debugging features. This code
; assumes there are two levels of
; debugging controlled by the two
; symbols DEBUG1 and DEBUG2. In
; this code example DEBUG1 is
; defined while DEBUG2 is not.

                .xlist
                include stdlib.a
                .list
                .nolistmacro
                .listif

DEBUG1          =       0

cseg            segment
DummyProc       proc
                ifdef   DEBUG2
                print
                byte    "In DummyProc"
                byte    cr,lf,0
                endif
                ret
DummyProc       endp

Main            proc
                ifdef   DEBUG1
                print
                byte    "Calling DummyProc"
                byte    cr,lf,0
                endif

                call    DummyProc

                ifdef   DEBUG1
                print
                byte    "Return from DummyProc"
                byte    cr,lf,0
                endif
                ret
Main            endp
cseg            ends
                end

8.23.5
Make File Exercise

In this exercise you will experiment with a make file to see how nmake.exe chooses which files to reassemble. In this exercise you will be using the Ex8_5a.asm, Ex8_5b.asm, Ex8_5.a, and Ex8_5.mak files found in the Chapter Eight subdirectory on the companion CD-ROM. Copy these files to a local subdirectory on your hard disk (if they are not already there). These files contain a program that reads a string of text from the user and prints out any vowels in the input string. You will make minor changes to the .asm and .a files and run the make file and observe the results.

The first thing you should do is assemble the program and create up to date .exe and .obj files for the project. You can do this with the following DOS command:

        nmake /f Ex8_5.mak

Assuming that the .obj and .exe files were not already present in the current directory, the nmake command above will assemble and link the Ex8_5a.asm and Ex8_5b.asm files, producing the Ex8_5.exe executable.

Using the editor, make a minor change (such as inserting a single space on a line containing a comment) to the Ex8_5a.asm file. Execute the above nmake command. Record what the make file does in your lab report.

Next, make a minor change to the Ex8_5b.asm file. Run the above nmake command and record the result in your lab report. Explain the results.

Finally, make a minor change to the Ex8_5.a file. Run the nmake command and describe the results in your lab report.

For your lab report: explain how the changes to each of the files above affect the make operation. Explain why nmake does what it does.

For additional credit: Try deleting (one at a time) the Ex8_5a.obj, Ex8_5b.obj, and Ex8_5.exe files and run the nmake command. Explain why nmake does what it does when you individually delete each of these files.

Ex8_5.mak makefile:

ex8_5.exe: ex8_5a.obj ex8_5b.obj
        ml /Feex8_5.exe ex8_5a.obj ex8_5b.obj

ex8_5a.obj: ex8_5a.asm ex8_5.a
        ml /c ex8_5a.asm

ex8_5b.obj: ex8_5b.asm ex8_5.a
        ml /c ex8_5b.asm

Ex8_5.a Header File:

; Header file for Ex8_5 project.
; This file includes the EXTERNDEF
; directive which makes the PrintVowels
; name public/external. It also includes
; the PrtVowels macro which lets us call
; the PrintVowels routine in a manner
; similar to the UCR Standard Library
; routines.

                externdef PrintVowels:near

PrtVowels       macro
                call    PrintVowels
                endm

Ex8_5a.asm source file:

; Ex8_5a.asm
;
; Randall Hyde
; 2/7/96
;
; This program reads a string of symbols from the
; user and prints the vowels. It demonstrates the
; use of make files

                .xlist
                include stdlib.a
                includelib stdlib.lib
                .list

; The following include file brings in the external
; definitions of the routine(s) in the Ex8_5b
; module. In particular, it gives this module
; access to the "PrtVowels" routine found in
; Ex8_5b.asm.

                include Ex8_5.a

cseg            segment para public 'code'

Main            proc
                meminit

; Read a string from the user, print all the vowels
; present in that string, and then free up the memory
; allocated by the GETSM routine:

                print
                byte    "I will find all your vowels"
                byte    cr,lf
                byte    "Enter a line of text: ",0

                getsm
                print
                byte    "Vowels on input line: ",0
                PrtVowels
                putcr
                free

Quit:           ExitPgm
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main

8.24 Programming Projects

1) Write a program that inputs two 4x4 integer matrices from the user and computes their matrix product. The matrix multiply algorithm (computing C := A * B) is

        for i := 0 to 3 do
            for j := 0 to 3 do begin
                c[i,j] := 0;
                for k := 0 to 3 do
                    c[i,j] := c[i,j] + a[i,k] * b[k,j];
            end;

   Feel free to use the ForLp and Next macros from Chapter Six.

2) Modify the sample program (“Sample Program” on page 432) to use the FORLP and NEXT macros provided in the textbook. Replace all for loop simulations in the program with the corresponding macros.

3) Write a program that asks the user to input three integer values, m, p, and n. This program should allocate storage for three arrays: A[0..m-1, 0..p-1], B[0..p-1, 0..n-1], and C[0..m-1, 0..n-1]. The program should then read values for arrays A and B from the user. Next, this program should compute the matrix product of A and B using the algorithm:

        for i := 0 to m-1 do
            for j := 0 to n-1 do begin
                c[i,j] := 0;
                for k := 0 to p-1 do
                    c[i,j] := c[i,j] + a[i,k] * b[k,j];
            end;

   Finally, the program should print arrays A, B, and C. Feel free to use the ForLp and Next macros given in this chapter. You should also take a look at the sample program (see “Sample Program” on page 432) to see how to dynamically allocate storage for arrays and access arrays whose dimensions are not known until run time.

4) The ForLp and Next macros provided in this chapter only increment their loop control variable by one on each iteration of the loop. Write a new macro, ForTo, that lets you specify an increment constant. Increment the loop control variable by this constant on each iteration of the for loop. Write a program to demonstrate the use of this macro. Hint: you will need to create a global label to pass the increment information to the NEXT macro, or you will need to perform the increment operation inside the ForLp macro.

5) Write a third version of ForLp and Next (see Program #4 above) that lets you specify negative increments (like the for..downto statement in Pascal). Call this macro ForDT (for..downto).

8.25
Summary

This chapter introduced several assembler directives and pseudo-opcodes supported by MASM. It is by no means a complete description of what MASM has to offer, but it does provide enough information to get you going.

Assembly language statements are free format and there is usually one statement per line in your source file. Although MASM allows free format input, you should carefully structure your source files to make them easier to read.

• See “Assembly Language Statements” on page 355.
MASM keeps track of the offset of an instruction or variable in a segment using the location counter. MASM increments the location counter by one for each byte of object code it writes to the output file.

• See “The Location Counter” on page 357.
Like HLLs, MASM lets you use symbolic names for variables and statement labels. Dealing with symbols is much easier than dealing with numeric offsets in an assembly language program. MASM symbols look a whole lot like their HLL counterparts, with a few extensions.

• See “Symbols” on page 358.
MASM provides several different types of literal constants including binary, decimal, and hexadecimal integer constants, string constants, and text constants.

• See “Literal Constants” on page 359.
• See “Integer Constants” on page 360.
• See “String Constants” on page 361.
• See “Text Constants” on page 362.
To help you manipulate segments within your program, MASM provides the segment/ends directives. With the segment directive you can control the loading order and alignment of modules in memory. • • • • • • •
See “Segments” on page 366. See “Segment Names” on page 367. See “Segment Loading Order” on page 368. See “Segment Operands” on page 369. See “The CLASS Type” on page 374. See “Typical Segment Definitions” on page 376. See “Why You Would Want to Control the Loading Order” on page 376.
MASM provides the proc/endp directives for declaring procedures within your assembly language programs. Although not strictly necessary, the proc/endp directives make your programs much easier to read and maintain. The proc/endp directives also let you use local statement names within your procedures.
• See “Procedures” on page 365.

Equates let you define symbolic constants of various sorts in your program. MASM provides three directives for defining such constants: “=”, equ, and textequ. As with HLLs, the judicious use of equates can help make your program easier to read.
• See “Declaring Manifest Constants Using Equates” on page 362.

As you saw in Chapter Four, MASM gives you the ability to declare variables in the data segment using the byte, word, dword and other directives. MASM is a strongly typed assembler and attaches a type as well as a location to variable names (most assemblers only attach a location). This helps MASM locate obscure bugs in your program.
• See “Variables” on page 384.
• See “Label Types” on page 385.
• See “How to Give a Symbol a Particular Type” on page 385.
• See “Label Values” on page 386.
• See “Type Conflicts” on page 386.

MASM supports address expressions that let you use arithmetic operators to build constant address values at assembly time. It also lets you override the type of an address value and extract various pieces of information about a symbol. This is very useful for writing maintainable programs.
• See “Address Expressions” on page 387.
• See “Symbol Types and Addressing Modes” on page 387.
• See “Arithmetic and Logical Operators” on page 388.
• See “Coercion” on page 390.
• See “Type Operators” on page 392.
• See “Operator Precedence” on page 396.
MASM provides several facilities for telling the assembler which segment associates with which segment register. It also gives you the ability to override a default choice. This lets your program manage several segments at once with a minimum of fuss.
• See “Segment Prefixes” on page 377.
• See “Controlling Segments with the ASSUME Directive” on page 377.

MASM provides you with a “conditional assembly” capability that lets you choose which sections of code are actually assembled during the assembly process. This is useful for inserting debugging code into your programs (that you can easily remove with a single statement) and for writing programs that need to run in different environments (by inserting and removing different sections of code).
• See “Conditional Assembly” on page 397.
• See “IF Directive” on page 398.
• See “IFE directive” on page 399.
• See “IFDEF and IFNDEF” on page 399.
• See “IFB, IFNB” on page 399.
• See “IFIDN, IFDIF, IFIDNI, and IFDIFI” on page 400.

MASM, living up to its name, provides a powerful macro facility. Macros are sections of code you can replicate by simply placing the macro’s name in your code. Macros, properly used, can help you write shorter, easier to read, and more robust programs. Alas, improperly used, macros produce hard to maintain, inefficient programs.
• See “Macros” on page 400.
• See “Procedural Macros” on page 400.
• See “The LOCAL Directive” on page 406.
• See “The EXITM Directive” on page 406.
• See “Macros: Good and Bad News” on page 419.
• See “Repeat Operations” on page 420.

MASM provides several directives you can use to produce “assembled listings” or print-outs of your program with lots of assembler generated (useful!) information. These directives let you turn the listing operation on and off, display information on the screen during assembly, and set titles on the output.
• See “Controlling the Listing” on page 424.
• See “The ECHO and %OUT Directives” on page 424.
• See “The TITLE Directive” on page 424.
• See “The SUBTTL Directive” on page 424.
• See “The PAGE Directive” on page 424.
• See “The .LIST, .NOLIST, and .XLIST Directives” on page 425.
• See “Other Listing Directives” on page 425.

Handling large projects (“Programming in the Large”) requires separate compilation (or separate assembly in MASM’s case). MASM provides several directives that let you merge source files during assembly, separately assemble modules, and communicate procedure and variable names between the modules.
• See “Managing Large Programs” on page 425.
• See “The INCLUDE Directive” on page 426.
• See “The PUBLIC, EXTERN, and EXTRN Directives” on page 427.
• See “The EXTERNDEF Directive” on page 428.
8.26
Questions
1) What is the difference between the following instruction sequences?

        MOV     AX, VAR+1

   and

        MOV     AX, VAR
        INC     AX

2) What is the source line format for an assembly language statement?

3) What is the purpose of the ASSUME directive?

4) What is the location counter?

5) Which of the following symbols are valid?
   a) ThisIsASymbol
   b) This_Is_A_Symbol
   c) This.Is.A.Symbol
   d) .Is_This_A_Symbol?
   e) ________________
   f) @_$?_To_You
   g) 1WayToGo
   h) %Hello
   i) F000h
   j) ?A_0$1
   k) $1234
   l) Hello there

6) How do you specify segment loading order?

7) What is the type of the symbols declared by the following statements?
   a) symbol1     equ      0
   b) symbol2:
   c) symbol3     proc
   d) symbol4     db       ?
   e) symbol5     dw       ?
   f) symbol6     proc     far
   g) symbol7     equ      this word
   h) symbol8     equ      byte ptr symbol7
   i) symbol9     dd       ?
   j) symbol10    macro
   k) symbol11    segment  para public 'data'
   l) symbol12    equ      this near
   m) symbol13    equ      'ABCD'
   n) symbol14    equ      <MOV AX, 0>

8) Which of the symbols in question 7 are not assigned the current location counter value?
9) Explain the purpose of the following operators:
   a) PTR
   b) SHORT
   c) THIS
   d) HIGH
   e) LOW
   f) SEG
   g) OFFSET

10) What is the difference between the values loaded into the BX register (if any) in the following code sequence?

        mov     bx, offset Table
        lea     bx, Table

11) What is the difference between the REPEAT macro and the DUP operator?

12) In what order will the following segments be loaded into memory?

   CSEG    segment para public 'CODE'
           …
   CSEG    ends
   DSEG    segment para public 'DATA'
           …
   DSEG    ends
   ESEG    segment para public 'CODE'
           …
   ESEG    ends

13) Which of the following address expressions do not produce the same result as the others:
   a) Var1[3][5]
   b) 15[Var1]
   c) Var1[8]
   d) Var1+2[6]
   e) Var1*3*5
   f) Var1+3+5
Arithmetic and Logical Operations
Chapter Nine
There is a lot more to assembly language than knowing the operations of a handful of machine instructions. You’ve got to know how to use them and what they can do. Many instructions are useful for operations that have little to do with their mathematical or obvious functions. This chapter discusses how to convert expressions from a high level language into assembly language. It also discusses advanced arithmetic and logical operations including multiprecision operations and tricks you can play with various instructions.
9.0
Chapter Overview
This chapter discusses six main subjects: converting HLL arithmetic expressions into assembly language, logical expressions, extended precision arithmetic and logical operations, operating on different sized operands, machine and arithmetic idioms, and masking operations. Like the preceding chapters, this chapter contains considerable material that you may need to learn immediately if you’re a beginning assembly language programmer. The sections below that have a “•” prefix are essential. Those sections with a “❏” discuss advanced topics that you may want to put off for a while.

• Arithmetic expressions
• Simple assignments
• Simple expressions
• Complex expressions
• Commutative operators
• Logical expressions
• Multiprecision operations
• Multiprecision addition operations
• Multiprecision subtraction operations
• Extended precision comparisons
❏ Extended precision multiplication
❏ Extended precision division
❏ Extended precision negation
• Extended precision AND, OR, XOR, and NOT
❏ Extended precision shift and rotate operations
❏ Operating on different sized operands
• Multiplying without MUL and IMUL
❏ Division without DIV and IDIV
❏ Using AND to compute remainders
❏ Modulo-n Counters with AND
❏ Testing for 0FFFFF...FFFh
• Test operations
❏ Testing signs with the XOR instructions
❏ Masking operations
❏ Masking with the AND instructions
❏ Masking with the OR instruction
❏ Packing and unpacking data types
❏ Table lookups
None of this material is particularly difficult to understand. However, there are a lot of new topics here and taking them a few at a time will certainly help you absorb the material better. Those topics with the “•” prefix are ones you will frequently use; hence it is a good idea to study these first.
9.1
Arithmetic Expressions
Probably the biggest shock to beginners facing assembly language for the very first time is the lack of familiar arithmetic expressions. Arithmetic expressions, in most high level languages, look similar to their algebraic equivalents:

        X := Y * Z;

In assembly language, you’ll need several statements to accomplish this same task, e.g.,

        mov     ax, y
        imul    z
        mov     x, ax

Obviously the HLL version is much easier to type, read, and understand. This point, more than any other, is responsible for scaring people away from assembly language. Although there is a lot of typing involved, converting an arithmetic expression into assembly language isn’t difficult at all. By attacking the problem in steps, the same way you would solve the problem by hand, you can easily break down any arithmetic expression into an equivalent sequence of assembly language statements. By learning how to convert such expressions to assembly language in three steps, you’ll discover there is little difficulty to this task.
9.1.1
Simple Assignments
The easiest expressions to convert to assembly language are the simple assignments. Simple assignments copy a single value into a variable and take one of two forms:

        variable := constant

or

        variable := variable

If variable appears in the current data segment (e.g., DSEG), converting the first form to assembly language is easy. Just use the assembly language statement:

        mov     variable, constant

This move immediate instruction copies the constant into the variable. The second assignment above is somewhat more complicated since the 80x86 doesn’t provide a memory-to-memory mov instruction. Therefore, to copy one memory variable into another, you must move the data through a register. If you look at the encoding for the mov instruction in the appendix, you’ll notice that the mov ax, memory and mov memory, ax instructions are shorter than moves involving other registers. Therefore, if the ax register is available, you should use it for this operation. For example,

        var1 := var2;

becomes

        mov     ax, var2
        mov     var1, ax

Of course, if you’re using the ax register for something else, one of the other registers will suffice. Regardless, you must use a register to transfer one memory location to another. This discussion, of course, assumes that both variables are in memory. If possible, you should try to use a register to hold the value of a variable.
9.1.2
Simple Expressions
The next level of complexity up from a simple assignment is a simple expression. A simple expression takes the form:

        var := term1 op term2;

Var is a variable, term1 and term2 are variables or constants, and op is some arithmetic operator (addition, subtraction, multiplication, etc.).

As simple as this expression appears, most expressions take this form. It should come as no surprise, then, that the 80x86 architecture was optimized for just this type of expression. A typical conversion for this type of expression takes the following form:

        mov     ax, term1
        op      ax, term2
        mov     var, ax

Op is the mnemonic that corresponds to the specified operation (e.g., “+” = add, “-” = sub, etc.).
There are a few inconsistencies you need to be aware of. First, the 80x86’s {i}mul instructions do not allow immediate operands on processors earlier than the 80286. Further, none of the processors allow immediate operands with {i}div. Therefore, if the operation is multiplication or division and one of the terms is a constant value, you may need to load this constant into a register or memory location and then multiply or divide ax by that value. Of course, when dealing with the multiply and divide instructions on the 8086/8088, you must use the ax and dx registers. You cannot use arbitrary registers as you can with other operations. Also, don’t forget the sign extension instructions if you’re performing a division operation and you’re dividing one 16/32 bit number by another. Finally, don’t forget that some instructions may cause overflow. You may want to check for an overflow (or underflow) condition after an arithmetic operation.

Examples of common simple expressions:

X := Y + Z;
        mov     ax, y
        add     ax, z
        mov     x, ax

X := Y - Z;
        mov     ax, y
        sub     ax, z
        mov     x, ax

X := Y * Z; {unsigned}
        mov     ax, y
        mul     z               ;Use IMUL for signed arithmetic.
        mov     x, ax           ;Don’t forget this wipes out DX.

X := Y div Z; {unsigned div}
        mov     ax, y
        mov     dx, 0           ;Zero extend AX into DX
        div     z
        mov     x, ax

X := Y div Z; {signed div}
        mov     ax, y
        cwd                     ;Sign extend AX into DX
        idiv    z
        mov     x, ax

X := Y mod Z; {unsigned remainder}
        mov     ax, y
        mov     dx, 0           ;Zero extend AX into DX
        div     z
        mov     x, dx           ;Remainder is in DX

X := Y mod Z; {signed remainder}
        mov     ax, y
        cwd                     ;Sign extend AX into DX
        idiv    z
        mov     x, dx           ;Remainder is in DX
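To see why the choice between zero extension and sign extension matters before a divide, it can help to model the two sequences in a high level language. The following is only a sketch in C (not part of the original text): the names DivResult16, div_unsigned16, and div_signed16 are hypothetical, and the code assumes a non-zero divisor and a quotient that fits in 16 bits.

```c
#include <stdint.h>

/* Quotient as it would be left in AX, remainder as it would be left in DX. */
typedef struct {
    uint16_t quotient;
    uint16_t remainder;
} DivResult16;

/* Models "mov dx, 0 / div z": the dividend is zero extended to 32 bits.
   Assumes z != 0. */
DivResult16 div_unsigned16(uint16_t y, uint16_t z) {
    uint32_t dividend = y;                 /* DX:AX with DX = 0 */
    DivResult16 r;
    r.quotient = (uint16_t)(dividend / z);
    r.remainder = (uint16_t)(dividend % z);
    return r;
}

/* Models "cwd / idiv z": the dividend is sign extended to 32 bits.
   Assumes z != 0 and excludes -32768 / -1, whose quotient does not fit. */
DivResult16 div_signed16(int16_t y, int16_t z) {
    int32_t dividend = y;                  /* cwd copies the sign into DX */
    DivResult16 r;
    r.quotient = (uint16_t)(int16_t)(dividend / z);
    r.remainder = (uint16_t)(int16_t)(dividend % z);
    return r;
}
```

The same bit pattern, 0FFF6h, is 65526 to div_unsigned16 but -10 to div_signed16, so the two functions return different quotients and remainders for it. That is exactly the difference between mov dx, 0 and cwd in the listings above.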
Since it is possible for an arithmetic error to occur, you should generally test the result of each expression for an error before or after completing the operation. For example, unsigned addition, subtraction, and multiplication set the carry flag if an overflow occurs. You can use the jc or jnc instructions immediately after the corresponding instruction sequence to test for overflow. Likewise, you can use the jo or jno instructions after these sequences to test for signed arithmetic overflow. The next two examples demonstrate how to do this for the add instruction:

X := Y + Z; {unsigned}
        mov     ax, y
        add     ax, z
        mov     x, ax
        jc      uOverflow

X := Y + Z; {signed}
        mov     ax, y
        add     ax, z
        mov     x, ax
        jo      sOverflow
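The difference between the two tests is easier to appreciate with concrete numbers. The sketch below, written in C purely as an illustration, predicts whether a 16 bit add would set the carry flag or the overflow flag. The function names add_sets_carry and add_sets_overflow are hypothetical; they simply redo the addition in 32 bits and check whether the true sum fits.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a 16-bit unsigned add would set the carry flag (what jc tests). */
bool add_sets_carry(uint16_t y, uint16_t z) {
    uint32_t wide = (uint32_t)y + (uint32_t)z;
    return wide > 0xFFFFu;
}

/* True when a 16-bit signed add would set the overflow flag (what jo tests). */
bool add_sets_overflow(int16_t y, int16_t z) {
    int32_t wide = (int32_t)y + (int32_t)z;
    return wide > INT16_MAX || wide < INT16_MIN;
}
```

Adding 0FFFFh and 1 sets carry but not overflow (as signed values this is -1 + 1 = 0, a perfectly good result), while adding 7FFFh and 1 sets overflow but not carry. This is why you must pick jc or jo according to whether the operands are unsigned or signed.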
Certain unary operations also qualify as simple expressions. A good example of a unary operation is negation. In a high level language negation takes one of two possible forms:

        var := -var

or

        var1 := -var2

Note that var := -constant is really a simple assignment, not a simple expression. You can specify a negative constant as an operand to the mov instruction:

        mov     var, -14

To handle the first form of the negation operation above use the single assembly language statement:

        neg     var

If two different variables are involved, then use the following:

        mov     ax, var2
        neg     ax
        mov     var1, ax

Overflow only occurs if you attempt to negate the most negative value (-128 for eight bit values, -32768 for sixteen bit values, etc.). In this instance the 80x86 sets the overflow flag, so you can test for arithmetic overflow using the jo or jno instructions. In all other cases the 80x86 clears the overflow flag. The carry flag does not signal an error after the neg instruction (neg sets it whenever the operand is non-zero), since negation does not apply to unsigned operands.
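The single overflow case for negation can be demonstrated with a short model. This is a sketch in C under stated assumptions, not a description of how the processor is built: NegResult16 and neg16 are hypothetical names, and the final conversion back to int16_t relies on the usual two's complement behaviour of mainstream compilers.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int16_t value;     /* result left in the operand */
    bool overflow;     /* models the overflow flag after neg */
} NegResult16;

NegResult16 neg16(int16_t v) {
    NegResult16 r;
    /* Negate the unsigned bit pattern so that -32768 wraps around
       instead of triggering undefined signed overflow in C. */
    uint16_t bits = (uint16_t)(0u - (uint16_t)v);
    r.value = (int16_t)bits;          /* assumes two's complement conversion */
    r.overflow = (v == INT16_MIN);    /* only the most negative value overflows */
    return r;
}
```

Negating -32768 hands back -32768 with the overflow indicator set, which is the situation a jo after neg is meant to catch.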
9.1.3
Complex Expressions
A complex expression is any arithmetic expression involving more than two terms and one operator. Such expressions are commonly found in programs written in a high level language. Complex expressions may include parentheses to override operator precedence, function calls, array accesses, etc. While the conversion of some complex expressions to assembly language is fairly straightforward, others require some effort. This section outlines the rules you use to convert such expressions.

A complex expression that is easy to convert to assembly language is one that involves three terms and two operators, for example:

        W := W - Y - Z;

Clearly the straightforward assembly language conversion of this statement will require two sub instructions. However, even with an expression as simple as this one, the conversion is not trivial. There are actually two ways to convert the statement above into assembly language:

        mov     ax, w
        sub     ax, y
        sub     ax, z
        mov     w, ax

and

        mov     ax, y
        sub     ax, z
        sub     w, ax

The second conversion, since it is shorter, looks better. However, it produces an incorrect result (assuming Pascal-like semantics for the original statement). Associativity is the problem. The second sequence above computes W := W - (Y - Z) which is not the same as W := (W - Y) - Z. How we place the parentheses around the subexpressions can affect the result. Note that if you are interested in a shorter form, you can use the following sequence:

        mov     ax, y
        add     ax, z
        sub     w, ax

This computes W := W - (Y + Z), which is equivalent to W := (W - Y) - Z.

Precedence is another issue. Consider the Pascal expression:

        X := W * Y + Z;
Once again there are two ways we can evaluate this expression:

        X := (W * Y) + Z;

or

        X := W * (Y + Z);

By now, you’re probably thinking that this text is crazy. Everyone knows the correct way to evaluate this expression is the first form, (W * Y) + Z. However, you’re wrong to think that way. The APL programming language, for example, evaluates expressions solely from right to left and does not give one operator precedence over another.

Most high level languages use a fixed set of precedence rules to describe the order of evaluation in an expression involving two or more different operators. Most programming languages, for example, compute multiplication and division before addition and subtraction. Those that support exponentiation (e.g., FORTRAN and BASIC) usually compute that before multiplication and division. These rules are intuitive since almost everyone learns them before high school. Consider the expression:

        X op1 Y op2 Z

If op1 takes precedence over op2 then this evaluates to (X op1 Y) op2 Z; otherwise, if op2 takes precedence over op1, then this evaluates to X op1 (Y op2 Z). Depending upon the operators and operands involved, these two computations could produce different results. When converting an expression of this form into assembly language, you must be sure to compute the subexpression with the highest precedence first. The following example demonstrates this technique:

; W := X + Y * Z;
        mov     bx, x
        mov     ax, y           ;Must compute Y * Z first since
        mul     z               ; “*” has the highest precedence.
        add     bx, ax          ;Now add product with X’s value.
        mov     w, bx           ;Save away result.

Since addition is a commutative operation, we could optimize the above code to produce:

; W := X + Y * Z;
        mov     ax, y           ;Must compute Y * Z first since
        mul     z               ; “*” has the highest precedence.
        add     ax, x           ;Now add product with X’s value.
        mov     w, ax           ;Save away result.
If two operators appearing within an expression have the same precedence, then you determine the order of evaluation using associativity rules. Most operators are left associative, meaning that they evaluate from left to right. Addition, subtraction, multiplication, and division are all left associative. A right associative operator evaluates from right to left. The exponentiation operator in FORTRAN and BASIC is a good example of a right associative operator:

        2^2^3 is equal to 2^(2^3) not (2^2)^3

The precedence and associativity rules determine the order of evaluation. Indirectly, these rules tell you where to place parentheses in an expression to determine the order of evaluation. Of course, you can always use parentheses to override the default precedence and associativity. However, the ultimate point is that your assembly code must complete certain operations before others to correctly compute the value of a given expression. The following examples demonstrate this principle:

; W := X - Y - Z
        mov     ax, x           ;All the same operator, so we need
        sub     ax, y           ; to evaluate from left to right
        sub     ax, z           ; because they all have the same
        mov     w, ax           ; precedence.

; W := X + Y * Z
        mov     ax, y           ;Must compute Y * Z first since
        imul    z               ; multiplication has a higher
        add     ax, x           ; precedence than addition.
        mov     w, ax

; W := X / Y - Z
        mov     ax, x           ;Here we need to compute division
        cwd                     ; first since it has the highest
        idiv    y               ; precedence.
        sub     ax, z
        mov     w, ax

; W := X * Y * Z
        mov     ax, y           ;Addition and multiplication are
        imul    z               ; commutative, therefore the order
        imul    x               ; of evaluation does not matter
        mov     w, ax
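The associativity pitfall with W := W - Y - Z discussed earlier in this section is easy to check numerically. The following C sketch (an illustration only; the function names sub_left, sub_wrong, and sub_regrouped are hypothetical) computes the three groupings side by side, assuming the intermediate values stay within 16 bits.

```c
#include <stdint.h>

/* W := (W - Y) - Z, the left associative reading Pascal uses. */
int16_t sub_left(int16_t w, int16_t y, int16_t z) {
    return (int16_t)((w - y) - z);
}

/* W := W - (Y - Z), what the shorter but incorrect sequence computes. */
int16_t sub_wrong(int16_t w, int16_t y, int16_t z) {
    return (int16_t)(w - (y - z));
}

/* W := W - (Y + Z), the shorter sequence that is still correct. */
int16_t sub_regrouped(int16_t w, int16_t y, int16_t z) {
    return (int16_t)(w - (y + z));
}
```

With W = 10, Y = 3, and Z = 2, the left associative form and the regrouped form both give 5, while the incorrect grouping gives 9.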
There is one exception to the associativity rule. If an expression involves multiplication and division it is always better to perform the multiplication first. For example, given an expression of the form:

        W := X/Y * Z

It is better to compute X*Z and then divide the result by Y rather than divide X by Y and multiply the quotient by Z. There are two reasons this approach is better. First, remember that the imul instruction always produces a 32 bit result (assuming 16 bit operands). By doing the multiplication first, you automatically sign extend the product into the dx register so you do not have to sign extend ax prior to the division. This saves the execution of the cwd instruction. A second reason for doing the multiplication first is to increase the accuracy of the computation. Remember, (integer) division often produces an inexact result. For example, if you compute 5/2 you will get the value two, not 2.5. Computing (5/2)*3 produces six. However, if you compute (5*3)/2 you get the value seven, which is a little closer to the real quotient (7.5). Therefore, if you encounter an expression of the form:

        W := X/Y*Z;

You can usually convert this to assembly code:

        mov     ax, x
        imul    z
        idiv    y
        mov     w, ax
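The accuracy argument is worth verifying with the text's own numbers. Here is a minimal sketch in C, offered as an illustration rather than as anything from the original chapter. The names divide_then_multiply and multiply_then_divide are hypothetical, and the code assumes Y is non-zero and that the final quotient fits in 16 bits.

```c
#include <stdint.h>

/* Divide first: the quotient X/Y is truncated before the multiply. */
int16_t divide_then_multiply(int16_t x, int16_t y, int16_t z) {
    return (int16_t)((x / y) * z);
}

/* Multiply first: the 32-bit product plays the role of DX:AX after imul. */
int16_t multiply_then_divide(int16_t x, int16_t y, int16_t z) {
    int32_t product = (int32_t)x * (int32_t)z;
    return (int16_t)(product / y);
}
```

For X = 5, Y = 2, and Z = 3 the first function returns six and the second returns seven, matching the figures in the paragraph above.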
Of course, if the algorithm you’re encoding depends on the truncation effect of the division operation, you cannot use this trick to improve the algorithm. Moral of the story: always make sure you fully understand any expression you are converting to assembly language. Obviously if the semantics dictate that you must perform the division first, do so.

Consider the following Pascal statement:

        W := X - Y * Z;

This is similar to a previous example except it uses subtraction rather than addition. Since subtraction is not commutative, you cannot compute Y * Z and then subtract X from this result. This tends to complicate the conversion a tiny amount. Rather than a straightforward multiply and addition sequence, you’ll have to load X into a register, multiply Y and Z leaving their product in a different register, and then subtract this product from X, e.g.,

        mov     bx, x
        mov     ax, y
        imul    z
        sub     bx, ax
        mov     w, bx

This is a trivial example that demonstrates the need for temporary variables in an expression. The code uses the bx register to temporarily hold a copy of X until it computes the product of Y and Z. As your expressions increase in complexity, the need for temporaries grows. Consider the following Pascal statement:

        W := (A + B) * (Y + Z);

Following the normal rules of algebraic evaluation, you compute the subexpressions inside the parentheses (i.e., the two subexpressions with the highest precedence) first and set their values aside. Once you have computed the values of both subexpressions, you can compute their product. One way to deal with complex expressions like this one is to reduce it to a sequence of simple expressions whose results wind up in temporary variables. For example, we can convert the single expression above into the following sequence:

        Temp1 := A + B;
        Temp2 := Y + Z;
        W := Temp1 * Temp2;

Since converting simple expressions to assembly language is quite easy, it’s now a snap to compute the former, complex, expression in assembly. The code is

        mov     ax, a
        add     ax, b
        mov     Temp1, ax
        mov     ax, y
        add     ax, z
        mov     Temp2, ax
        mov     ax, Temp1
        imul    Temp2
        mov     w, ax

Of course, this code is grossly inefficient and it requires that you declare a couple of temporary variables in your data segment. However, it is very easy to optimize this code by keeping temporary variables, as much as possible, in 80x86 registers. By using 80x86 registers to hold the temporary results this code becomes:

        mov     ax, a
        add     ax, b
        mov     bx, y
        add     bx, z
        imul    bx
        mov     w, ax
Yet another example:

        X := (Y+Z) * (A-B) / 10;

This can be converted to a set of four simple expressions:

        Temp1 := (Y+Z)
        Temp2 := (A-B)
        Temp1 := Temp1 * Temp2
        X := Temp1 / 10

You can convert these four simple expressions into the assembly language statements:

        mov     ax, y           ;Compute AX := Y+Z
        add     ax, z
        mov     bx, a           ;Compute BX := A-B
        sub     bx, b
        imul    bx              ;Compute AX := AX * BX, this also sign
        mov     bx, 10          ; extends AX into DX for idiv.
        idiv    bx              ;Compute AX := AX / 10
        mov     x, ax           ;Store result into X

The most important thing to keep in mind is that temporary values, if possible, should be kept in registers. Remember, accessing an 80x86 register is much more efficient than accessing a memory location. Use memory locations to hold temporaries only if you’ve run out of registers to use.

Ultimately, converting a complex expression to assembly language is little different from solving the expression by hand. Instead of actually computing the result at each stage of the computation, you simply write the assembly code that computes the results. Since you were probably taught to compute only one operation at a time, this means that manual computation works on “simple expressions” that exist in a complex expression. Of course, converting those simple expressions to assembly is fairly trivial. Therefore, anyone who can solve a complex expression by hand can convert it to assembly language following the rules for simple expressions.
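The last example in this section, X := (Y+Z) * (A-B) / 10, can be traced step by step in a high level language to see what each temporary holds. The C sketch below is an illustration under stated assumptions: eval_expression is a hypothetical name, the sums are assumed to fit in 16 bits, and the 32 bit variable stands in for the dx:ax register pair that holds the product before the divide.

```c
#include <stdint.h>

int16_t eval_expression(int16_t y, int16_t z, int16_t a, int16_t b) {
    int16_t temp1 = (int16_t)(y + z);            /* AX := Y + Z */
    int16_t temp2 = (int16_t)(a - b);            /* BX := A - B */
    int32_t product = (int32_t)temp1 * temp2;    /* DX:AX := AX * BX */
    return (int16_t)(product / 10);              /* AX := DX:AX / 10 */
}
```

Calling eval_expression(300, 200, 100, 0) returns 5000 even though the intermediate product, 50000, is too large for a signed 16 bit value. The double width product is what makes the multiply-then-divide sequence safe here.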
9.1.4
Commutative Operators
If “@” represents some operator, that operator is commutative if the following relationship is always true:

        (A @ B) = (B @ A)

As you saw in the previous section, commutative operators are nice because the order of their operands is immaterial and this lets you rearrange a computation, often making that computation easier or more efficient. Often, rearranging a computation allows you to use fewer temporary variables. Whenever you encounter a commutative operator in an expression, you should always check to see if there is a better sequence you can use to improve the size or speed of your code. The following tables list the commutative and non-commutative operators you typically find in high level languages:
Table 46: Some Common Commutative Binary Operators

Pascal      C/C++       Description
+           +           Addition
*           *           Multiplication
AND         && or &     Logical or bitwise AND
OR          || or |     Logical or bitwise OR
XOR         ^           (Logical or) Bitwise exclusive-OR
=           ==          Equality
<>          !=          Inequality
Table 47: Some Common Noncommutative Binary Operators

Pascal      C/C++       Description
-           -           Subtraction
/ or DIV    /           Division
MOD         %           Modulo or remainder
<           <           Less than
<=          <=          Less than or equal
>           >           Greater than
>=          >=          Greater than or equal

9.2
Logical (Boolean) Expressions
Consider the following expression from a Pascal program:

        B := ((X=Y) and (A <= D)) or ((Z-A) <> 5);

B is a boolean variable and the remaining variables are all integers.
How do we represent boolean variables in assembly language? Although it takes only a single bit to represent a boolean value, most assembly language programmers allocate a whole byte or word for this purpose. With a byte, there are 256 possible values we can use to represent the two values true and false. So which two values (or which two sets of values) do we use to represent these boolean values? Because of the machine’s architecture, it’s much easier to test for conditions like zero or not zero and positive or negative rather than to test for one of two particular boolean values. Most programmers (and, indeed, some programming languages like “C”) choose zero to represent false and anything else to represent true. Some people prefer to represent true and false with one and zero (respectively) and not allow any other values. Others select 0FFFFh for true and 0 for false. You could also use a positive value for true and a negative value for false. All these mechanisms have their own advantages and drawbacks.

Using only zero and one to represent false and true offers one very big advantage: the 80x86 logical instructions (and, or, xor and, to a lesser extent, not) operate on these values exactly as you would expect. That is, if you have two boolean variables A and B, then the following instructions perform the basic logical operations on these two variables:

        mov     ax, A
        and     ax, B
        mov     C, ax           ;C := A and B;

        mov     ax, A
        or      ax, B
        mov     C, ax           ;C := A or B;

        mov     ax, A
        xor     ax, B
        mov     C, ax           ;C := A xor B;

        mov     ax, A           ;Note that the NOT instruction does not
        not     ax              ; properly compute B := not A by itself.
        and     ax, 1           ; I.e., (NOT 0) does not equal one.
        mov     B, ax           ;B := not A;

        mov     ax, A           ;Another way to do B := NOT A;
        xor     ax, 1
        mov     B, ax           ;B := not A;
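The behaviour of the two logical negation sequences in the listing above can be checked with a few lines of high level code. This is a sketch in C, included only as an illustration. The function names not_only, not_then_mask, and xor_one are hypothetical, and each function assumes its argument is a boolean stored as zero or one in a 16 bit word.

```c
#include <stdint.h>

/* Models "not ax" alone: every bit flips, so the result is not 0 or 1. */
uint16_t not_only(uint16_t a) {
    return (uint16_t)~a;
}

/* Models "not ax" followed by "and ax, 1". */
uint16_t not_then_mask(uint16_t a) {
    return (uint16_t)(~a & 1u);
}

/* Models "xor ax, 1", which flips only the L.O. bit. */
uint16_t xor_one(uint16_t a) {
    return (uint16_t)(a ^ 1u);
}
```

For inputs of zero and one, not_then_mask and xor_one agree and return one and zero respectively, while not_only returns 0FFFFh and 0FFFEh.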
Note, as pointed out above, that the not instruction will not properly compute logical negation. The bitwise not of zero is 0FFh and the bitwise not of one is 0FEh (0FFFFh and 0FFFEh for the 16 bit operands used above). Neither result is zero or one. However, by anding the result with one you get the proper result. Note that you can implement the not operation more efficiently using the xor ax, 1 instruction since it only affects the L.O. bit.

As it turns out, using zero for false and anything else for true has a lot of subtle advantages. Specifically, the test for true or false is often implicit in the execution of any logical instruction. However, this mechanism suffers from a very big disadvantage: you cannot use the 80x86 and, or, xor, and not instructions to implement the boolean operations of the same name. Consider the two values 55h and 0AAh. They’re both non-zero so they both represent the value true. However, if you logically and 55h and 0AAh together using the 80x86 and instruction, the result is zero. (True and true) should produce true, not false. A system that uses non-zero values to represent true and zero to represent false is an arithmetic logical system. A system that uses two distinct values like zero and one to represent false and true is called a boolean logical system, or simply a boolean system. You can use either system, as convenient.

Consider again the boolean expression:

        B := ((X=Y) and (A <= D)) or ((Z-A) <> 5);

The simple expressions resulting from this expression might be:

        Temp2 := X = Y
        Temp := A <= D
        Temp := Temp and Temp2
        Temp2 := Z-A
        Temp2 := Temp2 <> 5
        B := Temp or Temp2
The assembly language code for these expressions could be:

                mov     ax, x           ;See if X = Y and load zero or
                cmp     ax, y           ; one into AX to denote the result
                jnz     L1              ; of this comparison.
                mov     al, 1           ;X = Y
                jmp     L2
L1:             mov     al, 0           ;X <> Y
L2:
                mov     bx, A           ;See if A <= D and load zero or one
                cmp     bx, D           ; into BX to denote the result of
                jle     ST1             ; this comparison.
                mov     bl, 0
                jmp     L3
ST1:            mov     bl, 1
L3:
                and     bl, al          ;Temp := Temp and Temp2

                mov     ax, Z           ;See if (Z-A) <> 5.
                sub     ax, A           ;Temp2 := Z-A;
                cmp     ax, 5           ;Temp2 := Temp2 <> 5;
                jnz     ST2
                mov     al, 0
                jmp     short L4
ST2:            mov     al, 1
L4:
                or      al, bl          ;Temp := Temp or Temp2;
                mov     B, al           ;B := Temp;
As you can see, this is a rather unwieldy sequence of statements. One slight optimization you can use is to assume a result is going to be true or false and initialize the corresponding boolean result ahead of time:

                mov     bl, 0           ;Assume X <> Y
                mov     ax, x
                cmp     ax, Y
                jne     L1
                mov     bl, 1           ;X is equal to Y, so make this true.
L1:
                mov     bh, 0           ;Assume not (A <= D)
                mov     ax, A
                cmp     ax, D
                jnle    L2
                mov     bh, 1           ;A <= D so make this true
L2:
                and     bl, bh          ;Compute logical AND of results.

                mov     bh, 0           ;Assume (Z-A) = 5
                mov     ax, Z
                sub     ax, A
                cmp     ax, 5
                je      L3
                mov     bh, 1           ;(Z-A) <> 5
L3:
                or      bl, bh          ;Logical OR of results.
                mov     B, bl           ;Save boolean result.
Of course, if you have an 80386 or later processor, you can use the setcc instructions to simplify this a bit:

                mov     ax, x
                cmp     ax, y
                sete    al              ;TEMP2 := X = Y

                mov     bx, A
                cmp     bx, D
                setle   bl              ;TEMP := A <= D
                and     bl, al          ;Temp := Temp and Temp2
                mov     ax, Z
                sub     ax, A           ;Temp2 := Z-A;
                cmp     ax, 5
                setne   al              ;Temp2 := Temp2 <> 5;
                or      bl, al          ;Temp := Temp or Temp2;
                mov     B, bl           ;B := Temp;
This code sequence is obviously much better than the previous one, but it will only execute on 80386 and later processors.

Another way to handle boolean expressions is to represent boolean values by states within your code. The basic idea is to forget maintaining a boolean variable throughout the execution of a code sequence and use the position within the code to determine the boolean result. Consider the following implementation of the above expression. First, let’s rearrange the expression to be

        B := ((Z-A) <> 5) or ((X=Y) and (A <= D));

This is perfectly legal since the or operation is commutative. Now consider the following implementation:

                mov     B, 1            ;Assume the result is true.
                mov     ax, Z           ;See if (Z-A) <> 5
                sub     ax, A           ;If this condition is true, the
                cmp     ax, 5           ; result is always true so there
                jne     Done            ; is no need to check the rest.

                mov     ax, X           ;If X <> Y, the result is false,
                cmp     ax, Y           ; no matter what A and D contain
                jne     SetBtoFalse

                mov     ax, A           ;Now see if A <= D.
                cmp     ax, D
                jle     Done            ;If so, quit.
SetBtoFalse:    mov     B, 0            ;If B is false, handle that here.
Done:
Notice that this section of code is a lot shorter than the first version above (and it runs on all processors). The previous translations did everything computationally. This version uses program flow logic to improve the code. It begins by assuming a true result and sets the B variable to true. It then checks to see if (Z-A) <> 5. If this is true, the code branches to the Done label because B is true no matter what else happens. If the program falls through to the mov ax, X instruction, we know that the result of the previous comparison is false. There is no need to save this result in a temporary since we implicitly know its value by the fact that we’re executing the mov ax, X instruction. Likewise, the second group of statements above checks to see if X is equal to Y. If it is not, we already know the result is false so this code jumps to the SetBtoFalse label. If the program begins executing the third set of statements above, we know that the first result was false and the second result was true; the position of the code guarantees this. Therefore, there is no need to maintain temporary boolean variables that keep track of the state of this computation.

Consider another example:

        B := ((A = E) or (F <> D)) and ((A<>B) or (F = D));
Computationally, this expression would yield a considerable amount of code. However, by using flow control you can reduce it to the following:

                mov     bx, b           ;Save B's old value; the expression
                                        ; reads it after B is overwritten.
                mov     b, 0            ;Assume result is false.
                mov     ax, a
                cmp     ax, e           ;See if A = E.
                je      Test2           ;If so, 1st subexpression is true.

                mov     ax, f           ;If not, check 2nd subexpression
                cmp     ax, d           ; to see if F <> D.
                je      Done            ;If F = D the result is false and
                                        ; we’re done, else fall through
                                        ; to next tests.
Test2:          mov     ax, a           ;Does A <> B?
                cmp     ax, bx
                jne     SetBto1         ;If so, we’re done.

                mov     ax, f           ;If not, see if F = D.
                cmp     ax, d
                jne     Done

SetBto1:        mov     b, 1
Done:
There is one other difference between using control flow vs. computation logic: when using control flow methods, you may skip the majority of the instructions that implement the boolean formula. This is known as short-circuit evaluation. When using the computation model, even with the setcc instruction, you wind up executing most of the statements. Keep in mind that this is not necessarily a disadvantage. On pipelined processors it may be much faster to execute several additional instructions rather than flush the pipeline and prefetch queue. You may need to experiment with your code to determine the best solution.

When working with boolean expressions, don’t forget that you might be able to optimize your code by simplifying those boolean expressions (see “Simplification of Boolean Functions” on page 52). You can use algebraic transformations (especially DeMorgan’s theorems) and the mapping method to help reduce the complexity of an expression.
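To check that the two styles really agree, here is a small C model of the expression B := ((X=Y) and (A <= D)) or ((Z-A) <> 5). It is only a sketch: the function names eval_computed and eval_flow are invented for this illustration, and the variables are assumed to be plain signed integers.

```c
#include <stdbool.h>

/* Computational model: every comparison is evaluated and the 0/1
   results are combined with bitwise operators, much as the setcc
   version does. */
bool eval_computed(int x, int y, int a, int d, int z)
{
    int t1 = (x == y);          /* Temp2 := X = Y        */
    int t2 = (a <= d);          /* Temp  := A <= D       */
    int t3 = ((z - a) != 5);    /* Temp2 := (Z-A) <> 5   */
    return ((t1 & t2) | t3) != 0;
}

/* Control-flow model: the position in the code carries the partial
   result, so later tests are skipped once the answer is known. */
bool eval_flow(int x, int y, int a, int d, int z)
{
    if ((z - a) != 5) return true;    /* jne Done          */
    if (x != y)       return false;   /* jne SetBtoFalse   */
    return a <= d;                    /* jle Done, else fall into SetBtoFalse */
}
```

Both functions return the same value for every input; eval_flow simply stops testing as soon as the outcome is fixed.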
9.3 Multiprecision Operations

One big advantage of assembly language over HLLs is that assembly language does not limit the size of integers. For example, the C programming language defines a maximum of three different integer sizes: short int, int, and long int. On the PC, these are often 16 or 32 bit integers. Although the 80x86 machine instructions limit you to processing eight, sixteen, or thirty-two bit integers with a single instruction, you can always use more than one instruction to process integers of any size you desire. If you want 256 bit integer values, no problem. The following sections describe how to extend various arithmetic and logical operations from 16 or 32 bits to as many bits as you please.
9.3.1 Multiprecision Addition Operations

The 80x86 add instruction adds two 8, 16, or 32 bit numbers[1]. After the execution of the add instruction, the 80x86 carry flag is set if there is an overflow out of the H.O. bit of the sum. You can use this information to do multiprecision addition operations. Consider the way you manually perform a multidigit (multiprecision) addition operation:

1. As usual, 32 bit arithmetic is available only on the 80386 and later processors.

Step 1: Add the least significant digits together:

         289                     289
        +456     produces       +456
        ----                    ----
                                   5 with carry 1.

Step 2: Add the next significant digits plus the carry:

          1 (previous carry)
         289                     289
        +456     produces       +456
        ----                    ----
           5                      45 with carry 1.

Step 3: Add the most significant digits plus the carry:

                                1 (previous carry)
         289                     289
        +456     produces       +456
        ----                    ----
          45                     745

The 80x86 handles extended precision arithmetic in an identical fashion, except instead of adding the numbers a digit at a time, it adds them a byte or a word at a time. Consider the three-word (48 bit) addition operation in Figure 8.1.

Figure 8.1 Multiprecision (48-bit) Addition
        Step 1: Add the least significant words together.
        Step 2: Add the middle words together (plus carry, if any).
        Step 3: Add the most significant words together (plus carry, if any).
The add instruction adds the L.O. words together. The adc (add with carry) instruction adds all other word pairs together. The adc instruction adds two operands plus the carry flag together producing a word value and (possibly) a carry.

For example, suppose that you have two thirty-two bit values you wish to add together, defined as follows:

X               dword   ?
Y               dword   ?

Suppose, also, that you want to store the sum in a third variable, Z, that is likewise defined with the dword directive. The following 80x86 code will accomplish this task:

                mov     ax, word ptr X
                add     ax, word ptr Y
                mov     word ptr Z, ax
                mov     ax, word ptr X+2
                adc     ax, word ptr Y+2
                mov     word ptr Z+2, ax
Remember, these variables are declared with the dword directive. Therefore the assembler will not accept an instruction of the form mov ax, X because this instruction would attempt to load a 32 bit value into a 16 bit register. Therefore this code uses the word ptr coercion operator to coerce symbols X, Y, and Z to 16 bits. The first three instructions add the L.O. words of X and Y together and store the result at the L.O. word of Z. The last three instructions add the H.O. words of X and Y together, along with the carry out of the L.O. word, and store the result in the H.O. word of Z. Remember, address expressions of the form “X+2” access the H.O. word of a 32 bit entity. This is due to the fact that the 80x86 address space addresses bytes and it takes two consecutive bytes to form a word.

Of course, if you have an 80386 or later processor you needn’t go through all this just to add two 32 bit values together, since the 80386 directly supports 32 bit operations. However, if you wanted to add two 64 bit integers together on the 80386, you would still need to use this technique.

You can extend this to any number of bits by using the adc instruction to add in the higher order words in the values. For example, to add together two 128 bit values, you could use code that looks something like the following:

BigVal1         dword   0,0,0,0         ;Four double words in 128 bits!
BigVal2         dword   0,0,0,0
BigVal3         dword   0,0,0,0
                 .
                 .
                 .
                mov     eax, BigVal1    ;No need for dword ptr operator since
                add     eax, BigVal2    ; these are dword variables.
                mov     BigVal3, eax

                mov     eax, BigVal1+4  ;Add in the values from the L.O.
                adc     eax, BigVal2+4  ; entity to the H.O. entity using
                mov     BigVal3+4, eax  ; the ADC instruction.

                mov     eax, BigVal1+8
                adc     eax, BigVal2+8
                mov     BigVal3+8, eax

                mov     eax, BigVal1+12
                adc     eax, BigVal2+12
                mov     BigVal3+12, eax
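The add/adc chain can be modeled in C to see how the carry moves from word to word. This is a sketch under stated assumptions: add_words is a hypothetical helper name, the operands are arrays of 16 bit words stored L.O. word first, and the explicit carry variable stands in for the 80x86 carry flag.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds two n-word integers (index 0 is the L.O. word) and returns the
   carry out of the H.O. word. */
unsigned add_words(uint16_t *sum, const uint16_t *x, const uint16_t *y,
                   size_t n)
{
    unsigned carry = 0;             /* add: no carry into the L.O. word */
    for (size_t i = 0; i < n; i++) {
        uint32_t t = (uint32_t)x[i] + y[i] + carry;   /* adc */
        sum[i] = (uint16_t)t;       /* low 16 bits are the result word */
        carry = t >> 16;            /* bit 16 is the carry flag        */
    }
    return carry;
}
```

Adding 1 to the 48 bit value 0000FFFFFFFFh, for example, ripples a carry through the two low words and leaves 000100000000h.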
9.3.2 Multiprecision Subtraction Operations

Like addition, the 80x86 performs multi-byte subtraction the same way you would manually, except it subtracts whole bytes, words, or double words at a time rather than decimal digits. The mechanism is similar to that for the add operation. You use the sub instruction on the L.O. byte/word/double word and the sbb instruction on the high order values. The following example demonstrates a 32 bit subtraction using the 16 bit registers on the 8086:

var1            dword   ?
var2            dword   ?
diff            dword   ?

                mov     ax, word ptr var1
                sub     ax, word ptr var2
                mov     word ptr diff, ax
                mov     ax, word ptr var1+2
                sbb     ax, word ptr var2+2
                mov     word ptr diff+2, ax

The following example demonstrates a 128-bit subtraction using the 80386 32 bit register set:

BigVal1         dword   0,0,0,0         ;Four double words in 128 bits!
BigVal2         dword   0,0,0,0
BigVal3         dword   0,0,0,0
                 .
                 .
                 .
                mov     eax, BigVal1    ;No need for dword ptr operator since
                sub     eax, BigVal2    ; these are dword variables.
                mov     BigVal3, eax

                mov     eax, BigVal1+4  ;Subtract the values from the L.O.
                sbb     eax, BigVal2+4  ; entity to the H.O. entity using
                mov     BigVal3+4, eax  ; the SUB and SBB instructions.

                mov     eax, BigVal1+8
                sbb     eax, BigVal2+8
                mov     BigVal3+8, eax

                mov     eax, BigVal1+12
                sbb     eax, BigVal2+12
                mov     BigVal3+12, eax
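The sub/sbb chain has the same shape as the addition chain, with a borrow in place of the carry. The following C model is a sketch only; sub_words is a hypothetical name and the operands are assumed to be arrays of 16 bit words stored L.O. word first.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes diff = x - y over n words (index 0 is the L.O. word) and
   returns the borrow out of the H.O. word. */
unsigned sub_words(uint16_t *diff, const uint16_t *x, const uint16_t *y,
                   size_t n)
{
    unsigned borrow = 0;            /* sub: no borrow into the L.O. word */
    for (size_t i = 0; i < n; i++) {
        uint32_t t = (uint32_t)x[i] - y[i] - borrow;  /* sbb */
        diff[i] = (uint16_t)t;
        borrow = (t >> 16) & 1;     /* set when the word went below zero */
    }
    return borrow;
}
```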
9.3.3 Extended Precision Comparisons

Unfortunately, there isn’t a “compare with borrow” instruction that can be used to perform extended precision comparisons. Since the cmp and sub instructions perform the same operation, at least as far as the flags are concerned, you’d probably guess that you could use the sbb instruction to synthesize an extended precision comparison; however, you’d only be partly right. There is a better way.

Consider the two unsigned values 2157h and 1293h. The L.O. bytes of these two values do not affect the outcome of the comparison. Simply comparing 21h with 12h tells us that the first value is greater than the second. In fact, the only time you ever need to look at both bytes of these values is if the H.O. bytes are equal. In all other cases comparing the H.O. bytes tells you everything you need to know about the values. Of course, this is true for any number of bytes, not just two. The following code compares two signed 64 bit integers on an 80386 or later processor:

; This sequence transfers control to location “IsGreater” if
; QwordValue > QwordValue2. It transfers control to “IsLess” if
; QwordValue < QwordValue2. It falls through to the instruction
; following this sequence if QwordValue = QwordValue2. To test for
; inequality, change the “IsGreater” and “IsLess” operands to “NotEqual”
; in this code.
;
; Only the H.O. dword carries the sign, so it uses the signed jumps
; (jg, jl). The L.O. dword is a plain magnitude, so it must use the
; unsigned jumps (ja, jb) even in a signed comparison.

                mov     eax, dword ptr QWordValue+4     ;Get H.O. dword
                cmp     eax, dword ptr QWordValue2+4
                jg      IsGreater
                jl      IsLess
                mov     eax, dword ptr QWordValue
                cmp     eax, dword ptr QWordValue2
                ja      IsGreater
                jb      IsLess
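A C model makes it easy to test the word-at-a-time comparison against awkward values. The sketch below assumes a two's complement machine and uses a hypothetical function name, cmp64_signed; note that the H.O. halves are compared as signed values and the L.O. halves as unsigned values.

```c
#include <stdint.h>

/* Returns +1, -1, or 0 as the signed 64 bit value a_hi:a_lo is greater
   than, less than, or equal to b_hi:b_lo, looking only at 32 bit halves. */
int cmp64_signed(uint32_t a_hi, uint32_t a_lo, uint32_t b_hi, uint32_t b_lo)
{
    /* Reinterpret the H.O. halves as signed (two's complement assumed). */
    int32_t ah = (int32_t)a_hi;
    int32_t bh = (int32_t)b_hi;

    if (ah > bh) return 1;          /* jg  IsGreater */
    if (ah < bh) return -1;         /* jl  IsLess    */
    if (a_lo > b_lo) return 1;      /* ja  IsGreater (unsigned) */
    if (a_lo < b_lo) return -1;     /* jb  IsLess    (unsigned) */
    return 0;                       /* fall through: equal */
}
```

The pair 0000000080000000h and 0000000000000001h is a useful test case: the H.O. halves are equal and the L.O. half of the first value has its top bit set, so a signed test on the L.O. halves would report the wrong order.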
To compare unsigned values, simply use the ja and jb instructions in place of jg and jl. You can easily synthesize any possible comparison from the sequence above; the following examples show how to do this. These examples do signed comparisons; substitute ja and jb for the jg and jl instructions to do unsigned comparisons. Note that the L.O. dwords are always tested with the unsigned jumps (jb, jbe, ja, jae), even in a signed comparison, because only the H.O. dword carries the sign.

QW1             qword   ?
QW2             qword   ?

dp              textequ <dword ptr>

; 64 bit test to see if QW1 < QW2 (signed).
; Control transfers to “IsLess” label if QW1 < QW2. Control falls
; through to the next statement if this is not true.

                mov     eax, dp QW1+4   ;Get H.O. dword
                cmp     eax, dp QW2+4
                jg      NotLess
                jl      IsLess
                mov     eax, dp QW1     ;Fall through to here if H.O.
                cmp     eax, dp QW2     ; dwords are equal.
                jb      IsLess
NotLess:

; 64 bit test to see if QW1 <= QW2 (signed).

                mov     eax, dp QW1+4   ;Get H.O. dword
                cmp     eax, dp QW2+4
                jg      NotLessEq
                jl      IsLessEq
                mov     eax, dp QW1
                cmp     eax, dword ptr QW2
                jbe     IsLessEq
NotLessEq:

; 64 bit test to see if QW1 > QW2 (signed).

                mov     eax, dp QW1+4   ;Get H.O. dword
                cmp     eax, dp QW2+4
                jg      IsGtr
                jl      NotGtr
                mov     eax, dp QW1     ;Fall through to here if H.O.
                cmp     eax, dp QW2     ; dwords are equal.
                ja      IsGtr
NotGtr:

; 64 bit test to see if QW1 >= QW2 (signed).

                mov     eax, dp QW1+4   ;Get H.O. dword
                cmp     eax, dp QW2+4
                jg      IsGtrEq
                jl      NotGtrEq
                mov     eax, dp QW1
                cmp     eax, dword ptr QW2
                jae     IsGtrEq
NotGtrEq:

; 64 bit test to see if QW1 = QW2 (signed or unsigned). This code branches
; to the label “IsEqual” if QW1 = QW2. It falls through to the next instruction
; if they are not equal.

                mov     eax, dp QW1+4   ;Get H.O. dword
                cmp     eax, dp QW2+4
                jne     NotEqual
                mov     eax, dp QW1
                cmp     eax, dword ptr QW2
                je      IsEqual
NotEqual:
; 64 bit test to see if QW1 <> QW2 (signed or unsigned). This code branches
; to the label “NotEqual” if QW1 <> QW2. It falls through to the next
; instruction if they are equal.

                mov     eax, dp QW1+4   ;Get H.O. dword
                cmp     eax, dp QW2+4
                jne     NotEqual
                mov     eax, dp QW1
                cmp     eax, dword ptr QW2
                jne     NotEqual

9.3.4 Extended Precision Multiplication

Although a 16x16 or 32x32 multiply is usually sufficient, there are times when you may want to multiply larger values together. You will use the 80x86 single operand mul and imul instructions for extended precision multiplication.

Not surprisingly (in view of how adc and sbb work), you use the same techniques to perform extended precision multiplication on the 80x86 that you employ when manually multiplying two values. Consider a simplified form of the way you perform multi-digit multiplication by hand:

1) Multiply the first two digits together (5*3):

         123
          45
         ---
          15

2) Multiply 5*2:

         123
          45
         ---
          15
         10

3) Multiply 5*1:

         123
          45
         ---
          15
         10
         5

4) 4*3:

         123
          45
         ---
          15
         10
         5
         12

5) Multiply 4*2:

         123
          45
         ---
          15
         10
         5
         12
         8

6) 4*1:

         123
          45
         ---
          15
         10
         5
         12
         8
        4

7) Add all the partial products together:

         123
          45
         ---
          15
         10
         5
         12
         8
        4
        ----
        5535
Figure 8.2 Multiprecision Multiplication
        (The operands are the two-word values A:B and C:D.)
        1) Multiply the L.O. words (D*B).
        2) Multiply D * A.
        3) Multiply C times B.
        4) Multiply C * A.
        5) Compute sum of partial products.

The 80x86 does extended precision multiplication in the same manner except that it works with bytes, words, and double words rather than digits. Figure 8.2 shows how this works.

Probably the most important thing to remember when performing an extended precision multiplication is that you must also perform a multiple precision addition at the same time. Adding up all the partial products requires several additions that will produce the result. The following listing demonstrates the proper way to multiply two 32 bit values on a 16 bit processor.

Note: Multiplier and Multiplicand are 32 bit variables declared in the data segment via the dword directive. Product is a 64 bit variable declared in the data segment via the qword directive.
Multiply        proc    near
                push    ax
                push    dx
                push    cx
                push    bx

; Multiply the L.O. word of Multiplier times Multiplicand:

                mov     ax, word ptr Multiplier
                mov     bx, ax                          ;Save Multiplier val
                mul     word ptr Multiplicand           ;Multiply L.O. words
                mov     word ptr Product, ax            ;Save partial product
                mov     cx, dx                          ;Save H.O. word

                mov     ax, bx                          ;Get Multiplier from BX
                mul     word ptr Multiplicand+2         ;Multiply L.O. * H.O.
                add     ax, cx                          ;Add partial product
                adc     dx, 0                           ;Don’t forget carry!
                mov     bx, ax                          ;Save partial product
                mov     cx, dx                          ; for now.

; Multiply the H.O. word of Multiplier times Multiplicand:

                mov     ax, word ptr Multiplier+2       ;Get H.O. Multiplier
                mul     word ptr Multiplicand           ;Times L.O. word
                add     ax, bx                          ;Add partial product
                mov     word ptr Product+2, ax          ;Save partial product
                adc     cx, dx                          ;Add in carry/H.O.!
                mov     bx, 0                           ;This add can itself
                adc     bx, 0                           ; carry; keep it in BX.

                mov     ax, word ptr Multiplier+2       ;Multiply the H.O.
                mul     word ptr Multiplicand+2         ; words together.
                add     ax, cx                          ;Add partial product
                adc     dx, bx                          ;Don’t forget carries!
                mov     word ptr Product+4, ax          ;Save partial product
                mov     word ptr Product+6, dx

                pop     bx
                pop     cx
                pop     dx
                pop     ax
                ret
Multiply        endp
One thing you must keep in mind concerning this code: it only works for unsigned operands. Multiplication of signed operands appears in the exercises.
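The partial-product scheme is easy to get subtly wrong, because the middle column can produce more than one carry. The C sketch below builds a 32x32 multiply out of nothing but 16x16 products so the column sums can be checked on a PC; mul32_by_words is a hypothetical name, and the letters a, b, c, and d follow the A:B and C:D labels of the figure.

```c
#include <stdint.h>

/* Unsigned 32x32 -> 64 bit multiply using only 16x16 -> 32 bit products. */
uint64_t mul32_by_words(uint32_t multiplier, uint32_t multiplicand)
{
    uint32_t a = multiplier >> 16,   b = multiplier & 0xFFFF;    /* A:B */
    uint32_t c = multiplicand >> 16, d = multiplicand & 0xFFFF;  /* C:D */

    uint32_t db = d * b;        /* the four partial products */
    uint32_t da = d * a;
    uint32_t cb = c * b;
    uint32_t ca = c * a;

    /* Sum the column that lands in bits 16..31. The sum may need more
       than 16 bits; the excess is a carry into the upper half. */
    uint32_t mid = (db >> 16) + (da & 0xFFFF) + (cb & 0xFFFF);

    /* Upper 32 bits: C*A plus the H.O. words of the middle products
       plus whatever the middle column carried out. */
    uint32_t hi = ca + (da >> 16) + (cb >> 16) + (mid >> 16);

    return ((uint64_t)hi << 32) | ((mid & 0xFFFF) << 16) | (db & 0xFFFF);
}
```

The operand pair 0FFFFFFFFh * 0FFFFFFFFh is a good stress test, since every column produces a carry.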
9.3.5 Extended Precision Division

You cannot synthesize a general n-bit/m-bit division operation using the div and idiv instructions. Such an operation must be performed using a sequence of shift and subtract instructions, which is extremely messy. A less general operation, dividing an n bit quantity by a 32 bit (on the 80386 or later) or 16 bit quantity, is easily synthesized using the div instruction. The following code demonstrates how to divide a 64 bit quantity by a 16 bit divisor, producing a 64 bit quotient and a 16 bit remainder:

dseg            segment para public 'DATA'
dividend        dword   0FFFFFFFFh, 12345678h
divisor         word    16
Quotient        dword   0,0
Modulo          word    0
dseg            ends

cseg            segment para public 'CODE'
                assume  cs:cseg, ds:dseg

; Divide a 64 bit quantity by a 16 bit quantity:

Divide64        proc    near

                mov     ax, word ptr dividend+6
                sub     dx, dx
                div     divisor
                mov     word ptr Quotient+6, ax
                mov     ax, word ptr dividend+4
                div     divisor
                mov     word ptr Quotient+4, ax
                mov     ax, word ptr dividend+2
                div     divisor
                mov     word ptr Quotient+2, ax
                mov     ax, word ptr dividend
                div     divisor
                mov     word ptr Quotient, ax
                mov     Modulo, dx
                ret
Divide64        endp
cseg            ends
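The reason the chained div instructions work is that each remainder left in dx becomes the H.O. word of the next dx:ax dividend. A C sketch of the same chain follows; div64_by_16 is a hypothetical name, the words are stored L.O. word first, and the caller is assumed to pass a nonzero divisor.

```c
#include <stdint.h>

/* Divides a four-word (64 bit) dividend by a 16 bit divisor.
   Stores the 64 bit quotient and returns the 16 bit remainder. */
uint16_t div64_by_16(uint16_t quotient[4], const uint16_t dividend[4],
                     uint16_t divisor)
{
    uint32_t rem = 0;                              /* sub dx, dx */
    for (int i = 3; i >= 0; i--) {                 /* H.O. word first */
        uint32_t t = (rem << 16) | dividend[i];    /* forms DX:AX */
        quotient[i] = (uint16_t)(t / divisor);     /* div: AX = quotient */
        rem = t % divisor;                         /*      DX = remainder */
    }
    return (uint16_t)rem;
}
```

Because the remainder is always smaller than the divisor, each partial quotient fits in 16 bits, which is why the 80x86 div never overflows in this chain.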
The Divide64 code can be extended to any number of bits by simply adding additional mov / div / mov instructions at the beginning of the sequence. Of course, on the 80386 and later processors you can divide by a 32 bit value by using edx and eax in the above sequence (with a few other appropriate adjustments).

If you need to use a divisor larger than 16 bits (32 bits on an 80386 or later), you’re going to have to implement the division using a shift and subtract strategy. Unfortunately, such algorithms are very slow. In this section we’ll develop two division algorithms that operate on an arbitrary number of bits. The first is slow but easier to understand, the second is quite a bit faster (in general).

As for multiplication, the best way to understand how the computer performs division is to study how you were taught to perform long division by hand. Consider the operation 3456/12 and the steps you would take to manually perform this operation:

(1) 12 goes into 34 two times.

    12 ) 3456
         24

(2) Subtract 24 from 34 and drop down the 105.

          2
    12 ) 3456
         24
         105

(3) 12 goes into 105 eight times.

          2
    12 ) 3456
         24
         105
          96

(4) Subtract 96 from 105 and drop down the 96.

          28
    12 ) 3456
         24
         105
          96
           96

(5) 12 goes into 96 exactly eight times.

          28
    12 ) 3456
         24
         105
          96
           96
           96

(6) Therefore, 12 goes into 3456 exactly 288 times.

          288
    12 ) 3456
         24
         105
          96
           96
           96
This algorithm is actually easier in binary since at each step you do not have to guess how many times 12 goes into the remainder nor do you have to multiply 12 by your guess to obtain the amount to subtract. At each step in the binary algorithm the divisor goes into the remainder exactly zero or one times. As an example, consider the division of 27 (11011) by three (11):

11 goes into 11 one time.

    11 ) 11011
         11

Subtract out the 11 and bring down the zero.

          1
    11 ) 11011
         11
          00

11 goes into 00 zero times.

          1
    11 ) 11011
         11
          00
          00

Subtract out the zero and bring down the one.

          10
    11 ) 11011
         11
          00
          00
           01

11 goes into 01 zero times.

          10
    11 ) 11011
         11
          00
          00
           01
           00

Subtract out the zero and bring down the one.

          100
    11 ) 11011
         11
          00
          00
           01
           00
            11

11 goes into 11 one time.

          100
    11 ) 11011
         11
          00
          00
           01
           00
            11
            11

This produces the final result of 1001.

          1001
    11 ) 11011
         11
          00
          00
           01
           00
            11
            11
            00
There is a novel way to implement this binary division algorithm that computes the quotient and the remainder at the same time. The algorithm is the following:

        Quotient := Dividend;
        Remainder := 0;
        for i:= 1 to NumberBits do

                Remainder:Quotient := Remainder:Quotient SHL 1;
                if Remainder >= Divisor then

                        Remainder := Remainder - Divisor;
                        Quotient := Quotient + 1;

                endif
        endfor

NumberBits is the number of bits in the Remainder, Quotient, Divisor, and Dividend variables. Note that the Quotient := Quotient + 1 statement sets the L.O. bit of Quotient to one since this algorithm previously shifts Quotient one bit to the left. The 80x86 code to implement this algorithm is

; Assume Dividend (and Quotient) is DX:AX, Divisor is in CX:BX,
; and Remainder is in SI:DI.

                mov     bp, 32          ;Count off 32 bits in BP
                sub     si, si          ;Set remainder to zero
                sub     di, di
BitLoop:        shl     ax, 1           ;See the section on shifts
                rcl     dx, 1           ; that describes how this
                rcl     di, 1           ; 64 bit SHL operation works
                rcl     si, 1
                cmp     si, cx          ;Compare H.O. words of Rem,
                ja      GoesInto        ; Divisor.
                jb      TryNext
                cmp     di, bx          ;Compare L.O. words.
                jb      TryNext

GoesInto:       sub     di, bx          ;Remainder := Remainder -
                sbb     si, cx          ; Divisor
                inc     ax              ;Set L.O. bit of AX
TryNext:        dec     bp              ;Repeat 32 times.
                jne     BitLoop
This code looks short and simple, but there are a few problems with it: it does not check for division by zero (it will produce the value 0FFFFFFFFh if you attempt to divide by zero), it only handles unsigned values, and it is very slow. Handling division by zero is very simple: just check the divisor against zero prior to running this code and return an appropriate error code if the divisor is zero. Dealing with signed values is equally simple; you’ll see how to do that in a little bit. The performance of this algorithm, however, leaves a lot to be desired. Assuming one pass through the loop takes about 30 clock cycles[2], this algorithm would require almost 1,000 clock cycles to complete! That’s an order of magnitude worse than the DIV/IDIV instructions, which are themselves among the slowest instructions on the 80x86.

There is a technique you can use to boost the performance of this division by a fair amount: check to see if the divisor variable really uses 32 bits. Often, even though the divisor is a 32 bit variable, the value itself fits just fine into 16 bits (i.e., the H.O. word of Divisor is zero). In this special case, which occurs frequently, you can use the DIV instruction, which is much faster.
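The shift and subtract algorithm can be tried out in C before committing it to assembly. The sketch below is a direct model of the pseudocode given earlier for 32 bit operands; div_shift_subtract is a hypothetical name, and the remainder is held in a 64 bit variable purely so that the shift cannot lose a bit.

```c
#include <stdint.h>

/* Unsigned 32 bit division by shift and subtract.
   Returns the quotient and stores the remainder through *remainder. */
uint32_t div_shift_subtract(uint32_t dividend, uint32_t divisor,
                            uint32_t *remainder)
{
    uint32_t quotient = dividend;       /* Quotient := Dividend */
    uint64_t rem = 0;                   /* Remainder := 0       */

    for (int i = 0; i < 32; i++) {
        /* Remainder:Quotient := Remainder:Quotient SHL 1 */
        rem = (rem << 1) | (quotient >> 31);
        quotient <<= 1;

        if (rem >= divisor) {           /* divisor goes in once */
            rem -= divisor;
            quotient |= 1;              /* set L.O. bit of quotient */
        }
    }
    *remainder = (uint32_t)rem;
    return quotient;
}
```

With a zero divisor every test succeeds, so the model returns 0FFFFFFFFh, the same value the text attributes to the assembly version.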
2. This will vary depending upon your choice of processor.

9.3.6 Extended Precision NEG Operations

Although there are several ways to negate an extended precision value, the shortest way is to use a combination of neg and sbb instructions. This technique uses the fact that neg subtracts its operand from zero. In particular, it sets the flags the same way the sub instruction would if you subtracted the destination value from zero. This code takes the following form:

                neg     dx
                neg     ax
                sbb     dx, 0

The sbb instruction decrements dx if there is a borrow out of the L.O. word of the negation operation (which always occurs unless ax is zero).

To extend this operation to additional bytes, words, or double words is easy; all you have to do is start with the H.O. memory location of the object you want to negate and work towards the L.O. byte. The following code computes a 128 bit negation on the 80386 processor:

Value           dword   0,0,0,0         ;128 bit integer.
                 .
                 .
                 .
                neg     Value+12        ;Neg H.O. dword
                neg     Value+8         ;Neg previous dword in memory.
                sbb     Value+12, 0     ;Adjust H.O. dword
                neg     Value+4         ;Neg the second dword in object.
                sbb     Value+8, 0      ;Adjust 3rd dword in object.
                sbb     Value+12, 0     ;Carry any borrow through H.O. word.
                neg     Value           ;Negate L.O. word.
                sbb     Value+4, 0      ;Adjust 2nd dword in object.
                sbb     Value+8, 0      ;Adjust 3rd dword in object.
                sbb     Value+12, 0     ;Carry any borrow through H.O. word.
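The three-instruction neg/neg/sbb sequence can be modeled in C to confirm that the borrow is handled correctly. This is a sketch: neg32_by_words is a hypothetical name, and the 32 bit value is passed as separate H.O. and L.O. 16 bit words, mirroring dx:ax.

```c
#include <stdint.h>

/* Negates the 32 bit value hi:lo in place, one 16 bit word at a time. */
void neg32_by_words(uint16_t *hi, uint16_t *lo)
{
    *hi = (uint16_t)(0u - *hi);         /* neg dx */

    /* neg ax sets the carry flag unless ax is zero. */
    unsigned borrow = (*lo != 0);
    *lo = (uint16_t)(0u - *lo);         /* neg ax */

    *hi = (uint16_t)(*hi - borrow);     /* sbb dx, 0 */
}
```

Negating 00000001h this way yields 0FFFFFFFFh, and negating 00010000h yields 0FFFF0000h, where no borrow occurs because the L.O. word is zero.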
Unfortunately, this code tends to get really large and slow since you need to propagate the carry through all the H.O. words after each negate operation. A simpler way to negate larger values is to simply subtract that value from zero:

Value           dword   0,0,0,0,0       ;160 bit integer.
                 .
                 .
                 .
                mov     eax, 0          ;MOV leaves the flags alone, so
                sub     eax, Value      ; the borrow survives each reload.
                mov     Value, eax
                mov     eax, 0
                sbb     eax, Value+4
                mov     Value+4, eax
                mov     eax, 0
                sbb     eax, Value+8
                mov     Value+8, eax
                mov     eax, 0
                sbb     eax, Value+12
                mov     Value+12, eax
                mov     eax, 0
                sbb     eax, Value+16
                mov     Value+16, eax

9.3.7 Extended Precision AND Operations

Performing an n-word and operation is very easy – simply and the corresponding words between the two operands, saving the result. For example, to perform the and operation where all three operands are 32 bits long, you could use the following code:

                mov     ax, word ptr source1
                and     ax, word ptr source2
                mov     word ptr dest, ax
                mov     ax, word ptr source1+2
                and     ax, word ptr source2+2
                mov     word ptr dest+2, ax

This technique easily extends to any number of words; all you need to do is logically and the corresponding bytes, words, or double words in the corresponding operands.
9.3.8 Extended Precision OR Operations

Multi-word logical or operations are performed in the same way as multi-word and operations. You simply or the corresponding words in the two operands together. For example, to logically or two 48 bit values, use the following code:

                mov     ax, word ptr operand1
                or      ax, word ptr operand2
                mov     word ptr operand3, ax
                mov     ax, word ptr operand1+2
                or      ax, word ptr operand2+2
                mov     word ptr operand3+2, ax
                mov     ax, word ptr operand1+4
                or      ax, word ptr operand2+4
                mov     word ptr operand3+4, ax

9.3.9 Extended Precision XOR Operations

Extended precision xor operations are performed in a manner identical to and/or – simply xor the corresponding words in the two operands to obtain the extended precision result. The following code sequence operates on two 64 bit operands, computes their exclusive-or, and stores the result into a 64 bit variable. This example uses the 32 bit registers available on 80386 and later processors.

                mov     eax, dword ptr operand1
                xor     eax, dword ptr operand2
                mov     dword ptr operand3, eax
                mov     eax, dword ptr operand1+4
                xor     eax, dword ptr operand2+4
                mov     dword ptr operand3+4, eax

9.3.10 Extended Precision NOT Operations

The not instruction inverts all the bits in the specified operand. It does not affect any flags (therefore, using a conditional jump after a not instruction has no meaning). An extended precision not is performed by simply executing the not instruction on all the affected operands. For example, to perform a 32 bit not operation on the value in (dx:ax), all you need to do is execute the instructions:

                not     ax
                not     dx

or

                not     dx
                not     ax

Keep in mind that if you execute the not instruction twice, you wind up with the original value. Also note that exclusive-oring a value with all ones (0FFh, 0FFFFh, or 0FF..FFh) performs the same operation as the not instruction.
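Because the logical operations never pass anything between words, a multi-word version is just a loop in any order. The short C sketch below illustrates this and the xor-with-all-ones identity; not_words and xor_words are hypothetical helper names working on arrays of 16 bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Inverts every bit of an n-word value in place. */
void not_words(uint16_t *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        v[i] = (uint16_t)~v[i];             /* not */
}

/* dest := a xor b, word by word. No carry links the words, so the
   order in which they are processed does not matter. */
void xor_words(uint16_t *dest, const uint16_t *a, const uint16_t *b,
               size_t n)
{
    for (size_t i = 0; i < n; i++)
        dest[i] = (uint16_t)(a[i] ^ b[i]);  /* xor */
}
```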
9.3.11 Extended Precision Shift Operations

Extended precision shift operations require a shift and a rotate instruction. Consider what must happen to implement a 32 bit shl using 16 bit operations:

1) A zero must be shifted into bit zero.
2) Bits zero through 14 are shifted into the next higher bit.
3) Bit 15 is shifted into bit 16.
4) Bits 16 through 30 must be shifted into the next higher bit.
5) Bit 31 is shifted into the carry flag.

Figure 8.3 32-bit Shift Left Operation

The two instructions you can use to implement this 32 bit shift are shl and rcl. For example, to shift the 32 bit quantity in (dx:ax) one position to the left, you’d use the instructions:

                shl     ax, 1
                rcl     dx, 1

Note that you can only shift an extended precision value one bit at a time. You cannot shift an extended precision operand several bits using the cl register or an immediate value greater than one as the count using this technique.

To understand how this instruction sequence works, consider the operation of these instructions on an individual basis. The shl instruction shifts a zero into bit zero of the 32 bit operand and shifts bit 15 into the carry flag. The rcl instruction then shifts the carry flag into bit 16 and then shifts bit 31 into the carry flag. The result is exactly what we want.

To perform a shift left on an operand larger than 32 bits you simply add additional rcl instructions. An extended precision shift left operation always starts with the least significant word and each succeeding rcl instruction operates on the next most significant word. For example, to perform a 48 bit shift left operation on a memory location you could use the following instructions:

                shl     word ptr Operand, 1
                rcl     word ptr Operand+2, 1
                rcl     word ptr Operand+4, 1

If you need to shift your data by two or more bits, you can either repeat the above sequence the desired number of times (for a constant number of shifts) or you can place the instructions in a loop to repeat them some number of times. For example, the following code shifts the 48 bit value Operand to the left the number of bits specified in cx (which must be nonzero, since the loop instruction decrements cx before testing it):

ShiftLoop:      shl     word ptr Operand, 1
                rcl     word ptr Operand+2, 1
                rcl     word ptr Operand+4, 1
                loop    ShiftLoop
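The shl/rcl chain amounts to passing one bit from each word to the next, starting at the L.O. word. The C sketch below models that bit-passing; shl1_words and shl_words are hypothetical names, the operand is an array of 16 bit words stored L.O. word first, and the carry variable plays the role of the carry flag.

```c
#include <stddef.h>
#include <stdint.h>

/* Shifts an n-word value left one bit. Returns the bit shifted out of
   the H.O. word (the final carry flag). */
unsigned shl1_words(uint16_t *v, size_t n)
{
    unsigned carry = 0;                     /* shl shifts in a zero */
    for (size_t i = 0; i < n; i++) {        /* L.O. word first */
        unsigned out = v[i] >> 15;          /* bit leaving this word */
        v[i] = (uint16_t)((v[i] << 1) | carry);   /* rcl */
        carry = out;
    }
    return carry;
}

/* Repeats the one-bit shift count times, like the ShiftLoop code. */
void shl_words(uint16_t *v, size_t n, unsigned count)
{
    while (count-- > 0)
        shl1_words(v, n);
}
```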
You implement shr and sar in a similar way, except you must start at the H.O. word of the operand and work your way down to the L.O. word:

DblSAR:         sar     word ptr Operand+4, 1
                rcr     word ptr Operand+2, 1
                rcr     word ptr Operand, 1

DblSHR:         shr     word ptr Operand+4, 1
                rcr     word ptr Operand+2, 1
                rcr     word ptr Operand, 1

There is one major difference between the extended precision shifts described here and their 8/16 bit counterparts – the extended precision shifts set the flags differently than the single precision operations. For example, the zero flag is set if the last rotate instruction produced a zero result, not if the entire shift operation produced a zero result. For the shift right operations, the overflow and sign flags aren’t set properly (they are set properly for the left shift operation). Additional testing will be required if you need to test one of these flags after an extended precision shift operation. Fortunately, the carry flag is the flag most often tested after a shift operation and the extended precision shift instructions properly set this flag.

The shld and shrd instructions let you efficiently implement multiprecision shifts of several bits on 80386 and later processors. Consider the following code sequence:

ShiftMe         dword   1234h, 5678h, 9012h
                 .
                 .
                 .
                mov     eax, ShiftMe+4
                shld    ShiftMe+8, eax, 6
                mov     eax, ShiftMe
                shld    ShiftMe+4, eax, 6
                shl     ShiftMe, 6
Recall that the shld instruction shifts bits from its second operand into its first operand. Therefore, the first shld instruction above shifts the bits from ShiftMe+4 into ShiftMe+8 without affecting the value in ShiftMe+4. The second shld instruction shifts the bits from ShiftMe into ShiftMe+4. Finally, the shl instruction shifts the L.O. double word the appropriate amount. There are two important things to note about this code. First, unlike the other extended precision shift left operations, this sequence works from the H.O. double word down to the L.O. double word. Second, the carry flag does not contain the carry out of the H.O. shift operation. If you need to preserve the carry flag at that point, you will need to push the flags after the first shld instruction and pop the flags after the shl instruction. You can do an extended precision shift right operation using the shrd instruction. It works almost the same way as the code sequence above except you work from the L.O. double word to the H.O. double word. The solution is left as an exercise at the end of this chapter.
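A C model shows why the shld sequence must run from the H.O. double word down: each step reads the lower double word before that double word is itself shifted. This is a sketch with hypothetical names (shld32, shl96); the count is assumed to be between 1 and 31, and the array is stored L.O. double word first.

```c
#include <stdint.h>

/* Models shld dest, src, count for 0 < count < 32: dest shifts left and
   is filled from the H.O. bits of src; src itself is unchanged. */
static uint32_t shld32(uint32_t dest, uint32_t src, unsigned count)
{
    return (dest << count) | (src >> (32 - count));
}

/* Shifts the 96 bit value v (v[0] is the L.O. dword) left by count bits. */
void shl96(uint32_t v[3], unsigned count)
{
    v[2] = shld32(v[2], v[1], count);   /* H.O. dword first */
    v[1] = shld32(v[1], v[0], count);
    v[0] <<= count;                     /* shl on the L.O. dword */
}
```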
9.3.12
Extended Precision Rotate Operations

The rcl and rcr operations extend in a manner almost identical to that for shl and shr. For example, to perform 48 bit rcl and rcr operations, use the following instructions:

        rcl     word ptr operand, 1
        rcl     word ptr operand+2, 1
        rcl     word ptr operand+4, 1

        rcr     word ptr operand+4, 1
        rcr     word ptr operand+2, 1
        rcr     word ptr operand, 1
The only difference between this code and the code for the extended precision shift operations is that the first instruction is a rcl or rcr rather than a shl or shr instruction.

Performing an extended precision rol or ror instruction isn’t quite as simple an operation. The 8086 extended precision versions of these instructions appear in the exercises. On the 80386 and later processors, you can use the bt, shld, and shrd instructions to easily implement an extended precision rol or ror instruction. The following code shows how to use the shld instruction to do an extended precision rol:

; Compute ROL EDX:EAX, 4

        mov     ebx, edx
        shld    edx, eax, 4
        shld    eax, ebx, 4
        bt      eax, 0          ;Set carry flag, if desired.

An extended precision ror instruction is similar; just keep in mind that you work on the L.O. end of the object first and the H.O. end last.
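The rol sequence above can be checked with a small Python model. This is only a sketch of the arithmetic under the assumption that the count is in the range 1..31; rol64 and shld are illustrative names, not part of any library.

```python
MASK32 = 0xFFFFFFFF

def shld(dest, src, count):
    """Model 32 bit SHLD: shift dest left, filling from the top bits of src."""
    return ((dest << count) | (src >> (32 - count))) & MASK32

def rol64(edx, eax, count):
    """Rotate the 64 bit value EDX:EAX left; returns (edx, eax, carry)."""
    ebx = edx                       # mov  ebx, edx   (save the old H.O. dword)
    edx = shld(edx, eax, count)     # shld edx, eax, count
    eax = shld(eax, ebx, count)     # shld eax, ebx, count
    carry = eax & 1                 # bt   eax, 0
    return edx, eax, carry
```

The copy in ebx matters: by the time the second shld runs, edx has already been modified, so the bits that wrap around into eax must come from the saved value.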
9.4
Operating on Different Sized Operands

Occasionally you may need to compute some value on a pair of operands that are not the same size. For example, you may need to add a word and a double word together or subtract a byte value from a word value. The solution is simple: just extend the smaller operand to the size of the larger operand and then do the operation on two similarly sized operands. For signed operands, you would sign extend the smaller operand to the same size as the larger operand; for unsigned values, you zero extend the smaller operand. This works for any operation, although the following examples demonstrate this for the addition operation.

Once you’ve extended the smaller value to the size of the larger, the addition can proceed. Consider the following code that adds a byte value to a word value:

var1        byte    ?
var2        word    ?

Unsigned addition:              Signed addition:

        mov     al, var1                mov     al, var1
        mov     ah, 0                   cbw
        add     ax, var2                add     ax, var2
In both cases, the byte variable was loaded into the al register, extended to 16 bits, and then added to the word operand. This code works out really well if you can choose the order of the operations (e.g., adding the eight bit value to the sixteen bit value). Sometimes, you cannot specify the order of the operations. Perhaps the sixteen bit value is already in the ax register and you want to add an eight bit value to it. For unsigned addition, you could use the following code:

        mov     ax, var2        ;Load 16 bit value into AX
         .                      ;Do some other operations leaving
         .                      ; a 16 bit quantity in AX.
        add     al, var1        ;Add in the 8 bit value.
        adc     ah, 0           ;Add carry into the H.O. byte.
The first add instruction in this example adds the byte at var1 to the L.O. byte of the value in the accumulator. The adc instruction above adds the carry out of the L.O. byte into the H.O. byte of the accumulator. Care must be taken to ensure that this adc instruction is present. If you leave it out, you may not get the correct result.

Adding an eight bit signed operand to a sixteen bit signed value is a little more difficult. Unfortunately, you cannot add a fixed immediate value (as above) to the H.O. byte of ax. This is because the H.O. extension byte can be either 00h or 0FFh. If a register is available, the best thing to do is the following:

        mov     bx, ax          ;BX is the available register.
        mov     al, var1
        cbw
        add     ax, bx

If an extra register is not available, you might try the following code:

        cmp     var1, 0         ;Test the sign before adding: a CMP after
        jge     add0            ; the ADD would destroy the carry flag.
        add     al, var1
        adc     ah, 0FFh        ;Negative: extension byte is 0FFh.
        jmp     addedFF
add0:   add     al, var1
        adc     ah, 0           ;Non-negative: extension byte is 00h.
addedFF:
Of course, if another register isn’t available, you could always push one onto the stack and save it while you’re performing the operation, e.g.,

        push    bx
        mov     bx, ax
        mov     al, var1
        cbw
        add     ax, bx
        pop     bx
Another alternative is to store the 16 bit value in the accumulator into a memory location and then proceed as before:

        mov     temp, ax
        mov     al, var1
        cbw
        add     ax, temp
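The difference between the unsigned and signed cases can be seen in a short Python model. This is a sketch for checking the arithmetic only; the function names are illustrative, and all values are treated as raw 8 or 16 bit patterns.

```python
def add_unsigned_byte(ax, byte):
    """Model add al, var1 / adc ah, 0 on a 16 bit accumulator."""
    al = (ax & 0xFF) + (byte & 0xFF)
    carry = al >> 8                         # carry out of the L.O. byte
    ah = ((ax >> 8) + carry) & 0xFF         # adc ah, 0
    return (ah << 8) | (al & 0xFF)

def sign_extend_byte(byte):
    """Model CBW: copy bit 7 of the byte into bits 8..15."""
    byte &= 0xFF
    return (byte | 0xFF00) if (byte & 0x80) else byte

def add_signed_byte(ax, byte):
    """Model mov bx, ax / mov al, var1 / cbw / add ax, bx."""
    return (sign_extend_byte(byte) + ax) & 0xFFFF
```

The same byte pattern 0FFh adds 255 in the unsigned routine and -1 in the signed one, which is exactly the difference between a 00h and a 0FFh extension byte.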
All the examples above added a byte value to a word value. By zero or sign extending the smaller operand to the size of the larger operand, you can easily add any two different sized variables together. Consider the following code that adds a signed byte operand to a signed double word:

var1        byte    ?
var2        dword   ?

        mov     al, var1
        cbw
        cwd                     ;Extend to 32 bits in DX:AX
        add     ax, word ptr var2
        adc     dx, word ptr var2+2
Of course, if you have an 80386 or later processor, you could use the following code:

        movsx   eax, var1
        add     eax, var2
An example more applicable to the 80386 is adding an eight bit value to a quadword (64 bit) value. Consider the following code:

BVal        byte    -1
QVal        qword   1

        movsx   eax, BVal
        cdq
        add     eax, dword ptr QVal
        adc     edx, dword ptr QVal+4
For additional examples, see the exercises at the end of this chapter.
9.5
Machine and Arithmetic Idioms

An idiom is an idiosyncrasy. Several arithmetic operations and 80x86 instructions have idiosyncrasies that you can take advantage of when writing assembly language code. Some people refer to the use of machine and arithmetic idioms as “tricky programming” that you should always avoid in well written programs. While it is wise to avoid tricks just for the sake of tricks, many machine and arithmetic idioms are well-known and commonly found in assembly language programs. Some of them can be really tricky, but a good number of them are simply “tricks of the trade.” This text cannot even begin to present all of the idioms in common use today; they are too numerous and the list is constantly changing. Nevertheless, there are some very important idioms that you will see all the time, so it makes sense to discuss those.
9.5.1
Multiplying Without MUL and IMUL

If you take a quick look at the timing for the multiply instruction, you’ll notice that the execution time for this instruction is rather long. Only the div and idiv instructions take longer on the 8086. When multiplying by a constant, you can avoid the performance penalty of the mul and imul instructions by using shifts, additions, and subtractions to perform the multiplication.

Remember, a shl operation performs the same operation as multiplying the specified operand by two. Shifting to the left two bit positions multiplies the operand by four. Shifting to the left three bit positions multiplies the operand by eight. In general, shifting an operand to the left n bits multiplies it by 2^n. Any value can be multiplied by some constant using a series of shifts and adds or shifts and subtractions. For example, to multiply the ax register by ten, you need only multiply it by eight and then add in two times the original value. That is, 10*ax = 8*ax + 2*ax. The code to accomplish this is

        shl     ax, 1           ;Multiply AX by two
        mov     bx, ax          ;Save 2*AX for later
        shl     ax, 1           ;Multiply AX by four
        shl     ax, 1           ;Multiply AX by eight
        add     ax, bx          ;Add in 2*AX to get 10*AX
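As a quick check of the identity 10*ax = 8*ax + 2*ax, the following Python sketch models the five instructions above on a 16 bit register. It is only a model of the arithmetic (the name times_ten is illustrative), and like the real instructions it wraps around modulo 65,536.

```python
MASK16 = 0xFFFF

def times_ten(ax):
    """Model the shift-and-add sequence for multiplying a 16 bit value by ten."""
    ax = (ax << 1) & MASK16     # shl ax, 1        -> 2*ax
    bx = ax                     # mov bx, ax       -> save 2*ax
    ax = (ax << 2) & MASK16     # shl ax, 1 twice  -> 8*ax
    return (ax + bx) & MASK16   # add ax, bx       -> 10*ax
```

Because every step is taken modulo 65,536, the result agrees with the low word of a true multiplication even when the product overflows 16 bits.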
The ax register (or just about any register, for that matter) can be multiplied by most constant values much faster using shl than by using the mul instruction. This may seem hard to believe since it only takes two instructions to compute this product:

        mov     bx, 10
        mul     bx
However, if you look at the timings, the shift and add example above requires fewer clock cycles on most processors in the 80x86 family than the mul instruction. The code is somewhat larger (by a few bytes), but the performance improvement is usually worth it. On the later 80x86 processors, the mul instruction is quite a bit faster than on the earlier processors, but the shift and add scheme is generally faster on these processors as well.

You can also use subtraction with shifts to perform a multiplication operation. Consider the following multiplication by seven:

        mov     bx, ax          ;Save AX*1
        shl     ax, 1           ;AX := AX*2
        shl     ax, 1           ;AX := AX*4
        shl     ax, 1           ;AX := AX*8
        sub     ax, bx          ;AX := AX*7
This follows directly from the fact that ax*7 = (ax*8)-ax.

A common error made by beginning assembly language students is subtracting or adding one or two rather than ax*1 or ax*2. The following does not compute ax*7:

        shl     ax, 1
        shl     ax, 1
        shl     ax, 1
        sub     ax, 1
It computes (8*ax)-1, something entirely different (unless, of course, ax = 1). Beware of this pitfall when using shifts, additions, and subtractions to perform multiplication operations.

You can also use the lea instruction to compute certain products on 80386 and later processors. The trick is to use the 80386 scaled index mode. The following examples demonstrate some simple cases:

        lea     eax, [ecx][ecx]         ;EAX := ECX * 2
        lea     eax, [eax][eax*2]       ;EAX := EAX * 3
        lea     eax, [eax*4]            ;EAX := EAX * 4
        lea     eax, [ebx][ebx*4]       ;EAX := EBX * 5
        lea     eax, [eax*8]            ;EAX := EAX * 8
        lea     eax, [edx][edx*8]       ;EAX := EDX * 9

9.5.2
Division Without DIV and IDIV

Much as the shl instruction can be used for simulating a multiplication by some power of two, the shr and sar instructions can be used to simulate a division by a power of two. Unfortunately, you cannot use shifts, additions, and subtractions to perform a division by an arbitrary constant as easily as you can use these instructions to perform a multiplication operation.

Another way to perform division is to use the multiply instructions. You can divide by some value by multiplying by its reciprocal. The multiply instruction is marginally faster than the divide instruction; multiplying by a reciprocal is almost always faster than division.

Now you’re probably wondering “how does one multiply by a reciprocal when the values we’re dealing with are all integers?” The answer, of course, is that we must cheat to do this. If you want to multiply by one tenth, there is no way you can load the value 1/10th into an 80x86 register prior to performing the multiplication. However, we could multiply 1/10th by 10, perform the multiplication, and then divide the result by ten to get the final result. Of course, this wouldn’t buy you anything at all; in fact, it would make things worse since you’re now doing a multiplication by ten as well as a division by ten. However, suppose you multiply 1/10th by 65,536 (giving 6,554 after rounding), perform the multiplication, and then divide by 65,536. This would still perform the correct operation and, as it turns out, if you set up the problem correctly, you can get the division operation for free. Consider the following code that divides ax by ten:

        mov     dx, 6554        ;Round (65,536/10)
        mul     dx
This code leaves ax/10 in the dx register. To understand how this works, consider what happens when you multiply ax by 65,536 (10000h). This simply moves ax into dx and sets ax to zero. Multiplying by 6,554 (65,536 divided by ten, rounded up) puts ax divided by ten into the dx register. Because 6,554 is slightly larger than 65,536/10, the quotient can come out one too large for large values of ax, so check the range of your operands before relying on this sequence. Since mul is marginally faster than div, this technique runs a little faster than using a straight division.

Multiplying by the reciprocal works well when you need to divide by a constant. You could even use it to divide by a variable, but the overhead to compute the reciprocal only pays off if you perform the division many, many times (by the same value).
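Since 6,554 is a rounded reciprocal, it is worth seeing how far the result can be trusted. The Python sketch below models mov dx, 6554 / mul dx by taking the high word of the 32 bit product, then searches for the first 16 bit input where that differs from true integer division. The function names are illustrative.

```python
def div10_by_reciprocal(ax):
    """Model mov dx, 6554 / mul dx: return DX, the high word of the product."""
    return (ax * 6554) >> 16

def first_mismatch():
    """Smallest 16 bit value for which the reciprocal result differs from ax // 10."""
    for ax in range(65536):
        if div10_by_reciprocal(ax) != ax // 10:
            return ax
    return None
```

Under this model the two agree for every value below 16,389; at 16,389 the reciprocal method yields 1,639 rather than 1,638. If your operands can exceed that range, a wider scaled reciprocal or a correction step would be needed.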
9.5.3
Using AND to Compute Remainders

The and instruction can be used to quickly compute remainders of the form:

        dest := dest MOD 2^n

To compute a remainder using the and instruction, simply and the (unsigned) operand with the value 2^n - 1. For example, to compute ax = ax mod 8 simply use the instruction:

        and     ax, 7
Additional examples:

        and     ax, 3           ;AX := AX mod 4
        and     ax, 0Fh         ;AX := AX mod 16
        and     ax, 1Fh         ;AX := AX mod 32
        and     ax, 3Fh         ;AX := AX mod 64
        and     ax, 7Fh         ;AX := AX mod 128
        mov     ah, 0           ;AX := AX mod 256
                                ; (Same as ax and 0FFh)
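The examples above all follow one pattern: the mask is 2^n - 1, which has n one bits in its low order positions. A brief Python sketch (mod_pow2 is an illustrative name, and the operand is assumed to be unsigned) shows the equivalence:

```python
def mod_pow2(value, n):
    """Remainder of an unsigned value divided by 2**n, computed with AND."""
    mask = (1 << n) - 1         # n one bits, right justified
    return value & mask
```

For instance, mod_pow2(value, 3) corresponds to and ax, 7.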
9.5.4
Implementing Modulo-n Counters with AND

If you want to implement a counter variable that counts up to 2^n - 1 and then resets to zero, simply use the following code:

        inc     CounterVar
        and     CounterVar, nBits

where nBits is a binary value containing n one bits right justified in the number. For example, to create a counter that cycles between zero and fifteen, you could use the following:

        inc     CounterVar
        and     CounterVar, 00001111b

9.5.5
Testing an Extended Precision Value for 0FFFF..FFh

The and instruction can be used to quickly check a multi-word value to see if it contains ones in all its bit positions. Simply load the first word into the ax register and then logically and the ax register with all the remaining words in the data structure. When the and operation is complete, the ax register will contain 0FFFFh if and only if all the words in that structure contained 0FFFFh. E.g.,

        mov     ax, word ptr var
        and     ax, word ptr var+2
        and     ax, word ptr var+4
         .
         .
         .
        and     ax, word ptr var+n
        cmp     ax, 0FFFFh
        je      Is0FFFFh

9.5.6
TEST Operations

Remember, the test instruction is an and instruction that doesn’t retain the results of the and operation (other than the flag settings). Therefore, many of the comments concerning the and operation (particularly with respect to the way it affects the flags) also hold for the test instruction. However, since the test instruction doesn’t affect the destination operand, multiple bit tests may be performed on the same value. Consider the following code:

        test    ax, 1
        jnz     Bit0
        test    ax, 2
        jnz     Bit1
        test    ax, 4
        jnz     Bit2
        etc.
This code can be used to successively test each bit in the ax register (or any other operand for that matter). Note that you cannot use the test/cmp instruction pair to test for a specific value within a string of bits (as you can with the and/cmp instructions). Since test doesn’t strip out any unwanted bits, the cmp instruction would actually be comparing the original value rather than the stripped value. For this reason, you’ll normally use the test instruction to see if a single bit is set or if one or more bits out of a group of bits are set. Of course, if you have an 80386 or later processor, you can also use the bt instruction to test individual bits in an operand.

Another important use of the test instruction is to efficiently compare a register against zero. The following test instruction sets the zero flag if and only if ax contains zero (anything anded with itself produces its original value; this sets the zero flag only if that value is zero):

        test    ax, ax

The test instruction is shorter than

        cmp     ax, 0

or

        cmp     eax, 0

though it is no better than

        cmp     al, 0
Note that you can use the and and or instructions to test for zero in a fashion identical to test. However, on pipelined processors like the 80486 and Pentium chips, the test instruction is less likely to create a hazard since it does not store a result back into its destination register.
9.5.7
Testing Signs with the XOR Instruction

Remember the pain associated with a multi-precision signed multiplication operation? You need to determine the sign of the result, take the absolute value of the operands, multiply them, and then adjust the sign of the result as determined before the multiplication operation. The sign of the product of two numbers is simply the exclusive-or of their signs before performing the multiplication. Therefore, you can use the xor instruction to determine the sign of the product of two extended precision numbers. E.g.,

32x32 Multiply:

        mov     al, byte ptr Oprnd1+3
        xor     al, byte ptr Oprnd2+3
        mov     cl, al                  ;Save sign

; Do the multiplication here (don’t forget to take the absolute
; value of the two operands before performing the multiply).
         .
         .
         .
; Now fix the sign.

        cmp     cl, 0                   ;Check sign bit
        jns     ResultIsPos

; Negate the product here.
         .
         .
         .
ResultIsPos:
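The outline above leaves the multiplication and negation steps elided. The Python sketch below fills them in only as a model, so that the xor trick can be checked end to end; signed_multiply_32 is an illustrative name, and the operands and result are treated as raw two’s complement bit patterns.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def signed_multiply_32(a, b):
    """Multiply two 32 bit two's complement patterns; return the 64 bit pattern."""
    sign = ((a >> 24) ^ (b >> 24)) & 0x80       # xor of the H.O. bytes; keep bit 7
    mag_a = (-a) & MASK32 if a & 0x80000000 else a   # absolute value of a
    mag_b = (-b) & MASK32 if b & 0x80000000 else b   # absolute value of b
    product = mag_a * mag_b                     # unsigned multiply
    if sign:                                    # signs differed: negate the product
        product = (-product) & MASK64
    return product
```

Only bit seven of the xor result matters; the other seven bits are whatever the H.O. bytes happened to contain.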
9.6
Masking Operations

A mask is a value used to force certain bits to zero or one within some other value. A mask typically affects certain bits in an operand (forcing them to zero or one) and leaves other bits unaffected. The appropriate use of masks allows you to extract bits from a value, insert bits into a value, and pack or unpack a packed data type. The following sections describe these operations in detail.
9.6.1
Masking Operations with the AND Instruction

If you’ll take a look at the truth table for the and operation back in Chapter One, you’ll note that if you fix either operand at zero the result is always zero. If you set that operand to one, the result is always the value of the other operand. We can use this property of the and instruction to selectively force certain bits to zero in a value without affecting other bits. This is called masking out bits.

As an example, consider the ASCII codes for the digits “0”..“9”. Their codes fall in the range 30h..39h respectively. To convert an ASCII digit to its corresponding numeric value, you must subtract 30h from the ASCII code. This is easily accomplished by logically anding the value with 0Fh. This strips (sets to zero) all but the L.O. four bits producing the numeric value. You could have used the subtract instruction, but most people use the and instruction for this purpose.
9.6.2
Masking Operations with the OR Instruction

Much as you can use the and instruction to force selected bits to zero, you can use the or instruction to force selected bits to one. This operation is called masking in bits.

Remember the masking out operation described earlier with the and instruction? In that example we wanted to convert an ASCII code for a digit to its numeric equivalent. You can use the or instruction to reverse this process. That is, convert a numeric value in the range 0..9 to the ASCII code for the corresponding digit, i.e., ‘0’..‘9’. To do this, logically or the specified numeric value with 30h.
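The two digit conversions described in this section and the previous one are inverses of each other, which a few lines of Python can confirm. The function names are illustrative, and the inputs are assumed to be valid digit codes or values in the range 0..9.

```python
def ascii_digit_to_value(code):
    """Mask out the H.O. four bits: the effect of and al, 0Fh."""
    return code & 0x0F

def value_to_ascii_digit(value):
    """Mask in the bits of 30h: the effect of or al, 30h."""
    return value | 0x30
```

Neither routine validates its input; anding a non-digit character with 0Fh still produces a value in the range 0..15.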
9.7
Packing and Unpacking Data Types

One of the primary uses of the shift and rotate instructions is packing and unpacking data. Byte and word data types are chosen more often than any other since the 80x86 supports these two data sizes with hardware. If you don’t need exactly eight or 16 bits, using a byte or word to hold your data might be wasteful. By packing data, you may be able to reduce memory requirements for your data by inserting two or more values into a single byte or word. The cost for this reduction in memory use is lower performance. It takes time to pack and unpack the data. Nevertheless, for applications that aren’t speed critical (or for those portions of the application that aren’t speed critical), the memory savings might justify the use of packed data.

The data type that offers the most savings when using packing techniques is the boolean data type. To represent true or false requires a single bit. Therefore, up to eight different boolean values can be packed into a single byte. This represents an 8:1 compression ratio; therefore, a packed array of boolean values requires only one-eighth the space of an equivalent unpacked array (where each boolean variable consumes one byte). For example, the Pascal array

        B:packed array[0..31] of boolean;

requires only four bytes when packed one value per bit. When packed one value per byte, this array requires 32 bytes.

Dealing with a packed boolean array requires two operations. You’ll need to insert a value into a packed variable (often called a packed field) and you’ll need to extract a value from a packed field.

To insert a value into a packed boolean array, you must align the source bit with its position in the destination operand and then store that bit into the destination operand. You can do this with a sequence of and, or, and shift instructions. The first step is to mask out the corresponding bit in the destination operand. Use an and instruction for this. Then the source operand is shifted so that it is aligned with the destination position; finally, the source operand is or’d into the destination operand. For example, if you want to insert bit zero of the ax register into bit five of the cx register, the following code could be used:

        and     cl, 0DFh        ;Clear bit five (the destination bit)
        and     al, 1           ;Clear all AL bits except the src bit.
        ror     al, 1           ;Move to bit 7
        shr     al, 1           ;Move to bit 6
        shr     al, 1           ;Move to bit 5
        or      cl, al
This code is somewhat tricky. It rotates the data to the right rather than shifting it to the left since this requires fewer shifts and rotate instructions.

To extract a boolean value, you simply reverse this process. First, you move the desired bit into bit zero and then mask out all the other bits. For example, to extract the data in bit five of the cx register leaving the single boolean value in bit zero of the ax register, you’d use the following code:

        mov     al, cl
        shl     al, 1           ;Bit 5 to bit 6
        shl     al, 1           ;Bit 6 to bit 7
        rol     al, 1           ;Bit 7 to bit 0
        and     ax, 1           ;Clear all bits except 0

Figure 8.4 Packed Data (a 16 bit value holding the fields Val 1 through Val 5, with the remaining bit unused)
To test a boolean variable in a packed array you needn’t extract the bit and then test it; you can test it in place. For example, to test the value in bit five to see if it is zero or one, the following code could be used:

        test    cl, 00100000b
        jnz     BitIsSet
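The rotate trick used for inserting and extracting bit five can be traced with a small Python model of the eight bit operations. This is a sketch for following the bit movements only; ror8, rol8, insert_bit5, and extract_bit5 are illustrative names.

```python
def ror8(value, count):
    """Rotate an 8 bit value right."""
    return ((value >> count) | (value << (8 - count))) & 0xFF

def rol8(value, count):
    """Rotate an 8 bit value left."""
    return ((value << count) | (value >> (8 - count))) & 0xFF

def insert_bit5(cl, al):
    """Insert bit 0 of al into bit 5 of cl."""
    cl &= 0xDF              # and cl, 0DFh   clear the destination bit
    al &= 1                 # and al, 1      keep only the source bit
    al = ror8(al, 1)        # ror al, 1      bit 0 -> bit 7
    al >>= 2                # shr al, 1 (x2) bit 7 -> bit 5
    return cl | al          # or  cl, al

def extract_bit5(cl):
    """Return bit 5 of cl as a value in bit 0."""
    al = (cl << 2) & 0xFF   # shl al, 1 (x2) bit 5 -> bit 7
    al = rol8(al, 1)        # rol al, 1      bit 7 -> bit 0
    return al & 1           # and ax, 1
```

Going through bit seven takes three instructions in each direction, where moving directly between bits zero and five would take five shifts.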
Other types of packed data can be handled in a similar fashion except you need to work with two or more bits. For example, suppose you’ve packed five different three bit fields into a sixteen bit value as shown in Figure 8.4. If the ax register contains the data to pack into Val 3, you could use the following code to insert this data into field three:

        mov     ah, al          ;Do a shl by 8
        shr     ax, 1           ;Reposition down to bits 6..8
        shr     ax, 1
        and     ax, 111000000b  ;Strip undesired bits (keep bits 6..8)
        and     DATA, 0FE3Fh    ;Set destination field to zero.
        or      DATA, ax        ;Merge new data into field.
Extraction is handled in a similar fashion. First you strip the unneeded bits and then you justify the result:

        mov     ax, DATA
        and     ax, 1C0h        ;Keep bits 6..8
        shr     ax, 1
        shr     ax, 1
        shr     ax, 1
        shr     ax, 1
        shr     ax, 1
        shr     ax, 1
This code can be improved by using the following code sequence:

        mov     ax, DATA
        shl     ax, 1
        shl     ax, 1
        mov     al, ah
        and     ax, 07h
Additional uses for packed data will be explored throughout this book.
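The field operations above reduce to a mask and a shift count. The Python sketch below assumes, as the code comments do, that field three occupies bits 6..8 of the sixteen bit value; the constant and function names are illustrative.

```python
FIELD3_SHIFT = 6            # field three starts at bit 6 (assumed layout)
FIELD3_MASK = 0x01C0        # bits 6..8

def insert_field3(data, value):
    """Clear bits 6..8 of data, then merge in the low three bits of value."""
    cleared = data & ~FIELD3_MASK & 0xFFFF              # and DATA, 0FE3Fh
    return cleared | ((value << FIELD3_SHIFT) & FIELD3_MASK)

def extract_field3(data):
    """Strip everything but bits 6..8 and right justify the result."""
    return (data & FIELD3_MASK) >> FIELD3_SHIFT
```

Note that the clearing mask 0FE3Fh is simply the complement of the field mask 01C0h, so the two constants should always be changed together.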
9.8
Tables

The term “table” has different meanings to different programmers. To most assembly language programmers, a table is nothing more than an array that is initialized with some data. The assembly language programmer often uses tables to compute complex or otherwise slow functions. Many very high level languages (e.g., SNOBOL4 and Icon) directly support a table data type. Tables in these languages are essentially arrays whose elements you can access with a non-integer value (e.g., floating point, string, or any other data type). In this section, we will adopt the assembly language programmer’s view of tables.

A table is an array containing preinitialized values that do not change during the execution of the program. A table can be compared to an array in the same way an integer constant can be compared to an integer variable. In assembly language, you can use tables for a variety of purposes: computing functions, controlling program flow, or simply “looking things up”. In general, tables provide a fast mechanism for performing some operation at the expense of some space in your program (the extra space holds the tabular data). In the following sections we’ll explore some of the many possible uses of tables in an assembly language program.
9.8.1
Function Computation via Table Look Up

Tables can do all kinds of things in assembly language. In HLLs, like Pascal, it’s very easy to create a formula which computes some value. A simple looking arithmetic expression is equivalent to a considerable amount of 80x86 assembly language code. Assembly language programmers tend to compute many values via table look up rather than through the execution of some function. This has the advantage of being easier, and often more efficient as well. Consider the following Pascal statement:

        if (character >= 'a') and (character <= 'z') then
            character := chr(ord(character) - 32);

This Pascal if statement converts the character variable character from lower case to upper case if character is in the range ‘a’..‘z’. The 80x86 assembly language code that does the same thing is

        mov     al, character
        cmp     al, 'a'
        jb      NotLower
        cmp     al, 'z'
        ja      NotLower
        and     al, 05fh        ;Same operation as SUB AL,32
        mov     character, al
NotLower:
Had you buried this code in a nested loop, you’d be hard pressed to improve the speed of this code without using a table look up. Using a table look up, however, allows you to reduce this sequence of instructions to just four instructions:

        mov     al, character
        lea     bx, CnvrtLower
        xlat
        mov     character, al
CnvrtLower is a 256-byte table which contains the values 0..60h at indices 0..60h, 41h..5Ah at indices 61h..7Ah, and 7Bh..0FFh at indices 7Bh..0FFh. Often, using this table look up facility will increase the speed of your code.
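The contents of such a table are easy to generate mechanically. The Python sketch below builds a 256 byte table matching the description above and models the xlat look up as a simple index operation; build_cnvrt_lower and xlat are illustrative names rather than anything defined by the text.

```python
def build_cnvrt_lower():
    """Identity table, except that 'a'..'z' (61h..7Ah) map to 'A'..'Z' (41h..5Ah)."""
    table = bytearray(range(256))
    for code in range(ord("a"), ord("z") + 1):
        table[code] = code - 32
    return bytes(table)

CNVRT_LOWER = build_cnvrt_lower()

def xlat(table, al):
    """Model XLAT: replace AL with the byte at table[AL]."""
    return table[al]
```

Every character that is not a lower case letter maps to itself, which is why the table look up needs no range checks.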
As the complexity of the function increases, the performance benefits of the table look up method increase dramatically. While you would almost never use a look up table to convert lower case to upper case, consider what happens if you want to swap cases:

Via computation:

        mov     al, character
        cmp     al, 'a'
        jb      NotLower
        cmp     al, 'z'
        ja      NotLower
        and     al, 05fh
        jmp     ConvertDone

NotLower:
        cmp     al, 'A'
        jb      ConvertDone
        cmp     al, 'Z'
        ja      ConvertDone
        or      al, 20h

ConvertDone:
        mov     character, al
The table look up code to compute this same function is:

        mov     al, character
        lea     bx, SwapUL
        xlat
        mov     character, al
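The text does not list the contents of SwapUL. One plausible way to generate it, sketched here in Python under the assumption that only the ASCII letters are affected, reuses the same and/or masks as the computed version (build_swap_ul is an illustrative name):

```python
def build_swap_ul():
    """Identity table, with ASCII lower case and upper case letters exchanged."""
    table = bytearray(range(256))
    for code in range(ord("a"), ord("z") + 1):
        table[code] = code & 0x5F       # lower -> upper, as and al, 05fh
    for code in range(ord("A"), ord("Z") + 1):
        table[code] = code | 0x20       # upper -> lower, as or al, 20h
    return bytes(table)

SWAP_UL = build_swap_ul()
```

The look up itself is the same single index operation as before; all of the branching in the computed version has moved into the table data.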
As you can see, when computing a function via table look up, no matter what the function is, only the table changes, not the code doing the look up.

Table look ups suffer from one major problem: functions computed via table look ups have a limited domain. The domain of a function is the set of possible input values (parameters) it will accept. For example, the upper/lower case conversion functions above have the 256-character ASCII character set as their domain.

A function such as SIN or COS accepts the set of real numbers as possible input values. Clearly the domain for SIN and COS is much larger than for the upper/lower case conversion function. If you are going to do computations via table look up, you must limit the domain of a function to a small set. This is because each element in the domain of a function requires an entry in the look up table. You won’t find it very practical to implement a function via table look up whose domain is the set of real numbers.

Most look up tables are quite small, usually 10 to 128 entries. Rarely do look up tables grow beyond 1,000 entries. Most programmers don’t have the patience to create (and verify the correctness of) a 1,000 entry table.

Another limitation of functions based on look up tables is that the elements in the domain of the function must be fairly contiguous. Table look ups take the input value for a function, use this input value as an index into the table, and return the value at that entry in the table. If you do not pass a function any values other than 0, 100, 1,000, and 10,000, it would seem an ideal candidate for implementation via table look up, since its domain consists of only four items. However, the table would actually require 10,001 different elements due to the range of the input values. Therefore, you cannot efficiently create such a function via a table look up. Throughout this section on tables, we’ll assume that the domain of the function is a fairly contiguous set of values.
The best functions that can be implemented via table look ups are those whose domain and range are always 0..255 (or some subset of this range). Such functions are efficiently implemented on the 80x86 via the XLAT instruction. The upper/lower case conversion routines presented earlier are good examples of such a function. Any function in this class (those whose domain and range take on the values 0..255) can be computed using the same two instructions (lea bx,table / xlat) above. The only thing that ever changes is the look up table.

The xlat instruction cannot be (conveniently) used to compute a function value once the range or domain of the function takes on values outside 0..255. There are three situations to consider:

•   The domain is outside 0..255 but the range is within 0..255,
•   The domain is inside 0..255 but the range is outside 0..255, and
•   Both the domain and range of the function take on values outside 0..255.

We will consider each of these cases separately.
If the domain of a function is outside 0..255 but the range of the function falls within this set of values, our look up table will require more than 256 entries but we can represent each entry with a single byte. Therefore, the look up table can be an array of bytes. Next to look ups involving the xlat instruction, functions falling into this class are the most efficient. The following Pascal function invocation,

        B := Func(X);

where Func is

        function Func(X:word):byte;

consists of the following 80x86 code:

        mov     bx, X
        mov     al, FuncTable[bx]
        mov     B, al
This code loads the function parameter into bx, uses this value (in the range 0..??) as an index into the FuncTable table, fetches the byte at that location, and stores the result into B. Obviously, the table must contain a valid entry for each possible value of X. For example, suppose you wanted to map a cursor position on the video screen in the range 0..1999 (there are 2,000 character positions on an 80x25 video display) to its X or Y coordinate on the screen. You could easily compute the X coordinate via the function X:=Posn mod 80 and the Y coordinate with the formula Y:=Posn div 80 (where Posn is the cursor position on the screen). This can be easily computed using the 80x86 code:

        mov     bl, 80
        mov     ax, Posn
        div     bl              ;Divide AX by the byte in BL.

; X is now in AH, Y is now in AL
However, the div instruction on the 80x86 is very slow. If you need to do this computation for every character you write to the screen, you will seriously degrade the speed of your video display code. The following code, which realizes these two functions via table look up, would improve the performance of your code considerably:

        mov     bx, Posn
        mov     al, YCoord[bx]
        mov     ah, XCoord[bx]
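The two tables used above each need 2,000 one byte entries, which is exactly the kind of data best produced by a program. A minimal Python sketch of how the contents could be generated (X_COORD, Y_COORD, and cursor_to_xy are illustrative names for this model):

```python
COLUMNS = 80
POSITIONS = 80 * 25         # 2,000 character positions on an 80x25 display

# One byte per cursor position, computed once instead of on every look up.
X_COORD = bytes(posn % COLUMNS for posn in range(POSITIONS))
Y_COORD = bytes(posn // COLUMNS for posn in range(POSITIONS))

def cursor_to_xy(posn):
    """Return (X, Y) for a cursor position using only table look ups."""
    return X_COORD[posn], Y_COORD[posn]
```

The division is still performed, but only while the tables are built; the code that runs for every character does two indexed loads.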
If the domain of a function is within 0..255 but the range is outside this set, the look up table will contain 256 or fewer entries but each entry will require two or more bytes. If both the range and domains of the function are outside 0..255, each entry will require two or more bytes and the table will contain more than 256 entries.

Recall from Chapter Four the formula for indexing into a single dimensional array (of which a table is a special case):

        Address := Base + index * size
If elements in the range of the function require two bytes, then the index must be multiplied by two before indexing into the table. Likewise, if each entry requires three, four, or more bytes, the index must be multiplied by the size of each table entry before being used as an index into the table. For example, suppose you have a function, F(x), defined by the following (pseudo) Pascal declaration:

        function F(x:0..999):word;

You can easily create this function using the following 80x86 code (and, of course, the appropriate table):

        mov     bx, X           ;Get function input value and
        shl     bx, 1           ; convert to a word index into F.
        mov     ax, F[bx]
The shl instruction multiplies the index by two, providing the proper index into a table whose elements are words.

Any function whose domain is small and mostly contiguous is a good candidate for computation via table look up. In some cases, non-contiguous domains are acceptable as well, as long as the domain can be coerced into an appropriate set of values. Such operations are called conditioning and are the subject of the next section.
9.8.2
Domain Conditioning

Domain conditioning is taking a set of values in the domain of a function and massaging them so that they are more acceptable as inputs to that function. Consider the following function:

\[ \sin x = \langle\, \sin x \mid x \in [-2\pi, 2\pi] \,\rangle \]

This says that the (computer) function SIN(x) is equivalent to the (mathematical) function sin x where -2π ≤ x ≤ 2π. As we all know, sine is a circular function which will accept any real valued input. The formula used to compute sine, however, only accepts a small set of these values. This range limitation doesn’t present any real problems; by simply computing SIN(X mod (2*pi)) we can compute the sine of any input value. Modifying an input value so that we can easily compute a function is called conditioning the input. In the example above we computed X mod (2*pi) and used the result as the input to the sin function. This truncates X to the domain sin needs without affecting the result.

Input conditioning can be applied to table look ups as well. In fact, scaling the index to handle word entries is a form of input conditioning. Consider the following Pascal function:

        function val(x:word):word;
        begin
            case x of
                0: val := 1;
                1: val := 1;
                2: val := 4;
                3: val := 27;
                4: val := 256;
                otherwise val := 0;
            end;
        end;
This function computes some value for x in the range 0..4 and it returns zero if x is outside this range. Since x can take on 65,536 different values (being a 16 bit word), creating a table containing 65,536 words where only the first five entries are non-zero seems to be quite wasteful. However, we can still compute this function using a table look up if we use input conditioning. The following assembly language code presents this principle: xor mov cmp ja shl mov
ax, ax bx, x bx, 4 ItsZero bx, 1 ax, val[bx]
;AX := 0, assume X > 4.
ItsZero:
This code checks to see if x is outside the range 0..4. If so, it manually sets ax to zero, otherwise it looks up the function value through the val table. With input conditioning, you can implement several functions that would otherwise be impractical to do via table look up.
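To make the conditioning step concrete, here is a minimal sketch of the same idea as a complete program, including the five-entry table the fragment above relies on. The names ValTbl and Result, the sample input, and the simplified segment directives are assumptions for illustration; the text itself only shows the look up sequence.

```asm
; Sketch only: val(x) via a five-entry table plus a range check.
; ValTbl, Result, and the sample input are hypothetical.

                .model  small
                .stack  100h

                .data
X               word    3               ;Sample input value.
Result          word    ?
ValTbl          word    1, 1, 4, 27, 256 ;Entries for x = 0..4.

                .code
Main            proc
                mov     ax, @data       ;Point DS at the data segment.
                mov     ds, ax

                xor     ax, ax          ;AX := 0, assume X > 4.
                mov     bx, X
                cmp     bx, 4           ;Unsigned compare, so any value
                ja      ItsZero         ; above 4 skips the look up.
                shl     bx, 1           ;Convert to a word index.
                mov     ax, ValTbl[bx]  ;AX := ValTbl[X].
ItsZero:        mov     Result, ax      ;Result = 27 when X = 3.

                mov     ah, 4ch         ;Quit to DOS.
                int     21h
Main            endp
                end     Main
```

The range check is what keeps the table at five words: without it, any input above 4 would index past the end of ValTbl and return whatever data happens to follow the table.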
9.8.3
Generating Tables
One big problem with using table look ups is creating the table in the first place. This is particularly true if there are a large number of entries in the table. Figuring out the data to place in the table, then laboriously entering the data, and, finally, checking that data to make sure it is valid, is a very time-consuming and boring process. For many tables, there is no way around this process. For other tables there is a better way: use the computer to generate the table for you. An example is probably the best way to describe this. Consider the following modification to the sine function:

        (sin x) × r = 〈 (r × (1000 × sin x)) / 1000 | x ∈ [0, 359] 〉

This states that x is an integer in the range 0..359 and r is an integer. The computer can easily compute this with the following code:

                mov     bx, X
                shl     bx, 1
                mov     ax, Sines[bx]   ;Get SIN(X)*1000
                mov     bx, R           ;Compute R*(SIN(X)*1000)
                imul    bx              ; (signed, the table holds negative entries).
                mov     bx, 1000        ;Compute (R*(SIN(X)*1000))/1000
                idiv    bx

Note that integer multiplication and division are not associative. You cannot remove the multiplication by 1000 and the division by 1000 just because they seem to cancel one another out. Furthermore, this code must compute this function in exactly this order. All that we need to complete this function is a table containing 360 different values corresponding to the sine of the angle (in degrees) times 1,000. Entering a table into an assembly language program containing such values is extremely boring and you’d probably make several mistakes entering and verifying this data. However, you can have the program generate this table for you. Consider the following Turbo Pascal program:

        program maketable;
        var
            i: integer;
            r: integer;
            f: text;
        begin
            assign(f, 'sines.asm');
            rewrite(f);
            for i := 0 to 359 do begin
                r := round(sin(i * 2.0 * pi / 360.0) * 1000.0);
                if (i mod 8) = 0 then begin
                    writeln(f);
                    write(f, ' dw ', r);
                end
                else write(f, ',', r);
            end;
            close(f);
        end.

This program produces the following output:

        dw 0,17,35,52,70,87,105,122
        dw 139,156,174,191,208,225,242,259
        dw 276,292,309,326,342,358,375,391
        dw 407,423,438,454,469,485,500,515
        dw 530,545,559,574,588,602,616,629
        dw 643,656,669,682,695,707,719,731
        dw 743,755,766,777,788,799,809,819
        dw 829,839,848,857,866,875,883,891
        dw 899,906,914,921,927,934,940,946
        dw 951,956,961,966,970,974,978,982
        dw 985,988,990,993,995,996,998,999
        dw 999,1000,1000,1000,999,999,998,996
        dw 995,993,990,988,985,982,978,974
        dw 970,966,961,956,951,946,940,934
        dw 927,921,914,906,899,891,883,875
        dw 866,857,848,839,829,819,809,799
        dw 788,777,766,755,743,731,719,707
        dw 695,682,669,656,643,629,616,602
        dw 588,574,559,545,530,515,500,485
        dw 469,454,438,423,407,391,375,358
        dw 342,326,309,292,276,259,242,225
        dw 208,191,174,156,139,122,105,87
        dw 70,52,35,17,0,-17,-35,-52
        dw -70,-87,-105,-122,-139,-156,-174,-191
        dw -208,-225,-242,-259,-276,-292,-309,-326
        dw -342,-358,-375,-391,-407,-423,-438,-454
        dw -469,-485,-500,-515,-530,-545,-559,-574
        dw -588,-602,-616,-629,-643,-656,-669,-682
        dw -695,-707,-719,-731,-743,-755,-766,-777
        dw -788,-799,-809,-819,-829,-839,-848,-857
        dw -866,-875,-883,-891,-899,-906,-914,-921
        dw -927,-934,-940,-946,-951,-956,-961,-966
        dw -970,-974,-978,-982,-985,-988,-990,-993
        dw -995,-996,-998,-999,-999,-1000,-1000,-1000
        dw -999,-999,-998,-996,-995,-993,-990,-988
        dw -985,-982,-978,-974,-970,-966,-961,-956
        dw -951,-946,-940,-934,-927,-921,-914,-906
        dw -899,-891,-883,-875,-866,-857,-848,-839
        dw -829,-819,-809,-799,-788,-777,-766,-755
        dw -743,-731,-719,-707,-695,-682,-669,-656
        dw -643,-629,-616,-602,-588,-574,-559,-545
        dw -530,-515,-500,-485,-469,-454,-438,-423
        dw -407,-391,-375,-358,-342,-326,-309,-292
        dw -276,-259,-242,-225,-208,-191,-174,-156
        dw -139,-122,-105,-87,-70,-52,-35,-17
Obviously it’s much easier to write the Turbo Pascal program that generated this data than to enter (and verify) this data by hand. This little example shows how useful Pascal can be to the assembly language programmer!
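One possible way to use the generated file is sketched below. It assumes the Pascal program above has already been run so that sines.asm exists in the current directory; the label directive placed just before the include gives the table its name. The variable values, the Result variable, the simplified segment directives, and the choice of the signed imul and idiv instructions (made here because the table contains negative entries) are assumptions of this sketch rather than something the text prescribes.

```asm
; Sketch only: compute R * sin(X) using the table in sines.asm.
; Assumes sines.asm was produced by the Pascal program above.

                .model  small
                .stack  100h

                .data
X               word    30              ;Sample angle in degrees (0..359).
R               word    500             ;Sample scale factor.
Result          word    ?
Sines           label   word            ;Names the first dw in the file.
                include sines.asm       ;360 words, SIN(angle)*1000.

                .code
Main            proc
                mov     ax, @data       ;Point DS at the data segment.
                mov     ds, ax

                mov     bx, X
                shl     bx, 1           ;Word index into Sines.
                mov     ax, Sines[bx]   ;AX := SIN(X)*1000 (500 for X = 30).
                imul    R               ;DX:AX := R*(SIN(X)*1000).
                mov     bx, 1000
                idiv    bx              ;AX := (R*(SIN(X)*1000))/1000.
                mov     Result, ax      ;Result = 250 for these inputs.

                mov     ah, 4ch         ;Quit to DOS.
                int     21h
Main            endp
                end     Main
```

The multiplication comes first so the full 32-bit product in DX:AX is available to the division. Dividing the table entry by 1000 before multiplying would truncate every entry except ±1000 to zero.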
9.9
Sample Programs
This chapter’s sample programs demonstrate several important concepts including extended precision arithmetic and logical operations, arithmetic expression evaluation, boolean expression evaluation, and packing/unpacking data.
9.9.1
Converting Arithmetic Expressions to Assembly Language
The following sample program (Pgm9_1.asm on the companion CD-ROM) provides some examples of converting arithmetic expressions into assembly language:

; Pgm9_1.ASM
;
; Several examples demonstrating how to convert various
; arithmetic expressions into assembly language.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'

; Arbitrary variables this program uses.

u               word    ?
v               word    ?
w               word    ?
x               word    ?
y               word    ?

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; GETI-Reads an integer variable from the user and returns
; its value in the AX register.

geti            textequ <call _geti>
_geti           proc
                push    es
                push    di

                getsm
                atoi
                free

                pop     di
                pop     es
                ret
_geti           endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                print
                byte    "Arbitrary expression program",cr,lf
                byte    "----------------------------",cr,lf
                byte    lf
                byte    "Enter a value for u: ",0

                geti
                mov     u, ax

                print
                byte    "Enter a value for v: ",0
                geti
                mov     v, ax

                print
                byte    "Enter a value for w: ",0
                geti
                mov     w, ax

                print
                byte    "Enter a non-zero value for x: ",0
                geti
                mov     x, ax

                print
                byte    "Enter a non-zero value for y: ",0
                geti
                mov     y, ax

; Okay, compute Z := (X+Y)*(U+V*W)/X and print the result.

                print
                byte    cr,lf
                byte    "(X+Y) * (U+V*W)/X is ",0

                mov     ax, v           ;Compute V*W
                imul    w               ; and then add in
                add     ax, u           ; U.
                mov     bx, ax          ;Save in a temp location for now.

                mov     ax, x           ;Compute X+Y, multiply this
                add     ax, y           ; sum by the result above,
                imul    bx              ; and then divide the whole
                idiv    x               ; thing by X.

                puti
                putcr

; Compute ((X-Y*U) + (U*V) - W)/(X*Y)

                print
                byte    "((X-Y*U) + (U*V) - W)/(X*Y) = ",0

                mov     ax, y           ;Compute y*u first
                imul    u
                mov     dx, X           ;Now compute X-Y*U
                sub     dx, ax
                mov     cx, dx          ;Save in temp

                mov     ax, u           ;Compute U*V
                imul    V
                add     cx, ax          ;Compute (X-Y*U) + (U*V)

                sub     cx, w           ;Compute ((X-Y*U) + (U*V) - W)

                mov     ax, x           ;Compute (X*Y)
                imul    y

                xchg    ax, cx
                cwd                     ;Compute NUMERATOR/(X*Y)
                idiv    cx

                puti
                putcr

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main

9.9.2
Boolean Operations Example
The following sample program (Pgm9_2.asm on the companion CD-ROM) demonstrates how to manipulate boolean values in assembly language. It also provides an example of DeMorgan’s Theorems in operation.

; Pgm9_2.ASM
;
; This program demonstrates DeMorgan's theorems and
; various other logical computations.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'

; Boolean input variables for the various functions
; we are going to test.

a               byte    0
b               byte    0

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; Get0or1-Reads a "0" or "1" from the user and returns
; its value in the AX register.

get0or1         textequ <call _get0or1>
_get0or1        proc
                push    es
                push    di

                getsm
                atoi
                free

                pop     di
                pop     es
                ret
_get0or1        endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                print
                byte    "Demorgan's Theorems",cr,lf
                byte    "-------------------",cr,lf
                byte    lf
                byte    "According to Demorgan's theorems, all results "
                byte    "between the dashed lines",cr,lf
                byte    "should be equal.",cr,lf
                byte    lf
                byte    "Enter a value for a: ",0

                get0or1
                mov     a, al

                print
                byte    "Enter a value for b: ",0
                get0or1
                mov     b, al

                print
                byte    "---------------------------------",cr,lf
                byte    "Computing not (A and B): ",0

                mov     ah, 0
                mov     al, a
                and     al, b
                xor     al, 1           ;Logical NOT operation.

                puti
                putcr
                print
                byte    "Computing (not A) OR (not B): ",0
                mov     al, a
                xor     al, 1
                mov     bl, b
                xor     bl, 1
                or      al, bl
                puti

                print
                byte    cr,lf
                byte    "---------------------------------",cr,lf
                byte    "Computing (not A) OR B: ",0
                mov     al, a
                xor     al, 1
                or      al, b
                puti

                print
                byte    cr,lf
                byte    "Computing not (A AND (not B)): ",0
                mov     al, b
                xor     al, 1
                and     al, a
                xor     al, 1
                puti

                print
                byte    cr,lf
                byte    "---------------------------------",cr,lf
                byte    "Computing (not A) OR B: ",0
                mov     al, a
                xor     al, 1
                or      al, b
                puti

                print
                byte    cr,lf
                byte    "Computing not (A AND (not B)): ",0
                mov     al, b
                xor     al, 1
                and     al, a
                xor     al, 1
                puti

                print
                byte    cr,lf
                byte    "---------------------------------",cr,lf
                byte    "Computing not (A OR B): ",0
                mov     al, a
                or      al, b
                xor     al, 1
                puti

                print
                byte    cr,lf
                byte    "Computing (not A) AND (not B): ",0
                mov     al, a
                xor     al, 1
                mov     bl, b           ;Load B before inverting it.
                xor     bl, 1
                and     al, bl
                puti

                print
                byte    cr,lf
                byte    "---------------------------------",cr,lf
                byte    0

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main

9.9.3
64-bit Integer I/O
This sample program (Pgm9_3.asm on the companion CD-ROM) shows how to read and write 64-bit integers. It provides the ATOU64 and PUTU64 routines that let you convert a string of digits to a 64-bit unsigned integer and output a 64-bit unsigned integer as a decimal string to the display.

; Pgm9_3.ASM
;
; This sample program provides two procedures that read and write
; 64-bit unsigned integer values on an 80386 or later processor.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

                .386
                option  segment:use16

dp              textequ <dword ptr>
byp             textequ <byte ptr>

dseg            segment para public 'data'

; Acc64 is a 64 bit value that the ATOU64 routine uses to input
; a 64-bit value.

Acc64           qword   0

; Quotient holds the result of dividing the current PUTU value by
; ten.

Quotient        qword   0

; NumOut holds the string of digits created by the PUTU64 routine.

NumOut          byte    32 dup (0)

; A sample test string for the ATOU64 routine:

LongNumber      byte    "123456789012345678",0

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; ATOU64-       On entry, ES:DI point at a string containing a
;               sequence of digits. This routine converts that
;               string to a 64-bit integer and returns that
;               unsigned integer value in EDX:EAX.
;
;               This routine uses the algorithm:
;
;               Acc := 0
;               while digits left
;
;                       Acc := (Acc * 10) + (Current Digit - '0')
;                       Move on to next digit
;
;               endwhile

ATOU64          proc    near
                push    di              ;Save because we modify it.
                mov     dp Acc64, 0     ;Initialize our accumulator.
                mov     dp Acc64+4, 0

; While we've got some decimal digits, process the input string:

WhileDigits:    sub     eax, eax        ;Zero out eax's H.O. 3 bytes.
                mov     al, es:[di]
                xor     al, '0'         ;Translates '0'..'9' -> 0..9
                cmp     al, 10          ; and everything else is > 9.
                jae     NotADigit

; Multiply Acc64 by ten. Use shifts and adds to accomplish this:

                shl     dp Acc64, 1     ;Compute Acc64*2
                rcl     dp Acc64+4, 1

                push    dp Acc64+4      ;Save Acc64*2
                push    dp Acc64

                shl     dp Acc64, 1     ;Compute Acc64*4
                rcl     dp Acc64+4, 1
                shl     dp Acc64, 1     ;Compute Acc64*8
                rcl     dp Acc64+4, 1

                pop     edx             ;Compute Acc64*10 as
                add     dp Acc64, edx   ; Acc64*2 + Acc64*8
                pop     edx
                adc     dp Acc64+4, edx

; Add in the numeric equivalent of the current digit.
; Remember, the H.O. three bytes of eax contain zero.

                add     dp Acc64, eax   ;Add in this digit
                adc     dp Acc64+4, 0   ; and carry into the H.O. dword.

                inc     di              ;Move on to next char.
                jmp     WhileDigits     ;Repeat for all digits.

; Okay, return the 64-bit integer value in edx:eax.

NotADigit:      mov     eax, dp Acc64
                mov     edx, dp Acc64+4
                pop     di
                ret
ATOU64          endp

; PUTU64-       On entry, EDX:EAX contain a 64-bit unsigned value.
;               Output a string of decimal digits providing the
;               decimal representation of that value.
;
;               This code uses the following algorithm:
;
;               di := 30;
;               while edx:eax <> 0 do
;
;                       OutputNumber[di] := digit;
;                       edx:eax := edx:eax div 10
;                       di := di - 1;
;
;               endwhile
;               Output digits from OutNumber[di+1]
;               through OutputNumber[30]

PUTU64          proc
                push    es
                push    eax
                push    ecx
                push    edx
                push    di
                pushf

                mov     di, dseg        ;This is where the output
                mov     es, di          ; string will go.
                lea     di, NumOut+30   ;Store characters in string
                std                     ; backwards.
                mov     byp es:[di+1],0 ;Output zero terminator.

; Save the value to print so we can divide it by ten using an
; extended precision division operation.

                mov     dp Quotient, eax
                mov     dp Quotient+4, edx

; Okay, begin converting the number into a string of digits.

                mov     ecx, 10         ;Value to divide by.
DivideLoop:     mov     eax, dp Quotient+4 ;Do a 64-bit by
                sub     edx, edx        ; 32-bit division
                div     ecx             ; (see the text
                mov     dp Quotient+4, eax ; for details).

                mov     eax, dp Quotient
                div     ecx
                mov     dp Quotient, eax

; At this time edx (dl, actually) contains the remainder of the
; above division by ten, so dl is in the range 0..9. Convert
; this to an ASCII character and save it away.

                mov     al, dl
                or      al, '0'
                stosb

; Now check to see if the result is zero. When it is, we can
; quit.

                mov     eax, dp Quotient
                or      eax, dp Quotient+4
                jnz     DivideLoop

OutputNumber:   inc     di
                puts
                popf
                pop     di
                pop     edx
                pop     ecx
                pop     eax
                pop     es
                ret
PUTU64          endp

; The main program provides a simple test of the two routines
; above.

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                lesi    LongNumber
                call    ATOU64
                call    PutU64

                printf
                byte    cr,lf
                byte    "%x %x %x %x",cr,lf,0
                dword   Acc64+6, Acc64+4, Acc64+2, Acc64

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main

9.9.4
Packing and Unpacking Date Data Types
This sample program demonstrates how to pack and unpack data using the Date data type introduced in Chapter One.

; Pgm9_4.ASM
;
; This program demonstrates how to pack and unpack
; data types. It reads in a month, day, and year value.
; It then packs these values into the format the textbook
; presents in chapter two. Finally, it unpacks this data
; and calls the stdlib DTOA routine to print it as text.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'

Month           byte    ?               ;Holds month value (1-12)
Day             byte    ?               ;Holds day value (1-31)
Year            byte    ?               ;Holds year value (80-99)

Date            word    ?               ;Packed data goes in here.

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; GETI-Reads an integer variable from the user and returns
; its value in the AX register.

geti            textequ <call _geti>
_geti           proc
                push    es
                push    di

                getsm
                atoi
                free

                pop     di
                pop     es
                ret
_geti           endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                print
                byte    "Date Conversion Program",cr,lf
                byte    "-----------------------",cr,lf
                byte    lf,0

; Get the month value from the user.
; Do a simple check to make sure this value is in the range
; 1-12. Make the user reenter the month if it is not.

GetMonth:       print
                byte    "Enter the month (1-12): ",0

                geti
                mov     Month, al
                cmp     ax, 0
                je      BadMonth
                cmp     ax, 12
                jbe     GoodMonth
BadMonth:       print
                byte    "Illegal month value, please re-enter",cr,lf,0
                jmp     GetMonth

GoodMonth:

; Okay, read the day from the user. Again, do a simple
; check to see if the date is valid. Note that this code
; only checks to see if the day value is in the range 1-31.
; It does not check those months that have 28, 29, or 30
; day months.

GetDay:         print
                byte    "Enter the day (1-31): ",0
                geti
                mov     Day, al
                cmp     ax, 0
                je      BadDay
                cmp     ax, 31
                jbe     GoodDay
BadDay:         print
                byte    "Illegal day value, please re-enter",cr,lf,0
                jmp     GetDay

GoodDay:

; Okay, get the year from the user.
; This check is slightly more sophisticated. If the user
; enters a year in the range 1980-1999, it will automatically
; convert it to 80-99. All other dates outside the range
; 80-99 are illegal.

GetYear:        print
                byte    "Enter the year (80-99): ",0
                geti
                cmp     ax, 1980
                jb      TestYear
                cmp     ax, 1999
                ja      BadYear

                sub     dx, dx          ;Zero extend year to 32 bits.
                mov     bx, 100
                div     bx              ;Compute year mod 100.
                mov     ax, dx
                jmp     GoodYear

TestYear:       cmp     ax, 80
                jb      BadYear
                cmp     ax, 99
                jbe     GoodYear

BadYear:        print
                byte    "Illegal year value.  Please re-enter",cr,lf,0
                jmp     GetYear

GoodYear:       mov     Year, al

; Okay, take these input values and pack them into the following
; 16-bit format:
;
;       bit 15      8 7       0
;           |       | |       |
;           MMMMDDDD DYYYYYYY

                mov     ah, 0
                mov     bh, ah
                mov     al, Month       ;Put Month into bit positions
                mov     cl, 4           ; 12..15
                ror     ax, cl

                mov     bl, Day         ;Put Day into bit positions
                mov     cl, 7           ; 7..11.
                shl     bx, cl

                or      ax, bx          ;Create MMMMDDDD D0000000
                or      al, Year        ;Create MMMMDDDD DYYYYYYY
                mov     Date, ax        ;Save away packed date.

; Print out the packed date (in hex):

                print
                byte    "Packed date = ",0
                putw
                putcr

; Okay, the following code demonstrates how to unpack this date
; and put it in a form the standard library's LDTOAM routine can
; use.

                mov     ax, Date        ;First, extract Month
                mov     cl, 4
                shr     ah, cl
                mov     dh, ah          ;LDTOAM needs month in DH.

                mov     ax, Date        ;Next get the day.
                shl     ax, 1
                and     ah, 11111b
                mov     dl, ah          ;Day needs to be in DL.

                mov     cx, Date        ;Now process the year.
                and     cx, 7fh         ;Strip all but year bits.

                print
                byte    "Date: ",0
                LDTOAM                  ;Convert to a string
                puts
                free
                putcr

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main

9.10
Laboratory Exercises
In this laboratory you will perform the following activities:
• Use CodeView to set breakpoints within a program and locate some errors.
• Use CodeView to trace through sections of a program to discover problems with that program.
• Use CodeView to trace through some code you write to verify correctness and observe the calculation one step at a time.
9.10.1
Debugging Programs with CodeView
In past chapters of this lab manual you’ve had the opportunity to use CodeView to view the machine state (register and memory values), enter simple assembly language programs, and perform other minor tasks. In this section we will explore one of CodeView’s most important capabilities: helping you locate problems within your code. This section discusses three features of CodeView we have ignored up to this point: breakpoints, watch operations, and code tracing. These features provide some very important tools for figuring out what is wrong with your assembly language programs.
Code tracing is a feature CodeView provides that lets you execute assembly language statements one at a time and observe the results. Many programmers refer to this operation as single stepping because it lets you step through the program one statement per operation. Ultimately, though, the real purpose of single stepping is to let you observe the results of a sequence of instructions, noting all side effects, so you can see why that sequence is not producing desired results.
CodeView provides two easy to use trace/single step commands. Pressing F8 traces through one instruction. CodeView will update all affected registers and memory locations and halt on the very next instruction. In the event the current instruction is a call, int, or other transfer of control instruction, CodeView transfers control to the target location and displays the instruction at that location.
The second CodeView command for single stepping is the step command. You can execute the step command by pressing F10. The step command executes the current statement and stops when control reaches the statement immediately following it in the program. For most instructions the step and trace commands do the same thing. However, for instructions that transfer control, the trace command follows the flow of control while the step command allows the CPU to run at full speed until returning back to the next instruction. This, for example, lets you quickly execute a subroutine without having to step through all the instructions in that subroutine. You should use the trace command (F8) for most debugging purposes and only use the step command (F10) on call and int instructions. The step command may have some unintended effects on other transfer of control instructions like loop and the conditional branches.
The CodeView command window also provides two commands to trace or single step through an instruction. The “T” command traces through an instruction; the “P” command steps over an instruction.
One major problem with tracing through your program is that it is very slow. Even if you hold the F8 key down and let it autorepeat, you’d only be executing 10-20 instructions per second. This is a million (or more) times slower than a typical high-end PC. If the program executes several thousand instructions before even getting to the point where you suspect the bug will be, you would have to execute far too many trace operations to get to that point.
A breakpoint is a point in your program where control returns to the debugger. This is the facility that lets you run a program at full speed up to a specific point (the breakpoint) in your program. Breakpoints are, perhaps, the most important tool for locating errors in a machine language program. Since they are so useful, it is not surprising to find that CodeView provides a very rich set of breakpoint manipulation commands. There are three keystroke commands that let you run your program at full speed and set breakpoints.
The F5 command (run) begins full speed execution of your program at CS:IP. If you do not have any breakpoints set, your program will run to completion. If you are interested in stopping your program at some point you should set a breakpoint before executing this command. Pressing F5 produces the same result as the “G” (go) command in the command window. The Go command is a little more powerful, however, because it lets you specify a non-sticky breakpoint at the same time. The command window Go commands take the following forms:

        G
        G breakpoint_address

The F7 keystroke executes at full speed up to the instruction the cursor is on. This sets a non-sticky breakpoint. To use this command you must first place the cursor on an instruction in the source window and then press the F7 key. CodeView will set a breakpoint at the specified instruction and start the program running at full speed until it hits a breakpoint.
A non-sticky breakpoint is one that deactivates whenever control returns back to CodeView. Once CodeView regains control it clears all non-sticky breakpoints. You will have to reset those breakpoints if you still need to stop at that point in your program. Note that CodeView clears the non-sticky breakpoints even if the program stops for some reason other than execution of those non-sticky breakpoints.
One very important thing to keep in mind, especially when using the F7 command to set non-sticky breakpoints, is that you must execute the statement on which the breakpoint was set for the breakpoint to have any effect. If your program skips over the instruction on which you’ve set the breakpoint, you might not return to CodeView except via program termination. When choosing a point for a breakpoint, you should always pick a sequence point. A sequence point is some spot in your program to which all execution paths converge. If you cannot set a breakpoint at a sequence point, and you are not sure the code will execute the statement holding a single breakpoint, you should set several breakpoints in your program.
The easiest way to set a sticky breakpoint is to move the cursor to the desired statement in the CodeView source window and press F9. This will brighten that statement to show that there is a breakpoint set on that instruction. Note that the F9 key only works on 80x86 machine instructions. You cannot use it on blank lines, comments, assembler directives, or pseudo-opcodes.
CodeView’s command window also provides several commands to manipulate breakpoints including BC (Breakpoint Clear), BD (Breakpoint Disable), BE (Breakpoint Enable), BL (Breakpoint List), and BP (BreakPoint set). These commands are very powerful and let you set breakpoints on memory modification, expression evaluation, apply counters to breakpoints, and more. See the MASM “Environment and Tools” manual or the CodeView on-line help for more information about these commands.
Another useful debugging tool in CodeView is the Watch Window. The watch window displays the values of some specified expressions during program execution. One important use of the watch window is to display the contents of selected variables while your program executes. Upon encountering a breakpoint, CodeView automatically updates all watch expressions. You can add a watch expression to the watch window using the DATA:Add Watch menu item. This opens up a dialog box that looks like the following:
By typing a variable name (like Counter above) you can add a watch item to the watch window. By opening the watch windows (from the Windows menu item) you can view the values of any watch expressions you’ve created. Watch expressions are quite useful because they let you observe how your program affects the values of variables throughout your code. If you place several variable names in the watch list you can execute a section of code up to a break point and observe how that code affected certain variables.
9.10.2
Debugging Strategies
Learning how to effectively use a debugger to locate problems in your machine language programs is not something you can learn from a book. Alas, there is a bit of a learning curve to using a debugger like CodeView and learning the necessary techniques to quickly locate the source of an error within a program. For this reason all too many students fall back to debugging techniques they learned in their first or second quarter of programming, namely sticking a bunch of print statements throughout their code. You should not make this mistake. The time you spend learning how to properly use CodeView will pay off very quickly.
9.10.2.1 Locating Infinite Loops
Infinite loops are a very common problem in many programs. You start a program running and the whole machine locks up on you. How do you deal with this? Well, the first thing to do is to load your program into CodeView. Once you start your program running and it appears to be in an infinite loop, you can manually break the program by pressing the SysReq or Ctrl-Break key. This generally forces control back to CodeView. If you are currently executing in a small loop, you can use the trace command to step through the loop and figure out why it does not terminate.
Another way to catch an infinite loop is to use a binary search. To use this technique, place a breakpoint in the middle of your program (or in the middle of the code you wish to test). Start the program running. If it hangs up, the infinite loop is before the breakpoint. If you execute the breakpoint, then the infinite loop occurs after the breakpoint [3]. Once you determine which half of your program contains the infinite loop, the next step is to place another breakpoint half way into that part of the program. If the infinite loop occurred before the breakpoint in the middle of the program, then you should set a new breakpoint one quarter of the way into the program, that is, halfway between the beginning of the program and the original breakpoint. If you got to the original breakpoint without encountering the infinite loop, then set a new breakpoint at the three-quarters point in your program, i.e., halfway between the original breakpoint and the end of your program.
Run the program from the beginning again (you can use the CodeView command window command “L” to restart the program from the beginning). If you do not hit any of the three breakpoints you know that the infinite loop is in the first 25% of the program. Otherwise, the current breakpoints at the 25%, 50%, and 75% points in the program will effectively limit the source of the infinite loop to a smaller section of your program. You can repeat this step over and over again until you pinpoint the section of your program containing the infinite loop.
Of course, you should not place a breakpoint within a loop when searching for an infinite loop. Otherwise CodeView will break on each iteration of the loop and it will take you much longer to find the error. If the infinite loop occurs inside some other loop you will eventually need to place breakpoints inside a loop, but hopefully you will find the infinite loop on the first execution of the outside loop. If you do need to place a breakpoint inside a loop that must execute several times before you really want the break to occur, you can attach a counter to a breakpoint that counts down from some value before actually breaking. See the MASM Environment and Tools manual, or use CodeView’s on-line help facility, to get more details on breakpoint counters.
9.10.2.2 Incorrect Computations
Another common problem is that you get the wrong result after performing a sequence of arithmetic and logical computations. You can look at a section of code all day long and still not see the problem, but if you trace through the code, the incorrect code becomes quite obvious. If you think that a particular computation is not producing a correct result you should set a breakpoint at the first instruction of the computation and run the program at full speed up to that point. Be sure to check the values of all variables and registers used in the computation. All too often a bad computation is the result of bad input values, which means the incorrect computation is elsewhere in your program. Once you have verified that the input values are correct, you can begin tracing the instructions of the computation one at a time. After each instruction executes you should compare the results you actually obtain against those you expected to obtain.
The main thing to keep in mind when trying to determine why your program is producing incorrect results is that the source of the error could be somewhere other than the point where you first notice the error. This is why you should always check the input register and variable values before tracing through a section of code. If you find that the input values are not correct, then the problem lies elsewhere in your program and you will have to search elsewhere.
3. Of course, you must make sure that the instruction on which you set the break point is a sequence point. If the code can jump over your breakpoint into the second half of the program, you have proven nothing.
9.10.2.3 Illegal Instructions/Infinite Loops Part II
Sometimes when your program hangs up it is not due to the execution of an infinite loop, but rather you’ve executed an opcode that is not a valid machine instruction. Other times you will press the SysReq key only to find you are executing code that is nowhere near your program, perhaps out in the middle of RAM and executing some really weird instructions. Most of the time this is due to a stack problem or executing some indirect jump. The best strategy here is to open a memory window and dump some memory around the stack pointer (SS:SP). Try to locate a reasonable return address on the top of stack (or shortly thereafter if there are many values pushed on the stack) and disassemble that code. Somewhere before the return address is probably a call. You should set a breakpoint at that location and begin single stepping into the routine, watching what happens on all indirect jumps and returns. Pay close attention to the stack during all this.
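As a hedged illustration of the kind of stack problem described above, the following sketch shows a small routine whose push and pop are balanced, with a comment marking the instruction whose omission would send ret to a bogus address. The routine name AddOne, the variable Total, and the simplified segment directives are hypothetical and exist only for this example.

```asm
; Sketch only: a balanced push/pop pair inside a near procedure.
; AddOne and Total are hypothetical names for this example.

                .model  small
                .stack  100h

                .data
Total           word    0

                .code

; AddOne- Adds one to Total. It preserves AX by pushing it on
;         entry and popping it again just before the RET.

AddOne          proc    near
                push    ax              ;Top of stack: saved AX, then
                                        ; the return address.
                mov     ax, Total
                inc     ax
                mov     Total, ax
                pop     ax              ;Without this POP, RET would load
                ret                     ; IP from the saved AX value.
AddOne          endp

Main            proc
                mov     ax, @data       ;Point DS at the data segment.
                mov     ds, ax

                call    AddOne          ;Pushes the return address.

                mov     ah, 4ch         ;Quit to DOS.
                int     21h
Main            endp
                end     Main
```

When tracing a routine like this, comparing the word at SS:SP just before the ret against the address of the instruction following the call is a quick way to confirm the stack is still balanced.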
9.10.3 Debug Exercise I: Using CodeView to Find Bugs in a Calculation

Exercise 1: Running CodeView. The following program contains several bugs (noted in the comments). Enter this program into the system (note, this code is available as the file Ex9_1.asm on the companion CD-ROM):

dseg            segment para public 'data'

I               word    0
J               word    0
K               word    0

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; This program is useful for debugging purposes only!
; The intent is to execute this code from inside CodeView.
;
; This program is riddled with bugs. The bugs are very obvious
; in this short code sequence, within a larger program these
; bugs might not be quite so obvious.

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax

; The following loop increments I until it reaches 10

ForILoop:       inc     I
                cmp     I, 10
                jb      ForILoop

; This loop is supposed to do the same thing as the loop
; above, but we forgot to reinitialize I back to zero.
; What happens?

ForILoop2:      inc     I
                cmp     I, 10
                jb      ForILoop2

; The following loop, once again, attempts to do the same
; thing as the first for loop above. However, this time we
; remembered to reinitialize I. Alas, there is another
; problem with this code, a typo that the assembler cannot
; catch.

                mov     I, 0
ForILoop3:      inc     I
                cmp     I, 10
                jb      ForILoop        ;<<<-- Whoops! Typo.

; The following loop adds I to J until J reaches 100.
; Unfortunately, the author of this code must have been
; confused and thought that AX contained the sum
; accumulating in J. It compares AX against 100 when
; it should really be comparing J against 100.

WhileJLoop:     mov     ax, I
                add     J, ax
                cmp     ax, 100         ;This is a bug!
                jb      WhileJLoop

                mov     ah, 4ch         ;Quit to DOS.
                int     21h
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
Assemble this program with the command:

        ML /Zi Ex9_1.asm

The “/Zi” option instructs MASM to include debugging information for CodeView in the .EXE file. Note that the “Z” must be uppercase and the “i” must be lowercase. Load this into CodeView using the command:

        CV Ex9_1
Your display should now look something like the following:
Note that CodeView highlights the instruction it will execute next (mov ax, dseg in the above code). Try out the trace command by pressing the F10 key three times. This should leave the inc I instruction highlighted. Step through the loop and note all the major changes that take place on each iteration (note: remember jb=jc so be sure to note the value of the carry flag on each iteration as well).

For your lab report: Discuss the results in your lab manual. Also note the final value of I after completing the loop.

Part 2: Locating a bug. The second loop in the program contains a major bug. The programmer forgot to reset I back to zero before executing the code starting at label ForILoop2. Trace through this loop until it falls through to the statement at label ForILoop3.

For your lab report: Describe what went wrong and how pressing the F8 key would help you locate this problem.

Part 3: Locating another bug. The third loop contains a typo that causes it to restart at label ForILoop. Trace through this code using the F8 key.

For your lab report: Describe the process of tracking this problem down and provide a description of how you could use the trace command to catch this sort of problem.

Part 4: Verifying correctness. Program Ex9_2.asm is a corrected version of the above program. Single step through that code and verify that it works correctly.

For your lab report: Describe the differences between the two debugging sessions in your lab manual.

Part 5: Using Ex9_2.asm, open a watch window and add the watch expression “I” to that window. Set sticky breakpoints on the three jb instructions in the program. Run the program using the Go command and comment on what happens in the Watch window at each breakpoint.

For your lab report: Describe how you could use the watch window to help you locate a problem in your programs.
9.10.4 Software Delay Loop Exercises

Part 1: Software delay loops. The Ex9_3.asm file contains a short software-based delay loop. Run this program and determine the value for the loop control variable that will cause a delay of 11 seconds. Note: the current value was chosen for a 66 MHz 80486 system; if you have a slower system you may want to reduce this value, and if you have a faster system you will want to increase it. Adjust the value to get the delay as close to 11 seconds as you can on your PC.

For your lab report: Provide the constant for your particular system that produces a delay of 11 seconds. Discuss how to create a delay of 1, 10, 20, 30, or 60 seconds using this code.

For additional credit: After getting the delay loop to run for 11 seconds on your PC, take the executable around to different systems with different CPUs and different clock speeds. Run the program and measure the delay. Describe the differences in your lab report.

Part 2: Hardware determined software delay loop. The Ex9_4.asm file contains a software delay loop that automatically determines the number of loop iterations by observing the BIOS real time clock variable. Run this software and observe the results.

For your lab report: Determine the loop iteration count and include this value in your lab manual. If your PC has a turbo switch on it, set it to “non-turbo” mode when requested by the program. Measure the actual delay as accurately as you can with the turbo switch in turbo and in non-turbo mode. Include these timings in your lab report.

For additional credit: Take the executable file around to different systems with different CPUs and different clock speeds. Run the program and measure the delays. Describe the differences in your lab report.
9.11 Programming Projects
9.12 Summary

This chapter discussed arithmetic and logical operations on 80x86 CPUs. It presented the instructions and techniques necessary to perform integer arithmetic in a fashion similar to high level languages. This chapter also discussed multiprecision operations, how to perform arithmetic operations using non-arithmetic instructions, and how to use arithmetic instructions to perform non-arithmetic operations.

Arithmetic expressions are much simpler in a high-level language than in assembly language. Indeed, the original purpose of the FORTRAN programming language was to provide a FORMula TRANslator for arithmetic expressions. Although it takes a little more effort to convert an arithmetic formula to assembly language than it does to, say, Pascal, as long as you follow some very simple rules the conversion is not hard. For a step-by-step description, see

• “Arithmetic Expressions” on page 460
• “Simple Assignments” on page 460
• “Simple Expressions” on page 460
• “Complex Expressions” on page 462
• “Commutative Operators” on page 466
• “Logical (Boolean) Expressions” on page 467
One big advantage to assembly language is that it is easy to perform nearly unlimited precision arithmetic and logical operations. This chapter describes how to do extended precision operations for most of the common operations. For complete instructions, see

• “Multiprecision Operations” on page 470
• “Multiprecision Addition Operations” on page 470
• “Multiprecision Subtraction Operations” on page 472
• “Extended Precision Comparisons” on page 473
• “Extended Precision Multiplication” on page 475
• “Extended Precision Division” on page 477
• “Extended Precision NEG Operations” on page 480
• “Extended Precision AND Operations” on page 481
• “Extended Precision OR Operations” on page 482
• “Extended Precision NOT Operations” on page 482
• “Extended Precision Shift Operations” on page 482
• “Extended Precision Rotate Operations” on page 484
At certain times you may need to operate on two operands that are different types. For example, you may need to add a byte value to a word value. The general idea is to extend the smaller operand so that it is the same size as the larger operand and then compute the result on these like-sized operands. For all the details, see

• “Operating on Different Sized Operands” on page 485
Although the 80x86 instruction set provides straight-forward ways to accomplish many tasks, you can often take advantage of various idioms in the instruction set or with respect to certain arithmetic operations to produce code that is faster or shorter than the obvious way. This chapter introduces a few of these idioms. To see some examples, check out

• “Machine and Arithmetic Idioms” on page 486
• “Multiplying Without MUL and IMUL” on page 487
• “Division Without DIV and IDIV” on page 488
• “Using AND to Compute Remainders” on page 488
• “Implementing Modulo-n Counters with AND” on page 489
• “Testing an Extended Precision Value for 0FFFF..FFh” on page 489
• “TEST Operations” on page 489
• “Testing Signs with the XOR Instruction” on page 490
To manipulate packed data you need the ability to extract a field from a packed record and insert a field into a packed record. You can use the logical and and or instructions to mask the fields you want to manipulate; you can use the shl and shr instructions to position the data to their appropriate positions before inserting or after extracting data. To learn how to pack and unpack data, see

• “Masking Operations” on page 490
• “Masking Operations with the AND Instruction” on page 490
• “Masking Operations with the OR Instruction” on page 491
• “Packing and Unpacking Data Types” on page 491
9.13 Questions

1) Describe how you might go about adding an unsigned word to an unsigned byte variable producing a byte result. Explain any error conditions and how to check for them.

2) Answer question one for signed values.

3) Assume that var1 is a word and var2 and var3 are double words. What is the 80x86 assembly language code that will add var1 to var2 leaving the sum in var3 if:
   a) var1, var2, and var3 are unsigned values.
   b) var1, var2, and var3 are signed values.

4) “ADD BX, 4” is more efficient than “LEA BX, 4[BX]”. Give an example of an LEA instruction which is more efficient than the corresponding ADD instruction.

5) Provide the single 80386 LEA instruction that will multiply EAX by five.

6) Assume that VAR1 and VAR2 are 32 bit variables declared with the DWORD pseudo-opcode. Write code sequences that will test the following (unsigned and signed versions for each of these):
   a) VAR1 = VAR2
   b) VAR1 <> VAR2
   c) VAR1 < VAR2
   d) VAR1 <= VAR2
   e) VAR1 > VAR2
   f) VAR1 >= VAR2

7) Convert the following expressions into assembly language code employing shifts, additions, and subtractions in place of the multiplication:
   a) AX*15
   b) AX*129
   c) AX*1024
   d) AX*20000

8) What’s the best way to divide the AX register by the following constants?
   a) 8
   b) 255
   c) 1024
   d) 45
9) Describe how you could multiply an eight bit value in AL by 256 (leaving the result in AX) using nothing more than two MOV instructions.

10) How could you logically AND the value in AX by 0FFh using nothing more than a MOV instruction?

11) Suppose that the AX register contains a pair of packed binary values with the L.O. four bits containing a value in the range 0..15 and the H.O. 12 bits containing a value in the range 0..4095. Now suppose you want to see if the 12 bit portion contains the value 295. Explain how you could accomplish this with two instructions.

12) How could you use the TEST instruction (or a sequence of TEST instructions) to see if bits zero and four in the AL register are both set to one? How would the TEST instruction be used to see if either bit is set? How could the TEST instruction be used to see if neither bit is set?
13) Why can’t the CL register be used as a count operand when shifting multi-precision operands? I.e., why won’t the following instructions shift the value in (DX,AX) three bits to the left?

                mov     cl, 3
                shl     ax, cl
                rcl     dx, cl

14) Provide instruction sequences that perform an extended precision (32 bit) ROL and ROR operation using only 8086 instructions.

15) Provide an instruction sequence that implements a 64 bit ROR operation using the 80386 SHRD and BT instructions.

16) Provide the 80386 code to perform the following 64 bit computations. Assume you are computing X := Y op Z with X, Y, and Z defined as follows:

X               dword   0, 0
Y               dword   1, 2
Z               dword   3, 4

   a) addition
   b) subtraction
   c) multiplication
   d) Logical AND
   e) Logical OR
   f) Logical XOR
   g) negate
   h) Logical NOT
Control Structures
Chapter 10
A computer program typically contains three structures: instruction sequences, decisions, and loops. A sequence is a set of sequentially executing instructions. A decision is a branch (goto) within a program based upon some condition. A loop is a sequence of instructions that will be repeatedly executed based on some condition. In this chapter we will explore some of the common decision structures in 80x86 assembly language.
10.0 Chapter Overview

This chapter discusses the two primary types of control structures: decision and iteration. It describes how to convert high level language statements like if..then..else, case (switch), while, for etc., into equivalent assembly language sequences. This chapter also discusses techniques you can use to improve the performance of these control structures. The sections below that have a “•” prefix are essential. Those sections with a “❏” discuss advanced topics that you may want to put off for a while.

• Introduction to Decisions.
• IF..THEN..ELSE Sequences.
• CASE Statements.
❏ State machines and indirect jumps.
• Spaghetti code.
• Loops.
• WHILE Loops.
• REPEAT..UNTIL loops.
• LOOP..ENDLOOP.
• FOR Loops.
• Register usage and loops.
❏ Performance improvements.
❏ Moving the termination condition to the end of a loop.
❏ Executing the loop backwards.
❏ Loop invariants.
❏ Unraveling loops.
❏ Induction variables.

10.1 Introduction to Decisions

In its most basic form, a decision is some sort of branch within the code that switches between two possible execution paths based on some condition. Normally (though not always), conditional instruction sequences are implemented with the conditional jump instructions. Conditional instructions correspond to the if..then..else statement in Pascal:

        IF (condition is true) THEN stmt1 ELSE stmt2 ;
Assembly language, as usual, offers much more flexibility when dealing with conditional statements. Consider the following Pascal statement:

        IF ((X < Y) and (Z > T)) or (A <> B) THEN stmt1;

A “brute force” approach to converting this statement into assembly language might produce:

                mov     cl, 1           ;Assume true
                mov     ax, X
                cmp     ax, Y
                jl      IsTrue
                mov     cl, 0           ;This one's false
IsTrue:         mov     ax, Z
                cmp     ax, T
                jg      AndTrue
                mov     cl, 0           ;It's false now
AndTrue:        mov     al, A
                cmp     al, B
                je      OrFalse
                mov     cl, 1           ;It's true if A <> B
OrFalse:        cmp     cl, 1
                jne     SkipStmt1
                ; <code for stmt1 goes here>
SkipStmt1:
As you can see, it takes a considerable number of conditional statements just to process the expression in the example above. This roughly corresponds to the (equivalent) Pascal statements:

        cl := true;
        IF (X >= Y) then cl := false;
        IF (Z <= T) then cl := false;
        IF (A <> B) THEN cl := true;
        IF (CL = true) then stmt1;
Now compare this with the following “improved” code:

                mov     ax, A
                cmp     ax, B
                jne     DoStmt
                mov     ax, X
                cmp     ax, Y
                jnl     SkipStmt
                mov     ax, Z
                cmp     ax, T
                jng     SkipStmt
DoStmt:
                <code for Stmt1 here>
SkipStmt:
Two things should be apparent from the code sequences above: first, a single conditional statement in Pascal may require several conditional jumps in assembly language; second, organization of complex expressions in a conditional sequence can affect the efficiency of the code. Therefore, care should be exercised when dealing with conditional sequences in assembly language. Conditional statements may be broken down into three basic categories: if..then..else statements, case statements, and indirect jumps. The following sections will describe these program structures, how to use them, and how to write them in assembly language.
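The difference between the two code sequences above can be modeled in C. The following is only a sketch, not part of the original listings: the function names brute_force and reordered and the compares counter are hypothetical, introduced here to count how many comparisons each form performs for the expression ((X < Y) and (Z > T)) or (A <> B).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Brute-force form: keep a flag (CL in the assembly code) and always
 * perform all three comparisons.  *compares counts the comparisons. */
bool brute_force(int x, int y, int z, int t, int a, int b, int *compares)
{
    assert(compares != NULL);
    bool flag = true;                 /* mov cl, 1   ;Assume true     */
    ++*compares;
    if (x >= y) flag = false;         /* X < Y failed                 */
    ++*compares;
    if (z <= t) flag = false;         /* Z > T failed                 */
    ++*compares;
    if (a != b) flag = true;          /* the OR term overrides        */
    return flag;
}

/* "Improved" form: test A <> B first and leave as soon as the
 * outcome is known, just as the conditional jumps do. */
bool reordered(int x, int y, int z, int t, int a, int b, int *compares)
{
    assert(compares != NULL);
    ++*compares;
    if (a != b) return true;          /* jne DoStmt                   */
    ++*compares;
    if (!(x < y)) return false;       /* jnl SkipStmt                 */
    ++*compares;
    if (!(z > t)) return false;       /* jng SkipStmt                 */
    return true;                      /* falls into DoStmt            */
}
```

Both functions return the same truth value for any inputs; the reordered form simply stops comparing once the answer is settled, which is where the savings in the assembly version come from.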
10.2 IF..THEN..ELSE Sequences

The most commonly used conditional statement is the if..then or if..then..else statement. These two statements take the form shown in Figure 10.1.

[Figure 10.1 IF..THEN and IF..THEN..ELSE Statement Flow. IF..THEN: test for some condition; execute a block of statements if the condition is true; continue execution after the completion of the THEN block or if skipping the THEN block. IF..THEN..ELSE: test for some condition; execute one block of statements if the condition is true and another if it is false; continue execution after the completion of the THEN or ELSE blocks.]

The if..then statement is just a special case of the if..then..else statement (with an empty ELSE block). Therefore, we’ll only consider the more general if..then..else form. The basic implementation of an if..then..else statement in 80x86 assembly language looks something like this:

        {Sequence of statements to test some condition}
                Jcc     ElseCode
        {Sequence of statements corresponding to the THEN block}
                jmp     EndOfIF
ElseCode:
        {Sequence of statements corresponding to the ELSE block}
EndOfIF:

Note: Jcc represents some conditional jump instruction. For example, to convert the Pascal statement:

        IF (a=b) then c := d else b := b + 1;

to assembly language, you could use the following 80x86 code:

                mov     ax, a
                cmp     ax, b
                jne     ElseBlk
                mov     ax, d
                mov     c, ax
                jmp     EndOfIf
ElseBlk:
                inc     b
EndOfIf:
For simple expressions like (A=B) generating the proper code for an if..then..else statement is almost trivial. Should the expression become more complex, the associated assembly language code complexity increases as well. Consider an if statement similar to the one presented earlier:

        IF ((X > Y) and (Z < T)) or (A<>B) THEN C := D;
When processing complex if statements such as this one, you’ll find the conversion task easier if you break this if statement into a sequence of three different if statements as follows:

        IF (A<>B) THEN C := D;
        IF (X > Y) THEN IF (Z < T) THEN C := D;

This conversion comes from the following Pascal equivalences:

        IF (expr1 AND expr2) THEN stmt;

is equivalent to

        IF (expr1) THEN IF (expr2) THEN stmt;

and

        IF (expr1 OR expr2) THEN stmt;

is equivalent to

        IF (expr1) THEN stmt;
        IF (expr2) THEN stmt;

The second equivalence holds as long as executing stmt twice is harmless (as with C := D) and stmt does not change expr2; otherwise, write the second IF as the ELSE clause of the first.

In assembly language, the former if statement becomes:

                mov     ax, A
                cmp     ax, B
                jne     DoIF
                mov     ax, X
                cmp     ax, Y
                jng     EndOfIF
                mov     ax, Z
                cmp     ax, T
                jnl     EndOfIF
DoIF:
                mov     ax, D
                mov     C, ax
EndOfIF:
As you can probably tell, the code necessary to test a condition can easily become more complex than the statements appearing in the else and then blocks. Although it seems somewhat paradoxical that it may take more effort to test a condition than to act upon the results of that condition, it happens all the time. Therefore, you should be prepared for this situation.

Probably the biggest problem with the implementation of complex conditional statements in assembly language is trying to figure out what you’ve done after you’ve written the code. Probably the biggest advantage high level languages offer over assembly language is that expressions are much easier to read and comprehend in a high level language. The HLL version is self-documenting whereas assembly language tends to hide the true nature of the code. Therefore, well-written comments are an essential ingredient to assembly language implementations of if..then..else statements. An elegant implementation of the example above is:

; IF ((X > Y) AND (Z < T)) OR (A <> B) THEN C := D;
; Implemented as:
; IF (A <> B) THEN GOTO DoIF;

                mov     ax, A
                cmp     ax, B
                jne     DoIF

; IF NOT (X > Y) THEN GOTO EndOfIF;

                mov     ax, X
                cmp     ax, Y
                jng     EndOfIF

; IF NOT (Z < T) THEN GOTO EndOfIF;

                mov     ax, Z
                cmp     ax, T
                jnl     EndOfIF

; THEN Block:

DoIF:           mov     ax, D
                mov     C, ax

; End of IF statement

EndOfIF:
Admittedly, this appears to be going overboard for such a simple example. The following would probably suffice:

; IF ((X > Y) AND (Z < T)) OR (A <> B) THEN C := D;
; Test the boolean expression:

                mov     ax, A
                cmp     ax, B
                jne     DoIF
                mov     ax, X
                cmp     ax, Y
                jng     EndOfIF
                mov     ax, Z
                cmp     ax, T
                jnl     EndOfIF

; THEN Block:

DoIF:           mov     ax, D
                mov     C, ax

; End of IF statement

EndOfIF:
However, as your if statements become complex, the density (and quality) of your comments become more and more important.
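One way to convince yourself that a goto-style translation like the ones above still matches the original expression is to write the same jumps in C and compare the result against the structured form. The sketch below does that; the function name if_with_gotos and its parameter list are hypothetical, and C's goto stands in for the conditional jumps.

```c
#include <assert.h>

/* Goto-for-jump rendering of
 *     IF ((X > Y) AND (Z < T)) OR (A <> B) THEN C := D;
 * Returns the value C holds after the statement. */
int if_with_gotos(int x, int y, int z, int t, int a, int b, int c, int d)
{
    const int c_before = c;

    if (a != b)   goto DoIF;          /* jne DoIF                     */
    if (!(x > y)) goto EndOfIF;       /* jng EndOfIF                  */
    if (!(z < t)) goto EndOfIF;       /* jnl EndOfIF                  */
DoIF:
    c = d;                            /* THEN block                   */
EndOfIF:
    /* Cross-check against the structured form of the same test. */
    assert(c == ((((x > y) && (z < t)) || (a != b)) ? d : c_before));
    return c;
}
```

The assertion at the end plays the role of the comments in the assembly listing: it records, in one readable line, what the three jumps are supposed to accomplish.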
10.3 CASE Statements

The Pascal case statement takes the following form:

        CASE variable OF
                const1: stmt1;
                const2: stmt2;
                  .
                  .
                  .
                constn: stmtn
        END;

When this statement executes, it checks the value of variable against the constants const1 … constn. If a match is found then the corresponding statement executes. Standard Pascal places a few restrictions on the case statement. First, if the value of variable isn’t in the list of constants, the result of the case statement is undefined. Second, all the constants appearing as case labels must be unique. The reason for these restrictions will become clear in a moment.

Most introductory programming texts introduce the case statement by explaining it as a sequence of if..then..else statements. They might claim that the following two pieces of Pascal code are equivalent:

        CASE I OF
                0: WriteLn('I=0');
                1: WriteLn('I=1');
                2: WriteLn('I=2');
        END;

        IF I = 0 THEN WriteLn('I=0')
        ELSE IF I = 1 THEN WriteLn('I=1')
        ELSE IF I = 2 THEN WriteLn('I=2');
While semantically these two code segments may be the same, their implementation is usually different[1]. Whereas the if..then..else if chain does a comparison for each conditional statement in the sequence, the case statement normally uses an indirect jump to transfer control to any one of several statements with a single computation. Consider the two examples presented above; they could be written in assembly language with the following code:

                mov     bx, I
                shl     bx, 1           ;Multiply BX by two
                jmp     cs:JmpTbl[bx]

JmpTbl          word    stmt0, stmt1, stmt2

Stmt0:          print
                byte    "I=0",cr,lf,0
                jmp     EndCase

Stmt1:          print
                byte    "I=1",cr,lf,0
                jmp     EndCase

Stmt2:          print
                byte    "I=2",cr,lf,0
EndCase:

; IF..THEN..ELSE form:

                mov     ax, I
                cmp     ax, 0
                jne     Not0
                print
                byte    "I=0",cr,lf,0
                jmp     EndOfIF

Not0:           cmp     ax, 1
                jne     Not1
                print
                byte    "I=1",cr,lf,0
                jmp     EndOfIF

Not1:           cmp     ax, 2
                jne     EndOfIF
                Print
                byte    "I=2",cr,lf,0
EndOfIF:
Two things should become readily apparent: the more (consecutive) cases you have, the more efficient the jump table implementation becomes (both in terms of space and speed). Except for trivial cases, the case statement is almost always faster and usually by a large margin. As long as the case labels are consecutive values, the case statement version is usually smaller as well.

What happens if you need to include non-consecutive case labels or you cannot be sure that the case variable doesn’t go out of range? Many Pascals have extended the definition of the case statement to include an otherwise clause. Such a case statement takes the following form:

        CASE variable OF
                const: stmt;
                const: stmt;
                  .
                  .
                  .
                const: stmt;
                OTHERWISE stmt
        END;

If the value of variable matches one of the constants making up the case labels, then the associated statement executes. If the variable’s value doesn’t match any of the case labels, then the statement following the otherwise clause executes. The otherwise clause is implemented in two phases. First, you must choose the minimum and maximum values that appear in a case statement. In the following case statement, the smallest case label is five, the largest is 15:

        CASE I OF
                5: stmt1;
                8: stmt2;
                10: stmt3;
                12: stmt4;
                15: stmt5;
                OTHERWISE stmt6
        END;

1. Versions of Turbo Pascal, sadly, treat the case statement as a form of the if..then..else statement.

Before executing the jump through the jump table, the 80x86 implementation of this case statement should check the case variable to make sure it’s in the range 5..15. If not, control should be immediately transferred to stmt6:

                mov     bx, I
                cmp     bx, 5
                jl      Otherwise
                cmp     bx, 15
                jg      Otherwise
                shl     bx, 1
                jmp     cs:JmpTbl-10[bx]
The only problem with this form of the case statement as it now stands is that it doesn’t properly handle the situation where I is equal to 6, 7, 9, 11, 13, or 14. Rather than sticking extra code in front of the conditional jump, you can stick extra entries in the jump table as follows:

                mov     bx, I
                cmp     bx, 5
                jl      Otherwise
                cmp     bx, 15
                jg      Otherwise
                shl     bx, 1
                jmp     cs:JmpTbl-10[bx]

Otherwise:      {put stmt6 here}
                jmp     CaseDone

JmpTbl          word    stmt1, Otherwise, Otherwise, stmt2, Otherwise
                word    stmt3, Otherwise, stmt4, Otherwise, Otherwise
                word    stmt5
                etc.
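The same range check and padded table can be modeled in C with an array of function pointers. This is only a sketch under assumptions: the names jmp_tbl, case_dispatch, and the stmt handlers are hypothetical, each handler simply returns its statement number so the dispatch is observable, and the C index i - LOW plays the role of the -10 bias applied to JmpTbl in the assembly code.

```c
#include <assert.h>
#include <stddef.h>

enum { LOW = 5, HIGH = 15 };          /* smallest and largest case labels */

typedef int (*handler)(void);

/* Each handler returns the number of the statement it stands for. */
static int stmt1(void)     { return 1; }
static int stmt2(void)     { return 2; }
static int stmt3(void)     { return 3; }
static int stmt4(void)     { return 4; }
static int stmt5(void)     { return 5; }
static int otherwise(void) { return 6; }   /* stmt6 */

/* One entry per value in LOW..HIGH.  Labels that do not appear in the
 * case statement (6, 7, 9, 11, 13, 14) point at the otherwise handler. */
static const handler jmp_tbl[HIGH - LOW + 1] = {
    stmt1, otherwise, otherwise, stmt2, otherwise,
    stmt3, otherwise, stmt4,     otherwise, otherwise,
    stmt5
};

int case_dispatch(int i)
{
    if (i < LOW || i > HIGH)          /* cmp/jl and cmp/jg range check    */
        return otherwise();
    assert(jmp_tbl[i - LOW] != NULL); /* every slot must be filled        */
    return jmp_tbl[i - LOW]();        /* the indirect jump                */
}
```

For example, case_dispatch(8) selects stmt2, while case_dispatch(9) and case_dispatch(40) both end up in the otherwise handler, the first through the table and the second through the range check.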
Note that the value 10 is subtracted from the address of the jump table. The first entry in the table is always at offset zero while the smallest value used to index into the table is five (which is multiplied by two to produce 10). The entries for 6, 7, 9, 11, 13, and 14 all point at the code for the Otherwise clause, so if I contains one of these values, the Otherwise clause will be executed.

There is a problem with this implementation of the case statement. If the case labels contain non-consecutive entries that are widely spaced, the following case statement would generate an extremely large code file:

        CASE I OF
                0: stmt1;
                100: stmt2;
                1000: stmt3;
                10000: stmt4;
                Otherwise stmt5
        END;

A jump table covering the labels 0 through 10000 needs 10,001 word entries (20,002 bytes), all but four of which point at the Otherwise code. In this situation, your program will be much smaller if you implement the case statement with a sequence of if statements rather than using a jump table. However, keep one thing in mind: the size of the jump table does not normally affect the execution speed of the program. If the jump table contains two entries or two thousand, the case statement will execute the multi-way branch in a constant amount of time. The if statement implementation requires a linearly increasing amount of time for each case label appearing in the case statement.

Probably the biggest advantage to using assembly language over a HLL like Pascal is that you get to choose the actual implementation. In some instances you can implement a case statement as a sequence of if..then..else statements, or you can implement it as a jump table, or you can use a hybrid of the two:

        CASE I OF
                0: stmt1;
                1: stmt2;
                2: stmt3;
                100: stmt4;
                Otherwise stmt5
        END;

could become:

                mov     bx, I
                cmp     bx, 100
                je      Is100
                cmp     bx, 2
                ja      Otherwise
                shl     bx, 1
                jmp     cs:JmpTbl[bx]
                etc.
Of course, you could do this in Pascal with the following code:

        IF I = 100 then stmt4
        ELSE CASE I OF
                0: stmt1;
                1: stmt2;
                2: stmt3;
                Otherwise stmt5
        END;

But this tends to destroy the readability of the Pascal program. On the other hand, the extra code to test for 100 in the assembly language code doesn’t adversely affect the readability of the program (perhaps because it’s so hard to read already). Therefore, most people will add the extra code to make their program more efficient.

The C/C++ switch statement is very similar to the Pascal case statement. There is only one major semantic difference: the programmer must explicitly place a break statement in each case clause to transfer control to the first statement beyond the switch. This break corresponds to the jmp instruction at the end of each case sequence in the assembly code above. If the corresponding break is not present, C/C++ transfers control into the code of the following case. This is equivalent to leaving off the jmp at the end of the case’s sequence:

        switch (i)
        {
                case 0: stmt1;
                case 1: stmt2;
                case 2: stmt3;
                        break;
                case 3: stmt4;
                        break;
                default: stmt5;
        }
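To see the fall-through behavior concretely, the switch above can be made runnable by replacing each stmt with code that records that it ran. In this sketch the function name run_switch and the STMT bit flags are hypothetical; the case structure is exactly the one shown above.

```c
#include <assert.h>

/* One bit per statement so the caller can see which ones executed. */
enum { STMT1 = 1, STMT2 = 2, STMT3 = 4, STMT4 = 8, STMT5 = 16 };

unsigned run_switch(int i)
{
    unsigned ran = 0;

    switch (i) {
    case 0:
        ran |= STMT1;
        /* fall through: no break, so control enters case 1 */
    case 1:
        ran |= STMT2;
        /* fall through: no break, so control enters case 2 */
    case 2:
        ran |= STMT3;
        break;                        /* jmp EndCase                  */
    case 3:
        ran |= STMT4;
        break;                        /* jmp EndCase                  */
    default:
        ran |= STMT5;
    }

    assert(ran != 0);                 /* every path runs something    */
    return ran;
}
```

Calling run_switch(0) reports that stmt1, stmt2, and stmt3 all ran, run_switch(1) reports stmt2 and stmt3, and run_switch(3) reports only stmt4, which is the behavior you get in assembly language by omitting the jmp at the end of a case.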
The switch statement above translates into the following 80x86 code:

                mov     bx, i
                cmp     bx, 3
                ja      DefaultCase
                shl     bx, 1
                jmp     cs:JmpTbl[bx]

JmpTbl          word    case0, case1, case2, case3

case0:          <stmt1's code>

case1:          <stmt2's code>

case2:          <stmt3's code>
                jmp     EndCase         ;Emitted for the break stmt.

case3:          <stmt4's code>
                jmp     EndCase         ;Emitted for the break stmt.

DefaultCase:    <stmt5's code>
EndCase:

10.4 State Machines and Indirect Jumps

Another control structure commonly found in assembly language programs is the state machine. A state machine uses a state variable to control program flow. The FORTRAN programming language provides this capability with the assigned goto statement. Certain variants of C (e.g., GNU’s GCC from the Free Software Foundation) provide similar features. In assembly language, the indirect jump provides a mechanism to easily implement state machines.

So what is a state machine? In very basic terms, it is a piece of code[2] which keeps track of its execution history by entering and leaving certain “states”. For the purposes of this chapter, we’ll not use a very formal definition of a state machine. We’ll just assume that a state machine is a piece of code which (somehow) remembers the history of its execution (its state) and executes sections of code based upon that history.

In a very real sense, all programs are state machines. The CPU registers and values in memory constitute the “state” of that machine. However, we’ll use a much more constrained view. Indeed, for most purposes only a single variable (or the value in the IP register) will denote the current state.

Now let’s consider a concrete example. Suppose you have a procedure which you want to perform one operation the first time you call it, a different operation the second time you call it, yet something else the third time you call it, and then something new again on the fourth call. After the fourth call it repeats these four different operations in order. For example, suppose you want the procedure to add ax and bx the first time, subtract them on the second call, multiply them on the third, and divide them on the fourth. You could implement this procedure as follows:

State           byte    0

StateMach       proc
                cmp     State, 0
                jne     TryState1

; If this is state 0, add BX to AX and switch to state 1:

                add     ax, bx
                inc     State           ;Set it to state 1
                ret

; If this is state 1, subtract BX from AX and switch to state 2

TryState1:      cmp     State, 1
                jne     TryState2
                sub     ax, bx
                inc     State
                ret

; If this is state 2, multiply AX and BX and switch to state 3:

TryState2:      cmp     State, 2
                jne     MustBeState3
                push    dx
                mul     bx
                pop     dx
                inc     State
                ret

; If none of the above, assume we're in State 3. So divide
; AX by BX.

MustBeState3:   push    dx
                xor     dx, dx          ;Zero extend AX into DX.
                div     bx
                pop     dx
                mov     State, 0        ;Switch back to State 0
                ret
StateMach       endp

2. Note that state machines need not be software based. Many state machines’ implementations are hardware based.
Technically, this procedure is not the state machine. Instead, it is the variable State and the cmp/jne instructions which constitute the state machine.

There is nothing particularly special about this code. It’s little more than a case statement implemented via the if..then..else construct. The only thing special about this procedure is that it remembers how many times it has been called[3] and behaves differently depending upon the number of calls. While this is a correct implementation of the desired state machine, it is not particularly efficient. The more common implementation of a state machine in assembly language is to use an indirect jump. Rather than having a state variable which contains a value like zero, one, two, or three, we could load the state variable with the address of the code to execute upon entry into the procedure. By simply jumping to that address, the state machine could save the tests above needed to execute the proper code fragment. Consider the following implementation using the indirect jump:

State           word    State0

StateMach       proc
                jmp     State

; If this is state 0, add BX to AX and switch to state 1:

State0:         add     ax, bx
                mov     State, offset State1    ;Set it to state 1
                ret

; If this is state 1, subtract BX from AX and switch to state 2

State1:         sub     ax, bx
                mov     State, offset State2    ;Switch to State 2
                ret

; If this is state 2, multiply AX and BX and switch to state 3:

State2:         push    dx
                mul     bx
                pop     dx
                mov     State, offset State3    ;Switch to State 3
                ret

; If in State 3, do the division and switch back to State 0:

State3:         push    dx
                xor     dx, dx                  ;Zero extend AX into DX.
                div     bx
                pop     dx
                mov     State, offset State0    ;Switch to State 0
                ret
StateMach       endp
The jmp instruction at the beginning of the StateMach procedure transfers control to the location pointed at by the State variable. The first time you call StateMach it points at the State0 label. Thereafter, each subsection of code sets the State variable to point at the appropriate successor code.

3. Actually, it remembers how many times, MOD 4, that it has been called.
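For readers who think more easily in a high level language, the indirect-jump state machine can be approximated in C with a function pointer standing in for the State word. This is only a sketch under stated assumptions: the names Machine, StateFn, machine_init, and state_mach are hypothetical and do not come from the assembly listing, and the 16-bit registers are modeled with uint16_t.

```c
#include <stdint.h>

typedef struct Machine Machine;
typedef void (*StateFn)(Machine *m, uint16_t bx);

struct Machine {
    StateFn  state;   /* plays the role of the State word: address of the next handler */
    uint16_t ax;      /* accumulator, standing in for AX */
};

static void state0(Machine *m, uint16_t bx);
static void state1(Machine *m, uint16_t bx);
static void state2(Machine *m, uint16_t bx);
static void state3(Machine *m, uint16_t bx);

/* Each handler does its arithmetic, then stores the address of its successor. */
static void state0(Machine *m, uint16_t bx) { m->ax = (uint16_t)(m->ax + bx); m->state = state1; }
static void state1(Machine *m, uint16_t bx) { m->ax = (uint16_t)(m->ax - bx); m->state = state2; }
static void state2(Machine *m, uint16_t bx) { m->ax = (uint16_t)((uint32_t)m->ax * bx); m->state = state3; }
static void state3(Machine *m, uint16_t bx) { m->ax = (uint16_t)(m->ax / bx); m->state = state0; } /* bx must be nonzero */

void machine_init(Machine *m, uint16_t ax) { m->state = state0; m->ax = ax; }

/* The single indirect call below corresponds to "jmp State". */
void state_mach(Machine *m, uint16_t bx) { m->state(m, bx); }
```

Starting with ax = 10 and calling state_mach four times with bx = 3 adds, subtracts, multiplies, and divides in turn, with no comparisons against a state number on any call.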
10.5 Spaghetti Code

One major problem with assembly language is that it takes several statements to realize a simple idea encapsulated by a single HLL statement. All too often an assembly language programmer will notice that s/he can save a few bytes or cycles by jumping into the middle of some programming structure. After a few such observations (and corresponding modifications) the code contains a whole sequence of jumps in and out of portions of the code. If you were to draw a line from each jump to its destination, the resulting listing would end up looking like someone dumped a bowl of spaghetti on your code, hence the term “spaghetti code”.

Spaghetti code suffers from one major drawback: it’s difficult (at best) to read such a program and figure out what it does. Most programs start out in a “structured” form only to become spaghetti code at the altar of efficiency. Alas, spaghetti code is rarely efficient. Since it’s difficult to figure out exactly what’s going on, it’s very difficult to determine if you can use a better algorithm to improve the system. Hence, spaghetti code may wind up less efficient.

While it’s true that producing some spaghetti code in your programs may improve their efficiency, doing so should always be a last resort (when you’ve tried everything else and you still haven’t achieved what you need), never a matter of course. Always start out writing your programs with straightforward ifs and case statements. Start combining sections of code (via jmp instructions) once everything is working and well understood. Of course, you should never obliterate the structure of your code unless the gains are worth it.

A famous saying in structured programming circles is “After gotos, pointers are the next most dangerous element in a programming language.” A similar saying is “Pointers are to data structures what gotos are to control structures.” In other words, avoid excessive use of pointers.
If pointers and gotos are bad, then the indirect jump must be the worst construct of all since it involves both gotos and pointers! Seriously though, the indirect jump instructions should be avoided for casual use. They tend to make a program harder to read. After all, an indirect jump can (theoretically) transfer control to any label within a program. Imagine how hard it would be to follow the flow through a program if you have no idea what a pointer contains and you come across an indirect jump using that pointer. Therefore, you should always exercise care when using jump indirect instructions.
10.6 Loops

Loops represent the final basic control structure (sequences, decisions, and loops) which make up a typical program. Like so many other structures in assembly language, you’ll find yourself using loops in places you’ve never dreamed of using loops. Most HLLs have implied loop structures hidden away. For example, consider the BASIC statement IF A$ = B$ THEN 100. This if statement compares two strings and jumps to statement 100 if they are equal. In assembly language, you would need to write a loop to compare each character in A$ to the corresponding character in B$ and then jump to statement 100 if and only if all the characters matched. In BASIC, there is no loop to be seen in the program. In assembly language, this very simple if statement requires a loop. This is but a small example which shows how loops seem to pop up everywhere.

Program loops consist of three components: an optional initialization component, a loop termination test, and the body of the loop. The order in which these components are assembled can dramatically change the way the loop operates. Three permutations of these components appear over and over again. Because of their frequency, these loop structures are given special names in HLLs: while loops, repeat..until loops (do..while in C/C++), and loop..endloop loops.
10.6.1 While Loops

The most general loop is the while loop. It takes the following form:

        WHILE boolean expression DO statement;

There are two important points to note about the while loop. First, the test for termination appears at the beginning of the loop. Second, as a direct consequence of the position of the termination test, the body of the loop may never execute. If the boolean expression is already false on entry, the loop body is skipped entirely. Consider the following Pascal while loop:

        I := 0;
        WHILE (I<100) do I := I + 1;

I := 0; is the initialization code for this loop. I is a loop control variable, because it controls the execution of the body of the loop. (I<100) is the loop test; the loop terminates when it becomes false. That is, the loop will not terminate as long as I is less than 100. I:=I+1; is the loop body. This is the code that executes on each pass of the loop. You can convert this to 80x86 assembly language as follows:

                mov     I, 0
WhileLp:        cmp     I, 100
                jge     WhileDone
                inc     I
                jmp     WhileLp
WhileDone:
Note that a Pascal while loop can be easily synthesized using an if and a goto statement. For example, the Pascal while loop presented above can be replaced by:

        I := 0;
1:      IF (I<100) THEN BEGIN
            I := I + 1;
            GOTO 1;
        END;

More generally, any while loop can be built up from the following:

        optional initialization code
1:      IF not termination condition THEN BEGIN
            loop body
            GOTO 1;
        END;

Therefore, you can use the techniques from earlier in this chapter to convert if statements to assembly language. All you’ll need is an additional jmp (goto) instruction.
10.6.2 Repeat..Until Loops

The repeat..until (do..while) loop tests for the termination condition at the end of the loop rather than at the beginning. In Pascal, the repeat..until loop takes the following form:

        optional initialization code
        REPEAT
            loop body
        UNTIL termination condition

This sequence executes the initialization code, the loop body, then tests some condition to see if the loop should be repeated. If the boolean expression evaluates to false, the loop repeats; otherwise the loop terminates. The two things to note about the repeat..until loop are that the termination test appears at the end of the loop and, as a direct consequence of this, the loop body executes at least once.

Like the while loop, the repeat..until loop can be synthesized with an if statement and a goto. You would use the following:

        initialization code
1:      loop body
        IF NOT termination condition THEN GOTO 1

Based on the material presented in the previous sections, you can easily synthesize repeat..until loops in assembly language.
10.6.3 LOOP..ENDLOOP Loops

If while loops test for termination at the beginning of the loop and repeat..until loops check for termination at the end of the loop, the only place left to test for termination is in the middle of the loop. Although Pascal and C/C++[4] don’t directly support such a loop, the loop..endloop structure can be found in HLLs like Ada. The loop..endloop loop takes the following form:

        LOOP
            loop body
        ENDLOOP;

Note that there is no explicit termination condition. Unless otherwise provided for, the loop..endloop construct simply forms an infinite loop. Loop termination is handled by an if and goto statement[5]. Consider the following (pseudo) Pascal code which employs a loop..endloop construct:

        LOOP
            READ(ch)
            IF ch = '.' THEN BREAK;
            WRITE(ch);
        ENDLOOP;

In real Pascal, you’d use the following code to accomplish this:

1:      READ(ch);
        IF ch = '.' THEN GOTO 2; (* Turbo Pascal supports BREAK! *)
        WRITE(ch);
        GOTO 1;
2:

In assembly language you’d end up with something like:

LOOP1:          getc
                cmp     al, '.'
                je      EndLoop
                putc
                jmp     LOOP1
EndLoop:

4. Technically, C/C++ does support such a loop. “for(;;)” along with break provides this capability.
5. Many high level languages use statements like NEXT, BREAK, CONTINUE, EXIT, and CYCLE rather than GOTO; but they’re all forms of the GOTO statement.

10.6.4 FOR Loops

The for loop is a special form of the while loop which repeats the loop body a specific number of times. In Pascal, the for loop looks something like the following:

        FOR var := initial TO final DO stmt

or

        FOR var := initial DOWNTO final DO stmt

Traditionally, the for loop in Pascal has been used to process arrays and other objects accessed in sequential numeric order. These loops can be converted directly into assembly language as follows:

In Pascal:

        FOR var := start TO stop DO stmt;

In Assembly:

                mov     var, start
FL:             mov     ax, var
                cmp     ax, stop
                jg      EndFor

; code corresponding to stmt goes here.

                inc     var
                jmp     FL
EndFor:
Fortunately, most for loops repeat some statement(s) a fixed number of times. For example,

        FOR I := 0 to 7 do write(ch);

In situations like this, it’s better to use the 80x86 loop instruction rather than simulate a for loop:

                mov     cx, 8           ;I = 0..7 is eight passes
LP:             mov     al, Chr         ;Chr is the Pascal variable ch
                call    putc            ; (CH itself is a register name)
                loop    LP

Keep in mind that the loop instruction normally appears at the end of a loop whereas the for loop tests for termination at the beginning of the loop. Therefore, you should take precautions to prevent a runaway loop in the event cx is zero (which would cause the loop instruction to repeat the loop 65,536 times) or the stop value is less than the start value. In the case of

        FOR var := start TO stop DO stmt;

assuming you don’t use the value of var within the loop, you’d probably want to use the assembly code:

                mov     cx, stop
                sub     cx, start
                jl      SkipFor
                inc     cx
LP:             stmt
                loop    LP
SkipFor:

Remember, the sub and cmp instructions set the flags in an identical fashion. Therefore, this loop will be skipped if stop is less than start. It will be repeated (stop-start)+1 times otherwise. If you need to reference the value of var within the loop, you could use the following code:

                mov     ax, start
                mov     var, ax
                mov     cx, stop
                sub     cx, ax
                jl      SkipFor
                inc     cx
LP:             stmt
                inc     var
                loop    LP
SkipFor:

The downto version appears in the exercises.
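The iteration counts discussed in this section can be checked with a small C model. This is a sketch, not a translation of any listing: cx is modeled as a uint16_t, the loop instruction as "decrement, then repeat while nonzero", and the function names for_trip_count and loop_passes_from are hypothetical.

```c
#include <stdint.h>

/* Passes made by:  LP: stmt / loop LP   when entered with the given cx. */
uint32_t loop_passes_from(uint16_t cx)
{
    uint32_t passes = 0;
    do {
        passes++;                 /* stmt */
    } while (--cx != 0);          /* loop LP: decrement cx, branch if not zero */
    return passes;
}

/* Passes made by the guarded sequence for FOR var := start TO stop. */
uint32_t for_trip_count(int16_t start, int16_t stop)
{
    if (stop < start)             /* sub cx, start / jl SkipFor */
        return 0;
    uint16_t cx = (uint16_t)((uint16_t)(stop - start) + 1);   /* inc cx */
    return loop_passes_from(cx);
}
```

In this model for_trip_count(0, 7) yields 8 passes and for_trip_count(3, 1) yields none, while loop_passes_from(0) runs 65,536 passes, which is the runaway case the guard is there to prevent.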
10.7 Register Usage and Loops

Given that the 80x86 accesses registers much faster than memory locations, registers are the ideal spot to place loop control variables (especially for small loops). This point is amplified since the loop instruction requires the use of the cx register. However, there are some problems associated with using registers within a loop. The primary problem with using registers as loop control variables is that registers are a limited resource. In particular, there is only one cx register. Therefore, the following will not work properly:

                mov     cx, 8
Loop1:          mov     cx, 4
Loop2:          stmts
                loop    Loop2
                stmts
                loop    Loop1

The intent here, of course, was to create a set of nested loops, that is, one loop inside another. The inner loop (Loop2) should repeat four times for each of the eight executions of the outer loop (Loop1). Unfortunately, both loops use the loop instruction. Therefore, this will form an infinite loop since cx will be set to zero (which loop treats like 65,536) at the end of the first loop instruction. Since cx is always zero upon encountering the second loop instruction, control will always transfer to the Loop1 label. The solution here is to save and restore the cx register or to use a different register in place of cx for the outer loop:

                mov     cx, 8
Loop1:          push    cx
                mov     cx, 4
Loop2:          stmts
                loop    Loop2
                pop     cx
                stmts
                loop    Loop1

or:

                mov     bx, 8
Loop1:          mov     cx, 4
Loop2:          stmts
                loop    Loop2
                stmts
                dec     bx
                jnz     Loop1

Register corruption is one of the primary sources of bugs in loops in assembly language programs. Always keep an eye out for this problem.
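The effect of sharing one counter between two loops can be modeled in C. The sketch below assumes a 16-bit counter variable standing in for cx and a helper, loop_taken, that mimics the loop instruction; both names, and nested_with_save, are hypothetical.

```c
#include <stdint.h>

/* Mimics LOOP: decrement the counter, report whether the branch is taken. */
int loop_taken(uint16_t *cx)
{
    return --*cx != 0;
}

/* Nested loops sharing one counter, with the outer count saved around the
   inner loop (the push cx / pop cx version). Returns inner body executions. */
unsigned nested_with_save(void)
{
    unsigned inner_runs = 0;
    uint16_t cx = 8;
    do {
        uint16_t saved = cx;          /* push cx */
        cx = 4;
        do {
            inner_runs++;             /* inner stmts */
        } while (loop_taken(&cx));    /* loop Loop2 */
        cx = saved;                   /* pop cx */
    } while (loop_taken(&cx));        /* loop Loop1 */
    return inner_runs;
}
```

With the save and restore in place the inner body runs 8 × 4 = 32 times. Without it, the counter is zero when the outer test is reached, so loop_taken wraps it to 0FFFFh and reports "branch taken" on every pass.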
10.8 Performance Improvements

The 80x86 microprocessors execute sequences of instructions at blinding speeds. You’ll rarely encounter a program that is slow which doesn’t contain any loops. Since loops are the primary source of performance problems within a program, they are the place to look when attempting to speed up your software. While a treatise on how to write efficient programs is beyond the scope of this chapter, there are some things you should be aware of when designing loops in your programs. They’re all aimed at removing unnecessary instructions from your loops in order to reduce the time it takes to execute one iteration of the loop.

10.8.1 Moving the Termination Condition to the End of a Loop

Consider the following flow graphs for the three types of loops presented earlier:

Repeat..until loop:

        Initialization code
        Loop body
        Test for termination
        Code following the loop

While loop:

        Initialization code
        Loop termination test
        Loop body
        Jump back to test
        Code following the loop

Loop..endloop loop:

        Initialization code
        Loop body, part one
        Loop termination test
        Loop body, part two
        Jump back to loop body part 1
        Code following the loop
As you can see, the repeat..until loop is the simplest of the bunch. This is reflected in the assembly language code required to implement these loops. Consider the following while and repeat..until loops that are equivalent:

        SI := DI - 20;                  SI := DI - 20;
        while (SI <= DI) do             repeat
        begin
            stmts                           stmts
            SI := SI + 1;                   SI := SI + 1;
        end;                            until SI > DI;

The assembly language code for these two loops is[6]:

                mov     si, di                          mov     si, di
                sub     si, 20                          sub     si, 20
WL1:            cmp     si, di          RU:             stmts
                jnle    QWL                             inc     si
                stmts                                   cmp     si, di
                inc     si                              jng     RU
                jmp     WL1
QWL:

As you can see, testing for the termination condition at the end of the loop allowed us to remove a jmp instruction from the loop. This can be significant if this loop is nested inside other loops. In the preceding example there wasn’t a problem with executing the body at least once. Given the definition of the loop, you can easily see that the loop will be executed exactly 21 times (SI takes on every value from DI-20 through DI, inclusive). Assuming cx is available, this loop easily reduces to:

                lea     si, -20[di]
                mov     cx, 21
WL1:            stmts
                inc     si
                loop    WL1
Unfortunately, it’s not always quite this easy. Consider the following Pascal code:

        WHILE (SI <= DI) DO BEGIN
            stmts
            SI := SI + 1;
        END;

In this particular example, we haven’t the slightest idea what si contains upon entry into the loop. Therefore, we cannot assume that the loop body will execute at least once, and we must do the test before executing the body of the loop. The test can be placed at the end of the loop with the inclusion of a single jmp instruction:

                jmp     short Test
RU:             stmts
                inc     si
Test:           cmp     si, di
                jle     RU

6. Of course, a good compiler would recognize that both loops perform the same operation and generate identical code for each. However, most compilers are not this good.

Although the code is as long as the original while loop, the jmp instruction executes only once rather than on each repetition of the loop. Note that this slight gain in efficiency is obtained via a slight loss in readability. The second code sequence above is closer to spaghetti code than the original implementation. Such is often the price of a small performance gain. Therefore, you should carefully analyze your code to ensure that the performance boost is worth the loss of clarity. More often than not, assembly language programmers sacrifice clarity for dubious gains in performance, producing impossible to understand programs.
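The transformation described above, a while loop rewritten with its test at the bottom and a single entry jump, can be sketched in C with gotos that mirror the labels in the assembly. The function names count_while and count_inverted are hypothetical; each simply counts how many times the loop body runs so the two forms can be compared.

```c
/* While form: the test sits at the top, so every pass ends with a jump back. */
unsigned count_while(int si, int di)
{
    unsigned passes = 0;
    while (si <= di) {
        passes++;                   /* stmts */
        si++;                       /* inc si */
    }
    return passes;
}

/* Inverted form: one jump to the test on entry, conditional branch at the bottom. */
unsigned count_inverted(int si, int di)
{
    unsigned passes = 0;
    goto test;                      /* jmp short Test */
body:                               /* RU: */
    passes++;                       /* stmts */
    si++;                           /* inc si */
test:                               /* Test: */
    if (si <= di) goto body;        /* cmp si, di / jle RU */
    return passes;
}
```

Both functions return the same count for any starting values, including zero passes when si is already greater than di on entry; only the number of jumps executed per pass differs.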
10.8.2 Executing the Loop Backwards

Because of the nature of the flags on the 80x86, loops which range from some number down to (or up to) zero are more efficient than any other. Compare the following Pascal loops and the code they generate:

        for I := 1 to 8 do                      for I := 8 downto 1 do
            K := K + I - J;                         K := K + I - J;

                mov     I, 1                            mov     I, 8
FLP:            mov     ax, K           FLP:            mov     ax, K
                add     ax, I                           add     ax, I
                sub     ax, J                           sub     ax, J
                mov     K, ax                           mov     K, ax
                inc     I                               dec     I
                cmp     I, 8                            jnz     FLP
                jle     FLP

Note that by running the loop from eight down to one (the code on the right) we saved a comparison on each repetition of the loop. Unfortunately, you cannot force all loops to run backwards. However, with a little effort and some coercion you should be able to work most loops so they operate backwards. Once you get a loop operating backwards, it’s a good candidate for the loop instruction (which will improve the performance of the loop on pre-486 CPUs).

The example above worked out well because the loop ran from eight down to one. The loop terminated when the loop control variable became zero. What happens if you need to execute the loop when the loop control variable goes to zero? For example, suppose that the loop above needed to range from seven down to zero. As long as the upper bound is positive, you can substitute the jns instruction in place of the jnz instruction above to repeat the loop some specific number of times:

                mov     I, 7
FLP:            mov     ax, K
                add     ax, I
                sub     ax, J
                mov     K, ax
                dec     I
                jns     FLP

This loop will repeat eight times with I taking on the values seven down to zero on each execution of the loop. When it decrements zero to minus one, it sets the sign flag and the loop terminates. Keep in mind that some values may look positive but they are negative. If the loop control variable is a byte, then values in the range 128..255 are negative. Likewise, 16-bit values in the range 32768..65535 are negative. Therefore, initializing the loop control variable with any value in the range 129..255 or 32769..65535 (or, of course, zero) will cause the loop to terminate after a single execution. This can get you into a lot of trouble if you’re not careful.
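The warning about values that "look positive" is easy to try out in C. The sketch below models a byte-sized loop control variable and the dec/jns pair; the sign flag is modeled as bit 7 of the decremented value, and the function name passes_dec_jns is hypothetical.

```c
#include <stdint.h>

/* Passes made by:  mov I, start / FLP: body / dec I / jns FLP   with a byte-sized I. */
unsigned passes_dec_jns(uint8_t start)
{
    uint8_t i = start;
    unsigned passes = 0;
    do {
        passes++;                   /* loop body */
        i--;                        /* dec I (wraps from 0 to 255) */
    } while ((i & 0x80) == 0);      /* jns FLP: repeat while the sign bit is clear */
    return passes;
}
```

Starting at 7 gives the expected eight passes, but starting at 200 gives only one, because 199 already has its sign bit set. A start value of 128 is the odd case: it decrements to 127, so the loop keeps going all the way down through zero.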
10.8.3 Loop Invariant Computations

A loop invariant computation is some calculation that appears within a loop that always yields the same result. You needn’t do such computations inside the loop. You can compute them outside the loop and reference the value of the computation inside. The following Pascal code demonstrates a loop which contains an invariant computation:

        FOR I := 0 TO N DO
            K := K+(I+J-2);

Since J never changes throughout the execution of this loop, the sub-expression “J-2” can be computed outside the loop and its value used in the expression inside the loop:

        temp := J-2;
        FOR I := 0 TO N DO
            K := K+(I+temp);

Of course, if you’re really interested in improving the efficiency of this particular loop, you’d be much better off (most of the time) computing K using the formula:

$$K = K + \bigl((N+1) \times temp\bigr) + \frac{(N+1) \times N}{2}$$

This computation for K is based on the formula:

$$\sum_{i=0}^{N} i = \frac{(N+1) \times N}{2}$$
However, simple computations such as this one aren’t always possible. Still, this demonstrates that a better algorithm is almost always better than the trickiest code you can come up with. In assembly language, invariant computations are even trickier. Consider this conversion of the Pascal code above (the loop runs I from n down to zero, which produces the same sum):

                mov     ax, J
                sub     ax, 2           ;temp := J-2
                mov     temp, ax
                mov     ax, n
                mov     I, ax
FLP:            mov     ax, K
                add     ax, I
                add     ax, temp
                mov     K, ax
                dec     I
                cmp     I, -1
                jg      FLP

Of course, the first refinement we can make is to move the loop control variable (I) into a register. This produces the following code:

                mov     ax, J
                dec     ax
                dec     ax
                mov     temp, ax
                mov     cx, n
FLP:            mov     ax, K
                add     ax, cx
                add     ax, temp
                mov     K, ax
                dec     cx
                cmp     cx, -1
                jg      FLP

This operation speeds up the loop by removing a memory access from each repetition of the loop. To take this one step further, why not use a register to hold the “temp” value rather than a memory location:

                mov     bx, J
                dec     bx
                dec     bx
                mov     cx, n
FLP:            mov     ax, K
                add     ax, cx
                add     ax, bx
                mov     K, ax
                dec     cx
                cmp     cx, -1
                jg      FLP

Furthermore, accessing the variable K can be removed from the loop as well:

                mov     bx, J
                dec     bx
                dec     bx
                mov     cx, n
                mov     ax, K
FLP:            add     ax, cx
                add     ax, bx
                dec     cx
                cmp     cx, -1
                jg      FLP
                mov     K, ax

One final improvement which is begging to be made is to substitute the loop instruction for the dec cx / cmp cx,-1 / jg FLP instructions. Unfortunately, this loop must be repeated whenever the loop control variable hits zero; the loop instruction cannot do this. However, we can unravel the last execution of the loop (see the next section) and do that computation outside the loop as follows (this version assumes n is at least one, since loop treats a zero count as 65,536):

                mov     bx, J
                dec     bx
                dec     bx
                mov     cx, n
                mov     ax, K
FLP:            add     ax, cx
                add     ax, bx
                loop    FLP
                add     ax, bx          ;The I = 0 pass, done outside the loop
                mov     K, ax

As you can see, these refinements have considerably reduced the number of instructions executed inside the loop and those instructions that do appear inside the loop are very fast since they all reference registers rather than memory locations. Removing invariant computations and unnecessary memory accesses from a loop (particularly an inner loop in a set of nested loops) can produce dramatic performance improvements in a program.
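The hoisting step and the closed-form alternative from this section can be cross-checked with a short C sketch. The function names are hypothetical, the closed form uses the sum 0 + 1 + ... + N = N(N+1)/2, and sum_closed assumes N is not negative.

```c
/* Straightforward loop: J-2 is recomputed on every pass. */
long sum_naive(long k, long n, long j)
{
    for (long i = 0; i <= n; i++)
        k = k + (i + j - 2);
    return k;
}

/* Same loop with the invariant J-2 computed once, outside the loop. */
long sum_hoisted(long k, long n, long j)
{
    long temp = j - 2;
    for (long i = 0; i <= n; i++)
        k = k + (i + temp);
    return k;
}

/* No loop at all: (n+1) copies of temp plus 0 + 1 + ... + n. Assumes n >= 0. */
long sum_closed(long k, long n, long j)
{
    long temp = j - 2;
    return k + (n + 1) * temp + n * (n + 1) / 2;
}
```

All three functions return the same value for the same inputs; the first two differ only in how much work each pass does, while the third replaces the passes with a fixed handful of arithmetic operations.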
10.8.4 Unraveling Loops

For small loops, that is, those whose body is only a few statements, the overhead required to process a loop may constitute a significant percentage of the total processing time. For example, look at the following Pascal code and its associated 80x86 assembly language code:

        FOR I := 3 DOWNTO 0 DO A [I] := 0;

                mov     I, 3
FLP:            mov     bx, I
                shl     bx, 1
                mov     A [bx], 0
                dec     I
                jns     FLP

Each execution of the loop requires five instructions. Only one instruction is performing the desired operation (moving a zero into an element of A). The remaining four instructions convert the loop control variable into an index into A and control the repetition of the loop. Therefore, it takes 20 instructions to do the operation logically required by four.

While there are many improvements we could make to this loop based on the information presented thus far, consider carefully exactly what it is that this loop is doing: it’s simply storing four zeros into A[0] through A[3]. A more efficient approach is to use four mov instructions to accomplish the same task. For example, if A is an array of words, then the following code initializes A much faster than the code above:

                mov     A, 0
                mov     A+2, 0
                mov     A+4, 0
                mov     A+6, 0

You may improve the execution speed and the size of this code by using the ax register to hold zero:

                xor     ax, ax
                mov     A, ax
                mov     A+2, ax
                mov     A+4, ax
                mov     A+6, ax

Although this is a trivial example, it shows the benefit of loop unraveling. If this simple loop appeared buried inside a set of nested loops, the 5:1 instruction reduction could possibly double the performance of that section of your program.

Of course, you cannot unravel all loops. Loops that execute a variable number of times cannot be unraveled because there is rarely a way to determine (at assembly time) the number of times the loop will be executed. Therefore, unraveling a loop is a process best applied to loops that execute a known number of times.

Even if you repeat a loop some fixed number of iterations, it may not be a good candidate for loop unraveling. Loop unraveling produces impressive performance improvements when the number of instructions required to control the loop (and handle other overhead operations) represents a significant percentage of the total number of instructions in the loop. Had the loop above contained 36 instructions in the body of the loop (exclusive of the four overhead instructions), then the performance improvement would be, at best, only 10% (compared with the 300-400% it now enjoys). Therefore, the cost of unraveling a loop, i.e., all the extra code which must be inserted into your program, quickly reaches a point of diminishing returns as the body of the loop grows larger or as the number of iterations increases. Furthermore, entering that code into your program can become quite a chore. Therefore, loop unraveling is a technique best applied to small loops.

Note that the superscalar x86 chips (Pentium and later) have branch prediction hardware and use other techniques to improve performance. Loop unraveling on such systems may actually slow down the code since these processors are optimized to execute short loops.
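The same trade can be written out in C, where it is usually called loop unrolling. The sketch below assumes a four-element array of 16-bit words, as in the example above; the function names clear4_loop and clear4_unrolled are hypothetical.

```c
#include <stdint.h>

/* Loop form: index arithmetic and a branch accompany every store. */
void clear4_loop(uint16_t a[4])
{
    for (int i = 3; i >= 0; i--)
        a[i] = 0;
}

/* Unraveled form: four stores and no loop overhead at all. */
void clear4_unrolled(uint16_t a[4])
{
    a[0] = 0;
    a[1] = 0;
    a[2] = 0;
    a[3] = 0;
}
```

Both functions leave the array in the same state. Optimizing compilers often perform this transformation on their own for small fixed counts, so in a high level language the hand-unrolled form is mostly of interest as an illustration of what the assembly programmer does manually.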
10.8.5 Induction Variables

The following is a slight modification of the loop presented in the previous section:

        FOR I := 0 TO 255 DO A [I] := 0;

                mov     I, 0
FLP:            mov     bx, I
                shl     bx, 1
                mov     A [bx], 0
                inc     I
                cmp     I, 255
                jbe     FLP

Although unraveling this code will still produce a tremendous performance improvement, it will take 257 instructions to accomplish this task[7], too many for all but the most time-critical applications. However, you can reduce the execution time of the body of the loop tremendously using induction variables. An induction variable is one whose value depends entirely on the value of some other variable. In the example above, the index into the array A tracks the loop control variable (it’s always equal to the value of the loop control variable times two). Since I doesn’t appear anywhere else in the loop, there is no sense in performing all the computations on I. Why not operate directly on the array index value? The following code demonstrates this technique:

                mov     bx, 0
FLP:            mov     A [bx], 0
                inc     bx
                inc     bx
                cmp     bx, 510
                jbe     FLP

Here, several instructions accessing memory were replaced with instructions that only access registers. Another improvement to make is to shorten the mov A[bx], 0 instruction using the following code:

                lea     bx, A
                xor     ax, ax
FLP:            mov     [bx], ax
                inc     bx
                inc     bx
                cmp     bx, offset A+510
                jbe     FLP

This code transformation improves the performance of the loop even more. However, we can improve the performance further by using the loop instruction and the cx register to eliminate the cmp instruction[8]:

                lea     bx, A
                xor     ax, ax
                mov     cx, 256
FLP:            mov     [bx], ax
                inc     bx
                inc     bx
                loop    FLP

This final transformation produces the fastest executing version of this code[9].
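In C terms, replacing the loop control variable with its induction variable amounts to stepping a pointer through the array instead of recomputing an address from an index. The following is a minimal sketch assuming a 256-element array of 16-bit words; the names WORDS, clear_indexed, and clear_pointer are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define WORDS 256

/* Index form: the element address is derived from i on every pass. */
void clear_indexed(uint16_t a[WORDS])
{
    for (size_t i = 0; i < WORDS; i++)
        a[i] = 0;
}

/* Induction-variable form: p plays the role of bx, count the role of cx. */
void clear_pointer(uint16_t a[WORDS])
{
    uint16_t *p = a;                /* lea bx, A */
    size_t count = WORDS;           /* mov cx, 256 */
    do {
        *p++ = 0;                   /* mov [bx], ax / inc bx / inc bx */
    } while (--count != 0);         /* loop FLP */
}
```

Both functions clear the same 256 words. A modern optimizing compiler will frequently turn the first form into something close to the second by itself, which is the same strength-reduction idea applied automatically.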
10.8.6 Other Performance Improvements

There are many other ways to improve the performance of a loop within your assembly language programs. For additional suggestions, a good text on compilers such as “Compilers, Principles, Techniques, and Tools” by Aho, Sethi, and Ullman would be an excellent place to look. Additional efficiency considerations will be discussed in the volume on efficiency and optimization.

7. For this particular loop, the STOSW instruction could produce a big performance improvement on many 80x86 processors. Using the STOSW instruction would require only about six instructions for this code. See the chapter on string instructions for more details.
8. The LOOP instruction is not the best choice on the 486 and Pentium processors since “dec cx” followed by “jne lbl” actually executes faster.
9. Fastest is a dangerous statement to use here! But it is the fastest of the examples presented here.
10.9 Nested Statements

As long as you stick to the templates provided in the examples presented in this chapter, it is very easy to nest statements inside one another. The secret to making sure your assembly language sequences nest well is to ensure that each construct has one entry point and one exit point. If this is the case, then you will find it easy to combine statements. All of the statements discussed in this chapter follow this rule.

Perhaps the most commonly nested statements are the if..then..else statements. To see how easy it is to nest these statements in assembly language, consider the following Pascal code:

        if (x = y) then
            if (I >= J) then writeln('At point 1')
            else writeln('At point 2')
        else write('Error condition');

To convert this nested if..then..else to assembly language, start with the outermost if, convert it to assembly, then work on the innermost if:

; if (x = y) then

                mov     ax, X
                cmp     ax, Y
                jne     Else0

; Put innermost IF here

                jmp     IfDone0

; Else write('Error condition');

Else0:          print
                byte    "Error condition",0
IfDone0:

As you can see, the above code handles the “if (X=Y)...” instruction, leaving a spot for the second if. Now add in the second if as follows:

; if (x = y) then

                mov     ax, X
                cmp     ax, Y
                jne     Else0

;       IF ( I >= J) then writeln('At point 1')

                mov     ax, I
                cmp     ax, J
                jnge    Else1
                print
                byte    "At point 1",cr,lf,0
                jmp     IfDone1

;       Else writeln ('At point 2');

Else1:          print
                byte    "At point 2",cr,lf,0
IfDone1:
                jmp     IfDone0

; Else write('Error condition');

Else0:          print
                byte    "Error condition",0
IfDone0:

The nested if is everything from the IF (I >= J) comment through the IfDone1 label. There is an obvious optimization which you do not really want to make until speed becomes a real problem. Note in the innermost if statement above that the jmp IfDone1 instruction simply jumps to a jmp instruction which transfers control to IfDone0. It is very tempting to replace the first jmp by one which jumps directly to IfDone0. Indeed, when you go in and optimize your code, this would be a good optimization to make. However, you shouldn’t make such optimizations to your code unless you really need the speed. Doing so makes your code harder to read and understand. Remember, we would like all our control structures to have one entry and one exit. Changing this jump as described would give the innermost if statement two exit points.

The for loop is another commonly nested control structure. Once again, the key to building up nested structures is to construct the outside object first and fill in the inner members afterwards. As an example, consider the following nested for loops which add the elements of a pair of two dimensional arrays together:

        for i := 0 to 7 do
            for j := 0 to 7 do
                A [i,j] := B [i,j] + C [i,j];
As before, begin by constructing the outermost loop first. This code assumes that dx will be the loop control variable for the outermost loop (that is, dx is equivalent to “i”):

; for dx := 0 to 7 do

                mov     dx, 0
ForLp0:         cmp     dx, 7
                jnle    EndFor0

; Put innermost FOR loop here

                inc     dx
                jmp     ForLp0
EndFor0:

Now add the code for the nested for loop. Note the use of the cx register for the loop control variable on the innermost for loop of this code.

; for dx := 0 to 7 do

                mov     dx, 0
ForLp0:         cmp     dx, 7
                jnle    EndFor0

;       for cx := 0 to 7 do

                mov     cx, 0
ForLp1:         cmp     cx, 7
                jnle    EndFor1

;       Put code for A[dx,cx] := B[dx,cx] + C[dx,cx] here

                inc     cx
                jmp     ForLp1
EndFor1:
                inc     dx
                jmp     ForLp0
EndFor0:

Once again, the innermost for loop is everything from the “for cx := 0 to 7 do” comment through the EndFor1 label. The final step is to add the code which performs the actual computation.
10.10 Timing Delay Loops

Most of the time the computer runs too slow for most people’s tastes. However, there are occasions when it actually runs too fast. One common solution is to create an empty loop to waste a small amount of time. In Pascal you will commonly see loops like:

        for i := 1 to 10000 do ;

In assembly, you might see a comparable loop:

                mov     cx, 8000h
DelayLp:        loop    DelayLp

By carefully choosing the number of iterations, you can obtain a relatively accurate delay interval. There is, however, one catch. That relatively accurate delay interval is only going to be accurate on your machine. If you move your program to a different machine with a different CPU, clock speed, number of wait states, different sized cache, or half a dozen other features, you will find that your delay loop takes a completely different amount of time. Since there is better than a hundred to one difference in speed between the high end and low end PCs today, it should come as no surprise that the loop above will execute 100 times faster on some machines than on others.

The fact that one CPU runs 100 times faster than another does not reduce the need to have a delay loop which executes some fixed amount of time. Indeed, it makes the problem that much more important. Fortunately, the PC provides a hardware based timer which operates at the same speed regardless of the CPU speed. This timer maintains the time of day for the operating system, so it’s very important that it run at the same speed whether you’re on an 8088 or a Pentium. In the chapter on interrupts you will learn to actually patch into this device to perform various tasks. For now, we will simply take advantage of the fact that this timer chip forces the CPU to increment a 32-bit memory location (40:6ch) about 18.2 times per second. By looking at this variable we can determine the speed of the CPU and adjust the count value for an empty loop accordingly.

The basic idea of the following code is to watch the BIOS timer variable until it changes. Once it changes, start counting the number of iterations through some sort of loop until the BIOS timer variable changes again. Having noted the number of iterations, if you execute a similar loop the same number of times it should require about 1/18.2 seconds to execute.
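The calibration idea just described (wait for a tick boundary, then count loop passes until the next boundary) can be outlined in C before looking at the assembly. This is a sketch under stated assumptions, not the book's routine: the tick source is passed in as a function pointer, and the stand-in fake_ticks, which advances once every 1000 reads, exists only so the logic can be tried without real timer hardware. All names here are hypothetical.

```c
#include <stdint.h>

typedef uint32_t (*TickFn)(void);

/* Count how many polls of the tick source fit inside one full tick period. */
uint32_t calibrate(TickFn read_ticks)
{
    uint32_t first = read_ticks();
    uint32_t now;
    do {                                /* we may be mid-period, so wait */
        now = read_ticks();             /* for the next tick boundary    */
    } while (now == first);

    uint32_t polls = 0;
    while (read_ticks() == now)         /* count polls until the tick changes again */
        polls++;
    return polls;
}

/* Waste roughly `ticks` tick periods by repeating the same kind of poll. */
void delay_ticks(TickFn read_ticks, uint32_t polls_per_tick, uint32_t ticks)
{
    for (uint32_t t = 0; t < ticks; t++)
        for (uint32_t p = 0; p < polls_per_tick; p++)
            (void)read_ticks();
}

/* Stand-in tick source for experimentation: advances once every 1000 reads. */
uint32_t fake_calls = 0;
uint32_t fake_ticks(void) { return ++fake_calls / 1000; }
```

The important design point carries over from the text: the delay loop performs the same kind of work per pass as the calibration loop, so the measured count remains meaningful when it is replayed later.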
The following program demonstrates how to create such a Delay routine:

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

; PPI_B is the I/O address of the keyboard/speaker control
; port. This program accesses it simply to introduce a
; large number of wait states on faster machines. Since the
; PPI (Programmable Peripheral Interface) chip runs at about
; the same speed on all PCs, accessing this chip slows most
; machines down to within a factor of two of the slower
; machines.

PPI_B           equ     61h

; RTC is the address of the BIOS timer variable (40:6ch).
; The BIOS timer interrupt code increments this 32-bit
; location about every 55 ms (1/18.2 seconds). The code
; which initializes everything for the Delay routine reads
; this location to determine when 1/18th seconds have passed.

RTC             textequ <es:[6ch]>

dseg            segment para public 'data'

; TimedValue contains the number of iterations the delay
; loop must repeat in order to waste 1/18.2 seconds.

TimedValue      word    0

; RTC2 is a dummy variable used by the Delay routine to
; simulate accessing a BIOS variable.

RTC2            word    0

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; Main program which tests out the DELAY subroutine.

Main            proc
                mov     ax, dseg
                mov     ds, ax

                print
                byte    "Delay test routine",cr,lf,0

; Okay, let's see how long it takes to count down 1/18th
; of a second. First, point ES as segment 40h in memory.
; The BIOS variables are all in segment 40h.
;
; This code begins by reading the memory timer variable
; and waiting until it changes. Once it changes we can
; begin timing until the next change occurs. That will
; give us 1/18.2 seconds. We cannot start timing right
; away because we might be in the middle of a 1/18.2
; second period.

                mov     ax, 40h
                mov     es, ax
                mov     ax, RTC
RTCMustChange:  cmp     ax, RTC
                je      RTCMustChange

; Okay, begin timing the number of iterations it takes
; for an 18th of a second to pass. Note that this
; code must be very similar to the code in the Delay
; routine.

                mov     cx, 0
                mov     si, RTC
                mov     dx, PPI_B
TimeRTC:        mov     bx, 10
DelayLp:        in      al, dx
                dec     bx
                jne     DelayLp
                cmp     si, RTC
                loope   TimeRTC

                neg     cx                      ;CX counted down!
                mov     TimedValue, cx          ;Save away

                mov     ax, ds
                mov     es, ax

                printf
                byte    "TimedValue = %d",cr,lf
                byte    "Press any key to continue",cr,lf
                byte    "This will begin a delay of five "
                byte    "seconds",cr,lf,0
                dword   TimedValue

                getc

                mov     cx, 90
DelayIt:        call    Delay18
                loop    DelayIt

Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp

; Delay18-This routine delays for approximately 1/18th sec.
;         Presumably, the variable "TimedValue" in DS has
;         been initialized with an appropriate count down
;         value before calling this code.

Delay18         proc    near
                push    ds
                push    es
                push    ax
                push    bx
                push    cx
                push    dx
                push    si

                mov     ax, dseg
                mov     es, ax
                mov     ds, ax

; The following code contains two loops. The inside
; nested loop repeats 10 times. The outside loop
; repeats the number of times determined to waste
; 1/18.2 seconds. This loop accesses the hardware
; port "PPI_B" in order to introduce many wait states
; on the faster processors. This helps even out the
; timings on very fast machines by slowing them down.
; Note that accessing PPI_B is only done to introduce
; these wait states, the data read is of no interest
; to this code.
;
; Note the similarity of this code to the code in the
; main program which initializes the TimedValue variable.

                mov     cx, TimedValue
                mov     si, es:RTC2
                mov     dx, PPI_B

TimeRTC:        mov     bx, 10
DelayLp:        in      al, dx
                dec     bx
                jne     DelayLp
                cmp     si, es:RTC2
                loope   TimeRTC

                pop     si
                pop     dx
                pop     cx
                pop     bx
                pop     ax
                pop     es
                pop     ds
                ret
Delay18         endp

cseg            ends

sseg            segment para stack 'stack'
stk             word    1024 dup (0)
sseg            ends
                end     Main

10.11 Sample Program

This chapter’s sample program is a simple moon lander game. While the simulation isn’t terribly realistic, this program does demonstrate the use and optimization of several different control structures including loops, if..then..else statements, and so on.

; Simple "Moon Lander" game.
;
; Randall Hyde
; 2/8/96
;
; This program is an example of a trivial little "moon lander"
; game that simulates a Lunar Module setting down on the Moon's
; surface. At time T=0 the spacecraft's velocity is 1000 ft/sec
; downward, the craft has 250 units of fuel (InitialFuel), and
; the craft is 10,000 ft above the moon's surface. The pilot
; (user) can specify how much fuel to burn at each second.
;
; Note that all calculations are approximate since everything is
; done with integer arithmetic.

; Some important constants

InitialVelocity =       1000
InitialDistance =       10000
InitialFuel     =       250
MaxFuelBurn     =       25

MoonsGravity    =       5               ;Approx 5 ft/sec/sec
AccPerUnitFuel  =       -5              ;-5 ft/sec/sec for each fuel unit.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'

; Current distance from the Moon's Surface:

CurDist         word    InitialDistance

; Current Velocity:

CurVel          word    InitialVelocity

; Total fuel left to burn:

FuelLeft        word    InitialFuel

; Amount of Fuel to use on current burn.

Fuel            word    ?

; Distance travelled in the last second.

Dist            word    ?

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; GETI-Reads an integer variable from the user and returns
;      its value in the AX register. If the user entered garbage,
;      this code will make the user re-enter the value.

geti            textequ <call _geti>
_geti           proc
                push    es
                push    di
                push    bx

; Read a string of characters from the user.
;
; Note that there are two (nested) loops here. The outer loop
; (GetILp) repeats the getsm operation as long as the user
; keeps entering an invalid number. The innermost loop (ChkDigits)
; checks the individual characters in the input string to make
; sure they are all decimal digits.

GetILp:         getsm

; Check to see if this string contains any non-digit characters:
;
; while (([bx] >= '0') and ([bx] <= '9')) bx := bx + 1;
;
; Note the sneaky way of turning the while loop into a
; repeat..until loop.

                mov     bx, di                  ;Pointer to start of string.
                dec     bx
ChkDigits:      inc     bx
                mov     al, es:[bx]             ;Fetch next character.
                IsDigit                         ;See if it's a decimal digit.
                je      ChkDigits               ;Repeat if it is.

                cmp     al, 0                   ;At end of string?
                je      GotNumber

; Okay, we just ran into a non-digit character. Complain and make
; the user reenter the value.

                free                            ;Free space malloc'd by getsm.
                print
                byte    cr,lf
                byte    "Illegal unsigned integer value, "
                byte    "please reenter.",cr,lf
                byte    "(no spaces, non-digit chars, etc.):",0
                jmp     GetILp

; Okay, ES:DI is pointing at something resembling a number. Convert
; it to an integer.

GotNumber:      atoi
                free                            ;Free space malloc'd by getsm.

                pop     bx
                pop     di
                pop     es
                ret
_geti           endp

; InitGame-     Initializes global variables this game uses.

InitGame        proc
                mov     CurVel, InitialVelocity
                mov     CurDist, InitialDistance
                mov     FuelLeft, InitialFuel
                mov     Dist, 0
                ret
InitGame        endp

; DispStatus-   Displays important information for each
;               cycle of the game (a cycle is one second).

DispStatus      proc
                printf
                byte    cr,lf
                byte    "Distance from surface: %5d",cr,lf
                byte    "Current velocity: %5d",cr,lf
                byte    "Fuel left: %5d",cr,lf
                byte    lf
                byte    "Dist travelled in the last second: %d",cr,lf
                byte    lf,0
                dword   CurDist, CurVel, FuelLeft, Dist
                ret
DispStatus      endp

; GetFuel-      Reads an integer value representing the amount of fuel
;               to burn from the user and checks to see if this value
;               is reasonable. A reasonable value must:
;
;               * Be an actual number (GETI handles this).
;               * Be greater than or equal to zero (no burning
;                 negative amounts of fuel, GETI handles this).
;               * Be no more than MaxFuelBurn (any more than this and
;                 you have an explosion, not a burn).
;               * Be no more than the fuel left in the Lunar Module.

GetFuel         proc
                push    ax

; Loop..endloop structure that reads an integer input and terminates
; if the input is reasonable. It prints a message and repeats if the
; input is not reasonable.
;
; loop
;       get fuel;
;       if (fuel <= MaxFuelBurn) then break;
;       print error message.
; endloop
;
; if (fuel > FuelLeft) then
;
;       fuel = fuelleft;
;       print appropriate message.
;
; endif

GetFuelLp:      print
                byte    "Enter amount of fuel to burn: ",0
                geti
                cmp     ax, MaxFuelBurn
                jbe     GoodFuel

                print
                byte    "The amount you've specified exceeds the "
                byte    "engine rating,", cr, lf
                byte    "please enter a smaller value",cr,lf,lf,0
                jmp     GetFuelLp

GoodFuel:       mov     Fuel, ax
                cmp     ax, FuelLeft
                jbe     HasEnough
                printf
                byte    "There are only %d units of fuel left.",cr,lf
                byte    "The Lunar module will burn this rather than %d"
                byte    cr,lf,0
                dword   FuelLeft, Fuel

                mov     ax, FuelLeft
                mov     Fuel, ax

HasEnough:      mov     ax, FuelLeft
                sub     ax, Fuel
                mov     FuelLeft, ax
                pop     ax
                ret
GetFuel         endp

; ComputeStatus-
;
; This routine computes the new velocity and new distance based on the
; current distance, current velocity, fuel burnt, and the moon's
; gravity. This routine is called for every "second" of flight time.
; This simplifies the following equations since the value of T is
; always one.
;
; note:
;
; Distance Travelled = Acc*T*T/2 + Vel*T (note: T=1, so it goes away).
; Acc = MoonsGravity + Fuel * AccPerUnitFuel
;
; New Velocity = Acc*T + Prev Velocity
;
; This code should really average these values over the one second
; time period, but the simulation is so crude anyway, there's no
; need to really bother.

ComputeStatus   proc
                push    ax
                push    bx
                push    dx

; First, compute the acceleration value based on the fuel burnt
; during this second (Acc = Moon's Gravity + Fuel * AccPerUnitFuel).

                mov     ax, Fuel                ;Compute
                mov     dx, AccPerUnitFuel      ; Fuel*AccPerUnitFuel
                imul    dx

                add     ax, MoonsGravity        ;Add in Moon's gravity.
                mov     bx, ax                  ;Save Acc value.

; Now compute the new velocity (V=AT+V)

                add     ax, CurVel              ;Compute new velocity
                mov     CurVel, ax

; Next, compute the distance travelled (D = 1/2 * A * T^2 + VT +D)

                sar     bx, 1                   ;Acc/2
                add     ax, bx                  ;Acc/2 + V (T=1!)
                mov     Dist, ax                ;Distance Travelled.
                neg     ax
                add     CurDist, ax             ;New distance.

                pop     dx
                pop     bx
                pop     ax
                ret
ComputeStatus   endp

; GetYorN-      Reads a yes or no answer from the user (Y, y, N, or n).
;               Returns the character read in the al register (Y or N,
;               converted to upper case if necessary).

GetYorN         proc
                getc
                ToUpper
                cmp     al, 'Y'
                je      GotIt
                cmp     al, 'N'
                jne     GetYorN
GotIt:          ret
GetYorN         endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

MoonLoop:       print
                byte    cr,lf,lf
                byte    "Welcome to the moon lander game.",cr,lf,lf
                byte    "You must maneuver your craft so that you touch "
                byte    "down at less than 10 ft/sec",cr,lf
                byte    "for a soft landing.",cr,lf,lf,0

                call    InitGame

; The following loop repeats while the distance to the surface is greater
; than zero.

WhileStillUp:   mov     ax, CurDist
                cmp     ax, 0
                jle     Landed

                call    DispStatus
                call    GetFuel
                call    ComputeStatus
                jmp     WhileStillUp

Landed:         cmp     CurVel, 10
                jle     SoftLanding

                printf
                byte    "Your current velocity is %d.",cr,lf
                byte    "That was just a little too fast. However, as a "
                byte    "consolation prize,",cr,lf
                byte    "we will name the new crater you just created "
                byte    "after you.",cr,lf,0
                dword   CurVel

                jmp     TryAgain

SoftLanding:    printf
                byte    "Congrats! You landed the Lunar Module safely at "
                byte    "%d ft/sec.",cr,lf
                byte    "You have %d units of fuel left.",cr,lf
                byte    "Good job!",cr,lf,0
                dword   CurVel, FuelLeft

TryAgain:       print
                byte    "Do you want to try again (Y/N)? ",0
                call    GetYorN
                cmp     al, 'Y'
                je      MoonLoop

                print
                byte    cr,lf
                byte    "Thanks for playing! Come back to the moon "
                byte    "again sometime"
                byte    cr,lf,lf,0

Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main

Figure 10.2 An Audible Sound Wave: The Relationship Between Period and Frequency
(Plot of the voltage on the output port against time. The signal alternates between logic 1 and logic 0, and one clock period spans one full cycle. Note: frequency is equal to the reciprocal of the clock period. Audible sounds are between 20 and 20,000 Hz.)

Figure 10.3 A Speaker
(Input an alternating electrical signal to the speaker. The speaker responds by pushing the air in and out according to the electrical signal.)

10.12 Laboratory Exercises

In this laboratory exercise you will program the timer chip on the PC to produce musical tones. You will learn how the PC generates sound and how you can use this ability to encode and play music.

10.12.1 The Physics of Sound

Sounds you hear are the result of vibrating air molecules. When air molecules quickly vibrate back and forth between 20 and 20,000 times per second, we interpret this as some sort of sound. A speaker (see Figure 10.3) is a device which vibrates air in response to an electrical signal. That is, it converts an electric signal which alternates between 20 and 20,000 times per second (Hz) to an audible tone. Alternating a signal is very easy on a computer: all you have to do is apply a logic one to an output port for some period of time and then write a logic zero to the output port for a short period. Then repeat this over and over again. A plot of this activity over time appears in Figure 10.2.

Although many humans are capable of hearing tones in the range 20 Hz to 20 kHz, the PC’s speaker is not capable of faithfully reproducing the tones in this range. It works pretty well for sounds in the range 100 Hz to 10 kHz, but the volume drops off dramatically outside this range. Fortunately, this lab only requires frequencies in the 110 to 2,000 Hz range, well within the capabilities of the PC speaker.
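The relationship between frequency and the time the output spends at each logic level can be sketched in a few lines of Python. This is only an illustration of the idea. It assumes an idealised square wave that spends equal time at logic one and logic zero, and the function names are invented for the example.

```python
# Sketch: an idealised square wave with equal high and low times
# (an assumption made for this example only).

def square_wave_level(t_seconds, frequency_hz):
    """Return 1 during the first half of each period and 0 during the second."""
    phase = (t_seconds * frequency_hz) % 1.0   # position within the current period
    return 1 if phase < 0.5 else 0


def half_period_us(frequency_hz):
    """Microseconds the output stays at one level before it toggles."""
    return 1_000_000 / (2 * frequency_hz)


if __name__ == "__main__":
    for hz in (110, 440, 2000):
        print(f"{hz} Hz: toggle every {half_period_us(hz):.1f} microseconds")
```

At 440 Hz, for instance, the output would toggle roughly every 1.1 milliseconds.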
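Figure 10.4 A Musical Staff
(Treble and bass staffs with note names, top to bottom. Treble staff lines: F D B G E. Treble staff spaces and the position just below the staff: E C A F D. Middle C sits between the two staffs. Bass staff lines: A F D B G. Position just above the bass staff and its spaces: B G E C A.)
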
10.12.2 The Fundamentals of Music

In this laboratory you will use the timer chip and the PC’s built-in speaker to produce musical tones. To produce true music, rather than annoying tones, requires a little knowledge of music theory. This section provides a very brief introduction to the notation musicians use. This will help you when you attempt to convert music in standard notation to a form the computer can use.

Western music tends to use notation based on the alphabetic letters A…G. There are a total of 12 notes designated A, A#, B, C, C#, D, D#, E, F, F#, G, and G# [10]. On a typical musical instrument these 12 notes repeat over and over again. For example, a typical piano might have six repetitions of these 12 notes. Each repetition is an octave. An octave is just a collection of 12 notes; it need not necessarily start with A. Indeed, many keyboards start with C. Although there are, technically, about ten octaves within the normal hearing range of adults, very little music uses more than four or five octaves. In the laboratory, you will implement four octaves.

Written music typically uses two staffs. A staff is a set of five parallel lines. The upper staff is often called the treble staff and the lower staff is often called the bass staff. An example appears in Figure 10.4.

A musical note, as the notation to the side of the staffs in Figure 10.4 indicates, appears both on the lines of the staffs and in the spaces between the lines. The position of a note on the staff determines which note to play; the shape of the note determines its duration. There are whole notes, half notes, quarter notes, eighth notes, sixteenth notes, and thirty-second notes [11]. Note durations are specified relative to one another. So a half note plays for one-half the time of a whole note, a quarter note plays for one-half the time of a half note (one quarter the time of a whole note), etc. In most musical passages, the quarter note is generally the basis for timing. If the tempo of a particular piece is 100 beats per minute this means that you play 100 quarter notes per minute. The duration of a note is determined by its shape as shown in Figure 10.5.

In addition to the notes themselves, there are often brief pauses in a musical passage when there are no notes being played. These pauses are known as rests. Since there is nothing audible about them, only their duration matters. The duration of the various rests is the same as the normal notes; there are whole rests, half rests, quarter rests, etc. The symbols for these rests appear in Figure 10.6.

This is but a brief introduction to music notation, barely sufficient for those without any music training to convert a piece of sheet music into a form suitable for a computer program. If you are interested in more information on music notation, the library is a good source of information on music theory.

Figure 10.7 provides an adaptation of the hymn “Amazing Grace”. There are two things to note here. First, there is no bass staff, just two treble staffs. Second, the sharp symbol on the “F” line indicates that this song is played in “G-Major” and that all F notes should be F#. There are no F notes in this song, so that hardly matters [12].

Figure 10.5 Note Durations
(Symbols for the whole note, half note, quarter note, eighth note, sixteenth note, and thirty-second note.)

Figure 10.6 Rest Durations
(Symbols for the whole rest, half rest, quarter rest, eighth rest, sixteenth rest, and thirty-second rest.)

Figure 10.7 Amazing Grace
(Sheet music on two treble staffs, each with a sharp on the F line. Amazing Grace. John Newton, John Rees, Edwin Excell.)

10. The notes with the “#” (pronounced sharp) correspond to the black keys on the piano. The other notes correspond to the white keys on the piano. Note that western music notation also describes flats in addition to sharps. A# is equal to Bb (b denotes flat), C# corresponds to Db, etc. Technically, B is equivalent to Cb and C is equivalent to B# but you will rarely see musicians refer to these notes this way.
11. The only reason there aren’t shorter notes is that it would be hard to play one note which is 1/64th the length of another.
12. In the full version of the song there are F notes on the bass clef.

10.12.3 The Physics of Music

Each musical note corresponds to a unique frequency. The A above middle C is generally 440 Hz (this is known as concert pitch since this is the frequency orchestras tune to). The A one octave below this is at 220 Hz, the A above this is 880 Hz. In general, to get the next higher A you double the current frequency; to get the previous A you halve the current frequency. To obtain the remaining notes you multiply the frequency of A by a power of the twelfth root of two. For example, to get A# you would take the frequency for A and multiply it by the twelfth root of two. Repeating this operation yields the following (rounded) frequencies for four separate octaves:

Note   Frequency     Note   Frequency     Note   Frequency     Note   Frequency
A0        110        A1        220        A2        440        A3        880
A#0       117        A#1       233        A#2       466        A#3       932
B0        123        B1        247        B2        494        B3        988
C0        131        C1        262        C2        523        C3       1047
C#0       139        C#1       277        C#2       554        C#3      1109
D0        147        D1        294        D2        587        D3       1175
D#0       156        D#1       311        D#2       622        D#3      1245
E0        165        E1        330        E2        659        E3       1319
F0        175        F1        349        F2        698        F3       1397
F#0       185        F#1       370        F#2       740        F#3      1480
G0        196        G1        392        G2        784        G3       1568
G#0       208        G#1       415        G#2       831        G#3      1661

Notes: The number following each note denotes its octave. In the chart above, middle C is C1. You can generate additional notes by halving or doubling the notes above. For example, if you really need A(-1) (the octave below A0 above), dividing the frequency of A0 by two yields 55Hz. Likewise, if you want E4, you can obtain this by doubling E3 to produce 2638 Hz. Keep in mind that the frequencies above are not exact. They are rounded to the nearest integer because we will need integer frequencies in this lab.
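The chart can be regenerated from the twelfth-root-of-two rule. The following Python sketch is one way to do that, assuming the chart's own octave numbering (each octave runs from A up to G#, with A0 at 110 Hz). The names NOTE_NAMES and frequency are invented for the example.

```python
# Sketch: rebuild the four-octave frequency chart from A0 = 110 Hz.
# Octave numbering follows the chart: each octave runs A, A#, ... G#.

NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
A0_HZ = 110


def frequency(note, octave):
    """Frequency in Hz, rounded to the nearest integer."""
    semitone = NOTE_NAMES.index(note)            # steps above the A of this octave
    return round(A0_HZ * 2 ** (octave + semitone / 12))


if __name__ == "__main__":
    for note in NOTE_NAMES:
        row = [f"{note + str(octave):<5}{frequency(note, octave):>5}" for octave in range(4)]
        print("     ".join(row))
```

Note that doubling a rounded chart entry can differ by one from rounding the exact value: 2 × 1319 gives 2638 Hz for E4, while the sketch above rounds the exact frequency to 2637 Hz. Either is close enough for the PC speaker.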
10.12.4 The 8253/8254 Timer Chip

PCs contain a special integrated circuit which produces a periodic signal. This chip (an Intel compatible 8253 or 8254, depending on your particular computer [13]) contains three different 16-bit counter/timer circuits. The PC uses one of these timers to generate the 1/18.2 second real time clock mentioned earlier. It uses the second of these timers to control the DMA refresh on main memory [14]. The third timer circuit on this chip is connected to the PC’s speaker. The PC uses this timer to produce beeps, tones, and other sounds. The RTC timer will be of interest to us in a later chapter. The DMA timer, if present on your PC, isn’t something you should mess with. The third timer, connected to the speaker, is the subject of this section.

13. Most modern computers don’t actually have an 8253 or 8254 chip. Instead, there is a compatible device built into some other VLSI chip on the motherboard.
14. Many modern computer systems do not use this timer for this purpose and, therefore, do not include the second timer in their chipset.

10.12.5 Programming the Timer Chip to Produce Musical Tones

As mentioned earlier, one of the channels on the PC programmable interval timer (PIT) chip is connected to the PC’s speaker. To produce a musical tone we need to program this timer chip to produce the frequency of some desired note and then activate the speaker. Once you initialize the timer and speaker in this fashion, the PC will continuously produce the specified tone until you disable the speaker.

To activate the speaker you must set bits zero and one of the “B Port” on the PC’s 8255 Programmable Peripheral Interface (PPI) chip. Port B of the PPI is an eight-bit I/O device located at I/O address 61h. You must use the in instruction to read this port and the out instruction to write data back to it. You must preserve all other bits at this I/O address. If you modify any of the other bits, you will probably cause the PC to malfunction, perhaps even reset. The following code shows how to set bits zero and one without affecting the other bits on the port:

                in      al, PPI_B       ;PPI_B is equated to 61h
                or      al, 3           ;Set bits zero and one.
                out     PPI_B, al

Since PPI_B’s port address is less than 100h we can access this port directly; we do not have to load its port address into dx and access the port indirectly through dx.

To deactivate the speaker you must write zeros to bits zero and one of PPI_B. The code is similar to the above except you force the bits to zero rather than to one.

Manipulating bits zero and one of the PPI_B port lets you turn the speaker on and off. It does not let you adjust the frequency of the tone the speaker produces. To do this you must program the PIT at I/O addresses 42h and 43h. To change the frequency applied to the speaker you must first write the value 0B6h to I/O port 43h (the PIT control word) and then you must write a 16-bit frequency divisor to port 42h (timer channel two). Since the port is only an eight-bit port, you must write the data using two successive OUT instructions to the same I/O address. The first byte you write is the L.O. byte of the divisor, the second byte you write is the H.O. byte. To compute the divisor value, you must use the following formula:

                Divisor = 1193180 / Frequency

For example, the divisor for the A above middle C (440 Hz) is 1,193,180/440 or 2,712 (rounded to the nearest integer). To program the PIT to play this note you would execute the following code:

                mov     al, 0B6h        ;Control word code.
                out     PIT_CW, al      ;Write control word (port 43h).
                mov     al, 98h         ;2712 is 0A98h.
                out     PIT_Ch2, al     ;Write L.O. byte (port 42h).
                mov     al, 0ah
                out     PIT_Ch2, al     ;Write H.O. byte (port 42h).

Assuming that you have activated the speaker, the code above will produce the A note until you deactivate the speaker or reprogram the PIT with a different divisor.
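The numbers in this section can be double-checked away from the hardware. The Python sketch below only computes the values that the assembly code would write; it performs no port I/O. The function names are made up for the example, and the masks follow from the description of bits zero and one above.

```python
# Sketch: compute the values written to the PIT and to PPI port B.
# No hardware access happens here; this only checks the arithmetic.

PIT_CLOCK_HZ = 1_193_180   # numerator of the divisor formula
SPEAKER_BITS = 0x03        # bits zero and one of PPI port B


def pit_divisor(frequency_hz):
    """Divisor for timer channel two, rounded to the nearest integer."""
    return round(PIT_CLOCK_HZ / frequency_hz)


def divisor_bytes(divisor):
    """Return (L.O. byte, H.O. byte) in the order they go to port 42h."""
    return divisor & 0xFF, (divisor >> 8) & 0xFF


def speaker_on(port_b_value):
    """Set bits zero and one, leaving the other six bits untouched."""
    return port_b_value | SPEAKER_BITS


def speaker_off(port_b_value):
    """Clear bits zero and one, leaving the other six bits untouched."""
    return port_b_value & ~SPEAKER_BITS & 0xFF


if __name__ == "__main__":
    d = pit_divisor(440)
    lo, hi = divisor_bytes(d)
    print(f"divisor {d} = {hi:02X}{lo:02X}h, write {lo:02X}h then {hi:02X}h")
```

In assembly, clearing the two speaker bits corresponds to an and al, 0FCh between the in and out instructions.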
10.12.6 Putting it All Together

To create music you will need to activate the speaker, program the PIT, and then delay for some period of time while the note plays. At the end of that period, you need to reprogram the PIT and wait while the next note plays. If you encounter a rest, you need to deactivate the speaker for the given time interval. The key point is this time interval. If you simply reprogram the PPI and PIT chips at microprocessor speeds, your song will be over and done with in just a few microseconds. Far too fast to hear anything. Therefore, we need to use a delay, such as the software delay code presented earlier, to allow us to hear our notes.

A reasonable tempo is between 80 and 120 quarter notes per minute. This means you should be calling the Delay18 routine between 9 and 14 times for each quarter note. A reasonable set of iterations is

• three times for sixteenth notes,
• six times for eighth notes,
• twelve times for quarter notes,
• twenty-four times for half notes, and
• forty-eight times for whole notes.
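The link between tempo and the number of Delay18 calls can be worked out as follows. This Python sketch assumes the tempo is measured in quarter notes per minute and that each Delay18 call lasts 1/18.2 seconds. The function names are illustrative only.

```python
# Sketch: convert a tempo into Delay18 call counts.
# Assumptions: tempo is in quarter notes per minute and one
# Delay18 call lasts 1/18.2 seconds.

TICKS_PER_SECOND = 18.2


def ticks_per_quarter(tempo_bpm):
    """Delay18 calls needed for one quarter note at the given tempo."""
    return round(60.0 / tempo_bpm * TICKS_PER_SECOND)


def tempo_for_ticks(quarter_ticks):
    """Tempo, in quarter notes per minute, produced by a given call count."""
    return 60.0 * TICKS_PER_SECOND / quarter_ticks


def note_ticks(quarter_ticks=12):
    """Call counts for each note length, keeping the 1:2:4:8:16 ratios."""
    return {
        "sixteenth": quarter_ticks // 4,
        "eighth": quarter_ticks // 2,
        "quarter": quarter_ticks,
        "half": quarter_ticks * 2,
        "whole": quarter_ticks * 4,
    }


if __name__ == "__main__":
    print(ticks_per_quarter(80), ticks_per_quarter(120))
    print(note_ticks(12), f"{tempo_for_ticks(12):.0f} quarter notes per minute")
```

Twelve calls per quarter note works out to roughly 91 quarter notes per minute, and twelve divides evenly by four, which keeps the sixteenth-note count a whole number.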
Of course, you may adjust these timings as you see fit to make your music sound better. The important parameter is the ratio between the different notes and rests, not the actual time.

Since a typical piece of music contains many, many individual notes, it doesn’t make sense to reprogram the PIT and PPI chips individually for each note. Instead, you should write a procedure into which you pass a divisor and a count down value. That procedure would then play that note for the specified time and then return. Assuming you call this procedure PlayNote and it expects the divisor in ax and the duration (number of times to call Delay18) in cx, you could use the following macro to easily create songs in your programs:

Note            macro   divisor, duration
                mov     ax, divisor
                mov     cx, duration
                call    PlayNote
                endm

The following macro lets you easily insert a rest into your music:

Rest            macro   Duration
                local   LoopLbl
                mov     cx, Duration
LoopLbl:        call    Delay18
                loop    LoopLbl
                endm

Now you can play notes by simply stringing together a list of these macros with the appropriate parameters. The only problem with this approach is that it is difficult to create songs if you must constantly supply divisor values. You’ll find music creation to be much simpler if you can specify the note, octave, and duration rather than a divisor and duration. This is very easy to do. Simply create a lookup table using the following definition:

        Divisors: array [Note, Sharp, Octave] of word;

Where Note is 'A'..'G', Sharp is true or false (1 or 0), and Octave is 0..3. Each entry in the table would contain the divisor for that particular note.
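The contents of such a table could be generated ahead of time and then pasted into the data segment as word values. The Python sketch below shows one possible layout, keyed by (Note, Sharp, Octave). It assumes the octave numbering of the frequency chart (A0 = 110 Hz, each octave running A through G#) and fills the B# and E# entries with the divisors for C and F. These choices and all names are assumptions made for illustration.

```python
# Sketch: build a Divisors[Note, Sharp, Octave] table.
# Assumptions: A0 = 110 Hz, octaves run A..G#, and B#/E# reuse C/F.

PIT_CLOCK_HZ = 1_193_180

# Semitone steps above A for each natural note within one octave.
NATURAL_STEPS = {"A": 0, "B": 2, "C": 3, "D": 5, "E": 7, "F": 8, "G": 10}


def build_divisors():
    """Return a dict mapping (note, sharp, octave) to a 16-bit PIT divisor."""
    table = {}
    for note, steps in NATURAL_STEPS.items():
        for sharp in (False, True):
            for octave in range(4):
                semitone = steps + (1 if sharp else 0)
                freq = round(110 * 2 ** (octave + semitone / 12))  # integer Hz
                table[(note, sharp, octave)] = round(PIT_CLOCK_HZ / freq)
    return table


if __name__ == "__main__":
    divisors = build_divisors()
    for octave in range(4):
        print(f"A{octave}: {divisors[('A', False, octave)]}")
```

Every entry fits in a 16-bit word, since the largest divisor (for A0) is 10,847.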
10.12.7 Amazing Grace Exercise

Program Ex10_1.asm on the companion CD-ROM is a complete working program that plays the tune “Amazing Grace.” Load this program and execute it.

For your lab report: the Ex10_1.asm file uses a “Note” macro that is very similar to the one appearing in the previous section. What is the difference between Ex10_1’s Note macro and the one in the previous section? What changes were made to PlayNote in order to accommodate this difference?

The Ex10_1.asm program uses straight-line code (no loops or decisions) to play its tune. Rewrite the main body of the program to use a loop with a pair of tables to feed the data to the Note and Rest macros. One table should contain a list of frequency values (use -1 for a rest), the other table should contain duration values. Put the two tables in the data segment and initialize them with the values for the Amazing Grace song. The loop should fetch a pair of values, one from each of the tables, and call the Note or Rest macro as appropriate. When the loop encounters a frequency value of zero it should terminate. Note: you must call the Rest macro at the end of the tune in order to shut the speaker off.

For your lab report: make the changes to the program, document them, and include the print-out of the new program in your lab report.

10.13 Programming Projects

1) Write a program to transpose two 4x4 arrays. The algorithm to transpose the arrays is

        for i := 0 to 3 do
            for j := 0 to 3 do begin

                temp := A [i,j];
                A [i,j] := B [j,i];
                B [j,i] := temp;

            end;

Write a main program that calls a transpose procedure. The main program should read the A array values from the user and print the A and B arrays after computing the transpose of A and placing the result in B.

2) Create a program to play music which is supplied as a string to the program. The notes to play should consist of a string of ASCII characters terminated with a byte containing the value zero. Each note should take the following form:

        (Note)(Octave)(Duration)

where “Note” is A..G (upper or lower case), “Octave” is 0..3, and “Duration” is 1..8. “1” corresponds to an eighth note, “2” corresponds to a quarter note, “4” corresponds to a half note, and “8” corresponds to a whole note. Rests consist of an exclamation point followed by a “Duration” value. Your program should ignore any spaces appearing in the string. The following sample piece is the song “Amazing Grace” presented earlier.

Music           byte    "d12 g14 b11 g11 b14 a12 g14 e12 d13 !1 d12 "
                byte    "g14 b11 g11 b14 a12 d28"
                byte    "b12 d23 b11 d21 b11 g14 d12 e13 g12 e11 "
                byte    "d13 !1 d12 g14 b11 g11 b14 a12 g18"
                byte    0

Write a program to play any song appearing in string form like the above string. Using music obtained from another source, submit your program playing that other song.

3) A C character string is a sequence of characters that ends with a byte containing zero. Some common character string routines include computing the length of a character string (by counting all the characters in a string up to, but not including, the zero byte), comparing two strings for equality (by comparing corresponding characters in two strings, character by character until you encounter a zero byte or two characters that are not the same), and copying one string to another (by copying the characters from one string to the corresponding positions in the other until you encounter the zero byte). Write a program that reads two strings from the user, computes the length of the first of these, compares the two strings, and then copies the first string over the top of the second. Allow for a maximum of 128 characters (including the zero byte) in your strings. Note: do not use the Standard Library string routines for this project.

4) Modify the moon lander game appearing in the Sample Programs section of this chapter (moon.asm on the companion CD-ROM, also see “Sample Program” on page 547) to allow the user to specify the initial velocity, starting distance from the surface, and initial fuel values. Verify that the values are reasonable before allowing the game to proceed.
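The note-string format from project 2 is easier to reason about with a small reference decoder. The Python sketch below is only an aid to understanding the format and is not a solution to the project, which calls for assembly language. The tuple layout and the name parse_song are invented here, and the sample string is the Music data from project 2 without its terminating zero byte.

```python
# Sketch: decode (Note)(Octave)(Duration) strings and "!" rests.
# The event tuples are an illustrative choice, not part of the project.

SONG = (
    "d12 g14 b11 g11 b14 a12 g14 e12 d13 !1 d12 "
    "g14 b11 g11 b14 a12 d28"
    "b12 d23 b11 d21 b11 g14 d12 e13 g12 e11 "
    "d13 !1 d12 g14 b11 g11 b14 a12 g18"
)


def parse_song(text):
    """Return a list of (note, octave, duration) tuples; rests use ("rest", None, n)."""
    chars = text.replace(" ", "")            # the format ignores spaces
    events = []
    i = 0
    while i < len(chars):
        if chars[i] == "!":                  # rest: "!" followed by a duration
            duration = int(chars[i + 1:i + 2])
            octave, note, width = None, "rest", 2
        else:                                # note: letter, octave, duration
            note = chars[i].upper()
            if note not in "ABCDEFG":
                raise ValueError(f"bad note {chars[i]!r} at position {i}")
            octave = int(chars[i + 1:i + 2])
            if not 0 <= octave <= 3:
                raise ValueError(f"bad octave at position {i}")
            duration = int(chars[i + 2:i + 3])
            width = 3
        if not 1 <= duration <= 8:
            raise ValueError(f"bad duration at position {i}")
        events.append((note, octave, duration))
        i += width
    return events


if __name__ == "__main__":
    for event in parse_song(SONG)[:5]:
        print(event)
```

A truncated or malformed item raises ValueError, which is one reasonable policy; the project statement does not say how bad input should be handled.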
10.14 Summary This chapter discussed the implementation of different control structures in an assembly language programs including conditional statements (if..then..else and case statements), state machines, and iterations (loops, including while, repeat..until (do/while), loop..endloop, and for). While assembly language gives you the flexibility to create totally custom control structures, doing so often produces programs that are difficult to read and understand. Unless the situation absolutely requires something different, you should attempt to model your assembly language control structures after those in high level languages as much as possible. The most common control structure found in high level language programs is the IF..THEN..ELSE statement. You can easily synthesize(if..then and (if..then..else statements in assembly language using the cmp instruction, the conditional jumps, and the jmp instruction. To see how to convert HLL if..then..else statements into assembly language, check out •
“IF..THEN..ELSE Sequences” on page 522
A second popular HLL conditional statement is the case (switch) statement. The case statement provides an efficient way to transfer control to one of many different statements depending on the value of some expression. While there are many ways to implement the case statement in assembly language, the most common way is to use a jump table. For case statements with contiguous values, this is probably the best implementation. For case statements that have widely spaced, non-contiguous values, an if..then..else implementation or some other technique is probably best. For details, see •
“CASE Statements” on page 525

State machines provide a useful paradigm for certain programming situations. A section of code which implements a state machine maintains a history of prior execution within a state variable. Subsequent execution of the code picks up in a possibly different “state” depending on prior execution. Indirect jumps provide an efficient mechanism for implementing state machines in assembly language. This chapter provided a brief introduction to state machines. To see how to implement a state machine with an indirect jump, see

• “State Machines and Indirect Jumps” on page 529

Assembly language provides some very powerful primitives for constructing a wide variety of control structures. Although this chapter concentrates on simulating HLL constructs, you can build any convoluted control structure you care to from the 80x86’s cmp instruction and conditional branches. Unfortunately, the result may be very difficult to understand, especially by someone other than the original author. Although assembly language gives you the freedom to do anything you want, a mature programmer exercises restraint and chooses only those control flows which are easy to read and understand, never settling for convoluted code unless absolutely necessary. For a further description and additional guidelines, see

• “Spaghetti Code” on page 531

Iteration is one of the three basic components to programming language built around Von Neumann machines¹⁵. Loop control structures provide the basic iteration mechanism in most HLLs. Assembly language does not provide any looping primitives. Even the 80x86 loop instruction isn’t really a loop, it’s just a decrement, compare, and branch instruction. Nonetheless, it is very easy to synthesize common loop control structures in assembly language. The following sections describe how to construct HLL loop control structures in assembly language:

• “Loops” on page 531
• “While Loops” on page 532
• “Repeat..Until Loops” on page 532
• “LOOP..ENDLOOP Loops” on page 533
• “FOR Loops” on page 533

15. The other two being conditional execution and the sequence.

Program loops often consume most of the CPU time in a typical program. Therefore, if you want to improve the performance of your programs, the loops are the first place you want to look. This chapter provides several suggestions to help improve the performance of certain types of loops in assembly language programs. While they do not provide a complete guide to optimization, the following sections provide common techniques used by compilers and experienced assembly language programmers:

• “Register Usage and Loops” on page 534
• “Performance Improvements” on page 535
• “Moving the Termination Condition to the End of a Loop” on page 535
• “Executing the Loop Backwards” on page 537
• “Loop Invariant Computations” on page 538
• “Unraveling Loops” on page 539
• “Induction Variables” on page 540
• “Other Performance Improvements” on page 541

10.15 Questions

1) Convert the following Pascal statements to assembly language: (assume all variables are two byte signed integers)

   a) IF (X=Y) then A := B;
   b) IF (X <= Y) then X := X + 1 ELSE Y := Y - 1;
   c) IF NOT ((X=Y) and (Z <> T)) then Z := T else X := T;
   d) IF (X=0) and ((Y-2) > 1) then Y := Y - 1;

2) Convert the following CASE statement to assembly language:

      CASE I OF
         0: I := 5;
         1: J := J+1;
         2: K := I+J;
         3: K := I-J;
         Otherwise I := 0;
      END;

3) Which implementation method for the CASE statement (jump table or IF form) produces the least amount of code (including the jump table, if used) for the following CASE statements?

   a) CASE I OF
         0:stmt;
         100:stmt;
         1000:stmt;
      END;

   b) CASE I OF
         0:stmt;
         1:stmt;
         2:stmt;
         3:stmt;
         4:stmt;
      END;

4) For question three, which form produces the fastest code?

5) Implement the CASE statements in problem three using 80x86 assembly language.

6) What three components compose a loop?

7) What is the major difference between the WHILE, REPEAT..UNTIL, and LOOP..ENDLOOP loops?

8) What is a loop control variable?

9) Convert the following WHILE loops to assembly language: (Note: don’t optimize these loops, stick exactly to the WHILE loop format)

   a) I := 0;
      WHILE (I < 100) DO
         I := I + 1;

   b) CH := ' ';
      WHILE (CH <> '.') DO BEGIN
         CH := GETC;
         PUTC(CH);
      END;

10) Convert the following REPEAT..UNTIL loops into assembly language: (Stick exactly to the REPEAT..UNTIL loop format)

   a) I := 0;
      REPEAT
         I := I + 1;
      UNTIL I >= 100;

   b) REPEAT
         CH := GETC;
         PUTC(CH);
      UNTIL CH = '.';

11) Convert the following LOOP..ENDLOOP loops into assembly language: (Stick exactly to the LOOP..ENDLOOP format)

   a) I := 0;
      LOOP
         I := I + 1;
         IF I >= 100 THEN BREAK;
      ENDLOOP;

   b) LOOP
         CH := GETC;
         IF CH = '.' THEN BREAK;
         PUTC(CH);
      ENDLOOP;

12) What are the differences, if any, between the loops in problems 9, 10, and 11? Do they perform the same operations? Which versions are most efficient?

13) Rewrite the two loops presented in the previous examples, in assembly language, as efficiently as you can.

14) By simply adding a JMP instruction, convert the two loops in problem 9 into REPEAT..UNTIL loops.

15) By simply adding a JMP instruction, convert the two loops in problem 10 to WHILE loops.

16) Convert the following FOR loops into assembly language (Note: feel free to use any of the routines provided in the UCR Standard Library package):

   a) FOR I := 0 to 100 do WriteLn(I);

   b) FOR I := 0 to 7 do
         FOR J := 0 to 7 do
            K := K*(I-J);

   c) FOR I := 255 to 16 do
         A [I] := A[240-I]-I;

17) The DOWNTO reserved word, when used in conjunction with the Pascal FOR loop, runs a loop counter from a higher number down to a lower number. A FOR loop with the DOWNTO reserved word is equivalent to the following WHILE loop:

      loopvar := initial;
      while (loopvar >= final) do begin
         stmt;
         loopvar := loopvar-1;
      end;

   Implement the following Pascal FOR loops in assembly language:

   a) FOR I := start downto stop do WriteLn(I);

   b) FOR I := 7 downto 0 do
         FOR J := 0 to 7 do
            K := K*(I-J);

   c) FOR I := 255 downto 16 do
         A [I] := A[240-I]-I;

18) Rewrite the loop in problem 16b maintaining I in BX, J in CX, and K in AX.

19) How does moving the loop termination test to the end of the loop improve the performance of that loop?

20) What is a loop invariant computation?

21) How does executing a loop backwards improve the performance of the loop?

22) What does unraveling a loop mean?

23) How does unraveling a loop improve the loop’s performance?

24) Give an example of a loop that cannot be unraveled.

25) Give an example of a loop that can be but shouldn’t be unraveled.
Procedures and Functions
Chapter 11
Modular design is one of the cornerstones of structured programming. A modular program contains blocks of code with single entry and exit points. You can reuse well written sections of code in other programs or in other sections of an existing program. If you reuse an existing segment of code, you needn’t design, code, or debug that section of code since (presumably) you’ve already done so. Given the rising costs of software development, modular design will become more important as time passes.

The basic unit of a modular program is the module. Modules have different meanings to different people; herein you can assume that the terms module, subprogram, subroutine, program unit, procedure, and function are all synonymous.

The procedure is the basis for a programming style. The procedural languages include Pascal, BASIC, C++, FORTRAN, PL/I, and ALGOL. Examples of non-procedural languages include APL, LISP, SNOBOL4, ICON, FORTH, SETL, PROLOG, and others that are based on other programming constructs such as functional abstraction or pattern matching. Assembly language is capable of acting as a procedural or non-procedural language. Since you’re probably much more familiar with the procedural programming paradigm, this text will stick to simulating procedural constructs in 80x86 assembly language.

11.0 Chapter Overview

This chapter presents an introduction to procedures and functions in assembly language. It discusses basic principles, parameter passing, function results, local variables, and recursion. You will use most of the techniques this chapter discusses in typical assembly language programs. The discussion of procedures and functions continues in the next chapter; that chapter discusses advanced techniques that you will not commonly use in assembly language programs. The sections below that have a “•” prefix are essential. Those sections with a “❏” discuss advanced topics that you may want to put off for a while.

• Procedures.
❏ Near and far procedures.
• Functions.
• Saving the state of the machine.
• Parameters.
• Pass by value parameters.
• Pass by reference parameters.
❏ Pass by value-returned parameters.
❏ Pass by result parameters.
❏ Pass by name parameters.
• Passing parameters in registers.
• Passing parameters in global variables.
• Passing parameters on the stack.
• Passing parameters in the code stream.
❏ Passing parameters via a parameter block.
• Function results.
• Returning function results in a register.
• Returning function results on the stack.
• Returning function results in memory locations.
• Side effects.
❏ Local variable storage.
❏ Recursion.

11.1 Procedures

In a procedural environment, the basic unit of code is the procedure. A procedure is a set of instructions that compute some value or take some action (such as printing or reading a character value). The definition of a procedure is very similar to the definition of an algorithm. A procedure is a set of rules to follow which, if they conclude, produce some result. An algorithm is also such a sequence, but an algorithm is guaranteed to terminate whereas a procedure offers no such guarantee.

Most procedural programming languages implement procedures using the call/return mechanism. That is, some code calls a procedure, the procedure does its thing, and then the procedure returns to the caller. The call and return instructions provide the 80x86’s procedure invocation mechanism. The calling code calls a procedure with the call instruction, the procedure returns to the caller with the ret instruction. For example, the following 80x86 instruction calls the UCR Standard Library sl_putcr routine¹:

                call    sl_putcr

sl_putcr prints a carriage return/line feed sequence to the video display and returns control to the instruction immediately following the call sl_putcr instruction.
Alas, the UCR Standard Library does not supply all the routines you will need. Most of the time you’ll have to write your own procedures. A simple procedure may consist of nothing more than a sequence of instructions ending with a ret instruction. For example, the following “procedure” zeros out the 256 bytes starting at the address in the bx register:

ZeroBytes:      xor     ax, ax
                mov     cx, 128
ZeroLoop:       mov     [bx], ax
                add     bx, 2
                loop    ZeroLoop
                ret

By loading the bx register with the address of some block of 256 bytes and issuing a call ZeroBytes instruction, you can zero out the specified block.

As a general rule, you won’t define your own procedures in this manner. Instead, you should use MASM’s proc and endp assembler directives. The ZeroBytes routine, using the proc and endp directives, is

ZeroBytes       proc
                xor     ax, ax
                mov     cx, 128
ZeroLoop:       mov     [bx], ax
                add     bx, 2
                loop    ZeroLoop
                ret
ZeroBytes       endp

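A call sequence of the kind just described might look like the following sketch. Buffer is a hypothetical 256-byte block that is not part of the original example, and the fragment assumes that ds already points at the segment containing it.

```asm
; In the data segment (hypothetical 256-byte block):
Buffer          byte    256 dup (?)

; In the code segment: point bx at the block, then call the routine.
; Note that ZeroBytes changes ax, bx, and cx.
                mov     bx, offset Buffer
                call    ZeroBytes
```

Because ZeroBytes does not preserve any registers, the caller should not expect ax, bx, or cx to hold their old values after the call.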
Keep in mind that proc and endp are assembler directives. They do not generate any code. They’re simply a mechanism to help make your programs easier to read. To the 80x86, the last two examples are identical; however, to a human being, the latter is clearly a self-contained procedure, while the other could simply be an arbitrary set of instructions within some other procedure. Consider now the following code:

ZeroBytes:      xor     ax, ax
                jcxz    DoFFs
ZeroLoop:       mov     [bx], ax
                add     bx, 2
                loop    ZeroLoop
                ret

DoFFs:          mov     cx, 128
                mov     ax, 0ffffh
FFLoop:         mov     [bx], ax
                sub     bx, 2
                loop    FFLoop
                ret

1. Normally you would use the putcr macro to accomplish this, but this call instruction will accomplish the same thing.

Are there two procedures here or just one? In other words, can a calling program enter this code at labels ZeroBytes and DoFFs or just at ZeroBytes? The use of the proc and endp directives can help remove this ambiguity:

Treated as a single subroutine:

ZeroBytes       proc
                xor     ax, ax
                jcxz    DoFFs
ZeroLoop:       mov     [bx], ax
                add     bx, 2
                loop    ZeroLoop
                ret

DoFFs:          mov     cx, 128
                mov     ax, 0ffffh
FFLoop:         mov     [bx], ax
                sub     bx, 2
                loop    FFLoop
                ret
ZeroBytes       endp

Treated as two separate routines:

ZeroBytes       proc
                xor     ax, ax
                jcxz    DoFFs
ZeroLoop:       mov     [bx], ax
                add     bx, 2
                loop    ZeroLoop
                ret
ZeroBytes       endp

DoFFs           proc
                mov     cx, 128
                mov     ax, 0ffffh
FFLoop:         mov     [bx], ax
                sub     bx, 2
                loop    FFLoop
                ret
DoFFs           endp

Always keep in mind that the proc and endp directives are logical procedure separators. The 80x86 microprocessor returns from a procedure by executing a ret instruction, not by encountering an endp directive. The following is not equivalent to the code above:

ZeroBytes       proc
                xor     ax, ax
                jcxz    DoFFs
ZeroLoop:       mov     [bx], ax
                add     bx, 2
                loop    ZeroLoop
                                        ; Note missing RET instr.
ZeroBytes       endp

DoFFs           proc
                mov     cx, 128
                mov     ax, 0ffffh
FFLoop:         mov     [bx], ax
                sub     bx, 2
                loop    FFLoop
                                        ; Note missing RET instr.
DoFFs           endp

Without the ret instruction at the end of each procedure, the 80x86 will fall into the next subroutine rather than return to the caller. After executing ZeroBytes above, the 80x86 will drop through to the DoFFs subroutine (beginning with the mov cx, 128 instruction). Once DoFFs is through, the 80x86 will continue execution with the next executable instruction following DoFFs’ endp directive.

An 80x86 procedure takes the form:

ProcName        proc    {near|far}      ;Choose near, far, or neither.
ProcName        endp

The near or far operand is optional; the next section will discuss its purpose. The procedure name must be on both the proc and endp lines. The procedure name must be unique in the program. Every proc directive must have a matching endp directive. Failure to match the proc and endp directives will produce a block nesting error.

11.2 Near and Far Procedures

The 80x86 supports near and far subroutines. Near calls and returns transfer control between procedures in the same code segment. Far calls and returns pass control between different segments. The two calling and return mechanisms push and pop different return addresses. You generally do not use a near call instruction to call a far procedure or a far call instruction to call a near procedure. Given this little rule, the next question is “how do you control the emission of a near or far call or ret?” Most of the time, the call instruction uses the following syntax:

                call    ProcName

and the ret instruction is either²:

                ret
or
                ret     disp

Unfortunately, these instructions do not tell MASM if you are calling a near or far procedure or if you are returning from a near or far procedure. The proc directive handles that chore. The proc directive has an optional operand that is either near or far. Near is the default if the operand field is empty³. The assembler assigns the procedure type (near or far) to the symbol. Whenever MASM assembles a call instruction, it emits a near or far call depending on the type of the operand. Therefore, declaring a symbol with proc or proc near forces a near call. Likewise, using proc far forces a far call.

Besides controlling the generation of a near or far call, proc’s operand also controls ret code generation. If a procedure has the near operand, then all return instructions inside that procedure will be near. MASM emits far returns inside far procedures.

11.2.1 Forcing NEAR or FAR CALLs and Returns

Once in a while you might want to override the near/far declaration mechanism. MASM provides a mechanism that allows you to force the use of near/far calls and returns. Use the near ptr and far ptr operators to override the automatic assignment of a near or far call. If NearLbl is a near label and FarLbl is a far label, then the following call instructions generate a near and far call, respectively:

                call    NearLbl         ;Generates a NEAR call.
                call    FarLbl          ;Generates a FAR call.

Suppose you need to make a far call to NearLbl or a near call to FarLbl. You can accomplish this using the following instructions:

                call    far ptr NearLbl ;Generates a FAR call.
                call    near ptr FarLbl ;Generates a NEAR call.

2. There are also retn and retf instructions.
3. Unless you are using MASM’s simplified segment directives. See the appendices for details.

Calling a near procedure using a far call, or calling a far procedure using a near call isn’t something you’ll normally do. If you call a near procedure using a far call instruction, the near return will leave the cs value on the stack. Generally, rather than:

                call    far ptr NearProc

you should probably use the clearer code:

                push    cs
                call    NearProc

Calling a far procedure with a near call is a very dangerous operation. If you attempt such a call, the current cs value must be on the stack. Remember, a far ret pops a segmented return address off the stack. A near call instruction only pushes the offset, not the segment portion of the return address.

If ret appears within a procedure declared via proc and endp, MASM will automatically generate the appropriate near or far return instruction. Starting with MASM v5.0, there are also explicit instructions you can use to force a near or far ret: the retn and retf instructions. These two instructions generate a near and far ret, respectively.
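The following sketch shows one way the push cs / near call pairing and the retf instruction fit together. SameSegFar is a hypothetical routine invented for this illustration; it is assumed to live in the same code segment as its caller.

```asm
; A hypothetical routine declared near (so a plain call is a near
; call) that explicitly returns with a far return.
SameSegFar      proc    near
                mov     ax, 0
                retf                    ;Pops the offset, then the segment.
SameSegFar      endp

; Somewhere in the same code segment:
                push    cs              ;Segment half of the return address.
                call    SameSegFar      ;Near call pushes only the offset.
```

The pushed cs value and the offset pushed by the near call together form the segmented return address that retf expects, so the stack is balanced when control returns.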

11.2.2 Nested Procedures

MASM allows you to nest procedures. That is, one procedure definition may be totally enclosed inside another. The following is an example of such a pair of procedures:

OutsideProc     proc    near
                jmp     EndofOutside

InsideProc      proc    near
                mov     ax, 0
                ret
InsideProc      endp

EndofOutside:   call    InsideProc
                mov     bx, 0
                ret
OutsideProc     endp

Unlike some high level languages, nesting procedures in 80x86 assembly language doesn’t serve any useful purpose. If you nest a procedure (as with InsideProc above), you’ll have to code an explicit jmp around the nested procedure. Placing the nested procedure after all the code in the outside procedure (but still between the outside proc/endp directives) doesn’t accomplish anything. Therefore, there isn’t a good reason to nest procedures in this manner.

Whenever you nest one procedure within another, it must be totally contained within the nesting procedure. That is, the proc and endp statements for the nested procedure must lie between the proc and endp directives of the outside, nesting, procedure. The following is not legal:

OutsideProc     proc    near
                 .
                 .
                 .
InsideProc      proc    near
                 .
                 .
                 .
OutsideProc     endp
                 .
                 .
                 .
InsideProc      endp

The OutsideProc and InsideProc procedures overlap; they are not nested. If you attempt to create a set of procedures like this, MASM will report a “block nesting error”. Figure 11.1 demonstrates this graphically.

[Figure 11.1 Illegal Procedure Nesting: OutsideProc Procedure, InsideProc Procedure]

[Figure 11.2 Legal Procedure Nesting: OutsideProc Procedure, InsideProc Procedure]

[Figure 11.3 Legal Procedure/Segment Nesting: Segment declared with SEGMENT/ENDS, Procedure declared with PROC/ENDP]

The only form acceptable to MASM appears in Figure 11.2. Besides fitting inside an enclosing procedure, proc/endp groups must fit entirely within a segment. Therefore the following code is illegal:

cseg            segment
MyProc          proc    near
                ret
cseg            ends
MyProc          endp

The endp directive must appear before the cseg ends statement since MyProc begins inside cseg. Therefore, procedures within segments must always take the form shown in Figure 11.3.

Not only can you nest procedures inside other procedures and segments, but you can nest segments inside other procedures and segments as well. If you’re the type who likes to simulate Pascal or C procedures in assembly language, you can create variable declaration sections at the beginning of each procedure you create, just like Pascal:

cgroup          group   cseg1, cseg2

cseg1           segment para public 'code'
cseg1           ends

cseg2           segment para public 'code'
cseg2           ends

dseg            segment para public 'data'
dseg            ends

cseg1           segment para public 'code'
                assume  cs:cgroup, ds:dseg

MainPgm         proc    near

; Data declarations for main program:

dseg            segment para public 'data'
I               word    ?
J               word    ?
dseg            ends

; Procedures that are local to the main program:

cseg2           segment para public 'code'

ZeroWords       proc    near

; Variables local to ZeroWords:

dseg            segment para public 'data'
AXSave          word    ?
BXSave          word    ?
CXSave          word    ?
dseg            ends

; Code for the ZeroWords procedure:

                mov     AXSave, ax
                mov     CXSave, cx
                mov     BXSave, bx
                xor     ax, ax
ZeroLoop:       mov     [bx], ax
                inc     bx
                inc     bx
                loop    ZeroLoop
                mov     ax, AXSave
                mov     bx, BXSave
                mov     cx, CXSave
                ret
ZeroWords       endp

cseg2           ends

; The actual main program begins here:

                mov     bx, offset Array
                mov     cx, 128
                call    ZeroWords
                ret
MainPgm         endp
cseg1           ends
                end

[Figure 11.4 Example Memory Layout: Main Program, ZeroWords, Main Program Vars, ZeroWords Vars]

The system will load this code into memory as shown in Figure 11.4. ZeroWords follows the main program because it belongs to a different segment (cseg2) than MainPgm (cseg1). Remember, the assembler and linker combine segments with the same class name into a single segment before loading them into memory (see “Segment Loading Order” on page 368 for more details). You can use this feature of the assembler to “pseudo-Pascalize” your code in the fashion shown above. However, you’ll probably not find your programs to be any more readable than using the straightforward non-nesting approach.

11.3 Functions

The difference between functions and procedures in assembly language is mainly a matter of definition. The purpose for a function is to return some explicit value while the purpose for a procedure is to execute some action. To declare a function in assembly language, use the proc/endp directives. All the rules and techniques that apply to procedures apply to functions. This text will take another look at functions later in this chapter in the section on function results. From here on, procedure will mean procedure or function.

11.4 Saving the State of the Machine

Take a look at this code:

                mov     cx, 10
Loop0:          call    PrintSpaces
                putcr
                loop    Loop0
                 .
                 .
                 .
PrintSpaces     proc    near
                mov     al, ' '
                mov     cx, 40
PSLoop:         putc
                loop    PSLoop
                ret
PrintSpaces     endp

This section of code attempts to print ten lines of 40 spaces each. Unfortunately, there is a subtle bug that causes it to print 40 spaces per line in an infinite loop. The main program uses the loop instruction to call PrintSpaces 10 times. PrintSpaces uses cx to count off the 40 spaces it prints. PrintSpaces returns with cx containing zero. The main program then prints a carriage return/line feed, decrements cx, and then repeats because cx isn’t zero (it will always contain 0FFFFh at this point).

The problem here is that the PrintSpaces subroutine doesn’t preserve the cx register. Preserving a register means you save it upon entry into the subroutine and restore it before leaving. Had the PrintSpaces subroutine preserved the contents of the cx register, the program above would have functioned properly.

Use the 80x86’s push and pop instructions to preserve register values while you need to use them for something else. Consider the following code for PrintSpaces:

PrintSpaces     proc    near
                push    ax
                push    cx
                mov     al, ' '
                mov     cx, 40
PSLoop:         putc
                loop    PSLoop
                pop     cx
                pop     ax
                ret
PrintSpaces     endp

Note that PrintSpaces saves and restores ax and cx (since this procedure modifies these registers). Also, note that this code pops the registers off the stack in the reverse order that it pushed them. The operation of the stack imposes this ordering.

Either the caller (the code containing the call instruction) or the callee (the subroutine) can take responsibility for preserving the registers. In the example above, the callee preserved the registers. The following example shows what this code might look like if the caller preserves the registers:

                mov     cx, 10
Loop0:          push    ax
                push    cx
                call    PrintSpaces
                pop     cx
                pop     ax
                putcr
                loop    Loop0
                 .
                 .
                 .
PrintSpaces     proc    near
                mov     al, ' '
                mov     cx, 40
PSLoop:         putc
                loop    PSLoop
                ret
PrintSpaces     endp

There are two advantages to callee preservation: space and maintainability. If the callee preserves all affected registers, then there is only one copy of the push and pop instructions, those the procedure contains. If the caller saves the values in the registers, the program needs a set of push and pop instructions around every call. Not only does this make your programs longer, it also makes them harder to maintain. Remembering which registers to push and pop on each procedure call is not something easily done.

On the other hand, a subroutine may unnecessarily preserve some registers if it preserves all the registers it modifies. In the examples above, the code needn’t save ax. Although PrintSpaces changes al, this won’t affect the program’s operation. If the caller is preserving the registers, it doesn’t have to save registers it doesn’t care about:

                mov     cx, 10
Loop0:          push    cx
                call    PrintSpaces
                pop     cx
                putcr
                loop    Loop0

                putcr
                putcr
                call    PrintSpaces

                mov     al, '*'
                mov     cx, 100
Loop1:          putc
                push    ax
                push    cx
                call    PrintSpaces
                pop     cx
                pop     ax
                putc
                putcr
                loop    Loop1
                 .
                 .
                 .
PrintSpaces     proc    near
                mov     al, ' '
                mov     cx, 40
PSLoop:         putc
                loop    PSLoop
                ret
PrintSpaces     endp

This example provides three different cases. The first loop (Loop0) only preserves the cx register. Modifying the al register won’t affect the operation of this program. Immediately after the first loop, this code calls PrintSpaces again. However, this code doesn’t save ax or cx because it doesn’t care if PrintSpaces changes them. Since the final loop (Loop1) uses ax and cx, it saves them both.

One big problem with having the caller preserve registers is that your program may change. You may modify the calling code or the procedure so that they use additional registers. Such changes, of course, may change the set of registers that you must preserve. Worse still, if the modification is in the subroutine itself, you will need to locate every call to the routine and verify that the subroutine does not change any registers the calling code uses. Preserving registers isn’t all there is to preserving the environment. You can also push and pop variables and other values that a subroutine might change. Since the 80x86 allows you to push and pop memory locations, you can easily preserve these values as well.
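Preserving a memory variable works the same way as preserving a register. The sketch below is only an illustration: Counter and UsesCounter are hypothetical names that do not appear elsewhere in this chapter, and the code assumes ds points at the segment holding Counter.

```asm
; In the data segment (hypothetical word variable):
Counter         word    ?

; In the code segment:
UsesCounter     proc    near
                push    Counter         ;Save the variable's current value.
                mov     Counter, 0      ;Use it as scratch storage.
                inc     Counter
                pop     Counter         ;Restore the original value.
                ret
UsesCounter     endp
```

As with registers, the pops must occur in the reverse order of the pushes if the routine saves more than one value.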

11.5 Parameters

Although there is a large class of procedures that are totally self-contained, most procedures require some input data and return some data to the caller. Parameters are values that you pass to and from a procedure. There are many facets to parameters. Questions concerning parameters include:

• where is the data coming from?
• how do you pass and return data?
• what is the amount of data to pass?

There are six major mechanisms for passing data to and from a procedure; they are

• pass by value,
• pass by reference,
• pass by value/returned,
• pass by result,
• pass by name, and
• pass by lazy evaluation.

You also have to worry about where you can pass parameters. Common places are

• in registers,
• in global memory locations,
• on the stack,
• in the code stream, or
• in a parameter block referenced via a pointer.

Finally, the amount of data has a direct bearing on where and how to pass it. The following sections take up these issues.

11.5.1 Pass by Value

A parameter passed by value is just that – the caller passes a value to the procedure. Pass by value parameters are input only parameters. That is, you can pass them to a procedure but the procedure cannot return them. In HLLs, like Pascal, the idea of a pass by value parameter being an input only parameter makes a lot of sense. Given the Pascal procedure call:

        CallProc(I);

If you pass I by value, CallProc does not change the value of I, regardless of what happens to the parameter inside CallProc.

Since you must pass a copy of the data to the procedure, you should only use this method for passing small objects like bytes, words, and double words. Passing arrays and strings by value is very inefficient (since you must create and pass a copy of the structure to the procedure).
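In assembly language terms, passing I by value might look like the following sketch. Passing the copy in ax is only an assumed convention for this illustration, and the body of CallProc is invented.

```asm
; In the data segment (hypothetical variable):
I               word    ?

; In the code segment:
CallProc        proc    near
                add     ax, 2           ;Modifies only the copy in ax.
                ret
CallProc        endp

; Somewhere in the caller:
                mov     ax, I           ;Pass a copy of I's value.
                call    CallProc        ;I itself is unchanged on return.
```

Whatever CallProc does to ax, the memory location I keeps its original value because the procedure never receives I's address.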

11.5.2 Pass by Reference

To pass a parameter by reference, you must pass the address of a variable rather than its value. In other words, you must pass a pointer to the data. The procedure must dereference this pointer to access the data. Passing parameters by reference is useful when you must modify the actual parameter or when you pass large data structures between procedures.

Passing parameters by reference can produce some peculiar results. The following Pascal procedure provides an example of one problem you might encounter:

program main(input,output);
var m:integer;

    procedure bletch(var i,j:integer);
    begin

        i := i+2;
        j := j-i;
        writeln(i,' ',j);

    end;
     .
     .
     .
begin {main}

    m := 5;
    bletch(m,m);

end.

This particular code sequence will print “0 0” regardless of m’s value. This is because the parameters i and j are pointers to the actual data and they both point at the same object. Therefore, the statement j:=j-i; always produces zero since i and j refer to the same variable.

Pass by reference is usually less efficient than pass by value. You must dereference all pass by reference parameters on each access; this is slower than simply using a value. However, when passing a large data structure, pass by reference is faster because you do not have to copy a large data structure before calling the procedure.
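The aliasing in bletch(m,m) is easy to see in an assembly language sketch. The register convention here (address of i in bx, address of j in si) is an assumption made for this illustration, and the writeln call is omitted.

```asm
; In the data segment:
m               word    ?

; In the code segment. i is at [bx], j is at [si].
Bletch          proc    near
                push    ax
                mov     ax, [bx]        ;i := i + 2
                add     ax, 2
                mov     [bx], ax
                mov     ax, [si]        ;j := j - i
                sub     ax, [bx]
                mov     [si], ax
                pop     ax
                ret
Bletch          endp

; Somewhere in the caller: bletch(m,m) passes the same address twice.
                mov     m, 5
                mov     bx, offset m    ;Address of i.
                mov     si, offset m    ;Address of j.
                call    Bletch
```

Since bx and si hold the same address, the store to [bx] also changes the value later loaded from [si], so the subtraction always leaves zero in m.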

11.5.3 Pass by Value-Returned

Pass by value-returned (also known as value-result) combines features from both the pass by value and pass by reference mechanisms. You pass a value-returned parameter by address, just like pass by reference parameters. However, upon entry, the procedure makes a temporary copy of this parameter and uses the copy while the procedure is executing. When the procedure finishes, it copies the temporary copy back to the original parameter.

The Pascal code presented in the previous section would operate properly with pass by value-returned parameters. Of course, when Bletch returns to the calling code, m could only contain one of the two values, but while Bletch is executing, i and j would contain distinct values.

In some instances, pass by value-returned is more efficient than pass by reference, in others it is less efficient. If a procedure only references the parameter a couple of times, copying the parameter’s data is expensive. On the other hand, if the procedure uses this parameter often, the procedure amortizes the fixed cost of copying the data over many inexpensive accesses to the local copy.
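A value-returned version of Bletch could be sketched as follows. As before, the addresses of i and j are assumed to arrive in bx and si; keeping the local copies in ax and cx, and copying i back before j, are choices made only for this illustration.

```asm
; i is at [bx], j is at [si]; the procedure works on local copies.
Bletch          proc    near
                push    ax
                push    cx
                mov     ax, [bx]        ;Local copy of i.
                mov     cx, [si]        ;Local copy of j.
                add     ax, 2           ;i := i + 2
                sub     cx, ax          ;j := j - i
                mov     [bx], ax        ;Copy i back to the caller.
                mov     [si], cx        ;Copy j back to the caller.
                pop     cx
                pop     ax
                ret
Bletch          endp
```

With both pointers aimed at m, the local copies stay distinct while the procedure runs, and m ends up holding whichever copy is written back last (here, j).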

11.5.4 Pass by Result

Pass by result is almost identical to pass by value-returned. You pass in a pointer to the desired object and the procedure uses a local copy of the variable and then stores the result through the pointer when returning. The only difference between pass by value-returned and pass by result is that when passing parameters by result you do not copy the data upon entering the procedure. Pass by result parameters are for returning values, not passing data to the procedure. Therefore, pass by result is slightly more efficient than pass by value-returned since you save the cost of copying the data into the local variable.
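A pass by result parameter might be sketched like this. GetTen is a hypothetical routine invented for the illustration, and passing the address of the result in bx is an assumed convention.

```asm
; bx holds the address of the caller's variable. The routine never
; reads that variable; it only stores a result through the pointer.
GetTen          proc    near
                push    ax
                mov     ax, 10          ;Compute the result in a local copy.
                mov     [bx], ax        ;Store it through the pointer on exit.
                pop     ax
                ret
GetTen          endp
```

Compared with the value-returned scheme, the only thing missing is the initial load from [bx] on entry.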

11.5.5 Pass by Name

Pass by name is the parameter passing mechanism used by macros, text equates, and the #define macro facility in the C programming language. This parameter passing mechanism uses textual substitution on the parameters. Consider the following MASM macro:

PassByName      macro   Parameter1, Parameter2
                mov     ax, Parameter1
                add     ax, Parameter2
                endm

If you have a macro invocation of the form:

                PassByName bx, I

MASM emits the following code, substituting bx for Parameter1 and I for Parameter2:

                mov     ax, bx
                add     ax, I

Some high level languages, such as ALGOL-68 and Panacea, support pass by name parameters. However, implementing pass by name using textual substitution in a compiled language (like ALGOL-68) is very difficult and inefficient. Basically, you would have to recompile a function everytime you call it. So compiled languages that support pass by name parameters generally use a different technique to pass those parameters. Consider the following Panacea procedure: PassByName: procedure(name item:integer; var index:integer); begin PassByName; foreach index in 0..10 do item := 0; endfor; end PassByName;
Assume you call this routine with the statement PassByName(A[i], i); where A is an array of integers having (at least) the elements A[0]..A[10]. Were you to substitute the pass by name parameter item you would obtain the following code:

begin PassByName;

        foreach index in 0..10 do

                A[I] := 0;      (* Note that index and I are aliases *)

        endfor;

end PassByName;
This code zeros out elements 0..10 of array A. High level languages like ALGOL-68 and Panacea compile pass by name parameters into functions that return the address of a given parameter. So in one respect, pass by name parameters are similar to pass by reference parameters insofar as you pass the address of an object. The major difference is that with pass by reference you compute the address of an object before calling a subroutine; with pass by name the subroutine itself calls some function to compute the address of the parameter.
So what difference does this make? Well, reconsider the code above. Had you passed A[I] by reference rather than by name, the calling code would compute the address of A[I] just before the call and pass in this address. Inside the PassByName procedure the variable item would have always referred to a single address, not an address that changes along with I. With pass by name parameters, item is really a function that computes the address of the parameter into which the procedure stores the value zero. Such a function might look like the following:

ItemThunk       proc    near
                mov     bx, I
                shl     bx, 1
                lea     bx, A[bx]
                ret
ItemThunk       endp

The compiled code inside the PassByName procedure might look something like the following:

; item := 0;

                call    ItemThunk
                mov     word ptr [bx], 0
Thunk is the historical term for these functions that compute the address of a pass by name parameter. It is worth noting that most HLLs supporting pass by name parameters do not call thunks directly (like the call above). Generally, the caller passes the address of a thunk and the subroutine calls the thunk indirectly. This allows the same sequence of instructions to call several different thunks (corresponding to different calls to the subroutine).
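The difference between recomputing the address on every access and computing it once can be modeled in Python. In this sketch (an illustration under assumed names, not compiler output) a thunk is a function returning an (array, position) pair, which plays the role of the address returned in bx, and a one-element list plays the role of the var parameter index.

```python
def pass_by_name(item_thunk, index):
    """item_thunk() returns (array, position): the 'address' of item."""
    for n in range(0, 11):
        index[0] = n                        # index is passed by reference
        array, position = item_thunk()      # recompute the address each time
        array[position] = 0                 # item := 0


def pass_by_reference(array, position, index):
    """The address of item was fixed by the caller before the call."""
    for n in range(0, 11):
        index[0] = n
        array[position] = 0


A = [7] * 11
i = [3]
pass_by_name(lambda: (A, i[0]), i)          # thunk for A[i]

B = [7] * 11
k = [3]
pass_by_reference(B, k[0], k)               # address of B[3], computed once

print(A)
print(B)
```

With the thunk, every element A[0]..A[10] is cleared because the address follows the index; with the fixed address only B[3] is cleared.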
11.5.6
Pass by Lazy-Evaluation
Pass by name is similar to pass by reference insofar as the procedure accesses the parameter using the address of the parameter. The primary difference between the two is that a caller directly passes the address on the stack when passing by reference; it passes the address of a function that computes the parameter’s address when passing a parameter by name. The pass by lazy evaluation mechanism shares this same relationship with pass by value parameters – the caller passes the address of a function that computes the parameter’s value if the first access to that parameter is a read operation.
Pass by lazy evaluation is a useful parameter passing technique if the cost of computing the parameter value is very high and the procedure may not use the value. Consider the following Panacea procedure header:

PassByEval: procedure(eval a:integer; eval b:integer; eval c:integer);
When you call the PassByEval function it does not evaluate the actual parameters and pass their values to the procedure. Instead, the compiler generates thunks that will compute the value of the parameter at most one time. If the first access to an eval parameter is a read, the thunk will compute the parameter’s value and store that into a local variable. It will also set a flag so that all future accesses will not call the thunk (since it has already computed the parameter’s value). If the first access to an eval parameter is a write, then the code sets the flag and future accesses within the same procedure activation will use the written value and ignore the thunk.
Consider the PassByEval procedure above. Suppose it takes several minutes to compute the values for the a, b, and c parameters (these could be, for example, three different possible paths in a Chess game). Perhaps the PassByEval procedure only uses the value of one of these parameters. Without pass by lazy evaluation, the calling code would have to spend the time to compute all three parameters even though the procedure will only use one of the values. With pass by lazy evaluation, however, the procedure will only spend the time computing the value of the one parameter it needs. Lazy evaluation is a common technique artificial intelligence (AI) and operating systems use to improve performance.
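The thunk-plus-flag scheme described above can be sketched in Python. The LazyParam class, the expensive helper, and the body of pass_by_eval are assumptions made for this illustration; only the at-most-once evaluation rule comes from the text.

```python
class LazyParam:
    """An eval parameter: a thunk, a local copy, and an 'already set' flag."""
    def __init__(self, thunk):
        self._thunk = thunk
        self._ready = False
        self._value = None

    def read(self):
        if not self._ready:            # first access is a read: call the thunk
            self._value = self._thunk()
            self._ready = True
        return self._value             # later reads use the stored copy

    def write(self, value):
        self._value = value            # a write makes the thunk unnecessary
        self._ready = True


calls = []                             # records which thunks actually ran


def expensive(name, result):
    def thunk():
        calls.append(name)
        return result
    return thunk


def pass_by_eval(a, b, c):
    return b.read() + b.read()         # only b is needed, and it is read twice


answer = pass_by_eval(LazyParam(expensive("a", 1)),
                      LazyParam(expensive("b", 2)),
                      LazyParam(expensive("c", 3)))
print(answer, calls)
```

Only the thunk for b runs, and it runs once even though the procedure reads b twice.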
11.5.7
Passing Parameters in Registers
Having touched on how to pass parameters to a procedure, the next thing to discuss is where to pass parameters. Where you pass parameters depends, to a great extent, on the size and number of those parameters. If you are passing a small number of bytes to a procedure, then the registers are an excellent place to pass parameters. The registers are an ideal place to pass value parameters to a procedure. If you are passing a single parameter to a procedure you should use the following registers for the accompanying data types:

        Data Size       Pass in this Register
        Byte:           al
        Word:           ax
        Double Word:    dx:ax or eax (if 80386 or better)

This is, by no means, a hard and fast rule. If you find it more convenient to pass 16 bit values in the si or bx register, by all means do so. However, most programmers use the registers above to pass parameters. If you are passing several parameters to a procedure in the 80x86’s registers, you should probably use up the registers in the following order:

        First                           Last
        ax,   dx,   si,   di,   bx,   cx
In general, you should avoid using the bp register. If you need more than six words, perhaps you should pass your values elsewhere.
The UCR Standard Library package provides several good examples of procedures that pass parameters by value in the registers. Putc, which outputs an ASCII character code to the video display, expects an ASCII value in the al register. Likewise, puti expects the value of a signed integer in the ax register. As another example, consider the following putsi (put short integer) routine that outputs the value in al as a signed integer:

putsi           proc
                push    ax              ;Save AH’s value.
                cbw                     ;Sign extend AL -> AX.
                puti                    ;Let puti do the real work.
                pop     ax              ;Restore AH.
                ret
putsi           endp
The other four parameter passing mechanisms (pass by reference, value-returned, result, and name) generally require that you pass a pointer to the desired object (or to a thunk in the case of pass by name). When passing such parameters in registers, you have to consider whether you’re passing an offset or a full segmented address. Sixteen bit offsets can be passed in any of the 80x86’s general purpose 16 bit registers. si, di, and bx are the best place to pass an offset since you’ll probably need to load it into one of these registers anyway [4]. You can pass 32 bit segmented addresses in dx:ax like other double word parameters. However, you can also pass them in ds:bx, ds:si, ds:di, es:bx, es:si, or es:di and be able to use them without copying into a segment register.
The UCR Stdlib routine puts, which prints a string to the video display, is a good example of a subroutine that uses pass by reference. It wants the address of a string in the es:di register pair. It passes the parameter in this fashion, not because it modifies the parameter, but because strings are rather long and passing them some other way would be inefficient. As another example, consider the following strfill(str,c) that copies the character c (passed by value in al) to each character position in str (passed by reference in es:di) up to a zero terminating byte:

; strfill-      copies value in al to the string pointed at by es:di
;               up to a zero terminating byte.

byp             textequ <byte ptr>

strfill         proc
                pushf                           ;Save direction flag.
                cld                             ;To increment DI with STOS.
                push    di                      ;Save, because it’s changed.
                jmp     sfStart

sfLoop:         stosb                           ;es:[di] := al, di := di + 1;
sfStart:        cmp     byp es:[di], 0          ;Done yet?
                jne     sfLoop

                pop     di                      ;Restore di.
                popf                            ;Restore direction flag.
                ret
strfill         endp

4. This does not apply to thunks. You may pass the address of a thunk in any 16 bit register. Of course, on an 80386 or later processor, you can use any of the 80386’s 32-bit registers to hold an address.
When passing parameters by value-returned or by result to a subroutine, you could pass in the address in a register. Inside the procedure you would copy the value pointed at by this register to a local variable (value-returned only). Just before the procedure returns to the caller, it could store the final result back to the address in the register.
The following code requires two parameters. The first is a pass by value-returned parameter and the subroutine expects the address of the actual parameter in bx. The second is a pass by result parameter whose address is in si. This routine increments the pass by value-returned parameter and stores its previous value in the pass by result parameter:

; CopyAndInc-   BX contains the address of a variable. This routine
;               copies that variable to the location specified in SI
;               and then increments the variable BX points at.
;               Note: AX and CX hold the local copies of these
;               parameters during execution.

CopyAndInc      proc
                push    ax              ;Preserve AX across call.
                push    cx              ;Preserve CX across call.
                mov     ax, [bx]        ;Get local copy of 1st parameter.
                mov     cx, ax          ;Store into 2nd parm’s local var.
                inc     ax              ;Increment 1st parameter.
                mov     [si], cx        ;Store away pass by result parm.
                mov     [bx], ax        ;Store away pass by value/ret parm.
                pop     cx              ;Restore CX’s value.
                pop     ax              ;Restore AX’s value.
                ret
CopyAndInc      endp
To make the call CopyAndInc(I,J) you would use code like the following:

                lea     bx, I
                lea     si, J
                call    CopyAndInc
This is, of course, a trivial example whose implementation is very inefficient. Nevertheless, it shows how to pass value-returned and result parameters in the 80x86’s registers. If you are willing to trade a little space for some speed, there is another way to achieve the same results as pass by value-returned or pass by result when passing parameters in registers. Consider the following implementation of CopyAndInc:

CopyAndInc      proc
                mov     cx, ax          ;Make a copy of the 1st parameter,
                inc     ax              ; then increment it by one.
                ret
CopyAndInc      endp

To make the CopyAndInc(I,J) call, as before, you would use the following 80x86 code:

                mov     ax, I
                call    CopyAndInc
                mov     I, ax
                mov     J, cx
Note that this code does not pass any addresses at all; yet it has the same semantics (that is, performs the same operations) as the previous version. Both versions increment I and store the pre-incremented version into J. Clearly the latter version is faster, although your program will be slightly larger if there are many calls to CopyAndInc in your program (six or more).
You can pass a parameter by name or by lazy evaluation in a register by simply loading that register with the address of the thunk to call. Consider the Panacea PassByName procedure (see “Pass by Name”). One implementation of this procedure could be the following:

;PassByName-    Expects a pass by reference parameter index
;               passed in si and a pass by name parameter, item,
;               passed in dx (the thunk returns the address in bx).

PassByName      proc
                push    ax                      ;Preserve AX across call
                mov     word ptr [si], 0        ;Index := 0;
ForLoop:        cmp     word ptr [si], 10       ;For loop ends at ten.
                jg      ForDone
                call    dx                      ;Call thunk item.
                mov     word ptr [bx], 0        ;Store zero into item.
                inc     word ptr [si]           ;Index := Index + 1;
                jmp     ForLoop

ForDone:        pop     ax                      ;Restore AX.
                ret                             ;All Done!
PassByName      endp
You might call this routine with code that looks like the following:

                lea     si, I
                lea     dx, Thunk_A
                call    PassByName
                 .
                 .
                 .
Thunk_A         proc
                mov     bx, I
                shl     bx, 1
                lea     bx, A[bx]
                ret
Thunk_A         endp
The advantage to this scheme, over the one presented in the earlier section, is that you can call different thunks, not just the ItemThunk routine appearing in the earlier example.
11.5.8
Passing Parameters in Global Variables
Once you run out of registers, the only other (reasonable) alternative you have is main memory. One of the easiest places to pass parameters is in global variables in the data segment. The following code provides an example:

                mov     ax, xxxx                ;Pass this parameter by value
                mov     Value1Proc1, ax
                mov     ax, offset yyyy         ;Pass this parameter by ref
                mov     word ptr Ref1Proc1, ax
                mov     ax, seg yyyy
                mov     word ptr Ref1Proc1+2, ax
                call    ThisProc
                 .
                 .
                 .
ThisProc        proc    near
                push    es
                push    ax
                push    bx
                les     bx, Ref1Proc1           ;Get address of ref parm.
                mov     ax, Value1Proc1         ;Get value parameter
                mov     es:[bx], ax             ;Store into loc pointed at by
                pop     bx                      ; the ref parameter.
                pop     ax
                pop     es
                ret
ThisProc        endp

[Figure 11.5 CallProc Stack Layout for a Near Procedure. From higher to lower addresses: previous stack contents, i’s current value, j’s current value, the sum of k+4, return address (the stack pointer points here).]
Passing parameters in global locations is inelegant and inefficient. Furthermore, if you use global variables in this fashion to pass parameters, the subroutines you write cannot use recursion (see “Recursion” on page 606). Fortunately, there are better parameter passing schemes for passing data in memory so you do not need to seriously consider this scheme.
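The recursion problem is easy to reproduce in a small Python model. The global param_n below is a hypothetical parameter slot in the spirit of Value1Proc1; the two functions are illustrative only.

```python
param_n = 0                      # hypothetical global parameter slot


def sum_to_n_global():
    """Intended to return 1 + 2 + ... + n, with n passed in param_n."""
    global param_n
    if param_n == 0:
        return 0
    param_n = param_n - 1        # "pass" n-1 to the recursive call
    partial = sum_to_n_global()
    return partial + param_n     # param_n was overwritten by the inner calls


def sum_to_n_stack(n):
    """Same algorithm with the parameter held in each activation record."""
    if n == 0:
        return 0
    return sum_to_n_stack(n - 1) + n


param_n = 4
broken = sum_to_n_global()
correct = sum_to_n_stack(4)
print(broken, correct)
```

Every activation of the global version shares one copy of the parameter, so by the time the recursion unwinds the outer calls have lost their own values.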
11.5.9
Passing Parameters on the Stack
Most HLLs use the stack to pass parameters because this method is fairly efficient. To pass parameters on the stack, push them immediately before calling the subroutine. The subroutine then reads this data from the stack memory and operates on it appropriately. Consider the following Pascal procedure call:

                CallProc(i,j,k+4);

Most Pascal compilers push their parameters onto the stack in the order that they appear in the parameter list. Therefore, the 80x86 code typically emitted for this subroutine call (assuming you’re passing the parameters by value) is

                push    i
                push    j
                mov     ax, k
                add     ax, 4
                push    ax
                call    CallProc

Upon entry into CallProc, the 80x86’s stack looks like that shown in Figure 11.5 (for a near procedure) or Figure 11.6 (for a far procedure). You could gain access to the parameters passed on the stack by removing the data from the stack (assuming a near procedure call):

CallProc        proc    near
                pop     RtnAdrs
                pop     kParm
                pop     jParm
                pop     iParm
                push    RtnAdrs
                 .
                 .
                 .
                ret
CallProc        endp

[Figure 11.6 CallProc Stack Layout for a Far Procedure. From higher to lower addresses: previous stack contents, i’s current value, j’s current value, the sum of k+4, return segment, return offset (the stack pointer points here).]

[Figure 11.7 Accessing Parameters on the Stack (near procedure). From higher to lower addresses: previous stack contents, first parameter, second parameter, third parameter, return address, original BP value (BP and SP point here).]
There is, however, a better way. The 80x86’s architecture allows you to use the bp (base pointer) register to access parameters passed on the stack. This is one of the reasons the disp[bp], [bp][di], [bp][si], disp[bp][si], and disp[bp][di] addressing modes use the stack segment rather than the data segment. The following code segment gives the standard procedure entry and exit code:

StdProc         proc    near
                push    bp
                mov     bp, sp
                 .
                 .
                 .
                pop     bp
                ret     ParmSize
StdProc         endp
ParmSize is the number of bytes of parameters pushed onto the stack before calling the procedure. In the CallProc procedure there were six bytes of parameters pushed onto the stack so ParmSize would be six.
Take a look at the stack immediately after the execution of mov bp, sp in StdProc. Assuming you’ve pushed three parameter words onto the stack, it should look something like shown in Figure 11.7.
Now the parameters can be fetched by indexing off the bp register:

                mov     ax, 8[bp]       ;Accesses the first parameter
                mov     ax, 6[bp]       ;Accesses the second parameter
                mov     ax, 4[bp]       ;Accesses the third parameter

[Figure 11.8 Accessing Parameters on the Stack in a Far Procedure. Offsets from BP: 10 first parameter, 8 second parameter, 6 third parameter, 4 segment portion of the return address, 2 offset portion of the return address, 0 original BP value (BP and SP point here).]
When returning to the calling code, the procedure must remove these parameters from the stack. To accomplish this, pop the old bp value off the stack and execute a ret 6 instruction. This pops the return address off the stack and adds six to the stack pointer, effectively removing the parameters from the stack.
The displacements given above are for near procedures only. When calling a far procedure,

• 0[BP] will point at the old BP value,
• 2[BP] will point at the offset portion of the return address,
• 4[BP] will point at the segment portion of the return address, and
• 6[BP] will point at the last parameter pushed onto the stack.
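The displacement arithmetic for near and far procedures can be checked with a short Python helper. The function name parameter_offsets is hypothetical; it assumes the standard entry sequence (push bp, then mov bp, sp) and parameters pushed in left-to-right order, as in the examples of this section.

```python
def parameter_offsets(param_sizes, far=False):
    """Return ([bp+n] displacement of each parameter, ParmSize).

    param_sizes lists the size in bytes of each parameter in push order.
    """
    return_address_size = 4 if far else 2
    offset = 2 + return_address_size       # skip saved bp, then return address
    offsets = []
    for size in reversed(param_sizes):     # the last parameter pushed is nearest bp
        offsets.append(offset)
        offset += size
    offsets.reverse()
    return offsets, sum(param_sizes)       # second value is the operand of ret


print(parameter_offsets([2, 2, 2]))             # three words, near procedure
print(parameter_offsets([2, 2, 2], far=True))   # three words, far procedure
print(parameter_offsets([4, 2, 2]))             # far pointer plus two words, near
```

For three word parameters this yields displacements 8, 6, and 4 for a near procedure and 10, 8, and 6 for a far procedure, with six bytes to remove on return in both cases.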
The stack contents when calling a far procedure are shown in Figure 11.8.
This collection of parameters, return address, registers saved on the stack, and other items, is a stack frame or activation record.
When saving other registers onto the stack, always make sure that you save and set up bp before pushing the other registers. If you push the other registers before setting up bp, the offsets into the stack frame will change. For example, the following code disturbs the ordering presented above:

FunnyProc       proc    near
                push    ax
                push    bx
                push    bp
                mov     bp, sp
                 .
                 .
                 .
                pop     bp
                pop     bx
                pop     ax
                ret
FunnyProc       endp
Since this code pushes ax and bx before pushing bp and copying sp to bp, ax and bx appear in the activation record before the return address (that would normally start at location [bp+2]). As a result, the value of bx appears at location [bp+2] and the value of ax appears at location [bp+4]. This pushes the return address and other parameters farther up the stack as shown in Figure 11.9.

[Figure 11.9 Messing up Offsets by Pushing Other Registers Before BP (near procedure). Offsets from BP: 8 parameters begin, 6 return address, 4 AX, 2 BX, 0 original BP value (BP and SP point here).]

Although this is a near procedure, the parameters don’t begin until offset eight in the activation record. Had you pushed the ax and bx registers after setting up bp, the offset to the parameters would have been four (see Figure 11.10).

[Figure 11.10 Keeping the Offsets Constant by Pushing BP First (near procedure). Offsets from BP: 4 parameters begin, 2 return address, 0 original BP value (BP points here), -2 AX, -4 BX (SP points here).]

FunnyProc       proc    near
                push    bp
                mov     bp, sp
                push    ax
                push    bx
                 .
                 .
                 .
                pop     bx
                pop     ax
                pop     bp
                ret
FunnyProc       endp
Therefore, the push bp and mov bp, sp instructions should be the first two instructions any subroutine executes when it has parameters on the stack. Accessing the parameters using expressions like [bp+6] can make your programs very hard to read and maintain. If you would like to use meaningful names, there are several ways to do so. One way to reference parameters by name is to use equates. Consider the following Pascal procedure and its equivalent 80x86 assembly language code:

procedure xyz(var i:integer; j,k:integer);
begin
        i := j+k;
end;

Calling sequence:

        xyz(a,3,4);

Assembly language code:

xyz_i           equ     8[bp]           ;Use equates so we can reference
xyz_j           equ     6[bp]           ; symbolic names in the body of
xyz_k           equ     4[bp]           ; the procedure.
xyz             proc    near
                push    bp
                mov     bp, sp
                push    es
                push    ax
                push    bx
                les     bx, xyz_i       ;Get address of I into ES:BX
                mov     ax, xyz_j       ;Get J parameter
                add     ax, xyz_k       ;Add to K parameter
                mov     es:[bx], ax     ;Store result into I parameter
                pop     bx
                pop     ax
                pop     es
                pop     bp
                ret     8
xyz             endp

Calling sequence:

                mov     ax, seg a       ;This parameter is passed by
                push    ax              ; reference, so pass its
                mov     ax, offset a    ; address on the stack.
                push    ax
                mov     ax, 3           ;This is the second parameter
                push    ax
                mov     ax, 4           ;This is the third parameter.
                push    ax
                call    xyz

On an 80186 or later processor you could use the following code in place of the above:

                push    seg a           ;Pass address of “a” on the
                push    offset a        ; stack.
                push    3               ;Pass second parm by val.
                push    4               ;Pass third parm by val.
                call    xyz
Upon entry into the xyz procedure, before the execution of the les instruction, the stack looks like shown in Figure 11.11. Since you’re passing I by reference, you must push its address onto the stack. This code passes reference parameters using 32 bit segmented addresses. Note that this code uses ret 8. Although there are three parameters on the stack, the reference parameter I consumes four bytes since it is a far address. Therefore there are eight bytes of parameters on the stack necessitating the ret 8 instruction.

[Figure 11.11 XYZ Stack Upon Procedure Entry (near procedure). Shows, as offsets from BP, the segmented address of I, the value of J, the value of K, the return address, the original BP value (BP points here), and the saved ES, AX, and BX registers (SP points at the last one pushed).]

Were you to pass I by reference using a near pointer rather than a far pointer, the code would look like the following:

xyz_i           equ     8[bp]           ;Use equates so we can reference
xyz_j           equ     6[bp]           ; symbolic names in the body of
xyz_k           equ     4[bp]           ; the procedure.
xyz             proc    near
                push    bp
                mov     bp, sp
                push    ax
                push    bx
                mov     bx, xyz_i       ;Get address of I into BX
                mov     ax, xyz_j       ;Get J parameter
                add     ax, xyz_k       ;Add to K parameter
                mov     [bx], ax        ;Store result into I parameter
                pop     bx
                pop     ax
                pop     bp
                ret     6
xyz             endp
Note that since I’s address on the stack is only two bytes (rather than four), this routine only pops six bytes when it returns.
Calling sequence:

                mov     ax, offset a    ;Pass near address of a.
                push    ax
                mov     ax, 3           ;This is the second parameter
                push    ax
                mov     ax, 4           ;This is the third parameter.
                push    ax
                call    xyz

On an 80286 or later processor you could use the following code in place of the above:

                push    offset a        ;Pass near address of a.
                push    3               ;Pass second parm by val.
                push    4               ;Pass third parm by val.
                call    xyz
The stack frame for the above code appears in Figure 11.12.

[Figure 11.12 Passing Parameters by Reference Using Near Pointers Rather than Far Pointers (near procedure). Offsets from BP: 8 address of I, 6 value of J, 4 value of K, 2 return address, 0 original BP value (BP points here), -2 AX, -4 BX (SP points here).]

When passing a parameter by value-returned or result, you pass an address to the procedure, exactly like passing the parameter by reference. The only difference is that you use a local copy of the variable within the procedure rather than accessing the variable indirectly through the pointer. The following implementations for xyz show how to pass I by value-returned and by result:

; xyz version using Pass by Value-Returned for xyz_i

xyz_i           equ     8[bp]           ;Use equates so we can reference
xyz_j           equ     6[bp]           ; symbolic names in the body of
xyz_k           equ     4[bp]           ; the procedure.

xyz             proc    near
                push    bp
                mov     bp, sp
                push    ax
                push    bx
                push    cx              ;Keep local copy here.

                mov     bx, xyz_i       ;Get address of I into BX
                mov     cx, [bx]        ;Get local copy of I parameter.

                mov     ax, xyz_j       ;Get J parameter
                add     ax, xyz_k       ;Add to K parameter
                mov     cx, ax          ;Store result into local copy

                mov     bx, xyz_i       ;Get ptr to I, again
                mov     [bx], cx        ;Store result away.

                pop     cx
                pop     bx
                pop     ax
                pop     bp
                ret     6
xyz             endp
There are a couple of unnecessary mov instructions in this code. They are present only to precisely implement pass by value-returned parameters. It is easy to improve this code using pass by result parameters. The modified code is

; xyz version using Pass by Result for xyz_i

xyz_i           equ     8[bp]           ;Use equates so we can reference
xyz_j           equ     6[bp]           ; symbolic names in the body of
xyz_k           equ     4[bp]           ; the procedure.

xyz             proc    near
                push    bp
                mov     bp, sp
                push    ax
                push    bx
                push    cx              ;Keep local copy here.

                mov     ax, xyz_j       ;Get J parameter
                add     ax, xyz_k       ;Add to K parameter
                mov     cx, ax          ;Store result into local copy

                mov     bx, xyz_i       ;Get ptr to I, again
                mov     [bx], cx        ;Store result away.

                pop     cx
                pop     bx
                pop     ax
                pop     bp
                ret     6
xyz             endp
As with passing value-returned and result parameters in registers, you can improve the performance of this code using a modified form of pass by value. Consider the following implementation of xyz:

; xyz version using modified pass by value-result for xyz_i

xyz_i           equ     8[bp]           ;Use equates so we can reference
xyz_j           equ     6[bp]           ; symbolic names in the body of
xyz_k           equ     4[bp]           ; the procedure.

xyz             proc    near
                push    bp
                mov     bp, sp
                push    ax

                mov     ax, xyz_j       ;Get J parameter
                add     ax, xyz_k       ;Add to K parameter
                mov     xyz_i, ax       ;Store result into local copy

                pop     ax
                pop     bp
                ret     4               ;Note that we do not pop I parm.
xyz             endp

The calling sequence for this code is

                push    a               ;Pass a’s value to xyz.
                push    3               ;Pass second parameter by val.
                push    4               ;Pass third parameter by val.
                call    xyz
                pop     a
Note that a pass by result version wouldn’t be practical since you have to push something on the stack to make room for the local copy of I inside xyz. You may as well push the value of a on entry even though the xyz procedure ignores it. This procedure pops only four bytes off the stack on exit. This leaves the value of the I parameter on the stack so that the calling code can store it away to the proper destination.
To pass a parameter by name on the stack, you simply push the address of the thunk. Consider the following pseudo-Pascal code:

procedure swap(name Item1, Item2:integer);
var temp:integer;
begin
        temp := Item1;
        Item1 := Item2;
        Item2 := Temp;
end;
If swap is a near procedure, the 80x86 code for this procedure could look like the following (note that this code has been slightly optimized and does not follow the exact sequence given above):

; swap-         swaps two parameters passed by name on the stack.
;               Item1 is passed at address [bp+6], Item2 is passed
;               at address [bp+4]

wp              textequ <word ptr>
swap_Item1      equ     [bp+6]
swap_Item2      equ     [bp+4]

swap            proc    near
                push    bp
                mov     bp, sp
                push    ax              ;Preserve temp value.
                push    bx              ;Preserve bx.
                call    wp swap_Item1   ;Get adrs of Item1.
                mov     ax, [bx]        ;Save in temp (AX).
                call    wp swap_Item2   ;Get adrs of Item2.
                xchg    ax, [bx]        ;Swap temp <-> Item2.
                call    wp swap_Item1   ;Get adrs of Item1.
                mov     [bx], ax        ;Save temp in Item1.
                pop     bx              ;Restore bx.
                pop     ax              ;Restore ax.
                pop     bp              ;Restore bp.
                ret     4               ;Return and pop Item1/2.
swap            endp
Some sample calls to swap follow:

; swap(A[i], i) -- 8086 version.

                lea     ax, thunk1
                push    ax
                lea     ax, thunk2
                push    ax
                call    swap

; swap(A[i],i) -- 80186 & later version.

                push    offset thunk1
                push    offset thunk2
                call    swap
                 .
                 .
                 .
; Note: this code assumes A is an array of two byte integers.

thunk1          proc    near
                mov     bx, i
                shl     bx, 1
                lea     bx, A[bx]
                ret
thunk1          endp

thunk2          proc    near
                lea     bx, i
                ret
thunk2          endp
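Because every access to a pass by name parameter re-runs its thunk, the order of the accesses matters. The Python model below is a sketch under assumed names (NameParam, swap_pascal_order, swap_listing_order, run): it traces swap(A[i], i) once in the order of the pseudo-Pascal statements and once in the order of accesses used by the optimized assembly listing, where the Item1 thunk runs a second time after Item2 has been written.

```python
class NameParam:
    """Pass by name parameter: every get/set re-runs the caller's thunk."""
    def __init__(self, address_thunk):
        self.address = address_thunk      # returns (container, key)

    def get(self):
        container, key = self.address()
        return container[key]

    def set(self, value):
        container, key = self.address()
        container[key] = value


def swap_pascal_order(item1, item2):
    temp = item1.get()          # temp := Item1
    item1.set(item2.get())      # Item1 := Item2
    item2.set(temp)             # Item2 := temp


def swap_listing_order(item1, item2):
    temp = item1.get()          # mov ax, [bx] after the Item1 thunk
    old2 = item2.get()          # xchg ax, [bx] after the Item2 thunk
    item2.set(temp)
    item1.set(old2)             # Item1 thunk runs again, after Item2 changed


def run(swap):
    env = {"i": 1, "A": [10, 2, 30]}
    a_i = NameParam(lambda: (env["A"], env["i"]))   # thunk1: address of A[i]
    i = NameParam(lambda: (env, "i"))               # thunk2: address of i
    swap(a_i, i)
    return env["i"], env["A"]


print(run(swap_pascal_order))
print(run(swap_listing_order))
```

In this model both orders leave i equal to 2, but the Pascal order stores the old value of i into A[1] while the listing order stores it into A[2]. This appears to be a consequence of the optimization the text mentions, and it only shows up when one actual parameter depends on the other.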
The code above assumes that the thunks are near procs that reside in the same segment as the swap routine. If the thunks are far procedures the caller must pass far addresses on the stack and the swap routine must manipulate far addresses. The following implementation of swap, thunk1, and thunk2 demonstrates this.

; swap-         swaps two parameters passed by name on the stack.
;               Item1 is passed at address [bp+10], Item2 is passed
;               at address [bp+6]

swap_Item1      equ     [bp+10]
swap_Item2      equ     [bp+6]
dp              textequ <dword ptr>

swap            proc    far
                push    bp
                mov     bp, sp
                push    ax              ;Preserve temp value.
                push    bx              ;Preserve bx.
                push    es              ;Preserve es.
                call    dp swap_Item1   ;Get adrs of Item1.
                mov     ax, es:[bx]     ;Save in temp (AX).
                call    dp swap_Item2   ;Get adrs of Item2.
                xchg    ax, es:[bx]     ;Swap temp <-> Item2.
                call    dp swap_Item1   ;Get adrs of Item1.
                mov     es:[bx], ax     ;Save temp in Item1.
                pop     es              ;Restore es.
                pop     bx              ;Restore bx.
                pop     ax              ;Restore ax.
                pop     bp              ;Restore bp.
                ret     8               ;Return and pop Item1, Item2.
swap            endp
Some sample calls to swap follow:

; swap(A[i], i) -- 8086 version.

                mov     ax, seg thunk1
                push    ax
                lea     ax, thunk1
                push    ax
                mov     ax, seg thunk2
                push    ax
                lea     ax, thunk2
                push    ax
                call    swap

; swap(A[i],i) -- 80186 & later version.

                push    seg thunk1
                push    offset thunk1
                push    seg thunk2
                push    offset thunk2
                call    swap
                 .
                 .
                 .
; Note:         this code assumes A is an array of two byte integers.
;               Also note that we do not know which segment(s) contain
;               A and I.

thunk1          proc    far
                mov     bx, seg A       ;Need to return seg A in ES.
                push    bx              ;Save for later.
                mov     bx, seg i       ;Need segment of I in order
                mov     es, bx          ; to access it.
                mov     bx, es:i        ;Get I’s value.
                shl     bx, 1
                lea     bx, A[bx]
                pop     es              ;Return segment of A[I] in es.
                ret
thunk1          endp

thunk2          proc    far
                mov     bx, seg i       ;Need to return I’s seg in es.
                mov     es, bx
                lea     bx, i
                ret
thunk2          endp
Passing parameters by lazy evaluation is left for the programming projects. Additional information on activation records and stack frames appears later in this chapter in the section on local variables.
11.5.10 Passing Parameters in the Code Stream
Another place where you can pass parameters is in the code stream immediately after the call instruction. The print routine in the UCR Standard Library package provides an excellent example:

                print
                byte    "This parameter is in the code stream.",0
Normally, a subroutine returns control to the first instruction immediately following the call instruction. Were that to happen here, the 80x86 would attempt to interpret the ASCII code for “This...” as an instruction. This would produce undesirable results. Fortunately, you can skip over this string when returning from the subroutine. So how do you gain access to these parameters? Easy. The return address on the stack points at them. Consider the following implementation of print:

MyPrint         proc    near
                push    bp
                mov     bp, sp
                push    bx
                push    ax
                mov     bx, 2[bp]       ;Load return address into BX
PrintLp:        mov     al, cs:[bx]     ;Get next character
                cmp     al, 0           ;Check for end of string
                jz      EndStr
                putc                    ;If not end, print this char
                inc     bx              ;Move on to the next character
                jmp     PrintLp

EndStr:         inc     bx              ;Point at first byte beyond zero
                mov     2[bp], bx       ;Save as new return address
                pop     ax
                pop     bx
                pop     bp
                ret
MyPrint         endp
This procedure begins by pushing all the affected registers onto the stack. It then fetches the return address, at offset 2[BP], and prints each successive character until encountering a zero byte. Note the presence of the cs: segment override prefix in the mov al, cs:[bx] instruction. Since the data is coming from the code segment, this prefix guarantees that MyPrint fetches the character data from the proper segment. Upon encountering the zero byte, MyPrint points bx at the first byte beyond the zero. This is the address of the first instruction following the zero terminating byte. The CPU uses this value as the new return address. Now the execution of the ret instruction returns control to the instruction following the string.
The above code works great if MyPrint is a near procedure. If you need to call MyPrint from a different segment you will need to create a far procedure. Of course, the major difference is that a far return address will be on the stack at that point – you will need to use a far pointer rather than a near pointer. The following implementation of MyPrint handles this case.

MyPrint         proc    far
                push    bp
                mov     bp, sp
                push    bx              ;Preserve ES, AX, and BX
                push    ax
                push    es

                les     bx, 2[bp]       ;Load return address into ES:BX
PrintLp:        mov     al, es:[bx]     ;Get next character
                cmp     al, 0           ;Check for end of string
                jz      EndStr
                putc                    ;If not end, print this char
                inc     bx              ;Move on to the next character
                jmp     PrintLp

EndStr:         inc     bx              ;Point at first byte beyond zero
                mov     2[bp], bx       ;Save as new return address
                pop     es
                pop     ax
                pop     bx
                pop     bp
                ret
MyPrint         endp
Note that this code does not store es back into location [bp+4]. The reason is quite simple – es does not change during the execution of this procedure; storing es into location [bp+4] would not change the value at that location. You will notice that this version of MyPrint fetches each character from location es:[bx] rather than cs:[bx]. This is because the string you’re printing is in the caller’s segment, which might not be the same segment containing MyPrint.
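The idea behind MyPrint, treating the return address as a pointer to the parameter and then advancing it past the parameter, can be modeled in Python. This sketch simplifies the machine considerably: memory is a bytes object, the call instruction is represented by a single placeholder byte, and the function name my_print and the byte values are assumptions for illustration only.

```python
def my_print(memory, return_address, output):
    """Model of MyPrint: the 'return address' points at the string bytes."""
    bx = return_address
    while memory[bx] != 0:             # stop at the zero terminating byte
        output.append(chr(memory[bx]))
        bx += 1                        # move on to the next character
    return bx + 1                      # first byte beyond the zero


# Code stream layout: [call] 'H' 'i' '!' 0 [next instruction]
CALL, NEXT = 0xE8, 0x90                # placeholder byte values only
memory = bytes([CALL]) + b"Hi!\x00" + bytes([NEXT])

printed = []
resume_at = my_print(memory, 1, printed)
print("".join(printed), resume_at)
```

The value returned by my_print corresponds to the adjusted return address that the assembly version stores back into 2[bp] before executing ret.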
Besides showing how to pass parameters in the code stream, the MyPrint routine also exhibits another concept: variable length parameters. The string following the call can be any practical length. The zero terminating byte marks the end of the parameter list. There are two easy ways to handle variable length parameters. Either use some special terminating value (like zero) or you can pass a special length value that tells the subroutine how many parameters you are passing. Both methods have their advantages and disadvantages. Using a special value to terminate a parameter list requires that you choose a value that never appears in the list. For example, MyPrint uses zero as the terminating value, so it cannot print the NULL character (whose ASCII code is zero). Sometimes this isn’t a limitation. Specifying a special length parameter is another mechanism you can use to pass a variable length parameter list. While this doesn’t require any special codes or limit the range of possible values that can be passed to a subroutine, setting up the length parameter and maintaining the resulting code can be a real nightmare [5].
Although passing parameters in the code stream is an ideal way to pass variable length parameter lists, you can pass fixed length parameter lists as well. The code stream is an excellent place to pass constants (like the string constants passed to MyPrint) and reference parameters. Consider the following code that expects three parameters by reference:

Calling sequence:

                call    AddEm
                word    I,J,K

Procedure:

AddEm           proc    near
                push    bp
                mov     bp, sp
                push    si
                push    bx
                push    ax
                mov     si, [bp+2]      ;Get return address
                mov     bx, cs:[si+2]   ;Get address of J
                mov     ax, [bx]        ;Get J’s value
                mov     bx, cs:[si+4]   ;Get address of K
                add     ax, [bx]        ;Add in K’s value
                mov     bx, cs:[si]     ;Get address of I
                mov     [bx], ax        ;Store result
                add     si, 6           ;Skip past parms
                mov     [bp+2], si      ;Save return address
                pop     ax
                pop     bx
                pop     si
                pop     bp
                ret
AddEm           endp
This subroutine adds J and K together and stores the result into I. Note that this code uses 16 bit near pointers to pass the addresses of I, J, and K to AddEm. Therefore, I, J, and K must be in the current data segment. In the example above, AddEm is a near procedure. Had it been a far procedure it would have needed to fetch a four byte pointer from the stack rather than a two byte pointer. The following is a far version of AddEm:

AddEm           proc    far
                push    bp
                mov     bp, sp
                push    si
                push    bx
                push    ax
                push    es
                les     si, [bp+2]      ;Get far ret adrs into es:si
                mov     bx, es:[si+2]   ;Get address of J
                mov     ax, [bx]        ;Get J's value
                mov     bx, es:[si+4]   ;Get address of K
                add     ax, [bx]        ;Add in K's value
                mov     bx, es:[si]     ;Get address of I
                mov     [bx], ax        ;Store result
                add     si, 6           ;Skip past parms
                mov     [bp+2], si      ;Save return address
                pop     es
                pop     ax
                pop     bx
                pop     si
                pop     bp
                ret
AddEm           endp

5. Especially if the parameter list changes frequently.
In both versions of AddEm, the pointers to I, J, and K passed in the code stream are near pointers. Both versions assume that I, J, and K are all in the current data segment. It is possible to pass far pointers to these variables, or even near pointers to some and far pointers to others, in the code stream. The following example isn’t quite so ambitious; it is a near procedure that expects far pointers, but it does show some of the major differences. Because each pointer now occupies four bytes, I’s pointer is at offset zero from the return address, J’s at offset four, and K’s at offset eight. For additional examples, see the exercises.

Calling sequence:

                call    AddEm
                dword   I,J,K

Code:

AddEm           proc    near
                push    bp
                mov     bp, sp
                push    si
                push    bx
                push    ax
                push    es
                mov     si, [bp+2]      ;Get near ret adrs into si
                les     bx, cs:[si+4]   ;Get address of J into es:bx
                mov     ax, es:[bx]     ;Get J's value
                les     bx, cs:[si+8]   ;Get address of K
                add     ax, es:[bx]     ;Add in K's value
                les     bx, cs:[si]     ;Get address of I
                mov     es:[bx], ax     ;Store result
                add     si, 12          ;Skip past parms
                mov     [bp+2], si      ;Save return address
                pop     es
                pop     ax
                pop     bx
                pop     si
                pop     bp
                ret
AddEm           endp
Note that there are 12 bytes of parameters in the code stream this time around. This is why this code contains an add si, 12 instruction rather than the add si, 6 appearing in the other versions.

In the examples given to this point, MyPrint expects a pass by value parameter (it prints the actual characters following the call) and AddEm expects three pass by reference parameters (their addresses follow in the code stream). Of course, you can also pass parameters by value-returned, by result, by name, or by lazy evaluation in the code stream as well. The next example is a modification of AddEm that uses pass by result for I, pass by value-returned for J, and pass by name for K. This version is slightly different insofar as it modifies J as well as I, in order to justify the use of the value-returned parameter.
; AddEm(Result I:integer; ValueResult J:integer; Name K);
;
;       Computes        I:= J;
;                       J := J+K;
;
; Presumes all pointers in the code stream are near pointers.

AddEm           proc    near
                push    bp
                mov     bp, sp
                push    si              ;Pointer to parameter block.
                push    bx              ;General pointer.
                push    cx              ;Temp value for I.
                push    ax              ;Temp value for J.

                mov     si, [bp+2]      ;Get near ret adrs into si

                mov     bx, cs:[si+2]   ;Get address of J into bx
                mov     ax, [bx]        ;Create local copy of J.
                mov     cx, ax          ;Do I:=J;

                call    word ptr cs:[si+4] ;Call thunk to get K's adrs
                add     ax, [bx]        ;Compute J := J + K

                mov     bx, cs:[si]     ;Get address of I and store
                mov     [bx], cx        ; I away.

                mov     bx, cs:[si+2]   ;Get J's address and store
                mov     [bx], ax        ; J's value away.

                add     si, 6           ;Skip past parms
                mov     [bp+2], si      ;Save return address
                pop     ax
                pop     cx
                pop     bx
                pop     si
                pop     bp
                ret
AddEm           endp
Example calling sequences:

; AddEm(I,J,K)

                call    AddEm
                word    I,J,KThunk

; AddEm(I,J,A[I])

                call    AddEm
                word    I,J,AThunk
                 .
                 .
                 .
KThunk          proc    near
                lea     bx, K
                ret
KThunk          endp

AThunk          proc    near
                mov     bx, I
                shl     bx, 1
                lea     bx, A[bx]
                ret
AThunk          endp
Note: had you passed I by reference, rather than by result, in this example, the call AddEm(I,J,A[i]) would have produced different results. Can you explain why?

Passing parameters in the code stream lets you perform some really clever tasks. The following example is considerably more complex than the others in this section, but it demonstrates the power of passing parameters in the code stream and, despite the complexity of this example, how they can simplify your programming tasks. The following two routines implement a for/next statement, similar to that in BASIC, in assembly language. The calling sequence for these routines is the following:

                call    ForStmt
                word    «LoopControlVar», «StartValue», «EndValue»
                 .
                 .
                « loop body statements»
                 .
                 .
                call    Next

Figure 11.13 Stack Upon Entry into the ForStmt Procedure

        Offset from BP  Contents
              4         Previous Stack Contents
              2         Return Address
              0         Original BP Value               <- SP, BP

Figure 11.14 Stack Just Before Leaving the ForStmt Procedure

        Offset from BP  Contents
              4         Previous Stack Contents
              2         Return Address
              0         Original BP Value               <- BP
             -2         Return Address                  <- SP
This code sets the loop control variable (whose near address you pass as the first parameter, by reference) to the starting value (passed by value as the second parameter). It then begins execution of the loop body. Upon executing the call to Next, this program would increment the loop control variable and then compare it to the ending value. If it is less than or equal to the ending value, control would return to the beginning of the loop body (the first statement following the word directive). Otherwise it would continue execution with the first statement past the call to Next.

Now you’re probably wondering, “How on earth does control transfer to the beginning of the loop body?” After all, there is no label at that statement and there is no control transfer instruction that jumps to the first statement after the word directive. Well, it turns out you can do this with a little tricky stack manipulation. Consider what the stack will look like upon entry into the ForStmt routine, after pushing bp onto the stack (see Figure 11.13). Normally, the ForStmt routine would pop bp and return with a ret instruction, which removes ForStmt’s activation record from the stack. Suppose, instead, ForStmt executes the following instructions:

                add     word ptr 2[bp], 6       ;Skip the parameters.
                push    [bp+2]                  ;Make a copy of the rtn adrs.
                mov     bp, [bp]                ;Restore bp's value.
                ret                             ;Return to caller.

Just before the ret instruction above, the stack has the entries shown in Figure 11.14.
Figure 11.15 The Stack upon Entering the Next Procedure

        Offset from BP  Contents
              8         Previous Stack Contents
              6         ForStmt's Return Address
              4         ForStmt's BP Value
              2         Next's Return Address
              0         Next's BP Value                 <- SP, BP

Upon executing the ret instruction, ForStmt will return to the proper return address but it will leave its activation record on the stack! After executing the statements in the loop body, the program calls the Next routine. Upon initial entry into Next (and setting up bp), the stack contains the entries appearing in Figure 11.15 (see footnote 6). The important thing to see here is that ForStmt’s return address, which points at the first statement past the word directive, is still on the stack and available to Next at offset [bp+6]. Next can use this return address to gain access to the parameters and return to the appropriate spot, if necessary. Next increments the loop control variable and compares it to the ending value. If the loop control variable’s value is less than or equal to the ending value, Next replaces its own return address with ForStmt’s return address and returns through it, leaving ForStmt’s activation record on the stack. If the loop control variable is greater than the ending value, Next returns through its own return address and removes ForStmt’s activation record from the stack.

6. Assuming the loop does not push anything onto the stack, or pop anything off the stack. Should either case occur, the ForStmt/Next loop would not work properly.

The following is the code for Next and ForStmt:

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'
I               word    ?
J               word    ?
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

wp              textequ <word ptr>

ForStmt         proc    near
                push    bp
                mov     bp, sp
                push    ax
                push    bx
                mov     bx, [bp+2]      ;Get return address
                mov     ax, cs:[bx+2]   ;Get starting value
                mov     bx, cs:[bx]     ;Get address of var
                mov     [bx], ax        ;var := starting value
                add     wp [bp+2], 6    ;Skip over parameters
                pop     bx
                pop     ax
                push    [bp+2]          ;Copy return address
                mov     bp, [bp]        ;Restore bp
                ret                     ;Leave Act Rec on stack
ForStmt         endp

Next            proc    near
                push    bp
                mov     bp, sp
                push    ax
                push    bx
                mov     bx, [bp+6]      ;ForStmt's rtn adrs
                mov     ax, cs:[bx-2]   ;Ending value
                mov     bx, cs:[bx-6]   ;Ptr to loop ctrl var
                inc     wp [bx]         ;Bump up loop ctrl
                cmp     ax, [bx]        ;Is end val < loop ctrl?
                jl      QuitLoop

; If we get here, the loop control variable is less than or equal
; to the ending value. So we need to repeat the loop one more time.
; Copy ForStmt's return address over our own and then return,
; leaving ForStmt's activation record intact.

                mov     ax, [bp+6]      ;ForStmt's return address
                mov     [bp+2], ax      ;Overwrite our return address
                pop     bx
                pop     ax
                pop     bp
                ret                     ;Return to start of loop body

; If we get here, the loop control variable is greater than the
; ending value, so we need to quit the loop (by returning to Next's
; return address) and remove ForStmt's activation record.

QuitLoop:       pop     bx
                pop     ax
                pop     bp
                ret     4
Next            endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                call    ForStmt
                word    I,1,5
                call    ForStmt
                word    J,2,4
                printf
                byte    "I=%d, J=%d\n",0
                dword   I,J

                call    Next            ;End of J loop
                call    Next            ;End of I loop
                print
                byte    "All Done!",cr,lf,0

Quit:           ExitPgm
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main
The example code in the main program shows that these for loops nest exactly as you would expect in a high level language like BASIC, Pascal, or C. Of course, this is not a particularly good way to construct a for loop in assembly language. It is many times slower than using the standard loop generation techniques (see “Loops” on page 531 for more details on that). Still, if you don’t care about speed, this is a perfectly good way to implement a loop. It is certainly easier to read and understand than the traditional methods for creating a for loop. For another (more efficient) implementation of the for loop, check out the ForLp macros in Chapter Eight (see “A Sample Macro to Implement For Loops” on page 409).

The code stream is a very convenient place to pass parameters. The UCR Standard Library makes considerable use of this parameter passing mechanism to make it easy to call certain routines. Printf is, perhaps, the most complex example, but other examples (especially in the string library) abound.

Despite the convenience, there are some disadvantages to passing parameters in the code stream. First, if you fail to provide the exact number of parameters the procedure requires, the subroutine will get very confused. Consider the UCR Standard Library print routine. It prints a string of characters up to a zero terminating byte and then returns control to the first instruction following the zero terminating byte. If you leave off the zero terminating byte, the print routine happily prints the following opcode bytes as ASCII characters until it finds a zero byte. Since zero bytes often appear in the middle of an instruction, the print routine might return control into the middle of some other instruction. This will probably crash the machine. Inserting an extra zero, which occurs more often than you might think, is another problem programmers have with the print routine. In such a case, the print routine would return upon encountering the first zero byte and attempt to execute the following ASCII characters as machine code. Once again, this usually crashes the machine.

Another problem with passing parameters in the code stream is that it takes a little longer to access such parameters. Passing parameters in the registers, in global variables, or on the stack is slightly more efficient, especially in short routines. Nevertheless, accessing parameters in the code stream isn’t extremely slow, so the convenience of such parameters may outweigh the cost. Furthermore, many routines (print is a good example) are so slow anyway that a few extra microseconds won’t make any difference.
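The length-value alternative to a terminating byte, mentioned earlier in this section, can be sketched as follows. This is only an illustration under stated assumptions: PutCounted, PrtLp, and PCDone are hypothetical names, the first byte after the call is assumed to hold the character count, and the UCR Standard Library putc routine is assumed to print the character in al while preserving the other registers.

```asm
; PutCounted- Hypothetical routine (not part of the UCR library).
;             Prints the string that follows the call in the code
;             stream. The first byte after the call is a length
;             (0..255), so the string itself may contain zero bytes.

PutCounted      proc    near
                push    bp
                mov     bp, sp
                push    ax
                push    bx
                push    cx
                mov     bx, [bp+2]      ;Get return address
                mov     cl, cs:[bx]     ;Get the length byte
                mov     ch, 0           ;Zero extend it into cx
                inc     bx              ;Point at the first character
                jcxz    PCDone          ;Empty string? Nothing to print
PrtLp:          mov     al, cs:[bx]     ;Get the next character
                putc                    ;Assumed stdlib call: print al
                inc     bx
                loop    PrtLp
PCDone:         mov     [bp+2], bx      ;Return just past the string
                pop     cx
                pop     bx
                pop     ax
                pop     bp
                ret
PutCounted      endp

; Possible calling sequence:

                call    PutCounted
                byte    5, "Hello"
```

If the length byte and the string ever disagree, this routine returns into the middle of the string or past the next instruction, which is the maintenance hazard the text describes.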
11.5.11 Passing Parameters via a Parameter Block

Another way to pass parameters in memory is through a parameter block. A parameter block is a set of contiguous memory locations containing the parameters. To access such parameters, you would pass the subroutine a pointer to the parameter block. Consider the subroutine from the previous section that adds J and K together, storing the result in I; the code that passes these parameters through a parameter block might be

Calling sequence:

ParmBlock       dword   I
I               word    ?               ;I, J, and K must appear in
J               word    ?               ; this order.
K               word    ?
                 .
                 .
                 .
                les     bx, ParmBlock
                call    AddEm
                 .
                 .
                 .
AddEm           proc    near
                push    ax
                mov     ax, es:2[bx]    ;Get J's value
                add     ax, es:4[bx]    ;Add in K's value
                mov     es:[bx], ax     ;Store result in I
                pop     ax
                ret
AddEm           endp

Note that you must allocate the three parameters in contiguous memory locations.
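As a brief usage sketch, the caller can fill in the block at run time before making the call. The constants 10 and 20 are arbitrary values chosen for illustration, and the fragment assumes ds addresses the segment holding ParmBlock, I, J, and K:

```asm
                mov     J, 10           ;Hypothetical test values
                mov     K, 20
                les     bx, ParmBlock   ;es:bx points at I; J and K follow
                call    AddEm
                mov     ax, I           ;ax now contains J+K
```

Because AddEm addresses everything relative to es:bx, the same routine works unchanged on any other block laid out in the same I, J, K order.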
This form of parameter passing works well when passing several parameters by reference, because you can initialize pointers to the parameters directly within the assembler. For example, suppose you wanted to create a subroutine rotate to which you pass four parameters by reference. This routine would copy the second parameter to the first, the third to the second, the fourth to the third, and the first to the fourth. An easy way to accomplish this in assembly is

; Rotate-       On entry, BX points at a parameter block in the data
;               segment that contains four far pointers. This code
;               rotates the data referenced by these pointers.

Rotate          proc    near
                push    es              ;Need to preserve these
                push    si              ; registers
                push    ax

                les     si, [bx+4]      ;Get ptr to 2nd var
                mov     ax, es:[si]     ;Get its value
                les     si, [bx]        ;Get ptr to 1st var
                xchg    ax, es:[si]     ;2nd->1st, 1st->ax
                les     si, [bx+12]     ;Get ptr to 4th var
                xchg    ax, es:[si]     ;1st->4th, 4th->ax
                les     si, [bx+8]      ;Get ptr to 3rd var
                xchg    ax, es:[si]     ;4th->3rd, 3rd->ax
                les     si, [bx+4]      ;Get ptr to 2nd var
                mov     es:[si], ax     ;3rd -> 2nd

                pop     ax
                pop     si
                pop     es
                ret
Rotate          endp
To call this routine, you pass it a pointer to a group of four far pointers in the bx register. For example, suppose you wanted to rotate the first elements of four different arrays, the second elements of those four arrays, and the third elements of those four arrays. You could do this with the following code:

                lea     bx, RotateGrp1
                call    Rotate
                lea     bx, RotateGrp2
                call    Rotate
                lea     bx, RotateGrp3
                call    Rotate
                 .
                 .
                 .
RotateGrp1      dword   ary1[0], ary2[0], ary3[0], ary4[0]
RotateGrp2      dword   ary1[2], ary2[2], ary3[2], ary4[2]
RotateGrp3      dword   ary1[4], ary2[4], ary3[4], ary4[4]
Note that the pointer to the parameter block is itself a parameter. The examples in this section pass this pointer in the registers. However, you can pass this pointer anywhere you would pass any other reference parameter: in registers, in global variables, on the stack, in the code stream, even in another parameter block! Such variations on the theme, however, will be left to your own imagination. As with any parameter, the best place to pass a pointer to a parameter block is in the registers. This text will generally adopt that policy.

Although beginning assembly language programmers rarely use parameter blocks, they certainly have their place. Some of the IBM PC BIOS and MS-DOS functions use this parameter passing mechanism. Parameter blocks, since you can initialize their values during assembly (using byte, word, etc.), provide a fast, efficient way to pass parameters to a procedure.

Of course, you can pass parameters by value, reference, value-returned, result, or by name in a parameter block. The following piece of code is a modification of the Rotate procedure above where the first parameter is passed by value (its value appears inside the parameter block), the second is passed by reference, the third by value-returned, and the fourth by name (there is no pass by result since Rotate needs to read and write all values). For simplicity, this code uses near pointers and assumes all variables appear in the data segment:

; Rotate-       On entry, DI points at a parameter block in the data
;               segment that holds four parameters. The first is
;               a value parameter, the second is passed by reference,
;               the third is passed by value/return, the fourth is
;               passed by name.

Rotate          proc    near
                push    si              ;Used to access ref parms
                push    ax              ;Temporary
                push    bx              ;Used by pass by name parm
                push    cx              ;Local copy of val/ret parm

                mov     si, [di+4]      ;Get a copy of val/ret parm
                mov     cx, [si]

                mov     ax, [di]        ;Get 1st (value) parm
                call    word ptr [di+6] ;Get ptr to 4th var
                xchg    ax, [bx]        ;1st->4th, 4th->ax
                xchg    ax, cx          ;4th->3rd, 3rd->ax
                mov     bx, [di+2]      ;Get adrs of 2nd (ref) parm
                xchg    ax, [bx]        ;3rd->2nd, 2nd->ax
                mov     [di], ax        ;2nd->1st

                mov     bx, [di+4]      ;Get ptr to val/ret parm
                mov     [bx], cx        ;Save val/ret parm away.

                pop     cx
                pop     bx
                pop     ax
                pop     si
                ret
Rotate          endp
A reasonable example of a call to this routine might be:

I               word    10
J               word    15
K               word    20
RotateBlk       word    25, I, J, KThunk
                 .
                 .
                 .
                lea     di, RotateBlk
                call    Rotate
                 .
                 .
                 .
KThunk          proc    near
                lea     bx, K
                ret
KThunk          endp

11.6 Function Results

Functions return a result, which is nothing more than a result parameter. In assembly language, there are very few differences between a procedure and a function. That is probably why there aren’t any “func” or “endf” directives. Functions and procedures are usually different in HLLs: function calls appear only in expressions, subroutine calls as statements (see footnote 7). Assembly language doesn’t distinguish between them.

You can return function results in the same places you pass and return parameters. Typically, however, a function returns only a single value (or single data structure) as the function result. The methods and locations used to return function results are the subject of the next three sections.

7. “C” is an exception to this rule. C’s procedures and functions are all called functions. PL/I is another exception. In PL/I, they’re all called procedures.
11.6.1 Returning Function Results in a Register

Like parameters, the 80x86’s registers are the best place to return function results. The getc routine in the UCR Standard Library is a good example of a function that returns a value in one of the CPU’s registers. It reads a character from the keyboard and returns the ASCII code for that character in the al register. Generally, functions return their results in the following registers:

        Use                     First -----------------> Last
        Bytes:                  al, ah, dl, dh, cl, ch, bl, bh
        Words:                  ax, dx, cx, si, di, bx
        Double words:           dx:ax                           On pre-80386
                                eax, edx, ecx, esi, edi, ebx    On 80386 and later.
        16-bit Offsets:         bx, si, di, dx
        32-bit Offsets:         ebx, esi, edi, eax, ecx, edx
        Segmented Pointers:     es:di, es:bx, dx:ax, es:si      Do not use DS.
Once again, this table represents general guidelines. If you’re so inclined, you could return a double word value in (cl, dh, al, bh). If you’re returning a function result in some registers, you shouldn’t save and restore those registers. Doing so would defeat the whole purpose of the function.
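A minimal sketch of a function that returns a word result in ax follows. Max and MaxDone are hypothetical names introduced only for this illustration, and I, J, and K are assumed to be word variables in the data segment. Note that the routine deliberately does not save and restore ax, because ax carries the result:

```asm
; Max-  Hypothetical example. On entry, AX and BX contain two signed
;       16 bit values. On exit, AX contains the larger of the two.
;       BX is unchanged; the flags are modified.

Max             proc    near
                cmp     ax, bx          ;Is ax already the larger value?
                jge     MaxDone
                mov     ax, bx          ;No, so return bx's value
MaxDone:        ret
Max             endp

; Possible calling sequence, computing K := Max(I, J):

                mov     ax, I
                mov     bx, J
                call    Max
                mov     K, ax
```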
11.6.2 Returning Function Results on the Stack

Another good place where you can return function results is on the stack. The idea here is to push some dummy values onto the stack to create space for the function result. The function, before leaving, stores its result into this location. When the function returns to the caller, it pops everything off the stack except this function result. Many HLLs use this technique (although most HLLs on the IBM PC return function results in the registers). The following code sequences show how values can be returned on the stack:

function PasFunc(i,j,k:integer):integer;
begin
        PasFunc := i+j+k;
end;

        m := PasFunc(2,n,l);

In assembly:

PasFunc_rtn     equ     10[bp]
PasFunc_i       equ     8[bp]
PasFunc_j       equ     6[bp]
PasFunc_k       equ     4[bp]

PasFunc         proc    near
                push    bp
                mov     bp, sp
                push    ax
                mov     ax, PasFunc_i
                add     ax, PasFunc_j
                add     ax, PasFunc_k
                mov     PasFunc_rtn, ax
                pop     ax
                pop     bp
                ret     6
PasFunc         endp

Calling sequence:

                push    ax              ;Space for function return result
                mov     ax, 2
                push    ax
                push    n
                push    l
                call    PasFunc
                pop     ax              ;Get function return result

On an 80286 or later processor you could also use the code:

                push    ax              ;Space for function return result
                push    2
                push    n
                push    l
                call    PasFunc
                pop     ax              ;Get function return result
Although the caller pushed eight bytes of data onto the stack, PasFunc only removes six. The first “parameter” on the stack is the function result. The function must leave this value on the stack when it returns.
11.6.3 Returning Function Results in Memory Locations

Another reasonable place to return function results is in a known memory location. You can return function values in global variables or you can return a pointer (presumably in a register or a register pair) to a parameter block. This process is virtually identical to passing parameters to a procedure or function in global variables or via a parameter block.

Returning parameters via a pointer to a parameter block is an excellent way to return large data structures as function results. If a function returns an entire array, the best way to return this array is to allocate some storage, store the data into this area, and leave it up to the calling routine to deallocate the storage. Most high level languages that allow you to return large data structures as function results use this technique.

Of course, there is very little difference between returning a function result in memory and the pass by result parameter passing mechanism. See “Pass by Result” on page 576 for more details.
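The following sketch illustrates the pointer-to-a-block variant on a small scale. DivMod and DivResult are hypothetical names, the result block is assumed to live in the current data segment, and the routine assumes cx is nonzero on entry. It returns two values (quotient and remainder) in a static block and hands the caller a near pointer to that block in bx:

```asm
DivResult       word    ?               ;Quotient goes here
                word    ?               ;Remainder goes here
                 .
                 .
                 .
; DivMod-       Hypothetical example. Divides the unsigned value in
;               AX by the unsigned value in CX (assumed nonzero).
;               On exit, BX points at a two word block holding the
;               quotient followed by the remainder.

DivMod          proc    near
                push    ax
                push    dx
                xor     dx, dx          ;Zero extend ax into dx:ax
                div     cx              ;ax := quotient, dx := remainder
                mov     DivResult, ax
                mov     DivResult+2, dx
                lea     bx, DivResult   ;Return a pointer to the block
                pop     dx
                pop     ax
                ret
DivMod          endp
```

Because the block is static, a second call overwrites the previous result. A function that returns a large structure, or that must be reentrant, would allocate fresh storage for each result, as the text describes.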
11.7 Side Effects

A side effect is any computation or operation by a procedure that isn’t the primary purpose of that procedure. For example, if you elect not to preserve all affected registers within a procedure, the modification of those registers is a side effect of that procedure. Side effect programming, that is, the practice of using a procedure’s side effects, is very dangerous. All too often a programmer will rely on a side effect of a procedure. Later modifications may change the side effect, invalidating all code relying on that side effect. This can make your programs hard to debug and maintain. Therefore, you should avoid side effect programming.

Perhaps some examples of side effect programming will help enlighten you to the difficulties you may encounter. The following procedure zeros out an array. For efficiency reasons, it makes the caller responsible for preserving necessary registers. As a result, one side effect of this procedure is that the bx and cx registers are modified. In particular, the cx register contains zero upon return.

ClrArray        proc    near
                lea     bx, array
                mov     cx, 32
ClrLoop:        mov     word ptr [bx], 0
                inc     bx
                inc     bx
                loop    ClrLoop
                ret
ClrArray        endp
If your code expects cx to contain zero after the execution of this subroutine, you would be relying on a side effect of the ClrArray procedure. The main purpose behind this code is zeroing out an array, not setting the cx register to zero. Later, if you modify the ClrArray procedure to the following, your code that depends upon cx containing zero would no longer work properly:

ClrArray        proc    near
                lea     bx, array
ClrLoop:        mov     word ptr [bx], 0
                inc     bx
                inc     bx
                cmp     bx, offset array+64     ;32 words = 64 bytes
                jne     ClrLoop
                ret
ClrArray        endp
So how can you avoid the pitfalls of side effect programming in your procedures? By carefully structuring your code and paying close attention to exactly how your calling code and the subservient procedures interface with one another. These rules can help you avoid problems with side effect programming:

•   Always properly document the input and output conditions of a procedure. Never rely on any other entry or exit conditions other than these documented operations.
•   Partition your procedures so that they compute a single value or execute a single operation. Subroutines that do two or more tasks are, by definition, producing side effects unless every invocation of that subroutine requires all the computations and operations.
•   When updating the code in a procedure, make sure that it still obeys the entry and exit conditions. If not, either modify the program so that it does or update the documentation for that procedure to reflect the new entry and exit conditions.
•   Avoid passing information between routines in the CPU’s flag register. Passing an error status in the carry flag is about as far as you should ever go. Too many instructions affect the flags and it’s too easy to foul up a return sequence so that an important flag is modified on return.
•   Always save and restore all registers a procedure modifies.
•   Avoid passing parameters and function results in global variables.
•   Avoid passing parameters by reference (with the intent of modifying them for use by the calling code).
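As one possible illustration of the first and fifth rules, here is a sketch of ClrArray with its entry and exit conditions written down and with every register it touches preserved. It assumes, as in the earlier listings, that array is a 32 word array in the current data segment; the header comment layout is only a suggestion:

```asm
; ClrArray-     Zeros out the 32 word array "array".
;
; On entry:     ds addresses the segment containing array.
; On exit:      All 32 words of array contain zero.
;               All registers are preserved; the flags are modified.

ClrArray        proc    near
                push    bx
                push    cx
                lea     bx, array
                mov     cx, 32          ;Number of words to clear
ClrLoop:        mov     word ptr [bx], 0
                add     bx, 2           ;Move on to the next word
                loop    ClrLoop
                pop     cx
                pop     bx
                ret
ClrArray        endp
```

With the exit conditions stated this way, a caller has no documented reason to expect any particular value in cx, so the loop can later be rewritten without breaking code that follows the documentation.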
These rules, like all other rules, were meant to be broken. Good programming practices are often sacrificed on the altar of efficiency. There is nothing wrong with breaking these rules as often as you feel necessary. However, your code will be difficult to debug and maintain if you violate these rules often. But such is the price of efficiency (see footnote 8). Until you gain enough experience to make a judicious choice about the use of side effects in your programs, you should avoid them. More often than not, the use of a side effect will cause more problems than it solves.
8. This is not just a snide remark. Expert programmers who have to wring the last bit of performance out of a section of code often resort to poor programming practices in order to achieve their goals. They are prepared, however, to deal with the problems that are often encountered in such situations and they are a lot more careful when dealing with such code.
11.8 Local Variable Storage

Sometimes a procedure will require temporary storage that it no longer requires when the procedure returns. You can easily allocate such local variable storage on the stack.

The 80x86 supports local variable storage with the same mechanism it uses for parameters: it uses the bp and sp registers to access and allocate such variables. Consider the following Pascal program:

program LocalStorage;
var     i,j,k:integer;
        c: array [0..20000] of integer;

        procedure Proc1;
        var     a:array [0..30000] of integer;
                i:integer;
        begin

                {Code that manipulates a and i}

        end;

        procedure Proc2;
        var     b:array [0..20000] of integer;
                i:integer;
        begin

                {Code that manipulates b and i}

        end;

begin

        {main program that manipulates i,j,k, and c}

end.
Pascal normally allocates global variables in the data segment and local variables in the stack segment. Therefore, the program above allocates 50,004 words of local storage (30,002 words in Proc1 and 20,002 words in Proc2, counting each array plus the local variable i). This is above and beyond the other data on the stack (like return addresses). Since 50,004 words of storage consumes 100,008 bytes of storage you have a small problem: the 80x86 CPUs in real mode limit the stack segment to 65,536 bytes. Pascal avoids this problem by dynamically allocating local storage upon entering a procedure and deallocating local storage upon return. Unless Proc1 and Proc2 are both active (which can only occur if Proc1 calls Proc2 or vice versa), there is sufficient storage for this program. You don’t need the 30,002 words for Proc1 and the 20,002 words for Proc2 at the same time. So Proc1 allocates and uses 60,004 bytes of storage, then deallocates this storage and returns (freeing up the 60,004 bytes). Next, Proc2 allocates 40,004 bytes of storage, uses them, deallocates them, and returns to its caller. Note that Proc1 and Proc2 share many of the same memory locations. However, they do so at different times. As long as these variables are temporaries whose values you needn’t save from one invocation of the procedure to another, this form of local storage allocation works great.

The following comparison between a Pascal procedure and its corresponding assembly language code will give you a good idea of how to allocate local storage on the stack:

procedure LocalStuff(i,j,k:integer);
var l,m,n:integer; {local variables}
begin

        l := i+2;
        j := l*k+j;
        n := j-l;
        m := l+j+n;

end;

Calling sequence:

        LocalStuff(1,2,3);

Assembly language code:

LStuff_i        equ     8[bp]
LStuff_j        equ     6[bp]
LStuff_k        equ     4[bp]
LStuff_l        equ     -4[bp]
LStuff_m        equ     -6[bp]
LStuff_n        equ     -8[bp]

LocalStuff      proc    near
                push    bp
                mov     bp, sp
                push    ax
                sub     sp, 6           ;Allocate local variables.
L0:             mov     ax, LStuff_i
                add     ax, 2
                mov     LStuff_l, ax
                mov     ax, LStuff_l
                mul     LStuff_k
                add     ax, LStuff_j
                mov     LStuff_j, ax
                sub     ax, LStuff_l    ;AX already contains j
                mov     LStuff_n, ax
                add     ax, LStuff_l    ;AX already contains n
                add     ax, LStuff_j
                mov     LStuff_m, ax

                add     sp, 6           ;Deallocate local storage
                pop     ax
                pop     bp
                ret     6
LocalStuff      endp
The sub sp, 6 instruction makes room for three words on the stack. You can allocate l, m, and n in these three words. You can reference these variables by indexing off the bp register using negative offsets (see the code above). Upon reaching the statement at label L0, the stack looks something like Figure 11.16.

This code uses the matching add sp, 6 instruction at the end of the procedure to deallocate the local storage. The value you add to the stack pointer must exactly match the value you subtract when allocating this storage. If these two values don’t match, the stack pointer upon entry to the routine will not match the stack pointer upon exit; this is like pushing or popping too many items inside the procedure.

Unlike parameters, which have a fixed offset in the activation record, you can allocate local variables in any order. As long as you are consistent with your location assignments, you can allocate them in any way you choose. Keep in mind, however, that the 80x86 supports two forms of the disp[bp] addressing mode. It uses a one byte displacement when it is in the range -128..+127. It uses a two byte displacement for values in the range -32,768..+32,767. Therefore, you should place all primitive data types and other small structures close to the base pointer, so you can use single byte displacements. You should place large arrays and other data structures below the smaller variables on the stack.

Most of the time you don’t need to worry about allocating local variables on the stack. Most programs don’t require more than 64K of storage. The CPU processes global variables faster than local variables. There are two situations where allocating local variables as globals in the data segment is not practical: when interfacing assembly language to HLLs like Pascal, and when writing recursive code. When interfacing to Pascal, your assembly language code may not have a data segment it can use; recursion often requires multiple instances of the same local variable.
Figure 11.16 The Stack upon Entering the Next Procedure

        Offset from BP  Contents
             10         Previous Stack Contents
              8         Value of I Parameter
              6         Value of J Parameter
              4         Value of K parameter
              2         Return Address
              0         Original BP Value               <- BP
             -2         AX
             -4         Storage for L
             -6         Storage for M
             -8         Storage for N                   <- SP

        (Offsets shown apply if this is a NEAR procedure.)
11.9 Recursion

Recursion occurs when a procedure calls itself. The following, for example, is a recursive procedure:

Recursive       proc
                call    Recursive
                ret
Recursive       endp
Of course, the CPU will never execute the ret instruction at the end of this procedure. Upon entry into Recursive, this procedure will immediately call itself again and control will never pass to the ret instruction. In this particular case, run away recursion results in an infinite loop.

In many respects, recursion is very similar to iteration (that is, the repetitive execution of a loop). The following code also produces an infinite loop:

Recursive       proc
                jmp     Recursive
                ret
Recursive       endp
There is, however, one major difference between these two implementations. The former version of Recursive pushes a return address onto the stack with each invocation of the subroutine. This does not happen in the example immediately above (since the jmp instruction does not affect the stack).

Like a looping structure, recursion requires a termination condition in order to stop infinite recursion. Recursive could be rewritten with a termination condition as follows:

Recursive       proc
                dec     ax
                jz      QuitRecurse
                call    Recursive
QuitRecurse:    ret
Recursive       endp
This modification to the routine causes Recursive to call itself the number of times appearing in the ax register. On each call, Recursive decrements the ax register by one and calls itself again. Eventually, Recursive decrements ax to zero and returns. Once this happens, the CPU executes a string of ret instructions until control returns to the original call to Recursive.

So far, however, there hasn't been a real need for recursion. After all, you could efficiently code this procedure as follows:

Recursive       proc
RepeatAgain:    dec     ax
                jnz     RepeatAgain
                ret
Recursive       endp

Both examples would repeat the body of the procedure the number of times passed in the ax register (although the latter version will do it considerably faster since it doesn't have the overhead of the CALL/RET instructions). As it turns out, there are only a few recursive algorithms that you cannot implement in an iterative fashion. However, many recursively implemented algorithms are more efficient than their iterative counterparts and most of the time the recursive form of the algorithm is much easier to understand.

The quicksort algorithm is probably the most famous algorithm that almost always appears in recursive form. A Pascal implementation of this algorithm follows:

procedure quicksort(var a:ArrayToSort; Low,High: integer);

    procedure sort(l,r: integer);
    var i,j,Middle,Temp: integer;
    begin
        i:=l;
        j:=r;
        Middle:=a[(l+r) DIV 2];
        repeat
            while (a[i] < Middle) do i:=i+1;
            while (Middle < a[j]) do j:=j-1;
            if (i <= j) then begin
                Temp:=a[i];
                a[i]:=a[j];
                a[j]:=Temp;
                i:=i+1;
                j:=j-1;
            end;
        until i>j;
        if l<j then sort(l,j);
        if i

The sort subroutine is the recursive routine in this package. Recursion occurs at the last two if statements in the sort procedure. In assembly language, the sort routine looks something like this:
                include         stdlib.a
                includelib      stdlib.lib

cseg            segment
                assume  cs:cseg, ds:cseg, ss:sseg, es:cseg

; Main program to test sorting routine

Main            proc
                mov     ax, cs
                mov     ds, ax
                mov     es, ax

                mov     ax, 0
                push    ax
                mov     ax, 31
                push    ax
                call    sort

                ExitPgm                 ;Return to DOS
Main            endp

; Data to be sorted

a               word    31,30,29,28,27,26,25,24,23,22,21,20,19,18,17,16
                word    15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0

; procedure sort (l,r:integer)
; Sorts array A between indices l and r

l               equ     6[bp]
r               equ     4[bp]
i               equ     -2[bp]
j               equ     -4[bp]

sort            proc    near
                push    bp
                mov     bp, sp
                sub     sp, 4           ;Make room for i and j.

                mov     ax, l           ;i := l
                mov     i, ax
                mov     bx, r           ;j := r
                mov     j, bx

; Note: This computation of the address of a[(l+r) div 2] is kind of
; strange. Rather than divide by two, then multiply by two (since A is
; a word array), this code simply clears the L.O. bit of BX.

                add     bx, l           ;Middle := a[(l+r) div 2]
                and     bx, 0FFFEh
                mov     ax, a[bx]       ;BX*2, because this is a word
                                        ; array, nullifies the "div 2"
                                        ; above.

; Repeat until i > j: Of course, I and J are in BX and SI.

                lea     bx, a           ;Compute the address of a[i]
                add     bx, i           ; and leave it in BX.
                add     bx, i

                lea     si, a           ;Compute the address of a[j]
                add     si, j           ; and leave it in SI.
                add     si, j

RptLp:

; While (a[i] < Middle) do i := i + 1;

                sub     bx, 2           ;We'll increment it real soon.
WhlLp1:         add     bx, 2
                cmp     ax, [bx]        ;AX still contains middle
                jg      WhlLp1          ; value.

; While (Middle < a[j]) do j := j-1

                add     si, 2           ;We'll decrement it in loop
WhlLp2:         sub     si, 2
                cmp     ax, [si]        ;AX still contains middle
                jl      WhlLp2          ; value.
                cmp     bx, si
                jnle    SkipIf

; Swap, if necessary

                mov     dx, [bx]
                xchg    dx, [si]
                xchg    dx, [bx]

                add     bx, 2           ;Bump by two (integer values)
                sub     si, 2

SkipIf:         cmp     bx, si
                jng     RptLp

; Convert SI and BX back to I and J

                lea     ax, a
                sub     bx, ax
                shr     bx, 1
                sub     si, ax
                shr     si, 1

; Now for the recursive part:

                mov     ax, l
                cmp     ax, si
                jnl     NoRec1
                push    bx              ;Save i; sort does not preserve BX.
                push    ax
                push    si
                call    sort
                pop     bx              ;Restore i.

NoRec1:         cmp     bx, r
                jnl     NoRec2
                push    bx
                push    r
                call    sort

NoRec2:         mov     sp, bp
                pop     bp
                ret     4
sort            endp

cseg            ends

sseg            segment stack 'stack'
                word    256 dup (?)
sseg            ends
                end     Main
Other than some basic optimizations (like keeping several variables in registers), this code is almost a literal translation of the Pascal code. Note that the local variables i and j aren't necessary in this assembly language code (we could use registers to hold their values). Their use simply demonstrates the allocation of local variables on the stack.

There is one thing you should keep in mind when using recursion: recursive routines can eat up a considerable amount of stack space. Therefore, when writing recursive subroutines, always allocate sufficient memory in your stack segment. The example above has an extremely anemic 512 byte stack; however, it only sorts 32 numbers, so a 512 byte stack is sufficient. In general, you won't know the depth to which recursion will take you, so allocating a large block of memory for the stack may be appropriate.

There are several efficiency considerations that apply to recursive procedures. For example, the second (recursive) call to sort in the assembly language code above need not be a recursive call. By setting up a couple of variables and registers, a simple jmp instruction can replace the pushes and the recursive call. This will improve the performance of the quicksort routine (quite a bit, actually) and will reduce the amount of memory the stack requires. A good book on algorithms, such as D.E. Knuth's The Art of Computer Programming, Volume 3, would be an excellent source of additional material on quicksort. Other texts on algorithm complexity, recursion theory, and algorithms would be a good place to look for ideas on efficiently implementing recursive algorithms.
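The effect of replacing the second recursive call with a jump can be sketched in Python. This is a model of the idea, not a translation of the assembly code: the function names, the depth bookkeeping, and the use of a while loop to stand in for the jmp instruction are all assumptions made for illustration.

```python
def partition(a, l, r):
    """Partition a[l..r] around the middle element, as in the Pascal sort."""
    i, j = l, r
    middle = a[(l + r) // 2]
    while True:                      # repeat ... until i > j
        while a[i] < middle:
            i += 1
        while middle < a[j]:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
        if i > j:
            return i, j


def sort_recursive(a, l, r, depth=1):
    """Sort both halves with recursive calls; return the deepest nesting."""
    i, j = partition(a, l, r)
    deepest = depth
    if l < j:
        deepest = max(deepest, sort_recursive(a, l, j, depth + 1))
    if i < r:
        deepest = max(deepest, sort_recursive(a, i, r, depth + 1))
    return deepest


def sort_with_jump(a, l, r, depth=1):
    """Same algorithm, but the second call becomes a jump back to the top."""
    deepest = depth
    while True:
        i, j = partition(a, l, r)
        if l < j:
            deepest = max(deepest, sort_with_jump(a, l, j, depth + 1))
        if i < r:
            l = i                    # set up the new left bound, then "jmp"
            continue
        return deepest


data = list(range(31, -1, -1))       # the 32 words from the listing above
first, second = data[:], data[:]
depth_recursive = sort_recursive(first, 0, 31)
depth_with_jump = sort_with_jump(second, 0, 31)
print(depth_recursive, depth_with_jump)
```

Both versions perform the same partitions, but the second never nests deeper than the first, because work on the right-hand part reuses the current activation record instead of creating a new one.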
11.10 Sample Program

The following sample program demonstrates several concepts appearing in this chapter, most notably passing parameters on the stack. This program (Pgm11_1.asm appearing on the companion CD-ROM) manipulates the PC's memory-mapped text video display screen (at address B800:0 for color displays, B000:0 for monochrome displays). It provides routines that "capture" all the data on the screen to an array, write the contents of an array to the screen, clear the screen, scroll one line up or down, position the cursor at an (X,Y) coordinate, and retrieve the current cursor position.

Note that this code was written to demonstrate the use of parameters and local variables. Therefore, it is rather inefficient. As the comments point out, many of the functions this package provides could be written to run much faster using the 80x86 string instructions. See the laboratory exercises for a different version of some of these functions that is written in such a fashion.

Also note that this code makes some calls to the PC's BIOS to set and obtain the cursor position as well as clear the screen. See the chapter on BIOS and DOS for more details on these BIOS calls.

; Pgm11_1.asm
;
; Screen Aids.
;
; This program provides some useful screen manipulation routines
; that let you do things like position the cursor, save and restore
; the contents of the display screen, clear the screen, etc.
;
; This program is not very efficient. It was written to demonstrate
; parameter passing, use of local variables, and direct conversion of
; loops to assembly language. There are far better ways of doing
; what this program does (running about 5-10x faster) using the
; 80x86 string instructions.
                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

                .386                    ;Comment out these two statements
                option  segment:use16   ; if you are not using an 80386.

; ScrSeg- This is the video screen's segment address. It should be
;         B000 for mono screens and B800 for color screens.

ScrSeg          =       0B800h

dseg            segment para public 'data'

XPosn           word    ?               ;Cursor X-Coordinate (0..79)
YPosn           word    ?               ;Cursor Y-Coordinate (0..24)

; The following array holds a copy of the initial screen data.

SaveScr         word    25 dup (80 dup (?))

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; Capture-      Copies the data on the screen to the array passed
;               by reference as a parameter.
;
; procedure Capture(var ScrCopy:array[0..24,0..79] of word);
; var x,y:integer;
; begin
;
;       for y := 0 to 24 do
;           for x := 0 to 79 do
;               ScrCopy[y,x] := SCREEN[y,x];
; end;
;
; Activation record for Capture:
;
;       |                       |
;       | Previous stk contents |
;       -------------------------
;       |  ScrCopy Seg Adrs     |
;       --                     --
;       |  ScrCopy offset Adrs  |
;       -------------------------
;       |  Return Adrs (near)   |
;       -------------------------
;       |        Old BP         |
;       ------------------------- <- BP
;       |  X coordinate value   |
;       -------------------------
;       |  Y coordinate value   |
;       -------------------------
;       |    Registers, etc.    |
;       ------------------------- <- SP

ScrCopy_cap     textequ <dword ptr [bp+4]>
X_cap           textequ <word ptr [bp-2]>
Y_cap           textequ <word ptr [bp-4]>

Capture         proc
                push    bp
                mov     bp, sp
                sub     sp, 4           ;Allocate room for locals.

                push    es
                push    ds
                push    ax
                push    bx
                push    di

                mov     bx, ScrSeg      ;Set up pointer to SCREEN
                mov     es, bx          ; memory (ScrSeg:0).

                lds     di, ScrCopy_cap ;Get ptr to capture array.

                mov     Y_cap, 0
YLoop:          mov     X_cap, 0

XLoop:          mov     bx, Y_cap
                imul    bx, 80          ;Screen memory is a 25x80 array
                add     bx, X_cap       ; stored in row major order
                add     bx, bx          ; with two bytes per element.

                mov     ax, es:[bx]     ;Read character code from screen.
                mov     [di][bx], ax    ;Store away into capture array.

                inc     X_cap           ;Repeat for each character on this
                cmp     X_cap, 80       ; row of characters (each character
                jb      XLoop           ; in the row is two bytes).

                inc     Y_cap           ;Repeat for each row on the screen.
                cmp     Y_cap, 25
                jb      YLoop

                pop     di
                pop     bx
                pop     ax
                pop     ds
                pop     es
                mov     sp, bp
                pop     bp
                ret     4
Capture         endp

; Fill-         Copies array passed by reference onto the screen.
;
; procedure Fill(var ScrCopy:array[0..24,0..79] of word);
; var x,y:integer;
; begin
;
;       for y := 0 to 24 do
;           for x := 0 to 79 do
;               SCREEN[y,x] := ScrCopy[y,x];
; end;
;
; Activation record for Fill:
;
;       |                       |
;       | Previous stk contents |
;       -------------------------
;       |  ScrCopy Seg Adrs     |
;       --                     --
;       |  ScrCopy offset Adrs  |
;       -------------------------
;       |  Return Adrs (near)   |
;       -------------------------
;       |        Old BP         |
;       ------------------------- <- BP
;       |  X coordinate value   |
;       -------------------------
;       |  Y coordinate value   |
;       -------------------------
;       |    Registers, etc.    |
;       ------------------------- <- SP

ScrCopy_fill    textequ <dword ptr [bp+4]>
X_fill          textequ <word ptr [bp-2]>
Y_fill          textequ <word ptr [bp-4]>

Fill            proc
                push    bp
                mov     bp, sp
                sub     sp, 4

                push    es
                push    ds
                push    ax
                push    bx
                push    di

                mov     bx, ScrSeg      ;Set up pointer to SCREEN
                mov     es, bx          ; memory (ScrSeg:0).

                lds     di, ScrCopy_fill ;Get ptr to data array.

                mov     Y_fill, 0
YLoop:          mov     X_fill, 0

XLoop:          mov     bx, Y_fill
                imul    bx, 80          ;Screen memory is a 25x80 array
                add     bx, X_fill      ; stored in row major order
                add     bx, bx          ; with two bytes per element.

                mov     ax, [di][bx]    ;Fetch word from the data array.
                mov     es:[bx], ax     ;Store it into screen memory.

                inc     X_fill          ;Repeat for each character on this
                cmp     X_fill, 80      ; row of characters (each character
                jb      XLoop           ; in the row is two bytes).

                inc     Y_fill          ;Repeat for each row on the screen.
                cmp     Y_fill, 25
                jb      YLoop

                pop     di
                pop     bx
                pop     ax
                pop     ds
                pop     es
                mov     sp, bp
                pop     bp
                ret     4
Fill            endp

; Scroll_up-    Scrolls the screen up one line. It does this by copying
;               the second line to the first, the third line to the
;               second, the fourth line to the third, etc.
;
; procedure Scroll_up;
; var x,y:integer;
; begin
;       for y := 1 to 24 do
;           for x := 0 to 79 do
;               SCREEN[Y-1,X] := SCREEN[Y,X];
; end;
;
; Activation record for Scroll_up:
;
;       |                       |
;       | Previous stk contents |
;       -------------------------
;       |  Return Adrs (near)   |
;       -------------------------
;       |        Old BP         |
;       ------------------------- <- BP
;       |  X coordinate value   |
;       -------------------------
;       |  Y coordinate value   |
;       -------------------------
;       |    Registers, etc.    |
;       ------------------------- <- SP

X_su            textequ <word ptr [bp-2]>
Y_su            textequ <word ptr [bp-4]>

Scroll_up       proc
                push    bp
                mov     bp, sp
                sub     sp, 4           ;Make room for X, Y.

                push    ds
                push    ax
                push    bx

                mov     ax, ScrSeg
                mov     ds, ax
                mov     Y_su, 0
su_Loop1:       mov     X_su, 0

su_Loop2:       mov     bx, Y_su        ;Compute index into screen
                imul    bx, 80          ; array.
                add     bx, X_su
                add     bx, bx          ;Remember, this is a word array.

                mov     ax, [bx+160]    ;Fetch word from source line.
                mov     [bx], ax        ;Store into dest line.

                inc     X_su
                cmp     X_su, 80
                jb      su_Loop2

                inc     Y_su
                cmp     Y_su, 24        ;Dest rows are 0..23 (sources 1..24).
                jb      su_Loop1

                pop     bx
                pop     ax
                pop     ds
                mov     sp, bp
                pop     bp
                ret
Scroll_up       endp

; Scroll_dn-    Scrolls the screen down one line. It does this by copying
;               the 24th line to the 25th, the 23rd line to the 24th,
;               the 22nd line to the 23rd, etc.
;
; procedure Scroll_dn;
; var x,y:integer;
; begin
;       for y := 23 downto 0 do
;           for x := 0 to 79 do
;               SCREEN[Y+1,X] := SCREEN[Y,X];
; end;
;
; Activation record for Scroll_dn:
;
;       |                       |
;       | Previous stk contents |
;       -------------------------
;       |  Return Adrs (near)   |
;       -------------------------
;       |        Old BP         |
;       ------------------------- <- BP
;       |  X coordinate value   |
;       -------------------------
;       |  Y coordinate value   |
;       -------------------------
;       |    Registers, etc.    |
;       ------------------------- <- SP

X_sd            textequ <word ptr [bp-2]>
Y_sd            textequ <word ptr [bp-4]>

Scroll_dn       proc
                push    bp
                mov     bp, sp
                sub     sp, 4           ;Make room for X, Y.

                push    ds
                push    ax
                push    bx

                mov     ax, ScrSeg
                mov     ds, ax
                mov     Y_sd, 23
sd_Loop1:       mov     X_sd, 0

sd_Loop2:       mov     bx, Y_sd        ;Compute index into screen
                imul    bx, 80          ; array.
                add     bx, X_sd
                add     bx, bx          ;Remember, this is a word array.

                mov     ax, [bx]        ;Fetch word from source line.
                mov     [bx+160], ax    ;Store into dest line.

                inc     X_sd
                cmp     X_sd, 80
                jb      sd_Loop2

                dec     Y_sd
                cmp     Y_sd, 0
                jge     sd_Loop1

                pop     bx
                pop     ax
                pop     ds
                mov     sp, bp
                pop     bp
                ret
Scroll_dn       endp

; GotoXY-       Positions the cursor at the specified X, Y coordinate.
;
; procedure gotoxy(x,y:integer);
; begin
;       BIOS(posnCursor,x,y);
; end;
;
; Activation record for GotoXY
;
;       |                       |
;       | Previous stk contents |
;       -------------------------
;       |  X coordinate value   |
;       -------------------------
;       |  Y coordinate value   |
;       -------------------------
;       |  Return Adrs (near)   |
;       -------------------------
;       |        Old BP         |
;       ------------------------- <- BP
;       |    Registers, etc.    |
;       ------------------------- <- SP

X_gxy           textequ <byte ptr [bp+6]>
Y_gxy           textequ <byte ptr [bp+4]>

GotoXY          proc
                push    bp
                mov     bp, sp
                push    ax
                push    bx
                push    dx

                mov     ah, 2           ;Magic BIOS value for gotoxy.
                mov     bh, 0           ;Display page zero.
                mov     dh, Y_gxy       ;Set up BIOS (X,Y) parameters.
                mov     dl, X_gxy
                int     10h             ;Make the BIOS call.

                pop     dx
                pop     bx
                pop     ax
                mov     sp, bp
                pop     bp
                ret     4
GotoXY          endp

; GetX-         Returns cursor X-Coordinate in the AX register.

GetX            proc
                push    bx
                push    cx
                push    dx

                mov     ah, 3           ;Read X, Y coordinates from
                mov     bh, 0           ; BIOS
                int     10h

                mov     al, dl          ;Return X coordinate in AX.
                mov     ah, 0

                pop     dx
                pop     cx
                pop     bx
                ret
GetX            endp

; GetY-         Returns cursor Y-Coordinate in the AX register.

GetY            proc
                push    bx
                push    cx
                push    dx

                mov     ah, 3
                mov     bh, 0
                int     10h

                mov     al, dh          ;Return Y Coordinate in AX.
                mov     ah, 0

                pop     dx
                pop     cx
                pop     bx
                ret
GetY            endp

; ClearScrn-    Clears the screen and positions the cursor at (0,0).
;
; procedure ClearScrn;
; begin
;       BIOS(Initialize)
; end;

ClearScrn       proc
                push    ax
                push    bx
                push    cx
                push    dx

                mov     ah, 6           ;Magic BIOS number.
                mov     al, 0           ;Clear entire screen.
                mov     bh, 07          ;Clear with black spaces.
                mov     cx, 0000        ;Upper left corner is (0,0)
                mov     dl, 79          ;Lower X-coordinate
                mov     dh, 24          ;Lower Y-coordinate
                int     10h             ;Make the BIOS call.

                push    0               ;Position the cursor to (0,0)
                push    0               ; after the call.
                call    GotoXY

                pop     dx
                pop     cx
                pop     bx
                pop     ax
                ret
ClearScrn       endp

; A short main program to test out the above:

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

; Save the screen as it looks when this program is run.

                push    seg SaveScr
                push    offset SaveScr
                call    Capture

                call    GetX
                mov     XPosn, ax

                call    GetY
                mov     YPosn, ax

; Clear the screen to prepare for our stuff.

                call    ClearScrn

; Position the cursor in the middle of the screen and print some stuff.

                push    30              ;X value
                push    10              ;Y value
                call    GotoXY

                print
                byte    "Screen Manipulation Demo",0

                push    30
                push    11
                call    GotoXY

                print
                byte    "Press any key to continue",0

                getc

; Scroll the screen up two lines

                call    Scroll_up
                call    Scroll_up
                getc

; Scroll the screen down four lines:

                call    Scroll_dn
                call    Scroll_dn
                call    Scroll_dn
                call    Scroll_dn
                getc

; Restore the screen to what it looked like prior to this call.

                push    seg SaveScr
                push    offset SaveScr
                call    Fill

                push    XPosn
                push    YPosn
                call    GotoXY

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main
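The index arithmetic shared by Capture, Fill, and the scroll routines in the listing above can be checked with a small Python model. This is only a sketch: the screen is represented as a flat list of 25 x 80 words rather than real video memory, and the function names are assumptions made for this illustration.

```python
ROWS, COLS = 25, 80


def screen_offset(x, y):
    """Byte offset of cell (x, y) in a row-major 25x80 array of words."""
    return (y * COLS + x) * 2


def scroll_up(screen):
    """Copy row y+1 onto row y for y = 0..23; the last row is left as is."""
    for y in range(ROWS - 1):
        for x in range(COLS):
            screen[y * COLS + x] = screen[(y + 1) * COLS + x]


def scroll_down(screen):
    """Copy row y onto row y+1 for y = 23 down to 0; row 0 is left as is."""
    for y in range(ROWS - 2, -1, -1):
        for x in range(COLS):
            screen[(y + 1) * COLS + x] = screen[y * COLS + x]


# Fill each row of a model screen with its own row number.
screen = [y for y in range(ROWS) for _ in range(COLS)]
scroll_up(screen)
print(screen[0], screen[23 * COLS], screen[24 * COLS])
print(screen_offset(0, 1), screen_offset(79, 24))
```

The offset of the cell directly below any position is 160 bytes larger, which is why the scroll routines address the neighboring line as [bx+160]. The outer loop must stop one row short of the bottom (or top) of the screen, since the source row would otherwise lie outside the 25 lines.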
11.11 Laboratory Exercises

This laboratory exercise demonstrates how a C/C++ program calls some assembly language functions. This exercise consists of two program units: a Borland C++ program (Ex11_1.cpp) and a MASM 6.11 program (Ex11_1a.asm). Since you may not have access to a C++ compiler (and Borland C++ in particular) [10], the EX11.EXE file contains a precompiled and linked version of these files. If you have a copy of Borland C++ then you can compile/assemble these files using the makefile that also appears in the Chapter 11 subdirectory.

The C++ program listing appears in Section 11.11.1. This program clears the screen and then bounces a pound sign ("#") around the screen until the user presses any key. Then this program restores the screen to the display that was present before running the program and quits. All screen manipulation, as well as testing for a keypress, is taken care of by functions written in assembly language. The "extern" statements at the beginning of the program provide the linkage to these assembly language functions [11].

10. There is nothing Borland specific in this C++ program. Borland was chosen because it provides an option that generates well annotated assembly output.
11. The extern "C" phrase instructs Borland C++ to generate standard C external names rather than C++ mangled names. A C external name is the function name with an underscore in front of it (e.g., GotoXY becomes _GotoXY). C++ completely changes the name to handle overloading and it is difficult to predict the actual name of the corresponding assembly language function.

There are a few important things to note about how C/C++ passes parameters to an assembly language function:

• C++ pushes parameters on the stack in the reverse order that they appear in a parameter list. For example, for the call "f(a,b);" C++ would push b first and a second. This is the opposite of most of the examples in this chapter.
• In C++, the caller is responsible for removing parameters from the stack. In this chapter, the callee (the function itself) usually removed the parameters by specifying some value after the ret instruction. Assembly language functions that C++ calls must not do this.
• C++ on the PC uses different memory models to control whether pointers and functions are near or far. This particular program uses the compact memory model. This provides for near procedures and far pointers. Therefore, all calls will be near (with only a two-byte return address on the stack) and all pointers to data objects will be far.
• Borland C++ requires a function to preserve the segment registers, BP, DI, and SI. The function need not preserve any other registers.
• If an assembly language function needs to return a 16-bit function result to C++, it must return this value in the AX register. See the Borland C++ Programmer's Guide (or corresponding manual for your C++ compiler) for more details about the C/C++ and assembly language interface.
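The first two points in the list above, parameter order and who removes the parameters, can be modeled with a Python list standing in for the stack. This is an illustrative sketch only; the function names are hypothetical, each list element represents one word, and the end of the list is the top of the stack.

```python
def near_call(stack, callee):
    """Model a near call: push a return address, then run the callee."""
    stack.append("return address")
    return callee(stack)


def ret(stack, param_bytes=0):
    """Model ret / ret n: pop the return address, then n bytes of parameters."""
    stack.pop()
    for _ in range(param_bytes // 2):
        stack.pop()


def c_callee(stack):
    value = stack[-2]          # the word just above the return address
    ret(stack)                 # plain ret: parameters stay on the stack
    return value


def call_c_style(stack, args):
    for value in reversed(args):              # push right to left
        stack.append(value)
    result = near_call(stack, c_callee)
    del stack[len(stack) - len(args):]        # caller cleans up (add sp, n)
    return result


def chapter_callee(stack):
    value = stack[-2]
    ret(stack, param_bytes=4)  # callee removes its two word parameters
    return value


def call_chapter_style(stack, args):
    for value in args:                        # push left to right
        stack.append(value)
    return near_call(stack, chapter_callee)


stack = []
print(call_c_style(stack, ["a", "b"]), stack)
print(call_chapter_style(stack, ["a", "b"]), stack)
```

In the C-style call the word just above the return address is the first parameter, whereas with the convention used earlier in this chapter it is the last one. Either way the stack ends up balanced; the difference is which side of the call does the cleanup.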
Most C++ compilers give you the option of generating assembly language output rather than binary machine code. Borland C++ is nice because it generates well-annotated assembly output with comments pointing out which C++ statements correspond to a given sequence of assembly language instructions. The assembly language output of BCC appears in Section 11.11.2 (this is a slightly edited version to remove some superfluous information). Look over this code and note that, subject to the rules above, the C++ compiler emits code that is very similar to that described throughout this chapter.

The Ex11_1a.asm file (see Section 11.11.3) is the actual assembly code the C++ program calls. This contains the functions for the GotoXY, GetXY, ClrScrn, tstKbd, Capture, PutScr, PutChar, and PutStr routines that Ex11_1.cpp calls.

To avoid legal software distribution problems, this particular C/C++ program does not include any calls to C/C++ Standard Library functions. Furthermore, it does not use the standard C0m.obj file from Borland that calls the main program. Borland's liberal license agreement does not allow one to distribute their libraries and object modules unlinked with other code. The assembly language code provides the necessary I/O routines and it also provides a startup routine (StartPgm) that calls the C++ main program when DOS/Windows transfers control to the program. By supplying the routines this way, you do not need the Borland libraries or object code to link these programs together.

One side effect of linking the modules in this fashion is that the compiler, assembler, and linker cannot store the correct source level debugging information in the .exe file. Therefore, you will not be able to use CodeView to view the actual source code. Instead, you will have to work with disassembled machine code. This is where the assembly output from Borland C++ (Ex11_1.asm) comes in handy. As you single step through the main C++ program you can trace the program flow by looking at the Ex11_1.asm file.

For your lab report: single step through the StartPgm code until it calls the C++ main function. When this happens, locate the calls to the routines in Ex11_1a.asm. Set breakpoints on each of these calls using the F9 key. Run up to each breakpoint and then single step into the function using the F8 key. Once inside, display the memory locations starting at SS:SP. Identify each parameter passed on the stack. For reference parameters, you may want to look at the memory locations whose address appears on the stack. Report your findings in your lab report.

Include a printout of the Ex11_1.asm file and identify those instructions that push each parameter onto the stack. At run time, determine the values that each parameter push sequence pushes onto the stack and include these values in your lab report.

Many of the functions in the assembly file take a considerable amount of time to execute. Therefore, you should not single step through each of the functions. Instead, make sure you've set up the breakpoints on each of the call instructions in the C++ program and use the F5 key to run (at full speed) up to the next function call.
11.11.1 Ex11_1.cpp

extern "C" void GotoXY(unsigned y, unsigned x);
extern "C" void GetXY(unsigned &x, unsigned &y);
extern "C" void ClrScrn();
extern "C" int tstKbd();
extern "C" void Capture(unsigned ScrCopy[25][80]);
extern "C" void PutScr(unsigned ScrCopy[25][80]);
extern "C" void PutChar(char ch);
extern "C" void PutStr(char *ch);

int main()
{
    unsigned SaveScr[25][80];

    int         dx, x, dy, y;
    long        i;
    unsigned    savex, savey;

    GetXY(savex, savey);
    Capture(SaveScr);
    ClrScrn();

    GotoXY(24,0);
    PutStr("Press any key to quit");

    dx = 1;
    dy = 1;
    x = 1;
    y = 1;
    while (!tstKbd())
    {
        GotoXY(y, x);
        PutChar('#');

        for (i=0; i<500000; ++i);

        GotoXY(y, x);
        PutChar(' ');

        x += dx;
        y += dy;
        if (x >= 79)
        {
            x = 78;
            dx = -1;
        }
        else if (x <= 0)
        {
            x = 1;
            dx = 1;
        }

        if (y >= 24)
        {
            y = 23;
            dy = -1;
        }
        else if (y <= 0)
        {
            y = 1;
            dy = 1;
        }
    }

    PutScr(SaveScr);
    GotoXY(savey, savex);
    return 0;
}
11.11.2 Ex11_1.asm
_TEXT segment byte public 'CODE' _TEXT ends DGROUP group _DATA,_BSS assume cs:_TEXT,ds:DGROUP _DATA segment word public 'DATA' d@ label byte d@w label word _DATA ends _BSS segment word public 'BSS' b@ label byte b@w label word _BSS ends _TEXT ; ; ; _main
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
segment byte public 'CODE' int main() assume proc push mov sub push push { unsigned SaveScr[25][80]; int
dx, x, dy, y;
long
i;
unsigned
savex, savey;
GetXY(savex, savey); push lea push push lea push call add
; ; ;
cs:_TEXT near bp bp,sp sp,4012 si di
ss ax,word ptr [bp-12] ax ss ax,word ptr [bp-10] ax near ptr _GetXY sp,8
Capture(SaveScr); push lea push call pop pop
ss ax,word ptr [bp-4012] ax near ptr _Capture cx cx
;
Chapter 11 ; ;
ClrScrn(); call
; ; ; ;
near ptr _ClrScrn
GotoXY(24,0); xor push mov push call pop pop
; ; ;
ax,ax ax ax,24 ax near ptr _GotoXY cx cx
PutStr("Press any key to quit"); push mov push call pop pop
; ; ; ;
ds ax,offset DGROUP:s@ ax near ptr _PutStr cx cx
dx = 1; mov
; ; ;
word ptr [bp-2],1 dy = 1;
mov ; ; ;
word ptr [bp-4],1 x = 1;
mov ; ; ;
si,1 y = 1;
mov jmp @1@58: ; ; ; ; ; ;
di,1 @1@422
while (!tstKbd()) { GotoXY(y, x); push push call pop pop
; ; ;
si di near ptr _GotoXY cx cx PutChar('#');
mov push call pop ; ; ; ;
al,35 ax near ptr _PutChar cx
for (i=0; i<500000; ++i); mov mov jmp
word ptr [bp-6],0 word ptr [bp-8],0 short @1@114
add adc
word ptr [bp-8],1 word ptr [bp-6],0
@1@86:
Procedures and Functions @1@114: cmp jl jne cmp jb @1@198: ; ; ; ;
word ptr [bp-6],7 short @1@86 short @1@198 word ptr [bp-8],-24288 short @1@86
GotoXY(y, x); push push call pop pop
; ; ;
si di near ptr _GotoXY cx cx PutChar(' ');
mov push call pop ; ; ; ; ; ;
al,32 ax near ptr _PutChar cx
x += dx; add
; ; ;
si,word ptr [bp-2] y += dy;
add ; ; ;
di,word ptr [bp-4] if (x >= 79)
cmp jl ; ; ; ;
si,79 short @1@254 { x = 78;
mov
si,78
; ; ;
dx = -1; mov
; ; ;
word ptr [bp-2],-1 }
jmp @1@254: ; ; ;
short @1@310
else if (x <= 0) or jg
; ; ; ;
si,si short @1@310 { x = 1;
mov
si,1
; ; ;
dx = 1; mov
@1@310: ; ;
word ptr [bp-2],1
}
Chapter 11 ; ; ;
if (y >= 24) cmp jl
di,24 short @1@366
; ; ; ;
{ y = 23; mov
di,23
; ; ;
dy = -1; mov
word ptr [bp-4],-1
; ; ;
} jmp
short @1@422
@1@366: ; ; ;
else if (y <= 0) or jg
di,di short @1@422
; ; ; ;
{ y = 1; mov
di,1
; ; ;
dy = 1; mov
word ptr [bp-4],1
call or jne jmp
near ptr _tstKbd ax,ax @@0 @1@58
@1@422:
@@0: ; ; ; ; ; ; ; ;
}
} PutScr(SaveScr); push lea push call pop pop
; ; ;
GotoXY(savey, savex); push push call pop pop
; ; ;
word ptr [bp-10] word ptr [bp-12] near ptr _GotoXY cx cx
return 0; xor jmp
@1@478: ; ; } ;
ss ax,word ptr [bp-4012] ax near ptr _PutScr cx cx
ax,ax short @1@478
_main
pop pop mov pop ret endp
_TEXT
ends
_DATA s@
segment word public 'DATA' label byte db 'Press any key to quit' db 0 ends segment byte public 'CODE' ends public _main extrn _PutStr:near extrn _PutChar:near extrn _PutScr:near extrn _Capture:near extrn _tstKbd:near extrn _ClrScrn:near extrn _GetXY:near extrn _GotoXY:near equ s@ end
_DATA _TEXT _TEXT
_s@
di si sp,bp bp
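One detail of the compiler output above that is easy to miss is how the loop test i<500000 works on a 32-bit long using 16-bit instructions: the code compares the high word at [bp-6] against 7 and the low word at [bp-8] against -24288. The Python sketch below shows where those two constants come from. The function names are made up for this illustration, and the comparison model assumes a non-negative loop counter, as in the program.

```python
def split_dword(value):
    """Split a 32-bit value into (high word, low word, low word as signed)."""
    high = (value >> 16) & 0xFFFF
    low = value & 0xFFFF
    low_signed = low - 0x10000 if low & 0x8000 else low
    return high, low, low_signed


def less_than(high, low, limit_high, limit_low):
    """Mirror the jl / jne / jb sequence used for the 32-bit comparison."""
    if high < limit_high:      # jl: high word already decides the result
        return True
    if high != limit_high:     # jne: high word is larger, so not less
        return False
    return low < limit_low     # jb: unsigned compare of the low words


limit_high, limit_low, limit_low_signed = split_dword(500000)
print(limit_high, limit_low, limit_low_signed)
```

The low word of 500000 has its top bit set, so the assembler listing shows it as a negative 16-bit constant even though the jb instruction treats it as unsigned.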
11.11.3 EX11_1a.asm

; Assembly code to link with a C/C++ program.
; This code directly manipulates the screen giving C++
; direct access control of the screen.
;
; Note: Like PGM11_1.ASM, this code is relatively inefficient.
; It could be sped up quite a bit using the 80x86 string instructions.
; However, its inefficiency is actually a plus here since we don't
; want the C/C++ program (Ex11_1.cpp) running too fast anyway.
;
; This code assumes that Ex11_1.cpp is compiled using the COMPACT
; memory model (near procs and far pointers).

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

                .386                    ;Comment out these two statements
                option  segment:use16   ; if you are not using an 80386.

; ScrSeg- This is the video screen's segment address. It should be
;         B000 for mono screens and B800 for color screens.

ScrSeg          =       0B800h

_TEXT           segment para public 'CODE'
                assume  cs:_TEXT

; _Capture-     Copies the data on the screen to the array passed
;               by reference as a parameter.
;
; procedure Capture(var ScrCopy:array[0..24,0..79] of word);
; var x,y:integer;
; begin
;
;       for y := 0 to 24 do
;           for x := 0 to 79 do
;               ScrCopy[y,x] := SCREEN[y,x];
; end;
;
; Activation record for Capture:
;
;       |                       |
;       | Previous stk contents |
;       -------------------------
;       |  ScrCopy Seg Adrs     |
;       --                     --
;       |  ScrCopy offset Adrs  |
;       -------------------------
;       | Return Adrs (offset)  |
;       -------------------------
;       |  X coordinate value   |
;       -------------------------
;       |  Y coordinate value   |
;       -------------------------
;       |    Registers, etc.    |
;       ------------------------- <- SP
ScrCopy_cap X_cap Y_cap
_Capture
rep
_Capture
; ; ; ; ; ; ; ; ; ; ;
_PutScr-
textequ textequ textequ
<word ptr [bp-2]> <word ptr [bp-4]>
public proc push mov
_Capture near bp bp, sp
push push push push pushf cld
es ds si di
mov mov sub
si, ScrSeg ds, si si, si
;Set up pointer to SCREEN ; memory (ScrSeg:0).
les
di, ScrCopy_cap
;Get ptr to capture array.
mov movsd
cx, 1000
;1000 dwords (4000 bytes) on the screen
popf pop pop pop pop mov pop ret endp
di si ds es sp, bp bp
Copies array passed by reference onto the screen.
procedure PutScr(var ScrCopy:array[0..24,0..79] of word); var x,y:integer; begin for y := 0 to 24 do for x := 0 to 79 do SCREEN[y,x] := ScrCopy[y,x]; end;
Procedures and Functions ; ; Activation record for PutScr: ; ; | | ; | Previous stk contents | ; ------------------------; | ScrCopy Seg Adrs | ; --; | ScrCopy offset Adrs | ; ------------------------; | Return Adrs (offset) | ; ------------------------; | BP Value | <- BP ; ------------------------; | X coordinate value | ; ------------------------; | Y coordinate value | ; ------------------------; | Registers, etc. | ; ------------------------- <- SP
ScrCopy_fill X_fill Y_fill
public proc push mov
_PutScr
rep
_PutScr
; ; ; ; ; ; ; ; ;
textequ textequ textequ
<word ptr [bp-2]> <word ptr [bp-4]> _PutScr near bp bp, sp
push push push push pushf cld
es ds si di
mov mov sub
di, ScrSeg es, di di, di
;Set up pointer to SCREEN ; memory (ScrSeg:0).
lds
si, ScrCopy_fill
;Get ptr to capture array.
mov movsd
cx, 1000
;1000 dwords on the screen
popf pop pop pop pop mov pop ret endp
di si ds es sp, bp bp
GotoXY-Positions the cursor at the specified X, Y coordinate. procedure gotoxy(y,x:integer); begin BIOS(posnCursor,x,y); end; Activation record for GotoXY
Chapter 11 ; ; ; ; ; ; ; ; ; ; ; ; ;
| | | Previous stk contents | ------------------------| X coordinate value | ------------------------| Y coordinate value | ------------------------| Return Adrs (offset) | ------------------------| Old BP | ------------------------- <- BP | Registers, etc. | ------------------------- <- SP
X_gxy Y_gxy
textequ textequ
_GotoXY
ClrScrn-
mov mov mov mov int
ah, bh, dh, dl, 10h
2 0 Y_gxy X_gxy
;Magic BIOS value for gotoxy. ;Display page zero. ;Set up BIOS (X,Y) parameters. ;Make the BIOS call.
sp, bp bp
Clears the screen and positions the cursor at (0,0).
procedure ClrScrn; begin BIOS(Initialize) end; Activation record for ClrScrn | | | Previous stk contents | ------------------------| Return Adrs (offset) | ------------------------- <- SP
_ClrScrn
_ClrScrn
_GotoXY near bp bp, sp
mov pop ret endp
_GotoXY
; ; ; ; ; ; ; ; ; ; ; ; ; ;
public proc push mov
public proc
_ClrScrn near
mov mov mov mov mov mov int
ah, al, bh, cx, dl, dh, 10h
push push call add
0 0 _GotoXY sp, 4
ret endp
6 0 07 0000 79 24
;Magic BIOS number. ;Clear entire screen. ;Clear with black spaces. ;Upper left corner is (0,0) ;Lower X-coordinate ;Lower Y-coordinate ;Make the BIOS call. ;Position the cursor to (0,0) ; after the call. ;Remove parameters from stack.
; tstKbd-       Checks to see if a key is available at the keyboard.
;
; function tstKbd:boolean;
; begin
;       if BIOSKeyAvail then eat key and return true
;       else return false;
; end;
;
; Activation record for tstKbd
;
;       |                       |
;       | Previous stk contents |
;       -------------------------
;       | Return Adrs (offset)  |
;       ------------------------- <- SP

                public  _tstKbd
_tstKbd         proc    near
                mov     ah, 1           ;Check to see if key avail.
                int     16h
                je      NoKey
                mov     ah, 0           ;Eat the key if there is one.
                int     16h
                mov     ax, 1           ;Return true.
                ret

NoKey:          mov     ax, 0           ;No key, so return false.
                ret
_tstKbd         endp
; GetXY-        Returns the cursor's current X and Y coordinates.
;
; procedure GetXY(var x:integer; var y:integer);
;
; Activation record for GetXY
;
;       |                       |
;       | Previous stk contents |
;       -------------------------
;       |     Y Coordinate      |
;       ---      Address      ---
;       |                       |
;       -------------------------
;       |     X coordinate      |
;       ---      Address      ---
;       |                       |
;       -------------------------
;       | Return Adrs (offset)  |
;       -------------------------
;       | Old BP                |
;       ------------------------- <- BP
;       | Registers, etc.       |
;       ------------------------- <- SP

GXY_X           textequ <[bp+4]>
GXY_Y           textequ <[bp+8]>

                public  _GetXY
_GetXY          proc    near
                push    bp
                mov     bp, sp
                push    es

                mov     ah, 3           ;Read X, Y coordinates from
                mov     bh, 0           ; BIOS
                int     10h

                les     bx, GXY_X
                mov     es:[bx], dl
                mov     byte ptr es:[bx+1], 0

                les     bx, GXY_Y
                mov     es:[bx], dh
                mov     byte ptr es:[bx+1], 0

                pop     es
                pop     bp
                ret
_GetXY          endp
; PutChar-      Outputs a single character to the screen at the current
;               cursor position.
;
; procedure PutChar(ch:char);
;
; Activation record for PutChar
;
;       |                       |
;       | Previous stk contents |
;       -------------------------
;       | char (in L.O. byte)   |
;       -------------------------
;       | Return Adrs (offset)  |
;       -------------------------
;       | Old BP                |
;       ------------------------- <- BP
;       | Registers, etc.       |
;       ------------------------- <- SP

ch_pc           textequ <[bp+4]>

                public  _PutChar
_PutChar        proc    near
                push    bp
                mov     bp, sp

                mov     al, ch_pc
                mov     ah, 0eh
                int     10h

                pop     bp
                ret
_PutChar        endp
; PutStr-       Outputs a string to the display at the current cursor position.
;               Note that a string is a sequence of characters that ends with
;               a zero byte.
;
; procedure PutStr(var str:string);
;
; Activation record for PutStr
;
;       |                       |
;       | Previous stk contents |
;       -------------------------
;       |        String         |
;       ---      Address      ---
;       |                       |
;       -------------------------
;       | Return Adrs (offset)  |
;       -------------------------
;       | Old BP                |
;       ------------------------- <- BP
;       | Registers, etc.       |
;       ------------------------- <- SP

Str_ps          textequ <[bp+4]>

                public  _PutStr
_PutStr         proc    near
                push    bp
                mov     bp, sp
                push    es

                les     bx, Str_ps
PS_Loop:        mov     al, es:[bx]
                cmp     al, 0
                je      PC_Done

                push    ax
                call    _PutChar
                pop     ax
                inc     bx
                jmp     PS_Loop

PC_Done:        pop     es
                pop     bp
                ret
_PutStr         endp

; StartPgm-     This is where DOS starts running the program. This is
;               a substitute for the C0L.OBJ file normally linked in by
;               the Borland C++ compiler. This code provides this
;               routine to avoid legal problems (i.e., distributing
;               unlinked Borland libraries). You can safely ignore
;               this code. Note that the C++ main program is a near
;               procedure, so this code needs to be in the _TEXT segment.

                extern  _main:near
StartPgm        proc    near

                mov     ax, _DATA
                mov     ds, ax
                mov     es, ax
                mov     ss, ax
                lea     sp, EndStk

                call    near ptr _main
                mov     ah, 4ch
                int     21h
StartPgm        endp

_TEXT           ends

_DATA           segment word public "DATA"
stack           word    1000h dup (?)
EndStk          word    ?
_DATA           ends

sseg            segment para stack 'STACK'
                word    1000h dup (?)
sseg            ends
                end     StartPgm
11.12 Programming Projects

1) Write a version of the matrix multiply program that inputs two 4x4 integer matrices from the user and computes their matrix product (see Chapter Eight question set). The matrix multiply algorithm (computing C := A * B) is

        for i := 0 to 3 do
            for j := 0 to 3 do begin

                c[i,j] := 0;
                for k := 0 to 3 do
                    c[i,j] := c[i,j] + a[i,k] * b[k,j];

            end;

   The program should have three procedures: InputMatrix, PrintMatrix, and MatrixMul. They have the following prototypes:

        procedure InputMatrix(var m:matrix);
        procedure PrintMatrix(var m:matrix);
        procedure MatrixMul(var result, A, B:matrix);

   In particular note that these routines all pass their parameters by reference. Pass these parameters by reference on the stack. Maintain all variables (e.g., i, j, k, etc.) on the stack using the techniques outlined in “Local Variable Storage” on page 604. In particular, do not keep the loop control variables in registers. Write a main program that makes appropriate calls to these routines to test them.

2) A pass by lazy evaluation parameter is generally a structure with three fields: a pointer to the thunk to call (the function that computes the value), a field to hold the value of the parameter, and a boolean field that contains false if the value field is uninitialized (the value field becomes initialized if the procedure writes to the value field or calls the thunk to obtain the value). Whenever the procedure writes a value to a pass by lazy evaluation parameter, it stores the value in the value field and sets the boolean field to true. Whenever a procedure wants to read the value, it first checks this boolean field. If it contains a true value, it simply reads the value from the value field; if the boolean field contains false, the procedure calls the thunk to compute the initial value. On return, the procedure stores the thunk result in the value field and sets the boolean field to true. Note that during any single activation of a procedure, the thunk for a parameter will be called, at most, one time. Consider the following Panacea procedure:

        SampleEval: procedure(select:boolean; eval a:integer; eval b:integer);
        var
            result:integer;
        endvar;
        begin SampleEval;

            if (select) then result := a;
            else result := b;
            endif;
            writeln(result+2);

        end SampleEval;

   Write an assembly language program that implements SampleEval. From your main program call SampleEval a couple of times passing it different thunks for the a and b parameters. Your thunks can simply return a single value when called.

3) Write a shuffle routine to which you pass an array of 52 integers by reference. The routine should fill the array with the values 1..52 and then randomly shuffle the items in the array. Use the Standard Library random and randomize routines to select an index in the array to swap. See Chapter Seven, “Random Number Generation: Random, Randomize” on page 343 for more details about the random function. Write a main program that passes an array to this procedure and prints out the result.
11.13 Summary

In an assembly language program, all you need is a call and ret instruction to implement procedures and functions. Chapter Seven covers the basic use of procedures in an 80x86 assembly language program; this chapter describes how to organize program units like procedures and functions, how to pass parameters, allocate and access local variables, and related topics.

This chapter begins with a review of what a procedure is, how to implement procedures with MASM, and the difference between near and far procedures on the 80x86. For details, see the following sections:

•   “Procedures” on page 566
•   “Near and Far Procedures” on page 568
•   “Forcing NEAR or FAR CALLs and Returns” on page 568
•   “Nested Procedures” on page 569

Functions are a very important construct in high level languages like Pascal. However, there really isn’t a difference between a function and a procedure in an assembly language program. Logically, a function returns a result and a procedure does not; but you declare and call procedures and functions identically in an assembly language program. See

•   “Functions” on page 572

Procedures and functions often produce side effects. That is, they modify the values of registers and non-local variables. Often, these side effects are undesirable. For example, a procedure may modify a register that the caller needs preserved. There are two basic mechanisms for preserving such values: callee preservation and caller preservation. For details on these preservation schemes and other important issues see

•   “Saving the State of the Machine” on page 572
•   “Side Effects” on page 602

One of the major benefits to using a procedural language like Pascal or C++ is that you can easily pass parameters to and from procedures and functions. Although it is a little more work, you can pass parameters to your assembly language functions and procedures as well. This chapter discusses how and where to pass parameters. It also discusses how to access the parameters inside a procedure or function. To read about this, see sections

•   “Parameters” on page 574
•   “Pass by Value” on page 574
•   “Pass by Reference” on page 575
•   “Pass by Value-Returned” on page 575
•   “Pass by Name” on page 576
•   “Pass by Lazy-Evaluation” on page 577
•   “Passing Parameters in Registers” on page 578
•   “Passing Parameters in Global Variables” on page 580
•   “Passing Parameters on the Stack” on page 581
•   “Passing Parameters in the Code Stream” on page 590
•   “Passing Parameters via a Parameter Block” on page 598

Since assembly language doesn’t really support the notion of a function, per se, implementing a function consists of writing a procedure with a return parameter. As such, function results are quite similar to parameters in many respects. To see the similarities, check out the following sections:

•   “Function Results” on page 600
•   “Returning Function Results in a Register” on page 601
•   “Returning Function Results on the Stack” on page 601
•   “Returning Function Results in Memory Locations” on page 602

Most high level languages provide local variable storage associated with the activation and deactivation of a procedure or function. Although few assembly language programs use local variables in an identical fashion, it’s very easy to implement dynamic allocation of local variables on the stack. For details, see section

•   “Local Variable Storage” on page 604

Recursion is another HLL facility that is very easy to implement in an assembly language program. This chapter discusses the technique of recursion and then presents a simple example using the Quicksort algorithm. See

•   “Recursion” on page 606
11.14 Questions

1) Explain how the CALL and RET instructions operate.

2) What are the operands for the PROC assembler directive? What is their function?

3) Rewrite the following code using PROC and ENDP:

        FillMem:        mov     al, 0FFh
        FillLoop:       mov     [bx], al
                        inc     bx
                        loop    FillLoop
                        ret

4) Modify your answer to problem (3) so that all affected registers are preserved by the FillMem procedure.

5) What happens if you fail to put a transfer of control instruction (such as a JMP or RET) immediately before the ENDP directive in a procedure?

6) How does the assembler determine if a CALL is near or far? How does it determine if a RET instruction is near or far?

7) How can you override the assembler’s default decision whether to use a near or far CALL or RET?

8) Is there ever a need for nested procedures in an assembly language program? If so, give an example.

9) Give an example of why you might want to nest a segment inside a procedure.

10) What is the difference between a function and a procedure?

11) Why should subroutines preserve the registers that they modify?

12) What are the advantages and disadvantages of caller-preserved values and callee-preserved values?

13) What are parameters?

14) How do the following parameter passing mechanisms work?
    a) Pass by value
    b) Pass by reference
    c) Pass by value-returned
    d) Pass by name

15) Where is the best place to pass parameters to a procedure?

16) List five different locations/methods for passing parameters to or from a procedure.

17) How are parameters that are passed on the stack accessed within a procedure?

18) What’s the best way to deallocate parameters passed on the stack when the procedure terminates execution?

19) Given the following Pascal procedure definition:

        procedure PascalProc(i,j,k:integer);

    Explain how you would access the parameters of an equivalent assembly language program, assuming that the procedure is a near procedure.

20) Repeat problem (19) assuming that the procedure is a far procedure.

21) What does the stack look like during the execution of the procedure in problem (19)? Problem (20)?

22) How does an assembly language procedure gain access to parameters passed in the code stream?

23) How does the 80x86 skip over parameters passed in the code stream and continue program execution beyond them when the procedure returns to the caller?

24) What is the advantage to passing parameters via a parameter block?

25) Where are function results typically returned?

26) What is a side effect?

27) Where are local (temporary) variables typically allocated?

28) How do you allocate local (temporary) variables within a procedure?

29) Assuming you have three parameters passed by value on the stack and four different local variables, what does the activation record look like after the local variables have been allocated? (Assume a near procedure and no registers other than BP have been pushed onto the stack.)

30) What is recursion?
Procedures: Advanced Topics
Chapter 12
The last chapter described how to create procedures, pass parameters, and allocate and access local variables. This chapter picks up where that one left off and describes how to access non-local variables in other procedures, pass procedures as parameters, and implement some user-defined control structures.
12.0 Chapter Overview

This chapter completes the discussion of procedures, parameters, and local variables begun in the previous chapter. This chapter describes how block structured languages like Pascal, Modula-2, Algol, and Ada access local and non-local variables. This chapter also describes how to implement a user-defined control structure, the iterator. Most of the material in this chapter is of interest to compiler writers and those who want to learn how compilers generate code for certain types of program constructs. Few pure assembly language programs will use the techniques this chapter describes. Therefore, none of the material in this chapter is particularly important to those who are just learning assembly language. However, if you are going to write a compiler, or you want to learn how compilers generate code so you can write efficient HLL programs, you will want to learn the material in this chapter sooner or later.

This chapter begins by discussing the notion of scope and how HLLs like Pascal access variables in nested procedures. The first section discusses the concept of lexical nesting and the use of static links and displays to access non-local variables. Next, this chapter discusses how to pass variables at different lex levels as parameters. The third section discusses how to pass parameters of one procedure as parameters to another procedure. The fourth major topic this chapter covers is passing procedures as parameters. This chapter concludes with a discussion of iterators, a user-defined control structure.

This chapter assumes a familiarity with a block structured language like Pascal or Ada. If your only HLL experience is with a non-block structured language like C, C++, BASIC, or FORTRAN, some of the concepts in this chapter may be completely new and you may have trouble understanding them. Any introductory text on Pascal or Ada will help explain the prerequisite concepts this chapter assumes.
12.1 Lexical Nesting, Static Links, and Displays

In block structured languages like Pascal¹ it is possible to nest procedures and functions. Nesting one procedure within another limits the access to the nested procedure; you cannot access the nested procedure from outside the enclosing procedure. Likewise, variables you declare within a procedure are visible inside that procedure and to all procedures nested within that procedure². This is the standard block structured language notion of scope that should be quite familiar to anyone who has written Pascal or Ada programs.

There is a good deal of complexity hidden behind the concept of scope, or lexical nesting, in a block structured language. While accessing a local variable in the current activation record is efficient, accessing non-local variables in a block structured language can be very inefficient. This section will describe how an HLL like Pascal deals with non-local identifiers and how to access global variables and call non-local procedures and functions.

1. Note that C and C++ are not block structured languages. Other block structured languages include Algol, Ada, and Modula-2.
2. Subject, of course, to the limitation that you not reuse the identifier within the nested procedure.
12.1.1 Scope

Scope in most high level languages is a static, or compile-time concept³. Scope is the notion of when a name is visible, or accessible, within a program. This ability to hide names is useful in a program because it is often convenient to reuse certain (non-descriptive) names. The i variable used to control most for loops in high level languages is a perfect example. Throughout the previous chapter you saw equates like xyz_i, xyz_j, etc. The reason for choosing such names is that MASM doesn’t support the same notion of scoped names as high level languages. Fortunately, MASM 6.x and later does support scoped names.

By default, MASM 6.x treats statement labels (those with a colon after them) as local to a procedure. That is, you may only reference such labels within the procedure in which they are declared. This is true even if you nest one procedure inside another. Fortunately, there is no good reason why anyone would want to nest procedures in a MASM program.

Having local labels within a procedure is nice. It allows you to reuse statement labels (e.g., loop labels and such) without worrying about name conflicts with other procedures. Sometimes, however, you may want to turn off the scoping of names in a procedure; a good example is when you have a case statement whose jump table appears outside the procedure. If the case statement labels are local to the procedure, they will not be visible outside the procedure and you cannot use them in the case statement jump table (see “CASE Statements” on page 525). There are two ways you can turn off the scoping of labels in MASM 6.x. The first way is to include the following statement in your program:

                option  noscoped

This will turn off label scoping from that point forward in your program’s source file. You can turn scoping back on with a statement of the form

                option  scoped

By placing these statements around your procedure you can selectively control scoping.

Another way to control the scoping of individual names is to place a double colon (“::”) after a label. This informs the assembler that this particular name should be global, that is, visible outside the enclosing procedure.

MASM, like the C programming language, supports three levels of scope: public, global (or static), and local. Local symbols are visible only within the procedure in which they are defined. Global symbols are accessible throughout a source file, but are not visible in other program modules. Public symbols are visible throughout a program, across modules. MASM uses the following default scoping rules:

•   By default, statement labels appearing in a procedure are local to that procedure.
•   By default, all procedure names are public.
•   By default, most other symbols are global.
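The label scoping rules above can be sketched with a short MASM 6.x module. This is only an illustration under stated assumptions: the names Delay1, Dispatch, JmpTbl, Spin, Case0, Case1, and Done are hypothetical, the code assumes the default option scoped setting, and Dispatch assumes the caller has loaded BX with 0 or 1 and pointed DS at the data segment.

```asm
; Sketch only: all names here are hypothetical.
; Assumes MASM 6.x with its default label scoping (option scoped).

                .model  small
                .data
JmpTbl          word    Case0, Case1    ;Outside the procedure, so Case0 and
                                        ; Case1 must not be local labels.
                .code

Delay1          proc    near
                mov     cx, 100
Spin:           loop    Spin            ;This Spin is local to Delay1.
                ret
Delay1          endp

Dispatch        proc    near            ;Assumes BX = 0 or 1, DS = data seg.
                mov     cx, 200
Spin:           loop    Spin            ;A different Spin, local to Dispatch.
                shl     bx, 1           ;Index into a table of words.
                jmp     JmpTbl[bx]
Case0::         mov     ax, 0           ;Double colon: visible to JmpTbl.
                jmp     Done
Case1::         mov     ax, 1
Done:           ret                     ;Single colon: local to Dispatch.
Dispatch        endp

                end
```

Both procedures reuse the label Spin without conflict because each copy is local to its procedure. Replacing the double colons on Case0 and Case1 with single colons should make the JmpTbl definition fail to assemble, since the table lies outside Dispatch.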
Note that these rules apply to MASM 6.x only. Other assemblers and earlier versions of MASM follow different rules.

Overriding the default on the first rule above is easy: either use the option noscoped statement or use a double colon to make a label global. You should be aware, though, that you cannot make a local label public using the public or externdef directives. You must make the symbol global (using either technique) before you make it public.

Having all procedure names public by default usually isn’t much of a problem. However, it might turn out that you want to use the same (local) procedure name in several different modules. If MASM automatically makes such names public, the linker will give you an error because there are multiple public procedures with the same name. You can turn this default action on and off using the following statements:

                option  proc:private    ;procedures are global
                option  proc:export     ;procedures are public

Note that some debuggers only provide symbolic information if a procedure’s name is public. This is why MASM 6.x defaults to public names. This problem does not exist with CodeView, so you can use whichever default is most convenient. Of course, if you elect to keep procedure names private (global only), then you will need to use the public or externdef directive to make desired procedure names public.

This discussion of local, global, and public symbols applies mainly to statement and procedure labels. It does not apply to variables you’ve declared in your data segment, equates, macros, typedefs, or most other symbols. Such symbols are always global regardless of where you define them. The only way to make them public is to specify their names in a public or externdef directive.

There is a way to declare parameter names and local variables, allocated on the stack, such that their names are local to a given procedure. See the proc directive in the MASM reference manual for details.

The scope of a name limits its visibility within a program. That is, a program has access to a variable name only within that name’s scope. Outside the scope, the program cannot access that name. Many programming languages, like Pascal and C++, allow you to reuse identifiers if the scopes of those multiple uses do not overlap. As you’ve seen, MASM provides some minimal scoping features for statement labels. There is, however, another issue related to scope: address binding and variable lifetime. Address binding is the process of associating a memory address with a variable name. Variable lifetime is that portion of a program’s execution during which a memory location is bound to a variable. Consider the following Pascal procedures:

        procedure One(Entry:integer);
        var
            i,j:integer;

            procedure Two(Parm:integer);
            var j:integer;
            begin

                for j := 0 to 5 do writeln(i+j);
                if Parm < 10 then One(Parm+1);

            end;

        begin {One}

            for i := 1 to 5 do Two(Entry);

        end;

Figure 12.1 shows the scope of identifiers One, Two, Entry, i, j, and Parm. The local variable j in Two masks the identifier j in procedure One while inside Two.

        One:    Locals in One:  Entry, I, J, Two

            Two:    Locals in Two:  J, Parm
                    Globals in Two: I, Entry, One

        Figure 12.1 Identifier Scope

3. There are languages that support dynamic, or run-time, scope; this text will not consider such languages.
12.1.2 Unit Activation, Address Binding, and Variable Lifetime

Unit activation is the process of calling a procedure or function. The combination of an activation record and some executing code is considered an instance of a routine. When unit activation occurs a routine binds machine addresses to its local variables. Address binding (for local variables) occurs when the routine adjusts the stack pointer to make room for the local variables. The lifetime of those variables is from that point until the routine destroys the activation record, eliminating the local variable storage.

Although scope limits the visibility of a name to a certain section of code and does not allow duplicate names within the same scope, this does not mean that there is only one address bound to a name. It is quite possible to have several addresses bound to the same name at the same time. Consider a recursive procedure call. On each activation the procedure builds a new activation record. Since the previous instance still exists, there are now two activation records on the stack containing local variables for that procedure. As additional recursive activations occur, the system builds more activation records, each with an address bound to the same name. To resolve the possible ambiguity (which address do you access when operating on the variable?), the system always manipulates the variable in the most recent activation record.

Note that procedures One and Two in the previous section are indirectly recursive. That is, each calls the other, which may in turn call it back. Assuming the parameter to One is less than 10 on the initial call, this code will generate multiple activation records (and, therefore, multiple copies of the local variables) on the stack. For example, were you to issue the call One(9), the stack would look like Figure 12.2 upon first encountering the end associated with the procedure Two. As you can see, there are several copies of I and J on the stack at this point.

Procedure Two (the currently executing routine) would access J in the most recent activation record, the one at the bottom of Figure 12.2. The previous instance of Two will only access the variable J in its activation record when the current instance returns to One and then back to Two.

The lifetime of a variable’s instance is from the point of activation record creation to the point of activation record destruction. Note that the first instance of J (the one at the top of Figure 12.2) has the longest lifetime and that the lifetimes of all instances of J overlap.
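To make address binding under recursion concrete, here is a minimal sketch of a directly recursive near procedure. It is an illustration only: CountDown, Temp_cd, n_cd, CDDone, and Main are hypothetical names, and the sketch assumes the caller removes the parameter from the stack. Each activation binds a fresh address to the local variable when it executes sub sp, 2, and that binding ends when the procedure restores SP.

```asm
; Sketch only: CountDown, Main, and the equates are hypothetical names.
; Assumes near procedures; the caller removes the parameter.

                .model  small
                .stack  100h
                .code

n_cd            textequ <word ptr [bp+4]>       ;Parameter n
Temp_cd         textequ <word ptr [bp-2]>       ;Local variable Temp

CountDown       proc    near
                push    bp              ;Save the caller's BP.
                mov     bp, sp
                sub     sp, 2           ;Address binding: this instance of
                                        ; Temp now lives at [bp-2].
                mov     ax, n_cd
                mov     Temp_cd, ax     ;Always the most recent Temp.
                cmp     ax, 0
                je      CDDone

                dec     ax
                push    ax              ;Pass n-1.
                call    CountDown       ;Builds another activation record.
                add     sp, 2           ;Remove the parameter.

CDDone:         mov     sp, bp          ;This instance of Temp dies here.
                pop     bp
                ret
CountDown       endp

Main            proc    near
                mov     ax, 3
                push    ax
                call    CountDown       ;CountDown(3)
                add     sp, 2
                mov     ax, 4C00h       ;Return to DOS.
                int     21h
Main            endp

                end     Main
```

With the call CountDown(3), the deepest point of the recursion (n = 0) has four activation records on the stack, so four different addresses are bound to the name Temp at the same time. Every reference to Temp_cd goes through the current value of BP, and therefore reaches only the most recent instance.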
12.1.3 Static Links

Pascal will allow procedure Two access to i in procedure One. However, when there is the possibility of recursion there may be several instances of i on the stack. Pascal, of course, will only let procedure Two access the most recent instance of i. In the stack diagram in Figure 12.2, this corresponds to the value of i in the activation record that begins with “One(9+1) parameter.” The only problem is how do you know where to find the activation record containing i?

A quick, but poorly thought out, answer is to simply index backwards into the stack. After all, you can easily see in Figure 12.2 that i is at offset eight from Two’s activation record. Unfortunately, this is not always the case. Assume that procedure Three also calls procedure Two and the following statement appears within procedure One:

        if (Entry < 5) then Three(Entry*2) else Two(Entry);

With this statement in place, it’s quite possible to have two different stack frames upon entry into procedure Two: one with the activation record for procedure Three sandwiched between One’s and Two’s activation records, and one with the activation records for procedures One and Two adjacent to one another. Clearly a fixed offset from Two’s activation record will not always point at the i variable in One’s most recent activation record.
        Previous Stack Contents
        -----------------------
        9                           One(9) parameter
        Return Address
        Saved BP Value              One Activation Record
        "I" Local Variable
        "J" Local Variable
        -----------------------
        9                           Two(9) parameter
        Return Address
        Saved BP Value              Two Activation Record
        "J" Local Variable
        -----------------------
        10                          One(9+1) parameter
        Return Address
        Saved BP Value              One Activation Record
        "I" Local Variable
        "J" Local Variable
        -----------------------
        10                          Two(9+1) parameter
        Return Address
        Saved BP Value              Two Activation Record
        "J" Local Variable
        -----------------------

        Figure 12.2 Indirect Recursion

The astute reader might notice that the saved bp value in Two’s activation record points at the caller’s activation record. You might think you could use this as a pointer to One’s activation record. But this scheme fails for the same reason the fixed offset technique fails. Bp’s old value, the dynamic link, points at the caller’s activation record. Since the caller isn’t necessarily the enclosing procedure, the dynamic link might not point at the enclosing procedure’s activation record.

What is really needed is a pointer to the enclosing procedure’s activation record. Many compilers for block structured languages create such a pointer, the static link. Consider the following Pascal code:

        procedure Parent;
        var i,j:integer;

            procedure Child1;
            var j:integer;
            begin

                for j := 0 to 2 do writeln(i);

            end {Child1};

            procedure Child2;
            var i:integer;
            begin

                for i := 0 to 1 do Child1;

            end {Child2};

        begin {Parent}

            Child2;
            Child1;

        end;

        Previous Stack Contents
        ----------------------------
        Activation record for Parent
        ----------------------------
        Activation record for Child2
        ----------------------------
        Activation record for Child1
        ---------------------------- <- SP

        Figure 12.3 Activation Records after Several Nested Calls
Just after entering Child1 for the first time, the stack would look like Figure 12.3. When Child1 attempts to access the variable i from Parent, it will need a pointer, the static link, to Parent’s activation record. Unfortunately, there is no way for Child1, upon entry, to figure out on its own where Parent’s activation record lies in memory. It will be necessary for the caller (Child2 in this example) to pass the static link to Child1. In general, the callee can treat the static link as just another parameter, usually pushed on the stack immediately before executing the call instruction.

To fully understand how to pass static links from call to call, you must first understand the concept of a lexical level. Lexical levels in Pascal correspond to the static nesting levels of procedures and functions. Most compiler writers specify lex level zero as the main program. That is, all symbols you declare in your main program exist at lex level zero. Procedure and function names appearing in your main program define lex level one, no matter how many procedures or functions appear in the main program. They all begin a new copy of lex level one. For each level of nesting, Pascal introduces a new lex level. Figure 12.4 shows this.

        Figure 12.4 Procedure Schematic Showing Lexical Levels
        (Nested rectangles labeled Lex Level Zero, Lex Level One, and Lex Level Two. Note: Each rectangle represents a procedure or function.)

During execution, a program may only access variables at a lex level less than or equal to the level of the current routine. Furthermore, only one set of values at any given lex level is accessible at any one time⁴ and those values are always in the most recent activation record at that lex level.

Before worrying about how to access non-local variables using a static link, you need to figure out how to pass the static link as a parameter. When passing the static link as a parameter to a program unit (procedure or function), there are three types of calling sequences to worry about:

•   A program unit calls a child procedure or function. If the current lex level is n, then a child procedure or function is at lex level n+1 and is local to the current program unit. Note that most block structured languages do not allow calling procedures or functions at lex levels greater than n+1.
•   A program unit calls a peer procedure or function. A peer procedure or function is one at the same lexical level as the current caller and a single program unit encloses both program units.
•   A program unit calls an ancestor procedure or function. An ancestor unit is either the parent unit, a parent of an ancestor unit, or a peer of an ancestor unit.

        Previous Stack Contents
        ------------------------------
        Parameters
        ------------------------------
        Static Link
        ------------------------------
        Return Address
        ------------------------------
        Dynamic Link (Old BP)
        ------------------------------
        Local variables
        ------------------------------
        Any Registers Saved on Stack
        ------------------------------ <- SP

        Figure 12.5 Generic Activation Record

4. There is one exception. If you have a pointer to a variable and the pointer remains accessible, you can access the data it points at even if the variable actually holding that data is inaccessible. Of course, in (standard) Pascal you cannot take the address of a local variable and put it into a pointer. However, certain dialects of Pascal (e.g., Turbo) and other block structured languages will allow this operation.
Calling sequences for the first two types of calls above are very simple. For the sake of this example, assume the activation record for these procedures takes the generic form in Figure 12.5. When a parent procedure or function calls a child program unit, the static link is nothing more than the value in the bp register immediately prior to the call. Therefore, to pass the static link to the child unit, just push bp before executing the call instruction:
Page 645
Chapter 12
Lex Level 0 Lex Level 1 Lex Level 2
Eac h box represents an activation record.
Lex Level 3 Lex Level 3 Lex Level 4
Each arror represents a static link.
Lex Level 5 Lex Level 5 Lex Level 5
Figure 12.6 Static Links push bp call ChildUnit
Of course the child unit can process the static link off the stack just like any other parameter. In this case, the static and dynamic links are exactly the same. In general, however, this is not true.

If a program unit calls a peer procedure or function, the current value in bp is not the static link. It is a pointer to the caller’s local variables and the peer procedure cannot access those variables. However, as peers, the caller and callee share the same parent program unit, so the caller can simply push a copy of its static link onto the stack before calling the peer procedure or function. The following code will do this, assuming all procedures and functions are near:

                push    [bp+4]                  ;Push static link onto stk.
                call    PeerUnit
If the procedure or function is far, the static link would be two bytes farther up the stack, so you would need to use the following code:

                push    [bp+6]                  ;Push static link onto stk.
                call    PeerUnit
Calling an ancestor is a little more complex. If you are currently at lex level n and you wish to call an ancestor at lex level m (m < n), you will need to traverse the list of static links to find the desired activation record. The static links form a list of activation records. By following this chain of activation records until it ends, you can step through the most recent activation records of all the enclosing procedures and functions of a particular program unit. The stack diagram in Figure 12.6 shows the static links for a sequence of procedure calls statically nested five lex levels deep. If the program unit currently executing at lex level five wishes to call a procedure at lex level three, it must push a static link to the most recently activated program unit at lex level two. In order to find this static link you will have to traverse the chain of static links. If you are at lex level n and you want to call a procedure at lex level m you will have to traverse (n-m)+1 static links. The code to accomplish this is
; Current lex level is 5. This code locates the static link for,
; and then calls a procedure at lex level 2. Assume all calls are
; near:

                mov     bx, [bp+4]              ;Traverse static link to LL 4.
                mov     bx, ss:[bx+4]           ;To Lex Level 3.
                mov     bx, ss:[bx+4]           ;To Lex Level 2.
                push    ss:[bx+4]               ;Ptr to most recent LL1 A.R.
                call    ProcAtLL2
Note the ss: prefix in the instructions above. Remember, the activation records are all in the stack segment and bx indexes the data segment by default.
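The rule for choosing a static link can also be sketched outside of assembly language. The following Python model is only an illustration, not code from any compiler: the ActivationRecord class and the static_link_for_call function are hypothetical names, and each record simply holds a reference to the record of its enclosing unit. It returns the record a caller would push as the callee's static link, along with the number of links followed, for child, peer, and ancestor calls.

```python
class ActivationRecord:
    """One box in the static-link diagram (names here are illustrative)."""

    def __init__(self, name, lex_level, static_link):
        self.name = name
        self.lex_level = lex_level
        self.static_link = static_link  # AR of the enclosing unit, or None


def static_link_for_call(current, callee_level):
    """Return (AR to pass as the callee's static link, links traversed)."""
    n, m = current.lex_level, callee_level
    if m > n + 1:
        raise ValueError("cannot call a unit more than one lex level deeper")
    if m == n + 1:
        return current, 0          # child: push bp, no links followed
    hops = (n - m) + 1             # peer (m == n) needs 1, ancestors need more
    ar = current
    for _ in range(hops):
        ar = ar.static_link
    return ar, hops


# Build one activation record per lex level, 0 through 5.
chain = None
for level in range(6):
    chain = ActivationRecord(f"LL{level}", level, chain)
current = chain                    # the unit executing at lex level 5

target, hops = static_link_for_call(current, 2)
print(target.name, hops)           # LL1 4
```

The four links followed here correspond to the four [bp+4] or ss:[bx+4] references in the assembly sequence for calling a lex level two procedure from lex level five.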
12.1.4
Accessing Non-Local Variables Using Static Links

In order to access a non-local variable, you must traverse the chain of static links until you get a pointer to the desired activation record. This operation is similar to locating the static link for a procedure call outlined in the previous section, except you traverse only n-m static links rather than (n-m)+1 links to obtain a pointer to the appropriate activation record. Consider the following Pascal code:

procedure Outer;
var i:integer;

    procedure Middle;
    var j:integer;

        procedure Inner;
        var k:integer;
        begin

            k := 3;
            writeln(i+j+k);

        end;

    begin {middle}

        j := 2;
        writeln(i+j);
        Inner;

    end; {middle}

begin {Outer}

    i := 1;
    Middle;

end; {Outer}
The Inner procedure accesses global variables at lex level n-1 and n-2 (where n is the lex level of the Inner procedure). The Middle procedure accesses a single global variable at lex level m-1 (where m is the lex level of procedure Middle). The following assembly language code could implement these three procedures:

Outer           proc    near
                push    bp
                mov     bp, sp
                sub     sp, 2                   ;Make room for I.

                mov     word ptr [bp-2],1       ;Set I to one.
                push    bp                      ;Static link for Middle.
                call    Middle

                mov     sp, bp                  ;Remove local variables.
                pop     bp
                ret     2                       ;Remove static link on ret.
Outer           endp

Middle          proc    near
                push    bp                      ;Save dynamic link
                mov     bp, sp                  ;Set up activation record.
                sub     sp, 2                   ;Make room for J.

                mov     word ptr [bp-2],2       ;J := 2;
                mov     bx, [bp+4]              ;Get static link to prev LL.
                mov     ax, ss:[bx-2]           ;Get I’s value.
                add     ax, [bp-2]              ;Add to J and then
                puti                            ; print the sum.
                putcr
                push    bp                      ;Static link for Inner.
                call    Inner

                mov     sp, bp
                pop     bp
                ret     2                       ;Remove static link on RET.
Middle          endp

Inner           proc    near
                push    bp                      ;Save dynamic link
                mov     bp, sp                  ;Set up activation record.
                sub     sp, 2                   ;Make room for K.

                mov     word ptr [bp-2],3       ;K := 3;
                mov     bx, [bp+4]              ;Get static link to prev LL.
                mov     ax, ss:[bx-2]           ;Get J’s value.
                add     ax, [bp-2]              ;Add to K

                mov     bx, ss:[bx+4]           ;Get ptr to Outer’s Act Rec.
                add     ax, ss:[bx-2]           ;Add in I’s value and then
                puti                            ; print the sum.
                putcr

                mov     sp, bp
                pop     bp
                ret     2                       ;Remove static link on RET.
Inner           endp
As you can see, accessing global variables can be very inefficient5. Note that as the difference between the activation records increases, it becomes less and less efficient to access global variables. Accessing global variables in the previous activation record requires only one additional instruction per access, at two lex levels you need two additional instructions, etc. If you analyze a large number of Pascal programs, you will find that most of them do not nest procedures and functions and in the ones where there are nested program units, they rarely access global variables. There is one major exception, however. Although Pascal procedures and functions rarely access local variables inside other procedures and functions, they frequently access global variables declared in the main program. Since such variables appear at lex level zero, access to such variables would be as inefficient as possible when using the static links. To solve this minor problem, most 80x86 based block structured languages allocate variables at lex level zero directly in the data segment and access them directly.
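To make the cost of following static links concrete, here is a small Python trace of the Outer, Middle, and Inner procedures. It is a sketch under simplifying assumptions: the stack segment is a dictionary from addresses to words, the starting stack address 0x1000 and the new_frame helper are made up for the illustration, and each frame uses the generic layout (static link at [bp+4], one local at [bp-2]).

```python
def simulate():
    """Trace Outer -> Middle -> Inner; return the two sums that get printed."""
    mem = {}                        # stack segment: address -> word
    printed = []

    def new_frame(sp, static_link, old_bp):
        mem[sp - 2] = static_link   # pushed by the caller      -> [bp+4]
        mem[sp - 4] = 0             # return address (dummy)    -> [bp+2]
        mem[sp - 6] = old_bp        # dynamic link              -> [bp+0]
        bp = sp - 6
        return bp, bp - 2           # new bp, new sp (one local at [bp-2])

    # Outer: its own static link is not used here, so pass 0.
    outer_bp, sp = new_frame(0x1000, 0, 0)
    mem[outer_bp - 2] = 1                               # i := 1

    # Middle is Outer's child, so its static link is Outer's bp.
    middle_bp, sp = new_frame(sp, outer_bp, outer_bp)
    mem[middle_bp - 2] = 2                              # j := 2
    bx = mem[middle_bp + 4]                             # one link: Outer's AR
    printed.append(mem[bx - 2] + mem[middle_bp - 2])    # i + j

    # Inner is Middle's child, so its static link is Middle's bp.
    inner_bp, sp = new_frame(sp, middle_bp, middle_bp)
    mem[inner_bp - 2] = 3                               # k := 3
    bx = mem[inner_bp + 4]                              # one link: Middle's AR
    total = mem[bx - 2] + mem[inner_bp - 2]             # j + k
    bx = mem[bx + 4]                                    # second link: Outer's AR
    total += mem[bx - 2]                                # ... + i
    printed.append(total)
    return printed


print(simulate())                   # [3, 6]
```

Each extra lex level between the accessing procedure and the variable adds one more lookup of the form bx = mem[bx + 4], which mirrors the extra instruction per level described above.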
12.1.5
The Display

After reading the previous section you might get the idea that one should never use non-local variables, or limit non-local accesses to those variables declared at lex level zero. After all, it’s often easy enough to put all shared variables at lex level zero. If you are designing a programming language, you can adopt the C language designer’s philosophy and simply not provide block structure. Such compromises turn out to be unnecessary. There is a data structure, the display, that provides efficient access to any set of non-local variables.
5. Indeed, perhaps one of the main reasons the C programming language is not block structured is because the language designers wanted to avoid inefficient access to non-local variables.
Figure 12.7 The Display (a Display array with entries 0 through 6; entries 0 through 5 point at the most recent activation record at lex levels 0 through 5, and entry 6 is marked ????)

A display is simply an array of pointers to activation records. Display[0] contains a pointer to the most recent activation record for lex level zero, Display[1] contains a pointer to the most recent activation record for lex level one, and so on. Assuming you’ve maintained the Display array in the current data segment (always a good place to keep it) it only takes two instructions to access any non-local variable. Pictorially, the display works as shown in Figure 12.7.

Note that the entries in the display always point at the most recent activation record for a procedure at the given lex level. If there is no active activation record for a particular lex level (e.g., lex level six above), then the entry in the display contains garbage.

The maximum lexical nesting level in your program determines how many elements there must be in the display. Most programs have only three or four nested procedures (if that many) so the display is usually quite small. Generally, you will rarely require more than 10 or so elements in the display.

Another advantage to using a display is that each individual procedure can maintain the display information itself; the caller need not get involved. When using static links the calling code has to compute and pass the appropriate static link to a procedure. Not only is this slow, but the code to do this must appear before every call. If your program uses a display, the callee, rather than the caller, maintains the display so you only need one copy of the code per procedure. Furthermore, as the next example shows, the code to handle the display is short and fast.

Maintaining the display is very easy. Upon initial entry into a procedure you must first save the contents of the display array at the current lex level and then store the pointer to the current activation record into that same spot.
Accessing a non-local variable requires only two instructions, one to load an element of the display into a register and a second to access the variable. The following code implements the Outer, Middle, and Inner procedures from the static link examples.

; Assume Outer is at lex level 1, Middle is at lex level 2, and
; Inner is at lex level 3. Keep in mind that each entry in the
; display is two bytes. Presumably, the variable Display is defined
; in the data segment. In each procedure the saved display entry
; sits at [bp-2], so the local variable is at [bp-4].

Outer           proc    near
                push    bp
                mov     bp, sp
                push    Display[2]              ;Save current Display Entry
                mov     Display[2], bp          ;Point entry at our act rec.
                sub     sp, 2                   ;Make room for I.

                mov     word ptr [bp-4],1       ;Set I to one.
                call    Middle

                add     sp, 2                   ;Remove local variables
                pop     Display[2]              ;Restore previous value.
                pop     bp
                ret
Outer           endp

Middle          proc    near
                push    bp                      ;Save dynamic link.
                mov     bp, sp                  ;Set up our activation record.
                push    Display[4]              ;Save old Display value.
                mov     Display[4], bp          ;Point entry at our act rec.
                sub     sp, 2                   ;Make room for J.

                mov     word ptr [bp-4],2       ;J := 2;
                mov     bx, Display[2]          ;Get ptr to Outer’s Act Rec.
                mov     ax, ss:[bx-4]           ;Get I’s value.
                add     ax, [bp-4]              ;Add to J and then
                puti                            ; print the sum.
                putcr
                call    Inner

                add     sp, 2                   ;Remove local variable.
                pop     Display[4]              ;Restore old Display value.
                pop     bp
                ret
Middle          endp

Inner           proc    near
                push    bp                      ;Save dynamic link
                mov     bp, sp                  ;Set up activation record.
                push    Display[6]              ;Save old display value
                mov     Display[6], bp          ;Point entry at our act rec.
                sub     sp, 2                   ;Make room for K.

                mov     word ptr [bp-4],3       ;K := 3;
                mov     bx, Display[4]          ;Get ptr to Middle’s Act Rec.
                mov     ax, ss:[bx-4]           ;Get J’s value.
                add     ax, [bp-4]              ;Add to K

                mov     bx, Display[2]          ;Get ptr to Outer’s Act Rec.
                add     ax, ss:[bx-4]           ;Add in I’s value and then
                puti                            ; print the sum.
                putcr

                add     sp, 2                   ;Remove local variable.
                pop     Display[6]              ;Restore old Display value.
                pop     bp
                ret
Inner           endp
Although this code doesn’t look particularly better than the former code, using a display is often much more efficient than using static links.
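The bookkeeping a display requires can be sketched in a few lines of Python. This is an illustrative model only: the Frame class and the proc_entry and proc_exit helpers are hypothetical names standing in for the entry and exit code of each procedure. On entry a procedure saves the display element for its own lex level and stores a pointer to its activation record in that same spot; on exit it restores the saved element. Any non-local access is then a single index into the display followed by the variable access.

```python
display = [None] * 4          # Display[k] -> most recent AR at lex level k

class Frame:
    """A stand-in for an activation record; 'vars' holds its locals."""

    def __init__(self, level, **local_vars):
        self.level = level
        self.vars = dict(local_vars)
        self.saved_entry = None


def proc_entry(frame):
    frame.saved_entry = display[frame.level]   # save the old display element
    display[frame.level] = frame               # point it at the current AR


def proc_exit(frame):
    display[frame.level] = frame.saved_entry   # restore the old element


printed = []

def inner():                                   # lex level 3
    f = Frame(3, k=3)
    proc_entry(f)
    printed.append(display[1].vars["i"] + display[2].vars["j"] + f.vars["k"])
    proc_exit(f)

def middle():                                  # lex level 2
    f = Frame(2, j=2)
    proc_entry(f)
    printed.append(display[1].vars["i"] + f.vars["j"])
    inner()
    proc_exit(f)

def outer():                                   # lex level 1
    f = Frame(1, i=1)
    proc_entry(f)
    middle()
    proc_exit(f)

outer()
print(printed)                                 # [3, 6]
```

Unlike the static link version, reaching i from inner costs the same as reaching j: one display lookup, no matter how many lex levels apart the two procedures are.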
12.1.6
The 80286 ENTER and LEAVE Instructions

When designing the 80286, Intel’s CPU designers decided to add two instructions to help maintain displays. Unfortunately, although their design works, is very general, and only requires data in the stack segment, it is very slow; much slower than using the techniques in the previous section. Although many non-optimizing compilers use these instructions, the best compilers avoid using them, if possible.

The leave instruction is very simple to understand. It performs the same operation as the two instructions:

                mov     sp, bp
                pop     bp
Therefore, you may use the instruction for the standard procedure exit code if you have an 80286 or later microprocessor. On an 80386 or earlier processor, the leave instruction is shorter and faster than the equivalent move and pop sequence. However, the leave instruction is slower on 80486 and later processors.

The enter instruction takes two operands. The first is the number of bytes of local storage the current procedure requires, the second is the lex level of the current procedure. The enter instruction does the following:

; ENTER Locals, LexLevel

                push    bp                      ;Save dynamic link.
                mov     tempreg, sp             ;Save for later.
                cmp     LexLevel, 0             ;Done if this is lex level zero.
                je      Lex0

lp:             dec     LexLevel
                jz      Done                    ;Quit if at last lex level.
                sub     bp, 2                   ;Index into display in prev act rec
                push    [bp]                    ; and push each element there.
                jmp     lp                      ;Repeat for each entry.

Done:           push    tempreg                 ;Add entry for current lex level.
Lex0:           mov     bp, tempreg             ;Ptr to current act rec.
                sub     sp, Locals              ;Allocate local storage
As you can see from this code, the enter instruction copies the display from activation record to activation record. This can get quite expensive if you nest the procedures to any depth. Most HLLs, if they use the enter instruction at all, always specify a nesting level of zero to avoid copying the display throughout the stack.

The enter instruction puts the value for the display[n] entry at location BP-(n*2). The enter instruction does not copy the value for display[0] into each stack frame. Intel assumes that you will keep the main program’s global variables in the data segment. To save time and memory, they do not bother copying the display[0] entry.

The enter instruction is very slow, particularly on 80486 and later processors. If you really want to copy the display from activation record to activation record it is probably a better idea to push the items yourself. The following code snippets show how to do this:

; enter n, 0                                    ;14 cycles on the 486

                push    bp                      ;1 cycle on the 486
                mov     bp, sp                  ;1 cycle on the 486
                sub     sp, n                   ;1 cycle on the 486

; enter n, 1                                    ;17 cycles on the 486

                push    bp                      ;1 cycle on the 486
                push    [bp-2]                  ;4 cycles on the 486
                mov     bp, sp                  ;1 cycle on the 486
                add     bp, 2                   ;1 cycle on the 486
                sub     sp, n                   ;1 cycle on the 486

; enter n, 2                                    ;20 cycles on the 486

                push    bp                      ;1 cycle on the 486
                push    [bp-2]                  ;4 cycles on the 486
                push    [bp-4]                  ;4 cycles on the 486
                mov     bp, sp                  ;1 cycle on the 486
                add     bp, 4                   ;1 cycle on the 486
                sub     sp, n                   ;1 cycle on the 486

; enter n, 3                                    ;23 cycles on the 486

                push    bp                      ;1 cycle on the 486
                push    [bp-2]                  ;4 cycles on the 486
                push    [bp-4]                  ;4 cycles on the 486
                push    [bp-6]                  ;4 cycles on the 486
                mov     bp, sp                  ;1 cycle on the 486
                add     bp, 6                   ;1 cycle on the 486
                sub     sp, n                   ;1 cycle on the 486

; enter n, 4                                    ;26 cycles on the 486

                push    bp                      ;1 cycle on the 486
                push    [bp-2]                  ;4 cycles on the 486
                push    [bp-4]                  ;4 cycles on the 486
                push    [bp-6]                  ;4 cycles on the 486
                push    [bp-8]                  ;4 cycles on the 486
                mov     bp, sp                  ;1 cycle on the 486
                add     bp, 8                   ;1 cycle on the 486
                sub     sp, n                   ;1 cycle on the 486

; etc.
If you are willing to believe Intel’s cycle timings, you can see that the enter instruction is almost never faster than a straight line sequence of instructions that accomplish the same thing. If you are interested in saving space rather than writing fast code, the enter instruction is generally a better alternative. The same is generally true for the leave instruction as well. It is only one byte long, but it is slower than the corresponding mov sp,bp and pop bp instructions. Accessing non-local variables using the displays created by enter appears in the exercises.
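The step-by-step description of enter given earlier in this section can be checked with a short Python model. This is a sketch, not a CPU emulator: the stack is a dictionary from addresses to words, the function name enter and the starting addresses are chosen for the illustration, and the body follows the push, copy loop, and final adjustments of that description one line at a time.

```python
def enter(mem, bp, sp, locals_size, lex_level):
    """Model of ENTER locals_size, lex_level; returns the new (bp, sp)."""
    sp -= 2
    mem[sp] = bp                 # push bp            (dynamic link)
    tempreg = sp                 # mov tempreg, sp
    if lex_level != 0:
        level = lex_level
        while True:
            level -= 1           # dec LexLevel
            if level == 0:       # jz Done
                break
            bp -= 2              # sub bp, 2
            sp -= 2
            mem[sp] = mem[bp]    # push [bp]          (copy one display entry)
        sp -= 2
        mem[sp] = tempreg        # push tempreg       (entry for this lex level)
    bp = tempreg                 # mov bp, tempreg
    sp -= locals_size            # sub sp, Locals
    return bp, sp


mem = {}
bp1, sp1 = enter(mem, 0, 0x100, 4, 1)        # a lex level 1 procedure
bp2, sp2 = enter(mem, bp1, sp1, 2, 2)        # it calls a lex level 2 child

print(hex(bp1), hex(mem[bp1 - 2]))                       # 0xfe 0xfe
print(hex(bp2), hex(mem[bp2 - 2]), hex(mem[bp2 - 4]))    # 0xf6 0xfe 0xf6
```

In the second frame the word at BP-2 is the copied display[1] entry and the word at BP-4 is the new display[2] entry, which matches the BP-(n*2) placement described above.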
12.2
Passing Variables at Different Lex Levels as Parameters.

Accessing variables at different lex levels in a block structured program introduces several complexities to a program. The previous section introduced you to the complexity of non-local variable access. This problem gets even worse when you try to pass such variables as parameters to another program unit. The following subsections discuss strategies for each of the major parameter passing mechanisms.

For the purposes of discussion, the following sections will assume that “local” refers to variables in the current activation record, “global” refers to variables in the data segment, and “intermediate” refers to variables in some activation record other than the current activation record. Note that the following sections will not assume that ds is equal to ss. These sections will also pass all parameters on the stack. You can easily modify the details to pass these parameters elsewhere.
12.2.1
Passing Parameters by Value in a Block Structured Language

Passing value parameters to a program unit is no more difficult than accessing the corresponding variables; all you need do is push the value on the stack before calling the associated procedure.

To pass a global variable by value to another procedure, you could use code like the following:

                push    GlobalVar               ;Assume “GlobalVar” is in DSEG.
                call    Procedure

To pass a local variable by value to another procedure, you could use the following code6:

                push    [bp-2]                  ;Local variable in current activation
                call    Procedure               ; record.
To pass an intermediate variable as a value parameter, you must first locate that intermediate variable’s activation record and then push its value onto the stack. The exact mechanism you use depends on whether you are using static links or a display to keep track of the intermediate variable’s activation records. If using static links, you might use code like the following to pass a variable from two lex levels up from the current procedure:

                mov     bx, [bp+4]              ;Assume S.L. is at offset 4.
                mov     bx, ss:[bx+4]           ;Traverse two static links
                push    ss:[bx-2]               ;Push variable’s value.
                call    Procedure

6. The non-global examples all assume the variable is at offset -2 in their activation record. Change this as appropriate in your code.
Passing an intermediate variable by value when you are using a display is somewhat easier. You could use code like the following to pass an intermediate variable from lex level one:

                mov     bx, Display[1*2]        ;Get Display[1] entry.
                push    ss:[bx-2]               ;Push the variable’s value.
                call    Procedure

12.2.2
Passing Parameters by Reference, Result, and Value-Result in a Block Structured Language

The pass by reference, result, and value-result parameter mechanisms generally pass the address of the parameter on the stack7. If global variables reside in the data segment, activation records all exist in the stack segment, and ds≠ss, then you must pass far pointers to access all possible variables8.

To pass a far pointer you must push a segment value followed by an offset value on the stack. For global variables, the segment value is found in the ds register; for non-global values, ss contains the segment value. To compute the offset portion of the address you would normally use the lea instruction. The following code sequence passes a global variable by reference:

                push    ds                      ;Push segment adrs first.
                lea     ax, GlobalVar           ;Compute offset.
                push    ax                      ;Push offset of GlobalVar
                call    Procedure
Global variables are a special case because the assembler can compute their run-time offsets at assembly time. Therefore, for scalar global variables only, we can shorten the code sequence above to

                push    ds                      ;Push segment adrs.
                push    offset GlobalVar        ;Push offset portion.
                call    Procedure
To pass a local variable by reference your code must first push ss’s value onto the stack and then push the local variable’s offset. This offset is the variable’s offset within the stack segment, not the offset within the activation record! The following code passes the address of a local variable by reference:

                push    ss                      ;Push segment address.
                lea     ax, [bp-2]              ;Compute offset of local
                push    ax                      ; variable and push it.
                call    Procedure
To pass an intermediate variable by reference you must first locate the activation record containing the variable so you can compute the effective address into the stack segment. When using static links, the code to pass the parameter’s address might look like the following:

                push    ss                      ;Push segment portion.
                mov     bx, [bp+4]              ;Assume S.L. is at offset 4.
                mov     bx, ss:[bx+4]           ;Traverse two static links
                lea     ax, [bx-2]              ;Compute effective address
                push    ax                      ;Push offset portion.
                call    Procedure

7. As you may recall, pass by reference, value-result, and result all use the same calling sequence. The differences lie in the procedures themselves.
8. You can use near pointers if ds=ss or if you keep global variables in the main program’s activation record in the stack segment.
When using a display, the calling sequence might look like the following:

                push    ss                      ;Push segment portion.
                mov     bx, Display[1*2]        ;Get Display[1] entry.
                lea     ax, [bx-2]              ;Get the variable’s offset
                push    ax                      ; and push it.
                call    Procedure
As you may recall from the previous chapter, there is a second way to pass a parameter by value-result. You can push the value onto the stack and then, when the procedure returns, pop this value off the stack and store it back into the variable from whence it came. This is just a special case of the pass by value mechanism described in the previous section.
12.2.3
Passing Parameters by Name and Lazy-Evaluation in a Block Structured Language

Since you pass the address of a thunk when passing parameters by name or by lazy-evaluation, the presence of global, intermediate, and local variables does not affect the calling sequence to the procedure. Instead, the thunk has to deal with the differing locations of these variables. The following examples will present thunks for pass by name; you can easily modify these thunks for lazy-evaluation parameters.

The biggest problem a thunk has is locating the activation record containing the variable whose address it returns. In the last chapter, this wasn’t too much of a problem since variables existed either in the current activation record or in the global data space. In the presence of intermediate variables, this task becomes somewhat more complex. The easiest solution is to pass two pointers when passing a variable by name. The first pointer should be the address of the thunk, the second pointer should be the offset of the activation record containing the variable the thunk must access9. When the procedure calls the thunk, it must pass this activation record offset as a parameter to the thunk. Consider the following Panacea procedures:

TestThunk:procedure(name item:integer; var j:integer);
begin TestThunk;

    for j in 0..9 do item := 0;

end TestThunk;

CallThunk:procedure;
var
    A: array[0..9] : integer;
    I: integer;
endvar;
begin CallThunk;

    TestThunk(A[I], I);

end CallThunk;
The assembly code for the Panacea procedures above might look like the following:

; TestThunk AR:
;
;       BP+10-  Address of thunk
;       BP+8-   Ptr to AR for Item and J parameters
;               (must be in the same AR).
;       BP+4-   Far ptr to J.

TestThunk       proc    near
                push    bp
                mov     bp, sp
                push    ax
                push    bx
                push    es

                les     bx, [bp+4]              ;Get ptr to J.
                mov     word ptr es:[bx], 0     ;J := 0;
ForLoop:        cmp     word ptr es:[bx], 9     ;Is J > 9?
                ja      ForDone
                push    [bp+8]                  ;Push AR passed by caller.
                call    word ptr [bp+10]        ;Call the thunk.
                mov     word ptr ss:[bx], 0     ;Thunk returns adrs in BX.
                les     bx, [bp+4]              ;Get ptr to J.
                inc     word ptr es:[bx]        ;Add one to it.
                jmp     ForLoop

ForDone:        pop     es
                pop     bx
                pop     ax
                pop     bp
                ret     8
TestThunk       endp

CallThunk       proc    near
                push    bp
                mov     bp, sp
                sub     sp, 22                  ;Make room for locals (A and I).

                jmp     OverThunk

Thunk           proc
                push    bp
                mov     bp, sp
                mov     bp, [bp+4]              ;Get AR address.
                mov     ax, [bp-22]             ;Get I’s value.
                add     ax, ax                  ;Double, since A is a word array.
                lea     bx, [bp-20]             ;Offset to start of A
                add     bx, ax                  ;Compute address of A[I] and
                pop     bp                      ; return it in BX.
                ret     2                       ;Remove parameter from stack.
Thunk           endp

OverThunk:      push    offset Thunk            ;Push (near) address of thunk
                push    bp                      ;Push ptr to A/I’s AR for thunk
                push    ss                      ;Push address of I onto stack.
                lea     ax, [bp-22]             ; Offset portion of I.
                push    ax
                call    TestThunk
                mov     sp, bp
                pop     bp
                ret
CallThunk       endp

9. Actually, you may need to pass several pointers to activation records. For example, if you pass the variable “A[i,j,k]” by name and A, i, j, and k are all in different activation records, you will need to pass pointers to each activation record. We will ignore this problem here.

12.3
Passing Parameters as Parameters to Another Procedure

When a procedure passes one of its own parameters as a parameter to another procedure, certain problems develop that do not exist when passing variables as parameters. Indeed, in some (rare) cases it is not logically possible to pass some parameter types to some other procedure. This section deals with the problems of passing one procedure’s parameters to another procedure.

Pass by value parameters are essentially no different than local variables. All the techniques in the previous sections apply to pass by value parameters. The following sections deal with the cases where the calling procedure is passing a parameter passed to it by reference, value-result, result, name, and lazy evaluation.
12.3.1
Passing Reference Parameters to Other Procedures

Passing a reference parameter through to another procedure is where the complexity begins. Consider the following (pseudo) Pascal procedure skeleton:

procedure HasRef(var refparm:integer);

    procedure ToProc(???? parm:integer);
    begin
         .
         .
         .
    end;

begin {HasRef}
     .
     .
     .
    ToProc(refParm);
     .
     .
     .
end;
The “????” in the ToProc parameter list indicates that we will fill in the appropriate parameter passing mechanism as the discussion warrants.

If ToProc expects a pass by value parameter (i.e., ???? is just an empty string), then HasRef needs to fetch the value of the refparm parameter and pass this value to ToProc. The following code accomplishes this10:

                les     bx, [bp+4]              ;Fetch address of refparm
                push    es:[bx]                 ;Push integer pointed at by refparm
                call    ToProc
Passing a reference parameter on as a reference, value-result, or result parameter is easy: just copy the caller’s parameter as-is onto the stack. That is, if the parm parameter in ToProc above is a reference parameter, a value-result parameter, or a result parameter, you would use the following calling sequence:

                push    [bp+6]                  ;Push segment portion of ref parm.
                push    [bp+4]                  ;Push offset portion of ref parm.
                call    ToProc
To pass a reference parameter by name is fairly easy. Just write a thunk that grabs the reference parameter’s address and returns this value. In the example above, the call to ToProc might look like the following:

                jmp     SkipThunk

Thunk0          proc    near
                les     bx, [bp+4]              ;Assume BP points at HasRef’s AR.
                ret
Thunk0          endp

SkipThunk:      push    offset Thunk0           ;Address of thunk.
                push    bp                      ;AR containing thunk’s vars.
                call    ToProc
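Inside ToProc, a reference to the parameter might look like the following. Note that the code fetches the thunk’s address before it changes bp, since [bp+6] is only meaningful while bp still points at ToProc’s activation record:

                push    bp                      ;Save our AR ptr.
                mov     bx, [bp+6]              ;Get the thunk’s address.
                mov     bp, [bp+4]              ;Ptr to Parm’s AR.
                call    bx                      ;Call the thunk.
                pop     bp                      ;Retrieve our AR ptr.
                mov     ax, es:[bx]             ;Access variable.
                 .
                 .
                 .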
10. The examples in this section all assume the use of a display. If you are using static links, be sure to adjust all the offsets and the code to allow for the static link that the caller must push immediately before a call.
To pass a reference parameter by lazy evaluation is very similar to passing it by name. The only difference (in ToProc’s calling sequence) is that the thunk must return the value of the variable rather than its address. You can easily accomplish this with the following thunk:

Thunk1          proc    near
                push    es
                push    bx
                les     bx, [bp+4]              ;Assume BP points at HasRef’s AR.
                mov     ax, es:[bx]             ;Return value of ref parm in ax.
                pop     bx
                pop     es
                ret
Thunk1          endp

12.3.2
Passing Value-Result and Result Parameters as Parameters

Assuming you’ve created a local variable that holds the value of a value-result or result parameter, passing one of these parameters to another procedure is no different than passing value parameters to other code. Once a procedure makes a local copy of the value-result parameter or allocates storage for a result parameter, you can treat that variable just like a value parameter or a local variable with respect to passing it on to other procedures.

Of course, it doesn’t make sense to use the value of a result parameter until you’ve stored a value into that parameter’s local storage. Therefore, take care when passing result parameters to other procedures that you’ve initialized a result parameter before using its value.
12.3.3
Passing Name Parameters to Other Procedures

Since a pass by name parameter’s thunk returns the address of a parameter, passing a name parameter to another procedure is very similar to passing a reference parameter to another procedure. The primary differences occur when passing the parameter on as a name parameter.

When passing a name parameter as a value parameter, you first call the thunk, dereference the address the thunk returns, and then pass the value to the new procedure. The following code demonstrates such a call when the thunk returns the variable’s address in es:bx (assume pass by name parameter’s AR pointer is at address bp+4 and the pointer to the thunk is at address bp+6). The code fetches the thunk’s address before loading bp with the parameter’s AR pointer, and it restores bp before pushing the parameter’s value so that the pop retrieves the saved AR pointer rather than the parameter:

                push    bp                      ;Save our AR ptr.
                mov     bx, [bp+6]              ;Get the thunk’s address.
                mov     bp, [bp+4]              ;Ptr to Parm’s AR.
                call    bx                      ;Call the thunk.
                pop     bp                      ;Retrieve our AR ptr.
                push    word ptr es:[bx]        ;Push parameter’s value.
                call    ToProc                  ;Call the procedure.
                 .
                 .
                 .
Passing a name parameter to another procedure by reference is very easy. All you have to do is push the address the thunk returns onto the stack. The following code, which is very similar to the code above, accomplishes this:

                push    bp                      ;Save our AR ptr.
                mov     bx, [bp+6]              ;Get the thunk’s address.
                mov     bp, [bp+4]              ;Ptr to Parm’s AR.
                call    bx                      ;Call the thunk.
                pop     bp                      ;Retrieve our AR ptr.
                push    es                      ;Push seg portion of adrs.
                push    bx                      ;Push offset portion of adrs.
                call    ToProc                  ;Call the procedure.
                 .
                 .
                 .
Passing a name parameter to another procedure as a pass by name parameter is very easy; all you need to do is pass the thunk (and associated pointers) on to the new procedure. The following code accomplishes this:

                push    [bp+6]                  ;Pass Thunk’s address.
                push    [bp+4]                  ;Pass adrs of Thunk’s AR.
                call    ToProc
To pass a name parameter to another procedure by lazy evaluation, you need to create a thunk for the lazy-evaluation parameter that calls the pass by name parameter’s thunk, dereferences the pointer, and then returns this value. The implementation is left as a programming project.
12.3.4
Passing Lazy Evaluation Parameters as Parameters

Lazy evaluation parameters typically consist of three components: the address of a thunk, a location to hold the value the thunk returns, and a boolean variable that determines whether the procedure must call the thunk to get the parameter’s value or if it can simply use the value previously returned by the thunk (see the exercises in the previous chapter to see how to implement lazy evaluation parameters).

When passing a parameter by lazy evaluation to another procedure, the calling code must first check the boolean variable to see if the value field is valid. If not, the code must first call the thunk to get this value. If the boolean field is true, the calling code can simply use the data in the value field. In either case, once the value field has data, passing this data on to another procedure is no different than passing a local variable or a value parameter to another procedure.
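The three components of a lazy evaluation parameter, and the check the calling code performs before passing the value on, can be sketched as follows. This Python model is only an illustration; the LazyParam class, its field names, and the expensive, to_proc, and has_lazy functions are all hypothetical.

```python
class LazyParam:
    """The three pieces of a lazy evaluation parameter."""

    def __init__(self, thunk):
        self.thunk = thunk        # address of the thunk
        self.value = None         # location that holds the thunk's result
        self.valid = False        # boolean: has the thunk been called yet?

    def get(self):
        if not self.valid:        # value field not valid: call the thunk once
            self.value = self.thunk()
            self.valid = True
        return self.value         # otherwise reuse the stored value


calls = []

def expensive():
    calls.append("thunk called")
    return 42

def to_proc(value):
    """A procedure that takes an ordinary value parameter."""
    return value + 1

def has_lazy(p):
    """Passes its lazy parameter on to to_proc twice, by value."""
    first = to_proc(p.get())
    second = to_proc(p.get())
    return first, second


print(has_lazy(LazyParam(expensive)), len(calls))   # (43, 43) 1
```

The thunk runs only on the first use; after that, passing the parameter on is the same as passing any local variable.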
12.3.5
Parameter Passing Summary
Table 48: Passing Parameters as Parameters to Another Procedure Pass as Value
Pass as Reference
Pass as Value-Result
Pass as Result
Pass as Name
Pass as Lazy Evaluation
Value
Pass the value
Pass address of the value parameter
Pass address of the value parameter
Pass address of the value parameter
Create a thunk that returns the address of the value parameter
Create a thunk that returns the value
Reference
Dereference parameter and pass the value it points at
Pass the address (value of the reference parameter)
Pass the address (value of the reference parameter)
Pass the address (value of the reference parameter)
Create a thunk that passes the address (value of the reference parameter)
Create a thunk that deferences the reference parameter and returns its value
Value-Result
Pass the local value as the value parameter
Pass the address of the local value as the parameter
Pass the address of the local value as the parameter
Pass the address of the local value as the parameter
Create a thunk that returns the address of the local value of the value-result parameter
Create a thunk that returns the value in the local value of the value-result parameter
Result
Pass the local value as the value parameter
Pass the address of the local value as the parameter
Pass the address of the local value as the parameter
Pass the address of the local value as the parameter
Create a thunk that returns the address of the local value of the result parameter
Create a thunk that returns the value in the local value of the result parameter
Page 658
Procedures: Advanced Topics
Table 48: Passing Parameters as Parameters to Another Procedure

Name
    Pass as Value: Call the thunk, dereference the pointer, and pass the value at the address the thunk returns
    Pass as Reference: Call the thunk and pass the address it returns as the parameter
    Pass as Value-Result: Call the thunk and pass the address it returns as the parameter
    Pass as Result: Call the thunk and pass the address it returns as the parameter
    Pass as Name: Pass the address of the thunk and any other values associated with the name parameter
    Pass as Lazy Evaluation: Write a thunk that calls the name parameter’s thunk, dereferences the address it returns, and then returns the value at that address
Lazy Evaluation
    Pass as Value: If necessary, call the thunk to obtain the Lazy Eval parameter’s value. Pass the local value as the value parameter
    Pass as Reference: If necessary, call the thunk to obtain the Lazy Eval parameter’s value. Pass the address of the local value as the parameter
    Pass as Value-Result: If necessary, call the thunk to obtain the Lazy Eval parameter’s value. Pass the address of the local value as the parameter
    Pass as Result: If necessary, call the thunk to obtain the Lazy Eval parameter’s value. Pass the address of the local value as the parameter
    Pass as Name: If necessary, call the thunk to obtain the Lazy Eval parameter’s value. Create a thunk that returns the address of the Lazy Eval’s value field
    Pass as Lazy Evaluation: Create a thunk that checks the boolean field of the caller’s Lazy Eval parameter. It should call the corresponding thunk if this variable is false. It should set the boolean field to true and then return the data in the value field
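As a rough illustration of the last Lazy Evaluation entry in the table, the following Python sketch models a lazy evaluation parameter as a record with a boolean field, a value field, and a thunk. The names LazyParam, get, and make_forwarding_thunk are hypothetical and do not come from the text; a Python closure stands in for the thunk address plus whatever environment the thunk needs.

```python
class LazyParam:
    """Hypothetical model of a lazy evaluation parameter record."""

    def __init__(self, thunk):
        self.evaluated = False   # the boolean field
        self.value = None        # the value field
        self.thunk = thunk       # routine that computes the value on demand

    def get(self):
        # Call the thunk only on the first access, then reuse the stored value.
        if not self.evaluated:
            self.value = self.thunk()
            self.evaluated = True
        return self.value


def make_forwarding_thunk(caller_param):
    """Thunk handed to a callee that also receives the parameter lazily.

    Calling it checks the caller's boolean field, evaluates the caller's
    thunk if needed, and returns the data in the caller's value field.
    """
    return caller_param.get


calls = []

def expensive():
    calls.append("evaluated")
    return 42

outer = LazyParam(expensive)                     # the caller's lazy parameter
inner = LazyParam(make_forwarding_thunk(outer))  # passed along, still lazy

first = inner.get()
second = inner.get()
```

Both accesses return the same value, and the original thunk runs only once because the boolean field short-circuits the second access.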
12.4
Passing Procedures as Parameters

Many programming languages let you pass a procedure or function name as a parameter. This lets the caller pass along various actions to perform inside a procedure. The classic example is a plot procedure that graphs some generic math function passed as a parameter to plot. Standard Pascal lets you pass procedures and functions by declaring them as follows:

        procedure DoCall(procedure x);
        begin
            x;
        end;

The statement DoCall(xyz); calls DoCall that, in turn, calls procedure xyz. Passing a procedure or function as a parameter may seem like an easy task – just pass the address of the function or procedure as the following example demonstrates:

        procedure PassMe;
        begin
            Writeln('PassMe was called');
        end;

        procedure CallPassMe(procedure x);
        begin
            x;
        end;

        begin {main}
            CallPassMe(PassMe);
        end.
The 80x86 code to implement the above could look like the following:

PassMe          proc    near
                print
                byte    "PassMe was called",cr,lf,0
                ret
PassMe          endp

CallPassMe      proc    near
                push    bp
                mov     bp, sp
                call    word ptr [bp+4]
                pop     bp
                ret     2
CallPassMe      endp

Main            proc    near
                lea     bx, PassMe              ;Pass address of PassMe to
                push    bx                      ; CallPassMe
                call    CallPassMe
                ExitPgm
Main            endp
For an example as simple as the one above, this technique works fine. However, it does not always work properly if PassMe needs to access non-local variables. The following Pascal code demonstrates the problem that could occur:

        program main;

        procedure dummy;
        begin end;

        procedure Recurse1(i:integer; procedure x);

            procedure Print;
            begin
                writeln(i);
            end;

            procedure Recurse2(j:integer; procedure y);
            begin
                if (j=1) then y
                else if (j=5) then Recurse1(j-1, Print)
                else Recurse1(j-1, y);
            end;

        begin {Recurse1}
            Recurse2(i, x);
        end;

        begin {Main}
            Recurse1(5,dummy);
        end.
This code produces the following call sequence:

        Recurse1(5,dummy) → Recurse2(5,dummy) → Recurse1(4,Print) → Recurse2(4,Print) →
        Recurse1(3,Print) → Recurse2(3,Print) → Recurse1(2,Print) → Recurse2(2,Print) →
        Recurse1(1,Print) → Recurse2(1,Print) → Print

Print will print the value of Recurse1’s i variable to the standard output. However, there are several activation records for Recurse1 present on the stack, which raises the obvious question, “which copy of i does Print display?” Without giving it much thought, you might conclude that it should print the value “1” since Recurse2 calls Print when Recurse1’s value for i is one. Note, though, that when Recurse2 passes the address of Print to Recurse1, the value of i in the enclosing Recurse1 activation is five (the new activation that this call creates receives the value four). Pascal, like most block structured languages, will use the value of i at the point Recurse2 passes the address of Print to Recurse1. Hence, the code above should print the value five, not the value one.

This creates a difficult implementation problem. After all, Print cannot simply access the display to gain access to the non-local variable i – the display’s entry for Recurse1 points at the latest copy of Recurse1’s activation record, not the activation record containing the value five, which is what you want.

The most common solution in systems using a display is to make a local copy of each display whenever calling a procedure or function. When passing a procedure or function as a parameter, the calling code copies the display along with the address of the procedure or function. This is why Intel’s enter instruction makes a copy of the display when building the activation record.

If you are passing procedures and functions as parameters, you may want to consider using static links rather than a display. When using a static link you need only pass a single pointer (the static link) along with the routine’s address. Of course, it is more work to access non-local variables, but you don’t have to copy the display on every call, which is quite expensive. The following 80x86 code provides the implementation of the above code using static links:

wp              textequ <word ptr>

Dummy           proc    near
                ret
Dummy           endp

; PrintIt; (Use the name PrintIt to avoid conflict).
;
; stack:
;
; bp+4: static link.

PrintIt         proc    near
                push    bp
                mov     bp, sp
                mov     bx, [bp+4]              ;Get static link
                mov     ax, ss:[bx+8]           ;Get i's value.
                puti
                pop     bp
                ret     2
PrintIt         endp

; Recurse1(i:integer; procedure x);
;
; stack:
;
; bp+8: i
; bp+6: x's static link
; bp+4: x's address

Recurse1        proc    near
                push    bp
                mov     bp, sp
                push    wp [bp+8]               ;Push value of i onto stack.
                push    wp [bp+6]               ;Push x's static link.
                push    wp [bp+4]               ;Push x's address.
                push    bp                      ;Push Recurse1's static link.
                call    Recurse2
                pop     bp
                ret     6
Recurse1        endp

; Recurse2(j:integer; procedure y);
;
; stack:
;
; bp+10: j
; bp+8:  y's static link.
; bp+6:  y's address.
; bp+4:  Recurse2's static link.

Recurse2        proc    near
                push    bp
                mov     bp, sp
                cmp     wp [bp+10], 1           ;Is j=1?
                jne     TryJeq5
                push    wp [bp+8]               ;y's static link.
                call    wp [bp+6]               ;Call y.
                jmp     R2Done

TryJeq5:        cmp     wp [bp+10], 5           ;Is j=5?
                jne     Call1
                mov     ax, [bp+10]
                dec     ax
                push    ax
                push    wp [bp+4]               ;Push static link to R1.
                lea     ax, PrintIt             ;Push address of print.
                push    ax
                call    Recurse1
                jmp     R2Done

Call1:          mov     ax, [bp+10]
                dec     ax
                push    ax
                push    wp [bp+8]               ;Pass along existing
                push    wp [bp+6]               ; address and link.
                call    Recurse1

R2Done:         pop     bp
                ret     8
Recurse2        endp

main            proc
                push    bp
                mov     bp, sp
                mov     ax, 5                   ;Push first parameter.
                push    ax
                push    bp                      ;Dummy static link.
                lea     ax, Dummy               ;Push address of dummy.
                push    ax
                call    Recurse1
                pop     bp
                ExitPgm
main            endp
There are several ways to improve this code. Of course, this particular program doesn’t really need to maintain a display or static links because only PrintIt accesses non-local variables; however, ignore that fact for the time being and pretend it does. Since you know that PrintIt only accesses variables at one particular lex level, and the program only calls PrintIt indirectly, you can pass a pointer to the appropriate activation record; this is what the above code does, although it maintains full static links as well. Compilers must always assume the worst case and therefore often generate inefficient code. If you study your particular needs, however, you may be able to improve the efficiency of your code by avoiding much of the overhead of maintaining static links or copying displays.

Keep in mind that thunks are special cases of functions that you call indirectly. They suffer from the same problems and drawbacks as procedure and function parameters with respect to accessing non-local variables. If such routines access non-local variables (and thunks almost always will) then you must exercise care when calling such routines. Fortunately, thunks never cause indirect recursion (which is responsible for the crazy problems in the Recurse1 / Recurse2 example) so you can use the display to access any non-local variables appearing within the thunk.
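The Recurse1 / Recurse2 program from this section can be transliterated into Python as a quick check of which copy of i the passed procedure sees. This is only a sketch: Python closures play the role of the (address, static link) pair, the lower-case names are transliterations rather than anything from the text, and the printed list stands in for standard output. The closure handed over by recurse2 belongs to the activation created by recurse1(5, dummy), so the recorded value is that activation's i rather than the 1 held by the most recent activation.

```python
printed = []

def dummy():
    pass

def recurse1(i, x):
    # print_i closes over *this* activation's i, much as a static link
    # ties PrintIt to one particular Recurse1 activation record.
    def print_i():
        printed.append(i)

    def recurse2(j, y):
        if j == 1:
            y()
        elif j == 5:
            # print_i here is the one belonging to the enclosing activation.
            recurse1(j - 1, print_i)
        else:
            recurse1(j - 1, y)

    recurse2(i, x)

recurse1(5, dummy)
print(printed)
```

Had the callee looked up i through a single shared slot for Recurse1 (the display entry), it would have seen the innermost activation's value instead.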
12.5
Iterators

An iterator is a cross between a control structure and a function. Although common high level languages do not often support iterators, they are present in some very high level languages [11]. Iterators provide a combination state machine/function call mechanism that lets a function pick up where it last left off on each new call. Iterators are also part of a loop control structure, with the iterator providing the value of the loop control variable on each iteration. To understand what an iterator is, consider the following for loop from Pascal:

        for I := 1 to 10 do <some statement>;
When learning Pascal you were probably taught that this statement initializes I with one, compares I with 10, and executes the statement if I is less than or equal to 10. After executing the statement, the for statement increments I and compares it with 10 again, repeating the process until I is greater than 10.

While this description is semantically correct, and indeed it’s the way that most Pascal compilers implement the for loop, this is not the only point of view that describes how the for loop operates. Suppose, instead, that you were to treat the “to” reserved word as an operator: one that expects two parameters (one and ten in this case) and returns the next value in the range on each successive execution. That is, on the first call the “to” operator would return one, on the second call it would return two, etc. After the tenth call, the “to” operator would fail, which would terminate the loop. This is exactly the description of an iterator.

In general, an iterator controls a loop. Different languages use different names for iterator controlled loops; this text will just use the name foreach as follows:

        foreach variable in iterator() do

            statements;

        endfor;

Variable is a variable whose type is compatible with the return type of the iterator. An iterator returns two values: a boolean success or failure value and a function result. As long as the iterator returns success, the foreach statement assigns the other return value to variable and executes statements. If iterator returns failure, the foreach loop terminates and executes the next sequential statement following the foreach loop’s body. In the case of failure, the foreach statement does not affect the value of variable.
Iterators are considerably more complex than normal functions. A typical function call involves two basic operations: a call and a return. Iterator invocations involve four basic operations:

1) Initial iterator call
2) Yielding a value
3) Resumption of an iterator
4) Termination of an iterator.
To understand how an iterator operates, consider the following short example from the Panacea programming language [12]:

        Range:iterator(start,stop:integer):integer;
        begin range;

            while (start <= stop) do

                yield start;
                start := start + 1;

            endwhile;

        end Range;

11. Ada and PL/I support very limited forms of iterators, though they do not support the type of iterators found in CLU, SETL, Icon, and other languages.
12. Panacea is a very high level language developed by Randall Hyde for use in compiler courses at UC Riverside.
In the Panacea programming language, iterator calls may only appear in the foreach statement. With the exception of the yield statement above, anyone familiar with Pascal or C++ should be able to figure out the basic logic of this iterator. An iterator in the Panacea programming language may return to its caller using one of two separate mechanisms: it can return to the caller by exiting through the end Range; statement, or it may yield a value by executing the yield statement. An iterator succeeds if it executes the yield statement; it fails if it simply returns to the caller. Therefore, the foreach statement will only execute its corresponding statement if you exit an iterator with a yield. The foreach statement terminates if you simply return from the iterator. In the example above, the iterator returns the values start..stop via a yield and then the iterator terminates. The loop

        foreach i in Range(1,10) do

            write(i);

        endfor;

is comparable to the Pascal statement:

        for i := 1 to 10 do write(i);
When a Panacea program first executes the foreach statement, it makes an initial call to the iterator. The iterator runs until it executes a yield or it returns. If it executes the yield statement, it returns the value of the expression following the yield as the iterator result and it returns success. If it simply returns, the iterator returns failure and no iterator result. In the current example, the initial call to the iterator returns success and the value one. Assuming a successful return (as in the current example), the foreach statement assigns the iterator return value to the loop control variable and executes the foreach loop body. After executing the loop body, the foreach statement calls the iterator again. However, this time the foreach statement resumes the iterator rather than making an initial call. An iterator resumption continues with the first statement following the last yield it executed. In the range example, a resumption would continue execution at the start := start + 1; statement. On the first resumption, the Range iterator would add one to start, producing the value two. Two is less than ten (stop’s value) so the while loop would repeat and the iterator would yield the value two. This process would repeat over and over again until the iterator yields ten. Upon resuming after yielding ten, the iterator would increment start to eleven and then return, rather than yield, since this new value is not less than or equal to ten. When the range iterator returns (fails), the foreach loop terminates.
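Python generators happen to follow the same protocol, so the behavior just described can be sketched with them. The code below is an analogy rather than Panacea: range_iter and foreach are hypothetical names, the first next() plays the role of the initial call, each later next() is a resumption, and StopIteration corresponds to the iterator returning failure.

```python
def range_iter(start, stop):
    """Rough counterpart of the Panacea Range iterator."""
    while start <= stop:
        yield start          # succeed: hand a value back to the loop
        start = start + 1    # a resumption continues here
    # Falling off the end is the plain return that signals failure.


def foreach(gen, body):
    """Drive a generator the way the text describes foreach doing it."""
    while True:
        try:
            # First pass: acts as the initial call. Later passes: resumption.
            value = next(gen)
        except StopIteration:
            break            # iterator returned (failed): leave the loop
        body(value)          # success: run the loop body with the value


seen = []
foreach(range_iter(1, 10), seen.append)
print(seen)
```

Note that calling range_iter(1, 10) only creates the generator object; none of the iterator's code runs until the first next().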
12.5.1
Implementing Iterators Using In-Line Expansion

The implementation of an iterator is rather complex. To begin with, consider a first attempt at an assembly implementation of the foreach statement above:

                push    1                       ;Assume 286 or better
                push    10                      ; and parms passed on stack.
                call    Range_Initial           ;Make initial call to iter.
                jc      Failure                 ;C=0, 1 means success, fail.
ForLoop:        puti                            ;Assume result is in AX.
                call    Range_Resume            ;Resume iterator.
                jnc     ForLoop                 ;Carry clear is success!
Failure:
Although this looks like a straight-forward implementation project, there are several issues to consider. First, the call to Range_Resume above looks simple enough, but there is no fixed address that corresponds to the resume address. While it is certainly true that this Range example has only one resume address, in general you can have as many yield statements as you like in an iterator. For example, the following iterator returns the values 1, 2, 3, and 4:

        OneToFour:iterator:integer;
        begin OneToFour;

            yield 1;
            yield 2;
            yield 3;
            yield 4;

        end OneToFour;
The initial call would execute the yield 1; statement. The first resumption would execute the yield 2; statement, the second resumption would execute yield 3;, etc. Obviously there is no single resume address the calling code can count on.

There are a couple of additional details left to consider. First, an iterator is free to call procedures and functions [13]. If such a procedure or function executes the yield statement then resumption by the foreach statement continues execution within the procedure or function that executed the yield. Second, the semantics of an iterator require all local variables and parameters to maintain their values until the iterator terminates. That is, yielding does not deallocate local variables and parameters. Likewise, any return addresses left on the stack (e.g., the call to a procedure or function that executes the yield statement) must not be lost when a piece of code yields and the corresponding foreach statement resumes the iterator. In general, this means you cannot use the standard call and return sequence to yield from or resume to an iterator because you have to preserve the contents of the stack.

While there are several ways to implement iterators in assembly language, perhaps the most practical method is to have the iterator call the loop controlled by the iterator and have the loop return back to the iterator function. Of course, this is counter-intuitive. Normally, one thinks of the iterator as the function that the loop calls on each iteration, not the other way around. However, given the structure of the stack during the execution of an iterator, the counter-intuitive approach turns out to be easier to implement.

Some high level languages support iterators in exactly this fashion. For example, Metaware’s Professional Pascal Compiler for the PC supports iterators [14]. Were you to create a code sequence as follows:

        iterator OneToFour:integer;
        begin
            yield 1;
            yield 2;
            yield 3;
            yield 4;
        end;

and call it in the main program as follows:

        for i in OneToFour do writeln(i);

Professional Pascal would completely rearrange your code. Instead of turning the iterator into an assembly language function and calling this function from within the for loop body, the compiler would turn the for loop body into a function, expand the iterator in-line (much like a macro) and call the for loop body function on each yield. That is, Professional Pascal would probably produce assembly language that looks something like the following:

; The following procedure corresponds to the for loop body
; with a single parameter (I) corresponding to the loop
; control variable:

ForLoopCode     proc    near
                push    bp
                mov     bp, sp
                mov     ax, [bp+4]              ;Get loop control value and
                puti                            ; print it.
                putcr
                pop     bp
                ret     2                       ;Pop loop control value off stk.
ForLoopCode     endp

; The following code would be emitted in-line upon encountering the
; for loop in the main program. It corresponds to an in-line
; expansion of the iterator as though it were a macro,
; substituting a call for the yield instructions:

                push    1                       ;On 286 and later processors only.
                call    ForLoopCode
                push    2
                call    ForLoopCode
                push    3
                call    ForLoopCode
                push    4
                call    ForLoopCode

13. In Panacea an iterator could also call other types of program units, including other iterators, but you can ignore this for now.
14. Obviously, this is a non-standard extension to the Pascal programming language provided in Professional Pascal.
This method for implementing iterators is convenient and produces relatively efficient (fast) code. It does, however, suffer from a couple of drawbacks. First, since you must expand the iterator in-line wherever you call it, much like a macro, your program could grow large if the iterator is not short and you use it often. Second, this method of implementing the iterator completely hides the underlying logic of the code and makes your assembly language programs difficult to read and understand.
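The inversion of control behind this technique (the iterator calls the loop body, not the other way around) can be sketched in a few lines of Python. The names one_to_four and for_loop_code are hypothetical transliterations, and the iterator here is an ordinary function rather than a true in-line expansion, so the sketch shows only the calling direction, not the code-size tradeoff.

```python
def one_to_four(loop_body):
    # Each "yield n" becomes a call to the loop body, mirroring the
    # push n / call ForLoopCode pairs in the assembly listing.
    loop_body(1)
    loop_body(2)
    loop_body(3)
    loop_body(4)


output = []

def for_loop_code(i):
    # Stands in for ForLoopCode: the body of "for i in OneToFour do writeln(i)".
    output.append(i)


one_to_four(for_loop_code)
print(output)
```

Because the iterator never returns between yields, its local state simply stays on the stack while the loop body runs, which is why this arrangement avoids the resume-address problem.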
12.5.2
Implementing Iterators with Resume Frames

In-line expansion is not the only way to implement iterators. There is another method that preserves the structure of your program at the expense of a slightly more complex implementation. Several high level languages, including Icon and CLU, use this implementation.

To start with, you will need another stack frame: the resume frame. A resume frame contains two entries: a yield return address (that is, the address of the next instruction after the yield statement) and a dynamic link, which is a pointer to the iterator’s activation record. Typically the dynamic link is just the value in the bp register at the time you execute the yield instruction. This version implements the four parts of an iterator as follows:

1) A call instruction for the initial iterator call,
2) A call instruction for the yield statement,
3) A ret instruction for the resume operation, and
4) A ret instruction to terminate the iterator.
To begin with, an iterator will require two return addresses rather than the single return address you would normally expect. The first return address appearing on the stack is the termination return address. The second return address is where the subroutine transfers control on a yield operation. The calling code must push these two return addresses upon initial invocation of the iterator. The stack, upon initial entry into the iterator, should look something like Figure 12.8. As an example, consider the Range iterator presented earlier. This iterator requires two parameters, a starting value and an ending value:

        foreach i in Range(1,10) do writeln(i);

        Previous Stack Contents
        Parameters for Iterator
        Termination Return Address
        Yield Return Address            <- SP (if this is a NEAR iterator)

Figure 12.8 Iterator Activation Record

The code to make the initial call to the Range iterator, producing a stack like the one above, could be the following:

                push    1                       ;Push start parameter value.
                push    10                      ;Push stop parameter value.
                push    offset ForDone          ;Push termination address.
                call    Range                   ;Pushes yield return address.
ForDone is the first statement immediately following the foreach loop, that is, the instruction to execute when the iterator returns failure. The foreach loop body must begin with the first instruction following the call to Range. At the end of the foreach loop, rather than jumping back to the start of the loop, or calling the iterator again, this code should just execute a ret instruction. The reason will become clear in a moment. So the implementation of the above foreach statement could be the following:

                push    1                       ;Obviously, this requires a
                push    10                      ; 80286 or later processor.
                push    offset ForDone
                call    Range
                mov     bp, [bp]                ;Explained a little later.
                puti
                putcr
                ret
ForDone:
Granted, this doesn’t look anything at all like a loop. However, by playing some major tricks with the stack, you’ll see that this code really does iterate the loop body (puti and putcr) as intended. Now consider the Range iterator itself; here’s the code to do the job:

Range_Start     textequ <word ptr [bp+8]>       ;Address of Start parameter.
Range_Stop      textequ <word ptr [bp+6]>       ;Address of Stop parameter.
Range_Yield     textequ <word ptr [bp+2]>       ;Yield return address.

Range           proc    near
                push    bp
                mov     bp, sp
RangeLoop:      mov     ax, Range_Start         ;Get start parameter and
                cmp     ax, Range_Stop          ; compare against stop.
                ja      RangeDone               ;Terminate if start > stop

; Okay, build the resume frame:

                push    bp                      ;Save dynamic link.
                call    Range_Yield             ;Do YIELD operation.
                pop     bp                      ;Restore dynamic link.
                inc     Range_Start             ;Bump up start value
                jmp     RangeLoop               ;Repeat until start > stop.

RangeDone:      pop     bp                      ;Restore old BP
                add     sp, 2                   ;Pop YIELD return address
                ret     4                       ;Terminate iterator.
Range           endp
        Offset from BP   Contents (if this is a NEAR iterator)
        10               Previous Stack Contents
        8                Value of Start Parameter (1)
        6                Value of Stop Parameter (10)
        4                Termination Return Address
        2                Yield Return Address
        0                Original BP Value            <- SP, BP

Figure 12.9 Range Activation Record

Although this routine is rather short, don’t let its size deceive you; it’s quite complex. The best way to describe how this iterator operates is to take it a few instructions at a time. The first two instructions are the standard entry sequence for a procedure. Upon execution of these two instructions, the stack looks like that in Figure 12.9.

The next three statements in the Range iterator, at label RangeLoop, implement the termination test of the while loop. When the Start parameter contains a value greater than the Stop parameter, control transfers to the RangeDone label, at which point the code pops the value of bp off the stack, pops the yield return address off the stack (since this code will not return back to the body of the iterator loop) and then returns via the termination return address that is immediately above the yield return address on the stack. The return instruction also pops the two parameters off the stack.

The real work of the iterator occurs in the body of the while loop. The push, call, and pop instructions implement the yield statement. The push and call instructions build the resume frame and then return control to the body of the foreach loop. The call instruction is not calling a subroutine. What it is really doing here is finishing off the resume frame (by storing the yield resume address into the resume frame) and then returning control back to the body of the foreach loop by jumping indirect through the yield return address pushed on the stack by the initial call to the iterator. After the execution of this call, the stack frame looks like that in Figure 12.10. Also note that the ax register contains the return value for the iterator. As with functions, ax is a good place to return the iterator return result.

Immediately after yielding back to the foreach loop, the code must reload bp with the original value prior to the iterator invocation. This allows the calling code to correctly access parameters and local variables in its own activation record rather than the activation record of the iterator. Since bp just happens to point at the original bp value for the calling code, executing the mov bp, [bp] instruction reloads bp as appropriate. Of course, in this example reloading bp isn’t necessary because the body of the foreach loop does not reference any memory locations off the bp register, but in general you will need to restore bp.

At the end of the foreach loop body the ret instruction resumes the iterator. The ret instruction pops the return address off the stack, which returns control back to the iterator immediately after the call. The instructions at this point pop bp off the stack, increment the Start variable, and then repeat the while loop.
        Previous Stack Contents
        Value of Start Parameter (1)        Iterator Activation Record
        Value of Stop Parameter (10)        Iterator Activation Record
        Termination Return Address          Iterator Activation Record
        Yield Return Address                Iterator Activation Record
        Original BP Value        <- BP      Iterator Activation Record
        Dynamic Link (old BP)               Resume Frame
        Resume Return Address    <- SP      Resume Frame

Figure 12.10 Range Resume Record

Of course, this is a lot of work to create a piece of code that simply repeats a loop ten times. A simple for loop would have been much easier and quite a bit more efficient than the foreach implementation described in this section. This section used the Range iterator because it was easy to show how iterators work using Range, not because actually implementing Range as an iterator is a good idea.
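To check the stack offsets used in this section, it can help to replay the pushes and pops on a toy model of the stack. The Python sketch below is only a bookkeeping aid under stated assumptions: Stack16 is a hypothetical class, the starting sp of 0100h and the caller's bp of 0200h are arbitrary made-up values, and strings stand in for code addresses.

```python
class Stack16:
    """Toy model of a 16-bit stack: word-sized slots, growing downward."""

    def __init__(self, sp=0x0100):
        self.sp = sp
        self.mem = {}

    def push(self, value):
        self.sp -= 2
        self.mem[self.sp] = value

    def pop(self):
        value = self.mem.pop(self.sp)
        self.sp += 2
        return value


stack = Stack16()
caller_bp = 0x0200          # made-up frame pointer of the foreach's procedure
bp = caller_bp

# Initial call: push 1 / push 10 / push offset ForDone / call Range
stack.push(1)
stack.push(10)
stack.push("ForDone")
stack.push("loop body")     # the call pushes the yield return address

# Range entry: push bp / mov bp, sp  (the state shown in Figure 12.9)
stack.push(bp)
bp = stack.sp
entry_frame = {off: stack.mem[bp + off] for off in (8, 6, 4, 2, 0)}

# Yield: push bp / call Range_Yield  (the state shown in Figure 12.10)
stack.push(bp)              # dynamic link
stack.push("after yield")   # resume return address pushed by the call
iterator_bp = bp
bp = stack.mem[bp]          # the loop body executes mov bp, [bp]
bp_in_loop_body = bp

# Resume: the loop body's ret pops the resume address, then the
# iterator's pop bp restores its own frame pointer.
resume_address = stack.pop()
bp = stack.pop()
```

After the resume, sp is back where it was immediately after the iterator's entry sequence, which is why the iterator can repeat the yield any number of times without the stack growing.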
12.6
Sample Programs

The sample programs in this chapter provide two examples of iterators. The first example is a simple iterator that processes characters in a string and returns the vowels found in that string. The second iterator is a synthetic program (i.e., written just to demonstrate iterators) that is considerably more complex since it deals with static links. The second sample program also demonstrates another way to build the resume frame for an iterator. Take a good look at the macros that this program uses. They can simplify the use of iterators in your programs.
12.6.1
An Example of an Iterator

The following example demonstrates a simple iterator. This piece of code reads a string from the user, locates all the vowels (a, e, i, o, u, w, y) on the line, prints each vowel along with its index into the string, and counts the occurrences of each vowel. This isn’t a particularly good example of an iterator; however, it does serve to demonstrate an implementation and use. First, a pseudo-Pascal version of the program:

program DoVowels(input,output);
const
    Vowels = ['a', 'e', 'i', 'o', 'u', 'y', 'w',
              'A', 'E', 'I', 'O', 'U', 'Y', 'W'];
var
    ThisVowel : integer;
    InputStr  : string;
    VowelCnt  : array [char] of integer;

iterator GetVowel(s:string) : integer;
var
    CurIndex : integer;
begin

    for CurIndex := 1 to length(s) do
        if (s [CurIndex] in Vowels) then begin

            { If we have a vowel, bump the cnt by 1 }

            VowelCnt[s[CurIndex]] := VowelCnt[s[CurIndex]]+1;

            { Return index into string of current vowel }

            yield CurIndex;

        end;
end;

begin {main}

    { First, initialize our vowel counters }

    VowelCnt ['a'] := 0;
    VowelCnt ['e'] := 0;
    VowelCnt ['i'] := 0;
    VowelCnt ['o'] := 0;
    VowelCnt ['u'] := 0;
    VowelCnt ['w'] := 0;
    VowelCnt ['y'] := 0;
    VowelCnt ['A'] := 0;
    VowelCnt ['E'] := 0;
    VowelCnt ['I'] := 0;
    VowelCnt ['O'] := 0;
    VowelCnt ['U'] := 0;
    VowelCnt ['W'] := 0;
    VowelCnt ['Y'] := 0;

    { Read and process the input string }

    Write('Enter a string: ');
    ReadLn(InputStr);
    foreach ThisVowel in GetVowel(InputStr) do
        WriteLn('Vowel ',InputStr [ThisVowel],
                ' at position ', ThisVowel);

    { Output the vowel counts }

    WriteLn('# of A''s:',VowelCnt['a'] + VowelCnt['A']);
    WriteLn('# of E''s:',VowelCnt['e'] + VowelCnt['E']);
    WriteLn('# of I''s:',VowelCnt['i'] + VowelCnt['I']);
    WriteLn('# of O''s:',VowelCnt['o'] + VowelCnt['O']);
    WriteLn('# of U''s:',VowelCnt['u'] + VowelCnt['U']);
    WriteLn('# of W''s:',VowelCnt['w'] + VowelCnt['W']);
    WriteLn('# of Y''s:',VowelCnt['y'] + VowelCnt['Y']);

end.
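For readers who would like to run the pseudo-Pascal logic before studying an assembly version, here is a rough Python equivalent. It is a sketch, not a translation of any particular implementation: get_vowel, VOWELS, and the sample string are made up for illustration, the counts live in a dictionary passed to the generator rather than in a global array, and indexes are one-based to match the Pascal code.

```python
VOWELS = set("aeiouywAEIOUYW")

def get_vowel(s, vowel_cnt):
    """Yield the one-based index of each vowel in s, counting as it goes."""
    for cur_index in range(1, len(s) + 1):
        ch = s[cur_index - 1]
        if ch in VOWELS:
            # If we have a vowel, bump its count by one...
            vowel_cnt[ch] = vowel_cnt.get(ch, 0) + 1
            # ...and return the index of the current vowel to the loop.
            yield cur_index


counts = {}
report = []
text = "Why iterate?"
for this_vowel in get_vowel(text, counts):
    report.append((text[this_vowel - 1], this_vowel))

for vowel, position in report:
    print("Vowel", vowel, "at position", position)
```

As in the Pascal version, the counting happens inside the iterator while the printing happens in the loop body, so the two pieces of work interleave one vowel at a time.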
Here’s the working assembly language version:

                .286                            ;For PUSH imm instr.
                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

; Some "cute" equates:

Iterator        textequ <proc>
endi            textequ <endp>
wp              textequ <word ptr>

; Necessary global variables:

dseg            segment para public 'data'

; As per UCR StdLib instructions, InputStr must hold
; at least 128 characters.

InputStr        byte    128 dup (?)

; Note that the following statement initializes the
; VowelCnt array to zeros, saving us from having to
; do this in the main program.

VowelCnt        word    256 dup (0)

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; GetVowel-     This iterator searches for the next vowel in the
;               input string and returns the index to the value
;               as the iterator result. On entry, ES:DI points
;               at the string to process. On yield, AX returns
;               the zero-based index into the string of the
;               current vowel.
;
; GVYield-      Address to call when performing the yield.
; GVStrPtr-     A local variable that points at our string.

GVYield         textequ <word ptr [bp+2]>
GVStrPtr        textequ <dword ptr [bp-4]>

GetVowel        Iterator
                push    bp
                mov     bp, sp

; Create and initialize GVStrPtr. This is a pointer to the
; next character to process in the input string.

                push    es
                push    di

; Save original ES:DI values so we can restore them on YIELD
; and on termination.

                push    es
                push    di

; Okay, here's the main body of the iterator. Fetch each
; character until the end of the string and see if it is
; a vowel. If it is a vowel, yield the index to it. If
; it is not a vowel, move on to the next character.

GVLoop:         les     di, GVStrPtr            ;Ptr to next char.
                mov     al, es:[di]             ;Get this character.
                cmp     al, 0                   ;End of string?
                je      GVDone

; The following statement will convert all lower case
; characters to upper case. It will also translate other
; characters to who knows what, but we don't care since
; we only look at A, E, I, O, U, W, and Y.

                and     al, 5fh

; See if this character is a vowel. This is a disgusting
; set membership operation.

                cmp     al, 'A'
                je      IsAVowel
                cmp     al, 'E'
                je      IsAVowel
                cmp     al, 'I'
                je      IsAVowel
                cmp     al, 'O'
                je      IsAVowel
                cmp     al, 'U'
                je      IsAVowel
                cmp     al, 'W'
                je      IsAVowel
                cmp     al, 'Y'
                jne     NotAVowel

; If we've got a vowel we need to yield the index into
; the string to that vowel. To compute the index, we
; restore the original ES:DI values (which points at
; the beginning of the string) and subtract the current
; position (now in AX) from the first position. This
; produces a zero-based index into the string.
; This code must also increment the corresponding entry
; in the VowelCnt array so we can print the results
; later. Unlike the Pascal code, we've converted lower
; case to upper case so the count for upper and lower
; case characters will appear in the upper case slot.

IsAVowel:       push    bx                      ;Bump the vowel
                mov     ah, 0                   ; count by one.
                mov     bx, ax
                shl     bx, 1
                inc     VowelCnt[bx]
                pop     bx

                mov     ax, di
                pop     di                      ;Restore original DI
                sub     ax, di                  ;Compute index.
                pop     es                      ;Restore original ES

                push    bp                      ;Save our frame pointer
                call    GVYield                 ;Yield to caller
                pop     bp                      ;Restore our frame pointer
                push    es                      ;Save ES:DI again
                push    di

; Whether it was a vowel or not, we've now got to move
; on to the next character in the string. Increment
; our string pointer by one and repeat the process
; over again.

NotAVowel:      inc     wp GVStrPtr
                jmp     GVLoop

; If we've reached the end of the string, terminate
; the iterator here. We need to restore the original
; ES:DI values, remove local variables, pop the YIELD
; address, and then return to the termination address.

GVDone:         pop     di                      ;Restore ES:DI
                pop     es
                mov     sp, bp                  ;Remove locals
                pop     bp
                add     sp, 2                   ;Pop YIELD address
                ret
GetVowel        endi

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax

                print
                byte    "Enter a string: ",0
                lesi    InputStr
                gets                            ;Read input line.

; The following is the foreach loop. Note that the label
; "FOREACH" is present for documentation purpose only.
; In fact, the foreach loop always begins with the first
; instruction after the call to GetVowel.
;
; One other note: this assembly language code uses
; zero-based indexes for the string. The Pascal version
; uses one-based indexes for strings. So the actual
; numbers printed will be different. If you want the
; values printed by both programs to be identical,
; uncomment the INC instruction below.

                push    offset ForDone          ;Termination address.
                call    GetVowel                ;Start iterator
FOREACH:        mov     bx, ax
                print
                byte    "Vowel ",0
                mov     al, InputStr[bx]
                putc
                print
                byte    " at position ",0
                mov     ax, bx
;               inc     ax
                puti
                putcr
                ret                             ;Iterator resume.

ForDone:        printf
                byte    "# of A's: %d\n"
                byte    "# of E's: %d\n"
                byte    "# of I's: %d\n"
                byte    "# of O's: %d\n"
                byte    "# of U's: %d\n"
                byte    "# of W's: %d\n"
                byte    "# of Y's: %d\n",0
                dword   VowelCnt + ('A'*2)
                dword   VowelCnt + ('E'*2)
                dword   VowelCnt + ('I'*2)
                dword   VowelCnt + ('O'*2)
                dword   VowelCnt + ('U'*2)
                dword   VowelCnt + ('W'*2)
                dword   VowelCnt + ('Y'*2)

Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

12.6.2
Another Iterator Example One problem with the iterator examples appearing in this chapter up to this point is that they do not access any global or intermediate variables. Furthermore, these examples do not work if an iterator is recursive or calls other procedures that yield the value to the foreach loop. The major problem with the examples up to this point has been that the foreach loop body has been responsible for reloading the bp register with a pointer to the foreach loop’s procedure’s activation record. Unfortunately, the foreach loop body has to assume that bp currently points at the iterator’s activation record so it can get a pointer to its own activation record from that activation record. This will not be the case if the iterator’s activation record is not the one on the top of the stack. To rectify this problem, the code doing the yield operation must set up the bp register so that it points at the activation record of the procedure containing the foreach loop before returning back to the loop. This is a somewhat complex operation. The following macro accomplishes this from inside an iterator: Yield
macro mov push mov
dx, [BP+2] bp bp, [bp]
;Place to yield back to. ;Save Iterator link ;Get ptr to caller's A.R.
Page 673
Chapter 12 call pop endm
dx bp
;Push resume address and rtn. ;Restore ptr to our A. R.
Note an unfortunate side effect of this code is that it modifies the dx register. Therefore, the iterator does not preserve the dx register across a call to the iterator function. The macro above assumes that the bp register points at the iterator’s activation record. If it does not, then you must execution some additional instructions to follow the static links back to the iterator’s activation record to obtain the address of the foreach loop procedure’s activation record. ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
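Before reading the assembly listing for this example, it may help to see the control flow it is meant to reproduce. The following is a minimal Python sketch, assuming Python generators as a stand-in for iterators. The names range_iter, main_frame, and run are hypothetical and do not appear in ITERS.ASM; the dictionary main_frame plays the role of the main program's activation record, which iter3 reaches through the static link in the assembly version.

```python
def range_iter(start, stop):
    """Yield start..stop, then fail (the generator simply ends)."""
    while start <= stop:
        yield start
        start += 1


def iter3(main_frame):
    """Yield 1, 2, 0, incrementing main's variable l before each yield."""
    for value in (1, 2, 0):
        main_frame["l"] += 1      # assembly: inc word ptr [bx-6] via the static link
        yield value


def iter2(main_frame):
    """Pass along whatever iter3 yields (a nested yield)."""
    for k in iter3(main_frame):
        yield k


def run():
    """Model of the main program's two nested foreach loops."""
    main_frame = {"l": 0}         # hypothetical stand-in for main's activation record
    rows = []
    for i in range_iter(1, 3):
        for j in iter2(main_frame):
            rows.append((i, j, main_frame["l"]))
    return rows


if __name__ == "__main__":
    for i, j, l in run():
        print(f"{i}, {j}, {l}")
```

Python hides the bookkeeping that the Yield macro performs by hand: saving the iterator's frame pointer, pointing bp at the activation record of the procedure containing the foreach loop, and leaving a resume address on the stack. The nested yield in iter2 is exactly the case the simpler iterators earlier in the chapter could not handle.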
; ITERS.ASM
;
; Roughly corresponds to the example in Ghezzi & Jazayeri's
; "Programming Language Concepts" text.
;
; Randall Hyde
;
;
; This program demonstrates an implementation of:
;
;       l := 0;
;       foreach i in range(1,3) do
;           foreach j in iter2() do
;               writeln(i, ',', j, ',', l);
;
;
;       iterator range(start,stop):integer;
;       begin
;
;           while start <= stop do begin
;
;               yield start;
;               start := start+1;
;           end;
;       end;
;
;       iterator iter2:integer;
;       var k:integer;
;       begin
;
;           foreach k in iter3 do
;               yield k;
;       end;
;
;       iterator iter3:integer;
;       begin
;
;           l := l + 1;
;           yield 1;
;           l := l + 1;
;           yield 2;
;           l := l + 1;
;           yield 0;
;       end;
;
;
; This code will print:
;
;       1, 1, 1
;       1, 2, 2
;       1, 0, 3
;       2, 1, 4
;       2, 2, 5
;       2, 0, 6
;       3, 1, 7
;       3, 2, 8
;       3, 0, 9
                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

                .286                            ;Allow extra adrs modes.

dseg            segment para stack 'data'

; Put the stack in the data segment so we can use the small memory model
; to simplify addressing:

stk             byte    1024 dup ('stack')
EndStk          word    0

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg, ss:dseg

; Here's the structure of a resume frame. Note that this structure isn't
; actually used in this code. It is only provided to show you what data
; is sitting on the stack when Yield builds a resume frame.

RsmFrm          struct
ResumeAdrs      word    ?
IteratorLink    word    ?
RsmFrm          ends

ActRec          struct
DynamicLink     word    ?                       ;Saved BP value.
YieldAdrs       word    ?                       ;Return Adrs for proc.
StaticLink      word    ?                       ;Static link for proc.
ActRec          ends

AR              equ     [bp].ActRec

; The following macro builds a resume frame and then returns to the caller
; of an iterator. It assumes that the iterator and whoever called the
; iterator have the standard activation record defined above and that we
; are building the standard resume frame described above.
;
; This code wipes out the DX register. Whoever calls the iterator cannot
; count on DX being preserved; likewise, the iterator cannot count on DX
; being preserved across a yield. Presumably, the iterator returns its
; value in AX.

Yield           macro
                mov     dx, AR.YieldAdrs        ;Place to yield back to.
                push    bp                      ;Save Iterator link
                mov     bp, AR.DynamicLink      ;Get ptr to caller's A.R.
                call    dx                      ;Push resume address and rtn.
                pop     bp                      ;Restore ptr to our A. R.
                endm
; Range(start, stop) - Yields start..stop and then fails.

; The following structure defines the activation record for Range:

rngAR           struct
DynamicLink     word    ?                       ;Saved BP value.
YieldAdrs       word    ?                       ;Return Adrs for proc.
StaticLink      word    ?                       ;Static link for proc.
FailAdrs        word    ?                       ;Go here when we fail
Stop            word    ?                       ;Stop parameter
Start           word    ?                       ;Start parameter
rngAR           ends

rAR             equ     [bp].rngAR

Range           proc
                push    bp
                mov     bp, sp

; While start <= stop, yield start:

WhlStartLEStop: mov     ax, rAR.Start           ;Also puts return value
                cmp     ax, rAR.Stop            ; in AX.
                jnle    RangeFail

                yield

                inc     rAR.Start
                jmp     WhlStartLEStop

RangeFail:      pop     bp                      ;Restore Dynamic Link.
                add     sp, 4                   ;Skip ret adrs and S.L.
                ret     4                       ;Return through fail address.
Range           endp

; Iter2- Just calls iter3() and returns whatever value it generates.
;
; Note: Since iter2 and iter3 are at the same lex level, the static link
; passed to iter3 must be the same as the static link passed to iter2.
; This is why the "push [bp]" instruction appears below (as opposed to the
; "push bp" instruction which appears in the calls to Range and iter2).
; Keep in mind, Range and iter2 are only called from main and bp contains
; the static link at that point. This is not true when iter2 calls iter3.

iter2           proc
                push    bp
                mov     bp, sp

                push    offset i3Fail           ;Failure address.
                push    [bp]                    ;Static link is link to main.
                call    iter3
                yield                           ;Return value returned by iter3
                ret                             ;Resume Iter3.

i3Fail:         pop     bp                      ;Restore Dynamic Link.
                add     sp, 4                   ;Skip return address & S.L.
                ret                             ;Return through fail address.
iter2           endp
; Iter3() simply yields the values 1, 2, and 0:

iter3           proc
                push    bp
                mov     bp, sp

                mov     bx, AR.StaticLink       ;Point BX at main's AR.
                inc     word ptr [bx-6]         ;Increment L in main.
                mov     ax, 1
                yield

                mov     bx, AR.StaticLink
                inc     word ptr [bx-6]
                mov     ax, 2
                yield

                mov     bx, AR.StaticLink
                inc     word ptr [bx-6]
                mov     ax, 0
                yield

                pop     bp                      ;Restore Dynamic Link.
                add     sp, 4                   ;Skip return address & S.L.
                ret                             ;Return through fail address.
iter3           endp
; Main's local variables are allocated on the stack in order to justify
; the use of static links.

i               equ     [bp-2]
j               equ     [bp-4]
l               equ     [bp-6]

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                mov     ss, ax
                mov     sp, offset EndStk

; Allocate storage for i, j, and l on the stack:

                mov     bp, sp
                sub     sp, 6

                meminit

                mov     word ptr l, 0           ;Initialize l.

; foreach i in range(1,3) do:

                push    1                       ;Parameters.
                push    3
                push    offset iFail            ;Failure address.
                push    bp                      ;Static link points at our AR.
                call    Range

; Yield from range comes here. The label is for your benefit.

RangeYield:     mov     i, ax                   ;Save away loop control value.

; foreach j in iter2 do:

                push    offset jfail            ;Failure address.
                push    bp                      ;Static link points at our AR.
                call    iter2

; Yield from iter2 comes here:

iter2Yield:     mov     j, ax

                mov     ax, i
                puti
                print
                byte    ", ",0
                mov     ax, j
                puti
                print
                byte    ", ",0
                mov     ax, l
                puti
                putcr

; Restart iter2:

                ret                             ;Resume iterator.

; Restart Range down here:

jFail:          ret                             ;Resume iterator.

; All Done!

iFail:          print
                byte    cr,lf,"All Done!",cr,lf,0

Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp

cseg            ends

; zzzzzzseg must be the last segment that gets loaded into memory!
; This is where the heap begins.

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

12.7 Laboratory Exercises

This chapter’s laboratory exercises consist of three components. In the first exercise you will experiment with a fairly complex set of iterators. In the second exercise you will learn how the 80286’s enter and leave instructions operate. In the third exercise, you will run some experiments on parameter passing mechanisms.
12.7.1 Iterator Exercise

In this laboratory exercise you will be working with a program (Ex12_1.asm on the companion CD-ROM) that uses four iterators. The first three iterators perform some fairly simple computations; the fourth iterator returns, successively, pointers to the first three iterators’ code that the main program can use to call these iterators.

For your lab report: study the following code and explain how it works. Run it and explain the output. Assemble the program with the “/Zi” option, then from within CodeView, set a breakpoint on the first instruction of each of the four iterators. Run the program up to these breakpoints and dump the memory starting at the current stack pointer value (ss:sp). Describe the meaning of the data on the stack at each breakpoint. Also, set a breakpoint on the “call ax” instruction. Trace into the routine ax points at upon each breakpoint and describe which routine this instruction calls. How many times does this instruction execute?
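To have something to compare the program's output against, the four iterators can be modeled in a few lines of Python. This is only a sketch, assuming generators as a stand-in for iterators and following the assembly code's behavior for Fib (each new value is the sum of the previous two). The names fib, up_down, sum_to_n, multi_iter, and run are hypothetical Python spellings, not identifiers from Ex12_1.asm, and the three-column formatting is an assumption about how putisize pads its output.

```python
def fib(n):
    """Yield F(0)..F(n), where F(0) = F(1) = 1."""
    yield 1                          # F(0) is always produced
    if n >= 1:
        yield 1                      # F(1)
        fn1, fn2 = 1, 1
        for _ in range(2, n + 1):
            fn1, fn2 = fn1 + fn2, fn1
            yield fn1


def up_down(n):
    """Yield 0, 1, ..., n, n-1, ..., 1, 0."""
    for i in range(0, n + 1):
        yield i
    for i in range(n - 1, -1, -1):
        yield i


def sum_to_n(n):
    """Yield the running sums 0, 0+1, 0+1+2, ..., 0+1+...+n."""
    total = 0
    for i in range(0, n + 1):
        total += i
        yield total


def multi_iter():
    """Yield the three iterators themselves (the 'call ax' targets)."""
    yield fib
    yield up_down
    yield sum_to_n


def run(limit=5):
    """Model of the main program: one output line per iterator call."""
    lines = []
    for n in range(0, limit + 1):
        for iterator in multi_iter():
            lines.append("".join(f"{i:3d}" for i in iterator(n)))
        lines.append("")             # the extra writeln after each n
    return lines


if __name__ == "__main__":
    print("\n".join(run()))
```

Counting how many times the innermost call executes in this model (once per iterator yielded by multi_iter, for each value of n) is a useful check on the answer to the last lab report question.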
; EX12_1.asm
;
; Program to support the laboratory exercise in Chapter 12.
;
; This program combines iterators, passing parameters as parameters,
; and procedural parameters all into the same program.
;
;
; This program implements the following iterators (examples written in panacea):
;
; program EX12_1;
;
; fib:iterator(n:integer):integer;
; var
;       CurIndex:integer;
;       Fn1: integer;
;       Fn2: integer;
; endvar;
; begin fib;
;
;       yield 1; (* Always have at least n=0 *)
;       if (n <> 0) then
;
;               yield 1; (* Have at least n=1 at this point *)
;
;               Fn1 := 1;
;               Fn2 := 1;
;               foreach CurIndex in 2..n do
;
;                       yield Fn1+Fn2;
;                       Fn2 = Fn1;
;                       Fn1 = CurIndex;
;
;               endfor;
;       endif;
;
; end fib;
;
;
;
; UpDown:iterator(n:integer):integer;
; var
;       CurIndex:integer;
; endvar;
; begin UpDown;
;
;       foreach CurIndex in 0..n do
;
;               yield CurIndex;
;
;       endfor;
;       foreach CurIndex in n-1..0 do
;
;               yield CurIndex;
;
;       endfor;
;
; end UpDown;
;
;
;
; SumToN:iterator(n:integer):integer;
; var
;       CurIndex:integer;
;       Sum: integer;
; endvar;
; begin SumToN;
;
;       Sum := 0;
;       foreach CurIndex in 0..n do
;
;               Sum := Sum + CurIndex;
;               yield Sum;
;
;       endfor;
;
; end SumToN;
;
;
; MultiIter returns a pointer to an iterator that accepts a single integer
; parameter.
;
; MultiIter: iterator: [iterator(n:integer)];
; begin MultiIter;
;
;       yield @Fib;     (* Return pointers to the three iterators above *)
;       yield @UpDown;  (* as the result of this iterator. *)
;       yield @SumToN;
;
; end MultiIter;
;
;
; var
;       i:integer;
;       n:integer;
;       iter:[iterator(n:integer)];
; endvar;
; begin EX12_1;
;
;       (* The following for loop repeats six times, passing its loop index as *)
;       (* the parameter to the Fib, UpDown, and SumToN iterators.            *)
;
;       foreach n in 0..5 do
;
;               (* The following (funny looking) iterator sequences through *)
;               (* each of the three iterators: Fib, UpDown, and SumToN. It *)
;               (* returns a pointer as the iterator value. The innermost   *)
;               (* foreach loop uses this pointer to call the appropriate   *)
;               (* iterator.                                                *)
;
;               foreach iter in MultiIter do
;
;                       (* Okay, this for loop invokes whatever iterator was *)
;                       (* returned by the MultiIter iterator above.         *)
;
;                       foreach i in [iter](n) do
;
;                               write(i:3);
;
;                       endfor;
;                       writeln;
;
;               endfor;
;               writeln;
;
;       endfor;
;
; end EX12_1;

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

                .286                            ;Allow extra adrs modes.

wp              textequ <word ptr>
ofs             textequ <offset>

dseg            segment para public 'code'
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ss:sseg

; The following macro builds a resume frame and then returns to the caller
; of an iterator. It assumes that the iterator and whoever called the
; iterator have the standard activation record defined above and that we
; are building the standard resume frame described above.
;
; This code wipes out the DX register. Whoever calls the iterator cannot
; count on DX being preserved; likewise, the iterator cannot count on DX
; being preserved across a yield. Presumably, the iterator returns its
; value in AX.

Yield           macro
                mov     dx, [BP+2]              ;Place to yield back to.
                push    bp                      ;Save Iterator link
                mov     bp, [bp]                ;Get ptr to caller's A.R.
                call    dx                      ;Push resume address and rtn.
                pop     bp                      ;Restore ptr to our A. R.
                endm
; Fib(n) - Yields the sequence of fibonacci numbers from F(0)..F(n).
;          The fibonacci sequence is defined as:
;
;               F(0) and F(1) = 1.
;               F(n) = F(n-1) + F(n-2) for n > 1.

; The following structure defines the activation record for Fib

CurIndex        textequ <[bp-6]>                ;Current sequence value.
Fn1             textequ <[bp-4]>                ;F(n-1) value.
Fn2             textequ <[bp-2]>                ;F(n-2) value.
DynamicLink     textequ <[bp]>                  ;Saved BP value.
YieldAdrs       textequ <[bp+2]>                ;Return Adrs for proc.
FailAdrs        textequ <[bp+4]>                ;Go here when we fail
n               textequ <[bp+6]>                ;The initial parameter

Fib             proc
                push    bp
                mov     bp, sp
                sub     sp, 6                   ;Make room for local variables.

; We will also begin yielding values starting at F(0).
; Since F(0) and F(1) are special cases, yield their values here.

                mov     ax, 1                   ;Yield F(0) (we always return at least
                yield                           ; F(0)).

                cmp     wp n, 1                 ;See if user called this with n=0.
                jb      FailFib
                mov     ax, 1
                yield

; Okay, n >=1 so we need to go into a loop to handle the remaining values.
; First, begin by initializing Fn1 and Fn2 as appropriate.

                mov     wp Fn1, 1
                mov     wp Fn2, 1
                mov     wp CurIndex, 2

WhlLp:          mov     ax, CurIndex            ;See if CurIndex > n.
                cmp     ax, n
                ja      FailFib

                push    Fn1
                mov     ax, Fn1
                add     ax, Fn2
                pop     Fn2                     ;Fn1 becomes the new Fn2 value.
                mov     Fn1, ax                 ;Current value becomes new Fn1 value.
                yield                           ;Yield the current value.

                inc     wp CurIndex
                jmp     WhlLp

FailFib:        mov     sp, bp                  ;Deallocate local vars.
                pop     bp                      ;Restore Dynamic Link.
                add     sp, 2                   ;Skip ret adrs.
                ret     2                       ;Return through fail address.
Fib             endp
; UpDown-       This function yields the sequence 0, 1, 2, ..., n, n-1,
;               n-2, ..., 1, 0.

i               textequ <[bp-2]>                ;Our index variable.

UpDown          proc
                push    bp
                mov     bp, sp
                sub     sp, 2                   ;Make room for i.
                mov     wp i, 0                 ;Initialize our index variable (i).
UptoN:          mov     ax, i
                cmp     ax, n
                jae     GoDown

                yield

                inc     wp i
                jmp     UpToN

GoDown:         mov     ax, i
                yield
                mov     ax, i
                cmp     ax, 0
                je      UpDownDone
                dec     wp i
                jmp     GoDown

UpDownDone:     mov     sp, bp                  ;Deallocate local vars.
                pop     bp                      ;Restore Dynamic Link.
                add     sp, 2                   ;Skip ret adrs.
                ret     2                       ;Return through fail address.
UpDown          endp

; SumToN(n)-    This iterator returns 0, 1, 3, 6, 10, ..., sum(n) where
;               sum(n) = 0+1+2+3+...+n (i.e., n(n+1)/2).

j               textequ <[bp-2]>
k               textequ <[bp-4]>

SumToN          proc
                push    bp
                mov     bp, sp
                sub     sp, 4                   ;Make room for j and k.

                mov     wp j, 0                 ;Initialize our index variable (j).
                mov     wp k, 0                 ;Initialize our sum (k).
SumLp:          mov     ax, j
                cmp     ax, n
                ja      SumDone

                add     ax, k
                mov     k, ax

                yield

                inc     wp j
                jmp     SumLp

SumDone:        mov     sp, bp                  ;Deallocate local vars.
                pop     bp                      ;Restore Dynamic Link.
                add     sp, 2                   ;Skip ret adrs.
                ret     2                       ;Return through fail address.
SumToN          endp
; MultiIter- This iterator returns a pointer to each of the above iterators.

MultiIter       proc
                push    bp
                mov     bp, sp

                mov     ax, ofs Fib
                yield
                mov     ax, ofs UpDown
                yield
                mov     ax, ofs SumToN
                yield

                pop     bp
                add     sp, 2
                ret
MultiIter       endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

; foreach bx in 0..5 do

                mov     bx, 0                   ;Loop control variable for outer loop.

WhlBXle5:

; foreach ax in MultiIter do

                push    ofs MultiDone           ;Failure address.
                call    MultiIter               ;Get iterator to call.

; foreach i in [ax](bx) do

                push    bx                      ;Push "n" (bx) onto the stack.
                push    ofs IterDone            ;Failure Address
                call    ax                      ;Call the iterator pointed at by the
                                                ; return value from MultiIter.
;
; write(ax:3);

                mov     cx, 3
                putisize
                ret

; endfor, writeln;

IterDone:       putcr                           ;Writeln;
                ret

; endfor, writeln;

MultiDone:      putcr
                inc     bx
                cmp     bx, 5
                jbe     WhlBXle5

; endfor

Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             word    1024 dup (0)
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

12.7.2 The 80x86 Enter and Leave Instructions

The following code (Ex12_2.asm on the companion CD-ROM) uses the 80x86 enter and leave instructions to maintain a display in a block structured program. Assemble this program with the “/Zi” option and load it into CodeView. Set breakpoints on the calls to the Lex1, Lex2, Lex3, and Lex4 procedures. Run the program and when you encounter a breakpoint, use the F8 key to single step into each procedure. Single step over the enter instruction (to the following nop). Note the values of the bp and sp registers before and after the execution of the enter instruction.

For your lab report: explain the values in the bp and sp registers after executing each enter instruction. Dump memory from ss:sp to about ss:sp+32 using a memory window or the dw command in the command window. Describe the contents of the stack after the execution of each enter instruction.

After executing through the enter instruction in the Lex4 procedure, set a breakpoint on each of the leave instructions. Run the program at full speed (using the F5 key) until you hit each of these leave instructions. Note the values of the bp and sp registers before and after the execution of each leave instruction. For your lab report: include these bp/sp values in your lab report and explain them.
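To predict what CodeView should show, it can help to model the stack effect of enter and leave in a few lines. The following is a minimal Python sketch, assuming 16-bit operands and the behavior documented for the 80286 enter instruction (push bp, copy the enclosing display entries, push the new frame pointer, then reserve the locals). The Stack16 class, its dictionary-based memory, the starting sp value, and the placeholder return addresses are all hypothetical; none of them come from Ex12_2.asm.

```python
class Stack16:
    """Tiny model of SP, BP, and stack memory for 16-bit ENTER/LEAVE."""

    def __init__(self, sp=0x1000):
        self.mem = {}            # address -> 16-bit word
        self.sp = sp
        self.bp = 0

    def push(self, value):
        self.sp -= 2
        self.mem[self.sp] = value & 0xFFFF

    def pop(self):
        value = self.mem[self.sp]
        self.sp += 2
        return value

    def enter(self, size, level):
        """Model of: enter size, level."""
        self.push(self.bp)                   # dynamic link
        frame_temp = self.sp
        if level > 0:
            for _ in range(1, level):        # copy the caller's display entries
                self.bp -= 2
                self.push(self.mem[self.bp])
            self.push(frame_temp)            # our own display entry
        self.bp = frame_temp
        self.sp -= size                      # room for local variables

    def leave(self):
        """Model of: leave (mov sp, bp / pop bp)."""
        self.sp = self.bp
        self.bp = self.pop()


def demo():
    cpu = Stack16()
    cpu.push(0x1234)             # placeholder return address into Main
    cpu.enter(2, 1)              # Lex1: enter 2, 1
    lex1_bp = cpu.bp
    cpu.push(0x5678)             # placeholder return address into Lex1
    cpu.enter(2, 2)              # Lex2: enter 2, 2
    lex2_bp = cpu.bp
    display = (cpu.mem[lex2_bp - 2], cpu.mem[lex2_bp - 4])
    return lex1_bp, lex2_bp, display, cpu.sp


if __name__ == "__main__":
    lex1_bp, lex2_bp, display, sp = demo()
    print(hex(lex1_bp), hex(lex2_bp), [hex(d) for d in display], hex(sp))
```

In this model the word at bp-2 in Lex2's frame holds Lex1's frame pointer and the word at bp-4 holds Lex2's own, which matches the activation record diagrams in the listing comments. The absolute addresses CodeView reports will differ, since they depend on the real stack segment.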
; EX12_2.asm
;
; Program to demonstrate the ENTER and LEAVE instructions in Chapter 12.
;
; This program simulates the following Pascal code:
;
; program EnterLeave;
; var i:integer;
;
;   procedure Lex1;
;   var j:integer;
;
;     procedure Lex2;
;     var k:integer;
;
;       procedure Lex3;
;       var m:integer;
;
;         procedure Lex4;
;         var n:integer;
;         begin
;
;           writeln('Lex4');
;           for i:= 0 to 3 do
;             for j:= 0 to 2 do
;               write('(',i,',',j,') ');
;           writeln;
;           for k:= 1 downto 0 do
;             for m:= 1 downto 0 do
;               for n := 0 to 1 do
;                 write('(',m,',',k,',',n,') ');
;           writeln;
;         end;
;
;       begin {Lex3}
;
;         writeln('Lex3');
;         for i := 0 to 1 do
;           for j := 0 to 1 do
;             for k := 0 to 1 do
;               for m := 0 to 1 do
;                 writeln(i,j,k,m);
;
;         Lex4;
;
;       end; {Lex3}
;
;     begin {Lex2}
;
;       writeln('Lex2');
;       for i := 1 downto 0 do
;         for j := 0 to 1 do
;           for k := 1 downto 0 do
;             write(i,j,k,' ');
;       writeln;
;
;       Lex3;
;
;     end; {Lex2}
;
;   begin {Lex1}
;
;     writeln('Lex1');
;     Lex2;
;
;   end; {Lex1}
;
; begin {Main (lex0)}
;
;   writeln('Main Program');
;   Lex1;
;
; end.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

                .286                            ;Allow ENTER & LEAVE.
; Common equates all the procedures use: wp disp1 disp2 disp3 ; ; ; ;
textequ textequ textequ textequ
<word <word <word <word
ptr> ptr [bp-2]> ptr [bp-4]> ptr [bp-6]>
Note: the data segment and the stack segment are one and the same in this program. This is done to allow the use of the [bx] addressing mode when referencing local and intermediate variables without having to use a stack segment prefix.
sseg
segment
para stack 'stack'
i stk
word word
? 2046 dup (0)
sseg
ends
cseg
segment assume
;Main program variable.
para public 'code' cs:cseg, ds:sseg, ss:sseg
; Main's activation record looks like this: ;
Chapter 12 ; ; Main
Quit: Main
| return address |<- SP, BP |----------------| proc mov mov mov print byte call ExitPgm endp
ax, ss ds, ax es, ax
;Make SS=DS to simplify addressing ; (there will be no need to stick "SS:" ; in front of addressing modes like ; "[bx]").
"Main Program",cr,lf,0 Lex1 ;DOS macro to quit program.
; Lex1's activation record looks like this: ; ; | return address | ; |----------------| ; | Dynamic Link | <- BP ; |----------------| ; | Lex1's AR Ptr | | Display ; |----------------| ; | J Local var | <- SP (BP-4) ; |----------------| Lex1_J
textequ
<word ptr [bx-4]>
Lex1
proc enter nop
near 2, 1
Lex1
print byte call leave ret endp
;A 2 byte local variable at lex level 1. ;Spacer instruction for single stepping
"Lex1",cr,lf,0 Lex2
; Lex2's activation record looks like this: ; ; | return address | ; |----------------| ; | Dynamic Link | <- BP ; |----------------| ; | Lex1's AR Ptr | | ; |----------------| | Display ; | Lex2's AR Ptr | | ; |----------------| ; | K Local var | <- SP (BP-6) ; |----------------| ; ; writeln('Lex2'); ; for i := 1 downto 0 do ; for j := 0 to 1 do ; for k := 1 downto 0 do ; write(i,j,k,' '); ; writeln; ; ; Lex3; Lex2_k k
textequ textequ
<word ptr [bx-6]> <word ptr [bp-6]>
Lex2
proc enter
near 2, 2
nop
;A 2-byte local variable at lex level 2. ;Spacer instruction for single stepping
print byte
"Lex2",cr,lf,0
mov
i, 1
Procedures: Advanced Topics ForLpI: ForLpJ: ForLpK:
Lex2
mov mov mov
bx, disp1 Lex1_J, 0 k, 1
mov puti mov mov puti mov puti mov putc
ax, i
;"J" is at lex level one. ;"K" is local.
bx, disp1 ax, Lex1_J ax, k al, ' '
dec jns
k ForLpK
mov inc cmp jb
bx, disp1 Lex1_J Lex1_J, 2 ForLpJ
dec jns
i ForLpI
putcr call
Lex3
;Decrement from 1->0 and quit ; if we hit -1.
leave ret endp
; Lex3's activation record looks like this: ; ; | return address | ; |----------------| ; | Dynamic Link | <- BP ; |----------------| ; | Lex1's AR Ptr | | ; |----------------| | ; | Lex2's AR Ptr | | Display ; |----------------| | ; | Lex3's AR Ptr | | ; |----------------| ; | M Local var | <- SP (BP-8) ; |----------------| ; ; writeln('Lex3'); ; for i := 0 to 1 do ; for j := 0 to 1 do ; for k := 0 to 1 do ; for m := 0 to 1 do ; writeln(i,j,k,m); ; ; Lex4; Lex3_M m
textequ textequ
<word ptr [bx-8]> <word ptr [bp-8]>
Lex3
proc enter
near 2, 3
nop
ForILp: ForJlp: ForKLp: ForMLp:
;2-byte variable at lex level 3. ;Spacer instruction for single stepping
print byte
"Lex3",cr,lf,0
mov mov mov mov mov mov mov
i, 0 bx, disp1 Lex1_J, 0 bx, disp2 Lex2_K, 0 m, 0 ax, i
Chapter 12 puti mov mov puti mov mov puti mov puti putcr
Lex3
bx, disp1 ax, Lex1_J bx, disp2 ax, Lex2_k ax, m
inc cmp jb
m m, 2 ForMLp
mov inc cmp jb
bx, disp2 Lex2_K Lex2_K, 2 ForKLp
mov inc cmp jb
bx, disp1 Lex1_J Lex1_J, 2 ForJLp
inc cmp jb
i i, 2 ForILp
call
Lex4
leave ret endp
; Lex4's activation record looks like this: ; ; | return address | ; |----------------| ; | Dynamic Link | <- BP ; |----------------| ; | Lex1's AR Ptr | | ; |----------------| | ; | Lex2's AR Ptr | | ; |----------------| | Display ; | Lex3's AR Ptr | | ; |----------------| | ; | Lex4's AR Ptr | | ; |----------------| ; | N Local var | <- SP (BP-10) ; |----------------| ; ; ; writeln('Lex4'); ; for i:= 0 to 3 do ; for j:= 0 to 2 do ; write('(',i,',',j,') '); ; writeln; ; for k:= 1 downto 0 do ; for m:= 1 downto 0 do ; for n := 0 to 1 do ; write('(',m,',',k,',',n,') '); ; writeln; n
textequ
<word ptr [bp-10]>
Lex4
proc enter
near 2, 4
nop print byte
;2-byte local variable at lex level 4. ;Spacer instruction for single stepping
"Lex4",cr,lf,0
Procedures: Advanced Topics ForILp: ForJLp:
mov mov mov mov putc mov puti mov putc mov puti print byte
i, 0 bx, disp1 Lex1_J, 0 al, '('
inc cmp jb
Lex1_J Lex1_J, 3 ForJLp
inc cmp jb
i i, 4 ForILp
ax, i al, ',' ax, Lex1_J
;Note that BX still contains disp1.
") ",0 ;BX still contains disp1.
putcr
ForKLp: ForMLp: ForNLp:
Lex4 cseg zzzzzzseg LastBytes zzzzzzseg
mov mov mov mov mov mov putc
bx, disp2 Lex2_K, 1 bx, disp3 Lex3_M, 1 n, 0 al, '('
mov mov puti mov putc mov mov puti mov putc mov puti print byte
bx, disp3 ax, Lex3_M
inc cmp jb
n n, 2 ForNLp
mov dec jns
bx, disp3 Lex3_M ForMLp
mov dec jns
bx, disp2 Lex2_K ForKLp
al, ',' bx, disp2 ax, Lex2_K al, ',' ax, n
") ",0
leave ret endp ends segment db ends end
para public 'zzzzzz' 16 dup (?) Main
12.7.3 Parameter Passing Exercises

The following exercise demonstrates some simple parameter passing. This program passes arrays by reference, word variables by value and by reference, and some functions and procedures by reference. The program itself sorts two arrays using a generic sorting algorithm. The sorting algorithm is generic because the main program passes it a comparison function and a procedure to swap two elements if one is greater than the other.

; Ex12_3.asm
;
; This program demonstrates different parameter passing methods.
; It corresponds to the following (pseudo) Pascal code:
;
;
; program main;
; var   i:integer;
;       a:array[0..255] of integer;
;       b:array[0..255] of unsigned;
;
; function LTint(int1, int2:integer):boolean;
; begin
;       LTint := int1 < int2;
; end;
;
; procedure SwapInt(var int1, int2:integer);
; var temp:integer;
; begin
;       temp := int1;
;       int1 := int2;
;       int2 := temp;
; end;
;
; function LTunsigned(uns1, uns2:unsigned):boolean;
; begin
;       LTunsigned := uns1 < uns2;
; end;
;
; procedure SwapUnsigned(uns1, uns2:unsigned);
; var temp:unsigned;
; begin
;       temp := uns1;
;       uns1 := uns2;
;       uns2 := temp;
; end;
;
; (* The following is a simple Bubble sort that will sort arrays containing *)
; (* arbitrary data types.                                                  *)
;
; procedure sort(data:array; elements:integer; function LT:boolean; procedure swap);
; var i,j:integer;
; begin
;
;       for i := 0 to elements-1 do
;           for j := i+1 to elements do
;               if (LT(data[j], data[i])) then swap(data[i], data[j]);
; end;
;
;
; begin
;
;       for i := 0 to 255 do A[i] := 128-i;
;       for i := 0 to 255 do B[i] := 33000-i;
;       sort(A, 256, LTint, SwapInt);
;       sort(B, 256, LTunsigned, SwapUnsigned);
;
;       for i := 0 to 255 do
;       begin
;           if (i mod 8) = 0 then writeln;
;           write(A[i]:5);
;       end;
;
;       for i := 0 to 255 do
;       begin
;           if (i mod 8) = 0 then writeln;
;           write(B[i]:5);
;       end;
;
; end;
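The point of the pseudo-Pascal outline above is that sort never compares or exchanges elements itself; it only calls whatever routines it was handed. The following is a minimal Python sketch of that design, assuming 16-bit words stored as non-negative Python integers. The helpers to_signed, swap_words, and demo are hypothetical names, the swap routine takes a list and two indexes as a stand-in for the two far pointers the assembly code passes, and the inner loop stops at the last valid index rather than at elements.

```python
def to_signed(word):
    """Interpret a 16-bit word as a two's complement integer."""
    return word - 0x10000 if word & 0x8000 else word


def lt_int(a, b):
    """Signed comparison (the role of LTint, which uses setl)."""
    return to_signed(a) < to_signed(b)


def lt_unsigned(a, b):
    """Unsigned comparison (the role of LTunsigned, which uses setb)."""
    return a < b


def swap_words(data, i, j):
    """Exchange two elements; the same code serves signed and unsigned data."""
    data[i], data[j] = data[j], data[i]


def sort(data, elements, lt, swap):
    """Generic exchange sort driven entirely by the lt and swap parameters."""
    for i in range(elements - 1):
        for j in range(i + 1, elements):
            if lt(data[j], data[i]):
                swap(data, i, j)


def demo():
    a = [(128 - i) & 0xFFFF for i in range(256)]      # signed data, some negative
    b = [(33000 - i) & 0xFFFF for i in range(256)]    # unsigned data above 32767
    sort(a, 256, lt_int, swap_words)
    sort(b, 256, lt_unsigned, swap_words)
    return [to_signed(w) for w in a], b


if __name__ == "__main__":
    a, b = demo()
    print(a[:4], a[-4:])
    print(b[:4], b[-4:])
```

Passing lt_int when sorting b would order the values above 32767 as negative numbers, which is why the program needs two comparison routines but can share one swap routine.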
.xlist include stdlib.a includelib stdlib.lib .list .386 option
segment:use16
wp
textequ
<word ptr>
dseg A B dseg
segment word word ends
para public 'data' 256 dup (?) 256 dup (?)
cseg
segment assume
para public 'code' cs:cseg, ds:dseg, ss:sseg
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
function LTint(int1, int2:integer):boolean; begin LTint := int1 < int2; end; LTint's activation record looks like this: |----------------| | int1 | |----------------| | int2 | |----------------| | return address | |----------------| | old BP |<- SP, BP |----------------|
int1 int2
textequ textequ
<word ptr [bp+6]> <word ptr [bp+4]>
LTint
proc push mov
near bp bp, sp
mov cmp setl mov
ax, int1 ax, int2 al ah, 0
pop ret endp
bp 4
LTint
;Compare the two parameters ; and return true if int1 < int2.
; Swap's activation record looks like this: ; ; |----------------| ; | Address | ; |--of ---| ; | int1 | ; |----------------| ; | Address | ; |--of ---| ; | int2 |
Chapter 12 ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
|----------------| | return address | |----------------| | old BP |<- SP, BP |----------------| The temporary variable is kept in a register. Note that swapping integers or unsigned integers can be done with the same code since the operations are identical for either type. procedure SwapInt(var int1, int2:integer); var temp:integer; begin temp := int1; int1 := int2; int2 := temp; end; procedure SwapUnsigned(uns1, uns2:unsigned); var temp:unsigned; begin temp := uns1; uns1 := uns2; uns2 := temp; end;
int1 int2
textequ textequ
SwapInt
proc push mov push push
near bp bp, sp es bx
les mov les xchg
bx, ax, bx, ax,
les mov
bx, int1 es:[bx], ax
pop pop pop ret endp
bx es bp 8
SwapInt ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
int1 es:[bx] int2 es:[bx]
;Get address of int1 variable. ;Get int1's value. ;Get address of int2 variable. ;Swap int1's value with int2's ;Get the address of int1 and ; store int2's value there.
LTunsigned's activation record looks like this: |----------------| | uns1 | |----------------| | uns2 | |----------------| | return address | |----------------| | old BP |<- SP, BP |----------------| function LTunsigned(uns1, uns2:unsigned):boolean; begin LTunsigned := uns1 < uns2; end;
uns1 uns2
textequ textequ
<word ptr [bp+6]> <word ptr [bp+4]>
LTunsigned
proc
near
LTunsigned
push mov
bp bp, sp
mov cmp setb mov
ax, uns1 ax, uns2 al ah, 0
pop ret endp
bp 4
;Compare uns1 with uns2 and ; return true if uns1 < uns2.
; Sort's activation record looks like this: ; ; |----------------| ; | Data's | ; |-----| ; | Address | ; |----------------| ; | Elements | ; |----------------| ; | LT's | ; |-----| ; | Address | ; |----------------| ; | Swap's | ; |-----| ; | Address | ; |----------------| ; | return address | ; |----------------| ; | old BP |<- SP, BP ; |----------------| ; ; procedure sort(data:array; elements:integer; function LT:boolean; procedure swap); ; var i,j:integer; ; begin ; ; for i := 0 to elements-1 do ; for j := i+1 to elements do ; if (LT(data[j], data[i])) then swap(data[i], data[j]); ; end; data elements funcLT procSwap
textequ textequ textequ textequ
<word ptr [bp+8]> <word ptr [bp+6]> <word ptr [bp+4]>
i j
textequ textequ
<word ptr [bp-2]> <word ptr [bp-4]>
sort
proc push mov sub push push
near bp bp, sp sp, 4 es bx
mov mov inc cmp jae
i, 0 ax, i i ax, Elements IDone
mov mov cmp ja
j, ax ax, j ax, Elements JDone
les mov add
bx, data si, j si, si
ForILp:
ForJLp:
;Push the value of ; data[j] onto the ; stack.
Chapter 12 push
es:[bx+si]
les mov add push
bx, data si, i si, si es:[bx+si]
;Push the value of ; data[i] onto the ; stack.
call cmp je
FuncLT ax, 0 NextJ
;See if data[i] < data[j] ;Test boolean result.
push mov add add push
wp data+2 ax, i ax, ax ax, wp data ax
;Pass data[i] by reference.
push mov add add push
wp data+2 ax, j ax, ax ax, wp data ax
;Pass data[j] by reference.
call
ProcSwap
NextJ:
inc jmp
j ForJLp
JDone:
inc jmp
i ForILp
IDone:
pop pop mov pop ret endp
bx es sp, bp bp 10
sort
; Main's activation record looks like this:
;
;       | return address |<- SP, BP
;       |----------------|
;
; begin
;
;       for i := 0 to 255 do A[i] := 128-i;
;       for i := 0 to 255 do B[i] := 33000-i;
;       sort(A, 256, LTint, SwapInt);
;       sort(B, 256, LTunsigned, SwapUnsigned);
;
;       for i := 0 to 255 do
;       begin
;           if (i mod 8) = 0 then writeln;
;           write(A[i]:5);
;       end;
;
;       for i := 0 to 255 do
;       begin
;           if (i mod 8) = 0 then writeln;
;           write(B[i]:5);
;       end;
;
; end;

Main            proc
                mov     ax, dseg                ;Initialize the segment registers.
                mov     ds, ax
                mov     es, ax

; Note that the following code merges the two initialization for loops
; into a single loop.

                mov     ax, 128
                mov     bx, 0
                mov     cx, 33000
ForILp:         mov     A[bx], ax
                mov     B[bx], cx
                add     bx, 2
                dec     ax
                dec     cx
                cmp     bx, 256*2
                jb      ForILp

                push    ds                      ;Seg address of A
                push    offset A                ;Offset of A
                push    256                     ;# of elements in A
                push    offset LTint            ;Address of compare routine
                push    offset SwapInt          ;Address of swap routine
                call    Sort

                push    ds                      ;Seg address of B
                push    offset B                ;Offset of B
                push    256                     ;# of elements in B
                push    offset LTunsigned       ;Address of compare routine
                push    offset SwapInt          ;Address of swap routine
                call    Sort

; Print the values in A.

                mov     bx, 0
ForILp2:        test    bx, 0Fh                 ;See if (I mod 8) = 0
                jnz     NotMod                  ; note: BX mod 16 = I mod 8.
                putcr
NotMod:         mov     ax, A[bx]
                mov     cx, 5
                putisize
                add     bx, 2
                cmp     bx, 256*2
                jb      ForILp2

; Print the values in B.

                mov     bx, 0
ForILp3:        test    bx, 0Fh                 ;See if (I mod 8) = 0
                jnz     NotMod2                 ; note: BX mod 16 = I mod 8.
                putcr
NotMod2:        mov     ax, B[bx]
                mov     cx, 5
                putusize
                add     bx, 2
                cmp     bx, 256*2
                jb      ForILp3

Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             word    256 dup (0)
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

12.8 Programming Projects

1) Write an iterator to which you pass an array of characters by reference. The iterator should return an index into the array that points at a whitespace character (any ASCII code less than or equal to a space) it finds. On each call, the iterator should return the index of the next whitespace character. The iterator should fail if it encounters a byte containing the value zero. Use local variables for any values the iterator needs.
2)
Write a recursive routine that does the following:

    function recursive(i:integer):integer;
    var j,k:integer;
    begin
        j := i;
        k := i*i;
        if (i >= 0) then writeln('AR Address =', Recursive(i-1));
        writeln(i,' ',j,' ',k);
        recursive := Value in BP Register;
    end;

From your main program, call this procedure and pass it the value 10 on the stack. Verify that you get correct results back. Explain the results.
3)
Write a program that contains a procedure to which you pass four parameters on the stack. These should be passed by value, reference, value-result, and result, respectively (for the value-result parameter, pass the address of the object on the stack). Inside that procedure, you should call three other procedures that also take four parameters each. However, the first procedure should use pass by value for all four parameters; the second procedure should use pass by reference for all four parameters; and the third should use pass by value-result for all four parameters. Pass the four parameters in the enclosing procedure as parameters to each of these three child procedures. Inside the child procedures, print the parameters' values and change their results. Immediately upon return from each of these child procedures, print the parameters' values. Write a main program that passes four local (to the main program) variables you've initialized with different values to the first procedure above. Run the program and verify that it is operating correctly and that it is passing the parameters to each of these procedures in a reasonable fashion.
4)
Write a program that implements the following Pascal program in assembly language. Assume that all program variables (including globals in the main program) are allocated in activation records on the stack.

    program nest3;
    var i:integer;

        procedure A(k:integer);

            procedure B(procedure c);
            var m:integer;
            begin
                for m:= 0 to 4 do c(m);
            end; {B}

            procedure D(n:integer);
            begin
                for i:= 0 to n-1 do writeln(i);
            end; {D}

            procedure E;
            begin
                writeln('A stuff:');
                B(A);
                writeln('D stuff:');
                B(D);
            end; {E}

        begin {A}
            B(D);
            writeln;
            if k < 2 then E;
        end; {A}

    begin {nest3}
        A(0);
    end; {nest3}
5)
The program in Section 12.7.2 (Ex12_2.asm on the companion CD-ROM) uses the 80286 enter and leave instructions to maintain the display in each activation record. As pointed out in Section 12.1.6, these instructions are quite slow, especially on 80486 and later processors. Rewrite this code by replacing the enter and leave instructions with the straight-line code that does the same job. In CodeView, single step through the program as per the second laboratory exercise (Section 12.7.2) to verify that your stack frames are identical to those the enter and leave instructions produce.
6)
The generic Bubble Sort program in Section 12.7.3 only works with data objects that are two bytes wide. This is because the Sort procedure passes the values of Data[I] and Data[J] on the stack to the comparison routines (LTint and LTunsigned) and because the sort routine multiplies the i and j indexes by two when indexing into the data array. This is a severe shortcoming to this generic sort routine. Rewrite the program to make it truly generic. Do this by writing a “CompareAndSwap” routine that will replace the LT and Swap calls. To CompareAndSwap you should pass the array (by reference) and the two array indexes (i and j) to compare and possibly swap. Write two versions of the CompareAndSwap routine, one for unsigned integers and one for signed integers. Run this program and verify that your implementation works properly.
12.9
Summary
Block structured languages, like Pascal, provide access to non-local variables at different lex levels. Accessing non-local variables is a complex task requiring special data structures such as a static link chain or a display. The display is probably the most efficient way to access non-local variables. The 80286 and later processors provide special instructions, enter and leave, for maintaining a display list, but these instructions are too slow for most common uses. For additional details, see

• “Lexical Nesting, Static Links, and Displays” on page 639
• “Scope” on page 640
• “Static Links” on page 642
• “Accessing Non-Local Variables Using Static Links” on page 647
• “The Display” on page 648
• “The 80286 ENTER and LEAVE Instructions” on page 650
• “Passing Variables at Different Lex Levels as Parameters.” on page 652
• “Passing Parameters as Parameters to Another Procedure” on page 655
• “Passing Procedures as Parameters” on page 659
Iterators are a cross between a function and a looping construct. They are a very powerful programming construct available in many very high level languages. Efficient implementation of iterators involves careful manipulation of the stack at run time. To see how to implement iterators, read the following sections:

• “Iterators” on page 663
• “Implementing Iterators Using In-Line Expansion” on page 664
• “Implementing Iterators with Resume Frames” on page 666
• “An Example of an Iterator” on page 669
• “Another Iterator Example” on page 673
12.10
Questions
1)
What is an iterator?
2)
What is a resume frame?
3)
How do the iterators in this chapter implement the success and failure results?
4)
What does the stack look like when executing the body of a loop controlled by an iterator?
5)
What is a static link?
6)
What is a display?
7)
Describe how to access a non-local variable when using static links.
8)
Describe how to access a non-local variable when using a display.
9)
How would you access a non-local variable when using the display built by the 80286 ENTER instruction?
10)
Draw a picture of the activation record for a procedure at lex level 4 that uses the ENTER instruction to build the display.
11)
Explain why static links work better than a display when passing procedures and functions as parameters.
12)
Suppose you want to pass an intermediate variable by value-result using the technique where you push the value before calling the procedure and then pop the value (storing it back into the intermediate variable) upon return from the procedure. Provide two examples, one using static links and one using a display, that implement pass by value-result in this fashion.
13)
Convert the following (pseudo) Pascal code into 80x86 assembly language. Assume Pascal supports pass by name and pass by lazy evaluation parameters as suggested by the following code.

    program main;
    var k:integer;

        procedure one(LazyEval i:integer);
        begin
            writeln(i);
        end;

        procedure two(name j:integer);
        begin
            one(j);
        end;

    begin {main}
        k := 2;
        two(k);
    end;
MS-DOS, PC-BIOS, and File I/O
Chapter 13
A typical PC system consists of many components besides the 80x86 CPU and memory. MS-DOS and the PC’s BIOS provide a software connection between your application program and the underlying hardware. Although it is sometimes necessary to program the hardware directly yourself, more often than not it’s best to let the system software (MS-DOS and the BIOS) handle this for you. Furthermore, it’s much easier for you to simply call a routine built into your system than to write the routine yourself.

You can access the IBM PC system hardware at one of three general levels from assembly language. You can program the hardware directly, you can use ROM BIOS routines to access the hardware for you, or you can make MS-DOS calls to access the hardware. Each level of system access has its own set of advantages and disadvantages.

Programming the hardware directly offers two advantages over the other schemes: control and efficiency. If you’re controlling the hardware modes, you can get that last drop of performance out of the system by taking advantage of special hardware tricks or other details which a general purpose routine cannot. For some programs, like screen editors (which must have high speed access to the video display), accessing the hardware directly is the only way to achieve reasonable performance levels.

On the other hand, programming the hardware directly has its drawbacks as well. The screen editor which directly accesses video memory may not work if a new type of video display card appears for the IBM PC. Multiple display drivers may be necessary for such a program, increasing the amount of work to create and maintain the program. Furthermore, had you written several programs which access the screen memory directly and IBM produced a new, incompatible display adapter, you’d have to rewrite all your programs to work with the new display card.
Your work load would be reduced tremendously if IBM supplied, in a fixed, known location, some routines which did all the screen I/O operations for you. Your programs would all call these routines. When a manufacturer introduces a new display adapter, it supplies a new set of video display routines with the adapter card. These new routines would patch into the old ones (replacing or augmenting them) so that calls to the old routines would now call the new routines. If the program interface is the same between the two sets of routines, your programs will still work with the new routines.

IBM has implemented such a mechanism in the PC system firmware. Up at the high end of the one megabyte memory space in the PC are some addresses dedicated to ROM data storage. These ROM memory chips contain special software called the PC Basic Input Output System, or BIOS. The BIOS routines provide a hardware-independent interface to various devices in the IBM PC system. For example, one of the BIOS services is a video display driver. By making various calls to the BIOS video routines, your software will be able to write characters to the screen regardless of the actual display board installed.

One level up is MS-DOS. While the BIOS allows you to manipulate devices in a very low level fashion, MS-DOS provides a high-level interface to many devices. For example, one of the BIOS routines allows you to access the floppy disk drive. With this BIOS routine you may read or write blocks on the diskette. Unfortunately, the BIOS doesn’t know about things like files and directories. It only knows about blocks. If you want to access a file on the disk drive using a BIOS call, you’ll have to know exactly where that file appears on the diskette surface. On the other hand, calls to MS-DOS allow you to deal with filenames rather than file disk addresses. MS-DOS keeps track of where files are on the disk surface and makes calls to the ROM BIOS to read the appropriate blocks for you.
This high-level interface greatly reduces the amount of effort your software need expend in order to access data on the disk drive.

The purpose of this chapter is to provide a brief introduction to the various BIOS and DOS services available to you. This chapter does not attempt to describe all of the routines or the options available to each routine. There are several other texts the size of this one which attempt to discuss just the BIOS or just MS-DOS. Furthermore, any attempt to provide complete coverage of MS-DOS or the BIOS in a single text is doomed to failure from the start: both are moving targets, with specifications changing with each new version. So rather than try to explain everything, this chapter will simply attempt to present the flavor. Check the bibliography for texts dealing directly with the BIOS or MS-DOS.
13.0
Chapter Overview
This chapter presents material that is specific to the PC. This information on the PC’s BIOS and MS-DOS is not necessary if you want to learn about assembly language programming; however, this is important information for anyone wanting to write assembly language programs that run under MS-DOS on a PC compatible machine. As a result, most of the information in this chapter is optional for those wanting to learn generic 80x86 assembly language programming. On the other hand, this information is handy for those who want to write applications in assembly language on a PC. The sections below that have a “•” prefix are essential. Those sections with a “❏” discuss advanced topics that you may want to put off for a while.

• The IBM PC BIOS
❏ Print screen.
• Video services.
❏ Equipment installed.
❏ Memory available.
❏ Low level disk services
• Serial I/O.
❏ Miscellaneous services.
• Keyboard services.
• Printer services.
❏ Run BASIC.
❏ Reboot computer.
❏ Real time clock.
• MS-DOS calling sequence.
• MS-DOS character functions
❏ MS-DOS drive commands.
❏ MS-DOS date and time functions.
❏ MS-DOS memory management functions.
❏ MS-DOS process control functions.
• MS-DOS “new” filing calls.
• Open file.
• Create file.
• Close file.
• Read from a file.
• Write to a file.
❏ Seek.
❏ Set disk transfer address.
❏ Find first file.
❏ Find next file.
• Delete file.
• Rename file.
❏ Change/get file attributes.
❏ Get/set file date and time.
❏ Other DOS calls
• File I/O examples.
• Blocked file I/O.
❏ The program segment prefix.
❏ Accessing command line parameters.
❏ ARGC and ARGV.
• UCR Standard Library file I/O routines.
• FOPEN.
• FCREATE.
• FCLOSE.
• FFLUSH.
• FGETC.
• FREAD.
• FPUTC
• FWRITE.
❏ Redirection I/O through the STDLIB file I/O routines.

13.1
The IBM PC BIOS
Rather than place the BIOS routines at fixed memory locations in ROM, IBM used a much more flexible approach in the BIOS design. To call a BIOS routine, you use one of the 80x86’s int software interrupt instructions. The int instruction uses the following syntax:

                int     value

Value is some number in the range 0..255. Execution of the int instruction will cause the 80x86 to transfer control to one of 256 different interrupt handlers. The interrupt vector table, starting at physical memory location 0:0, holds the addresses of these interrupt handlers. Each address is a full segmented address, requiring four bytes, so there are 400h bytes in the interrupt vector table: one segmented address for each of the 256 possible software interrupts. For example, int 0 transfers control to the routine whose address is at location 0:0, int 1 transfers control to the routine whose address is at 0:4, int 2 via 0:8, int 3 via 0:C, and int 4 via 0:10.

When the PC resets, one of the first operations it does is initialize several of these interrupt vectors so they point at BIOS service routines. Later, when you execute an appropriate int instruction, control transfers to the appropriate BIOS code. If all you’re doing is calling BIOS routines (as opposed to writing them), you can view the int instruction as nothing more than a special call instruction.
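As a rough illustration of the vector table layout described above, the following sketch fetches the address of the int 10h handler directly from memory. It assumes real mode and MASM-style syntax; the choice of int 10h and of the es, bx, ax, and dx registers is arbitrary and made only for this example.

```asm
; Sketch: read the INT 10h vector from the interrupt vector table.
; Assumes real mode; register choices are for illustration only.

                mov     ax, 0
                mov     es, ax                  ;ES = segment 0 (table base).
                mov     bx, 10h*4               ;Each vector is four bytes, so
                                                ; int 10h's entry is at 0:40h.
                mov     ax, es:[bx]             ;Low word: handler's offset.
                mov     dx, es:[bx+2]           ;High word: handler's segment.

; DX:AX now holds the segmented address that "int 10h" transfers to.
```

Normally you would not read the table this way just to call a service; executing the int instruction does the lookup for you.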
13.2
An Introduction to the BIOS’ Services
The IBM PC BIOS uses software interrupts 5 and 10h..1Ah to accomplish various operations. Therefore, the int 5 and int 10h..int 1Ah instructions provide the interface to the BIOS. The following table summarizes the BIOS services:

INT     Function
5       Print Screen operation.
10h     Video display services.
11h     Equipment determination.
12h     Memory size determination.
13h     Diskette and hard disk services.
14h     Serial I/O services.
15h     Miscellaneous services.
16h     Keyboard services.
17h     Printer services.
18h     BASIC.
19h     Reboot.
1Ah     Real time clock services.

Most of these routines require various parameters in the 80x86’s registers. Some require additional parameters in certain memory locations. The following sections describe the exact operation of many of the BIOS routines.
13.2.1
INT 5 - Print Screen
Instruction: int 5h
BIOS Operation: Print the current text screen.
Parameters: None

If you execute the int 5h instruction, the PC will send a copy of the screen image to the printer exactly as though you’d pressed the PrtSc key on the keyboard. In fact, the BIOS issues an int 5 instruction when you press the PrtSc key, so the two operations are absolutely identical (other than one is under software control rather than manual control). Note that the 80286 and later processors also use int 5 for the trap raised by the bound instruction.
13.2.2
INT 10h - Video Services
Instruction: int 10h
BIOS Operation: Video I/O Services
Parameters: Several, passed in ax, bx, cx, dx, and es:bp registers.
The int 10h instruction does several video display related functions. You can use it to initialize the video display, set the cursor size and position, read the cursor position, manipulate a light pen, read or write the current display page, scroll the data in the screen up or down, read and write characters, read and write pixels in a graphics display mode, and write strings to the display. You select the particular function to execute by passing a value in the ah register.

The video services represent one of the largest sets of BIOS calls available. There are many different video display cards manufactured for PCs, each with minor variations and often each having its own set of unique BIOS functions. The BIOS reference in the appendices lists some of the more common functions available, but as pointed out earlier, this list is quite incomplete and out of date given the rapid change in technology.

Probably the most commonly used video service call is the character output routine:
Name: Write char to screen in TTY mode
Parameters: ah = 0Eh, al = ASCII code, bh = page number (in graphics modes, bl = foreground color)
This routine writes a single character to the display. MS-DOS calls this routine to display characters on the screen. The UCR Standard Library also provides a call which lets you write characters directly to the display using BIOS calls. Most BIOS video display routines are poorly written. There is not much else that can be said about them. They are extremely slow and don’t provide much in the way of functionality. For this reason, most programmers (who need a high-performance video display driver) end up writing their own display code. This provides speed at the expense of portability. Unfortunately, there is rarely any other choice. If you need functionality rather than speed, you should consider using the ANSI.SYS screen driver provided with MS-DOS. This display driver provides all kinds of useful services such as clear to end of line, clear to end of screen, etc. For more information, consult your DOS manual.
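To make the TTY output call concrete, here is a minimal sketch of a routine that prints a zero terminated string one character at a time using int 10h with ah = 0Eh. The name PrintZ and its DS:SI calling convention are hypothetical (they are not part of the BIOS or the UCR Standard Library), and the code assumes a text mode with display page zero active.

```asm
; PrintZ-      Hypothetical helper, shown only as a sketch.
;              On entry, DS:SI points at a zero terminated string.
;              Assumes text mode and display page zero.

PrintZ          proc    near
                push    ax
                push    bx
                push    si
                mov     bh, 0                   ;Display page zero (assumed).
PZLoop:         mov     al, [si]                ;Fetch the next character.
                inc     si
                cmp     al, 0                   ;A zero byte ends the string.
                je      PZDone
                mov     ah, 0Eh                 ;Write char in TTY mode.
                int     10h
                jmp     PZLoop

PZDone:         pop     si
                pop     bx
                pop     ax
                ret
PrintZ          endp
```

Reloading ah inside the loop is deliberate: it avoids depending on the BIOS preserving ax across the call.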
Table 49: BIOS Video Functions (Partial List)

AH = 0
    Input:       al = mode
    Description: Sets the video display mode.

AH = 1
    Input:       ch- starting line
                 cl- ending line
    Description: Sets the shape of the cursor. Line values are in the range 0..15. You can make the cursor disappear by loading ch with 20h.

AH = 2
    Input:       bh- page
                 dh- y coordinate
                 dl- x coordinate
    Description: Position cursor to location (x,y) on the screen. Generally you would specify page zero. BIOS maintains a separate cursor for each page.

AH = 3
    Input:       bh- page
    Output:      ch- starting line
                 cl- ending line
                 dl- x coordinate
                 dh- y coordinate
    Description: Get cursor position and shape.

AH = 4
    Description: Obsolete (Get Light Pen Position).

AH = 5
    Input:       al- display page
    Description: Set display page. Switches the text display page to the specified page number. Page zero is the standard text page. Most color adapters support up to eight text pages (0..7).

AH = 6
    Input:       al- Number of lines to scroll.
                 bh- Screen attribute for cleared area.
                 cl- x coordinate UL
                 ch- y coordinate UL
                 dl- x coordinate LR
                 dh- y coordinate LR
    Description: Clear or scroll up. If al contains zero, this function clears the rectangular portion of the screen specified by cl/ch (the upper left hand corner) and dl/dh (the lower right hand corner). If al contains any other value, this service will scroll that rectangular window up the number of lines specified in al.

AH = 7
    Input:       al- Number of lines to scroll.
                 bh- Screen attribute for cleared area.
                 cl- x coordinate UL
                 ch- y coordinate UL
                 dl- x coordinate LR
                 dh- y coordinate LR
    Description: Clear or scroll down. If al contains zero, this function clears the rectangular portion of the screen specified by cl/ch (the upper left hand corner) and dl/dh (the lower right hand corner). If al contains any other value, this service will scroll that rectangular window down the number of lines specified in al.

AH = 8
    Input:       bh- display page
    Output:      al- char read
                 ah- char attribute
    Description: Read character’s ASCII code and attribute byte from current screen position.

AH = 9
    Input:       al- character
                 bh- page
                 bl- attribute
                 cx- # of times to replicate character
    Description: This call writes cx copies of the character and attribute in al/bl starting at the current cursor position on the screen. It does not change the cursor’s position.

AH = 0Ah
    Input:       al- character
                 bh- page
    Description: Writes character in al to the current screen position using the existing attribute. Does not change cursor position.

AH = 0Bh
    Input:       bh- 0
                 bl- color
    Description: Sets the border color for the text display.

AH = 0Eh
    Input:       al- character
                 bh- page
    Description: Write a character to the screen. Uses existing attribute and repositions cursor after write.

AH = 0Fh
    Output:      ah- # columns
                 al- display mode
                 bh- page
    Description: Get video mode.
Note that there are many other BIOS 10h subfunctions. Mostly, these other functions deal with graphics modes (the BIOS is too slow for manipulating graphics, so you shouldn’t use those calls) and extended features for certain video display cards. For more information on these calls, pick up a text on the PC’s BIOS.
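As an illustration of how the table entries combine, the sketch below clears the screen with subfunction 6 and then homes the cursor with subfunction 2. It assumes an 80x25 text mode, display page zero, and an attribute of 07h (white on black); none of these values come from the text above, so adjust them for the mode actually in use.

```asm
; Sketch: clear an (assumed) 80x25 text screen and home the cursor.

                mov     ah, 6                   ;Clear or scroll up.
                mov     al, 0                   ;Zero = clear the window.
                mov     bh, 07h                 ;Attribute for cleared area
                                                ; (assumed: white on black).
                mov     ch, 0                   ;Upper left y coordinate.
                mov     cl, 0                   ;Upper left x coordinate.
                mov     dh, 24                  ;Lower right y (assumes 25 rows).
                mov     dl, 79                  ;Lower right x (assumes 80 cols).
                int     10h

                mov     ah, 2                   ;Position cursor.
                mov     bh, 0                   ;Page zero.
                mov     dh, 0                   ;y coordinate.
                mov     dl, 0                   ;x coordinate.
                int     10h
```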
13.2.3
INT 11h - Equipment Installed
Instruction: int 11h
BIOS Operation: Return an equipment list
Parameters: On entry: None; on exit: AX contains equipment list

On return from int 11h, the AX register contains a bit-encoded equipment list with the following values:

Bit 0           Floppy disk drive installed
Bit 1           Math coprocessor installed
Bits 2,3        System board RAM installed (obsolete)
Bits 4,5        Initial video mode
                    00- none
                    01- 40x25 color
                    10- 80x25 color
                    11- 80x25 b/w
Bits 6,7        Number of disk drives
Bit 8           DMA present
Bits 9,10,11    Number of RS-232 serial cards installed
Bit 12          Game I/O card installed
Bit 13          Serial printer attached
Bits 14,15      Number of printers attached.
Note that this BIOS service was designed around the original IBM PC with its very limited hardware expansion capabilities. The bits returned by this call are almost meaningless today.
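For readers who still want to inspect the equipment list, a minimal sketch of the bit extraction follows. It pulls out the serial card count (bits 9..11) and the initial video mode (bits 4,5) using the layout listed above. The register assignments are arbitrary, and the shifts go through cl so the code also assembles for the 8086.

```asm
; Sketch: decode two fields of the INT 11h equipment list.

                int     11h                     ;AX = equipment list.
                mov     bx, ax                  ;Keep a copy for second field.

                mov     cl, 9
                shr     ax, cl                  ;Move bits 9..11 down to 0..2.
                and     ax, 7                   ;AX = # of serial cards.

                mov     cl, 4
                shr     bx, cl                  ;Move bits 4,5 down to 0,1.
                and     bx, 3                   ;BX = initial video mode code.
```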
13.2.4
INT 12h - Memory Available
Instruction: int 12h
BIOS Operation: Determine memory size
Parameters: Memory size (in kilobytes) returned in AX

Back in the days when IBM PCs came with up to 64K memory installed on the motherboard, this call had some meaning. However, PCs today can handle up to 64 megabytes or more. Obviously this BIOS call is a little out of date. Some PCs use this call for different purposes, but you cannot rely on such calls working on any machine.
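The short sketch below shows one way the result might be used: converting the kilobyte count into a paragraph (segment) value. It assumes the value in AX is no larger than 1023, so the shifted result still fits in 16 bits; the register choices are arbitrary.

```asm
; Sketch: convert the INT 12h result into a segment value.

                int     12h                     ;AX = memory size in Kbytes.
                mov     cl, 6
                shl     ax, cl                  ;1K = 64 paragraphs, so AX is
                                                ; now the first segment past
                                                ; the end of reported memory.
```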
13.2.5
INT 13h - Low Level Disk Services
Instruction: int 13h
BIOS Operation: Diskette Services
Parameters: ax, es:bx, cx, dx (see below)

The int 13h function provides several different low-level disk services to PC programs: reset the diskette system, get the diskette status, read diskette sectors, write diskette sectors, verify diskette sectors, format a diskette track, and many more. This is another example of a BIOS routine which has changed over the years. When this routine was first developed, a 10 megabyte hard disk was considered large. Today, a typical high performance game requires 20 to 30 megabytes of storage.
Table 50: Some Common Disk Subsystem BIOS Calls

AH = 0
    Input:       dl- drive (0..7fh is floppy, 80h..ffh is hard)
    Output:      ah- status (0 and carry clear if no error, error code if error).
    Description: Resets the specified disk drive. Resetting a hard disk also resets the floppy drives.

AH = 1
    Input:       dl- drive (as above)
    Output:      ah- 0
                 al- status of previous disk operation.
    Description: This call returns the following status values in al:
                     0- no error
                     1- invalid command
                     2- address mark not found
                     3- disk write protected
                     4- couldn’t find sector
                     5- reset error
                     6- removed media
                     7- bad parameter table
                     8- DMA overrun
                     9- DMA operation crossed 64K boundary
                     10- illegal sector flag
                     11- illegal track flag
                     12- illegal media
                     13- invalid # of sectors
                     14- control data address mark encountered
                     15- DMA error
                     16- CRC data error
                     17- ECC corrected data error
                     32- disk controller failed
                     64- seek error
                     128- timeout error
                     170- drive not ready
                     187- undefined error
                     204- write error
                     224- status error
                     255- sense failure

AH = 2
    Input:       al- # of sectors to read
                 es:bx- buffer address
                 cl- bits 0..5: sector #
                 cl- bits 6/7: track bits 8 & 9
                 ch- track bits 0..7.
                 dl- drive # (as above)
                 dh- bits 0..5: head #
                 dh- bits 6&7: track bits 10 & 11.
    Output:      ah- return status
                 al- burst error length
                 carry- 0:success, 1:error
    Description: Reads the specified number of 512 byte sectors from the disk. Data read must be 64 Kbytes or less.

AH = 3
    Input:       same as (2) above
    Output:      same as (2) above
    Description: Writes the specified number of 512 byte sectors to the disk. Data written must not exceed 64 Kbytes in length.

AH = 4
    Input:       Same as (2) above except there is no need for a buffer.
    Output:      same as (2) above
    Description: Verifies the data in the specified number of 512 byte sectors on the disk.

AH = 0Ch
    Input:       Same as (4) above except there is no need for a sector #
    Output:      Same as (4) above
    Description: Sends the disk head to the specified track on the disk.

AH = 0Dh
    Input:       dl- drive # (80h or 81h)
    Output:      ah- return status
                 carry- 0:no error, 1:error
    Description: Reset the hard disk controller
Note: see appropriate BIOS documentation for additional information about disk subsystem BIOS support.
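As a hedged example of the read call (ah = 2) from the table above, the following sketch reads the first sector of the diskette in drive 0 into a buffer. Buffer (a 512 byte area you must declare in a data segment) and DiskError (your error handler) are hypothetical names introduced for this illustration; the buffer must also not straddle a 64K boundary (see status code 9).

```asm
; Sketch: read track 0, head 0, sector 1 of drive 0 into Buffer.
; "Buffer" and "DiskError" are hypothetical names for this example.

                mov     ax, seg Buffer
                mov     es, ax
                mov     bx, offset Buffer       ;ES:BX = buffer address.
                mov     ah, 2                   ;Read sectors opcode.
                mov     al, 1                   ;# of sectors to read.
                mov     ch, 0                   ;Track bits 0..7.
                mov     cl, 1                   ;Sector 1, track bits 8 & 9 = 0.
                mov     dh, 0                   ;Head 0.
                mov     dl, 0                   ;Drive 0 (first floppy).
                int     13h
                jc      DiskError               ;Carry set: AH = error status.
```

Diskette reads can fail while the drive motor is still spinning up, so real code usually resets the drive (ah = 0) and retries a few times before reporting an error.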
13.2.6
INT 14h - Serial I/O
Instruction: int 14h
BIOS Operation: Access the serial communications port
Parameters: ax, dx

The IBM BIOS supports up to four different serial communications ports (the hardware supports up to eight). In general, most PCs have one or two serial ports (COM1: and COM2:) installed. Int 14h supports four subfunctions: initialize, transmit a character, receive a character, and status. For all four services, the serial port number (a value in the range 0..3) is in the dx register (0=COM1:, 1=COM2:, etc.). Int 14h expects and returns other data in the al or ax register.
13.2.6.1 AH=0: Serial Port Initialization
Subfunction zero initializes a serial port. This call lets you set the baud rate, select parity modes, select the number of stop bits, and the number of bits transmitted over the serial line. These parameters are all specified by the value in the al register using the following bit encodings:

Bits    Function
5..7    Select baud rate
            000- 110 baud
            001- 150
            010- 300
            011- 600
            100- 1200
            101- 2400
            110- 4800
            111- 9600
3..4    Select parity
            00- No parity
            01- Odd parity
            10- No parity
            11- Even parity
2       Stop bits
            0- One stop bit
            1- Two stop bits
0..1    Character Size
            10- 7 bits
            11- 8 bits

Although the standard PC serial port hardware supports 19,200 baud, some BIOSes may not support this speed.

Example: Initialize COM1: to 2400 baud, no parity, eight bit data, and two stop bits

                mov     ah, 0                   ;Initialize opcode
                mov     al, 10100111b           ;Parameter data.
                mov     dx, 0                   ;COM1: port.
                int     14h
After the call to the initialization code, the serial port status is returned in ax (see Serial Port Status, ah=3, below).
13.2.6.2 AH=1: Transmit a Character to the Serial Port
This function transmits the character in the al register through the serial port specified in the dx register. On return, if ah contains zero, then the character was transmitted properly. If bit 7 of ah contains one upon return, then some sort of error occurred. The remaining seven bits contain all the error statuses returned by the GetStatus call except time out error (which is returned in bit seven). If an error is reported, you should use subfunction three to get the actual error values from the serial port hardware.

Example: Transmit a character through the COM1: port

                mov     dx, 0                   ;Select COM1:
                mov     al, 'a'                 ;Character to transmit
                mov     ah, 1                   ;Transmit opcode
                int     14h
                test    ah, 80h                 ;Check for error
                jnz     SerialError
This function will wait until the serial port finishes transmitting the last character (if any) and then it will store the character into the transmit register.
13.2.6.3 AH=2: Receive a Character from the Serial Port
Subfunction two is used to read a character from the serial port. On entry, dx contains the serial port number. On exit, al contains the character read from the serial port and bit seven of ah contains the error status. When this routine is called, it does not return to the caller until a character is received at the serial port.

Example: Reading a character from the COM1: port

                mov     dx, 0                   ;Select COM1:
                mov     ah, 2                   ;Receive opcode
                int     14h
                test    ah, 80h                 ;Check for error
                jnz     SerialError

13.2.6.4 AH=3: Serial Port Status
This call returns status information about the serial port including whether or not an error has occurred, if a character has been received in the receive buffer, if the transmit buffer is empty, and other pieces of useful information. On entry into this routine, the dx register contains the serial port number. On exit, the ax register contains the following values:
AX:     Bit Meaning
15      Time out error
14      Transmitter shift register empty
13      Transmitter holding register empty
12      Break detection error
11      Framing error
10      Parity error
9       Overrun error
8       Data available
7       Receive line signal detect
6       Ring indicator
5       Data set ready (DSR)
4       Clear to send (CTS)
3       Delta receive line signal detect
2       Trailing edge ring detector
1       Delta data set ready
0       Delta clear to send
There are a couple of useful bits, not pertaining to errors, returned in this status information. If the data available bit is set (bit #8), then the serial port has received data and you should read it from the serial port. The Transmitter holding register empty bit (bit #13) tells you if the transmit operation will be delayed while waiting for the current character to be transmitted or if the next character will be immediately transmitted. By testing these two bits, you can perform other operations while waiting for the transmit register to become available or for the receive register to contain a character. If you’re interested in serial communications, you should obtain a copy of Joe Campbell’s C Programmer’s Guide to Serial Communications. Although written specifically for C programmers, this book contains a lot of information useful to programmers working in any programming language. See the bibliography for more details.
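The polling idea described above might be sketched as follows: check the data available bit first, and call the (blocking) receive subfunction only when a character is actually waiting. NoData and SerialError are hypothetical labels standing in for your own code, and COM1: is assumed.

```asm
; Sketch: read from COM1: only if a character is waiting.
; "NoData" and "SerialError" are hypothetical labels.

                mov     dx, 0                   ;Select COM1:
                mov     ah, 3                   ;Status opcode
                int     14h
                test    ah, 1                   ;Bit 8 of AX (bit 0 of AH):
                jz      NoData                  ; data available?

                mov     dx, 0                   ;Select COM1: again.
                mov     ah, 2                   ;Receive opcode
                int     14h                     ;AL = character read.
                test    ah, 80h                 ;Check for error
                jnz     SerialError
```

Because bit 8 of ax is bit 0 of ah, the test can be done on ah alone. The same pattern applies to transmission by testing bit 13 (20h in ah) before calling subfunction one.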
13.2.7
INT 15h - Miscellaneous Services
Originally, int 15h provided cassette tape read and write services[1]. Almost immediately, everyone realized that cassettes were history, so IBM began using int 15h for many other services. Today, int 15h is used for a wide variety of functions including accessing expanded memory, reading the joystick/game adapter card, and many, many other operations. Except for the joystick calls, most of these services are beyond the scope of this text. Check the bibliography if you are interested in obtaining information on this BIOS call.
13.2.8
INT 16h - Keyboard Services
Instruction: int 16h
BIOS Operation: Read a key, test for a key, or get keyboard status
Parameters: ah (subfunction number)

The IBM PC BIOS provides several function calls dealing with the keyboard. As with many of the PC BIOS routines, the number of functions has increased over the years. This section describes the three calls that were available on the original IBM PC.
1. For those who do not remember that far back, before there were hard disks people used to use only floppy disks. And before there were floppy disks, people used to use cassette tapes to store programs and data. The original IBM PC was introduced in late 1981 with a cassette port. By early 1982, no one was using cassette tape for data storage.
13.2.8.1 AH=0: Read a Key From the Keyboard
If int 16h is called with ah equal to zero, the BIOS will not return control to the caller until a key is available in the system type ahead buffer. On return, al contains the ASCII code for the key read from the buffer and ah contains the keyboard scan code. Keyboard scan codes are described in the appendices.

Certain keys on the PC’s keyboard do not have any corresponding ASCII codes. The function keys, Home, PgUp, End, PgDn, the arrow keys, and the Alt keys are all good examples. When such a key is pressed, int 16h returns a zero in al and the keyboard scan code in ah. Therefore, whenever an ASCII code of zero is returned, you must check the ah register to determine which key was pressed.

Note that reading a key from the keyboard using the BIOS int 16h call does not echo the key pressed to the display. You have to call putc or use int 10h to print the character once you’ve read it if you want it echoed to the screen.

Example: Read a sequence of keystrokes from the keyboard until Enter is pressed.

ReadLoop:       mov     ah, 0                   ;Read Key opcode
                int     16h
                cmp     al, 0                   ;Special function?
                jz      ReadLoop                ;If so, don’t echo this keystroke
                putc
                cmp     al, 0dh                 ;Carriage return (ENTER)?
                jne     ReadLoop

13.2.8.2 AH=1: See if a Key is Available at the Keyboard
This particular int 16h subfunction allows you to check to see if a key is available in the system type ahead buffer. Even if a key is not available, control is returned (right away!) to the caller. With this call you can occasionally poll the keyboard to see if a key is available and continue processing if a key hasn’t been pressed (as opposed to freezing up the computer until a key is pressed).

There are no input parameters to this function. On return, the zero flag will be clear if a key is available, set if there aren’t any keys in the type ahead buffer. If a key is available, then ax will contain the scan and ASCII codes for that key. However, this function will not remove that keystroke from the type ahead buffer. Subfunction #0 must be used to remove characters. The following example demonstrates how to build a random number generator using the test keyboard function:

Example: Generating a random number while waiting for a keystroke

; First, clear any characters out of the type ahead buffer

ClrBuffer:      mov     ah, 1                   ;Is a key available?
                int     16h
                jz      BufferIsClr             ;If not, Discontinue flushing
                mov     ah, 0                   ;Flush this character from the
                int     16h                     ; buffer and try again.
                jmp     ClrBuffer

BufferIsClr:    mov     cx, 0                   ;Initialize “random” number.
GenRandom:      inc     cx
                mov     ah, 1                   ;See if a key is available yet.
                int     16h
                jz      GenRandom
                xor     cl, ch                  ;Randomize even more.
                mov     ah, 0                   ;Read character from buffer
                int     16h

; Random number is now in CL, key pressed by user is in AX
Chapter 13
While waiting for a key, this routine is constantly incrementing the cx register. Since human beings cannot respond rapidly (at least in terms of microseconds) the cl register will overflow many times, even for the fastest typist. As a result, cl will contain a random value since the user will not be able to control (to better than about 2ms) when a key is pressed.
13.2.8.3 AH=2: Return Keyboard Shift Key Status
This function returns the state of various keys on the PC keyboard in the al register. The values returned are as follows:

Bit   Meaning
7     Insert state (toggle by pressing INS key)
6     Caps lock (1=capslock on)
5     Num lock (1=numlock on)
4     Scroll lock (1=scroll lock on)
3     Alt (1=Alt key currently down)
2     Ctrl (1=Ctrl key currently down)
1     Left shift (1=left shift key down)
0     Right shift (1=right shift key down)

Due to a bug in the BIOS code, these bits only reflect the current status of these keys; they do not necessarily reflect the status of these keys when the next key to be read from the system type ahead buffer was pressed. To ensure that these status bits correspond to the state of these keys when a scan code is read from the type ahead buffer, you’ve got to flush the buffer, wait until a key is pressed, and then immediately check the keyboard status.
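As a minimal sketch of how a program might test individual bits of the value returned by subfunction two, consider the following fragment. The labels NoCtrl and NoShift are hypothetical names chosen for this illustration, and the fragment assumes nothing beyond the bit layout given in the table above.

```asm
                mov     ah, 2           ;Get shift key status opcode
                int     16h             ;Status byte comes back in AL
                test    al, 00000100b   ;Bit 2: Ctrl key currently down?
                jz      NoCtrl
                ; ...code for the "Ctrl is down" case goes here...
NoCtrl:
                test    al, 00000011b   ;Bits 1 and 0: either shift key down?
                jz      NoShift
                ; ...code for the "a shift key is down" case goes here...
NoShift:
```

The test instruction is used rather than and so that al is left intact and further bits can be examined without calling the BIOS again.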
13.2.9
INT 17h - Printer Services
Instruction: int 17h
BIOS Operation: Print data and test the printer status
Parameters: ax, dx
Int 17h controls the parallel printer interfaces on the IBM PC in much the same way that int 14h controls the serial ports. Since programming a parallel port is considerably easier than controlling a serial port, using the int 17h routine is somewhat easier than using the int 14h routines.
Int 17h provides three subfunctions, specified by the value in the ah register. These subfunctions are:

0- Print the character in the AL register.
1- Initialize the printer.
2- Return the printer status.

Each of these functions is described in the following sections.
Like the serial port services, the printer port services allow you to specify which of the three printers installed in the system you wish to use (LPT1:, LPT2:, or LPT3:). The value in the dx register (0..2) specifies which printer port is to be used.
One final note: under DOS it’s possible to redirect all printer output to a serial port. This is quite useful if you’re using a serial printer. The BIOS printer services only talk to parallel printer adapters. If you need to send data to a serial printer using BIOS, you’ll have to use int 14h to transmit the data through a serial port.
13.2.9.1 AH=0: Print a Character
If ah is zero when you call int 17h, then the BIOS will print the character in the al register. Exactly how the character code in the al register is treated is entirely up to the printer device you’re using. Most printers, however, respect the printable ASCII character set and a few control characters as well. Many printers will also print all the symbols in the IBM/ASCII character set (including European, line drawing, and other special symbols). Most printers treat control characters (especially ESC sequences) in completely different manners. Therefore, if you intend to print something other than standard ASCII characters, be forewarned that your software may not work on printers other than the brand you’re developing your software on.
Upon return from the int 17h subfunction zero routine, the ah register contains the current status. The values actually returned are described in the section on subfunction number two.
13.2.9.2 AH=1: Initialize Printer
Executing this call sends an electrical impulse to the printer telling it to initialize itself. On return, the ah register contains the printer status as per function number two.
13.2.9.3 AH=2: Return Printer Status
This function call checks the printer status and returns it in the ah register. The values returned are:

AH:
Bit   Meaning
7     1=Printer not busy, 0=printer busy
6     1=Acknowledge from printer
5     1=Out of paper signal
4     1=Printer selected
3     1=I/O error
2     Not used
1     Not used
0     Time out error

Acknowledge from printer is, essentially, a redundant signal (since printer busy/not busy gives you the same information). As long as the printer is busy, it will not accept additional data. Therefore, calling the print character function (ah=0) will result in a delay.
The out of paper signal is asserted whenever the printer detects that it is out of paper. This signal is not implemented on many printer adapters. On such adapters it is always programmed to a logic zero (even if the printer is out of paper). Therefore, seeing a zero in this bit position doesn’t always guarantee that there is paper in the machine. Seeing a one here, however, definitely means that your printer is out of paper.
The printer selected bit contains a one as long as the printer is on-line. If the user takes the printer off-line, then this bit will be cleared.
The I/O error bit contains a one if some general I/O error has occurred. The time out error bit contains a one if the BIOS routine waited for an extended period of time for the printer to become “not busy” yet the printer remained busy.
Note that certain peripheral devices (other than printers) also interface to the parallel port, often in addition to a parallel printer. Some of these devices use the error/status signal lines to return data to the PC. The software controlling such devices often takes over the int 17h routine (via a technique we’ll talk about later on) and always returns a “no error” status or “time out error” status if an error occurs on the printing device. Therefore, you should take care not to depend too heavily on these signals changing when you make the int 17h BIOS calls.
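The following fragment is a sketch of how the print character call and the returned status might be used together. It prints one character on LPT1: and checks only the out of paper, I/O error, and time out bits. The labels PrtError and PrtDone are hypothetical, and, as noted above, the status bits may not be trustworthy on every system.

```asm
                mov     al, 'A'         ;Character to print
                mov     ah, 0           ;Print character opcode
                mov     dx, 0           ;0 selects LPT1:
                int     17h             ;Status comes back in AH
                test    ah, 00101001b   ;Out of paper (bit 5), I/O error
                jnz     PrtError        ; (bit 3), or time out (bit 0)?
                ; ...the BIOS reported no problem...
                jmp     PrtDone
PrtError:
                ; ...report the problem to the user here...
PrtDone:
```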
13.2.10 INT 18h - Run BASIC
Instruction: int 18h
BIOS Operation: Activate ROM BASIC
Parameters: None
Executing int 18h activates the ROM BASIC interpreter in an IBM PC. However, you shouldn’t use this mechanism to run BASIC since many PC compatibles do not have BASIC in ROM, and on such machines the result of executing int 18h is undefined.
13.2.11 INT 19h - Reboot Computer
Instruction: int 19h
BIOS Operation: Restart the system
Parameters: None
Executing this interrupt has the same effect as pressing control-alt-del on the keyboard. For obvious reasons, this interrupt service should be handled carefully!
13.2.12 INT 1Ah - Real Time Clock
Instruction: int 1ah
BIOS Operation: Real time clock services
Parameters: ax, cx, dx
There are two services provided by this BIOS routine: read the clock and set the clock. The PC’s real time clock maintains a counter that counts the number of 1/18ths of a second that have transpired since midnight. When you read the clock, you get the number of “ticks” which have occurred since then. When you set the clock, you specify the number of “ticks” which have occurred since midnight. As usual, the particular service is selected via the value in the ah register.
13.2.12.1 AH=0: Read the Real Time Clock
If ah = 0, then int 1ah returns a 33-bit value in al:cx:dx as follows:

Reg   Value Returned
dx    L.O. word of clock count
cx    H.O. word of clock count
al    Zero if timer has not run for more than 24 hours
      Non-zero otherwise.

The 32-bit value in cx:dx represents the number of 55 millisecond periods which have elapsed since midnight.
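One use of this call is timing a section of code. The fragment below is a sketch under two assumptions: StartTicks is a hypothetical double word variable declared in the data segment, and the code being timed does not run across midnight (when the count starts over).

```asm
; Assumed declaration in the data segment:
;
; StartTicks    dd      ?

                mov     ah, 0                   ;Read clock opcode
                int     1ah                     ;CX:DX = current tick count
                mov     word ptr StartTicks, dx         ;Save L.O. word
                mov     word ptr StartTicks+2, cx       ;Save H.O. word

                ; ...code to be timed goes here...

                mov     ah, 0
                int     1ah                     ;Read the clock again
                sub     dx, word ptr StartTicks         ;32-bit subtraction:
                sbb     cx, word ptr StartTicks+2       ; CX:DX = end - start

; CX:DX now holds the elapsed time in ticks of roughly 55 ms each.
```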
13.2.12.2 AH=1: Setting the Real Time Clock
This call allows you to set the current system time value. cx:dx contains the current count (in 55ms increments) since last midnight. Cx contains the H.O. word, dx contains the L.O. word.
13.3
An Introduction to MS-DOS
MS-DOS provides all of the basic file manager and device manager functions required by most application programs running on an IBM PC. MS-DOS handles file I/O, character I/O, memory management, and other miscellaneous functions in a (relatively) consistent manner. If you’re serious about writing software for the PC, you’ll have to get real friendly with MS-DOS.
The title of this section is “An Introduction to MS-DOS”. And that’s exactly what it means. There is no way MS-DOS can be completely covered in a single chapter. Given all of the different books that already exist on the subject, it probably cannot even be covered by a single book (it certainly hasn’t been yet; Microsoft wrote a 1,600 page book on the subject and it didn’t even cover the subject fully). All this is leading up to a cop-out. There is no way this subject can be treated in more than a superficial manner in a single chapter. If you’re serious about writing programs in assembly language for the PC, you’ll need to complement this text with several others. Additional books on MS-DOS include: MS-DOS Programmer’s Reference (also called the MS-DOS Technical Reference Manual), Peter Norton’s Programmer’s Guide to the IBM PC, The MS-DOS Encyclopedia, and the MS-DOS Developer’s Guide. This, of course, is only a partial list of the books that are available. See the bibliography in the appendices for more details. Without a doubt, the MS-DOS Technical Reference Manual is the most important text to get your hands on. This is the official description of MS-DOS calls and parameters.
MS-DOS has a long and colorful history[2]. Throughout its lifetime, it has undergone several revisions, each purporting to be better than the last. MS-DOS’ origins go all the way back to the CP/M-80 operating system written for the Intel 8080 microprocessor chip. In fact, MS-DOS v1.0 was nothing much more than a clone of CP/M-80 for Intel’s 8088 microprocessor. Unfortunately, CP/M-80’s file handling capabilities were horrible, to say the least. Therefore, DOS[3] improved on CP/M. New file handling capabilities, compatible with Xenix and Unix, were added to DOS, producing MS-DOS v2.0. Additional calls were added to later versions of MS-DOS. Even with the introduction of OS/2 and Windows NT (which, as this is being written, have yet to take the world by storm), Microsoft is still working on enhancements to MS-DOS which may produce even later versions.
Each new feature added to DOS introduced new DOS functions while preserving all of the functionality of the previous versions of DOS. When Microsoft rewrote the DOS file handling routines in version two, they didn’t replace the old calls; they simply added new ones. While this preserved software compatibility of programs that ran under the old version of DOS, what it produced was a DOS with two sets of functionally identical, but otherwise incompatible, file services.
We’re only going to concentrate on a small subset of the available DOS commands in this chapter. We’re going to totally ignore those obsolete commands that have been augmented by newer, better, commands in later versions of DOS. Furthermore, we’re going to skip over a description of those calls that have very little use in day to day programming. For a complete, detailed, look at the commands not covered in this chapter, you should consider the acquisition of one of the aforementioned books.
2. The MS-DOS Encyclopedia gives Microsoft’s account of the history of MS-DOS. Of course, this is a one-sided presentation, but it’s interesting nonetheless.
3. This text uses “DOS” to mean MS-DOS.
13.3.1
MS-DOS Calling Sequence
MS-DOS is called via the int 21h instruction. To select an appropriate DOS function, you load the ah register with a function number before issuing the int 21h instruction. Most DOS calls require other parameters as well. Generally, these other parameters are passed in the CPU’s register set. The specific parameters will be discussed along with each call. Unless MS-DOS returns some specific value in a register, all of the CPU’s registers are preserved across a call to DOS[4].
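As a minimal sketch of this calling sequence, the following fragment calls the Get Default Drive function (19h, described later in this chapter) and converts the result to a drive letter. The conversion step is an illustration added here, not something DOS requires.

```asm
                mov     ah, 19h         ;Function number: Get Default Drive
                int     21h             ;Call DOS
                add     al, 'A'         ;DOS returns 0=A, 1=B, ... in AL, so
                                        ; AL now holds the drive letter
```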
13.3.2
MS-DOS Character Oriented Functions
DOS provides 12 character oriented I/O calls. Most of these deal with writing and reading data to/from the keyboard, video display, serial port, and printer port. All of these functions have corresponding BIOS services. In fact, DOS usually calls the appropriate BIOS function to handle the I/O operation. However, due to DOS’ redirected I/O and device driver facilities, these functions don’t always call the BIOS routines. Therefore, you shouldn’t call the BIOS routines (rather than DOS) simply because DOS ends up calling BIOS. Doing so may prevent your program from working with certain DOS-supported devices.
Except for function codes six and seven, all of the following character oriented calls check the console input device (keyboard) for a control-C. If the user presses a control-C, DOS executes an int 23h instruction. Usually, this instruction will cause the program to abort and control will be returned to DOS. Keep this in mind when issuing these calls.
Microsoft considers these calls obsolete and does not guarantee they will be present in future versions of DOS. So take these first 12 routines with a rather large grain of salt. Note that the UCR Standard Library provides the functionality of many of these calls anyway, and they make the proper DOS calls, not the obsolete ones.
4. So Microsoft claims. This may or may not be true across all versions of DOS.

Table 51: DOS Character Oriented Functions

Function #  Input Parameters      Output Parameters     Description
(AH)
----------  --------------------  --------------------  ------------------------------------------------
1                                 al- char read         Console Input w/Echo: Reads a single character
                                                        from the keyboard and displays typed character
                                                        on screen.
2           dl- output char                             Console Output: Writes a single character to the
                                                        display.
3                                 al- char read         Auxiliary Input: Reads a single character from
                                                        the serial port.
4           dl- output char                             Auxiliary Output: Writes a single character to
                                                        the output port.
5           dl- output char                             Printer Output: Writes a single character to the
                                                        printer.
6           dl- output char       al- char read         Direct Console I/O: On input, if dl contains
            (if not 0FFh)         (if input dl = 0FFh)  0FFh, this function attempts to read a character
                                                        from the keyboard. If a character is available,
                                                        it returns the zero flag clear and the character
                                                        in al. If no character is available, it returns
                                                        the zero flag set. On input, if dl contains a
                                                        value other than 0FFh, this routine sends the
                                                        character to the display. This routine does not
                                                        do ctrl-C checking.
7                                 al- char read         Direct Console Input: Reads a character from the
                                                        keyboard. Does not echo the character to the
                                                        display. This call does not check for ctrl-C.
8                                 al- char read         Read Keyboard w/o Echo: Just like function 7
                                                        above, except this call checks for ctrl-C.
9           ds:dx- pointer to                           Display String: This function displays the
            string terminated                           characters from location ds:dx up to (but not
            with “$”.                                   including) a terminating “$” character.
0Ah         ds:dx- pointer to                           Buffered Keyboard Input: This function reads a
            input buffer.                               line of text from the keyboard and stores it
                                                        into the input buffer pointed at by ds:dx. The
                                                        first byte of the buffer must contain a count
                                                        between one and 255 that contains the maximum
                                                        number of allowable characters in the input
                                                        buffer. This routine stores the actual number of
                                                        characters read in the second byte. The actual
                                                        input characters begin at the third byte of the
                                                        buffer.
0Bh                               al- status (0=not     Check Keyboard Status: Determines whether a
                                  ready, 0FFh=ready)    character is available from the keyboard.
0Ch         al- DOS opcode 0,     al- input character   Flush Buffer: This call empties the system type
            1, 6, 7, or 8         if opcode 1, 6, 7,    ahead buffer and then executes the DOS command
                                  or 8.                 specified in the al register (if al=0, no
                                                        further action).

Functions 1, 2, 3, 4, 5, 9, and 0Ah are obsolete and you should not use them. Use the DOS file I/O calls instead (opcodes 3Fh and 40h).
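To illustrate the Direct Console I/O call (function 6) from the table above, here is a sketch of a polling loop that does other work until a key arrives and then echoes the key. The labels PollKbd and GotChar are hypothetical names used only for this example.

```asm
PollKbd:        mov     ah, 6           ;Direct Console I/O
                mov     dl, 0ffh        ;DL=0FFh requests input
                int     21h
                jnz     GotChar         ;Zero flag clear: character is in AL
                ; ...no key yet, do some other work here...
                jmp     PollKbd

GotChar:        mov     dl, al          ;Any DL value other than 0FFh is
                mov     ah, 6           ; sent to the display, so this
                int     21h             ; echoes the character just read
```

Since function 6 does no ctrl-C checking, a loop like this one will not be interrupted by the user pressing control-C.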
13.3.3
MS-DOS Drive Commands
MS-DOS provides several commands that let you set the default drive, determine which drive is the default, and perform some other operations. The following table lists those functions.
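As an illustration of how one of these drive functions might be called, the sketch below uses Get Disk Free Space (function 36h, listed in the table that follows) and computes the free space in bytes as bx*ax*cx. FreeBytes and BadDrive are hypothetical names, and the code assumes that the number of bytes per cluster fits in 16 bits and that the total fits in 32 bits.

```asm
; Assumed declaration in the data segment:
;
; FreeBytes     dd      ?

                mov     ah, 36h         ;Get Disk Free Space
                mov     dl, 0           ;0=default drive
                int     21h
                cmp     ax, 0ffffh      ;0FFFFh in AX signals an error
                je      BadDrive
                mul     cx              ;AX = sectors/cluster * bytes/sector
                                        ; (DX is overwritten; assumed zero)
                mul     bx              ;DX:AX = bytes/cluster * free clusters
                mov     word ptr FreeBytes, ax
                mov     word ptr FreeBytes+2, dx
                ; ...
BadDrive:
                ; ...handle an invalid drive here...
```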
Table 52: DOS Disk Drive Functions

Function #  Input Parameters      Output Parameters     Description
(AH)
----------  --------------------  --------------------  ------------------------------------------------
0Dh                                                     Reset Drive: Flushes all file buffers to disk.
                                                        Generally called by ctrl-C handlers or sections
                                                        of code that need to guarantee file consistency
                                                        because an error may occur.
0Eh         dl- drive number      al- number of         Set Default Drive: Sets the DOS default drive to
                                  logical drives        the specified value (0=A, 1=B, 2=C, etc.).
                                                        Returns the number of logical drives in the
                                                        system, although they may not be contiguous
                                                        from 0-al.
19h                               al- default drive     Get Default Drive: Returns the current system
                                  number                default drive number (0=A, 1=B, 2=C, etc.).
1Ah         ds:dx- Disk Transfer                        Set Disk Transfer Area Address: Sets the address
            Area address.                               that MS-DOS uses for obsolete file I/O and Find
                                                        First/Find Next commands.
1Bh                               al- sectors/cluster   Get Default Drive Data: Returns information
                                  cx- bytes/sector      about the disk in the default drive. Also see
                                  dx- # of clusters     function 36h. Typical values for the media
                                  ds:bx- points at      descriptor byte include:
                                  media descriptor        0F0h- 3.5”
                                  byte                    0F8h- Hard disk
                                                          0F9h- 720K 3.5” or 1.2M 5.25”
                                                          0FAh- 320K 5.25”
                                                          0FBh- 640K 3.5”
                                                          0FCh- 180K 5.25”
                                                          0FDh- 360K 5.25”
                                                          0FEh- 160K 5.25”
                                                          0FFh- 320K 5.25”
1Ch         dl- drive number      See above             Get Drive Data: Same as above except you can
                                                        specify the drive number in the dl register
                                                        (0=default, 1=A, 2=B, 3=C, etc.).
1Fh                               al- contains 0FFh if  Get Default Disk Parameter Block (DPB): If
                                  error, 0 if no error. successful, this function returns a pointer to
                                  ds:bx- ptr to DPB     the following structure:
                                                          Drive (byte) - Drive number (0=A, 1=B, etc.).
                                                          Unit (byte) - Unit number for driver.
                                                          SectorSize (word) - # bytes/sector.
                                                          ClusterMask (byte) - sectors/cluster minus one.
                                                          Cluster2 (byte) - 2clusters/sector
                                                          FirstFAT (word) - Address of sector where FAT
                                                            starts.
                                                          FATCount (byte) - # of FATs.
                                                          RootEntries (word) - # of entries in root
                                                            directory.
                                                          FirstSector (word) - first sector of first
                                                            cluster.
                                                          MaxCluster (word) - # of clusters on drive,
                                                            plus one.
                                                          FATsize (word) - # of sectors for FAT.
                                                          DirSector (word) - first sector containing
                                                            directory.
                                                          DriverAdrs (dword) - address of device driver.
                                                          Media (byte) - media descriptor byte.
                                                          FirstAccess (byte) - set if there has been an
                                                            access to drive.
                                                          NextDPB (dword) - link to next DPB in list.
                                                          NextFree (word) - last allocated cluster.
                                                          FreeCnt (word) - number of free clusters.
2Eh         al- verify flag                             Set/Reset Verify Flag: Turns on and off write
            (0=no verify,                               verification. Usually off since this is a slow
            1=verify on).                               operation, but you can turn it on when
                                                        performing critical I/O.
2Fh                               es:bx- pointer to DTA Get Disk Transfer Area Address: Returns a
                                                        pointer to the current DTA in es:bx.
32h         dl- drive number.     Same as 1Fh           Get DPB: Same as function 1Fh except you get to
                                                        specify the drive number (0=default, 1=A, 2=B,
                                                        3=C, etc.).
33h         al- 05 (subfunction   dl- startup drive #.  Get Startup Drive: Returns the number of the
            code)                                       drive used to boot DOS (1=A, 2=B, 3=C, etc.).
36h         dl- drive number.     ax- sectors/cluster   Get Disk Free Space: Reports the amount of free
                                  bx- available         space. This call supersedes calls 1Bh and 1Ch
                                  clusters              that only support drives up to 32Mbytes. This
                                  cx- bytes/sector      call handles larger drives. You can compute the
                                  dx- total clusters    amount of free space (in bytes) by bx*ax*cx. If
                                                        an error occurs, this call returns 0FFFFh in ax.
54h                               al- verify state.     Get Verify State: Returns the current state of
                                                        the write verify flag (al=0 if off, al=1 if on).

13.3.4
MS-DOS “Obsolete” Filing Calls
DOS functions 0Fh - 18h, 1Eh, 20h - 24h, and 26h - 29h are the functions left over from the days of CP/M-80. In general, you shouldn’t bother at all with these calls since MS-DOS v2.0 and later provide a much better way to accomplish the operations performed by these calls.
13.3.5
MS-DOS Date and Time Functions
The MS-DOS date and time functions return the current date and time based on internal values maintained by the real time clock (RTC). Functions provided by DOS include reading and setting the date and time. These date and time values are used to perform date and time stamping of files when files are created on the disk. Therefore, if you change the date or time, keep in mind that it will have an effect on the files you create thereafter. Note that the UCR Standard Library also provides a set of date and time functions which, in many cases, are somewhat easier to use than these DOS calls.
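As a brief sketch of how these calls might be used, the fragment below reads the date (function 2Ah) and the time (function 2Ch) and stores the individual fields. The variable names are hypothetical and are assumed to be declared in the data segment as shown in the comments; the register assignments follow the table of date and time functions in this section.

```asm
; Assumed declarations in the data segment:
;
; Year          dw      ?
; Month         db      ?
; Day           db      ?
; Hour          db      ?
; Minute        db      ?
; Second        db      ?

                mov     ah, 2ah         ;Get Date
                int     21h
                mov     Year, cx        ;Year
                mov     Month, dh       ;Month (1=Jan, 2=Feb, etc.)
                mov     Day, dl         ;Day of month (1-31)

                mov     ah, 2ch         ;Get Time
                int     21h
                mov     Hour, ch        ;Hour (24 hour format)
                mov     Minute, cl      ;Minutes
                mov     Second, dh      ;Seconds
```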
Table 53: Date and Time Functions

Function #  Input Parameters      Output Parameters     Description
(AH)
----------  --------------------  --------------------  ------------------------------------------------
2Ah                               al- day (0=Sun,       Get Date: returns the current MS-DOS date.
                                  1=Mon, etc.).
                                  cx- year
                                  dh- month (1=Jan,
                                  2=Feb, etc.).
                                  dl- Day of month
                                  (1-31).
2Bh         cx- year (1980 -                            Set Date: sets the current MS-DOS date.
            2099)
            dh- month (1-12)
            dl- day (1-31)
2Ch                               ch- hour (24hr fmt)   Get Time: reads the current MS-DOS time. Note
                                  cl- minutes           that the hundredths of a second field has a
                                  dh- seconds           resolution of 1/18 second.
                                  dl- hundredths
2Dh         ch- hour                                    Set Time: sets the current MS-DOS time.
            cl- minutes
            dh- seconds
            dl- hundredths

13.3.6
MS-DOS Memory Management Functions
MS-DOS provides three memory management functions: allocate, deallocate, and resize (modify). For most programs, these three memory allocation calls are not used. When DOS executes a program, it gives all of the available memory, from the start of that program to the end of RAM, to the executing process. Any attempt to allocate memory without first giving unused memory back to the system will produce an “insufficient memory” error.
Sophisticated programs which terminate and remain resident, run other programs, or perform complex memory management tasks, may require the use of these memory management functions. Generally these types of programs immediately deallocate all of the memory that they don’t use and then begin allocating and deallocating storage as they see fit.
Since these are complex functions, they shouldn’t be used unless you have a very specific purpose for them. Misusing these commands may result in loss of system memory that can be reclaimed only by rebooting the system. Each of the following calls returns the error status in the carry flag. If the carry is clear on return, then the operation was completed successfully. If the carry flag is set when DOS returns, then the ax register contains one of the following error codes:

7- Memory control blocks destroyed
8- Insufficient memory
9- Invalid memory block address

Additional notes about these errors will be discussed as appropriate.
13.3.6.1 Allocate Memory
Function (ah): 48h
Entry parameters: bx- Requested block size (in paragraphs)
Exit parameters: If no error (carry clear):
                     ax:0 points at allocated memory block
                 If an error (carry set):
                     bx- maximum possible allocation size
                     ax- error code (7 or 8)
This call is used to allocate a block of memory. On entry into DOS, bx contains the size of the requested block in paragraphs (groups of 16 bytes). On exit, assuming no error, the ax register contains the segment address of the start of the allocated block. If an error occurs, the block is not allocated and the ax register is returned containing the error code. If the allocation request failed due to insufficient memory, the bx register is returned containing the maximum number of paragraphs actually available.
13.3.6.2 Deallocate Memory
Function (ah): 49h
Entry parameters: es:0- Segment address of block to be deallocated
Exit parameters: If the carry is set, ax contains the error code (7,9)
This call is used to deallocate memory allocated via function 48h above. The es register cannot contain an arbitrary memory address. It must contain a value returned by the allocate memory function. You cannot use this call to deallocate a portion of an allocated block. The modify allocation function is used for that operation.
13.3.6.3 Modify Memory Allocation
Function (ah): 4Ah
Entry parameters: es:0- address of block to modify allocation size
                  bx- size of new block
Exit parameters: If the carry is set, then
                     ax contains the error code 7, 8, or 9
                     bx contains the maximum size possible (if error 8)
This call is used to change the size of an allocated block. On entry, es must contain the segment address of the allocated block returned by the memory allocation function. Bx must contain the new size of this block in paragraphs. While you can almost always reduce the size of a block, you cannot normally increase the size of a block if other blocks have been allocated after the block being modified. Keep this in mind when using this function.
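The three calls above are typically used together. The following fragment is a sketch, not a complete program, and it rests on several assumptions: es still holds the PSP segment address that DOS passed to the program, the program's own memory block can be resized through that address, PgmParas is a hypothetical constant giving the number of paragraphs the program itself needs, BlockSeg is a hypothetical word variable, and MemError is a hypothetical error handler.

```asm
; Give unused memory back to DOS so that later allocations can succeed.

                mov     ah, 4ah         ;Modify Memory Allocation
                mov     bx, PgmParas    ;New size of program's block (paragraphs)
                int     21h             ;ES = PSP segment (assumed)
                jc      MemError

; Allocate a 4,096 byte block (256 paragraphs).

                mov     ah, 48h         ;Allocate Memory
                mov     bx, 100h        ;Size in paragraphs
                int     21h
                jc      MemError        ;AX=error code, BX=largest block available
                mov     BlockSeg, ax    ;Block starts at BlockSeg:0

                ; ...use the block...

; Return the block to DOS when finished with it.

                mov     es, BlockSeg    ;Must be a value returned by function 48h
                mov     ah, 49h         ;Deallocate Memory
                int     21h
                jc      MemError
```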
13.3.6.4 Advanced Memory Management Functions
The MS-DOS 58h opcode lets programmers adjust MS-DOS’ memory allocation strategy and control the use of upper memory blocks (UMBs). There are four subfunctions to this call, with the subfunction value appearing in the al register. The following table describes these calls:
Table 54: Advanced Memory Management Functions

Function #  Input Parameters      Output Parameters     Description
(AH)
----------  --------------------  --------------------  ------------------------------------------------
58h         al- 0                 ax- strategy          Get Allocation Strategy: Returns the current
                                                        allocation strategy in ax (see table below for
                                                        details).
58h         al- 1                                       Set Allocation Strategy: Sets the MS-DOS
            bx- strategy                                allocation strategy to the value specified in
                                                        bx (see the table below for details).
58h         al- 2                 al- link flag         Get Upper Memory Link: Returns true/false (1/0)
                                                        in al to determine whether a program can
                                                        allocate memory in the upper memory blocks.
58h         al- 3                                       Set Upper Memory Link: Links or unlinks the
            bx- link flag                               upper memory area. When linked, an application
            (0=no link,                                 can allocate memory from the UMB (using the
            1=link okay).                               normal DOS allocate call).

Table 55: Memory Allocation Strategies

Value   Name                Description
------  ------------------  ------------------------------------------------------------
0       First Fit Low       Search conventional memory for the first free block of
                            memory large enough to satisfy the allocation request. This
                            is the default case.
1       Best Fit Low        Search conventional memory for the smallest block large
                            enough to satisfy the request.
2       Last Fit Low        Search conventional memory from the highest address
                            downward for the first block large enough to satisfy the
                            request.
80h     First Fit High      Search high memory, then conventional memory, for the first
                            available block that can satisfy the allocation request.
81h     Best Fit High       Search high memory, then conventional memory, for the
                            smallest block large enough to satisfy the allocation
                            request.
82h     Last Fit High       Search high memory from high addresses to low, then
                            conventional memory from high addresses to low, for the
                            first block large enough to satisfy the request.
40h     First Fit Highonly  Search high memory only for the first block large enough to
                            satisfy the request.
41h     Best Fit Highonly   Search high memory only for the smallest block large enough
                            to satisfy the request.
42h     Last Fit Highonly   Search high memory only, from the end of memory downward,
                            for the first block large enough to satisfy the request.

These different allocation strategies can have an impact on system performance. For an analysis of different memory management strategies, please consult a good operating systems theory text.
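A program that changes the allocation strategy would normally want to restore the original setting when it is done. The fragment below sketches that pattern using the subfunction and strategy values from the tables above. OldStrategy is a hypothetical word variable, and error checking is omitted for brevity.

```asm
                mov     ax, 5800h       ;AH=58h, AL=0: Get Allocation Strategy
                int     21h
                mov     OldStrategy, ax ;Remember the current strategy

                mov     ax, 5801h       ;AH=58h, AL=1: Set Allocation Strategy
                mov     bx, 1           ;1 = Best Fit Low
                int     21h

                ; ...allocate memory with function 48h here...

                mov     ax, 5801h       ;Put the original strategy back
                mov     bx, OldStrategy
                int     21h
```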
13.3.7
MS-DOS Process Control Functions
DOS provides several services dealing with loading, executing, and terminating programs. Many of these functions have been rendered obsolete by later versions of DOS. There are three[5] functions of general interest: program termination, terminate and stay resident, and execute a program. These three functions will be discussed in the following sections.
13.3.7.1 Terminate Program Execution
Function (ah): 4Ch
Entry parameters: al- return code
Exit parameters: Does not return to your program
This is the function call normally used to terminate your program. It returns control to the calling process (normally, but not necessarily, DOS). A return code can be passed to the calling process in the al register. Exactly what meaning this return code has is entirely up to you. This return code can be tested with the DOS “IF ERRORLEVEL return code” command in a DOS batch file. All files opened by the current process will be automatically closed upon program termination.
Note that the UCR Standard Library function “ExitPgm” is simply a macro which makes this particular DOS call. This is the normal way of returning control back to MS-DOS or some other program which ran the currently active application.
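For example, a program that wants to report a failure to a batch file might end with a fragment like the one below. The return code value of one is an arbitrary choice made for this sketch.

```asm
                mov     ah, 4ch         ;Terminate Program Execution
                mov     al, 1           ;Return code (meaning is up to you)
                int     21h             ;Control does not return here
```

A batch file could then test the result with a line such as “IF ERRORLEVEL 1 ...” after running the program.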
5. Actually, there are others. See the DOS technical reference manual for more details. We will only consider these three here.
13.3.7.2 Terminate, but Stay Resident
Function (ah): 31h
Entry parameters: al- return code
                  dx- memory size, in paragraphs
Exit parameters: does not return to your program
This function also terminates program execution, but upon returning to DOS, the memory in use by the process is not returned to the DOS free memory pool. Essentially, the program remains in memory. Programs which remain resident in memory after returning to DOS are often called TSRs (terminate and stay resident programs).
When this command is executed, the dx register contains the number of memory paragraphs to leave around in memory. This value is measured from the beginning of the “program segment prefix”, a segment marking the start of your file in memory. The address of the PSP (program segment prefix) is passed to your program in the ds register when your program is first executed. You’ll have to save this value if your program is a TSR[6].
Programs that terminate and stay resident need to provide some mechanism for restarting. Once they return to DOS they cannot normally be restarted. Most TSRs patch into one of the interrupt vectors (such as a keyboard, printer, or serial interrupt vector) in order to restart whenever some hardware related event occurs (such as when a key is pressed). This is how “pop-up” programs like SmartKey work.
Generally, TSR programs are pop-ups or special device drivers. The TSR mechanism provides a convenient way for you to load your own routines to replace or augment BIOS’ routines. Your program loads into memory, patches the appropriate interrupt vector so that it points at an interrupt handler internal to your code, and then terminates and stays resident. Now, when the appropriate interrupt instruction is executed, your code will be called rather than BIOS’.
There are far too many details concerning TSRs including compatibility issues, DOS re-entrancy issues, and how interrupts are processed, to be considered here. Additional details will appear in a later chapter.
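The size computation for this call can be sketched as follows. This fragment makes several assumptions that are not part of the DOS interface: dseg is a hypothetical name for the program's data segment, PSP is a hypothetical word variable in that segment, and EndResident is a hypothetical empty segment that the linker places after all of the code and data that must stay in memory.

```asm
; At program startup, before DS is changed, DS holds the PSP address.

                mov     bx, ds          ;BX = PSP segment address
                mov     ax, dseg        ;Point DS at our data segment
                mov     ds, ax
                mov     PSP, bx         ;Save the PSP address for later

                ; ...patch interrupt vectors and do other setup here...

; Terminate and stay resident, keeping everything from the PSP up to
; (but not including) the EndResident segment.

                mov     dx, EndResident ;Segment address of end of resident part
                sub     dx, PSP         ;DX = size in paragraphs from the PSP
                mov     ah, 31h         ;Terminate, but Stay Resident
                mov     al, 0           ;Return code
                int     21h             ;Control does not return here
```

Since segment addresses are already expressed in paragraphs, subtracting the PSP address from the address of the ending segment yields the paragraph count directly.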
13.3.7.3 Execute a Program

Function (ah):      4Bh
Entry parameters:   ds:dx- pointer to pathname of program to execute
                    es:bx- pointer to parameter block
                    al- 0=load and execute, 1=load only, 3=load overlay.
Exit parameters:    If carry is set, ax contains one of the following error codes:
                        1- invalid function
                        2- file not found
                        5- access denied
                        8- not enough memory
                        10- bad environment
                        11- bad format

The execute (exec) function is an extremely complex, but at the same time, very useful operation. This command allows you to load, or load and execute, a program from the disk drive. On entry into the exec function, the ds:dx registers contain a pointer to a zero terminated string containing the name of the file to be loaded or executed, es:bx points at a parameter block, and al contains zero, one, or three depending upon whether you want to load and execute a program, simply load it into memory, or load it as an overlay. On return, if the carry is clear, then DOS properly executed the command. If the carry flag is set, then DOS encountered an error while executing the command.

The filename parameter can be a full pathname including drive and subdirectory information. “B:\DIR1\DIR2\MYPGM.EXE” is a perfectly valid filename (remember, however, it must be zero terminated). The segmented address of this pathname is passed in the ds:dx registers.

The es:bx registers point at a parameter block for the exec call. This parameter block takes on three different forms depending upon whether a program is being loaded and executed (al=0), just loaded into memory (al=1), or loaded as an overlay (al=3).

If al=0, the exec call loads and executes a program. In this case the es:bx registers point at a parameter block containing the following values:
Offset  Description
0       A word value containing the segment address of the default environment (usually this is set to zero which implies the use of the standard DOS environment).
2       Double word pointer containing the segmented address of a command line string.
6       Double word pointer to default FCB at address 5Ch
0Ah     Double word pointer to default FCB at address 6Ch

6. DOS also provides a call which will return the PSP for your program.
The environment area is a set of strings containing default pathnames and other information (this information is provided by DOS using the PATH, SET, and other DOS commands). If this parameter entry contains zero, then exec will pass the standard DOS environment on to the new procedure. If non-zero, then this parameter contains the segment address of the environment block that your process is passing on to the program about to be executed. Generally, you should store a zero at this address.

The pointer to the command string should contain the segmented address of a length prefixed string which is also terminated by a carriage return character (the carriage return character is not figured into the length of the string). This string corresponds to the data that is normally typed after the program name on the DOS command line. For example, if you’re executing the linker automatically, you might pass a command string of the following form (the length byte, 17, counts the characters between the quotes):

CmdStr      byte    17,"MyPgm+Routines /m",0dh
The second item in the parameter block must contain the segmented address of this string.

The third and fourth items in the parameter block point at the default FCBs. FCBs are used by the obsolete DOS filing commands, so they are rarely used in modern application programs. Since the data structures these two pointers point at are rarely used, you can point them at a group of 20 zeros.

Example: Format a floppy disk in drive A: using the FORMAT.EXE command

            mov     ah, 4Bh
            mov     al, 0
            mov     dx, seg PathName
            mov     ds, dx
            lea     dx, PathName
            mov     bx, seg ParmBlock
            mov     es, bx
            lea     bx, ParmBlock
            int     21h
             .
             .
             .
PathName    byte    'C:\DOS\FORMAT.EXE',0
ParmBlock   word    0                   ;Default environment
            dword   CmdLine             ;Command line string
            dword   Dummy,Dummy         ;Dummy FCBs
CmdLine     byte    3,' A:',0dh
Dummy       byte    20 dup (0)
MS-DOS versions earlier than 3.0 do not preserve any registers except cs:ip when you execute the exec call. In particular, ss:sp is not preserved. If you’re using DOS v2.x or earlier, you’ll need to use the following code:

;Example: Format a floppy disk in drive A: using the FORMAT.EXE command

            mov     cs:SS_Save, ss      ;Save SS:SP to a location
            mov     cs:SP_Save, sp      ; we have access to later.
            mov     ah, 4Bh             ;EXEC DOS opcode.
            mov     al, 0               ;Load and execute.
            mov     dx, seg PathName    ;Get filename into DS:DX.
            mov     ds, dx
            lea     dx, PathName
            mov     bx, seg ParmBlock   ;Point ES:BX at parameter
            mov     es, bx              ; block.
            lea     bx, ParmBlock
            int     21h
            mov     ss, cs:SS_Save      ;Restore SS:SP from saved
            mov     sp, cs:SP_Save      ; locations.
             .
             .
             .
SS_Save     word    ?
SP_Save     word    ?
             .
             .
             .
PathName    byte    'C:\DOS\FORMAT.EXE',0
ParmBlock   word    0                   ;Default environment
            dword   CmdLine             ;Command line string
            dword   Dummy,Dummy         ;Dummy FCBs
CmdLine     byte    3,' A:',0dh
Dummy       byte    20 dup (0)
SS_Save and SP_Save must be declared inside your code segment. The other variables can be declared anywhere.

The exec command automatically allocates memory for the program being executed. If you haven’t freed up unused memory before executing this command, you may get an insufficient memory error. Therefore, you should use the DOS deallocate memory command to free up unused memory before attempting to use the exec command.

If al=1 when the exec function executes, DOS will load the specified file but will not execute it. This function is generally used to load a program into memory while leaving control with the caller, which then starts that code itself. When this function call is made, es:bx points at the following parameter block:
Offset  Description
0       Word value containing the segment address of the environment block for the new process. If you want to use the parent process’ environment block set this word to zero.
2       Dword pointer to the command tail for this operation. The command tail is the command line string which will appear at location PSP:80h (See “The Program Segment Prefix (PSP)” on page 739 and “Accessing Command Line Parameters” on page 742).
6       Address of default FCB #1. For most programs, this should point at a block of 20 zeros (unless, of course, you’re running a program which uses FCBs).
0Ah     Address of default FCB #2. Should also point at a block of 20 zeros.
0Eh     SS:SP value. You must load these four bytes into SS and SP before starting the application.
12h     CS:IP value. These four bytes contain the starting address of the program.
The SS:SP and CS:IP fields are output values. DOS fills in the fields and returns them in the load structure. The other fields are all inputs which you must fill in before calling the exec function with al=1.

When you execute the exec command with al=3, DOS simply loads an overlay into memory. Overlays generally consist of a single code segment which contains some functions you want to execute. Since you are not creating a new process, the parameter block for this type of load is much simpler than for the other two types of load operations. On entry, es:bx must point at the following parameter block in memory:
Offset  Description
0       Word value containing the segment address of where this file is going to be loaded into memory. The file will be loaded at offset zero within this segment.
2       Word value containing a relocation factor for this file.
Unlike the load and execute functions, the overlay function does not automatically allocate storage for the file being loaded. Your program has to allocate sufficient storage and then pass the address of this storage block to the exec command (through the parameter block above). Only the segment address of this block is passed to the exec command; the offset is always assumed to be zero. The relocation factor should also contain the segment address for “.EXE” files. For “.COM” files, the relocation factor parameter should be zero.

The overlay command is quite useful for loading overlays from disk into memory. An overlay is a segment of code which resides on the disk drive until the program actually needs to execute its code. Then the code is loaded into memory and executed. Overlays can reduce the amount of memory your program takes up by allowing you to reuse the same portion of memory for different overlay procedures (clearly, only one such procedure can be active at any one time). By placing seldom-used code and initialization code into overlay files, you can help reduce the amount of memory used by your program file. One word of caution, however: managing overlays is a very complex task. This is not something a beginning assembly language programmer would want to tackle right away.

When loading a file into memory (as opposed to loading and executing a file), DOS does not scramble all of the registers, so you needn’t take the extra care necessary to preserve the ss:sp and other registers. The MS-DOS Encyclopedia contains an excellent description of the use of the exec function.
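The earlier advice to free unused memory before calling exec might be carried out as in the sketch below. It uses DOS function 4Ah (resize memory block, with es holding the block’s segment and bx the new size in paragraphs) to shrink the program’s own block. The names PSPSeg and ResizeError are hypothetical, and the code assumes PSPSeg holds the PSP segment saved at program entry and that zzzzzzseg is the last segment in the program.

```asm
; Assumptions of this sketch: DS points at the segment containing
; PSPSeg, and zzzzzzseg is linked after all code, data, and stack.
            mov     es, PSPSeg          ;ES = segment of our memory block
            mov     bx, seg zzzzzzseg
            sub     bx, PSPSeg          ;BX = paragraphs the program needs
            mov     ah, 4Ah             ;Resize memory block
            int     21h
            jc      ResizeError         ;On error, AX holds the error code
             .
             .
             .
PSPSeg      word    ?                   ;Saved from DS at program entry
```

Once the block has been shrunk, the memory beyond it returns to the DOS free pool and becomes available to the program that exec loads.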
13.3.8 MS-DOS “New” Filing Calls

Starting with DOS v2.0, Microsoft introduced a set of file handling procedures which (finally) made disk file access bearable under MS-DOS. Not only bearable, but actually easy to use! The following sections describe the use of these commands to access files on a disk drive.

File commands which deal with filenames (Create, Open, Delete, Rename, and others) are passed the address of a zero-terminated pathname. Those that actually open a file (Create and Open) return a file handle as the result (assuming, of course, that there wasn’t an error). This file handle is used with other calls (read, write, seek, close, etc.) to gain access to the file you’ve opened. In this respect, a file handle is not unlike a file variable in Pascal. Consider the following Microsoft/Turbo Pascal code:

        program DemoFiles;
        var F:TEXT;
        begin
            assign(f,'FileName.TXT');
            rewrite(f);
            writeln(f,'Hello there');
            close(f);
        end.
The file variable “f” is used in this Pascal example in much the same way that a file handle is used in an assembly language program – to gain access to the file that was created in the program. All the following DOS filing commands return an error status in the carry flag. If the carry flag is clear when DOS returns to your program, then the operation was completed successfully. If the carry flag is set upon return, then some sort of error has occurred and the AX register contains the error number. The actual error return values will be discussed along with each function in the following sections.
13.3.8.1 Open File

Function (ah):      3Dh
Entry parameters:   al- file access value
                        0- File opened for reading
                        1- File opened for writing
                        2- File opened for reading and writing
                    ds:dx- Point at a zero terminated string containing the filename.
Exit parameters:    If the carry is set, ax contains one of the following error codes:
                        2- File not found
                        4- Too many open files
                        5- Access denied
                        12- Invalid access
                    If the carry is clear, ax contains the file handle value assigned by DOS.

A file must be opened before you can access it. The open command opens a file that already exists. This makes it quite similar to Pascal’s Reset procedure. Attempting to open a file that doesn’t exist produces an error. Example:

            lea     dx, Filename        ;Assume DS points at segment
            mov     ah, 3dh             ; of filename
            mov     al, 0               ;Open for reading.
            int     21h
            jc      OpenError
            mov     FileHandle, ax
If an error occurs while opening a file, the file will not be opened. You should always check for an error after executing a DOS open command, since continuing to operate on the file which hasn’t been properly opened will produce disastrous consequences. Exactly how you handle an open error is up to you, but at the very least you should print an error message and give the user the opportunity to specify a different filename. If the open command completes without generating an error, DOS returns a file handle for that file in the ax register. Typically, you should save this value away somewhere so you can use it when accessing the file later on.
13.3.8.2 Create File

Function (ah):      3Ch
Entry parameters:   ds:dx- Address of zero terminated pathname
                    cx- File attribute
Exit parameters:    If the carry is set, ax contains one of the following error codes:
                        3- Path not found
                        4- Too many open files
                        5- Access denied
                    If the carry is clear, ax is returned containing the file handle

Create opens a new file for output. As with the OPEN command, ds:dx points at a zero terminated string containing the filename. Since this call creates a new file, DOS assumes that you’re opening the file for writing only. Another parameter, passed in cx, is the initial file attribute settings. The L.O. six bits of cx contain the following values:

Bit     Meaning if equal to one
0       File is a Read-Only file
1       File is a hidden file
2       File is a system file
3       File is a volume label name
4       File is a subdirectory
5       File has been archived
In general, you shouldn’t set any of these bits. Most normal files should be created with a file attribute of zero. Therefore, the cx register should be loaded with zero before calling the create function. Upon exit, the carry flag is set if an error occurs. The “Path not found” error requires some additional explanation. This error is generated, not if the file isn’t found (which would be most of the time since this command is typically used to create a new file), but if a subdirectory in the pathname cannot be found. If the carry flag is clear when DOS returns to your program, then the file has been properly opened for output and the ax register contains the file handle for this file.
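A minimal sketch of the create call, following the advice above to pass a file attribute of zero. The names NewName, OutHandle, and CreateError are made up for this illustration.

```asm
            lea     dx, NewName         ;Assume DS points at segment
            xor     cx, cx              ; of NewName. Attribute = 0.
            mov     ah, 3Ch             ;Create file
            int     21h
            jc      CreateError         ;AX = 3, 4, or 5 on error
            mov     OutHandle, ax       ;Save handle for later writes
             .
             .
             .
NewName     byte    'OUTPUT.TXT',0      ;Hypothetical filename
OutHandle   word    ?
```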
13.3.8.3 Close File

Function (ah):      3Eh
Entry parameters:   bx- File Handle
Exit parameters:    If the carry flag is set, ax contains 6, the only possible error, which is an invalid handle error.

This call is used to close a file opened with the Open or Create commands above. It is passed the file handle in the bx register and, assuming the file handle is valid, closes the specified file.

You should close all files your program uses as soon as you’re through with them to avoid disk file corruption in the event the user powers the system down or resets the machine while your files are left open.

Note that quitting to DOS (or aborting to DOS by pressing control-C or control-break) automatically closes all open files. However, you should never rely on this feature since doing so is an extremely poor programming practice.
13.3.8.4 Read From a File

Function (ah):      3Fh
Entry parameters:   bx- File handle
                    cx- Number of bytes to read
                    ds:dx- Array large enough to hold bytes read
Exit parameters:    If the carry flag is set, ax contains one of the following error codes
                        5- Access denied
                        6- Invalid handle
                    If the carry flag is clear, ax contains the number of bytes actually read from the file.

The read function is used to read some number of bytes from a file. The actual number of bytes is specified by the cx register upon entry into DOS. The file handle, which specifies the file from which the bytes are to be read, is passed in the bx register. The ds:dx register contains the address of a buffer into which the bytes read from the file are to be stored.

On return, if there wasn’t an error, the ax register contains the number of bytes actually read. Unless the end of file (EOF) was reached, this number will match the value passed to DOS in the cx register. If the end of file has been reached, the value returned in ax will be somewhere between zero and the value passed to DOS in the cx register. This is the only test for the EOF condition.

Example: This example opens a file and reads it to the EOF

            mov     ah, 3dh             ;Open the file
            mov     al, 0               ;Open for reading
            lea     dx, Filename        ;Presume DS points at filename
            int     21h                 ; segment.
            jc      BadOpen
            mov     FHndl, ax           ;Save file handle

LP:         mov     ah,3fh              ;Read data from the file
            lea     dx, Buffer          ;Address of data buffer
            mov     cx, 1               ;Read one byte
            mov     bx, FHndl           ;Get file handle value
            int     21h
            jc      ReadError
            cmp     ax, cx              ;EOF reached?
            jne     EOF
            mov     al, Buffer          ;Get character read
            putc                        ;Print it
            jmp     LP                  ;Read next byte

EOF:        mov     bx, FHndl
            mov     ah, 3eh             ;Close file
            int     21h
            jc      CloseError
This code segment will read the entire file whose (zero-terminated) filename is found at address “Filename” in the current data segment and write each character in the file to the standard output device using the UCR StdLib putc routine. Be forewarned that one-character-at-a-time I/O such as this is extremely slow. We’ll discuss better ways to quickly read a file a little later in this chapter.
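One way to cut down on the number of DOS calls is to read a block at a time and print from the buffer. This is an illustrative sketch rather than the method developed later in the chapter. The names BlkBuf, ReadBlk, PrtLp, and Done are hypothetical; FHndl and ReadError are as in the example above. It assumes the StdLib putc routine preserves cx and si.

```asm
            cld                         ;Make lodsb move forward
ReadBlk:    mov     ah, 3Fh             ;Read data from the file
            mov     bx, FHndl           ;Handle from an earlier open
            mov     cx, 512             ;Ask for 512 bytes at once
            lea     dx, BlkBuf          ;Assume DS points at BlkBuf segment
            int     21h
            jc      ReadError
            test    ax, ax              ;Zero bytes read means EOF
            jz      Done
            mov     cx, ax              ;CX = number of bytes actually read
            lea     si, BlkBuf
PrtLp:      lodsb                       ;Fetch next byte from the buffer
            putc                        ;Print it
            loop    PrtLp
            jmp     ReadBlk             ;Go get the next block
Done:
             .
             .
             .
BlkBuf      byte    512 dup (?)
```

A short final block (fewer than 512 bytes) is printed normally; the following read then returns zero bytes and ends the loop.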
13.3.8.5 Write to a File

Function (ah):      40h
Entry parameters:   bx- File handle
                    cx- Number of bytes to write
                    ds:dx- Address of buffer containing data to write
Exit parameters:    If the carry is set, ax contains one of the following error codes
                        5- Access denied
                        6- Invalid handle
                    If the carry is clear on return, ax contains the number of bytes actually written to the file.

This call is almost the converse of the read command presented earlier. It writes the specified number of bytes at ds:dx to the file rather than reading them. On return, if the number of bytes written to the file is not equal to the number originally specified in the cx register, the disk is full and this should be treated as an error.

If cx contains zero when this function is called, DOS will truncate the file to the current file position (i.e., all data following the current position in the file will be deleted).
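The disk-full check described above could look like the following sketch. The names OutHandle, Msg, MsgLen, WriteError, and DiskFull are hypothetical. As in the read example earlier, it relies on DOS leaving cx unchanged across the call.

```asm
            mov     bx, OutHandle       ;Handle from open or create
            mov     cx, MsgLen          ;Number of bytes to write
            lea     dx, Msg             ;Assume DS points at Msg segment
            mov     ah, 40h             ;Write to file
            int     21h
            jc      WriteError          ;AX = 5 or 6 on error
            cmp     ax, cx              ;Were all the bytes written?
            jne     DiskFull            ;If not, treat it as a full disk
             .
             .
             .
Msg         byte    'Hello there',0dh,0ah
MsgLen      =       $-Msg               ;Length of Msg in bytes
OutHandle   word    ?
```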
13.3.8.6 Seek (Move File Pointer)

Function (ah):      42h
Entry parameters:   al- Method of moving
                        0- Offset specified is from the beginning of the file.
                        1- Offset specified is distance from the current file pointer.
                        2- Offset specified is relative to the end of the file.
                    bx- File handle.
                    cx:dx- Distance to move, in bytes.
Exit parameters:    If the carry is set, ax contains one of the following error codes
                        1- Invalid function
                        6- Invalid handle
                    If the carry is clear, dx:ax contains the new file position
This command is used to move the file pointer around in a random access file. There are three methods of moving the file pointer: an absolute distance within the file (if al=0), some distance from the current file position (if al=1), or some distance from the end of the file (if al=2). If AL doesn’t contain 0, 1, or 2, DOS will return an invalid function error. If this call is successfully completed, the next byte read or written will occur at the specified location.

With methods #1 and #2, DOS treats cx:dx as a signed 32-bit value, so a negative offset moves the pointer backwards. If you would rather avoid negative offsets, you can read the current position and then use method #0 to position the file pointer at an absolute position in the file. For example, if you don’t know where you currently are and you want to move back 256 bytes, you can use the following code:

            mov     ah, 42h             ;Seek command
            mov     al, 1               ;Move from current location
            xor     cx, cx              ;Zero out CX and DX so we
            xor     dx, dx              ; stay right here
            mov     bx, FileHandle
            int     21h
            jc      SeekError
            sub     ax, 256             ;DX:AX now contains the
            sbb     dx, 0               ; current file position, so
            mov     cx, dx              ; compute a location 256
            mov     dx, ax              ; bytes back.
            mov     ah, 42h
            mov     al, 0               ;Absolute file position
            int     21h                 ;BX still contains handle.
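Another common use of the seek call is finding a file’s length: seeking zero bytes from the end of the file (method #2) returns the size in dx:ax. A brief sketch, with FileSize as a hypothetical variable:

```asm
            mov     ah, 42h             ;Seek command
            mov     al, 2               ;Relative to the end of the file
            xor     cx, cx              ;Offset of zero in CX:DX
            xor     dx, dx
            mov     bx, FileHandle
            int     21h
            jc      SeekError
            mov     word ptr FileSize, ax       ;L.O. word of the size
            mov     word ptr FileSize+2, dx     ;H.O. word of the size
             .
             .
             .
FileSize    dword   ?
```

Note that this leaves the file pointer at the end of the file, so a program that wants to read the data afterwards must seek back (for example, with method #0 and an offset of zero).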
13.3.8.7 Set Disk Transfer Address (DTA)

Function (ah):      1Ah
Entry parameters:   ds:dx- Pointer to DTA
Exit parameters:    None

This command is called “Set Disk Transfer Address” because it was (is) used with the original DOS v1.0 file functions. We wouldn’t normally consider this function except for the fact that it is also used by functions 4Eh and 4Fh (described next) to set up a pointer to a 43-byte buffer area. If this function isn’t executed before executing functions 4Eh or 4Fh, DOS will use the default buffer space at PSP:80h.
13.3.8.8 Find First File

Function (ah):      4Eh
Entry parameters:   cx- Attributes
                    ds:dx- Pointer to filename
Exit parameters:    If carry is set, ax contains one of the following error codes
                        2- File not found
                        18- No more files

The Find First File and Find Next File (described next) functions are used to search for files specified using ambiguous file references. An ambiguous file reference is any filename containing the “*” and “?” wildcard characters. The Find First File function is used to locate the first such filename within a specified directory; the Find Next File function is used to find successive entries in the directory.

Generally, when an ambiguous file reference is provided, the Find First File command is issued to locate the first occurrence of the file, and then a loop calls Find Next File to locate all other occurrences of the file within that directory until there are no more files (error #18). Whenever Find First File is called, it sets up the following information at the DTA:

Offset  Description
0       Reserved for use by Find Next File
21      Attribute of file found
22      Time stamp of file
24      Date stamp of file
26      File size in bytes
30      Filename and extension (zero terminated)

(The offsets are decimal)

Assuming Find First File doesn’t return some sort of error, the name of the first file matching the ambiguous file description will appear at offset 30 in the DTA.

Note: if the specified pathname doesn’t contain any wildcard characters, then Find First File will return the exact filename specified, if it exists. Any subsequent call to Find Next File will return an error.

The cx register contains the search attributes for the file. Normally, cx should contain zero. If non-zero, Find First File (and Find Next File) will include file names which have the specified attributes as well as all normal file names.
13.3.8.9 Find Next File

Function (ah):      4Fh
Entry parameters:   none
Exit parameters:    If the carry is set, then there aren’t any more files and ax will be returned containing 18.

The Find Next File function is used to search for additional file names matching an ambiguous file reference after a call to Find First File. The DTA must point at a data record set up by the Find First File function.

Example: The following code lists the names of all the files in the current directory that end with “.EXE”. Presumably, the variable “DTA” is in the current data segment:

            mov     ah, 1Ah             ;Set DTA
            lea     dx, DTA
            int     21h
            xor     cx, cx              ;No attributes.
            lea     dx, FileName
            mov     ah, 4Eh             ;Find First File
            int     21h
            jc      NoMoreFiles         ;If error, we're done
DirLoop:    lea     si, DTA+30          ;Address of filename
            cld
PrtName:    lodsb
            test    al, al              ;Zero byte?
            jz      NextEntry
            putc                        ;Print this character
            jmp     PrtName
NextEntry:  mov     ah, 4Fh             ;Find Next File
            int     21h
            jnc     DirLoop             ;Print this name
13.3.8.10 Delete File

Function (ah):      41h
Entry parameters:   ds:dx- Address of pathname to delete
Exit parameters:    If carry set, ax contains one of the following error codes
                        2- File not found
                        5- Access denied

This function will delete the specified file from the directory. The filename must be an unambiguous filename (i.e., it cannot contain any wildcard characters).
13.3.8.11 Rename File

Function (ah):      56h
Entry parameters:   ds:dx- Pointer to pathname of existing file
                    es:di- Pointer to new pathname
Exit parameters:    If carry set, ax contains one of the following error codes
                        2- File not found
                        5- Access denied
                        17- Not the same device

This command serves two purposes: it allows you to rename one file to another and it allows you to move a file from one directory to another (as long as the two subdirectories are on the same disk).

Example: Rename “MYPGM.EXE” to “YOURPGM.EXE”

; Assume ES and DS both point at the current data segment
; containing the filenames.

            lea     dx, OldName
            lea     di, NewName
            mov     ah, 56h
            int     21h
            jc      BadRename
             .
             .
             .
OldName     byte    "MYPGM.EXE",0
NewName     byte    "YOURPGM.EXE",0

Example #2: Move a file from one directory to another:

; Assume ES and DS both point at the current data segment
; containing the filenames.

            lea     dx, OldName
            lea     di, NewName
            mov     ah, 56h
            int     21h
            jc      BadRename
             .
             .
             .
OldName     byte    "\DIR1\MYPGM.EXE",0
NewName     byte    "\DIR2\MYPGM.EXE",0
13.3.8.12 Change/Get File Attributes

Function (ah):      43h
Entry parameters:   al- Subfunction code
                        0- Return file attributes in cx
                        1- Set file attributes to those in cx
                    cx- Attribute to be set if AL=01
                    ds:dx- address of pathname
Exit parameters:    If carry set, ax contains one of the following error codes:
                        1- Invalid function
                        3- Pathname not found
                        5- Access denied
                    If the carry is clear and the subfunction was zero, cx will contain the file’s attributes.

This call is useful for setting/resetting and reading a file’s attribute bits. It can be used to set a file to read-only, set/clear the archive bit, or otherwise mess around with the file attributes.
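For instance, marking a file read-only without disturbing its other attribute bits might be done as in this sketch, which reads the attributes, sets bit zero (the read-only bit from the Create File attribute table), and writes them back. PathName and AttrError are hypothetical names.

```asm
            lea     dx, PathName        ;Assume DS points at PathName segment
            mov     ax, 4300h           ;AH=43h, AL=0: get attributes
            int     21h
            jc      AttrError
            or      cx, 1               ;Set bit 0: read-only
            mov     ax, 4301h           ;AH=43h, AL=1: set attributes
            int     21h                 ;DS:DX still points at PathName
            jc      AttrError
             .
             .
             .
PathName    byte    'MYDATA.TXT',0      ;Hypothetical filename
```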
13.3.8.13 Get/Set File Date and Time

Function (ah):      57h
Entry parameters:   al- Subfunction code
                        0- Get date and time
                        1- Set date and time
                    bx- File handle
                    cx- Time to be set (if AL=01)
                    dx- Date to be set (if AL=01)
Exit parameters:    If carry set, ax contains one of the following error codes
                        1- Invalid subfunction
                        6- Invalid handle
                    If the carry is clear, cx/dx is set to the time/date if al=00

This call gets or sets the “last-write” date/time for the specified file. The file must be open (using open or create) before using this function. When setting, the date will not be recorded until the file is closed.
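The time and date words come back in the packed format DOS stores in directory entries, which the text above does not spell out: the date word holds the year minus 1980 in bits 9-15, the month in bits 5-8, and the day in bits 0-4. The sketch below unpacks the date; Year, Month, Day, FileTime, and DateError are hypothetical names.

```asm
            mov     ax, 5700h           ;AH=57h, AL=0: get date and time
            mov     bx, FileHandle
            int     21h
            jc      DateError
            mov     FileTime, cx        ;Save packed time; CL is reused below
            mov     ax, dx
            mov     cl, 9
            shr     ax, cl              ;Bits 9-15: years since 1980
            add     ax, 1980
            mov     Year, ax
            mov     ax, dx
            mov     cl, 5
            shr     ax, cl
            and     ax, 0Fh             ;Bits 5-8: month (1-12)
            mov     Month, ax
            and     dx, 1Fh             ;Bits 0-4: day (1-31)
            mov     Day, dx
             .
             .
             .
FileTime    word    ?
Year        word    ?
Month       word    ?
Day         word    ?
```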
13.3.8.14 Other DOS Calls

The following tables briefly list many of the other DOS calls. For more information on the use of these DOS functions consult the Microsoft MS-DOS Programmer’s Reference or the MS-DOS Technical Reference.
Table 56: Miscellaneous DOS File Functions

Function # (AH):    39h
Input parameters:   ds:dx- pointer to zero terminated pathname.
Description:        Create Directory: Creates a new directory with the specified name.

Function # (AH):    3Ah
Input parameters:   ds:dx- pointer to zero terminated pathname.
Description:        Remove Directory: Deletes the directory with the specified pathname. Error if directory is not empty or the specified directory is the current directory.

Function # (AH):    3Bh
Input parameters:   ds:dx- pointer to zero terminated pathname.
Description:        Change Directory: Changes the default directory to the specified pathname.

Function # (AH):    45h
Input parameters:   bx- file handle
Output parameters:  ax- new handle
Description:        Duplicate File Handle: creates a copy of a file handle so a program can access a file using two separate file variables. This allows the program to close the file with one handle and continue accessing it with the other.

Function # (AH):    46h
Input parameters:   bx- file handle
                    cx- duplicate handle
Description:        Force Duplicate File Handle: Like function 45h above, except you specify which handle (in cx) you want to refer to the existing file (specified by bx).

Function # (AH):    47h
Input parameters:   ds:si- pointer to buffer
                    dl- drive
Description:        Get Current Directory: Stores a string containing the current pathname (terminated with a zero) starting at location ds:si. These registers must point at a buffer containing at least 64 bytes. The dl register specifies the drive number (0=default, 1=A, 2=B, 3=C, etc.).

Function # (AH):    5Ah
Input parameters:   cx- attributes
                    ds:dx- pointer to temporary path.
Output parameters:  ax- handle
Description:        Create Temporary File: Creates a file with a unique name in the directory specified by the zero terminated string at which ds:dx points. There must be at least 13 zero bytes beyond the end of the pathname because this function will store the generated filename at the end of the pathname. The attributes are the same as for the Create call.

Function # (AH):    5Bh
Input parameters:   cx- attributes
                    ds:dx- pointer to zero terminated pathname.
Output parameters:  ax- handle
Description:        Create New File: Like the create call, but this call insists that the file not exist. It returns an error if the file exists (rather than deleting the old file).

Function # (AH):    67h
Input parameters:   bx- handles
Description:        Set Maximum Handle Count: This function sets the maximum number of handles a program can use at any one given time.

Function # (AH):    68h
Input parameters:   bx- handle
Description:        Commit File: Flushes all data to a file without closing it, ensuring that the file’s data is current and consistent.
Table 57: Miscellaneous DOS Functions

Function # (AH):    25h
Input parameters:   al- interrupt #
                    ds:dx- pointer to interrupt service routine.
Description:        Set Interrupt Vector: Stores the specified address in ds:dx into the interrupt vector table at the entry specified by the al register.

Function # (AH):    30h
Output parameters:  al- major version
                    ah- minor version
                    bh- Version flag
                    bl:cx- 24 bit serial number
Description:        Get Version Number: Returns the current version number of DOS (or value set by SETVER).

Function # (AH):    33h
Input parameters:   al- 0
Output parameters:  dl- break flag (0=off, 1=on)
Description:        Get Break Flag: Returns the status of the DOS break flag. If on, MS-DOS checks for ctrl-C when processing any DOS command; if off, MS-DOS only checks on functions 1-0Ch.

Function # (AH):    33h
Input parameters:   al- 1
                    dl- break flag.
Description:        Set Break Flag: Sets the MS-DOS break flag according to the value in dl (see function above for details).

Function # (AH):    33h
Input parameters:   al- 6
Output parameters:  bl- major version
                    bh- minor version
                    dl- revision
                    dh- version flags
Description:        Get MS-DOS Version: Returns the “real” version number, not the one set by the SETVER command. Bits three and four of the version flags are one if DOS is in ROM or DOS is in high memory, respectively.

Function # (AH):    34h
Output parameters:  es:bx- pointer to InDOS flag.
Description:        Get InDOS Flag Address: Returns the address of the InDOS flag. This flag helps prevent reentrancy in TSR applications.

Function # (AH):    35h
Input parameters:   al- interrupt #
Output parameters:  es:bx- pointer to interrupt service routine.
Description:        Get Interrupt Vector: Returns a pointer to the interrupt service routine for the specified interrupt number. See function 25h above for more details.

Function # (AH):    44h
Input parameters:   al- subcode
                    Other parameters!
Description:        Device Control: This is a whole family of additional DOS commands to control various devices. See the DOS programmer’s reference manual for more details.

Function # (AH):    4Dh
Output parameters:  al- return value
                    ah- termination method
Description:        Get Child Program Return Value: Returns the last result code from a child program in al. The ah register contains the termination method, which is one of the following values: 0-normal, 1-ctrl-C, 2-critical device error, 3-terminate and stay resident.

Function # (AH):    50h
Input parameters:   bx- PSP address
Description:        Set PSP Address: Set DOS’ current PSP address to the value specified in the bx register.

Function # (AH):    51h
Output parameters:  bx- PSP address
Description:        Get PSP Address: Returns a pointer to the current PSP in the bx register.

Function # (AH):    59h
Output parameters:  ax- extended error
                    bh- error class
                    bl- error action
                    ch- error location
Description:        Get Extended Error: Returns additional information when an error occurs on a DOS call. See the DOS programmer’s guide for more details on these errors and how to handle them.

Function # (AH):    5Dh
Input parameters:   al- 0Ah
                    ds:si- pointer to extended error structure.
Description:        Set Extended Error: copies the data from the extended error structure to DOS’ internal record.
In addition to the above commands, there are several additional DOS calls that deal with networks and international character sets. See the MS-DOS reference for more details.
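Functions 35h and 25h from Table 57 are commonly used as a pair: a program first reads and saves the existing vector, then installs its own. The sketch below does this for interrupt 1Ch, chosen only as an example. OldInt1C and MyInt1C are hypothetical names, and MyInt1C is assumed to be an interrupt service routine defined elsewhere in the program.

```asm
            mov     ax, 351Ch           ;AH=35h, AL=1Ch: get vector
            int     21h                 ;ES:BX = current handler
            mov     word ptr OldInt1C, bx       ;Save offset
            mov     word ptr OldInt1C+2, es     ;Save segment

            push    ds                  ;DS is needed for the call below
            mov     dx, seg MyInt1C
            mov     ds, dx
            mov     dx, offset MyInt1C  ;DS:DX = new handler address
            mov     ax, 251Ch           ;AH=25h, AL=1Ch: set vector
            int     21h
            pop     ds                  ;Restore our data segment
             .
             .
             .
OldInt1C    dword   ?                   ;Assume DS points at this segment
```

Saving the old vector lets the program chain to the previous handler or restore it before terminating.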
13.3.9 File I/O Examples

Of course, one of the main reasons for making calls to DOS is to manipulate files on a mass storage device. The following examples demonstrate some uses of character I/O using DOS.
13.3.9.1 Example #1: A Hex Dump Utility

This program dumps a file in hexadecimal format. The filename must be hard coded into the file (see “Accessing Command Line Parameters” later in this chapter).

                include         stdlib.a
                includelib      stdlib.lib

cseg            segment byte public 'CODE'
                assume  cs:cseg, ds:dseg, es:dseg, ss:sseg

MainPgm         proc    far

; Properly set up the segment registers:

                mov     ax, seg dseg
                mov     ds, ax
                mov     es, ax

                mov     ah, 3dh
                mov     al, 0                   ;Open file for reading
                lea     dx, Filename            ;File to open
                int     21h
                jnc     GoodOpen
                print
                byte    'Cannot open file, aborting program...',cr,0
                jmp     PgmExit

GoodOpen:       mov     FileHandle, ax          ;Save file handle
                mov     Position, 0             ;Initialize file pos counter
ReadFileLp:     mov     al, byte ptr Position
                and     al, 0Fh                 ;Compute (Position MOD 16)
                jnz     NotNewLn                ;Start new line each 16 bytes
                putcr
                mov     ax, Position            ;Print offset into file
                xchg    al, ah
                puth
                xchg    al, ah
                puth
                print
                byte    ': ',0

NotNewLn:       inc     Position                ;Increment character count
                mov     bx, FileHandle
                mov     cx, 1                   ;Read one byte
                lea     dx, buffer              ;Place to store that byte
                mov     ah, 3Fh                 ;Read operation
                int     21h
                jc      BadRead
                cmp     ax, 1                   ;Reached EOF?
                jnz     AtEOF
                mov     al, Buffer              ;Get the character read and
                puth                            ; print it in hex
                mov     al, ' '                 ;Print a space between values
                putc
                jmp     ReadFileLp

BadRead:        print
                byte    cr, lf
                byte    'Error reading data from file, aborting'
                byte    cr,lf,0

AtEOF:          mov     bx, FileHandle          ;Close the file
                mov     ah, 3Eh
                int     21h

PgmExit:        ExitPgm
MainPgm         endp

cseg            ends

dseg            segment byte public 'data'

Filename        byte    'hexdump.asm',0         ;Filename to dump
FileHandle      word    ?
Buffer          byte    ?
Position        word    0

dseg            ends

sseg            segment byte stack 'stack'
stk             word    0ffh dup (?)
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     MainPgm
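The output layout the hex dump program produces can be easier to see away from the DOS calls. The following is only a sketch, written in Python rather than 8086 assembly so that it runs without a DOS toolchain; the function name hex_dump is hypothetical, and the uppercase hex digits are an assumption about what puth prints.

```python
def hex_dump(data: bytes) -> str:
    """Model of the listing's output: every 16 bytes start a new line with a
    four-digit hex offset and ': ', then each byte is printed as two hex
    digits followed by a space."""
    out = []
    for position, value in enumerate(data):
        if position % 16 == 0:                  # same test as AND AL,0Fh / JNZ
            # Position is a 16-bit word in the listing, so it wraps at 64K.
            out.append("\n%04X: " % (position & 0xFFFF))
        out.append("%02X " % value)
    return "".join(out)


if __name__ == "__main__":
    print(hex_dump(b"Hello, DOS file I/O!"))
```

Like the assembly version, the sketch emits the line break before the offset, so the dump begins with an empty line.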
13.3.9.2 Example #2: Upper Case Conversion

The following program reads one file, converts all the lower case characters to upper case, and writes the data to a second output file.

                include         stdlib.a
                includelib      stdlib.lib

cseg            segment byte public 'CODE'
                assume  cs:cseg, ds:dseg, es:dseg, ss:sseg

MainPgm         proc    far

; Properly set up the segment registers:

                mov     ax, seg dseg
                mov     ds, ax
                mov     es, ax

;---------------------------------------------------------------
;
; Convert UCCONVRT.ASM to uppercase
;
; Open input file:

                mov     ah, 3dh
                mov     al, 0                   ;Open file for reading
                lea     dx, Filename            ;File to open
                int     21h
                jnc     GoodOpen
                print
                byte    'Cannot open file, aborting program...',cr,lf,0
                jmp     PgmExit

GoodOpen:       mov     FileHandle1, ax         ;Save input file handle

; Open output file:

                mov     ah, 3Ch                 ;Create file call
                mov     cx, 0                   ;Normal file attributes
                lea     dx, OutFileName         ;File to open
                int     21h
                jnc     GoodOpen2
                print
                byte    'Cannot open output file, aborting program...'
                byte    cr,lf,0
                mov     ah, 3eh                 ;Close input file
                mov     bx, FileHandle1
                int     21h
                jmp     PgmExit                 ;Ignore any error.

GoodOpen2:      mov     FileHandle2, ax         ;Save output file handle

ReadFileLp:     mov     bx, FileHandle1
                mov     cx, 1                   ;Read one byte
                lea     dx, buffer              ;Place to store that byte
                mov     ah, 3Fh                 ;Read operation
                int     21h
                jc      BadRead
                cmp     ax, 1                   ;Reached EOF?
                jz      ReadOK
                jmp     AtEOF

ReadOK:         mov     al, Buffer              ;Get the character read and
                cmp     al, 'a'                 ; convert it to upper case
                jb      NotLower
                cmp     al, 'z'
                ja      NotLower
                and     al, 5fh                 ;Set Bit #5 to zero.
                mov     Buffer, al

; Now write the data to the output file

NotLower:       mov     bx, FileHandle2
                mov     cx, 1                   ;Write one byte
                lea     dx, buffer              ;Location of that byte
                mov     ah, 40h                 ;Write operation
                int     21h
                jc      BadWrite
                cmp     ax, 1                   ;Make sure disk isn't full
                jz      ReadFileLp

BadWrite:       print
                byte    cr, lf
                byte    'Error writing data to file, aborting operation'
                byte    cr,lf,0
                jmp     short AtEOF

BadRead:        print
                byte    cr, lf
                byte    'Error reading data from file, aborting '
                byte    'operation',cr,lf,0

AtEOF:          mov     bx, FileHandle1         ;Close the file
                mov     ah, 3Eh
                int     21h
                mov     bx, FileHandle2
                mov     ah, 3eh
                int     21h

;---------------------------------------------------------------

PgmExit:        ExitPgm
MainPgm         endp
cseg            ends

dseg            segment byte public 'data'

Filename        byte    'ucconvrt.asm',0        ;Filename to convert
OutFileName     byte    'output.txt',0          ;Output filename
FileHandle1     word    ?
FileHandle2     word    ?
Buffer          byte    ?
Position        word    0

dseg            ends

sseg            segment byte stack 'stack'
stk             word    0ffh dup (?)
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     MainPgm
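The conversion step in this example relies on the range check that precedes the and instruction. The sketch below models just that step in Python (chosen only so it can be run anywhere; to_upper is a hypothetical name, not part of the program above).

```python
def to_upper(block: bytes) -> bytes:
    """Clear bit 5 of every byte in 'a'..'z', as AND AL,5Fh does, and leave
    every other byte alone."""
    return bytes(b & 0x5F if ord("a") <= b <= ord("z") else b for b in block)


if __name__ == "__main__":
    print(to_upper(b"mov ax, seg dseg"))
    # Without the range check the mask would damage non-letters:
    print(hex(ord("1") & 0x5F))
```

The mask is only safe on lower case letters. Applied blindly it would turn the digit '1' (31h) into the control code 11h, which is why the listing compares against 'a' and 'z' first.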
13.3.10 Blocked File I/O

The examples in the previous section suffer from a major drawback: they are extremely slow. The performance problems with the code above are entirely due to DOS. Making a DOS call is not, shall we say, the fastest operation in the world. Calling DOS every time we want to read or write a single character from/to a file will bring the system to its knees.

As it turns out, it doesn’t take (practically) any more time to have DOS read or write two characters than it does to read or write one character. Since the amount of time we (usually) spend processing the data is negligible compared to the amount of time DOS takes to return or write the data, reading two characters at a time will essentially double the speed of the program. If reading two characters doubles the processing speed, how about reading four characters? Sure enough, it almost quadruples the processing speed. Likewise, processing ten characters at a time increases the processing speed by almost an order of magnitude.

Alas, this progression doesn’t continue forever. There comes a point of diminishing returns, when it takes far too much memory to justify a (very) small improvement in performance (keeping in mind that reading 64K in a single operation requires a 64K memory buffer to hold the data). A good compromise is 256 or 512 bytes. Reading more data doesn’t really improve the performance much, yet a 256 or 512 byte buffer is easier to deal with than larger buffers.

Reading data in groups or blocks is called blocked I/O. Blocked I/O is often one to two orders of magnitude faster than single character I/O, so obviously you should use blocked I/O whenever possible.
There is one minor drawback to blocked I/O: it’s a little more complex to program than single character I/O. Consider the example presented in the section on the DOS read command:

Example: This example opens a file and reads it to the EOF

                mov     ah, 3dh                 ;Open the file
                mov     al, 0                   ;Open for reading
                lea     dx, Filename            ;Presume DS points at filename
                int     21h                     ; segment
                jc      BadOpen
                mov     FHndl, ax               ;Save file handle

LP:             mov     ah,3fh                  ;Read data from the file
                lea     dx, Buffer              ;Address of data buffer
                mov     cx, 1                   ;Read one byte
                mov     bx, FHndl               ;Get file handle value
                int     21h
                jc      ReadError
                cmp     ax, cx                  ;EOF reached?
                jne     EOF
                mov     al, Buffer              ;Get character read
                putc                            ;Print it (IOSHELL call)
                jmp     LP                      ;Read next byte

EOF:            mov     bx, FHndl
                mov     ah, 3eh                 ;Close file
                int     21h
                jc      CloseError
There isn’t much to this program at all. Now consider the same example rewritten to use blocked I/O:

Example: This example opens a file and reads it to the EOF using blocked I/O

                mov     ah, 3dh                 ;Open the file
                mov     al, 0                   ;Open for reading
                lea     dx, Filename            ;Presume DS points at filename
                int     21h                     ; segment
                jc      BadOpen
                mov     FHndl, ax               ;Save file handle

LP:             mov     ah,3fh                  ;Read data from the file
                lea     dx, Buffer              ;Address of data buffer
                mov     cx, 256                 ;Read 256 bytes
                mov     bx, FHndl               ;Get file handle value
                int     21h
                jc      ReadError
                cmp     ax, cx                  ;EOF reached?
                jne     EOF
                mov     si, 0                   ;Note: CX=256 at this point.
PrtLp:          mov     al, Buffer[si]          ;Get character read
                putc                            ;Print it
                inc     si
                loop    PrtLp
                jmp     LP                      ;Read next block

; Note, just because the number of bytes read doesn't equal 256,
; don't get the idea we're through, there could be up to 255 bytes
; in the buffer still waiting to be processed.

EOF:            mov     cx, ax
                jcxz    EOF2                    ;If CX is zero, we're really done.
                mov     si, 0                   ;Process the last block of data read
Finis:          mov     al, Buffer[si]          ; from the file which contains
                putc                            ; 1..255 bytes of valid data.
                inc     si
                loop    Finis

EOF2:           mov     bx, FHndl
                mov     ah, 3eh                 ;Close file
                int     21h
                jc      CloseError
This example demonstrates one major hassle with blocked I/O: when you reach the end of file, you haven’t necessarily processed all of the data in the file. If the block size is 256 and there are 255 bytes left in the file, DOS will return an EOF condition (the number of bytes read doesn’t match the request). In this case, we’ve still got to process the characters that were read. The code above does this in a rather straightforward manner, using a second loop to finish up when the EOF is reached. You’ve probably noticed that the two print loops are virtually identical. This program can be reduced in size somewhat using the following code, which is only a little more complex:

Example: This example opens a file and reads it to the EOF using blocked I/O

                mov     ah, 3dh                 ;Open the file
                mov     al, 0                   ;Open for reading
                lea     dx, Filename            ;Presume DS points at filename
                int     21h                     ; segment.
                jc      BadOpen
                mov     FHndl, ax               ;Save file handle

LP:             mov     ah,3fh                  ;Read data from the file
                lea     dx, Buffer              ;Address of data buffer
                mov     cx, 256                 ;Read 256 bytes
                mov     bx, FHndl               ;Get file handle value
                int     21h
                jc      ReadError
                mov     bx, ax                  ;Save for later
                mov     cx, ax
                jcxz    EOF
                mov     si, 0                   ;Note: CX = # of bytes read here.
PrtLp:          mov     al, Buffer[si]          ;Get character read
                putc                            ;Print it
                inc     si
                loop    PrtLp
                cmp     bx, 256                 ;Reach EOF yet?
                je      LP

EOF:            mov     bx, FHndl
                mov     ah, 3eh                 ;Close file
                int     21h
                jc      CloseError
Blocked I/O works best on sequential files. That is, those files opened only for reading or writing (no seeking). When dealing with random access files, you should read or write whole records at one time using the DOS read/write commands to process the whole record. This is still considerably faster than manipulating the data one byte at a time.
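To make the cost argument concrete, the following sketch counts how many read requests it takes to consume the same data with different block sizes. It is a Python model rather than 8086 code, chosen only so it can be run anywhere; read_all_blocked is a hypothetical name, and an in-memory stream stands in for the DOS file handle. The loop has the same shape as the last assembly example in this section: process whatever arrived, then stop when the block came back short.

```python
import io


def read_all_blocked(stream, block_size=256):
    """Read a stream to EOF in block_size chunks.

    Returns (data, calls) where calls is the number of read requests made,
    the quantity that dominates run time when each request is a DOS call."""
    chunks = []
    calls = 0
    while True:
        block = stream.read(block_size)
        calls += 1
        chunks.append(block)            # a short block still holds valid data
        if len(block) < block_size:     # short (or empty) block: end of file
            break
    return b"".join(chunks), calls


if __name__ == "__main__":
    data = bytes(1000)
    for size in (1, 256, 512):
        _, calls = read_all_blocked(io.BytesIO(data), size)
        print(size, calls)
```

Treating a short block as end of file matches the way the DOS examples test ax after a read from a disk file; on other kinds of streams a short read does not necessarily mean the data has run out.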
13.3.11 The Program Segment Prefix (PSP)

When a program is loaded into memory for execution, DOS first builds up a program segment prefix immediately before the program is loaded into memory. This PSP contains lots of information, some of it useful, some of it obsolete. Understanding the layout of the PSP is essential for programmers designing assembly language programs. The PSP is 256 bytes long and contains the following information:

Offset  Length  Description
0       2       An INT 20h instruction is stored here
2       2       Program ending address
4       1       Unused, reserved by DOS
5       5       Call to DOS function dispatcher
0Ah     4       Address of program termination code
0Eh     4       Address of break handler routine
12h     4       Address of critical error handler routine
16h     22      Reserved for use by DOS
2Ch     2       Segment address of environment area
2Eh     34      Reserved by DOS
50h     3       INT 21h, RETF instructions
53h     9       Reserved by DOS
5Ch     16      Default FCB #1
6Ch     20      Default FCB #2
80h     1       Length of command line string
81h     127     Command line string
Note: locations 80h..FFh are used for the default DTA.

Most of the information in the PSP is of little use to a modern MS-DOS assembly language program. Buried in the PSP, however, are a couple of gems that are worth knowing about. Just for completeness, however, we’ll take a look at all of the fields in the PSP.

The first field in the PSP contains an int 20h instruction. Int 20h is an obsolete mechanism used to terminate program execution. Back in the early days of DOS v1.0, your program would execute a jmp to this location in order to terminate. Nowadays, of course, we have DOS function 4Ch which is much easier (and safer) than jumping to location zero in the PSP. Therefore, this field is obsolete.

Field number two contains a value which points at the last paragraph allocated to your program. By subtracting the address of the PSP from this value, you can determine the amount of memory allocated to your program (and quit if there is insufficient memory available).

The third field is the first of many “holes” left in the PSP by Microsoft. Why they’re here is anyone’s guess.

The fourth field is a call to the DOS function dispatcher. The purpose of this (now obsolete) DOS calling mechanism was to allow some additional compatibility with CP/M-80 programs. For modern DOS programs, there is absolutely no need to worry about this field.

The next three fields are used to store special addresses during the execution of a program. These fields contain the default terminate vector, break vector, and critical error handler vectors. These are the values normally stored in the interrupt vectors for int 22h, int 23h, and int 24h. By storing a copy of the values in the vectors for these interrupts, you can change these vectors so that they point into your own code. When your program terminates, DOS restores those three vectors from these three fields in the PSP. For more details on these interrupt vectors, please consult the DOS technical reference manual.

The eighth field in the PSP record is another reserved field, currently unavailable for use by your programs.

The ninth field is another real gem. It’s the address of the environment strings area. This is a two-byte pointer which contains the segment address of the environment storage area. The environment strings always begin at offset zero within this segment. The environment string area consists of a sequence of zero-terminated strings. It uses the following format:

        string1 0 string2 0 string3 0 ... 0 stringn 0 0

That is, the environment area consists of a list of zero-terminated strings, the list itself being terminated by a string of length zero (i.e., a zero all by itself, or two zeros in a row, however you want to look at it). Strings are (usually) placed in the environment area via DOS commands like PATH, SET, etc. Generally, a string in the environment area takes the form

        name = parameters

For example, the “SET IPATH=C:\ASSEMBLY\INCLUDE” command copies the string “IPATH=C:\ASSEMBLY\INCLUDE” into the environment string storage area. Many languages scan the environment storage area to find default filename paths and other pieces of default information set up by DOS. Your programs can take advantage of this as well.

The next field in the PSP is another block of reserved storage, currently undefined by DOS.

The 11th field in the PSP is another call to the DOS function dispatcher. Why this call exists (when the one at location 5 in the PSP already exists and nobody really uses either mechanism to call DOS) is an interesting question. In general, this field should be ignored by your programs.

The 12th field is another block of unused bytes in the PSP which should be ignored.

The 13th and 14th fields in the PSP are the default FCBs (File Control Blocks). File control blocks are another archaic data structure carried over from CP/M-80. FCBs are used only with the obsolete DOS v1.0 file handling routines, so they are of little interest to us. We’ll ignore these FCBs in the PSP.

Locations 80h through the end of the PSP contain a very important piece of information: the command line parameters typed on the DOS command line along with your program’s name. If the following is typed on the DOS command line:

        MYPGM parameter1, parameter2

the following is stored into the command line parameter field:

        23, “ parameter1, parameter2”, 0Dh

Location 80h contains 23 (decimal), the length of the parameters following the program name. Locations 81h through 97h contain the characters making up the parameter string. Location 98h contains a carriage return. Notice that the carriage return character is not figured into the length of the command line string.

Processing the command line string is such an important facet of assembly language programming that this process will be discussed in detail in the next section.

Locations 80h..FFh in the PSP also comprise the default DTA. Therefore, if you don’t use DOS function 1Ah to change the DTA and you execute a FIND FIRST FILE, the filename information will be stored starting at location 80h in the PSP.

One important detail we’ve omitted until now is exactly how you access data in the PSP. Although the PSP is loaded into memory immediately before your program, that doesn’t necessarily mean that it appears 100h bytes before your code. Your data segments may have been loaded into memory before your code segments, thereby invalidating this method of locating the PSP. The segment address of the PSP is passed to your program in the ds register. To store the PSP address away in your data segment, your programs should begin with the following code:

                push    ds                      ;Save PSP value
                mov     ax, seg DSEG            ;Point DS and ES at our data
                mov     ds, ax                  ; segment.
                mov     es, ax
                pop     PSP                     ;Store PSP value into "PSP"
                 .                              ; variable.
                 .
                 .
Another way to obtain the PSP address, in DOS 5.0 and later, is to make a DOS call. If you load ah with 51h and execute an int 21h instruction, MS-DOS will return the segment address of the current PSP in the bx register. There are lots of tricky things you can do with the data in the PSP. Peter Norton’s Programmer’s Guide to the IBM PC lists all kinds of tricks. Such operations won’t be discussed here because they’re a little beyond the scope of this manual.
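The two “gems” described above, the environment area and the command tail at offset 80h, are both simple byte layouts. The sketch below decodes them from ordinary byte strings. It is written in Python rather than 8086 assembly so it can be tried without DOS, and every name in it (parse_environment, command_tail, make_psp) is hypothetical; make_psp merely fabricates a 256-byte image for the demonstration and fills in nothing but the command tail.

```python
def parse_environment(env: bytes) -> dict:
    """Split 'NAME=value', 0, 'NAME=value', 0, 0 into a dictionary."""
    result = {}
    pos = 0
    while pos < len(env) and env[pos] != 0:     # empty string ends the list
        end = env.index(0, pos)                 # find this string's zero byte
        name, _, value = env[pos:end].decode("ascii").partition("=")
        result[name] = value
        pos = end + 1
    return result


def command_tail(psp: bytes) -> str:
    """Return the command tail: length byte at 80h, text from 81h onward."""
    length = psp[0x80]
    return psp[0x81:0x81 + length].decode("ascii")


def make_psp(tail: bytes) -> bytes:
    """Build a fake 256-byte PSP holding only a command tail (demo helper)."""
    if len(tail) > 126:
        raise ValueError("command tail too long for the PSP")
    psp = bytearray(256)
    psp[0x80] = len(tail)
    psp[0x81:0x81 + len(tail)] = tail
    psp[0x81 + len(tail)] = 0x0D                # CR is not counted in length
    return bytes(psp)


if __name__ == "__main__":
    env = b"PATH=C:\\DOS\x00IPATH=C:\\ASSEMBLY\\INCLUDE\x00\x00"
    print(parse_environment(env))
    print(repr(command_tail(make_psp(b" parameter1, parameter2"))))
```

In a real program the environment bytes would be read starting at offset zero of the segment whose address is stored at PSP:2Ch, and the tail would be read through a segment register loaded with the PSP address.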
13.3.12 Accessing Command Line Parameters

Most programs like MASM and LINK allow you to specify command line parameters when the program is executed. For example, by typing

        ML MYPGM.ASM

you can instruct MASM to assemble MYPGM without any further intervention from the keyboard. “MYPGM.ASM” is a good example of a command line parameter. When DOS’ COMMAND.COM command interpreter parses your command line, it copies most of the text following the program name to location 80h in the PSP as described in the previous section. For example, the command line above will store the following at PSP:80h

        10, “ MYPGM.ASM”, 0Dh
The text stored in the command line tail storage area in the PSP is usually an exact copy of the data appearing on the command line. There are, however, a couple of exceptions. First of all, I/O redirection parameters are not stored in the input buffer. Neither are command tails following the pipe operator (“|”). The other thing appearing on the command line which is absent from the data at PSP:80h is the program name. This is rather unfortunate, since having the program name available would allow you to determine the directory containing the program. Nevertheless, there is lots of useful information present on the command line.

The information on the command line can be used for almost any purpose you see fit. However, most programs expect two types of parameters in the command line parameter buffer: filenames and switches. The purpose of a filename is rather obvious: it allows a program to access a file without having to prompt the user for the filename. Switches, on the other hand, are arbitrary parameters to the program. By convention, switches are preceded by a slash or hyphen on the command line.

Figuring out what to do with the information on the command line is called parsing the command line. Clearly, if your programs are to manipulate data on the command line, you’ve got to parse the command line within your code.

Before a command line can be parsed, each item on the command line has to be separated out apart from the others. That is, each word (or more properly, lexeme [7]) has to be identified in the command line. Separation of lexemes on a command line is relatively easy: all you’ve got to do is look for sequences of delimiters on the command line. Delimiters are special symbols used to separate tokens on the command line. DOS supports six different delimiter characters: space, comma, semicolon, equal sign, tab, or carriage return.

7. Many programmers use the term “token” rather than lexeme. Technically, a token is a different entity.

Generally, any number of delimiter characters may appear between two tokens on a command line. Therefore, all such occurrences must be skipped when scanning the command line. The following assembly language code scans the entire command line and prints all of the tokens that appear thereon:

                include         stdlib.a
                includelib      stdlib.lib

cseg            segment byte public 'CODE'
                assume  cs:cseg, ds:dseg, es:dseg, ss:sseg

; Equates into command line-

CmdLnLen        equ     byte ptr es:[80h]       ;Command line length
CmdLn           equ     byte ptr es:[81h]       ;Command line data

tab             equ     09h

MainPgm         proc    far

; Properly set up the segment registers:

                push    ds                      ;Save PSP
                mov     ax, seg dseg
                mov     ds, ax
                pop     PSP

;---------------------------------------------------------------

                print
                byte    cr,lf
                byte    'Items on this line:',cr,lf,lf,0

                mov     es, PSP                 ;Point ES at PSP
                lea     bx, CmdLn               ;Point at command line
PrintLoop:      print
                byte    cr,lf,'Item: ',0
                call    SkipDelimiters          ;Skip over leading delimiters
PrtLoop2:       mov     al, es:[bx]             ;Get next character
                call    TestDelimiter           ;Is it a delimiter?
                jz      EndOfToken              ;Quit this loop if it is
                putc                            ;Print char if not.
                inc     bx                      ;Move on to next character
                jmp     PrtLoop2

EndOfToken:     cmp     al, cr                  ;Carriage return?
                jne     PrintLoop               ;Repeat if not end of line

                print
                byte    cr,lf,lf
                byte    'End of command line',cr,lf,lf,0
                ExitPgm
MainPgm         endp

; The following subroutine sets the zero flag if the character in
; the AL register is one of DOS' six delimiter characters,
; otherwise the zero flag is returned clear. This allows us to use
; the JE/JNE instructions afterwards to test for a delimiter.

TestDelimiter   proc    near
                cmp     al, ' '
                jz      ItsOne
                cmp     al,','
                jz      ItsOne
                cmp     al,Tab
                jz      ItsOne
                cmp     al,';'
                jz      ItsOne
                cmp     al,'='
                jz      ItsOne
                cmp     al, cr
ItsOne:         ret
TestDelimiter   endp

; SkipDelimiters skips over leading delimiters on the command
; line. It does not, however, skip the carriage return at the end
; of a line since this character is used as the terminator in the
; main program.

SkipDelimiters  proc    near
                dec     bx                      ;To offset INC BX below
SDLoop:         inc     bx                      ;Move on to next character.
                mov     al, es:[bx]             ;Get next character
                cmp     al, 0dh                 ;Don't skip if CR.
                jz      QuitSD
                call    TestDelimiter           ;See if it's some other
                jz      SDLoop                  ; delimiter and repeat.
QuitSD:         ret
SkipDelimiters  endp

cseg            ends

dseg            segment byte public 'data'

PSP             word    ?                       ;Program segment prefix

dseg            ends

sseg            segment byte stack 'stack'
stk             word    0ffh dup (?)
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     MainPgm
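The scanning rule used by the listing above (skip any run of delimiters, collect characters up to the next delimiter, stop at the carriage return) can be sketched in a few lines. The version below is Python, used only so the logic can be run without DOS; the names DELIMITERS and tokens are hypothetical and it returns the items in a list instead of printing them.

```python
DELIMITERS = " ,;=\t"          # carriage return is handled separately below


def tokens(tail: str) -> list:
    """Split a command tail into lexemes, stopping at the carriage return."""
    items = []
    current = ""
    for ch in tail:
        if ch == "\r":                  # CR terminates the command line
            break
        if ch in DELIMITERS:
            if current:                 # a delimiter ends the current item
                items.append(current)
                current = ""
        else:
            current += ch
    if current:                         # item that ran up to the CR
        items.append(current)
    return items


if __name__ == "__main__":
    print(tokens(" parameter1, parameter2\r"))
    print(tokens("a=b;c\td\r"))
```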
Once you can scan the command line (that is, separate out the lexemes), the next step is to parse it. For most programs, parsing the command line is an extremely trivial process. If the program accepts only a single filename, all you’ve got to do is grab the first lexeme on the command line, slap a zero byte onto the end of it (perhaps moving it into your data segment), and use it as a filename. The following assembly language example modifies the hex dump routine presented earlier so that it gets its filename from the command line rather than hard-coding the filename into the program:

                include         stdlib.a
                includelib      stdlib.lib

cseg            segment byte public 'CODE'
                assume  cs:cseg, ds:dseg, es:dseg, ss:sseg

; Note CR and LF are already defined in STDLIB.A

tab             equ     09h

MainPgm         proc    far

; Properly set up the segment registers:

                mov     ax, seg dseg
                mov     es, ax                  ;Leave DS pointing at PSP

;---------------------------------------------------------------
;
; First, parse the command line to get the filename:

                mov     si, 81h                 ;Pointer to command line
                lea     di, FileName            ;Pointer to FileName buffer

SkipDelimiters:
                lodsb                           ;Get next character
                call    TestDelimiter
                je      SkipDelimiters

; Assume that what follows is an actual filename

                dec     si                      ;Point at 1st char of name
GetFName:       lodsb
                cmp     al, 0dh
                je      GotName
                call    TestDelimiter
                je      GotName
                stosb                           ;Save character in file name
                jmp     GetFName

; We're at the end of the filename, so zero-terminate it as
; required by DOS.

GotName:        mov     byte ptr es:[di], 0
                mov     ax, es                  ;Point DS at DSEG
                mov     ds, ax

; Now process the file

                mov     ah, 3dh
                mov     al, 0                   ;Open file for reading
                lea     dx, Filename            ;File to open
                int     21h
                jnc     GoodOpen
                print
                byte    'Cannot open file, aborting program...',cr,0
                jmp     PgmExit

GoodOpen:       mov     FileHandle, ax          ;Save file handle
                mov     Position, 0             ;Initialize file position
ReadFileLp:     mov     al, byte ptr Position
                and     al, 0Fh                 ;Compute (Position MOD 16)
                jnz     NotNewLn                ;Every 16 bytes start a line
                putcr
                mov     ax, Position            ;Print offset into file
                xchg    al, ah
                puth
                xchg    al, ah
                puth
                print
                byte    ': ',0

NotNewLn:       inc     Position                ;Increment character count
                mov     bx, FileHandle
                mov     cx, 1                   ;Read one byte
                lea     dx, buffer              ;Place to store that byte
                mov     ah, 3Fh                 ;Read operation
                int     21h
                jc      BadRead
                cmp     ax, 1                   ;Reached EOF?
                jnz     AtEOF
                mov     al, Buffer              ;Get the character read and
                puth                            ; print it in hex
                mov     al, ' '                 ;Print a space between values
                putc
                jmp     ReadFileLp

BadRead:        print
                byte    cr, lf
                byte    'Error reading data from file, aborting.'
                byte    cr,lf,0

AtEOF:          mov     bx, FileHandle          ;Close the file
                mov     ah, 3Eh
                int     21h

;---------------------------------------------------------------

PgmExit:        ExitPgm
MainPgm         endp

TestDelimiter   proc    near
                cmp     al, ' '
                je      xit
                cmp     al, ','
                je      xit
                cmp     al, Tab
                je      xit
                cmp     al, ';'
                je      xit
                cmp     al, '='
xit:            ret
TestDelimiter   endp
cseg            ends

dseg            segment byte public 'data'

PSP             word    ?
Filename        byte    64 dup (0)              ;Filename to dump
FileHandle      word    ?
Buffer          byte    ?
Position        word    0

dseg            ends

sseg            segment byte stack 'stack'
stk             word    0ffh dup (?)
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     MainPgm
The following example demonstrates several concepts dealing with command line parameters. This program copies one file to another. If the “/U” switch is supplied (somewhere) on the command line, all of the lower case characters in the file are converted to upper case before being written to the destination file. Another feature of this code is that it will prompt the user for any missing filenames, much like the MASM and LINK programs will prompt you for filenames if you haven’t supplied any.

                include         stdlib.a
                includelib      stdlib.lib

cseg            segment byte public 'CODE'
                assume  cs:cseg, ds:nothing, es:dseg, ss:sseg

; Note: The constants CR (0dh) and LF (0ah) appear within the
; stdlib.a include file.

tab             equ     09h

MainPgm         proc    far

; Properly set up the segment registers:

                mov     ax, seg dseg
                mov     es, ax                  ;Leave DS pointing at PSP

;---------------------------------------------------------------
; First, parse the command line to get the filename:

                mov     es:GotName1, 0          ;Init flags that tell us if
                mov     es:GotName2, 0          ; we've parsed the filenames
                mov     es:ConvertLC,0          ; and the "/U" switch.

; Okay, begin scanning and parsing the command line

                mov     si, 81h                 ;Pointer to command line

SkipDelimiters:
                lodsb                           ;Get next character
                call    TestDelimiter
                je      SkipDelimiters

; Determine if this is a filename or the /U switch

                cmp     al, '/'
                jnz     MustBeFN

; See if it's "/U" here-

                lodsb
                and     al, 5fh                 ;Convert "u" to "U"
                cmp     al, 'U'
                jnz     NotGoodSwitch
                lodsb                           ;Make sure next char is
                cmp     al, cr                  ; a delimiter of some sort
                jz      GoodSwitch
                call    TestDelimiter
                jne     NotGoodSwitch

; Okay, it's "/U" here.

GoodSwitch:     mov     es:ConvertLC, 1         ;Convert LC to UC
                dec     si                      ;Back up in case it's CR
                jmp     SkipDelimiters          ;Move on to next item.

; If a bad switch was found on the command line, print an error
; message and abort-

NotGoodSwitch:
                print
                byte    cr,lf
                byte    'Illegal switch, only "/U" is allowed!',cr,lf
                byte    'Aborting program execution.',cr,lf,0
                jmp     PgmExit

; If it's not a switch, assume that it's a valid filename and
; handle it down here-

MustBeFN:       cmp     al, cr                  ;See if at end of cmd line
                je      EndOfCmdLn

; See if it's filename one, two, or if too many filenames have been
; specified-

                cmp     es:GotName1, 0
                jz      Is1stName
                cmp     es:GotName2, 0
                jz      Is2ndName

; More than two filenames have been entered, print an error message
; and abort.

                print
                byte    cr,lf
                byte    'Too many filenames specified.',cr,lf
                byte    'Program aborting...',cr,lf,lf,0
                jmp     PgmExit

; Jump down here if this is the first filename to be processed-

Is1stName:      lea     di, FileName1
                mov     es:GotName1, 1
                jmp     ProcessName

Is2ndName:      lea     di, FileName2
                mov     es:GotName2, 1

ProcessName:
                stosb                           ;Store away character in name
                lodsb                           ;Get next char from cmd line
                cmp     al, cr
                je      NameIsDone
                call    TestDelimiter
                jne     ProcessName

NameIsDone:     mov     al, 0                   ;Zero terminate filename
                stosb
                dec     si                      ;Point back at previous char
                jmp     SkipDelimiters          ;Try again.

; When the end of the command line is reached, come down here and
; see if both filenames were specified.

                assume  ds:dseg

EndOfCmdLn:     mov     ax, es                  ;Point DS at DSEG
                mov     ds, ax

; See if the names were supplied on the command line.
; If not, prompt the user and read them from the keyboard.
; GetName expects the buffer address in DI.

                cmp     GotName1, 0             ;Was filename #1 supplied?
                jnz     HasName1
                mov     al, '1'                 ;Filename #1
                lea     di, Filename1
                call    GetName                 ;Get filename #1

HasName1:       cmp     GotName2, 0             ;Was filename #2 supplied?
                jnz     HasName2
                mov     al, '2'                 ;If not, read it from kbd.
                lea     di, FileName2
                call    GetName

; Okay, we've got the filenames, now open the files and copy the
; source file to the destination file.

HasName2:       mov     ah, 3dh
                mov     al, 0                   ;Open file for reading
                lea     dx, Filename1           ;File to open
                int     21h
                jnc     GoodOpen1

                print
                byte    'Cannot open file, aborting program...',cr,lf,0
                jmp     PgmExit

; If the source file was opened successfully, save the file handle.

GoodOpen1:      mov     FileHandle1, ax         ;Save file handle

; Open (CREATE, actually) the second file here.

                mov     ah, 3ch                 ;Create file
                mov     cx, 0                   ;Standard attributes
                lea     dx, Filename2           ;File to open
                int     21h
                jnc     GoodCreate

; Note: the following error code relies on the fact that DOS
; automatically closes any open source files when the program
; terminates.

                print
                byte    cr,lf
                byte    'Cannot create new file, aborting operation'
                byte    cr,lf,lf,0
                jmp     PgmExit

GoodCreate:     mov     FileHandle2, ax         ;Save file handle

; Now process the files

CopyLoop:       mov     ah, 3Fh                 ;DOS read opcode
                mov     bx, FileHandle1         ;Read from file #1
                mov     cx, 512                 ;Read 512 bytes
                lea     dx, buffer              ;Buffer for storage
                int     21h
                jc      BadRead
                mov     bp, ax                  ;Save # of bytes read

                cmp     ConvertLC,0             ;Conversion option active?
                jz      NoConversion

; Convert all LC in buffer to UC-

                mov     cx, 512
                lea     si, Buffer
                mov     di, si
ConvertLC2UC:
                lodsb
                cmp     al, 'a'
                jb      NoConv
                cmp     al, 'z'
                ja      NoConv
                and     al, 5fh
NoConv:         stosb
                loop    ConvertLC2UC

NoConversion:
                mov     ah, 40h                 ;DOS write opcode
                mov     bx, FileHandle2         ;Write to file #2
                mov     cx, bp                  ;Write however many bytes
                lea     dx, buffer              ;Buffer for storage
                int     21h
                jc      BadWrite
                cmp     ax, bp                  ;Did we write all of the
                jnz     jDiskFull               ; bytes?
                cmp     bp, 512                 ;Were there 512 bytes read?
                jz      CopyLoop
                jmp     AtEOF
jDiskFull:      jmp     DiskFull

; Various error messages:

BadRead:        print
                byte    cr,lf
                byte    'Error while reading source file, aborting '
                byte    'operation.',cr,lf,0
                jmp     AtEOF

BadWrite:       print
                byte    cr,lf
                byte    'Error while writing destination file, aborting'
                byte    ' operation.',cr,lf,0
                jmp     AtEOF

DiskFull:       print
                byte    cr,lf
                byte    'Error, disk full. Aborting operation.',cr,lf,0

AtEOF:          mov     bx, FileHandle1         ;Close the first file
                mov     ah, 3Eh
                int     21h
                mov     bx, FileHandle2         ;Close the second file
                mov     ah, 3Eh
                int     21h

PgmExit:        ExitPgm
MainPgm         endp

TestDelimiter   proc    near
                cmp     al, ' '
                je      xit
                cmp     al, ','
                je      xit
                cmp     al, Tab
                je      xit
                cmp     al, ';'
                je      xit
                cmp     al, '='
xit:            ret
TestDelimiter   endp

; GetName- Reads a filename from the keyboard. On entry, AL
; contains the filename number and DI points at the buffer in ES
; where the zero-terminated filename must be stored.

GetName         proc    near
                print
                byte    'Enter filename #',0
                putc
                mov     al, ':'
                putc
                gets
                ret
GetName         endp
cseg            ends

dseg            segment byte public 'data'

PSP             word    ?
Filename1       byte    128 dup (?)             ;Source filename
Filename2       byte    128 dup (?)             ;Destination filename
FileHandle1     word    ?
FileHandle2     word    ?
GotName1        byte    ?
GotName2        byte    ?
ConvertLC       byte    ?
Buffer          byte    512 dup (?)

dseg            ends

sseg            segment byte stack 'stack'
stk             word    0ffh dup (?)
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     MainPgm

As you can see, there is more effort expended processing the command line parameters than actually copying the files!
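Stripped of the register shuffling, the parsing decisions made by the copy program are few: an item beginning with a slash must be the “/U” switch in either case, anything else is a filename, and a third filename is an error. The sketch below captures just those decisions in Python (used only so it can be run anywhere; parse_copy_args is a hypothetical name). It reports errors by raising ValueError where the assembly program prints a message and quits, and it leaves the prompting for missing filenames to the caller.

```python
import re


def parse_copy_args(tail: str):
    """Return (filenames, convert) for a command tail such as ' a.txt /U b.txt'."""
    # Split on runs of the delimiters space, comma, semicolon, equals and tab.
    items = [t for t in re.split(r"[ ,;=\t]+", tail.rstrip("\r")) if t]
    names = []
    convert = False
    for item in items:
        if item.startswith("/"):
            if item.upper() != "/U":            # only "/U" or "/u" is legal
                raise ValueError("Illegal switch, only /U is allowed")
            convert = True
        else:
            if len(names) == 2:
                raise ValueError("Too many filenames specified")
            names.append(item)
    return names, convert


if __name__ == "__main__":
    print(parse_copy_args(" in.txt /u out.txt\r"))
    print(parse_copy_args(" in.txt\r"))         # caller must prompt for #2
```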
13.3.13 ARGC and ARGV

The UCR Standard Library provides two routines, argc and argv, which provide easy access to command line parameters. Argc (argument count) returns the number of items on the command line. Argv (argument vector) returns a pointer to a specific item in the command line. These routines break up the command line into lexemes using the standard delimiters. As per MS-DOS convention, argc and argv treat any string surrounded by quotation marks on the command line as a single command line item.

Argc will return in cx the number of command line items. Since MS-DOS does not include the program name on the command line, this count does not include the program name either. Furthermore, redirection operands (“>filename” and “
Example: The following code echoes the command line parameters to the screen.

                include         stdlib.a
                includelib      stdlib.lib

dseg            segment para public 'data'

ArgCnt          word    0

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax

; Must call the memory manager initialization routine if you use
; any routine which calls malloc! ARGV is a good example of a
; routine which calls malloc.

                meminit

                argc                    ;Get the command line arg count.
                jcxz    Quit            ;Quit if no cmd ln args.
                mov     ArgCnt, 1       ;Init Cmd Ln count.
PrintCmds:      printf                  ;Print the item.
                byte    "\n%2d: ",0
                dword   ArgCnt

                mov     ax, ArgCnt      ;Get the next command line guy.
                argv
                puts
                inc     ArgCnt          ;Move on to next arg.
                loop    PrintCmds       ;Repeat for each arg.
                putcr

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends

;zzzzzzseg is required by the standard library routines.

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main

13.4
UCR Standard Library File I/O Routines
Although MS-DOS’ file I/O facilities are not too bad, the UCR Standard Library provides a file I/O package which makes blocked sequential I/O as easy as character at a time file I/O. Furthermore, with a tiny amount of effort, you can use all the StdLib routines like printf, print, puti, puth, putc, getc, gets, etc., when performing file I/O. This greatly simplifies text file operations in assembly language.
Note that record-oriented (binary) I/O is probably best left to pure DOS, as is any operation that requires random access within a file. The Standard Library routines really only support sequential text I/O. Nevertheless, this is the most common form of file I/O around, so the Standard Library routines are quite useful indeed.
The UCR Standard Library provides eight file I/O routines: fopen, fcreate, fclose, fflush, fgetc, fread, fputc, and fwrite. Fgetc and fputc perform character at a time I/O, fread and fwrite let you read and write blocks of data, and the other four functions perform the obvious DOS operations.
The UCR Standard Library uses a special file variable to keep track of file operations. There is a special record type, FileVar, declared in stdlib.a (see footnote 8). When using the StdLib file I/O routines you must create a variable of type FileVar for every file you need open at the same time. This is very easy; just use a definition of the form:

MyFileVar       FileVar {}

Please note that a Standard Library file variable is not the same thing as a DOS file handle. It is a structure which contains the DOS file handle, a buffer (for blocked I/O), and various index and status variables. The internal structure of this type is of no interest (remember data encapsulation!) except to the implementor of the file routines. You will pass the address of this file variable to the various Standard Library file I/O routines.
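As a minimal sketch of the declaration described above, the following fragment reserves two file variables and loads the address of one of them into es:di, which is where the file routines expect to find it. The names SrcFile and DstFile are placeholders chosen for this illustration; lesi is the same Standard Library macro the sample programs in this chapter use.

```asm
dseg            segment para public 'data'
SrcFile         FileVar {}              ;One file variable for each file
DstFile         FileVar {}              ; that is open at the same time.
dseg            ends

; Later, in the code segment, just before calling a file routine:

                lesi    SrcFile         ;ES:DI := address of SrcFile.
```

Because the routines receive the address of the whole structure, the program never touches the DOS file handle stored inside it.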
13.4.1
Fopen
Entry parameters:
        ax-     File open mode
                0- File opened for reading
                1- File opened for writing
        dx:si-  Points at a zero terminated string containing the filename.
        es:di-  Points at a StdLib file variable.
Exit parameters:
        If the carry is set, ax contains the returned DOS error code (see DOS open function).
Fopen opens a sequential text file for reading or writing. Unlike DOS, you cannot open a file for reading and writing. Furthermore, this is a sequential text file which does not support random access. Note that the file must exist or fopen will return an error. This is even true when you open the file for writing.
8. Actually, it’s declared in file.a. Stdlib.a includes file.a so this definition appears inside stdlib.a as well.
Note that if you open a file for writing and that file already exists, any data written to the file will overwrite the existing data. When you close the file, any data appearing in the file after the data you wrote will still be there. If you want to erase the existing file before writing data to it, use the fcreate function.
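The calling sequence for fopen can be sketched as follows. This is only an illustration under stated assumptions: InName, InVar, DOSErr, and the labels OpenOK and Quit are hypothetical names, and the fragment is meant to sit inside a main procedure like the ones shown elsewhere in this chapter.

```asm
; Assumed declarations in the data segment:
;
;   InName      byte    "input.txt",0   ;Hypothetical filename.
;   InVar       FileVar {}
;   DOSErr      word    ?

                mov     ax, 0           ;0 = open for reading.
                ldxi    InName          ;DX:SI -> zero terminated filename.
                lesi    InVar           ;ES:DI -> file variable.
                fopen
                jnc     OpenOK          ;Carry clear: the file is open.

                mov     DOSErr, ax      ;Carry set: AX holds the DOS error code.
                printf
                byte    "Could not open file, DOS error %d\n",0
                dword   DOSErr
                jmp     Quit            ;Quit is assumed to end the program.

OpenOK:
```

Passing one in ax instead of zero would open the same (existing) file for writing.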
13.4.2
Fcreate
Entry parameters:
        dx:si-  Points at a zero terminated string containing the filename.
        es:di-  Points at a StdLib file variable.
Exit parameters:
        If the carry is set, ax contains the returned DOS error code (see DOS open function).
Fcreate creates a new file and opens it for writing. If the file already exists, fcreate deletes the existing file and creates a new one. It initializes the file variable for output but is otherwise identical to the fopen call.
13.4.3
Fclose
Entry parameters:
        es:di-  Points at a StdLib file variable.
Exit parameters:
        If the carry is set, ax contains the returned DOS error code (see DOS open function).
Fclose closes a file and updates any internal housekeeping information. It is very important that you close all files opened with fopen or fcreate using this call. When making DOS file calls, if you forget to close a file DOS will automatically do that for you when your program terminates. However, the StdLib routines cache up data in internal buffers. The fclose call automatically flushes these buffers to disk. If you exit your program without calling fclose, you may lose some data written to the file but not yet transferred from the internal buffer to the disk.
If you are in an environment where it is possible for someone to abort the program without giving you a chance to close the file, you should call the fflush routine (see the next section) on a regular basis to avoid losing too much data.
13.4.4
Fflush
Entry parameters:
        es:di-  Points at a StdLib file variable.
Exit parameters:
        If the carry is set, ax contains the returned DOS error code (see DOS open function).
This routine immediately writes any data in the internal file buffer to disk. Note that you should only use this routine in conjunction with files opened for writing (or opened by fcreate). If you write data to a file and then need to leave the file open, but inactive, for some time period, you should perform a flush operation in case the program terminates abnormally.
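A minimal sketch of that advice follows. It assumes a file variable named LogFile (a hypothetical name) that was opened with fcreate, and a label FlushErr that handles the error case.

```asm
                lesi    LogFile         ;ES:DI -> file variable opened for writing.
                fflush                  ;Write any buffered data to disk now.
                jc      FlushErr        ;Carry set: AX holds the DOS error code.
```

The file stays open after the call, so later output simply starts filling the buffer again.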
13.4.5
Fgetc
Entry parameters:
        es:di-  Points at a StdLib file variable.
Exit parameters:
        If the carry flag is clear, al contains the character read from the file.
        If the carry is set, ax contains the returned DOS error code (see DOS open function). Ax will contain zero if you attempt to read beyond the end of file.
Fgetc reads a single character from the file and returns this character in the al register. Unlike calls to DOS, single character I/O using fgetc is relatively fast since the StdLib routines use blocked I/O. Of course, multiple calls to fgetc will never be faster than a call to fread (see the next section), but the performance is not too bad.
Fgetc is very flexible. As you will see in a little bit, you may redirect the StdLib input routines to read their data from a file using fgetc. This lets you use the higher level routines like gets and getsm when reading data from a file.
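The exit conditions above suggest a simple read loop. The sketch below counts the line feeds in a file; InVar is assumed to be a file variable already opened for reading with fopen, and Lines, RdLp, RdDone, AtEOF, and ReadErr are hypothetical names introduced for this example. The lf symbol is the same line feed equate the other listings in this chapter use.

```asm
; Assumed declarations in the data segment:
;
;   InVar       FileVar {}              ;Already opened with fopen.
;   Lines       word    0

RdLp:           lesi    InVar           ;ES:DI -> file variable.
                fgetc
                jc      RdDone          ;Carry set: EOF or an error.
                cmp     al, lf          ;AL holds the character just read.
                jne     RdLp
                inc     Lines           ;Count one more line.
                jmp     RdLp

RdDone:         cmp     ax, 0           ;AX = 0 means end of file.
                je      AtEOF
                jmp     ReadErr         ;Otherwise AX is a DOS error code.
```

Keeping the count in memory rather than in a register avoids any question about which registers the library call preserves.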
13.4.6
Fread
Entry parameters:
        es:di-  Points at a StdLib file variable.
        dx:si-  Points at an input data buffer.
        cx-     Contains a byte count.
Exit parameters:
        If the carry flag is clear, ax contains the actual number of bytes read from the file.
        If the carry is set, ax contains the returned DOS error code (see DOS open function).
Fread is very similar to the DOS read command. It lets you read a block of bytes, rather than just one byte, from a file. Note that if all you are doing is reading a block of bytes from a file, the DOS call is slightly more efficient than fread. However, if you have a mixture of single byte reads and multi-byte reads, the combination of fread and fgetc works very well.
As with the DOS read operation, if the byte count returned in ax does not match the value passed in the cx register, then you’ve read the remaining bytes in the file. When this occurs, the next call to fread or fgetc will return an EOF error (carry will be set and ax will contain zero). Note that fread does not return EOF unless there were zero bytes read from the file.
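The short-count rule above can be sketched as follows. The names InVar, Block, BytesRead, LastBlock, and ReadFailed are assumptions made for this illustration, and InVar is assumed to be open for reading.

```asm
; Assumed declarations in the data segment:
;
;   InVar       FileVar {}              ;Already opened with fopen.
;   Block       byte    256 dup (?)
;   BytesRead   word    ?

                lesi    InVar           ;ES:DI -> file variable.
                ldxi    Block           ;DX:SI -> input data buffer.
                mov     cx, 256         ;Ask for 256 bytes.
                fread
                jc      ReadFailed      ;AX = 0: EOF, otherwise a DOS error code.

                mov     BytesRead, ax   ;Number of bytes actually read.
                cmp     ax, 256
                jb      LastBlock       ;Short count: no more data follows.
```

A program would process the BytesRead bytes in Block in either case; the short count only tells it not to expect another block.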
13.4.7
Fputc
Entry parameters:
        es:di-  Points at a StdLib file variable.
        al-     Contains the character to write to the file.
Exit parameters:
        If the carry is set, ax contains the returned DOS error code (see DOS open function).
Fputc writes a single character (in al) to the file specified by the file variable whose address is in es:di. This call simply adds the character in al to an internal buffer (part of the file variable) until the buffer is full. Whenever the buffer is filled or you call fflush (or close the file with fclose), the file I/O routines write the data to disk.
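A minimal sketch of the sequence create, write, close might look like the following. OutName, OutVar, and the label WrErr are hypothetical; the fragment reloads es:di before each call so that it does not depend on the routines preserving those registers.

```asm
; Assumed declarations in the data segment:
;
;   OutName     byte    "output.txt",0  ;Hypothetical filename.
;   OutVar      FileVar {}

                ldxi    OutName         ;DX:SI -> zero terminated filename.
                lesi    OutVar          ;ES:DI -> file variable.
                fcreate
                jc      WrErr

                mov     al, 'O'         ;Each character goes into the
                lesi    OutVar          ; file variable's internal buffer.
                fputc
                jc      WrErr

                mov     al, 'K'
                lesi    OutVar
                fputc
                jc      WrErr

                lesi    OutVar          ;Fclose flushes the buffer to disk.
                fclose
                jc      WrErr
```

Nothing reaches the disk until the buffer fills or the program calls fflush or fclose, which is why the final fclose matters.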
13.4.8
Fwrite
Entry parameters:
        es:di-  Points at a StdLib file variable.
        dx:si-  Points at an output data buffer.
        cx-     Contains a byte count.
Exit parameters:
        If the carry flag is clear, ax contains the actual number of bytes written to the file.
        If the carry is set, ax contains the returned DOS error code (see DOS open function).
Like fread, fwrite works on blocks of bytes. It lets you write a block of bytes to a file opened for writing with fopen or fcreate.
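As an illustration only, the fragment below writes a block of bytes and checks both the carry flag and the returned count. OutVar, Block, Count, and WriteFailed are assumed names; OutVar is taken to be a file variable opened for writing, and Count holds the number of valid bytes in Block.

```asm
; Assumed declarations in the data segment:
;
;   OutVar      FileVar {}              ;Opened with fcreate (or fopen, mode 1).
;   Block       byte    256 dup (?)
;   Count       word    ?               ;Number of bytes to write.

                lesi    OutVar          ;ES:DI -> file variable.
                ldxi    Block           ;DX:SI -> output data buffer.
                mov     cx, Count       ;Byte count.
                fwrite
                jc      WriteFailed     ;Carry set: AX holds the DOS error code.
                cmp     ax, Count       ;Were all the bytes written?
                jne     WriteFailed
```

Treating a short write as a failure is a choice made for this sketch; the source only states that ax returns the actual number of bytes written.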
13.4.9
Redirecting I/O Through the StdLib File I/O Routines
The Standard Library provides very few file I/O routines. Fputc and fwrite are the only two output routines, for example. The “C” programming language standard library (on which the UCR Standard Library is based) provides many routines like fprintf, fputs, fscanf, etc. None of these are necessary in the UCR Standard Library because the UCR library provides an I/O redirection mechanism that lets you reuse all existing I/O routines to perform file I/O.
The UCR Standard Library putc routine consists of a single jmp instruction. This instruction transfers control to some actual output routine via an indirect address internal to the putc code. Normally, this pointer variable points at a piece of code which writes the character in the al register to the DOS standard output device. However, the Standard Library also provides four routines which let you manipulate this indirect pointer. By changing this pointer you can redirect the output from its current routine to a routine of your choosing. All Standard Library output routines (e.g., printf, puti, puth, puts) call putc to output individual characters. Therefore, redirecting the putc routine affects all the output routines.
Likewise, the getc routine is nothing more than an indirect jmp whose pointer variable normally points at a piece of code which reads data from the DOS standard input. Since all Standard Library input routines call the getc function to read each character you can redirect file input in a manner identical to file output.
The Standard Library GetOutAdrs, SetOutAdrs, PushOutAdrs, and PopOutAdrs are the four main routines which manipulate the output redirection pointer. GetOutAdrs returns the address of the current output routine in the es:di registers. Conversely, SetOutAdrs expects you to pass the address of a new output routine in the es:di registers and it stores this address into the output pointer. PushOutAdrs and PopOutAdrs push and pop the pointer on an internal stack. These do not use the 80x86’s hardware stack. You are limited to a small number of pushes and pops. Generally, you shouldn’t count on being able to push more than four of these addresses onto the internal stack without overflowing it.
GetInAdrs, SetInAdrs, PushInAdrs, and PopInAdrs are the complementary routines for the input vector. They let you manipulate the input routine pointer.
Note that the stack for PushInAdrs/PopInAdrs is not the same as the stack for PushOutAdrs/PopOutAdrs. Pushes and pops to these two stacks are independent of one another.
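Since GetOutAdrs returns the current routine's address in es:di and SetOutAdrs takes a new address in the same registers, a program can also keep the old pointer in a variable of its own instead of on the internal stack. The following is a sketch under that reading of the two routines; SavedHook is a hypothetical variable.

```asm
; Assumed declaration in the data segment:
;
;   SavedHook   dword   ?

                GetOutAdrs                      ;ES:DI = current output routine.
                mov     word ptr SavedHook, di
                mov     word ptr SavedHook+2, es

; ... install another routine with SetOutAdrs and do some output ...

                les     di, SavedHook           ;ES:DI = saved routine address.
                SetOutAdrs                      ;Put the original routine back.
```

This uses no entries on the internal hook stack, which may matter if several levels of redirection are active at once.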
Normally, the output pointer (which we will henceforth refer to as the output hook) points at the Standard Library routine PutcStdOut (see footnote 9). Therefore, you can return the output hook to its normal initialization state at any time by executing the statements (see footnote 10):

                mov     di, seg SL_PutcStdOut
                mov     es, di
                mov     di, offset SL_PutcStdOut
                SetOutAdrs

The PutcStdOut routine writes the character in the al register to the DOS standard output, which itself might be redirected to some file or device (using the “>” DOS redirection operator). If you want to make sure your output is going to the video display, you can always call the PutcBIOS routine which calls the BIOS directly to output a character (see footnote 11). You can force all Standard Library output to the display (bypassing DOS redirection, much as the standard error device does) using a code sequence like:

                mov     di, seg SL_PutcBIOS
                mov     es, di
                mov     di, offset SL_PutcBIOS
                SetOutAdrs

Generally, you would not simply blast the output hook by storing a pointer to your routine over the top of whatever pointer was there and then restoring the hook to PutcStdOut upon completion. Who knows if the hook was pointing at PutcStdOut in the first place? The best solution is to use the Standard Library PushOutAdrs and PopOutAdrs routines to preserve and restore the previous hook. The following code demonstrates a gentler way of modifying the output hook:
9. Actually, the routine is SL_PutcStdOut. The Standard Library macro by which you would normally call this routine is PutcStdOut.
10. If you do not have any calls to PutcStdOut in your program, you will also need to add the statement “externdef SL_PutcStdOut:far” to your program.
11. It is possible to redirect even the BIOS output, but this is rarely done and not easy to do from DOS.

                PushOutAdrs                     ;Save current output routine.
                mov     di, seg Output_Routine
                mov     es, di
                mov     di, offset Output_Routine
                SetOutAdrs

; (All output performed here goes through Output_Routine.)

                PopOutAdrs                      ;Restore previous output routine.

Handle input in a similar fashion using the corresponding input hook access routines and the SL_GetcStdIn and SL_GetcBIOS routines. Always keep in mind that there are a limited number of entries on the input and output hook stacks, so watch how many items you push onto these stacks without popping anything off.
To redirect output to a file (or redirect input from a file) you must first write a short routine which writes (reads) a single character to (from) a file. This is very easy. The code for a subroutine to output data to a file described by file variable OutputFile is

ToOutput        proc    far
                push    es
                push    di

; Load ES:DI with the address of the OutputFile variable. This
; code assumes OutputFile is of type FileVar, not a pointer to
; a variable of type FileVar.

                mov     di, seg OutputFile
                mov     es, di
                mov     di, offset OutputFile

; Output the character in AL to the file described by "OutputFile"

                fputc

                pop     di
                pop     es
                ret
ToOutput        endp

Now with only one additional piece of code, you can begin writing data to an output file using all the Standard Library output routines. That piece of code is a short routine which redirects the output hook to the “ToOutput” routine above:

SetOutFile      proc
                push    es
                push    di

                PushOutAdrs                     ;Save current output hook.
                mov     di, seg ToOutput
                mov     es, di
                mov     di, offset ToOutput
                SetOutAdrs

                pop     di
                pop     es
                ret
SetOutFile      endp

There is no need for a separate routine to restore the output hook to its previous value; PopOutAdrs will handle that task by itself.
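Putting the two routines to work might look like the sketch below. It assumes the OutputFile variable has already been opened with fcreate and that SetOutFile is a near procedure in the same code segment; LineNum is a hypothetical word variable used only for this illustration.

```asm
; Assumed declaration in the data segment:
;
;   LineNum     word    1

                call    SetOutFile      ;Output hook now points at ToOutput.

                printf                  ;This text goes to the file,
                byte    "Line %d\n",0   ; not to the screen.
                dword   LineNum

                PopOutAdrs              ;Output goes wherever it went before.
```

Every call to SetOutFile should be paired with a PopOutAdrs so that the small internal hook stack does not overflow.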
13.4.10 A File I/O Example
The following piece of code puts everything together from the last several sections. This is a short program which adds line numbers to a text file. This program expects two command line parameters: an input file and an output file. It copies the input file to the output file while prefixing each line in the output file with a line number. This code demonstrates the use of argc, argv, the Standard Library file I/O routines, and I/O redirection.
; This program copies the input file to the output file and adds
; line numbers while it is copying the file.

                include         stdlib.a
                includelib      stdlib.lib

dseg            segment para public 'data'

ArgCnt          word    0
LineNumber      word    0
DOSErrorCode    word    0
InFile          dword   ?               ;Ptr to Input file name.
OutFile         dword   ?               ;Ptr to Output file name
InputLine       byte    1024 dup (0)    ;Input/Output data buffer.
OutputFile      FileVar {}
InputFile       FileVar {}

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; ReadLn- Reads a line of text from the input file and stores the
;         data into the InputLine buffer:

ReadLn          proc
                push    ds
                push    es
                push    di
                push    si
                push    ax

                mov     si, dseg
                mov     ds, si
                mov     si, offset InputLine
                lesi    InputFile

GetLnLp:        fgetc
                jc      RdLnDone        ;If some bizarre error.
                cmp     ah, 0           ;Check for EOF.
                je      RdLnDone        ;Note:carry is set.
                mov     ds:[si], al
                inc     si
                cmp     al, lf          ;At EOLN?
                jne     GetLnLp
                dec     si              ;Back up before LF.
                cmp     byte ptr ds:[si-1], cr  ;CR before LF?
                jne     RdLnDone
                dec     si              ;If so, skip it too.

RdLnDone:       mov     byte ptr ds:[si], 0     ;Zero terminate.
                pop     ax
                pop     si
                pop     di
                pop     es
                pop     ds
                ret
ReadLn          endp

; MyOutput- Writes the single character in AL to the output file.

MyOutput        proc    far
                push    es
                push    di
                lesi    OutputFile
                fputc
                pop     di
                pop     es
                ret
MyOutput        endp

; The main program which does all the work:

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax

; Must call the memory manager initialization routine if you use
; any routine which calls malloc! ARGV is a good example of a
; routine which calls malloc.

                meminit

; We expect this program to be called as follows:
;               fileio file1, file2
; anything else is an error.

                argc
                cmp     cx, 2           ;Must have two parameters.
                je      Got2Parms
BadParms:       print
                byte    "Usage: FILEIO infile, outfile",cr,lf,0
                jmp     Quit

; Okay, we've got two parameters, hopefully they're valid names.
; Get copies of the filenames and store away the pointers to them.

Got2Parms:      mov     ax, 1           ;Get the input filename
                argv
                mov     word ptr InFile, di
                mov     word ptr InFile+2, es

                mov     ax, 2           ;Get the output filename
                argv
                mov     word ptr OutFile, di
                mov     word ptr OutFile+2, es

; Output the filenames to the standard output device

                printf
                byte    "Input file: %^s\n"
                byte    "Output file: %^s\n",0
                dword   InFile, OutFile

; Open the input file:

                lesi    InputFile
                mov     dx, word ptr InFile+2
                mov     si, word ptr InFile
                mov     ax, 0
                fopen
                jnc     GoodOpen
                mov     DOSErrorCode, ax
                printf
                byte    "Could not open input file, DOS: %d\n",0
                dword   DOSErrorCode
                jmp     Quit

; Create a new file for output:

GoodOpen:       lesi    OutputFile
                mov     dx, word ptr OutFile+2
                mov     si, word ptr OutFile
                fcreate
                jnc     GoodCreate
                mov     DOSErrorCode, ax
                printf
                byte    "Could not open output file, DOS: %d\n",0
                dword   DOSErrorCode
                jmp     Quit

; Okay, save the output hook and redirect the output.

GoodCreate:     PushOutAdrs
                lesi    MyOutput
                SetOutAdrs

WhlNotEOF:      inc     LineNumber

; Okay, read the input line from the input file:

                call    ReadLn
                jc      BadInput

; Okay, the output is redirected to our output file; write the last
; line read prefixed with a line number:

                printf
                byte    "%4d: %s\n",0
                dword   LineNumber, InputLine
                jmp     WhlNotEOF

BadInput:       push    ax              ;Save error code.
                PopOutAdrs              ;Restore output hook.
                pop     ax              ;Retrieve error code.
                test    ax, ax          ;EOF error? (AX = 0)
                jz      CloseFiles
                mov     DOSErrorCode, ax
                printf
                byte    "Input error, DOS: %d\n",0
                dword   DOSErrorCode

; Okay, close the files and quit:

CloseFiles:     lesi    OutputFile
                fclose
                lesi    InputFile
                fclose

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main

13.5
Sample Program
If you want to use the Standard Library’s output routines (putc, print, printf, etc.) to output data to a file, you can do so by manually redirecting the output before and after each call to these routines. Unfortunately, this can be a lot of work if you mix interactive I/O with file I/O. The following program presents several macros that simplify this task for you.
; FileMacs.asm
;
; This program presents a set of macros that make file I/O with the
; Standard Library even easier to do.
;
; The main program writes a multiplication table to the file "MyFile.txt".

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'

CurOutput       dword   ?

Filename        byte    "MyFile.txt",0

i               word    ?
j               word    ?

TheFile         filevar {}

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; For-Next macros from Chapter Eight.
; See Chapter Eight for details on how this works.

ForLp           macro   LCV, Start, Stop
                local   ForLoop

                ifndef  $$For&LCV&
$$For&LCV&      =       0
                else
$$For&LCV&      =       $$For&LCV& + 1
                endif

                mov     ax, Start
                mov     LCV, ax

ForLoop         textequ @catstr($$For&LCV&, %$$For&LCV&)
&ForLoop&:
                mov     ax, LCV
                cmp     ax, Stop
                jg      @catstr($$Next&LCV&, %$$For&LCV&)
                endm

Next            macro   LCV
                local   NextLbl
                inc     LCV
                jmp     @catstr($$For&LCV&, %$$For&LCV&)
NextLbl         textequ @catstr($$Next&LCV&, %$$For&LCV&)
&NextLbl&:
                endm

; File I/O macros:
;
; SetPtr sets up the CurOutput pointer variable. This macro is called
; by the other macros, it's not something you would normally call directly.
; Its whole purpose in life is to shorten the other macros and save a little
; typing.

SetPtr          macro   fvar
                push    es
                push    di

                mov     di, offset fvar
                mov     word ptr CurOutput, di
                mov     di, seg fvar
                mov     word ptr CurOutput+2, di

                PushOutAdrs
                lesi    FileOutput
                SetOutAdrs
                pop     di
                pop     es
                endm

; fprint-       Prints a string to the file.
;
; Usage:
;               fprint  filevar,"String or bytes to print"
;
; Note: you can supply optional byte or string data after the string above by
;       enclosing the data in angle brackets, e.g.,
;
;               fprint  filevar,<"string to print",cr,lf>
;
; Do *NOT* put a zero terminating byte at the end of the string, the fprint
; macro will do that for you automatically.

fprint          macro   fvar:req, string:req
                SetPtr  fvar

                print
                byte    string
                byte    0

                PopOutAdrs
                endm

; fprintf-      Prints a formatted string to the file.
; fprintff-     Like fprintf, but handles floats as well as other items.
;
; Usage:
;               fprintf  filevar,"format string", optional data values
;               fprintff filevar,"format string", optional data values
; Examples:
;
;               fprintf  FileVariable,"i=%d, j=%d\n", i, j
;               fprintff FileVariable,"f=%8.2f, i=%d\n", f, i
;
; Note: if you want to specify a list of strings and bytes for the format
;       string, just surround the items with an angle bracket, e.g.,
;
;               fprintf FileVariable, <"i=%d, j=%d",cr,lf>, i, j

fprintf         macro   fvar:req, FmtStr:req, Operands:vararg
                setptr  fvar

                printf
                byte    FmtStr
                byte    0

                for     ThisVal, <Operands>
                dword   ThisVal
                endm

                PopOutAdrs
                endm

fprintff        macro   fvar:req, FmtStr:req, Operands:vararg
                setptr  fvar

                printff
                byte    FmtStr
                byte    0

                for     ThisVal, <Operands>
                dword   ThisVal
                endm

                PopOutAdrs
                endm

; F-    This is a generic macro that converts stand-alone (no code stream parms)
;       stdlib functions into file output routines. Use it with putc, puts,
;       puti, putu, putl, putisize, putusize, putlsize, putcr, etc.
;
; Usage:
;
;               F       StdLibFunction, FileVariable
;
; Examples:
;
;               mov     al, 'A'
;               F       putc, TheFile
;               mov     ax, I
;               mov     cx, 4
;               F       putisize, TheFile

F               macro   func:req, fvar:req
                setptr  fvar
                func
                PopOutAdrs
                endm

; WriteLn- Quick macro to handle the putcr operation (since this code calls
; putcr so often).

WriteLn         macro   fvar:req
                F       putcr, fvar
                endm

; FileOutput- Writes the single character in AL to an output file.
; The macros above redirect the standard output to this routine
; to print data to a file.

FileOutput      proc    far
                push    es
                push    di
                push    ds
                mov     di, dseg
                mov     ds, di

                les     di, CurOutput
                fputc

                pop     ds
                pop     di
                pop     es
                ret
FileOutput      endp

; A simple main program that tests the code above.
; This program writes a multiplication table to the file "MyFile.txt"

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

; Rewrite(TheFile, FileName);

                ldxi    FileName
                lesi    TheFile
                fcreate

; writeln(TheFile);
; write(TheFile,'    ');
; for i := 0 to 5 do write(TheFile,'|',i:4,' ');
; writeln(TheFile);

                WriteLn TheFile
                fprint  TheFile,"    "

                forlp   i,0,5
                fprintf TheFile, "|%4d ", i
                next    i
                WriteLn TheFile

; for j := -5 to 5 do begin
;
;       write(TheFile,'----');
;       for i := 0 to 5 do write(TheFile, '+-----');
;       writeln(TheFile,'+');
;
;       write(TheFile, j:3, ' |');
;       for i := 0 to 5 do write(TheFile, i*j:4, ' |');
;       writeln(TheFile);
;
; end;

                forlp   j,-5,5

                fprint  TheFile,"----"
                forlp   i,0,5
                fprintf TheFile,"+-----"
                next    i
                fprint  TheFile,<"+",cr,lf>

                fprintf TheFile, "%3d |", j

                forlp   i,0,5

                mov     ax, i
                imul    j
                mov     cx, 4
                F       putisize, TheFile
                fprint  TheFile, " |"

                next    i
                Writeln TheFile

                next    j
                WriteLn TheFile

; Close(TheFile);

                lesi    TheFile
                fclose

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
13.6
Laboratory Exercises
The following three programs all do the same thing: they copy the file “ex13_1.in” to the file “ex13_1.out”. The difference is the way they copy the files. The first program, ex13_1a, copies the data from the input file to the output file using character at a time I/O under DOS. The second program, ex13_1b, uses blocked I/O under DOS. The third program, ex13_1c, uses the Standard Library’s file I/O routines to copy the data. Run these three programs and measure the amount of time they take to run (see footnote 12).
For your lab report: report the running times and comment on the relative efficiencies of these data transfer methods. Is the loss of performance of the Standard Library routines (compared to block I/O) justified in terms of the ease of use of these routines? Explain.
12. If you have a really fast machine you may want to enlarge the ex13_1.in file (by copying and pasting data in the file).

; EX13_1a.asm
;
; This program copies one file to another using character at a time I/O.
; It is easy to write, read, and understand, but character at a time I/O
; is quite slow. Run this program and time its execution. Then run the
; corresponding blocked I/O exercise and compare the execution times of
; the two programs.

                include         stdlib.a
                includelib      stdlib.lib

dseg            segment para public 'data'

FHndl           word    ?
FHndl2          word    ?
Buffer          byte    ?

FName           equ     this word
FNamePtr        dword   FileName

Filename        byte    "Ex13_1.in",0
Filename2       byte    "Ex13_1.out",0

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                mov     ah, 3dh         ;Open the input file
                mov     al, 0           ; for reading
                lea     dx, Filename    ;DS points at filename's
                int     21h             ; segment
                jc      BadOpen
                mov     FHndl, ax       ;Save file handle

                mov     FName, offset Filename2 ;Set this up in case there
                mov     FName+2, seg FileName2  ; is an error during open.

                mov     ah, 3ch         ;Open the output file for writing
                mov     cx, 0           ; with normal file attributes
                lea     dx, Filename2   ;Presume DS points at filename
                int     21h             ; segment
                jc      BadOpen
                mov     FHndl2, ax      ;Save file handle

LP:             mov     ah,3fh          ;Read data from the file
                lea     dx, Buffer      ;Address of data buffer
                mov     cx, 1           ;Read one byte
                mov     bx, FHndl       ;Get file handle value
                int     21h
                jc      ReadError
                cmp     ax, cx          ;EOF reached?
                jne     EOF

                mov     ah,40h          ;Write data to the file
                lea     dx, Buffer      ;Address of data buffer
                mov     cx, 1           ;Write one byte
                mov     bx, FHndl2      ;Get file handle value
                int     21h
                jc      WriteError
                jmp     LP              ;Read next byte

EOF:            mov     bx, FHndl
                mov     ah, 3eh         ;Close file
                int     21h
                jmp     Quit

ReadError:      printf
                byte    "Error while reading data from file '%s'.",cr,lf,0
                dword   FileName
                jmp     Quit

WriteError:     printf
                byte    "Error while writing data to file '%s'.",cr,lf,0
                dword   FileName2
                jmp     Quit

BadOpen:        printf
                byte    "Could not open '%^s'. Make sure this file is "
                byte    "in the ",cr,lf
                byte    "current directory before attempting to run "
                byte    "this program again.", cr,lf,0
                dword   FName

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
---------------------------------------------------
; EX13_1b.asm
;
; This program copies one file to another using blocked I/O.
; Run this program and time its execution. Compare the execution time of
; this program against that of the character at a time I/O and the
; Standard Library File I/O example (ex13_1a and ex13_1c).

                include         stdlib.a
                includelib      stdlib.lib

dseg            segment para public 'data'

; File handles for the files we will open.

FHndl           word    ?               ;Input file handle
FHndl2          word    ?               ;Output file handle

Buffer          byte    256 dup (?)     ;File buffer area

FName           equ     this word       ;Ptr to current file name
FNamePtr        dword   FileName

Filename        byte    "Ex13_1.in",0   ;Input file name
Filename2       byte    "Ex13_1.out",0  ;Output file name

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                mov     ah, 3dh         ;Open the input file
                mov     al, 0           ; for reading
                lea     dx, Filename    ;Presume DS points at
                int     21h             ; filename's segment
                jc      BadOpen
                mov     FHndl, ax       ;Save file handle

                mov     FName, offset Filename2 ;Set this up in case there
                mov     FName+2, seg FileName2  ; is an error during open.

                mov     ah, 3ch         ;Open the output file for writing
                mov     cx, 0           ; with normal file attributes
                lea     dx, Filename2   ;Presume DS points at filename
                int     21h             ; segment
                jc      BadOpen
                mov     FHndl2, ax      ;Save file handle

; The following loop reads 256 bytes at a time from the file and then
; writes those 256 bytes to the output file.

LP:             mov     ah,3fh          ;Read data from the file
                lea     dx, Buffer      ;Address of data buffer
                mov     cx, 256         ;Read 256 bytes
                mov     bx, FHndl       ;Get file handle value
                int     21h
                jc      ReadError
                cmp     ax, cx          ;EOF reached?
                jne     EOF

                mov     ah, 40h         ;Write data to file
                lea     dx, Buffer      ;Address of output buffer
                mov     cx, 256         ;Write 256 bytes
                mov     bx, FHndl2      ;Output handle
                int     21h
                jc      WriteError
                jmp     LP              ;Read next block

; Note, just because the number of bytes read does not equal 256,
; don't get the idea we're through, there could be up to 255 bytes
; in the buffer still waiting to be processed.

EOF:            mov     cx, ax          ;Put # of bytes to write in CX.
                jcxz    EOF2            ;If CX is zero, we're really done.
                mov     ah, 40h         ;Write data to file
                lea     dx, Buffer      ;Address of output buffer
                mov     bx, FHndl2      ;Output handle
                int     21h
                jc      WriteError

EOF2:           mov     bx, FHndl
                mov     ah, 3eh         ;Close file
                int     21h
                jmp     Quit

ReadError:      printf
                byte    "Error while reading data from file '%s'.",cr,lf,0
                dword   FileName
                jmp     Quit

WriteError:     printf
                byte    "Error while writing data to file '%s'.",cr,lf,0
                dword   FileName2
                jmp     Quit

BadOpen:        printf
                byte    "Could not open '%^s'. Make sure this file is in "
                byte    "the ",cr,lf
                byte    "current directory before attempting to run "
                byte    "this program again.", cr,lf,0
                dword   FName

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
---------------------------------------------------

; EX13_1c.asm
;
; This program copies one file to another using the standard library
; file I/O routines. The Standard Library file I/O routines let you do
; character at a time I/O, but they block up the data to transfer to improve
; system performance. You should find that the execution time of this
; code is somewhere between blocked I/O (ex13_1b) and character at a time
; I/O (EX13_1a); it will, however, be much closer to the block I/O time
; (probably about twice as long as block I/O).

                include         stdlib.a
                includelib      stdlib.lib

dseg            segment para public 'data'

InFile          filevar {}
OutFile         filevar {}

Filename        byte    "Ex13_1.in",0   ;Input file name
Filename2       byte    "Ex13_1.out",0  ;Output file name

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

; Open the input file:

                mov     ax, 0           ;Open for reading
                ldxi    Filename
                lesi    InFile
                fopen
                jc      BadOpen

; Open the output file:

                mov     ax, 1           ;Open for output
                ldxi    Filename2
                lesi    OutFile
                fcreate
                jc      BadCreate

; Copy the input file to the output file:

CopyLp:         lesi    InFile
                fgetc
                jc      GetDone

                lesi    OutFile
                fputc
                jmp     CopyLp

BadOpen:        printf
                byte    "Error opening '%s'",cr,lf,0
                dword   Filename
                jmp     Quit

BadCreate:      printf
                byte    "Error creating '%s'",cr,lf,0
                dword   Filename2
                jmp     CloseIn

GetDone:        cmp     ax, 0           ;Check for EOF
                je      AtEOF

                print
                byte    "Error copying files (read error)",cr,lf,0

AtEOF:          lesi    OutFile
                fclose
CloseIn:        lesi    InFile
                fclose

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

13.7
Programming Projects
1) The sample program in Section 13.5 reroutes the standard output through the Standard Library’s file I/O routines allowing you to use any of the output routines to write data to a file. Write a similar set of routines and macros that let you read data from a file using the Standard Library’s input routines (getc, gets, getsm, scanf, etc.). Redirect the input through the Standard Library’s file input functions.
2) The last sample program in section 13.3.12 (copyuc.asm on the companion CD-ROM) copies one file to another, possibly converting lower case characters to upper case. This program currently parses the command line directly and uses blocked I/O to copy the data in the file. Rewrite this program using argv/argc to process the command line parameters and use the Standard Library file I/O routines to process each character in the file.
3) Write a “word count” program that counts the number of characters, words, and lines within a file. Assume that a word is any sequence of characters between spaces, tabs, carriage returns, line feeds, the beginning of a file, and the end of a file (if you want to save some effort, you can assume a “whitespace” symbol is any ASCII code less than or equal to a space).
4) Write a program that prints an ASCII text file to the printer. Use the BIOS int 17h services to print the characters in the file.
5) Write two programs, “xmit” and “rcv”. The xmit program should fetch a command line filename and transmit this file across the serial port. It should transmit the filename and the number of bytes in the file (hint: use the DOS seek command to determine the length of the file). The rcv program should read the filename and file length from the serial port, create the file by the specified name, read the specified number of bytes from the serial port, and then close the file.
13.8
Summary
MS-DOS and BIOS provide many system services which control the hardware on a PC. They provide a machine independent and flexible interface. Unfortunately, the PC has grown up quite a bit since the days of the original 4.77 MHz 8088 IBM PC. Many BIOS and DOS calls are now obsolete, having been superseded by newer calls. To ensure backwards compatibility, MS-DOS and BIOS generally support all of the older obsolete calls as well as the newer calls. However, your programs should not use the obsolete calls; they are there for backwards compatibility only.
The BIOS provides many services related to the control of devices such as the video display, the printer port, the keyboard, the serial port, the real time clock, etc. Descriptions of the BIOS services for these devices appear in the following sections:
• “INT 5- Print Screen” on page 702
• “INT 10h - Video Services” on page 702
• “INT 11h - Equipment Installed” on page 704
• “INT 12h - Memory Available” on page 704
• “INT 13h - Low Level Disk Services” on page 704
• “INT 14h - Serial I/O” on page 706
• “INT 15h - Miscellaneous Services” on page 708
• “INT 16h - Keyboard Services” on page 708
• “INT 17h - Printer Services” on page 710
• “INT 18h - Run BASIC” on page 712
• “INT 19h - Reboot Computer” on page 712
• “INT 1Ah - Real Time Clock” on page 712
MS-DOS provides several different types of services. This chapter concentrated on the file I/O services provided by MS-DOS. In particular, this chapter dealt with implementing efficient file I/O operations using blocked I/O. To learn how to perform file I/O and perform other MS-DOS operations, check out the following sections:
• “MS-DOS Calling Sequence” on page 714
• “MS-DOS Character Oriented Functions” on page 714
• “MS-DOS “Obsolete” Filing Calls” on page 717
• “MS-DOS Date and Time Functions” on page 718
• “MS-DOS Memory Management Functions” on page 718
• “MS-DOS Process Control Functions” on page 721
• “MS-DOS “New” Filing Calls” on page 725
• “File I/O Examples” on page 734
• “Blocked File I/O” on page 737
Accessing command line parameters is an important operation within MS-DOS applications. DOS’ PSP (Program Segment Prefix) contains the command line and several other pieces of important information. To learn about the various fields in the PSP and see how to access command line parameters, check out the following sections in this chapter:
• “The Program Segment Prefix (PSP)” on page 739
• “Accessing Command Line Parameters” on page 742
• “ARGC and ARGV” on page 750
Of course, the UCR Standard Library provides some file I/O routines as well. This chapter closes by describing some of the StdLib file I/O routines along with their advantages and disadvantages. See
• “Fopen” on page 751
• “Fcreate” on page 752
• “Fclose” on page 752
• “Fflush” on page 752
• “Fgetc” on page 752
• “Fread” on page 753
• “Fputc” on page 753
• “Fwrite” on page 753
• “Redirecting I/O Through the StdLib File I/O Routines” on page 753
• “A File I/O Example” on page 755
13.9
Questions
1)
How are BIOS routines called?
2)
Which BIOS routine is used to write a character to the: a) video display
b) serial port
c) printer port
3)
When the serial transmit or receive services return to the caller, the error status is returned in the AH register. However, there is a problem with the value returned. What is this problem?
4)
Explain how you could test the keyboard to see if a key is available.
5)
What is wrong with the keyboard shift status function?
6)
How are special key codes (those keystrokes not returning ASCII codes) returned by the read keyboard call?
7)
How would you send a character to the printer?
8)
How do you read the real time clock?
9)
Given that the RTC increments a 32-bit counter every 55ms, how long will the system run before overflow of this counter occurs?
10)
Why should you reset the clock if, when reading the clock, you’ve determined that the counter has overflowed?
11)
How do assembly language programs call MS-DOS?
12)
Where are parameters generally passed to MS-DOS?
13)
Why are there two sets of filing functions in MS-DOS?
14)
Where can the DOS command line be found?
15)
What is the purpose of the environment string area?
16)
How can you determine the amount of memory available for use by your program?
17)
Which is more efficient: character I/O or blocked I/O? Why?
18)
What is a good blocksize for blocked I/O?
19)
Why can’t you use blocked I/O on random access files?
20)
Explain how to use the seek command to move the file pointer 128 bytes backwards in the file from the current file position.
21)
Where is the error status normally returned after a call to DOS?
22)
Why is it difficult to use blocked I/O on a random access file? Which would be easier, random access on a blocked I/O file opened for input or random access on a blocked I/O file opened for reading and writing?
23)
Describe how you might implement blocked I/O on files opened for random access reading and writing.
24)
What are two ways you can obtain the address of the PSP?
25)
How do you determine that you’ve reached the end of file when using MS-DOS file I/O calls? When using UCR Standard Library file I/O calls?
Floating Point Arithmetic
Chapter 14
Although integers provide an exact representation for numeric values, they suffer from two major drawbacks: the inability to represent fractional values and a limited dynamic range. Floating point arithmetic solves these two problems at the expense of accuracy and, on some processors, speed. Most programmers are aware of the speed loss associated with floating point arithmetic; however, they are blithely unaware of the problems with accuracy. For many applications, the benefits of floating point outweigh the disadvantages. However, to properly use floating point arithmetic in any program, you must learn how floating point arithmetic operates. Intel, understanding the importance of floating point arithmetic in modern programs, provided support for floating point arithmetic in the earliest designs of the 8086: the 80x87 FPU (floating point unit or math coprocessor). However, on processors earlier than the 80486 (or on the 80486sx), the floating point processor is an optional device; if this device is not present you must simulate it in software.
This chapter contains four main sections. The first section discusses floating point arithmetic from a mathematical point of view. The second section discusses the binary floating point formats commonly used on Intel processors. The third discusses software floating point and the math routines from the UCR Standard Library. The fourth section discusses the 80x87 FPU chips.
14.0
Chapter Overview
This chapter contains four major sections: a description of floating point formats and operations (two sections), a discussion of the floating point support in the UCR Standard Library, and a discussion of the 80x87 FPU (floating point unit). The sections below that have a “•” prefix are essential. Those sections with a “❏” discuss advanced topics that you may want to put off for a while.
• The mathematics of floating point arithmetic.
• IEEE floating point formats.
• The UCR Standard Library floating point routines.
• The 80x87 floating point coprocessors.
• FPU data movement instructions.
❏ Conversions.
• Arithmetic instructions.
• Comparison instructions.
❏ Constant instructions.
❏ Transcendental instructions.
❏ Miscellaneous instructions.
❏ Integer operations.
❏ Additional trigonometric functions.
14.1
The Mathematics of Floating Point Arithmetic
A big problem with floating point arithmetic is that it does not follow the standard rules of algebra. Nevertheless, many programmers apply normal algebraic rules when using floating point arithmetic. This is a source of bugs in many programs. One of the primary goals of this section is to describe the limitations of floating point arithmetic so you will understand how to use it properly.
Normal algebraic rules apply only to infinite precision arithmetic. Consider the simple statement x:=x+1, where x is an integer. On any modern computer this statement follows the normal rules of algebra as long as overflow does not occur. That is, this statement is valid only for certain values of x (minint <= x < maxint). Most programmers do not have a problem with this because they are well aware of the fact that integers in a program do not follow the standard algebraic rules (e.g., 5/2 ≠ 2.5).
Integers do not follow the standard rules of algebra because the computer represents them with a finite number of bits. You cannot represent any of the (integer) values above the maximum integer or below the minimum integer. Floating point values suffer from this same problem, only worse. After all, the integers are a subset of the real numbers. Therefore, the floating point values must represent the same infinite set of integers. However, there are an infinite number of values between any two real values, so this problem is infinitely worse. Therefore, as well as having to limit your values to the range between a minimum and a maximum, you cannot represent all the values within that range, either.
To represent real numbers, most floating point formats employ scientific notation and use some number of bits to represent a mantissa and a smaller number of bits to represent an exponent. The end result is that floating point numbers can only represent numbers with a specific number of significant digits. This has a big impact on how floating point arithmetic operates. To easily see the impact of limited precision arithmetic, we will adopt a simplified decimal floating point format for our examples. Our floating point format will provide a mantissa with three significant digits and a decimal exponent with two digits. The mantissa and exponents are both signed values (see Figure 14.1).
Figure 14.1 Simple Floating Point Format (a signed three digit mantissa followed by “e” and a signed two digit exponent)
When adding and subtracting two numbers in scientific notation, you must adjust the two values so that their exponents are the same. For example, when adding 1.23e1 and 4.56e0, you must adjust the values so they have the same exponent. One way to do this is to convert 4.56e0 to 0.456e1 and then add. This produces 1.686e1.
Unfortunately, the result does not fit into three significant digits, so we must either round or truncate the result to three significant digits. Rounding generally produces the most accurate result, so let’s round the result to obtain 1.69e1. As you can see, the lack of precision (the number of digits or bits we maintain in a computation) affects the accuracy (the correctness of the computation).
In the previous example, we were able to round the result because we maintained four significant digits during the calculation. Had our floating point calculation been limited to three significant digits during computation, we would have had to truncate the last digit of the smaller number (0.456e1 becomes 0.45e1), obtaining 1.68e1, which is even less correct. Extra digits available during a computation are known as guard digits (or guard bits in the case of a binary format). They greatly enhance accuracy during a long chain of computations.
The accuracy loss during a single computation usually isn’t enough to worry about unless you are greatly concerned about the accuracy of your computations. However, if you compute a value which is the result of a sequence of floating point operations, the error can accumulate and greatly affect the computation itself. For example, suppose we were to add 1.23e3 and 1.00e0. Adjusting the numbers so their exponents are the same before the addition produces 1.23e3 + 0.001e3. The sum of these two values, even after rounding, is 1.23e3. This might seem perfectly reasonable to you; after all, we can only maintain three significant digits, so adding in a small value shouldn’t affect the result at all. However, suppose we were to add 1.00e0 to 1.23e3 ten times. The first time we add 1.00e0 to 1.23e3 we get 1.23e3. Likewise, we get this same result the second, third, fourth, ..., and tenth time we add 1.00e0 to 1.23e3. On the other hand, had we added 1.00e0 to itself ten times, then added the result (1.00e1) to 1.23e3, we would have gotten a different result, 1.24e3.
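The three digit examples above can be reproduced with a short program. The following is only a sketch, written in Python rather than assembly language, and it rests on one assumption: a decimal context with three digits of precision is used as a stand-in for the simplified format (the two digit exponent limit is not modelled). The names ROUND3, TRUNC3, and the variables are made up for this illustration.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_EVEN

# Assumption: a Context with prec=3 stands in for the text's three digit
# mantissa; the two digit exponent limit is not modelled here.
ROUND3 = Context(prec=3, rounding=ROUND_HALF_EVEN)
TRUNC3 = Context(prec=3, rounding=ROUND_DOWN)

big = Decimal("1.23e1")
small = Decimal("4.56e0")

# With a guard digit the exact sum 1.686e1 is available and can be rounded.
rounded = ROUND3.add(big, small)

# Without guard digits the smaller operand loses its last digit when it is
# aligned to the larger exponent (4.56e0 becomes 0.45e1) before the addition.
aligned = small.quantize(Decimal("0.1"), rounding=ROUND_DOWN)
no_guard = TRUNC3.add(big, aligned)

# Adding 1.00e0 to 1.23e3 ten times, one addition at a time.
one_at_a_time = Decimal("1.23e3")
for _ in range(10):
    one_at_a_time = ROUND3.add(one_at_a_time, Decimal("1.00e0"))

# Adding the ten small values together first, then adding their total.
total_small = Decimal("0")
for _ in range(10):
    total_small = ROUND3.add(total_small, Decimal("1.00e0"))
grouped = ROUND3.add(Decimal("1.23e3"), total_small)

print(rounded, no_guard, one_at_a_time, grouped)
```

The one-at-a-time sum never moves away from 1.23e3 because each individual 1.00e0 is rounded away, while the grouped sum reaches 1.24e3. The two loops perform the same mathematical additions; only the order differs.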
This is the most important thing to know about limited precision arithmetic:
The order of evaluation can affect the accuracy of the result.
You will get more accurate results if the relative magnitudes (that is, the exponents) are close to one another. If you are performing a chain calculation involving addition and subtraction, you should attempt to group the values appropriately.
Another problem with addition and subtraction is that you can wind up with false precision. Consider the computation 1.23e0 - 1.22e0. This produces 0.01e0. Although this is mathematically equivalent to 1.00e-2, this latter form suggests that the last two digits are exactly zero. Unfortunately, we’ve only got a single significant digit at this time. Indeed, some FPUs or floating point software packages might actually insert random digits (or bits) into the L.O. positions. This brings up a second important rule concerning limited precision arithmetic:
Whenever subtracting two numbers with the same signs or adding two numbers with different signs, the accuracy of the result may be less than the precision available in the floating point format.
Multiplication and division do not suffer from the same problems as addition and subtraction since you do not have to adjust the exponents before the operation; all you need to do is add the exponents and multiply the mantissas (or subtract the exponents and divide the mantissas). By themselves, multiplication and division do not produce particularly poor results. However, they tend to multiply any error which already exists in a value. For example, if you multiply 1.23e0 by two, when you should be multiplying 1.24e0 by two, the result is even less accurate. This brings up a third important rule when working with limited precision arithmetic:
When performing a chain of calculations involving addition, subtraction, multiplication, and division, try to perform the multiplication and division operations first.
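The false precision problem can be made concrete with a small experiment. This is a sketch in Python, again assuming that a three digit decimal context is an acceptable stand-in for the simplified format; the "true" values 1.234 and 1.225 are invented for the illustration and do not come from the text.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Assumption: prec=3 models the three digit mantissa of the simplified format.
CTX = Context(prec=3, rounding=ROUND_HALF_EVEN)

# Hypothetical exact quantities that must be stored in three digits.
true_a = Decimal("1.234")
true_b = Decimal("1.225")

a = CTX.create_decimal(true_a)      # stored operand, three digits
b = CTX.create_decimal(true_b)      # stored operand, three digits

diff = CTX.subtract(a, b)           # difference of the stored operands
true_diff = true_a - true_b         # difference of the exact quantities

# Relative error of each stored operand, and of the computed difference.
rel_error_a = abs(a - true_a) / true_a
rel_error_b = abs(b - true_b) / true_b
rel_error_diff = abs(diff - true_diff) / true_diff

print(diff, true_diff, rel_error_a, rel_error_b, rel_error_diff)
```

Each stored operand is off by well under one percent, yet the difference is off by more than ten percent: the subtraction cancels the leading digits that were correct and leaves only the digit that carried the rounding error.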
Often, by applying normal algebraic transformations, you can arrange a calculation so the multiply and divide operations occur first. For example, suppose you want to compute x*(y+z). Normally you would add y and z together and multiply their sum by x. However, you will get a little more accuracy if you transform x*(y+z) to get x*y+x*z and compute the result by performing the multiplications first.
Multiplication and division are not without their own problems. When multiplying two very large or very small numbers, it is quite possible for overflow or underflow to occur. The same situation occurs when dividing a small number by a large number or dividing a large number by a small number. This brings up a fourth rule you should attempt to follow when multiplying or dividing values:
When multiplying and dividing sets of numbers, try to arrange the multiplications so that they multiply large and small numbers together; likewise, try to divide numbers that have the same relative magnitudes.
Comparing floating point numbers is very dangerous. Given the inaccuracies present in any computation (including converting an input string to a floating point value), you should never compare two floating point values to see if they are equal. In a binary floating point format, different computations which produce the same (mathematical) result may differ in their least significant bits. For example, adding 1.31e0+1.69e0 should produce 3.00e0. Likewise, adding 1.50e0+1.50e0 should produce 3.00e0. However, were you to compare (1.31e0+1.69e0) against (1.50e0+1.50e0) you might find out that these sums are not equal to one another. The test for equality succeeds if and only if all bits (or digits) in the two operands are exactly the same. Since this is not necessarily true after two different floating point computations which should produce the same result, a straight test for equality may not work.
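Both the fourth rule and the equality problem are easy to observe on a real machine. The sketch below uses Python, whose float type is an IEEE double precision value on typical platforms (an assumption of this example); the particular constants are chosen only for illustration.

```python
import math

# Rule four: pair large values with small ones to avoid overflow.
large = 1e200
tiny = 1e-200

overflowed = (large * large) * tiny   # the first product exceeds the double range
safe = (large * tiny) * large         # the intermediate result stays near one

# Equality: two computations with the same mathematical result.
sum_a = 0.1 + 0.2
sum_b = 0.3
bitwise_equal = (sum_a == sum_b)
difference = abs(sum_a - sum_b)

print(overflowed, safe, bitwise_equal, difference)
```

The first product overflows to infinity even though the final mathematical result is representable, and the two sums differ in their least significant bits even though both should be 0.3.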
The standard way to test for equality between floating point numbers is to determine how much error (or tolerance) you will allow in a comparison and check to see if one value is within this error range of the other. The straightforward way to do this is to use a test like the following:
        if Value1 >= (Value2-error) and Value1 <= (Value2+error) then …
Another common way to handle this same comparison is to use a statement of the form:
        if abs(Value1-Value2) <= error then …
Most texts, when discussing floating point comparisons, stop immediately after discussing the problem with floating point equality, assuming that other forms of comparison are perfectly okay with floating point numbers. This isn’t true! If we are assuming that x=y if x is within y±error, then a simple bitwise comparison of x and y will claim that x<y whenever y is even slightly greater than x, including cases where y is less than x+error and the two values should be treated as equal. Therefore, every relational test must take the error range into account. The tests to use are:
        =       if abs(x-y) <= error then …
        ≠       if abs(x-y) > error then …
        <       if (x-y) < -error then …
        ≤       if (x-y) <= error then …
        >       if (x-y) > error then …
        ≥       if (x-y) >= -error then …
You must exercise care when choosing the value for error. This should be a value slightly greater than the largest amount of error which will creep into your computations. The exact value will depend upon the particular floating point format you use, but more on that a little later. The final rule we will state in this section is:
When comparing two floating point numbers, always compare one value to see if it is in the range given by the second value plus or minus some small error value.
There are many other little problems that can occur when using floating point values. This text can only point out some of the major problems and make you aware of the fact that you cannot treat floating point arithmetic like real arithmetic; the inaccuracies present in limited precision arithmetic can get you into trouble if you are not careful. A good text on numerical analysis or even scientific computing can help fill in the details which are beyond the scope of this text. If you are going to be working with floating point arithmetic, in any language, you should take the time to study the effects of limited precision arithmetic on your computations.
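The range comparisons described in this section can be collected into a handful of small functions. This is a sketch in Python, not code from the text; the function names (fp_eq, fp_lt, and so on) and the tolerance of 1e-9 are assumptions made for the example. Note the sign convention: "less than" means less by more than error, so fp_lt and fp_ge test against -error.

```python
def fp_eq(x, y, error):
    """x equals y if the two values are within error of one another."""
    return abs(x - y) <= error

def fp_ne(x, y, error):
    return abs(x - y) > error

def fp_lt(x, y, error):
    """x is less than y only if it is below y by more than error."""
    return (x - y) < -error

def fp_le(x, y, error):
    return (x - y) <= error

def fp_gt(x, y, error):
    return (x - y) > error

def fp_ge(x, y, error):
    return (x - y) >= -error

ERROR = 1e-9            # hypothetical tolerance; choose one to suit your data
x = 0.1 + 0.2           # differs from 0.3 in its least significant bits
y = 0.3

print(x == y, fp_eq(x, y, ERROR), fp_gt(x, y, ERROR), fp_lt(1.0, 2.0, ERROR))
```

With these definitions fp_lt is always the opposite of fp_ge and fp_gt is always the opposite of fp_le, so two values that compare equal within the tolerance are never also reported as less than or greater than one another.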
14.2
IEEE Floating Point Formats
When Intel planned to introduce a floating point coprocessor for their new 8086 microprocessor, they were smart enough to realize that the electrical engineers and solid-state physicists who design chips were, perhaps, not the best people to do the necessary numerical analysis to pick the best possible binary representation for a floating point format. So Intel went out and hired the best numerical analyst they could find to design a floating point format for their 8087 FPU. That person then hired two other experts in the field and the three of them (Kahan, Coonen, and Stone) designed Intel’s floating point format. They did such a good job designing the KCS Floating Point Standard that the IEEE organization adopted this format for the IEEE floating point format (see footnote 1).
1. There were some minor changes to the way certain degenerate operations were handled, but the bit representation remained essentially unchanged.
To handle a wide range of performance and accuracy requirements, Intel actually introduced three floating point formats: single precision, double precision, and extended precision. The single and double precision formats corresponded to C’s float and double types or FORTRAN’s real and double precision types. Intel intended to use extended precision for long chains of computations. Extended precision contains 16 extra bits that the calculations could use for guard bits before rounding down to a double precision value when storing the result.
Figure 14.2 32 Bit Single Precision Floating Point Format (bit 31: sign bit; bits 23 to 30: exponent bits; bits 0 to 22: mantissa bits. The 24th mantissa bit is implied and is always one.)
The single precision format uses a 24 bit mantissa in sign-magnitude form (which this text calls one’s complement) and an eight bit excess-127 exponent. The mantissa usually represents a value from 1.0 to just under 2.0. The H.O. bit of the mantissa is always assumed to be one and represents a value just to the left of the binary point (see footnote 2). The remaining 23 mantissa bits appear to the right of the binary point. Therefore, the mantissa represents the value:
        1.mmmmmmm mmmmmmmm mmmmmmmm
The “mmmm…” characters represent the 23 bits of the mantissa. Keep in mind that we are working with binary numbers here. Therefore, each position to the right of the binary point represents a value (zero or one) times a successive negative power of two. The implied one bit is always multiplied by 2^0, which is one. This is why the mantissa is always greater than or equal to one. Even if the other mantissa bits are all zero, the implied one bit always gives us the value one (see footnote 3). Of course, even if we had an almost infinite number of one bits after the binary point, they still would not add up to two. This is why the mantissa can represent values in the range one to just under two.
Although there are an infinite number of values between one and two, we can only represent about eight million of them because we use a 23 bit mantissa (the 24th bit is always one). This is the reason for inaccuracy in floating point arithmetic: we are limited to 23 stored mantissa bits in computations involving single precision floating point values.
The mantissa uses a sign-magnitude format (referred to in this text as one’s complement) rather than two’s complement. This means that the 24 bit value of the mantissa is simply an unsigned binary number and the sign bit determines whether that value is positive or negative. Such numbers have the unusual property that there are two representations for zero (with the sign bit set or clear). Generally, this is important only to the person designing the floating point software or hardware system. We will assume that the value zero always has the sign bit clear.
To represent values outside the range 1.0 to just under 2.0, the exponent portion of the floating point format comes into play. The floating point format raises two to the power specified by the exponent and then multiplies the mantissa by this value. The exponent is eight bits and is stored in an excess-127 format. In excess-127 format, the exponent 2^0 is represented by the value 127 (7fh).
Therefore, to convert an exponent to excess-127 format simply add 127 to the exponent value. The use of excess-127 format makes it easier to compare floating point values. The single precision floating point format takes the form shown in Figure 14.2.
With a 24 bit mantissa, you will get approximately 6-1/2 digits of precision (one half digit of precision means that the first six digits can all be in the range 0..9 but the seventh digit can only be in the range 0..x where x<9 and is generally close to five). With an eight bit excess-127 exponent, the dynamic range of single precision floating point numbers is approximately 2^±128 or about 10^±38.
2. The binary point is the same thing as the decimal point except it appears in binary numbers rather than decimal numbers.
3. Actually, this isn’t necessarily true. The IEEE floating point format supports denormalized values where the H.O. bit is not one. However, we will ignore denormalized values in our discussion.
Although single precision floating point numbers are perfectly suitable for many applications, the dynamic range is somewhat small for many scientific applications and the very limited precision is unsuitable for many financial, scientific, and other applications. Furthermore, in long chains of computations, the limited precision of the single precision format may introduce serious error.
The double precision format helps overcome the problems of single precision floating point. Using twice the space, the double precision format has an 11-bit excess-1023 exponent and a 53 bit mantissa (with an implied H.O. bit of one) plus a sign bit. This provides a dynamic range of about 10^±308 and about 15 digits of precision, sufficient for most applications. Double precision floating point values take the form shown in Figure 14.3.
Figure 14.3 64 Bit Double Precision Floating Point Format (bit 63: sign bit; bits 52 to 62: exponent bits; bits 0 to 51: mantissa bits. The 53rd mantissa bit is implied and is always one.)
In order to help ensure accuracy during long chains of computations involving double precision floating point numbers, Intel designed the extended precision format. The extended precision format uses 80 bits. Twelve of the additional 16 bits are appended to the mantissa; four of the additional bits are appended to the end of the exponent. Unlike the single and double precision values, the extended precision format does not have an implied H.O. bit which is always one. Therefore, the extended precision format provides a 64 bit mantissa, a 15 bit excess-16383 exponent, and a one bit sign. The format for the extended precision floating point value is shown in Figure 14.4.
Figure 14.4 80 Bit Extended Precision Floating Point Format (bit 79: sign bit; bits 64 to 78: exponent bits; bits 0 to 63: mantissa bits)
On the 80x87 FPUs and the 80486 CPU, all computations are done using the extended precision form. Whenever you load a single or double precision value, the FPU automatically converts it to an extended precision value.
Likewise, when you store a single or double precision value to memory, the FPU automatically rounds the value down to the appropriate size before storing it. By always working with the extended precision format, Intel guarantees a large number of guard bits are present to ensure the accuracy of your computations. Some texts erroneously claim that you should never use the extended precision format in your own programs, because Intel only guarantees accurate computations when using the single or double precision formats. This is foolish. By performing all computations using 80 bits, Intel helps ensure (but does not guarantee) that you will get full 32 or 64 bit accuracy in your computations. Since the 80x87 FPUs and 80486 CPU do not provide a large number of guard bits in 80 bit computations, some error will inevitably creep into the L.O. bits of an extended precision computation. However, if your computation is correct to 64 bits, the 80 bit computation will always provide at least 64 accurate bits. Most of the time you will get even more. While you cannot assume that you get an accurate 80 bit computation, you can usually do better than 64 when using the extended precision format.
To maintain maximum precision during computation, most computations use normalized values. A normalized floating point value is one that has a H.O. mantissa bit equal to one. Almost any non-normalized value can be normalized by shifting the mantissa bits to the left and decrementing the exponent by one until a one appears in the H.O. bit of the mantissa. Remember, the exponent is a binary exponent. Each time you increment the exponent, you multiply the floating point value by two. Likewise, whenever you decrement the exponent, you divide the floating point value by two. By the same token, shifting the mantissa to the left one bit position multiplies the floating point value by two; likewise, shifting the mantissa to the right divides the floating point value by two. Therefore, shifting the mantissa to the left one position and decrementing the exponent does not change the value of the floating point number at all.
Keeping floating point numbers normalized is beneficial because it maintains the maximum number of bits of precision for a computation. If the H.O. bits of the mantissa are all zero, the mantissa has that many fewer bits of precision available for computation. Therefore, a floating point computation will be more accurate if it involves only normalized values.
There are two important cases where a floating point number cannot be normalized. The value 0.0 is a special case. Obviously it cannot be normalized because the floating point representation for zero has no one bits in the mantissa. This, however, is not a problem since we can exactly represent the value zero with only a single bit. The second case is when we have some H.O. bits in the mantissa which are zero but the biased exponent is also zero (and we cannot decrement it to normalize the mantissa). Rather than disallow certain small values, whose H.O. mantissa bits and biased exponent are zero (the most negative exponent possible), the IEEE standard allows special denormalized values to represent these smaller values (see footnote 4). Although the use of denormalized values allows IEEE floating point computations to produce better results than if underflow occurred, keep in mind that denormalized values offer fewer bits of precision and are inherently less accurate.
Since the 80x87 FPUs and 80486 CPU always convert single and double precision values to extended precision, extended precision arithmetic is actually faster than single or double precision. Therefore, the expected performance benefit of using the smaller formats is not present on these chips. However, when designing the Pentium/586 CPU, Intel redesigned the built-in floating point unit to better compete with RISC chips. Most RISC chips support a native 64 bit double precision format which is faster than Intel’s extended precision format. Therefore, Intel provided native 64 bit operations on the Pentium to better compete against the RISC chips. As a result, the double precision format is the fastest on the Pentium and later chips.
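The single precision layout described in this section (one sign bit, an eight bit excess-127 exponent, and 23 stored mantissa bits with an implied H.O. one) can be examined directly. The following sketch uses Python's standard struct module to obtain the 32 bit pattern; the helper names decode_single and rebuild_normalized are hypothetical and exist only for this illustration.

```python
import struct

def decode_single(value):
    """Return (sign, biased exponent, stored mantissa bits) of value as a 32 bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                      # bit 31
    biased_exponent = (bits >> 23) & 0xFF  # bits 23..30, excess-127
    fraction = bits & 0x7FFFFF             # bits 0..22, implied one bit not stored
    return sign, biased_exponent, fraction

def rebuild_normalized(sign, biased_exponent, fraction):
    """Recompute the value of a normalized number from its three fields."""
    mantissa = 1 + fraction / 2**23        # restore the implied H.O. one bit
    return (-1) ** sign * mantissa * 2.0 ** (biased_exponent - 127)

one = decode_single(1.0)                   # 1.0 = 1.0 * 2^0
minus_six_and_a_half = decode_single(-6.5) # -6.5 = -1.625 * 2^2
denormal = decode_single(2.0 ** -149)      # biased exponent of zero, no implied one

print(one, minus_six_and_a_half, denormal)
print(rebuild_normalized(*minus_six_and_a_half))
```

The value 1.0 has a biased exponent of 127 (7fh) and all stored mantissa bits clear, as the text describes. The last value is a denormalized number: its biased exponent is zero, so rebuild_normalized, which assumes the implied one bit, does not apply to it.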
14.3
The UCR Standard Library Floating Point Routines
In most assembly language texts that bother to cover floating point arithmetic, this section would normally describe how to design your own floating point routines for addition, subtraction, multiplication, and division. This text will not do that for several reasons. First, to design a good floating point library requires a solid background in numerical analysis, a prerequisite this text does not assume of its readers. Second, the UCR Standard Library already provides a reasonable set of floating point routines in source code form; why waste space in this text when the sources are readily available elsewhere? Third, floating point units are quickly becoming standard equipment on all modern CPUs or motherboards; it makes no more sense to describe how to manually perform a floating point computation than it does to describe how to manually perform an integer computation. Therefore, this section will describe how to use the UCR Standard Library routines if you do not have an FPU available; a later section will describe the use of the floating point unit.
4. The alternative would be to underflow the values to zero.
The UCR Standard Library provides a large number of routines to support floating point computation and I/O. This library uses the same memory format for 32, 64, and 80 bit floating point numbers as the 80x87 FPUs. The UCR Standard Library’s floating point routines do not exactly follow the IEEE requirements with respect to error conditions and other degenerate cases, and they may produce slightly different results than an 80x87 FPU, but the results will be very close (see footnote 5). Since the UCR Standard Library uses the same memory format for 32, 64, and 80 bit numbers as the 80x87 FPUs, you can freely mix computations involving floating point between the FPU and the Standard Library routines.
The UCR Standard Library provides numerous routines to manipulate floating point numbers. The following sections describe each of these routines, by category.
14.3.1
Load and Store Routines

Since 80x86 CPUs without an FPU do not provide any 80-bit registers, the UCR Standard Library must use memory-based variables to hold floating point values during computation. The UCR Standard Library routines use two pseudo registers, an accumulator register and an operand register, when performing floating point operations. For example, the floating point addition routine adds the value in the floating point operand register to the floating point accumulator register, leaving the result in the accumulator. The load and store routines allow you to load floating point values into the floating point accumulator and operand registers as well as store the value of the floating point accumulator back to memory. The routines in this category include accop, xaccop, lsfpa, ssfpa, ldfpa, sdfpa, lefpa, sefpa, lefpal, lsfpo, ldfpo, lefpo, and lefpol.

The accop routine copies the value in the floating point accumulator to the floating point operand register. This routine is useful when you want to use the result of one computation as the second operand of a subsequent computation.

The xaccop routine exchanges the values in the floating point accumulator and operand registers. Note that many floating point computations destroy the value in the floating point operand register, so you cannot blindly assume that the routines preserve the operand register. Therefore, calling this routine only makes sense after performing some computation which you know does not affect the floating point operand register.

Lsfpa, ldfpa, and lefpa load the floating point accumulator with a single, double, or extended precision floating point value, respectively. The UCR Standard Library uses its own internal format for computations. These routines convert the specified values to the internal format during the load. On entry to each of these routines, es:di must contain the address of the variable you want to load into the floating point accumulator.
The following code demonstrates how to call these routines:

rVar            real4   1.0
drVar           real8   2.0
xrVar           real10  3.0
                 .
                 .
                 .
                lesi    rVar
                lsfpa
                 .
                 .
                 .
                lesi    drVar
                ldfpa
                 .
                 .
                 .
                lesi    xrVar
                lefpa

5. Note, by the way, that different floating point chips, especially across different CPU lines, but even within the Intel family, produce slightly different results. So the fact that the UCR Standard Library does not produce the exact same results as a particular FPU is not that important.
The lsfpo, ldfpo, and lefpo routines are similar to the lsfpa, ldfpa, and lefpa routines except, of course, they load the floating point operand register rather than the floating point accumulator with the value at address es:di.

Lefpal and lefpol load the floating point accumulator or operand register with a literal 80 bit floating point constant appearing in the code stream. To use these two routines, simply follow the call with a real10 directive and the appropriate constant, e.g.,

                lefpal
                real10  1.0
                lefpol
                real10  2.0e5
The ssfpa, sdfpa, and sefpa routines store the value in the floating point accumulator into the memory based floating point variable whose address appears in es:di. There are no corresponding ssfpo, sdfpo, or sefpo routines because a result you would want to store should never appear in the floating point operand register. If you happen to get a value in the floating point operand that you want to store into memory, simply use the xaccop routine to swap the accumulator and operand registers, then use the store accumulator routines to save the result. The following code demonstrates the use of these routines:

rVar            real4   1.0
drVar           real8   2.0
xrVar           real10  3.0
                 .
                 .
                 .
                lesi    rVar
                ssfpa
                 .
                 .
                 .
                lesi    drVar
                sdfpa
                 .
                 .
                 .
                lesi    xrVar
                sefpa

14.3.2
Integer/Floating Point Conversion

The UCR Standard Library includes several routines to convert between binary integers and floating point values. These routines are itof, utof, ltof, ultof, ftoi, ftou, ftol, and ftoul. The first four routines convert signed and unsigned integers to floating point format; the last four routines truncate floating point values and convert them to an integer value.

Itof converts the signed 16-bit value in ax to a floating point value and leaves the result in the floating point accumulator. This routine does not affect the floating point operand register. Utof converts the unsigned integer in ax in a similar fashion. Ltof and ultof convert the 32 bit signed (ltof) or unsigned (ultof) integer in dx:ax to a floating point value, leaving the value in the floating point accumulator. These routines always succeed.

Ftoi converts the value in the floating point accumulator to a signed integer value, leaving the result in ax. Conversion is by truncation; this routine keeps the integer portion and throws away the fractional part. If an overflow occurs because the resulting integer portion does not fit into 16 bits, ftoi returns the carry flag set. If the conversion occurs without error, ftoi returns the carry flag clear. Ftou works in a similar fashion, except it converts the floating point value to an unsigned integer in ax; it returns the carry set if the floating point value was negative.

Ftol and ftoul convert the value in the floating point accumulator to a 32 bit integer, leaving the result in dx:ax. Ftol works on signed values, ftoul works with unsigned values. As with ftoi and ftou, these routines return the carry flag set if a conversion error occurs.
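As a rough illustration of the conversion routines, the following fragment sketches a round trip from a 16 bit integer to floating point and back, checking the carry flag after ftoi. It assumes the program already includes the UCR Standard Library in the usual way; the variable Result and the label Overflow are hypothetical names chosen for this sketch.

```asm
Result      word    ?               ;Hypothetical variable for the converted value.

            mov     ax, -1234       ;16 bit signed integer to convert.
            itof                    ;Floating point accumulator := -1234.0.

;           (floating point computations on the accumulator go here)

            ftoi                    ;Truncate the accumulator to a signed integer in ax.
            jc      Overflow        ;Carry set: the value does not fit into 16 bits.
            mov     Result, ax      ;Carry clear: ax holds the truncated value.
```

The same pattern should apply to ftou, ftol, and ftoul; only the size and signedness of the result change.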
14.3.3
Floating Point Arithmetic

Floating point arithmetic is handled by the fpadd, fpsub, fpcmp, fpmul, and fpdiv routines. Fpadd adds the value in the floating point operand register to the floating point accumulator. Fpsub subtracts the value in the floating point operand from the floating point accumulator. Fpmul multiplies the value in the floating point accumulator by the floating point operand. Fpdiv divides the value in the floating point accumulator by the value in the floating point operand register. Fpcmp compares the value in the floating point accumulator against the floating point operand.

The UCR Standard Library arithmetic routines do very little error checking. For example, if arithmetic overflow occurs during addition, subtraction, multiplication, or division, the Standard Library simply sets the result to the largest legal value and returns. This is one of the major deviations from the IEEE floating point standard. Likewise, when underflow occurs the routines simply set the result to zero and return. If you divide any value by zero, the Standard Library routines simply set the result to the largest possible value and return. You may need to modify the Standard Library routines if you need to check for overflow, underflow, or division by zero in your programs.

The floating point comparison routine (fpcmp) compares the floating point accumulator against the floating point operand and returns -1, 0, or 1 in the ax register if the accumulator is less than, equal to, or greater than the floating point operand, respectively. It also compares ax with zero immediately before returning, which sets the flags so that you can use the jg, jge, jl, jle, je, and jne instructions immediately after calling fpcmp. Unlike fpadd, fpsub, fpmul, and fpdiv, fpcmp does not destroy the value in the floating point accumulator or the floating point operand register. Keep in mind the problems associated with comparing floating point numbers!
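The following sketch shows one way the arithmetic and comparison routines might be combined: it adds two values, compares the sum against a limit, and stores the sum only if it does not exceed that limit. The variables X, Y, Limit, and Sum and the label TooBig are hypothetical; the fragment assumes the load and store routines behave as described in the previous sections.

```asm
X           real8   1.5             ;Hypothetical operands for this sketch.
Y           real8   2.25
Limit       real8   10.0
Sum         real8   ?

            lesi    X
            ldfpa                   ;Accumulator := X.
            lesi    Y
            ldfpo                   ;Operand := Y.
            fpadd                   ;Accumulator := X + Y.

            lesi    Limit
            ldfpo                   ;Reload the operand; fpadd may have destroyed it.
            fpcmp                   ;ax := -1, 0, or 1; flags reflect ax compared with zero.
            jg      TooBig          ;Taken if X + Y > Limit.

            lesi    Sum
            sdfpa                   ;fpcmp preserves the accumulator, so Sum := X + Y.
```

Note that the operand register is reloaded before the comparison rather than assumed to survive the call to fpadd.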
14.3.4
Float/Text Conversion and Printff

The UCR Standard Library provides three routines, ftoa, etoa, and atof, that let you convert floating point numbers to ASCII strings and vice versa; it also provides a special version of printf, printff, that includes the ability to print floating point values as well as other data types.

Ftoa converts a floating point number to an ASCII string which is a decimal representation of that floating point number. On entry, the floating point accumulator contains the number you want to convert to a string. The es:di register pair points at a buffer in memory where ftoa will store the string. The al register contains the field width (number of print positions). The ah register contains the number of positions to display to the right of the decimal point. If ftoa cannot display the number using the print format specified by al and ah, it will create a string of “#” characters, ah characters long. Es:di must point at a byte array containing at least al+1 characters and al should contain at least five. The field width and decimal length values in the al and ah registers are similar to the values appearing after floating point numbers in the Pascal write statement, e.g.,

                write(floatVal:al:ah);

Etoa outputs the floating point number in exponential form. As with ftoa, es:di points at the buffer where etoa will store the result. The al register must contain at least eight and is the field width for the number. If al contains less than eight, etoa will output a string of “#” characters. The string that es:di points at must contain at least al+1 characters. This conversion routine is similar to Pascal’s write procedure when writing real values with a single field width specification:

                write(realvar:al);
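A call to ftoa might be set up as in the sketch below, which converts a value using a field width of ten with three digits after the decimal point and then prints the string. Value and Buffer are hypothetical names. The fragment assumes that the Standard Library’s puts routine prints the zero terminated string at es:di, and it reloads es:di before calling puts rather than assuming ftoa preserves it.

```asm
Value       real8   3.14159         ;Hypothetical value to convert.
Buffer      byte    11 dup (?)      ;al+1 bytes, as ftoa requires for a field width of 10.

            lesi    Value
            ldfpa                   ;Accumulator := Value.
            lesi    Buffer          ;es:di points at the destination buffer.
            mov     al, 10          ;Field width.
            mov     ah, 3           ;Positions to the right of the decimal point.
            ftoa                    ;Comparable to Pascal's write(Value:10:3).

            lesi    Buffer          ;Reload es:di in case ftoa modified it (an assumption).
            puts                    ;Print the converted string.
```

Etoa would be called the same way, except that only al (at least eight) is significant.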
The Standard Library printff routine provides all the facilities of the standard printf routine plus the ability to handle floating point output. The printff routine includes several new format specifications to print floating point numbers in decimal form or using scientific notation. The specifications are

• %x.yF     Prints a 32 bit floating point number in decimal form.
• %x.yGF    Prints a 64 bit floating point number in decimal form.
• %x.yLF    Prints an 80 bit floating point number in decimal form.
• %zE       Prints a 32 bit floating point number using scientific notation.
• %zGE      Prints a 64 bit floating point number using scientific notation.
• %zLE      Prints an 80 bit floating point value using scientific notation.

In the format strings above, x and z are integer constants that denote the field width of the number to print. The y item is also an integer constant that specifies the number of positions to print after the decimal point. The x.y values are comparable to the values passed to ftoa in al and ah. The z value is comparable to the value etoa expects in the al register.

Other than the addition of these six new formats, the printff routine is identical to the printf routine. If you use the printff routine in your assembly language programs, you should not use the printf routine as well. Printff duplicates all the facilities of printf and using both would only waste memory.
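The fragment below is a sketch of how these format specifications might appear in a call to printff. It assumes that printff uses the same calling convention as the Standard Library printf routine, that is, a zero terminated format string in the code stream followed by the addresses of the variables to print, and that the string may use the "\n" escape. The variable names are reused from the earlier load and store examples.

```asm
rVar        real4   1.0
drVar       real8   2.0
xrVar       real10  3.0

            printff
            byte    "Single: %8.3F  Double: %10.4GF  Extended: %12LE\n",0
            dword   rVar, drVar, xrVar  ;One address per format item, in order.
```

Each format item must match the size of the variable it prints: F and E for real4, GF and GE for real8, LF and LE for real10.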
14.4
The 80x87 Floating Point Coprocessors

When the 8086 CPU first appeared in the late 1970s, semiconductor technology was not to the point where Intel could put floating point instructions directly on the 8086 CPU. Therefore, they devised a scheme whereby they could use a second chip to perform the floating point calculations: the floating point unit, or FPU (see footnote 6). They released their original floating point chip, the 8087, in 1980. This particular FPU worked with the 8086, 8088, 80186, and 80188 CPUs. When Intel introduced the 80286 CPU, they released a redesigned 80287 FPU chip to accompany it. Although the 80287 was compatible with the 80386 CPU, Intel designed a better FPU, the 80387, for use in 80386 systems. The 80486 CPU was the first Intel CPU to include an on-chip floating point unit. Shortly after the release of the 80486, Intel introduced the 80486sx CPU, which was an 80486 without the built-in FPU. To get floating point capabilities on this chip, you had to add an 80487 chip, although the 80487 was really nothing more than a full-blown 80486 which took over for the “sx” chip in the system. Intel’s Pentium/586 chips provide a high-performance floating point unit directly on the CPU. There is no floating point coprocessor available for the Pentium chip.

Collectively, we will refer to all these chips as the 80x87 FPU. Given the obsolescence of the 8086, 80286, 8087, and 80287 chips, this text will concentrate on the 80387 and later chips. There are some differences between the 80387/80486/Pentium floating point units and the earlier FPUs. If you need to write code that will execute on those earlier machines, you should consult the appropriate Intel documentation for those devices.
14.4.1
FPU Registers

The 80x87 FPUs add 13 registers to the 80386 and later processors: eight floating point data registers, a control register, a status register, a tag register, an instruction pointer, and a data pointer. The data registers are similar to the 80x86’s general purpose register set insofar as all floating point calculations take place in these registers. The control register contains bits that let you decide how the 80x87 handles certain degenerate cases, such as how it rounds inexact results and how much precision it uses. The status register is similar to the 80x86’s flags register; it contains the condition code bits and several other floating point flags that describe the state of the 80x87 chip. The tag register contains several groups of bits that determine the state of the value in each of the eight floating point data registers. The instruction and data pointer registers contain certain state information about the last floating point instruction executed. We will not consider the last three registers in this text; see the Intel documentation for more details.

6. Intel has also referred to this device as the Numeric Data Processor (NDP), Numeric Processor Extension (NPX), and math coprocessor.

Figure 14.5 80x87 Floating Point Register Stack (eight 80 bit registers, st(0) through st(7), with bits numbered 79 down to 0)
14.4.1.1 The FPU Data Registers

The 80x87 FPUs provide eight 80 bit data registers organized as a stack. This is a significant departure from the organization of the general purpose registers on the 80x86 CPU, which comprise a standard general-purpose register set. Intel refers to these registers as ST(0), ST(1), …, ST(7). Most assemblers will accept ST as an abbreviation for ST(0).

The biggest difference between the FPU register set and the 80x86 register set is the stack organization. On the 80x86 CPU, the ax register is always the ax register, no matter what happens. On the 80x87, however, the register set is an eight element stack of 80 bit floating point values (see Figure 14.5). ST(0) refers to the item on the top of the stack, ST(1) refers to the next item on the stack, and so on. Many floating point instructions push and pop items on the stack; therefore, ST(1) will refer to the previous contents of ST(0) after you push something onto the stack. It will take some thought and practice to get used to the fact that the registers are changing under you, but this is an easy problem to overcome.
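To make the renumbering concrete, the following sketch traces the stack through three pushes and one pop. It uses the fld and fstp instructions, which this chapter describes later; the variable names are hypothetical.

```asm
ValA        real8   1.0             ;Hypothetical variables for this sketch.
ValB        real8   2.0
ValC        real8   3.0
Temp        real8   ?

            fld     ValA            ;st(0) = ValA
            fld     ValB            ;st(0) = ValB, st(1) = ValA
            fld     ValC            ;st(0) = ValC, st(1) = ValB, st(2) = ValA
            fstp    Temp            ;Pops ValC: st(0) = ValB, st(1) = ValA
```

The value loaded first never moves physically; only the name by which you refer to it changes as the top of stack moves.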
14.4.1.2 The FPU Control Register

When Intel designed the 80x87 (and, essentially, the IEEE floating point standard), there were no standards in floating point hardware. Different (mainframe and mini) computer manufacturers all had different and incompatible floating point formats. Unfortunately, much application software had been written taking into account the idiosyncrasies of these different floating point formats. Intel wanted to design an FPU that could work with the majority of the software out there (keep in mind that the IBM PC was three to four years away when Intel began designing the 8087, so they couldn’t rely on that “mountain” of software available for the PC to make their chip popular). Unfortunately, many of the features found in these older floating point formats were mutually exclusive. For example, in some floating point systems rounding would occur when there was insufficient precision; in others, truncation would occur. Some applications would work with one floating point system but not with the other. Intel wanted as many applications as possible to work with as few changes as possible on their 80x87 FPUs, so they added a special register, the FPU control register, that lets the user choose one of several possible operating modes for the 80x87.

The 80x87 control register contains 16 bits organized as shown in Figure 14.6. Bit 12 of the control register is only present on the 8087 and 80287 chips. It controls how the 80x87 responds to infinity. The 80387 and later chips always use a form of infinity known as affine closure because this is the only form supported by the IEEE 754/854 standards. As such, we will ignore any further use of this bit and assume that it is always programmed with a one.

Figure 14.6 80x87 Control Register (bits 10 and 11: rounding control; bits 8 and 9: precision control; bits 0 through 5: exception masks for invalid operation, denormalized, zero divide, overflow, underflow, and precision)

Bits 10 and 11 provide rounding control according to the following values:
Table 58: Rounding Control

Bits 10 & 11    Function
00              To nearest or even
01              Round down
10              Round up
11              Truncate
The “00” setting is the default. The 80x87 rounds values above one-half of the least significant bit up. It rounds values below one-half of the least significant bit down. If the value below the least significant bit is exactly one-half the least significant bit, the 80x87 rounds the value towards the value whose least significant bit is zero. For long strings of computations, this provides a reasonable, automatic way to maintain maximum precision.

The round up and round down options are present for those computations where it is important to keep track of the accuracy during a computation. By setting the rounding control to round down and performing the operation, then repeating the operation with the rounding control set to round up, you can determine the minimum and maximum values between which the true result will fall.

The truncate option forces all computations to truncate any excess bits during the computation. You will rarely use this option if accuracy is important to you. However, if you are porting older software to the 80x87, you might use this option to reproduce the behavior that software expects.

Bits eight and nine of the control register control the precision during computation. This capability is provided mainly to allow compatibility with older software as required by the IEEE 754 standard. The precision control bits use the following values:
Table 59: Mantissa Precision Control Bits

Bits 8 & 9      Precision Control
00              24 bits
01              Reserved
10              53 bits
11              64 bits
For modern applications, the precision control bits should always be set to “11” to obtain 64 bits of precision. This will produce the most accurate results during numerical computation.

Bits zero through five are the exception masks. These are similar to the interrupt enable bit in the 80x86’s flags register. If these bits contain a one, the corresponding condition is ignored by the 80x87 FPU. However, if any bit contains zero, and the corresponding condition occurs, then the FPU immediately generates an interrupt so the program can handle the degenerate condition.

Bit zero corresponds to an invalid operation error. This generally occurs as the result of a programming error. Problems that raise the invalid operation exception include pushing more than eight items onto the stack or attempting to pop an item off an empty stack, taking the square root of a negative number, or loading a non-empty register.

Bit one masks the denormalized interrupt which occurs whenever you try to manipulate denormalized values. Denormalized values generally occur when you load arbitrary extended precision values into the FPU or work with very small numbers just beyond the range of the FPU’s capabilities. Normally, you would probably not enable this exception.

Bit two masks the zero divide exception. If this bit contains zero, the FPU will generate an interrupt if you attempt to divide a nonzero value by zero. If you do not enable the zero division exception, the FPU will produce an appropriately signed infinity whenever you divide a nonzero value by zero.

Bit three masks the overflow exception. The FPU will raise the overflow exception if a calculation overflows or if you attempt to store a value which is too large to fit into a destination operand (e.g., storing a large extended precision value into a single precision variable).

Bit four, if set, masks the underflow exception. Underflow occurs when the result is too small to fit in the destination operand. Like overflow, this exception can occur whenever you store a small extended precision value into a smaller variable (single or double precision) or when the result of a computation is too small for extended precision.

Bit five controls whether the precision exception can occur. A precision exception occurs whenever the FPU produces an imprecise result, generally the result of an internal rounding operation. Although many operations will produce an exact result, many more will not. For example, dividing one by ten will produce an inexact result. Therefore, this bit is usually one since inexact results are very common.

Bits six and thirteen through fifteen in the control register are currently undefined and reserved for future use. Bit seven is the interrupt enable mask, but it is only active on the 8087 FPU; a zero in this bit enables 8087 interrupts and a one disables FPU interrupts.

The 80x87 provides two instructions, FLDCW (load control word) and FSTCW (store control word), that let you load and store the contents of the control register. The single operand to these instructions must be a 16 bit memory location. The FLDCW instruction loads the control register from the specified memory location; FSTCW stores the control register into the specified memory location.
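The read-modify-write sequence that FSTCW and FLDCW make possible might look like the following sketch. It selects 64 bit precision and round-to-nearest while leaving the exception masks untouched. CtrlWord is a hypothetical variable name, and the fragment assumes an 80387 or later FPU (on an 8087 you would also need to wait for the store to complete before reading CtrlWord).

```asm
CtrlWord    word    ?               ;Hypothetical 16 bit variable for the control word.

            fstcw   CtrlWord        ;Store the current control register to memory.
            mov     ax, CtrlWord
            and     ax, 0F0FFh      ;Clear bits 8-11 (precision and rounding control).
            or      ax, 0300h       ;Bits 8-9 = 11 (64 bits); bits 10-11 = 00 (nearest).
            mov     CtrlWord, ax
            fldcw   CtrlWord        ;Load the modified value into the control register.
```

Substituting 0F00h for 0300h in the or instruction would keep 64 bit precision but select truncation (rounding control 11) instead.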
Figure 14.7 FPU Status Register (bit 15: busy; bit 14: C3; bits 11-13: top of stack pointer; bit 10: C2; bit 9: C1; bit 8: C0; bit 7: exception flag; bit 6: stack fault; bits 0 through 5: exception flags for invalid operation, denormalized, zero divide, overflow, underflow, and precision)
14.4.1.3 The FPU Status Register

The FPU status register provides the status of the coprocessor at the instant you read it. The FSTSW instruction stores the 16 bit floating point status register into the mod/reg/rm operand. The status register is a 16 bit register; its layout appears in Figure 14.7.

Bits zero through five are the exception flags. These bits appear in the same order as the exception masks in the control register. If the corresponding condition exists, then the bit is set. These bits are independent of the exception masks in the control register. The 80x87 sets and clears these bits regardless of the corresponding mask setting.

Bit six (active only on 80386 and later processors) indicates a stack fault. A stack fault occurs whenever there is a stack overflow or underflow. When this bit is set, the C1 condition code bit determines whether there was a stack overflow (C1=1) or stack underflow (C1=0) condition.

Bit seven of the status register is set if any error condition bit is set. It is the logical OR of bits zero through five. A program can test this bit to quickly determine if an error condition exists.

Bits eight, nine, ten, and fourteen are the coprocessor condition code bits. Various instructions set the condition code bits as shown in the following table:
Table 60: FPU Condition Code Bits

Instruction                           C3  C2  C1  C0  Condition
fcom, fcomp, fcompp, ficom, ficomp    0   0   X   0   ST > source
                                      0   0   X   1   ST < source
                                      1   0   X   0   ST = source
                                      1   1   X   1   ST or source undefined

ftst                                  0   0   X   0   ST is positive
                                      0   0   X   1   ST is negative
                                      1   0   X   0   ST is zero (+ or -)
                                      1   1   X   1   ST is uncomparable

fxam                                  0   0   0   0   +Unnormalized
                                      0   0   1   0   -Unnormalized
                                      0   1   0   0   +Normalized
                                      0   1   1   0   -Normalized
                                      1   0   0   0   +0
                                      1   0   1   0   -0
                                      1   1   0   0   +Denormalized
                                      1   1   1   0   -Denormalized
                                      0   0   0   1   +NaN
                                      0   0   1   1   -NaN
                                      0   1   0   1   +Infinity
                                      0   1   1   1   -Infinity
                                      1   X   X   1   Empty register

fucom, fucomp, fucompp                0   0   X   0   ST > source
                                      0   0   X   1   ST < source
                                      1   0   X   0   ST = source
                                      1   1   X   1   Unordered

X = Don’t care
Table 61: Condition Code Interpretation

Instruction(s): fcom, fcomp, fcompp, ftst, fucom, fucomp, fucompp, ficom, ficomp
    C0: Result of comparison. See table above.
    C3: Result of comparison. See table above.
    C2: Operand is not comparable.
    C1: Result of comparison (see table above) or stack overflow/underflow (if stack exception bit is set).

Instruction(s): fxam
    C0: See previous table.
    C3: See previous table.
    C2: See previous table.
    C1: Sign of result, or stack overflow/underflow (if stack exception bit is set).

Instruction(s): fprem, fprem1
    C0: Bit 2 of remainder.
    C3: Bit 0 of remainder.
    C2: 0 - reduction done. 1 - reduction incomplete.
    C1: Bit 1 of remainder or stack overflow/underflow (if stack exception bit is set).

Instruction(s): fist, fbstp, frndint, fst, fstp, fadd, fmul, fdiv, fdivr, fsub, fsubr, fscale, fsqrt, fpatan, f2xm1, fyl2x, fyl2xp1
    C0: Undefined
    C3: Undefined
    C2: Undefined
    C1: Round up occurred or stack overflow/underflow (if stack exception bit is set).

Instruction(s): fptan, fsin, fcos, fsincos
    C0: Undefined
    C3: Undefined
    C2: 0 - reduction done. 1 - reduction incomplete.
    C1: Round up occurred or stack overflow/underflow (if stack exception bit is set).

Instruction(s): fchs, fabs, fxch, fincstp, fdecstp, constant loads, fxtract, fld, fild, fbld, fstp (80 bit)
    C0: Undefined
    C3: Undefined
    C2: Undefined
    C1: Zero result or stack overflow/underflow (if stack exception bit is set).

Instruction(s): fldenv, frstor
    C0: Restored from memory operand.
    C3: Restored from memory operand.
    C2: Restored from memory operand.
    C1: Restored from memory operand.

Instruction(s): fldcw, fstenv, fstcw, fstsw, fclex
    C0: Undefined
    C3: Undefined
    C2: Undefined
    C1: Undefined

Instruction(s): finit, fsave
    C0: Cleared to zero.
    C3: Cleared to zero.
    C2: Cleared to zero.
    C1: Cleared to zero.
Figure 14.8 80x87 Floating Point Formats (32 bit single precision, 64 bit double precision, and 80 bit extended precision floating point formats)

Figure 14.9 80x87 Integer Formats (16 bit, 32 bit, and 64 bit two’s complement integers)

Bits 11-13 of the FPU status register provide the register number of the top of stack. During computations, the 80x87 adds (modulo eight) the logical register numbers supplied by the programmer to these three bits to determine the physical register number at run time.

Bit 15 of the status register is the busy bit. It is set whenever the FPU is busy. Most programs will have little reason to access this bit.
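One common way to act on the condition code bits in Table 60 is to copy the status word into ax and then move its H.O. byte into the 80x86 flags with sahf. This places C0 in the carry flag, C2 in the parity flag, and C3 in the zero flag, so the unsigned conditional jumps can test the outcome. The sketch below assumes an 80287 or later FPU (for the fstsw ax form); the variables X and Y and the three labels are hypothetical.

```asm
X           real8   1.0             ;Hypothetical operands for this sketch.
Y           real8   2.0

            fld     X               ;st(0) = X
            fcomp   Y               ;Compare st(0) with Y, then pop X.
            fstsw   ax              ;Copy the status register into ax.
            sahf                    ;C0 -> carry, C2 -> parity, C3 -> zero.
            jp      Unordered       ;C2 = 1: the operands are not comparable.
            jb      XLessThanY      ;C0 = 1: ST < source.
            je      XEqualsY        ;C3 = 1: ST = source.
                                    ;Fall through: ST > source.
```

The test for the unordered case comes first because that outcome sets C0 and C3 as well as C2.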
14.4.2
FPU Data Types

The 80x87 FPU supports seven different data types: three integer types, a packed decimal type, and three floating point types. Since the 80x86 CPUs already support integer data types, there are few reasons why you would want to use the 80x87 integer types. The packed decimal type provides an 18 digit signed decimal (BCD) integer. However, we are avoiding BCD arithmetic in this text, so we will ignore this data type in the 80x87 FPU. The remaining three data types are the 32 bit, 64 bit, and 80 bit floating point data types we’ve looked at so far. The 80x87 data types appear in Figure 14.8, Figure 14.9, and Figure 14.10.
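As a point of reference, variables of the seven types could be declared with MASM directives along the following lines. The real4, real8, and real10 directives appear in the earlier examples in this chapter; the variable names are hypothetical, and the integer and packed decimal variables are left uninitialized to keep the sketch simple.

```asm
Int16Var    word    ?               ;16 bit two's complement integer.
Int32Var    dword   ?               ;32 bit two's complement integer.
Int64Var    qword   ?               ;64 bit two's complement integer.
BCDVar      tbyte   ?               ;80 bit packed decimal (BCD) integer.
SglVar      real4   1.0             ;32 bit single precision floating point.
DblVar      real8   2.0             ;64 bit double precision floating point.
ExtVar      real10  3.0             ;80 bit extended precision floating point.
```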
Figure 14.10 80x87 Packed Decimal Formats (80 bit packed decimal (BCD) integer: a sign bit and unused bits in the H.O. byte, followed by digits D17 down to D0 at four bits per digit)

The 80x87 FPU generally stores values in a normalized format. When a floating point number is normalized, the H.O. bit of the mantissa is always one. In the 32 and 64 bit floating point formats, the 80x87 does not actually store this bit; the 80x87 always assumes that it is one. Therefore, 32 and 64 bit floating point numbers are always normalized. In the extended precision 80 bit floating point format, the 80x87 does not assume that the H.O. bit of the mantissa is one; the H.O. bit of the number appears as part of the string of bits.

Normalized values provide the greatest precision for a given number of bits. However, there are a large number of non-normalized values which we can represent with the 80 bit format. These values are very close to zero and represent the set of values whose mantissa H.O. bit is zero. The 80x87 FPUs support a special form of 80 bit value known as denormalized values. Denormalized values allow the 80x87 to encode very small values it cannot encode using normalized values, but at a price. Denormalized values offer fewer bits of precision than normalized values. Therefore, using denormalized values in a computation may introduce some slight inaccuracy into a computation. Of course, this is always better than underflowing the denormalized value to zero (which could make the computation even less accurate), but you must keep in mind that if you work with very small values you may lose some accuracy in your computations. Note that the 80x87 status register contains a bit you can use to detect when the FPU uses a denormalized value in a computation.
14.4.3
The FPU Instruction Set

The 80387 (and later) FPU adds over 80 new instructions to the 80x86 instruction set. We can classify these instructions as data movement instructions, conversions, arithmetic instructions, comparisons, constant instructions, transcendental instructions, and miscellaneous instructions. The following sections describe each of the instructions in these categories.
14.4.4
FPU Data Movement Instructions

The data movement instructions transfer data between the internal FPU registers and memory. The instructions in this category are fld, fst, fstp, and fxch. The fld instruction always pushes its operand onto the floating point stack. The fstp instruction always pops the top of stack after storing the top of stack (tos) into its operand. The remaining instructions do not affect the number of items on the stack.
14.4.4.1 The FLD Instruction

The fld instruction loads a 32 bit, 64 bit, or 80 bit floating point value onto the stack. This instruction converts 32 and 64 bit operands to an 80 bit extended precision value before pushing the value onto the floating point stack.

The fld instruction first decrements the tos pointer (bits 11-13 of the status register) and then stores the 80 bit value in the physical register specified by the new tos pointer. If the source operand of the fld instruction is a floating point data register, ST(i), then the actual register the 80x87 uses for the load operation is the register number before decrementing the tos pointer. Therefore, fld st or fld st(0) duplicates the value on the top of the stack.

The fld instruction sets the stack fault bit if stack overflow occurs. It sets the denormalized exception bit if you load an 80 bit denormalized value. It sets the invalid operation bit if you attempt to load an empty floating point register onto the top of stack (or perform some other invalid operation).

Examples:

                fld     st(1)
                fld     mem_32
                fld     MyRealVar
                fld     mem_64[bx]
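Duplicating the top of stack with fld st(0) is handy whenever a computation needs the same value twice. The following sketch squares a value this way. It relies on the fmul instruction, which is listed in Table 61 but not described until later; in its no-operand form, MASM assembles fmul as a multiply that pops the stack. The variable names are hypothetical.

```asm
X           real8   3.0             ;Hypothetical input value.
XSquared    real8   ?

            fld     X               ;st(0) = X
            fld     st(0)           ;st(0) = X, st(1) = X
            fmul                    ;Multiply and pop: st(0) = X * X
            fstp    XSquared        ;Store the result and pop it off the stack.
```

After the fstp, the stack holds exactly what it held before the sequence began.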
14.4.4.2 The FST and FSTP Instructions

The fst and fstp instructions copy the value on the top of the floating point register stack to another floating point register or to a 32, 64, or 80 bit memory variable (only fstp can store to an 80 bit memory variable). When copying data to a 32 or 64 bit memory variable, the 80 bit extended precision value on the top of stack is rounded to the smaller format as specified by the rounding control bits in the FPU control register.

The fstp instruction pops the value off the top of stack when moving it to the destination location. It does this by incrementing the top of stack pointer in the status register after accessing the data in st(0). If the destination operand is a floating point register, the FPU stores the value at the specified register number before popping the data off the top of the stack.

Executing an fstp st(0) instruction effectively pops the data off the top of stack with no data transfer. Examples:

                fst     mem_32
                fstp    mem_64
                fstp    mem_64[ebx*8]
                fstp    mem_80
                fst     st(2)
                fstp    st(1)
The last example above effectively pops st(1) while leaving st(0) on the top of the stack.

The fst and fstp instructions will set the stack exception bit if a stack underflow occurs (attempting to store a value from an empty register stack). They will set the precision bit if there is a loss of precision during the store operation (this will occur, for example, when storing an 80 bit extended precision value into a 32 or 64 bit memory variable and there are some bits lost during conversion). They will set the underflow exception bit when storing an 80 bit value into a 32 or 64 bit memory variable if the value is too small to fit into the destination operand. Likewise, these instructions will set the overflow exception bit if the value on the top of stack is too big to fit into a 32 or 64 bit memory variable. The fst and fstp instructions set the denormalized flag when you try to store a denormalized value into an 80 bit register or variable (see footnote 7). They set the invalid operation flag if an invalid operation (such as storing into an empty register) occurs. Finally, these instructions set the C1 condition bit if rounding occurs during the store operation (this only occurs when storing into a 32 or 64 bit memory variable and you have to round the mantissa to fit into the destination).
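The following sketch shows one way a program might notice that a store lost precision. It clears the exception flags with fclex, converts an 80 bit value to 32 bits with fld and fstp, and then tests bit five (the precision flag) of the status register. The variable names and the label Inexact are hypothetical, and the fstsw ax form assumes an 80287 or later FPU.

```asm
ExtVal      real10  1.0e-3          ;Hypothetical 80 bit source value.
SglVal      real4   ?

            fclex                   ;Clear any exception flags left by earlier code.
            fld     ExtVal          ;Push the 80 bit value.
            fstp    SglVal          ;Round to 32 bits, store, and pop.
            fstsw   ax              ;Copy the status register into ax.
            test    ax, 20h         ;Bit five is the precision exception flag.
            jnz     Inexact         ;Taken if the store had to round the value.
```

The same approach works for the overflow (bit three) and underflow (bit four) flags by changing the mask in the test instruction.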
7. Storing a denormalized value into a 32 or 64 bit memory variable will always set the underflow exception bit.

14.4.4.3 The FXCH Instruction

The fxch instruction exchanges the value on the top of stack with one of the other FPU registers. This instruction takes two forms: one with a single FPU register as an operand, the second without any operands. The first form exchanges the top of stack with the specified register. The second form of fxch swaps the top of stack with st(1).

Many FPU instructions, e.g., fsqrt, operate only on the top of the register stack. If you want to perform such an operation on a value that is not on the top of stack, you can use the fxch instruction to swap that register with tos, perform the desired operation, and then use fxch again to swap tos with the original register. The following example takes the square root of st(2):

                fxch    st(2)
                fsqrt
                fxch    st(2)
The fxch instruction sets the stack exception bit if the stack is empty. It sets the invalid operation bit if you specify an empty register as the operand. This instruction always clears the C1 condition code bit.
14.4.5 Conversions

The 80x87 chip performs all arithmetic operations on 80 bit real quantities. In a sense, the fld and fst/fstp instructions are conversion instructions as well as data movement instructions because they automatically convert between the internal 80 bit real format and the 32 and 64 bit memory formats. Nonetheless, we’ll simply classify them as data movement operations, rather than conversions, because they are moving real values to and from memory. The 80x87 FPU provides five instructions that convert to or from integer or binary coded decimal (BCD) format when moving data. These instructions are fild, fist, fistp, fbld, and fbstp.
14.4.5.1 The FILD Instruction

The fild (integer load) instruction converts a 16, 32, or 64 bit two’s complement integer to the 80 bit extended precision format and pushes the result onto the stack. This instruction always expects a single operand. This operand must be the address of a word, double word, or quad word integer variable. Although the instruction format for fild uses the familiar mod/rm fields, the operand must be a memory variable, even for 16 and 32 bit integers. You cannot specify one of the 80386’s 16 or 32 bit general purpose registers. If you want to push an 80x86 general purpose register onto the FPU stack, you must first store it into a memory variable and then use fild to push the value of that memory variable.

The fild instruction sets the stack exception bit and C1 (accordingly) if stack overflow occurs while pushing the converted value. Examples:

                fild    mem_16
                fild    mem_32[ecx*4]
                fild    mem_64[ebx+ecx*8]
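The register restriction described above can be sketched as follows. This is only an illustration; the variable name TempInt is hypothetical and any word sized memory location would do.

```asm
; A minimal sketch: push the value in ax onto the FPU stack.
; TempInt is a hypothetical scratch variable in the data segment.

TempInt         word    ?
                 .
                 .
                 .
                mov     TempInt, ax     ;fild cannot read ax directly,
                fild    TempInt         ; so pass the value through memory.
```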
14.4.5.2 The FIST and FISTP Instructions

The fist and fistp instructions convert the 80 bit extended precision value on the top of stack to a 16 or 32 bit integer (fistp also accepts a 64 bit destination) and store the result into the memory variable specified by the single operand. These instructions convert the value on tos to an integer according to the rounding setting in the FPU control register (bits 10 and 11). As for the fild instruction, the fist and fistp instructions will not let you specify one of the 80x86’s general purpose 16 or 32 bit registers as the destination operand.

The fist instruction converts the value on the top of stack to an integer and then stores the result; it does not otherwise affect the floating point register stack. The fistp instruction pops the value off the floating point register stack after storing the converted value.

These instructions set the stack exception bit if the floating point register stack is empty (this will also clear C1). They set the precision (imprecise operation) and C1 bits if rounding occurs (that is, if there is any fractional component to the value in st(0)). These instructions set the underflow exception bit if the result is too small (i.e., less than one but greater than zero or less than zero but greater than -1). Examples:

                fist    mem_16[bx]
                fist    mem_32
                fistp   mem_64
Don’t forget that these instructions use the rounding control settings to determine how they will convert the floating point data to an integer during the store operation. By default, the rounding control is usually set to round-to-nearest mode, yet most programmers expect fist/fistp to truncate the fractional portion during conversion. If you want fist/fistp to truncate floating point values when converting them to an integer, you will need to set the rounding control bits appropriately in the floating point control register.
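One way to get a truncating store is to switch the rounding control temporarily, as in the sketch below. It assumes the standard 80x87 control word layout, in which setting bits 10 and 11 to 11b selects truncation. The variable names OldCW, TruncCW, and IntVal are hypothetical.

```asm
; A sketch of a truncating float-to-integer store. OldCW, TruncCW,
; and IntVal are hypothetical variables in the data segment.

OldCW           word    ?
TruncCW         word    ?
IntVal          dword   ?
                 .
                 .
                 .
                fstcw   OldCW           ;Save the current control word.
                fwait                   ;Be sure the store has completed.
                mov     ax, OldCW
                or      ax, 0C00h       ;Bits 10 and 11 = 11b: truncate.
                mov     TruncCW, ax
                fldcw   TruncCW         ;Select truncation.
                fistp   IntVal          ;Store st(0), truncated, and pop.
                fldcw   OldCW           ;Restore the original rounding mode.
```

Restoring the saved control word afterwards keeps the change from leaking into later floating point code.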
14.4.5.3 The FBLD and FBSTP Instructions

The fbld and fbstp instructions load and store 80 bit BCD values. The fbld instruction converts a BCD value to its 80 bit extended precision equivalent and pushes the result onto the stack. The fbstp instruction pops the extended precision real value on tos, converts it to an 80 bit BCD value (rounding according to the bits in the floating point control register), and stores the converted result at the address specified by the destination memory operand. Note that there is no fbst instruction which stores the value on tos without popping it.

The fbld instruction sets the stack exception bit and C1 if stack overflow occurs. It sets the invalid operation bit if you attempt to load an invalid BCD value. The fbstp instruction sets the stack exception bit and clears C1 if stack underflow occurs (the stack is empty). It sets the underflow flag under the same conditions as fist and fistp. Examples:

; Assuming fewer than eight items on the stack, the following
; code sequence is equivalent to an fbst instruction:

                fld     st(0)           ;Duplicate value on TOS.
                fbstp   mem_80

; The following example easily converts an 80 bit BCD value to
; a 64 bit integer (only fistp accepts a 64 bit operand):

                fbld    bcd_80          ;Get BCD value to convert.
                fistp   mem_64          ;Store as an integer.

14.4.6 Arithmetic Instructions

The arithmetic instructions make up a small, but important, subset of the 80x87’s instruction set. These instructions fall into two general categories: those which operate on real values and those which operate on a real and an integer value.
14.4.6.1 The FADD and FADDP Instructions

These two instructions take the following forms:

                fadd
                faddp
                fadd    st(i), st(0)
                fadd    st(0), st(i)
                faddp   st(i), st(0)
                fadd    mem

The first two forms are equivalent. They pop the two values on the top of stack, add them, and push their sum back onto the stack.

The next two forms of the fadd instruction, those with two FPU register operands, behave like the 80x86’s add instruction. They add the value in the second register operand to the value in the first register operand. Note that one of the register operands must be st(0) (see note 8).

The faddp instruction with two operands adds st(0) (which must always be the second operand) to the destination (first) operand and then pops st(0). The destination operand must be one of the other FPU registers.

The last form above, fadd with a memory operand, adds a 32 or 64 bit floating point variable to the value in st(0). This instruction will convert the 32 or 64 bit operand to an 80 bit extended precision value before performing the addition. Note that this instruction does not allow an 80 bit memory operand.

These instructions can raise the stack, precision, underflow, overflow, denormalized, and invalid operation exceptions, as appropriate. If a stack fault exception occurs, C1 denotes stack overflow or underflow.
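As a small illustration of the memory form, the following sketch sums three variables. The names Val1, Val2, Val3, and Sum are hypothetical 64 bit variables, not names used elsewhere in this text.

```asm
; Sketch: Sum := Val1 + Val2 + Val3 using the memory form of fadd.

Val1            real8   1.5
Val2            real8   2.25
Val3            real8   4.0
Sum             real8   ?
                 .
                 .
                 .
                fld     Val1            ;st(0) = Val1
                fadd    Val2            ;st(0) = Val1 + Val2
                fadd    Val3            ;st(0) = Val1 + Val2 + Val3
                fstp    Sum             ;Store the sum and pop it.
```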
14.4.6.2 The FSUB, FSUBP, FSUBR, and FSUBRP Instructions

These four instructions take the following forms:

                fsub
                fsubp
                fsubr
                fsubrp

                fsub    st(i), st(0)
                fsub    st(0), st(i)
                fsubp   st(i), st(0)
                fsub    mem

                fsubr   st(i), st(0)
                fsubr   st(0), st(i)
                fsubrp  st(i), st(0)
                fsubr   mem

With no operands, the fsub and fsubp instructions operate identically. They pop st(0) and st(1) from the register stack, compute st(1)-st(0), and then push the difference back onto the stack. The fsubr and fsubrp instructions (reverse subtraction) operate in an almost identical fashion except they compute st(0)-st(1) and push that difference.

With two register operands (destination, source) the fsub instruction computes destination := destination - source. One of the two registers must be st(0). With two registers as operands, the fsubp instruction also computes destination := destination - source and then pops st(0) off the stack after computing the difference. For the fsubp instruction, the source operand must be st(0).

With two register operands, the fsubr and fsubrp instructions work in a similar fashion to fsub and fsubp, except they compute destination := source - destination.

The fsub mem and fsubr mem instructions accept a 32 or 64 bit memory operand. They convert the memory operand to an 80 bit extended precision value and subtract this from st(0) (fsub) or subtract st(0) from this value (fsubr) and store the result back into st(0).

These instructions can raise the stack, precision, underflow, overflow, denormalized, and invalid operation exceptions, as appropriate. If a stack fault exception occurs, C1 denotes stack overflow or underflow.
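The difference between the memory forms of fsub and fsubr is easiest to see side by side. The sketch below uses hypothetical 32 bit variables ValA, ValB, Diff1, and Diff2.

```asm
; Sketch: fsub mem versus fsubr mem.

ValA            real4   10.0
ValB            real4   3.0
Diff1           real4   ?
Diff2           real4   ?
                 .
                 .
                 .
                fld     ValA            ;st(0) = 10.0
                fsub    ValB            ;st(0) := st(0) - ValB = 7.0
                fstp    Diff1

                fld     ValA            ;st(0) = 10.0
                fsubr   ValB            ;st(0) := ValB - st(0) = -7.0
                fstp    Diff2
```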
8. Because you will use st(0) quite a bit when programming the 80x87, MASM allows you to use the abbreviation st for st(0). However, this text will explicitly state st(0) so there will be no confusion.
14.4.6.3 The FMUL and FMULP Instructions

The fmul and fmulp instructions multiply two floating point values. These instructions allow the following forms:

                fmul
                fmulp

                fmul    st(0), st(i)
                fmul    st(i), st(0)
                fmul    mem

                fmulp   st(i), st(0)

With no operands, fmul and fmulp both do the same thing: they pop st(0) and st(1), multiply these values, and push their product back onto the stack. The fmul instructions with two register operands compute destination := destination * source. One of the registers (source or destination) must be st(0).

The fmulp st(i), st(0) instruction computes st(i) := st(i) * st(0) and then pops st(0). This instruction uses the value for i before popping st(0). The fmul mem instruction requires a 32 or 64 bit memory operand. It converts the specified memory variable to an 80 bit extended precision value and then multiplies st(0) by this value.

These instructions can raise the stack, precision, underflow, overflow, denormalized, and invalid operation exceptions, as appropriate. If rounding occurs during the computation, these instructions set the C1 condition code bit. If a stack fault exception occurs, C1 denotes stack overflow or underflow.
14.4.6.4 The FDIV, FDIVP, FDIVR, and FDIVRP Instructions

These four instructions allow the following forms:

                fdiv
                fdivp
                fdivr
                fdivrp

                fdiv    st(0), st(i)
                fdiv    st(i), st(0)
                fdivp   st(i), st(0)

                fdivr   st(0), st(i)
                fdivr   st(i), st(0)
                fdivrp  st(i), st(0)

                fdiv    mem
                fdivr   mem

With zero operands, the fdiv and fdivp instructions pop st(0) and st(1), compute st(1)/st(0), and push the result back onto the stack. The fdivr and fdivrp instructions also pop st(0) and st(1) but compute st(0)/st(1) before pushing the quotient onto the stack.

With two register operands, these instructions compute the following quotients:

                fdiv    st(0), st(i)    ;st(0) := st(0)/st(i)
                fdiv    st(i), st(0)    ;st(i) := st(i)/st(0)
                fdivp   st(i), st(0)    ;st(i) := st(i)/st(0)
                fdivr   st(0), st(i)    ;st(0) := st(i)/st(0)
                fdivr   st(i), st(0)    ;st(i) := st(0)/st(i)
                fdivrp  st(i), st(0)    ;st(i) := st(0)/st(i)

The fdivp and fdivrp instructions also pop st(0) after performing the division operation. The value for i in these two instructions is computed before popping st(0).

These instructions can raise the stack, precision, underflow, overflow, denormalized, zero divide, and invalid operation exceptions, as appropriate. If rounding occurs during the computation, these instructions set the C1 condition code bit. If a stack fault exception occurs, C1 denotes stack overflow or underflow.
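A common use of the memory form of fdiv is computing a reciprocal. The sketch below assumes hypothetical 64 bit variables Xval and Recip.

```asm
; Sketch: Recip := 1.0 / Xval.

Xval            real8   8.0
Recip           real8   ?
                 .
                 .
                 .
                fld1                    ;st(0) = 1.0
                fdiv    Xval            ;st(0) := st(0) / Xval = 0.125
                fstp    Recip           ;Store the reciprocal and pop it.
```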
14.4.6.5 The FSQRT Instruction

The fsqrt instruction does not allow any operands. It computes the square root of the value on tos and replaces st(0) with this result. The value on tos must be zero or positive, otherwise fsqrt will generate an invalid operation exception.

This instruction can raise the stack, precision, denormalized, and invalid operation exceptions, as appropriate. If rounding occurs during the computation, fsqrt sets the C1 condition code bit. If a stack fault exception occurs, C1 denotes stack overflow or underflow. Example:

; Compute Z := sqrt(x**2 + y**2);

                fld     x               ;Load X.
                fld     st(0)           ;Duplicate X on TOS.
                fmul                    ;Compute X**2.

                fld     y               ;Load Y.
                fld     st(0)           ;Duplicate Y on TOS.
                fmul                    ;Compute Y**2.

                fadd                    ;Compute X**2 + Y**2.
                fsqrt                   ;Compute sqrt(x**2 + y**2).
                fst     Z               ;Store away result in Z.
14.4.6.6 The FSCALE Instruction

The fscale instruction multiplies st(0) by 2**st(1) and leaves the result in st(0); it does not pop st(1). If the value in st(1) is not an integer, fscale truncates it towards zero before performing the operation.

This instruction raises the stack exception if there are not two items currently on the stack (this will also clear C1 since stack underflow occurs). It raises the precision exception if there is a loss of precision due to this operation (this occurs when st(1) contains a large, negative value). Likewise, this instruction sets the underflow or overflow exception bits if you multiply st(0) by a very large positive or negative power of two. If the result of the multiplication is very small, fscale could set the denormalized bit. Also, this instruction could set the invalid operation bit if you attempt to fscale illegal values. Fscale sets C1 if rounding occurs in an otherwise correct computation. Example:

                fild    Sixteen         ;Push sixteen onto the stack.
                fld     x
                fscale                  ;Compute x * (2**16).
                 .
                 .
                 .
Sixteen         word    16
14.4.6.7 The FPREM and FPREM1 Instructions

The fprem and fprem1 instructions compute a partial remainder. Intel designed the fprem instruction before the IEEE finalized their floating point standard. In the final draft of the IEEE floating point standard, the definition of the remainder operation was a little different from Intel’s original design. Unfortunately, Intel needed to maintain compatibility with the existing software that used the fprem instruction, so they designed a new instruction to handle the IEEE partial remainder operation, fprem1. You should always use fprem1 in new software you write; therefore we will only discuss fprem1 here, although you use fprem in an identical fashion.

Fprem1 computes the partial remainder of st(0)/st(1). If the difference between the exponents of st(0) and st(1) is less than 64, fprem1 can compute the exact remainder in one operation. Otherwise you will have to execute fprem1 two or more times to get the correct remainder value. The C2 condition code bit determines when the computation is complete. Note that fprem1 does not pop the two operands off the stack; it leaves the partial remainder in st(0) and the original divisor in st(1) in case you need to compute another partial remainder to complete the result.

The fprem1 instruction sets the stack exception flag if there aren’t two values on the top of stack. It sets the underflow and denormal exception bits if the result is too small. It sets the invalid operation bit if the values on tos are inappropriate for this operation. It sets the C2 condition code bit if the partial remainder operation is not complete. Finally, it loads C1, C3, and C0 with bits zero, one, and two of the quotient, respectively. Example:

; Compute Z := X mod Y

                fld     y
                fld     x
PartialLp:      fprem1
                fstsw   ax              ;Get condition bits in AX.
                test    ah, 100b        ;See if C2 is set.
                jnz     PartialLp       ;Repeat if not done yet.
                fstp    Z               ;Store remainder away.
                fstp    st(0)           ;Pop old y value.
14.4.6.8 The FRNDINT Instruction

The frndint instruction rounds the value on tos to an integer value using the rounding algorithm specified in the control register. The result remains in st(0) in floating point format.

This instruction sets the stack exception flag if there is no value on the tos (it will also clear C1 in this case). It sets the precision and denormal exception bits if there was a loss of precision. It sets the invalid operation flag if the value on the tos is not a valid number.
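The following sketch uses frndint to split a value into an integer part and the remaining fraction. The names Xval, IntPart, and FracPart are hypothetical. With the default round-to-nearest setting the fraction lies between -0.5 and +0.5; with truncation selected it has the same sign as the original value.

```asm
; Sketch: split Xval into IntPart and FracPart.

Xval            real8   2.75
IntPart         dword   ?
FracPart        real8   ?
                 .
                 .
                 .
                fld     Xval            ;st(0) = X
                fld     st(0)           ;st(0) = X, st(1) = X
                frndint                 ;st(0) = X rounded to an integer
                fsub    st(1), st(0)    ;st(1) := X - rounded X
                fistp   IntPart         ;Store the integer part and pop.
                fstp    FracPart        ;Store the fraction and pop.
```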
14.4.6.9 The FXTRACT Instruction

The fxtract instruction is the complement to the fscale instruction. It pops the value off the top of the stack and pushes a value which is the integer equivalent of the exponent (in 80 bit real form), and then pushes the mantissa with an exponent of zero (3fffh in biased form).

This instruction raises the stack exception if there is a stack underflow when popping the original value or a stack overflow when pushing the two results (C1 determines whether stack overflow or underflow occurs). If the original top of stack was zero, fxtract sets the zero division exception flag. The denormalized flag is set if the result warrants it, and the invalid operation flag is set if there are illegal input values when you execute fxtract. Example:

; The following example extracts the binary exponent of X and
; stores this into the 16 bit integer variable Xponent.

                fld     x
                fxtract
                fstp    st(0)           ;Discard the mantissa.
                fistp   Xponent         ;Store the exponent as an integer.
14.4.6.10 The FABS Instruction

Fabs computes the absolute value of st(0) by clearing the sign bit of st(0). It sets the stack exception bit and invalid operation bits if the stack is empty. Example:

; Compute X := sqrt(abs(x));

                fld     x
                fabs
                fsqrt
                fstp    x
14.4.6.11 The FCHS Instruction

Fchs changes the sign of st(0)’s value by inverting its sign bit. It sets the stack exception bit and invalid operation bits if the stack is empty. Example:

; Compute X := -X if X is positive, X := X if X is negative.

                fld     x
                fabs
                fchs
                fstp    x

14.4.7 Comparison Instructions

The 80x87 provides several instructions for comparing real values. The fcom, fcomp, fcompp, fucom, fucomp, and fucompp instructions compare the two values on the top of stack and set the condition codes appropriately. The ftst instruction compares the value on the top of stack with zero. The fxam instruction checks the value on tos and reports sign, normalization, and tag information.

Generally, most programs test the condition code bits immediately after a comparison. Unfortunately, there are no conditional jump instructions that branch based on the FPU condition codes. Instead, you can use the fstsw instruction to copy the floating point status register (see “The FPU Status Register” on page 785) into the ax register; then you can use the sahf instruction to copy the ah register into the 80x86’s condition code bits. After doing this, you can use the conditional jump instructions to test some condition. This technique copies C0 into the carry flag, C2 into the parity flag, and C3 into the zero flag. The sahf instruction does not copy C1 into any of the 80x86’s flag bits.

Since the sahf instruction does not copy any 80x87 processor status bits into the sign or overflow flags, you cannot use the jg, jl, jge, or jle instructions. Instead, use the ja, jae, jb, jbe, je, and jz instructions when testing the results of a floating point comparison. Yes, these conditional jumps normally test unsigned values and floating point numbers are signed values. However, use the unsigned conditional branches anyway; the fstsw and sahf instructions set the 80x86 flags register so that the unsigned jumps give the correct result.
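The fstsw/sahf technique described above can be sketched as follows. The label XIsLess is hypothetical; x and y stand for any two real variables.

```asm
; Sketch: if (x < y) then goto XIsLess.

                fld     x               ;st(0) = x
                fcomp   y               ;Compare st(0) to y, then pop x.
                fstsw   ax              ;Copy the FPU status word to ax.
                sahf                    ;C0 -> carry, C2 -> parity, C3 -> zero.
                jb      XIsLess         ;Carry set means st(0) was less than y.
```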
14.4.7.1 The FCOM, FCOMP, and FCOMPP Instructions

The fcom, fcomp, and fcompp instructions compare st(0) to the specified operand and set the corresponding 80x87 condition code bits based on the result of the comparison. The legal forms for these instructions are

                fcom
                fcomp
                fcompp

                fcom    st(i)
                fcomp   st(i)

                fcom    mem
                fcomp   mem

With no operands, fcom, fcomp, and fcompp compare st(0) against st(1) and set the condition code bits accordingly. In addition, fcomp pops st(0) off the stack and fcompp pops both st(0) and st(1) off the stack.

With a single register operand, fcom and fcomp compare st(0) against the specified register. Fcomp also pops st(0) after the comparison.

With a 32 or 64 bit memory operand, the fcom and fcomp instructions convert the memory variable to an 80 bit extended precision value and then compare st(0) against this value, setting the condition code bits accordingly. Fcomp also pops st(0) after the comparison.

These instructions set C2 (which winds up in the parity flag) if the two operands are not comparable (e.g., NaN). If it is possible for an illegal floating point value to wind up in a comparison, you should check the parity flag for an error before checking the desired condition.

These instructions set the stack fault bit if there aren’t two items on the top of the register stack. They set the denormalized exception bit if either or both operands are denormalized. They set the invalid operation flag if either or both operands are quiet NaNs. These instructions always clear the C1 condition code.
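When a NaN could reach a comparison, the parity check should come first, as in this sketch. The labels Unordered, XGreater, and XEqualsY are hypothetical.

```asm
; Sketch: three-way comparison of x and y with a NaN check.

                fld     x
                fcomp   y               ;Compare x to y, then pop x.
                fstsw   ax
                sahf
                jp      Unordered       ;C2 set: operands not comparable.
                ja      XGreater        ;C3 = 0 and C0 = 0: x > y.
                je      XEqualsY        ;C3 = 1: x = y.
                                        ;Fall through: x < y.
```

The jp test must precede the others because an unordered result also sets C0 and C3, which would otherwise look like "less than" or "equal."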
14.4.7.2 The FUCOM, FUCOMP, and FUCOMPP Instructions

These instructions are similar to the fcom, fcomp, and fcompp instructions, although they only allow the following forms:

                fucom
                fucomp
                fucompp

                fucom   st(i)
                fucomp  st(i)

The difference between fcom/fcomp/fcompp and fucom/fucomp/fucompp is relatively minor. The fcom/fcomp/fcompp instructions set the invalid operation exception bit if either operand is a NaN, including a quiet NaN. The fucom/fucomp/fucompp instructions do not raise this exception for quiet NaNs; they simply report the operands as unordered. In all other cases, these two sets of instructions behave identically.
14.4.7.3 The FTST Instruction

The ftst instruction compares the value in st(0) against 0.0. It behaves just like the fcom instruction would if st(1) contained 0.0. Note that this instruction does not differentiate -0.0 from +0.0. If the value in st(0) is either of these values, ftst will set C3 to denote equality. If you need to differentiate -0.0 from +0.0, use the fxam instruction. Note that this instruction does not pop st(0) off the stack.

14.4.7.4 The FXAM Instruction

The fxam instruction examines the value in st(0) and reports the results in the condition code bits (see “The FPU Status Register” on page 785 for details on how fxam sets these bits). This instruction does not pop st(0) off the stack.
14.4.8 Constant Instructions

The 80x87 FPU provides several instructions that let you load commonly used constants onto the FPU’s register stack. These instructions set the stack fault, invalid operation, and C1 flags if a stack overflow occurs; they do not otherwise affect the FPU flags. The specific instructions in this category include:

                fldz                    ;Pushes +0.0.
                fld1                    ;Pushes +1.0.
                fldpi                   ;Pushes π.
                fldl2t                  ;Pushes log2(10).
                fldl2e                  ;Pushes log2(e).
                fldlg2                  ;Pushes log10(2).
                fldln2                  ;Pushes ln(2).

14.4.9 Transcendental Instructions

The 80387 and later FPUs provide eight transcendental (log and trigonometric) instructions to compute sine, cosine, a partial tangent, a partial arctangent, 2**x - 1, y * log2(x), and y * log2(x+1). Using various algebraic identities, it is easy to compute most of the other common transcendental functions using these instructions.
14.4.9.1 The F2XM1 Instruction

F2xm1 computes 2**st(0) - 1. The value in st(0) must be in the range -1.0 ≤ st(0) ≤ +1.0. If st(0) is out of range f2xm1 generates an undefined result but raises no exceptions. The computed value replaces the value in st(0). Example:

; Compute 10**x using the identity: 10**x = 2**(x*lg(10)) (lg = log2).
; This only works if x*lg(10) is within f2xm1's range of -1.0 to +1.0.

                fld     x
                fldl2t
                fmul
                f2xm1
                fld1
                fadd

Note that f2xm1 computes 2**x - 1, which is why the code above adds 1.0 to the result at the end of the computation.
14.4.9.2 The FSIN, FCOS, and FSINCOS Instructions

These instructions pop the value off the top of the register stack and compute the sine, cosine, or both, and push the result(s) back onto the stack. The fsincos instruction pushes the sine followed by the cosine of the original operand, hence it leaves cos(st(0)) in st(0) and sin(st(0)) in st(1).

These instructions assume st(0) specifies an angle in radians and this angle must be in the range -2**63 < st(0) < +2**63. If the original operand is out of range, these instructions set the C2 flag and leave st(0) unchanged. You can use the fprem1 instruction, with a divisor of 2π, to reduce the operand to a reasonable range.

These instructions set the stack fault/C1, precision, underflow, denormalized, and invalid operation flags according to the result of the computation.
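The range reduction mentioned above might look like the following sketch. The names Angle, SineVal, and ReduceLp are hypothetical. Because the FPU's value of 2π is itself rounded, reducing a very large angle this way loses accuracy.

```asm
; Sketch: SineVal := sin(Angle), reducing Angle modulo 2π first.

                fldpi                   ;st(0) = π
                fadd    st(0), st(0)    ;st(0) = 2π
                fld     Angle           ;st(0) = Angle, st(1) = 2π
ReduceLp:       fprem1                  ;st(0) := partial remainder.
                fstsw   ax
                test    ah, 100b        ;C2 set means not finished yet.
                jnz     ReduceLp
                fsin                    ;st(0) := sin(reduced angle)
                fstp    SineVal         ;Store the sine and pop it.
                fstp    st(0)           ;Pop the 2π divisor.
```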
14.4.9.3 The FPTAN Instruction

Fptan computes the tangent of st(0) and pushes this value and then it pushes 1.0 onto the stack. Like the fsin and fcos instructions, the value of st(0) is assumed to be in radians and must be in the range -2**63 < st(0) < +2**63. If the value is outside this range, fptan sets C2 to indicate that the computation did not take place. As with the fsin, fcos, and fsincos instructions, you can use the fprem1 instruction to reduce this operand to a reasonable range using a divisor of 2π.

If the argument is invalid (i.e., an odd multiple of π/2 radians, where the cosine is zero and the tangent would require a division by zero) the result is undefined and this instruction raises no exceptions. Fptan will set the stack fault, precision, underflow, denormal, invalid operation, C2, and C1 bits as required by the operation.
14.4.9.4 The FPATAN Instruction

This instruction expects two values on the top of stack. It pops them and computes the following:

                st(0) := arctan( st(1) / st(0) )

The resulting value is the arctangent of the ratio on the stack expressed in radians. If you have a value you wish to compute the arctangent of, use fld1 to create the appropriate ratio and then execute the fpatan instruction.

This instruction affects the stack fault/C1, precision, underflow, denormal, and invalid operation bits if a problem occurs during the computation. It sets the C1 condition code bit if it has to round the result.
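For a single value rather than a ratio, the fld1 approach looks like this sketch. The names Xval and Angle are hypothetical.

```asm
; Sketch: Angle := arctan(Xval), in radians.

                fld     Xval            ;st(0) = X
                fld1                    ;st(0) = 1.0, st(1) = X
                fpatan                  ;st(0) := arctan(X / 1.0)
                fstp    Angle           ;Store the angle and pop it.
```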
14.4.9.5 The FYL2X and FYL2XP1 Instructions

The fyl2x and fyl2xp1 instructions compute st(1) * log2(st(0)) and st(1) * log2(st(0)+1), respectively. Fyl2x requires that st(0) be greater than zero; fyl2xp1 requires st(0) to be in the range:

                -(1 - sqrt(2)/2) < st(0) < (1 - sqrt(2)/2)

Fyl2x is useful for computing logs to bases other than two; fyl2xp1 is useful for computing compound interest, maintaining the maximum precision during computation.

Fyl2x can affect all the exception flags. C1 denotes rounding if there is no other error, or stack overflow/underflow if the stack fault bit is set.
The fyl2xp1 instruction does not affect the overflow or zero divide exception flags. These exceptions occur when st(0) is very small or zero. Since fyl2xp1 adds one to st(0) before computing the function, this condition never holds. Fyl2xp1 affects the other flags in a manner identical to fyl2x.
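Combining fyl2x with one of the constant instructions gives logarithms to other bases. The sketch below computes a base ten logarithm with the identity log10(x) = log10(2) * log2(x). The names Xval and Log10X are hypothetical, and Xval is assumed to be greater than zero.

```asm
; Sketch: Log10X := log10(Xval).

                fldlg2                  ;st(0) = log10(2)
                fld     Xval            ;st(0) = X, st(1) = log10(2)
                fyl2x                   ;st(0) := log10(2) * log2(X)
                fstp    Log10X          ;Store the result and pop it.
```

Replacing fldlg2 with fldln2 gives the natural logarithm in the same way.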
14.4.10 Miscellaneous Instructions

The 80x87 FPU includes several additional instructions which control the FPU, synchronize operations, and let you test or set various status bits. These instructions include finit/fninit, fdisi/fndisi, feni/fneni, fldcw, fstcw/fnstcw, fclex/fnclex, fsave/fnsave, frstor, frstpm, fstsw/fnstsw, fstenv/fnstenv, fldenv, fincstp, fdecstp, fwait, fnop, and ffree. The fdisi/fndisi, feni/fneni, and frstpm instructions are active only on FPUs earlier than the 80387, so we will not consider them here.

Many of these instructions have two forms. The first form is Fxxxx and the second form is FNxxxx. The version without the “N” emits an fwait instruction prior to the opcode (which is standard for most coprocessor instructions). The version with the “N” does not emit the fwait opcode (“N” stands for no wait).
14.4.10.1 The FINIT and FNINIT Instructions

The finit instruction initializes the FPU for proper operation. Your applications should execute this instruction before executing any other FPU instructions. This instruction initializes the control register to 37Fh (see “The FPU Control Register” on page 782), the status register to zero (see “The FPU Status Register” on page 785) and the tag word to 0FFFFh. The other registers are unaffected.
14.4.10.2 The FWAIT Instruction

The fwait instruction pauses the system until any currently executing FPU instruction completes. This is required because the FPU on the 80486sx and earlier CPU/FPU combinations can execute instructions in parallel with the CPU. Therefore, any FPU instruction which reads or writes memory could suffer from a data hazard if the main CPU accesses that same memory location before the FPU reads or writes that location. The fwait instruction lets you synchronize the operation of the FPU by waiting until the completion of the current FPU instruction. This resolves the data hazard by, effectively, inserting an explicit “stall” into the execution stream.
14.4.10.3 The FLDCW and FSTCW Instructions

The fldcw and fstcw instructions require a single 16 bit memory operand:

                fldcw   mem_16
                fstcw   mem_16
These two instructions load the control register (see “The FPU Control Register” on page 782) from a memory location (fldcw) or store the control word to a 16 bit memory location (fstcw). When using the fldcw instruction to turn on one of the exceptions, if the corresponding exception flag is set when you enable that exception, the FPU will generate an immediate interrupt before the CPU executes the next instruction. Therefore, you should use the fclex instruction to clear any pending interrupts before changing the FPU exception enable bits.
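The advice about clearing pending exceptions can be sketched as follows. This assumes the standard control word layout in which bit two is the zero divide mask; the variable name CtrlWord is hypothetical.

```asm
; Sketch: enable (unmask) the zero divide exception.

CtrlWord        word    ?
                 .
                 .
                 .
                fstcw   CtrlWord        ;Get the current control word.
                fclex                   ;Clear any pending exception flags.
                and     CtrlWord, 0FFFBh ;Clear bit 2 to unmask zero divide.
                fldcw   CtrlWord        ;Load the modified control word.
```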
14.4.10.4 The FCLEX and FNCLEX Instructions

The fclex and fnclex instructions clear all exception bits, the stack fault bit, and the busy flag in the FPU status register (see “The FPU Status Register” on page 785).
14.4.10.5 The FLDENV, FSTENV, and FNSTENV Instructions

                fstenv  mem_14b
                fnstenv mem_14b
                fldenv  mem_14b

The fstenv/fnstenv instructions store a 14-byte FPU environment record to the memory operand specified. When operating in real mode (the only mode this text considers), the environment record takes the form appearing in Figure 14.11.

You must execute the fstenv and fnstenv instructions with the CPU interrupts disabled. Furthermore, you should always ensure that the FPU is not busy before executing this instruction. This is easily accomplished by using the following code:

                pushf                   ;Preserve I flag.
                cli                     ;Disable interrupts.
                fstenv  mem_14b         ;Implicit wait for not busy.
                fwait                   ;Wait for operation to finish.
                popf                    ;Restore I flag.

The fldenv instruction loads the FPU environment from the specified memory operand. Note that this instruction lets you load the status word. There is no explicit instruction like fldcw to accomplish this.
Offset  Contents
------  --------------------------------------------------------------
  12    Data Ptr (Bits 16-19) | Unused Bits (set to zero)
  10    Data Ptr (Bits 0-15)
   8    Instr Ptr (Bits 16-19) | 0 | Instruction opcode (11 bits)
   6    Instr Ptr (Bits 0-15)
   4    Tag Word
   2    Status Word
   0    Control Word

Figure 14.11 FPU Environment Record (16 Bit Real Mode)
14.4.10.6 The FSAVE, FNSAVE, and FRSTOR Instructions

                fsave   mem_94b
                fnsave  mem_94b
                frstor  mem_94b

These instructions save and restore the state of the FPU. This includes saving all the internal control, status, and data registers. The destination location for fsave/fnsave (source location for frstor) must be 94 bytes long. The first 14 bytes correspond to the environment record the fldenv and fstenv instructions use; the remaining 80 bytes hold the data from the FPU register stack written out as st(0) through st(7). Frstor reloads the environment record and floating point registers from the specified memory operand.

The fsave/fnsave and frstor instructions are mainly intended for task switching. You can also use fsave/fnsave and frstor as a “push all” and “pop all” sequence to preserve the state of the FPU.

Like the fstenv and fldenv instructions, interrupts should be disabled while saving or restoring the FPU state. Otherwise another interrupt service routine could manipulate the FPU registers and invalidate the operation of the fsave/fnsave or frstor operation. The following code properly protects the environment data while saving and restoring the FPU status:

; Preserve the FPU state, assume si points at the environment
; record in memory.

                pushf
                cli
                fsave   [si]
                fwait
                popf
                 .
                 .
                 .
                pushf
                cli
                frstor  [si]
                fwait
                popf
14.4.10.7 The FSTSW and FNSTSW Instructions

                fstsw   ax
                fnstsw  ax
                fstsw   mem_16
                fnstsw  mem_16
These instructions store the FPU status register (see “The FPU Status Register” on page 785) into a 16 bit memory location or the ax register. These instructions are unusual in the sense that they can copy an FPU value into one of the 80x86 general purpose registers. Of course, the whole purpose behind allowing the transfer of the status register into ax is to allow the CPU to easily test the condition code register with the sahf instruction.
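The fstsw ax / sahf idiom works because the condition code bits land on useful CPU flags. The following Python sketch models that mapping; the function name is hypothetical, and the bit positions (C0 in bit 8, C2 in bit 10, C3 in bit 14 of the status word) are the standard 80x87 layout rather than something this section spells out.

```python
def flags_after_sahf(status_word: int) -> dict:
    """Model which CPU flags sahf sets after 'fstsw ax' (illustrative only)."""
    ah = (status_word >> 8) & 0xFF      # fstsw ax leaves the H.O. byte in ah
    return {
        "carry":  bool(ah & 0x01),      # C0 -> carry flag
        "parity": bool(ah & 0x04),      # C2 -> parity flag
        "zero":   bool(ah & 0x40),      # C3 -> zero flag
    }

less_than = flags_after_sahf(0x0100)    # C0 set: st(0) < operand after fcom
equal     = flags_after_sahf(0x4000)    # C3 set: st(0) = operand
unordered = flags_after_sahf(0x4500)    # C3, C2, and C0 all set
```

Because C0 maps to carry and C3 maps to zero, the unsigned conditional jumps (jb, je, ja, and so on) are the ones that make sense after this sequence.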
14.4.10.8 The FINCSTP and FDECSTP Instructions
The fincstp and fdecstp instructions do not take any operands. They simply increment and decrement the stack pointer bits (mod 8) in the FPU status register. These two instructions clear the C1 flag, but do not otherwise affect the condition code bits in the FPU status register.

14.4.10.9 The FNOP Instruction
The fnop instruction is simply an alias for fst st, st(0). It performs no other operation on the FPU.

14.4.10.10 The FFREE Instruction
        ffree   st(i)

This instruction modifies the tag bits for register i in the tags register to mark the specified register as empty. The value is unaffected by this instruction, but the FPU will no longer be able to access that data (without resetting the appropriate tag bits).
14.4.11 Integer Operations
The 80x87 FPUs provide special instructions that combine integer to extended precision conversion along with various arithmetic and comparison operations. These instructions are the following:
        fiadd   int
        fisub   int
        fisubr  int
        fimul   int
        fidiv   int
        fidivr  int

        ficom   int
        ficomp  int
These instructions convert their 16 or 32 bit integer operands to an 80 bit extended precision floating point value and then use this value as the source operand for the specified operation. These instructions use st(0) as the destination operand.
14.5 Sample Program: Additional Trigonometric Functions
This section provides various examples of 80x87 FPU programming. This group of routines provides several trigonometric, inverse trigonometric, logarithmic, and exponential functions using various algebraic identities. All these functions assume that the input values are on the stack and are within valid ranges for the given functions. The trigonometric routines expect angles expressed in radians and the inverse trig routines produce angles measured in radians. This program (transcnd.asm) appears on the companion CD-ROM.

                .xlist
                include     stdlib.a
                includelib  stdlib.lib
                .list

                .386
                .387
                option      segment:use16

dseg            segment     para public 'data'

result          real8       ?
; Some variables we use to test the routines in this package:

cotvar          real8       3.0
cotRes          real8       ?
acotRes         real8       ?

cscvar          real8       1.5
cscRes          real8       ?
acscRes         real8       ?

secvar          real8       0.5
secRes          real8       ?
asecRes         real8       ?

sinvar          real8       0.75
sinRes          real8       ?
asinRes         real8       ?

cosvar          real8       0.25
cosRes          real8       ?
acosRes         real8       ?

Two2xvar        real8       -2.5
Two2xRes        real8       ?
lgxRes          real8       ?

Ten2xVar        real8       3.75
Ten2xRes        real8       ?
logRes          real8       ?

expVar          real8       3.25
expRes          real8       ?
lnRes           real8       ?

Y2Xx            real8       3.0
Y2Xy            real8       3.0
Y2XRes          real8       ?

dseg            ends
cseg            segment     para public 'code'
                assume      cs:cseg, ds:dseg

; COT(x) - Computes the cotangent of st(0) and leaves result in st(0).
;          st(0) contains x (in radians) and must be between
;          -2**63 and +2**63
;
;          There must be at least one free register on the stack for
;          this routine to operate properly.
;
;          cot(x) = 1/tan(x)

cot             proc        near
                fsincos
                fdivr
                ret
cot             endp

; CSC(x) - computes the cosecant of st(0) and leaves result in st(0).
;          st(0) contains x (in radians) and must be between
;          -2**63 and +2**63.
;          The cosecant of x is undefined for any value of sin(x) that
;          produces zero (e.g., zero or pi radians).
;
;          There must be at least one free register on the stack for
;          this routine to operate properly.
;
;          csc(x) = 1/sin(x)

csc             proc        near
                fsin
                fld1
                fdivr
                ret
csc             endp

; SEC(x) - computes the secant of st(0) and leaves result in st(0).
;          st(0) contains x (in radians) and must be between
;          -2**63 and +2**63.
;
;          The secant of x is undefined for any value of cos(x) that
;          produces zero (e.g., pi/2 radians).
;
;          There must be at least one free register on the stack for
;          this routine to operate properly.
;
;          sec(x) = 1/cos(x)

sec             proc        near
                fcos
                fld1
                fdivr
                ret
sec             endp
; ASIN(x)- Computes the arcsine of st(0) and leaves the result in st(0).
;          Allowable range: -1<=x<=+1
;          There must be at least two free registers for this
;          function to operate properly.
;
;          asin(x) = atan(sqrt(x*x/(1-x*x)))

asin            proc        near
                fld         st(0)       ;Duplicate X on tos.
                fmul                    ;Compute X**2.
                fld         st(0)       ;Duplicate X**2 on tos.
                fld1                    ;Compute 1-X**2.
                fsubr
                fdiv                    ;Compute X**2/(1-X**2).
                fsqrt                   ;Compute sqrt(x**2/(1-X**2)).
                fld1                    ;To compute full arctangent.
                fpatan                  ;Compute atan of the above.
                ret
asin            endp

; ACOS(x)- Computes the arccosine of st(0) and leaves the
;          result in st(0).
;          Allowable range: -1<=x<=+1
;          There must be at least two free registers for
;          this function to operate properly.
;
;          acos(x) = atan(sqrt((1-x*x)/(x*x)))

acos            proc        near
                fld         st(0)       ;Duplicate X on tos.
                fmul                    ;Compute X**2.
                fld         st(0)       ;Duplicate X**2 on tos.
                fld1                    ;Compute 1-X**2.
                fsubr
                fdivr                   ;Compute (1-x**2)/X**2.
                fsqrt                   ;Compute sqrt((1-X**2)/X**2).
                fld1                    ;To compute full arctangent.
                fpatan                  ;Compute atan of the above.
                ret
acos            endp
; ACOT(x)- Computes the arccotangent of st(0) and leaves the
;          result in st(0).
;          X cannot equal zero.
;          There must be at least one free register for
;          this function to operate properly.
;
;          acot(x) = atan(1/x)

acot            proc        near
                fld1                    ;fpatan computes
                fxch                    ; atan(st(1)/st(0)).
                fpatan                  ; we want atan(st(0)/st(1)).
                ret
acot            endp

; ACSC(x)- Computes the arccosecant of st(0) and leaves the
;          result in st(0).
;          abs(X) must be greater than one.
;          There must be at least two free registers for
;          this function to operate properly.
;
;          acsc(x) = atan(sqrt(1/(x*x-1)))

acsc            proc        near
                fld         st(0)       ;Compute x*x
                fmul
                fld1                    ;Compute x*x-1
                fsub
                fld1                    ;Compute 1/(x*x-1)
                fdivr
                fsqrt                   ;Compute sqrt(1/(x*x-1))
                fld1
                fpatan                  ;Compute atan of above.
                ret
acsc            endp

; ASEC(x)- Computes the arcsecant of st(0) and leaves the
;          result in st(0).
;          abs(X) must be greater than one.
;          There must be at least two free registers for
;          this function to operate properly.
;
;          asec(x) = atan(sqrt(x*x-1))

asec            proc        near
                fld         st(0)       ;Compute x*x
                fmul
                fld1                    ;Compute x*x-1
                fsub
                fsqrt                   ;Compute sqrt(x*x-1)
                fld1
                fpatan                  ;Compute atan of above.
                ret
asec            endp
; TwoToX(x)- Computes 2**x.
;          It does this by using the algebraic identity:
;
;          2**x = 2**int(x) * 2**frac(x).
;          We can easily compute 2**int(x) with fscale and
;          2**frac(x) using f2xm1.
;
;          This routine requires three free registers.

SaveCW          word        ?
MaskedCW        word        ?

TwoToX          proc        near
                fstcw       cseg:SaveCW

; Modify the control word to truncate when rounding.

                fstcw       cseg:MaskedCW
                or          byte ptr cseg:MaskedCW+1, 1100b
                fldcw       cseg:MaskedCW

                fld         st(0)           ;Duplicate tos.
                fld         st(0)
                frndint                     ;Compute integer portion.

                fxch                        ;Swap whole and int values.
                fsub        st(0), st(1)    ;Compute fractional part.

                f2xm1                       ;Compute 2**frac(x)-1.
                fld1
                fadd                        ;Compute 2**frac(x).
                fxch                        ;Get integer portion.
                fld1                        ;Compute 1*2**int(x).
                fscale
                fstp        st(1)           ;Remove st(1) (which is 1).

                fmul                        ;Compute 2**int(x) * 2**frac(x).

                fldcw       cseg:SaveCW     ;Restore rounding mode.
                ret
TwoToX          endp
; TenToX(x)- Computes 10**x.
;
;          This routine requires three free registers.
;
;          TenToX(x) = 2**(x * lg(10))

TenToX          proc        near
                fldl2t                  ;Put lg(10) onto the stack
                fmul                    ;Compute x*lg(10)
                call        TwoToX      ;Compute 2**(x * lg(10)).
                ret
TenToX          endp

; exp(x)- Computes e**x.
;
;          This routine requires three free registers.
;
;          exp(x) = 2**(x * lg(e))

exp             proc        near
                fldl2e                  ;Put lg(e) onto the stack.
                fmul                    ;Compute x*lg(e).
                call        TwoToX      ;Compute 2**(x * lg(e))
                ret
exp             endp

; YtoX(y,x)- Computes y**x (y=st(1), x=st(0)).
;
;          This routine requires three free registers.
;
;          Y must be greater than zero.
;
;          YtoX(y,x) = 2 ** (x * lg(y))

YtoX            proc        near
                fxch                    ;Compute lg(y).
                fld1
                fxch
                fyl2x
                fmul                    ;Compute x*lg(y).
                call        TwoToX      ;Compute 2**(x*lg(y)).
                ret
YtoX            endp

; LOG(x)- Computes the base 10 logarithm of x.
;
;          Usual range for x (>0).
;
;          LOG(x) = lg(x)/lg(10).

log             proc        near
                fld1
                fxch
                fyl2x                   ;Compute 1*lg(x).
                fldl2t                  ;Load lg(10).
                fdiv                    ;Compute lg(x)/lg(10).
                ret
log             endp

; LN(x)- Computes the base e logarithm of x.
;
;          X must be greater than zero.
;
;          ln(x) = lg(x)/lg(e).

ln              proc        near
                fld1
                fxch
                fyl2x                   ;Compute 1*lg(x).
                fldl2e                  ;Load lg(e).
                fdiv                    ;Compute lg(x)/lg(e).
                ret
ln              endp
; This main program tests the various functions in this package.

Main            proc
                mov         ax, dseg
                mov         ds, ax
                mov         es, ax
                meminit

                finit

; Check to see if cot and acot are working properly.

                fld         cotVar
                call        cot
                fst         cotRes
                call        acot
                fstp        acotRes

                printff
                byte        "x=%8.5gf, cot(x)=%8.5gf, acot(cot(x)) = %8.5gf\n",0
                dword       cotVar, cotRes, acotRes

; Check to see if csc and acsc are working properly.

                fld         cscVar
                call        csc
                fst         cscRes
                call        acsc
                fstp        acscRes

                printff
                byte        "x=%8.5gf, csc(x)=%8.5gf, acsc(csc(x)) = %8.5gf\n",0
                dword       cscVar, cscRes, acscRes

; Check to see if sec and asec are working properly.

                fld         secVar
                call        sec
                fst         secRes
                call        asec
                fstp        asecRes

                printff
                byte        "x=%8.5gf, sec(x)=%8.5gf, asec(sec(x)) = %8.5gf\n",0
                dword       secVar, secRes, asecRes

; Check to see if sin and asin are working properly.

                fld         sinVar
                fsin
                fst         sinRes
                call        asin
                fstp        asinRes

                printff
                byte        "x=%8.5gf, sin(x)=%8.5gf, asin(sin(x)) = %8.5gf\n",0
                dword       sinVar, sinRes, asinRes

; Check to see if cos and acos are working properly.

                fld         cosVar
                fcos
                fst         cosRes
                call        acos
                fstp        acosRes

                printff
                byte        "x=%8.5gf, cos(x)=%8.5gf, acos(cos(x)) = %8.5gf\n",0
                dword       cosVar, cosRes, acosRes

; Check to see if 2**x and lg(x) are working properly.

                fld         Two2xVar
                call        TwoToX
                fst         Two2xRes
                fld1
                fxch
                fyl2x
                fstp        lgxRes

                printff
                byte        "x=%8.5gf, 2**x =%8.5gf, lg(2**x) = %8.5gf\n",0
                dword       Two2xVar, Two2xRes, lgxRes

; Check to see if 10**x and log(x) are working properly.

                fld         Ten2xVar
                call        TenToX
                fst         Ten2xRes
                call        LOG
                fstp        logRes

                printff
                byte        "x=%8.5gf, 10**x =%8.2gf, log(10**x) = %8.5gf\n",0
                dword       Ten2xVar, Ten2xRes, logRes

; Check to see if exp(x) and ln(x) are working properly.

                fld         expVar
                call        exp
                fst         expRes
                call        ln
                fstp        lnRes

                printff
                byte        "x=%8.5gf, e**x =%8.2gf, ln(e**x) = %8.5gf\n",0
                dword       expVar, expRes, lnRes

; Check to see if y**x is working properly.

                fld         Y2Xy
                fld         Y2Xx
                call        YtoX
                fstp        Y2XRes

                printff
                byte        "x=%8.5gf, y =%8.5gf, y**x = %8.4gf\n",0
                dword       Y2Xx, Y2Xy, Y2XRes

Quit:           ExitPgm
Main            endp
cseg            ends

sseg            segment     para stack 'stack'
stk             byte        1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment     para public 'zzzzzz'
LastBytes       byte        16 dup (?)
zzzzzzseg       ends
                end         Main
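The algebraic identities the routines above rely on can be checked numerically without an FPU listing. The following Python sketch is only a model: the function names are hypothetical, math.atan2(y, 1.0) stands in for the fld1/fpatan pair, math.expm1 stands in for f2xm1, and math.ldexp stands in for fscale. Note that the square root in the asin and acos identities discards the sign of x, so the sketch only exercises non-negative inputs.

```python
import math

LN2 = math.log(2.0)

def asin_id(x):            # asin(x) = atan(sqrt(x*x/(1-x*x)))
    return math.atan2(math.sqrt(x * x / (1.0 - x * x)), 1.0)

def acos_id(x):            # acos(x) = atan(sqrt((1-x*x)/(x*x)))
    return math.atan2(math.sqrt((1.0 - x * x) / (x * x)), 1.0)

def acot_id(x):            # acot(x) = atan(1/x), computed as fpatan does: atan2(1, x)
    return math.atan2(1.0, x)

def acsc_id(x):            # acsc(x) = atan(sqrt(1/(x*x-1)))
    return math.atan2(math.sqrt(1.0 / (x * x - 1.0)), 1.0)

def asec_id(x):            # asec(x) = atan(sqrt(x*x-1))
    return math.atan2(math.sqrt(x * x - 1.0), 1.0)

def two_to_x(x):           # 2**x = 2**int(x) * 2**frac(x)
    ipart = math.trunc(x)                      # truncating round, as TwoToX sets up
    frac = x - ipart
    two_frac = math.expm1(frac * LN2) + 1.0    # (2**frac - 1) + 1, like f2xm1 / fld1 / fadd
    return math.ldexp(1.0, ipart) * two_frac   # 1 * 2**int(x), like fld1 / fscale

def y_to_x(y, x):          # y**x = 2**(x * lg(y))
    return two_to_x(x * math.log2(y))

results = {
    "asin": asin_id(math.sin(0.75)),
    "acos": acos_id(math.cos(0.25)),
    "acot": acot_id(1.0 / math.tan(3.0)),
    "acsc": acsc_id(1.0 / math.sin(1.5)),
    "asec": asec_id(1.0 / math.cos(0.5)),
    "2**x": two_to_x(-2.5),
    "y**x": y_to_x(3.0, 3.0),
}
```

The test inputs mirror the variables in the data segment of the sample program, so the values in results can be compared by eye against the program's printed round-trip results.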
Sample program output:

x= 3.00000, cot(x)=-7.01525, acot(cot(x)) =  3.00000
x= 1.50000, csc(x)= 1.00251, acsc(csc(x)) =  1.50000
x= 0.50000, sec(x)= 1.13949, asec(sec(x)) =  0.50000
x= 0.75000, sin(x)= 0.68163, asin(sin(x)) =  0.75000
x= 0.25000, cos(x)= 0.96891, acos(cos(x)) =  0.25000
x=-2.50000, 2**x = 0.17677, lg(2**x) = -2.50000
x= 3.75000, 10**x = 5623.41, log(10**x) =  3.75000
x= 3.25000, e**x = 25.79, ln(e**x) =  3.25000
x= 3.00000, y = 3.00000, y**x =  27.0000

14.6 Laboratory Exercises
14.6.1 FPU vs StdLib Accuracy
In this laboratory exercise you will run two programs that perform 20,000,000 floating point additions. These programs do the first 10,000,000 additions using the 80x87 FPU; they do the second 10,000,000 additions using the Standard Library’s floating point routines. This exercise demonstrates the relative accuracy of the two floating point mechanisms.

For your lab report: assemble and run the EX14_1.asm program (it’s on the companion CD-ROM). This program adds together 10,000,000 64-bit floating point values and prints their sum. Describe the results in your lab report. Time these operations and report the time difference in your lab report. Note that the exact sum these operations should produce is 1.00000010000e+0000.

After running Ex14_1.asm, repeat this process for the Ex14_2.asm file. Ex14_2 differs from Ex14_1 insofar as Ex14_2 lets the Standard Library routines operate on 80-bit memory operands (the FPU cannot operate on 80-bit memory operands, so this part remains unchanged). Time the execution of Ex14_2’s two components. Compare these times against the running time of Ex14_1 and explain any differences.
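Before running the lab programs, it may help to see why repeatedly adding 1.0e-14 to 1.0 cannot produce the exact sum in a 64-bit format. The Python sketch below is a rough model, not a substitute for the exercise: Python floats are IEEE 64-bit doubles, the function name is hypothetical, and the model rounds every partial sum to 64 bits much as storing back into a real8 variable does. It says nothing about what the Standard Library routines will print.

```python
def accumulate(count: int, start: float = 1.0, small: float = 1.0e-14) -> float:
    """Add 'small' to 'start' count times, rounding to 64 bits at every step."""
    total = start
    for _ in range(count):
        total += small
    return total

ULP = 2.0 ** -52                       # spacing of 64-bit values between 1.0 and 2.0

# 1.0e-14 is about 45.036 ULPs, so each addition rounds to exactly 45 ULPs.
summed = accumulate(1000)
one_shot = 1.0 + 1000 * 1.0e-14        # a single rounding instead of 1000
drift = one_shot - summed              # error built up by the repeated rounding
```

Each addition loses about 0.036 of a unit in the last place, and those losses accumulate in one direction, which is why the running total falls slightly short of the exact sum.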
; EX14_1.asm
;
; This program runs some tests to determine how well the floating point
; arithmetic in the Standard Library compares with the floating point
; arithmetic on the 80x87. It does this performing various operations
; using both methods and comparing the result.
;
; Of course, you must have an 80x87 FPU (or 80486 or later processor)
; in order to run this code.

                .386
                option      segment:use16

                include     stdlib.a
                includelib  stdlib.lib

dseg            segment     para public 'data'

; Since this is an accuracy test, this code uses REAL8 values for
; all operations

slValue1        real8       1.0
slSmallVal      real8       1.0e-14

Value1          real8       1.0
SmallVal        real8       1.0e-14

Buffer          byte        20 dup (0)

dseg            ends

cseg            segment     para public 'code'
                assume      cs:cseg, ds:dseg

Main            proc
                mov         ax, dseg
                mov         ds, ax
                mov         es, ax
                meminit

                finit                   ;Initialize the FPU

; Do 10,000,000 floating point additions:

                printff
                byte        "Adding 10,000,000 FP values together with the "
                byte        "FPU",cr,lf,0

                mov         ecx, 10000000
FPLoop:         fld         Value1
                fld         SmallVal
                fadd
                fstp        Value1
                dec         ecx
                jnz         FPLoop

                printff
                byte        "Result = %20GE\n",cr,lf,0
                dword       Value1

; Do 10,000,000 floating point additions with the Standard Library fpadd
; routine:

                printff
                byte        cr,lf
                byte        "Adding 10,000,000 FP values together with the "
                byte        "StdLib", cr,lf
                byte        "Note: this may take a few minutes to run, don't "
                byte        "get too impatient"
                byte        cr,lf,0

                mov         ecx, 10000000
SLLoop:         lesi        slValue1
                ldfpa
                lesi        slSmallVal
                ldfpo
                fpadd
                lesi        slValue1
                sdfpa
                dec         ecx
                jnz         SLLoop

                printff
                byte        "Result = %20GE\n",cr,lf,0
                dword       slValue1

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment     para stack 'stack'
stk             db          1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment     para public 'zzzzzz'
LastBytes       db          16 dup (?)
zzzzzzseg       ends
                end         Main
; EX14_2.asm
;
; This program runs some tests to determine how well the floating point
; arithmetic in the Standard Library compares with the floating point
; arithmetic on the 80x87. It lets the standard library routines use
; the full 80-bit format since they allow it and the FPU does not.
;
; Of course, you must have an 80x87 FPU (or 80486 or later processor)
; in order to run this code.

                .386
                option      segment:use16

                include     stdlib.a
                includelib  stdlib.lib

dseg            segment     para public 'data'

slValue1        real10      1.0
slSmallVal      real10      1.0e-14

Value1          real8       1.0
SmallVal        real8       1.0e-14

Buffer          byte        20 dup (0)

dseg            ends

cseg            segment     para public 'code'
                assume      cs:cseg, ds:dseg

Main            proc
                mov         ax, dseg
                mov         ds, ax
                mov         es, ax
                meminit
                finit                   ;Initialize the FPU

; Do 10,000,000 floating point additions:

                printff
                byte        "Adding 10,000,000 FP values together with the "
                byte        "FPU",cr,lf,0

                mov         ecx, 10000000
FPLoop:         fld         Value1
                fld         SmallVal
                fadd
                fstp        Value1
                dec         ecx
                jnz         FPLoop

                printff
                byte        "Result = %20GE\n",cr,lf,0
                dword       Value1

; Do 10,000,000 floating point additions with the Standard Library fpadd
; routine:

                printff
                byte        cr,lf
                byte        "Adding 10,000,000 FP values together with the "
                byte        "StdLib", cr,lf
                byte        "Note: this may take a few minutes to run, don't "
                byte        "get too impatient"
                byte        cr,lf,0

                mov         ecx, 10000000
SLLoop:         lesi        slValue1
                lefpa
                lesi        slSmallVal
                lefpo
                fpadd
                lesi        slValue1
                sefpa
                dec         ecx
                jnz         SLLoop

                printff
                byte        "Result = %20LE\n",cr,lf,0
                dword       slValue1

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment     para stack 'stack'
stk             db          1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment     para public 'zzzzzz'
LastBytes       db          16 dup (?)
zzzzzzseg       ends
                end         Main
14.7 Programming Projects

14.8 Summary
For many applications integer arithmetic has two insurmountable drawbacks – it is not easy to represent fractional values with integers and integers have a limited dynamic range. Floating point arithmetic provides an approximation to real arithmetic that overcomes these two limitations. Floating point arithmetic, however, is not without its own problems. Floating point arithmetic suffers from limited precision. As a result, inaccuracies can creep into a calculation. Therefore, floating point arithmetic does not completely follow normal algebraic rules. There are five very important rules to keep in mind when using floating point arithmetic: (1) The order of evaluation can affect the accuracy of the result; (2) Whenever adding and subtracting numbers, the accuracy of the result may be less than the precision provided by the floating point format; (3) When performing a chain of calculations involving addition, subtraction, multiplication, and division, try to perform the multiplication and division operations first; (4) When multiplying and dividing values, try to multiply large and small numbers together first and try to divide numbers with the same relative magnitude first; (5) When comparing two floating point numbers, always keep in mind that errors can creep into the computations, therefore you should check to see if one value is within a certain range of the other. For more information, see
• “The Mathematics of Floating Point Arithmetic” on page 771
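Rules (1) and (5) are easy to demonstrate with 64-bit values. The following Python sketch uses IEEE double precision (Python's float type); the variable names, the helper function, and the tolerance of 1e-9 are arbitrary choices made for illustration.

```python
big, small = 1.0e16, 1.0

# Rule 1: the order of evaluation changes the answer.
add_first = (big + small) - big      # small is lost when added to big
sub_first = (big - big) + small      # small survives

# Rule 5: compare within a range rather than for exact equality.
def nearly_equal(a: float, b: float, eps: float = 1e-9) -> bool:
    """Treat a and b as equal if they differ by no more than eps."""
    return abs(a - b) <= eps

tenths = 0.1 + 0.2                   # not exactly 0.3 in binary floating point
exact_match = (tenths == 0.3)
close_match = nearly_equal(tenths, 0.3)
```

Here add_first is 0.0 while sub_first is 1.0, even though both expressions are algebraically identical; likewise the exact comparison fails while the tolerance-based one succeeds.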
Early on Intel recognized the need for a hardware floating point unit. They hired three mathematicians to design highly accurate floating point formats and algorithms for their 80x87 family of FPUs. These formats, with slight modifications, became the IEEE 754 and IEEE 854 floating point standards. The IEEE standard actually provides for three different formats: a 32 bit single precision format, a 64 bit double precision format, and an extended precision format. Intel implemented the extended precision format using 80 bits⁹. The 32 bit format uses a 24 bit mantissa (the H.O. bit is an implied one and is not stored in the 32 bits), an eight bit bias 127 exponent, and a one bit sign. The 64 bit format provides a 53 bit mantissa (again, the H.O. bit is always one and is not stored in the 64 bit value), an 11 bit excess 1023 exponent, and a one bit sign. The 80 bit extended precision format uses a 64 bit mantissa, a 15 bit excess 16383 exponent, and a single bit sign. For more information, see
• “IEEE Floating Point Formats” on page 774
9. The IEEE standard only requires that the extended precision format contain more bits than the double precision format.
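The 64 bit layout described above (one sign bit, an 11 bit excess 1023 exponent, and 52 stored mantissa bits plus the implied H.O. one) can be inspected directly. This Python sketch is a small illustration; the function name is hypothetical.

```python
import struct

def decompose_double(value: float):
    """Split a 64-bit double into (sign, biased exponent, stored mantissa bits)."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    sign = bits >> 63                       # one bit sign
    biased_exponent = (bits >> 52) & 0x7FF  # 11 bit excess 1023 exponent
    fraction = bits & ((1 << 52) - 1)       # 52 stored bits; the H.O. one is implied
    return sign, biased_exponent, fraction

one = decompose_double(1.0)       # 1.0  = +1.0  * 2**0
neg = decompose_double(-2.5)      # -2.5 = -1.25 * 2**1
```

For 1.0 the stored exponent is exactly the bias (1023) and the stored mantissa bits are all zero, since the leading one is implied rather than stored.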
Although 80x87 FPUs and CPUs with built-in FPUs (80486 and Pentium) are becoming very common, it is still possible that you may need to execute code that uses floating point arithmetic on a machine without an FPU. In such cases you will need to supply software routines to execute the floating point arithmetic. Fortunately, the UCR Standard Library provides a set of floating point routines you can call. The Standard Library includes routines to load and store floating point values, convert between integer and floating point formats, add, subtract, multiply, and divide floating point values, convert between ASCII and floating point, and output floating point values. Even if you have an FPU installed, the Standard Library’s conversion and output routines are quite useful. For more information, see
• “The UCR Standard Library Floating Point Routines” on page 777
For fast floating point arithmetic, software doesn’t stand a chance against hardware. The 80x87 FPUs provide fast and convenient floating point operations by extending the 80x86’s instruction set to handle floating point arithmetic. In addition to the new instructions, the 80x87 FPUs also provide eight new data registers, a control register, a status register, and several other internal registers. The FPU data registers, unlike the 80x86’s general purpose registers, are organized as a stack. Although it is possible to manipulate the registers as though they were a standard register file, most FPU applications use the stack mechanism when computing floating point results. The FPU control register lets you initialize the 80x87 FPU in one of several different modes. The control register lets you set the rounding control, the precision available during computation, and choose which exceptions can cause an interrupt. The 80x87 status register reports the current state of the FPU. This register provides bits that determine if the FPU is currently busy, determine if a previous instruction has generated an exception, determine the physical register number of the top of the register stack, and provide the FPU condition codes. For more information on the 80x87 register set, see
• “The 80x87 Floating Point Coprocessors” on page 781
• “FPU Registers” on page 781
• “The FPU Data Registers” on page 782
• “The FPU Control Register” on page 782
• “The FPU Status Register” on page 785
In addition to the IEEE single, double, and extended precision data types, the 80x87 FPUs also support various integer and BCD data types. The FPU will automatically convert to and from these data types when loading and storing such values. For more information on these data type formats, see
• “FPU Data Types” on page 788
The 80x87 FPUs provide a wide range of floating point operations by augmenting the 80x86’s instruction set. We can classify the FPU instructions into eight categories: data movement instructions, conversions, arithmetic instructions, comparison instructions, constant instructions, transcendental instructions, miscellaneous instructions, and integer instructions. For more information on these instruction types, see
• “The FPU Instruction Set” on page 789
• “FPU Data Movement Instructions” on page 789
• “Conversions” on page 791
• “Arithmetic Instructions” on page 792
• “Comparison Instructions” on page 797
• “Constant Instructions” on page 798
• “Transcendental Instructions” on page 799
• “Miscellaneous instructions” on page 800
• “Integer Operations” on page 803
Although the 80387 and later FPUs provide a rich set of transcendental functions, there are many trigonometric, inverse trigonometric, exponential, and logarithmic functions missing from the instruction set. However, the missing functions are easy to synthesize using algebraic identities. This chapter provides source code for many of these routines as an example of FPU programming. For more information, see
• “Sample Program: Additional Trigonometric Functions” on page 804

14.9 Questions
1) Why don’t the normal rules of algebra apply to floating point arithmetic?

2) Give an example of a sequence of operations whose order of evaluation will produce different results with finite precision arithmetic.

3) Explain why limited precision addition and subtraction operations can cause a loss of precision during a calculation.

4) Why should you, if at all possible, perform multiplications and divisions first in a calculation involving multiplication or division as well as addition or subtraction?

5) Explain the difference between a normalized, unnormalized, and denormalized floating point value.

6) Using the UCR Standard Library, convert the following expression to 80x86 assembly code (assume all variables are 64 bit double precision values). Be sure to perform any necessary algebraic manipulations to ensure the maximum accuracy. You can assume all variables fall in the range ±1e-10…±1e+10.
a) Z := X * X + Y * Y
b) Z := (X-Y)*Z
c) Z := X*Y - X/Y
d) Z := (X+Y)/(X-Y)
e) Z := (X*X)/(Y*Y)
f) Z := X*X + Y + 1.0

7) Convert the above statements to 80x87 FPU code.
8) The following problems provide definitions for the hyperbolic trigonometric functions. Encode each of these using the 80x87 FPU instructions and the exp(x) and ln(x) routines provided in this chapter.
a) $\sinh x = \dfrac{e^{x} - e^{-x}}{2}$
b) $\cosh x = \dfrac{e^{x} + e^{-x}}{2}$
c) $\tanh x = \dfrac{\sinh x}{\cosh x}$
d) $\operatorname{csch} x = \dfrac{1}{\sinh x}$
e) $\operatorname{sech} x = \dfrac{1}{\cosh x}$
f) $\coth x = \dfrac{\cosh x}{\sinh x}$
g) $\operatorname{asinh} x = \ln\left(x + \sqrt{x^{2} + 1}\right)$
h) $\operatorname{acosh} x = \ln\left(x + \sqrt{x^{2} - 1}\right)$
i) $\operatorname{atanh} x = \dfrac{\ln\left(\dfrac{1+x}{1-x}\right)}{2}$
j) $\operatorname{acsch} x = \ln\left(\dfrac{1 \pm \sqrt{1 + x^{2}}}{x}\right)$
k) $\operatorname{asech} x = \ln\left(\dfrac{1 \pm \sqrt{1 - x^{2}}}{x}\right)$
l) $\operatorname{acoth} x = \dfrac{\ln\left(\dfrac{x+1}{x-1}\right)}{2}$

9) Create a log(x,y) function which computes $\log_y x$. The algebraic identity for this is
$\log_y x = \dfrac{\log_2 x}{\log_2 y}$
10) Interval arithmetic involves performing a calculation with every result rounded down and then repeating the computation with each result rounded up. At the end of these two computations, you know that the true result must lie between the two computed results. The rounding control bits in the FPU control register let you select round up and round down modes. Repeat question six applying interval arithmetic and compute the two bounds for each of those problems (a-f).

11) The mantissa precision control bits in the FPU control register simply control where the FPU rounds results. Selecting a lower precision does not improve the performance of the FPU. Therefore, any new software you write should set these two bits to ones to get 64 bits of precision when performing calculations. Can you provide one reason why you might want to set the precision to something other than 64 bits?

12) Suppose you have two 64 bit variables, X and Y, that you want to compare to see if they are equal. As you know, you should not compare them directly to see if they are equal, but rather see if they are less than some small value apart. Suppose ε, the error constant, is 1e-300. Provide the code to load ax with zero if X=Y and load ax with one if X≠Y.

13) Repeat problem 12, except test for:
a) X ≤ Y
b) X < Y
c) X ≥ Y
d) X > Y
e) X ≠ Y

14) What instruction can you use to see if the value in st(0) is denormalized?

15) Assuming no stack underflow or overflow, what is the C1 condition code bit usually used for?

16) Many texts, when describing the FPU chip, suggest that you can use the FPU to perform integer arithmetic. An argument generally given is that the FPU can support 64 bit integers whereas the CPU can only support 16 or 32 bit integers. What is wrong with this argument? Why would you not want to use the FPU to perform integer arithmetic? Why does the FPU even provide integer instructions?

17) Suppose you have a 64 bit double precision floating point value in memory. Describe how you could take the absolute value of this variable without using the FPU (i.e., by using only 80x86 instructions).

18) Explain how to change the sign of the variable in question 17.

19) Why does the TwoToX function (see “Sample Program: Additional Trigonometric Functions” on page 804) have to compute the result using fscale and f2xm1? Why can’t it use f2xm1 alone?

20) Explain a possible problem with the following code sequence:
        fstp    mem_64
        xor     byte ptr mem_64+7, 80h      ;Tweak sign bit
Strings and Character Sets
Chapter 15
A string is a collection of objects stored in contiguous memory locations. Strings are usually arrays of bytes, words, or (on 80386 and later processors) double words. The 80x86 microprocessor family supports several instructions specifically designed to cope with strings. This chapter explores some of the uses of these string instructions. The 8088, 8086, 80186, and 80286 can process two types of strings: byte strings and word strings. The 80386 and later processors also handle double word strings. They can move strings, compare strings, search for a specific value within a string, initialize a string to a fixed value, and do other primitive operations on strings. The 80x86’s string instructions are also useful for manipulating arrays, tables, and records. You can easily assign or compare such data structures using the string instructions. Using string instructions may speed up your array manipulation code considerably.
15.0 Chapter Overview
This chapter presents a review of the operation of the 80x86 string instructions. Then it discusses how to process character strings using these instructions. Finally, it concludes by discussing the string instructions available in the UCR Standard Library. The sections below that have a “•” prefix are essential. Those sections with a “❏” discuss advanced topics that you may want to put off for a while.
• The 80x86 string instructions.
• Character strings.
• Character string functions.
• String functions in the UCR Standard Library.
❏ Using the string instructions on other data types.

15.1 The 80x86 String Instructions
All members of the 80x86 family support five different string instructions: movs, cmps, scas, lods, and stos¹. They are the string primitives since you can build most other string operations from these five instructions. How you use these five instructions is the topic of the next several sections.
15.1.1 How the String Instructions Operate
The string instructions operate on blocks (contiguous linear arrays) of memory. For example, the movs instruction moves a sequence of bytes from one memory location to another. The cmps instruction compares two blocks of memory. The scas instruction scans a block of memory for a particular value. These string instructions often require three operands, a destination block address, a source block address, and (optionally) an element count. For example, when using the movs instruction to copy a string, you need a source address, a destination address, and a count (the number of string elements to move). Unlike other instructions which operate on memory, the string instructions are single-byte instructions which don’t have any explicit operands. The operands for the string instructions include
• the si (source index) register,
• the di (destination index) register,
• the cx (count) register,
• the ax register, and
• the direction flag in the FLAGS register.

1. The 80186 and later processors support two additional string instructions, INS and OUTS, which input strings of data from an input port or output strings of data to an output port. We will not consider these instructions in this chapter.
For example, one variant of the movs (move string) instruction copies a string from the source address specified by ds:si to the destination address specified by es:di, of length cx. Likewise, the cmps instruction compares the string pointed at by ds:si, of length cx, to the string pointed at by es:di. Not all instructions have source and destination operands (only movs and cmps support them). For example, the scas instruction (scan a string) compares the value in the accumulator to values in memory. Despite their differences, the 80x86’s string instructions all have one thing in common – using them requires that you deal with two segments, the data segment and the extra segment.
15.1.2 The REP/REPE/REPZ and REPNZ/REPNE Prefixes
The string instructions, by themselves, do not operate on strings of data. The movs instruction, for example, will move a single byte, word, or double word. When executed by itself, the movs instruction ignores the value in the cx register. The repeat prefixes tell the 80x86 to do a multi-byte string operation. The syntax for the repeat prefix is:

Field:
Label   repeat  mnemonic  operand       ;comment

For MOVS:
        rep     movs      {operands}

For CMPS:
        repe    cmps      {operands}
        repz    cmps      {operands}
        repne   cmps      {operands}
        repnz   cmps      {operands}

For SCAS:
        repe    scas      {operands}
        repz    scas      {operands}
        repne   scas      {operands}
        repnz   scas      {operands}

For STOS:
        rep     stos      {operands}
You don’t normally use the repeat prefixes with the lods instruction. As you can see, the presence of the repeat prefixes introduces a new field in the source line – the repeat prefix field. This field appears only on source lines containing string instructions. In your source file:
• the label field should always begin in column one,
• the repeat field should begin at the first tab stop, and
• the mnemonic field should begin at the second tab stop.
When specifying the repeat prefix before a string instruction, the string instruction repeats cx times2. Without the repeat prefix, the instruction operates only on a single byte, word, or double word.
2. Except for the cmps instruction which repeats at most the number of times specified in the cx register.
You can use repeat prefixes to process entire strings with a single instruction. You can use the string instructions, without the repeat prefix, as string primitive operations to synthesize more powerful string operations. The operand field is optional. If present, MASM simply uses it to determine the size of the string to operate on. If the operand field is the name of a byte variable, the string instruction operates on bytes. If the operand is a word address, the instruction operates on words. Likewise for double words. If the operand field is not present, you must append a “B”, “W”, or “D” to the end of the string instruction to denote the size, e.g., movsb, movsw, or movsd.
15.1.3
The Direction Flag

Besides the si, di, cx, and ax registers, one other register controls the 80x86’s string instructions: the flags register. Specifically, the direction flag in the flags register controls how the CPU processes strings. If the direction flag is clear, the CPU increments si and di after operating upon each string element. For example, if the direction flag is clear, then executing movs will move the byte, word, or double word at ds:si to es:di and will increment si and di by one, two, or four. When specifying the rep prefix before this instruction, the CPU increments si and di for each element in the string. At completion, the si and di registers will be pointing at the first item beyond the string.

If the direction flag is set, then the 80x86 decrements si and di after processing each string element. After a repeated string operation, the si and di registers will be pointing at the first byte or word before the strings if the direction flag was set.

The direction flag may be set or cleared using the cld (clear direction flag) and std (set direction flag) instructions. When using these instructions inside a procedure, keep in mind that they modify the machine state. Therefore, you may need to save the direction flag during the execution of that procedure. The following example exhibits the kinds of problems you might encounter:

StringStuff:
                cld
                <do some operations>
                call    Str2
                <do some string operations requiring D=0>
                 .
                 .
                 .

Str2            proc    near
                std
                ret
Str2            endp
This code will not work properly. The calling code assumes that the direction flag is clear after Str2 returns. However, this isn’t true. Therefore, the string operations executed after the call to Str2 will not function properly.

There are a couple of ways to handle this problem. The first, and probably the most obvious, is always to insert the cld or std instructions immediately before executing a string instruction. The other alternative is to save and restore the direction flag using the pushf and popf instructions. Using these two techniques, the code above would look like this:

Always issuing cld or std before a string instruction:

StringStuff:
                cld
                <do some operations>
                call    Str2
                cld
                <do some string operations requiring D=0>
                 .
                 .
                 .

Str2            proc    near
                std
                ret
Str2            endp

Saving and restoring the flags register:

StringStuff:
                cld
                <do some operations>
                call    Str2
                <do some string operations requiring D=0>
                 .
                 .
                 .

Str2            proc    near
                pushf
                std
                popf
                ret
Str2            endp
If you use the pushf and popf instructions to save and restore the flags register, keep in mind that you’re saving and restoring all the flags. Therefore, such subroutines cannot return any information in the flags. For example, you will not be able to return an error condition in the carry flag if you use pushf and popf.
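One way around this limitation can be sketched as follows: pop the saved flags into a register, restore only the direction flag by hand, and set the carry flag as the very last step. This is a minimal sketch, not a definitive implementation. The names Str2Status and KeepSet are hypothetical, the routine is assumed to be free to clobber ax, and the body is assumed to leave the direction flag set. Bit 10 (0400h) of the flags word is the direction flag, and cld/std do not change the carry flag.

```asm
; Hypothetical sketch: preserve the caller's direction flag but still
; return a status in the carry flag. Clobbers AX.
Str2Status      proc    near
                pushf                   ;Save the caller's flags.
                std
                ; ... string operations requiring D=1 go here ...
                pop     ax              ;AX := caller's saved flags.
                test    ax, 0400h       ;Bit 10 is the direction flag.
                jnz     KeepSet         ;Caller had D=1, which is still in effect.
                cld                     ;Caller had D=0, so put it back.
KeepSet:        stc                     ;Report an error (use clc for success).
                ret
Str2Status      endp
```

The status must be set after the test instruction, because test clears the carry flag.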
15.1.4
The MOVS Instruction

The movs instruction takes four basic forms. Movs moves bytes, words, or double words, movsb moves byte strings, movsw moves word strings, and movsd moves double word strings (on 80386 and later processors). These four instructions use the following syntax:

{REP}           MOVSB
{REP}           MOVSW
{REP}           MOVSD                   ;Available only on 80386 and later processors
{REP}           MOVS    Dest, Source
The movsb (move string, bytes) instruction fetches the byte at address ds:si, stores it at address es:di, and then increments or decrements the si and di registers by one. If the rep prefix is present, the CPU checks cx to see if it contains zero. If not, then it moves the byte from ds:si to es:di and decrements the cx register. This process repeats until cx becomes zero.

The movsw (move string, words) instruction fetches the word at address ds:si, stores it at address es:di, and then increments or decrements si and di by two. If there is a rep prefix, then the CPU repeats this procedure as many times as specified in cx.

The movsd instruction operates in a similar fashion on double words, incrementing or decrementing si and di by four for each data movement.

MASM automatically figures out the size of the movs instruction by looking at the size of the operands specified. If you’ve defined the two operands with the byte (or comparable) directive, then MASM will emit a movsb instruction. If you’ve declared the two labels via word (or comparable), MASM will generate a movsw instruction. If you’ve declared the two labels with dword, MASM emits a movsd instruction. The assembler will also check the segments of the two operands to ensure they match the current assumptions (via the assume directive) about the es and ds registers.

You should always use the movsb, movsw, and movsd forms and forget about the movs form. Although, in theory, the movs form appears to be an elegant way to handle the move string instruction, in practice it creates more trouble than it’s worth. Furthermore, this form of the move string instruction implies that movs has explicit operands, when, in fact, the si and di registers implicitly specify the operands. For this reason, we’ll always use the movsb, movsw, or movsd instructions.

When used with the rep prefix, the movsb instruction will move the number of bytes specified in the cx register. The following code segment copies 384 bytes from String1 to String2:

                cld
                lea     si, String1
                lea     di, String2
                mov     cx, 384
        rep     movsb
                 .
                 .
                 .
String1         byte    384 dup (?)
String2         byte    384 dup (?)
This code, of course, assumes that String1 and String2 are in the same segment and both the ds and es registers point at this segment. If you substitute movsw for movsb, then the code above will move 384 words (768 bytes) rather than 384 bytes:

                cld
                lea     si, String1
                lea     di, String2
                mov     cx, 384
        rep     movsw
                 .
                 .
                 .
String1         word    384 dup (?)
String2         word    384 dup (?)
Remember, the cx register contains the element count, not the byte count. When using the movsw instruction, the CPU moves the number of words specified in the cx register.

If you’ve set the direction flag before executing a movsb/movsw/movsd instruction, the CPU decrements the si and di registers after moving each string element. This means that the si and di registers must point at the end of their respective strings before issuing a movsb, movsw, or movsd instruction. For example,

                std
                lea     si, String1+383
                lea     di, String2+383
                mov     cx, 384
        rep     movsb
                 .
                 .
                 .
String1         byte    384 dup (?)
String2         byte    384 dup (?)
Although there are times when processing a string from tail to head is useful (see the cmps description in the next section), generally you’ll process strings in the forward direction since it’s more straightforward to do so. There is one class of string operations where being able to process strings in both directions is absolutely mandatory: processing strings when the source and destination blocks overlap. Consider what happens in the following code:

                cld
                lea     si, String1
                lea     di, String2
                mov     cx, 384
        rep     movsb
                 .
                 .
                 .
String1         byte    ?
String2         byte    384 dup (?)
This sequence of instructions treats String1 and String2 as a pair of 384 byte strings. However, the last 383 bytes in the String1 array overlap the first 383 bytes in the String2 array. Let’s trace the operation of this code byte by byte.

When the CPU executes the movsb instruction, it copies the byte at ds:si (String1) to the byte pointed at by es:di (String2). Then it increments si and di, decrements cx by one, and repeats this process. Now the si register points at String1+1 (which is the address of String2) and the di register points at String2+1. The movsb instruction copies the byte pointed at by si to the byte pointed at by di. However, this is the byte originally copied from location String1. So the movsb instruction copies the value originally in location String1 to both locations String2 and String2+1. Again, the CPU increments si and di, decrements cx, and repeats this operation. Now the movsb instruction copies the byte from location String1+2 (String2+1) to location String2+2. But once again, this is the value that originally appeared in location String1. Each repetition of the loop copies the next element in String1 to the next available location in the String2 array. Pictorially, it looks something like that in Figure 15.1.

1st move operation:     X  A  B  C  D  E  F  G  H  I  J  K  L
2nd move operation:     X  X  B  C  D  E  F  G  H  I  J  K  L
3rd move operation:     X  X  X  C  D  E  F  G  H  I  J  K  L
4th move operation:     X  X  X  X  D  E  F  G  H  I  J  K  L
nth move operation:     X  X  X  X  X  X  X  X  X  X  X  X  L

Figure 15.1 Overwriting Data During a Block Move Operation
The end result is that X gets replicated throughout the string. The move instruction copies the source operand into the memory location which will become the source operand for the very next move operation, which causes the replication. If you really want to move one array into another when they overlap, you should move each element of the source string to the destination string starting at the end of the two strings as shown in Figure 15.2.

1st move operation:     X  A  B  C  D  E  F  G  H  I  J  K  L
2nd move operation:     X  A  B  C  D  E  F  G  H  I  J  K  K
3rd move operation:     X  A  B  C  D  E  F  G  H  I  J  J  K
4th move operation:     X  A  B  C  D  E  F  G  H  I  I  J  K
nth move operation:     X  A  A  B  C  D  E  F  G  H  I  J  K

Figure 15.2 Correct Way to Move Data With a Block Move Operation

Setting the direction flag and pointing si and di at the end of the strings will allow you to (correctly) move one string to another when the two strings overlap and the source string begins at a lower address than the destination string. If the two strings overlap and the source string begins at a higher address than the destination string, then clear the direction flag and point si and di at the beginning of the two strings. If the two strings do not overlap, then you can use either technique to move the strings around in memory. Generally, operating with the direction flag clear is the easiest, so that makes the most sense in this case.

You shouldn’t use the movs instruction to fill an array with a single byte, word, or double word value. Another string instruction, stos, is much better suited for this purpose. However, for arrays whose elements are larger than four bytes, you can use the movs instruction to initialize the entire array to the content of the first element. See the questions for additional information.
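The overlap rule described above (copy from the end when the source begins below the destination, copy from the beginning otherwise) can be folded into one routine. The following is only a sketch under stated assumptions: MoveBlock and its labels are hypothetical names, ds and es are assumed to address the same segment so that the offsets in si and di can be compared directly, and the routine is allowed to modify si, di, and cx.

```asm
; Hypothetical sketch: move CX bytes from DS:SI to ES:DI, choosing the
; direction so that overlapping blocks are copied correctly.
; Assumes DS = ES. Modifies SI, DI, and CX; preserves the flags.
MoveBlock       proc    near
                pushf                   ;Preserve the caller's direction flag.
                cmp     si, di
                jae     Forward         ;Source at or above dest: low to high.

                add     si, cx          ;Source below dest: start at the
                dec     si              ; last byte of each block and
                add     di, cx          ; work downward.
                dec     di
                std
        rep     movsb
                popf
                ret

Forward:        cld
        rep     movsb
                popf
                ret
MoveBlock       endp
```

Because pushf and popf bracket the whole routine, the caller gets back whatever direction flag it had, at the cost of not being able to return a status in the flags.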
15.1.5
The CMPS Instruction

The cmps instruction compares two strings. The CPU compares the string referenced by es:di to the string pointed at by ds:si. Cx contains the length of the two strings (when using the repe or repne prefix). Like the movs instruction, the MASM assembler allows several different forms of this instruction:

{REPE}          CMPSB
{REPE}          CMPSW
{REPE}          CMPSD                   ;Available only on 80386 and later
{REPE}          CMPS    dest, source
{REPNE}         CMPSB
{REPNE}         CMPSW
{REPNE}         CMPSD                   ;Available only on 80386 and later
{REPNE}         CMPS    dest, source
Like the movs instruction, the operands present in the operand field of the cmps instruction determine the size of the operands. You specify the actual operand addresses in the si and di registers.

Without a repeat prefix, the cmps instruction subtracts the value at location es:di from the value at ds:si and updates the flags. Other than updating the flags, the CPU doesn’t use the difference produced by this subtraction. After comparing the two locations, cmps increments or decrements the si and di registers by one, two, or four (for cmpsb/cmpsw/cmpsd, respectively). Cmps increments the si and di registers if the direction flag is clear and decrements them otherwise.

Of course, you will not tap the real power of the cmps instruction using it to compare single bytes or words in memory. This instruction shines when you use it to compare whole strings. With cmps, you can compare consecutive elements in a string until you find a match or until consecutive elements do not match.

To compare two strings to see if they are equal or not equal, you must compare corresponding elements in a string until they don’t match. Consider the following strings:

                “String1”
                “String1”

The only way to determine that these two strings are equal is to compare each character in the first string to the corresponding character in the second. After all, the second string could have been “String2” which definitely is not equal to “String1”. Of course, once you encounter a character in the destination string which doesn’t equal the corresponding character in the source string, the comparison can stop. You needn’t compare any other characters in the two strings.

The repe prefix accomplishes this operation. It will compare successive elements in a string as long as they are equal and cx is greater than zero. We could compare the two strings above using the following 80x86 assembly language code:

; Assume both strings are in the same segment and ES and DS
; both point at this segment.

                cld
                lea     si, AdrsString1
                lea     di, AdrsString2
                mov     cx, 7
        repe    cmpsb
After the execution of the cmpsb instruction, you can test the flags using the standard conditional jump instructions. This lets you check for equality, inequality, less than, greater than, etc.

Character strings are usually compared using lexicographical ordering. In lexicographical ordering, the first element of a string (the character at the lowest address) carries the most weight. This is in direct contrast to standard integer comparisons on the 80x86, where the most significant portion of the number, which sits at the highest address, carries the most weight. Furthermore, the length of a string affects the comparison only if the two strings are identical up to the length of the shorter string. For example, “Zebra” is less than “Zebras” because it is the shorter of the two strings; however, “Zebra” is greater than “AAAAAAAAAAH!” even though it is shorter. Lexicographical comparisons compare corresponding elements until encountering a character which doesn’t match, or until encountering the end of the shorter string. If a pair of corresponding characters do not match, then this algorithm compares the two strings based on that single character. If the two strings match up to the length of the shorter string, we must compare their lengths. The two strings are equal if and only if their lengths are equal and each corresponding pair of characters in the two strings is identical. Lexicographical ordering is the standard alphabetical ordering you’ve grown up with.

For character strings, use the cmps instruction in the following manner:

• The direction flag must be cleared before comparing the strings.
• Use the cmpsb instruction to compare the strings on a byte by byte basis. Even if the strings contain an even number of characters, you cannot use the cmpsw instruction. It does not compare strings in lexicographical order.
• The cx register must be loaded with the length of the smaller string.
• Use the repe prefix.
• The ds:si and es:di registers must point at the very first character in the two strings you want to compare.
After the execution of the cmps instruction, if the two strings were equal, their lengths must be compared in order to finish the comparison. The following code compares a couple of character strings:

                cld                             ;Compare in the forward direction.
                lea     si, source
                lea     di, dest
                mov     cx, lengthSource
                mov     ax, lengthDest
                cmp     cx, ax
                jbe     NoSwap                  ;CX already holds the smaller length.
                xchg    ax, cx                  ;Otherwise use the dest length.
NoSwap: repe    cmpsb
                jne     NotEqual
                mov     ax, lengthSource        ;Equal so far, so the lengths decide.
                cmp     ax, lengthDest
NotEqual:
If you’re using bytes to hold the string lengths, you should adjust this code appropriately.

You can also use the cmps instruction to compare multi-word integer values (that is, extended precision integer values). Because of the amount of setup required for a string comparison, this isn’t practical for integer values less than three or four words in length, but for large integer values, it’s an excellent way to compare such values. Unlike character strings, we cannot compare integer strings using a lexicographical ordering. When comparing strings, we compare the characters starting at the lowest address and working toward the highest. When comparing integers, we must compare the values from the most significant byte (or word/double word), which the 80x86 stores at the highest address, down to the least significant byte, word, or double word. So, to compare two eight-word (128-bit) integer values, use the following code on the 80286:

                std
                lea     si, SourceInteger+14
                lea     di, DestInteger+14
                mov     cx, 8
        repe    cmpsw
This code compares the integers from their most significant word down to the least significant word. The cmpsw instruction finishes when the two values are unequal or upon decrementing cx to zero (implying that the two values are equal). Once again, the flags provide the result of the comparison.
The repne prefix will instruct the cmps instruction to compare successive string elements as long as they do not match. Apart from the zero flag, the 80x86 flags are of little use after the execution of this instruction. If the zero flag is clear on exit, no pair of elements matched (the two strings are totally different) and cx is zero. If the zero flag is set, the instruction stopped at the first matching pair and cx contains the number of elements that were still left to compare after that pair. While this form of the cmps instruction isn’t particularly useful for comparing strings, it is useful for locating the first pair of matching items in a couple of byte or word arrays. In general, though, you’ll rarely use the repne prefix with cmps.

One last thing to keep in mind with using the cmps instruction: the value in the cx register determines the number of elements to process, not the number of bytes. Therefore, when using cmpsw, cx specifies the number of words to compare. This, of course, is half the number of bytes to compare.
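As an illustration of the repne form, the following sketch locates the first position at which two byte arrays hold the same value. It is a hedged example, not code from the original text: Array1, Array2, Count, and NoMatch are hypothetical names, and ds and es are assumed to address the segment containing both arrays.

```asm
; Hypothetical sketch: find the first index at which Array1 and Array2
; contain the same byte. Both arrays are Count bytes long.
                cld
                lea     si, Array1
                lea     di, Array2
                mov     cx, Count
                jcxz    NoMatch         ;Nothing to compare; flags would be stale.
        repne   cmpsb                   ;Repeat while the elements differ.
                jne     NoMatch         ;Z=0: ran out of elements without a match.
                dec     si              ;SI was advanced past the matching byte.
                sub     si, offset Array1 ;SI := index of the first matching pair.
                ; ... use the index in SI here ...
NoMatch:
```

The jcxz test matters because a repeated string instruction executes zero times when cx is zero, leaving the flags as they were before the instruction.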
15.1.6
The SCAS Instruction

The cmps instruction compares two strings against one another. You cannot use it to search for a particular element within a string. For example, you could not use the cmps instruction to quickly scan for a zero throughout some other string. You can use the scas (scan string) instruction for this task.

Unlike the movs and cmps instructions, the scas instruction only requires a destination string (es:di) rather than both a source and destination string. The source operand is the value in the al (scasb), ax (scasw), or eax (scasd) register.

The scas instruction, by itself, compares the value in the accumulator (al, ax, or eax) against the value pointed at by es:di and then increments (or decrements) di by one, two, or four. The CPU sets the flags according to the result of the comparison. While this might be useful on occasion, scas is a lot more useful when using the repe and repne prefixes.

When the repe prefix (repeat while equal) is present, scas scans the string searching for an element which does not match the value in the accumulator. When using the repne prefix (repeat while not equal), scas scans the string searching for the first string element which is equal to the value in the accumulator.

You’re probably wondering “why do these prefixes do exactly the opposite of what they ought to do?” The paragraphs above haven’t quite phrased the operation of the scas instruction properly. When using the repe prefix with scas, the 80x86 scans through the string while the value in the accumulator is equal to the string operand. This is equivalent to searching through the string for the first element which does not match the value in the accumulator. The scas instruction with repne scans through the string while the accumulator is not equal to the string operand. Of course, this form searches for the first value in the string which matches the value in the accumulator register.

The scas instruction takes the following forms:

{REPE}          SCASB
{REPE}          SCASW
{REPE}          SCASD                   ;Available only on 80386 and later processors
{REPE}          SCAS    dest
{REPNE}         SCASB
{REPNE}         SCASW
{REPNE}         SCASD                   ;Available only on 80386 and later processors
{REPNE}         SCAS    dest
Like the cmps and movs instructions, the value in the cx register specifies the number of elements to process, not bytes, when using a repeat prefix.
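A common use of repne scasb is finding the length of a zero-terminated string. The sketch below assumes that es already addresses the segment holding a hypothetical string named ZStr and that a zero byte really does occur within 65,535 bytes of it. It is an illustration of the technique rather than a definitive routine.

```asm
; Hypothetical sketch: compute the length of the zero-terminated string ZStr.
                cld
                lea     di, ZStr        ;ES:DI -> first character.
                mov     cx, 0FFFFh      ;Let the scan run as long as it takes.
                xor     al, al          ;Search for a zero byte.
        repne   scasb
                not     cx              ;CX := bytes scanned, including the zero.
                dec     cx              ;CX := string length, excluding the zero.
```

Because cx starts at 0FFFFh and drops by one for every byte scanned, inverting it yields the number of bytes scanned. On exit di points one byte past the terminating zero.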
15.1.7
The STOS Instruction

The stos instruction stores the value in the accumulator at the location specified by es:di. After storing the value, the CPU increments or decrements di depending upon the state of the direction flag. Although the stos instruction has many uses, its primary use is to initialize arrays and strings to a constant value. For example, if you have a 256-byte array you want to clear out with zeros, use the following code:

; Presumably, the ES register already points at the segment
; containing DestString

                cld
                lea     di, DestString
                mov     cx, 128         ;256 bytes is 128 words.
                xor     ax, ax          ;AX := 0
        rep     stosw
This code writes 128 words rather than 256 bytes because a single stosw operation is faster than two stosb operations. On an 80386 or later this code could have written 64 double words to accomplish the same thing even faster.

The stos instruction takes four forms. They are

{REP}           STOSB
{REP}           STOSW
{REP}           STOSD
{REP}           STOS    dest
The stosb instruction stores the value in the al register into the specified memory location(s), the stosw instruction stores the ax register into the specified memory location(s), and the stosd instruction stores eax into the specified location(s). The stos instruction is either an stosb, stosw, or stosd instruction depending upon the size of the specified operand.

Keep in mind that the stos instruction is useful only for initializing a byte, word, or dword array to a constant value. If you need to initialize an array to different values, you cannot use the stos instruction. You can use movs in such a situation; see the exercises for additional details.
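The word-at-a-time trick also works when the byte count may be odd. The following sketch fills a block mostly with stosw and finishes with at most one stosb. The names DestString and Count and the choice of a space as the fill value are assumptions for illustration, and es is assumed to address the destination segment.

```asm
; Hypothetical sketch: fill Count bytes at ES:DI with a single byte value.
                cld
                lea     di, DestString
                mov     al, ' '         ;Fill value (a space, for illustration).
                mov     ah, al          ;AX now holds two copies of it.
                mov     cx, Count       ;Count is a hypothetical byte count.
                shr     cx, 1           ;CX := word count, carry := leftover byte.
        rep     stosw                   ;String stores leave the carry intact.
                adc     cx, cx          ;CX is zero here, so CX := carry.
        rep     stosb                   ;Store the odd byte, if there is one.
```

This relies on the fact that neither stos nor the repeat prefix modifies the flags, so the carry produced by shr survives until the adc instruction.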
15.1.8
The LODS Instruction

The lods instruction is unique among the string instructions. You will never use a repeat prefix with this instruction. The lods instruction copies the byte, word, or double word pointed at by ds:si into the al, ax, or eax register, after which it increments or decrements the si register by one, two, or four. Repeating this instruction via the repeat prefix would serve no purpose whatsoever since the accumulator register will be overwritten each time the lods instruction repeats. At the end of the repeat operation, the accumulator will contain the last value read from memory.

Instead, use the lods instruction to fetch bytes (lodsb), words (lodsw), or double words (lodsd) from memory for further processing. By combining it with the stos instruction, you can synthesize powerful string operations.

Like the stos instruction, the lods instruction takes four forms:

{REP}           LODSB
{REP}           LODSW
{REP}           LODSD                   ;Available only on 80386 and later
{REP}           LODS    source
As mentioned earlier, you’ll rarely, if ever, use the rep prefixes with these instructions3. The 80x86 increments or decrements si by one, two, or four depending on the direction flag and whether you’re using the lodsb, lodsw, or lodsd instruction.
3. They appear here simply because they are allowed. They’re not useful, but they are allowed.
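A typical use of lods is inside an ordinary loop that does something with each element it fetches. The sketch below adds up the bytes of a buffer. Buffer, BufLen, SumLoop, and SumDone are hypothetical names, ds is assumed to address the buffer's segment, and the 16-bit sum in bx simply wraps around on overflow.

```asm
; Hypothetical sketch: add up the BufLen bytes stored at Buffer.
                cld
                lea     si, Buffer
                mov     cx, BufLen
                xor     bx, bx          ;BX := running sum.
                xor     ax, ax          ;Clear AH; lodsb only loads AL.
                jcxz    SumDone         ;loop would run 65,536 times if CX = 0.
SumLoop:        lodsb                   ;AL := next byte, SI := SI + 1.
                add     bx, ax          ;Accumulate the zero-extended byte.
                loop    SumLoop
SumDone:
```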
15.1.9
Building Complex String Functions from LODS and STOS

The 80x86 supports only five different string instructions: movs, cmps, scas, lods, and stos4. These certainly aren’t the only string operations you’ll ever want to use. However, you can use the lods and stos instructions to easily generate any particular string operation you like. For example, suppose you wanted a string operation that converts all the upper case characters in a string to lower case. You could use the following code:

; Presumably, ES and DS have been set up to point at the same
; segment, the one containing the string to convert.

                cld                             ;Process the string low to high.
                lea     si, String2Convert
                mov     di, si
                mov     cx, LengthOfString
Convert2Lower:
                lodsb                           ;Get next char in str.
                cmp     al, 'A'                 ;Is it upper case?
                jb      NotUpper
                cmp     al, 'Z'
                ja      NotUpper
                or      al, 20h                 ;Convert to lower case.
NotUpper:       stosb                           ;Store into destination.
                loop    Convert2Lower
Assuming you’re willing to waste 256 bytes for a table, this conversion operation can be sped up somewhat using the xlat instruction:

; Presumably, ES and DS have been set up to point at the same
; segment, the one containing the string to be converted.

                cld
                lea     si, String2Convert
                mov     di, si
                mov     cx, LengthOfString
                lea     bx, ConversionTable
Convert2Lower:
                lodsb                           ;Get next char in str.
                xlat                            ;Convert as appropriate.
                stosb                           ;Store into destination.
                loop    Convert2Lower
The conversion table, of course, would contain the index into the table at each location except at offsets 41h..5Ah. At these locations the conversion table would contain the values 61h..7Ah (i.e., at indexes ‘A’..‘Z’ the table would contain the codes for ‘a’..‘z’). Since the lods and stos instructions use the accumulator as an intermediary, you can use any accumulator operation to quickly manipulate string elements.
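The table described above could be declared with data directives, but it can also be built at run time with stosb. The following sketch does the latter under a few assumptions: ConversionTable is a 256-byte array, es addresses its segment, and the loop labels IdentLp and LowerLp are hypothetical.

```asm
; Hypothetical sketch: build the 256-byte ConversionTable at run time.
                cld
                lea     di, ConversionTable
                xor     al, al                  ;First entry is 0.
                mov     cx, 256
IdentLp:        stosb                           ;Table[i] := i
                inc     al
                loop    IdentLp

                lea     di, ConversionTable+41h ;Entry for 'A'.
                mov     al, 'a'
                mov     cx, 26
LowerLp:        stosb                           ;Table['A'+i] := 'a'+i
                inc     al
                loop    LowerLp
```

The first loop fills the table with the identity mapping and the second overwrites the 26 entries for the upper case letters, so xlat leaves every other character unchanged.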
15.1.10
Prefixes and the String Instructions

The string instructions will accept segment prefixes, lock prefixes, and repeat prefixes. In fact, you can specify all three types of instruction prefixes should you so desire. However, due to a bug in the earlier 80x86 chips (pre-80386), you should never use more than a single prefix (repeat, lock, or segment override) on a string instruction unless your code will only run on later processors; a likely event these days. If you absolutely must use two or more prefixes and need to run on an earlier processor, make sure you turn off the interrupts while executing the string instruction.
4. Not counting INS and OUTS which we’re ignoring here.
15.2
Character Strings

Since you’ll encounter character strings more often than other types of strings, they deserve special attention. The following sections describe character strings and various types of string operations.
15.2.1
Types of Strings

At the most basic level, the 80x86’s string instructions only operate upon arrays of characters. However, since most string data types contain an array of characters as a component, the 80x86’s string instructions are handy for manipulating that portion of the string.

Probably the biggest difference between a character string and an array of characters is the length attribute. An array of characters contains a fixed number of characters. Never any more, never any less. A character string, however, has a dynamic run-time length, that is, the number of characters contained in the string at some point in the program. Character strings, unlike arrays of characters, have the ability to change their size during execution (within certain limits, of course).

To complicate things even more, there are two generic types of strings: statically allocated strings and dynamically allocated strings. Statically allocated strings are given a fixed, maximum length at program creation time. The length of the string may vary at run-time, but only between zero and this maximum length. Most systems allocate and deallocate dynamically allocated strings in a memory pool when using strings. Such strings may be any length (up to some reasonable maximum value). Accessing such strings is less efficient than accessing statically allocated strings. Furthermore, garbage collection5 may take additional time. Nevertheless, dynamically allocated strings are much more space efficient than statically allocated strings and, in some instances, accessing dynamically allocated strings is faster as well. Most of the examples in this chapter will use statically allocated strings.

A string with a dynamic length needs some way of keeping track of this length. While there are several possible ways to represent string lengths, the two most popular are length-prefixed strings and zero-terminated strings.

A length-prefixed string consists of a single byte or word that contains the length of that string. Immediately following this length value are the characters that make up the string. Assuming the use of byte prefix lengths, you could define the string “HELLO” as follows:

HelloStr        byte    5,"HELLO"
Length-prefixed strings are often called Pascal strings since this is the type of string variable supported by most versions of Pascal6.

Another popular way to specify string lengths is to use zero-terminated strings. A zero-terminated string consists of a string of characters terminated with a zero byte. These types of strings are often called C-strings since they are the type used by the C/C++ programming language. The UCR Standard Library, since it mimics the C standard library, also uses zero-terminated strings.

Pascal strings are much better than C/C++ strings for several reasons. First, computing the length of a Pascal string is trivial. You need only fetch the first byte (or word) of the string and you’ve got the length of the string. Computing the length of a C/C++ string is considerably less efficient. You must scan the entire string (e.g., using the scasb instruction) for a zero byte. If the C/C++ string is long, this can take a long time. Furthermore, C/C++ strings cannot contain the NULL character. On the other hand, C/C++ strings can be any length, yet require only a single extra byte of overhead. Pascal strings, however, can be no longer than 255 characters when using only a single length byte. For strings longer than 255 bytes, you’ll need two bytes to hold the length for a Pascal string. Since most strings are less than 256 characters in length, this isn’t much of a disadvantage.

5. Reclaiming unused storage.
6. At least those versions of Pascal which support strings.

An advantage of zero-terminated strings is that they are easy to use in an assembly language program. This is particularly true of strings that are so long they require multiple source code lines in your assembly language programs. Counting up every character in a string is so tedious that it’s not even worth considering. However, you can write a macro which will easily build Pascal strings for you:

PString         macro   String
                local   StringLength, StringStart
                byte    StringLength
StringStart     byte    String
StringLength    =       $-StringStart
                endm
                 .
                 .
                 .
                PString "This string has a length prefix"
As long as the string fits entirely on one source line, you can use this macro to generate Pascal style strings. Common string functions like concatenation, length, substring, index, and others are much easier to write when using length-prefixed strings. So we’ll use Pascal strings unless otherwise noted. Furthermore, the UCR Standard library provides a large number of C/C++ string functions, so there is no need to replicate those functions here.
15.2.2
String Assignment

You can easily assign one string to another using the movsb instruction. For example, if you want to assign the length-prefixed string String1 to String2, use the following:

; Presumably, ES and DS are set up already

                lea     si, String1
                lea     di, String2
                mov     ch, 0           ;Extend len to 16 bits.
                mov     cl, String1     ;Get string length.
                inc     cx              ;Include length byte.
        rep     movsb
This code increments cx by one before executing movsb because the length byte contains the length of the string exclusive of the length byte itself.

Generally, string variables can be initialized to constants by using the PString macro described earlier. However, if you need to set a string variable to some constant value, you can write a StrAssign subroutine which assigns the string immediately following the call. The following procedure does exactly that:

                include stdlib.a
                includelib stdlib.lib

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg, es:dseg, ss:sseg

; String assignment procedure

MainPgm         proc    far
                mov     ax, seg dseg
                mov     ds, ax
                mov     es, ax

                lea     di, ToString
                call    StrAssign
                byte    "This is an example of how the "
                byte    "StrAssign routine is used",0
                nop
                ExitPgm
MainPgm         endp

StrAssign       proc    near
                push    bp
                mov     bp, sp
                pushf
                push    ds
                push    si
                push    di
                push    cx
                push    ax
                push    di              ;Save again for use later.
                push    es
                cld

; Get the address of the source string

                mov     ax, cs
                mov     es, ax
                mov     di, 2[bp]       ;Get return address.
                mov     cx, 0ffffh      ;Scan for as long as it takes.
                mov     al, 0           ;Scan for a zero.
        repne   scasb                   ;Compute the length of string.
                neg     cx              ;Convert length to a positive #.
                dec     cx              ;Because we started with -1, not 0.
                dec     cx              ;skip zero terminating byte.

; Now copy the strings

                pop     es              ;Get destination segment.
                pop     di              ;Get destination address.
                mov     al, cl          ;Store length byte.
                stosb

; Now copy the source string.

                mov     ax, cs
                mov     ds, ax
                mov     si, 2[bp]
        rep     movsb

; Update the return address and leave:

                inc     si              ;Skip over zero byte.
                mov     2[bp], si

                pop     ax
                pop     cx
                pop     di
                pop     si
                pop     ds
                popf
                pop     bp
                ret
StrAssign       endp

cseg            ends

dseg            segment para public 'data'
ToString        byte    255 dup (0)
dseg            ends

sseg            segment para stack 'stack'
                word    256 dup (?)
sseg            ends
                end     MainPgm
Page 833
This code uses the scas instruction to determine the length of the string immediately following the call instruction. Once the code determines the length, it stores this length into the first byte of the destination string and then copies the text following the call to the string variable. After copying the string, this code adjusts the return address so that it points just beyond the zero terminating byte. Then the procedure returns control to the caller.

Of course, this string assignment procedure isn’t very efficient, but it’s very easy to use. Setting up es:di is all that you need to do to use this procedure. If you need fast string assignment, simply use the movs instruction as follows:

; Presumably, DS and ES have already been set up.

                lea     si, SourceString
                lea     di, DestString
                mov     cx, LengthSource
        rep     movsb
                 .
                 .
                 .
SourceString    byte    LengthSource-1
                byte    "This is an example of how the "
                byte    "StrAssign routine is used"
LengthSource    =       $-SourceString

DestString      byte    256 dup (?)
Using in-line instructions requires considerably more setup (and typing!), but it is much faster than the StrAssign procedure. If you don’t like the typing, you can always write a macro to do the string assignment for you.
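The work StrAssign does, turning the zero-terminated literal that follows the call into a length-prefixed string and stepping the return address past it, can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the function name str_assign, the bytes object standing in for the code segment, and the integer standing in for the return address are all hypothetical.

```python
def str_assign(code: bytes, return_addr: int):
    """Model of StrAssign.

    code        -- stands in for the code segment
    return_addr -- offset of the byte that follows the call instruction
    Returns (length-prefixed destination string, adjusted return address).
    """
    # Scan for the zero byte, as repne scasb does.
    end = code.index(0, return_addr)
    length = end - return_addr
    if length > 255:
        raise ValueError("literal too long for a one-byte length prefix")
    # Store the length byte, then copy the characters (stosb + rep movsb).
    dest = bytes([length]) + code[return_addr:end]
    # Skip over the zero byte so execution resumes after the literal.
    return dest, end + 1


# Two filler bytes, the literal "Hi", its terminator, and one more filler byte.
segment = b"\x90\x90Hi\x00\x90"
dest, resume_at = str_assign(segment, 2)
```

The second returned value plays the role of the updated word at 2[bp]: it is where the caller continues once the procedure returns.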
15.2.3
String Comparison
Comparing two character strings was already beaten to death in the section on the cmps instruction. Other than providing some concrete examples, there is no reason to consider this subject any further. Note: all the following examples assume that es and ds are pointing at the proper segments containing the destination and source strings.

Comparing Str1 to Str2:

                lea     si, Str1+1      ;Point at the first characters,
                lea     di, Str2+1      ; skipping the length bytes.

; Get the minimum length of the two strings.

                mov     al, Str1
                mov     cl, al
                cmp     al, Str2
                jb      CmpStrs
                mov     cl, Str2

; Compare the two strings.

CmpStrs:        mov     ch, 0
                cld
        repe    cmpsb
                jne     StrsNotEqual

; If CMPS thinks they're equal, compare their lengths
; just to be sure.

                cmp     al, Str2

StrsNotEqual:

At label StrsNotEqual, the flags will contain all the pertinent information about the ranking of these two strings. You can use the conditional jump instructions to test the result of this comparison.
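The comparison strategy is: compare characters up to the shorter of the two lengths, and only if those all match let the lengths decide. A minimal Python sketch of that strategy follows; the function name pstr_compare and its -1/0/1 result are assumptions made for this illustration (the assembly code reports its result in the flags instead).

```python
def pstr_compare(a: bytes, b: bytes) -> int:
    """Compare two length-prefixed strings.

    Returns -1 if a < b, 0 if they are equal, 1 if a > b.
    """
    len_a, len_b = a[0], b[0]
    shorter = min(len_a, len_b)
    # Characters live at offsets 1..length; offset 0 is the length byte.
    for i in range(1, shorter + 1):
        if a[i] != b[i]:
            return -1 if a[i] < b[i] else 1
    # All compared characters match, so the shorter string ranks first.
    return (len_a > len_b) - (len_a < len_b)


result_same_prefix = pstr_compare(b"\x03abc", b"\x05abcde")
result_equal = pstr_compare(b"\x03abc", b"\x03abc")
```

Note that the length bytes themselves never take part in the character-by-character phase; they are consulted only as a tie-breaker.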
15.3
Character String Functions
Most high level languages, like Pascal, BASIC, “C”, and PL/I, provide several string functions and procedures (either built into the language or as part of a standard library). Other than the five string operations provided above, the 80x86 doesn’t support any string functions. Therefore, if you need a particular string function, you’ll have to write it yourself. The following sections describe many of the more popular string functions and how to implement them in assembly language.
15.3.1
Substr
The Substr (substring) function copies a portion of one string to another. In a high level language, this function usually takes the form:

        DestStr := Substr(SrcStr,Index,Length);

where:

• DestStr is the name of the string variable where you want to store the substring,
• SrcStr is the name of the source string (from which the substring is to be taken),
• Index is the starting character position within the string (1..length(SrcStr)), and
• Length is the length of the substring you want to copy into DestStr.

The following examples show how Substr works.

        SrcStr := 'This is an example of a string';
        DestStr := Substr(SrcStr,12,7);
        write(DestStr);

This prints ‘example’. The index value is twelve, so the Substr function will begin copying data starting at the twelfth character in the string. The twelfth character is the ‘e’ in ‘example’. The length of the substring is seven. This invocation copies the seven characters ‘example’ to DestStr.

        SrcStr := 'This is an example of a string';
        DestStr := Substr(SrcStr,1,10);
        write(DestStr);

This prints ‘This is an’. Since the index is one, this occurrence of the Substr function copies 10 characters starting with the first character in the string.

        SrcStr := 'This is an example of a string';
        DestStr := Substr(SrcStr,20,11);
        write(DestStr);

This prints ‘of a string’. This call to Substr extracts the last eleven characters in the string.

What happens if the index and length values are out of bounds? For example, what happens if Index is zero or is greater than the length of the string? What happens if Index is fine, but the requested substring would run past the end of the source string? You can handle these abnormal situations in one of three ways: (1) ignore the possibility of error; (2) abort the program with a run-time error; (3) process some reasonable number of characters in response to the request.
The first solution operates under the assumption that the caller never makes a mistake computing the values for the parameters to the Substr function. It blindly assumes that the values passed to the Substr function are correct and processes the string based on that assumption. This can produce some bizarre effects. Consider the following example, which uses length-prefixed strings:

        SourceStr := '1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ';
        DestStr := Substr(SourceStr,0,5);
        Write(DestStr);

This prints ‘$1234’. The reason, of course, is that SourceStr is a length-prefixed string. Therefore the length, 36, appears at offset zero within the string. If Substr uses the illegal index of zero then the length of the string will be returned as the first character. In this particular case, the length of the string, 36, just happened to correspond to the ASCII code for the ‘$’ character.

The situation is considerably worse if the value specified for Index is negative or is greater than the length of the string. In such a case, the Substr function would be returning a substring containing characters appearing before or after the source string. This is not a reasonable result.

Despite the problems with ignoring the possibility of error in the Substr function, there is one big advantage to processing substrings in this manner: the resulting Substr code is more efficient if it doesn’t have to perform any run-time checking on the data. If you know that the index and length values are always within an acceptable range, then there is no need to do this checking within the Substr function. If you can guarantee that an error will not occur, your programs will run (somewhat) faster by eliminating the run-time check.

Since most programs are rarely error-free, you’re taking a big gamble if you assume that all calls to the Substr routine are passing reasonable values. Therefore, some sort of run-time check is often necessary to catch errors in your program. An error occurs under the following conditions:

• The index parameter (Index) is less than one.
• Index is greater than the length of the string.
• The Substr length parameter (Length) is greater than the length of the string.
• The substring would extend beyond the end of the string, that is, Index + Length - 1 is greater than the length of the string.
An alternative to ignoring any of these errors is to abort with an error message. This is probably fine during the program development phase, but once your program is in the hands of users it could be a real disaster. Your customers wouldn’t be very happy if they’d spent all day entering data into a program and it aborted, causing them to lose the data they’ve entered.

An alternative to aborting when an error occurs is to have the Substr function return an error condition. Then leave it up to the calling code to determine if an error has occurred. This technique works well with the third alternative to handling errors: processing the substring as best you can.

The third alternative, handling the error as best you can, is probably the best alternative. Handle the error conditions in the following manner:

• The index parameter (Index) is less than one. There are two ways to handle this error condition. One way is to automatically set the Index parameter to one and return the substring beginning with the first character of the source string. The other alternative is to return the empty string, a string of length zero, as the substring. Variations on this theme are also possible. You might return the substring beginning with the first character if the index is zero and an empty string if the index is negative. Another alternative is to use unsigned numbers. Then you’ve only got to worry about the case where Index is zero. A negative number, should the calling code accidentally generate one, would look like a large positive number.
• The index is greater than the length of the string. If this is the case, then the Substr function should return an empty string. Intuitively, this is the proper response in this situation.
• The Substr length parameter (Length) is greater than the length of the string, or the substring starting at Index would extend beyond the end of the string. Points three and four are the same problem: the length of the desired substring extends beyond the end of the source string. In this event, Substr should return the substring consisting of those characters starting at Index through the end of the source string.
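The "do the best you can" policy can be sketched in Python before looking at any assembly. This is a model only, not the chapter's routine: the name substr, the (string, error) return pair, and the choice to treat an index below one as one are assumptions that follow the first option described in the list above.

```python
def substr(src: bytes, index: int, length: int):
    """Model of a forgiving Substr on length-prefixed strings.

    Returns (destination string, error flag).
    """
    src_len = src[0]
    error = False
    if index < 1:
        # Policy choice: start at the first character instead.
        index, error = 1, True
    if index > src_len:
        # Nothing sensible to copy: return the empty string.
        return b"\x00", True
    available = src_len - index + 1     # characters from index to the end
    if length > available:
        # Truncate to what the source actually holds.
        length, error = available, True
    # Character number i lives at byte offset i (offset 0 is the length).
    return bytes([length]) + src[index:index + length], error


s = bytes([30]) + b"This is an example of a string"
word, word_err = substr(s, 12, 7)
tail, tail_err = substr(s, 25, 20)      # asks for more than is there
```

The error flag corresponds to the idea of returning an error condition to the caller while still producing a usable result.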
The following code for the Substr function expects four parameters: the addresses of the source and destination strings, the starting index, and the length of the desired substring. Substr expects the parameters in the following registers:

ds:si-  The address of the source string.
es:di-  The address of the destination string.
ch-     The starting index.
cl-     The length of the substring.

Substr returns the following values:

• The substring, at location es:di.
• Substr clears the carry flag if there were no errors. Substr sets the carry flag if there was an error.
• Substr preserves all the registers.

If an error occurs, then the calling code must examine the values in si, di and cx to determine the exact cause of the error (if this is necessary). In the event of an error, the Substr function returns the following substrings:

• If the Index parameter (ch) is zero, Substr uses one instead.
• The Index and Length parameters are both unsigned byte values, therefore they are never negative.
• If the Index parameter is greater than the length of the source string, Substr returns an empty string.
• If the requested substring would extend beyond the end of the source string (Index + Length - 1 is greater than the length of the source string), Substr returns only those characters from Index through the end of the source string.

The following code realizes the substring function.
; Substring function.
;
; HLL form:
;
;procedure substring(var Src:string;
;                    Index, Length:integer;
;                    var Dest:string);
;
; Src-          Address of a source string.
; Index-        Index into the source string.
; Length-       Length of the substring to extract.
; Dest-         Address of a destination string.
;
; Copies the source string from address [Src+index] of length
; Length to the destination string.
;
; If an error occurs, the carry flag is returned set, otherwise
; clear.
;
; Parameters are passed as follows:
;
; DS:SI-        Source string address.
; ES:DI-        Destination string address.
; CH-           Index into source string.
; CL-           Length of the substring.
;
; Note: the strings pointed at by the SI and DI registers are
; length-prefixed strings. That is, the first byte of each string
; contains the length of that string.

Substring       proc    near
                push    ax
                push    cx
                push    di
                push    si
                clc                     ;Assume no error.
                pushf                   ;Save direction flag status.

; An index of zero is an error; use one instead.

                cmp     ch, 0
                jne     CheckIndex
                mov     ch, 1
                popf
                stc                     ;Return an error flag.
                pushf

; Check the validity of the parameters.

CheckIndex:     cmp     ch, [si]        ;Is index beyond the length of
                ja      ReturnEmpty     ; the source string?
                mov     al, ch          ;See if the sum of index and
                dec     al              ; length is beyond the end of the
                add     al, cl          ; string.
                jc      TooLong         ;Error if > 255.
                cmp     al, [si]        ;Beyond the length of the source?
                jbe     OkaySoFar

; If the substring isn't completely contained within the source
; string, truncate it:

TooLong:        popf
                stc                     ;Return an error flag.
                pushf
                mov     al, [si]        ;Get maximum length.
                sub     al, ch          ;Subtract index value.
                inc     al              ;Adjust as appropriate.
                mov     cl, al          ;Save as new length.

OkaySoFar:      mov     es:[di], cl     ;Save destination string length.
                inc     di
                mov     al, ch          ;Get index into source.
                mov     ch, 0           ;Zero extend length value into CX.
                mov     ah, 0           ;Zero extend index into AX.
                add     si, ax          ;Compute address of substring.
                cld
        rep     movsb                   ;Copy the substring.

SubStrDone:     popf
                pop     si
                pop     di
                pop     cx
                pop     ax
                ret

; Return an empty string here:

ReturnEmpty:    mov     byte ptr es:[di], 0
                popf
                stc
                pushf                   ;SubStrDone pops the flags again.
                jmp     SubStrDone
Substring       endp

15.3.2
Index
The Index string function searches for the first occurrence of one string within another and returns the offset to that occurrence. Consider the following HLL form:

        SourceStr := 'Hello world';
        TestStr := 'world';
        I := INDEX(SourceStr, TestStr);

The Index function scans through the source string looking for the first occurrence of the test string. If found, it returns the index into the source string where the test string begins. In the example above, the Index function would return seven since the substring ‘world’ starts at the seventh character position in the source string.

The only possible error occurs if Index cannot find the test string in the source string. In such a situation, most implementations return zero. Our version will do likewise. The Index function which follows operates in the following fashion:

1) It compares the length of the test string to the length of the source string. If the test string is longer, Index immediately returns zero since there is no way the test string will be found in the source string in this situation.

2) The index function operates as follows:

        i := 1;
        while (i <= length(source)-length(test)+1) and
              (test <> substr(source, i, length(test))) do
                i := i+1;

When this loop terminates, if (i <= length(source)-length(test)+1) then i contains the index into source where test begins. Otherwise test is not a substring of source. Using the previous example, this loop compares test to source in the following manner:

i=1     test:    world
        source:  Hello world            No match

i=2     test:     world
        source:  Hello world            No match

i=3     test:      world
        source:  Hello world            No match

i=4     test:       world
        source:  Hello world            No match

i=5     test:        world
        source:  Hello world            No match

i=6     test:         world
        source:  Hello world            No match

i=7     test:          world
        source:  Hello world            Match
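The same brute-force search can be written as a short Python model. The function name index and the use of Python bytes for length-prefixed strings are assumptions made for this sketch; the last starting position worth testing is length(source) - length(test) + 1, because beyond that point the test string no longer fits.

```python
def index(source: bytes, test: bytes) -> int:
    """Model of Index on length-prefixed strings.

    Returns the 1-based position of test within source, or 0 if absent.
    """
    src_len, test_len = source[0], test[0]
    if test_len > src_len:
        return 0                        # test cannot possibly fit
    last_start = src_len - test_len + 1
    pattern = test[1:1 + test_len]
    for i in range(1, last_start + 1):
        # Character number i lives at byte offset i.
        if source[i:i + test_len] == pattern:
            return i
    return 0


where = index(b"\x0bHello world", b"\x05world")
missing = index(b"\x0bHello world", b"\x03xyz")
```

For the 'Hello world' example this performs the seven comparisons traced above, succeeding on the last one.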
There are (algorithmically) better ways to do this comparison (see footnote 7); however, the algorithm above lends itself to the use of 80x86 string instructions and is very easy to understand. Index’s code follows:

7. The interested reader should look up the Knuth-Morris-Pratt algorithm in “Data Structure Techniques” by Thomas A. Standish. The Boyer-Moore algorithm is another fast string search routine, although somewhat more complex.

; INDEX- computes the offset of one string within another.
;
; On entry:
;
; ES:DI-        Points at the test string that INDEX will search for
;               in the source string.
; DS:SI-        Points at the source string which (presumably)
;               contains the string INDEX is searching for.
;
; On exit:
;
; AX-           Contains the offset into the source string where the
;               test string was found (zero if it was not found).

INDEX           proc    near
                push    si
                push    di
                push    bx
                push    cx
                pushf                   ;Save direction flag value.
                cld

                mov     al, es:[di]     ;Get the length of the test string.
                cmp     al, [si]        ;See if it is longer than the length
                ja      NotThere        ; of the source string.

; Compute the number of starting positions in the source string
; that we need to compare the test string against.

                mov     cl, al          ;Length of test string;
                mov     ch, 0           ; save for later.
                mov     bl, [si]        ;Length of source string.
                sub     bl, al          ;# of times to repeat loop is
                inc     bl              ; length(source)-length(test)+1.
                inc     di              ;Skip over length byte.
                xor     ax, ax          ;Init index to zero.
CmpLoop:        inc     ax              ;Bump index by one.
                inc     si              ;Move on to the next char in source.
                push    si              ;Save string pointers and the
                push    di              ; length of the test string.
                push    cx
        repe    cmpsb                   ;Compare the strings.
                pop     cx              ;Restore string pointers
                pop     di              ; and length.
                pop     si
                je      FoundIndex      ;If we found the substring.
                dec     bl
                jnz     CmpLoop         ;Try next entry in source string.

; If we fall down here, the test string doesn't appear inside the
; source string.

NotThere:       xor     ax, ax          ;Return INDEX = 0

; Restore the registers and return. If the substring was found in
; the loop above, AX holds its index.

FoundIndex:     popf
                pop     cx
                pop     bx
                pop     di
                pop     si
                ret
INDEX           endp

15.3.3
Repeat
The Repeat string function expects three parameters: the address of a string, a length, and a character. It constructs a string of the specified length containing “length” copies of the specified character. For example, Repeat(STR,5,’*’) stores the string ‘*****’ into the STR string variable. This is a very easy string function to write, thanks to the stosb instruction:

; REPEAT-       Constructs a string of length CX where each element
;               is initialized to the character passed in AL.
;
; On entry:
;
; ES:DI-        Points at the string to be constructed.
; CX-           Contains the length of the string.
; AL-           Contains the character with which each element of
;               the string is to be initialized.

REPEAT          proc    near
                push    di
                push    ax
                push    cx
                pushf                   ;Save direction flag value.
                cld
                mov     es:[di], cl     ;Save string length.
                mov     ch, 0           ;Just in case.
                inc     di              ;Start string at next location.
        rep     stosb
                popf
                pop     cx
                pop     ax
                pop     di
                ret
REPEAT          endp

15.3.4
Insert
The Insert string function inserts one string into another. It expects three parameters: a source string, a destination string, and an index. Insert inserts the source string into the destination string starting at the offset specified by the index parameter. HLLs usually call the Insert procedure as follows:

        source := ' there';
        dest := 'Hello world';
        INSERT(source,dest,6);

The call to Insert above would change dest to contain the string ‘Hello there world’. It does this by inserting the string ‘ there’ before the sixth character in ‘Hello world’. The insert procedure uses the following algorithm:

        Insert(Src,dest,index);

1) Move the characters from location dest+index through the end of the destination string length(Src) bytes up in memory.

2) Copy the characters from the Src string to location dest+index.

3) Adjust the length of the destination string so that it is the sum of the destination and source lengths.

The following code implements this algorithm:
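Before turning to the 80x86 listing, it may help to see the three steps as a small Python model. Everything here is an assumption made for illustration: the function name insert, the (string, error) return pair, and the use of a bytearray as the destination buffer. The index is taken to be at least one.

```python
def insert(src: bytes, dest: bytes, index: int):
    """Model of Insert on length-prefixed strings.

    Returns (new destination string, error flag).
    """
    src_len, dest_len = src[0], dest[0]
    if src_len + dest_len > 255:
        return dest, True               # result would not fit; do nothing
    if index > dest_len:
        index = dest_len + 1            # too large an index means "append"
    buf = bytearray(dest[:1 + dest_len]) + bytearray(src_len)
    # Step 1: move the characters from dest+index through the end of
    # the destination up in memory by length(src) bytes.
    buf[index + src_len:1 + dest_len + src_len] = buf[index:1 + dest_len]
    # Step 2: copy the source characters into the gap at dest+index.
    buf[index:index + src_len] = src[1:1 + src_len]
    # Step 3: the new length is the sum of the two lengths.
    buf[0] = dest_len + src_len
    return bytes(buf), False


result, failed = insert(b"\x06 there", b"\x0bHello world", 6)
```

Python evaluates the right-hand side of the slice assignment in step 1 before storing anything, so the overlapping move is safe here; in assembly the same effect requires copying from the last character downwards.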
; INSERT- Inserts one string into another.
;
; On entry:
;
; DS:SI         Points at the source string to be inserted
; ES:DI         Points at the destination string into which the
;               source string will be inserted.
; DX            Contains the offset into the destination string
;               where the source string is to be inserted.
;
; All registers are preserved.
;
; Error condition-
;
; If the length of the newly created string is greater than 255,
; the insert operation will not be performed and the carry flag
; will be returned set.
;
; If the index is greater than the length of the destination
; string, then the source string will be appended to the end of
; the destination string.

INSERT          proc    near
                push    si
                push    di
                push    dx
                push    cx
                push    bx
                push    ax
                clc                     ;Assume no error.
                pushf
                mov     dh, 0           ;Just to be safe.

; First, see if the new string will be too long.

                mov     ch, 0
                mov     ah, ch
                mov     bh, ch
                mov     al, es:[di]     ;AX = length of dest string.
                mov     cl, [si]        ;CX = length of source string.
                mov     bl, al          ;BX = length of new string.
                add     bl, cl
                jc      TooLong         ;Abort if too long.
                mov     es:[di], bl     ;Update length.

; See if the index value is too large. If so, append instead:

                cmp     dl, al
                jbe     IndexIsOK
                mov     dl, al
                inc     dl              ;Index = length(dest)+1.
IndexIsOK:

; Now, make room for the string that's about to be inserted by
; moving length(dest)-index+1 characters up, last character first.

                push    ds              ;Save source pointer and
                push    si              ; length for later.
                push    cx

                mov     cx, ax          ;# of characters to move =
                sub     cx, dx          ; length(dest) - index + 1.
                inc     cx
                mov     si, es          ;The move takes place entirely
                mov     ds, si          ; within the destination string.
                mov     si, di          ;Point SI at the end of current
                add     si, ax          ; destination string.
                add     di, bx          ;Point DI at the end of new str.
                std
        rep     movsb                   ;Open up space for new string.

; Now, copy the source string into the space opened up. DI points at
; the last byte of the gap and the direction flag is still set, so
; copy the source string from its last character backwards.

                pop     cx
                pop     si
                pop     ds
                add     si, cx          ;Point at end of source string.
        rep     movsb
                jmp     INSERTDone

TooLong:        popf
                stc
                pushf

INSERTDone:     popf
                pop     ax
                pop     bx
                pop     cx
                pop     dx
                pop     di
                pop     si
                ret
INSERT          endp

15.3.5
Delete
The Delete string function removes characters from a string. It expects three parameters: the address of a string, an index into that string, and the number of characters to remove from that string. A HLL call to Delete usually takes the form:

        Delete(Str,index,length);

For example,

        Str := 'Hello there world';
        Delete(str,7,6);

This call to Delete will leave str containing ‘Hello world’. The algorithm for the delete operation is the following:

1) Subtract the length parameter value from the length of the destination string and update the length of the destination string with this new value.

2) Copy any characters following the deleted substring over the top of the deleted substring.

There are a couple of errors that may occur when using the delete procedure. The index value could be zero or larger than the size of the specified string. In this case, the Delete procedure shouldn’t do anything to the string. If the sum of the index and length parameters is greater than the length of the string, then the Delete procedure should delete all the characters to the end of the string. The following code implements the Delete procedure:
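Before the assembly listing, here is a minimal Python model of the two steps and the error rules just described. The function name delete and the choice to return a new bytes object (rather than modify the string in place) are assumptions made for this sketch.

```python
def delete(s: bytes, index: int, length: int) -> bytes:
    """Model of Delete on a length-prefixed string."""
    s_len = s[0]
    if index < 1 or index > s_len:
        return s                        # bad index: leave the string alone
    # Never delete past the end of the string.
    length = min(length, s_len - index + 1)
    # Step 1: the new length is the old length less the deleted count.
    new_len = s_len - length
    # Step 2: the characters after the deleted substring slide down
    # over the top of it.
    head = s[1:index]
    tail = s[index + length:1 + s_len]
    return bytes([new_len]) + head + tail


trimmed = delete(b"\x11Hello there world", 7, 6)
```

The tail holds length(s) - index - length + 1 characters after truncation, which is the count a block move needs when closing the gap.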
; DELETE - removes some substring from a string.
;
; On entry:
;
; DS:SI         Points at the source string.
; DX            Index into the string of the start of the substring
;               to delete.
; CX            Length of the substring to be deleted.
;
; Error conditions-
;
; If DX is zero or greater than the length of the string, then the
; operation is aborted.
;
; If DX+CX is greater than the length of the string, DELETE only
; deletes those characters from DX through the end of the string.

DELETE          proc    near
                push    es
                push    si
                push    di
                push    ax
                push    cx
                push    dx
                pushf                   ;Save direction flag.
                mov     ax, ds          ;Source and destination strings
                mov     es, ax          ; are the same.
                mov     ah, 0
                mov     dh, ah          ;Just to be safe.
                mov     ch, ah

; See if any error conditions exist.

                or      dl, dl          ;An index of zero is illegal.
                jz      TooBig
                mov     al, [si]        ;Get the string length
                cmp     dl, al          ;Is the index too big?
                ja      TooBig
                mov     al, dl          ;Now see if INDEX+LENGTH
                add     al, cl          ;is too large
                jc      Truncate
                cmp     al, [si]
                jbe     LengthIsOK

; If the substring is too big, truncate it to fit.

Truncate:       mov     cl, [si]        ;Compute maximum length
                sub     cl, dl
                inc     cl

; Compute the length of the new string.

LengthIsOK:     mov     al, [si]
                sub     al, cl
                mov     [si], al

; Okay, now delete the specified substring by copying the characters
; that follow it down over the top of it.

                sub     al, dl          ;# of characters following the
                inc     al              ; substring = newlen-index+1.
                mov     di, si
                add     di, dx          ;DI -> first deleted character.
                mov     si, di
                add     si, cx          ;SI -> first character after it.
                mov     cl, al          ;CH is already zero.
                cld
        rep     movsb                   ;Delete the string.

TooBig:         popf
                pop     dx
                pop     cx
                pop     ax
                pop     di
                pop     si
                pop     es
                ret
DELETE          endp

15.3.6
Concatenation
The concatenation operation takes two strings and appends one to the end of the other. For example, Concat(‘Hello ‘,’world’) produces the string ‘Hello world’. Some high level languages treat concatenation as a function call, others as a procedure call. Since in assembly language everything is a procedure call anyway, we’ll adopt the procedural syntax. Our Concat procedure will take the following form:

        Concat(source1,source2,dest);

This procedure will copy source1 to dest, then it will concatenate source2 to the end of dest. Concat follows:
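Before the assembly listing, a short Python model shows the intended result, including the truncation rule for results longer than 255 characters. The function name concat and the decision to return the destination string as a value are assumptions made for this sketch.

```python
def concat(source1: bytes, source2: bytes) -> bytes:
    """Model of Concat on length-prefixed strings.

    If the combined length exceeds 255, the second string is truncated.
    """
    len1, len2 = source1[0], source2[0]
    # Only as much of source2 as still fits behind source1 is kept.
    keep = min(len2, 255 - len1)
    return (bytes([len1 + keep])
            + source1[1:1 + len1]
            + source2[1:1 + keep])


joined = concat(b"\x06Hello ", b"\x05world")

long1 = bytes([200]) + b"a" * 200
long2 = bytes([100]) + b"b" * 100
clipped = concat(long1, long2)          # 300 characters will not fit
```

In the truncated case exactly 255 - length(source1) characters of the second string survive, so the stored length byte (255) agrees with the number of characters actually present.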
; Concat-       Copies the string pointed at by SI to the string
;               pointed at by DI and then concatenates the string
;               pointed at by BX to the destination string.
;
; On entry-
;
; DS:SI-        Points at the first source string
; DS:BX-        Points at the second source string
; ES:DI-        Points at the destination string.
;
; Error condition-
;
; The sum of the lengths of the two strings is greater than 255.
; In this event, the second string will be truncated so that the
; entire string is less than 256 characters in length.

CONCAT          proc    near
                push    si
                push    di
                push    cx
                push    ax
                pushf
                cld

; Copy the first string to the destination string:

                mov     al, [si]
                mov     cl, al
                mov     ch, 0
                mov     ah, ch
                add     al, [bx]        ;Compute the sum of the string's
                adc     ah, 0           ; lengths.
                cmp     ax, 256
                jb      SetNewLength
                mov     ah, [si]        ;Save original string length.
                mov     al, 255         ;Fix string length at 255.
SetNewLength:   mov     es:[di], al     ;Save new string length.
                inc     di              ;Skip over length bytes.
                inc     si
        rep     movsb                   ;Copy source1 to dest string.

; If the sum of the two strings is too long, the second string
; must be truncated.

                mov     cl, [bx]        ;Get length of second string.
                cmp     ax, 256
                jb      LengthsAreOK
                mov     cl, ah          ;Compute truncated length.
                not     cl              ;CL := 255-Length(Str1).

LengthsAreOK:   lea     si, 1[bx]       ;Point at second string and
                                        ; skip the string length.
        rep     movsb                   ;Perform the concatenation.

                popf
                pop     ax
                pop     cx
                pop     di
                pop     si
                ret
CONCAT          endp

15.4
String Functions in the UCR Standard Library
The UCR Standard Library for 80x86 Assembly Language Programmers provides a very rich set of string functions you may use. These routines, for the most part, are quite similar to the string functions provided in the C Standard Library. As such, these functions support zero terminated strings rather than the length prefixed strings supported by the functions in the previous sections.

Because there are so many different UCR StdLib string routines and the sources for all these routines are in the public domain (and are present on the companion CD-ROM for this text), the following sections will not discuss the implementation of each routine. Instead, the following sections will concentrate on how to use these library routines.

The UCR library often provides several variants of the same routine. Generally a suffix of “l”, “m”, or “ml” appears at the end of the name of these variant routines. The “l” suffix stands for “literal constant”. Routines with the “l” (or “ml”) suffix require two string operands. The first is generally pointed at by es:di and the second immediately follows the call in the code stream.

Most StdLib string routines operate on the specified string (or one of the strings if the function has two operands). The “m” (or “ml”) suffix instructs the string function to allocate storage on the heap (using malloc, hence the “m” suffix) for the new string and store the modified result there rather than changing the source string(s). These routines always return a pointer to the newly created string in the es:di registers. In the event of a memory allocation error (insufficient memory), these routines with the “m” or “ml” suffix return the carry flag set. They return the carry clear if the operation was successful.
15.4.1
StrBDel, StrBDelm
These two routines delete leading spaces from a string. StrBDel removes any leading spaces from the string pointed at by es:di. It actually modifies the source string. StrBDelm makes a copy of the string on the heap with any leading spaces removed. If there are no leading spaces, then the StrBDel routines return the original string without modification. Note that these routines only affect leading spaces (those appearing at the beginning of the string). They do not remove trailing spaces or spaces in the middle of the string. See Strtrim if you want to remove trailing spaces. Examples:

MyString        byte    " Hello there, this is my string",0
MyStrPtr        dword   MyString
                 .
                 .
                 .
                les     di, MyStrPtr
                strbdelm                ;Creates a new string w/o leading spaces,
                jc      error           ; pointer to string is in ES:DI on return.
                puts                    ;Print the string pointed at by ES:DI.
                free                    ;Deallocate storage allocated by strbdelm.
                 .
                 .
                 .

; Note that "MyString" still contains the leading spaces.
; The following printf call will print the string along with
; those leading spaces. "strbdelm" above did not change MyString.

                printf
                byte    "MyString = '%s'\n",0
                dword   MyString
                 .
                 .
                 .
                les     di, MyStrPtr
                strbdel

; Now, we really have removed the leading spaces from "MyString"

                printf
                byte    "MyString = '%s'\n",0
                dword   MyString
                 .
                 .
                 .

Output from this code fragment:

Hello there, this is my string
MyString = ' Hello there, this is my string'
MyString = 'Hello there, this is my string'
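The difference between the in-place routine and its "m" variant can be modelled in Python with a mutable bytearray standing in for the string's memory. This is only a behavioural sketch: the Python functions below borrow the names strbdel and strbdelm for readability but are not the library routines, and returning a fresh bytearray merely stands in for a heap allocation.

```python
def strbdel(buf: bytearray) -> None:
    """In-place model: remove leading spaces from a zero-terminated string."""
    end = buf.index(0)
    text = bytes(buf[:end]).lstrip(b" ")
    # Overwrite the start of the buffer; its total size does not change,
    # so bytes past the new terminator are simply left over.
    buf[:len(text) + 1] = text + b"\x00"


def strbdelm(buf: bytearray) -> bytearray:
    """Copying model: leave buf alone and return a new, trimmed string."""
    end = buf.index(0)
    return bytearray(bytes(buf[:end]).lstrip(b" ") + b"\x00")


my_string = bytearray(b"  Hi there\x00")
copy = strbdelm(my_string)              # my_string is untouched
before = bytes(my_string)
strbdel(my_string)                      # now my_string itself changes
after = bytes(my_string[:my_string.index(0)])
```

As in the assembly example, the caller of the copying variant owns the new string and is responsible for releasing it.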
15.4.2
Strcat, Strcatl, Strcatm, Strcatml
The strcat(xx) routines perform string concatenation. On entry, es:di points at the first string, and for strcat/strcatm dx:si points at the second string. For strcatl and strcatml the second string follows the call in the code stream. These routines create a new string by appending the second string to the end of the first. In the case of strcat and strcatl, the second string is directly appended to the end of the first string (es:di) in memory. You must make sure there is sufficient memory at the end of the first string to hold the appended characters. Strcatm and strcatml create a new string on the heap (using malloc) holding the concatenated result. Examples:

String1         byte    "Hello ",0
                byte    16 dup (0)      ;Room for concatenation.

String2         byte    "world",0

; The following macro loads ES:DI with the address of the
; specified operand.

lesi            macro   operand
                mov     di, seg operand
                mov     es, di
                mov     di, offset operand
                endm

; The following macro loads DX:SI with the address of the
; specified operand.

ldxi            macro   operand
                mov     dx, seg operand
                mov     si, offset operand
                endm
                 .
                 .
                 .
                lesi    String1
                ldxi    String2
                strcatm                 ;Create "Hello world"
                jc      error           ;If insufficient memory.
                print
                byte    "strcatm: ",0
                puts                    ;Print "Hello world"
                putcr
                free                    ;Deallocate string storage.
                 .
                 .
                 .
                lesi    String1         ;Create the string
                strcatml                ; "Hello there"
                byte    "there",0       ;Literal must follow the call.
                jc      error           ;If insufficient memory.
                print
                byte    "strcatml: ",0
                puts                    ;Print "Hello there"
                putcr
                free
                 .
                 .
                 .
                lesi    String1
                ldxi    String2
                strcat                  ;Create "Hello world"
                printf
                byte    "strcat: %s\n",0
                dword   String1
                 .
                 .
                 .

; Note: since strcat above has actually modified String1,
; the following call to strcatl appends "there" to the end
; of the string "Hello world".

                lesi    String1
                strcatl
                byte    "there",0
                printf
                byte    "strcatl: %s\n",0
                dword   String1
                 .
                 .
                 .

The code above produces the following output:

strcatm: Hello world
strcatml: Hello there
strcat: Hello world
strcatl: Hello worldthere
15.4.3
Strchr
Strchr searches for the first occurrence of a single character within a string. In operation it is quite similar to the scasb instruction. However, you do not have to specify an explicit length when using this function as you would for scasb.

On entry, es:di points at the string you want to search through, al contains the value to search for. On return, the carry flag denotes success (C=1 means the character was not present in the string, C=0 means the character was present). If the character was found in the string, cx contains the index into the string where strchr located the character. Note that the first character of the string is at index zero. So strchr will return zero if al matches the first character of the string. If the carry flag is set, then the value in cx has no meaning. Example:

; Note that the following string has a period at location
; "HasPeriod+24".

HasPeriod       byte    "This string has a period.",0
                 .
                 .
                 .
                lesi    HasPeriod       ;See strcat for lesi definition.
                mov     al, "."         ;Search for a period.
                strchr
                jnc     GotPeriod
                print
                byte    "No period in string",cr,lf,0
                jmp     Done

; If we found the period, output the offset into the string:

GotPeriod:      print
                byte    "Found period at offset ",0
                mov     ax, cx
                puti
                putcr

Done:

This code fragment produces the output:

Found period at offset 24
15.4.4 Strcmp, Strcmpl, Stricmp, Stricmpl

These routines compare strings using a lexicographical ordering. On entry to strcmp or stricmp, es:di points at the first string and dx:si points at the second string. Strcmp compares the first string to the second and returns the result of the comparison in the flags register. Strcmpl operates in a similar fashion, except the second string follows the call in the code stream. The stricmp and stricmpl routines differ from their counterparts in that they ignore case during the comparison. Whereas strcmp would return “not equal” when comparing “Strcmp” with “strcmp”, the stricmp (and stricmpl) routines would return “equal” since the only differences are upper vs. lower case. The “i” in stricmp and stricmpl stands for “ignore case.” Examples:

String1         byte    "Hello world", 0
String2         byte    "hello world", 0
String3         byte    "Hello there", 0
                 .
                 .
                 .
                lesi    String1         ;See strcat for lesi definition.
                ldxi    String2         ;See strcat for ldxi definition.
                strcmp
                jae     IsGtrEql
                printf
                byte    "%s is less than %s\n",0
                dword   String1, String2
                jmp     Tryl

IsGtrEql:       printf
                byte    "%s is greater or equal to %s\n",0
                dword   String1, String2

Tryl:           lesi    String2
                strcmpl
                byte    "hi world!",0
                jne     NotEql
                printf
                byte    "Hmmm..., %s is equal to 'hi world!'\n",0
                dword   String2
                jmp     Tryi

NotEql:         printf
                byte    "%s is not equal to 'hi world!'\n",0
                dword   String2

Tryi:           lesi    String1
                ldxi    String2
                stricmp
                jne     BadCmp
                printf
                byte    "Ignoring case, %s equals %s\n",0
                dword   String1, String2
                jmp     Tryil

BadCmp:         printf
                byte    "Wow, stricmp doesn't work! %s <> %s\n",0
                dword   String1, String2

Tryil:          lesi    String3         ;"Hello there" matches the literal
                stricmpl                ; below when case is ignored.
                byte    "hELLO THERE",0
                jne     BadCmp2
                print
                byte    "Stricmpl worked",cr,lf,0
                jmp     Done

BadCmp2:        print
                byte    "Stricmpl did not work",cr,lf,0
Done:
15.4.5 Strcpy, Strcpyl, Strdup, Strdupl

The strcpy and strdup routines copy one string to another. There are no strcpym or strcpyml routines; strdup and strdupl perform those operations. The UCR Standard Library uses the names strdup and strdupl rather than strcpym and strcpyml so that it matches the names in the C standard library.

Strcpy copies the string pointed at by es:di to the memory locations beginning at the address in dx:si. There is no error checking; you must ensure that there is sufficient free space at location dx:si before calling strcpy. Strcpy returns with es:di pointing at the destination string (that is, the original dx:si value). Strcpyl works in a similar fashion, except the source string follows the call.

Strdup duplicates the string which es:di points at and returns a pointer to the new string on the heap. Strdupl works in a similar fashion, except the string follows the call. As usual, the carry flag is set if there is a memory allocation error when using strdup or strdupl. Examples:

String1         byte    "Copy this string",0
String2         byte    32 dup (0)
String3         byte    32 dup (0)
StrVar1         dword   0
StrVar2         dword   0
                 .
                 .
                 .
                lesi    String1         ;See strcat for lesi definition.
                ldxi    String2         ;See strcat for ldxi definition.
                strcpy

                ldxi    String3
                strcpyl
                byte    "This string, too!",0

                lesi    String1
                strdup
                jc      error                   ;If insufficient mem.
                mov     word ptr StrVar1, di    ;Save away ptr to
                mov     word ptr StrVar1+2, es  ; string.

                strdupl
                byte    "Also, this string",0   ;Literal must follow the call.
                jc      error
                mov     word ptr StrVar2, di
                mov     word ptr StrVar2+2, es

                printf
                byte    "strcpy: %s\n"
                byte    "strcpyl: %s\n"
                byte    "strdup: %^s\n"
                byte    "strdupl: %^s\n",0
                dword   String2, String3, StrVar1, StrVar2

15.4.6 Strdel, Strdelm

Strdel and strdelm delete characters from a string. Strdel deletes the specified characters within the string; strdelm creates a new copy of the source string without the specified characters. On entry, es:di points at the string to manipulate, cx contains the index into the string where the deletion is to start, and ax contains the number of characters to delete from the string. On return, es:di points at the new string (which is on the heap if you call strdelm). For strdelm only, if the carry flag is set on return, there was a memory allocation error. As with all UCR StdLib string routines, the index values for the string are zero-based. That is, zero is the index of the first character in the source string. Example:

String1         byte    "Hello there, how are you?",0
                 .
                 .
                 .
                lesi    String1         ;See strcat for lesi definition.
                mov     cx, 5           ;Start at position five (" there")
                mov     ax, 6           ;Delete six characters.
                strdelm                 ;Create a new string.
                jc      error           ;If insufficient memory.
                print
                byte    "New string: ",0
                puts
                putcr

                lesi    String1
                mov     cx, 11          ;Start at the comma (index 11).
                mov     ax, 14          ;Delete the remaining 14 characters.
                strdel
                printf
                byte    "Modified string: %s\n",0
                dword   String1

This code prints the following:

New string: Hello, how are you?
Modified string: Hello there
15.4.7 Strins, Strinsl, Strinsm, Strinsml

The strins(xx) functions insert one string within another. For all four routines es:di points at the source string into which you want to insert another string. Cx contains the insertion point (0..length of source string). For strins and strinsm, dx:si points at the string you wish to insert. For strinsl and strinsml, the string to insert appears as a literal constant in the code stream.

Strins and strinsl insert the second string directly into the string pointed at by es:di. Strinsm and strinsml make a copy of the source string and insert the second string into that copy. They return a pointer to the new string in es:di. If there is a memory allocation error then strinsm/strinsml sets the carry flag on return. For strins and strinsl, the first string must have sufficient storage allocated to hold the new string. Examples:

InsertInMe      byte    "Insert >< Here",0
                byte    16 dup (0)
InsertStr       byte    "insert this",0
StrPtr1         dword   0
StrPtr2         dword   0
                 .
                 .
                 .
                lesi    InsertInMe      ;See strcat for lesi definition.
                ldxi    InsertStr       ;See strcat for ldxi definition.
                mov     cx, 8           ;Insert before "<"
                strinsm
                mov     word ptr StrPtr1, di
                mov     word ptr StrPtr1+2, es

                lesi    InsertInMe
                mov     cx, 8
                strinsml
                byte    "insert that",0
                mov     word ptr StrPtr2, di
                mov     word ptr StrPtr2+2, es

                lesi    InsertInMe
                mov     cx, 8
                strinsl
                byte    "  ",0          ;Two spaces

                lesi    InsertInMe
                ldxi    InsertStr
                mov     cx, 9           ;Between the two spaces from above.
                strins

                printf
                byte    "First string: %^s\n"
                byte    "Second string: %^s\n"
                byte    "Third string: %s\n",0
                dword   StrPtr1, StrPtr2, InsertInMe

Note that the strins and strinsl operations above both insert strings into the same destination string. The output from the above code is

First string: Insert >insert this< Here
Second string: Insert >insert that< Here
Third string: Insert > insert this < Here
15.4.8 Strlen

Strlen computes the length of the string pointed at by es:di. It returns the number of characters up to, but not including, the zero terminating byte. It returns this length in the cx register. Example:

GetLen          byte    "This string is 33 characters long",0
                 .
                 .
                 .
                lesi    GetLen          ;See strcat for lesi definition.
                strlen
                print
                byte    "The string is ",0
                mov     ax, cx          ;Puti needs the length in AX!
                puti
                print
                byte    " characters long",cr,lf,0

15.4.9 Strlwr, Strlwrm, Strupr, Struprm

Strlwr and strlwrm convert any upper case characters in a string to lower case. Strupr and struprm convert any lower case characters in a string to upper case. These routines do not affect any other characters present in the string. For all four routines, es:di points at the source string to convert. Strlwr and strupr modify the characters directly in that string. Strlwrm and struprm make a copy of the string on the heap and then convert the characters in the new string. They also return a pointer to this new string in es:di. As usual for UCR StdLib routines, strlwrm and struprm return the carry flag set if there is a memory allocation error. Examples:

String1         byte    "This string has lower case.",0
String2         byte    "THIS STRING has Upper Case.",0
StrPtr1         dword   0
StrPtr2         dword   0
                 .
                 .
                 .
                lesi    String1         ;See strcat for lesi definition.
                struprm                 ;Convert lower case to upper case.
                jc      error
                mov     word ptr StrPtr1, di
                mov     word ptr StrPtr1+2, es

                lesi    String2
                strlwrm                 ;Convert upper case to lower case.
                jc      error
                mov     word ptr StrPtr2, di
                mov     word ptr StrPtr2+2, es

                lesi    String1
                strlwr                  ;Convert to lower case, in place.

                lesi    String2
                strupr                  ;Convert to upper case, in place.

                printf
                byte    "struprm: %^s\n"
                byte    "strlwrm: %^s\n"
                byte    "strlwr: %s\n"
                byte    "strupr: %s\n",0
                dword   StrPtr1, StrPtr2, String1, String2

The above code fragment prints the following:

struprm: THIS STRING HAS LOWER CASE.
strlwrm: this string has upper case.
strlwr: this string has lower case.
strupr: THIS STRING HAS UPPER CASE.
15.4.10 Strrev, Strrevm

These two routines reverse the characters in a string. For example, if you pass strrev the string “ABCDEF” it will convert that string to “FEDCBA”. As you’d expect by now, the strrev routine reverses the string whose address you pass in es:di; strrevm first makes a copy of the string on the heap and reverses those characters, leaving the original string unchanged. Of course strrevm will return the carry flag set if there was a memory allocation error. Example:

Palindrome      byte    "radar",0
NotPaldrm       byte    "x + y - z",0
StrPtr1         dword   0
                 .
                 .
                 .
                lesi    Palindrome      ;See strcat for lesi definition.
                strrevm
                jc      error
                mov     word ptr StrPtr1, di
                mov     word ptr StrPtr1+2, es

                lesi    NotPaldrm
                strrev

                printf
                byte    "First string: %^s\n"
                byte    "Second string: %s\n",0
                dword   StrPtr1, NotPaldrm

The above code produces the following output:

First string: radar
Second string: z - y + x
15.4.11 Strset, Strsetm

Strset and strsetm replicate a single character through a string. Their behavior, however, is not quite the same. In particular, while strsetm is quite similar to the repeat function (see “Repeat” on page 840), strset is not. Both routines expect a single character value in the al register. They will replicate this character throughout some string. Strsetm also requires a count in the cx register. It creates a string on the heap consisting of cx characters and returns a pointer to this string in es:di (assuming no memory allocation error). Strset, on the other hand, expects you to pass it the address of an existing string in es:di. It will replace each character in that string with the character in al. Note that you do not specify a length when using the strset function; strset uses the length of the existing string. Example:

String1         byte    "Hello there",0
                 .
                 .
                 .
                lesi    String1         ;See strcat for lesi definition.
                mov     al, '*'
                strset

                mov     cx, 8
                mov     al, '#'
                strsetm

                print
                byte    "String2: ",0
                puts
                printf
                byte    "\nString1: %s\n",0
                dword   String1

The above code produces the output:

String2: ########
String1: ***********
15.4.12 Strspan, Strspanl, Strcspan, Strcspanl

These four routines search through a string for a character which is either in some specified character set (strspan, strspanl) or not a member of some character set (strcspan, strcspanl). These routines appear in the UCR Standard Library only because of their appearance in the C standard library. You should rarely use these routines. The UCR Standard Library includes some other routines for manipulating character sets and performing character matching operations. Nonetheless, these routines are somewhat useful on occasion and are worth a mention here.

These routines expect you to pass them the addresses of two strings: a source string and a character set string. They expect the address of the source string in es:di. Strspan and strcspan want the address of the character set string in dx:si; the character set string follows the call with strspanl and strcspanl. On return, cx contains an index into the string, defined as follows:

strspan, strspanl:      Index of first character in source found in the character set.
strcspan, strcspanl:    Index of first character in source not found in the character set.

If all the characters are in the set (or are not in the set) then cx contains the index into the string of the zero terminating byte. Example:

Source          byte    "ABCDEFG 0123456",0
Set1            byte    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",0
Set2            byte    "0123456789",0
Index1          word    ?
Index2          word    ?
Index3          word    ?
Index4          word    ?
                 .
                 .
                 .
                lesi    Source          ;See strcat for lesi definition.
                ldxi    Set1            ;See strcat for ldxi definition.
                strspan                 ;Search for first ALPHA char.
                mov     Index1, cx      ;Index of first alphabetic char.

                lesi    Source
                ldxi    Set2
                strspan                 ;Search for first numeric char.
                mov     Index2, cx

                lesi    Source
                strcspanl
                byte    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",0
                mov     Index3, cx

                lesi    Set2
                strcspanl
                byte    "0123456789",0
                mov     Index4, cx

                printf
                byte    "First alpha char in Source is at offset %d\n"
                byte    "First numeric char is at offset %d\n"
                byte    "First non-alpha in Source is at offset %d\n"
                byte    "First non-numeric in Set2 is at offset %d\n",0
                dword   Index1, Index2, Index3, Index4

This code outputs the following:

First alpha char in Source is at offset 0
First numeric char is at offset 8
First non-alpha in Source is at offset 7
First non-numeric in Set2 is at offset 10
15.4.13 Strstr, Strstrl

Strstr searches for the first occurrence of one string within another. es:di contains the address of the string in which you want to search for a second string. dx:si contains the address of the second string for the strstr routine; for strstrl the search string immediately follows the call in the code stream.

On return from strstr or strstrl, the carry flag will be set if the second string is not present in the source string. If the carry flag is clear, then the second string is present in the source string and cx will contain the (zero-based) index where the second string was found. Example:

SourceStr       byte    "Search for 'this' in this string",0
SearchStr       byte    "this",0
                 .
                 .
                 .
                lesi    SourceStr       ;See strcat for lesi definition.
                ldxi    SearchStr       ;See strcat for ldxi definition.
                strstr
                jc      NotPresent
                print
                byte    "Found string at offset ",0
                mov     ax, cx          ;Need offset in AX for puti
                puti
                putcr

                lesi    SourceStr
                strstrl
                byte    "for",0
                jc      NotPresent
                print
                byte    "Found 'for' at offset ",0
                mov     ax, cx
                puti
                putcr
NotPresent:

The above code prints the following:

Found string at offset 12
Found 'for' at offset 7
15.4.14 Strtrim, Strtrimm

These two routines are quite similar to strbdel and strbdelm. Rather than removing leading spaces, however, they trim off any trailing spaces from a string. Strtrim trims off any trailing spaces directly on the specified string in memory. Strtrimm first copies the source string and then trims any trailing spaces off the copy. Both routines expect you to pass the address of the source string in es:di. Strtrimm returns a pointer to the new string (if it could allocate it) in es:di. It also returns the carry set or clear to denote error/no error. Example:

String1         byte    "Spaces at the end   ",0
String2         byte    "   Spaces on both sides   ",0
StrPtr1         dword   0
StrPtr2         dword   0
                 .
                 .
                 .

; TrimSpcs trims the spaces off both ends of a string.
; Note that it is a little more efficient to perform the
; strbdel first, then the strtrim. This routine creates
; the new string on the heap and returns a pointer to this
; string in ES:DI.

TrimSpcs        proc
                strbdelm
                jc      BadAlloc        ;Just return if error.
                strtrim
                clc
BadAlloc:       ret
TrimSpcs        endp
                 .
                 .
                 .
                lesi    String1         ;See strcat for lesi definition.
                strtrimm
                jc      error
                mov     word ptr StrPtr1, di
                mov     word ptr StrPtr1+2, es

                lesi    String2
                call    TrimSpcs
                jc      error
                mov     word ptr StrPtr2, di
                mov     word ptr StrPtr2+2, es

                printf
                byte    "First string: '%^s'\n"
                byte    "Second string: '%^s'\n",0
                dword   StrPtr1, StrPtr2

This code fragment outputs the following:

First string: 'Spaces at the end'
Second string: 'Spaces on both sides'
15.4.15 Other String Routines in the UCR Standard Library

In addition to the “strxxx” routines listed in this section, there are many additional string routines available in the UCR Standard Library. These include routines to convert from numeric types (integer, hex, real, etc.) to a string or vice versa, pattern matching and character set routines, and many other conversion and string utilities. The routines described in this chapter are those whose definitions appear in the “strings.a” header file and are specifically targeted towards generic string manipulation. For more details on the other string routines, consult the UCR Standard Library reference section in the appendices.
15.5 The Character Set Routines in the UCR Standard Library

The UCR Standard Library provides an extensive collection of character set routines. These routines let you create sets, clear sets (set them to the empty set), add and remove one or more items, test for set membership, copy sets, compute the union, intersection, or difference, and extract items from a set. Although intended to manipulate sets of characters, you can use the StdLib character set routines to manipulate any set with 256 or fewer possible items.

The first unusual thing to note about the StdLib’s sets is their storage format. A 256-bit array would normally consume 32 consecutive bytes. For performance reasons, the UCR Standard Library’s set format packs eight separate sets into 272 bytes (256 bytes for the eight sets plus 16 bytes overhead). To declare set variables in your data segment you should use the set macro. This macro takes the form:

                set     SetName1, SetName2, ..., SetName8

SetName1..SetName8 represent the names of up to eight set variables. You may have fewer than eight names in the operand field, but doing so will waste some bits in the set array.

The CreateSets routine provides another mechanism for creating set variables. Unlike the set macro, which you would use to create set variables in your data segment, the CreateSets routine allocates storage for up to eight sets dynamically at run time. It returns a pointer to the first set variable in es:di. The remaining seven sets follow at locations es:di+1, es:di+2, ..., es:di+7. A typical program that allocates set variables dynamically might use the following code:

Set0            dword   ?
Set1            dword   ?
Set2            dword   ?
Set3            dword   ?
Set4            dword   ?
Set5            dword   ?
Set6            dword   ?
Set7            dword   ?
                 .
                 .
                 .
                CreateSets
                mov     word ptr Set0+2, es
                mov     word ptr Set1+2, es
                mov     word ptr Set2+2, es
                mov     word ptr Set3+2, es
                mov     word ptr Set4+2, es
                mov     word ptr Set5+2, es
                mov     word ptr Set6+2, es
                mov     word ptr Set7+2, es

                mov     word ptr Set0, di
                inc     di
                mov     word ptr Set1, di
                inc     di
                mov     word ptr Set2, di
                inc     di
                mov     word ptr Set3, di
                inc     di
                mov     word ptr Set4, di
                inc     di
                mov     word ptr Set5, di
                inc     di
                mov     word ptr Set6, di
                inc     di
                mov     word ptr Set7, di
                inc     di
This code segment creates eight different sets on the heap, all empty, and stores pointers to them in the appropriate pointer variables. The SHELL.ASM file provides a commented-out line of code in the data segment that includes the file STDSETS.A. This include file provides the bit definitions for eight commonly used character sets. They are alpha (upper and lower case alphabetics), lower (lower case alphabetics), upper (upper case alphabetics), digits (“0”..”9”), xdigits (“0”..”9”, “A”..”F”, and “a”..”f”), alphanum (upper and lower case alphabetics plus the digits), whitespace (space, tab, carriage return, and line feed), and delimiters (whitespace plus commas, semicolons, less than, greater than, and vertical bar). If you would like to use these standard character sets in your program, you need to remove the semicolon from the beginning of the include statement in the SHELL.ASM file.
The UCR Standard Library provides 16 character set routines: CreateSets, EmptySet, RangeSet, AddStr, AddStrl, RmvStr, RmvStrl, AddChar, RmvChar, Member, CopySet, SetUnion, SetIntersect, SetDifference, NextItem, and RmvItem. All of these routines except CreateSets require a pointer to a character set variable in the es:di registers. Specific routines may require other parameters as well.

The EmptySet routine clears all the bits in a set, producing the empty set. This routine requires the address of the set variable in es:di. The following example clears the set pointed at by Set1:

                les     di, Set1
                EmptySet

RangeSet unions in a range of values into the set variable pointed at by es:di. The al register contains the lower bound of the range of items, ah contains the upper bound. Note that al must be less than or equal to ah. The following example constructs the set of all control characters (ASCII codes one through 31; the null character [ASCII code zero] is not allowed in sets):

                les     di, CtrlCharSet ;Ptr to ctrl char set.
                mov     al, 1
                mov     ah, 31
                RangeSet
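
Because RangeSet unions its range into whatever the set already holds, several calls can build a set from disjoint ranges. The following is only a sketch under that reading of the description above: it assembles the hexadecimal digits, and HexSet is a hypothetical dword pointer to a set variable (for example, one obtained from CreateSets), not a name the library defines.

```asm
                les     di, HexSet      ;HexSet: assumed dword ptr to a set.
                EmptySet                ;Start from the empty set.

                les     di, HexSet      ;Reload ES:DI before each call rather
                mov     al, '0'         ; than assume the routines preserve it.
                mov     ah, '9'
                RangeSet                ;Union in "0".."9".

                les     di, HexSet
                mov     al, 'A'
                mov     ah, 'F'
                RangeSet                ;Union in "A".."F".

                les     di, HexSet
                mov     al, 'a'
                mov     ah, 'f'
                RangeSet                ;Union in "a".."f".
```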
AddStr and AddStrl add all the characters in a zero terminated string to a character set. For AddStr, the dx:si register pair points at the zero terminated string. For AddStrl, the zero terminated string follows the call to AddStrl in the code stream. These routines union each character of the specified string into the set. The following examples add the digits and some special characters into the FPDigits set:

Digits          byte    "0123456789",0
                set     FPDigitsSet
FPDigits        dword   FPDigitsSet
                 .
                 .
                 .
                ldxi    Digits          ;Loads DX:SI with adrs of Digits.
                les     di, FPDigits
                AddStr
                 .
                 .
                 .
                les     di, FPDigits
                AddStrL
                byte    "Ee.+-",0
RmvStr and RmvStrl remove characters from a set. You supply the characters in a zero terminated string. For RmvStr, dx:si points at the string of characters to remove from the set. For RmvStrl, the zero terminated string follows the call. The following example uses RmvStrl to remove the special symbols from FPDigits above:

                les     di, FPDigits
                RmvStrl
                byte    "Ee.+-",0
The AddChar and RmvChar routines let you add or remove individual characters. As usual, es:di points at the set; the al register contains the character you wish to add to the set or remove from the set. The following example adds a space to the set FPDigits and removes the “,” character (if present):

                les     di, FPDigits
                mov     al, ' '
                AddChar
                 .
                 .
                 .
                les     di, FPDigits
                mov     al, ','
                RmvChar
The Member function checks to see if a character is in a set. On entry, es:di must point at the set and al must contain the character to check. On exit, the zero flag is set if the character is a member of the set; the zero flag will be clear if the character is not in the set. The following example reads characters from the keyboard until the user presses a key that is not a whitespace character:

SkipWS:         get                     ;Read char from user into AL.
                lesi    WhiteSpace      ;Address of WS set into es:di.
                member
                je      SkipWS
The CopySet, SetUnion, SetIntersect, and SetDifference routines all operate on two sets of characters. The es:di register pair points at the destination character set, the dx:si register pair points at a source character set. CopySet copies the bits from the source set to the destination set, replacing the original bits in the destination set. SetUnion computes the union of the two sets and stores the result into the destination set. SetIntersect computes the set intersection and stores the result into the destination set. Finally, the SetDifference routine computes DestSet := DestSet - SrcSet.

The NextItem and RmvItem routines let you extract elements from a set. NextItem returns in al the ASCII code of the first character it finds in a set. RmvItem does the same thing except it also removes the character from the set. These routines return zero in al if the set is empty (StdLib sets cannot contain the NULL character). You can use the RmvItem routine to build a rudimentary iterator for a character set.

The UCR Standard Library’s character set routines are very powerful. With them, you can easily manipulate character string data, especially when searching for different patterns within a string. We will consider these routines again when we study pattern matching later in this text (see “Pattern Matching” on page 883).
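
A rudimentary RmvItem iterator might look like the following sketch. Since RmvItem empties the set as it goes, the code first copies the set with CopySet and iterates over the copy. SrcSet and ScratchSet are hypothetical dword pointers to set variables, the labels are invented for the example, and the reload of es:di inside the loop is a precaution, since the description above does not say whether these routines preserve it.

```asm
                les     di, ScratchSet          ;ES:DI -> destination set.
                mov     si, word ptr SrcSet     ;DX:SI -> source set.
                mov     dx, word ptr SrcSet+2
                CopySet                         ;ScratchSet := SrcSet

NextChar:       les     di, ScratchSet
                RmvItem                         ;AL := a member, now removed.
                cmp     al, 0                   ;Zero means the set is empty.
                je      SetDone
                putc                            ;Process the item (here: print it).
                jmp     NextChar
SetDone:
```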
15.6 Using the String Instructions on Other Data Types

The string instructions work with other data types besides character strings. You can use the string instructions to copy whole arrays from one variable to another, to initialize large data structures to a single value, or to compare entire data structures for equality or inequality. Anytime you’re dealing with data structures containing several bytes, you may be able to use the string instructions.

15.6.1 Multi-precision Integer Strings

The cmps instruction is useful for comparing (very) large integer values. Unlike character strings, we cannot compare integers with cmps from the L.O. byte through the H.O. byte. Instead, we must compare them from the H.O. byte down to the L.O. byte. The following code compares two 12-byte integers:

                lea     di, integer1+10
                lea     si, integer2+10
                mov     cx, 6
                std
        repe    cmpsw

After the execution of the cmpsw instruction, the flags will contain the result of the comparison.

You can easily assign one long integer string to another using the movs instruction. Nothing tricky here, just load up the si, di, and cx registers and have at it. You must do other operations, including arithmetic and logical operations, using the extended precision methods described in the chapter on arithmetic operations.
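
As a sketch of the assignment just described, the following copies the 12-byte integer2 into integer1. It assumes both variables live in the data segment and that es equals ds, and it clears the direction flag because the comparison above left it set.

```asm
                cld                     ;Copy from low addresses to high.
                lea     si, integer2    ;DS:SI -> source.
                lea     di, integer1    ;ES:DI -> destination (assumes ES = DS).
                mov     cx, 6           ;12 bytes = 6 words.
        rep     movsw
```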
15.6.2 Dealing with Whole Arrays and Records

The only operations that apply, in general, to all array and record structures are assignment and comparison (for equality/inequality only). You can use the movs and cmps instructions for these operations.

Operations such as scalar addition, transposition, etc., may be easily synthesized using the lods and stos instructions. The following code shows how you can easily add the value 20 to each element of the integer array A:

                lea     si, A
                mov     di, si
                mov     cx, SizeOfA
                cld
AddLoop:        lodsw
                add     ax, 20
                stosw
                loop    AddLoop

You can implement other operations in a similar fashion.
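
For instance, comparing two arrays for equality takes only a repeated cmps. In this sketch, B is a hypothetical second array with the same number of words as A, ArraysDiffer is a hypothetical label, SizeOfA is taken to be the element count as in the loop above, and es is assumed to equal ds.

```asm
                cld                     ;Scan from low addresses to high.
                lea     si, A           ;DS:SI -> first array.
                lea     di, B           ;ES:DI -> second array (assumed name).
                mov     cx, SizeOfA     ;Number of words to compare.
        repe    cmpsw                   ;Stop at the first mismatch.
                jne     ArraysDiffer    ;ZF = 0 means some element differed.
```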
15.7 Sample Programs

In this section there are three sample programs. The first searches through a file for a particular string and displays the line numbers of any lines containing that string. This program demonstrates the use of the strstr function (among other things). The second program is a demo program that uses several of the string functions available in the UCR Standard Library’s string package. The third program demonstrates how to use the 80x86 cmps instruction to compare the data in two files. These programs (find.asm, strdemo.asm, and fcmp.asm) are available on the companion CD-ROM.
15.7.1 Find.asm

; Find.asm
;
; This program opens a file specified on the command line and
; searches for a string (also specified on the command line).
;
; Program Usage:
;
;       find "string" filename

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

wp              textequ <word ptr>

dseg            segment para public 'data'

StrPtr          dword   ?
FileName        dword   ?
LineCnt         dword   ?

FVar            filevar {}

InputLine       byte    1024 dup (?)
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; Readln- This procedure reads a line of text from the input
;         file and buffers it up in the "InputLine" array.

ReadLn          proc
                push    es
                push    ax
                push    di
                push    bx

                lesi    FVar            ;Read from our file.
                mov     bx, 0           ;Index into InputLine.
ReadLp:         fgetc                   ;Get next char from file.
                jc      EndRead         ;Quit on EOF

                cmp     al, cr          ;Ignore carriage returns.
                je      ReadLp
                cmp     al, lf          ;End of line on line feed.
                je      EndRead

                mov     InputLine[bx], al
                inc     bx
                jmp     ReadLp

; If we hit the end of a line or the end of the file,
; zero-terminate the string.

EndRead:        mov     InputLine[bx], 0
                pop     bx
                pop     di
                pop     ax
                pop     es
                ret
ReadLn          endp

; The following main program extracts the search string and the
; filename from the command line, opens the file, and then searches
; for the string in that file.

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                argc
                cmp     cx, 2
                je      GoodArgs
                print
                byte    "Usage: find 'string' filename",cr,lf,0
                jmp     Quit

GoodArgs:       mov     ax, 1           ;Get the string to search for
                argv                    ; off the command line.
                mov     wp StrPtr, di
                mov     wp StrPtr+2, es

                mov     ax, 2           ;Get the filename from the
                argv                    ; command line.
                mov     wp FileName, di
                mov     wp FileName+2, es

; Open the input file for reading

                mov     ax, 0           ;Open for read.
                mov     si, wp FileName
                mov     dx, wp FileName+2
                lesi    FVar
                fopen
                jc      BadOpen

; Okay, start searching for the string in the file.

                mov     wp LineCnt, 0
                mov     wp LineCnt+2, 0
SearchLp:       call    ReadLn
                jc      AtEOF

; Bump the line number up by one. Note that this is 8086 code
; so we have to use extended precision arithmetic to do a 32-bit
; add. LineCnt is a 32-bit variable because some files have more
; than 65,536 lines.

                add     wp LineCnt, 1
                adc     wp LineCnt+2, 0

; Search for the user-specified string on the current line.

                lesi    InputLine
                mov     dx, wp StrPtr+2
                mov     si, wp StrPtr
                strstr
                jc      SearchLp        ;Jump if not found.

; Print an appropriate message if we found the string.

                printf
                byte    "Found '%^s' at line %ld\n",0
                dword   StrPtr, LineCnt
                jmp     SearchLp

; Close the file when we're done.

AtEOF:          lesi    FVar
                fclose
                jmp     Quit

BadOpen:        printf
                byte    "Error attempting to open %^s\n",cr,lf,0
                dword   FileName

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

15.7.2 StrDemo.asm

This short demo program just shows off how to use several of the string routines found in the UCR Standard Library strings package.

; StrDemo.asm- Demonstration of some of the various UCR Standard
;              Library string routines.

                include         stdlib.a
                includelib      stdlib.lib

dseg            segment para public 'data'

MemAvail        word    ?
String          byte    256 dup (0)

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

Main            proc
                mov     ax, seg dseg    ;Set up the segment registers
                mov     ds, ax
                mov     es, ax

                MemInit
                mov     MemAvail, cx
                printf
                byte    "There are %x paragraphs of memory available."
                byte    cr,lf,lf,0
                dword   MemAvail

; Demonstration of StrTrim:

                print
                byte    "Testing strtrim on 'Hello there   '",cr,lf,0
                strdupl
HelloThere1     byte    "Hello there   ",0
                strtrim
                mov     al, "'"
                putc
                puts
                putc
                putcr
                free

; Demonstration of StrTrimm:

                print
                byte    "Testing strtrimm on 'Hello there   '",cr,lf,0
                lesi    HelloThere1
                strtrimm
                mov     al, "'"
                putc
                puts
                putc
                putcr
                free

; Demonstration of StrBdel

                print
                byte    "Testing strbdel on '  Hello there   '",cr,lf,0
                strdupl
HelloThere3     byte    "  Hello there   ",0
                strbdel
                mov     al, "'"
                putc
                puts
                putc
                putcr
                free

; Demonstration of StrBdelm

                print
                byte    "Testing strbdelm on '  Hello there   '",cr,lf,0
                lesi    HelloThere3
                strbdelm
                mov     al, "'"
                putc
                puts
                putc
                putcr
                free

; Demonstrate StrCpyl:

                ldxi    String
                strcpyl
                byte    "Copy this string to the 'String' variable",0

                printf
                byte    "STRING = '%s'",cr,lf,0
                dword   String

; Demonstrate StrCatl:

                lesi    String
                strcatl
                byte    ". Put at end of 'String'",0

                printf
                byte    "STRING = ",'"%s"',cr,lf,0
                dword   String

; Demonstrate StrChr:

                lesi    String
                mov     al, "'"
                strchr

                print
                byte    "StrChr: First occurrence of ", '"', "'"
                byte    '" found at position ',0
                mov     ax, cx
                puti
                putcr

; Demonstrate StrStrl:

                lesi    String
                strstrl
                byte    "String",0

                print
                byte    'StrStr: First occurrence of "String" found at '
                byte    'position ',0

                mov     ax, cx
                puti
                putcr

; Demo of StrSet

                lesi    String
                mov     al, '*'
                strset

                printf
                byte    "Strset: '%s'",cr,lf,0
                dword   String

; Demo of strlen

                lesi    String
                strlen

                print
                byte    "String length = ",0
                mov     ax, cx          ;Puti needs the length in AX.
                puti
                putcr

Quit:           mov     ah, 4ch
                int     21h
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      256 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

15.7.3 Fcmp.asm

This is a file comparison program. It demonstrates the use of the 80x86 cmps instruction (as well as blocked I/O under DOS).

; FCMP.ASM- A file comparison program that demonstrates the use
;           of the 80x86 string instructions.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'

Name1           dword   ?               ;Ptr to filename #1
Name2           dword   ?               ;Ptr to filename #2
Handle1         word    ?               ;File handle for file #1
Handle2         word    ?               ;File handle for file #2
LineCnt         word    0               ;# of lines in the file.

Buffer1         byte    256 dup (0)     ;Block of data from file 1
Buffer2         byte    256 dup (0)     ;Block of data from file 2

dseg            ends

wp              equ     <word ptr>

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; Error- Prints a DOS error message depending upon the error type.

Error           proc    near
                cmp     ax, 2
                jne     NotFNF
                print
                byte    "File not found",0
                jmp     ErrorDone

NotFNF:         cmp     ax, 4
                jne     NotTMF
                print
                byte    "Too many open files",0
                jmp     ErrorDone

NotTMF:         cmp     ax, 5
                jne     NotAD
                print
                byte    "Access denied",0
                jmp     ErrorDone

NotAD:          cmp     ax, 12
                jne     NotIA
                print
                byte    "Invalid access",0
                jmp     ErrorDone

NotIA:
ErrorDone:      putcr
                ret
Error           endp

; Okay, here's the main program. It opens two files, compares them, and
; complains if they're different.

Main            proc
                mov     ax, seg dseg    ;Set up the segment registers
                mov     ds, ax
                mov     es, ax
                meminit

; File comparison routine. First, open the two source files.

                argc
                cmp     cx, 2           ;Do we have two filenames?
                je      GotTwoNames
                print
                byte    "Usage: fcmp file1 file2",cr,lf,0
                jmp     Quit

GotTwoNames:    mov     ax, 1           ;Get first file name
                argv
                mov     wp Name1, di
                mov     wp Name1+2, es

; Open the files by calling DOS.

                mov     ax, 3d00h       ;Open for reading
                lds     dx, Name1
                int     21h
                jnc     GoodOpen1
                printf
                byte    "Error opening %^s:",0
                dword   Name1
                call    Error
                jmp     Quit

GoodOpen1:      mov     dx, dseg
                mov     ds, dx
                mov     Handle1, ax

                mov     ax, 2           ;Get second file name
                argv
                mov     wp Name2, di
                mov     wp Name2+2, es

                mov     ax, 3d00h       ;Open for reading
                lds     dx, Name2
                int     21h
                jnc     GoodOpen2
                printf
                byte    "Error opening %^s:",0
                dword   Name2
                call    Error
                jmp     Quit

GoodOpen2:      mov     dx, dseg
                mov     ds, dx
                mov     Handle2, ax

; Read the data from the files using blocked I/O
; and compare it.

                mov     LineCnt, 1
CmpLoop:        mov     bx, Handle1     ;Read 256 bytes from
                mov     cx, 256         ; the first file into
                lea     dx, Buffer1     ; Buffer1.
                mov     ah, 3fh
                int     21h
                jc      FileError
                cmp     ax, 256         ;Leave if at EOF.
                jne     EndOfFile

                mov     bx, Handle2     ;Read 256 bytes from
                mov     cx, 256         ; the second file into
                lea     dx, Buffer2     ; Buffer2
                mov     ah, 3fh
                int     21h
                jc      FileError
                cmp     ax, 256         ;If we didn't read 256 bytes,
                jne     BadLen          ; the files are different.

; Okay, we've just read 256 bytes from each file, compare the buffers ; to see if the data is the same in both files.
repe
FileError:
BadLen:
BadCmp:
mov mov mov mov lea lea cld cmpsb jne jmp
ax, ds, es, cx, di, si,
dseg ax ax 256 Buffer1 Buffer2
BadCmp CmpLoop
print byte call jmp
"Error reading files: ",0 Error Quit
print byte
"File lengths were different",cr,lf,0
print byte
7,"Files were not equal",cr,lf,0
mov int
ax, 4c01h 21h
;Exit with error.
; If we reach the end of the first file, compare any remaining bytes ; in that first file against the remaining bytes in the second file. EndOfFile:
push mov mov lea mov
ax bx, cx, dx, ah,
;Save final length. Handle2 256 Buffer2 3fh
Page 867
Chapter 15
repe
Quit: Main cseg
int jc
21h BadCmp
pop cmp jne
bx ax, bx BadLen
;Retrieve file1's length. ;See if file2 matches it.
mov mov mov mov lea lea cmpsb jne
cx, ax, ds, es, di, si,
;Compare the remaining ; bytes down here.
mov int endp ends
ax, 4c00h 21h
ax dseg ax ax Buffer2 Buffer1
BadCmp ;Set Exit code to okay.
; Allocate a reasonable amount of space for the stack (2k).
15.8
sseg stk sseg
segment byte ends
para stack 'stack' 256 dup ("stack ")
zzzzzzseg LastBytes zzzzzzseg
segment byte ends end
para public 'zzzzzz' 16 dup (?) Main
Laboratory Exercises
These exercises use the Ex15_1.asm, Ex15_2.asm, Ex15_3.asm, and Ex15_4.asm files found on the companion CD-ROM. In this set of laboratory exercises you will be measuring the performance of the 80x86 movs instructions and the (hopefully) minor performance differences between length prefixed string operations and zero terminated string operations.
15.8.1
MOVS Performance Exercise #1
The movsb, movsw, and movsd instructions operate at different speeds, even when moving around the same number of bytes. In general, the movsw instruction is twice as fast as movsb when moving the same number of bytes. Likewise, movsd is about twice as fast as movsw (and about four times as fast as movsb) when moving the same number of bytes. Ex15_1.asm is a short program that demonstrates this fact. This program consists of three sections that copy 2048 bytes from one buffer to another 100,000 times. The three sections repeat this operation using the movsb, movsw, and movsd instructions. Run this program and time each phase.

For your lab report: present the timings on your machine. Be sure to list processor type and clock frequency in your lab report. Discuss why the timings are different between the three phases of this program. Explain the difficulty with using the movsd (versus movsw or movsb) instruction in any program on an 80386 or later processor. Why is it not a general replacement for movsb, for example? How can you get around this problem?

; EX15_1.asm
;
; This program demonstrates the proper use of the 80x86 string instructions.

                .386
                option  segment:use16

                include         stdlib.a
                includelib      stdlib.lib

dseg            segment para public 'data'

Buffer1         byte    2048 dup (0)
Buffer2         byte    2048 dup (0)

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

; Demo of the movsb, movsw, and movsd instructions

                print
                byte    "The following code moves a block of 2,048 bytes "
                byte    "around 100,000 times.",cr,lf
                byte    "The first phase does this using the movsb "
                byte    "instruction; the second",cr,lf
                byte    "phase does this using the movsw instruction; "
                byte    "the third phase does",cr,lf
                byte    "this using the movsd instruction.",cr,lf,lf,lf
                byte    "Press any key to begin phase one:",0

                getc
                putcr

                mov     edx, 100000

movsbLp:        lea     si, Buffer1
                lea     di, Buffer2
                cld
                mov     cx, 2048
        rep     movsb
                dec     edx
                jnz     movsbLp

                print
                byte    cr,lf
                byte    "Phase one complete",cr,lf,lf
                byte    "Press any key to begin phase two:",0

                getc
                putcr

                mov     edx, 100000

movswLp:        lea     si, Buffer1
                lea     di, Buffer2
                cld
                mov     cx, 1024
        rep     movsw
                dec     edx
                jnz     movswLp

                print
                byte    cr,lf
                byte    "Phase two complete",cr,lf,lf
                byte    "Press any key to begin phase three:",0

                getc
                putcr

                mov     edx, 100000

movsdLp:        lea     si, Buffer1
                lea     di, Buffer2
                cld
                mov     cx, 512
        rep     movsd
                dec     edx
                jnz     movsdLp

Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack   ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

15.8.2
MOVS Performance Exercise #2
In this exercise you will once again time the computer moving around blocks of 2,048 bytes. Like Ex15_1.asm in the previous exercise, Ex15_2.asm contains three phases; the first phase moves data using the movsb instruction; the second phase moves the data around using the lodsb and stosb instructions; the third phase uses a loop with simple mov instructions. Run this program and time the three phases.

For your lab report: include the timings and a description of your machine (CPU, clock speed, etc.). Discuss the timings and explain the results (consult Appendix D as necessary).

; EX15_2.asm
;
; This program compares the performance of the MOVS instruction against
; a manual block move operation. It also compares MOVS against a LODS/STOS
; loop.

                .386
                option  segment:use16

                include         stdlib.a
                includelib      stdlib.lib

dseg            segment para public 'data'

Buffer1         byte    2048 dup (0)
Buffer2         byte    2048 dup (0)

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

; MOVSB version done here:

                print
                byte    "The following code moves a block of 2,048 bytes "
                byte    "around 100,000 times.",cr,lf
                byte    "The first phase does this using the movsb "
                byte    "instruction; the second",cr,lf
                byte    "phase does this using the lods/stos instructions; "
                byte    "the third phase does",cr,lf
                byte    "this using a loop with MOV "
                byte    "instructions.",cr,lf,lf,lf
                byte    "Press any key to begin phase one:",0

                getc
                putcr

                mov     edx, 100000

movsbLp:        lea     si, Buffer1
                lea     di, Buffer2
                cld
                mov     cx, 2048
        rep     movsb
                dec     edx
                jnz     movsbLp

                print
                byte    cr,lf
                byte    "Phase one complete",cr,lf,lf
                byte    "Press any key to begin phase two:",0

                getc
                putcr

                mov     edx, 100000

LodsStosLp:     lea     si, Buffer1
                lea     di, Buffer2
                cld
                mov     cx, 2048
LodsStosLp2:    lodsb
                stosb
                loop    LodsStosLp2
                dec     edx
                jnz     LodsStosLp

                print
                byte    cr,lf
                byte    "Phase two complete",cr,lf,lf
                byte    "Press any key to begin phase three:",0

                getc
                putcr

                mov     edx, 100000

MovLp:          lea     si, Buffer1
                lea     di, Buffer2
                cld
                mov     cx, 2048
MovLp2:         mov     al, ds:[si]
                mov     es:[di], al
                inc     si
                inc     di
                loop    MovLp2
                dec     edx
                jnz     MovLp

Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack   ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

15.8.3
Memory Performance Exercise
In the previous two exercises, the programs accessed a maximum of 4K of data. Since most modern on-chip CPU caches are at least this big, most of the activity took place directly on the CPU (which is very fast). The following exercise is a slight modification that moves the array data in such a way as to destroy cache performance. Run this program and time the results.

For your lab report: based on what you learned about the 80x86’s cache mechanism in Chapter Three, explain the performance differences.

; EX15_3.asm
;
; This program compares the performance of the MOVS instruction against
; a manual block move operation. It also compares MOVS against a LODS/STOS
; loop. This version does so in such a way as to wipe out the on-chip CPU
; cache.

                .386
                option  segment:use16

                include         stdlib.a
                includelib      stdlib.lib

dseg            segment para public 'data'

Buffer1         byte    16384 dup (0)
Buffer2         byte    16384 dup (0)

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

; MOVSB version done here:

                print
                byte    "The following code moves a block of 16,384 bytes "
                byte    "around 12,500 times.",cr,lf
                byte    "The first phase does this using the movsb "
                byte    "instruction; the second",cr,lf
                byte    "phase does this using the lods/stos instructions; "
                byte    "the third phase does",cr,lf
                byte    "this using a loop with MOV instructions."
                byte    cr,lf,lf,lf
                byte    "Press any key to begin phase one:",0

                getc
                putcr

                mov     edx, 12500

movsbLp:        lea     si, Buffer1
                lea     di, Buffer2
                cld
                mov     cx, 16384
        rep     movsb
                dec     edx
                jnz     movsbLp

                print
                byte    cr,lf
                byte    "Phase one complete",cr,lf,lf
                byte    "Press any key to begin phase two:",0

                getc
                putcr

                mov     edx, 12500

LodsStosLp:     lea     si, Buffer1
                lea     di, Buffer2
                cld
                mov     cx, 16384
LodsStosLp2:    lodsb
                stosb
                loop    LodsStosLp2
                dec     edx
                jnz     LodsStosLp

                print
                byte    cr,lf
                byte    "Phase two complete",cr,lf,lf
                byte    "Press any key to begin phase three:",0

                getc
                putcr

                mov     edx, 12500

MovLp:          lea     si, Buffer1
                lea     di, Buffer2
                cld
                mov     cx, 16384
MovLp2:         mov     al, ds:[si]
                mov     es:[di], al
                inc     si
                inc     di
                loop    MovLp2
                dec     edx
                jnz     MovLp

Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack   ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
15.8.4
The Performance of Length-Prefixed vs. Zero-Terminated Strings
The following program (Ex15_4.asm on the companion CD-ROM) executes two million string operations. During the first phase of execution, this code executes a sequence of length-prefixed string operations 1,000,000 times. During the second phase it does a comparable set of operations on zero terminated strings. Measure the execution time of each phase.

For your lab report: report the differences in execution times and comment on the relative efficiency of length prefixed vs. zero terminated strings. Note that the relative performances of these sequences will vary depending upon the processor you use. Based on what you learned in Chapter Three and the cycle timings in Appendix D, explain some possible reasons for relative performance differences between these sequences among different processors.

; EX15_4.asm
;
; This program compares the performance of length prefixed strings versus
; zero terminated strings using some simple examples.
;
; Note: these routines all assume that the strings are in the data segment
;       and both ds and es already point into the data segment.

                .386
                option  segment:use16

                include         stdlib.a
                includelib      stdlib.lib

dseg            segment para public 'data'

LStr1           byte    17,"This is a string."
LResult         byte    256 dup (?)

ZStr1           byte    "This is a string",0
ZResult         byte    256 dup (?)

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; LStrCpy: Copies a length prefixed string pointed at by SI to
;          the length prefixed string pointed at by DI.

LStrCpy         proc
                push    si
                push    di
                push    cx

                cld
                mov     cl, [si]                ;Get length of string.
                mov     ch, 0
                inc     cx                      ;Include length byte.
        rep     movsb

                pop     cx
                pop     di
                pop     si
                ret
LStrCpy         endp

; LStrCat-      Concatenates the string pointed at by SI to the end
;               of the string pointed at by DI using length
;               prefixed strings.

LStrCat         proc
                push    si
                push    di
                push    cx

                cld

; Compute the final length of the concatenated string

                mov     cl, [di]                ;Get orig length.
                mov     ch, [si]                ;Get 2nd Length.
                add     [di], ch                ;Compute new length.

; Move DI to the first byte beyond the end of the first string.

                mov     ch, 0                   ;Zero extend orig len.
                add     di, cx                  ;Skip past str.
                inc     di                      ;Skip past length byte.

; Concatenate the second string (SI) to the end of the first string (DI)

                mov     cl, [si]                ;CX = length of 2nd string.
                inc     si                      ;Skip its length byte.
        rep     movsb                           ;Copy 2nd to end of orig.

                pop     cx
                pop     di
                pop     si
                ret
LStrCat         endp

; LStrCmp-      String comparison using two length prefixed strings.
;               SI points at the first string, DI points at the
;               string to compare it against.

LStrCmp         proc
                push    si
                push    di
                push    cx

                cld

; When comparing the strings, we need to compare the strings
; up to the length of the shorter string. The following code
; computes the minimum length of the two strings.

                mov     cl, [si]                ;Get the minimum of the two lengths
                mov     ch, [di]
                cmp     cl, ch
                jb      HasMin
                mov     cl, ch
HasMin:         mov     ch, 0
                jcxz    CmpLen                  ;Empty string: compare lengths only.
                inc     si                      ;Skip the length bytes so cmpsb
                inc     di                      ; sees only character data.

        repe    cmpsb                           ;Compare the two strings.
                je      CmpLen
                pop     cx
                pop     di
                pop     si
                ret

; If the strings are equal through the length of the shorter string,
; we need to compare their lengths

CmpLen:         pop     cx
                pop     di
                pop     si

                push    ax                      ;Compare via AL so CX stays intact;
                mov     al, [si]                ; pop does not change the flags.
                cmp     al, [di]
                pop     ax
                ret
LStrCmp         endp

; ZStrCpy-      Copies the zero terminated string pointed at by SI
;               to the zero terminated string pointed at by DI.

ZStrCpy         proc
                push    si
                push    di
                push    ax

ZSCLp:          mov     al, [si]
                inc     si
                mov     [di], al
                inc     di
                cmp     al, 0
                jne     ZSCLp

                pop     ax
                pop     di
                pop     si
                ret
ZStrCpy         endp

; ZStrCat-      Concatenates the string pointed at by SI to the end
;               of the string pointed at by DI using zero terminated
;               strings.

ZStrCat         proc
                push    si
                push    di
                push    cx
                push    ax

                cld

; Find the end of the destination string:

                mov     cx, 0FFFFh
                mov     al, 0                   ;Look for zero byte.
        repne   scasb
                dec     di                      ;scasb stops one byte past the zero.

; Copy the source string to the end of the destination string:

ZCatLp:         mov     al, [si]
                inc     si
                mov     [di], al
                inc     di
                cmp     al, 0
                jne     ZCatLp

                pop     ax
                pop     cx
                pop     di
                pop     si
                ret
ZStrCat         endp

; ZStrCmp-      Compares two zero terminated strings.
;               This is actually easier than the length
;               prefixed comparison.

ZStrCmp         proc
                push    cx
                push    si
                push    di

; Compare the two strings until they are not equal
; or until we encounter a zero byte. They are equal
; if we encounter a zero byte after comparing the
; two characters from the strings.

ZCmpLp:         mov     al, [si]
                inc     si
                cmp     al, [di]
                jne     ZCmpDone
                inc     di
                cmp     al, 0
                jne     ZCmpLp

ZCmpDone:       pop     di
                pop     si
                pop     cx
                ret
ZStrCmp         endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                print
                byte    "The following code does 1,000,000 string "
                byte    "operations using",cr,lf
                byte    "length prefixed strings. Measure the amount "
                byte    "of time this code",cr,lf
                byte    "takes to run.",cr,lf,lf
                byte    "Press any key to begin:",0

                getc
                putcr

                mov     edx, 1000000
LStrCpyLp:      lea     si, LStr1
                lea     di, LResult
                call    LStrCpy
                call    LStrCat
                call    LStrCat
                call    LStrCat
                call    LStrCpy
                call    LStrCmp
                call    LStrCat
                call    LStrCmp

                dec     edx
                jne     LStrCpyLp

                print
                byte    "The following code does 1,000,000 string "
                byte    "operations using",cr,lf
                byte    "zero terminated strings. Measure the amount "
                byte    "of time this code",cr,lf
                byte    "takes to run.",cr,lf,lf
                byte    "Press any key to begin:",0

                getc
                putcr

                mov     edx, 1000000
ZStrCpyLp:      lea     si, ZStr1
                lea     di, ZResult
                call    ZStrCpy
                call    ZStrCat
                call    ZStrCat
                call    ZStrCat
                call    ZStrCpy
                call    ZStrCmp
                call    ZStrCat
                call    ZStrCmp

                dec     edx
                jne     ZStrCpyLp

Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack   ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

15.9
Programming Projects
1) Write a SubStr function that extracts a substring from a zero terminated string. Pass a pointer to the string in ds:si, a pointer to the destination string in es:di, the starting position in the string in ax, and the length of the substring in cx. Follow all the rules given in section 15.3.1 concerning degenerate conditions.

2) Write a word iterator (see “Iterators” on page 663) to which you pass a string (by reference, on the stack). Each iteration of the corresponding foreach loop should extract a word from this string, malloc sufficient storage for this string on the heap, copy that word (substring) to the malloc’d location, and return a pointer to the word. Write a main program that calls the iterator with various strings to test it.

3) Modify the find.asm program (see “Find.asm” on page 860) so that it searches for the desired string in several files using ambiguous filenames (i.e., wildcard characters). See “Find First File” on page 729 for details about processing filenames that contain wildcard characters. You should write a loop that processes all matching filenames and executes the find.asm core code on each filename that matches the ambiguous filename a user supplies.

4) Write a strncpy routine that behaves like strcpy except it copies a maximum of n characters (including the zero terminating byte). Pass the source string’s address in es:di, the destination string’s address in dx:si, and the maximum length in cx.

5) The movsb instruction may not work properly if the source and destination blocks overlap (see “The MOVS Instruction” on page 822). Write a procedure “bcopy” to which you pass the address of a source block, the address of a destination block, and a length, that will properly copy the data even if the source and destination blocks overlap. Do this by checking to see if the blocks overlap and adjusting the source pointer, destination pointer, and direction flag if necessary.

6) As you discovered in the lab experiments, the movsd instruction can move a block of data much faster than movsb or movsw can move that same block. Unfortunately, it can only move a block whose length is a multiple of four bytes. Write a “fastcopy” routine that uses the movsd instruction to copy all but the last one to three bytes of a source block to the destination block and then manually copies the remaining bytes between the blocks. Write a main program with several boundary test cases to verify correct operation. Compare the performance of your fastcopy procedure against the use of the movsb instruction.
15.10 Summary
The 80x86 provides a powerful set of string instructions. However, these instructions are very primitive, useful mainly for manipulating blocks of bytes. They do not correspond to the string instructions one expects to find in a high level language. You can, however, use the 80x86 string instructions to synthesize those functions normally associated with HLLs. This chapter explains how to construct many of the more popular string functions. Of course, it’s foolish to constantly reinvent the wheel, so this chapter also describes many of the string functions available in the UCR Standard Library.

The 80x86 string instructions provide the basis for many of the string operations appearing in this chapter. Therefore, this chapter begins with a review and in-depth discussion of the 80x86 string instructions, the repeat prefixes, and the direction flag. This chapter discusses the operation of each of the string instructions and describes how you can use each of them to perform string related tasks. To see how the 80x86 string instructions operate, check out the following sections:
• “The 80x86 String Instructions” on page 819
• “How the String Instructions Operate” on page 819
• “The REP/REPE/REPZ and REPNZ/REPNE Prefixes” on page 820
• “The Direction Flag” on page 821
• “The MOVS Instruction” on page 822
• “The CMPS Instruction” on page 826
• “The SCAS Instruction” on page 828
• “The STOS Instruction” on page 828
• “The LODS Instruction” on page 829
• “Building Complex String Functions from LODS and STOS” on page 830
• “Prefixes and the String Instructions” on page 830
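As a rough illustration of how the repeat prefix and the direction flag interact, the following Python sketch models `rep movsb` on a flat byte array. It is only a model under simplifying assumptions: segments are ignored, the function name `rep_movsb` and its parameters are invented for this example, and the direction flag is passed in as an ordinary argument.

```python
def rep_movsb(mem, si, di, cx, df=0):
    """Model of `rep movsb` on a flat bytearray (segments ignored).

    df=0 (as after cld): SI and DI are incremented after each byte.
    df=1 (as after std): SI and DI are decremented after each byte.
    Returns the final (si, di, cx), mirroring the register state.
    """
    step = -1 if df else 1
    while cx != 0:              # rep: repeat until CX reaches zero
        mem[di] = mem[si]       # movsb: copy one byte from [si] to [di]
        si += step
        di += step
        cx -= 1
    return si, di, cx


# Non-overlapping forward copy of four bytes.
mem = bytearray(b"ABCD....")
regs = rep_movsb(mem, si=0, di=4, cx=4)

# Overlapping blocks: source occupies 0..3, destination occupies 1..4.
fwd = bytearray(b"ABCD.")
rep_movsb(fwd, si=0, di=1, cx=4, df=0)   # forward copy smears the first byte

bwd = bytearray(b"ABCD.")
rep_movsb(bwd, si=3, di=4, cx=4, df=1)   # backward copy keeps the data intact
```

In this model the forward copy of the overlapping block leaves `AAAAA` in memory, while the backward copy leaves `AABCD`, which is why the direction of the copy matters when blocks overlap.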
Although Intel calls them “string instructions” they do not actually work on the abstract data type we normally think of as a character string. The string instructions simply manipulate arrays of bytes, words, or double words. It takes a little work to get these instructions to deal with true character strings. Unfortunately, there isn’t a single definition of a character string which, no doubt, is the reason there aren’t any instructions specifically for character strings in the 80x86 instruction set. Two of the more popular character string types include length prefixed strings and zero terminated strings which Pascal and C use, respectively. Details on string formats appear in the following sections:
• “Character Strings” on page 831
• “Types of Strings” on page 831
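The difference between the two string formats can be sketched in a few lines of Python operating on raw bytes. This is an illustration only; the helper names (`make_lstr`, `make_zstr`, `lstr_len`, `zstr_len`) are hypothetical and the sketch assumes a one-byte length prefix and ASCII text.

```python
def make_lstr(text):
    """Length prefixed string: one length byte followed by the characters."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("a one-byte length prefix allows at most 255 characters")
    return bytes([len(data)]) + data


def make_zstr(text):
    """Zero terminated string: the characters followed by a zero byte."""
    data = text.encode("ascii")
    if 0 in data:
        raise ValueError("a zero terminated string cannot contain a zero byte")
    return data + b"\x00"


def lstr_len(s):
    """The length is stored up front, so no scan is needed."""
    return s[0]


def zstr_len(s):
    """The length must be found by scanning for the terminating zero."""
    n = 0
    while s[n] != 0:
        n += 1
    return n


lstr = make_lstr("This is a string.")
zstr = make_zstr("This is a string.")
```

Both encodings of this 17-character string occupy 18 bytes; the trade-off is that the length prefixed form caps the length at 255 characters while the zero terminated form must be scanned to find its length.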
Once you decide on a specific data type for your character strings, the next step is to implement various functions to process those strings. This chapter provides examples of several different string functions designed specifically for length prefixed strings. To learn about these functions and see the code that implements them, look at the following sections:
• “String Assignment” on page 832
• “String Comparison” on page 834
• “Character String Functions” on page 835
• “Substr” on page 835
• “Index” on page 838
• “Repeat” on page 840
• “Insert” on page 841
• “Delete” on page 843
• “Concatenation” on page 844
The UCR Standard Library provides a very rich set of string functions specifically designed for zero terminated strings. For a description of many of these routines, read the following sections:
• “String Functions in the UCR Standard Library” on page 845
• “StrBDel, StrBDelm” on page 846
• “Strcat, Strcatl, Strcatm, Strcatml” on page 847
• “Strchr” on page 848
• “Strcmp, Strcmpl, Stricmp, Stricmpl” on page 848
• “Strcpy, Strcpyl, Strdup, Strdupl” on page 849
• “Strdel, Strdelm” on page 850
• “Strins, Strinsl, Strinsm, Strinsml” on page 851
• “Strlen” on page 852
• “Strlwr, Strlwrm, Strupr, Struprm” on page 852
• “Strrev, Strrevm” on page 853
• “Strset, Strsetm” on page 853
• “Strspan, Strspanl, Strcspan, Strcspanl” on page 854
• “Strstr, Strstrl” on page 855
• “Strtrim, Strtrimm” on page 855
• “Other String Routines in the UCR Standard Library” on page 856
As mentioned earlier, the string instructions are quite useful for many operations beyond character string manipulation. This chapter closes with some sections describing other uses for the string instructions. See
• “Using the String Instructions on Other Data Types” on page 859
• “Multi-precision Integer Strings” on page 859
• “Dealing with Whole Arrays and Records” on page 860
The set is another abstract data type commonly found in programs today. A set is a data structure which represents membership (or lack thereof) of some group of objects. If all objects are of the same underlying base type and there is a limited number of possible objects in the set, then we can use a bit vector (array of booleans) to represent the set. The bit vector implementation is very efficient for small sets. The UCR Standard Library provides several routines to manipulate character sets and other sets with a maximum of 256 members. For more details, see
• “The Character Set Routines in the UCR Standard Library” on page 856
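The bit vector idea can be sketched as follows. This is a minimal illustration, not the UCR Standard Library's actual set layout: the `CharSet` class, its method names, and the choice of 32 consecutive bytes (256 bits, one per character code) are assumptions made for this example.

```python
class CharSet:
    """A set of 8-bit character codes stored as a 256-bit vector."""

    def __init__(self, chars=""):
        self.bits = bytearray(32)           # 32 bytes * 8 bits = 256 members
        for ch in chars:
            self.add(ch)

    @staticmethod
    def _locate(ch):
        code = ord(ch)
        if code > 255:
            raise ValueError("only character codes 0..255 are supported")
        return code >> 3, 1 << (code & 7)   # byte index, bit mask

    def add(self, ch):
        index, mask = self._locate(ch)
        self.bits[index] |= mask            # set the member's bit

    def remove(self, ch):
        index, mask = self._locate(ch)
        self.bits[index] &= ~mask & 0xFF    # clear the member's bit

    def __contains__(self, ch):
        index, mask = self._locate(ch)
        return bool(self.bits[index] & mask)

    def union(self, other):
        result = CharSet()
        result.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))
        return result

    def intersection(self, other):
        result = CharSet()
        result.bits = bytearray(a & b for a, b in zip(self.bits, other.bits))
        return result


vowels = CharSet("aeiou")
early = CharSet("abcde")
both = vowels.intersection(early)           # members: 'a' and 'e'
either = vowels.union(early)
```

Membership tests, union, and intersection reduce to a handful of bitwise operations on a fixed 32-byte block, which is what makes this representation attractive for small sets.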
15.11 Questions

1) What are the repeat prefixes used for?

2) Which string prefixes are used with the following instructions?
   a) MOVS
   b) CMPS
   c) STOS
   d) SCAS

3) Why aren’t the repeat prefixes normally used with the LODS instruction?

4) What happens to the SI, DI, and CX registers when the MOVSB instruction is executed (without a repeat prefix) and:
   a) the direction flag is set.
   b) the direction flag is clear.

5) Explain how the MOVSB and MOVSW instructions work. Describe how they affect memory and registers with and without the repeat prefix. Describe what happens when the direction flag is set and clear.

6) How do you preserve the value of the direction flag across a procedure call?

7) How can you ensure that the direction flag always contains a proper value before a string instruction without saving it inside a procedure?

8) What is the difference between the “MOVSB”, “MOVSW”, and “MOVS oprnd1,oprnd2” instructions?

9) Consider the following Pascal array definition:

        a: array [0..31] of record
               a, b, c: char;
               i, j, k: integer;
           end;

   Assuming A[0] has been initialized to some value, explain how you can use the MOVS instruction to initialize the remaining elements of A to the same value as A[0].

10) Give an example of a MOVS operation which requires the direction flag to be:
   a) clear
   b) set

11) How does the CMPS instruction operate? (what does it do, how does it affect the registers and flags, etc.)

12) Which segment contains the source string? The destination string?

13) What is the SCAS instruction used for?

14) How would you quickly initialize an array to all zeros?

15) How are the LODS and STOS instructions used to build complex string operations?

16) How would you use the SUBSTR function to extract a substring of length 6 starting at offset 3 in the StrVar variable, storing the substring in the NewStr variable?

17) What types of errors can occur when the SUBSTR function is executed?

18) Give an example demonstrating the use of each of the following string functions:
   a) INDEX
   b) REPEAT
   c) INSERT
   d) DELETE
   e) CONCAT

19) Write a short loop which multiplies each element of a single dimensional array by 10. Use the string instructions to fetch and store each array element.

20) The UCR Standard Library does not provide an STRCPYM routine. What is the routine which performs this task?

21) Suppose you are writing an “adventure game” into which the player types sentences and you want to pick out the two words “GO” and “NORTH”, if they are present, in the input line. What (non-UCR StdLib) string function appearing in this chapter would you use to search for these words? What UCR Standard Library routine would you use?

22) Explain how to perform an extended precision integer comparison using CMPS.
Pattern Matching
Chapter 16
The last chapter covered character strings and various operations on those strings. A very typical program reads a sequence of strings from the user and compares the strings to see if they match. For example, DOS’ COMMAND.COM program reads command lines from the user and compares the strings the user types to fixed strings like “COPY”, “DEL”, “RENAME”, and so on. Such commands are easy to parse because the set of allowable commands is finite and fixed. Sometimes, however, the strings you want to test for are not fixed; instead, they belong to a (possibly infinite) set of different strings. For example, if you execute the DOS command “DEL *.BAK”, MS-DOS does not attempt to delete a file named “*.BAK”. Instead, it deletes all files which match the generic pattern “*.BAK”. This, of course, is any file whose name contains four or more characters and ends with “.BAK”. In the MS-DOS world, characters like “*” and “?” are called wildcards; wildcard characters simply provide a way to specify different names via patterns. DOS’ wildcard characters are very limited forms of what are known as regular expressions; regular expressions are very limited forms of patterns in general. This chapter describes how to create patterns that match a variety of character strings and write pattern matching routines to see if a particular string matches a given pattern.
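To make the wildcard idea concrete, here is a small Python sketch using the standard `fnmatch` module. It only approximates DOS behavior (real DOS matching has extra rules for the 8.3 name and extension fields), and the file names are made up for the example.

```python
import fnmatch
import re

names = ["REPORT.BAK", "A.BAK", "NOTES.TXT", "BAKERY.DOC"]

# "*" stands for any run of characters, so "*.BAK" selects names ending in ".BAK".
star_matches = [n for n in names if fnmatch.fnmatchcase(n, "*.BAK")]

# "?" stands for exactly one character.
one_char = [n for n in names if fnmatch.fnmatchcase(n, "?.BAK")]

# A wildcard pattern is a restricted regular expression: fnmatch.translate
# rewrites the wildcard pattern as an equivalent regular expression.
regex = re.compile(fnmatch.translate("*.BAK"))
regex_matches = [n for n in names if regex.match(n)]
```

Here `star_matches` and `regex_matches` both come out as `["REPORT.BAK", "A.BAK"]`, while `one_char` contains only `"A.BAK"`.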
16.1
An Introduction to Formal Language (Automata) Theory
Pattern matching, despite its low-key coverage, is a very important topic in computer science. Indeed, pattern matching is the main programming paradigm in several programming languages like Prolog, SNOBOL4, and Icon. Several programs you use all the time employ pattern matching as a major part of their work. MASM, for example, uses pattern matching to determine if symbols are correctly formed, expressions are proper, and so on. Compilers for high level languages like Pascal and C also make heavy use of pattern matching to parse source files to determine if they are syntactically correct. Surprisingly enough, an important statement known as Church’s Hypothesis suggests that any computable function can be programmed as a pattern matching problem (see footnote 1). Of course, there is no guarantee that the solution would be efficient (it usually is not), but you could arrive at a correct solution. You probably wouldn’t need to know about Turing machines (the subject of Church’s Hypothesis) if you’re interested in writing, say, an accounts receivable package. However, there are many situations where you may want to introduce the ability to match some generic patterns, so understanding some of the theory of pattern matching is important. This area of computer science goes by the stuffy names of formal language theory and automata theory. Courses in these subjects are often less than popular because they involve a lot of proofs, mathematics, and, well, theory. However, the concepts behind the proofs are quite simple and very useful. In this chapter we will not bother trying to prove everything about pattern matching. Instead, we will accept the fact that this stuff really works and just apply it. Nonetheless, we do have to discuss some of the results from automata theory, so without further ado…
16.1.1
Machines vs. Languages
You will find references to the term “machine” throughout automata theory literature. This term does not refer to some particular computer on which a program executes. Instead, it usually denotes some function that reads a string of symbols as input and produces one of two outputs: match or failure. A typical machine (or automaton) divides all possible strings into two sets: those strings that it accepts (or matches) and those strings that it rejects. The language accepted by this machine is the set of all strings that the machine accepts. Note that this language could be infinite, finite, or the empty set (i.e., the machine rejects all input strings).

1. Actually, Church’s Hypothesis claims that any computable function can be computed on a Turing machine. However, the Turing machine is the ultimate pattern matching machine.

Note that an infinite language does not suggest that the machine accepts all strings. It is quite possible for the machine to accept an infinite number of strings and reject an even greater number of strings. For example, it would be very easy to design a function which accepts all strings whose length is a multiple of three. This function accepts an infinite number of strings (since there are an infinite number of strings whose length is a multiple of three) yet it rejects the strings of two out of every three possible lengths. This is a very easy function to write. Consider the following 80x86 program that accepts all strings whose length is a multiple of three (we’ll assume that the carriage return character terminates a string):

MatchLen3       proc    near
                getc                            ;Get character #1.
                cmp     al, cr                  ;Zero chars if EOLN.
                je      Accept
                getc                            ;Get character #2.
                cmp     al, cr
                je      Failure
                getc                            ;Get character #3.
                cmp     al, cr
                jne     MatchLen3
Failure:        mov     ax, 0                   ;Return zero to denote failure.
                ret

Accept:         mov     ax, 1                   ;Return one to denote success.
                ret
MatchLen3       endp
By tracing through this code, you should be able to easily convince yourself that it returns one in ax if it succeeds (reads a string whose length is a multiple of three) and zero otherwise.

Machines are inherently recognizers. The machine itself is the embodiment of a pattern. It recognizes any input string which matches the built-in pattern. Therefore, a codification of these automatons is the basic job of the programmer who wants to match some patterns.

There are many different classes of machines and the languages they recognize. From simple to complex, the major classifications are deterministic finite state automata (which are equivalent to nondeterministic finite state automata), deterministic push down automata, nondeterministic push down automata, and Turing machines. Each successive machine in this list provides a superset of the capabilities of the machines appearing before it. The only reason we don’t use Turing machines for everything is because they are more complex to program than, say, a deterministic finite state automaton. If you can match the pattern you want using a deterministic finite state automaton, you’ll probably want to code it that way rather than as a Turing machine.

Each class of machine has a class of languages associated with it. Deterministic and nondeterministic finite state automata recognize the regular languages. Nondeterministic push down automata recognize the context free languages (see footnote 2). Turing machines can recognize all recognizable languages. We will discuss each of these sets of languages, and their properties, in turn.
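The MatchLen3 recognizer can also be read as a deterministic finite state automaton with three states. The Python sketch below expresses it as an explicit transition table; the names `TRANSITIONS`, `ACCEPTING`, and `match_len3` are chosen for this illustration, and the input is taken as a Python string rather than read with getc up to a carriage return.

```python
# State k means "the number of characters read so far is k modulo 3".
TRANSITIONS = {0: 1, 1: 2, 2: 0}   # every input symbol advances the state
START = 0
ACCEPTING = {0}                    # accept only when the length is 0 mod 3


def match_len3(line):
    """Return 1 if len(line) is a multiple of three, else 0 (like ax)."""
    state = START
    for _symbol in line:           # the symbol itself is irrelevant here
        state = TRANSITIONS[state]
    return 1 if state in ACCEPTING else 0
```

As in the assembly version, the empty string is accepted (zero is a multiple of three), and the machine needs no memory beyond its current state, which is what makes it a finite state automaton.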
16.1.2 Regular Languages

The regular languages are the least complex of the languages described in the previous section. That does not mean they are less useful; in fact, patterns based on regular expressions are probably more common than any other.
2. Deterministic push down automata recognize only a subset of the context free languages.
16.1.2.1 Regular Expressions

The most compact way to specify the strings that belong to a regular language is with a regular expression. We shall define, recursively, a regular expression with the following rules:

• ∅ (the empty set) is a regular expression and denotes the empty set.
• ε is a regular expression[3]. It denotes the language containing only the empty string: {ε}.
• Any single symbol, a, is a regular expression (we will use lower case characters to denote arbitrary symbols). This single symbol matches exactly one character in the input string; that character must be equal to the single symbol in the regular expression. For example, the pattern “m” matches a single “m” character in the input string.
Note that ∅ and ε are not the same. The empty set is a regular language that does not accept any strings, including strings of length zero. If a regular language is denoted by {ε}, then it accepts exactly one string, the string of length zero. This latter regular language accepts something, the former does not.

The three rules above provide our basis for a recursive definition. Now we will define regular expressions recursively. In the following definitions, assume that r, s, and t are any valid regular expressions.

• Concatenation. If r and s are regular expressions, so is rs. The regular expression rs matches any string that begins with a string matched by r and ends with a string matched by s.
• Alternation/Union. If r and s are regular expressions, so is r | s (read this as r or s). This is equivalent to r ∪ s (read as r union s). This regular expression matches any string that r or s matches.
• Intersection. If r and s are regular expressions, so is r ∩ s. This is the set of all strings that both r and s match.
• Kleene Star. If r is a regular expression, so is r*. This regular expression matches zero or more occurrences of r. That is, it matches ε, r, rr, rrr, rrrr, ...
• Difference. If r and s are regular expressions, so is r-s. This denotes the set of strings matched by r that are not also matched by s.
• Precedence. If r is a regular expression, so is (r). This matches any string matched by r alone. The normal algebraic associative and distributive laws apply here, so (r | s) t is equivalent to rt | st.
These operators follow the normal associative and distributive laws and exhibit the following precedences:

Highest:    (r)
            Kleene Star
            Concatenation
            Intersection
            Difference
Lowest:     Alternation/Union

Examples:

    (r | s) t = rt | st
    rs* = r(s*)
    r ∪ t - s = r ∪ (t - s)
    r ∩ t - s = (r ∩ t) - s
Generally, we’ll use parentheses to avoid any ambiguity.

Although this definition is sufficient for an automata theory class, there are some practical aspects to this definition that leave a little to be desired. For example, to define a regular expression that matches a single alphabetic character, you would need to create something like (a | b | c | … | y | z). Quite a lot of typing for such a trivial character set. Therefore, we shall add some notation to make it easier to specify regular expressions.

• Character Sets. Any set of characters surrounded by brackets, e.g., [abcdefg], is a regular expression and matches a single character from that set. You can specify ranges of characters using a dash, e.g., “[a-z]” denotes the set of lower case characters and this regular expression matches a single lower case character.
• Kleene Plus. If r is a regular expression, so is r+. This regular expression matches one or more occurrences of r. That is, it matches r, rr, rrr, rrrr, … The precedence of the Kleene Plus is the same as for the Kleene Star. Note that r+ = rr*.
• Σ represents any single character from the allowable character set. Σ* represents the set of all possible strings. The regular expression Σ*-r is the complement of r, that is, the set of all strings that r does not match.

3. The empty string is the string of length zero, containing no symbols.
With the notational baggage out of the way, it’s time to discuss how to actually use regular expressions as pattern matching specifications. The following examples should give a suitable introduction.

Identifiers:
Most programming languages like Pascal or C/C++ specify legal forms for identifiers using a regular expression. Expressed in English terms, the specification is something like “An identifier must begin with an alphabetic character and is followed by zero or more alphanumeric or underscore characters.” Using the regular expression (RE) syntax described in this section, an identifier is

    [a-zA-Z][a-zA-Z0-9_]*

Integer Consts:
A regular expression for integer constants is relatively easy to design. An integer constant consists of an optional plus or minus followed by one or more digits. The RE is

    (+ | - | ε ) [0-9]+

Note the use of the empty string (ε) to make the plus or minus optional.

Real Consts:
Real constants are a bit more complex, but still easy to specify using REs. Our definition matches that for a real constant appearing in a Pascal program: an optional plus or minus, followed by one or more digits; optionally followed by a decimal point and zero or more digits; optionally followed by an “e” or an “E” with an optional sign and one or more digits:

    (+ | - | ε ) [0-9]+ ( “.” [0-9]* | ε ) (((e | E) (+ | - | ε ) [0-9]+) | ε )

Since this RE is relatively complex, we should dissect it piece by piece. The first parenthetical term gives us the optional sign. One or more digits are mandatory before the decimal point; the second term provides this. The third term allows an optional decimal point followed by zero or more digits. The last term provides for an optional exponent consisting of “e” or “E” followed by an optional sign and one or more digits.

Reserved Words:
It is very easy to provide a regular expression that matches a set of reserved words. For example, if you want to create a regular expression that matches MASM’s reserved words, you could use an RE similar to the following:

    ( mov | add | and | … | mul )

Even:
The regular expression ( ΣΣ )* matches all strings whose length is a multiple of two.

Sentences:
The regular expression

    (Σ* “ ”* )* run ( “ ”+ ( Σ* “ ”+ | ε )) fast (“ ” Σ* )*

matches all strings that contain the separate words “run” followed by “fast” somewhere on the line. This matches strings like “I want to run very fast” and “run as fast as you can” as well as “run fast.”

[Figure 16.1: NFA for the regular expression (+ | - | ε ) [0-9]+ ( “.” [0-9]* | ε ) (((e | E) (+ | - | ε ) [0-9]+) | ε ). States 0 through 8; state 8 is the accepting state. Edges: 0→1 on + | - | ε; 1→2 on 0-9; 2→2 on 0-9; 2→3 on “.”; 2→4 on ε; 3→3 on 0-9; 3→4 on ε; 4→5 on e | E; 4→8 on ε; 5→6 on + | - | ε; 6→7 on 0-9; 7→7 on 0-9; 7→8 on ε.]

While REs are convenient for specifying the pattern you want to recognize, they are not particularly useful for creating programs (i.e., “machines”) that actually recognize such patterns. Instead, you should first convert an RE to a nondeterministic finite state automaton, or NFA. It is very easy to convert an NFA into an 80x86 assembly language program; however, such programs are rarely as efficient as they might be. If efficiency is a big concern, you can convert the NFA into a deterministic finite state automaton (DFA) that is also easy to convert to 80x86 assembly code, and the resulting code is usually far more efficient.
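The example expressions in this section can be tried out with an existing regular expression engine. The sketch below uses Python's re module, which is an assumption of this example and not something the text relies on; the names IDENTIFIER, INTEGER, REAL, EVEN, and matches are hypothetical. The empty alternative in a group such as (\+|-|) plays the role of ε, and the dot with re.DOTALL stands in for Σ.

```python
import re

# Identifiers: [a-zA-Z][a-zA-Z0-9_]*
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")

# Integer constants: (+ | - | ε) [0-9]+
INTEGER = re.compile(r"(\+|-|)[0-9]+")

# Real constants: sign, digits, optional fraction, optional exponent
REAL = re.compile(r"(\+|-|)[0-9]+(\.[0-9]*|)(((e|E)(\+|-|)[0-9]+)|)")

# Even: (ΣΣ)*, with "." under DOTALL standing in for Σ
EVEN = re.compile(r"(..)*", re.DOTALL)


def matches(pattern, text):
    """True if the whole of `text` belongs to the language of `pattern`."""
    return pattern.fullmatch(text) is not None
```

fullmatch is used rather than search because a regular expression in this chapter describes complete strings, not substrings.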
16.1.2.2 Nondeterministic Finite State Automata (NFAs)

An NFA is a directed graph with state numbers associated with each node and characters or character strings associated with each edge of the graph. A distinguished state, the starting state, determines where the machine begins attempting to match an input string. With the machine in the starting state, it compares input characters against the characters or strings on each edge of the graph. If a set of input characters matches one of the edges, the machine can change states from the node at the start of the edge (the tail) to the state at the end of the edge (the head).

Certain other states, known as final or accepting states, are usually present as well. If a machine winds up in a final state after exhausting all the input characters, then that machine accepts or matches that string. If the machine exhausts the input and winds up in a state that is not a final state, then that machine rejects the string. Figure 16.1 shows an example NFA for the floating point RE presented earlier.

By convention, we’ll always assume that the starting state is state zero. We will denote final states (there may be more than one) by using a double circle for the state (state eight is the final state above).

An NFA always begins with an input string in the starting state (state zero). On each edge coming out of a state there is either ε, a single character, or a character string. To help unclutter the NFA diagrams, we will allow expressions of the form “ xxx | yyy | zzz | …” where xxx, yyy, and zzz are ε, a single character, or a character string. This corresponds to multiple edges from one state to the other with a single item on each edge. In the example above,

    0 --( + | - | ε )--> 1

is equivalent to

    0 --( + )--> 1
    0 --( - )--> 1
    0 --( ε )--> 1
Likewise, we will allow sets of characters, specified by a string of the form x-y, to denote the expression x | x+1 | x+2 | … | y.

Note that an NFA accepts a string if there is some path from the starting state to an accepting state that exhausts the input string. There may be multiple paths from the starting state to various final states. Furthermore, there may be some particular path from the starting state to a non-accepting state that exhausts the input string. This does not necessarily mean the NFA rejects that string; if there is some other path from the starting state to an accepting state, then the NFA accepts the string. An NFA rejects a string only if there are no paths from the starting state to an accepting state that exhaust the string.

Passing through an accepting state does not cause the NFA to accept a string. You must wind up in a final state and exhaust the input string.

To process an input string with an NFA, begin at the starting state. The edges leading out of the starting state will have a character, a string, or ε associated with them. If you choose to move from one state to another along an edge with a single character, then remove that character from the input string and move to the new state along the edge traversed by that character. Likewise, if you choose to move along an edge with a character string, remove that character string from the input string and switch to the new state. If there is an edge with the empty string, ε, then you may elect to move to the new state given by that edge without removing any characters from the input string.

Consider the string “1.25e2” and the NFA in Figure 16.1. From the starting state we can move to state one using the ε string (there is no leading plus or minus, so ε is our only option). From state one we can move to state two by matching the “1” in our input string with the set 0-9; this eats the “1” in our input string, leaving “.25e2”. In state two we move to state three and eat the period from the input string, leaving “25e2”. State three loops on itself with numeric input characters, so we eat the “2” and “5” characters at the beginning of our input string and wind up back in state three with a new input string of “e2”. The next input character is “e”, but there is no edge coming out of state three with an “e” on it; there is, however, an ε-edge, so we can use that to move to state four. This move does not change the input string. In state four we can move to state five on an “e” character. This eats the “e” and leaves us with an input string of “2”. Since this is not a plus or minus character, we have to move from state five to state six on the ε edge. Movement from state six to state seven eats the last character in our string. Since the string is empty (and, in particular, it does not contain any digits), state seven cannot loop back on itself. We are currently in state seven (which is not a final state) and our input string is exhausted. However, we can move to state eight (the accepting state) since the transition between states seven and eight is an ε edge. Since we are in a final state and we’ve exhausted the input string, this NFA accepts the input string.
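The walk-through above can be mechanized. The following Python sketch simulates the real-constant NFA by tracking the set of all states the machine could be in, which is a different technique from guessing a single path by hand. The edge list is an assumption: it is reconstructed from the regular expression and the trace of “1.25e2”, and the names EDGES, eps_closure, and nfa_accepts are hypothetical.

```python
DIGITS = set("0123456789")
EPS = None   # label used for ε edges in this sketch

# (from state, label, to state); a label is a set of characters or EPS
EDGES = [
    (0, set("+-"), 1), (0, EPS, 1),
    (1, DIGITS, 2),
    (2, DIGITS, 2), (2, {"."}, 3), (2, EPS, 4),
    (3, DIGITS, 3), (3, EPS, 4),
    (4, set("eE"), 5), (4, EPS, 8),
    (5, set("+-"), 6), (5, EPS, 6),
    (6, DIGITS, 7),
    (7, DIGITS, 7), (7, EPS, 8),
]
FINAL = {8}


def eps_closure(states):
    """All states reachable from `states` using only ε edges."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for src, label, dst in EDGES:
            if src == s and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def nfa_accepts(text):
    """Accept if some path exhausts `text` and ends in a final state."""
    current = eps_closure({0})
    for ch in text:
        moved = {dst for src, label, dst in EDGES
                 if src in current and label is not EPS and ch in label}
        current = eps_closure(moved)
    return bool(current & FINAL)
```

Tracking every possible state at once means no path is ever missed, at the cost of doing more work per character than a single lucky guess would need.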
16.1.2.3 Converting Regular Expressions to NFAs

If you have a regular expression and you want to build a machine that recognizes strings in the regular language specified by that expression, you will need to convert the RE to an NFA. It turns out to be very easy to convert a regular expression to an NFA. To do so, just apply the following rules:

• The NFA representing the regular language denoted by the regular expression ∅ (the empty set) is a single, non-accepting state.
• If a regular expression contains an ε, a single character, or a string, create two states and draw an arc between them with ε, the single character, or the string as the label. For example, the RE “a” is converted to an NFA as

    [Diagram: two states joined by a single edge labelled a.]

• Let the symbol [Diagram: NFA symbol] denote an NFA which recognizes some regular language specified by some regular expression r, s, or t. If a regular expression takes the form rs then the corresponding NFA is

    [Diagram: the NFA for r joined to the NFA for s by an ε edge.]

• If a regular expression takes the form r | s, then the corresponding NFA is

    [Diagram: the NFAs for r and s side by side, connected by four ε edges.]

• If a regular expression takes the form r* then the corresponding NFA is

    [Diagram: the NFA for r with two ε edges.]

All of the other forms of regular expressions are easily synthesized from these; therefore, converting those other forms of regular expressions to NFAs is a simple two-step process: convert the RE to one of these forms, and then convert this form to the NFA. For example, to convert r+ to an NFA, you would first convert r+ to rr*. This produces the NFA:

    [Diagram: the NFA for r followed by the NFA for r*, with three ε edges.]
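The following example converts the regular expression for an integer constant to an NFA. The first step is to create an NFA for the regular expression (+ | - | ε ). The complete construction becomes

    [Diagram: three parallel branches labelled +, -, and ε, each connected to a common start state and a common end state by ε edges.]

Although we can obviously optimize this to

    [Diagram: a single edge labelled + | - | ε.]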
The next step is to handle the [0-9]+ regular expression; after some minor optimization, this becomes the NFA

    [Diagram: an edge labelled 0-9 leading to a state that loops on itself on 0-9.]

Now we simply concatenate the results to produce:

    [Diagram: an edge labelled + | - | ε, then an ε edge, then an edge labelled 0-9 leading to a state that loops on itself on 0-9.]

All we need now are starting and final states. The starting state is always the first state of the NFA created by the conversion of the leftmost item in the regular expression. The final state is always the last state of the NFA created by the conversion of the rightmost item in the regular expression. Therefore, the complete NFA for integer constants (after optimizing out the middle edge above, which serves no purpose) is

    0 --( + | - | ε )--> 1 --( 0-9 )--> 2      (state 2 loops on itself on 0-9 and is the final state)
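The conversion rules of this section can be sketched as a handful of small functions, each of which returns an NFA fragment as a (start state, final state, edge list) triple. This is a Python sketch of the standard textbook (Thompson-style) construction, so it may use more ε edges than the diagrams in this section; all of the names (symbol, concat, alt, star, accepts, INTEGER) are hypothetical.

```python
import itertools

_ids = itertools.count()          # source of fresh state numbers


def symbol(chars):
    """Two states joined by one edge; chars=None means an ε edge."""
    a, b = next(_ids), next(_ids)
    return a, b, [(a, chars, b)]


def concat(r, s):
    """rs: join the end of r to the start of s with an ε edge."""
    (ra, rb, r_edges), (sa, sb, s_edges) = r, s
    return ra, sb, r_edges + s_edges + [(rb, None, sa)]


def alt(r, s):
    """r | s: new start and end states, four ε edges."""
    (ra, rb, r_edges), (sa, sb, s_edges) = r, s
    a, b = next(_ids), next(_ids)
    return a, b, r_edges + s_edges + [
        (a, None, ra), (a, None, sa), (rb, None, b), (sb, None, b)]


def star(r):
    """r*: allow skipping r entirely or repeating it."""
    ra, rb, r_edges = r
    a, b = next(_ids), next(_ids)
    return a, b, r_edges + [
        (a, None, ra), (rb, None, b), (a, None, b), (rb, None, ra)]


def accepts(nfa, text):
    """Simulate the fragment by tracking the set of reachable states."""
    start, final, edges = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for src, label, dst in edges:
                if src == s and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in text:
        current = closure({dst for src, label, dst in edges
                           if src in current and label is not None
                           and ch in label})
    return final in current


DIGITS = "0123456789"
# (+ | - | ε) [0-9]+, with [0-9]+ built as [0-9][0-9]*
INTEGER = concat(alt(alt(symbol("+"), symbol("-")), symbol(None)),
                 concat(symbol(DIGITS), star(symbol(DIGITS))))
```

Each call to symbol creates fresh states, so the two digit fragments used for r and r* do not share states; the unoptimized result has many more states than the three-state NFA shown above but recognizes the same strings.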
16.1.2.4 Converting an NFA to Assembly Language

There is only one major problem with converting an NFA to an appropriate matching function: NFAs are nondeterministic. If you’re in some state and you’ve got some input character, say “a”, there is no guarantee that the NFA will tell you what to do next. For example, there is no requirement that edges coming out of a state have unique labels. You could have two or more edges coming out of a state, all leading to different states on the single character “a”. If an NFA accepts a string, it only guarantees that there is some path that leads to an accepting state; there is no guarantee that this path will be easy to find.

The primary technique you will use to resolve the nondeterministic behavior of an NFA is backtracking. A function that attempts to match a pattern using an NFA begins in the starting state and tries to match the first character(s) of the input string against the edges leaving the starting state. If there is only one match, the code must follow that edge. However, if there are two possible edges to follow, then the code must arbitrarily choose one of them and remember the others as well as the current point in the input string. Later, if it turns out the algorithm guessed an incorrect edge to follow, it can go back and try one of the other alternatives (i.e., it backtracks and tries a different path). If the algorithm exhausts all alternatives without winding up in a final state (with an empty input string), then the NFA does not accept the string.

Probably the easiest way to implement backtracking is via procedure calls. Let us assume that a matching procedure returns the carry flag set if it succeeds (i.e., accepts a string) and returns the carry flag clear if it fails (i.e., rejects a string). If an NFA offers multiple choices, you could implement that portion of the NFA as follows:

    [Diagram: the NFAs for r, s, and t in parallel, each entered and left by an ε edge.]

AltRST      proc    near
            push    ax                  ;The purpose of these two instructions
            mov     ax, di              ; is to preserve di in case of failure.
            call    r
            jc      Success
            mov     di, ax              ;Restore di (it may be modified by r).
            call    s
            jc      Success
            mov     di, ax              ;Restore di (it may be modified by s).
            call    t
Success:    pop     ax                  ;Restore ax.
            ret
AltRST      endp
If the r matching procedure succeeds, there is no need to try s and t. On the other hand, if r fails, then we need to try s. Likewise, if r and s both fail, we need to try t. AltRST will fail only if r, s, and t all fail. This code assumes that es:di points at the input string to match. On return, es:di points at the next available character in the string after a match or it points at some arbitrary point if the match fails. This code assumes that r, s, and t all preserve the ax register, so it preserves a pointer to the current point in the input string in ax in the event r or s fail.

Handling the individual NFAs associated with simple regular expressions (i.e., matching ε or a single character) is not hard at all. Suppose the matching function r matches the regular expression (+ | - | ε ). The complete procedure for r is

r           proc    near
            cmp     byte ptr es:[di], '+'
            je      r_matched
            cmp     byte ptr es:[di], '-'
            jne     r_nomatch
r_matched:  inc     di
r_nomatch:  stc
            ret
r           endp
Note that there is no explicit test for ε. If ε is one of the alternatives, the function attempts to match one of the other alternatives first. If none of the other alternatives succeed, then the matching function will succeed anyway, although it does not consume any input characters (which is why the above code skips over the inc di instruction if it does not match “+” or “-”). Therefore, any matching function that has ε as an alternative will always succeed.

Of course, not all matching functions succeed in every case. Suppose the s matching function accepts a single decimal digit. The code for s might be the following:

s           proc    near
            cmp     byte ptr es:[di], '0'
            jb      s_fails
            cmp     byte ptr es:[di], '9'
            ja      s_fails
            inc     di
            stc
            ret

s_fails:    clc
            ret
s           endp
If an NFA takes the form:

    r --( x )--> s

where x is any arbitrary character or string or ε, the corresponding assembly code for this procedure would be

ConcatRxS   proc    near
            call    r
            jnc     CRxS_Fail           ;If no r, we won't succeed

; Note, if x=ε then simply delete the following three statements.
; If x is a string rather than a single character, put the additional
; code to match all the characters in the string.

            cmp     byte ptr es:[di], 'x'
            jne     CRxS_Fail
            inc     di

            call    s
            jnc     CRxS_Fail
            stc                         ;Success!
            ret

CRxS_Fail:  clc
            ret
ConcatRxS   endp
If the regular expression is of the form r* and the corresponding NFA is of the form

    [Diagram: the NFA for r with two ε edges.]

then the corresponding 80x86 assembly code can look something like the following:

RStar       proc    near
            call    r
            jc      RStar
            stc
            ret
RStar       endp
Regular expressions based on the Kleene star always succeed since they allow zero or more occurrences. That is why this code always returns with the carry flag set.

The Kleene Plus operation is only slightly more complex. The corresponding (slightly optimized) assembly code is

RPlus       proc    near
            call    r
            jnc     RPlus_Fail
RPlusLp:    call    r
            jc      RPlusLp
            stc
            ret

RPlus_Fail: clc
            ret
RPlus       endp
Note how this routine fails if there isn’t at least one occurrence of r.

A major problem with backtracking is that it is potentially inefficient. It is very easy to create a regular expression that, when converted to an NFA and assembly code, generates considerable backtracking on certain input strings. This is further exacerbated by the fact that matching routines, if written as described above, are generally very short; so short, in fact, that the procedure calls and returns make up a significant portion of the execution time. Therefore, pattern matching in this fashion, although easy, can be slower than it has to be.

This is just a taste of how you would convert REs to NFAs to assembly language. We will not go into further detail in this chapter; not because this stuff isn’t interesting to know, but because you will rarely use these techniques in a real program. If you need high performance pattern matching you would not use nondeterministic techniques like these. If you want the ease of programming offered by the conversion of an NFA to assembly language, you still would not use this technique. Instead, the UCR Standard Library provides very powerful pattern matching facilities (which exceed the capabilities of NFAs), so you would use those instead; but more on that a little later.
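The structure of the matching procedures in this section carries over to other languages. In the Python sketch below, a matcher takes the text and a position and returns the new position on success or None on failure; this return value stands in for the carry flag and the di register. All names (char_in, epsilon, alt, concat, star, plus, integer) are hypothetical, and the sketch illustrates the idea rather than translating the assembly line by line.

```python
def char_in(chars):
    """Matcher for a single character drawn from `chars`."""
    def match(text, pos):
        if pos < len(text) and text[pos] in chars:
            return pos + 1
        return None
    return match


def epsilon(text, pos):
    """Matcher for ε: always succeeds and consumes nothing."""
    return pos


def alt(*matchers):
    """Try each alternative from the same saved position (cf. AltRST)."""
    def match(text, pos):
        for m in matchers:
            end = m(text, pos)       # every attempt restarts at pos
            if end is not None:
                return end
        return None
    return match


def concat(*matchers):
    """Run the matchers one after another (cf. ConcatRxS)."""
    def match(text, pos):
        for m in matchers:
            pos = m(text, pos)
            if pos is None:
                return None
        return pos
    return match


def star(m):
    """Zero or more occurrences; never fails (cf. RStar)."""
    def match(text, pos):
        while True:
            end = m(text, pos)
            if end is None or end == pos:   # stop on failure or no progress
                return pos
            pos = end
    return match


def plus(m):
    """One or more occurrences: r+ = rr* (cf. RPlus)."""
    return concat(m, star(m))


# (+ | - | ε) [0-9]+
integer = concat(alt(char_in("+-"), epsilon), plus(char_in("0123456789")))
```

For example, integer("-42", 0) returns 3, the position just past the match, while integer("+", 0) returns None. Because Python passes positions by value, there is nothing to restore after a failed alternative, which is the job the push/mov/pop sequence performs in AltRST.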
16.1.2.5 Deterministic Finite State Automata (DFAs)

Nondeterministic finite state automata, when converted to actual program code, may suffer from performance problems because of the backtracking that occurs when matching a string. Deterministic finite state automata solve this problem by comparing different strings in parallel. Whereas, in the worst case, an NFA may require n comparisons, where n is the sum of the lengths of all the strings the NFA recognizes, a DFA requires only m comparisons (worst case), where m is the length of the longest string the DFA recognizes.

For example, suppose you have an NFA that matches the following regular expression (the set of 80x86 real-mode mnemonics that begin with an “A”):

    ( AAA | AAD | AAM | AAS | ADC | ADD | AND )

A typical implementation as an NFA might look like the following:

MatchAMnem  proc    near
            strcmpl
            byte    "AAA",0
            je      matched
            strcmpl
            byte    "AAD",0
            je      matched
            strcmpl
            byte    "AAM",0
            je      matched
            strcmpl
            byte    "AAS",0
            je      matched
            strcmpl
            byte    "ADC",0
            je      matched
            strcmpl
            byte    "ADD",0
            je      matched
            strcmpl
            byte    "AND",0
            je      matched
            clc
            ret

matched:    add     di, 3
            stc
            ret
MatchAMnem  endp
If you pass this NFA a string that it doesn’t match, e.g., “AAND”, it must perform seven string comparisons, which works out to about 18 character comparisons (plus all the overhead of calling strcmpl). In fact, a DFA can determine that it does not match this character string by comparing only three characters.
[Figure 16.2: DFA for the regular expression (+ | - | ε ) [0-9]+. States 0 through 3; state 2 is the accepting state and state 3 is the failure state. Edges: 0→1 on + | -; 0→2 on 0-9; 0→3 on Σ - [0-9+-]; 1→2 on 0-9; 1→3 on Σ - [0-9]; 2→2 on 0-9; 2→3 on Σ - [0-9]; 3→3 on Σ.]

[Figure 16.3: Simplified DFA for the regular expression (+ | - | ε ) [0-9]+. States 0 through 2. Edges: 0→1 on + | -; 0→2 on 0-9; 1→2 on 0-9; 2→2 on 0-9.]

A DFA is a special form of an NFA with two restrictions. First, there must be exactly one edge coming out of each node for each of the possible input characters; this implies that there must be one edge for each possible input symbol and you may not have two edges with the same input symbol. Second, you cannot move from one state to another on the empty string, ε. A DFA is deterministic because at each state the next input symbol determines the next state you will enter. Since each input symbol has an edge associated with it, there is never a case where a DFA “jams” because you cannot leave the state on that input symbol. Similarly, the new state you enter is never ambiguous because there is only one edge leaving any particular state with the current input symbol on it.

Figure 16.2 shows the DFA that handles integer constants described by the regular expression

    (+ | - | ε ) [0-9]+

Note that an expression of the form “Σ - [0-9]” means any character except a digit; that is, the complement of the set [0-9].

State three is a failure state. It is not an accepting state and once the DFA enters a failure state, it is stuck there (i.e., it will consume all additional characters in the input string without leaving the failure state). Once you enter a failure state, the DFA has already rejected the input string. Of course, this is not the only way to reject a string; the DFA above, for example, rejects the empty string (since that leaves you in state zero) and it rejects a string containing only a “+” or a “-” character.

DFAs generally contain more states than a comparable NFA. To help keep the size of a DFA under control, we will allow a few shortcuts that in no way affect the operation of a DFA. First, we will remove the restriction that there be an edge associated with each possible input symbol leaving every state. Most of the edges leaving a particular state lead to the failure state. Therefore, our first simplification will be to allow DFAs to drop the edges that lead to a failure state. If an input symbol is not represented on an outgoing edge from some state, we will assume that it leads to a failure state. The above DFA with this simplification appears in Figure 16.3.
[Figure 16.4: DFA that recognizes AND, AAA, AAD, AAM, AAS, ADD, and ADC. States 0 through 5; state 5 is the accepting state. Edges: 0→1 on a; 1→2 on n; 1→3 on a; 1→4 on d; 2→5 on d; 3→5 on a | d | m | s; 4→5 on c | d.]

A second shortcut, which is actually present in the two examples above, is to allow sets of characters (or the alternation symbol, “|”) to associate several characters with a single edge. Finally, we will also allow strings attached to an edge. This is a shorthand notation for a list of states which recognize each successive character, i.e., the following two DFAs are equivalent:

    [Diagram: a single edge labelled abc.]

    [Diagram: three successive edges labelled a, b, and c.]

Returning to the regular expression that recognizes 80x86 real-mode mnemonics beginning with an “A”, we can construct a DFA that recognizes such strings as shown in Figure 16.4. If you trace through this DFA by hand on several accepting and rejecting strings, you will discover that it requires no more than six character comparisons to determine whether the DFA should accept or reject an input string.

Although we are not going to discuss the specifics here, it turns out that regular expressions, NFAs, and DFAs are all equivalent. That is, you can convert any one of these to the others. In particular, you can always convert an NFA to a DFA. Although the conversion isn’t totally trivial, especially if you want an optimized DFA, it is always possible to do so. Converting between all these forms is beginning to leave the scope of this text. If you are interested in the details, any text on formal languages or automata theory will fill you in.
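The simplified DFA for the “A” mnemonics can be sketched as a dictionary of outgoing edges per state, where a missing edge plays the role of the implicit failure state. This is a Python sketch under the assumption that the edges are those of Figure 16.4 with upper case input; the names DFA, ACCEPTING, and dfa_accepts are hypothetical.

```python
# One dictionary per state: input character -> next state.
DFA = {
    0: {"A": 1},
    1: {"N": 2, "A": 3, "D": 4},
    2: {"D": 5},
    3: {"A": 5, "D": 5, "M": 5, "S": 5},
    4: {"C": 5, "D": 5},
    5: {},
}
ACCEPTING = {5}


def dfa_accepts(text):
    """Follow exactly one edge per input character; no backtracking."""
    state = 0
    for ch in text:
        state = DFA[state].get(ch)
        if state is None:            # dropped edge = implicit failure state
            return False
    return state in ACCEPTING
```

On the input “AAND” the loop stops at the third character, since state 3 has no edge for “N”, which matches the observation that a DFA rejects this string after examining only three characters.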
16.1.2.6 Converting a DFA to Assembly Language

It is relatively straightforward to convert a DFA to a sequence of assembly instructions. For example, the assembly code for the DFA that accepts the A-mnemonics in the previous section is

DFA_A_Mnem  proc    near
            cmp     byte ptr es:[di], 'A'
            jne     Fail
            cmp     byte ptr es:[di+1], 'A'
            je      DoAA
            cmp     byte ptr es:[di+1], 'D'
            je      DoAD
            cmp     byte ptr es:[di+1], 'N'
            je      DoAN
Fail:       clc
            ret

DoAN:       cmp     byte ptr es:[di+2], 'D'
            jne     Fail
Succeed:    add     di, 3
            stc
            ret

DoAD:       cmp     byte ptr es:[di+2], 'D'
            je      Succeed
            cmp     byte ptr es:[di+2], 'C'
            je      Succeed
            clc                         ;Return Failure
            ret

DoAA:       cmp     byte ptr es:[di+2], 'A'
            je      Succeed
            cmp     byte ptr es:[di+2], 'D'
            je      Succeed
            cmp     byte ptr es:[di+2], 'M'
            je      Succeed
            cmp     byte ptr es:[di+2], 'S'
            je      Succeed
            clc
            ret
DFA_A_Mnem  endp
Although this scheme works and is considerably more efficient than the coding scheme for NFAs, writing this code can be tedious, especially when converting a large DFA to assembly code. There is a technique that makes converting DFAs to assembly code almost trivial, although it can consume quite a bit of space: the use of state machines. A simple state machine is a two dimensional array. The columns are indexed by the possible characters in the input string and the rows are indexed by state number (i.e., the states in the DFA). Each element of the array is a new state number. The algorithm to match a given string using a state machine is trivial; it is

    state := 0;
    while (another input character) do begin
        ch := next input character;
        state := StateTable [state][ch];
    end;
    if (state in FinalStates) then accept
    else reject;

FinalStates is a set of accepting states. If the current state number is in this set after the algorithm exhausts the characters in the string, then the state machine accepts the string, otherwise it rejects the string.
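The algorithm above translates almost word for word into a runnable form. The Python sketch below fills in the table with the transitions of the “A” mnemonic DFA from the previous section. Numbering the failure state as 6 and funnelling every other character into a single “Else” column are assumptions of this sketch, as are the names STATE_TABLE, FINAL_STATES, and accepts.

```python
F = 6                    # failure state number (arbitrary choice here)
COLUMNS = "ACDMNS"       # one column per letter, plus a final "Else" column

STATE_TABLE = [
    # A  C  D  M  N  S  Else
    [1, F, F, F, F, F, F],   # state 0
    [3, F, 4, F, 2, F, F],   # state 1
    [F, F, 5, F, F, F, F],   # state 2
    [5, F, 5, 5, F, 5, F],   # state 3
    [F, 5, 5, F, F, F, F],   # state 4
    [F, F, F, F, F, F, F],   # state 5
    [F, F, F, F, F, F, F],   # state F
]
FINAL_STATES = {5}


def accepts(text):
    state = 0
    for ch in text:                      # while (another input character)
        col = COLUMNS.find(ch)
        if col < 0:
            col = len(COLUMNS)           # any other character: "Else"
        state = STATE_TABLE[state][col]  # state := StateTable[state][ch]
    return state in FINAL_STATES
```

The loop body never changes as the DFA grows; only the table does, which is the main attraction of the technique.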
The following state table corresponds to the DFA for the “A” mnemonics appearing in the previous section:
Table 62: State Machine for 80x86 “A” Instructions

DFA State    A    C    D    M    N    S    Else
0            1    F    F    F    F    F    F
1            3    F    4    F    2    F    F
2            F    F    5    F    F    F    F
3            5    F    5    5    F    5    F
4            F    5    5    F    F    F    F
5            F    F    F    F    F    F    F
F            F    F    F    F    F    F    F
State five is the only accepting state.

There is one major drawback to using this table driven scheme: the table will be quite large. This is not apparent in the table above because the column labelled “Else” hides considerable detail. In a true state table, you will need one column for each possible input character. Since there are 256 possible input characters (or at least 128 if you’re willing to stick to seven bit ASCII), the table above will have 256 columns. With only one byte per element, this works out to about 2K for this small state machine. Larger state machines could generate very large tables.

One way to reduce the size of the table at a (very) slight loss in execution speed is to classify the characters before using them as an index into a state table. By using a single 256-byte lookup table, it is easy to reduce the state machine to the table above. Consider the 256 byte lookup table that contains:

• A one at positions Base+”a” and Base+”A”,
• A two at locations Base+”c” and Base+”C”,
• A three at locations Base+”d” and Base+”D”,
• A four at locations Base+”m” and Base+”M”,
• A five at locations Base+”n” and Base+”N”,
• A six at locations Base+”s” and Base+”S”, and
• A zero everywhere else.
Now we can modify the above table to produce:
Table 63: Classified State Machine Table for 80x86 “A” Instructions

DFA State    0    1    2    3    4    5    6    7
0            6    1    6    6    6    6    6    6
1            6    3    6    4    6    2    6    6
2            6    6    6    5    6    6    6    6
3            6    5    6    5    5    6    5    6
4            6    6    5    5    6    6    6    6
5            6    6    6    6    6    6    6    6
6            6    6    6    6    6    6    6    6
The table above contains an extra column, “7”, that we will not use. The reason for adding the extra column is to make it easy to index into this two dimensional array (since the extra column lets us multiply the state number by eight rather than seven). Assuming Classify is the name of the lookup table, the following 80386 code recognizes the strings specified by this DFA:
DFA2_A_Mnem proc
            push    ebx                 ;Ptr to Classify.
            push    eax                 ;Current character.
            push    ecx                 ;Current state.
            xor     eax, eax            ;EAX := 0
            mov     ebx, eax            ;EBX := 0
            mov     ecx, eax            ;ECX (state) := 0
            lea     bx, Classify
WhileNotEOS:
            mov     al, es:[di]         ;Get next input char.
            cmp     al, 0               ;At end of string?
            je      AtEOS
            xlat                        ;Classify character.
            mov     cl, State_Tbl[eax+ecx*8] ;Get new state #.
            inc     di                  ;Move on to next char.
            jmp     WhileNotEOS

AtEOS:      cmp     cl, 5               ;In accepting state?
            stc                         ;Assume acceptance.
            je      Accept
            clc
Accept:     pop     ecx
            pop     eax
            pop     ebx
            ret
DFA2_A_Mnem endp
The nice thing about this DFA (the DFA is the combination of the classification table, the state table, and the above code) is that it is very easy to modify. To handle any other state machine (with eight or fewer character classifications) you need only modify the Classify array, the State_Tbl array, the lea bx, Classify statement, and the statements at label AtEOS that determine if the machine is in a final state. The assembly code does not get more complex as the DFA grows in size. The State_Tbl array will get larger as you add more states, but this does not affect the assembly code.

Of course, the assembly code above does assume there are exactly eight columns in the matrix. It is easy to generalize this code by inserting an appropriate imul instruction to multiply the state number by the number of columns in each row. For example, had we gone with seven columns rather than eight, the code above would be

DFA2_A_Mnem     proc
                push    ebx                     ;Ptr to Classify.
                push    eax                     ;Current character.
                push    ecx                     ;Current state.
                xor     eax, eax                ;EAX := 0
                mov     ebx, eax                ;EBX := 0
                mov     ecx, eax                ;ECX (state) := 0
                lea     bx, Classify

WhileNotEOS:    mov     al, es:[di]             ;Get next input char.
                cmp     al, 0                   ;At end of string?
                je      AtEOS
                xlat                            ;Classify character.
                imul    cx, 7
                movzx   ecx, State_Tbl[eax+ecx] ;Get new state #.
                inc     di                      ;Move on to next char.
                jmp     WhileNotEOS

AtEOS:          cmp     cl, 5                   ;In accepting state?
                stc                             ;Assume acceptance.
                je      Accept
                clc
Accept:         pop     ecx
                pop     eax
                pop     ebx
                ret
DFA2_A_Mnem     endp

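For readers who want to experiment with the table before committing to assembly, the same table-driven loop can be sketched in a high level language. The following Python fragment is only an illustration, not the book's code: the names CLASSES, STATE_TBL, and matches are hypothetical, and the character classes (1 = A, 2 = C, 3 = D, 4 = M, 5 = N, 6 = S, 0 = anything else) are an assumption inferred from the transitions in Table 63, since the Classify table itself does not appear in this section.

```python
# Assumed character classes, inferred from the transitions in Table 63.
CLASSES = {"A": 1, "C": 2, "D": 3, "M": 4, "N": 5, "S": 6}

# One row per state, one column per character class (column 7 is unused).
STATE_TBL = [
    [6, 1, 6, 6, 6, 6, 6, 6],  # state 0: start
    [6, 3, 6, 4, 6, 2, 6, 6],  # state 1: seen "A"
    [6, 6, 6, 5, 6, 6, 6, 6],  # state 2: seen "AN"
    [6, 5, 6, 5, 5, 6, 5, 6],  # state 3: seen "AA"
    [6, 6, 5, 5, 6, 6, 6, 6],  # state 4: seen "AD"
    [6, 6, 6, 6, 6, 6, 6, 6],  # state 5: accepting
    [6, 6, 6, 6, 6, 6, 6, 6],  # state 6: failure
]

ACCEPT_STATE = 5


def matches(text):
    """Run the DFA over every character, as the assembly loop does."""
    state = 0
    for ch in text:
        char_class = CLASSES.get(ch, 0)       # plays the role of xlat
        state = STATE_TBL[state][char_class]  # plays the role of the table lookup
    return state == ACCEPT_STATE
```

Like the assembly version, this sketch processes the whole string even after the machine has entered the failure state; only the two tables change when the recognized language changes.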
Although using a state table in this manner simplifies the assembly coding, it does suffer from two drawbacks. First, as mentioned earlier, it is slower. This technique has to execute all the statements in the while loop for each character it matches; and those instructions are not particularly fast ones, either. The second drawback is that you’ve got to create the state table for the state machine; that process is tedious and error prone.

If you need the absolute highest performance, you can use the state machine techniques described earlier (see “State Machines and Indirect Jumps” on page 529). The trick here is to represent each state with a short segment of code and its own one dimensional state table. Each entry in the table is the target address of the segment of code representing the next state. The following is an example of our “A Mnemonic” state machine written in this fashion. The only difference is that the zero byte is classified to value seven (zero marks the end of the string; we will use this to determine when we encounter the end of the string). The corresponding state table would be:

Table 64: Another State Machine Table for 80x86 “A” Instructions DFA

                     Character class
State      0    1    2    3    4    5    6    7
  0        6    1    6    6    6    6    6    6
  1        6    3    6    4    6    2    6    6
  2        6    6    6    5    6    6    6    6
  3        6    5    6    5    5    6    5    6
  4        6    6    5    5    6    6    6    6
  5        6    6    6    6    6    6    6    5
  6        6    6    6    6    6    6    6    6

The 80x86 code is

DFA3_A_Mnem     proc
                push    ebx
                push    eax
                push    ecx
                xor     eax, eax

                lea     ebx, Classify
State0:         mov     al, es:[di]
                xlat
                inc     di
                jmp     cseg:State0Tbl[eax*2]

State0Tbl       word    State6, State1, State6, State6
                word    State6, State6, State6, State6

State1:         mov     al, es:[di]
                xlat
                inc     di
                jmp     cseg:State1Tbl[eax*2]

State1Tbl       word    State6, State3, State6, State4
                word    State6, State2, State6, State6

State2:         mov     al, es:[di]
                xlat
                inc     di
                jmp     cseg:State2Tbl[eax*2]

State2Tbl       word    State6, State6, State6, State5
                word    State6, State6, State6, State6

State3:         mov     al, es:[di]
                xlat
                inc     di
                jmp     cseg:State3Tbl[eax*2]

State3Tbl       word    State6, State5, State6, State5
                word    State5, State6, State5, State6

State4:         mov     al, es:[di]
                xlat
                inc     di
                jmp     cseg:State4Tbl[eax*2]

State4Tbl       word    State6, State6, State5, State5
                word    State6, State6, State6, State6

State5:         mov     al, es:[di]
                cmp     al, 0
                jne     State6
                stc
                pop     ecx
                pop     eax
                pop     ebx
                ret

State6:         clc
                pop     ecx
                pop     eax
                pop     ebx
                ret
DFA3_A_Mnem     endp

There are two important features you should note about this code. First, it only executes four instructions per character comparison (fewer, on the average, than the other techniques). Second, the instant the DFA detects failure it stops processing the input characters. The other table driven DFA techniques blindly process the entire string, even after it is obvious that the machine is locked in a failure state. Also note that this code treats the accepting and failure states a little differently than the generic state table code. This code recognizes the fact that once we’re in state five it will either succeed (if EOS is the next character) or fail. Likewise, in state six this code knows better than to try searching any farther. Of course, this technique is not as easy to modify for different DFAs as a simple state table version, but it is quite a bit faster. If you’re looking for speed, this is a good way to code a DFA.
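The early-exit behavior is easy to see in a high level sketch. The Python fragment below is a loose analogue of the code above, not a translation of it: each non-final state gets its own one dimensional table, the loop stops as soon as it reaches state five or six, and state five succeeds only if the next character is the end of the string. The names STATE_TABLES and matches_fast are hypothetical, and the character classes are an assumption inferred from Table 64 (with the zero byte as class seven).

```python
EOS = "\0"

# Assumed character classes, inferred from Table 64; the zero byte is class 7.
CLASSES = {"A": 1, "C": 2, "D": 3, "M": 4, "N": 5, "S": 6, EOS: 7}

# One table per non-final state, indexed by character class.
STATE_TABLES = {
    0: [6, 1, 6, 6, 6, 6, 6, 6],
    1: [6, 3, 6, 4, 6, 2, 6, 6],
    2: [6, 6, 6, 5, 6, 6, 6, 6],
    3: [6, 5, 6, 5, 5, 6, 5, 6],
    4: [6, 6, 5, 5, 6, 6, 6, 6],
}


def matches_fast(text):
    """Stop as soon as the machine reaches state 5 (accept) or 6 (fail)."""
    data = text + EOS              # zero-terminated, like the assembly string
    state, pos = 0, 0
    while state in STATE_TABLES:   # states 5 and 6 have no table
        state = STATE_TABLES[state][CLASSES.get(data[pos], 0)]
        pos += 1
    # State 5 succeeds only if the very next character ends the string.
    return state == 5 and data[pos] == EOS
```

A string such as "XYZZY" is rejected after examining a single character, whereas the generic table-driven loop would scan all five.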
16.1.3 Context Free Languages

Context free languages provide a superset of the regular languages: if you can specify a class of patterns with a regular expression, you can express the same language using a context free grammar. In addition, you can specify many languages that are not regular using context free grammars (CFGs). Examples of languages that are context free, but not regular, include the set of all strings representing common arithmetic expressions, legal Pascal or C source files⁴, and MASM macros. Context free languages are characterized by balance and nesting. For example, arithmetic expressions have balanced sets of parentheses. High level language statements like repeat…until allow nesting and are always balanced (e.g., for every repeat there is a corresponding until statement later in the source file).

4. Actually, C and Pascal are not context free languages, but computer scientists like to treat them as though they were.

There is only a slight extension to the regular languages to handle context free languages: function calls. In a regular expression, we only allow the objects we want to match and the specific RE operators like “|”, “*”, concatenation, and so on. To extend regular languages to context free languages, we need only add recursive function calls to regular expressions. Although it would be simple to create a syntax allowing function calls within a regular expression, computer scientists use a different notation altogether for context free languages: a context free grammar.

A context free grammar contains two types of symbols: terminal symbols and nonterminal symbols. Terminal symbols are the individual characters and strings that the context free grammar matches plus the empty string, ε. Context free grammars use nonterminal symbols for function calls and definitions. In our context free grammars we will use italic characters to denote nonterminal symbols and standard characters to denote terminal symbols.

A context free grammar consists of a set of function definitions known as productions. A production takes the following form:

Function_Name → «list of terminal and nonterminal symbols»

The function name to the left of the arrow is called the left hand side of the production. The function body, which is the list of terminal and nonterminal symbols, is called the right hand side of the production. The following is a grammar for simple arithmetic expressions:

expression → expression + factor
expression → expression - factor
expression → factor
factor → factor * term
factor → factor / term
factor → term
term → IntegerConstant
term → ( expression )
IntegerConstant → digit
IntegerConstant → digit IntegerConstant
digit → 0
digit → 1
digit → 2
digit → 3
digit → 4
digit → 5
digit → 6
digit → 7
digit → 8
digit → 9

Note that you may have multiple definitions for the same function. Context free grammars behave in a non-deterministic fashion, just like NFAs. When attempting to match a string using a context free grammar, a string matches if there exists some matching function which matches the current input string.

Since it is very common to have multiple productions with identical left hand sides, we will use the alternation symbol from the regular expressions to reduce the number of lines in the grammar. The following two subgrammars are identical:

expression → expression + factor
expression → expression - factor
expression → factor

The above is equivalent to:

expression → expression + factor | expression - factor | factor

The full arithmetic grammar, using this shorthand notation, is

expression → expression + factor | expression - factor | factor
factor → factor * term | factor / term | term
term → IntegerConstant | ( expression )
IntegerConstant → digit | digit IntegerConstant
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

One of the nonterminal symbols, usually the first production in the grammar, is the starting symbol. This is roughly equivalent to the starting state in a finite state automaton. The starting symbol is the first matching function you call when you want to test some input string to see if it is a member of a context free language. In the example above, expression is the starting symbol. Much like the NFAs and DFAs recognize strings in a regular language specified by a regular expression, nondeterministic pushdown automata and deterministic pushdown automata recognize strings belonging to a context free language specified by a context free grammar. We will not go into the details of these pushdown automata (or PDAs ) here, just be aware of their existence. We can match strings directly with a grammar. For example, consider the string 7+5*(2+1)

To match this string, we begin by calling the starting symbol function, expression, using the production expression → expression + factor. The first plus sign suggests that the expression term must match “7” and the factor term must match “5*(2+1)”. Now we need to match our input string with the pattern expression + factor. To do this, we call the expression function once again, this time using the expression → factor production. This gives us the reduction:

expression ⇒ expression + factor ⇒ factor + factor

The ⇒ symbol denotes the application of a nonterminal function call (a reduction). Next, we call the factor function, using the production factor → term to yield the reduction:

expression ⇒ expression + factor ⇒ factor + factor ⇒ term + factor

Continuing, we call the term function to produce the reduction:

expression ⇒ expression + factor ⇒ factor + factor ⇒ term + factor ⇒ IntegerConstant + factor

Next, we call the IntegerConstant function to yield:

expression ⇒ expression + factor ⇒ factor + factor ⇒ term + factor ⇒ IntegerConstant + factor ⇒ 7 + factor

At this point, the first two symbols of our generated string match the first two characters of the input string, so we can remove them from the input and concentrate on the items that follow. In succession, we call the factor function to produce the reduction 7 + factor * term, and then we call factor, term, and IntegerConstant to yield 7 + 5 * term. In a similar fashion, we can reduce the term to “( expression )” and reduce expression to “2+1”. The complete derivation for this string is


expression ⇒ expression + factor
           ⇒ factor + factor
           ⇒ term + factor
           ⇒ IntegerConstant + factor
           ⇒ 7 + factor
           ⇒ 7 + factor * term
           ⇒ 7 + term * term
           ⇒ 7 + IntegerConstant * term
           ⇒ 7 + 5 * term
           ⇒ 7 + 5 * ( expression )
           ⇒ 7 + 5 * ( expression + factor )
           ⇒ 7 + 5 * ( factor + factor )
           ⇒ 7 + 5 * ( IntegerConstant + factor )
           ⇒ 7 + 5 * ( 2 + factor )
           ⇒ 7 + 5 * ( 2 + term )
           ⇒ 7 + 5 * ( 2 + IntegerConstant )
           ⇒ 7 + 5 * ( 2 + 1 )

The final reduction completes the derivation of our input string, so the string 7+5*(2+1) is in the language specified by the context free grammar.
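
A derivation like this one can be replayed mechanically. The Python sketch below is an illustration only: it stores the arithmetic grammar as a dictionary and repeatedly replaces the leftmost nonterminal with a chosen production. The names GRAMMAR and derive and the list of production indices are hypothetical, and the sketch spells out every single step, so it takes a few more steps than the abbreviated derivation shown above.

```python
# Each nonterminal maps to its list of productions (right hand sides).
GRAMMAR = {
    "expression": [["expression", "+", "factor"],
                   ["expression", "-", "factor"],
                   ["factor"]],
    "factor": [["factor", "*", "term"],
               ["factor", "/", "term"],
               ["term"]],
    "term": [["IntegerConstant"], ["(", "expression", ")"]],
    "IntegerConstant": [["digit"], ["digit", "IntegerConstant"]],
    "digit": [[d] for d in "0123456789"],
}


def derive(start, choices):
    """Expand the leftmost nonterminal once per entry in choices."""
    form = [start]
    for choice in choices:
        # Locate the leftmost symbol that is a nonterminal.
        index = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
    return "".join(form)


# Production numbers (0-based) that derive 7+5*(2+1).
CHOICES = [0, 2, 2, 0, 0, 7,        # expression + factor, down to "7"
           0, 2, 0, 0, 5,           # factor * term, down to "5"
           1, 0, 2, 2, 0, 0, 2,     # ( expression + factor ), down to "2"
           2, 0, 0, 1]              # last factor, down to "1"
```

Calling derive("expression", CHOICES) rebuilds the input string, which is another way of saying that the string belongs to the language.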
16.1.4 Eliminating Left Recursion and Left Factoring CFGs

In the next section we will discuss how to convert a CFG to an assembly language program. However, the technique we are going to use to do this conversion will require that we modify certain grammars before converting them. The arithmetic expression grammar in the previous section is a good example of such a grammar, one that is left recursive.

Left recursive grammars pose a problem for us because the way we will typically convert a production to assembly code is to call a function corresponding to a nonterminal and compare against the terminal symbols. However, we will run into trouble if we attempt to convert a production like the following using this technique:

expression → expression + factor

Such a conversion would yield some assembly code that looks roughly like the following:

expression      proc    near
                call    expression
                jnc     fail
                cmp     byte ptr es:[di], '+'
                jne     fail
                inc     di
                call    factor
                jnc     fail
                stc
                ret

Fail:           clc
                ret
expression      endp

The obvious problem with this code is that it will generate an infinite loop. Upon entering the expression function this code immediately calls expression recursively, which immediately calls expression recursively, which immediately calls expression recursively, ... Clearly, we need to resolve this problem if we are going to write any real code to match this production.

The trick to resolving left recursion is to note that if there is a production that suffers from left recursion, there must be some production with the same left hand side that is not left recursive⁵. All we need do is rewrite the left recursive call in terms of the production that does not have any left recursion. This sounds like a difficult task, but it’s actually quite easy.

To see how to eliminate left recursion, let Xi and Yj each represent any sequence of terminal or nonterminal symbols, where no Yj begins with the nonterminal A. If you have some productions of the form:

A → AX1 | AX2 | … | AXn | Y1 | Y2 | … | Ym

You will be able to translate this to an equivalent grammar without left recursion by replacing each term of the form A → Yi by A → Yi A’ and each term of the form A → AXi by A’ → Xi A’ | ε. For example, consider three of the productions from the arithmetic grammar:

expression → expression + factor
expression → expression - factor
expression → factor

In this example A corresponds to expression, X1 corresponds to “+ factor”, X2 corresponds to “- factor”, and Y1 corresponds to “factor”. The equivalent grammar without left recursion is

expression → factor E’
E’ → - factor E’
E’ → + factor E’
E’ → ε

The complete arithmetic grammar, with left recursion removed, is

expression → factor E’
E’ → + factor E’ | - factor E’ | ε
factor → term F’
F’ → * term F’ | / term F’ | ε
term → IntegerConstant | ( expression )
IntegerConstant → digit | digit IntegerConstant
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

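The rewriting rule for immediate left recursion is mechanical enough to automate. The following Python sketch is one possible rendering of that rule, under a few assumptions: each production is a list of symbols, an empty list stands for ε, and the new nonterminal is named by appending an apostrophe (so the sketch produces expression' where the text writes E’). The function name eliminate_left_recursion is hypothetical.

```python
def eliminate_left_recursion(name, productions):
    """Rewrite A -> A X1 | ... | A Xn | Y1 | ... | Ym without left recursion."""
    prime = name + "'"
    # The Xi parts: what follows the leading A in each left recursive production.
    recursive = [p[1:] for p in productions if p and p[0] == name]
    # The Yj parts: productions that do not begin with A.
    others = [p for p in productions if not p or p[0] != name]
    if not recursive:
        return {name: productions}       # nothing to do
    return {
        name: [y + [prime] for y in others],                 # A  -> Yj A'
        prime: [x + [prime] for x in recursive] + [[]],      # A' -> Xi A' | ε
    }


EXPRESSION = [["expression", "+", "factor"],
              ["expression", "-", "factor"],
              ["factor"]]
RESULT = eliminate_left_recursion("expression", EXPRESSION)
```

With the three expression productions as input, RESULT holds expression → factor expression' and expression' → + factor expression' | - factor expression' | ε, which matches the grammar above up to the name of the new nonterminal.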

Another useful transformation on a grammar is to left factor the grammar. This can reduce the need for backtracking, improving the performance of your pattern matching code. Consider the following CFG fragment:

stmt → if expression then stmt endif
stmt → if expression then stmt else stmt endif

These two productions begin with the same set of symbols. Either production will match all the characters in an if statement up to the point the matching algorithm encounters the first else or endif. If the matching algorithm processes the first statement up to the point of the endif terminal symbol and encounters the else terminal symbol instead, it must backtrack all the way to the if symbol and start over. This can be terribly inefficient because of the recursive call to stmt (imagine a 10,000 line program that has a single if statement around the entire 10,000 lines; a compiler using this pattern matching technique would have to recompile the entire program from scratch if it used backtracking in this fashion). However, by left factoring the grammar before converting it to program code, you can eliminate the need for backtracking.

To left factor a grammar, you collect all productions that have the same left hand side and begin with the same symbols on the right hand side. In the two productions above, the common symbols are “if expression then stmt”. You combine the common strings into a single production and then append a new nonterminal symbol to the end of this new production, e.g.,

stmt → if expression then stmt NewNonTerm

Finally, you create a new set of productions using this new nonterminal for each of the suffixes to the common production:

NewNonTerm → endif | else stmt endif

This eliminates backtracking because the matching algorithm can process the if, the expression, the then, and the stmt before it has to choose between endif and else.

5. If this is not the case, the grammar does not match any finite length strings.

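Left factoring can be sketched in the same style. The Python fragment below is a simplified illustration that assumes every production passed to it shares a common prefix; a general tool would first group the productions by their leading symbols. The name left_factor and the parameter new_name are hypothetical.

```python
def left_factor(name, productions, new_name):
    """Pull the longest common prefix of the productions into one production."""
    prefix = []
    for symbols in zip(*productions):       # walk the productions in lock step
        if len(set(symbols)) != 1:          # first position where they differ
            break
        prefix.append(symbols[0])
    if not prefix:
        return {name: productions}          # nothing in common, leave as is
    suffixes = [p[len(prefix):] for p in productions]
    return {name: [prefix + [new_name]], new_name: suffixes}


STMT = [["if", "expression", "then", "stmt", "endif"],
        ["if", "expression", "then", "stmt", "else", "stmt", "endif"]]
FACTORED = left_factor("stmt", STMT, "NewNonTerm")
```

For the two if productions, FACTORED contains stmt → if expression then stmt NewNonTerm and NewNonTerm → endif | else stmt endif.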
16.1.5 Converting REs to CFGs

Since the context free languages are a superset of the regular languages, it should come as no surprise that it is possible to convert regular expressions to context free grammars. Indeed, this is a very easy process involving only a few intuitive rules.

1) If a regular expression simply consists of a sequence of characters, xyz, you can easily create a production for this regular expression of the form P → xyz. This applies equally to the empty string, ε.

2) If r and s are two regular expressions that you’ve converted to CFG productions R and S, and you have a regular expression rs that you want to convert to a production, simply create a new production of the form T → R S.

3) If r and s are two regular expressions that you’ve converted to CFG productions R and S, and you have a regular expression r | s that you want to convert to a production, simply create a new production of the form T → R | S.

4) If r is a regular expression that you’ve converted to a production, R, and you want to create a production for r*, simply use the production RStar → R RStar | ε.

5) If r is a regular expression that you’ve converted to a production, R, and you want to create a production for r+, simply use the production RPlus → R RPlus | R.

6) For regular expressions there are operations with various precedences. Regular expressions also allow parentheses to override the default precedence. This notion of precedence does not carry over into CFGs. Instead, you must encode the precedence directly into the grammar. For example, to encode R S* you would probably use productions of the form:

T → R SStar
SStar → S SStar | ε

Likewise, to handle a grammar of the form (RS)* you could use productions of the form:

T → RS T | ε
RS → R S

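
The rules above translate almost line for line into a small recursive function. The Python sketch below is an illustration under stated assumptions: the regular expression has already been parsed into nested tuples such as ("cat", r, s), ("alt", r, s), ("star", r), ("plus", r), and ("lit", "xyz"); nonterminals receive generated names N0, N1, and so on; and an empty list stands for ε. The tuple format, the generated names, and the function re_to_cfg are all hypothetical.

```python
def re_to_cfg(node, prods):
    """Add productions for a regular expression tree; return its nonterminal."""
    name = "N%d" % len(prods)
    prods[name] = None                      # reserve the name before recursing
    kind = node[0]
    if kind == "lit":                       # rule 1: P -> xyz
        prods[name] = [list(node[1])]
    elif kind == "cat":                     # rule 2: T -> R S
        r = re_to_cfg(node[1], prods)
        s = re_to_cfg(node[2], prods)
        prods[name] = [[r, s]]
    elif kind == "alt":                     # rule 3: T -> R | S
        r = re_to_cfg(node[1], prods)
        s = re_to_cfg(node[2], prods)
        prods[name] = [[r], [s]]
    elif kind == "star":                    # rule 4: RStar -> R RStar | ε
        r = re_to_cfg(node[1], prods)
        prods[name] = [[r, name], []]
    elif kind == "plus":                    # rule 5: RPlus -> R RPlus | R
        r = re_to_cfg(node[1], prods)
        prods[name] = [[r, name], [r]]
    else:
        raise ValueError("unknown node type: %r" % (kind,))
    return name


# The regular expression a b*, i.e. R S* with R = a and S = b.
PRODUCTIONS = {}
START = re_to_cfg(("cat", ("lit", "a"), ("star", ("lit", "b"))), PRODUCTIONS)
```

Because the tree already records how the operators nest, precedence is encoded in the grammar automatically, which is the point of rule 6.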
16.1.6 Converting CFGs to Assembly Language

If you have removed left recursion and you’ve left factored a grammar, it is very easy to convert such a grammar to an assembly language program that recognizes strings in the context free language.

The first convention we will adopt is that es:di always points at the start of the string we want to match. The second convention we will adopt is to create a function for each nonterminal. This function returns success (carry set) if it matches an associated subpattern; it returns failure (carry clear) otherwise. If it succeeds, it leaves di pointing at the next character in the string after the matched pattern; if it fails, it preserves the value in di across the function call.

To convert a set of productions to their corresponding assembly code, we need to be able to handle four things: terminal symbols, nonterminal symbols, alternation, and the empty string. First, we will consider simple functions (nonterminals) which do not have multiple productions (i.e., alternation).

If a production takes the form T → ε and there are no other productions associated with T, then this production always succeeds. The corresponding assembly code is simply:

T               proc    near
                stc
                ret
T               endp

Of course, there is no real need to ever call T and test the returned result since we know it will always succeed. On the other hand, if T is a stub that you intend to fill in later, you should call T.

If a production takes the form T → xyz, where xyz is a string of one or more terminal symbols, then the function returns success if the next several input characters match xyz; it returns failure otherwise. Remember, if the prefix of the input string matches xyz, then the matching function must advance di beyond these characters. If the first characters of the input string do not match xyz, it must preserve di. The following routines demonstrate two cases, where xyz is a single character and where xyz is a string of characters:

T1              proc    near                    ;Single char.
                cmp     byte ptr es:[di], 'x'
                je      Success
                clc                             ;Return Failure.
                ret

Success:        inc     di                      ;Skip matched char.
                stc                             ;Return success.
                ret
T1              endp

T2              proc    near
                call    MatchPrefix
                byte    'xyz',0
                ret
T2              endp

MatchPrefix is a routine that matches the prefix of the string pointed at by es:di against the string following the call in the code stream. It returns the carry set and adjusts di if the string in the code stream is a prefix of the input string; it returns the carry flag clear and preserves di if the literal string is not a prefix of the input. The MatchPrefix code follows:

MatchPrefix     proc    far                     ;Must be far!
                push    bp
                mov     bp, sp
                push    ax
                push    ds
                push    si
                push    di

                lds     si, 2[bp]               ;Get the return address.
CmpLoop:        mov     al, ds:[si]             ;Get string to match.
                cmp     al, 0                   ;If at end of prefix,
                je      Success                 ; we succeed.
                cmp     al, es:[di]             ;See if it matches prefix,
                jne     Failure                 ; if not, immediately fail.
                inc     si
                inc     di
                jmp     CmpLoop

Success:        add     sp, 2                   ;Don't restore di.
                inc     si                      ;Skip zero terminating byte.
                mov     2[bp], si               ;Save as return address.
                pop     si
                pop     ds
                pop     ax
                pop     bp
                stc                             ;Return success.
                ret

Failure:        inc     si                      ;Need to skip to zero byte.
                cmp     byte ptr ds:[si], 0
                jne     Failure
                inc     si
                mov     2[bp], si               ;Save as return address.

                pop     di
                pop     si
                pop     ds
                pop     ax
                pop     bp
                clc                             ;Return failure.
                ret
MatchPrefix     endp

If a production takes the form T → R, where R is a nonterminal, then the T function calls R and returns whatever status R returns, e.g.,

T               proc    near
                call    R
                ret
T               endp

If the right hand side of a production contains a string of terminal and nonterminal symbols, the corresponding assembly code checks each item in turn. If any check fails, then the function returns failure. If all items succeed, then the function returns success. For example, if you have a production of the form T → R abc S you could implement this in assembly language as

T               proc    near
                push    di                      ;If we fail, must preserve di.

                call    R
                jnc     Failure
                call    MatchPrefix
                byte    "abc",0
                jnc     Failure
                call    S
                jnc     Failure
                add     sp, 2                   ;Don't preserve di if we succeed.
                stc
                ret

Failure:        pop     di
                clc
                ret
T               endp

Note how this code preserves di if it fails, but does not preserve di if it succeeds.

If you have multiple productions with the same left hand side (i.e., alternation), then writing an appropriate matching function for the productions is only slightly more complex than the single production case. If you have multiple productions associated with a single nonterminal on the left hand side, then create a sequence of code to match each of the individual productions. To combine them into a single matching function, simply write the function so that it succeeds if any one of these code sequences succeeds. If one of the productions is of the form T → ε, then test the other conditions first. If none of them could be selected, the function succeeds. For example, consider the productions:

E’ → + factor E’ | - factor E’ | ε

This translates to the following assembly code:

EPrime          proc    near
                push    di
                cmp     byte ptr es:[di], '+'
                jne     TryMinus
                inc     di
                call    factor
                jnc     EP_Failed
                call    EPrime
                jnc     EP_Failed
Success:        add     sp, 2
                stc
                ret

TryMinus:       cmp     byte ptr es:[di], '-'
                jne     EP_Failed
                inc     di
                call    factor
                jnc     EP_Failed
                call    EPrime
                jnc     EP_Failed
                add     sp, 2
                stc
                ret

EP_Failed:      pop     di
                stc                             ;Succeed because of E' -> ε
                ret
EPrime          endp

This routine always succeeds because it has the production E’ → ε. This is why the stc instruction appears after the EP_Failed label.

To invoke a pattern matching function, simply load es:di with the address of the string you want to test and call the pattern matching function. On return, the carry flag will contain one if the pattern matches the string up to the point returned in di. If you want to see if the entire string matches the pattern, simply check to see if es:di is pointing at a zero byte when you get back from the function call. If you want to see if a string belongs to a context free language, you should call the function associated with the starting symbol for the given context free grammar.

The following program implements the arithmetic grammar we’ve been using as examples throughout the past several sections. The complete implementation is

; ARITH.ASM
;
; A simple recursive descent parser for arithmetic strings.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'

; Grammar for simple arithmetic expressions (supports +, -, *, /):
;
; E  -> FE'
; E' -> + F E' | - F E' | <empty string>
; F  -> TF'
; F' -> * T F' | / T F' | <empty string>
; T  -> G | (E)
; G  -> H | H G
; H  -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

InputLine       byte    128 dup (0)

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; Matching functions for the grammar.
; These functions return the carry flag set if they match their
; respective item. They return the carry flag clear if they fail.
; If they fail, they preserve di. If they succeed, di points to
; the first character after the match.

; E -> FE'

E               proc    near
                push    di
                call    F                       ;See if F, then E', succeeds.
                jnc     E_Failed
                call    EPrime
                jnc     E_Failed
                add     sp, 2                   ;Success, don't restore di.
                stc
                ret

E_Failed:       pop     di                      ;Failure, must restore di.
                clc
                ret
E               endp

; E' -> + F E' | - F E' | ε

EPrime          proc    near
                push    di

; Try + F E' here

                cmp     byte ptr es:[di], '+'
                jne     TryMinus
                inc     di
                call    F
                jnc     EP_Failed
                call    EPrime
                jnc     EP_Failed
Success:        add     sp, 2
                stc
                ret

; Try - F E' here.

TryMinus:       cmp     byte ptr es:[di], '-'
                jne     Success
                inc     di
                call    F
                jnc     EP_Failed
                call    EPrime
                jnc     EP_Failed
                add     sp, 2
                stc
                ret

; If none of the above succeed, return success anyway because we have
; a production of the form E' -> ε.

EP_Failed:      pop     di
                stc
                ret
EPrime          endp

; F -> TF'

F               proc    near
                push    di
                call    T
                jnc     F_Failed
                call    FPrime
                jnc     F_Failed
                add     sp, 2                   ;Success, don't restore di.
                stc
                ret

F_Failed:       pop     di
                clc
                ret
F               endp

; F' -> * T F' | / T F' | ε

FPrime          proc    near
                push    di
                cmp     byte ptr es:[di], '*'   ;Start with "*"?
                jne     TryDiv
                inc     di                      ;Skip the "*".
                call    T
                jnc     FP_Failed
                call    FPrime
                jnc     FP_Failed
Success:        add     sp, 2
                stc
                ret

; Try F' -> / T F' here

TryDiv:         cmp     byte ptr es:[di], '/'   ;Start with "/"?
                jne     Success                 ;Succeed anyway.
                inc     di                      ;Skip the "/".
                call    T
                jnc     FP_Failed
                call    FPrime
                jnc     FP_Failed
                add     sp, 2
                stc
                ret

; If the above both fail, return success anyway because we've got
; a production of the form F' -> ε

FP_Failed:      pop     di
                stc
                ret
FPrime          endp

; T -> G | (E)

T               proc    near

; Try T -> G here.

                call    G
                jnc     TryParens
                ret

; Try T -> (E) here.

TryParens:      push    di                      ;Preserve if we fail.
                cmp     byte ptr es:[di], '('   ;Start with "("?
                jne     T_Failed                ;Fail if no.
                inc     di                      ;Skip "(" char.
                call    E
                jnc     T_Failed
                cmp     byte ptr es:[di], ')'   ;End with ")"?
                jne     T_Failed                ;Fail if no.
                inc     di                      ;Skip ")"
                add     sp, 2                   ;Don't restore di,
                stc                             ; we've succeeded.
                ret

T_Failed:       pop     di
                clc
                ret
T               endp

; The following is a free-form translation of
;
; G -> H | H G
; H -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
;
; This routine checks to see if there is at least one digit. It fails if
; there isn't at least one digit; it succeeds and skips over all digits
; if there are one or more digits.

G               proc    near
                cmp     byte ptr es:[di], '0'   ;Check for at least
                jb      G_Failed                ; one digit.
                cmp     byte ptr es:[di], '9'
                ja      G_Failed

DigitLoop:      inc     di                      ;Skip any remaining
                cmp     byte ptr es:[di], '0'   ; digits found.
                jb      G_Succeeds
                cmp     byte ptr es:[di], '9'
                jbe     DigitLoop
G_Succeeds:     stc
                ret

G_Failed:       clc                             ;Fail if no digits
                ret                             ; at all.
G               endp

; This main program tests the matching functions above and demonstrates
; how to call the matching functions.

Main            proc
                mov     ax, seg dseg            ;Set up the segment registers
                mov     ds, ax
                mov     es, ax

                printf
                byte    "Enter an arithmetic expression: ",0
                lesi    InputLine
                gets
                call    E
                jnc     BadExp

; Good so far, but are we at the end of the string?

                cmp     byte ptr es:[di], 0
                jne     BadExp

; Okay, it truly is a good expression at this point.

                printf
                byte    "'%s' is a valid expression",cr,lf,0
                dword   InputLine
                jmp     Quit

BadExp:         printf
                byte    "'%s' is an invalid arithmetic expression",cr,lf,0
                dword   InputLine

Quit:           ExitPgm
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main

16.1.7 Some Final Comments on CFGs

The techniques presented in this chapter for converting CFGs to assembly code do not work for all CFGs. They only work for a (large) subset of the CFGs known as LL(1) grammars. The code that these techniques produce is a recursive descent predictive parser⁶. Although the set of context free languages recognizable by an LL(1) grammar is a subset of the context free languages, it is a very large subset and you shouldn’t run into too many difficulties using this technique.

One important feature of predictive parsers is that they do not require any backtracking. If you are willing to live with the inefficiencies associated with backtracking, it is easy to extend a recursive descent parser to handle any CFG. Note that when you use backtracking, the predictive adjective goes away; you wind up with a nondeterministic system rather than a deterministic system (predictive and deterministic are very close in meaning in this case).

There are other CFG systems as well as LL(1). The so-called operator precedence and LR(k) CFGs are two examples. For more information about parsing and grammars, consult a good text on formal language theory or compiler construction (see the bibliography).

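To see the shape of a recursive descent predictive parser without the register bookkeeping, here is a Python sketch of the left-factored arithmetic grammar from the previous sections. It is an illustration, not a translation of ARITH.ASM: each function takes a string and a position and returns the position after the match, or None on failure, which stands in for the carry flag and the di register. All function names are hypothetical.

```python
def parse_digits(s, i):
    """G -> H | H G: one or more decimal digits."""
    j = i
    while j < len(s) and "0" <= s[j] <= "9":
        j += 1
    return j if j > i else None


def parse_term(s, i):
    """T -> G | ( E )"""
    j = parse_digits(s, i)
    if j is not None:
        return j
    if i < len(s) and s[i] == "(":
        j = parse_expression(s, i + 1)
        if j is not None and j < len(s) and s[j] == ")":
            return j + 1
    return None


def parse_factor_tail(s, i):
    """F' -> * T F' | / T F' | ε"""
    if i < len(s) and s[i] in "*/":
        j = parse_term(s, i + 1)
        if j is not None:
            return parse_factor_tail(s, j)
    return i                      # ε always succeeds, consuming nothing


def parse_factor(s, i):
    """F -> T F'"""
    j = parse_term(s, i)
    return None if j is None else parse_factor_tail(s, j)


def parse_expression_tail(s, i):
    """E' -> + F E' | - F E' | ε"""
    if i < len(s) and s[i] in "+-":
        j = parse_factor(s, i + 1)
        if j is not None:
            return parse_expression_tail(s, j)
    return i                      # ε always succeeds, consuming nothing


def parse_expression(s, i):
    """E -> F E'"""
    j = parse_factor(s, i)
    return None if j is None else parse_expression_tail(s, j)


def is_valid(s):
    """The whole string must be consumed, like the zero-byte check in Main."""
    return parse_expression(s, 0) == len(s)
```

Each function decides which production to apply by looking only at the next input character, which is what makes the parser predictive.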

16.1.8 Beyond Context Free Languages

Although most patterns you will probably want to process will be regular or context free, there may be times when you need to recognize certain types of patterns that are beyond these two (e.g., context sensitive languages). As it turns out, the finite state automata are the simplest machines; the pushdown automata (that recognize context free languages) are the next step up. After pushdown automata, the next step up in power is the Turing machine. However, Turing machines are equivalent in power to the 80x86⁷, so matching patterns recognized by Turing machines is no different than writing a normal program.

The key to writing functions that recognize patterns that are not context free is to maintain information in variables and use the variables to decide which of several productions you want to use at any one given time. This technique introduces context sensitivity. Such techniques are very useful in artificial intelligence programs (like natural language processing) where ambiguity resolution depends on past knowledge or the current context of a pattern matching operation. However, the uses for such types of pattern matching quickly go beyond the scope of a text on assembly language programming, so we will let some other text continue this discussion.

6. A parser is a function that determines whether a pattern belongs to a language.
7. Actually, they are more powerful, in theory, because they have an infinite amount of memory available.

16.2 The UCR Standard Library Pattern Matching Routines

The UCR Standard Library provides a very sophisticated set of pattern matching routines. They are patterned after the pattern matching facilities of SNOBOL4, support CFGs, and provide fully automatic backtracking, as necessary. Furthermore, by writing only five assembly language statements, you can match simple or complex patterns.

There is very little assembly language code to worry about when using the Standard Library’s pattern matching routines because most of the work occurs in the data segment. To use the pattern matching routines, you first construct a pattern data structure in the data segment. You then pass the address of this pattern and the string you wish to test to the Standard Library match routine. The match routine returns failure or success depending on the state of the comparison. This isn’t quite as easy as it sounds, though; learning how to construct the pattern data structure is almost like learning a new programming language. Fortunately, if you’ve followed the discussion on context free languages, learning this new “language” is a breeze.

The Standard Library pattern data structure takes the following form:

Pattern         struct
MatchFunction   dword   ?
MatchParm       dword   ?
MatchAlt        dword   ?
NextPattern     dword   ?
EndPattern      word    ?
StartPattern    word    ?
StrSeg          word    ?
Pattern         ends

The MatchFunction field contains the address of a routine to call to perform some sort of comparison. The success or failure of this function determines whether the pattern matches the input string. For example, the UCR Standard Library provides a MatchStr function that compares the next n characters of the input string against some other character string.

The MatchParm field contains the address or value of a parameter (if appropriate) for the MatchFunction routine. For example, if the MatchFunction routine is MatchStr, then the MatchParm field contains the address of the string to compare the input characters against. Likewise, the MatchChar routine compares the next input character in the string against the L.O. byte of the MatchParm field. Some matching functions do not require any parameters; they will ignore any value you assign to the MatchParm field. By convention, most programmers store a zero in unused fields of the Pattern structure.

The MatchAlt field contains either zero (NULL) or the address of some other pattern data structure. If the current pattern matches the input characters, the pattern matching routines ignore this field. However, if the current pattern fails to match the input string, then the pattern matching routines will attempt to match the pattern whose address appears in this field. If this alternate pattern returns success, then the pattern matching routine returns success to the caller; otherwise it returns failure. If the MatchAlt field contains NULL, then the pattern matching routine immediately fails if the main pattern does not match.

The Pattern data structure only matches one item. For example, it might match a single character, a single string, or a character from a set of characters. A real world pattern will probably contain several small patterns concatenated together, e.g., the pattern for a Pascal identifier consists of a single character from the set of alphabetic characters followed by zero or more characters from the set [a-zA-Z0-9_]. The NextPattern field lets you create a composite pattern as the concatenation of two individual patterns. For such a composite pattern to return success, the current pattern must match and then the pattern specified by the NextPattern field must also match. Note that you can chain as many patterns together as you please using this field.

The last three fields, EndPattern, StartPattern, and StrSeg, are for the internal use of the pattern matching routine. You should not modify or examine these fields.

Once you create a pattern, it is very easy to test a string to see if it matches that pattern. The calling sequence for the UCR Standard Library match routine is

                lesi    « Input string to match »
                ldxi    « Pattern to match string against »
                mov     cx, 0
                match
                jc      Success

The Standard Library match routine expects a pointer to the input string in the es:di registers; it expects a pointer to the pattern you want to match in the dx:si register pair. The cx register should contain the length of the string you want to test. If cx contains zero, the match routine will test the entire input string. If cx contains a nonzero value, the match routine will only test the first cx characters in the string. Note that the end of the string (the zero terminating byte) must not appear in the string before the position specified in cx. For most applications, loading cx with zero before calling match is the most appropriate operation.

On return from the match routine, the carry flag denotes success or failure. If the carry flag is set, the pattern matches the string; if the carry flag is clear, the pattern does not match the string. Unlike the examples given in earlier sections, the match routine does not modify the di register, even if the match succeeds. Instead, it returns the failure/success position in the ax register. This is the position of the first character after the match if match succeeds; it is the position of the first unmatched character if match fails.
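The way the four programmer-visible fields interact can be sketched outside assembly. The following Python model is only an illustration of the rules described above, not the library's implementation: the names Pattern, match_str, and match are hypothetical, a (success, position) tuple stands in for the carry flag and ax, and the model assumes (as the text states) that MatchAlt is consulted only when the current pattern itself fails.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# A matching function receives (text, position, parameter) and returns
# (success, position); these stand in for the carry flag and the ax register.
MatchFn = Callable[[str, int, object], Tuple[bool, int]]


@dataclass
class Pattern:
    match_function: MatchFn
    match_parm: object = 0
    match_alt: Optional["Pattern"] = None
    next_pattern: Optional["Pattern"] = None


def match_str(text: str, pos: int, parm: object) -> Tuple[bool, int]:
    """Succeed if the text continues with the string parm at pos."""
    s = str(parm)
    if text.startswith(s, pos):
        return True, pos + len(s)
    return False, pos


def match(pat: Pattern, text: str, pos: int = 0) -> Tuple[bool, int]:
    ok, end = pat.match_function(text, pos, pat.match_parm)
    if not ok:
        # Current pattern failed: try the alternate, if there is one.
        if pat.match_alt is not None:
            return match(pat.match_alt, text, pos)
        return False, pos
    # Current pattern matched: the alternate is ignored and the rest of
    # the chain (next_pattern) must match as well.
    if pat.next_pattern is not None:
        return match(pat.next_pattern, text, end)
    return True, end


# ("cat" | "dog") followed by "!"
bang = Pattern(match_str, "!")
dog = Pattern(match_str, "dog", next_pattern=bang)
cat = Pattern(match_str, "cat", match_alt=dog, next_pattern=bang)

print(match(cat, "dog! ok"))  # (True, 4)
print(match(cat, "cat?"))     # (False, 3)
print(match(cat, "cow"))      # (False, 0)
```

Note that the model, like match, only tests a prefix: "dog! ok" succeeds with the position just past the exclamation point.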
16.3 The Standard Library Pattern Matching Functions

The UCR Standard Library provides about 20 built-in pattern matching functions. These functions are based on the pattern matching facilities provided by the SNOBOL4 programming language, so they are very powerful indeed! You will probably discover that these routines solve all your pattern matching needs, although it is easy to write your own pattern matching routines (see “Designing Your Own Pattern Matching Routines” in section 16.4) if an appropriate one is not available. The following subsections describe each of these pattern matching routines in detail.

There are two things you should note if you’re using the Standard Library’s SHELL.ASM file when creating programs that use pattern matching and character sets. First, there is a line at the very beginning of the SHELL.ASM file that contains the statement “matchfuncs”. This line is currently a comment because it contains a semicolon in column one. If you are going to be using the pattern matching facilities of the UCR Standard Library, you need to uncomment this line by deleting the semicolon in column one. Second, if you are going to be using the character set facilities of the UCR Standard Library (very common when using the pattern matching facilities), you may want to uncomment the line containing “include stdsets.a” in the data segment. The “stdsets.a” file includes several common character sets, including alphabetics, digits, alphanumerics, whitespace, and so on.
16.3.1 Spancset

The spancset routine skips over all characters belonging to a character set. This routine will match zero or more characters in the specified set and, therefore, always succeeds. The MatchParm field of the pattern data structure must point at a UCR Standard Library character set variable (see “The Character Set Routines in the UCR Standard Library” on page 856).

Example:

SkipAlphas      pattern {spancset, alpha}
                 . . .
                lesi    StringWAlphas
                ldxi    SkipAlphas
                xor     cx, cx
                match

16.3.2 Brkcset

Brkcset is the dual to spancset: it matches zero or more characters in the input string which are not members of a specified character set. Another way of viewing brkcset is that it will match all characters in the input string up to a character in the specified character set (or to the end of the string). The matchparm field contains the address of the character set to match.

Example:

DoDigits        pattern {brkcset, digits, 0, DoDigits2}
DoDigits2       pattern {spancset, digits}
                 . . .
                lesi    StringWDigits
                ldxi    DoDigits
                xor     cx, cx
                match
                jnc     NoDigits

The code above skips to, and then over, the first string of digits in the input string. Note that, as described, both brkcset and spancset can match zero characters, so this pattern by itself also accepts a string that contains no digits; to insist on at least one digit, follow brkcset with anycset (section 16.3.3) before spancset.
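A small Python model may make the span/break duality concrete. It is a sketch of the behavior described in these two sections, not the library code; the function names simply mirror the matching functions, and a Python set stands in for a Standard Library character set variable. Since both operations accept zero characters, the chained pattern in the model succeeds even on a string with no digits.

```python
import string

DIGITS = frozenset(string.digits)


def spancset(text, pos, cset):
    """Skip characters that ARE in cset; zero is acceptable, so always succeed."""
    while pos < len(text) and text[pos] in cset:
        pos += 1
    return True, pos


def brkcset(text, pos, cset):
    """Skip characters that are NOT in cset, stopping at a member or at the end."""
    while pos < len(text) and text[pos] not in cset:
        pos += 1
    return True, pos


def do_digits(text):
    """brkcset followed by spancset, like the DoDigits / DoDigits2 chain."""
    _, start = brkcset(text, 0, DIGITS)
    ok, end = spancset(text, start, DIGITS)
    return ok, start, end


print(do_digits("abc123def"))  # (True, 3, 6)
print(do_digits("no digits"))  # (True, 9, 9)
```

In the first call the digits occupy positions 3 through 5; in the second, brkcset runs to the end of the string and spancset then matches nothing.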
16.3.3 Anycset

Anycset matches a single character in the input string from a set of characters. The matchparm field contains the address of a character set variable. If the next character in the input string is a member of this set, anycset accepts the string and skips over that character. If the next input character is not a member of that set, anycset returns failure.

Example:

DoID            pattern {anycset, alpha, 0, DoID2}
DoID2           pattern {spancset, alphanum}
                 . . .
                lesi    StringWID
                ldxi    DoID
                xor     cx, cx
                match
                jnc     NoID

This code segment checks the string StringWID to see if it begins with an identifier specified by the regular expression [a-zA-Z][a-zA-Z0-9]*. The first subpattern with anycset makes sure there is an alphabetic character at the beginning of the string (alpha is the stdsets.a set variable that has all the alphabetic characters as members). If the string does not begin with an alphabetic, the DoID pattern fails. The second subpattern, DoID2, skips over any following alphanumeric characters using the spancset matching function. Note that spancset always succeeds.

The above code does not simply match a string that is an identifier; it matches strings that begin with a valid identifier. For example, it would match “ThisIsAnID” as well as “ThisIsAnID+SoIsThis - 5”. If you only want to match a single identifier and nothing else, you must explicitly check for the end of string in your pattern. For more details on how to do this, see “EOS” in section 16.3.12.
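The prefix behavior is easy to see in a short Python model of the anycset/spancset chain. This is an illustrative sketch only; the helper names and the use of Python sets for alpha and alphanum are assumptions made for the example.

```python
import string

ALPHA = frozenset(string.ascii_letters)
ALPHANUM = ALPHA | frozenset(string.digits)


def anycset(text, pos, cset):
    """Match exactly one character that is a member of cset."""
    if pos < len(text) and text[pos] in cset:
        return True, pos + 1
    return False, pos


def spancset(text, pos, cset):
    """Match zero or more characters that are members of cset."""
    while pos < len(text) and text[pos] in cset:
        pos += 1
    return True, pos


def match_identifier_prefix(text):
    """[a-zA-Z][a-zA-Z0-9]* applied to the start of text."""
    ok, pos = anycset(text, 0, ALPHA)
    if not ok:
        return False, pos
    return spancset(text, pos, ALPHANUM)


print(match_identifier_prefix("ThisIsAnID"))               # (True, 10)
print(match_identifier_prefix("ThisIsAnID+SoIsThis - 5"))  # (True, 10)
print(match_identifier_prefix("9lives"))                   # (False, 0)
```

Both of the first two strings succeed with the same end position, which is why an explicit end-of-string test is needed to accept only a lone identifier.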
16.3.4 Notanycset

Notanycset provides the complement to anycset: it matches a single character in the input string that is not a member of a character set. The matchparm field, as usual, contains the address of the character set whose members must not appear as the next character in the input string. If notanycset successfully matches a character (that is, the next input character is not in the designated character set), the function skips the character and returns success; otherwise it returns failure.

Example:

DoSpecial       pattern {notanycset, digits, 0, DoSpecial2}
DoSpecial2      pattern {spancset, alphanum}
                 . . .
                lesi    StringWSpecial
                ldxi    DoSpecial
                xor     cx, cx
                match
                jnc     NoSpecial

This code is similar to the DoID pattern in the previous example. It matches a string that begins with any character except a digit, followed by zero or more alphanumeric characters.
16.3.5 MatchStr

Matchstr compares the next set of input characters against a character string. The matchparm field contains the address of a zero terminated string to compare against. If matchstr succeeds, it returns with the carry set and skips over the characters it matched; if it fails, it tries the alternate matching function or returns failure if there is no alternate.

Example:

DoString        pattern {matchstr, MyStr}
MyStr           byte    "Match this!",0
                 . . .
                lesi    String
                ldxi    DoString
                xor     cx, cx
                match
                jnc     NotMatchThis

This sample code matches any string that begins with the characters “Match this!”
16.3.6 MatchiStr

Matchistr is like matchstr insofar as it compares the next several characters against a zero terminated string value. However, matchistr does a case insensitive comparison. During the comparison it converts the characters in the input string to upper case before comparing them to the characters that the matchparm field points at. Therefore, the string pointed at by the matchparm field must contain uppercase wherever alphabetics appear. If the matchparm string contains any lower case characters, the matchistr function will always fail.

Example:

DoString        pattern {matchistr, MyStr}
MyStr           byte    "MATCH THIS!",0
                 . . .
                lesi    String
                ldxi    DoString
                xor     cx, cx
                match
                jnc     NotMatchThis

This example is identical to the one in the previous section except it will match the characters “match this!” using any combination of upper and lower case characters.
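The reason a lower case letter in the parameter string can never match follows directly from the conversion rule. The Python sketch below models that rule under the assumption that only the ASCII letters a-z are converted on the input side and that the parameter string is compared unchanged; the function names are hypothetical and the code is not the library's implementation.

```python
def to_upper_ascii(ch):
    """Convert a-z to A-Z and leave every other character alone."""
    return chr(ord(ch) & 0x5F) if "a" <= ch <= "z" else ch


def matchistr(text, pos, parm):
    """Compare upper-cased input against parm, which is used as written."""
    end = pos + len(parm)
    if end > len(text):
        return False, pos
    for offset, pattern_ch in enumerate(parm):
        if to_upper_ascii(text[pos + offset]) != pattern_ch:
            return False, pos
    return True, end


print(matchistr("mAtCh ThIs! now", 0, "MATCH THIS!"))  # (True, 11)
print(matchistr("MATCH THIS!", 0, "Match this!"))      # (False, 0)
```

In the second call the input character is always upper case by the time it is compared, so the lower case "a" in the parameter string cannot equal it.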
16.3.7 MatchToStr

Matchtostr matches all characters in an input string up to and including the characters specified by the matchparm parameter. This routine succeeds if the specified string appears somewhere in the input string; it fails if the string does not appear in the input string. This pattern function is quite useful for locating a substring and ignoring everything that came before the substring.

Example:

DoString        pattern {matchtostr, MyStr}
MyStr           byte    "Match this!",0
                 . . .
                lesi    String
                ldxi    DoString
                xor     cx, cx
                match
                jnc     NotMatchThis

Like the previous two examples, this code segment matches the string “Match this!” However, it does not require that the input string (String) begin with “Match this!” Instead, it only requires that “Match this!” appear somewhere in the string.
16.3.8 MatchChar

The matchchar function matches a single character. The matchparm field’s L.O. byte contains the character you want to match. If the next character in the input string is that character, then this function succeeds; otherwise it fails.

Example:

DoSpace         pattern {matchchar, ' '}
                 . . .
                lesi    String
                ldxi    DoSpace
                xor     cx, cx
                match
                jnc     NoSpace

This code segment matches any string that begins with a space. Keep in mind that the match routine only checks the prefix of a string. If you wanted to see if the string contained only a space (rather than a string that begins with a space), you would need to explicitly check for an end of string after the space. Of course, it would be far more efficient to use strcmp (see “Strcmp, Strcmpl, Stricmp, Stricmpl” on page 848) rather than match for this purpose!

Note that unlike matchstr, you encode the character you want to match directly into the matchparm field. This lets you specify the character you want to test directly in the pattern definition.
16.3.9 MatchToChar

Like matchtostr, matchtochar matches all characters up to and including a character you specify. This is similar to brkcset, except that you don’t have to create a character set containing a single member, and that brkcset skips up to but does not include the specified character(s). Matchtochar fails if it cannot find the specified character in the input string.

Example:

DoToSpace       pattern {matchtochar, ' '}
                 . . .
                lesi    String
                ldxi    DoToSpace
                xor     cx, cx
                match
                jnc     NoSpace

This call to match will fail if there are no spaces left in the input string. If there are, the call to matchtochar will skip over all characters up to, and including, the first space. This is a useful pattern for skipping over words in a string.
16.3.10 MatchChars

Matchchars skips zero or more occurrences of a single character in an input string. It is similar to spancset except you can specify a single character rather than an entire character set with a single member. Like matchchar, matchchars expects a single character in the L.O. byte of the matchparm field. Since this routine matches zero or more occurrences of that character, it always succeeds.

Example:

Skip2NextWord   pattern {matchtochar, ' ', 0, SkipSpcs}
SkipSpcs        pattern {matchchars, ' '}
                 . . .
                lesi    String
                ldxi    Skip2NextWord
                xor     cx, cx
                match
                jnc     NoWord

The code segment skips to the beginning of the next word in a string. It fails if there are no additional words in the string (i.e., the string contains no spaces).
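The Skip2NextWord chain can be modeled in a few lines of Python. The sketch assumes the behavior described in the last two sections (matchtochar consumes through the first occurrence and can fail; matchchars consumes zero or more occurrences and cannot); the function names are only stand-ins for the library routines.

```python
def matchtochar(text, pos, ch):
    """Skip up to and including the first ch at or after pos."""
    i = text.find(ch, pos)
    if i < 0:
        return False, pos
    return True, i + 1


def matchchars(text, pos, ch):
    """Skip zero or more consecutive occurrences of ch; always succeeds."""
    while pos < len(text) and text[pos] == ch:
        pos += 1
    return True, pos


def skip_to_next_word(text, pos=0):
    """matchtochar ' ' followed by matchchars ' '."""
    ok, after_space = matchtochar(text, pos, " ")
    if not ok:
        return False, pos
    return matchchars(text, after_space, " ")


text = "hello   world"
ok, where = skip_to_next_word(text)
print(ok, where, repr(text[where:]))  # True 8 'world'
print(skip_to_next_word("single"))    # (False, 0)
```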
16.3.11 MatchToPat

Matchtopat matches all characters in a string up to and including the substring matched by some other pattern. This is one of the two facilities the UCR Standard Library pattern matching routines provide to allow the implementation of nonterminal function calls (also see “SL_Match2” in section 16.3.20). This matching function succeeds if it finds a string matching the specified pattern somewhere on the line. If it succeeds, it skips the characters through the last character matched by the pattern parameter. As you would expect, the matchparm field contains the address of the pattern to match.

Example:

; Assume there is a pattern "expression" that matches arithmetic
; expressions. The following pattern determines if there is such an
; expression on the line followed by a semicolon.

FindExp         pattern {matchtopat, expression, 0, MatchSemi}
MatchSemi       pattern {matchchar, ';'}
                 . . .
                lesi    String
                ldxi    FindExp
                xor     cx, cx
                match
                jnc     NoExp
16.3.12 EOS

The EOS pattern matches the end of a string. This pattern, which must obviously appear at the end of a pattern list if it appears at all, checks for the zero terminating byte. Since the Standard Library routines only match prefixes, you should put this pattern at the end of a list if you want to ensure that a pattern exactly matches a string with no leftover characters at the end. EOS succeeds if it matches the zero terminating byte; it fails otherwise.

Example:

SkipNumber      pattern {anycset, digits, 0, SkipDigits}
SkipDigits      pattern {spancset, digits, 0, EOSPat}
EOSPat          pattern {EOS}
                 . . .
                lesi    String
                ldxi    SkipNumber
                xor     cx, cx
                match
                jnc     NoNumber

The SkipNumber pattern matches strings that contain only decimal digits (from the start of the match to the end of the string). Note that EOS requires no parameters, not even a matchparm parameter.
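As a rough model of why the trailing EOS matters, the Python sketch below chains one required digit, any further digits, and an end-of-string test. The helper names are assumptions for illustration, and the end of a Python string plays the role of the zero terminating byte.

```python
import string

DIGITS = frozenset(string.digits)


def anycset(text, pos, cset):
    """Match exactly one character from cset."""
    if pos < len(text) and text[pos] in cset:
        return True, pos + 1
    return False, pos


def spancset(text, pos, cset):
    """Match zero or more characters from cset."""
    while pos < len(text) and text[pos] in cset:
        pos += 1
    return True, pos


def eos(text, pos):
    """Succeed only at the end of the string; consumes nothing."""
    return pos == len(text), pos


def is_whole_number(text):
    """anycset digits, then spancset digits, then EOS."""
    ok, pos = anycset(text, 0, DIGITS)
    if not ok:
        return False
    _, pos = spancset(text, pos, DIGITS)
    ok, _ = eos(text, pos)
    return ok


print(is_whole_number("2024"))     # True
print(is_whole_number("2024 AD"))  # False
print(is_whole_number(""))         # False
```

Without the final eos step, "2024 AD" would be accepted because its prefix is a valid number.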
16.3.13 ARB

ARB matches any number of arbitrary characters. This pattern matching function is equivalent to Σ*. Note that ARB is a very inefficient routine to use. It works by assuming it can match all remaining characters in the string and then tries to match the pattern specified by the nextpattern field (see note 8). If the nextpattern item fails, ARB backs up one character and tries matching nextpattern again. This continues until the pattern specified by nextpattern succeeds or ARB backs up to its initial starting position. ARB succeeds if the pattern specified by nextpattern succeeds; it fails if it backs up to its initial starting position.

Given the enormous amount of backtracking that can occur with ARB (especially on long strings), you should try to avoid using this pattern if at all possible. The matchtostr, matchtochar, and matchtopat functions accomplish much of what ARB accomplishes, but they work forward rather than backward in the source string and may be more efficient. ARB is useful mainly if you’re sure the following pattern appears late in the string you’re matching or if the string you want to match occurs several times and you want to match the last occurrence (matchtostr, matchtochar, and matchtopat always match the first occurrence they find).

8. Since the match routine only matches prefixes, it does not make sense to apply ARB to the end of a pattern list; the same pattern would match with or without the final ARB. Therefore, ARB usually has a nextpattern field.

Example:

SkipNumber      pattern {ARB, 0, 0, SkipDigit}
SkipDigit       pattern {anycset, digits, 0, SkipDigits}
SkipDigits      pattern {spancset, digits}
                 . . .
                lesi    String
                ldxi    SkipNumber
                xor     cx, cx
                match
                jnc     NoNumber

This code example matches the last number that appears on an input line. Note that ARB does not use the matchparm field, so you should set it to zero by default.
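The backward search that ARB performs, and the contrast with a forward routine such as matchtochar, can be sketched in Python. This is a model of the description above, not the library's code. The names are hypothetical, the attempt counter exists only to show the cost of backing up, and trying the following pattern at the initial position itself is an assumption of this sketch.

```python
def match_char(text, pos, ch):
    """Match one specific character at pos."""
    if pos < len(text) and text[pos] == ch:
        return True, pos + 1
    return False, pos


def matchtochar(text, pos, ch):
    """Forward search: stop just past the FIRST occurrence of ch."""
    i = text.find(ch, pos)
    return (True, i + 1) if i >= 0 else (False, pos)


def arb_then(text, pos, next_match):
    """Start at the end of the string and back up one character at a
    time until next_match succeeds. Returns (ok, end, attempts)."""
    attempts = 0
    for start in range(len(text), pos - 1, -1):
        attempts += 1
        ok, end = next_match(text, start)
        if ok:
            return True, end, attempts
    return False, pos, attempts


def semicolon(text, pos):
    return match_char(text, pos, ";")


print(arb_then("a;b;c", 0, semicolon))  # (True, 4, 3)
print(matchtochar("a;b;c", 0, ";"))     # (True, 2)
print(arb_then("abc", 0, semicolon))    # (False, 0, 4)
```

The first two calls show the difference in which occurrence is found (last versus first); the third shows that a failing ARB examines every position in the string before giving up.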
16.3.14 ARBNUM

ARBNUM matches an arbitrary number (zero or more) of patterns that occur in the input string. If R represents some nonterminal (pattern matching function), then ARBNUM(R) is equivalent to the production ARBNUM → R ARBNUM | ε.

The matchparm field contains the address of the pattern that ARBNUM attempts to match.

Example:

SkipNumbers     pattern {ARBNUM, SkipNumber}
SkipNumber      pattern {anycset, digits, 0, SkipDigits}
SkipDigits      pattern {spancset, digits, 0, EndDigits}
EndDigits       pattern {matchchars, ' ', EndString}
EndString       pattern {EOS}
                 . . .
                lesi    String
                ldxi    SkipNumbers
                xor     cx, cx
                match
                jnc     IllegalNumbers

This code accepts the input string if it consists of a sequence of zero or more numbers separated by spaces and terminated with the EOS pattern. Note the use of the matchalt field in the EndDigits pattern to select EOS rather than a space for the last number in the string.
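The production ARBNUM → R ARBNUM | ε amounts to "apply R repeatedly until it fails." The following Python sketch models that idea with R defined as a number followed by optional spaces. It is an illustration rather than a transcription of the SkipNumbers pattern list: the function names are hypothetical, and the guard that stops when R succeeds without consuming input is an assumption added so the model cannot loop forever.

```python
import string

DIGITS = frozenset(string.digits)


def number_then_spaces(text, pos):
    """R: one or more digits followed by zero or more spaces."""
    start = pos
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    if pos == start:
        return False, start
    while pos < len(text) and text[pos] == " ":
        pos += 1
    return True, pos


def arbnum(text, pos, r):
    """Match r zero or more times; always succeeds."""
    while True:
        ok, end = r(text, pos)
        if not ok or end == pos:  # r failed, or made no progress
            return True, pos
        pos = end


def numbers_only(text):
    """ARBNUM(R) followed by an end-of-string test."""
    _, pos = arbnum(text, 0, number_then_spaces)
    return pos == len(text)


print(numbers_only("12 345 6"))  # True
print(numbers_only("12 x"))      # False
print(numbers_only(""))          # True
```

The empty string is accepted because zero repetitions of R is a legal match.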
16.3.15 Skip

Skip matches n arbitrary characters in the input string. The matchparm field is an integer value containing the number of characters to skip. Although the matchparm field is a double word, this routine limits the number of characters you can skip to 16 bits (65,535 characters); that is, n is the L.O. word of the matchparm field. This should prove sufficient for most needs. Skip succeeds if there are at least n characters left in the input string; it fails if there are fewer than n characters left in the input string.

Example:

Skip1st6        pattern {skip, 6, 0, SkipNumber}
SkipNumber      pattern {anycset, digits, 0, SkipDigits}
SkipDigits      pattern {spancset, digits, 0, EndDigits}
EndDigits       pattern {EOS}
                 . . .
                lesi    String
                ldxi    Skip1st6
                xor     cx, cx
                match
                jnc     IllegalItem

This example matches a string containing six arbitrary characters followed by one or more decimal digits and a zero terminating byte.
16.3.16 Pos

Pos succeeds if the matching functions are currently at the nth character in the string, where n is the value in the L.O. word of the matchparm field. Pos fails if the matching functions are not currently at position n in the string. Unlike the pattern matching functions you’ve seen so far, pos does not consume any input characters. Note that the string starts out at position zero. So when you use the pos function, it succeeds if you’ve matched n characters at that point.

Example:

SkipNumber      pattern {anycset, digits, 0, SkipDigits}
SkipDigits      pattern {spancset, digits, 0, EndDigits}
EndDigits       pattern {pos, 4}
                 . . .
                lesi    String
                ldxi    SkipNumber
                xor     cx, cx
                match
                jnc     IllegalItem

This code matches a string that begins with exactly 4 decimal digits.
16.3.17 RPos

Rpos works quite a bit like the pos function except it succeeds if the current position is n character positions from the end of the string. Like pos, n is the L.O. 16 bits of the matchparm field. Also like pos, rpos does not consume any input characters.

Example:

SkipNumber      pattern {anycset, digits, 0, SkipDigits}
SkipDigits      pattern {spancset, digits, 0, EndDigits}
EndDigits       pattern {rpos, 4}
                 . . .
                lesi    String
                ldxi    SkipNumber
                xor     cx, cx
                match
                jnc     IllegalItem

This code matches any string that is all decimal digits except for the last four characters of the string. The string must be at least five characters long for the above pattern match to succeed.
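Pos and rpos are pure position tests, which a short Python model makes explicit. The sketch assumes positions count from zero and that "n positions from the end" means exactly n characters remain; the function names are illustrative only.

```python
import string

DIGITS = frozenset(string.digits)


def span_digits(text, pos):
    """Return the position just past any run of digits starting at pos."""
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    return pos


def pos_test(text, pos, n):
    """Succeed if exactly n characters have been consumed; consume nothing."""
    return pos == n, pos


def rpos_test(text, pos, n):
    """Succeed if exactly n characters remain; consume nothing."""
    return len(text) - pos == n, pos


def starts_with_exactly_four_digits(text):
    end = span_digits(text, 0)
    return end > 0 and pos_test(text, end, 4)[0]


def digits_then_four_more(text):
    end = span_digits(text, 0)
    return end > 0 and rpos_test(text, end, 4)[0]


print(starts_with_exactly_four_digits("1234abc"))  # True
print(starts_with_exactly_four_digits("12345"))    # False
print(digits_then_four_more("123abcd"))            # True
print(digits_then_four_more("1234"))               # False
```

The last call fails because the digits run to the end of the string, leaving zero characters rather than four.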
16.3.18 GotoPos

Gotopos skips over any characters in the string until it reaches character position n in the string. This function fails if the pattern is already beyond position n in the string. The L.O. word of the matchparm field contains the value for n.

Example:

SkipNumber      pattern {gotopos, 10, 0, MatchNmbr}
MatchNmbr       pattern {anycset, digits, 0, SkipDigits}
SkipDigits      pattern {spancset, digits, 0, EndDigits}
EndDigits       pattern {rpos, 4}
                 . . .
                lesi    String
                ldxi    SkipNumber
                xor     cx, cx
                match
                jnc     IllegalItem

This example code skips to position 10 in the string and attempts to match a string of digits starting with the 11th character. This pattern succeeds if there are four characters remaining in the string after processing all the digits.
16.3.19 RGotoPos

Rgotopos works like gotopos except it goes to the position specified from the end of the string. Rgotopos fails if the matching routines are already beyond position n from the end of the string. As with gotopos, the L.O. word of the matchparm field contains the value for n.

Example:

SkipNumber      pattern {rgotopos, 10, 0, MatchNmbr}
MatchNmbr       pattern {anycset, digits, 0, SkipDigits}
SkipDigits      pattern {spancset, digits}
                 . . .
                lesi    String
                ldxi    SkipNumber
                xor     cx, cx
                match
                jnc     IllegalItem

This example skips to ten characters from the end of the string and then attempts to match one or more digits starting at that point. It fails if there aren’t at least 11 characters in the string or the last 10 characters don’t begin with a string of one or more digits.
16.3.20 SL_Match2

The sl_match2 routine is nothing more than a recursive call to match. The matchparm field contains the address of the pattern to match. This is quite useful for simulating parentheses around a pattern in a pattern expression. As far as matching strings is concerned, Pattern1 and Pattern2, below, are equivalent:

Pattern2        pattern {sl_match2, Pattern1}
Pattern1        pattern {matchchar, 'a'}

The only difference between invoking a pattern directly and invoking it with sl_match2 is that sl_match2 tweaks some internal variables to keep track of matching positions within the input string. Later, you can extract the character string matched by sl_match2 using the patgrab routine (see “Extracting Substrings from Matched Patterns” in section 16.5).
16.4 Designing Your Own Pattern Matching Routines

Although the UCR Standard Library provides a wide variety of matching functions, there is no way to anticipate the needs of all applications. Therefore, you will probably discover that the library does not support some particular pattern matching function you need. Fortunately, it is very easy for you to create your own pattern matching functions to augment those available in the UCR Standard Library. When you specify a matching function name in the pattern data structure, the match routine calls the specified address using a far call and passes the following parameters:

es:di-  Points at the next character in the input string. You should not look at any characters before this address. Furthermore, you should never look beyond the end of the string (see cx below).

ds:si-  Contains the four byte parameter found in the matchparm field.

cx-     Contains the last position, plus one, in the input string you’re allowed to look at. Note that your pattern matching routine should not look beyond location es:cx or the zero terminating byte, whichever comes first in the input string.
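Before looking at assembly code, it may help to see this calling contract modeled in a high-level language. The Python sketch below is an assumption-laden analogy, not a translation of any library routine: di and cx become integer indexes, the matchparm string is passed directly, and the returned (carry, ax) tuple follows the convention of returning the position after the match on success and the original position on failure. It models a case insensitive "match up to a string" function of the kind this section goes on to develop.

```python
def to_upper_ascii(s):
    """Upper-case only the ASCII letters a-z."""
    return "".join(chr(ord(c) & 0x5F) if "a" <= c <= "z" else c for c in s)


def matchtoistr(text, di, parm, cx):
    """Model of a user-written matching function.

    text[di] is the next input character, parm is the (upper case)
    parameter string, and cx is one past the last position the
    function may examine. Returns (carry, ax).
    """
    limit = min(cx, len(text))
    if di >= limit:
        return False, di       # nothing left that we may look at
    n = len(parm)
    for start in range(di, limit - n + 1):
        if to_upper_ascii(text[start:start + n]) == parm:
            return True, start + n
    return False, di           # failure: hand back the original position


TEXT = "This is the string 'xyz' in it"
print(matchtoistr(TEXT, 0, "XYZ", len(TEXT)))  # (True, 23)
print(matchtoistr(TEXT, 0, "XYZ", 10))         # (False, 0)
```

The second call fails only because the cx limit stops the search before the substring is reached, which is the situation the cx parameter exists to express.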
On return from the function, ax must contain the offset into the string (di’s value) of the last character matched plus one, if your matching function is successful. Your function must also set the carry flag to denote success. After your pattern matches, the match routine might call another matching function (the one specified by the next pattern field) and that function begins matching at location es:ax. If the pattern match fails, then you must return the original di value in the ax register and return with the carry flag clear. Note that your matching function must preserve all other registers.

There is one very important detail you must never forget when writing your own pattern matching routines: ds does not point at your data segment; it contains the H.O. word of the matchparm parameter. Therefore, if you are going to access global variables in your data segment you will need to push ds, load it with the address of dseg, and pop ds before leaving. Several examples throughout this chapter demonstrate how to do this.

There are some obvious omissions from (the current version of) the UCR Standard Library’s repertoire. For example, there should probably be matchtoistr, matchichar, and matchtoichar pattern functions. The following example code demonstrates how to add a matchtoistr (match up to a string, doing a case insensitive comparison) routine.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                matchfuncs
                .list

dseg            segment para public 'data'

TestString      byte    "This is the string 'xyz' in it",cr,lf,0

TestPat         pattern {matchtoistr,xyz}
xyz             byte    "XYZ",0

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; MatchToiStr-  Matches all characters in a string up to, and including, the
;               specified parameter string. The parameter string must be
;               all upper case characters. This guy matches string using
;               a case insensitive comparison.
;
; inputs:
;               es:di-  Source string
;               ds:si-  String to match
;               cx-     Maximum match position
;
; outputs:
;               ax-     Points at first character beyond the end of the
;                       matched string if success, contains the initial DI
;                       value if failure occurs.
;               carry-  0 if failure, 1 if success.

MatchToiStr     proc    far
                pushf
                push    di
                push    si
                cld

; Check to see if we're already past the point where we're allowed
; to scan in the input string.

                cmp     di, cx
                jae     MTiSFailure

; If the pattern string is the empty string, always match.

                cmp     byte ptr ds:[si], 0
                je      MTSsuccess

; The following loop scans through the input string looking for
; the first character in the pattern string.

ScanLoop:       push    si
                lodsb                   ;Get first char of string

                dec     di
FindFirst:      inc     di              ;Move on to next (or 1st) char.
                cmp     di, cx          ;If at cx, then we've got to
                jae     CantFind1st     ; fail.

                mov     ah, es:[di]     ;Get input character.
                cmp     ah, 'a'         ;Convert input character to
                jb      DoCmp           ; upper case if it's a lower
                cmp     ah, 'z'         ; case character.
                ja      DoCmp
                and     ah, 5fh
DoCmp:          cmp     al, ah          ;Compare input character against
                jne     FindFirst       ; pattern string.

; At this point, we've located the first character in the input string
; that matches the first character of the pattern string. See if the
; strings are equal.

                push    di              ;Save restart point.

CmpLoop:        cmp     di, cx          ;See if we've gone beyond the
                jae     StrNotThere     ; last position allowable.
                lodsb                   ;Get next pattern character.
                cmp     al, 0           ;At the end of the parameter
                je      MTSsuccess2     ; string? If so, succeed.

                inc     di
                mov     ah, es:[di]     ;Get the next input character.
                cmp     ah, 'a'         ;Convert input character to
                jb      DoCmp2          ; upper case if it's a lower
                cmp     ah, 'z'         ; case character.
                ja      DoCmp2
                and     ah, 5fh
DoCmp2:         cmp     al, ah          ;Compare input character against
                je      CmpLoop         ; pattern string.
                pop     di
                pop     si
                jmp     ScanLoop

StrNotThere:    add     sp, 2           ;Remove di from stack.
CantFind1st:    add     sp, 2           ;Remove si from stack.
MTiSFailure:    pop     si
                pop     di
                mov     ax, di          ;Return failure position in AX.
                popf
                clc                     ;Return failure.
                ret

MTSSuccess2:    add     sp, 2           ;Remove DI value from stack.
MTSSuccess:     add     sp, 2           ;Remove SI value from stack.
                mov     ax, di          ;Return next position in AX.
                pop     si
                pop     di
                popf
                stc                     ;Return success.
                ret
MatchToiStr     endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                lesi    TestString
                ldxi    TestPat
                xor     cx, cx
                match
                jnc     NoMatch
                print
                byte    "Matched",cr,lf,0
                jmp     Quit

NoMatch:        print
                byte    "Did not match",cr,lf,0

Quit:           ExitPgm
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

16.5
Extracting Substrings from Matched Patterns

Often, simply determining that a string matches a given pattern is insufficient. You may want to perform various operations that depend upon the actual information in that string. However, the pattern matching facilities described thus far do not provide a mechanism for testing individual components of the input string. In this section, you will see how to extract portions of a pattern for further processing.

Perhaps an example may help clarify the need to extract portions of a string. Suppose you are writing a stock buy/sell program and you want it to process commands described by the following regular expression:

                (buy | sell) [0-9]+ shares of (ibm | apple | hp | dec)
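As a point of comparison from outside the library, the extraction problem looks like this in a general-purpose regular expression engine. The Python sketch below is only an analogy for what this section sets out to achieve; the name parse_command is hypothetical, the pattern allows any amount of white space between words, and named capture groups play the role of the pieces of the command one would want to pull out.

```python
import re

# (buy | sell) [0-9]+ shares of (ibm | apple | hp | dec), with each
# interesting piece wrapped in a named capture group.
STOCK_CMD = re.compile(
    r"(?P<cmd>buy|sell)\s+(?P<count>[0-9]+)\s+shares\s+of\s+"
    r"(?P<company>ibm|apple|hp|dec)",
    re.IGNORECASE,
)


def parse_command(text):
    """Return (command, share count, company), or None if text is not a
    legal command. Only a prefix of text has to match."""
    m = STOCK_CMD.match(text)
    if m is None:
        return None
    return m.group("cmd").lower(), int(m.group("count")), m.group("company").lower()


print(parse_command("Buy 25 shares of apple stock"))    # ('buy', 25, 'apple')
print(parse_command("Sell 15 shares of ibm stock"))     # ('sell', 15, 'ibm')
print(parse_command("This is not a buy/sell command"))  # None
```

A bare yes/no answer corresponds to checking only whether the result is None; the three returned fields are the substrings an application actually needs.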
While it is easy to devise a Standard Library pattern that recognizes strings of this form, calling the match routine would only tell you that you have a legal buy or sell command. It does not tell you if you are to buy or sell, which stock to buy or sell, or how many shares to buy or sell. Of course, you could take the cross product of (buy | sell) with (ibm | apple | hp | dec) and generate eight different regular expressions that uniquely determine whether you’re buying or selling and whose stock you’re trading, but you can’t process the integer values this way (unless you are willing to have millions of regular expressions). A better solution would be to extract substrings from the legal pattern and process these substrings after you verify that you have a legal buy or sell command. For example, you could extract buy or sell into one string, the digits into another, and the company name into a third. After verifying the syntax of the command, you could process the individual strings you’ve extracted.

The UCR Standard Library patgrab routine provides this capability for you. You normally call patgrab after calling match and verifying that it matches the input string. Patgrab expects a single parameter: a pointer to a pattern recently processed by match. Patgrab creates a string on the heap consisting of the characters matched by the given pattern and returns a pointer to this string in es:di. Note that patgrab only returns a string associated with a single pattern data structure, not a chain of pattern data structures. Consider the following pattern:

PatToGrab       pattern {matchstr, str1, 0, Pat2}
Pat2            pattern {matchstr, str2}
str1            byte    "Hello",0
str2            byte    " there",0
Calling match on PatToGrab will match the string “Hello there”. However, if after calling match you call patgrab and pass it the address of PatToGrab, patgrab will return a pointer to the string “Hello”.

Of course, you might want to collect a string that is the concatenation of several strings matched within your pattern (i.e., a portion of the pattern list). This is where calling the sl_match2 pattern matching function comes in handy. Consider the following pattern:

Numbers         pattern {sl_match2, FirstNumber}
FirstNumber     pattern {anycset, digits, 0, OtherDigs}
OtherDigs       pattern {spancset, digits}

This pattern matches the same strings as

Numbers         pattern {anycset, digits, 0, OtherDigs}
OtherDigs       pattern {spancset, digits}
So why bother with the extra pattern that calls sl_match2? Well, as it turns out the sl_match2 matching function lets you create parenthetical patterns. A parenthetical pattern is a pattern list that the pattern matching routines (especially patgrab) treat as a single pattern. Although the match routine will match the same strings regardless of which version of Numbers you use, patgrab will produce two entirely different strings depending upon your choice of the above patterns. If you use the latter version, patgrab will only return the first digit of the number. If you use the former version (with the call to sl_match2), then patgrab returns the entire string matched by sl_match2, and that turns out to be the entire string of digits.

The following sample program demonstrates how to use parenthetical patterns to extract the pertinent information from the stock command presented earlier. It uses parenthetical patterns for the buy/sell command, the number of shares, and the company name.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                matchfuncs
                .list

dseg            segment para public 'data'

; Variables used to hold the number of shares bought/sold, a pointer to
; a string containing the buy/sell command, and a pointer to a string
; containing the company name.

Count           word    0
CmdPtr          dword   ?
CompPtr         dword   ?

; Some test strings to try out:

Cmd1            byte    "Buy 25 shares of apple stock",0
Cmd2            byte    "Sell 50 shares of hp stock",0
Cmd3            byte    "Buy 123 shares of dec stock",0
Cmd4            byte    "Sell 15 shares of ibm stock",0
BadCmd0         byte    "This is not a buy/sell command",0

; Patterns for the stock buy/sell command:
;
; StkCmd matches buy or sell and creates a parenthetical pattern
; that contains the string "buy" or "sell".

StkCmd          pattern {sl_match2, buyPat, 0, skipspcs1}

buyPat          pattern {matchistr,buystr,sellpat}
buystr          byte    "BUY",0

sellpat         pattern {matchistr,sellstr}
sellstr         byte    "SELL",0

; Skip zero or more white space characters after the buy command.

skipspcs1       pattern {spancset, whitespace, 0, CountPat}

; CountPat is a parenthetical pattern that matches one or more
; digits.

CountPat        pattern {sl_match2, Numbers, 0, skipspcs2}
Numbers         pattern {anycset, digits, 0, RestOfNum}
RestOfNum       pattern {spancset, digits}

; The following patterns match " shares of " allowing any amount
; of white space between the words.

skipspcs2       pattern {spancset, whitespace, 0, sharesPat}

sharesPat       pattern {matchistr, sharesStr, 0, skipspcs3}
sharesStr       byte    "SHARES",0

skipspcs3       pattern {spancset, whitespace, 0, ofPat}

ofPat           pattern {matchistr, ofStr, 0, skipspcs4}
ofStr           byte    "OF",0

skipspcs4       pattern {spancset, whitespace, 0, CompanyPat}

; The following parenthetical pattern matches a company name.
; The patgrab-available string will contain the corporate name.

CompanyPat      pattern {sl_match2, ibmpat}

ibmpat          pattern {matchistr, ibm, applePat}
ibm             byte    "IBM",0

applePat        pattern {matchistr, apple, hpPat}
apple           byte    "APPLE",0

hpPat           pattern {matchistr, hp, decPat}
hp              byte    "HP",0

decPat          pattern {matchistr, decstr}
decstr          byte    "DEC",0

                include stdsets.a
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; DoBuySell-    This routine processes a stock buy/sell command.
;               After matching the command, it grabs the components
;               of the command and outputs them as appropriate.
;               This routine demonstrates how to use patgrab to
;               extract substrings from a pattern string.
;
;               On entry, es:di must point at the buy/sell command
;               you want to process.

DoBuySell       proc    near
                ldxi    StkCmd
                xor     cx, cx
                match
                jnc     NoMatch

                lesi    StkCmd
                patgrab
                mov     word ptr CmdPtr, di
                mov     word ptr CmdPtr+2, es

                lesi    CountPat
                patgrab
                atoi                            ;Convert digits to integer
                mov     Count, ax
                free                            ;Return storage to heap.

                lesi    CompanyPat
                patgrab
                mov     word ptr CompPtr, di
                mov     word ptr CompPtr+2, es

                printf
                byte    "Stock command: %^s\n"
                byte    "Number of shares: %d\n"
                byte    "Company to trade: %^s\n\n",0
                dword   CmdPtr, Count, CompPtr

                les     di, CmdPtr
                free
                les     di, CompPtr
                free
                ret

NoMatch:        print
                byte    "Illegal buy/sell command",cr,lf,0
                ret
DoBuySell       endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
meminit lesi call lesi call lesi call lesi call lesi call Quit: Main
Page 928
ExitPgm endp
Cmd1 DoBuySell Cmd2 DoBuySell Cmd3 DoBuySell Cmd4 DoBuySell BadCmd0 DoBuySell
Control Structures cseg
ends
sseg stk sseg
segment db ends
para stack ‘stack’ 1024 dup (“stack “)
zzzzzzseg LastBytes zzzzzzseg
segment db ends end
para public ‘zzzzzz’ 16 dup (?) Main
Sample program output: Stock command: Buy Number of shares: 25 Company to trade: apple Stock command: Sell Number of shares: 50 Company to trade: hp Stock command: Buy Number of shares: 123 Company to trade: dec Stock command: Sell Number of shares: 15 Company to trade: ibm Illegal buy/sell command
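Readers who know regular-expression libraries may find it helpful to think of a parenthetical pattern as something like a capture group, with patgrab playing the part of "give me the text of that group." The following Python sketch is only an analogy and does not use the UCR Standard Library; the name parse_stock_command and the regular expression itself are assumptions made for illustration.

```python
import re

# Each parenthesized group stands in for one parenthetical pattern
# (StkCmd, CountPat, CompanyPat); m.group(n) is the analogue of patgrab.
STOCK_RE = re.compile(
    r"(buy|sell)\s*(\d+)\s*shares\s*of\s*(ibm|apple|hp|dec)",
    re.IGNORECASE,
)

def parse_stock_command(text):
    """Return (command, share_count, company), or None if there is no match."""
    m = STOCK_RE.match(text)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)

if __name__ == "__main__":
    for cmd in ("Buy 25 shares of apple stock",
                "This is not a buy/sell command"):
        print(cmd, "->", parse_stock_command(cmd))
```

Had the digit group been written so that it enclosed only the first digit, the second captured string would be a single character, which is the same effect the text describes for the version of Numbers without sl_match2.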
16.6 Semantic Rules and Actions

Automata theory is mainly concerned with whether or not a string matches a given pattern. Like many theoretical sciences, it asks only whether something is possible; practical applications are a secondary concern. For real programs, however, we would like to perform certain operations if we match a string, or perform one from a set of operations depending on how we match the string. A semantic rule or semantic action is an operation you perform based upon the type of pattern you match. That is, it is the piece of code you execute when you are satisfied with some pattern matching behavior. For example, the call to patgrab in the previous section is a semantic action.

Normally, you execute the code associated with a semantic rule after returning from the call to match. Certainly when processing regular expressions, there is no need to process a semantic action in the middle of a pattern matching operation. However, this isn’t the case for a context free grammar. Context free grammars often involve recursion or may use the same pattern several times when matching a single string (that is, you may reference the same nonterminal several times while matching the pattern). The pattern matching data structure only maintains pointers (EndPattern, StartPattern, and StrSeg) to the last substring matched by a given pattern. Therefore, if you reuse a subpattern while matching a string and you need to execute a semantic rule associated with that subpattern, you will need to execute that semantic rule in the middle of the pattern matching operation, before you reference that subpattern again.

It turns out to be very easy to insert semantic rules in the middle of a pattern matching operation. All you need to do is write a pattern matching function that always succeeds (i.e., it returns with the carry flag set).
Within the body of your pattern matching routine you can choose to ignore the string the matching code is testing and perform any other actions you desire.
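The idea can be sketched outside the UCR library as follows. This Python fragment is a hypothetical illustration, not the library's mechanism: every name in it (match_digits, save_number, match_list, the state dictionary) is invented here. A match step returns the new position on success or None on failure; the action step always succeeds and consumes nothing, yet it runs in the middle of the match so that it can record a substring before the reused digit subpattern overwrites it.

```python
def match_digits(text, pos, state):
    """Match one or more digits; remember only the most recent match."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == pos:
        return None                        # failure: no digit here
    state["last_digits"] = text[pos:end]   # overwritten on every reuse
    return end

def save_number(text, pos, state):
    """Semantic action: always succeeds and consumes no input."""
    state["numbers"].append(int(state["last_digits"]))
    return pos

def match_comma(text, pos, state):
    return pos + 1 if pos < len(text) and text[pos] == "," else None

def match_list(text):
    """List -> digits {save} ("," digits {save})* end-of-string"""
    state = {"last_digits": None, "numbers": []}
    pos = 0
    while True:
        for step in (match_digits, save_number):
            pos = step(text, pos, state)
            if pos is None:
                return None
        nxt = match_comma(text, pos, state)
        if nxt is None:
            break
        pos = nxt
    return state["numbers"] if pos == len(text) else None

if __name__ == "__main__":
    print(match_list("12,7,305"))
```

If save_number ran only after the whole match had finished, state["last_digits"] would hold just the final run of digits and the earlier numbers would be lost.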
Your semantic action routine, on return, must set the carry flag and it must copy the original contents of di into ax. It must preserve all other registers. Your semantic action must not call the match routine (call sl_match2 instead). Match does not allow recursion (it is not reentrant), and calling match within a semantic action routine will corrupt the pattern match in progress.

The following program contains several semantic action routines. It converts arithmetic expressions in infix (algebraic) form to reverse polish notation (RPN) form.

; INFIX.ASM
;
; A simple program which demonstrates the pattern matching routines in the
; UCR library. This program accepts an arithmetic expression on the command
; line (no interleaving spaces in the expression are allowed, that is, there
; must be only one command line parameter) and converts it from infix
; notation to postfix (rpn) notation.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                matchfuncs
                .list

dseg            segment para public 'data'

; Grammar for simple infix -> postfix translation operation
; (the semantic actions are enclosed in braces):
;
; E  -> FE'
; E' -> +F {output '+'} E' | -F {output '-'} E' | <empty string>
; F  -> TF'
; F' -> *T {output '*'} F' | /T {output '/'} F' | <empty string>
; T  -> -T {output 'neg'} | S
; S  -> <constant> {output constant} | (E)
;
; UCR Standard Library Pattern which handles the grammar above:

; An expression consists of an "E" item followed by the end of the string:

infix2rpn       pattern {sl_Match2,E,,EndOfString}
EndOfString     pattern {EOS}

; An "E" item consists of an "F" item optionally followed by "+" or "-"
; and another "E" item:

E               pattern {sl_Match2, F,,Eprime}
Eprime          pattern {MatchChar, '+', Eprime2, epf}
epf             pattern {sl_Match2, F,,epPlus}
epPlus          pattern {OutputPlus,,,Eprime}           ;Semantic rule

Eprime2         pattern {MatchChar, '-', Succeed, emf}
emf             pattern {sl_Match2, F,,epMinus}
epMinus         pattern {OutputMinus,,,Eprime}          ;Semantic rule

; An "F" item consists of a "T" item optionally followed by "*" or "/"
; followed by another "T" item:

F               pattern {sl_Match2, T,,Fprime}
Fprime          pattern {MatchChar, '*', Fprime2, fmf}
fmf             pattern {sl_Match2, T, 0, pMul}
pMul            pattern {OutputMul,,,Fprime}            ;Semantic rule

Fprime2         pattern {MatchChar, '/', Succeed, fdf}
fdf             pattern {sl_Match2, T, 0, pDiv}
pDiv            pattern {OutputDiv, 0, 0,Fprime}        ;Semantic rule

; T item consists of an "S" item or a "-" followed by another "T" item:

T               pattern {MatchChar, '-', S, TT}
TT              pattern {sl_Match2, T, 0,tpn}
tpn             pattern {OutputNeg}                     ;Semantic rule

; An "S" item is either a string of one or more digits or "(" followed by
; an "E" item followed by ")":

Const           pattern {sl_Match2, DoDigits, 0, spd}
spd             pattern {OutputDigits}                  ;Semantic rule

DoDigits        pattern {Anycset, Digits, 0, SpanDigits}
SpanDigits      pattern {Spancset, Digits}

S               pattern {MatchChar, '(', Const, IntE}
IntE            pattern {sl_Match2, E, 0, CloseParen}
CloseParen      pattern {MatchChar, ')'}

Succeed         pattern {DoSucceed}

                include stdsets.a

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; DoSucceed matches the empty string. In other words, it matches anything
; and always returns success without eating any characters from the input
; string.

DoSucceed       proc    far
                mov     ax, di
                stc
                ret
DoSucceed       endp

; OutputPlus is a semantic rule which outputs the "+" operator after the
; parser sees a valid addition operator in the infix string.

OutputPlus      proc    far
                print
                byte    " +",0
                mov     ax, di                  ;Required by sl_Match
                stc
                ret
OutputPlus      endp

; OutputMinus is a semantic rule which outputs the "-" operator after the
; parser sees a valid subtraction operator in the infix string.

OutputMinus     proc    far
                print
                byte    " -",0
                mov     ax, di                  ;Required by sl_Match
                stc
                ret
OutputMinus     endp

; OutputMul is a semantic rule which outputs the "*" operator after the
; parser sees a valid multiplication operator in the infix string.

OutputMul       proc    far
                print
                byte    " *",0
                mov     ax, di                  ;Required by sl_Match
                stc
                ret
OutputMul       endp

; OutputDiv is a semantic rule which outputs the "/" operator after the
; parser sees a valid division operator in the infix string.

OutputDiv       proc    far
                print
                byte    " /",0
                mov     ax, di                  ;Required by sl_Match
                stc
                ret
OutputDiv       endp

; OutputNeg is a semantic rule which outputs the unary "-" operator after the
; parser sees a valid negation operator in the infix string.

OutputNeg       proc    far
                print
                byte    " neg",0
                mov     ax, di                  ;Required by sl_Match
                stc
                ret
OutputNeg       endp

; OutputDigits outputs the numeric value when it encounters a legal integer
; value in the input string.

OutputDigits    proc    far
                push    es
                push    di
                mov     al, ' '
                putc
                lesi    const
                patgrab
                puts
                free
                stc
                pop     di
                mov     ax, di
                pop     es
                ret
OutputDigits    endp

; Okay, here's the main program which fetches the command line parameter
; and parses it.

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit                         ; memory to the heap.

                print
                byte    "Enter an arithmetic expression: ",0
                getsm
                print
                byte    "Expression in postfix form: ",0

                ldxi    infix2rpn
                xor     cx, cx
                match
                jc      Succeeded

                print
                byte    "Syntax error",0

Succeeded:      putcr

Quit:           ExitPgm
Main            endp

cseg            ends

; Allocate a reasonable amount of space for the stack (8k).

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends
; zzzzzzseg must be the last segment that gets loaded into memory!

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

16.7 Constructing Patterns for the MATCH Routine

A major issue we have yet to discuss is how to convert regular expressions and context free grammars into patterns suitable for the UCR Standard Library pattern matching routines. Most of the examples appearing up to this point have used an ad hoc translation scheme; now it is time to provide an algorithm to accomplish this.

The following algorithm converts a context free grammar to a UCR Standard Library pattern data structure. If you want to convert a regular expression to a pattern, first convert the regular expression to a context free grammar (see “Converting REs to CFGs” on page 905). Of course, many regular expression forms convert directly to a pattern; when the conversion is obvious you can bypass the following algorithm. For example, you can use spancset to match a regular expression like [0-9]*.

The first step you must always take is to eliminate left recursion from the grammar. You will generate an infinite loop (and crash the machine) if you attempt to code a grammar containing left recursion into a pattern data structure. For information on eliminating left recursion, see “Eliminating Left Recursion and Left Factoring CFGs” on page 903. You might also want to left factor the grammar while you are eliminating left recursion. The Standard Library routines fully support backtracking, so left factoring is not strictly necessary; however, the matching routine will execute faster if it does not need to backtrack.

If a grammar production takes the form A → B C where A, B, and C are nonterminal symbols, you would create the following pattern:

A               pattern {sl_match2,B,0,C}

This pattern description for A checks for an occurrence of a B pattern followed by a C pattern.
If B is a relatively simple production (that is, you can convert it to a single pattern data structure), you can optimize this to:

A               pattern {B’s Matching Function, B’s parameter, 0, C}

The remaining examples will always call sl_match2, just to be consistent. However, as long as the nonterminals you invoke are simple, you can fold them into A’s pattern.

If a grammar production takes the form A → B | C where A, B, and C are nonterminal symbols, you would create the following pattern:

A               pattern {sl_match2, B, C}

This pattern tries to match B. If it succeeds, A succeeds; if it fails, it tries to match C. At this point, A’s success or failure is the success or failure of C.

Handling terminal symbols is the next thing to consider. These are quite easy: use the appropriate matching function provided by the Standard Library, e.g., matchstr or matchchar. For example, if you have a production of the form A → abc | y you would convert this to the following pattern:

A               pattern {matchstr,abc,ypat}
abc             byte    "abc",0
ypat            pattern {matchchar,'y'}

The only remaining detail to consider is the empty string. If you have a production of the form A → ε then you need to write a pattern matching function that always succeeds. The elegant way to do this is to write a custom pattern matching function. This function is

succeed         proc    far
                mov     ax, di          ;Required by sl_match
                stc                     ;Always succeed.
                ret
succeed         endp

Another, sneaky, way to force success is to use matchstr and pass it the empty string to match, e.g.,

success         pattern {matchstr, emptystr}
emptystr        byte    0

The empty string always matches the input string, no matter what the input string contains.

If you have a production with several alternatives and ε is one of them, you must process ε last. For example, if you have the productions A → abc | y | BC | ε you would use the following pattern:

A               pattern {matchstr,abc, tryY}
abc             byte    "abc",0
tryY            pattern {matchchar, 'y', tryBC}
tryBC           pattern {sl_match2, B, DoSuccess, C}
DoSuccess       pattern {succeed}
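The conversion rules above (sequence through the "next" field, alternation through the "alternate" field, terminals through a matching function, and ε tried last) can be mimicked in a few lines of Python. This is a simplified model built for illustration, not the UCR implementation: the Pattern class, the run function, and the productions B → b and C → c are all assumptions, and the model does not backtrack into a subpattern once sl_match2 has returned.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Pattern:
    func: Callable                     # (text, pos, parm) -> end position or None
    parm: object = None                # parameter handed to the match function
    alt: Optional["Pattern"] = None    # alternate: tried if this path fails
    nxt: Optional["Pattern"] = None    # next: must match after this node

def run(pat, text, pos):
    """Try pat at pos, falling back to alternates; return end position or None."""
    while pat is not None:
        end = pat.func(text, pos, pat.parm)
        if end is not None:
            if pat.nxt is None:
                return end
            rest = run(pat.nxt, text, end)
            if rest is not None:
                return rest
        pat = pat.alt
    return None

def matchstr(text, pos, s):
    return pos + len(s) if text.startswith(s, pos) else None

def matchchar(text, pos, c):
    return pos + 1 if text.startswith(c, pos) else None

def sl_match2(text, pos, sub):
    return run(sub, text, pos)         # match a whole subpattern

def succeed(text, pos, _parm):
    return pos                         # epsilon: succeed, consume nothing

# A -> abc | y | BC | epsilon, with hypothetical B -> b and C -> c
B = Pattern(matchchar, "b")
C = Pattern(matchchar, "c")
do_success = Pattern(succeed)
try_bc = Pattern(sl_match2, B, alt=do_success, nxt=C)
try_y = Pattern(matchchar, "y", alt=try_bc)
A = Pattern(matchstr, "abc", alt=try_y)

if __name__ == "__main__":
    for s in ("abc", "y", "bc", "zzz"):
        print(repr(s), "->", run(A, s, 0))
```

Moving do_success to the front of the chain would make every call return the starting position without looking at the input, which is why the text insists that ε be processed last.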
While the technique described above will let you convert any CFG to a pattern that the Standard Library can process, it certainly does not take advantage of the Standard Library facilities, nor will it produce particularly efficient patterns. For example, consider the production:

Digits → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Converting this to a pattern using the techniques described above will yield the pattern:

Digits          pattern {matchchar, '0', try1}
try1            pattern {matchchar, '1', try2}
try2            pattern {matchchar, '2', try3}
try3            pattern {matchchar, '3', try4}
try4            pattern {matchchar, '4', try5}
try5            pattern {matchchar, '5', try6}
try6            pattern {matchchar, '6', try7}
try7            pattern {matchchar, '7', try8}
try8            pattern {matchchar, '8', try9}
try9            pattern {matchchar, '9'}

Obviously this isn’t a very good solution because we can match this same pattern with the single statement:

Digits          pattern {anycset, digits}
If your pattern is easy to specify using a regular expression, you should try to encode it using the built-in pattern matching functions and fall back on the above algorithm once you’ve handled the low level patterns as best you can. With experience, you will be able to choose an appropriate balance between the algorithm in this section and ad hoc methods you develop on your own.
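As a rough illustration of the difference, the Python sketch below contrasts a ten-way chain of single-character tests with one character-set test. The functions are hypothetical stand-ins that merely borrow the names anycset and spancset; they model "exactly one character from the set" and "zero or more characters from the set" and are not the library routines themselves.

```python
DIGITS = frozenset("0123456789")

def anycset(text, pos, cset):
    """Match exactly one character that belongs to cset."""
    return pos + 1 if pos < len(text) and text[pos] in cset else None

def spancset(text, pos, cset):
    """Match zero or more characters that belong to cset."""
    while pos < len(text) and text[pos] in cset:
        pos += 1
    return pos

def digit_by_alternatives(text, pos):
    """The ten-way chain: try '0', then '1', and so on through '9'."""
    for ch in "0123456789":
        if text.startswith(ch, pos):
            return pos + 1
    return None

if __name__ == "__main__":
    print(anycset("7up", 0, DIGITS), digit_by_alternatives("7up", 0))
    print(spancset("123abc", 0, DIGITS))
```

Both single-digit matchers accept the same strings; the set-based one simply does it with one test instead of up to ten.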
16.8 Some Sample Pattern Matching Applications

The best way to learn how to convert a pattern matching problem to the respective pattern matching algorithms is by example. The following sections provide several examples of some small pattern matching problems and their solutions.
16.8.1 Converting Written Numbers to Integers

One interesting pattern matching problem is to convert written (English) numbers to their integer equivalents. For example, take the string “one hundred ninety-two” and convert it to the integer 192. Although written numbers represent a pattern quite a bit more complex than the ones we’ve seen thus far, a little study will show that it is easy to decompose such strings.

The first thing we will need to do is enumerate the English words we will need to process written numbers. This includes the following words: zero, one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, thirty, forty, fifty, sixty, seventy, eighty, ninety, hundred, and thousand. With this set of words we can build all the values between zero and 65,535 (the values we can represent in a 16 bit integer).

Next, we’ve got to decide how to put these words together to form all the values between zero and 65,535. The first thing to note is that zero only occurs by itself; it is never part of another number. So our first production takes the form:

Number → zero | NonZero
The next thing to note is that certain values may occur in pairs, denoting addition. For example, eighty-five denotes the sum of eighty plus five. Also note that certain other pairs denote multiplication. If you have a statement like “two hundred” or “fifteen hundred” the “hundred” word says multiply the preceding value by 100. The multiplicative words, “hundred” and “thousand”, are also additive. Any value following these terms is added in to the total [9]; e.g., “one hundred five” means 1*100+5. By combining the appropriate rules, we obtain the following grammar:

NonZero   → Thousands Maybe100s | Hundreds
Thousands → Under100 thousand
Maybe100s → Hundreds | ε
Hundreds  → Under100 hundred After100 | Under100
After100  → Under100 | ε
Under100  → tens Maybe1s | teens | ones
Maybe1s   → ones | ε
ones      → one | two | three | four | five | six | seven | eight | nine
teens     → ten | eleven | twelve | thirteen | fourteen | fifteen | sixteen | seventeen | eighteen | nineteen
tens      → twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety

9. We will ignore special multiplicative forms like “one thousand thousand” (one million) because these forms are all too large to fit into 16 bits.
The final step is to add semantic actions to actually convert the strings matched by this grammar to integer values. The basic idea is to initialize an accumulator value to zero. Whenever you encounter one of the strings that ones, teens, or tens matches, you add the corresponding value to the accumulator. If you encounter the hundred or thousand strings, you multiply the accumulator by the appropriate factor. The complete program to do the conversion follows:

; Numbers.asm
;
; This program converts written English numbers in the range "zero"
; to "sixty five thousand five hundred thirty five" to the corresponding
; integer value.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                matchfuncs
                .list

dseg            segment para public 'data'

Value           word    0               ;Store results here.
HundredsVal     word    0
ThousandsVal    word    0

Str0            byte    "twenty one",0
Str1            byte    "nineteen hundred thirty-five",0
Str2            byte    "thirty three thousand two hundred nineteen",0
Str3            byte    "three",0
Str4            byte    "fourteen",0
Str5            byte    "fifty two",0
Str6            byte    "seven hundred",0
Str7            byte    "two thousand seven",0
Str8            byte    "four thousand ninety six",0
Str9            byte    "five hundred twelve",0
Str10           byte    "twenty three thousand two hundred ninety-five",0
Str11           byte    "seventy-five hundred",0
Str12           byte    "sixty-five thousand",0
Str13           byte    "one thousand",0

; The following grammar is what we use to process the numbers.
; Semantic actions appear in the braces.
;
; Note: begin by initializing Value, HundredsVal, and ThousandsVal to zero.
;
; N             -> separators zero
;               |  N4
;
; N4            -> do1000s maybe100s
;               |  do100s
;
; Maybe100s     -> do100s
;               |  <empty string>
;
; do1000s       -> Under100 "THOUSAND" separators
;                       {ThousandsVal := Value*1000}
;
; do100s        -> Under100 "HUNDRED"
;                       {HundredsVal := Value*100} After100
;               |  Under100
;
; After100      -> {Value := 0} Under100
;               |  {Value := 0} <empty string>
;
; Under100      -> {Value := 0} try20 try1s
;               |  {Value := 0} doTeens
;               |  {Value := 0} do1s
;
; try1s         -> do1s
;               |  <empty string>
;
; try20         -> "TWENTY" {Value := Value + 20}
;               |  "THIRTY" {Value := Value + 30}
;               |  ...
;               |  "NINETY" {Value := Value + 90}
;
; doTeens       -> "TEN" {Value := Value + 10}
;               |  "ELEVEN" {Value := Value + 11}
;               |  ...
;               |  "NINETEEN" {Value := Value + 19}
;
; do1s          -> "ONE" {Value := Value + 1}
;               |  "TWO" {Value := Value + 2}
;               |  ...
;               |  "NINE" {Value := Value + 9}
separators      pattern {anycset, delimiters, 0, delim2}
delim2          pattern {spancset, delimiters}
doSuccess       pattern {succeed}
AtLast          pattern {sl_match2, separators, AtEOS, AtEOS}
AtEOS           pattern {EOS}

N               pattern {sl_match2, separators, N2, N2}
N2              pattern {matchistr, zero, N3, AtLast}
zero            byte    "ZERO",0

N3              pattern {sl_match2, N4, 0, AtLast}
N4              pattern {sl_match2, do1000s, do100s, Maybe100s}
Maybe100s       pattern {sl_match2, do100s, AtLast, AtLast}

do1000s         pattern {sl_match2, Under100, 0, do1000s2}
do1000s2        pattern {matchistr, str1000, 0, do1000s3}
do1000s3        pattern {sl_match2, separators, do1000s4, do1000s5}
do1000s4        pattern {EOS, 0, 0, do1000s5}
do1000s5        pattern {Get1000s}
str1000         byte    "THOUSAND",0

do100s          pattern {sl_match2, do100s1, Under100, After100}
do100s1         pattern {sl_match2, Under100, 0, do100s2}
do100s2         pattern {matchistr, str100, 0, do100s3}
do100s3         pattern {sl_match2, separators, do100s4, do100s5}
do100s4         pattern {EOS, 0, 0, do100s5}
do100s5         pattern {Get100s}
str100          byte    "HUNDRED",0

After100        pattern {SetVal, 0, 0, After100a}
After100a       pattern {sl_match2, Under100, doSuccess}

Under100        pattern {SetVal, 0, 0, Under100a}
Under100a       pattern {sl_match2, try20, Under100b, Do1orE}
Under100b       pattern {sl_match2, doTeens, do1s}

Do1orE          pattern {sl_match2, do1s, doSuccess, 0}

NumPat          macro   lbl, next, Constant, string
                local   try, SkipSpcs, val, str, tryEOS
lbl             pattern {sl_match2, try, next}
try             pattern {matchistr, str, 0, SkipSpcs}
SkipSpcs        pattern {sl_match2, separators, tryEOS, val}
tryEOS          pattern {EOS, 0, 0, val}
val             pattern {AddVal, Constant}
str             byte    string
                byte    0
                endm

                NumPat  doTeens, try11, 10, "TEN"
                NumPat  try11, try12, 11, "ELEVEN"
                NumPat  try12, try13, 12, "TWELVE"
                NumPat  try13, try14, 13, "THIRTEEN"
                NumPat  try14, try15, 14, "FOURTEEN"
                NumPat  try15, try16, 15, "FIFTEEN"
                NumPat  try16, try17, 16, "SIXTEEN"
                NumPat  try17, try18, 17, "SEVENTEEN"
                NumPat  try18, try19, 18, "EIGHTEEN"
                NumPat  try19, 0, 19, "NINETEEN"

                NumPat  do1s, try2, 1, "ONE"
                NumPat  try2, try3, 2, "TWO"
                NumPat  try3, try4, 3, "THREE"
                NumPat  try4, try5, 4, "FOUR"
                NumPat  try5, try6, 5, "FIVE"
                NumPat  try6, try7, 6, "SIX"
                NumPat  try7, try8, 7, "SEVEN"
                NumPat  try8, try9, 8, "EIGHT"
                NumPat  try9, 0, 9, "NINE"

                NumPat  try20, try30, 20, "TWENTY"
                NumPat  try30, try40, 30, "THIRTY"
                NumPat  try40, try50, 40, "FORTY"
                NumPat  try50, try60, 50, "FIFTY"
                NumPat  try60, try70, 60, "SIXTY"
                NumPat  try70, try80, 70, "SEVENTY"
                NumPat  try80, try90, 80, "EIGHTY"
                NumPat  try90, 0, 90, "NINETY"

                include stdsets.a

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg
; Semantic actions for our grammar:
;
;
;
; Get1000s-     We've just processed the value one..nine, grab it from
;               the value variable, multiply it by 1000, and store it
;               into thousandsval.

Get1000s        proc    far
                push    ds
                push    dx
                mov     ax, dseg
                mov     ds, ax

                mov     ax, 1000
                mul     Value
                mov     ThousandsVal, ax
                mov     Value, 0

                pop     dx
                mov     ax, di          ;Required by sl_match.
                pop     ds
                stc                     ;Always return success.
                ret
Get1000s        endp

; Get100s-      We've just processed the value one..nine, grab it from
;               the value variable, multiply it by 100, and store it
;               into hundredsval.

Get100s         proc    far
                push    ds
                push    dx
                mov     ax, dseg
                mov     ds, ax

                mov     ax, 100
                mul     Value
                mov     HundredsVal, ax
                mov     Value, 0

                pop     dx
                mov     ax, di          ;Required by sl_match.
                pop     ds
                stc                     ;Always return success.
                ret
Get100s         endp

; SetVal-       This routine sets Value to whatever is in si

SetVal          proc    far
                push    ds
                mov     ax, dseg
                mov     ds, ax
                mov     Value, si
                mov     ax, di
                pop     ds
                stc
                ret
SetVal          endp

; AddVal-       This routine adds whatever is in si to Value

AddVal          proc    far
                push    ds
                mov     ax, dseg
                mov     ds, ax
                add     Value, si
                mov     ax, di
                pop     ds
                stc
                ret
AddVal          endp

; Succeed matches the empty string. In other words, it matches anything
; and always returns success without eating any characters from the input
; string.

Succeed         proc    far
                mov     ax, di
                stc
                ret
Succeed         endp
; This subroutine expects a pointer to a string containing the English
; version of an integer number. It converts this to an integer and
; prints the result.

ConvertNumber   proc    near
                mov     value, 0
                mov     HundredsVal, 0
                mov     ThousandsVal, 0

                ldxi    N
                xor     cx, cx
                match
                jnc     NoMatch
                mov     al, "'"
                putc
                puts
                print
                byte    "' = ", 0
                mov     ax, ThousandsVal
                add     ax, HundredsVal
                add     ax, Value
                putu
                putcr
                jmp     Done

NoMatch:        print
                byte    "Illegal number",cr,lf,0

Done:           ret
ConvertNumber   endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax

                meminit                 ;Init memory manager.

; Union in a "-" to the delimiters set because numbers can have
; dashes in them.

                lesi    delimiters
                mov     al, '-'
                addchar

; Some calls to test the ConvertNumber routine and the conversion process.

                lesi    Str0
                call    ConvertNumber
                lesi    Str1
                call    ConvertNumber
                lesi    Str2
                call    ConvertNumber
                lesi    Str3
                call    ConvertNumber
                lesi    Str4
                call    ConvertNumber
                lesi    Str5
                call    ConvertNumber
                lesi    Str6
                call    ConvertNumber
                lesi    Str7
                call    ConvertNumber
                lesi    Str8
                call    ConvertNumber
                lesi    Str9
                call    ConvertNumber
                lesi    Str10
                call    ConvertNumber
                lesi    Str11
                call    ConvertNumber
                lesi    Str12
                call    ConvertNumber
                lesi    Str13
                call    ConvertNumber

Quit:           ExitPgm
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

Sample output:

'twenty one' = 21
'nineteen hundred thirty-five' = 1935
'thirty three thousand two hundred nineteen' = 33219
'three' = 3
'fourteen' = 14
'fifty two' = 52
'seven hundred' = 700
'two thousand seven' = 2007
'four thousand ninety six' = 4096
'five hundred twelve' = 512
'twenty three thousand two hundred ninety-five' = 23295
'seventy-five hundred' = 7500
'sixty-five thousand' = 65000
'one thousand' = 1000
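For comparison, the same grammar can be followed by a short recursive-descent routine in a high-level language. The Python sketch below is an illustration written for this discussion, not a translation of Numbers.asm: the function names, the word tables, and the explicit rejection of totals above 65,535 are assumptions. It treats "-" as a separator, as the assembly program does by adding it to the delimiters set.

```python
ONES = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
        "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
         "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

def under100(words, i):
    """Under100 -> tens [ones] | teens | ones. Returns (value, next index) or None."""
    if i < len(words):
        w = words[i]
        if w in TENS:
            if i + 1 < len(words) and words[i + 1] in ONES:
                return TENS[w] + ONES[words[i + 1]], i + 2
            return TENS[w], i + 1
        if w in TEENS:
            return TEENS[w], i + 1
        if w in ONES:
            return ONES[w], i + 1
    return None

def hundreds(words, i):
    """Hundreds -> Under100 "hundred" [Under100] | Under100."""
    r = under100(words, i)
    if r is None:
        return None
    value, i = r
    if i < len(words) and words[i] == "hundred":
        total, i = value * 100, i + 1
        r = under100(words, i)
        if r is not None:
            total, i = total + r[0], r[1]
        return total, i
    return value, i

def words_to_int(text):
    """Return the integer for a written number, or None if it is not legal."""
    words = text.lower().replace("-", " ").split()
    if words == ["zero"]:
        return 0
    # Try "Thousands Maybe100s" first, then fall back to "Hundreds".
    r = under100(words, 0)
    if r is not None and r[1] < len(words) and words[r[1]] == "thousand":
        total, i = r[0] * 1000, r[1] + 1
        r = hundreds(words, i)
        if r is not None:
            total, i = total + r[0], r[1]
    else:
        r = hundreds(words, 0)
        if r is None:
            return None
        total, i = r
    if i != len(words) or total > 65535:   # leftover words or too big for 16 bits
        return None
    return total

if __name__ == "__main__":
    print(words_to_int("thirty three thousand two hundred nineteen"))
```

The tables play the role of the NumPat chains, and the multiplications by 100 and 1000 correspond to the Get100s and Get1000s semantic actions.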
16.8.2 Processing Dates

Another useful program that converts English text to numeric form is a date processor. A date processor takes strings like “Jan 23, 1997” and converts them to three integer values representing the month, day, and year. Of course, while we’re at it, it’s easy enough to modify the grammar for date strings to allow the input string to take any of the following common date formats:

Jan 23, 1997
January 23, 1997
23 Jan, 1997
23 January, 1997
1/23/97
1-23-97
1/23/1997
1-23-1997

In each of these cases the date processing routines should store one into the variable month, 23 into the variable day, and 1997 into the year variable (we will assume all years are in the range 1900-1999 if the string supplies only two digits for the year). Of course, we could also allow dates like “January twenty-third, nineteen hundred and ninety seven” by using a number-processing parser similar to the one presented in the previous section. However, that is an exercise left to the reader.

The grammar to process dates is

Date    → EngMon Integer Integer |
          Integer EngMon Integer |
          Integer / Integer / Integer |
          Integer - Integer - Integer
EngMon  → JAN | JANUARY | FEB | FEBRUARY | … | DEC | DECEMBER
Integer → digit Integer | digit
digit   → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
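The grammar, together with range checks on the month, day, and year, can be sketched compactly in Python. This is an illustrative approximation rather than the book's program: parse_date, month_number, and the three regular expressions are hypothetical names and choices, commas and blanks are both treated as separators, and the leap-year test is the ordinary Gregorian rule.

```python
import re

MONTHS = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
          "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"]
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

# One expression per alternative of the Date production.
_NAME_FIRST = re.compile(r"\s*([A-Za-z]+)[\s,]+(\d+)[\s,]+(\d+)\s*$")
_DAY_FIRST = re.compile(r"\s*(\d+)[\s,]+([A-Za-z]+)[\s,]+(\d+)\s*$")
_NUMERIC = re.compile(r"\s*(\d+)\s*([/-])\s*(\d+)\s*\2\s*(\d+)\s*$")

def month_number(word):
    """Accept a full month name or its three-letter abbreviation."""
    w = word.upper()
    for n, name in enumerate(MONTHS, start=1):
        if w == name or w == name[:3]:
            return n
    return None

def parse_date(text):
    """Return (month, day, year) or None if the string is not a legal date."""
    m = _NAME_FIRST.match(text)
    if m:
        month, day, year = month_number(m.group(1)), int(m.group(2)), int(m.group(3))
    else:
        m = _DAY_FIRST.match(text)
        if m:
            day, month, year = int(m.group(1)), month_number(m.group(2)), int(m.group(3))
        else:
            m = _NUMERIC.match(text)
            if not m:
                return None
            month, day, year = int(m.group(1)), int(m.group(3)), int(m.group(4))

    # Semantic checks: month 1-12, year 0-99 or 1900-1999, day within the month.
    if month is None or not 1 <= month <= 12:
        return None
    if 0 <= year <= 99:
        year += 1900
    elif not 1900 <= year <= 1999:
        return None
    limit = DAYS_IN_MONTH[month - 1]
    if month == 2 and year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
        limit = 29
    if not 1 <= day <= limit:
        return None
    return month, day, year

if __name__ == "__main__":
    for s in ("Jan 23, 1997", "23 January, 1997", "1/23/97", "Feb 29, 1990"):
        print(s, "->", parse_date(s))
```

The three regular expressions cover the four alternatives (the numeric one handles both "/" and "-" and insists that the two separators agree), and the checks after matching play the role of semantic rules.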
We will use some semantic rules to place some restrictions on these strings. For example, the grammar above allows integers of any size; however, months must fall in the range 1-12 and days must fall in the range 1-28, 1-29, 1-30, or 1-31 depending on the year and month. Years must fall in the range 0-99 or 1900-1999. Here is the 80x86 code for this grammar:

; datepat.asm
;
; This program converts dates of various formats to a three integer
; component value- month, day, and year.

                .xlist
                .286
                include         stdlib.a
                includelib      stdlib.lib
                matchfuncs
                .list
                .lall

dseg            segment para public 'data'

; The following three variables hold the result of the conversion.

month           word    0
day             word    0
year            word    0

; StrPtr is a double word value that points at the string under test.
; The output routines use this variable. It is declared as two word
; values so it is easier to store es:di into it.

strptr          word    0,0

; Value is a generic variable the ConvertInt routine uses

value           word    0

; Number of valid days in each month (Feb is handled specially)

DaysInMonth     byte    31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31

; Some sample strings to test the date conversion routines.

Str0            byte    "Feb 4, 1956",0
Str1            byte    "July 20, 1960",0
Str2            byte    "Jul 8, 1964",0
Str3            byte    "1/1/97",0
Str4            byte    "1-1-1997",0
Str5            byte    "12-25-74",0
Str6            byte    "3/28/1981",0
Str7            byte    "January 1, 1999",0
Str8            byte    "Feb 29, 1996",0
Str9            byte    "30 June, 1990",0
Str10           byte    "August 7, 1945",0
Str11           byte    "30 September, 1992",0
Str12           byte    "Feb 29, 1990",0
Str13           byte    "29 Feb, 1992",0

; The following grammar is what we use to process the dates
;
; Date ->       EngMon Integer Integer
;       |       Integer EngMon Integer
;       |       Integer "/" Integer "/" Integer
;       |       Integer "-" Integer "-" Integer
;
; EngMon->      Jan | January | Feb | February | ... | Dec | December
; Integer->     digit integer | digit
; digit->       0 | 1 | ... | 9
;
; Some semantic rules this code has to check:
;
; If the year is in the range 0-99, this code has to add 1900 to it.
; If the year is not in the range 0-99 or 1900-1999 then return an error.
; The month must be in the range 1-12, else return an error.
; The day must be between one and 28, 29, 30, or 31. The exact maximum
; day depends on the month.
separators      pattern {spancset, delimiters}

; DatePat processes dates of the form "MonInEnglish Day Year"

DatePat         pattern {sl_match2, EngMon, DatePat2, DayYear}
DayYear         pattern {sl_match2, DayInteger, 0, YearPat}
YearPat         pattern {sl_match2, YearInteger}

; DatePat2 processes dates of the form "Day MonInEng Year"

DatePat2        pattern {sl_match2, DayInteger, DatePat3, MonthYear}
MonthYear       pattern {sl_match2, EngMon, 0, YearPat}

; DatePat3 processes dates of the form "mm-dd-yy"

DatePat3        pattern {sl_match2, MonInteger, DatePat4, DatePat3a}
DatePat3a       pattern {sl_match2, separators, DatePat3b, DatePat3b}
DatePat3b       pattern {matchchar, '-', 0, DatePat3c}
DatePat3c       pattern {sl_match2, DayInteger, 0, DatePat3d}
DatePat3d       pattern {sl_match2, separators, DatePat3e, DatePat3e}
DatePat3e       pattern {matchchar, '-', 0, DatePat3f}
DatePat3f       pattern {sl_match2, YearInteger}

; DatePat4 processes dates of the form "mm/dd/yy"

DatePat4        pattern {sl_match2, MonInteger, 0, DatePat4a}
DatePat4a       pattern {sl_match2, separators, DatePat4b, DatePat4b}
DatePat4b       pattern {matchchar, '/', 0, DatePat4c}
DatePat4c       pattern {sl_match2, DayInteger, 0, DatePat4d}
DatePat4d       pattern {sl_match2, separators, DatePat4e, DatePat4e}
DatePat4e       pattern {matchchar, '/', 0, DatePat4f}
DatePat4f       pattern {sl_match2, YearInteger}

; DayInteger matches a decimal string, converts it to an integer, and
; stores the result away in the Day variable.

DayInteger      pattern {sl_match2, Integer, 0, SetDayPat}
SetDayPat       pattern {SetDay}

; MonInteger matches a decimal string, converts it to an integer, and
; stores the result away in the Month variable.

MonInteger      pattern {sl_match2, Integer, 0, SetMonPat}
SetMonPat       pattern {SetMon}
Page 943
Chapter 16 ; YearInteger matches an decimal string, converts it to an integer, and ; stores the result away in the Year variable.
YearInteger SetYearPat
; ; ; ;
pattern pattern
{sl_match2, Integer, 0, SetYearPat} {SetYear}
Integer skips any leading delimiter characters and then matches a decimal string. The Integer0 pattern matches exactly the decimal characters; the code does a patgrab on Integer0 when converting this string to an integer.
Integer Integer0 number number2 Convert2Int
pattern pattern pattern pattern pattern
{sl_match2, separators, 0, Integer0} {sl_match2, number, 0, Convert2Int} {anycset, digits, 0, number2} {spancset, digits} {ConvertInt}
; A macro to make it easy to declare each of the 24 English month
; patterns (24 because we allow the full month name and an
; abbreviation).

MoPat           macro   name, next, str, str2, value
                local   SetMo, string, full, short, string2, doMon

name            pattern {sl_match2, short, next}
short           pattern {matchistr, string2, full, SetMo}
full            pattern {matchistr, string, 0, SetMo}

string          byte    str
                byte    0

string2         byte    str2
                byte    0

SetMo           pattern {MonthVal, value}
                endm
; EngMon is a chain of patterns that match one of the strings
; JAN, JANUARY, FEB, FEBRUARY, etc. The last parameter to the
; MoPat macro is the month number.

EngMon          pattern {sl_match2, separators, jan, jan}
                MoPat   jan, feb, "JAN", "JANUARY", 1
                MoPat   feb, mar, "FEB", "FEBRUARY", 2
                MoPat   mar, apr, "MAR", "MARCH", 3
                MoPat   apr, may, "APR", "APRIL", 4
                MoPat   may, jun, "MAY", "MAY", 5
                MoPat   jun, jul, "JUN", "JUNE", 6
                MoPat   jul, aug, "JUL", "JULY", 7
                MoPat   aug, sep, "AUG", "AUGUST", 8
                MoPat   sep, oct, "SEP", "SEPTEMBER", 9
                MoPat   oct, nov, "OCT", "OCTOBER", 10
                MoPat   nov, decem, "NOV", "NOVEMBER", 11
                MoPat   decem, 0, "DEC", "DECEMBER", 12
; We use the "digits" and "delimiters" sets from the standard library.

                include stdsets.a

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg
; ConvertInt- Matches a sequence of digits and converts them to an integer.

ConvertInt      proc    far
                push    ds
                push    es
                push    di
                mov     ax, dseg
                mov     ds, ax

                lesi    Integer0        ;Integer0 contains the decimal
                patgrab                 ; string we matched, grab that
                atou                    ; string and convert it to an
                mov     Value, ax       ; integer and save the result.
                free                    ;Free mem allocated by patgrab.

                pop     di
                mov     ax, di          ;Required by sl_match.
                pop     es
                pop     ds
                stc                     ;Always succeed.
                ret
ConvertInt      endp
; SetDay, SetMon, and SetYear simply copy value to the appropriate
; variable.

SetDay          proc    far
                push    ds
                mov     ax, dseg
                mov     ds, ax
                mov     ax, value
                mov     day, ax
                mov     ax, di
                pop     ds
                stc
                ret
SetDay          endp

SetMon          proc    far
                push    ds
                mov     ax, dseg
                mov     ds, ax
                mov     ax, value
                mov     Month, ax
                mov     ax, di
                pop     ds
                stc
                ret
SetMon          endp

SetYear         proc    far
                push    ds
                mov     ax, dseg
                mov     ds, ax
                mov     ax, value
                mov     Year, ax
                mov     ax, di
                pop     ds
                stc
                ret
SetYear         endp

; MonthVal is a pattern used by the English month patterns.
; This pattern function simply copies the matchparm field to
; the month variable (the matchparm field is passed in si).

MonthVal        proc    far
                push    ds
                mov     ax, dseg
                mov     ds, ax
                mov     Month, si
                mov     ax, di
                pop     ds
                stc
                ret
MonthVal        endp
; ChkDate-      Checks a date to see if it is valid. Returns with the
;               carry flag set if it is, clear if not.

ChkDate         proc    far
                push    ds
                push    ax
                push    bx

                mov     ax, dseg
                mov     ds, ax

; If the year is in the range 0-99, add 1900 to it.
; Then check to see if it's in the range 1900-1999.

                cmp     Year, 100
                ja      Notb100
                add     Year, 1900
Notb100:        cmp     Year, 2000
                jae     BadDate
                cmp     Year, 1900
                jb      BadDate

; Okay, make sure the month is in the range 1-12

                cmp     Month, 12
                ja      BadDate
                cmp     Month, 1
                jb      BadDate

; See if the number of days is correct for all months except Feb:

                mov     bx, Month
                mov     ax, Day         ;Make sure Day <> 0.
                test    ax, ax
                je      BadDate
                cmp     ah, 0           ;Make sure Day < 256.
                jne     BadDate

                cmp     bx, 2           ;Handle Feb elsewhere.
                je      DoFeb
                cmp     al, DaysInMonth[bx-1]   ;Check against max val.
                ja      BadDate
                jmp     GoodDate

; Kludge to handle leap years. Note that 1900 is *not* a leap year.

DoFeb:          cmp     ax, 29          ;Only applies if day is
                jb      GoodDate        ; equal to 29.
                ja      BadDate         ;Error if Day > 29.
                mov     bx, Year        ;1900 is not a leap year
                cmp     bx, 1900        ; so handle that here.
                je      BadDate
                and     bx, 11b         ;Else, Year mod 4 is a
                jne     BadDate         ; leap year.

GoodDate:       pop     bx
                pop     ax
                pop     ds
                stc
                ret

BadDate:        pop     bx
                pop     ax
                pop     ds
                clc
                ret
ChkDate         endp

; ConvertDate-  ES:DI contains a pointer to a string containing a valid
;               date. This routine converts that date to the three
;               integer values found in the Month, Day, and Year
;               variables. Then it prints them to verify the pattern
;               matching routine.

ConvertDate     proc    near

                ldxi    DatePat
                xor     cx, cx
                match
                jnc     NoMatch

                mov     strptr, di      ;Save string pointer for
                mov     strptr+2, es    ; use by printf

                call    ChkDate         ;Validate the date.
                jnc     NoMatch

                printf
                byte    "%-20^s = Month: %2d Day: %2d Year: %4d\n",0
                dword   strptr, Month, Day, Year
                jmp     Done

NoMatch:        printf
                byte    "Illegal date ('%^s')",cr,lf,0
                dword   strptr

Done:           ret
ConvertDate     endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax

                meminit                 ;Init memory manager.

; Call ConvertDate to test several different date strings.

                lesi    Str0
                call    ConvertDate
                lesi    Str1
                call    ConvertDate
                lesi    Str2
                call    ConvertDate
                lesi    Str3
                call    ConvertDate
                lesi    Str4
                call    ConvertDate
                lesi    Str5
                call    ConvertDate
                lesi    Str6
                call    ConvertDate
                lesi    Str7
                call    ConvertDate
                lesi    Str8
                call    ConvertDate
                lesi    Str9
                call    ConvertDate
                lesi    Str10
                call    ConvertDate
                lesi    Str11
                call    ConvertDate
                lesi    Str12
                call    ConvertDate
                lesi    Str13
                call    ConvertDate

Quit:           ExitPgm
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
Sample Output:

Feb 4, 1956 = Month: 2 Day: 4 Year: 1956
July 20, 1960 = Month: 7 Day: 20 Year: 1960
Jul 8, 1964 = Month: 7 Day: 8 Year: 1964
1/1/97 = Month: 1 Day: 1 Year: 1997
1-1-1997 = Month: 1 Day: 1 Year: 1997
12-25-74 = Month: 12 Day: 25 Year: 1974
3/28/1981 = Month: 3 Day: 28 Year: 1981
January 1, 1999 = Month: 1 Day: 1 Year: 1999
Feb 29, 1996 = Month: 2 Day: 29 Year: 1996
30 June, 1990 = Month: 6 Day: 30 Year: 1990
August 7, 1945 = Month: 8 Day: 7 Year: 1945
30 September, 1992 = Month: 9 Day: 30 Year: 1992
Illegal date ('Feb 29, 1990')
29 Feb, 1992 = Month: 2 Day: 29 Year: 1992
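The validation rules that ChkDate applies can be easier to follow in a high-level form. The following is a minimal Python sketch of the same checks, not a translation of the listing. The function name check_date and the DAYS_IN_MONTH table are assumptions made for illustration; in particular, the listing's DaysInMonth array is declared outside the portion of the program shown here, so the table below is simply the usual non-leap-year day counts.

```python
# Days per month for a non-leap year; index 0 is January.
# Assumption: stands in for the listing's DaysInMonth table.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def check_date(month, day, year):
    """Return the normalized (month, day, year) if valid, else None."""
    # Short years are treated as 19xx, as ChkDate does (cmp Year, 100).
    if year <= 100:
        year += 1900
    # Only years 1900-1999 are accepted.
    if not 1900 <= year < 2000:
        return None
    if not 1 <= month <= 12:
        return None
    # Day must be nonzero and fit in a byte.
    if not 1 <= day <= 255:
        return None
    if month != 2:
        ok = day <= DAYS_IN_MONTH[month - 1]
        return (month, day, year) if ok else None
    # February: days 1-28 are always fine, 29 only in leap years.
    if day < 29:
        return (month, day, year)
    if day > 29:
        return None
    # 1900 is divisible by four but is not a leap year.
    if year == 1900 or year % 4 != 0:
        return None
    return (month, day, year)
```

Under these assumptions, check_date(2, 29, 1990) is rejected while check_date(2, 29, 96) is accepted as February 29, 1996, which agrees with the sample output above.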
16.8.3
Evaluating Arithmetic Expressions

Many programs (e.g., spreadsheets, interpreters, compilers, and assemblers) need to process arithmetic expressions. The following example provides a simple calculator that operates on floating point numbers. This particular program uses the 80x87 FPU chip, although it would not be too difficult to modify it so that it uses the floating point routines in the UCR Standard Library.

; ARITH2.ASM
;
; A simple floating point calculator that demonstrates the use of the
; UCR Standard Library pattern matching routines. Note that this
; program requires an FPU.

                .xlist
                .386
                .387
                option  segment:use16
                include stdlib.a
                includelib stdlib.lib
                matchfuncs
                .list

dseg            segment para public 'data'
; The following is a temporary used when converting a floating point
; string to a 64 bit real value.

CurValue        real8   0.0

; Some sample strings containing expressions to try out:

Str1            byte    "5+2*(3-1)",0
Str2            byte    "(5+2)*(7-10)",0
Str3            byte    "5",0
Str4            byte    "(6+2)/(5+1)-7e5*2/1.3e2+1.5",0
Str5            byte    "2.5*(2-(3+1)/4+1)",0
Str6            byte    "6+(-5*2)",0
Str7            byte    "6*-1",0
Str8            byte    "1.2e5/2.1e5",0
Str9            byte    "0.9999999999999999+1e-15",0
str10           byte    "2.1-1.1",0

; Grammar for simple infix -> postfix translation operation:
; Semantic rules appear in braces.
;
; E  -> FE' {print result}
; E' -> +F {fadd} E' | -F {fsub} E' | <empty string>
; F  -> TF'
; F' -> *T {fmul} F' | /T {fdiv} F' | <empty string>
; T  -> -T {fchs} | S
; S  -> <constant> {fld constant} | (E)
;
; UCR Standard Library Pattern which handles the grammar above:
; An expression consists of an "E" item followed by the end of the string:

Expression      pattern {sl_Match2,E,,EndOfString}
EndOfString     pattern {EOS}

; An "E" item consists of an "F" item optionally followed by "+" or "-"
; and another "E" item:

E               pattern {sl_Match2, F,,Eprime}
Eprime          pattern {MatchChar, '+', Eprime2, epf}
epf             pattern {sl_Match2, F,,epPlus}
epPlus          pattern {DoFadd,,,Eprime}

Eprime2         pattern {MatchChar, '-', Succeed, emf}
emf             pattern {sl_Match2, F,,epMinus}
epMinus         pattern {DoFsub,,,Eprime}

; An "F" item consists of a "T" item optionally followed by "*" or "/"
; followed by another "T" item:

F               pattern {sl_Match2, T,,Fprime}
Fprime          pattern {MatchChar, '*', Fprime2, fmf}
fmf             pattern {sl_Match2, T, 0, pMul}
pMul            pattern {DoFmul,,,Fprime}

Fprime2         pattern {MatchChar, '/', Succeed, fdf}
fdf             pattern {sl_Match2, T, 0, pDiv}
pDiv            pattern {DoFdiv, 0, 0,Fprime}

; A "T" item consists of an "S" item or a "-" followed by another "T" item:

T               pattern {MatchChar, '-', S, TT}
TT              pattern {sl_Match2, T, 0,tpn}
tpn             pattern {DoFchs}

; An "S" item is either a floating point constant or "(" followed by
; an "E" item followed by ")".
;
; The regular expression for a floating point constant is
;
;       [0-9]+ ( "." [0-9]* | ) ( ((e|E) (+|-| ) [0-9]+) | )
;
; Note: the pattern "Const" matches exactly the characters specified by
; the above regular expression. It is the pattern the calculator grabs
; when converting a string to a floating point number.

Const           pattern {sl_match2, ConstStr, 0, FLDConst}
ConstStr        pattern {sl_match2, DoDigits, 0, Const2}
Const2          pattern {matchchar, '.', Const4, Const3}
Const3          pattern {sl_match2, DoDigits, Const4, Const4}
Const4          pattern {matchchar, 'e', const5, const6}
Const5          pattern {matchchar, 'E', Succeed, const6}
Const6          pattern {matchchar, '+', const7, const8}
Const7          pattern {matchchar, '-', const8, const8}
Const8          pattern {sl_match2, DoDigits}

FldConst        pattern {PushValue}
; DoDigits handles the regular expression [0-9]+

DoDigits        pattern {Anycset, Digits, 0, SpanDigits}
SpanDigits      pattern {Spancset, Digits}

; The S production handles constants or an expression in parentheses.

S               pattern {MatchChar, '(', Const, IntE}
IntE            pattern {sl_Match2, E, 0, CloseParen}
CloseParen      pattern {MatchChar, ')'}

; The Succeed pattern always succeeds.

Succeed         pattern {DoSucceed}

; We use digits from the UCR Standard Library cset standard sets.

                include stdsets.a

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg
; DoSucceed matches the empty string. In other words, it matches anything
; and always returns success without eating any characters from the input
; string.

DoSucceed       proc    far
                mov     ax, di
                stc
                ret
DoSucceed       endp

; DoFadd - Adds the two items on the top of the FPU stack.

DoFadd          proc    far
                faddp   st(1), st
                mov     ax, di          ;Required by sl_Match
                stc                     ;Always succeed.
                ret
DoFadd          endp

; DoFsub - Subtracts the two values on the top of the FPU stack.

DoFsub          proc    far
                fsubp   st(1), st
                mov     ax, di          ;Required by sl_Match
                stc
                ret
DoFsub          endp

; DoFmul- Multiplies the two values on the FPU stack.

DoFmul          proc    far
                fmulp   st(1), st
                mov     ax, di          ;Required by sl_Match
                stc
                ret
DoFmul          endp

; DoFdiv- Divides the two values on the FPU stack.

DoFDiv          proc    far
                fdivp   st(1), st
                mov     ax, di          ;Required by sl_Match
                stc
                ret
DoFDiv          endp

; DoFchs- Negates the value on the top of the FPU stack.

DoFchs          proc    far
                fchs
                mov     ax, di          ;Required by sl_Match
                stc
                ret
DoFchs          endp

; PushValue-    We've just matched a string that corresponds to a
;               floating point constant. Convert it to a floating
;               point value and push that value onto the FPU stack.

PushValue       proc    far
                push    ds
                push    es
                pusha
                mov     ax, dseg
                mov     ds, ax

                lesi    Const           ;FP val matched by this pat.
                patgrab                 ;Get a copy of the string.
                atof                    ;Convert to real.
                free                    ;Return mem used by patgrab.
                lesi    CurValue        ;Copy floating point accumulator
                sdfpa                   ; to a local variable and then
                fld     CurValue        ; copy that value to the FPU stk.

                popa
                mov     ax, di
                pop     es
                pop     ds
                stc
                ret
PushValue       endp
; DoExp-        This routine expects a pointer to a string containing
;               an arithmetic expression in ES:DI. It evaluates the
;               given expression and prints the result.

DoExp           proc    near
                finit                   ;Be sure to do this!
                fwait

                puts                    ;Print the expression

                ldxi    Expression
                xor     cx, cx
                match
                jc      GoodVal
                printff
                byte    " is an illegal expression",cr,lf,0
                ret

GoodVal:        fstp    CurValue
                printff
                byte    " = %12.6ge\n",0
                dword   CurValue
                ret
DoExp           endp

; The main program tests the expression evaluator.

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                lesi    Str1
                call    DoExp
                lesi    Str2
                call    DoExp
                lesi    Str3
                call    DoExp
                lesi    Str4
                call    DoExp
                lesi    Str5
                call    DoExp
                lesi    Str6
                call    DoExp
                lesi    Str7
                call    DoExp
                lesi    Str8
                call    DoExp
                lesi    Str9
                call    DoExp
                lesi    Str10
                call    DoExp

Quit:           ExitPgm
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
Sample Output:

5+2*(3-1) = 9.000E+0000
(5+2)*(7-10) = -2.100E+0001
5 = 5.000E+0000
(6+2)/(5+1)-7e5*2/1.3e2+1.5 = -1.077E+0004
2.5*(2-(3+1)/4+1) = 5.000E+0000
6+(-5*2) = -4.000E+0000
6*-1 = -6.000E+0000
1.2e5/2.1e5 = 5.714E-0001
0.9999999999999999+1e-15 = 1.000E+0000
2.1-1.1 = 1.000E+0000
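The grammar in the ARITH2.ASM comments maps naturally onto a recursive descent evaluator, with one function per production. The following Python sketch illustrates that structure only. It computes values directly instead of driving an FPU stack, and the names Parser, evaluate, and CONST_RE are assumptions introduced here rather than anything from the UCR Standard Library.

```python
import re

# Floating point constant, following the listing's regular expression:
# [0-9]+ ( "." [0-9]* | ) ( ((e|E) (+|-| ) [0-9]+) | )
CONST_RE = re.compile(r"[0-9]+(?:\.[0-9]*)?(?:[eE][+-]?[0-9]+)?")


class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # Next input character, or "" at end of string.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expr(self):                      # E -> F E'
        value = self.factor()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            rhs = self.factor()
            value = value + rhs if op == "+" else value - rhs
        return value

    def factor(self):                    # F -> T F'
        value = self.term()
        while self.peek() in ("*", "/"):
            op = self.peek()
            self.pos += 1
            rhs = self.term()
            value = value * rhs if op == "*" else value / rhs
        return value

    def term(self):                      # T -> -T | S
        if self.peek() == "-":
            self.pos += 1
            return -self.term()
        return self.simple()

    def simple(self):                    # S -> constant | (E)
        if self.peek() == "(":
            self.pos += 1
            value = self.expr()
            if self.peek() != ")":
                raise ValueError("expected ')'")
            self.pos += 1
            return value
        m = CONST_RE.match(self.text, self.pos)
        if m is None:
            raise ValueError("expected a constant")
        self.pos = m.end()
        return float(m.group())


def evaluate(text):
    """Evaluate an expression; the whole string must be consumed."""
    parser = Parser(text)
    value = parser.expr()
    if parser.pos != len(text):
        raise ValueError("illegal expression")
    return value
```

The loops in expr and factor play the role of the tail-recursive E' and F' productions, so operators of equal precedence associate to the left, as they do when the pattern applies DoFadd, DoFsub, DoFmul, and DoFdiv in sequence.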
16.8.4
A Tiny Assembler

Although the UCR Standard Library pattern matching routines would probably not be appropriate for writing a full lexical analyzer or compiler, they are useful for writing small compilers/assemblers or programs where speed of compilation/assembly is of little concern. One good example is the simple nonsymbolic assembler appearing in the SIM886 [10] simulator for an earlier version of the x86 processors [11]. This “mini-assembler” accepts an x86 assembly language statement and immediately assembles it into memory. This allows SIM886 users to create simple assembly language programs within the SIM886 monitor/debugger [12]. Using the Standard Library pattern matching routines makes it very easy to implement such an assembler. The grammar for this miniassembler is

Stmt →          Grp1 reg “,” operand |
                Grp2 reg “,” reg “,” constant |
                Grp3 operand |
                goto operand |
                halt

Grp1 →          load | store | add | sub
Grp2 →          ifeq | iflt | ifgt
Grp3 →          get | put

reg →           ax | bx | cx | dx

operand →       reg | constant | [bx] | constant [bx]

constant →      hexdigit constant | hexdigit

hexdigit →      0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f
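As an illustration of the operand production, the following Python sketch classifies a single operand string. It is a hedged model, not the assembler's method: the function name classify_operand is hypothetical, the numeric mode codes (0-3 for registers, 4 through 7 for the memory and immediate forms) are the values the ASM.ASM listing stores into AsmOprnd2, and the bracketed direct form “[constant]” is included because the listing's DirectPat pattern and the sample statement “store bx, [1000]” use it even though the grammar above does not list it.

```python
import re

# Register codes as assigned by the listing's MatchReg routine.
REGISTERS = {"ax": 0, "bx": 1, "cx": 2, "dx": 3}
HEX = r"[0-9a-f]+"


def classify_operand(text):
    """Return (mode, constant) for an operand, or None if it is malformed.

    constant is None when the addressing mode carries no 16-bit value.
    """
    text = text.strip().lower()
    if text in REGISTERS:                      # reg
        return REGISTERS[text], None
    if text == "[bx]":                         # [bx]
        return 4, None
    m = re.fullmatch(rf"({HEX})\[bx\]", text)  # constant [bx]
    if m:
        return 5, int(m.group(1), 16)
    m = re.fullmatch(rf"\[({HEX})\]", text)    # [constant]
    if m:
        return 6, int(m.group(1), 16)
    if re.fullmatch(HEX, text):                # constant (immediate)
        return 7, int(text, 16)
    return None
```

The alternatives are tried in the same order as the grammar lists them, so a register name is never mistaken for a constant.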
10. SIM886 is an earlier version of SIMx86. It is also available on the Companion CD-ROM.
11. The current x86 system is written with Borland’s Delphi, using a pattern matching library written for Pascal that is very similar to the Standard Library’s pattern matching code.
12. See the lab manual for more details on SIM886.

There are some minor semantic details that the program handles (such as disallowing stores into immediate operands). The assembly code for the miniassembler follows:

; ASM.ASM
;
                .xlist
                include stdlib.a
                matchfuncs
                includelib stdlib.lib
                .list

dseg            segment para public 'data'
; Some sample statements to assemble:

Str1            byte    "load ax, 0",0
Str2            byte    "load ax, bx",0
Str3            byte    "load ax, ax",0
Str4            byte    "add ax, 15",0
Str5            byte    "sub ax, [bx]",0
Str6            byte    "store bx, [1000]",0
Str7            byte    "load bx, 2000[bx]",0
Str8            byte    "goto 3000",0
Str9            byte    "iflt ax, bx, 100",0
Str10           byte    "halt",0
Str11           byte    "This is illegal",0
Str12           byte    "load ax, store",0
Str13           byte    "store ax, 1000",0
Str14           byte    "ifeq ax, 0, 0",0

; Variables used by the assembler.

AsmConst        word    0
AsmOpcode       byte    0
AsmOprnd1       byte    0
AsmOprnd2       byte    0

                include stdsets.a       ;Bring in the standard char sets.
; Patterns for the assembler:
; Pattern is (
;       (load|store|add|sub) reg "," operand |
;       (ifeq|iflt|ifgt) reg1 "," reg2 "," const |
;       (get|put) operand |
;       goto operand |
;       halt
;       )
;
; With a few semantic additions (e.g., cannot store to a const).

InstrPat        pattern {spancset, WhiteSpace,Grp1,Grp1}

Grp1            pattern {sl_Match2,Grp1Strs, Grp2 ,Grp1Oprnds}
Grp1Strs        pattern {TryLoad,,Grp1Store}
Grp1Store       pattern {TryStore,,Grp1Add}
Grp1Add         pattern {TryAdd,,Grp1Sub}
Grp1Sub         pattern {TrySub}

; Patterns for the LOAD, STORE, ADD, and SUB instructions.

LoadPat         pattern {MatchStr,LoadInstr2}
LoadInstr2      byte    "LOAD",0

StorePat        pattern {MatchStr,StoreInstr2}
StoreInstr2     byte    "STORE",0

AddPat          pattern {MatchStr,AddInstr2}
AddInstr2       byte    "ADD",0

SubPat          pattern {MatchStr,SubInstr2}
SubInstr2       byte    "SUB",0

; Patterns for the group one (LOAD/STORE/ADD/SUB) instruction operands:

Grp1Oprnds      pattern {spancset,WhiteSpace,Grp1reg,Grp1reg}
Grp1Reg         pattern {MatchReg,AsmOprnd1,,Grp1ws2}
Grp1ws2         pattern {spancset,WhiteSpace,Grp1Comma,Grp1Comma}
Grp1Comma       pattern {MatchChar,',',0,Grp1ws3}
Grp1ws3         pattern {spancset,WhiteSpace,Grp1Op2,Grp1Op2}
Grp1Op2         pattern {MatchGen,,,EndOfLine}
EndOfLine       pattern {spancset,WhiteSpace,NullChar,NullChar}
NullChar        pattern {EOS}

Grp1Op2Reg      pattern {MatchReg,AsmOprnd2}
; Patterns for the group two instructions (IFEQ, IFLT, IFGT):

Grp2            pattern {sl_Match2,Grp2Strs, Grp3 ,Grp2Oprnds}
Grp2Strs        pattern {TryIFEQ,,Grp2IFLT}
Grp2IFLT        pattern {TryIFLT,,Grp2IFGT}
Grp2IFGT        pattern {TryIFGT}

Grp2Oprnds      pattern {spancset,WhiteSpace,Grp2reg,Grp2reg}
Grp2Reg         pattern {MatchReg,AsmOprnd1,,Grp2ws2}
Grp2ws2         pattern {spancset,WhiteSpace,Grp2Comma,Grp2Comma}
Grp2Comma       pattern {MatchChar,',',0,Grp2ws3}
Grp2ws3         pattern {spancset,WhiteSpace,Grp2Reg2,Grp2Reg2}
Grp2Reg2        pattern {MatchReg,AsmOprnd2,,Grp2ws4}
Grp2ws4         pattern {spancset,WhiteSpace,Grp2Comma2,Grp2Comma2}
Grp2Comma2      pattern {MatchChar,',',0,Grp2ws5}
Grp2ws5         pattern {spancset,WhiteSpace,Grp2Op3,Grp2Op3}
Grp2Op3         pattern {ConstPat,,,EndOfLine}

; Patterns for the IFEQ, IFLT, and IFGT instructions.

IFEQPat         pattern {MatchStr,IFEQInstr2}
IFEQInstr2      byte    "IFEQ",0

IFLTPat         pattern {MatchStr,IFLTInstr2}
IFLTInstr2      byte    "IFLT",0

IFGTPat         pattern {MatchStr,IFGTInstr2}
IFGTInstr2      byte    "IFGT",0

; Grp3 Patterns:

Grp3            pattern {sl_Match2,Grp3Strs, Grp4 ,Grp3Oprnds}
Grp3Strs        pattern {TryGet,,Grp3Put}
Grp3Put         pattern {TryPut,,Grp3GOTO}
Grp3Goto        pattern {TryGOTO}

; Patterns for the GET and PUT instructions.

GetPat          pattern {MatchStr,GetInstr2}
GetInstr2       byte    "GET",0

PutPat          pattern {MatchStr,PutInstr2}
PutInstr2       byte    "PUT",0

GOTOPat         pattern {MatchStr,GOTOInstr2}
GOTOInstr2      byte    "GOTO",0

; Patterns for the group three (PUT/GET/GOTO) instruction operands:

Grp3Oprnds      pattern {spancset,WhiteSpace,Grp3Op,Grp3Op}
Grp3Op          pattern {MatchGen,,,EndOfLine}

; Patterns for the group four instruction (HALT).

Grp4            pattern {TryHalt,,,EndOfLine}

HaltPat         pattern {MatchStr,HaltInstr2}
HaltInstr2      byte    "HALT",0

; Patterns to match the four non-register addressing modes:

BXIndrctPat     pattern {MatchStr,BXIndrctStr}
BXIndrctStr     byte    "[BX]",0
BXIndexedPat    pattern {ConstPat,,,BXIndrctPat}

DirectPat       pattern {MatchChar,'[',,DP2}
DP2             pattern {ConstPat,,,DP3}
DP3             pattern {MatchChar,']'}

ImmediatePat    pattern {ConstPat}

; Pattern to match a hex constant:

HexConstPat     pattern {Spancset, xdigits}

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg
; The store macro tweaks the DS register and stores into the
; specified variable in DSEG.

store           macro   Where, What
                push    ds
                push    ax
                mov     ax, seg Where
                mov     ds, ax
                mov     Where, What
                pop     ax
                pop     ds
                endm

; Pattern matching routines for the assembler.
; Each mnemonic has its own corresponding matching function that
; attempts to match the mnemonic. If it does, it initializes the
; AsmOpcode variable with the base opcode of the instruction.
; Compare against the "LOAD" string.

TryLoad         proc    far
                push    dx
                push    si
                ldxi    LoadPat
                match2
                jnc     NoTLMatch

                store   AsmOpcode, 0    ;Initialize base opcode.

NoTLMatch:      pop     si
                pop     dx
                ret
TryLoad         endp

; Compare against the "STORE" string.

TryStore        proc    far
                push    dx
                push    si
                ldxi    StorePat
                match2
                jnc     NoTSMatch
                store   AsmOpcode, 1    ;Initialize base opcode.
NoTSMatch:      pop     si
                pop     dx
                ret
TryStore        endp

; Compare against the "ADD" string.

TryAdd          proc    far
                push    dx
                push    si
                ldxi    AddPat
                match2
                jnc     NoTAMatch
                store   AsmOpcode, 2    ;Initialize ADD opcode.
NoTAMatch:      pop     si
                pop     dx
                ret
TryAdd          endp

; Compare against the "SUB" string.

TrySub          proc    far
                push    dx
                push    si
                ldxi    SubPat
                match2
                jnc     NoTMMatch
                store   AsmOpcode, 3    ;Initialize SUB opcode.
NoTMMatch:      pop     si
                pop     dx
                ret
TrySub          endp

; Compare against the "IFEQ" string.

TryIFEQ         proc    far
                push    dx
                push    si
                ldxi    IFEQPat
                match2
                jnc     NoIEMatch
                store   AsmOpcode, 4    ;Initialize IFEQ opcode.
NoIEMatch:      pop     si
                pop     dx
                ret
TryIFEQ         endp

; Compare against the "IFLT" string.

TryIFLT         proc    far
                push    dx
                push    si
                ldxi    IFLTPat
                match2
                jnc     NoILMatch
                store   AsmOpcode, 5    ;Initialize IFLT opcode.
NoILMatch:      pop     si
                pop     dx
                ret
TryIFLT         endp

; Compare against the "IFGT" string.

TryIFGT         proc    far
                push    dx
                push    si
                ldxi    IFGTPat
                match2
                jnc     NoIGMatch
                store   AsmOpcode, 6    ;Initialize IFGT opcode.
NoIGMatch:      pop     si
                pop     dx
                ret
TryIFGT         endp

; Compare against the "GET" string.

TryGET          proc    far
                push    dx
                push    si
                ldxi    GetPat
                match2
                jnc     NoGMatch
                store   AsmOpcode, 7    ;Initialize Special opcode.
                store   AsmOprnd1, 2    ;GET's Special opcode.
NoGMatch:       pop     si
                pop     dx
                ret
TryGET          endp

; Compare against the "PUT" string.

TryPut          proc    far
                push    dx
                push    si
                ldxi    PutPat
                match2
                jnc     NoPMatch
                store   AsmOpcode, 7    ;Initialize Special opcode.
                store   AsmOprnd1, 3    ;PUT's Special opcode.
NoPMatch:       pop     si
                pop     dx
                ret
TryPUT          endp

; Compare against the "GOTO" string.

TryGOTO         proc    far
                push    dx
                push    si
                ldxi    GOTOPat
                match2
                jnc     NoGMatch
                store   AsmOpcode, 7    ;Initialize Special opcode.
                store   AsmOprnd1, 1    ;GOTO's Special opcode.
NoGMatch:       pop     si
                pop     dx
                ret
TryGOTO         endp

; Compare against the "HALT" string.

TryHalt         proc    far
                push    dx
                push    si
                ldxi    HaltPat
                match2
                jnc     NoHMatch
                store   AsmOpcode, 7    ;Initialize Special opcode.
                store   AsmOprnd1, 0    ;Halt's special opcode.
                store   AsmOprnd2, 0
NoHMatch:       pop     si
                pop     dx
                ret
TryHALT         endp
; MatchReg checks to see if we've got a valid register value. On entry,
; DS:SI points at the location to store the byte opcode (0, 1, 2, or 3) for
; a reasonable register (AX, BX, CX, or DX); ES:DI points at the string
; containing (hopefully) the register operand, and CX points at the last
; location plus one we can check in the string.
;
; On return, Carry=1 for success, 0 for failure. ES:AX must point beyond
; the characters which make up the register if we have a match.

MatchReg        proc    far

; ES:DI Points at two characters which should be AX/BX/CX/DX. Anything
; else is an error.

                cmp     byte ptr es:1[di], 'X'  ;Everyone needs this
                jne     BadReg
                xor     ax, ax                  ;886 "AX" reg code.
                cmp     byte ptr es:[di], 'A'   ;AX?
                je      GoodReg
                inc     ax
                cmp     byte ptr es:[di], 'B'   ;BX?
                je      GoodReg
                inc     ax
                cmp     byte ptr es:[di], 'C'   ;CX?
                je      GoodReg
                inc     ax
                cmp     byte ptr es:[di], 'D'   ;DX?
                je      GoodReg
BadReg:         clc
                mov     ax, di
                ret

GoodReg:        mov     ds:[si], al     ;Save register opcode.
                lea     ax, 2[di]       ;Skip past register.
                cmp     ax, cx          ;Be sure we didn't go
                ja      BadReg          ; too far.
                stc
                ret
MatchReg        endp

; MatchGen-     Matches a general addressing mode. Stuffs the appropriate
;               addressing mode code into AsmOprnd2. If a 16-bit constant
;               is required by this addressing mode, this code shoves
;               that into the AsmConst variable.

MatchGen        proc    far
                push    dx
                push    si

; Try a register operand.

                ldxi    Grp1Op2Reg
                match2
                jc      MGDone

; Try "[bx]".

                ldxi    BXIndrctPat
                match2
                jnc     TryBXIndexed
                store   AsmOprnd2, 4
                jmp     MGDone

; Look for an operand of the form "xxxx[bx]".

TryBXIndexed:
                ldxi    BXIndexedPat
                match2
                jnc     TryDirect
                store   AsmOprnd2, 5
                jmp     MGDone

; Try a direct address operand "[xxxx]".

TryDirect:
                ldxi    DirectPat
                match2
                jnc     TryImmediate
                store   AsmOprnd2, 6
                jmp     MGDone

; Look for an immediate operand "xxxx".

TryImmediate:
                ldxi    ImmediatePat
                match2
                jnc     MGDone
                store   AsmOprnd2, 7

MGDone:         pop     si
                pop     dx
                ret
MatchGen        endp
; ConstPat-     Matches a 16-bit hex constant. If it matches, it converts
;               the string to an integer and stores it into AsmConst.

ConstPat        proc    far
                push    dx
                push    si
                ldxi    HexConstPat
                match2
                jnc     CPDone

                push    ds
                push    ax
                mov     ax, seg AsmConst
                mov     ds, ax
                atoh
                mov     AsmConst, ax
                pop     ax
                pop     ds
                stc

CPDone:         pop     si
                pop     dx
                ret
ConstPat        endp

; Assemble-     This code assembles the instruction that ES:DI points
;               at and displays the hex opcode(s) for that instruction.

Assemble        proc    near

; Print out the instruction we're about to assemble.

                print
                byte    "Assembling: ",0
                strupr
                puts
                putcr

; Assemble the instruction:

                ldxi    InstrPat
                xor     cx, cx
                match
                jnc     SyntaxError

; Quick check for illegal instructions:

                cmp     AsmOpcode, 7    ;Special/Get instr.
                jne     TryStoreInstr
                cmp     AsmOprnd1, 2    ;GET opcode
                je      SeeIfImm
                cmp     AsmOprnd1, 1    ;Goto opcode
                je      IsGOTO

TryStoreInstr:  cmp     AsmOpcode, 1    ;Store Instruction
                jne     InstrOkay

SeeIfImm:       cmp     AsmOprnd2, 7    ;Immediate Adrs Mode
                jne     InstrOkay
                print
                db      "Syntax error: store/get immediate not allowed."
                db      " Try Again",cr,lf,0
                jmp     ASMDone

IsGOTO:         cmp     AsmOprnd2, 7    ;Immediate mode for GOTO
                je      InstrOkay
                print
                db      "Syntax error: GOTO only allows immediate "
                byte    "mode.",cr,lf
                db      0
                jmp     ASMDone

; Merge the opcode and operand fields together in the instruction byte,
; then output the opcode byte.

InstrOkay:      mov     al, AsmOpcode
                shl     al, 1
                shl     al, 1
                or      al, AsmOprnd1
                shl     al, 1
                shl     al, 1
                shl     al, 1
                or      al, AsmOprnd2
                puth
                cmp     AsmOpcode, 4    ;IFEQ instruction
                jb      SimpleInstr
                cmp     AsmOpcode, 6    ;IFGT instruction
                jbe     PutConstant
SimpleInstr:    cmp     AsmOprnd2, 5
                jb      ASMDone

; If this instruction has a 16 bit operand, output it here.

PutConstant:    mov     al, ' '
                putc
                mov     ax, ASMConst
                puth
                mov     al, ' '
                putc
                xchg    al, ah
                puth
                jmp     ASMDone

SyntaxError:    print
                db      "Syntax error in instruction."
                db      cr,lf,0

ASMDone:        putcr
                ret
Assemble        endp
; Main program that tests the assembler.

Main            proc
                mov     ax, seg dseg    ;Set up the segment registers
                mov     ds, ax
                mov     es, ax

                meminit

                lesi    Str1
                call    Assemble
                lesi    Str2
                call    Assemble
                lesi    Str3
                call    Assemble
                lesi    Str4
                call    Assemble
                lesi    Str5
                call    Assemble
                lesi    Str6
                call    Assemble
                lesi    Str7
                call    Assemble
                lesi    Str8
                call    Assemble
                lesi    Str9
                call    Assemble
                lesi    Str10
                call    Assemble
                lesi    Str11
                call    Assemble
                lesi    Str12
                call    Assemble
                lesi    Str13
                call    Assemble
                lesi    Str14
                call    Assemble

Quit:           ExitPgm
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             db      256 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
Sample Output:

Assembling: LOAD AX, 0
07 00 00
Assembling: LOAD AX, BX
01
Assembling: LOAD AX, AX
00
Assembling: ADD AX, 15
47 15 00
Assembling: SUB AX, [BX]
64
Assembling: STORE BX, [1000]
2E 00 10
Assembling: LOAD BX, 2000[BX]
0D 00 20
Assembling: GOTO 3000
EF 00 30
Assembling: IFLT AX, BX, 100
A1 00 01
Assembling: HALT
E0
Assembling: THIS IS ILLEGAL
Syntax error in instruction.
Assembling: LOAD AX, STORE
Syntax error in instruction.
Assembling: STORE AX, 1000
Syntax error: store/get immediate not allowed. Try Again
Assembling: IFEQ AX, 0, 0
Syntax error in instruction.
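The way Assemble packs its three fields into the instruction byte, and decides whether a 16-bit constant follows, can be summarized in a few lines. The Python sketch below is an illustrative model of that step only; the function name encode and its parameter names are assumptions, while the field values correspond to AsmOpcode, AsmOprnd1, AsmOprnd2, and AsmConst in the listing.

```python
def encode(opcode, oprnd1, oprnd2, constant=0):
    """Return the assembled bytes as space-separated uppercase hex text."""
    # Three-bit opcode, two-bit first operand, three-bit second operand.
    first = (opcode << 5) | (oprnd1 << 3) | oprnd2
    out = [f"{first:02X}"]
    # IFEQ/IFLT/IFGT (opcodes 4-6) always carry a 16-bit constant; other
    # instructions carry one only for addressing modes 5, 6, and 7.
    if 4 <= opcode <= 6 or oprnd2 >= 5:
        out.append(f"{constant & 0xFF:02X}")          # low byte first
        out.append(f"{(constant >> 8) & 0xFF:02X}")   # then high byte
    return " ".join(out)
```

For example, STORE BX, [1000] has opcode 1, register code 1, and addressing mode 6, so encode(1, 1, 6, 0x1000) yields "2E 00 10", matching the sample output.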
16.8.5
The “MADVENTURE” Game

Computer games are a perfect example of programs that often use pattern matching. One class of computer games in particular, the adventure game [13], relies on it heavily. An adventure style game accepts English-like commands from the user, parses these commands, and acts upon them. In this section we will develop an adventure game shell. That is, it will be a reasonably functional adventure style game, capable of accepting and processing user commands. All you need do is supply a story line and a few additional details to develop a fully functioning adventure class game.

An adventure game usually consists of some sort of maze through which the player moves. The program processes commands like go north or go right to move the player through the maze. Each move can deposit the player in a new room of the game. Generally, each room or area contains objects the player can interact with. These could be reward objects such as items of value, or antagonistic objects like a monster or enemy player.

Usually, an adventure game is a puzzle of some sort. The player finds clues and picks up useful objects in one part of the maze to solve problems in other parts of the maze. For example, a player could pick up a key in one room that opens a chest in another; then the player could find an object in the chest that is useful elsewhere in the maze. The purpose of the game is to solve all the interlocking puzzles and maximize one’s score (however that is done). This text will not dwell upon the subtleties of game design; that is a subject for a different text. Instead, we’ll look at the tools and data structures required to implement the game design.

The Madventure game’s use of pattern matching is quite different from the previous examples appearing in this chapter. In the examples up to this point, the matching routines specifically checked the validity of an input string; Madventure does not do this. Instead, it uses the pattern matching routines simply to determine whether certain key words appear on a line input by the user. The program itself handles the actual parsing (determining if the command is syntactically correct). To understand how the Madventure game does this, it helps to look at how to play the game [14].

Madventure prompts the user to enter a command. Unlike the original adventure game, which required commands like “GO NORTH” (with no characters other than spaces as part of the command), Madventure allows you to write whole sentences and then attempts to pick out the key words from those sentences. For example, Madventure accepts the “GO NORTH” command; however, it also accepts commands like “North is the direction I want to go” and “I want to go in the north direction.” Madventure doesn’t really care as long as it can find “GO” and “NORTH” somewhere on the command line. This is a little more flexible than the original Adventure game structure. Of course, this scheme isn’t infallible; it will treat commands like “I absolutely, positively, do NOT want to go anywhere near the north direction” as a “GO NORTH” command. Oh well, the user almost always types just “GO NORTH” anyway.

13. These are called adventure games because the original program of the genre was called “Adventure.”
14. One word of caution: no one is going to claim that Madventure is a great game. If it were, it would be sold; it wouldn’t appear in this text! So don’t expect too much from the design of the game itself.
A Madventure command usually consists of a noun keyword and a verb keyword. Madventure recognizes six verbs and fourteen nouns [15]. The verbs are

verbs →         go | get | drop | inventory | quit | help

The nouns are

nouns →         north | south | east | west | lime | beer | card | sign | program | homework | money | form | coupon

Obviously, Madventure does not allow all combinations of verbs and nouns. Indeed, the following patterns are the only legal ones:

LegalCmds →     go direction | get item | drop item | inventory | quit | help

direction →     north | south | east | west

item →          lime | beer | card | sign | program | homework | money | form | coupon

However, the pattern does not enforce this grammar. It just locates a noun and a verb on the line and, if found, sets the noun and verb variables to appropriate values to denote the keywords it finds. By letting the main program handle the parsing, the program is somewhat more flexible. There are two main patterns in the Madventure program: NounPat and VerbPat. These patterns match words (nouns or verbs) using a regular expression like the following:

(ARB* ‘ ‘ | ε) word (‘ ‘ | EOS)

This regular expression matches a word that appears at the beginning of a sentence, at the end of a sentence, anywhere in the middle of a sentence, or a sentence consisting of a single word. Madventure uses a macro (MatchNoun or MatchVerb) to create an expression for each noun and verb in the above expression. To get an idea of how Madventure processes words, consider the following VerbPat pattern:

VerbPat         pattern {sl_match2, MatchGo}
                MatchVerb MatchGO, MatchGet, "GO", 1
                MatchVerb MatchGet, MatchDrop, "GET", 2
                MatchVerb MatchDrop, MatchInv, "DROP", 3
                MatchVerb MatchInv, MatchQuit, "INVENTORY", 4
                MatchVerb MatchQuit, MatchHelp, "QUIT", 5
                MatchVerb MatchHelp, 0, "HELP", 6
The MatchVerb macro expects four parameters. The first is an arbitrary pattern name; the second is a link to the next pattern in the list; the third is the string to match, and the fourth is a number that the matching routines will store into the verb variable if that string matches (by default, the verb variable contains zero). It is very easy to add new verbs to this list. For example, if you wanted to allow “run” and “walk” as synonyms for the “go” verb, you would just add two patterns to this list:

VerbPat         pattern {sl_match2, MatchGo}
                MatchVerb MatchGO, MatchGet, "GO", 1
                MatchVerb MatchGet, MatchDrop, "GET", 2
                MatchVerb MatchDrop, MatchInv, "DROP", 3
                MatchVerb MatchInv, MatchQuit, "INVENTORY", 4
                MatchVerb MatchQuit, MatchHelp, "QUIT", 5
                MatchVerb MatchHelp, MatchRun, "HELP", 6
                MatchVerb MatchRun, MatchWalk, "RUN", 1
                MatchVerb MatchWalk, 0, "WALK", 1
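To make the keyword search concrete, here is a small Python model of the verb list, using the regular expression (ARB* ‘ ‘ | ε) word (‘ ‘ | EOS) for each entry. It is a sketch under stated assumptions rather than a description of how the UCR Standard Library implements matching: the names VERBS, keyword_regex, and find_verb are hypothetical, and converting the input to upper case is assumed here because the keyword strings in the list are upper case.

```python
import re

# (keyword, value stored in the verb variable), in list order.
# RUN and WALK reuse GO's value, so they act as synonyms.
VERBS = [("GO", 1), ("GET", 2), ("DROP", 3),
         ("INVENTORY", 4), ("QUIT", 5), ("HELP", 6),
         ("RUN", 1), ("WALK", 1)]


def keyword_regex(word):
    # (ARB* ' ' | empty) word (' ' | EOS)
    return re.compile(r"(?:.* )?" + re.escape(word) + r"(?: |$)")


def find_verb(line):
    """Return the value of the first verb in VERBS found on the line.

    Returns 0 (the default verb value) when no verb is present.
    """
    line = line.upper()  # assumption: input is case-folded before matching
    for word, value in VERBS:
        if keyword_regex(word).match(line):
            return value
    return 0
```

Because the entries are tried in list order, a line containing several verbs yields the one that comes first in the list, which is not necessarily the one that comes first on the line: find_verb("Let's get the money and go north") returns 1 (GO), not 2 (GET).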
There are only two things to consider when adding new verbs: first, don’t forget that the next field of the last verb should contain zero; second, the current version of Madventure only allows up to seven verbs. If you want to add more you will need to make a slight modification to the main program (more on that later). Of course, if you only want to create synonyms, as we’ve done here, you simply reuse existing verb values so there is no need to modify the main program.

15. However, one beautiful thing about Madventure is that it is very easy to extend and add more nouns and verbs.

When you call the match routine and pass it the address of the VerbPat pattern, it scans through the input string looking for the first verb. If it finds that verb (“GO”) it sets the verb variable to the corresponding verb value at the end of the pattern. If match cannot find the first verb, it tries the second. If that fails, it tries the third, and so on. If match cannot find any of the verbs in the input string, it does not modify the verb variable (which contains zero). If there are two or more of the above verbs on the input line, match will locate the first verb in the verb list above. This may not be the first verb appearing on the line. For example, if you say “Let’s get the money and go north” the match routine will match the “go” verb, not the “get” verb. By the same token, the NounPat pattern would match the north noun, not the money noun. So this command would be identical to “GO NORTH.”

The MatchNoun macro is almost identical to the MatchVerb macro; there is, however, one difference: the MatchNoun macro has an extra parameter which is the name of the data structure representing the given object (if there is one). Basically, all the nouns (in this version of Madventure) except NORTH, SOUTH, EAST, and WEST have some sort of data structure associated with them.

The maze in Madventure consists of nine rooms defined by the data structure:

Room            struct
north           word    ?
south           word    ?
west            word    ?
east            word    ?
ItemList        word    MaxWeight dup (?)
Description     word    ?
Room            ends
The north, south, west, and east fields contain near pointers to other rooms. The program uses the CurRoom variable to keep track of the player’s current position in the maze. When the player issues a “GO” command of some sort, Madventure copies the appropriate value from the north, south, west, or east field to the CurRoom variable, effectively changing the room the user is in. If one of these pointers is NULL, then the user cannot move in that direction. The direction pointers are independent of one another. If you issue the command “GO NORTH” and then issue the command “GO SOUTH” upon arriving in the new room, there is no guarantee that you will wind up in the original room. The south field of the second room may not point at the room that led you there. Indeed, there are several cases in the Madventure game where this occurs. The ItemList array contains a list of near pointers to objects that could be in the room. In the current version of this game, the objects are all the nouns except north, south, east, and west. The player can carry these objects from room to room (indeed, that is the major purpose of this game). Up to MaxWeight objects can appear in the room (MaxWeight is an assembly time constant that is currently four; so there are a maximum of four items in any one given room). If an entry in the ItemList is non-NULL, then it is a pointer to an Item object. There may be zero to MaxWeight objects in a room. The Description field contains a pointer to a zero terminated string that describes the room. The program prints this string each time through the command loop to keep the player oriented. The second major data type in Madventure is the Item structure. This structure takes the form:
Item            struct
Value           word    ?
Weight          word    ?
Key             word    ?
ShortDesc       word    ?
LongDesc        word    ?
WinDesc         word    ?
Item            ends
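Before turning to the Item fields, here is a small Python model of the Room direction links described above. It is a sketch only: the dictionary layout and the go function are hypothetical, None plays the role of NULL, and the three rooms shown copy their links from the maze data in the program listing later in this section.

```python
# Direction links for three rooms, copied from the maze data in MADVENT.ASM
# (field order there is north, south, west, east). None stands in for NULL.
ROOMS = {
    "Room1": {"north": "Room1", "south": "Room5", "west": "Room4", "east": "Room2"},
    "Room2": {"north": None,    "south": "Room5", "west": "Room1", "east": "Room3"},
    "Room5": {"north": "Room1", "south": "Room9", "west": "Room7", "east": "Room2"},
}

def go(cur_room, direction, rooms=ROOMS):
    """Return the new current room, or the old one if the link is NULL."""
    target = rooms[cur_room][direction]
    return cur_room if target is None else target
```

Going south from Room2 leads to Room5, but going north from Room5 leads to Room1, not back to Room2. This is the one-way behavior the text warns about.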
The Value field contains an integer value awarded to the player when the player drops this object in the appropriate room. This is how the user scores points. The Weight field usually contains one or two and determines how much this object “weighs.” The user can only carry around MaxWeight units of weight at any one given time. Each time the user picks up an object, the weight of that object is added to the user’s total weight. When the user drops an object, Madventure subtracts the object’s weight from the total. The Key field contains a pointer to a room associated with the object. When the user drops the object in the Key room, the user is awarded the points in the Value field and the object disappears from the game. If the user drops the object in some other room, the object stays in that room until the user picks it up again. The ShortDesc, LongDesc, and WinDesc fields contain pointers to zero terminated strings. Madventure prints the ShortDesc string in response to an INVENTORY command. It prints the LongDesc string when describing a room’s contents. It prints the WinDesc string when the user drops the object in its Key room and the object disappears from the game. The Madventure main program is deceptively simple. Most of the logic is hidden in the pattern matching routines and in the parsing routine. We’ve already discussed the pattern matching code; the only important thing to remember is that it initializes the noun and verb variables with a value uniquely identifying each noun and verb. The main program’s logic uses these two values as an index into a two dimensional table that takes the following form:
Table 65: Madventure Noun/Verb Table

            No Verb   GO         GET        DROP        Inventory   Quit   Help
No Noun                                                 Inventory   Quit   Help
North                 Do North
South                 Do South
East                  Do East
West                  Do West
Lime                             Get Item   Drop Item
Beer                             Get Item   Drop Item
Card                             Get Item   Drop Item
Sign                             Get Item   Drop Item
Program                          Get Item   Drop Item
Homework                         Get Item   Drop Item
Money                            Get Item   Drop Item
Form                             Get Item   Drop Item
Coupon                           Get Item   Drop Item
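The table lookup can be sketched in Python as follows. This is a model under stated assumptions, not the program's code: the handler names are strings standing in for code addresses (GoNorth, GetItem, DropItem, DoInventory, QuitGame, DoHelp, and Bad are borrowed from the program listing), build_table and handler_for are hypothetical helpers, and each row is padded to eight columns so that a three-bit shift can replace a multiplication.

```python
NOUNS = ["(none)", "NORTH", "SOUTH", "EAST", "WEST", "LIME", "BEER", "CARD",
         "SIGN", "PROGRAM", "HOMEWORK", "MONEY", "FORM", "COUPON"]
ROW_SHIFT = 3  # eight columns per row: seven verb values plus one unused

def build_table():
    """Build the flat noun-by-verb table; strings stand in for code addresses."""
    table = ["Bad"] * (len(NOUNS) << ROW_SHIFT)
    # Row 0 (no noun): only INVENTORY (4), QUIT (5), and HELP (6) are legal.
    table[4], table[5], table[6] = "DoInventory", "QuitGame", "DoHelp"
    for noun in range(1, 5):            # NORTH..WEST accept only GO (1)
        table[(noun << ROW_SHIFT) + 1] = "Go" + NOUNS[noun].capitalize()
    for noun in range(5, len(NOUNS)):   # objects accept GET (2) and DROP (3)
        table[(noun << ROW_SHIFT) + 2] = "GetItem"
        table[(noun << ROW_SHIFT) + 3] = "DropItem"
    return table

def handler_for(noun, verb, table):
    """Index the table as noun * 8 + verb, using a shift for the multiply."""
    return table[(noun << ROW_SHIFT) + verb]
```

Every empty cell of the table becomes the Bad handler, so an illegal combination such as "go coupon" (noun 13, verb 1) needs no special-case test.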
The empty entries in this table correspond to illegal commands. The other entries are addresses of code within the main program that handles the given command. To add more nouns (objects) to the game, you need only extend the NounPat pattern and add additional rows to the table (of course, you may need to add code to handle the new objects if they are not easily handled by the routines above). To add new verbs you need only extend the VerbPat pattern and add new columns to this table (see footnote 16). Other than the goodies mentioned above, the rest of the program utilizes techniques appearing throughout this and previous chapters. The only really surprising thing about this program is that you can implement a fairly complex program with so few lines of code. But such is the advantage of using pattern matching techniques in your assembly language programs.

16. Currently, the Madventure program computes the index into this table (a 14x8 table) by shifting to the left three bits rather than multiplying by eight. You will need to modify this code if you add more columns to the table.

; MADVENT.ASM
;
; This is a "shell" of an adventure game that you can use to create
; your own adventure style games.

                .xlist
                .286
                include         stdlib.a
                includelib      stdlib.lib
                matchfuncs
                .list

dseg            segment para public 'data'

; Equates:

NULL            equ     0
MaxWeight       equ     4       ;Max weight user can carry at one time.

; The "ROOM" data structure defines a room, or area, where a player can
; go. The NORTH, SOUTH, EAST, and WEST fields contain the address of
; the rooms to the north, south, east, and west of the room. The game
; transfers control to the room whose address appears in these fields
; when the player supplies a GO NORTH, GO SOUTH, etc., command.
;
; The ITEMLIST field contains a list of pointers to objects appearing
; in this room. In this game, the user can pick up and drop these
; objects (if there are any present).
;
; The DESCRIPTION field contains a (near) address of a short description
; of the current room/area.

Room            struct
north           word    ?       ;Near pointers to other structures where
south           word    ?       ; we will wind up on the GO NORTH, GO SOUTH,
west            word    ?       ; etc., commands.
east            word    ?

ItemList        word    MaxWeight dup (?)

Description     word    ?       ;Description of room.
Room            ends

; The ITEM data structure describes the objects that may appear
; within a room (in the ITEMLIST above). The VALUE field contains
; the number of points this object is worth if the user drops it
; off in the proper room (i.e., solves the puzzle). The WEIGHT
; field provides the weight of this object. The user can only
; carry four units of weight at a time. This field is usually
; one, but may be more for larger objects. The KEY field is the
; address of the room where this object must be dropped to solve
; the problem. The SHORTDESC field is a pointer to a string that
; the program prints when the user executes an INVENTORY command.
; LONGDESC is a pointer to a string the program prints when
; describing the contents of a room. The WINDESC field is a pointer
; to a string that the program prints when the user solves the
; appropriate puzzle.

Item            struct
Value           word    ?
Weight          word    ?
Key             word    ?
ShortDesc       word    ?
LongDesc        word    ?
WinDesc         word    ?
Item            ends

; State variables for the player:

CurRoom         word    Room1                   ;Room the player is in.
ItemsOnHand     word    MaxWeight dup (?)       ;Items the player carries.
CurWeight       word    0                       ;Weight of items carried.
CurScore        word    15                      ;Player's current score.
TotalCounter    word    9                       ;Items left to place.
Noun            word    0                       ;Current noun value.
Verb            word    0                       ;Current verb value.
NounPtr         word    0                       ;Ptr to current noun item.

; Input buffer for commands

InputLine       byte    128 dup (?)

; The following macros generate a pattern which will match a single word
; which appears anywhere on a line. In particular, they match a word
; at the beginning of a line, somewhere in the middle of the line, or
; at the end of a line. This program defines a word as any sequence
; of characters surrounded by spaces or the beginning or end of a line.
;
; MatchNoun/Verb matches lines defined by the regular expression:
;
;       (ARB* ' ' | ε) string (' ' | EOS)

MatchNoun       macro   Name, next, WordString, ItemVal, ItemPtr
                local   WS1, WS2, WS3, WS4
                local   WS5, WS6, WordStr

Name            Pattern {sl_match2, WS1, next}
WS1             Pattern {MatchStr, WordStr, WS2, WS5}
WS2             Pattern {arb,0,0,WS3}
WS3             Pattern {Matchchar, ' ',0, WS4}
WS4             Pattern {MatchStr, WordStr, 0, WS5}
WS5             Pattern {SetNoun,ItemVal,0,WS6}
WS6             Pattern {SetPtr, ItemPtr,0,MatchEOS}
WordStr         byte    WordString
                byte    0
                endm

MatchVerb       macro   Name, next, WordString, ItemVal
                local   WS1, WS2, WS3, WS4
                local   WS5, WordStr

Name            Pattern {sl_match2, WS1, next}
WS1             Pattern {MatchStr, WordStr, WS2, WS5}
WS2             Pattern {arb,0,0,WS3}
WS3             Pattern {Matchchar, ' ',0, WS4}
WS4             Pattern {MatchStr, WordStr, 0, WS5}
WS5             Pattern {SetVerb,ItemVal,0,MatchEOS}
WordStr         byte    WordString
                byte    0
                endm

; Generic patterns which most of the patterns use:

MatchEOS        Pattern {EOS,0,MatchSpc}
MatchSpc        Pattern {MatchChar,' '}

; Here is the list of nouns allowed in this program.

NounPat         pattern         {sl_match2, MatchNorth}

                MatchNoun       MatchNorth, MatchSouth, "NORTH", 1, 0
                MatchNoun       MatchSouth, MatchEast, "SOUTH", 2, 0
                MatchNoun       MatchEast, MatchWest, "EAST", 3, 0
                MatchNoun       MatchWest, MatchLime, "WEST", 4, 0
                MatchNoun       MatchLime, MatchBeer, "LIME", 5, Item3
                MatchNoun       MatchBeer, MatchCard, "BEER", 6, Item9
                MatchNoun       MatchCard, MatchSign, "CARD", 7, Item2
                MatchNoun       MatchSign, MatchPgm, "SIGN", 8, Item1
                MatchNoun       MatchPgm, MatchHW, "PROGRAM", 9, Item7
                MatchNoun       MatchHW, MatchMoney, "HOMEWORK", 10, Item4
                MatchNoun       MatchMoney, MatchForm, "MONEY", 11, Item5
                MatchNoun       MatchForm, MatchCoupon, "FORM", 12, Item6
                MatchNoun       MatchCoupon, 0, "COUPON", 13, Item8

; Here is the list of allowable verbs.

VerbPat         pattern         {sl_match2, MatchGo}

                MatchVerb       MatchGO, MatchGet, "GO", 1
                MatchVerb       MatchGet, MatchDrop, "GET", 2
                MatchVerb       MatchDrop, MatchInv, "DROP", 3
                MatchVerb       MatchInv, MatchQuit, "INVENTORY", 4
                MatchVerb       MatchQuit, MatchHelp, "QUIT", 5
                MatchVerb       MatchHelp, 0, "HELP", 6

; Data structures for the "maze".

Room1           room    {Room1, Room5, Room4, Room2, {Item1,0,0,0}, Room1Desc}

Room1Desc       byte    "at the Commons",0

Item1           item    {10,2,Room3,GS1,GS2,GS3}
GS1             byte    "a big sign",0
GS2             byte    "a big sign made of styrofoam with funny "
                byte    "letters on it.",0
GS3             byte    "The ETA PI Fraternity thanks you for return"
                byte    "ing their sign, they",cr,lf
                byte    "make you an honorary life member, as long as "
                byte    "you continue to pay",cr,lf
                byte    "your $30 monthly dues, that is.",0

Room2           room    {NULL, Room5, Room1, Room3, {Item2,0,0,0}, Room2Desc}

Room2Desc       byte    'at the "C" on the hill above campus',0

Item2           item    {10,1,Room1,LC1,LC2,LC3}
LC1             byte    "a lunch card",0
LC2             byte    "a lunch card which someone must have "
                byte    "accidentally dropped here.", 0
LC3             byte    "You get a big meal at the Commons cafeteria"
                byte    cr,lf
                byte    "It would be a good idea to go visit the "
                byte    "student health center",cr,lf
                byte    "at this time.",0

Room3           room    {NULL, Room6, Room2, Room2, {Item3,0,0,0}, Room3Desc}

Room3Desc       byte    "at ETA PI Frat House",0

Item3           item    {10,2,Room2,BL1,BL2,BL3}
BL1             byte    "a bag of lime",0
BL2             byte    "a bag of baseball field lime which someone "
                byte    "is obviously saving for",cr,lf
                byte    "a special occasion.",0
BL3             byte    "You spread the lime out forming a big '++' "
                byte    "after the 'C'",cr,lf
                byte    "Your friends in Computer Science hold you "
                byte    "in total awe.",0

Room4           room    {Room1, Room7, Room7, Room5, {Item4,0,0,0}, Room4Desc}

Room4Desc       byte    "in Dr. John Smith's Office",0

Item4           item    {10,1,Room7,HW1,HW2,HW3}
HW1             byte    "a homework assignment",0
HW2             byte    "a homework assignment which appears to "
                byte    "contain assembly language",0
HW3             byte    "The grader notes that your homework "
                byte    "assignment looks quite",cr,lf
                byte    "similar to someone else's assignment "
                byte    "in the class and reports you",cr,lf
                byte    "to the instructor.",0

Room5           room    {Room1, Room9, Room7, Room2, {Item5,0,0,0}, Room5Desc}

Room5Desc       byte    "in the computer lab",0

Item5           item    {10,1,Room9,M1,M2,M3}
M1              byte    "some money",0
M2              byte    "several dollars in an envelope in the "
                byte    "trashcan",0
M3              byte    "The waitress thanks you for your "
                byte    "generous tip and gets you",cr,lf
                byte    "another pitcher of beer. "
                byte    "Then she asks for your ID.",cr,lf
                byte    "You are at least 21 aren't you?",0

Room6           room    {Room3, Room9, Room5, NULL, {Item6,0,0,0}, Room6Desc}

Room6Desc       byte    "at the campus book store",0

Item6           item    {10,1,Room8,AD1,AD2,AD3}
AD1             byte    "an add/drop/change form",0
AD2             byte    "an add/drop/change form filled out for "
                byte    "assembly to get a letter grade",0
AD3             byte    "You got the form in just in time. "
                byte    "It would have been a shame to",cr,lf
                byte    "have had to retake assembly because "
                byte    "you didn't realize you needed to ",cr,lf
                byte    "get a letter grade in the course.",0

Room7           room    {Room1, Room7, Room4, Room8, {Item7,0,0,0}, Room7Desc}

Room7Desc       byte    "in the assembly lecture",0

Item7           item    {10,1,Room5,AP1,AP2,AP3}
AP1             byte    "an assembly language program",0
AP2             byte    "an assembly language program due in "
                byte    "the assembly language class.",0
AP3             byte    "The sample program the instructor gave "
                byte    "you provided all the information",cr,lf
                byte    "you needed to complete your assignment. "
                byte    "You finish your work and",cr,lf
                byte    "head to the local pub to celebrate."
                byte    cr,lf,0

Room8           room    {Room5, Room6, Room7, Room9, {Item8,0,0,0}, Room8Desc}

Room8Desc       byte    "at the Registrar's office",0

Item8           item    {10,1,Room6,C1,C2,C3}
C1              byte    "a coupon",0
C2              byte    "a coupon good for a free text book",0
C3              byte    'You get a free copy of "Cliff Notes for '
                byte    'The Art of Assembly',cr,lf
                byte    'Language Programming" Alas, it does not '
                byte    "provide all the",cr,lf
                byte    "information you need for the class, so you "
                byte    "sell it back during",cr,lf
                byte    "the book buy-back period.",0

Room9           room    {Room6, Room9, Room8, Room3, {Item9,0,0,0}, Room9Desc}

Room9Desc       byte    "at The Pub",0

Item9           item    {10,2,Room4,B1,B2,B3}
B1              byte    "a pitcher of beer",0
B2              byte    "an ice cold pitcher of imported beer",0
B3              byte    "Dr. Smith thanks you profusely for your "
                byte    "good taste in brews.",cr,lf
                byte    "He then invites you to the pub for a "
                byte    "round of pool and",cr,lf
                byte    "some heavy duty hob-nobbing, "
                byte    "CS Department style.",0

dseg            ends

cseg            segment para public 'code'
                assume  ds:dseg

; SetNoun-      Copies the value in SI (the matchparm parameter) to the
;               NOUN variable.

SetNoun         proc    far
                push    ds
                mov     ax, dseg
                mov     ds, ax
                mov     Noun, si
                mov     ax, di
                stc
                pop     ds
                ret
SetNoun         endp

; SetVerb-      Copies the value in SI (the matchparm parameter) to the
;               VERB variable.

SetVerb         proc    far
                push    ds
                mov     ax, dseg
                mov     ds, ax
                mov     Verb, si
                mov     ax, di
                stc
                pop     ds
                ret
SetVerb         endp

; SetPtr-       Copies the value in SI (the matchparm parameter) to the
;               NOUNPTR variable.

SetPtr          proc    far
                push    ds
                mov     ax, dseg
                mov     ds, ax
                mov     NounPtr, si
                mov     ax, di
                stc
                pop     ds
                ret
SetPtr          endp

; CheckPresence-
;       BX points at an item. DI points at an item list. This
;       routine checks to see if that item is present in the
;       item list. Returns Carry set if item was found,
;       clear if not found.

CheckPresence   proc

; MaxWeight is an assembly-time adjustable constant that determines
; how many objects the user can carry, or can be in a room, at one
; time. The following repeat macro emits "MaxWeight" compare and
; branch sequences to test each item pointed at by DS:DI.

ItemCnt         =       0
                repeat  MaxWeight
                cmp     bx, [di+ItemCnt]
                je      GotIt

ItemCnt         =       ItemCnt+2
                endm

                clc
                ret

GotIt:          stc
                ret
CheckPresence   endp

; RemoveItem-   BX contains a pointer to an item. DI contains a pointer
;               to an item list which contains that item. This routine
;               searches the item list and removes that item from the
;               list. To remove an item from the list, we need only
;               store a zero (NULL) over the top of its pointer entry
;               in the list.

RemoveItem      proc

; Once again, we use the repeat macro to automatically generate a chain
; of compare, branch, and remove code sequences for each possible item
; in the list.

ItemCnt         =       0
                repeat  MaxWeight
                local   NotThisOne
                cmp     bx, [di+ItemCnt]
                jne     NotThisOne
                mov     word ptr [di+ItemCnt], NULL
                ret
NotThisOne:
ItemCnt         =       ItemCnt+2
                endm

                ret
RemoveItem      endp

; InsertItem-   BX contains a pointer to an item, DI contains a pointer
;               to an item list. This routine searches through the list
;               for the first empty spot and copies the value in BX to
;               that point. It returns the carry set if it succeeds.
;               It returns the carry clear if there are no empty spots
;               available.

InsertItem      proc

ItemCnt         =       0
                repeat  MaxWeight
                local   NotThisOne
                cmp     word ptr [di+ItemCnt], 0
                jne     NotThisOne
                mov     [di+ItemCnt], bx
                stc
                ret
NotThisOne:
ItemCnt         =       ItemCnt+2
                endm

                clc
                ret
InsertItem      endp

; LongDesc- Long description of an item.
; DI points at an item - print the long description of it.

LongDesc        proc
                push    di
                test    di, di
                jz      NoDescription
                mov     di, [di].item.LongDesc
                puts
                putcr
NoDescription:  pop     di
                ret
LongDesc        endp

; ShortDesc- Print the short description of an object.
; DI points at an item (possibly NULL). Print the short description for it.

ShortDesc       proc
                push    di
                test    di, di
                jz      NoDescription
                mov     di, [di].item.ShortDesc
                puts
                putcr
NoDescription:  pop     di
                ret
ShortDesc       endp

; Describe-     "CurRoom" points at the current room. Describe it and
;               its contents.

Describe        proc
                push    es
                push    bx
                push    di
                mov     di, ds
                mov     es, di

                mov     bx, CurRoom
                mov     di, [bx].room.Description
                print
                byte    "You are currently ",0
                puts
                putcr
                print
                byte    "Here you find the following:",cr,lf,0

; For each possible item in the room, print out the long description
; of that item. The repeat macro generates a code sequence for each
; possible item that could be in this room.

ItemCnt         =       0
                repeat  MaxWeight
                mov     di, [bx].room.ItemList[ItemCnt]
                call    LongDesc

ItemCnt         =       ItemCnt+2
                endm

                pop     di
                pop     bx
                pop     es
                ret
Describe        endp

; Here is the main program, that actually plays the game.

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                print
                byte    cr,lf,lf,lf,lf,lf
                byte    "Welcome to ",'"MADVENTURE"',cr,lf
                byte    'If you need help, type the command "HELP"'
                byte    cr,lf,0

RoomLoop:       dec     CurScore                ;One point for each move.
                jnz     NotOverYet

; If they made too many moves without dropping anything properly, boot them
; out of the game.

                print
                byte    "WHOA! You lost! You get to join the legions of "
                byte    "the totally lame",cr,lf
                byte    'who have failed at "MADVENTURE"',cr,lf,0
                jmp     Quit

; Okay, tell 'em where they are and get a new command from them.

NotOverYet:     putcr
                call    Describe
                print
                byte    cr,lf
                byte    "Command: ",0
                lesi    InputLine
                gets
                strupr                          ;Ignore case by converting to U.C.

; Okay, process the command. Note that we don't actually check to see
; if there is a properly formed sentence. Instead, we just look to see
; if any important keywords are on the line. If they are, the pattern
; matching routines load the appropriate values into the noun and verb
; variables (nouns: north=1, south=2, east=3, west=4, lime=5, beer=6,
; card=7, sign=8, program=9, homework=10, money=11, form=12, coupon=13;
; verbs: go=1, get=2, drop=3, inventory=4, quit=5, help=6).
;
; This code uses the noun and verb variables as indexes into a two
; dimensional array whose elements contain the address of the code
; to process the given command. If a given command does not make
; any sense (e.g., "go coupon") the entry in the table points at the
; bad command code.

                mov     Noun, 0
                mov     Verb, 0
                mov     NounPtr, 0

                ldxi    VerbPat
                xor     cx, cx
                match

                lesi    InputLine
                ldxi    NounPat
                xor     cx, cx
                match

; Okay, index into the command table and jump to the appropriate
; handler. Note that we will cheat and use a 14x8 array. There
; are really only seven verbs, not eight. But using eight makes
; things easier since it is easier to multiply by eight than seven.

                mov     si, CurRoom             ;The commands expect this here.

                mov     bx, Noun
                shl     bx, 3                   ;Multiply by eight.
                add     bx, Verb
                shl     bx, 1                   ;Multiply by two - word table.
                jmp     cseg:jmptbl[bx]

; The following table contains the noun x verb cross product.
; The verb values (in each row) are the following:
;
;       NONE    GO      GET     DROP    INVNTRY QUIT    HELP    unused
;       0       1       2       3       4       5       6       7
;
; There is one row for each noun (plus row zero, corresponding to no
; noun found on line).

jmptbl          word    Bad                     ;No noun, no verb
                word    Bad                     ;No noun, GO
                word    Bad                     ;No noun, GET
                word    Bad                     ;No noun, DROP
                word    DoInventory             ;No noun, INVENTORY
                word    QuitGame                ;No noun, QUIT
                word    DoHelp                  ;No noun, HELP
                word    Bad                     ;N/A

NorthCmds       word    Bad, GoNorth, Bad, Bad, Bad, Bad, Bad, Bad
SouthCmds       word    Bad, GoSouth, Bad, Bad, Bad, Bad, Bad, Bad
EastCmds        word    Bad, GoEast, Bad, Bad, Bad, Bad, Bad, Bad
WestCmds        word    Bad, GoWest, Bad, Bad, Bad, Bad, Bad, Bad
LimeCmds        word    Bad, Bad, GetItem, DropItem, Bad, Bad, Bad, Bad
BeerCmds        word    Bad, Bad, GetItem, DropItem, Bad, Bad, Bad, Bad
CardCmds        word    Bad, Bad, GetItem, DropItem, Bad, Bad, Bad, Bad
SignCmds        word    Bad, Bad, GetItem, DropItem, Bad, Bad, Bad, Bad
ProgramCmds     word    Bad, Bad, GetItem, DropItem, Bad, Bad, Bad, Bad
HomeworkCmds    word    Bad, Bad, GetItem, DropItem, Bad, Bad, Bad, Bad
MoneyCmds       word    Bad, Bad, GetItem, DropItem, Bad, Bad, Bad, Bad
FormCmds        word    Bad, Bad, GetItem, DropItem, Bad, Bad, Bad, Bad
CouponCmds      word    Bad, Bad, GetItem, DropItem, Bad, Bad, Bad, Bad

; If the user enters a command we don't know how to process, print an
; appropriate error message down here.

Bad:            printf
                byte    "I'm sorry, I don't understand how to '%s'\n",0
                dword   InputLine
                jmp     NotOverYet

; Handle the movement commands here.
; Movements are easy, all we've got to do is fetch the NORTH, SOUTH,
; EAST, or WEST pointer from the current room's data structure and
; set the current room to that address. The only catch is that some
; moves are not legal. Such moves have a NULL (zero) in the direction
; field. A quick check for this case handles illegal moves.

GoNorth:        mov     si, [si].room.North
                jmp     MoveMe

GoSouth:        mov     si, [si].room.South
                jmp     MoveMe

GoEast:         mov     si, [si].room.East
                jmp     MoveMe

GoWest:         mov     si, [si].room.West
MoveMe:         test    si, si                  ;See if move allowed.
                jnz     SetCurRoom
                printf
                byte    "Sorry, you cannot go in this direction."
                byte    cr, lf, 0
                jmp     RoomLoop

SetCurRoom:     mov     CurRoom, si             ;Move to new room.
                jmp     RoomLoop

; Handle the GetItem command down here. At this time the user
; has entered GET and some noun that the player can pick up.
; First, we will make sure that item is in this room.
; Then we will check to make sure that picking up this object
; won't overload the player. If these two conditions are met,
; we'll transfer the object from the room to the player.

GetItem:        mov     bx, NounPtr             ;Ptr to item user wants.
                mov     si, CurRoom
                lea     di, [si].room.ItemList  ;Ptr to item list in di.
                call    CheckPresence           ;See if in room.
                jc      GotTheItem
                printf
                byte    "Sorry, that item is not available here."
                byte    cr, lf, 0
                jmp     RoomLoop

; Okay, see if picking up this object will overload the player.

GotTheItem:     mov     ax, [bx].Item.Weight
                add     ax, CurWeight
                cmp     ax, MaxWeight
                jbe     WeightOkay
                printf
                byte    "Sorry, you are already carrying too many items "
                byte    "to safely carry\nthat object\n",0
                jmp     RoomLoop

; Okay, everything's cool, transfer the object from the room to the user.

WeightOkay:     mov     CurWeight, ax           ;Save new weight.
                call    RemoveItem              ;Remove item from room.
                lea     di, ItemsOnHand         ;Ptr to player's list.
                call    InsertItem
                jmp     RoomLoop

; Handle dropped objects down here.

DropItem:       lea     di, ItemsOnHand         ;See if the user has
                mov     bx, NounPtr             ; this item on hand.
                call    CheckPresence
                jc      CanDropIt1
                printf
                byte    "You are not currently holding that item\n",0
                jmp     RoomLoop

; Okay, let's see if this is the magic room where this item is
; supposed to be dropped. If so, award the user some points for
; properly figuring this out.

CanDropIt1:     mov     ax, [bx].item.key
                cmp     ax, CurRoom
                jne     JustDropIt

; Okay, success! Print the winning message for this object.

                mov     di, [bx].item.WinDesc
                puts
                putcr

; Award the user some points.

                mov     ax, [bx].item.value
                add     CurScore, ax

; Since the user dropped it, they can carry more things now.

                mov     ax, [bx].item.Weight
                sub     CurWeight, ax

; Okay, take this from the user's list.

                lea     di, ItemsOnHand
                call    RemoveItem

; Keep track of how many objects the user has successfully dropped.
; When this counter hits zero, the game is over.

                dec     TotalCounter
                jnz     RoomLoop

                printf
                byte    "Well, you've found where everything goes "
                byte    "and your score is %d.\n"
                byte    "You might want to play again and see if "
                byte    "you can get a better score.\n",0
                dword   CurScore
                jmp     Quit

; If this isn't the room where this object belongs, just drop the thing
; off. If this object won't fit in this room, ignore the drop command.

JustDropIt:     mov     di, CurRoom
                lea     di, [di].room.ItemList
                call    InsertItem
                jc      DroppedItem
                printf
                byte    "There is insufficient room to leave "
                byte    "that item here.\n",0
                jmp     RoomLoop

; If they can drop it, do so. Don't forget we've just unburdened the
; user so we need to deduct the weight of this object from what the
; user is currently carrying.

DroppedItem:    lea     di, ItemsOnHand
                call    RemoveItem
                mov     ax, [bx].item.Weight
                sub     CurWeight, ax
                jmp     RoomLoop

; If the user enters the INVENTORY command, print out the objects on hand

DoInventory:    printf
                byte    "You currently have the following items in your "
                byte    "possession:",cr,lf,0
                mov     di, ItemsOnHand[0]
                call    ShortDesc
                mov     di, ItemsOnHand[2]
                call    ShortDesc
                mov     di, ItemsOnHand[4]
                call    ShortDesc
                mov     di, ItemsOnHand[6]
                call    ShortDesc
                printf
                byte    "\nCurrent score: %d\n"
                byte    "Carrying ability: %d/4\n\n",0
                dword   CurScore,CurWeight
                inc     CurScore                ;This command is free.
                jmp     RoomLoop

; If the user requests help, provide it here.

DoHelp:         printf
                byte    "List of commands:",cr,lf,lf
                byte    "GO {NORTH, EAST, WEST, SOUTH}",cr,lf
                byte    "{GET, DROP} {LIME, BEER, CARD, SIGN, PROGRAM, "
                byte    "HOMEWORK, MONEY, FORM, COUPON}",cr,lf
                byte    "SHOW INVENTORY",cr,lf
                byte    "QUIT GAME",cr,lf
                byte    "HELP ME",cr,lf,lf
                byte    "Each command costs you one point.",cr,lf
                byte    "You accumulate points by picking up objects and "
                byte    "dropping them in their",cr,lf
                byte    " appropriate locations.",cr,lf
                byte    "If you drop an item in its proper location, it "
                byte    "disappears from the game.",cr,lf
                byte    "The game is over if your score drops to zero or "
                byte    "you properly place",cr,lf
                byte    " all items.",cr,lf
                byte    0
                jmp     RoomLoop

; If they quit prematurely, let 'em know what a wimp they are!

QuitGame:       printf
                byte    "So long, your score is %d and there are "
                byte    "still %d objects unplaced\n",0
                dword   CurScore, TotalCounter

Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

16.9 Laboratory Exercises

Programming with the Standard Library Pattern Matching routines doubles the complexity. Not only must you deal with the complexities of 80x86 assembly language, you must also deal with the complexities of the pattern matching paradigm, a programming language in its own right. While you can use a program like CodeView to track down problems in an assembly language program, no such debugger exists for "programs" you write with the Standard Library's pattern matching "language." Although the pattern matching routines are written in assembly language, attempting to trace through a pattern using CodeView will not be very enlightening. In this laboratory exercise, you will learn how to develop some rudimentary tools to help debug pattern matching programs.
16.9.1 Checking for Stack Overflow (Infinite Loops)

One common problem in pattern matching programs is the possibility of an infinite loop occurring in the pattern. This might occur, for example, if you have a left recursive production. Unfortunately, tracking down such loops in a pattern is very tedious, even with the help of a debugger like CodeView. Fortunately, there is a very simple change you can make to a program that uses patterns that will abort the program and warn you if infinite recursion exists. Infinite recursion in a pattern occurs when sl_Match2 continuously calls itself without ever returning. This overflows the stack and causes the program to crash. To check for stack overflow, make the following changes to your program:

• In patterns where you would normally call sl_Match2, call MatchPat instead.

• Include the following statements near the beginning of your program (before any patterns):

DEBUG           =       0               ;Define for debugging.

                ifdef   DEBUG
MatchPat        textequ <MatchSP>
                else
MatchPat        textequ <sl_Match2>
                endif

If you define the DEBUG symbol, your patterns will call the MatchSP procedure, otherwise they will call the sl_Match2 procedure. During testing, define the DEBUG symbol.

• Insert the following procedure somewhere in your program:

MatchSP         proc    far
                cmp     sp, offset StkOvrfl
                jbe     AbortPgm
                jmp     sl_Match2

AbortPgm:       print
                byte    cr,lf,lf
                byte    "Error: Stack overflow in MatchSP routine.",cr,lf,0
                ExitPgm
MatchSP         endp

This code sandwiches itself between your pattern and the sl_Match2 routine. It checks the stack pointer (sp) to see if it has dropped below a minimally acceptable point in the stack segment. If not, it continues execution by jumping to the sl_Match2 routine; otherwise it aborts program execution with an error message.

• The final change to your program is to modify the stack segment so that it looks like the following:

sseg            segment para stack 'stack'
                word    64 dup (?)              ;Buffer for stack overflow
StkOvrfl        word    ?                       ;Stack overflow if drops
stk             db      1024 dup ("stack ")     ; below StkOvrfl.
sseg            ends
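The idea behind MatchSP, checking a budget on every entry to the matcher and stopping cleanly instead of crashing, can be illustrated with a short Python sketch. This is an analogy only, under stated assumptions: Python has no stack pointer to compare, so a call-depth counter plays the role of the StkOvrfl threshold, and the names StackGuardError, MAX_DEPTH, guarded, match_e_left_recursive, and match_sum are all hypothetical.

```python
class StackGuardError(Exception):
    """Raised when the recursion budget is exhausted (hypothetical name)."""

MAX_DEPTH = 200  # plays the role of the StkOvrfl threshold; arbitrary value

def guarded(fn):
    """Wrap a recursive matcher so every call first checks the depth budget,
    much as MatchSP checks sp before jumping to sl_Match2."""
    depth = 0

    def wrapper(*args):
        nonlocal depth
        if depth >= MAX_DEPTH:
            raise StackGuardError("recursion budget exhausted")
        depth += 1
        try:
            return fn(*args)
        finally:
            depth -= 1
    return wrapper

@guarded
def match_e_left_recursive(text, pos):
    # E -> E + T | T, tried in that order: the first thing E does is call E
    # again at the same position, so it recurses without consuming input.
    return match_e_left_recursive(text, pos)

def match_sum(text, pos=0):
    """E -> T ('+' T)* with T a single digit; returns the end position or None."""
    if pos >= len(text) or not text[pos].isdigit():
        return None
    pos += 1
    while pos + 1 < len(text) and text[pos] == "+" and text[pos + 1].isdigit():
        pos += 2
    return pos
```

Calling match_e_left_recursive("5+2", 0) raises StackGuardError after MAX_DEPTH nested calls rather than exhausting the interpreter's stack, while match_sum, which consumes a term before repeating, accepts the same input without recursion.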
After making these changes, your program will automatically stop with an error message if infinite recursion occurs, since infinite recursion will most certainly cause a stack overflow[17].

17. This code will also abort your program if you use too much stack space without infinite recursion, a problem in its own right.

The following code (Ex16_1a.asm on the companion CD-ROM) presents a simple calculator, similar to the calculator in the section “Evaluating Arithmetic Expressions” on page 948, although this calculator only supports addition. As noted in the comments appearing in this program, the pattern for the expression parser has a serious flaw: it uses a left recursive production. This will most certainly cause an infinite loop and a stack overflow.

For your lab report: Run this program with and without the DEBUG symbol defined (i.e., comment out the definition for one run). Describe what happens.

; EX16_1a.asm
;
; A simple floating point calculator that demonstrates the use of the
; UCR Standard Library pattern matching routines. Note that this
; program requires an FPU.

                .xlist
                .386
                .387
                option  segment:use16
                include stdlib.a
                includelib stdlib.lib
                matchfuncs
                .list

; If the symbol "DEBUG" is defined, then call the MatchSP routine
; to do stack overflow checking. If "DEBUG" is not defined, just
; call the sl_Match2 routine directly.

DEBUG           =       0               ;Define for debugging.

                ifdef   DEBUG
MatchPat        textequ <MatchSP>
                else
MatchPat        textequ <sl_Match2>
                endif

dseg            segment para public 'data'

; The following is a temporary used when converting a floating point
; string to a 64 bit real value.

CurValue        real8   0.0

; A Test String:

TestStr         byte    "5+2-(3-1)",0

; Grammar for simple infix -> postfix translation operation:
; Semantic rules appear in braces.
;
; NOTE: This code has a serious problem. The first production
; is left recursive and will generate an infinite loop.
;
; E -> E+T {print result} | T {print result}
; T -> <constant> {fld constant} | (E)
;
; UCR Standard Library Pattern that handles the grammar above:

; An expression consists of an "E" item followed by the end of the string:

Expression      pattern {MatchPat,E,,EndOfString}
EndOfString     pattern {EOS}

; An "E" item consists of an "E" item optionally followed by "+" or "-"
; and a "T" item (E -> E+T | T):

E               pattern {MatchPat, E,T,Eplus}
Eplus           pattern {MatchChar, '+', T, epPlus}
epPlus          pattern {DoFadd}

; A "T" item is either a floating point constant or "(" followed by
; an "E" item followed by ")".
;
; The regular expression for a floating point constant is
;
;       [0-9]+ ( "." [0-9]* | ) ( ((e|E) (+|-| ) [0-9]+) | )
;
; Note: the pattern "Const" matches exactly the characters specified
;       by the above regular expression. It is the pattern the
;       calculator grabs when converting a string to a floating point
;       number.

Const           pattern {MatchPat, ConstStr, 0, FLDConst}
ConstStr        pattern {MatchPat, DoDigits, 0, Const2}
Const2          pattern {matchchar, '.', Const4, Const3}
Const3          pattern {MatchPat, DoDigits, Const4, Const4}
Const4          pattern {matchchar, 'e', const5, const6}
Const5          pattern {matchchar, 'E', Succeed, const6}
Const6          pattern {matchchar, '+', const7, const8}
Const7          pattern {matchchar, '-', const8, const8}
Const8          pattern {MatchPat, DoDigits}

FldConst        pattern {PushValue}

; DoDigits handles the regular expression [0-9]+

DoDigits        pattern {Anycset, Digits, 0, SpanDigits}
SpanDigits      pattern {Spancset, Digits}

; The T production handles constants or an expression in parentheses.

T               pattern {MatchChar, '(', Const, IntE}
IntE            pattern {MatchPat, E, 0, CloseParen}
CloseParen      pattern {MatchChar, ')'}

; The Succeed pattern always succeeds.

Succeed         pattern {DoSucceed}

; We use digits from the UCR Standard Library cset standard sets.

                include stdsets.a

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; Debugging feature #1:
; This is a special version of sl_Match2 that checks for
; stack overflow. Stack overflow occurs whenever there
; is an infinite loop (i.e., left recursion) in a pattern.

MatchSP         proc    far
                cmp     sp, offset StkOvrfl
                jbe     AbortPgm
                jmp     sl_Match2

AbortPgm:       print
                byte    cr,lf,lf
                byte    "Error: Stack overflow in MatchSP routine.",cr,lf,0
                ExitPgm
MatchSP         endp

; DoSucceed matches the empty string. In other words, it matches anything
; and always returns success without eating any characters from the input
; string.

DoSucceed       proc    far
                mov     ax, di
                stc
                ret
DoSucceed       endp

; DoFadd - Adds the two items on the top of the FPU stack.

DoFadd          proc    far
                faddp   st(1), st
                mov     ax, di          ;Required by sl_Match
                stc                     ;Always succeed.
                ret
DoFadd          endp

; PushValue-    We've just matched a string that corresponds to a
;               floating point constant. Convert it to a floating
;               point value and push that value onto the FPU stack.

PushValue       proc    far
                push    ds
                push    es
                pusha
                mov     ax, dseg
                mov     ds, ax

                lesi    Const           ;FP val matched by this pat.
                patgrab                 ;Get a copy of the string.
                atof                    ;Convert to real.
                free                    ;Return mem used by patgrab.
                lesi    CurValue        ;Copy floating point accumulator
                sdfpa                   ; to a local variable and then
                fld     CurValue        ; copy that value to the FPU stk.

                popa
                mov     ax, di
                pop     es
                pop     ds
                stc
                ret
PushValue       endp

; The main program tests the expression evaluator.

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                finit                   ;Be sure to do this!
                fwait

                lesi    TestStr
                puts                    ;Print the expression

                ldxi    Expression
                xor     cx, cx
                match
                jc      GoodVal
                printff
                byte    " is an illegal expression",cr,lf,0
                jmp     Quit            ;Main has no caller to return to.

GoodVal:        fstp    CurValue
                printff
                byte    " = %12.6ge\n",0
                dword   CurValue

Quit:           ExitPgm
Main            endp
cseg            ends

sseg            segment para stack 'stack'
                word    64 dup (?)              ;Buffer for stack overflow
StkOvrfl        word    ?                       ;Stack overflow if drops
stk             db      1024 dup ("stack ")     ; below StkOvrfl.
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
16.9.2 Printing Diagnostic Messages from a Pattern

When there is no other debugging method available, you can always use print statements to help track down problems in your patterns. If your program calls pattern matching functions in your own code (like the DoFAdd, DoSucceed, and PushValue procedures in the code above), you can easily insert print or printf statements in these functions that will print an appropriate message when they execute. Unfortunately, a problem may develop in a portion of a pattern that does not call any local pattern matching functions, so inserting print statements within an existing (local) pattern matching function might not help. To solve this problem, all you need to do is insert a call to a local pattern matching function in the patterns you suspect have a problem.

Rather than make up a specific local pattern to print an individual message, a better solution is to write a generic pattern matching function whose whole purpose is to display a message. The following PatPrint function does exactly this:

; PatPrint-     A debugging aid. This "Pattern matching function" prints
;               the string that DS:SI points at.

PatPrint        proc    far
                push    es
                push    di
                mov     di, ds
                mov     es, di
                mov     di, si
                puts
                pop     di              ;Restore DI before copying it, so
                mov     ax, di          ; AX = DI as sl_Match requires.
                pop     es
                stc
                ret
PatPrint        endp
From “Constructing Patterns for the MATCH Routine” on page 933, you will note that the pattern matching system passes the value of the MatchParm parameter to a pattern matching function in the ds:si register pair. The PatPrint function prints the string that ds:si points at (by moving ds:si to es:di and calling puts).

The following code (Ex16_1b.asm on the companion CD-ROM) demonstrates how to insert calls to PatPrint within your patterns to print out data to help you track down problems in your patterns.

For your lab report: Run this program and describe its output in your report. Describe how this output can help you track down the problem with this program. Modify the grammar to match the grammar in the corresponding sample program (see “Evaluating Arithmetic Expressions” on page 948) while still printing out each production that this program processes. Run the result and include the output in your lab report.

; EX16_1b.asm
;
; A simple floating point calculator that demonstrates the use of the
; UCR Standard Library pattern matching routines. Note that this
; program requires an FPU.

                .xlist
                .386
                .387
                option  segment:use16
                include stdlib.a
                includelib stdlib.lib
                matchfuncs
                .list

; If the symbol "DEBUG" is defined, then call the MatchSP routine
; to do stack overflow checking. If "DEBUG" is not defined, just
; call the sl_Match2 routine directly.

DEBUG           =       0               ;Define for debugging.

                ifdef   DEBUG
MatchPat        textequ <MatchSP>
                else
MatchPat        textequ <sl_Match2>
                endif

dseg            segment para public 'data'

; The following is a temporary used when converting a floating point
; string to a 64 bit real value.

CurValue        real8   0.0

; A Test String:

TestStr         byte    "5+2-(3-1)",0

; Grammar for simple infix -> postfix translation operation:
; Semantic rules appear in braces.
;
; NOTE: This code has a serious problem. The first production
; is left recursive and will generate an infinite loop.
;
; E -> E+T {print result} | T {print result}
; T -> <constant> {fld constant} | (E)
;
; UCR Standard Library Pattern that handles the grammar above:

; An expression consists of an "E" item followed by the end of the string:

Expression      pattern {MatchPat,E,,EndOfString}
EndOfString     pattern {EOS}

; An "E" item consists of an "E" item optionally followed by "+" or "-"
; and a "T" item (E -> E+T | T):

E               pattern {PatPrint,EMsg,,E2}
EMsg            byte    "E->E+T | T",cr,lf,0

E2              pattern {MatchPat, E,T,Eplus}
Eplus           pattern {MatchChar, '+', T, epPlus}
epPlus          pattern {DoFadd,,,E3}
E3              pattern {PatPrint,EMsg3}
EMsg3           byte    "E->E+T",cr,lf,0

; A "T" item is either a floating point constant or "(" followed by
; an "E" item followed by ")".
;
; The regular expression for a floating point constant is
;
;       [0-9]+ ( "." [0-9]* | ) ( ((e|E) (+|-| ) [0-9]+) | )
;
; Note: the pattern "Const" matches exactly the characters specified
;       by the above regular expression. It is the pattern the
;       calculator grabs when converting a string to a floating point
;       number.

Const           pattern {MatchPat, ConstStr, 0, FLDConst}
ConstStr        pattern {MatchPat, DoDigits, 0, Const2}
Const2          pattern {matchchar, '.', Const4, Const3}
Const3          pattern {MatchPat, DoDigits, Const4, Const4}
Const4          pattern {matchchar, 'e', const5, const6}
Const5          pattern {matchchar, 'E', Succeed, const6}
Const6          pattern {matchchar, '+', const7, const8}
Const7          pattern {matchchar, '-', const8, const8}
Const8          pattern {MatchPat, DoDigits}

FldConst        pattern {PushValue,,,ConstMsg}
ConstMsg        pattern {PatPrint,CMsg}
CMsg            byte    "T->const",cr,lf,0

; DoDigits handles the regular expression [0-9]+

DoDigits        pattern {Anycset, Digits, 0, SpanDigits}
SpanDigits      pattern {Spancset, Digits}

; The T production handles constants or an expression in parentheses.

T               pattern {PatPrint,TMsg,,T2}
TMsg            byte    "T->(E) | const",cr,lf,0

T2              pattern {MatchChar, '(', Const, IntE}
IntE            pattern {MatchPat, E, 0, CloseParen}
CloseParen      pattern {MatchChar, ')',,T3}

T3              pattern {PatPrint,TMsg3}
TMsg3           byte    "T->(E)",cr,lf,0

; The Succeed pattern always succeeds.

Succeed         pattern {DoSucceed}

; We use digits from the UCR Standard Library cset standard sets.

                include stdsets.a

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; Debugging feature #1:
; This is a special version of sl_Match2 that checks for
; stack overflow. Stack overflow occurs whenever there
; is an infinite loop (i.e., left recursion) in a pattern.

MatchSP         proc    far
                cmp     sp, offset StkOvrfl
                jbe     AbortPgm
                jmp     sl_Match2

AbortPgm:       print
                byte    cr,lf,lf
                byte    "Error: Stack overflow in MatchSP routine.",cr,lf,0
                ExitPgm
MatchSP         endp

; PatPrint-     A debugging aid. This "Pattern matching function" prints
;               the string that DS:SI points at.

PatPrint        proc    far
                push    es
                push    di
                mov     di, ds
                mov     es, di
                mov     di, si
                puts
                pop     di              ;Restore DI before copying it, so
                mov     ax, di          ; AX = DI as sl_Match requires.
                pop     es
                stc
                ret
PatPrint        endp

; DoSucceed matches the empty string. In other words, it matches anything
; and always returns success without eating any characters from the input
; string.

DoSucceed       proc    far
                mov     ax, di
                stc
                ret
DoSucceed       endp

; DoFadd - Adds the two items on the top of the FPU stack.

DoFadd          proc    far
                faddp   st(1), st
                mov     ax, di          ;Required by sl_Match
                stc                     ;Always succeed.
                ret
DoFadd          endp

; PushValue-    We've just matched a string that corresponds to a
;               floating point constant. Convert it to a floating
;               point value and push that value onto the FPU stack.

PushValue       proc    far
                push    ds
                push    es
                pusha
                mov     ax, dseg
                mov     ds, ax

                lesi    Const           ;FP val matched by this pat.
                patgrab                 ;Get a copy of the string.
                atof                    ;Convert to real.
                free                    ;Return mem used by patgrab.
                lesi    CurValue        ;Copy floating point accumulator
                sdfpa                   ; to a local variable and then
                fld     CurValue        ; copy that value to the FPU stk.

                popa
                mov     ax, di
                pop     es
                pop     ds
                stc
                ret
PushValue       endp

; The main program tests the expression evaluator.

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                finit                   ;Be sure to do this!
                fwait

                lesi    TestStr
                puts                    ;Print the expression

                ldxi    Expression
                xor     cx, cx
                match
                jc      GoodVal
                printff
                byte    " is an illegal expression",cr,lf,0
                jmp     Quit            ;Main has no caller to return to.

GoodVal:        fstp    CurValue
                printff
                byte    " = %12.6ge\n",0
                dword   CurValue

Quit:           ExitPgm
Main            endp
cseg            ends

sseg            segment para stack 'stack'
                word    64 dup (?)              ;Buffer for stack overflow
StkOvrfl        word    ?                       ;Stack overflow if drops
stk             db      1024 dup ("stack ")     ; below StkOvrfl.
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
16.10 Programming Projects

1) Modify the program in Section 16.8.3 (Arith2.asm on the companion CD-ROM) so that it includes some common trigonometric operations (sin, cos, tan, etc.). See the chapter on floating point arithmetic to see how to compute these functions. The syntax for the functions should be similar to “sin(E)” where “E” represents an arbitrary expression.

2) Modify the (English) numeric input problem in Section 16.8.1 to handle negative numbers. The pattern should allow the use of the prefixes “negative” or “minus” to denote a negative number.

3) Modify the (English) numeric input problem in Section 16.8.1 to handle four byte unsigned integers.

4) Write your own “Adventure” game based on the programming techniques found in the “Madventure” game in Section 16.8.5.

5) Write a “tiny assembler” for the modern version of the x86 processor using the techniques found in Section 16.8.4.

6) Write a simple “DOS Shell” program that reads a line of text from the user and processes valid DOS commands found on that line. Handle at least the DEL, RENAME, TYPE, and COPY commands. See “MS-DOS, PC-BIOS, and File I/O” on page 699 for information concerning the implementation of these DOS commands.
16.11 Summary

This has certainly been a long chapter. The general topic of pattern matching receives insufficient attention in most textbooks. In fact, you rarely see more than a dozen or so pages dedicated to it outside of automata theory texts, compiler texts, or texts covering pattern matching languages like Icon or SNOBOL4. That is one of the main reasons this chapter is extensive: to help cover the paucity of information available elsewhere. However, there is another reason for the length of this chapter and, especially, the number of lines of code appearing in it: to demonstrate how easy it is to develop certain classes of programs using pattern matching techniques. Could you imagine having to write a program like Madventure using standard C or Pascal programming techniques? The resulting program would probably be longer than the assembly version appearing in this chapter! If you are not impressed with the power of pattern matching, you should probably reread this chapter. It is very surprising how few programmers truly understand the theory of pattern matching, especially considering how many programs use, or could benefit from, pattern matching techniques.

This chapter begins by discussing the theory behind pattern matching. It discusses simple patterns, known as regular languages, and describes how to design nondeterministic and deterministic finite state automata, the functions that match patterns described by regular expressions. This chapter also describes how to convert NFAs and DFAs into assembly language programs. For the details, see

• “An Introduction to Formal Language (Automata) Theory” on page 883
• “Machines vs. Languages” on page 883
• “Regular Languages” on page 884
• “Regular Expressions” on page 885
• “Nondeterministic Finite State Automata (NFAs)” on page 887
• “Converting Regular Expressions to NFAs” on page 888
• “Converting an NFA to Assembly Language” on page 890
• “Deterministic Finite State Automata (DFAs)” on page 893
• “Converting a DFA to Assembly Language” on page 895
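As a reminder of what a table-driven DFA with input classification looks like, here is a minimal sketch in Python rather than assembly language. It is a model under stated assumptions, not code from this chapter: the names classify, TRANSITIONS, ACCEPTING, and dfa_accepts are invented for illustration, and the DFA recognizes only the simplified constant [0-9]+ ( "." [0-9]* | ), without the exponent part.

```python
# Table-driven DFA for [0-9]+ ( "." [0-9]* | ), using input classification.
# State and class numbering are illustrative assumptions.

DIGIT, DOT, OTHER = 0, 1, 2     # input classes (columns of the state table)
REJECT = -1                     # "no transition" marker

TRANSITIONS = [
    # DIGIT   DOT     OTHER
    [1,       REJECT, REJECT],  # state 0: start, nothing seen yet
    [1,       2,      REJECT],  # state 1: inside the integer digits
    [2,       REJECT, REJECT],  # state 2: after ".", inside fraction digits
]
ACCEPTING = {1, 2}              # states in which the input may legally end


def classify(ch):
    """Map a character to its input class (the classification table)."""
    if "0" <= ch <= "9":
        return DIGIT
    if ch == ".":
        return DOT
    return OTHER


def dfa_accepts(text):
    """Run the DFA over text; True if it ends in an accepting state."""
    state = 0
    for ch in text:
        state = TRANSITIONS[state][classify(ch)]
        if state == REJECT:
            return False
    return state in ACCEPTING
```

Classifying the input first keeps the table at three columns instead of one column per character, which is the same space saving the assembly version obtains from its classification table.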
Although the regular languages are probably the most commonly processed patterns in modern pattern matching programs, they are also only a small subset of the possible types of patterns you can process in a program. The context free languages include all the regular languages as a subset and introduce many types of patterns that are not regular. To represent a context free language, we often use a context free grammar (CFG). A CFG contains a set of expressions known as productions. This set of productions, a set of nonterminal symbols, a set of terminal symbols, and a special nonterminal, the starting symbol, provide the basis for converting powerful patterns into a programming language.

In this chapter, we’ve covered a special set of the context free grammars known as LL(1) grammars. To properly encode a CFG as an assembly language program, you must first convert the grammar to an LL(1) grammar. This encoding yields a recursive descent predictive parser. The two primary steps required before converting a grammar to a program that recognizes strings in the context free language are to eliminate left recursion from the grammar and to left factor the grammar. After these two steps, it is relatively easy to convert a CFG to an assembly language program. For more information on CFGs, see

• “Context Free Languages” on page 900
• “Eliminating Left Recursion and Left Factoring CFGs” on page 903
• “Converting CFGs to Assembly Language” on page 905
• “Some Final Comments on CFGs” on page 912
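To make the effect of eliminating left recursion concrete, the following Python sketch shows a recursive descent parser for an addition-only expression grammar. It is an illustration under stated assumptions rather than code from this chapter: the left recursive production E -> E + T | T is assumed to have been rewritten as E -> T E', E' -> + T E' | (empty), constants are limited to unsigned integers, and the function names are hypothetical.

```python
# Recursive descent parser for the grammar (left recursion removed):
#   E  -> T E'
#   E' -> + T E' | (empty)
#   T  -> constant | ( E )
# Each function returns (value, next_position). Names are illustrative.


def parse_e(text, pos):
    """E -> T E', with the tail production E' written as a loop."""
    value, pos = parse_t(text, pos)
    while pos < len(text) and text[pos] == "+":
        rhs, pos = parse_t(text, pos + 1)   # E' -> + T E'
        value += rhs
    return value, pos                       # E' -> (empty)


def parse_t(text, pos):
    """T -> constant | ( E )"""
    if pos < len(text) and text[pos] == "(":
        value, pos = parse_e(text, pos + 1)
        if pos >= len(text) or text[pos] != ")":
            raise ValueError("expected ')' at position %d" % pos)
        return value, pos + 1
    start = pos
    while pos < len(text) and "0" <= text[pos] <= "9":
        pos += 1
    if pos == start:
        raise ValueError("expected a constant at position %d" % pos)
    return int(text[start:pos]), pos


def parse_expression(text):
    """Parse the whole string (E followed by end of string)."""
    value, pos = parse_e(text, 0)
    if pos != len(text):
        raise ValueError("unexpected input at position %d" % pos)
    return value
```

Because parse_e always calls parse_t first, and parse_t either consumes a character or fails, every recursive call makes progress through the input. That is exactly the property a left recursive production lacks.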
Sometimes it is easier to deal with regular expressions rather than context free grammars. Since CFGs are more powerful than regular expressions, this text generally adopts grammars wherever possible. However, regular expressions are generally easier to work with (for simple patterns), especially in the early stages of development. Sooner or later, though, you may need to convert a regular expression to a CFG so you can combine it with other components of the grammar. This is very easy to do and there is a simple algorithm to convert REs to CFGs. For more details, see

• “Converting REs to CFGs” on page 905
Although converting CFGs to assembly language is a straightforward process, it is very tedious. The UCR Standard Library includes a set of pattern matching routines that completely eliminate this tedium and provide many additional capabilities as well (such as automatic backtracking, allowing you to encode grammars that are not LL(1)). The pattern matching package in the Standard Library is probably the most novel and powerful set of routines available therein. You should definitely investigate the use of these routines; they can save you considerable time. For more information, see

• “The UCR Standard Library Pattern Matching Routines” on page 913
• “The Standard Library Pattern Matching Functions” on page 914
One neat feature the Standard Library provides is your ability to write customized pattern matching functions. In addition to letting you provide pattern matching facilities missing from the library, these pattern matching functions let you add semantic rules to your grammars. For all the details, see

• “Designing Your Own Pattern Matching Routines” on page 922
• “Extracting Substrings from Matched Patterns” on page 925
• “Semantic Rules and Actions” on page 929
Although the UCR Standard Library provides a powerful set of pattern matching routines, its richness may be its primary drawback. Those who encounter the Standard Library’s pattern matching routines for the first time may be overwhelmed, especially when attempting to reconcile the material in the section on context free grammars with the Standard Library patterns. Fortunately, there is a straightforward, if inefficient, way to translate CFGs into Standard Library patterns. This technique is outlined in

• “Constructing Patterns for the MATCH Routine” on page 933
Although pattern matching is a very powerful paradigm that most programmers should familiarize themselves with, most people have a hard time seeing the applications when they first encounter pattern matching. Therefore, this chapter concludes with some very complete programs that demonstrate pattern matching in action. These examples appear in the section:

• “Some Sample Pattern Matching Applications” on page 935
16.12 Questions

1) Assume that you have two inputs that are either zero or one. Create a DFA to implement the following logic functions (assume that arriving in a final state is equivalent to being true; if you wind up in a non-accepting state you return false):

   a) OR    b) XOR    c) NAND    d) NOR    e) Equals (XNOR)    f) AND

   [Figure: example DFA with transitions labeled “A Input” and “B Input”; the caption begins “Example, A” and the remainder was lost in extraction.]
2) If r, s, and t are regular expressions, what strings will the following regular expressions match?

   a) r*    b) r s    c) r+    d) r | s

3) Provide a regular expression for integers that allow commas every three digits as per U.S. syntax (e.g., for every three digits from the right of the number there must be exactly one comma). Do not allow misplaced commas.

4) Pascal real constants must have at least one digit before the decimal point. Provide a regular expression for FORTRAN real constants that does not have this restriction.

5) In many language systems (e.g., FORTRAN and C) there are two types of floating point numbers, single precision and double precision. Provide a regular expression for real numbers that allows the input of floating point numbers using any of the characters [dDeE] as the exponent symbol (d/D stands for double precision).

6) Provide an NFA that recognizes the mnemonics for the 886 instruction set.

7) Convert the NFA above into assembly language. Do not use the Standard Library pattern matching routines.

8) Repeat question (7) using the Standard Library pattern matching routines.

9) Create a DFA for Pascal identifiers.

10) Convert the above DFA to assembly code using straight assembly statements.

11) Convert the above DFA to assembly code using a state table with input classification. Describe the data in your classification table.

12) Eliminate left recursion from the following grammar:

    Stmt → if expression then Stmt endif
         | if expression then Stmt else Stmt endif
         | Stmt ; Stmt
         | ε

13) Left factor the grammar you produce in problem 12.

14) Convert the result from question (13) into assembly language without using the Standard Library pattern matching routines.

15) Convert the result from question (13) into assembly language using the Standard Library pattern matching routines.
16) Convert the regular expression obtained in question (3) to a set of productions for a context free grammar.

17) Why is the ARB matching function inefficient? Describe how the pattern (ARB “hello” ARB) would match the string “hello there”.

18) Spancset matches zero or more occurrences of some characters in a character set. Write a pattern matching function, callable as the first field of the pattern data type, that matches one or more occurrences of some character (feel free to look at the sources for spancset).

19) Write the matchichar pattern matching function that matches an individual character regardless of case (feel free to look at the sources for matchchar).

20) Explain how to use a pattern matching function to implement a semantic rule.

21) How would you extract a substring from a matched pattern?

22) What are parenthetical patterns? How do you create them?
Interrupts, Traps, and Exceptions
Chapter 17
The concept of an interrupt is something that has expanded in scope over the years. The 80x86 family has only added to the confusion surrounding interrupts by introducing the int (software interrupt) instruction. Indeed, different manufacturers have used terms like exceptions, faults, aborts, traps, and interrupts to describe the phenomena this chapter discusses. Unfortunately, there is no clear consensus as to the exact meaning of these terms. Different authors adopt different terms to their own use. While it is tempting to avoid the use of such misused terms altogether, for the purpose of discussion it would be nice to have a set of well defined terms we can use in this chapter. Therefore, we will pick three of the terms above, interrupts, traps, and exceptions, and define them. This chapter attempts to use the most common meanings for these terms, but don’t be surprised to find other texts using them in different contexts.

On the 80x86, there are three types of events commonly known as interrupts: traps, exceptions, and interrupts (hardware interrupts). This chapter will describe each of these forms and discuss their support on the 80x86 CPUs and PC compatible machines.

Although the terms trap and exception are often used synonymously, we will use the term trap to denote a programmer initiated and expected transfer of control to a special handler routine. In many respects, a trap is nothing more than a specialized subroutine call. Many texts refer to traps as software interrupts. The 80x86 int instruction is the main vehicle for executing a trap. Note that traps are usually unconditional; that is, when you execute an int instruction, control always transfers to the procedure associated with the trap. Since traps execute via an explicit instruction, it is easy to determine exactly which instructions in a program will invoke a trap handling routine.
An exception is an automatically generated trap (coerced rather than requested) that occurs in response to some exceptional condition. Generally, there isn’t a specific instruction associated with an exception[1]; instead, an exception occurs in response to some degenerate behavior of normal 80x86 program execution. Examples of conditions that may raise (cause) an exception include executing a division instruction with a zero divisor, executing an illegal opcode, and a memory protection fault. Whenever such a condition occurs, the CPU immediately suspends execution of the current instruction and transfers control to an exception handler routine. This routine can decide how to handle the exceptional condition; it can attempt to rectify the problem or abort the program and print an appropriate error message. Although you do not generally execute a specific instruction to cause an exception, as with the software interrupts (traps), execution of some instruction is what causes an exception. For example, you only get a division error when executing a division instruction somewhere in a program.

1. The into instruction is an exception to this rule; we will nevertheless classify it in this category.

Hardware interrupts, the third category that we will refer to simply as interrupts, are program control interruptions based on an external hardware event (external to the CPU). These interrupts generally have nothing at all to do with the instructions currently executing; instead, some event, such as pressing a key on the keyboard or a time out on a timer chip, informs the CPU that a device needs some attention. The CPU interrupts the currently executing program, services the device, and then returns control back to the program.

An interrupt service routine is a procedure written specifically to handle a trap, exception, or interrupt. Although different phenomena cause traps, exceptions, and interrupts, the structure of an interrupt service routine, or ISR, is approximately the same for each of these.
17.1 80x86 Interrupt Structure and Interrupt Service Routines (ISRs)

Despite the different causes of traps, exceptions, and interrupts, they share a common format for their handling routines. Of course, these interrupt service routines will perform different activities depending on the source of the invocation, but it is quite possible to write a single interrupt handling routine that processes traps, exceptions, and hardware interrupts. This is rarely done, but the structure of the 80x86 interrupt system allows this. This section will describe the 80x86’s interrupt structure and how to write basic interrupt service routines for the 80x86 real mode interrupts.

The 80x86 chips allow up to 256 vectored interrupts. This means that you can have up to 256 different sources for an interrupt and the 80x86 will directly call the service routine for that interrupt without any software processing. This is in contrast to nonvectored interrupts that transfer control directly to a single interrupt service routine, regardless of the interrupt source.

The 80x86 provides a 256 entry interrupt vector table beginning at address 0:0 in memory. This is a 1K table containing 256 4-byte entries. Each entry in this table contains a segmented address that points at the interrupt service routine in memory. Generally, we will refer to interrupts by their index into this table, so interrupt zero’s address (vector) is at memory location 0:0, interrupt one’s vector is at address 0:4, interrupt two’s vector is at address 0:8, etc.

When an interrupt occurs, regardless of source, the 80x86 does the following:

1) The CPU pushes the flags register onto the stack.

2) The CPU pushes a far return address (segment:offset) onto the stack, segment value first.

3) The CPU determines the cause of the interrupt (i.e., the interrupt number) and fetches the four byte interrupt vector from address 0:vector*4.

4) The CPU transfers control to the routine specified by the interrupt vector table entry.
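The four steps above can be traced with a small model. The following Python sketch is only an illustration, not a description of the hardware beyond what the steps state: memory is a bytearray standing in for segment zero, the stack is a Python list, and the function names vector_address, read_vector, enter_interrupt, and iret are invented for this example. It assumes each table entry holds the offset in its low word and the segment in its high word.

```python
# Model of real mode interrupt entry and return. The "memory" is the first
# 1K of segment zero; the "stack" is a list whose end is the top of stack.
# All function names here are illustrative assumptions.


def vector_address(int_number):
    """Offset of an interrupt's 4-byte vector within segment zero."""
    if not 0 <= int_number <= 255:
        raise ValueError("interrupt number must be in the range 0..255")
    return int_number * 4


def read_vector(memory, int_number):
    """Fetch (segment, offset) from the table: offset word, then segment word."""
    base = vector_address(int_number)
    offset = memory[base] | (memory[base + 1] << 8)
    segment = memory[base + 2] | (memory[base + 3] << 8)
    return segment, offset


def enter_interrupt(memory, stack, flags, cs, ip, int_number):
    """Steps 1-4: push flags, push the far return address, fetch the vector."""
    stack.append(flags)             # step 1: flags
    stack.append(cs)                # step 2: segment value first...
    stack.append(ip)                #         ...then the offset
    return read_vector(memory, int_number)   # steps 3 and 4: new cs:ip


def iret(stack):
    """Interrupt return: pop the far return address, then the flags."""
    ip = stack.pop()
    cs = stack.pop()
    flags = stack.pop()
    return flags, cs, ip
```

For instance, vector_address(2) is 8, matching the statement that interrupt two’s vector lives at address 0:8. A plain far return would pop only the two address words, which is why the model’s iret pops a third value as well.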
After the completion of these steps, the interrupt service routine takes control. When the interrupt service routine wants to return control, it must execute an iret (interrupt return) instruction. The interrupt return pops the far return address and the flags off the stack. Note that executing a far return is insufficient since that would leave the flags on the stack.

There is one minor difference between how the 80x86 processes hardware interrupts and other types of interrupts: upon entry into the hardware interrupt service routine, the 80x86 disables further hardware interrupts by clearing the interrupt flag. Traps and exceptions do not do this. If you want to disallow further hardware interrupts within a trap or exception handler, you must explicitly clear the interrupt flag with a cli instruction. Conversely, if you want to allow interrupts within a hardware interrupt service routine, you must explicitly turn them back on with an sti instruction. Note that the 80x86’s interrupt disable flag only affects hardware interrupts. Clearing the interrupt flag will not prevent the execution of a trap or exception.

ISRs are written like almost any other assembly language procedure except that they return with an iret instruction rather than ret. Although the distance of the ISR procedure (near vs. far) is usually of no significance, you should make all ISRs far procedures. This will make programming easier if you decide to call an ISR directly rather than using the normal interrupt handling mechanism.

Exception and hardware interrupt ISRs have a very special restriction: they must preserve the state of the CPU. In particular, these ISRs must preserve all registers they modify. Consider the following extremely simple ISR:

SimpleISR       proc    far
                mov     ax, 0
                iret
SimpleISR       endp

This ISR obviously does not preserve the machine state; it explicitly disturbs the value in ax and then returns from the interrupt. Suppose you were executing the following code segment when a hardware interrupt transferred control to the above ISR:

                mov     ax, 5
                add     ax, 2
; Suppose the interrupt occurs here.
                puti
                 .
                 .
                 .
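What this ISR does to the sequence above depends on where the interrupt lands. The following Python sketch is a model only; the function name run_sequence and the instruction list are illustrative assumptions, not code from this text. It replays the three instructions and lets the register-clobbering ISR run just before a chosen instruction.

```python
# Model of an ISR that fails to preserve AX. The "program" is the three
# instruction sequence from the text; the ISR's only effect is AX = 0.


def run_sequence(interrupt_before=None):
    """Return the value puti would print.

    interrupt_before is the index of the instruction the interrupt arrives
    just before (0, 1, or 2), or None if no interrupt occurs.
    """
    program = ["mov ax, 5", "add ax, 2", "puti"]
    ax = 0
    printed = None
    for index, instruction in enumerate(program):
        if index == interrupt_before:
            ax = 0                      # SimpleISR: mov ax, 0 / iret
        if instruction == "mov ax, 5":
            ax = 5
        elif instruction == "add ax, 2":
            ax = (ax + 2) & 0xFFFF      # 16-bit register arithmetic
        elif instruction == "puti":
            printed = ax
    return printed
```

Under this model, run_sequence(None) and run_sequence(0) both give 7, run_sequence(1) gives 2, and run_sequence(2) gives 0. An ISR that pushed ax on entry and popped it before iret would give 7 in every case.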
The interrupt service routine would set the ax register to zero and your program would print zero rather than the value seven. Worse yet, hardware interrupts are generally asynchronous, meaning they can occur at any time and rarely do they occur at the same spot in a program. Therefore, the code sequence above would print seven most of the time; once in a great while it might print zero or two (it will print two if the interrupt occurs between the mov ax, 5 and add ax, 2 instructions). Bugs in hardware interrupt service routines are very difficult to find, because such bugs often affect the execution of unrelated code.

The solution to this problem, of course, is to make sure you preserve all registers you use in the interrupt service routine for hardware interrupts and exceptions. Since trap calls are explicit, the rules for preserving the state of the machine in such programs are identical to those for procedures.

Writing an ISR is only the first step to implementing an interrupt handler. You must also initialize the interrupt vector table entry with the address of your ISR. There are two common ways to accomplish this: store the address directly in the interrupt vector table or call DOS and let DOS do the job for you.

Storing the address yourself is an easy task. All you need to do is load a segment register with zero (since the interrupt vector table is in segment zero) and store the four byte address at the appropriate offset within that segment. The following code sequence initializes the entry for interrupt 255 with the address of the SimpleISR routine presented earlier:

                mov     ax, 0
                mov     es, ax
                pushf
                cli
                mov     word ptr es:[0ffh*4], offset SimpleISR
                mov     word ptr es:[0ffh*4 + 2], seg SimpleISR
                popf
Note how this code turns off the interrupts while changing the interrupt vector table. This is important if you are patching a hardware interrupt vector because it wouldn’t do for the interrupt to occur between the last two mov instructions above; at that point the interrupt vector is in an inconsistent state and invoking the interrupt at that point would transfer control to the offset of SimpleISR and the segment of the previous interrupt 0FFh handler. This, of course, would be a disaster. The instructions that turn off the interrupts while patching the vector are unnecessary if you are patching in the address of a trap or exception handler [2].

Perhaps a better way to initialize an interrupt vector is to use DOS’ Set Interrupt Vector call. Calling DOS (see “MS-DOS, PC-BIOS, and File I/O” on page 699) with ah equal to 25h provides this function. This call expects an interrupt number in the al register and the address of the interrupt service routine in ds:dx. The call to MS-DOS that would accomplish the same thing as the code above is

                mov     ax, 25ffh               ;AH=25h, AL=0FFh.
                mov     dx, seg SimpleISR       ;Load DS:DX with
                mov     ds, dx                  ; address of ISR
                lea     dx, SimpleISR
                int     21h                     ;Call DOS
                mov     ax, dseg                ;Restore DS so it
                mov     ds, ax                  ; points back at DSEG.

2. Strictly speaking, this code sequence does not require the pushf, cli, and popf instructions because interrupt 255 does not correspond to any hardware interrupt on a typical PC machine. However, it is important to provide this example so you’re aware of the problem.
Although this code sequence is a little more complex than poking the data directly into the interrupt vector table, it is safer. Many programs monitor changes made to the interrupt vector table through DOS. If you call DOS to change an interrupt vector table entry, those programs will become aware of your changes. If you circumvent DOS, those programs may not find out that you’ve patched in your own interrupt and could malfunction.

Generally, it is a very bad idea to patch the interrupt vector table and not restore the original entry before your program terminates. Well behaved programs always save the previous value of an interrupt vector table entry and restore this value before termination. The following code sequences demonstrate how to do this. First, by patching the table directly:

                mov     ax, 0
                mov     es, ax

; Save the current entry in the dword variable IntVectSave:

                mov     ax, es:[IntNumber*4]
                mov     word ptr IntVectSave, ax
                mov     ax, es:[IntNumber*4 + 2]
                mov     word ptr IntVectSave+2, ax

; Patch the interrupt vector table with the address of our ISR

                pushf                           ;Required if this is a hw interrupt.
                cli                             ; "    "    "    "    "    "
                mov     word ptr es:[IntNumber*4], offset OurISR
                mov     word ptr es:[IntNumber*4+2], seg OurISR
                popf                            ;Required if this is a hw interrupt.

; Okay, do whatever it is that this program is supposed to do:
                 .
                 .
                 .

; Restore the interrupt vector entries before quitting:

                mov     ax, 0
                mov     es, ax
                pushf                           ;Required if this is a hw interrupt.
                cli                             ; "    "    "    "    "    "
                mov     ax, word ptr IntVectSave
                mov     es:[IntNumber*4], ax
                mov     ax, word ptr IntVectSave+2
                mov     es:[IntNumber*4 + 2], ax
                popf                            ;Required if this is a hw interrupt.
                 .
                 .
                 .
If you would prefer to call DOS to save and restore the interrupt vector table entries, you can obtain the address of an existing interrupt table entry using the DOS Get Interrupt Vector call. This call, with ah=35h, expects the interrupt number in al; it returns the existing vector for that interrupt in the es:bx registers. Sample code that preserves the interrupt vector using DOS is

; Save the current entry in the dword variable IntVectSave:

                mov     ax, 3500h + IntNumber   ;AH=35h, AL=Int #.
                int     21h
                mov     word ptr IntVectSave, bx
                mov     word ptr IntVectSave+2, es

; Patch the interrupt vector table with the address of our ISR

                mov     dx, seg OurISR
                mov     ds, dx
                lea     dx, OurISR
                mov     ax, 2500h + IntNumber   ;AH=25h, AL=Int #.
                int     21h

; Okay, do whatever it is that this program is supposed to do:
                 .
                 .
                 .

; Restore the interrupt vector entries before quitting (DS must again
; point at the segment containing IntVectSave at this point):

                lds     dx, IntVectSave         ;DOS expects the address in DS:DX.
                mov     ax, 2500h+IntNumber     ;AH=25h, AL=Int #.
                int     21h
                 .
                 .
                 .
17.2 Traps

A trap is a software-invoked interrupt. To execute a trap, you use the 80x86 int (software interrupt) instruction [3]. There are only two primary differences between a trap and an arbitrary far procedure call: the instruction you use to call the routine (int vs. call) and the fact that a trap pushes the flags on the stack so you must use the iret instruction to return from it. Otherwise, there really is no difference between a trap handler’s code and the body of a typical far procedure.

The main purpose of a trap is to provide a fixed subroutine that various programs can call without having to actually know the run-time address. MS-DOS is the perfect example. The int 21h instruction is an example of a trap invocation. Your programs do not have to know the actual memory address of DOS’ entry point to call DOS. Instead, DOS patches the interrupt 21h vector when it loads into memory. When you execute int 21h, the 80x86 automatically transfers control to DOS’ entry point, wherever in memory that happens to be.

There is a long list of support routines that use the trap mechanism to link application programs to themselves. DOS, BIOS, the mouse drivers, and Netware are a few examples. Generally, you would use a trap to call a resident program function. Resident programs (see “Resident Programs” on page 1025) load themselves into memory and remain resident once they terminate. By patching an interrupt vector to point at a subroutine within the resident code, other programs that run after the resident program terminates can call the resident subroutines by executing the appropriate int instruction.

Most resident programs do not use a separate interrupt vector entry for each function they provide. Instead, they usually patch a single interrupt vector and transfer control to an appropriate routine using a function number that the caller passes in a register. By convention, most resident programs expect the function number in the ah register.
A typical trap handler would execute a case statement on the value in the ah register and transfer control to the appropriate handler function.
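The dispatch just described might be sketched as follows. This is only an illustration under stated assumptions: the names MyTrap, FuncTable, Func0, and Func1 are hypothetical, the handler supports just two function numbers (0 and 1), and the table lives in the code segment so the handler can reach it through cs.

```asm
; A sketch only. MyTrap, FuncTable, Func0, and Func1 are hypothetical names.
; The functions in this sketch take no parameters in BX.

FuncTable       word    Func0, Func1            ;One near offset per function.
NumFuncs        =       ($-FuncTable)/2

MyTrap          proc    far
                cmp     ah, NumFuncs            ;Function number out of range?
                jae     TrapDone                ;Yes: ignore the request.
                push    bx
                mov     bl, ah                  ;Zero extend the function number
                mov     bh, 0                   ; into BX.
                shl     bx, 1                   ;Two bytes per table entry.
                call    cs:FuncTable[bx]        ;Near indirect call via the table.
                pop     bx
TrapDone:       iret
MyTrap          endp

Func0           proc    near
                mov     al, 0                   ;Placeholder body.
                ret
Func0           endp

Func1           proc    near
                mov     al, 1                   ;Placeholder body.
                ret
Func1           endp
```

Because iret reloads the caller's flags from the stack, functions dispatched this way cannot hand back a status in the flags unless the handler also modifies the flags image on the stack; returning a value in a register, as the placeholders do, avoids that issue.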
3. You can also simulate an int instruction by pushing the flags and executing a far call to the trap handler. We will consider this mechanism later on.
Since trap handlers are virtually identical to far procedures in terms of use, we will not discuss traps in any more detail here. However, the next chapter will explore this subject in greater depth when it discusses resident programs.
17.3 Exceptions

Exceptions occur (are raised) when an abnormal condition occurs during execution. There are fewer than eight possible exceptions on machines running in real mode. Protected mode execution provides many others, but we will not consider those here; we will only consider those exceptions interesting to those working in real mode [4].

Although exception handlers are user defined, the 80x86 hardware defines the exceptions that can occur. The 80x86 also assigns a fixed interrupt number to each of the exceptions. The following sections describe each of these exceptions in detail.

In general, an exception handler should preserve all registers. However, there are several special cases where you may want to tweak a register value before returning. For example, if you get a bounds violation, you may want to modify the value in the register specified by the bound instruction before returning. Nevertheless, you should not arbitrarily modify registers in an exception handling routine unless you intend to immediately abort the execution of your program.
17.3.1 Divide Error Exception (INT 0)

This exception occurs whenever you attempt to divide a value by zero or the quotient does not fit in the destination register when using the div or idiv instructions. Note that the FPU’s fdiv and fdivr instructions do not raise this exception.

MS-DOS provides a generic divide exception handler that prints a message like “divide error” and returns control to MS-DOS. If you want to handle division errors yourself, you must write your own exception handler and patch the address of this routine into location 0:0.

On 8086, 8088, 80186, and 80188 processors, the return address on the stack points at the next instruction after the divide instruction. On the 80286 and later processors, the return address points at the beginning of the divide instruction (including any prefix bytes that appear). When a divide exception occurs, the 80x86 registers are unmodified; that is, they contain the values they held when the 80x86 first executed the div or idiv instruction.

When a divide exception occurs, there are three reasonable things you can attempt: abort the program (the easy way out), jump to a section of code that attempts to continue program execution in view of the error (e.g., ask the user to reenter a value), or attempt to figure out why the error occurred, correct it, and reexecute the division instruction. Few people choose this last alternative because it is so difficult.
4. For more details on exceptions in protected mode, see the bibliography.

17.3.2 Single Step (Trace) Exception (INT 1)

The single step exception occurs after every instruction if the trace bit in the flags register is equal to one. Debuggers and other programs will often set this flag so they can trace the execution of a program.

When this exception occurs, the return address on the stack is the address of the next instruction to execute. The trap handler can decode this opcode and decide how to proceed. Most debuggers use the trace exception to check for watchpoints and other events that change dynamically during program execution. Debuggers that use the trace exception for single stepping often disassemble the next instruction using the return address on the stack as a pointer to that instruction’s opcode bytes.

Generally, a single step exception handler should preserve all 80x86 registers and other state information. However, you will see an interesting use of the trace exception later in this text where we will purposely modify register values to make one instruction behave like another (see “The PC Keyboard” on page 1153).

Interrupt one is also shared by the debugging exception capabilities of 80386 and later processors. These processors provide on-chip support via debugging registers. If some condition occurs that matches a value in one of the debugging registers, the 80386 and later CPUs will generate a debugging exception that uses interrupt vector one.
17.3.3 Breakpoint Exception (INT 3)

The breakpoint exception is actually a trap, not an exception. It occurs when the CPU executes an int 3 instruction. However, we will consider it an exception since programmers rarely put int 3 instructions directly into their programs. Instead, a debugger like CodeView often manages the placement and removal of int 3 instructions.

When the 80x86 calls a breakpoint exception handling routine, the return address on the stack is the address of the next instruction after the breakpoint opcode. Note, however, that there are actually two int instructions that transfer control through this vector. Generally, though, it is the one-byte int 3 instruction whose opcode is 0cch; otherwise it is the two byte equivalent: 0cdh, 03h.
17.3.4 Overflow Exception (INT 4/INTO)

The overflow exception, like int 3, is technically a trap. The CPU only raises this exception when you execute an into instruction and the overflow flag is set. If the overflow flag is clear, the into instruction is effectively a nop; if the overflow flag is set, into behaves like an int 4 instruction. Programmers can insert an into instruction after an integer computation to check for an arithmetic overflow. Using into is equivalent to the following code sequence:

        « Some integer arithmetic code »
                jno     GoodCode
                int     4
GoodCode:
One big advantage to the into instruction is that it does not flush the pipeline or prefetch queue if the overflow flag is not set. Therefore, using the into instruction is a good technique if you provide a single overflow handler (that is, you don’t have some special code for each sequence where an overflow could occur). The return address on the stack is the address of the next instruction after into. Generally, an overflow handler does not return to that address. Instead, it will usually abort the program or pop the return address and flags off the stack and attempt the computation in a different way.
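As a small illustration of where into would sit in ordinary code, consider the following sketch. Value1, Value2, and Sum are hypothetical word variables, and the sketch assumes that a suitable int 4 handler has already been installed.

```asm
; A sketch only: Value1, Value2, and Sum are hypothetical word variables,
; and an int 4 handler is assumed to be installed already.

                mov     ax, Value1
                add     ax, Value2      ;Signed 16-bit addition.
                into                    ;Traps through int 4 only if OF=1.
                mov     Sum, ax         ;Reached when no overflow occurred
                                        ; (or if the handler returns here).
```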
17.3.5 Bounds Exception (INT 5/BOUND)

Like into, the bound instruction (see “The INT, INTO, BOUND, and IRET Instructions” on page 292) will cause a conditional exception. If the specified register is outside the specified bounds, the bound instruction is equivalent to an int 5 instruction; if the register is within the specified bounds, the bound instruction is effectively a nop.

The return address that bound pushes is the address of the bound instruction itself, not the instruction following bound. If you return from the exception without modifying the value in the register (or adjusting the bounds), you will generate an infinite loop because the code will reexecute the bound instruction and repeat this process over and over again.

One sneaky trick with the bound instruction is to generate a global minimum and maximum for an array of signed integers. The following code demonstrates how you can do this:

; This program demonstrates how to compute the minimum and maximum values
; for an array of signed integers using the bound instruction

                .xlist
                .286
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'

; The following two values contain the bounds for the BOUND instruction.

LowerBound      word    ?
UpperBound      word    ?

; Save the INT 5 address here:

OldInt5         dword   ?

; Here is the array we want to compute the minimum and maximum for:

Array           word    1, 2, -5, 345, -26, 23, 200, 35, -100, 20, 45
                word    62, -30, -1, 21, 85, 400, -265, 3, 74, 24, -2
                word    1024, -7, 1000, 100, -1000, 29, 78, -87, 60
ArraySize       =       ($-Array)/2

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; Our interrupt 5 ISR. It compares the value in AX with the upper and
; lower bounds and stores AX in one of them (we know AX is out of range
; by virtue of the fact that we are in this ISR).
;
; Note: in this particular case, we know that DS points at dseg, so this
; ISR will get cheap and not bother reloading it.
;
; Warning: This code does not handle the conflict between bound/int5 and
; the print screen key. Pressing prtsc while executing this code may
; produce incorrect results (see the text).

BoundISR        proc    near
                cmp     ax, LowerBound
                jl      NewLower

; Must be an upper bound violation.

                mov     UpperBound, ax
                iret

NewLower:       mov     LowerBound, ax
                iret
BoundISR        endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                meminit

; Begin by patching in the address of our ISR into int 5's vector.

                mov     ax, 0
                mov     es, ax
                mov     ax, es:[5*4]
                mov     word ptr OldInt5, ax
                mov     ax, es:[5*4 + 2]
                mov     word ptr OldInt5+2, ax

                mov     word ptr es:[5*4], offset BoundISR
                mov     es:[5*4 + 2], cs

; Okay, process the array elements. Begin by initializing the upper
; and lower bounds values with the first element of the array.

                mov     ax, Array
                mov     LowerBound, ax
                mov     UpperBound, ax

; Now process each element of the array:

                mov     bx, 2                   ;Start with second element.
                mov     cx, ArraySize-1         ;First element is already handled.
GetMinMax:      mov     ax, Array[bx]
                bound   ax, LowerBound
                add     bx, 2                   ;Move on to next element.
                loop    GetMinMax               ;Repeat for each element.

                printf
                byte    "The minimum value is %d\n"
                byte    "The maximum value is %d\n",0
                dword   LowerBound, UpperBound

; Okay, restore the interrupt vector:

                mov     ax, 0
                mov     es, ax
                mov     ax, word ptr OldInt5
                mov     es:[5*4], ax
                mov     ax, word ptr OldInt5+2
                mov     es:[5*4+2], ax

Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

If the array is large and the values appearing in the array are relatively random, this code demonstrates a fast way to determine the minimum and maximum values in the array. The alternative, comparing each element against the upper and lower bounds and storing the value if outside the range, is generally a slower approach. True, if the bound instruction causes a trap, this is much slower than the compare and store method. However, in a large array with random values, the bounds violation will rarely occur. Most of the time the bound instruction will execute in 7-13 clock cycles and it will not flush the pipeline or the prefetch queue [5].
Warning: IBM, in their infinite wisdom, decided to use int 5 as the print screen operation. The default int 5 handler will dump the current contents of the screen to the printer. This has two implications for those who would like to use the bound instruction in their programs. First, if you do not install your own int 5 handler and you execute a bound instruction that generates a bound exception, you will cause the machine to print the contents of the screen. Second, if you press the PrtSc key with your int 5 handler installed, BIOS will invoke your handler. The former case is a programming error, but this latter case means you have to make your bounds exception handler a little smarter. Because an int instruction pushes the address of the instruction that follows it, the handler should look at the two bytes immediately preceding the return address. If these bytes form an int 5 instruction (0cdh, 05h), then you need to call the original int 5 handler, or simply return from interrupt (do you want them pressing the PrtSc key at that point?). If they do not, then this exception was probably raised by the bound instruction. Note that when a bound instruction raises the exception, the return address may not be pointing directly at a bound opcode (62h). It may be pointing at a prefix byte to the bound instruction (e.g., segment, addressing mode, or size override). Therefore, it is best to check for the int 5 opcode.
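A "smarter" handler along these lines might look like the following sketch. It rests on several assumptions: SmartBoundISR is a hypothetical replacement for the BoundISR routine in the sample program, DS points at dseg whenever a bound instruction executes, and an explicit int 5 is simply ignored rather than passed on to the original handler. Because the int instruction pushes the address of the instruction that follows it, the sketch examines the two bytes just before the return address (0cdh, 05h) rather than the byte at the return address.

```asm
; A sketch only. SmartBoundISR is a hypothetical name; LowerBound and
; UpperBound are the variables from the sample program, and DS is assumed
; to point at dseg whenever a bound instruction executes.

SmartBoundISR   proc    far
                push    bp
                mov     bp, sp                  ;[bp+2] = return IP, [bp+4] = return CS.
                push    es
                push    bx
                les     bx, dword ptr [bp+2]    ;ES:BX = return address.
                cmp     byte ptr es:[bx-1], 05h ;Second byte of "int 5"?
                jne     Checked                 ;No: ZF is clear at Checked.
                cmp     byte ptr es:[bx-2], 0cdh ;First byte of "int 5"?
Checked:        pop     bx                      ;POP leaves the flags alone.
                pop     es
                pop     bp
                je      WasInt5                 ;ZF set only if both bytes matched.

; Otherwise assume a bound instruction raised the exception.

                cmp     ax, LowerBound
                jl      NewLower
                mov     UpperBound, ax
                iret

NewLower:       mov     LowerBound, ax
                iret

; An explicit int 5 (for example, the PrtSc key): this sketch ignores it.

WasInt5:        iret
SmartBoundISR   endp
```

The test is a heuristic. A bound instruction that happens to be preceded in memory by the bytes 0cdh, 05h would be mistaken for an int 5, so a production handler may want additional checks.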
17.3.6 Invalid Opcode Exception (INT 6)

The 80286 and later processors raise this exception if you attempt to execute an opcode that does not correspond to a legal 80x86 instruction. These processors also raise this exception if you attempt to execute a bound, lds, les, lidt, or other instruction that requires a memory operand but you specify a register operand in the mod/rm field of the mod/reg/rm byte.

The return address on the stack points at the illegal opcode. By examining this opcode, you can extend the instruction set of the 80x86. For example, you could run 80486 code on an 80386 processor by providing subroutines that mimic the extra 80486 instructions (like bswap, cmpxchg, etc.).
17.3.7 Coprocessor Not Available (INT 7)

The 80286 and later processors raise this exception if you attempt to execute an FPU (or other coprocessor) instruction without having the coprocessor installed. You can use this exception to simulate the coprocessor in software. On entry to the exception handler, the return address points at the coprocessor opcode that generated the exception.
5. Note that on the 80486 and later processors, the bound instruction may actually be slower than the corresponding straight line code.

17.4 Hardware Interrupts

Hardware interrupts are the form most engineers (as opposed to PC programmers) associate with the term interrupt. We will adopt this same strategy henceforth and will use the non-modified term “interrupt” to mean a hardware interrupt.

On the PC, interrupts come from many different sources. The primary sources of interrupts, however, are the PC’s timer chip, keyboard, serial ports, parallel ports, disk drives, CMOS real-time clock, mouse, sound cards, and other peripheral devices. These devices connect to an Intel 8259A programmable interrupt controller (PIC) that prioritizes the interrupts and interfaces with the 80x86 CPU. The 8259A chip adds considerable complexity to the software that processes interrupts, so it makes perfect sense to discuss the PIC first, before trying to describe how the interrupt service routines have to deal with it. Afterwards, this section will briefly describe each device and the conditions under which it interrupts the CPU. This text will fully describe many of these devices in later chapters, so this chapter will not go into a lot of detail except when discussing the timer interrupt.
17.4.1 The 8259A Programmable Interrupt Controller (PIC)

The 8259A (8259 [6] or PIC, hereafter) programmable interrupt controller chip accepts interrupts from up to eight different devices. If any one of the devices requests service, the 8259 will toggle an interrupt output line (connected to the CPU) and pass a programmable interrupt vector to the CPU. You can cascade the device to support up to 64 devices by connecting nine 8259s together: eight of the devices with eight inputs each whose outputs become the eight inputs of the ninth device. A typical PC uses two of these devices to provide 15 interrupt inputs (seven on the master PIC with the eighth input coming from the slave PIC to process its eight inputs) [7]. The sections following this one will describe the devices connected to each of those inputs; for now we will concentrate on what the 8259 does with those inputs. Nevertheless, for the sake of discussion, the following table lists the interrupt sources on the PC:
Table 66: 8259 Programmable Interrupt Controller Inputs

Input on 8259   80x86 INT   Device
-------------   ---------   ------------------------------------------------
IRQ 0           8           Timer chip
IRQ 1           9           Keyboard
IRQ 2           0Ah         Cascade for controller 2 (IRQ 8-15)
IRQ 3           0Bh         Serial port 2
IRQ 4           0Ch         Serial port 1
IRQ 5           0Dh         Parallel port 2 in AT, reserved in PS/2 systems
IRQ 6           0Eh         Diskette drive
IRQ 7           0Fh         Parallel port 1
IRQ 8/0         70h         Real-time clock
IRQ 9/1         71h         CGA vertical retrace (and other IRQ 2 devices)
IRQ 10/2        72h         Reserved
IRQ 11/3        73h         Reserved
IRQ 12/4        74h         Reserved in AT, auxiliary device on PS/2 systems
IRQ 13/5        75h         FPU interrupt
IRQ 14/6        76h         Hard disk controller
IRQ 15/7        77h         Reserved
6. The original 8259 was designed for Intel’s 8080 system. The 8259A provided support for the 80x86 and some other features. Since almost no one uses 8259 chips anymore, this text will use the generic term 8259.

7. The original IBM PC and PC/XT machines only supported eight interrupts via one 8259 chip. IBM, and virtually all clone manufacturers, added the second PIC in PC/AT and later designs.

The 8259 PIC is a very complex chip to program. Fortunately, all of the hard stuff has already been done for you by the BIOS when the system boots. We will not discuss how to initialize the 8259 in this text because that information is only useful to those writing operating systems like Linux, Windows, or OS/2. If you want your interrupt service routines to run correctly under DOS or any other OS, you must not reinitialize the PIC.

The PICs interface to the system through four I/O locations: ports 20h/0A0h and 21h/0A1h. The first address in each pair is the address of the master PIC (IRQ 0-7); the second address in each pair corresponds to the slave PIC (IRQ 8-15). Port 20h/0A0h is a read/write location to which you write PIC commands and from which you read PIC status; we will refer to this as the command register or the status register. The command register is write only, the status register is read only. They just happen to share the same I/O location. The read/write lines on the PIC determine which register the CPU accesses. Port 21h/0A1h is a read/write location that contains the interrupt mask register; we will refer to this as the mask register. Choose the appropriate address depending upon which interrupt controller you want to use.

The interrupt mask register is an eight bit register that lets you individually enable and disable interrupts from devices on the system. This is similar to the actions of the cli and sti instructions, but on a device by device basis. Writing a zero to the corresponding bit enables that device’s interrupts. Writing a one disables interrupts from the affected device. Note that this is non-intuitive. Figure 17.1 provides the layout of the interrupt mask register.

Figure 17.1 8259 Interrupt Mask Register. Bits 0 through 7 of the register at controller address 21h correspond to IRQ 0 through IRQ 7; bits 0 through 7 of the register at controller address 0A1h correspond to IRQ 8 through IRQ 15. To disable a specific device's interrupt, write a one to the mask register. To enable a specific device's interrupt, write a zero to the mask register.

When changing bits in the mask register, it is important that you not simply load al with a value and output it directly to the mask register port. Instead, you should read the mask register and then logically OR in or AND out the bits you want to change; finally, you can write the result back to the mask register. The following code sequence enables COM1: interrupts without affecting any others:

                in      al, 21h         ;Read existing bits.
                and     al, 0efh        ;Turn on IRQ 4 (COM1).
                out     21h, al         ;Write result back to PIC.
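The complementary operation, disabling COM1: interrupts again, follows the same read-modify-write pattern but ORs in the bit instead. The sketch below also brackets the sequence with pushf/cli/popf; that is an added precaution (an assumption of this sketch, not something the mask register requires) so that an ISR cannot change the mask between the in and the out.

```asm
; A sketch only: mask (disable) IRQ 4 on the master PIC.

                pushf                   ;Save the current interrupt flag.
                cli                     ;Keep the read-modify-write atomic.
                in      al, 21h         ;Read existing mask bits.
                or      al, 10h         ;Set bit 4: turn off IRQ 4 (COM1).
                out     21h, al         ;Write result back to PIC.
                popf                    ;Restore the interrupt flag.
```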
The command register provides lots of options, but there are only three commands you would want to execute on this chip that are compatible with the BIOS’ initialization of the 8259: sending an end of interrupt command and sending one of two read status register commands.

Once a specific interrupt occurs, the 8259 masks all further interrupts from that device until it receives an end of interrupt signal from the interrupt service routine. On PCs running DOS, you accomplish this by writing the value 20h to the command register. The following code does this:

                mov     al, 20h
                out     20h, al         ;Port 0A0h if IRQ 8-15.

You must send exactly one end of interrupt command to the PIC for each interrupt you service. If you do not send the end of interrupt command, the PIC will not honor any more interrupts from that device; if you send two or more end of interrupt commands, there is the possibility that you will accidentally acknowledge a new interrupt that may be pending and you will lose that interrupt.

For some interrupt service routines you write, your ISR will not be the only ISR that an interrupt invokes. For example, the PC’s BIOS provides an ISR for the timer interrupt that maintains the time of day. If you patch into the timer interrupt, you will need to call the PC BIOS’ timer ISR so the system can properly maintain the time of day and handle other timing related chores (see “Chaining Interrupt Service Routines” on page 1010). However, the BIOS’ timer ISR outputs the end of interrupt command. Therefore, you should not output the end of interrupt command yourself; otherwise the BIOS will output a second end of interrupt command and you may lose an interrupt in the process.

The other two commands you can send the 8259 let you select whether to read the in-service register (ISR) or the interrupt request register (IRR). The in-service register contains set bits for each active ISR (because the 8259 allows prioritized interrupts, it is quite possible that one ISR has been interrupted by a higher priority ISR). The interrupt request register contains set bits in corresponding positions for interrupts that have not yet been serviced (probably because they are a lower priority interrupt than the interrupt currently being serviced by the system). To read the in-service register, you would execute the following statements:

; Read the in-service register in PIC #1 (at I/O address 20h)

                mov     al, 0bh
                out     20h, al
                in      al, 20h

To read the interrupt request register, you would use the following code:

; Read the interrupt request register in PIC #1 (at I/O address 20h)

                mov     al, 0ah
                out     20h, al
                in      al, 20h
Writing any other values to the command port may cause your system to malfunction.
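Once the in-service register is in al, testing an individual bit tells you whether the corresponding interrupt is currently being serviced. The following sketch checks IRQ 4 on the master PIC; the label NotInService is hypothetical, and the bit position simply follows the IRQ numbering used for the mask register.

```asm
; A sketch only: is an IRQ 4 (COM1) handler currently in service?

                mov     al, 0bh         ;Select the in-service register.
                out     20h, al
                in      al, 20h         ;AL = in-service bits for IRQ 0-7.
                test    al, 10h         ;Bit 4 corresponds to IRQ 4.
                jz      NotInService

; ...code that runs only while an IRQ 4 handler is active goes here...

NotInService:
```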
17.4.2 The Timer Interrupt (INT 8)

The PC’s motherboard contains an 8254 compatible timer chip. This chip contains three timer channels, one of which generates interrupts every 55 msec (approximately), or about 18.2 times per second. You will often hear this interrupt referred to as the “eighteenth second clock.” We will simply call it the timer interrupt.

The timer interrupt vector is probably the most commonly patched interrupt in the system. It turns out there are two timer interrupt vectors in the system. Int 8 is the hardware vector associated with the timer interrupt (since it comes in on IRQ 0 on the PIC). Generally, you should not patch this interrupt if you want to write a timer ISR. Instead, you should patch the second timer interrupt, interrupt 1ch. The BIOS’ timer interrupt handler (int 8) executes an int 1ch instruction before it returns. This gives a user patched routine access to the timer interrupt. Unless you are willing to duplicate the BIOS and DOS timer code, you should never completely replace the existing timer ISR with one of your own; you should always ensure that the BIOS and DOS ISRs execute in addition to your ISR. Patching into the int 1ch vector is the easiest way to do this.

Even replacing the int 1ch vector with a pointer to your ISR is very dangerous. The timer interrupt service routine is the one most commonly patched by various resident programs (see “Resident Programs” on page 1025). By simply writing the address of your ISR into the timer interrupt vector, you may disable such resident programs and cause your system to malfunction. To solve this problem, you need to create an interrupt chain. For more details, see the section “Chaining Interrupt Service Routines” on page 1010.

By default the timer interrupt is always enabled on the interrupt controller chip. Indeed, disabling this interrupt may cause your system to crash or otherwise malfunction. At the very least, your system will not maintain the correct time if you disable the timer interrupt.
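A minimal sketch of an int 1ch handler that cooperates with whatever was previously installed appears below. The names MyTimerISR, TickCount, and OldInt1C are hypothetical. The sketch assumes that installation code has already saved the previous int 1ch vector in OldInt1C (for example, with the DOS Get Interrupt Vector call shown earlier) and that both variables sit in the code segment so the handler can reach them through cs.

```asm
; A sketch only: MyTimerISR, TickCount, and OldInt1C are hypothetical names.
; OldInt1C must hold the previous int 1ch vector before this ISR is installed.

TickCount       word    0
OldInt1C        dword   ?

MyTimerISR      proc    far
                inc     cs:TickCount            ;Count one timer tick.
                jmp     dword ptr cs:OldInt1C   ;Continue with the previous handler.
MyTimerISR      endp
```

The handler does not send an end of interrupt command, because it runs as a software interrupt from inside the BIOS’ int 8 handler, and it modifies no registers. The far jump lets the previous handler's iret return directly to the original caller.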
17.4.3 The Keyboard Interrupt (INT 9)

The keyboard microcontroller on the PC’s motherboard generates two interrupts on each keystroke: one when you press a key and one when you release it. This is on IRQ 1 on the master PIC. The BIOS responds to this interrupt by reading the keyboard’s scan code, converting this to an ASCII character, and storing the scan and ASCII codes away in the system type ahead buffer.

By default, this interrupt is always enabled. If you disable this interrupt, the system will not be able to respond to any keystrokes, including ctrl-alt-del. Therefore, your programs should always reenable this interrupt if they ever disable it.

For more information on the keyboard interrupt, see “The PC Keyboard” on page 1153.
17.4.4 The Serial Port Interrupts (INT 0Bh and INT 0Ch)

The PC uses two interrupts, IRQ 3 and IRQ 4, to support interrupt driven serial communications. The 8250 (or compatible) serial communications controller chip (SCC) generates an interrupt in one of four situations: a character arrives over the serial line, the SCC finishes the transmission of a character and is requesting another, an error occurs, or a status change occurs. The SCC activates the same interrupt line (IRQ 3 or 4) for all four interrupt sources. The interrupt service routine is responsible for determining the exact nature of the interrupt by interrogating the SCC.

By default, the system disables IRQ 3 and IRQ 4. If you install a serial ISR, you will need to clear the interrupt mask bit in the 8259 PIC before it will respond to interrupts from the SCC. Furthermore, the SCC design includes its own interrupt mask. You will need to enable the interrupt masks on the SCC chip as well. For more information on the SCC, see “The PC Serial Ports” on page 1223.
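The two-level enabling described above might be sketched as follows for COM1:. The port numbers are assumptions not stated in this section: COM1: at its conventional base address of 3F8h, the 8250's interrupt enable register at base+1, and its modem control register at base+4. The sketch also assumes that a serial ISR is already installed at int 0Ch and that the 8250's divisor latch access bit is clear, so that port 3F9h addresses the interrupt enable register.

```asm
; A sketch only, assuming COM1: at base address 3F8h and a serial ISR
; already installed at int 0Ch.

                mov     dx, 3f9h        ;8250 interrupt enable register.
                mov     al, 1           ;Bit 0: interrupt on received character.
                out     dx, al
                mov     dx, 3fch        ;8250 modem control register.
                in      al, dx
                or      al, 8           ;OUT2 connects the 8250's interrupt
                out     dx, al          ; output to the PC's IRQ line.
                in      al, 21h         ;Finally, clear the mask bit for
                and     al, 0efh        ; IRQ 4 on the master PIC.
                out     21h, al
```

Only the receive interrupt is enabled here. The other three interrupt sources have their own bits in the interrupt enable register, which the chapter on the serial ports covers.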
17.4.5
The Parallel Port Interrupts (INT 0Dh and INT 0Fh) The parallel port interrupts are an enigma. IBM designed the original system to allow two parallel port interrupts and then promptly designed a printer interface card that didn’t support the use of interrupts. As a result, almost no DOS based software today uses the parallel port interrupts (IRQ 5 and IRQ 7). Indeed, on the PS/2 systems IBM reserved IRQ5 which they formerly used for LPT2:. However, these interrupts have not gone to waste. Many devices which IBM’s engineers couldn’t even conceive when designing the first PC have made good use of these interrupts. Examples include SCSI cards and sound cards. Many devices today include “interrupt jumpers” that let you select IRQ 5 or IRQ 7 when installing the device. Since IRQ 5 and IRQ 7 find such little use as parallel port interrupts, we will effectively ignore the “parallel port interrupts” in this text.
17.4.6
The Diskette and Hard Drive Interrupts (INT 0Eh and INT 76h)

The floppy and hard disk drives generate interrupts at the completion of a disk operation. This is a very useful feature for multitasking systems like OS/2, Linux, or Windows. While the disk is reading or writing data, the CPU can go execute instructions for another process. When the disk finishes the read or write operation, it interrupts the CPU so it can resume the original task.

While managing the disk drives would be an interesting topic to cover in this text, this book is already long enough. Therefore, this text will avoid discussing the disk drive interrupts (IRQ 6 and IRQ 14) in the interest of saving some space. There are many texts that cover low level disk I/O in assembly language; see the bibliography for details.

By default, the floppy and hard disk interrupts are always enabled. You should not change this status if you intend to use the disk drives on your system.
17.4.7
The Real-Time Clock Interrupt (INT 70h)

PC/AT and later machines included a CMOS real-time clock. This device is capable of generating timer interrupts in multiples of 976 µsec (let’s call it 1 msec). By default, the real-time clock interrupt is disabled. You should only enable this interrupt if you have an int 70h ISR installed.
17.4.8
The FPU Interrupt (INT 75h)

The 80x87 FPU generates an interrupt whenever a floating point exception occurs. On CPUs with built-in FPUs (80486DX and better) there is a bit in one of the control registers that you can set to simulate a vectored interrupt. BIOS generally initializes such bits for compatibility with existing systems.

By default, BIOS disables the FPU interrupt. Most programs that use the FPU explicitly test the FPU’s status register to determine whether an error has occurred. If you want to allow FPU interrupts, you must enable the interrupts on the 8259 and on the 80x87 FPU.
17.4.9
Nonmaskable Interrupts (INT 2)

The 80x86 chips actually provide two interrupt input pins. The first is the maskable interrupt. This is the pin to which the 8259 PIC connects. This interrupt is maskable because you can enable or disable it with the cli and sti instructions. The nonmaskable interrupt, as its name implies, cannot be disabled under software control. Generally, PCs use this interrupt to signal a memory parity error, although certain systems use this interrupt for other purposes as well. Many older PC systems connect the FPU to this interrupt.

This interrupt cannot be masked, so it is always enabled by default.
17.4.10 Other Interrupts

As mentioned in the section on the 8259 PIC, there are several interrupts reserved by IBM. Many systems use the reserved interrupts for the mouse or for other purposes. Since such interrupts are inherently system dependent, we will not describe them here.
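The IRQ and INT numbers quoted throughout these subsections follow a simple pattern, which the short Python sketch below tabulates. It assumes the conventional BIOS programming of the two 8259 PICs on AT-class machines (IRQ 0 to 7 at INT 08h to 0Fh, IRQ 8 to 15 at INT 70h to 77h); the function names irq_to_int and vector_address are illustrative only and do not come from the text.

```python
def irq_to_int(irq: int) -> int:
    """Map an AT-class IRQ number (0-15) to its real-mode interrupt number.

    Assumes the usual BIOS setup of the master and slave 8259 PICs.
    """
    if not 0 <= irq <= 15:
        raise ValueError("IRQ must be in 0..15")
    # Master 8259: IRQ 0-7 -> INT 08h-0Fh; slave 8259: IRQ 8-15 -> INT 70h-77h.
    return 0x08 + irq if irq < 8 else 0x70 + (irq - 8)


def vector_address(int_no: int) -> int:
    """Offset (in segment 0) of the four-byte vector table entry for an INT."""
    return int_no * 4


# IRQs discussed in this section: keyboard, serial, parallel, disk, RTC, FPU.
for irq in (1, 3, 4, 5, 7, 6, 14, 8, 13):
    n = irq_to_int(irq)
    print(f"IRQ {irq:2d} -> INT {n:02X}h, vector at 0000:{vector_address(n):04X}")
```

The vector offset computed here is the same n*4 arithmetic that the listings in this chapter write as es:[1ch*4] and es:[8*4].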
17.5
Chaining Interrupt Service Routines

Interrupt service routines come in two basic varieties: those that need exclusive access to an interrupt vector and those that must share an interrupt vector with several other ISRs. Those in the first category include error handling ISRs (e.g., divide error or overflow) and certain device drivers. The serial port is a good example of a device that rarely has more than one ISR associated with it at any given time (see footnote 8). The timer, real-time clock, and keyboard ISRs generally fall into the latter category. It is not at all uncommon to find several ISRs in memory sharing each of these interrupts.

Sharing an interrupt vector is rather easy. All an ISR needs to do to share an interrupt vector is to save the old interrupt vector when installing the ISR (something you need to do anyway, so you can restore the interrupt vector when your code terminates) and then call the original ISR before or after you do your own ISR processing. If you’ve saved away the address of the original ISR in the dseg double word variable OldIntVect, you can call the original ISR with the following code:

; Presumably, DS points at DSEG at this point.

                pushf                   ;Simulate an INT instruction by pushing
                call    OldIntVect      ; the flags and making a far call.
Since OldIntVect is a dword variable, this code generates a far call to the routine whose segmented address appears in the OldIntVect variable. This code does not jump to the location of the OldIntVect variable.

Many interrupt service routines do not modify the ds register to point at a local data segment. In fact, some simple ISRs do not change any of the segment registers. In such cases it is common to put any necessary variables (especially the old segment value) directly in the code segment. If you do this, your code could jump directly to the original ISR rather than calling it. To do so, you would just use the code:

MyISR           proc    near
                 .
                 .
                 .
                jmp     cs:OldIntVect
MyISR           endp

OldIntVect      dword   ?
This code sequence passes along your ISR’s flags and return address as the flag and return address values to the original ISR. This is fine; when the original ISR executes the iret instruction, it will return directly to the interrupted code (assuming it doesn’t pass control to some other ISR in the chain).

The OldIntVect variable must be in the code segment if you use this technique to transfer control to the original ISR. After all, when you execute the jmp instruction above, you must have already restored the state of the CPU, including the ds register. Therefore, you have no idea what segment ds is pointing at, and it probably isn’t pointing at your local data segment. Indeed, the only segment register whose value is known to you is cs, so you must keep the vector address in your code segment.

The following simple program demonstrates interrupt chaining. This short program patches into the int 1ch vector. The ISR counts off seconds and notifies the main program as each second passes. The main program prints a short message every second. When 10 seconds have expired, this program removes the ISR from the interrupt chain and terminates.

; TIMER.ASM
; This program demonstrates how to patch into the int 1Ch timer interrupt
; vector and create an interrupt chain.

                .xlist
                .286
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'

; The TIMERISR will update the following two variables.
; It will update the MSEC variable every 55 ms.
; It will update the TIMER variable every second.

MSEC            word    0
TIMER           word    0

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; The OldInt1C variable must be in the code segment because of the
; way TimerISR transfers control to the next ISR in the int 1Ch chain.

OldInt1C        dword   ?

; The timer interrupt service routine.
; This routine increments the MSEC variable by 55 on every interrupt.
; Since this interrupt gets called every 55 msec (approx) the
; MSEC variable contains the current number of milliseconds.
; When this value reaches or exceeds 1000 (one second), the ISR
; subtracts 1000 from the MSEC variable and increments TIMER by one.

TimerISR        proc    near
                push    ds
                push    ax
                mov     ax, dseg
                mov     ds, ax

                mov     ax, MSEC
                add     ax, 55          ;Interrupt every 55 msec.
                cmp     ax, 1000
                jb      SetMSEC
                inc     Timer           ;A second just passed.
                sub     ax, 1000        ;Adjust MSEC value.
SetMSEC:        mov     MSEC, ax
                pop     ax
                pop     ds
                jmp     cseg:OldInt1C   ;Transfer to original ISR.
TimerISR        endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                meminit

; Begin by patching in the address of our ISR into int 1ch's vector.
; Note that we must turn off the interrupts while actually patching
; the interrupt vector and we must ensure that interrupts are turned
; back on afterwards; hence the cli and sti instructions. These are
; required because a timer interrupt could come along between the two
; instructions that write to the int 1Ch interrupt vector. This would
; be a big mess.

                mov     ax, 0
                mov     es, ax
                mov     ax, es:[1ch*4]
                mov     word ptr OldInt1C, ax
                mov     ax, es:[1ch*4 + 2]
                mov     word ptr OldInt1C+2, ax

                cli
                mov     word ptr es:[1Ch*4], offset TimerISR
                mov     es:[1Ch*4 + 2], cs
                sti

; Okay, the ISR updates the TIMER variable every second.
; Continuously print this value until ten seconds have
; elapsed. Then quit.

                mov     Timer, 0
TimerLoop:      printf
                byte    "Timer = %d\n",0
                dword   Timer
                cmp     Timer, 10
                jbe     TimerLoop

; Okay, restore the interrupt vector. We need the interrupts off
; here for the same reason as above.

                mov     ax, 0
                mov     es, ax
                cli
                mov     ax, word ptr OldInt1C
                mov     es:[1Ch*4], ax
                mov     ax, word ptr OldInt1C+2
                mov     es:[1Ch*4+2], ax
                sti

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

8. There is no reason this has to be this way; it’s just that most people rarely run two programs at the same time that must both access the serial port.

17.6
Reentrancy Problems

A minor problem arises when developing ISRs: what happens if you enable interrupts while in an ISR and a second interrupt from the same device comes along? This would interrupt the ISR and then reenter the ISR from the beginning. Many applications do not behave properly under these conditions. An application that can properly handle this situation is said to be reentrant. Code segments that do not operate properly when reentered are nonreentrant.

Consider the TIMER.ASM program in the previous section. This is an example of a nonreentrant program. Suppose that while executing the ISR, it is interrupted at the following point:

TimerISR        proc    near
                push    ds
                push    ax
                mov     ax, dseg
                mov     ds, ax

                mov     ax, MSEC
                add     ax, 55          ;Interrupt every 55 msec.
                cmp     ax, 1000
                jb      SetMSEC

; <<<<< Suppose the interrupt occurs at this point >>>>>

                inc     Timer           ;A second just passed.
                sub     ax, 1000        ;Adjust MSEC value.
SetMSEC:        mov     MSEC, ax
                pop     ax
                pop     ds
                jmp     cseg:OldInt1C   ;Transfer to original ISR.
TimerISR        endp
Suppose that, on the first invocation of the interrupt, MSEC contains 950 and Timer contains three. If a second interrupt occurs at the specified point above, ax will contain 1005. So the interrupt suspends the ISR and reenters it from the beginning. Note that TimerISR is nice enough to preserve the ax register containing the value 1005. When the second invocation of TimerISR executes, it finds that MSEC still contains 950 because the first invocation has yet to update MSEC. Therefore, it adds 55 to this value, determines that it exceeds 1000, increments Timer (it becomes four) and then stores five into MSEC. Then it returns (by jumping to the next ISR in the int 1ch chain). Eventually, control returns to the first invocation of the TimerISR routine. At this time (less than 55 msec after the second invocation updated Timer) the TimerISR code increments the Timer variable again and updates MSEC to five. The problem with this sequence is that it has incremented the Timer variable twice in less than 55 msec.

Now you might argue that hardware interrupts always clear the interrupt flag (disabling further interrupts), so it would not be possible for this interrupt to be reentered. Furthermore, you might argue that this routine is so short, it would never take more than 55 msec to get to the noted point in the code above. However, you are forgetting something: some other timer ISR could be in the system that calls your code after it is done. That code could take 55 msec and just happen to turn the interrupts back on, making it perfectly possible that your code could be reentered.

The code between the mov ax, MSEC and mov MSEC, ax instructions above is called a critical region or critical section. A program must not be reentered while it is executing in a critical region. Note that having critical regions does not mean that a program is not reentrant. Most programs, even those that are reentrant, have various critical regions.
The key is to keep any interrupt that could reenter a critical region from occurring while the code is executing in that region. The easiest way to do so is to turn off the interrupts while executing code in a critical section. We can easily modify the TimerISR to do this with the following code:

TimerISR        proc    near
                push    ds
                push    ax
                mov     ax, dseg
                mov     ds, ax

; Beginning of critical section, turn off interrupts.

                pushf                   ;Preserve current I flag state.
                cli                     ;Make sure interrupts are off.

                mov     ax, MSEC
                add     ax, 55          ;Interrupt every 55 msec.
                cmp     ax, 1000
                jb      SetMSEC

                inc     Timer           ;A second just passed.
                sub     ax, 1000        ;Adjust MSEC value.
SetMSEC:        mov     MSEC, ax

; End of critical region, restore the I flag to its former glory.

                popf

                pop     ax
                pop     ds
                jmp     cseg:OldInt1C   ;Transfer to original ISR.
TimerISR        endp
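The reentrancy scenario described above (MSEC = 950, Timer = 3, a second interrupt arriving just after the compare) can be traced with a small model. The Python sketch below is only an illustration of the bookkeeping, not of real interrupt hardware: the function timer_isr, its parameters, and the dictionary holding MSEC and Timer are all hypothetical names, and the "protected" case simply assumes the nested interrupt is held off until the critical region has ended.

```python
def timer_isr(state, reenter_at_marked_point=False, protect=False):
    """Model one invocation of TimerISR on a dict holding MSEC and Timer.

    reenter_at_marked_point injects a nested invocation at the point marked
    in the listing (after the compare, before inc Timer); that point is only
    reached when a second has just passed. protect models the pushf/cli ...
    popf version, where the nested interrupt waits until the region ends.
    """
    ax = state["MSEC"] + 55              # mov ax, MSEC / add ax, 55
    pending = reenter_at_marked_point
    if ax >= 1000:                       # cmp ax, 1000 / jb SetMSEC
        if pending and not protect:
            timer_isr(state)             # nested call still sees the old MSEC
            pending = False
        state["Timer"] += 1              # inc Timer
        ax -= 1000                       # sub ax, 1000
    state["MSEC"] = ax                   # SetMSEC: mov MSEC, ax
    if pending and protect:
        timer_isr(state)                 # serviced only after the region ends


unprotected = {"MSEC": 950, "Timer": 3}
timer_isr(unprotected, reenter_at_marked_point=True)

protected = {"MSEC": 950, "Timer": 3}
timer_isr(protected, reenter_at_marked_point=True, protect=True)

print("unprotected:", unprotected)
print("protected:  ", protected)
```

In the unprotected run both invocations start from the stale MSEC value, so Timer is incremented twice; in the protected run the second invocation starts from the updated MSEC and Timer advances only once.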
We will return to the problem of reentrancy and critical regions in the next two chapters of this text.
17.7
The Efficiency of an Interrupt Driven System

Interrupts introduce a considerable amount of complexity to a software system (see “Debugging ISRs” on page 1020). One might ask if using interrupts is really worth the trouble. The answer, of course, is yes. Why else would people use interrupts if they were proven not to be worthwhile? However, interrupts are like many other nifty things in computer science: they have their place, and if you attempt to use interrupts in an inappropriate fashion they will only make things worse for you.

The following sections explore the efficiency aspects of using interrupts. As you will soon discover, an interrupt driven system is usually superior despite the complexity. However, this is not always the case. For many systems, alternative methods provide better performance.
17.7.1
Interrupt Driven I/O vs. Polling

The whole purpose of an interrupt driven system is to allow the CPU to continue processing instructions while some I/O activity occurs. This is in direct contrast to a polling system where the CPU continually tests an I/O device to see if the I/O operation is complete. In an interrupt driven system, the CPU goes about its business and the I/O device interrupts it when it needs servicing. This is generally much more efficient than wasting CPU cycles polling a device while it is not ready.

The serial port is a perfect example of a device that works extremely well with interrupt driven I/O. You can start a communication program that begins downloading a file over a modem. Each time a character arrives, it generates an interrupt and the communication program starts up, buffers the character, and then returns from the interrupt. In the meantime, another program (like a word processor) can be running with almost no performance degradation since it takes so little time to process the serial port interrupts.

Contrast the above scenario with one where the serial communication program continually polls the serial communication chip to see if a character has arrived. In this case the CPU spends all of its time looking for an input character even though one rarely (in CPU terms) arrives. Therefore, no CPU cycles are left over to do other processing like running your word processor.

Suppose interrupts were not available and you wanted to allow background downloads while using your word processing program. Your word processing program would have to test the input data on the serial port once every few milliseconds to keep from losing any data. Can you imagine how difficult such a word processor would be to write? An interrupt system is the clear choice in this case.

If downloading data while word processing seems far fetched, consider a simpler case: the PC’s keyboard. Whenever a keypress interrupt occurs, the keyboard ISR reads the key pressed and saves it in the system type ahead buffer for the moment when the application wants to read the keyboard data. Can you imagine how difficult it would be to write applications if you had to constantly poll the keyboard port yourself to keep from losing characters? Even in the middle of a long calculation? Once again, interrupts provide an easy solution.
17.7.2
Interrupt Service Time

Of course, the serial communication system just described is an example of a best case scenario. The communication program takes so little time to do its job that most of the time is left over for the word processing program. However, were you to run a different interrupt driven I/O system, for example, copying files from one disk to another, the interrupt service routine would have a noticeable impact on the performance of the word processing system.

Two factors control an ISR’s impact on a computer system: the frequency of interrupts and the interrupt service time. The frequency is how many times per second (or other time measurement) a particular interrupt occurs. The interrupt service time is how long the ISR takes to service the interrupt.

The nature of the frequency varies according to the source of the interrupt. For example, the timer chip generates evenly spaced interrupts about 18 times per second. Likewise, a serial port receiving at 9600 bps generates nearly 1,000 interrupts per second (about 960, assuming ten bits per character). On the other hand, the keyboard rarely generates more than about 20 interrupts per second and they are not very regular.

The interrupt service time is obviously dependent upon the number of instructions the ISR must execute. The interrupt service time is also dependent upon the particular CPU and clock frequency. The same ISR executing identical instructions on two CPUs will run in less time on the faster machine.

The amount of time an interrupt service routine takes to handle an interrupt, multiplied by the frequency of the interrupt, determines the impact the interrupt will have on system performance. Remember, every CPU cycle spent in an ISR is one less cycle available for your application programs. Consider the timer interrupt. Suppose the timer ISR takes 100 µsec to complete its tasks. This means that the timer interrupt consumes 1.8 msec out of every second, or about 0.18% of the total computer time. Using a faster CPU will reduce this percentage (by reducing the time spent in the ISR); using a slower CPU will increase the percentage. Nevertheless, you can see that a short ISR such as this one will not have a significant effect on overall system performance.

One hundred microseconds is fast for a typical timer ISR, especially when your system has several timer ISRs chained together. However, even if the timer ISR took ten times as long to execute, it would only rob the system of less than 2% of the available CPU cycles. Even if it took 100 times longer (10 msec), there would only be an 18% performance degradation; most people would barely notice such a degradation (see footnote 9).

9. As a general rule, people begin to notice a real difference in performance between 25 and 50%. It isn’t instantly obvious until about 50% (i.e., running at one-half the speed).

Of course, one cannot allow the ISR to take as much time as it wants. Since the timer interrupt occurs every 55 msec, the maximum time the ISR can use is just under 55 msec. If the ISR requires more time than there is between interrupts, the system will eventually lose an interrupt. Furthermore, the system will spend all its time servicing the interrupt rather than accomplishing anything else.

For many systems, having an ISR that consumes as much as 10% of the overall CPU cycles will not prove to be a problem. However, before you go off and start designing slow interrupt service routines, you should remember that your ISR is probably not the only ISR in the system. While your ISR is consuming 10% of the CPU cycles, there may be another ISR that is doing the same thing; and another, and another, and… Furthermore, there may be some ISRs that require fast servicing. For example, a serial port ISR may need to read a character from the serial communications chip each millisecond or so. If your timer ISR requires 4 msec to execute and does so with the interrupts turned off, the serial port ISR will miss some characters.

Ultimately, of course, you would like to write ISRs so they are as fast as possible so they have as little impact on system performance as they can. This is one of the main reasons most ISRs for DOS are still written in assembly language. Unless you are designing an embedded system, one in which the PC runs only your application, you need to realize that your ISRs must coexist with other ISRs and applications; you do not want the performance of your ISR to adversely affect the performance of other code in the system.
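The relationship used in this section (CPU share = interrupt frequency × service time) is easy to check numerically. The following Python sketch is a minimal illustration; the name isr_cpu_fraction is hypothetical, and the timer rate is taken as 1,193,180/65,536 interrupts per second, the figure this chapter gives for the PC timer chip.

```python
def isr_cpu_fraction(interrupts_per_second: float,
                     service_time_seconds: float) -> float:
    """Fraction of CPU time an ISR consumes: frequency times service time."""
    return interrupts_per_second * service_time_seconds


# Timer interrupt rate, roughly 18.2 interrupts per second.
TIMER_HZ = 1_193_180 / 65_536

# Service times of 100 usec, 1 msec, and 10 msec, as discussed in the text.
for service_time in (100e-6, 1e-3, 10e-3):
    share = isr_cpu_fraction(TIMER_HZ, service_time)
    print(f"{service_time * 1e6:8.0f} usec per interrupt -> {share:.2%} of the CPU")
```

A result of 1.0 or more would mean the ISR needs more time than exists between interrupts, which is the lost-interrupt situation described above.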
17.7.3
Interrupt Latency

Interrupt latency is the time between the point a device signals that it needs service and the point where the ISR provides the needed service. This is not instantaneous! At the very least, the 8259 PIC needs to signal the CPU, and the CPU needs to interrupt the current program, push the flags and return address, obtain the ISR address, and transfer control to the ISR. The ISR may need to push various registers, set up certain variables, check device status to determine the source of the interrupt, and so on. Furthermore, there may be other ISRs chained into the interrupt vector before you, and they execute to completion before transferring control to your ISR that actually services the device. Eventually, the ISR actually does whatever it is that the device needs done. In the best case on the fastest microprocessors with simple ISRs, the latency could be under a microsecond. On slower systems, with several ISRs in a chain, the latency could be as bad as several milliseconds.

For some devices, the interrupt latency is more important than the actual interrupt service time. For example, an input device may only interrupt the CPU once every 10 seconds. However, that device may be incapable of holding the data on its input port for more than a millisecond. In theory, any interrupt service time less than 10 seconds is fine; but the CPU must read the data within one millisecond of its arrival or the system will lose the data.

Low interrupt latency (that is, responding quickly) is very important in many applications. Indeed, in some applications the latency requirements are so strict that you have to use a very fast CPU or you have to abandon interrupts altogether and go back to polling. Wait a minute! Isn’t polling less efficient than an interrupt driven system? How will polling improve things?

An interrupt driven I/O system improves system performance by allowing the CPU to work on other tasks in between I/O operations. In principle, servicing interrupts takes very little CPU time compared to the time between interrupts. By using interrupt driven I/O, you can use all those other CPU cycles for some other purpose. However, suppose the I/O device is producing service requests at such a rate that there are no free CPU cycles. Interrupt driven I/O will provide few benefits in this case.

For example, suppose we have an eight bit I/O device connected to two I/O ports. Suppose bit zero of port 310h contains a one if data is available and a zero otherwise. If data is available, the CPU must read the eight bits at port 311h. Reading port 311h clears bit zero of port 310h until the next byte arrives. If you wanted to read 8192 bytes from this port, you could do this with the following short segment of code:

                mov     cx, 8192
                mov     dx, 310h
                lea     bx, Array       ;Point bx at storage buffer
DataAvailLp:    in      al, dx          ;Read status port.
                shr     al, 1           ;Test bit zero.
                jnc     DataAvailLp     ;Wait until data is available.

                inc     dx              ;Point at data port.
                in      al, dx          ;Read data.
                mov     [bx], al        ;Store data into buffer.
                inc     bx              ;Move on to next array element.

                dec     dx              ;Point back at status port.
                loop    DataAvailLp     ;Repeat 8192 times.
                 .
                 .
                 .
This code uses a classical polling loop (DataAvailLp) to wait for each available character. Since there are only three instructions in the polling loop, this loop can probably execute in just under a microsecond (see footnote 10). So it might take as much as one microsecond to determine that data is available, in which case the code falls through and, by the second instruction in the sequence, we’ve read the data from the device. Let’s be generous and say that takes another microsecond.

Suppose, instead, we use an interrupt service routine. A well-written ISR combined with a good system hardware design will probably have latencies measured in microseconds. Measuring the best case latency we could hope to achieve requires some sort of hardware timer that begins counting once an interrupt event occurs. Upon entry into our interrupt service routine we could read this counter to determine how much time has passed between the interrupt and its service. Fortunately, just such a device exists on the PC: the 8254 timer chip that provides the source of the 55 msec interrupt.

The 8254 timer chip actually contains three separate timers: timer #0, timer #1, and timer #2. The first timer (timer #0) provides the clock interrupt, so it will be the focus of our discussion. The timer contains a 16 bit register that the 8254 decrements at regular intervals (1,193,180 times per second). Once the timer hits zero, it generates an interrupt on the 8259 IRQ 0 line and then wraps around to 0FFFFh and continues counting down from that point. Since the counter automatically resets to 0FFFFh after generating each interrupt, the 8254 timer generates interrupts every 65,536/1,193,180 seconds, or once every 54.9254932198 msec, which is 18.2064819336 times per second. We’ll just call these once every 55 msec or 18 (or 18.2) times per second, respectively. Another way to view this is that the 8254 decrements the counter once every 838 nanoseconds (or 0.838 µsec).

The following short assembly language program measures interrupt latency by patching into the int 8 vector. Whenever the timer chip counts down to zero, it generates an interrupt that directly calls this program’s ISR. The ISR quickly reads the timer chip’s counter register, negates the value (so 0FFFFh becomes one, 0FFFEh becomes two, etc.), and then adds it to a running total. The ISR also increments a counter so that it can keep track of the number of times it has added a counter value to the total. Then the ISR jumps to the original int 8 handler. The main program, in the meantime, simply computes and displays the current average read from the counter. When the user presses any key, this program terminates.
; This program measures the latency of an INT 08 ISR.
; It works by reading the timer chip immediately upon entering
; the INT 08 ISR. By averaging this value for some number of
; executions, we can determine the average latency for this
; code.

                .xlist
                .386
                option          segment:use16
                include         stdlib.a
                includelib      stdlib.lib
                .list

cseg            segment para public 'code'
                assume  cs:cseg, ds:nothing

; All the variables are in the code segment in order to reduce ISR
; latency (we don't have to push and set up DS, saving a few instructions
; at the beginning of the ISR).

OldInt8         dword   ?
SumLatency      dword   0
Executions      dword   0
Average         dword   0

; This program reads the 8254 timer chip. This chip counts from
; 0FFFFh down to zero and then generates an interrupt. It wraps
; around from 0 to 0FFFFh and continues counting down once it
; generates the interrupt.
;
; 8254 Timer Chip port addresses:

Timer0_8254     equ     40h
Cntrl_8254      equ     43h

; The following ISR reads the 8254 timer chip, negates the result
; (because the timer counts backwards), adds the result to the
; SumLatency variable, and then increments the Executions variable
; that counts the number of times we execute this code. In the
; mean time, the main program is busy computing and displaying the
; average latency time for this ISR.
;
; To read the 16 bit 8254 counter value, this code needs to
; write a zero to the 8254 control port and then read the
; timer port twice (reads the L.O. then H.O. bytes). There
; needs to be a short delay between reading the two bytes
; from the same port address.

TimerISR        proc    near
                push    eax                     ;Save all of eax; the mov below
                                                ; clears its upper half.
                mov     eax, 0                  ;Ch 0, latch & read data.
                out     Cntrl_8254, al          ;Output to 8253 cmd register.
                in      al, Timer0_8254         ;Read latch #0 (LSB).
                mov     ah, al                  ;Hold the LSB in ah for now.
                jmp     SettleDelay             ;Settling delay for 8254 chip.
SettleDelay:    in      al, Timer0_8254         ;Read latch #0 (MSB)
                xchg    ah, al                  ;ah = MSB, al = LSB.
                neg     ax                      ;Fix, 'cause timer counts down.
                add     cseg:SumLatency, eax
                inc     cseg:Executions
                pop     eax
                jmp     cseg:OldInt8
TimerISR        endp

Main            proc
                meminit

; Begin by patching in the address of our ISR into int 8's vector.
; Note that we must turn off the interrupts while actually patching
; the interrupt vector and we must ensure that interrupts are turned
; back on afterwards; hence the cli and sti instructions. These are
; required because a timer interrupt could come along between the two
; instructions that write to the int 8 interrupt vector. Since the
; interrupt vector is in an inconsistent state at that point, this
; could cause the system to crash.

                mov     ax, 0
                mov     es, ax
                mov     ax, es:[8*4]
                mov     word ptr OldInt8, ax
                mov     ax, es:[8*4 + 2]
                mov     word ptr OldInt8+2, ax

                cli
                mov     word ptr es:[8*4], offset TimerISR
                mov     es:[8*4 + 2], cs
                sti

; First, wait for the first call to the ISR above. Since we will be dividing
; by the value in the Executions variable, we need to make sure that it is
; greater than zero before we do anything.

Wait4Non0:      cmp     cseg:Executions, 0
                je      Wait4Non0

; Okay, start displaying the good values until the user presses a key at
; the keyboard to stop everything:

DisplayLp:      mov     eax, SumLatency
                cdq                             ;Extends eax->edx.
                div     Executions
                mov     Average, eax
                printf
                byte    "Count: %ld, average: %ld\n",0
                dword   Executions, Average

                mov     ah, 1                   ;Test for keystroke.
                int     16h
                je      DisplayLp
                mov     ah, 0                   ;Read that keystroke.
                int     16h

; Okay, restore the interrupt vector. We need the interrupts off
; here for the same reason as above.

                mov     ax, 0
                mov     es, ax
                cli
                mov     ax, word ptr OldInt8
                mov     es:[8*4], ax
                mov     ax, word ptr OldInt8+2
                mov     es:[8*4+2], ax
                sti

Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

10. On a fast CPU (e.g., 100 MHz Pentium), you might expect this loop to execute in much less time than one microsecond. However, the in instruction is probably going to be quite slow because of the wait states associated with external I/O devices.
On a 66 MHz 80486 DX/2 processor, the above code reports an average value of 44 after it has run for about 10,000 iterations. At 0.838 µsec per count, this works out to about 37 µsec between the device signalling the interrupt and the ISR being able to process it (see footnote 11). The latency of polled I/O would probably be an order of magnitude less than this!

Generally, if you have some high speed application like audio or video recording or playback, you probably cannot afford the latencies associated with interrupt I/O. On the other hand, such applications demand such high performance out of the system that you probably wouldn’t have any CPU cycles left over to do other processing while waiting for I/O.
11. Patching into the int 1Ch interrupt vector produces latencies in the 137 µsec range.
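The conversion from counter readings to time used above can be reproduced with a few lines of arithmetic. The Python sketch below assumes only the 1,193,180 Hz input clock and 16 bit counter described in the text; the names PIT_HZ, TICK_USEC, negate16, and ticks_to_usec are illustrative and not part of the original program.

```python
PIT_HZ = 1_193_180             # 8254 input clock, counts per second
TICK_USEC = 1e6 / PIT_HZ       # about 0.838 usec per count


def negate16(raw: int) -> int:
    """16-bit negate, as the ISR's 'neg ax' does: 0FFFFh -> 1, 0FFFEh -> 2."""
    return (-raw) & 0xFFFF


def ticks_to_usec(ticks: float) -> float:
    """Convert a number of 8254 counts into microseconds."""
    return ticks * TICK_USEC


print(f"one count            = {TICK_USEC:.3f} usec")
print(f"full 65,536 counts   = {ticks_to_usec(65_536) / 1000:.4f} msec")
print(f"average of 44 counts = {ticks_to_usec(44):.1f} usec")
```

The same conversion applies to the figure quoted in footnote 11 for the int 1Ch vector.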
Another issue with respect to ISR latency is latency consistency. That is, is there the same amount of latency from interrupt to interrupt? Some ISRs can tolerate considerable latency as long as it is consistent (that is, the latency is roughly the same from interrupt to interrupt). For example, suppose you want to patch into the timer interrupt so you can read an input port every 55 msec and store this data away. Later, when processing the data, your code might work under the assumption that the data readings are 55 msec (or 54.9…) apart. This might not be true if there are other ISRs in the timer interrupt chain before your ISR. For example, there may be an ISR that counts off 18 interrupts and then executes some code sequence that requires 10 msec. This means that for 16 out of every 18 interrupts your data collection routine would collect data at 55 msec intervals right on the nose. But when that 18th interrupt occurs, the other timer ISR will delay 10 msec before passing control to your routine. This means that your 18th reading will come 65 msec after the 17th. Don’t forget, the timer chip is still counting down during all of this, so only 45 msec remain until the next interrupt. Therefore, the reading after that would occur just 45 msec after the 18th. Hardly a consistent pattern. If your ISR needs consistent latency, you should try to install your ISR as early in the interrupt chain as possible.
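The uneven spacing described above is easy to see by listing the times at which the data collection routine actually runs. The following Python sketch is a simplified model under stated assumptions: interrupts arrive exactly every 55 msec, and a hypothetical earlier ISR in the chain adds a 10 msec delay on every 18th interrupt. The function reading_times and its parameters are illustrative names only.

```python
def reading_times(n_interrupts, period_ms=55.0, slow_every=18, slow_delay_ms=10.0):
    """Times (msec) at which a chained ISR gets control for each interrupt.

    Every slow_every-th interrupt is held up by slow_delay_ms because an
    earlier ISR in the chain runs a long code sequence first.
    """
    times = []
    for i in range(1, n_interrupts + 1):
        t = i * period_ms            # the timer keeps its own fixed schedule
        if i % slow_every == 0:
            t += slow_delay_ms       # earlier ISR delays this one reading
        times.append(t)
    return times


times = reading_times(20)
intervals = [later - earlier for earlier, later in zip(times, times[1:])]
print(intervals)
```

In this model the reading taken on the slow interrupt arrives 65 msec after the previous one and the next reading follows only 45 msec later, while every other interval is 55 msec.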
17.7.4 Prioritized Interrupts

Suppose you have the interrupts turned off for a brief spell (perhaps you are processing some interrupt) and two interrupt requests come in while the interrupts are off. What happens when you turn the interrupts back on? Which interrupt will the CPU first service? The obvious answer would be “whichever interrupt occurred first.” However, suppose they both occurred at exactly the same time (or, at least, within a short enough time frame that we cannot determine which occurred first), or maybe, as is really the case, the 8259 PIC cannot keep track of which interrupt occurred first? Furthermore, what if one interrupt is more important than another? Suppose, for example, that one interrupt tells you that the user has just pressed a key on the keyboard and a second interrupt tells you that your nuclear reactor is about to melt down if you don’t do something in the next 100 µsec. Would you want to process the keystroke first, even if its interrupt came in first? Probably not. Instead, you would want to prioritize the interrupts on the basis of their importance; the nuclear reactor interrupt is probably a little more important than the keystroke interrupt, so you should probably handle it first.

The 8259 PIC provides several priority schemes, but the PC BIOS initializes the 8259 to use fixed priority. When using fixed priorities, the device on IRQ 0 (the timer) has the highest priority and the device on IRQ 7 has the lowest priority. Therefore, the 8259 in the PC (running DOS) always resolves conflicts in this manner. If you were going to hook that nuclear reactor up to your PC, you’d probably want to use the nonmaskable interrupt since it has a higher priority than anything provided by the 8259 (and you can’t mask it with a CLI instruction).
17.8 Debugging ISRs

Although writing ISRs can simplify the design of many types of programs, ISRs are almost always very difficult to debug. There are two main reasons ISRs are more difficult than standard applications to debug. First, as mentioned earlier, errant ISRs can modify values the main program uses (or, worse yet, that some other program in memory is using) and it is difficult to pin down the source of the error. Second, most debuggers have fits when you attempt to set breakpoints within an ISR.

If your code includes some ISRs and the program seems to be misbehaving and you cannot immediately see the reason, you should immediately suspect interference by the ISR. Many programmers have forgotten about ISRs appearing in their code and have spent weeks attempting to locate a bug in their non-ISR code, only to discover the problem was with the ISR. Always suspect the ISR first. Generally, ISRs are short and you can quickly eliminate the ISR as the cause of your problem before trying to track the bug down elsewhere.

Debuggers often have problems because they are not reentrant or they call BIOS or DOS (which are not reentrant), so if you set a breakpoint in an ISR that has interrupted BIOS or DOS and the debugger calls BIOS or DOS, the system may crash because of the reentrancy problems. Fortunately, most modern debuggers have a remote debugging mode that lets you connect a terminal or another PC to a serial port and execute the debug commands on that second display and keyboard. Since the debugger talks directly to the serial chip, it avoids calling BIOS or DOS and avoids the reentrancy problems. Of course, this doesn’t help much if you’re writing a serial ISR, but it works fine with most other programs.

A big problem when debugging interrupt service routines is that the system crashes immediately after you patch the interrupt vector. If you do not have a remote debugging facility, the best approach to debug this code is to strip the ISR to its bare essentials. This might be the code that simply passes control on to the next ISR in the interrupt chain (if applicable). Then add one section of code at a time back to your ISR until the ISR fails.

Of course, the best debugging strategy is to write code that doesn’t have any bugs. While this is not a practical solution, one thing you can do is attempt to do as little as possible in the ISR. Simply read or write the device’s data and buffer any inputs for the main program to handle later. The smaller and less complex your ISR is, the higher the probability that it will not contain any bugs.

Debugging ISRs, unfortunately, is not easy and it is not something you can learn right out of a book. It takes lots of experience and you will need to make a lot of mistakes. Unfortunately, there is no substitute for experience when debugging ISRs.
17.9 Summary

This chapter discusses three phenomena occurring in PC systems: interrupts (hardware), traps, and exceptions. An interrupt is an asynchronous procedure call the CPU generates in response to an external hardware signal. A trap is a programmer-supplied call to a routine and is a special form of a procedure call. An exception occurs when a program executes an instruction that generates some sort of error. For additional details, see

• “Interrupts, Traps, and Exceptions” on page 995
When an interrupt, trap, or exception occurs, the 80x86 CPU pushes the flags and transfers control to an interrupt service routine (ISR). The 80x86 supports an interrupt vector table that provides segmented addresses for up to 256 different interrupts. When writing your own ISR, you need to store the address of your ISR in an appropriate location in the interrupt vector table to activate that ISR. Well-behaved programs also save the original interrupt vector value so they can restore it when they terminate. For the details, see

• “80x86 Interrupt Structure and Interrupt Service Routines (ISRs)” on page 996
A trap, or software interrupt, is nothing more than the execution of an 80x86 “int n” instruction. Such an instruction transfers control to the ISR whose vector appears in the nth entry in the interrupt vector table. Generally, you would use a trap to call a routine in a resident program appearing somewhere in memory (like DOS or BIOS). For more information, see

• “Traps” on page 999
An exception occurs whenever the CPU executes an instruction and that instruction is illegal or the execution of that instruction generates some sort of error (like division by zero). The 80x86 provides several built-in exceptions, although this text only deals with the exceptions available in real mode. For the details, see

• “Exceptions” on page 1000
• “Divide Error Exception (INT 0)” on page 1000
• “Single Step (Trace) Exception (INT 1)” on page 1000
• “Breakpoint Exception (INT 3)” on page 1001
• “Overflow Exception (INT 4/INTO)” on page 1001
• “Bounds Exception (INT 5/BOUND)” on page 1001
• “Invalid Opcode Exception (INT 6)” on page 1004
• “Coprocessor Not Available (INT 7)” on page 1004
The PC provides hardware support for up to 15 vectored interrupts using a pair of 8259A programmable interrupt controller chips (PICs). Devices that normally generate hardware interrupts include a timer, the keyboard, serial ports, parallel ports, disk drives, sound cards, the real time clock, and the FPU. The 80x86 lets you enable and disable all maskable interrupts with the cli and sti instructions. The PIC also lets you individually mask the devices that can interrupt the system. However, the 80x86 provides a special nonmaskable interrupt that has a higher priority than the other hardware interrupts and cannot be disabled by a program. For more details on these hardware interrupts, see

• “Hardware Interrupts” on page 1004
• “The 8259A Programmable Interrupt Controller (PIC)” on page 1005
• “The Timer Interrupt (INT 8)” on page 1007
• “The Keyboard Interrupt (INT 9)” on page 1008
• “The Serial Port Interrupts (INT 0Bh and INT 0Ch)” on page 1008
• “The Parallel Port Interrupts (INT 0Dh and INT 0Fh)” on page 1008
• “The Diskette and Hard Drive Interrupts (INT 0Eh and INT 76h)” on page 1009
• “The Real-Time Clock Interrupt (INT 70h)” on page 1009
• “The FPU Interrupt (INT 75h)” on page 1009
• “Nonmaskable Interrupts (INT 2)” on page 1009
• “Other Interrupts” on page 1009
Interrupt service routines that you write may need to coexist with other ISRs in memory. In particular, you may not be able to simply replace an interrupt vector with the address of your ISR and let your ISR take over from there. Often, you will need to create an interrupt chain and call the previous ISR in the interrupt chain once you are done processing the interrupt. To see why you create interrupt chains, and to learn how to create them, see

• “Chaining Interrupt Service Routines” on page 1010
With interrupts comes the possibility of reentrancy, that is, the possibility that a routine might be interrupted and called again before the first call has finished execution. This chapter introduces the concept of reentrancy and gives some examples that demonstrate problems with nonreentrant code. For details, see

• “Reentrancy Problems” on page 1012
The whole purpose of an interrupt driven system is to improve the efficiency of that system. Therefore, it should come as no surprise that ISRs should be as efficient as possible. This chapter discusses why interrupt driven I/O systems can be more efficient and contrasts interrupt driven I/O with polled I/O. However, interrupts can cause problems if the corresponding ISR is too slow. Therefore, programmers who write ISRs need to be aware of such parameters as interrupt service time, frequency of interrupts, and interrupt latency. To learn about these concepts, see

• “The Efficiency of an Interrupt Driven System” on page 1014
• “Interrupt Driven I/O vs. Polling” on page 1014
• “Interrupt Service Time” on page 1015
• “Interrupt Latency” on page 1016
If multiple interrupts occur simultaneously, the CPU must decide which interrupt to handle first. The 8259 PIC and the PC use a prioritized interrupt scheme that assigns the highest priority to the timer and works down from there. The 80x86 always processes the interrupt with the highest priority first. For more details, see

• “Prioritized Interrupts” on page 1020
Resident Programs
Chapter 18
Most MS-DOS applications are transient. They load into memory, execute, terminate, and DOS uses the memory allocated to the application for the next program the user executes. Resident programs follow these same rules, except for the last. A resident program, upon termination, does not return all memory back to DOS. Instead, a portion of the program remains resident, ready to be reactivated by some other program at a future time. Resident programs, also known as terminate and stay resident programs or TSRs, provide a tiny amount of multitasking to an otherwise single tasking operating system. Until Microsoft Windows became popular, resident programs were the most popular way to allow multiple applications to coexist in memory at one time. Although Windows has diminished the need for TSRs for background processing, TSRs are still valuable for writing device drivers, antiviral tools, and program patches. This chapter will discuss the issues you must deal with when writing resident programs.
18.1 DOS Memory Usage and TSRs

When you first boot DOS, the memory layout will look something like the following:

    0FFFFFh                 ------------------------------------------------
                            High Memory Area (HMA) and Upper Memory Blocks (UMB)
                            Video, ROM, and adapter memory space
    09FFFFh (640K)          ------------------------------------------------
                            Memory available for application use
    Free Memory Pointer ->  ------------------------------------------------
                            Interrupt vectors, BIOS variables, DOS variables,
                            and lower memory portion of DOS
    00000h                  ------------------------------------------------

    DOS Memory Map (no active application)

DOS maintains a free memory pointer that points to the beginning of the block of free memory. When the user runs an application program, DOS loads this application starting at the address the free memory pointer contains. Since DOS generally runs only a single application at a time, all the memory from the free memory pointer to the end of RAM (09FFFFh) is available for the application’s use:

    0FFFFFh                 ------------------------------------------------
    Free Memory Pointer ->
    09FFFFh (640K)          ------------------------------------------------
                            Memory in use by application
    00000h                  ------------------------------------------------

    DOS Memory Map (w/active application)

When the program terminates normally via DOS function 4Ch (the Standard Library exitpgm macro), MS-DOS reclaims the memory in use by the application and resets the free memory pointer to just above DOS in low memory.
MS-DOS provides a second termination call which is identical to the terminate call with one exception: it does not reset the free memory pointer to reclaim all the memory in use by the application. Instead, this terminate and stay resident call frees all but a specified block of memory. The TSR call (ah=31h) requires two parameters: a process termination code in the al register (usually zero) and, in dx, the size of the memory block to protect, in paragraphs. When DOS executes this code, it adjusts the free memory pointer so that it points at a location dx*16 bytes above the program’s PSP (see “MS-DOS, PC-BIOS, and File I/O” on page 699). This leaves memory looking like this:

    0FFFFFh                 ------------------------------------------------
    09FFFFh (640K)          ------------------------------------------------
    Free Memory Pointer ->  ------------------------------------------------
                            Memory in use by resident application
    00000h                  ------------------------------------------------

    DOS Memory Map (w/resident application)

When the user executes a new application, DOS loads it into memory at the new free memory pointer address, protecting the resident program in memory:

    0FFFFFh                 ------------------------------------------------
    Free Memory Pointer ->
    09FFFFh (640K)          ------------------------------------------------
                            Memory in use by normal application
                            ------------------------------------------------
                            Memory in use by resident application
    00000h                  ------------------------------------------------

    DOS Memory Map (w/resident and normal application)

When this new application terminates, DOS reclaims its memory and readjusts the free memory pointer to its location before running the application, just above the resident program. By using this free memory pointer scheme, DOS can protect the memory in use by the resident program (see footnote 1).

The trick to using the terminate and stay resident call is to figure out how many paragraphs should remain resident. Most TSRs contain two sections of code: a resident portion and a transient portion. The transient portion is the data, main program, and support routines that execute when you run the program from the command line. This code will probably never execute again. Therefore, you should not leave it in memory when your program terminates. After all, every byte consumed by the TSR program is one less byte available to other application programs.

The resident portion of the program is the code that remains in memory and provides whatever functions the TSR requires. Since the PSP is usually right before the first byte of program code, to effectively use the DOS TSR call, your program must be organized as follows:
1. Of course, DOS could never protect the resident program from an errant application. If the application decides to write zeros all over memory, the resident program, DOS, and many other memory areas will be destroyed.
    High addresses
                        SSEG, ZZZZZZSEG, etc.
                        Transient code
                        Resident code and data
                        PSP
    Low addresses

    Memory Organization for a Resident Program

To use TSRs effectively, you need to organize your code and data so that the resident portions of your program load into lower memory addresses and the transient portions load into the higher memory addresses. MASM and the Microsoft Linker both provide facilities that let you control the loading order of segments within your code (see “MASM: Directives & Pseudo-Opcodes” on page 355). The simple solution, however, is to put all your resident code and data in a single segment and make sure that this segment appears first in every source module of your program. In particular, if you are using the UCR Standard Library SHELL.ASM file, you must make sure that you define your resident segments before the include directives for the standard library files. Otherwise MS-DOS will load all the standard library routines before your resident segment and that would waste considerable memory. Note that you only need to define your resident segment first; you do not have to place all the resident code and data before the includes. The following will work just fine:

ResidentSeg     segment para public 'resident'
ResidentSeg     ends

EndResident     segment para public 'EndRes'
EndResident     ends

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

ResidentSeg     segment para public 'resident'
                assume  cs:ResidentSeg, ds:ResidentSeg

PSP             word    ?               ;This var must be here!

; Put resident code and data here

ResidentSeg     ends

dseg            segment para public 'data'

; Put transient data here

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; Put Transient code here.

cseg            ends
                etc.
The purpose of the EndResident segment will become clear in a moment. For more information on DOS memory ordering, see Chapter Six.
Now the only problem is to figure out the size of the resident code, in paragraphs. With your code structured in the manner shown above, determining the size of the resident program is quite easy; just use the following statements to terminate the transient portion of your code (in cseg):

                mov     ax, ResidentSeg         ;Need access to ResidentSeg
                mov     es, ax
                mov     ah, 62h                 ;DOS Get PSP call.
                int     21h
                mov     es:PSP, bx              ;Save PSP value in PSP variable.

; The following code computes the size of the resident portion of the code.
; The EndResident segment is the first segment in memory after resident code.
; The program's PSP value is the segment address of the start of the resident
; block. By computing EndResident-PSP we compute the size of the resident
; portion in paragraphs.

                mov     dx, EndResident         ;Get EndResident segment address.
                sub     dx, bx                  ;Subtract PSP.

; Okay, execute the TSR call, preserving only the resident code.

                mov     ax, 3100h               ;AH=31h (TSR), AL=0 (return code).
                int     21h
Executing the code above returns control to MS-DOS, preserving your resident code in memory.

There is one final memory management detail to consider before moving on to other topics related to resident programs: accessing data within a resident program. Procedures within a resident program become active in response to a direct call from some other program or a hardware interrupt (see the next section). Upon entry, the resident routine may specify that certain registers contain various parameters, but one thing you cannot expect is for the calling code to properly set up the segment registers for you. Indeed, the only segment register that will contain a meaningful value (to the resident code) is the code segment register. Since many resident functions will want to access local data, this means that those functions may need to set up ds or some other segment register(s) upon initial entry.

For example, suppose you have a function, count, that simply counts the number of times some other code calls it once it has gone resident. One would think that the body of this function would contain a single instruction: inc counter. Unfortunately, such an instruction would increment the variable at counter’s offset in the current data segment (that is, the segment pointed at by the ds register). It is unlikely that ds would be pointing at the data segment associated with the count procedure. Therefore, you would be incrementing some word in a different segment (probably the caller’s data segment). This would produce disastrous results.

There are two solutions to this problem. The first is to put all variables in the code segment (a very common practice in resident sections of code) and use a cs: segment override prefix on all your variables. For example, to increment the counter variable you could use the instruction inc cs:counter. This technique works fine if there are only a few variable references in your procedures. However, it suffers from a few serious drawbacks.
First, the segment override prefix makes your instructions larger and slower; this is a serious problem if you access many different variables throughout your resident code. Second, it is easy to forget to place the segment override prefix on a variable, thereby causing the TSR function to wipe out memory in the caller’s data segment. Another solution to the segment problem is to change the value in the ds register upon entry to a resident procedure and restore it upon exit. The following code demonstrates how to do this:

                push    ds              ;Preserve original DS value.
                push    cs              ;Copy CS's value to DS.
                pop     ds
                inc     Counter         ;Bump the variable's value.
                pop     ds              ;Restore original DS value.
Of course, using the cs: segment override prefix is a much more reasonable solution here. However, had the code been extensive and had accessed many local variables, loading ds with cs (assuming you put your variables in the resident segment) would be more efficient.
18.2 Active vs. Passive TSRs

Microsoft identifies two types of TSR routines: active and passive. A passive TSR is one that activates in response to an explicit call from an executing application program. An active TSR is one that responds to a hardware interrupt or one that a hardware interrupt calls.

TSRs are almost always interrupt service routines (see “80x86 Interrupt Structure and Interrupt Service Routines (ISRs)” on page 996). Active TSRs are typically hardware interrupt service routines and passive TSRs are generally trap handlers (see “Traps” on page 999). Although, in theory, it is possible for a program to determine the address of a routine in a passive TSR and call that routine directly, the 80x86 trap mechanism is the perfect device for calling such routines, so most TSRs use it.

Passive TSRs generally provide a callable library of routines or extend some DOS or BIOS call. For example, you might want to reroute all characters an application sends to the printer to a file. By patching into the int 17h vector (see “The PC Parallel Ports” on page 1199) you can intercept all characters destined for the printer (see footnote 2). Or you could add additional functionality to a BIOS routine by chaining into its interrupt vector. For example, you could add new function calls to the int 10h BIOS video services routine (see “MS-DOS, PC-BIOS, and File I/O” on page 699) by looking for a special value in ah and passing all other int 10h calls on through to the original handler. Another use of a passive TSR is to provide a brand new set of services through a new interrupt vector that the BIOS does not already provide. The mouse services, provided by the mouse.com driver, are a good example of such a TSR.

Active TSRs generally serve one of two functions. They either service a hardware interrupt directly, or they piggyback off the hardware interrupt so they can activate themselves on a periodic basis without an explicit call from an application. Pop-up programs are a good example of active TSRs. A pop-up program chains itself into the PC’s keyboard interrupt (int 9). Pressing a key activates such a program. The program can read the PC’s keyboard port (see “The PC Keyboard” on page 1153) to see if the user is pressing a special key sequence. Should this key sequence appear, the application can save a portion of the screen memory and “pop-up” on the screen, perform some user-requested function, and then restore the screen when done. Borland’s Sidekick program is an example of an extremely popular TSR program, though many others exist.

Not all active TSRs are pop-ups, though. Certain viruses are good examples of active TSRs. They patch into various interrupt vectors that activate them automatically so they can go about their dastardly deeds. Fortunately, some anti-viral programs are also good examples of active TSRs; they patch into those same interrupt vectors, detect the activities of a virus, and attempt to limit the damage the virus may cause.

Note that a TSR may contain both active and passive components. That is, there may be certain routines that a hardware interrupt invokes and others that an application calls explicitly. However, if any routine in a resident program is active, we’ll claim that the entire TSR is active.

The following program is a short example of a TSR that provides both active and passive routines. This program patches into the int 9 (keyboard interrupt) and int 16h (keyboard trap) interrupt vectors. Every time the system generates a keyboard interrupt, the active routine (int 9) increments a counter. Since the keyboard usually generates two keyboard interrupts per keystroke, dividing this value by two produces the approximate number of keys typed since starting the TSR (see footnote 3). A passive routine, tied into the int 16h vector, returns the interrupt count to the calling program.
The following code provides two programs, the TSR and a short application to display the number of keystrokes since the TSR started running.

2. Assuming the application uses DOS or BIOS to print the characters and does not talk directly to the printer port itself.
3. It is not an exact count because some keys generate more than two keyboard interrupts.

; This is an example of an active TSR that counts keyboard interrupts
; once activated.

; The resident segment definitions must come before everything else.

ResidentSeg     segment para public 'Resident'
ResidentSeg     ends

EndResident     segment para public 'EndRes'
EndResident     ends

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

; Resident segment that holds the TSR code:

ResidentSeg     segment para public 'Resident'
                assume  cs:ResidentSeg, ds:nothing

; The following variable counts the number of keyboard interrupts

KeyIntCnt       word    0

; These two variables contain the original INT 9 and INT 16h
; interrupt vector values:

OldInt9         dword   ?
OldInt16        dword   ?

; MyInt9-       The system calls this routine every time a keyboard
;               interrupt occurs. This routine increments the
;               KeyIntCnt variable and then passes control on to the
;               original Int9 handler.

MyInt9          proc    far
                inc     ResidentSeg:KeyIntCnt
                jmp     ResidentSeg:OldInt9
MyInt9          endp

; MyInt16-      This is the passive component of this TSR. An
;               application explicitly calls this routine with an
;               INT 16h instruction. If AH contains 0FFh, this
;               routine returns the number of keyboard interrupts
;               in the AX register. If AH contains any other value,
;               this routine passes control to the original INT 16h
;               (keyboard trap) handler.

MyInt16         proc    far
                cmp     ah, 0FFh
                je      ReturnCnt
                jmp     ResidentSeg:OldInt16    ;Call original handler.

; If AH=0FFh, return the keyboard interrupt count

ReturnCnt:      mov     ax, ResidentSeg:KeyIntCnt
                iret
MyInt16         endp

ResidentSeg     ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:ResidentSeg

Main            proc
                meminit

                mov     ax, ResidentSeg
                mov     ds, ax
                mov     ax, 0
                mov     es, ax

                print
                byte    "Keyboard interrupt counter TSR program",cr,lf
                byte    "Installing....",cr,lf,0

; Patch into the INT 9 and INT 16 interrupt vectors. Note that the
; statements above have made ResidentSeg the current data segment,
; so we can store the old INT 9 and INT 16 values directly into
; the OldInt9 and OldInt16 variables.

                cli                             ;Turn off interrupts!
                mov     ax, es:[9*4]
                mov     word ptr OldInt9, ax
                mov     ax, es:[9*4 + 2]
                mov     word ptr OldInt9+2, ax
                mov     es:[9*4], offset MyInt9
                mov     es:[9*4+2], seg ResidentSeg

                mov     ax, es:[16h*4]
                mov     word ptr OldInt16, ax
                mov     ax, es:[16h*4 + 2]
                mov     word ptr OldInt16+2, ax
                mov     es:[16h*4], offset MyInt16
                mov     es:[16h*4+2], seg ResidentSeg
                sti                             ;Okay, ints back on.

; We're hooked up, the only thing that remains is to terminate and
; stay resident.

                print
                byte    "Installed.",cr,lf,0

                mov     ah, 62h                 ;Get this program's PSP
                int     21h                     ; value.

                mov     dx, EndResident         ;Compute size of program.
                sub     dx, bx
                mov     ax, 3100h               ;DOS TSR command.
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack   ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
Here’s the application that calls MyInt16 to print the number of keystrokes:

; This is the companion program to the keycnt TSR.
; This program calls the "MyInt16" routine in the TSR to
; determine the number of keyboard interrupts. It displays
; the approximate number of keystrokes (keyboard ints/2)
; and quits.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

cseg            segment para public 'code'
                assume  cs:cseg, ds:nothing

Main            proc
                meminit

                print
                byte    "Approximate number of keys pressed: ",0
                mov     ah, 0FFh
                int     16h
                shr     ax, 1                   ;Must divide by two.
                putu
                putcr
                ExitPgm

Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack   ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

18.3 Reentrancy

One big problem with active TSRs is that their invocation is asynchronous. They can activate at the touch of a keystroke, timer interrupt, or via an incoming character on the serial port, just to name a few. Since they activate on a hardware interrupt, the PC could have been executing just about any code when the interrupt came along. This isn’t a problem unless the TSR itself decides to call some foreign code, such as DOS, a BIOS routine, or some other TSR. For example, the main application may be making a DOS call when a timer interrupt activates a TSR, interrupting the call to DOS while the CPU is still executing code inside DOS. If the TSR attempts to make a call to DOS at this point, then this will reenter DOS. Of course, DOS is not reentrant, so this creates all kinds of problems (usually, it hangs the system). When writing active TSRs that call other routines besides those provided directly in the TSR, you must be aware of possible reentrancy problems.

Note that passive TSRs never suffer from this problem. Indeed, any TSR routine you call passively will execute in the caller’s environment. Unless some other hardware ISR or active TSR makes the call to your routine, you do not need to worry about reentrancy with passive routines. However, reentrancy is an issue for active TSR routines and passive routines that active TSRs call.
18.3.1 Reentrancy Problems with DOS

DOS is probably the biggest sore point for TSR developers. DOS is not reentrant, yet DOS contains many services a TSR might use. Realizing this, Microsoft has added some support to DOS to allow TSRs to see if DOS is currently active. After all, reentrancy is only a problem if you call DOS while it is already active. If it isn’t already active, you can certainly call it from a TSR with no ill effects.

MS-DOS provides a special one-byte flag (InDOS) that contains a zero if DOS is not currently active and a non-zero value if DOS is already processing an application request. By testing the InDOS flag your TSR can determine if it can safely make a DOS call. If this flag is zero, you can always make the DOS call. If this flag contains one, you may not be able to make the DOS call. MS-DOS provides a function call, Get InDOS Flag Address, that returns the address of the InDOS flag. To use this function, load ah with 34h and call DOS. DOS will return the address of the InDOS flag in es:bx. If you save this address, your resident programs will be able to test the InDOS flag to see if DOS is active.

Actually, there are two flags you should test, the InDOS flag and the critical error flag (criterr). Both of these flags should contain zero before you call DOS from a TSR. In DOS version 3.1 and later, the critical error flag appears in the byte just before the InDOS flag.
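The steps above can be sketched as follows. This is only a minimal sketch under stated assumptions, not code from this chapter: the names InDOSPtr, DOSIsIdle, and SaveInDOSAddr are hypothetical, DS is assumed to hold ResidentSeg when SaveInDOSAddr runs during installation, and the critical error flag is assumed to sit in the byte just before InDOS (DOS 3.1 or later).

```asm
ResidentSeg     segment para public 'Resident'
                assume  cs:ResidentSeg, ds:nothing

InDOSPtr        dword   ?               ;Hypothetical: far address of InDOS flag

; DOSIsIdle-    Hypothetical resident helper. Returns with the zero
;               flag set if both InDOS and the critical error flag
;               contain zero. Call it with interrupts disabled.
;               Modifies AL and the flags.

DOSIsIdle       proc    near
                push    es
                push    bx
                les     bx, cs:InDOSPtr         ;ES:BX -> InDOS flag
                mov     al, es:[bx]             ;AL = InDOS
                or      al, es:[bx-1]           ;Merge in critical error flag
                pop     bx                      ;POP does not change the flags
                pop     es
                ret
DOSIsIdle       endp

ResidentSeg     ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:ResidentSeg

; SaveInDOSAddr- Hypothetical install-time routine. Assumes DS already
;                holds ResidentSeg. Modifies AH, BX, and ES.

SaveInDOSAddr   proc    near
                mov     ah, 34h                 ;Get InDOS Flag Address
                int     21h                     ;DOS returns ES:BX
                mov     word ptr InDOSPtr, bx   ;Save offset
                mov     word ptr InDOSPtr+2, es ;Save segment
                ret
SaveInDOSAddr   endp

cseg            ends
                end
```

A resident routine would execute call DOSIsIdle followed by jnz to skip its DOS calls whenever either flag is non-zero. The address is saved in the resident segment because the transient code that fetches it is gone once the program goes resident.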
So what should you do if these flags aren’t both zero? It’s easy enough to say “hey, come back and do this stuff later when MS-DOS returns back to the user program.” But how do you do this? For example, if a keyboard interrupt activates your TSR and you pass control on to the real keyboard handler because DOS is busy, you can’t expect your TSR to be magically restarted later on when DOS is no longer active. The trick is to patch your TSR into the timer interrupt as well as the keyboard interrupt. When the keystroke interrupt wakes your TSR and you discover that DOS is busy, the keyboard ISR can simply set a flag to tell itself to try again later; then it passes control to the original keyboard handler. In the meantime, a timer ISR you’ve written is constantly checking this flag you’ve created. If the flag is clear, it simply passes control on to the original timer interrupt handler, if the flag is set, then the code checks the InDOS and CritErr flags. If these guys say that DOS is busy, the timer ISR passes control on to the original timer handler. Shortly after DOS finishes whatever it was doing, a timer interrupt will come along and detect that DOS is no longer active. Now your ISR can take over and make any necessary calls to DOS that it wants. Of course, once your timer code determines that DOS is not busy, it should clear the “I want service” flag so that future timer interrupts don’t inadvertently restart the TSR. There is only one problem with this approach. There are certain DOS calls that can take an indefinite amount of time to execute. For example, if you call DOS to read a key from the keyboard (or call the Standard Library’s getc routine that calls DOS to read a key), it could be hours, days, or even longer before somebody actually bothers to press a key. Inside DOS there is a loop that waits until the user actually presses a key. And until the user presses some key, the InDOS flag is going to remain non-zero. 
If you’ve written a timer-based TSR that is buffering data every few seconds and needs to write the results to disk every now and then, you will overflow your buffer with new data if you wait for the user, who just went to lunch, to press a key in DOS’ command.com program. Luckily, MS-DOS provides a solution to this problem as well – the idle interrupt. While MS-DOS is in an indefinite loop waiting for an I/O device, it continually executes an int 28h instruction. By patching into the int 28h vector, your TSR can determine when DOS is sitting in such a loop. When DOS executes the int 28h instruction, it is safe to make any DOS call whose function number (the value in ah) is greater than 0Ch.
So if DOS is busy when your TSR wants to make a DOS call, you must use either a timer interrupt or the idle interrupt (int 28h) to activate the portion of your TSR that must make DOS calls. One final thing to keep in mind is that whenever you test or modify any of the above mentioned flags, you are in a critical section. Make sure the interrupts are off. If not, your TSR may activate two copies of itself or you may wind up entering DOS at the same time some other TSR enters DOS.
An example of a TSR using these techniques will appear a little later, but there are some additional reentrancy problems we need to discuss first.
18.3.2 Reentrancy Problems with BIOS
DOS isn’t the only non-reentrant code a TSR might want to call. The PC’s BIOS routines also fall into this category. Unfortunately, BIOS doesn’t provide an “InBIOS” flag or a multiplex interrupt. You will have to supply such functionality yourself. The key to avoiding reentry into a BIOS routine you want to call is to use a wrapper. A wrapper is a short ISR that patches into an existing BIOS interrupt specifically to manipulate an InUse flag. For example, suppose you need to make an int 10h (video services) call from within your TSR. You could use the following code to provide an “Int10InUse” flag that your TSR could test:

MyInt10         proc    far
                inc     cs:Int10InUse
                pushf
                call    cs:OldInt10
                dec     cs:Int10InUse
                iret
MyInt10         endp
Assuming you’ve initialized the Int10InUse variable to zero, the in use flag will contain zero when it is safe to execute an int 10h instruction in your TSR; it will contain a non-zero value when the interrupt 10h handler is busy. You can use this flag like the InDOS flag to defer the execution of your TSR code.
Like DOS, there are certain BIOS routines that may take an indefinite amount of time to complete. Reading a key from the keyboard buffer, reading or writing characters on the serial port, or printing characters to the printer are some examples. While, in some cases, it is possible to create a wrapper that lets your TSR activate itself while a BIOS routine is executing one of these polling loops, there is probably no benefit to doing so. For example, if an application program is waiting for the printer to take a character before it sends another to the printer, having your TSR preempt this and attempt to send a character to the printer won’t accomplish much (other than scramble the data sent to the printer). Therefore, BIOS wrappers generally don’t worry about indefinite postponement in a BIOS routine.
If you run into problems with your TSR code and certain application programs, you may want to place wrappers around the following interrupts to see if this solves your problem: int 5, int 8, int 9, int B, int C, int D, int E, int 10, int 13, int 14, int 16, or int 17. These are common culprits when TSR problems develop.
18.3.3 Reentrancy Problems with Other Code
Reentrancy problems occur in other code you might call as well. For example, consider the UCR Standard Library. The UCR Standard Library is not reentrant. This usually isn’t much of a problem for a couple of reasons. First, most TSRs do not call Standard Library subroutines. Instead, they provide results that normal applications can use; those applications use the Standard Library routines to manipulate such results. A second reason is that were you to include some Standard Library routines in a TSR, the application would have a separate copy of the library routines. The TSR might call strcmp while the application is in the middle of an strcmp routine, but these are not the same routines! The TSR is not reentering the application’s code; it is executing a separate routine.
However, many of the Standard Library functions make DOS or BIOS calls. Such calls do not check to see if DOS or BIOS is already active. Therefore, calling many Standard Library routines from within a TSR may cause you to reenter DOS or BIOS.
One situation does exist where a TSR could reenter a Standard Library routine. Suppose your TSR has both passive and active components. If the main application makes a call to a passive routine in your TSR and that routine calls a Standard Library routine, there is the possibility that a system interrupt could interrupt the Standard Library routine and the active portion of the TSR could reenter that same code. Although such a situation would be extremely rare, you should be aware of this possibility.
Of course, the best solution is to avoid using the Standard Library within your TSRs. If for no other reason, the Standard Library routines are quite large and TSRs should be as small as possible.
18.4 The Multiplex Interrupt (INT 2Fh)
When installing a passive TSR, or an active TSR with passive components, you will need to choose some interrupt vector to patch so other programs can communicate with your passive routines. You could pick an interrupt vector almost at random, say int 84h, but this could lead to some compatibility problems. What happens if someone else is already using that interrupt vector? Sometimes, the choice of interrupt vector is clear. For example, if your passive TSR extends the int 16h keyboard services, it makes sense to patch in to the int 16h vector and add additional functions above and beyond those already provided by the BIOS. On the other hand, if you are creating a driver for some brand new device for the PC, you probably would not want to piggyback the support functions for this device on some other interrupt. Yet arbitrarily picking an unused interrupt vector is risky; how many other programs out there decided to do the same thing? Fortunately, MS-DOS provides a solution: the multiplex interrupt. Int 2Fh provides a general mechanism for installing, testing the presence of, and communicating with a TSR.
To use the multiplex interrupt, an application places an identification value in ah and a function number in al and then executes an int 2Fh instruction. Each TSR in the int 2Fh chain compares the value in ah against its own unique identifier value. If the values match, the TSR processes the command specified by the value in the al register. If the identification values do not match, the TSR passes control to the next int 2Fh handler in the chain.
Of course, this only reduces the problem somewhat; it doesn’t eliminate it. Sure, we don’t have to guess an interrupt vector number at random, but we still have to choose a random identification number. After all, it seems reasonable that we must choose this number before designing the TSR and any applications that call it; how will the applications know what value to load into ah if we dynamically assign this value when the TSR goes resident?
Well, there is a little trick we can play to dynamically assign TSR identifiers and let any interested applications determine the TSR’s ID. By convention, function zero is the “Are you there?” call. An application should always execute this function to determine if the TSR is actually present in memory before making any service requests. Normally, function zero returns a zero in al if the TSR is not present; it returns 0FFh if it is present. However, when this function returns 0FFh it only tells you that some TSR has responded to your query; it does not guarantee that the TSR you are interested in is actually present in memory. However, by extending the convention somewhat, it is very easy to verify the presence of the desired TSR. Suppose the function zero call also returns a pointer to a unique identification string in the es:di registers.
Then the code testing for the presence of a specific TSR could test this string when the int 2Fh call detects the presence of a TSR. The following code segment demonstrates how a TSR could determine if a TSR identified as “Randy’s INT 10h Extension” is present in memory; this code will also determine the unique identification code for that TSR, for future reference:

; Scan through all the possible TSR IDs. If one is installed, see if
; it's the TSR we're interested in.

                mov     cx, 0FFh        ;This will be the ID number.
IDLoop:         mov     ah, cl          ;ID -> AH.
                push    cx              ;Preserve CX across call
                mov     al, 0           ;Test presence function code.
                int     2Fh             ;Call multiplex interrupt.
                pop     cx              ;Restore CX.
                cmp     al, 0           ;Installed TSR?
                je      TryNext         ;Returns zero if none there.
                strcmpl                 ;See if it's the one we want.
                byte    "Randy's INT "
                byte    "10h Extension",0
                je      Success         ;Branch off if it is ours.
TryNext:        loop    IDLoop          ;Otherwise, try the next one.
                jmp     NotInstalled    ;Failure if we get to this point.

Success:        mov     FuncID, cl      ;Save function result.
                 .
                 .
                 .
If this code succeeds, the variable FuncID contains the identification value for the resident TSR. If it fails, the application program probably needs to abort, or otherwise ensure that it never calls the missing TSR.
The code above lets an application easily detect the presence of and determine the ID number for a specific TSR. The next question is “How do we pick the ID number for the TSR in the first place?” The next section will address that issue, as well as how the TSR must respond to the multiplex interrupt.
18.5 Installing a TSR
Although we’ve already discussed how to make a program go resident (see “DOS Memory Usage and TSRs” on page 1025), there are a few aspects to installing a TSR that we need to address. First, what happens if a user installs a TSR and then tries to install it a second time without first removing the one that is already resident? Second, how can we assign a TSR identification number that won’t conflict with a TSR that is already installed? This section will address these issues.
The first problem to address is an attempt to reinstall a TSR program. Although one could imagine a type of TSR that allows multiple copies of itself in memory at one time, such TSRs are few and far between. In most cases, having multiple copies of a TSR in memory will, at best, waste memory and, at worst, crash the system. Therefore, unless you are specifically writing a TSR that allows multiple copies of itself in memory at one time, you should check to see if the TSR is installed before actually installing it. This code is identical to the code an application would use to see if the TSR is installed; the only difference is that the TSR should print a nasty message and refuse to go TSR if it finds a copy of itself already installed in memory. The following code does this:

                mov     cx, 0FFh
SearchLoop:     mov     ah, cl
                push    cx
                mov     al, 0
                int     2Fh
                pop     cx
                cmp     al, 0
                je      TryNext
                strcmpl
                byte    "Randy's INT "
                byte    "10h Extension",0
                je      AlreadyThere
TryNext:        loop    SearchLoop
                jmp     NotInstalled

AlreadyThere:   print
                byte    "A copy of this TSR already exists in memory",cr,lf
                byte    "Aborting installation process.",cr,lf,0
                ExitPgm
                 .
                 .
                 .
In the previous section, you saw how to write some code that would allow an application to determine the TSR ID of a specific resident program. Now we need to look at how to dynamically choose an identification number for the TSR, one that does not conflict with any other TSRs. This is yet another modification to the scanning loop. In fact, we can modify the code above to do this for us. All we need to do is save away some ID value that does not have an installed TSR. We need only add a few lines to the above code to accomplish this:

                mov     FuncID, 0       ;Initialize FuncID to zero.
                mov     cx, 0FFh
SearchLoop:     mov     ah, cl
                push    cx
                mov     al, 0
                int     2Fh
                pop     cx
                cmp     al, 0
                je      TryNext
                strcmpl
                byte    "Randy's INT "
                byte    "10h Extension",0
                je      AlreadyThere
                loop    SearchLoop
                jmp     NotInstalled

; Note: presumably DS points at the resident data segment that contains
;       the FuncID variable. Otherwise you must modify the following to
;       point some segment register at the segment containing FuncID and
;       use the appropriate segment override on FuncID.

TryNext:        mov     FuncID, cl      ;Save possible function ID if this
                loop    SearchLoop      ; identifier is not in use.
                jmp     NotInstalled

AlreadyThere:   print
                byte    "A copy of this TSR already exists in memory",cr,lf
                byte    "Aborting installation process.",cr,lf,0
                ExitPgm

NotInstalled:   cmp     FuncID, 0       ;If there are no available IDs, this
                jne     GoodID          ; will still contain zero.
                print
                byte    "There are too many TSRs already installed.",cr,lf
                byte    "Sorry, aborting installation process.",cr,lf,0
                ExitPgm

GoodID:
If this code gets to label “GoodID” then a previous copy of the TSR is not present in memory and the FuncID variable contains an unused function identifier.
Of course, when you install your TSR in this manner, you must not forget to patch your interrupt 2Fh handler into the int 2Fh chain. Also, you have to write an interrupt 2Fh handler to process int 2Fh calls. The following is a very simple multiplex interrupt handler for the code we’ve been developing:

FuncID          byte    0               ;Should be in resident segment.
OldInt2F        dword   ?               ; Ditto.

MyInt2F         proc    far
                cmp     ah, cs:FuncID   ;Is this call for us?
                je      ItsUs
                jmp     cs:OldInt2F     ;Chain to previous guy, if not.

; Now decode the function value in AL:

ItsUs:          cmp     al, 0           ;Verify presence call?
                jne     TryOtherFunc
                mov     al, 0FFh        ;Return "present" value in AL.
                lesi    IDString        ;Return pointer to string in es:di.
                iret                    ;Return to caller.

IDString        byte    "Randy's INT "
                byte    "10h Extension",0

; Down here, handle other multiplex requests.
; This code doesn't offer any, but here's where they would go.
; Just test the value in AL to determine which function to execute.

TryOtherFunc:
                 .
                 .
                 .
                iret
MyInt2F         endp

18.6 Removing a TSR
Removing a TSR is quite a bit more difficult than installing one. There are three things the removal code must do in order to properly remove a TSR from memory: first, it needs to stop any pending activities (e.g., the TSR may have some flags set to start some activity at a future time); second, it needs to restore all interrupt vectors to their former values; third, it needs to return all reserved memory back to DOS so other applications can make use of it. The primary difficulty with these three activities is that it is not always possible to properly restore the interrupt vectors.
If your TSR removal code simply restores the old interrupt vector values, you may create a really big problem. What happens if the user runs some other TSRs after running yours and they patch into the same interrupt vectors as your TSR? This would produce interrupt chains that look something like the following:

[Figure: Interrupt Vector -> TSR #1 -> TSR #1 -> Your TSR -> Original TSR]
If you restore the interrupt vector with your original value, you will create the following:

[Figure: Interrupt Vector, TSR #1, TSR #1, ?, Original TSR]
This effectively disables the TSRs that chain into your code. Worse yet, this only disables the interrupts that those TSRs have in common with your TSR. The other interrupts those TSRs patch into are still active. Who knows how those interrupts will behave under such circumstances?
One solution is to simply print an error message informing the user that they cannot remove this TSR until they remove all TSRs installed after this one. This is a common problem with TSRs and most DOS users who install and remove TSRs should be comfortable with the fact that they must remove TSRs in the reverse order that they install them.
It would be tempting to suggest a new convention that TSRs should obey; perhaps if the function number is 0FFh, a TSR should store the value in es:bx away in the interrupt vector specified in cl. This would allow a TSR that would like to remove itself to pass the address of its original interrupt handler to the previous TSR in the chain. There are only three problems with this approach: first, almost no TSRs in existence currently support this feature, so it would be of little value; second, some TSRs might use function 0FFh for something else, so calling them with this value, even if you knew their ID number, could create a problem; finally, just because you’ve removed the TSR from the interrupt chain doesn’t mean you can (truly) free up the memory the TSR uses. DOS’ memory management scheme (the free pointer business) works like a stack. If there are other TSRs installed above yours in memory, most applications wouldn’t be able to use the memory freed up by removing your TSR anyway.
Therefore, we’ll also adopt the strategy of simply informing the user that they cannot remove a TSR if there are others installed in shared interrupt chains. Of course, that does bring up a good question: how can we determine if there are other TSRs chained in to our interrupts? Well, this isn’t so hard.
We know that the 80x86’s interrupt vectors should still be pointing at our routines if we’re the last TSR run. So all we’ve got to do is compare the patched interrupt vectors against the addresses of our interrupt service routines. If they all match, then we can safely remove our TSR from memory. If even one of them does not match, then we cannot remove the TSR from memory. The following code sequence tests to see if it is okay to detach a TSR containing ISRs for int 2Fh and int 9:

; OkayToRmv-    This routine returns the carry flag set if it is okay to
;               remove the current TSR from memory. It checks the interrupt
;               vectors for int 2F and int 9 to make sure they are still
;               pointing at our local routines (MyInt2F and MyInt9).

OkayToRmv       proc    near
                push    es
                mov     ax, 0                   ;Point ES at interrupt vector
                mov     es, ax                  ; table.
                cmp     word ptr es:[2Fh*4], offset MyInt2F
                jne     CantRemove
                cmp     word ptr es:[2Fh*4 + 2], seg MyInt2F
                jne     CantRemove

                cmp     word ptr es:[9*4], offset MyInt9
                jne     CantRemove
                cmp     word ptr es:[9*4 + 2], seg MyInt9
                jne     CantRemove

; We can safely remove this TSR from memory.

                stc
                pop     es
                ret

; Someone else is in the way, we cannot remove this TSR.

CantRemove:     clc
                pop     es
                ret
OkayToRmv       endp
Before the TSR attempts to remove itself, it should call a routine like this one to see if removal is possible.
Of course, the fact that no other TSR has chained into the same interrupts does not guarantee that there are not TSRs above yours in memory. However, removing the TSR in that case will not crash the system. True, you may not be able to reclaim the memory the TSR is using (at least until you remove the other TSRs), but at least the removal will not create complications.
To remove the TSR from memory requires two DOS calls, one to free the memory in use by the TSR and one to free the memory in use by the environment area assigned to the TSR. To do this, you need to make the DOS deallocation call (see “MS-DOS, PC-BIOS, and File I/O” on page 699). This call requires that you pass the segment address of the block to release in the es register. For the TSR program itself, you need to pass the address of the TSR’s PSP. This is one of the reasons a TSR needs to save its PSP when it first installs itself. The other free call you must make frees the space associated with the TSR’s environment block. The address of this block is at offset 2Ch in the PSP. So we should probably free it first. The following calls handle the job of freeing the memory associated with a TSR:

; Presumably, the PSP variable was initialized with the address of this
; program's PSP before the terminate and stay resident call.

                mov     es, PSP
                mov     es, es:[2Ch]    ;Get address of environment block.
                mov     ah, 49h         ;DOS deallocate block call.
                int     21h

                mov     es, PSP         ;Now free the program's memory
                mov     ah, 49h         ; space.
                int     21h
Some poorly-written TSRs provide no facilities to allow you to remove them from memory. If someone wants to remove such a TSR, they will have to reboot the PC. Obviously, this is a poor design. Any TSR you design for anything other than a quick test should be capable of removing itself from memory. The multiplex interrupt with function number one is often used for this purpose. To remove a TSR from memory, some application program passes the TSR ID and a function number of one to the TSR. If the TSR can remove itself from memory, it does so and returns a value denoting success. If the TSR cannot remove itself from memory, it returns some sort of error condition.
Generally, the removal program is the TSR itself with a special parameter that tells it to remove the TSR currently loaded into memory. A little later this chapter presents an example of a TSR that works precisely in this fashion (see “A Keyboard Monitor TSR” on page 1041).
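As a rough illustration of the removal request described above, an application (or the TSR’s own transient code) might ask the resident copy to remove itself as follows. This is only a sketch under stated assumptions: FuncID is assumed to hold the ID found by the scanning loop shown earlier in the chapter, the labels Removed and RemoveFailed are hypothetical, and the convention that the TSR returns zero in al on success is an assumption, since the text only says the TSR returns a value denoting success.

```asm
; Hypothetical sketch. FuncID is assumed to hold the TSR's multiplex ID.
; ASSUMED convention: the TSR returns AL = 0 if it removed itself and
; any other value if it could not.

                mov     ah, FuncID      ;TSR identification value.
                mov     al, 1           ;Function one: remove request.
                int     2Fh             ;Call the multiplex interrupt.
                cmp     al, 0           ;Did the TSR report success?
                je      Removed

                print
                byte    "Could not remove the TSR.",cr,lf,0
                jmp     RemoveFailed    ;Hypothetical label for error handling.

Removed:        print
                byte    "TSR removed.",cr,lf,0
```

Zero is used as the assumed success value so that a request nobody answers, which would normally come back with al still holding the function number one, reads as a failure rather than a success.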
18.7 Other DOS Related Issues
In addition to reentrancy problems with DOS, there are a few other issues your TSRs must deal with if they are going to make DOS calls. Although your calls might not cause DOS to reenter itself, it is quite possible for your TSR’s DOS calls to disturb data structures in use by an executing application. These data structures include the application’s stack, PSP, disk transfer area (DTA), and the DOS extended error information record.
When an active or passive TSR gains control of the CPU, it is operating in the environment of the main (foreground) application. For example, the TSR’s return address and any values it saves on the stack are pushed onto the application’s stack. If the TSR does not use much stack space, this is fine; it need not switch stacks. However, if the TSR consumes considerable amounts of stack space because of recursive calls or the allocation of local variables, the TSR should save the application’s ss and sp values and switch to a local stack. Before returning, of course, the TSR should switch back to the foreground application’s stack.
Likewise, if the TSR executes DOS’ get psp address call, DOS returns the address of the foreground application’s PSP, not the TSR’s PSP (see footnote 4). The PSP contains several important addresses that DOS uses in the event of an error. For example, the PSP contains the address of the termination handler, ctrl-break handler, and critical error handler. If you do not switch the PSP from the foreground application to the TSR’s and one of the exceptions occurs (e.g., someone hits control-break or a disk error occurs), the handler associated with the application may take over. Therefore, when making DOS calls that can result in one of these conditions, you need to switch PSPs. Likewise, when your TSR returns control to the foreground application, it must restore the PSP value. MS-DOS provides two functions that get and set the current PSP address. The DOS Set PSP call (ah=50h) sets the current program’s PSP address to the value in the bx register. The DOS Get PSP call (ah=51h) returns the current program’s PSP address in the bx register. Assuming the transient portion of your TSR has saved its PSP address in the variable PSP, you switch between the TSR’s PSP and the foreground application’s PSP as follows:

; Assume we've just entered the TSR code, determined that it's okay to
; call DOS, and we've switched DS so that it points at our local variables.

                mov     ah, 51h         ;Get application's PSP address
                int     21h
                mov     AppPSP, bx      ;Save application's PSP locally.
                mov     bx, PSP         ;Change system PSP to TSR's PSP.
                mov     ah, 50h         ;Set PSP call
                int     21h
                 .
                 .                      ;TSR code
                 .
                mov     bx, AppPSP      ;Restore system PSP address to
                mov     ah, 50h         ; point at application's PSP.
                int     21h

                « clean up and return from TSR »
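The stack switch mentioned earlier in this section can be sketched in the same style. This is a minimal sketch under stated assumptions, not code from the example program: the names AppSS, AppSP, LocalStk, and EndStk are hypothetical, the variables are assumed to live in the same segment as the TSR’s code (hence the cs: overrides), and the 256-byte stack size is arbitrary.

```asm
; Hypothetical names; all of this is assumed to live in the TSR's
; resident code segment so that CS can address it.

AppSS           word    ?               ;Application's SS.
AppSP           word    ?               ;Application's SP.
LocalStk        word    128 dup (?)     ;256-byte local stack (size is arbitrary).
EndStk          label   word            ;First address past the local stack.

; On entry to the TSR: save the application's stack, switch to ours.

                cli                     ;No interrupts while SS:SP is in flux.
                mov     cs:AppSS, ss
                mov     cs:AppSP, sp
                push    cs              ;Copy CS into SS without using another
                pop     ss              ; register (SP is unchanged afterwards).
                mov     sp, offset EndStk
                 .
                 .                      ;TSR code (re-enable interrupts with
                 .                      ; sti here if that is appropriate).

; Before returning: switch back to the application's stack.

                cli
                mov     ss, cs:AppSS
                mov     sp, cs:AppSP
```

Because AppSS and AppSP are single variables, this sketch is not itself reentrant. A second activation while the TSR is already on its local stack would overwrite the saved values, so the switch should be guarded by the same kind of in-use flag the wrappers use.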
Another global data structure that DOS uses is the disk transfer area. This buffer area was used extensively for disk I/O in DOS version 1.0. Since then, the main use for the DTA has been the find first file and find next file functions (see “MS-DOS, PC-BIOS, and File I/O” on page 699). Obviously, if the application is in the middle of using data in the DTA and your TSR makes a DOS call that changes the data in the DTA, you will affect the operation of the foreground process.
MS-DOS provides two calls that let you get and set the address of the DTA. The Get DTA Address call, with ah=2Fh, returns the address of the DTA in the es:bx registers. The Set DTA call (ah=1Ah) sets the DTA to the value found in the ds:dx register pair. With these two calls you can save and restore the DTA as we did for the PSP address above. The DTA is usually at offset 80h in the PSP; the following code preserves the foreground application’s DTA and sets the current DTA to the TSR’s at offset PSP:80.

; This code makes the same assumptions as the previous example.

                mov     ah, 2Fh                 ;Get application DTA
                int     21h
                mov     word ptr AppDTA, bx
                mov     word ptr AppDTA+2, es

                push    ds
                mov     ds, PSP                 ;DTA is in PSP
                mov     dx, 80h                 ; at offset 80h
                mov     ah, 1Ah                 ;Set DTA call.
                int     21h
                pop     ds
                 .
                 .                              ;TSR code.
                 .
                push    ds
                mov     dx, word ptr AppDTA
                mov     ds, word ptr AppDTA+2
                mov     ah, 1Ah                 ;Set DTA call.
                int     21h
                pop     ds                      ;Point DS back at our data.

4. This is another reason the transient portion of the TSR must save the PSP address in a resident variable for the TSR.
The last issue a TSR must deal with is the extended error information in DOS. If a TSR interrupts a program immediately after DOS returns to that program, there may be some error information the foreground application needs to check in the DOS extended error information. If the TSR makes any DOS calls, DOS may replace this information with the status of the TSR DOS call. When control returns to the foreground application, it may read the extended error status and get the information generated by the TSR DOS call, not the application’s DOS call. DOS provides two asymmetrical calls, Get Extended Error and Set Extended Error, that read and write these values, respectively. The call to Get Extended Error returns the error status in the ax, bx, cx, dx, si, di, es, and ds registers. You need to save the registers in a data structure that takes the following form:

ExtError        struct
eeAX            word    ?
eeBX            word    ?
eeCX            word    ?
eeDX            word    ?
eeSI            word    ?
eeDI            word    ?
eeDS            word    ?
eeES            word    ?
                word    3 dup (0)       ;Reserved.
ExtError        ends

The Set Extended Error call requires that you pass an address to this structure in the ds:si register pair (which is why these two calls are asymmetrical). To preserve the extended error information, you would use code similar to the following:

; Same assumptions as the above routines here. Also, assume the error
; data structure is named ERR and is in the same segment as this code.

                push    ds              ;Save ptr to our DS.
                mov     ah, 59h         ;Get extended error call
                mov     bx, 0           ;Required by this call
                int     21h

                mov     cs:ERR.eeDS, ds
                pop     ds              ;Retrieve ptr to our data.
                mov     ERR.eeAX, ax
                mov     ERR.eeBX, bx
                mov     ERR.eeCX, cx
                mov     ERR.eeDX, dx
                mov     ERR.eeSI, si
                mov     ERR.eeDI, di
                mov     ERR.eeES, es
                 .
                 .                      ;TSR code goes here.
                 .
                mov     si, offset ERR  ;DS already points at correct seg.
                mov     ax, 5D0Ah       ;5D0Ah is Set Extended Error code.
                int     21h

                « clean up and quit »
18.8 A Keyboard Monitor TSR
The following program extends the keystroke counter program presented a little earlier in this chapter. This particular program monitors keystrokes and each minute writes out data to a file listing the date, time, and approximate number of keystrokes in the last minute.
This program can help you discover how much time you spend typing versus thinking at a display screen (see footnote 5).

5. This program is intended for your personal enjoyment only; it is not intended to be used for unethical purposes such as monitoring employees for evaluation purposes.

; This is an example of an active TSR that counts keyboard interrupts
; once activated. Every minute it writes the number of keyboard
; interrupts that occurred in the previous minute to an output file.
; This continues until the user removes the program from memory.
;
; Usage:
;       KEYEVAL filename  -  Begins logging keystroke data to this file.
;
;       KEYEVAL REMOVE    -  Removes the resident program from memory.
;
; This TSR checks to make sure there isn't a copy already active in
; memory. When doing disk I/O from the interrupts, it checks to make
; sure DOS isn't busy and it preserves application globals (PSP, DTA,
; and extended error info). When removing itself from memory, it
; makes sure there are no other interrupts chained into any of its
; interrupts before doing the remove.
;
; The resident segment definitions must come before everything else.

ResidentSeg     segment para public 'Resident'
ResidentSeg     ends

EndResident     segment para public 'EndRes'
EndResident     ends

                .xlist
                .286
                include         stdlib.a
                includelib      stdlib.lib
                .list

; Resident segment that holds the TSR code:

ResidentSeg     segment para public 'Resident'
                assume  cs:ResidentSeg, ds:nothing

; Int 2Fh ID number for this TSR:

MyTSRID         byte    0

; The following variable counts the number of keyboard interrupts

KeyIntCnt       word    0

; Counter accumulates elapsed time in tenths of a millisecond (549 per
; timer tick, 10000 per second); SecCounter counts off the number of
; seconds (up to 60).

Counter         word    0
SecCounter      word    0

; FileHandle is the handle for the log file:

FileHandle      word    0

; NeedIO determines if we have a pending I/O operation.

NeedIO          word    0

; PSP is the psp address for this program.

PSP             word    0

; Variables to tell us if DOS, INT 13h, or INT 16h are busy:

InInt13         byte    0
InInt16         byte    0
InDOSFlag       dword   ?

; These variables contain the original values in the interrupt vectors
; we've patched.

OldInt9         dword   ?
OldInt13        dword   ?
OldInt16        dword   ?
OldInt1C        dword   ?
OldInt28        dword   ?
OldInt2F        dword   ?

; DOS data structures:

ExtErr          struct
eeAX            word    ?
eeBX            word    ?
eeCX            word    ?
eeDX            word    ?
eeSI            word    ?
eeDI            word    ?
eeDS            word    ?
eeES            word    ?
                word    3 dup (0)
ExtErr          ends

XErr            ExtErr  {}              ;Extended Error Status.
AppPSP          word    ?               ;Application PSP value.
AppDTA          dword   ?               ;Application DTA address.

; The following data is the output record. After storing this data
; to these variables, the TSR writes this data to disk.

month           byte    0
day             byte    0
year            word    0
hour            byte    0
minute          byte    0
second          byte    0
Keystrokes      word    0
RecSize         =       $-month

; MyInt9-       The system calls this routine every time a keyboard
;               interrupt occurs. This routine increments the
;               KeyIntCnt variable and then passes control on to the
;               original Int9 handler.

MyInt9          proc    far
                inc     ResidentSeg:KeyIntCnt
                jmp     ResidentSeg:OldInt9
MyInt9          endp

; MyInt1C-      Timer interrupt. This guy counts off 60 seconds and then
;               attempts to write a record to the output file. Of course,
;               this call has to jump through all sorts of hoops to keep
;               from reentering DOS and other problematic code.

MyInt1C         proc    far
                assume  ds:ResidentSeg

                push    ds
                push    es
                pusha                           ;Save all the registers.
                mov     ax, ResidentSeg
                mov     ds, ax

                pushf
                call    OldInt1C

; First things first, let's bump our interrupt counter so we can count
; off a minute. Since we're getting interrupted about every 54.92549
; milliseconds, let's shoot for a little more accuracy than 18 times
; per second so the timings don't drift too much.

                add     Counter, 549            ;54.9 msec per int 1C.
                cmp     Counter, 10000          ;1 second.
                jb      NotSecYet
                sub     Counter, 10000
                inc     SecCounter
NotSecYet:

; If NEEDIO is not zero, then there is an I/O operation in progress.
; Do not disturb the output values if this is the case.

                cli                             ;This is a critical region.
                cmp     NeedIO, 0
                jne     SkipSetNIO

; Okay, no I/O in progress, see if a minute has passed since the last
; time we logged the keystrokes to the file. If so, it's time to start
; another I/O operation.

                cmp     SecCounter, 60          ;One minute passed yet?
                jb      Int1CDone
                mov     NeedIO, 1               ;Flag need for I/O.
                mov     ax, KeyIntCnt           ;Copy this to the output
                shr     ax, 1                   ; buffer after computing
                mov     KeyStrokes, ax          ; # of keystrokes.
                mov     KeyIntCnt, 0            ;Reset for next minute.
                mov     SecCounter, 0

SkipSetNIO:     cmp     NeedIO, 1               ;Is the I/O already in
                jne     Int1CDone               ; progress? Or done?

                call    ChkDOSStatus            ;See if DOS/BIOS are free.
                jnc     Int1CDone               ;Branch if busy.

                call    DoIO                    ;Do I/O if DOS is free.

Int1CDone:      popa                            ;Restore registers and quit.
                pop     es
                pop     ds
                iret
MyInt1C         endp
                assume  ds:nothing
; MyInt28-      Idle interrupt. If DOS is in a busy-wait loop waiting for
;               I/O to complete, it executes an int 28h instruction each
;               time through the loop. We can ignore the InDOS and CritErr
;               flags at that time, and do the I/O if the other interrupts
;               are free.

MyInt28         proc    far
                assume  ds:ResidentSeg

                push    ds
                push    es
                pusha                           ;Save all the registers.
                mov     ax, ResidentSeg
                mov     ds, ax

                pushf                           ;Call the next INT 28h
                call    OldInt28                ; ISR in the chain.

                cmp     NeedIO, 1               ;Do we have a pending I/O?
                jne     Int28Done

                mov     al, InInt13             ;See if BIOS is busy.
                or      al, InInt16
                jne     Int28Done

                call    DoIO                    ;Go do I/O if BIOS is free.

Int28Done:      popa
                pop     es
                pop     ds
                iret
MyInt28         endp
                assume  ds:nothing
; MyInt16-      This is just a wrapper for the INT 16h (keyboard trap)
;               handler.

MyInt16         proc    far
                inc     ResidentSeg:InInt16

; Call original handler:

                pushf
                call    ResidentSeg:OldInt16

; For INT 16h we need to return the flags that come from the previous call.

                pushf
                dec     ResidentSeg:InInt16
                popf
                retf    2                       ;Fake IRET to keep flags.
MyInt16         endp

; MyInt13-      This is just a wrapper for the INT 13h (disk I/O trap)
;               handler.

MyInt13         proc    far
                inc     ResidentSeg:InInt13
                pushf
                call    ResidentSeg:OldInt13
                pushf
                dec     ResidentSeg:InInt13
                popf
                retf    2                       ;Fake iret to keep flags.
MyInt13         endp

; ChkDOSStatus- Returns with the carry clear if DOS or a BIOS routine
;               is busy and we can't interrupt them. Returns with the
;               carry set if it is safe to call DOS.

ChkDOSStatus    proc    near
                assume  ds:ResidentSeg
                les     bx, InDOSFlag
                mov     al, es:[bx]             ;Get InDOS flag.
                or      al, es:[bx-1]           ;OR with CritErr flag.
                or      al, InInt16             ;OR with our wrapper
                or      al, InInt13             ; values.
                je      Okay2Call
                clc                             ;Something is busy.
                ret

Okay2Call:      stc                             ;Everything is free.
                ret
ChkDOSStatus    endp
                assume  ds:nothing
; PreserveDOS-  Gets a copy of DOS' current PSP, DTA, and extended
;               error information and saves this stuff. Then it sets
;               the PSP to our local PSP and the DTA to PSP:80h.

PreserveDOS     proc    near
                assume  ds:ResidentSeg

                mov     ah, 51h                 ;Get app's PSP.
                int     21h
                mov     AppPSP, bx              ;Save for later

                mov     ah, 2Fh                 ;Get app's DTA.
                int     21h
                mov     word ptr AppDTA, bx
                mov     word ptr AppDTA+2, es

                push    ds
                mov     ah, 59h                 ;Get extended err info.
                xor     bx, bx
                int     21h

                mov     cs:XErr.eeDS, ds
                pop     ds
                mov     XErr.eeAX, ax
                mov     XErr.eeBX, bx
                mov     XErr.eeCX, cx
                mov     XErr.eeDX, dx
                mov     XErr.eeSI, si
                mov     XErr.eeDI, di
                mov     XErr.eeES, es

; Okay, point DOS's pointers at us:

                mov     bx, PSP
                mov     ah, 50h                 ;Set PSP.
                int     21h

                push    ds                      ;Set the DTA to
                mov     ds, PSP                 ; address PSP:80h
                mov     dx, 80h
                mov     ah, 1Ah                 ;Set DTA call.
                int     21h
                pop     ds

                ret
PreserveDOS     endp
                assume  ds:nothing
; RestoreDOS-   Restores DOS' important global data values back to the
;               application's values.

RestoreDOS      proc    near
                assume  ds:ResidentSeg

                mov     bx, AppPSP
                mov     ah, 50h                 ;Set PSP
                int     21h

                push    ds
                lds     dx, AppDTA
                mov     ah, 1Ah                 ;Set DTA
                int     21h
                pop     ds
                push    ds

                mov     si, offset XErr         ;Saved extended error stuff.
                mov     ax, 5D0Ah               ;Restore XErr call.
                int     21h
                pop     ds
                ret
RestoreDOS      endp
                assume  ds:nothing
; DoIO-         This routine processes each of the I/O operations
;               required to write data to the file.

DoIO            proc    near
                assume  ds:ResidentSeg

                mov     NeedIO, 0FFh            ;A busy flag for us.

; The following Get Date DOS call may take a while, so turn the
; interrupts back on (we're clear of the critical section once we
; write 0FFh to NeedIO).

                sti
                call    PreserveDOS             ;Save DOS data.

                mov     ah, 2Ah                 ;Get Date DOS call
                int     21h
                mov     month, dh
                mov     day, dl
                mov     year, cx

                mov     ah, 2Ch                 ;Get Time DOS call
                int     21h
                mov     hour, ch
                mov     minute, cl
                mov     second, dh

                mov     ah, 40h                 ;DOS Write call
                mov     bx, FileHandle          ;Write data to this file.
                mov     cx, RecSize             ;This many bytes.
                mov     dx, offset month        ;Starting at this address.
                int     21h                     ;Ignore return errors (!).
                mov     ah, 68h                 ;DOS Commit call
                mov     bx, FileHandle          ;Write data to this file.
                int     21h                     ;Ignore return errors (!).

                mov     NeedIO, 0               ;Ready to start over.
                call    RestoreDOS

PhasesDone:     ret
DoIO            endp
                assume  ds:nothing
; MyInt2F-      Provides int 2Fh (multiplex interrupt) support for this
;               TSR. The multiplex interrupt recognizes the following
;               subfunctions (passed in AL):
;
;               00- Verify presence.    Returns 0FFh in AL and a pointer
;                                       to an ID string in es:di if the
;                                       TSR ID (in AH) matches this
;                                       particular TSR.
;
;               01- Remove.             Removes the TSR from memory.
;                                       Returns 0 in AL if successful,
;                                       1 in AL if failure.

MyInt2F         proc    far
                assume  ds:nothing

                cmp     ah, MyTSRID             ;Match our TSR identifier?
                je      YepItsOurs
                jmp     OldInt2F

; Okay, we know this is our ID, now check for a verify vs. remove call.

YepItsOurs:     cmp     al, 0                   ;Verify Call
                jne     TryRmv
                mov     al, 0ffh                ;Return success.
                lesi    IDString
                iret                            ;Return back to caller.

IDString        byte    "Keypress Logger TSR",0

TryRmv:         cmp     al, 1                   ;Remove call.
                jne     IllegalOp

                call    TstRmvable              ;See if we can remove this guy.
                je      CanRemove               ;Branch if we can.
                mov     ax, 1                   ;Return failure for now.
                iret

; Okay, they want to remove this guy *and* we can remove it from memory.
; Take care of all that here.

                assume  ds:ResidentSeg

CanRemove:      push    ds
                push    es
                pusha
                cli                             ;Turn off the interrupts while
                mov     ax, 0                   ; we mess with the interrupt
                mov     es, ax                  ; vectors.
                mov     ax, cs
                mov     ds, ax

                mov     ax, word ptr OldInt9
                mov     es:[9*4], ax
                mov     ax, word ptr OldInt9+2
                mov     es:[9*4 + 2], ax

                mov     ax, word ptr OldInt13
                mov     es:[13h*4], ax
                mov     ax, word ptr OldInt13+2
                mov     es:[13h*4 + 2], ax

                mov     ax, word ptr OldInt16
                mov     es:[16h*4], ax
                mov     ax, word ptr OldInt16+2
                mov     es:[16h*4 + 2], ax

                mov     ax, word ptr OldInt1C
                mov     es:[1Ch*4], ax
                mov     ax, word ptr OldInt1C+2
                mov     es:[1Ch*4 + 2], ax

                mov     ax, word ptr OldInt28
                mov     es:[28h*4], ax
                mov     ax, word ptr OldInt28+2
                mov     es:[28h*4 + 2], ax

                mov     ax, word ptr OldInt2F
                mov     es:[2Fh*4], ax
                mov     ax, word ptr OldInt2F+2
                mov     es:[2Fh*4 + 2], ax

; Okay, with that out of the way, let's close the file.
; Note: INT 2F shouldn't have to deal with DOS busy because it's
; a passive TSR call.

                mov     ah, 3Eh                 ;Close file command
                mov     bx, FileHandle
                int     21h

; Okay, one last thing before we quit- Let's give the memory allocated
; to this TSR back to DOS.

                mov     ds, PSP
                mov     es, ds:[2Ch]            ;Ptr to environment block.
                mov     ah, 49h                 ;DOS release memory call.
                int     21h

                mov     ax, ds                  ;Release program code space.
                mov     es, ax
                mov     ah, 49h
                int     21h

                popa
                pop     es
                pop     ds
                mov     ax, 0                   ;Return Success.
                iret

; They called us with an illegal subfunction value. Try to do as little
; damage as possible.

IllegalOp:      mov     ax, 0                   ;Who knows what they were thinking?
                iret
MyInt2F         endp
                assume  ds:nothing
; TstRmvable-   Checks to see if we can remove this TSR from memory.
;               Returns the zero flag set if we can remove it, clear
;               otherwise.

TstRmvable      proc    near
                cli
                push    ds
                mov     ax, 0
                mov     ds, ax

                cmp     word ptr ds:[9*4], offset MyInt9
                jne     TRDone
                cmp     word ptr ds:[9*4 + 2], seg MyInt9
                jne     TRDone

                cmp     word ptr ds:[13h*4], offset MyInt13
                jne     TRDone
                cmp     word ptr ds:[13h*4 + 2], seg MyInt13
                jne     TRDone

                cmp     word ptr ds:[16h*4], offset MyInt16
                jne     TRDone
                cmp     word ptr ds:[16h*4 + 2], seg MyInt16
                jne     TRDone

                cmp     word ptr ds:[1Ch*4], offset MyInt1C
                jne     TRDone
                cmp     word ptr ds:[1Ch*4 + 2], seg MyInt1C
                jne     TRDone

                cmp     word ptr ds:[28h*4], offset MyInt28
                jne     TRDone
                cmp     word ptr ds:[28h*4 + 2], seg MyInt28
                jne     TRDone

                cmp     word ptr ds:[2Fh*4], offset MyInt2F
                jne     TRDone
                cmp     word ptr ds:[2Fh*4 + 2], seg MyInt2F
TRDone:         pop     ds
                sti
                ret
TstRmvable      endp
ResidentSeg     ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:ResidentSeg
; SeeIfPresent- Checks to see if our TSR is already present in memory.
;               Sets the zero flag if it is, clears the zero flag if
;               it is not.

SeeIfPresent    proc    near
                push    es
                push    ds
                push    di
                mov     cx, 0ffh                ;Start with ID 0FFh.
IDLoop:         mov     ah, cl
                push    cx
                mov     al, 0                   ;Verify presence call.
                int     2Fh
                pop     cx
                cmp     al, 0                   ;Present in memory?
                je      TryNext
                strcmpl
                byte    "Keypress Logger TSR",0
                je      Success

TryNext:        dec     cl                      ;Test USER IDs of 80h..FFh
                js      IDLoop
                cmp     cx, 0                   ;Clear zero flag.
Success:        pop     di
                pop     ds
                pop     es
                ret
SeeIfPresent    endp

; FindID-       Determines the first (well, last actually) TSR ID available
;               in the multiplex interrupt chain. Returns this value in
;               the CL register.
;
;               Returns the zero flag set if it locates an empty slot.
;               Returns the zero flag clear if failure.

FindID          proc    near
                push    es
                push    ds
                push    di

                mov     cx, 0ffh                ;Start with ID 0FFh.
IDLoop:         mov     ah, cl
                push    cx
                mov     al, 0                   ;Verify presence call.
                int     2Fh
                pop     cx
                cmp     al, 0                   ;Present in memory?
                je      Success
                dec     cl                      ;Test USER IDs of 80h..FFh
                js      IDLoop
                xor     cx, cx
                cmp     cx, 1                   ;Clear zero flag
Success:        pop     di
                pop     ds
                pop     es
                ret
FindID          endp

Main            proc
                meminit

                mov     ax, ResidentSeg
                mov     ds, ax

                mov     ah, 62h                 ;Get this program's PSP
                int     21h                     ; value.
                mov     PSP, bx

; Before we do anything else, we need to check the command line
; parameters. We must have either a valid filename or the
; command "remove". If remove appears on the command line, then remove
; the resident copy from memory using the multiplex (2Fh) interrupt.
; If remove is not on the command line, we'd better have a filename and
; there had better not be a copy already loaded into memory.

                argc
                cmp     cx, 1                   ;Must have exactly 1 parm.
                je      GoodParmCnt
                print
                byte    "Usage:",cr,lf
                byte    " KeyEval filename",cr,lf
                byte    "or KeyEval REMOVE",cr,lf,0
                ExitPgm

; Check for the REMOVE command.

GoodParmCnt:    mov     ax, 1
                argv
                stricmpl
                byte    "REMOVE",0
                jne     TstPresent

                call    SeeIfPresent
                je      RemoveIt
                print
                byte    "TSR is not present in memory, cannot remove"
                byte    cr,lf,0
                ExitPgm

RemoveIt:       mov     MyTSRID, cl
                printf
                byte    "Removing TSR (ID #%d) from memory...",0
                dword   MyTSRID

                mov     ah, cl
                mov     al, 1                   ;Remove cmd, ah contains ID
                int     2Fh
                cmp     al, 1                   ;Succeed?
                je      RmvFailure
                print
                byte    "removed.",cr,lf,0
                ExitPgm

RmvFailure:     print
                byte    cr,lf
                byte    "Could not remove TSR from memory.",cr,lf
                byte    "Try removing other TSRs in the reverse order "
                byte    "you installed them.",cr,lf,0
                ExitPgm

; Okay, see if the TSR is already in memory. If so, abort the
; installation process.

TstPresent:     call    SeeIfPresent
                jne     GetTSRID
                print
                byte    "TSR is already present in memory.",cr,lf
                byte    "Aborting installation process",cr,lf,0
                ExitPgm

; Get an ID for our TSR and save it away.

GetTSRID:       call    FindID
                je      GetFileName
                print
                byte    "Too many resident TSRs, cannot install",cr,lf,0
                ExitPgm

; Things look cool so far, check the filename and open the file.

GetFileName:    mov     MyTSRID, cl
                printf
                byte    "Keypress logger TSR program",cr,lf
                byte    "TSR ID = %d",cr,lf
                byte    "Processing file:",0
                dword   MyTSRID

                puts
                putcr

                mov     ah, 3Ch                 ;Create file command.
                mov     cx, 0                   ;Normal file.
                push    ds
                push    es                      ;Point ds:dx at name
                pop     ds
                mov     dx, di
                int     21h                     ;Open the file
                jnc     GoodOpen
                print
                byte    "DOS error #",0
                puti
                print
                byte    " opening file.",cr,lf,0
                ExitPgm

GoodOpen:       pop     ds
                mov     FileHandle, ax          ;Save file handle.

InstallInts:    print
                byte    "Installing interrupts...",0

; Patch into the INT 9, 13h, 16h, 1Ch, 28h, and 2Fh interrupt vectors.
; Note that the statements above have made ResidentSeg the current data
; segment, so we can store the old values directly into
; the OldIntxx variables.

                cli                             ;Turn off interrupts!
                mov     ax, 0
                mov     es, ax
                mov     ax, es:[9*4]
                mov     word ptr OldInt9, ax
                mov     ax, es:[9*4 + 2]
                mov     word ptr OldInt9+2, ax
                mov     es:[9*4], offset MyInt9
                mov     es:[9*4+2], seg ResidentSeg

                mov     ax, es:[13h*4]
                mov     word ptr OldInt13, ax
                mov     ax, es:[13h*4 + 2]
                mov     word ptr OldInt13+2, ax
                mov     es:[13h*4], offset MyInt13
                mov     es:[13h*4+2], seg ResidentSeg

                mov     ax, es:[16h*4]
                mov     word ptr OldInt16, ax
                mov     ax, es:[16h*4 + 2]
                mov     word ptr OldInt16+2, ax
                mov     es:[16h*4], offset MyInt16
                mov     es:[16h*4+2], seg ResidentSeg

                mov     ax, es:[1Ch*4]
                mov     word ptr OldInt1C, ax
                mov     ax, es:[1Ch*4 + 2]
                mov     word ptr OldInt1C+2, ax
                mov     es:[1Ch*4], offset MyInt1C
                mov     es:[1Ch*4+2], seg ResidentSeg

                mov     ax, es:[28h*4]
                mov     word ptr OldInt28, ax
                mov     ax, es:[28h*4 + 2]
                mov     word ptr OldInt28+2, ax
                mov     es:[28h*4], offset MyInt28
                mov     es:[28h*4+2], seg ResidentSeg

                mov     ax, es:[2Fh*4]
                mov     word ptr OldInt2F, ax
                mov     ax, es:[2Fh*4 + 2]
                mov     word ptr OldInt2F+2, ax
                mov     es:[2Fh*4], offset MyInt2F
                mov     es:[2Fh*4+2], seg ResidentSeg
                sti                             ;Okay, ints back on.

; We're hooked up, the only thing that remains is to terminate and
; stay resident.

                print
                byte    "Installed.",cr,lf,0

                mov     dx, EndResident         ;Compute size of program.
                sub     dx, PSP
                mov     ax, 3100h               ;DOS TSR command.
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
The following is a short little application that reads the data file produced by the above program and produces a simple report of the date, time, and keystrokes:

; This program reads the file created by the KEYEVAL.EXE TSR program.
; It displays the log containing dates, times, and number of keystrokes.

                .xlist
                .286
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'

FileHandle      word    ?

month           byte    0
day             byte    0
year            word    0
hour            byte    0
minute          byte    0
second          byte    0
KeyStrokes      word    0
RecSize         =       $-month

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; SeeIfPresent- Checks to see if our TSR is present in memory.
;               Sets the zero flag if it is, clears the zero flag if
;               it is not.

SeeIfPresent    proc    near
                push    es
                push    ds
                pusha
                mov     cx, 0ffh                ;Start with ID 0FFh.
IDLoop:         mov     ah, cl
                push    cx
                mov     al, 0                   ;Verify presence call.
                int     2Fh
                pop     cx
                cmp     al, 0                   ;Present in memory?
                je      TryNext
                strcmpl
                byte    "Keypress Logger TSR",0
                je      Success

TryNext:        dec     cl                      ;Test USER IDs of 80h..FFh
                js      IDLoop
                cmp     cx, 0                   ;Clear zero flag.
Success:        popa
                pop     ds
                pop     es
                ret
SeeIfPresent    endp

Main            proc
                meminit

                mov     ax, dseg
                mov     ds, ax

                argc
                cmp     cx, 1                   ;Must have exactly 1 parm.
                je      GoodParmCnt
                print
                byte    "Usage:",cr,lf
                byte    " KEYRPT filename",cr,lf,0
                ExitPgm

GoodParmCnt:    mov     ax, 1
                argv

                print
                byte    "Keypress logger report program",cr,lf
                byte    "Processing file:",0
                puts
                putcr

                mov     ah, 3Dh                 ;Open file command.
                mov     al, 0                   ;Open for reading.
                push    ds
                push    es                      ;Point ds:dx at name
                pop     ds
                mov     dx, di
                int     21h                     ;Open the file
                jnc     GoodOpen
                print
                byte    "DOS error #",0
                puti
                print
                byte    " opening file.",cr,lf,0
                ExitPgm

GoodOpen:       pop     ds
                mov     FileHandle, ax          ;Save file handle.

; Okay, read the data and display it:

ReadLoop:       mov     ah, 3Fh                 ;Read file command
                mov     bx, FileHandle
                mov     cx, RecSize             ;Number of bytes.
                mov     dx, offset month        ;Place to put data.
                int     21h
                jc      ReadError
                test    ax, ax                  ;EOF?
                je      Quit

                mov     cx, year
                mov     dl, day
                mov     dh, month
                dtoam
                puts
                free
                print
                byte    ", ",0

                mov     ch, hour
                mov     cl, minute
                mov     dh, second
                mov     dl, 0
                ttoam
                puts
                free
                printf
                byte    ", keystrokes = %d\n",0
                dword   KeyStrokes
                jmp     ReadLoop

ReadError:      print
                byte    "Error reading file",cr,lf,0

Quit:           mov     bx, FileHandle
                mov     ah, 3Eh                 ;Close file
                int     21h
                ExitPgm

Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

18.9 Semiresident Programs

A semiresident program is one that temporarily loads itself into memory, executes another program (a child process), and then removes itself from memory after the child process terminates. Semiresident programs behave like resident programs while the child executes, but they do not stay in memory once the child terminates.

The main use for semiresident programs is to extend an existing application or patch an application⁶ (the child process). The nice thing about a semiresident program patch is that it does not have to modify the application’s “.EXE” file directly on the disk. If for some reason the patch fails, you haven’t destroyed the “.EXE” file, you’ve only wiped out the object code in memory.

6. Patching a program means to replace certain opcode bytes in the object file. Programmers apply patches to correct bugs or extend a product whose sources are not available.

A semiresident application, like a TSR, has a transient and a resident part. The resident part remains in memory while the child process executes. The transient part initializes the program and then transfers control to the resident part, which loads the child application over the transient portion. The transient code patches the interrupt vectors and does all the things a TSR does except it doesn’t issue the TSR command. Instead, the resident program loads the application into memory and transfers control to that program. When the application returns control to the resident program, it exits to DOS using the standard ExitPgm call (ah=4Ch).

While the application is running, the resident code behaves like any other TSR. Unless the child process is aware of the semiresident program, or the semiresident program patches interrupt vectors the application normally uses, the semiresident program will probably be an active resident program, patching into one or more of the hardware interrupts. Of course, all the rules that apply to active TSRs also apply to active semiresident programs.

The following is a very generic example of a semiresident program. This program, “RUN.ASM”, runs the application whose name and command line parameters appear as command line parameters to run. In other words:

        c:> run pgm.exe parm1 parm2 etc.

is equivalent to

        pgm parm1 parm2 etc.
Note that you must supply the “.EXE” or “.COM” extension to the program’s filename. This code begins by extracting the program’s filename and command line parameters from run’s command line. Run builds an exec structure (see “MS-DOS, PC-BIOS, and File I/O” on page 699) and then calls DOS to execute the program. On return, run fixes up the stack and returns to DOS.

; RUN.ASM - The barebones semiresident program.
;
;       Usage:
;               RUN <program.exe> <program's command line>
;         or    RUN <program.com> <program's command line>
;
; RUN executes the specified program with the supplied command line parameters.
; At first, this may seem like a stupid program. After all, why not just run
; the program directly from DOS and skip the RUN altogether? Actually, there
; is a good reason for RUN-- It lets you (by modifying the RUN source file)
; set up some environment prior to running the program and clean up that
; environment after the program terminates ("environment" in this sense does
; not necessarily refer to the MS-DOS ENVIRONMENT area).
;
; For example, I have used this program to switch the mode of a TSR prior to
; executing an EXE file and then I restored the operating mode of that TSR
; after the program terminated.
;
; In general, you should create a new version of RUN.EXE (and, presumably,
; give it a unique name) for each application you want to use this program
; with.
;
;
;----------------------------------------------------------------------------
;
;
; Put these segment definitions 1st because we want the Standard Library
; routines to load last in memory, so they wind up in the transient portion.

CSEG            segment para public 'CODE'
CSEG            ends
SSEG            segment para stack 'stack'
SSEG            ends
ZZZZZZSEG       segment para public 'zzzzzzseg'
ZZZZZZSEG       ends

; Includes for UCR Standard Library macros.

                include         consts.a
                include         stdin.a
                include         stdout.a
                include         misc.a
                include         memory.a
                include         strings.a

                includelib      stdlib.lib

CSEG            segment para public 'CODE'
                assume  cs:cseg, ds:cseg

; Variables used by this program.

; MS-DOS EXEC structure.

ExecStruct      dw      0                       ;Use parent's Environment blk.
                dd      CmdLine                 ;For the cmd ln parms.
                dd      DfltFCB
                dd      DfltFCB

DfltFCB         db      3," ",0,0,0,0,0
CmdLine         db      0, 0dh, 126 dup (" ")   ;Cmd line for program.
PgmName         dd      ?                       ;Points at pgm name.

Main            proc
                mov     ax, cseg                ;Get ptr to vars segment
                mov     ds, ax

                MemInit                         ;Start the memory mgr.

; If you want to do something before the execution of the command-line
; specified program, here is a good place to do it:

;       -------------------------------------

; Now let's fetch the program name, etc., from the command line and execute
; it.

                argc                            ;See how many cmd ln parms
                or      cx, cx                  ; we have.
                jz      Quit                    ;Just quit if no parameters.

                mov     ax, 1                   ;Get the first parm (pgm name)
                argv
                mov     word ptr PgmName, di    ;Save ptr to name
                mov     word ptr PgmName+2, es

; Okay, for each word on the command line after the filename, copy
; that word to CmdLine buffer and separate each word with a space,
; just like COMMAND.COM does with command line parameters it processes.

                lea     si, CmdLine+1           ;Index into cmdline.
ParmLoop:       dec     cx
                jz      ExecutePgm

                inc     ax                      ;Point at next parm.
                argv                            ;Get the next parm.

                push    ax
                mov     byte ptr [si], ' '      ;1st item and separator on ln.
                inc     CmdLine
                inc     si
CpyLp:          mov     al, es:[di]
                cmp     al, 0
                je      StrDone
                inc     CmdLine                 ;Increment byte cnt
                mov     ds:[si], al
                inc     si
                inc     di
                jmp     CpyLp

StrDone:        mov     byte ptr ds:[si], cr    ;In case this is the end.
                pop     ax                      ;Get current parm #
                jmp     ParmLoop

; Okay, we've built the MS-DOS execute structure and the necessary
; command line, now let's see about running the program.
; The first step is to free up all the memory that this program
; isn't using. That would be everything from zzzzzzseg on.

ExecutePgm:     mov     ah, 62h                 ;Get our PSP value
                int     21h
                mov     es, bx
                mov     ax, zzzzzzseg           ;Compute size of
                sub     ax, bx                  ; resident run code.
                mov     bx, ax
                mov     ah, 4ah                 ;Release unused memory.
                int     21h

; Warning! No Standard Library calls after this point. We've just
; released the memory that they're sitting in. So the program load
; we're about to do will wipe out the Standard Library code.

                mov     bx, seg ExecStruct
                mov     es, bx
                mov     bx, offset ExecStruct   ;Ptr to program record.
                lds     dx, PgmName
                mov     ax, 4b00h               ;Exec pgm
                int     21h

; When we get back, we can't count on *anything* being correct. First, fix
; the stack pointer and then we can finish up anything else that needs to
; be done.

                mov     ax, sseg
                mov     ss, ax
                mov     sp, offset EndStk
                mov     ax, seg cseg
                mov     ds, ax

; Okay, if you have any great deeds to do after the program, this is a
; good place to put such stuff.

;       -------------------------------------

; Return control to MS-DOS

Quit:           ExitPgm
Main            endp
cseg            ends

sseg            segment para stack 'stack'
                dw      128 dup (0)
endstk          dw      ?
sseg            ends

; Set aside some room for the heap.

zzzzzzseg       segment para public 'zzzzzzseg'
Heap            db      200h dup (?)
zzzzzzseg       ends
                end     Main
Since RUN.ASM is rather simple, perhaps a more complex example is in order. The following is a fully functional patch for the LucasArts game XWING. The motivation for this patch came about because of the annoyance of having to look up a password every time you play the game. This little patch searches for the code that calls the password routine and stores NOPs over that code in memory.

The operation of this code is a little different than that of RUN.ASM. The RUN program sends an execute command to DOS that runs the desired program. All system changes RUN needs to make must be made before or after the application executes. XWPATCH operates a little differently. It loads the XWING.EXE program into memory and searches for some specific code (the call to the password routine). Once it finds this code, it stores NOP instructions over the top of the call.

Unfortunately, life isn’t quite that simple. When XWING.EXE loads, the password code isn’t yet present in memory. XWING loads that code as an overlay later on. So the XWPATCH program finds something that XWING.EXE does load into memory right away: the joystick code. XWPATCH patches the joystick code so that any call to the joystick routine (when detecting or calibrating the joystick) produces a call to XWPATCH’s code that searches for the password code. Once XWPATCH locates and NOPs out the call to the password routine, it restores the code in the joystick routine. From that point forward, XWPATCH is simply taking up memory space; XWING will never call it again until XWING terminates.

; XWPATCH.ASM
;
;       Usage:
;               XWPATCH         - must be in same directory as XWING.EXE
;
; This program executes the XWING.EXE program and patches it to avoid
; having to enter the password every time you run it.
;
; This program is intended for educational purposes only.
; It is a demonstration of how to write a semiresident program.
; It is not intended as a device to allow the piracy of commercial software.
; Such use is illegal and is punishable by law.
;
; This software is offered without warranty or any expectation of
; correctness. Due to the dynamic nature of software design, programs
; that patch other programs may not work with slight changes in the
; patched program (XWING.EXE). USE THIS CODE AT YOUR OWN RISK.
;
;----------------------------------------------------------------------------

byp             textequ <byte ptr>
wp              textequ <word ptr>

; Put these segment definitions here so the UCR Standard Library will
; load after zzzzzzseg (in the transient section).

cseg            segment para public 'CODE'
cseg            ends

sseg            segment para stack 'STACK'
sseg            ends

zzzzzzseg       segment para public 'zzzzzzseg'
zzzzzzseg       ends

                .286
                include         stdlib.a
                includelib      stdlib.lib

CSEG            segment para public 'CODE'
                assume  cs:cseg, ds:nothing

; CountJSCalls-Number of times xwing calls the Joystick code before
; we patch out the password call.

CountJSCalls    dw      250

; PSP-          Program Segment Prefix. Needed to free up memory before
;               running the real application program.

PSP             dw      0

; Program Loading data structures (for DOS).

ExecStruct      dw      0                       ;Use parent's Environment blk.
                dd      CmdLine                 ;For the cmd ln parms.
                dd      DfltFCB
                dd      DfltFCB
LoadSSSP        dd      ?
LoadCSIP        dd      ?
PgmName         dd      Pgm

DfltFCB         db      3," ",0,0,0,0,0
CmdLine         db      2, " ", 0dh, 16 dup (" ")       ;Cmd line for program
Pgm             db      "XWING.EXE",0

;****************************************************************************
; XWPATCH begins here. This is the memory resident part. Only put code
; which has to be present at run-time or needs to be resident after
; freeing up memory.
;****************************************************************************

Main            proc
                mov     cs:PSP, ds
                mov     ax, cseg                ;Get ptr to vars segment
                mov     ds, ax

                mov     ax, zzzzzzseg
                mov     es, ax
                mov     cx, 1024/16
                meminit2

; Now, free up memory from ZZZZZZSEG on to make room for XWING.
; Note: Absolutely no calls to UCR Standard Library routines from
; this point forward! (ExitPgm is okay, it's just a macro which calls DOS.)
; Note that after the execution of this code, none of the code & data
; from zzzzzzseg on is valid.

                mov     bx, zzzzzzseg
                sub     bx, PSP
                inc     bx
                mov     es, PSP
                mov     ah, 4ah
                int     21h
                jnc     GoodRealloc

; Okay, I lied. Here's a StdLib call, but it's okay because we failed
; to load the application over the top of the standard library code.
; But from this point on, absolutely no more calls!

                print
                byte    "Memory allocation error."
                byte    cr,lf,0
                jmp     Quit

GoodRealloc:

; Now load the XWING program into memory:

                mov     bx, seg ExecStruct
                mov     es, bx
                mov     bx, offset ExecStruct   ;Ptr to program record.
                lds     dx, PgmName
                mov     ax, 4b01h               ;Load, do not exec, pgm
                int     21h
                jc      Quit                    ;If error loading file.

; Unfortunately, the password code gets loaded dynamically later on.
; So it's not anywhere in memory where we can search for it. But we
; do know that the joystick code is in memory, so we'll search for
; that code. Once we find it, we'll patch it so it calls our SearchPW
; routine. Note that you must use a joystick (and have one installed)
; for this patch to work properly.

                mov     si, zzzzzzseg
                mov     ds, si
                xor     si, si

                mov     di, cs
                mov     es, di
                mov     di, offset JoyStickCode
                mov     cx, JoyLength
                call    FindCode
                jc      Quit                    ;If didn't find joystick code.

; Patch the XWING joystick code here

                mov     byp ds:[si], 09ah       ;Far call
                mov     wp ds:[si+1], offset SearchPW
                mov     wp ds:[si+3], cs

; Okay, start the XWING.EXE program running

                mov     ah, 62h                 ;Get PSP
                int     21h
                mov     ds, bx
                mov     es, bx
                mov     wp ds:[10], offset Quit
                mov     wp ds:[12], cs
                mov     ss, wp cseg:LoadSSSP+2
                mov     sp, wp cseg:LoadSSSP
                jmp     dword ptr cseg:LoadCSIP

Quit:           ExitPgm
Main            endp

; SearchPW gets called from XWING when it attempts to calibrate the
; joystick. We'll let XWING call the joystick several hundred times before
; we actually search for the password code. The reason we do this is
; because XWING calls the joystick code early on to test for the presence
; of a joystick. Once we get into the calibration code, however, it calls
; the joystick code repetitively, so a few hundred calls doesn't take very
; long to expire. Once we're in the calibration code, the password code
; has been loaded into memory, so we can search for it then.

SearchPW        proc    far
                cmp     cs:CountJSCalls, 0
                je      DoSearch
                dec     cs:CountJSCalls
                sti                             ;Code we stole from xwing for
                neg     bx                      ; the patch.
                neg     di
                ret

; Okay, search for the password code.

DoSearch:       push    bp
                mov     bp, sp
                push    ds
                push    es
                pusha

; Search for the password code in memory:

                mov     si, zzzzzzseg
                mov     ds, si
                xor     si, si

                mov     di, cs
                mov     es, di
                mov     di, offset PasswordCode
                mov     cx, PWLength
                call    FindCode
                jc      NotThere                ;If didn't find pw code.

; Patch the XWING password code here. Just store NOPs over the five
; bytes of the far call to the password routine.

                mov     byp ds:[si+11], 090h    ;NOP out a far call
                mov     byp ds:[si+12], 090h
                mov     byp ds:[si+13], 090h
                mov     byp ds:[si+14], 090h
                mov     byp ds:[si+15], 090h

; Adjust the return address and restore the patched joystick code so
; that it doesn't bother jumping to us anymore.

NotThere:       sub     word ptr [bp+2], 5      ;Back up return address.
                les     bx, [bp+2]              ;Fetch return address.

; Store the original joystick code over the call we patched to this
; routine.

                mov     ax, word ptr JoyStickCode
                mov     es:[bx], ax
                mov     ax, word ptr JoyStickCode+2
                mov     es:[bx+2], ax
                mov     al, byte ptr JoyStickCode+4
                mov     es:[bx+4], al

                popa
                pop     es
                pop     ds
                pop     bp
                ret
SearchPW        endp

;****************************************************************************
;
; FindCode: On entry, ES:DI points at some code in *this* program which
;           appears in the XWING game. DS:SI points at a block of memory
;           in the XWING game. FindCode searches through memory to find the
;           suspect piece of code and returns DS:SI pointing at the start of
;           that code. This code assumes that it *will* find the code!
;           It returns the carry clear if it finds it, set if it doesn't.

FindCode        proc    near
                push    ax
                push    bx
                push    dx

DoCmp:          mov     dx, 1000h               ;Search in 4K blocks.
CmpLoop:        push    di                      ;Save ptr to compare code.
                push    si                      ;Save ptr to start of string.
                push    cx                      ;Save count.
        repe    cmpsb
                pop     cx
                pop     si
                pop     di
                je      FoundCode
                inc     si
                dec     dx
                jne     CmpLoop
                sub     si, 1000h
                mov     ax, ds
                inc     ah
                mov     ds, ax
                cmp     ax, 9000h               ;Stop at address 9000:0
                jb      DoCmp                   ; and fail if not found.

                pop     dx
                pop     bx
                pop     ax
                stc
                ret

FoundCode:      pop     dx
                pop     bx
                pop     ax
                clc
                ret
FindCode        endp

;****************************************************************************
;
; Call to password code that appears in the XWING game. This is actually
; data that we're going to search for in the XWING object code.

PasswordCode    proc    near
                call    $+47h
                mov     [bp-4], ax
                mov     [bp-2], dx
                push    dx
                push    ax
                byte    9ah, 04h, 00
PasswordCode    endp
EndPW:

PWLength        =       EndPW-PasswordCode

; The following is the joystick code we're going to search for.

JoyStickCode    proc    near
                sti
                neg     bx
                neg     di
                pop     bp
                pop     dx
                pop     cx
                ret
                mov     bp, bx
                in      al, dx
                mov     bl, al
                not     al
                and     al, ah
                jnz     $+11h
                in      al, dx
JoyStickCode    endp
EndJSC:

JoyLength       =       EndJSC-JoyStickCode
cseg            ends

sseg            segment para stack 'STACK'
                dw      256 dup (0)
endstk          dw      ?
sseg            ends

zzzzzzseg       segment para public 'zzzzzzseg'
Heap            db      1024 dup (0)
zzzzzzseg       ends
                end     Main
18.10 Summary

Resident programs provide a small amount of multitasking to DOS’ single tasking world. DOS provides support for resident programs through a rudimentary memory management system. When an application issues the terminate and stay resident call, DOS adjusts its memory pointers so the memory space reserved by the TSR code is protected from future program loading operations. For more information on how this process works, see

• “DOS Memory Usage and TSRs” on page 1025
TSRs come in two basic forms: active and passive. Passive TSRs are not self-activating. A foreground application must call a routine in a passive TSR to activate it. Generally, an application interfaces to a passive TSR using the 80x86 trap mechanism (software interrupts). Active TSRs, on the other hand, do not rely on the foreground application for activation. Instead, they attach themselves to a hardware interrupt that activates them independently of the foreground process. For more information, see

• “Active vs. Passive TSRs” on page 1029
The nature of an active TSR introduces many compatibility problems. The primary problem is that an active TSR might want to call a DOS or BIOS routine after having just interrupted either of these systems. This creates problems because DOS and BIOS are not reentrant. Fortunately, MS-DOS provides some hooks that give active TSRs the ability to schedule DOS calls with DOS is inactive. Although the BIOS routines do not provide this same facility, it is easy to add a wrapper around a BIOS call to let you schedule calls appropriately. One additional problem with DOS is that an active TSR might disturb some global variable in use by the foreground process. Fortunately, DOS lets the TSR save and restore these values, preventing some nasty compatibility problems. For details, see • • • • •
“Reentrancy” on page 1032 “Reentrancy Problems with DOS” on page 1032 “Reentrancy Problems with BIOS” on page 1033 “Reentrancy Problems with Other Code” on page 1034 “Other DOS Related Issues” on page 1039
MS-DOS provides a special interrupt to coordinate communication between TSRs and other applications. The multiplex interrupt lets you easily check for the presence of a TSR in memory, remove a TSR from memory, or pass various information between the TSR and an active application. For more information, see •
“The Multiplex Interrupt (INT 2Fh)” on page 1034
Well written TSRs follow stringent rules. In particular, a good TSR follows certain conventions during installation and always provide the user with a safe removal mechanism that frees all memory in use by the TSR. In those rare cases where a TSR cannot remove itself, it always reports an appropriate error and instructs the user how to solve the problem. For more information on load and removing TSRs, see • • •
“Installing a TSR” on page 1035 “Removing a TSR” on page 1037 “A Keyboard Monitor TSR” on page 1041
A semiresident routine is one that is resident during the execution of some specific program. It automatically unloads itself when that application terminates. Semiresident applications find application as program patchers and “time-release TSRs.” For more information on semiresident programs, see •
Page 1064
“Semiresident Programs” on page 1055
Processes, Coroutines, and Concurrency
Chapter 19
When most people speak of multitasking, they usually mean the ability to run several different application programs concurrently on one machine. Given the structure of the original 80x86 chips and MS-DOS’ software design, this is very difficult to achieve when running DOS. Look at how long it’s taken Microsoft to get Windows to multitask as well as it does. Given the problems large companies like Microsoft have had trying to get multitasking to work, you might think that it is a very difficult thing to manage. However, this isn’t true. Microsoft has problems trying to make different applications that are unaware of one another work harmoniously together. Quite frankly, they have not succeeded in getting existing DOS applications to multitask well. Instead, they’ve been working on developers to write new programs that work well under Windows. Multitasking is not trivial, but it is not that difficult when you write an application with multitasking specifically in mind. You can even write programs that multitask under DOS if you take a few precautions. In this chapter, we will discuss the concept of a DOS process, a coroutine, and a general process.
19.1 DOS Processes

Although MS-DOS is a single tasking operating system, this does not mean there can only be one program at a time in memory. Indeed, the whole purpose of the previous chapter was to describe how to get two or more programs operating in memory at one time. However, even if we ignore TSRs for the time being, you can still load several programs into memory at one time under DOS. The only catch is that DOS only provides the ability for them to run one at a time in a very specific fashion. Unless the processes are cooperating, their execution profile follows a very strict pattern.
19.1.1 Child Processes in DOS

When a DOS application is running, it can load and execute some other program using the DOS EXEC function (see “MS-DOS, PC-BIOS, and File I/O” on page 699). Under normal circumstances, when an application (the parent) runs a second program (the child), the child process executes to completion and then returns to the parent. This is very much like a procedure call, except it is a little more difficult to pass parameters between the two. MS-DOS provides several functions you can use to load and execute program code, terminate processes, and obtain the exit status for a process. The following table lists many of these operations.
Table 67: DOS Character Oriented Functions

Function # (AH)  Input Parameters                        Output Parameters    Description
---------------  --------------------------------------  -------------------  ------------------------------
4Bh              al - 0                                  ax - error code if   Load and execute program
                 ds:dx - pointer to program name.        carry set.
                 es:bx - pointer to LOADEXEC structure.
4Bh              al - 1                                  ax - error code if   Load program
                 ds:dx - pointer to program name.        carry set.
                 es:bx - pointer to LOAD structure.
4Bh              al - 3                                  ax - error code if   Load overlay
                 ds:dx - pointer to program name.        carry set.
                 es:bx - pointer to OVERLAY structure.
4Ch              al - process return code                                     Terminate execution
4Dh                                                      al - return value    Get child process return value
                                                         ah - termination
                                                         method.
19.1.1.1 Load and Execute

The “load and execute” call requires two parameters. The first, in ds:dx, is a pointer to a zero terminated string containing the pathname of the program to execute. This must be a “.COM” or “.EXE” file and the string must contain the program name’s extension. The second parameter, in es:bx, is a pointer to a LOADEXEC data structure. This data structure takes the following form:

LOADEXEC        struct
EnvPtr          word    ?       ;Pointer to environment area
CmdLinePtr      dword   ?       ;Pointer to command line
FCB1            dword   ?       ;Pointer to default FCB1
FCB2            dword   ?       ;Pointer to default FCB2
LOADEXEC        ends
EnvPtr is the segment address of the DOS environment block created for the new application. If this field contains a zero, DOS creates a copy of the current process’ environment block for the child process. If the program you are running does not access the environment block, you can save several hundred bytes to a few kilobytes by pointing the environment pointer field at a string of four zeros.
The CmdLinePtr field contains the address of the command line to supply to the program. DOS will copy this command line to offset 80h in the new PSP it creates for the child process. A valid command line consists of a byte containing a character count, at least one space, any characters belonging to the command line, and a terminating carriage return character (0Dh). The first byte should contain the length of the ASCII characters in the command line, not including the carriage return. If this byte contains zero, then the second byte of the command line should be the carriage return, not a space. Example:

MyCmdLine       byte    12, " file1 file2",cr
The FCB1 and FCB2 fields need to point at the two default file control blocks for this program. FCBs became obsolete with DOS 2.0, but Microsoft has kept FCBs around for compatibility anyway. For most programs you can point both of these fields at the following string of bytes:

DfltFCB         byte    3," ",0,0,0,0,0
The load and execute call will fail if there is insufficient memory to load the child process. When you create an “.EXE” file using MASM, it creates an executable file that grabs all available memory by default. As a result, no memory is available for the child process and DOS will always return an error, so you must readjust the memory allocation for the parent process before attempting to run the child process. The section “Semiresident Programs” on page 1055 describes how to do this.

There are other possible errors as well. For example, DOS might not be able to locate the program name you specify with the zero terminated string. Or, perhaps, there are too many open files and DOS doesn’t have a free buffer available for the file I/O. If an error occurs, DOS returns with the carry flag set and an appropriate error code in the ax register.

The following example program executes the “COMMAND.COM” program, allowing a user to execute DOS commands from inside your application. When the user types “exit” at the DOS command line, DOS returns control to your program.

; RUNDOS.ASM - Demonstrates how to invoke a copy of the COMMAND.COM
;              DOS command line interpreter from your programs.

                include         stdlib.a
                includelib      stdlib.lib

dseg            segment para public 'data'

; MS-DOS EXEC structure.

ExecStruct      word    0               ;Use parent's Environment blk.
                dword   CmdLine         ;For the cmd ln parms.
                dword   DfltFCB
                dword   DfltFCB

DfltFCB         byte    3," ",0,0,0,0,0
CmdLine         byte    0, 0dh          ;Cmd line for program.
PgmName         dword   filename        ;Points at pgm name.

filename        byte    "c:\command.com",0

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

Main            proc
                mov     ax, dseg        ;Get ptr to vars segment
                mov     ds, ax

                MemInit                 ;Start the memory mgr.

; Okay, we've built the MS-DOS execute structure and the necessary
; command line, now let's see about running the program.
; The first step is to free up all the memory that this program
; isn't using. That would be everything from zzzzzzseg on.
;
; Note: unlike some previous examples in other chapters, it is okay
; to call Standard Library routines in this program after freeing
; up memory. The difference here is that the Standard Library
; routines are loaded early in memory and we haven't freed up the
; storage they are sitting in.

                mov     ah, 62h         ;Get our PSP value
                int     21h
                mov     es, bx
                mov     ax, zzzzzzseg   ;Compute size of
                sub     ax, bx          ; resident run code.
                mov     bx, ax
                mov     ah, 4ah         ;Release unused memory.
                int     21h

; Tell the user what is going on:

                print
                byte    cr,lf
                byte    "RUNDOS- Executing a copy of command.com",cr,lf
                byte    "Type 'EXIT' to return control to RUN.ASM",cr,lf
                byte    0

; Warning! No Standard Library calls after this point. We've just
; released the memory that they're sitting in. So the program load
; we're about to do will wipe out the Standard Library code.

                mov     bx, seg ExecStruct
                mov     es, bx
                mov     bx, offset ExecStruct   ;Ptr to program record.
                lds     dx, PgmName
                mov     ax, 4b00h               ;Exec pgm
                int     21h

; In MS-DOS 6.0 the following code isn't required. But in various older
; versions of MS-DOS, the stack is messed up at this point. Just to be
; safe, let's reset the stack pointer to a decent place in memory.
; Note that this code preserves the carry flag and the value in the
; AX register so we can test for a DOS error condition when we are done
; fixing the stack.

                mov     bx, sseg
                mov     ss, bx
                mov     sp, offset EndStk
                mov     bx, seg dseg
                mov     ds, bx

; Test for a DOS error:

                jnc     GoodCommand
                print
                byte    "DOS error #",0
                puti
                print
                byte    " while attempting to run COMMAND.COM",cr,lf
                byte    0
                jmp     Quit

; Print a welcome back message.

GoodCommand:    print
                byte    "Welcome back to RUNDOS. Hope you had fun.",cr,lf
                byte    "Now returning to MS-DOS' version of COMMAND.COM."
                byte    cr,lf,lf,0

; Return control to MS-DOS

Quit:           ExitPgm
Main            endp
cseg            ends

sseg            segment para stack 'stack'
                dw      128 dup (0)
endstk          dw      ?
sseg            ends

zzzzzzseg       segment para public 'zzzzzzseg'
Heap            db      200h dup (?)
zzzzzzseg       ends
                end     Main
19.1.1.2 Load Program

The load and execute function gives the parent process very little control over the child process. Unless the child communicates with the parent process via a trap or interrupt, DOS suspends the parent process until the child terminates. In many cases the parent program may want to load the application code and then execute some additional operations before the child process takes over. Semiresident programs, appearing in the previous chapter, provide a good example. The DOS “load program” function provides this capability; it will load a program from the disk and return control back to the parent process. The parent process can do whatever it feels is appropriate before passing control to the child process.

The load program call requires parameters that are very similar to the load and execute call. Indeed, the only difference is the use of the LOAD structure rather than the LOADEXEC structure, and even these structures are very similar to one another. The LOAD data structure includes two extra fields not present in the LOADEXEC structure:

LOAD            struct
EnvPtr          word    ?       ;Pointer to environment area.
CmdLinePtr      dword   ?       ;Pointer to command line.
FCB1            dword   ?       ;Pointer to default FCB1.
FCB2            dword   ?       ;Pointer to default FCB2.
SSSP            dword   ?       ;SS:SP value for child process.
CSIP            dword   ?       ;Initial program starting point.
LOAD            ends
The LOAD command is useful for many purposes. Of course, this function provides the primary vehicle for creating semiresident programs; however, it is also quite useful for providing extra error recovery, redirecting application I/O, and loading several executable processes into memory for concurrent execution.

After you load a program using the DOS load command, you can obtain the PSP address for that program by issuing the DOS get PSP address call (see “MS-DOS, PC-BIOS, and File I/O” on page 699). This would allow the parent process to modify any values appearing in the child process’ PSP prior to its execution. DOS stores the termination address for a process in the PSP. This termination address normally appears in the double word at offset 0Ah in the PSP. If you do not change this location, the program will return to the first instruction beyond the int 21h instruction for the load function. Therefore, before actually transferring control to the user application, you should change this termination address.
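The sequence just described might look roughly like the following. This is only a sketch in the style of the listings above, not code from DOS or the Standard Library: LoadStruct (a variable laid out like the LOAD structure), PgmName (a dword pointer to the pathname), LoadError, and ChildDone (the place in the parent where the child should return) are all hypothetical names. It patches the termination address slot at offset 0Ah of the PSP, which is the location the terminate call later restores the int 22h vector from.

```asm
; Sketch only: LoadStruct, PgmName, LoadError, and ChildDone are
; hypothetical names supplied by the parent program.

                mov     bx, seg LoadStruct
                mov     es, bx
                mov     bx, offset LoadStruct   ;ES:BX -> LOAD structure.
                lds     dx, PgmName             ;DS:DX -> pathname.
                mov     ax, 4b01h               ;Load, but do not execute.
                int     21h
                jc      LoadError               ;AX = DOS error code.

                mov     ah, 62h                 ;Get PSP address call;
                int     21h                     ; BX = segment of the PSP.
                mov     es, bx

; Point the termination address in the child's PSP at ChildDone.

                mov     word ptr es:[0ah], offset ChildDone
                mov     word ptr es:[0ch], cs
```

Note that the lds instruction changes ds, so the parent would need to reload ds before touching its own variables again.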
19.1.1.3 Loading Overlays

Many programs contain blocks of code that are independent of one another; that is, while routines in one block of code execute, the program will not call routines in the other independent blocks of code. For example, a modern game may contain some initialization code, a “staging area” where the user chooses certain options, an “action area” where the user plays the game, and a “debriefing area” that goes over the player’s actions. When running in a 640K MS-DOS machine, all this code may not fit into available memory at the same time. To overcome this memory limitation, most large programs use overlays. An overlay is a portion of the program code that shares memory for its code with other code modules. The DOS load overlay function provides support for large programs that need to use overlays.

Like the load and load/execute functions, the load overlay function expects a pointer to the code file’s pathname in the ds:dx register pair and the address of a data structure in the es:bx register pair. This overlay data structure has the following format:

overlay         struct
StartSeg        word    ?
RelocFactor     word    0
overlay         ends

The StartSeg field contains the segment address where you want DOS to load the program. The RelocFactor field contains a relocation factor. This value should be zero unless you want the starting offset of the segment to be something other than zero.
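A load overlay call could be set up along these lines. This is a hedged sketch rather than a tested routine: OvlParms, OvlName, OverlaySeg (a segment the program has set aside for overlays), and OvlError are hypothetical names, the file name is invented for illustration, and the code assumes ds already points at the segment holding OvlParms and OvlName.

```asm
; Sketch only: OvlParms, OvlName, OverlaySeg, and OvlError are
; hypothetical names. Assumes DS points at the data segment.

OvlParms        word    2 dup (?)               ;StartSeg, RelocFactor.
OvlName         byte    "stage.ovl",0           ;Made-up overlay file name.
                 .
                 .
                 .
                mov     ax, OverlaySeg          ;Where to load the overlay.
                mov     OvlParms, ax            ;StartSeg field.
                mov     OvlParms+2, 0           ;RelocFactor field.

                mov     ax, ds
                mov     es, ax
                mov     bx, offset OvlParms     ;ES:BX -> overlay structure.
                mov     dx, offset OvlName      ;DS:DX -> pathname.
                mov     ax, 4b03h               ;Load overlay.
                int     21h
                jc      OvlError                ;AX = DOS error code.
```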
19.1.1.4 Terminating a Process

The process termination function is nothing new to you by now; you’ve used this function over and over again if you have written any assembly language programs and run them under DOS (the Standard Library ExitPgm macro executes this command). In this section we’ll look at exactly what the terminate process function call does.

First of all, the terminate process function gives you the ability to pass a single byte termination code back to the parent process. Whatever value you pass in al to the terminate call becomes the return, or termination, code. The parent process can test this value using the Get Child Process Return Value call (see the next section). You can also test this return value in a DOS batch file using the “if errorlevel” statement.

The terminate process command does the following:

• Flushes file buffers and closes files.
• Restores the termination address (int 22h) from offset 0Ah in the PSP (this is the return address of the process).
• Restores the address of the Break handler (int 23h) from offset 0Eh in the PSP (see “Exception Handling in DOS: The Break Handler” on page 1070).
• Restores the address of the critical error handler (int 24h) from offset 12h in the PSP (see “Exception Handling in DOS: The Critical Error Handler” on page 1071).
• Deallocates any memory held by the process.

Unless you really know what you’re doing, you should not change the values at offsets 0Ah, 0Eh, or 12h in the PSP. By doing so you could produce an inconsistent system when your program terminates.
19.1.1.5 Obtaining the Child Process Return Code

A parent process can obtain the return code from a child process by making the DOS Get Child Process Return Code function call (ah=4Dh). This call returns, in the al register, the termination code the child passed in al when it terminated. It also returns the cause of termination in the ah register. The ah register will contain one of the following values:
Table 68: Termination Cause

Value in AH   Reason for Termination
-----------   ------------------------------------
0             Normal termination (int 21h, ah=4Ch)
1             Terminated by ctrl-C
2             Terminated by critical error
3             TSR termination (int 21h, ah=31h)
The termination code appearing in al is valid only for normal and TSR terminations. Note that you can only call this routine once after a child process terminates. MS-DOS returns meaningless values in AX after the first such call. Likewise, if you use this function without running a child process, the results you obtain will be meaningless. DOS does not return an error if you do this.
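As a small illustration, a parent might read and test the result right after the child returns. This is a minimal sketch: ChildCode and ChildCause (byte variables) and the Abnormal label are hypothetical names, not part of DOS or the Standard Library.

```asm
; Sketch only: ChildCode, ChildCause, and Abnormal are hypothetical names.

                mov     ah, 4dh         ;Get child process return code.
                int     21h
                mov     ChildCode, al   ;Value the child passed on exit.
                mov     ChildCause, ah  ;Cause of termination (Table 68).
                cmp     ah, 0           ;Anything but a normal exit?
                jne     Abnormal
```

Saving both registers immediately matters because, as noted above, a second call returns meaningless values.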
19.1.2 Exception Handling in DOS: The Break Handler

Whenever the user presses ctrl-C or ctrl-Break, MS-DOS may trap the key sequence and execute an int 23h instruction. (MS-DOS always executes an int 23h instruction if it is processing a function code in the range 1-0Ch. For other DOS functions, MS-DOS only executes int 23h if the Break flag is set.) MS-DOS provides a default break handler routine that terminates the program. However, a well-written program generally replaces the default break handler with one of its own so it can capture ctrl-C or ctrl-break key sequences and shut the program down in an orderly fashion.

When DOS terminates a program due to a break interrupt, it flushes file buffers, closes all open files, releases memory belonging to the application, and does all the other normal things it does on program termination. However, it does not restore any interrupt vectors (other than interrupt 23h and interrupt 24h). If your code has replaced any interrupt vectors, especially hardware interrupt vectors, then those vectors will still be pointing at your program’s interrupt service routines after DOS terminates your program. This will probably crash the system when DOS loads a new program over the top of your code. Therefore, you should write a break handler so your application can shut itself down in an orderly fashion if the user presses ctrl-C or ctrl-break.

The easiest, and perhaps most universal, break handler consists of a single instruction: iret. If you point the interrupt 23h vector at an iret instruction, MS-DOS will simply ignore any ctrl-C or ctrl-break keys you press. This is very useful for turning off the break handling during critical sections of code that you do not want the user to interrupt.

On the other hand, simply turning off ctrl-C and ctrl-break handling throughout your entire program is not satisfactory either. If for some reason the user wants to abort your program, pressing ctrl-break or ctrl-C is what they will probably try. If your program disallows this, the user may resort to something more drastic like ctrl-alt-delete to reset the machine. This will certainly mess up any open files and may cause other problems as well (of course, you don’t have to worry about restoring any interrupt vectors!).

To patch in your own break handler is easy: just store the address of your break handler routine into the interrupt vector 23h. You don’t even have to save the old value; DOS does this for you automatically (it stores the original vector at offset 0Eh in the PSP). Then, when the user presses ctrl-C or ctrl-break, MS-DOS transfers control to your break handler.

Perhaps the best response for a break handler is to set some flag to tell the application that a break occurred, and then leave it up to the application to test this flag at reasonable points to determine if it should shut down. Of course, this does require that you test this flag at various points throughout your application, increasing the complexity of your code. Another alternative is to save the original int 23h vector and transfer control to DOS’ break handler after you handle important operations yourself. You can also write a specialized break handler to return a DOS termination code that the parent process can read.

Of course, there is no reason you cannot change the interrupt 23h vector at various points throughout your program to handle changing requirements. At various points you can disable the break interrupt entirely, restore interrupt vectors at others, or prompt the user at still other points.
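The flag-setting approach could be sketched as follows. This is an illustration under stated assumptions, not the book's code: BreakFlag, MyInt23, and InstallBreak are hypothetical names, the flag lives in the code segment so the handler can reach it through cs, and the vector is installed with the DOS set interrupt vector call (int 21h, ah=25h) rather than by writing to the vector table directly.

```asm
; Sketch only: BreakFlag, MyInt23, and InstallBreak are hypothetical names.

BreakFlag       byte    0               ;Nonzero once a break occurs.

MyInt23         proc    far
                mov     cs:BreakFlag, 1 ;Note the break for the main program
                iret                    ; and let DOS ignore it.
MyInt23         endp

; Install the handler with the DOS set interrupt vector call.

InstallBreak    proc    near
                push    ds
                push    cs
                pop     ds
                mov     dx, offset MyInt23      ;DS:DX -> new handler.
                mov     ax, 2523h               ;AH=25h, AL=interrupt 23h.
                int     21h
                pop     ds
                ret
InstallBreak    endp
```

The main program would then test BreakFlag at convenient points and shut down cleanly when it finds the flag set.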
19.1.3 Exception Handling in DOS: The Critical Error Handler

DOS invokes the critical error handler by executing an int 24h instruction whenever some sort of I/O error occurs. The default handler prints the familiar message:

I/O Device Specific Error Message
Abort, Retry, Ignore, Fail?

If the user presses an “A”, this code immediately returns to DOS’ COMMAND.COM program; it doesn’t even close any open files. If the user presses an “R” to retry, MS-DOS will retry the I/O operation, though this usually results in another call to the critical error handler. The “I” option tells MS-DOS to ignore the error and return to the calling program as though nothing had happened. An “F” response instructs MS-DOS to return an error code to the calling program and let it handle the problem.

Of the above options, having the user press “A” is the most dangerous. This causes an immediate return to DOS and your code does not get the chance to clean up anything. For example, if you’ve patched some interrupt vectors, your program will not get the opportunity to restore them if the user selects the abort option. This may crash the system when MS-DOS loads the next program over the top of your interrupt service routine(s) in memory.

To intercept DOS critical errors, you will need to patch the interrupt 24h vector to point at your own interrupt service routine. Upon entry into your interrupt 24h service routine, the stack will contain the following data:

Stack contents                        Description
------------------------------------  -----------------------------------------------------
Flags, CS, IP                         Original INT 24h return address
ES, DS, BP, DI, SI, DX, CX, BX, AX    Registers DOS pushes for your INT 24h handler
Flags, CS, IP                         INT 24h return address (back to DOS) for your handler

Stack Contents Upon Entry to a Critical Error Handler

MS-DOS passes important information in several of the registers to your critical error handler. By inspecting these values you can determine the cause of the critical error and the device on which it occurred. The high order bit of the ah register determines if the error occurred on a block structured device (typically a disk or tape) or a character device. The other bits in ah have the following meaning:
Table 69: Device Error Bits in AH

Bit(s)  Description
------  ------------------------------------
0       0=Read operation.
        1=Write operation.
1-2     Indicates affected disk area.
        00- MS-DOS area.
        01- File allocation table (FAT).
        10- Root directory.
        11- Files area.
3       0- Fail response not allowed.
        1- Fail response is okay.
4       0- Retry response not allowed.
        1- Retry response is okay.
5       0- Ignore response is not allowed.
        1- Ignore response is okay.
6       Undefined
7       0- Block structured device error.
        1- Character device error.
In addition to the bits in ah, for block structured devices the al register contains the drive number where the error occurred (0=A, 1=B, 2=C, etc.). The value in the al register is undefined for character devices.
The lower half of the di register contains additional information about the block device error. The upper byte of di is undefined, so you will need to mask out those bits before attempting to test this data.
Table 70: Block Structured Device Error Codes (in L.O. byte of DI)

Error Code  Description
----------  ----------------------------------------
0           Write protection error.
1           Unknown drive.
2           Drive not ready.
3           Invalid command.
4           Data error (CRC error).
5           Length of request structure is incorrect.
6           Seek error on device.
7           Disk is not formatted for MS-DOS.
8           Sector not found.
9           Printer out of paper.
0Ah         Write error.
0Bh         Read error.
0Ch         General failure.
0Fh         Disk was changed at inappropriate time.
Upon entry to your critical error handler, interrupts are turned off. Because this error occurs as a result of some MS-DOS call, MS-DOS is already entered and you will not be able to make any calls other than functions 1-0Ch and 59h (get extended error information). Your critical error handler must preserve all registers except al. The handler must return to DOS with an iret instruction and al must contain one of the following codes:
Table 71: Critical Error Handler Return Codes

Code  Meaning
----  --------------------------
0     Ignore device error.
1     Retry I/O operation again.
2     Terminate process (abort).
3     Fail current system call.
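Before the full example, here is about the smallest handler that follows these rules. It is a sketch under stated assumptions rather than a recommended policy: QuietInt24 is a hypothetical name, and the handler never prompts the user. It asks DOS to fail the call when bit 3 of ah says a fail response is allowed, and otherwise asks DOS to terminate the process.

```asm
; Sketch only: QuietInt24 is a hypothetical name. It relies on the
; AH bits in Table 69 and the return codes in Table 71.

QuietInt24      proc    far
                mov     al, 3           ;Assume "fail current system call".
                test    ah, 08h         ;Bit 3 set: fail response is okay.
                jnz     Done24
                mov     al, 2           ;Otherwise terminate the process.
Done24:         iret
QuietInt24      endp
```

The routine changes only al, and the iret restores the flags that the test instruction modified, so it meets the requirement to preserve every register except al.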
The following code provides a trivial example of a critical error handler. The main program attempts to send a character to the printer. If you do not connect a printer, or turn off the printer before running this program, it will generate the critical error.

; Sample INT 24h critical error handler.
;
; This code demonstrates a sample critical error handler.
; It patches into INT 24h and displays an appropriate error
; message and asks the user if they want to retry, abort, ignore,
; or fail (just like DOS).

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'

Value           word    0
ErrCode         word    0

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; A replacement critical error handler. Note that this routine
; is even worse than DOS', but it demonstrates how to write
; such a routine. Note that we cannot call any Standard Library
; I/O routines in the critical error handler because they do not
; use DOS calls 1-0Ch, which are the only allowable DOS calls at
; this point.

CritErrMsg      byte    cr,lf
                byte    "DOS Critical Error!",cr,lf
                byte    "A)bort, R)etry, I)gnore, F)ail? $"

MyInt24         proc    far
                push    dx
                push    ds
                push    ax

                push    cs
                pop     ds
Int24Lp:        lea     dx, CritErrMsg
                mov     ah, 9           ;DOS print string call.
                int     21h

                mov     ah, 1           ;DOS read character call.
                int     21h
                and     al, 5Fh         ;Convert l.c. -> u.c.

                cmp     al, 'I'         ;Ignore?
                jne     NotIgnore
                pop     ax
                mov     al, 0
                jmp     Quit24

NotIgnore:      cmp     al, 'R'         ;Retry?
                jne     NotRetry
                pop     ax
                mov     al, 1
                jmp     Quit24

NotRetry:       cmp     al, 'A'         ;Abort?
                jne     NotAbort
                pop     ax
                mov     al, 2
                jmp     Quit24

NotAbort:       cmp     al, 'F'         ;Fail?
                jne     BadChar
                pop     ax
                mov     al, 3
Quit24:         pop     ds
                pop     dx
                iret

BadChar:        mov     ah, 2           ;DOS print character call.
                mov     dl, 7           ;Bell character
                int     21h
                jmp     Int24Lp
MyInt24         endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                mov     ax, 0
                mov     es, ax
                mov     word ptr es:[24h*4], offset MyInt24
                mov     es:[24h*4 + 2], cs

                mov     ah, 5
                mov     dl, 'a'
                int     21h
                rcl     Value, 1
                and     Value, 1
                mov     ErrCode, ax
                printf
                byte    cr,lf,lf
                byte    "Print char returned with error status %d and "
                byte    "error code %d\n",0
                dword   Value, ErrCode

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

; Allocate a reasonable amount of space for the stack (8k).
; Note: if you use the pattern matching package you should set up a
; somewhat larger stack.

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack   ")
sseg            ends

; zzzzzzseg must be the last segment that gets loaded into memory!
; This is where the heap begins.

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
19.1.4 Exception Handling in DOS: Traps

In addition to the break and critical error exceptions, there are the 80x86 exceptions that can happen during the execution of your programs. Examples include the divide error exception, bounds exception, and illegal opcode exception. A well-written application will always handle all possible exceptions. DOS does not provide direct support for these exceptions, other than a possible default handler. In particular, DOS does not restore such vectors when the program terminates; this is something the application, break handler, and critical error handler must take care of. For more information on these exceptions, see “Exceptions” on page 1000.
19.1.5 Redirection of I/O for Child Processes

When a child process begins execution, it inherits all open files from the parent process (with the exception of certain files opened with networking file functions). In particular, this includes the default files opened for the DOS standard input, standard output, standard error, auxiliary, and printer devices. DOS assigns the file handle values zero through four, respectively, to these devices. If a parent process closes one of these file handles and then reassigns the handle with a Force Duplicate File Handle call, the child process inherits the reassigned handle. Note that the DOS EXEC call does not process the I/O redirection operators (“<”, “>”, and “|”). If you want to redirect the standard I/O of a child process, you must do this before loading and executing the child process. To redirect one of the five standard I/O devices, you should do the following steps:

1) Duplicate the file handle you want to redirect (e.g., to redirect the standard output, duplicate file handle one).
2) Close the affected file (e.g., file handle one for standard output).
3) Open a file using the standard DOS Create or CreateNew calls.
4) Use the Force Duplicate File Handle call to copy the new file handle to file handle one.
5) Run the child process.
6) On return from the child, close the file.
7) Copy the file handle you duplicated in step one back to the standard output file handle using the Force Duplicate Handle function.
This technique looks like it would be perfect for redirecting printer or serial port I/O. Unfortunately, many programs bypass DOS when sending data to the printer and use the BIOS call or, worse yet, go directly to the hardware. Almost no software bothers with DOS’ serial port support – it truly is that bad. However, most programs do call DOS to input or output characters on the standard input, output, and error devices. The following code demonstrates how to redirect the output of a child process to a file. ; REDIRECT.ASM -Demonstrates how to redirect I/O for a child process. ; This particular program invokes COMMAND.COM to execute ; a DIR command, when is sent to the specified output file. include stdlib.a includelib stdlib.lib dseg
segment
OrigOutHandle word FileHandle word FileName byte
para public ‘data’ ? ? “dirctry.txt”,0
;Holds copy of STDOUT handle. ;File I/O handle. ;Filename for output data.
;Use parent’s Environment blk. ;For the cmd ln parms.
; MS-DOS EXEC structure. ExecStruct
word dword dword dword
0 CmdLine DfltFCB DfltFCB
DfltFCB CmdLine PgmName PgmNameStr dseg
byte byte dword byte ends
3,” “,0,0,0,0,0 7, “ /c DIR”, 0dh ;Do a directory command. PgmNameStr ;Points at pgm name. “c:\command.com”,0
cseg
segment assume
para public ‘code’ cs:cseg, ds:dseg
Main
proc mov mov MemInit
ax, dseg ds, ax
;Get ptr to vars segment ;Start the memory mgr.
; Free up some memory for COMMAND.COM: mov int
Page 1076
ah, 62h 21h
;Get our PSP value
Processes, Coroutines, and Concurrency mov mov sub mov mov int
es, ax, ax, bx, ah, 21h
bx zzzzzzseg bx ax 4ah
;Compute size of ; resident run code. ;Release unused memory.
; Save original output file handle.

                mov     bx, 1                   ;Std out is file handle 1.
                mov     ah, 45h                 ;Duplicate the file handle.
                int     21h
                mov     OrigOutHandle, ax       ;Save duplicate handle.

; Open the output file:

                mov     ah, 3ch                 ;Create file.
                mov     cx, 0                   ;Normal attributes.
                lea     dx, FileName
                int     21h
                mov     FileHandle, ax          ;Save opened file handle.

; Force the standard output to send its output to this file.
; Do this by forcing the file's handle onto file handle #1 (stdout).

                mov     ah, 46h                 ;Force dup file handle
                mov     cx, 1                   ;Existing handle to change.
                mov     bx, FileHandle          ;New file handle to use.
                int     21h
; Print the first line to the file:

                print
                byte    "Redirected directory listing:",cr,lf,0

; Okay, execute the DOS DIR command (that is, execute COMMAND.COM with
; the command line parameter "/c DIR").

                mov     bx, seg ExecStruct
                mov     es, bx
                mov     bx, offset ExecStruct   ;Ptr to program record.
                lds     dx, PgmName
                mov     ax, 4b00h               ;Exec pgm
                int     21h

; On return from EXEC, AX holds a DOS status code, so reload SS from BX
; (which holds the stack segment value), not from AX.

                mov     bx, sseg                ;Reset the stack on return.
                mov     ss, bx
                mov     sp, offset EndStk
                mov     bx, seg dseg
                mov     ds, bx
; Okay, close the output file and switch standard output back to the
; console.

                mov     ah, 3eh                 ;Close output file.
                mov     bx, FileHandle
                int     21h

                mov     ah, 46h                 ;Force duplicate handle
                mov     cx, 1                   ;StdOut
                mov     bx, OrigOutHandle       ;Restore previous handle.
                int     21h
; Return control to MS-DOS

Quit:           ExitPgm
Main            endp
cseg            ends

sseg            segment para stack 'stack'
                dw      128 dup (0)
endstk          dw      ?
sseg            ends

zzzzzzseg       segment para public 'zzzzzzseg'
Heap            db      200h dup (?)
zzzzzzseg       ends
                end     Main

19.2 Shared Memory

The only problem with running different DOS programs as part of a single application is interprocess communication. That is, how do all these programs talk to one another? When a typical DOS application runs, DOS loads in all code and data segments; there is no provision, other than reading data from a file or the process termination code, for one process to pass information to another. Although file I/O will work, it is cumbersome and slow. The ideal solution would be for one process to leave a copy of various variables that other processes can share. Your programs can easily do this using shared memory.

Most modern multitasking operating systems provide for shared memory, that is, memory that appears in the address space of two or more processes. Furthermore, such shared memory is often persistent, meaning it continues to hold values after its creator process terminates. This allows other processes to start later and use the values left behind by the shared variables’ creator. Unfortunately, MS-DOS is not a modern multitasking operating system and it does not support shared memory. However, we can easily write a resident program that provides this capability missing from DOS. The following sections describe how to create two types of shared memory regions: static and dynamic.
19.2.1 Static Shared Memory

A TSR to implement static shared memory is trivial. It is a passive TSR that provides three functions: verify presence, remove, and return segment pointer. The transient portion simply allocates a 64K data segment and then terminates. Other processes can obtain the address of the 64K shared memory block by making the “return segment pointer” call. These processes can place all their shared data into the segment belonging to the TSR. When one process quits, the shared segment remains in memory as part of the TSR. When a second process runs and links with the shared segment, the variables from the shared segment are still intact, so the new process can access those values. When all processes are done sharing data, the user can remove the shared memory TSR with the remove function.

As mentioned above, there is almost nothing to the shared memory TSR. The following code implements it:

; SHARDMEM.ASM
;
; This TSR sets aside a 64K shared memory region for other processes to use.
;
; Usage:
;
;       SHARDMEM -              Loads resident portion and activates
;                               shared memory capabilities.
;
;       SHARDMEM REMOVE -       Removes shared memory TSR from memory.
;
; This TSR checks to make sure there isn't a copy already active in
; memory. When removing itself from memory, it makes sure there are
; no other interrupts chained into INT 2Fh before doing the remove.
;
;
; The following segments must appear in this order and before the
; Standard Library includes.

ResidentSeg     segment para public 'Resident'
ResidentSeg     ends

SharedMemory    segment para public 'Shared'
SharedMemory    ends

EndResident     segment para public 'EndRes'
EndResident     ends

                .xlist
                .286
                include         stdlib.a
                includelib      stdlib.lib
                .list

; Resident segment that holds the TSR code:

ResidentSeg     segment para public 'Resident'
                assume  cs:ResidentSeg, ds:nothing
; Int 2Fh ID number for this TSR:

MyTSRID         byte    0
                byte    0               ;Padding so we can print it.

; PSP is the psp address for this program.

PSP             word    0

OldInt2F        dword   ?
; MyInt2F-      Provides int 2Fh (multiplex interrupt) support for this
;               TSR. The multiplex interrupt recognizes the following
;               subfunctions (passed in AL):
;
;               00h- Verify presence.   Returns 0FFh in AL and a pointer
;                                       to an ID string in es:di if the
;                                       TSR ID (in AH) matches this
;                                       particular TSR.
;
;               01h- Remove.            Removes the TSR from memory.
;                                       Returns 0 in AL if successful,
;                                       1 in AL if failure.
;
;               10h- Return Seg Adrs.   Returns the segment address of the
;                                       shared segment in ES.

MyInt2F         proc    far
                assume  ds:nothing

                cmp     ah, MyTSRID     ;Match our TSR identifier?
                je      YepItsOurs
                jmp     OldInt2F
; Okay, we know this is our ID, now check for a verify, remove, or
; return segment call.

YepItsOurs:     cmp     al, 0           ;Verify Call
                jne     TryRmv
                mov     al, 0ffh        ;Return success.
                lesi    IDString
                iret                    ;Return back to caller.

IDString        byte    "Static Shared Memory TSR",0

TryRmv:         cmp     al, 1           ;Remove call.
                jne     TryRetSeg
; See if we can remove this TSR:

                push    es
                mov     ax, 0
                mov     es, ax
                cmp     word ptr es:[2Fh*4], offset MyInt2F
                jne     TRDone
                cmp     word ptr es:[2Fh*4 + 2], seg MyInt2F
                je      CanRemove       ;Branch if we can.
TRDone:         mov     ax, 1           ;Return failure for now.
                pop     es
                iret
; Okay, they want to remove this guy *and* we can remove it from memory.
; Take care of all that here.

                assume  ds:ResidentSeg

CanRemove:      push    ds
                pusha
                cli                     ;Turn off the interrupts while
                mov     ax, 0           ; we mess with the interrupt
                mov     es, ax          ; vectors.
                mov     ax, cs
                mov     ds, ax

                mov     ax, word ptr OldInt2F
                mov     es:[2Fh*4], ax
                mov     ax, word ptr OldInt2F+2
                mov     es:[2Fh*4 + 2], ax

; Okay, one last thing before we quit- Let's give the memory allocated
; to this TSR back to DOS.

                mov     ds, PSP
                mov     es, ds:[2Ch]    ;Ptr to environment block.
                mov     ah, 49h         ;DOS release memory call.
                int     21h

                mov     ax, ds          ;Release program code space.
                mov     es, ax
                mov     ah, 49h
                int     21h

                popa
                pop     ds
                pop     es
                mov     ax, 0           ;Return Success.
                iret
; See if they want us to return the segment address of our shared segment
; here.

TryRetSeg:      cmp     al, 10h         ;Return Segment Opcode
                jne     IllegalOp
                mov     ax, SharedMemory
                mov     es, ax
                mov     ax, 0           ;Return success
                clc
                iret

; They called us with an illegal subfunction value. Try to do as little
; damage as possible.

IllegalOp:      mov     ax, 0           ;Who knows what they were thinking?
                iret
MyInt2F         endp
                assume  ds:nothing
ResidentSeg     ends
; Here's the segment that will actually hold the shared data.

SharedMemory    segment para public 'Shared'
                db      0FFFFh dup (?)
SharedMemory    ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:ResidentSeg
; SeeIfPresent- Checks to see if our TSR is already present in memory.
;               Sets the zero flag if it is, clears the zero flag if
;               it is not.

SeeIfPresent    proc    near
                push    es
                push    ds
                push    di
                mov     cx, 0ffh        ;Start with ID 0FFh.
IDLoop:         mov     ah, cl
                push    cx
                mov     al, 0           ;Verify presence call.
                int     2Fh
                pop     cx
                cmp     al, 0           ;Present in memory?
                je      TryNext
                strcmpl
                byte    "Static Shared Memory TSR",0
                je      Success

TryNext:        dec     cl              ;Test USER IDs of 80h..FFh
                js      IDLoop
                cmp     cx, 0           ;Clear zero flag.
Success:        pop     di
                pop     ds
                pop     es
                ret
SeeIfPresent    endp
; FindID-       Determines the first (well, last actually) TSR ID available
;               in the multiplex interrupt chain. Returns this value in
;               the CL register.
;
;               Returns the zero flag set if it locates an empty slot.
;               Returns the zero flag clear if failure.

FindID          proc    near
                push    es
                push    ds
                push    di

                mov     cx, 0ffh        ;Start with ID 0FFh.
IDLoop:         mov     ah, cl
                push    cx
                mov     al, 0           ;Verify presence call.
                int     2Fh
                pop     cx
                cmp     al, 0           ;Present in memory?
                je      Success
                dec     cl              ;Test USER IDs of 80h..FFh
                js      IDLoop
                xor     cx, cx
                cmp     cx, 1           ;Clear zero flag
Success:        pop     di
                pop     ds
                pop     es
                ret
FindID          endp

Main            proc
                meminit

                mov     ax, ResidentSeg
                mov     ds, ax

                mov     ah, 62h         ;Get this program's PSP
                int     21h             ; value.
                mov     PSP, bx
; Before we do anything else, we need to check the command line
; parameters. If there is one, and it is the word "REMOVE", then remove
; the resident copy from memory using the multiplex (2Fh) interrupt.

                argc
                cmp     cx, 1           ;Must have 0 or 1 parms.
                jb      TstPresent
                je      DoRemove
Usage:          print
                byte    "Usage:",cr,lf
                byte    " shardmem",cr,lf
                byte    "or shardmem REMOVE",cr,lf,0
                ExitPgm
; Check for the REMOVE command.

DoRemove:       mov     ax, 1
                argv
                stricmpl
                byte    "REMOVE",0
                jne     Usage

                call    SeeIfPresent
                je      RemoveIt
                print
                byte    "TSR is not present in memory, cannot remove"
                byte    cr,lf,0
                ExitPgm

RemoveIt:       mov     MyTSRID, cl
                printf
                byte    "Removing TSR (ID #%d) from memory...",0
                dword   MyTSRID

                mov     ah, cl
                mov     al, 1           ;Remove cmd, ah contains ID
                int     2Fh
                cmp     al, 1           ;Succeed?
                je      RmvFailure
                print
                byte    "removed.",cr,lf,0
                ExitPgm

RmvFailure:     print
                byte    cr,lf
                byte    "Could not remove TSR from memory.",cr,lf
                byte    "Try removing other TSRs in the reverse order "
                byte    "you installed them.",cr,lf,0
                ExitPgm
; Okay, see if the TSR is already in memory. If so, abort the
; installation process.

TstPresent:     call    SeeIfPresent
                jne     GetTSRID
                print
                byte    "TSR is already present in memory.",cr,lf
                byte    "Aborting installation process",cr,lf,0
                ExitPgm

; Get an ID for our TSR and save it away.

GetTSRID:       call    FindID
                je      GetFileName
                print
                byte    "Too many resident TSRs, cannot install",cr,lf,0
                ExitPgm
; Things look cool so far, so install the interrupts

GetFileName:    mov     MyTSRID, cl
                print
                byte    "Installing interrupts...",0

; Patch into the INT 2Fh interrupt chain.

                cli                             ;Turn off interrupts!
                mov     ax, 0
                mov     es, ax
                mov     ax, es:[2Fh*4]
                mov     word ptr OldInt2F, ax
                mov     ax, es:[2Fh*4 + 2]
                mov     word ptr OldInt2F+2, ax
                mov     es:[2Fh*4], offset MyInt2F
                mov     es:[2Fh*4+2], seg ResidentSeg
                sti                             ;Okay, ints back on.
; We're hooked up, the only thing that remains is to zero out the shared
; memory segment and then terminate and stay resident.

                printf
                byte    "Installed, TSR ID #%d.",cr,lf,0
                dword   MyTSRID

                mov     ax, SharedMemory        ;Zero out the shared
                mov     es, ax                  ; memory segment.
                mov     cx, 32768               ;32K words = 64K bytes.
                xor     ax, ax                  ;Store all zeros,
                mov     di, ax                  ; starting at offset zero.
        rep     stosw

                mov     dx, EndResident         ;Compute size of program.
                sub     dx, PSP
                mov     ax, 3100h               ;DOS TSR command.
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             db      256 dup (?)
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
This program simply carves out a chunk of memory (the 64K in the SharedMemory segment) and returns a pointer to it in es whenever a program executes the appropriate int 2Fh call (ah = TSR ID and al = 10h). The only catch is how do we declare shared variables in the applications that use shared memory? Well, that’s fairly easy if we play a sneaky trick on MASM, the Linker, DOS, and the 80x86.

When DOS loads your program into memory, it generally loads the segments in the same order they first appear in your source files. The UCR Standard Library, for example, takes advantage of this by insisting that you include a segment named zzzzzzseg at the end of all your assembly language source files. The UCR Standard Library memory management routines build the heap starting at zzzzzzseg; it must be the last segment (containing valid data) because the memory management routines may overwrite anything following zzzzzzseg.

For our shared memory segment, we would like to create a segment something like the following:

SharedMemory    segment para public 'Shared'

                « define all shared variables here»

SharedMemory    ends
Applications that share data would define all shared variables in this shared segment. There are, however, five problems.

First, how do we tell the assembler/linker/DOS/80x86 that this is a shared segment, rather than having a separate segment for each program? Well, this problem is easy to solve; we don’t bother telling MASM, the linker, or DOS anything. The way we make the different applications all share the same segment in memory is to invoke the shared memory TSR in the code above with function code 10h. This returns the address of the TSR’s SharedMemory segment in the es register. In our assembly language programs we fool MASM into thinking es points at its local shared memory segment when, in fact, es points at the global segment.

The second problem is minor, but annoying nonetheless. When you create a segment, MASM, the linker, and DOS set aside storage for that segment. If you declare a large number of variables in a shared segment, this can waste memory since the program will actually use the memory space in the global shared segment. One easy way to reclaim the storage that MASM reserves for this segment is to define the shared segment after zzzzzzseg in your shared memory applications. By doing so, the Standard Library will absorb any memory reserved for the (dummy) shared memory segment into the heap, since all memory after zzzzzzseg belongs to the heap (when you use the standard meminit call).

The third problem is slightly more difficult to deal with. Since you will not be using the local segment, you cannot initialize any variables in the shared memory segment by placing values in the operand field of byte, word, dword, etc., directives. Doing so will only initialize the local memory in the heap; the system will not copy this data to the global shared segment. Generally, this isn’t a problem because processes won’t normally initialize shared memory as they load. Instead, there will probably be a single application you run first that initializes the shared memory area for the rest of the processes that use the global shared segment.

The fourth problem is that you cannot initialize any variables with the address of an object in shared memory. For example, if the variable shared_K is in the shared memory segment, you could not use a statement like the following:

                printf
                byte    "Value of shared_K is %d\n",0
                dword   shared_K

The problem with this code is that MASM initializes the double word after the string above with the address of the shared_K variable in the local copy of the shared data segment. This will not print out the copy in the global shared data segment.

The last problem is anything but minor. All programs that use the global shared memory segment must define their variables at identical offsets within the shared segment. Given the way MASM assigns offsets to variables within a segment, if you are one byte off in the declaration of any of your variables, your program will be accessing its variables at different addresses than other processes sharing the global shared segment. This will scramble memory and produce a disaster. The only reasonable way to declare variables for shared memory programs is to create an include file with all the shared variable declarations for all concerned programs. Then include this single file into all the programs that share the variables. Now you can add, remove, or modify variables without having to worry about maintaining the shared variable declarations in the other files.

The following two sample programs demonstrate the use of shared memory. The first application reads a string from the user and stuffs it into shared memory. The second application reads that string from shared memory and displays it on the screen. First, here is the include file containing the single shared variable declaration used by both applications:

; shmvars.asm
;
; This file contains the shared memory variable declarations used by
; all applications that refer to shared memory.

InputLine       byte    128 dup (?)
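Returning to the fourth problem described above, one possible workaround is to copy the shared value into a local variable and hand printf the address of that local copy. The fragment below is only a sketch under stated assumptions: shared_K is the hypothetical shared word from the earlier example, LocalK is an assumed word variable in the program's own data segment, ds points at that data segment, and es already holds the segment value returned by function 10h.

```asm
; Assumed declaration in the program's own data segment (dseg);
; LocalK is a hypothetical name used only for this sketch:
;
; LocalK        word    ?

                mov     ax, es:shared_K ;Fetch the value from the global segment.
                mov     LocalK, ax      ;Keep a copy in local storage.
                printf
                byte    "Value of shared_K is %d\n",0
                dword   LocalK          ;Address of the local copy, not shared_K.
```

The copy is a snapshot; if another process changes shared_K later, the program must fetch the value again before printing it.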
Here is the first application that reads an input string from the user and shoves it into shared memory:

; SHMAPP1.ASM
;
; This is a shared memory application that uses the static shared memory
; TSR (SHARDMEM.ASM). This program inputs a string from the user and
; passes that string to SHMAPP2.ASM through the shared memory area.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'
ShmID           byte    0
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg, es:SharedMemory
; SeeIfPresent- Checks to see if the shared memory TSR is present in memory.
;               Sets the zero flag if it is, clears the zero flag if
;               it is not. This routine also returns the TSR ID in CL.

SeeIfPresent    proc    near
                push    es
                push    ds
                push    di
                mov     cx, 0ffh        ;Start with ID 0FFh.
IDLoop:         mov     ah, cl
                push    cx
                mov     al, 0           ;Verify presence call.
                int     2Fh
                pop     cx
                cmp     al, 0           ;Present in memory?
                je      TryNext
                strcmpl
                byte    "Static Shared Memory TSR",0
                je      Success

TryNext:        dec     cl              ;Test USER IDs of 80h..FFh
                js      IDLoop
                cmp     cx, 0           ;Clear zero flag.
Success:        pop     di
                pop     ds
                pop     es
                ret
SeeIfPresent    endp
; The main program for application #1 links with the shared memory
; TSR and then reads a string from the user (storing the string into
; shared memory) and then terminates.

Main            proc
                assume  cs:cseg, ds:dseg, es:SharedMemory
                mov     ax, dseg
                mov     ds, ax
                meminit

                print
                byte    "Shared memory application #1",cr,lf,0

; See if the shared memory TSR is around:

                call    SeeIfPresent
                je      ItsThere
                print
                byte    "Shared Memory TSR (SHARDMEM) is not loaded.",cr,lf
                byte    "This program cannot continue execution.",cr,lf,0
                ExitPgm

; If the shared memory TSR is present, get the address of the shared segment
; into the ES register:

ItsThere:       mov     ah, cl          ;ID of our TSR.
                mov     al, 10h         ;Get shared segment address.
                int     2Fh
; Get the input line from the user:

                print
                byte    "Enter a string: ",0

                lea     di, InputLine   ;ES already points at proper seg.
                gets

                print
                byte    "Entered '",0
                puts
                print
                byte    "' into shared memory.",cr,lf,0

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
; The shared memory segment must appear after "zzzzzzseg".
; Note that this isn't the physical storage for the data in the
; shared segment. It's really just a placeholder so we can declare
; variables and generate their offsets appropriately. The UCR Standard
; Library will reuse the memory associated with this segment for the
; heap. To access data in the shared segment, this application calls
; the shared memory TSR to obtain the true segment address of the
; shared memory segment. It can then access variables in the shared
; memory segment (wherever it happens to be) off the ES register.
;
; Note that all the variable declarations go into an include file.
; All applications that refer to the shared memory segment include
; this file in the SharedMemory segment. This ensures that all
; shared segments have the exact same variable layout.

SharedMemory    segment para public 'Shared'

                include shmvars.asm

SharedMemory    ends
                end     Main
The second application is very similar. Here it is:

; SHMAPP2.ASM
;
; This is a shared memory application that uses the static shared memory
; TSR (SHARDMEM.ASM). This program assumes the user has already run the
; SHMAPP1 program to insert a string into shared memory. This program
; simply prints that string from shared memory.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'
ShmID           byte    0
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg, es:SharedMemory
; SeeIfPresent- Checks to see if the shared memory TSR is present in memory.
;               Sets the zero flag if it is, clears the zero flag if
;               it is not. This routine also returns the TSR ID in CL.

SeeIfPresent    proc    near
                push    es
                push    ds
                push    di
                mov     cx, 0ffh        ;Start with ID 0FFh.
IDLoop:         mov     ah, cl
                push    cx
                mov     al, 0           ;Verify presence call.
                int     2Fh
                pop     cx
                cmp     al, 0           ;Present in memory?
                je      TryNext
                strcmpl
                byte    "Static Shared Memory TSR",0
                je      Success

TryNext:        dec     cl              ;Test USER IDs of 80h..FFh
                js      IDLoop
                cmp     cx, 0           ;Clear zero flag.
Success:        pop     di
                pop     ds
                pop     es
                ret
SeeIfPresent    endp
; The main program for application #2 links with the shared memory
; TSR, prints the string that SHMAPP1 left in shared memory, and
; then terminates.

Main            proc
                assume  cs:cseg, ds:dseg, es:SharedMemory
                mov     ax, dseg
                mov     ds, ax
                meminit

                print
                byte    "Shared memory application #2",cr,lf,0

; See if the shared memory TSR is around:

                call    SeeIfPresent
                je      ItsThere
                print
                byte    "Shared Memory TSR (SHARDMEM) is not loaded.",cr,lf
                byte    "This program cannot continue execution.",cr,lf,0
                ExitPgm

; If the shared memory TSR is present, get the address of the shared segment
; into the ES register:

ItsThere:       mov     ah, cl          ;ID of our TSR.
                mov     al, 10h         ;Get shared segment address.
                int     2Fh
; Print the string input in SHMAPP1:

                print
                byte    "String from SHMAPP1 is '",0

                lea     di, InputLine   ;ES already points at proper seg.
                puts

                print
                byte    "' from shared memory.",cr,lf,0

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends

; The shared memory segment must appear after "zzzzzzseg".
; Note that this isn't the physical storage for the data in the
; shared segment. It's really just a placeholder so we can declare
; variables and generate their offsets appropriately. The UCR Standard
; Library will reuse the memory associated with this segment for the
; heap. To access data in the shared segment, this application calls
; the shared memory TSR to obtain the true segment address of the
; shared memory segment. It can then access variables in the shared
; memory segment (wherever it happens to be) off the ES register.
;
; Note that all the variable declarations go into an include file.
; All applications that refer to the shared memory segment include
; this file in the SharedMemory segment. This ensures that all
; shared segments have the exact same variable layout.

SharedMemory    segment para public 'Shared'

                include shmvars.asm

SharedMemory    ends
                end     Main
19.2.2 Dynamic Shared Memory

Although the static shared memory the previous section describes is very useful, it does suffer from a few limitations. First of all, any program that uses the global shared segment must be aware of the location of every other program that uses the shared segment. This effectively means that the use of the shared segment is limited to a single set of cooperating processes at any one given time. You cannot have two independent sets of programs using the shared memory at the same time. Another limitation with the static system is that you must know the size of all variables when you write your program; you cannot create dynamic data structures whose size varies at run time. It would be nice, for example, to have calls like shmalloc and shmfree that let you dynamically allocate and free memory in a shared region. Fortunately, it is very easy to overcome these limitations by creating a dynamic shared memory manager.

A reasonable shared memory manager will have four functions: initialize, shmalloc, shmattach, and shmfree. The initialization call reclaims all shared memory in use. The shmalloc call lets a process allocate a new block of shared memory. Only one process in a group of cooperating processes makes this call. Once shmalloc allocates a block of memory, the other processes use the shmattach call to obtain the address of the shared memory block. The shmfree call releases the block associated with a given key.

The following code implements a dynamic shared memory manager. The code is similar to that appearing in the Standard Library except this code allows a maximum of 64K storage on the heap.
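Before reading the TSR itself, it may help to see how a client might use it. The two fragments below are a sketch, not part of SHMALLOC.ASM. They use the function codes and register conventions documented in the MyInt2F header comments of the listing (11h for shmalloc, 14h for shmattach). The names TsrID, ShmPtr, SHARED_KEY, AllocError, and AttachError are assumptions introduced for illustration, as is the premise that the caller has already located the TSR's multiplex ID with a SeeIfPresent-style scan and that the shared heap has already been initialized.

```asm
; Hypothetical client fragments. Assumed declarations in the client's dseg:
;
; TsrID         byte    0               ;Multiplex ID of the TSR.
; ShmPtr        dword   ?               ;Far pointer to the shared block.

SHARED_KEY      equ     1234h           ;Key agreed upon by cooperating programs.

; First process: allocate a 256-byte block under SHARED_KEY.

                mov     ah, TsrID
                mov     al, 11h         ;shmalloc
                mov     cx, 256         ;Bytes requested.
                mov     dx, SHARED_KEY
                int     2Fh
                test    ax, ax          ;0 = success, 1 = key exists,
                jnz     AllocError      ; 2 = insufficient memory.
                mov     word ptr ShmPtr, di
                mov     word ptr ShmPtr+2, es

; Any other cooperating process: attach to the existing block.

                mov     ah, TsrID
                mov     al, 14h         ;shmattach
                mov     dx, SHARED_KEY
                int     2Fh
                test    ax, ax          ;0 = success, 3 = no such key.
                jnz     AttachError
                mov     word ptr ShmPtr, di
                mov     word ptr ShmPtr+2, es
```

Because every block is identified by a key rather than by a fixed offset, two independent groups of programs can share memory at the same time simply by choosing different keys.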
; SHMALLOC.ASM
;
; This TSR sets up a dynamic shared memory system.
;
; This TSR checks to make sure there isn't a copy already active in
; memory. When removing itself from memory, it makes sure there are
; no other interrupts chained into INT 2Fh before doing the remove.
;
;
; The following segments must appear in this order and before the
; Standard Library includes.

ResidentSeg     segment para public 'Resident'
ResidentSeg     ends

SharedMemory    segment para public 'Shared'
SharedMemory    ends

EndResident     segment para public 'EndRes'
EndResident     ends

                .xlist
                .286
                include         stdlib.a
                includelib      stdlib.lib
                .list

; Resident segment that holds the TSR code:

ResidentSeg     segment para public 'Resident'
                assume  cs:ResidentSeg, ds:nothing
NULL            equ     0

; Data structure for an allocated data region.
;
; Key-  user supplied ID to associate this region with a particular set
;       of processes.
;
; Next- Points at the next allocated block.
; Prev- Points at the previous allocated block.
; Size- Size (in bytes) of allocated block, not including header structure.

Region          struct
key             word    ?
next            word    ?
prev            word    ?
blksize         word    ?
Region          ends

Startmem        equ     Region ptr [0]

AllocatedList   word    0               ;Points at chain of alloc'd blocks.
FreeList        word    0               ;Points at chain of free blocks.
; Int 2Fh ID number for this TSR:

MyTSRID         byte    0
                byte    0               ;Padding so we can print it.

; PSP is the psp address for this program.

PSP             word    0

OldInt2F        dword   ?
; MyInt2F-      Provides int 2Fh (multiplex interrupt) support for this
;               TSR. The multiplex interrupt recognizes the following
;               subfunctions (passed in AL):
;
;               00h- Verify presence.   Returns 0FFh in AL and a pointer
;                                       to an ID string in es:di if the
;                                       TSR ID (in AH) matches this
;                                       particular TSR.
;
;               01h- Remove.            Removes the TSR from memory.
;                                       Returns 0 in AL if successful,
;                                       1 in AL if failure.
;
;               11h- shmalloc           CX contains the size of the block
;                                       to allocate.
;                                       DX contains the key for this block.
;                                       Returns a pointer to block in ES:DI
;                                       and size of allocated block in CX.
;                                       Returns an error code in AX. Zero
;                                       is no error, one is "key already
;                                       exists," two is "insufficient
;                                       memory for request."
;
;               12h- shmfree            DX contains the key for this block.
;                                       This call frees the specified block
;                                       from memory.
;
;               13h- shminit            Initializes the shared memory system
;                                       freeing all blocks currently in use.
;
;               14h- shmattach          DX contains the key for a block.
;                                       Search for that block and return
;                                       its address in ES:DI. AX contains
;                                       zero if successful, three if it
;                                       cannot locate a block with the
;                                       specified key.

MyInt2F         proc    far
                assume  ds:nothing

                cmp     ah, MyTSRID     ;Match our TSR identifier?
                je      YepItsOurs
                jmp     OldInt2F
; Okay, we know this is our ID, now check for a verify, remove, or
; return segment call.

YepItsOurs:     cmp     al, 0           ;Verify Call
                jne     TryRmv
                mov     al, 0ffh        ;Return success.
                lesi    IDString
                iret                    ;Return back to caller.

IDString        byte    "Dynamic Shared Memory TSR",0

TryRmv:         cmp     al, 1           ;Remove call.
                jne     Tryshmalloc
; See if we can remove this TSR:

                push    es
                mov     ax, 0
                mov     es, ax
                cmp     word ptr es:[2Fh*4], offset MyInt2F
                jne     TRDone
                cmp     word ptr es:[2Fh*4 + 2], seg MyInt2F
                je      CanRemove       ;Branch if we can.
TRDone:         mov     ax, 1           ;Return failure for now.
                pop     es
                iret
; Okay, they want to remove this guy *and* we can remove it from memory.
; Take care of all that here.

                assume  ds:ResidentSeg

CanRemove:      push    ds
                pusha
                cli                     ;Turn off the interrupts while
                mov     ax, 0           ; we mess with the interrupt
                mov     es, ax          ; vectors.
                mov     ax, cs
                mov     ds, ax

                mov     ax, word ptr OldInt2F
                mov     es:[2Fh*4], ax
                mov     ax, word ptr OldInt2F+2
                mov     es:[2Fh*4 + 2], ax

; Okay, one last thing before we quit- Let's give the memory allocated
; to this TSR back to DOS.

                mov     ds, PSP
                mov     es, ds:[2Ch]    ;Ptr to environment block.
                mov     ah, 49h         ;DOS release memory call.
                int     21h

                mov     ax, ds          ;Release program code space.
                mov     es, ax
                mov     ah, 49h
                int     21h

                popa
                pop     ds
                pop     es
                mov     ax, 0           ;Return Success.
                iret

; Stick BadKey here so that it is close to its associated branch (from below).
;
; If we come here, we've discovered an allocated block with the
; specified key. Return an error code (AX=1) and the size of that
; allocated block (in CX).

BadKey:         mov     cx, [bx].Region.BlkSize
                mov     ax, 1           ;Already allocated error.
                pop     bx
                pop     ds
                iret

; See if this is a shmalloc call.
; If so, on entry -
; DX contains the key.
; CX contains the number of bytes to allocate.
;
; On exit:
;
; ES:DI points at the allocated block (if successful).
; CX contains the actual size of the allocated block (>=CX on entry).
; AX contains error code, 0 if no error.

Tryshmalloc:    cmp     al, 11h         ;shmalloc function code.
                jne     Tryshmfree
; First, search through the allocated list to see if a block with the
; current key number already exists. DX contains the requested key.

                assume  ds:SharedMemory
                assume  bx:ptr Region
                assume  di:ptr Region

                push    ds
                push    bx
                mov     bx, SharedMemory
                mov     ds, bx
                mov     bx, ResidentSeg:AllocatedList
                test    bx, bx          ;Anything on this list?
                je      SrchFreeList

SearchLoop:     cmp     dx, [bx].Key    ;Key exist already?
                je      BadKey
                mov     bx, [bx].Next   ;Get next region.
                test    bx, bx          ;NULL?, if not, try another
                jne     SearchLoop      ; entry in the list.
; If an allocated block with the specified key does not already exist,
; then try to allocate one from the free memory list.

SrchFreeList:   mov     bx, ResidentSeg:FreeList
                test    bx, bx          ;Empty free list?
                je      OutaMemory

FirstFitLp:     cmp     cx, [bx].BlkSize ;Is this block big enough?
                jbe     GotBlock
                mov     bx, [bx].Next   ;If not, on to the next one.
                test    bx, bx          ;Anything on this list?
                jne     FirstFitLp
; If we drop down here, we were unable to find a block that was large
; enough to satisfy the request. Return an appropriate error

OutaMemory:     mov     cx, 0           ;Nothing available.
                mov     ax, 2           ;Insufficient memory error.
                pop     bx
                pop     ds
                iret

; If we find a large enough block, we've got to carve the new block
; out of it and return the rest of the storage to the free list. If the
; free block is at least 32 bytes larger than the requested size, we will
; do this. If the free block is less than 32 bytes larger, we will simply
; give this free block to the requesting process. The reason for the
; 32 bytes is simple: We need eight bytes for the new block's header
; (the free block already has one) and it doesn't make sense to fragment
; blocks to sizes below 24 bytes. That would only increase processing time
; when processes free up blocks by requiring more work coalescing blocks.

GotBlock:       mov     ax, [bx].BlkSize ;Compute difference in size.
                sub     ax, cx
                cmp     ax, 32          ;At least 32 bytes left?
                jbe     GrabWholeBlk    ;If not, take this block.
; Okay, the free block is larger than the requested size by more than 32
; bytes. Carve the new block from the end of the free block (that way
; we do not have to change the free block's pointers, only the size).

                mov     di, bx
                add     di, [bx].BlkSize ;Scoot to end, minus 8
                sub     di, cx          ;Point at new block.

                sub     [bx].BlkSize, cx ;Remove alloc'd block and
                sub     [bx].BlkSize, 8 ; room for header.

                mov     [di].BlkSize, cx ;Save size of block.
                mov     [di].Key, dx    ;Save key.
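
; A worked example of the carving arithmetic above. The numbers are
; assumed for illustration (all decimal) and do not come from the
; original listing:
;
;       Free block header at offset 1000 (BX), BlkSize = 100, so its
;       data area occupies offsets 1008..1107. The request is CX = 40.
;
;       Difference:     100 - 40 = 60, which exceeds 32, so we carve.
;       New header:     DI = 1000 + 100 - 40 = 1060 (occupies 1060..1067).
;       New data area:  DI + 8 = 1068, occupying 1068..1107 (40 bytes).
;       Free BlkSize:   100 - 40 - 8 = 52, so the free data area
;                       shrinks to 1008..1059.
;
;       Had the free block's BlkSize been only 60, the difference
;       (60 - 40 = 20) would not exceed 32, and the caller would get
;       the whole 60-byte block via GrabWholeBlk instead.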
; Link the new block into the list of allocated blocks.

                mov     bx, ResidentSeg:AllocatedList
                mov     [di].Next, bx
                mov     [di].Prev, NULL ;NULL previous pointer.
                test    bx, bx          ;See if it was an empty list.
                je      NoPrev
                mov     [bx].Prev, di   ;Set prev ptr for old guy.

NoPrev:         mov     ResidentSeg:AllocatedList, di
RmvDone:        add     di, 8           ;Point at actual data area.
                mov     ax, ds          ;Return ptr in es:di.
                mov     es, ax
                mov     ax, 0           ;Return success.
                pop     bx
                pop     ds
                iret
; If the current free block is larger than the request, but not by more
; than 32 bytes, just give the whole block to the user.

GrabWholeBlk:   mov     di, bx
                mov     cx, [bx].BlkSize ;Return actual size.
                cmp     [bx].Prev, NULL ;First guy in list?
                je      Rmv1st
                cmp     [bx].Next, NULL ;Last guy in list?
                je      RmvLast
; Okay, this record is sandwiched between two others in the free list.
; Cut it out from among the two.

                mov     ax, [bx].Next   ;Save the ptr to the next
                mov     bx, [bx].Prev   ; item in the prev item's
                mov     [bx].Next, ax   ; next field.

                mov     ax, bx          ;Save the ptr to the prev
                mov     bx, [di].Next   ; item in the next item's
                mov     [bx].Prev, ax   ; prev field (AX holds the prev ptr).
                jmp     RmvDone
; The block we want to remove is at the beginning of the free list. ; It could also be the only item on the free list! Rmv1st:
mov mov jmp
ax, [bx].Next FreeList, ax RmvDone
;Remove from free list.
; If the block we want to remove is at the end of the list, handle that ; down here. RmvLast:
; ; ; ; ; ; ; ; ; ;
mov mov jmp
bx, [bx].Prev [bx].Next, NULL RmvDone
assume
ds:nothing, bx:nothing, di:nothing
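
The split-or-take-whole decision made at GotBlock above can be followed more easily in a small C model. This is only a sketch of the arithmetic, not part of SHMALLOC.ASM: the names carve, carve_result, HEADER_BYTES, and SPLIT_THRESHOLD are invented for the illustration, offsets are treated as 16-bit values inside the shared segment, and the caller is assumed to have already found a free block at least as large as the request.

```c
#include <stdint.h>

enum { HEADER_BYTES = 8, SPLIT_THRESHOLD = 32 };   /* hypothetical names */

typedef struct {
    int      split;       /* 1 = carved from the end, 0 = whole block taken */
    uint16_t new_offset;  /* offset of the allocated block's header         */
    uint16_t alloc_size;  /* data bytes handed to the caller                */
    uint16_t remaining;   /* data bytes left in the free block (if split)   */
} carve_result;

/* Assumes free_size >= request, as the search loop guarantees. */
carve_result carve(uint16_t free_off, uint16_t free_size, uint16_t request)
{
    carve_result r;
    uint16_t diff = (uint16_t)(free_size - request);

    if (diff <= SPLIT_THRESHOLD) {
        /* Mirrors "jbe GrabWholeBlk": too little left over to be useful. */
        r.split      = 0;
        r.new_offset = free_off;
        r.alloc_size = free_size;
        r.remaining  = 0;
    } else {
        /* Carve from the end: only the free block's size field changes. */
        r.split      = 1;
        r.new_offset = (uint16_t)(free_off + free_size - request);
        r.alloc_size = request;
        r.remaining  = (uint16_t)(free_size - request - HEADER_BYTES);
    }
    return r;
}
```

For instance, carving 100 bytes from a 1000-byte free block whose header sits at offset 4 places the new header at offset 904 and leaves 892 data bytes in the free block; the caller's data starts eight bytes past the new header.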

; This code handles the SHMFREE function.
; On entry, DX contains the key for the block to free. We need to
; search through the allocated block list and find the block with that
; key. If we do not find such a block, this code returns without doing
; anything. If we find the block, we need to add its memory to the
; free pool. However, we cannot simply insert this block on the front
; of the free list (as we did for the allocated blocks). It might
; turn out that this block we're freeing is adjacent to one or two
; other free blocks. This code has to coalesce such blocks into
; a single free block.

Tryshmfree:     cmp     al, 12h
                jne     Tryshminit

; First, search the allocated block list to see if we can find the
; block to remove. If we don't find it in the list anywhere, just return.

                assume  ds:SharedMemory
                assume  bx:ptr Region
                assume  di:ptr Region

                push    ds
                push    di
                push    bx

                mov     bx, SharedMemory
                mov     ds, bx
                mov     bx, ResidentSeg:AllocatedList

                test    bx, bx                  ;Empty allocated list?
                je      FreeDone
SrchList:       cmp     dx, [bx].Key            ;Search for key in DX.
                je      FoundIt
                mov     bx, [bx].Next
                test    bx, bx                  ;At end of list?
                jne     SrchList
FreeDone:       pop     bx
                pop     di                      ;Nothing allocated, just
                pop     ds                      ; return to caller.
                iret

; Okay, we found the block the user wants to delete. Remove it from
; the allocated list. There are three cases to consider:
; (1) it is at the front of the allocated list, (2) it is at the end of
; the allocated list, and (3) it is in the middle of the allocated list.

FoundIt:        cmp     [bx].Prev, NULL         ;1st item in list?
                je      Free1st
                cmp     [bx].Next, NULL         ;Last item in list?
                je      FreeLast

; Okay, we're removing an allocated item from the middle of the allocated
; list.

                mov     di, [bx].Next           ;[next].prev := [cur].prev
                mov     ax, [bx].Prev
                mov     [di].Prev, ax
                xchg    ax, di
                mov     [di].Next, ax           ;[prev].next := [cur].next
                jmp     AddFree

; Handle the case where we are removing the first item from the allocation
; list. It is possible that this is the only item on the list (i.e., it
; is the first and last item), but this code handles that case without any
; problems.

Free1st:        mov     ax, [bx].Next
                mov     ResidentSeg:AllocatedList, ax
                jmp     AddFree

; If we're removing the last guy in the chain, simply set the next field
; of the previous node in the list to NULL.

FreeLast:       mov     di, [bx].Prev
                mov     [di].Next, NULL

; Okay, now we've got to put the freed block onto the free block list.
; The free block list is sorted according to address. We have to search
; for the first free block whose address is greater than the block we've
; just freed and insert the new free block before that one. If the two
; blocks are adjacent, then we've got to merge them into a single free
; block. Also, if the block before is adjacent, we must merge it as
; well. This will coalesce all free blocks on the free list so there
; are as few free blocks as possible and those blocks are as large as
; possible.

AddFree:        mov     ax, ResidentSeg:FreeList
                test    ax, ax                  ;Empty list?
                jne     SrchPosn

; If the list is empty, stick this guy on as the only entry.

                mov     ResidentSeg:FreeList, bx
                mov     [bx].Next, NULL
                mov     [bx].Prev, NULL
                jmp     FreeDone

; If the free list is not empty, search for the position of this block
; in the free list:

SrchPosn:       mov     di, ax
                cmp     bx, di
                jb      FoundPosn
                mov     ax, [di].Next
                test    ax, ax                  ;At end of list?
                jne     SrchPosn

; If we fall down here, the free block belongs at the end of the list.
; See if we need to merge the new block with the old one.

                mov     ax, di
                add     ax, [di].BlkSize        ;Compute address of 1st byte
                add     ax, 8                   ; after this block.
                cmp     ax, bx
                je      MergeLast

; Okay, just add the free block to the end of the list.

                mov     [di].Next, bx
                mov     [bx].Prev, di
                mov     [bx].Next, NULL
                jmp     FreeDone

; Merge the freed block with the block DI points at.

MergeLast:      mov     ax, [di].BlkSize
                add     ax, [bx].BlkSize
                add     ax, 8
                mov     [di].BlkSize, ax
                jmp     FreeDone

; If we found a free block before which we are supposed to insert
; the current free block, drop down here and handle it.

FoundPosn:      mov     ax, bx                  ;Compute the address of the
                add     ax, [bx].BlkSize        ; next block in memory.
                add     ax, 8
                cmp     ax, di                  ;Equal to this block?
                jne     DontMerge

; The next free block is adjacent to the one we're freeing, so just
; merge the two.

                mov     ax, [di].BlkSize        ;Merge the sizes together.
                add     ax, 8
                add     [bx].BlkSize, ax
                mov     ax, [di].Next           ;Tweak the links.
                mov     [bx].Next, ax
                mov     ax, [di].Prev
                mov     [bx].Prev, ax
                jmp     TryMergeB4

; If the blocks are not adjacent, just link them together here.

DontMerge:      mov     ax, [di].Prev
                mov     [di].Prev, bx
                mov     [bx].Prev, ax
                mov     [bx].Next, di

; Now, see if we can merge the current free block with the previous free blk.

TryMergeB4:     mov     di, [bx].Prev
                mov     ax, di
                add     ax, [di].BlkSize
                add     ax, 8
                cmp     ax, bx
                je      CanMerge
                pop     bx
                pop     di                      ;Nothing allocated, just
                pop     ds                      ; return to caller.
                iret

; If we can merge the previous and current free blocks, do that here:

CanMerge:       mov     ax, [bx].Next
                mov     [di].Next, ax
                mov     ax, [bx].BlkSize
                add     ax, 8
                add     [di].BlkSize, ax
                pop     bx
                pop     di
                pop     ds
                iret

                assume  ds:nothing
                assume  bx:nothing
                assume  di:nothing
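
Every merge test in the SHMFREE code above reduces to the same two computations: does one block end exactly where the next block's header begins, and how large is the combined block once the second header is absorbed? A minimal C sketch of just that arithmetic follows. The function names and the HEADER_BYTES constant are made up for this illustration, and offsets are modelled as 16-bit values as in the shared segment.

```c
#include <stdint.h>

enum { HEADER_BYTES = 8 };   /* hypothetical name for the 8-byte block header */

/* Nonzero if the block whose header is at 'off' (with 'size' data bytes)
 * ends exactly at the header located at 'next_off'. */
int is_adjacent(uint16_t off, uint16_t size, uint16_t next_off)
{
    return (uint16_t)(off + size + HEADER_BYTES) == next_off;
}

/* Size of the block produced by merging two adjacent blocks: the second
 * block's header becomes usable data space in the merged block. */
uint16_t merged_size(uint16_t first_size, uint16_t second_size)
{
    return (uint16_t)(first_size + second_size + HEADER_BYTES);
}
```

A block at offset 4 holding 100 data bytes ends at offset 112, so it is adjacent to a block whose header is at 112 but not to one at 120. Merging a 100-byte block with an adjacent 50-byte block yields 158 data bytes.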

; Here's where we handle the shared memory initialization (SHMINIT) function.
; All we have to do is create a single block on the free list (which is all
; available memory), empty out the allocated list, and then zero out all
; shared memory.

Tryshminit:     cmp     al, 13h
                jne     TryShmAttach

; Reset the memory allocation area to contain a single, free, block of
; memory whose size is 0FFF8h (need to reserve eight bytes for the block's
; data structure).

                push    es
                push    di
                push    cx

                mov     ax, SharedMemory        ;Zero out the shared
                mov     es, ax                  ; memory segment.
                mov     cx, 32768
                xor     ax, ax
                mov     di, ax
        rep     stosw

; Note: the commented out lines below are unnecessary since the code above
; has already zeroed out the entire shared memory segment.
; Note: we cannot put the first record at offset zero because offset zero
; is the special value for the NULL pointer. We'll use 4 instead.

                mov     di, 4
;               mov     es:[di].Region.Key, 0           ;Key is arbitrary.
;               mov     es:[di].Region.Next, 0          ;No other entries.
;               mov     es:[di].Region.Prev, 0          ; Ditto.
                mov     es:[di].Region.BlkSize, 0FFF8h  ;Rest of segment.
                mov     ResidentSeg:FreeList, di

                pop     cx
                pop     di
                pop     es
                mov     ax, 0                   ;Return no error.
                iret

; Handle the SHMATTACH function here. On entry, DX contains a key number.
; Search for an allocated block with that key number and return a pointer
; to that block (if found) in ES:DI. Return an error code (AX=3) if we
; cannot find the block.

TryShmAttach:   cmp     al, 14h                 ;Attach opcode.
                jne     IllegalOp
                mov     ax, SharedMemory
                mov     es, ax

                mov     di, ResidentSeg:AllocatedList
FindOurs:       cmp     dx, es:[di].Region.Key
                je      FoundOurs
                mov     di, es:[di].Region.Next
                test    di, di
                jne     FindOurs
                mov     ax, 3                   ;Can't find the key.
                iret

FoundOurs:      add     di, 8                   ;Point at actual data.
                mov     ax, 0                   ;No error.
                iret

; They called us with an illegal subfunction value. Try to do as little
; damage as possible.

IllegalOp:      mov     ax, 0                   ;Who knows what they were thinking?
                iret
MyInt2F         endp
                assume  ds:nothing
ResidentSeg     ends

; Here's the segment that will actually hold the shared data.

SharedMemory    segment para public 'Shared'
                db      0FFFFh dup (?)
SharedMemory    ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:ResidentSeg

; SeeIfPresent- Checks to see if our TSR is already present in memory.
;               Sets the zero flag if it is, clears the zero flag if
;               it is not.

SeeIfPresent    proc    near
                push    es
                push    ds
                push    di
                mov     cx, 0ffh                ;Start with ID 0FFh.
IDLoop:         mov     ah, cl
                push    cx
                mov     al, 0                   ;Verify presence call.
                int     2Fh
                pop     cx
                cmp     al, 0                   ;Present in memory?
                je      TryNext
                strcmpl
                byte    "Dynamic Shared Memory TSR",0
                je      Success

TryNext:        dec     cl                      ;Test USER IDs of 80h..FFh
                js      IDLoop
                cmp     cx, 0                   ;Clear zero flag.
Success:        pop     di
                pop     ds
                pop     es
                ret
SeeIfPresent    endp

; FindID-       Determines the first (well, last actually) TSR ID available
;               in the multiplex interrupt chain. Returns this value in
;               the CL register.
;
;               Returns the zero flag set if it locates an empty slot.
;               Returns the zero flag clear if failure.

FindID          proc    near
                push    es
                push    ds
                push    di

                mov     cx, 0ffh                ;Start with ID 0FFh.
IDLoop:         mov     ah, cl
                push    cx
                mov     al, 0                   ;Verify presence call.
                int     2Fh
                pop     cx
                cmp     al, 0                   ;Present in memory?
                je      Success
                dec     cl                      ;Test USER IDs of 80h..FFh
                js      IDLoop
                xor     cx, cx
                cmp     cx, 1                   ;Clear zero flag
Success:        pop     di
                pop     ds
                pop     es
                ret
FindID          endp

Main            proc
                meminit
                mov     ax, ResidentSeg
                mov     ds, ax

                mov     ah, 62h                 ;Get this program's PSP
                int     21h                     ; value.
                mov     PSP, bx

; Before we do anything else, we need to check the command line
; parameters. If there is one, and it is the word "REMOVE", then remove
; the resident copy from memory using the multiplex (2Fh) interrupt.

                argc
                cmp     cx, 1                   ;Must have 0 or 1 parms.
                jb      TstPresent
                je      DoRemove
Usage:          print
                byte    "Usage:",cr,lf
                byte    "   shmalloc",cr,lf
                byte    "or shmalloc REMOVE",cr,lf,0
                ExitPgm

; Check for the REMOVE command.

DoRemove:       mov     ax, 1
                argv
                stricmpl
                byte    "REMOVE",0
                jne     Usage

                call    SeeIfPresent
                je      RemoveIt
                print
                byte    "TSR is not present in memory, cannot remove"
                byte    cr,lf,0
                ExitPgm

RemoveIt:       mov     MyTSRID, cl
                printf
                byte    "Removing TSR (ID #%d) from memory...",0
                dword   MyTSRID

                mov     ah, cl
                mov     al, 1                   ;Remove cmd, ah contains ID
                int     2Fh
                cmp     al, 1                   ;Succeed?
                je      RmvFailure
                print
                byte    "removed.",cr,lf,0
                ExitPgm

RmvFailure:     print
                byte    cr,lf
                byte    "Could not remove TSR from memory.",cr,lf
                byte    "Try removing other TSRs in the reverse order "
                byte    "you installed them.",cr,lf,0
                ExitPgm

; Okay, see if the TSR is already in memory. If so, abort the
; installation process.

TstPresent:     call    SeeIfPresent
                jne     GetTSRID
                print
                byte    "TSR is already present in memory.",cr,lf
                byte    "Aborting installation process",cr,lf,0
                ExitPgm

; Get an ID for our TSR and save it away.

GetTSRID:       call    FindID
                je      GetFileName
                print
                byte    "Too many resident TSRs, cannot install",cr,lf,0
                ExitPgm

; Things look cool so far, so install the interrupts

GetFileName:    mov     MyTSRID, cl
                print
                byte    "Installing interrupts...",0

; Patch into the INT 2Fh interrupt chain.

                cli                             ;Turn off interrupts!
                mov     ax, 0
                mov     es, ax
                mov     ax, es:[2Fh*4]
                mov     word ptr OldInt2F, ax
                mov     ax, es:[2Fh*4 + 2]
                mov     word ptr OldInt2F+2, ax
                mov     es:[2Fh*4], offset MyInt2F
                mov     es:[2Fh*4+2], seg ResidentSeg
                sti                             ;Okay, ints back on.

; We're hooked up, the only thing that remains is to initialize the shared
; memory segment and then terminate and stay resident.

                printf
                byte    "Installed, TSR ID #%d.",cr,lf,0
                dword   MyTSRID

                mov     ah, MyTSRID             ;Initialization call.
                mov     al, 13h
                int     2Fh

                mov     dx, EndResident         ;Compute size of program.
                sub     dx, PSP
                mov     ax, 3100h               ;DOS TSR command.
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             db      256 dup (?)
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

We can modify the two applications from the previous section to try out this code:

; SHMAPP3.ASM
;
; This is a shared memory application that uses the dynamic shared memory
; TSR (SHMALLOC.ASM). This program inputs a string from the user and passes
; that string to SHMAPP4.ASM through the shared memory area.
;
                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'
ShmID           byte    0
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg, es:SharedMemory

; SeeIfPresent- Checks to see if the shared memory TSR is present in memory.
;               Sets the zero flag if it is, clears the zero flag if
;               it is not. This routine also returns the TSR ID in CL.

SeeIfPresent    proc    near
                push    es
                push    ds
                push    di
                mov     cx, 0ffh                ;Start with ID 0FFh.
IDLoop:         mov     ah, cl
                push    cx
                mov     al, 0                   ;Verify presence call.
                int     2Fh
                pop     cx
                cmp     al, 0                   ;Present in memory?
                je      TryNext
                strcmpl
                byte    "Dynamic Shared Memory TSR",0
                je      Success

TryNext:        dec     cl                      ;Test USER IDs of 80h..FFh
                js      IDLoop
                cmp     cx, 0                   ;Clear zero flag.
Success:        pop     di
                pop     ds
                pop     es
                ret
SeeIfPresent    endp

; The main program for application #3 links with the shared memory
; TSR and then reads a string from the user (storing the string into
; shared memory) and then terminates.

Main            proc
                assume  cs:cseg, ds:dseg, es:SharedMemory
                mov     ax, dseg
                mov     ds, ax
                meminit

                print
                byte    "Shared memory application #3",cr,lf,0

; See if the shared memory TSR is around:

                call    SeeIfPresent
                je      ItsThere
                print
                byte    "Shared Memory TSR (SHMALLOC) is not loaded.",cr,lf
                byte    "This program cannot continue execution.",cr,lf,0
                ExitPgm

; Get the input line from the user:

ItsThere:       mov     ShmID, cl
                print
                byte    "Enter a string: ",0
                lea     di, InputLine           ;ES already points at proper seg.
                getsm

; The string is in our heap space. Let's move it over to the shared
; memory segment.

                strlen
                inc     cx                      ;Add one for zero byte.
                push    es
                push    di

                mov     dx, 1234h               ;Our "key" value.
                mov     ah, ShmID
                mov     al, 11h                 ;Shmalloc call.
                int     2Fh

                mov     si, di                  ;Save as dest ptr.
                mov     dx, es

                pop     di                      ;Retrieve source address.
                pop     es
                strcpy                          ;Copy from local to shared.

                print
                byte    "Entered '",0
                puts
                print
                byte    "' into shared memory.",cr,lf,0

Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

; SHMAPP4.ASM
;
; This is a shared memory application that uses the dynamic shared memory
; TSR (SHMALLOC.ASM). This program assumes the user has already run the
; SHMAPP3 program to insert a string into shared memory. This program
; simply prints that string from shared memory.
;
                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'
ShmID           byte    0
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg, es:SharedMemory

; SeeIfPresent- Checks to see if the shared memory TSR is present in memory.
;               Sets the zero flag if it is, clears the zero flag if
;               it is not. This routine also returns the TSR ID in CL.

SeeIfPresent    proc    near
                push    es
                push    ds
                push    di
                mov     cx, 0ffh                ;Start with ID 0FFh.
IDLoop:         mov     ah, cl
                push    cx
                mov     al, 0                   ;Verify presence call.
                int     2Fh
                pop     cx
                cmp     al, 0                   ;Present in memory?
                je      TryNext
                strcmpl
                byte    "Dynamic Shared Memory TSR",0
                je      Success

TryNext:        dec     cl                      ;Test USER IDs of 80h..FFh
                js      IDLoop
                cmp     cx, 0                   ;Clear zero flag.
Success:        pop     di
                pop     ds
                pop     es
                ret
SeeIfPresent    endp

; The main program for application #4 links with the shared memory
; TSR, attaches to the block that SHMAPP3 allocated, prints the string
; stored there, and then terminates.

Main            proc
                assume  cs:cseg, ds:dseg, es:SharedMemory
                mov     ax, dseg
                mov     ds, ax
                meminit

                print
                byte    "Shared memory application #4",cr,lf,0

; See if the shared memory TSR is around:

                call    SeeIfPresent
                je      ItsThere
                print
                byte    "Shared Memory TSR (SHMALLOC) is not loaded.",cr,lf
                byte    "This program cannot continue execution.",cr,lf,0
                ExitPgm

; If the shared memory TSR is present, get the address of the shared segment
; into the ES register:

ItsThere:       mov     ah, cl                  ;ID of our TSR.
                mov     al, 14h                 ;Attach call
                mov     dx, 1234h               ;Our "key" value
                int     2Fh

; Print the string input in SHMAPP3:

                print
                byte    "String from SHMAPP3 is '",0

                puts

                print
                byte    "' from shared memory.",cr,lf,0

Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

19.3 Coroutines

DOS processes, even when using shared memory, suffer from one primary drawback: each program executes to completion before returning control back to the parent process. While this paradigm is suitable for many applications, it certainly does not suffice for all. A common paradigm is for two programs to swap control of the CPU back and forth while executing. This mechanism, slightly different from the subroutine call and return mechanism, is a coroutine.

Before discussing coroutines, it is probably a good idea to provide a solid definition for the term process. In a nutshell, a process is a program that is executing. A program can exist on the disk; processes exist in memory and have a program stack (with return addresses, etc.) associated with them. If there are multiple processes in memory at one time, each process must have its own program stack.

A cocall operation transfers control between two processes. A cocall is effectively a call and a return instruction all rolled into one operation. From the point of view of the process executing the cocall, the cocall operation is equivalent to a procedure call; from the point of view of the process being called, the cocall operation is equivalent to a return operation. When the second process cocalls the first, control resumes not at the beginning of the first process, but immediately after the cocall operation. If two processes execute a sequence of mutual cocalls, control will transfer between the two processes in the following fashion:

[Figure: Cocall Sequence Between Two Processes. Process #1 and Process #2 alternate control, Process #1 executing cocall prcs2 and Process #2 executing cocall prcs1, each resuming just after its own previous cocall.]

Cocalls are quite useful for games where the “players” take turns, following different strategies. The first player executes some code to make its first move, then cocalls the second player and allows it to make a move. After the second player makes its move, it cocalls the first process and gives the first player its second move, picking up immediately after its cocall. This transfer of control bounces back and forth until one player wins.

The 80x86 CPUs do not provide a cocall instruction. However, it is easy to implement cocalls with existing instructions. Even so, there is little need for you to supply your own cocall mechanism; the UCR Standard Library provides a cocall package for 8086, 80186, and 80286 processors (see footnote 2). This package includes the pcb (process control block) data structure and three functions you can call: coinit, cocall, and cocalll.
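
The turn-taking pattern can be tried out on a modern POSIX system without any assembly at all. The sketch below is not the UCR Standard Library package: it assumes the POSIX <ucontext.h> routines getcontext, makecontext, and swapcontext are available (as they are in glibc), uses a ucontext_t where the text uses a pcb, and uses swapcontext in the role of cocall. The player functions, the moves buffer, and the helper names are all invented for the illustration.

```c
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

/* Hypothetical names throughout; a ucontext_t stands in for a pcb. */
static ucontext_t main_ctx, player1_ctx, player2_ctx;
static char stack1[64 * 1024];      /* each coroutine needs its own stack */
static char stack2[64 * 1024];
static char moves[16];              /* records who moved, in order */

static void player1(void)
{
    for (int turn = 0; turn < 2; turn++) {
        strcat(moves, "1");                         /* make a move       */
        swapcontext(&player1_ctx, &player2_ctx);    /* "cocall" player 2 */
    }
    /* Returning here resumes main_ctx through uc_link. */
}

static void player2(void)
{
    for (;;) {
        strcat(moves, "2");
        swapcontext(&player2_ctx, &player1_ctx);    /* "cocall" player 1 */
    }
}

static void prepare(ucontext_t *ctx, char *stack, size_t size,
                    void (*entry)(void))
{
    getcontext(ctx);
    ctx->uc_stack.ss_sp   = stack;
    ctx->uc_stack.ss_size = size;
    ctx->uc_link          = &main_ctx;  /* where to go if entry returns */
    makecontext(ctx, entry, 0);
}

const char *play_game(void)
{
    moves[0] = '\0';
    prepare(&player1_ctx, stack1, sizeof stack1, player1);
    prepare(&player2_ctx, stack2, sizeof stack2, player2);
    swapcontext(&main_ctx, &player1_ctx);   /* hand control to player 1 */
    return moves;
}
```

Each time a player is resumed, it continues just after its own swapcontext call rather than at the top of its function, which is the behavior the cocall description above calls for.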

The pcb structure maintains the current state of a process. The pcb maintains all the register values and other accounting information for a process. When a process makes a cocall, it stores the return address for the cocall in the pcb. Later, when some other process cocalls this process, the cocall operation simply reloads the registers, including cs:ip, from the pcb and that returns control to the next instruction after the first process’ cocall. The pcb structure takes the following form:

pcb             struct
NextProc        dword   ?       ;Link to next PCB (for multitasking).
regsp           word    ?
regss           word    ?
regip           word    ?
regcs           word    ?
regax           word    ?
regbx           word    ?
regcx           word    ?
regdx           word    ?
regsi           word    ?
regdi           word    ?
regbp           word    ?
regds           word    ?
reges           word    ?
regflags        word    ?
PrcsID          word    ?
StartingTime    dword   ?       ;Used for multitasking accounting.
StartingDate    dword   ?       ;Used for multitasking accounting.
CPUTime         dword   ?       ;Used for multitasking accounting.
pcb             ends

2. The cocall package works fine with the other processors as long as you don’t use the 32-bit register set. Later, we will discuss how to extend the Standard Library routines to handle the 32-bit capabilities of the 80386 and later processors.
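
For readers who find a C declaration easier to scan, the pcb layout above can be mirrored as a packed C struct. This is only an illustrative sketch: the type name pcb16 is invented here, the field names are copied from the listing, and the #pragma pack directive (accepted by the common GCC, Clang, and MSVC compilers) is assumed so that the C layout has no padding, matching the byte-for-byte layout an assembler would produce.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)           /* no padding, as in the assembly struct */
typedef struct pcb16 {          /* hypothetical name for this sketch     */
    uint32_t NextProc;          /* link to next PCB (multitasking only)  */
    uint16_t regsp, regss;      /* saved stack pointer, ss:sp            */
    uint16_t regip, regcs;      /* saved resume address, cs:ip           */
    uint16_t regax, regbx, regcx, regdx;
    uint16_t regsi, regdi, regbp;
    uint16_t regds, reges;
    uint16_t regflags;
    uint16_t PrcsID;
    uint32_t StartingTime;      /* multitasking accounting               */
    uint32_t StartingDate;      /* multitasking accounting               */
    uint32_t CPUTime;           /* multitasking accounting               */
} pcb16;
#pragma pack(pop)

/* Size in bytes of one saved process state. */
size_t pcb16_size(void)
{
    return sizeof(pcb16);
}
```

With one dword, fifteen words, and three more dwords, the structure occupies 4 + 30 + 12 = 46 bytes, and every saved register is exactly 16 bits wide.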

Four of these fields (as labelled) exist for preemptive multitasking and have no meaning for coroutines. We will discuss preemptive multitasking in the next section.

There are two important things that should be evident from this structure. First, the main reason the existing Standard Library coroutine support is limited to the 16-bit registers is that there is only room for the 16-bit versions of each of the registers in the pcb. If you want to support the 80386 and later 32-bit register sets, you would need to modify the pcb structure and the code that saves and restores registers in the pcb.

The second thing that should be evident is that the coroutine code preserves all registers across a cocall. This means you cannot pass information from one process to another in the registers when using a cocall. You will need to pass data between processes in global memory locations. Since coroutines generally exist in the same program, you will not even need to resort to the shared memory techniques. Any variables you declare in your data segment will be visible to all coroutines.

Note, by the way, that a program may contain more than two coroutines. If coroutine one cocalls coroutine two, and coroutine two cocalls coroutine three, and then coroutine three cocalls coroutine one, coroutine one picks up immediately after the cocall it made to coroutine two.

[Figure: Cocalls Between Three Processes. Process #1 executes cocall prcs2, Process #2 executes cocall prcs3, and Process #3 executes cocall prcs1.]

Since a cocall effectively returns to the target coroutine, you might wonder what happens on the first cocall to any process. After all, if that process has not executed any code, there is no “return address” where you can resume execution. This is an easy problem to solve: we need only initialize the return address of such a process to the address of the first instruction to execute in that process.
A similar problem exists for the stack. When a program begins execution, the main program (coroutine one) takes control and uses the stack associated with the entire program. Since each process must have its own stack, where do the other coroutines get their stacks? The easiest way to initialize the stack and initial address for a coroutine is to do this when declaring a pcb for a process. Consider the following pcb variable declaration:

ProcessTwo      pcb     {0, offset EndStack2, seg EndStack2,
                         offset StartLoc2, seg StartLoc2}

This definition initializes the NextProc field with NULL (the Standard Library coroutine functions do not use this field) and initializes the ss:sp and cs:ip fields with the last address of a stack area (EndStack2) and the first instruction of the process (StartLoc2). Now all you need to do is reserve a reasonable amount of stack storage for the process. You can create multiple stacks in the SHELL.ASM sseg as follows:

sseg            segment para stack 'stack'

; Stack for process #2:

stk2            byte    1024 dup (?)
EndStack2       word    ?

; Stack for process #3:

stk3            byte    1024 dup (?)
EndStack3       word    ?

; The primary stack for the main program (process #1) must appear at
; the end of sseg.

stk             byte    1024 dup (?)
sseg            ends

There is the question of “how much space should one reserve for each stack?” This, of course, varies with the application. If you have a simple application that doesn’t use recursion or allocate any local variables on the stack, you could get by with as little as 256 bytes of stack space for a process. On the other hand, if you have recursive routines or allocate storage on the stack, you will need considerably more space. For simple programs, 1-8K of stack storage should be sufficient. Keep in mind that you can allocate a maximum of 64K in the SHELL.ASM sseg. If you need additional stack space, you will need to set up the other stacks in a different segment (they do not need to be in sseg, it’s just a convenient place for them) or you will need to allocate the stack space differently.

Note that you do not have to allocate the stack space as an array within your program. You can also allocate stack space dynamically using the Standard Library malloc call. The following code demonstrates how to set up an 8K dynamically allocated stack for the pcb variable Process2. Because the 80x86 stack grows toward lower addresses, the saved stack pointer must refer to the high end of the allocated block:

                mov     cx, 8192
                malloc
                jc      InsufficientRoom
                add     di, 8192                ;Point at the end of the block.
                mov     Process2.regss, es
                mov     Process2.regsp, di
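
The same idea, a heap-allocated stack plus an initial "return address" that points at the first instruction of the process, can be sketched in C with the POSIX <ucontext.h> routines. This is an analogy, not the Standard Library mechanism: ucontext_t plays the part of the pcb, makecontext records the entry point, and the library computes the top of the stack from the base address and size you supply. The names run_on_heap_stack, process2, and progress are hypothetical, and the 64K stack size is an assumption chosen to be comfortable for a modern C library rather than the 8K used above.

```c
#include <stdlib.h>
#include <ucontext.h>

#define STACK_BYTES (64 * 1024)   /* assumption: roomy for a modern libc */

static ucontext_t main_ctx, proc2_ctx;   /* stand-ins for two pcbs */
static int progress;

static void process2(void)
{
    progress += 1;                        /* runs on the first "cocall"    */
    swapcontext(&proc2_ctx, &main_ctx);   /* give control back to main     */
    progress += 10;                       /* resumes here on the second    */
    swapcontext(&proc2_ctx, &main_ctx);
}

int run_on_heap_stack(void)
{
    void *stack = malloc(STACK_BYTES);
    if (stack == NULL)
        return -1;                        /* the InsufficientRoom case     */

    getcontext(&proc2_ctx);
    proc2_ctx.uc_stack.ss_sp   = stack;   /* base of the block; the library
                                             works out the top itself      */
    proc2_ctx.uc_stack.ss_size = STACK_BYTES;
    proc2_ctx.uc_link          = &main_ctx;
    makecontext(&proc2_ctx, process2, 0); /* initial "return address"      */

    progress = 0;
    swapcontext(&main_ctx, &proc2_ctx);   /* first transfer: enters process2 */
    swapcontext(&main_ctx, &proc2_ctx);   /* second: resumes after its swap  */

    free(stack);                          /* process2 is never resumed again */
    return progress;
}
```

The first transfer starts process2 at its beginning because no earlier state exists; the second continues it in the middle, so the function returns 11.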
Setting up the coroutines the main program will call is pretty easy. However, there is the issue of setting up the pcb for the main program. You cannot initialize the pcb for the main program the same way you initialize the pcb for the other processes; it is already running and has valid cs:ip and ss:sp values. Were you to initialize the main program’s pcb the same way we did for the other processes, the system would simply restart the main program when you make a cocall back to it. To initialize the pcb for the main program, you must use the coinit function. The coinit function expects you to pass it the address of the main program’s pcb in the es:di register pair. It initializes some variables internal to the Standard Library so the first cocall operation will save the 80x86 machine state in the pcb you specify by es:di. After the coinit call, you can begin making cocalls to other processes in your program.
To cocall a coroutine, you use the Standard Library cocall function. The cocall function call takes two forms. Without any parameters this function transfers control to the coroutine whose pcb address appears in the es:di register pair. If the address of a pcb appears in the operand field of this instruction, cocall transfers control to the specified coroutine (don’t forget, the name of the pcb, not the process, must appear in the operand field).

The best way to learn how to use coroutines is via example. The following program is an interesting piece of code that generates mazes on the PC’s display. The maze generation algorithm has one major constraint: there must be no more than one correct solution to the maze (it is possible for there to be no solution). The main program creates a set of background processes called “demons” (actually, daemon is the correct term, but demon sounds more appropriate here). Each demon begins carving out a portion of the maze subject to the main constraint. Each demon gets to dig one cell from the maze and then it passes control to another demon. As it turns out, demons can “dig themselves into a corner” and die (demons live only to dig). When this happens, the demon removes itself from the list of active demons. When all demons die off, the maze is (in theory) complete.

Since the demons die off fairly regularly, there must be some mechanism to create new demons. Therefore, this program randomly spawns new demons who start digging their own tunnels perpendicular to their parents. This helps ensure that there is a sufficient supply of demons to dig out the entire maze; the demons all die off only when there are no, or few, cells remaining to dig in the maze.

; AMAZE.ASM
;
; A maze generation/solution program.
;
; This program generates an 80x25 maze and directly draws the maze on the
; video display. It demonstrates the use of coroutines within a program.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

byp             textequ <byte ptr>

dseg            segment para public 'data'

; Constants:
;
; Define the "ToScreen" symbol (to any value) if the maze is 80x25 and you
; want to display it on the video screen.

ToScreen        equ     0

; Maximum X and Y coordinates for the maze (matching the display).

MaxXCoord       equ     80
MaxYCoord       equ     25

; Useful X,Y constants:

WordsPerRow     =       MaxXCoord+2
BytesPerRow     =       WordsPerRow*2

StartX          equ     1                       ;Starting X coordinate for maze
StartY          equ     3                       ;Starting Y coordinate for maze
EndX            equ     MaxXCoord               ;Ending X coordinate for maze
EndY            equ     MaxYCoord-1             ;Ending Y coordinate for maze

EndLoc          =       ( (EndY-1)*MaxXCoord + EndX-1)*2
StartLoc        =       ( (StartY-1)*MaxXCoord + StartX-1)*2

; Special 16-bit PC character codes for the screen for symbols drawn during
; maze generation. See the chapter on the video display for details.

                ifdef   mono                    ;Mono display adapter.

WallChar        equ     7dbh                    ;Solid block character
NoWallChar      equ     720h                    ;space
VisitChar       equ     72eh                    ;Period
PathChar        equ     72ah                    ;Asterisk

                else                            ;Color display adapter.

WallChar        equ     1dbh                    ;Solid block character
NoWallChar      equ     0edbh                   ;space
VisitChar       equ     0bdbh                   ;Period
PathChar        equ     4e2ah                   ;Asterisk

                endif

; The following are the constants that may appear in the Maze array:

Wall            =       0
NoWall          =       1
Visited         =       2

; The following are the directions the demons can go in the maze

North           =       0
South           =       1
East            =       2
West            =       3

; Some important variables:

; The Maze array must contain an extra row and column around the
; outside edges for our algorithm to work properly.

Maze            word    (MaxYCoord+2) dup ((MaxXCoord+2) dup (Wall))

; The following macro computes an index into the above array assuming
; a demon's X and Y coordinates are in the dl and dh registers, respectively.
; Returns index in the AX register

MazeAdrs        macro
                mov     al, dh
                mov     ah, WordsPerRow         ;Index into array is computed
                mul     ah                      ; by (Y*words/row + X)*2.
                add     al, dl
                adc     ah, 0
                shl     ax, 1                   ;Convert to byte index
                endm

; The following macro computes an index into the screen array, using the
; same assumptions as above. Note that the screen matrix is 80x25 whereas
; the maze matrix is 82x27; The X/Y coordinates in DL/DH are 1..80 and
; 1..25 rather than 0..79 and 0..24 (like we need). This macro adjusts
; for that.

ScrnAdrs        macro
                mov     al, dh
                dec     al
                mov     ah, MaxXCoord
                mul     ah
                add     al, dl
                adc     ah, 0
                dec     ax
                shl     ax, 1
                endm
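
The two index computations performed by MazeAdrs and ScrnAdrs are easy to check by hand when written as ordinary arithmetic. The C functions below are a sketch for that purpose only; the names maze_index and screen_index and the upper-case constants are invented here, while the formulas follow the macros above (coordinates are 1-based, as in the program).

```c
enum {
    MAX_X_COORD   = 80,                 /* MaxXCoord   */
    MAX_Y_COORD   = 25,                 /* MaxYCoord   */
    WORDS_PER_ROW = MAX_X_COORD + 2     /* WordsPerRow */
};

/* Byte offset into the 82-column Maze array: (Y*WordsPerRow + X)*2. */
unsigned maze_index(unsigned x, unsigned y)
{
    return (y * WORDS_PER_ROW + x) * 2;
}

/* Byte offset into the 80x25 screen: ((Y-1)*MaxXCoord + (X-1))*2.
 * The subtractions convert the 1-based coordinates to 0-based ones. */
unsigned screen_index(unsigned x, unsigned y)
{
    return ((y - 1) * MAX_X_COORD + (x - 1)) * 2;
}
```

With the starting position StartX = 1, StartY = 3, the maze offset is (3*82 + 1)*2 = 494 and the screen offset is (2*80 + 0)*2 = 320, which agrees with the StartLoc constant. The ending position (80, 24) gives a screen offset of 3838, which agrees with EndLoc.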

; PCB for the main program. The last live demon will call this guy when
; it dies.

MainPCB         pcb     {}

; List of up to 32 demons.

MaxDemons       =       32                      ;Must be a power of two.
ModDemons       =       MaxDemons-1             ;Mask for MOD computation.

DemonList       pcb     MaxDemons dup ({})

DemonIndex      byte    0                       ;Index into demon list.
DemonCnt        byte    0                       ;Number of demons in list.

; Random number generator seed (we'll use our random number generator
; rather than the standard library's because we want to be able to specify
; an initial seed value).

Seed            word    0

dseg            ends

; The following is the segment address of the video display, change this
; from 0B800h to 0B000h if you have a monochrome display rather than a
; color display.

ScreenSeg       segment at 0b800h
Screen          equ     this word               ;Don't generate data here!
ScreenSeg       ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; Totally bogus random number generator, but we don't need a really
; great one for this program. This code uses its own random number
; generator rather than the one in the Standard Library so we can
; allow the user to use a fixed seed to produce the same maze (with
; the same seed) or different mazes (by choosing different seeds).

RandNum         proc    near
                push    cx
                mov     cl, byte ptr Seed
                and     cl, 7
                add     cl, 4
                mov     ax, Seed
                xor     ax, 55aah
                rol     ax, cl
                xor     ax, Seed
                inc     ax
                mov     Seed, ax
                pop     cx
                ret
RandNum         endp
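
To see what sequence RandNum produces for a given seed, the routine can be transcribed into C. The version below is a straightforward model of the instructions above, with 16-bit wraparound made explicit; the function names rand_seed, rand_num, and rol16 are invented for this sketch.

```c
#include <stdint.h>

static uint16_t seed;               /* models the Seed variable */

/* 16-bit rotate left, the equivalent of "rol ax, cl" for counts 1..15. */
static uint16_t rol16(uint16_t v, unsigned n)
{
    n &= 15;
    return n ? (uint16_t)((v << n) | (v >> (16 - n))) : v;
}

void rand_seed(uint16_t s)
{
    seed = s;
}

uint16_t rand_num(void)
{
    unsigned count = (seed & 7u) + 4;       /* and cl, 7 / add cl, 4 */
    uint16_t ax = (uint16_t)(seed ^ 0x55AA);
    ax = rol16(ax, count);
    ax = (uint16_t)(ax ^ seed);
    ax = (uint16_t)(ax + 1);                /* inc ax wraps at 16 bits */
    seed = ax;
    return ax;
}
```

Starting from a seed of zero, the rotate count is 4, so the first value is 55AAh rotated left four bits (5AA5h) plus one, or 5AA6h. Because the generator is deterministic, entering the same seed always reproduces the same maze.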

; Init- Handles all the initialization chores for the main program.
;       In particular, it initializes the coroutine package, gets a
;       random number seed from the user, and initializes the video display.

Init            proc    near
                print
                byte    "Enter a small integer for a random number seed:",0
                getsm
                atoi
                free
                mov     Seed, ax

; Fill the interior of the maze with wall characters, fill the outside
; two rows and columns with nowall values. This will prevent the demons
; from wandering outside the maze.

; Fill the first row with Visited values.

                cld
                mov     cx, WordsPerRow
                lesi    Maze
                mov     ax, Visited
        rep     stosw

; Fill the last row with NoWall values.

                mov     cx, WordsPerRow
                lea     di, Maze+(MaxYCoord+1)*BytesPerRow
        rep     stosw

; Write a NoWall value to the starting position:

                mov     Maze+(StartY*WordsPerRow+StartX)*2, NoWall

; Write NoWall values along the two vertical edges of the maze.

                lesi    Maze
                mov     cx, MaxYCoord+1
EdgesLoop:      mov     es:[di], ax                     ;Plug the left edge.
                mov     es:[di+BytesPerRow-2], ax       ;Plug the right edge.
                add     di, BytesPerRow
                loop    EdgesLoop

                ifdef   ToScreen

; Okay, fill the screen with WallChar values:

                lesi    Screen
                mov     ax, WallChar
                mov     cx, 2000
        rep     stosw

; Write appropriate characters to the starting and ending locations:

                mov     word ptr es:Screen+EndLoc, PathChar
                mov     word ptr es:Screen+StartLoc, NoWallChar

                endif                           ;ToScreen

; Zero out the DemonList:

                mov     cx, (size pcb)*MaxDemons
                lea     di, DemonList
                mov     ax, dseg
                mov     es, ax
                xor     ax, ax
        rep     stosb

                ret
Init            endp

; CanStart- This function checks around the current position
;       to see if the maze generator can start digging a new tunnel
;       in a direction perpendicular to the current tunnel. You can
;       only start a new tunnel if there are wall characters for at
;       least two positions in the desired direction:
;
;                ##
;               *##
;                ##
;
; If "*" is current position and "#" represent wall characters
; and the current direction is north or south, then it is okay for the
; maze generator to start a new path in the east direction. Assuming
; "." represents a tunnel, you cannot start a new tunnel in the east
; direction if any of the following patterns occur:
;
;        .#      #.      ##      ##      ##      ##
;       *##     *##     *.#     *#.     *##     *##
;        ##      ##      ##      ##      .#      #.
;
; CanStart returns true (carry set) if we can start a new tunnel off the
; path being dug by the current demon.
;
; On entry,     dl is demon's X-Coordinate
;               dh is demon's Y-Coordinate
;               cl is demon's direction

CanStart        proc    near
                push    ax
                push    bx

                MazeAdrs                        ;Compute index to demon(x,y) in maze.
                mov     bx, ax

; CL contains the current direction, 0=north, 1=south, 2=east, 3=west.
; Note that we can test bit #1 for north/south (0) or east/west (1).

                test    cl, 10b                 ;See if north/south or east/west
                jz      NorthSouth

; If the demon is going in an east or west direction, we can start a new
; tunnel if there are six wall blocks just above or below the current demon.
; Note: We are checking if all values in these six blocks are Wall values.
; This code depends on the fact that Wall characters are zero and the sum
; of these six blocks will be zero if a move is possible.

                mov     al, byp Maze[bx+BytesPerRow*2]          ;Maze[x, y+2]
                add     al, byp Maze[bx+BytesPerRow*2+2]        ;Maze[x+1,y+2]
                add     al, byp Maze[bx+BytesPerRow*2-2]        ;Maze[x-1,y+2]
                je      ReturnTrue

                mov     al, byp Maze[bx-BytesPerRow*2]          ;Maze[x, y-2]
                add     al, byp Maze[bx-BytesPerRow*2+2]        ;Maze[x+1,y-2]
                add     al, byp Maze[bx-BytesPerRow*2-2]        ;Maze[x-1,y-2]
                je      ReturnTrue

ReturnFalse:    clc                             ;Clear carry = false.
                pop     bx
                pop     ax
                ret

; If the demon is going in a north or south direction, we can start a
; new tunnel if there are six wall blocks just to the left or right
; of the current demon.

NorthSouth:     mov     al, byp Maze[bx+4]                      ;Maze[x+2,y]
                add     al, byp Maze[bx+BytesPerRow+4]          ;Maze[x+2,y+1]
                add     al, byp Maze[bx-BytesPerRow+4]          ;Maze[x+2,y-1]
                je      ReturnTrue

                mov     al, byp Maze[bx-4]                      ;Maze[x-2,y]
                add     al, byp Maze[bx+BytesPerRow-4]          ;Maze[x-2,y+1]
                add     al, byp Maze[bx-BytesPerRow-4]          ;Maze[x-2,y-1]
                jne     ReturnFalse

ReturnTrue:     stc                             ;Set carry = true.
                pop     bx
                pop     ax
                ret
CanStart        endp
; CanMove; ; ; ;
Tests to see if the current demon (dir=cl, x=dl, y=dh) can move in the specified direction. Movement is possible if the demon will not come within one square of another tunnel. This function returns true (carry set) if a move is possible. On entry, CH contains the direction this code should test.
bx ax
Page 1111
CanMove         proc
                push    ax
                push    bx

                MazeAdrs                ;Put @Maze[x,y] into ax.
                mov     bx, ax

                cmp     ch, South
                jb      IsNorth
                je      IsSouth
                cmp     ch, East
                je      IsEast

; If the demon is moving west, check the blocks in the rectangle formed
; by Maze[x-2,y-1] to Maze[x-1,y+1] to make sure they are all wall values.

                mov     al, byp Maze[bx-BytesPerRow-4]          ;Maze[x-2, y-1]
                add     al, byp Maze[bx-BytesPerRow-2]          ;Maze[x-1, y-1]
                add     al, byp Maze[bx-4]                      ;Maze[x-2, y]
                add     al, byp Maze[bx-2]                      ;Maze[x-1, y]
                add     al, byp Maze[bx+BytesPerRow-4]          ;Maze[x-2, y+1]
                add     al, byp Maze[bx+BytesPerRow-2]          ;Maze[x-1, y+1]
                je      ReturnTrue
ReturnFalse:    clc
                pop     bx
                pop     ax
                ret

; If the demon is going east, check the blocks in the rectangle formed
; by Maze[x+1,y-1] to Maze[x+2,y+1] to make sure they are all wall values.

IsEast:         mov     al, byp Maze[bx-BytesPerRow+4]          ;Maze[x+2, y-1]
                add     al, byp Maze[bx-BytesPerRow+2]          ;Maze[x+1, y-1]
                add     al, byp Maze[bx+4]                      ;Maze[x+2, y]
                add     al, byp Maze[bx+2]                      ;Maze[x+1, y]
                add     al, byp Maze[bx+BytesPerRow+4]          ;Maze[x+2, y+1]
                add     al, byp Maze[bx+BytesPerRow+2]          ;Maze[x+1, y+1]
                jne     ReturnFalse
ReturnTrue:     stc
                pop     bx
                pop     ax
                ret

; If the demon is going north, check the blocks in the rectangle formed
; by Maze[x-1,y-2] to Maze[x+1,y-1] to make sure they are all wall values.

IsNorth:        mov     al, byp Maze[bx-BytesPerRow-2]          ;Maze[x-1, y-1]
                add     al, byp Maze[bx-BytesPerRow*2-2]        ;Maze[x-1, y-2]
                add     al, byp Maze[bx-BytesPerRow]            ;Maze[x, y-1]
                add     al, byp Maze[bx-BytesPerRow*2]          ;Maze[x, y-2]
                add     al, byp Maze[bx-BytesPerRow+2]          ;Maze[x+1, y-1]
                add     al, byp Maze[bx-BytesPerRow*2+2]        ;Maze[x+1, y-2]
                jne     ReturnFalse
                stc
                pop     bx
                pop     ax
                ret

; If the demon is going south, check the blocks in the rectangle formed
; by Maze[x-1,y+2] to Maze[x+1,y+1] to make sure they are all wall values.

IsSouth:        mov     al, byp Maze[bx+BytesPerRow-2]          ;Maze[x-1, y+1]
                add     al, byp Maze[bx+BytesPerRow*2-2]        ;Maze[x-1, y+2]
                add     al, byp Maze[bx+BytesPerRow]            ;Maze[x, y+1]
                add     al, byp Maze[bx+BytesPerRow*2]          ;Maze[x, y+2]
                add     al, byp Maze[bx+BytesPerRow+2]          ;Maze[x+1, y+1]
                add     al, byp Maze[bx+BytesPerRow*2+2]        ;Maze[x+1, y+2]
                jne     ReturnFalse
                stc
                pop     bx
                pop     ax
                ret
CanMove         endp

; SetDir- Changes the current direction. The maze digging algorithm has
; decided to change the direction of the tunnel being dug by one
; of the demons. This code checks to see if we CAN change the direction,
; and picks a new direction if possible.
;
; If the demon is going north or south, a direction change causes the demon
; to go east or west. Likewise, if the demon is going east or west, a
; direction change forces it to go north or south. If the demon cannot
; change directions (because it cannot move in the new direction for one
; reason or another), SetDir returns without doing anything. If a direction
; change is possible, then SetDir selects a new direction. If there is only
; one possible new direction, the demon is sent off in that direction.
; If the demon could move off in one of two different directions, SetDir
; "flips a coin" to choose one of the two new directions.
;
; This function returns the new direction in al.

SetDir          proc    near

                test    cl, 10b                 ;See if north/south
                je      IsNS                    ; or east/west direction.

; We're going east or west. If we can move EITHER north or south from
; this point, randomly choose one of the directions. If we can only
; move one way or the other, choose that direction. If we can't go either
; way, return without changing the direction.

                mov     ch, North               ;See if we can move north
                call    CanMove
                jnc     NotNorth
                mov     ch, South               ;See if we can move south
                call    CanMove
                jnc     DoNorth
                call    RandNum                 ;Get a random direction
                and     ax, 1                   ;Make it north or south.
                ret

DoNorth:        mov     ax, North
                ret

NotNorth:       mov     ch, South
                call    CanMove
                jnc     TryReverse
DoSouth:        mov     ax, South
                ret

; If the demon is moving north or south, choose a new direction of east
; or west, if possible.

IsNS:           mov     ch, East                ;See if we can move East
                call    CanMove
                jnc     NotEast
                mov     ch, West                ;See if we can move West
                call    CanMove
                jnc     DoEast
                call    RandNum                 ;Get a random direction
                and     ax, 1b                  ;Make it East or West
                or      al, 10b
                ret

DoEast:         mov     ax, East
                ret

DoWest:         mov     ax, West
                ret

NotEast:        mov     ch, West
                call    CanMove
                jc      DoWest

; Gee, we can't switch to a perpendicular direction, see if we can
; turn around.

TryReverse:     mov     ch, cl
                xor     ch, 1
                call    CanMove
                jc      ReverseDir

; If we can't turn around (likely), then keep going in the same direction.

                mov     ah, 0                   ;Stay in same direction.
                mov     al, cl
                ret

; Otherwise reverse direction down here.

ReverseDir:     mov     ah, 0
                mov     al, cl
                xor     al, 1
                ret
SetDir          endp

; Stuck-        This function checks to see if a demon is stuck and cannot
;               move in any direction. It returns with the carry flag clear
;               if the demon is stuck and needs to be killed; the carry flag
;               is set if the demon can still move in at least one direction.

Stuck           proc    near
                mov     ch, North
                call    CanMove
                jc      NotStuck
                mov     ch, South
                call    CanMove
                jc      NotStuck
                mov     ch, East
                call    CanMove
                jc      NotStuck
                mov     ch, West
                call    CanMove
NotStuck:       ret
Stuck           endp

; NextDemon-    Searches through the demon list to find the next available
;               active demon. Return a pointer to this guy in es:di.

NextDemon       proc    near
                push    ax

NDLoop:         inc     DemonIndex              ;Move on to next demon,
                and     DemonIndex, ModDemons   ; MOD MaxDemons.
                mov     al, size pcb            ;Compute index into
                mul     DemonIndex              ; DemonList.
                mov     di, ax                  ;See if the demon at this
                add     di, offset DemonList    ; offset is active.
                cmp     byp [di].pcb.NextProc, 0
                je      NDLoop

                mov     ax, ds
                mov     es, ax
                pop     ax
                ret
NextDemon       endp
; Dig-  This is the demon process.
;       It moves the demon one position (if possible) in its current
;       direction. After moving one position forward, there is
;       a 25% chance that this guy will change its direction; there
;       is a 25% chance this demon will spawn a child process to
;       dig off in a perpendicular direction.

Dig             proc    near

; See if the current demon is stuck. If the demon is stuck, then we've
; got to remove it from the demon list. If it is not stuck, then have
; it continue digging. If it is stuck and this is the last active
; demon, then return control to the main program.

                call    Stuck
                jc      NotStuck

; Okay, kill the current demon.
; Note: this will never kill the last demon because we have the timer
; process running. The timer process is the one that always stops
; the program.

                dec     DemonCnt

; Since the count is not zero, there must be more demons in the demon
; list. Free the stack space associated with the current demon and
; then search out the next active demon and have at it.

MoreDemons:     mov     al, size pcb
                mul     DemonIndex
                mov     bx, ax

; Free the stack space associated with this process. Note this code is
; naughty. It assumes the stack is allocated with the Standard Library
; malloc routine that always produces a base address of 8.

                mov     es, DemonList[bx].regss
                mov     di, 8                           ;Cheating!
                free

; Mark the demon entry for this guy as unused.

                mov     byp DemonList[bx].NextProc, 0   ;Mark as unused.

; Okay, locate the next active demon in the list.

FndNxtDmn:      call    NextDemon
                cocall                                  ;Never returns

; If the demon is not stuck, then continue digging away.

NotStuck:       mov     ch, cl
                call    CanMove
                jnc     DontMove

; If we can move, then adjust the demon's coordinates appropriately:

                cmp     cl, South
                jb      MoveNorth
                je      MoveSouth
                cmp     cl, East
                jne     MoveWest

; Moving East:

                inc     dl
                jmp     MoveDone

MoveWest:       dec     dl
                jmp     MoveDone

MoveNorth:      dec     dh
                jmp     MoveDone

MoveSouth:      inc     dh

; Okay, store a NoWall value at this entry in the maze and output a NoWall
; character to the screen (if writing data to the screen).

MoveDone:       MazeAdrs
                mov     bx, ax
                mov     Maze[bx], NoWall
                ifdef   ToScreen
                ScrnAdrs
                mov     bx, ax
                push    es
                mov     ax, ScreenSeg
                mov     es, ax
                mov     word ptr es:[bx], NoWallChar
                pop     es
                endif

; Before leaving, see if this demon shouldn't change direction.

DontMove:       call    RandNum
                and     al, 11b                 ;25% chance result is zero.
                jne     NoChangeDir
                call    SetDir
                mov     cl, al

NoChangeDir:

; Also, see if this demon should spawn a child process

                call    RandNum
                and     al, 11b                 ;Give it a 25% chance.
                jne     NoSpawn

; Okay, see if it's possible to spawn a new process at this point:

                call    CanStart
                jnc     NoSpawn

; See if we've already got MaxDemons active:

                cmp     DemonCnt, MaxDemons
                jae     NoSpawn

                inc     DemonCnt                ;Add another demon.

; Okay, create a new demon and add him to the list.

                push    dx                      ;Save cur demon info.
                push    cx

; Locate a free slot for this demon

                lea     si, DemonList- size pcb
FindSlot:       add     si, size pcb
                cmp     byp [si].pcb.NextProc, 0
                jne     FindSlot

; Allocate some stack space for the new demon.

                mov     cx, 256                 ;256 byte stack.
                malloc

; Set up the stack pointer for this guy:

                add     di, 248                 ;Point stack at end.
                mov     [si].pcb.regss, es
                mov     [si].pcb.regsp, di

; Set up the execution address for this guy:

                mov     [si].pcb.regcs, cs
                mov     [si].pcb.regip, offset Dig

; Initial coordinates and direction for this guy:

                mov     [si].pcb.regdx, dx

; Select a direction for this guy.

                pop     cx                      ;Retrieve direction.
                push    cx

                call    SetDir
                mov     ah, 0
                mov     [si].pcb.regcx, ax

; Set up other misc junk:

                mov     [si].pcb.regds, seg dseg
                sti
                pushf
                pop     [si].pcb.regflags
                mov     byp [si].pcb.NextProc, 1        ;Mark active.

; Restore current process' parameters

                pop     cx                      ;Restore current demon.
                pop     dx

NoSpawn:

; Okay, with all of the above done, it's time to pass control on to a new
; digger. The following cocall passes control to the next digger in the
; DemonList.

GetNextDmn:     call    NextDemon

; Okay, we've got a pointer to the next demon in the list (might be the
; same demon if there's only one), pass control to that demon.

                cocall
                jmp     Dig
Dig             endp
; TimerDemon- This demon introduces a delay between
; each cycle in the demon list. This slows down the
; maze generation so you can see the maze being built
; (which makes the program more interesting to watch).

TimerDemon      proc    near
                push    es
                push    ax

                mov     ax, 40h                 ;BIOS variable area
                mov     es, ax
                mov     ax, es:[6Ch]            ;BIOS timer location
Wait4Change:    cmp     ax, es:[6Ch]            ;BIOS changes this every
                je      Wait4Change             ; 1/18th second.

                cmp     DemonCnt, 1
                je      QuitProgram
                pop     ax                      ;Pop in reverse order of
                pop     es                      ; the pushes above.
                call    NextDemon
                cocall
                jmp     TimerDemon

QuitProgram:    cocall  MainPCB                 ;Quit the program
TimerDemon      endp
; What good is a maze generator program if it cannot solve the mazes it
; creates? SolveMaze finds the solution (if any) for this maze. It marks
; the solution path and the paths it tried, but failed on.
;
; function solvemaze(x,y:integer):boolean

sm_X            textequ <[bp+6]>
sm_Y            textequ <[bp+4]>

SolveMaze       proc    near
                push    bp
                mov     bp, sp

; See if we've just solved the maze:

                cmp     byte ptr sm_X, EndX
                jne     NotSolved
                cmp     byte ptr sm_Y, EndY
                jne     NotSolved
                mov     ax, 1                   ;Return true.
                pop     bp
                ret     4

; See if moving to this spot was an illegal move. There will be
; a NoWall value at this cell in the maze if the move is legal.

NotSolved:      mov     dl, sm_X
                mov     dh, sm_Y
                MazeAdrs
                mov     bx, ax
                cmp     Maze[bx], NoWall
                je      MoveOK
                mov     ax, 0                   ;Return failure
                pop     bp
                ret     4

; Well, it is possible to move to this point, so place an appropriate
; value on the screen and keep searching for the solution.

MoveOK:         mov     Maze[bx], Visited

                ifdef   ToScreen
                push    es                      ;Write a "VisitChar"
                ScrnAdrs                        ; character to the
                mov     bx, ax                  ; screen at this X,Y
                mov     ax, ScreenSeg           ; position.
                mov     es, ax
                mov     word ptr es:[bx], VisitChar
                pop     es
                endif

; Recursively call SolveMaze until we get a solution. Just call SolveMaze
; for the four possible directions (up, down, left, right) we could
; go. Since we've left "Visited" values in the Maze, we will not
; accidentally search back through the path we've already travelled.
; Furthermore, if we cannot go in one of the four directions, SolveMaze
; will catch this immediately upon entry (see the code at the start of
; this routine).

                mov     ax, sm_X                ;Try the path at location
                dec     ax                      ; (X-1, Y)
                push    ax
                push    sm_Y
                call    SolveMaze
                test    ax, ax                  ;Solution?
                jne     Solved

                push    sm_X                    ;Try the path at location
                mov     ax, sm_Y                ; (X, Y-1)
                dec     ax
                push    ax
                call    SolveMaze
                test    ax, ax                  ;Solution?
                jne     Solved

                mov     ax, sm_X                ;Try the path at location
                inc     ax                      ; (X+1, Y)
                push    ax
                push    sm_Y
                call    SolveMaze
                test    ax, ax                  ;Solution?
                jne     Solved

                push    sm_X                    ;Try the path at location
                mov     ax, sm_Y                ; (X, Y+1)
                inc     ax
                push    ax
                call    SolveMaze
                test    ax, ax                  ;Solution?
                jne     Solved
                pop     bp
                ret     4

Solved:
                ifdef   ToScreen
                push    es                      ;Draw return path.
                mov     dl, sm_X
                mov     dh, sm_Y
                ScrnAdrs
                mov     bx, ax
                mov     ax, ScreenSeg
                mov     es, ax
                mov     word ptr es:[bx], PathChar
                pop     es
                mov     ax, 1                   ;Return true
                endif

                pop     bp
                ret     4
SolveMaze       endp
; Here's the main program that drives the whole thing:

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                call    Init                    ;Initialize maze stuff.
                lesi    MainPCB                 ;Initialize coroutine
                coinit                          ; package.

; Create the first demon.
; Set up the stack pointer for this guy:

                mov     cx, 256
                malloc
                add     di, 248
                mov     DemonList.regsp, di
                mov     DemonList.regss, es

; Set up the execution address for this guy:

                mov     DemonList.regcs, cs
                mov     DemonList.regip, offset Dig

; Initial coordinates and direction for this guy:

                mov     cx, East                ;Start off going east.
                mov     dh, StartY
                mov     dl, StartX
                mov     DemonList.regcx, cx
                mov     DemonList.regdx, dx

; Set up other misc junk:

                mov     DemonList.regds, seg dseg
                sti
                pushf
                pop     DemonList.regflags
                mov     byp DemonList.NextProc, 1       ;Demon is "active".
                inc     DemonCnt
                mov     DemonIndex, 0

; Set up the Timer demon:

                mov     DemonList.regsp+(size pcb), offset EndTimerStk
                mov     DemonList.regss+(size pcb), ss

; Set up the execution address for this guy:

                mov     DemonList.regcs+(size pcb), cs
                mov     DemonList.regip+(size pcb), offset TimerDemon

; Set up other misc junk:

                mov     DemonList.regds+(size pcb), seg dseg
                sti
                pushf
                pop     DemonList.regflags+(size pcb)
                mov     byp DemonList.NextProc+(size pcb), 1
                inc     DemonCnt

; Start the ball rolling.

                mov     ax, ds
                mov     es, ax
                lea     di, DemonList
                cocall

; Wait for the user to press a key before solving the maze:

                getc

                mov     ax, StartX
                push    ax
                mov     ax, StartY
                push    ax
                call    SolveMaze

; Wait for another keystroke before quitting:

                getc

                mov     ax, 3                   ;Clear screen and reset video mode.
                int     10h
Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'

; Stack for the timer demon we create (we'll allocate the other
; stacks dynamically).

TimerStk        byte    256 dup (?)
EndTimerStk     word    ?

; Main program's stack:

stk             byte    512 dup (?)
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
The existing Standard Library coroutine package is not suitable for programs that use the 80386 and later 32 bit register sets. As mentioned earlier, the problem lies in the fact that the Standard Library only preserves the 16-bit registers when switching between processes. However, it is a relatively trivial extension to modify the Standard Library so that it saves 32 bit registers. To do so, just change the definition of the pcb (to make room for the 32 bit registers) and the sl_cocall routine:

                .386
                option  segment:use16

dseg            segment para public 'data'

wp              equ     <word ptr>

; 32-bit PCB. Note we only keep the L.O. 16 bits of SP since we are
; operating in real mode.

pcb32           struc
regsp           word    ?
regss           word    ?
regip           word    ?
regcs           word    ?

regeax          dword   ?
regebx          dword   ?
regecx          dword   ?
regedx          dword   ?
regesi          dword   ?
regedi          dword   ?
regebp          dword   ?

regds           word    ?
reges           word    ?
regflags        dword   ?
pcb32           ends

DefaultPCB      pcb32   <>
DefaultCortn    pcb32   <>

CurCoroutine    dword   DefaultCortn    ;Points at the currently executing
                                        ; coroutine.
dseg            ends

cseg            segment para public 'slcode'

;============================================================================
;
; 32-Bit Coroutine support.
;
; COINIT32- ES:DI contains the address of the current (default) process' PCB.

CoInit32        proc    far
                assume  ds:dseg
                push    ax
                push    ds
                mov     ax, dseg
                mov     ds, ax
                mov     wp dseg:CurCoroutine, di
                mov     wp dseg:CurCoroutine+2, es
                pop     ds
                pop     ax
                ret
CoInit32        endp

; COCALL32- transfers control to a coroutine. ES:DI contains the address
; of the PCB. This routine transfers control to that coroutine and then
; returns a pointer to the caller's PCB in ES:DI.

cocall32        proc    far
                assume  ds:dseg
                pushfd
                push    ds
                push    es                      ;Save these for later
                push    edi
                push    eax
                mov     ax, dseg
                mov     ds, ax
                cli                             ;Critical region ahead.

; Save the current process' state:

                les     di, dseg:CurCoroutine
                pop     es:[di].pcb32.regeax
                mov     es:[di].pcb32.regebx, ebx
                mov     es:[di].pcb32.regecx, ecx
                mov     es:[di].pcb32.regedx, edx
                mov     es:[di].pcb32.regesi, esi
                pop     es:[di].pcb32.regedi
                mov     es:[di].pcb32.regebp, ebp

                pop     es:[di].pcb32.reges
                pop     es:[di].pcb32.regds
                pop     es:[di].pcb32.regflags
                pop     es:[di].pcb32.regip
                pop     es:[di].pcb32.regcs
                mov     es:[di].pcb32.regsp, sp
                mov     es:[di].pcb32.regss, ss

                mov     bx, es                  ;Save so we can return in
                mov     ecx, edi                ; ES:DI later.
                mov     edx, es:[di].pcb32.regedi
                mov     es, es:[di].pcb32.reges
                mov     di, dx                  ;Point es:di at new PCB

                mov     wp dseg:CurCoroutine, di
                mov     wp dseg:CurCoroutine+2, es

                mov     es:[di].pcb32.regedi, ecx       ;The ES:DI return values.
                mov     es:[di].pcb32.reges, bx

; Okay, switch to the new process:

                mov     ss, es:[di].pcb32.regss
                mov     sp, es:[di].pcb32.regsp
                mov     eax, es:[di].pcb32.regeax
                mov     ebx, es:[di].pcb32.regebx
                mov     ecx, es:[di].pcb32.regecx
                mov     edx, es:[di].pcb32.regedx
                mov     esi, es:[di].pcb32.regesi
                mov     ebp, es:[di].pcb32.regebp
                mov     ds, es:[di].pcb32.regds

                push    es:[di].pcb32.regflags
                push    es:[di].pcb32.regcs
                push    es:[di].pcb32.regip
                push    es:[di].pcb32.regedi
                mov     es, es:[di].pcb32.reges
                pop     edi
                iret
cocall32        endp

; CoCall32l works just like cocall above, except the address of the pcb
; follows the call in the code stream rather than being passed in ES:DI.
; Note: this code does *not* return the caller's PCB address in ES:DI.
;

cocall32l       proc    far
                assume  ds:dseg
                push    ebp
                mov     bp, sp
                pushfd
                push    ds
                push    es
                push    edi
                push    eax
                mov     ax, dseg
                mov     ds, ax
                cli                             ;Critical region ahead.

; Save the current process' state:

                les     di, dseg:CurCoroutine
                pop     es:[di].pcb32.regeax
                mov     es:[di].pcb32.regebx, ebx
                mov     es:[di].pcb32.regecx, ecx
                mov     es:[di].pcb32.regedx, edx
                mov     es:[di].pcb32.regesi, esi
                pop     es:[di].pcb32.regedi
                pop     es:[di].pcb32.reges
                pop     es:[di].pcb32.regds
                pop     es:[di].pcb32.regflags
                pop     es:[di].pcb32.regebp
                pop     es:[di].pcb32.regip
                pop     es:[di].pcb32.regcs
                mov     es:[di].pcb32.regsp, sp
                mov     es:[di].pcb32.regss, ss

                mov     dx, es:[di].pcb32.regip         ;Get return address (ptr to
                mov     cx, es:[di].pcb32.regcs         ; PCB address).
                add     es:[di].pcb32.regip, 4          ;Skip ptr on return.
                mov     es, cx                          ;Get the ptr to the new pcb
                mov     di, dx                          ; address, then fetch the
                les     di, es:[di]                     ; pcb val.
                mov     wp dseg:CurCoroutine, di
                mov     wp dseg:CurCoroutine+2, es

; Okay, switch to the new process:

                mov     ss, es:[di].pcb32.regss
                mov     sp, es:[di].pcb32.regsp
                mov     eax, es:[di].pcb32.regeax
                mov     ebx, es:[di].pcb32.regebx
                mov     ecx, es:[di].pcb32.regecx
                mov     edx, es:[di].pcb32.regedx
                mov     esi, es:[di].pcb32.regesi
                mov     ebp, es:[di].pcb32.regebp
                mov     ds, es:[di].pcb32.regds

                push    es:[di].pcb32.regflags
                push    es:[di].pcb32.regcs
                push    es:[di].pcb32.regip
                push    es:[di].pcb32.regedi
                mov     es, es:[di].pcb32.reges
                pop     edi
                iret

cocall32l       endp
cseg            ends
19.4 Multitasking

Coroutines provide a reasonable mechanism for switching between processes that must take turns. For example, the maze generation program in the previous section would generate poor mazes if the demon processes didn’t take turns removing one cell at a time from the maze. However, the coroutine paradigm isn’t always suitable; not all processes need to take turns. For example, suppose you are writing an action game where the user plays against the computer and the computer player operates independently of the user in real time. This could be, for example, a space war game or a flight simulator game (where you are dogfighting other pilots). Ideally, we would like to have two computers: one to handle the user interaction and one for the computer player. Both systems would communicate their moves to one another during the game. If the (human) player simply sits and watches the screen, the computer player would win since it is active and the human player is not. Of course, it would considerably limit the marketability of your game were it to require two computers to play. However, you can use multitasking to simulate two separate computer systems on a single CPU.

The basic idea behind multitasking is that one process runs for a period of time (the time quantum or time slice) and then a timer interrupts the process. The timer ISR saves the state of the process and then switches control to another process. That process runs for its time slice and then the timer interrupt switches to another process. In this manner, each process gets some amount of computer time. Note that multitasking is very easy to implement if you have a coroutine package. All you need to do is write a timer ISR that cocalls the various processes, one per timer interrupt.

A timer interrupt that switches between processes is a dispatcher. One decision you will need to make when designing a dispatcher is a policy for the process selection algorithm. A simple policy is to place all processes in a queue and then rotate among them. This is known as the round-robin policy. Since this is the policy the UCR Standard Library process package uses, we will adopt it as well. However, there are other process selection criteria, generally involving the priority of a process, available as well. See a good text on operating systems for details.

The choice of the time quantum can have a big impact on performance. Generally, you would like the time quantum to be small. The time sharing (switching between processes based on the clock) will be much smoother if you use small time quanta. For example, suppose you choose five second time quanta and you are running four processes concurrently. Each process would get five seconds; it would run very fast during those five seconds. However, at the end of its time slice it would have to wait for the other three processes’ turns, 15 seconds, before it ran again. The users of such programs would get very frustrated with them; users like programs whose performance is relatively consistent from one moment to the next. If we make the time slice one millisecond, instead of five seconds, each process would run for one millisecond and then switch to the next process. This means that each process gets one millisecond out of every four. This is too small a time quantum for the user to notice the pause between processes.

Since smaller time quanta seem to be better, you might wonder “why not make them as small as possible?” For example, the PC supports a one millisecond timer interrupt. Why not use that to switch between processes? The problem is that there is a fair amount of overhead required to switch from one process to another. The smaller you make the time quantum, the larger the share of CPU time that time slicing overhead consumes. Therefore, you want to pick a time quantum that is a good balance between smooth process switching and too much overhead. As it turns out, the 1/18th second clock is probably fine for most multitasking requirements.
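The trade-off above can be put into two small formulas. This is only a sketch under simplifying assumptions that are not in the text: $n$ equal-priority processes scheduled round-robin, a fixed quantum $q$, and a fixed per-switch cost $s$ (the value of $s$ used below is purely hypothetical).

```latex
% Wait between the end of one turn and the start of the next:
W = (n - 1)\,q
% e.g. n = 4, q = 5\,\text{s} \;\Rightarrow\; W = 3 \times 5\,\text{s} = 15\,\text{s}

% Fraction of CPU time lost to switching:
f = \frac{s}{q + s}
% With a hypothetical s = 0.1\,\text{ms}:
%   q = 1\,\text{ms}:   f = 0.1 / 1.1  \approx 9.1\%
%   q = 55\,\text{ms}:  f = 0.1 / 55.1 \approx 0.18\%   (q \approx 1/18\,\text{s})
```

Shrinking $q$ reduces the wait $W$ linearly but pushes the overhead fraction $f$ toward 1, which is why a moderate quantum such as the 1/18th second tick is a sensible compromise.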
19.4.1 Lightweight and Heavyweight Processes

There are two major types of processes in the world of multitasking: lightweight processes, also known as threads, and heavyweight processes. These two types of processes differ mainly in the details of memory management. A heavyweight process swaps memory management tables and moves lots of data around. Threads only swap the stack and CPU registers. Threads have much less overhead cost than heavyweight processes.

We will not consider heavyweight processes in this text. Heavyweight processes appear in protected mode operating systems like UNIX, Linux, OS/2, or Windows NT. Since there is rarely any memory management (at the hardware level) going on under DOS, the issue of changing memory management tables around is moot. Switching from one heavyweight process to another generally corresponds to switching from one application to another.

Using lightweight processes (threads) is perfectly reasonable under DOS. Threads (short for “execution thread” or “thread of execution”) correspond to two or more concurrent execution paths within the same program. For example, we could think of each of the demons in the maze generation program as being a separate thread of execution. Although threads have different stacks and machine states, they share code and data memory. There is no need to use a “shared memory TSR” to provide global shared memory (see “Shared Memory” on page 1078). Instead, maintaining local variables is the difficult task. You must either allocate local variables on the process’ stack (which is separate for each process) or you’ve got to make sure that no other process uses the variables you declare in the data segment specifically for one thread.

We could easily write our own threads package, but we don’t have to; the UCR Standard Library provides this capability in the processes package. To see how to incorporate threads into your programs, keep reading…
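The difference between per-thread and shared storage described above can be sketched as follows. This is a minimal illustration, not code from the Standard Library: the names Worker and SharedCount are hypothetical, and the fragment assumes ds already addresses dseg.

```asm
; Hypothetical fragment: every thread that runs Worker gets its own
; copy of the word at [bp-2] because each thread has its own stack,
; but all threads update the same SharedCount in the data segment.

dseg            segment para public 'data'
SharedCount     word    0               ;One copy, visible to all threads.
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

Worker          proc    near
                push    bp
                mov     bp, sp
                sub     sp, 2                   ;Reserve one private word.

                mov     word ptr [bp-2], 0      ;Private to this thread.
                inc     word ptr [bp-2]         ;No other thread sees this.
                inc     SharedCount             ;Every thread sees this.

                mov     sp, bp                  ;Release the private word.
                pop     bp
                ret
Worker          endp
cseg            ends
```

Note that the increment of SharedCount is a single instruction here; longer read-modify-write sequences on shared data need the synchronization techniques discussed later in the chapter.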
19.4.2 The UCR Standard Library Processes Package

The UCR Standard Library provides six routines to let you manage threads. These routines include prcsinit, prcsquit, fork, die, kill, and yield. These functions let you initialize and shut down the threads system, start new processes, terminate processes, and voluntarily pass the CPU off to another process.

The prcsinit and prcsquit functions let you initialize and shut down the system. The prcsinit call prepares the threads package. You must call this routine before executing any of the other five process routines. The prcsquit function shuts down the threads system in preparation for program termination. Prcsinit patches into the timer interrupt (interrupt 8). Prcsquit restores the interrupt 8 vector. It is very important that you call prcsquit before your program returns to DOS. Failure to do so will leave the int 8 vector pointing off into memory, which may cause the system to crash when DOS loads the next program. Your program must patch the break and critical error exception vectors to ensure that you call prcsquit in the event of abnormal program termination. Failure to do so may crash the system if the user terminates the program with ctrl-break or an abort on an I/O error. Prcsinit and prcsquit do not require any parameters, nor do they return any values.

The fork call spawns a new process. On entry, es:di must point at a pcb for the new process. The regss and regsp fields of the pcb must contain the address of the top of the stack area for this new process. The fork call fills in the other fields of the pcb (including cs:ip).

For each call you make to fork, the fork routine returns twice, once for each thread of execution. The parent process typically returns first, but this is not certain; the child process is usually the second return from the fork call. To differentiate the two calls, fork returns two process identifiers (PIDs) in the ax and bx registers. For the parent process, fork returns with ax containing zero and bx containing the PID of the child process. For the child process, fork returns with ax containing the child’s PID and bx containing zero. Note that both threads return and continue executing the same code after the call to fork. If you want the child and parent processes to take separate paths, you would execute code like the following:
                lesi    NewPCB          ;Assume regss/regsp are initialized.
                fork
                test    ax, ax          ;Parent PID is zero at this point.
                je      ParentProcess   ;Go elsewhere if parent process.

; Child process continues execution here
The parent process should save the child’s PID. You can use the PID to terminate a process at some later time.

It is important to repeat that you must initialize the regss and regsp fields in the pcb before calling fork. You must allocate storage for a stack (dynamically or statically) and point ss:sp at the last word of this stack area. Once you call fork, the process package uses whatever values happen to be in the regss and regsp fields. If you have not initialized these values, they will probably contain zero, and when the process starts it will wipe out the data at address 0:FFFE. This may crash the system at one point or another.

The die call kills the current process. If there are multiple processes running, this call transfers control to some other process waiting to run. If the current process is the only process on the system’s run queue, then this call will crash the system.

The kill call lets one process terminate another. Typically, a parent process will use this call to terminate a child process. To kill a process, simply load the ax register with the PID of the process you want to terminate and then call kill. If a process supplies its own PID to the kill function, the process terminates itself (that is, this is equivalent to a die call). If there is only one process in the run queue and that process kills itself, the system will crash.

The last multitasking management routine in the process package is the yield call. Yield voluntarily gives up the CPU. This is a direct call to the dispatcher, which will switch to another task in the run queue. Control returns after the yield call when the next time slice is given to this process. If the current process is the only one in the queue, yield immediately returns. You would normally use the yield call to free up the CPU between long I/O operations (like waiting for a keypress). This would allow other tasks to get maximum use of the CPU while your process is just spinning in a loop waiting for some I/O operation to complete.

The Standard Library multitasking routines only work with the 16 bit register set of the 80x86 family. As with the coroutine package, you will need to modify the pcb and the dispatcher code if you want to support the 32 bit register set of the 80386 and later processors. This task is relatively simple and the code is quite similar to that appearing in the section on coroutines, so there is no need to present the solution here.
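The calls described in this section might fit together as in the following fragment. It is a sketch only: ChildPCB, ChildStk, EndChildStk, ChildPID, TickCount, and the labels are hypothetical names, and the fragment assumes that prcsinit has already been called, that ds addresses dseg, and that the program has installed the int 23h and int 24h handlers mentioned above.

```asm
; Hypothetical data for one child thread.

dseg            segment para public 'data'
ChildPID        word    ?
TickCount       word    0
ChildPCB        pcb     <>
ChildStk        byte    1024 dup (?)
EndChildStk     word    ?               ;Last word of the child's stack.
dseg            ends

; ... inside the code segment, after prcsinit ...

; Point the new thread's ss:sp at the last word of its stack area
; BEFORE calling fork:

                mov     ChildPCB.regss, seg EndChildStk
                mov     ChildPCB.regsp, offset EndChildStk

                lesi    ChildPCB
                fork
                test    ax, ax          ;ax = 0 in the parent.
                je      InParent

; Child thread: count its turns, giving up the CPU after each one.

ChildLoop:      inc     TickCount
                yield
                jmp     ChildLoop

; Parent thread: remember the child's PID, then poll the keyboard,
; yielding the CPU whenever no key is waiting.

InParent:       mov     ChildPID, bx    ;bx = child's PID in the parent.

WaitKey:        mov     ah, 1           ;BIOS: is a keystroke available?
                int     16h
                jnz     GotKey          ;ZF clear means a key is waiting.
                yield                   ;Nothing yet; let others run.
                jmp     WaitKey

GotKey:         mov     ah, 0           ;Remove the key from the buffer.
                int     16h

                mov     ax, ChildPID    ;Terminate the child thread.
                kill
```

The child never calls die here because the parent terminates it with kill; a child that finishes on its own would end with die instead, provided another process remains in the run queue.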
19.4.3 Problems with Multitasking

When threads share code and data, certain problems can develop. First of all, reentrancy becomes a problem. You cannot call a non-reentrant routine (like DOS) from two separate threads if there is ever the possibility that the non-reentrant code could be interrupted and control transferred to a second thread that reenters the same routine. Reentrancy is not the only problem, however. It is quite possible to design two routines that access shared variables and that misbehave depending on where the interrupts occur in the code sequence. We will explore these problems in the section on synchronization (see “Synchronization” on page 1129); for now, just be aware that these problems exist.

Note that simply turning off the interrupts (with cli) may not solve the reentrancy problem. Consider the following code:

                cli                     ;Prevent reentrancy.
                mov     ah, 3Eh         ;DOS close call.
                mov     bx, Handle
                int     21h
                sti                     ;Turn interrupts back on.

This code will not prevent DOS from being reentered because DOS (and BIOS) turn the interrupts back on! There is a solution to this problem, but it’s not by using cli and sti.
19.4.4 A Sample Program with Threads

The following program provides a simple demonstration of the Standard Library processes package. This short program creates two threads – the main program and a timer process. On each timer tick the background (timer) process kicks in and increments a memory variable. It then yields the CPU back to the main program. On the next timer tick control returns to the background process and this cycle repeats. The main program reads a string from the user while the background process is counting off timer ticks. When the user finishes the line by pressing the enter key, the main program kills the background process and then prints the amount of time necessary to enter the line of text.

Of course, this isn’t the most efficient way to time how long it takes someone to enter a line of text, but it does provide an example of the multitasking features of the Standard Library. This short program segment demonstrates all the process routines except die. Note that it also demonstrates the fact that you must supply int 23h and int 24h handlers when using the process package.

; MULTI.ASM
; Simple program to demonstrate the use of multitasking.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'

ChildPID        word    0               ;Child's PID so we can kill it.
BackGndCnt      word    0               ;Counts off clock ticks in backgnd.

; PCB for our background process. Note we initialize ss:sp here.

BkgndPCB        pcb     {0,offset EndStk2, seg EndStk2}

; Data buffer to hold an input string.

InputLine       byte    128 dup (0)

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; A replacement critical error handler. This routine calls prcsquit
; if the user decides to abort the program.

CritErrMsg      byte    cr,lf
                byte    "DOS Critical Error!",cr,lf
                byte    "A)bort, R)etry, I)gnore, F)ail? $"

MyInt24         proc    far
                push    dx
                push    ds
                push    ax

                push    cs
                pop     ds
Int24Lp:        lea     dx, CritErrMsg
                mov     ah, 9           ;DOS print string call.
                int     21h

                mov     ah, 1           ;DOS read character call.
                int     21h
                and     al, 5Fh         ;Convert l.c. -> u.c.

                cmp     al, 'I'         ;Ignore?
                jne     NotIgnore
                pop     ax
                mov     al, 0
                jmp     Quit24

NotIgnore:      cmp     al, 'R'         ;Retry? (al is upper case here.)
                jne     NotRetry
                pop     ax
                mov     al, 1
                jmp     Quit24

NotRetry:       cmp     al, 'A'         ;Abort?
                jne     NotAbort
                prcsquit                ;If quitting, fix INT 8.
                pop     ax
                mov     al, 2
                jmp     Quit24

NotAbort:       cmp     al, 'F'         ;Fail?
                jne     BadChar
                pop     ax
                mov     al, 3
Quit24:         pop     ds
                pop     dx
                iret

BadChar:        mov     ah, 2           ;DOS character output call.
                mov     dl, 7           ;Bell character
                int     21h
                jmp     Int24Lp
MyInt24         endp

; We will simply disable INT 23h (the break exception).

MyInt23         proc    far
                iret
MyInt23         endp

; Okay, this is a pretty weak background process, but it does demonstrate
; how to use the Standard Library calls.

BackGround      proc
                sti
                mov     ax, dseg
                mov     ds, ax
                inc     BackGndCnt      ;Bump call Counter by one.
                yield                   ;Give CPU back to foregnd.
                jmp     BackGround
BackGround      endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

; Initialize the INT 23h and INT 24h exception handler vectors.

                mov     ax, 0
                mov     es, ax
                mov     word ptr es:[24h*4], offset MyInt24
                mov     es:[24h*4 + 2], cs
                mov     word ptr es:[23h*4], offset MyInt23
                mov     es:[23h*4 + 2], cs

                prcsinit                ;Start multitasking system.

                lesi    BkgndPCB        ;Fire up a new process
                fork
                test    ax, ax          ;Parent's return?
                je      ParentPrcs
                jmp     BackGround      ;Go do background stuff.

ParentPrcs:     mov     ChildPID, bx    ;Save child process ID.

                print
                byte    "I am timing you while you enter a string. So type"
                byte    cr,lf
                byte    "quickly: ",0

                lesi    InputLine
                gets

                mov     ax, ChildPID    ;Stop the child from running.
                kill

                printf
                byte    "While entering '%s' you took %d clock ticks"
                byte    cr,lf,0
                dword   InputLine, BackGndCnt

                prcsquit

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'

; Here is the stack for the background process we start

stk2            byte    256 dup (?)
EndStk2         word    ?

;Here's the stack for the main program/foreground process.

stk             byte    1024 dup (?)
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

19.5 Synchronization

Many problems occur in cooperative concurrently executing processes due to synchronization (or the lack thereof). For example, one process can produce data that other processes consume. However, it might take much longer for the producer to create that data than it takes for the consumer to use it. Some mechanism must be in place to ensure that the consumer does not attempt to use the data before the producer creates it. Likewise, we need to ensure that the consumer uses the data created by the producer before the producer creates more data.

The producer-consumer problem is one of several very famous synchronization problems from operating systems theory. In the producer-consumer problem there are one or more processes that produce data and write this data to a shared buffer. Likewise, there are one or more consumers that read data from this buffer. There are two synchronization issues we must deal with – the first is to ensure that the producers do not produce more data than the buffer can hold (conversely, we must prevent the consumers from removing data from an empty buffer); the second is to ensure the integrity of the buffer data structure by allowing access to only one process at a time.
Consider what can happen in a simple producer-consumer problem. Suppose the producer and consumer processes share a single data buffer structure organized as follows:

buffer          struct
Count           word    0
InPtr           word    0
OutPtr          word    0
Data            byte    MaxBufSize dup (?)
buffer          ends
The Count field specifies the number of data bytes currently in the buffer. InPtr points at the next available location to place data in the buffer. OutPtr is the address of the next byte to remove from the buffer. Data is the actual buffer array. Adding and removing data is very easy. The following code segments almost handle this job:

; Producer-     This procedure adds the value in al to the buffer.
;               Assume that the buffer variable MyBuffer is in the data segment.

Producer        proc    near
                pushf
                sti                     ;Must have interrupts on!
                push    bx

; The following loop waits until there is room in the buffer to insert
; another byte.

WaitForRoom:    cmp     MyBuffer.Count, MaxBufSize
                jae     WaitForRoom

; Okay, insert the byte into the buffer.

                mov     bx, MyBuffer.InPtr
                mov     MyBuffer.Data[bx], al
                inc     MyBuffer.Count  ;We just added a byte to the buffer.
                inc     MyBuffer.InPtr  ;Move on to next item in buffer.

; If we are at the physical end of the buffer, wrap around to the beginning.

                cmp     MyBuffer.InPtr, MaxBufSize
                jb      NoWrap
                mov     MyBuffer.InPtr, 0
NoWrap:
                pop     bx
                popf
                ret
Producer        endp

; Consumer-     This procedure waits for data (if necessary) and returns the
;               next available byte from the buffer.

Consumer        proc    near
                pushf
                sti                     ;Must have interrupts on!
                push    bx
WaitForData:    cmp     MyBuffer.Count, 0 ;Is the buffer empty?
                je      WaitForData     ;If so, wait for data to arrive.

; Okay, fetch an input character

                mov     bx, MyBuffer.OutPtr
                mov     al, MyBuffer.Data[bx]
                dec     MyBuffer.Count
                inc     MyBuffer.OutPtr
                cmp     MyBuffer.OutPtr, MaxBufSize
                jb      NoWrap
                mov     MyBuffer.OutPtr, 0
NoWrap:
                pop     bx
                popf
                ret
Consumer        endp
The only problem with this code is that it won’t always work if there are multiple producer or consumer processes. In fact, it is easy to come up with a version of this code that won’t work for a single set of producer and consumer processes (although the code above will work fine, in that special case). The problem is that these procedures access global variables and, therefore, are not reentrant. In particular, the problem lies with the way these two procedures manipulate the buffer control variables. Consider, for a moment, the following statements from the Consumer procedure:

                dec     MyBuffer.Count

« Suppose an interrupt occurs here »

                inc     MyBuffer.OutPtr
                cmp     MyBuffer.OutPtr, MaxBufSize
                jb      NoWrap
                mov     MyBuffer.OutPtr, 0
NoWrap:
If an interrupt occurs at the specified point above and control transfers to another consumer process that reenters this code, the second consumer would malfunction. The problem is that the first consumer has fetched data from the buffer but has yet to update the output pointer. The second consumer comes along and removes the same byte as the first consumer. The second consumer then properly updates the output pointer to point at the next available location in the circular buffer. When control eventually returns to the first consumer process, it finishes the operation by incrementing the output pointer. This causes the system to skip over the next byte, which no process has read. The end result is that two consumer processes fetch the same byte and then skip a byte in the buffer.

This problem is easily solved by recognizing the fact that the code that manipulates the buffer data is a critical region. By restricting execution in the critical region to one process at a time, we can solve this problem. In the simple example above, we can easily prevent reentrancy by turning the interrupts off while in the critical region. For the consumer procedure, the code would look like this:

; Consumer-     This procedure waits for data (if necessary) and returns the
;               next available byte from the buffer.

Consumer        proc    near
                pushf
                sti                     ;Must have interrupts on!
                push    bx
WaitForData:    cmp     MyBuffer.Count, 0 ;Is the buffer empty?
                je      WaitForData     ;If so, wait for data to arrive.

; The following is a critical region, so turn the interrupts off.

                cli

; Okay, fetch an input character

                mov     bx, MyBuffer.OutPtr
                mov     al, MyBuffer.Data[bx]
                dec     MyBuffer.Count
                inc     MyBuffer.OutPtr
                cmp     MyBuffer.OutPtr, MaxBufSize
                jb      NoWrap
                mov     MyBuffer.OutPtr, 0
NoWrap:
                pop     bx
                popf                    ;Restore interrupt flag.
                ret
Consumer        endp
Note that we cannot turn the interrupts off during the execution of the whole procedure. Interrupts must be on while this procedure is waiting for data, otherwise the producer process will never be able to put data in the buffer for the consumer.

Simply turning the interrupts off does not always work. Some critical regions may take a considerable amount of time (seconds, minutes, or even hours) and you cannot leave the interrupts off for that amount of time (see footnote 3). Another problem is that the critical region may call a procedure that turns the interrupts back on and you have no control over this. A good example is a procedure that calls MS-DOS. Since MS-DOS is not reentrant, MS-DOS is, by definition, a critical section; we can only allow one process at a time inside MS-DOS. However, MS-DOS reenables the interrupts, so we cannot simply turn off the interrupts before calling an MS-DOS function and expect this to prevent reentrancy.

Turning off the interrupts doesn’t even work for the consumer/producer procedures given earlier. Note that interrupts must be on while the consumer is waiting for data to arrive in the buffer (conversely, the producers must have interrupts on while waiting for room in the buffer). It is quite possible for the code to detect the presence of data and then, just before the cli instruction executes, for an interrupt to transfer control to a second consumer process. While it is not possible for both processes to update the buffer variables concurrently, it is possible for the second consumer process to remove the only data value from the input buffer and then switch back to the first consumer, which removes a phantom value from the buffer (and causes the Count variable to go negative).

One poorly thought out solution is to use a flag to control access to a critical region. A process, before entering the critical region, tests the flag to see if any other process is currently in the critical region; if not, the process sets the flag to “in use” and then enters the critical region. Upon leaving the critical region, the process sets the flag to “not in use.” If a process wants to enter a critical region and the flag’s value is “in use”, the process must wait until the process currently in the critical section finishes and writes the “not in use” value to the flag. The only problem with this solution is that it is nothing more than a special case of the producer/consumer problem. The instructions that update the in-use flag form their own critical section that you must protect. As a general solution, the in-use flag idea fails.
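To make the weakness concrete, here is a minimal sketch of what such flag-based code might look like. It assumes a hypothetical variable named Flag that holds one for “not in use” and zero for “in use”; the label WaitFlag is likewise invented for this illustration. The fragment is deliberately flawed and should not be used as a lock.

```asm
; Sketch of the flawed in-use flag idea (do NOT use this as a lock).
; Assumes a hypothetical variable Flag: 1 = not in use, 0 = in use.

WaitFlag:       cmp     Flag, 1         ;Is the critical region free?
                jne     WaitFlag        ;No, keep testing.

; *** If a task switch occurs here, a second process also sees Flag = 1
; *** and both processes end up inside the critical region.

                mov     Flag, 0         ;Mark the region in use.

; ... critical region ...

                mov     Flag, 1         ;Mark the region not in use.
```

The test (cmp) and the set (mov) are separate instructions, so nothing prevents the dispatcher from switching processes between them.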
19.5.1 Atomic Operations, Test & Set, and Busy-Waiting

The problem with the in-use flag idea is that it takes several instructions to test and set the flag. A typical piece of code that tests such a flag would read its value and determine if the critical section is in use. If not, it would then write the “in-use” value to the flag to let other processes know that it is in the critical section. The problem is that an interrupt could occur after the code tests the flag but before it sets the flag to “in use.” Then some other process can come along, test the flag and find that it is not in use, and enter the critical region. The system could interrupt that second process while it is still in the critical region and transfer control back to the first. Since the first process has already determined that the critical region is not in use, it sets the flag to “in use” and enters the critical region. Now we have two processes in the critical region and the system is in violation of the mutual exclusion requirement (only one process in a critical region at a time).

The problem with this approach is that testing and setting the in-use flag is not an uninterruptible (atomic) operation. If it were, then there would be no problem. Of course, it is easy to make a sequence of instructions non-interruptible by putting a cli instruction before them. Therefore, we can test and set a flag in an atomic operation as follows (assume in-use is zero, not in-use is one):

                pushf                   ;Save the interrupt flag (once only).
TestLoop:       cli                     ;Turn ints off while testing and
                cmp     Flag, 0         ; setting flag.
                je      IsInUse         ;Already in use?
                mov     Flag, 0         ;If not, make it so.
IsInUse:        sti                     ;Allow ints (if in-use already).
                je      TestLoop        ;Wait until not in use.
                popf

; When we get down here, the flag was “not in-use” and we’ve just set it
; to “in-use.” We now have exclusive access to the critical section.
3. In general, you should not leave the interrupts off for more than about 30 milliseconds when using the 1/18th second clock for multitasking. A general rule of thumb is that interrupts should not be off for much more than about 50% of the time quantum.
Another solution is to use a so-called “test and set” instruction – one that both tests a specific condition and sets the flag to a desired value. In our case, we need an instruction that both tests a flag to see if it is not in-use and sets it to in-use at the same time (if the flag was already in-use, it will remain in use afterward). Although the 80x86 does not support a specific test and set instruction, it does provide several others that can achieve the same effect. These instructions include xchg, shl, shr, sar, rcl, rcr, rol, ror, btc/btr/bts (available only on the 80386 and later processors), and cmpxchg (available only on the 80486 and later processors). In a limited sense, you can also use the addition and subtraction instructions (add, sub, adc, sbb, inc, and dec) as well.

The exchange instruction provides the most generic form for the test and set operation. If you have a flag (0=in use, 1=not in use) you can test and set this flag without messing with the interrupts using the following code:

InUseLoop:      mov     al, 0           ;0=In Use
                xchg    al, Flag
                cmp     al, 0
                je      InUseLoop
The xchg instruction atomically swaps the value in al with the value in the flag variable. Although the xchg instruction doesn’t actually test the value, it does place the original flag value in a location (al) that is safe from modification by another process. If the flag originally contained zero (in-use), this exchange sequence swaps a zero for the existing zero and the loop repeats. If the flag originally contained a one (not in-use) then this code swaps a zero (in-use) for the one and falls out of the in-use loop.

The shift and rotate instructions also act as test and set instructions, assuming you use the proper values for the in-use flag. With in-use equal to zero and not in-use equal to one, the following code demonstrates how to use the shr instruction for the test and set operation:

InUseLoop:      shr     Flag, 1         ;In-use bit to carry, 0->Flag.
                jnc     InUseLoop       ;Repeat if already in use.
This code shifts the in-use bit (bit number zero) into the carry flag and clears the in-use flag. At the same time, it zeros the Flag variable, assuming Flag always contains zero or one. The code for the atomic test and set sequences using the other shift and rotates is very similar and appears in the exercises.

Starting with the 80386, Intel provided a set of instructions explicitly intended for test and set operations: btc (bit test and complement), bts (bit test and set), and btr (bit test and reset). These instructions copy a specific bit from the destination operand into the carry flag and then complement, set, or reset (clear) that bit. The following code demonstrates how to use the btr instruction to manipulate our in-use flag:

InUseLoop:      btr     Flag, 0         ;In-use flag is in bit zero.
                jnc     InUseLoop
The btr instruction is a little more flexible than the shr instruction because you don’t have to guarantee that all the other bits in the Flag variable are zero; it tests and clears bit zero without affecting any other bits in the Flag variable.

The 80486 (and later) cmpxchg instruction provides a very generic synchronization primitive. A “compare and swap” instruction turns out to be the only atomic instruction you need to implement almost any synchronization primitive. However, its generic structure means that it is a little too complex for simple test and set operations. You will get an opportunity to design a test and set sequence using cmpxchg in the exercises. For more details on cmpxchg, see “The CMPXCHG, and CMPXCHG8B Instructions” on page 263.

Returning to the producer/consumer problem, we can easily solve the critical region problem that exists in these routines using the test and set instruction sequence presented above. The following code does this for the Producer procedure; you would modify the Consumer procedure in a similar fashion.

; Producer-     This procedure adds the value in al to the buffer.
;               Assume that the buffer variable MyBuffer is in the data segment.

Producer        proc    near
                pushf
                sti                     ;Must have interrupts on!

; Okay, we are about to enter a critical region (this whole procedure),
; so test the in-use flag to see if this critical region is already in use.

InUseLoop:      shr     Flag, 1
                jnc     InUseLoop

                push    bx

; The following loop waits until there is room in the buffer to insert
; another byte.

WaitForRoom:    cmp     MyBuffer.Count, MaxBufSize
                jae     WaitForRoom

; Okay, insert the byte into the buffer.

                mov     bx, MyBuffer.InPtr
                mov     MyBuffer.Data[bx], al
                inc     MyBuffer.Count  ;We just added a byte to the buffer.
                inc     MyBuffer.InPtr  ;Move on to next item in buffer.

; If we are at the physical end of the buffer, wrap around to the beginning.

                cmp     MyBuffer.InPtr, MaxBufSize
                jb      NoWrap
                mov     MyBuffer.InPtr, 0

NoWrap:         mov     Flag, 1         ;Set flag to not in use.
                pop     bx
                popf
                ret
Producer        endp
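For comparison, one way the Consumer procedure might be adapted is sketched below. This is an illustration rather than the text’s own listing: it reuses the MyBuffer, MaxBufSize, and Flag names from the surrounding code, the labels TryAgain and HaveData are invented here, and it makes one design choice that differs from the Producer above, namely releasing the flag before retrying when the buffer is empty, so that a producer sharing the same flag is not locked out while the consumer waits.

```asm
; Sketch only: Consumer protected by the same shr-based test and set.
; Assumes Flag (1 = not in use, 0 = in use) is shared with Producer.

Consumer        proc    near
                pushf
                sti                     ;Must have interrupts on!
                push    bx

TryAgain:       shr     Flag, 1         ;Old flag -> carry, 0 -> Flag.
                jnc     TryAgain        ;Carry clear: region already in use.

                cmp     MyBuffer.Count, 0 ;Is the buffer empty?
                jne     HaveData
                mov     Flag, 1         ;Yes: release the region so that a
                jmp     TryAgain        ; producer can get in, then retry.

HaveData:       mov     bx, MyBuffer.OutPtr
                mov     al, MyBuffer.Data[bx]
                dec     MyBuffer.Count
                inc     MyBuffer.OutPtr
                cmp     MyBuffer.OutPtr, MaxBufSize
                jb      NoWrap
                mov     MyBuffer.OutPtr, 0

NoWrap:         mov     Flag, 1         ;Set flag to not in use.
                pop     bx
                popf
                ret
Consumer        endp
```

The empty-buffer test now happens while the flag is held, so a second consumer cannot remove the last byte between the test and the fetch.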
One minor problem with the test and set approach to protecting a critical region is that it uses a busy-waiting loop. While the critical region is not available, the process spins in a loop waiting for its turn at the critical region. If the process that is currently in the critical region remains there for a considerable length of time (say, seconds, minutes, or hours), the process(es) waiting to enter the critical region continue to waste CPU time waiting for the flag. This, in turn, wastes CPU time that could be put to better use getting the process in the critical region through it so another process can enter. Another problem that might exist is that it is possible for one process to enter the critical region, locking other processes out, leave the critical region, do some processing, and then reenter the critical region all during the same time slice. If it turns out that the process is always in the critical region when the timer interrupt occurs, none of the other processes waiting to enter the critical region will ever do so. This is a problem known as starvation – processes waiting to enter the critical region never do so because some other process always beats them into it. One solution to these two problems is to use a synchronization object known as a semaphore. Semaphores provide an efficient and general purpose mechanism for protecting critical regions. To find out about semaphores, keep reading...
19.5.2 Semaphores

A semaphore is an object with two basic methods: wait and signal (or release). To use a semaphore, you create a semaphore variable (an instance) for a particular critical region or other resource you want to protect. When a process wants to use a given resource, it waits on the semaphore. If no other process is currently using the resource, then the wait call sets the semaphore to in-use and immediately returns to the process. At that time, the process has exclusive access to the resource. If some other process is already using the resource (e.g., is in the critical region), then the semaphore blocks the current process by moving it off the run queue and onto the semaphore queue. When the process that currently holds the resource releases it, the release operation removes the first waiting process from the semaphore queue and places it back in the run queue. At the next available time slice, that new process returns from its wait call and can enter its critical region.

Semaphores solve the two important problems with the busy-waiting loop described in the previous section. First, when a process waits and the semaphore blocks the process, that process is no longer on the run queue, so it consumes no more CPU time until the point that a release operation places it back onto the run queue. So unlike busy-waiting, the semaphore mechanism does not waste (as much) CPU time on processes that are waiting for some resource.

Semaphores can also solve the starvation problem. The wait operation, when blocking a process, can place it at the end of a FIFO semaphore queue. The release operation can fetch a new process from the front of the FIFO queue to place back on to the run queue. This policy ensures that each process entering the semaphore queue gets equal priority access to the resource (see footnote 4).

Implementing semaphores is an easy task. A semaphore generally consists of an integer variable and a queue. The system initializes the integer variable with the number of processes that may share the resource at one time (this value is usually one for critical regions and other resources requiring exclusive access). The wait operation decrements this variable. If the result is greater than or equal to zero, the wait function simply returns to the caller; if the result is less than zero, the wait function saves the machine state, moves the process’ pcb from the run queue to the semaphore’s queue, and then switches the CPU to a different process (i.e., a yield call).

The release function is almost the converse. It increments the integer value. If the result is still less than or equal to zero, at least one process is waiting, so the release function moves a pcb from the front of the semaphore queue to the run queue. If the result is positive (one, for a semaphore that grants exclusive access), there are no more processes on the semaphore queue, so the release function simply returns to the caller. Note that the release function does not activate the process it removes from the semaphore process queue. It simply places that process in the run queue. Control always returns to the process that made the release call (unless, of course, a timer interrupt occurs while executing the release function).

Of course, any time you manipulate the system’s run queue you are in a critical region. Therefore, we seem to have a minor problem here – the whole purpose of a semaphore is to protect a critical region, yet the semaphore itself has a critical region we need to protect. This seems to involve circular reasoning. However, this problem is easily solved. Remember, the main reasons we do not turn off interrupts to protect a critical region are that the critical region may take a long time to execute or it may call other routines that turn the interrupts back on. The critical section in a semaphore is very short and does not call any other routines. Therefore, briefly turning off the interrupts while in the semaphore’s critical region is perfectly reasonable.

If you are not allowed to turn off interrupts, you can always use a test and set instruction in a loop to protect a critical region. Although this introduces a busy-waiting loop, it turns out that you will never wait more than two time slices before exiting the busy-waiting loop, so you do not waste much CPU time waiting to enter the semaphore’s critical region.

Although semaphores solve the two major problems with the busy-waiting loop, it is very easy to get into trouble when using semaphores. For example, if a process waits on a semaphore, the semaphore grants it exclusive access to the associated resource, and that process then never releases the semaphore, any processes waiting on that semaphore will be suspended indefinitely. Likewise, any process that waits on the same semaphore twice without a release in between will suspend itself, and any other processes that wait on that semaphore, indefinitely. Any process that does not release a resource it no longer needs violates the concept of a semaphore and is a logic error in the program. There are also some problems that may develop if a process waits on multiple semaphores before releasing any. We will return to that problem in the section on deadlocks (see “Deadlock” on page 1146).
4. This FIFO policy is but one example of a release policy. You could have some other policy based on a priority scheme. However, the FIFO policy does not promote starvation.
Although we could write our own semaphore package (and there is good reason to), the Standard Library process package provides its own wait and release calls along with a definition for a semaphore variable. The next section describes those calls.
19.5.3 The UCR Standard Library Semaphore Support

The UCR Standard Library process package provides two functions to manipulate semaphore variables: WaitSemaph and RlsSemaph. These functions wait and signal a semaphore, respectively. These routines mesh with the process management facilities, making it easy to implement synchronization using semaphores in your programs. The process package provides the following definition for a semaphore data type:

semaphore       struct
SemaCnt         word    1
smaphrLst       dword   ?
endsmaphrLst    dword   ?
semaphore       ends
The SemaCnt field determines how many more processes can share a resource (if positive), or how many processes are currently waiting for the resource (if negative). By default, this field is initialized to the value one. This allows one process at a time to use the resource protected by the semaphore. Each time a process waits on a semaphore, it decrements this field. If the decremented result is positive or zero, the wait operation immediately returns. If the decremented result is negative, then the wait operation moves the current process’ pcb from the run queue to the semaphore queue defined by the smaphrLst and endsmaphrLst fields in the structure above.
Most of the time you will use the default value of one for the SemaCnt field. There are some occasions, though, when you might want to allow more than one process access to some resource. For example, suppose you’ve developed a multiplayer game that communicates between different machines using the serial communications port or a network adapter card. You might have an area in the game which has room for only two players at a time. For example, players could be racing to a particular “transporter” room in an alien space ship, but there is room for only two players in the transporter room at a time. By initializing the semaphore variable to two, rather than one, the wait operation would allow two players to continue at one time rather than just one. When the third player attempts to enter the transporter room, the WaitSemaph function would block the player from entering the room until one of the other players left (perhaps by “transporting out” of the room).

Using the WaitSemaph or RlsSemaph function is very easy; just load the es:di register pair with the address of the desired semaphore variable and issue the appropriate function call. RlsSemaph always returns immediately (assuming a timer interrupt doesn’t occur while in RlsSemaph); the WaitSemaph call returns when the semaphore will allow access to the resource it protects. Examples of these two calls appear in the next section.

Like the Standard Library coroutine and process packages, the semaphore package only preserves the 16 bit register set of the 80x86 CPU. If you want to use the 32 bit register set of the 80386 and later processors, you will need to modify the source code for the WaitSemaph and RlsSemaph functions. The code you need to change is almost identical to the code in the coroutine and process packages, so this is nearly a trivial change. Do keep in mind, though, that you will need to change this code if you use any 32 bit facilities of the 80386 and later processors.
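The transporter room scenario might be coded along the following lines. This is a sketch under stated assumptions: RoomSema is a hypothetical variable name, and the initializer {2} relies on the structure definition shown earlier, where SemaCnt is the first field, to override the default count of one.

```asm
; Sketch only: RoomSema is a hypothetical name. The initializer {2}
; overrides the default SemaCnt of one so that two processes may hold
; the semaphore at the same time.

RoomSema        semaphore {2}           ;Place this in the data segment.

; In the code that each player's process executes:

                lesi    RoomSema        ;es:di points at the semaphore.
                WaitSemaph              ;Blocks if two players are inside.

; ... the player is in the transporter room ...

                lesi    RoomSema        ;Reload es:di before releasing.
                RlsSemaph               ;Leaving; readies one waiter, if any.
```

Reloading es:di before the RlsSemaph call avoids depending on whether the intervening code preserved those registers.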
19.5.4 Using Semaphores to Protect Critical Regions

You can use semaphores to provide mutually exclusive access to any resource. For example, if several processes want to use the printer, you can create a semaphore that allows access to the printer by only one process at a time (a good example of a process that will be in the “critical region” for several minutes at a time). However, the most common task for a semaphore is to protect a critical region from reentry. Three common examples of code you need to protect from reentry are DOS calls, BIOS calls, and various Standard Library calls. Semaphores are ideal for controlling access to these functions.

To protect DOS from reentry by several different processes, you need only create a DOSsmaph variable and issue appropriate WaitSemaph and RlsSemaph calls around the call to DOS. The following sample code demonstrates how to do this.

; MULTIDOS.ASM
;
; This program demonstrates how to use semaphores to protect DOS calls.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'

DOSsmaph        semaphore {}

; Macros to wait and release the DOS semaphore:

DOSWait         macro
                push    es
                push    di
                lesi    DOSsmaph
                WaitSemaph
                pop     di
                pop     es
                endm

DOSRls          macro
                push    es
                push    di
                lesi    DOSsmaph
                RlsSemaph
                pop     di
                pop     es
                endm

; PCB for our background process:

BkgndPCB        pcb     {0,offset EndStk2, seg EndStk2}

; Data the foreground and background processes print:

StrPtrs1        dword   str1_a, str1_b, str1_c, str1_d, str1_e, str1_f
                dword   str1_g, str1_h, str1_i, str1_j, str1_k, str1_l
                dword   0

str1_a          byte    "Foreground: string 'a'",cr,lf,0
str1_b          byte    "Foreground: string 'b'",cr,lf,0
str1_c          byte    "Foreground: string 'c'",cr,lf,0
str1_d          byte    "Foreground: string 'd'",cr,lf,0
str1_e          byte    "Foreground: string 'e'",cr,lf,0
str1_f          byte    "Foreground: string 'f'",cr,lf,0
str1_g          byte    "Foreground: string 'g'",cr,lf,0
str1_h          byte    "Foreground: string 'h'",cr,lf,0
str1_i          byte    "Foreground: string 'i'",cr,lf,0
str1_j          byte    "Foreground: string 'j'",cr,lf,0
str1_k          byte    "Foreground: string 'k'",cr,lf,0
str1_l          byte    "Foreground: string 'l'",cr,lf,0

StrPtrs2        dword   str2_a, str2_b, str2_c, str2_d, str2_e, str2_f
                dword   str2_g, str2_h, str2_i
                dword   0

str2_a          byte    "Background: string 'a'",cr,lf,0
str2_b          byte    "Background: string 'b'",cr,lf,0
str2_c          byte    "Background: string 'c'",cr,lf,0
str2_d          byte    "Background: string 'd'",cr,lf,0
str2_e          byte    "Background: string 'e'",cr,lf,0
str2_f          byte    "Background: string 'f'",cr,lf,0
str2_g          byte    "Background: string 'g'",cr,lf,0
str2_h          byte    "Background: string 'h'",cr,lf,0
str2_i          byte    "Background: string 'i'",cr,lf,0

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; A replacement critical error handler. This routine calls prcsquit
; if the user decides to abort the program.

CritErrMsg      byte    cr,lf
                byte    "DOS Critical Error!",cr,lf
                byte    "A)bort, R)etry, I)gnore, F)ail? $"

MyInt24         proc    far
                push    dx
                push    ds
                push    ax

                push    cs
                pop     ds
Int24Lp:        lea     dx, CritErrMsg
                mov     ah, 9                   ;DOS print string call.
                int     21h

                mov     ah, 1                   ;DOS read character call.
                int     21h
                and     al, 5Fh                 ;Convert l.c. -> u.c.

                cmp     al, 'I'                 ;Ignore?
                jne     NotIgnore
                pop     ax
                mov     al, 0
                jmp     Quit24

NotIgnore:      cmp     al, 'R'                 ;Retry? (AL is upper case here.)
                jne     NotRetry
                pop     ax
                mov     al, 1
                jmp     Quit24

NotRetry:       cmp     al, 'A'                 ;Abort?
                jne     NotAbort
                prcsquit                        ;If quitting, fix INT 8.
                pop     ax
                mov     al, 2
                jmp     Quit24

NotAbort:       cmp     al, 'F'                 ;Fail?
                jne     BadChar
                pop     ax
                mov     al, 3
Quit24:         pop     ds
                pop     dx
                iret

BadChar:        mov     ah, 2                   ;DOS print character call.
                mov     dl, 7                   ;Bell character
                int     21h                     ;Sound the bell, then ask again.
                jmp     Int24Lp
MyInt24         endp

; We will simply disable INT 23h (the break exception).

MyInt23         proc    far
                iret
MyInt23         endp

; This background process calls DOS to print several strings to the
; screen. In the meantime, the foreground process is also printing
; strings to the screen. To prevent reentry, or at least a jumble of
; characters on the screen, this code uses semaphores to protect the
; DOS calls. Therefore, each process will print one complete line
; then release the semaphore. If the other process is waiting it will
; print its line.

BackGround      proc
                mov     ax, dseg
                mov     ds, ax
                lea     bx, StrPtrs2            ;Array of str ptrs.
PrintLoop:      cmp     word ptr [bx+2], 0      ;At end of pointers?
                je      BkGndDone
                les     di, [bx]                ;Get string to print.
                DOSWait
                puts                            ;Calls DOS to print string.
                DOSRls
                add     bx, 4                   ;Point at next str ptr.
                jmp     PrintLoop

BkGndDone:      die                             ;Terminate this process
BackGround      endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

; Initialize the INT 23h and INT 24h exception handler vectors.

                mov     ax, 0
                mov     es, ax
                mov     word ptr es:[24h*4], offset MyInt24
                mov     es:[24h*4 + 2], cs
                mov     word ptr es:[23h*4], offset MyInt23
                mov     es:[23h*4 + 2], cs

                prcsinit                        ;Start multitasking system.

                lesi    BkgndPCB                ;Fire up a new process
                fork
                test    ax, ax                  ;Parent's return?
                je      ParentPrcs
                jmp     BackGround              ;Go do background stuff.

; The parent process will print a bunch of strings at the same time
; the background process is doing this. We'll use the DOS semaphore
; to protect the call to DOS that PUTS makes.

ParentPrcs:     DOSWait                         ;Force the other process
                mov     cx, 0                   ; to wind up waiting in
DlyLp0:         loop    DlyLp0                  ; the semaphore queue by
DlyLp1:         loop    DlyLp1                  ; delaying for at least one
DlyLp2:         loop    DlyLp2                  ; clock tick.
                DOSRls

                lea     bx, StrPtrs1            ;Array of str ptrs.
PrintLoop:      cmp     word ptr [bx+2], 0      ;At end of pointers?
                je      ForeGndDone
                les     di, [bx]                ;Get string to print.
                DOSWait
                puts                            ;Calls DOS to print string.
                DOSRls
                add     bx, 4                   ;Point at next str ptr.
                jmp     PrintLoop

ForeGndDone:    prcsquit

Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'

; Here is the stack for the background process we start

stk2            byte    1024 dup (?)
EndStk2         word    ?

; Here's the stack for the main program/foreground process.

stk             byte    1024 dup (?)
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
This program doesn’t directly call DOS, but it calls the Standard Library puts routine that does. In general, you could use a single semaphore to protect all BIOS, DOS, and Standard Library calls. However, this is not particularly efficient. For example, the Standard Library pattern matching routines make no DOS calls; therefore, waiting on the DOS semaphore to do a pattern match while some other process is making a DOS call unnecessarily delays the pattern match. There is nothing wrong with having one process do a pattern match while another is making a DOS call. Unfortunately, some Standard Library routines do make DOS calls (puts is a good example), so you must use the DOS semaphore around such calls. In theory, we could use separate semaphores to protect DOS, different BIOS calls, and different Standard Library calls. However, keeping track of all those semaphores within a program is a big task. Furthermore, ensuring that a call to DOS does not also invoke an unprotected BIOS routine is a difficult task. So most programmers use a single semaphore to protect all Standard Library, DOS, and BIOS calls.
19.5.5 Using Semaphores for Barrier Synchronization

Although the primary use of semaphores is to provide exclusive access to some resource, there are other synchronization uses for semaphores as well. In this section we’ll look at the use of the Standard Library’s semaphore objects to create a barrier. A barrier is a point in a program where a process stops and waits for other processes to synchronize (reach their respective barriers). In many respects, a barrier is the dual to a semaphore. A semaphore prevents more than n processes from gaining access to some resource. A barrier does not grant access until at least n processes are requesting access.

Given the different nature of these two synchronization methods, you might think that it would be difficult to use the WaitSemaph and RlsSemaph routines to implement barriers. However, it turns out to be quite simple. Suppose we were to initialize the semaphore’s SemaCnt field to zero rather than one. When the first process waits on this semaphore, the system will immediately block that process. Likewise, each additional process that waits on this semaphore will block and wait on the semaphore queue. This would normally be a disaster since there is no active process that will signal the semaphore to activate the blocked processes. However, if we modify the wait call so that it checks the SemaCnt field before actually doing the wait, the nth process can skip the wait call and reactivate the other processes. Consider the following macro:

barrier         macro   Wait4Cnt
                local   AllHere, AllDone
                cmp     es:[di].semaphore.SemaCnt, -(Wait4Cnt-1)
                jle     AllHere
                WaitSemaph
                cmp     es:[di].semaphore.SemaCnt, 0
                je      AllDone
AllHere:        RlsSemaph
AllDone:
                endm
This macro expects a single parameter that should be the number of processes (including the current process) that need to be at a barrier before any of the processes can proceed. The SemaCnt field is a negative number whose absolute value determines how many processes are currently waiting on the semaphore. If a barrier requires four processes, no process can proceed until the fourth process hits the barrier; at that time the SemaCnt field will contain minus three. The macro above computes what the value of SemaCnt should be if all processes are at the barrier. If SemaCnt matches this value, the macro signals the semaphore, which begins a chain of operations with each blocked process releasing the next. When SemaCnt hits zero, the last blocked process does not release the semaphore since there are no other processes waiting on the queue.

It is very important to remember to initialize the SemaCnt field to zero before using semaphores for barrier synchronization in this manner. If you do not initialize SemaCnt to zero, the WaitSemaph call will probably not block any of the processes.

The following sample program provides a simple example of barrier synchronization using the Standard Library’s semaphore package:

; BARRIER.ASM
;
; This sample program demonstrates how to use the Standard Library's
; semaphore objects to synchronize several processes at a barrier.
; This program is similar to the MULTIDOS.ASM program insofar as the
; background processes all print a set of strings. However, rather than
; using an inelegant delay loop to synchronize the foreground and
; background processes, this code uses barrier synchronization to
; achieve this.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'

BarrierSemaph   semaphore {0}                   ;Must init SemaCnt to zero.
DOSsmaph        semaphore {}

; Macros to wait and release the DOS semaphore:

DOSWait         macro
                push    es
                push    di
                lesi    DOSsmaph
                WaitSemaph
                pop     di
                pop     es
                endm

DOSRls          macro
                push    es
                push    di
                lesi    DOSsmaph
                RlsSemaph
                pop     di
                pop     es
                endm

; Macro to synchronize on a barrier:

Barrier         macro   Wait4Cnt
                local   AllHere, AllDone
                cmp     es:[di].semaphore.SemaCnt, -(Wait4Cnt-1)
                jle     AllHere
                WaitSemaph
                cmp     es:[di].semaphore.SemaCnt, 0
                jge     AllDone
AllHere:        RlsSemaph
AllDone:
                endm

; PCBs for our background processes:

BkgndPCB2       pcb     {0,offset EndStk2, seg EndStk2}
BkgndPCB3       pcb     {0,offset EndStk3, seg EndStk3}

; Data the foreground and background processes print:

StrPtrs1        dword   str1_a, str1_b, str1_c, str1_d, str1_e, str1_f
                dword   str1_g, str1_h, str1_i, str1_j, str1_k, str1_l
                dword   0

str1_a          byte    "Foreground: string 'a'",cr,lf,0
str1_b          byte    "Foreground: string 'b'",cr,lf,0
str1_c          byte    "Foreground: string 'c'",cr,lf,0
str1_d          byte    "Foreground: string 'd'",cr,lf,0
str1_e          byte    "Foreground: string 'e'",cr,lf,0
str1_f          byte    "Foreground: string 'f'",cr,lf,0
str1_g          byte    "Foreground: string 'g'",cr,lf,0
str1_h          byte    "Foreground: string 'h'",cr,lf,0
str1_i          byte    "Foreground: string 'i'",cr,lf,0
str1_j          byte    "Foreground: string 'j'",cr,lf,0
str1_k          byte    "Foreground: string 'k'",cr,lf,0
str1_l          byte    "Foreground: string 'l'",cr,lf,0

StrPtrs2        dword   str2_a, str2_b, str2_c, str2_d, str2_e, str2_f
                dword   str2_g, str2_h, str2_i
                dword   0

str2_a          byte    "Background 1: string 'a'",cr,lf,0
str2_b          byte    "Background 1: string 'b'",cr,lf,0
str2_c          byte    "Background 1: string 'c'",cr,lf,0
str2_d          byte    "Background 1: string 'd'",cr,lf,0
str2_e          byte    "Background 1: string 'e'",cr,lf,0
str2_f          byte    "Background 1: string 'f'",cr,lf,0
str2_g          byte    "Background 1: string 'g'",cr,lf,0
str2_h          byte    "Background 1: string 'h'",cr,lf,0
str2_i          byte    "Background 1: string 'i'",cr,lf,0

StrPtrs3        dword   str3_a, str3_b, str3_c, str3_d, str3_e, str3_f
                dword   str3_g, str3_h, str3_i
                dword   0

str3_a          byte    "Background 2: string 'j'",cr,lf,0
str3_b          byte    "Background 2: string 'k'",cr,lf,0
str3_c          byte    "Background 2: string 'l'",cr,lf,0
str3_d          byte    "Background 2: string 'm'",cr,lf,0
str3_e          byte    "Background 2: string 'n'",cr,lf,0
str3_f          byte    "Background 2: string 'o'",cr,lf,0
str3_g          byte    "Background 2: string 'p'",cr,lf,0
str3_h          byte    "Background 2: string 'q'",cr,lf,0
str3_i          byte    "Background 2: string 'r'",cr,lf,0

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; A replacement critical error handler. This routine calls prcsquit
; if the user decides to abort the program.

CritErrMsg      byte    cr,lf
                byte    "DOS Critical Error!",cr,lf
                byte    "A)bort, R)etry, I)gnore, F)ail? $"

MyInt24         proc    far
                push    dx
                push    ds
                push    ax

                push    cs
                pop     ds
Int24Lp:        lea     dx, CritErrMsg
                mov     ah, 9                   ;DOS print string call.
                int     21h

                mov     ah, 1                   ;DOS read character call.
                int     21h
                and     al, 5Fh                 ;Convert l.c. -> u.c.

                cmp     al, 'I'                 ;Ignore?
                jne     NotIgnore
                pop     ax
                mov     al, 0
                jmp     Quit24

NotIgnore:      cmp     al, 'R'                 ;Retry? (AL is upper case here.)
                jne     NotRetry
                pop     ax
                mov     al, 1
                jmp     Quit24

NotRetry:       cmp     al, 'A'                 ;Abort?
                jne     NotAbort
                prcsquit                        ;If quitting, fix INT 8.
                pop     ax
                mov     al, 2
                jmp     Quit24

NotAbort:       cmp     al, 'F'                 ;Fail?
                jne     BadChar
                pop     ax
                mov     al, 3
Quit24:         pop     ds
                pop     dx
                iret

BadChar:        mov     ah, 2                   ;DOS print character call.
                mov     dl, 7                   ;Bell character
                int     21h                     ;Sound the bell, then ask again.
                jmp     Int24Lp
MyInt24         endp

; We will simply disable INT 23h (the break exception).

MyInt23         proc    far
                iret
MyInt23         endp

; These background processes call DOS to print several strings to the
; screen. In the meantime, the foreground process is also printing
; strings to the screen. To prevent reentry, or at least a jumble of
; characters on the screen, this code uses semaphores to protect the
; DOS calls. Therefore, each process will print one complete line
; then release the semaphore. If another process is waiting it will
; print its line.

BackGround1     proc
                mov     ax, dseg
                mov     ds, ax

; Wait for everyone else to get ready:

                lesi    BarrierSemaph
                barrier 3

; Okay, start printing the strings:

                lea     bx, StrPtrs2            ;Array of str ptrs.
PrintLoop:      cmp     word ptr [bx+2], 0      ;At end of pointers?
                je      BkGndDone
                les     di, [bx]                ;Get string to print.
                DOSWait
                puts                            ;Calls DOS to print string.
                DOSRls
                add     bx, 4                   ;Point at next str ptr.
                jmp     PrintLoop

BkGndDone:      die
BackGround1     endp

BackGround2     proc
                mov     ax, dseg
                mov     ds, ax

                lesi    BarrierSemaph
                barrier 3

                lea     bx, StrPtrs3            ;Array of str ptrs.
PrintLoop:      cmp     word ptr [bx+2], 0      ;At end of pointers?
                je      BkGndDone
                les     di, [bx]                ;Get string to print.
                DOSWait
                puts                            ;Calls DOS to print string.
                DOSRls
                add     bx, 4                   ;Point at next str ptr.
                jmp     PrintLoop

BkGndDone:      die
BackGround2     endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

; Initialize the INT 23h and INT 24h exception handler vectors.

                mov     ax, 0
                mov     es, ax
                mov     word ptr es:[24h*4], offset MyInt24
                mov     es:[24h*4 + 2], cs
                mov     word ptr es:[23h*4], offset MyInt23
                mov     es:[23h*4 + 2], cs

                prcsinit                        ;Start multitasking system.

; Start the first background process:

                lesi    BkgndPCB2               ;Fire up a new process
                fork
                test    ax, ax                  ;Parent's return?
                je      StartBG2
                jmp     BackGround1             ;Go do background stuff.

; Start the second background process:

StartBG2:       lesi    BkgndPCB3               ;Fire up a new process
                fork
                test    ax, ax                  ;Parent's return?
                je      ParentPrcs
                jmp     BackGround2             ;Go do background stuff.

; The parent process will print a bunch of strings at the same time
; the background processes are doing this. We'll use the DOS semaphore
; to protect the call to DOS that PUTS makes.

ParentPrcs:     lesi    BarrierSemaph
                barrier 3

                lea     bx, StrPtrs1            ;Array of str ptrs.
PrintLoop:      cmp     word ptr [bx+2], 0      ;At end of pointers?
                je      ForeGndDone
                les     di, [bx]                ;Get string to print.
                DOSWait
                puts                            ;Calls DOS to print string.
                DOSRls
                add     bx, 4                   ;Point at next str ptr.
                jmp     PrintLoop

ForeGndDone:    prcsquit

Quit:           ExitPgm                         ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'

; Here are the stacks for the background processes we start

stk2            byte    1024 dup (?)
EndStk2         word    ?

stk3            byte    1024 dup (?)
EndStk3         word    ?

; Here's the stack for the main program/foreground process.

stk             byte    1024 dup (?)
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
Sample Output:

Background 1: string 'a'
Background 1: string 'b'
Background 1: string 'c'
Background 1: string 'd'
Background 1: string 'e'
Background 1: string 'f'
Foreground: string 'a'
Background 1: string 'g'
Background 2: string 'j'
Foreground: string 'b'
Background 1: string 'h'
Background 2: string 'k'
Foreground: string 'c'
Background 1: string 'i'
Background 2: string 'l'
Foreground: string 'd'
Background 2: string 'm'
Foreground: string 'e'
Background 2: string 'n'
Foreground: string 'f'
Background 2: string 'o'
Foreground: string 'g'
Background 2: string 'p'
Foreground: string 'h'
Background 2: string 'q'
Foreground: string 'i'
Background 2: string 'r'
Foreground: string 'j'
Foreground: string 'k'
Foreground: string 'l'
Note how background process number one ran for one clock period before the other processes waited on the DOS semaphore. After this initial burst, the processes all took turns calling DOS.
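The SemaCnt bookkeeping behind the barrier macro of section 19.5.5 can be traced with a short Python model. This is a hedged sketch, not the library's implementation: the function name run_barrier and the process names are hypothetical, the processes are assumed to arrive strictly one after another, and a woken process is assumed to run its part of the release chain immediately.

```python
from collections import deque

def run_barrier(arrivals, wait4cnt):
    """Model the barrier macro for processes arriving in the given order.

    Returns (processes let through, processes still blocked, final SemaCnt).
    """
    sema_cnt = 0          # SemaCnt must start at zero for a barrier
    blocked = deque()     # the semaphore queue
    passed = []

    for proc in arrivals:
        if sema_cnt <= -(wait4cnt - 1):
            # Everyone else is already waiting: skip the wait and start
            # the release chain.
            passed.append(proc)
            while blocked:
                sema_cnt += 1                     # RlsSemaph
                passed.append(blocked.popleft())  # the woken process continues
                if sema_cnt >= 0:
                    break                         # nobody left to release
        else:
            sema_cnt -= 1                         # WaitSemaph blocks this process
            blocked.append(proc)
    return passed, list(blocked), sema_cnt

print(run_barrier(["bg1", "bg2", "main"], 3))
# (['main', 'bg1', 'bg2'], [], 0)
print(run_barrier(["bg1", "bg2"], 3))
# ([], ['bg1', 'bg2'], -2)
```

The second call shows why all participants must eventually reach the barrier: with only two of three arrivals, SemaCnt sits at minus two and both processes stay on the queue.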
19.6 Deadlock

Although semaphores can solve any synchronization problem, don’t get the impression that semaphores don’t introduce problems of their own. As you’ve already seen, the improper use of semaphores can result in the indefinite suspension of processes waiting on the semaphore queue. However, even if you correctly wait and signal individual semaphores, it is quite possible for correct operations on combinations of semaphores to produce this same effect. Indefinite suspension of a process because of semaphore problems is a serious issue. This degenerate situation is known as deadlock or deadly embrace.

Deadlock occurs when one process holds one resource and is waiting for another while a second process is holding that other resource and waiting for the first. To see how deadlock can occur, consider the following code:

; Process one:

                lesi    Semaph1
                WaitSemaph

        « Assume interrupt occurs here »

                lesi    Semaph2
                WaitSemaph
                 .
                 .
                 .

; Process two:

                lesi    Semaph2
                WaitSemaph
                lesi    Semaph1
                WaitSemaph
                 .
                 .
                 .

Process one grabs the semaphore associated with Semaph1. Then a timer interrupt comes along which causes a context switch to process two. Process two grabs the semaphore associated with Semaph2 and then tries to get Semaph1. However, process one is already holding Semaph1, so process two blocks and waits for process one to release this semaphore. This returns control (eventually) to process one. Process one then tries to grab Semaph2. Unfortunately, process two is already holding Semaph2, so process one blocks waiting for Semaph2. Now both processes are blocked waiting for the other. Since neither process can run, neither process can release the semaphore the other needs. Both processes are deadlocked.

One easy way to prevent deadlock from occurring is to never allow a process to hold more than one semaphore at a time. Unfortunately, this is not a practical solution; many processes may need to have exclusive access to several resources at one time. However, we can devise another solution by observing the pattern that resulted in deadlock in the previous example. Deadlock came about because the two processes grabbed different semaphores and then tried to grab the semaphore that the other was holding. In other words, they grabbed the two semaphores in a different order (process one grabbed Semaph1 first and Semaph2 second, process two grabbed Semaph2 first and Semaph1 second). It turns out that two processes will never deadlock if they wait on common semaphores in the same order. We could modify the previous example to eliminate the possibility of deadlock as follows:

; Process one:

                lesi    Semaph1
                WaitSemaph
                lesi    Semaph2
                WaitSemaph
                 .
                 .
                 .

; Process two:

                lesi    Semaph1
                WaitSemaph
                lesi    Semaph2
                WaitSemaph
                 .
                 .
                 .
Now it doesn’t matter where the interrupt occurs; deadlock cannot occur. If the interrupt occurs between the two WaitSemaph calls in process one (as before), when process two attempts to wait on Semaph1, it will block and process one will continue with Semaph2 available.

An easy way to keep out of trouble with deadlock is to number all your semaphore variables and make sure that all processes acquire (wait on) semaphores from the lowest numbered semaphore to the highest. This ensures that all processes acquire the semaphores in the same order, and that ensures that deadlock cannot occur. Note that this policy only applies to semaphores that a process holds concurrently. If a process needs semaphore six for a while, and then it needs semaphore two after it has released semaphore six, there is no problem acquiring semaphore two after releasing semaphore six. However, if at any point the process needs to hold both semaphores, it must acquire semaphore two first.

Processes may release the semaphores in any order. The order in which a process releases semaphores does not affect whether deadlock can occur. Of course, processes should always release a semaphore as soon as the process is done with the resource guarded by that semaphore; there may be other processes waiting on that semaphore.

While the above scheme is simple to implement and always works, it is by no means the only way to handle deadlock, nor is it always the most efficient. For more information on deadlocks, see a good operating systems text.
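The ordering rule can be checked with a small deterministic simulation in Python. It is a sketch under loud assumptions: the function simulate, the "plans" dictionaries, and the round-robin scheduling (one acquisition attempt per process per turn) are all hypothetical stand-ins for the timer-driven context switches described above, and each process is assumed to release everything as soon as it holds all of its semaphores.

```python
def simulate(plans):
    """plans maps a process name to the ordered list of semaphores it acquires.

    Returns "deadlock" if every unfinished process is blocked, else "ok".
    """
    owner = {}                            # semaphore -> process holding it
    next_step = {p: 0 for p in plans}     # index of the next semaphore to grab
    while True:
        progressed = False
        unfinished = False
        for proc, plan in plans.items():
            i = next_step[proc]
            if i == len(plan):
                continue                  # this process is already done
            unfinished = True
            sem = plan[i]
            if sem not in owner:          # WaitSemaph succeeds
                owner[sem] = proc
                next_step[proc] = i + 1
                progressed = True
                if next_step[proc] == len(plan):
                    for s in plan:        # done: release everything held
                        del owner[s]
        if not unfinished:
            return "ok"
        if not progressed:
            return "deadlock"

# Opposite acquisition orders, as in the first example:
print(simulate({"one": ["Semaph1", "Semaph2"],
                "two": ["Semaph2", "Semaph1"]}))   # deadlock

# Same acquisition order, as in the corrected example:
print(simulate({"one": ["Semaph1", "Semaph2"],
                "two": ["Semaph1", "Semaph2"]}))   # ok
```

Sorting every plan by semaphore number before running it (for instance with sorted(plan)) is the model's equivalent of the numbering policy: all processes then request shared semaphores in the same order.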
19.7 Summary

Despite the fact that DOS is not reentrant and doesn’t directly support multitasking, that doesn’t mean your applications can’t multitask; it’s just difficult to get different applications to run independently of one another under DOS.

Although DOS doesn’t switch among different programs in memory, DOS certainly allows you to load multiple programs into memory at one time. The only catch is that only one such program actually executes. DOS provides several calls to load and execute “.EXE” and “.COM” files from the disk. These processes effectively behave like subroutine calls, with control returning to the program invoking such a program only after that “child” program terminates. For more details, see

•   “DOS Processes” on page 1065
•   “Child Processes in DOS” on page 1065
•   “Load and Execute” on page 1066
•   “Load Program” on page 1068
•   “Loading Overlays” on page 1069
•   “Terminating a Process” on page 1069
•   “Obtaining the Child Process Return Code” on page 1070
Certain errors can occur during the execution of a DOS process that transfer control to exception handlers. Besides the 80x86 exceptions, DOS’ break handler and critical error handler are the primary examples. Any program that patches the interrupt vectors should provide its own exception handlers for these conditions so it can restore interrupts on a ctrl-C or I/O error exception. Furthermore, well-written programs always provide replacement exception handlers for these two conditions that provide better support than the default DOS handlers. For more information on DOS exceptions, see

•   “Exception Handling in DOS: The Break Handler” on page 1070
•   “Exception Handling in DOS: The Critical Error Handler” on page 1071
•   “Exception Handling in DOS: Traps” on page 1075
When a parent process invokes a child process with the LOAD or LOADEXEC calls, the child process inherits all open files from the parent process. In particular, the child process inherits the standard input, standard output, standard error, auxiliary I/O, and printer devices. The parent process can easily redirect I/O to/from these devices before passing control to a child process. This, in effect, redirects the I/O during the execution of the child process. For more details, see

•   “Redirection of I/O for Child Processes” on page 1075
When two DOS programs want to communicate with each other, they typically read and write data to a file. However, creating, opening, reading, and writing files is a lot of work, especially just to share a few variable values. A better alternative is to use shared memory. Unfortunately, DOS does not provide support to allow two programs to share a common block of memory. However, it is very easy to write a TSR that manages shared memory for various programs. For details and the complete code to two shared memory managers, see

•   “Shared Memory” on page 1078
•   “Static Shared Memory” on page 1078
•   “Dynamic Shared Memory” on page 1088
A coroutine call is the basic mechanism for switching control between two processes. A “cocall” operation is the equivalent of a subroutine call and return all rolled into one operation. A cocall transfers control to some other process. When some other process returns control to a coroutine (via cocall), control resumes with the first instruction after the cocall code. The UCR Standard Library provides complete coroutine support so you can easily put coroutines into your assembly language programs. For all the details on coroutines, plus a neat maze generator program that uses coroutines, see

•   “Coroutines” on page 1103
Although you can use coroutines to simulate multitasking (“cooperative multitasking”), the major problem with coroutines is that each application must decide when to switch to another process via a cocall. Although this eliminates certain reentrancy and synchronization problems, deciding when and where to make such calls increases the work necessary to write multitasking applications. A better approach is to use preemptive multitasking where the timer interrupt performs the context switches. Reentrancy and synchronization problems develop in such a system, but with care those problems are easily overcome. For the details on true preemptive multitasking, and to see how the UCR Standard Library supports multitasking, see

•   “Multitasking” on page 1124
•   “Lightweight and HeavyWeight Processes” on page 1124
•   “The UCR Standard Library Processes Package” on page 1125
•   “Problems with Multitasking” on page 1126
•   “A Sample Program with Threads” on page 1127
Preemptive multitasking opens up a Pandora’s box. Although multitasking makes certain programs easier to implement, the problems of process synchronization and reentrancy rear their ugly heads in a multitasking system. Many processes require some sort of synchronized access to global variables. Further, most processes will need to call DOS, BIOS, or some other routine (e.g., the Standard Library) that is not reentrant. Somehow we need to control access to such code so that multiple processes do not adversely affect one another. Synchronization is achievable using several different techniques. In some simple cases we can simply turn off the interrupts, eliminating the reentrancy problems. In other cases we can use test and set or semaphores to protect a critical region. For more details on these synchronization operations, see

•   “Synchronization” on page 1129
•   “Atomic Operations, Test & Set, and Busy-Waiting” on page 1132
•   “Semaphores” on page 1134
•   “The UCR Standard Library Semaphore Support” on page 1136
•   “Using Semaphores to Protect Critical Regions” on page 1136
•   “Using Semaphores for Barrier Synchronization” on page 1140
The use of synchronization objects, like semaphores, can introduce new problems into a system. Deadlock is a perfect example. Deadlock occurs when one process is holding some resource and wants another, and a second process is holding the desired resource and wants the resource held by the first process (or, more generally, when everyone in some chain of processes is holding something that another process in the chain wants). You can easily avoid deadlock by controlling the order in which the various processes acquire groups of semaphores. For all the details, see

•   “Deadlock” on page 1146
The PC Keyboard
Chapter 20
The PC’s keyboard is the primary human input device on the system. Although it seems rather mundane, the keyboard is the primary input device for most software, so learning how to program the keyboard properly is very important to application developers. IBM and countless keyboard manufacturers have produced numerous keyboards for PCs and compatibles. Most modern keyboards provide at least 101 different keys and are reasonably compatible with the IBM PC/AT 101 Key Enhanced Keyboard. Those that do provide extra keys generally program those keys to emit a sequence of other keystrokes or allow the user to program a sequence of keystrokes on the extra keys. Since the 101 key keyboard is ubiquitous, we will assume its use in this chapter. When IBM first developed the PC, they used a very simple interface between the keyboard and the computer. When IBM introduced the PC/AT, they completely redesigned the keyboard interface. Since the introduction of the PC/AT, almost every keyboard has conformed to the PC/AT standard. Even when IBM introduced the PS/2 systems, the changes to the keyboard interface were minor and upwards compatible with the PC/AT design. Therefore, this chapter will also limit its attention to PC/AT compatible devices since so few PC/XT keyboards and systems are still in use. There are five main components to the keyboard we will consider in this chapter – basic keyboard information, the DOS interface, the BIOS interface, the int 9 keyboard interrupt service routine, and the hardware interface to the keyboard. The last section of this chapter will discuss how to fake keyboard input into an application.
20.1 Keyboard Basics

The PC’s keyboard is a computer system in its own right. Buried inside the keyboard’s case is an 8042 microcontroller chip that constantly scans the switches on the keyboard to see if any keys are down. This processing goes on in parallel with the normal activities of the PC, hence the keyboard never misses a keystroke because the 80x86 in the PC is busy.

A typical keystroke starts with the user pressing a key on the keyboard. This closes an electrical contact in the switch so the microcontroller can sense that you’ve pressed the switch. Alas, switches (being the mechanical things that they are) do not always close (make contact) so cleanly. Often, the contacts bounce off one another several times before coming to rest and making a solid contact. If the microcontroller chip reads the switch constantly, these bouncing contacts will look like a very quick series of key presses and releases. This could generate multiple keystrokes to the main computer, a phenomenon known as keybounce, common to many cheap and old keyboards. But even on the most expensive and newest keyboards, keybounce is a problem if you look at the switch a million times a second; mechanical switches simply cannot settle down that quickly. Most keyboard scanning algorithms, therefore, control how often they scan the keyboard. A typical inexpensive key will settle down within five milliseconds, so if the keyboard scanning software only looks at the key every ten milliseconds or so, the controller will effectively miss the keybounce (see footnote 1).

Simply noting that a key is pressed is not sufficient reason to generate a key code. A user may hold a key down for many tens of milliseconds before releasing it. The keyboard controller must not generate a new key sequence every time it scans the keyboard and finds a key held down. Instead, it should generate a single key code value when the key goes from an up position to the down position (a down key operation).
Upon detecting a down key stroke, the microcontroller sends a keyboard scan code to the PC. The scan code is not related to the ASCII code for that key, it is an arbitrary value IBM chose when they first developed the PC’s keyboard.
1. A typical user cannot type 100 characters/sec nor reliably press a key for less than 1/50th of a second, so scanning the keyboard at 10 msec intervals will not lose any keystrokes.
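The effect of the scan interval on keybounce, and the rule of reporting only up-to-down transitions, can be illustrated with a short Python sketch. The signal below is made up for illustration (five milliseconds of bounce, then a solid contact), and the function names are hypothetical; this is not the keyboard controller's actual firmware.

```python
def down_key_events(samples):
    """Return the sample indexes at which the key goes from up to down."""
    events = []
    previous = False
    for i, closed in enumerate(samples):
        if closed and not previous:
            events.append(i)        # one key code per up-to-down transition
        previous = closed
    return events

def sample(signal_ms, period_ms):
    """Look at a per-millisecond switch signal once every period_ms."""
    return signal_ms[::period_ms]

# One key press: the contacts bounce for 5 ms, stay closed for 45 ms,
# then the key is released for 10 ms.
signal = [True, False, True, False, True] + [True] * 45 + [False] * 10

print(len(down_key_events(sample(signal, 1))))    # 3 (bounce seen as 3 presses)
print(len(down_key_events(sample(signal, 10))))   # 1 (bounce is never seen)
```

Note that the 10 ms scan still reports only one event for the 45 ms during which the key is held, because down_key_events reacts to the transition rather than to the key being down.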
The PC keyboard actually generates two scan codes for every key you press. It generates a down code when you press a key and an up code when you release the key. The 8042 microcontroller chip transmits these scan codes to the PC where they are processed by the keyboard’s interrupt service routine. Having separate up and down codes is important because certain keys (like shift, control, and alt) are only meaningful when held down. By generating up codes for all the keys, the keyboard ensures that the keyboard interrupt service routine knows which keys are pressed while the user is holding down one of these modifier keys. The following table lists the scan codes that the keyboard microcontroller transmits to the PC:
Table 72: PC Keyboard Scan Codes (in hex)

Key      Down  Up    Key      Down  Up    Key      Down  Up    Key      Down               Up
Esc      1     81    [{       1A    9A    ,<       33    B3    center   4C                 CC
1!       2     82    ]}       1B    9B    .>       34    B4    right    4D                 CD
2@       3     83    Enter    1C    9C    /?       35    B5    +        4E                 CE
3#       4     84    Ctrl     1D    9D    R shift  36    B6    end      4F                 CF
4$       5     85    A        1E    9E    * PrtSc  37    B7    down     50                 D0
5%       6     86    S        1F    9F    alt      38    B8    pgdn     51                 D1
6^       7     87    D        20    A0    space    39    B9    ins      52                 D2
7&       8     88    F        21    A1    CAPS     3A    BA    del      53                 D3
8*       9     89    G        22    A2    F1       3B    BB    /        E0 35              B5
9(       0A    8A    H        23    A3    F2       3C    BC    enter    E0 1C              9C
0)       0B    8B    J        24    A4    F3       3D    BD    F11      57                 D7
-_       0C    8C    K        25    A5    F4       3E    BE    F12      58                 D8
=+       0D    8D    L        26    A6    F5       3F    BF    ins      E0 52              D2
Bksp     0E    8E    ;:       27    A7    F6       40    C0    del      E0 53              D3
Tab      0F    8F    '"       28    A8    F7       41    C1    home     E0 47              C7
Q        10    90    `~       29    A9    F8       42    C2    end      E0 4F              CF
W        11    91    L shift  2A    AA    F9       43    C3    pgup     E0 49              C9
E        12    92    \|       2B    AB    F10      44    C4    pgdn     E0 51              D1
R        13    93    Z        2C    AC    NUM      45    C5    left     E0 4B              CB
T        14    94    X        2D    AD    SCRL     46    C6    right    E0 4D              CD
Y        15    95    C        2E    AE    home     47    C7    up       E0 48              C8
U        16    96    V        2F    AF    up       48    C8    down     E0 50              D0
I        17    97    B        30    B0    pgup     49    C9    R alt    E0 38              B8
O        18    98    N        31    B1    -        4A    CA    R ctrl   E0 1D              9D
P        19    99    M        32    B2    left     4B    CB    Pause    E1 1D 45 E1 9D C5  -
The keys in italics are found on the numeric keypad. Note that certain keys transmit two or more scan codes to the system. The keys that transmit more than one scan code were new keys added to the keyboard when IBM designed the 101 key enhanced keyboard.
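Apart from the E0h and E1h prefixed sequences, every up code in the table is the corresponding down code with its H.O. bit set. The following is a minimal sketch of splitting a raw scan code on that basis. The procedure name SplitScan is hypothetical (it is not a BIOS or Standard Library routine), and the fragment assumes it sits inside a code segment such as the cseg segment used elsewhere in this chapter.

```asm
; SplitScan-   Hypothetical helper, shown only as an illustration.
;              Entry: AL = raw scan code byte (not an E0h/E1h prefix).
;              Exit:  AL = down code for the key (bit 7 cleared).
;                     Carry set   = the byte was an up code.
;                     Carry clear = the byte was a down code.

SplitScan       proc    near
                test    al, 80h         ;H.O. bit set means an up code.
                jz      IsDown
                and     al, 7Fh         ;Strip the up flag: 9Eh becomes 1Eh.
                stc                     ;Report "key released".
                ret

IsDown:         clc                     ;Report "key pressed".
                ret
SplitScan       endp
```

For example, passing 9Eh (the up code for the A key) would return 1Eh with the carry flag set.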
The PC Keyboard
When the scan code arrives at the PC, a second microcontroller chip receives the scan code, does a conversion on the scan code[2], makes the scan code available at I/O port 60h, and then interrupts the processor, leaving it up to the keyboard ISR to fetch the scan code from the I/O port.

The keyboard (int 9) interrupt service routine reads the scan code from the keyboard input port and processes the scan code as appropriate. Note that the scan code the system receives from the keyboard microcontroller is a single value, even though some keys on the keyboard represent up to four different values. For example, the “A” key on the keyboard can produce A, a, ctrl-A, or alt-A. The actual code the system yields depends upon the current state of the modifier keys (shift, ctrl, alt, capslock, and numlock). For example, if an A key scan code comes along (1Eh) and the shift key is down, the system produces the ASCII code for an uppercase A. If the user is pressing multiple modifier keys the system prioritizes them from low to high as follows:

• No modifier key down
• Numlock/Capslock (same precedence, lowest priority)
• shift
• ctrl
• alt (highest priority)
Numlock and capslock affect different sets of keys[3], so there is no ambiguity resulting from their equal precedence in the above chart. If the user is pressing two modifier keys at the same time, the system only recognizes the modifier key with the highest priority above. For example, if the user is pressing the ctrl and alt keys at the same time, the system only recognizes the alt key. The numlock, capslock, and shift keys are a special case. If numlock or capslock is active, pressing the shift key makes it inactive. Likewise, if numlock or capslock is inactive, pressing the shift key effectively “activates” these modifiers.

Not all modifiers are legal for every key. For example, ctrl-8 is not a legal combination. The keyboard interrupt service routine ignores all keypresses combined with illegal modifier keys. For some unknown reason, IBM decided to make certain key combinations legal and others illegal. For example, ctrl-left and ctrl-right are legal, but ctrl-up and ctrl-down are not. You’ll see how to fix this problem a little later.

The shift, ctrl, and alt keys are active modifiers. That is, modification to a keypress occurs only while the user holds down one of these modifier keys. The keyboard ISR keeps track of whether these keys are down or up by setting an associated bit upon receiving the down code and clearing that bit upon receiving the up code for shift, ctrl, or alt. In contrast, the numlock, scroll lock, and capslock keys are toggle modifiers[4]. The keyboard ISR inverts an associated bit every time it sees a down code followed by an up code for these keys.

Most of the keys on the PC’s keyboard correspond to ASCII characters. When the keyboard ISR encounters such a character, it translates it to a 16 bit value whose L.O. byte is the ASCII code and the H.O. byte is the key’s scan code. For example, pressing the “A” key with no modifier, with shift, and with control produces 1E61h, 1E41h, and 1E01h, respectively (“a”, “A”, and ctrl-A).
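The difference between active and toggle modifiers can be sketched in a few instructions. The fragment below is only an illustration, not the BIOS code: the names ModFlags and TrackMods are hypothetical, the bit positions are chosen arbitrarily for the example, and it handles just the left shift key (down code 2Ah, up code 0AAh) and the capslock key (up code 0BAh). As a simplification it inverts the capslock bit when the up code arrives, since a down code must have preceded it.

```asm
cseg            segment para public 'code'
                assume  cs:cseg, ds:cseg

; ModFlags-    Hypothetical state byte for this sketch:
;              bit 1 = left shift is down (active modifier)
;              bit 6 = capslock is on     (toggle modifier)

ModFlags        byte    0

; TrackMods-   Entry: AL = scan code from the keyboard, DS = cseg.

TrackMods       proc    near
                cmp     al, 2Ah                 ;Left shift down code?
                jne     NotLSDown
                or      ModFlags, 00000010b     ;Active: set on the down code,
                ret

NotLSDown:      cmp     al, 0AAh                ;Left shift up code?
                jne     NotLSUp
                and     ModFlags, 11111101b     ; clear on the up code.
                ret

NotLSUp:        cmp     al, 0BAh                ;Capslock up code?
                jne     TMDone
                xor     ModFlags, 01000000b     ;Toggle: invert once per
TMDone:         ret                             ; press/release pair.
TrackMods       endp

cseg            ends
```

Holding the shift key down leaves bit one set for as long as the key is held, while each complete press and release of capslock flips bit six.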
Many key sequences do not have corresponding ASCII codes. For example, the function keys, the cursor control keys, and the alt key sequences do not have corresponding ASCII codes. For these special extended codes, the keyboard ISR stores a zero in the L.O. byte (where the ASCII code typically goes) and the extended code goes in the H.O. byte. The extended code is usually, though certainly not always, the scan code for that key.

The only problem with this extended code approach is that the value zero is a legal ASCII character (the NUL character). Therefore, you cannot directly enter NUL characters into an application. If an application must input NUL characters, IBM has set aside the extended code 0300h (ctrl-3) for this purpose. Your application must explicitly convert this extended code to the NUL character (actually, it need only recognize the H.O. value 03, since the L.O. byte already is the NUL character). Fortunately, very few programs need to allow the input of the NUL character from the keyboard, so this problem is rarely an issue.

2. The keyboard doesn’t actually transmit the scan codes appearing in the previous table. Instead, it transmits its own scan code that the PC’s microcontroller translates to the scan codes in the table. Since the programmer never sees the native scan codes, we will ignore them.
3. Numlock only affects the keys on the numeric keypad; capslock only affects the alphabetic keys.
4. It turns out the INS key is also a toggle modifier, since it toggles a bit in the BIOS variable area. However, INS also returns a scan code; the other modifiers do not.

The following table lists the scan and extended key codes the keyboard ISR generates for applications in response to a keypress with various modifiers. Extended codes appear as four-digit values whose L.O. byte is 00. All other values (except the scan code column) represent the L.O. eight bits of the 16 bit code. The H.O. byte comes from the scan code column.
Table 73: Keyboard Codes (in hex)

Key       Scan  ASCII  Shift(a)  Ctrl   Alt   Num   Caps  Shift Caps  Shift Num
Esc       01    1B     1B        1B     --    1B    1B    1B          1B
1!        02    31     21        --     7800  31    31    31          31
2@        03    32     40        0300   7900  32    32    32          32
3#        04    33     23        --     7A00  33    33    33          33
4$        05    34     24        --     7B00  34    34    34          34
5%        06    35     25        --     7C00  35    35    35          35
6^        07    36     5E        1E     7D00  36    36    36          36
7&        08    37     26        --     7E00  37    37    37          37
8*        09    38     2A        --     7F00  38    38    38          38
9(        0A    39     28        --     8000  39    39    39          39
0)        0B    30     29        --     8100  30    30    30          30
-_        0C    2D     5F        1F     8200  2D    2D    5F          5F
=+        0D    3D     2B        --     8300  3D    3D    2B          2B
Bksp      0E    08     08        7F     --    08    08    08          08
Tab       0F    09     0F00      --     --    09    09    0F00        0F00
Q         10    71     51        11     1000  71    51    71          51
W         11    77     57        17     1100  77    57    77          57
E         12    65     45        05     1200  65    45    65          45
R         13    72     52        12     1300  72    52    72          52
T         14    74     54        14     1400  74    54    74          54
Y         15    79     59        19     1500  79    59    79          59
U         16    75     55        15     1600  75    55    75          55
I         17    69     49        09     1700  69    49    69          49
O         18    6F     4F        0F     1800  6F    4F    6F          4F
P         19    70     50        10     1900  70    50    70          50
[{        1A    5B     7B        1B     --    5B    5B    7B          7B
]}        1B    5D     7D        1D     --    5D    5D    7D          7D
enter     1C    0D     0D        0A     --    0D    0D    0A          0A
ctrl      1D    --     --        --     --    --    --    --          --
A         1E    61     41        01     1E00  61    41    61          41
S         1F    73     53        13     1F00  73    53    73          53
D         20    64     44        04     2000  64    44    64          44
F         21    66     46        06     2100  66    46    66          46
G         22    67     47        07     2200  67    47    67          47
H         23    68     48        08     2300  68    48    68          48
J         24    6A     4A        0A     2400  6A    4A    6A          4A
K         25    6B     4B        0B     2500  6B    4B    6B          4B
L         26    6C     4C        0C     2600  6C    4C    6C          4C
;:        27    3B     3A        --     --    3B    3B    3A          3A
'"        28    27     22        --     --    27    27    22          22
`~        29    60     7E        --     --    60    60    7E          7E
Lshift    2A    --     --        --     --    --    --    --          --
\|        2B    5C     7C        1C     --    5C    5C    7C          7C
Z         2C    7A     5A        1A     2C00  7A    5A    7A          5A
X         2D    78     58        18     2D00  78    58    78          58
C         2E    63     43        03     2E00  63    43    63          43
V         2F    76     56        16     2F00  76    56    76          56
B         30    62     42        02     3000  62    42    62          42
N         31    6E     4E        0E     3100  6E    4E    6E          4E
M         32    6D     4D        0D     3200  6D    4D    6D          4D
,<        33    2C     3C        --     --    2C    2C    3C          3C
.>        34    2E     3E        --     --    2E    2E    3E          3E
/?        35    2F     3F        --     --    2F    2F    3F          3F
Rshift    36    --     --        --     --    --    --    --          --
* PrtSc   37    2A     INT 5(b)  10(c)  --    2A    2A    INT 5       INT 5
alt       38    --     --        --     --    --    --    --          --
space     39    20     20        20     20    20    20    20          20
caps      3A    --     --        --     --    --    --    --          --
F1        3B    3B00   5400      5E00   6800  3B00  3B00  5400        5400
F2        3C    3C00   5500      5F00   6900  3C00  3C00  5500        5500
F3        3D    3D00   5600      6000   6A00  3D00  3D00  5600        5600
F4        3E    3E00   5700      6100   6B00  3E00  3E00  5700        5700
F5        3F    3F00   5800      6200   6C00  3F00  3F00  5800        5800
F6        40    4000   5900      6300   6D00  4000  4000  5900        5900
F7        41    4100   5A00      6400   6E00  4100  4100  5A00        5A00
F8        42    4200   5B00      6500   6F00  4200  4200  5B00        5B00
F9        43    4300   5C00      6600   7000  4300  4300  5C00        5C00
F10       44    4400   5D00      6700   7100  4400  4400  5D00        5D00
num       45    --     --        --     --    --    --    --          --
scrl      46    --     --        --     --    --    --    --          --
home      47    4700   37        7700   --    37    4700  37          4700
up        48    4800   38        --     --    38    4800  38          4800
pgup      49    4900   39        8400   --    39    4900  39          4900
-(d)      4A    2D     2D        --     --    2D    2D    2D          2D
left      4B    4B00   34        7300   --    34    4B00  34          4B00
center    4C    4C00   35        --     --    35    4C00  35          4C00
right     4D    4D00   36        7400   --    36    4D00  36          4D00
+(e)      4E    2B     2B        --     --    2B    2B    2B          2B
end       4F    4F00   31        7500   --    31    4F00  31          4F00
down      50    5000   32        --     --    32    5000  32          5000
pgdn      51    5100   33        7600   --    33    5100  33          5100
ins       52    5200   30        --     --    30    5200  30          5200
del       53    5300   2E        --     --    2E    5300  2E          5300

a. For the alphabetic characters, if capslock is active then see the shift-capslock column.
b. Pressing the PrtSc key does not produce a scan code. Instead, BIOS executes an int 5 instruction which should print the screen.
c. This is the control-P character that will activate the printer under MS-DOS.
d. This is the minus key on the keypad.
e. This is the plus key on the keypad.
The 101-key keyboards generally provide an enter key and a “/” key on the numeric keypad. Unless you write your own int 9 keyboard ISR, you will not be able to differentiate these keys from the ones on the main keyboard. The separate cursor control pad also generates the same extended codes as the numeric keypad, except it never generates numeric ASCII codes. Otherwise, you cannot differentiate these keys from the equivalent keys on the numeric keypad (assuming numlock is off, of course).

The keyboard ISR provides a special facility that lets you enter the ASCII code for a keystroke directly from the keyboard. To do this, hold down the alt key and type the decimal ASCII code (0..255) for a character on the numeric keypad. The keyboard ISR will convert these keystrokes to an eight-bit value, attach an H.O. byte of zero to the character, and use that as the character code.

The keyboard ISR inserts the 16 bit value into the PC’s type ahead buffer. The system type ahead buffer is a circular queue that uses the following variables:

40:1A - HeadPtr    word    ?
40:1C - TailPtr    word    ?
40:1E - Buffer     word    16 dup (?)
The keyboard ISR inserts data at the location pointed at by TailPtr. The BIOS keyboard function removes characters from the location pointed at by the HeadPtr variable. These two pointers almost always contain an offset into the Buffer array[5]. If these two pointers are equal, the type ahead buffer is empty. If the value in HeadPtr is two greater than the value in TailPtr (or HeadPtr is 1Eh and TailPtr is 3Ch), then the buffer is full and the keyboard ISR will reject any additional keystrokes.
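The empty and full tests above are two cases of one modular difference between the pointers. The following sketch counts the keystrokes currently waiting in the buffer. The procedure name KeysWaiting is hypothetical, and the code assumes the default 32-byte buffer at 40:1E; it would give wrong answers on a system where the buffer has been relocated or resized.

```asm
; KeysWaiting- Hypothetical helper, shown only as an illustration.
;              Exit: AX = number of keystrokes (0..15) waiting in the
;                         BIOS type ahead buffer.
;              Assumes the default 32-byte buffer at 40:1E.

KeysWaiting     proc    near
                push    ds
                mov     ax, 40h
                mov     ds, ax          ;DS points at the BIOS variables.

                pushf                   ;Save the interrupt flag and keep
                cli                     ; the ISR from moving TailPtr.
                mov     ax, ds:[1Ch]    ;AX = TailPtr.
                sub     ax, ds:[1Ah]    ;AX = TailPtr - HeadPtr.
                popf                    ;Restore the interrupt flag.

                and     ax, 1Fh         ;Wrap the difference modulo 32.
                shr     ax, 1           ;Each keystroke takes two bytes.
                pop     ds
                ret
KeysWaiting     endp
```

With HeadPtr equal to 3Ch and TailPtr equal to 1Eh the difference is 0FFE2h, which the mask reduces to 2, so the routine reports one waiting keystroke. When the two pointers are equal it reports zero.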
Note that the TailPtr variable always points at the next available location in the type ahead buffer. Since there is no “count” variable providing the number of entries in the buffer, we must always leave one entry free in the buffer area; this means the type ahead buffer can only hold 15 keystrokes, not 16. In addition to the type ahead buffer, the BIOS maintains several other keyboard-related variables in segment 40h. The following table lists these variables and their contents:
Table 74: Keyboard Related BIOS Variables

Name: KbdFlags1 (modifier flags)    Address(a): 40:17    Size: Byte
    This byte maintains the current status of the modifier keys on the keyboard. The bits have the following meanings:
        bit 7: Insert mode toggle
        bit 6: Capslock toggle (1=capslock on)
        bit 5: Numlock toggle (1=numlock on)
        bit 4: Scroll lock toggle (1=scroll lock on)
        bit 3: Alt key (1=alt is down)
        bit 2: Ctrl key (1=ctrl is down)
        bit 1: Left shift key (1=left shift is down)
        bit 0: Right shift key (1=right shift is down)

Name: KbdFlags2 (Toggle keys down)    Address(a): 40:18    Size: Byte
    Specifies if a toggle key is currently down.
        bit 7: Insert key (currently down if 1)
        bit 6: Capslock key (currently down if 1)
        bit 5: Numlock key (currently down if 1)
        bit 4: Scroll lock key (currently down if 1)
        bit 3: Pause state locked (ctrl-Numlock) if one
        bit 2: SysReq key (currently down if 1)
        bit 1: Left alt key (currently down if 1)
        bit 0: Left ctrl key (currently down if 1)

Name: AltKpd    Address(a): 40:19    Size: Byte
    BIOS uses this to compute the ASCII code for an alt-Keypad sequence.

Name: BufStart    Address(a): 40:80    Size: Word
    Offset of start of keyboard buffer (1Eh). Note: this variable is not supported on many systems, be careful if you use it.

Name: BufEnd    Address(a): 40:82    Size: Word
    Offset of end of keyboard buffer (3Eh). See the note above.

Name: KbdFlags3    Address(a): 40:96    Size: Byte
    Miscellaneous keyboard flags.
        bit 7: Read of keyboard ID in progress
        bit 6: Last char is first kbd ID character
        bit 5: Force numlock on reset
        bit 4: 1 if 101-key kbd, 0 if 83/84 key kbd.
        bit 3: Right alt key pressed if 1
        bit 2: Right ctrl key pressed if 1
        bit 1: Last scan code was E0h
        bit 0: Last scan code was E1h

Name: KbdFlags4    Address(a): 40:97    Size: Byte
    More miscellaneous keyboard flags.
        bit 7: Keyboard transmit error
        bit 6: Mode indicator update
        bit 5: Resend receive flag
        bit 4: Acknowledge received
        bit 3: Must always be zero
        bit 2: Capslock LED (1=on)
        bit 1: Numlock LED (1=on)
        bit 0: Scroll lock LED (1=on)

a. Addresses are all given in hexadecimal

5. It is possible to change these pointers so they point elsewhere in the 40H segment, but this is not a good idea because many applications assume that these two pointers contain a value in the range 1Eh..3Ch.
One comment is in order about KbdFlags1 and KbdFlags4. Bits zero through two of the KbdFlags4 variable are BIOS’ current settings for the LEDs on the keyboard. Periodically, BIOS compares the values for capslock, numlock, and scroll lock in KbdFlags1 against these three bits in KbdFlags4. If they do not agree, BIOS will send an appropriate command to the keyboard to update the LEDs and it will change the values in the KbdFlags4 variable so the system is consistent. Therefore, if you mask in new values for numlock, scroll lock, or caps lock, the BIOS will automatically adjust KbdFlags4 and set the LEDs accordingly.
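As a small illustration of masking in a new value, the sketch below turns capslock on by setting bit six of KbdFlags1 at 40:17. The procedure name CapsOn is hypothetical. The LED is not touched directly; the code relies on the BIOS behavior described above to bring the LED into agreement, so the LED may lag until the BIOS next compares the two variables.

```asm
; CapsOn-      Hypothetical helper, shown only as an illustration.
;              Sets the capslock toggle bit in KbdFlags1 (40:17) and
;              leaves the LED update to the BIOS.

CapsOn          proc    near
                push    es
                push    ax
                mov     ax, 40h
                mov     es, ax                          ;ES points at BIOS vars.
                or      byte ptr es:[17h], 01000000b    ;Bit 6 = capslock on.
                pop     ax
                pop     es
                ret
CapsOn          endp
```

Using and with 10111111b in place of the or instruction would turn capslock off in the same way.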
20.2
The Keyboard Hardware Interface

IBM used a very simple hardware design for the keyboard port on the original PC and PC/XT machines. When they introduced the PC/AT, IBM completely redesigned the interface between the PC and the keyboard. Since then, almost every PC model and PC clone has followed this keyboard interface standard[6]. Although IBM extended the capabilities of the keyboard controller when they introduced their PS/2 systems, the PS/2 models are still upwards compatible from the PC/AT design. Since there are so few original PCs in use today (and fewer people write original software for them), we will ignore the original PC keyboard interface and concentrate on the AT and later designs.

There are two keyboard microcontrollers that the system communicates with: one on the PC’s motherboard (the on-board microcontroller) and one inside the keyboard case (the keyboard microcontroller). Communication with the on-board microcontroller is through I/O port 64h. Reading this byte provides the status of the keyboard controller. Writing to this byte sends the on-board microcontroller a command. The organization of the status byte is
    bit 7: Parity error (1 = parity error on transmission, 0 = no error)
    bit 6: Time-out error (1 = keyboard timed out, 0 = no time out error)
    bit 5: Error detected (1 = error in transmission, 0 = no error)
    bit 4: Keyboard active (1 = enabled, 0 = disabled)
    bit 3: Command/Data Available (0 = data available at port 60h, 1 = command available at port 64h)
    bit 2: System Flag (1 = self test passed, 0 = failed)
    bit 1: Input Buffer Status (1 = full, 0 = empty)
    bit 0: Output Buffer Status (1 = full, 0 = empty)

On-Board 8042 Keyboard Microcontroller Status Byte (Read Port 64h)

Communication to the microcontroller in the keyboard unit is via the bytes at I/O addresses 60h and 64h. Bits zero and one in the status byte at port 64h provide the necessary handshaking control for these ports. Before writing any data to these ports, bit one of port 64h (the input buffer status) must be zero; data is available for reading from port 60h when bit zero of port 64h (the output buffer status) contains a one. The keyboard enable and disable bits in the command byte (port 64h) determine whether the keyboard is active and whether the keyboard will interrupt the system when the user presses (or releases) a key, etc.

Bytes written to port 60h are sent to the keyboard microcontroller and bytes written to port 64h are sent to the on-board microcontroller. Bytes read from port 60h generally come from the keyboard, although you can program the on-board microcontroller to return certain values at this port, as well. The tables that follow list the commands you can send to each microcontroller and the values you can expect back. The first of them lists the allowable commands you can write to port 64h:
Table 75: On-Board Keyboard Controller Commands (Port 64h)

Value (hex)   Description

20    Transmit keyboard controller’s command byte to system as a scan code at port 60h.
60    The next byte written to port 60h will be stored in the keyboard controller’s command byte.
A4    Test if a password is installed (PS/2 only). Result comes back in port 60h. 0FAh means a password is installed, 0F1h means no password.
A5    Transmit password (PS/2 only). Starts receipt of password. The next sequence of scan codes written to port 60h, ending with a zero byte, are the new password.
A6    Password match. Characters from the keyboard are compared to password until a match occurs.
A7    Disable mouse device (PS/2 only). Identical to setting bit five of the command byte.
A8    Enable mouse device (PS/2 only). Identical to clearing bit five of the command byte.
A9    Test mouse device. Returns 0 if okay, 1 or 2 if there is a stuck clock, 3 or 4 if there is a stuck data line. Results come back in port 60h.
AA    Initiates self-test. Returns 55h in port 60h if successful.
AB    Keyboard interface test. Tests the keyboard interface. Returns 0 if okay, 1 or 2 if there is a stuck clock, 3 or 4 if there is a stuck data line. Results come back in port 60h.
AC    Diagnostic. Returns 16 bytes from the keyboard’s microcontroller chip. Not available on PS/2 systems.
AD    Disable keyboard. Same operation as setting bit four of the command register.
AE    Enable keyboard. Same operation as clearing bit four of the command register.
C0    Read keyboard input port to port 60h. This input port contains the following values:
          bit 7: Keyboard inhibit keyswitch (0 = inhibit, 1 = enabled).
          bit 6: Display switch (0=color, 1=mono).
          bit 5: Manufacturing jumper.
          bit 4: System board RAM (always 1).
          bits 0-3: undefined.
C1    Copy input port (above) bits 0-3 to status bits 4-7. (PS/2 only)
C2    Copy input port (above) bits 4-7 to status port bits 4-7. (PS/2 only).
D0    Copy microcontroller output port value to port 60h (see definition below).
D1    Write the next data byte written to port 60h to the microcontroller output port. This port has the following definition:
          bit 7: Keyboard data.
          bit 6: Keyboard clock.
          bit 5: Input buffer empty flag.
          bit 4: Output buffer full flag.
          bit 3: Undefined.
          bit 2: Undefined.
          bit 1: Gate A20 line.
          bit 0: System reset (if zero).
      Note: writing a zero to bit zero will reset the machine. Writing a one to bit one combines address lines 19 and 20 on the PC’s address bus.
D2    Write keyboard buffer. The keyboard controller returns the next value sent to port 60h as though a keypress produced that value. (PS/2 only).
D3    Write mouse buffer. The keyboard controller returns the next value sent to port 60h as though a mouse operation produced that value. (PS/2 only).
D4    Writes the next data byte (60h) to the mouse (auxiliary) device. (PS/2 only).
E0    Read test inputs. Returns in port 60h the status of the keyboard serial lines. Bit zero contains the keyboard clock input, bit one contains the keyboard data input.
Fx    Pulse output port (see definition for D1). Bits 0-3 of the keyboard controller command byte are pulsed onto the output port. Resets the system if bit zero is a zero.

6. We will ignore the PCjr machine in this discussion.
Commands 20h and 60h let you read and write the keyboard controller command byte. This byte is internal to the on-board microcontroller and has the following layout:

    bit 7: Must be zero.
    bit 6: PC Compatibility mode (1 = translate kbd codes to PC scan codes)
    bit 5: PC/AT keyboard enable (1 = enable keyboard, 0 = no action)
           PS/2 mouse disable (1 = disable, 0 = no action)
    bit 4: Keyboard disable (1 = disable keyboard, 0 = no action)
    bit 3: PC/AT inhibit override (1 = enabled always)
           Must be zero on PS/2 systems
    bit 2: System Flag (1 = self test passed, 0 = failed)
    bit 1: Mouse device interrupt (1 = enabled, 0 = disabled)
    bit 0: Keyboard interrupt (1 = enabled, 0 = disabled)

On-Board 8042 Keyboard Microcontroller Command byte (see commands 20h and 60h)

The system transmits bytes written to I/O port 60h directly to the keyboard’s microcontroller. Bit one of the status register (the input buffer status) must contain a zero before writing any data to this port. The commands the keyboard recognizes are
Table 76: Keyboard Microcontroller Commands (Port 60h)

Value (hex)   Description

ED    Send LED bits. The next byte written to port 60h updates the LEDs on the keyboard. The parameter (next) byte contains:
          bits 3-7: Must be zero.
          bit 2: Capslock LED (1 = on, 0 = off).
          bit 1: Numlock LED (1 = on, 0 = off).
          bit 0: Scroll lock LED (1 = on, 0 = off).
EE    Echo commands. Returns 0EEh in port 60h as a diagnostic aid.
F0    Select alternate scan code set (PS/2 only). The next byte written to port 60h selects one of the following options:
          00: Report current scan code set in use (next value read from port 60h).
          01: Select scan code set #1 (standard PC/AT scan code set).
          02: Select scan code set #2.
          03: Select scan code set #3.
F2    Send two-byte keyboard ID code as the next two bytes read from port 60h (PS/2 only).
F3    Set Autorepeat delay and repeat rate. Next byte written to port 60h determines rate:
          bit 7: must be zero
          bits 5,6: Delay. 00- 1/4 sec, 01- 1/2 sec, 10- 3/4 sec, 11- 1 sec.
          bits 0-4: Repeat rate. 0- approx 30 chars/sec to 1Fh- approx 2 chars/sec.
F4    Enable keyboard.
F5    Reset to power on condition and wait for enable command.
F6    Reset to power on condition and begin scanning keyboard.
F7    Make all keys autorepeat (PS/2 only).
F8    Set all keys to generate an up code and a down code (PS/2 only).
F9    Set all keys to generate an up code only (PS/2 only).
FA    Set all keys to autorepeat and generate up and down codes (PS/2 only).
FB    Set an individual key to autorepeat. Next byte contains the scan code of the desired key. (PS/2 only).
FC    Set an individual key to generate up and down codes. Next byte contains the scan code of the desired key. (PS/2 only).
FD    Set an individual key to generate only down codes. Next byte contains the scan code of the desired key. (PS/2 only).
FE    Resend last result. Use this command if there is an error receiving data.
FF    Reset keyboard to power on state and start the self-test.
The following short program demonstrates how to send commands to the keyboard’s controller. This little TSR utility programs a “light show” on the keyboard’s LEDs.

; LEDSHOW.ASM
;
; This short TSR creates a light show on the keyboard's LEDs. For space
; reasons, this code does not implement a multiplex handler nor can you
; remove this TSR once installed. See the chapter on resident programs
; for details on how to do this.
;
; cseg and EndResident must occur before the standard library segments!

cseg            segment para public 'code'
cseg            ends

; Marker segment, to find the end of the resident section.

EndResident     segment para public 'Resident'
EndResident     ends

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

byp             equ     <byte ptr>

cseg            segment para public 'code'
                assume  cs:cseg, ds:cseg

; SetCmd-       Sends the command byte in the AL register to the 8042
;               keyboard microcontroller chip (command register at
;               port 64h).

SetCmd          proc    near
                push    cx
                push    ax              ;Save command value.
                cli                     ;Critical region, no ints now.

; Wait until the 8042 is done processing the current command.

                xor     cx, cx          ;Allow 65,536 times thru loop.
Wait4Empty:     in      al, 64h         ;Read keyboard status register.
                test    al, 10b         ;Input buffer full?
                loopnz  Wait4Empty      ;If so, wait until empty.

; Okay, send the command to the 8042:

                pop     ax              ;Retrieve command.
                out     64h, al
                sti                     ;Okay, ints can happen again.
                pop     cx
                ret
SetCmd          endp

; SendCmd-      The following routine sends a command or data byte to the
;               keyboard data port (port 60h).

SendCmd         proc    near
                push    ds
                push    bx
                push    cx
                mov     cx, 40h
                mov     ds, cx
                mov     bx, ax          ;Save data byte

                mov     al, 0ADh        ;Disable kbd for now.
                call    SetCmd

                cli                     ;Disable ints while accessing HW.

; Wait until the 8042 is done processing the current command.

                xor     cx, cx          ;Allow 65,536 times thru loop.
Wait4Empty:     in      al, 64h         ;Read keyboard status register.
                test    al, 10b         ;Input buffer full?
                loopnz  Wait4Empty      ;If so, wait until empty.

; Okay, send the data to port 60h

                mov     al, bl
                out     60h, al

                mov     al, 0AEh        ;Reenable keyboard.
                call    SetCmd
                sti                     ;Allow interrupts now.

                pop     cx
                pop     bx
                pop     ds
                ret
SendCmd         endp

; SetLEDs-      Writes the value in AL to the LEDs on the keyboard.
;               Bits 0..2 correspond to scroll, num, and caps lock,
;               respectively.

SetLEDs         proc    near
                push    ax
                push    cx

                mov     ah, al          ;Save LED bits.

                mov     al, 0EDh        ;8042 set LEDs cmd.
                call    SendCmd         ;Send the command to 8042.
                mov     al, ah          ;Get parameter byte
                call    SendCmd         ;Send parameter to the 8042.

                pop     cx
                pop     ax
                ret
SetLEDs         endp

; MyInt1C-      Every 1/4 seconds (every 4th call) this routine
;               rotates the LEDs to produce an interesting light show.

CallsPerIter    equ     4
CallCnt         byte    CallsPerIter
LEDIndex        word    LEDTable
LEDTable        byte    111b, 110b, 101b, 011b,111b, 110b, 101b, 011b
                byte    111b, 110b, 101b, 011b,111b, 110b, 101b, 011b
                byte    111b, 110b, 101b, 011b,111b, 110b, 101b, 011b
                byte    111b, 110b, 101b, 011b,111b, 110b, 101b, 011b

                byte    000b, 100b, 010b, 001b, 000b, 100b, 010b, 001b
                byte    000b, 100b, 010b, 001b, 000b, 100b, 010b, 001b
                byte    000b, 100b, 010b, 001b, 000b, 100b, 010b, 001b
                byte    000b, 100b, 010b, 001b, 000b, 100b, 010b, 001b

                byte    000b, 001b, 010b, 100b, 000b, 001b, 010b, 100b
                byte    000b, 001b, 010b, 100b, 000b, 001b, 010b, 100b
                byte    000b, 001b, 010b, 100b, 000b, 001b, 010b, 100b
                byte    000b, 001b, 010b, 100b, 000b, 001b, 010b, 100b

                byte    010b, 001b, 010b, 100b, 010b, 001b, 010b, 100b
                byte    010b, 001b, 010b, 100b, 010b, 001b, 010b, 100b
                byte    010b, 001b, 010b, 100b, 010b, 001b, 010b, 100b
                byte    010b, 001b, 010b, 100b, 010b, 001b, 010b, 100b

                byte    000b, 111b, 000b, 111b, 000b, 111b, 000b, 111b
                byte    000b, 111b, 000b, 111b, 000b, 111b, 000b, 111b
                byte    000b, 111b, 000b, 111b, 000b, 111b, 000b, 111b
                byte    000b, 111b, 000b, 111b, 000b, 111b, 000b, 111b
TableEnd        equ     this byte

OldInt1C        dword   ?

MyInt1C         proc    far
                assume  ds:cseg

                push    ds
                push    ax
                push    bx

                mov     ax, cs
                mov     ds, ax

                dec     CallCnt
                jne     NotYet
                mov     CallCnt, CallsPerIter   ;Reset call count.
                mov     bx, LEDIndex
                mov     al, [bx]
                call    SetLEDs
                inc     bx
                cmp     bx, offset TableEnd
                jne     SetTbl
                lea     bx, LEDTable
SetTbl:         mov     LEDIndex, bx
NotYet:         pop     bx
                pop     ax
                pop     ds
                jmp     cs:OldInt1C
MyInt1C         endp

Main            proc

                mov     ax, cseg
                mov     ds, ax

                print
                byte    "LED Light Show",cr,lf
                byte    "Installing....",cr,lf,0

; Patch into the INT 1Ch interrupt vector. Note that the
; statements above have made cseg the current data segment,
; so we can store the old INT 1Ch values directly into
; the OldInt1C variable.

                cli                             ;Turn off interrupts!
                mov     ax, 0
                mov     es, ax
                mov     ax, es:[1Ch*4]
                mov     word ptr OldInt1C, ax
                mov     ax, es:[1Ch*4 + 2]
                mov     word ptr OldInt1C+2, ax
                mov     es:[1Ch*4], offset MyInt1C
                mov     es:[1Ch*4+2], cs
                sti                             ;Okay, ints back on.

; We're hooked up, the only thing that remains is to terminate and
; stay resident.

                print
                byte    "Installed.",cr,lf,0

                mov     ah, 62h                 ;Get this program's PSP
                int     21h                     ; value.

                mov     dx, EndResident         ;Compute size of program.
                sub     dx, bx
                mov     ax, 3100h               ;DOS TSR command.
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
The keyboard microcontroller also sends data to the on-board microcontroller for processing and release to the system through port 60h. Most of these values are key press scan codes (up or down codes), but the keyboard transmits several other values as well. A well designed keyboard interrupt service routine should be able to handle (or at least ignore) the non-scan code values. In particular, any program that sends commands to the keyboard needs to be able to handle the resend and acknowledge commands that the keyboard microcontroller returns in port 60h. The keyboard microcontroller sends the following values to the system:
Table 77: Keyboard to System Transmissions

Value (hex)     Description

00              Data overrun. System sends a zero byte as the last value when the keyboard controller’s internal buffer overflows.
1..58, 81..D8   Scan codes for key presses. The positive values are down codes, the negative values (H.O. bit set) are up codes.
83AB            Keyboard ID code returned in response to the F2 command (PS/2 only).
AA              Returned during basic assurance test after reset. Also the up code for the left shift key.
EE              Returned by the ECHO command.
F0              Prefix to certain up codes (N/A on PS/2).
FA              Keyboard acknowledge to keyboard commands other than resend or ECHO.
FC              Basic assurance test failed (PS/2 only).
FD              Diagnostic failure (not available on PS/2).
FE              Resend. Keyboard requests the system to resend the last command.
FF              Key error (PS/2 only).
Assuming you have not disabled keyboard interrupts (see the keyboard controller command byte), any value the keyboard microcontroller sends to the system through port 60h will generate an interrupt on IRQ line one (int 9). Therefore, the keyboard interrupt service routine normally handles all the above codes. If you are patching into int 9, don’t forget to send an end of interrupt (EOI) signal to the 8259A PIC at the end of your ISR code. Also, don’t forget you can enable or disable the keyboard interrupt at the 8259A.

In general, your application software should not access the keyboard hardware directly. Doing so will probably make your software incompatible with utility software such as keyboard enhancers (keyboard macro programs), pop-up software, and other resident programs that read the keyboard or insert data into the system’s type ahead buffer. Fortunately, DOS and BIOS provide an excellent set of functions to read and write keyboard data. Your programs will be much more robust if you stick to using those functions. Accessing the keyboard hardware directly should be left to keyboard ISRs and those keyboard enhancers and pop-up programs that absolutely have to talk directly to the hardware.
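For the cases that really do need their own int 9 handler, the EOI requirement looks like the following bare-bones sketch. The names MyInt9 and LastScan are hypothetical, and the code assumes an AT-class keyboard interface and a cseg code segment set up as in the LEDSHOW program. Because this handler never passes control to the original int 9 ISR, no keystrokes reach the type ahead buffer while it is installed; a practical handler would usually either chain to the old ISR or do that work itself.

```asm
cseg            segment para public 'code'
                assume  cs:cseg

; LastScan-    Hypothetical variable: the most recent byte read
;              from port 60h.

LastScan        byte    0

; MyInt9-      Minimal replacement keyboard ISR (illustration only).
;              Reads the waiting byte, saves it, and acknowledges
;              the interrupt at the 8259A.

MyInt9          proc    far
                push    ax
                in      al, 60h         ;Fetch the byte from the 8042.
                mov     cs:LastScan, al ;DS is unknown inside an ISR.
                mov     al, 20h         ;Non-specific EOI command.
                out     20h, al         ;Send it to the 8259A PIC.
                pop     ax
                iret
MyInt9          endp

cseg            ends
```

Leaving out the two EOI instructions would leave the 8259A waiting for an acknowledgement of IRQ 1, so no further keyboard interrupts would arrive.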
20.3 The Keyboard DOS Interface

MS-DOS provides several calls to read characters from the keyboard (see “MS-DOS, PC-BIOS, and File I/O” on page 699). The primary thing to note about the DOS calls is that they only return a single byte. This means that you lose the scan code information the keyboard interrupt service routine saves in the type ahead buffer. If you press a key that has an extended code rather than an ASCII code, MS-DOS returns two keycodes. On the first call MS-DOS returns a zero value. This tells you that you must call the get character routine again. The code MS-DOS returns on the second call is the extended key code.

Note that the Standard Library routines call MS-DOS to read characters from the keyboard. Therefore, the Standard Library getc routine also returns extended keycodes in this manner. The gets and getsm routines throw away any non-ASCII keystrokes since it would not be a good thing to insert zero bytes into the middle of a zero terminated string.
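The two-call protocol described above might be wrapped as shown below. This is a minimal sketch, assuming DOS function 08h (character input without echo); any of the DOS character input calls should behave the same way for extended keys. The ReadKey routine, its AH return convention, and the surrounding program skeleton are hypothetical.

```asm
; A sketch only. ReadKey and its return convention are hypothetical.

                .model  small
                .stack  100h
                .code

; ReadKey- Reads one keystroke through MS-DOS.
;          Returns AH=0 and AL=ASCII code for an ordinary key,
;          or AH=1 and AL=extended key code for an extended key.

ReadKey         proc    near
                mov     ah, 8           ;DOS read char, no echo.
                int     21h
                mov     ah, 0           ;Assume an ordinary key.
                cmp     al, 0           ;Zero means "call me again."
                jne     RKDone
                mov     ah, 8           ;Second call returns the
                int     21h             ; extended key code in AL.
                mov     ah, 1           ;Flag it as extended.
RKDone:         ret
ReadKey         endp

Main            proc
                call    ReadKey         ;Result comes back in AX.
                mov     ax, 4C00h       ;Return to DOS.
                int     21h
Main            endp
                end     Main
```

Returning the "extended" indication in AH lets the caller tell a function key apart from an ordinary character that happens to have the same code.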
20.4 The Keyboard BIOS Interface

Although MS-DOS provides a reasonable set of routines to read ASCII and extended character codes from the keyboard, the PC’s BIOS provides much better keyboard input facilities. Furthermore, there are lots of interesting keyboard related variables in the BIOS data area you can poke around at. In general, if you do not need the I/O redirection facilities provided by MS-DOS, reading your keyboard input using BIOS functions provides much more flexibility. To call the BIOS keyboard services you use the int 16h instruction. The BIOS provides the following keyboard functions:
Table 78: BIOS Keyboard Support Functions

Function # (AH): 0
  Input Parameters: (none)
  Output Parameters: al - ASCII character, ah - scan code
  Description: Read character. Reads the next available character from the system’s type ahead buffer. Waits for a keystroke if the buffer is empty.

Function # (AH): 1
  Input Parameters: (none)
  Output Parameters: ZF - set if no key, clear if key available; al - ASCII code, ah - scan code
  Description: Checks to see if a character is available in the type ahead buffer. Sets the zero flag if no key is available, clears the zero flag if a key is available. If there is an available key, this function returns the ASCII and scan code value in ax. The value in ax is undefined if no key is available.

Function # (AH): 2
  Input Parameters: (none)
  Output Parameters: al - shift flags
  Description: Returns the current status of the shift flags in al. The shift flags are defined as follows:
    bit 7: Insert toggle
    bit 6: Capslock toggle
    bit 5: Numlock toggle
    bit 4: Scroll lock toggle
    bit 3: Alt key is down
    bit 2: Ctrl key is down
    bit 1: Left shift key is down
    bit 0: Right shift key is down

Function # (AH): 3
  Input Parameters: al = 5; bh = 0, 1, 2, 3 for 1/4, 1/2, 3/4, or 1 second delay; bl = 0..1Fh for 30/sec to 2/sec
  Output Parameters: (none)
  Description: Set auto repeat rate. The bh register contains the amount of time to wait before starting the autorepeat operation, the bl register contains the autorepeat rate.

Function # (AH): 5
  Input Parameters: ch = scan code, cl = ASCII code
  Output Parameters: (none)
  Description: Store keycode in buffer. This function stores the value in the cx register at the end of the type ahead buffer. Note that the scan code in ch doesn’t have to correspond to the ASCII code appearing in cl. This routine will simply insert the data you provide into the system type ahead buffer.

Function # (AH): 10h
  Input Parameters: (none)
  Output Parameters: al - ASCII character, ah - scan code
  Description: Read extended character. Like the ah=0 call, except this one passes all key codes; the ah=0 call throws away codes that are not PC/XT compatible.

Function # (AH): 11h
  Input Parameters: (none)
  Output Parameters: ZF - set if no key, clear if key available; al - ASCII code, ah - scan code
  Description: Like the ah=01h call except this one does not throw away keycodes that are not PC/XT compatible (i.e., the extra keys found on the 101 key keyboard).

Function # (AH): 12h
  Input Parameters: (none)
  Output Parameters: al - shift flags, ah - extended shift flags
  Description: Returns the current status of the shift flags in ax. The shift flags are defined as follows:
    bit 15: SysReq key pressed
    bit 14: Capslock key currently down
    bit 13: Numlock key currently down
    bit 12: Scroll lock key currently down
    bit 11: Right alt key is down
    bit 10: Right ctrl key is down
    bit 9: Left alt key is down
    bit 8: Left ctrl key is down
    bit 7: Insert toggle
    bit 6: Capslock toggle
    bit 5: Numlock toggle
    bit 4: Scroll lock toggle
    bit 3: Either alt key is down (some machines, left only)
    bit 2: Either ctrl key is down
    bit 1: Left shift key is down
    bit 0: Right shift key is down
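As a brief illustration of the calling convention in the table above, the following sketch polls for a key with function 1, removes it from the type ahead buffer with function 0, and then checks the shift flags with function 2. The program skeleton and the label names are hypothetical; only the int 16h functions themselves come from the table.

```asm
; A sketch only. Labels and program skeleton are hypothetical.

                .model  small
                .stack  100h
                .code

Main            proc

; Poll until a key is available (function 1 leaves the key in the buffer).

WaitKey:        mov     ah, 1
                int     16h
                jz      WaitKey         ;ZF=1 means no key yet.

; Remove the key from the type ahead buffer.

                mov     ah, 0
                int     16h             ;AL=ASCII code, AH=scan code.
                mov     bx, ax          ;Save the key code in BX.

; See if either shift key is currently down.

                mov     ah, 2
                int     16h             ;AL=shift flags.
                test    al, 00000011b   ;Bits 1 and 0: left/right shift.
                jz      NoShift

                ;A shift key is down; application specific code
                ; would go here.

NoShift:        mov     ax, 4C00h       ;Return to DOS.
                int     21h
Main            endp
                end     Main
```

A program that has other work to do would perform it inside the WaitKey loop rather than spinning on function 1.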
Note that many of these functions are not supported in every BIOS that was ever written. In fact, only the first three functions were available in the original PC. However, since the AT came along, most BIOSes have supported at least the functions above. Many BIOSes provide extra functions, and there are many TSR applications you can buy that extend this list even further. The following assembly code demonstrates how to write an int 16h TSR that provides all the functions above. You can easily extend this if you desire.

; INT16.ASM
;
; A short passive TSR that replaces the BIOS' int 16h handler.
; This routine demonstrates the function of each of the int 16h
; functions that a standard BIOS would provide.
;
; Note that this code does not patch into int 2Fh (multiplex interrupt)
; nor can you remove this code from memory except by rebooting.
; If you want to be able to do these two things (as well as check for
; a previous installation), see the chapter on resident programs. Such
; code was omitted from this program because of length constraints.
;
; cseg and EndResident must occur before the standard library segments!

cseg            segment para public 'code'
cseg            ends

; Marker segment, to find the end of the resident section.

EndResident     segment para public 'Resident'
EndResident     ends

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

byp             equ     <byte ptr>

cseg            segment para public 'code'
                assume  cs:cseg, ds:cseg

OldInt16        dword   ?

; BIOS variables (offsets into the BIOS data area at segment 40h):

KbdFlags1       equ     <ds:[17h]>
KbdFlags2       equ     <ds:[18h]>
AltKpd          equ     <ds:[19h]>
HeadPtr         equ     <ds:[1ah]>
TailPtr         equ     <ds:[1ch]>
Buffer          equ     1eh
EndBuf          equ     3eh

KbdFlags3       equ     <ds:[96h]>
KbdFlags4       equ     <ds:[97h]>

incptr          macro   which
                local   NoWrap
                add     bx, 2
                cmp     bx, EndBuf
                jb      NoWrap
                mov     bx, Buffer
NoWrap:         mov     which, bx
                endm
; MyInt16-      This routine processes the int 16h function requests.
;
;       AH      Description
;       --      ------------------------------------------------
;       00h     Get a key from the keyboard, return code in AX.
;       01h     Test for available key, ZF=1 if none, ZF=0 and
;               AX contains next key code if key available.
;       02h     Get shift status. Returns shift key status in AL.
;       03h     Set Autorepeat rate. BH=0,1,2,3 (delay time in
;               quarter seconds), BL=0..1Fh for 30 char/sec to
;               2 char/sec repeat rate.
;       05h     Store scan code (in CX) in the type ahead buffer.
;       10h     Get a key (same as 00h in this implementation).
;       11h     Test for key (same as 01h).
;       12h     Get extended key status. Returns status in AX.

MyInt16         proc    far
                test    ah, 0EFh                ;Check for 0h and 10h
                je      GetKey
                cmp     ah, 2                   ;Check for 01h and 02h
                jb      TestKey
                je      GetStatus
                cmp     ah, 3                   ;Check for AutoRpt function.
                je      SetAutoRpt
                cmp     ah, 5                   ;Check for StoreKey function.
                je      StoreKey
                cmp     ah, 11h                 ;Extended test key opcode.
                je      TestKey
                cmp     ah, 12h                 ;Extended status call
                je      ExtStatus

; Well, it's a function we don't know about, so just return to the caller.

                iret

; If the user specified ah=0 or ah=10h, come down here (we will not
; differentiate between extended and original PC getc calls).

GetKey:         mov     ah, 11h                 ;See if key is available.
                int     16h
                je      GetKey                  ;Wait for keystroke.

                push    ds
                push    bx
                mov     ax, 40h
                mov     ds, ax
                cli                             ;Critical region! Ints off.
                mov     bx, HeadPtr             ;Ptr to next character.
                mov     ax, [bx]                ;Get the character.
                incptr  HeadPtr                 ;Bump up HeadPtr
                pop     bx
                pop     ds
                iret                            ;Restores interrupt flag.
; TestKey-      Checks to see if a key is available in the keyboard buffer.
;               We need to turn interrupts on here (so the kbd ISR can
;               place a character in the buffer if one is pending).
;               Generally, you would want to save the interrupt flag here.
;               But BIOS always forces interrupts on, so there may be some
;               programs out there that depend on this, so we won't "fix"
;               this problem.
;
;               Returns key status in ZF and AX. If ZF=1 then no key is
;               available and the value in AX is indeterminate. If ZF=0
;               then a key is available and AX contains the scan/ASCII
;               code of the next available key. This call does not remove
;               the next character from the input buffer.

TestKey:        sti                             ;Turn on the interrupts.
                push    ds
                push    bx
                mov     ax, 40h
                mov     ds, ax
                cli                             ;Critical region, ints off!
                mov     bx, HeadPtr
                mov     ax, [bx]                ;BIOS returns avail keycode.
                cmp     bx, TailPtr             ;ZF=1, if empty buffer
                pop     bx
                pop     ds
                sti                             ;Ints back on.
                retf    2                       ;Pop flags (ZF is important!)
; The GetStatus call simply returns the KbdFlags1 variable in AL.

GetStatus:      push    ds
                mov     ax, 40h
                mov     ds, ax
                mov     al, KbdFlags1           ;Just return Std Status.
                pop     ds
                iret
; StoreKey-     Inserts the value in CX into the type ahead buffer.

StoreKey:       push    ds
                push    bx
                mov     ax, 40h
                mov     ds, ax
                cli                             ;Ints off, critical region.
                mov     bx, TailPtr             ;Address where we can put
                push    bx                      ; next key code.
                mov     [bx], cx                ;Store the key code away.
                incptr  TailPtr                 ;Move on to next entry in buf.
                cmp     bx, HeadPtr             ;Data overrun?
                jne     StoreOkay               ;If not, jump, if so
                pop     TailPtr                 ; ignore key entry.
                sub     sp, 2                   ;So stack matches alt path.
StoreOkay:      add     sp, 2                   ;Remove junk data from stk.
                pop     bx
                pop     ds
                iret                            ;Restores interrupts.
; ExtStatus-    Retrieve the extended keyboard status and return it in
;               AH, also returns the standard keyboard status in AL.

ExtStatus:      push    ds
                mov     ax, 40h
                mov     ds, ax

                mov     ah, KbdFlags2
                and     ah, 7Fh                 ;Clear final sysreq field.
                test    ah, 100b                ;Test cur sysreq bit.
                je      NoSysReq                ;Skip if it's zero.
                or      ah, 80h                 ;Set final sysreq bit.

NoSysReq:       and     ah, 0F0h                ;Clear alt/ctrl bits.
                mov     al, KbdFlags3
                and     al, 1100b               ;Grab rt alt/ctrl bits.
                or      ah, al                  ;Merge into AH.
                mov     al, KbdFlags2
                and     al, 11b                 ;Grab left alt/ctrl bits.
                or      ah, al                  ;Merge into AH.

                mov     al, KbdFlags1           ;AL contains normal flags.
                pop     ds
                iret
; SetAutoRpt-   Sets the autorepeat rate. On entry, bh=0, 1, 2, or 3 (delay
;               in 1/4 sec before autorepeat starts) and bl=0..1Fh (repeat
;               rate, about 2:1 to 30:1 (chars:sec)).

SetAutoRpt:     push    cx
                push    bx

                mov     al, 0ADh                ;Disable kbd for now.
                call    SetCmd

                and     bh, 11b                 ;Force into proper range.
                mov     cl, 5
                shl     bh, cl                  ;Move to final position.
                and     bl, 1Fh                 ;Force into proper range.
                or      bh, bl                  ;8042 command data byte.
                mov     al, 0F3h                ;8042 set repeat rate cmd.
                call    SendCmd                 ;Send the command to 8042.
                mov     al, bh                  ;Get parameter byte
                call    SendCmd                 ;Send parameter to the 8042.

                mov     al, 0AEh                ;Reenable keyboard.
                call    SetCmd
                mov     al, 0F4h                ;Restart kbd scanning.
                call    SendCmd

                pop     bx
                pop     cx
                iret
MyInt16         endp
; SetCmd-       Sends the command byte in the AL register to the 8042
;               keyboard microcontroller chip (command register at
;               port 64h).

SetCmd          proc    near
                push    cx
                push    ax                      ;Save command value.
                cli                             ;Critical region, no ints now.

; Wait until the 8042 is done processing the current command.

                xor     cx, cx                  ;Allow 65,536 times thru loop.
Wait4Empty:     in      al, 64h                 ;Read keyboard status register.
                test    al, 10b                 ;Input buffer full?
                loopnz  Wait4Empty              ;If so, wait until empty.

; Okay, send the command to the 8042:

                pop     ax                      ;Retrieve command.
                out     64h, al
                sti                             ;Okay, ints can happen again.
                pop     cx
                ret
SetCmd          endp

; SendCmd-      The following routine sends a command or data byte to the
;               keyboard data port (port 60h).

SendCmd         proc    near
                push    ds
                push    bx
                push    cx
                mov     cx, 40h
                mov     ds, cx
                mov     bx, ax                  ;Save data byte

                mov     bh, 3                   ;Retry cnt.
RetryLp:        cli                             ;Disable ints while accessing HW.

; Clear the Error, Acknowledge received, and resend received flags
; in KbdFlags4

                and     byte ptr KbdFlags4, 4fh

; Wait until the 8042 is done processing the current command.

                xor     cx, cx                  ;Allow 65,536 times thru loop.
Wait4Empty:     in      al, 64h                 ;Read keyboard status register.
                test    al, 10b                 ;Input buffer full?
                loopnz  Wait4Empty              ;If so, wait until empty.

; Okay, send the data to port 60h

                mov     al, bl
                out     60h, al
                sti                             ;Allow interrupts now.

; Wait for the arrival of an acknowledgement from the keyboard ISR:

                xor     cx, cx                  ;Wait a long time, if need be.
Wait4Ack:       test    byp KbdFlags4, 10h      ;Acknowledge received bit.
                jnz     GotAck
                loop    Wait4Ack
                dec     bh                      ;Do a retry on this guy.
                jne     RetryLp

; If the operation failed after 3 retries, set the error bit and quit.

                or      byp KbdFlags4, 80h      ;Set error bit.

GotAck:         pop     cx
                pop     bx
                pop     ds
                ret
SendCmd         endp

Main            proc
                mov     ax, cseg
                mov     ds, ax

                print
                byte    "INT 16h Replacement",cr,lf
                byte    "Installing....",cr,lf,0

; Patch into the INT 16h interrupt vector. Note that the
; statements above have made cseg the current data segment,
; so we can store the old INT 16h value directly into
; the OldInt16 variable.

                cli                             ;Turn off interrupts!
                mov     ax, 0
                mov     es, ax
                mov     ax, es:[16h*4]
                mov     word ptr OldInt16, ax
                mov     ax, es:[16h*4 + 2]
                mov     word ptr OldInt16+2, ax
                mov     word ptr es:[16h*4], offset MyInt16
                mov     es:[16h*4+2], cs
                sti                             ;Okay, ints back on.

; We're hooked up, the only thing that remains is to terminate and
; stay resident.

                print
                byte    "Installed.",cr,lf,0

                mov     ah, 62h                 ;Get this program's PSP
                int     21h                     ; value.

                mov     dx, EndResident         ;Compute size of program.
                sub     dx, bx
                mov     ax, 3100h               ;DOS TSR command.
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack   ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

20.5 The Keyboard Interrupt Service Routine

The int 16h ISR is the interface between application programs and the keyboard. In a similar vein, the int 9 ISR is the interface between the keyboard hardware and the int 16h ISR. It is the job of the int 9 ISR to process keyboard hardware interrupts, convert incoming scan codes to scan/ASCII code combinations and place them in the type ahead buffer, and process other messages the keyboard generates.

To convert keyboard scan codes to scan/ASCII codes, the int 9 ISR must keep track of the current state of the modifier keys. When a scan code comes along, the int 9 ISR can use the xlat instruction to translate the scan code to an ASCII code using a table int 9 selects on the basis of the modifier flags. Another important issue is that the int 9 handler must handle special key sequences like ctrl-alt-del (reset) and PrtSc.

The following assembly code provides a simple int 9 handler for the keyboard. It does not support alt-Keypad ASCII code entry or a few other minor features, but it does support almost everything you need for a keyboard interrupt service routine. Certainly it demonstrates all the techniques you need to know when programming the keyboard.
; INT9.ASM
;
; A short TSR to provide a driver for the keyboard hardware interrupt.
;
; Note that this code does not patch into int 2Fh (multiplex interrupt)
; nor can you remove this code from memory except by rebooting.
; If you want to be able to do these two things (as well as check for
; a previous installation), see the chapter on resident programs. Such
; code was omitted from this program because of length constraints.
;
; cseg and EndResident must occur before the standard library segments!

cseg            segment para public 'code'
OldInt9         dword   ?
cseg            ends

; Marker segment, to find the end of the resident section.

EndResident     segment para public 'Resident'
EndResident     ends

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

NumLockScan     equ     45h
ScrlLockScan    equ     46h
CapsLockScan    equ     3ah
CtrlScan        equ     1dh
AltScan         equ     38h
RShiftScan      equ     36h
LShiftScan      equ     2ah
InsScanCode     equ     52h
DelScanCode     equ     53h

; Bits for the various modifier keys

RShfBit         equ     1
LShfBit         equ     2
CtrlBit         equ     4
AltBit          equ     8
SLBit           equ     10h
NLBit           equ     20h
CLBit           equ     40h
InsBit          equ     80h

KbdFlags        equ     <byte ptr ds:[17h]>
KbdFlags2       equ     <byte ptr ds:[18h]>
KbdFlags3       equ     <byte ptr ds:[96h]>
KbdFlags4       equ     <byte ptr ds:[97h]>

byp             equ     <byte ptr>

cseg            segment para public 'code'
                assume  ds:nothing
; Scan code translation table.
; The incoming scan code from the keyboard selects a row.
; The modifier status selects the column.
; The word at the intersection of the two is the scan/ASCII code to
; put into the PC's type ahead buffer.
; If the value fetched from the table is zero, then we do not put the
; character into the type ahead buffer.
;
;               norm   shft   ctrl   alt    num    caps   shcap  shnum

ScanXlat word   0000h, 0000h, 0000h, 0000h, 0000h, 0000h, 0000h, 0000h
         word   011bh, 011bh, 011bh, 011bh, 011bh, 011bh, 011bh, 011bh  ;ESC
         word   0231h, 0221h, 0000h, 7800h, 0231h, 0231h, 0221h, 0221h  ;1 !
         word   0332h, 0340h, 0300h, 7900h, 0332h, 0332h, 0340h, 0340h  ;2 @
         word   0433h, 0423h, 0000h, 7a00h, 0433h, 0433h, 0423h, 0423h  ;3 #
         word   0534h, 0524h, 0000h, 7b00h, 0534h, 0534h, 0524h, 0524h  ;4 $
         word   0635h, 0625h, 0000h, 7c00h, 0635h, 0635h, 0625h, 0625h  ;5 %
         word   0736h, 075eh, 071eh, 7d00h, 0736h, 0736h, 075eh, 075eh  ;6 ^

         word   0837h, 0826h, 0000h, 7e00h, 0837h, 0837h, 0826h, 0826h  ;7 &
         word   0938h, 092ah, 0000h, 7f00h, 0938h, 0938h, 092ah, 092ah  ;8 *
         word   0a39h, 0a28h, 0000h, 8000h, 0a39h, 0a39h, 0a28h, 0a28h  ;9 (
         word   0b30h, 0b29h, 0000h, 8100h, 0b30h, 0b30h, 0b29h, 0b29h  ;0 )
         word   0c2dh, 0c5fh, 0000h, 8200h, 0c2dh, 0c2dh, 0c5fh, 0c5fh  ;- _
         word   0d3dh, 0d2bh, 0000h, 8300h, 0d3dh, 0d3dh, 0d2bh, 0d2bh  ;= +
         word   0e08h, 0e08h, 0e7fh, 0000h, 0e08h, 0e08h, 0e08h, 0e08h  ;bksp
         word   0f09h, 0f00h, 0000h, 0000h, 0f09h, 0f09h, 0f00h, 0f00h  ;Tab

;               norm   shft   ctrl   alt    num    caps   shcap  shnum
         word   1071h, 1051h, 1011h, 1000h, 1071h, 1051h, 1051h, 1071h  ;Q
         word   1177h, 1157h, 1117h, 1100h, 1177h, 1157h, 1157h, 1177h  ;W
         word   1265h, 1245h, 1205h, 1200h, 1265h, 1245h, 1245h, 1265h  ;E
         word   1372h, 1352h, 1312h, 1300h, 1372h, 1352h, 1352h, 1372h  ;R
         word   1474h, 1454h, 1414h, 1400h, 1474h, 1454h, 1454h, 1474h  ;T
         word   1579h, 1559h, 1519h, 1500h, 1579h, 1559h, 1579h, 1559h  ;Y
         word   1675h, 1655h, 1615h, 1600h, 1675h, 1655h, 1675h, 1655h  ;U
         word   1769h, 1749h, 1709h, 1700h, 1769h, 1749h, 1769h, 1749h  ;I

         word   186fh, 184fh, 180fh, 1800h, 186fh, 184fh, 186fh, 184fh  ;O
         word   1970h, 1950h, 1910h, 1900h, 1970h, 1950h, 1970h, 1950h  ;P
         word   1a5bh, 1a7bh, 1a1bh, 0000h, 1a5bh, 1a5bh, 1a7bh, 1a7bh  ;[ {
         word   1b5dh, 1b7dh, 1b1dh, 0000h, 1b5dh, 1b5dh, 1b7dh, 1b7dh  ;] }
         word   1c0dh, 1c0dh, 1c0ah, 0000h, 1c0dh, 1c0dh, 1c0ah, 1c0ah  ;enter
         word   1d00h, 1d00h, 1d00h, 1d00h, 1d00h, 1d00h, 1d00h, 1d00h  ;ctrl
         word   1e61h, 1e41h, 1e01h, 1e00h, 1e61h, 1e41h, 1e61h, 1e41h  ;A
         word   1f73h, 1f53h, 1f13h, 1f00h, 1f73h, 1f53h, 1f73h, 1f53h  ;S

;               norm   shft   ctrl   alt    num    caps   shcap  shnum
         word   2064h, 2044h, 2004h, 2000h, 2064h, 2044h, 2064h, 2044h  ;D
         word   2166h, 2146h, 2106h, 2100h, 2166h, 2146h, 2166h, 2146h  ;F
         word   2267h, 2247h, 2207h, 2200h, 2267h, 2247h, 2267h, 2247h  ;G
         word   2368h, 2348h, 2308h, 2300h, 2368h, 2348h, 2368h, 2348h  ;H
         word   246ah, 244ah, 240ah, 2400h, 246ah, 244ah, 246ah, 244ah  ;J
         word   256bh, 254bh, 250bh, 2500h, 256bh, 254bh, 256bh, 254bh  ;K
         word   266ch, 264ch, 260ch, 2600h, 266ch, 264ch, 266ch, 264ch  ;L
         word   273bh, 273ah, 0000h, 0000h, 273bh, 273bh, 273ah, 273ah  ;; :

         word   2827h, 2822h, 0000h, 0000h, 2827h, 2827h, 2822h, 2822h  ;' "
         word   2960h, 297eh, 0000h, 0000h, 2960h, 2960h, 297eh, 297eh  ;` ~
         word   2a00h, 2a00h, 2a00h, 2a00h, 2a00h, 2a00h, 2a00h, 2a00h  ;LShf
         word   2b5ch, 2b7ch, 2b1ch, 0000h, 2b5ch, 2b5ch, 2b7ch, 2b7ch  ;\ |
         word   2c7ah, 2c5ah, 2c1ah, 2c00h, 2c7ah, 2c5ah, 2c7ah, 2c5ah  ;Z
         word   2d78h, 2d58h, 2d18h, 2d00h, 2d78h, 2d58h, 2d78h, 2d58h  ;X
         word   2e63h, 2e43h, 2e03h, 2e00h, 2e63h, 2e43h, 2e63h, 2e43h  ;C
         word   2f76h, 2f56h, 2f16h, 2f00h, 2f76h, 2f56h, 2f76h, 2f56h  ;V

;               norm   shft   ctrl   alt    num    caps   shcap  shnum
         word   3062h, 3042h, 3002h, 3000h, 3062h, 3042h, 3062h, 3042h  ;B
         word   316eh, 314eh, 310eh, 3100h, 316eh, 314eh, 316eh, 314eh  ;N
         word   326dh, 324dh, 320dh, 3200h, 326dh, 324dh, 326dh, 324dh  ;M
         word   332ch, 333ch, 0000h, 0000h, 332ch, 332ch, 333ch, 333ch  ;, <
         word   342eh, 343eh, 0000h, 0000h, 342eh, 342eh, 343eh, 343eh  ;. >
         word   352fh, 353fh, 0000h, 0000h, 352fh, 352fh, 353fh, 353fh  ;/ ?
         word   3600h, 3600h, 3600h, 3600h, 3600h, 3600h, 3600h, 3600h  ;rshf
         word   372ah, 0000h, 3710h, 0000h, 372ah, 372ah, 0000h, 0000h  ;* PS

         word   3800h, 3800h, 3800h, 3800h, 3800h, 3800h, 3800h, 3800h  ;alt
         word   3920h, 3920h, 3920h, 0000h, 3920h, 3920h, 3920h, 3920h  ;spc
         word   3a00h, 3a00h, 3a00h, 3a00h, 3a00h, 3a00h, 3a00h, 3a00h  ;caps
         word   3b00h, 5400h, 5e00h, 6800h, 3b00h, 3b00h, 5400h, 5400h  ;F1
         word   3c00h, 5500h, 5f00h, 6900h, 3c00h, 3c00h, 5500h, 5500h  ;F2
         word   3d00h, 5600h, 6000h, 6a00h, 3d00h, 3d00h, 5600h, 5600h  ;F3
         word   3e00h, 5700h, 6100h, 6b00h, 3e00h, 3e00h, 5700h, 5700h  ;F4
         word   3f00h, 5800h, 6200h, 6c00h, 3f00h, 3f00h, 5800h, 5800h  ;F5

;               norm   shft   ctrl   alt    num    caps   shcap  shnum
         word   4000h, 5900h, 6300h, 6d00h, 4000h, 4000h, 5900h, 5900h  ;F6
         word   4100h, 5a00h, 6400h, 6e00h, 4100h, 4100h, 5a00h, 5a00h  ;F7
         word   4200h, 5b00h, 6500h, 6f00h, 4200h, 4200h, 5b00h, 5b00h  ;F8
         word   4300h, 5c00h, 6600h, 7000h, 4300h, 4300h, 5c00h, 5c00h  ;F9
         word   4400h, 5d00h, 6700h, 7100h, 4400h, 4400h, 5d00h, 5d00h  ;F10
         word   4500h, 4500h, 4500h, 4500h, 4500h, 4500h, 4500h, 4500h  ;num
         word   4600h, 4600h, 4600h, 4600h, 4600h, 4600h, 4600h, 4600h  ;scrl
         word   4700h, 4737h, 7700h, 0000h, 4737h, 4700h, 4737h, 4700h  ;home

         word   4800h, 4838h, 0000h, 0000h, 4838h, 4800h, 4838h, 4800h  ;up
         word   4900h, 4939h, 8400h, 0000h, 4939h, 4900h, 4939h, 4900h  ;pgup
         word   4a2dh, 4a2dh, 0000h, 0000h, 4a2dh, 4a2dh, 4a2dh, 4a2dh  ;-
         word   4b00h, 4b34h, 7300h, 0000h, 4b34h, 4b00h, 4b34h, 4b00h  ;left
         word   4c00h, 4c35h, 0000h, 0000h, 4c35h, 4c00h, 4c35h, 4c00h  ;Center
         word   4d00h, 4d36h, 7400h, 0000h, 4d36h, 4d00h, 4d36h, 4d00h  ;right
         word   4e2bh, 4e2bh, 0000h, 0000h, 4e2bh, 4e2bh, 4e2bh, 4e2bh  ;+
         word   4f00h, 4f31h, 7500h, 0000h, 4f31h, 4f00h, 4f31h, 4f00h  ;end

;               norm   shft   ctrl   alt    num    caps   shcap  shnum
         word   5000h, 5032h, 0000h, 0000h, 5032h, 5000h, 5032h, 5000h  ;down
         word   5100h, 5133h, 7600h, 0000h, 5133h, 5100h, 5133h, 5100h  ;pgdn
         word   5200h, 5230h, 0000h, 0000h, 5230h, 5200h, 5230h, 5200h  ;ins
         word   5300h, 532eh, 0000h, 0000h, 532eh, 5300h, 532eh, 5300h  ;del
         word   0,0,0,0,0,0,0,0                                         ; --
         word   0,0,0,0,0,0,0,0                                         ; --
         word   0,0,0,0,0,0,0,0                                         ; --
         word   5700h, 0000h, 0000h, 0000h, 5700h, 5700h, 0000h, 0000h  ;F11
         word   5800h, 0000h, 0000h, 0000h, 5800h, 5800h, 0000h, 0000h  ;F12
;****************************************************************************
;
; AL contains keyboard scan code.

PutInBuffer     proc    near
                push    ds
                push    bx
                mov     bx, 40h                 ;Point DS at the BIOS
                mov     ds, bx                  ; variables.

; If the current scan code is E0 or E1, we need to take note of this fact
; so that we can properly process cursor keys.

                cmp     al, 0e0h
                jne     TryE1
                or      KbdFlags3, 10b          ;Set E0 flag
                and     KbdFlags3, 0FEh         ;Clear E1 flag
                jmp     Done

TryE1:          cmp     al, 0e1h
                jne     DoScan
                or      KbdFlags3, 1            ;Set E1 flag
                and     KbdFlags3, 0FDh         ;Clear E0 Flag
                jmp     Done

; Before doing anything else, see if this is Ctrl-Alt-Del:

DoScan:         cmp     al, DelScanCode
                jnz     TryIns
                mov     bl, KbdFlags
                and     bl, AltBit or CtrlBit   ;Alt = bit 3, ctrl = bit 2
                cmp     bl, AltBit or CtrlBit
                jne     DoPIB
                mov     word ptr ds:[72h], 1234h ;Warm boot flag.
                jmp     dword ptr cs:RebootAdrs ;REBOOT Computer

RebootAdrs      dword   0ffff0000h              ;Reset address.
; Check for the INS key here. This one needs to toggle the ins bit
; in the keyboard flags variables.

TryIns:         cmp     al, InsScanCode
                jne     TryInsUp
                or      KbdFlags2, InsBit       ;Note INS is down.
                jmp     doPIB                   ;Pass on INS key.

TryInsUp:       cmp     al, InsScanCode+80h     ;INS up scan code.
                jne     TryLShiftDn
                and     KbdFlags2, not InsBit   ;Note INS is up.
                xor     KbdFlags, InsBit        ;Toggle INS bit.
                jmp     QuitPIB

; Handle the left and right shift keys down here.

TryLShiftDn:    cmp     al, LShiftScan
                jne     TryLShiftUp
                or      KbdFlags, LShfBit       ;Note that the left
                jmp     QuitPIB                 ; shift key is down.

TryLShiftUp:    cmp     al, LShiftScan+80h
                jne     TryRShiftDn
                and     KbdFlags, not LShfBit   ;Note that the left
                jmp     QuitPIB                 ; shift key is up.

TryRShiftDn:    cmp     al, RShiftScan
                jne     TryRShiftUp
                or      KbdFlags, RShfBit       ;Right shf is down.
                jmp     QuitPIB

TryRShiftUp:    cmp     al, RShiftScan+80h
                jne     TryAltDn
                and     KbdFlags, not RShfBit   ;Right shf is up.
                jmp     QuitPIB
; Handle the ALT key down here.

TryAltDn:       cmp     al, AltScan
                jne     TryAltUp
                or      KbdFlags, AltBit        ;Alt key is down.
GotoQPIB:       jmp     QuitPIB

TryAltUp:       cmp     al, AltScan+80h
                jne     TryCtrlDn
                and     KbdFlags, not AltBit    ;Alt key is up.
                jmp     DoPIB

; Deal with the control key down here.

TryCtrlDn:      cmp     al, CtrlScan
                jne     TryCtrlUp
                or      KbdFlags, CtrlBit       ;Ctrl key is down.
                jmp     QuitPIB

TryCtrlUp:      cmp     al, CtrlScan+80h
                jne     TryCapsDn
                and     KbdFlags, not CtrlBit   ;Ctrl key is up.
                jmp     QuitPIB

; Deal with the CapsLock key down here.

TryCapsDn:      cmp     al, CapsLockScan
                jne     TryCapsUp
                or      KbdFlags2, CLBit        ;Capslock is down.
                xor     KbdFlags, CLBit         ;Toggle capslock.
                jmp     QuitPIB

TryCapsUp:      cmp     al, CapsLockScan+80h
                jne     TrySLDn
                and     KbdFlags2, not CLBit    ;Capslock is up.
                call    SetLEDs
                jmp     QuitPIB
; Deal with the Scroll Lock key down here.

TrySLDn:        cmp     al, ScrlLockScan
                jne     TrySLUp
                or      KbdFlags2, SLBit        ;Scrl lock is down.
                xor     KbdFlags, SLBit         ;Toggle scrl lock.
                jmp     QuitPIB

TrySLUp:        cmp     al, ScrlLockScan+80h
                jne     TryNLDn
                and     KbdFlags2, not SLBit    ;Scrl lock is up.
                call    SetLEDs
                jmp     QuitPIB

; Handle the NumLock key down here.

TryNLDn:        cmp     al, NumLockScan
                jne     TryNLUp
                or      KbdFlags2, NLBit        ;Numlock is down.
                xor     KbdFlags, NLBit         ;Toggle numlock.
                jmp     QuitPIB

TryNLUp:        cmp     al, NumLockScan+80h
                jne     DoPIB
                and     KbdFlags2, not NLBit    ;Numlock is up.
                call    SetLEDs
                jmp     QuitPIB
; Handle all the other keys here:

DoPIB:          test    al, 80h                 ;Ignore other up keys.
                jnz     QuitPIB

; If the H.O. bit is set at this point, we'd best only have a zero in AL.
; Otherwise, this is an up code which we can safely ignore.

                call    Convert
                test    ax, ax                  ;Chk for bad code.
                je      QuitPIB

PutCharInBuf:   push    cx
                mov     cx, ax
                mov     ah, 5                   ;Store scan code into
                int     16h                     ; type ahead buffer.
                pop     cx

QuitPIB:        and     KbdFlags3, 0FCh         ;E0, E1 not last code.

Done:           pop     bx
                pop     ds
                ret
PutInBuffer     endp
;****************************************************************************
;
; Convert-      AL contains a PC Scan code. Convert it to an ASCII char/Scan
;               code pair and return the result in AX. This code assumes
;               that DS points at the BIOS variable space (40h).

Convert         proc    near
                push    bx

                test    al, 80h                 ;See if up code
                jz      DownScanCode
                mov     ah, al
                mov     al, 0
                jmp     CSDone

; Okay, we've got a down key. But before going on, let's see if we've
; got an ALT-Keypad sequence.

DownScanCode:   mov     bh, 0
                mov     bl, al
                shl     bx, 1                   ;Multiply by eight to compute
                shl     bx, 1                   ; row index into the scan
                shl     bx, 1                   ; code xlat table

; Compute modifier index as follows:
;
;       if alt then modifier = 3

                test    KbdFlags, AltBit
                je      NotAlt
                add     bl, 3
                jmp     DoConvert

;       if ctrl, then modifier = 2

NotAlt:         test    KbdFlags, CtrlBit
                je      NotCtrl
                add     bl, 2
                jmp     DoConvert

; Regardless of the shift setting, we've got to deal with numlock
; and capslock. Numlock is only a concern if the scan code is greater
; than or equal to 47h. Capslock is only a concern if the scan code
; is less than this.

NotCtrl:        cmp     al, 47h
                jb      DoCapsLk
                test    KbdFlags, NLBit         ;Test Numlock bit
                je      NoNumLck
                test    KbdFlags, LShfBit or RShfBit ;Check l/r shift.
                je      NumOnly
                add     bl, 7                   ;Numlock and shift.
                jmp     DoConvert

NumOnly:        add     bl, 4                   ;Numlock only.
                jmp     DoConvert

; If numlock is not active, see if a shift key is:

NoNumLck:       test    KbdFlags, LShfBit or RShfBit ;Check l/r shift.
                je      DoConvert               ;normal if no shift.
                add     bl, 1
                jmp     DoConvert

; If the scan code's value is below 47h, we need to check for capslock.

DoCapsLk:       test    KbdFlags, CLBit         ;Chk capslock bit
                je      DoShift
                test    KbdFlags, LShfBit or RShfBit ;Chk for l/r shift
                je      CapsOnly
                add     bl, 6                   ;Shift and capslock.
                jmp     DoConvert

CapsOnly:       add     bl, 5                   ;Capslock
                jmp     DoConvert

; Well, nothing else is active, check for just a shift key.

DoShift:        test    KbdFlags, LShfBit or RShfBit ;l/r shift.
                je      DoConvert
                add     bl, 1                   ;Shift

DoConvert:      shl     bx, 1                   ;Word array
                mov     ax, ScanXlat[bx]
CSDone:         pop     bx
                ret
Convert         endp
; SetCmd-       Sends the command byte in the AL register to the 8042
;               keyboard microcontroller chip (command register at
;               port 64h).

SetCmd          proc    near
                push    cx
                push    ax                      ;Save command value.
                cli                             ;Critical region, no ints now.

; Wait until the 8042 is done processing the current command.

                xor     cx, cx                  ;Allow 65,536 times thru loop.
Wait4Empty:     in      al, 64h                 ;Read keyboard status register.
                test    al, 10b                 ;Input buffer full?
                loopnz  Wait4Empty              ;If so, wait until empty.

; Okay, send the command to the 8042:

                pop     ax                      ;Retrieve command.
                out     64h, al
                sti                             ;Okay, ints can happen again.
                pop     cx
                ret
SetCmd          endp

; SendCmd-      The following routine sends a command or data byte to the
;               keyboard data port (port 60h).

SendCmd         proc    near
                push    ds
                push    bx
                push    cx
                mov     cx, 40h
                mov     ds, cx
                mov     bx, ax                  ;Save data byte

                mov     bh, 3                   ;Retry cnt.
RetryLp:        cli                             ;Disable ints while accessing HW.

; Clear the Error, Acknowledge received, and resend received flags
; in KbdFlags4

                and     byte ptr KbdFlags4, 4fh

; Wait until the 8042 is done processing the current command.

                xor     cx, cx                  ;Allow 65,536 times thru loop.
Wait4Empty:     in      al, 64h                 ;Read keyboard status register.
                test    al, 10b                 ;Input buffer full?
                loopnz  Wait4Empty              ;If so, wait until empty.

; Okay, send the data to port 60h

                mov     al, bl
                out     60h, al
                sti                             ;Allow interrupts now.

; Wait for the arrival of an acknowledgement from the keyboard ISR:

                xor     cx, cx                  ;Wait a long time, if need be.
Wait4Ack:       test    byp KbdFlags4, 10h      ;Acknowledge received bit.
                jnz     GotAck
                loop    Wait4Ack
                dec     bh                      ;Do a retry on this guy.
                jne     RetryLp

; If the operation failed after 3 retries, set the error bit and quit.

                or      byp KbdFlags4, 80h      ;Set error bit.

GotAck:         pop     cx
                pop     bx
                pop     ds
                ret
SendCmd         endp

; SetLEDs-      Updates the KbdFlags4 LED bits from the KbdFlags
;               variable and then transmits new flag settings to
;               the keyboard.

SetLEDs         proc    near
                push    ax
                push    cx
                mov     al, KbdFlags
                mov     cl, 4
                shr     al, cl
                and     al, 111b
                and     KbdFlags4, 0F8h         ;Clear LED bits.
                or      KbdFlags4, al           ;Mask in new bits.
                mov     ah, al                  ;Save LED bits.

                mov     al, 0ADh                ;Disable kbd for now.
                call    SetCmd

                mov     al, 0EDh                ;8042 set LEDs cmd.
                call    SendCmd                 ;Send the command to 8042.
                mov     al, ah                  ;Get parameter byte
                call    SendCmd                 ;Send parameter to the 8042.

                mov     al, 0AEh                ;Reenable keyboard.
                call    SetCmd
                mov     al, 0F4h                ;Restart kbd scanning.
                call    SendCmd
                pop     cx
                pop     ax
                ret
SetLEDs         endp

; MyInt9-       Interrupt service routine for the keyboard hardware
;               interrupt.

MyInt9          proc    far
                push    ds
                push    ax
                push    cx

                mov     ax, 40h
                mov     ds, ax

                mov     al, 0ADh                ;Disable keyboard
                call    SetCmd
                cli                             ;Disable interrupts.
                xor     cx, cx
Wait4Data:      in      al, 64h                 ;Read kbd status port.
                test    al, 1                   ;Data in output buffer (bit 0)?
                loopz   Wait4Data               ;Wait until data available.
                in      al, 60h                 ;Get keyboard data.
                cmp     al, 0EEh                ;Echo response?
                je      QuitInt9
                cmp     al, 0FAh                ;Acknowledge?
                jne     NotAck
                or      KbdFlags4, 10h          ;Set ack bit.
                jmp     QuitInt9

NotAck:         cmp     al, 0FEh                ;Resend command?
                jne     NotResend
                or      KbdFlags4, 20h          ;Set resend bit.
                jmp     QuitInt9

; Note: other keyboard controller commands all have their H.O. bit set
; and the PutInBuffer routine will ignore them.

NotResend:      call    PutInBuffer             ;Put in type ahead buffer.

QuitInt9:       mov     al, 0AEh                ;Reenable the keyboard
                call    SetCmd

                mov     al, 20h                 ;Send EOI (end of interrupt)
                out     20h, al                 ; to the 8259A PIC.
                pop     cx
                pop     ax
                pop     ds
                iret
MyInt9          endp
Main            proc
                assume  ds:cseg

                mov     ax, cseg
                mov     ds, ax

                print
                byte    "INT 9 Replacement",cr,lf
                byte    "Installing....",cr,lf,0

; Patch into the INT 9 interrupt vector. Note that the
; statements above have made cseg the current data segment,
; so we can store the old INT 9 value directly into
; the OldInt9 variable.

                cli                             ;Turn off interrupts!
                mov     ax, 0
                mov     es, ax
                mov     ax, es:[9*4]
                mov     word ptr OldInt9, ax
                mov     ax, es:[9*4 + 2]
                mov     word ptr OldInt9+2, ax
                mov     word ptr es:[9*4], offset MyInt9
                mov     es:[9*4+2], cs
                sti                             ;Okay, ints back on.

; We're hooked up, the only thing that remains is to terminate and
; stay resident.

                print
                byte    "Installed.",cr,lf,0

                mov     ah, 62h                 ;Get this program's PSP
                int     21h                     ; value.

                mov     dx, EndResident         ;Compute size of program.
                sub     dx, bx
                mov     ax, 3100h               ;DOS TSR command.
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack   ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
20.6 Patching into the INT 9 Interrupt Service Routine

For many programs, such as pop-up programs or keyboard enhancers, you may need to intercept certain “hot keys” and pass all remaining scan codes through to the default keyboard interrupt service routine. You can insert an int 9 interrupt service routine into an interrupt nine chain just like any other interrupt. When the keyboard interrupts the system to send a scan code, your interrupt service routine can read the scan code from port 60h and decide whether to process the scan code itself or pass control on to some other int 9 handler. The following program demonstrates this principle; it deactivates the ctrl-alt-del reset function on the keyboard by intercepting and throwing away delete scan codes when the ctrl and alt bits are set in the keyboard flags byte.

; NORESET.ASM
;
; A short TSR that patches the int 9 interrupt and intercepts the
; ctrl-alt-del keystroke sequence.
;
; Note that this code does not patch into int 2Fh (multiplex interrupt)
; nor can you remove this code from memory except by rebooting.
; If you want to be able to do these two things (as well as check for
; a previous installation), see the chapter on resident programs. Such
; code was omitted from this program because of length constraints.
;
; cseg and EndResident must occur before the standard library segments!
cseg OldInt9 cseg
segment dword ends
para public ‘code’ ?
; Marker segment, to find the end of the resident section. EndResident EndResident
segment ends
para public ‘Resident’
.xlist include stdlib.a includelib stdlib.lib .list DelScanCode
equ
53h
; Bits for the various modifier keys CtrlBit AltBit
equ equ
4 8
KbdFlags
equ
cseg
segment assume
para public ‘code’ ds:nothing
; SetCmd-       Sends the command byte in the AL register to the 8042
;               keyboard microcontroller chip (command register at
;               port 64h).

SetCmd          proc    near
                push    cx
                push    ax              ;Save command value.
                cli                     ;Critical region, no ints now.

; Wait until the 8042 is done processing the current command.

                xor     cx, cx          ;Allow 65,536 times thru loop.
Wait4Empty:     in      al, 64h         ;Read keyboard status register.
                test    al, 10b         ;Input buffer full?
                loopnz  Wait4Empty      ;If so, wait until empty.

; Okay, send the command to the 8042:

                pop     ax              ;Retrieve command.
                out     64h, al
                sti                     ;Okay, ints can happen again.
                pop     cx
                ret
SetCmd          endp

; MyInt9-       Interrupt service routine for the keyboard hardware
;               interrupt. Tests to see if the user has pressed a
;               DEL key. If not, it passes control on to the original
;               int 9 handler. If so, it first checks to see if the
;               alt and ctrl keys are currently down; if not, it passes
;               control to the original handler. Otherwise it eats the
;               scan code and doesn't pass the DEL through.

MyInt9          proc    far
                push    ds
                push    ax
                push    cx

                mov     ax, 40h
                mov     ds, ax

                mov     al, 0ADh        ;Disable keyboard
                call    SetCmd
                cli                     ;Disable interrupts.
                xor     cx, cx
Wait4Data:      in      al, 64h         ;Read kbd status port.
                test    al, 1           ;Data in buffer? (Bit 0 is the
                                        ; output buffer full flag.)
                loopz   Wait4Data       ;Wait until data available.

                in      al, 60h         ;Get keyboard data.
                cmp     al, DelScanCode ;Is it the delete key?
                jne     OrigInt9
                mov     al, KbdFlags    ;Okay, we've got DEL, is
                and     al, AltBit or CtrlBit   ; ctrl+alt down too?
                cmp     al, AltBit or CtrlBit
                jne     OrigInt9

; If ctrl+alt+DEL is down, just eat the DEL code and don't pass it through.

                mov     al, 0AEh        ;Reenable the keyboard
                call    SetCmd

                mov     al, 20h         ;Send EOI (end of interrupt)
                out     20h, al         ; to the 8259A PIC.
                pop     cx
                pop     ax
                pop     ds
                iret

; If ctrl and alt aren't both down, pass DEL on to the original INT 9
; handler routine.

OrigInt9:       mov     al, 0AEh        ;Reenable the keyboard
                call    SetCmd

                pop     cx
                pop     ax
                pop     ds
                jmp     cs:OldInt9
MyInt9          endp

Main            proc
                assume  ds:cseg
                mov     ax, cseg
                mov     ds, ax

                print
                byte    "Ctrl-Alt-Del Filter",cr,lf
                byte    "Installing....",cr,lf,0

; Patch into the INT 9 interrupt vector. Note that the
; statements above have made cseg the current data segment,
; so we can store the old INT 9 value directly into
; the OldInt9 variable.

                cli                             ;Turn off interrupts!
                mov     ax, 0
                mov     es, ax
                mov     ax, es:[9*4]
                mov     word ptr OldInt9, ax
                mov     ax, es:[9*4 + 2]
                mov     word ptr OldInt9+2, ax
                mov     word ptr es:[9*4], offset MyInt9
                mov     word ptr es:[9*4+2], cs
                sti                             ;Okay, ints back on.

; We're hooked up, the only thing that remains is to terminate and
; stay resident.

                print
                byte    "Installed.",cr,lf,0

                mov     ah, 62h                 ;Get this program's PSP
                int     21h                     ; value.

                mov     dx, EndResident         ;Compute size of program.
                sub     dx, bx
                mov     ax, 3100h               ;DOS TSR command.
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack   ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

20.7 Simulating Keystrokes

At one point or another you may want to write a program that passes keystrokes on to another application. For example, you might want to write a keyboard macro TSR that lets you capture certain keys on the keyboard and send a sequence of keys through to some underlying application. Perhaps you’ll want to program an entire string of characters on a normally unused keyboard sequence (e.g., ctrl-up or ctrl-down). In any case, your program will use some technique to pass characters to a foreground application. There are three well-known techniques for doing this: store the scan/ASCII code directly in the keyboard buffer, use the 80x86 trace flag to simulate in al, 60h instructions, or program the on-board 8042 microcontroller to transmit the scan code for you. The next three sections describe these techniques in detail.
20.7.1 Stuffing Characters in the Type Ahead Buffer

Perhaps the easiest way to insert keystrokes into an application is to insert them directly into the system’s type ahead buffer. Most modern BIOSes provide an int 16h function to do this (see “The Keyboard BIOS Interface” on page 1168). Even if your system does not provide this function, it is easy to write your own code to insert data in the system type ahead buffer, or you can copy the code from the int 16h handler provided earlier in this chapter.

The nice thing about this approach is that you can deal directly with ASCII characters (at least, for those key sequences that are ASCII). You do not have to worry about sending shift up and down codes around the scan code for an “A” to get an upper case “A”; you need only insert 1E41h into the buffer. In fact, most programs ignore the scan code, so you can simply insert 0041h into the buffer and almost any application will accept the funny scan code of zero.

The major drawback to the buffer insertion technique is that many (popular) applications bypass DOS and BIOS when reading the keyboard. Such programs go directly to the keyboard’s port (60h) to read their data. As such, shoving scan/ASCII codes into the type ahead buffer will have no effect on them. Ideally, you would like to stuff a scan code directly into the keyboard controller chip and have it return that scan code as though someone actually pressed that key. Unfortunately, there is no universally compatible way to do this. However, there are some close approximations; keep reading.
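As a minimal sketch of the BIOS route described above, the following routine assumes a BIOS that supports int 16h function 05h (store keystroke), which expects the scan code in CH and the ASCII code in CL and returns AL=0 if it stored the key or AL=1 if the type ahead buffer is full. The names StuffKey and BufferFull are hypothetical; they do not come from the Standard Library or from this chapter's listings.

```asm
; StuffKey-     Hypothetical helper (an illustration, not library code).
;               Inserts one key into the BIOS type ahead buffer using
;               int 16h, function 05h. Assumes the BIOS supports
;               that function.
;
;               Entry:  CH = scan code, CL = ASCII code.
;               Exit:   Carry clear if the BIOS stored the key,
;                       carry set if the type ahead buffer was full.

StuffKey        proc    near
                push    ax
                mov     ah, 5           ;Store-keystroke function.
                int     16h             ;AL=0 if stored, AL=1 if full.
                neg     al              ;Carry is set iff AL was nonzero.
                pop     ax              ;POP does not change the flags.
                ret
StuffKey        endp

; Sample use: queue an upper case "A" (scan code 1Eh, ASCII 41h).

                mov     cx, 1E41h
                call    StuffKey
                jc      BufferFull      ;BufferFull is a label you supply.
```

Returning the result in the carry flag lets the caller retry later (for example, on the next timer tick) when the buffer is full. On a BIOS without function 05h you would have to fall back on code that manipulates the buffer directly.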
20.7.2 Using the 80x86 Trace Flag to Simulate IN AL, 60H Instructions

One way to deal with applications that access the keyboard hardware directly is to simulate the 80x86 instruction set. For example, suppose we were able to take control of the int 9 interrupt service routine and execute each instruction under our control. We could choose to let all instructions except the in instruction execute normally. Upon encountering an in instruction (that the keyboard ISR uses to read the keyboard data), we check to see if it is accessing port 60h. If so, we simply load the al register with the desired scan code rather than actually execute the in instruction. It is also important to check for the out instruction, since the keyboard ISR will want to send an EOI signal to the 8259A PIC after reading the keyboard data; we can simply ignore out instructions that write to port 20h.

The only difficult part is telling the 80x86 to pass control to our routine when encountering certain instructions (like in and out) and to execute other instructions normally. While this is not directly possible in real mode [7], there is a close approximation we can make. The 80x86 CPUs provide a trace flag that generates an exception after the execution of each instruction. Normally, debuggers use the trace flag to single step through a program. However, by writing our own exception handler for the trace exception, we can gain control of the machine between the execution of every instruction. Then, we can look at the opcode of the next instruction to execute. If it is not an in or out instruction, we can simply return and execute the instruction normally. If it is an in or out instruction, we can determine the I/O address and decide whether to simulate or execute the instruction.

In addition to the in and out instructions, we will need to simulate any int instructions we find as well. The reason is that the int instruction pushes the flags on the stack and then clears the trace bit in the flags register. This means that the interrupt service routine associated with that int instruction would execute normally and we would miss any in or out instructions appearing therein. However, it is easy to simulate the int instruction, leaving the trace flag enabled, so we will add int to our list of instructions to interpret.

The only problem with this approach is that it is slow. Although the trace trap routine will only execute a few instructions on each call, it does so for every instruction in the int 9 interrupt service routine. As a result, during simulation, the interrupt service routine will run 10 to 20 times slower than the real code would. This generally isn’t a problem because most keyboard interrupt service routines are very short. You might encounter an application that has a large internal int 9 ISR, and this method would noticeably slow such a program. For most applications, though, this technique works just fine and no one will notice any performance loss while they are typing away (slowly) at the keyboard.

7. It is possible to trap I/O instructions when running in protected mode.
The following assembly code provides a short example of a trace exception handler that simulates keystrokes in this fashion:

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

cseg            segment para public 'code'
                assume  ds:nothing

; ScanCode must be in the Code segment.

ScanCode        byte    0

;****************************************************************************
;
; KbdSim- Passes the scan code in AL through the keyboard controller
; using the trace flag. The way this works is to turn on the
; trace bit in the flags register. Each instruction then causes a trace
; trap. The (installed) trace handler then looks at each instruction to
; handle IN, OUT, INT, and other special instructions. Upon encountering
; an IN AL, 60 (or equivalent) this code simulates the instruction and
; returns the specified scan code rather than actually executing the IN
; instruction. Other instructions need special treatment as well. See
; the code for details. This code is pretty good at simulating the hardware,
; but it runs fairly slow and has a few compatibility problems.

KbdSim          proc    near

                pushf
                push    es
                push    ax
                push    bx

                xor     bx, bx                  ;Point es at int vector tbl
                mov     es, bx                  ; (to simulate INT 9).
                cli                             ;No interrupts for now.
                mov     cs:ScanCode, al         ;Save output scan code.

                push    word ptr es:[1*4]       ;Save current INT 1 vector
                push    word ptr es:[1*4 + 2]   ; so we can restore it later.

; Point the INT 1 vector at our INT 1 handler:

                mov     word ptr es:[1*4], offset MyInt1
                mov     word ptr es:[1*4 + 2], cs

; Turn on the trace trap (bit 8 of flags register):

                pushf
                pop     ax
                or      ah, 1
                push    ax
                popf

; Simulate an INT 9 instruction. Note: cannot actually execute INT 9 here
; since INT instructions turn off the trace operation.

                pushf
                call    dword ptr es:[9*4]

; Turn off the trace operation:

                pushf
                pop     ax
                and     ah, 0feh                ;Clear trace bit.
                push    ax
                popf

; Disable trace operation.

                pop     word ptr es:[1*4 + 2]   ;Restore previous INT 1
                pop     word ptr es:[1*4]       ; handler.

; Okay, we're done. Restore registers and return.

VMDone:         pop     bx
                pop     ax
                pop     es
                popf
                ret
KbdSim          endp
;----------------------------------------------------------------------------
;
; MyInt1- Handles the trace trap (INT 1). This code looks at the next
; opcode to determine if it is one of the special opcodes we have to
; handle ourselves.

MyInt1          proc    far
                push    bp
                mov     bp, sp          ;Gain access to return adrs via BP.
                push    bx
                push    ds

; If we get down here, it's because this trace trap is directly due to
; our having punched the trace bit. Let's process the trace trap to
; simulate the 80x86 instruction set.
;
; Get the return address into DS:BX

NextInstr:      lds     bx, 2[bp]

; The following is a special case to quickly eliminate most opcodes and
; speed up this code by a tiny amount.

                cmp     byte ptr [bx], 0cdh     ;Most opcodes are less than
                jnb     NotSimple               ; 0cdh, hence we quickly
                pop     ds                      ; return back to the real
                pop     bx                      ; program.
                pop     bp
                iret

NotSimple:      je      IsIntInstr      ;If it's an INT instruction.

                mov     bx, [bx]        ;Get current instruction's opcode.
                cmp     bl, 0e8h        ;CALL opcode
                je      ExecInstr
                jb      TryInOut0

                cmp     bl, 0ech        ;IN al, dx instr.
                je      MayBeIn60
                cmp     bl, 0eeh        ;OUT dx, al instr.
                je      MayBeOut20
                pop     ds              ;A normal instruction if we get
                pop     bx              ; down here.
                pop     bp
                iret

TryInOut0:      cmp     bx, 60e4h       ;IN al, 60h instr.
                je      IsInAL60
                cmp     bx, 20e6h       ;out 20, al instr.
                je      IsOut20

; If it wasn't one of our magic instructions, execute it and continue.

ExecInstr:      pop     ds
                pop     bx
                pop     bp
                iret

; If this instruction is IN AL, DX we have to look at the value in DX to
; determine if it's really an IN AL, 60h instruction.

MayBeIn60:      cmp     dx, 60h
                jne     ExecInstr
                inc     word ptr 2[bp]  ;Skip over this 1 byte instr.
                mov     al, cs:ScanCode
                jmp     NextInstr

; If this is an IN AL, 60h instruction, simulate it by loading the current
; scan code into AL.

IsInAL60:       mov     al, cs:ScanCode
                add     word ptr 2[bp], 2       ;Skip over this 2-byte instr.
                jmp     NextInstr

; If this instruction is OUT DX, AL we have to look at DX to see if we're
; outputting to location 20h (8259).

MayBeOut20:     cmp     dx, 20h
                jne     ExecInstr
                inc     word ptr 2[bp]  ;Skip this 1 byte instruction.
                jmp     NextInstr

; If this is an OUT 20h, al instruction, simply skip over it.

IsOut20:        add     word ptr 2[bp], 2       ;Skip instruction.
                jmp     NextInstr

; IsIntInstr- Execute this code if it's an INT instruction.
;
; The problem with the INT instructions is that they reset the trace bit
; upon execution. For certain guys (see above) we can't have that.
;
; Note: at this point the stack looks like the following:
;
;       flags
;
;       rtn cs -+
;               |
;       rtn ip  +-- Points at next instr the CPU will execute.
;       bp
;       bx
;       ds
;
; We need to simulate the appropriate INT instruction by:
;
;       (1)     adding two to the return address on the stack (so it
;               returns beyond the INT instruction).
;       (2)     pushing the flags onto the stack.
;       (3)     pushing a phony return address onto the stack which
;               simulates the INT 1 interrupt return address but which
;               "returns" us to the specified interrupt vector handler.
;
; All this results in a stack which looks like the following:
;
;       flags
;
;       rtn cs -+
;               |
;       rtn ip  +-- Points at next instr beyond the INT instruction.
;
;       flags   --- Bogus flags to simulate those pushed by INT instr.
;
;       rtn cs -+
;               |
;       rtn ip  +-- "Return address" which points at the ISR for this INT.
;       bp
;       bx
;       ds

IsIntInstr:     add     word ptr 2[bp], 2       ;Bump rtn adrs beyond INT instr.
                mov     bl, 1[bx]
                mov     bh, 0
                shl     bx, 1           ;Multiply by 4 to get vector
                shl     bx, 1           ; address.

                push    word ptr [bp-0] ;Get and save BP
                push    word ptr [bp-2] ;Get and save BX.
                push    word ptr [bp-4] ;Get and save DS.

                push    cx
                xor     cx, cx          ;Point DS at interrupt
                mov     ds, cx          ; vector table.

                mov     cx, [bp+6]      ;Get original flags.
                mov     [bp-0], cx      ;Save as pushed flags.

                mov     cx, ds:2[bx]    ;Get vector and use it as
                mov     [bp-2], cx      ; the return address.
                mov     cx, ds:[bx]
                mov     [bp-4], cx

                pop     cx
                pop     ds
                pop     bx
                pop     bp
                iret
MyInt1          endp
; Main program - Simulates some keystrokes to demo the above code.

Main            proc

                mov     ax, cseg
                mov     ds, ax

                print
                byte    "Simulating keystrokes via Trace Flag",cr,lf
                byte    "This program places 'DIR' in the keyboard buffer"
                byte    cr,lf,0

                mov     al, 20h         ;"D" down scan code
                call    KbdSim
                mov     al, 0a0h        ;"D" up scan code
                call    KbdSim

                mov     al, 17h         ;"I" down scan code
                call    KbdSim
                mov     al, 97h         ;"I" up scan code
                call    KbdSim

                mov     al, 13h         ;"R" down scan code
                call    KbdSim
                mov     al, 93h         ;"R" up scan code
                call    KbdSim

                mov     al, 1Ch         ;Enter down scan code
                call    KbdSim
                mov     al, 9Ch         ;Enter up scan code
                call    KbdSim

                ExitPgm
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack   ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main
20.7.3 Using the 8042 Microcontroller to Simulate Keystrokes

Although the trace flag based “keyboard stuffer” routine works with most software that talks to the hardware directly, it still has a few problems. Specifically, it doesn’t work at all with programs that operate in protected mode via a “DOS Extender” library (programming libraries that let programmers access more than one megabyte of memory while running under DOS). The last technique we will look at is to program the on-board 8042 keyboard microcontroller to transmit a keystroke for us. There are two ways to do this: the PS/2 way and the hard way.

The PS/2’s microcontroller includes a command specifically designed to return user programmable scan codes to the system. By writing a 0D2h byte to the controller command port (64h) and a scan code byte to port 60h, you can force the controller to return that scan code as though the user pressed a key on the keyboard. See “The Keyboard Hardware Interface” on page 1159 for more details. This technique provides the most compatible (with existing software) way to return scan codes to an application. Unfortunately, this trick only works on machines that have keyboard controllers that are compatible with the PS/2’s; this is not the majority of machines out there. However, if you are writing code for PS/2s or compatibles, this is the best way to go.

The keyboard controller on the PC/AT and most other PC compatible machines does not support the 0D2h command. Nevertheless, there is a sneaky way to force the keyboard controller to transmit a scan code, if you’re willing to break a few rules. This trick may not work on all machines (indeed, there are many machines on which this trick is known to fail), but it does provide a workaround on a large number of PC compatible machines.

The trick is simple. Although the PC’s keyboard controller doesn’t have a command to return a byte you send it, it does provide a command to return the keyboard controller command byte (KCCB). It also provides another command to write a value to the KCCB. So by writing a value to the KCCB and then issuing the read KCCB command, we can trick the system into returning a user programmable code. Unfortunately, the KCCB contains some undefined reserved bits that have different meanings on different brands of keyboard microcontroller chips. That is the main reason this technique doesn’t work with all machines. The following assembly code demonstrates how to use the PS/2 and PC keyboard controller stuffing methods:

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

cseg            segment para public 'code'
                assume  ds:nothing

;****************************************************************************
;
; PutInATBuffer-
;
; The following code sticks the scan code into the AT-class keyboard
; microcontroller chip and asks it to send the scan code back to us
; (through the hardware port).
;
; The AT keyboard controller:
;
; Data port is at I/O address 60h
; Status port is at I/O address 64h (read only)
; Command port is at I/O address 64h (write only)
;
; The controller responds to the following values sent to the command port:
;
; 20h - Read Keyboard Controller's Command Byte (KCCB) and send the data to
;       the data port (I/O address 60h).
;
; 60h - Write KCCB. The next byte written to I/O address 60h is placed in
;       the KCCB. The bits of the KCCB are defined as follows:
;
;       bit 7- Reserved, should be a zero
;       bit 6- IBM industrial computer mode.
;       bit 5- IBM industrial computer mode.
;       bit 4- Disable keyboard.
;       bit 3- Inhibit override.
;       bit 2- System flag
;       bit 1- Reserved, should be a zero.
;       bit 0- Enable output buffer full interrupt.
;
; AAh - Self test
; ABh - Interface test
; ACh - Diagnostic dump
; ADh - Disable keyboard
; AEh - Enable keyboard
; C0h - Read Keyboard Controller input port (equip installed)
; D0h - Read Keyboard Controller output port
; D1h - Write Keyboard Controller output port
; E0h - Read test inputs
; F0h - FFh - Pulse Output port.
;
; The keyboard controller output port is defined as follows:
;
;       bit 7 - Keyboard data (output)
;       bit 6 - Keyboard clock (output)
;       bit 5 - Input buffer empty
;       bit 4 - Output buffer full
;       bit 3 - undefined
;       bit 2 - undefined
;       bit 1 - Gate A20
;       bit 0 - System reset (0=reset)
;
; The keyboard controller input port is defined as follows:
;
;       bit 7 - Keyboard inhibit switch (0=inhibited)
;       bit 6 - Display switch (0=color, 1= mono)
;       bit 5 - Manufacturing jumper
;       bit 4 - System board RAM (0=disable 2nd 256K RAM on system board).
;       bits 0-3 - undefined.
;
; The keyboard controller status port (64h) is defined as follows
; (this matches the bit tests in the code below):
;
;       bit 1 - Set if the controller's input buffer is full (it cannot
;               accept another command or data byte yet).
;       bit 0 - Set if the controller's output buffer is full (a byte is
;               waiting to be read at port 60h).
PutInATBuffer   proc    near
                assume  ds:nothing
                pushf
                push    ax
                push    bx
                push    cx
                push    dx

                mov     dl, al          ;Save char to output.

; Wait until the keyboard controller does not contain data before
; proceeding with shoving stuff down its throat.

                xor     cx, cx
WaitWhlFull:    in      al, 64h
                test    al, 1
                loopnz  WaitWhlFull

; First things first, let's mask the interrupt controller chip (8259) to
; tell it to ignore interrupts coming from the keyboard. However, turn the
; interrupts on so we properly process interrupts from other sources (this
; is especially important because we're going to wind up sending a false
; EOI to the interrupt controller inside the INT 9 BIOS routine).

                cli
                in      al, 21h         ;Get current mask
                push    ax              ;Save intr mask
                or      al, 2           ;Mask keyboard interrupt
                out     21h, al

; Transmit the desired scan code to the keyboard controller. Call this
; byte the new keyboard controller command (we've turned off the keyboard,
; so this won't affect anything).
;
; The following code tells the keyboard controller to take the next byte
; sent to it and use this byte as the KCCB:

                call    WaitToXmit
                mov     al, 60h         ;Write new KCCB command.
                out     64h, al

; Send the scan code as the new KCCB:

                call    WaitToXmit
                mov     al, dl
                out     60h, al

; The following code instructs the system to transmit the KCCB (i.e., the
; scan code) to the system:

                call    WaitToXmit
                mov     al, 20h         ;"Send KCCB" command.
                out     64h, al

                xor     cx, cx
Wait4OutFull:   in      al, 64h
                test    al, 1
                loopz   Wait4OutFull

; Okay, Send a 45h back as the new KCCB to allow the normal keyboard to work
; properly.

                call    WaitToXmit
                mov     al, 60h
                out     64h, al

                call    WaitToXmit
                mov     al, 45h
                out     60h, al

; Okay, execute an INT 9 routine so the BIOS (or whoever) can read the key
; we just stuffed into the keyboard controller. Since we've masked INT 9
; at the interrupt controller, there will be no interrupt coming along from
; the key we shoved in the buffer.

DoInt9:         in      al, 60h         ;Prevents ints from some codes.
                int     9               ;Simulate hardware kbd int.

; Just to be safe, reenable the keyboard:

                call    WaitToXmit
                mov     al, 0aeh
                out     64h, al

; Okay, restore the interrupt mask for the keyboard in the 8259a.

                pop     ax
                out     21h, al
                pop     dx
                pop     cx
                pop     bx
                pop     ax
                popf
                ret
PutInATBuffer   endp

; WaitToXmit- Wait until it's okay to send a command byte to the keyboard
; controller port.

WaitToXmit      proc    near
                push    cx
                push    ax
                xor     cx, cx
TstCmdPortLp:   in      al, 64h
                test    al, 2           ;Check cntrlr input buffer full flag.
                loopnz  TstCmdPortLp
                pop     ax
                pop     cx
                ret
WaitToXmit      endp
;****************************************************************************
;
; PutInPS2Buffer- Like PutInATBuffer, it uses the keyboard controller chip
; to return the keycode. However, PS/2 compatible controllers
; have an actual command to return keycodes.

PutInPS2Buffer  proc    near
                pushf
                push    ax
                push    bx
                push    cx
                push    dx
                mov     dl, al          ;Save char to output.

; Wait until the keyboard controller does not contain data before
; proceeding with shoving stuff down its throat.

                xor     cx, cx
WaitWhlFull:    in      al, 64h
                test    al, 1
                loopnz  WaitWhlFull

; The following code tells the keyboard controller to take the next byte
; sent to it and return it as a scan code.

                call    WaitToXmit
                mov     al, 0d2h        ;Return scan code command.
                out     64h, al

; Send the scan code:

                call    WaitToXmit
                mov     al, dl
                out     60h, al
                pop     dx
                pop     cx
                pop     bx
                pop     ax
                popf
                ret
PutInPS2Buffer  endp
; Main program - Simulates some keystrokes to demo the above code.

Main            proc

                mov     ax, cseg
                mov     ds, ax

                print
                byte    "Simulating keystrokes via Trace Flag",cr,lf
                byte    "This program places 'DIR' in the keyboard buffer"
                byte    cr,lf,0

                mov     al, 20h         ;"D" down scan code
                call    PutInATBuffer
                mov     al, 0a0h        ;"D" up scan code
                call    PutInATBuffer

                mov     al, 17h         ;"I" down scan code
                call    PutInATBuffer
                mov     al, 97h         ;"I" up scan code
                call    PutInATBuffer

                mov     al, 13h         ;"R" down scan code
                call    PutInATBuffer
                mov     al, 93h         ;"R" up scan code
                call    PutInATBuffer

                mov     al, 1Ch         ;Enter down scan code
                call    PutInATBuffer
                mov     al, 9Ch         ;Enter up scan code
                call    PutInATBuffer

                ExitPgm
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack   ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

20.8 Summary

This chapter might seem excessively long for such a mundane topic as keyboard I/O. After all, the Standard Library provides only one primitive routine for keyboard input, getc. However, the keyboard on the PC is a complex beast, having no fewer than two specialized microprocessors controlling it. These microprocessors accept commands from the PC and send commands and data to the PC. If you want to write some tricky keyboard handling code, you need to have a firm understanding of the keyboard’s underlying hardware.

This chapter began by describing the actions the system takes when a user presses a key. As it turns out, the system transmits two scan codes every time you press a key: one scan code when you press the key and one scan code when you release the key. These are called down codes and up codes, respectively. The scan codes the keyboard transmits to the system have little relationship to the standard ASCII character set. Instead, the keyboard uses its own character set and relies upon the keyboard interrupt service routine to translate these scan codes to their appropriate ASCII codes. Some keys do not have ASCII codes; for these keys the system passes along an extended key code to the application requesting keyboard input. While translating scan codes to ASCII codes, the keyboard interrupt service routine makes use of certain BIOS flags that track the position of the modifier keys. These keys include the shift, ctrl, alt, capslock, and numlock keys. These keys are known as modifiers because they modify the normal code produced by keys on the keyboard. The keyboard interrupt service routine stuffs incoming characters in the system type ahead buffer and updates other BIOS variables in segment 40h. An application program or other system service can access this data prepared by the keyboard interrupt service routine. For more information, see

• “Keyboard Basics” on page 1153

The PC interfaces to the keyboard using two separate microcontroller chips. These chips provide user programming registers and a very flexible command set. If you want to program the keyboard beyond simply reading the keystrokes produced by the keyboard (e.g., manipulate the LEDs on the keyboard), you will need to become familiar with the registers and command sets of these microcontrollers. The discussion of these topics appears in

• “The Keyboard Hardware Interface” on page 1159

Both DOS and BIOS provide facilities to read a key from the system’s type ahead buffer. As usual, BIOS’ functions provide the most flexibility in terms of getting at the hardware. Furthermore, the BIOS int 16h routine lets you check shift key status, stuff scan/ASCII codes into the type ahead buffer, adjust the autorepeat rate, and more. Given this flexibility, it is difficult to understand why someone would want to talk directly to the keyboard hardware, especially considering the compatibility problems that seem to plague such projects. To learn the proper way to read characters from the keyboard, and more, see

• “The Keyboard DOS Interface” on page 1167
• “The Keyboard BIOS Interface” on page 1168

Although accessing the keyboard hardware directly is a bad idea for most applications, there is a small class of programs, like keyboard enhancers and pop-up programs, that really do need to access the keyboard hardware directly. These programs must supply an interrupt service routine for the int 9 (keyboard) interrupt. For all the details, see:

• “The Keyboard Interrupt Service Routine” on page 1174
• “Patching into the INT 9 Interrupt Service Routine” on page 1184

A keyboard macro program (keyboard enhancer) is a perfect example of a program that might need to talk directly to the keyboard hardware. One problem with such programs is that they need to pass characters along to some underlying application. Given the nature of applications present in the world, this can be a difficult task if you want to be compatible with a large number of PC applications. The problems, and some solutions, appear in

• “Simulating Keystrokes” on page 1186
• “Stuffing Characters in the Type Ahead Buffer” on page 1186
• “Using the 80x86 Trace Flag to Simulate IN AL, 60H Instructions” on page 1187
• “Using the 8042 Microcontroller to Simulate Keystrokes” on page 1192
The PC Parallel Ports
Chapter 21
The original IBM PC design provided support for three parallel printer ports that IBM designated LPT1:, LPT2:, and LPT3: [1]. IBM probably envisioned machines that could support a standard dot matrix printer, a daisy wheel printer, and maybe some other auxiliary type of printer for different purposes, all on the same machine (laser printers were still a few years in the future at that time). Surely IBM did not anticipate the general use that parallel ports have received or they would probably have designed them differently. Today, the PC’s parallel port controls keyboards, disk drives, tape drives, SCSI adapters, ethernet (and other network) adapters, joystick adapters, auxiliary keypad devices, other miscellaneous devices, and, oh yes, printers. This chapter will not attempt to describe how to use the parallel port for all these various purposes; this book is long enough already. However, a thorough discussion of how the parallel interface controls a printer and one other application of the parallel port (cross machine communication) should provide you with enough ideas to implement the next great parallel device.
21.1 Basic Parallel Port Information

There are two basic data transmission methods modern computers employ: parallel data transmission and serial data transmission. In a serial data transmission scheme (see “The PC Serial Ports” on page 1223) one device sends data to another a single bit at a time across one wire. In a parallel transmission scheme, one device sends data to another several bits at a time (in parallel) on several different wires. For example, the PC’s parallel port provides eight data lines compared to the serial port’s single data line. Therefore, it would seem that the parallel port would be able to transmit data eight times as fast since there are eight times as many wires in the cable. Likewise, it would seem that a serial cable, for the same price as a parallel cable, would be able to go eight times as far since there are fewer wires in the cable. And these are the common trade-offs typically given for parallel vs. serial communication methods: speed vs. cost.

In practice, parallel communication is not eight times faster than serial communication, nor do parallel cables cost eight times as much. In general, those who design serial cables (e.g., ethernet cables) use higher quality materials and shielding. This raises the cost of the cable, but allows devices to transmit data, still a bit at a time, much faster. Furthermore, the better cable design allows greater distances between devices. Parallel cables, on the other hand, are generally quite inexpensive and designed for very short connections (generally no more than about six to ten feet). The real world problems of electrical noise and cross-talk create problems when using long parallel cables and limit how fast the system can transmit data. In fact the original Centronics printer port specification called for no more than 1,000 characters/second data transmission rate, so many printers were designed to handle data at this transmission rate.
Most parallel ports can easily outperform this value; however, the limiting factor is still the cable, not any intrinsic limitation in a modern computer. Although a parallel communication system could use any number of wires to transmit data, most parallel systems use eight data lines to transmit a byte at a time. There are a few notable exceptions. For example, the SCSI interface is a parallel interface, yet newer versions of the SCSI standard allow eight, sixteen, and even thirty-two bit data transfers. In this chapter we will concentrate on byte-sized transfers since the parallel port on the PC provides for eight-bit data. A typical parallel communication system can be one way (or unidirectional ) or two way (bidirectional ). The PC’s parallel port generally supports unidirectional communications (from the PC to the printer), so we will consider this simpler case first. In a unidirectional parallel communication system there are two distinguished sites: the transmitting site and the receiving site. The transmitting site places its data on the data lines and informs the receiving site that data is available; the receiving site then reads the data lines and informs the transmitting site that it
1. In theory, the BIOS allows for a fourth parallel printer port, LPT4:, but few (if any) adapter cards have ever been built that claim to work as LPT4:.
Chapter 21
has taken the data. Note how the two sites synchronize their access to the data lines: the receiving site does not read the data lines until the transmitting site tells it to, and the transmitting site does not place a new value on the data lines until the receiving site removes the data and tells the transmitting site that it has the data. Handshaking is the term that describes how these two sites coordinate the data transfer.

To properly implement handshaking requires two additional lines. The strobe (or data strobe) line is what the transmitting site uses to tell the receiving site that data is available. The acknowledge line is what the receiving site uses to tell the transmitting site that it has taken the data and is ready for more. The PC’s parallel port actually provides a third handshaking line, busy, that the receiving site can use to tell the transmitting site that it is busy and the transmitting site should not attempt to send data. A typical data transmission session looks something like the following:

Transmitting site:

1) The transmitting site checks the busy line to see if the receiving site is busy. If the busy line is active, the transmitter waits in a loop until the busy line becomes inactive.
2) The transmitting site places its data on the data lines.
3) The transmitting site activates the strobe line.
4) The transmitting site waits in a loop for the acknowledge line to become active.
5) The transmitting site sets the strobe inactive.
6) The transmitting site waits in a loop for the acknowledge line to become inactive.
7) The transmitting site repeats steps one through six for each byte it must transmit.

Receiving site:

1) The receiving site sets the busy line inactive (assuming it is ready to accept data).
2) The receiving site waits in a loop until the strobe line becomes active.
3) The receiving site reads the data from the data lines (and processes the data, if necessary).
4) The receiving site activates the acknowledge line.
5) The receiving site waits in a loop until the strobe line goes inactive.
6) The receiving site sets the acknowledge line inactive.
7) The receiving site repeats steps one through six for each additional byte it must receive.
By following these steps, the receiving and transmitting sites coordinate their actions so that the transmitting site doesn’t attempt to put several bytes on the data lines before the receiving site consumes them and the receiving site doesn’t attempt to read data that the transmitting site has not sent.

Bidirectional data transmission is often nothing more than two unidirectional data transfers with the roles of the transmitting and receiving sites reversed for the second communication channel. Some PC parallel ports (particularly on PS/2 systems and many notebooks) provide a bidirectional parallel port. Bidirectional data transmission on such hardware is slightly more complex than on systems that implement bidirectional communication with two unidirectional ports. Bidirectional communication on a bidirectional parallel port requires an extra set of control lines so the two sites can determine who is writing to the common data lines at any one time.
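The unidirectional handshake listed in the numbered steps can be tried out without any hardware. The following Python sketch is only a toy model, not driver code: the names Lines, transmitter, receiver, and transfer are invented for this illustration, the "wires" are plain attributes, and the two sites are generators that take turns so that each waiting loop in the steps becomes a yield.

```python
class Lines:
    """Hypothetical shared wires between the two sites; True means active."""
    def __init__(self):
        self.data = 0
        self.strobe = False
        self.ack = False
        self.busy = True            # receiver has not announced it is ready


def transmitter(lines, payload):
    """Follows transmitter steps 1-7; yields wherever a step waits in a loop."""
    for byte in payload:            # step 7: repeat for each byte
        while lines.busy:           # step 1: wait until receiver is not busy
            yield
        lines.data = byte           # step 2: place data on the data lines
        lines.strobe = True         # step 3: activate strobe
        while not lines.ack:        # step 4: wait for acknowledge
            yield
        lines.strobe = False        # step 5: set strobe inactive
        while lines.ack:            # step 6: wait for acknowledge to drop
            yield


def receiver(lines, received):
    """Follows receiver steps 1-7; runs for as long as it is resumed."""
    while True:                     # step 7: repeat for each byte
        lines.busy = False          # step 1: ready to accept data
        while not lines.strobe:     # step 2: wait for strobe
            yield
        received.append(lines.data) # step 3: read the data lines
        lines.ack = True            # step 4: activate acknowledge
        while lines.strobe:         # step 5: wait for strobe to go inactive
            yield
        lines.ack = False           # step 6: set acknowledge inactive


def transfer(payload):
    """Alternate the two sites until the transmitter has sent everything."""
    lines, received = Lines(), []
    tx, rx = transmitter(lines, payload), receiver(lines, received)
    for _ in range(10 * len(payload) + 10):   # generous bound on turns
        if next(tx, "done") == "done":
            break
        next(rx)
    return bytes(received)


print(transfer(b"Hi"))
```

Because neither side changes a line until it has seen the other side's previous change, the model delivers each byte exactly once regardless of which generator runs first.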
The PC Parallel Ports
21.2 The Parallel Port Hardware

The standard unidirectional parallel port on the PC provides more than the 11 lines described in the previous section (eight data, three handshake). The PC’s parallel port provides the following signals:
Table 79: Parallel Port Signals

Pin Number     I/O         Active
on Connector   Direction   Polarity   Signal Description
------------   ---------   --------   ------------------
1              output      0          Strobe (data available signal).
2-9            output      -          Data lines (bit 0 is pin 2, bit 7 is pin 9).
10             input       0          Acknowledge line (active when remote system has taken data).
11             input       0          Busy line (when active, remote system is busy and cannot accept data).
12             input       1          Out of paper (when active, printer is out of paper).
13             input       1          Select. When active, the printer is selected.
14             output      0          Autofeed. When active, the printer automatically inserts a line feed after every carriage return it receives.
15             input       0          Error. When active, there is a printer error.
16             output      0          Init. When held active for at least 50 µsec, this signal causes the printer to initialize itself.
17             output      0          Select input. This signal, when inactive, forces the printer off-line.
18-25          -           -          Signal ground.
Note that the parallel port provides 12 output lines (eight data lines, strobe, autofeed, init, and select input) and five input lines (acknowledge, busy, out of paper, select, and error). Even though the port is unidirectional, there is a good mixture of input and output lines available on the port. Many devices (like disk and tape drives) that require bidirectional data transfer use these extra lines to perform bidirectional data transfer.

On bidirectional parallel ports (found on PS/2 and laptop systems), the strobe and data lines are both input and output lines. There is a bit in a control register associated with the parallel port that selects the transfer direction at any given instant (you cannot transfer data in both directions simultaneously).

There are three I/O addresses associated with a typical PC compatible parallel port. These addresses belong to the data register, the status register, and the control register. The data register is an eight-bit read/write port. Reading the data register (in a unidirectional mode) returns the value last written to the data register. The control and status registers provide the interface to the other I/O lines. The organization of these ports is as follows:
Parallel Port Status Register (read only)

Bit 7      Printer busy (busy if zero)
Bit 6      Printer acknowledge (ack if zero)
Bit 5      Device out of paper (out of paper if one)
Bit 4      Device selected (selected if one)
Bit 3      Device error (active if zero)
Bit 2      Printer ack on PS/2 systems (active if zero)
Bits 1-0   Unused
Bit two (printer acknowledge) is available only on PS/2 and other systems that support a bidirectional printer port. Other systems do not use this bit.
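As a rough illustration of the status register layout, the following Python sketch decodes a raw status byte into named flags. The function name decode_status and the dictionary keys are invented for this example; the bit positions and polarities follow the status register description above, and bits zero through two are ignored.

```python
def decode_status(status):
    """Decode a byte read from the status register (base address + 1).

    Hypothetical helper for illustration only; polarity per the
    status register layout (busy, ack, and error are active when zero).
    """
    return {
        "busy":         (status & 0x80) == 0,   # bit 7: busy if zero
        "acknowledge":  (status & 0x40) == 0,   # bit 6: ack if zero
        "out_of_paper": (status & 0x20) != 0,   # bit 5: out of paper if one
        "selected":     (status & 0x10) != 0,   # bit 4: selected if one
        "error":        (status & 0x08) == 0,   # bit 3: error if zero
    }


# 11011000b: not busy, no ack pending, paper present, selected, no error.
print(decode_status(0b11011000))
```

Keeping the inverted bits in one place like this avoids scattering "zero means active" tests throughout the rest of a program.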
Parallel Port Control Register

Bits 7-6   Unused
Bit 5      PS/2 data direction (output = 0, input = 1)
Bit 4      Enable parallel port IRQ (active if 1)
Bit 3      Select input (on-line = 1)
Bit 2      Init (initialize printer = 0)
Bit 1      Autofeed (add linefeed = 1)
Bit 0      Strobe (data available = 1)

The parallel port control register is an output register. Reading this location returns the last value written to the control register except for bit five, which is write only. Bit five, the data direction bit, is available only on PS/2 and other systems that support a bidirectional parallel port. If you write a zero to this bit, the strobe and data lines are output bits, just like on the unidirectional parallel port. If you write a one to this bit, then the data and strobe lines are inputs. Note that in the input mode (bit 5 = 1), bit zero of the control register is actually an input.

Note: writing a one to bit four of the control register enables the printer IRQ (IRQ 7). However, this feature does not work on all systems so very few programs attempt to use interrupts with the parallel port. When active, the parallel port will generate an int 0Fh whenever the printer acknowledges a data transmission.

Since the PC supports up to three separate parallel ports, there could be as many as three sets of these parallel port registers in the system at any one time. There are three parallel port base addresses associated with the three possible parallel ports: 3BCh, 378h, and 278h. We will refer to these as the base addresses for LPT1:, LPT2:, and LPT3:, respectively. The parallel port data register is always located at the base address for a parallel port, the status register appears at the base address plus one, and the control register appears at the base address plus two. For example, for LPT1:, the data register is at I/O address 3BCh, the status register is at I/O address 3BDh, and the control register is at I/O address 3BEh.

There is one minor glitch. The I/O addresses for LPT1:, LPT2:, and LPT3: given above are the physical addresses for the parallel ports. The BIOS provides logical addresses for these parallel ports as well. This lets users remap their printers (since most software only writes to LPT1:).
To accomplish this, the BIOS reserves eight bytes in the BIOS variable space (40:8, 40:0A, 40:0C, and 40:0E). Location 40:8 contains the base address for logical LPT1:, location 40:0A contains the base address for logical LPT2:, etc. When software accesses LPT1:, LPT2:, etc., it generally accesses the parallel port whose base address appears in one of these locations.
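To make the logical-to-physical lookup concrete, here is a small Python sketch that works on a copy of the eight bytes at 40:08 through 40:0F rather than on real memory. The function name lpt_registers is invented for this example, and treating a zero word as "no port installed" is an assumption based on how the driver later in this chapter tests the base address before using it.

```python
import struct


def lpt_registers(bios_vars, logical_port):
    """Return (data, status, control) I/O addresses for LPT1:..LPT4:.

    bios_vars    -- the eight bytes copied from 40:08..40:0F
    logical_port -- 1 for LPT1:, 2 for LPT2:, and so on
    Returns None when the table entry is zero (assumed: no such port).
    """
    if logical_port not in (1, 2, 3, 4):
        raise ValueError("logical port must be 1..4")
    # Each entry is a 16-bit little-endian base address.
    (base,) = struct.unpack_from("<H", bios_vars, 2 * (logical_port - 1))
    if base == 0:
        return None
    return base, base + 1, base + 2


# Example table: the three physical base addresses named in the text, no LPT4:.
table = struct.pack("<4H", 0x3BC, 0x378, 0x278, 0)
print([hex(a) for a in lpt_registers(table, 1)])
```

Swapping two words in the table is all it takes to remap a printer, which is why software that goes through these locations follows the user's remapping automatically.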
21.3 Controlling a Printer Through the Parallel Port

Although there are many devices that connect to the PC’s parallel port, printers still make up the vast majority of such connections. Therefore, describing how to control a printer from the PC’s parallel port is probably the best first example to present. As with the keyboard, your software can operate at three different levels: it can print data using DOS, using BIOS, or by writing directly to the parallel port hardware. As with the keyboard interface, using DOS or BIOS is the best approach if you want to maintain compatibility with other devices that plug into the parallel port [2]. Of course, if you are controlling some other type of
2. Many devices connect to the parallel port with a pass-through plug allowing you to use that device and still use the parallel port for your printer. However, if you talk directly to the parallel port with your software, it may conflict with that device’s operation.
device, going directly to the hardware is your only choice. However, the BIOS provides good printer support, so going directly to the hardware is rarely necessary if you simply want to send data to the printer.
21.3.1 Printing via DOS

MS-DOS provides two calls you can use to send data to the printer. DOS function 05h writes the character in the dl register directly to the printer. Function 40h, with a file handle of 04h, also sends data to the printer. Since the chapter on DOS and BIOS fully describes these functions, we will not discuss them any further here. For more information, see “MS-DOS, PC-BIOS, and File I/O” on page 699.
21.3.2 Printing via BIOS

Although DOS provides a reasonable set of functions to send characters to the printer, it does not provide functions to let you initialize the printer or obtain the current printer status. Furthermore, DOS only prints to LPT1:. The PC’s int 17h BIOS routine provides three functions: print, initialize, and status. You can apply these functions to any supported parallel port on the system. The print function is roughly equivalent to DOS’ print character function. The initialize function initializes the printer using system dependent timing information. The printer status function returns the information from the printer status port along with time-out information. For more information on these routines, see “MS-DOS, PC-BIOS, and File I/O” on page 699.
21.3.3 An INT 17h Interrupt Service Routine

Perhaps the best way to see how the BIOS functions operate is to write a replacement int 17h ISR for a printer. This section explains the handshaking protocol and variables the printer driver uses. It also describes the operation and return results associated with each machine. There are eight variables in the BIOS variable space (segment 40h) the printer driver uses. The following table describes each of these variables:
Table 80: BIOS Parallel Port Variables

Address   Description
-------   -----------
40:08     Base address of LPT1: device.
40:0A     Base address of LPT2: device.
40:0C     Base address of LPT3: device.
40:0E     Base address of LPT4: device.
40:78     LPT1: time-out value. The printer port driver software should return an error if the printer device does not respond in a reasonable amount of time. This variable (if non-zero) determines how many loops of 65,536 iterations each a driver will wait for a printer acknowledge. If zero, the driver will wait forever.
40:79     LPT2: time-out value. See description above.
40:7A     LPT3: time-out value. See description above.
40:7B     LPT4: time-out value. See description above.
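The time-out entries in Table 80 translate directly into an upper bound on how many times a driver polls the status port. The short Python sketch below just does that arithmetic; the function name max_status_polls is invented for this example, and the range check assumes each time-out variable is a single byte (the four entries occupy consecutive byte addresses).

```python
def max_status_polls(timeout_value):
    """Upper bound on status-port reads before a time-out error.

    timeout_value -- byte read from 40:78..40:7B (hypothetical input)
    Returns None when the value is zero, meaning "wait forever".
    """
    if not 0 <= timeout_value <= 255:
        raise ValueError("time-out variable is a single byte")
    if timeout_value == 0:
        return None
    return timeout_value * 65536    # that many loops of 65,536 iterations


print(max_status_polls(20))
```

Note that the bound is a poll count, not a time: the actual delay depends on how fast the CPU executes the polling loop.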
You will notice a slight deviation in the handshake protocol in the following code. This printer driver does not wait for an acknowledge from the printer after sending a character. Instead, it checks to see if the printer has sent an acknowledge to the previous character before sending a character. This saves a small amount of time because the program printing the characters can continue operating in parallel with the receipt of the acknowledge from the printer. You will also notice that this particular driver does not monitor the busy line. Almost every printer in existence leaves this line inactive (not busy), so there is no need to check it. If you encounter a printer that does manipulate the busy line, the modification to this code is trivial. The following code implements the int 17h service:

; INT17.ASM
;
; A short passive TSR that replaces the BIOS' int 17h handler.
; This routine demonstrates the function of each of the int 17h
; functions that a standard BIOS would provide.
;
; Note that this code does not patch into int 2Fh (multiplex interrupt)
; nor can you remove this code from memory except by rebooting.
; If you want to be able to do these two things (as well as check for
; a previous installation), see the chapter on resident programs. Such
; code was omitted from this program because of length constraints.
;
;
; cseg and EndResident must occur before the standard library segments!

cseg            segment para public 'code'
cseg            ends

; Marker segment, to find the end of the resident section.

EndResident     segment para public 'Resident'
EndResident     ends

                .xlist
                include     stdlib.a
                includelib  stdlib.lib
                .list

byp             equ     <byte ptr>

cseg            segment para public 'code'
                assume  cs:cseg, ds:cseg

OldInt17        dword   ?

; BIOS variables:

PrtrBase        equ     8
PrtrTimeOut     equ     78h

; This code handles the INT 17H operation. INT 17H is the BIOS routine
; to send data to the printer and report on the printer's status. There
; are three different calls to this routine, depending on the contents
; of the AH register. The DX register contains the printer port number.
;
; DX=0 -- Use LPT1:
; DX=1 -- Use LPT2:
; DX=2 -- Use LPT3:
; DX=3 -- Use LPT4:
;
; AH=0 --   Print the character in AL to the printer. Printer status is
;           returned in AH. If bit #0 = 1 then a timeout error occurred.
;
; AH=1 --   Initialize printer. Status is returned in AH.
;
; AH=2 --   Return printer status in AH.
;
;
; The status bits returned in AH are as follows:
;
; Bit   Function                    Non-error values
; ---   --------------------------  ----------------
;  0    1=time out error            0
;  1    unused                      x
;  2    unused                      x
;  3    1=I/O error                 0
;  4    1=selected, 0=deselected.   1
;  5    1=out of paper              0
;  6    1=acknowledge               x
;  7    1=not busy                  x
;
; Note that the hardware returns bit 3 with zero if an error has occurred,
; with one if there is no error. The software normally inverts this bit
; before returning it to the caller.
;
;
; Printer port hardware locations:
;
; There are three ports used by the printer hardware:
;
; PrtrPortAdrs   --- Output port where data is sent to printer (8 bits).
; PrtrPortAdrs+1 --- Input port where printer status can be read (8 bits).
; PrtrPortAdrs+2 --- Output port where control information is sent to the
;                    printer.
;
; Data output port- 8-bit data is transmitted to the printer via this port.
;
; Input status port:
;       bit 0:  unused.
;       bit 1:  unused.
;       bit 2:  unused.
;
;       bit 3:  -Error, normally this bit means that the printer has
;               encountered an error. However, with the P101 installed
;               this is a data return line for the keyboard scan.
;
;       bit 4:  +SLCT, normally this bit is used to determine if the
;               printer is selected or not. With the P101 installed
;               this is a data return line for the keyboard scan.
;
;       bit 5:  +PE, a 1 in this bit location means that the printer has
;               detected the end of paper. On many printer ports, this
;               bit has been found to be inoperative.
;
;       bit 6:  -ACK, A zero in this bit position means that the printer
;               has accepted the last character and is ready to accept
;               another. This bit is not normally used by the BIOS as bit
;               7 also provides this function (and more).
;
;       bit 7:  -Busy, When this signal is active (0) the printer is
;               busy and cannot accept data. When this bit is set to one,
;               the printer can accept another character.
;
;
; Output control port:
;
;       Bit 0:  +Strobe, A 0.5 us (minimum) active high pulse on this
;               bit clocks the data latched into the printer data output
;               port to the printer.
;
;       Bit 1:  +Auto FD XT - A 1 stored at this bit causes the printer to
;               line feed after a line is printed. On some printer
;               interfaces (e.g., the Hercules Graphics Card) this bit is
;               inoperative.
;
;       Bit 2:  -INIT, a zero on this bit (for a minimum of 50 us) will
;               cause the printer to (re)initialize itself.
;
;       Bit 3:  +SLCT IN, a one in this bit selects the printer. A zero
;               will cause the printer to go off-line.
;
;       Bit 4:  +IRQ ENABLE, a one in this bit position allows an
;               interrupt to occur when -ACK changes from one to zero.
;
;       Bit 5:  Direction control on BI-DIR port. 0=output, 1=input.
;       Bit 6:  reserved, must be zero.
;       Bit 7:  reserved, must be zero.

MyInt17         proc    far
                assume  ds:nothing

                push    ds
                push    bx
                push    cx
                push    dx

                mov     bx, 40h             ;Point DS at BIOS vars.
                mov     ds, bx

                cmp     dx, 3               ;Must be LPT1..LPT4.
                ja      InvalidPrtr

                cmp     ah, 0               ;Branch to the appropriate code for
                jz      PrtChar             ; the printer function
                cmp     ah, 2
                jb      PrtrInit
                je      PrtrStatus

; If they passed us an opcode we don't know about, just return.

InvalidPrtr:    jmp     ISR17Done

; Initialize the printer by pulsing the init line for at least 50 us.
; The delay loop below will delay well beyond 50 usec even on the fastest
; machines.

PrtrInit:       mov     bx, dx              ;Get printer port value.
                shl     bx, 1               ;Convert to byte index.
                mov     dx, PrtrBase[bx]    ;Get printer base address.
                test    dx, dx              ;Does this printer exist?
                je      InvalidPrtr         ;Quit if no such printer.
                add     dx, 2               ;Point dx at control reg.
                in      al, dx              ;Read current status.
                and     al, 11011011b       ;Clear INIT/BIDIR bits.
                out     dx, al              ;Reset printer.
                mov     cx, 0               ;This will produce at least
PIDelay:        loop    PIDelay             ; a 50 usec delay.
                or      al, 100b            ;Stop resetting printer.
                out     dx, al
                jmp     ISR17Done

; Return the current printer status. This code reads the printer status
; port and formats the bits for return to the calling code.

PrtrStatus:     mov     bx, dx              ;Get printer port value.
                shl     bx, 1               ;Convert to byte index.
                mov     dx, PrtrBase[bx]    ;Base address of printer port.
                mov     al, 00101001b       ;Dflt: every possible error.
                test    dx, dx              ;Does this printer exist?
                je      InvalidPrtr         ;Quit if no such printer.
                inc     dx                  ;Point at status port.
                in      al, dx              ;Read status port.
                and     al, 11111000b       ;Clear unused/timeout bits.
                jmp     ISR17Done

; Print the character in the accumulator!

PrtChar:        mov     bx, dx
                mov     cl, PrtrTimeOut[bx] ;Get time out value.
                shl     bx, 1               ;Convert to byte index.
                mov     dx, PrtrBase[bx]    ;Get Printer port address
                or      dx, dx              ;Non-nil pointer?
                jz      NoPrtr2             ; Branch if a nil ptr

; The following code checks to see if an acknowledge was received from
; the printer. If this code waits too long, a time-out error is returned.
; Acknowledge is supplied in bit #7 of the printer status port (which is
; the next address after the printer data port).

                push    ax
                inc     dx                  ;Point at status port
                mov     bl, cl              ;Put timeout value in bl
                mov     bh, cl              ; and bh.
WaitLp1:        xor     cx, cx              ;Init count to 65536.
WaitLp2:        in      al, dx              ;Read status port
                mov     ah, al              ;Save status for now.
                test    al, 80h             ;Printer acknowledge?
                jnz     GotAck              ;Branch if acknowledge.
                loop    WaitLp2             ;Repeat 65536 times.
                dec     bl                  ;Decrement time out value.
                jnz     WaitLp1             ;Repeat 65536*TimeOut times.

; See if the user has selected no timeout:

                cmp     bh, 0
                je      WaitLp1

; TIMEOUT ERROR HAS OCCURRED!
;
; A timeout - I/O error is returned to the system at this point.
; Either we fall through to this point from above (time out error) or
; the referenced printer port doesn't exist. In any case, return an error.

NoPrtr2:        or      ah, 9               ;Set timeout-I/O error flags
                and     ah, 0F9h            ;Turn off unused flags.
                xor     ah, 40h             ;Flip busy bit.

; Okay, restore registers and return to caller.

                pop     cx                  ;Remove old ax.
                mov     al, cl              ;Restore old al.
                jmp     ISR17Done

; If the printer port exists and we've received an acknowledge, then it's
; okay to transmit data to the printer. That job is handled down here.

GotAck:         mov     cx, 16              ;Short delay if crazy prtr
GALp:           loop    GALp                ; needs hold time after ack.
                pop     ax                  ;Get char to output and
                push    ax                  ; save again.
                dec     dx                  ;Point DX at printer port.
                pushf                       ;Turn off interrupts for now.
                cli
                out     dx, al              ;Output data to the printer.

; The following short delay gives the data time to travel through the
; parallel lines. This makes sure the data arrives at the printer before
; the strobe (the times can vary depending upon the capacitance of the
; parallel cable's lines).

                mov     cx, 16              ;Give data time to settle
DataSettleLp:   loop    DataSettleLp        ; before sending strobe.

; Now that the data has been latched on the printer data output port, a
; strobe must be sent to the printer. The strobe line is connected to
; bit zero of the control port. Also note that this clears bit 5 of the
; control port. This ensures that the port continues to operate as an
; output port if it is a bidirectional device. This code also clears bits
; six and seven which IBM claims should be left zero.

                inc     dx                  ;Point DX at the printer
                inc     dx                  ; control output port.
                in      al, dx              ;Get current control bits.
                and     al, 01eh            ;Force strobe line to zero and
                out     dx, al              ; make sure it's an output port.

                mov     cx, 16              ;Short delay to allow data
Delay0:         loop    Delay0              ; to become good.

                or      al, 1               ;Send out the (+) strobe.
                out     dx, al              ;Output (+) strobe to bit 0

                mov     cx, 16              ;Short delay to lengthen strobe
StrobeDelay:    loop    StrobeDelay

                and     al, 0FEh            ;Clear the strobe bit.
                out     dx, al              ;Output to control port.
                popf                        ;Restore interrupts.

                pop     dx                  ;Get old AX value
                mov     al, dl              ;Restore old AL value

ISR17Done:      pop     dx
                pop     cx
                pop     bx
                pop     ds
                iret
MyInt17         endp

Main            proc

                mov     ax, cseg
                mov     ds, ax

                print
                byte    "INT 17h Replacement",cr,lf
                byte    "Installing....",cr,lf,0

; Patch into the INT 17 interrupt vector. Note that the
; statements above have made cseg the current data segment,
; so we can store the old INT 17 value directly into
; the OldInt17 variable.

                cli                         ;Turn off interrupts!
                mov     ax, 0
                mov     es, ax
                mov     ax, es:[17h*4]
                mov     word ptr OldInt17, ax
                mov     ax, es:[17h*4 + 2]
                mov     word ptr OldInt17+2, ax
                mov     es:[17h*4], offset MyInt17
                mov     es:[17h*4+2], cs
                sti                         ;Okay, ints back on.

; We're hooked up, the only thing that remains is to terminate and
; stay resident.

                print
                byte    "Installed.",cr,lf,0

                mov     ah, 62h             ;Get this program's PSP
                int     21h                 ; value.

                mov     dx, EndResident     ;Compute size of program.
                sub     dx, bx
                mov     ax, 3100h           ;DOS TSR command.
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main

21.4 Inter-Computer Communications on the Parallel Port

Although printing is, by far, the most popular use for the parallel port on a PC, many devices use the parallel port for other purposes, as mentioned earlier. It would not be fitting to close this chapter without at least one example of a non-printer application for the parallel port. This section will describe how to get two computers to transmit files from one to the other across the parallel port. The Laplink program from Travelling Software is a good example of a commercial product that can transfer data across the PC’s parallel port; although the following software is not as robust or feature laden as Laplink, it does demonstrate the basic principles behind such software.

Note that you cannot connect two computers’ parallel ports with a simple cable that has DB25 connectors at each end. In fact, doing so could damage the computers’ parallel ports because you’d be connecting digital outputs to digital outputs (a real no-no). However, you can purchase “Laplink compatible” cables (or buy real Laplink cables for that matter) that provide proper connections between the parallel ports of two computers. As you may recall from the section on the parallel port hardware, the unidirectional parallel port provides five input signals. A Laplink cable routes four of the data lines to four of these input lines in both directions. The connections on a Laplink compatible cable are as follows:
Transmitting Site    Receiving Site
-----------------    ---------------
Data bit 4           Busy (inverted)
Data bit 3           Acknowledge
Data bit 2           Paper Empty
Data bit 1           Select
Data bit 0           Error

Connections on a Laplink Compatible Cable

Data written on bits zero through three of the data register at the transmitting site appear, unchanged, on bits three through six of the status port on the receiving site. Bit four of the transmitting site appears, inverted, at bit seven of the receiving site. Note that Laplink compatible cables are bidirectional. That is, you can transmit data from either site to the other using the connections above. However, since there are only five input bits on the parallel port, you must transfer the data four bits at a time (we need one bit for the data strobe). Since the receiving site needs to acknowledge data transmissions, we cannot simultaneously transmit data in both directions. We must use one of the output lines at the site receiving data to acknowledge the incoming data.
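The bit routing described above is easy to get wrong by one position, so a small software model may help. The Python sketch below is an illustration only: the function names cable_status, nibbles_with_strobe, and decode_nibble are invented here, and sending the low-order nibble first with data bit 4 as the data-available strobe is taken from the SendByte routine in the transmitter listing that follows, not from any Laplink specification.

```python
def cable_status(data_out):
    """Status-port bits 3-7 the receiver sees for a transmitted data byte."""
    low = (data_out & 0x0F) << 3              # data bits 0-3 -> status bits 3-6
    busy = ((data_out ^ 0x10) & 0x10) << 3    # data bit 4, inverted -> bit 7
    return low | busy


def nibbles_with_strobe(byte):
    """Data-register values for one byte: low nibble, then high nibble,
    each with bit 4 set as the data-available strobe (assumed ordering)."""
    return [(byte & 0x0F) | 0x10, (byte >> 4) | 0x10]


def decode_nibble(status):
    """Recover the four data bits from a status-port value."""
    return (status >> 3) & 0x0F


# Round trip of one byte across the modelled cable.
lo, hi = (decode_nibble(cable_status(v)) for v in nibbles_with_strobe(0xA7))
print(hex(lo | (hi << 4)))
```

Writing 00h therefore reads back as 80h on the other side (only the inverted busy bit is set), while writing 1Fh reads back as 78h.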
Since the two sites cooperating in a data transfer across the parallel cable must take turns transmitting and receiving data, we must develop a protocol so each participant in the data transfer knows when it is okay to transmit and receive. Our protocol will be very simple: a site is either a transmitter or a receiver, and the roles never switch. Designing a more complex protocol is not difficult, but this simple protocol will suffice for the example you are about to see. Later in this section we will discuss ways to develop a protocol that allows two-way transmissions.

The following example programs will transmit and receive a single file across the parallel port. To use this software, you run the transmit program on the transmitting site and the receive program on the receiving site. The transmission program fetches a file name from the DOS command line and opens that file for reading (generating an error, and quitting, if the file does not exist). Assuming the file exists, the transmit program then queries the receiving site to see if it is available. The transmitter checks for the presence of the receiving site by alternately writing zeros and ones to all output bits and then reading its input bits. The receiving site will invert these values and write them back when it comes on-line. Note that the order of execution (transmitter first or receiver first) does not matter. The two programs will attempt to handshake until the other comes on-line. When both sites cycle through the inverting values three times, they write the value 05h to their output ports to tell the other site they are ready to proceed. A time-out function aborts either program if the other site does not respond in a reasonable amount of time.

Once the two sites are synchronized, the transmitting site determines the size of the file and then transmits the file name and size to the receiving site. The receiving site then begins waiting for the receipt of data.
The transmitting site sends the data 512 bytes at a time to the receiving site. After the transmission of 512 bytes, the receiving site delays sending an acknowledgment and writes the 512 bytes of data to the disk. Then the receiving site sends the acknowledge and the transmitting site begins sending the next 512 bytes. This process repeats until the receiving site has accepted all the bytes from the file. Here is the code for the transmitter:

; TRANSMIT.ASM
;
; This program is the transmitter portion of the programs that transmit
; files across a Laplink compatible parallel cable.
;
; This program assumes that the user wants to use LPT1: for transmission.
; Adjust the equates, or read the port from the command line if this
; is inappropriate.

                .286
                .xlist
                include     stdlib.a
                includelib  stdlib.lib
                .list

dseg            segment para public 'data'

TimeOutConst    equ     4000                ;About 1 min on 66Mhz 486.
PrtrBase        equ     10                  ;Offset to LPT1: adrs.

MyPortAdrs      word    ?                   ;Holds printer port address.
FileHandle      word    ?                   ;Handle for output file.
FileBuffer      byte    512 dup (?)         ;Buffer for incoming data.

FileSize        dword   ?                   ;Size of incoming file.
FileNamePtr     dword   ?                   ;Holds ptr to filename

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; TestAbort-    Check to see if the user has pressed ctrl-C and wants to
;               abort this program. This routine calls BIOS to see if the
;               user has pressed a key. If so, it calls DOS to read the
;               key (function AH=8, read a key w/o echo and with ctrl-C
;               checking).

TestAbort       proc    near
                push    ax
                push    cx
                push    dx
                mov     ah, 1
                int     16h                 ;See if keypress.
                je      NoKeyPress          ;Return if no keypress.
                mov     ah, 8               ;Read char, chk for ctrl-C.
                int     21h                 ;DOS aborts if ctrl-C.
NoKeyPress:     pop     dx
                pop     cx
                pop     ax
                ret
TestAbort       endp

; SendByte-     Transmit the byte in AL to the receiving site four bits
;               at a time.

SendByte        proc    near
                push    cx
                push    dx
                mov     ah, al              ;Save byte to xmit.

                mov     dx, MyPortAdrs      ;Base address of LPT1: port.

; First, just to be sure, write a zero to bit #4. This reads as a one
; in the busy bit of the receiver.

                mov     al, 0
                out     dx, al              ;Data not ready yet.

; Wait until the receiver is not busy. The receiver will write a zero
; to bit #4 of its data register while it is busy. This comes out as a
; one in our busy bit (bit 7 of the status register). This loop waits
; until the receiver tells us it's ready to receive data by writing a
; one to bit #4 (which we read as a zero). Note that we check for a
; ctrl-C every so often in the event the user wants to abort the
; transmission.

                inc     dx                  ;Point at status register.
W4NBLp:         mov     cx, 10000
Wait4NotBusy:   in      al, dx              ;Read status register value.
                test    al, 80h             ;Bit 7 = 1 if busy.
                loopne  Wait4NotBusy        ;Repeat while busy, 10000 times.
                je      ItsNotBusy          ;Leave loop if not busy.
                call    TestAbort           ;Check for Ctrl-C.
                jmp     W4NBLp

; Okay, put the data on the data lines:

ItsNotBusy:     dec     dx                  ;Point at data register.
                mov     al, ah              ;Get a copy of the data.
                and     al, 0Fh             ;Strip out H.O. nibble
                out     dx, al              ;"Prime" data lines, data not avail.
                or      al, 10h             ;Turn data available on.
                out     dx, al              ;Send data w/data available strobe.

; Wait for the acknowledge from the receiving site. Every now and then
; check for a ctrl-C so the user can abort the transmission program from
; within this loop.

                inc     dx                  ;Point at status register.
W4ALp:          mov     cx, 10000           ;Times to loop between ctrl-C checks.
Wait4Ack:       in      al, dx              ;Read status port.
                test    al, 80h             ;Ack = 1 when rcvr acknowledges.
                loope   Wait4Ack            ;Repeat 10000 times or until ack.
                jne     GotAck              ;Branch if we got an ack.
                call    TestAbort           ;Every 10000 calls, check for a
                jmp     W4ALp               ; ctrl-C from the user.

; Send the data not available signal to the receiver:

GotAck:         dec     dx                  ;Point at data register.
                mov     al, 0               ;Write a zero to bit 4, this appears
                out     dx, al              ; as a one in the rcvr's busy bit.

; Okay, on to the H.O. nibble:

                inc     dx                  ;Point at status register.
W4NB2:          mov     cx, 10000           ;10000 calls between ctrl-C checks.
Wait4NotBsy2:   in      al, dx              ;Read status register.
                test    al, 80h             ;Bit 7 = 1 if busy.
                loopne  Wait4NotBsy2        ;Loop 10000 times while busy.
                je      NotBusy2            ;H.O. bit clear (not busy)?
                call    TestAbort           ;Check for ctrl-C.
                jmp     W4NB2

; Okay, put the data on the data lines:

NotBusy2:       dec     dx                  ;Point at data register.
                mov     al, ah              ;Retrieve data to get H.O. nibble.
                shr     al, 4               ;Move H.O. nibble to L.O. nibble.
                out     dx, al              ;"Prime" data lines.
                or      al, 10h             ;Data + data available strobe.
                out     dx, al              ;Send data w/data available strobe.

; Wait for the acknowledge from the receiving site:

                inc     dx                  ;Point at status register.
W4A2Lp:         mov     cx, 10000
Wait4Ack2:      in      al, dx              ;Read status port.
                test    al, 80h             ;Ack = 1
                loope   Wait4Ack2           ;Loop while no acknowledge
                jne     GotAck2             ;H.O. bit = 1 (ack)?
                call    TestAbort           ;Check for ctrl-C
                jmp     W4A2Lp

; Send the data not available signal to the receiver:

GotAck2:        dec     dx                  ;Point at data register.
                mov     al, 0               ;Output a zero to bit #4 (that
                out     dx, al              ; becomes busy=1 at rcvr).

                mov     al, ah              ;Restore original data in AL.
                pop     dx
                pop     cx
                ret
SendByte        endp
; Synchronization routines:
;
; Send0s-      Transmits a zero to the receiver site and then waits to
;              see if it gets a set of ones back. Returns carry set if
;              this works, returns carry clear if we do not get a set of
;              ones back in a reasonable amount of time.

Send0s          proc    near
                push    cx
                push    dx

                mov     dx, MyPortAdrs

                mov     al, 0           ;Write the initial zero
                out     dx, al          ; value to our output port.

                xor     cx, cx          ;CX=0: check for ones up to 65,536 times.
Wait41s:        inc     dx              ;Point at status port.
                in      al, dx          ;Read status port.
                dec     dx              ;Point back at data port.
                and     al, 78h         ;Mask input bits.
                cmp     al, 78h         ;All ones yet?
                loopne  Wait41s
                je      Got1s           ;Branch if success.
                clc                     ;Return failure.
                pop     dx
                pop     cx
                ret

Got1s:          stc                     ;Return success.
                pop     dx
                pop     cx
                ret
Send0s          endp

; Send1s-      Transmits all ones to the receiver site and then waits to
;              see if it gets a set of zeros back. Returns carry set if
;              this works, returns carry clear if we do not get a set of
;              zeros back in a reasonable amount of time.

Send1s          proc    near
                push    cx
                push    dx

                mov     dx, MyPortAdrs  ;LPT1: base address.

                mov     al, 0Fh         ;Write the "all ones"
                out     dx, al          ; value to our output port.

                mov     cx, 0
Wait40s:        inc     dx              ;Point at input port.
                in      al, dx          ;Read the status port.
                dec     dx              ;Point back at data port.
                and     al, 78h         ;Mask input bits.
                loopne  Wait40s         ;Loop until we get zero back.
                je      Got0s           ;All zeros? If so, branch.
                clc                     ;Return failure.
                pop     dx
                pop     cx
                ret

Got0s:          stc                     ;Return success.
                pop     dx
                pop     cx
                ret
Send1s          endp
; Synchronize- This procedure slowly writes all zeros and all ones to its
;              output port and checks the input status port to see if the
;              receiver site has synchronized. When the receiver site
;              is synchronized, it will write the value 05h to its output
;              port. So when this site sees the value 05h on its input
;              port, both sites are synchronized. Returns with the
;              carry flag set if this operation is successful, clear if
;              unsuccessful.

Synchronize     proc    near
                print
                byte    "Synchronizing with receiver program"
                byte    cr,lf,0

                mov     dx, MyPortAdrs

                mov     cx, TimeOutConst ;Time out delay.
SyncLoop:       call    Send0s          ;Send zero bits, wait for
                jc      Got1s           ; ones (carry set=got ones).

; If we didn't get what we wanted, write some ones at this point and see
; if we're out of phase with the receiving site.

Retry0:         call    Send1s          ;Send ones, wait for zeros.
                jc      SyncLoop        ;Carry set = got zeros.

; Well, we didn't get any response yet, see if the user has pressed ctrl-C
; to abort this program.

DoRetry:        call    TestAbort

; Okay, the receiving site has yet to respond. Go back and try this again.

                loop    SyncLoop

; If we've timed out, print an error message and return with the carry
; flag clear (to denote a timeout error).

                print
                byte    "Transmit: Timeout error waiting for receiver"
                byte    cr,lf,0
                clc
                ret

; Okay, we wrote some zeros and we got some ones. Let's write some ones
; and see if we get some zeros. If not, retry the loop.

Got1s:          call    Send1s          ;Send one bits, wait for
                jnc     DoRetry         ; zeros (carry set=got zeros).

; Well, we seem to be synchronized. Just to be sure, let's play this out
; one more time.

                call    Send0s          ;Send zeros, wait for ones.
                jnc     Retry0
                call    Send1s          ;Send ones, wait for zeros.
                jnc     DoRetry

; We're synchronized. Let's send out the 05h value to the receiving
; site to let it know everything is cool:

                mov     al, 05h         ;Send signal to receiver to
                out     dx, al          ; tell it we're sync'd.
                xor     cx, cx          ;Long delay to give the rcvr
FinalDelay:     loop    FinalDelay      ; time to prepare.

                print
                byte    "Synchronized with receiving site"
                byte    cr,lf,0
                stc
                ret
Synchronize     endp
; File I/O routines:
;
; GetFileInfo- Opens the user specified file and passes along the file
;              name and file size to the receiving site. Returns the
;              carry flag set if this operation is successful, clear if
;              unsuccessful.

GetFileInfo     proc    near

; Get the filename from the DOS command line:

                mov     ax, 1
                argv
                mov     word ptr FileNamePtr, di
                mov     word ptr FileNamePtr+2, es
                printf
                byte    "Opening %^s\n",0
                dword   FileNamePtr

; Open the file:

                push    ds
                mov     ax, 3D00h       ;Open for reading.
                lds     dx, FileNamePtr
                int     21h
                pop     ds
                jc      BadFile
                mov     FileHandle, ax

; Compute the size of the file (do this by seeking to the last position
; in the file and using the return position as the file length):

                mov     bx, ax          ;Need handle in BX.
                mov     ax, 4202h       ;Seek to end of file.
                xor     cx, cx          ;Seek to position zero
                xor     dx, dx          ; from the end of file.
                int     21h
                jc      BadFile

; Save final position as file length:

                mov     word ptr FileSize, ax
                mov     word ptr FileSize+2, dx

; Need to rewind file back to the beginning (seek to position zero):

                mov     bx, FileHandle  ;Need handle in BX.
                mov     ax, 4200h       ;Seek to beginning of file.
                xor     cx, cx          ;Seek to position zero
                xor     dx, dx
                int     21h
                jc      BadFile

; Okay, transmit the good stuff over to the receiving site:

                mov     al, byte ptr FileSize   ;Send the file
                call    SendByte                ; size over.
                mov     al, byte ptr FileSize+1
                call    SendByte
                mov     al, byte ptr FileSize+2
                call    SendByte
                mov     al, byte ptr FileSize+3
                call    SendByte

                les     bx, FileNamePtr ;Send the characters
SendName:       mov     al, es:[bx]     ; in the filename to
                call    SendByte        ; the receiver until
                inc     bx              ; we hit a zero byte.
                cmp     al, 0
                jne     SendName
                stc                     ;Return success.
                ret

BadFile:        print
                byte    "Error transmitting file information:",0
                puti
                putcr
                clc
                ret
GetFileInfo     endp
; GetFileData- This procedure reads the data from the file and transmits
;              it to the receiver a byte at a time.

GetFileData     proc    near
                mov     ah, 3Fh         ;DOS read opcode.
                mov     cx, 512         ;Read 512 bytes at a time.
                mov     bx, FileHandle  ;File to read from.
                lea     dx, FileBuffer  ;Buffer to hold data.
                int     21h             ;Read the data
                jc      GFDError        ;Quit if error reading data.

                mov     cx, ax          ;Save # of bytes actually read.
                jcxz    GFDDone         ; quit if at EOF.
                lea     bx, FileBuffer  ;Send the bytes in the file
XmitLoop:       mov     al, [bx]        ; buffer over to the rcvr
                call    SendByte        ; one at a time.
                inc     bx
                loop    XmitLoop
                jmp     GetFileData     ;Read rest of file.

GFDError:       print
                byte    "DOS error #",0
                puti
                print
                byte    " while reading file",cr,lf,0
GFDDone:        ret
GetFileData     endp
; Okay, here's the main program that controls everything.

Main            proc
                mov     ax, dseg
                mov     ds, ax
                meminit

; First, get the address of LPT1: from the BIOS variables area.

                mov     ax, 40h
                mov     es, ax
                mov     ax, es:[PrtrBase]
                mov     MyPortAdrs, ax

; See if we have a filename parameter:

                argc
                cmp     cx, 1
                je      GotName
                print
                byte    "Usage: transmit <filename>",cr,lf,0
                jmp     Quit

GotName:        call    Synchronize     ;Wait for the receiver program.
                jnc     Quit

                call    GetFileInfo     ;Get file name and size.
                jnc     Quit

                call    GetFileData     ;Read and send the file's data.

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack   ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main
Here is the receiver program that accepts and stores away the data sent by the program above:

; RECEIVE.ASM
;
; This program is the receiver portion of the programs that transmit files
; across a Laplink compatible parallel cable.
;
; This program assumes that the user wants to use LPT1: for transmission.
; Adjust the equates, or read the port from the command line if this
; is inappropriate.

                .286
                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

dseg            segment para public 'data'

TimeOutConst    equ     100             ;About 1 min on 66Mhz 486.
PrtrBase        equ     8               ;Offset to LPT1: adrs.

MyPortAdrs      word    ?               ;Holds printer port address.
FileHandle      word    ?               ;Handle for output file.
FileBuffer      byte    512 dup (?)     ;Buffer for incoming data.

FileSize        dword   ?               ;Size of incoming file.
FileName        byte    128 dup (0)     ;Holds filename

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; TestAbort-   Reads the keyboard and gives the user the opportunity to
;              hit the ctrl-C key.

TestAbort       proc    near
                push    ax
                mov     ah, 1
                int     16h             ;See if keypress.
                je      NoKeyPress
                mov     ah, 8           ;Read char, chk for ctrl-C
                int     21h
NoKeyPress:     pop     ax
                ret
TestAbort       endp

; GetByte-     Reads a single byte from the parallel port (four bits at
;              a time). Returns the byte in AL.

GetByte         proc    near
                push    cx
                push    dx

; Receive the L.O. Nibble.

                mov     dx, MyPortAdrs
                mov     al, 10h         ;Signal not busy.
                out     dx, al

                inc     dx              ;Point at status port

W4DLp:          mov     cx, 10000
Wait4Data:      in      al, dx          ;See if data available.
                test    al, 80h         ; (bit 7=0 if data available).
                loopne  Wait4Data
                je      DataIsAvail     ;Is data available?
                call    TestAbort       ;If not, check for ctrl-C.
                jmp     W4DLp

DataIsAvail:    shr     al, 3           ;Save this four bit package
                and     al, 0Fh         ; (This is the L.O. nibble
                mov     ah, al          ; for our byte).

                dec     dx              ;Point at data register.
                mov     al, 0           ;Signal data taken.
                out     dx, al

                inc     dx              ;Point at status register.
W4ALp:          mov     cx, 10000
Wait4Ack:       in      al, dx          ;Wait for transmitter to
                test    al, 80h         ; retract data available.
                loope   Wait4Ack        ;Loop until data not avail.
                jne     NextNibble      ;Branch if data not avail.
                call    TestAbort       ;Let user hit ctrl-C.
                jmp     W4ALp

; Receive the H.O. nibble:

NextNibble:     dec     dx              ;Point at data register.
                mov     al, 10h         ;Signal not busy
                out     dx, al
                inc     dx              ;Point at status port
W4D2Lp:         mov     cx, 10000
Wait4Data2:     in      al, dx          ;See if data available.
                test    al, 80h         ; (bit 7=0 if data available).
                loopne  Wait4Data2      ;Loop until data available.
                je      DataAvail2      ;Branch if data available.
                call    TestAbort       ;Check for ctrl-C.
                jmp     W4D2Lp

DataAvail2:     shl     al, 1           ;Merge this H.O. nibble
                and     al, 0F0h        ; with the existing L.O.
                or      ah, al          ; nibble.
                dec     dx              ;Point at data register.
                mov     al, 0           ;Signal data taken.
                out     dx, al

                inc     dx              ;Point at status register.
W4A2Lp:         mov     cx, 10000
Wait4Ack2:      in      al, dx          ;Wait for transmitter to
                test    al, 80h         ; retract data available.
                loope   Wait4Ack2       ;Wait for data not available.
                jne     ReturnData      ;Branch if ack.
                call    TestAbort       ;Check for ctrl-C
                jmp     W4A2Lp

ReturnData:     mov     al, ah          ;Put data in al.
                pop     dx
                pop     cx
                ret
GetByte         endp
; Synchronize- This procedure waits until it sees all zeros on the input
;              bits we receive from the transmitting site. Once it receives
;              all zeros, it writes all ones to the output port. When
;              all ones come back, it writes all zeros. It repeats this
;              process until the transmitting site writes the value 05h.

Synchronize     proc    near

                print
                byte    "Synchronizing with transmitter program"
                byte    cr,lf,0

                mov     dx, MyPortAdrs
                mov     al, 0           ;Initialize our output port
                out     dx, al          ; to prevent confusion.
                mov     bx, TimeOutConst ;Time out condition.
SyncLoop:       mov     cx, 0           ;For time out purposes.
SyncLoop0:      inc     dx              ;Point at input port.
                in      al, dx          ;Read our input bits.
                dec     dx
                and     al, 78h         ;Keep only the data bits.
                cmp     al, 78h         ;Check for all ones.
                je      Got1s           ;Branch if all ones.
                cmp     al, 0           ;See if all zeros.
                loopne  SyncLoop0

; Since we just saw a zero, write all ones to the output port.

                mov     al, 0FFh        ;Write all ones
                out     dx, al

; Now wait for all ones to arrive from the transmitting site.

SyncLoop1:      inc     dx              ;Point at status register.
                in      al, dx          ;Read status port.
                dec     dx              ;Point back at data register.
                and     al, 78h         ;Keep only the data bits.
                cmp     al, 78h         ;Are they all ones?
                loopne  SyncLoop1       ;Repeat while not ones.
                je      Got1s           ;Branch if got ones.

; If we've timed out, check to see if the user has pressed ctrl-C to
; abort.

                call    TestAbort       ;Check for ctrl-C.
                dec     bx              ;See if we've timed out.
                jne     SyncLoop        ;Repeat if not yet timed out.

                print
                byte    "Receive: connection timed out during synchronization"
                byte    cr,lf,0
                clc                     ;Signal time-out.
                ret

; Jump down here once we've seen both a zero and a one. Send the two
; in combinations until we get a 05h from the transmitting site or the
; user presses Ctrl-C.

Got1s:          inc     dx              ;Point at status register.
                in      al, dx          ;Just copy whatever appears
                dec     dx              ; in our input port to the
                shr     al, 3           ; output port until the
                and     al, 0Fh         ; transmitting site sends
                cmp     al, 05h         ; us the value 05h
                je      Synchronized
                not     al              ;Keep inverting what we get
                out     dx, al          ; and send it to xmitter.
                call    TestAbort       ;Check for CTRL-C here.
                jmp     Got1s

; Okay, we're synchronized. Return to the caller.

Synchronized:   and     al, 0Fh         ;Make sure busy bit is one
                out     dx, al          ; (bit 4=0 for busy=1).
                print
                byte    "Synchronized with transmitting site"
                byte    cr,lf,0
                stc
                ret
Synchronize     endp
; GetFileInfo- The transmitting program sends us the file length and a
;              zero terminated filename. Get that data here.

GetFileInfo     proc    near
                mov     dx, MyPortAdrs
                mov     al, 10h         ;Set busy bit to zero.
                out     dx, al          ;Tell xmit pgm, we're ready.

; First four bytes contain the filesize:

                call    GetByte
                mov     byte ptr FileSize, al
                call    GetByte
                mov     byte ptr FileSize+1, al
                call    GetByte
                mov     byte ptr FileSize+2, al
                call    GetByte
                mov     byte ptr FileSize+3, al

; The next n bytes (up to a zero terminating byte) contain the filename:

                mov     bx, 0
GetFileName:    call    GetByte
                mov     FileName[bx], al
                call    TestAbort
                inc     bx
                cmp     al, 0
                jne     GetFileName

                ret
GetFileInfo     endp
; GetFileData- Receives the file data from the transmitting site
;              and writes it to the output file.

GetFileData     proc    near

; First, see if we have more than 512 bytes left to go

                cmp     word ptr FileSize+2, 0  ;If H.O. word is not
                jne     MoreThan512             ; zero, more than 512.
                cmp     word ptr FileSize, 512  ;If H.O. is zero, just
                jbe     LastBlock               ; check L.O. word.

; We've got more than 512 bytes left to go in this file, read 512 bytes
; at this point.

MoreThan512:    mov     cx, 512         ;Receive 512 bytes
                lea     bx, FileBuffer  ; from the xmitter.
ReadLoop:       call    GetByte         ;Read a byte.
                mov     [bx], al        ;Save the byte away.
                inc     bx              ;Move on to next
                loop    ReadLoop        ; buffer element.

; Okay, write the data to the file:

                mov     ah, 40h         ;DOS write opcode.
                mov     bx, FileHandle  ;Write to this file.
                mov     cx, 512         ;Write 512 bytes.
                lea     dx, FileBuffer  ;From this address.
                int     21h
                jc      BadWrite        ;Quit if error.

; Decrement the file size by 512 bytes. The borrow must propagate into
; the H.O. word of FileSize:

                sub     word ptr FileSize, 512  ;32-bit subtraction
                sbb     word ptr FileSize+2, 0  ; of 512.
                jmp     GetFileData

; Process the last block, which contains 1..512 bytes, here.

LastBlock:      mov     cx, word ptr FileSize   ;Receive the last
                lea     bx, FileBuffer          ; 1..512 bytes from
ReadLB:         call    GetByte                 ; the transmitter.
                mov     [bx], al
                inc     bx
                loop    ReadLB

                mov     ah, 40h                 ;Write the last block
                mov     bx, FileHandle          ; of bytes to the
                mov     cx, word ptr FileSize   ; file.
                lea     dx, FileBuffer
                int     21h
                jnc     CloseFile

BadWrite:       print
                byte    "DOS error #",0
                puti
                print
                byte    " while writing data.",cr,lf,0

; Close the file here.

CloseFile:      mov     bx, FileHandle  ;Close this file.
                mov     ah, 3Eh         ;DOS close opcode.
                int     21h
                ret
GetFileData     endp
; Here's the main program that gets the whole ball rolling.

Main            proc
                mov     ax, dseg
                mov     ds, ax
                meminit

; First, get the address of LPT1: from the BIOS variables area.

                mov     ax, 40h         ;Point at BIOS variable segment.
                mov     es, ax
                mov     ax, es:[PrtrBase]
                mov     MyPortAdrs, ax

                call    Synchronize     ;Wait for the transmitter program.
                jnc     Quit

                call    GetFileInfo     ;Get file name and size.

                printf
                byte    "Filename: %s\nFile size: %ld\n",0
                dword   Filename, FileSize

                mov     ah, 3Ch         ;Create file.
                mov     cx, 0           ;Standard attributes
                lea     dx, Filename
                int     21h
                jnc     GoodOpen
                print
                byte    "Error opening file",cr,lf,0
                jmp     Quit

GoodOpen:       mov     FileHandle, ax
                call    GetFileData     ;Get the file's data.

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack   ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main
21.5 Summary

The PC’s parallel port, though originally designed for controlling parallel printers, is a general purpose eight bit output port with several handshaking lines you can use to control many other devices in addition to printers. In theory, parallel communications should be many times faster than serial communications. In practice, however, real world constraints and economics prevent this from being the case. Nevertheless, you can still connect high performance devices to the PC’s parallel port.

The PC’s parallel ports come in two varieties: unidirectional and bidirectional. The bidirectional versions are available only on PS/2s, certain laptops, and a few other machines. Whereas the eight data lines are output only on the unidirectional ports, you can program them as inputs or outputs on the bidirectional port. While this bidirectional operation is of little value to a printer, it can improve the performance of other devices that connect to the parallel port, such as disk and tape drives, network adapters, SCSI adapters, and so on.

When the system communicates with some other device over the parallel port, it needs some way to tell that device that data is available on the data lines. Likewise, the device needs some way to tell the system that it is not busy and that it has accepted the data. This requires some additional signals on the parallel port known as handshaking lines. A typical PC parallel port provides three handshaking signals: the data available strobe, the data taken acknowledge signal, and the device busy line. These lines easily control the flow of data between the PC and some external device. In addition to the handshaking lines, the PC’s parallel port provides several other auxiliary I/O lines as well. In total, there are 12 output lines and five input lines on the PC’s parallel port.

There are three I/O ports in the PC’s address space associated with each parallel port. The first of these (at the port’s base address) is the data register. This is an eight bit output register on unidirectional ports; it is an input/output register on bidirectional ports. The second register, at the base address plus one, is the status register. The status register is an input port. Five of its bits correspond to the five input lines on the PC’s parallel port. The third register (at base address plus two) is the control register. Four of its bits correspond to the additional four output bits on the PC, one of the bits controls the IRQ line on the parallel port, and a sixth bit controls the data direction on the bidirectional ports. For more information on the parallel port’s hardware configuration, see:

• “Basic Parallel Port Information” on page 1199
• “The Parallel Port Hardware” on page 1201

Although many vendors use the parallel port to control lots of different devices, a parallel printer is still the device most often connected to the parallel port. There are three ways application programs commonly send data to the printer: by calling DOS to print a character, by calling BIOS’ int 17h ISR to print a character, or by talking directly to the parallel port. You should avoid this last technique because of possible software incompatibilities with other devices that connect to the parallel port. For more information on printing data, including how to write your own int 17h ISR/printer driver, see:

• “Controlling a Printer Through the Parallel Port” on page 1202
• “Printing via DOS” on page 1203
• “Printing via BIOS” on page 1203
• “An INT 17h Interrupt Service Routine” on page 1203

One popular use of the parallel port is to transfer data between two computers; for example, transferring data between a desktop and a laptop machine. To demonstrate how to use the parallel port to control other devices besides printers, this chapter presents a program to transfer data between computers on the unidirectional parallel ports (it also works on bidirectional ports). For all the details, see:

• “Inter-Computer Communications on the Parallel Port” on page 1209
The PC Serial Ports
Chapter 22
The RS-232 serial communication standard is probably the most popular serial communication scheme in the world. Although it suffers from many drawbacks, speed being the primary one, its use is widespread and there are literally thousands of devices you can connect to a PC using an RS-232 interface. The PC supports up to four RS-232 compatible devices using the COM1:, COM2:, COM3:, and COM4: devices[1]. For those who need even more serial devices (e.g., to control an electronic bulletin board system [BBS]), you can even buy devices that let you add 16, or more, serial ports to the PC. Since most PCs only have one or two serial ports, we will concentrate on how to use COM1: and COM2: in this chapter.

Although, in theory, the PC’s original design allows system designers to implement the serial communication ports using any hardware they desire, much of today’s software that does serial communication talks directly to the 8250 Serial Communications Chip (SCC). This introduces the same compatibility problems you get when you talk directly to the parallel port hardware. However, whereas the BIOS provides an excellent interface to the parallel port, supporting anything you would wish to do by going directly to the hardware, the serial support is not so good. Therefore, it is common practice to bypass the BIOS int 14h functions and control the 8250 SCC chip directly so software can access every bit of every register on the 8250.

Perhaps an even greater problem with the BIOS code is that it does not support interrupts. Although software controlling parallel ports rarely uses interrupt driven I/O[2], it is very common to find software that provides interrupt service routines for the serial ports. Since the BIOS does not provide such routines, any software that wants to use interrupt driven serial I/O will need to talk directly to the 8250 and bypass BIOS anyway. Therefore, the first part of this chapter will discuss the 8250 chip.

Manipulating the serial port is not difficult. However, the 8250 SCC contains lots of registers and provides many features, so it takes a lot of code to control every feature of the chip. Fortunately, you do not have to write that code yourself. The UCR Standard Library provides an excellent set of routines that let you control the 8250. They even include an interrupt service routine allowing interrupt driven I/O. The second part of this chapter will present the code from the Standard Library as an example of how to program each of the registers on the 8250 SCC.
22.1 The 8250 Serial Communications Chip

The 8250 and compatible chips (like the 16450 and 16550 devices) provide nine I/O registers. Certain upwards compatible devices (e.g., 16450 and 16550) provide a tenth register as well. These registers consume eight I/O port addresses in the PC’s address space. The ports and the locations of their addresses are the following:

Table 81: COM Port Addresses

Port     Physical Base Address (in hex)    BIOS Variable Containing Physical Address [a]
COM1:    3F8                               40:0
COM2:    2F8                               40:2

a. Locations 40:4 and 40:6 contain the logical addresses for COM3: and COM4:, but we will not consider those ports here.
1. Most programs support only COM1: and COM2:. Support for additional serial devices is somewhat limited among various applications.
2. Because many parallel port adapters do not provide hardware support for interrupts.
Like the PC’s parallel ports, we can swap COM1: and COM2: at the software level by swapping their base addresses in BIOS variable 40:0 and 40:2. However, software that goes directly to the hardware, especially interrupt service routines for the serial ports, needs to deal with hardware addresses, not logical addresses. Therefore, we will always mean I/O base address 3F8h when we discuss COM1: in this chapter. Likewise, we will always mean I/O base address 2F8h when we discuss COM2: in this chapter. The base address is the first of eight I/O locations consumed by the 8250 SCC. The exact purpose of these eight I/O locations appears in the following table:
Table 82: 8250 SCC Registers

I/O Address (hex)   Description
3F8/2F8             Receive/Transmit data register. Also the L.O. byte of the Baud Rate Divisor Latch register.
3F9/2F9             Interrupt Enable Register. Also the H.O. byte of the Baud Rate Divisor Register.
3FA/2FA             Interrupt Identification Register (read only).
3FB/2FB             Line Control Register.
3FC/2FC             Modem Control Register.
3FD/2FD             Line Status Register (read only).
3FE/2FE             Modem Status Register (read only).
3FF/2FF             Shadow Receive Register (read only, not available on original PCs).
The following sections describe the purpose of each of these registers.
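Returning briefly to the base-address swap mentioned before the table: a minimal sketch of exchanging the COM1: and COM2: entries in the BIOS variables at 40:0 and 40:2 might look like the following. It assumes nothing else is using the serial ports at the time, and it only changes the logical assignment that BIOS-level software sees; code that addresses 3F8h or 2F8h directly is unaffected.

```asm
; Sketch only: swap the logical COM1: and COM2: base addresses.
                push    ax
                push    es
                mov     ax, 40h         ;BIOS variable segment.
                mov     es, ax
                mov     ax, es:[0]      ;Base address stored for COM1:.
                xchg    ax, es:[2]      ;Exchange it with COM2:'s entry.
                mov     es:[0], ax      ;COM2:'s old address becomes COM1:.
                pop     es
                pop     ax
```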
22.1.1 The Data Register (Transmit/Receive Register)

The data register is actually two separate registers: the transmit register and the receive register. You select the transmit register by writing to I/O address 3F8h or 2F8h; you select the receive register by reading from these addresses. Assuming the transmit register is empty, writing to the transmit register begins a data transmission across the serial line. Assuming the receive register is full, reading the receive register returns the data. To determine if the transmitter is empty or the receiver is full, see the Line Status Register.

Note that the Baud Rate Divisor register shares this I/O address with the receive and transmit registers. Please see “The Baud Rate Divisor” on page 1225 and “The Line Control Register” on page 1227 for more information on the dual use of this I/O location.
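As an illustration of the read/write selection described here, the following is a minimal polled I/O sketch for COM1:. The procedure names ComPutc and ComGetc are hypothetical (they are not Standard Library routines). The code assumes the standard 8250 Line Status Register layout, in which bit 0 is set when the receive register is full and bit 5 is set when the transmit register is empty, and it assumes bit seven of the Line Control Register is zero so that 3F8h addresses the data register rather than the divisor latch.

```asm
; Sketch only: polled character output and input on COM1:.

ComPutc         proc    near            ;Hypothetical name; char in AL.
                push    dx
                push    ax              ;Save the character.
                mov     dx, 3FDh        ;COM1: Line Status Register.
WaitTHRE:       in      al, dx
                test    al, 20h         ;Bit 5 set = transmit reg empty.
                jz      WaitTHRE
                pop     ax              ;Recover the character.
                mov     dx, 3F8h        ;COM1: data register.
                out     dx, al          ;A write selects the transmit reg.
                pop     dx
                ret
ComPutc         endp

ComGetc         proc    near            ;Hypothetical name; returns AL.
                push    dx
                mov     dx, 3FDh        ;COM1: Line Status Register.
WaitData:       in      al, dx
                test    al, 1           ;Bit 0 set = receive reg full.
                jz      WaitData
                mov     dx, 3F8h        ;COM1: data register.
                in      al, dx          ;A read selects the receive reg.
                pop     dx
                ret
ComGetc         endp
```

Both routines spin forever if the chip never becomes ready; real code would normally add a time-out or a ctrl-C check in the style of the TestAbort calls in the previous chapter.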
22.1.2 The Interrupt Enable Register (IER)

When operating in interrupt mode, the 8250 SCC provides four sources of interrupt: the character received interrupt, the transmitter empty interrupt, the communication error interrupt, and the status change interrupt. You can individually enable or disable these interrupt sources by writing ones or zeros to the 8250 IER (Interrupt Enable Register). Writing a zero to a corresponding bit disables that particular interrupt. Writing a one enables that interrupt. This register is read/write, so you can interrogate the current settings at any time (for example, if you want to mask in a particular interrupt without affecting the others). The layout of this register is

Bit(s)   Function
0        Data Available Interrupt
1        Transmitter Empty Interrupt
2        Error or Break Interrupt
3        Status Change Interrupt
4-7      Unused (should be zero)

Serial Port Interrupt Enable Register (IER)

The interrupt enable register I/O location is also common with the Baud Rate Divisor Register. Please see the next section and “The Line Control Register” on page 1227 for more information on the dual use of this I/O location.
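Because the IER is read/write, enabling or disabling one source without disturbing the others is a read-modify-write sequence. The fragment below is a sketch for COM1: that assumes bit seven of the Line Control Register is zero, so that 3F9h addresses the IER rather than the H.O. byte of the divisor.

```asm
; Sketch only: enable the data available interrupt and disable the
; transmitter empty interrupt, leaving the other IER bits unchanged.
                push    ax
                push    dx
                mov     dx, 3F9h        ;COM1: Interrupt Enable Register.
                in      al, dx          ;Get the current settings.
                or      al, 01h         ;Bit 0 = 1: data available on.
                and     al, 0FDh        ;Bit 1 = 0: transmitter empty off.
                out     dx, al
                pop     dx
                pop     ax
```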
22.1.3 The Baud Rate Divisor

The Baud Rate Divisor Register is a 16 bit register that shares I/O locations 3F8h/2F8h and 3F9h/2F9h with the data and interrupt enable registers. Bit seven of the Line Control Register (see “The Line Control Register” on page 1227) selects the divisor register or the data/interrupt enable registers. The Baud Rate Divisor register lets you select the data transmission rate (properly called bits per second, or bps, not baud[3]). The following table lists the values you should write to these registers to control the transmission/reception rate:

Table 83: Baud Rate Divisor Register Values

Bits Per Second   3F9/2F9 Value   3F8/2F8 Value
110               4               17h
300               1               80h
600               0               C0h
1200              0               60h
1800              0               40h
2400              0               30h
3600              0               20h
4800              0               18h
9600              0               0Ch
19.2K             0               6
38.4K             0               3
56K               0               1
3. The term “baud” describes the rate at which tones can change on a modem/telephone line. It turns out that, with normal telephone lines, the maximum baud rate is 600 baud. Modems that operate at 1200 bps use a different technique (beyond switching tones) to increase the data transfer rate. In general, there is no such thing as a “1200 baud,” “9600 baud,” or “14.4 kbaud” modem. Properly, these are 1200 bps, 9600bps, and 14.4K bps modems.
You should only operate at speeds greater than 19.2K on fast PCs with high performance SCCs (e.g., 16450 or 16550). Furthermore, you should use high quality cables and keep your cables very short when running at high speeds.
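To make the register sharing concrete, here is a sketch that programs COM1: for 9600 bps using the table value 0Ch. The procedure name SetBaud9600 is hypothetical. The table entries through 38.4K are consistent with a divisor of 115,200 divided by the bit rate (for example, 115,200/9600 = 12 = 0Ch), which is what the standard PC serial clock would give; treat that formula as an assumption when computing rates the table does not list.

```asm
; Sketch only: select the divisor latch, write the 16-bit divisor,
; then switch 3F8h/3F9h back to the data and interrupt enable registers.

SetBaud9600     proc    near            ;Hypothetical helper name.
                push    ax
                push    dx
                mov     dx, 3FBh        ;COM1: Line Control Register.
                in      al, dx
                mov     ah, al          ;Remember the current LCR value.
                or      al, 80h         ;Bit 7 = 1 selects the divisor.
                out     dx, al
                mov     dx, 3F8h        ;L.O. byte of the divisor.
                mov     al, 0Ch         ;9600 bps entry from the table.
                out     dx, al
                inc     dx              ;3F9h: H.O. byte of the divisor.
                mov     al, 0
                out     dx, al
                mov     dx, 3FBh
                mov     al, ah          ;Original LCR settings,
                and     al, 7Fh         ; with bit 7 forced to zero.
                out     dx, al
                pop     dx
                pop     ax
                ret
SetBaud9600     endp
```

If serial interrupts are already enabled, it is prudent to disable them around this sequence, since an interrupt service routine that touches 3F8h or 3F9h while bit seven is set would hit the divisor instead of the data or interrupt enable register.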
22.1.4 The Interrupt Identification Register (IIR)

The Interrupt Identification Register is a read-only register that specifies whether an interrupt is pending and which of the four interrupt sources requires attention. This register has the following layout:

Bit(s)   Function
0        Interrupt pending if zero (no interrupt if one)
1-2      Interrupt source:
           00: Status change interrupt
           01: Transmitter empty interrupt
           10: Data available interrupt
           11: Error or break interrupt
3-7      Always zero.
Interrupt Identification Register (IIR)

Since the IIR can only report one interrupt at a time, and it is certainly possible to have two or more pending interrupts, the 8250 SCC prioritizes the interrupts. Interrupt source 00 (status change) has the lowest priority and interrupt source 11 (error or break) has the highest priority; i.e., the interrupt source number provides the priority (with three being the highest priority). The following table describes the interrupt sources and how you “clear” the interrupt value in the IIR. If two interrupts are pending and you service the higher priority request, the 8250 SCC replaces the value in the IIR with the identification of the next highest priority interrupt source.

Table 84: Interrupt Cause and Release Functions

Priority         ID Value  Interrupt          Caused By                          Reset By
Highest          11b       Error or Break     Overrun error, parity error,       Reading the Line Status
                                              framing error, or break            Register.
                                              interrupt.
Next to highest  10b       Data available     Data arriving from an external     Reading the Receive
                                              source in the Receive Register.    Register.
Next to lowest   01b       Transmitter empty  The transmitter finishes sending   Reading the IIR (with an
                                              data and is ready to accept        interrupt ID of 01b) or
                                              additional data.                   writing to the Data
                                                                                 Register.
Lowest           00b       Modem Status       Change in clear to send, data      Reading the modem status
                                              set ready, ring indicator, or      register.
                                              received line signal detect
                                              signals.
One interesting point to note about the organization of the IIR: the bit layout provides a convenient way to transfer control to the appropriate section of the SCC interrupt service routine. Consider the following code:

                 .
                 .
                 .
                in      al, dx          ;Read IIR.
                mov     bl, al
                mov     bh, 0
                jmp     HandlerTbl[bx]

; The IIR value (0, 2, 4, or 6) is the byte offset into this table, so
; the entries must appear in interrupt source order: status change (00),
; transmitter empty (01), data available (10), error or break (11).

HandlerTbl      word    MSHandler, TEHandler, RDHandler, RLSHandler
When an interrupt occurs, bit zero of the IIR will be zero. The next two bits contain the interrupt source number and the H.O. five bits are all zero. This lets us use the IIR value as the index into a table of pointers to the appropriate handler routines, as the above code demonstrates.
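Since servicing one request can expose another pending source, an interrupt service routine typically keeps polling the IIR until bit zero reads as one. The sketch below shows one way to separate the pending bit from the source number; the procedure name GetIntSource and the loop labels are hypothetical, and the port address assumes COM1:.

```asm
; Sketch only: returns carry set if no interrupt is pending; otherwise
; returns carry clear and AL = interrupt source number (0..3).

GetIntSource    proc    near            ;Hypothetical helper name.
                push    dx
                mov     dx, 3FAh        ;COM1: IIR.
                in      al, dx
                and     al, 7           ;Keep pending bit and source bits.
                shr     al, 1           ;Carry = bit 0, AL = source number.
                pop     dx              ;POP leaves the flags alone.
                ret
GetIntSource    endp

; Typical use inside an interrupt service routine:

ChkInts:        call    GetIntSource
                jc      IntsDone        ;Bit 0 was one: nothing pending.
                                        ;Service source AL here, then
                jmp     ChkInts         ; look for another request.
IntsDone:
```

The AND comes before the SHR on purpose: AND clears the carry flag, so masking after the shift would destroy the pending indication.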
22.1.5 The Line Control Register

The Line Control Register lets you specify the transmission parameters for the SCC. This includes setting the data size, number of stop bits, parity, forcing a break, and selecting the Baud Rate Divisor Register (see “The Baud Rate Divisor” on page 1225). The Line Control Register is laid out as follows:

Bit(s)   Function
0-1      Word length: 00 = 5 bits, 01 = 6 bits, 10 = 7 bits, 11 = 8 bits
2        Stop bits (0 = 1, 1 = 2)
3        Parity enable (0 = disabled, 1 = enabled)
4-5      Parity control: 00 = odd parity, 01 = even parity,
         10 = parity is always 1, 11 = parity is always 0
6        Transmit break while 1
7        Baud Rate Divisor Latch
Line Control Register (LCR)

The 8250 SCC can transmit serial data as groups of five, six, seven, or eight bits. Most modern serial communication systems use seven or eight bits for transmission (you only need seven bits to transmit ASCII, eight bits to transmit binary data). By default, most applications transmit data using eight data bits. Of course, you always read eight bits from the receive register; the 8250 SCC pads all H.O. bits with zero if you are receiving fewer than eight bits. Note that if you are only transmitting ASCII characters, the serial communications will run about 10% faster with seven bit transmission rather than with eight bit transmission, since each character then takes nine bit times (start bit, seven data bits, one stop bit) rather than ten. This is an important thing to keep in mind if you control both ends of the serial cable. On the other hand, you will usually be connecting to some device that has a fixed word length, so you will have to program the SCC specifically to match that device.

A serial data transmission consists of a start bit, five to eight data bits, and one or two stop bits. The start bit is a special signal that informs the SCC (or other device) that data is arriving on the serial line. The stop bits are, essentially, the absence of a start bit; they provide a small amount of time between the arrival of consecutive characters on the serial line. By selecting two stop bits, you insert some additional time between the transmission of each character. Some older devices may require this additional time or they will get confused. However, almost all modern serial devices are perfectly happy with a single stop bit. Therefore, you should usually program the chip with only one stop bit. Adding a second stop bit increases transmission time by about 10%.

The parity bits let you enable or disable parity and choose the type of parity. Parity is an error detection scheme. When you enable parity, the SCC adds an extra bit (the parity bit) to the transmission. If you select odd parity, the parity bit contains a zero or one so that the L.O. bit of the sum of the data and parity bits is one. If you select even parity, the SCC produces a parity bit such that the L.O. bit of the sum of the parity and data bits is zero. The “stuck parity” values (10b and 11b) always produce a parity bit of one or zero, respectively.

The main purpose of the parity bit is to detect a possible transmission error. If you have a long, noisy, or otherwise bad serial communications channel, it is possible to lose information during transmission. When this happens, it is unlikely that the sum of the bits will match the parity value. The receiving site can detect this “parity error” and report the error in transmission.

You can also use the stuck parity values (10b and 11b) to strip the eighth bit and always replace it with a one or zero during transmission. For example, when transmitting eight bit PC/ASCII characters to a different computer system, it is possible that the PC’s extended character set (those characters whose code is 128 or greater) does not map to the same characters on the destination machine. Indeed, sending such characters may create problems on that machine. By setting the word size to seven bits and the parity to enabled and stuck at zero, you can automatically strip out all H.O. bits during transmission, replacing them with zero. Of course, if any extended characters come along, the SCC will map them to possibly unrelated ASCII characters, but this is a useful trick on occasion.

The break bit transmits a break signal to the remote system as long as there is a one programmed in this bit position. You should not leave break enabled while trying to transmit data. The break signal comes from the teletype days. A break is similar to ctrl-C or ctrl-break on the PC’s keyboard. It is supposed to interrupt a program running on a remote system. Note that the SCC can detect an incoming break signal and generate an appropriate interrupt, but that break signal comes from the remote system; it is not (directly) connected to the outgoing break signal the LCR controls.
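The parity rules described above can be checked with a short Python model. This is only an illustrative sketch, not code from the Standard Library; the function name parity_bit and the mode strings are hypothetical names chosen for this example.

```python
def parity_bit(data, data_bits, mode):
    """Return the parity bit the SCC would append (illustrative model only).

    data      -- character code; only the low data_bits bits are transmitted
    data_bits -- word length (5..8)
    mode      -- "odd", "even", "stuck1", or "stuck0" (names are assumptions)
    """
    ones = bin(data & ((1 << data_bits) - 1)).count("1")
    if mode == "odd":
        # L.O. bit of (data bits + parity bit) must be one.
        return (ones + 1) & 1
    if mode == "even":
        # L.O. bit of (data bits + parity bit) must be zero.
        return ones & 1
    if mode == "stuck1":
        return 1
    if mode == "stuck0":
        return 0
    raise ValueError("unknown parity mode: " + mode)


# 'A' (41h) has two one bits; 'C' (43h) has three.
print(parity_bit(0x41, 7, "odd"), parity_bit(0x43, 7, "odd"))
```

With seven data bits and "stuck0", the bit that follows the data is always zero, which is the H.O.-bit stripping trick the text mentions.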
Bit seven of the LCR is the Baud Rate Divisor Register latch bit. When this bit contains a one, locations 3F8h/2F8h and 3F9h/2F9h become the Baud Rate Divisor Register. When this bit contains a zero, those I/O locations correspond to the Data Register and the Interrupt Enable Register. You should always program this bit with a zero except while initializing the speed of the SCC.

The LCR is a read/write register. Reading the LCR returns the last value written to it.
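To make the LCR field layout concrete, here is a minimal Python sketch that packs the fields described in this section into a single byte. The helper name make_lcr and the parity keywords are hypothetical; only the bit positions come from the register description.

```python
# Field encodings taken from the LCR description; names are illustrative.
WORD_LENGTH_CODES = {5: 0b00, 6: 0b01, 7: 0b10, 8: 0b11}
PARITY_CODES = {          # bits 3..5: enable bit plus two control bits
    "none":   0b000,
    "odd":    0b001,
    "even":   0b011,
    "stuck1": 0b101,
    "stuck0": 0b111,
}


def make_lcr(data_bits=8, stop_bits=1, parity="none",
             send_break=False, divisor_latch=False):
    """Pack the LCR fields into one byte (a model, not a port write)."""
    if stop_bits not in (1, 2):
        raise ValueError("stop_bits must be 1 or 2")
    value = WORD_LENGTH_CODES[data_bits]      # bits 0-1
    value |= (stop_bits - 1) << 2             # bit 2
    value |= PARITY_CODES[parity] << 3        # bits 3-5
    value |= int(send_break) << 6             # bit 6
    value |= int(divisor_latch) << 7          # bit 7
    return value


print(hex(make_lcr()))                   # eight data bits, no parity, one stop bit
print(hex(make_lcr(7, 1, "even")))       # seven data bits, even parity
```

Setting divisor_latch=True only models bit seven; on real hardware you would clear it again after loading the divisor, as the text advises.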
22.1.6 The Modem Control Register

The 8250’s Modem Control Register contains five bits that let you directly control various output pins on the 8250 as well as enable the 8250’s loopback mode. The following layout shows the contents of this register:

Bit 0: Data Terminal Ready (DTR)
Bit 1: Request To Send (RTS)
Bit 2: OUT 1
Bit 3: Interrupt Enable (OUT 2)
Bit 4: Loopback mode (enabled if 1)
Bits 5-7: Always zero

Modem Control Register (MCR)

The 8250 routes the DTR and RTS bits directly to the DTR and RTS lines on the 8250 chip. When these bits are one, the corresponding outputs are active (see footnote 4). These lines are two separate handshake lines for RS-232 communications.

4. It turns out that the DTR and RTS lines are active low, so the 8250 actually inverts these lines on their way out. However, the receiving site reinverts these lines, so the receiving site (if it is an 8250 SCC) will read these bits as one when they are active. See the description of the modem status register for details.
The DTR signal is comparable to a busy signal. When a site’s DTR line is inactive, the other site is not supposed to transmit data to it. The DTR line is a manual handshake line. It appears as the Data Set Ready (DSR) line on the other side of the serial cable. The other device must explicitly check its DSR line to see if it can transmit data. The DTR/DSR scheme is mainly intended for handshaking between computers and modems.

The RTS line provides a second form of handshake. Its corresponding input signal is CTS (Clear To Send). The RTS/CTS handshake protocol is mainly intended for directly connected devices like computers and printers. You may ask “why are there two separate, but orthogonal, handshake protocols?” The reason is that RS-232C has developed over the last 100 years (from the days of the first telegraphs) and is the result of combining several different schemes over the years.

Out1 is a general purpose output on the SCC that has very little use on the IBM PC. Some adapter boards connect this signal; others leave it disconnected. In general, this bit has no function on PCs.

The Interrupt Enable bit is a PC-specific item. This is normally a general purpose output (OUT 2) on the 8250 SCC. However, IBM’s designers connected this output to an external gate to enable or disable all interrupts from the SCC. This bit must be programmed with a one to enable interrupts. Likewise, you must ensure that this bit contains a zero if you are not using interrupts.

The loopback bit connects the transmitter register to the receive register. All data sent out the transmitter immediately comes back in the receive register. This is useful for diagnostics, testing software, and detecting the serial chip. Note, unfortunately, that the loopback circuit will not generate any interrupts. You can only use this technique with polled I/O.

The remaining bits in the MCR are reserved and should always contain zero. Future versions of the SCC (or compatible chips) may use these bits for other purposes, with zero being the default (8250 simulation) state. The MCR is a read/write register. Reading the MCR returns the last value written to it.
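As a quick illustration of the MCR layout, the following Python sketch builds an MCR value from named flags. The constant and function names are hypothetical; the bit positions follow the register description above.

```python
# Bit masks from the MCR description; the names are illustrative only.
MCR_DTR = 0x01        # bit 0: Data Terminal Ready
MCR_RTS = 0x02        # bit 1: Request To Send
MCR_OUT1 = 0x04       # bit 2: OUT 1 (little use on the PC)
MCR_OUT2 = 0x08       # bit 3: OUT 2, gates SCC interrupts on the PC
MCR_LOOPBACK = 0x10   # bit 4: loopback mode


def make_mcr(dtr=False, rts=False, out1=False,
             irq_enable=False, loopback=False):
    """Combine the five usable MCR bits; bits 5-7 stay zero."""
    value = 0
    if dtr:
        value |= MCR_DTR
    if rts:
        value |= MCR_RTS
    if out1:
        value |= MCR_OUT1
    if irq_enable:
        value |= MCR_OUT2
    if loopback:
        value |= MCR_LOOPBACK
    return value


# DTR and RTS active with interrupts gated through to the PIC.
print(format(make_mcr(dtr=True, rts=True, irq_enable=True), "08b"))
```

The value produced for DTR, RTS, and OUT 2 together is 00001011b, the same constant the ComInitIntr listing later in this chapter writes to the MCR.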
22.1.7 The Line Status Register (LSR)

The Line Status Register (LSR) is a read-only register that returns the current communication status. The bit layout for this register is the following:

Bit 0: Data Available (if 1)
Bit 1: Overrun error (if 1)
Bit 2: Parity error (if 1)
Bit 3: Framing error (if 1)
Bit 4: Break interrupt (if 1)
Bit 5: Transmitter holding register empty (if 1)
Bit 6: Transmitter shift register empty (if 1)
Bit 7: Unused

Line Status Register (LSR)

The data available bit is set if there is data available in the Receive Register. This also generates an interrupt. Reading the data in the Receive Register clears this bit.

The 8250 Receive Register can only hold one byte at a time. If a byte arrives and the program does not read it and then a second byte arrives, the 8250 wipes out the first byte with the second. The 8250 SCC sets the overrun error bit when this occurs. Reading the LSR clears this bit (after returning its value). This error will generate the high priority error interrupt.

The 8250 sets the parity error bit if it detects a parity error when receiving a byte. This error only occurs if you have enabled the parity operation in the LCR. The 8250 resets this bit after you read the LSR. When this error occurs, the 8250 will generate the error interrupt.

Bit three is the framing error bit. A framing error occurs if the 8250 receives a character without a valid stop bit. The 8250 will clear this bit after you read the LSR. This error will generate the high priority error interrupt.

The 8250 sets the break interrupt bit when it receives the break signal from the transmitting device. This will also generate an error interrupt. Reading the LSR clears this bit.

The 8250 sets bit five, the transmitter holding register empty bit, when it is okay to write another character to the Data Register. Note that the 8250 actually has two registers associated with the transmitter. The transmitter shift register contains the data actually being shifted out over the serial line. The transmitter holding register holds a value that the 8250 writes to the shift register when it finishes shifting out a character. Bit five indicates that the holding register is empty and the 8250 can accept another byte. Note that the 8250 might still be shifting out a character in parallel with this operation. The 8250 can generate an interrupt when the transmitter holding register is empty. Writing to the Data Register clears this bit.

The 8250 sets bit six when both the transmitter holding and transmitter shift registers are empty. This bit is clear when either register contains data.
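The LSR bits are easier to work with once they have names. The following Python sketch decodes an LSR value into a list of flag names; decode_lsr and the flag strings are hypothetical names, while the bit positions are those listed above.

```python
# (mask, name) pairs in bit order; names are illustrative.
LSR_FLAGS = [
    (0x01, "data_available"),
    (0x02, "overrun_error"),
    (0x04, "parity_error"),
    (0x08, "framing_error"),
    (0x10, "break_interrupt"),
    (0x20, "thr_empty"),   # transmitter holding register empty
    (0x40, "tsr_empty"),   # transmitter shift register empty
]


def decode_lsr(value):
    """Return the names of all LSR bits that are set in value."""
    return [name for mask, name in LSR_FLAGS if value & mask]


# 61h: a received byte is waiting and the transmitter is completely idle.
print(decode_lsr(0x61))
```

A polled driver typically looks only at bit zero before reading and bit five before writing, which is what the ComRead and ComWrite listings later in this chapter test.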
22.1.8 The Modem Status Register (MSR)

The Modem Status Register (MSR) reports the status of the handshake and other modem signals. Four bits provide the instantaneous values of these signals; the 8250 sets the other four bits if any of these signals has changed since the last time the CPU interrogated the MSR. The MSR has the following layout:

Bit 0: Clear To Send has changed
Bit 1: Data Set Ready has changed
Bit 2: Trailing edge of Ring Indicator
Bit 3: Data Carrier Detect has changed
Bit 4: Clear To Send
Bit 5: Data Set Ready
Bit 6: Ring Indicator
Bit 7: Data Carrier Detect

Modem Status Register (MSR)

The Clear To Send bit (bit #4) is a handshaking signal. This is normally connected to the RTS (Request To Send) signal on the remote device. When that remote device asserts its RTS line, data transmission can take place.

The Data Set Ready bit (bit #5) is one if the remote device is not busy. This input is generally connected to the Data Terminal Ready (DTR) line on the remote device.

The 8250 chip sets the Ring Indicator bit (bit #6) when the modem asserts the ring indicator line. You will rarely use this signal unless you are writing modem controlling software that automatically answers a telephone call.

The Data Carrier Detect bit (DCD, bit #7) is another modem specific signal. This bit contains a one while the modem detects a carrier signal on the phone line.

Bits zero through three of the MSR are the “delta” bits. These bits contain a one if their corresponding modem status signal changes. Such an occurrence will also generate a modem status interrupt. Reading the MSR will clear these bits.
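The split between the four “current value” bits and the four “delta” bits can be sketched in Python as follows. The function name decode_msr and the short signal names are hypothetical; the bit positions follow the MSR layout above.

```python
# Signal order matches bits 0..3 (delta bits) and bits 4..7 (current values).
MSR_SIGNALS = ("cts", "dsr", "ri", "dcd")


def decode_msr(value):
    """Split an MSR byte into (current, changed) dictionaries.

    Note that for "ri" the delta bit reports the trailing edge of the
    ring indicator rather than any change.
    """
    changed = {name: bool(value & (1 << i))
               for i, name in enumerate(MSR_SIGNALS)}
    current = {name: bool(value & (1 << (i + 4)))
               for i, name in enumerate(MSR_SIGNALS)}
    return current, changed


current, changed = decode_msr(0x11)   # CTS is asserted and has just changed
print(current["cts"], changed["cts"])
```

Because reading the real MSR clears the delta bits, a driver would normally read the register once and decode that saved value, as this function does.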
22.1.9 The Auxiliary Input Register

The auxiliary input register is available only on later model 8250 compatible devices. This is a read-only register that returns the same value as reading the data register. The difference between reading this register and reading the data register is that reading the auxiliary input register does not affect the data available bit in the LSR. This allows you to test the incoming data value without removing it from the input register. This is useful, for example, when chaining serial chip interrupt service routines and you want to handle certain “hot” values in one ISR and pass all other characters on to a different serial ISR.
22.2 The UCR Standard Library Serial Communications Support Routines

Although programming the 8250 SCC doesn’t seem like a big problem, it is invariably a difficult and tedious chore to write all the software necessary to get the serial communication system working. This is especially true when using interrupt driven serial I/O. Fortunately, you do not have to write this software from scratch; the UCR Standard Library provides 21 support routines that trivialize the use of the serial ports on the PC. About the only drawback to these routines is that they were written specifically for COM1:, although it isn’t too much work to modify them to work with COM2:. The following table lists the available routines:
Table 85: Standard Library Serial Port Support

ComBaud
    Inputs:      AX: bps (baud rate) = 110, 150, 300, 600, 1200, 2400, 4800, 9600, or 19200
    Description: Sets the communication rate for the serial port. ComBaud only supports the specified speeds. If ax contains some other value on entry, ComBaud ignores the value.

ComStop
    Inputs:      AX: 1 or 2
    Description: Sets the number of stop bits. The ax register contains the number of stop bits to use (1 or 2).

ComSize
    Inputs:      AX: word size (5, 6, 7, or 8)
    Description: Sets the number of data bits. The ax register contains the number of bits to transmit for each byte on the serial line.

ComParity
    Inputs:      AX: Parity selector. If bit zero is zero, parity is off. If bit zero is one, bits two and one select: 00 - odd parity, 01 - even parity, 10 - parity stuck at 1, 11 - parity stuck at 0 (see Table 86).
    Description: Sets the parity (if any) for the serial communications.

ComRead
    Outputs:     AL: Character read from port.
    Description: Waits until a character is available in the data register and returns that character. Used for polled I/O on the serial port. Do not use if you’ve activated the serial interrupts (see ComInitIntr).

ComWrite
    Inputs:      AL: Character to write.
    Description: Waits until the transmitter holding register is empty, then writes the character in al to the output register. Used for polled I/O on the serial port. Do not use with interrupts activated.

ComTstIn
    Outputs:     AL=0 if no character, AL=1 if char avail.
    Description: Tests to see if a character is available at the serial port. Use only for polled I/O; do not use with interrupts activated.

ComTstOut
    Outputs:     AL=0 if transmitter busy, AL=1 if not busy.
    Description: Tests to see if it is okay to write a character to the output register. Use with polled I/O only; do not use with interrupts active.

ComGetLSR
    Outputs:     AL = Current LSR value.
    Description: Returns the current LSR value in the al register. See the section on the LSR for more details.

ComGetMSR
    Outputs:     AL = Current MSR value.
    Description: Returns the current MSR value in the al register. See the section on the MSR for more details.

ComGetMCR
    Outputs:     AL = Current MCR value.
    Description: Returns the current MCR value in the al register. See the section on the MCR for more details.

ComGetLCR
    Outputs:     AL = Current LCR value.
    Description: Returns the current LCR value in the al register. See the section on the LCR for more details.

ComGetIIR
    Outputs:     AL = Current IIR value.
    Description: Returns the current IIR value in the al register. See the section on the IIR for more details.

ComGetIER
    Outputs:     AL = Current IER value.
    Description: Returns the current IER value in the al register. See the section on the IER for more details.

ComSetMCR
    Inputs:      AL = new MCR value
    Description: Stores the value in al into the MCR register. See the section on the MCR for more details.

ComSetLCR
    Inputs:      AL = new LCR value
    Description: Stores the value in al into the LCR register. See the section on the LCR for more details.

ComSetIER
    Inputs:      AL = new IER value
    Description: Stores the value in al into the IER register. See the section on the IER for more details.

ComInitIntr
    Description: Initializes the system to support interrupt driven serial I/O. See details below.

ComDisIntr
    Description: Resets the system back to polled serial I/O.

ComIn
    Description: Reads a character from the serial port when operating with interrupt driven I/O.

ComOut
    Description: Writes a character to the serial port using interrupt driven I/O.
The interrupt driven I/O features of the Standard Library routines deserve further explanation. When you call the ComInitIntr routine, it patches the COM1: interrupt vector (int 0Ch), enables IRQ 4 in the 8259A PIC, and enables read and write interrupts on the 8250 SCC. One thing this call does not do, and that you should do yourself, is patch the break and critical error exception vectors (int 23h and int 24h) to handle any program aborts that come along. When your program quits, either normally or via one of the above exceptions, it must call ComDisIntr to disable the interrupts. Otherwise, the next time a character arrives at the serial port the machine may crash, since it will attempt to jump to an interrupt service routine that might not be there anymore.

The ComIn and ComOut routines handle interrupt driven serial I/O. The Standard Library provides a reasonable input and output buffer (similar to the keyboard’s type ahead buffer), so you do not have to worry about losing characters unless your program is really, really slow or rarely reads any data from the serial port.
Between the ComInitIntr and ComDisIntr calls, you should not call any other serial support routines except ComIn and ComOut. The other routines are intended for polled I/O or initialization. Obviously, you should do any necessary initialization before enabling interrupts, and there is no need to do polled I/O while the interrupts are operational. Note that there is no equivalent to ComTstIn and ComTstOut while operating in interrupt mode. These routines are easy to write; instructions appear in the next section.
22.3 Programming the 8250 (Examples from the Standard Library)

The UCR Standard Library Serial Communication routines provide an excellent example of how to program the 8250 SCC directly, since they use nearly all the features of that chip on the PC. Therefore, this section will list each of the routines and describe exactly what that routine is doing. By studying this code, you can learn about all the details associated with the SCC and discover how to extend or otherwise modify the Standard Library routines.

; Useful equates:

BIOSvars        =       40h             ;BIOS segment address.
Com1Adrs        =       0               ;Offset in BIOS vars to COM1: address.
Com2Adrs        =       2               ;Offset in BIOS vars to COM2: address.

BufSize         =       256             ;# of bytes in buffers.

; Serial port equates. If you want to support COM2: rather than COM1:, simply
; change the following equates to 2F8h, 2F9h, ...

ComPort         =       3F8h
ComIER          =       3F9h
ComIIR          =       3FAh
ComLCR          =       3FBh
ComMCR          =       3FCh
ComLSR          =       3FDh
ComMSR          =       3FEh

; Variables, etc. This code assumes that DS=CS. That is, all the variables
; are in the code segment.
;
; Pointer to interrupt vector for int 0Ch in the interrupt vector table.
; Note: change these values to 0Bh*4 and 0Bh*4 + 2 if you want to support
; the COM2: port.

int0Cofs        equ     es:[0Ch*4]
int0Cseg        equ     es:[0Ch*4 + 2]

OldInt0c        dword   ?

; Input buffer for incoming characters (interrupt operation only). See the
; chapter on data structures and the description of circular queues for
; details on how this buffer works. It operates in a fashion not unlike
; the keyboard’s type ahead buffer.

InHead          word    InpBuf
InTail          word    InpBuf
InpBuf          byte    BufSize dup (?)
InpBufEnd       equ     this byte

; Output buffer for characters waiting to transmit.

OutHead         word    OutBuf
OutTail         word    OutBuf
OutBuf          byte    BufSize dup (?)
OutBufEnd       equ     this byte

; The i8259a variable holds a copy of the PIC’s IER so we can restore it
; upon removing our interrupt service routines from memory.

i8259a          byte    0               ;8259a interrupt enable register.

; The TestBuffer variable tells us whether we have to buffer up characters
; or if we can store the next character directly into the 8250’s output
; register (See the ComOut routine for details).

TestBuffer      db      0
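The head and tail pointers declared above implement a circular queue. The ComIn and ComOut code is not shown in this part of the chapter, so the following Python sketch only illustrates the general technique; which pointer inserts and which removes, the "one slot kept free" full test, and all names here are assumptions rather than the Standard Library's actual design.

```python
class RingBuffer:
    """Illustrative circular queue modeled on InHead/InTail/InpBuf."""

    def __init__(self, size=256):
        self.buf = [0] * size
        self.head = 0   # next slot to fill (assumed role of InHead)
        self.tail = 0   # next slot to empty (assumed role of InTail)

    def is_empty(self):
        # Interrupt-mode analogue of ComTstIn: anything waiting?
        return self.head == self.tail

    def put(self, byte):
        """Insert a byte; return False (dropping it) if the queue is full."""
        nxt = (self.head + 1) % len(self.buf)   # wrap at the buffer end
        if nxt == self.tail:
            return False
        self.buf[self.head] = byte
        self.head = nxt
        return True

    def get(self):
        """Remove and return the oldest byte, or None if the queue is empty."""
        if self.is_empty():
            return None
        byte = self.buf[self.tail]
        self.tail = (self.tail + 1) % len(self.buf)
        return byte
```

Under these assumptions, an interrupt-mode equivalent of ComTstIn reduces to comparing the two pointers, which is why the text calls such routines easy to write.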
The first set of routines provided by the Standard Library lets you initialize the 8250 SCC. These routines provide “programmer friendly” interfaces to the baud rate divisor and line control registers. They let you set the baud rate, data size, number of stop bits, and parity options on the SCC.

The ComBaud routine sets the 8250’s transfer rate (in bits per second). This routine provides a nice “programmer’s interface” to the 8250 SCC. Rather than having to compute the baud rate divisor value yourself, you can simply load ax with the bps value you want and call this routine. Of course, one problem is that you must choose a bps value that this routine supports. Note that the listing below does not actually reject other values: its compare chain maps any other value onto one of the supported rates (for example, any value above 9600 selects 19200 bps and any value below 150 selects 110 bps). Fortunately, this routine supports all the common bps rates; if you need some other value, it is easy to modify this code to allow those other rates.

This code consists of two parts. The first part compares the value in ax against the set of valid bps values. If it finds a match, it loads ax with the corresponding 16 bit divisor constant. The second part of this code switches on the baud rate divisor registers and stores the value in ax into these registers. Finally, it switches the first two 8250 I/O registers back to the data and interrupt enable registers.

Note: This routine calls a few routines, notably ComSetLCR and ComGetLCR, that we will define a little later. These routines do the obvious functions: they read and write the LCR register (preserving registers, as appropriate).

ComBaud         proc
                push    ax
                push    dx
                cmp     ax, 9600
                ja      Set19200
                je      Set9600
                cmp     ax, 2400
                ja      Set4800
                je      Set2400
                cmp     ax, 600
                ja      Set1200
                je      Set600
                cmp     ax, 150
                ja      Set300
                je      Set150
                mov     ax, 1047        ;Default to 110 bps.
                jmp     SetPort

Set150:         mov     ax, 768         ;Divisor value for 150 bps.
                jmp     SetPort

Set300:         mov     ax, 384         ;Divisor value for 300 bps.
                jmp     SetPort

Set600:         mov     ax, 192         ;Divisor value for 600 bps.
                jmp     SetPort

Set1200:        mov     ax, 96          ;Divisor value for 1200 bps.
                jmp     SetPort

Set2400:        mov     ax, 48          ;Divisor value for 2400 bps.
                jmp     SetPort

Set4800:        mov     ax, 24          ;Divisor value for 4800 bps.
                jmp     SetPort

Set9600:        mov     ax, 12          ;Divisor value for 9600 bps.
                jmp     short SetPort

Set19200:       mov     ax, 6           ;Divisor value for 19.2 kbps.
SetPort:        mov     dx, ax          ;Save baud value.
                call    ComGetLCR       ;Fetch LCR value.
                push    ax              ;Save old divisor bit value.
                or      al, 80h         ;Set divisor select bit.
                call    ComSetLCR       ;Write LCR value back.
                mov     ax, dx          ;Get baud rate divisor value.
                mov     dx, ComPort     ;Point at L.O. byte of divisor reg.
                out     dx, al          ;Output L.O. byte of divisor.
                inc     dx              ;Point at the H.O. byte.
                mov     al, ah          ;Put H.O. byte in AL.
                out     dx, al          ;Output H.O. byte of divisor.
                pop     ax              ;Retrieve old LCR value.
                call    ComSetLCR       ;Restore divisor bit value.
                pop     dx
                pop     ax
                ret
ComBaud         endp
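The divisor constants in ComBaud all follow one formula. The Python sketch below assumes the standard PC serial clock of 1.8432 MHz divided by 16 (a figure not stated in this section), so that the divisor is 115,200 divided by the bps rate; it also models the listing's compare chain. The function names are hypothetical.

```python
UART_CLOCK_HZ = 1_843_200   # assumed PC serial port crystal frequency


def baud_divisor(bps):
    """Divisor for a given rate: clock / (16 * bps), truncated."""
    return UART_CLOCK_HZ // (16 * bps)


def combaud_select(ax):
    """Model of ComBaud's compare chain: value in AX -> divisor loaded."""
    if ax > 9600:
        return 6        # Set19200
    if ax == 9600:
        return 12
    if ax > 2400:
        return 24       # Set4800
    if ax == 2400:
        return 48
    if ax > 600:
        return 96       # Set1200
    if ax == 600:
        return 192
    if ax > 150:
        return 384      # Set300
    if ax == 150:
        return 768
    return 1047         # default: 110 bps


def divisor_bytes(divisor):
    """Split the 16 bit divisor into (L.O. byte, H.O. byte) as written to the ports."""
    return divisor & 0xFF, divisor >> 8


for rate in (110, 150, 300, 600, 1200, 2400, 4800, 9600, 19200):
    print(rate, baud_divisor(rate), combaud_select(rate))
```

For every supported rate the two functions agree, which suggests the constants in the listing were derived this way; for unsupported values combaud_select shows which rate the listing ends up choosing.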
The ComStop routine programs the LCR to provide the specified number of stop bits. On entry, ax should contain either one or two (the number of stop bits you desire). This code converts that to zero or one and writes the resulting L.O. bit to the stop bit field of the LCR. Note that this code ignores the other bits in the ax register. This code reads the LCR, masks out the stop bit field, and then inserts the value the caller specifies into that field. Note the usage of the shl ax, 2 instruction; this requires an 80286 or later processor.

ComStop         proc
                push    ax
                push    dx
                dec     ax              ;Convert 1 or 2 to 0 or 1.
                and     al, 1           ;Strip other bits.
                shl     ax, 2           ;Position into bit #2.
                mov     ah, al          ;Save our output value.
                call    ComGetLCR       ;Read LCR value.
                and     al, 11111011b   ;Mask out Stop Bits bit.
                or      al, ah          ;Merge in new # of stop bits.
                call    ComSetLCR       ;Write result back to LCR.
                pop     dx
                pop     ax
                ret
ComStop         endp
The ComSize routine sets the word size for data transmission. As usual, this code provides a “programmer friendly” interface to the 8250 SCC. On entry, you specify the number of bits (5, 6, 7, or 8) in the ax register; you do not have to worry about the appropriate bit pattern for the 8250’s LCR register. This routine will compute the appropriate bit pattern for you. If the value in the ax register is not appropriate, this code defaults to an eight bit word size.

ComSize         proc
                push    ax
                push    dx
                sub     al, 5           ;Map 5..8 -> 00b, 01b, 10b, 11b
                cmp     al, 3
                jbe     Okay
                mov     al, 3           ;Default to eight bits.
Okay:           mov     ah, al          ;Save new bit size.
                call    ComGetLCR       ;Read current LCR value.
                and     al, 11111100b   ;Mask out old word size.
                or      al, ah          ;Merge in new word size.
                call    ComSetLCR       ;Write new LCR value back.
                pop     dx
                pop     ax
                ret
ComSize         endp
The ComParity routine initializes the parity options on the 8250. Unfortunately, there is little possibility of a “programmer friendly” interface to this routine, so this code requires that you pass one of the following values in the ax register:

Table 86: ComParity Input Parameters

Value in AX     Description
0               Disable parity.
1               Enable odd parity checking.
3               Enable even parity checking.
5               Enable stuck parity bit with value one.
7               Enable stuck parity bit with value zero.
ComParity       proc
                push    ax
                push    dx
                shl     al, 3           ;Move to final position in LCR.
                and     al, 00111000b   ;Mask out other data.
                mov     ah, al          ;Save for later.
                call    ComGetLCR       ;Get current LCR value.
                and     al, 11000111b   ;Mask out existing parity bits.
                or      al, ah          ;Merge in new bits.
                call    ComSetLCR       ;Write results back to the LCR.
                pop     dx
                pop     ax
                ret
ComParity       endp
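ComStop, ComSize, and ComParity all use the same read-modify-write pattern: compute the new field, mask the old field out of the LCR, and OR the new one in. The Python sketch below models that arithmetic on a plain integer standing in for the LCR; the function names are hypothetical and no port I/O takes place.

```python
def set_stop_bits(lcr, stop_bits):
    """Model of ComStop: 1 or 2 stop bits -> bit 2 of the LCR."""
    field = ((stop_bits - 1) & 1) << 2
    return (lcr & 0b11111011) | field


def set_word_size(lcr, bits):
    """Model of ComSize: 5..8 -> 00b..11b in bits 0-1, else default to 8."""
    code = (bits - 5) & 0xFF        # sub al, 5 wraps around in eight bits
    if code > 3:
        code = 3                    # anything out of range means eight bits
    return (lcr & 0b11111100) | code


def set_parity(lcr, selector):
    """Model of ComParity: selector from Table 86 -> bits 3-5 of the LCR."""
    field = (selector << 3) & 0b00111000
    return (lcr & 0b11000111) | field


lcr = 0b00000011                    # eight data bits, one stop bit, no parity
lcr = set_word_size(lcr, 7)
lcr = set_parity(lcr, 3)            # even parity
print(format(lcr, "08b"))
```

Because each helper touches only its own field, the three calls can be made in any order, which mirrors how the assembly routines can be called independently.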
The next set of serial communication routines provides polled I/O support. These routines let you easily read characters from the serial port, write characters to the serial port, and check to see if there is data available at the input port or if it is okay to write data to the output port. Under no circumstances should you use these routines when you’ve activated the serial interrupt system. Doing so may confuse the system and produce incorrect data or loss of data.

The ComRead routine is comparable to getc: it waits until data is available at the serial port, reads that data, and returns it in the al register. This routine begins by making sure we can access the Receive Data register (by clearing the baud rate divisor latch bit in the LCR).

ComRead         proc
                push    dx
                call    ComGetLCR
                push    ax              ;Save divisor latch access bit.
                and     al, 7fh         ;Select normal ports.
                call    ComSetLCR       ;Write LCR to turn off divisor reg.
WaitForChar:    call    ComGetLSR       ;Get data available bit from LSR.
                test    al, 1           ;Data Available?
                jz      WaitForChar     ;Loop until data available.
                mov     dx, ComPort     ;Read the data from the input port.
                in      al, dx
                mov     dl, al          ;Save character.
                pop     ax              ;Restore divisor access bit.
                call    ComSetLCR       ;Write it back to LCR.
                mov     al, dl          ;Restore output character.
                pop     dx
                ret
ComRead         endp
The ComWrite routine outputs the character in al to the serial port. It first waits until the transmitter holding register is empty, then it writes the output data to the output register.

ComWrite        proc
                push    dx
                push    ax
                mov     dl, al          ;Save character to output.
                call    ComGetLCR       ;Switch to output register.
                push    ax              ;Save divisor latch access bit.
                and     al, 7fh         ;Select normal input/output ports
                call    ComSetLCR       ; rather than divisor register.
WaitForXmtr:    call    ComGetLSR       ;Read LSR for xmit empty bit.
                test    al, 00100000b   ;Xmtr buffer empty?
                jz      WaitForXmtr     ;Loop until empty.
                mov     al, dl          ;Get output character.
                mov     dx, ComPort     ;Store it in the output port to
                out     dx, al          ; get it on its way.
                pop     ax              ;Restore divisor access bit.
                call    ComSetLCR
                pop     ax
                pop     dx
                ret
ComWrite        endp
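The busy-wait loop at the heart of ComWrite can be exercised without hardware. In the Python sketch below, FakeUart is a purely hypothetical stand-in for the 8250 that reports a busy transmitter for a fixed number of polls; polled_write mirrors the WaitForXmtr loop by spinning until bit five of the LSR is set.

```python
LSR_THR_EMPTY = 0x20   # bit 5: transmitter holding register empty


class FakeUart:
    """Hypothetical stand-in for the 8250, for illustration only."""

    def __init__(self, busy_polls):
        self.busy_polls = busy_polls   # LSR reads that report "busy"
        self.sent = []                 # bytes written to the data register

    def read_lsr(self):
        if self.busy_polls > 0:
            self.busy_polls -= 1
            return 0x00
        return LSR_THR_EMPTY

    def write_data(self, byte):
        self.sent.append(byte)


def polled_write(uart, byte):
    """Spin until the holding register is empty, then send the byte.

    Returns the number of times the loop had to wait.
    """
    polls = 0
    while not (uart.read_lsr() & LSR_THR_EMPTY):
        polls += 1
    uart.write_data(byte)
    return polls


uart = FakeUart(busy_polls=3)
print(polled_write(uart, ord("A")), uart.sent)
```

ComRead has the same shape, testing bit zero (data available) instead of bit five before reading the data register.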
The ComTstIn and ComTstOut routines let you check to see if a character is available at the input port (ComTstIn) or if it is okay to send a character to the output port (ComTstOut). ComTstIn returns zero or one in al if data is not available or is available, respectively. ComTstOut returns zero or one in al if the transmitter register is full or empty, respectively.

ComTstIn        proc
                call    ComGetLSR
                and     ax, 1           ;Keep only data available bit.
                ret
ComTstIn        endp

ComTstOut       proc
                push    dx
                call    ComGetLSR       ;Get the line status.
                test    al, 00100000b   ;Mask Xmitr empty bit.
                mov     al, 0           ;Assume not empty.
                jz      toc1            ;Branch if not empty.
                inc     ax              ;Set to one if it is empty.
toc1:           pop     dx              ;Balance the push above.
                ret
ComTstOut       endp
The next set of routines the Standard Library supplies load and store the various registers on the 8250 SCC. Although these are all trivial routines, they allow the programmer to access these registers by name without having to know the address. Furthermore, these routines all preserve the value in the dx register, saving some code in the calling program if the dx register is already in use.

The following routines let you read (“Get”) the value in the LSR, MSR, LCR, MCR, IIR, and IER registers, returning said value in the al register. They let you write (“Set”) the value in al to any of the LCR, MCR, and IER registers. Since these routines are so simple and straightforward, there is no need to discuss each routine individually. Note that you should avoid calling these routines outside an SCC ISR while in interrupt mode, since doing so can affect the interrupt system on the 8250 SCC.

ComGetLSR       proc                    ;Returns the LSR value in the AL reg.
                push    dx
                mov     dx, ComLSR      ;Select LSR register.
                in      al, dx          ;Read and return the LSR value.
                pop     dx
                ret
ComGetLSR       endp

ComGetMSR       proc                    ;Returns the MSR value in the AL reg.
                push    dx
                mov     dx, ComMSR      ;Select MSR register.
                in      al, dx          ;Read and return MSR value.
                pop     dx
                ret
ComGetMSR       endp

ComSetMCR       proc                    ;Stores AL’s value to the MCR reg.
                push    dx
                mov     dx, ComMCR      ;Point at MCR register.
                out     dx, al          ;Output value in AL to MCR.
                pop     dx
                ret
ComSetMCR       endp

ComGetMCR       proc                    ;Returns the MCR value in the AL reg.
                push    dx
                mov     dx, ComMCR      ;Select MCR register.
                in      al, dx          ;Read value from MCR register into AL.
                pop     dx
                ret
ComGetMCR       endp

ComGetLCR       proc                    ;Return the LCR value in the AL reg.
                push    dx
                mov     dx, ComLCR      ;Point at LCR register.
                in      al, dx          ;Read and return LCR value.
                pop     dx
                ret
ComGetLCR       endp

ComSetLCR       proc                    ;Write a new value to the LCR.
                push    dx
                mov     dx, ComLCR      ;Point at LCR register.
                out     dx, al          ;Write value in AL to the LCR.
                pop     dx
                ret
ComSetLCR       endp

ComGetIIR       proc                    ;Return the value in the IIR.
                push    dx
                mov     dx, ComIIR      ;Select IIR register.
                in      al, dx          ;Read IIR value into AL and return.
                pop     dx
                ret
ComGetIIR       endp
ComGetIER       proc                    ;Return IER value in AL.
                push    dx
                call    ComGetLCR       ;Need to select IER register by saving
                push    ax              ; the LCR value and then clearing the
                and     al, 7fh         ; baud rate divisor latch bit.
                call    ComSetLCR
                mov     dx, ComIER      ;Address the IER.
                in      al, dx          ;Read current IER value.
                mov     dl, al          ;Save for now.
                pop     ax              ;Retrieve old LCR value (divisor latch).
                call    ComSetLCR       ;Restore divisor latch.
                mov     al, dl          ;Restore IER value.
                pop     dx
                ret
ComGetIER       endp

ComSetIER       proc                    ;Writes value in AL to the IER.
                push    dx
                push    ax              ;Save AX’s value.
                mov     ah, al          ;Save IER value to output.
                call    ComGetLCR       ;Get and save divisor access
                push    ax              ; bit.
                and     al, 7fh         ;Clear divisor access bit.
                call    ComSetLCR
                mov     al, ah          ;Retrieve new IER value.
                mov     dx, ComIER      ;Select IER register.
                out     dx, al          ;Output IER value.
                pop     ax              ;Restore divisor latch bit.
                call    ComSetLCR
                pop     ax
                pop     dx
                ret
ComSetIER       endp
The last set of serial support routines appearing in the Standard Library provide support for interrupt driven I/O. There are five routines in this section of the code: ComInitIntr, ComDisIntr, ComIntISR, ComIn, and ComOut. ComInitIntr initializes the serial port interrupt system. It saves the old int 0Ch interrupt vector, initializes the vector to point at the ComIntISR interrupt service routine, and properly initializes the 8259A PIC and 8250 SCC for interrupt based operation. ComDisIntr undoes everything the ComInitIntr routine sets up; you need to call this routine to disable interrupts before your program quits. ComOut and ComIn transfer data to and from the buffers described in the variables section; the ComIntISR routine is responsible for removing data from the transmit queue and sending it over the serial line as well as buffering up incoming data from the serial line.

The ComInitIntr routine initializes the 8250 SCC and 8259A PIC for interrupt based serial I/O. It also initializes the int 0Ch vector to point at the ComIntISR routine. One thing this code does not do is provide break and critical error exception handlers. Remember, if the user hits ctrl-C (or ctrl-Break) or selects abort on an I/O error, the default exception handlers simply return to DOS without restoring the int 0Ch vector. It is important that your program provide exception handlers that will call ComDisIntr before allowing the system to return control to DOS. Otherwise the system may crash when DOS loads the next program into memory. See “Interrupts, Traps, and Exceptions” on page 995 for more details on writing these exception handlers.

ComInitIntr     proc
                pushf                   ;Save interrupt disable flag.
                push    es
                push    ax
                push    dx

; Turn off the interrupts while we're doing this.

                cli

; Save old interrupt vector. Obviously, you must change the following code
; to save and set up the int 0Bh vector if you want to access COM2: rather
; than the COM1: port.

                xor     ax, ax          ;Point at interrupt vectors
                mov     es, ax
                mov     ax, Int0Cofs
                mov     word ptr OldInt0C, ax
                mov     ax, Int0Cseg
                mov     word ptr OldInt0C+2, ax

; Point int 0ch vector at our interrupt service routine (see note above
; concerning switching to COM2:).

                mov     ax, cs
                mov     Int0Cseg, ax
                mov     ax, offset ComIntISR
                mov     Int0Cofs, ax

; Clear any pending interrupts:

                call    ComGetLSR       ;Clear Receiver line status
                call    ComGetMSR       ;Clear CTS/DSR/RI Interrupts
                call    ComGetIIR       ;Clear xmtr empty interrupt
                mov     dx, ComPort
                in      al, dx          ;Clear data available intr.

; Clear divisor latch access bit. WHILE OPERATING IN INTERRUPT MODE, THE
; DIVISOR ACCESS LATCH BIT MUST ALWAYS BE ZERO. If for some horrible reason
; you need to change the baud rate in the middle of a transmission (or while
; the interrupts are enabled) clear the interrupt flag, do your dirty work,
; clear the divisor latch bit, and finally restore interrupts.

                call    ComGetLCR       ;Get LCR.
                and     al, 7fh         ;Clear divisor latch bit.
                call    ComSetLCR       ;Write new LCR value back.

; Enable the receiver and transmitter interrupts. Note that this code
; ignores error and modem status change interrupts.

                mov     al, 3           ;Enable rcv/xmit interrupts
                call    ComSetIER

; Must set the OUT2 line for interrupts to work.
; Also sets DTR and RTS active.

                mov     al, 00001011b
                call    ComSetMCR

; Activate the COM1 (int 0ch) bit in the 8259A interrupt controller chip.
; Note: you must change the following code to clear bit three (rather than
; four) to use this code with the COM2: port.

                in      al, 21h         ;Get 8259A interrupt enable value.
                mov     i8259a, al      ;Save interrupt enable bits.
                and     al, 0efh        ;Bit 4=IRQ 4 = INT 0Ch
                out     21h, al         ;Enable interrupts.

                pop     dx
                pop     ax
                pop     es
                popf                    ;Restore interrupt disable flag.
                ret
ComInitIntr     endp
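The comments in ComInitIntr say the 8259A step must clear bit three rather than bit four for COM2:. The following is only a sketch of that one step, not Standard Library code. The routine name EnableCom2IRQ is hypothetical, and it assumes the same i8259a save variable that ComInitIntr uses.

```asm
; EnableCom2IRQ- Hypothetical helper (not in the Standard Library).
;                Unmasks IRQ 3 (int 0Bh, COM2:) at the 8259A and saves
;                the previous mask in i8259a, as ComInitIntr does for
;                IRQ 4. Assumes DS addresses the segment holding i8259a.

EnableCom2IRQ   proc
                push    ax
                pushf                   ;Save interrupt disable flag.
                cli                     ;No interrupts while touching the PIC.
                in      al, 21h         ;Get 8259A interrupt mask register.
                mov     i8259a, al      ;Save it for later restoration.
                and     al, 0f7h        ;Clear bit 3 = IRQ 3 = int 0Bh.
                out     21h, al         ;COM2: interrupts now reach the CPU.
                popf                    ;Restore interrupt disable flag.
                pop     ax
                ret
EnableCom2IRQ   endp
```

A COM2: version of the whole routine would also have to save and patch the int 0Bh vector (at 0:2Ch) instead of the int 0Ch vector, as the comments in the listing point out.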
The ComDisIntr routine disables serial interrupts. It restores the original value of the 8259A interrupt enable register, it restores the int 0Ch interrupt vector, and it masks interrupts on the 8250 SCC. Note that this code assumes that you have not changed the interrupt enable bits in the 8259 PIC since calling ComInitIntr. It restores the 8259A’s interrupt enable register with the value from the 8259A interrupt enable register when you originally called ComInitIntr.

It would be a complete disaster to call this routine without first calling ComInitIntr. Doing so would patch the int 0Ch vector with garbage and, likewise, restore the 8259A interrupt enable register with a garbage value. Make sure you’ve called ComInitIntr before calling this routine. Generally, you should call ComInitIntr once, at the beginning of your program, and call ComDisIntr once, either at the end of your program or within the break or critical error exception routines.

ComDisIntr      proc
                pushf
                push    es
                push    dx
                push    ax

                cli                     ;Don't allow interrupts while messing
                xor     ax, ax          ; with the interrupt vectors.
                mov     es, ax          ;Point ES at interrupt vector table.

; First, turn off the interrupt source at the 8250 chip:

                call    ComGetMCR       ;Get the OUT 2 (interrupt enable) bit.
                and     al, 3           ;Mask out OUT 2 bit (masks ints)
                call    ComSetMCR       ;Write result to MCR.

; Now restore the IRQ 4 bit in the 8259A PIC. Note that you must modify this
; code to restore the IRQ 3 bit if you want to support COM2: instead of COM1:

                in      al, 21h         ;Get current 8259a IER value
                and     al, 0efh        ;Clear IRQ 4 bit (change for COM2:!)
                mov     ah, i8259a      ;Get our saved value
                and     ah, 10000b      ;Keep only the saved com1: bit (IRQ 4).
                or      al, ah          ;Put bit back in.
                out     21h, al

; Restore the interrupt vector:

                mov     ax, word ptr OldInt0C
                mov     Int0Cofs, ax
                mov     ax, word ptr OldInt0C+2
                mov     Int0Cseg, ax

                pop     ax
                pop     dx
                pop     es
                popf
                ret
ComDisIntr      endp
The following code implements the interrupt service routine for the 8250 SCC. When an interrupt occurs, this code reads the 8250 IIR to determine the source of the interrupt. The Standard Library routines only provide direct support for data available interrupts and transmitter holding register empty interrupts. If this code detects an error or status change interrupt, it clears the interrupt status but takes no other action. If it detects a receive or transmit interrupt, it transfers control to the appropriate handler.

The receiver interrupt handler is very easy to implement. All this code needs to do is read the character from the Receive Register and add this character to the input buffer. The only catch is that this code must ignore any incoming characters if the input buffer is full. An application can access this data using the ComIn routine that removes data from the input buffer.

The transmit handler is somewhat more complex. The 8250 SCC interrupts the 80x86 when it is able to accept more data for transmission. However, the fact that the 8250 is ready for more data doesn’t guarantee there is data ready for transmission. The application produces data at its own rate, not necessarily at the rate that the 8250 SCC wants it. Therefore, it is quite possible for the 8250 to say “give me more data” when the application has not produced any. Obviously, we should not transmit anything at that point. Instead, we have to wait for the application to produce more data before transmission resumes.

Unfortunately, this complicates the driver for the transmission code somewhat. With the receiver, the interrupt always indicates that the ISR can move data from the 8250 to the buffer. The application can remove this data at any time and the process is always the same: wait for a non-empty receive buffer and then remove the first item from the buffer. Unfortunately, we cannot simply do the converse operation when transmitting data. That is, we can’t simply store data in the transmit buffer and leave it up to the ISR to remove this data. The problem is that the 8250 only interrupts the system once when the transmitter holding register is empty. If there is no data to transmit at that point, the ISR must return without writing anything to the transmit register. Since there is no data in the transmit buffer, there will be no additional transmitter interrupts generated, even when there is data added to the transmit buffer.

Therefore, the ISR and the routine responsible for adding data to the output buffer (ComOut) must coordinate their activities. If the buffer is empty and the transmitter is not currently transmitting anything, the ComOut routine must write its data directly to the 8250. If the 8250 is currently transmitting data, ComOut must append its data to the end of the output buffer. ComIntISR and ComOut use a flag, TestBuffer, to determine whether ComOut should write directly to the serial port or append its data to the output buffer. See the following code and the code for ComOut for all the details.

ComIntISR       proc    far
                push    ax
                push    bx
                push    dx
TryAnother:     mov     dx, ComIIR
                in      al, dx          ;Get interrupt id value.
                test    al, 1           ;Any interrupts left?
                jnz     IntRtn          ;Quit if no interrupt pending.
                cmp     al, 100b        ;Since only xmit/rcv ints are
                je      ReadCom1        ; active, this checks for rcv int.
                cmp     al, 10b         ;This checks for xmit empty intr.
                je      WriteCom1

; Bogus interrupt? We shouldn't ever fall into this code because we have
; not enabled the error or status change interrupts. However, it is
; possible that the application code has gone in and tweaked the IER on
; the 8250. Therefore, we need to supply a default interrupt handler for
; these conditions. The following code just reads all the appropriate
; registers to clear any pending interrupts.

                call    ComGetLSR       ;Clear receiver line status
                call    ComGetMSR       ;Clear modem status.
                jmp     TryAnother      ;Check for lower priority intr.

; When there are no more pending interrupts on the 8250, drop down and
; return from this ISR.

IntRtn:         mov     al, 20h         ;Acknowledge interrupt to the
                out     20h, al         ; 8259A interrupt controller.
                pop     dx
                pop     bx
                pop     ax
                iret

; Handle incoming data here:
; (Warning: This is a critical region. Interrupts MUST BE OFF while
; executing this code. By default, interrupts are off in an ISR. DO NOT
; TURN THEM ON if you modify this code).

ReadCom1:       mov     dx, ComPort     ;Point at data input register.
                in      al, dx          ;Get the input char

                mov     bx, InHead      ;Insert the character into the
                mov     [bx], al        ; serial input buffer.

                inc     bx              ;Increment buffer ptr.
                cmp     bx, offset InpBufEnd
                jb      NoInpWrap
                mov     bx, offset InpBuf
NoInpWrap:      cmp     bx, InTail      ;If the buffer is full, ignore this
                je      TryAnother      ; input character.
                mov     InHead, bx
                jmp     TryAnother      ;Go handle other 8250 interrupts.

; Handle outgoing data here (This is also a critical region):

WriteCom1:      mov     bx, OutTail     ;See if the buffer is empty.
                cmp     bx, OutHead
                jne     OutputChar      ;If not, output the next char.

; If head and tail are equal, simply set the TestBuffer variable to zero
; and quit. If they are not equal, then there is data in the buffer and
; we should output the next character.

                mov     TestBuffer, 0
                jmp     TryAnother      ;Handle other pending interrupts.

; The buffer pointers are not equal, output the next character down here.

OutputChar:     mov     al, [bx]        ;Get the next char from the buffer.
                mov     dx, ComPort     ;Select output port.
                out     dx, al          ;Output the character

; Okay, bump the output pointer.

                inc     bx
                cmp     bx, offset OutBufEnd
                jb      NoOutWrap
                mov     bx, offset OutBuf
NoOutWrap:      mov     OutTail, bx
                jmp     TryAnother
ComIntISR       endp
These last two routines read data from the serial input buffer and write data to the serial output buffer. The ComIn routine, which handles the input chore, waits until the input buffer is not empty. Then it removes the first available byte from the input buffer and returns this value to the caller.

ComIn           proc
                pushf                   ;Save interrupt flag
                push    bx
                sti                     ;Make sure interrupts are on.
TstInLoop:      mov     bx, InTail      ;Wait until there is at least one
                cmp     bx, InHead      ; character in the input buffer.
                je      TstInLoop
                mov     al, [bx]        ;Get next char.
                cli                     ;Turn off ints while adjusting
                inc     bx              ; buffer pointers.
                cmp     bx, offset InpBufEnd
                jne     NoWrap2
                mov     bx, offset InpBuf
NoWrap2:        mov     InTail, bx
                pop     bx
                popf                    ;Restore interrupt flag.
                ret
ComIn           endp
The ComOut routine must check the TestBuffer variable to see if the 8250 is currently busy. If not (TestBuffer equals zero) then this code must write the character directly to the serial port and set TestBuffer to one (since the chip is now busy). If TestBuffer contains a non-zero value, this code simply appends the character in al to the end of the output buffer.

ComOut          proc    far
                pushf
                cli                     ;No interrupts now!
                cmp     TestBuffer, 0   ;Write directly to serial chip?
                jnz     BufferItUp      ;If not, go put it in the buffer.

; The following code writes the current character directly to the serial
; port because the 8250 is not transmitting anything now and we will never
; again get a transmit holding register empty interrupt (at least, not
; until we write data directly to the port).

                push    dx
                mov     dx, ComPort     ;Select output register.
                out     dx, al          ;Write character to port.
                mov     TestBuffer, 1   ;Must buffer up next char.
                pop     dx
                popf                    ;Restore interrupt flag.
                ret

; If the 8250 is busy, buffer up the character here:

BufferItUp:     push    bx
                mov     bx, OutHead     ;Pointer to next buffer position.
                mov     [bx], al        ;Add the char to the buffer.

; Bump the output pointer.

                inc     bx
                cmp     bx, offset OutBufEnd
                jne     NoWrap3
                mov     bx, offset OutBuf
NoWrap3:        cmp     bx, OutTail     ;See if the buffer is full.
                je      NoSetTail       ;Don't add char if buffer is full.
                mov     OutHead, bx     ;Else, update buffer ptr.
NoSetTail:      pop     bx
                popf                    ;Restore interrupt flag
                ret
ComOut          endp
Note that the Standard Library does not provide any routines to see if there is data available in the input buffer or to see if the output buffer is full (comparable to the ComTstIn and ComTstOut routines). However, these are very easy routines to write; all you need do is compare the head and tail pointers of the two buffers. The buffers are empty if the head and tail pointers are equal. The buffers are full if the head pointer is one byte before the tail pointer (keep in mind, the pointers wrap around at the end of the buffer, so the buffer is also full if the head pointer is at the last position in the buffer and the tail pointer is at the first position in the buffer).
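The head/tail comparisons just described might be coded along the following lines. This is a minimal sketch and not Standard Library code. The routine names ComInAvail and ComOutFull are hypothetical, and the code assumes DS addresses the same InHead, InTail, OutHead, OutTail, OutBuf, and OutBufEnd variables that ComIn and ComOut use.

```asm
; ComInAvail- Hypothetical helper. Returns with the zero flag set if the
;             input buffer is empty, clear if at least one character
;             is waiting.

ComInAvail      proc
                push    bx
                mov     bx, InTail
                cmp     bx, InHead      ;Equal pointers mean an empty buffer.
                pop     bx              ;POP does not change the flags.
                ret
ComInAvail      endp

; ComOutFull- Hypothetical helper. Returns with the zero flag set if the
;             output buffer is full, clear if there is room for another
;             character.

ComOutFull      proc
                push    bx
                mov     bx, OutHead
                inc     bx              ;Where the head pointer would go next.
                cmp     bx, offset OutBufEnd
                jne     NoWrap4
                mov     bx, offset OutBuf ;Wrap around to the buffer's start.
NoWrap4:        cmp     bx, OutTail     ;Full if the next slot is the tail.
                pop     bx
                ret
ComOutFull      endp
```

Each routine reads the pointer that the ISR modifies only once, with a single word access, so this sketch does not turn interrupts off. The result is a snapshot; the ISR may change the buffer state immediately after the routine returns.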
22.4
Summary

This chapter discusses RS-232C serial communications on the PC. Like the parallel port, there are three levels at which you can access the serial port: through DOS, through BIOS, or by programming the hardware directly. Unlike DOS’ and BIOS’ parallel printer support, the DOS serial support is almost worthless and the BIOS support is rather weak (e.g., it doesn’t support interrupt driven I/O). Therefore, it is common programming practice on the PC to control the hardware directly from an application program, and familiarizing yourself with the 8250 Serial Communication Chip (SCC) is important if you intend to do serial communications on the PC. This chapter does not discuss serial communication from DOS or BIOS, mainly because their support is so limited. For further information on programming the serial port from DOS or BIOS, see “MS-DOS, PC-BIOS, and File I/O” on page 699.

The 8250 supports ten I/O registers that let you control the communication parameters, check the status of the chip, control interrupt capabilities, and, of course, perform serial I/O. The 8250 maps these registers to eight I/O locations in the PC’s I/O address space.

The PC supports up to four serial communication devices: COM1:, COM2:, COM3:, and COM4:. However, most software only deals with the COM1: and COM2: ports. Like the parallel port support, BIOS differentiates logical communication ports and physical communication ports. BIOS stores the base address of COM1:..COM4: in memory locations 40:0, 40:2, 40:4, and 40:6. This base address is the I/O address of the first 8250 register for that particular communication port. For more information on the 8250 hardware, check out

• “The 8250 Serial Communications Chip” on page 1223
• “The Data Register (Transmit/Receive Register)” on page 1224
• “The Interrupt Enable Register (IER)” on page 1224
• “The Baud Rate Divisor” on page 1225
• “The Interrupt Identification Register (IIR)” on page 1226
• “The Line Control Register” on page 1227
• “The Modem Control Register” on page 1228
• “The Line Status Register (LSR)” on page 1229
• “The Modem Status Register (MSR)” on page 1230
• “The Auxiliary Input Register” on page 1231
The UCR Standard Library provides a very reasonable set of routines you can use to control the serial port on the PC. Not only does this package provide a set of polling routines you can use much like the BIOS’ code, but it also provides an interrupt service routine to support interrupt driven I/O on the serial port. For more information on these routines, see

• “The UCR Standard Library Serial Communications Support Routines” on page 1231

The Standard Library serial I/O routines provide an excellent example of how to program the 8250 SCC. Therefore, this chapter concludes by presenting and explaining the Standard Library’s serial I/O routines. In particular, this code demonstrates some of the subtle problems with interrupt driven serial communication. For all the details, read

• “Programming the 8250 (Examples from the Standard Library)” on page 1233
The PC Video Display
Chapter 23
The PC’s video display is a very complex system. First, there is not a single common device as exists for the parallel and serial ports, or even a few devices (like the keyboard systems found on PCs). No, there are literally dozens of different display adapter cards available for the PC. Furthermore, each adapter typically supports several different display modes. Given the large number of display modes and uses for the display adapters, it would be very easy to write a book as large as this one on the PC’s display adapters alone[1]. However, this is not that text. This book would hardly be complete without at least mentioning the PC’s video display, but there are not enough pages remaining in this book to do justice to the subject. Therefore, this chapter will discuss the 80 x 25 text display mode that nearly all display adapters support.
23.1
Memory Mapped Video

Most peripheral devices on the PC use I/O mapped input/output. A program communicates with I/O mapped devices using the 80x86 in, out, ins, and outs instructions, accessing devices in the PC’s I/O address space. While the video controller chips that appear on PC video display adapters also map registers to the PC’s I/O space, these cards also employ a second form of I/O addressing: memory mapped input/output. In particular, the 80 x 25 text display is nothing more than a two dimensional array of words with each word in the array corresponding to a character on the screen. This array appears just above the 640K point in the PC’s memory address space. If you store data into this array using standard memory addressing instructions (e.g., mov), you will affect the characters appearing on the display.

There are actually two different arrays you need to worry about. Monochrome systems (remember those?) locate their text display starting at location B000:0000 in memory. Color systems locate their text displays at location B800:0000 in memory. These locations are the base addresses of a column major order array declared as follows:

Display: array [0..79, 0..24] of word;
If you prefer to work with row major ordered arrays, no problem, the video display is equal to the following array definition: Display: array [0..24, 0..79] of word;
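Either declaration puts the cell at column x, row y at byte offset (y*80 + x)*2 from the start of the display page. The following sketch applies that formula to store one cell on the color display. The routine name PutCharAt and its register conventions are assumptions made for illustration. It relies on the cell layout the text describes next: character code in the L.O. byte, attribute in the H.O. byte.

```asm
ScreenSeg       equ     0B800h          ;Color text display (B000h for mono).

; PutCharAt- Hypothetical helper that stores one character cell.
;            On entry: AL = character code, AH = attribute byte,
;                      DL = column (0..79), DH = row (0..24).
;            The routine does no range checking.

PutCharAt       proc
                push    es
                push    di
                push    ax              ;MUL uses AX, so save the cell value.
                mov     al, 80          ;80 cells per row.
                mul     dh              ;AX := 80 * row
                mov     di, ax
                mov     al, dl
                xor     ah, ah          ;AX := column
                add     di, ax          ;DI := row*80 + column
                shl     di, 1           ;Two bytes per cell.
                mov     ax, ScreenSeg
                mov     es, ax
                pop     ax              ;Get the character/attribute back.
                mov     es:[di], ax     ;L.O. byte = char, H.O. byte = attr.
                pop     di
                pop     es
                ret
PutCharAt       endp
```

For example, loading DL with 40, DH with 12, and AX with 0741h (the letter “A” with the normal attribute 07h) stores the cell at byte offset (12*80 + 40)*2 = 2000, near the middle of the screen.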
Note that location (0,0) is the upper left hand corner and location (79,24) is the lower right hand corner of the display (the values in parentheses are the x and y coordinates, with the x/horizontal coordinate appearing first). The L.O. byte of each word contains the PC/ASCII code for the character you want to display (see Appendix A for a listing of the PC/ASCII character set). The H.O. byte of each word is the attribute byte. We will return to the attribute byte in the next section.

The display page consumes slightly less than 4 Kilobytes in the memory map. The color display adapters actually provide 32K for text displays and let you select one of eight different displays. Each such display begins on a 4K boundary, at address B800:0, B800:1000, B800:2000, B800:3000, ..., B800:7000. Note that most modern color display adapters actually provide memory from address A000:0 through B000:FFFF (and more), but the text display only uses the 32K from B800:0..B800:7FFF. In this chapter, we will only concern ourselves with the first color display page at address B800:0. However, everything discussed in this chapter applies to the other display pages as well.

The monochrome adapter provides only a single display page. Indeed, the earliest monochrome display adapters included only 4K on-board memory (contrast this with modern high density color display adapters that have up to four megabytes of on-board memory!).

[1] In fact, several such books exist. See the bibliography.
You can address the memory on the video display like ordinary RAM. You could even store program variables, or even code, in this memory. However, it is never a good idea to do this. First of all, any time you write to the display screen, you will wipe out any variables stored in the active display memory. Even if you store such code or variables in an inactive display page (e.g., pages one through seven on a color display adapter), using this memory in this manner is not a good idea because access to the display adapter is very slow. Main memory runs two to ten times faster (depending on the machine).
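This section gives two base addresses for the text display, B000:0000 for monochrome systems and B800:0000 for color systems. A program that must run on either kind of system could ask the BIOS which one is active. The sketch below uses the BIOS int 10h function 0Fh, which returns the current video mode in AL; mode 7 is the monochrome text mode. The routine name GetScreenSeg is hypothetical, and the sketch assumes the display is in an 80 x 25 text mode.

```asm
; GetScreenSeg- Hypothetical helper. Returns the segment address of the
;               text display in AX: 0B000h if the BIOS reports video
;               mode 7 (monochrome text), 0B800h otherwise.

GetScreenSeg    proc
                push    bx              ;Function 0Fh returns the page in BH.
                mov     ah, 0fh         ;BIOS: get current video mode.
                int     10h             ;AL := mode, AH := columns.
                cmp     al, 7           ;Monochrome text mode?
                mov     ax, 0b000h      ;MOV does not change the flags.
                je      GotSeg
                mov     ax, 0b800h      ;Assume a color text mode.
GotSeg:         pop     bx
                ret
GetScreenSeg    endp
```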
23.2
The Video Attribute Byte

The video attribute associated with each character on the screen controls underlining, intensity, inverse video, and blinking video on monochrome adapters. It controls blinking and character foreground/background colors on color displays. The following bit layouts give the possible attribute values:

Monochrome Display Adapter Attribute Byte Format

    Bit 7               Blinking = 1, static = 0
    Bits 6..4, 2..0     Display mode (bits 6..4, then bits 2..0):
                            000 000 = Invisible
                            000 001 = Underlined
                            000 111 = Normal
                            111 000 = Inverse video
    Bit 3               Intensity: high = 1, low = 0

Color Display Adapter Attribute Byte Format

    Bit 7               Blinking = 1, static = 0
    Bits 6..4           Background color (see values 0000..0111 below)
    Bits 3..0           Foreground color:
                            0000 = Black          1000 = Dark Gray
                            0001 = Blue           1001 = Light Blue
                            0010 = Green          1010 = Light Green
                            0011 = Cyan           1011 = Light Cyan
                            0100 = Red            1100 = Light Red
                            0101 = Magenta        1101 = Light Magenta
                            0110 = Brown          1110 = Yellow
                            0111 = Light Gray     1111 = White
To get reverse video on the color display, simply swap the foreground and background colors. Note that a foreground color of zero with a background color of seven produces black characters on a white background, the standard reverse video colors and the same attribute values you’d use on the monochrome display adapter.

You need to be careful when choosing foreground and background colors for text on a color display adapter. Some combinations are impossible to read (e.g., white characters on a white background). Other colors go together so poorly the text will be extremely difficult to read, if not impossible (how about light green letters on a green background?). You must choose your colors with care!

Blinking characters are great for drawing attention to some important text on the screen (like a warning). However, it is easy to overdo blinking text on the screen. You should never have more than one word or phrase blinking on the screen at one time. Furthermore, you should never leave blinking characters on the screen for any length of time. After a few seconds, replace blinking characters with normal characters to avoid annoying the user of your software.

Keep in mind, you can easily change the attributes of various characters on the screen without affecting the actual text. Remember, the attribute bytes appear at odd addresses in the memory space for the video display. You can easily go in and change these bytes while leaving the character code data alone.
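Because the attribute bytes sit at the odd addresses, a loop that steps through display memory two bytes at a time can recolor the screen without touching the text. The following is a minimal sketch for the first color display page. The routine name SetScreenAttr is hypothetical.

```asm
ScreenSeg       equ     0B800h          ;First color text page.

; SetScreenAttr- Hypothetical helper. Stores the attribute in AL into
;                every character cell of the 80 x 25 display, leaving
;                the character codes unchanged.

SetScreenAttr   proc
                push    es
                push    di
                push    cx
                mov     di, ScreenSeg
                mov     es, di
                mov     di, 1           ;Attributes are at odd offsets.
                mov     cx, 80*25       ;One attribute per character cell.
AttrLoop:       mov     es:[di], al     ;Change only the attribute byte.
                add     di, 2           ;Skip over the character code.
                loop    AttrLoop
                pop     cx
                pop     di
                pop     es
                ret
SetScreenAttr   endp
```

With the color attribute format, loading AL with 1Eh (background 001 = blue, foreground 1110 = yellow) before the call gives yellow text on a blue background. Loading 70h gives the black-on-white reverse video combination described above.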
23.3
Programming the Text Display

You might ask why anyone would want to bother working directly with the memory mapped display on the PC. After all, DOS, BIOS, and the UCR Standard Library provide much more convenient ways to display text on the screen. Handling new lines (carriage return and line feed) at the end of each line or, worse yet, scrolling the screen when the display is full, is a lot of work. The aforementioned routines take care of this work for you automatically; you have to do it yourself if you access screen memory directly. So why bother?

There are two reasons: performance and flexibility. The BIOS video display routines[2] are dreadfully slow. You can easily get a 10 to 100 times performance boost by writing directly to screen memory. For a typical computer science class project, this may not be important, especially if you’re running on a fast machine like a 150 MHz Pentium. On the other hand, if you are developing a program that displays and removes several windows or pop-up menus on the screen, the BIOS routines won’t cut it.

Although the BIOS int 10h functions provide a large set of video I/O routines, there will be lots of functions you might want to have that the BIOS just doesn’t provide. In such cases, going directly to screen memory is one way to solve the problem.

Another difficulty with the BIOS routines is that they are not reentrant. You cannot call a BIOS display function from an interrupt service routine, nor can you freely call BIOS from concurrently executing processes. However, by writing your own video service routines, you can easily create a window for each concurrent thread your application is executing. Then each thread can call your routines to display its output independent of the other threads executing on the system.
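As a small illustration of going directly to screen memory, the following sketch clears the first color display page with a single string instruction. It is not a Standard Library routine. The name ClrScrn is hypothetical, and the fill value 0720h (a space with the normal attribute 07h) is one reasonable choice rather than the only one.

```asm
ScreenSeg       equ     0B800h          ;First color text page.

; ClrScrn- Hypothetical helper. Fills all 2,000 cells of the 80 x 25
;          display with spaces using the normal (07h) attribute.

ClrScrn         proc
                pushf                   ;Preserve the direction flag.
                push    es
                push    di
                push    cx
                push    ax
                mov     ax, ScreenSeg
                mov     es, ax
                xor     di, di          ;Start at cell (0,0).
                mov     cx, 80*25       ;Number of words to store.
                mov     ax, 0720h       ;H.O. = attribute, L.O. = space.
                cld                     ;Store toward higher addresses.
        rep     stosw
                pop     ax
                pop     cx
                pop     di
                pop     es
                popf
                ret
ClrScrn         endp
```

Because the routine touches no BIOS variables and keeps all its state in registers and on the stack, this style of routine avoids the reentrancy problem mentioned above.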
The AMAZE.ASM program (see “Processes, Coroutines, and Concurrency” on page 1065) is a good example of a program that accesses the text display by storing data directly into the video display’s memory mapped display array. This program accesses display memory directly because it is more convenient to do so (the screen’s display array maps quite nicely to the internal maze array). Simple video games like a space invaders game or a “remove the bricks” game also map nicely to a memory mapped video display.

The following program provides an excellent example of an application that needs to access video memory directly. This program is a screen capture TSR. When you press the left shift key and then the right shift key, this program copies the current screen contents to an internal buffer. When you press the right shift key followed by the left shift key, this program copies its internal buffer to the display screen. Originally, this program was written to capture CodeView screens for the lab manual accompanying this text. There are commercial screen capture programs (e.g., HiJak) that would normally do the job, but are incompatible with CodeView. This short TSR allows one to capture screens in CodeView, quit CodeView, put the CodeView screen back onto the display, and then use a program like HiJak to capture the output.

[2] The Standard Library calls DOS and DOS calls BIOS for all display I/O, hence they all become BIOS calls at one level or another.

; GRABSCRN.ASM
;
; A short TSR to capture the current display screen and display it later.
;
; Note that this code does not patch into int 2Fh (multiplex interrupt)
; nor can you remove this code from memory except by rebooting.
; If you want to be able to do these two things (as well as check for
; a previous installation), see the chapter on resident programs. Such
; code was omitted from this program because of length constraints.
;
;
; cseg and EndResident must occur before the standard library segments!

cseg            segment para public 'code'
OldInt9         dword   ?
ScreenSave      byte    4096 dup (?)
cseg            ends

; Marker segment, to find the end of the resident section.

EndResident     segment para public 'Resident'
EndResident     ends

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

RShiftScan      equ     36h
LShiftScan      equ     2ah

; Bits for the shift/modifier keys

RShfBit         equ     1
LShfBit         equ     2

KbdFlags        equ     <byte ptr ds:[17h]>

byp             equ     <byte ptr>

; Screen segment address. This value is for color displays only.
; Change to B000h if you want to use this program on a mono display.

ScreenSeg       equ     0B800h

cseg            segment para public 'code'
                assume  ds:nothing

; MyInt9-       INT 9 ISR. This routine reads the keyboard port to see
;               if a shift key scan code just came along. If the right
;               shift bit is set in KbdFlags and a left shift key scan
;               code comes along, we want to copy the data from our
;               internal buffer to the screen's memory. If the left shift
;               bit is set and a right shift key scan code comes along,
;               we want to copy the screen memory into our local array.
;               In any case (including none of the above), we always
;               transfer control to the original INT 9 handler.

MyInt9          proc    far
                push    ds
                push    ax

                mov     ax, 40h
                mov     ds, ax

                in      al, 60h         ;Read the keyboard port.
                cmp     al, RShiftScan  ;Right shift just go down?
                je      DoRight
                cmp     al, LShiftScan  ;How about the left shift?
                jne     QuitMyInt9

; If this is the left scan code, see if the right shift key is already
; down.

                test    KbdFlags, RShfBit
                je      QuitMyInt9      ;Branch if no

; Okay, right shift was down and we just saw left shift, copy our local
; data back to screen memory:

                pushf
                push    es
                push    cx
                push    di
                push    si
                mov     cx, 2048
                mov     si, cs
                mov     ds, si
                lea     si, ScreenSave
                mov     di, ScreenSeg
                mov     es, di
                xor     di, di
                jmp     DoMove

; Okay, we just saw the right shift key scan code, see if the left shift
; key is already down. If so, save the current screen data to our local
; array.

DoRight:        test    KbdFlags, LShfBit
                je      QuitMyInt9

                pushf
                push    es
                push    cx
                push    di
                push    si
                mov     cx, 2048
                mov     ax, cs
                mov     es, ax
                lea     di, ScreenSave
                mov     si, ScreenSeg
                mov     ds, si
                xor     si, si

DoMove:         cld
        rep     movsw
                pop     si
                pop     di
                pop     cx
                pop     es
                popf

QuitMyInt9:     pop     ax
                pop     ds
                jmp     OldInt9
MyInt9          endp

Main            proc
                assume  ds:cseg

                mov     ax, cseg
                mov     ds, ax

                print
                byte    "Screen capture TSR",cr,lf
                byte    "Pressing left shift, then right shift, captures "
                byte    "the current screen.",cr,lf
                byte    "Pressing right shift, then left shift, displays "
                byte    "the last captured screen.",cr,lf
                byte    0

; Patch into the INT 9 interrupt vector. Note that the statements above
; have made cseg the current data segment, so we can store the old INT 9
; value directly into the OldInt9 variable.

                cli                     ;Turn off interrupts!
                mov     ax, 0
                mov     es, ax
                mov     ax, es:[9*4]
                mov     word ptr OldInt9, ax
                mov     ax, es:[9*4 + 2]
                mov     word ptr OldInt9+2, ax
                mov     es:[9*4], offset MyInt9
                mov     es:[9*4+2], cs
                sti                     ;Okay, ints back on.

; We're hooked up, the only thing that remains is to terminate and
; stay resident.

                print
                byte    "Installed.",cr,lf,0

                mov     ah, 62h         ;Get this program's PSP
                int     21h             ; value.

                mov     dx, EndResident ;Compute size of program.
                sub     dx, bx
                mov     ax, 3100h       ;DOS TSR command.
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

23.4
Summary

The PC’s video system uses a memory mapped array for the screen data. This is an 80 x 25 column major organized array of words. Each word in the array corresponds to a single character on the screen. This array begins at location B000:0 for monochrome displays and B800:0 for color displays. For additional information, see:

• “Memory Mapped Video” on page 1247

The L.O. byte is the PC/ASCII character code for that particular screen position; the H.O. byte contains the attributes for that character. The attribute selects blinking, intensity, and background/foreground colors (on a color display). For more information on the attribute byte, see:

• “The Video Attribute Byte” on page 1248

Speed and flexibility are the two primary reasons people go directly to screen memory. You can create your own screen functions that the BIOS doesn’t support and do it one or two orders of magnitude faster than the BIOS by writing directly to screen memory. To find out about this, and to see a simple example, check out

• “Programming the Text Display” on page 1249
The PC Game Adapter
Chapter 24
One need look no farther than the internals of several popular games on the PC to discover that many programmers do not fully understand one of the least complex devices attached to the PC today – the analog game adapter. This device allows a user to connect up to four resistive potentiometers and four digital switch connections to the PC. The design of the PC’s game adapter was obviously influenced by the analog input capabilities of the Apple II computer[1], the most popular computer available at the time the PC was developed. Although IBM provided for twice the analog inputs of the Apple II, thinking that would give them an edge, their decision to support only four switches and four potentiometers (or “pots”) seems confining to game designers today – in much the same way that IBM’s decision to support 256K RAM seems so limiting today. Nevertheless, game designers have managed to create some really marvelous products, even living with the limitations of IBM’s 1981 design.

IBM’s analog input design, like Apple’s, was designed to be dirt cheap. Accuracy and performance were not a concern at all. In fact, you can purchase the electronic parts to build your own version of the game adapter, at retail, for under three dollars. Indeed, today you can purchase a game adapter card from various discount merchants for under eight dollars. Unfortunately, IBM’s low-cost design in 1981 produces some major performance problems for high-speed machines and high-performance game software in the 1990’s. However, there is no use crying over spilled milk – we’re stuck with the original game adapter design, so we need to make the most of it. The following sections will describe how to do exactly that.
24.1
Typical Game Devices The game adapter is nothing more than a computer interface to various game input devices. The game adapter card typically contains a DB15 connector into which you plug an external device. Typical devices you can obtain for the game adapter include paddles, joysticks, flight yokes, digital joysticks, rudder pedals, RC simulators, and steering wheels. Undoubtedly, this is but a short list of the types of devices you can connect to the game adapter. Most of these devices are far more expensive that the game adapter card itself. Indeed, certain high performance flight simulator consoles for the game adapter cost several hundred dollars. The digital joystick is probably the least complex device you can connect to the PC’s game port. This device consists of four switches and a stick. Pushing the stick forward, left, right, or pulling it backward closes one of the switches. The game adapter card provides four switch inputs, so you can sense which direction (including the rest position) the user is pressing the digital joystick. Most digital joysticks also allow you to sense the in-between positions by closing two contacts at once. for example, pushing the control stick at a 45 degree angle between forward and right closes both the forward and right switches. The application software can sense this and take appropriate action. The original allure of these devices is that they were very cheap to manufacture (these were the original joysticks found on most home game machines). However, as manufacturers increased production of analog joysticks, the price fell to the point that digital joysticks failed to offer a substantial price difference. So today, you will rarely encounter such devices in the hands of a typical user. The game paddle is another device whose use has declined over the years. A game paddle is a single pot in a case with a single knob (and, typically, a single push button). 
Apple used to ship a pair of game paddles with every Apple II they sold. As a result, games that used game paddles were still quite popular when IBM released the PC in 1981. Indeed, a couple of manufacturers produced game paddles for the PC when it was first introduced. However, once again the cost of manufacturing analog joysticks fell to the point that paddles couldn’t compete. Although paddles are the appropriate input device for many games, joysticks could do just about everything a game paddle could, and more. So the use of game paddles quickly died out. There is one thing you can do with game paddles that you cannot do with joysticks – you can place four of them on a system and produce a four player game. However, this (obviously) isn’t important to most game designers who generally design their games for only one player.

1. In fact, the PC’s game adapter design was obviously stolen directly from the Apple II.

Chapter 24

A game paddle or set of rudder pedals generally provides a single number in the range zero through some system dependent maximum value.

[Figure: Game Paddle or Rudder Pedal Game Input Device. The device returns a single reading between 0 and the maximum reading.]

Rudder pedals are really nothing more than a specially designed game paddle that you can activate with your feet. Many flight simulator games take advantage of this input device to provide a more realistic experience. Generally, you would use rudder pedals in addition to a joystick device.

A joystick contains two pots connected with a stick. Moving the joystick along the x-axis actuates one of the pots; moving the joystick along the y-axis actuates the other pot. By reading both pots, you can roughly determine the absolute position of the stick within its working range.
[Figure: Joystick Game Input Device, drawn on X and Y axes. A joystick uses two independent pots to provide an (X,Y) input value. Horizontal movements on the joystick affect the x-axis pot independently of the y-axis pot. Likewise, vertical movements affect the y-axis pot independently of the x-axis pot. By reading both pots you can determine the position of the joystick in the (X,Y) coordinate system.]

An RC simulator is really nothing more than a box containing two joysticks. The yoke and steering wheel devices are essentially the same device, sold specifically for flight simulators or automotive games[2]. The steering wheel is connected to a pot that corresponds to the x-axis on the joystick. Pulling back (or pushing forward) on the wheel activates a second pot that corresponds to the y-axis on the joystick.

Certain joystick devices, generically known as flight sticks, contain three pots. Two pots are connected in a standard joystick fashion; the third is connected to a knob which many games use for the throttle control. Other joysticks, like the Thrustmaster or CH Products’ FlightStick Pro, include extra switches, including a special “cooley switch” that provides additional inputs to the game. The cooley switch is, essentially, a digital pot mounted on the top of a joystick. Users can select one of four positions on the cooley switch using their thumb. Most flight simulator programs compatible with such devices use the cooley switch to select different views from the aircraft.
2. In fact, many such devices are switchable between the two.
The Game Adapter
[Figure: Cooley Switch (found on CH Products and Thrustmaster Joysticks). The cooley switch (shown here on a device layout similar to the CH Products' FlightStick Pro) is a thumb actuated digital joystick. You can move the switch up, down, left, or right, activating individual switches inside the game input device.]

24.2 The Game Adapter Hardware

The game adapter hardware is simplicity itself. There is a single input port and a single output port. The input port bit layout is

I/O Address 201h

Bit   Function
7     Switch #3 input
6     Switch #2 input
5     Switch #1 input
4     Switch #0 input
3     Pot #3 input
2     Pot #2 input
1     Pot #1 input
0     Pot #0 input
Game Adapter Input Port

The four switches come in on the H.O. four bits of I/O port 201h. If the user is currently pressing a button, the corresponding bit position will contain a zero. If the button is up, the corresponding bit will contain a one.

The pot inputs might seem strange at first glance. After all, how can we represent one of a large number of potential pot positions (say, at least 256) with a single bit? Obviously we can’t. However, the input bit on this port does not return any type of numeric value specifying the pot position. Instead, each of the four pot bits is connected to an input of a resistance-sensitive 558 quad timer chip. When you trigger the timer chip, it produces an output pulse whose duration is proportional to the resistive input to the timer. The output of this timer chip appears as the input bit for a given pot. The schematic for this circuit is

[Figure: Joystick Schematic. A write to I/O address 201h triggers the 558 timer; the external potentiometers feed the timer inputs, and the timer outputs D0..D3 appear as the L.O. four bits on input port 201h.]

Normally, the pot input bits contain zero. When you trigger the timer chip, the pot input lines go high for some period of time determined by the current resistance of the potentiometer. By measuring how long this bit stays set, you can get a rough estimate of the resistance. To trigger the pots, simply write any value to I/O port 201h. The actual value you write is unimportant. The following timing diagram shows how the signal varies on each pot’s input bit:

[Figure: Analog Input Timing Signal. When the trigger occurs, the input on D0..D3 goes high (1) for some period of time depending on the pot setting, then returns to 0.]
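Before turning to the pots, note that the switch half of the input port needs only a few instructions. The fragment below is a minimal sketch, not code from the original text: SwitchBits is a hypothetical byte variable and Sw0Up is an illustrative label. It inverts the active-low switch bits so that a one means "pressed" and moves them down into the L.O. four bits.

```asm
                mov     dx, 201h        ;Game adapter I/O port.
                in      al, dx          ;Bits 4..7 = switches 0..3 (0 = pressed).
                not     al              ;Invert so that 1 = pressed.
                shr     al, 4           ;Switches now in bits 0..3; the shift by
                                        ; an immediate needs an 80286 or later.
                mov     SwitchBits, al  ;SwitchBits: hypothetical byte variable.
                test    al, 1           ;Is switch #0 down?
                jz      Sw0Up           ;Skip the handler if it is up.
                ; ... react to switch #0 here ...
Sw0Up:
```

On an 8086 or 8088 you would load the shift count into cl instead of using the immediate form.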
The only remaining question is “how do we determine the length of the pulse?” The following short loop demonstrates one way to determine the width of this timing pulse:
                mov     cx, -1          ;We're going to count backwards
                mov     dx, 201h        ;Point at joystick port.
                out     dx, al          ;Trigger the timer chip.
CntLp:          in      al, dx          ;Read joystick port.
                test    al, 1           ;Check pot #0 input.
                loopne  CntLp           ;Repeat while high.
                neg     cx              ;Convert CX to a positive value.
When this loop finishes execution, the cx register will contain the number of passes made through this loop while the timer output signal was a logic one. The larger the value in cx, the longer the pulse and, therefore, the greater the resistance of pot #0.

There are several minor problems with this code. First of all, the code will obviously produce different results on different machines running at different clock rates. For example, a 150 MHz Pentium system will execute this code much faster than a 5 MHz 8088 system[3]. The second problem is that different joysticks and different game adapter cards produce radically different timing results. Even on the same system with the same adapter card and joystick, you may not always get consistent readings on different days. It turns out that the 558 is somewhat temperature sensitive and will produce slightly different readings as the temperature changes.

Unfortunately, there is no way to design a loop like the above so that it returns consistent readings across a wide variety of machines, potentiometers, and game adapter cards. Therefore, you have to write your application software so that it is insensitive to wide variances in the input values from the analog inputs. Fortunately, this is very easy to do, but more on that later.
24.3 Using BIOS’ Game I/O Functions

The BIOS provides two functions for reading game adapter inputs. Both are subfunctions of the int 15h handler. To read the switches, load ah with 84h and dx with zero, then execute an int 15h instruction. On return, al will contain the switch readings in the H.O. four bits (see the diagram in the previous section). This function is roughly equivalent to reading port 201h directly.

To read the analog inputs, load ah with 84h and dx with one, then execute an int 15h instruction. On return, ax, bx, cx, and dx will contain the values for pots zero, one, two, and three, respectively. In practice, this call should return values in the range 0-400h, though you cannot count on this for reasons described in the previous section.

Very few programs use the BIOS joystick support. It’s easier to read the switches directly, and reading the pots is not that much more work than calling the BIOS routine. The BIOS code is also very slow. Most BIOSes read the four pots sequentially, taking up to four times longer than a program that reads all four pots concurrently (see the next section). Because reading the pots can take several hundred microseconds up to several milliseconds, most programmers writing high performance games do not use the BIOS calls; they write their own high performance routines instead.

This is a real shame. By writing drivers specific to the PC’s original game adapter design, these developers force the user to purchase and use a standard game adapter card and game input device. Were the game to make the BIOS call, third party developers could create different and unique game controllers and then simply supply a driver that replaces the int 15h routine and provides the same programming interface. For example, Genovation made a device that lets you plug a joystick into the parallel port of a PC.
3. Actually, the speed difference is not as great as you would first think. Joystick adapter cards almost always interface to the computer system via the ISA bus. The ISA bus runs at only 8 MHz and requires four clock cycles per data transfer (i.e., 500 ns to read the joystick input port). This is equivalent to a small number of wait states on a slow machine and a gigantic number of wait states on a fast machine. Tests run on a 5 MHz 8088 system vs. a 50 MHz 486DX system produce only a 2:1 to 3:1 speed difference between the two machines, even though the 486 machine was over 50 times faster for most other computations.
Colorado Spectrum created a similar device that lets you plug a joystick into the serial port. Both devices would let you use a joystick on machines that do not (and, perhaps, cannot) have a game adapter installed. However, games that access the joystick hardware directly will not be compatible with such devices. Had the game designer made the int 15h call, their software would have been compatible, since both Colorado Spectrum and Genovation supply int 15h TSRs to reroute joystick calls to use their devices.

To help overcome game designers’ aversion to using the int 15h calls, this text will present a high performance version of the BIOS’ joystick code a little later in this chapter. Developers who adopt this Standard Game Device Interface will create software that will be compatible with any other device that supports the SGDI standard. For more details, see “The Standard Game Device Interface (SGDI)” in section 24.5.
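The two BIOS calls described in this section look like the following in practice. This is only a sketch: Switches (a byte), Pot0Raw through Pot3Raw (words), and the NoJoystick label are hypothetical names. Many BIOS references also document that the carry flag comes back set when the BIOS has no joystick support; that detail is not from this chapter, so treat the jc tests as an assumption.

```asm
; Read the four switches (int 15h, ah=84h, dx=0).

                mov     ah, 84h
                mov     dx, 0
                int     15h
                jc      NoJoystick      ;Assumed: carry set if unsupported.
                mov     Switches, al    ;H.O. four bits hold the switch states.

; Read the four pots (int 15h, ah=84h, dx=1).

                mov     ah, 84h
                mov     dx, 1
                int     15h
                jc      NoJoystick      ;Assumed: carry set if unsupported.
                mov     Pot0Raw, ax     ;Raw, uncalibrated readings.
                mov     Pot1Raw, bx
                mov     Pot2Raw, cx
                mov     Pot3Raw, dx
```

Note that ah must be reloaded before the second call because the first call returns data in ax.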
24.4 Writing Your Own Game I/O Routines

Consider again the code that returns some value for a given pot setting:

                mov     cx, -1          ;We're going to count backwards
                mov     dx, 201h        ;Point at joystick port.
                out     dx, al          ;Trigger the timer chip.
CntLp:          in      al, dx          ;Read joystick port.
                test    al, 1           ;Check pot #0 input.
                loopne  CntLp           ;Repeat while high.
                neg     cx              ;Convert CX to a positive value.
As mentioned earlier, the big problem with this code is that you are going to get wildly different ranges of values from different game adapter cards, input devices, and computer systems. Clearly you cannot count on the code above always producing a value in the range 0..180h under these conditions. Your software will need to dynamically adjust the values it uses depending on the system parameters.

You’ve probably played a game on the PC where the software asks you to calibrate the joystick before use. Calibration generally consists of moving the joystick handle to one corner (e.g., the upper-left corner), pressing a button or key, and then moving the handle to the opposite corner (e.g., lower-right) and pressing a button again. Some systems even want you to move the joystick to the center position and press a button as well.

Software that does this is reading the minimum, maximum, and centered values from the joystick. Given at least the minimum and maximum values, you can easily scale any reading to any range you want. By reading the centered value as well, you can get slightly better results, especially on really inexpensive (cheap) joysticks. This process of scaling a reading to a certain range is known as normalization. By reading the minimum and maximum values from the user and normalizing every reading thereafter, you can write your programs assuming that the values always fall within a certain range, for example, 0..255. Normalizing a reading is very easy; you simply use the following formula:

\[ \frac{\mathit{CurrentReading} - \mathit{MinimumReading}}{\mathit{MaximumReading} - \mathit{MinimumReading}} \times \mathit{NormalValue} \]
The MaximumReading and MinimumReading values are the maximum and minimum values read from the user at the beginning of your application. CurrentReading is the value just read from the game adapter. NormalValue is the upper bound on the range to which you want to normalize the reading (e.g., 255); the lower bound is always zero[4].
4. If you want a different lower bound, just add whatever value you want for the lowest value to the result. You will also need to subtract this lower bound from the NormalValue variable in the above equation.
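The two-point normalization is short enough to sketch in a handful of instructions. The routine below is an illustration under stated assumptions rather than code from the original text: Normalize, MinReading, and MaxReading are hypothetical names, and the two variables are words holding the raw calibration readings. It multiplies before dividing, since dividing first would truncate the fraction to zero in integer arithmetic.

```asm
; Normalize- hypothetical helper (names are illustrative only).
;
; On entry: AX = current raw reading.
;           MinReading, MaxReading = word variables from calibration.
; On exit:  AX = reading scaled to 0..NormalValue.
;           BX, CX, and DX are destroyed.

NormalValue     equ     255

Normalize       proc    near
                mov     bx, MaxReading
                sub     bx, MinReading  ;BX = Max - Min.
                jbe     NormZero        ;Max <= Min: not calibrated.
                cmp     ax, MinReading
                jbe     NormZero        ;Clamp low readings to zero.
                cmp     ax, MaxReading
                jb      InRange
                mov     ax, MaxReading  ;Clamp high readings to the max.
InRange:        sub     ax, MinReading  ;AX = Current - Min.
                mov     cx, NormalValue
                mul     cx              ;DX:AX = (Current - Min) * NormalValue.
                div     bx              ;AX = DX:AX / (Max - Min).
                ret

NormZero:       xor     ax, ax
                ret
Normalize       endp
```

Because the clamped numerator never exceeds Max - Min, the quotient never exceeds NormalValue, so the div instruction cannot overflow here.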
To get better results, especially when using a joystick, you should obtain three readings during the calibration phase for each pot – a minimum value, a maximum value, and a centered value. To normalize a reading when you’ve got these three values, you would use one of the following formulae:

If the current reading is in the range minimum..center, use this formula:

\[ \frac{\mathit{Current} - \mathit{Minimum}}{(\mathit{Center} - \mathit{Minimum}) \times 2} \times \mathit{NormalValue} \]

If the current reading is in the range center..maximum, use this formula:

\[ \frac{\mathit{Current} - \mathit{Center}}{(\mathit{Maximum} - \mathit{Center}) \times 2} \times \mathit{NormalValue} + \frac{\mathit{NormalValue}}{2} \]

Both formulae yield NormalValue/2 at the center position, so the two halves of the range meet without a jump.
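A three-point version can be sketched along the same lines. The routine below maps the minimum reading to zero, the centered reading to NormalValue/2, and the maximum reading to (nearly) NormalValue. Normalize3, MinReading, CtrReading, and MaxReading are hypothetical names, and the sketch assumes Min < Center < Max with each half of the range spanning fewer than 8000h counts.

```asm
; Normalize3- hypothetical helper (names are illustrative only).
;
; On entry: AX = current raw reading.
;           MinReading, CtrReading, MaxReading = calibration words,
;           assumed to satisfy Min < Center < Max.
; On exit:  AX = normalized reading. BX, CX, DX, and SI are destroyed.

NormalValue     equ     255

Normalize3      proc    near
                cmp     ax, CtrReading
                jae     UpperHalf

; Lower half: (Current - Min) * NormalValue / ((Center - Min) * 2)

                mov     bx, CtrReading
                sub     bx, MinReading  ;BX = Center - Min.
                sub     ax, MinReading  ;AX = Current - Min.
                jnc     LowOK
                xor     ax, ax          ;Below the minimum: clamp to zero.
LowOK:          xor     si, si          ;No offset for the lower half.
                jmp     short Scale

; Upper half: (Current - Center) * NormalValue / ((Max - Center) * 2)
;             + NormalValue/2

UpperHalf:      cmp     ax, MaxReading
                jbe     HighOK
                mov     ax, MaxReading  ;Above the maximum: clamp.
HighOK:         mov     bx, MaxReading
                sub     bx, CtrReading  ;BX = Max - Center.
                sub     ax, CtrReading  ;AX = Current - Center.
                mov     si, NormalValue/2

Scale:          shl     bx, 1           ;BX = span * 2.
                jz      Norm3Zero       ;Degenerate calibration data.
                mov     cx, NormalValue
                mul     cx              ;DX:AX = numerator * NormalValue.
                div     bx
                add     ax, si          ;Add the half-range offset.
                ret

Norm3Zero:      xor     ax, ax
                ret
Normalize3      endp
```

With NormalValue equal to 255, integer truncation makes the maximum reading come out as 254 rather than 255; that is well within the accuracy of the hardware.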
A large number of games on the market today jump through all kinds of hoops trying to coerce joystick readings into a reasonable range. It is surprising how few of them use that simple formula above. Some game designers might argue that the formulae above are overly complex and they are writing high performance games. This is nonsense. It takes two orders of magnitude more time to wait for the joystick to time out than it does to compute the above equations. So use them and make your programs easier to write. Although normalizing your pot readings takes so little time it is always worthwhile, reading the analog inputs is a very expensive operation in terms of CPU cycles. Since the timer circuit produces relatively fixed time delays for a given resistance, you will waste even more CPU cycles on a fast machine than you do on a slow machine (although reading the pot takes about the same amount of real time on any machine). One sure fire way to waste a lot of time is to read several pots one at a time; for example, when reading pots zero and one to get a joystick reading, read pot zero first and then read pot one afterwards. It turns out that you can easily read both pots in parallel. By doing so, you can speed up reading the joystick by a factor of two. Consider the following code:
                mov     cx, 1000h       ;Max times through loop
                mov     si, 0           ;We'll put readings in SI and
                mov     di, si          ; di.
                mov     ax, si          ;Set AH to zero.
                mov     dx, 201h        ;Point at joystick port.
                out     dx, al          ;Trigger the timer chip.
CntLp:          in      al, dx          ;Read joystick port.
                and     al, 11b         ;Strip unwanted bits.
                jz      Done
                shr     ax, 1           ;Put pot 0 value into carry.
                adc     si, 0           ;Bump pot 0 value if still active.
                add     di, ax          ;Bump pot 1 value if pot 1 active.
                loop    CntLp           ;Repeat while high.
                and     si, 0FFFh       ;If time-out, force the register(s)
                and     di, 0FFFh       ; containing 1000h to zero.
Done:
This code reads both pot zero and pot one at the same time. It works by looping while either pot is active[5]. Each time through the loop, this code adds the pots’ bit values to separate registers that accumulate the results. When this loop terminates, si and di contain the readings for pots zero and one.

Although this particular loop contains more instructions than the previous loop, it still takes the same amount of time to execute. Remember, the output pulses on the 558 timer determine how long this code takes to execute; the number of instructions in the loop contributes very little to the execution time. However, the time this loop takes to execute one iteration does affect the resolution of this joystick read routine. The faster the loop executes, the more iterations the loop will run during the same timing period and the finer the measurement will be. Generally, though, the resolution of the above code is much greater than the accuracy of the electronics and game input device, so this isn’t much of a concern.
5. This code provides a time-out feature in the event there is no game adapter installed. In such an event this code forces the readings to zero.
The code above demonstrates how to read two pots. It is very easy to extend this code to read three or four pots. An example of such a routine appears in the section on the SGDI device driver for the standard game adapter card. The other game device input, the switches, would seem to be simple in comparison to the potentiometer inputs. As usual, things are not as easy as they would seem at first glance. The switch inputs have some problems of their own. The first issue is keybounce. The switches on a typical joystick are probably an order of magnitude worse than the keys on the cheapest keyboard. Keybounce, and lots of it, is a fact you’re going to have to deal with when reading joystick switches. In general, you shouldn’t read the joystick switches more often than once every 10 msec. Many games read the switches on the 55 msec timer interrupt. For example, suppose your timer interrupt reads the switches and stores the result in a memory variable. The main application, when wanting to fire a weapon, checks the variable. If it’s set, the main program clears the variable and fires the weapon. Fifty-five milliseconds later, the timer sets the button variable again and the main program will fire again the next time it checks the variable. Such a scheme will totally eliminate the problems with keybounce. The technique above solves another problem with the switches: keeping track of when the button first goes down. Remember, when you read the switches, the bits that come back tell you that the switch is currently down. It does not tell you that the button was just pressed. You have to keep track of this yourself. One easy way to detect when a user first presses a button is to save the previous switch reading and compare it against the current reading. If they are different and the current reading indicates a switch depression, then this is a new switch down.
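The "compare against the previous reading" idea can be sketched as follows. This is an illustration, not code from the original text: NewPresses and PrevSwitches are hypothetical names, PrevSwitches is a byte variable that starts out as zero, and the routine assumes ds addresses that variable. Call it at a fixed rate no faster than once every 10 msec (for example, from the 55 msec timer interrupt) so that keybounce has settled between samples.

```asm
; NewPresses- hypothetical helper (names are illustrative only).
;
; On exit: AL bits 0..3 contain a one for each switch that is down now
;          but was up on the previous call. AH is destroyed.

NewPresses      proc    near
                push    dx
                mov     dx, 201h
                in      al, dx          ;Bits 4..7: 0 = pressed.
                not     al              ;Now 1 = pressed.
                shr     al, 4           ;Switches in bits 0..3 (80286 or later).
                mov     ah, PrevSwitches
                mov     PrevSwitches, al ;Remember the current state.
                not     ah              ;AH = switches that were up last time.
                and     al, ah          ;Down now and up before = new press.
                pop     dx
                ret
NewPresses      endp
```

A switch that is held down reports a one only on the first call after it closes, which is exactly the "button just went down" event most games want.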
24.5 The Standard Game Device Interface (SGDI)

The Standard Game Device Interface (SGDI) is a specification for an int 15h service that lets you read an arbitrary number of pots and joysticks. Writing SGDI compliant applications is easy and helps make your software compatible with any game device which provides SGDI compliance. By writing your applications to use the SGDI API you can ensure that your applications will work with future devices that provide extended SGDI capability. To understand the power and extensibility of the SGDI, you need to take a look at the application programmer’s interface (API) for the SGDI.
24.5.1 Application Programmer’s Interface (API)

The SGDI interface extends the PC’s joystick BIOS int 15h API. You make SGDI calls by loading the 80x86 ah register with 84h and dx with an appropriate SGDI function code and then executing an int 15h instruction. The SGDI interface simply extends the functionality of the built-in BIOS routines. Note that any program that calls the standard BIOS joystick routines will work with an SGDI driver. The following table lists each of the SGDI functions:
Table 87: SGDI Functions and API (int 15h, ah=84h)

DH | Inputs | Outputs | Description
00 | dl = 0 | al = switch readings | Read4Sw. This is the standard BIOS subfunction zero call. This reads the status of the first four switches and returns their values in the upper four bits of the al register.
00 | dl = 1 | ax = pot 0, bx = pot 1, cx = pot 2, dx = pot 3 | Read4Pots. Standard BIOS subfunction one call. Reads all four pots (concurrently) and returns their raw values in ax, bx, cx, and dx as per BIOS specifications.
01 | dl = pot # | al = pot reading | ReadPot. This function reads a pot and returns a normalized reading in the range 0..255.
02 | dl = 0, al = pot mask | al = pot 0, ah = pot 1, dl = pot 2, dh = pot 3 | Read4. This routine reads the four pots on the standard game adapter card just like the Read4Pots function above. However, this routine normalizes the four values to the range 0..255 and returns those values in al, ah, dl, and dh. On entry, the al register contains a “pot mask” that you can use to select which of the four pots this routine actually reads.
03 | dl = pot #, al = minimum, bx = maximum, cx = centered | - | Calibrate. This function calibrates the pots for those calls that return normalized values. You must calibrate the pots before calling any such pot functions (ReadPot and Read4 above). The input values must be raw pot readings obtained by Read4Pots or another function that returns raw values.
04 | dl = pot # | al = 0 if not calibrated, 1 if calibrated | TestPotCalibrate. Checks to see if the specified pot has already been calibrated. Returns an appropriate value in al denoting the calibration status for the specified pot. See the note above about the need for calibration.
05 | dl = pot # | ax = raw value | ReadRaw. Reads a raw value from the specified pot. You can use this call to get the raw values required by the calibrate routine, above.
08 | dl = switch # | ax = switch value | ReadSw. Reads the specified switch and returns zero (switch up) or one (switch down) in the ax register.
09 | - | ax = switch values | Read16Sw. This call lets an application read up to 16 switches on a game device at a time. Bit zero of ax corresponds to switch zero, bit 15 of ax corresponds to switch fifteen.
80h | - | - | Remove. This function removes the driver from memory. Application programs generally won’t make this call.
81h | - | - | TestPresence. This routine returns zero in the ax register if an SGDI driver is present in memory. It returns ax’s value unchanged otherwise (in particular, ah will still contain 84h).
24.5.2 Read4Sw

Inputs: ah = 84h, dx = 0

This is the standard BIOS read switches call. It returns the status of switches zero through three on the joystick in the upper four bits of the al register. Bit four corresponds to switch zero, bit five to switch one, bit six to switch two, and bit seven to switch three. A zero in a bit position denotes a depressed switch; a one bit corresponds to a switch in the up position. This call is provided for compatibility with the existing BIOS joystick routines. To read the joystick switches you should use the Read16Sw call described later in this document.
24.5.3 Read4Pots

Inputs: ah = 84h, dx = 1

This is the standard BIOS read pots call. It reads the four pots on the standard game adapter card and returns their readings in the ax (x axis/pot 0), bx (y axis/pot 1), cx (pot 2), and dx (pot 3) registers. These are raw, uncalibrated, pot readings whose values will differ from machine to machine and vary depending upon the game I/O card in use. This call is provided for compatibility with the existing BIOS joystick routines. To read the pots you should use the ReadPot, Read4, or ReadRaw routines described in the next several sections.
24.5.4 ReadPot

Inputs: ah = 84h, dh = 1, dl = pot number.
This reads the specified pot and returns a normalized pot value in the range 0..255 in the al register. This routine also sets ah to zero. Although the SGDI standard provides for up to 255 different pots, most adapters only support pots zero, one, two, and three. If you attempt to read any nonsupported pot this function returns zero in ax. Since the values are normalized, this call returns comparable values for a given game control setting regardless of machine, clock frequency, or game I/O card in use. For example, a reading of 128 corresponds (roughly) to the center setting on almost any machine. To properly produce normalized results, you must calibrate a given pot before making this call. See the CalibratePot routine for more details.
24.5.5 Read4

Inputs: ah = 84h, al = pot mask, dx = 0200h
This routine reads the four pots on the game adapter card, just like the BIOS call (Read4Pots). However, it returns normalized values in al (x axis/pot 0), ah (y axis/pot 1), dl (pot 2), and dh (pot 3). Since this routine returns normalized values between zero and 255, you must calibrate the pots before calling this code. The al register contains a “pot mask” value. The L.O. four bits of al determine if this routine will actually read each pot. If bit zero, one, two, or three is one, then this function will read the corresponding pot; if the bits are zero, this routine will not read the corresponding pot and will return zero in the corresponding register.
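As a short illustration of the pot mask, the following sketch reads only pots zero and one (a standard joystick) with a single Read4 call. JoyX and JoyY are hypothetical byte variables, and the fragment assumes both pots have already been calibrated.

```asm
                mov     ax, 8403h       ;AH = 84h, AL = pot mask 0011b
                                        ; (read pots 0 and 1 only).
                mov     dx, 0200h       ;DH = 2 (Read4), DL = 0.
                int     15h
                mov     JoyX, al        ;Normalized x-axis, 0..255.
                mov     JoyY, ah        ;Normalized y-axis, 0..255.
                                        ;DL and DH return zero here because
                                        ; mask bits 2 and 3 are clear.
```

Masking out the pots you do not need matters because an unconnected pot never times out quickly, and the driver would otherwise wait on it.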
24.5.6 CalibratePot

Inputs: ah = 84h, dh = 3, dl = pot #, al = minimum value, bx = maximum value, cx = centered value.

Before you attempt to read a pot with the ReadPot or Read4 routines, you need to calibrate that pot. If you read a pot without first calibrating it, the SGDI driver will return only zero for that pot reading. To calibrate a pot you will need to read raw values for the pot in a minimum position, maximum position, and a centered position[6]. These must be raw pot readings. Use readings obtained by the Read4Pots routine. In theory, you need only calibrate a pot once after loading the SGDI driver. However, temperature fluctuations and analog circuitry drift may decalibrate a pot after considerable use. Therefore, you should recalibrate the pots you intend to read each time the user runs your application. Furthermore, you should give the user the option of recalibrating the pots at any time within your program.
24.5.7 TestPotCalibration

Inputs: ah = 84h, dh = 4, dl = pot #.
This routine returns zero or one in ax denoting not calibrated or calibrated, respectively. You can use the call to see if the pots you intend to use have already been calibrated and you can skip the calibration phase. Please, however, note the comments about drift in the previous paragraph.
6. Many programmers compute the centered value as the arithmetic mean of the minimum and maximum values.
24.5.8 ReadRaw

Inputs: ah = 84h, dh = 5, dl = pot #
Reads the specified pot and returns a raw (not calibrated) value in ax. You can use this routine to obtain minimum, centered, and maximum values for use when calling the calibrate routine.
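Putting ReadRaw together with the CalibratePot call described earlier, a calibration pass for pot zero might look like the sketch below. Everything named here (CalStep, CalPot0, MinRaw, MaxRaw, CtrRaw) is hypothetical, the user prompts are omitted, and switch zero is read without any debouncing. Since the interface as listed passes the minimum in al, this sketch passes only the L.O. byte of the minimum reading; treat that as an assumption about how a driver interprets it.

```asm
; CalStep- waits for switch 0 to go down, reads raw pot 0, then waits
;          for the switch to come back up. Returns the raw value in AX.

CalStep         proc    near
WaitDown:       mov     ah, 84h
                mov     dx, 0800h       ;DH = 8 (ReadSw), DL = switch 0.
                int     15h
                or      ax, ax
                jz      WaitDown        ;Loop until the switch is down.
                mov     ah, 84h
                mov     dx, 0500h       ;DH = 5 (ReadRaw), DL = pot 0.
                int     15h
                push    ax              ;Save the raw reading.
WaitUp:         mov     ah, 84h
                mov     dx, 0800h
                int     15h
                or      ax, ax
                jnz     WaitUp          ;Loop until the switch is released.
                pop     ax
                ret
CalStep         endp

; CalPot0- calibrates pot 0 from three user-positioned readings.

CalPot0         proc    near
                call    CalStep         ;Stick held at the upper-left.
                mov     MinRaw, ax
                call    CalStep         ;Stick held at the lower-right.
                mov     MaxRaw, ax
                call    CalStep         ;Stick centered.
                mov     CtrRaw, ax

                mov     ax, MinRaw      ;AL = L.O. byte of the minimum.
                mov     ah, 84h
                mov     bx, MaxRaw
                mov     cx, CtrRaw
                mov     dx, 0300h       ;DH = 3 (CalibratePot), DL = pot 0.
                int     15h
                ret
CalPot0         endp
```

A real program would prompt the user before each step and repeat the same sequence for every pot it intends to read.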
24.5.9 ReadSwitch

Inputs: ah = 84h, dh = 8, dl = switch #
This routine reads the specified switch and returns zero in ax if the switch is not depressed. It returns one if the switch is depressed. Note that this value is opposite the bit settings the Read4Sw function returns. If you attempt to read a switch number for an input that is not available on the current device, the SGDI driver will return zero (switch up). Standard game devices only support switches zero through three and most joysticks only provide two switches. Therefore, unless you are willing to tie your application to a specific device, you shouldn’t use any switches other than zero or one.
24.5.10 Read16Sw

Inputs: ah = 84h, dh = 9

This SGDI routine reads up to sixteen switches with a single call. It returns a bit vector in the ax register with bit zero corresponding to switch zero, bit one corresponding to switch one, etc. Ones denote switches that are depressed and zeros denote switches that are not depressed. Since the standard game adapter only supports four switches, only bits zero through three of al contain meaningful data (for those devices). All other bits will always contain zero. SGDI drivers for the CH Products’ Flightstick Pro and Thrustmaster joysticks will return bits for the entire set of switches available on those devices.
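A typical use of Read16Sw is to grab the whole switch vector once per frame and test individual bits. In this sketch the labels NotFiring and NoSecond are hypothetical, and the action taken for each switch is left to the application.

```asm
                mov     ah, 84h
                mov     dh, 9           ;DH = 9 (Read16Sw).
                int     15h             ;AX = switch bit vector, 1 = down.

                test    ax, 0001h       ;Switch 0 (usually the fire button).
                jz      NotFiring
                ; ... handle switch 0 here ...
NotFiring:
                test    ax, 0002h       ;Switch 1.
                jz      NoSecond
                ; ... handle switch 1 here ...
NoSecond:
```

Unlike the raw port and Read4Sw, the bits here are active high, so no inversion is needed before testing them.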
24.5.11 Remove

Inputs: ah = 84h, dh = 80h

This call will attempt to remove the SGDI driver from memory. Generally, only the SGDI.EXE code itself would invoke this routine. You should use the TestPresence routine (described next) to see if the driver was actually removed from memory by this call.
24.5.12 TestPresence

Inputs: ah = 84h, dh = 81h

If an SGDI driver is present in memory, this routine returns ax = 0 and a pointer to an identification string in es:bx. If an SGDI driver is not present, this call will return ax unchanged.
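An application would normally make this check once at startup. In the sketch below, IDStrPtr is a hypothetical dword variable and NoSGDI is an illustrative label; dl is set to zero only because the call does not specify a value for it.

```asm
                mov     ax, 8400h       ;AH = 84h; AL is not used by this call.
                mov     dx, 8100h       ;DH = 81h (TestPresence), DL = 0.
                int     15h
                or      ax, ax          ;AX = 0 means an SGDI driver answered.
                jnz     NoSGDI          ;AX unchanged: no driver is present.
                mov     word ptr IDStrPtr, bx   ;Save the ES:BX pointer to
                mov     word ptr IDStrPtr+2, es ; the identification string.
```

If control reaches NoSGDI, the program can fall back to its own copy of the driver code or tell the user to load one.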
24.5.13 An SGDI Driver for the Standard Game Adapter Card

If you write your program to make SGDI calls, you will discover that the TestPresence call will probably return “not present” when your program searches for a resident SGDI driver in memory. This is because few manufacturers provide SGDI drivers at this point and even fewer standard game adapter companies ship any software at all with their products, much less an SGDI driver. Gee, what kind of standard is this if no one uses it? Well, the purpose of this section is to rectify that problem.

The assembly code that appears at the end of this section provides a fully functional, public domain, SGDI driver for the standard game adapter card (the next section presents an SGDI driver for the CH Products’ Flightstick Pro). This allows you to write your application making only SGDI calls. By supplying the SGDI TSR with your product, your customers can use your software with all standard joysticks. Later, if they purchase a specialized device with its own SGDI driver, your software will automatically work with that driver with no changes to your software[7].

If you do not like the idea of having a user run a TSR before your application, you can always include the following code within your program’s code space and activate it if the SGDI TestPresence call determines that no other SGDI driver is present in memory when you start your program.

7. Of course, your software may not take advantage of extra features, like additional switches and pots, but at least your software will support the standard set of features on that device.

Here’s the complete code for the standard game adapter SGDI driver:

                .286
                page    58, 132
                name    SGDI
                title   SGDI Driver for Standard Game Adapter Card
                subttl  This Program is Public Domain Material.

; SGDI.EXE
;
;       Usage:
;               SGDI
;
; This program loads a TSR which patches INT 15 so arbitrary game programs
; can read the joystick in a portable fashion.
;
;
; We need to load cseg in memory before any other segments!

cseg            segment para public 'code'
cseg            ends

; Initialization code, which we do not need except upon initial load,
; goes in the following segment:

Initialize      segment para public 'INIT'
Initialize      ends

; UCR Standard Library routines which get dumped later on.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

sseg            segment para stack 'stack'
sseg            ends

zzzzzzseg       segment para public 'zzzzzzseg'
zzzzzzseg       ends

CSEG            segment para public 'CODE'
                assume  cs:cseg, ds:nothing

wp              equ     <word ptr>
byp             equ     <byte ptr>

Int15Vect       dword   0

PSP             word    ?
The Game Adapter

; Port addresses for a typical joystick card:

JoyPort         equ     201h
JoyTrigger      equ     201h

; Data structure to hold information about each pot.
; (mainly for calibration and normalization purposes).

Pot             struc
PotMask         byte    0               ;Pot mask for hardware.
DidCal          byte    0               ;Is this pot calibrated?
min             word    5000            ;Minimum pot value
max             word    0               ;Max pot value
center          word    0               ;Pot value in the middle
Pot             ends

; Variables for each of the pots. Must initialize the masks so they
; mask out all the bits except the incoming bit for each pot.

Pot0            Pot     <1>
Pot1            Pot     <2>
Pot2            Pot     <4>
Pot3            Pot     <8>

; The IDstring address gets passed back to the caller on a testpresence
; call. The four bytes before the IDstring must contain the serial number
; and current driver number.

SerialNumber    byte    0,0,0
IDNumber        byte    0
IDString        byte    "Standard SGDI Driver",0
                byte    "Public Domain Driver Written by Randall L. Hyde",0
;============================================================================
;
; ReadPots- AH contains a bit mask to determine which pots we should read.
;           Bit 0 is one if we should read pot 0, bit 1 is one if we should
;           read pot 1, bit 2 is one if we should read pot 2, bit 3 is one
;           if we should read pot 3. All other bits will be zero.
;
;           This code returns the pot values in SI, BX, BP, and DI for
;           Pot 0, 1, 2, & 3.
;

ReadPots        proc    near
                sub     bp, bp
                mov     si, bp
                mov     di, bp
                mov     bx, bp

; Wait for any previous signals to finish up before trying to read this
; guy. It is possible that the last pot we read was very short. However,
; the trigger signal starts timers running for all four pots. This code
; terminates as soon as the current pot times out. If the user immediately
; reads another pot, it is quite possible that the new pot's timer has
; not yet expired from the previous read. The following loop makes sure we
; aren't measuring the time from the previous read.

                mov     dx, JoyPort
                mov     cx, 400h
Wait4Clean:     in      al, dx
                and     al, 0Fh
                loopnz  Wait4Clean

; Okay, read the pots. The following code triggers the 558 timer chip and
; then sits in a loop until all four pot bits (masked with the pot mask in
; AH) become zero. Each time through this loop that one or more of these
; bits still contain one, this loop increments the corresponding
; register(s).

                mov     dx, JoyTrigger
Chapter 24
                out     dx, al          ;Trigger pots
                mov     dx, JoyPort
                mov     cx, 1000h       ;Don't let this go on forever.
PotReadLoop:    in      al, dx
                and     al, ah
                jz      PotReadDone
                shr     al, 1
                adc     si, 0           ;Increment SI if pot 0 still active.
                shr     al, 1
                adc     bx, 0           ;Increment BX if pot 1 still active.
                shr     al, 1
                adc     bp, 0           ;Increment BP if pot 2 still active.
                shr     al, 1
                adc     di, 0           ;Increment DI if pot 3 still active.
                loop    PotReadLoop     ;Stop, eventually, if funny hardware.

                and     si, 0FFFh       ;If we drop through to this point,
                and     bx, 0FFFh       ; one or more pots timed out (usually
                and     bp, 0FFFh       ; because they are not connected).
                and     di, 0FFFh       ; The reg contains 1000h, set it to 0.
PotReadDone:
                ret
ReadPots        endp
;----------------------------------------------------------------------------
;
; Normalize- BX contains a pointer to a pot structure, AX contains
;            a pot value. Normalize that value according to the
;            calibrated pot.
;
; Note: DS must point at cseg before calling this routine.

                assume  ds:cseg
Normalize       proc    near
                push    cx

; Sanity check to make sure the calibration process went okay.

                cmp     [bx].Pot.DidCal, 0      ;Is this pot calibrated?
                je      BadNorm                 ;If not, quit.

                mov     dx, [bx].Pot.Center     ;Do a sanity check on the
                cmp     dx, [bx].Pot.Min        ; min, center, and max
                jbe     BadNorm                 ; values to make sure
                cmp     dx, [bx].Pot.Max        ; min < center < max.
                jae     BadNorm

; Clip the value if it is out of range.

                cmp     ax, [bx].Pot.Min        ;If the value is less than
                ja      MinOkay                 ; the minimum value, set it
                mov     ax, [bx].Pot.Min        ; to the minimum value.
MinOkay:
                cmp     ax, [bx].Pot.Max        ;If the value is greater than
                jb      MaxOkay                 ; the maximum value, set it
                mov     ax, [bx].Pot.Max        ; to the maximum value.
MaxOkay:

; Scale this guy around the center:

                cmp     ax, [bx].Pot.Center     ;See if less than or greater
                jb      Lower128                ; than centered value.

; Okay, current reading is greater than the centered value, scale the reading
; into the range 128..255 here:

                sub     ax, [bx].Pot.Center
                mov     dl, ah                  ;Multiply by 128
                mov     ah, al
                mov     dh, 0
                mov     al, dh
                shr     dl, 1
                rcr     ax, 1
                mov     cx, [bx].Pot.Max
                sub     cx, [bx].Pot.Center
                jz      BadNorm                 ;Prevent division by zero.
                div     cx                      ;Compute normalized value.
                add     ax, 128                 ;Scale to range 128..255.
                cmp     ah, 0
                je      NormDone
                mov     ax, 0ffh                ;Result must fit in 8 bits!
                jmp     NormDone

; If the reading is below the centered value, scale it into the range
; 0..127 here:

Lower128:       sub     ax, [bx].Pot.Min
                mov     dl, ah
                mov     ah, al
                mov     dh, 0
                mov     al, dh
                shr     dl, 1
                rcr     ax, 1
                mov     cx, [bx].Pot.Center
                sub     cx, [bx].Pot.Min
                jz      BadNorm
                div     cx
                cmp     ah, 0
                je      NormDone
                mov     ax, 0ffh
                jmp     NormDone

; If something went wrong, return zero as the normalized value.

BadNorm:        sub     ax, ax

NormDone:       pop     cx
                ret
Normalize       endp
                assume  ds:nothing
;============================================================================
; INT 15h handler functions.
;============================================================================
;
; Although these are defined as near procs, they are not really procedures.
; The MyInt15 code jumps to each of these with BX, a far return address, and
; the flags sitting on the stack. Each of these routines must handle the
; stack appropriately.
;
;----------------------------------------------------------------------------
; BIOS- Handles the two BIOS calls, DL=0 to read the switches, DL=1 to
;       read the pots. For the BIOS routines, we'll ignore the cooley
;       switch (the hat) and simply read the other four switches.

BIOS            proc    near
                cmp     dl, 1           ;See if switch or pot routine.
                jb      Read4Sw
                je      ReadBIOSPots

; If not a valid BIOS call, jump to the original INT 15 handler and
; let it take care of this call.

                pop     bx
                jmp     cs:Int15Vect    ;Let someone else handle it!

; BIOS read switches function.

Read4Sw:        push    dx
                mov     dx, JoyPort
                in      al, dx
                and     al, 0F0h        ;Return only switch values.
                pop     dx
                pop     bx
                iret

; BIOS read pots function.

ReadBIOSPots:   pop     bx              ;Return a value in BX!
                push    si
                push    di
                push    bp
                mov     ah, 0Fh         ;Read all four pots.
                call    ReadPots
                mov     ax, si
                mov     cx, bp          ;BX already contains pot 1 reading.
                mov     dx, di
                pop     bp
                pop     di
                pop     si
                iret
BIOS            endp
;---------------------------------------------------------------------------; ; ReadPotOn entry, DL contains a pot number to read. ; Read and normalize that pot and return the result in AL. ReadPot ;;;;;;;;;;
assume proc push push push push push push push
ds:cseg near bx ds cx dx si di bp
mov mov
bx, cseg ds, bx
;Already on stack.
; If dl = 0, read and normalize the value for pot 0, if not, try some ; other pot. cmp jne mov call lea mov call jmp
dl, 0 Try1 ah, Pot0.PotMask ReadPots bx, Pot0 ax, si Normalize GotPot
;Get bit for this pot. ;Read pot 0. ;Pointer to pot data. ;Get pot 0 reading. ;Normalize to 0..FFh. ;Return to caller.
; Test for DL=1 here (read and normalize pot 1). Try1:
cmp jne mov call mov lea call jmp
dl, 1 Try2 ah, Pot1.PotMask ReadPots ax, bx bx, Pot1 Normalize GotPot
; Test for DL=2 here (read and normalize pot 2). Try2:
cmp jne mov call lea mov call jmp
dl, 2 Try3 ah, Pot2.PotMask ReadPots bx, Pot2 ax, bp Normalize GotPot
; Test for DL=3 here (read and normalize pot 3). Try3:
cmp jne
dl, 3 BadPot
The Game Adapter mov call lea mov call jmp
ah, Pot3.PotMask ReadPots bx, Pot3 ax, di Normalize GotPot
; Bad value in DL if we drop to this point. The standard game card ; only supports four pots. BadPot: GotPot:
ReadPot
sub pop pop pop pop pop pop pop iret endp assume
ax, ax bp di si dx cx ds bx
;Pot not available, return zero.
ds:nothing
;---------------------------------------------------------------------------; ; ReadRawOn entry, DL contains a pot number to read. ; Read that pot and return the unnormalized result in AX. ReadRaw ;;;;;;;;;;
assume proc push push push push push push push
ds:cseg near bx ds cx dx si di bp
mov mov
bx, cseg ds, bx
;Already on stack.
; This code is almost identical to the ReadPot code. The only difference ; is that we don’t bother normalizing the result and (of course) we return ; the value in AX rather than AL. cmp jne mov call mov jmp
dl, 0 Try1 ah, Pot0.PotMask ReadPots ax, si GotPot
Try1:
cmp jne mov call mov jmp
dl, 1 Try2 ah, Pot1.PotMask ReadPots ax, bx GotPot
Try2:
cmp jne mov call mov jmp
dl, 2 Try3 ah, Pot2.PotMask ReadPots ax, bp GotPot
Try3:
cmp jne mov call mov jmp
dl, 3 BadPot ah, Pot3.PotMask ReadPots ax, di GotPot
BadPot:
sub
ax, ax
;Pot not available, return zero.
Chapter 24 GotPot:
ReadRaw
pop pop pop pop pop pop pop iret endp assume
bp di si dx cx ds bx ds:nothing
;---------------------------------------------------------------------------; Read4Pots- Reads pots zero, one, two, and three returning their ; values in AL, AH, DL, and DH. ; ; On entry, AL contains the pot mask to select which pots ; we should read (bit 0=1 for pot 0, bit 1=1 for pot 1, etc). Read4Pots ;;;;;;;;;;;
Read4Pots
proc push push push push push push
near bx ds cx si di bp
mov mov
dx, cseg ds, dx
mov call
ah, al ReadPots
push mov lea call mov
bx ax, si bx, Pot0 Normalize cl, al
;Save pot 1 reading. ;Get pot 0 reading. ;Point bx at pot0 vars. ;Normalize. ;Save for later.
pop lea call mov
ax bx, Pot1 Normalize ch, al
;Retreive pot 1 reading.
mov lea call mov
ax, bp bx, Pot2 Normalize dl, al
;Pot 2 value.
mov lea call mov mov
ax, di bx, Pot3 Normalize dh, al ax, cx
;Pot 3 value. ;Pots 0 and 1.
pop pop pop pop pop pop iret endp
bp di si cx ds bx
;Already on stack
;Save normalized value.
;---------------------------------------------------------------------------; CalPotCalibrate the pot specified by DL. On entry, AL contains ; the minimum pot value (it better be less than 256!), BX ; contains the maximum pot value, and CX contains the centered ; pot value. assume
ds:cseg
The Game Adapter CalPot
proc pop push push mov mov
near bx ds si si, cseg ds, si
;Retrieve maximum value
; Sanity check on parameters, sort them in ascending order:
GoodMax: GoodMin:
mov cmp ja xchg cmp jb xchg cmp jb xchg
ah, 0 bx, cx GoodMax bx, cx ax, cx GoodMin ax, cx cx, bx GoodCenter cx, bx
;Make sure center < max ;Make sure min < center. ; (note: may make center<max). ;Again, be sure center < max.
GoodCenter: ; Okay, figure out who were supposed to calibrate:
DoCal:
CalDone: CalPot
lea cmp jb lea je lea cmp jb jne lea
si, Pot0 dl, 1 DoCal si, Pot1 DoCal si, Pot2 dl, 3 DoCal CalDone si, Pot3
mov mov mov mov pop pop iret endp assume
[si].Pot.min, ax ;Store away the minimum, [si].Pot.max, bx ; maximum, and [si].Pot.center, cx ; centered values. [si].Pot.DidCal, 1 ;Note we’ve cal’d this pot. si ds
;Branch if this is pot 0 ;Branch if this is pot 1 ;Branch if this is pot 2 ;Branch if not pot 3
ds:nothing
;---------------------------------------------------------------------------; TestCalJust checks to see if the pot specified by DL has already ; been calibrated. TestCal ;;;;;;;;
GetCal: BadCal: TestCal
assume proc push push mov mov
ds:cseg near bx ds bx, cseg ds, bx
sub lea cmp jb lea je lea cmp jb jne lea
ax, ax bx, Pot0 dl, 1 GetCal bx, Pot1 GetCal bx, Pot2 dl, 3 GetCal BadCal bx, Pot3
mov pop pop iret endp
al, [bx].Pot.DidCal ds bx
;Already on stack
;Assume no calibration (also zeros AH) ;Get the address of the specified ; pot’s data structure into the ; BX register.
Chapter 24 assume
ds:nothing
;----------------------------------------------------------------------------
;
; ReadSw- Reads the switch whose switch number appears in DL.

ReadSw          proc    near
;;;;;;;         push    bx              ;Already on stack
                push    cx

                sub     ax, ax          ;Assume no such switch.
                cmp     dl, 3           ;Return if the switch number is
                ja      NotDown         ; greater than three.

                mov     cl, dl          ;Save switch to read.
                add     cl, 4           ;Move from position four down to zero.
                mov     dx, JoyPort
                in      al, dx          ;Read the switches.
                shr     al, cl          ;Move desired switch bit into bit 0.
                xor     al, 1           ;Invert so sw down=1.
                and     ax, 1           ;Remove other junk bits.
NotDown:        pop     cx
                pop     bx
                iret
ReadSw          endp

;----------------------------------------------------------------------------
;
; Read16Sw- Reads all four switches and returns their values in AX.

Read16Sw        proc    near
;;;;;;;;        push    bx              ;Already on stack
                mov     dx, JoyPort
                in      al, dx
                shr     al, 4
                xor     al, 0Fh         ;Invert all switches.
                and     ax, 0Fh         ;Set other bits to zero.
                pop     bx
                iret
Read16Sw        endp
;****************************************************************************
;
; MyInt15- Patch for the BIOS INT 15 routine to control reading the
;          joystick.

MyInt15         proc    far
                push    bx
                cmp     ah, 84h                 ;Joystick code?
                je      DoJoystick
OtherInt15:     pop     bx
                jmp     cs:Int15Vect

DoJoystick:     mov     bh, 0
                mov     bl, dh
                cmp     bl, 80h
                jae     VendorCalls
                cmp     bx, JmpSize
                jae     OtherInt15
                shl     bx, 1
                jmp     wp cs:jmptable[bx]

jmptable        word    BIOS
                word    ReadPot, Read4Pots, CalPot, TestCal
                word    ReadRaw, OtherInt15, OtherInt15
                word    ReadSw, Read16Sw
JmpSize         =       ($-jmptable)/2

; Handle vendor specific calls here.

VendorCalls:    je      RemoveDriver
                cmp     bl, 81h
                je      TestPresence
                pop     bx
                jmp     cs:Int15Vect

; TestPresence- Returns zero in AX and a pointer to the ID string in ES:BX

TestPresence:   pop     bx                      ;Get old value off stack.
                sub     ax, ax
                mov     bx, cseg
                mov     es, bx
                lea     bx, IDString
                iret

; RemoveDriver- If there are no other drivers loaded after this one in
;               memory, disconnect it and remove it from memory.

RemoveDriver:   push    ds
                push    es
                push    ax
                push    dx

                mov     dx, cseg
                mov     ds, dx

; See if we're the last routine patched into INT 15h

                mov     ax, 3515h
                int     21h
                cmp     bx, offset MyInt15
                jne     CantRemove
                mov     bx, es
                cmp     bx, wp seg MyInt15
                jne     CantRemove

                mov     ax, PSP                 ;Free the memory we're in
                mov     es, ax
                push    es
                mov     ax, es:[2ch]            ;First, free env block.
                mov     es, ax
                mov     ah, 49h
                int     21h

                pop     es                      ;Now free program space.
                mov     ah, 49h
                int     21h

                lds     dx, Int15Vect           ;Restore previous int vect.
                mov     ax, 2515h
                int     21h

CantRemove:     pop     dx
                pop     ax
                pop     es
                pop     ds
                pop     bx
                iret
MyInt15         endp
cseg            ends

Initialize      segment para public 'INIT'
                assume  cs:Initialize, ds:cseg
Main            proc
                mov     ax, cseg                ;Get ptr to vars segment
                mov     es, ax
                mov     es:PSP, ds              ;Save PSP value away
                mov     ds, ax

                mov     ax, zzzzzzseg
                mov     es, ax
                mov     cx, 100h
                meminit2

                print
                byte    " Standard Game Device Interface driver",cr,lf
                byte    " PC Compatible Game Adapter Cards",cr,lf
                byte    " Written by Randall Hyde",cr,lf
                byte    cr,lf
                byte    cr,lf
                byte    "'SGDI REMOVE' removes the driver from memory",cr,lf
                byte    lf
                byte    0

                mov     ax, 1
                argv                            ;If no parameters, empty str.
                stricmpl
                byte    "REMOVE",0
                jne     NoRmv

                mov     dh, 81h                 ;TestPresence opcode.
                mov     ax, 84ffh
                int     15h                     ;See if we're already loaded.
                test    ax, ax                  ;Get a zero back?
                jz      Installed

                print
                byte    "SGDI driver is not present in memory, REMOVE "
                byte    "command ignored.",cr,lf,0
                mov     ax, 4c01h               ;Exit to DOS.
                int     21h

Installed:      mov     ax, 8400h
                mov     dh, 80h                 ;Remove call
                int     15h
                mov     ax, 8400h
                mov     dh, 81h                 ;TestPresence call
                int     15h
                cmp     ax, 0
                je      NotRemoved
                print
                byte    "Successfully removed SGDI driver from memory."
                byte    cr,lf,0
                mov     ax, 4c01h               ;Exit to DOS.
                int     21h

NotRemoved:     print
                byte    "SGDI driver is still present in memory.",cr,lf,0
                mov     ax, 4c01h               ;Exit to DOS.
                int     21h

; Okay, Patch INT 15 and go TSR at this point.

NoRmv:          mov     ax, 3515h
                int     21h
                mov     wp Int15Vect, bx
                mov     wp Int15Vect+2, es

                mov     dx, cseg
                mov     ds, dx
                mov     dx, offset MyInt15
                mov     ax, 2515h
                int     21h

                mov     dx, cseg
                mov     ds, dx
                mov     dx, seg Initialize
                sub     dx, ds:psp
                add     dx, 2
                mov     ax, 3100h               ;Do TSR
                int     21h
Main            endp

Initialize      ends

sseg            segment para stack 'stack'
                word    128 dup (0)
endstk          word    ?
sseg            ends

zzzzzzseg       segment para public 'zzzzzzseg'
                byte    16 dup (0)
zzzzzzseg       ends
                end     Main
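The arithmetic in the driver's Normalize routine can be easier to follow in a high-level form. The following is a minimal Python sketch of that calculation, not a replacement for the assembly code: it assumes integer readings and uses a made-up function name and parameter list (normalize, pot_min, pot_max, center, calibrated) in place of the Pot structure fields. It mirrors the checks in the listing: return zero for an uncalibrated or inconsistent pot, clip the reading to min..max, then scale the lower half into 0..127 and the upper half into 128..255.

```python
def normalize(raw, pot_min, pot_max, center, calibrated=True):
    """Model of the driver's Normalize routine (illustrative names only)."""
    # An uncalibrated pot, or one whose values are not min < center < max,
    # yields zero, like the BadNorm exit in the assembly code.
    if not calibrated:
        return 0
    if not (pot_min < center < pot_max):
        return 0

    # Clip the raw reading into the calibrated range.
    raw = max(pot_min, min(raw, pot_max))

    if raw < center:
        # Lower half: scale min..center onto 0..127.
        result = (raw - pot_min) * 128 // (center - pot_min)
    else:
        # Upper half: scale center..max onto 128..256, then clamp.
        result = (raw - center) * 128 // (pot_max - center) + 128

    # The result must fit in eight bits.
    return min(result, 0xFF)
```

With a pot calibrated to min=100, center=500, max=900, this sketch maps a raw reading of 100 to 0, 500 to 128, and 900 to 255; readings outside the calibrated range are clipped before scaling.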
The following program makes several different types of calls to an SGDI driver. You can use this code to test an SGDI TSR:

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list

cseg            segment para public 'code'
                assume  cs:cseg, ds:nothing

MinVal0         word    ?
MinVal1         word    ?
MaxVal0         word    ?
MaxVal1         word    ?
; Wait4Button- Waits until the user presses and releases a button. Wait4Button
proc push push push
near ax dx cx
W4BLp:
mov mov int cmp je
ah, 84h dx, 900h 15h ax, 0 W4BLp
xor loop
cx, cx Delay
;Debouncing delay loop.
W4nBLp:
mov mov int cmp jne
ah, 84h dx, 900h 15h ax, 0 W4nBLp
;Now wait until the user releases ; all buttons
Delay2:
loop
Delay2 cx dx ax
Wait4Button
pop pop pop ret endp
Main
proc
Delay:
print byte
;Read the L.O. 16 buttons. ;Any button down? If not, ; loop until this is so.
“SGDI Test Program.”,cr,lf
Chapter 24 byte byte
“Written by Randall Hyde”,cr,lf,lf “Press any key to continue”,cr,lf,0
getc mov mov int cmp je print byte jmp MainLoop0:print byte
ah, 84h dh, 4 15h ax, 0 MainLoop0
;Test presence call. ;See if there
“No SGDI driver present in memory.”,cr,lf,0 Quit “BIOS: “,0
; Okay, read the switches and raw pot values using the BIOS compatible calls. mov mov int puth mov putc
ah, 84h dx, 0 15h
mov mov int putw mov putc mov putw mov putc mov putw mov putc mov putw
ah, 84h dx, 1 15h
putcr mov int je getc
;BIOS compat. read switches. ;Output switch values.
al, ‘ ‘ ;BIOS compat. read pots.
al, ‘ ‘ ax, bx al, ‘ ‘ ax, cx al, ‘ ‘ ax, dx
ah, 1 16h MainLoop0
;Repeat until key press.
; Read the minimum and maximum values for each pot from the user so we ; can calibrate the pots.
print byte byte byte
cr,lf,lf,lf “Move joystick to upper left corner and press “ “any button.”,cr,lf,0
call mov mov int mov mov
Wait4Button ah, 84h dx, 1 15h MinVal0, ax MinVal1, bx
print byte byte byte
cr,lf “Move the joystick to the lower right corner “ “and press any button”,cr,lf,0
call mov mov int
Wait4Button ah, 84h dx, 1 15h
;Read Raw Values
;Read Raw Values
The Game Adapter mov mov
MaxVal0, ax MaxVal1, bx
; Calibrate the pots.
MainLoop1:
mov mov mov add shr mov mov int
ax, bx, cx, cx, cx, ah, dx, 15h
MinVal0;Will be eight bits or less. MaxVal0 bx ;Compute centered value as the ax ; average of these two (this is 1 ; dangerous, but usually works!) 84h 300h;Calibrate pot 0
mov mov mov add shr mov mov int
ax, bx, cx, cx, cx, ah, dx, 15h
MinVal1;Will be eight bits or less. MaxVal1 bx ;Compute centered value as the ax ; average of these two (this is 1 ; dangerous, but usually works!) 84h 301h ;Calibrate pot 1
print byte
“ReadSw: “,0
; Okay, read the switches and raw pot values using the BIOS compatible calls. mov mov int or putc
ah, 84h dx, 800h 15h al, ‘0’
mov mov int or putc
ah, 84h dx, 801h 15h al, ‘0’
mov mov int or putc
ah, 84h dx, 802h 15h al, ‘0’
mov mov int or putc
ah, 84h dx, 803h 15h al, ‘0’
mov mov int or putc
ah, 84h dx, 804h 15h al, ‘0’
mov mov int or putc
ah, 84h dx, 805h 15h al, ‘0’
mov mov int or putc
ah, 84h dx, 806h 15h al, ‘0’
mov mov int or
ah, 84h dx, 807h 15h al, ‘0’
;Read switch zero.
;Read switch one.
;Read switch two.
;Read switch three.
;Read switch four
;Read switch five.
;Read switch six.
;Read switch seven. ;We won’t bother with ; any more switches.
                putc
                mov     al, ' '
                putc
                mov     ah, 84h
                mov     dh, 9           ;Read all 16 switches.
                int     15h
                putw

                print
                byte    " Pots: ",0
                mov     ax, 8403h       ;Read joystick pots.
                mov     dx, 200h        ;Read four pots.
                int     15h
                puth
                mov     al, ' '
                putc
                mov     al, ah
                puth
                mov     al, ' '
                putc

                mov     ah, 84h
                mov     dx, 503h        ;Raw read, pot 3.
                int     15h
                putw
                putcr
                mov     ah, 1           ;Repeat until key press.
                int     16h
                je      MainLoop1
                getc

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main

24.6 An SGDI Driver for the CH Products’ FlightStick Pro

The CH Products FlightStick Pro joystick is a good example of a specialized product for which an SGDI driver is a perfect solution. The FlightStick Pro provides three pots and five switches, the fifth switch being a special five-position cooley switch. Although the pots on the FlightStick Pro map to three of the analog inputs on the standard game adapter card (pots zero, one, and three), there are insufficient digital inputs to handle the eight inputs necessary for the FlightStick Pro’s four buttons and cooley switch. The FlightStick Pro (FSP) uses some electronic circuitry to map these eight switch positions to four input bits. To do so, it places one restriction on the use of the FSP switches: you can only press one of them at a time. If you hold down two or more switches at the same time, the FSP hardware selects one of the switches (the one with the highest priority) and reports that value; it ignores the other switches until you release that switch. Since only one switch can be read at a time, the FSP hardware generates a four-bit value that identifies the switch currently pressed. It returns these four bits as the switch values on the standard game adapter card. The following table lists the values for each of the switches:
Table 88: FlightStick Pro Switch Return Values

Value (binary)   Priority   Switch Position
0000             Highest    Up position on the cooley switch.
0100             7          Right position on the cooley switch.
1000             6          Down position on the cooley switch.
1100             5          Left position on the cooley switch.
1110             4          Trigger on the joystick.
1101             3          Leftmost button on the joystick.
1011             2          Rightmost button on the joystick.
0111             Lowest     Middle button on the joystick.
1111             -          No buttons currently down.
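The table above amounts to a small lookup. The following Python sketch is illustrative only and is not part of the driver; it assumes the four switch bits arrive in bits four through seven of the byte read from the game port (as on the standard game adapter card), and the names FSP_SWITCHES, decode_fsp, and is_cooley are invented for this example.

```python
# Illustrative model of Table 88; not part of the SGDI driver itself.
FSP_SWITCHES = {
    0b0000: "cooley up",
    0b0100: "cooley right",
    0b1000: "cooley down",
    0b1100: "cooley left",
    0b1110: "trigger",
    0b1101: "left button",
    0b1011: "right button",
    0b0111: "middle button",
    0b1111: "none",
}

def decode_fsp(port_byte):
    """Return the name of the switch encoded in bits 4..7 of port_byte."""
    nibble = (port_byte >> 4) & 0x0F
    # Patterns that do not appear in the table are reported as undefined.
    return FSP_SWITCHES.get(nibble, "undefined")

def is_cooley(port_byte):
    """True when port bits 4 and 5 are both zero (a cooley position)."""
    return ((port_byte >> 4) & 0b0011) == 0
```

For example, decode_fsp(0xE0) gives "trigger" and decode_fsp(0x40) gives "cooley right"; the low four bits of the port byte (the pot timer bits) are ignored.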
Note that each of the four buttons looks just like a single button press on a standard joystick: exactly one of the four bits contains zero. The cooley switch positions contain a position value in bits six and seven; bits four and five always contain zero when the cooley switch is active. The SGDI driver for the FlightStick Pro is very similar to the standard game adapter card SGDI driver. Since the FlightStick Pro only provides three pots, this code doesn’t bother trying to read pot 2 (which is non-existent). Of course, the switches on the FlightStick Pro are quite a bit different from those on standard joysticks, so the FSP SGDI driver maps the FSP switches to eight of the SGDI logical switches. By reading switches zero through seven, you can test the following conditions on the FSP:
Table 89: FlightStick Pro SGDI Switch Mapping

This SGDI Switch number:   Maps to this FSP Switch:
0                          Trigger on joystick.
1                          Left button on joystick.
2                          Middle button on joystick.
3                          Right button on joystick.
4                          Cooley up position.
5                          Cooley left position.
6                          Cooley right position.
7                          Cooley down position.
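The mapping in this table can be modelled as a 16-entry lookup indexed by the four-bit hardware value, which is how the SwBits and SwBitsL tables in the driver listing are laid out. The Python sketch below copies the values of those two tables; everything else (the names SW_BITS, SW_BITS_LEFT, sgdi_switches, switch_down, and the swap_buttons parameter) is hypothetical and only meant to illustrate the idea. The second table differs from the first only in exchanging the bits reported for the left and right buttons.

```python
# Index = bits 4..7 of the game port byte; value = SGDI switch bit mask.
SW_BITS = [0x10, 0, 0, 0, 0x40, 0, 0, 0x04,
           0x80, 0, 0, 0x08, 0x20, 0x02, 0x01, 0]

# Same table with the left and right buttons exchanged.
SW_BITS_LEFT = [0x10, 0, 0, 0, 0x40, 0, 0, 0x04,
                0x80, 0, 0, 0x02, 0x20, 0x08, 0x01, 0]

def sgdi_switches(port_byte, swap_buttons=False):
    """Return a mask with bit n set when SGDI logical switch n is down."""
    table = SW_BITS_LEFT if swap_buttons else SW_BITS
    return table[(port_byte >> 4) & 0x0F]

def switch_down(port_byte, switch, swap_buttons=False):
    """True when the given SGDI logical switch (0..7) is pressed."""
    return bool(sgdi_switches(port_byte, swap_buttons) & (1 << switch))
```

Because the hardware reports only one switch at a time, the returned mask never has more than one bit set.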
The FSP SGDI driver contains one other novel feature: it allows the user to swap the functions of the left and right buttons on the joystick. Many games assign important functions to the trigger and left button since they are the easiest to press (right-handed players can easily press the left button with their thumb). If you type “LEFT” on the command line, the FSP SGDI driver swaps the functions of the left and right buttons so that left-handed players can easily press this button with their thumb as well. The following code provides the complete listing for the FSPSGDI driver. Note that you can use the same test program from the previous section to test this driver.

                .286
                page    58, 132
                name    FSPSGDI
                title   FSPSGDI (CH Products Standard Game Device Interface).

; FSPSGDI.EXE
;
;       Usage:
;               FSPSGDI {LEFT}
;
; This program loads a TSR which patches INT 15 so arbitrary game programs
; can read the CH Products FlightStick Pro joystick in a portable fashion.

wp              equ     <word ptr>
byp             equ     <byte ptr>
; We need to load cseg in memory before any other segments! cseg cseg
segment ends
para public ‘code’
; Initialization code, which we do not need except upon initial load, ; goes in the following segment: Initialize Initialize
segment ends
para public ‘INIT’
; UCR Standard Library routines which get dumped later on. .xlist include stdlib.a includelib stdlib.lib .list sseg sseg
segment ends
para stack ‘stack’
zzzzzzseg zzzzzzseg
segment ends
para public ‘zzzzzzseg’
CSEG
segment assume
para public ‘CODE’ cs:cseg, ds:nothing
Int15Vect
dword
0
PSP
word
?
; Port addresses for a typical joystick card: JoyPort JoyTrigger
equ equ
201h 201h
CurrentReading word
0
Pot PotMask DidCal min max center Pot
struc byte byte word word word ends
0 0 5000 0 0
Pot0 Pot1 Pot3
Pot Pot Pot
<1> <2> <8>
;Pot mask for hardware. ;Is this pot calibrated? ;Minimum pot value ;Max pot value ;Pot value in the middle
; SwapButtons- 0 if we should use normal flightstick pro buttons, ; 1 if we should swap the left and right buttons. SwapButtons
byte
0
; SwBits- the four bit input value from the Flightstick Pro selects one
;         of the following bit patterns for a given switch position.

SwBits          byte    10h             ;Sw 4
                byte    0               ;NA
                byte    0               ;NA
                byte    0               ;NA
                byte    40h             ;Sw 6
                byte    0               ;NA
                byte    0               ;NA
                byte    4               ;Sw 2

                byte    80h             ;Sw 7
                byte    0               ;NA
                byte    0               ;NA
                byte    8               ;Sw 3
                byte    20h             ;Sw 5
                byte    2               ;Sw 1
                byte    1               ;Sw 0
                byte    0               ;NA

SwBitsL         byte    10h             ;Sw 4
                byte    0               ;NA
                byte    0               ;NA
                byte    0               ;NA
                byte    40h             ;Sw 6
                byte    0               ;NA
                byte    0               ;NA
                byte    4               ;Sw 2

                byte    80h             ;Sw 7
                byte    0               ;NA
                byte    0               ;NA
                byte    2               ;Sw 1 (right button, swapped)
                byte    20h             ;Sw 5
                byte    8               ;Sw 3 (left button, swapped)
                byte    1               ;Sw 0
                byte    0               ;NA
; The IDstring address gets passed back to the caller on a testpresence ; call. The four bytes before the IDstring must contain the serial number ; and current driver number. SerialNumber IDNumber IDString
byte byte byte byte
0,0,0 0 “CH Products:Flightstick Pro”,0 “Written by Randall Hyde”,0
;============================================================================ ; ; ReadPotsAH contains a bit mask to determine which pots we should read. ; Bit 0 is one if we should read pot 0, bit 1 is one if we should ; read pot 1, bit 3 is one if we should read pot 3. All other bits ; will be zero. ; ; This code returns the pot values in SI, BX, BP, and DI for Pot 0, 1, ; 2, & 3. ; ReadPots
proc sub mov mov mov
near bp, bp si, bp di, bp bx, bp
; Wait for pots to finish any past junk:
Wait4Pots:
mov out mov in and
dx, dx, cx, al, al,
JoyPort al 400h dx 0Fh
;Trigger pots
Chapter 24 loopnz
Wait4Pots
; Okay, read the pots:
PotReadLoop:
mov out mov mov in and jz shr adc shr adc shr adc loop
dx, JoyTrigger dx, al dx, JoyPort cx, 8000h al, dx al, ah PotReadDone al, 1 si, 0 al, 1 bp, 0 al, 2 di, 0 PotReadLoop
;Trigger pots ;Don’t let this go on forever.
PotReadDone: ReadPots
ret endp
;---------------------------------------------------------------------------; ; Normalize- BX contains a pointer to a pot structure, AX contains ; a pot value. Normalize that value according to the ; calibrated pot. ; ; Note: DS must point at cseg before calling this routine.
Normalize
assume proc push
ds:cseg near cx
; Sanity check to make sure the calibration process went okay. cmp je mov cmp jbe cmp jae
[bx].Pot.DidCal, 0 BadNorm dx, [bx].Pot.Center dx, [bx].Pot.Min BadNorm dx, [bx].Pot.Max BadNorm
; Clip the value if it is out of range. cmp ja mov
ax, [bx].Pot.Min MinOkay ax, [bx].Pot.Min
cmp jb mov
ax, [bx].Pot.Max MaxOkay ax, [bx].Pot.Max
MinOkay:
MaxOkay: ; Scale this guy around the center: cmp jb
ax, [bx].Pot.Center Lower128
; Scale in the range 128..255 here: sub mov mov mov mov shr rcr mov sub jz
ax, [bx].Pot.Center dl, ah ;Multiply by 128 ah, al dh, 0 al, dh dl, 1 ax, 1 cx, [bx].Pot.Max cx, [bx].Pot.Center BadNorm ;Prevent division by zero.
The Game Adapter div add cmp je mov jmp
cx ax, 128 ah, 0 NormDone ax, 0ffh NormDone
;Compute normalized value. ;Scale to range 128..255. ;Result must fit in 8 bits!
; Scale in the range 0..127 here: Lower128:
sub mov mov mov mov shr rcr mov sub jz div cmp je mov jmp
ax, [bx].Pot.Min dl, ah ;Multiply by 128 ah, al dh, 0 al, dh dl, 1 ax, 1 cx, [bx].Pot.Center cx, [bx].Pot.Min BadNorm cx ;Compute normalized value. ah, 0 NormDone ax, 0ffh ;Result must fit in 8 bits! NormDone
BadNorm: NormDone:
sub pop ret endp assume
ax, ax cx
Normalize
ds:nothing
;============================================================================ ; INT 15h handler functions. ;============================================================================ ; ; Although these are defined as near procs, they are not really procedures. ; The MyInt15 code jumps to each of these with BX, a far return address, and ; the flags sitting on the stack. Each of these routines must handle the ; stack appropriately. ; ;---------------------------------------------------------------------------; BIOS- Handles the two BIOS calls, DL=0 to read the switches, DL=1 to ; read the pots. For the BIOS routines, we’ll ignore the cooley ; switch (the hat) and simply read the other four switches. BIOS
proc cmp jb je pop jmp
near dl, 1 Read4Sw ReadBIOSPots bx cs:Int15Vect
Read4Sw:
push mov in shr mov mov cmp je mov jmp
dx dx, JoyPort al, dx al, 4 bl, al bh, 0 cs:SwapButtons, 0 DoLeft2 al, cs:SwBitsL[bx] SBDone
DoLeft2: SBDone:
mov rol not pop pop iret
al, cs:SwBits[bx] al, 4 ;Put Sw0..3 in upper bits and make al ; 0=switch down, just like game card. dx bx
ReadBIOSPots: pop push push push
bx si di bp
;See if switch or pot routine.
;Let someone else handle it!
;Return a value in BX!
Chapter 24
BIOS
mov call mov mov mov sub pop pop pop iret endp
ah, 0bh ReadPots ax, si bx, bp dx, di cx, cx bp di si
;---------------------------------------------------------------------------; ; ReadPotOn entry, DL contains a pot number to read. ; Read and normalize that pot and return the result in AL. assume proc push push push push push push push
ds:cseg near bx ds cx dx si di bp
mov mov
bx, cseg ds, bx
cmp jne mov call lea mov call jmp
dl, 0 Try1 ah, Pot0.PotMask ReadPots bx, Pot0 ax, si Normalize GotPot
Try1:
cmp jne mov call lea mov call jmp
dl, 1 Try3 ah, Pot1.PotMask ReadPots bx, Pot1 ax, bp Normalize GotPot
Try3:
cmp jne mov call lea mov call jmp
dl, 3 BadPot ah, Pot3.PotMask ReadPots bx, Pot3 ax, di Normalize GotPot
BadPot:
sub
ax, ax
GotPot:
pop pop pop pop pop pop pop iret endp assume
bp di si dx cx ds bx
ReadPot ;;;;;;;;;;
ReadPot
;Already on stack.
;Question: Should we pass this on ; or just return zero?
ds:nothing
;---------------------------------------------------------------------------;
The Game Adapter ; ReadRaw;
On entry, DL contains a pot number to read. Read that pot and return the unnormalized result in AL. assume proc push push push push push push push
ds:cseg near bx ds cx dx si di bp
mov mov
bx, cseg ds, bx
cmp jne mov call mov jmp
dl, 0 Try1 ah, Pot0.PotMask ReadPots ax, si GotPot
Try1:
cmp jne mov call mov jmp
dl, 1 Try3 ah, Pot1.PotMask ReadPots ax, bp GotPot
Try3:
cmp jne mov call mov jmp
dl, 3 BadPot ah, Pot3.PotMask ReadPots ax, di GotPot
BadPot: GotPot:
sub pop pop pop pop pop pop pop iret endp assume
ax, ax bp di si dx cx ds bx
ReadRaw ;;;;;;;;;;
ReadRaw
;Already on stack.
;Just return zero.
ds:nothing
;---------------------------------------------------------------------------; Read4Pots-Reads pots zero, one, two, and three returning their ; values in AL, AH, DL, and DH. Since the flightstick ; Pro doesn’t have a pot 2 installed, return zero for ; that guy. Read4Pots ;;;;;;;;;;;
proc push push push push push push
near bx ds cx si di bp
mov mov
dx, cseg ds, dx
mov call
ah, 0bh ReadPots
mov lea call mov
ax, si bx, Pot0 Normalize cl, al
;Already on stack
;Read pots 0, 1, and 3.
Chapter 24
Read4Pots
mov lea call mov
ax, bp bx, Pot1 Normalize ch, al
mov lea call mov mov mov
ax, di bx, Pot3 Normalize dh, al ax, cx dl, 0
pop pop pop pop pop pop iret endp
bp di si cx ds bx
;Pot 3 value. ;Pots 0 and 1. ;Pot 2 is non-existant.
;---------------------------------------------------------------------------; CalPotCalibrate the pot specified by DL. On entry, AL contains ; the minimum pot value (it better be less than 256!), BX ; contains the maximum pot value, and CX contains the centered ; pot value. CalPot
assume proc pop push push mov mov
ds:cseg near bx ds si si, cseg ds, si
;Retrieve maximum value
; Sanity check on parameters, sort them in ascending order:
GoodMax: GoodMin:
mov cmp ja xchg cmp jb xchg cmp jb xchg
ah, 0 bx, cx GoodMax bx, cx ax, cx GoodMin ax, cx cx, bx GoodCenter cx, bx
GoodCenter: ; Okay, figure out who were supposed to calibrate:
DoCal:
CalDone: CalPot
lea cmp jb lea je cmp jne lea
si, Pot0 dl, 1 DoCal si, Pot1 DoCal dl, 3 CalDone si, Pot3
mov mov mov mov pop pop iret endp assume
[si].Pot.min, ax [si].Pot.max, bx [si].Pot.center, cx [si].Pot.DidCal, 1 si ds ds:nothing
;----------------------------------------------------------------------------
; TestCal-      Just checks to see if the pot specified by DL has already
;               been calibrated.

                assume  ds:cseg
TestCal         proc    near
;;;;;;;;        push    bx                      ;Already on stack
                push    ds
                mov     bx, cseg
                mov     ds, bx

                sub     ax, ax                  ;Assume no calibration
                lea     bx, Pot0
                cmp     dl, 1
                jb      GetCal
                lea     bx, Pot1
                je      GetCal
                cmp     dl, 3
                jne     BadCal
                lea     bx, Pot3

GetCal:         mov     al, [bx].Pot.DidCal
                mov     ah, 0
BadCal:         pop     ds
                pop     bx
                iret
TestCal         endp
                assume  ds:nothing
;----------------------------------------------------------------------------
;
; ReadSw-       Reads the switch whose switch number appears in DL.

SwTable         byte    11100000b, 11010000b, 01110000b, 10110000b
                byte    00000000b, 11000000b, 01000000b, 10000000b

SwTableL        byte    11100000b, 10110000b, 01110000b, 11010000b
                byte    00000000b, 11000000b, 01000000b, 10000000b

ReadSw          proc    near
;;;;;;;         push    bx                      ;Already on stack
                mov     bl, dl                  ;Save switch to read.
                mov     bh, 0
                mov     dx, JoyPort
                in      al, dx
                and     al, 0f0h
                cmp     cs:SwapButtons, 0
                je      DoLeft0
                cmp     al, cs:SwTableL[bx]
                jne     NotDown
                jmp     IsDown

DoLeft0:        cmp     al, cs:SwTable[bx]
                jne     NotDown

IsDown:         mov     ax, 1
                pop     bx
                iret

NotDown:        sub     ax, ax
                pop     bx
                iret
ReadSw          endp
;----------------------------------------------------------------------------
;
; Read16Sw-     Reads all eight switches and returns their values in AX.

Read16Sw        proc    near
;;;;;;;;        push    bx                      ;Already on stack
                mov     ah, 0                   ;Switches 8-15 are nonexistent.
                mov     dx, JoyPort
                in      al, dx
                shr     al, 4
                mov     bl, al
                mov     bh, 0
                cmp     cs:SwapButtons, 0
                je      DoLeft1
                mov     al, cs:SwBitsL[bx]
                jmp     R8Done

DoLeft1:        mov     al, cs:SwBits[bx]
R8Done:         pop     bx
                iret
Read16Sw        endp
;****************************************************************************
;
; MyInt15-      Patch for the BIOS INT 15 routine to control reading the
;               joystick.

MyInt15         proc    far
                push    bx
                cmp     ah, 84h                 ;Joystick code?
                je      DoJoystick
OtherInt15:     pop     bx
                jmp     cs:Int15Vect

DoJoystick:     mov     bh, 0
                mov     bl, dh
                cmp     bl, 80h
                jae     VendorCalls
                cmp     bx, JmpSize
                jae     OtherInt15
                shl     bx, 1
                jmp     wp cs:jmptable[bx]

jmptable        word    BIOS
                word    ReadPot, Read4Pots, CalPot, TestCal
                word    ReadRaw, OtherInt15, OtherInt15
                word    ReadSw, Read16Sw
JmpSize         =       ($-jmptable)/2

; Handle vendor specific calls here.

VendorCalls:    je      RemoveDriver
                cmp     bl, 81h
                je      TestPresence
                pop     bx
                jmp     cs:Int15Vect

; TestPresence- Returns zero in AX and a pointer to the ID string in ES:BX

TestPresence:   pop     bx                      ;Get old value off stack.
                sub     ax, ax
                mov     bx, cseg
                mov     es, bx
                lea     bx, IDString
                iret

; RemoveDriver- If there are no other drivers loaded after this one in
;               memory, disconnect it and remove it from memory.

RemoveDriver:
                push    ds
                push    es
                push    ax
                push    dx

                mov     dx, cseg
                mov     ds, dx

; See if we're the last routine patched into INT 15h

                mov     ax, 3515h
                int     21h
                cmp     bx, offset MyInt15
                jne     CantRemove
                mov     bx, es
                cmp     bx, wp seg MyInt15
                jne     CantRemove

                mov     ax, PSP                 ;Free the memory we're in
                mov     es, ax
                push    es
                mov     ax, es:[2ch]            ;First, free env block.
                mov     es, ax
                mov     ah, 49h
                int     21h
;
                pop     es                      ;Now free program space.
                mov     ah, 49h
                int     21h

                lds     dx, Int15Vect           ;Restore previous int vect.
                mov     ax, 2515h
                int     21h

CantRemove:     pop     dx
                pop     ax
                pop     es
                pop     ds
                pop     bx
                iret
MyInt15         endp
cseg            ends
; The following segment is tossed when this code goes resident.

Initialize      segment para public 'INIT'
                assume  cs:Initialize, ds:cseg
Main            proc
                mov     ax, cseg                ;Get ptr to vars segment
                mov     es, ax
                mov     es:PSP, ds              ;Save PSP value away
                mov     ds, ax

                mov     ax, zzzzzzseg
                mov     es, ax
                mov     cx, 100h
                meminit2

                print
                byte    "Standard Game Device Interface driver",cr,lf
                byte    "CH Products Flightstick Pro",cr,lf
                byte    "Written by Randall Hyde",cr,lf
                byte    cr,lf
                byte    "'FSPSGDI LEFT' swaps the left and right buttons for "
                byte    "left handed players",cr,lf
                byte    "'FSPSGDI REMOVE' removes the driver from memory"
                byte    cr, lf, lf
                byte    0

                mov     ax, 1
                argv                            ;If no parameters, empty str.
                stricmpl
                byte    "LEFT",0
                jne     NoLEFT
                mov     SwapButtons, 1
                print
                byte    "Left and right buttons swapped",cr,lf,0
                jmp     SwappedLeft

NoLEFT:         stricmpl
                byte    "REMOVE",0
                jne     NoRmv
                mov     dh, 81h
                mov     ax, 84ffh
                int     15h                     ;See if we're already loaded.
                test    ax, ax                  ;Get a zero back?
                jz      Installed
                print
                byte    "SGDI driver is not present in memory, REMOVE "
                byte    "command ignored.",cr,lf,0
                mov     ax, 4c01h               ;Exit to DOS.
                int     21h

Installed:      mov     ax, 8400h
                mov     dh, 80h                 ;Remove call
                int     15h
                mov     ax, 8400h
                mov     dh, 81h                 ;TestPresence call
                int     15h
                cmp     ax, 0
                je      NotRemoved
                print
                byte    "Successfully removed SGDI driver from memory."
                byte    cr,lf,0
                mov     ax, 4c01h               ;Exit to DOS.
                int     21h

NotRemoved:     print
                byte    "SGDI driver is still present in memory.",cr,lf,0
                mov     ax, 4c01h               ;Exit to DOS.
                int     21h

NoRmv:

; Okay, Patch INT 15 and go TSR at this point.

SwappedLeft:    mov     ax, 3515h
                int     21h
                mov     wp Int15Vect, bx
                mov     wp Int15Vect+2, es

                mov     dx, cseg
                mov     ds, dx
                mov     dx, offset MyInt15
                mov     ax, 2515h
                int     21h

                mov     dx, cseg
                mov     ds, dx
                mov     dx, seg Initialize
                sub     dx, ds:psp
                add     dx, 2
                mov     ax, 3100h               ;Do TSR
                int     21h
Main            endp

Initialize      ends

sseg            segment para stack 'stack'
                word    128 dup (0)
endstk          word    ?
sseg            ends

zzzzzzseg       segment para public 'zzzzzzseg'
                byte    16 dup (0)
zzzzzzseg       ends
                end     Main
24.7
Patching Existing Games

Maybe you’re not quite ready to write the next million dollar game. Perhaps you’d like to get a little more enjoyment out of the games you already own. Well, this section will provide a practical application of a semiresident program that patches the Lucas Arts’ XWing (Star Wars simulation) game. This program patches the XWing game to take advantage of the special features found on the CH Products’ FlightStick Pro. In particular, it lets you use the throttle pot on the FSP to control the speed of the spacecraft. It also lets you program each of the buttons with up to four strings of eight characters each.

To describe how you can patch an existing game, a short description of how this patch was developed is in order. The FSPXW patch was developed by using the Soft-ICE debugging tool. This program lets you set a breakpoint whenever an 80386 or later processor accesses a specific I/O port[8]. Setting a breakpoint at I/O address 201h while running the xwing.exe file stopped the XWing program when it decided to read the analog and switch inputs. Disassembly of the surrounding code produced complete joystick and button read routines. After locating these routines, it was easy enough to write a program to search through memory for the code and patch in jumps to code in the FSPXW patch program.

Note that the original joystick code inside XWing works perfectly fine with the FSP. The only reason for patching into the joystick code is so our code can read the throttle every now and then and take appropriate action. The button routines were another story altogether. The FSPXW patch needs to take control of XWing’s button routines because the user of FSPXW might want to redefine a button recognized by XWing for some other purpose. Therefore, whenever XWing calls its button routine, control transfers to the button routine inside FSPXW that decides whether to pass real button information back to XWing or to fake buttons in the up position because those buttons are redefined to other functions. By default (unless you change the source code), the buttons have the following programming:
[Figure: default FlightStick Pro button programming: Rotate Ship, Hide/Show Cockpit, Weapons Select.]

The programming of the cooley switch demonstrates an interesting feature of the FSPXW patch: you can program up to four different strings on each button. The first time you press a button, FSPXW emits the first string, the second time you press a button it emits the second string, then the third, and finally the fourth. If a string is empty, the FSPXW patch skips it. The FSPXW patch uses the cooley switch to select the cockpit views. Pressing the cooley switch forward displays the forward view. Pulling the cooley switch backwards presents the rear view. However, the XWing game provides three left and right views. Pushing the cooley switch to the left or right once displays the 45 degree view. Pressing it a second time presents the 90 degree view. Pressing it to the left or right a third time provides the 135 degree view. The following diagram shows the default programming on the cooley switch:

[Figure: default cooley switch programming. Forward View at the top and Back View at the bottom; Left Views at 315°, 270°, and 225°; Right Views at 45°, 90°, and 135°.]

8. This feature is not specific to Soft-ICE; many 80386 debuggers will let you do this.
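The string cycling described above can be modeled in a few lines. The following Python sketch is only an illustration of the behavior the text describes, not the FSPXW implementation (that is the assembly code in this section); the class `ButtonStrings`, its methods, and the helper `press` are hypothetical names, and the key strings in the usage note are sample data.

```python
class ButtonStrings:
    """Models one programmable button holding up to four key strings."""

    def __init__(self, strings):
        if len(strings) > 4:
            raise ValueError("a button holds at most four strings")
        self.strings = list(strings)
        self.index = 0                  # slot to try on the next press

    def reset(self):
        """Start over at the first string."""
        self.index = 0

    def press(self):
        """Return the next non-empty string, or None if every slot is empty."""
        for _ in range(len(self.strings)):
            s = self.strings[self.index]
            self.index = (self.index + 1) % len(self.strings)
            if s:                       # empty strings are skipped
                return s
        return None


def press(buttons, name):
    """Press one button; every other button starts over at its first string."""
    for other, button in buttons.items():
        if other != name:
            button.reset()
    return buttons[name].press()
```

With `buttons = {"left": ButtonStrings(["7", "4", "1", ""]), "up": ButtonStrings(["8"])}`, three presses of `"left"` step through the three side views, a fourth press wraps around to the first string, and pressing `"up"` in between sends the left button back to its first string.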
One word of caution concerning this patch: it only works with the basic XWing game. It does not support the add-on modules (Imperial Pursuit, B-Wing, Tie Fighter, etc.). Furthermore, this patch assumes that the basic XWing code has not changed over the years. It could be that a recent release of the XWing game uses new joystick routines and the code associated with this application will not be able to locate or patch those new routines. This patch will detect such a situation and will not patch XWing if this is the case. You must have sufficient free RAM for this patch, XWing, and anything else you have loaded into memory at the same time (the exact amount of RAM XWing needs depends upon the features you’ve installed; a fully installed system requires slightly more than 610K free).

Without further ado, here’s the FSPXW code:
                .286
                page    58, 132
                name    FSPXW
                title   FSPXW (Flightstick Pro driver for XWING).
                subttl  Copyright (C) 1994 Randall Hyde.

; FSPXW.EXE
;
;       Usage:
;               FSPXW
;
; This program executes the XWING.EXE program and patches it to use the
; Flightstick Pro.

byp             textequ <byte ptr>
wp              textequ <word ptr>

cseg            segment para public 'CODE'
cseg            ends

sseg            segment para stack 'STACK'
sseg            ends

zzzzzzseg       segment para public 'zzzzzzseg'
zzzzzzseg       ends

                include         stdlib.a
                includelib      stdlib.lib
                matchfuncs

                ifndef  debug
Installation    segment para public 'Install'
Installation    ends
                endif

CSEG            segment para public 'CODE'
                assume  cs:cseg, ds:nothing
; Timer interrupt vector

Int1CVect       dword   ?

; PSP-  Program Segment Prefix. Needed to free up memory before running
;       the real application program.

PSP             word    0

; Program Loading data structures (for DOS).

ExecStruct      word    0                       ;Use parent's Environment blk.
                dword   CmdLine                 ;For the cmd ln parms.
                dword   DfltFCB
                dword   DfltFCB
LoadSSSP        dword   ?
LoadCSIP        dword   ?
PgmName         dword   Pgm

; Variables for the throttle pot.
; LastThrottle contains the character last sent (so we only send one copy).
; ThrtlCntDn counts the number of times the throttle routine gets called.

LastThrottle    byte    0
ThrtlCntDn      byte    10

; Button Mask- Used to mask out the programmed buttons when the game
; reads the real buttons.

ButtonMask      byte    0f0h
; The following variables allow the user to reprogram the buttons.

KeyRdf          struct
Ptrs            word    ?                       ;The PTRx fields point at the
ptr2            word    ?                       ; four possible strings of 8 chars
ptr3            word    ?                       ; each. Each button press cycles
ptr4            word    ?                       ; through these strings.
Index           word    ?                       ;Index to next string to output.
Cnt             word    ?
Pgmd            word    ?                       ;Flag = 0 if not redefined.
KeyRdf          ends

; Left codes are output if the cooley switch is pressed to the left.
; Note that the strings are actually zero terminated strings of words.

Left            KeyRdf  <Left1, Left2, Left3, Left4, 0, 6, 1>
Left1           word    '7', 0
Left2           word    '4', 0
Left3           word    '1', 0
Left4           word    0

; Right codes are output if the cooley switch is pressed to the Right.

Right           KeyRdf  <Right1, Right2, Right3, Right4, 0, 6, 1>
Right1          word    '9', 0
Right2          word    '6', 0
Right3          word    '3', 0
Right4          word    0

; Up codes are output if the cooley switch is pressed Up.

Up              KeyRdf  <Up1, Up2, Up3, Up4, 0, 2, 1>
Up1             word    '8', 0
Up2             word    0
Up3             word    0
Up4             word    0

; DownKey codes are output if the cooley switch is pressed Down.

Down            KeyRdf  <Down1, Down2, Down3, Down4, 0, 2, 1>
Down1           word    '2', 0
Down2           word    0
Down3           word    0
Down4           word    0

; Sw0 codes are output if the user pulls the trigger. (This switch is not
; redefined.)

Sw0             KeyRdf  <Sw01, Sw02, Sw03, Sw04, 0, 0, 0>
Sw01            word    0
Sw02            word    0
Sw03            word    0
Sw04            word    0

; Sw1 codes are output if the user presses Sw1 (the left button
; if the user hasn't swapped the left and right buttons). Not Redefined.

Sw1             KeyRdf  <Sw11, Sw12, Sw13, Sw14, 0, 0, 0>
Sw11            word    0
Sw12            word    0
Sw13            word    0
Sw14            word    0

; Sw2 codes are output if the user presses Sw2 (the middle button).

Sw2             KeyRdf  <Sw21, Sw22, Sw23, Sw24, 0, 2, 1>
Sw21            word    'w', 0
Sw22            word    0
Sw23            word    0
Sw24            word    0

; Sw3 codes are output if the user presses Sw3 (the right button
; if the user hasn't swapped the left and right buttons).

Sw3             KeyRdf  <Sw31, Sw32, Sw33, Sw34, 0, 0, 0>
Sw31            word    0
Sw32            word    0
Sw33            word    0
Sw34            word    0

; Switch status buttons:

CurSw           byte    0
LastSw          byte    0
;****************************************************************************
; FSPXW patch begins here. This is the memory resident part. Only put code
; which has to be present at run-time or needs to be resident after
; freeing up memory.
;****************************************************************************

Main            proc
                mov     cs:PSP, ds
                mov     ax, cseg                ;Get ptr to vars segment
                mov     ds, ax

; Get the current INT 1Ch interrupt vector:

                mov     ax, 351ch
                int     21h
                mov     wp Int1CVect, bx
                mov     wp Int1CVect+2, es

; The following call to MEMINIT assumes no error occurs. If it does,
; we're hosed anyway.

                mov     ax, zzzzzzseg
                mov     es, ax
                mov     cx, 1024/16
                meminit2

; Do some initialization before running the game. These are calls to the
; initialization code which gets dumped before actually running XWING.

                call    far ptr ChkBIOS15
                call    far ptr Identify
                call    far ptr Calibrate

; If any switches were programmed, remove those switches from the
; ButtonMask:

                mov     al, 0f0h                ;Assume all buttons are okay.
                cmp     sw0.pgmd, 0
                je      Sw0NotPgmd
                and     al, 0e0h                ;Remove sw0 from contention.
Sw0NotPgmd:
                cmp     sw1.pgmd, 0
                je      Sw1NotPgmd
                and     al, 0d0h                ;Remove Sw1 from contention.
Sw1NotPgmd:
                cmp     sw2.pgmd, 0
                je      Sw2NotPgmd
                and     al, 0b0h                ;Remove Sw2 from contention.
Sw2NotPgmd:
                cmp     sw3.pgmd, 0
                je      Sw3NotPgmd
                and     al, 070h                ;Remove Sw3 from contention.
Sw3NotPgmd:
                mov     ButtonMask, al          ;Save result as button mask

; Now, free up memory from ZZZZZZSEG on to make room for XWING.
; Note: Absolutely no calls to UCR Standard Library routines from
; this point forward! (ExitPgm is okay, it's just a macro which calls DOS.)
; Note that after the execution of this code, none of the code & data
; from zzzzzzseg on is valid.

                mov     bx, zzzzzzseg
                sub     bx, PSP
                inc     bx
                mov     es, PSP
                mov     ah, 4ah
                int     21h
                jnc     GoodRealloc
                print
                byte    "Memory allocation error."
                byte    cr,lf,0
                jmp     Quit

GoodRealloc:

; Now load the XWING program into memory:

                mov     bx, seg ExecStruct
                mov     es, bx
                mov     bx, offset ExecStruct   ;Ptr to program record.
                lds     dx, PgmName
                mov     ax, 4b01h               ;Load, do not exec, pgm
                int     21h
                jc      Quit                    ;If error loading file.

; Search for the joystick code in memory:

                mov     si, zzzzzzseg
                mov     ds, si
                xor     si, si

                mov     di, cs
                mov     es, di
                mov     di, offset JoyStickCode
                mov     cx, JoyLength
                call    FindCode
                jc      Quit                    ;If didn't find joystick code.

; Patch the XWING joystick code here

                mov     byp ds:[si], 09ah       ;Far call
                mov     wp ds:[si+1], offset ReadGame
                mov     wp ds:[si+3], cs

; Find the Button code here.

                mov     si, zzzzzzseg
                mov     ds, si
                xor     si, si

                mov     di, cs
                mov     es, di
                mov     di, offset ReadSwCode
                mov     cx, ButtonLength
                call    FindCode
                jc      Quit

; Patch the button code here.

                mov     byp ds:[si], 9ah
                mov     wp ds:[si+1], offset ReadButtons
                mov     wp ds:[si+3], cs
                mov     byp ds:[si+5], 90h      ;NOP.

; Patch in our timer interrupt handler:

                mov     ax, 251ch
                mov     dx, seg MyInt1C
                mov     ds, dx
                mov     dx, offset MyInt1C
                int     21h

; Okay, start the XWING.EXE program running

                mov     ah, 62h                 ;Get PSP
                int     21h
                mov     ds, bx
                mov     es, bx
                mov     wp ds:[10], offset Quit
                mov     wp ds:[12], cs
                mov     ss, wp cseg:LoadSSSP+2
                mov     sp, wp cseg:LoadSSSP
                jmp     dword ptr cseg:LoadCSIP

Quit:           lds     dx, cs:Int1CVect        ;Restore timer vector.
                mov     ax, 251ch
                int     21h
                ExitPgm
Main            endp
;****************************************************************************
;
; ReadGame-     This routine gets called whenever XWing reads the joystick.
;               On every 10th call it will read the throttle pot and send
;               appropriate characters to the type ahead buffer, if
;               necessary.

                assume  ds:nothing
ReadGame        proc    far
                dec     cs:ThrtlCntDn           ;Only do this each 10th time
                jne     SkipThrottle            ; XWING calls the joystick
                mov     cs:ThrtlCntDn, 10       ; routine.

                push    ax
                push    bx                      ;No need to save bp, dx, or cx as
                push    di                      ; XWING preserves these.

                mov     ah, 84h
                mov     dx, 103h                ;Read the throttle pot
                int     15h

; Convert the value returned by the pot routine into the four characters
; 0..63:"\", 64..127:"[", 128..191:"]", 192..255:<bs>, to denote zero, 1/3,
; 2/3, and full power, respectively.

                mov     dl, al
                mov     ax, "\"                 ;Zero power
                cmp     dl, 192
                jae     SetPower
                mov     ax, "["                 ;1/3 power.
                cmp     dl, 128
                jae     SetPower
                mov     ax, "]"                 ;2/3 power.
                cmp     dl, 64
                jae     SetPower
                mov     ax, 8                   ;BS, full power.
SetPower:       cmp     al, cs:LastThrottle
                je      SkipPIB
                mov     cs:LastThrottle, al
                call    PutInBuffer

SkipPIB:        pop     di
                pop     bx
                pop     ax
SkipThrottle:   neg     bx                      ;XWING returns data in these registers.
                neg     di                      ;We patched the NEG and STI instrs
                sti                             ; so do that here.
                ret
ReadGame        endp

                assume  ds:nothing
ReadButtons     proc    far
                mov     ah, 84h
                mov     dx, 0
                int     15h
                not     al
                and     al, ButtonMask          ;Turn off pgmd buttons.
                ret
ReadButtons     endp
; MyInt1C- Called every 1/18th second. Reads switches and decides if it
; should shove some characters into the type ahead buffer.

                assume  ds:cseg
MyInt1c         proc    far
                push    ds
                push    ax
                push    bx
                push    dx
                mov     ax, cseg
                mov     ds, ax

                mov     al, CurSw
                mov     LastSw, al

                mov     dx, 900h                ;Read the 8 switches.
                mov     ah, 84h
                int     15h

                mov     CurSw, al
                xor     al, LastSw              ;See if any changes
                jz      NoChanges
                and     al, CurSw               ;See if sw just went down.
                jz      NoChanges

; If a switch has just gone down, output an appropriate set of scan codes
; for it, if that key is active. Note that pressing *any* key will reset
; all the other key indexes.

                test    al, 1                   ;See if Sw0 (trigger) was pulled.
                jz      NoSw0
                cmp     Sw0.Pgmd, 0
                je      NoChanges
                mov     ax, 0
                mov     Left.Index, ax          ;Reset the key indexes for all keys
                mov     Right.Index, ax         ; except SW0.
                mov     Up.Index, ax
                mov     Down.Index, ax
                mov     Sw1.Index, ax
                mov     Sw2.Index, ax
                mov     Sw3.Index, ax
                mov     bx, Sw0.Index
                mov     ax, Sw0.Index
                mov     bx, Sw0.Ptrs[bx]
                add     ax, 2
                cmp     ax, Sw0.Cnt
                jb      SetSw0
                mov     ax, 0
SetSw0:         mov     Sw0.Index, ax
                call    PutStrInBuf
                jmp     NoChanges

NoSw0:          test    al, 2                   ;See if Sw1 (left sw) was pressed.
                jz      NoSw1
                cmp     Sw1.Pgmd, 0
                je      NoChanges
                mov     ax, 0
                mov     Left.Index, ax          ;Reset the key indexes for all keys
                mov     Right.Index, ax         ; except Sw1.
                mov     Up.Index, ax
                mov     Down.Index, ax
                mov     Sw0.Index, ax
                mov     Sw2.Index, ax
                mov     Sw3.Index, ax
                mov     bx, Sw1.Index
                mov     ax, Sw1.Index
                mov     bx, Sw1.Ptrs[bx]
                add     ax, 2
                cmp     ax, Sw1.Cnt
                jb      SetSw1
                mov     ax, 0
SetSw1:         mov     Sw1.Index, ax
                call    PutStrInBuf
                jmp     NoChanges

NoSw1:          test    al, 4                   ;See if Sw2 (middle sw) was pressed.
                jz      NoSw2
                cmp     Sw2.Pgmd, 0
                je      NoChanges
                mov     ax, 0
                mov     Left.Index, ax          ;Reset the key indexes for all keys
                mov     Right.Index, ax         ; except Sw2.
                mov     Up.Index, ax
                mov     Down.Index, ax
                mov     Sw0.Index, ax
                mov     Sw1.Index, ax
                mov     Sw3.Index, ax
                mov     bx, Sw2.Index
                mov     ax, Sw2.Index
                mov     bx, Sw2.Ptrs[bx]
                add     ax, 2
                cmp     ax, Sw2.Cnt
                jb      SetSw2
                mov     ax, 0
SetSw2:         mov     Sw2.Index, ax
                call    PutStrInBuf
                jmp     NoChanges

NoSw2:          test    al, 8                   ;See if Sw3 (right sw) was pressed.
                jz      NoSw3
                cmp     Sw3.Pgmd, 0
                je      NoChanges
                mov     ax, 0
                mov     Left.Index, ax          ;Reset the key indexes for all keys
                mov     Right.Index, ax         ; except Sw3.
                mov     Up.Index, ax
                mov     Down.Index, ax
                mov     Sw0.Index, ax
                mov     Sw1.Index, ax
                mov     Sw2.Index, ax
                mov     bx, Sw3.Index
                mov     ax, Sw3.Index
                mov     bx, Sw3.Ptrs[bx]
                add     ax, 2
                cmp     ax, Sw3.Cnt
                jb      SetSw3
                mov     ax, 0
SetSw3:         mov     Sw3.Index, ax
                call    PutStrInBuf
                jmp     NoChanges

NoSw3:          test    al, 10h                 ;See if Cooley was pressed upwards.
                jz      NoUp
                cmp     Up.Pgmd, 0
                je      NoChanges
                mov     ax, 0
                mov     Right.Index, ax         ;Reset all but Up.
                mov     Left.Index, ax
                mov     Down.Index, ax
                mov     Sw0.Index, ax
                mov     Sw1.Index, ax
                mov     Sw2.Index, ax
                mov     Sw3.Index, ax
                mov     bx, Up.Index
                mov     ax, Up.Index
                mov     bx, Up.Ptrs[bx]
                add     ax, 2
                cmp     ax, Up.Cnt
                jb      SetUp
                mov     ax, 0
SetUp:          mov     Up.Index, ax
                call    PutStrInBuf
                jmp     NoChanges

NoUp:           test    al, 20h                 ;See if Cooley was pressed left.
                jz      NoLeft
                cmp     Left.Pgmd, 0
                je      NoChanges
                mov     ax, 0
                mov     Right.Index, ax         ;Reset all but Left.
                mov     Up.Index, ax
                mov     Down.Index, ax
                mov     Sw0.Index, ax
                mov     Sw1.Index, ax
                mov     Sw2.Index, ax
                mov     Sw3.Index, ax
                mov     bx, Left.Index
                mov     ax, Left.Index
                mov     bx, Left.Ptrs[bx]
                add     ax, 2
                cmp     ax, Left.Cnt
                jb      SetLeft
                mov     ax, 0
SetLeft:        mov     Left.Index, ax
                call    PutStrInBuf
                jmp     NoChanges

NoLeft:         test    al, 40h                 ;See if Cooley was pressed Right
                jz      NoRight
                cmp     Right.Pgmd, 0
                je      NoChanges
                mov     ax, 0
                mov     Left.Index, ax          ;Reset all but Right.
                mov     Up.Index, ax
                mov     Down.Index, ax
                mov     Sw0.Index, ax
                mov     Sw1.Index, ax
                mov     Sw2.Index, ax
                mov     Sw3.Index, ax
                mov     bx, Right.Index
                mov     ax, Right.Index
                mov     bx, Right.Ptrs[bx]
                add     ax, 2
                cmp     ax, Right.Cnt
                jb      SetRight
                mov     ax, 0
SetRight:       mov     Right.Index, ax
                call    PutStrInBuf
                jmp     NoChanges

NoRight:        test    al, 80h                 ;See if Cooley was pressed Downward.
                jz      NoChanges
                cmp     Down.Pgmd, 0
                je      NoChanges
                mov     ax, 0
                mov     Left.Index, ax          ;Reset all but Down.
                mov     Up.Index, ax
                mov     Right.Index, ax
                mov     Sw0.Index, ax
                mov     Sw1.Index, ax
                mov     Sw2.Index, ax
                mov     Sw3.Index, ax
                mov     bx, Down.Index
                mov     ax, Down.Index
                mov     bx, Down.Ptrs[bx]
                add     ax, 2
                cmp     ax, Down.Cnt
                jb      SetDown
                mov     ax, 0
SetDown:        mov     Down.Index, ax
                call    PutStrInBuf

NoChanges:      pop     dx
                pop     bx
                pop     ax
                pop     ds
                jmp     cs:Int1CVect
MyInt1c         endp
                assume  ds:nothing
; PutStrInBuf- BX points at a zero terminated string of words.
;              Output each word by calling PutInBuffer.

PutStrInBuf     proc    near
                push    ax
                push    bx
PutLoop:        mov     ax, [bx]
                test    ax, ax
                jz      PutDone
                call    PutInBuffer
                add     bx, 2
                jmp     PutLoop

PutDone:        pop     bx
                pop     ax
                ret
PutStrInBuf     endp

; PutInBuffer- Outputs character and scan code in AX to the type ahead
; buffer.

                assume  ds:nothing
KbdHead         equ     word ptr ds:[1ah]
KbdTail         equ     word ptr ds:[1ch]
KbdBuffer       equ     word ptr ds:[1eh]
EndKbd          equ     3eh
Buffer          equ     1eh

PutInBuffer     proc    near
                push    ds
                push    bx
                mov     bx, 40h
                mov     ds, bx
                pushf
                cli                             ;This is a critical region!
                mov     bx, KbdTail             ;Get ptr to end of type
                inc     bx                      ; ahead buffer and make room
                inc     bx                      ; for this character.
                cmp     bx, buffer+32           ;At physical end of buffer?
                jb      NoWrap
                mov     bx, buffer              ;Wrap back to 1eH if at end.
;
NoWrap:         cmp     bx, KbdHead             ;Buffer overrun?
                je      PIBDone
                xchg    KbdTail, bx             ;Set new, get old, ptrs.
                mov     ds:[bx], ax             ;Output AX to old location.
PIBDone:        popf                            ;Restore interrupts
                pop     bx
                pop     ds
                ret
PutInBuffer     endp
;****************************************************************************
;
; FindCode: On entry, ES:DI points at some code in *this* program which
;           appears in the ATP game. DS:SI points at a block of memory
;           in the XWing game. FindCode searches through memory to find the
;           suspect piece of code and returns DS:SI pointing at the start of
;           that code. This code assumes that it *will* find the code!
;           It returns the carry clear if it finds it, set if it doesn't.

FindCode        proc    near
                push    ax
                push    bx
                push    dx

DoCmp:          mov     dx, 1000h
CmpLoop:        push    di                      ;Save ptr to compare code.
                push    si                      ;Save ptr to start of string.
                push    cx                      ;Save count.
        repe    cmpsb
                pop     cx
                pop     si
                pop     di
                je      FoundCode
                inc     si
                dec     dx
                jne     CmpLoop
                sub     si, 1000h
                mov     ax, ds
                inc     ah
                mov     ds, ax
                cmp     ax, 9000h
                jb      DoCmp

                pop     dx
                pop     bx
                pop     ax
                stc
                ret

FoundCode:      pop     dx
                pop     bx
                pop     ax
                clc
                ret
FindCode        endp
;****************************************************************************
;
; Joystick and button routines which appear in XWing game. This code is
; really data as the INT 21h patch code searches through memory for this code
; after loading a file from disk.

JoyStickCode    proc    near
                sti
                neg     bx
                neg     di
                pop     bp
                pop     dx
                pop     cx
                ret
                mov     bp, bx
                in      al, dx
                mov     bl, al
                not     al
                and     al, ah
                jnz     $+11h
                in      al, dx
JoyStickCode    endp
EndJSC:

JoyLength       =       EndJSC-JoyStickCode

ReadSwCode      proc
                mov     dx, 201h
                in      al, dx
                xor     al, 0ffh
                and     ax, 0f0h
ReadSwCode      endp
EndRSC:

ButtonLength    =       EndRSC-ReadSwCode

cseg            ends

Installation    segment

; Move these things here so they do not consume too much space in the
; resident part of the patch.

DfltFCB         byte    3," ",0,0,0,0,0
CmdLine         byte    2, " ", 0dh, 126 dup (" ")      ;Cmd line for program
Pgm             byte    "XWING.EXE",0
                byte    128 dup (?)                     ;For user's name

; ChkBIOS15- Checks to see if the INT 15 driver for FSPro is present in memory.

ChkBIOS15       proc    far
                mov     ah, 84h
                mov     dx, 8100h
                int     15h
                mov     di, bx
                strcmpl
                byte    "CH Products:Flightstick Pro",0
                jne     NoDriverLoaded
                ret

NoDriverLoaded: print
                byte    "CH Products SGDI driver for Flightstick Pro is not "
                byte    "loaded into memory.",cr,lf
                byte    "Please run FSPSGDI before running this program."
                byte    cr,lf,0
                exitpgm
ChkBIOS15       endp
;****************************************************************************
;
; Identify-     Prints a sign-on message.

                assume  ds:nothing
Identify        proc    far

; Print a welcome string. Note that the string "VersionStr" will be
; modified by the "version.exe" program each time you assemble this code.

                print
                byte    cr,lf,lf
                byte    "X W I N G P A T C H",cr,lf
                byte    "CH Products Flightstick Pro",cr,lf
                byte    "Copyright 1994, Randall Hyde",cr,lf
                byte    lf
                byte    0

                ret
Identify        endp
;****************************************************************************
;
; Calibrate the throttle down here:

                assume  ds:nothing
Calibrate       proc    far
                print
                byte    cr,lf,lf
                byte    "Calibration:",cr,lf,lf
                byte    "Move the throttle to one extreme and press any "
                byte    "button:",0

                call    Wait4Button
                mov     ah, 84h
                mov     dx, 1h
                int     15h
                push    dx                      ;Save pot 3 reading.

                print
                byte    cr,lf
                byte    "Move the throttle to the other extreme and press "
                byte    "any button:",0

                call    Wait4Button
                mov     ah, 84h
                mov     dx, 1
                int     15h
                pop     bx
                mov     ax, dx
                cmp     ax, bx
                jb      RangeOkay
                xchg    ax, bx
RangeOkay:      mov     cx, bx                  ;Compute a centered value.
                sub     cx, ax
                shr     cx, 1
                add     cx, ax
                mov     ah, 84h
                mov     dx, 303h                ;Calibrate pot three.
                int     15h
                ret
Calibrate       endp

Wait4Button     proc    near
                mov     ah, 84h                 ;First, wait for all buttons
                mov     dx, 0                   ; to be released.
                int     15h
                and     al, 0F0h
                cmp     al, 0F0h
                jne     Wait4Button

                mov     cx, 0
Delay:          loop    Delay

Wait4Press:     mov     ah, 1                   ;Eat any characters from the
                int     16h                     ; keyboard which come along, and
                je      NoKbd                   ; handle ctrl-C as appropriate.
                getc

NoKbd:          mov     ah, 84h                 ;Now wait for any button to be
                mov     dx, 0                   ; pressed.
                int     15h
                and     al, 0F0h
                cmp     al, 0F0h
                je      Wait4Press

                ret
Wait4Button     endp
Installation    ends

sseg            segment para stack 'STACK'
                word    256 dup (0)
endstk          word    ?
sseg            ends

zzzzzzseg       segment para public 'zzzzzzseg'
Heap            byte    1024 dup (0)
zzzzzzseg       ends
                end     Main

24.8
Summary

The PC’s game adapter card lets you connect a wide variety of game related input devices to your PC. Such devices include digital joysticks, paddles, analog joysticks, steering wheels, yokes, and more. Paddle input devices provide one degree of freedom; joysticks provide two degrees of freedom along an (X,Y) axis pair. Steering wheels and yokes also provide two degrees of freedom, though they are designed for different types of games. For more information on these input devices, see

• “Typical Game Devices” on page 1255
Most game input devices connect to the PC through the game adapter card. This device provides for up to four digital (switch) inputs and four analog (resistive) inputs. This device appears as a single I/O location in the PC’s I/O address space. Four of the bits at this port correspond to the four switches; the other four bits provide the status of the timer pulses from the 558 chip for the analog inputs. The switches you can read directly from the port; to read the analog inputs, you must create a timing loop to count how long it takes for the pulse associated with a particular device to go from high to low. For more information on the game adapter hardware, see:

• “The Game Adapter Hardware” on page 1257
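The timing loop mentioned above amounts to counting polls until one bit of the port drops to zero. The Python sketch below shows only that counting logic. `read_port` is a hypothetical callable standing in for an `in al, dx` on the game port, and the timeout value is an arbitrary choice. Because the listings in this chapter mask the switches with 0F0h, the sketch assumes the pulse bits occupy the low four bits. It also leaves out whatever step starts the pulse on real hardware.

```python
def read_pot_raw(read_port, bit, timeout=10000):
    """Count polls until the pulse bit for one pot goes from high to low.

    read_port: hypothetical callable returning the current port byte.
    bit:       pulse bit to watch (0..3, assumed to be the low nibble).
    Returns the poll count, or None if the bit never drops within
    `timeout` polls (for example, when no device is attached).
    """
    mask = 1 << bit
    for count in range(timeout):
        if (read_port() & mask) == 0:
            return count
    return None
```

The returned count is a raw value: it depends on the resistance of the pot and also on how fast the loop runs, which is why the same stick gives different readings on different machines.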
Programming the game adapter would be a simple task except that you will get different readings for the same relative pot position with different game adapter cards, game input devices, computer systems, and software. The real trick to programming the game adapter is to produce consistent results, regardless of the actual hardware in use. If you can live with raw input values, the BIOS provides two functions to read the switches and the analog inputs. However, if you need normalized values, you will probably have to write your own code. Still, writing such code is very easy if you remember some basic high school algebra. To see how this is done, check out

• “Using BIOS’ Game I/O Functions” on page 1259
• “Writing Your Own Game I/O Routines” on page 1260
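The algebra in question is linear interpolation between calibration points. The following Python sketch maps a raw reading onto the range 0..255 using a calibrated minimum, center, and maximum. It is one reasonable way to do this under stated assumptions, not the Normalize routine used by the drivers in this chapter; the function name, the parameter names, and the choice to place the centered position at 128 are all hypothetical.

```python
def normalize(raw, lo, center, hi):
    """Map a raw pot reading to 0..255 with the centered position at 128.

    lo, center, hi are the calibrated minimum, centered, and maximum raw
    readings (assumed to satisfy lo < center < hi).
    """
    if not lo < center < hi:
        raise ValueError("calibration values must satisfy lo < center < hi")
    raw = max(lo, min(raw, hi))             # clamp out-of-range readings
    if raw < center:
        # Lower half: lo..center maps onto 0..128.
        return (raw - lo) * 128 // (center - lo)
    # Upper half: center..hi maps onto 128..255.
    return 128 + (raw - center) * 127 // (hi - center)
```

Treating the two halves separately keeps the centered stick at the same output value even when the pot's midpoint is not halfway between its extremes.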
As with the other devices on the PC, there is a problem with accessing the game adapter hardware directly: such code will not work with game input hardware that doesn’t adhere strictly to the original PC’s design criteria. Fancy game input devices like the Thrustmaster joystick and the CH Products’ FlightStick Pro will require you to write special software drivers. Furthermore, your basic joystick code may not even work with future devices, even if they provide a minimal set of features compatible with standard game input devices. Unfortunately, the BIOS services are very slow and not very good, so few programmers make the BIOS calls that would allow third party developers to provide replacement device drivers for their game devices. To help alleviate this problem, this chapter presents the Standard Game Device Interface (SGDI) application programmer’s interface, a set of functions specifically designed to provide an extensible, portable system for game input device programmers. The current specification provides for up to 256 digital and 256 analog input devices and is easily extended to handle output devices and other input devices as well. For the details, see

• “The Standard Game Device Interface (SGDI)” on page 1262
• “Application Programmer’s Interface (API)” on page 1262
Since this chapter introduces the SGDI driver, there aren’t many SGDI drivers provided by game adapter manufacturers at this point. So if you write software that makes SGDI driver calls, you will find that there are few machines that will have an SGDI TSR in memory. Therefore, this chapter provides SGDI drivers for the standard game adapter card and the standard input devices. It also provides an SGDI driver for the CH Products’ FlightStick Pro joystick. To obtain these freely distributable drivers, see

• “An SGDI Driver for the Standard Game Adapter Card” on page 1265
• “An SGDI Driver for the CH Products’ Flight Stick Pro” on page 1280
This chapter concludes with an example of a semiresident program that makes SGDI calls. This program, which patches the popular XWing game, provides full support for the CH Products’ FlightStick Pro in XWing. This program demonstrates many of the features of an SGDI driver as well as providing an example of how to patch a commercially available game. For the explanation and the source code, see

• “Patching Existing Games” on page 1293
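The search-and-patch idea behind that program can be shown on a plain byte buffer. This Python sketch is an illustration only; the real patch is the 80x86 code in “Patching Existing Games”. The function name and its parameters are hypothetical, while the opcodes 9Ah (far call) and 90h (NOP) are the ones the patch stores.

```python
import struct

FAR_CALL = 0x9A     # opcode byte for a far call
NOP = 0x90          # opcode byte for a no-operation instruction


def patch_far_call(memory, pattern, offset, segment, nops=0):
    """Find `pattern` in `memory` and overwrite its start with a far call.

    memory:  bytearray holding the loaded program image (modified in place).
    pattern: bytes of the routine to look for.
    offset, segment: 16-bit target address of the replacement routine.
    nops:    number of NOP bytes to store after the five-byte call.
    Returns the index that was patched, or -1 if the pattern is absent.
    """
    patch = (bytes([FAR_CALL])
             + struct.pack("<HH", offset, segment)   # offset first, then segment
             + bytes([NOP]) * nops)
    if len(pattern) < len(patch):
        raise ValueError("pattern is shorter than the patch")
    at = memory.find(pattern)
    if at < 0:
        return -1                   # leave memory untouched, as the patch does
    memory[at:at + len(patch)] = patch
    return at
```

Returning -1 without touching the buffer mirrors the behavior described earlier: if the expected routine is not found, nothing is patched.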
Optimizing Your Programs
Chapter 25
Since program optimization is generally one of the last steps in software development, it is only fitting to discuss program optimization in the last chapter of this text. Scanning through other texts that cover this subject, you will find a wide variety of opinions. Some texts and articles ignore instruction sets altogether and concentrate on finding a better algorithm. Other documents assume you’ve already found the best algorithm and discuss ways to select the “best” sequence of instructions to accomplish the job. Others consider the CPU architecture and describe how to “count cycles” and pair instructions (especially on superscalar processors or processors with pipelines) to produce faster running code. Others, still, consider the system architecture, not just the CPU architecture, when attempting to decide how to optimize your program. Some authors spend a lot of time explaining that their method is the “one true way” to faster programs. Others still get off on a software engineering tangent and start talking about how time spent optimizing a program isn’t worthwhile for a variety of reasons. Well, this chapter is not going to present the “one true way,” nor is it going to spend a lot of time bickering about certain optimization techniques. It will simply present you with some examples, options, and suggestions. Since you’re on your own after this chapter, it’s time for you to start making some of your own decisions. Hopefully, this chapter can provide suitable information so you can make correct decisions.
25.0
Chapter Overview
25.1
When to Optimize, When Not to Optimize

The optimization process is not cheap. If you develop a program and then determine that it is too slow, you may have to redesign and rewrite major portions of that program to get acceptable performance. Based on this point alone, the world often divides itself into two camps: those who optimize early and those who optimize late. Both groups have good arguments; both groups have some bad arguments. Let’s take a look at both sides of this argument.

The “optimize late” (OL) crowd uses the 90/10 argument: 90% of a program’s execution time is spent in 10% of the code[1]. If you try to optimize every piece of code you write (that is, optimize the code before you know that it needs to be optimized), 90% of your effort will go to waste. On the other hand, if you write the code in a normal fashion first and then go in and optimize, you can improve your program’s performance with less work. After all, if you completely removed the 90% portion of your program, your code would only run about 10% faster. On the other hand, if you completely remove that 10% portion, your program will run about 10 times faster. The math is obviously in favor of attacking the 10%. The OL crowd claims that you should write your code with only the normal attention to performance (i.e., given a choice between an O(n²) and an O(n lg n) algorithm, you should choose the latter). Once the program is working correctly you can go back and concentrate your efforts on that 10% of the code that takes all the time.

The OL arguments are persuasive. Optimization is a laborious and difficult process. More often than not there is no clear-cut way to speed up a section of code. The only way to determine which of several different options is better is to actually code them all up and compare them. Attempting to do this on the entire program is impractical.
However, if you can find that 10% of the code and optimize that, you’ve reduced your workload by 90%, very inviting indeed. Another good arguement the OL group uses is that few programmers are capable of anticipating where the time will be spent in a program. Therefore, the only real way to determine where a program spends its time is to instrument it and measure which functions consume the most time. Obviously, you must have a working program before you can do this. Once
1. Some people prefer to call this the 80/20 rule (80% of the time is spent in 20% of the code) to be safer in their estimates. The exact numbers don’t matter. What is important is that most of a program’s execution time is spent in a small amount of the code.
Chapter 25
again, they argue that any time spent optimizing the code beforehand is bound to be wasted since you will probably wind up optimizing that 90% that doesn’t need it.

There are, however, some very good counterarguments to the above. First, when most OL types start talking about the 90/10 rule, there is this implicit suggestion that this 10% of the code appears as one big chunk in the middle of the program. A good programmer, like a good surgeon, can locate this malignant mass, cut it out, and replace it with something much faster, thus boosting the speed of your program with only a little effort. Unfortunately, this is not often the case in the real world. In real programs, that 10% of the code that takes up 90% of the execution time is often spread all over your program. You’ll get 1% here, 0.5% over there, a “gigantic” 2.5% in one function, and so on. Worse still, optimizing 1% of the code within one function often requires that you modify some of the other code as well. For example, rewriting a function (the 1%) to speed it up quite a bit may require changing the way you pass parameters to that function. This may require rewriting several sections of code outside that slow 10%. So often you wind up rewriting much more than 10% of the code in order to speed up that 10% that takes 90% of the time.

Another problem with the 90/10 rule is that it works on percentages, and the percentages change during optimization. For example, suppose you located a single function that was consuming 90% of the execution time. Let’s suppose you’re Mr. Super Programmer and you managed to speed this routine up by a factor of two. Your program will now take about 55% of the time it took to run before it was optimized [2]. If you triple the speed of this routine, your program takes a total of 40% of the original time to execute. If you are really great and you manage to get that function running nine times faster, your program now runs in 20% of the original time, i.e., five times faster.
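The percentages above all come from one small calculation: the part of the run time you did not touch stays the same, and the part you optimized shrinks by the speedup factor. The following is a minimal sketch of that arithmetic, written in Python purely for brevity (it is not one of this text's languages). The function name remaining_time and its parameters are illustrative assumptions, not something defined elsewhere in this chapter.

```python
def remaining_time(hot_fraction, speedup):
    """Fraction of the original run time left after the hot code, which
    accounts for hot_fraction of the run time, runs `speedup` times faster."""
    untouched = 1.0 - hot_fraction      # this part runs exactly as before
    optimized = hot_fraction / speedup  # this part shrinks by the speedup
    return untouched + optimized


# The function that consumes 90% of the time, sped up 2x, 3x, and 9x.
for factor in (2, 3, 9):
    left = remaining_time(0.90, factor)
    print(f"hot code {factor}x faster -> {left:.0%} of original time, "
          f"{1 / left:.2f}x faster overall")
```

Note how quickly the returns diminish: going from a 3x to a 9x speedup of the hot function only moves the whole program from 40% to 20% of its original run time, because the untouched 10% becomes an ever larger share of what remains.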
Suppose you could get that function running nine times faster. Notice that the 90/10 rule no longer applies to your program. 50% of the execution time is spent in 10% of your code, 50% is spent in the other 90% of your code. And if you’ve managed to speed up that one function by 900%, it is very unlikely you’re going to squeeze much more out of it (unless it was really bad to begin with). Is it worthwhile messing around with that other 90% of your code? You bet it is. After all, you can improve the performance of your program by 25% if you double the speed of that other code. Note, however, that you only get a 25% performance boost after you optimized the 10% as best you could. Had you optimized the 90% of your program first, you would only have gotten a 5% performance improvement; hardly something you’d write home about. Nonetheless, you can see some situations where the 90/10 rule obviously doesn’t apply and you can see some cases where optimizing that 90% can produce a good boost in performance. The OL group will smile and say “see, that’s the benefit of optimizing late, you can optimize in stages and get just the right amount of optimization you need.” The optimize early (OE) group uses the flaw in percentage arithmetic to point out that you will probably wind up optimizing a large portion of your program anyway. So why not work all this into your design in the first place? A big problem with the OL strategy is that you often wind up designing and writing the program twice – once just to get it functional, the second time to make it practical. After all, if you’re going to have to rewrite that 90% anyway, why not write it fast in the first place? The OE people also point out that although programmers are notoriously bad at determining where a program spends most of its time, there are some obvious places where they know there will be performance problems. Why wait to discover the obvious? 
Why not handle such problem areas early on so there is less time spent measuring and optimizing that code?

Like so many other arguments in Software Engineering, the two camps become quite polarized and swear by a totally pure approach in either direction (either all OE or all OL). Like so many other arguments in Computer Science, the truth actually lies somewhere between these two extremes. Any project where the programmer sets out to design the perfect program without worrying about performance until the end is doomed. Most programmers in this scenario write terribly slow code. Why? Because it’s easier to do so and they can always “solve the performance problem during the optimization phase.” As a result, the 90% portion of the program is often so slow that even if the time of the other 10% were reduced to zero,
2. Figure the 90% of the code originally took one unit of time to execute and the 10% of the code originally took nine units of time to execute. If we cut the execution time of the 10% in half, we now have 1 unit plus 4.5 units = 5.5 units out of 10, or 55%.
Optimizing Your Programs
the program would still be way too slow. On the other hand, the OE crowd gets so caught up in writing the best possible code that they miss deadlines and the product may never ship.

There is one undeniable fact that favors the OL argument: optimized code is difficult to understand and maintain. Furthermore, it often contains bugs that are not present in the unoptimized code. Since incorrect code is unacceptable, even if it does run faster, one very good argument against optimizing early is the fact that testing, debugging, and quality assurance represent a large portion of the program development cycle. Optimizing early may create so many additional program errors that you lose any time saved by not having to optimize the program later in the development cycle.

The correct time to optimize a program is, well, at the correct time. Unfortunately, the “correct time” varies with the program. However, the first step is to develop program performance requirements along with the other program specifications. The system analyst should develop target response times for all user interactions and computations. During development and testing, programmers have a target to shoot for, so they can’t get lazy and wait for the optimization phase before writing code that performs reasonably well. On the other hand, they also have a target to shoot for and once the code is running fast enough, they don’t have to waste time, or make their code less maintainable; they can go on and work on the rest of the program. Of course, the system analyst could misjudge performance requirements, but this won’t happen often with a good system design.

Another consideration is when to perform what. There are several types of optimizations you can perform. For example, you can rearrange instructions to avoid hazards to double the speed of a piece of code. Or you could choose a different algorithm that could run twice as fast.
One big problem with optimization is that it is not a single process and many types of optimizations are best done later rather than earlier, or vice versa. For example, choosing a good algorithm is something you should do early on. If you decide to use a better algorithm after implementing a poor one, most of the work on the code implementing the old algorithm is lost. Likewise, instruction scheduling is one of the last optimizations you should do. Any changes to the code after rearranging instructions for performance may force you to spend time rearranging them again later. Clearly, the lower level the optimization (i.e., relying upon CPU or system parameters), the later you should perform it. Conversely, the higher level the optimization (e.g., choice of algorithm), the sooner you should perform it. In all cases, though, you should have target performance values in mind while developing code.
25.2 How Do You Find the Slow Code in Your Programs?

Although there are problems with the 90/10 rule, the concept behind it is basically solid: programs tend to spend a large amount of their time executing only a small percentage of the code. Clearly, you should optimize the slowest portion of your code first. The only problem is how does one find the slowest code in a program? There are four common techniques programmers use to find the “hot spots” (the places where programs spend most of their time). The first is by trial and error. The second is to optimize everything. The third is to analyze the program. The fourth is to use a profiler or other software monitoring tool to measure the performance of various parts of a program. After locating a hot spot, the programmer can attempt to analyze that section of the program.

The trial and error technique is, unfortunately, the most common strategy. A programmer will speed up various parts of the program by making educated guesses about where it is spending most of its time. If the programmer guesses right, the program will run much faster after optimization. Experienced programmers often use this technique successfully to quickly locate and optimize a program. When the programmer guesses correctly, this technique minimizes the amount of time spent looking for hot spots in a program. Unfortunately, most programmers make fairly poor guesses and wind up optimizing the wrong sections of code. Such effort often goes to waste since optimizing the wrong 10% will not improve performance significantly. One of the prime reasons this technique fails so often is that it is often the first choice of inexperienced programmers who cannot easily recognize slow code. Unfortunately, they are probably
unaware of other techniques, so rather than try a structured approach, they start making (often) uneducated guesses. Another way to locate and optimize the slow portion of a program is to optimize everything. Obviously, this technique does not work well for large programs, but for short sections of code it works reasonably well. Later, this text will provide a short example of an optimization problem and will use this technique to optimize the program. Of course, for large programs or routines this may not be a cost effective approach. However, where appropriate it can save you time while optimizing your program (or at least a portion of your program) since you will not need to carefully analyze and measure the performance of your code. By optimizing everything, you are sure to optimize the slow code. The analysis method is the most difficult of the four. With this method, you study your code and determine where it will spend most of its time based on the data you expect it to process. In theory, this is the best technique. In practice, human beings generally demonstrate a distaste for such analysis work. As such, the analysis is often incorrect or takes too long to complete. Furthermore, few programmers have much experience studying their code to determine where it is spending most of its time, so they are often quite poor at locating hot spots by studying their listings when the need arises. Despite the problems with program analysis, this is the first technique you should always use when attempting to optimize a program. Almost all programs spend most of their time executing the body of a loop or recursive function calls. Therefore, you should try to locate all recursive function calls and loop bodies (especially nested loops) in your program. Chances are very good that a program will be spending most of its time in one of these two areas of your program. Such spots are the first to consider when optimizing your programs. 
Although the analytical method provides a good way to locate the slow code in a program, analyzing a program is a slow, tedious, and boring process. It is very easy to completely miss the most time consuming portion of a program, especially in the presence of indirectly recursive function calls. Even locating time consuming nested loops is often difficult. For example, you might not realize, when looking at a loop within a procedure, that it is a nested loop by virtue of the fact that the calling code executes a loop when calling the procedure. In theory, the analytical method should always work. In practice, it is only marginally successful given that fallible humans are doing the analysis. Nevertheless, some hot spots are easy to find through program analysis, so your first step when optimizing a program should be analysis.

Since programmers are notoriously bad at analyzing programs to find their hot spots, it would make sense to try to automate this process. This is precisely what a profiler can do for you. A profiler is a small program that measures how long your code spends in any one portion of the program. A profiler typically works by interrupting your code periodically and noting the return address. The profiler builds a histogram of interrupt return addresses (generally rounded to some user specified value). By studying this histogram, you can determine where the program spends most of its time. This tells you which sections of the code you need to optimize. Of course, to use this technique, you will need a profiler program. Borland, Microsoft, and several other vendors provide profilers and other optimization tools.
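To make the histogram idea concrete, here is a minimal sketch of the bookkeeping half of a sampling profiler, written in Python for brevity. It assumes the return addresses have already been captured by a periodic interrupt and only shows the rounding and counting. The names build_histogram, hot_spots, and granularity, and the sample addresses themselves, are hypothetical and do not come from any real profiler.

```python
from collections import Counter


def build_histogram(samples, granularity=16):
    """Round each sampled return address down to a multiple of
    `granularity` and count how many samples fall in each bucket."""
    return Counter(addr - addr % granularity for addr in samples)


def hot_spots(histogram, top=3):
    """Return (bucket_start, share_of_samples) pairs, busiest bucket first."""
    total = sum(histogram.values())
    return [(bucket, count / total)
            for bucket, count in histogram.most_common(top)]


# Hypothetical return addresses noted by a periodic timer interrupt.
samples = [0x1004, 0x1008, 0x100C, 0x1004, 0x2010, 0x1008, 0x3020, 0x1000]

for bucket, share in hot_spots(build_histogram(samples)):
    print(f"{bucket:#06x}: {share:.0%} of samples")
```

A coarse granularity groups whole routines into one bucket, which is useful for a first look; a fine granularity can single out an individual loop body, at the cost of needing more samples before the counts mean anything.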
25.3 Is Optimization Necessary?

Except for fun and education, you should never approach a project with the attitude that you are going to get maximal performance out of your code. Years ago, this was an important attitude because that’s what it took to get anything decent running on the slow machines of that era. Reducing the run time of a program from ten minutes to ten seconds made many programs commercially viable. On the other hand, speeding up a program that takes 0.1 seconds to the point where it runs in a millisecond is often pointless. You will waste a lot of effort improving the performance, yet few people will notice the difference.

This is not to say that speeding up programs from 0.1 seconds to 0.001 seconds is never worthwhile. If you are writing a data capture program that requires you to take a reading every millisecond, and it can only handle ten readings per second as currently written, you’ve got your work cut out for you. Furthermore, even if your program runs fast enough already, there are reasons why you would want to make it run twice as fast. For example, suppose someone can use your program in a multitasking environment. If you modify your program to run twice as fast, the user will be able to run another program alongside yours and not notice the performance degradation.

However, the thing to always keep in mind is that you need to write software that is fast enough. Once a program produces results instantaneously (or so close to instantaneous that the user can’t tell), there is little need to make it run any faster. Since optimization is an expensive and error prone process, you want to avoid it as much as possible. Writing programs that run faster than fast enough is a waste of time. However, as is obvious from the set of bloated application programs you’ll find today, this really isn’t a problem; most programmers produce code that is way too slow, not way too fast.

A common reason stated for not producing optimal code is advancing hardware design. Many programmers and managers feel that the high-end machines they develop software on today will be the mid-range machines two years from now when they finally release their software. So if they design their software to run on today’s very high-end machines, it will perform okay on midrange machines when they release their software.

There are two problems with the approach above. First, the operating system running on those machines two years from now will gobble a large part of the machine’s resources (including CPU cycles). It is interesting to note that today’s machines are hundreds of times faster than the original 8088 based PCs, yet many applications actually run slower than those that ran on the original PC.
True, today’s software provides many more features beyond what the original PC provided, but that’s the whole point of this argument: customers will demand features like multiple windows, GUI, pull-down menus, etc., that all consume CPU cycles. You cannot assume that newer machines will provide extra clock cycles so your slow code will run faster. The OS or user interface to your program will wind up eating those extra available clock cycles.

So the first step is to realistically determine the performance requirements of your software. Then write your software to meet that performance goal. If you fail to meet the performance requirements, then it is time to optimize your program. However, you shouldn’t waste additional time optimizing your code once your program meets or exceeds the performance specifications.
25.4 The Three Types of Optimization

There are three forms of optimization you can use when improving the performance of a program. They are choosing a better algorithm (high level optimization), implementing the algorithm better (a medium level optimization), and “counting cycles” (a low level optimization). Each technique has its place and, generally, you apply them at different points in the development process.

Choosing a better algorithm is the most highly touted optimization technique. Alas, it is the technique used least often. It is easy for someone to announce that you should always find a better algorithm if you need more speed; but finding that algorithm is a little more difficult. First, let us define an algorithm change as using a fundamentally different technique to solve the problem. For example, switching from a “bubble sort” algorithm to a “quick sort” algorithm is a good example of an algorithm change. Generally, though certainly not always, changing algorithms means you use a program with a better Big-Oh function [3]. For example, when switching from the bubble sort to the quick sort, you are swapping an algorithm with an O(n^2) running time for one with an O(n lg n) expected running time.

You must remember the restrictions on Big-Oh functions when comparing algorithms. The value for n must be sufficiently large to mask the effect of hidden constants. Furthermore, Big-Oh analysis is usually worst-case and may not apply to your program. For example, if you wish to sort an array that is “nearly” sorted to begin with, the bubble sort algorithm is usually much faster than the quicksort algorithm, regardless of the value for n. For data that is almost sorted, the bubble sort runs in almost O(n) time whereas the quicksort algorithm runs in O(n^2) time [4].

3. Big-Oh functions are approximations of the running time of a program.

The second thing to keep in mind is the constant itself. If two algorithms have the same Big-Oh function, you cannot determine any difference between the two based on the Big-Oh analysis. This does not mean that they will take the same amount of time to run. Don’t forget, in Big-Oh analysis we throw out all the low order terms and multiplicative constants. The asymptotic notation is of little help in this case.

To get truly phenomenal performance improvements requires an algorithmic change to your program. However, discovering an O(n lg n) algorithm to replace your O(n^2) algorithm is often difficult if a published solution does not already exist. Presumably, a well-designed program is not going to contain many obvious algorithms you can dramatically improve (if it did, it wouldn’t be well-designed, now, would it?). Therefore, attempting to find a better algorithm may not prove successful. Nevertheless, it is always the first step you should take because the following steps operate on the algorithm you have. If you perform the other steps on a bad algorithm and then discover a better algorithm later, you will have to repeat these time-consuming steps all over again on the new algorithm.

There are two steps to discovering a new algorithm: research and development. The first step is to see if you can find a better solution in the existing literature. Failing that, the second step is to see if you can develop a better algorithm on your own. The key thing is to budget an appropriate amount of time to these two activities. Research is an open-ended process. You can always read one more book or article. So you’ve got to decide how much time you’re going to spend looking for an existing solution. This might be a few hours, days, weeks, or months. Whatever you feel is cost-effective.
You then head to the library (or your bookshelf) and begin looking for a better solution. Once your time expires, it is time to abandon the research approach unless you are sure you are on the right track in the material you are studying. If so, budget a little more time and see how it goes. At some point, though, you’ve got to decide that you probably won’t be able to find a better solution and it is time to try to develop a new one on your own.

While searching for a better solution, you should study the papers, texts, articles, etc., exactly as though you were studying for an important test. While it’s true that much of what you study will not apply to the problem at hand, you are learning things that will be useful in future projects. Furthermore, while someone may not provide the solution you need, they may have done some work that is headed in the same direction that you are and could provide some good ideas, if not the basis, for your own solution. However, you must always remember that the job of an engineer is to provide a cost-effective solution to a problem. If you waste too much time searching for a solution that may not appear anywhere in the literature, you will cause a cost overrun on your project. So know when it’s time to “hang it up” and get on with the rest of the project.

Developing a new algorithm on your own is also open-ended. You could literally spend the rest of your life trying to find an efficient solution to an intractable problem. So once again, you need to budget some time for this process accordingly. Spend the time wisely trying to develop a better solution to your problem, but once the time is exhausted, it’s time to try a different approach rather than waste any more time chasing a “holy grail.”

Be sure to use all resources at your disposal when trying to find a better algorithm. A local university’s library can be a big help. Also, you should network yourself.
Attend local computer club meetings, discuss your problems with other engineers, or talk to interested friends; maybe they’ve read about a solution that you’ve missed. If you have access to the Internet, BIX, CompuServe, or other technically oriented on-line services or computerized bulletin board systems, by all means post a message asking for help. With literally millions of users out there, if a better solution exists for your problem, someone has probably solved it for you already. A few posts may turn up a solution you were unable to find or develop yourself.

At some point or another, you may have to admit failure. Actually, you may have to admit success: you’ve already found as good an algorithm as you can. If this is still too slow for your requirements, it may be time to try some other technique to improve the speed of your program. The next step is to see if you
4. Yes, O(n^2). The O(n lg n) rating commonly given the quicksort algorithm is actually the expected (average case) analysis, not the worst case analysis.
can provide a better implementation for the algorithm you are using. This optimization step, although independent of language, is where most assembly language programmers produce dramatic performance improvements in their code. A better implementation generally involves steps like unrolling loops, using table lookups rather than computations, eliminating from a loop those computations whose values do not change within the loop, taking advantage of machine idioms (such as using a shift or shift and add rather than a multiplication), trying to keep variables in registers as long as possible, and so on. It is surprising how much faster a program can run by using simple techniques like those whose descriptions appear throughout this text.

As a last resort, you can turn to cycle counting. At this level you are trying to ensure that an instruction sequence uses as few clock cycles as possible. This is a difficult optimization to perform because you have to be aware of how many clock cycles each instruction consumes, and that depends on the instruction, the addressing mode in use, the instructions around the current instruction (i.e., pipelining and superscalar effects), the speed of the memory system (wait states and cache), and so on. Needless to say, such optimizations are very tedious and require a very careful analysis of the program and the system on which it will run.

The OL crowd always claims you should put off optimization as long as possible. These people are generally talking about this last form of optimization. The reason is simple: any changes you make to your program after such optimizations may change the interaction of the instructions and, therefore, their execution time. If you spend considerable time scheduling a sequence of 50 instructions and then discover you will need to rewrite that code for one reason or another, all the time you spent carefully scheduling those instructions to avoid hazards is lost.
On the other hand, if you wait until the last possible moment to make such optimizations to your code, you will only optimize that code once.

Many HLL programmers will tell you that a good compiler can beat a human being at scheduling instructions and optimizing code. This isn’t true. A good compiler will beat a mediocre assembly language programmer a good part of the time. However, a good compiler won’t stand a chance against a good assembly language programmer. After all, the worst that could happen is that the good assembly language programmer will look at the output of the compiler and improve on that.

“Counting cycles” can improve the performance of your programs. On the average, you can speed up your programs by 50% to 200% by making simple changes (like rearranging instructions). That’s the difference between an 80486 and a Pentium! So you shouldn’t ignore the possibility of using such optimizations in your programs. Just keep in mind, you should do such optimizations last so you don’t wind up redoing them as your code changes.

The rest of this chapter will concentrate on the techniques for improving the implementation of an algorithm, rather than designing a better algorithm or using cycle counting techniques. Designing better algorithms is beyond the scope of this manual (see a good text on algorithm design). Cycle counting is one of those processes that differs from processor to processor. That is, the optimization techniques that work well for the 80386 fail on a 486 or Pentium chip, and vice versa. Since Intel is constantly producing new chips, requiring different optimization techniques, listing those techniques here would only make that much more material in this book outdated. Intel publishes such optimization hints in their processor programmer reference manuals. Articles on optimizing assembly language programs often appear in technical magazines like Dr. Dobb’s Journal; you should read such articles and learn all the current optimization techniques.
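Before moving on, it may help to see what a few of the implementation-level rewrites named in this section look like in miniature: moving a loop-invariant computation out of a loop, replacing a multiplication with a shift and add, and replacing a computation with a table lookup. The following is only a sketch, written in Python for brevity; every name in it (scale_naive, scale_hoisted, times_ten, BIT_COUNT, count_bits) is a hypothetical example. Python hides the cycle costs that make these rewrites pay off in assembly language, so the sketch demonstrates only that each transformation preserves the result, not how much time it saves.

```python
def scale_naive(values, width, gain):
    """Straightforward version: recomputes width * 10 on every pass,
    even though that product never changes inside the loop."""
    out = []
    for v in values:
        out.append(v * (width * 10) + gain)
    return out


def times_ten(x):
    """Machine idiom: 10*x computed as 8*x + 2*x using shifts and an add."""
    return (x << 3) + (x << 1)


def scale_hoisted(values, width, gain):
    """Same result, but the loop-invariant factor is computed once,
    before the loop starts."""
    factor = times_ten(width)
    return [v * factor + gain for v in values]


# Table lookup: count the set bits of every byte value once, up front,
# so each later query is two index operations instead of a bit loop.
BIT_COUNT = [bin(b).count("1") for b in range(256)]


def count_bits(word16):
    """Number of set bits in a 16-bit value, using the precomputed table."""
    return BIT_COUNT[word16 & 0xFF] + BIT_COUNT[(word16 >> 8) & 0xFF]


# Both versions must agree; otherwise the "optimization" is just a bug.
print(scale_naive([1, 2, 3], 5, 7) == scale_hoisted([1, 2, 3], 5, 7))
print(count_bits(0xF0F1))
```

The comparison on the second-to-last line reflects a habit worth keeping: keep the slow, obviously correct version around and check every optimized version against it.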
25.5 Improving the Implementation of an Algorithm

One easy way to partially demonstrate how to optimize a piece of code is to provide an example of some program and the optimization steps you can apply to that program. This section will present a short program that blurs an eight-bit gray scale image. Then, this section will lead you through several optimization steps and show you how to get that program running over 16 times faster.
The following code assumes that you provide it with a file containing a 251x256 gray scale photographic image. The data structure for this file is as follows: Image: array [0..250, 0..255] of byte;
Each byte contains a value in the range 0..255 with zero denoting black, 255 representing white, and the other values representing even shades of gray between these two extremes. The blurring algorithm averages a pixel5 with its eight closest neighbors. A single blur operation applies this average to all interior pixels of an image (that is, it does not apply to the pixels on the boundary of the image because they do not have the same number of neighbors as the other pixels). The following Pascal program implements the blurring algorithm and lets the user specify the amount of blurring (by looping through the algorithm the number of times the user specifies)6: program PhotoFilter(input,output); (* Here is the raw file data type produced by the Photoshop program *) type image = array [0..250] of array [0..255] of byte; (* (* (* (*
The variables we will use. Note that the “datain” and “dataout” *) variables are pointers because Turbo Pascal will not allow us to *) allocate more than 64K data in the one global data segment it *) supports. *)
var
    h,i,j,k,l,sum,iterations:integer;
    datain, dataout: ^image;
    f,g:file of image;

begin

    (* Open the files and read the input data *)

    assign(f, 'roller1.raw');
    assign(g, 'roller2.raw');
    reset(f);
    rewrite(g);
    new(datain);
    new(dataout);
    read(f,datain^);

    (* Get the number of iterations from the user *)

    write('Enter number of iterations:');
    readln(iterations);

    writeln('Computing result');

    (* Copy the data from the input array to the output array. *)
    (* This is a really lame way to copy the border from the   *)
    (* input array to the output array.                        *)

    for i := 0 to 250 do
        for j := 0 to 255 do
            dataout^ [i][j] := datain^ [i][j];

    (* Okay, here's where all the work takes place. The outside *)
    (* loop repeats this blurring operation the number of       *)
    (* iterations specified by the user.                        *)

    for h := 1 to iterations do begin

        (* For each row except the first and the last, compute *)
        (* a new value for each element.                       *)

        for i := 1 to 249 do

            (* For each column except the first and the last, *)
            (* compute a new value for each element.          *)

            for j := 1 to 254 do begin

                (* For each element in the array, compute a new
                   blurred value by adding up the eight cells
                   around an array element along with eight times
                   the current cell's value. Then divide this by
                   sixteen to compute a weighted average of the
                   nine cells forming a square around the current
                   cell. The current cell has a 50% weighting,
                   the other eight cells around the current cell
                   provide the other 50% weighting (6.25% each). *)

                sum := 0;
                for k := -1 to 1 do
                    for l := -1 to 1 do
                        sum := sum + datain^ [i+k][j+l];

                (* Sum currently contains the sum of the nine      *)
                (* cells, add in seven times the current cell so   *)
                (* we get a total of eight times the current cell. *)

                dataout^ [i][j] := (sum + datain^ [i][j]*7) div 16;

            end;

        (* Copy the output cell values back to the input cells *)
        (* so we can perform the blurring on this new data on  *)
        (* the next iteration.                                 *)

        for i := 0 to 250 do
            for j := 0 to 255 do
                datain^ [i][j] := dataout^ [i][j];

    end;

    writeln('Writing result');
    write(g,dataout^);
    close(f);
    close(g);

end.

5. Pixel stands for “picture element.” A pixel is an element of the Image array defined above.
6. A comparable C program appears on the diskette accompanying the lab manual.
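For readers who prefer C, the blurring pass in the Pascal program above can be sketched as follows. This is not the C program from the diskette; it is a minimal illustration, and the names ROWS, COLS, and blur_pass are assumptions introduced here. The image is treated as a flat row-major array of 251 rows by 256 columns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { ROWS = 251, COLS = 256 };   /* image dimensions used in the text */

/* One blurring pass (hypothetical helper, not the book's C program).
   Every interior pixel becomes (sum of its 3x3 block + 7 * center) / 16,
   so the center counts eight times and each neighbor once.
   Border pixels of "out" are left untouched. */
void blur_pass(const uint8_t *in, uint8_t *out)
{
    assert(in != out);                       /* the pass is not in-place */
    for (int i = 1; i <= ROWS - 2; i++) {    /* rows 1..249 */
        for (int j = 1; j <= COLS - 2; j++) {/* columns 1..254 */
            int sum = 0;
            for (int k = -1; k <= 1; k++)
                for (int l = -1; l <= 1; l++)
                    sum += in[(i + k) * COLS + (j + l)];
            /* largest possible value is (9*255 + 7*255)/16 = 255 */
            out[i * COLS + j] = (uint8_t)((sum + in[i * COLS + j] * 7) / 16);
        }
    }
}
```

A uniform image is unchanged by the pass, and a single pixel that is 16 levels brighter than its surroundings keeps half of that difference, which matches the 50% weighting described in the comments.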
The Pascal program above, compiled with Turbo Pascal v7.0, takes 45 seconds to compute 100 iterations of the blurring algorithm. A comparable program written in C and compiled with Borland C++ v4.02 takes 29 seconds to run. The same source file compiled with Microsoft C++ v8.00 runs in 21 seconds. Obviously the C compilers produce better code than Turbo Pascal. It took about three hours to get the Pascal version running and tested. The C versions took about another hour to code and test. The following two images provide a “before” and “after” example of this program’s function:

[Figure: before blurring]

[Figure: after blurring (10 iterations)]
The following is a crude translation from Pascal directly into assembly language of the above program. It requires 36 seconds to run. Yes, the C compilers did a better job, but once you see how bad this code is, you’ll wonder what it is that Turbo Pascal is doing to run so slowly. It took about an hour to translate the Pascal version into this assembly code and debug it to the point it produced the same output as the Pascal version.

; IMGPRCS.ASM
;
; An image processing program.
;
; This program blurs an eight-bit grayscale image by averaging a pixel
; in the image with the eight pixels around it. The average is computed
; by (CurCell*8 + other 8 cells)/16, weighting the current cell by 50%.
;
; Because of the size of the image (almost 64K), the input and output
; matrices are in different segments.
;
; Version #1: Straight-forward translation from Pascal to Assembly.
;
;       Performance comparisons (66 MHz 80486 DX/2 system).
;
;       This code-                      36 seconds.
;       Borland Pascal v7.0-            45 seconds.
;       Borland C++ v4.02-              29 seconds.
;       Microsoft C++ v8.00-            21 seconds.
                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list
                .286

dseg            segment para public 'data'

; Loop control variables and other variables:

h               word    ?
i               word    ?
j               word    ?
k               word    ?
l               word    ?
sum             word    ?
iterations      word    ?

; File names:

InName          byte    "roller1.raw",0
OutName         byte    "roller2.raw",0

dseg            ends

; Here is the input data that we operate on.

InSeg           segment para public 'indata'

DataIn          byte    251 dup (256 dup (?))

InSeg           ends

; Here is the output array that holds the result.

OutSeg          segment para public 'outdata'

DataOut         byte    251 dup (256 dup (?))

OutSeg          ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

Main            proc
                mov     ax, dseg
                mov     ds, ax
                meminit

                mov     ax, 3d00h       ;Open input file for reading.
                lea     dx, InName
                int     21h
                jnc     GoodOpen
                print
                byte    "Could not open input file.",cr,lf,0
                jmp     Quit

GoodOpen:       mov     bx, ax          ;File handle.
                mov     dx, InSeg       ;Where to put the data.
                mov     ds, dx
                lea     dx, DataIn
                mov     cx, 256*251     ;Size of data file to read.
                mov     ah, 3Fh
                int     21h
                cmp     ax, 256*251     ;See if we read the data.
                je      GoodRead
                print
                byte    "Did not read the file properly",cr,lf,0
                jmp     Quit

GoodRead:       mov     ax, dseg
                mov     ds, ax
                print
                byte    "Enter number of iterations: ",0
                getsm
                atoi
                free
                mov     iterations, ax
                print
                byte    "Computing Result",cr,lf,0

; Copy the input data to the output buffer.

                mov     i, 0
iloop0:         cmp     i, 250
                ja      iDone0
                mov     j, 0
jloop0:         cmp     j, 255
                ja      jDone0

                mov     bx, i           ;Compute index into both
                shl     bx, 8           ; arrays using the formula
                add     bx, j           ; i*256+j (row major).

                mov     cx, InSeg       ;Point at input segment.
                mov     es, cx
                mov     al, es:DataIn[bx]       ;Get DataIn[i][j].

                mov     cx, OutSeg      ;Point at output segment.
                mov     es, cx
                mov     es:DataOut[bx], al      ;Store into DataOut[i][j]

                inc     j               ;Next iteration of j loop.
                jmp     jloop0

jDone0:         inc     i               ;Next iteration of i loop.
                jmp     iloop0

iDone0:

; for h := 1 to iterations-

                mov     h, 1
hloop:          mov     ax, h
                cmp     ax, iterations
                ja      hloopDone

; for i := 1 to 249 -

                mov     i, 1
iloop:          cmp     i, 249
                ja      iloopDone

; for j := 1 to 254 -

                mov     j, 1
jloop:          cmp     j, 254
                ja      jloopDone

; sum := 0;
; for k := -1 to 1 do for l := -1 to 1 do

                mov     ax, InSeg       ;Gain access to InSeg.
                mov     es, ax

                mov     sum, 0
                mov     k, -1
kloop:          cmp     k, 1
                jg      kloopDone

                mov     l, -1
lloop:          cmp     l, 1
                jg      lloopDone

; sum := sum + datain [i+k][j+l]

                mov     bx, i
                add     bx, k
                shl     bx, 8           ;Multiply by 256.
                add     bx, j
                add     bx, l

                mov     al, es:DataIn[bx]
                mov     ah, 0
                add     Sum, ax

                inc     l
                jmp     lloop

lloopDone:      inc     k
                jmp     kloop

; dataout [i][j] := (sum + datain[i][j]*7) div 16;

kloopDone:      mov     bx, i
                shl     bx, 8           ;*256
                add     bx, j
                mov     al, es:DataIn[bx]
                mov     ah, 0
                imul    ax, 7
                add     ax, sum
                shr     ax, 4           ;div 16

                mov     bx, OutSeg
                mov     es, bx

                mov     bx, i
                shl     bx, 8
                add     bx, j
                mov     es:DataOut[bx], al

                inc     j
                jmp     jloop

jloopDone:      inc     i
                jmp     iloop

iloopDone:

; Copy the output data to the input buffer.

                mov     i, 0
iloop1:         cmp     i, 250
                ja      iDone1
                mov     j, 0
jloop1:         cmp     j, 255
                ja      jDone1

                mov     bx, i           ;Compute index into both
                shl     bx, 8           ; arrays using the formula
                add     bx, j           ; i*256+j (row major).

                mov     cx, OutSeg      ;Point at output segment.
                mov     es, cx
                mov     al, es:DataOut[bx]      ;Get DataOut[i][j].

                mov     cx, InSeg       ;Point at input segment.
                mov     es, cx
                mov     es:DataIn[bx], al       ;Store into DataIn[i][j]

                inc     j               ;Next iteration of j loop.
                jmp     jloop1

jDone1:         inc     i               ;Next iteration of i loop.
                jmp     iloop1

iDone1:         inc     h
                jmp     hloop

hloopDone:      print
                byte    "Writing result",cr,lf,0

; Okay, write the data to the output file:

                mov     ah, 3ch         ;Create output file.
                mov     cx, 0           ;Normal file attributes.
                lea     dx, OutName
                int     21h
                jnc     GoodCreate
                print
                byte    "Could not create output file.",cr,lf,0
                jmp     Quit

GoodCreate:     mov     bx, ax          ;File handle.
                push    bx
                mov     dx, OutSeg      ;Where the data can be found.
                mov     ds, dx
                lea     dx, DataOut
                mov     cx, 256*251     ;Size of data file to write.
                mov     ah, 40h         ;Write operation.
                int     21h
                pop     bx              ;Retrieve handle for close.
                cmp     ax, 256*251     ;See if we wrote the data.
                je      GoodWrite
                print
                byte    "Did not write the file properly",cr,lf,0
                jmp     Quit

GoodWrite:      mov     ah, 3eh         ;Close operation.
                int     21h

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main
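The index computation that the listing above repeats for every array access (mov bx, i / shl bx, 8 / add bx, j) is ordinary row-major addressing with 256 columns. A minimal C sketch of that formula follows; the function name cell_offset is an assumption made for this illustration and does not appear in the original programs.

```c
#include <assert.h>
#include <stdint.h>

/* Row-major offset of element [i][j] in a matrix with 256 columns
   (hypothetical helper). Multiplying the row number by 256 is the
   same as shifting it left by eight bits, which is what
   "shl bx, 8" does in the listing. */
uint16_t cell_offset(uint16_t i, uint16_t j)
{
    assert(i <= 250 && j <= 255);     /* bounds of the 251 x 256 image */
    return (uint16_t)((i << 8) + j);
}
```

The largest offset, for element [250][255], is 64,255, so the whole array fits in a single 64K segment and a 16-bit register such as bx can address every cell. Two such arrays do not fit in one segment, which is why the program places DataIn and DataOut in separate segments.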
This assembly code is a very straight-forward, line by line translation of the previous Pascal code. Even beginning programmers (who’ve read and understood Chapters Eight and Nine) should easily be able to improve the performance of this code. While we could run a profiler on this program to determine where the “hot spots” are, a little analysis, particularly of the Pascal version, should make it obvious that there are a lot of nested loops in this code. As Chapter Ten points out, when optimizing code you should always start with the innermost loops. The major change between the code above and the following assembly language version is that we’ve unrolled the innermost loops and we’ve replaced the array index computations with some constant computations. These minor changes speed up the execution by a factor of six! The assembly version now runs in six seconds rather than 36. A Microsoft C++ version of the same program with comparable optimizations runs in eight seconds. It required nearly four hours to develop, test, and debug this code. It required an additional hour to apply these same modifications to the C version (see footnote 7).

; IMGPRCS2.ASM
;
; An image processing program (First optimization pass).
;
; This program blurs an eight-bit grayscale image by averaging a pixel
; in the image with the eight pixels around it. The average is computed
; by (CurCell*8 + other 8 cells)/16, weighting the current cell by 50%.
;
; Because of the size of the image (almost 64K), the input and output
; matrices are in different segments.
;
; Version #1: Straight-forward translation from Pascal to Assembly.
;
; Version #2: Three major optimizations. (1) used movsd instruction rather
;             than a loop to copy data from DataOut back to DataIn.
;             (2) Used repeat..until forms for all loops. (3) unrolled
;             the innermost two loops (which is responsible for most of
;             the performance improvement).
;
;       Performance comparisons (66 MHz 80486 DX/2 system).
;
;       This code-                       6 seconds.
;       Original ASM code-              36 seconds.
;       Borland Pascal v7.0-            45 seconds.
;       Borland C++ v4.02-              29 seconds.
;       Microsoft C++ v8.00-            21 seconds.
«Lots of omitted code goes here, see the previous version»

                print
                byte    "Computing Result",cr,lf,0

; for h := 1 to iterations-

                mov     h, 1
hloop:

; Copy the input data to the output buffer.
; Optimization step #1: Replace with movs instruction.

                push    ds
                mov     ax, OutSeg
                mov     ds, ax
                mov     ax, InSeg
                mov     es, ax
                lea     si, DataOut
                lea     di, DataIn
                mov     cx, (251*256)/4
        rep     movsd
                pop     ds

; Optimization Step #2: Convert loops to repeat..until form.

; for i := 1 to 249 -

                mov     i, 1
iloop:

; for j := 1 to 254 -
7. This does not imply that coding this improved algorithm in C was easier. Most of the time on the assembly version was spent trying out several different modifications to see if they actually improved performance. Many modifications did not, so they were removed from the code. The development of the C version benefited from the past work on the assembly version. It was a straight-forward conversion from assembly to C.
                mov     j, 1
jloop:

; Optimization. Unroll the innermost two loops:

                mov     bh, byte ptr i  ;i is always less than 256.
                mov     bl, byte ptr j  ;Computes i*256+j!

                push    ds
                mov     ax, InSeg       ;Gain access to InSeg.
                mov     ds, ax

                mov     cx, 0           ;Compute sum here.
                mov     ah, ch
                mov     cl, ds:DataIn[bx-257]   ;DataIn[i-1][j-1]
                mov     al, ds:DataIn[bx-256]   ;DataIn[i-1][j]
                add     cx, ax
                mov     al, ds:DataIn[bx-255]   ;DataIn[i-1][j+1]
                add     cx, ax
                mov     al, ds:DataIn[bx-1]     ;DataIn[i][j-1]
                add     cx, ax
                mov     al, ds:DataIn[bx+1]     ;DataIn[i][j+1]
                add     cx, ax
                mov     al, ds:DataIn[bx+255]   ;DataIn[i+1][j-1]
                add     cx, ax
                mov     al, ds:DataIn[bx+256]   ;DataIn[i+1][j]
                add     cx, ax
                mov     al, ds:DataIn[bx+257]   ;DataIn[i+1][j+1]
                add     cx, ax

                mov     al, ds:DataIn[bx]       ;DataIn[i][j]
                shl     ax, 3                   ;DataIn[i][j]*8
                add     cx, ax
                shr     cx, 4                   ;Divide by 16
                mov     ax, OutSeg
                mov     ds, ax
                mov     ds:DataOut[bx], cl
                pop     ds

                inc     j
                cmp     j, 254
                jbe     jloop

                inc     i
                cmp     i, 249
                jbe     iloop

                inc     h
                mov     ax, h
                cmp     ax, Iterations
                jnbe    Done
                jmp     hloop

Done:           print
                byte    "Writing result",cr,lf,0

«More omitted code goes here, see the previous version»
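The two ideas that give this version most of its speed, unrolling the k and l loops and replacing per-neighbor index arithmetic with constant displacements, can be sketched in C as follows. This is only an illustration of the technique, not the comparable C program the text mentions; blur_cell_unrolled and make_index are names assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Blur one interior cell whose row-major offset is idx = i*256 + j
   (hypothetical helper). The nine loop iterations are written out by
   hand; every neighbor sits at a fixed distance from idx, so no
   per-neighbor multiplication or addition of loop variables remains. */
uint8_t blur_cell_unrolled(const uint8_t *in, unsigned idx)
{
    assert(idx >= 257);                 /* needs a row above and a column left */
    unsigned sum = in[idx - 257] + in[idx - 256] + in[idx - 255]
                 + in[idx - 1]                   + in[idx + 1]
                 + in[idx + 255] + in[idx + 256] + in[idx + 257];
    sum += (unsigned)in[idx] << 3;      /* center cell weighted eight times */
    return (uint8_t)(sum >> 4);         /* divide by 16 */
}

/* Placing i in the high byte and j in the low byte of a 16-bit value
   yields i*256 + j, which is what loading bh and bl does in the listing.
   This only works because both values are below 256. */
unsigned make_index(uint8_t i, uint8_t j)
{
    return ((unsigned)i << 8) | j;
}
```

The displacements -257, -256, -255, -1, +1, +255, +256, and +257 are the same constants that appear in the DataIn[bx...] operands above.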
The second version above still uses memory variables for most computations. The optimizations applied to the original code were mainly language-independent optimizations. The next step is to begin applying some assembly language specific optimizations to the code. The first optimization we need to do is to move as many variables as possible into the 80x86’s register set. The following code provides this optimization. Although this only improves the running time by two seconds, that is a 33% improvement (six seconds down to four)!

; IMGPRCS.ASM
;
; An image processing program (Second optimization pass).
;
; This program blurs an eight-bit grayscale image by averaging a pixel
; in the image with the eight pixels around it. The average is computed
; by (CurCell*8 + other 8 cells)/16, weighting the current cell by 50%.
;
; Because of the size of the image (almost 64K), the input and output
; matrices are in different segments.
;
; Version #1: Straight-forward translation from Pascal to Assembly.
;
; Version #2: Three major optimizations. (1) used movsd instruction rather
;             than a loop to copy data from DataOut back to DataIn.
;             (2) Used repeat..until forms for all loops. (3) unrolled
;             the innermost two loops (which is responsible for most of
;             the performance improvement).
;
; Version #3: Used registers for all variables. Set up segment registers
;             once and for all through the execution of the main loop so
;             the code didn't have to reload ds each time through. Computed
;             index into each row only once (outside the j loop).
;
;       Performance comparisons (66 MHz 80486 DX/2 system).
;
;       This code-                       4 seconds.
;       1st optimization pass-           6 seconds.
;       Original ASM code-              36 seconds.
«Lots of deleted code goes here»

                print
                byte    "Computing Result",cr,lf,0

; Copy the input data to the output buffer.

hloop:          mov     ax, InSeg
                mov     es, ax
                mov     ax, OutSeg
                mov     ds, ax
                lea     si, DataOut
                lea     di, DataIn
                mov     cx, (251*256)/4
        rep     movsd

                assume  ds:InSeg, es:OutSeg
                mov     ax, InSeg
                mov     ds, ax
                mov     ax, OutSeg
                mov     es, ax

                mov     cl, 249
iloop:          mov     bh, cl                  ;i*256
                mov     bl, 1                   ;Start at j=1.
                mov     ch, 254                 ;# of times through loop.

jloop:          mov     dx, 0                   ;Compute sum here.
                mov     ah, dh
                mov     dl, DataIn[bx-257]      ;DataIn[i-1][j-1]
                mov     al, DataIn[bx-256]      ;DataIn[i-1][j]
                add     dx, ax
                mov     al, DataIn[bx-255]      ;DataIn[i-1][j+1]
                add     dx, ax
                mov     al, DataIn[bx-1]        ;DataIn[i][j-1]
                add     dx, ax
                mov     al, DataIn[bx+1]        ;DataIn[i][j+1]
                add     dx, ax
                mov     al, DataIn[bx+255]      ;DataIn[i+1][j-1]
                add     dx, ax
                mov     al, DataIn[bx+256]      ;DataIn[i+1][j]
                add     dx, ax
                mov     al, DataIn[bx+257]      ;DataIn[i+1][j+1]
                add     dx, ax

                mov     al, DataIn[bx]          ;DataIn[i][j]
                shl     ax, 3                   ;DataIn[i][j]*8
                add     dx, ax
                shr     dx, 4                   ;Divide by 16
                mov     DataOut[bx], dl

                inc     bx
                dec     ch
                jne     jloop

                dec     cl
                jne     iloop

                dec     bp
                jne     hloop

Done:           print
                byte    "Writing result",cr,lf,0

«More deleted code goes here, see the original version»
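The loop structure of this version can be sketched in C: the row offset is computed once per row rather than once per cell, a single running index steps across the row, and both loops count down to zero so the loop test needs no separate compare. The function name blur_pass_hoisted is an assumption made for this sketch, and a C compiler decides for itself which values live in registers.

```c
#include <assert.h>
#include <stdint.h>

enum { ROWS = 251, COLS = 256 };

/* One blurring pass with the row index hoisted out of the inner loop
   (hypothetical helper mirroring the register usage in the listing:
   "row" plays the part of cl, "n" of ch, and "idx" of bx). */
void blur_pass_hoisted(const uint8_t *in, uint8_t *out)
{
    assert(in != out);
    for (unsigned row = ROWS - 2; row != 0; row--) {   /* rows 249 down to 1 */
        unsigned idx = row * COLS + 1;                 /* once per row, j = 1 */
        for (unsigned n = COLS - 2; n != 0; n--) {     /* 254 cells per row */
            unsigned sum = in[idx - 257] + in[idx - 256] + in[idx - 255]
                         + in[idx - 1]                   + in[idx + 1]
                         + in[idx + 255] + in[idx + 256] + in[idx + 257];
            sum += (unsigned)in[idx] << 3;             /* center times eight */
            out[idx] = (uint8_t)(sum >> 4);            /* divide by 16 */
            idx++;                                     /* next column */
        }
    }
}
```

Processing the rows from 249 down to 1 rather than from 1 up to 249 does not change the result, because each pass reads only from the input array and writes only to the output array.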
Note that on each iteration, the code above still copies the output data back to the input data. That’s almost six and a half megabytes of data movement for 100 iterations! The following version of the blurring program unrolls the hloop twice. The first occurrence copies the data from DataIn to DataOut while computing the blur; the second occurrence copies the data from DataOut back to DataIn while blurring the image. By alternating between these two code sequences, the program avoids copying the data from one array to the other between iterations. This version also shares some common computations between two adjacent cells to save a few instructions in the innermost loop, and it arranges the instructions in the innermost loop to help avoid data hazards on 80486 and later processors. The end result is almost 40% faster than the previous version (down to 2.5 seconds from four seconds).

; IMGPRCS.ASM
;
; An image processing program (Third optimization pass).
;
; This program blurs an eight-bit grayscale image by averaging a pixel
; in the image with the eight pixels around it. The average is computed
; by (CurCell*8 + other 8 cells)/16, weighting the current cell by 50%.
;
; Because of the size of the image (almost 64K), the input and output
; matrices are in different segments.
;
; Version #1: Straight-forward translation from Pascal to Assembly.
;
; Version #2: Three major optimizations. (1) used movsd instruction rather
;             than a loop to copy data from DataOut back to DataIn.
;             (2) Used repeat..until forms for all loops. (3) unrolled
;             the innermost two loops (which is responsible for most of
;             the performance improvement).
;
; Version #3: Used registers for all variables. Set up segment registers
;             once and for all through the execution of the main loop so
;             the code didn't have to reload ds each time through. Computed
;             index into each row only once (outside the j loop).
;
; Version #4: Eliminated copying data from DataOut to DataIn on each pass.
;             Removed hazards. Maintained common subexpressions. Did some
;             more loop unrolling.
;
;       Performance comparisons (66 MHz 80486 DX/2 system, 100 iterations).
;
;       This code-                      2.5 seconds.
;       2nd optimization pass-            4 seconds.
;       1st optimization pass-            6 seconds.
;       Original ASM code-               36 seconds.
«Lots of deleted code here, see the original version»

                print
                byte    "Computing Result",cr,lf,0

                assume  ds:InSeg, es:OutSeg

                mov     ax, InSeg
                mov     ds, ax
                mov     ax, OutSeg
                mov     es, ax

; Copy the data once so we get the edges in both arrays.

                mov     cx, (251*256)/4
                lea     si, DataIn
                lea     di, DataOut
        rep     movsd

; "hloop" repeats once for each iteration.

hloop:
                mov     ax, InSeg
                mov     ds, ax
                mov     ax, OutSeg
                mov     es, ax

; "iloop" processes the rows in the matrices.

                mov     cl, 249
iloop:          mov     bh, cl                  ;i*256
                mov     bl, 1                   ;Start at j=1.
                mov     ch, 254/2               ;# of times through loop.
                mov     si, bx
                mov     dh, 0                   ;Compute sum here.
                mov     bh, 0
                mov     ah, 0

; "jloop" processes the individual elements of the array.
; This loop has been unrolled once to allow the two portions to share
; some common computations.

jloop:

; The sum of DataIn [i-1][j] + DataIn[i-1][j+1] + DataIn[i+1][j] +
; DataIn [i+1][j+1] will be used in the second half of this computation.
; So save its value in a register (di) until we need it again.

                mov     dl, DataIn[si-256]      ;[i-1,j]
                mov     al, DataIn[si-255]      ;[i-1,j+1]
                mov     bl, DataIn[si+257]      ;[i+1,j+1]
                add     dx, ax
                mov     al, DataIn[si+256]      ;[i+1,j]
                add     dx, bx
                mov     bl, DataIn[si+1]        ;[i,j+1]
                add     dx, ax
                mov     al, DataIn[si+255]      ;[i+1,j-1]

                mov     di, dx                  ;Save partial result.

                add     dx, bx
                mov     bl, DataIn[si-1]        ;[i,j-1]
                add     dx, ax
                mov     al, DataIn[si]          ;[i,j]
                add     dx, bx
                mov     bl, DataIn[si-257]      ;[i-1,j-1]
                shl     ax, 3                   ;DataIn[i,j] * 8.
                add     dx, bx
                add     dx, ax
                shr     ax, 3                   ;Restore DataIn[i,j].
                shr     dx, 4                   ;Divide by 16.
                add     di, ax
                mov     DataOut[si], dl

; Okay, process the next cell over. Note that we've got a partial sum
; sitting in DI already. Don't forget, we haven't bumped SI at this point,
; so the offsets are off by one. (This is the second half of the unrolled
; loop.)

                mov     dx, di                  ;Partial sum.
                mov     bl, DataIn[si-254]      ;[i-1,j+1]
                mov     al, DataIn[si+2]        ;[i,j+1]
                add     dx, bx
                mov     bl, DataIn[si+258]      ;[i+1,j+1];
                add     dx, ax
                mov     al, DataIn[si+1]        ;[i,j]
                add     dx, bx
                shl     ax, 3                   ;DataIn[i][j]*8
                add     si, 2                   ;Bump array index.
                add     dx, ax
                mov     ah, 0                   ;Clear for next iter.
                shr     dx, 4                   ;Divide by 16
                dec     ch
                mov     DataOut[si-1], dl
                jne     jloop

                dec     cl
                jne     iloop

                dec     bp
                je      Done

; Special case so we don't have to move the data between the two arrays.
; This is an unrolled version of the hloop that swaps the input and output
; arrays so we don't have to move data around in memory.

                mov     ax, OutSeg
                mov     ds, ax
                mov     ax, InSeg
                mov     es, ax
                assume  es:InSeg, ds:OutSeg

hloop2:
                mov     cl, 249
iloop2:         mov     bh, cl
                mov     bl, 1
                mov     ch, 254/2
                mov     si, bx
                mov     dh, 0
                mov     bh, 0
                mov     ah, 0

jloop2:
                mov     dl, DataOut[si-256]
                mov     al, DataOut[si-255]
                mov     bl, DataOut[si+257]
                add     dx, ax
                mov     al, DataOut[si+256]
                add     dx, bx
                mov     bl, DataOut[si+1]
                add     dx, ax
                mov     al, DataOut[si+255]

                mov     di, dx

                add     dx, bx
                mov     bl, DataOut[si-1]
                add     dx, ax
                mov     al, DataOut[si]
                add     dx, bx
                mov     bl, DataOut[si-257]
                shl     ax, 3
                add     dx, bx
                add     dx, ax
                shr     ax, 3
                shr     dx, 4
                mov     DataIn[si], dl

                mov     dx, di
                mov     bl, DataOut[si-254]
                add     dx, ax
                mov     al, DataOut[si+2]
                add     dx, bx
                mov     bl, DataOut[si+258]
                add     dx, ax
                mov     al, DataOut[si+1]
                add     dx, bx
                shl     ax, 3
                add     si, 2
                add     dx, ax
                mov     ah, 0
                shr     dx, 4
                dec     ch
                mov     DataIn[si-1], dl
                jne     jloop2

                dec     cl
                jne     iloop2

                dec     bp
                je      Done2
                jmp     hloop

; Kludge to guarantee that the data always resides in the output segment.

Done2:
                mov     ax, InSeg
                mov     ds, ax
                mov     ax, OutSeg
                mov     es, ax
                mov     cx, (251*256)/4
                lea     si, DataIn
                lea     di, DataOut
        rep     movsd

Done:           print
                byte    "Writing result",cr,lf,0

«Lots of deleted code here, see the original program»
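The main structural change in this version, alternating the roles of the two arrays instead of copying one back to the other after every pass, is often called ping-pong buffering. The C sketch below shows the idea under stated assumptions: blur_pass and blur_iterations are names invented for this illustration, and the sketch swaps two pointers where the assembly code duplicates the loop body.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { ROWS = 251, COLS = 256, SIZE = ROWS * COLS };

/* One blurring pass over the interior cells (hypothetical helper). */
void blur_pass(const uint8_t *in, uint8_t *out)
{
    for (unsigned i = 1; i <= ROWS - 2; i++) {
        for (unsigned idx = i * COLS + 1; idx <= i * COLS + COLS - 2; idx++) {
            unsigned sum = in[idx - 257] + in[idx - 256] + in[idx - 255]
                         + in[idx - 1]                   + in[idx + 1]
                         + in[idx + 255] + in[idx + 256] + in[idx + 257];
            out[idx] = (uint8_t)((sum + ((unsigned)in[idx] << 3)) >> 4);
        }
    }
}

/* Run "iterations" passes starting from the image in a, leaving the
   final image in b. The border is copied once up front; after that the
   two buffers simply trade places on every pass. */
void blur_iterations(uint8_t *a, uint8_t *b, int iterations)
{
    assert(a != b && iterations >= 0);
    memcpy(b, a, SIZE);                 /* edges must exist in both arrays */
    uint8_t *src = a, *dst = b;
    for (int h = 0; h < iterations; h++) {
        blur_pass(src, dst);
        uint8_t *t = src; src = dst; dst = t;   /* swap roles, no copy */
    }
    if (src != b)                       /* latest result ended up in a */
        memcpy(b, src, SIZE);
}
```

The final copy plays the same part as the code at Done2 in the listing: after an even number of passes the most recent result sits in the first array, so it is moved once to the array that gets written to disk.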
This code provides a good example of the kind of optimization that scares a lot of people. There is a lot of cycle counting, instruction scheduling, and other crazy stuff that makes the program very difficult to read and understand. This is the kind of optimization for which assembly language programmers are famous; the stuff that spawned the phrase “never optimize early.” You should never try this type of optimization until you feel you’ve exhausted all other possibilities. Once you write your code in this fashion, it is going to be very difficult to make further changes to it.

By the way, the above code took about 15 hours to develop and debug (debugging took the most time). That works out to a 0.1 second improvement (for 100 iterations) for each hour of work. Although this code certainly isn’t optimal yet, it is difficult to justify more time attempting to improve this code by mechanical means (e.g., moving instructions around) because the performance gains would be so small.

In the four steps above, we’ve reduced the running time of the assembly code from 36 seconds down to 2.5 seconds. Quite an impressive feat. However, you shouldn’t get the idea that this was easy or even that there were only four steps involved. During the actual development of this example, there were many attempts that did not improve performance (in fact, some modifications wound up reducing performance) and others that did not improve performance enough to justify their inclusion. Just to demonstrate this last point, the following code includes a major change in the way the program organizes its data. The main loop operates on 16-bit objects in memory rather than eight-bit objects. On some machines with large external caches (256K or better) this algorithm provides a slight improvement in performance (2.4 seconds, down from 2.5). However, on other machines it runs slower. Therefore, this code was not chosen as the final implementation:

; IMGPRCS.ASM
;
; An image processing program (Fourth optimization pass).
;
; This program blurs an eight-bit grayscale image by averaging a pixel
; in the image with the eight pixels around it. The average is computed
; by (CurCell*8 + other 8 cells)/16, weighting the current cell by 50%.
;
; Because of the size of the image (almost 64K), the input and output
; matrices are in different segments.
;
; Version #1: Straight-forward translation from Pascal to Assembly.
;
; Version #2: Three major optimizations. (1) used movsd instruction rather
;             than a loop to copy data from DataOut back to DataIn.
;             (2) Used repeat..until forms for all loops. (3) unrolled
;             the innermost two loops (which is responsible for most of
;             the performance improvement).
;
; Version #3: Used registers for all variables. Set up segment registers
;             once and for all through the execution of the main loop so
;             the code didn't have to reload ds each time through. Computed
;             index into each row only once (outside the j loop).
;
; Version #4: Eliminated copying data from DataOut to DataIn on each pass.
;             Removed hazards. Maintained common subexpressions. Did some
;             more loop unrolling.
;
; Version #5: Converted data arrays to words rather than bytes and operated
;             on 16-bit values. Yielded minimal speedup.
;
;       Performance comparisons (66 MHz 80486 DX/2 system).
;
;       This code-                      2.4 seconds.
;       3rd optimization pass-          2.5 seconds.
;       2nd optimization pass-            4 seconds.
;       1st optimization pass-            6 seconds.
;       Original ASM code-               36 seconds.
                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                .list
                .386
                option          segment:use16

dseg            segment para public 'data'

ImgData         byte    251 dup (256 dup (?))

InName          byte    "roller1.raw",0
OutName         byte    "roller2.raw",0
Iterations      word    0

dseg            ends

; This code makes the naughty assumption that the following
; segments are loaded contiguously in memory! Also, because these
; segments are paragraph aligned, this code assumes that these segments
; will contain a full 65,536 bytes. You cannot declare a segment with
; exactly 65,536 bytes in MASM. However, the paragraph alignment option
; ensures that the extra byte of padding is added to the end of each
; segment.

DataSeg1        segment para public 'ds1'
Data1a          byte    65535 dup (?)
DataSeg1        ends

DataSeg2        segment para public 'ds2'
Data1b          byte    65535 dup (?)
DataSeg2        ends

DataSeg3        segment para public 'ds3'
Data2a          byte    65535 dup (?)
DataSeg3        ends

DataSeg4        segment para public 'ds4'
Data2b          byte    65535 dup (?)
DataSeg4        ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

Main            proc
                mov     ax, dseg
                mov     ds, ax
                meminit

                mov     ax, 3d00h       ;Open input file for reading.
                lea     dx, InName
                int     21h
                jnc     GoodOpen
                print
                byte    "Could not open input file.",cr,lf,0
                jmp     Quit

GoodOpen:       mov     bx, ax          ;File handle.
                lea     dx, ImgData
                mov     cx, 256*251     ;Size of data file to read.
                mov     ah, 3Fh
                int     21h
                cmp     ax, 256*251     ;See if we read the data.
                je      GoodRead
                print
                byte    "Did not read the file properly",cr,lf,0
                jmp     Quit

GoodRead:       print
                byte    "Enter number of iterations: ",0
                getsm
                atoi
                free
                mov     Iterations, ax
                cmp     ax, 0
                jle     Quit

                printf
                byte    "Computing Result for %d iterations",cr,lf,0
                dword   Iterations

; Copy the data and expand it from eight bits to sixteen bits.
; The first loop handles the first 32,768 bytes, the second loop
; handles the remaining bytes.

                mov     ax, DataSeg1
                mov     es, ax
                mov     ax, DataSeg3
                mov     fs, ax

                mov     ah, 0
                mov     cx, 32768
                lea     si, ImgData
                xor     di, di          ;Output data is at ofs zero.
CopyLoop:       lodsb                   ;Read a byte
                mov     fs:[di], ax     ;Store a word in DataSeg3
                stosw                   ;Store a word in DataSeg1
                dec     cx
                jne     CopyLoop

                mov     di, DataSeg2
                mov     es, di
                mov     di, DataSeg4
                mov     fs, di
                mov     cx, (251*256) - 32768
                xor     di, di
CopyLoop1:      lodsb                   ;Read a byte
                mov     fs:[di], ax     ;Store a word in DataSeg4
                stosw                   ;Store a word in DataSeg2
                dec     cx
                jne     CopyLoop1

; hloop completes one iteration on the data moving it from Data1a/Data1b
; to Data2a/Data2b

hloop:          mov     ax, DataSeg1
                mov     ds, ax
                mov     ax, DataSeg3
                mov     es, ax

; Process the first 127 rows (65,024 bytes) of the array):

                mov     cl, 127
                lea     si, Data1a+202h         ;Start at [1,1]
iloop0:         mov     ch, 254/2               ;# of times through loop.
jloop0:         mov     dx, [si]                ;[i,j]
                mov     bx, [si-200h]           ;[i-1,j]
                mov     ax, dx
                shl     dx, 3                   ;[i,j] * 8
                add     bx, [si-1feh]           ;[i-1,j+1]
                mov     bp, [si+2]              ;[i,j+1]
                add     bx, [si+200h]           ;[i+1,j]
                add     dx, bp
                add     bx, [si+202h]           ;[i+1,j+1]
                add     dx, [si-202h]           ;[i-1,j-1]
                mov     di, [si-1fch]           ;[i-1,j+2]
                add     dx, [si-2]              ;[i,j-1]
                add     di, [si+4]              ;[i,j+2]
                add     dx, [si+1feh]           ;[i+1,j-1]
                add     di, [si+204h]           ;[i+1,j+2]
                shl     bp, 3                   ;[i,j+1] * 8
                add     dx, bx
                add     bp, ax
                shr     dx, 4                   ;Divide by 16.
                add     bp, bx
                mov     es:[si], dx             ;Store [i,j] entry.
                add     bp, di
                add     si, 4                   ;Affects next store operation!
                shr     bp, 4                   ;Divide by 16.
                dec     ch
                mov     es:[si-2], bp           ;Store [i,j+1] entry.
                jne     jloop0

                add     si, 4                   ;Skip to start of next row.

                dec     cl
                jne     iloop0

; Process the last 124 rows of the array). This requires that we switch from
; one segment to the next. Note that the segments overlap.

                mov     ax, DataSeg2
                sub     ax, 40h                 ;Back up to last 2 rows in DS2
                mov     ds, ax
                mov     ax, DataSeg4
                sub     ax, 40h                 ;Back up to last 2 rows in DS4
                mov     es, ax

                mov     cl, 251-127-1           ;Remaining rows to process.
                mov     si, 202h                ;Continue with next row.
iloop1:         mov     ch, 254/2               ;# of times through loop.
jloop1:         mov     dx, [si]                ;[i,j]
                mov     bx, [si-200h]           ;[i-1,j]
                mov     ax, dx
                shl     dx, 3                   ;[i,j] * 8
                add     bx, [si-1feh]           ;[i-1,j+1]
                mov     bp, [si+2]              ;[i,j+1]
                add     bx, [si+200h]           ;[i+1,j]
                add     dx, bp
                add     bx, [si+202h]           ;[i+1,j+1]
                add     dx, [si-202h]           ;[i-1,j-1]
                mov     di, [si-1fch]           ;[i-1,j+2]
                add     dx, [si-2]              ;[i,j-1]
                add     di, [si+4]              ;[i,j+2]
                add     dx, [si+1feh]           ;[i+1,j-1]
                add     di, [si+204h]           ;[i+1,j+2]
                shl     bp, 3                   ;[i,j+1] * 8
                add     dx, bx
                add     bp, ax
                shr     dx, 4                   ;Divide by 16
                add     bp, bx
                mov     es:[si], dx             ;Store [i,j] entry.
                add     bp, di
                add     si, 4                   ;Affects next store operation!
                shr     bp, 4
                dec     ch
                mov     es:[si-2], bp           ;Store [i,j+1] entry.
                jne     jloop1

                add     si, 4                   ;Skip to start of next row.

                dec     cl
                jne     iloop1

                mov     ax, dseg
                mov     ds, ax
                assume  ds:dseg

                dec     Iterations
                je      Done0

; Unroll the iterations loop so we can move the data from DataSeg2/4 back
; to DataSeg1/3 without wasting extra time. Other than the direction of the
; data movement, this code is virtually identical to the above.

                mov     ax, DataSeg3
                mov     ds, ax
                mov     ax, DataSeg1
                mov     es, ax

                mov     cl, 127
                lea     si, Data1a+202h
iloop2:         mov     ch, 254/2
jloop2:         mov     dx, [si]
                mov     bx, [si-200h]
                mov     ax, dx
                shl     dx, 3
                add     bx, [si-1feh]
                mov     bp, [si+2]
                add     bx, [si+200h]
                add     dx, bp
                add     bx, [si+202h]
                add     dx, [si-202h]
                mov     di, [si-1fch]
                add     dx, [si-2]
                add     di, [si+4]
                add     dx, [si+1feh]
                add     di, [si+204h]
                shl     bp, 3
                add     dx, bx
                add     bp, ax
                shr     dx, 4
                add     bp, bx
                mov     es:[si], dx
                add     bp, di
                add     si, 4
                shr     bp, 4
                dec     ch
                mov     es:[si-2], bp
                jne     jloop2

                add     si, 4

                dec     cl
                jne     iloop2

                mov     ax, DataSeg4
                sub     ax, 40h
                mov     ds, ax
                mov     ax, DataSeg2
                sub     ax, 40h
                mov     es, ax

                mov     cl, 251-127-1
                mov     si, 202h
iloop3:         mov     ch, 254/2
jloop3:         mov     dx, [si]
                mov     bx, [si-200h]
                mov     ax, dx
                shl     dx, 3
                add     bx, [si-1feh]
                mov     bp, [si+2]
                add     bx, [si+200h]
                add     dx, bp
                add     bx, [si+202h]
                add     dx, [si-202h]
                mov     di, [si-1fch]
                add     dx, [si-2]
                add     di, [si+4]
                add     dx, [si+1feh]
                add     di, [si+204h]
                shl     bp, 3
                add     dx, bx
                add     bp, ax
                shr     dx, 4
                add     bp, bx
                mov     es:[si], dx
                add     bp, di
                add     si, 4
                shr     bp, 4
                dec     ch
                mov     es:[si-2], bp
                jne     jloop3

                add     si, 4

                dec     cl
                jne     iloop3

                mov     ax, dseg
                mov     ds, ax
                assume  ds:dseg

                dec     Iterations
                je      Done2
                jmp     hloop

Done2:          mov     ax, DataSeg1
                mov     bx, DataSeg2
                jmp     Finish

Done0:          mov     ax, DataSeg3
                mov     bx, DataSeg4
Finish:         mov     ds, ax
                print
                byte    "Writing result",cr,lf,0

; Convert data back to byte form and write to the output file:

                mov     ax, dseg
                mov     es, ax

                mov     cx, 32768
                lea     di, ImgData
                xor     si, si          ;Output data is at offset zero.
CopyLoop3:      lodsw                   ;Read a word from final array.
                stosb                   ;Write a byte to output array.
                dec     cx
                jne     CopyLoop3

                mov     ds, bx
                mov     cx, (251*256) - 32768
                xor     si, si
CopyLoop4:      lodsw                   ;Read final data word.
                stosb                   ;Write data byte to output array.
                dec     cx
                jne     CopyLoop4

; Okay, write the data to the output file:

                mov     ah, 3ch         ;Create output file.
                mov     cx, 0           ;Normal file attributes.
                mov     dx, dseg
                mov     ds, dx
                lea     dx, OutName
                int     21h
                jnc     GoodCreate
                print
                byte    "Could not create output file.",cr,lf,0
                jmp     Quit

GoodCreate:     mov     bx, ax          ;File handle.
                push    bx
                mov     dx, dseg        ;Where the data can be found.
                mov     ds, dx
                lea     dx, ImgData
                mov     cx, 256*251     ;Size of data file to write.
                mov     ah, 40h         ;Write operation.
                int     21h
                pop     bx              ;Retrieve handle for close.
                cmp     ax, 256*251     ;See if we wrote the data.
                je      GoodWrite
                print
                byte    "Did not write the file properly",cr,lf,0
                jmp     Quit

GoodWrite:      mov     ah, 3eh         ;Close operation.
                int     21h

Quit:           ExitPgm                 ;DOS macro to quit program.
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             byte    1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       byte    16 dup (?)
zzzzzzseg       ends
                end     Main
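The data reorganization in this version can be sketched in C: each pixel is widened from a byte to a 16-bit word before the passes begin, the blur operates on words (so no zero extension is needed when summing), and the result is narrowed back to bytes for output. The names widen, narrow, and blur_pass_words are assumptions made for this illustration, and the sketch ignores the segment juggling that 16-bit real mode forces on the assembly code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ROWS = 251, COLS = 256, CELLS = ROWS * COLS };

/* Expand eight-bit pixels to sixteen-bit words (hypothetical helper). */
void widen(const uint8_t *src, uint16_t *dst, size_t n)
{
    for (size_t k = 0; k < n; k++)
        dst[k] = src[k];
}

/* Convert sixteen-bit words back to eight-bit pixels (hypothetical helper). */
void narrow(const uint16_t *src, uint8_t *dst, size_t n)
{
    for (size_t k = 0; k < n; k++)
        dst[k] = (uint8_t)src[k];
}

/* One blurring pass over word-sized cells. Neighbor distances are counted
   in elements here; in bytes they are twice as large, which is why the
   listing uses displacements such as 200h (one row) and 202h. */
void blur_pass_words(const uint16_t *in, uint16_t *out)
{
    assert(in != out);
    for (size_t i = 1; i <= ROWS - 2; i++) {
        size_t idx = i * COLS + 1;
        for (size_t n = COLS - 2; n != 0; n--, idx++) {
            unsigned sum = in[idx - 257] + in[idx - 256] + in[idx - 255]
                         + in[idx - 1]                   + in[idx + 1]
                         + in[idx + 255] + in[idx + 256] + in[idx + 257];
            sum += (unsigned)in[idx] << 3;      /* center times eight */
            out[idx] = (uint16_t)(sum >> 4);    /* divide by 16 */
        }
    }
}
```

Widening doubles each array from 64,256 bytes to 128,512 bytes, which no longer fits in one 64K segment. That is the reason the listing spreads each array across two segments and processes the image in two parts.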
Of course, the absolute best way to improve the performance of any piece of code is with a better algorithm. All of the above assembly language versions were limited by a single requirement: they all must produce the same output file as the original Pascal program. Often, programmers lose sight of what it is that they are trying to accomplish and get so caught up in the computations they are performing that they fail to see other possibilities. The optimization example above is a perfect example. The assembly code faithfully preserves the semantics of the original Pascal program; it computes the weighted average of all interior pixels as the sum of the eight neighbors around a pixel plus eight times the current pixel’s value, with the entire sum divided by 16. Now this is a good blurring function, but it is not the only blurring function. A Photoshop (or other image processing program) user doesn’t care about algorithms or such. When that user selects “blur image” they want it to go out of focus. Exactly how much out of focus is generally immaterial. In fact, the less the better because the user can always run the blur algorithm again (or specify some number of iterations). The following assembly language program shows how to get better performance by modifying the blurring algorithm to reduce the number of instructions it needs to execute in the innermost loops. It computes blurring by averaging a pixel with the four neighbors above, below, to the left, and to the right of the current pixel. This modification yields a program that runs 100 iterations in 2.2 seconds, a 12% improvement over the previous version:
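Before reading the listing, it may help to see the modified weighting in isolation. Judging from the shifts in the assembly code that follows (the current cell is multiplied by four and the total is divided by eight), the new function appears to be (4 * center + up + down + left + right) / 8. The C sketch below illustrates that formula only; blur_cell_plus is a name assumed here and the sketch is not taken from the original programs.

```c
#include <assert.h>
#include <stdint.h>

/* Blur one interior cell of a 256-column, row-major image using only the
   four directly adjacent neighbors (hypothetical helper). The center cell
   keeps its 50% weighting; each of the four neighbors contributes 12.5%. */
uint8_t blur_cell_plus(const uint8_t *in, unsigned idx)
{
    assert(idx >= 257);                     /* needs a row above and a column left */
    unsigned sum = ((unsigned)in[idx] << 2) /* center times four */
                 + in[idx - 256]            /* [i-1][j] */
                 + in[idx - 1]              /* [i][j-1] */
                 + in[idx + 1]              /* [i][j+1] */
                 + in[idx + 256];           /* [i+1][j] */
    return (uint8_t)(sum >> 3);             /* divide by eight */
}
```

Compared with the nine-cell version, this reads five cells instead of nine for every output pixel, which is where the savings in the innermost loop come from.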
; IMGPRCS.ASM
;
; An image processing program (Fifth optimization pass).
;
; This program blurs an eight-bit grayscale image by averaging a pixel
; in the image with the eight pixels around it. The average is computed
; by (CurCell*8 + other 8 cells)/16, weighting the current cell by 50%.
;
; Because of the size of the image (almost 64K), the input and output
; matrices are in different segments.
;
; Version #1: Straight-forward translation from Pascal to Assembly.
;
; Version #2: Three major optimizations. (1) used movsd instruction rather
;             than a loop to copy data from DataOut back to DataIn.
;             (2) Used repeat..until forms for all loops. (3) unrolled
;             the innermost two loops (which is responsible for most of
;             the performance improvement).
;
; Version #3: Used registers for all variables. Set up segment registers
;             once and for all through the execution of the main loop so
;             the code didn't have to reload ds each time through. Computed
;             index into each row only once (outside the j loop).
;
; Version #4: Eliminated copying data from DataOut to DataIn on each pass.
;             Removed hazards. Maintained common subexpressions. Did some
;             more loop unrolling.
;
; Version #6: Changed the blurring algorithm to use fewer computations.
;             This version does *NOT* produce the same data as the other
;             programs.
;
;       Performance comparisons (66 MHz 80486 DX/2 system, 100 iterations).
;
;       This code-                      2.2 seconds.
;       3rd optimization pass-          2.5 seconds.
;       2nd optimization pass-            4 seconds.
;       1st optimization pass-            6 seconds.
;       Original ASM code-               36 seconds.

«Lots of deleted code here, see the original program»

                print
                byte    "Computing Result",cr,lf,0

                assume  ds:InSeg, es:OutSeg

                mov     ax, InSeg
                mov     ds, ax
                mov     ax, OutSeg
                mov     es, ax

; Copy the data once so we get the edges in both arrays.

                mov     cx, (251*256)/4
                lea     si, DataIn
                lea     di, DataOut
        rep     movsd

; "hloop" repeats once for each iteration.

hloop:
                mov     ax, InSeg
                mov     ds, ax
                mov     ax, OutSeg
                mov     es, ax

; "iloop" processes the rows in the matrices.

                mov     cl, 249
iloop:          mov     bh, cl                  ;i*256
                mov     bl, 1                   ;Start at j=1.
                mov     ch, 254/2               ;# of times through loop.
                mov     si, bx
                mov     dh, 0                   ;Compute sum here.
                mov     bh, 0
                mov     ah, 0

; "jloop" processes the individual elements of the array.
; This loop has been unrolled once to allow the two portions to share
; some common computations.

jloop:

; The sum of DataIn [i-1][j] + DataIn[i-1][j+1] + DataIn[i+1][j] +
; DataIn [i+1][j+1] will be used in the second half of this computation.
; So save its value in a register (di) until we need it again.

                mov     dl, DataIn[si]          ;[i,j]
                mov     al, DataIn[si-256]      ;[i-1,j]
                shl     dx, 2                   ;[i,j]*4
                mov     bl, DataIn[si-1]        ;[i,j-1]
                add     dx, ax
                mov     al, DataIn[si+1]        ;[i,j+1]
                add     dx, bx
                mov     bl, DataIn[si+256]      ;[i+1,j]
                add     dx, ax
                shl     ax, 2                   ;[i,j+1]*4
                add     dx, bx
                mov     bl, DataIn[si-255]      ;[i-1,j+1]
                shr     dx, 3                   ;Divide by 8.
                add     ax, bx
                mov     DataOut[si], dl
                mov     bl, DataIn[si+2]        ;[i,j+2]
                mov     dl, DataIn[si+257]      ;[i+1,j+1]
                add     ax, bx
                mov     bl, DataIn[si]          ;[i,j]
                add     ax, dx
                add     ax, bx
                shr     ax, 3
                dec     ch
                mov     DataOut[si+1], al
                jne     jloop

                dec     cl
                jne     iloop

                dec     bp
                je      Done

; Special case so we don't have to move the data between the two arrays.
; This is an unrolled version of the hloop that swaps the input and output
; arrays so we don't have to move data around in memory.

                mov     ax, OutSeg
                mov     ds, ax
                mov     ax, InSeg
                mov     es, ax
                assume  es:InSeg, ds:OutSeg
mov mov mov mov mov mov mov mov
cl, bh, bl, ch, si, dh, bh, ah,
249 cl 1 254/2 bx 0 0 0
mov mov mov add mov add mov add mov
dl, al, bl, dx, al, dx, bl, dx, al,
DataOut[si-256] DataOut[si-255] DataOut[si+257] ax DataOut[si+256] bx DataOut[si+1] ax DataOut[si+255]
mov
di, dx
add mov add mov add mov shl add add shr shr mov
dx, bx bl, DataOut[si-1] dx, ax al, DataOut[si] dx, bx bl, DataOut[si-257] ax, 3 dx, bx dx, ax ax, 3 dx, 4 DataIn[si], dl
mov mov add mov add mov add mov add shl add add mov shr dec mov jne
dx, di bl, DataOut[si-254] dx, ax al, DataOut[si+2] dx, bx bl, DataOut[si+258] dx, ax al, DataOut[si+1] dx, bx ax, 3 si, 2 dx, ax ah, 0 dx, 4 ch DataIn[si-1], dl jloop2
dec jne
cl iloop2
dec je jmp
bp Done2 hloop
hloop2: iloop2:
jloop2:
; Kludge to guarantee that the data always resides in the output segment. Done2: mov mov mov mov mov lea lea
Page 1340
ax, ds, ax, es, cx, si, di,
InSeg ax OutSeg ax (251*256)/4 DataIn DataOut
Optimizing Your Programs rep Done:
movsd print byte
;
“Writing result”,cr,lf,0
«Lots of delete code here, see the original program»
One very important thing to keep in mind about the code in this section is that we’ve optimized it for 100 iterations. While it turns out that these optimizations apply equally well to more iterations, this isn’t necessarily true for fewer iterations. In particular, if we run only one iteration, any copying of data at the end of the operation will easily consume a large part of the time we save by the optimizations. Since it is very rare for a user to blur an image 100 times in a row, our optimizations may not be as good as we could make them.

However, this section does provide a good example of the steps you must go through in order to optimize a given program. One hundred iterations was a good choice for this example because it made it easy to measure the running time of all versions of the program. Keep in mind, though, that you should optimize your programs for the expected case, not an arbitrary case.
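To see how the two blurring functions differ, it can help to model them in a few lines of a high-level language. The following Python sketch is not part of the original program. The function names blur8 and blur4 are hypothetical, and the weights used for the cheaper version (four times the current pixel plus the four neighbors, divided by eight) are an assumption read from the “[i,j]*4” and “Divide by 8” comments in the listing above. Edge pixels are simply copied, as the assembly code’s initial copy suggests.

```python
def blur8(img):
    """Earlier weighting: (8*center + the eight neighbors) / 16."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]            # edge pixels are copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            total = 8 * img[i][j]
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di != 0 or dj != 0:   # skip the center itself
                        total += img[i + di][j + dj]
            out[i][j] = total // 16
    return out


def blur4(img):
    """Cheaper weighting (assumed): (4*center + the four neighbors) / 8."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]            # edge pixels are copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            total = (4 * img[i][j]
                     + img[i - 1][j] + img[i + 1][j]
                     + img[i][j - 1] + img[i][j + 1])
            out[i][j] = total // 8
    return out
```

Both functions still weight the current pixel by 50%, so a uniform image is unchanged by either one. They differ in how they treat the diagonal neighbors, which is why the two programs do not produce the same output data.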
25.6 Summary

Computer software often runs significantly slower than the task requires. The process of increasing the speed of a program is known as optimization. Unfortunately, optimization is a difficult and time-consuming task, something not to be taken lightly. Many programmers optimize their programs before they’ve determined that there is a need to do so, or (worse yet) they optimize a portion of a program only to find that they have to rewrite that code after they’ve optimized it. Others, out of ignorance, often wind up optimizing the wrong sections of their programs. Since optimization is a slow and difficult process, you want to try to make sure you only optimize your code once. This suggests that optimization should be your last task when writing a program.

One school of thought that completely embraces this philosophy is the Optimize Late group. Their argument is that program optimization often destroys the readability and maintainability of a program. Therefore, one should only take this step when absolutely necessary and only at the end of the program development stage.

The Optimize Early crowd knows, from experience, that programs that are not written to be fast often need to be completely rewritten to make them fast. Therefore, they often take the attitude that optimization should take place along with normal program development. The optimize early group’s view of optimization is typically far different from that of the optimize late group. The optimize early group claims that the extra time spent optimizing a program during development requires less time than developing a program and then optimizing it. For all the details on this religious battle, see

• “When to Optimize, When Not to Optimize” on page 1311
After you’ve written a program and determined that it runs too slowly, the next step is to locate the code that runs too slowly. After identifying the slow sections of your program, you can work on speeding them up. Locating that 10% of the code that requires 90% of the execution time is not always an easy task. The four common techniques people use are trial and error, optimize everything, program analysis, and experimental analysis (i.e., use a profiler). Finding the “hot spots” in a program is the first optimization step. To learn about these four techniques, see

• “How Do You Find the Slow Code in Your Programs?” on page 1313
A convincing argument the optimize late folks use is that machines are so fast that optimization is rarely necessary. While this argument is often overstated, it is often true that many unoptimized programs run fast enough and do not require any optimization for satisfactory performance. On the other hand, programs that run fine by themselves may be too slow when running concurrently with other software. To see the strengths and weaknesses of this argument, see

• “Is Optimization Necessary?” on page 1314
There are three forms of optimization you can use to improve the performance of a program: choose a better algorithm, choose a better implementation of an algorithm, or “count cycles.” Many people (especially the optimize late crowd) only consider this last case “optimization.” This is a shame, because the last case often produces the smallest incremental improvement in performance. To understand these three forms of optimization, see

• “The Three Types of Optimization” on page 1315
Optimization is not something you can learn from a book. It takes lots of experience and practice. Unfortunately, those with little practical experience find that their efforts rarely pay off well and generally assume that optimization is not worth the trouble. The truth is, they do not have sufficient experience to write truly optimal code and their frustration prevents them from gaining such experience. The latter part of this chapter devotes itself to demonstrating what one can achieve when optimizing a program. Always keep this example in mind when you feel frustrated and are beginning to believe you cannot improve the performance of your program. For details on this example, see

• “Improving the Implementation of an Algorithm” on page 1317
Appendix B: Annotated Bibliography

There are a wide variety of texts available for those who are interested in learning more about assembly language or other topics this text covers. The following is a partial list of texts that may be of interest to you. Many of these texts are now out of print. Please consult your local library if you cannot find a particular text at a bookstore.

Microprocessor Programming for Computer Hobbyists
Neill Graham. TAB books, ISBN 0-8306-6952-3, 1977.
This book provides a gentle introduction to data structures for computer hobbyists. Although it uses the PL/M programming language, many of the concepts apply directly to assembly language programs.

IBM Assembler Language and Programming
Peter Able. Prentice-Hall, ISBN 0-13-448143-7, 1987.
A college text book on assembly language. Contains good sections on DOS and disk formats for earlier versions of DOS.

MS-DOS Developer’s Guide
John Angermeyer and Keven Jaeger. Howard W. Sams & Co., ISBN 0-672-22409-7.
An excellent reference book on programming MS-DOS.

Compilers: Principles, Techniques, and Tools
Alfred Aho, Ravi Sethi, and Jeffrey Ullman. Addison Wesley, ISBN 0-201-10088-6, 1986.
The standard text on compiler design and implementation. Contains lots of material on pattern matching and other related subjects.

C Programmer’s Guide to Serial Communications
Joe Campbell. Howard W. Sams & Co., ISBN 0-672-22584-0.
An indispensable guide to serial communications. Although written specifically for C programmers, the material applies equally well to assembly language programmers.

The MS-DOS Encyclopedia
Ray Duncan, General Editor & various authors. Microsoft Press, ISBN 1-55615-049-0.
An excellent description of MS-DOS programming. Contains especially good sections on resident programs and device drivers. Quite expensive, but well worth it.
Zen of Assembly Language
Michael Abrash. Scott Foresman, ISBN 0-673-38602-3, 1990.
The first really great book on 80x86 code optimization. There are only two things wrong with this book. (1) It is out of print. (2) The optimization techniques apply mostly to the 8088 and 80286 processors; they do not apply as well to the 80386 and later processors. That’s okay, see the next entry below.

Zen of Code Optimization
Michael Abrash. Coriolis Group Books, ISBN 1-883577-03-9, 1994.
Here is Michael Abrash’s book updated for the 80386, 80486, and Pentium processors. An absolute must-have for 80x86 assembly language programmers.

Assembler Inside & Out
Harley Hahn. McGraw-Hill, ISBN 0-07-881842-7, 1992.
A reasonable 80x86 assembly language text. This one is notable because Microsoft ships this text with every copy of MASM.

Assembly Language Subroutines for MS-DOS (2nd Edition)
Leo J. Scanlon. Windcrest, ISBN 0-8306-7649-X.
This book is full of little code examples. The routines themselves are not earth-shaking, but it does provide lots of good code examples for those individuals who learn by example.

Advanced Assembly Language
Steven Holzner. Brady/Peter Norton, ISBN 0-13-658774-7, 1991.
This book provides a basic introduction to programming many of the PC’s hardware devices in assembly language. Despite its name, it is not truly an advanced assembly language programming text.

Assembly Language. For Real Programmers Only.
Marcus Johnson. Sams Publishing, ISBN 0-672-48470.
A comprehensive book (over 1,300 pages) with lots of example code.

The Revolutionary Guide to Assembly Language
Vitaly Maljugin, Jacov Izrailevich, Semyon Lavin, and Alksandr Sopin. Wrox Press, ISBN 1-874416-12-5, 1993.
Another comprehensive text on assembly language. This one spends considerable time discussing the PC’s hardware. This text also includes sections on how to interface assembly language with the Clipper (dBase compiler) programming language.
The Waite Group’s Microsoft Macro Assembler Bible
Nabajyoti Barkakati and Randall Hyde. Sams, ISBN 0-672-30155-5, 1992.
A comprehensive reference manual to MASM 6.x and the 8088 through the 80486.

Computer Organization & Design: The Hardware/Software Interface
David Patterson and John Hennessy. Morgan Kaufmann Publishers, ISBN 1-55860-223-2, 1993.
An excellent text on machine organization, one of the best in the field.

Computer Architecture, A Quantitative Approach
John Hennessy and David Patterson. Morgan Kaufmann Publishers, ISBN 1-55860-069-8, 1990.
One of the standard texts on computer architecture. Although it emphasizes RISC processors over CISC, many of the topics discussed apply to superscalar and pipelined CISC processors as well.

IBM Microcomputers: A Programmer’s Handbook
Julio Sanchez and Maria P. Canton. McGraw Hill, ISBN 0-07--54594-4, 1990.
One of the best reference manuals covering the PC’s hardware. An absolute must-have book for those interested in programming peripheral devices on the PC.

The Undocumented PC
Frank Van Gilluwe. Addison Wesley, ISBN 0-201-62277-7, 1994.
Another excellent text that covers the PC’s hardware and how to program peripheral devices.

The Indispensible PC Hardware Book
Hans-Peter Messmer. Addison Wesley, ISBN 0-201-62424-9.
Yet another great PC hardware book. This one even describes the low-level operation of various silicon devices in a way even beginners can understand. It also provides an excellent hardware reference guide to the 80386 and 80486 microprocessor chips.

Programmer’s Technical Reference: The Processor and Coprocessor
Robert L. Hummel. Ziff-Davis Press, ISBN 1-56276-016-5, 1992.
One of the premier references on the 80x86 family from the 8088 through the 80486 chips. Also provides an excellent discussion of the 8087, 80287, 80387, and 487 math coprocessors.
Microsoft MS-DOS Programmer’s Reference
Written by Microsoft Corporation. Microsoft Press, ISBN 1-55615-329-5, 1991.
The official guide to programming MS-DOS, directly from Microsoft.

Undocumented DOS. A Programmer’s Guide to Reserved MS-DOS Functions and Data Structures
Andrew Schulman, Raymond Michels, Jim Kyle, Tim Patterson, David Maxey, and Ralf Brown. Addison Wesley, ISBN 0-201-57064-5, 1990.
This book describes lots of features available to MS-DOS that Microsoft never bothered to document. This text contains vital information for TSR and protected mode programmers.

Introduction to Automata Theory, Languages, and Computation
John Hopcroft and Jeffrey Ullman. Addison Wesley, ISBN 0-201-02988-X, 1979.
Very concise, but one of the standard texts on automata theory, pattern matching, and computability.

The Art of Computer Programming, Vol 1: Fundamental Algorithms; Vol 2: Seminumerical Algorithms; Vol 3: Sorting and Searching
Donald Knuth. Addison Wesley, 1973.
One of the finest sets of texts on data structures and algorithms available for assembly language programmers. Donald Knuth uses a hypothetical assembly language, MIX, to present most algorithms. Code in these texts is very easy to convert to 80x86 assembly language.
Appendix C: Keyboard Scan Codes
Table 90: PC Keyboard Scan Codes (in hex)

Key      Down  Up    Key      Down  Up    Key      Down  Up    Key      Down               Up
Esc      1     81    [{       1A    9A    ,<       33    B3    center   4C                 CC
1!       2     82    ]}       1B    9B    .>       34    B4    right    4D                 CD
2@       3     83    Enter    1C    9C    /?       35    B5    +        4E                 CE
3#       4     84    Ctrl     1D    9D    R shift  36    B6    end      4F                 CF
4$       5     85    A        1E    9E    * PrtSc  37    B7    down     50                 D0
5%       6     86    S        1F    9F    alt      38    B8    pgdn     51                 D1
6^       7     87    D        20    A0    space    39    B9    ins      52                 D2
7&       8     88    F        21    A1    CAPS     3A    BA    del      53                 D3
8*       9     89    G        22    A2    F1       3B    BB    /        E0 35              B5
9(       0A    8A    H        23    A3    F2       3C    BC    enter    E0 1C              9C
0)       0B    8B    J        24    A4    F3       3D    BD    F11      57                 D7
-_       0C    8C    K        25    A5    F4       3E    BE    F12      58                 D8
=+       0D    8D    L        26    A6    F5       3F    BF    ins      E0 52              D2
Bksp     0E    8E    ;:       27    A7    F6       40    C0    del      E0 53              D3
Tab      0F    8F    '"       28    A8    F7       41    C1    home     E0 47              C7
Q        10    90    `~       29    A9    F8       42    C2    end      E0 4F              CF
W        11    91    L shift  2A    AA    F9       43    C3    pgup     E0 49              C9
E        12    92    \|       2B    AB    F10      44    C4    pgdn     E0 51              D1
R        13    93    Z        2C    AC    NUM      45    C5    left     E0 4B              CB
T        14    94    X        2D    AD    SCRL     46    C6    right    E0 4D              CD
Y        15    95    C        2E    AE    home     47    C7    up       E0 48              C8
U        16    96    V        2F    AF    up       48    C8    down     E0 50              D0
I        17    97    B        30    B0    pgup     49    C9    R alt    E0 38              B8
O        18    98    N        31    B1    -        4A    CA    R ctrl   E0 1D              9D
P        19    99    M        32    B2    left     4B    CB    Pause    E1 1D 45 E1 9D C5  -
Table 91: Keyboard Codes (in hex) Key Esc 1! 2@ 3# 4$ 5% 6^ 7& 8* 9( 0) -_ =+ Bksp Tab Q W E R T Y U I O P [{ ]} enter ctrl A S D F G H J K L ;: ‘“ `~ Lshift \| Z X C V B Key
Scan Code 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F 30 Scan Code
ASCII
Shifta
Ctrl
1B 31 32 33 34 35 36 37 38 39 30 2D 3D 08 09 71 77 65 72 74 79 75 69 6F 70 5B 5D 0D
1B 21 40 23 24 25 5E 26 2A 28 29 5F 2B 08 0F00 51 57 45 52 54 59 55 49 4F 50 7B 7D 0D
1B
61 73 64 66 67 68 6A 6B 6C 3B 27 60 5C 7A 78 63 76 62 ASCII
0300
1E
1F
Alt
7800 7900 7A00 7B00 7C00 7D00 7E00 7F00 8000 8100 8200 8300
7F 11 17 05 12 14 19 15 09 0F 10 1B 1D 0A
1000 1100 1200 1300 1400 1500 1600 1700 1800 1900
41 53 44 46 47 48 4A 4B 4C 3A 22 7E
01 13 04 06 07 08 0A 0B 0C
1E00 1F00 2000 2100 2200 2300 2400 2500 2600
7C 5A 58 43 56 42 Shift
1C 1A 18 03 16 02 Ctrl
2C00 2D00 2E00 2F00 3000 Alt
Num
Caps
Shift Caps Shift Num
1B 31 32 33 34 35 36 37 38 39 30 2D 3D 08 09 71 77 65 72 74 79 75 69 6F 70 5B 5D 0D
1B 31 32 33 34 35 36 37 38 39 30 2D 3D 08 09 51 57 45 52 54 59 55 49 4F 50 5B 5D 0D
1B 31 32 33 34 35 36 37 38 39 30 5F 2B 08 0F00 71 77 65 72 74 79 75 69 6F 70 7B 7D 0A
1B 31 32 33 34 35 36 37 38 39 30 5F 2B 08 0F00 51 57 45 52 54 59 55 49 4F 50 7B 7D 0A
61 73 64 66 67 68 6A 6B 6C 3B 27 60
41 53 44 46 47 48 4A 4B 4C 3B 27 60
61 73 64 66 67 68 6A 6B 6C 3A 22 7E
41 53 44 46 47 48 4A 4B 4C 3A 22 7E
5C 7A 78 63 76 62 Num
5C 5A 58 43 56 42 Caps
7C 7C 7A 5A 78 58 63 43 76 56 62 42 Shift Caps Shift Num
Table 91: Keyboard Codes (in hex) Key N M ,< .> /? Rshift * PrtSc alt space caps F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 num scrl home up pgup -d left center right +e end down pgdn ins del Key
Scan Code 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F 50 51 52 53 Scan Code
ASCII
Shifta
Ctrl
Alt
Num
Caps
6E 6D 2C 2E 2F
4E 4D 3C 3E 3F
0E 0D
3100 3200
6E 6D 2C 2E 2F
4E 4D 2C 2E 2F
6E 6D 3C 3E 3F
4E 4D 3C 3E 3F
2A
INT 5b
10c
2A
2A
INT 5
INT 5
20
20
20
20
20
20
20
3B00 3C00 3D00 3E00 3F00 4000 4100 4200 4300 4400
5400 5500 5600 5700 5800 5900 5A00 5B00 5C00 5D00
5E00 5F00 6000 6100 6200 6300 6400 6500 6600 6700
6800 6900 6A00 6B00 6C00 6D00 6E00 6F00 7000 7100
3B00 3C00 3D00 3E00 3F00 4000 4100 4200 4300 4400
3B00 3C00 3D00 3E00 3F00 4000 4100 4200 4300 4400
5400 5500 5600 5700 5800 5900 5A00 5B00 5C00 5D00
5400 5500 5600 5700 5800 5900 5A00 5B00 5C00 5D00
4700 4800 4900 2D 4B00 4C00 4D00 2B 4F00 5000 5100 5200 5300 ASCII
37 38 39 2D 34 35 36 2B 31 32 33 30 2E Shift
7700
Alt
37 38 39 2D 34 35 36 2B 31 32 33 30 2E Num
4700 4800 4900 2D 4B00 4C00 4D00 2B 4F00 5000 5100 5200 5300 Caps
8400 7300 7400 7500 7600
Ctrl
Shift Caps Shift Num
37 4700 38 4800 39 4900 2D 2D 34 4B00 35 4C00 36 4D00 2B 2B 31 4F00 32 5000 33 5100 30 5200 2E 5300 Shift Caps Shift Num
a. For the alphabetic characters, if capslock is active then see the shift-capslock column. b. Pressing the PrtSc key does not produce a scan code. Instead, BIOS executes an int 5 instruction which should print the screen. c. This is the control-P character that will activate the printer under MS-DOS. d. This is the minus key on the keypad. e. This is the plus key on the keypad.
Table 92: Keyboard Related BIOS Variables

Name                            Address(a)   Size

KbdFlags1 (modifier flags)      40:17        Byte
    This byte maintains the current status of the modifier keys on the keyboard. The bits have the following meanings:
        bit 7: Insert mode toggle
        bit 6: Capslock toggle (1=capslock on)
        bit 5: Numlock toggle (1=numlock on)
        bit 4: Scroll lock toggle (1=scroll lock on)
        bit 3: Alt key (1=alt is down)
        bit 2: Ctrl key (1=ctrl is down)
        bit 1: Left shift key (1=left shift is down)
        bit 0: Right shift key (1=right shift is down)

KbdFlags2 (toggle keys down)    40:18        Byte
    Specifies if a toggle key is currently down.
        bit 7: Insert key (currently down if 1)
        bit 6: Capslock key (currently down if 1)
        bit 5: Numlock key (currently down if 1)
        bit 4: Scroll lock key (currently down if 1)
        bit 3: Pause state locked (ctrl-Numlock) if one
        bit 2: SysReq key (currently down if 1)
        bit 1: Left alt key (currently down if 1)
        bit 0: Left ctrl key (currently down if 1)

AltKpd                          40:19        Byte
    BIOS uses this to compute the ASCII code for an alt-Keypad sequence.

BufStart                        40:80        Word
    Offset of start of keyboard buffer (1Eh). Note: this variable is not supported on many systems, be careful if you use it.

BufEnd                          40:82        Word
    Offset of end of keyboard buffer (3Eh). See the note above.

KbdFlags3                       40:96        Byte
    Miscellaneous keyboard flags.
        bit 7: Read of keyboard ID in progress
        bit 6: Last char is first kbd ID character
        bit 5: Force numlock on reset
        bit 4: 1 if 101-key kbd, 0 if 83/84 key kbd.
        bit 3: Right alt key pressed if 1
        bit 2: Right ctrl key pressed if 1
        bit 1: Last scan code was E0h
        bit 0: Last scan code was E1h

KbdFlags4                       40:97        Byte
    More miscellaneous keyboard flags.
        bit 7: Keyboard transmit error
        bit 6: Mode indicator update
        bit 5: Resend receive flag
        bit 4: Acknowledge received
        bit 3: Must always be zero
        bit 2: Capslock LED (1=on)
        bit 1: Numlock LED (1=on)
        bit 0: Scroll lock LED (1=on)

a. Addresses are all given in hexadecimal
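As an illustration of how a program might interpret the KbdFlags1 byte at 40:17, the following Python sketch unpacks a value into named flags using the bit assignments in Table 92. The function and flag names are hypothetical, and the sketch assumes the byte has already been read from memory by some other means.

```python
# Bit assignments for KbdFlags1 (40:17), taken from Table 92.
# The flag names are illustrative, not BIOS-defined identifiers.
KBD_FLAGS1_BITS = [
    (7, "insert"),
    (6, "capslock"),
    (5, "numlock"),
    (4, "scroll_lock"),
    (3, "alt"),
    (2, "ctrl"),
    (1, "left_shift"),
    (0, "right_shift"),
]


def decode_kbd_flags1(value):
    """Return a dict mapping each flag name to True if its bit is set."""
    if not 0 <= value <= 0xFF:
        raise ValueError("KbdFlags1 is a single byte")
    return {name: bool((value >> bit) & 1) for bit, name in KBD_FLAGS1_BITS}
```

For example, decoding the value 42h reports capslock (bit 6) and left shift (bit 1) as set and every other flag as clear.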
Table 93: On-Board Keyboard Controller Commands (Port 64h)
Value (hex)    Description
20
Transmit keyboard controller’s command byte to system as a scan code at port 60h.
60
The next byte written to port 60h will be stored in the keyboard controller’s command byte.
A4
Test if a password is installed (PS/2 only). Result comes back in port 60h. 0FAh means a password is installed, 0F1h means no password.
A5
Transmit password (PS/2 only). Starts receipt of password. The next sequence of scan codes written to port 60h, ending with a zero byte, are the new password.
A6
Password match. Characters from the keyboard are compared to password until a match occurs.
A7
Disable mouse device (PS/2 only). Identical to setting bit five of the command byte.
A8
Enable mouse device (PS/2 only). Identical to clearing bit five of the command byte.
A9
Test mouse device. Returns 0 if okay, 1 or 2 if there is a stuck clock, 3 or 4 if there is a stuck data line. Results come back in port 60h.
AA
Initiates self-test. Returns 55h in port 60h if successful.
AB
Keyboard interface test. Tests the keyboard interface. Returns 0 if okay, 1 or 2 if there is a stuck clock, 3 or 4 if there is a stuck data line. Results come back in port 60h.
AC
Diagnostic. Returns 16 bytes from the keyboard’s microcontroller chip. Not available on PS/2 systems.
AD
Disable keyboard. Same operation as setting bit four of the command register.
AE
Enable keyboard. Same operation as clearing bit four of the command register.
C0
Read keyboard input port to port 60h. This input port contains the following values: bit 7: Keyboard inhibit keyswitch (0 = inhibit, 1 = enabled). bit 6: Display switch (0=color, 1=mono). bit 5: Manufacturing jumper. bit 4: System board RAM (always 1). bits 0-3: undefined.
C1
Copy input port (above) bits 0-3 to status bits 4-7. (PS/2 only)
C2
Copy input port (above) bits 4-7 to status port bits 4-7. (PS/2 only).
D0
Copy microcontroller output port value to port 60h (see definition below).
D1
Write the next data byte written to port 60h to the microcontroller output port. This port has the following definition: bit 7: Keyboard data. bit 6: Keyboard clock. bit 5: Input buffer empty flag. bit 4: Output buffer full flag. bit 3: Undefined. bit 2: Undefined. bit 1: Gate A20 line. bit 0: System reset (if zero). Note: writing a zero to bit zero will reset the machine. Writing a one to bit one combines address lines 19 and 20 on the PC’s address bus.
D2
Write keyboard buffer. The keyboard controller returns the next value sent to port 60h as though a keypress produced that value. (PS/2 only).
D3
Write mouse buffer. The keyboard controller returns the next value sent to port 60h as though a mouse operation produced that value. (PS/2 only).
D4
Writes the next data byte (60h) to the mouse (auxiliary) device. (PS/2 only).
E0
Read test inputs. Returns in port 60h the status of the keyboard serial lines. Bit zero contains the keyboard clock input, bit one contains the keyboard data input.
Fx
Pulse output port (see definition for D1). Bits 0-3 of the keyboard controller command byte are pulsed onto the output port. Resets the system if bit zero is a zero.
Table 94: Keyboard to System Transmissions
Value (hex)    Description
00
Data overrun. System sends a zero byte as the last value when the keyboard controller’s internal buffer overflows.
1..58 81..D8
Scan codes for key presses. The positive values are down codes, the negative values (H.O. bit set) are up codes.
83AB
Keyboard ID code returned in response to the F2 command (PS/2 only).
AA
Returned during basic assurance test after reset. Also the up code for the left shift key.
EE
Returned by the ECHO command.
F0
Prefix to certain up codes (N/A on PS/2).
FA
Keyboard acknowledge to keyboard commands other than resend or ECHO.
FC
Basic assurance test failed (PS/2 only).
FD
Diagnostic failure (not available on PS/2).
FE
Resend. Keyboard requests the system to resend the last command.
FF
Key error (PS/2 only).
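The relationship between down codes and up codes in Table 94 can be sketched in a few lines of Python. The function name classify_scan_byte is hypothetical. The sketch only handles the plain one-byte codes in the ranges the table lists, so prefixes and responses such as E0h, EEh, or FAh fall into an “other” category.

```python
def classify_scan_byte(byte):
    """Classify one byte read from the keyboard, per the ranges in Table 94.

    Returns ("down", code) for 01h..58h, ("up", code) for 81h..D8h, where
    code is the key's down code, and ("other", byte) for anything else.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    if 0x01 <= byte <= 0x58:
        return ("down", byte)
    if 0x81 <= byte <= 0xD8:
        return ("up", byte & 0x7F)   # clear the H.O. bit to recover the key
    return ("other", byte)
```

Note that this classification cannot tell the up code for the left shift key (AAh) apart from the AAh returned during the basic assurance test. A real keyboard handler has to use context to decide which one it received.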
Table 95: Keyboard Microcontroller Commands (Port 60h)
Value (hex)    Description
ED
Send LED bits. The next byte written to port 60h updates the LEDs on the keyboard. The parameter (next) byte contains: bits 3-7: Must be zero. bit 2: Capslock LED (1 = on, 0 = off). bit 1: Numlock LED (1 = on, 0 = off). bit 0: Scroll lock LED (1 = on, 0 = off).
EE
Echo commands. Returns 0EEh in port 60h as a diagnostic aid.
F0
Select alternate scan code set (PS/2 only). The next byte written to port 60h selects one of the following options: 00: Report current scan code set in use (next value read from port 60h). 01: Select scan code set #1 (standard PC/AT scan code set). 02: Select scan code set #2. 03: Select scan code set #3.
F2
Send two-byte keyboard ID code as the next two bytes read from port 60h (PS/2 only).
F3
Set Autorepeat delay and repeat rate. Next byte written to port 60h determines rate: bit 7: must be zero bits 5,6: Delay. 00- 1/4 sec, 01- 1/2 sec, 10- 3/4 sec, 11- 1 sec. bits 0-4: Repeat rate. 0- approx 30 chars/sec to 1Fh- approx 2 chars/sec.
F4
Enable keyboard.
F5
Reset to power on condition and wait for enable command.
F6
Reset to power on condition and begin scanning keyboard.
F7
Make all keys autorepeat (PS/2 only).
F8
Set all keys to generate an up code and a down code (PS/2 only).
F9
Set all keys to generate an up code only (PS/2 only).
FA
Set all keys to autorepeat and generate up and down codes (PS/2 only).
FB
Set an individual key to autorepeat. Next byte contains the scan code of the desired key. (PS/2 only).
FC
Set an individual key to generate up and down codes. Next byte contains the scan code of the desired key. (PS/2 only).
FD
Set an individual key to generate only down codes. Next byte contains the scan code of the desired key. (PS/2 only).
FE
Resend last result. Use this command if there is an error receiving data.
FF
Reset keyboard to power on state and start the self-test.
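The parameter bytes for the ED and F3 commands in Table 95 are simple bit fields. The Python sketch below builds them from their components. The function names are hypothetical, and the sketch only computes the byte values; actually writing them to port 60h (and waiting for the keyboard’s acknowledge) is outside its scope.

```python
def led_parameter(capslock=False, numlock=False, scroll_lock=False):
    """Parameter byte for the ED command: bits 3-7 are zero."""
    return (int(capslock) << 2) | (int(numlock) << 1) | int(scroll_lock)


def autorepeat_parameter(delay, rate):
    """Parameter byte for the F3 command.

    delay: 0..3 for 1/4, 1/2, 3/4, or 1 second (bits 5 and 6).
    rate:  0..1Fh, from about 30 chars/sec down to about 2 chars/sec
           (bits 0-4). Bit 7 is always zero.
    """
    if not 0 <= delay <= 3:
        raise ValueError("delay must be in 0..3")
    if not 0 <= rate <= 0x1F:
        raise ValueError("rate must be in 0..1Fh")
    return (delay << 5) | rate
```

For instance, a half-second delay with the fastest repeat rate is the byte 20h, so a program would send F3h followed by 20h.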
Table 96: BIOS Keyboard Support Functions

Function # (AH): 0
    Input parameters:  none
    Output parameters: al - ASCII character
                       ah - scan code
    Description: Read character. Reads next available character from the system’s type ahead buffer. Waits for a keystroke if the buffer is empty.

Function # (AH): 1
    Input parameters:  none
    Output parameters: ZF - Set if no key.
                       ZF - Clear if key available.
                       al - ASCII code
                       ah - scan code
    Description: Checks to see if a character is available in the type ahead buffer. Sets the zero flag if no key is available, clears the zero flag if a key is available. If there is an available key, this function returns the ASCII and scan code value in ax. The value in ax is undefined if no key is available.

Function # (AH): 2
    Input parameters:  none
    Output parameters: al - shift flags
    Description: Returns the current status of the shift flags in al. The shift flags are defined as follows:
        bit 7: Insert toggle
        bit 6: Capslock toggle
        bit 5: Numlock toggle
        bit 4: Scroll lock toggle
        bit 3: Alt key is down
        bit 2: Ctrl key is down
        bit 1: Left shift key is down
        bit 0: Right shift key is down

Function # (AH): 3
    Input parameters:  al = 5
                       bh = 0, 1, 2, 3 for 1/4, 1/2, 3/4, or 1 second delay
                       bl = 0..1Fh for 30/sec to 2/sec.
    Output parameters: none
    Description: Set auto repeat rate. The bh register contains the amount of time to wait before starting the autorepeat operation, the bl register contains the autorepeat rate.

Function # (AH): 5
    Input parameters:  ch = scan code
                       cl = ASCII code
    Output parameters: none
    Description: Store keycode in buffer. This function stores the value in the cx register at the end of the type ahead buffer. Note that the scan code in ch doesn’t have to correspond to the ASCII code appearing in cl. This routine will simply insert the data you provide into the system type ahead buffer.

Function # (AH): 10h
    Input parameters:  none
    Output parameters: al - ASCII character
                       ah - scan code
    Description: Read extended character. Like the ah=0 call, except this one passes all key codes; the ah=0 call throws away codes that are not PC/XT compatible.

Function # (AH): 11h
    Input parameters:  none
    Output parameters: ZF - Set if no key.
                       ZF - Clear if key available.
                       al - ASCII code
                       ah - scan code
    Description: Like the ah=01h call except this one does not throw away keycodes that are not PC/XT compatible (i.e., the extra keys found on the 101 key keyboard).

Function # (AH): 12h
    Input parameters:  none
    Output parameters: al - shift flags
                       ah - extended shift flags
    Description: Returns the current status of the shift flags in ax. The shift flags are defined as follows:
        bit 15: SysReq key pressed
        bit 14: Capslock key currently down
        bit 13: Numlock key currently down
        bit 12: Scroll lock key currently down
        bit 11: Right alt key is down
        bit 10: Right ctrl key is down
        bit 9: Left alt key is down
        bit 8: Left ctrl key is down
        bit 7: Insert toggle
        bit 6: Capslock toggle
        bit 5: Numlock toggle
        bit 4: Scroll lock toggle
        bit 3: Either alt key is down (some machines, left only)
        bit 2: Either ctrl key is down
        bit 1: Left shift key is down
        bit 0: Right shift key is down
Appendix D: Instruction Set Reference

This section provides encodings and approximate cycle times for all instructions that you would normally execute in real mode on an Intel processor. Missing are the special instructions on the 80286 and later processors that manipulate page tables, segment descriptors, and other instructions that only an operating system should use. The cycle times are approximate. To determine exact execution times, you will need to run an experiment. The cycle times are given for comparison purposes only.

Key to special bits in encodings:

x:      Don’t care. Can be zero or one.
s:      Sign extension bit for immediate operands. If zero, immediate operand is 16 or 32 bits depending on destination operand size. If s bit is one, then the immediate operand is eight bits and the CPU sign extends to 16 or 32 bits, as appropriate.
rrr:    Same as reg field in [mod-reg-r/m] byte.

Other Notes:

[disp]          This field can be zero, one, two, or four bytes long as required by the instruction.

[imm]           This field is one byte long if the operand is an eight bit operand or if the s bit in the instruction opcode is one. It is two or four bytes long if the s bit contains zero and the destination operand is 16 or 32 bits, respectively.

[mod-reg-r/m]   Instructions that have a mod-reg-r/m byte may have a scaled index byte (sib) and a zero, one, two, or four byte displacement. See Appendix E for details concerning the encoding of this portion of the instruction.

reg,reg         Many instructions allow two operands using a [mod-reg-r/m] byte. A single direction bit in the opcode determines whether the instruction treats the reg operand as the destination or the mod-r/m operand as the destination (e.g., mov reg,mem vs. mov mem,reg). Such instructions also allow two register operands. It turns out there are two encodings for each such reg-reg instruction. That is, you can encode an instruction like mov ax, bx with ax encoded in the reg field and bx encoded in the mod-r/m field, or you can encode it with bx encoded in the reg field and ax encoded in the mod-r/m field. Such instructions always have an x bit in the opcode. If the x bit is zero, the destination is the register specified by the mod-r/m field. If the x bit is one, the destination is the register specified by the reg field. Other types of instructions support multiple encodings for similar reasons.
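To make the two reg-reg encodings concrete, the following Python sketch builds both byte sequences for a 16-bit register-to-register instruction. The function name is hypothetical. The register numbers (ax=0, cx=1, dx=2, bx=3, sp=4, bp=5, si=6, di=7) are the standard 80x86 values and are stated here as an assumption, since this appendix defers encoding details to Appendix E. The sketch also assumes the x bit is bit one of the opcode, as in the “0000 00x1” pattern for add reg16, reg16.

```python
# Standard 80x86 16-bit register numbers (assumed; see Appendix E).
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3,
         "sp": 4, "bp": 5, "si": 6, "di": 7}


def encode_reg16_reg16(opcode_x0, dest, src):
    """Return both encodings of a 16-bit reg,reg instruction.

    opcode_x0 is the opcode byte with the x bit cleared, for example
    0b00000001 for "add reg16, reg16" (0000 00x1 in Table 97).
    """
    d, s = REG16[dest], REG16[src]
    # x = 0: destination is the r/m field, source is the reg field.
    form_x0 = bytes([opcode_x0, 0xC0 | (s << 3) | d])
    # x = 1: destination is the reg field, source is the r/m field.
    form_x1 = bytes([opcode_x0 | 0x02, 0xC0 | (d << 3) | s])
    return form_x0, form_x1
```

Under these assumptions, add ax, bx can be encoded as either 01 D8 or 03 C3. Both sequences perform the same operation, and which one appears in an object file depends on the assembler.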
Table 97: 80x86 Instruction Set Reference (a)
Instruction
Encoding (bin) (b)
Execution Time in Cycles (c)
8088
8086
80286
80386
80486
Pentium
aaa
0011 0111
8
8
3
4
3
3
aad
1101 0101 0000 1010
60
60
14
19
14
10
aam
1101 0100 0000 1010
83
83
16
17
15
18
aas
0011 1111
8
8
3
4
3
3
adc reg8, reg8
0001 00x0 [11-reg-r/m]
3
3
2
2
1
1
adc reg16, reg16
0001 00x1 [11-reg-r/m]
3
3
2
2
1
1
adc reg32, reg32
0110 0110 0001 00x1 [11-reg-r/m]
3
3
2
2
1
1
adc reg8, mem8
0001 0010 [mod-reg-r/m]
9+EA
9+EA
7
6
2
2
adc reg16, mem16
0001 0011 [mod-reg-r/m]
13+EA
9+EA
7
6
2
2
adc reg32, mem32
0110 0110 0001 0011 [mod-reg-r/m]
-
-
-
6
2
2
adc mem8, reg8
0001 0000 [mod-reg-r/m]
16+EA
16+EA
7
7
3
3
adc mem16, reg16
0001 0001 [mod-reg-r/m]
24+EA
16+EA
7
7
3
3
adc mem32, reg32
0110 0110 0001 0001 [mod-reg-r/m]
-
-
-
7
3
3
adc reg8, imm8
1000 00x0 [11-010-r/m] [imm]
4
4
3
2
1
1
adc reg16, imm16
1000 00s1 [11-010-r/m] [imm]
4
4
3
2
1
1
adc reg32, imm32
0110 0110 1000 00s1 [11-010-r/m] [imm]
4
4
3
2
1
1
adc mem8, imm8
1000 00x0 [mod-010-r/m] [imm]
17+EA
17+EA
7
7
3
3
adc mem16, imm16
1000 00s1 [mod-010-r/m] [imm]
23+EA
17+EA
7
7
3
3
adc mem32, imm32
0110 0110 1000 00s1 [mod-010-r/m] [imm]
-
-
-
7
3
3
adc al, imm
0001 0100 [imm]
4
4
3
2
1
1
adc ax, imm
0001 0101 [imm]
4
4
3
2
1
1
adc eax, imm
0110 0110 0001 0101 [imm]
-
-
-
2
1
1
add reg8, reg8
0000 00x0 [11-reg-r/m]
3
3
2
2
1
1
add reg16, reg16
0000 00x1 [11-reg-r/m]
3
3
2
2
1
1
add reg32, reg32
0110 0110 0000 00x1 [11-reg-r/m]
3
3
2
2
1
1
add reg8, mem8
0000 0010 [mod-reg-r/m]
9+EA
9+EA
7
6
2
2
add reg16, mem16
0000 0011 [mod-reg-r/m]
13+EA
9+EA
7
6
2
2
add reg32, mem32
0110 0110 0000 0011 [mod-reg-r/m]
-
-
-
6
2
2
add mem8, reg8
0000 0000 [mod-reg-r/m]
16+EA
16+EA
7
7
3
3
add mem16, reg16
0000 0001 [mod-reg-r/m]
24+EA
16+EA
7
7
3
3
add mem32, reg32
0110 0110 0000 0001 [mod-reg-r/m]
-
-
-
7
3
3
add reg8, imm8
1000 00x0 [11-000-r/m] [imm]
4
4
3
2
1
1
add reg16, imm16
1000 00s1 [11-000-r/m] [imm]
4
4
3
2
1
1
add reg32, imm32
0110 0110 1000 00s1 [11-000-r/m] [imm]
4
4
3
2
1
1
add mem8, imm8
1000 00x0 [mod-000-r/m] [imm]
17+EA
17+EA
7
7
3
3
add mem16, imm16
1000 00s1 [mod-000-r/m] [imm]
23+EA
17+EA
7
7
3
3
add mem32, imm32
0110 0110 1000 00s1 [mod-000-r/m] [imm]
-
-
-
7
3
3
add al, imm
0000 0100 [imm]
4
4
3
2
1
1
add ax, imm
0000 0101 [imm]
4
4
3
2
1
1
add eax, imm
0110 0110 0000 0101 [imm]
-
-
-
2
1
1
and reg8, reg8
0010 00x0 [11-reg-r/m]
3
3
2
2
1
1
and reg16, reg16
0010 00x1 [11-reg-r/m]
3
3
2
2
1
1
and reg32, reg32
0110 0110 0010 00x1 [11-reg-r/m]
3
3
2
2
1
1
and reg8, mem8
0010 0010 [mod-reg-r/m]
9+EA
9+EA
7
6
2
2
and reg16, mem16
0010 0011 [mod-reg-r/m]
13+EA
9+EA
7
6
2
2
and reg32, mem32
0110 0110 0010 0011 [mod-reg-r/m]
-
-
-
6
2
2
and mem8, reg8
0010 0000 [mod-reg-r/m]
16+EA
16+EA
7
7
3
3
and mem16, reg16
0010 0001 [mod-reg-r/m]
24+EA
16+EA
7
7
3
3
and mem32, reg32
0110 0110 0010 0001 [mod-reg-r/m]
-
-
-
7
3
3
and reg8, imm8
1000 00x0 [11-100-r/m] [imm]
4
4
3
2
1
1
and reg16, imm16
1000 00s1 [11-100-r/m] [imm]
4
4
3
2
1
1
and reg32, imm32
0110 0110 1000 00s1 [11-100-r/m] [imm]
4
4
3
2
1
1
and mem8, imm8
1000 00x0 [mod-100-r/m] [imm]
17+EA
17+EA
7
7
3
3
and mem16, imm16
1000 00s1 [mod-100-r/m] [imm]
23+EA
17+EA
7
7
3
3
and mem32, imm32
0110 0110 1000 00s1 [mod-100-r/m] [imm]
-
-
-
7
3
3
and al, imm
0010 0100 [imm]
4
4
3
2
1
1
and ax, imm
0010 0101 [imm]
4
4
3
2
1
1
and eax, imm
0110 0110 0010 0101 [imm]
-
-
-
2
1
1
bound reg16, mem32
0110 0010 [mod-reg-r/m]
13 (values within range)
10
7
8
bound reg32, mem64
0110 0110 0110 0010 [mod-reg-r/m]
10 (values within range)
7
8
bsf reg16, reg16
0000 1111 1011 1100 [11-reg-r/m]
10+3*n n= first set bit.
6-42
6-34
bsf reg32, reg32
0110 0110 0000 1111 1011 1100 [11-reg-r/m]
10+3*n n= first set bit.
6-42
6-42
bsf reg16, mem16
0000 1111 1011 1100 [mod-reg-r/m]
10+3*n n= first set bit.
7-43
6-35
bsf reg32, mem32
0110 0110 0000 1111 1011 1100 [mod-reg-r/m]
10+3*n n= first set bit.
7-43
6-43
bsr reg16, reg16
0000 1111 1011 1101 [11-reg-r/m]
10+3*n n= first set bit.
7-100
7-39
bsr reg32, reg32
0110 0110 0000 1111 1011 1101 [11-reg-r/m]
10+3*n n= first set bit.
8-100
7-71
bsr reg16, mem16
0000 1111 1011 1101 [mod-reg-r/m]
10+3*n n= first set bit.
7-101
7-40
bsr reg32, mem32
0110 0110 0000 1111 1011 1101 [mod-reg-r/m]
10+3*n n= first set bit.
8-101
7-72
bswap reg32
0000 1111 1100 1rrr
1
1
bt reg16, reg16
0000 1111 1010 0011 [11-reg-r/m]
3
3
4
bt reg32, reg32
0110 0110 0000 1111 1010 0011 [11-reg-r/m]
3
3
4
bt mem16, reg16
0000 1111 1010 0011 [mod-reg-r/m]
12
8
9
bt mem32, reg32
0110 0110 0000 1111 1010 0011 [mod-reg-r/m]
12
8
9
bt reg16, imm
0000 1111 1011 1010 [11-100-r/m] [imm8]
3
3
4
bt reg32, imm
0110 0110 0000 1111 1011 1010 [11-100-r/m] [imm8]
3
3
4
bt mem16, imm
0000 1111 1011 1010 [mod-100-r/m] [imm8]
6
3
4
bt mem32, imm
0110 0110 0000 1111 1011 1010 [mod-100-r/m] [imm8]
6
3
4
btc reg16, reg16
0000 1111 1011 1011 [11-reg-r/m]
6
6
7
btc reg32, reg32
0110 0110 0000 1111 1011 1011 [11-reg-r/m]
6
6
7
btc mem16, reg16
0000 1111 1011 1011 [mod-reg-r/m]
13
13
13
btc mem32, reg32
0110 0110 0000 1111 1011 1011 [mod-reg-r/m]
13
13
13
btc reg16, imm
0000 1111 1011 1010 [11-111-r/m] [imm8]
6
6
7
btc reg32, imm
0110 0110 0000 1111 1011 1010 [11-111-r/m] [imm8]
6
6
7
btc mem16, imm
0000 1111 1011 1010 [mod-111-r/m] [imm8]
8
8
8
btc mem32, imm
0110 0110 0000 1111 1011 1010 [mod-111-r/m] [imm8]
8
8
8
btr reg16, reg16
0000 1111 1011 0011 [11-reg-r/m]
6
6
7
btr reg32, reg32
0110 0110 0000 1111 1011 0011 [11-reg-r/m]
6
6
7
btr mem16, reg16
0000 1111 1011 0011 [mod-reg-r/m]
13
13
13
btr mem32, reg32
0110 0110 0000 1111 1011 0011 [mod-reg-r/m]
13
13
13
btr reg16, imm
0000 1111 1011 1010 [11-110-r/m] [imm8]
6
6
7
btr reg32, imm
0110 0110 0000 1111 1011 1010 [11-110-r/m] [imm8]
6
6
7
btr mem16, imm
0000 1111 1011 1010 [mod-110-r/m] [imm8]
8
8
8
btr mem32, imm
0110 0110 0000 1111 1011 1010 [mod-110-r/m] [imm8]
8
8
8
bts reg16, reg16
0000 1111 1010 1011 [11-reg-r/m]
6
6
7
bts reg32, reg32
0110 0110 0000 1111 1010 1011 [11-reg-r/m]
6
6
7
bts mem16, reg16
0000 1111 1010 1011 [mod-reg-r/m]
13
13
13
bts mem32, reg32
0110 0110 0000 1111 1010 1011 [mod-reg-r/m]
13
13
13
bts reg16, imm
0000 1111 1011 1010 [11-101-r/m] [imm8]
6
6
7
bts reg32, imm
0110 0110 0000 1111 1011 1010 [11-101-r/m] [imm8]
6
6
7
bts mem16, imm
0000 1111 1011 1010 [mod-101-r/m] [imm8]
8
8
8
bts mem32, imm
0110 0110 0000 1111 1011 1010 [mod-101-r/m] [imm8]
8
8
8
call near
1110 1000 [disp16]
23
19
7-10
7-10
3
1
call far
1001 1010 [offset] [segment]
36
28
13-16
17-20
18
4
call reg16
1111 1111 [11-010-r/m]
20
16
7-10
7-10
5
2
call mem16
1111 1111 [mod-010-r/m]
29+EA
21+EA
11-14
10-13
5
2
call mem32
1111 1111 [mod-011-r/m]
53+EA
37+EA
16-19
22-25
17
5
cbw
1001 1000
2
2
2
3
3
3
cdq
0110 0110 1001 1001
2
2
2
clc
1111 1000
2
2
2
2
2
2
cld
1111 1100
2
2
2
2
2
2
cli
1111 1010
2
2
3
5
7
cmc
1111 0101
2
2
2
2
2
2
cmp reg8, reg8
0011 10x0 [11-reg-r/m]
3
3
2
2
1
1
cmp reg16, reg16
0011 10x1 [11-reg-r/m]
3
3
2
2
1
1
cmp reg32, reg32
0110 0110 0011 10x1 [11-reg-r/m]
3
3
2
2
1
1
cmp reg8, mem8
0011 1010 [mod-reg-r/m]
9+EA
9+EA
7
6
2
2
cmp reg16, mem16
0011 1011 [mod-reg-r/m]
13+EA
9+EA
7
6
2
2
cmp reg32, mem32
0110 0110 0011 1011 [mod-reg-r/m]
-
-
-
6
2
2
cmp mem8, reg8
0011 1000 [mod-reg-r/m]
9+EA
9+EA
7
6
2
2
cmp mem16, reg16
0011 1001 [mod-reg-r/m]
13+EA
9+EA
7
6
2
2
cmp mem32, reg32
0110 0110 0011 1001 [mod-reg-r/m]
-
-
-
6
2
2
cmp reg8, imm8
1000 00x0 [11-111-r/m] [imm]
4
4
3
2
1
1
cmp reg16, imm16
1000 00s1 [11-111-r/m] [imm]
4
4
3
2
1
1
cmp reg32, imm32
0110 0110 1000 00s1 [11-111-r/m] [imm]
4
4
3
2
1
1
cmp mem8, imm8
1000 00x0 [mod-111-r/m] [imm]
10+EA
10+EA
6
5
2
2
cmp mem16, imm16
1000 00s1 [mod-111-r/m] [imm]
14+EA
10+EA
6
5
2
2
cmp mem32, imm32
0110 0110 1000 00s1 [mod-111-r/m] [imm]
-
-
-
5
2
2
cmp al, imm
0011 1100 [imm]
4
4
3
2
1
1
cmp ax, imm
0011 1101 [imm]
4
4
3
2
1
1
cmp eax, imm
0110 0110 0011 1101 [imm]
-
-
-
2
1
1
cmpsb
1010 0110
30
22
8
10
8
5
cmpsw
1010 0111
30
22
8
10
8
5
cmpsd
0110 0110 1010 0111
-
-
-
10
8
5
repe cmpsb
1111 0011 1010 0110
9+17*cx cx = # of repetitions
9+17*cx
5+9*cx
5+9*cx
7+7*cx 5 if cx=0
9+4*cx 7 if cx=0
repne cmpsb
1111 0010 1010 0110
9+17*cx
9+17*cx
5+9*cx
5+9*cx
7+7*cx 5 if cx=0
9+4*cx 7 if cx=0
repe cmpsw
1111 0011 1010 0111
9+25*cx
9+17*cx
5+9*cx
5+9*cx
7+7*cx 5 if cx=0
9+4*cx 7 if cx=0
repne cmpsw
1111 0010 1010 0111
9+25*cx
9+17*cx
5+9*cx
5+9*cx
7+7*cx 5 if cx=0
9+4*cx 7 if cx=0
repe cmpsd
0110 0110 1111 0011 1010 0111
-
-
-
5+9*cx
7+7*cx 5 if cx=0
9+4*cx 7 if cx=0
repne cmpsd
0110 0110 1111 0010 1010 0111
-
-
-
5+9*cx
7+7*cx 5 if cx=0
9+4*cx 7 if cx=0
cmpxchg reg8, reg8
0000 1111 1011 0000 [11-reg-r/m] Note: r/m is first register operand.
-
-
-
-
6
6
cmpxchg reg16, reg16
0000 1111 1011 0001 [11-reg-r/m]
-
-
-
-
6
6
cmpxchg reg32, reg32
0110 0110 0000 1111 1011 0001 [11-reg-r/m]
-
-
-
-
6
6
cmpxchg mem8, reg8
0000 1111 1011 0000 [mod-reg-r/m]
-
-
-
-
7 if equal, 10 if not equal
6
cmpxchg mem16, reg16
0000 1111 1011 0001 [mod-reg-r/m]
-
-
-
-
7 if equal, 10 if not equal
6
cmpxchg mem32, reg32
0110 0110 0000 1111 1011 0001 [mod-reg-r/m]
-
-
-
-
7 if equal, 10 if not equal
6
cmpxchg8b mem64
0000 1111 1100 0111 [mod-001-r/m]
-
-
-
-
-
10
cpuid
0000 1111 1010 0010
-
-
-
-
-
14
cwd
1001 1001
5
5
2
2
3
2
cwde
0110 0110 1001 1000
3
3
3
daa
0010 0111
4
4
3
4
2
3
das
0010 1111
4
4
3
4
2
3
dec reg8
1111 1110 [11-001-r/m]
3
3
2
2
1
1
dec reg16
0100 1rrr
3
3
2
2
1
1
dec reg16 (alternate encoding)
1111 1111 [11-001-r/m]
3
3
2
2
1
1
dec reg32
0110 0110 0100 1rrr
3
3
2
2
1
1
dec reg32 (alternate encoding)
0110 0110 1111 1111 [11-001-r/m]
3
3
2
2
1
1
dec mem8
1111 1110 [mod-001-r/m]
15+EA
15+EA
7
6
3
3
dec mem16
1111 1111 [mod-001-r/m]
23+EA
15+EA
7
6
3
3
dec mem32
0110 0110 1111 1111 [mod-001-r/m]
-
-
-
6
3
3
div reg8
1111 0110 [11-110-r/m]
80-90
80-90
14
14
16
17
div reg16
1111 0111 [11-110-r/m]
144-162
144-162
22
22
24
25
div reg32
0110 0110 1111 0111 [11-110-r/m]
-
-
-
38
40
41
div mem8
1111 0110 [mod-110-r/m]
(86-96) + EA
(86-96) + EA
17
17
16
17
div mem16
1111 0111 [mod-110-r/m]
(158-176) + EA
(150-168) + EA
25
25
24
25
div mem32
0110 0110 1111 0111 [mod-110-r/m]
-
-
-
41
40
41
enter local, 0
1100 1000 [locals-imm16] 0000 0000
11
10
14
11
enter local, 1
1100 1000 [locals-imm16] 0000 0001
15
12
17
15
enter local, lex
1100 1000 [locals:imm16] [lex:imm8]
12 + 4 * (lex-1)
15 + 4 * (lex-1)
17 + 3*lex
15 + 2*lex
hlt
1111 0100
2+ (d)
2+
2+
5+
4+
12+
idiv reg8
1111 0110 [11-111-r/m]
101-112
101-112
17
19
19
22
idiv reg16
1111 0111 [11-111-r/m]
165-184
165-184
25
27
27
30
idiv reg32
0110 0110 1111 0111 [11-111-r/m]
-
-
-
43
43
46
idiv mem8
1111 0110 [mod-111-r/m]
(107-118) + EA
(107-118) + EA
20
22
20
30
idiv mem16
1111 0111 [mod-111-r/m] [disp]
(175-194) + EA
(171-190) + EA
28
30
28
30
idiv mem32
0110 0110 1111 0111 [mod-111-r/m]
-
-
-
46
44
46
imul reg8
1111 0110 [11-101-r/m]
80-98
80-98
13
9-14
13-18
11
imul reg16
1111 0111 [11-101-r/m]
128-154
128-154
21
9-22
13-26
11
imul reg32
0110 0110 1111 0111 [11-101-r/m]
-
-
-
9-38
13-42
11
imul mem8
1111 0110 [mod-101-r/m]
(86-104) + EA
(107-118) + EA
16
12-17
13-18
11
imul mem16
1111 0111 [mod-101-r/m]
(134-164) + EA
(134-160) + EA
24
15-25
13-26
11
imul mem32
0110 0110 1111 0111 [mod-101-r/m]
-
-
-
12-41
13-42
11
imul reg16, reg16, imm8 imul reg16, imm8 (Second form assumes reg and r/m are the same, instruction sign extends eight bit immediate operand to 16 bits)
0110 1011 [11-reg-r/m] [imm8] (1st reg operand is specified by reg field, 2nd reg operand is specified by r/m field)
-
-
21
13-26
13-26
10
imul reg16, reg16, imm imul reg16, imm
0110 1001 [11-reg-r/m] [imm16]
-
-
21
9-22
13-26
10
imul reg32, reg32, imm8 imul reg32, imm8
0110 0110 0110 1011 [11-reg-r/m] [imm8]
-
-
-
13-42
13-42
10
imul reg32, reg32, imm imul reg32, imm
0110 0110 0110 1001 [11-reg-r/m] [imm32]
-
-
-
9-38
13-42
10
imul reg16, mem16, imm8
0110 1011 [mod-reg-r/m] [imm8]
-
-
24
14-27
13-26
10
imul reg16,mem16,imm
0110 1001 [mod-reg-r/m] [imm16]
-
-
24
12-25
13-26
10
imul reg32, mem32, imm8
0110 0110 0110 1011 [mod-reg-r/m] [imm8]
-
-
-
14-43
13-42
10
imul reg32, mem32, imm
0110 0110 0110 1001 [mod-reg-r/m] [imm32]
-
-
-
12-41
13-42
10
imul reg16, reg16
0000 1111 1010 1111 [11-reg-r/m] (reg is dest operand)
-
-
-
12-25
13-26
10
imul reg32, reg32
0110 0110 0000 1111 1010 1111 [11-reg-r/m] (reg is dest operand)
-
-
-
12-41
12-42
10
imul reg16, mem16
0000 1111 1010 1111 [mod-reg-r/m]
-
-
-
15-28
13-26
10
imul reg32, mem32
0110 0110 0000 1111 1010 1111 [mod-reg-r/m]
-
-
-
14-44
13-42
10
in al, port
1110 0100 [port8]
10
10
5
12
14
7
in ax, port
1110 0101 [port8]
14
10
5
12
14
7
in eax, port
0110 0110 1110 0101 [port8]
-
-
-
12
14
7
in al, dx
1110 1100
8
8
5
13
14
7
in ax, dx
1110 1101
12
8
5
13
14
7
in eax, dx
0110 0110 1110 1101
12
8
5
13
14
7
inc reg8
1111 1110 [11-000-r/m]
3
2
2
2
1
1
inc reg16
0100 0rrr
3
3
2
2
1
1
inc reg16 (alternate encoding)
1111 1111 [11-000-r/m]
3
3
2
2
1
1
inc reg32
0110 0110 0100 0rrr
-
-
-
2
1
1
inc reg32 (alternate encoding)
0110 0110 1111 1111 [11-000-r/m]
-
-
-
2
1
1
inc mem8
1111 1110 [mod-000-r/m]
15+EA
15+EA
7
6
3
3
inc mem16
1111 1111 [mod-000-r/m] [disp]
23+EA
15+EA
7
6
3
3
inc mem32
0110 0110 1111 1111 [mod-000-r/m]
-
-
-
6
3
3
insb
0110 1100
-
-
5
15
17
9
insw
0110 1101
-
-
5
15
17
9
insd
0110 0110 0110 1101
-
-
-
15
17
9
rep insb
1111 0011 0110 1100
-
-
5 + 4*cx
14 + 6*cx
16+8*cx
11 + 3*cx
rep insw
1111 0011 0110 1101
-
-
5 + 4*cx
14 + 6*cx
16+8*cx
11 + 3*cx
rep insd
0110 0110 1111 0011 0110 1101
-
-
-
14 + 6*cx
16+8*cx
11 + 3*cx
int nn
1100 1101 [imm8]
71
51
23-26
37
30
16
int 03
1100 1100
72
52
23-26
33
26
13
into
1100 1110
73 (if ovr) 4 (no ovr)
53 4
24-27 3
35 3
28 3
13 3
iret
1100 1111
44
32
17-20
22
15
8
iretd
0110 0110 1100 1111
22
15
10
ja short
0111 0111 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
ja near
0000 1111 1000 0111 [disp16]
-
-
-
7-10 3
3 1
1
jae short
0111 0011 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jae near
0000 1111 1000 0011 [disp16]
-
-
-
7-10 3
3 1
1
jb short
0111 0010 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jb near
0000 1111 1000 0010 [disp16]
-
-
-
7-10 3
3 1
1
jbe short
0111 0110 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jbe near
0000 1111 1000 0110 [disp16]
-
-
-
7-10 3
3 1
1
jc short
0111 0010 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jc near
0000 1111 1000 0010 [disp16]
-
-
-
7-10 3
3 1
1
je short
0111 0100 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
je near
0000 1111 1000 0100 [disp16]
-
-
-
7-10 3
3 1
1
jg short
0111 1111 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jg near
0000 1111 1000 1111 [disp16]
-
-
-
7-10 3
3 1
1
jge short
0111 1101 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jge near
0000 1111 1000 1101 [disp16]
-
-
-
7-10 3
3 1
1
jl short
0111 1100 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jl near
0000 1111 1000 1100 [disp16]
-
-
-
7-10 3
3 1
1
jle short
0111 1110 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jle near
0000 1111 1000 1110 [disp16]
-
-
-
7-10 3
3 1
1
jna short
0111 0110 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jna near
0000 1111 1000 0110 [disp16]
-
-
-
7-10 3
3 1
1
jnae short
0111 0010 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jnae near
0000 1111 1000 0010 [disp16]
-
-
-
7-10 3
3 1
1
jnb short
0111 0011 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jnb near
0000 1111 1000 0011 [disp16]
-
-
-
7-10 3
3 1
1
jnbe short
0111 0111 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jnbe near
0000 1111 1000 0111 [disp16]
-
-
-
7-10 3
3 1
1
jnc short
0111 0011 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jnc near
0000 1111 1000 0011 [disp16]
-
-
-
7-10 3
3 1
1
jne short
0111 0101 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jne near
0000 1111 1000 0101 [disp16]
-
-
-
7-10 3
3 1
1
jng short
0111 1110 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jng near
0000 1111 1000 1110 [disp16]
-
-
-
7-10 3
3 1
1
jnge short
0111 1100 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jnge near
0000 1111 1000 1100 [disp16]
-
-
-
7-10 3
3 1
1
jnl short
0111 1101 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jnl near
0000 1111 1000 1101 [disp16]
-
-
-
7-10 3
3 1
1
jnle short
0111 1111 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jnle near
0000 1111 1000 1111 [disp16]
-
-
-
7-10 3
3 1
1
jno short
0111 0001 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jno near
0000 1111 1000 0001 [disp16]
-
-
-
7-10 3
3 1
1
jnp short
0111 1011 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jnp near
0000 1111 1000 1011 [disp16]
-
-
-
7-10 3
3 1
1
jns short
0111 1001 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jns near
0000 1111 1000 1001 [disp16]
-
-
-
7-10 3
3 1
1
jnz short
0111 0101 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jnz near
0000 1111 1000 0101 [disp16]
-
-
-
7-10 3
3 1
1
jo short
0111 0000 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jo near
0000 1111 1000 0000 [disp16]
-
-
-
7-10 3
3 1
1
jp short
0111 1010 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jp near
0000 1111 1000 1010 [disp16]
-
-
-
7-10 3
3 1
1
jpe short
0111 1010 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jpe near
0000 1111 1000 1010 [disp16]
-
-
-
7-10 3
3 1
1
jpo short
0111 1011 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jpo near
0000 1111 1000 1011 [disp16]
-
-
-
7-10 3
3 1
1
js short
0111 1000 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
js near
0000 1111 1000 1000 [disp16]
-
-
-
7-10 3
3 1
1
jz short
0111 0100 [disp8]
16 4 (not taken)
16 4
7-10 3
7-10 3
3 1
1
jz near
0000 1111 1000 0100 [disp16]
-
-
-
7-10 3
3 1
1
jcxz short
1110 0011 [disp8]
18 6 (not taken)
18 6
8-11 4
9-12 5
8 5
6 5
jecxz short
0110 0111 1110 0011 [disp8]
9-12 5
8 5
6 5
jmp short
1110 1011 [disp8]
15
15
7-10
7-10
3
1
jmp near
1110 1001 [disp16]
15
15
7-10
7-10
3
1
jmp reg16
1111 1111 [11-100-r/m]
11
11
7-10
7-10
5
2
jmp mem16
1111 1111 [mod-100-r/m]
18+EA
18+EA
11-14
10-13
5
2
jmp far
1110 1010 [offset16] [segment16]
15
15
11-14
12-15
17
3
jmp mem32
1111 1111 [mod-101-r/m]
24+EA
24+EA
15-18
43-46
13
2
lahf
1001 1111
4
4
2
2
3
2
lds reg, mem32
1100 0101 [mod-reg-r/m]
24+EA
16+EA
7
7
6
4
lea reg, mem
1000 1101 [mod-reg-r/m]
2+EA
2+EA
3
2
1
1
leave
1100 1001
-
-
5
4
5
3
les reg, mem32
1100 0100 [mod-reg-r/m]
24+EA
16+EA
7
7
6
4
lfs reg, mem32
0000 1111 1011 0100 [mod-reg-r/m]
-
-
-
7
6
4
lgs reg, mem32
0000 1111 1011 0101 [mod-reg-r/m]
-
-
-
7
6
4
lodsb
1010 1100
12
12
5
5
5
2
lodsw
1010 1101
16
12
5
5
5
2
lodsd
0110 0110 1010 1101
-
-
-
5
5
2
loop short
1110 0010 [disp8]
17 5 (not taken)
17 5
8-11 4
11-14
7 6
5
loope short loopz short
1110 0001 [disp8]
18 6 (not taken)
18 6
8-11 4
11-14
9 6
7
loopne short loopnz short
1110 0000 [disp8]
19 5(not taken)
19 5
8-11 4
11-14
9 6
7
lss reg, mem32
0000 1111 1011 0010 [mod-reg-r/m]
-
-
-
7
6
4
mov reg8, reg8
1000 1000 [11-reg-r/m] (r/m specifies destination reg)
2
2
2
2
1
1
mov reg8, reg8 (alternate encoding)
1000 1010 [11-reg-r/m] (reg specifies destination reg)
2
2
2
2
1
1
mov reg16, reg16
1000 1001 [11-reg-r/m] (r/m specifies destination reg)
2
2
2
2
1
1
mov reg16, reg16 (alternate encoding)
1000 1011 [11-reg-r/m] (reg specifies destination reg)
2
2
2
2
1
1
mov reg32, reg32
0110 0110 1000 1001 [11-reg-r/m] (r/m specifies destination reg)
-
-
-
2
1
1
mov reg32, reg32 (alternate encoding)
0110 0110 1000 1011 [11-reg-r/m] (reg specifies destination reg)
-
-
-
2
1
1
mov mem, reg8
1000 1000 [mod-reg-r/m]
9+EA
9+EA
3
2
1
1
mov reg8, mem
1000 1010 [mod-reg-r/m]
8+EA
8+EA
5
4
1
1
mov mem, reg16
1000 1001 [mod-reg-r/m]
13+EA
9+EA
3
2
1
1
mov reg16, mem
1000 1011 [mod-reg-r/m]
12+EA
8+EA
5
4
1
1
mov mem, reg32
0110 0110 1000 1001 [mod-reg-r/m]
-
-
-
2
1
1
mov reg32, mem
0110 0110 1000 1011 [mod-reg-r/m]
-
-
-
4
1
1
mov reg8, imm
1011 0rrr [imm8]
4
4
2
2
1
1
mov reg8, imm (alternate encoding)
1100 0110 [11-000-r/m] [imm8]
10
10
2
2
1
1
mov reg16, imm
1011 1rrr [imm16]
4
4
2
2
1
1
mov reg16, imm (alternate encoding)
1100 0111 [11-000-r/m] [imm16]
10
10
2
2
1
1
mov reg32, imm
0110 0110 1011 1rrr [imm32]
-
-
-
2
1
1
mov reg32, imm (alternate encoding)
0110 0110 1100 0111 [11-000-r/m] [imm32]
-
-
-
2
1
1
mov mem8, imm
1100 0110 [mod-000-r/m] [imm8]
10+EA
10+EA
3
2
1
1
mov mem16, imm
1100 0111 [mod-000-r/m] [imm16]
14+EA
10+EA
3
2
1
1
mov mem32, imm
0110 0110 1100 0111 [mod-000-r/m] [imm32]
-
-
-
2
1
1
mov al, disp
1010 0000 [disp]
10
10
5
4
1
1
mov ax, disp
1010 0001 [disp]
14
10
5
4
1
1
mov eax, disp
0110 0110 1010 0001 [disp]
-
-
-
4
1
1
mov disp, al
1010 0010 [disp]
10
10
3
2
1
1
mov disp, ax
1010 0011 [disp]
14
10
3
2
1
1
mov disp, eax
0110 0110 1010 0011 [disp]
-
-
-
2
1
1
mov segreg, reg16
1000 1110 [11-sreg-r/m]
2
2
2
2
3
2-3
mov segreg, mem
1000 1110 [mod-reg-r/m]
12+EA
8+EA
5
5
3
2-3
mov reg16, segreg
1000 1100 [11-sreg-r/m]
2
2
2
2
3
1
mov mem, segreg
1000 1100 [mod-reg-r/m]
13+EA
9+EA
3
2
3
1
movsb
1010 0100
18
18
5
8
7
4
movsw
1010 0101
26
18
5
8
7
4
movsd
0110 0110 1010 0101
-
-
-
8
7
4
rep movsb
1111 0011 1010 0100
9 + 17 * cx
9 + 17*cx
5 + 4*cx
8 + 4*cx
12 + 3*cx 5 if cx=0 13 if cx=1
4 + 3*cx
rep movsw
1111 0011 1010 0101
9 + 25 * cx
9 + 17*cx
5 + 4*cx
8 + 4*cx
12 + 3*cx 5 if cx=0 13 if cx=1
4 + 3*cx
rep movsd
0110 0110 1111 0011 1010 0101
-
-
-
8 + 4*cx
12 + 3*cx 5 if cx=0 13 if cx=1
4 + 3*cx
movsx reg16, reg8
0000 1111 1011 1110 [11-reg-r/m] (dest is reg operand)
3
3
3
movsx reg32, reg8
0110 0110 0000 1111 1011 1110 [11-reg-r/m]
3
3
3
movsx reg32, reg16
0110 0110 0000 1111 1011 1111 [11-reg-r/m]
3
3
3
movsx reg16, mem8
0000 1111 1011 1110 [mod-reg-r/m]
6
3
3
movsx reg32, mem8
0110 0110 0000 1111 1011 1110 [mod-reg-r/m]
6
3
3
movsx reg32, mem16
0110 0110 0000 1111 1011 1111 [mod-reg-r/m]
6
3
3
movzx reg16, reg8
0000 1111 1011 0110 [11-reg-r/m] (dest is reg operand)
3
3
3
movzx reg32, reg8
0110 0110 0000 1111 1011 0110 [11-reg-r/m]
3
3
3
movzx reg32, reg16
0110 0110 0000 1111 1011 0111 [11-reg-r/m]
3
3
3
movzx reg16, mem8
0000 1111 1011 0110 [mod-reg-r/m]
6
3
3
movzx reg32, mem8
0110 0110 0000 1111 1011 0110 [mod-reg-r/m]
6
3
3
movzx reg32, mem16
0110 0110 0000 1111 1011 0111 [mod-reg-r/m]
6
3
3
mul reg8
1111 0110 [11-100-r/m]
70-77
70-77
13
9-14
13-18
11
mul reg16
1111 0111 [11-100-r/m]
118-133
118-133
21
9-22
13-26
11
mul reg32
0110 0110 1111 0111 [11-100-r/m]
-
-
-
9-38
13-42
10
mul mem8
1111 0110 [mod-100-r/m]
(76-83) + EA
(76-83) + EA
16
12-17
13-18
11
mul mem16
1111 0111 [mod-100-r/m]
(124-139) + EA
(124-139) + EA
24
12-25
13-26
11
mul mem32
0110 0110 1111 0111 [mod-100-r/m]
-
-
-
12-41
13-42
10
neg reg8
1111 0110 [11-011-r/m]
3
3
2
2
1
1
neg reg16
1111 0111 [11-011-r/m]
3
3
2
2
1
1
neg reg32
0110 0110 1111 0111 [11-011-r/m]
3
3
2
2
1
1
neg mem8
1111 0110 [mod-011-r/m]
16+EA
16+EA
7
6
3
3
neg mem16
1111 0111 [mod-011-r/m]
24+EA
16+EA
7
6
3
3
neg mem32
0110 0110 1111 0111 [mod-011-r/m]
-
-
-
6
3
3
nop (same as xchg ax, ax)
1001 0000
3
3
3
3
1
1
not reg8
1111 0110 [11-010-r/m]
3
3
2
2
1
1
not reg16
1111 0111 [11-010-r/m]
3
3
2
2
1
1
not reg32
0110 0110 1111 0111 [11-010-r/m]
3
3
2
2
1
1
not mem8
1111 0110 [mod-010-r/m]
16+EA
16+EA
7
6
3
3
not mem16
1111 0111 [mod-010-r/m]
24+EA
16+EA
7
6
3
3
not mem32
0110 0110 1111 0111 [mod-010-r/m]
-
-
-
6
3
3
or reg8, reg8
0000 10x0 [11-reg-r/m]
3
3
2
2
1
1
or reg16, reg16
0000 10x1 [11-reg-r/m]
3
3
2
2
1
1
or reg32, reg32
0110 0110 0000 10x1 [11-reg-r/m]
3
3
2
2
1
1
or reg8, mem8
0000 1010 [mod-reg-r/m]
9+EA
9+EA
7
6
2
2
or reg16, mem16
0000 1011 [mod-reg-r/m]
13+EA
9+EA
7
6
2
2
or reg32, mem32
0110 0110 0000 1011 [mod-reg-r/m]
-
-
-
6
2
2
or mem8, reg8
0000 1000 [mod-reg-r/m]
16+EA
16+EA
7
7
3
3
or mem16, reg16
0000 1001 [mod-reg-r/m]
24+EA
16+EA
7
7
3
3
or mem32, reg32
0110 0110 0000 1001 [mod-reg-r/m]
-
-
-
7
3
3
or reg8, imm8
1000 00x0 [11-001-r/m] [imm]
4
4
3
2
1
1
or reg16, imm16
1000 00s1 [11-001-r/m] [imm]
4
4
3
2
1
1
or reg32, imm32
0110 0110 1000 00s1 [11-001-r/m] [imm]
4
4
3
2
1
1
or mem8, imm8
1000 00x0 [mod-001-r/m] [imm]
17+EA
17+EA
7
7
3
3
or mem16, imm16
1000 00s1 [mod-001-r/m] [imm]
25+EA
17+EA
7
7
3
3
or mem32, imm32
0110 0110 1000 00s1 [mod-001-r/m] [imm]
-
-
-
7
3
3
or al, imm
0000 1100 [imm]
4
4
3
2
1
1
or ax, imm
0000 1101 [imm]
4
4
3
2
1
1
or eax, imm
0110 0110 0000 1101 [imm]
-
-
-
2
1
1
out port, al
1110 0110 [port8]
14
10
3
10
16
12
out port, ax
1110 0111 [port8]
14
10
3
10
16
12
out port, eax
0110 0110 1110 0111 [port8]
-
-
-
10
16
12
out dx, al
1110 1110
8
8
3
11
16
12
out dx, ax
1110 1111
12
8
3
11
16
12
out dx, eax
0110 0110 1110 1111
-
-
-
11
16
12
outsb
0110 1110
-
-
5
14
17
13
outsw
0110 1111
-
-
5
14
17
13
outsd
0110 0110 0110 1111
-
-
-
14
17
13
rep outsb
1111 0011 0110 1110
-
-
5 + 4*cx
12 + 5*cx
17+5*cx
13 + 4*cx
rep outsw
1111 0011 0110 1111
-
-
5 + 4*cx
12 + 5*cx
17+5*cx
13 + 4*cx
rep outsd
0110 0110 1111 0011 0110 1111
-
-
-
12 + 5*cx
17+5*cx
13 + 4*cx
pop reg16
0101 1rrr
12
8
5
4
1
1
pop reg16 (alternate encoding)
1000 1111 [11-000-r/m]
12
8
5
4
1
1
pop reg32
0110 0110 0101 1rrr
-
-
-
4
1
1
pop reg32 (alternate encoding)
0110 0110 1000 1111 [11-000-r/m]
-
-
-
5
4
3
pop mem16
1000 1111 [mod-000-r/m]
25+EA
17+EA
5
5
6
3
pop mem32
0110 0110 1000 1111 [mod-000-r/m]
-
-
-
5
6
3
pop es
0000 0111
12
8
5
7
3
3
pop ss
0001 0111
12
8
5
7
3
3
pop ds
0001 1111
12
8
5
7
3
3
pop fs
0000 1111 1010 0001
-
-
-
7
3
3
pop gs
0000 1111 1010 1001
-
-
-
7
3
3
popa
0110 0001
-
-
19
24
9
5
popad
0110 0110 0110 0001
-
-
-
24
9
5
popf
1001 1101
12
8
5
5
9
6
popfd
0110 0110 1001 1101
-
-
-
5
9
6
push reg16
0101 0rrr
15
11
3
2
1
1
push reg16 (alternate encoding)
1111 1111 [11-110-r/m]
15
11
3
2
1
1
push reg32
0110 0110 0101 0rrr
-
-
-
2
1
1
push reg32 (alternate encoding)
0110 0110 1111 1111 [11-110-r/m]
-
-
-
2
1
1
push mem16
1111 1111 [mod-110-r/m]
24+EA
16+EA
5
5
4
2
push mem32
0110 0110 1111 1111 [mod-110-r/m]
-
-
-
5
4
2
push cs
0000 1110
14
10
3
2
3
1
push ds
0001 1110
14
10
3
2
3
1
push es
0000 0110
14
10
3
2
3
1
push ss
0001 0110
14
10
3
2
3
1
push fs
0000 1111 1010 0000
-
-
-
2
3
1
push gs
0000 1111 1010 1000
-
-
-
2
3
1
push imm8->16
0110 1010 [imm8] (sign extends value to 16 bits)
-
-
3
2
1
1
push imm16
0110 1000 [imm16]
-
-
3
2
1
1
push imm32
0110 0110 0110 1000 [imm32]
-
-
-
2
1
1
pusha
0110 0000
-
-
17
18
11
5
pushad
0110 0110 0110 0000
-
-
-
18
11
5
pushf
1001 1100
14
10
3
4
4
4
pushfd
0110 0110 1001 1100
-
-
-
4
4
4
rcl reg8, 1
1101 0000 [11-010-r/m]
2
2
2
9
3
1
rcl reg16, 1
1101 0001 [11-010-r/m]
2
2
2
9
3
1
rcl reg32, 1
0110 0110 1101 0001 [11-010-r/m]
-
-
-
9
3
1
rcl mem8, 1
1101 0000 [mod-010-r/m]
15+EA
15+EA
7
10
4
3
rcl mem16, 1
1101 0001 [mod-010-r/m]
23+EA
15+EA
7
10
4
3
rcl mem32, 1
0110 0110 1101 0001 [mod-010-r/m]
-
-
-
10
4
3
rcl reg8, cl
1101 0010 [11-010-r/m]
8 + 4*cl
8 + 4*cl
5 + cl
9
8-30
7-24
rcl reg16, cl
1101 0011 [11-010-r/m]
8 + 4*cl
8 + 4*cl
5 + cl
9
8-30
7-24
rcl reg32, cl
0110 0110 1101 0011 [11-010-r/m]
-
-
-
9
8-30
7-24
rcl mem8, cl
1101 0010 [mod-010-r/m]
20+EA+4*cl
20+EA+4*cl
8 + cl
10
9-31
9-26
rcl mem16, cl
1101 0011 [mod-010-r/m]
28+EA+4*cl
20+EA+4*cl
8 + cl
10
9-31
9-26
rcl mem32, cl
0110 0110 1101 0011 [mod-010-r/m]
-
-
-
10
9-31
9-26
rcl reg8, imm8
1100 0000 [11-010-r/m] [imm8]
-
-
5+imm8
9
8-30
8-25
rcl reg16, imm8
1100 0001 [11-010-r/m] [imm8]
-
-
5+imm8
9
8-30
8-25
rcl reg32, imm8
0110 0110 1100 0001 [11-010-r/m] [imm8]
-
-
-
9
8-30
8-25
rcl mem8, imm8
1100 0000 [mod-010-r/m] [imm8]
-
-
8+imm8
10
9-31
10-27
rcl mem16, imm8
1100 0001 [mod-010-r/m] [imm8]
-
-
8+imm8
10
9-31
10-27
rcl mem32, imm8
0110 0110 1100 0001 [mod-010-r/m] [imm8]
-
-
-
10
9-31
10-27
rcr reg8, 1
1101 0000 [11-011-r/m]
2
2
2
9
3
1
rcr reg16, 1
1101 0001 [11-011-r/m]
2
2
2
9
3
1
rcr reg32, 1
0110 0110 1101 0001 [11-011-r/m]
-
-
-
9
3
1
rcr mem8, 1
1101 0000 [mod-011-r/m]
15+EA
15+EA
7
10
4
3
rcr mem16, 1
1101 0001 [mod-011-r/m]
23+EA
15+EA
7
10
4
3
rcr mem32, 1
0110 0110 1101 0001 [mod-011-r/m]
-
-
-
10
4
3
rcr reg8, cl
1101 0010 [11-011-r/m]
8 + 4*cl
8 + 4*cl
5 + cl
9
8-30
7-24
rcr reg16, cl
1101 0011 [11-011-r/m]
8 + 4*cl
8 + 4*cl
5 + cl
9
8-30
7-24
rcr reg32, cl
0110 0110 1101 0011 [11-011-r/m]
-
-
-
9
8-30
7-24
rcr mem8, cl
1101 0010 [mod-011-r/m]
20+EA+4*cl
20+EA+4*cl
8 + cl
10
9-31
9-26
rcr mem16, cl
1101 0011 [mod-011-r/m]
28+EA+4*cl
20+EA+4*cl
8 + cl
10
9-31
9-26
rcr mem32, cl
0110 0110 1101 0011 [mod-011-r/m]
-
-
-
10
9-31
9-26
rcr reg8, imm8
1100 0000 [11-011-r/m] [imm8]
-
-
5+imm8
9
8-30
8-25
rcr reg16, imm8
1100 0001 [11-011-r/m] [imm8]
-
-
5+imm8
9
8-30
8-25
rcr reg32, imm8
0110 0110 1100 0001 [11-011-r/m] [imm8]
-
-
-
9
8-30
8-25
rcr mem8, imm8
1100 0000 [mod-011-r/m] [imm8]
-
-
8+imm8
10
9-31
10-27
rcr mem16, imm8
1100 0001 [mod-011-r/m] [imm8]
-
-
8+imm8
10
9-31
10-27
rcr mem32, imm8
0110 0110 1100 0001 [mod-011-r/m] [imm8]
-
-
-
10
9-31
10-27
ret retn
1100 0011
20
16
11-14
10-13
5
2
ret imm16 retn imm16
1100 0010 [imm16]
24
20
11-14
10-13
5
3
ret retf
1100 1011
34
26
15-18
18-21
13
4
ret imm16 retf imm16
1100 1010 [imm16]
33
25
15-18
18-21
14
4
rol reg8, 1
1101 0000 [11-000-r/m]
2
2
2
3
3
1
rol reg16, 1
1101 0001 [11-000-r/m]
2
2
2
3
3
1
rol reg32, 1
0110 0110 1101 0001 [11-000-r/m]
-
-
-
3
3
1
rol mem8, 1
1101 0000 [mod-000-r/m]
15+EA
15+EA
7
7
4
3
rol mem16, 1
1101 0001 [mod-000-r/m]
23+EA
15+EA
7
7
4
3
rol mem32, 1
0110 0110 1101 0001 [mod-000-r/m]
-
-
-
7
4
3
rol reg8, cl
1101 0010 [11-000-r/m]
8 + 4*cl
8 + 4*cl
5 + cl
3
3
4
rol reg16, cl
1101 0011 [11-000-r/m]
8 + 4*cl
8 + 4*cl
5 + cl
3
3
4
rol reg32, cl
0110 0110 1101 0011 [11-000-r/m]
-
-
-
3
3
4
rol mem8, cl
1101 0010 [mod-000-r/m]
20+EA+4*cl
20+EA+4*cl
8 + cl
7
4
4
rol mem16, cl
1101 0011 [mod-000-r/m]
28+EA+4*cl
20+EA+4*cl
8 + cl
7
4
4
rol mem32, cl
0110 0110 1101 0011 [mod-000-r/m]
-
-
-
7
4
4
rol reg8, imm8
1100 0000 [11-000-r/m] [imm8]
-
-
5+imm8
3
2
1
rol reg16, imm8
1100 0001 [11-000-r/m] [imm8]
-
-
5+imm8
3
2
1
rol reg32, imm8
0110 0110 1100 0001 [11-000-r/m] [imm8]
-
-
-
3
2
1
rol mem8, imm8
1100 0000 [mod-000-r/m] [imm8]
-
-
8+imm8
7
4
3
rol mem16, imm8
1100 0001 [mod-000-r/m] [imm8]
-
-
8+imm8
7
4
3
rol mem32, imm8
0110 0110 1100 0001 [mod-000-r/m] [imm8]
-
-
-
7
4
3
ror reg8, 1
1101 0000 [11-001-r/m]
2
2
2
3
3
1
ror reg16, 1
1101 0001 [11-001-r/m]
2
2
2
3
3
1
ror reg32, 1
0110 0110 1101 0001 [11-001-r/m]
-
-
-
3
3
1
ror mem8, 1
1101 0000 [mod-001-r/m]
15+EA
15+EA
7
7
4
3
ror mem16, 1
1101 0001 [mod-001-r/m]
23+EA
15+EA
7
7
4
3
ror mem32, 1
0110 0110 1101 0001 [mod-001-r/m]
-
-
-
7
4
3
ror reg8, cl
1101 0010 [11-001-r/m]
8 + 4*cl
8 + 4*cl
5 + cl
3
3
4
ror reg16, cl
1101 0011 [11-001-r/m]
8 + 4*cl
8 + 4*cl
5 + cl
3
3
4
ror reg32, cl
0110 0110 1101 0011 [11-001-r/m]
-
-
-
3
3
4
ror mem8, cl
1101 0010 [mod-001-r/m]
20+EA+4*cl
20+EA+4*cl
8 + cl
7
4
4
ror mem16, cl
1101 0011 [mod-001-r/m]
28+EA+4*cl
20+EA+4*cl
8 + cl
7
4
4
ror mem32, cl
0110 0110 1101 0011 [mod-001-r/m]
-
-
-
7
4
4
ror reg8, imm8
1100 0000 [11-001-r/m] [imm8]
-
-
5+imm8
3
2
1
ror reg16, imm8
1100 0001 [11-001-r/m] [imm8]
-
-
5+imm8
3
2
1
ror reg32, imm8
0110 0110 1100 0001 [11-001-r/m] [imm8]
-
-
-
3
2
1
ror mem8, imm8
1100 0000 [mod-001-r/m] [imm8]
-
-
8+imm8
7
4
3
ror mem16, imm8
1100 0001 [mod-001-r/m] [imm8]
-
-
8+imm8
7
4
3
ror mem32, imm8
0110 0110 1100 0001 [mod-001-r/m] [imm8]
-
-
-
7
4
3
sahf
1001 1110
4
4
2
3
2
2
sal reg8, 1 (Same instruction as shl)
1101 0000 [11-100-r/m]
2
2
2
3
3
1
sal reg16, 1
1101 0001 [11-100-r/m]
2
2
2
3
3
1
sal reg32, 1
0110 0110 1101 0001 [11-100-r/m]
-
-
-
3
3
1
sal mem8, 1
1101 0000 [mod-100-r/m]
15+EA
15+EA
7
7
4
3
sal mem16, 1
1101 0001 [mod-100-r/m]
23+EA
15+EA
7
7
4
3
sal mem32, 1
0110 0110 1101 0001 [mod-100-r/m]
-
-
-
7
4
3
sal reg8, cl
1101 0010 [11-100-r/m]
8 + 4*cl
8 + 4*cl
5 + cl
3
3
4
sal reg16, cl
1101 0011 [11-100-r/m]
8 + 4*cl
8 + 4*cl
5 + cl
3
3
4
sal reg32, cl
0110 0110 1101 0011 [11-100-r/m]
-
-
-
3
3
4
sal mem8, cl
1101 0010 [mod-100-r/m]
20+EA+4*cl
20+EA+4*cl
8 + cl
7
4
4
sal mem16, cl
1101 0011 [mod-100-r/m]
28+EA+4*cl
20+EA+4*cl
8 + cl
7
4
4
sal mem32, cl
0110 0110 1101 0011 [mod-100-r/m]
-
-
-
7
4
4
sal reg8, imm8
1100 0000 [11-100-r/m] [imm8]
-
-
5+imm8
3
2
1
sal reg16, imm8
1100 0001 [11-100-r/m] [imm8]
-
-
5+imm8
3
2
1
sal reg32, imm8
0110 0110 1100 0001 [11-100-r/m] [imm8]
-
-
-
3
2
1
sal mem8, imm8
1100 0000 [mod-100-r/m] [imm8]
-
-
8+imm8
7
4
3
sal mem16, imm8
1100 0001 [mod-100-r/m] [imm8]
-
-
8+imm8
7
4
3
sal mem32, imm8
0110 0110 1100 0001 [mod-100-r/m] [imm8]
-
-
-
7
4
3
sar reg8, 1
1101 0000 [11-111-r/m]
2
2
2
3
3
1
sar reg16, 1
1101 0001 [11-111-r/m]
2
2
2
3
3
1
sar reg32, 1
0110 0110 1101 0001 [11-111-r/m]
-
-
-
3
3
1
sar mem8, 1
1101 0000 [mod-111-r/m]
15+EA
15+EA
7
7
4
3
sar mem16, 1
1101 0001 [mod-111-r/m]
23+EA
15+EA
7
7
4
3
sar mem32, 1
0110 0110 1101 0001 [mod-111-r/m]
-
-
-
7
4
3
sar reg8, cl
1101 0010 [11-111-r/m]
8 + 4*cl
8 + 4*cl
5 + cl
3
3
4
sar reg16, cl
1101 0011 [11-111-r/m]
8 + 4*cl
8 + 4*cl
5 + cl
3
3
4
sar reg32, cl
0110 0110 1101 0011 [11-111-r/m]
-
-
-
3
3
4
sar mem8, cl
1101 0010 [mod-111-r/m]
20+EA+4*cl
20+EA+4*cl
8 + cl
7
4
4
sar mem16, cl
1101 0011 [mod-111-r/m]
28+EA+4*cl
20+EA+4*cl
8 + cl
7
4
4
sar mem32, cl
0110 0110 1101 0011 [mod-111-r/m]
-
-
-
7
4
4
sar reg8, imm8
1100 0000 [11-111-r/m] [imm8]
-
-
5+imm8
3
2
1
sar reg16, imm8
1100 0001 [11-111-r/m] [imm8]
-
-
5+imm8
3
2
1
sar reg32, imm8
0110 0110 1100 0001 [11-111-r/m] [imm8]
-
-
-
3
2
1
sar mem8, imm8
1100 0000 [mod-111-r/m] [imm8]
-
-
8+imm8
7
4
3
sar mem16, imm8
1100 0001 [mod-111-r/m] [imm8]
-
-
8+imm8
7
4
3
sar mem32, imm8
0110 0110 1100 0001 [mod-111-r/m] [imm8]
-
-
-
7
4
3
sbb reg8, reg8
0001 10x0 [11-reg-r/m]
3
3
2
2
1
1
sbb reg16, reg16
0001 10x1 [11-reg-r/m]
3
3
2
2
1
1
sbb reg32, reg32
0110 0110 0001 10x1 [11-reg-r/m]
-
-
-
2
1
1
sbb reg8, mem8
0001 1010 [mod-reg-r/m]
9+EA
9+EA
7
7
2
2
sbb reg16, mem16
0001 1011 [mod-reg-r/m]
13+EA
9+EA
7
7
2
2
sbb reg32, mem32
0110 0110 0001 1011 [mod-reg-r/m]
-
-
-
7
2
2
sbb mem8, reg8
0001 1000 [mod-reg-r/m]
16+EA
16+EA
7
6
3
3
sbb mem16, reg16
0001 1001 [mod-reg-r/m]
24+EA
16+EA
7
6
3
3
sbb mem32, reg32
0110 0110 0001 1001 [mod-reg-r/m]
-
-
-
6
3
3
sbb reg8, imm8
1000 00x0 [11-011-r/m] [imm]
4
4
3
2
1
1
sbb reg16, imm16
1000 00s1 [11-011-r/m] [imm]
4
4
3
2
1
1
sbb reg32, imm32
0110 0110 1000 00s1 [11-011-r/m] [imm]
-
-
-
2
1
1
sbb mem8, imm8
1000 00x0 [mod-011-r/m] [imm]
17+EA
17+EA
7
7
3
3
sbb mem16, imm16
1000 00s1 [mod-011-r/m] [imm]
25+EA
17+EA
7
7
3
3
sbb mem32, imm32
0110 0110 1000 00s1 [mod-011-r/m] [imm]
-
-
-
7
3
3
sbb al, imm
0001 1100 [imm]
4
4
3
2
1
1
sbb ax, imm
0001 1101 [imm]
4
4
3
2
1
1
sbb eax, imm
0110 0110 0001 1101 [imm]
-
-
-
2
1
1
scasb
1010 1110
15
15
7
8
6
4
scasw
1010 1111
19
15
7
8
6
4
scasd
0110 0110 1010 1111
-
-
-
8
6
4
rep scasb
1111 0011 1010 1110
9 + 15 * cx
9 + 15*cx
5 + 8*cx
5 + 8*cx
7 + 5*cx 5 if cx=0
9 + 4*cx 7 if cx=0
rep scasw
1111 0011 1010 1111
9 + 19 * cx
9 + 15*cx
5 + 8*cx
5 + 8*cx
7 + 5*cx 5 if cx=0
9 + 4*cx 7 if cx=0
rep scasd
0110 0110 1111 0011 1010 1111
-
-
-
5 + 8*cx
7 + 5*cx 5 if cx=0
9 + 4*cx 7 if cx=0
seta reg8
0000 1111 1001 0111 [11-000-r/m] (e)
-
-
-
4
4 if set 3 if clear
1
seta mem8
0000 1111 1001 0111 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setae reg8
0000 1111 1001 0011 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setae mem8
0000 1111 1001 0011 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setb reg8
0000 1111 1001 0010 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setb mem8
0000 1111 1001 0010 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setbe reg8
0000 1111 1001 0110 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setbe mem8
0000 1111 1001 0110 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setc reg8
0000 1111 1001 0010 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setc mem8
0000 1111 1001 0010 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
sete reg8
0000 1111 1001 0100 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
sete mem8
0000 1111 1001 0100 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setg reg8
0000 1111 1001 1111 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setg mem8
0000 1111 1001 1111 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setge reg8
0000 1111 1001 1101 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setge mem8
0000 1111 1001 1101 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setl reg8
0000 1111 1001 1100 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setl mem8
0000 1111 1001 1100 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setle reg8
0000 1111 1001 1110 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setle mem8
0000 1111 1001 1110 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setna reg8
0000 1111 1001 0110 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setna mem8
0000 1111 1001 0110 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setnae reg8
0000 1111 1001 0010 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setnae mem8
0000 1111 1001 0010 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setnb reg8
0000 1111 1001 0011 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setnb mem8
0000 1111 1001 0011 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setnbe reg8
0000 1111 1001 0111 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setnbe mem8
0000 1111 1001 0111 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setnc reg8
0000 1111 1001 0011 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setnc mem8
0000 1111 1001 0011 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setne reg8
0000 1111 1001 0101 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setne mem8
0000 1111 1001 0101 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setng reg8
0000 1111 1001 1110 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setng mem8
0000 1111 1001 1110 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setnge reg8
0000 1111 1001 1100 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setnge mem8
0000 1111 1001 1100 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setnl reg8
0000 1111 1001 1101 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setnl mem8
0000 1111 1001 1101 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setnle reg8
0000 1111 1001 1111 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setnle mem8
0000 1111 1001 1111 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setno reg8
0000 1111 1001 0001 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setno mem8
0000 1111 1001 0001 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setnp reg8
0000 1111 1001 1011 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setnp mem8
0000 1111 1001 1011 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setns reg8
0000 1111 1001 1001 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setns mem8
0000 1111 1001 1001 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setnz reg8
0000 1111 1001 0101 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setnz mem8
0000 1111 1001 0101 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
seto reg8
0000 1111 1001 0000 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
seto mem8
0000 1111 1001 0000 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setp reg8
0000 1111 1001 1010 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setp mem8
0000 1111 1001 1010 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setpe reg8
0000 1111 1001 1010 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setpe mem8
0000 1111 1001 1010 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setpo reg8
0000 1111 1001 1011 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setpo mem8
0000 1111 1001 1011 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
sets reg8
0000 1111 1001 1000 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
sets mem8
0000 1111 1001 1000 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
setz reg8
0000 1111 1001 0100 [11-000-r/m]
-
-
-
4
4 if set 3 if clear
1
setz mem8
0000 1111 1001 0100 [mod-000-r/m]
-
-
-
5
3 if set 4 if clear
2
shl reg8, 1
1101 0000 [11-100-r/m]
2
2
2
3
3
1
shl reg16, 1
1101 0001 [11-100-r/m]
2
2
2
3
3
1
shl reg32, 1
0110 0110 1101 0001 [11-100-r/m]
-
-
-
3
3
1
shl mem8, 1
1101 0000 [mod-100-r/m]
15+EA
15+EA
7
7
4
3
shl mem16, 1
1101 0001 [mod-100-r/m]
23+EA
15+EA
7
7
4
3
shl mem32, 1
0110 0110 1101 0001 [mod-100-r/m]
-
-
-
7
4
3
shl reg8, cl
1101 0010 [11-100-r/m]
8 + 4*cl
8 + 4*cl
5 + cl
3
3
4
shl reg16, cl
1101 0011 [11-100-r/m]
8 + 4*cl
8 + 4*cl
5 + cl
3
3
4
shl reg32, cl
0110 0110 1101 0011 [11-100-r/m]
-
-
-
3
3
4
shl mem8, cl
1101 0010 [mod-100-r/m]
20+EA+4*cl
20+EA+4*cl
8 + cl
7
4
4
shl mem16, cl
1101 0011 [mod-100-r/m]
28+EA+4*cl
20+EA+4*cl
8 + cl
7
4
4
shl mem32, cl
0110 0110 1101 0011 [mod-100-r/m]
-
-
-
7
4
4
shl reg8, imm8
1100 0000 [11-100-r/m] [imm8]
-
-
5+imm8
3
2
1
shl reg16, imm8
1100 0001 [11-100-r/m] [imm8]
-
-
5+imm8
3
2
1
shl reg32, imm8
0110 0110 1100 0001 [11-100-r/m] [imm8]
-
-
-
3
2
1
shl mem8, imm8
1100 0000 [mod-100-r/m] [imm8]
-
-
8+imm8
7
4
3
shl mem16, imm8
1100 0001 [mod-100-r/m] [imm8]
-
-
8+imm8
7
4
3
shl mem32, imm8
0110 0110 1100 0001 [mod-100-r/m] [imm8]
-
-
-
7
4
3
shld reg16, reg16, imm8 (r/m is 1st operand, reg is second operand.)
0000 1111 1010 0100 [11-reg-r/m] [imm8]
-
-
-
3
2
4
shld reg32, reg32, imm8 (r/m is 1st operand, reg is second operand.)
0110 0110 0000 1111 1010 0100 [11-reg-r/m] [imm8]
-
-
-
3
2
4
shld mem16, reg16, imm8
0000 1111 1010 0100 [mod-reg-r/m] [imm8]
-
-
-
7
3
4
shld mem32, reg32, imm8
0110 0110 0000 1111 1010 0100 [mod-reg-r/m] [imm8]
-
-
-
7
3
4
shld reg16, reg16, cl
0000 1111 1010 0101 [11-reg-r/m]
-
-
-
3
3
4
shld reg32, reg32, cl
0110 0110 0000 1111 1010 0101 [11-reg-r/m]
-
-
-
3
3
4
shld mem16, reg16, cl
0000 1111 1010 0101 [mod-reg-r/m]
-
-
-
7
4
5
shld mem32, reg32, cl
0110 0110 0000 1111 1010 0101 [mod-reg-r/m]
-
-
-
7
4
5
shr reg8, 1
1101 0000 [11-101-r/m]
2
2
2
3
3
1
shr reg16, 1
1101 0001 [11-101-r/m]
2
2
2
3
3
1
shr reg32, 1
0110 0110 1101 0001 [11-101-r/m]
-
-
-
3
3
1
shr mem8, 1
1101 0000 [mod-101-r/m]
15+EA
15+EA
7
7
4
3
shr mem16, 1
1101 0001 [mod-101-r/m]
23+EA
15+EA
7
7
4
3
shr mem32, 1
0110 0110 1101 0001 [mod-101-r/m]
-
-
-
7
4
3
shr reg8, cl
1101 0010 [11-101-r/m]
8 + 4*cl
8 + 4*cl
5 + cl
3
3
4
shr reg16, cl
1101 0011 [11-101-r/m]
8 + 4*cl
8 + 4*cl
5 + cl
3
3
4
shr reg32, cl
0110 0110 1101 0011 [11-101-r/m]
-
-
-
3
3
4
shr mem8, cl
1101 0010 [mod-101-r/m]
20+EA+4*cl
20+EA+4*cl
8 + cl
7
4
4
shr mem16, cl
1101 0011 [mod-101-r/m]
28+EA+4*cl
20+EA+4*cl
8 + cl
7
4
4
shr mem32, cl
0110 0110 1101 0011 [mod-101-r/m]
-
-
-
7
4
4
Note: for shld reg16, reg16, cl and shld reg32, reg32, cl, r/m is the 1st operand and reg is the second operand.
shr reg8, imm8
1100 0000 [11-101-r/m] [imm8]
-
-
5+imm8
3
2
1
shr reg16, imm8
1100 0001 [11-101-r/m] [imm8]
-
-
5+imm8
3
2
1
shr reg32, imm8
0110 0110 1100 0001 [11-101-r/m] [imm8]
-
-
-
3
2
1
shr mem8, imm8
1100 0000 [mod-101-r/m] [imm8]
-
-
8+imm8
7
4
3
shr mem16, imm8
1100 0001 [mod-101-r/m] [imm8]
-
-
8+imm8
7
4
3
shr mem32, imm8
0110 0110 1100 0001 [mod-101-r/m] [imm8]
-
-
-
7
4
3
shrd reg16, reg16, imm8
0000 1111 1010 1100 [11-reg-r/m] [imm8]
-
-
-
3
2
4
shrd reg32, reg32, imm8
0110 0110 0000 1111 1010 1100 [11-reg-r/m] [imm8]
-
-
-
3
2
4
shrd mem16, reg16, imm8
0000 1111 1010 1100 [mod-reg-r/m] [imm8]
-
-
-
7
3
4
shrd mem32, reg32, imm8
0110 0110 0000 1111 1010 1100 [mod-reg-r/m] [imm8]
-
-
-
7
3
4
shrd reg16, reg16, cl
0000 1111 1010 1101 [11-reg-r/m]
-
-
-
3
3
4
shrd reg32, reg32, cl
0110 0110 0000 1111 1010 1101 [11-reg-r/m]
-
-
-
3
3
4
shrd mem16, reg16, cl
0000 1111 1010 1101 [mod-reg-r/m]
-
-
-
7
4
5
shrd mem32, reg32, cl
0110 0110 0000 1111 1010 1101 [mod-reg-r/m]
-
-
-
7
4
5
Note: for the register forms of shrd (shrd reg16, reg16, imm8; shrd reg32, reg32, imm8; shrd reg16, reg16, cl; shrd reg32, reg32, cl), r/m is the 1st operand and reg is the second operand.
stc
1111 1001
2
2
2
2
2
2
std
1111 1101
2
2
2
2
2
2
sti
1111 1011
2
2
2
3
5
7
stosb
1010 1010
11
11
3
4
5
3
stosw
1010 1011
15
11
3
4
5
3
stosd
0110 0110 1010 1011
-
-
-
4
5
3
rep stosb
1111 0011 1010 1010
9 + 10 * cx
9 + 10*cx
4 + 3*cx
5 + 5*cx
7 + 5*cx 5 if cx=0
9 + 3*cx 6 if cx=0
rep stosw
1111 0011 1010 1011
9 + 14 * cx
9 + 10*cx
4 + 3*cx
5 + 5*cx
7 + 5*cx 5 if cx=0
9 + 3*cx 6 if cx=0
rep stosd
0110 0110 1111 0011 1010 1011
-
-
-
5 + 5*cx
7 + 5*cx 5 if cx=0
9 + 3*cx 6 if cx=0
sub reg8, reg8
0010 10x0 [11-reg-r/m]
3
3
2
2
1
1
sub reg16, reg16
0010 10x1 [11-reg-r/m]
3
3
2
2
1
1
sub reg32, reg32
0110 0110 0010 10x1 [11-reg-r/m]
-
-
-
2
1
1
sub reg8, mem8
0010 1010 [mod-reg-r/m]
9+EA
9+EA
7
7
2
2
sub reg16, mem16
0010 1011 [mod-reg-r/m]
13+EA
9+EA
7
7
2
2
sub reg32, mem32
0110 0110 0010 1011 [mod-reg-r/m]
-
-
-
7
2
2
sub mem8, reg8
0010 1000 [mod-reg-r/m]
16+EA
16+EA
7
6
3
3
sub mem16, reg16
0010 1001 [mod-reg-r/m]
24+EA
16+EA
7
6
3
3
sub mem32, reg32
0110 0110 0010 1001 [mod-reg-r/m]
-
-
-
6
3
3
sub reg8, imm8
1000 00x0 [11-101-r/m] [imm]
4
4
3
2
1
1
sub reg16, imm16
1000 00s1 [11-101-r/m] [imm]
4
4
3
2
1
1
sub reg32, imm32
0110 0110 1000 00s1 [11-101-r/m] [imm]
-
-
-
2
1
1
sub mem8, imm8
1000 00x0 [mod-101-r/m] [imm]
17+EA
17+EA
7
7
3
3
sub mem16, imm16
1000 00s1 [mod-101-r/m] [imm]
25+EA
17+EA
7
7
3
3
sub mem32, imm32
0110 0110 1000 00s1 [mod-101-r/m] [imm]
-
-
-
7
3
3
sub al, imm
0010 1100 [imm]
4
4
3
2
1
1
sub ax, imm
0010 1101 [imm]
4
4
3
2
1
1
sub eax, imm
0110 0110 0010 1101 [imm]
-
-
-
2
1
1
test reg8, reg8
1000 0100 [11-reg-r/m]
3
3
2
2
1
1
test reg16, reg16
1000 0101 [11-reg-r/m]
3
3
2
2
1
1
test reg32, reg32
0110 0110 1000 0101 [11-reg-r/m]
-
-
-
2
1
1
test reg8, mem8
1000 0100 [mod-reg-r/m]
9+EA
9+EA
6
5
2
2
test reg16, mem16
1000 0101 [mod-reg-r/m]
13+EA
9+EA
6
5
2
2
test reg32, mem32
0110 0110 1000 0101 [mod-reg-r/m]
-
-
-
5
2
2
test reg8, imm8
1111 0110 [11-000-r/m] [imm]
4
4
3
2
1
1
test reg16, imm16
1111 0111 [11-000-r/m] [imm]
4
4
3
2
1
1
test reg32, imm32
0110 0110 1111 0111 [11-000-r/m] [imm]
-
-
-
2
1
1
test mem8, imm8
1111 0110 [mod-000-r/m] [imm]
9+EA
9+EA
6
5
2
2
test mem16, imm16
1111 0111 [mod-000-r/m] [imm]
13+EA
9+EA
6
5
2
2
test mem32, imm32
0110 0110 1111 0111 [mod-000-r/m] [imm]
-
-
-
5
2
2
test al, imm
1010 1000 [imm]
4
4
3
2
1
1
test ax, imm
1010 1001 [imm]
4
4
3
2
1
1
test eax, imm
0110 0110 1010 1001 [imm]
-
-
-
2
1
1
xadd reg8, reg8
0000 1111 1100 0000 [11-reg-r/m]
-
-
-
-
3
3
xadd reg16, reg16
0000 1111 1100 0001 [11-reg-r/m]
-
-
-
-
3
3
xadd reg32, reg32
0110 0110 0000 1111 1100 0001 [11-reg-r/m]
-
-
-
-
3
3
xadd mem8, reg8
0000 1111 1100 0000 [mod-reg-r/m]
-
-
-
-
4
4
xadd mem16, reg16
0000 1111 1100 0001 [mod-reg-r/m]
-
-
-
-
4
4
xadd mem32, reg32
0110 0110 0000 1111 1100 0001 [mod-reg-r/m]
-
-
-
-
4
4
xchg reg8, reg8
1000 0110 [11-reg-r/m]
4
4
3
3
3
3
xchg reg16, reg16
1000 0111 [11-reg-r/m]
4
4
3
3
3
3
xchg reg32, reg32
0110 0110 1000 0111 [11-reg-r/m]
-
-
-
3
3
3
xchg mem8, reg8 (f)
1000 0110 [mod-reg-r/m]
17 + EA
17 + EA
5
5
5
3
xchg mem16, reg16
1000 0111 [mod-reg-r/m]
25 + EA
17 + EA
5
5
5
3
xchg mem32, reg32
0110 0110 1000 0111 [mod-reg-r/m]
-
-
-
5
5
3
xchg ax, reg16
1001 0rrr
3
3
3
3
xchg ax, reg32
0110 0110 1001 0rrr
3
3
3
3
3
2
xlat
1101 0111
11
11
5
5
4
4
xor reg8, reg8
0011 00x0 [11-reg-r/m]
3
3
2
2
1
1
xor reg16, reg16
0011 00x1 [11-reg-r/m]
3
3
2
2
1
1
r/m is first operand, reg is second operand.
3 2 1 if reg=ax 1 if reg=ax
xor reg32, reg32
0110 0110 0011 00x1 [11-reg-r/m]
-
-
-
2
1
1
xor reg8, mem8
0011 0010 [mod-reg-r/m]
9+EA
9+EA
7
7
2
2
xor reg16, mem16
0011 0011 [mod-reg-r/m]
13+EA
9+EA
7
7
2
2
xor reg32, mem32
0110 0110 0011 0011 [mod-reg-r/m]
-
-
-
7
2
2
xor mem8, reg8
0011 0000 [mod-reg-r/m]
16+EA
16+EA
7
6
3
3
xor mem16, reg16
0011 0001 [mod-reg-r/m]
24+EA
16+EA
7
6
3
3
xor mem32, reg32
0110 0110 0011 0001 [mod-reg-r/m]
-
-
-
6
3
3
xor reg8, imm8
1000 00x0 [11-110-r/m] [imm]
4
4
3
2
1
1
xor reg16, imm16
1000 00s1 [11-110-r/m] [imm]
4
4
3
2
1
1
xor reg32, imm32
0110 0110 1000 00s1 [11-110-r/m] [imm]
-
-
-
2
1
1
xor mem8, imm8
1000 00x0 [mod-110-r/m] [imm]
17+EA
17+EA
7
7
3
3
xor mem16, imm16
1000 00s1 [mod-110-r/m] [imm]
25+EA
17+EA
7
7
3
3
xor mem32, imm32
0110 0110 1000 00s1 [mod-110-r/m] [imm]
-
-
-
7
3
3
xor al, imm
0011 0100 [imm]
4
4
3
2
1
1
xor ax, imm
0011 0101 [imm]
4
4
3
2
1
1
xor eax, imm
0110 0110 0011 0101 [imm]
-
-
-
2
1
1
a. Real mode, 16-bit segments.
b. Instructions with a 66h or 67h prefix are available only on 80386 and later processors.
c. Timings are all optimistic and do not include the cost of prefix bytes, hazards, fetching, misaligned operands, etc.
d. Cycle timings for HLT instruction are above and beyond the time spent waiting for an interrupt to occur.
e. On the 80386 and most versions of later processors, the processor ignores the reg field’s value for the Scc instruction; the reg field, however, should contain zero.
f. Most assemblers accept “xchg reg,mem” and encode it as “xchg mem,reg” which does the same thing.
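As a rough illustration of how to read the encoding column, the fragment below lists a few register-form instructions together with the object bytes the table's bit patterns predict. It is only a sketch and is not part of the original table: the names cseg and Demo are arbitrary, and it assumes MASM 6.x-style syntax assembled into a 16-bit (use16) segment, so that 32-bit operands pick up the 66h prefix mentioned in the footnotes. The bytes in the comments come from substituting the standard register numbers (al/ax/eax = 000, dx = 010, bx = 011) into the r/m field; compare them against your assembler's listing file rather than treating them as authoritative.

```asm
                .386                    ;Allow the 32-bit (66h prefix) forms.
cseg            segment para public use16 'code'
                assume  cs:cseg

Demo            proc    near
                mov     cl, 3           ;Shift/rotate count for the "cl" forms.
                rcl     ax, cl          ;1101 0011 [11-010-000]          = D3 D0
                rcr     bx, 1           ;1101 0001 [11-011-011]          = D1 DB
                sar     dx, 1           ;1101 0001 [11-111-010]          = D1 FA
                shl     eax, 4          ;0110 0110 1100 0001 [11-100-000] [imm8]
                                        ;                                = 66 C1 E0 04
                seta    al              ;0000 1111 1001 0111 [11-000-000] = 0F 97 C0
                ret                     ;1100 0011                       = C3
Demo            endp

cseg            ends
                end
```

The cycle columns are read the same way: for example, the table's "8 + 4*cl" entry for rcl reg16, cl on the 8086 works out to 8 + 4*3 = 20 cycles for the rcl ax, cl instruction above, subject to the caveats in footnote c.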