Director, Global Engineering Program: Chris Carson
ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored, or used in any form or by any means–graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, Web distribution, information networks, information storage and retrieval systems, or in any other manner–except as may be permitted by the license terms herein.
Senior Developmental Editor: Hilda Gowans
For product information and technology assistance, contact us at Cengage Learning Customer & Sales Support, 1-800-354-9706.
Editorial Assistant: Jennifer Dinsmore
For permission to use material from this text or product, submit all requests online at www.cengage.com/permissions. Further permissions questions can be emailed to [email protected].
Jonathan W. Valvano
Marketing Specialist: Lauren Betsos
Library of Congress Control Number: 2009923271
Media Editor: Chris Valentine
ISBN-13: 978-0-495-41137-6 ISBN-10: 0-495-41137-X
Director, Content and Media Production: Barbara Fuller-Jacobsen
Cengage Learning 200 First Stamford Place, Suite 400 Stamford, CT 06902 USA
Cengage Learning is a leading provider of customized learning solutions with office locations around the globe, including Singapore, the United Kingdom, Australia, Mexico, Brazil, and Japan. Locate your local office at: international.cengage.com/region. Cengage Learning products are represented in Canada by Nelson Education Ltd. For your course and learning solutions, visit www.cengage.com/engineering. Purchase any of our products at your local college store or at our preferred online store www.ichapters.com.
Preface

Embedded computer systems are electronic systems that include a microcomputer to perform specific dedicated tasks. The computer is hidden inside these products. Embedded systems are ubiquitous. Every week millions of tiny computer chips come pouring out of factories like Freescale, Microchip, Philips, Texas Instruments, Silicon Labs, and Mitsubishi, finding their way into our everyday products. Our global economy, our production of food, our transportation systems, our military defense, our communication systems, and even our quality of life depend on the efficiency and effectiveness of these embedded systems. Engineers play a major role in all phases of this effort: planning, design, analysis, manufacturing, and marketing.

This book provides an introduction to embedded systems, including both hardware interfacing and software fundamentals. This book employs a bottom-up educational approach. The overall educational objective is to allow students to discover how the computer interacts with its environment. It will provide hands-on experiences of how an embedded system could be used to solve Electrical Engineering (EE) problems. The focus will be on understanding and analysis, with an introduction to design. Optical sensors, motors, sampling ADCs, and DACs are the chosen mechanisms to bridge the Computer Engineering (CE) and EE worlds. EE concepts include Ohm’s Law, LED voltage/current, resistance measurement, and stepper motor control. CE concepts include I/O device drivers, debugging, stacks, queues, local variables and interrupts.

This book is based on the Freescale 9S12. This book can be used effectively with any of the 9S12 derivatives, such as 9S12C32, 9S12DG256, 9S12DP512, and 9S12E128. The hardware construction is performed on a breadboard and debugged using a multimeter (students learn to measure voltage and resistance). Software is developed in 9S12 assembly; labs may be simulated-only or first simulated and then run on the real 9S12 system. Software debugging occurs during the simulation stage. Device testing occurs on the final product.

One way to sort the broad range of topics within EE and CE is to group them into three categories: components, interfaces, and systems. Electrical and Computer Engineering curricula devote considerable effort to teaching how to design the components within a system. Components include physical devices, analog circuits, digital circuits, power circuits, digital signal processing, data structures, and software algorithms. Interfacing in general, and this book in particular, address the important task of connecting these components together. So, one effective way to educate engineering students is to first teach them how to build components, then teach them how to connect components together (this book). After the student learns how to build things and connect them together, then the student can be taught how to build systems. Of course, once a system is complete, it can be interfaced with other systems to solve more complex problems.

The book is essentially organized into three parts. Chapters 1 through 4 provide a basic introduction to computer architecture, representation of information, and assembly language programming. Parallel ports, switches, and LEDs are presented early in Chapter 2 so that students can write software that actually does something. Chapters 5, 6, 7, and 10 provide an in-depth treatment of software design as it applies to embedded systems. Interfacing and applications of embedded systems are presented in Chapters 8, 9, 11, and 12.
Objectives of the Book

The overall objective of this book is to present basic computer architecture, teach assembly language programming, and present an introduction to interfacing. Most universities teach assembly language programming not because employers wish to hire engineers and scientists ready to produce assembly code, but rather, because it affords a concrete approach for teaching how software works. Furthermore, an embedded system is an effective vehicle around which to introduce architecture, programming, and interfacing because the components are simple and inexpensive. The book describes both general processes and specific details involved in embedded system design. In particular, detailed case studies are used to illustrate fundamental concepts, and laboratory assignments are provided. The specific objectives of this book include the understanding of:

• The basic procedures involved in hardware/software simulation
• How information is represented on the computer
• The basic arithmetic and logical operations performed by the computer
• The fundamental architecture of the 9S12 family microcomputers
• The input/output operations and synchronization
• Assembly language programming: considering both function and style
• Simple hardware interfaces, including: switches, keyboards, LEDs, LCDs, DC motors, DACs, ADCs, and serial ports
• Debugging techniques: breakpoints, scanpoints, profiles, monitors, voltmeters, oscilloscopes, logic analyzers
• Program structures with a comparison between assembly and C
• Modular programming
• Elementary data structures
• Interrupt programming
This book does not discuss in detail every 9S12 instruction, but rather, it presents some of the instructions and uses them to discuss the general issues of representation of information, computer architecture, and developing embedded systems. In contrast, the Freescale programming reference guides do give details of each assembly instruction. In a similar manner, the Freescale microcomputer technical reference manuals explain all the I/O port functions. In other words, you will use this book along with the manuals from Freescale. A web site http://users.ece.utexas.edu/~valvano/ contains many reference documents for this book.
Special Features

This book incorporates a number of special features specifically designed for the beginning engineer. An effective educational approach is to learn by doing. The first action component of the book is the use of checkpoints, which can be found throughout the book. A checkpoint is a short question meant as an immediate feedback mechanism for the reader to evaluate his or her level of comprehension. Checkpoints should be performed while reading the chapter. Answers to checkpoints are given in the solutions manual section at the back of the book.

The second action component of the book is the examples. Design examples are included within each chapter. The purpose of the examples is to apply knowledge presented in that chapter to solve a specific problem.

The third action component is the tutorials. Each tutorial includes a sequence of actions (specific things for the reader to do) and a list of questions. Tutorials are meant to be performed without supervision, and should be performed after reading the chapter, but before attempting the labs or homework. Answers to the tutorial questions are also given in the solutions manual section in the back of the book.

The most important action components of the book are the laboratory assignments, which can be found at the end of each chapter. Additional labs and the tutorials can be found on the web site http://users.ece.utexas.edu/~valvano/. Each laboratory solution can first be built and tested using the TExaS simulator, then downloaded and run on an actual 9S12. Only by performing the laboratory assignments can the reader truly assimilate the hardware and software concepts introduced in this book. Laboratories are meant to be performed under the supervision of an instructor, and involve the classic engineering processes of design, construction, debugging, and evaluation.

Homework problems can also be found at the end of each chapter. These problems are less detailed and are intended to evaluate the reader’s understanding of specific topics introduced in the chapter.
How to Teach a Course Based on This Book

The first step in the design of any course is to create a list of educational objectives. This book along with the materials on the book web site could be used to teach introductory microcomputer programming and/or microcomputer interfacing. Specific educational objectives that are supported in this book are microcomputer architecture, number systems, assembly language programming, debugging, I/O device interfacing, I/O device synchronization, subroutines, local variables, elementary data structures, and interrupts.

The next important decision to make is the organization of the student laboratory. Practical “hands on” experience is critical in the educational process. Unfortunately, space, staff, and money constraints force all of us to compromise, doing the best we can. On the other hand, the role of simulation is becoming increasingly important as the race for technological superiority is run with shorter and shorter design cycle times. Consequently, it is important to expose our students to all phases of engineering design, including problem specification, conceptualization, simulation, construction, and analysis. Universities that adopt this book will be allowed to download, rewrite, print out, and distribute the laboratory assignments presented in this book.

The first laboratory configuration is based entirely on material included with the book, and involves no extra costs. Each book allows the student to download and install the TExaS application on a single computer. Students, for the most part, work off campus and come to a TA station for help or lab grading. In this configuration, you can either develop software in assembly using the TExaS assembler or develop C programs using the special version of Metrowerks Codewarrior for the 9S12. The simulator itself becomes the platform on which the lab assignments are developed and tested.
A second laboratory configuration combines simulation with some real microcomputer experiments. Labs can be first simulated, then run on a real microcomputer. Students are given or loaned a 9S12 development board like the Dragon12 board from Wytec (http://www.evbplus.com/index.html) or the Adapt9S12 board from Technological Arts (http://www.technologicalarts.com). Students can work off campus on the simulation aspects of the labs, then come to a laboratory for access to test equipment such as voltmeters and oscilloscopes. In this configuration, students first could write and debug assembly software using the TExaS simulator, then use TExaS to download and test on a real 9S12 board. TExaS can be used with any 9S12 that contains the Serial Monitor in protected EEPROM $F800 to $FFFF. The special version of Metrowerks Codewarrior for the 9S12 could also be used to develop either assembly or C using either the serial monitor or a background debug module (BDM) hardware pod. This is more expensive than the first configuration because actual microcomputer hardware and debugging systems are required.
What’s on the Book Web Site?

1. TExaS installer download. Each student purchasing a book can download and install TExaS. TExaS is a complete editor, assembler, and simulator for the Freescale 9S12 microcomputer. It simulates external hardware, I/O ports, interrupts, memory, and program execution. It is intended as a learning tool for embedded systems. This software is not freeware, but the purchase of the book entitles the owner to install one copy of the program. Once installed, TExaS creates many subdirectories with example applications.
2. There are multiple short video tutorials about developing assembly language programs on TExaS. See http://users.ece.utexas.edu/~valvano/Readme.htm
3. There is a directory containing data sheets in Adobe’s pdf format. This information does not need to be copied to your hard drive; you can simply read the data sheets from the web itself. In particular there are data sheets for microcomputers, digital logic, memory chips, op amps, ADCs, DACs, timer chips and interface chips. See http://users.ece.utexas.edu/~valvano/Datasheets/
4. There is a directory containing example applications. These examples include circuit diagrams and software that can be downloaded and run on the actual 9S12 board. http://users.ece.utexas.edu/~valvano/Starterfiles/
5. There is a directory containing lecture notes and laboratory assignments based on this book. http://users.ece.utexas.edu/~valvano/EE319K/
6. There is a web site containing downloads of materials that can be used with this book. http://www.cengage.com/engineering/valvano
Acknowledgments

Many shared experiences contributed to the development of this book. First, I would like to acknowledge the many excellent teaching assistants I have had the pleasure of working with. Some of these hardworking, underpaid warriors include Dr. Nachiket Kharalkar, Dr. Robin Tsang, John Porterfield, Sri Priya Ponnapalli, Dr. Anil Kottam, Brett Hemes, Priyank Patel, Dr. Byung-geun Lee, Deepak Panwar, Tawfik Chowdhury, Jungho Jo, Usman Tariq, Glen Rhodes, Sandy Hermawan, Jacob Egner, Robby Morrill, and Kyle Hutchens. Ann Meyer developed most of the code for the HD44780 LCD simulation. My teaching assistants have contributed greatly to the contents of this book, especially Nachiket and Robin. In a similar manner, my students have recharged my energy each semester with their enthusiasm, dedication, and quest for knowledge.

Secondly, I appreciate the patience and expertise of my fellow faculty members here at the University of Texas at Austin. From a personal perspective, Dr. John Pearce provided much needed encouragement and support throughout my career. In addition, as instructors of the class around which this book was developed, Dr. Bill Bard, Dr. Nachiket Kharalkar, Dr. Nur Touba, Mr. Mark Welker, Mr. Gary Daniels, and Dr. Ramesh Yerraballi provided insight and substance for this book. Dr. Lizy John and Dr. Yale Patt contributed to the architecture sections in this book.

Thirdly, I would like to thank the experts who reviewed this manuscript. This is the third book I have written, and I was deeply impressed by the quality and quantity of suggestions made by these reviewers. The rough draft had serious flaws in how it was organized, and thanks to their helpful advice, I think this book now flows smoothly. In particular, I want to thank

Bill Bard, University of Texas at Austin
Christopher M. Cischke, Michigan Technological University
Bruce A. Harvey, Florida A & M University
Joseph J. Pfeiffer, New Mexico State University
Karkal S. Prabhu, Drexel University
Eric M. Schwartz, University of Florida

Lastly, I appreciate the valuable lessons of character and commitment taught to me by my parents and grandparents. I recall how hard my parents and grandparents worked to make the world a better place for the next generation. Most significantly, I acknowledge the love, patience and support of my wife, Barbara, and my children, Ben, Dan, and Liz. In particular, Ben helped with the web site and the animations.
Good luck!

JONATHAN W. VALVANO
Contents

1 Introduction to Embedded Microcomputer Systems 1
1.1 Basic Components of an Embedded System 2
1.2 Applications Involving Embedded Systems 5
1.3 Flowcharts and Structured Programming 6
1.4 Concurrent and Parallel Programming 10
1.5 Product Development Cycle
1.6 Successive Refinement 17
1.7 Quality Design 19
The Assembly Language Development Process 40 Memory Transfer Operations Subroutines 43 Input/Output 45 2.9.1 Direction Registers 45 2.9.2 Switch Interface 46 2.9.3 LED Interface 47
Big and Little Endian 111 Memory-Mapped I/O 112 *I/O-Mapped I/O 113 *Segmented or Partitioned Memory Memory Bus Cycles 114 Processor Architecture 116 I/O Port Architecture 118
113
*Understanding Software Execution at the Bus Cycle Level 121 9S12 Architecture Details 127 4.3.1 4.3.2 4.3.3 4.3.4 4.3.5
5.1.1 Definition and Goals 153 5.1.2 Functions, Procedures, Methods, and Subroutines 155 5.1.3 Dividing a Software Task into Modules 156 5.1.4 How to Draw a Call-Graph 158 5.1.5 How to Draw a Data Flow Graph 160 5.1.6 Top-Down Versus Bottom-Up Design 160
5.2
Pointers and Data Structures 192
Tutorial 4. Building a Microcomputer and Executing Machine Code 144 Homework Assignments 147 Laboratory Assignments 148
Approximating Continuous Signals in the Digital Domain 398 Digital to Analog Conversion 399 Music Generation 400 Analog to Digital Conversion 403 11.4.1 9S12 ADC Details 403 11.4.2 ADC Data Formats 406 11.4.3 ADC Resolution 407
11.5 11.6 11.7
*Multiple Access Circular Queues 408 Real-Time Data Acquisition 409 *Control Systems 413
Introduction 433 Reentrant Programming and Critical Sections 434 Interthread Communication and Synchronization 438 Mailbox 439 Producer Consumer Problem FIFO Queue Implementation Double Buffer 446
A1.7 A1.8
440 444
Serial Port Interface using Interrupt Synchronization 438 *Distributed Systems. 447 *Design and Implementation of a Controller Area Network (CAN) 449 12.6.1 The Fundamentals of CAN 451 12.6.2 Details of the 9S12 CAN 454 12.6.3 9S12 CAN Device Driver 468
12.7
12.8 12.9 12.10 12.11
The Fundamentals of I2C 460 I2C Synchronization 464 9S12 I2C Details 465 9S12 I2C Single Master Example
Appendix 1 Embedded System Development Using TExaS 480 A1.1 Introduction to TExaS 480 A1.2 Major Components of TExaS 483 A1.3 Embedded System Design Process 486
Overall Structure 492 Label Field 492 Operation Field 493 Operand Field 493 Expressions 494 Comment Field 496 Assembly Listing and Errors 497 Assembler Pseudo-Ops 499 S-19 Object Code 503
TExaS ViewBox 505 Microcomputer Interfacing in TExaS 506
Appendix 2 Running on an Evaluation Board 508 Appendix 3 Systems Engineering 511 A3.1 A3.2
Running and Modifying Existing Assembly Language Programs 490 TExaS Editor 491 Assembly Language Syntax 492 A1.6.1 A1.6.2 A1.6.3 A1.6.4 A1.6.5 A1.6.6 A1.6.7 A1.6.8 A1.6.9
Communication Systems 433
12.3.1 12.3.4 12.3.4 12.3.4
12.4
A1.4
Design for Manufacturability Battery Power 512
Glossary of Terms
514
Solutions Manual
529
Checkpoint Solutions 529 Tutorial Solutions 542
Index
550
511
1 Introduction to Embedded Microcomputer Systems

Chapter 1 objectives are to:
• Introduce embedded microcomputer systems
• Outline the basic steps in developing microcomputer systems
• Define data flow graphs, flowcharts and call graphs
An effective way to learn new techniques is by doing them. The dilemma in learning a laboratory-based topic like embedded systems is that a tremendous volume of details must first be learned before microcomputer hardware and software systems can be designed. The approach taken in this book is to learn by doing. One of the advantages of a bottom-up approach to learning is that the student begins by mastering simple concepts. Once the student truly understands simple concepts, he or she can then embark on the creative process of design, which involves putting the pieces together to create a more complex system. True creativity is needed to solve complex problems using effective combinations of simple components.

Embedded systems afford an effective platform to teach new engineers how to program for three reasons. First, there is no operating system. Thus, in a bottom-up fashion the student can see, write, and understand all software running on a system that actually does something. Second, embedded systems involve input/output that is easy for the student to touch, hear, and see. Third, embedded systems are employed in many everyday products, motivating students by showing them how electrical and computer engineering processes can be applied in the real world.

Rather than introduce the voluminous details in an encyclopedic fashion, the book is organized by basic concepts, and the details are introduced as they are needed. We will start with simple systems and progressively add complexity. The overriding theme for Chapter 1 will be to present the organizational framework with which embedded systems will be designed. Chapters 2 through 4 explain how the computer works. Chapters 5, 6, 7, and 10 present the details of software development on an embedded system. Interfacing I/O devices to build embedded systems is presented in Chapters 8, 9, 11, and 12.
1.1 Basic Components of an Embedded System

Information is stored on the computer in binary form. A binary bit can exist in one of two possible states. In positive logic, the presence of a voltage is called the ‘1’, true, asserted, or high state. The absence of a voltage is called the ‘0’, false, not asserted, or low state. Figure 1.1 shows the output of a typical complementary metal oxide semiconductor (CMOS) circuit. The left side shows the condition with a true bit, and the right side shows a false. The output of each digital circuit consists of a p-type transistor “on top of” an n-type transistor. In digital circuits, each transistor is essentially on or off. If the transistor is on, it is equivalent to a short circuit between its two output pins. Conversely, if the transistor is off, it is equivalent to an open circuit between its output pins. On a 9S12 powered with a 5 V supply, a voltage between 3.25 and 5 V is considered high, and a voltage between 0 and 1.75 V is considered low. Separating the two regions by 1.5 V allows digital logic to operate reliably at very high speeds. The design of transistor-level digital circuits is beyond the scope of this book. However, it is important to know that digital data exist as binary bits and are encoded as high and low voltages.
Figure 1.1 A binary bit is true if a voltage is present and false if the voltage is 0.
[Figure: CMOS output stage powered from +5V. True: p-type on, n-type off, Out=5V; equivalent to a short on top and an open on the bottom. False: p-type off, n-type on, Out=0V; equivalent to an open on top and a short on the bottom.]
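The voltage thresholds quoted above can be written down as a small C function. This is only a sketch for experimenting on a PC: the names classify_voltage and logic_level_t are hypothetical, and the 3.25 V and 1.75 V limits are the ones stated in the text for a 9S12 on a 5 V supply. Voltages between the two limits are reported as undefined, since the text assigns them to neither state.

```c
#include <assert.h>

/* Possible interpretations of an input voltage (names are illustrative). */
typedef enum { LEVEL_LOW, LEVEL_HIGH, LEVEL_UNDEFINED } logic_level_t;

#define V_SUPPLY   5.00  /* supply voltage in volts, from the text       */
#define V_HIGH_MIN 3.25  /* 3.25 V to 5 V is considered high             */
#define V_LOW_MAX  1.75  /* 0 V to 1.75 V is considered low              */

/* Classify a voltage using the thresholds quoted in the text. */
logic_level_t classify_voltage(double volts) {
    assert(volts >= 0.0 && volts <= V_SUPPLY);  /* outside the supply rails is not modeled */
    if (volts >= V_HIGH_MIN) return LEVEL_HIGH;
    if (volts <= V_LOW_MAX)  return LEVEL_LOW;
    return LEVEL_UNDEFINED;  /* the 1.5 V gap between the two regions */
}
```

For example, classify_voltage(4.8) yields LEVEL_HIGH, classify_voltage(0.3) yields LEVEL_LOW, and classify_voltage(2.5) falls in the gap.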
If the information we wish to store exists in more than two states, we use multiple bits. For example, a byte contains 8 bits, and is built by grouping 8 binary bits into one object, as shown in Figure 1.2. Information can take many forms, e.g., numbers, logical states, text, instructions, sounds, or images. What the bits mean depends on how the information is organized and more importantly how it is used.

Figure 1.2 A byte is comprised of 8 bits.
[Figure: eight bit cells, labeled Bit 7 through Bit 0, each powered from +5V.]
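The grouping of 8 bits into a byte can be illustrated with a short C sketch. The function names get_bit and make_byte are hypothetical, and the sketch assumes the usual numbering in which bit 0 is the least significant bit and bit 7 the most significant.

```c
#include <assert.h>
#include <stdint.h>

/* Return bit n of a byte (n = 0 is least significant, n = 7 most significant). */
uint8_t get_bit(uint8_t value, unsigned n) {
    assert(n < 8);  /* a byte has exactly 8 bit positions */
    return (uint8_t)((value >> n) & 1u);
}

/* Build a byte from eight separate bits; bits[n] supplies bit n. */
uint8_t make_byte(const uint8_t bits[8]) {
    uint8_t value = 0;
    for (unsigned n = 0; n < 8; n++) {
        if (bits[n]) {
            value |= (uint8_t)(1u << n);  /* set bit n when bits[n] is nonzero */
        }
    }
    return value;
}
```

With bits 7, 5, 2, and 0 set, make_byte returns the value 165 ($A5 in hexadecimal). Whether that pattern means a number, a character, or eight independent logical states depends on how the software uses it.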
Memory is a collection of hardware elements in a computer into which we store information, as shown in Figure 1.3. For most computers in today’s market, each memory cell contains one byte of information, and each byte has a unique and sequential address. The memory is called byte-addressable because each byte has a separate address. The address of a memory cell specifies its physical location, and its contents are the data. When we write to memory, we specify an address and 8 bits of data, causing that information to be stored into the memory. When we read from memory, we specify an address, causing 8 bits of data to be retrieved from the memory. Read Only Memory, or ROM, is a type of memory where the information is programmed or burned into the device, and during normal operation it only allows read accesses. Random Access Memory (RAM) is used to store temporary information, and during normal operation we can read data from or write data into RAM. The information in the ROM is nonvolatile, meaning the contents are not lost when power is removed. In contrast, the information in the RAM is volatile, meaning the contents are lost when power is removed. The system can quickly and conveniently read data from a ROM. It takes a comparatively long time to program or burn data into a ROM. In contrast, it is fast and easy to both read data from and write data into a RAM.

Figure 1.3 Memory is a sequential collection of data storage elements.
[Figure: two columns, Address and Contents; the addresses are drawn as consecutive street addresses, 103 Main St through 108 Main St.]

Software is a set of instructions, stored in memory, that are executed in a complicated but well-defined manner. The processor is the digital hardware device that executes software. A port is a physical connection between the computer and its outside world. Ports allow information to enter and exit the system. Information enters via the input ports and exits via the output ports. Other names used to describe ports are I/O ports, I/O devices, interfaces, or sometimes just devices. A bus is a collection of wires used to pass information between modules. A computer is an electronic device with a processor, memory, and I/O ports, connected together with a bus. A microcomputer is a computer small enough that one person can carry it. Small in this context describes its size, not its computing power. Consequently, there can be great confusion over the term microcomputer, because it can refer to a very wide range of devices from a PIC12C508, which is an 8-pin chip with 512 words of ROM and 25 bytes of RAM, to the most powerful Pentium-based personal computer.

Computers are not intelligent. Rather, you are the true genius. Computers are electronic idiots. They can store a lot of data, but they will only do exactly what we tell them to do. Fortunately, however, they can execute our programs quite quickly, and they don’t get bored doing the same tasks over and over again.
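The read and write operations on a byte-addressable memory, and the difference between RAM and ROM during normal operation, can be modeled in a few lines of C. This is a sketch under stated assumptions, not a description of the 9S12 memory map: the 16-byte address space, the split at address 8, the preprogrammed values, and the names mem_read and mem_write are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_SIZE  16u  /* tiny illustrative address space (assumption)       */
#define ROM_START 8u   /* addresses 0..7 act as RAM, 8..15 as ROM (assumption) */

/* One byte per address; two ROM cells are "burned" with fixed contents. */
static uint8_t memory[MEM_SIZE] = {
    [ROM_START]      = 0x4A,
    [ROM_START + 1u] = 0x57
};

/* Read: specify an address, get back the 8 bits stored there. */
uint8_t mem_read(uint16_t address) {
    assert(address < MEM_SIZE);
    return memory[address];
}

/* Write: specify an address and 8 bits of data.
   Returns true if the data was stored, false if the address is in ROM,
   which only allows read accesses during normal operation. */
bool mem_write(uint16_t address, uint8_t data) {
    assert(address < MEM_SIZE);
    if (address >= ROM_START) {
        return false;  /* ROM contents are unchanged */
    }
    memory[address] = data;
    return true;
}
```

Writing 0x12 to address 3 succeeds and a later read of address 3 returns 0x12, while a write to address 8 is rejected and a read of address 8 still returns the preprogrammed 0x4A. The model does not capture volatility: in a real system the RAM half would lose its contents when power is removed.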
To better understand the expression embedded microcomputer system, consider each word separately. In this context, the word “embedded” means hidden inside so one can’t see it. The term “micro” means small, and a “computer” contains a processor, memory, and a means to exchange data with the external world. In an embedded system, we use ROM for storing the software and fixed constant data, and RAM for storing temporary information. Many microcomputers employed in embedded systems use EEPROM, which is an electrically erasable programmable ROM, because the information can easily be erased and reprogrammed. The functionality of a digital watch is defined by the software programmed into its ROM. When you remove the batteries from a watch and insert new batteries, it still behaves like a watch because the ROM is nonvolatile storage. As shown in Figure 1.4, the term embedded microcomputer system refers to a device that contains one or more microcomputers inside. Microcontrollers, which are microcomputers incorporating the processor, RAM, ROM and I/O ports into a single package, are often employed in an embedded system because of their low cost, small size, and low power requirements. Microcontrollers like the 9S12 are available with a large number and wide variety of I/O devices, such as parallel ports, serial ports, timers, digital to analog converters (DAC), and analog to digital converters (ADC). The I/O devices are a crucial part of an embedded system, because they provide necessary functionality. The software together with the I/O ports and associated interface circuits give an embedded computer system its distinctive characteristics.

Figure 1.4 An embedded system includes a microcomputer interfaced to external devices.
[Figure: block diagram of an embedded system. Labels: 9S12 microcontroller containing a processor, RAM, ROM, I/O ports, and a bus; ADC and DAC; analog signals; external electrical, mechanical, chemical, or optical devices.]

Checkpoint 1.1: What is an embedded system?
A digital multimeter, as shown in Figure 1.5, is a typical embedded system. This embedded system has two inputs, the mode selection dial on the front and the red/black test probes. The output is a liquid crystal display (LCD) showing measured parameters. The large black chip inside the box is a microcontroller. The software that defines its very specific purpose is programmed into the ROM of the microcontroller. As you can see, there is not much else inside this box other than the microcontroller, a fuse, a few interfacing resistors, and a battery.

Figure 1.5 A digital multimeter contains a microcontroller programmed to measure voltage, current and resistance.
As defined previously, a microcomputer is a small computer. One typically restricts the term embedded to refer to systems that do not look and behave like a typical computer. Most embedded systems do not have a keyboard, a graphics display, or secondary storage (disk). There are two ways to develop embedded systems. The first technique uses a microcontroller, like the 9S12. In general, there is no operating system, so the entire software system must be developed. These devices are suitable for low-cost, low-performance systems. On the other hand, one can develop a high-performance embedded system around the Arm or PC architecture. These systems typically employ an operating system, and are first designed on a development platform, and then the software and hardware are migrated to a standalone embedded platform.

Checkpoint 1.2: What is a microcomputer?
The external devices attached to the microcontroller allow the system to interact with its environment. An interface is defined as the hardware and software that combine to allow the computer to communicate the external hardware. We must also learn how to interface a
1.2 䡲 Applications Involving Embedded Systems
5
wide range of inputs and outputs that can exist in either digital or analog form. This book provides an introduction to microcomputer programming, hardware interfacing, and the design of embedded systems. In general, we can classify I/O interfaces into four categories:
• Parallel: binary data is available simultaneously on groups of lines
• Serial: binary data is available one bit at a time on a single line
• Analog: data is encoded as a variable voltage
• Time: data is encoded as a period, frequency, pulse width, or phase shift
A device driver is a set of software functions that facilitate the use of an I/O port. One of the simplest I/O ports on the 9S12 is a parallel port called PTT, meaning it is a collection of eight pins that can be used for either input or output. If PTT is an input port, then when the software reads from PTT, it gets eight bits (each bit is 1 or 0), representing the digital levels (high or low) that exist at the time of the read. If PTT is an output port, then when the software writes to PTT, it sets the outputs on the eight pins high (1) or low (0), depending on the data value the software has written.
The other general concept involved in most embedded systems is that they run in real time. In a real-time computer system, we can put an upper bound on the time required to perform the input-calculation-output sequence. A real-time system can guarantee a worst-case upper bound on the response time between when the new input information becomes available and when that information is processed. This response time is called interface latency. Another real-time requirement that exists in many embedded systems is the execution of periodic tasks. A periodic task is one that must be performed at equal time intervals. A real-time system can put a small and bounded limit on the time error between when a task should be run and when it is actually run. Because of the real-time nature of these systems, microcomputers have a rich set of features to handle many aspects of time.
Checkpoint 1.3: An input device allows information to be entered into the computer. List some of the input devices available on a general-purpose computer.
Checkpoint 1.4: An output device allows information to exit the computer. List some of the output devices available on a general-purpose computer.
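The device driver idea described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: PTT and DDRT are declared here as ordinary variables so the code compiles on a PC, whereas on a real 9S12 they would be memory-mapped registers supplied by a device header. The direction register DDRT and the function names PortT_InitOutput, PortT_InitInput, PortT_Write, and PortT_Read are not taken from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for memory-mapped registers (assumption: plain variables so
   the sketch runs on a PC; real hardware would define these elsewhere). */
static volatile uint8_t PTT;   /* Port T data, eight pins */
static volatile uint8_t DDRT;  /* Port T direction: 1 = output, 0 = input */

/* Configure all eight pins of Port T as outputs. */
void PortT_InitOutput(void) { DDRT = 0xFF; }

/* Configure all eight pins of Port T as inputs. */
void PortT_InitInput(void) { DDRT = 0x00; }

/* Write eight bits: each 1 drives a pin high, each 0 drives it low. */
void PortT_Write(uint8_t data) {
  assert(DDRT == 0xFF);        /* writing only makes sense on an output port */
  PTT = data;
}

/* Read eight bits: each bit is the level on one pin at the time of the read. */
uint8_t PortT_Read(void) { return PTT; }
```

Application code would call PortT_InitOutput() once and then PortT_Write(value) as needed, so that only the driver touches the port directly.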
The embedded computer systems in this book will contain a Freescale 9S12, which will be programmed to perform a specific dedicated application. Software for embedded systems typically solves only a limited range of problems. The microcomputer is embedded or hidden inside the device. In an embedded system, the software is usually programmed into ROM and therefore fixed. Even so, software maintenance (e.g., verification of proper operation, updates, fixing bugs, adding features, extending to new applications, end user configurations) is still extremely important. In fact, because microcomputers are employed in many safety-critical devices, injury or death may result if there are hardware and/or software faults. Consequently, testing must be considered in the original design, during development of intermediate components, and in the final product. The role of simulation is becoming increasingly important in today’s marketplace as we race to build better and better machines with shorter and shorter design cycles. An effective approach to building embedded systems is to first design the system using a hardware/software simulator, then download and test the system on an actual microcontroller.
1.2
Applications Involving Embedded Systems
An embedded computer system includes a microcomputer with mechanical, chemical and electrical devices attached to it, programmed for a specific dedicated purpose, and packaged up as a complete system. Any electrical, mechanical, or chemical system that involves inputs, decisions, calculations, analyses, and outputs is a candidate for implementation as an embedded system. Electrical, mechanical, and chemical sensors collect information.
Electronic interfaces convert the sensor signals into a form acceptable for the microcomputer. For example, a tachometer is a sensor that measures the revolutions per second of a rotating shaft. Microcomputer software performs the necessary decisions, calculations, and analyses. Additional interface electronics convert the microcomputer outputs into the necessary form. Actuators can be used to create mechanical or chemical outputs. For example, an electric motor converts electrical power into mechanical power. One automobile may soon employ up to 100 microcontrollers. In fact, upscale homes already contain as many as 150 microcontrollers, and the average consumer now interacts with microcontrollers thousands of times each day. Embedded microcomputers impact virtually all aspects of daily life:
• Consumer electronics
• Communication systems
• Automotive systems
• Military hardware
• Business applications
• Medical devices
Table 1.1 presents typical embedded microcomputer applications and the function performed by the embedded microcomputer. Each microcomputer accepts inputs, performs calculations, and generates outputs. In contrast, a general-purpose computer system typically has a keyboard, disk, and graphics display and can be programmed for a wide variety of purposes. Typical general-purpose applications include word processing, electronic mail, business accounting, scientific computing, and database systems. The user of a general-purpose computer does have access to the software that controls the machine. In other words, the user decides which operating system to run and which applications to launch. Because the general-purpose computer has a removable disk or network interface, new programs can easily be added to the system. The most common type of general-purpose computer is the personal computer, e.g., the Apple Macintosh or the IBM-PC compatible computer. Computers more powerful than the personal computer can be grouped in the workstation category, costing from $10,000 to $50,000. Supercomputers cost above $50,000. These computers often employ multiple processors and have much more memory than the typical personal computer. The workstations and supercomputers are used for handling large amounts of information (business applications) or performing large calculations (scientific research). This book will not specifically cover the general-purpose computer, although many of the basic principles of embedded computers do apply to all types of computer systems.
Checkpoint 1.5: There is a microcomputer embedded in a digital watch. List three operations the software must perform.
Table 1.1 Embedded system applications.
Application: Function Performed by the Microcomputer
Consumer electronics
  Washing machine: Controls the water and spin cycles
  Exercise equipment: Measures speed, distance, calories, heart rate
  Remote controls: Accepts key touches, and sends infrared pulses
  Clocks and watches: Maintains the time, alarm, and display
  Games and toys: Entertains the user, accepts joystick input, displays video output
  Audio/video electronics: Interacts with the operator and enhances performance
  Set-back thermostats: Adjusts day/night thresholds saving energy
  Camera, camcorder: Records and organizes images
  Television, VCR, cable box: Accepts inputs and processes audio/visual signals
Communication systems
  Answering machines: Plays outgoing message, saves and organizes messages
  Telephones: Transmits voice and data information
  Fax machines: Sends and receives images
  Radios: Sends and receives audio, noise rejection
  Cellular phones, pagers: Accepts key pad input, outputs sound, and enables communication
Automotive systems
  Automatic braking: Optimizes stopping on slippery surfaces
  Noise cancellation: Improves sound quality
  Locks: Allows keyless entry, detects intruders, activates alarms
  Electronic ignition: Controls sparks and fuel injectors
  Power windows and seats: Remembers preferred settings for each driver
  Cruise control: Maintains constant speed
  Collision avoidance: Reduces accidents
  Climate control: Improves comfort
  Emission control: Reduces pollution
  Instrumentation: Collects and provides necessary information
Military hardware
  Smart weapons: Recognizes friendly targets
  Missile guidance systems: Directs ordnance at the desired target
  Global positioning systems: Determines where you are on the planet
  Surveillance systems: Collects information about enemy activities
Business applications
  Cash registers: Accepts inputs and manages money
  Vending machines: Collects money and dispenses product
  ATM machines: Provides both security and convenience
  Traffic controllers: Senses car positions and controls traffic lights
  Industrial robots: Accepts input from sensors, controls motors
  Bar code readers and writers: Controls inventory and optimizes shipping
  Automatic sprinklers: Controls the wetness of the soil
  Elevator controllers: Maximizes traffic, minimizes waiting time
  RFID systems: Identifies products using radiofrequency tags
  Lighting and heating systems: Maximizes comfort and minimizes cost
Medical devices
  Monitors: Measures important functions
  Drug delivery systems: Administers proper doses
  Cancer treatments: Controls doses of radiation, drugs, or heat
  Pacemakers: Helps the heart beat regularly
  Prosthetic devices: Increases mobility for the handicapped
  Dialysis machines: Performs functions normally done by the kidney
1.3
Flowcharts and Structured Programming
The remainder of this chapter will discuss the art and science of designing embedded systems from a general perspective. If you need to write a paper, you decide on a theme, then begin with an outline. In the same manner, if you design an embedded system, you define its specification (what it does), and begin with an organizational plan. In this chapter, we will present three graphical tools to describe the organization of an embedded system: flowcharts, data flow graphs, and call graphs. You should draw all three for every system you design. In this section, we introduce the flowchart syntax that will be used throughout the book. Programs themselves are written in a linear or one-dimensional fashion. In other words, we type one line of software after another in a sequential fashion. Writing programs this way is a natural process, because the computer itself usually executes the program in a top-to-bottom sequential fashion. This one-dimensional format is fine for simple programs, but conditional branching and function calls may create complex behaviors that are not easily observed in a linear fashion. Flowcharts are one way to describe software in a two-dimensional format, specifically providing convenient mechanisms to visualize conditional branching and function calls. Flowcharts are very useful in the initial design stage of a software system to define complex algorithms. Furthermore, flowcharts can be used in the final documentation stage of a project, once the system is operational, in order to assist in its use or modification.
Observation: TExaS is one of the few software development systems that allow you to add flowcharts directly into your software as part of its documentation.
Figures throughout this section illustrate the syntax used to draw flowcharts. The oval shapes define entry and exit points. The main entry point is the starting point of the software. Each function, or subroutine, also has an entry point. The exit point returns the flow of control back to the place from which the function was called. When the software runs continuously, as is typically the case in an embedded system, there will be no main exit point. We use rectangles to specify process blocks. In a high-level flowchart, a process block might involve many operations, but in a low-level flowchart, the exact operation is defined in the rectangle. The parallelogram will be used to define an input/output operation. Some flowchart artists use rectangles for both processes and input/output. Since input/output operations are an important part of embedded systems, we will use the parallelogram format, which will make it easier to identify input/output in our flowcharts. The diamond-shaped objects define a branch point or decision block. The rectangle with double lines on the side specifies a call to a predefined function. In this book, functions, subroutines, and procedures are terms that all refer to a well-defined section of code that performs a specific operation. Functions usually return a result parameter, while procedures usually do not. Functions and procedures are terms used when describing a high-level language, while the term subroutine is often used when describing assembly language. When a function (or subroutine or procedure) is called, the software execution path jumps to the function, the specific operation is performed, and the execution path returns to the point immediately after the function call. Circles are used as connectors.
Common Error: In general, it is bad programming style to develop software that requires a lot of connectors when drawing its flowchart.
There are a seemingly unlimited number of tasks one can perform on a computer, and the key to developing great products is to select the correct ones. Just like hiking through the woods, we need to develop guidelines (like maps and trails) to keep us from getting lost. One of the fundamental issues when developing software, regardless of whether it is a microcontroller with 1000 lines of assembly code or a large computer system with billions of lines of code, is to maintain a consistent structure. One such framework is called structured programming. A good high-level language will force the programmer to write structured programs. Structured programs are built from three basic building blocks: the sequence, the conditional, and the while-loop. At the lowest level, the process block contains simple and well-defined commands. I/O functions are also low-level building blocks. Structured programming involves combining existing blocks into more complex structures, as shown in Figure 1.6.
Figure 1.6 Flowchart showing the basic building blocks of structured programming.
[Flowchart panels: Sequence (Block 1 followed by Block 2), Conditional (a decision selecting Block 1 or Block 2), While-loop (a decision that repeats Block).]
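The three building blocks of structured programming map directly onto C statements. The following is a minimal sketch; the function names and the arithmetic inside each block are placeholders chosen only to make the control flow visible, not examples from the book.

```c
#include <assert.h>

/* Sequence: Block 1 runs, then Block 2. */
int sequence(int x) {
  x = x + 1;        /* Block 1 */
  x = x * 2;        /* Block 2 */
  return x;
}

/* Conditional: exactly one of Block 1 or Block 2 runs. */
int conditional(int x) {
  if (x < 0) {
    x = -x;         /* Block 1 */
  } else {
    x = x + 10;     /* Block 2 */
  }
  return x;
}

/* While-loop: the block repeats as long as the condition is true. */
int while_loop(int n) {
  int sum = 0;
  assert(n >= 0);   /* the sketch only handles nonnegative counts */
  while (n > 0) {
    sum = sum + n;  /* Block */
    n = n - 1;
  }
  return sum;
}
```

Each function has a single entry and a single exit, so any of them can itself serve as a block inside a larger sequence, conditional, or while-loop.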
Example 1.1: Using a flowchart, describe the control algorithm that a toaster might use to cook toast. There will be a start button the user pushes to activate the machine. There is another input that measures toast temperature. The desired temperature is preprogrammed into the machine. The output is a heater, which can be on or off. The toast is automatically lowered into the oven when heat is applied and is ejected when the heat is turned off.
Solution: This example illustrates a common trait of embedded systems: they perform the same set of tasks over and over forever. The program starts at main when power is applied, and the system behaves like a toaster until it is unplugged. Figure 1.7 shows a flowchart for one possible toaster algorithm. The system initially waits for the operator to push the start button. If the switch is not pressed, the system loops back, reading and checking the switch over and over. After the start button is pressed, heat is turned on. When the toast temperature reaches the desired value, heat is turned off and the process is repeated.
Figure 1.7 Flowchart illustrating the process of making toast.
[Flowchart: from the entry point main, input from the Start switch and loop back while it is not pressed; once pressed, output heat is on; input the toast temperature and repeat while toast < desired (too cold); when toast ≥ desired, output heat is off and return to the start.]
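The toaster flowchart translates into two wait loops in C. This is a sketch that runs on a PC rather than real toaster code: the switch, temperature sensor, and heater are simulated by the hypothetical functions Switch_In, Temp_In, and Heat_Out, and DESIRED_TEMP is an arbitrary value.

```c
#include <assert.h>
#include <stdbool.h>

#define DESIRED_TEMP 150   /* hypothetical preprogrammed target */

/* Simulated hardware (assumptions, not 9S12 I/O). */
static int  reads_until_pressed = 3;  /* button reads as pressed on the 3rd read */
static int  toast_temp = 20;
static bool heater_on = false;

static bool Switch_In(void) {
  if (reads_until_pressed > 0) reads_until_pressed--;
  return reads_until_pressed == 0;
}
static int Temp_In(void) {
  if (heater_on) toast_temp += 10;    /* toast warms while heat is applied */
  return toast_temp;
}
static void Heat_Out(bool on) { heater_on = on; }

/* One pass through the flowchart. */
void Toaster_Cycle(void) {
  assert(!heater_on);                     /* heat must be off while waiting */
  while (!Switch_In()) { }                /* not pressed: keep checking */
  Heat_Out(true);                         /* pressed: heat on */
  while (Temp_In() < DESIRED_TEMP) { }    /* too cold: keep heating */
  Heat_Out(false);                        /* toast >= desired: heat off */
}
```

On the embedded target, main would call Toaster_Cycle() inside while(1), matching the flowchart's lack of a main exit point.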
Checkpoint 1.6: What safety feature might you add to this toaster to reduce the chance of a fire?
Example 1.2: Design a flowchart to illustrate the process of reading a book. The inputs to this system are words read from the book, and definitions looked up in a dictionary. The objective of this system will be to store knowledge into a database. There will be no formal output per se.
Solution: This second example illustrates the concept of a subroutine. We break a complex system into smaller components so that the system is easier to understand and easier to test. In particular, once we know how to look up definitions of words in a dictionary, we will encapsulate that process into a subroutine, called Lookup. In this example, the main program performs the tasks of reading and remembering. We use a while-loop to read each word of the book in order until the end of the book is reached. After we read a word from the book, we use a conditional to determine whether or not we understand the meaning of the word. If we do not understand the word, we call the Lookup subroutine to find the definition in the dictionary. After we have read and understood each word, we record the knowledge we have learned into a database. The letters A through D in Figure 1.8 specify the software activities in this simple example. In this example, execution is sequential and predictable (if BD is to occur, it will come after A and before C). A software task is called a thread. More formally, a thread is the execution of software or the action caused by the execution. In this example, there is one thread. Consider a book with 10 words, where we do not know the meaning of word 4 and word 7. The thread caused by the execution when reading this 10-word book will be
A0 C0 A1 C1 A2 C2 A3 C3 A4 B4 D4 C4 A5 C5 A6 C6 A7 B7 D7 C7 A8 C8 A9 C9
where the number after each letter refers to the word number. The main program executes the sequence AC or ABDC over and over as it finishes reading the book.
Figure 1.8 Flowchart illustrating the process of reading a book.
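[Flowchart, main program: from the entry point main, a decision tests for the end of the book (connector 1 returns here); at the end of the book, return; if more words remain, read next word w (A); a decision on w either continues directly when the word is understood or calls Lookup(w) (B) when it is not; Remember (C) follows, and connector 1 loops back. Subroutine Lookup(w): entry point, read w in dictionary (D), return.]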
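The book-reading example can be sketched in C so that the thread of activities is recorded as the program runs. The assignment of letters is an assumption inferred from the sequences AC and ABDC in the text: A reads the next word, B calls Lookup, D reads the dictionary inside Lookup, and C remembers. The helper names understand, log_activity, and ReadBook are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_WORDS 10

static char trace[4 * NUM_WORDS + 1];   /* activity letters, in order */

static void log_activity(char letter) {
  size_t len = strlen(trace);
  assert(len + 1 < sizeof trace);       /* room for letter and terminator */
  trace[len] = letter;
  trace[len + 1] = '\0';
}

/* Hypothetical stand-in: words 4 and 7 are the ones we do not understand. */
static bool understand(int w) { return !(w == 4 || w == 7); }

/* Subroutine: find the definition of word w in the dictionary. */
static void Lookup(int w) {
  (void)w;
  log_activity('D');                    /* D: read w in dictionary */
}

/* Main program: read and remember every word in the book. */
void ReadBook(void) {
  int w = 0;
  trace[0] = '\0';
  while (w < NUM_WORDS) {               /* more words in the book? */
    log_activity('A');                  /* A: read next word w */
    if (!understand(w)) {
      log_activity('B');                /* B: call Lookup(w) */
      Lookup(w);
    }
    log_activity('C');                  /* C: remember */
    w++;
  }
}
```

After ReadBook() returns, trace holds the same letter sequence as the thread listed in the example, without the word numbers.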
1.4
Concurrent and Parallel Programming
Many problems cannot be implemented using the single-threaded execution pattern described in the previous section. Parallel programming allows the computer to execute multiple threads at the same time. State-of-the-art multi-core processors can execute a separate program in each of their cores. Fork and join are the fundamental building blocks of parallel programming. After a fork, two or more software threads will be run in parallel, i.e., the threads will run simultaneously on separate processors. Two or more simultaneous software threads can be combined into one using a join. The flowchart symbols for fork and join are shown in Figure 1.9. Software execution after the join will wait until all threads above the join are complete. As an analogy, if I want to dig a big hole in my back yard, I will invite three friends over and give everyone a shovel. The fork operation changes the situation from me working alone to four of us ready to dig. The four digging tasks are run in parallel. When the overall task is complete, the join operation causes the friends to go away, and I am working alone again.
Concurrent programming allows the computer to execute multiple threads, but only one at a time. Interrupts are one mechanism to implement concurrency on real-time systems. Interrupts have a hardware trigger and a software action. An interrupt is a parameterless subroutine call, triggered by a hardware event. The flowchart symbols for interrupts are also shown in Figure 1.9. The trigger is a hardware event signaling it is time to do something. Examples of interrupt triggers we will see in this book include new input data has arrived, output device is idle, and periodic event. The second component of an interrupt-driven system is the software action called an interrupt service routine (ISR). The foreground thread is defined as the execution of the main program, and the background threads are executions of the ISRs.
Figure 1.9 Flowchart symbols to describe parallel and concurrent programming.
[Symbols: Fork (one path splitting into several parallel Process blocks), Join (several Process blocks merging into one path), Trigger interrupt, and Return from interrupt.]
Consider the analogy of sitting in a comfy chair reading a book. Reading a book is like executing the main program in the foreground. You start reading at the beginning of the book and basically read one page at a time in a sequential fashion. You might jump to the back and look something up in the glossary, then jump back to where you were, which is analogous to a function call. Similarly, you might read the same page a few times, which is analogous to a program loop. Even though you skip around a little, the order of pages you read follows a logical and well-defined sequence. Conversely, if the telephone rings, you place a bookmark in the book, and answer the phone. When you are finished with the phone conversation, you hang up the phone and continue reading in the book where you left off. The ringing phone is analogous to the hardware trigger, and the phone conversation is like executing the ISR.
Example 1.3: Design a flowchart for a system that performs two independent tasks. The first task is to output a pulse on PTT every 1.024 ms in real time. The second task is to find all the prime numbers, and there are no particular time constraints on when or how fast one finds the prime numbers.
Solution: In this example, there are two threads: foreground and background. Real-time means the output pulse must occur every 1.024 ms. Therefore, we will use a periodic interrupt to guarantee this real-time requirement. In particular, the timer system will be configured so that a hardware trigger will occur every 1.024 ms, and the software action will issue the pulse on PTT. The background thread causes the output to go high, then low. Tasks that are not time-critical can be performed in the foreground by the main program. In this example, the foreground thread finds prime numbers. Because both threads are active at the same time, we say the system is multithreaded and the threads are running concurrently. The letters (A through F) in Figure 1.10 specify the software activities in this multithreaded example. In particular, main, Factor, and Record are executed in the foreground. In the foreground, execution is sequential and predictable (if C is to occur, it will come after B and before D). On the other hand, with interrupts, the hardware trigger causes the interrupt service routine to execute. The execution of the ISR is predictable too; in this case it is executed every 1.024 ms, but ISR execution does not depend on execution in the foreground. In a single processor system like the 9S12, the interrupt must suspend foreground execution, execute the interrupt service routine in the background, then resume execution of the foreground. The symbol < signifies the hardware halting the main program and launching the ISR. The symbol > signifies the ISR software executing a return from interrupt instruction (rti), which resumes execution in the main program.
Figure 1.10 Flowchart for a multithreaded solution of a system performing two tasks.
[Flowchart: the foreground thread starts at main, sets n=2 (A), calls Factor(n) (B), calls Record(n) (C) if n is prime, sets n = n+1 (D), and repeats through connector 1; the background thread Clock begins at the interrupt trigger (<), outputs PTT = 1 (E), outputs PTT = 0 (F), and returns from interrupt (>).]
    void interrupt 7 Clock(void){   // background thread (ISR)
      PTT = 1;                      // E: output goes high
      PTT = 0;                      // F: output goes low
    }
    void main(void){                // foreground thread
      int n=2;                      // A
      while(1){
        if(Factor(n))               // B: test whether n is prime
          Record(n);                // C: save the prime
        n = n+1;                    // D
      }
    }
The execution sequence of this two-threaded system might be something like the following (2, 3, 5, 7 are prime)
Foreground: A B2 C2 D2 B3 C3 D3 B4 D4 B5 C5 D5 B6 D6 B7 C7 D7 B8 D8 B9 D9 B10 D10
Background: EF EF EF
where the number after each letter refers to the current value of n. The main program executes the sequence BCD or BD over and over as it searches for prime numbers. In this example, the periodic timer causes the execution of EF every 1.024 ms. Even though C will come after B and before D, interrupts may or may not inject an EF between any two instructions of the foreground thread. Being able to inject an EF exactly every 1.024 ms is how the real-time constraint is satisfied.
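The interleaving of foreground and background can be imitated on a PC with a simulated timer. This sketch is not how real interrupts work: here the ISR is called by a hypothetical tick() function after every few foreground activities, whereas a hardware interrupt can occur between any two instructions. TICKS_PER_PERIOD, tick, and Foreground are assumptions, and the prime search is bounded so the sketch terminates.

```c
#include <assert.h>
#include <stdbool.h>

static volatile unsigned char PTT;   /* stand-in for the port register */
static int pulses = 0;               /* pulses issued by the ISR */
static int primes[16];
static int prime_count = 0;

/* Background: the periodic ISR issues one pulse. */
static void Clock(void) {
  PTT = 1;                           /* E */
  PTT = 0;                           /* F */
  pulses++;
}

/* Simulated timer: "interrupt" after every TICKS_PER_PERIOD activities. */
#define TICKS_PER_PERIOD 7
static int ticks = 0;
static void tick(void) {
  if (++ticks == TICKS_PER_PERIOD) { ticks = 0; Clock(); }
}

/* Returns true if n is prime (trial division). */
static bool Factor(int n) {
  for (int d = 2; d * d <= n; d++) {
    if (n % d == 0) return false;
  }
  return true;
}
static void Record(int n) {
  assert(prime_count < 16);
  primes[prime_count++] = n;
}

/* Foreground: search 2..last for primes; the result never depends on
   where the simulated interrupts happen to fall. */
void Foreground(int last) {
  int n = 2;                               /* A */
  while (n <= last) {
    bool is_prime = Factor(n); tick();     /* B */
    if (is_prime) { Record(n); tick(); }   /* C */
    n = n + 1; tick();                     /* D */
  }
}
```

Calling Foreground(10) performs 22 foreground activities, so the simulated timer fires three times while the primes 2, 3, 5, and 7 are recorded.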
Figure 1.11 Parallel programming solution for finding the maximum value in a buffer.
[Flowchart: after a fork, one thread sets x to the larger of Buf[0] and Buf[1] while another sets y to the larger of Buf[2] and Buf[3]; after the join, max is set to the larger of x and y.]
To illustrate the concept of parallel programming, consider the problem of finding the maximum value in a buffer, as implemented in Figure 1.11. Finding the maximum value in the first half of the buffer can be executed in parallel with finding the maximum value in the second half of the buffer. Although the 9S12 microcontroller cannot execute software tasks in parallel, state-of-the-art microprocessors found in desktop computers have two or more cores, which do support parallel program execution. It is important to distinguish parallel programming, like Figure 1.11, from multithreading, like Figure 1.10. Multithreading, as we will be developing in this book, switches among multiple software tasks, executing one task at a time.
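The decomposition in Figure 1.11 can be written in C so that the two independent halves are easy to see. This sketch runs the halves one after the other; it only marks where a fork and a join would go on a multi-core machine and does not create threads. The names larger and Buf_Max are hypothetical.

```c
#include <assert.h>

/* Larger of two values. */
static int larger(int a, int b) { return (a > b) ? a : b; }

/* Maximum of a four-element buffer, organized like the fork/join flowchart. */
int Buf_Max(const int Buf[4]) {
  assert(Buf != 0);
  /* fork: the next two statements share no data, so on a multi-core
     processor they could execute in parallel */
  int x = larger(Buf[0], Buf[1]);   /* maximum of the first half */
  int y = larger(Buf[2], Buf[3]);   /* maximum of the second half */
  /* join: both x and y must be ready before they are combined */
  return larger(x, y);
}
```

Because x and y are computed from disjoint parts of the buffer, the order in which the two halves finish does not affect the result.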
1.5
Product Development Cycle
In this section, we will introduce the product development process in general. The basic approach is introduced here, and the details of these concepts will be presented throughout the remaining chapters of the book. As we learn software/hardware development tools and techniques, we can place them into the framework presented in this section. As illustrated in Figure 1.12, the development of a product follows an analysis-design-implementation-testing cycle. For complex systems with long life-spans, we traverse the development cycle multiple times. For simple systems, a one-time pass may suffice.
Figure 1.12 Product development cycle.
[Flowchart: Analyze the problem (specifications, constraints), then High level design (block diagrams, data flow graphs), then Engineering design (call graphs, data structures, I/O interfaces), then Implementation (hardware, software), then Testing, which ends at Done or loops back when Not done; new requirements and new constraints feed back into the cycle.]
During the analysis phase, we discover the requirements and constraints for our proposed system. We can hire consultants and interview potential customers in order to gather this critical information. A requirement is a specific parameter that the system must satisfy. We begin by rewriting the system requirements, which are usually written in general form, into a list of detailed specifications. In general, specifications are detailed parameters describing how the system should work. For example, a requirement may state that the system should fit into a pocket, whereas a specification would give the exact size and weight of the device. For example, suppose we wish to build a motor controller. During the analysis phase, we would determine obvious specifications such as range, stability, accuracy, and response time. There may be less obvious requirements to satisfy, such as weight, size, battery life, product life, ease of operation, display readability, and reliability. Often, improving the performance on one parameter can be achieved only by decreasing the performance of another. This art of compromise defines the tradeoffs an engineer must make when designing a product. A constraint is a limitation within which the system must operate. The system may be constrained by such factors as cost, safety, compatibility with other products, use of the same electronic and mechanical parts as other devices, interfaces with other instruments and test equipment, and development schedule. The following measures are often considered during the analysis phase of a project:
• Safety: The risk to humans or the environment
• Accuracy: The difference between desired and actual parameter
• Precision: The number of distinguishable measurements
• Resolution: The smallest change that can be reliably detected
• Response time: The time difference between triggering event and resulting action
• Bandwidth: The amount of information processed per time
• Maintainability: The flexibility with which the device can be modified
• Testability: The ease with which proper operation of the device can be verified
• Compatibility: The conformance of the device to existing standards
• Mean time between failure: The reliability of the device defining the life of a product
• Size and weight: The physical space required by the system
• Power: The amount of energy it takes to operate the system
• Nonrecurring engineering cost (NRE cost): The one-time cost to design and test the product
• Unit cost: The cost required to manufacture one additional product
• Time-to-prototype: The time required to design, build, and test an example system
• Time-to-market: The time required to deliver the product to the customer
• Human factors: The degree to which our customers enjoy/like/appreciate the product
Checkpoint 1.7: What’s the difference between a requirement and a specification?
The following is an outline of a software requirements document. IEEE publishes a number of templates that can be used to define a project (IEEE Std 830-1998). A requirements document states what the system will do. It does not state how the system will do it. The main purpose of a requirements document is to serve as an agreement between you and your clients describing what the system will do. This agreement can become a legally binding contract. It should be unambiguous, complete, verifiable, and modifiable.
1. Overview
1.1. Objectives: Why are we doing this project? What is the purpose?
1.2. Process: How will the project be developed?
1.3. Roles and Responsibilities: Who will do what? Who are the clients?
1.4. Interactions with Existing Systems: How will it fit in?
1.5. Terminology: Define terms used in the document.
1.6. Security: How will intellectual property be managed?
2. Function Description
2.1. Functionality: What will the system do precisely?
2.2. Scope: List the phases and what will be delivered in each phase.
2.3. Prototypes: How will intermediate progress be demonstrated?
2.4. Performance: Define the measures and describe how they will be determined.
2.5. Usability: Describe the interfaces. Be quantitative if possible.
2.6. Safety: Explain any safety requirements and how they will be measured.
3. Deliverables
3.1. Reports: How will the system be described?
3.2. Audits: How will the clients evaluate progress?
3.3. Outcomes: What are the deliverables? How do we know when the system is done?
During the high-level design phase, we build a conceptual model of the hardware/software system. It is in this model that we exploit as much abstraction as appropriate. The project is broken into modules or subcomponents. Modular design will be presented in Chapter 5. During this phase, we estimate the cost, schedule, and expected performance of the system. At this point we can decide if the project has a high enough potential for profit. A data flow graph is a block diagram of the system, showing the flow of information. Arrows point from source to destination. The rectangles represent hardware components and the ovals are software modules. We use data flow graphs in the high-level design, because they describe the overall operation of the system while hiding the details of how it works. Issues such as safety (e.g., Isaac Asimov’s First Law of Robotics: “A robot may not injure a human being or, through inaction, allow a human being to come to harm”) and testing (e.g., we need to verify our system is operational) should be addressed during the high-level design. A driver is a set of software functions that facilitate the use of an I/O port. A data flow graph for a simple position measurement system is shown in Figure 1.13. The sensor converts position into an electrical resistance. The analog circuit converts resistance into the 0 to 5 V voltage range required by the ADC. The ADC converts analog voltage into a digital sample. The ADC driver, using the ADC and timer hardware, collects samples and calculates voltages. The software converts voltage to position. Voltage and position data are represented as fixed-point numbers within the computer. The position data is passed to the LCD driver creating ASCII strings, which will be sent to the liquid crystal display (LCD) module.
Figure 1.13 A data flow graph showing how the position signal passes through the system.
[Data flow: Position (0 to 3 cm) enters the Position sensor; Resistance (0 to +50 kΩ) passes to the Analog circuit; Voltage (0 to +5 V) passes to the ADC hardware; Sample (0 to 1023) passes to the ADC driver; Sample (0 to 1023) passes to the Timer ISR; Fixed-point (0 to 3.00) passes to the LCD driver; Characters (0.00 to 3.00 cm) pass to the LCD display. Timer hardware is an additional hardware block.]
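The software stages of the data flow graph (sample to fixed-point position to ASCII characters) can be sketched as two small C functions. The sketch assumes the position is linear in the ADC sample and that the fixed-point resolution is 0.01 cm, so that 0 to 3.00 cm is stored as the integer 0 to 300; the function names Position_FromSample and Position_ToString are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Convert a 10-bit ADC sample (0 to 1023) into position in units of
   0.01 cm (0 to 300), assuming a linear sensor and circuit. */
uint16_t Position_FromSample(uint16_t sample) {
  assert(sample <= 1023);
  /* 32-bit intermediate avoids overflow; adding 511 rounds to nearest */
  return (uint16_t)(((uint32_t)sample * 300u + 511u) / 1023u);
}

/* Format a fixed-point position as ASCII, e.g., 150 becomes "1.50 cm". */
void Position_ToString(uint16_t pos, char *buf, size_t size) {
  snprintf(buf, size, "%u.%02u cm",
           (unsigned)(pos / 100), (unsigned)(pos % 100));
}
```

Working in integer hundredths of a centimeter keeps the arithmetic free of floating point, which is the motivation for fixed-point numbers on a small microcontroller.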
The next phase is engineering design. We begin by constructing a preliminary design. This design includes the overall top-down hierarchical structure, the basic I/O signals, shared data structures, and overall software scheme. At this stage there should be a simple and direct correlation between the hardware/software systems and the conceptual model developed in the high-level design. Next, we finish the top-down hierarchical structure, and build mock-ups of the mechanical parts (connectors, chassis, cables, etc.) and user software interface. Sophisticated 3-D CAD systems can create realistic images of our system. Detailed hardware designs must include mechanical drawings. It is a good idea to have a second source, which is an alternative supplier that can sell our parts if the first source can’t deliver on time. Call-graphs are a graphical way to define how the software/hardware modules interconnect. Data structures, which will be presented throughout the book, include both the organization of information and mechanisms to access the data. Again, safety and testing should be addressed during this low-level design. A call-graph for a simple position measurement system is shown in Figure 1.14. Again, rectangles represent hardware components and ovals show software modules. An arrow points from the calling routine to the module it calls. The I/O ports are organized into groups and placed at the bottom of the graph. A high-level call-graph, like the one shown in Figure 1.14, shows only the high-level hardware/software modules. A detailed call-graph would include each software function and I/O port. Normally, hardware is passive and the software initiates hardware/software communication, but as we will learn in this book, it is possible for the hardware to interrupt the software and cause certain software modules to be run. In this system, the timer hardware will cause the ADC software to collect a sample. The timer interrupt service routine (ISR) gets the next sample from the ADC software, converts it to position, and displays the result by calling the LCD interface software. The double-headed arrow between the ISR and the hardware means the hardware triggers the interrupt and the software accesses the hardware.
Figure 1.14 A call-graph for a simple position measurement system.
Timer ISR
main
Timer driver
ADC driver
Timer hardware
ADC hardware
LCD driver LCD hardware
Observation: If module A calls module B, and B returns data, then a data flow graph will show an arrow from B to A, but a call-graph will show an arrow from A to B.
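The calling relationships in such a call-graph can be sketched in C. This is only an illustration under stated assumptions: the names ADC_In, LCD_OutFix, Convert, and Timer_ISR, the 10-bit sample range, the 0.01 cm resolution, and the variables standing in for hardware are all hypothetical and are not taken from the book's software.

```c
#include <stdio.h>
#include <stdint.h>

/* Simulated hardware (assumption: these variables stand in for real ports) */
static uint16_t ADC_Hardware = 512;  /* pretend 10-bit conversion result */
static char LCD_Hardware[16];        /* pretend display memory */

/* ADC driver: the only software that touches the ADC hardware */
uint16_t ADC_In(void) {
  return ADC_Hardware;
}

/* LCD driver: the only software that touches the LCD hardware.
   Shows a fixed-point number with resolution 0.01, e.g., 150 becomes "1.50" */
void LCD_OutFix(uint16_t n) {
  snprintf(LCD_Hardware, sizeof LCD_Hardware, "%u.%02u", n / 100u, n % 100u);
}

/* Convert a 0..1023 sample to position in units of 0.01 cm (0..300) */
uint16_t Convert(uint16_t sample) {
  return (uint16_t)(((uint32_t)sample * 300u) / 1023u);
}

/* Timer ISR: calls the ADC driver, then the LCD driver.
   These two calls are the arrows leaving the ISR in the call-graph. */
void Timer_ISR(void) {
  uint16_t sample = ADC_In();
  LCD_OutFix(Convert(sample));
}
```

On a real microcomputer the timer hardware would invoke Timer_ISR; in a desktop test, calling Timer_ISR() once from main imitates one interrupt. Note that the arrows follow the calls (ISR to ADC driver), while the data returned by ADC_In flows the other way.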
The next phase is implementation. An advantage of a top-down design is that implementation of subcomponents can occur simultaneously. During the initial iterations of the development cycle, it is quite efficient to implement the hardware/software using simulation. One major advantage of simulation is that it is usually quicker to implement an initial product on a simulator versus constructing a physical device out of actual components. Rapid prototyping is important in the early stages of product development. This allows for more loops around the analysis-design-implementation-testing cycle, which in turn leads to a more sophisticated product. Recent software and hardware technological developments have made significant impacts on the software development for embedded microcomputers. The simplest approach is to use a cross-assembler or cross-compiler to convert source code into the machine code for the target system. The machine code can then be loaded into the target machine. Debugging embedded systems with this simple approach is very difficult for two reasons. First, the embedded system lacks the usual keyboard and display that assist us
when we debug regular software. Second, the nature of embedded systems involves the complex and real-time interaction between the hardware and software. These real-time interactions make it impossible to test software with the usual single-stepping and print statements.
The next technological advancement that has greatly affected the manner in which embedded systems are developed is simulation. Because of the high cost and long times required to create hardware prototypes, many preliminary feasibility designs are now performed using hardware/software simulations. A simulator is a software application that models the behavior of the hardware/software system. If both the external hardware and software program are simulated together, even though the simulated time is slower than the clock on the wall, the real-time hardware/software interactions can be studied.
During the testing phase, we evaluate the performance of our system. First, we debug the system and validate basic functions. Next, we use careful measurements to optimize performance such as static efficiency (memory requirements), dynamic efficiency (execution speed), accuracy (difference between truth and measured), and stability (consistent operation). Debugging techniques will be presented at the end of most chapters.
Maintenance is the process of correcting mistakes, adding new features, optimizing for execution speed or program size, porting to new computers or operating systems, and reconfiguring the system to solve a similar problem. No system is static. Customers may change or add requirements or constraints. To be profitable, we probably will wish to tailor each system to the individual needs of each customer. Maintenance is not really a separate phase, but rather involves additional loops around the development cycle.
Figure 1.12 describes top-down design as a cyclic process, beginning with a problem statement and ending up with a solution.
With a bottom-up design we begin with solutions and build up to a problem statement. Many innovations begin with an idea, “what if . . .?” In a bottom-up design, one begins with designing, building, and testing low-level components. Figure 1.15 illustrates a two-level process, combining three subcomponents to create the overall product. This hierarchical process could have more levels and/or more components at each level. The low-level designs can be developed in parallel. The design of each component is cyclic, iterating through the design-build-test cycle until the performance is acceptable. Bottom-up design may be inefficient because some subsystems may be designed, built, and tested, but never used. As the design progresses, the components are fit together to make the system more and more complex. Only after the system is completely built and tested does one define the overall system specifications. The bottom-up design process allows creative ideas to drive the products a company develops. It also allows one to quickly test the feasibility of an idea. If one fully understands a problem area and the scope of potential solutions, then a top-down design will arrive at an effective solution most quickly. On the other hand, if one doesn’t really understand the problem or the scope of its solutions, a bottom-up approach allows one to start off by learning about the problem.
Figure 1.15 System development process illustrating bottom-up design. [Flowchart labels: three subcomponent cycles, each with Idea, Engineering design (hardware, software, call graphs, data structures, I/O interfaces), Implementation, and Testing (Done/No); followed by High level design (block diagrams, data flow graphs), Testing (Done/No), and Analyze (specifications, constraints).]
Observation: A good engineer knows both bottom-up and top-down design methods, choosing the approach most appropriate for the situation at hand.
1.6 Successive Refinement
Throughout the book in general, we discuss how to solve problems on the computer. In this section, we discuss the process of converting a problem statement into an algorithm. Later in the book, we will show how to map algorithms into assembly language. We begin with a set of general specifications, then create a list of requirements and constraints. The general specifications describe the problem statement in an overview fashion, requirements define the specific things the system must do, and constraints are the specific things the system must not do. These requirements and constraints will guide us as we develop and test our system.
Observation: Sometimes the specifications are ambiguous, conflicting, or incomplete.
There are two approaches to the situation of ambiguous, conflicting, or incomplete specifications. The best approach is to resolve the issue with your supervisor or customer. The second approach is to make a decision and document the decision.
Performance Tip: If you feel a system specification is wrong, discuss it with your supervisor. We can save a lot of time and money by solving the correct problem in the first place.
Successive refinement, stepwise refinement, and systematic decomposition are three equivalent terms for a technique to convert a problem statement into a software algorithm. We start with a task and decompose the task into a set of simpler subtasks. Then, the subtasks are decomposed into even simpler sub-subtasks. We make progress as long as each subtask is simpler than the task itself. During the task decomposition we must make design decisions as the details of exactly how the task will be performed are put into place. Eventually, a subtask is so simple, it can be converted to software code. We can decompose a task in four ways, as shown in Figure 1.16. The sequence, conditional, and while-loop are the three building blocks of structured programming. Because embedded systems often have real-time requirements, we will implement time-critical tasks using interrupt synchronization. An interrupt is a hardware-triggered software function, which will be discussed in more detail in Chapters 8, 10, 11, and 12.
Figure 1.16 We can decompose a task using the building blocks of structured programming. [Four flowchart panels: Sequential (Subtask 1, Subtask 2), Conditional (Condition with True and False branches, Subtask 1, Subtask 2), Iterative (Condition with True and False branches, Subtask), and Interrupt (Subtask).]
When we solve problems on the computer, we need to answer these questions:
• What does being in a state mean? (List the parameters of the state.)
• What is the starting state of the system? (Define the initial state.)
• What information do we need to collect? (List the input data.)
• What information do we need to generate? (List the output data.)
• How do we move from one state to another? (Specify actions we could perform.)
• What is the desired ending state? (Define the ultimate goal.)
We need to recognize these phrases that translate to four basic building blocks:
• “do A then do B”
• “do A and B in either order”
• “if A, then do B”
• “for each A, do B”
• “do A until B”
• “repeat A over and over forever”
• “on external event do B”
• “every t msec do B”
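To make the mapping concrete, here is a minimal sketch of how several of these phrases might look in C. The function names and the arithmetic standing in for A and B are hypothetical placeholders chosen only so that each construct can be compiled and tested.

```c
#include <stdint.h>

#define N 4

/* "do A then do B": sequence */
int32_t Sequence(int32_t x) {
  x = x + 1;                      /* A */
  x = x * 2;                      /* B */
  return x;
}

/* "if A, then do B": conditional */
int32_t Conditional(int32_t x) {
  if (x < 0) {                    /* A */
    x = -x;                       /* B */
  }
  return x;
}

/* "for each A, do B": iteration over a fixed set */
int32_t ForEach(const int32_t data[N]) {
  int32_t sum = 0;
  for (int i = 0; i < N; i++) {   /* each element A */
    sum = sum + data[i];          /* B */
  }
  return sum;
}

/* "do A until B": the body runs at least once, then the exit test */
int32_t DoUntil(int32_t x) {
  do {
    x = x / 2;                    /* A */
  } while (x > 10);               /* stop once B (x <= 10) is true */
  return x;
}
```

“Repeat A over and over forever” is usually written as while(1){ A; } in the main program of an embedded system. The last two phrases, “on external event do B” and “every t msec do B”, suggest interrupts, which depend on the hardware and cannot be expressed in portable C alone.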
Example 1.4 Build a digital door lock using seven switches.
Solution The system has seven binary inputs from the switches and one binary output to the door lock. The state of this system is either “door locked” or “door unlocked”. Initially, we want the door to be locked, which we can make happen by turning a solenoid off (make binary output low). If the 7-bit binary pattern on the switches matches a pre-defined keycode, then we want to unlock the door (make binary output high). Because the switches might bounce (flicker on and off) when changed, we will make sure the switches match the pre-defined keycode for at least 1 ms before unlocking the door. We can change states by writing to the output port for the solenoid. Like most embedded systems, there is no ending state. Once the switches no longer match the keycode, the door will lock again. The first step in successive refinement is to divide the tasks into those performed once (Initialization), and those tasks repeated over and over (Execute lock), as shown in the left flowchart in Figure 1.17. As shown in the middle flowchart, if the switches match the key, then we unlock the door. If the switches do not match, we lock the door. We will use a counter (cnt) to make sure the switches match the keycode for at least 1 ms before unlocking the door. The waiting is implemented by decrementing the counter. The hardware and software will be implemented in detail in Tutorial 5B.
Figure 1.17 We can decompose a task using the building blocks of structured programming. [Three flowcharts. Left: Initialize, then Execute lock. Middle: Initialize ports, then Lock (Solenoid=off) while the switches are different from the key, and Wait and unlock when the switches match the key. Right: Initialize, then Solenoid=off and cnt=4000 while the switches are different; while the switches match the key, cnt=cnt-1, repeating while cnt>0 and setting Solenoid=on when cnt reaches 0.]
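The refined flowchart can be sketched in C as follows. This is not the Tutorial 5B implementation; it is a desktop sketch under stated assumptions. The keycode value, the names Lock_Init and Lock_Step, and the variables that stand in for the switch and solenoid ports are hypothetical. The starting count of 4000 comes from the flowchart, and whether 4000 passes actually take 1 ms depends on how long one pass through the loop takes on the target microcomputer.

```c
#include <stdint.h>

#define KEYCODE  0x5Au   /* hypothetical 7-bit keycode */
#define WAIT_CNT 4000u   /* starting count shown in the flowchart */

/* Simulated I/O ports (assumption: replace with real port registers) */
static uint8_t Switches = 0;  /* seven switch inputs in bits 6-0 */
static uint8_t Solenoid = 0;  /* 1 = unlocked, 0 = locked */
static uint16_t Cnt;          /* remaining passes before unlocking */

/* Performed once: the door starts locked */
void Lock_Init(void) {
  Solenoid = 0;
  Cnt = WAIT_CNT;
}

/* Performed over and over: one pass through the lock loop */
void Lock_Step(void) {
  if ((Switches & 0x7Fu) != KEYCODE) {
    Solenoid = 0;             /* different: lock and restart the wait */
    Cnt = WAIT_CNT;
  } else {
    if (Cnt > 0) {
      Cnt--;                  /* match: count down */
    }
    if (Cnt == 0) {
      Solenoid = 1;           /* matched for the entire wait: unlock */
    }
  }
}
```

A main program would call Lock_Init() once and then call Lock_Step() inside while(1). Because any mismatch reloads the counter, a bouncing switch cannot unlock the door until the pattern has been stable for all 4000 passes.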
1.7 Quality Design
Embedded system development is similar to other engineering tasks. We can choose to follow well-defined procedures during the development and evaluation phases, or we can meander in a haphazard way and produce code that is hard to test and harder to change. The ultimate goal of the system is to satisfy the stated objectives such as accuracy, stability, and input/output relationships. Nevertheless, it is appropriate to separately evaluate the individual components of the system. Therefore, in this section, we will evaluate the quality of our software. There are two categories of performance criteria with which we evaluate the “goodness” of our software. Quantitative criteria include dynamic efficiency (speed of execution), static efficiency (memory requirements), and accuracy of the results. Qualitative criteria center on ease of software maintenance. Another qualitative way to evaluate software is ease of understanding. If your software is easy to understand then it will be:
• Easy to debug (fix mistakes)
• Easy to verify (prove correctness)
• Easy to maintain (add features)
Common Error: Programmers who sacrifice clarity in favor of execution speed often develop software that runs fast, but does not work and can’t be changed.
Golden Rule of Software Development Write software for others as you wish they would write for you.
1.7.1 Quantitative Performance Measurements
In order to evaluate our software quality, we need performance measures. The simplest approaches to this issue are quantitative measurements. Dynamic efficiency is a measure of how fast the program executes. It is measured in seconds or CPU cycles. Static efficiency is the number of memory bytes required. Since most embedded computer systems have both RAM and ROM, we specify memory requirement in global variables, stack space, fixed constants and program. The global variables plus the stack must fit into the available RAM. Similarly, the fixed constants plus the program must fit into the available ROM. We can also judge our embedded system according to whether or not it satisfies given requirements and constraints, like accuracy, cost, power, size, reliability, and time-table.
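The two memory constraints can be written as simple checks. The sketch below is illustrative only: the MemoryUse structure, the function names, and any byte counts used with them are hypothetical, and on a real project the numbers would come from the assembler or linker listing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Static efficiency broken into the four categories named in the text */
typedef struct {
  uint32_t globals;    /* bytes of global variables */
  uint32_t stack;      /* bytes reserved for the stack */
  uint32_t constants;  /* bytes of fixed constants */
  uint32_t program;    /* bytes of machine code */
} MemoryUse;

/* Global variables plus the stack must fit into the available RAM */
bool FitsInRAM(const MemoryUse *m, uint32_t ramSize) {
  return (m->globals + m->stack) <= ramSize;
}

/* Fixed constants plus the program must fit into the available ROM */
bool FitsInROM(const MemoryUse *m, uint32_t romSize) {
  return (m->constants + m->program) <= romSize;
}
```

For example, a design with 200 bytes of globals and a 256-byte stack needs 456 bytes of RAM, so it fits a 512-byte RAM but not a 400-byte one.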
1.7.2 Qualitative Performance Measurements
Qualitative performance measurements include those parameters to which we cannot assign a direct numerical value. Often in life the most important questions are the easiest to ask, but the hardest to answer. Such is the case with software quality. Therefore, we ask the following qualitative questions. Can we prove our software works? Is our software easy to understand? Is our software easy to change? Since there is no single approach to writing the best software, we can only hope to present some techniques that you may wish to integrate into your own software style. In fact, this book devotes considerable effort to the important issue of developing quality software. In particular, we will study self-documented code, abstraction, modularity, and layered software. These issues indeed have a profound effect on the bottom-line financial success of our projects. Although this effect is quite real, there is often not an immediate and direct relationship between the quality of software and profit, so we may be mistakenly tempted to dismiss the importance of quality.
To get a benchmark on how good a programmer you are, take the following two challenges. In the first challenge, find a major piece of software that you wrote more than 12 months ago, and then see if you can still understand it well enough to make minor changes in its behavior. The second challenge is to exchange with a peer a major piece of software that each of you has recently written (but not written together), then in the same manner, see if you can make minor changes to each other’s software.
Observation: You can tell if you are a good programmer if 1) you can understand your own code 12 months later, and 2) others can make changes to your code.
1.7.3 Attitude
Good engineers employ well-defined design processes when developing complex systems. When we work within a structured framework, it is easier to prove our system works (verification) and to modify our system in the future (maintenance). As our software systems become more complex, it becomes increasingly important to employ well-defined software design processes. Throughout this book, a very detailed set of software development rules will be presented. This book focuses on real-time embedded systems written in assembly language, but most of the comments should apply to other situations as well. At first, it may seem radical to impose such a rigid structure on software. We might wonder if creativity will be sacrificed in the process. True creativity is more about good solutions to important problems and not about being sloppy and inconsistent. Because software maintenance is a critical task, the time spent organizing, documenting, and testing during the initial development stages will reap huge dividends throughout the life of the software project.
Observation: The easiest way to debug is to write software without any bugs.
We define clients as programmers who will use our software. A client develops software that will call our functions. We define coworkers as programmers who will debug and upgrade our software. A coworker, possibly ourselves, develops, tests, and modifies our software. Writing quality software has a lot to do with attitude. We should be embarrassed to ask our coworkers to make changes to our poorly written software. Since so much software development effort involves maintenance, we should create software modules that are easy to change. In other words, we should expect each piece of our code will be read by another engineer in the future, whose job it will be to make changes to our code. We might be tempted to quit a software project once the system is running, but this short time we might save by not organizing, documenting, and testing will be lost many times over in the future when it is time to update the code. As project managers, we must reward good behavior and punish bad behavior. A company, in an effort to improve the quality of their software products, implemented the following policies. The employees in the customer relations department receive a bonus for every software bug that they can identify. These bugs are reported to the software developers, who in turn receive a bonus for every bug they fix. Checkpoint 1.8: Why did the above policy fail horribly?
We should demand of ourselves that we deliver bug-free software to our clients. Again, we should be embarrassed when our clients report bugs in our code. We should be mortified when other programmers find bugs in our code. There are a few steps we can take to facilitate this important aspect of software design.
1. Test it now. When we find a bug, fix it immediately. The longer we put off fixing a mistake the more complicated the system becomes, making it harder to find. Remember that bugs do not go away on their own, but we can make the system so complex that the bugs will manifest themselves in a mysterious and obscure fashion. For the same reason, we should completely test each module individually, before combining them into a larger system. We should not add new features before we are convinced the existing system is bug-free. In this way, we start with a working system, add features, then debug this system until it is working again. This incremental approach makes it easier to track progress. It allows us to undo bad decisions, because we can always revert back to a previously working system. Adding new features before the old ones are debugged is very risky. With this sloppy approach, we could easily reach the project deadline with 100% of the features implemented, but have a system that doesn’t run. In addition, once a bug is introduced, the longer we wait to remove it, the harder it will be to correct. This is particularly true when the bugs interact with each other. Conversely, with the incremental approach, when the project schedule slips, we can deliver a working system at the deadline that supports some of the features.
Maintenance Tip: Go from working system to working system.
2. Plan for testing. How to test each module should be considered at the start of a project. In particular, testing should be included as part of the design of both hardware and software components. Our testing and the client’s usage go hand in hand. In particular, how we test the module will help the client understand the context and limitations of how our component is to be used. On the other hand, a clear understanding of how the client wishes to use our hardware/software component is critical for both its design and its testing. Maintenance Tip: It is better to have some parts of the system that run with 100% reliability than to have the entire system with bugs.
3. Get help. Use whatever features are available for organization and debugging. Pay attention to warnings, because they often point to misunderstandings about data or functions. Misunderstanding of assumptions can cause bugs when the software is upgraded, or reused in a different context than originally conceived. Remember that computer time is a lot cheaper than programmer time. Maintenance Tip: It is better to have a software system that runs slow than one that does not run at all.
4. Deal with the complexity. In the early days of microcomputer systems, software size could be measured in 100s of lines of source code using 1000s of bytes of memory. These early systems, due to their small size, were inherently simple. The explosion of hardware technology (both in speed and size) has led to a similar increase in the size of software systems. Some people forecast that by the next decade, automobiles will have 10 million lines of code in their embedded systems. The only hope for success in a large software system will be to break it into simple modules. In most cases, the complexity of the problem itself cannot be avoided. For example, there is just no simple way to get to the moon. Nevertheless, a complex system can be created out of simple components. A real creative effort is required to orchestrate simple building blocks into larger modules, which themselves are grouped to create even larger systems. Use your creativity to break a complex problem into simple components, rather than developing complex solutions to simple problems. Observation: There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. C.A.R. Hoare, “The Emperor’s Old Clothes,” CACM Feb. 1981.
1.8 Debugging Theory
The last section of every chapter will address debugging techniques. Every programmer is faced with the need to debug and verify the correctness of his or her software. A debugging instrument is hardware or software used for the purpose of debugging. In this book, we will study hardware-level probes like the logic analyzer and in-circuit emulator (ICE); software-level tools like simulators, monitors, and profilers; and manual tools like inspection and print statements. Nonintrusiveness is the characteristic or quality of a debugger that allows the software/hardware system to operate normally as if the debugger did not exist. Intrusiveness is used as a measure of the degree of perturbation caused in program performance by the debugging instrument itself. For example, a printf statement added to your source code is very intrusive because it significantly affects the real-time interaction of the hardware and software. A debugging instrument is classified as minimally intrusive if it has a negligible effect on the system being debugged. In a real microcomputer system, breakpoints and single-stepping are also intrusive, because the real hardware continues to change while the software has stopped. When a program interacts with real-time events, the performance can be significantly altered when using intrusive debugging tools. On the other hand, dumps, dumps with filter, and monitors (e.g., output strategic information on LEDs) are much less intrusive. A logic analyzer that passively monitors the activity of the software is completely nonintrusive. An in-circuit emulator is also nonintrusive because the software input/output relationships will be the same with and without the debugging tool. Similarly, breakpoints and single-stepping on a simulator like TExaS are nonintrusive, because the simulated hardware and the software are affected together.
Checkpoint 1.9: What does it mean for a debugging instrument to be minimally intrusive? Give both a general answer and a specific criterion.
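As an illustration of why a dump is less intrusive than a print statement, the following sketch saves values into a memory buffer for later inspection. Each call executes only a few instructions, whereas a printf must format and transmit characters. The buffer size and the names Debug_Dump and Debug_DumpFiltered are hypothetical, and the filter condition (keep only values above a threshold) is just one possible choice.

```c
#include <stdint.h>

#define DUMP_SIZE 8u

static uint16_t DumpBuf[DUMP_SIZE];  /* recorded values, oldest first */
static uint32_t DumpCnt = 0;         /* number of values recorded so far */

/* Dump: save one value; ignore further values once the buffer is full */
void Debug_Dump(uint16_t value) {
  if (DumpCnt < DUMP_SIZE) {
    DumpBuf[DumpCnt] = value;
    DumpCnt++;
  }
}

/* Dump with filter: record only the values that the filter selects */
void Debug_DumpFiltered(uint16_t value, uint16_t threshold) {
  if (value > threshold) {
    Debug_Dump(value);
  }
}
```

After the system has run, the contents of DumpBuf can be examined with a debugger or printed when timing no longer matters. The filter keeps the buffer from filling with uninteresting data.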
Research in the area of program monitoring and debugging mirrors the rapid pace of developments in other areas of computer architecture and software systems. Because of the complexity explosion in computer systems, effective debugging tools are essential. Some experts predict the software footprint programmed into the embedded systems of one automobile will soon reach 10 million lines of code. The critical aspect of debugging an embedded system is the ability to see what the software is doing, where it is executing, and when it did it, without the debugger itself modifying system behavior. Terms such as program testing, diagnostics, performance debugging, functional debugging, tracing, profiling, instrumentation, visualization, optimization, verification, performance measurement, and execution measurement have specialized meanings, but they are also used interchangeably, and they often describe overlapping functions. For example, the terms profiling, tracing, performance measurement, or execution measurement may be used to describe the process of examining a program from a time viewpoint. But tracing is also a term that may be used to describe the process of monitoring a program state or history for functional errors, or to describe the process of stepping through a program with a debugger. Usage of these terms among researchers and users varies. Furthermore, the meaning and scope of the term debugging itself are not clear. In this book the goal of debugging is to maintain and improve software, and the role of a debugger is to support this endeavor. The debugging process is defined as testing, stabilizing, localizing, and correcting errors. Although testing, stabilizing, and localizing errors are important and essential to debugging, they are auxiliary processes: the primary goal of debugging is to remedy faults or to correct errors in a program.
Stabilization is the process of fixing the inputs so that the system can be run over and over again yielding repeatable outputs. Although a wide variety of program monitoring and debugging tools are available today, in practice it is found that an overwhelming majority of users either still prefer or rely mainly upon “rough and ready” manual methods for locating and correcting program errors. These methods include desk-checking, dumps, and print statements, with print statements being one of the most popular manual methods. Manual methods are useful because they are readily available, and they are relatively simple to use. But the usefulness of manual methods is limited: they tend to be highly intrusive, and they do not provide adequate control over repeatability, event selection, or event isolation.
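One possible way to stabilize a system is to replace a live input with a fixed table of test values, so every run sees the same sequence. The sketch below assumes a hypothetical Input function, a STABILIZE compile-time switch, and a variable that stands in for the real ADC; the test values themselves are arbitrary.

```c
#include <stdint.h>

#define STABILIZE 1   /* 1: use the fixed test inputs, 0: use the real input */
#define NUM_TESTS 4u

/* Stand-in for the real input hardware (assumption) */
volatile uint16_t ADC_Hardware;

/* Fixed test inputs, replayed in the same order on every run */
static const uint16_t TestInputs[NUM_TESTS] = {0, 256, 512, 1023};
static uint32_t TestIndex = 0;

/* The rest of the software calls Input() and cannot tell the difference */
uint16_t Input(void) {
#if STABILIZE
  uint16_t value = TestInputs[TestIndex];
  TestIndex = (TestIndex + 1u) % NUM_TESTS;  /* wrap around to the start */
  return value;
#else
  return ADC_Hardware;
#endif
}
```

With the inputs fixed, any change in the outputs from one run to the next must come from a change in the software, which makes errors easier to localize.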
A real-time system, where software execution timing is critical, usually cannot be debugged with simple print statements, because the print statement itself will require too much time to execute. A debugging instrument is defined as hardware or software that is added to the system for the purpose of debugging. A print statement is a common example of an instrument. Using the editor, one adds print statements to the code that either verify proper operation or illustrate the programming errors. If we test a system, then remove the instruments, the system may actually stop working, because of the importance of timing in embedded systems. If we leave debugging instruments in the final product, we can use the instruments to test systems on the production line, or test systems returned for repair. On the other hand, sometimes we wish to provide for a mechanism to reliably and efficiently remove all instruments when the debugging is done. Consider the following mechanisms as you develop your own unique debugging style.
• Place all instruments in a unique column, so you can easily distinguish instruments from regular programs.
• Define all debugging instruments as functions that all have a specific pattern in their names. In this way, the find/replace mechanism of the editor can be used to find all the calls to the instruments.
• Define the instruments so that they test a run time global flag. When this flag is turned off, the instruments perform no function. Notice that this method leaves a permanent copy of the debugging code in the final system, causing it to suffer a runtime overhead, but the debugging code can be activated dynamically without recompiling. Many commercial software applications utilize this method because it simplifies “on-site” customer support.
• Use conditional compilation (or conditional assembly) to turn on and off the instruments when the software is compiled. When the assembler or compiler supports this feature, it can provide both performance and effectiveness.
The emergence of concurrent languages and the increasing use of embedded real-time systems place further demands on debuggers. The complexities introduced by the interaction of multiple events or time dependent processes are much more difficult to debug than errors associated with sequential programs. The behavior of non-real-time sequential programs is reproducible: for a given set of inputs their outputs remain the same. In the case of concurrent or real-time programs this does not hold true. Control over repeatability, event selection, and event isolation is even more important for concurrent or real-time environments.
Checkpoint 1.10: Consider the difference between a runtime flag that activates a debugging command versus an assembly/compile-time flag. In both cases it is easy to activate/deactivate the debugging statements. For each method, list one factor for which that method is superior to the other.
Checkpoint 1.11: What is the advantage of leaving debugging instruments in a final delivered product?
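The last two mechanisms in the list above (a run time global flag and conditional compilation) might look as follows in C. This is a sketch, not the book's code: the names DEBUG, DebugFlag, Debug_RunTime, DEBUG_COMPILE, and Task are hypothetical, and the instruments merely increment counters so that their activity can be observed.

```c
#include <stdint.h>

#define DEBUG 1                /* compile-time switch; 0 removes the instrument */

static uint8_t DebugFlag = 0;  /* run-time switch; can change while running */
static uint32_t RunCount = 0;      /* activity of the run-time instrument */
static uint32_t CompileCount = 0;  /* activity of the compile-time instrument */

/* Instrument that tests a run-time global flag: the code is always present */
void Debug_RunTime(void) {
  if (DebugFlag) {
    RunCount++;
  }
}

/* Instrument controlled by conditional compilation:
   when DEBUG is 0 the macro expands to nothing useful and costs no time */
#if DEBUG
#define DEBUG_COMPILE() (CompileCount++)
#else
#define DEBUG_COMPILE() ((void)0)
#endif

/* Regular program with both kinds of instrument inserted */
void Task(void) {
  Debug_RunTime();
  DEBUG_COMPILE();
}
```

The run-time instrument always occupies memory and always executes the flag test, but it can be switched on in a delivered system. The compile-time instrument disappears completely when DEBUG is 0, but activating it requires rebuilding the software.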
1.9 Tutorial 1. Getting Started
Tutorials in this book represent short activities for you to do on your own. Each tutorial allows you to have a hands-on experience to support the basic concepts. An action defines a specific task that you should perform. The answers to the questions can be found at the end of the book. The objective of this first tutorial is to provide an overview of embedded system development in general and of the TExaS simulator in particular. When you are ready to use the TExaS simulator to develop your own programs, first perform this tutorial, then install TExaS and read the Getting Started section found in the TExaS help menu.
Action: Watch the first getting started movie, called Lesson 1.
First-time users of TExaS should watch the Lesson 1 animation located on the web at http://users.ece.utexas.edu/~valvano/Readme.htm. This lesson introduces the major components of the application. It takes about 11 minutes and provides a narrated overview of the TExaS application. You need not install TExaS; just download and run the Windows media file.
Question 1.1 The branch instruction causes what two instructions to execute ten times?
Question 1.2 Into what type of memory is count defined?
Question 1.3 Into what type of memory is the main program defined?
Question 1.4 What special features of the TExaS editor help in writing assembly programs?
Question 1.5 What microcomputer is being simulated?
Question 1.6 What does the red cursor in the listing file signify?
Question 1.7 Does the TExaS application simulate both hardware and software or just software?
1.10
Homework Assignments
Homework 1.1 In order to reduce power, some microcomputers run on 3.3 V instead of 5 V. Redraw Figure 1.1 using 3.3 V power, and define what logic high and logic low would be for this system. Assume the resistance path from the 5 V supply to ground for 5 V logic is approximately equal to the resistance from the 3.3 V supply to ground for 3.3 V logic. What is the percentage reduction in power achieved by switching from 5 V to 3.3 V?
Homework 1.2 There is a microcomputer embedded in a vending machine. List three operations the software must perform.
Homework 1.3 What is a port?
Homework 1.4 What does nonvolatile mean?
Homework 1.5 What do the acronyms RAM, ROM, I/O, DAC, and ADC mean?
Homework 1.6 What is the difference between a microcomputer and a microcontroller?
Homework 1.7 Using a flowchart, describe the control algorithm that a thermostat must use to maintain constant temperature. Assume the inputs are current temperature in °F, the desired temperature in °F, and an AC/off/heat three-way switch. The outputs are AC (on/off) and heat (on/off). Write a brief software requirements document for this system.
Homework 1.8 Using a flowchart, describe the cruise control algorithm that a car must use to maintain constant speed. Assume the inputs are current speed in mph, brake (on/off), and a cruise on/off momentary button. The output is accelerator position (0 to 100%). The desired speed is the current speed at the time the cruise control is activated. Touching the brake turns off the system. Write a brief software requirements document for this system.
Homework 1.9 Draw a flowchart of the following C program. Assume PORTB is an output. This is an incremental controller that maintains the motor at a constant speed of 100.
    void main(void){
      unsigned char power,speed;
      power = 0;
      ADC_Init();              /* turn on ADC power */
      while(1){
        PORTB = power;         /* output to actuator */
        speed = ADC_Read();
        if(speed < 100){       /* too slow */
          if(power < 255) power++;
        }
        else{                  /* too fast */
          if(power > 1) power--;
        }
      }
    }
Homework 1.10 Write C code for the flowchart shown in Figure Hw1.10. PORTB is an output connected to a stepper motor. PORTA is an input connected to a toggle switch.
Figure Hw1.10 Flowchart showing a stepper motor controller, used for Homework 1.10.
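[Flowchart in two parts. main: read PORTA and test bit0 (branches labeled 1 and 0), with calls to step(5), step(9), step(10), and step(6). step(n): PORTB=n, then cnt = 10000, then repeat cnt = cnt-1 while cnt > 0, and return when cnt = 0.]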
Homework 1.11 Draw a data flow graph of the thermostat algorithm developed in Homework 1.7.
Homework 1.12 Draw a data flow graph of the cruise control algorithm developed in Homework 1.8.
Homework 1.13 Draw a flowchart of this C program using just the three basic building blocks of structured programming. In particular, first draw the flowchart in the regular way, then show the groupings that define each basic block.
    short data[100],sum;
    void calc(void){ short i;
      sum=0;
      for(i=0;i<100;i++)
        sum=sum+data[i];
    }
Homework 1.14 Draw a flowchart of this C program using just the three basic building blocks of structured programming. In particular, first draw the flowchart in the regular way, then show the groupings that define each basic block.
    short decide(short in){ short out;
      switch(in){
        case 0: out=1; break;
        case 1: out=2; break;
        default: out=3;
      }
      return out;
    }
Homework 1.15 Draw a flowchart of this C program using just the three basic building blocks of structured programming. In particular, first draw the flowchart in the regular way, then show the groupings that define each basic block. Hint: Look at Homework 1.14.
    short decide(short in){
      if(in==0) return 1;
      if(in==1) return 2;
      return 3;
    }
Homework 1.16 The first two flowchart elements of Figure 1.7 form a do-while loop, as redrawn in Figure Hw1.16. This do-while loop can be written in C:
    do{
      Input = ReadSwitch(); // true if switch is pressed
    } while(Input==0);
Prove that a do-while loop falls into the class of structured programming by redrawing Figure Hw1.16 using just the three basic building blocks, as shown in Figure 1.6.
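Figure Hw1.16 Flowchart showing do-while loop, used for Homework 1.16.
[Flowchart: Start, then "Input from switch", then a decision that loops back to the input step while the switch is "Not pressed" and exits when it is "Pressed".]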
Homework 1.17 Consider a telephone switching network. For simplicity assume there are only four telephones on this system. Each phone has a keypad with numbers, a call button, and a hang-up button. Each phone also has a speaker, a microphone, and a communication link to the network. Draw a data flow graph of this system showing four rectangles for the keypads, four rectangles for the speakers, and four rectangles for the microphones. Assume there is a dedicated microcontroller for each phone and a central computer to handle the network switching. Draw a possible data flow graph for the system that allows each phone to call the other three.
Homework 1.18 Consider an MP3 player, which has a memory storage device, an MP3 decoder (converts compressed data into raw form), buttons, an LCD display, a USB link to a PC, dual DACs for stereo output, and stereo headphones. Draw a possible data flow graph for the system.
Homework 1.19 Consider a system with four modules named A, B, C, and D. Draw the call graph if A calls B, C calls A and D, and B calls C. Why can’t the four individual modules in this system be tested one at a time?
Homework 1.20 List three factors that we can use to evaluate the “goodness” of a program.
2
Introduction to Assembly Language Programming
Chapter 2 objectives are to:
- Introduce assembly language programming for the 9S12
- Discuss where in memory to place programs and data
- Present simple addressing modes
The overall goal of this chapter is to introduce just enough assembly language programming so you can get started designing and running programs on the 9S12. In the next few chapters, we will study software development in detail, but for now we introduce the basics of assembly language programming. Software written for embedded systems like the 9S12 is tightly coupled to their I/O devices. Although the major focus of this book involves I/O programming, we will begin with a general presentation of how to write assembly programs for the 9S12. When we write software we think about the actions we wish to perform and the decisions we wish to make. Then, we sort these activities into sequences: what to do first, second etc. Some tasks should be performed once at the beginning, other tasks need to be performed over and over, other tasks are to be performed only when certain conditions are satisfied, and some tasks need to be executed on a periodic basis. Programming is the process of translating these sequential activities into very specific software codes that perform the actions and make the decisions.
2.1
Binary and Hexadecimal Numbers
To solve problems using a computer we need to understand binary numbers and what they mean. Each digit in a decimal number has a place and a value. The place is a power of 10 and the value is selected from the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. A decimal number is simply a combination of its digits multiplied by powers of 10. For example,
    1984 = 1•10^3 + 9•10^2 + 8•10^1 + 4•10^0
Fractional values can be represented by using the negative powers of 10. For example,
    273.15 = 2•10^2 + 7•10^1 + 3•10^0 + 1•10^-1 + 5•10^-2
In a similar manner, each digit in a binary number has a place and a value. In binary numbers, the place is a power of 2, and the value is selected from the set {0, 1}. A binary number is simply a combination of its digits multiplied by powers of 2. To eliminate confusion between decimal numbers and binary numbers, we will put a subscript 2 after the number or a % before the number to mean binary. Because of the way the 9S12 operates, most of the binary numbers in this book will have either 8 or 16 bits. An 8-bit number is called a byte, and a 16-bit number is called a word. For example, the 8-bit binary number for 106 is
    01101010_2 = %01101010 = 0•2^7 + 1•2^6 + 1•2^5 + 0•2^4 + 1•2^3 + 0•2^2 + 1•2^1 + 0•2^0 = 64 + 32 + 8 + 2
Checkpoint 2.1: What is the numerical value of the 8-bit binary number %11111111?
Binary fixed-point is a number system that allows fractional values to be represented by using the negative powers of 2. Fixed-point numbers will be presented later in Section 9.1. Binary is the natural language of computers, but a big nuisance for us humans. To simplify working with binary numbers, humans use a related number system called hexadecimal, which uses base 16. Just like decimal and binary, each hexadecimal digit has a place and a value. In this case, the place is a power of 16 and the value is selected from the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F}. As you can see, hexadecimal numbers have more possibilities for their digits than are available in the decimal format; so, we add the letters A through F, as shown in Table 2.1. A hexadecimal number is a combination of its digits multiplied by powers of 16. To eliminate confusion between various formats, we will put a $ before the number to mean hexadecimal. Hexadecimal representation is a convenient mechanism for us humans to define binary information, because it is extremely simple for humans to convert back and forth between binary and hexadecimal. The hexadecimal number system is often abbreviated as “hex”. A nibble is defined as four binary bits, or one hexadecimal digit. Each value of the 4-bit nibble is mapped into a unique hex digit, as shown in Table 2.1. For example, the hexadecimal number for the 16-bit binary %0001 0010 1010 1101 is
    $12AD = 1•16^3 + 2•16^2 + 10•16^1 + 13•16^0 = 4096 + 512 + 160 + 13 = 4781
Checkpoint 2.2: What is the numerical value of the 8-bit hexadecimal number $FF?
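The place-value rule used in the decimal, binary, and hexadecimal examples above can be sketched in C. This is only an illustration under our own assumptions: the names digit_value and positional_value are hypothetical helpers, not part of TExaS or any 9S12 tool, and the input is assumed to be a string of valid digits whose value fits in 32 bits.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Value of one digit character: '0'-'9' give 0-9, 'A'-'F' or 'a'-'f' give 10-15.
   Returns -1 for any other character. */
static int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)toupper((unsigned char)c);
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Sum of digit*base^place, accumulated from the leftmost digit to the
   rightmost. Hypothetical helper: the prefix ($ or %) is not part of the
   string, and the base must be between 2 and 16. */
uint32_t positional_value(const char *digits, unsigned base) {
    uint32_t value = 0;
    assert(base >= 2 && base <= 16);
    for (; *digits != '\0'; digits++) {
        int d = digit_value(*digits);
        assert(d >= 0 && (unsigned)d < base);   /* digit must exist in this base */
        value = value * base + (uint32_t)d;     /* shift one place, add new digit */
    }
    return value;
}
```

With these definitions, positional_value("01101010", 2) evaluates the binary example and positional_value("12AD", 16) evaluates the hexadecimal example.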
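Table 2.1 Definition of hexadecimal representation.
Hex Digit     Decimal Value   Binary Value
$0            0               0000
$1            1               0001
$2            2               0010
$3            3               0011
$4            4               0100
$5            5               0101
$6            6               0110
$7            7               0111
$8            8               1000
$9            9               1001
$A or $a      10              1010
$B or $b      11              1011
$C or $c      12              1100
$D or $d      13              1101
$E or $e      14              1110
$F or $f      15              1111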
As illustrated in Figure 2.1, to convert from binary to hexadecimal we can:
1. divide the binary number into right-justified nibbles,
2. convert each nibble into its corresponding hexadecimal digit.
Figure 2.1 Example conversion from binary to hexadecimal.
    Binary        %0011011001111101
    Nibbles       0011 0110 0111 1101
    Hexadecimal   $367D
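The two conversion steps above can be sketched in C as follows. This is a minimal illustration rather than a library routine: binary_to_hex is a hypothetical name, the input is assumed to contain only the characters '0' and '1' (without the % prefix), and the caller is assumed to supply a large enough output buffer.

```c
#include <stddef.h>
#include <string.h>

/* Convert a string of '0'/'1' characters into hexadecimal digits.
   Step 1: group the bits into right-justified nibbles, so a short leftmost
           group is treated as if it had leading zeros.
   Step 2: map each nibble to one hex digit.
   'hex' must have room for (strlen(bits)+3)/4 + 1 characters. */
void binary_to_hex(const char *bits, char *hex) {
    static const char digits[] = "0123456789ABCDEF";
    size_t n = strlen(bits);
    size_t first = n % 4;            /* bits in the leftmost group */
    size_t i = 0, out = 0;
    if (first == 0) first = 4;       /* length is a multiple of four */
    while (i < n) {
        size_t len = (i == 0) ? first : 4;
        unsigned nibble = 0;
        for (size_t k = 0; k < len; k++) {
            nibble = (nibble << 1) | (unsigned)(bits[i + k] - '0');
        }
        hex[out++] = digits[nibble];
        i += len;
    }
    hex[out] = '\0';
}
```

Calling binary_to_hex with the 16-bit string from Figure 2.1 fills the buffer with the four hex digits shown there.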
Checkpoint 2.3: Convert the binary number %01000101 to hexadecimal. Checkpoint 2.4: Convert the binary number %110010101011 to hexadecimal.
As illustrated in Figure 2.2, to convert from hexadecimal to binary we can:
1. convert each hexadecimal digit into its corresponding 4-bit binary nibble,
2. combine the nibbles into a single binary number.
Figure 2.2 Example conversion from hexadecimal to binary.
    Hexadecimal   $1E9B
    Nibbles       0001 1110 1001 1011
    Binary        %0001111010011011
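The reverse conversion can be sketched the same way. Again this is only an illustration: hex_to_binary is a hypothetical name, the $ prefix is not part of the input string, and the caller is assumed to provide an output buffer of at least four characters per hex digit plus one.

```c
#include <ctype.h>
#include <stddef.h>

/* Expand each hexadecimal digit into its 4-bit nibble and concatenate
   the nibbles. 'bits' must have room for 4*strlen(hex) + 1 characters.
   Returns 0 on success, or -1 if a character is not a hex digit. */
int hex_to_binary(const char *hex, char *bits) {
    size_t out = 0;
    for (; *hex != '\0'; hex++) {
        int c = toupper((unsigned char)*hex);
        int value;
        if (c >= '0' && c <= '9') value = c - '0';
        else if (c >= 'A' && c <= 'F') value = c - 'A' + 10;
        else return -1;
        for (int bit = 3; bit >= 0; bit--) {      /* most significant bit first */
            bits[out++] = (char)('0' + ((value >> bit) & 1));
        }
    }
    bits[out] = '\0';
    return 0;
}
```

Applied to the digits of $1E9B, the function produces the 16-bit pattern shown in Figure 2.2.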
Checkpoint 2.5: Convert the hex number $40 to binary. Checkpoint 2.6: Convert the hex number $63F to binary. Checkpoint 2.7: How many binary bits does it take to represent $123456?
Computer programming environments use a wide variety of symbolic notations to specify the numbers in hexadecimal. As an example, assume we wish to represent the binary number %01111010. Most assembly languages including Metrowerks CodeWarrior use $7A. Intel and Texas Instruments (TI) assembly languages use 7AH. The C language uses 0x7A. Patt’s LC-3 simulator uses x7A. TExaS will accept either $7A or 0x7A.
2.2
Addresses, Registers, and Accessing Memory
Figure 2.3 shows the memory model of a simplified 9S12 computer. From a memory model perspective, I/O, RAM, and ROM are accessed by the computer in a similar manner. In particular, we can read data from them and write data to them. An address specifies the location from where to read data or to where to write data. On the 9S12, addresses are a simple linear sequence beginning at $0000 and ending at $FFFF. The small white boxes in Figure 2.3 represent 8-bit storage elements, and the large white box labelled PC is 16 bits. As we saw in Chapter 1, the computer has input/output ports with which it communicates with its external world. For example, the 9S12 address $0240 points to one such I/O port called Port T (PTT), $3800 points to a location in RAM, and $F004 points to a location in EEPROM.
Figure 2.3 The memory model of a simplified 9S12 computer.
[Figure: the processor (RegA and PC) is connected by the bus to a column of addresses and data running from $0000 to $FFFF. Labeled regions are I/O (PTT at $0240), RAM ($3800), and EEPROM ($F000, $F004).]
The registers are high-speed storage devices located in the processor. Registers do not have addresses, but rather they have names or numbers explicitly defined by the assembly software we write. In this simplified 9S12, we see Register A (RegA) is eight bits wide and is located in the processor. The program counter (PC) is a 16-bit register containing the address of where the program is executing. Other than the registers, each 8-bit storage element has a unique and separate 16-bit address. Software is allowed to read from and write to I/O and RAM elements, but in general software is not allowed to write to ROM (this is why it is called “read only memory”).
Memory diagrams like Figure 2.3 will be drawn in the book with smaller addresses towards the top of the page and larger addresses towards the bottom of the page. Drawing the memory diagrams in a consistent manner is obviously a good plan, but which way to orient your drawings is completely arbitrary. Therefore, the terms “top of memory” and “bottom of memory” should be avoided, because their usage is ambiguous in the computer community.
To describe operations that read/write memory, we will use the following notation in this book:
    = [U]    specifies an 8-bit read from address U
    = {U}    specifies a 16-bit read from addresses U, U+1 (most significant byte first)
    [U] =    specifies an 8-bit write to address U
    {U} =    specifies a 16-bit write to addresses U, U+1 (most significant byte first)
For example, the notation RegA=[$3800] means perform an 8-bit memory read from location $3800 and place the result into Register A. After the read, there are two copies of the data: the original data still at memory location $3800 and a second copy now in Register A. In contrast, the notation [$3800]=RegA means perform an 8-bit memory write placing the value from Register A into memory location $3800. After the write, there are also two copies of the data: the original data still in Register A and a second copy now at memory location $3800. It would be a mistake to try to store a 16-bit value into an 8-bit register, so the command RegA={$3800} would be nonsensical. The phrase “most significant byte first” means this one 16-bit operation {$3800}=$1234 is equivalent to these two 8-bit operations [$3800]=$12 [$3801]=$34, because $12 is the most significant byte of $1234.
Observation: 16-bit data is stored in memory in two consecutive locations, with the most significant byte at the first location and the least significant byte at the second location.
Checkpoint 2.8: What does [$0240]=RegA mean literally? What is the overall action?
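The read/write notation above, including the “most significant byte first” rule, can be modeled in C with an array standing in for the 64 KiB address space. This is a sketch under our own assumptions: the array memory and the functions read8, write8, read16, and write16 are hypothetical names chosen for illustration, and no distinction is made here between I/O, RAM, and ROM.

```c
#include <stdint.h>

/* Hypothetical model of the address space: one byte per address $0000-$FFFF. */
static uint8_t memory[0x10000];

/* =[U] : 8-bit read from address U */
uint8_t read8(uint16_t u) { return memory[u]; }

/* [U]= : 8-bit write to address U */
void write8(uint16_t u, uint8_t data) { memory[u] = data; }

/* ={U} : 16-bit read, most significant byte at U, least significant at U+1 */
uint16_t read16(uint16_t u) {
    return (uint16_t)((memory[u] << 8) | memory[(uint16_t)(u + 1)]);
}

/* {U}= : 16-bit write, most significant byte stored first */
void write16(uint16_t u, uint16_t data) {
    memory[u] = (uint8_t)(data >> 8);
    memory[(uint16_t)(u + 1)] = (uint8_t)(data & 0xFF);
}
```

In this model, write16(0x3800, 0x1234) leaves $12 at $3800 and $34 at $3801, matching the two 8-bit operations described in the text.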
The 9S12 has six registers as depicted in Figure 2.4. An accumulator is a register typically used to hold and manipulate numbers. Registers A and B are accumulators that can be concatenated together to form one 16-bit accumulator, Register D, with Register A containing the most significant byte. Registers X and Y are index registers, which typically contain addresses (or pointers). PC is the program counter, and it points to the current instruction. The PC defines where your program is executing.
Figure 2.4 The 9S12 has six registers.
[Figure: CC, the 8-bit condition code register, with bits S X H I N Z V C from bit 7 down to bit 0; the two 8-bit accumulators Register A (bits 15 to 8 of D) and Register B (bits 7 to 0 of D); X and Y, the 16-bit index registers; SP, the 16-bit stack pointer; PC, the 16-bit program counter.]
There are eight condition code bits packed into the condition code register (called CC or CCR). Each bit has a separate and unique usage. Some instructions set individual bits in the CCR to signify the result of the operation. For example, the Z bit is set after an arithmetic or logical operation to signify whether or not the result is zero. Some instructions test individual bits in the CCR and perform different actions depending on the values of certain bits. For example, there is a “branch on zero” instruction (beq) that will execute a program branch if the Z bit is set.
Checkpoint 2.9: Think about how you could use the “subtract” and the “branch on zero” instructions to test if two numbers are equal.
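Two of the ideas above, the concatenation of A and B into D and the meaning of the Z bit, can be sketched in C. The structure and function names here (Accumulators, get_d, set_d, update_z) are hypothetical and exist only for this illustration; the position of Z as bit 2 follows from the bit order S X H I N Z V C.

```c
#include <stdint.h>

/* Hypothetical model of the two 8-bit accumulators. */
typedef struct {
    uint8_t a;   /* Register A, most significant byte of D */
    uint8_t b;   /* Register B, least significant byte of D */
} Accumulators;

/* Read Register D as the concatenation A:B. */
uint16_t get_d(const Accumulators *r) {
    return (uint16_t)((r->a << 8) | r->b);
}

/* Write Register D, which changes both A and B. */
void set_d(Accumulators *r, uint16_t d) {
    r->a = (uint8_t)(d >> 8);
    r->b = (uint8_t)(d & 0xFF);
}

#define CCR_Z 0x04   /* Z is bit 2 in the order S X H I N Z V C */

/* Return the CCR with Z set if the result is zero and cleared otherwise.
   The other seven bits are left unchanged in this simplified sketch. */
uint8_t update_z(uint8_t ccr, uint8_t result) {
    return (result == 0) ? (uint8_t)(ccr | CCR_Z) : (uint8_t)(ccr & ~CCR_Z);
}
```

For instance, with A holding $12 and B holding $34, get_d returns $1234.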
When developing programs on a real 9S12 that uses the Freescale Serial Monitor, we will place a cli instruction near the beginning of our programs so the debugger will be active. The cli instruction clears the I bit in the CCR. Details of the I bit and the use of interrupts will be presented in Chapter 9.
The stack is temporary storage implemented in RAM. If a piece of information is important, we can push it on the stack. Later, when we wish to retrieve the data, we pull it off the stack. On the 9S12, SP is the stack pointer, which points to the top element of the stack. When programming on the 9S12, ‘S’ and ‘SP’ both refer to the stack pointer. To push data on the stack, we first decrement the SP, then store the data at the location specified by SP. The boxes in Figure 2.5 represent 8-bit storage elements in RAM. The grey boxes in the figure refer to actual data stored on the stack, and the white boxes refer to locations in memory that do not contain stack data. This figure illustrates how the stack is used to push the numbers 1, 2, 3 in that order. To pop data from the stack, we first read the data from the location specified by SP, then we increment the SP.
Figure 2.5 Stack picture as three numbers are pushed.
[Figure: four snapshots of the stack in RAM. Before any push the stack is empty; after push 1 it holds 1; after push 2 it holds 2, 1; after push 3 it holds 3, 2, 1. In each snapshot SP points to the top element.]
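The push and pull rules above can be sketched in C. This is an illustration, not 9S12 code: the array ram, the variable sp, and the functions push8 and pull8 are hypothetical names, and the initial stack pointer value of $4000 is an assumption made for this example (one past the last RAM address listed for the 9S12C32).

```c
#include <stdint.h>

static uint8_t ram[0x10000];    /* hypothetical memory image */
static uint16_t sp = 0x4000;    /* assumed initial SP, chosen for this example */

/* Push: first decrement SP, then store the data where SP points. */
void push8(uint8_t data) {
    sp--;
    ram[sp] = data;
}

/* Pull: first read the data where SP points, then increment SP. */
uint8_t pull8(void) {
    uint8_t data = ram[sp];
    sp++;
    return data;
}
```

Pushing 1, 2, 3 in that order and then pulling three times returns 3, 2, 1, and SP ends where it started.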
Tables 2.2, 2.3, and 2.4 display simplified memory maps for the 9S12C32, 9S12DP512, and 9S12E128 respectively. For a detailed memory map of the specific 9S12 you are using, please refer to its data sheet.
Table 2.2 The 9S12C32 has I/O ports, RAM, and EEPROM.
Columns: Address, Size, Device, Device, Contents
Address entries: $0000 to $03FF; $3800 to $3FFF; $4000 to $7FFF; $C000 to $FFFF
Contents entries: Access external devices; Fixed constants; Variables and stack; Programs and fixed constants
Table 2.4 The 9S12E128 has I/O ports, RAM, and EEPROM.
Address          Size    Device      Device name                  Contents
$0000 to $03FF   1024    I/O ports   Input/output devices         Access external devices
$2000 to $3FFF   8192    RAM         Random access memory         Variables and stack
$4000 to $FFFF   49152   EEPROM      Electrically erasable PROM   Programs and fixed constants
Observation: The 9S12DP512 actually has 524,288 bytes of EEPROM, but in this book we will only use the 49152 bytes that are easily accessible. The paging hardware required to access all EEPROM bytes will not be considered in detail in this book.
2.3
Assembly Syntax
2.3.1 Assembly Language Instructions
Assembly language instructions, assembly code, and assembly source code are equivalent terms for software we write. The Load Accumulator A (ldaa) command will read an 8-bit value from memory and place it into RegA. We use ldaa to illustrate the four fields existing in assembly language code. The label field is optional and starts in the first column and is used as the target location when performing a branch. You must choose a unique name for each label. The opcode field specifies the command (either an op code for the 9S12 to execute or a pseudo op code for the assembler to use). A list of the 9S12 op codes can be found on the inside cover of the book. The operand field specifies the data itself or where to find the data to execute the instruction. We will see opcodes have 0, 1, 2, or 3 operands, separated by commas. The comment field is optional and is ignored by the computer, but allows you to document your software, making it easier to understand. In this book, we will begin every comment with a semicolon, and place no spaces in the operand field. In this way, the syntax will be valid for both Metrowerks CodeWarrior (which allows spaces in the operand field, but requires the semicolon for comments) and TExaS (which does not allow spaces in the operand field, but does not require the semicolon for comments).
    label   opcode  operand  comment
    here    ldaa    $3800    ;RegA=[$3800]
Good programmers add comments to explain how the code works, why the code is being executed, how the code was tested, or in what ways the code could be changed. But for now, we are learning what each instruction is doing, so in this chapter comments will simply describe what the instruction does.
Machine instructions, machine code, and object code are three equivalent terms defining the information loaded into EEPROM that can be executed by the processor. The assembly language instructions need to be translated into machine instructions before they can be run. For example, the ldaa $3800 instruction will be translated into three bytes of machine code: $B6,$38,$00. A program will have many assembly language instructions. The assembler converts the entire assembly program into machine code, which is then stored into EEPROM. Running software is essentially the activities of the processor as it executes machine code. A simplified explanation of how processors execute machine code will be presented later in this chapter. But for now, we need to understand that if we want to read the data from memory location $3800, placing it in Register A, we write the assembly code ldaa $3800.
Observation: TExaS will accept hexadecimal numbers in $3800 or 0x3800 format.
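The translation of ldaa $3800 into the three bytes $B6,$38,$00 can be sketched in C. This is only an illustration of that one encoding, not an assembler: the function assemble_ldaa_ext and the macro OPCODE_LDAA_EXT are hypothetical names, and the op code value $B6 is the one given in the text for an ldaa with a 16-bit address.

```c
#include <stdint.h>

#define OPCODE_LDAA_EXT 0xB6   /* op code from the text for ldaa with a 16-bit address */

/* Write the machine code for "ldaa address" into code[0..2]:
   the op code, then the address with its most significant byte first.
   Returns the number of bytes produced. */
int assemble_ldaa_ext(uint16_t address, uint8_t code[3]) {
    code[0] = OPCODE_LDAA_EXT;
    code[1] = (uint8_t)(address >> 8);      /* high byte of the address */
    code[2] = (uint8_t)(address & 0xFF);    /* low byte of the address */
    return 3;
}
```

Passing the address $3800 fills the buffer with $B6, $38, $00, the same three bytes listed above.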
2.3.2 Pseudo Operation Codes
Pseudo-op, pseudo operation code, and assembly directive are equivalent terms for an operation that is not executed by the computer at run time, but rather is interpreted by the assembler during the assembly process. To get started, we will need four pseudo ops. The top-to-bottom order of the assembly lines defines the $0000 to $FFFF order in which they are stored in memory. However, we can use the origin pseudo op to define the memory location into which the subsequent assembly lines will be placed. Typically, there will be three org pseudo ops in our programs: one to place variables in RAM, one to place programs in ROM, and a third to define the reset vector at $FFFE. For example, this line is placed before the executable part of our program so the machine codes will be loaded into EEPROM memory starting at location $4000.
          org   $4000
We will use the equate pseudo op to define a symbol. We use equ to define an assembly constant, which will make our software easier to read. The label field specifies the symbol and the operand field defines its value. For example, these lines define the symbols PTT and DDRT. In particular, whenever the symbol PTT is used in our program, it will be replaced during the assembly process by the value $0240.
    PTT   equ   $0240
    DDRT  equ   $0242
We will use the reserve multiple bytes pseudo op to define uninitialized global variables. The label field can be used to define a name for the variable, and the operand field specifies the size in bytes. On the 9S12DP512, RAM begins at address $0800. A 16-bit variable will require two bytes in memory. The following example defines a 16-bit variable called Ptr1 and an 8-bit variable called Data1.
          org   $0800
    Ptr1  rmb   2
    Data1 rmb   1
In the above example, the variable Ptr1 can be found at addresses $0800 and $0801. However, when we access the variable we will perform 16-bit read/write accesses to address $0800, which will automatically access both $0800 and $0801. Checkpoint 2.10: Where in memory will the variable Data1 be located?
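One way to picture how org and rmb assign addresses is with a location counter that the assembler advances as it reads each line. The C sketch below is our own simplified model, not the algorithm used by TExaS or CodeWarrior; the types and functions (Assembler, do_org, do_rmb, lookup) are hypothetical, and the variables Count, Buffer, and Total used in the usage note are invented for this example.

```c
#include <stdint.h>
#include <string.h>

#define MAX_SYMBOLS 16

typedef struct {
    const char *name;      /* label from the label field */
    uint16_t address;      /* address assigned to that label */
} Symbol;

typedef struct {
    uint16_t location;     /* address the next assembly line will occupy */
    Symbol symbols[MAX_SYMBOLS];
    int count;             /* number of symbols defined so far */
} Assembler;

/* org: move the location counter to a new address. */
void do_org(Assembler *as, uint16_t address) {
    as->location = address;
}

/* rmb: the label takes the current location, then 'size' bytes are reserved.
   Labels beyond MAX_SYMBOLS are silently ignored in this sketch. */
void do_rmb(Assembler *as, const char *label, uint16_t size) {
    if (as->count < MAX_SYMBOLS) {
        as->symbols[as->count].name = label;
        as->symbols[as->count].address = as->location;
        as->count++;
    }
    as->location = (uint16_t)(as->location + size);
}

/* Look up a label; returns 1 and stores its address if found, else 0. */
int lookup(const Assembler *as, const char *name, uint16_t *address) {
    for (int i = 0; i < as->count; i++) {
        if (strcmp(as->symbols[i].name, name) == 0) {
            *address = as->symbols[i].address;
            return 1;
        }
    }
    return 0;
}
```

For example, after do_org(&as, 0x3800), do_rmb(&as, "Count", 1), do_rmb(&as, "Buffer", 4), and do_rmb(&as, "Total", 2), the model places Count at $3800, Buffer at $3801, and Total at $3805.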
We will use the form double byte pseudo op to define a 16-bit constant, which will be stored into memory. Most pseudo ops do not create object code that will be loaded into memory. In contrast, the fdb pseudo op defines a 16-bit value, which will exist in the memory of the microcontroller. For example, these two pseudo ops will store the 16-bit address of main into memory at ROM locations $FFFE and $FFFF. In particular, when the 9S12 starts (after power up or after a reset), it will load the program counter (PC) with the 16-bit contents at locations $FFFE and $FFFF. In other words, these two lines specify where to begin executing our software.
          org   $FFFE
          fdb   main
2.4
Simplified 9S12 Machine Language Execution
In this section, we present a simplified cycle-by-cycle analysis for the 9S12. The purpose of considering a simplified version is to understand in general how a computer executes instructions without being burdened with the extreme complexities that exist in today’s high-speed processors. The TExaS simulator can be enabled to display these simplified bus cycles. TExaS is a co-simulator, meaning it simulates both the hardware devices and software action at the same time. The assembler will give the real number of cycles and the real cycle
sequence in the assembly listing. However, when showing the cycle-by-cycle execution, a simplified bus cycle is displayed instead of the actual 9S12 bus cycles. The major differences between the real 9S12 and the simplified TExaS display are shown in Table 2.5. Assuming the computer is running in single chip mode, TExaS properly simulates the action of each instruction and the amount of time each instruction takes to execute. Table 2.5 Differences between a real 9S12 and the TExaS bus cycle simulation.
Actual 9S12 Cycle-By-Cycle                         Simplified Cycle-By-Cycle
Sometimes 8, sometimes 16-bit data                 Sequence of 8-bit accesses
Special case for 16-bit accesses to odd address    Even and odd addresses treated the same
Instruction queue enhances speed                   Simple fetch-execute sequence
Fetches op codes for later execution               Fetches op codes for immediate execution
Fetches op codes that are never executed           Fetched op codes are always executed
9S12 allows for up to 19-bit address               Always 16-bit address
Variable length off-chip accesses                  All accesses exactly same cycle period
A simple processor like the 9S12 has four major components, as illustrated in Figure 2.6. The control unit (CU) orchestrates the sequence of operations in the processor. The CU issues commands to the other three components. The instruction register (IR) contains the op code for the current instruction. Most 9S12 op codes are 8 bits wide, but some are 16 bits. The TExaS simulator allows you to observe the IR during execution. The arithmetic logic unit (ALU) performs arithmetic operations such as addition, subtraction, multiplication, and division. The ALU also performs logical operations such as and, or, and shift. The program counter (PC) points to the memory containing the instruction to execute next. The bus interface unit (BIU) reads data from the bus during a read cycle, and writes data onto the bus during a write cycle. The effective address register (EAR) contains the data address for the current instruction. The TExaS simulator allows you to observe the EAR during execution.
Figure 2.6 Block diagram of a simplified 9S12 computer.
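[Figure: the 9S12 processor contains the registers (A, PC), the control unit with the IR, the ALU, and the bus interface unit with the EAR. The bus interface unit connects to memory and the I/O ports over a bus made of a 16-bit address, 8-bit data, and the R/W signal.]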
This simplified bus contains 16 address lines, 8 data lines, and an R/W signal. There are two types of bus cycles that the processor uses to communicate with memory. For both types of cycles, the processor drives the address bus and the R/W signal (see Figure 2.6). The 16-bit address bus selects which memory location (or I/O device) to access. The R/W signal specifies read or write. During a read cycle (R/W = 1), the memory at the specified address puts the information on the data bus, and the processor transfers the information (8 bits in this simplified simulation) into the appropriate place within the processor. The processor has 4 types of read cycles:
Instruction fetch. The address is the PC and the 8-bit data is loaded into the instruction register, IR.
Operand fetch. The address is also the PC, but the 8-bit data is used to calculate the effective address.
Data fetch. The address is the EAR, and the 8-bit data is loaded into a register or sent to the ALU.
Stack pull. First, the 8-bit data is read from memory pointed to by SP and stored in a register, then the stack pointer is incremented, SP = SP+1.
During a write cycle (R/W = 0), the processor puts the information on the data bus, and the memory transfers the information into the specified location. The write cycles can be grouped into two types:
Data write. The 8-bit data from a register or ALU is stored in memory at the address specified by the EAR.
Stack push. First, the stack pointer is decremented, SP = SP-1, then the 8-bit data from a register is stored in memory at the address specified by the SP.
In general, the execution of an instruction goes through many phases. First, the computer fetches the machine code for the instruction by reading the value in memory pointed to by the program counter (PC). After each byte of the instruction is fetched, the PC is incremented. During phases 2 and 3, the instruction is decoded, and the effective address is determined (EAR). Many instructions require additional data, and during phase 4 the data is retrieved from memory at the effective address. During phase 5, the actual function for this instruction is performed. Sometimes the computer bus is idle at this time, because no additional data is required. For example, a 16-bit integer divide on the 9S12 takes 12 bus cycles to complete: two cycles are phase 1, and the other ten are phase 5 free cycles. During the last phase, the results are written back to memory. All instructions have a phase 1, but the other phases may or may not occur for any specific instruction. Phase 1 requires 1 to 6 bus cycles to fetch the entire machine code for the instruction. The op code is placed in the IR, and the operand either contains data itself or is used to determine the memory address of the data. The subsequent phases may require 0 or more bus cycles to complete. Each bus cycle reads or writes one piece of data. On the real 9S12, read and write cycles can transfer 8-bit or 16-bit data, but in this simplified analysis all cycles are 8-bit.
The simplified execution has six phases, but in this discussion we will focus only on those phases that generate bus cycles:
Phase  Function            R/W    Address        Comment
1      Op code fetch       read   PC++           Put op code into IR
1      Operand fetch       read   PC++           Immediate or calculate EA
2      Decode instruction  none                  Figure out what to do
3      Evaluate address    none                  Determine EAR
4      Data read           read   SP,EAR         Data passes through ALU
5      Free cycle          read   PC/SP/$FFFF    ALU operations, set CCR
6      Data store          write  SP,EAR         Results stored in memory
Phase 1. Opcode and operand fetch. The execution of a 9S12 instruction begins with fetching the op code and putting it in the IR. Additional phase 1 cycles will occur until the entire machine code (op code and operand) is fetched. The PC is incremented after fetching the op code and incremented again after fetching the operand.
Phase 2. Decode instruction. The op code will tell the control unit exactly what steps need to be performed to execute the instruction. This phase happens so quickly that bus cycles are not needed.
Phase 3. Evaluate address. During this phase the processor will set the EAR pointing to the address where to access data in memory. Usually, this phase does not require any bus cycles.
Phase 4. Data read. If the instruction requires data from memory, it will use the EAR to read data from memory as needed. It takes a bus cycle to read data from memory, but since registers are inside the processor, no bus cycles occur as data is saved into a register. Remember registers do not have addresses, and see in Figure 2.6 that registers are not attached to the bus.
Phase 5. Free cycles. Any ALU functions occur next. On the real 9S12 the ALU requires time to execute, but the simplified cycle-by-cycle simulation does not generate an output display for these do-nothing cycles. The simulator accurately models this execution time; it just doesn’t generate output.
Phase 6. Data write. If required, the last step involves writing data to memory. The address of these writes is determined by the EAR.
Examples of these simplified cycles are shown in the next section.
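To make the phases concrete, the following C sketch models the bus cycles for two instructions whose op codes appear in this chapter: clra ($87) and ldaa with a 16-bit address ($B6). It is a toy model under our own assumptions, not the TExaS simulator: the names Cpu, bus_read, and step are hypothetical, only these two op codes are handled, and the printed trace format is invented for this example.

```c
#include <stdint.h>
#include <stdio.h>

static uint8_t mem[0x10000];   /* hypothetical 64 KiB memory image */

typedef struct {
    uint16_t pc;    /* program counter */
    uint16_t ear;   /* effective address register */
    uint8_t ir;     /* instruction register */
    uint8_t a;      /* Register A */
    int cycles;     /* number of bus cycles generated so far */
} Cpu;

/* One 8-bit read cycle: the address goes out, the data comes back. */
static uint8_t bus_read(Cpu *cpu, uint16_t address) {
    uint8_t data = mem[address];
    cpu->cycles++;
    printf("R 0x%04X 0x%02X\n", (unsigned)address, (unsigned)data);
    return data;
}

/* Execute one instruction. Returns 0 on success, or -1 if the op code
   is not one of the two handled by this sketch. */
int step(Cpu *cpu) {
    cpu->ir = bus_read(cpu, cpu->pc++);             /* phase 1: op code fetch */
    switch (cpu->ir) {                              /* phase 2: decode */
    case 0x87:                                      /* clra: no more bus cycles */
        cpu->a = 0;
        return 0;
    case 0xB6: {                                    /* ldaa with 16-bit address */
        uint8_t high = bus_read(cpu, cpu->pc++);    /* phase 1: operand fetch */
        uint8_t low = bus_read(cpu, cpu->pc++);
        cpu->ear = (uint16_t)((high << 8) | low);   /* phase 3: evaluate address */
        cpu->a = bus_read(cpu, cpu->ear);           /* phase 4: data read */
        return 0;
    }
    default:
        return -1;
    }
}
```

In this model, an ldaa $3800 stored at an assumed address such as $F000 generates three phase 1 read cycles followed by one phase 4 read cycle at $3800, and the PC ends three bytes later.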
2.5
Simple Addressing Modes
A fundamental issue in software development is the differentiation between data and address. It is in assembly language programming in general and addressing modes in particular that this differentiation becomes clear. When we put the number $1000 into Register X, whether this is data or address depends on how the $1000 is used. Most instructions access memory to fetch parameters or save results. The addressing mode is the format the instruction uses to specify the memory location to read or write data. All instructions begin by fetching the machine instruction (op code and operand) pointed to by the PC.
Some instructions operate completely within the processor and require no memory data fetches. These instructions have no operand and are classified as inherent. If the data is found in the instruction itself, the instruction uses immediate addressing mode. The data will be a constant, meaning each time that instruction is executed, it will use the same data value. If the instruction uses the absolute address to specify the memory data location, the instruction uses either direct or extended addressing mode. Because the data in memory can change during execution, different data values may exist each time that instruction is executed. For example, we will use direct or extended addressing modes to access global variables and I/O ports. Indexed addressing mode uses a register pointer to access data in memory. For example, if we use Register X indexed mode, then Register X contains the address of the data. Indexed addressing allows us to use different addresses each time that instruction is executed. A simple indexed addressing mode is presented in this section, and a detailed explanation of indexed addressing can be found in Section 6.1. Many computers, including the 9S12, use PC-relative addressing mode to encode branch instructions. PC-relative addressing makes the object code smaller and relocatable. Relocatable means the machine code can be moved to a different address, and it still works, without reassembling.
Checkpoint 2.11: What is the addressing mode used for?
In the following subsections, these six addressing modes are explained. Although the following program doesn’t really do anything useful, it illustrates these addressing modes while executing these six instructions over and over. We will assume RegX equals $3900. In other words, RegX points to memory location $3900.
      org  $F000
main  clra
      ldaa #36
      ldaa $32
      ldaa $3800
      ldaa 0,x
      bra  main
      org  $FFFE
      fdb  main
These simple addressing modes will be sufficient to understand most of the software presented in this book. The more complicated addressing modes will be presented later in Chapter 6. These complex addressing modes will be required to implement local variables and to build the data structures presented in Chapters 6 and 7.
2.5.1 Inherent Addressing Mode
Figure 2.7 Example of the inherent addressing mode (before execution).
Inherent addressing mode has no operand field. Sometimes there is no data for the instruction at all. For example, the stop instruction halts execution. Sometimes the data for the instruction is implied. For example, the clra instruction sets register A to zero. In this case, the data value of zero is implied. Another instruction using inherent mode is cli, which clears the I-bit in the CCR to zero. On the other hand, sometimes the data must be fetched from memory, but the address of the data is implied. For example, the pula instruction will pop an 8-bit value from the stack and store it in register A. In particular, the data value pointed to by the SP is read from memory and stored into register A. The machine code for the instruction clra is $87. At the time this first instruction executes, we assume the $87 machine code is stored in memory at $F000 and the PC equals $F000, as shown in Figure 2.7. Notice that the value “0” for the clra instruction is not stored anywhere in memory.
[Figure 2.7 contents: PC = $F000, RegA = $00; EEPROM address $F000 holds $87, the machine code for clra.]
The execution of this instruction requires only one memory bus cycle. The 9S12 reads the op code. At this point, the 0 is moved into Reg A and the instruction is complete. Registers are internal to the processor (and do not have addresses), so the move into Reg A does not require a memory bus cycle. This execution will also cause the PC to increment to $F001, which will be the next instruction. The bus cycle as shown by the TExaS simulator will look something like the following.
Opcode fetch    R 0xF000 0x87 from EEPROM   Phase 1
2.5.2 Immediate Addressing Mode
Figure 2.8 Example of the immediate addressing mode (before execution).
Immediate addressing mode uses a fixed data constant. The data itself is included in the machine code as the operand. For example, the ldaa #36 instruction will store a data value of 36 into register A (Figure 2.8). The machine code for this instruction is $86 $24. We will classify the $86 as the opcode and $24 as the operand. In this example, we assume this machine code is stored in memory at $F001 and $F002 and the PC equals $F001. Notice that the “36” itself is encoded in the machine code for the ldaa #36 instruction. In assembly code, immediate addressing mode is signified by the # sign.
[Figure 2.8 contents: PC = $F001, RegA = $24; EEPROM addresses $F001 and $F002 hold $86 and $24, the machine code for ldaa #36.]
Observation: With immediate mode addressing, the information is stored in the machine code.
The execution of this instruction requires two memory bus cycles. In the first cycle, the 9S12 reads the opcode, and in the second cycle it reads the operand. At this point the 36 is moved into Reg A and the instruction is complete. Registers are internal to the processor (and do not have addresses), so the move into Reg A does not require a memory bus cycle. This execution will also cause the PC to increment to $F003, which will be the next instruction.
Opcode fetch    R 0xF001 0x86 from EEPROM   Phase 1
Operand fetch   R 0xF002 0x24 from EEPROM   Phase 1
Checkpoint 2.12: What is the difference between ldaa #36 and ldaa #$24?
2 䡲 Introduction to Assembly Language Programming
2.5.3 Direct Addressing Mode
Figure 2.9 Example of the direct page addressing mode (before execution).
Direct Page addressing mode uses an 8-bit address to access locations from 0 to $00FF. In many computer systems outside the Freescale family, this addressing mode is called zeropage. These addresses include some of the I/O ports on the 9S12. For example, Port K on the 9S12DP512 is located at address $0032. In assembly language, the < operator can be used to force direct addressing. Figure 2.9 illustrates the execution of the ldaa $32 instruction. The machine code for this instruction is $96 $32. We will classify the $96 as the opcode and $32 as the operand. The operand in this case is an address. In this example we assume this machine code is stored in memory at $F003 and $F004 and the PC equals $F003. We also assume Port K has the value $57.
[Figure 2.9 contents: PC = $F003, RegA = $57; I/O address $0032 holds $57; EEPROM addresses $F003 and $F004 hold $96 and $32, the machine code for ldaa $32.]
Observation: With direct and extended mode addressing a fixed pointer to the information is stored in the machine code. The data itself may change dynamically, but its location is fixed.
The execution of the ldaa $32 instruction requires three cycles. In the first cycle, the 9S12 reads the opcode, and in the second cycle it reads the operand. At this point the 8-bit address $32 is expanded to create the usual 16-bit address $0032. The third cycle reads the data. Next, the $57 is copied into Reg A and the instruction is complete. This execution will also cause the PC to increment to $F005, which will be the next instruction.
Opcode fetch    R 0xF003 0x96 from EEPROM   Phase 1
Operand fetch   R 0xF004 0x32 from EEPROM   Phase 1
Fetch using EAR R 0x0032 0x57 from I/O      Phase 4
Checkpoint 2.13: What is the difference between ldaa #$32 and ldaa $32?
2.5.4 Extended Addressing Mode
Figure 2.10 Example of the extended addressing mode (before execution).
Extended addressing mode uses a 16-bit address, allowing access to all memory and I/O devices. In many computer systems outside the Freescale family, this addressing mode is called direct, because it can directly access all of memory. In this book, we will adhere to the Freescale terminology. The > operator can be used to force extended addressing. In general, when the address happens to fall in the 0 to $00FF range, the assembler will automatically use direct addressing. If the address is between $0100 and $FFFF, it uses extended addressing. Figure 2.10 illustrates the execution of the ldaa $3800 instruction. The machine code for this instruction is $B6 $38 $00. We will classify the $B6 as the opcode and $3800 as the operand. Similar to direct addressing, the operand in this case is an address. In this example, we assume this machine code is stored in memory at $F005 to $F007 and the PC equals $F005. We also assume memory location $3800 has the value $62.
[Figure 2.10 contents: PC = $F005, RegA = $62; RAM address $3800 holds $62; EEPROM addresses $F005 to $F007 hold $B6, $38, and $00, the machine code for ldaa $3800.]
The execution of the ldaa $3800 instruction requires four cycles. The first cycle reads the opcode, and the second and third cycles read the operand. At this point the address is formulated, and the fourth cycle reads the data. Next, the $62 is moved into Reg A and the instruction is complete. This execution will also cause the PC to increment to $F008, which will be the next instruction.
Opcode fetch    R 0xF005 0xB6 from EEPROM   Phase 1
Operand fetch   R 0xF006 0x38 from EEPROM   Phase 1
Operand fetch   R 0xF007 0x00 from EEPROM   Phase 1
Fetch using EAR R 0x3800 0x62 from RAM      Phase 4
Common Error: It is wrong to assume the < and > operators affect the amount of data that is transferred. The < and > operators will affect the addressing mode, i.e., how the address is represented.
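As a rough illustration of how the operand bytes become a 16-bit effective address in direct and extended modes, the following C sketch may help. The function names are hypothetical and exist only for this illustration; they are not part of any 9S12 tool.

```c
#include <stdint.h>

/* Direct mode: the single operand byte is zero-extended to 16 bits,
   so only addresses $0000 to $00FF can be reached. */
uint16_t direct_address(uint8_t operand)
{
    return (uint16_t)operand;
}

/* Extended mode: two operand bytes form the address, with the most
   significant byte stored first in the machine code. */
uint16_t extended_address(uint8_t first_byte, uint8_t second_byte)
{
    return (uint16_t)(((uint16_t)first_byte << 8) | second_byte);
}
```

With the machine codes used above, direct_address(0x32) gives $0032 and extended_address(0x38, 0x00) gives $3800. In both cases ldaa still transfers one byte of data; only the representation of the address differs.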
2.5.5 Indexed Addressing Mode
Figure 2.11 Example of the indexed addressing mode (before execution).
Indexed addressing mode uses a 16-bit pointer in a register to access memory and I/O devices. Typically RegX or RegY are used to access data, but one can also use this mode with SP or PC. Again, only the simplest indexed addressing mode is presented in this section, and remaining details will be presented later in Section 6.1. Figure 2.11 illustrates the execution of the ldaa 0,x instruction. The machine code for this instruction is $A6 $00. We will classify the $A6 as the opcode and $00 as the operand. The operand in this case specifies both the register to use and the offset. In this example, we assume this machine code is stored in memory at $F008 to $F009 and the PC equals $F008. We also assume RegX equals $3900 and memory location $3900 has the value $72. Notice that RegX contains an address, and memory location $3900 and RegA contain data.
[Figure 2.11 contents: PC = $F008, RegX = $3900, RegA = $72; RAM address $3900 holds $72; EEPROM addresses $F008 and $F009 hold $A6 and $00, the machine code for ldaa 0,x; EEPROM addresses $F00A and $F00B hold $20 and $F4, the machine code for bra main.]
The execution of the ldaa 0,x instruction requires three cycles. The first cycle reads the opcode, and the second cycle reads the operand. At this point the address is formulated. The 0,x operand produces an effective address equal to the address in Reg X. The third cycle reads the data from the address pointed to by RegX. Next, the $72 is moved into Reg A and the instruction is complete. This execution will also cause the PC to increment to $F00A, which will be the bra main instruction. RegX will be unchanged.
Opcode fetch    R 0xF008 0xA6 from EEPROM   Phase 1
Operand fetch   R 0xF009 0x00 from EEPROM   Phase 1
Fetch using EAR R 0x3900 0x72 from RAM      Phase 4
In general, the n,r indexed mode produces an effective address of n+r, where n is a fixed integer and r is register X, Y, SP, or PC. For example, assuming RegY equals $3A00, ldaa 6,y will read from address $3A06, putting the 8-bit data into RegA. RegY will be unchanged.
Observation: Accesses to Registers A, B and CC transfer 8 bits, while accesses to Registers D, X, Y, SP, and PC transfer 16 bits regardless of the addressing mode.
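The effective address calculation for the n,r indexed mode can be sketched in C as follows. This is only a model of the arithmetic, assuming 16-bit addresses that wrap around at $FFFF; the function name indexed_ea is hypothetical.

```c
#include <stdint.h>

/* Effective address for the simple n,r indexed mode: EA = n + r.
   The register value r is not modified by the calculation. */
uint16_t indexed_ea(int16_t n, uint16_t r)
{
    /* Work in 16 bits so the sum wraps modulo 65536. */
    return (uint16_t)(r + (uint16_t)n);
}
```

For the two examples in the text, indexed_ea(0, 0x3900) is $3900 and indexed_ea(6, 0x3A00) is $3A06.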
2.5.6 PC Relative Addressing Mode
PC Relative addressing mode is used for the branch and branch to subroutine instructions. Stored in the machine code is not the absolute address of where to branch, but an 8-bit signed offset giving the relative distance from the current PC value. The machine code for the bra main instruction is $20 $F4. In this example, we assume this machine code is stored in memory at $F00A to $F00B (see Figure 2.11) and the PC equals $F00A. The execution of the bra main instruction requires two cycles. The first cycle reads the op code, and the second cycle reads the operand. At this point the branch address is calculated by adding the offset to the PC. This execution will also cause the PC to change to $F000, which is the instruction at main.
Opcode fetch    R 0xF00A 0x20 from EEPROM   Phase 1
Operand fetch   R 0xF00B 0xF4 from EEPROM   Phase 1
When the branch address is being calculated the PC already points to the next instruction. Calculating relative offsets gives beginning students a lot of trouble, but lucky for us the assembler calculates it for us. It is explained here in order to better understand how the computer works, rather than being necessary for us to do while programming. The address of the next instruction is the location of the current instruction plus the number of bytes in the machine code. In the above example, the branch instruction is located at address $F00A, therefore the instruction after the branch would have been at $F00C. The destination address $F000 is before the current instruction, which is called a backward jump. The operand field for PC relative addressing is an 8-bit value called rr, which is calculated using the equation
rr = (destination address) - (location of instruction) - (size of the instruction)
Since the bra op code is one byte ($20) and the operand is one byte, this instruction requires two bytes and the rr field is
rr = $F000 - $F00A - 2 = $F000 - $F00C = -12 = -$0C = $F4
Details about negative numbers will be presented in the next chapter. Consider a second, different branch instruction. Assume this time the branch instruction is located at address $4000, and the destination address $4046 is after the current instruction, which is called a forward jump. The rr field for this example will be
rr = $4046 - $4000 - 2 = $4046 - $4002 = $44
and the object code for this instruction will be $2044.
Common Error: Since not every instruction supports every addressing mode, it would be a mistake to use an addressing mode not available for that instruction.
Observation: Some of the conditional branch instructions on the 9S12 require a different number of cycles to execute depending on whether or not the branch is taken. The cycle time when accessing external memory on a 9S12 depends on the speed of the external memory. It also depends on whether the address is an even number or an odd number. These facts complicate the task of predetermining how long a 9S12 program will take to execute.
Observation: Relative addressing within a program block is essential for implementing relocatable code.
Checkpoint 2.14: Give the machine code for the assembly code that branches to itself, causing an infinite loop, loop bra loop.
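The rr calculation above can be sketched in C. This is a minimal model of the arithmetic the assembler performs, not the assembler's actual code; the function names branch_rr and branch_target are hypothetical, and the sketch assumes the branch distance fits in the signed 8-bit range.

```c
#include <assert.h>
#include <stdint.h>

/* rr = (destination address) - (location of instruction) - (size of instruction),
   stored as an 8-bit two's complement value. */
uint8_t branch_rr(uint16_t destination, uint16_t location, uint16_t size)
{
    int distance = (int)destination - ((int)location + (int)size);
    assert(distance >= -128 && distance <= 127);  /* must fit in 8 signed bits */
    return (uint8_t)(distance & 0xFF);
}

/* The processor reverses the calculation: it sign-extends rr and adds it
   to the address of the instruction that follows the branch. */
uint16_t branch_target(uint16_t location, uint16_t size, uint8_t rr)
{
    int offset = (rr < 0x80) ? (int)rr : (int)rr - 0x100;
    return (uint16_t)(location + size + offset);
}
```

Using the two examples from the text, branch_rr(0xF000, 0xF00A, 2) yields $F4 and branch_rr(0x4046, 0x4000, 2) yields $44, while branch_target recovers $F000 and $4046 from those offsets.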
2.6 The Assembly Language Development Process
To develop assembly language software, we first use an editor to create our source code. Source code contains a specific set of sequential commands in human-readable form. Next, we use an assembler to translate our source code into object code. Object code (or machine instructions) contains these same commands in machine-readable form. Most assembly source code is one-to-one with the object code that is executed by the computer. For example, when programming in a high level language like Java, one line of a program often requires multiple machine instructions to execute. In contrast, one line of assembly code usually translates to exactly one machine instruction. The assembler also produces a listing file, which is a human-readable output showing the addresses and object code that correspond to each line of the assembly program.
When developing software for a real microcomputer, a loader is used to place the object code into computer memory. In an embedded system, the object code is usually stored in EEPROM, and the loader uses an EEPROM programmer to store the machine codes. The 9S12 microcontroller contains built-in features that assist in programming its EEPROM. In contrast, a general purpose computer places its programs in RAM, and the loader typically reads the object code from a hard drive or CD. Since RAM is volatile, the programs on a general purpose computer must be loaded again each time power is restored. For both microcontrollers and general purpose computers, we test our program with the aid of a debugger. In the final product using a microcontroller, the power on reset is used to start software execution after power is supplied to the system.
Figure 2.12 outlines the assembly language development process using a simulator. The process for developing systems on real hardware is identical except the simulated microcomputer is replaced with a real microcomputer, and the simulated external devices are replaced with real external devices. Either TExaS or Metrowerks CodeWarrior can be used to develop software for the 9S12. Both include an editor, assembler, and simulator. Furthermore, both can be used to download and debug software on a real 9S12. Metrowerks CodeWarrior has more features including a C compiler, whereas TExaS is easier to learn and allows you to simulate more external I/O devices such as LCD displays, keypads, IR remotes, DC motors, and stepper motors. Essentially, Metrowerks CodeWarrior is a full-featured commercial product, while TExaS is an educational tool. Either way, the entire development process is contained in one application.
Figure 2.12 Assembly language development process in Metrowerks CodeWarrior or TExaS.
When developing assembly code, you will need access to the Programming Reference Manual for the 9S12. This can be obtained in print form or as a pdf file. Another resource containing a brief overview of each instruction and example usage can be found by searching the Contents page of the help engine included with the TExaS application.
2.7 Memory Transfer Operations
To describe operations involving numbers we will use the following symbols
w   is a signed 8-bit -128 to +127 or unsigned 8-bit 0 to 255
n   is a signed 8-bit -128 to +127
u   is an unsigned 8-bit 0 to 255
W   is a signed 16-bit -32768 to +32767 or unsigned 16-bit 0 to 65535
N   is a signed 16-bit -32768 to +32767
U   is an unsigned 16-bit 0 to 65535
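The symbol w covers both interpretations because the same 8 bits can be read as a signed or an unsigned number. The following C sketch, with hypothetical function names, shows the two readings of one byte.

```c
#include <stdint.h>

/* Read an 8-bit pattern as an unsigned number, 0 to 255. */
int as_unsigned8(uint8_t bits)
{
    return (int)bits;
}

/* Read the same 8-bit pattern as a signed number, -128 to +127. */
int as_signed8(uint8_t bits)
{
    return (bits < 0x80) ? (int)bits : (int)bits - 0x100;
}
```

For example, the pattern $F4 is 244 when read as unsigned and -12 when read as signed. The bits stored in memory are identical in both cases.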
The 8-bit load instructions transfer data from memory into a register. In real life, when we move a box, push a broom, load a rifle, store spoons in a drawer or transfer to a new job, there is a single physical object and the action changes the location of that object. Assembly language uses these same verbs, but the action will be different. It creates a copy of the data and places the copy at the new location. In other words, since the original data still exists, there are now two copies of the information. If the address U is between 0 and $00FF, then direct addressing mode will be used. If the address U is between $0100 and $FFFF, it will use extended addressing mode.
ldaa #w   ;RegA=w     Load an 8-bit constant into RegA
ldaa U    ;RegA=[U]   Load an 8-bit memory value into RegA
ldab #w   ;RegB=w     Load an 8-bit constant into RegB
ldab U    ;RegB=[U]   Load an 8-bit memory value into RegB
The 16-bit load instructions also transfer information from memory into a register. Although D usually contains data and X, Y usually contain addresses, it is acceptable programming practice to place address or data information in any of these three registers. The stack pointer (called either S or SP) will always contain an address specifying the top of the stack. The program counter (PC) will always contain an address specifying the next instruction to execute.
ldd #W    ;RegD=W     Load a 16-bit constant into RegD
ldd U     ;RegD={U}   Load a 16-bit memory value into RegD
lds #W    ;RegS=W     Load a 16-bit constant into RegS
lds U     ;RegS={U}   Load a 16-bit memory value into RegS
ldx #W    ;RegX=W     Load a 16-bit constant into RegX
ldx U     ;RegX={U}   Load a 16-bit memory value into RegX
ldy #W    ;RegY=W     Load a 16-bit constant into RegY
ldy U     ;RegY={U}   Load a 16-bit memory value into RegY
Observation: The lds #$4000 instruction is used to initialize the stack, where $3FFF is the last RAM address.
Checkpoint 2.15: What is the difference between ldx #$0801 and ldx $0801?
Checkpoint 2.16: What is the difference between the direct mode instruction ldx $12 and the extended mode instruction ldx $0012?
The 9S12 has two very convenient memory to memory move instructions.
movb   Move an 8-bit constant into memory
movb   Move an 8-bit value from memory to memory
movw   Move a 16-bit constant into memory
movw   Move a 16-bit value from memory to memory
There are minor syntactical differences between TExaS and CodeWarrior. For the most part, assembly code in this book will be valid for both assemblers, like the third example:
movb #100, $3800   set RAM to 100 (valid in TExaS)
movb #100, $3800   ;set RAM to 100 (valid in CodeWarrior)
movb #100, $3800   ;set RAM to 100 (valid in both)
The 8-bit store instructions move data from a register to memory. The data in the register remains intact, so after executing one of these instructions there are two copies of the data.
staa U    ;[U]=RegA   Store RegA into memory
stab U    ;[U]=RegB   Store RegB into memory
The 16-bit store instructions move data from a register to memory.
std U     ;{U}=RegD   Store RegD into memory
sts U     ;{U}=RegS   Store RegS into memory
stx U     ;{U}=RegX   Store RegX into memory
sty U     ;{U}=RegY   Store RegY into memory
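To make the [U] and {U} notation concrete, here is a small C model of 8-bit and 16-bit memory accesses. It is a sketch only: the memory array and function names are hypothetical, and it assumes the 9S12 byte order in which the most significant byte of a 16-bit value is stored at the lower address.

```c
#include <stdint.h>

/* Hypothetical model of the 64 KiB address space. */
static uint8_t memory[65536];

/* [U]: one byte at address U. */
void store8(uint16_t U, uint8_t value) { memory[U] = value; }
uint8_t load8(uint16_t U)              { return memory[U]; }

/* {U}: two bytes at addresses U and U+1, most significant byte first. */
void store16(uint16_t U, uint16_t value)
{
    memory[U] = (uint8_t)(value >> 8);
    memory[(uint16_t)(U + 1)] = (uint8_t)(value & 0xFF);
}

uint16_t load16(uint16_t U)
{
    uint16_t msb = memory[U];
    uint16_t lsb = memory[(uint16_t)(U + 1)];
    return (uint16_t)((msb << 8) | lsb);
}
```

Storing the 16-bit value 1000 ($03E8) at $0801 with store16 places $03 at $0801 and $E8 at $0802. The value passed in is copied, not moved, so the caller's variable (standing in for the register) is unchanged, which matches the copy semantics of the load and store instructions.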
Checkpoint 2.17: Write assembly code that copies the 8-bit data from memory location $0810 to memory location $0820.
Checkpoint 2.18: Write assembly code that writes the binary %11000111 to Port T.
Observation: In most instructions, the size of the register will determine the size of the data when reading from or writing to memory.
2.8 Subroutines
Subroutines, procedures and functions are programs that can be called to perform specific tasks. They are an important conceptual tool because they allow us to develop modular software. The programming languages Pascal, Fortran, and Ada distinguish between functions, which return values, and procedures, which do not. On the other hand, the programming languages C, C++, Java, and Lisp do not make this distinction, and treat functions and procedures as synonymous. Object-oriented programming languages use the term method to describe programs that are part of objects; it is also used in conjunction with type classes. In assembly language, we use the term subroutine for all subprograms whether or not they return a value. The rationale for developing subroutines will be developed in Chapter 5. However, in this section we will present a short introduction on the syntax for defining subroutines.
In general, we initialize the stack pointer (SP) into RAM using the lds instruction, which is usually done once at the beginning of the program. We define a subroutine by giving it a name in the label field, followed by instructions, which when executed, perform the desired effect. The last instruction in a subroutine will be rts, which we use to return from the subroutine. In Program 2.1, we define the subroutine named Set. In assembly language, we will use either the bsr Set or jsr Set instruction to call this subroutine. At run time, the bsr and jsr instructions will push the return address on the stack. The return address is the location of the instruction immediately after the bsr/jsr instruction. At the end of the subroutine, the rts instruction will pull the return address from the stack, returning the program to the place from which the subroutine was called. More precisely, it returns to the instruction immediately after the instruction that performed the subroutine call.
Program 2.1 Listing file showing how to use the bsr and rts instructions to implement a subroutine.
$0800                     org  $0800     ;Variables in RAM
$0800               Flag  rmb  1
$0801               Data  rmb  2
$4000                     org  $4000     ;Programs in ROM
                    ;*****Set**************
                    ; Set Data=1000, and Flag=1
                    ; Input: None
                    ; Output: None
$4000 180303E80801  Set   movw #1000,Data ;3
$4006 180B010800          movb #1,Flag    ;4
$400B 3D                  rts             ;5
$400C CF4000        main  lds  #$4000     ;1
$400F 07EF                bsr  Set        ;2
$4011 20FE          loop  bra  loop       ;6
$FFFE                     org  $fffe
$FFFE 400C                fdb  main
Observation: Since the bsr instruction uses relative addressing, it can only be used to call a subroutine near the current instruction. Since the jsr instruction allows extended addressing, it can be used to call a subroutine anywhere in memory.
There are two global variables in this example. The org $0800 pseudo-op will cause the variables to be allocated in RAM. Flag is an 8-bit variable, requiring one byte of storage, so the rmb 1 reserves one byte. Data is a 16-bit variable, requiring two bytes of storage, so the rmb 2 reserves two bytes. The org $4000 pseudo-op will cause the program to be stored in ROM. The subroutine, called Set, is shaded in Program 2.1. This routine sets the values of two global variables. The comments explain what the subroutine does, and the label Set defines the entry point, which is the place subroutine execution will begin. The main program calls the subroutine using the bsr Set instruction. The subroutine returns back to the main program using the rts instruction. The numbers in the comment field explain the sequence of execution.
Figure 2.13 The stack before and after execution of the bsr instruction.
Figure 2.13 shows the stack before and after the bsr instruction is executed. During the first two cycles of the bsr Set instruction it fetches the opcode and operand. After fetching the opcode and operand, the PC is $4011, which will be the return location. During the next two cycles, the effective address ($4000) is calculated. The last two cycles push the return address on the stack.
Opcode fetch     R 0x400F 0x07 from ROM   Phase 1
Operand fetch    R 0x4010 0xEF from ROM   Phase 1
Stack store lsb  W 0x3FFF 0x11 to RAM     Phase 6
Stack store msb  W 0x3FFE 0x40 to RAM     Phase 6
The rts instruction will return to the program that called the subroutine. Figure 2.14 shows the stack before and after the rts instruction is executed. During the first cycle it fetches the op code. The last two cycles pull the return address ($4011) from the stack.
Opcode fetch     R 0x400B 0x3D from ROM   Phase 1
Stack read msb   R 0x3FFE 0x40 from RAM   Phase 4
Stack read lsb   R 0x3FFF 0x11 from RAM   Phase 4
Figure 2.14 The stack before and after execution of the rts instruction.
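The stack activity of bsr and rts can be modeled with a few lines of C. This is a simplified sketch rather than a cycle-accurate simulation: the Cpu structure and function names are hypothetical, and the model assumes the PC already points past the two-byte bsr instruction when the return address is pushed.

```c
#include <stdint.h>

/* Hypothetical model of the CPU state needed for subroutine linkage. */
typedef struct {
    uint16_t pc;          /* program counter */
    uint16_t sp;          /* stack pointer */
    uint8_t  mem[65536];  /* 64 KiB address space */
} Cpu;

/* Push 16 bits: the least significant byte is stored first, so the
   most significant byte ends up at the lower address. */
static void push16(Cpu *c, uint16_t value)
{
    c->sp--; c->mem[c->sp] = (uint8_t)(value & 0xFF);
    c->sp--; c->mem[c->sp] = (uint8_t)(value >> 8);
}

/* Pull 16 bits: most significant byte first, then least significant. */
static uint16_t pull16(Cpu *c)
{
    uint16_t msb = c->mem[c->sp]; c->sp++;
    uint16_t lsb = c->mem[c->sp]; c->sp++;
    return (uint16_t)((msb << 8) | lsb);
}

/* bsr: push the return address, then add the signed 8-bit offset rr. */
void bsr(Cpu *c, uint8_t rr)
{
    int offset = (rr < 0x80) ? (int)rr : (int)rr - 0x100;
    push16(c, c->pc);
    c->pc = (uint16_t)(c->pc + offset);
}

/* rts: pull the return address back into the PC. */
void rts(Cpu *c)
{
    c->pc = pull16(c);
}
```

Starting from PC = $4011 and SP = $4000, calling bsr with rr = $EF leaves $40 at $3FFE, $11 at $3FFF, SP = $3FFE, and PC = $4000. A following rts restores PC = $4011 and SP = $4000, matching the bus cycles listed for Program 2.1.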
2.9 Input/Output
This section is completely out of place. It really belongs back in Chapter 10 with the other input/output devices. However, I couldn’t wait until Chapter 10 to show you how much fun it is to write assembly code for a microcontroller. So in this section, I will take a short side step from the business of assembly language syntax to teach you how to connect switches and LEDs. We will use switches to input data and use LEDs to output results.
2.9.1 Direction Registers
Figure 2.15 The input/output direction of a bidirectional port is specified by its direction register.
There are many I/O ports available on the 9S12. One such 8-bit port is Port T (called PTT), which is associated with eight pins on the 9S12 chip. Each of the eight pins can be individually defined as input (allowing data to flow into the computer) or output (allowing data to flow out of the computer). Three possible hardware configurations for Port T are shown in Figure 2.15. PTT is located at address $0240, meaning if the software reads from address $0240 it will get the logic levels on the input pins. If the software writes to address $0240, it will set the logic levels on the output pins. DDRT is the 8-bit direction register associated with Port T, and it is located at address $0242. Each bit in DDRT corresponds to one pin on the port. Table 2.6 defines the individual bits in these two registers. Setting a direction register bit to 1 makes the corresponding pin an output, and 0 makes the corresponding pin an input. If we set DDRT to $FF, then all eight pins are output. If we set DDRT to $00, then all eight pins are input.
[Figure 2.15 panels: three 9S12 configurations of pins PT7 to PT0, with DDRT=$00, DDRT=$0F, and DDRT=$FF.]
Address  Bit 7  6      5      4      3      2      1      Bit 0  Name
$0240    PT7    PT6    PT5    PT4    PT3    PT2    PT1    PT0    PTT
$0242    DDRT7  DDRT6  DDRT5  DDRT4  DDRT3  DDRT2  DDRT1  DDRT0  DDRT
Table 2.6 Port T registers.
Checkpoint 2.19: What happens if we were to set DDRT to $F0?
When using a bidirectional I/O port, typically we set the direction register once at the beginning of our program. To make all pins input, we clear the direction register:
      movb #0,DDRT     ;make all PTT pins an input
or in C you could execute
      DDRT = 0x00;     // make all PTT pins an input
The following software reads an input and saves the information into a variable called Happiness.
      ldaa PTT
      staa Happiness
During the execution of the ldaa instruction, the computer will perform a read bus cycle to the address of the port. It is during this exact time that the values of the inputs are transferred into the computer. If a pin is programmed as an input, a write to that port has no effect on that pin. In C, an I/O port name on the right hand side of an assignment operator is implemented with an input port read access.
      Happiness = PTT; // read PTT
If a pin is programmed as an output, a write to that port will set/clear that pin, and a read from that port will return the value that we had previously written. In C, an I/O port name on the left hand side of an assignment operator is implemented as an output port write access.
      PTT = 0x56;      // set PTT equal to $56
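The read and write behavior just described can be summarized with a small software model. This is a sketch for building intuition, not the 9S12 hardware or its header files: the Port structure, its field names, and both functions are hypothetical.

```c
#include <stdint.h>

/* Hypothetical software model of one bidirectional 8-bit port. */
typedef struct {
    uint8_t ddr;    /* direction register: 1 = output, 0 = input */
    uint8_t latch;  /* value most recently written by software */
    uint8_t pins;   /* logic levels driven onto the pins by external hardware */
} Port;

/* A write is remembered, but it only drives pins configured as outputs. */
void port_write(Port *p, uint8_t value)
{
    p->latch = value;
}

/* A read returns the written value on output pins and the external
   logic level on input pins. */
uint8_t port_read(const Port *p)
{
    uint8_t outputs = (uint8_t)(p->latch & p->ddr);
    uint8_t inputs  = (uint8_t)(p->pins & (uint8_t)~p->ddr);
    return (uint8_t)(outputs | inputs);
}
```

With ddr equal to $0F, external levels of $A0 on the upper pins, and a write of $FF, port_read returns $AF: bits 3 to 0 come from the written value and bits 7 to 4 come from the external inputs.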
Example 2.1. Make PTT pins 7–4 input, and pins 3–0 output, then make PT3–PT0 output high.
Solution First, we should set the direction register to $0F as illustrated in the middle circuit in Figure 2.15. This code is executed once, at the start of the system.
      ldaa #$0F
      staa DDRT        ;PT7-PT4 inputs, PT3-PT0 outputs
or in C you could execute
      DDRT = 0x0F;     // PT7-PT4 inputs, PT3-PT0 outputs
The following software will set the outputs (pins 3–0) high, but have no effect on the inputs (pins 7–4). This software can be executed whenever we wish to make PT3–0 high. The bit 7–4 values in the $0F constant do not matter.
      ldaa #$0F
      staa PTT         ;make PT3-0 high
or in C you could execute
      PTT = 0x0F;      // make PT3-0 high
2.9.2 Switch Interface
Input/output devices are critical components of an embedded system. The first input device we will study is the switch. It allows the human to input binary information into the computer. Typically we define the asserted state, or logic true, when the switch is pressed. Contact switches can also be used in machines to detect mechanical contact (e.g., two parts touching, paper present in the printer, or wheels on the ground). A single pole single throw (SPST) switch has two connections (the connections are shown as little open circles in Figure 2.16). In a normally open switch (NO), the resistance between the connections is infinite (over 100 MΩ on the B3F tactile switch) if the switch is not pressed and zero (under 0.1 Ω on the B3F tactile switch) if the switch is pressed. To convert the infinite/zero resistance into a digital signal, we can use a pull-down resistor to ground or a pull-up resistor to +5V as shown in Figure 2.16. Notice that 10 kΩ is 100,000 times larger than the on-resistance of the switch and 10,000 times smaller than its off-resistance. Another way to choose the pull-down or pull-up resistor is to consider the input current of the 9S12 input pin. The current into the 9S12 will be less than 1 µA (shown as IIL and IIH in the data sheet). So, if the current into the 9S12 is 1 µA, then the voltage drop across the 10 kΩ resistor will be 0.01 V, which is negligibly small. With a pull-down resistor, the digital signal will be low if the switch is not pressed and high if the switch is pressed (Figure 2.16b). This is defined as positive logic because the asserted state is a logic high. Conversely, with a pull-up resistor, the digital signal will be high if the switch is not pressed and low if the switch is pressed (Figure 2.16c). This is defined as negative logic because the asserted state is a logic low.
Figure 2.16 Switch interface.
[Figure 2.16 panels: (a) SPST switch; (b) positive logic using a 10 kΩ pull-down interface, PT3 is low when not pressed and high when pressed; (c) negative logic using a 10 kΩ pull-up interface to +5V, PT2 is high when not pressed and low when pressed.]
One of the complicating issues with mechanical switches is they can bounce (oscillate on and off) when touched and when released. The contact bounce varies from switch to switch and from time to time, but usually bouncing is a transient event lasting less than 5 ms. We can eliminate the effect of bounce if we design software that waits at least 10 ms between times we read the switch values.
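One way to apply the 10 ms rule is sketched below in C. This is an illustrative model, assuming the caller supplies the current time in milliseconds and the raw pin level; the Debouncer structure and debounce_sample function are hypothetical names, and on real hardware the raw value would come from reading the port.

```c
#include <stdint.h>

#define DEBOUNCE_MS 10u  /* minimum time between accepted switch readings */

/* Hypothetical state for one debounced switch input. */
typedef struct {
    uint32_t last_ms;  /* time of the last accepted reading */
    uint8_t  value;    /* last accepted switch value */
    int      primed;   /* 0 until the first reading has been accepted */
} Debouncer;

/* Accept a new raw reading only if at least DEBOUNCE_MS have passed
   since the last accepted reading; otherwise keep the previous value. */
uint8_t debounce_sample(Debouncer *d, uint32_t now_ms, uint8_t raw)
{
    if (!d->primed || (uint32_t)(now_ms - d->last_ms) >= DEBOUNCE_MS) {
        d->value = raw;
        d->last_ms = now_ms;
        d->primed = 1;
    }
    return d->value;
}
```

Because bouncing lasts less than 5 ms and accepted readings are at least 10 ms apart, at most one accepted reading can fall inside a bounce interval, so the software never sees the switch toggle back and forth.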
2.9.3 LED Interface
Figure 2.17 Positive logic LED interface (Lite-On LTL-10223W).
A light emitting diode (LED) emits light when an electric current passes through it. LEDs have polarity, meaning current must pass from anode to cathode to activate. The anode is labeled a or +, and the cathode is labeled k or –. The cathode is the short lead and there may be a slight flat spot on the body of round LEDs. Thus, the anode is the longer lead. The brightness of an LED depends on the applied electrical power (P = I*V). Since the LED voltage is approximately constant in the active region (see Figure 2.17a), we can establish the desired brightness by setting the current.
[Figure 2.17 panels: (a) LED curve, current (mA) versus voltage (volts); (b) PT5 is high: the 7405 output is low at 0.5 V and 10 mA flows from +5V through the 260 Ω resistor and the LED; (c) PT5 is low: 0 mA flows and the LED is off.]
Normally, we connect a microcomputer output to the LED as shown in Figure 2.17. In this way the software can control the LED state. If the output current of the external device is greater than or equal to 10 mA, we can use a driver (e.g., 7405) between the 9S12 and the device. When the software writes a logic 1 to the output port, the input to the 7405 becomes high, the output from the 7405 becomes low, 10 mA travels through the LED, and the LED is on. When the software writes a logic 0 to the output port, the input to the 7405 becomes low, the output from the 7405 floats (neither high nor low), no current travels through the LED, and the LED is dark. The value of the resistor is selected to create the proper LED current. When active, the LED voltage will be about 2 V, and the power delivered to the LED will be controlled by its current. If the desired brightness requires an operating point of 1.9 V at 10 mA, then the resistor value should be
R = (5 − Vd − VOL)/Id = (5 − 1.9 − 0.5)/0.01 = 260 Ω
where Vd, Id is the desired LED operating point, and VOL is the output low voltage of the LED driver. If we use a standard resistor value of 220 Ω in place of the 260 Ω, then the current will be (5 − 1.9 − 0.5 V)/220 Ω, which is about 12 mA. This slightly higher current is usually acceptable.
Checkpoint 2.20: What resistor value in Figure 2.17 is needed if the desired LED operating point is 1.7 V and 5 mA?
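The resistor calculation for the driver interface can be written as a small C helper. This is a sketch for checking hand calculations on a desktop computer, not code intended for the 9S12; the function and parameter names are assumptions introduced here.

```c
/* Series resistor (ohms) for an LED driven through an open collector driver.
   vsupply = supply voltage (V), vd = LED voltage (V),
   vol = driver output low voltage (V), id = LED current (A). */
double Led_Resistor(double vsupply, double vd, double vol, double id){
  return (vsupply - vd - vol) / id;
}

/* LED current (A) that results from an actual resistor value r (ohms). */
double Led_Current(double vsupply, double vd, double vol, double r){
  return (vsupply - vd - vol) / r;
}
```

With the values from the text, Led_Resistor(5.0, 1.9, 0.5, 0.010) evaluates to approximately 260, and Led_Current(5.0, 1.9, 0.5, 220.0) to approximately 0.0118, i.e., about 12 mA.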
When the LED current is much less than 10 mA, we can interface it directly to an output pin without using a driver. The LED shown in Figure 2.18 has an operating point of 1.6 V and 1 mA. For the positive logic interface (Figure 2.18b) we calculate the resistor value based on the desired LED voltage and current
R = (VOH − Vd)/Id = (4.2 − 1.6)/0.001 = 2.6 kΩ
where VOH is the output high voltage of the 9S12 output pin. Negative logic means the LED is activated when the software outputs a zero. For the negative logic interface (Figure 2.18c) we use a similar equation to determine the resistor value
R = (5 − Vd − VOL)/Id = (5 − 1.6 − 0.8)/0.001 = 2.6 kΩ
where VOL is the output low voltage of the 9S12 output pin.
Figure 2.18 Low current LED interface (Agilent HLMP-D150). (a) LED curve, current (mA) versus voltage (V). (b) Positive logic interface: PT5 is high, 1 mA flows through the 2.6 kΩ resistor and the LED. (c) Negative logic interface: PT5 is low, 1 mA flows from +5 V through the 2.6 kΩ resistor and the LED.
If we use a standard resistor value of 2.7 kΩ in place of the 2.6 kΩ, then the current will be (5 − 1.6 − 0.8 V)/2.7 kΩ, which is about 0.96 mA. This slightly lower current is usually acceptable.
Checkpoint 2.21: What resistor value in Figure 2.18 is needed if the desired LED operating point is 1.7 V and 2 mA?
Observation: Using standard resistor values will make our product less expensive and its parts easier to obtain.
Example 2.2. Build a system with three LEDs that flash a rotating sequence 100,010,001 over and over. Solution We will use low current LEDs because they are cheaper and easier to interface. We need three output pins, so we will use PT2, PT1, and PT0 (any three port pins would have been ok). Using the design method in Figure 2.18, we build three positive logic LED circuits as shown in Figure 2.19.
Figure 2.19 Hardware solution to Example 2.2 (Agilent HLMP-D150). PT2, PT1, and PT0 of the 9S12 each drive one LED through a 2.7 kΩ resistor.
The software will first make PT2, PT1, and PT0 outputs by setting the direction register to 7. When running on a real board, we allow the debugger to operate by executing the cli instruction. The main loop outputs to Port T the sequence 4, 2, 1, . . . . The "over and over" action is created by the bra loop instruction, which causes the three PTT outputs to be repeated. There are no global variables needed in this solution, but if variables were needed they would go in RAM. We initialize the stack to the last location of RAM (and the stack grows down to smaller addresses), and the program is placed in flash EEPROM. In assembly language, we must explicitly specify where in memory to place the various components of our software. The last two lines of the assembly version of Program 2.2 define the reset vector, which specifies where the computer begins execution after power-on or after a reset. When programming in a high level language like C, we define memory allocation using compiler specific settings.
; 9S12DP512
PTT   equ  $0240
DDRT  equ  $0242
      org  $0800   ;RAM
      org  $4000   ;EEPROM
main  lds  #$4000  ;SP=>RAM
      ldaa #$07
      staa DDRT    ;PT2,PT1,PT0 outputs
      cli          ;allow debugger
loop  ldaa #4
      staa PTT     ;output
      ldaa #2
      staa PTT     ;output
      ldaa #1
      staa PTT     ;output
      bra  loop
      org  $FFFE   ;EEPROM
      fdb  main    ;reset vector

// 9S12DP512
void main(void){
  DDRT = 0x07;   // PT2,PT1,PT0 outputs
  asm cli
  while(1){
    PTT = 0x04;
    PTT = 0x02;
    PTT = 0x01;
  }
}
Program 2.2 Software solution to Example 2.2.
To better understand how the computer translates our program into actions, we will analyze the explicit actions that occur as Program 2.2 executes. After typing in our source code, we will assemble the program, generating the machine code and listing file. Program 2.3 is the listing file for Program 2.2. Line numbers were manually added to show the instructions that will be executed. Looking at the big picture, we see that lines 1 through 4 are executed once to initialize the system, and lines 5 through 11 are repeated over and over as the system produces the infinite output sequence 4-2-1-4-2-1-4-2-1- . . .
; 9S12DP512
PTT   equ  $0240
DDRT  equ  $0242
      org  $0800   ;RAM
      org  $4000   ;EEPROM
main  lds  #$4000  ;SP=>RAM        *Line 1
      ldaa #$07                    *Line 2
      staa DDRT    ;PT2,PT1,PT0    *Line 3
      cli          ;allow debugger *Line 4
loop  ldaa #4                      *Line 5
      staa PTT     ;output         *Line 6
      ldaa #2                      *Line 7
      staa PTT     ;output         *Line 8
      ldaa #1                      *Line 9
      staa PTT     ;output         *Line 10
      bra  loop                    *Line 11
      org  $FFFE   ;EEPROM
      fdb  main    ;reset vector
The machine code is programmed into the flash EEPROM. In particular, locations $4000 to $401A will contain the machine code for this program, and locations $FFFE to $FFFF will always contain the reset vector, as shown in Figure 2.20.
Figure 2.20 Memory model of Program 2.2. After reset, PC=$4000.
When power is applied to the system, or when the reset button is pushed, the computer reads the 16-bit number from locations $FFFE and $FFFF and places it into the PC. This defines the place the program will begin execution. In this example, the software will start executing at $4000.

Lines 1 through 4: These four lines perform the initialization sequence. Executing the lds instruction will initialize the stack pointer. Although not specifically used in this example, the stack is an important structure and should be initialized in this manner for all our 9S12 software. During the execution of each instruction, the PC is incremented to the next instruction. Executing the ldaa instruction will set register A equal to $07. Since this is immediate mode addressing, the data can be found in the machine code itself. Since it is immediate mode, the data will be fixed, and can only be changed by editing the source code and reassembling the program. Executing the staa instruction will set DDRT equal to $07. Since this is extended mode addressing, the machine code contains the address of DDRT, $0242. DDRT specifies whether each pin of Port T is an input or an output. This store instruction produces a write cycle to address $0242 with data $07, causing PT2, PT1 and PT0 to become output pins. Notice that the load instructions bring data from memory or a port into a register, and the store instructions send data from a register out to memory or a port. Executing the cli instruction will enable interrupts. Although this program does not specifically use interrupts, the debugger needs to have interrupts enabled.

Address  Object code  Source code  Action    After completion
$4000    $CF4000      lds  #$4000  SP=$4000  PC=$4003
$4003    $8607        ldaa #$07    A=$07     PC=$4005
$4005    $7A0242      staa DDRT    DDRT=$07  PC=$4008
$4008    $10EF        cli          I=0       PC=$400A

Lines 5 through 10: These lines perform the body of the program, causing the 4-2-1 output sequence. Executing the ldaa instructions will set register A equal to a constant. The # symbol specifies immediate mode addressing, so the constant data can be found in the machine code itself. Executing the staa instructions will set PTT. This is extended mode addressing, therefore the machine code contains the address of PTT, $0240. The staa instructions produce write cycles to address $0240. When you write 1/0 binary data to Port T, the high/low digital voltages occur on the corresponding output pins. In this example, each staa instruction sets a new output on PTT, and notice the sequence will be 4-2-1, as desired.

Line 11: This line causes the execution of the body of the program to occur over and over. The bra instruction uses PC-relative addressing. During the fetching of the two bytes of machine code, the PC is incremented twice, changing it from $4019 to $401B. The PC-relative offset, $EF, is sign extended to $FFEF, which means −17. This is an unconditional branch, so PC = PC − 17 (or $401B + $FFEF), setting PC back to line 5.

Address  Object code  Source code  Action  After completion
$4019    $20EF        bra  loop            PC=$400A
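The PC-relative arithmetic for the bra instruction can be checked with a few lines of C. This is a sketch for verifying hand calculations; the function name BranchTarget is an assumption introduced here, and the code only models the sign extension and 16-bit addition described in the text.

```c
#include <stdint.h>

/* Target address of a short branch.
   pc_after_fetch = PC after both bytes of the instruction were fetched
   rel8           = 8-bit offset byte taken from the machine code */
uint16_t BranchTarget(uint16_t pc_after_fetch, uint8_t rel8){
  /* sign extend the 8-bit offset, e.g. $EF becomes -17 ($FFEF) */
  int16_t offset = (rel8 & 0x80) ? (int16_t)(rel8 - 256) : (int16_t)rel8;
  /* 16-bit addition; the result wraps modulo 65536 */
  return (uint16_t)(pc_after_fetch + offset);
}
```

For line 11, BranchTarget(0x401B, 0xEF) evaluates to 0x400A, the address of the loop label.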
2.10 Tutorial 2. Running with TExaS
This tutorial explains some of the debugging features available with TExaS. A vast amount of information exists as the computer executes software. A good debugger allows us to selectively filter this information, showing us only data relevant to the problem at hand. There are two aspects of this filter: what information will we see, and when (or how often) will it be collected? The run mode allows us to adjust the level of detail observable during the simulation.
Action: Watch the second movie, called Lesson 2. Lesson 2 is located on the web at http://users.ece.utexas.edu/~valvano/Readme.htm. This lesson introduces some of the debugging features. It takes about 11 minutes and provides a narrated overview of debugging within TExaS. You need not install TExaS, just download and run the Windows media file.
Question 2.1. A good debugger allows us to filter data that we observe. What are the two aspects of this filtering? I.e., in what two ways does the debugger filter data?
Question 2.2. What format code do we use in the ViewBox to see a variable in 8-bit unsigned decimal?
Question 2.3. What does CycleView mode do?
Question 2.4. What does InstructionView mode do?
Question 2.5. What does LogRecord mode do?
Question 2.6. What is a ScanPoint?
2.11 Homework Assignments
Homework 2.1 What are the differences between the following four instructions:
      ldaa 10
      ldaa #10
      ldaa $10
      ldaa #$10
Homework 2.2 What is the difference between the following two instructions:
      ldaa #10
      ldx  #10
Homework 2.3 Identify the addressing mode used in each of the following instructions:
      staa 200
      staa 2000
      staa 200,x
      staa 2000,x
      bra  2000
      jmp  2000
Homework 2.4 Identify the addressing mode used in each of the following instructions:
      subd 2,x
      clra
      ldaa #$36
      ldd  $3800
      bra  loop
Homework 2.5 You will need to look up the address of Ports A and J in your data sheet to answer this question. Identify the addressing mode used in each of the following instructions:
      cli
      subd #0
      bsr  $5000
      jsr  $5000
      ldy  2,y
      ldaa PTJ    ;Port J
      stab PORTA  ;Port A
      rts
The next three homework assignments in this chapter involve hand assembly. Pass1 contains three steps. The first step is to determine the addressing mode for each instruction. Next, you calculate the object code size for the instruction. The third step is to create the symbol table. Pass2 contains two steps. The first step is to determine the object code for each instruction, and the second step is to write the listing (address, data) for each line.
Homework 2.6 Hand assemble the following program. Include the symbol table, the address and machine code in hexadecimal for each instruction.
DDRH  equ  $0262  ; Port H Data Direction Register
DDRT  equ  $0242  ; Port T Data Direction Register
PTH   equ  $0260  ; Port H I/O Register
PTT   equ  $0240  ; Port T I/O Register
      org  $4000  ; Object code goes in EEPROM
Main  ldaa #$FF
      staa DDRT   ; Port T is output
      ldaa #$00
      staa DDRH   ; Port H is input
loop  ldaa PTH    ; Read inputs
      staa PTT    ; Set output
      bra  loop   ; Repeat
      org  $FFFE
      fdb  Main   ; Starting address after a RESET
Homework 2.7 Hand assemble the following program. Include the symbol table, the address and machine code in hexadecimal for each instruction.
DDRP  equ  $025A       ; Port P Data Direction Register
PTP   equ  $0258       ; Port P I/O Register
      org  $0800       ; Variables go in RAM
Data  rmb  1
      org  $4000       ; Object code goes in EEPROM
Main  movb #$00,DDRP   ; Port P is input
loop  ldaa PTP         ; Read inputs
      staa Data        ; Save in variable
      bra  loop        ; Repeat
      org  $FFFE
      fdb  Main        ; Starting address after a RESET
Homework 2.8 Hand assemble the following program. Include the symbol table, the address and machine code in hexadecimal for each instruction.
      org  $0800       ; Variables go in RAM
Data  rmb  1
      org  $4000       ; Object code goes in EEPROM
Main  lds  #$4000      ; Initialize stack
      movb #10,Data    ; Data=10
loop  bsr  Add1
      bra  loop        ; Repeat
Add1  ldaa Data
      inca             ; Add one
      staa Data
      rts
      org  $FFFE
      fdb  Main        ; Starting address
Homework 2.9 During an 8-bit memory read bus cycle to address $3800, what memory locations are modified? During an 8-bit memory write bus cycle to address $3800, what memory locations are modified?
Homework 2.10 Consider this assembly instruction
Here  bsr  Lookup      ;call Lookup function
For each of the addresses listed below, give the machine code for the instruction and the value pushed on the stack when the instruction is executed. If it is not possible to assemble this instruction, state "not possible".
Here    Lookup   machine code   value pushed
$4040   $4060
$5050   $5020
$5050   $4060
Homework 2.11 Consider this assembly instruction
Here  jsr  Lookup      ;call Lookup function
For each of the addresses listed below, give the machine code for the instruction and the value pushed on the stack when the instruction is executed. If it is not possible to assemble this instruction, state "not possible".
Here    Lookup   machine code   value pushed
$4040   $4060
$5050   $5020
$5050   $4060
Homework 2.12 Assume RegX is $3800, RegD is $4647, the PC is $4123, and RAM locations $3800 to $38FF are initially $00, $01, . . . $FF respectively. E.g., location $3856 contains $56. Show the simplified bus cycles occurring when the ldd 2,x instruction is executed. Specify which registers get modified during each cycle, and the corresponding new values. Do not worry about changes to the CCR. Just show the one instruction.
$4123 EC02  ldd 2,x
Homework 2.13 Assume PC is $4120, and the SP is initially $3FF4. Show the simplified bus cycles occurring when the bsr instruction is executed. Specify which registers get modified during each cycle, and the corresponding new values. Do not worry about changes to the CCR. Just show the one instruction.
$4120 07F0  bsr MyFunction
Homework 2.14 What does the effective address register contain?
Homework 2.15 What is the purpose of the following registers: CCR, SP, PC, IR, EAR?
Homework 2.16 Show the simplified bus cycles generated by the execution of the following program. The first step is to find the object code for the three instructions, and the second step is to break each instruction into the individual bus cycles required to execute it.
      org  $F000
      ldaa #44
      ldy  #$0010
      staa 4,y
Homework 2.17 Show the simplified bus cycles generated by the execution of the following program. The first step is to find the object code for the three instructions, and the second step is to break each instruction into the individual bus cycles required to execute it.
      org  $F000
      ldab #$55
      ldx  #$0020
      stab 5,x
Homework 2.18 The following data is stored in sequential memory locations. Determine the sequence of memory instructions this data represents.
      org  $F000
      fcb  $86,$55,$CE,$02,$50,$F6,$F0,$00,$5A,$01,$6B,$08,$20,$FA
Homework 2.19 The following data is stored in sequential memory locations. Determine the sequence of memory instructions this data represents. Each value is in hexadecimal.
      org  $4000
      fcb  $87,$CE,$02,$40,$F6,$40,$01,$5A,$08,$54,$6B,$02,$20,$FB
Homework 2.20 Write an assembly language subroutine that initializes Port J bits 5, 4, 1, 0 to outputs and bits 7, 6, 3, 2 to inputs. Make all Port H bits inputs, and all Port T bits outputs.
Homework 2.21 Write an assembly language subroutine that initializes Port T bits 7, 4, 3, 0 to outputs and bits 6, 5, 2, 1 to inputs. Make all Port M bits outputs.
Homework 2.22 Write assembly language software that initializes Port T bit 3 to an output. All other bits are inputs.
Homework 2.23 Write assembly language software that initializes Port H bit 1 to an input. All other bits are outputs.
Homework 2.24 Write assembly software that makes Port T bits 1, 3, 5, and 7 outputs and the rest inputs.
Homework 2.25 Interface an LED that requires 1 mA at 2.5 V. A digital output high on PT0 turns on the LED.
Homework 2.26 Interface an LED that requires 2 mA at 2.0 V. A digital output low on PT1 turns on the LED.
Homework 2.27 Interface an LED that requires 15 mA at 2.5 V. Use a 7405 driver and a current limiting resistor. A digital output high on PT2 turns on the LED. The 7405 output voltage VOL is 0.5 V.
Homework 2.28 Interface an LED that requires 30 mA at 1.5 V. Use a 7406 driver and a current limiting resistor. A digital output high on PT3 turns on the LED. The 7406 output voltage VOL is 0.5 V.
2.12 Laboratory Assignments
For each lab in this chapter, you will have two binary switch inputs and one LED output. The LED represents the output, and the operator will toggle the switches in order to set the inputs. Let T be the Boolean variable representing the output (0 means LED is off and output is zero, 1 means LED is on and the output is 1). Let H and J be Boolean variables representing the state of the two switches (0 means the switch is not pressed, and 1 means the switch is pressed).
Use the TExaS simulator to create three files. Lab2.rtf will contain the assembly source code. Lab2.uc will contain the microcomputer configuration. Lab2.io will define the external connections, which should be the two switches and one LED. Use the Mode-Processor command to select the desired processor. You should connect switches to PH0 (means Port H bit 0) and to PJ0 (means Port J bit 0). You should connect an LED to PT0 (means Port T bit 0). The switches should be labeled H and J, and the LED should be labeled T. When the H switch is in the “off” or open position, the signal at PH0 will be 0 V, which is a logic “0”. For this situation, your software will consider H to be false. When the H switch is in the “on” or closed position, the signal at PH0 will be 5 V, which is a logic “1”. In this case, your software will consider H to be true. The J switch, which is connected to PJ0, will operate in a similar fashion. When your software writes a “1” to PT0, the LED will turn on.
You will write assembly code that inputs from PH0 and PJ0, and outputs to PT0. A template structure for your assembly program is shown as Program 2.2. To solve this lab you will need the ldaa, staa, anda, coma, and bra instructions. You can use the movb instruction if you wish. You can copy and paste the address definitions for ports H, J, and T from the port12.rtf file. In particular, you will need to define DDRH, DDRJ, DDRT, PTH, PTJ, and PTT.
The opening comments include: file name, overall objectives, hardware connections, specific functions, author name, and date. The equ pseudo-op is used to define port addresses. Global variables are declared in RAM, and the main program is placed in EEPROM. The 16-bit contents at $FFFE and $FFFF (the reset vector) define where the computer will begin execution after a reset.
Lab 2.1 The specific device you will create is a digital NAND with two binary switch inputs and one LED output. The specific function you will implement is
T = ~(H&J)
This means the output will be zero if and only if both the H switch and the J switch are pressed. Program L2.1 describes the software algorithm in C. Notice that this algorithm affects all bits in a port, although only one bit is used. In general, this will be unacceptable, and a better solution would have been to write code that affects only the bits necessary.
Program L2.1 The C program to illustrate Lab 2.1.
void main(void){
  DDRH = 0x00;         // make Port H an input, PH0 is H
  DDRJ = 0x00;         // make Port J an input, PJ0 is J
  DDRT = 0xFF;         // make Port T an output, PT0 is T
  while(1){
    PTT = ~(PTJ&PTH);  // LED off iff PJ0=1 and PH0=1
  }
}
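The remark about affecting only the necessary bits can be illustrated with a mask. The following is one possible sketch, not the book's solution: Nand_NextPTT is a hypothetical helper that takes the current port values as ordinary arguments, so it can be tested on any computer, and returns the value to write back to Port T with only bit 0 changed.

```c
#include <stdint.h>

/* Hypothetical helper (not from the text): next Port T value for the NAND lab.
   ptt, pth, ptj are the current values of Port T, Port H and Port J. */
uint8_t Nand_NextPTT(uint8_t ptt, uint8_t pth, uint8_t ptj){
  uint8_t t = (uint8_t)(~(pth & ptj) & 0x01);  /* T = NOT(H AND J), bit 0 only */
  return (uint8_t)((ptt & 0xFE) | t);          /* keep bits 7-1, replace bit 0 */
}
```

On the 9S12, the loop body might then read PTT = Nand_NextPTT(PTT, PTH, PTJ); assuming the same port definitions as Program L2.1. Only PT0 would need to be an output in that case.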
Lab 2.2 The specific device you will create is a digital NOR with two binary switch inputs and one LED output. The specific function you will implement is
T = (~H)&(~J)
This means the output will be one if and only if both the H switch and the J switch are not pressed. Program L2.2 describes the software algorithm in C. Notice that this algorithm affects all bits in a port, although only one bit is used. In general, this will be unacceptable, and a better solution would have been to write code that affects only the bits necessary.
Program L2.2 The C program to illustrate Lab 2.2.
void main(void){
  DDRH = 0x00;             // make Port H an input, PH0 is H
  DDRJ = 0x00;             // make Port J an input, PJ0 is J
  DDRT = 0xFF;             // make Port T an output, PT0 is T
  while(1){
    PTT = (~PTJ)&(~PTH);   // LED on iff PJ0=0 and PH0=0
  }
}
Lab 2.3 The specific device you will create is a digital lock with two binary switch inputs and one LED output. The LED output represents the lock, and the operator will toggle the switches in order to unlock the door. The specific function you will implement is
T = H&(~J)
This means the LED will be on if and only if the H switch is pressed and the J switch is not pressed. Program L2.3 describes the software algorithm in C. Notice that this algorithm affects all bits in a port, although only one bit is used. In general, this will be unacceptable, and a better solution would have been to write code that affects only the bits necessary.
Program L2.3 The C program to illustrate Lab 2.3.
void main(void){
  DDRH = 0x00;          // make Port H an input, PH0 is H
  DDRJ = 0x00;          // make Port J an input, PJ0 is J
  DDRT = 0xFF;          // make Port T an output, PT0 is T
  while(1){
    PTT = (~PTJ)&PTH;   // LED on iff PJ0=0 and PH0=1
  }
}
3 Representation and Manipulation of Information
Chapter 3 objectives are:
- Introduce the concept of how numbers are stored on the computer
- Discuss how characters are represented
- Define terms like precision and basis
- Review arithmetic and logic operations
- Explain the usage of condition code bits
- Develop mechanisms to convert between character strings and binary numbers
Numbers, like all information, are stored on the computer in binary form. On most computers, the memory is organized into 8-bit bytes. This means each 8-bit byte stored in memory will have a separate address. In this chapter we will learn about unsigned numbers, signed numbers, characters, and how to perform basic logical and arithmetic calculations. In order to develop reliable systems it is important to understand how the computer can make mistakes during calculations. With this knowledge, we can write software that detects when an error occurs, or better yet, we can write software that does not make mistakes.
3.1 Precision
Precision is the number of distinct or different values. We express precision in alternatives, decimal digits, bytes, or binary bits. Alternatives are defined as the total number of possibilities. For example, an 8-bit number format can represent 256 different numbers. An 8-bit digital to analog converter (DAC) can generate 256 different analog outputs. An 8-bit analog to digital converter (ADC) can measure 256 different analog inputs. Table 3.1 illustrates the relationship between precision in binary bits and precision in alternatives.
Table 3.1 Relationship between bits, bytes and alternatives as units of precision.
The operation [[x]] is defined as the smallest integer greater than or equal to x (the ceiling of x). E.g., [[2.1]], [[2.9]], and [[3.0]] are all equal to 3. The Bytes column in Table 3.1 specifies how many bytes of memory it would take to store a number with that precision, assuming the data were not packed or compressed in any way.
Checkpoint 3.1: How many bytes of memory would it take to store a 50-bit number?
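The Bytes column follows from rounding the bit count divided by 8 up to the next whole number. A minimal C sketch, using integer arithmetic only, is shown here; the function name BytesForBits is an assumption introduced for illustration.

```c
#include <stdint.h>

/* Bytes needed to store an unpacked number of the given width in bits:
   bits/8 rounded up, computed without floating point. */
uint32_t BytesForBits(uint32_t bits){
  return (bits + 7u) / 8u;
}
```

Adding 7 before the integer division is a common way to round up, because any remainder of 1 to 7 bits then carries into one more byte.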
Decimal digits are used to specify the precision of measurement systems that display results as numerical values, as defined in Table 3.2. A full decimal digit can be any value 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9. A digit that can be either 0 or 1 is defined as a ½ decimal digit. The terminology of a ½ decimal digit did not arise from a mathematical perspective of precision, but rather it arose from the physical width of the LED/LCD module used to display a blank or '1' as compared to the width of a full digit. Notice in Figure 3.1 that the 7-segment modules capable of displaying 0 to 9 are about 1 cm wide; however, the corresponding 2-segment modules capable of being blank or displaying a 1 are about half as wide. Similarly, we define a digit that can be + or − also as a half decimal digit, because it has two choices. A digit that can be 0, 1, 2, 3 is defined as a ¾ decimal digit, because it is wider than a ½ digit but narrower than a full digit. We also define a digit that can be −1, −0, +0, or +1 as a ¾ decimal digit, because it also has 4 choices. We use the expression 4½ decimal digits to mean 20,000 alternatives and the expression 4¾ decimal digits to mean 40,000 alternatives. The use of a ½ decimal digit to mean twice the number of alternatives or one additional binary bit is widely accepted. On the other hand, the use of a ¾ decimal digit to mean four times the number of alternatives or two additional binary bits is not as commonly accepted. For example, consider the two ohmmeters shown in Figure 3.1. As illustrated in the figure, both are set to the 0 to 200 kΩ range. The 3½ digit ohmmeter has a resolution of 0.1 kΩ with measurements ranging from 0.0 to 199.9 kΩ. On the other hand, the 4½ digit ohmmeter has a resolution of 0.01 kΩ with measurements ranging from 0.00 to 199.99 kΩ.
Table 3.2 Definition of decimal digits as a unit of precision.
Decimal digits   Alternatives
3                1000
3½               2000
3¾               4000
4                10000
4½               20000
4¾               40000
n                10^n
n½               2•10^n
n¾               4•10^n
Observation: A good rule of thumb to remember is 2^(10•n) ≈ 10^(3•n).
Figure 3.1 Two ohmmeters: the one on the left has 3½ decimal digits and the one on the right has 4½.
Checkpoint 3.2: How many binary bits is equivalent to 3½ decimal digits?
Checkpoint 3.3: About how many decimal digits is 64 binary bits? You can answer this without a calculator, just using the "rule of thumb".
A great deal of confusion exists over the abbreviations we use for large numbers. In 1998 the International Electrotechnical Commission (IEC) defined a new set of abbreviations for the powers of 2, as shown in Table 3.3. These new terms are endorsed by the Institute of Electrical and Electronics Engineers (IEEE) and the International Committee for Weights and Measures (CIPM) in situations where the use of a binary prefix is appropriate. The confusion arises over the fact that the mainstream computer industry, such as Microsoft, Apple, and Dell, continues to use the old terminology. According to the companies that market to consumers, 1 GHz is 1,000,000,000 Hz but 1 Gbyte of memory is 1,073,741,824 bytes. The correct terminology is to use the SI-decimal abbreviations to represent powers of 10, and the IEC-binary abbreviations to represent powers of 2. The scientific meaning of 2 kilovolts is 2000 volts, but 2 kibibytes is the proper way to specify 2048 bytes. The term kibibyte is a contraction of kilo binary byte and is a unit of information or computer storage, abbreviated KiB.
1 KiB = 2^10 bytes = 1024 bytes
1 MiB = 2^20 bytes = 1,048,576 bytes
1 GiB = 2^30 bytes = 1,073,741,824 bytes
These abbreviations can also be used to specify the number of binary bits. The term kibibit is a contraction of kilo binary bit, and is a unit of information or computer storage, abbreviated Kibit.
1 Kibit = 2^10 bits = 1024 bits
1 Mibit = 2^20 bits = 1,048,576 bits
1 Gibit = 2^30 bits = 1,073,741,824 bits
A mebibyte (1 MiB is 1,048,576 bytes) is approximately equal to a megabyte (1 MB is 1,000,000 bytes), but mistaking the two has nonetheless led to confusion and even legal disputes. In the engineering community, it is appropriate to use terms that have a clear and unambiguous meaning.
Checkpoint 3.4: A 2 tebibyte storage system can store how many bytes?
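The difference between the two families of prefixes is easy to compute. The sketch below assumes a compiler with 64-bit integers; the function names IecPrefix and SiPrefix are hypothetical and introduced only for this illustration.

```c
#include <stdint.h>

/* IEC binary prefix: 1024^power (power 1 = Ki, 2 = Mi, 3 = Gi, ...).
   Valid for power 0 to 6 in a 64-bit result. */
uint64_t IecPrefix(unsigned power){
  return (uint64_t)1 << (10u * power);
}

/* SI decimal prefix: 1000^power (power 1 = k, 2 = M, 3 = G, ...).
   Valid for power 0 to 6 in a 64-bit result. */
uint64_t SiPrefix(unsigned power){
  uint64_t value = 1;
  while(power--){
    value *= 1000u;
  }
  return value;
}
```

For power 3, the two functions return 1,073,741,824 and 1,000,000,000 respectively, which is the Gbyte versus GHz discrepancy described in the text.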
Table 3.3 Common abbreviations for large numbers.
Value    SI   Decimal   Value    IEC   Binary
1000^1   k    kilo-     1024^1   Ki    kibi-
1000^2   M    mega-     1024^2   Mi    mebi-
1000^3   G    giga-     1024^3   Gi    gibi-
1000^4   T    tera-     1024^4   Ti    tebi-
1000^5   P    peta-     1024^5   Pi    pebi-
1000^6   E    exa-      1024^6   Ei    exbi-
1000^7   Z    zetta-    1024^7   Zi    zebi-
1000^8   Y    yotta-    1024^8   Yi    yobi-

3.2 Boolean Information
A Boolean number has two states. The two values represent logical true and false. In Chapter 1, we defined positive logic so that true is a 1 or high, and false is a 0 or low. In C programming, a false is represented by a zero, and a true as any non-zero value. If you were controlling a motor, light, heater, or air conditioner, the Boolean could mean on or off. Figure 3.2 shows the simulation using TExaS of a simple switch connected to PC0 that has two states and an LED that can be on or off. PB0 is a digital output of the microcomputer, which can be either high or low. The output of the LED driver is low or HiZ (shown as z in Figure 3.2). In communication systems, we represent the information as a sequence of Booleans: mark or space. For black or white graphic displays we use Booleans to specify the state of each pixel. The most efficient storage of Booleans on a computer is to map each Boolean into one memory bit. In this way, we can pack eight Booleans into each byte. If we have just one Boolean to store in memory, out of convenience we allocate an entire byte for it. A common positive logic definition for Boolean information is: False is defined as all zeros, and True is defined as any nonzero value.
Figure 3.2 External to the microcomputer, Boolean information is encoded as voltage (0 or 5 V), position of a switch (off, on), and the presence of light (dark, light).
Checkpoint 3.5: Give an example of a switch that is not binary.
In negative logic, the absence of a voltage is the true or asserted state. The presence of a voltage is called the false or not asserted state. In other words, the 0 V or low voltage means true, and the 5 V or high voltage means false. RS232 serial communication uses a negative logic encoding where −12 V means true, and +12 V means false. More about serial interfacing can be found in Chapters 8 and 12.
3.3 8-bit Numbers
We saw 8-bit and 16-bit numbers in Chapter 2, but more formal definitions will be presented in the next few sections. A byte contains 8 bits as shown in Figure 3.3, where each bit b7, . . . , b0 is binary and has the value 1 or 0. We specify b7 as the most significant bit or MSB, and b0 as the least significant bit or LSB.
Figure 3.3 8-bit binary format: b7 b6 b5 b4 b3 b2 b1 b0
If a byte is used to represent an unsigned number, then the value of the number is
N = 128•b7 + 64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0
Notice that the significance of bit n is 2^n. There are 256 different unsigned 8-bit numbers. The smallest unsigned 8-bit number is 0 and the largest is 255. For example, %00001010 is 8+2 or 10. Other examples are shown in Table 3.4. The least significant bit can tell us if the number is even or odd.
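The weighted sum for an unsigned 8-bit number can be written directly in C. This is a sketch for experimentation; the names ValueFromBits and IsOdd, and the convention that array element b[i] holds bit i, are assumptions introduced here.

```c
#include <stdint.h>

/* Value of an unsigned 8-bit number from its individual bits.
   b[i] holds bit i, so b[7] is the MSB and b[0] is the LSB. */
uint8_t ValueFromBits(const uint8_t b[8]){
  uint8_t n = 0;
  for(int i = 7; i >= 0; i--){
    n = (uint8_t)(n + ((b[i] & 1u) << i));  /* bit i has significance 2^i */
  }
  return n;
}

/* The least significant bit is 1 for odd numbers and 0 for even numbers. */
int IsOdd(uint8_t n){
  return (int)(n & 1u);
}
```

With b[3] and b[1] set and all other bits clear, ValueFromBits returns 8+2, matching the %00001010 example.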
Table 3.4 Example conversions from unsigned 8-bit binary to hexadecimal and to decimal.
Binary      Hex   Calculation             Decimal
%00000000   $00                           0
%01000001   $41   64+1                    65
%00010110   $16   16+4+2                  22
%10000111   $87   128+4+2+1               135
%11111111   $FF   128+64+32+16+8+4+2+1    255
Checkpoint 3.6: Convert the binary number %01101010 to unsigned decimal. Checkpoint 3.7: Convert the hex number $45 to unsigned decimal.
The basis of a number system is a subset from which linear combinations of the basis elements can be used to construct the entire set. The basis represents the “places” in a “place-value” system. For positive integers, the basis is the infinite set {1, 10, 100, . . .}, and the “values” can range from 0 to 9. Each positive integer has a unique set of values such that the dot-product of the value vector times the basis vector yields that number. For example, 2345 is ( . . . , 2,3,4,5)•(. . . , 1000,100,10,1), which is 2*1000+3*100+4*10+5. For the unsigned 8-bit number system, the basis is
{1, 2, 4, 8, 16, 32, 64, 128}
The values of a binary number system can only be 0 or 1. Even so, each 8-bit unsigned integer has a unique set of values such that the dot-product of the values times the basis yields that number. For example, 69 is (0,1,0,0,0,1,0,1)•(128,64,32,16,8,4,2,1), which equals 0*128+1*64+0*32+0*16+0*8+1*4+0*2+1*1. Conveniently, there is no other set of 0’s and 1’s such that the set of values multiplied by the basis is 69.
One way for us to convert a decimal number into binary is to use the basis elements. The overall approach is to start with the largest basis element and work towards the smallest. More precisely, we start with the most significant bit and work towards the least significant bit. One by one, we ask ourselves whether or not we need that basis element to create our number. If we do, then we set the corresponding bit in our binary result and subtract the basis element from our number. If we do not need it, then we clear the corresponding bit in our binary result. We will work through the algorithm with the example of converting 100 to 8-bit binary, see Table 3.5. We start with the largest basis element (in this case 128) and ask whether or not we need to include it to make 100. Since our number is less than 128, we do not need it, so bit 7 is zero. We go to the next largest basis element, 64, and ask, “do we need it?” We do need 64 to generate our 100, so bit 6 is one and we subtract 100 minus 64 to get 36. Next, we go to the next basis element, 32, and ask, “do we need it?” Again, we do need 32 to generate our 36, so bit 5 is one and we subtract 36 minus 32 to get 4. Continuing along, we do not need basis elements 16 or 8, but we do need basis element 4. Once we subtract the 4, our working result is zero, so basis elements 2 and 1 are not needed. Putting it together, we get %01100100 (which means 64+32+4).
Table 3.5 Example conversion from decimal to unsigned 8-bit binary to hexadecimal.
Number  Basis  Need It?  Bit       Operation
100     128    no        bit 7=0   none
100     64     yes       bit 6=1   subtract 100-64
36      32     yes       bit 5=1   subtract 36-32
4       16     no        bit 4=0   none
4       8      no        bit 3=0   none
4       4      yes       bit 2=1   subtract 4-4
0       2      no        bit 1=0   none
0       1      no        bit 0=0   none
Checkpoint 3.8: In this conversion algorithm, how can we tell if a basis element is needed?
Observation: If the least significant binary bit is zero, then the number is even.
Observation: If the right-most n bits (least significant) are zero, then the number is divisible by 2^n.
Observation: Bit 7 of an 8-bit number determines whether its value is greater than or equal to 128.
Checkpoint 3.9: Give the representations of the decimal 45 in 8-bit binary and hexadecimal.
Checkpoint 3.10: Give the representations of the decimal 200 in 8-bit binary and hexadecimal.
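The basis-element conversion worked through for the number 100 can be sketched in C as follows. The function name basis_to_binary is an assumption made for illustration, and the input is assumed to be in the range 0 to 255.

```c
#include <stdint.h>

/* Sketch of the basis algorithm: test 128, 64, ..., 1 in turn.
   Each basis element 2^k is also the bit mask for bit k. */
uint8_t basis_to_binary(unsigned number) {
  uint8_t result = 0;
  for (unsigned basis = 128; basis != 0; basis /= 2) {
    if (number >= basis) {        /* this basis element is needed */
      result |= (uint8_t)basis;   /* set the corresponding bit */
      number -= basis;            /* subtract it from the working number */
    }                             /* otherwise the bit stays clear */
  }
  return result;
}
```

Calling basis_to_binary(100) follows the same steps as the worked example and returns $64, which is %01100100.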
One of the first schemes to represent signed numbers was called one’s complement. It was called one’s complement because to negate a number, we complement (logical not) each bit. For example, if 25 equals 00011001 in binary, then -25 is 11100110. An 8-bit one’s complement number can vary from -127 to +127. The most significant bit is a sign bit, which is 1 if and only if the number is negative. The difficulty with this format is that there are two zeros: +0 is 00000000, and -0 is 11111111. Another problem is that one’s complement numbers do not have basis elements. These limitations led to the use of two’s complement.
The two’s complement number system is the most common approach used to define signed numbers. It is called two’s complement because to negate a number, we complement each bit (like one’s complement), then add 1. For example, if 25 equals 00011001 in binary, then -25 is 11100111. If a byte is used to represent a signed two’s complement number, then the value of the number is
N = -128•b7 + 64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0
Observation: One usually means two’s complement when one refers to signed integers.
There are 256 different signed 8-bit numbers. The smallest signed 8-bit number is -128 and the largest is 127. For example, %10000010 equals -128+2 or -126. Other examples are shown in Table 3.6.
Checkpoint 3.11: Convert the signed binary number %11101010 to signed decimal.
Checkpoint 3.12: Are the signed and unsigned decimal representations of the 8-bit hex number $45 the same or different?
For the signed 8-bit number system the basis is
{1, 2, 4, 8, 16, 32, 64, -128}
Observation: The most significant bit in a two’s complement signed number will specify the sign.
Table 3.6 Example conversions from signed 8-bit binary to hexadecimal and to decimal.
Binary      Hex   Calculation                 Decimal
%00000000   $00                               0
%01000001   $41   64+1                        65
%00010110   $16   16+4+2                      22
%10000111   $87   -128+4+2+1                  -121
%11111111   $FF   -128+64+32+16+8+4+2+1       -1
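The signed value formula and the two’s complement negate can be checked with a short C sketch. The function names signed_value and negate8 are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

/* Value of an 8-bit pattern using the signed basis {-128, 64, ..., 2, 1}. */
int signed_value(uint8_t pattern) {
  int n = pattern & 0x7F;   /* 64*b6 + 32*b5 + ... + 2*b1 + b0 */
  if (pattern & 0x80) {
    n -= 128;               /* b7 contributes the basis element -128 */
  }
  return n;
}

/* Two's complement negate: complement every bit, then add 1. */
uint8_t negate8(uint8_t pattern) {
  return (uint8_t)(~pattern + 1);
}
```

For instance, signed_value(0x87) evaluates -128+4+2+1, and negate8(0x19) turns the pattern for 25 into the pattern for -25.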
Notice that the same binary pattern of %11111111 could represent either 255 or -1. It is very important for the software developer to keep track of the number format. The computer cannot determine whether the 8-bit number is signed or unsigned. You, as the programmer, will determine whether the number is signed or unsigned by the specific assembly instructions you select to operate on the number. Some operations like addition, subtraction, and shift left (multiply by 2) use the same hardware (instructions) for both unsigned and signed operations. On the other hand, multiply, divide, and shift right (divide by 2) require separate hardware (instructions) for unsigned and signed operations. For example, the multiply instruction, mul, operates on unsigned values. Software that employs the mul instruction implements unsigned arithmetic. There is also a signed multiply instruction, smul, and if you use it, you are implementing signed arithmetic.
Similar to the unsigned algorithm, we can use the basis to convert a decimal number into signed binary. We will work through the algorithm with the example of converting -100 to 8-bit binary, as shown in Table 3.7. We start with the most significant bit (in this case -128) and decide whether we need to include it to make -100. Yes (without -128, we would be unable to add the other basis elements together to get any negative result), so we set bit 7 and subtract the basis element from our value. Our new value equals -100 minus -128, which is 28. We go to the next largest basis element, 64, and ask, “do we need it?” We do not need 64 to generate our 28, so bit 6 is zero. Next we go to the next basis element, 32, and ask, “do we need it?” We do not need 32 to generate our 28, so bit 5 is zero. Now we need the basis element 16, so we set bit 4, and subtract 16 from our number 28 (28-16=12). Continuing along, we need basis elements 8 and 4 but not 2 or 1. Putting it together we get %10011100 (which means -128+16+8+4).
Table 3.7 Example conversion from decimal to signed 8-bit binary.
Number  Basis  Need It  Bit       Operation
-100    -128   Yes      bit 7=1   subtract -100-(-128)
28      64     No       bit 6=0   none
28      32     No       bit 5=0   none
28      16     Yes      bit 4=1   subtract 28-16
12      8      Yes      bit 3=1   subtract 12-8
4       4      Yes      bit 2=1   subtract 4-4
0       2      No       bit 1=0   none
0       1      No       bit 0=0   none
Observation: To take the negative of a two’s complement signed number we first complement (flip) all the bits, then add 1.
A second way to convert negative numbers into binary is to first convert their magnitude into unsigned binary, then do a two’s complement negate. For example, we earlier found that +100 is %01100100. The two’s complement negate is a two step process. First we do a logic complement (flip all bits) to get %10011011. Then add one to the result to get %10011100.
A third way to convert negative numbers into binary is to first add 256 to the number, then convert the unsigned result to binary using the unsigned method. For example, to find -100, we add 256 plus -100 to get 156. Then we convert 156 to binary resulting in %10011100. This method works because in 8-bit binary math adding 256 to a number does not change the value. E.g., 256-100 has the same 8-bit binary value as -100.
Checkpoint 3.13: Give the representations of -45 in 8-bit binary and hexadecimal.
Checkpoint 3.14: Why can’t you represent the number 200 using 8-bit signed binary?
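The three ways of converting a negative number described above can be compared side by side in C. This is a sketch under the assumption that the input fits in the signed 8-bit range; all three function names are hypothetical.

```c
#include <stdint.h>

/* Method 1: signed basis {-128, 64, 32, ..., 1}; number is -128..127. */
uint8_t signed_basis_to_binary(int number) {
  uint8_t result = 0;
  if (number < 0) {          /* only -128 can produce a negative result */
    result = 0x80;           /* set bit 7 */
    number -= -128;          /* subtract the basis element */
  }
  for (int basis = 64; basis != 0; basis /= 2) {
    if (number >= basis) {
      result |= (uint8_t)basis;
      number -= basis;
    }
  }
  return result;
}

/* Method 2: take the unsigned pattern of the magnitude, flip all bits, add 1. */
uint8_t negate_method(unsigned magnitude) {
  return (uint8_t)(~magnitude + 1u);
}

/* Method 3: add 256, then keep the unsigned 8-bit result. */
uint8_t add256_method(int negative) {
  return (uint8_t)(256 + negative);
}
```

For -100, all three routes give the same pattern, $9C or %10011100.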
Sign-magnitude representation dedicates one bit as the sign, leaving the remaining bits to specify the magnitude of the number. If b7 is 1 then the number is negative, otherwise the number is positive.
N = (-1)^b7 • (64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0)
Unfortunately, there is no basis set for the sign-magnitude number system. For example, %10000010 equals -1•2 or -2. Other examples are shown in Table 3.8.
Table 3.8 Example conversions from sign-magnitude 8-bit binary to hexadecimal and to decimal.
Binary      Hex   Calculation                 Decimal
%00000000   $00                               0
%01000001   $41   64+1                        65
%00010110   $16   16+4+2                      22
%10000111   $87   -1•(4+2+1)                  -7
%11111111   $FF   -1•(64+32+16+8+4+2+1)       -127
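For comparison with two’s complement, a sign-magnitude pattern can be decoded with the small C sketch below. The function name sign_magnitude_value is an assumption for illustration.

```c
#include <stdint.h>

/* Decode an 8-bit sign-magnitude pattern: b7 is the sign, b6..b0 the magnitude. */
int sign_magnitude_value(uint8_t pattern) {
  int magnitude = pattern & 0x7F;                 /* 64*b6 + ... + b0 */
  return (pattern & 0x80) ? -magnitude : magnitude;
}
```

Both $00 and $80 decode to zero here, which is the double representation of 0 that makes this format awkward.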
Another problem with sign-magnitude is that there are two representations of the number 0: “00000000” and “10000000”. But, the biggest advantage of two’s complement signed numbers over sign-magnitude is that the same addition and subtraction hardware (e.g., the adda, suba instructions) can be used for both signed and unsigned numbers. We also can use the same hardware for shift left (e.g., asla is the same instruction as lsla). Although the hardware for these three operations works for both signed and unsigned numbers, the overflow (error) conditions are distinct. The C bit in the condition code register (CCR) signifies unsigned overflow, and the V bit in the CCR means a signed overflow has occurred. Unfortunately, we must use separate signed and unsigned operations for multiply, divide, and shift right. Common Error: An error will occur if you use signed operations on unsigned numbers, or use unsigned operations on signed numbers. Maintenance Tip: To improve the clarity of our software, always specify the format of your data (signed versus unsigned) when defining or accessing the data.
When communicating with humans (input or output), computers need to store information in an easy-to-read decimal format. One such format is binary coded decimal or BCD. The 8-bit BCD format contains two decimal digits, and each decimal digit is encoded in 4-bit binary. For example, the number 72 is stored as $72 or %01110010. We can represent numbers from 0 to 99 using 8-bit BCD. Checkpoint 3.15: What binary values are used to store the number 25 in 8-bit BCD format?
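The 8-bit BCD format described above can be sketched in C. The function names to_bcd and from_bcd are hypothetical, and the input to to_bcd is assumed to be in the range 0 to 99.

```c
#include <stdint.h>

/* Pack a number 0..99 into 8-bit BCD: tens digit in the high nibble. */
uint8_t to_bcd(unsigned n) {
  return (uint8_t)(((n / 10) << 4) | (n % 10));
}

/* Unpack an 8-bit BCD byte back into an ordinary binary number. */
unsigned from_bcd(uint8_t bcd) {
  return (bcd >> 4) * 10u + (bcd & 0x0Fu);
}
```

The number 72 packs to $72, matching the example in the text, and unpacking $72 gives back 72.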
3.4 16-bit Numbers
A word or double byte contains 16 bits, where each bit b15, . . . , b0 is binary and has the value 1 or 0, as shown in Figure 3.4.
Figure 3.4 16-bit binary format: b15 b14 b13 b12 b11 b10 b9 b8 b7 b6 b5 b4 b3 b2 b1 b0.
If a word is used to represent an unsigned number, then the value of the number is
N = 32768•b15 + 16384•b14 + 8192•b13 + 4096•b12 + 2048•b11 + 1024•b10 + 512•b9 + 256•b8 + 128•b7 + 64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0
There are 65536 different unsigned 16-bit numbers. The smallest unsigned 16-bit number is 0 and the largest is 65535. For example, %0010000110000100 or $2184 is 8192+256+128+4 or 8580. Other examples are shown in Table 3.9.
Table 3.9 Example conversions from unsigned 16-bit binary to hexadecimal and to decimal.
Checkpoint 3.16: Convert the 16-bit binary number %0010000001101010 to unsigned decimal. Checkpoint 3.17: Convert the 16-bit hex number $1234 to unsigned decimal.
For the unsigned 16-bit number system the basis is {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768} Checkpoint 3.18: Convert the unsigned decimal number 1234 to 16-bit hexadecimal. Checkpoint 3.19: Convert the unsigned decimal number 10000 to 16-bit binary.
There are also 65536 different signed 16-bit numbers. The smallest two’s complement signed 16-bit number is -32768 and the largest is 32767. For example, %1101000000000100 or $D004 is -32768+16384+4096+4 or -12284. Other examples are shown in Table 3.10.
Table 3.10 Example conversions from signed 16-bit binary to hexadecimal and to decimal.
If a word is used to represent a signed two’s complement number, then the value of the number is
N = -32768•b15 + 16384•b14 + 8192•b13 + 4096•b12 + 2048•b11 + 1024•b10 + 512•b9 + 256•b8 + 128•b7 + 64•b6 + 32•b5 + 16•b4 + 8•b3 + 4•b2 + 2•b1 + b0
Checkpoint 3.20: Convert the 16-bit hex number $1234 to signed decimal.
Checkpoint 3.21: Convert the 16-bit hex number $ABCD to signed decimal.
For the signed 16-bit number system the basis is
{1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, -32768}
Common Error: An error will occur if you use 16-bit operations on 8-bit numbers, or use 8-bit operations on 16-bit numbers.
Maintenance Tip: To improve the clarity of your software, always specify the precision of your data when defining or accessing the data.
Checkpoint 3.22: Convert the signed decimal number 1234 to 16-bit hexadecimal.
Checkpoint 3.23: Convert the signed decimal number -10000 to 16-bit binary.
3.5 Extended Precision Numbers
Consider an unsigned number with n bits, where each bit b(n-1), . . . , b0 is binary and has the value 1 or 0. If an n-bit number is used to represent an unsigned integer, then the value of the number is
\[ N = 2^{n-1} b_{n-1} + 2^{n-2} b_{n-2} + \cdots + 2 b_1 + b_0 = \sum_{i=0}^{n-1} 2^i b_i \]
There are 2^n different unsigned n-bit numbers. The smallest unsigned n-bit number is 0 and the largest is 2^n - 1. For the unsigned n-bit number system, the basis is
{1, 2, 4, . . . , 2^(n-2), 2^(n-1)}
If an n-bit binary number is used to represent a signed two’s complement number, then the value of the number is
\[ N = -2^{n-1} b_{n-1} + 2^{n-2} b_{n-2} + \cdots + 2 b_1 + b_0 = -2^{n-1} b_{n-1} + \sum_{i=0}^{n-2} 2^i b_i \]
There are also 2^n different signed n-bit numbers. The smallest signed n-bit number is -2^(n-1) and the largest is 2^(n-1) - 1. For the signed n-bit number system, the basis is
{1, 2, 4, . . . , 2^(n-2), -2^(n-1)}
Maintenance Tip: When programming in C, we will use data types char, short, and long when we wish to explicitly specify the precision as 8-bit, 16-bit, or 32-bit, whereas we will use the int data type only when we don’t care about precision and we wish the compiler to choose the most efficient way to perform the operation.
Observation: When programming in assembly, we will always explicitly specify the precision of our numbers and calculations.
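The n-bit value formulas above can be evaluated directly in C. The sketch below assumes n is between 1 and 32 and that the bits are held in the low n bits of a 32-bit pattern; the function names are hypothetical.

```c
#include <stdint.h>

/* Unsigned value: sum of 2^i * b_i for i = 0 .. n-1. */
uint32_t unsigned_value_n(uint32_t pattern, unsigned n) {
  uint32_t sum = 0;
  for (unsigned i = 0; i < n; i++) {
    sum += ((pattern >> i) & 1u) << i;
  }
  return sum;
}

/* Signed two's complement value: -2^(n-1) * b_(n-1) plus the lower n-1 bits. */
int64_t signed_value_n(uint32_t pattern, unsigned n) {
  int64_t sum = 0;
  for (unsigned i = 0; i + 1 < n; i++) {
    sum += (int64_t)((pattern >> i) & 1u) << i;
  }
  if ((pattern >> (n - 1)) & 1u) {
    sum -= (int64_t)1 << (n - 1);   /* most significant basis element is negative */
  }
  return sum;
}
```

With n = 16 these reproduce the earlier examples: $2184 is 8580 unsigned, and $D004 is -12284 signed.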
The binary coded decimal or BCD format is convenient for storing data that has just been input or is just about to be output. Each byte of a BCD number contains two decimal digits, and each decimal digit is encoded in four-bit binary. For example, the number 1,234,567 is stored in four bytes as $01234567. If m is the number of bytes, then the numbers from 0 to 100^m - 1 can be stored.
Checkpoint 3.24: What hexadecimal values are used to store the number 3456 in 16-bit BCD format?
3.6 Logical Operations
Software uses logical operations to combine information, to extract information, and to test information. A unary operation produces its result given a single input parameter. For example, negate, increment, and decrement are unary operations.
In discrete digital logic, the complement operation is called a NOT gate, as shown in Figure 3.5. The complement function is defined in Table 3.11. CMOS refers to complementary metal oxide semiconductor. The “HC” in 74HC04 stands for high-speed CMOS. Most microcomputers, including the 9S12, are made with high-speed CMOS logic. As we saw in Chapter 1, CMOS circuits are built with p-type and n-type transistors. There are just a few rules one needs to know for understanding how CMOS transistor-level circuits work. Each transistor acts like a switch between its source and drain pins. In general, current can flow from source to drain across an active p-type transistor, and no current will flow if the switch is open. As a first approximation, we can assume no current flows into or out of the gate. For a p-type transistor, the switch will be closed (transistor active) if its gate is low. A p-type transistor will be off (its switch is open) if its gate is high. The gate on the n-type works in a complementary fashion, hence the name complementary metal oxide semiconductor. For an n-type transistor, the switch will be closed (transistor active) if its gate is high. An n-type transistor will be off (its switch is open) if its gate is low.
Consider the two possibilities for the circuit in Figure 3.5. If A is high (+5 V), then the p-type is off and the n-type is active. The closed switch across the source-drain of the n-type will make the output low (0 V). Conversely, if A is low (0 V), then the p-type is active and the n-type is off. The closed switch across the source-drain of the p-type will make the output high (+5 V). The 9S12 performs the complement in a bit-wise fashion. For example, the calculation r = ~n means each bit is calculated separately, r7 = ~n7, r6 = ~n6, . . . , r0 = ~n0.
Figure 3.5 Logical NOT operation can be implemented with discrete transistors or digital gates.
(Figure 3.5 panels: transistor-level circuit with one p-type and one n-type transistor, the 74HC04 gate symbol, and the state table below.)
A     p-type   n-type   ~A
0 V   active   off      +5V
+5V   off      active   0V
Table 3.11 Logical complement.
A   ~A
0   1
1   0
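The switch rules for the CMOS NOT gate, and the bit-wise complement it leads to, can be modeled in a few lines of C. This is a simplified switch-level sketch, not a circuit simulation; the function names cmos_not and complement8 are hypothetical.

```c
#include <stdint.h>

/* Switch-level model of the inverter: a is 0 (0 V) or 1 (+5 V). */
int cmos_not(int a) {
  int p_active = (a == 0);   /* p-type switch closes when its gate is low */
  int n_active = (a != 0);   /* n-type switch closes when its gate is high */
  if (p_active) {
    return 1;                /* output shorted to +5 V through the p-type */
  }
  if (n_active) {
    return 0;                /* output shorted to ground through the n-type */
  }
  return -1;                 /* not reachable: exactly one switch is closed */
}

/* Bit-wise complement of an 8-bit value, r = ~n. */
uint8_t complement8(uint8_t n) {
  return (uint8_t)~n;
}
```

For 8-bit data the complement is the same as subtracting from $FF, so complement8(0x5A) equals 0xFF - 0x5A.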
A binary operation produces a single result given two inputs. The logical AND (&) operation yields a true result if both input parameters are true. The logical OR (|) operation yields a true result if either input parameter is true. The exclusive OR (^) operation yields a true result if exactly one input parameter is true. The logical operators are summarized in Table 3.12 and shown as digital gates in Figure 3.6.
Table 3.12 Logical operations.
A   B   A&B   A|B   A^B
0   0   0     0     0
0   1   0     1     1
1   0   0     1     1
1   1   1     1     0
Figure 3.6 Logical operations can be implemented with discrete transistors or digital gates. (Panels: AND gate, 74HC08; OR gate, 74HC32; EOR gate, 74HC86; transistor-level circuits for AND and OR using T1 through T6.)
We can understand the operation of the AND gate by observing the behavior of its six transistors. If both A and B are high, both T3 and T4 will be active. Furthermore, if A and B are both high, T1 and T2 will be off. In this case, the signal labeled ~(A&B) will be low because the T3,T4 switch combination will short this signal to ground. If A is low, T1 will be active and T3 off. Similarly, if B is low, T2 will be active and T4 off. Therefore, if either A is low or B is low, the signal labeled ~(A&B) will be high because one or both of the T1,T2 switches will short this signal to +5 V. Transistors T5 and T6 create a logical complement, converting the signal ~(A&B) into the desired result of A&B.
We can understand the operation of the OR gate by observing the behavior of its six transistors. If both A and B are low, both T1 and T2 will be active. Furthermore, if A and B are both low, T3 and T4 will be off. In this case, the signal labeled ~(A|B) will be high because the T1,T2 switch combination will short this signal to +5 V. If A is high, T3 will be active and T1 off. Similarly, if B is high, T4 will be active and T2 off. Therefore, if either A is high or B is high, the signal labeled ~(A|B) will be low because one or both of the T3,T4 switches will short this signal to ground. Transistors T5 and T6 create a logical complement, converting the signal ~(A|B) into the desired result of A|B.
Checkpoint 3.25: Using just the 74HC gates shown in Figures 3.5 and 3.6, design an equals circuit, such that the output is 1 if and only if input A equals input B. There will be two input signals and one output signal.
Most 8-bit logical instructions take two inputs, one from a register and the other from memory. The 9S12 performs these operations in a bit-wise fashion on two 8-bit parameters yielding an 8-bit result. For example, the calculation r = m&n means each bit is calculated separately, r7 = m7&n7, r6 = m6&n6, . . . , r0 = m0&n0. All but the bita and bitb instructions put the result back in the register. The N bit will be set if the result is negative. The Z bit will be set if the result is zero. These logical instructions will clear the V bit and leave the C bit unchanged.
anda #w   ;RegA=RegA&w               Logical and RegA with a constant
anda U    ;RegA=RegA&[U]             Logical and RegA with a memory value
andb #w   ;RegB=RegB&w               Logical and RegB with a constant
andb U    ;RegB=RegB&[U]             Logical and RegB with a memory value
bita #w   ;RegA&w                    Logical and RegA with a constant
bita U    ;RegA&[U]                  Logical and RegA with a memory value
bitb #w   ;RegB&w                    Logical and RegB with a constant
bitb U    ;RegB&[U]                  Logical and RegB with a memory value
coma      ;RegA=$FF-RegA, RegA=~RegA Complement RegA
comb      ;RegB=$FF-RegB, RegB=~RegB Complement RegB
eora #w   ;RegA=RegA ^ w             Exclusive or RegA with a constant
eora U    ;RegA=RegA ^ [U]           Exclusive or RegA with a memory value
eorb #w   ;RegB=RegB ^ w             Exclusive or RegB with a constant
eorb U    ;RegB=RegB ^ [U]           Exclusive or RegB with a memory value
oraa #w   ;RegA=RegA | w             Logical or RegA with a constant
oraa U    ;RegA=RegA | [U]           Logical or RegA with a memory value
orab #w   ;RegB=RegB | w             Logical or RegB with a constant
orab U    ;RegB=RegB | [U]           Logical or RegB with a memory value
Condition code bits are set, where R is the result of the operation.
N: result is negative, N = R7
Z: result is zero, Z = ~R7•~R6•~R5•~R4•~R3•~R2•~R1•~R0
V: signed overflow, V = 0
C: unchanged
Example 3.1 Write software to set bit 4 and clear bits 1 and 0 of an 8-bit variable N.
Solution We use an 8-bit register because we wish to operate on 8-bit data. We “or with 1” to set bits and we “and with 0” to clear bits. This logical function N = $FC&(N|$10) performs the desired effect. Immediate mode addressing is used when operating on fixed constants.
     ldaa N      ;read the value of N
     oraa #$10   ;set bit 4
     anda #$FC   ;clear bits 1 and 0
     staa N      ;store the result back to N
To illustrate how the above program works, let b7 b6 b5 b4 b3 b2 b1 b0 be the values of the original 8 bits of variable N. The ldaa instruction brings these values into Register A. The oraa instruction sets bit 4, the anda instruction clears bits 1,0, and the staa instruction stores the result back to N.
b7 b6 b5 b4 b3 b2 b1 b0   value of N
0  0  0  1  0  0  0  0    $10 constant
b7 b6 b5 1  b3 b2 b1 b0   result of the oraa instruction
1  1  1  1  1  1  0  0    $FC constant
b7 b6 b5 1  b3 b2 0  0    result of the anda instruction
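The logical function of Example 3.1 can also be written in C. The sketch below wraps the expression in a hypothetical function, set4_clear10, so the effect on several starting values can be checked.

```c
#include <stdint.h>

/* N = $FC & (N | $10): "or with 1" sets bit 4, "and with 0" clears bits 1 and 0. */
uint8_t set4_clear10(uint8_t n) {
  return (uint8_t)(0xFC & (n | 0x10));
}
```

Starting from $03, the OR step gives $13 and the AND step gives $10; bits 7, 6, 5, 3, and 2 pass through unchanged for any input.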
Checkpoint 3.26: Write assembly code that implements RegD = RegD&$0F3C.
Checkpoint 3.27: Write assembly code that implements RegX = RegX|$1234.
Checkpoint 3.28: Let N be an 8-bit location. Write assembly code that clears bit 4.
We can use the AND operation to extract, or mask, individual bits from a value.
Example 3.2 Write software that sets a global variable to true if a switch is pressed.
Solution The first step is to interface a switch to an input port of the 9S12. We will use a positive logic interface because we want the digital signal in to be high if and only if the switch is pressed, as shown in Figure 3.7. In particular, PTT bit 0 contains a signal that is high or low depending on the position of the switch. Some switches bounce, which means there will be multiple open/closed cycles when the switch is changed. This simple solution can be used if the switch doesn’t bounce or if the bouncing doesn’t matter. Bit 0 of the Port T direction register should be made zero during the initialization. When the computer reads PTT it gets all 8 bits of the input port. On the other hand, the expression PTT&0x01 will be zero if and only if bit 0 of PTT is zero.
Figure 3.7 Interface of a switch to a microcomputer input. (Labels: +5V, 10 kΩ resistor, signal in connected to 9S12 pin PT0.)
The following C code will set the variable Pressed to true (nonzero) if the switch is pressed.
     Pressed = PTT&0x01;   // true if the switch is pressed
The following 9S12 assembly code uses the anda instruction to perform the same operation.
     ldaa PTT       ;read input Port T
     anda #$01      ;clear all bits except bit 0
     staa Pressed   ;true iff the switch is pressed
To illustrate how the above program works, let a7 a6 a5 a4 a3 a2 a1 a0 be the values of the 8 individual bits in PTT. The ldaa instruction brings these values into Register A. The anda instruction clears all bits except bit 0, and the staa instruction stores the result into the variable called Pressed.
a7 a6 a5 a4 a3 a2 a1 a0   value of PTT
0  0  0  0  0  0  0  1    $01 constant
0  0  0  0  0  0  0  a0   result of the anda instruction
Often we combine many small systems together to make larger systems. Once we debug a small system we would like to have confidence that it will still work when combined with other systems. One difficulty arises when two or more systems share an I/O port (for example system 1 uses PT1, and system 2 uses PT2). Friendly software modifies just the bits that need to be modified, making it easier to combine with other software. Conversely, an unfriendly solution modifies all 8 bits of a register when needing only to modify less than 8 bits.
Example 3.3 Write software that makes PT4 and PT5 outputs and clears both outputs without affecting the other bits of PTT.
Solution This system uses just bits 4 and 5 of PTT, and the other 6 bits are not needed in this problem. If we implement a friendly solution, this system can be combined with other systems that use the other bits of PTT. We begin by setting DDRT bits 4 and 5, so PT4 and PT5 become outputs. Usually, we set the direction register once at the start of our program. Rather than just setting DDRT=0x30 (unfriendly), we perform a read-modify-write so just bits 4 and 5 are affected. The following C code uses the OR operation to set bits 4 and 5 of the register DDRT. The other six bits of DDRT remain constant.
     DDRT |= 0x30;   // set bits 4 and 5, making PT4 and PT5 outputs
The following 9S12 assembly code uses the oraa instruction to perform the same operation.
     ldaa DDRT    ;read previous value of DDRT
     oraa #$30    ;set bits 4 and 5, other 6 bits left unchanged
     staa DDRT    ;update the actual direction register
To illustrate how the above program works, let c7 c6 c5 c4 c3 c2 c1 c0 be the values of the original 8 bits in DDRT. The ldaa instruction brings these values into Register A. The oraa instruction sets bits 4 and 5, and the staa instruction stores the result back to DDRT.
c7 c6 c5 c4 c3 c2 c1 c0   value of DDRT
0  0  1  1  0  0  0  0    $30 constant
c7 c6 1  1  c3 c2 c1 c0   result of the oraa instruction
We use another read-modify-write to clear bits 4 and 5 of PTT. Notice that ~0x30 is 0xCF. This complement is executed at compile-time rather than at run-time.
     PTT &= ~0x30;   // clear bits 4 and 5, PT4 and PT5 become 0
The following 9S12 assembly code uses the anda instruction to clear the two bits of PTT.
     ldaa PTT     ;read previous value of PTT
     anda #$CF    ;clear bits 4 and 5, other 6 bits left unchanged
     staa PTT     ;update the actual PTT register
Maintenance Tip: When interacting with just some of the bits of an I/O register, it is better to modify just the bits of interest, leaving the other bits unchanged. In this way, the action of one piece of software does not undo the action of another piece.
These read-or-write and read-and-write sequences are extremely useful in manipulating individual bits within direction registers and output ports. So useful in fact that the 9S12 has instructions to perform these logical operations. Notice that these two instructions directly affect memory space without using registers, and that the data size is always 8 bits. These instructions have two addressing modes. The first addressing mode determines the memory location to change. For now this first addressing mode will be direct or extended addressing, but later in Chapter 6, we will see that indexed addressing mode also could be used to specify the memory location. The second addressing mode will always be immediate, specifying which bits to modify. The N bit will be set if the result is negative. The Z bit will be set if the result is zero. These logical instructions will clear the V bit and leave the C bit unchanged.
bclr U,#w   ;[U]=[U]&(~w)   Clear bits in memory
bset U,#w   ;[U]=[U] | w    Set bits in memory
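The effect of bset and bclr can be mimicked in C with two small helpers. This is a sketch only: bit_set and bit_clear are hypothetical names, and an ordinary variable stands in for a memory-mapped register so the code can run on any machine.

```c
#include <stdint.h>

/* Like bset U,#w: [U] = [U] | w, other bits unchanged. */
void bit_set(volatile uint8_t *reg, uint8_t mask) {
  *reg |= mask;
}

/* Like bclr U,#w: [U] = [U] & (~w), other bits unchanged. */
void bit_clear(volatile uint8_t *reg, uint8_t mask) {
  *reg &= (uint8_t)~mask;
}
```

For example, calling bit_set with mask 0x30 on a stand-in for DDRT makes bits 4 and 5 one while the other six bits keep their previous values, which is the friendly behavior described above.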
Condition code bits are set, where R is the result of the operation.
N: result is negative, N = R7
Z: result is zero, Z = ~R7•~R6•~R5•~R4•~R3•~R2•~R1•~R0
V: signed overflow, V = 0
C: unchanged
Example 3.4 Write software that toggles a PT3 output without affecting the other bits of PTT. Toggle means change. I.e., if it is 1, make it 0. If it is 0, make it 1.
Solution The exclusive or operation can be used to toggle bits. The following C code toggles PT3 by inverting bit 3 of PTT, while the other seven bits remain constant. Notice that 0x08 is %00001000 in binary.
     PTT ^= 0x08;   // toggle PT3 from 0 to 1 or from 1 to 0
The following 9S12 assembly code uses the eora instruction to perform the same operation.
     ldaa PTT     ;read output Port T
     eora #$08    ;toggle just bit 3, other 7 bits left unchanged
     staa PTT     ;update the actual output port
To illustrate how the above program works, let b7 b6 b5 b4 b3 b2 b1 b0 be the values of the original 8 bits in PTT. The ldaa instruction brings these values into Register A. The eora instruction toggles bit 3, and the staa instruction stores the result back to PTT.
b7 b6 b5 b4 b3  b2 b1 b0   value of PTT
0  0  0  0  1   0  0  0    $08 constant
b7 b6 b5 b4 ~b3 b2 b1 b0   result of the eora instruction
Example 3.5 Generate two out-of-phase squarewaves as shown in Figure 3.8.
Solution Out of phase means one signal goes high when the other one goes low. During the initialization, we specify PT1 and PT0 as outputs, then establish the initial values as 0 and 1 respectively. We use the exclusive or operation to toggle both bits at the same time. The infinite loop program will repeat the exclusive or operation over and over, creating the out-of-phase squarewaves on Port T bits 1 and 0. The other six bits of Port T remain unchanged.
     DDRT |= 0x03;            // make PT1 PT0 output
     PTT = (PTT&0xFD)|0x01;   // PT1=0, PT0=1
     while(1){
       PTT ^= 0x03;           // toggle bits 1 and 0
     }
The following assembly code uses logical instructions to perform the function in a friendly manner. The period of the squarewave is determined by the speed of the microcomputer. Figure 3.8 shows the simulated waveforms running on a 1 MHz 9S12.
main bset DDRT,#$03  ;make PT1, PT0 outputs, leaving other bits as is
     bclr PTT,#$02   ;make PT1=0
     bset PTT,#$01   ;make PT0=1, leaving other bits as is
loop ldaa PTT        ;read previous value of Port T
     eora #$03       ;toggle bits 1,0
     staa PTT        ;change PT1 PT0, leaving other bits as is
     bra  loop
Figure 3.8 Scope window showing the execution of Example 3.5.
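The toggling in Example 3.5 can be checked off-target with a small C model in which an ordinary byte stands in for Port T. The function names squarewave_init and squarewave_step are hypothetical and exist only for this sketch.

```c
#include <stdint.h>

/* Initial state: PT1=0, PT0=1, other six bits unchanged. */
uint8_t squarewave_init(uint8_t port) {
  return (uint8_t)((port & 0xFD) | 0x01);
}

/* One pass through the loop: toggle bits 1 and 0 together. */
uint8_t squarewave_step(uint8_t port) {
  return (uint8_t)(port ^ 0x03);
}
```

Because both bits start out different and are always toggled together, bit 1 and bit 0 remain complements of each other on every step, which is what out of phase means here.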
Other convenient logical operators are summarized in Table 3.13 and shown as digital gates in Figure 3.9. The NAND operation is defined by an AND followed by a NOT. If you compare the transistor-level circuits in Figures 3.6 and 3.9, it would be more precise to say AND is defined as a NAND followed by a NOT. Similarly, the OR operation is a NOR followed by a NOT. The exclusive NOR operation implements the bit-wise equals operation. Table 3.13 Convenient logical operations.
A   B   NAND   NOR   Exclusive NOR
0   0   1      1     1
0   1   1      0     0
1   0   1      0     0
1   1   0      0     1
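C has no dedicated NAND, NOR, or exclusive NOR operators, but each can be written as a basic operation followed by a complement, mirroring the definitions above. The function names below are hypothetical.

```c
#include <stdint.h>

/* NAND: AND followed by NOT, applied bit-wise to 8-bit values. */
uint8_t nand8(uint8_t a, uint8_t b) {
  return (uint8_t)~(a & b);
}

/* NOR: OR followed by NOT. */
uint8_t nor8(uint8_t a, uint8_t b) {
  return (uint8_t)~(a | b);
}

/* Exclusive NOR: each result bit is 1 where the input bits are equal. */
uint8_t xnor8(uint8_t a, uint8_t b) {
  return (uint8_t)~(a ^ b);
}
```

Using a = %00001100 and b = %00001010 exercises all four rows of the table in bits 3 through 0.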
Figure 3.9 Other logical operations can also be implemented with discrete logic. (Panels: NAND, 74HC00; NOR, 74HC02; exclusive NOR, 74HC7266; open collector NOT, 74HC05, 7405, or 7406; each shown with its transistor-level circuit.)
The output of an open collector gate, drawn with the ‘x’, has two states: low (0 V) and HiZ (floating). TExaS signifies this floating state with a z, as seen in Figure 3.2. Consider the operation of the transistor-level circuit for the 74HC05. If A is high (+5 V), the transistor is active, and the output is low (0 V). If A is low (0 V), the transistor is off, and the output is neither high nor low. In general, we can use an open collector NOT gate to control the current to a device, such as a relay, an LED, a solenoid, a small motor, or a small light. The 74HC05, the 7405, and the 7406 are all open collector NOT gates. The 74HC05 is high-speed CMOS and can only sink up to 4 mA when its output is low. Since the 7405 and 7406 are transistor-transistor-logic (TTL), they can sink more current. In particular, the 7405 has a maximum output low current (IOL) of 16 mA, whereas the 7406 has a maximum IOL of 40 mA.
Example 3.6 The goal is to develop a means for the microcontroller to turn an AC-powered appliance on and off. The interface will use a solid-state relay with control parameters of 2 V and 10 mA. Write the necessary subroutines to operate the system.

Solution The control portion of the solid-state relay (SSR) is an LED, which we interface using an open collector NOT gate just like Figure 2.17. We choose an electronic circuit that has an output current larger than the 10 mA needed by the SSR. Since the maximum IOL of the 7405 is 16 mA, it can sink the 10 mA required by the SSR. The 7406 could also have been used. The resistor is selected to control the current to the diode. Using the LED design equation,

$R = (5 - V_d - V_{OL})/I_d = (5 - 2 - 0.5\ \mathrm{V})/0.01\ \mathrm{A} = 250\ \Omega$

The closest standard value 5% resistor is 240 Ω. A 240 Ω resistor will generate $I_d = (5 - 2 - 0.5\ \mathrm{V})/240\ \Omega \approx 10.4\ \mathrm{mA}$, which will be close enough to activate the relay. When the input to the 7405 is high (p = 5 V), the output is low (q = 0.5 V), see Figure 3.10. In this state, a 10 mA current is applied to the diode, and the relay switch activates. This causes 120 VAC power to be delivered to the appliance. But when the input is low (p = 0), the output floats (q = HiZ, which is neither high nor low). This floating output state causes the LED current to be zero, and the relay switch opens. In this case, no power is delivered to the appliance.

Figure 3.10 Solid-state relay interface using a 7405 open collector driver.
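(Figure 3.10 schematic: 9S12 output PT5 drives the 7405 input p; the 7405 output q connects to the SSR control LED, which is tied through a 240 Ω resistor to +5 V; the SSR switches 120 VAC to the appliance.)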
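The LED design equation used in Example 3.6 can be checked numerically. The following is a minimal sketch, assuming all voltages are in volts, currents in amps, and resistances in ohms; the function names led_resistor_ohms and led_current_amps are hypothetical and exist only for this illustration.

```c
/* Resistor needed to set the LED current:
   R = (Vsupply - Vd - Vol) / Id */
double led_resistor_ohms(double vsupply, double vd, double vol, double id_amps) {
  return (vsupply - vd - vol) / id_amps;
}

/* LED current that results from a chosen resistor:
   Id = (Vsupply - Vd - Vol) / R */
double led_current_amps(double vsupply, double vd, double vol, double r_ohms) {
  return (vsupply - vd - vol) / r_ohms;
}
```

With the values from the example (5 V supply, 2 V diode drop, 0.5 V output low voltage), the first function gives the 250 Ω design value for 10 mA, and the second gives about 10.4 mA for the 240 Ω standard resistor.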
The initialization subroutine will set bit 5 of DDRT to make PT5 an output, see Program 3.1. This function should be called once at the start of the system. After initialization, the on and off functions can be called to control the appliance. Software that operates by affecting only the bits it has to, without changing any of the other bits, is called friendly. The oraa instruction is used to set bits and the anda instruction clears bits. Program 3.1 Subroutines to control a solid-state relay.
SSR_Init ldaa DDRT   ;read previous value
         oraa #$20   ;set bit 5
         staa DDRT   ;PT5 output
         rts
SSR_On   ldaa PTT    ;read previous value
         oraa #$20   ;set bit 5
         staa PTT    ;PT5 high
         rts
SSR_Off  ldaa PTT    ;read previous value
         anda #$DF   ;clear bit 5
         staa PTT    ;PT5 low
         rts

// Make PT5 an output
void SSR_Init(void){
  DDRT |= 0x20;
}
// Make PT5 high
void SSR_On(void){
  PTT |= 0x20;
}
// Make PT5 low
void SSR_Off(void){
  PTT &= ~0x20;
}
Checkpoint 3.29: Rewrite the assembly code in Program 3.1 using the bset and bclr instructions.
While we’re introducing digital circuits, we need digital storage devices, which are essential components used to make registers and memory. The simplest storage device is the set-reset flip-flop. One way to build one is shown on the left side of Figure 3.11. If the inputs are S*=0 and R*=1, then the Q output will be 1. Conversely, if the inputs are S*=1 and R*=0, then the Q output will be 0. Normally, we leave both the S* and R* inputs high. We make the signal S* go low, then back high to set the flip-flop, making Q=1. Conversely, we make the signal R* go low, then back high to reset the flip-flop, making Q=0. If both S* and R* are 1, the value on Q will be remembered or stored. This flip-flop enters an unpredictable mode when S* and R* are simultaneously low. Figure 3.11 Digital storage elements.
(Figure 3.11 panels: set-reset flip-flop with inputs S* and R* and output Q; gated D flip-flop with inputs D and W; 74HC74 clocked D flip-flop; 74HC374 8-bit D flip-flop with clock and gate G.)
The gated D flip-flop is also shown in Figure 3.11. The front-end circuits take a data input, D, and a control signal, W, and produce the S* and R* commands for the set-reset flip-flop. For example, if W=0, then the flip-flop is in its quiescent state, remembering the value on Q that was previously written. However, if W=1, then the data input is stored into the flip-flop. In particular, if D=1 and W=1, then S*=0 and R*=1, making Q=1. Furthermore, if D=0 and W=1, then S*=1 and R*=0, making Q=0. So, to use the gated flip-flop, we first put the data on the D input, next we make W go high, then we make W go low. This causes the data value to be stored at Q. After W goes low, the data does not need to exist at the D input anymore. If the D input changes while W is high, then the Q output will change correspondingly. However, the last value on the D input is remembered or latched when W falls, as shown in Table 3.14.

The D flip-flop, shown on the right of Figure 3.11, can also be used to store information. D flip-flops are the basic building block of RAM and registers on the computer. To save information, we first place the digital value we wish to remember on the D input, then give a rising edge to the clock input. After the rising edge of the clock, the value is available at the Q output, and the D input is free to change. The operation of the clocked D flip-flop is defined on the right side of Table 3.14. The 74HC374 is an 8-bit D flip-flop, such that all 8 bits are stored on the rising edge of a single clock. The 74HC374 is similar in structure and operation to a register, which is high-speed memory inside the processor. If the gate (G) input on the 74HC374 is high, its outputs will be HiZ (floating), and if the gate is low, the outputs will be high or low depending on the stored values in the flip-flops.

Table 3.14 D flip-flop operation. Qold is the value of the D input at the time of the active edge of W or clock.
Gated D flip-flop        Clocked D flip-flop
D  W  Q                  D  clock  Q
0  0  Qold               0  0      Qold
1  0  Qold               0  1      Qold
0  1  0                  1  0      Qold
1  1  1                  1  1      Qold
0  ↓  0                  0  ↑      0
1  ↓  1                  1  ↑      1
(↓ = falling edge, ↑ = rising edge)
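The behavior in Table 3.14 can be modeled in software. This is only a behavioral sketch, not a description of the transistor-level circuits; the type and function names (GatedD, ClockedD, gated_d_update, clocked_d_update) are hypothetical and were chosen for this illustration.

```c
#include <stdint.h>

/* Gated D flip-flop: Q follows D while W is 1 and holds while W is 0,
   so the last D value seen before W falls is the one remembered. */
typedef struct { uint8_t q; } GatedD;

void gated_d_update(GatedD *ff, uint8_t d, uint8_t w) {
  if (w) {
    ff->q = d & 1;
  }
}

/* Clocked D flip-flop: Q takes the value of D only on a rising clock edge. */
typedef struct { uint8_t q; uint8_t last_clock; } ClockedD;

void clocked_d_update(ClockedD *ff, uint8_t d, uint8_t clock) {
  if (clock && !ff->last_clock) {   /* rising edge detected */
    ff->q = d & 1;
  }
  ff->last_clock = clock ? 1 : 0;   /* remember clock level for next call */
}
```

Each call represents one sample of the inputs. The clocked version must remember the previous clock level because it responds to an edge rather than to a level.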
Second, the tristate driver, shown in Figure 3.12, can be used to dynamically control signals within the computer. The tristate driver is an essential component from which computers are built. To activate the driver, we make its gate (G) low. When the driver is active, its output (Y) equals its input (A). To deactivate the driver, we make its G high. When the driver is not active, its output Y floats independent of A. We saw this floating state with the open collector logic, and it is also called HiZ or high impedance. The HiZ output means the output is driven neither high nor low. The operation of a tristate driver is defined in Table 3.15. The 74HC244 is an 8-bit tristate driver, such that all 8 bits are activated or deactivated by a single gate. The 74HC374 8-bit D flip-flop includes tristate drivers on its outputs. Normally, we can’t connect two digital outputs together. The tristate driver provides a way to connect multiple outputs to the same signal, as long as at most one of the gates is active at a time. Figure 3.12 A 1-bit and an 8-bit tristate driver.
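(Figure 3.12 panels: 74HC125 1-bit tristate driver with input A, gate G, and output Y; transistor-level circuit built from transistors T1 through T8; 74HC244 8-bit tristate driver.)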
Table 3.15 Tristate driver operation. HiZ is the floating state, such that the output is not high or low.
A  G  T1   T2   T3   T4   T5   T6   T7   T8   Y
0  0  on   off  on   off  on   off  on   on   0
1  0  on   off  off  on   on   on   off  on   1
0  1  off  on   on   off  off  off  on   off  HiZ
1  1  off  on   off  on   off  on   off  off  HiZ
To understand how a tristate driver works, look at the various pieces of the circuit in Figure 3.12. Transistors T1 and T2 create the logical complement of G. Similarly, transistors T3 and T4 create the complement of A. An input of G=0 causes the driver to be active. In this case, both T5 and T8 will be on. With T5 and T8 on, the circuit behaves like a cascade of two NOT gates, so the output Y equals the input A. However, if the input G=1, both T5 and T8 will be off. Since T5 is in series with the +5 V, and T8 is in series with the ground, the output Y will be neither high nor low; that is, it will float.
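The rule that several tristate outputs may share one signal, provided at most one gate is active, can be sketched in C. The codes HIZ and BUS_CONFLICT and the function names tristate and bus_resolve are assumptions made for this illustration; real hardware has no such return values, and two active drivers simply fight each other electrically.

```c
/* Illustrative codes for the two non-logic outcomes (assumed values). */
enum { HIZ = -1, BUS_CONFLICT = -2 };

/* Tristate driver per Table 3.15: active when G is 0, floating when G is 1. */
int tristate(int a, int g) {
  if (g) {
    return HIZ;
  }
  return a ? 1 : 0;
}

/* Combine two driver outputs wired to the same signal.
   If neither drives, the wire floats; if both drive, report a conflict. */
int bus_resolve(int y1, int y2) {
  if (y1 == HIZ) {
    return y2;
  }
  if (y2 == HIZ) {
    return y1;
  }
  return BUS_CONFLICT;
}
```

For example, bus_resolve(tristate(1, 0), tristate(0, 1)) yields 1, because only the first driver is active.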
3.7 Shift Operations

When programming in C, the shift is a binary operation. In other words, the << and >> operators take two inputs and yield one output, e.g., r = m >> n. But at the machine level (i.e., assembly programming), the shift operators are actually unary operations, e.g., r = m >> 1. The assembly instructions used for shifting will shift one bit at a time. If you want to shift multiple times, you will have to execute the instruction multiple times. The logical shift right (LSR) is equivalent to an unsigned divide by 2, as shown in Figure 3.13. A zero is shifted into the most significant position, and the carry flag will hold the bit shifted out.
Figure 3.13 8-bit logical shift right (LSR: 0 → b7 → ... → b0 → C).
Consider the top row of 8 D flip-flops of Figure 3.14 as a register containing an 8-bit value. The LSR function can be implemented in hardware as a two-step process. The first step, which occurs on the falling edge of shift (rising edge of copy), is to make a copy of the 8 bits into the lower row of D flip-flops. Then, on the rising edge of the shift signal, the new shifted value is clocked back into the top row. Figure 3.14 8-bit logical shift right hardware.
(Figure 3.14 shows two rows of D flip-flops: the top row holds b7 through b0 and the carry C and is clocked by shift, the lower row is clocked by copy, and a 0 feeds the b7 input.)
The arithmetic shift right (ASR) is equivalent to a signed divide by 2, as shown in Figure 3.15. Notice that the sign bit is preserved and the carry flag will hold the bit shifted out. Figure 3.15 8-bit arithmetic shift right.
Checkpoint 3.30: Use D flip-flops like Figure 3.14 to build an 8-bit ASR function.
The same shift left operation works for both unsigned and signed multiply by 2, as shown in Figure 3.16. In other words, the arithmetic shift left (ASL) is identical to the logical shift left (LSL). A zero is shifted into the least significant position, and the carry bit will contain the bit that was shifted out. Figure 3.16 8-bit shift left (LSL/ASL: C ← b7 ← ... ← b0 ← 0).
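The three single-bit shifts described so far can be modeled in C so that the carry bit is visible. This is a sketch under the assumption that the register is 8 bits wide; the Shift8 type and the function names lsr8, asr8, and lsl8 are hypothetical and chosen to echo the assembly mnemonics.

```c
#include <stdint.h>

/* Result of a one-bit shift: the new 8-bit value and the bit shifted out. */
typedef struct { uint8_t value; uint8_t carry; } Shift8;

/* Logical shift right: 0 enters bit 7, bit 0 goes to carry. */
Shift8 lsr8(uint8_t x) {
  Shift8 r = { (uint8_t)(x >> 1), (uint8_t)(x & 1) };
  return r;
}

/* Arithmetic shift right: bit 7 (the sign) is preserved, bit 0 goes to carry. */
Shift8 asr8(uint8_t x) {
  Shift8 r = { (uint8_t)((x >> 1) | (x & 0x80)), (uint8_t)(x & 1) };
  return r;
}

/* Shift left (LSL and ASL are the same): 0 enters bit 0, bit 7 goes to carry. */
Shift8 lsl8(uint8_t x) {
  Shift8 r = { (uint8_t)(x << 1), (uint8_t)(x >> 7) };
  return r;
}
```

For the input $85, lsr8 gives $42, asr8 gives $C2, and lsl8 gives $0A, each with the carry set to 1.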
The roll operations can be used to create multiple-byte shift functions. Roll right and roll left are shown in Figure 3.17. In each case, the carry is shifted into the 8-bit byte, and the carry bit will contain the bit that was shifted out.

Figure 3.17 8-bit roll right and 8-bit roll left.

The simplest way to perform a shift operation on the microcomputer is to use a register like Register A or Register B. The asla and lsla instructions have identical machine codes. The two assembly language names allow the programmer to write clearer code (using lsla for unsigned numbers and asla for signed numbers). The shift instructions use inherent addressing. The N bit is set if the result is negative. The Z bit is set if the result is zero. The V bit is set on a signed overflow, which is detected by a change in the sign bit. The C bit is the carry out after the shift.

asla   Signed shift left, same as lsla
aslb   Signed shift left, same as lslb
asld   Signed shift left, same as lsld
lsla   Unsigned shift left, same as asla
lslb   Unsigned shift left, same as aslb
lsld   Unsigned shift left, same as asld
asra   Signed shift right
asrb   Signed shift right
asrd   Signed shift right
lsra   Unsigned shift right
lsrb   Unsigned shift right
lsrd   Unsigned shift right
rola   Rotate RegA (C←A7←...←A0←C)
rolb   Rotate RegB (C←B7←...←B0←C)
rora   Rotate RegA (C→A7→...→A0→C)
rorb   Rotate RegB (C→B7→...→B0→C)
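To illustrate how the roll operations build a multiple-byte shift, the sketch below performs a 16-bit logical shift right as a logical shift right on the high byte followed by a roll right on the low byte, with the carry linking the two. The function names ror8, rol8, and lsr16_by_bytes are hypothetical; on the 9S12 itself a single lsrd does the same job for 16 bits, so this is only a model of the technique.

```c
#include <stdint.h>

/* Roll right through carry: carry enters bit 7, bit 0 leaves into carry. */
uint8_t ror8(uint8_t x, uint8_t *carry) {
  uint8_t out = (uint8_t)(x & 1);
  uint8_t r = (uint8_t)((x >> 1) | ((*carry & 1) << 7));
  *carry = out;
  return r;
}

/* Roll left through carry: carry enters bit 0, bit 7 leaves into carry. */
uint8_t rol8(uint8_t x, uint8_t *carry) {
  uint8_t out = (uint8_t)(x >> 7);
  uint8_t r = (uint8_t)((x << 1) | (*carry & 1));
  *carry = out;
  return r;
}

/* 16-bit logical shift right built from two 8-bit steps. */
uint16_t lsr16_by_bytes(uint16_t x) {
  uint8_t hi = (uint8_t)(x >> 8);
  uint8_t lo = (uint8_t)(x & 0xFF);
  uint8_t carry = (uint8_t)(hi & 1);  /* lsr on high byte: bit 0 goes to C */
  hi = (uint8_t)(hi >> 1);            /* 0 enters bit 7 of the high byte   */
  lo = ror8(lo, &carry);              /* C enters bit 7 of the low byte    */
  return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```

The same pattern extends to wider values: shift the most significant byte first, then roll each lower byte in turn.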
Example 3.7 Write assembly code to implement M = N >> 2, where M and N are 16-bit unsigned variables.

Solution We need to use a 16-bit register, because we have 16-bit data. First, we perform a 16-bit read, bringing N into Register D. Second, we divide by 4 using two shift right operations, and lastly we store the result into M. Since the value gets smaller, no overflow can occur. If the variables were signed, then the two lsrd instructions should be replaced with two asrd instructions.

  ldd  N   ;RegD=N
  lsrd     ;RegD=N/2
  lsrd     ;RegD=N/4
  std  M   ;M=N>>2
Checkpoint 3.31: Let N and M be 8-bit signed locations. Write assembly code to implement M = 4*N.

Maintenance Tip: Use the asla instruction when manipulating signed numbers, and use the lsla instruction when shifting unsigned numbers.
Example 3.8 Take two 4-bit nibbles and combine them into one 8-bit value.

Solution The solution uses the shift operation to move the bits into position, then it uses the or operation to combine the two parts into one number. Let High and Low be the unsigned 4-bit components, which will be combined into a single unsigned 8-bit Result. We will assume both High and Low are bounded within the range of 0 to 15. The expression High<<4 will perform four logical shift lefts.

  Result = (High<<4)|Low;
The assembly solution is

  ldaa High    ;read value of High
  lsla         ;shift into position
  lsla
  lsla
  lsla
  oraa Low     ;combine the two parts together
  staa Result  ;save answer
To illustrate how the above program works, let 0 0 0 0 h3 h2 h1 h0 be the value of High, and let 0 0 0 0 l3 l2 l1 l0 be the value of Low. The ldaa instruction brings High into Register A. The four lsla instructions move High into bit positions 4 to 7, the oraa instruction combines High and Low, and the staa instruction stores the combination into Result.

0  0  0  0  h3 h2 h1 h0   value of High
0  0  0  h3 h2 h1 h0 0    after first lsla
0  0  h3 h2 h1 h0 0  0    after second lsla
0  h3 h2 h1 h0 0  0  0    after third lsla
h3 h2 h1 h0 0  0  0  0    after last lsla
0  0  0  0  l3 l2 l1 l0   value of Low
h3 h2 h1 h0 l3 l2 l1 l0   result of the oraa instruction

3.8 Arithmetic Operations: Addition and Subtractions

When software executes arithmetic instructions, the operations are performed by digital hardware inside the processor. Even though the design of such logic is complex, we will present a brief introduction in order to provide a little insight as to how the computer performs arithmetic. It is important to remember that arithmetic operations (addition, subtraction, multiplication, and division) have constraints when performed with finite precision on a microcomputer. An overflow error occurs when the result of an arithmetic operation cannot fit into the finite precision of the register into which the result is to be stored. For example, when two 8-bit numbers are added, the sum may not fit back into an 8-bit register. Previously, we stated that the same digital hardware (instructions) could be used to add and subtract unsigned and signed numbers. This is true, but we will have to design separate overflow detection for signed and unsigned addition and subtraction.

Checkpoint 3.32: How many bits does it take to store the result of two unsigned 8-bit numbers added together?
Checkpoint 3.33: How many bits does it take to store the result of two signed 8-bit numbers added together?
Checkpoint 3.34: How many bits does it take to store the result of two unsigned 8-bit numbers multiplied together?
Checkpoint 3.35: How many bits does it take to store the result of two signed 8-bit numbers multiplied together?
It is common for computers to perform arithmetic operations using a register like Register A. As we have seen, a register is a high-speed storage inside the processor. An accumulator, like Register A, is a register with which arithmetic and logic operations can be performed.
The following instructions are a few of the arithmetic functions available on the 9S12, which fetch data from memory and add it to or subtract it from Register A. With immediate mode (#w) the 8-bit constant is located in the instruction itself. With direct mode (u) the 8-bit data is fetched from memory location u (u is a number from 0 to 255). With extended mode (U) the 8-bit data is fetched from the 16-bit memory location U. Recall that direct/extended mode affects the size of the address, not the size of the data. The size of the data will be determined by the size of the register into which the operation will be performed.

All microcomputers have a condition code register (CC or CCR) that specifies the status of the most recent operation. In this section, we will need the four condition code bits shown in Table 3.16. If the two inputs to an addition or subtraction operation are considered as unsigned, then the C bit (carry) will be set if the result does not fit. In other words, after an unsigned addition, the C bit is set if the answer is wrong. If the two inputs to an addition or subtraction operation are considered as signed, then the V bit (overflow) will be set if the result does not fit. In other words, after a signed addition, the V bit is set if the answer is wrong. If the result is unsigned 8 bits, N=1 means the result is greater than or equal to 128. Conversely, if the result is signed, N=1 means the result is negative. Table 3.16 Condition code bits contain the status of the previous arithmetic or logical operation.
Bit  Name      Meaning After Addition or Subtraction
N    Negative  Result is negative
Z    Zero      Result is zero
V    Overflow  Signed overflow
C    Carry     Unsigned overflow
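The adda and addb instructions add an 8-bit value from memory to the corresponding register. These instructions work for both signed and unsigned data.

adda #w   Add 8-bit constant to RegA
adda U    Add 8-bit memory value to RegA
addb #w   Add 8-bit constant to RegB
addb U    Add 8-bit memory value to RegB

Condition code bits are set after R = X + M, where X is the initial register value and R is the final register value.
N: result is negative, $N = R_7$
Z: result is zero, $Z = \overline{R_7} \cdot \overline{R_6} \cdot \overline{R_5} \cdot \overline{R_4} \cdot \overline{R_3} \cdot \overline{R_2} \cdot \overline{R_1} \cdot \overline{R_0}$
V: signed overflow, $V = X_7 \cdot M_7 \cdot \overline{R_7} + \overline{X_7} \cdot \overline{M_7} \cdot R_7$
C: unsigned overflow, $C = X_7 \cdot M_7 + M_7 \cdot \overline{R_7} + \overline{R_7} \cdot X_7$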
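The condition code equations for 8-bit addition can be exercised in C. The sketch below assumes the standard form of the equations, in which V and C are computed from bit 7 of the two inputs and the result, with some terms complemented. The AddResult type and the function add8 are hypothetical names introduced for this illustration.

```c
#include <stdint.h>

/* Result of an 8-bit add together with the four condition code bits. */
typedef struct { uint8_t r, n, z, v, c; } AddResult;

AddResult add8(uint8_t x, uint8_t m) {
  AddResult s;
  s.r = (uint8_t)(x + m);                 /* 8-bit result, wraps modulo 256 */
  uint8_t x7 = (uint8_t)(x >> 7);         /* bit 7 of each operand          */
  uint8_t m7 = (uint8_t)(m >> 7);
  uint8_t r7 = (uint8_t)(s.r >> 7);
  s.n = r7;                               /* N = R7                         */
  s.z = (uint8_t)(s.r == 0);              /* Z = all result bits are 0      */
  /* V = X7*M7*not(R7) + not(X7)*not(M7)*R7 */
  s.v = (uint8_t)((x7 & m7 & !r7) | (!x7 & !m7 & r7));
  /* C = X7*M7 + M7*not(R7) + not(R7)*X7 */
  s.c = (uint8_t)((x7 & m7) | (m7 & !r7) | (!r7 & x7));
  return s;
}
```

For instance, add8(224, 64) returns 32 with C=1 and V=0, while add8(96, 64) returns 160 with C=0 and V=1. The same bit pattern is therefore wrong as an unsigned sum in the first case and wrong as a signed sum in the second.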
Example 3.9 Write assembly code to implement M = N + 10, where M and N are 8-bit variables.

Solution First, we perform an 8-bit read, bringing N into Register A. Second, we add 10 to this value, and lastly, we store the result into M. If M and N are unsigned variables (0 to 255), then the C bit would be set on an overflow. On the other hand, if M and N are signed variables (-128 to 127), then the V bit would be set on an overflow.

  ldaa N
  adda #10   ;RegA=N+10
  staa M
The addd instruction adds a 16-bit value from memory to Register D. This instruction works for both signed and unsigned data.

addd #W   ;RegD=RegD+W     Add 16-bit constant to RegD
addd U    ;RegD=RegD+{U}   Add 16-bit memory value to RegD
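Condition code bits are set after R = D + M, where D is the initial register value and R is the final register value.
N: result is negative, $N = R_{15}$
Z: result is zero, $Z = \overline{R_{15}} \cdot \overline{R_{14}} \cdot \overline{R_{13}} \cdot \overline{R_{12}} \cdot \overline{R_{11}} \cdot \overline{R_{10}} \cdot \overline{R_9} \cdot \overline{R_8} \cdot \overline{R_7} \cdot \overline{R_6} \cdot \overline{R_5} \cdot \overline{R_4} \cdot \overline{R_3} \cdot \overline{R_2} \cdot \overline{R_1} \cdot \overline{R_0}$
V: signed overflow, $V = D_{15} \cdot M_{15} \cdot \overline{R_{15}} + \overline{D_{15}} \cdot \overline{M_{15}} \cdot R_{15}$
C: unsigned overflow, $C = D_{15} \cdot M_{15} + M_{15} \cdot \overline{R_{15}} + \overline{R_{15}} \cdot D_{15}$

Example 3.10 Write assembly code to implement M = N + 1000, where M and N are 16-bit variables.

Solution We need to use a 16-bit register, because we have 16-bit data. First, we perform a 16-bit read, bringing N into Register D. Second, we add 1000 to this value, and lastly, we store the result into M. If M and N are unsigned variables (0 to 65535), then the C bit would be set on an overflow. On the other hand, if M and N are signed variables (-32768 to 32767), then the V bit would be set on an overflow.

  ldd  N
  addd #1000   ;RegD=N+1000
  std  M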
Checkpoint 3.36: Write assembly code that adds a constant 100 to Register X.
These instructions subtract an 8-bit memory value from a register. The operation works for both signed and unsigned values. As designers, we must know in advance whether we have signed or unsigned numbers. The computer cannot tell from the binary which type it is, so it sets both C and V. Our job as programmers is to look at the C bit if the values are unsigned and look at the V bit if the values are signed. The compare instructions do not change the register value. The condition code bits can be used by a conditional branch instruction to compare the two values. If the numbers represent unsigned values, then follow the subtraction with an unsigned conditional branch: beq bne bhi bhs blo bls. If the numbers represent signed values, then follow the subtraction with a signed conditional branch: beq bne bgt bge blt ble.

cmpa #w   Compare RegA to 8-bit constant
cmpa U    Compare RegA to 8-bit memory value
cmpb #w   Compare RegB to 8-bit constant
cmpb U    Compare RegB to 8-bit memory value
suba #w   Subtract 8-bit constant from RegA
suba U    Subtract 8-bit memory value from RegA
subb #w   Subtract 8-bit constant from RegB
subb U    Subtract 8-bit memory value from RegB
tsta      Test RegA
tstb      Test RegB
Condition code bits are set after R = X - M, where X is the initial register value and R is the final register value.
N: result is negative, $N = R_7$
Z: result is zero, $Z = \overline{R_7} \cdot \overline{R_6} \cdot \overline{R_5} \cdot \overline{R_4} \cdot \overline{R_3} \cdot \overline{R_2} \cdot \overline{R_1} \cdot \overline{R_0}$
V: signed overflow, $V = X_7 \cdot \overline{M_7} \cdot \overline{R_7} + \overline{X_7} \cdot M_7 \cdot R_7$
C: unsigned overflow, $C = \overline{X_7} \cdot M_7 + M_7 \cdot R_7 + R_7 \cdot \overline{X_7}$

Example 3.11 Write assembly code to implement M = N - 10, where M and N are 8-bit variables.

Solution First, we perform an 8-bit read, bringing N into Register A. Second, we subtract 10 from this value, and lastly, we store the result into M. Similar to the other examples, the C bit is set on an unsigned overflow, and the V bit is set on a signed overflow.

  ldaa N
  suba #10   ;RegA=N-10
  staa M
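The condition code equations for 8-bit subtraction can be checked with a small C model. As with addition, this sketch assumes the standard form of the equations with complemented terms; the SubResult type and the function sub8 are hypothetical names used only here.

```c
#include <stdint.h>

/* Result of an 8-bit subtract together with the four condition code bits. */
typedef struct { uint8_t r, n, z, v, c; } SubResult;

SubResult sub8(uint8_t x, uint8_t m) {
  SubResult s;
  s.r = (uint8_t)(x - m);                 /* 8-bit result, wraps modulo 256 */
  uint8_t x7 = (uint8_t)(x >> 7);         /* bit 7 of each operand          */
  uint8_t m7 = (uint8_t)(m >> 7);
  uint8_t r7 = (uint8_t)(s.r >> 7);
  s.n = r7;                               /* N = R7                         */
  s.z = (uint8_t)(s.r == 0);              /* Z = all result bits are 0      */
  /* V = X7*not(M7)*not(R7) + not(X7)*M7*R7 */
  s.v = (uint8_t)((x7 & !m7 & !r7) | (!x7 & m7 & r7));
  /* C = not(X7)*M7 + M7*R7 + R7*not(X7) */
  s.c = (uint8_t)((!x7 & m7) | (m7 & r7) | (r7 & !x7));
  return s;
}
```

For example, sub8(32, 64) returns 224 with C=1, signalling an unsigned error, and sub8(160, 64) returns 96 with V=1, because 160 interpreted as signed is -96 and -96 - 64 does not fit in 8 signed bits.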
These instructions subtract a 16-bit memory value from a register. Just like the 8-bit subtraction operators, these operators work for both signed and unsigned values. Again, the condition code bits can be used by a conditional branch instruction to compare the two values.

cpd  #W   Compare RegD to 16-bit constant
cpd  U    Compare RegD to 16-bit memory value
cpx  #W   Compare RegX to 16-bit constant
cpx  U    Compare RegX to 16-bit memory value
cpy  #W   Compare RegY to 16-bit constant
cpy  U    Compare RegY to 16-bit memory value
subd #W   Subtract 16-bit constant from RegD
subd U    Subtract 16-bit memory value from RegD
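Condition code bits are set after R = X - M, where X is the initial register value and R is the final register value.
N: result is negative, $N = R_{15}$
Z: result is zero, $Z = \overline{R_{15}} \cdot \overline{R_{14}} \cdot \overline{R_{13}} \cdot \overline{R_{12}} \cdot \overline{R_{11}} \cdot \overline{R_{10}} \cdot \overline{R_9} \cdot \overline{R_8} \cdot \overline{R_7} \cdot \overline{R_6} \cdot \overline{R_5} \cdot \overline{R_4} \cdot \overline{R_3} \cdot \overline{R_2} \cdot \overline{R_1} \cdot \overline{R_0}$
V: signed overflow, $V = X_{15} \cdot \overline{M_{15}} \cdot \overline{R_{15}} + \overline{X_{15}} \cdot M_{15} \cdot R_{15}$
C: unsigned overflow, $C = \overline{X_{15}} \cdot M_{15} + M_{15} \cdot R_{15} + R_{15} \cdot \overline{X_{15}}$

Example 3.12 Write assembly code to implement M = N - 1000, where M and N are 16-bit variables.

Solution Because we have 16-bit data, we will use Register D. First, we perform a 16-bit read, bringing N into Register D. Second, we subtract 1000 from this value, and lastly, we store the result into M. Like the other add and subtract examples, the C bit is set on an unsigned overflow, and the V bit is set on a signed overflow.

  ldd  N
  subd #1000   ;RegD=N-1000
  std  M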
To better understand how the computer translates our program into actions, we will analyze the explicit actions that occur as the software from Example 3.12 executes. Assuming this code is located at $5000, a partial listing file is shown below. Assume the variables are allocated in RAM. More specifically, assume the 16-bit variable N is located at $3800 and $3801, and the 16-bit variable M is located at $3802 and $3803. Furthermore, assume N is initially equal to 12345.

$5000 FC3800   ldd  N
$5003 8303E8   subd #1000   ;RegD=N-1000
$5006 7C3802   std  M
The machine code is programmed into the flash EEPROM. In particular, locations $5000 to $5008 will contain the nine bytes of machine code for this program ($FC38008303E87C3802). We will assume the PC is initialized to $5000. Executing the ldd instruction will read the 16-bit data located at addresses $3800 and $3801. Since we assumed N was initially 12345, this instruction will bring 12345 into RegD. Since the ldd instruction uses extended mode addressing, the machine code contains the address of N, $3800. The address is fixed, meaning it will always fetch from variable N, but the data will be the current 16-bit value of this variable. Executing the subd instruction will subtract 1000 from RegD. If RegD were previously 12345, this instruction makes it 11345. Since this is immediate mode addressing, the number 1000 can be found in the machine code itself; in this case 1000 equals $03E8. The machine code for the std instruction contains the address of M, $3802. This extended addressing mode store instruction produces write cycles to addresses $3802 and $3803, causing the result of the subtraction to be written into the memory variable M.

Address  Object Code  Source Code  Action            After Completion
$5000    FC3800       ldd N        D = {N} = 12345   PC = $5003
$5003    8303E8       subd #1000   D = 11345         PC = $5006
$5006    7C3802       std M        {M} = 11345       PC = $5009
Observation: Immediate addressing is used for constants and extended addressing is used for variables and I/O ports.
There is a full set of increment and decrement instructions, which operate properly on either signed or unsigned values. These instructions use inherent addressing. The Z bit is set if the result is zero.

deca   RegA=RegA-1
decb   RegB=RegB-1
dex    RegX=RegX-1
dey    RegY=RegY-1
inca   RegA=RegA+1
incb   RegB=RegB+1
inx    RegX=RegX+1
iny    RegY=RegY+1
We begin the design of an adder circuit with a simple subcircuit called a binary full adder, as shown in Figure 3.18. There are two binary data inputs, A and B, and a carry input, Cin. There is one data output, Sout, and one carry output, Cout. As shown in Table 3.17, Cin, A, and B are three independent binary inputs, each having a value of 0 or 1. These three inputs are added together (the sum could be 0, 1, 2, or 3) and the result is encoded in the 2-bit binary result with Cout as the most significant bit and Sout as the least significant bit. Cout is true if the sum is 2 or 3, and Sout is true if the sum is 1 or 3. Figure 3.18 A binary full adder.
(Figure 3.18 circuit: Sout = (A^B)^Cin and Cout = ((A^B)&Cin)|(A&B), built from 74HC86 exclusive OR gates, 74HC08 AND gates, and a 74HC32 OR gate.)
Table 3.17 Input/output response of a binary full adder.
A  B  Cin  A+B+Cin  Cout  Sout
0  0  0    0        0     0
0  0  1    1        0     1
0  1  0    1        0     1
0  1  1    2        1     0
1  0  0    1        0     1
1  0  1    2        1     0
1  1  0    2        1     0
1  1  1    3        1     1
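The gate-level equations of the binary full adder translate directly into C bit operations. This is a minimal sketch; the FullAdd type and the function full_adder are hypothetical names, and each input is assumed to be 0 or 1.

```c
/* Outputs of a binary full adder. */
typedef struct { unsigned sout; unsigned cout; } FullAdd;

/* Sout = (A^B)^Cin,  Cout = ((A^B)&Cin)|(A&B) */
FullAdd full_adder(unsigned a, unsigned b, unsigned cin) {
  FullAdd r;
  unsigned axb = (a ^ b) & 1;                    /* first exclusive OR gate */
  r.sout = (axb ^ cin) & 1;                      /* sum bit                 */
  r.cout = ((axb & cin) | (a & b)) & 1;          /* carry to the next stage */
  return r;
}
```

Evaluating full_adder for all eight input combinations reproduces the Cout and Sout columns of Table 3.17.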
We can build an 8-bit adder by concatenating eight binary full adders together, as shown in Figure 3.19. The carry into the 8-bit adder is zero, and the carry out will be saved in the carry bit of the CCR. We are now ready to understand the sequence of events required to execute the instruction adda #64. First, the current value of Register A and the constant 64 are connected to the two 8-bit inputs of the 8-bit binary adder. Next, the result of the addition is stored back in Register A, and the condition code bits are appropriately set. Assume Register A is initially 224. The shaded boxes and italicized values in Figure 3.19 show this particular case, where the A input to the adder is 224 (binary 11100000) and the B input equals 64 (binary 01000000). For bits 0, 1, 2, 3, and 4, the A and B inputs are 0, so the carry out and result bits will also be 0. Since A5 is 1 and B5 equals 0, the result R5 will also be 1, and the carry from 5 to 6 will be 0. For bit 6, both A6 and B6 are 1, so the result R6 is zero, but the carry between bits 6 and 7 is 1. For bit 7, the carry in and A7 are 1, so the result R7 is 0, and the carry out is 1. The carry out of bit 7 represents the unsigned overflow for the entire 8-bit addition.

Figure 3.19 We make an 8-bit adder using eight binary full adders. (Worked case shown: 224 + 64 = 32 with carry 1; $E0 + $40 = $20; 11100000 + 01000000 = 1 00100000.)

For an 8-bit unsigned number, there are only 256 possible values, which are 0 to 255. We can think of the numbers as positions along a circle, like a clock. There is a discontinuity at the 0|255 interface; everywhere else adjacent numbers differ by 1. If we add two unsigned numbers, we start at the position of the first number and move in a clockwise direction the number of steps equal to the second number. As shown in Figure 3.20, if 96 + 64 is performed in 8-bit unsigned precision, the correct result of 160 is obtained. In this case, the carry bit will be 0, signifying the answer is correct. On the other hand, if 224 + 64 is performed in 8-bit unsigned precision, the incorrect result of 32 is obtained. In this case, the carry bit will be 1, signifying the answer is wrong.

Figure 3.20 Number wheel showing 96 + 64 and 224 + 64.

To calculate the negative of a two’s complement number, we complement all of the bits and add 1. For example, the 8-bit binary representation for -100 is 10011100. The complement of this binary value is 01100011. When we add 1 to 01100011, we get the binary 01100100, which is the proper representation for 100. Using this fact, we can build an 8-bit subtractor (R = A - B) by first negating B, then using eight binary full adders to add A plus -B, as shown in Figure 3.21. The carry into the 8-bit adder is one, and the carry out is inverted and saved in the carry bit of the CCR.

Figure 3.21 We make an 8-bit subtractor using eight binary full adders. (Worked case shown: 32 - 64 = 224 with carry 1; $20 - $40 = $E0; 00100000 - 01000000 = 1 11100000.)

Again, let’s consider the sequence of events required to execute the instruction suba #64. First, the current value of Register A and the constant 64 are connected to the two 8-bit inputs of the 8-bit binary subtractor. The eight NOT gates perform the complement of B. The eight binary adders perform the addition of A and the complement of B. Notice that the carry into the 8-bit adder is 1, as needed to implement the subtraction A - B. Lastly, the result of the subtraction is stored back in Register A, and the condition code bits are appropriately set. Assume Register A is initially 32. The shaded boxes and italicized values in Figure 3.21 show this particular case, where the A input to the subtractor is 32 (binary 00100000) and the B input equals 64 (binary 01000000). For bits 0, 1, 2, 3, and 4, the A and B inputs are 0, so the carry out will be 1 and the result bits will be 0. The full adder in bit 5 sees three ones, so the result R5 will also be 1, and the carry from 5 to 6 will be 1. For bit 6, the full adder sees just a single 1, so the result R6 is 1, and there is no carry from bit 6 to bit 7. Again for bit 7, the full adder sees just a single 1, so the result R7 is 1, and there is no carry out of bit 7. The complement of the carry out of bit 7 represents the unsigned overflow for the entire 8-bit subtraction.

For subtraction, we start at the position of the first number and move in a counterclockwise direction the number of steps equal to the second number. As shown in Figure 3.22, if 160 - 64 is performed in 8-bit unsigned precision, the correct result of 96 is obtained (carry bit will be 0). On the other hand, if 32 - 64 is performed in 8-bit unsigned precision, the incorrect result of 224 is obtained (carry bit will be 1).

Figure 3.22 Number wheel showing 160 - 64 and 32 - 64.
In general, we see that the carry bit is set when we cross over from 255 to 0 while adding or cross over from 0 to 255 while subtracting. Observation: The carry bit, C, is set after an unsigned addition or subtraction when the result is incorrect.
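The adder and subtractor built from eight full adders can be sketched as a loop in C, one iteration per bit, with the carry rippling from bit 0 to bit 7. The function names ripple8, add8_hw, and sub8_hw are hypothetical; the model assumes the subtractor complements B, forces the carry in to 1, and inverts the final carry out to form the C bit, as the text describes.

```c
#include <stdint.h>

/* Eight chained full adders: returns the 8-bit sum and the carry out of bit 7. */
uint8_t ripple8(uint8_t a, uint8_t b, unsigned cin, unsigned *cout) {
  uint8_t r = 0;
  unsigned carry = cin & 1;
  for (int i = 0; i < 8; i++) {
    unsigned ai = (a >> i) & 1;
    unsigned bi = (b >> i) & 1;
    unsigned axb = ai ^ bi;
    r |= (uint8_t)((axb ^ carry) << i);     /* Sout of stage i            */
    carry = (axb & carry) | (ai & bi);      /* Cout feeds stage i+1       */
  }
  *cout = carry;
  return r;
}

/* Adder: carry in is 0, C bit is the carry out. */
uint8_t add8_hw(uint8_t a, uint8_t b, unsigned *c_bit) {
  return ripple8(a, b, 0, c_bit);
}

/* Subtractor: complement B, carry in is 1, C bit is the inverted carry out. */
uint8_t sub8_hw(uint8_t a, uint8_t b, unsigned *c_bit) {
  unsigned cout;
  uint8_t r = ripple8(a, (uint8_t)~b, 1, &cout);
  *c_bit = cout ^ 1;
  return r;
}
```

Running the four number-wheel cases from the text through these functions gives 96 + 64 = 160 (C=0), 224 + 64 = 32 (C=1), 160 - 64 = 96 (C=0), and 32 - 64 = 224 (C=1).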
For an 8-bit signed number, the possible values range from 128 to 127. Again there is a discontinuity, but this time it exists at the 128|127 interface, everywhere else adjacent numbers differ by 1. The meanings of the numbers with bit 7 1 are different from unsigned, but we add and subtract signed numbers on the number wheel in a similar way (e.g., addition of a positive number moves clockwise.) Therefore, we can use the same hardware (Figures 3.19 and 3.21) to add and subtract two’s complement signed numbers. The only difference is the carry out generated by the circuits do not represent an error when adding or subtracting two’s complement signed numbers. Instead a new bit, called overflow or V, will be calculated to signify errors when operating on signed numbers. Adding a negative number is the same as subtracting a positive number hence this operation would cause a counterclockwise motion. As shown in Figure 3.23, if 32 64 is performed, the correct result of 32 is obtained. In this case, the overflow bit will be 0, signifying the answer is correct. On the other hand, if 96 64 is performed, the incorrect result of 96 is obtained. In this case, the overflow bit will be 1, signifying the answer is wrong.
Figure 3.23 Number wheel showing -32 + 64 and 96 + 64.
For subtracting signed numbers, we again move in a counterclockwise direction. Subtracting a negative number is the same as adding a positive number, hence this operation would cause a clockwise motion. As shown in Figure 3.24, if 32 - 64 is performed, the correct result of -32 is obtained (overflow bit will be 0). On the other hand, if -96 - 64 is performed, the incorrect result of 96 is obtained (overflow bit will be 1).

Figure 3.24 Number wheel showing 32 - 64 and -96 - 64.
In general, we see that the overflow bit, V, is set when we cross over from 127 to -128 while adding or cross over from -128 to 127 while subtracting. Observation: The overflow bit, V, is set after a signed addition or subtraction when the result is incorrect.
Another way to determine the overflow bit after an addition is to consider the carry out of bit 6. The V bit will be set if there is a carry out of bit 6 (into bit 7) but no carry out of bit 7 (into the C bit). It is also set if there is no carry out of bit 6 but there is a carry out of bit 7.

Let A7, A6, A5, A4, A3, A2, A1, A0 and B7, B6, B5, B4, B3, B2, B1, B0 be the individual binary bits of the two 8-bit numbers that are to be added, and let R7, R6, R5, R4, R3, R2, R1, R0 be the individual binary bits of the 8-bit sum, as implemented in Figure 3.19. In the equations below, & is logical AND, + is logical OR, and an overbar is the complement.

The N bit is set if the unsigned result is above 127 or if the signed result is negative.

$N = R_7$

The Z bit is set if the result is zero. The Z bit will be clear if any of the result bits is one.

$Z = \overline{R_7} \,\&\, \overline{R_6} \,\&\, \overline{R_5} \,\&\, \overline{R_4} \,\&\, \overline{R_3} \,\&\, \overline{R_2} \,\&\, \overline{R_1} \,\&\, \overline{R_0}$

If the V bit is set after a signed addition, then the result is incorrect because a signed overflow occurred. The first term of the following equation is true if you add two negative numbers together and get a positive result. The second term is true if you add two positive numbers together and get a negative result.

$V = A_7 \,\&\, B_7 \,\&\, \overline{R_7} + \overline{A_7} \,\&\, \overline{B_7} \,\&\, R_7$

If the C bit is set after an unsigned addition, then the result is incorrect because an unsigned overflow occurred. The first term of the following equation is true if you add two numbers both above 127. The second term is true if the A input is above 127, but the result is less than or equal to 127. The third term is true if the B input is above 127, but the result is less than or equal to 127.

$C = A_7 \,\&\, B_7 + A_7 \,\&\, \overline{R_7} + B_7 \,\&\, \overline{R_7}$

Checkpoint 3.37: Assume Register A is initially 100. After executing the instruction adda #64 what is the value in Register A, and the NZVC bits?

Checkpoint 3.38: Assume Register A is initially 100. After executing the instruction adda #-64 what is the value in Register A, and the NZVC bits?
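The addition flag equations can be checked with a small C model. This is a sketch for experimentation only; the type `AddResult` and the function `add8_flags` are names invented here, and the code evaluates the sign-bit equations rather than reading any hardware register.

```c
#include <stdint.h>

/* Hypothetical container for an 8-bit sum and its N, Z, V, C bits. */
typedef struct { uint8_t r, n, z, v, c; } AddResult;

/* Evaluate R = A + B and the condition code equations for addition. */
AddResult add8_flags(uint8_t a, uint8_t b){
  AddResult out;
  uint8_t r  = (uint8_t)(a + b);              /* 8-bit sum, wraps at 256 */
  uint8_t a7 = (uint8_t)(a >> 7);             /* sign bits of the inputs */
  uint8_t b7 = (uint8_t)(b >> 7);
  uint8_t r7 = (uint8_t)(r >> 7);             /* sign bit of the result */
  out.r = r;
  out.n = r7;                                 /* N = R7 */
  out.z = (uint8_t)(r == 0);                  /* Z = all result bits clear */
  out.v = (uint8_t)((a7 & b7 & !r7) | (!a7 & !b7 & r7));
  out.c = (uint8_t)((a7 & b7) | (a7 & !r7) | (b7 & !r7));
  return out;
}
```

For example, `add8_flags(96, 64)` yields R = 160 with N = 1, Z = 0, V = 1, C = 0 (a signed overflow), while `add8_flags(224, 64)` yields R = 32 with N = 0, Z = 0, V = 0, C = 1 (an unsigned overflow).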
In a similar manner, let the result R be the result of the subtraction A - B, as implemented in Figure 3.21. The N and Z bits are the same as with addition.

$N = R_7$

$Z = \overline{R_7} \,\&\, \overline{R_6} \,\&\, \overline{R_5} \,\&\, \overline{R_4} \,\&\, \overline{R_3} \,\&\, \overline{R_2} \,\&\, \overline{R_1} \,\&\, \overline{R_0}$

If the V bit is set after a signed subtraction (R = A - B), then the result is incorrect because a signed overflow occurred. The first term of the following equation is true if you subtract a positive number from a negative number and get a positive result (a negative number minus a positive number should still be negative). The second term is true if you subtract a negative number from a positive number and get a negative result (a positive number minus a negative number should still be positive).

$V = A_7 \,\&\, \overline{B_7} \,\&\, \overline{R_7} + \overline{A_7} \,\&\, B_7 \,\&\, R_7$

If the C bit is set after an unsigned subtraction (R = A - B), then the result is incorrect because an unsigned overflow occurred. The first term of the following equation is true if you subtracted a number greater than 127 (B > 127) from a number less than 128 (A < 128). The second term is true if the B input is above 127, and the result is greater than 127. The third term is true if the A input is less than or equal to 127, but the result is greater than 127.

$C = \overline{A_7} \,\&\, B_7 + B_7 \,\&\, R_7 + \overline{A_7} \,\&\, R_7$

Checkpoint 3.39: Assume Register A is initially 200. After executing the instruction suba #64 what is the value in Register A, and the NZVC bits?

Checkpoint 3.40: Assume Register A is initially 200. After executing the instruction suba #-64 what is the value in Register A, and the NZVC bits?

Common Error: Ignoring overflow (signed or unsigned) can result in significant errors.

Observation: Microcomputers have two sets of conditional branch instructions (if statements) that make program decisions based on either the C or V bit.
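The subtraction flag equations can be exercised the same way. The sketch below is a model under the same assumptions as before; `SubResult` and `sub8_flags` are hypothetical names, and the flags are computed from the sign bits of A, B, and R.

```c
#include <stdint.h>

/* Hypothetical container for an 8-bit difference and its N, Z, V, C bits. */
typedef struct { uint8_t r, n, z, v, c; } SubResult;

/* Evaluate R = A - B and the condition code equations for subtraction. */
SubResult sub8_flags(uint8_t a, uint8_t b){
  SubResult out;
  uint8_t r  = (uint8_t)(a - b);              /* 8-bit difference, wraps below 0 */
  uint8_t a7 = (uint8_t)(a >> 7);
  uint8_t b7 = (uint8_t)(b >> 7);
  uint8_t r7 = (uint8_t)(r >> 7);
  out.r = r;
  out.n = r7;                                 /* N = R7 */
  out.z = (uint8_t)(r == 0);                  /* Z = all result bits clear */
  out.v = (uint8_t)((a7 & !b7 & !r7) | (!a7 & b7 & r7));
  out.c = (uint8_t)((!a7 & b7) | (b7 & r7) | (!a7 & r7));
  return out;
}
```

With these definitions, `sub8_flags(32, 64)` gives R = 224 with C = 1 and V = 0, and `sub8_flags(160, 64)` (the signed case -96 - 64) gives R = 96 with V = 1 and C = 0, in agreement with Figures 3.22 and 3.24.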
There are some instructions that operate only on signed numbers and others that work only for unsigned numbers. An error will occur if you use unsigned instructions after operating on signed numbers, and vice versa. There are some applications where arithmetic errors are not possible. For example, if we had two 8-bit unsigned numbers that we knew were in the range of 0 to 100, then no overflow is possible when they are added together. Typically, the numbers we are processing are either signed or unsigned (but not both), so we need only consider the corresponding C or V bit (but not both the C and V bits at the same time). In other words, if the two numbers are unsigned, then we look at the C bit and ignore the V bit. Conversely, if the two numbers are signed, then we look at the V bit and ignore the C bit.

There are two appropriate mechanisms to deal with the potential for arithmetic errors when adding and subtracting. The first mechanism, used by most compilers, is called promotion. Promotion involves increasing the precision of the input numbers, and performing the operation at that higher precision. An error can still occur if the result is stored back into the smaller precision. Fortunately, the program has the ability to test the intermediate result to see if it will fit into the smaller precision. To promote an unsigned number we add zeros to the left side. In a previous example, we added the unsigned 8-bit 224 to 64 and got the wrong result of 32. With promotion, we first convert the two 8-bit numbers to 16 bits, then add. In decimal, 224 + 64 = 288.
We can check the 16-bit intermediate result (e.g., 288) to see if the answer will fit back into the 8-bit result. In Figure 3.25, A and B are 8-bit unsigned inputs, A16, B16, and R16 are 16-bit intermediate values, and R is an 8-bit unsigned output. The oval symbol represents the entry and exit points, the rectangle is used for calculations, and the diamond shows a decision. We will use parallelograms to perform input/output functions. Figure 3.25 Flowcharts showing how to use promotion to detect and correct unsigned arithmetic errors.
[Figure 3.25 flowcharts. Unsigned add: promote A to A16 and B to B16; R16 = A16 + B16; if R16 > 255 (overflow) then R = 255, otherwise R = R16. Unsigned sub: promote A to A16 and B to B16; R16 = A16 - B16; if R16 < 0 (underflow) then R = 0, otherwise R = R16.]
The C code in Program 3.2 adds and subtracts two 8-bit values, using promotion to detect errors.

Program 3.2 Using promotion to detect and compensate for unsigned overflow errors.
unsigned char A,B,R;
void add(void){ unsigned short result;
  result = A+B;      /* promote and perform 16-bit addition */
  if(result>255){    /* check for overflow */
    result = 255;    /* yes, overflow occurred */
  }
  R = result;        /* demote back to 8 bits */
}
void sub(void){ short result;
  result = A-B;      /* promote and perform 16-bit subtraction */
  if(result<0){      /* check for underflow */
    result = 0;      /* yes, underflow occurred */
  }
  R = result;        /* demote back to 8 bits */
}
To promote a signed number, we duplicate the sign bit as we add binary digits to the left side. Earlier, we performed the 8-bit signed operation -96 - 64 and got a signed overflow. With promotion we first convert the two numbers to 16 bits, then subtract. In decimal, -96 - 64 = -160.

We can check the 16-bit intermediate result (e.g., -160) to see if the answer will fit back into the 8-bit result. In Figure 3.26, A and B are 8-bit signed inputs, A16, B16, and R16 are 16-bit signed intermediate values, and R is an 8-bit signed output.

Figure 3.26 Flowcharts showing how to use promotion to detect and correct signed arithmetic errors.
[Figure 3.26 flowcharts. Signed add: promote A to A16 and B to B16; R16 = A16 + B16; if R16 > 127 (overflow) then R = 127; if R16 < -128 (underflow) then R = -128; otherwise R = R16. Signed sub: the same steps with R16 = A16 - B16.]
The C code in Program 3.3 adds and subtracts two 8-bit signed numbers. The compiler will automatically promote A and B to signed 16-bit values before the addition. Program 3.3 Using promotion to detect and compensate for signed overflow errors.
char A,B,R;
void add(void){ short result;
  result = A+B;       /* promote and perform 16-bit addition */
  if(result>127){     /* check for overflow */
    result = 127;     /* yes, overflow occurred */
  }
  if(result<-128){    /* check for underflow */
    result = -128;    /* yes, underflow occurred */
  }
  R = result;         /* demote back to 8 bits */
}
void sub(void){ short result;
  result = A-B;       /* promote and perform 16-bit subtraction */
  if(result>127){     /* check for overflow */
    result = 127;     /* yes, overflow occurred */
  }
  if(result<-128){    /* check for underflow */
    result = -128;    /* yes, underflow occurred */
  }
  R = result;         /* demote back to 8 bits */
}
Observation: When performing calculations on 8-bit numbers, most C compilers for the 9S12 will first promote to 16 bits, perform the operations using 16-bit operations, then demote the result back to 8 bits.
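The two promotion rules (add zeros on the left for unsigned, duplicate the sign bit for signed) can be written out at the bit level. A C compiler does this implicitly; the sketch below, with the hypothetical names `promote_unsigned` and `promote_signed_bits`, only makes the bit patterns visible.

```c
#include <stdint.h>

/* Unsigned promotion: the upper byte is filled with zeros. */
uint16_t promote_unsigned(uint8_t x){
  return (uint16_t)x;
}

/* Signed promotion shown on raw bit patterns: bit 7 of the 8-bit
   input is copied into bits 8 through 15 of the 16-bit output. */
uint16_t promote_signed_bits(uint8_t x){
  if(x & 0x80){                       /* sign bit set, value is negative */
    return (uint16_t)(0xFF00u | x);   /* fill the upper byte with ones */
  }
  return (uint16_t)x;                 /* sign bit clear, fill with zeros */
}
```

For instance, the 8-bit pattern $A0 (signed -96) promotes to $FFA0, which is still -96 in 16-bit two's complement, while the unsigned pattern $E0 (224) promotes to $00E0.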
Common Error: Even though most C compilers automatically promote to a higher precision during the intermediate calculations, they do not check for overflow when demoting the result back to the original format.
We will put off implementing the functions of Programs 3.2 and 3.3 in assembly language until Chapter 10 (Programs 10.4 and 10.5), after the necessary registers and instructions have been introduced. On the other hand, with just a couple more instructions, we can use another approach to detect and correct overflow errors occurring during addition and subtraction. The following instructions are a few of the conditional branch instructions available on the 9S12. In each of these cases, the instruction will test one of the condition code bits and branch (change the PC) if the condition exists. If the condition is false, then the program does not branch, and the computer will continue execution with the instruction immediately following the conditional branch. target can be any nearby¹ label within our program.

  bcc target ;Branch to target if C=0
  bcs target ;Branch to target if C=1
  beq target ;Branch to target if Z=1
  bne target ;Branch to target if Z=0
  bmi target ;Branch to target if N=1
  bpl target ;Branch to target if N=0
  bra target ;Branch to target always
  brn target ;Branch to target never
  bvc target ;Branch to target if V=0
  bvs target ;Branch to target if V=1
  jmp target ;Branch to target always, extended addressing
The other mechanism for handling addition and subtraction errors is called ceiling and floor. It is analogous to movements inside a room. If we try to move up (add a positive number or subtract a negative number) the ceiling will prevent us from exceeding the bounds of the room. Similarly, if we try to move down (subtract a positive number or add a negative number) the floor will prevent us from going too low. The ceiling and floor prevent us from leaving the room. For our 8-bit addition and subtraction, we will prevent the 0 to 255 and 255 to 0 crossovers for unsigned operations and the -128 to 127 and 127 to -128 crossovers for signed operations. These operations are described by the flowcharts in Figure 3.27. If the carry bit is set after an unsigned addition, the result is adjusted to the largest possible unsigned number (ceiling). If the carry bit is set after an unsigned subtraction, the result is adjusted to the smallest possible unsigned number (floor).

Figure 3.27 Flowcharts showing how to use overflow bits to detect and correct unsigned arithmetic errors.
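[Figure 3.27 flowcharts. Unsigned add: R = A + B; if C = 1 then R = 255; if C = 0 the result is kept. Unsigned sub: R = A - B; if C = 1 then R = 0; if C = 0 the result is kept.]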
¹ The branch location must be within 127 bytes of the current location.
There are no mechanisms in C to access the condition code bits of the processor. So, implementation of this approach must be performed in assembly language. The pseudo-op rmb stands for reserve multiple bytes. The operand field specifies the number of bytes, and we use it to create global variables. Assume A8, B8, and R8 are three 8-bit (1-byte) global variables defined in RAM.

A8    rmb  1     ;Input
B8    rmb  1     ;Input
R8    rmb  1     ;Output
The following assembly language adds two unsigned 8-bit numbers, using the algorithm presented in Figure 3.27.

      ldaa A8    ;get first input
      adda B8    ;A8+B8
      bcc  OK1   ;if C=0, then no error, so skip to the end
      ldaa #255  ;overflow
OK1   staa R8
The following assembly language subtracts two unsigned 8-bit numbers.

      ldaa A8    ;get first parameter
      suba B8    ;A8-B8
      bcc  OK2   ;if C=0, then no error, so skip to the end
      ldaa #0    ;underflow
OK2   staa R8
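Although C cannot read the carry bit directly, the same ceiling and floor behavior can be approximated by testing for the condition that would have set it. The sketch below is one possible way to do that; `add_ceiling` and `sub_floor` are hypothetical names, and the comparisons stand in for the bcc tests in the assembly above.

```c
#include <stdint.h>

/* Unsigned 8-bit add with ceiling. The 8-bit sum is smaller than an
   input exactly when the addition wrapped past 255 (the C=1 case). */
uint8_t add_ceiling(uint8_t a, uint8_t b){
  uint8_t r = (uint8_t)(a + b);   /* 8-bit wraparound addition */
  if(r < a){                      /* wrapped, so the carry would be set */
    r = 255;                      /* ceiling */
  }
  return r;
}

/* Unsigned 8-bit subtract with floor. A borrow (the C=1 case)
   occurs exactly when a is less than b. */
uint8_t sub_floor(uint8_t a, uint8_t b){
  uint8_t r = (uint8_t)(a - b);   /* 8-bit wraparound subtraction */
  if(a < b){                      /* borrow, so the carry would be set */
    r = 0;                        /* floor */
  }
  return r;
}
```

For example, `add_ceiling(224, 64)` returns 255 instead of the wrapped value 32, and `sub_floor(32, 64)` returns 0 instead of 224.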
Signed addition and subtraction are described by the flowcharts in Figure 3.28. If the overflow bit is set after a signed operation, the result is adjusted to the smallest possible signed number (floor) if it was a -128 to 127 crossover (N=0), or to the largest possible signed number (ceiling) if it was a 127 to -128 crossover (N=1). Notice that after a signed overflow, the sign of the result is always wrong because there was a crossover.

Figure 3.28 Flowcharts showing how to use overflow bits to detect and correct signed arithmetic errors.
[Figure 3.28 flowcharts. Signed add: R = A + B; if V = 0 the result is kept; if V = 1 and N = 1 then R = 127; if V = 1 and N = 0 then R = -128. Signed sub: the same steps with R = A - B.]
The following assembly language adds two signed 8-bit numbers, using the algorithm presented in Figure 3.28.

       ldaa A8      ;get first input
       adda B8      ;A8+B8
       bvc  ok3     ;if V=0, then no error, so skip to the end
err3   bpl  under3
       ldaa #127    ;if V=1 and N=1, it was overflow
       bra  ok3
under3 ldaa #-128   ;if V=1 and N=0, it was underflow
ok3    staa R8
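The signed ceiling and floor logic can also be modeled in C by recomputing the V and N bits from the operand and result sign bits. This is a sketch under that assumption, not a way to read the processor's flags; `sadd_sat` is a hypothetical name, and the exclusive-or test used for V is an equivalent form of the sign-bit equation for addition (both inputs have a sign that differs from the result).

```c
#include <stdint.h>

/* Signed 8-bit add with ceiling and floor, following Figure 3.28. */
int8_t sadd_sat(int8_t a, int8_t b){
  uint8_t ua = (uint8_t)a;                 /* raw 8-bit patterns */
  uint8_t ub = (uint8_t)b;
  uint8_t ur = (uint8_t)(ua + ub);         /* same bits adda would produce */
  uint8_t v  = (uint8_t)(((ua ^ ur) & (ub ^ ur)) >> 7); /* modeled V bit */
  if(v){
    if(ur & 0x80){                         /* N=1: 127 to -128 crossover */
      return 127;                          /* ceiling */
    }
    return -128;                           /* N=0: -128 to 127 crossover, floor */
  }
  if(ur & 0x80){                           /* negative result, no overflow */
    return (int8_t)((int)ur - 256);        /* convert pattern to signed value */
  }
  return (int8_t)ur;
}
```

As a usage example, `sadd_sat(96, 64)` returns 127 rather than -96, and `sadd_sat(-96, -64)` returns -128 rather than 96.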
The following assembly language subtracts two signed 8-bit numbers.

       ldaa A8      ;get first parameter
       suba B8      ;A8-B8
       bvc  ok4     ;if V=0, then no error, so skip to the end
err4   bpl  under4
       ldaa #127    ;if V=1 and N=1, it was overflow
       bra  ok4
under4 ldaa #-128   ;if V=1 and N=0, it was underflow
ok4    staa R8

3.9
Arithmetic Operations: Multiplication and Divide

The algorithms for multiply and divide are included in this embedded systems book for two reasons. First, as programmers, we need to know how the computer works, so we can understand the strengths and limitations of our computer. Second, many embedded microcomputers have limited ability for performing mathematical operations. Although the 9S12 has signed and unsigned 16-bit multiplication, sometimes we need more precision than is supported by the processor. In these situations, we must use a more powerful microcomputer (if speed is important) or develop software algorithms for extended precision arithmetic (if speed is not important).

We can perform unsigned multiplication using a combination of shift and addition operations. Let A and B be two unsigned 8-bit numbers, and R = A•B. Simple calculations of 0•0 = 0 and 255•255 = 65025 illustrate the fact that the multiplication of two 8-bit numbers will fit into a 16-bit product. To develop the algorithm used by the multiplication hardware, define one of the multiplicands in its basis representation.

$B = 128 \cdot b_7 + 64 \cdot b_6 + 32 \cdot b_5 + 16 \cdot b_4 + 8 \cdot b_3 + 4 \cdot b_2 + 2 \cdot b_1 + b_0$

Next, we distribute multiplication over addition.

$R = A \cdot 128 \cdot b_7 + A \cdot 64 \cdot b_6 + A \cdot 32 \cdot b_5 + A \cdot 16 \cdot b_4 + A \cdot 8 \cdot b_3 + A \cdot 4 \cdot b_2 + A \cdot 2 \cdot b_1 + A \cdot b_0$

We can simplify the equation leaving only one-bit shifts.

$R = 2 \cdot (2 \cdot (2 \cdot (2 \cdot (2 \cdot (2 \cdot (2 \cdot A \cdot b_7 + A \cdot b_6) + A \cdot b_5) + A \cdot b_4) + A \cdot b_3) + A \cdot b_2) + A \cdot b_1) + A \cdot b_0$

The multiplication by a power of 2 is a logical shift left, and the multiplication by a binary bit (0 or 1) is an add or no-add conditional. This equation motivates the following multiplication algorithm. The multiplication function will be implemented as digital hardware in the processor, and available as an assembly language instruction, but here it is shown as a C function in Program 3.4. In particular, this exact function is available as the assembly instruction mul.

Program 3.4 Unsigned 8-bit times 8-bit multiplication yielding a 16-bit product.

unsigned short mul(unsigned char A, unsigned char B){
unsigned short R = 0;   /* result, R=A*B */
int n;
  for(n=0; n<8; n++){
    R = R<<1;           /* shift left */
    if(B&0x80){         /* should we add? */
      R = R+A;          /* A is promoted first, then added */
    }
    B = B<<1;           /* move next bit into bit 7 position */
  }
  return R;
}

For an 8-bit multiply, we will use 16-bit shifts and additions, yielding a 16-bit product. Since the product, R, is a 16-bit unsigned number, there can be no overflow error in this 8 by 8 into 16-bit multiply. To better understand multiplication, we will hand execute Program 3.4 for A = 100, B = 10. In particular, Table 3.18 shows the values of n, B, and R after the if-statement, but before the B = B << 1.
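Table 3.18 Hand execution of Program 3.4 for A = 100, B = 10.

n   B           R      Comments
0   %00001010   0      B7 is zero, no addition
1   %00010100   0      B6 is zero, no addition
2   %00101000   0      B5 is zero, no addition
3   %01010000   0      B4 is zero, no addition
4   %10100000   100    B3 is one, add 100 to R
5   %01000000   200    B2 is zero, no addition
6   %10000000   500    B1 is one, add 100 to R
7   %00000000   1000   B0 is zero, no addition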
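The hand execution can be reproduced in software by recording R at the same point in each pass of the loop. The sketch below repeats the shift-and-add loop of Program 3.4 with an added trace; the function name `mul_trace` and its `trace` array parameter are assumptions made for this illustration.

```c
#include <stdint.h>

/* Shift-and-add multiply that also records R after the add/no-add
   decision of each pass, before B is shifted. trace must have room
   for 8 entries. */
uint16_t mul_trace(uint8_t a, uint8_t b, uint16_t trace[8]){
  uint16_t r = 0;                 /* running product */
  int n;
  for(n = 0; n < 8; n++){
    r = (uint16_t)(r << 1);       /* shift left */
    if(b & 0x80){                 /* is the current top bit of B set? */
      r = (uint16_t)(r + a);      /* yes, add A */
    }
    trace[n] = r;                 /* value of R after the if-statement */
    b = (uint8_t)(b << 1);        /* move next bit into bit 7 position */
  }
  return r;
}
```

Calling `mul_trace(100, 10, t)` returns 1000 and fills `t` with 0, 0, 0, 0, 100, 200, 500, 1000.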
For signed multiplication we will first check the sign of each input, perform an unsigned multiplication on the absolute values of the inputs, then negate the product if necessary. Simple calculations of (-128)•(-128) = 16384, (-128)•127 = -16256, and 127•127 = 16129 illustrate the fact that the multiplication of two 8-bit signed numbers will always fit into a signed 16-bit product. The algorithm for signed multiplication is presented as a C function in Program 3.5.
Program 3.5 Signed 8-bit times 8-bit multiplication yielding a 16-bit product.
short smul(char A, char B){
int sign = 0;            /* 0 means positive */
  if(A < 0){
    sign++;              /* A is negative */
    A = -A;              /* absolute value */
  }
  if(B < 0){
    sign--;              /* B is negative */
    B = -B;              /* absolute value */
  }
  if(sign)
    return -mul(A,B);    /* product is negative */
  else
    return mul(A,B);     /* product is positive */
}
We can perform unsigned division using a combination of shift and subtract operations. Let N be the unsigned 16-bit dividend and M be the unsigned 8-bit divisor. The 8-bit quotient is Q = N/M, and the 8-bit remainder is R = N%M. Assuming the remainder is less than the divisor, there is a unique solution (Q,R) such that N equals M*Q + R. In C, the division (/) and modulo (%) are separate operators, but in digital hardware and assembly language one operation produces both results. An error can occur in two ways in a 16-bit by 8-bit division. A divide by zero causes an error, and an overflow error occurs if the quotient does not fit into 8 bits.

Binary long division is very similar to decimal long division, the way you learned to divide in elementary school. We line the divisor up under the dividend, shifted as far to the left as possible so that the lined-up (shifted) divisor is less than or equal to the dividend. In decimal division, we have to determine the decimal digit 0 to 9 of the quotient and multiply the divisor by that digit. In binary division, the binary digits are 0 or 1, so no multiplication step is required. We just subtract the shifted divisor from the dividend and record a 1 in that place-value for the quotient. We repeat the operation on the result of the subtraction. Figure 3.29 shows the two shift/subtract operations required to divide 1004 by 100, yielding a quotient of 10 and a remainder of 4. The divisor is subtracted twice, yielding the two binary ones in the quotient of 10 (%1010).

Figure 3.29 Binary long division example showing 1004 divided by 100.
Program 3.6, written in C, uses global variables to input/output parameters. The dividend, N, will be modified during execution and become the remainder. Again, division on an actual computer is usually performed in hardware by the processor, and the C program is presented here only to illustrate the division algorithm. The available assembly instructions for division will be presented in the next chapter. A 64-bit by 32-bit assembly language version of this algorithm can be found in the file math.rtf, which is included as one of the TExaS examples, and it will be presented in Chapter 8. Program 3.6 Unsigned 16-bit by 8-bit division yielding an 8-bit quotient and an 8-bit remainder.
unsigned short N;  /* dividend, becomes the remainder */
unsigned char M;   /* divisor */
unsigned char Q;   /* quotient */
int error;         /* -1 is divide by zero, 0 is OK, 1 is overflow */
void div(void){ unsigned short M16; int i;
  if(M) error = 0;
  else{
    error = -1;
    return;            /* divide by zero */
  }
  Q = 0;               /* actually this step can be omitted */
  M16 = M<<8;          /* M16 is divisor left-justified in 16 bits */
  for(i=0; i<8; i++){
    M16 = M16>>1;      /* logical shift right */
    Q = Q<<1;          /* logical shift left */
    if(N >= M16){      /* should we subtract? */
      Q = Q|0x01;      /* yes, set bit in the quotient */
      N -= M16;        /* reduce dividend, transform into remainder */
    }                  /* table values collected at this point */
  }
  if(N >= M)           /* N is now the remainder */
    error = 1;         /* overflow if it is not smaller than the divisor */
}
Hand execution, or desk checking, is a convenient way to test a software algorithm. Table 3.19 shows the values of i, N, M16, and Q within the for-loop after the if-statement. Assume initially N equals 1004 and M equals 100. The statement M16 = M<<8; will initialize M16 to 25600.
Table 3.19 Example division of 1004 divided by 100.

i   N      M16     Q    Comments
0   1004   12800   0    Q7 is zero
1   1004   6400    0    Q6 is zero
2   1004   3200    0    Q5 is zero
3   1004   1600    0    Q4 is zero
4   204    800     1    Q3 is one, add bit to Q, subtract from N
5   204    400     2    Q2 is zero
6   4      200     5    Q1 is one, add bit to Q, subtract from N
7   4      100     10   Q0 is zero
For signed division, we force the remainder to be the same sign as the dividend. The absolute value of the remainder will be less than the absolute value of the divisor, and N equals M*Q + R.

Checkpoint 3.41: Why do we have to specify a rule for the sign of the remainder?
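One way to apply the signed division rule is to divide the absolute values and then fix the signs, in the same spirit as the signed multiply of Program 3.5. The following C sketch assumes 8-bit signed inputs, 16-bit outputs, and a nonzero divisor; the name `sdiv8` and its pointer parameters are hypothetical.

```c
#include <stdint.h>

/* Signed divide built on an unsigned divide. The remainder takes the
   sign of the dividend n, so n == m*(*q) + (*r) always holds.
   Assumes m is not zero. */
void sdiv8(int8_t n, int8_t m, int16_t *q, int16_t *r){
  uint16_t un = (uint16_t)((n < 0) ? -n : n);  /* |n|, computed at int width */
  uint16_t um = (uint16_t)((m < 0) ? -m : m);  /* |m| */
  uint16_t uq = (uint16_t)(un / um);           /* unsigned quotient */
  uint16_t ur = (uint16_t)(un % um);           /* unsigned remainder */
  if((n < 0) != (m < 0)){
    *q = (int16_t)(-(int16_t)uq);              /* signs differ, negative quotient */
  } else {
    *q = (int16_t)uq;                          /* signs match, positive quotient */
  }
  if(n < 0){
    *r = (int16_t)(-(int16_t)ur);              /* remainder follows the dividend */
  } else {
    *r = (int16_t)ur;
  }
}
```

For example, dividing -7 by 2 gives a quotient of -3 and a remainder of -1, and dividing 7 by -2 gives a quotient of -3 and a remainder of 1. In both cases M*Q + R reproduces the dividend.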
The mul instruction performs an 8-bit by 8-bit into 16-bit unsigned multiply, giving RegD = RegA*RegB, as shown in Figure 3.30. No overflow is possible. It can also be used to perform an 8 by 8-bit into 8-bit signed multiply, giving RegB = RegA*RegB.

Figure 3.30 The mul instruction takes two 8-bit inputs and generates a 16-bit product.
[Figure 3.30: Register A (8 bits) * Register B (8 bits) = Register D (16 bits).]
Condition code bits are set after R = A*B.
C: R7, set if bit 7 of 16-bit result is one

Checkpoint 3.42: Prove the mul instruction can't overflow.
Example 3.13 Write assembly code to implement unsigned M = 5*N + 25, where M is 16 bits and N is 8 bits.

Solution First, we perform an 8-bit read, bringing N into Register A. Second, we bring the constant into Register B. Third, we multiply the value of N times 5. This results in a 16-bit product in Register D. Next, we add the constant 25 to Register D, and lastly we store the result into M. Since the result is stored into a 16-bit variable, and the largest possible result is 5*255 + 25 = 1300, no overflow can occur.

      ldaa N     ;0 to 255
      ldab #5
      mul        ;RegD=5*N, 0 to 1275
      addd #25   ;RegD=5*N+25, 25 to 1300
      std  M
The idiv instruction performs a 16-bit by 16-bit unsigned divide with remainder, giving RegX = RegD/RegX, as shown in Figure 3.31. Register D is the remainder.

Figure 3.31 The idiv instruction takes two 16-bit inputs and generates a 16-bit quotient and a 16-bit remainder.
[Figure 3.31: Register D / Register X = Register X, remainder = Register D.]
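Condition code bits are set after quotient = dividend/divisor, or Q = D/X. In these equations, • is logical AND and an overbar is the complement.

Z: result is zero, $Z = \overline{Q_{15}} \cdot \overline{Q_{14}} \cdot \overline{Q_{13}} \cdot \overline{Q_{12}} \cdot \overline{Q_{11}} \cdot \overline{Q_{10}} \cdot \overline{Q_9} \cdot \overline{Q_8} \cdot \overline{Q_7} \cdot \overline{Q_6} \cdot \overline{Q_5} \cdot \overline{Q_4} \cdot \overline{Q_3} \cdot \overline{Q_2} \cdot \overline{Q_1} \cdot \overline{Q_0}$
V: 0
C: divide by zero, $C = \overline{X_{15}} \cdot \overline{X_{14}} \cdot \overline{X_{13}} \cdot \overline{X_{12}} \cdot \overline{X_{11}} \cdot \overline{X_{10}} \cdot \overline{X_9} \cdot \overline{X_8} \cdot \overline{X_7} \cdot \overline{X_6} \cdot \overline{X_5} \cdot \overline{X_4} \cdot \overline{X_3} \cdot \overline{X_2} \cdot \overline{X_1} \cdot \overline{X_0}$

Example 3.14 Write assembly code to implement M = 2.3*(N + 5.5), where M is 16 bits and N is 8 bits.

Solution First, rewrite this equation using integer operations. Multiplying by 100 and dividing by 100 yields M = (230*N + 1265)/100. We bring the 8-bit N into Register A, and set Register B to 230. The mul instruction gives an unsigned 16-bit value equal to 230*N. Next, we perform a 16-bit addition to get 230*N + 1265. We use the idiv instruction to implement the divide by 100. This results in a 16-bit result in Register X, and lastly, we store the result into M. Observe the comments of this program, which carefully consider the range of values at each step of the calculation in order to guarantee that overflow cannot occur.

      ldaa N      ;0 to 255
      ldab #230
      mul         ;230*N, 0 to 58650
      addd #1265  ;230*N+1265, 1265 to 59915
      ldx  #100
      idiv        ;(230*N+1265)/100, 12 to 599
      stx  M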
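The range analysis of Example 3.14 can be mirrored in C, with each intermediate held in a 16-bit unsigned variable just as it is held in Register D or Register X. This is a sketch for checking the arithmetic, not a replacement for the assembly; the function name `scale_2p3` is hypothetical.

```c
#include <stdint.h>

/* Integer version of M = 2.3*(N + 5.5), computed as (230*N + 1265)/100.
   Each step fits in 16 bits for every 8-bit input N. */
uint16_t scale_2p3(uint8_t n){
  uint16_t product = (uint16_t)(230u * n);         /* like mul:  0 to 58650 */
  uint16_t sum     = (uint16_t)(product + 1265u);  /* like addd: 1265 to 59915 */
  return (uint16_t)(sum / 100u);                   /* like idiv: 12 to 599 */
}
```

For instance, `scale_2p3(0)` returns 12, `scale_2p3(10)` returns 35, and `scale_2p3(255)` returns 599. The division truncates, as idiv does.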
Example 3.15 Write assembly code to scale an unsigned 8-bit integer into a number from 0 to 500.

Solution Let N be the 8-bit input and M be the 16-bit output. The range of inputs is 0 to 255, so the conversion equation using integer operations is M = (500*N)/255. If we factor out a 5, giving M = (100*N)/51, the constants will become small enough to use the 8-bit mul instruction. The mul instruction gives an unsigned 16-bit value equal to 100*N. We use the idiv instruction to implement the divide by 51. This results in a 16-bit result in Register X, and lastly, we store the result into M.

      ldaa N     ;0 to 255
      ldab #100
      mul        ;100*N, 0 to 25500
      ldx  #51
      idiv       ;(100*N)/51, 0 to 500
      stx  M
Checkpoint 3.43: Give a single mathematical equation relating the dividend, divisor, quotient, and remainder. This equation gives a unique solution as long as you assume the remainder is strictly less than the divisor. Assume the sign of the remainder matches the sign of the dividend.
The fdiv instruction also performs a 16-bit by 16-bit unsigned divide with remainder. In contrast, this instruction calculates RegX = (2^16*RegD)/RegX, as shown in Figure 3.32. RegD is the remainder.
Figure 3.32 The fdiv instruction takes two 16-bit inputs and generates a 16-bit quotient and a 16-bit remainder.

[Figure 3.32: (Register D followed by 16 zero bits) / Register X = Register X, remainder = Register D.]
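Condition code bits are set after R = (65536*D)/X.

Z: result is zero, $Z = \overline{R_{15}} \cdot \overline{R_{14}} \cdot \overline{R_{13}} \cdot \overline{R_{12}} \cdot \overline{R_{11}} \cdot \overline{R_{10}} \cdot \overline{R_9} \cdot \overline{R_8} \cdot \overline{R_7} \cdot \overline{R_6} \cdot \overline{R_5} \cdot \overline{R_4} \cdot \overline{R_3} \cdot \overline{R_2} \cdot \overline{R_1} \cdot \overline{R_0}$
V: overflow if RegX is less than or equal to RegD, result $FFFF
C: divide by zero, $C = \overline{X_{15}} \cdot \overline{X_{14}} \cdot \overline{X_{13}} \cdot \overline{X_{12}} \cdot \overline{X_{11}} \cdot \overline{X_{10}} \cdot \overline{X_9} \cdot \overline{X_8} \cdot \overline{X_7} \cdot \overline{X_6} \cdot \overline{X_5} \cdot \overline{X_4} \cdot \overline{X_3} \cdot \overline{X_2} \cdot \overline{X_1} \cdot \overline{X_0}$

Example 3.16 Write assembly code to implement M = 12.34*N, where M and N are unsigned 16 bits.

Solution Remember, the fdiv instruction calculates RegX = (65536*RegD)/RegX. If we initialize RegD to equal N, and set RegX to be some fixed constant I, then fdiv can be used to calculate (65536*N)/I. To make it work out, we approximate 12.34 as 65536/I. I.e., we want I to equal 65536/12.34, which is about 5311. The software first brings the 16-bit value of N into Register D. The constant 5311 is loaded into Register X. The fdiv instruction places the desired result into Register X. The overflow bit is set if N is greater than or equal to 5311, making 12.34*N larger than 65535.

      ldd  N      ;0 to 5310
      ldx  #5311
      fdiv        ;RegX=(65536*N)/5311, 0 to 65523
      stx  M      ;V is set if the result is wrong

Checkpoint 3.44: Let N and M be 8-bit unsigned locations. Write assembly code to implement M = (7*N)/31.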
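To experiment with the fractional divide on a desktop compiler, the behavior described above can be modeled with 32-bit arithmetic. This is only a sketch based on the description in this section; `FdivResult` and `fdiv_model` are hypothetical names, and leaving the remainder unchanged in the overflow case is an assumption of this model.

```c
#include <stdint.h>

/* Hypothetical result record: x is the quotient (new Register X),
   d is the remainder (new Register D), v and c model the V and C bits. */
typedef struct { uint16_t x, d; uint8_t v, c; } FdivResult;

/* Model of X = (65536*D)/X with remainder in D. */
FdivResult fdiv_model(uint16_t d, uint16_t x){
  FdivResult out;
  out.c = (uint8_t)(x == 0);          /* C: divide by zero */
  out.v = (uint8_t)(x <= d);          /* V: quotient would not fit in 16 bits */
  if(out.v){
    out.x = 0xFFFF;                   /* result is $FFFF on overflow */
    out.d = d;                        /* assumption: remainder left unchanged */
  } else {
    uint32_t numerator = (uint32_t)d << 16;   /* 65536*D */
    out.x = (uint16_t)(numerator / x);        /* 16-bit quotient */
    out.d = (uint16_t)(numerator % x);        /* 16-bit remainder */
  }
  return out;
}
```

With the constant 5311 from Example 3.16, `fdiv_model(100, 5311)` produces a quotient of 1233 (close to 12.34*100) with V clear, while `fdiv_model(5311, 5311)` sets V and returns $FFFF.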
3.10
Character Information

We can use bytes to represent characters with the American Standard Code for Information Interchange (ASCII) code. Standard ASCII is actually only 7 bits, but is stored using 8-bit bytes with the most significant bit equal to 0. For example, the capital 'V' is defined by the 8-bit binary pattern %01010110. Table 3.20 shows the ASCII code for some of the commonly used nonprinting characters.
Table 3.20 Common special characters and their ASCII representations.

Abbr.   ASCII Character       Binary      Hexadecimal   Decimal
BS      Delete or Backspace   %00001000   $08           8
HT      Tab                   %00001001   $09           9
CR      Enter or Return       %00001101   $0D           13
LF      Line feed             %00001010   $0A           10
SP      Space                 %00100000   $20           32
The 7-bit ASCII code definitions are given in Table 3.21. For example, the letter 'V' is in the 5 column (bits 4 to 6, $50) and the 6 row (bits 0 to 3). Putting the two together yields hexadecimal $56.

Table 3.21 Standard 7-bit ASCII.
                            BITS 4 to 6
BITS 0 to 3   0     1          2    3    4    5    6    7
0             NUL   DLE        SP   0    @    P    `    p
1             SOH   DC1/XON    !    1    A    Q    a    q
2             STX   DC2        "    2    B    R    b    r
3             ETX   DC3/XOFF   #    3    C    S    c    s
4             EOT   DC4        $    4    D    T    d    t
5             ENQ   NAK        %    5    E    U    e    u
6             ACK   SYN        &    6    F    V    f    v
7             BEL   ETB        '    7    G    W    g    w
8             BS    CAN        (    8    H    X    h    x
9             HT    EM         )    9    I    Y    i    y
A             LF    SUB        *    :    J    Z    j    z
B             VT    ESC        +    ;    K    [    k    {
C             FF    FS         ,    <    L    \    l    |
D             CR    GS         -    =    M    ]    m    }
E             SO    RS         .    >    N    ^    n    ~
F             SI    US         /    ?    O    _    o    DEL
Checkpoint 3.45: How is the character 0 represented in ASCII? Checkpoint 3.46: Assume Register A contains an ASCII code 0 to 9. Write assembly code that converts the ASCII code into the corresponding decimal number.
Standard ASCII code uses only 7 bits and thus can only represent 128 different characters. The ISO/IEC 8859 standard uses the eighth bit of the byte to define additional characters such as graphics and letters in other alphabets. This standard is jointly published by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Unfortunately, there can be one character with multiple numerical encodings or one numerical value that could represent different characters. This ambiguity has led to more complex encoding schemes using multiple bytes to represent character data such as the Unicode Standard, see http://www.unicode.org/. The Unicode Consortium is an active and ongoing organization with a goal to provide a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. ISO/IEC 10646 is the corresponding international standard synchronized with the Unicode Standard. As embedded systems are asked to communicate with other computers across the world, these standards will be a critical component for guaranteeing unambiguous communication.

One way to encode a character string is to use null-termination. In this way, the characters of the string are stored one right after the other, and the end of the string is signified by the NUL character (0). For example, the string "Valvano" is encoded as these eight bytes: $56, $61, $6C, $76, $61, $6E, $6F, $00. Typically we use a pointer to the first byte to identify the string, as shown in Figure 3.33.

Figure 3.33 Strings are stored as a sequence of ASCII characters, followed by a null.
Pointer ->  $56  $61  $6C  $76  $61  $6E  $6F  $00
            V    a    l    v    a    n    o    NUL
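Walking a null-terminated string with a pointer can be sketched in C as follows. The function name `str_length` is chosen for this illustration (the standard library offers `strlen` for the same job); it counts bytes until it reaches the NUL terminator.

```c
#include <stddef.h>

/* Count the characters in a null-terminated string, not including
   the terminating NUL. pt points to the first byte of the string. */
size_t str_length(const unsigned char *pt){
  size_t count = 0;
  while(*pt != 0){    /* stop at the NUL character */
    count++;          /* one more character seen */
    pt++;             /* advance to the next byte */
  }
  return count;
}
```

Given the eight bytes $56, $61, $6C, $76, $61, $6E, $6F, $00, the function returns 7, because the terminator itself is not counted.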
Checkpoint 3.47: How is “Hello World” encoded as a null-terminated ASCII string?
3.11
Conversions

We will illustrate the conversion between ASCII strings and binary numbers by developing high level functions in C. The purpose of introducing C programs is to introduce the conversion process. Later we will learn how to perform these operations in assembly language. In the first example, let Data be a fixed length string of three ASCII characters. Each entry of Data is an ASCII character 0 to 9. Let Data[0] be the ASCII code for the hundred's digit, Data[1] be the ten's digit and Data[2] be the one's digit. Let n be an unsigned 16-bit integer. We will also need an index, i. From Table 3.21, we see the decimal digits 0 to 9 are encoded in ASCII as 0x30 to 0x39. So, to convert a single ASCII digit to binary, we simply subtract 0x30. To convert this string of three decimal digits into binary we can simply calculate

  n = 100*(Data[0]-0x30) + 10*(Data[1]-0x30) + (Data[2]-0x30);

This 3-digit ASCII string could also be calculated as

  n = (Data[2]-0x30) + 10*((Data[1]-0x30) + 10*(Data[0]-0x30));

If Data were a string of five decimal digits, we could put the above function into a loop

  n = 0;
  for(i=0; i<5; i++){
    n = 10*n + (Data[i]-0x30);
  }
If Data were a variable length string of ASCII characters terminated with a null character (0), we could convert it to binary using a while loop, as shown in Program 3.7.
Program 3.7 Unsigned ASCII string to decimal conversion.
  /* Convert ASCII string to unsigned 16-bit decimal */
  unsigned short Str2UDec(unsigned char Data[]){
    unsigned short n = 0;   /* the number */
    unsigned int i = 0;     /* index into Data */
    while (Data[i] != 0){
      n = 10*n + (Data[i]-0x30);
      i++;
    }
    return n;
  }
The example, shown in Program 3.8, also converts an ASCII string into a 16-bit unsigned decimal number. The program uses pointer syntax to access the string, and processes only the ASCII digits ‘0’ to ‘9’. Other ASCII characters in the string are ignored.
Program 3.8 Unsigned ASCII to decimal conversion.
  unsigned short Str2UDec(unsigned char *sPt){
    unsigned short n = 0;   /* the number */
    unsigned char character;
    while(*sPt){            /* accepts until null */
      character = (*sPt++); /* fetch next character */
      if((character >= '0') && (character <= '9')){
        n = 10*n+(character-0x30);   /* might overflow */
      }
    }
    return n;
  }
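The “might overflow” comment can be made concrete with a host-side sketch. We assume here that the C99 type uint16_t is available in place of unsigned short, and the name DigitsToU16 is hypothetical. The logic otherwise follows the pointer version: non-digit characters are skipped, and a value above 65535 silently keeps only its low 16 bits.

```c
#include <stdint.h>

/* Hypothetical host-side copy of the pointer-based conversion */
uint16_t DigitsToU16(const char *sPt){
  uint16_t n = 0;
  char character;
  while(*sPt){                    /* accepts until null */
    character = *sPt++;           /* fetch next character */
    if((character >= '0') && (character <= '9')){
      n = (uint16_t)(10*n + (character - '0'));  /* keeps only the low 16 bits */
    }
  }
  return n;
}
```

With this sketch the string “1,234” converts to 1234 because the comma is ignored, while “70000” converts to 4464, which is 70000 minus 65536. No error is reported in the second case.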
The example, shown in Program 3.9, uses an I/O device capable of sending and receiving ASCII characters. For 9S12 systems this device could be a PC connected to the serial port of the 9S12 microcomputer. For a simulation of this configuration see the examples in TExaS called TUT2, TUT4, or SCI. The function InChar() returns an ASCII character from the I/O device. The function OutChar() sends an ASCII character to the I/O device. The function InUDec() will accept characters from the device until a carriage return (the Enter key) is typed. Only the digits are echoed.
Program 3.9 Input an unsigned decimal number.
  #define CR 0x0D
  /* InUDec accepts ASCII input in unsigned decimal format
     and converts to a 16-bit unsigned number up to 65535.
     If n>65535, it will truncate without reporting the error */
  unsigned short InUDec(void){
    unsigned short n=0;
    unsigned char character;
    while((character=InChar()) != CR){   /* accepts until <enter> */
      if((character >= '0') && (character <= '9')){
        n = 10*n+(character-0x30);       /* overflows if above 65535 */
        OutChar(character);              /* echo this character */
      }
    }
    return n;
  }
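When no serial port is available, the input routine can be exercised on a host computer by replacing the I/O functions with stand-ins. The sketch below is an assumption of ours, not part of TExaS: InChar() reads from a scripted string, OutChar() records the echo in a buffer, and the helper names TypeAndConvert and EchoedText are hypothetical.

```c
#include <stdint.h>

#define CR 0x0D

/* Hypothetical host-side stand-ins for the serial port routines */
static const char *TypedPt = "";   /* characters waiting to be "typed" */
static char Echo[16];              /* characters echoed by OutChar */
static unsigned int EchoCount = 0;

unsigned char InChar(void){
  if(*TypedPt == 0){
    return CR;                     /* script exhausted: act like <enter> */
  }
  return (unsigned char)(*TypedPt++);
}

void OutChar(unsigned char letter){
  if(EchoCount < sizeof(Echo)-1){
    Echo[EchoCount++] = (char)letter;
    Echo[EchoCount] = 0;           /* keep the echo null-terminated */
  }
}

/* Same logic as Program 3.9, with uint16_t in place of unsigned short */
uint16_t InUDec(void){
  uint16_t n = 0;
  unsigned char character;
  while((character = InChar()) != CR){   /* accepts until <enter> */
    if((character >= '0') && (character <= '9')){
      n = (uint16_t)(10*n + (character-0x30));
      OutChar(character);                /* echo digits only */
    }
  }
  return n;
}

/* Feed one scripted line to InUDec and return the converted value */
uint16_t TypeAndConvert(const char *typed){
  TypedPt = typed;
  EchoCount = 0;
  Echo[0] = 0;
  return InUDec();
}

const char *EchoedText(void){ return Echo; }
```

Typing the scripted line “1a2b3” followed by Enter returns 123, and only the characters 1, 2, and 3 appear in the echo buffer.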
If the ASCII characters were to contain optional “+” and “-” signs, we could look for the presence of the sign character in the first position, as shown in Program 3.10. If there is a minus sign then set a flag. Next use our unsigned conversion routine to process the rest of the ASCII characters and generate the unsigned number, n. If the flag was previously set, we can negate the value n. The length is used to guarantee the + and - are only processed as the first character.
Program 3.10 Input a signed decimal number.
  /* InSDec accepts ASCII input in signed decimal format
     and converts to a signed 16-bit number */
  short InSDec(void){
    short n=0, sign=1;        /* sign flag 1=positive -1=negative */
    unsigned int length=0;
    unsigned char character;
    while((character=InChar()) != CR){   /* accepts until <enter> */
      if(!length){                       /* + or - only valid as first char */
        if(character == '-'){
          sign = -1;
          length++;
          OutChar('-');                  /* if - inputted, sign is negative */
        }
        else if(character == '+'){
          length++;
          OutChar('+');                  /* if + inputted, sign is positive */
        }
      }
      if((character >= '0') && (character <= '9')){
        n = n*10+character-'0';          /* overflows if above 32767 */
        length++;
        OutChar(character);
      }
    }
    return sign*n;
  }
To convert an unsigned integer into a fixed length string of ASCII characters, we could use the integer divide. Assume n is an unsigned integer less than or equal to 999:
  Data[0] = n/100 + 0x30;
  n = n%100;               /* n is now between 0 and 99 */
  Data[1] = n/10 + 0x30;
  n = n%10;                /* n is now between 0 and 9 */
  Data[2] = n + 0x30;
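The fixed length conversion can be packaged as a function for testing. In this sketch the function name UDec2Fix3 and the fourth array element holding a null terminator are our own additions. The caller is assumed to pass a value no larger than 999.

```c
/* Hypothetical wrapper: convert n (0 to 999) into three ASCII digits
   followed by a null, so the result can be used as a C string */
void UDec2Fix3(unsigned short n, unsigned char Data[4]){
  Data[0] = n/100 + 0x30;   /* hundreds digit */
  n = n%100;                /* n is now between 0 and 99 */
  Data[1] = n/10 + 0x30;    /* tens digit */
  n = n%10;                 /* n is now between 0 and 9 */
  Data[2] = n + 0x30;       /* ones digit */
  Data[3] = 0;              /* null termination (our addition) */
}
```

Because the format is fixed length, small numbers keep their leading zeros: the value 7 produces the string “007”.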
To convert an unsigned integer into a variable length string of ASCII characters, we convert the digits in reverse order, then switch them. This conversion technique is shown in Program 3.11.
Program 3.11 Unsigned decimal to ASCII string conversion.
  void UDec2Str(unsigned short n){
    unsigned int size,i;
    unsigned char temp;
    unsigned char Data[10];
    size = 0;                      /* the total number of ASCII characters */
    do{
      Data[size] = (n%10)+0x30;    /* Start with the one’s digit */
      n = n/10;                    /* go to next digit */
      size++;
    } while(n != 0);
    for(i=0; i< size/2; i++ ){
      temp = Data[i];              /* Reverse order */
      Data[i] = Data[size-1-i];    /* exchange, swap */
      Data[size-1-i] = temp;
    }
    Data[size] = 0;                /* Null terminated ASCII string */
  }
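In the listing above the string is built in a local array, so the caller cannot see the result. One possible variation, shown here as a sketch with a hypothetical name and signature, places the string in a buffer supplied by the caller. A 16-bit unsigned value needs at most five digits plus the null, so six bytes are assumed to be enough.

```c
#include <stdint.h>

/* Hypothetical variant: the result goes into a caller-supplied buffer
   of at least 6 bytes (5 digits plus the null) */
void UDec2StrBuf(uint16_t n, char Data[6]){
  unsigned int size = 0, i;
  char temp;
  do{
    Data[size] = (char)((n%10)+0x30);  /* start with the ones digit */
    n = n/10;                          /* go to next digit */
    size++;
  } while(n != 0);
  for(i=0; i<size/2; i++){             /* reverse the digits in place */
    temp = Data[i];
    Data[i] = Data[size-1-i];
    Data[size-1-i] = temp;
  }
  Data[size] = 0;                      /* null terminated ASCII string */
}
```

The do-while loop guarantees at least one digit, so an input of 0 produces the string “0” rather than an empty string.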
The example shown in Program 3.12 performs the same unsigned conversion, but uses recursion. The function OutChar() sends one ASCII character to the output device. A recursive function calls itself. There are two possibilities. If the number is between 0 and 9, the function does not call itself. The code n+'0' will convert it to ASCII. If the number is between 10 and 99, the n/10 will calculate the tens digit and the n%10 will calculate the ones digit. Larger numbers simply recurse more times, once for each additional digit. The assembly implementation will be presented as Program 5.12.
Program 3.12 Print unsigned 16-bit decimal to an output device.
  // Variable format 1 to 5 digits with no space before or after
  // uses recursion to convert a decimal number to an ASCII string
  void OutUDec(unsigned short n){
    if(n >= 10){
      OutUDec(n/10);
      n = n%10;
    }
    OutChar(n+'0');   /* n is between 0 and 9 */
  }
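The order in which the recursion emits digits can be observed on a host computer by letting OutChar() append to a buffer. This is a sketch under our own assumptions: the buffer, the OutChar() stand-in, and the helper PrintToBuffer are hypothetical. For n equal to 3402, OutUDec calls itself with 340, 34, and 3; the innermost call outputs ‘3’ first, and the digits ‘4’, ‘0’, and ‘2’ follow as each call returns.

```c
#include <stdint.h>

/* Hypothetical stand-in for the output device: characters go to a buffer */
static char OutBuffer[8];
static unsigned int OutCount = 0;

void OutChar(unsigned char letter){
  if(OutCount < sizeof(OutBuffer)-1){
    OutBuffer[OutCount++] = (char)letter;
    OutBuffer[OutCount] = 0;      /* keep the buffer null-terminated */
  }
}

/* Recursive output, same structure as the listing above */
void OutUDec(uint16_t n){
  if(n >= 10){
    OutUDec(n/10);                /* print the more significant digits first */
    n = n%10;
  }
  OutChar((unsigned char)(n+'0')); /* n is between 0 and 9 */
}

/* Clear the buffer, print n, and return what was printed */
const char *PrintToBuffer(uint16_t n){
  OutCount = 0;
  OutBuffer[0] = 0;
  OutUDec(n);
  return OutBuffer;
}
```

The depth of the recursion equals the number of digits, so a 16-bit number never nests more than five calls deep.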
To convert a signed number into ASCII, we can first test the sign of the input. If it is negative, we issue the first ASCII character as the minus sign (-), negate the input so that it is now positive, and use our unsigned conversion routine to generate the rest of the ASCII characters. This conversion technique is shown in Program 3.13.
Program 3.13 Print signed 16-bit decimal to an output device.
  void OutSDec(short n){
    if(n<0){
      n = -n;         /* take absolute value */
      OutChar('-');   /* the number is negative */
    }
    OutUDec(n);       /* print the absolute value */
  }
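One value deserves care when taking the absolute value: the most negative 16-bit number, -32768, has no positive counterpart in a signed 16-bit variable. The sketch below is our own suggestion, with a hypothetical function name, and it assumes the C99 fixed-width types are available. It negates in 32-bit precision and returns the magnitude as an unsigned 16-bit number, which could then be passed to an unsigned output routine.

```c
#include <stdint.h>

/* Hypothetical helper: magnitude of a signed 16-bit number,
   returned as unsigned so that 32768 can be represented */
uint16_t Magnitude16(int16_t n){
  if(n < 0){
    return (uint16_t)(-(int32_t)n);  /* negate in 32 bits so -32768 is safe */
  }
  return (uint16_t)n;
}
```

With this helper the magnitude of -32768 is 32768, a value that fits in an unsigned 16-bit number but not in a signed one.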
Maintenance Tip: Even though the machine will process data in binary, we can specify numbers in many formats (e.g., binary, decimal, hexadecimal, etc.) When writing software, use the format that makes your software easiest to understand. There is no one format that is best for all situations.
3.12
Debugging Monitor Using an LED
One of the important tasks in debugging a system is to observe when and where our software is executing. A debugging tool that works well for real-time systems is the monitor. In a real-time system, we need the execution time of the debugging tool to be small compared to the execution time of the program itself. Intrusiveness is defined as the degree to which the debugging code itself alters the performance of the system being tested. A monitor is an independent output process, somewhat similar to the print statement, but one that executes much faster, and thus is much less intrusive. An LED attached to an output port of the microcontroller is an example of a Boolean monitor. You can place LEDs on unused output pins. Software toggles these LEDs to let you know where and when your program is running. Assume an LED is attached to Port T bit 6. Program 3.14 will toggle the LED.
Program 3.14 An LED monitor.
  Toggle  ldaa PTT     ;read Port T
          eora #$40    ;flip bit 6
          staa PTT     ;write Port T
          rts

  void Toggle(void){
    PTT ^= 0x40;
  }
A heartbeat is a pulsing output that is not required for the correct operation of the system, but it is useful to see the program is running. In particular, you add jsr Toggle statements at strategic places within your system. It only takes 16 bus cycles to execute. The DDRT must be initialized so that bit 6 is an output, before the debugging begins. You can either observe the LED directly, or look at the LED control signals with a high speed oscilloscope or logic analyzer. The LCD, explained later in Chapter 10, can be an effective monitor for small amounts of information. Inexpensive LCDs can display from 8 to 160 characters. Unfortunately, it takes about 50 μs to output each character, so the use of an LCD monitor might be intrusive.
Observation: When using LED monitors it is better to modify just the one bit, leaving the other seven as is. In this way, you can have additional LED monitors.
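The observation about modifying just one bit can be checked on a host computer. In this sketch PTT is an ordinary variable standing in for the Port T register, which is an assumption made only so the code runs without a 9S12; on the real microcontroller PTT would come from the port definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the Port T data register */
static uint8_t PTT = 0;

/* Toggle the LED on bit 6, leaving the other seven bits unchanged */
void Toggle(void){
  PTT ^= 0x40;
}
```

If PTT starts at $85, one call changes it to $C5 and a second call restores $85. Bits 7, 5, 4, 3, 2, 1, and 0 are never affected, so they remain free for other monitors.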
3.13
Tutorial 3. Arithmetic and Logical Operations
The purpose of this tutorial is to study numbers, logical operations, and arithmetic operations. Computers internally store numbers in binary, but use decimal or hexadecimal when interacting with humans. Some computers implement floating point, but in this tutorial the formats will be restricted to signed and unsigned integers. We can observe the registers in the ViewBox. Registers are high-speed storage elements inside the processor. The 9S12 has registers named CCR, A, B, X, Y, SP, and PC. Registers A and B generally contain data and are used for arithmetic and logical operations. Registers A and B concatenated together also can be referred to as Register D. Registers X and Y generally contain addresses and are used for pointers. The condition code register (CCR) contains the N, Z, V, and C bits. The N bit is set if the most significant bit of the result is true. The Z bit is set if the result is zero. In general, the C bit is set on an unsigned overflow, and the V bit is set on a signed overflow.
The ViewBox is part of the microcomputer window, and is used to display and/or modify information. To change the value of a parameter in the ViewBox: click on the entry, type the new value in the Data field, then hit enter. The available ViewBox formats are listed in Table T3.1. True condition code bits are displayed as upper case letters and false bits are shown as lower case letters. For example sXhInzvc means S=0, X=1, H=0, I=1, N=0, Z=0, V=0, and C=0.
Format       Description
h            8-bit unsigned hexadecimal
d            8-bit unsigned decimal
b            8-bit unsigned binary
H            16-bit unsigned hexadecimal
D            16-bit unsigned decimal
B            16-bit unsigned binary
+h or -h     8-bit signed hexadecimal
+d or -d     8-bit signed decimal
+b or -b     8-bit signed binary
+H or -H     16-bit signed hexadecimal
+D or -D     16-bit signed decimal
+B or -B     16-bit signed binary
b3           3-bit binary (least significant bits)
b4           4-bit binary (least significant bits)
cc           8-bit binary showing bits in the CCR
c or C       ASCII character
s or S       NULL or EOT terminated ASCII string
v            value of the address itself, unsigned decimal
V            value of the address itself, unsigned hex
+v or -v     value of the address itself, signed decimal
+V or -V     value of the address itself, signed hex
Table T3.1 Available formats for displaying information in the ViewBox.
Action: Execute TExaS and open the Tutor3.rtf and Tutor3.uc files. Bring the ViewBox to the front. We will begin talking about unsigned numbers, so we will use the “d” format to observe values in Register A, and use the “D” format for Register X.
Question 3.1 Register A contains an 8-bit integer. Its precision is 8 bits. As an unsigned number, its range of values is 0 to 255. What happens when you try to set Register A to 256?
Question 3.2 What happens when you try to set Register A to -1?
Question 3.3 Register X contains a 16-bit integer. Its precision is 16 bits. As an unsigned number, its range of values is 0 to 65535. What happens when you try to set Register X to 65536?
Question 3.4 What happens when you try to set Register X to -1?
Action: Next, we will study signed numbers. Change the format of Register A to “+d”, and change the format of Register X to “+D”. To change the format of a parameter in the ViewBox: click on the ViewBox entry, type the new format in the Format field, then hit enter.
Question 3.5 As a signed number, the range of values in Register A is -128 to 127. What happens when you try to set Register A to 128?
Question 3.6 What happens when you try to set Register A to -129?
Question 3.7 As a signed number, the range of values in Register X is -32768 to 32767. What happens when you try to set Register X to 32768?
Question 3.8 What happens when you try to set Register X to -32769?
Question 3.9 Use the help system of TExaS to look up the instruction ldaa and answer the question, “Even though the ldaa instruction does not perform any arithmetic or logical operations, does it modify the condition code bits, N, Z, V, and C?” Within the TExaS application, right click on the ldaa instruction in the source code and execute help. You could also execute Help-HelpTopics, double-click 6812 assembly language, double-click 6812 memory access instructions, click ldaa.
Action: Assemble the Tutor3.rtf program, and bring the TheList.rtf, TheLog.rtf, and Tutor3.uc windows to the front. Notice that the instructions lsla and asla have the same object code.
Question 3.10 First, perform the following logical operations by hand, and record what you think the result will be in 8-bit unsigned hexadecimal. In addition, record your expectation for the N and Z bits. Within TExaS, change the format of Register A to unsigned hexadecimal “h”. Reset and single-step the program through part 1. Correct your answers by recording the proper values of Register A and the CCR bits N and Z. The logical operations clear the V bit. The complement instruction is the only one that sets the C bit, while the other logical operations only affect the N, Z, and V bits.
  $0F&$85   $0F|$85   $0F^$85   ~$0F
Question 3.11 First, perform the following unsigned arithmetic operations by hand, and record what you think the result will be in 8-bit unsigned decimal format. In addition, record your expectation for the N, Z, and C bits. Although the processor will set the V bit during the calculation, we will ignore it when operating on unsigned integers. Within TExaS, change the format of Register A to unsigned decimal “d”. Single-step the program through part 2. Correct your answers by recording the proper values of Register A and the CCR bits N, Z, and C.
  155>>1   50<<1   96+64   224+64   160-64   32-64
Question 3.12 First, perform the following signed arithmetic operations by hand, and record what you think the result will be in 8-bit signed decimal. In addition, record your expectation for the N, Z, and V bits. Although the processor will set the C bit during the calculation, we will ignore it when operating on signed integers. Within TExaS, change the format of Register A to signed decimal “+d”. Single-step the program through part 3. Correct your answers by recording the proper values of Register A and the CCR bits N, Z, and V.
  -101>>1   -50<<1   -32+64   96+64   32-64   -96-64
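Hand calculations of the condition code bits can also be cross-checked on a host computer. The following sketch is ours and is not part of TExaS: the structure Result8 and the functions Add8 and Sub8 are hypothetical names. It models an 8-bit add and subtract and derives N, Z, V, and C using the rules stated in this tutorial, with C set on a borrow during subtraction.

```c
#include <stdint.h>

/* Hypothetical record of an 8-bit result and its condition code bits */
typedef struct {
  uint8_t R;            /* 8-bit result */
  uint8_t N, Z, V, C;   /* each bit is 0 or 1 */
} Result8;

/* R = a + b, with the condition codes of an 8-bit addition */
Result8 Add8(uint8_t a, uint8_t b){
  Result8 out;
  uint16_t sum = (uint16_t)(a + b);       /* 9-bit sum kept in 16 bits */
  out.R = (uint8_t)sum;
  out.N = (uint8_t)((out.R >> 7) & 1);    /* most significant bit of result */
  out.Z = (uint8_t)(out.R == 0);
  out.C = (uint8_t)(sum > 0xFF);          /* unsigned overflow */
  /* signed overflow: operands share a sign that the result does not */
  out.V = (uint8_t)(((~(a ^ b) & (a ^ out.R)) >> 7) & 1);
  return out;
}

/* R = a - b, with the condition codes of an 8-bit subtraction */
Result8 Sub8(uint8_t a, uint8_t b){
  Result8 out;
  out.R = (uint8_t)(a - b);
  out.N = (uint8_t)((out.R >> 7) & 1);
  out.Z = (uint8_t)(out.R == 0);
  out.C = (uint8_t)(a < b);               /* borrow means unsigned overflow */
  /* signed overflow: operands differ in sign and the result's sign differs from a */
  out.V = (uint8_t)((((a ^ b) & (a ^ out.R)) >> 7) & 1);
  return out;
}
```

For example, Add8(10, 100) gives R equal to 110 with all four bits clear, while Add8(0x80, 0x80) gives R equal to 0 with Z, V, and C set.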
3.14
Homework Assignments
Homework 3.1 How many binary bits does it take to represent 12,345,678? How many bytes? Using the fact that 2^10 is about 10^3, it is possible to solve this question without a calculator.
Homework 3.2 How many binary bits does it take to represent 9,876,543,210? How many bytes? Using the fact that 2^10 is about 10^3, it is possible to solve this question without a calculator.
Homework 3.3 In C, a char is 8 bits, a short is 16 bits, and a long is 32 bits. Assuming each is signed, give the range of each type of number.
Homework 3.4 In C, an unsigned char is 8 bits, an unsigned short is 16 bits, and an unsigned long is 32 bits. Assuming each is unsigned, give the range of each type of number.
Homework 3.5 How many binary bits is 2 3/4 decimal digits?
Homework 3.6 About how many decimal digits is 14 binary bits?
Homework 3.7 You know that the 8-bit hexadecimal representation for -1 is $FF. Use this fact and count backwards to quickly find the hexadecimal representations of -2, -3, and -4.
Homework 3.8 You know that the 16-bit hexadecimal representation for -1 is $FFFF. Use this fact and count backwards to quickly find the hexadecimal representations of -2, -3, and -4.
Homework 3.9 Each row of the following table is to contain an equal value expressed in binary, hexadecimal, and decimal. Complete the missing values. Assume the decimal values are unsigned. The first row illustrates the process.
Binary            Hexadecimal   Decimal
%01101001         $69           105
                  $45
                                45
%10001111
                  $E4
                                99
%111001001101
                  $2B9
                                1000
Homework 3.10 Each row of the following table is to contain an equal value expressed in binary, hexadecimal, and decimal. Complete the missing values. Assume the decimal values are unsigned. The first row illustrates the process.
Binary              Hexadecimal   Decimal
%10101101           $AD           173
                    $78
                                  123
%11111
                    $1234
                                  36
%1000100001111101
                    $2456
                                  54321
Homework 3.11 Each row of the following table is to contain an equal value expressed in binary, hexadecimal, and decimal. Complete the missing values. Assume each value is 8 bits and the decimal numbers are signed. The first row illustrates the process.
Binary       Hexadecimal   Decimal
%01011110    $5E           94
             $A2
                           47
%11000011
             $D1
                           75
%00101011
             $B7
                           100
Homework 3.12 Each row of the following table is to contain an equal value expressed in binary, hexadecimal, and decimal. Complete the missing values. Assume each value is 8 bits and the decimal numbers are signed. The first row illustrates the process.
Binary       Hexadecimal   Decimal
%11111110    $FE           -2
             $BD
                           88
%00111011
             $94
                           52
%11100000
             $11
                           126
Homework 3.13 Each row of the following table is to contain an equal value expressed in binary, hexadecimal, and unsigned BCD decimal. Complete the missing values. The first row illustrates the process.
Binary       Hexadecimal   BCD Decimal
%01111000    $78           78
             $69
                           45
%10000111
             $94
                           99
%00100110
             $29
                           52
Homework 3.14 Write assembly code that promotes an 8-bit unsigned integer in Reg A to a 16-bit unsigned integer in Reg X.
Homework 3.15 Write assembly code that promotes an 8-bit signed integer in Reg A to a 16-bit signed integer in Reg X.
Homework 3.16 Using just the NAND gates, design an equals circuit, such that the output is 1 if and only if input A equals input B. There will be two input signals and one output signal.
Homework 3.17 Using just the NOR gates, design an equals circuit, such that the output is 1 if and only if input A equals input B. There will be two input signals and one output signal.
Homework 3.18 Using just p-type and n-type transistors, design a three-input AND circuit, similar to the AND circuit of Figure 3.6. There will be three input signals and one output signal. The output is high only if all three inputs are high.
Homework 3.19 Using just p-type and n-type transistors, design a three-input OR circuit, similar to the OR circuit of Figure 3.6. There will be three input signals and one output signal. The output is low only if all three inputs are low.
Homework 3.20 Using just p-type and n-type transistors, design an exclusive or circuit, similar to the AND and OR circuits of Figure 3.6. There will be two input signals and one output signal.
Homework 3.21 Using just p-type and n-type transistors, design an exclusive nor circuit, similar to the NAND and NOR circuits of Figure 3.9. There will be two input signals and one output signal.
Homework 3.22 Using just p-type and n-type transistors, design a 1-bit tristate driver, similar to the 74HC125 circuits of Figure 3.12, except this driver has a positive logic control. In other words, if G = 1, the output Y equals the input A. If G = 0, the output Y is HiZ. There will be two input signals (A, G) and one output signal (Y).
Homework 3.23 Design a digital circuit that takes the output of the 8-bit adder shown in Figure 3.19, and implements the Z, N, and V bits. The figure already includes the C bit (carry).
Homework 3.24 Design a digital circuit that takes the output of the 8-bit subtractor shown in Figure 3.21, and implements the Z, N, and V bits. The figure already includes the C bit (carry).
Homework 3.25 You are given a double-pole switch that has three pins. If the switch is not pressed, pins 1 and 2 are connected (0 resistance) and pins 2 and 3 are not connected (infinite resistance). If the switch is pressed, pins 2 and 3 are connected (0 resistance) and pins 1 and 2 are not connected (infinite resistance). Pins 1 and 3 are never connected (it is a break-before-make switch). Interface this switch to the 9S12, such that PT0 is high (5 V) if the switch is pressed and PT0 is low (0 V) if the switch is not pressed. You do not need to debounce the switch. Label all chip numbers and resistor values. No software is required.
Homework 3.26 Show the circuit diagram to interface a switch to PH6 and an LED to PJ7. Write assembly software that initializes the ports. In the body of the main program, toggle the LED on and off if the switch is pressed, and turn the LED off if the switch is not pressed.
Homework 3.27 Show the circuit diagram to interface two switches to PH1 and PH0 and one LED to PP0. Write assembly software that initializes the ports. In the body of the main program, toggle the LED on and off if the two switches are either both on or both off. Turn the LED off if the PH1 switch is pressed and the PH0 switch is not pressed. Turn the LED on if the PH0 switch is pressed and the PH1 switch is not pressed.
Homework 3.28 Show the circuit diagram to interface one switch to PH0 and four LEDs to PT3 to PT0. The four LEDs will display a number from 0 to 15 in binary. Write assembly software that initializes the ports. In the body of the main program, increment the number every time the switch is pressed and released. Once the number gets to 15, do not increment it any more.
Homework 3.29 Show the circuit diagram to interface four switches to PH3-PH0 and four LEDs to PT3 to PT0. Switches PH3 to PH2 represent a 2-bit unsigned number 0, 1, 2, or 3. Switches PH1 to PH0 represent a second 2-bit unsigned number 0, 1, 2, or 3. The four LEDs will display a number from 0 to 15 in binary. Write assembly software that initializes the ports. In the body of the main program, read the switches, form the two numbers, multiply the numbers together, and output the result on the LEDs.
Homework 3.30 Use D flip-flops like Figure 3.14 to build an 8-bit ASL function.
Homework 3.31 Use D flip-flops like Figure 3.14 to build an 8-bit ROR function.
Homework 3.32 Use D flip-flops like Figure 3.14 to build an 8-bit ROL function.
Homework 3.33 Consider the addition of two signed 8-bit numbers. If a positive number is added to a negative number, under what conditions will a signed overflow occur?
Homework 3.34 Consider the subtraction of two signed 8-bit numbers. If a positive number is subtracted from another positive number, under what conditions will a signed overflow occur?
Homework 3.35 Let A and B be two 8-bit inputs to an 8-bit binary adder. Fill in the table showing R = A + B and the four CCR bits after each addition. The first row illustrates the process.
A 10 $40 $C3 100 150 5 41 120 20
B
R
NZVC
100 $A3 $6F 50 180
110
0000
0 50 136 0101
Homework 3.36 Let A and B be two 8-bit inputs to an 8-bit binary subtractor. Fill in the table showing R = A - B and the four CCR bits after each subtraction. The first row illustrates the process.
A 100 $55 $DD 50 200 12 41 255
B
R
NZVC
10 $93 $9F 70 180
90
0000
0 50 136 87
0100
Homework 3.37 Let A be an 8-bit input. Fill in the table showing the promotion to 16-bit unsigned and 16-bit signed. Give all answers in 16-bit hexadecimal. The first row illustrates the process.
A      Unsigned 16-bit   Signed 16-bit
$80    $0080             $FF80
$55
$DD
$00
$FF
$45
$90
$27
$A4
Homework 3.38 Let A be an 8-bit input. Fill in the table showing the promotion to 16-bit unsigned and 16-bit signed. Give all answers in 16-bit hexadecimal. The first row illustrates the process.
A      Unsigned 16-bit   Signed 16-bit
$70    $0070             $0070
$A5
$1D
$90
$DF
$85
$52
$B6
$4B
Homework 3.39 Write assembly language code that negates a 16-bit number. The input data can be found at $3800 to $3801, and the output result should be stored at $3802 to $3803. Can overflow occur with this operation? If not, prove it. If so, give an example, and design the software implementing ceiling or floor as appropriate.
Homework 3.40 Does the following assembly language code negate the 16-bit number in Register D?
  nega
  negb
  sbca #0
If so, prove it. If not, show a counter example. One way to prove it is to write a test program that tests all 65536 cases.
Homework 3.41 Write assembly language code that adds three 8-bit numbers together. The input data can be found at $3800, $3801, and $3802, and the output result should be stored at $3803. Ignore overflow.
Homework 3.42 Look up the definitions for instructions clr, dec, tst, and inc. Where is the data stored and what precision is the data?
Homework 3.43 Modify the C code in Program 3.7 so that it detects an input greater than 65,535. If the input does overflow, return with a value of 65,535. Hint: consider extending the precision of n to 32 bits by defining it as unsigned long.
Homework 3.44 Modify the C code in Program 3.12 so that it doesn’t use recursion.
Homework 3.45 High and Low are unsigned 8-bit components, which need to be combined into a single unsigned 16-bit Result. We will assume both High and Low are bounded within the range of 0 to 255. The expression High<<8 will perform eight logical shift lefts. Write assembly software to implement Result = (High<<8)|Low;
Homework 3.46 Write assembly code that exchanges the values in Register X and Y.
Homework 3.47 Let N, M, and P be three 8-bit unsigned locations. Write assembly code to implement P = 2*NM.
Homework 3.48 Let N, M, and P be three 8-bit unsigned locations. Write assembly code to implement P = (5*NM)/17.
Homework 3.49 Let N, M, and P be three 8-bit locations. Write assembly code to implement P = (5|N)&M.
Homework 3.50 Let N be an 8-bit location. Write assembly code to set bit 7 and clear bit 0.
Homework 3.51 Let N be an 8-bit location. Write assembly code to toggle bit 7 and set bit 0.
Homework 3.52 What is the difference between the character 0 and the number 0?
Homework 3.53 How is “3.14159” encoded as a null-terminated ASCII string?
Homework 3.54 You are given a 16-position rotary switch, which has 17 wires, as shown in Figure Hw3.54. There is one wire called common, and the other 16 wires are labeled S0 through S15. The common wire is connected to exactly one of the other 16 wires. You are to design an interface that creates a 4-bit digital signal representing the switch position. These signals are to be connected to Port C bits 3, 2, 1, and 0. Write an initialization ritual. Write an input subroutine that reads Port C and returns in Register A the current switch position 0 to 15.
Figure Hw3.54 16-position rotary switch (one common wire and positions 0 through 15).
Homework 3.55 A solid-state relay has an LED as its controlling element, as shown in Figure Hw3.55. In particular, if there is 10 mA of current through the LED, then the relay will be on, and the 120 V AC power will be delivered to the load. If no current flows through the LED, then the relay will be off, and the load will not receive power. Assume the LED voltage is 2.2 V. Show the interface between the microcomputer Port T bit 0 and the relay. Write an initialization ritual. Write two output subroutines that write Port T, one routine to turn the relay on and another to shut it off.
Figure Hw3.55 Solid-state relay (switching 120 VAC to the load).
3.15
Laboratory Assignments
Lab 3.1 Design a 4-bit AND operator using the 9S12. Let N and M be two 4-bit inputs, and P be the 4-bit output. P = N&M. There are eight inputs (4 bits for N and 4 bits for M), and five outputs (4 bits for P, and one bit for the condition code Z). Z is true if the output is zero. You must solve this system using just two 8-bit I/O ports on the 9S12. Use switches to set the inputs and LEDs to show the outputs.
Lab 3.2 Design a 4-bit OR operator using the 9S12. Let N and M be two 4-bit inputs, and P be the 4-bit output. P = N|M. There are eight inputs (4 bits for N and 4 bits for M), and five outputs (4 bits for P, and one bit for the condition code Z). Z is true if the output is zero. You must solve this system using just two 8-bit I/O ports on the 9S12. Use switches to set the inputs and LEDs to show the outputs.
Lab 3.3 Design a 4-bit adder using the 9S12. Let N and M be two 4-bit inputs, and P be the 4-bit output. P = N+M. There are eight inputs (4 bits for N and 4 bits for M), and eight outputs (4 bits for P, and one bit each for the condition code bits, N, Z, V, and C). You must solve this system using just two 8-bit I/O ports on the 9S12. Use switches to set the inputs and LEDs to show the outputs.
Lab 3.4 Design a 4-bit subtractor using the 9S12. Let N and M be two 4-bit inputs, and P be the 4-bit output. P = N-M. There are eight inputs (4 bits for N and 4 bits for M), and eight outputs (4 bits for P, and one bit each for the condition code bits, N, Z, V, and C). You must solve this system using just two 8-bit I/O ports on the 9S12. Use switches to set the inputs and LEDs to show the outputs.
Lab 3.5 Design a 4-bit unsigned multiplier using the 9S12. Let N and M be two 4-bit inputs, and P be the 8-bit output. P = N*M. There are eight inputs (4 bits for one number and 4 bits for the other), and eight outputs (the product of the two inputs). You must solve this system using just two 8-bit I/O ports on the 9S12. Use switches to set the inputs and LEDs to show the outputs. The input range is 0 to 15 and the output range is 0 to 225.
Lab 3.6 Design a 4-bit signed multiplier using the 9S12. Let N and M be two 4-bit inputs, and P be the 8-bit output. P = N*M. There are eight inputs (4 bits for one number and 4 bits for the other), and eight outputs (the product of the two inputs). You must solve this system using just two 8-bit I/O ports on the 9S12. Use switches to set the inputs and LEDs to show the outputs. The input range is -8 to 7 and the output range is -56 to 64.
4
9S12 Architecture
Chapter 4 objectives are to:
• Present the basic microcomputer architecture
• Study software execution at the bus cycle level
• List three 9S12 microcomputers and their memory and I/O port configurations
• Describe the timer and use it to create fixed time delays
The overall objective of this book is to develop the hardware and software components of an embedded system using the 9S12 microcontroller. In this chapter, we start with the general concepts of a computer and then present the details of the Freescale 9S12C32, 9S12DP512, and 9S12E128. For more information concerning a microcomputer, refer to the respective Freescale manual. Data sheets for the microcomputers we will use can be found at www.freescale.com. A free PDF reader can be downloaded from www.adobe.com. Given this basic knowledge, we will select the best microcomputer and memory configuration for our application. The architecture of a computer defines its hardware components and how the pieces are connected together. The more we understand the strengths and weaknesses of our computer, the better programmers we will be. The performance of an embedded system depends on both its hardware and software components. When developing in assembly language, it is difficult to separate hardware architecture (e.g., wires, gates, memory, etc.) from software architecture (instructions and data). Therefore, assembly language programming is presented alongside the traditional architecture issues, such as registers, busses, memory, and I/O connections.
4.1
Introduction
4.1.1 Big and Little Endian
When we store 16-bit data into memory it requires two bytes. Since the memory systems on most computers are byte addressable (a unique address for each byte), there are two possible ways to store in memory the two bytes that constitute the 16-bit data. Freescale microcomputers implement the big endian approach that stores the most significant byte first. Intel microcomputers implement the little endian approach that stores the least significant byte first. The PowerPC is biendian, because it can be configured to efficiently handle both big and little endian. Figure 4.1 shows two ways to store the 16-bit number 1000 ($03E8) at locations $0850 to $0851.
4 䡲 9S12 Architecture
Figure 4.1 Example of big and little endian formats of a 16-bit number.
Big Endian
Address   Data
$0850     $03
$0851     $E8
Little Endian
Address   Data
$0850     $E8
$0851     $03
We also can use either the big or little endian approach when storing 32-bit numbers into memory that is byte (8-bit) addressable. Figure 4.2 shows the big and little endian formats that could be used to store the 32-bit number $12345678 at locations $0850 to $0853.
Figure 4.2 Example of big and little endian formats of a 32-bit number.
Big Endian
Address   Data
$0850     $12
$0851     $34
$0852     $56
$0853     $78
Little Endian
Address   Data
$0850     $78
$0851     $56
$0852     $34
$0853     $12
In the previous two examples, we normally would not pick out individual bytes (e.g., the $12), but rather capture the entire multiple-byte data as one indivisible piece of information. On the other hand, if each byte in a multiple-byte data structure is individually addressable, then both the big and little endian schemes store the data in first-to-last sequence. For example, if we wish to store the four ASCII characters ‘9S12’, which is $39533132, at locations $0850 to $0853, then the ASCII ‘9’ ($39) comes first in both big and little endian schemes, as illustrated in Figure 4.3.
Figure 4.3 Character strings are stored in the same order for both big and little endian formats.
Big Endian and Little Endian
Address   Data
$0850     $39
$0851     $53
$0852     $31
$0853     $32
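The byte orders in Figures 4.1 and 4.2 can be sketched in C. This is a minimal illustration that runs on any computer, not 9S12-specific code. The function names are hypothetical, and the array passed in stands in for the memory starting at $0850.

```c
#include <stdint.h>

/* Store a 16-bit value with the most significant byte first (big endian). */
void store16_be(uint8_t *mem, uint16_t value) {
    mem[0] = (uint8_t)(value >> 8);
    mem[1] = (uint8_t)(value & 0xFF);
}

/* Store a 16-bit value with the least significant byte first (little endian). */
void store16_le(uint8_t *mem, uint16_t value) {
    mem[0] = (uint8_t)(value & 0xFF);
    mem[1] = (uint8_t)(value >> 8);
}

/* Store a 32-bit value with the most significant byte first (big endian). */
void store32_be(uint8_t *mem, uint32_t value) {
    mem[0] = (uint8_t)(value >> 24);
    mem[1] = (uint8_t)(value >> 16);
    mem[2] = (uint8_t)(value >> 8);
    mem[3] = (uint8_t)(value);
}

/* Store a 32-bit value with the least significant byte first (little endian). */
void store32_le(uint8_t *mem, uint32_t value) {
    mem[0] = (uint8_t)(value);
    mem[1] = (uint8_t)(value >> 8);
    mem[2] = (uint8_t)(value >> 16);
    mem[3] = (uint8_t)(value >> 24);
}
```

Calling store16_be with the value 1000 ($03E8) leaves $03 in the first byte and $E8 in the second, matching the big endian half of Figure 4.1. A character string needs no such function, because each character is a separate byte stored in first-to-last order.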
The terms “big and little endian” come from Jonathan Swift’s satire Gulliver’s Travels. In Swift’s book, a Big Endian refers to a person who cracks their egg on the big end. The Lilliputians were Little Endians because they insisted that the only proper way is to break an egg on the little end. The Lilliputians considered the Big Endians to be inferiors. The Big and Little Endians fought a long and senseless war over the best way to crack an egg.
Common Error: An error will occur when data is stored in big endian format by one computer and read in little endian format on another.
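The common error can be reproduced with a short C sketch. The function names are hypothetical. The two bytes are the big endian image of 1000 ($03E8) from Figure 4.1, and the sketch reads them back both ways.

```c
#include <stdint.h>

/* Interpret two bytes as a big endian 16-bit number. */
uint16_t load16_be(const uint8_t *mem) {
    return (uint16_t)((mem[0] << 8) | mem[1]);
}

/* Interpret the same two bytes as a little endian 16-bit number. */
uint16_t load16_le(const uint8_t *mem) {
    return (uint16_t)((mem[1] << 8) | mem[0]);
}

/* Exchange the two bytes of a 16-bit number. */
uint16_t swap16(uint16_t value) {
    return (uint16_t)((value << 8) | (value >> 8));
}
```

If the bytes $03, $E8 are read with load16_le, the result is $E803 rather than $03E8. Applying swap16 to the misread value recovers 1000, which is one way a receiving computer can correct data that arrives in the other format.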
4.1.2 Memory-Mapped I/O
The architecture of a computer system defines how its processor, RAM, ROM, input devices, and output devices are connected, including the assembly instructions used to access RAM, ROM, and I/O devices. The 9S12 implements memory-mapped I/O, where the I/O devices are connected to the processor in a manner similar to memory, as shown in Figure 4.4. I/O devices are assigned addresses, and the software accesses I/O using reads and writes to the specific I/O address. The software inputs from an input device using the same instructions as it would if it were reading from memory. Similarly, the software outputs to an output device using the same instructions as it would if it were writing to memory. From a design perspective, we usually do not develop software that attempts to write to ROM. However, from an architecture perspective, we access RAM in the same manner as we access ROM.
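In C, memory-mapped I/O appears as an ordinary read or write through a pointer. The following is a minimal sketch that runs on a desktop computer, so a 64 KiB array stands in for the 9S12 address space. The three addresses and all names are assumptions chosen for illustration, not the actual 9S12 register map.

```c
#include <stdint.h>

/* Simulated 64 KiB address space standing in for the 9S12 bus. */
uint8_t address_space[0x10000];

/* Hypothetical addresses, for illustration only. */
#define INPUT_PORT_ADDR   0x0240u
#define OUTPUT_PORT_ADDR  0x0241u
#define RAM_VARIABLE_ADDR 0x3800u

/* One access macro serves RAM and I/O alike; volatile keeps the
   compiler from removing or reordering the accesses. */
#define MEM8(addr) (*(volatile uint8_t *)&address_space[(addr)])

/* Read the input port, save a copy in RAM, and echo it to the output port. */
uint8_t copy_input_to_output(void) {
    uint8_t value = MEM8(INPUT_PORT_ADDR); /* input: same form as a memory read */
    MEM8(RAM_VARIABLE_ADDR) = value;       /* ordinary memory write */
    MEM8(OUTPUT_PORT_ADDR) = value;        /* output: same form as a memory write */
    return value;
}
```

The point of the sketch is that the port accesses and the RAM access use exactly the same expression. On a real microcontroller the macro would point at a fixed hardware address instead of an array.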
Figure 4.4 A memory-mapped I/O computer system.
[Block diagram: the 9S12 processor (bus interface unit, registers, control unit, ALU) shares one bus (address, control, data) with RAM, Flash EEPROM, input ports, and output ports. The ports connect through external circuits to physical devices by way of input and output signals.]
4.1.3 *I/O-Mapped I/O
Input/output devices are important in all computers, but they are especially significant in an embedded system. In a computer system with I/O-mapped I/O, the control bus signals that activate the I/O are separate from those that activate the memory devices, as shown in Figure 4.5. These systems have a separate address space and separate instructions to access the I/O devices. The original Intel 8086 had four control bus signals: MEMR, MEMW, IOR, and IOW. MEMR and MEMW were used to read and write memory, while IOR and IOW were used to read and write I/O. The Intel x86 refers to any of the processors that Intel has developed based on this original architecture. Even though we do not consider the personal computer (PC) an embedded system, there are embedded systems developed on this architecture. One such platform is called the PC/104 Embedded-PC. The Intel x86 processors continue to implement this separation between memory and I/O. The advantages of isolated I/O are that software cannot inadvertently access I/O when it thinks it is accessing memory, and the I/O interfaces are simpler. Currently, there are many memory address lines (e.g., 32, depending on the chip), but only 16 of those lines are used to access I/O devices. The other address lines are not used during an I/O bus cycle. Rather than use the regular memory access instructions, the Intel x86 processor uses special in and out instructions to access the I/O devices.
Figure 4.5 An I/O-mapped I/O computer system.
[Block diagram: in a personal computer, the Intel x86 processor shares the address and data lines among ROM, RAM, input ports, and output ports. The memory control signals (MEMR, MEMW) activate ROM and RAM, while separate I/O control signals (IOR, IOW) activate the ports, which connect through external circuits to physical devices.]
On the Intel x86, if we wish to bring the current value of input port $10 into Register AL, we could execute
    in  AL,10H    ;Copy the value of Port $10 into Register AL
On the Intel x86, if we wish to send the value in Register AL to output port $11, we could execute
    out 11H,AL    ;Copy the value of Register AL out to Port $11
In actuality, today’s personal computer (i.e., the IBM-PC) contains both I/O-mapped I/O and memory-mapped I/O.
4.1.4 *Segmented or Partitioned Memory
There is a third type of architecture found in embedded systems. In a computer system with segmented memory, the memory is divided into different groups according to its function. The Intel 8051 architecture has three memory partitions, as shown in Figure 4.6. There are separate instructions to access the individual segments. The internal RAM segment includes most of the registers, the stack, the I/O ports, and some general purpose RAM. Most of the instructions on the 8051 operate on this internal RAM segment. About 240 of the bits in the internal RAM segment are also bit addressable, meaning the program can read or write one bit without affecting the other 7 bits in the byte. The second segment is external RAM, which can function as large data storage. There are only these four instructions we can use to read and write external RAM.
    movx A,@Ri      ; read from RAM using 8-bit address in RegRi into RegA
    movx A,@DPTR    ; read from RAM using 16-bit address in DPTR into RegA
    movx @Ri,A      ; write from RegA to RAM using 8-bit address in RegRi
    movx @DPTR,A    ; write from RegA to RAM using 16-bit address in DPTR
The third segment is program memory, which holds the machine code and can also function as a large lookup table. Machine codes are fetched out of the program memory. There are only these two instructions we can use to read from program memory.
    movc A,@A+PC      ; read from ROM using address PC+RegA into RegA
    movc A,@A+DPTR    ; read from ROM using address DPTR+RegA into RegA
The 8051 has a separate address space for each of the segments, meaning there is internal RAM location 0, external RAM location 0, and program memory location 0. The real advantage of this architecture is market-share and second sourcing. There are many manufacturers that produce microcontrollers based on the 8051 architecture, and together they comprise a significant fraction of the 8-bit microcontroller market.
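Figure 4.6 A segmented-memory computer system.
[Block diagram: the 8051 processor, with its registers and internal RAM, reaches program memory (ROM), external RAM, and the input and output ports over shared address and data lines. Separate control signals select internal RAM, program memory, and external RAM. The ports connect through external circuits to physical devices by way of input and output signals.]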
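The idea that each segment has its own location 0 can be modeled with three separate arrays. This is only a sketch of the addressing concept, not of 8051 instruction behavior. The array sizes and all names are assumptions made for illustration.

```c
#include <stdint.h>

/* Three independent address spaces; sizes are illustrative assumptions. */
uint8_t internal_ram[0x100];
uint8_t external_ram[0x10000];
uint8_t program_memory[0x10000];

typedef enum { SEG_INTERNAL, SEG_EXTERNAL, SEG_PROGRAM } Segment;

/* The segment, not the address alone, selects which memory is read. */
uint8_t segment_read(Segment segment, uint16_t address) {
    switch (segment) {
    case SEG_INTERNAL:
        return internal_ram[address & 0xFF]; /* 8-bit address within this segment */
    case SEG_EXTERNAL:
        return external_ram[address];        /* the kind of access movx makes */
    default:
        return program_memory[address];      /* the kind of access movc makes */
    }
}
```

Address 0 can hold three different values at once, one per segment, which is why the 8051 needs separate instructions to say which segment is meant.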
4.1.5 Memory Bus Cycles
The bus contains address, data, and control information that provides data transfer between the various modules in the system. The address specifies which module (input, output, RAM, or ROM) will communicate with the processor. The data contains the information that is being transferred. Control signals specify the direction of the transfer. We call a complete data transfer a bus cycle. In a simple computer system, like the 9S12, the two types of transfers are shown in Table 4.1. In this simple system, the processor is the bus master, always controlling the address (where to access), the direction (read or write), and the control (when to access). The 9S12 data bus supports both 8-bit and 16-bit transfers. On the 9S12, like most computers, each individual byte has a unique address. Therefore, the maximum memory size in bytes is equal to the number of different addresses. The TExaS simulator allows you to observe bus activity during execution.
Table 4.1 Simple computers generate two types of cycles.
Type          Data transfer
Read cycle    Data copied to processor
Write cycle   Data copied to output or RAM
Checkpoint 4.1: The 9S12C32 has a 16-bit address bus. How many locations can it address?
Checkpoint 4.2: Both the 9S12DP512 and the 9S12E128 can access 1 mebibyte, including internal and external memory. How many address lines are in their busses?
During a read cycle (or memory LOAD) data flows from memory or input device into the processor. Assume memory location $3800 contains a $98, and the processor executes a ldaa $3800 instruction. Many bus cycles will occur during the execution of the ldaa instruction, but Figure 4.7 shows the one memory read cycle that copies the data $98 from memory into Register A.
Figure 4.7 A read cycle copies data from RAM, ROM or input device into the processor.
[Diagram: during a memory read cycle (R), the processor places the address $3800 on the address bus, and the RAM returns the data $98 to the processor on the data bus.]
During a write cycle (or memory STORE) data flows from the processor into memory or output device. Assume Register B contains a $25, and the processor executes a stab $3801 instruction. Again, many bus cycles will occur during the execution of the stab instruction, but Figure 4.8 shows the one memory write cycle that copies the data $25 from Register B into memory.
Checkpoint 4.3: The 9S12C32 in single-chip mode has a 16-bit address bus and a 16-bit data bus but can still only address 65536 bytes of memory. Why?
Figure 4.8 A write cycle copies data from the processor into RAM or output device.
[Diagram: during a memory write cycle (W), the processor places the address and the data $25 on the bus, and the RAM stores the data.]
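The read and write cycles of Figures 4.7 and 4.8 can be modeled as two small C functions. This is a simplified sketch with hypothetical names. It records only the address, the data, and the direction of the most recent cycle, using the 9S12 convention that R/W is 1 for a read and 0 for a write.

```c
#include <stdint.h>

/* Direction of a bus cycle, numbered like the R/W signal. */
typedef enum { BUS_WRITE = 0, BUS_READ = 1 } BusDirection;

/* The three kinds of information carried by one bus cycle. */
typedef struct {
    uint16_t address;  /* where to access */
    uint8_t data;      /* information being transferred */
    BusDirection rw;   /* direction of the transfer */
} BusCycle;

uint8_t bus_memory[0x10000]; /* simulated memory and I/O */
BusCycle last_cycle;         /* most recent cycle, for inspection */

/* Read cycle: data flows from memory or input device into the processor. */
uint8_t bus_read(uint16_t address) {
    last_cycle.address = address;
    last_cycle.data = bus_memory[address];
    last_cycle.rw = BUS_READ;
    return last_cycle.data;
}

/* Write cycle: data flows from the processor into memory or output device. */
void bus_write(uint16_t address, uint8_t data) {
    last_cycle.address = address;
    last_cycle.data = data;
    last_cycle.rw = BUS_WRITE;
    bus_memory[address] = data;
}
```

With $98 preloaded at $3800, bus_read(0x3800) plays the role of the data read cycle of ldaa $3800, and bus_write(0x3801, 0x25) plays the role of the data write cycle of stab $3801. In both cases the processor, acting as bus master, supplies the address and the direction.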
4.1.6 Processor Architecture
The 9S12 has four major components, as illustrated in Figure 4.9. The bus interface unit (BIU) handles the read/write accesses to memory. The 9S12 has a single processor, and its BIU always drives the address bus and the control signals of the bus. That is, the 9S12 is always the bus master. The bus signals can be divided into three groups: control, address, and data. The control signals include E and R/W. The E clock controls the timing of each bus cycle. For example, if the 9S12 is running at 24 MHz, then the E clock will be a 24 MHz square wave with one clock period per bus cycle. R/W determines the type of bus cycle: R/W equals 1 for a read cycle and 0 for a write cycle. The address bus contains 16 signals, A15 to A0, containing the memory address for this bus cycle. On the real 9S12, the data bus is 16 bits wide. The effective address register (EAR) contains the data address for the current instruction and is set in the third phase of execution. The TExaS simulator allows you to observe the EAR during execution.
Figure 4.9 The four basic components of the 9S12 processor.
[Block diagram: the 9S12 processor contains the bus interface unit, the EAR, the registers (CCR, A, B, X, Y, SP, PC), the control unit, the IR, and the ALU. The bus signals are Control: E, R/W; Address: A15-A0; Data: D15-D0.]
The registers are temporary storage elements with a usage that is explicitly defined by each instruction. Registers do not have addresses like regular memory, but rather they have names or numbers. The opcode and/or operand defines which registers to use. (However, on the Intel 8051, the registers do have addresses, and thus, registers can be accessed like RAM locations.) Accumulators are registers that contain data. Index registers contain addresses. The program counter (PC) points to the memory containing the instruction to execute next. In an embedded system, the PC usually points into nonvolatile memory like ROM or EEPROM. The information stored in nonvolatile memory (e.g., the program) is not lost when power is removed. The stack pointer (SP) points into RAM and defines the stack. The stack is an extremely important component of software development, which can be used to pass parameters, save temporary information, and implement local variables. The internal RAM of the 9S12 is volatile memory, meaning its information is lost when power is removed. It is possible to connect external RAM to the 9S12 powered by a separate battery, creating a nonvolatile RAM. The condition code register (CCR) contains the status of the previous operation, as well as some operating mode flags such as the interrupt enable bit (see Figure 4.10). This register is called the flag register on the Intel computers. When S=1, the stop instruction is disabled. When X=0, XIRQ interrupts are allowed. Once X is set to zero, the software cannot set it back to 1. The H bit is used for BCD addition (see the adda and daa instructions). When I=0, IRQ interrupts are enabled. Interrupts will be discussed in Chapters 7, 10, 11, and 12. The N, Z, V, and C bits signify the status of the previous ALU operation.
Figure 4.10 The 9S12 condition code bits.
CCR bits, from bit 7 to bit 0: S X H I N Z V C
S   Stop disable
X   XIRQ Interrupt Mask
H   Half Carry from bit 3
I   IRQ Interrupt Mask
N   Negative
Z   Zero
V   Signed overflow
C   Carry/borrow or unsigned overflow
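The bit positions in Figure 4.10 translate directly into C masks. The sketch below also models how an 8-bit addition such as adda would set H, N, Z, V, and C. The function name is hypothetical, and the model returns only these five status bits; it leaves S, X, and I at zero rather than preserving them as the real processor would.

```c
#include <stdint.h>

/* CCR bit masks, bit 7 (S) down to bit 0 (C), as in Figure 4.10. */
#define CCR_S 0x80u /* stop disable */
#define CCR_X 0x40u /* XIRQ interrupt mask */
#define CCR_H 0x20u /* half carry from bit 3 */
#define CCR_I 0x10u /* IRQ interrupt mask */
#define CCR_N 0x08u /* negative */
#define CCR_Z 0x04u /* zero */
#define CCR_V 0x02u /* signed overflow */
#define CCR_C 0x01u /* carry/borrow or unsigned overflow */

/* Add two 8-bit values and report the status bits of the result. */
uint8_t add8_flags(uint8_t a, uint8_t m, uint8_t *ccr_out) {
    uint16_t wide = (uint16_t)(a + m);   /* 9-bit sum */
    uint8_t result = (uint8_t)wide;
    uint8_t ccr = 0;
    if (((a & 0x0F) + (m & 0x0F)) > 0x0F) ccr |= CCR_H; /* carry out of bit 3 */
    if (result & 0x80) ccr |= CCR_N;                    /* bit 7 of the result */
    if (result == 0) ccr |= CCR_Z;
    /* Signed overflow: inputs have the same sign, result has the other. */
    if (~(a ^ m) & (a ^ result) & 0x80) ccr |= CCR_V;
    if (wide > 0xFF) ccr |= CCR_C;                      /* carry out of bit 7 */
    *ccr_out = ccr;
    return result;
}
```

For example, adding $7F and $01 gives $80 with H, N, and V set: the signed result overflowed from +127 to -128, but there was no unsigned carry. Adding $FF and $01 gives $00 with H, Z, and C set.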
The 9S12 is typical of 16-bit microcontrollers in having only three general purpose 16-bit registers. The AMD 64-bit architecture has 16 general purpose registers, while some reduced instruction set computer (RISC) architectures can have from 32 to 128 registers. Registers are very different from memory in one very important aspect: register numbers are assigned at compile time and are static, while memory addresses can be computed at run time, and therefore the location of data in memory can be dynamically adjusted. When writing in assembly, the programmer decides the best registers to use to solve the problem. In a very similar, but more automatic manner, when writing in a high-level language, the compiler decides which registers to use. In both cases, the registers are assigned and fixed before the program is executed.
The arithmetic logic unit (ALU) performs arithmetic and logic operations. Addition, subtraction, multiplication, and division are examples of arithmetic operations. AND, OR, exclusive OR, and shift are examples of logical operations.
Checkpoint 4.4: For what do the acronyms CU, BIU, and ALU stand?
Phase   Function             R/W     Address   Comment
1       Instruction fetch    read    PC++      Get opcode and operands
2       Decode instruction   none              Figure out what to do
3       Evaluation address   none              Determine EAR
4       Data read            read    EAR       Data passes through ALU
5       Operation            none              ALU operations, set CCR
6       Data store           write   EAR       Results stored in memory
Observation: On the 9S12, some phases may be skipped, but the phases always occur in this order.
Phase 1. Opcode and operand fetch. The execution of a 9S12 instruction begins with fetching the op code and putting it in the IR. The page 2 instructions have a prebyte ($18), so these instructions require two 8-bit cycles to obtain the entire op code. Inherent mode instructions require no additional information. Immediate addressing mode requires 1 or 2 bytes of operand data, which is transferred either to a register (e.g., ldaa #10) or to the ALU (e.g., eora #$44). Direct addressing mode instructions will fetch 1 operand byte, which is used to create the effective address and stored in the EAR (with the most significant byte set to zero). Extended addressing mode instructions will fetch 2 operand bytes that are used to create the effective address and stored in the EAR. Indexed addressing mode instructions will fetch 1, 2, or 3 operand bytes, which are used to create the effective address and stored in the EAR. The index register may or may not be modified. PC relative addressing mode instructions will fetch 1 or 2 operand bytes that are added to the PC, forming the target branch address. Since the PC is incremented after fetching the op code and incremented again after fetching the operand, the PC is pointing to the next instruction at the time the target branch address is calculated. That is why the PC relative offset is the difference rel = target - current - n, where current is the address of the branch instruction and n is the number of bytes (2, 3, or 4) in the instruction.
Phase 2. Decode instruction. This phase does not require any bus cycles. The processor determines which instruction is to be executed.
Phase 3. Evaluate address. The processor uses information in the instruction and in its registers to calculate the EAR. Normally, this phase is quick and requires no bus cycles. However, there are some indirect addressing modes (i.e., an address pointing to an address that points to the data) that will require additional bus cycles to fully determine the effective address.
Phase 4. Data read. If the instruction requires data from memory, it will use the EAR to fetch 1 or 2 bytes. These cycles will be either data fetches or stack pulls.
Phase 5. Free cycles. Any ALU functions occur next. The ALU is used in many instructions to set the CCR. The ALU calculation may require additional time to execute (e.g., idiv, mem). Since no data needs to be read or written at this time, the actual 9S12 will generate null cycles or free cycles (f). In the real computer, these free cycles look like real memory accesses (R/W, Address, Data), but the data is ignored. The simplified cycle-by-cycle simulation counts these cycles but does not generate an output display for these do-nothing cycles.
Phase 6. Data write. If required, the last step writes data to memory, using the EAR to store 1 or 2 bytes. These cycles will be either data writes or stack pushes.
TExaS assembles the 9S12 instruction movw $3800,$0240 as the following. The first field, $F000, is the memory location that will hold the instruction. The second field, 180438000240, is the object code in hexadecimal that the assembler generates and is loaded into memory. The next field, [6], specifies the total number of cycles required to execute this instruction. The fourth field, {ORPWPO}, gives the cycle codes explaining the details of the execution of the real 9S12. The last field is the original source code.
$F000 180438000240  [6]{ORPWPO}  movw $3800,$0240
When a real 9S12 executes this instruction, it will create six 16-bit memory cycles, as discussed later in this chapter. The TExaS simulator will correctly count the six cycles, but show ten 8-bit cycles (eight reads and two writes) as it simulates this instruction. Assume the 16-bit data at $3800 is $1234 (big endian).
Opcode fetch       R
Page2 op fetch     R
Operand fetch      R
Operand fetch      R
Operand fetch      R
Operand fetch      R
Fetch msb @ EAR    R
Fetch lsb @ EAR    R
Store msb @ EAR    W
Store lsb @ EAR    W
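The ten simplified 8-bit cycles listed above can be reproduced with a short C simulation. This is a sketch of the simplified model only, not of the six 16-bit cycles the real 9S12 performs, and every name in it is hypothetical. It handles just this one instruction form, with both operands in extended addressing mode.

```c
#include <stddef.h>
#include <stdint.h>

/* One simulated 8-bit bus cycle: 'R' for read, 'W' for write. */
typedef struct {
    uint16_t address;
    uint8_t data;
    char rw;
} Cycle8;

uint8_t sim_memory[0x10000]; /* simulated 64 KiB address space */
Cycle8 sim_trace[16];        /* log of the cycles, in order */
size_t sim_trace_len;

void sim_log(uint16_t address, uint8_t data, char rw) {
    if (sim_trace_len < sizeof sim_trace / sizeof sim_trace[0]) {
        sim_trace[sim_trace_len].address = address;
        sim_trace[sim_trace_len].data = data;
        sim_trace[sim_trace_len].rw = rw;
        sim_trace_len++;
    }
}

uint8_t sim_read(uint16_t address) {
    uint8_t data = sim_memory[address];
    sim_log(address, data, 'R');
    return data;
}

void sim_write(uint16_t address, uint8_t data) {
    sim_memory[address] = data;
    sim_log(address, data, 'W');
}

/* Place the object code 180438000240 at $F000 and $1234 at $3800. */
void sim_load_example(void) {
    static const uint8_t code[6] = {0x18, 0x04, 0x38, 0x00, 0x02, 0x40};
    for (size_t i = 0; i < 6; i++) {
        sim_memory[0xF000 + i] = code[i];
    }
    sim_memory[0x3800] = 0x12;
    sim_memory[0x3801] = 0x34;
}

/* Simulate movw source,destination at pc; returns the number of cycles. */
size_t sim_movw_ext_ext(uint16_t pc) {
    sim_trace_len = 0;
    (void)sim_read(pc++);            /* opcode fetch: prebyte $18 */
    (void)sim_read(pc++);            /* page 2 op fetch: $04 */
    uint8_t src_hi = sim_read(pc++); /* four operand fetches */
    uint8_t src_lo = sim_read(pc++);
    uint8_t dst_hi = sim_read(pc++);
    uint8_t dst_lo = sim_read(pc++);
    uint16_t src = (uint16_t)((src_hi << 8) | src_lo);
    uint16_t dst = (uint16_t)((dst_hi << 8) | dst_lo);
    uint8_t msb = sim_read(src);                 /* fetch msb @ EAR */
    uint8_t lsb = sim_read((uint16_t)(src + 1)); /* fetch lsb @ EAR */
    sim_write(dst, msb);                         /* store msb @ EAR */
    sim_write((uint16_t)(dst + 1), lsb);         /* store lsb @ EAR */
    return sim_trace_len;
}
```

After sim_load_example(), calling sim_movw_ext_ext(0xF000) logs eight reads followed by two writes and leaves $12 at $0240 and $34 at $0241.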
4.1.7 I/O Port Architecture
Figure 4.11 describes an input-only port from an architecture perspective. On the 9S12, PE1 and PE0 are input only. On the 9S12DP512, the 16 pins on ports PAD0 and PAD1 are also input only. The digital values existing on the input pins are copied into the microcomputer when the software executes a read from the port address. There are no direction register bits, and these pins are always inputs. The “triangle-shaped” circuit, shown in Figure 4.11, is a tristate driver, which was described previously in Figure 3.12 and Table 3.15. During a read cycle from the specified port address, the input signals are driven onto the bus by making the tristate driver active. At all other times, the output of the driver is hiZ or off.
Figure 4.11 A read-only input port allows the software to read external digital signals.
[Diagram: a tristate driver, enabled by a read from the port address, drives the input signals onto the data bus of the processor.]
A latched input port behaves similarly to the circuit shown in Figure 4.12. There are no latched inputs on the 9S12 (Port C on the 6811 can be a latched input). The digital values existing on the input pins are copied into an internal latch on an edge of the external control signal. At a later time, the data is transferred to the microcomputer when the software executes a read from the latch address. Notice that this latched input port also supports the regular input function. In other words, the software has the option of reading the port address to get information directly from the input port pins or from the latch address to get information that existed at the time of the previous active edge of the external control signal.
Figure 4.12 A latched input port allows the software to read external digital signals that are captured via the external control signal.
[Diagram: the input signals reach the data bus of the processor along two paths. A read from the port address enables a tristate driver connected directly to the input signals. A read from the latch address enables a second driver connected to the output (Q) of a D flip-flop, which captures the input signals (D) on an edge of the external control signal.]
On the 6811, the latched input port allows the software to select which edge triggers the latch function. In other words, the software specifies whether the rise or fall of an external control signal latches the input data. It is important to remember that when the software reads from the latch address, it obtains the values of the input signals that existed at the time of the active edge of the external control signal. In this way, the external device can provide the data at the input and issue an edge on the control signal, latching the data into the computer. The software can process the data at a later time without requiring the external device to maintain the valid data at the input. One appropriate extension of the latched input is to add an output signal from the computer to the device, so the software can signal back (acknowledge) to the external hardware that the previous data has been read by the software.
While an input device usually just involves the software reading the port, an output port can participate in both the read and write cycles very much like a regular memory. Figure 4.13 describes a “readable output port”. For an 8-bit output port there will be eight D flip-flops to hold the values on the output pins. D flip-flops were described previously in Figure 3.11 and Table 3.14. A write cycle to the port address will affect the values on the output pins. In particular, the microcomputer places information on the data bus and that information is clocked into the D flip-flops. Since it is a readable output, a read cycle access from the port address returns the current values existing on the port pins. A “write-only output port” does not allow software to read the current values.
Figure 4.13 A readable output port allows the software to generate external digital signals.
[Diagram: a write to the port address clocks the data bus into a D flip-flop whose output (Q) drives the output signals. A read from the port address enables a tristate driver that returns the output signals to the data bus of the processor.]
Although the 9S12 has many pins that can be used as outputs, none of them are output-only. That is, most of the pins on the 9S12 can be configured as either inputs or outputs. Using Port T as an example, Figure 4.14 illustrates how most of the port pins on the 9S12 operate. Freescale uses the concept of a direction register to determine whether a pin is an input (direction register bit is 0) or an output (direction register bit is 1). We define a ritual as a program executed during start up that initializes hardware and software. If the direction bit is zero, the port behaves like a simple input. When the pin is an input, writes to PTT have no effect, and reads from PTIT and PTT both return the current status of the input pin. If the direction bit is one, it becomes a readable output port. When the pin is an output, writes to PTT set the value of the output driver, reads from PTT return the previous value written, and reads from PTIT return the current status of the output pin. If the output is working properly, and the external device is not drawing too much current, we expect the data we read from PTIT to equal the data we read from PTT.
Figure 4.14 A bidirectional port can be configured as a read-only input port or a readable output port.
[Diagram: two views of a Port T pin. With the direction bit = 1, a write to PTT clocks the data bus into a D flip-flop, and an active driver places its output on the pin (PTT outputs); reads from PTT and from PTIT are both available. With the direction bit = 0, the output driver is off, and reads from PTT and PTIT return the PTT inputs.]
Common Error: Many program errors can be traced to confusion between I/O ports and regular memory. For example, you should not write to an input port, and sometimes we cannot read from an output port.
Observation: If a port pin is configured as a readable output, but external loading causes the pin voltage to be different from the value written by the software, then a read from the input register (e.g., PTIT) will return the current voltage level at the pin and not the value last written by the software. This fact can be used by the software to detect excess loading by the external circuits.
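The Port T behavior described above can be captured in a small C model. This is a sketch under simplifying assumptions: the structure and function names are hypothetical, each pin is either a clean 0 or 1, and, following the text, a write to PTT affects only the bits configured as outputs.

```c
#include <stdint.h>

/* Model of one 8-bit bidirectional port. */
typedef struct {
    uint8_t ddr;   /* direction bits: 1 = output, 0 = input */
    uint8_t latch; /* values last written to the output bits */
    uint8_t pins;  /* logic levels actually present on the pins */
} PortModel;

/* Write to PTT: only output bits change, and they drive their pins. */
void port_write_ptt(PortModel *p, uint8_t value) {
    p->latch = (uint8_t)((p->latch & ~p->ddr) | (value & p->ddr));
    p->pins = (uint8_t)((p->pins & ~p->ddr) | (p->latch & p->ddr));
}

/* Read from PTIT: always the current status of the pins. */
uint8_t port_read_ptit(const PortModel *p) {
    return p->pins;
}

/* Read from PTT: previous value written for outputs, pin status for inputs. */
uint8_t port_read_ptt(const PortModel *p) {
    return (uint8_t)((p->latch & p->ddr) | (p->pins & ~p->ddr));
}

/* Output bits whose pin level differs from the value written. */
uint8_t port_overloaded_bits(const PortModel *p) {
    return (uint8_t)((port_read_ptt(p) ^ port_read_ptit(p)) & p->ddr);
}
```

In normal operation the two reads agree. If an external circuit drags an output pin away from the value written (modeled here by changing the pins field directly), port_overloaded_bits reports the affected bits, which is the loading check the Observation describes.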
It is good programming practice to collect all the subroutines that access PTT in one place within our software project. In this way, it will be easier to debug and easier to change. In fact, the term software driver is used to describe a set of programs that perform a task. In particular, an I/O driver or device driver is a set of programs that facilitate the use of an I/O device. For example, if you had a system implemented using Port T, and wished to redesign it using Port H instead, then it is a simple matter of finding all the places that use Port T and changing them to Port H.
Observation: An initialization routine is called “friendly” if it just sets the configuration bits that are needed, and leaves the other bits as is.
When there is more than one initialization routine, one initialization routine may affect the operation of the other. For example, assume there are two independent modules: one uses Port T bit 0 as an output and the other uses Port T bit 1 as an output. If the initialization routine for the first module sets DDRT to 1 and the second initialization sets DDRT to 2, then an error will occur when both initialization routines are executed. In particular, the execution of the second routine overrides the action of the first routine.
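The two-module conflict just described is easy to demonstrate in C. In this sketch an ordinary variable, DDRT_sim, stands in for the real DDRT register so the code can run on any computer. The module and function names are hypothetical.

```c
#include <stdint.h>

uint8_t DDRT_sim; /* stand-in for the DDRT direction register */

/* Unfriendly: each routine overwrites all eight direction bits. */
void ModuleA_InitUnfriendly(void) { DDRT_sim = 0x01; }
void ModuleB_InitUnfriendly(void) { DDRT_sim = 0x02; }

/* Friendly: each routine sets only the bit it needs. */
void ModuleA_InitFriendly(void) { DDRT_sim |= 0x01; }
void ModuleB_InitFriendly(void) { DDRT_sim |= 0x02; }
```

Running the two unfriendly routines in sequence leaves DDRT_sim equal to $02, so bit 0 is no longer an output. Running the two friendly routines leaves it equal to $03, regardless of the order in which they are called.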
Example 4.1 Design an I/O driver for a single output pin.
Solution This software driver will require three operations: initialization, set, and clear. Initially, we will use PT0, but we expect this pin assignment to change in the future. Because we expect the port pin to change, we will define the functions with a general name, like Pin, rather than its physical name, like PT0. The example in Program 4.1 is friendly because it does not modify the other seven bits. The initialization will define PT0 as an output, the function Pin_Set will make it high, and the function Pin_Clr will make it low.
Program 4.1 Simple I/O port driver.
; Make PT0 an output pin
Pin_Init bset DDRT,#$01
         rts
; Make PT0 high
Pin_Set  bset PTT,#$01
         rts
; Make PT0 low
Pin_Clr  bclr PTT,#$01
         rts

// Make PT0 an output pin
void Pin_Init(void){
  DDRT |= 0x01;
}
// Make PT0 high
void Pin_Set(void){
  PTT |= 0x01;
}
// Make PT0 low
void Pin_Clr(void){
  PTT &= ~0x01;
}
Checkpoint 4.5: Rewrite the I/O driver in Example 4.1, moving the pin to PT7.
4.2 *Understanding Software Execution at the Bus Cycle Level
One of the advantages of studying assembly language programming is the ability to understand how the computer executes software from a bus cycle perspective. Many computer concepts such as speed, hardware/software synchronization, protection, pointers, and security can be studied at the bus cycle level. There are eight types of bus cycles that the 9S12 uses to communicate with memory. For all types of cycles, the processor drives the address bus, the R/W signal, and the LSTRB signal. The 16-bit address bus selects which memory location (or I/O device) to access. The R/W signal specifies read (R/W = 1) or write (R/W = 0). The LSTRB signal, together with address bit A0 and R/W, specifies the type of cycle, as shown in Table 4.2.
Table 4.2 Eight types of memory cycles on a 9S12.
LSTRB   A0   R/W   Type of Access
1       0    1     8-bit read of an even address
0       1    1     8-bit read of an odd address
1       0    0     8-bit write of an even address
0       1    0     8-bit write of an odd address
0       0    1     16-bit read of an even address
1       1    1     16-bit read of an odd address (low/high data swapped)
0       0    0     16-bit write to an even address
1       1    0     16-bit write to an odd address (low/high data swapped)
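Table 4.2 can be decoded with a few comparisons. Reading down the rows, the access is 16 bits wide exactly when LSTRB equals A0, the address is odd when A0 is 1, and the cycle is a read when R/W is 1. The C sketch below encodes that observation. The type and function names are hypothetical.

```c
#include <stdbool.h>

/* Properties of one 9S12 memory cycle, derived from Table 4.2. */
typedef struct {
    bool is_16bit;    /* true for a 16-bit access, false for 8-bit */
    bool is_read;     /* true for a read cycle, false for a write */
    bool odd_address; /* true when the access starts at an odd address */
} CycleType;

/* Decode the LSTRB, A0, and R/W signal levels (each 0 or 1). */
CycleType decode_cycle(int lstrb, int a0, int rw) {
    CycleType type;
    type.odd_address = (a0 != 0);
    type.is_read = (rw != 0);
    type.is_16bit = ((lstrb != 0) == (a0 != 0)); /* wide when LSTRB equals A0 */
    return type;
}

/* Bytes are swapped on the data bus only for 16-bit odd-address accesses. */
bool cycle_is_swapped(CycleType type) {
    return type.is_16bit && type.odd_address;
}
```

For instance, decode_cycle(1, 1, 1) describes a 16-bit read of an odd address, which is the one read case where the low and high data bytes arrive swapped.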
During a read cycle, the memory at the specified address puts the information on the data bus, and the processor transfers the information (8 or 16 bits) into the appropriate place within the processor. During a write cycle, the processor puts the information on the data bus, and the memory transfers the information into the specified location. The 8-bit transfers are straightforward because the 16-bit address explicitly defines the one byte in memory to be affected. The 16-bit transfers to even addresses are also simple, because the top 15 bits of the address determine the two locations of interest. The tricky situation is a 16-bit access to an odd address. For example, a 16-bit read of location $08FF must access both locations $08FF (most significant byte) and $0900 (least significant byte) in a single memory cycle. In this case, the $08FF data will arrive at the processor as the lower 8 bits of the data bus, and the $0900 data will arrive as the upper 8 bits. The two bytes must be swapped inside the processor before they are used.
Another complicating feature of a real 9S12 is its instruction queue, used to buffer program information. The mechanism is called a queue rather than a pipeline because a typical pipelined CPU executes more than one instruction at the same time, while the CPU12 always finishes executing one instruction before beginning to execute another. The 9S12 BIU reads the op code, then it reads the operand, and finally it reads and writes memory data as required. The instruction queue is a hardware first-in-first-out queue, placed between the BIU and the CU, as shown in Figure 4.15. Queue logic fetches program information and positions it for execution, but instructions are executed sequentially. The queue can hold up to three 16-bit values. Writes from the processor to memory do not pass through the queue. Using a queue allows the 9S12 to fetch the op code and operands of the next instruction while it is executing the current instruction. This explains the curious behavior of 9S12 instructions, which seem to perform their data transfers first, followed by the op code fetches. In actuality, the op code fetches specified as part of an instruction execution are reading the op codes for the next instruction. A conditional branch instruction has the tricky question of which op codes it should fetch. It hasn’t yet performed the conditional test, so it doesn’t know the answer. The 9S12 assumes a conditional branch will not occur and will simply fetch the next op code. If the branch is to occur, then its queue will be filled with the wrong op codes, and the queue will have to be flushed and refilled. This is the reason conditional branch instructions take one or three cycles to execute, depending on whether the branch is taken or not. When faced with a conditional branch instruction, the Intel Pentium processor fills its pipeline with both possible branch paths, so it is efficient regardless of whether or not the branch is to occur.
Figure 4.15 The 9S12 processor fetches instructions in advance of actual execution.
[Diagram: in Phase 1, op codes and operands are fetched from the bus into the instruction queue; the control unit takes instructions from the queue and executes them.]
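Returning to the 16-bit access to an odd address described earlier in this section, the byte swap can be sketched in C. This is a simplified model with hypothetical names. It assumes, consistent with big endian storage, that for an even address the byte at the even location travels on the upper 8 bits of the data bus.

```c
#include <stdint.h>

uint8_t mem_model[0x10000]; /* simulated 64 KiB memory */

/* Value that appears on the 16-bit data bus during a 16-bit read. */
uint16_t bus_data_for_read16(uint16_t address) {
    uint8_t first = mem_model[address];
    uint8_t second = mem_model[(uint16_t)(address + 1)];
    if (address & 1) {
        /* Odd address: first byte on the lower 8 bits, next byte on the upper 8. */
        return (uint16_t)((second << 8) | first);
    }
    /* Even address: first byte on the upper 8 bits. */
    return (uint16_t)((first << 8) | second);
}

/* What the processor uses: bytes are swapped back after an odd-address read. */
uint16_t cpu_read16(uint16_t address) {
    uint16_t bus = bus_data_for_read16(address);
    if (address & 1) {
        bus = (uint16_t)((bus << 8) | (bus >> 8));
    }
    return bus;
}
```

With $12 at $08FF and $34 at $0900, the data bus carries $3412 during the misaligned read, and the processor swaps it to obtain the big endian value $1234.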
The actual 9S12 cycle-by-cycle execution can be determined by first listing the cycle sequence code for the instruction. The sequence code can be found in the Freescale CPU12 Reference Manual or by assembling the instruction using TExaS. When the ShowCycleTypes... option is enabled via the Assembly-Options command, the cycle types will be included in the assembly listing. Although TExaS doesn’t show the real cycles during simulated execution, its assembler will show the cycle sequence code in the assembly listing. A single letter code represents a single CPU cycle. Upper case letters indicate 16-bit access cycles. Lower case letters mean 8-bit accesses. There are cycle codes for each addressing mode variation of each instruction. Simply count code letters to determine the execution time of an instruction. This execution time is accurate for a single-chip 16-bit system with no 16-bit odd-address data accesses to any locations other than on-chip memory. Accesses to external memory can be stretched, but the CPU is not aware of the stretch delays because the clock to the CPU is temporarily stopped during these delays. The following paragraphs explain the cycle code letters used and note conditions that can cause each type of cycle to be stretched. f—Free cycle. This indicates a cycle where the CPU does not require use of the system buses. An f cycle is always one cycle of the system bus clock. These cycles can be used by a queue controller or the background debug system to perform single cycle accesses without disturbing the CPU.
g: Read 8-bit PPAGE register. These cycles are only used with the call instruction to read the current value of the PPAGE register, and are not visible on the external bus. Since the PPAGE register is an internal 8-bit register, these cycles are never stretched.
I: Read indirect pointer. Indexed indirect instructions use this 16-bit pointer from memory to address the operand for the instruction. These are always 16-bit reads but they can be either aligned or misaligned. These cycles are extended to two bus cycles if the MCU is operating with an 8-bit external data bus and the corresponding data is stored in external memory. There can be additional stretching when the address space is assigned to a chip-select circuit programmed for slow memory. These cycles are also stretched if they correspond to misaligned access to a memory that is not designed for single-cycle misaligned access.
i: Read indirect PPAGE value. These cycles are only used with indexed indirect versions of the call instruction, where the 8-bit value for the memory expansion page register of the call destination is fetched from an indirect memory location. These cycles are stretched only when controlled by a chip-select circuit that is programmed for slow memory.
n: Write 8-bit PPAGE register. These cycles are only used with the call and rtc instructions to write the destination value of the PPAGE register and are not visible on the external bus. Since the PPAGE register is an internal 8-bit register, these cycles are never stretched.
O: Optional cycle. Program information is always fetched as aligned 16-bit words. When an instruction consists of an odd number of bytes, and the first byte is misaligned, an O cycle is used to make an additional program word access (P) cycle that maintains queue order. In all other cases, the O cycle appears as a free (f) cycle. The $18 prebyte for page two opcodes is treated as a special one-byte instruction. If the prebyte is misaligned, the O cycle is used as a program word access for the prebyte; if the prebyte is aligned, the O cycle appears as a free cycle. If the remainder of the instruction consists of an odd number of bytes, another O cycle is required some time before the instruction is completed. If the O cycle for the prebyte is treated as a P cycle, any subsequent O cycle in the same instruction is treated as an f cycle; if the O cycle for the prebyte is treated as an f cycle, any subsequent O cycle in the same instruction is treated as a P cycle. Optional cycles used for program word accesses can be extended to two bus cycles if the MCU is operating with an 8-bit external data bus and the program is stored in external memory. There can be additional stretching when the address space is assigned to a chip-select circuit programmed for slow memory. Optional cycles used as free cycles are never stretched.
P: Program word access. Program information is fetched as aligned 16-bit words. These cycles are extended to two bus cycles if the MCU is operating with an 8-bit external data bus and the program is stored externally. There can be additional stretching when the address space is assigned to a chip-select circuit programmed for slow memory.
r: 8-bit data read. These cycles are stretched only when controlled by a chip-select circuit programmed for slow memory.
R: 16-bit data read. These cycles are extended to two bus cycles if the MCU is operating with an 8-bit external data bus and the corresponding data is stored in external memory. There can be additional stretching when the address space is assigned to a chip-select circuit programmed for slow memory. These cycles are also stretched if they correspond to misaligned accesses to memory that is not designed for single-cycle misaligned access.
s: 8-bit stack data. These cycles are stretched only when controlled by a chip-select circuit programmed for slow memory.
S: 16-bit stack data. These cycles are extended to two bus cycles if the MCU is operating with an 8-bit external data bus and the SP is pointing to external memory. There can be additional stretching if the address space is assigned to a chip-select circuit programmed for slow memory. These cycles are also stretched if they correspond to misaligned accesses to a memory that is not designed for single-cycle misaligned access. The internal RAM is designed to allow single-cycle misaligned word access.
w: 8-bit data write. These cycles are stretched only when controlled by a chip-select circuit programmed for slow memory.
W: 16-bit data write. These cycles are extended to two bus cycles if the MCU is operating with an 8-bit external data bus and the corresponding data is stored in external memory. There can be additional stretching when the address space is assigned to a chip-select circuit programmed for slow memory. These cycles are also stretched if they correspond to misaligned access to a memory that is not designed for single-cycle misaligned access.
u: 8-bit unstack data. These cycles are stretched only when controlled by a chip-select circuit programmed for slow memory.
U: 16-bit unstack data. These cycles are extended to two bus cycles if the MCU is operating with an 8-bit external data bus and the SP is pointing to external memory. There can be additional stretching when the address space is assigned to a chip-select circuit programmed for slow memory. These cycles are also stretched if they correspond to misaligned accesses to a memory that is not designed for single-cycle misaligned access. The internal RAM is designed to allow single-cycle misaligned word access.
V: Vector fetch. Vectors are always aligned 16-bit words. These cycles are extended to two bus cycles if the MCU is operating with an 8-bit external data bus and the program is stored in external memory. There can be additional stretching when the address space is assigned to a chip-select circuit programmed for slow memory.
t: 8-bit conditional read. These cycles are either data read cycles or free cycles, depending upon the data and flow of the revw instruction. These cycles are only stretched when controlled by a chip-select circuit programmed for slow memory.
T: 16-bit conditional read. These cycles are either data read cycles or free cycles, depending upon the data and flow of the rev or revw instruction. These cycles are extended to two bus cycles if the MCU is operating with an 8-bit external data bus and the corresponding data is stored in external memory. There can be additional stretching when the address space is assigned to a chip-select circuit programmed for slow memory. These cycles are also stretched if they correspond to misaligned accesses to a memory that is not designed for single-cycle misaligned access.
x: 8-bit conditional write. These cycles are either data write cycles or free cycles, depending upon the data and flow of the rev or revw instruction. These cycles are only stretched when controlled by a chip-select circuit programmed for slow memory.
PPP/P: Short branches require three cycles if taken, one cycle if not taken. Since the instruction consists of a single word containing both an opcode and an 8-bit offset, the not-taken case is simple: the queue advances, another program word fetch is made, and execution continues with the next instruction. The taken case
requires that the queue be refilled so that execution can continue at a new address. First, the effective address of the destination is determined, then the CPU performs three program word fetches from that address.
OPPP/OPO: Long branches require four cycles if taken, three cycles if not taken. Optional cycles are required because all long branches are page two opcodes and thus include the $18 prebyte. The CPU12 treats the prebyte as a special 1-byte instruction. If the prebyte is misaligned, the optional cycle is used to perform a program word access; if the prebyte is aligned, the optional cycle is used to perform a free cycle. As a result, both the taken and not-taken cases use one optional cycle for the prebyte. In the not-taken case, the queue must advance so that execution can continue with the next instruction, and another optional cycle is required to maintain the queue. The taken case requires that the queue be refilled so that execution can continue at a new address. First, the effective address of the destination is determined, then the CPU performs three program word fetches from that address.
Because of the instruction queue, one must study a sequence of instructions to determine the actual memory bus cycles. As an example, we will analyze the program shown in the following listing. First, we look at the listing file generated by the TExaS assembler. The first field, e.g., $4000, is the memory location that will hold the instruction. The second field, e.g., 8608, is the object code in hexadecimal that the assembler generates and is loaded into memory. The next field, e.g., [1], specifies the total number of cycles required to execute this instruction. The fourth field, e.g., ( 0), specifies the accumulated number of cycles required to execute all the previous instructions up to this point. The fifth field, e.g., {P}, gives the cycle codes explaining the details of the execution. The last field is the original source code.
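$4000 8608   [ 1](  0){P  }     ldaa #$08
$4002 7A0242 [ 3](  1){PwO}     staa DDRT
$4005 B60240 [ 3](  4){rPO}loop ldaa PTT
$4008 7A0800 [ 3](  7){PwO}     staa Data
$400B 44     [ 1]( 10){O  }     lsra
$400C 7A0240 [ 3]( 11){PwO}     staa PTT
$400F 20F4   [ 3]( 14){PPP}     bra  loop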
The next step is to string the cycle codes together: PPwO, rPOPwOOPwOPPP. On an 8-MHz 9S12, each cycle takes 125 ns. The sequence after the comma will be repeated. At the beginning of each instruction, we assume the entire instruction is loaded in the queue. In other words, each instruction must fetch enough data to capture the entire next instruction. The reset sequence will set the PC using the reset vector (16-bit value from $FFFE), and load the queue with the $8608, $7A02 machine codes. We also assume the Port T inputs are set such that its data will be $18. Table 4.3 shows the actual 9S12 cycles during execution. Notice that the front of the queue contains the instruction being executed. The ** symbols indicate irrelevant data. Table 4.3 Real memory cycles generated by a 9S12 executing this program.
Each instruction refills the queue by fetching the same number of bytes that the instruction uses. Program information is fetched in aligned 16-bit words. Each program fetch (P) indicates that two bytes need to be replaced in the instruction queue. Each optional fetch (O) indicates that only one byte needs to be replaced. For example, the staa DDRT instruction in this example, which is composed of three bytes, does one program fetch (P) and one optional fetch (O). If the first byte of the three-byte instruction was even-aligned, the optional fetch is converted into a free cycle. If the first byte was odd-aligned, the optional fetch is executed as a program fetch.
Observation: When the 9S12 begins executing an instruction, the queue has all the opcode and operand bytes it needs.
Observation: During the execution of an instruction, the 9S12 fetches the opcode and operand bytes for the next instruction.
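The cycle arithmetic used in this example can be sketched in C. This is only an illustration, assuming that no cycle is stretched (single-chip operation with on-chip memory). The helper names count_cycles, cycles_to_ns, and short_branch_cycles are hypothetical and are not part of TExaS or any Freescale tool.

```c
#include <stddef.h>

/* Count CPU cycles in a cycle-code string such as "rPO".
   Each letter is one bus cycle; spaces and commas are ignored.
   Assumption: no cycle is stretched. */
unsigned count_cycles(const char *codes) {
    unsigned n = 0;
    for (size_t i = 0; codes[i] != '\0'; i++) {
        char c = codes[i];
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
            n++;
        }
    }
    return n;
}

/* Convert a cycle count to nanoseconds for a bus (E clock) frequency in Hz. */
unsigned long cycles_to_ns(unsigned cycles, unsigned long bus_hz) {
    return (unsigned long)(((unsigned long long)cycles * 1000000000ULL) / bus_hz);
}

/* Short conditional branch: PPP if taken, P if not taken. */
unsigned short_branch_cycles(int taken) {
    return taken ? count_cycles("PPP") : count_cycles("P");
}
```

For the repeated part of the program above, count_cycles("rPOPwOOPwOPPP") gives 13 cycles, which is 1625 ns per pass through the loop at 125 ns per cycle.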
Table 4.4 shows the simulated 9S12 cycles generated by TExaS.
Table 4.4 Simulated memory cycles generated by TExaS for this program.
Cycle Type       Details                         Instruction
Opcode fetch     R 0x4000 0x86 from ROM          ldaa #$08
Operand fetch    R 0x4001 0x08 from ROM
Opcode fetch     R 0x4002 0x7A from ROM          staa DDRT
Operand fetch    R 0x4003 0x02 from ROM
Operand fetch    R 0x4004 0x42 from ROM
Store using EAR  W 0x0242 0x0F to I/O port
Opcode fetch     R 0x4005 0xB6 from ROM          ldaa PTT
Operand fetch    R 0x4006 0x02 from ROM
Operand fetch    R 0x4007 0x40 from ROM
Fetch using EAR  R 0x0240 0x10 from I/O
Opcode fetch     R 0x4008 0x7A from ROM          staa Data
Operand fetch    R 0x4009 0x08 from ROM
Operand fetch    R 0x400A 0x00 from ROM
Store using EAR  W 0x0800 0x10 to RAM
Opcode fetch     R 0x400B 0x44 from ROM          lsra
Opcode fetch     R 0x400C 0x7A from ROM          staa PTT
Operand fetch    R 0x400D 0x02 from ROM
Operand fetch    R 0x400E 0x40 from ROM
Store using EAR  W 0x0240 0x08 to I/O port
Opcode fetch     R 0x400F 0x20 from ROM          bra loop
Operand fetch    R 0x4010 0xF4 from ROM
Checkpoint 4.6: Give the actual memory cycles created when the following program is executed by a 9S12. Assume the queue contains the data B608, 0006. Assume location $0800 contains the data $55.
$F000 B60800 [ 3](  0){rPO }loop ldaa $0800
$F003 06F000 [ 3](  3){PPP }     jmp  loop
4.3 9S12 Architecture Details
The microcontrollers in the 9S12 family differ by the amount of memory and by the types of I/O modules. All 9S12 microcontrollers have a 16-bit central processing unit (HCS12CPU), a system integration module (SIM), RAM (volatile random access memory), Flash EEPROM (nonvolatile electrically erasable programmable read only memory), and a phase-locked loop (PLL). The 9S12 microcontrollers are configured with zero, one, or more of the following modules: asynchronous serial communications interface (SCI), serial peripheral interface (SPI), inter-integrated circuit (I2C), key wakeup, 16-bit timer, pulse width modulation (PWM), 10-bit or 12-bit analog-to-digital converter (ADC), 8-bit digital-to-analog converter (DAC), liquid crystal display driver (LCD), controller area network (CAN 2.0), universal serial bus (USB 2.0) interface, Ethernet (MAC FEC 10/100) interface, and memory expansion logic. The PLL allows the software to increase or decrease the execution speed. Typically, the SPI and I2C modules allow the 9S12 to communicate with other dedicated peripheral devices, whereas the SCI, CAN, USB, and Ethernet modules allow the 9S12 to communicate with other computers. The ADC module (together with sensors and analog amplifiers) can be used to collect information. The PWM module can be used to control delivered power to motors and lights. We can use key wakeup modules to interface digital inputs to the 9S12.
The 9S12C32, with its port structure shown in Figure 4.16, is one of the smaller and lower-cost members of the 9S12 family. It has 2 kibibytes of RAM and 32 kibibytes of EEPROM.
The 9S12C32 is available in three quad flat package (QFP) sizes. The larger chip packages have more pins, as shown in Table 4.5. TExaS does not simulate the external data bus, SPI, PWM, or CAN.
Port     48-pin  52-pin  80-pin  Shared Functions
Port A                           Address/Data Bus
Port B                           Address/Data Bus
Port E                           System Integration Module
Port J                           Key wakeup
Port M                           SPI, CAN
Port P                           Key wakeup, PWM
Port S                           SCI
Port T                           Timer, PWM
Port AD                          Analog to Digital Converter
Table 4.5 The 9S12C32 has nine external I/O ports.
One of the smallest systems based on the 9S12 family is the 24-pin Nanocore module from TechArts, as shown in Figure 4.17. This system includes a voltage regulator, a run/load switch, a BDM header, RS232 drivers for the SCI port, the 8 pins of Port T, and the 8 pins of Port AD.
Figure 4.17 The TechArts NC12C32 Nanocore12.
Program 4.2 and Table 4.6 define some of the parallel ports for the 9S12C32. A full list of I/O ports can be found in the reference manual at http://users.ece.utexas.edu/~valvano/Datasheets/MC9S12C128_V1.pdf. The full list can also be found in the TExaS file HC12.rtf. The ports simulated by TExaS are defined in the file Port12.rtf. We clear (0) a bit in the direction register to make that pin an input and set it (1) to make it an output. Notice that Port M is only 6 bits wide. A pin on Port AD (PTAD) can be used as a digital input if the corresponding bit in the ATDDIEN is set to 1 and the bit in the DDRAD is cleared to 0. A pin on Port AD (PTAD) can be used as a digital output if the corresponding bit in the DDRAD is set to 1. The pins on Port AD can be used as analog input if the ADC is enabled and the corresponding bit in the ATDDIEN is cleared to 0.
; port name definitions
ATDDIEN equ $008D ; Input Enable
DDRAD   equ $0272 ; Direction
DDRM    equ $0252 ; Direction
DDRT    equ $0242 ; Direction
PTAD    equ $0270 ; I/O
PTM     equ $0250 ; I/O
PTT     equ $0240 ; I/O
Table 4.6 Some 9S12C32 Parallel ports. Registers listed: PTT, PTIT, DDRT, PTM, PTIM, DDRM, ATDDIEN, PTAD, PTIAD, DDRAD.
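The Port AD rules can be sketched in C as three small helpers. This is a host-side illustration only: the three globals stand in for the memory-mapped registers at $008D, $0272, and $0270, and the function names are hypothetical. Enabling the ADC itself is not shown.

```c
#include <stdint.h>

/* Host-side stand-ins for the 9S12C32 registers so the sketch runs on a PC.
   On the target these would be volatile accesses to $008D, $0272, and $0270. */
volatile uint8_t ATDDIEN; /* digital input enable */
volatile uint8_t DDRAD;   /* direction: 0 = input, 1 = output */
volatile uint8_t PTAD;    /* port data */

/* Digital input: ATDDIEN bit set to 1 and DDRAD bit cleared to 0. */
void PortAD_DigitalInput(uint8_t mask) {
    ATDDIEN |= mask;
    DDRAD &= (uint8_t)~mask;
}

/* Digital output: DDRAD bit set to 1. */
void PortAD_DigitalOutput(uint8_t mask) {
    DDRAD |= mask;
}

/* Analog input: ATDDIEN bit cleared to 0 (the ADC must also be enabled). */
void PortAD_AnalogInput(uint8_t mask) {
    ATDDIEN &= (uint8_t)~mask;
}

/* Read the port data register. */
uint8_t PortAD_Read(void) {
    return PTAD;
}
```

Each helper takes a bit mask, so PortAD_DigitalInput(0x03) configures pins 1 and 0 together while leaving the other six pins unchanged.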
4.3.2 9S12DP512 Architecture
Figure 4.18 shows the port structure of the 9S12DP512. Although the 9S12DP512 has 512 kibibytes of EEPROM, only 48 kibibytes of it is directly addressable using standard 16-bit addressing modes. The remaining 464 kibibytes must be accessed using the paged memory process. Table 4.7 shows the I/O pins that exist on the 9S12DP512 chip. All pins except PE5 and PE6 are available on the TechArts Adapt module shown in Figure 4.19. TExaS does not simulate the external data bus, SPI, I2C, PWM, or CAN.
Table 4.7 The 9S12DP512 is a 112 pin module with 91 I/O pins. [Table columns: Chip; pin function (digital input, digital I/O, digital I/O and key wakeup); Special I/O (priority order), including IRQ and XIRQ, SPI2, SPI1, CAN4, I2C or CAN0, CAN3 or CAN4.]
Figure 4.18 Block diagram of a Freescale 9S12DP512.
The 9S12DP512 has 91 I/O pins, some of which are listed in Program 4.3. Many of the pins can be configured to implement complex I/O functions, but in this section the pins are used as simple digital inputs or outputs. We clear (0) a bit in the direction register to make that pin an input, and set it (1) to make it an output. Program 4.3 defines five of these parallel ports. A full list of I/O ports can be found in the reference manual at http://users.ece.utexas.edu/~valvano/Datasheets/MC9S12DP512.zip. The full list can also be found in the TExaS file HC12.rtf. The ports simulated by TExaS are defined in the file Port12.rtf. Table 4.8 shows all 91 digital I/O pins that can be used on the 9S12DP512. Pins PE1, PE0 and Ports PORTAD1, PORTAD0 are input only. The module routing register (MODRR) will be explained later in Chapter 8, when it is needed. In C, we add the volatile qualifier so the compiler will not optimize away code involving I/O ports. More precisely, volatile tells the compiler that the value may change beyond the control of the software itself. In particular, each time the value is needed, it will be reread from the port.
Figure 4.19 A 9S12DP512 system from Technological Arts (#AD9S12DP512M0).
; port name definitions
DDRH equ $0262 ; Direction
DDRJ equ $026A ; Direction
DDRM equ $0252 ; Direction
DDRP equ $025A ; Direction
DDRT equ $0242 ; Direction
PTH  equ $0260 ; I/O
PTJ  equ $0268 ; I/O
PTM  equ $0250 ; I/O
PTP  equ $0258 ; I/O
PTT  equ $0240 ; I/O
Program 4.3 Definitions of some of the 9S12DP512 I/O ports.
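The role of volatile can be illustrated with a short C sketch of Port T definitions. The TARGET_9S12 build flag, the sim_ variables, and the two function names are hypothetical; on a PC the registers are simulated by ordinary volatile variables so that the code can be compiled and run, while on the target the macros would cast the fixed addresses $0240 and $0242 to volatile pointers.

```c
#include <stdint.h>

#ifdef TARGET_9S12 /* hypothetical flag: build for the real chip */
#define PTT  (*(volatile uint8_t *)0x0240)
#define DDRT (*(volatile uint8_t *)0x0242)
#else              /* host build: simulated registers */
volatile uint8_t sim_PTT, sim_DDRT;
#define PTT  sim_PTT
#define DDRT sim_DDRT
#endif

/* Make PT3-PT0 outputs and leave the direction of PT7-PT4 unchanged. */
void PortT_InitLowOutputs(void) {
    DDRT |= 0x0F;
}

/* Write a 4-bit value to PT3-PT0 without disturbing PT7-PT4.
   Because PTT is volatile, the read and the write are both performed
   every time this function runs. */
void PortT_WriteLow(uint8_t value) {
    PTT = (uint8_t)((PTT & 0xF0) | (value & 0x0F));
}
```

Without volatile, a compiler would be free to keep a copy of the port value in a register and skip rereading it, which is exactly the optimization we do not want for I/O.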
PORTA PORTB DDRA DDRB PORTE DDRE PORTK DDRK PORTAD0 PORTAD1 PTT PTIT DDRT MODRR PTS PTIS DDRS PTM PTIM DDRM PTP PTIP DDRP PTH PTIH DDRH PTJ PTIJ DDRJ
Table 4.8 9S12DP512 parallel ports.
4.3.3 9S12E128 Architecture
Figure 4.20 shows the port structure of the 9S12E128, which has 8 KiB of RAM and 128 KiB of flash EEPROM. This member of the 9S12 family has 12 input capture/output compare timer pins, 12 pulse-width modulated output pins, 16 ADC inputs, two DAC outputs, one SPI module, three SCI modules, and one I2C interface. There are two sizes of the 9S12E128 chip, one with 80 pins and the other with 112 pins. The 112-pin chip has 92 I/O pins, some of which are listed in Table 4.9. We clear (0) a bit in the direction register to make that pin an input and set it (1) to make it an output. A pin on Port AD (PTAD) can be used as a digital input if the corresponding bit in the ATDDIEN is set to 1 and the bit in the DDRAD is cleared to 0. A pin on Port AD (PTAD) can be used as a digital output if the corresponding bit in the DDRAD is set to 1. The pins on Port AD can be used as analog input if the ADC is enabled and the corresponding bit in the ATDDIEN is cleared to 0.
4.3.4 Operating Modes
The 9S12 can operate in one of eight modes, where the mode is selected by the values of the three signals BKGD, MODA, and MODB that exist when the device starts up after a reset. Most applications, however, utilize one of the three modes shown in Table 4.10. In single-chip mode, the 9S12 contains the four major building blocks required to make a complete computer system: processor, I/O, RAM, and EEPROM. In this book, we will be using single-chip mode exclusively, where all ports are available for input/output. Because the
9S12 family is available with flash EEPROM ranging from 32 KiB to 512 KiB, most embedded projects can be developed directly in single chip mode. On the other hand, the 9S12 family RAM sizes only range from 2 KiB to 32 KiB. For embedded systems that require a large amount of read/write storage, we will use expanded modes to interface external RAM to the system. Expanded narrow mode creates a 16-bit address bus and 8-bit data bus, while expanded wide mode implements both the address bus and data bus as 16 bits.
Table 4.10 The 9S12 has eight operating modes, but we will use normal single-chip mode.
BKGD  MODB  MODA  Mode                    Port A            Port B
1     0     0     Normal single chip      In/Out            In/Out
1     0     1     Normal expanded narrow  A15-8/D15-8/D7-0  A7-A0
1     1     1     Normal expanded wide    A15-8/D15-8       A7-0/D7-0
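The three rows of Table 4.10 can be captured in a small C lookup. This is only a sketch: the function name external_data_bus_bits is hypothetical, and the five operating modes not listed in the table are simply reported as -1.

```c
/* Width in bits of the external data bus for the modes in Table 4.10.
   Inputs are the BKGD, MODB, and MODA signal values (0 or 1) at reset.
   Returns  0 for normal single chip (Ports A and B are ordinary I/O),
            8 for normal expanded narrow,
           16 for normal expanded wide,
           -1 for the modes not covered by the table. */
int external_data_bus_bits(int bkgd, int modb, int moda) {
    if (bkgd != 1) {
        return -1;
    }
    if (modb == 0 && moda == 0) {
        return 0;
    }
    if (modb == 0 && moda == 1) {
        return 8;
    }
    if (modb == 1 && moda == 1) {
        return 16;
    }
    return -1;
}
```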
We use flash EEPROM during development because it takes only minutes to perform an edit/assemble/download cycle. For delivered projects, we simply program our code into the flash EEPROM and embed the device into our final product. In single-chip mode, the 9S12 implements a complete microcomputer, where all its I/O ports are available. This mode is used for the final product with the application software programmed into the EEPROM. The address space of the input/output devices and the RAM can be mapped on any 2 KiB boundary by software. In this book, however, we will use the addresses at the default locations as shown in Tables 2.2, 2.3, and 2.4.
4.3.5 Phase-Lock-Loop (PLL)
Normally, the execution speed of a microcontroller is determined by an external crystal. Some MC9S12C32 boards have an 8-MHz crystal creating a 4-MHz E clock. The MC9S12DP512 shown in Figure 4.19 has a 16-MHz crystal creating an 8-MHz E clock. The 9S12 has a phase-lock-loop (PLL) that allows the software to adjust the execution speed of the computer. Program 4.4 will increase the E clock from 8 MHz to 24 MHz. The OSCCLK is the frequency of the crystal. Typically, the choice of frequency involves the tradeoff between software execution speed and electrical power. In other words, slowing down the E clock will require less power to operate and generate less heat. Speeding up the E clock allows for more calculations per second.
; 9S12DP512 with a 16 MHz crystal
PLL_Init movb  #$02,SYNR
         movb  #$01,REFDV    ; $00 for the 9S12C32 or 9S12E128 with an 8-MHz crystal
; PLLCLK = 2*OSCCLK*(SYNR+1)/(REFDV+1)
         movb  #$00,CLKSEL
         movb  #$D1,PLLCTL   ; turn on PLL
         brclr CRGFLG,#$08,* ; stabilized?
         bset  CLKSEL,#$80   ; Switch to PLL
         rts
Program 4.4 These programs activate the phase-lock-loop, setting the E clock to 24 MHz.
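The PLL formula in Program 4.4 can be checked with a few lines of C. This is a sketch under two assumptions: the function names are hypothetical, and the E clock is taken to be half of the selected clock, which is consistent with a 16-MHz crystal giving an 8-MHz E clock before the PLL is engaged.

```c
#include <stdint.h>

/* PLLCLK = 2*OSCCLK*(SYNR+1)/(REFDV+1), all frequencies in Hz. */
uint32_t pll_clk_hz(uint32_t oscclk_hz, uint8_t synr, uint8_t refdv) {
    return (uint32_t)((2ULL * oscclk_hz * (synr + 1u)) / (refdv + 1u));
}

/* Assumption: the E clock runs at half of the selected clock. */
uint32_t e_clk_hz(uint32_t selected_clk_hz) {
    return selected_clk_hz / 2u;
}
```

With a 16-MHz crystal, SYNR=2, and REFDV=1, pll_clk_hz returns 48 MHz, so the E clock is 24 MHz. An 8-MHz crystal with SYNR=2 and REFDV=0 gives the same result.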
4.4 The Stack
We begin this section with a general description of the stack and introduce the basic concepts. Sophisticated use of the stack will be developed later in Chapter 7. In the classical definition of the stack, there are just two operations one can perform: push and pull. Some computers define the two stack operations as push and pop. The push function saves data on the top of the stack (i.e., decrement SP then store at SP), and the pull function removes data from the top of the stack (read at SP then increment SP). For example, the psha instruction will push the value in register A onto the stack, leaving register A unchanged. The pula instruction will pull (or pop) a value off the stack bringing it into register A. The pull operation does modify the stack such that the pulled data is no longer on the stack. The stack implements last in first out (LIFO) behavior. On the left of Figure 4.21, we show an empty stack. We draw the memory locations for the stack in a sequential fashion with the smaller addresses at the top (refer back to Figure 2.5). The white boxes represent the free area of the stack, meaning these locations have been allocated for stack usage, but do not contain data. The following code pushes the numbers 1, 2, and 3 in that order.
ldaa #1
psha     ; push 1 on the stack
ldaa #2
psha     ; push 2 on the stack
ldaa #3
psha     ; push 3 on the stack
Figure 4.21 The stack holding three elements, with the 3 on top. [Four stack pictures: the empty stack, then the stack after each push (ldaa #1 psha; ldaa #2 psha; ldaa #3 psha), holding 1; then 2, 1; then 3, 2, 1. SP points to the top element.]
Footnote to Program 4.4: Use 00 for the 9S12C32 or 9S12E128 with an 8-MHz crystal.
After these three push operations, the stack would contain three numbers with the 3 on the top, as shown in Figure 4.21. The top entry of the stack contains the newest data, i.e., the data pushed last. On most computers, including the 9S12, SP points to the top element. The shaded boxes represent memory locations that contain stack data. Common Error: It is incorrect to draw stack pictures with the SP pointing to the line separating two boxes. These boxes represent memory locations. We should unambiguously draw the SP arrow pointing into the middle of a box, meaning it points to that memory location.
At this point, if one were to pull from the stack (e.g., execute pulb), the 3 would be popped off the stack into Reg B, and 2 would now be on the top of the stack, as shown in the middle picture of Figure 4.22. If one were to pull again from the stack (e.g., execute pula), the 2 would be popped off the stack into Reg A, and 1 would now be on the top of the stack (rightmost picture of Figure 4.22).
Figure 4.22 The stack holding two elements, with the 2 on top. [Three stack pictures: the stack holding 3, 2, 1; after pulb (RegB=3), holding 2, 1; after pula (RegA=2), holding 1. SP points to the top element in each.]
Observation: Notice in the right-most stack picture of Figure 4.22 that blank white boxes are drawn without showing the 3 and the 2 inside. It would be wrong to think the 2 and 3 are still in memory right where we think we left them, because memory at addresses less than the SP is free (meaning it contains no data.)
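The push and pull behavior shown in Figures 4.21 and 4.22 can be mimicked with a small C model. This is only a sketch: the 16-byte array, the variable names, and the choice to start SP one location past the end of the array (so the first push lands in the last location) are assumptions made for illustration, and no overflow or underflow checks are included.

```c
#include <stdint.h>

#define RAM_SIZE 16u      /* tiny hypothetical RAM, for illustration only */
uint8_t ram[RAM_SIZE];
uint16_t sp = RAM_SIZE;   /* empty stack: SP is just past the last location */

/* Push: first decrement SP, then store the data at SP. */
void push8(uint8_t data) {
    sp--;
    ram[sp] = data;
}

/* Pull: first read the data at SP, then increment SP. */
uint8_t pull8(void) {
    uint8_t data = ram[sp];
    sp++;
    return data;
}
```

Pushing 1, 2, and 3 and then calling pull8() twice returns 3 and then 2, leaving the 1 as the only element on the stack.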
The push and pull instructions use inherent addressing and do not modify the condition code. The push instructions produce two copies of the data, one on the stack and the other still in the register. The pull instructions remove the data from the stack, so there will be only one copy of the data left, which is in the register.
psha ;Push RegA on the stack
pshb ;Push RegB on the stack
pshx ;Push RegX on the stack
pshy ;Push RegY on the stack
des  ;RegSP=RegSP-1 (reserve space on the stack)
pula ;Pull value from stack, put in RegA
pulb ;Pull value from stack, put in RegB
pulx ;Pull value from stack, put in RegX
puly ;Pull value from stack, put in RegY
ins  ;RegSP=RegSP+1 (discard top of stack)
The stack is used for many purposes. A common use is temporary storage. If a piece of information is important, we can push it on the stack. Later, when we wish to retrieve the data, we pull it off the stack.
Checkpoint 4.7: After a psha instruction, how many copies exist of the data being pushed?
Checkpoint 4.8: After a pula instruction, how many copies exist of the data being pulled?
Checkpoint 4.9: Assume you have two 8-bit global variables M and N. Write assembly code that switches the values in M and N using just the ldaa, staa, psha, and pula instructions.
Later in Chapter 7, we will learn the additional stack operations of stack read and stack write. The stack read operation allows you to retrieve data previously pushed on the stack without modifying the data or the stack pointer. The stack write operation allows you to change a value previously pushed on the stack without changing the stack pointer. Even though these two stack operations are not part of the classical definition of a stack, they will be essential for implementing parameter passing and local variables. The following are important instructions that greatly facilitate the use of the stack. The instructions tsx and tsy will move a copy of the stack pointer into Register X or Y, respectively.
tsx ;Transfer RegSP to RegX (now RegX also points to the top of stack)
tsy ;Transfer RegSP to RegY (now RegY also points to the top of stack)
txs ;Transfer RegX to RegSP
tys ;Transfer RegY to RegSP
Usually, we initialize the stack to the last location of RAM (and the stack grows down to smaller addresses), as drawn in Figures 4.21 and 4.22. To be most efficient, we allocate global variables in a contiguous fashion starting at the first location of RAM. Unless we have a heap or other use of RAM, we consider all the RAM that is not used for global variables as memory available for the stack. For example, on the 9S12DP512, there is 14 KiB of RAM from $0800 to $3FFF. If there are 2048 bytes of global variables ($0800 to $0FFF), then 12 KiB of RAM ($1000 to $3FFF) is allocated for the stack. Using this memory allocation example, the free area is from $1000 to SP-1, and the area containing information on the stack is from SP to $3FFF. These stack rules should be carefully followed:
1. Program segments should have an equal number of pushes and pulls.
2. Stack accesses (push or pull) should not be performed outside the allocated area.
3. Stack reads and writes should not be performed within the free area.
4. Stack push should first decrement SP, then store the data.
5. Stack pull should first read the data, then increment SP.
Stack overflow occurs when the system pushes so many items that the stack starts overwriting information in the global variables. Stack underflow occurs when the system pulls more data than it pushed resulting in an inappropriate attempt to read and write stack data from EEPROM. Rules 1 and 2 address the problem of stack overflow and stack underflow. Tutorial 8 will experimentally explore the issues of stack overflow and stack underflow. When an interrupt is serviced, it will push many items on the stack. When it is finished, it will pull the items off the stack, putting the registers and stack back to their original state. Because interrupts are invoked by external hardware events, the software can not know in advance when they will occur. Rules 3, 4, and 5 guarantee the free area really is available for the interrupt.
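Using the 9S12DP512 allocation example above (stack area $1000 to $3FFF), the overflow and underflow conditions can be written as simple address checks. This is a sketch only: the constant and function names are hypothetical, and the functions assume SP already lies between $1000 and $4000.

```c
#include <stdint.h>

#define STACK_LOW 0x1000u /* first address allocated to the stack */
#define STACK_TOP 0x4000u /* one past $3FFF, the last RAM location */

/* The free area runs from $1000 to SP-1. */
uint16_t stack_free_bytes(uint16_t sp) {
    return (uint16_t)(sp - STACK_LOW);
}

/* The area holding stack data runs from SP to $3FFF. */
uint16_t stack_used_bytes(uint16_t sp) {
    return (uint16_t)(STACK_TOP - sp);
}

/* Pushing n bytes would overflow into the global variables. */
int push_would_overflow(uint16_t sp, uint16_t n) {
    return n > stack_free_bytes(sp);
}

/* Pulling n bytes would underflow past the end of RAM. */
int pull_would_underflow(uint16_t sp, uint16_t n) {
    return n > stack_used_bytes(sp);
}
```

With SP at $4000 the stack is empty and all $3000 bytes (12 KiB) are free.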
4.5 16-Bit Timer
The 9S12 has a 16-bit timer called TCNT. The timer is essentially a 16-bit counter that is incremented at a fixed rate, and the software can read its value to know the current time. It also can be used to create pulses, square waves, and pulse-width modulated waves. It can be configured to measure the period, pulse-width, or frequency of an input signal. For information on interrupts, creating waveforms, and measuring events, see Chapter 9. In this section, we will use TCNT to create fixed-time software delays. Table 4.11 shows some of the 9S12 timer registers. On the 9S12, TCNT is a 16-bit unsigned counter that is incremented continuously at a rate determined by three bits (PR2, PR1, and PR0) in the TSCR2 register as shown in Table 4.12. When the TCNT register hits $FFFF, the next count will roll over to zero, and it keeps counting. Table 4.12 shows the TCNT period for various E clock frequencies. Remember the default frequency depends on the crystal. The 9S12C32 in Figure 4.17 has a default frequency of 4 MHz, and the 9S12DP512 in Figure 4.19 has a default frequency of 8 MHz. On all 9S12 systems, the PLL can be activated to create a 24-MHz E clock. In order to use TCNT on the 9S12, you must first set bit 7 of the TSCR1 register. In this example, we will set the prescale bits to make the TCNT count every 1 µs. The prescale value depends on the E clock frequency, but the delay functions are platform independent. Program 4.5 shows how to use TCNT to create a time delay. The delay parameter to the assembly Timer_Wait subroutine can be any number from 1 to 32767. However, the C implementation of Timer_Wait will operate properly with any number less than 65000.
Table 4.12 Given an E clock frequency, the PR2, PR1, and PR0 bits define the TCNT rate.
; 9S12DP512 at 8 MHz (9S12C32 at 4 MHz)
; Enable TCNT at 1us
Timer_Init
       movb #$80,TSCR1 ;enable
       movb #$03,TSCR2 ;($02 if 9S12C32)
       rts
; either 9S12C32 or 9S12DP512
; Reg D is the time to wait in us
; Reg D must be less than 32767
Timer_Wait
       addd TCNT       ;end of wait time
wloop  cpd  TCNT       ;stop when RegD

// 9S12DP512 at 8 MHz (9S12C32 at 4 MHz)
// Initialize TCNT to 1us
// Input: none
// Output: none
void Timer_Init(void){
  TSCR1 = 0x80; // enable TCNT
  TSCR2 = 0x03; // ($02 if 9S12C32)
}
// either 9S12C32 or 9S12DP512
// Input: delay time in 1us units
// Output: none
void Timer_Wait(unsigned short cycles){
  unsigned short startTime = TCNT;
  while((TCNT-startTime) <= cycles){}
}
// Input: delay time in 1ms units
// Output: none
void Timer_Wait1ms(unsigned short delay){
  unsigned short i;
  for(i=0; i<delay; i++){
    Timer_Wait(1000); // wait 1ms
  }
}
Program 4.5 Timer functions that implement a time delay.
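The subtraction TCNT-startTime in the C version of Timer_Wait keeps working when TCNT rolls over from $FFFF to 0, because the difference is taken modulo 65536. The following is a host-side model rather than 9S12 code: sim_tcnt, elapsed16, and sim_wait are hypothetical names, and the explicit cast stands in for the 16-bit arithmetic that a 9S12 compiler with 16-bit int would perform.

```c
#include <stdint.h>

/* Hypothetical stand-in for the hardware TCNT register (host-side model). */
static uint16_t sim_tcnt;

/* Counts elapsed since 'start', correct across one rollover of the
   16-bit counter. The cast forces modulo-65536 arithmetic even on
   hosts where int is wider than 16 bits. */
uint16_t elapsed16(uint16_t now, uint16_t start){
  return (uint16_t)(now - start);
}

/* Model of Timer_Wait: advance the simulated counter one tick at a
   time until more than 'cycles' ticks have elapsed, and return how
   many ticks that took. 'cycles' must be less than 65535. */
uint32_t sim_wait(uint16_t start, uint16_t cycles){
  uint32_t ticks = 0;
  sim_tcnt = start;
  while(elapsed16(sim_tcnt, start) <= cycles){
    sim_tcnt++;            /* on real hardware the timer does this itself */
    ticks++;
  }
  return ticks;
}
```

Starting the model at 0xFFF0 forces a rollover during the wait, and the loop still ends after the requested number of ticks.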
Observation: If n is the bottom three bits of TSCR2, then TCNT is 2^n times slower than the E clock.
Checkpoint 4.10: Write assembly code to initialize the timer so that it counts every 2 µs.
Checkpoint 4.11: If TCNT counts every 2 µs, how long does it take for it to count all the way from 0 to $FFFF and back to 0 again?
Checkpoint 4.12: Write one subroutine that waits 10 ms. Assume TCNT has been initialized to 1 µs. This subroutine has no input or output parameters; it just waits 10 ms.
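The relation between the prescale bits and the TCNT period can be sketched in C as follows. The function names tcnt_divider and tcnt_period_ns are hypothetical, and the code runs on any host because it only does the arithmetic; it does not touch the timer registers.

```c
#include <stdint.h>

/* TCNT increments once every 2^n E-clock cycles, where n is the
   3-bit prescale field (PR2:PR1:PR0) in the low bits of TSCR2. */
uint32_t tcnt_divider(uint8_t tscr2){
  return (uint32_t)1 << (tscr2 & 0x07);
}

/* TCNT period for an E clock given in Hz, truncated to whole
   nanoseconds. */
uint32_t tcnt_period_ns(uint32_t eclk_hz, uint8_t tscr2){
  return (uint32_t)((1000000000ULL * tcnt_divider(tscr2)) / eclk_hz);
}
```

With an 8-MHz E clock and TSCR2 = $03 the divider is 8, giving 1000 ns per count; with a 4-MHz E clock and TSCR2 = $02 the divider is 4, which also gives 1000 ns. These are the two settings used in Program 4.5.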
Example 4.2 Design a system with four outputs, making them 5, 6, 10, and 9 over and over, separated by 5 ms.
Solution There are no inputs to this system. However, the four outputs will be generated using port pins. We could have used any port, but in this example we will use PT3-0. This example illustrates how development time is reduced by reusing previously developed code. In this example we need software that implements a 5-ms delay. Rather than starting over from scratch, we will reuse the timer subroutines previously developed in Program 4.5. Notice, when we think about Timer_Wait1ms in Program 4.6, we can focus on what it does (takes a parameter in Reg Y and waits that many ms), rather than worry about how it works. During initialization, we set the direction register to make PT3-0 outputs and enable the timer. After each output, the system waits 5 ms by calling Timer_Wait1ms with RegY equal to 5.

void main(void){
  DDRT |= 0x0F;       // PT3-0 outputs
  Timer_Init();       // enable timer
  while(1){
    PTT = 0x05;
    Timer_Wait1ms(5); // Program 4.5
    PTT = 0x06;
    Timer_Wait1ms(5);
    PTT = 0x0A;
    Timer_Wait1ms(5);
    PTT = 0x09;
    Timer_Wait1ms(5);
  }
}
Observation: If the PT3-0 outputs of Example 4.2 were connected to a stepper motor as shown in Figure 8.28, this software would cause the motor to spin at a constant rate.
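The four writes in Example 4.2 could also be driven from a table, which keeps the output sequence in one place. This is only a sketch of that alternative: StepTable and step_pattern are hypothetical names that do not appear in the original program.

```c
#include <stdint.h>

#define STEP_COUNT 4u

/* The four output patterns of Example 4.2, in order. Declaring the
   table const lets a compiler place it in nonvolatile memory. */
static const uint8_t StepTable[STEP_COUNT] = {0x05, 0x06, 0x0A, 0x09};

/* Return the pattern for step 'index', wrapping after the last entry. */
uint8_t step_pattern(uint32_t index){
  return StepTable[index % STEP_COUNT];
}
```

On the 9S12 the main loop would then reduce to writing step_pattern(i) to PTT, calling Timer_Wait1ms(5), and incrementing i.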
4.6 *Memory Allocation
Memory allocation is the decision of where in memory we put the various pieces of our software. The memory on a PC-compatible computer is physically configured as a simple linear array. In other words, if you have 4 tebibytes of RAM, then this memory exists as a continuous linear object with no fundamental difference in the behavior of one memory cell to the next. Although the memory itself forces no structure in the way it is used, the Intel x86 processors have implemented a memory access scheme that requires the programmer to separate memory into segments: e.g., machine codes, global variables, and local variables. The term x86 refers to any Intel processor from the 8086 through the current Pentiums. There can be more than three segments, but three are enough to illustrate the point. The mechanism to access these segments is called segmentation. Figure 4.23 shows a simple view of the memory allocation on the Intel x86 family.
Figure 4.23 The Intel x86 uses segmented memory allocation.
[Figure: CS and IP address the machine codes (op code fetched); SS and SP address the local variables, return addresses, and temporary data (stack access); DS and SI address the global variables and heap (data access).]
In particular, when the Pentium fetches a machine code, it uses two registers. The code segment selector (CS) points to the beginning of the code segment, and the instruction pointer (IP) contains the offset within this segment of the op code to fetch. Similarly, when the Pentium accesses a global variable, it uses two different registers. The data segment selector (DS) points to the beginning of the data segment, and a data pointer (e.g., DI) contains the offset within this segment of the global variable. Lastly, when the Pentium accesses a local variable, it uses a stack segment selector (SS) and either the stack pointer (SP) or the base pointer (BP). The stack segment selector (SS) points to the beginning of the stack segment, and a stack pointer (SP or BP) contains the offset within this segment of the local variable. Segmentation forces the programmer to group together in memory information that has similar properties. In other words, all the machine codes are placed in one group, the global variables are in another group, and the stack is in a third group. This allocation scheme provides for protection so that the errors of stack overflow, stack underflow, and accessing an illegal pointer do not modify machine codes.
We will allocate memory on our embedded system in a fashion similar to segmentation, but for a different reason. Because different types of memory on an embedded computer behave in different fashions, it makes sense to group together in memory information that has similar properties or usage. Typical examples of this grouping include global variables, the heap, local variables, fixed constants, and machine instructions. Figure 4.24 shows a typical memory allocation scheme for an embedded system.
Figure 4.24 We place variables in RAM and programs in ROM on an embedded system.
[Figure: RAM memory holds the global variables and heap (data access through a pointer) and the local variables, return addresses, and temporary data (stack access through SP). ROM memory holds the machine codes and fixed constants (op code fetched through PC) and the vectors.]
Global variables are permanently allocated in RAM. We use global variables for information that must be permanently available. Private global variables are accessed by only one module. Conversely, we use public global variables for data that is to be shared by more than one module. We will see many applications later in this book of the first-in-first-out (FIFO) queue, which is a well-defined mechanism to pass data from one module to another. The FIFO module includes two public functions, Put and Get, but the FIFO data itself, although stored in permanent RAM, will be private. This means the modules must call the functions Put and Get to access the data.
Some software systems use a heap to dynamically allocate and release memory. This information can be shared (public) or not shared (private), depending on which modules have pointers to the data. The heap is efficient in situations where storage is needed for only a limited amount of time.
Local variables are usually allocated on the stack at the beginning of the subroutine/function, used within the subroutine/function, and deallocated at the end of the subroutine/function. Local variables are not shared with other modules, hence local variables are private.
Fixed constants do not change and include information such as numbers, strings, sounds, and pictures. Just like the heap, the fixed constants can be shared or not shared depending on which modules have pointers to the data. The assembler or compiler translates our software into machine instructions (op codes and operands) that, when executed, perform the intended operations.
For single chip microcomputers, there are three types of memory. The RAM contains temporary information that is lost when the power is shut off (i.e., volatile). This means that all variables allocated in RAM must be explicitly initialized at run time by the software. Some C compilers initialize all RAM-based global variables to zero, and others do not. It is good software development practice to set globals to the desired initial value explicitly. Most microcomputers have either EEPROM (which can be erased and reprogrammed) or ROM (which is a low-cost nonvolatile storage that can be programmed only once). The term OTP stands for one-time programmable, meaning the customer can program the nonvolatile memory only once.
In an embedded application, we usually put global variables, the heap, and local variables in RAM, because these types of information can change during execution. When software is to be executed on a regular computer, the machine instructions are usually read from a mass-storage device (like a disk) and loaded into memory. Because the embedded system usually has no mass-storage device, the machine instructions and fixed constants must be stored in nonvolatile memory. If there is both EEPROM and ROM on our microcomputer, we put some fixed constants in EEPROM and some in ROM. If it is information that we may wish to change in the future, we could put it in EEPROM. Examples include language-specific strings, calibration constants, finite state machines, and system ID numbers. This allows us to make minor modifications to the system by reprogramming the EEPROM without throwing the chip away. If our project involves producing a small number of devices, then the program can be placed in OTP ROM or EEPROM. For a project with a large volume, it will be cost effective to place the machine instructions in factory-programmed ROM.
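To make the distinction between private data and public functions concrete, here is a minimal sketch of a FIFO module in C. It is not the FIFO developed later in the book: the names Fifo_Init, Fifo_Put, and Fifo_Get, the capacity of 8 bytes, and the return-code convention are all assumptions made for illustration. The static keyword keeps the data private to the file, and Fifo_Init sets the RAM-based globals explicitly rather than relying on the compiler to zero them.

```c
#include <stdint.h>

#define FIFO_SIZE 8u            /* hypothetical capacity */

/* Private data: 'static' limits these names to this file. */
static uint8_t Fifo[FIFO_SIZE];
static uint8_t PutI, GetI, Count;

/* Explicit run-time initialization of the RAM-based globals. */
void Fifo_Init(void){
  PutI = 0;
  GetI = 0;
  Count = 0;
}

/* Public: store one byte. Returns 1 on success, 0 if full. */
int Fifo_Put(uint8_t data){
  if(Count == FIFO_SIZE) return 0;
  Fifo[PutI] = data;
  PutI = (uint8_t)((PutI + 1u) % FIFO_SIZE);
  Count++;
  return 1;
}

/* Public: remove the oldest byte. Returns 1 on success, 0 if empty. */
int Fifo_Get(uint8_t *data){
  if(Count == 0) return 0;
  *data = Fifo[GetI];
  GetI = (uint8_t)((GetI + 1u) % FIFO_SIZE);
  Count--;
  return 1;
}
```

Other modules see only the three functions, so the storage layout can change without affecting them.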
4.7 Performance Debugging
Performance debugging involves the verification of the timing behavior of our system. It is a dynamic process where the system is run, and the dynamic behavior of the inputs/outputs is compared against the expected results. Two methods of performance debugging are presented, then the techniques are applied to measure execution speed.
4.7.1 Instrumentation
Program 4.7 Instrumentation output port.
In the last section, we saw that TCNT is a 16-bit counter incremented at a regular rate. There is a prescaler that can be placed between the E clock and the TCNT counter. TCNT automatically rolls over when it gets to $FFFF. If we are sure the execution time of our function is less than 65535 counts, we can use this timer to collect timing information with only a modest amount of intrusiveness. For example, if we read TCNT, execute some software, then read TCNT again, the difference in TCNT represents the elapsed time for the executing software. Another method to observe time-dependent behavior of our software involves an output port and a logic analyzer or oscilloscope. Assume an oscilloscope is attached to Port T bit 6. The two subroutines in Program 4.7 can be used to set and clear the bit. The C version of the instrumentation uses #define so the debugging code is inserted into the program, rather than requiring a function call.
Next, you add jsr Pin_Set6 and jsr Pin_Clr6 statements at strategic places within the system. The DDRT must be initialized so that bit 6 is an output, before the debugging begins. You can observe the signal with a high-speed oscilloscope.
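The listing for Program 4.7 is not reproduced here, so the following is only a sketch of what #define-based instrumentation could look like in C. The macro names PIN_SET6 and PIN_CLR6 and the function instrumented_sum are assumptions, and PTT is declared as an ordinary variable so the fragment compiles on a host; on a 9S12 the compiler's device header would supply PTT.

```c
#include <stdint.h>

/* Host-side stand-in for the Port T data register (assumption). */
volatile uint8_t PTT;

#define PT6_MASK 0x40u
/* Expanded inline at each use, so no jsr/rts overhead is added. */
#define PIN_SET6() (PTT |= PT6_MASK)
#define PIN_CLR6() (PTT &= (uint8_t)~PT6_MASK)

/* Example: bracket the code under test with set/clear so a scope on
   PT6 shows a pulse whose width is the execution time. */
uint8_t instrumented_sum(const uint8_t *buf, uint8_t n){
  uint8_t sum = 0, i;
  PIN_SET6();
  for(i = 0; i < n; i++){
    sum = (uint8_t)(sum + buf[i]);
  }
  PIN_CLR6();
  return sum;
}
```

Both macros use read-modify-write operations, so the other seven bits of Port T are left unchanged.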
4.7.2 Measurement of Dynamic Efficiency
We will present three ways to measure dynamic efficiency of our software. To illustrate these three methods, we will consider measuring the execution time of the sqrt function. The first method is to count bus cycles using the assembly listing. This approach is only appropriate for very short programs, and becomes difficult for long programs with many conditional branch instructions. Often this is a very tedious process, but luckily the TExaS assembler will look up and keep a running count of the number of cycles. The assembly pseudo-op org* will reset the cycle counter, shown between the parentheses. A portion of the assembly output is presented in Program 4.8. Notice that the total cycle count for a 9S12 implementation is 71 cycles. At 8 MHz, 71 cycles is 8.875 µs. Because the loop (between next and bne next) is executed exactly four times, the actual time will be 71+3*41 = 194 cycles or 24.25 µs. For most programs, it is actually very difficult to get an accurate time measurement using this technique.
Program 4.8 Assembly listing from TExaS of the sqrt subroutine.
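The arithmetic behind the cycle-count estimate can be sketched as follows, taking the 71-cycle listing total and assuming a 41-cycle loop body that runs four times, which is consistent with the 24.25 µs figure at 8 MHz. The function names are hypothetical.

```c
#include <stdint.h>

/* Total bus cycles when a listing of 'listed' cycles contains a loop
   body of 'loop' cycles that actually runs 'passes' times. The listing
   already counts the loop once, so add (passes - 1) extra passes. */
uint32_t total_cycles(uint32_t listed, uint32_t loop, uint32_t passes){
  return listed + (passes - 1u) * loop;
}

/* Convert bus cycles to nanoseconds at an E clock given in Hz. */
uint32_t cycles_to_ns(uint32_t cycles, uint32_t eclk_hz){
  return (uint32_t)((1000000000ULL * cycles) / eclk_hz);
}
```

With these inputs, total_cycles(71, 41, 4) is 194 cycles, and at 8 MHz each cycle is 125 ns, so 194 cycles is 24250 ns.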
The second method uses an internal timer called TCNT, as shown in Program 4.9. The 9S12 has a 16-bit unsigned counter that is automatically incremented at a regular rate. If the function completes in a time less than 65535 clock counts, then the internal timer can be used to measure execution speed empirically. The assembly language call to the function is modified so that TCNT is read before and after the subroutine call. The elapsed time is the difference. Since the execution speed may be dependent on the input data, it is often wise to measure the execution speed for a wide range of input parameters. There is a slight overhead in the measurement process itself. To be more accurate, you could measure this overhead and subtract it off your measurements.
Program 4.9 Empirical measurement of dynamic efficiency in assembly.
before  rmb  2           ;time at start
elapsed rmb  2           ;in cycles
        movw TCNT,before
        jsr  sqrt
        ldd  TCNT
        subd before
        std  elapsed

unsigned short before,elapsed;
void main(void){
  ss = 100;
  before = TCNT;
  tt = sqrt(ss);
  elapsed = TCNT-before;
}
Common Error: Debugging code should not alter the program operation. In particular, the debugging code in Program 4.9 destroys the result parameter returned in Register B.
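The suggestion to measure the overhead and subtract it can be sketched in C as shown here. This is a host-side model under stated assumptions: the timer is read through a function pointer so the logic can be exercised without hardware, and fake_tcnt, fake_now, and fake_work are hypothetical stand-ins (each simulated read costs 2 counts and the simulated function costs 100). On a 9S12 the read would simply be TCNT.

```c
#include <stdint.h>

typedef uint16_t (*TimerRead)(void);

/* Measure 'fn' in timer counts, subtracting the cost of the
   back-to-back reads themselves. Valid only if fn finishes in fewer
   than 65536 counts. */
uint16_t measure_counts(TimerRead now, void (*fn)(void)){
  uint16_t t0, t1, overhead, raw;
  t0 = now();
  t1 = now();
  overhead = (uint16_t)(t1 - t0);   /* empty measurement */
  t0 = now();
  fn();
  t1 = now();
  raw = (uint16_t)(t1 - t0);
  return (uint16_t)(raw - overhead);
}

/* Simulated timer, started near rollover on purpose. */
uint16_t fake_tcnt = 0xFFF0;
uint16_t fake_now(void){
  fake_tcnt = (uint16_t)(fake_tcnt + 2u);   /* each read costs 2 counts */
  return fake_tcnt;
}
void fake_work(void){
  fake_tcnt = (uint16_t)(fake_tcnt + 100u); /* function costs 100 counts */
}
```

Because the differences are taken in 16-bit unsigned arithmetic, the result is still correct when the counter rolls over during the measurement.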
The third technique can be used in situations where TCNT is unavailable or where the execution time might be larger than 65535 counts. In this empirical technique, we attach an unused output pin to an oscilloscope or to a logic analyzer. We will set the pin high before the call to the function and set the pin low after the function call. In this way a pulse is created on the digital output with a duration equal to the execution time of the function. Assume Port T is available, and bit 7 is connected to the scope. By placing the function call in a loop, the scope can be triggered. With a storage scope or logic analyzer, the function need be called only once. The performance debugging code is shown in Program 4.10.
Program 4.10 Another empirical measurement of dynamic efficiency in assembly.

DDRT |= 0x80;      // PT7 output
ss = 100;
while(1){
  PTT |= 0x80;     // set PT7 high
  tt = sqrt(ss);
  PTT &= ~0x80;    // clear PT7 low
}
Tutorial 4 Building a Microcomputer and Executing Machine Code
In this example, we will create a four-input, four-output NOT gate using an embedded microcomputer; see Figure T4.1. The four input signals will be connected to an input port of the microcomputer and the four output signals will be connected to an output port. Some pins of Port T are used as inputs and some pins are used as outputs.
Figure T4.1 External inputs are connected to PT3 to PT0 and PT7 to PT4 are the outputs.
[Figure: a 9S12 with Port T pins PT7, PT6, PT5, PT4, PT3, PT2, PT1, and PT0.]
The first software step is to write an initialization function that specifies Port T bits PT7 to PT4 will be outputs and Port T bits PT3 to PT0 will be inputs; see Program T4.1. This function, which we usually execute once at the start of our program, is called a ritual. The direction register, DDRT, specifies whether each pin is an output (1) or an input (0). The main program will input from Port T, perform the logical complement, shift the data into the proper position, then output it back to Port T. The input operation performed on the output pins will return the previous value written, while the output operation to the input pins has no effect. In each case, the stack pointer is initialized to a RAM location. In these C programs we will assume the compiler will handle the segmentation (placing variables in RAM, programs in ROM/EEPROM), initializing the stack, and setting the reset vector.
Program T4.1 Assembly and C programs that implement the 4-bit NOT gate.
PTT   equ  $0240
DDRT  equ  $0242
      org  $F000
main  lds  #$4000  ;Initialize stack
      bsr  init    ;ritual
loop  ldaa PTT     ;input
      coma         ;logical not
      lsla         ;shift
      lsla
      lsla
      lsla
      staa PTT     ;output
      bra  loop    ;repeat
init  ldaa #$F0    ;PT7-PT4 out
      staa DDRT    ;PT3-PT0 in
      rts
      org  $FFFE
      fdb  main    ;place to start

void init(void){
  DDRT = 0xF0;        // PT7-PT4 are outputs,
                      // PT3-PT0 are inputs
}
void main(void){
  unsigned char data;
  init();             // call once
  while(1){
    data = PTT;       // input
    data =(~data)<<4; // complement and shift
    PTT = data;       // output
  }
}
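The complement-and-shift step can be pulled out into a pure function and checked on a host computer before it is run on the microcomputer. This is a sketch under that assumption: not_gate4 is a hypothetical name, and the inner cast to uint8_t is added here so that the value being shifted is never a negative promoted int.

```c
#include <stdint.h>

/* Complement the low nibble (inputs PT3-PT0) and move it to the high
   nibble (outputs PT7-PT4). The low four bits of the result are zero. */
uint8_t not_gate4(uint8_t port_in){
  uint8_t inverted = (uint8_t)~port_in;   /* 8-bit logical complement */
  return (uint8_t)(inverted << 4);        /* low nibble to high nibble */
}
```

With PT2 and PT0 on, the input nibble is 0101, and the function returns $A0, so outputs PT7 and PT5 are high.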
Question 4.1 Choose the processor you wish to study and look up its RAM and ROM locations. Look up the locations of PTT and DDRT.
Action: Copy the three files Tutor4.rtf, Tutor4.uc, and Tutor4.io from the web onto your hard drive. Start a fresh copy of TExaS and open the Tutor4.rtf file; TExaS should open the other two files. Execute the Mode->Processor... command and select the processor you identified in Question 4.1. Save these three documents. Figure T4.2 shows the 9S12DP512 being selected.
Figure T4.2 Mode-Processor dialog.
Action: Click on the Program window and execute the Assemble-Options... command. Make sure all the check boxes except Automatically create a *.S19 file are checked. Make sure the Complete Assembly (small programs) radio button is selected, as shown in Figure T4.3.
Figure T4.3 Assemble-Options dialog.
Action: Edit the assembly source code, Tutor4.rtf. Adjust the $0240, $0242, $F000, and $4000 in the first four lines to match the processor you selected in Question 4.1. Assemble the program by executing the Assemble-Assemble command.
Question 4.2 Assume only PT2 and PT0 are on. Execute this program by hand up to and including the bra instruction. For each instruction show the memory cycles generated and the values of Registers A, PC, and SP after each instruction. Show the simplified cycles as described in Chapter 2.
Action: Make sure just PT2 and PT0 are on. Activate the FollowPC, CycleView, InstructionView, and LogRecord modes using the commands in the Mode menu. Single step (F10) the program up to and including the bra instruction. Verify the answers you gave for Question 4.2.
Action: Bring the Tutor4.io window to the top and notice that the switches connected to PT2 and PT0 are activated, as shown in Figure T4.4. Start the simulation (F12). Toggle each of the switches by clicking on it with the mouse. Notice the corresponding LED is the logical complement of the switch.
Figure T4.4 IO window.
4.9 Homework Assignments
Homework 4.1 What is the difference between memory-mapped and isolated I/O?
Homework 4.2 Name the four main components of the processor. Give a brief description of their functions.
Homework 4.3 The goal is to design a latched input as shown in Figure 4.12, using a 74HC374 octal D-flip-flop. Look up the data sheet for the 74HC374. Draw a circuit diagram between the input signals, external control (as shown in Figure 4.12) and Ports H and P on the 9S12DP512. Be careful to show all pins of the 74HC374. The rising edge of the control signal should cause the input signals to be latched. Write a device driver for this interface with three public functions: Initialization, ReadPort, and ReadLatch. The ReadPort function returns the current value of the input lines in RegA, and the ReadLatch function returns the previously latched value in RegA.
Homework 4.4 The goal is to design a transparent latched input, using a 74HC573 octal D latch. Look up the data sheet for the 74HC573. This input is similar, but not identical, to Figure 4.12. Draw a circuit diagram between the input signals, external control and Ports A and B on the 9S12. Be careful to show all pins of the 74HC573. When the LE pin of the 74HC573 is high, its output equals its input. The falling edge of the control signal will cause the input signals to be latched. Write a device driver for this interface with three public functions: Initialization, ReadPort, and ReadLatch. The ReadPort function returns the current value of the input lines in RegA, and the ReadLatch function returns the previously latched value in RegA.
Homework 4.5 The goal is to design a digital output port expander, using six 74HC374 octal D-flip-flops. Look up the data sheet for the 74HC374. Six bits of Port M and eight bits of Port T will be used as 14 bits of output from the 9S12. Your digital circuit will convert these 14 bits into 48 digital output lines. Draw the interface circuit. Be careful to show all pins of the 74HC374. Write a device driver for this interface with two public functions: Initialization and WritePort. The WritePort function takes two parameters, a 6-bit port number in RegA and an 8-bit data value in RegB.
Homework 4.6 If Port T is an output, what does it mean if the data read from PTT is not equal to the data read from PTIT?
Homework 4.7 Assume a $55 is written to DDRT. What is the effect of executing inc PTT?
Homework 4.8 It seems silly in Program 4.1 to make subroutines out of one-line assembly code. In what situation might making subroutines like this actually be useful?
Homework 4.9 Why does the conditional branch instruction sometimes take one cycle and sometimes take three cycles? Why does the bra instruction always take three cycles?
Homework 4.10 Why does the C code in Programs 4.2 and 4.3 use the volatile qualifier?
Homework 4.11 Assume your 9S12 has a 16 MHz crystal, so it normally runs at 8 MHz. Program 4.4 makes the 9S12 run at 24 MHz, but it can run at 25 MHz. Write a PLL initialization subroutine to make it run at 25 MHz.
Homework 4.12 Assume your 9S12 has an 8 MHz crystal, so it normally runs at 4 MHz. In order to save power, we can make it run slower. Write a PLL initialization subroutine to make it run at 1 MHz.
Homework 4.13 Generally we say the faster the computer runs the better it is. When designing a computer, for what reasons might we not want to run at the absolute fastest possible frequency?
Homework 4.14 Assume the SP has been properly initialized. What will be the contents of RegA and RegX after the following six instructions are executed? Hint: draw stack pictures.
      ldaa #$56
      ldx  #$789A
      pshx
      psha
      pulx
      pula
Homework 4.15 Assume the SP has been properly initialized. What will be the contents of RegA and RegX after the following six instructions are executed? Hint: draw stack pictures.
      ldaa #$34
      ldx  #$5678
      pshx
      psha
      pulx
      pula
Homework 4.16 Using the stack, write assembly code that rotates the three registers: D goes into X, X goes into Y, and Y goes into D.
Homework 4.17 Write an assembly language subroutine that implements a 1 second delay using the TCNT timer. Include both an initialization ritual and a delay function.
Homework 4.18 Rewrite Program 2.1 so that it allocates the stack from $0800 to $1000, the global variables from $1000 to $3FFF, and the program from $C000 to $FFFF.
Homework 4.19 When defining an 8-bit variable in RAM, why is it better to use an rmb 1 rather than an fcb 0?
Homework 4.20 If you use an rmb 2 to define a 16-bit variable in RAM, what is the initial value when power is first applied, before the program begins execution?
Homework 4.21 If you use an fcb 5 to define an 8-bit variable in ROM, what is the initial value when power is first applied, before the program begins execution?
4.10 Laboratory Assignments
Lab 4.1 Start a fresh copy of TExaS and create new Program and Microcomputer windows. Execute the Mode-Processor... command and select either the 9S12C32 or the 9S12DP512. Save these two documents with names having the same root name, e.g., Lab4.rtf and Lab4.uc.
Step 1. Execute the Mode-Processor... command and select either the 9S12C32 or the 9S12DP512. Save this document as Lab4.uc.
Step 2. Type in the source code for Program L4.1 and save the file as Lab4.rtf. Assemble this program by executing the Assemble-Assemble command.
$3800 2 ;16-bit signed result 3 $4000 #Array ;pointer to array #0 ;Sum=0 Sum 1,x+ ;get data from array a,d ;promote to 16-bits Sum Sum ;Sum=Sum+data #Array+SIZE loop ;done? -12,20,-40 ;array of data $FFFE ;reset vector main
Step 3. For each op code, first determine the addressing mode it uses. The sex a,d is inherent mode, because it operates on the registers without accessing memory. org, rmb, equ, fcb, and fdb are pseudo ops and thus do not have addressing modes. The instructions that access Sum will use extended addressing, rather than direct addressing, because the address of Sum ($3800) is outside the $0000 to $00FF range of direct addressing.
Step 4. Execute this program by hand (using paper and pencil) up to but not including the stop instruction. For each instruction show the memory cycles generated and the values of any registers that change. Show the simplified cycles as described in Chapter 2. This program will execute 22 instructions, resulting in RegD being the 16-bit result -32. You do not need to show free cycles or changes to the CCR, but do include changes to the other registers including the IR and EAR. The IR is set after reading the op code, and the EAR is set before reading/writing memory with direct, extended, or indexed addressing modes. Some instructions, like leax, leay, and leas, use indexed addressing mode and set the EAR, but do not access memory. Other than these exceptions, the EAR holds the address when reading data from memory or writing data to memory.
Lab 4.2 Start a fresh copy of TExaS and create new Program and Microcomputer windows. Execute the Mode-Processor... command and select either the 9S12C32 or the 9S12DP512. Save these two documents with names having the same root name, e.g., Lab4.rtf and Lab4.uc.
Step 1. Execute the Mode-Processor... command and select either the 9S12C32 or the 9S12DP512. Save this document as Lab4.uc.
Step 2. Type in the source code for Program L4.2 and save the file as Lab4.rtf. Assemble this program by executing the Assemble-Assemble command.
Program L4.2 Assembly program used in Lab 4.2.
; calculates Average=(Xi[0]+Xi[1]+Xi[2])/3
      org  $3800
Ave   rmb  1
SIZE  equ  3
      org  $4000
main  ldx  #Xi       ;pointer to array
      ldab #0        ;Sum=0
loop  addb 1,x+      ;add data from array
      cpx  #Xi+SIZE
      bne  loop      ;done?
done  clra           ;D=Sum
      ldx  #SIZE     ;divisor
      idiv           ;X=Sum/3
      tfr  X,B       ;demote
      stab Ave
      stop
Xi    fcb  10,20,30  ;array of data
      org  $FFFE     ;reset vector
      fdb  main
Step 3. For each op code, first determine the addressing mode it uses. The idiv is inherent mode, because it operates on the registers without accessing memory. org, rmb, equ, fcb, and fdb are pseudo ops and thus do not have addressing modes. The instruction that accesses Ave will use extended addressing, rather than direct addressing, because the address of Ave ($3800) is outside the $0000 to $00FF range of direct addressing.
Step 4. Execute this program by hand (using paper and pencil) up to but not including the stop instruction. For each instruction show the memory cycles generated and the values of any registers that change. Show the simplified cycles as described in Chapter 2. This program will execute 17 instructions, resulting in the variable Ave being 20. You do not need to show free cycles or changes to the CCR, but do include changes to the other registers including the IR and EAR. The IR is set after reading the op code, and the EAR is set before reading/writing memory with direct, extended, or indexed addressing modes.
Lab 4.3 Start a fresh copy of TExaS and create new Program and Microcomputer windows. Execute the Mode->Processor... command and select either the 9S12C32 or the 9S12DP512. Save these two documents with names having the same root name, e.g., Lab4.rtf and Lab4.uc.
Step 1. Execute the Mode-Processor... command and select either the 9S12C32 or the 9S12DP512. Save this document as Lab4.uc.
Step 2. Type in the source code for Program L4.3 and save the file as Lab4.rtf. Assemble this program by executing the Assemble-Assemble command.
; calculates Average=(Xi[0]+Xi[1]+Xi[2])/3
      org  $3800
Ave   rmb  2
SIZE  equ  3
      org  $4000
main  ldx  #Xi          ;pointer to array
      ldd  #0           ;Sum=0
loop  addd 2,x+         ;add data from array
      cpx  #Xi+SIZE*2
      bne  loop         ;done?
done  ldx  #SIZE        ;D=Sum, X=divisor
      idiv              ;X=Sum/3
      stx  Ave
      stop
Xi    fdb  100,200,300  ;array of data
      org  $FFFE        ;reset vector
      fdb  main
Step 3. For each op code, first determine the addressing mode it uses. The idiv is inherent mode, because it operates on the registers without accessing memory. org, rmb, equ, and fdb are pseudo ops and thus do not have addressing modes. The instruction that accesses Ave will use extended addressing, rather than direct addressing, because the address of Ave ($3800) is outside the $0000 to $00FF range of direct addressing.
Step 4. Execute this program by hand (using paper and pencil) up to but not including the stop instruction. For each instruction show the memory cycles generated and the values of any registers that change. Show the simplified cycles as described in Chapter 2. This program will execute 15 instructions, resulting in the variable Ave being 200. You do not need to show free cycles or changes to the CCR, but do include changes to the other registers including the IR and EAR. The IR is set after reading the op code, and the EAR is set before reading/writing memory with direct, extended, or indexed addressing modes.
Lab 4.4 Start a fresh copy of TExaS and create new Program and Microcomputer windows. Execute the Mode->Processor... command and select either the 9S12C32 or the 9S12DP512. Save these two documents with names having the same root name, e.g., Lab4.rtf and Lab4.uc.
Step 1. Execute the Mode->Processor... command and select either the 9S12C32 or the 9S12DP512. Save this document as Lab4.uc.
Step 2. Type in the source code for Program L4.4 and save the file as Lab4.rtf. Assemble this program by executing the Assemble->Assemble command.
Program L4.4 Assembly program used in Lab 4.4.
; calculates Average=(Xi[0]+Xi[1]+Xi[2])/3
      org   $3800
Ave   rmb   2
SIZE  equ   3
      org   $4000
main  ldy   #Xi            ;pointer to array
      ldd   #0             ;Sum=0
loop  addd  2,y+           ;add data from array
      cpy   #Xi+SIZE*2
      bne   loop           ;done?
done  ldx   #SIZE          ;D=Sum, X=divisor
      idivs                ;X=Sum/3
      stx   Ave
      stop
Xi    fdb   -100,200,-300  ;array of data
      org   $FFFE          ;reset vector
      fdb   main
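The signed arithmetic this program performs can be modeled in C to check a hand execution. This is a host-side sketch and not a translation of the assembly: average16 is a hypothetical name, and the sum is held in 32 bits here, whereas the assembly accumulates in the 16-bit Register D. It assumes, as C99 does, that signed division truncates toward zero.

```c
#include <stdint.h>

/* Sum 16-bit signed samples, then divide by the count. C99 integer
   division truncates toward zero. 'size' must be nonzero. */
int16_t average16(const int16_t *xi, uint16_t size){
  int32_t sum = 0;
  uint16_t i;
  for(i = 0; i < size; i++){
    sum += xi[i];
  }
  return (int16_t)(sum / (int32_t)size);
}
```

For the data -100, 200, -300 the sum is -200, and -200/3 truncates to -66 rather than rounding down to -67.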
Step 3. For each op code, first determine the addressing mode it uses. The idivs is inherent mode, because it operates on the registers without accessing memory. org, rmb, equ, and fdb are pseudo ops and thus do not have addressing modes. The instruction that accesses Ave will use extended addressing, rather than direct addressing, because the address of Ave ($3800) is outside the $0000 to $00FF range of direct addressing.
Step 4. Execute this program by hand (using paper and pencil) up to but not including the stop instruction. For each instruction show the memory cycles generated and the values of any registers that change. Show the simplified cycles as described in Chapter 2. This program will execute 15 instructions, resulting in the variable Ave being -66. You do not need to show free cycles or changes to the CCR, but do include changes to the other registers including the IR and EAR. The IR is set after reading the op code, and the EAR is set before reading/writing memory with direct, extended, or indexed addressing modes.
Lab 4.5 Start a fresh copy of TExaS and create new Program and Microcomputer windows. Execute the Mode->Processor... command and select either the 9S12C32 or the 9S12DP512. Save these two documents with names having the same root name, e.g., Lab4.rtf and Lab4.uc.
Step 1. Execute the Mode->Processor... command and select either the 9S12C32 or the 9S12DP512. Save this document as Lab4.uc.
Step 2. Type in the source code for Program L4.5 and save the file as Lab4.rtf. As you can see, you are given the machine code for this software. Assemble this program by executing the Assemble->Assemble command.
Program L4.5 Assembly program used in Lab 4.5.
      org  $4000
main  fcb  $CD,$40,$11,$C6,$00,$EB,$70,$8D,$40,$14
      fcb  $26,$F9,$7B,$38,$00,$18,$3E,$F6,$14,$E2
      org  $FFFE   ;reset vector
      fdb  main
Step 3. Using the simulator, figure out what this program does. Specifically, what are its inputs and what are its outputs? Give a functional description of how the outputs depend on the inputs.
5 Modular Programming

Chapter 5 objectives are to:
• Present an introduction to modular programming
• Use conditional branching to perform decisions
• Implement for-loops, macros and recursion
• Implement modular programming using subroutines
• Present functional debugging as a method to test software
In this chapter, we will begin by presenting a general approach to modular design. Specifically, we will discuss how to organize software blocks in an effective manner. The ultimate success of an embedded system project depends both on its software and hardware. Computer scientists pride themselves on their ability to develop quality software. Similarly, electrical engineers are well-trained in the processes to design both digital and analog electronics. Manufacturers, in an attempt to get designers to use their products, provide application notes for their hardware devices. The main objective of this book is to combine effective design processes together with practical software techniques in order to develop quality embedded systems.

As the size and especially the complexity of the software increase, the software development changes from simple “coding” to “software engineering”, and the required skills also vary along this spectrum. These software skills include modular design, layered architecture, abstraction and verification. Real-time embedded systems are usually on the small end of the size scale, but nevertheless these systems can be quite complex. Therefore, both hardware and software skills are essential for developing embedded systems. Writing good software is an art that must be developed, and quality cannot be added on at the end of a project. Just like any other discipline (e.g., music, art, science, religion), expertise comes from a combination of study and practice. The watchful eye of a good mentor can be invaluable, so take the risk and show your software to others, inviting praise and criticism. Good software combined with average hardware will always outperform average software on good hardware. In this chapter, we will outline various techniques for developing quality software.
5.1 Modular Design

Back in Section 1.6, we presented successive refinement as a method to convert a problem statement into a software algorithm. Successive refinement is the transformation from the general to the specific. In this section, we introduce the concept of modular programming and demonstrate that it is an effective way to organize our software projects. There are four reasons for forming modules. The first, functional abstraction, allows us to reuse a software module from multiple locations. The second, complexity abstraction, allows us to divide a highly complex system into smaller, less complicated components. The third reason is portability. If we create modules for the I/O devices, then we can isolate the rest of the system from the hardware details. This approach is sometimes called a hardware abstraction layer. Since all the software components that access an I/O port are grouped together, it will be easier to redesign the embedded system on a machine with different I/O ports. The fourth reason for forming modules is security. Modular systems by design hide the inner workings from other modules, and provide a strict set of mechanisms to access data and I/O ports. Hiding details and restricting access generates a more secure system.
5.1.1 Definition and Goals
The key to completing any complex task is to break it down into manageable subtasks. Modular programming is a style of software development that divides the software problem into distinct well-defined modules. The parts are as small as possible, yet relatively independent. Complex systems designed in a modular fashion are easier to debug because each module can be tested separately. Industry experts estimate that 50 to 90 percent of software development cost is spent in maintenance. All five aspects of software maintenance
• Correcting mistakes
• Adding new features
• Optimizing for execution speed or program size
• Porting to new computers or operating systems
• Reconfiguring the software to solve a similar related problem
are simplified by organizing the software system into modules. The approach is particularly useful when a task is large enough to require several programmers.

A program module is a self-contained software task with clear entry and exit points. There is a distinct difference between a module and the assembly language subroutine or C language function. A module is usually a collection of subroutines or functions that in its entirety performs a well-defined set of tasks. A collection of 32-bit math operations is an example of a module. The device driver in Example 4.1 is another example of a module. Modular programming involves both the specification of the individual modules and the connection scheme whereby the modules are interfaced together to form the software system. While the module may be called from many locations throughout the software, there should be well-defined entry points. In C, the entry point of a module is defined in the header file, and is specified by a list of function prototypes for the public functions. Similarly in assembly, the entry point of a module is also a list of public subroutines that can be called.

Common Error: In many situations the input parameters have a restricted range. It would be inefficient for the module and the calling routine to both check for valid input. On the other hand, an error may occur if neither checks for valid input.
An exit point is the ending point of a program module. The exit point of a subroutine is used to return to the calling routine. We need to be careful about exit points. It is important that the stack be properly balanced at all exit points. Similarly, if the subroutine returns parameters, then all exit points should return parameters in an acceptable format. If the main program has an exit point, it either stops the program or returns to the debugger. In most embedded systems, the main program does not exit.

Common Error: It is an error if any exit point of an assembly subroutine fails to balance the stack or returns parameters in a different way from the other exit points.
In this section, an object refers to either a subroutine or a data element. A public object is one that is shared by multiple modules. This means it can be accessed by other modules. Typically, we make the most general functions of a module public, so the functions can be called from another module. For a module performing I/O, typical public functions include initialization, input, and output. A private object is one that is not shared; that is, it can be accessed by only one module. Typically, we make the internal workings of a module private, so we hide how it works from the user of the module. In an object-oriented language like C++ or Java, the programmer clearly defines a function or data object as public or private. Later in this chapter, we will present a naming convention for assembly language or C that can be used in an equivalent manner to define a function or data object as public or private.

At first glance, I/O devices seem to be public. For example, PTT resides permanently at the fixed address of $0240, and the programmer of every module knows that. In other words, from a syntactic viewpoint, any module has access to any I/O device. However, in order to reduce the complexity of the system, we will restrict the number of modules that actually do access the I/O device. From this perspective, we will write software that considers I/O devices as private, meaning an I/O device should be accessed by only one module. In general, it will be important to clarify which modules have access to I/O devices and when they are allowed to access them. When more than one module accesses an I/O device, it is important to develop ways to arbitrate (decide which module goes first if two or more want access simultaneously) or synchronize (make a second module wait until the first is finished). The 9S12 has no architectural features that restrict access to I/O ports, because it is assumed that all software burned into its ROM was designed for a common goal, meaning from a security standpoint one can assume there are no malicious components. However, as embedded systems become connected to the internet, gaining power and flexibility, security will become an important issue.

Checkpoint 5.1: Multiple modules may use TCNT, where each module has an initialization like Timer_Init in Program 4.5. What conflict could arise around the initialization of TSCR2?
Information hiding is similar to minimizing coupling. It is better to separate the mechanisms of software from its policies. We should separate what the function does (the relationship between its inputs and outputs) from how it does it. It is good to hide certain inner workings of a module, and simply interface with the other modules through the well-defined input/output parameters. For example, we could implement a variable size buffer by maintaining the current byte count in a global variable, Count. A good module will hide how Count is implemented from its users. If the user wants to know how many bytes are in the buffer, it calls a function that returns the count. A badly written module will not hide Count from its users. The user simply accesses the global variable Count. If we update the buffer routines, making them faster or better, we might have to update all the programs that access Count too. Allowing all software to access Count creates a security risk, making the system vulnerable to malicious or incompetent software. Object-oriented programming environments provide well-defined mechanisms to support information hiding. This separation of policies from mechanisms is discussed further in the section on layered software.

Maintenance Tip: It is good practice to make all permanently allocated data and all I/O devices private. Information is transferred from one module to another through well-defined function calls.
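The buffer example above can be sketched in C. This is only an illustration under stated assumptions: the names Buffer_Init, Buffer_Put, and Buffer_Count, and the capacity BUFFER_SIZE, are hypothetical and do not come from the book's programs. The keyword static keeps Count and the storage array private to the file, so users must go through the public functions.

```c
#include <assert.h>
#include <stdint.h>

#define BUFFER_SIZE 8            /* hypothetical capacity, chosen for illustration */

/* Private objects: static limits their scope to this file (module). */
static uint8_t Data[BUFFER_SIZE];
static uint16_t Count = 0;       /* current number of bytes in the buffer */

/* Public: empty the buffer. */
void Buffer_Init(void) {
  Count = 0;
}

/* Public: append one byte; returns 1 on success, 0 if the buffer is full. */
int Buffer_Put(uint8_t value) {
  if (Count >= BUFFER_SIZE) {
    return 0;                    /* full, nothing is stored */
  }
  Data[Count] = value;
  Count++;
  return 1;
}

/* Public: users ask for the count instead of reading Count directly. */
uint16_t Buffer_Count(void) {
  assert(Count <= BUFFER_SIZE);  /* invariant maintained by this module */
  return Count;
}
```

If the buffer were later reimplemented, for example with two pointers instead of a counter, only Buffer_Count would change; software calling it would be unaffected.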
The Keep It Simple Stupid approach tries to generalize the problem so that it fits an abstract model. Unfortunately, the person who defines the software specifications may not understand the implications and alternatives. As software developers, we always ask ourselves these questions: “How important is this feature?” “What if it worked this different way?” Sometimes we can restate the problem to allow for a simpler (and possibly more powerful) solution.
5.1.2 Functions, Procedures, Methods, and Subroutines
A program module that performs a well-defined task can be packaged up and defined as a single entity. Then, that module can be invoked whenever the task needs to be performed. Object-oriented high-level languages like C++ and Java define program modules as methods. Functions and procedures are defined in some high-level languages like Pascal, Fortran, and Ada. In these languages, functions return a parameter and procedures do not. Most high-level languages however define program modules as functions, whether they return a parameter or not. A subroutine is the assembly language version of a function. Consequently, subroutines may or may not have input or output parameters.

Formally, there are two components to a subroutine: definition and invocation. The subroutine definition specifies the task to be performed. In other words, it defines what will happen when executed. Examples of three subroutine definitions can be seen in Program 4.5. The syntax for a subroutine definition was presented previously in Section 2.8. It begins with a label, which will be the name of the subroutine, and ends with a rts instruction. The definition of a subroutine includes a formal specification of its input parameters and output parameters. In well-written software, the task performed by a subroutine will be well-defined and logically complete. The subroutine invocation is inserted into the software system at the places where the task should be performed. Examples of subroutine invocations can be seen in Program 4.6. We call the software that invokes the subroutine “the calling program” because it calls the subroutine. There are three parts to a subroutine invocation: pass input parameters, subroutine call, and accept output parameters. If there are input parameters, the calling program must establish the values for input parameters before it calls the subroutine. A bsr or jsr instruction is used to call the subroutine. After the subroutine finishes, and if there are output parameters, the calling program accepts the return value(s).

In this chapter, we will pass parameters using the registers. If the register contains a value, the parameter is classified as call by value. If the register contains an address, which points to the value, then the parameter is classified as call by reference. For example, consider a subroutine that samples the 10-bit ADC on the 9S12, as drawn in Figure 5.1. An analog input signal is connected to PAD4. The details of how the ADC works will be presented later in Chapter 11, but for now we focus on defining and invoking subroutines. The execution sequence begins with the calling program setting up the input parameters. In this case, the calling program sets RegA equal to the channel number, ldaa #4. The jsr ADC_In instruction will push the return address on the stack and jump to the ADC_In subroutine. The subroutine performs a well-defined task. In this case, it takes the channel number in RegA and performs an analog to digital conversion, placing the digital representation of the analog input into RegD. The rts instruction will pull the return address off the stack into the PC, returning the execution thread to the instruction after the jsr in the calling program. In this case, the output parameter in RegD contains the result of the ADC conversion. It is the responsibility of the calling program to accept the return parameter. In this case, it simply stores the result into one of its variables, std Result. Both the input and output parameters are call by value.

CallingProgram
        ...
        ldaa #4         ;input parameter in RegA
        jsr  ADC_In
        std  Result     ;output parameter in RegD

;Subroutine
;Samples 10-bit ADC
;In:  RegA has channel Number
;Out: RegD has 10-bit ADC result
ADC_In  ...For details see Program 11.1...
        rts

Figure 5.1 The calling program invokes the ADC_In subroutine passing parameters in registers.
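The difference between call by value and call by reference can also be seen in C. The following is a minimal sketch that runs on a desktop computer, not the book's ADC driver: FakeConversion is a hypothetical stand-in for the ADC hardware, and ADC_InByRef is an invented name used only to contrast the two parameter-passing styles.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the ADC hardware: returns a fake 10-bit
   sample derived from the channel number so the sketch runs anywhere. */
static uint16_t FakeConversion(uint8_t channel) {
  return (uint16_t)((channel * 100u) & 0x03FFu);
}

/* Call by value: the channel number is passed in and the result is
   returned, analogous to RegA in and RegD out. */
uint16_t ADC_In(uint8_t channel) {
  return FakeConversion(channel);
}

/* Call by reference: the caller passes the address of its variable,
   and the function stores the result through that pointer. */
void ADC_InByRef(uint8_t channel, uint16_t *result) {
  assert(result != 0);           /* the caller must supply a valid address */
  *result = FakeConversion(channel);
}
```

A calling program would write Result = ADC_In(4); in the first style and ADC_InByRef(4, &Result); in the second. In both cases it remains the caller's responsibility to set up the input and to accept the output.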
5.1.3 Dividing a Software Task into Modules
The overall goal of modular programming is to enhance clarity. The smaller the task, the easier it will be to understand. Coupling is defined as the influence one module’s behavior has on another module. In order to make modules more independent we strive to minimize coupling. Obvious and appropriate examples of coupling are the input/output parameters explicitly passed from one module to another. A quantitative measure of coupling is the number of bytes per second (bandwidth) that are transferred from one module to another. On the other hand, information stored in public global variables can be quite difficult to track. In a similar way, shared accesses to I/O ports can also introduce unnecessary complexity. Public global variables cause coupling between modules that complicates the debugging process, because the modules may no longer be separately testable. On the other hand, we must use global variables to pass information into and out of an interrupt service routine, and from one call to an interrupt service routine to the next call. When passing data into or out of an interrupt service routine, we group the functions that access the global into the same module, thereby making the global variable private. Another problem specific to embedded systems is the need for fast execution, coupled with the limited support for local variables. On many microcontrollers it is inefficient to implement local variables on the stack. Consequently, many programmers opt for the less elegant yet faster approach of global variables. Again, if we restrict access to these globals to functions in the same module, the global becomes private. It is poor design to pass data between modules through public global variables; it is better to use a well-defined abstract technique like a FIFO queue. We should assign a logically complete task to each module. The module is logically complete when it can be separated from the rest of the system and placed into another application.
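As an illustration of passing data between modules through a FIFO queue rather than a public global, consider the following C sketch. The names Fifo_Init, Fifo_Put, and Fifo_Get and the size FIFO_SIZE are assumptions made for this example. As written it is not safe to share with an interrupt service routine, because the counter Used would then need to be protected.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_SIZE 4                 /* hypothetical capacity */

static uint16_t Fifo[FIFO_SIZE];    /* private storage */
static uint16_t PutIndex;           /* where the next element is stored */
static uint16_t GetIndex;           /* where the oldest element is read */
static uint16_t Used;               /* number of elements currently stored */

void Fifo_Init(void) {
  PutIndex = 0;
  GetIndex = 0;
  Used = 0;
}

/* Producer side: returns 1 if the data was stored, 0 if the FIFO is full. */
int Fifo_Put(uint16_t data) {
  if (Used == FIFO_SIZE) return 0;
  Fifo[PutIndex] = data;
  PutIndex = (uint16_t)((PutIndex + 1) % FIFO_SIZE);
  Used++;
  return 1;
}

/* Consumer side: returns 1 and writes *data if an element was available,
   0 if the FIFO is empty. */
int Fifo_Get(uint16_t *data) {
  assert(data != 0);                /* the caller must supply a valid address */
  if (Used == 0) return 0;
  *data = Fifo[GetIndex];
  GetIndex = (uint16_t)((GetIndex + 1) % FIFO_SIZE);
  Used--;
  return 1;
}
```

The producing module only calls Fifo_Put and the consuming module only calls Fifo_Get, so the coupling between them is limited to these two explicit function calls.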
The interface design is extremely important. The interface to a module is the set of public functions that can be called and the formats for the input/output parameters of these functions. The interfaces determine the policies of our modules: “What does the module do?” In other words, the interfaces define the set of actions that can be initiated. The interfaces also define the coupling between modules. In general we wish to minimize the bandwidth of data passing between the modules yet maximize the number of modules. Of the following three objectives when dividing a software project into subtasks, it is really only the first one that matters:
• Make the software project easier to understand
• Increase the number of modules
• Decrease the interdependency (minimize bandwidth between modules)

Checkpoint 5.2: List some examples of coupling.
We will illustrate the process of dividing a software task into modules with an abstract but realistic example. The overall goal of the example shown in Figure 5.2 is to sample data using an ADC, perform calculations on the data and output results. The serial communication interface (SCI) on the 9S12 includes a transmission channel that could be used to output data to the external world. Notice the typical format of an embedded system in that it has some tasks performed once at the beginning, then it has a long sequence of tasks performed over and over. The structure of this example applies to many embedded systems such as a diagnostic medical instrument, an intruder alarm system, a heating/AC controller, a voice recognition module, an automotive emissions controller, or a military surveillance system.

Linear approach
main  Step1
      Step2
loop  Step3
      Step4
      Step5
      Step6
      Step7
      Step8
      Step9
      Step4
      Step5
      Step6
      Step10
      bra  loop

Modular approach
main  jsr  ADC_Init
      jsr  SCI_Init
loop  jsr  ADC_In
      jsr  Math_Calc
      jsr  SCI_Out
      bra  loop

Math_Calc jsr  Sort
          jsr  Average
          Step9
          jsr  Sort
          rts
ADC_Init  Step1
          rts
SCI_Init  Step2
          rts
ADC_In    Step3
          rts
SCI_Out   Step10
          rts
Sort      Step4
          Step5
          Step6
          rts
Average   Step7
          Step8
          rts

Figure 5.2 A complex software system is broken into three modules containing seven subroutines.

The left side of Figure 5.2 shows the complex software system defined as a linear sequence of ten steps, where each step represents many lines of assembly code. The linear approach to this program closely follows the linear sequence of the processor as it executes instructions. This linear code, however close to the actual processor, is difficult to understand, hard to debug, and impossible to reuse for other projects. Therefore, we will attempt a modular approach considering the issues of functional abstraction, complexity abstraction and portability in this example. The modular approach to this problem divides the software into three modules containing seven subroutines. In this example, assume the sequence Step4-Step5-Step6 causes data to be sorted. Notice that this sorting task is executed twice. Functional abstraction encourages us to create a Sort subroutine allowing us to write the software once, but execute it from different locations. Complexity abstraction encourages us to organize the ten-step software into a main program with multiple modules, where each module has multiple subroutines. For example, assume the assembly instructions in Step1 cause the ADC to be initialized. Even though this code is executed only once, complexity abstraction encourages us to create an ADC_Init subroutine so the system is easier to understand and easier to debug. In a similar way assume Step2 initializes the SCI port, Step3 samples the ADC, the sequence Step7-Step8 performs an average, and Step10 outputs to the SCI. Therefore, each well-defined task is defined as a separate subroutine. The subroutines are then grouped into modules. For example, the ADC module is a collection of subroutines that operate the ADC. The complex behavior of the ADC is now abstracted into two easy to understand tasks: turn it on, and use it.
In a similar way, the SCI module includes all functions that access the SCI. Again, at the abstract level of the main program, understanding how to use the SCI is a matter of knowing that we first turn it on and then transmit data. The math module is a collection of subroutines to perform necessary calculations on the data. In this example, we assume Sort and Average will be private subroutines, meaning they can be called only by software within the math module, and not by software outside the module. Making private subroutines is an example of “information hiding”, separating what the module does from how the module works. When we port a system, it means we take a working system and redesign it with some minor but critical change. The SCI device is used in this system to output results. We might be asked to port this system onto a device that uses an LCD in place of the SCI for its output. In this case, all we need to do is design, implement and test an LCD module with two subroutines, LCD_Init and LCD_Out, that function in a similar manner as the existing SCI routines. The modular approach performs the exact same ten steps in the exact same order. However, the modular approach is easier to debug, because first we debug each subroutine, then we debug each module, and finally we debug the entire system. The modular approach clearly supports code reuse. For example, if another system needs an ADC, we can simply use the ADC module software without having to debug it again.

Observation: When writing modular code, notice its two-dimensional aspect. Down the y-axis still represents time as the program is executed, but along the x-axis we now visualize a functional block diagram of the system showing its data flow: input, calculate, output.
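The modular organization of Figure 5.2 can be mirrored in C. The sketch below is an assumption-laden illustration meant to run on a desktop computer: the ADC and SCI are simulated with private variables, the sample data in FakeData is invented, SCI_LastOutput is a hypothetical hook added so the result can be observed, and Main_LoopOnce stands for one pass through the body of the main loop. Sort and Average are declared static, which makes them private to the math module.

```c
#include <assert.h>
#include <stdint.h>

#define N 4                          /* hypothetical samples per calculation */

/* ---- ADC module (simulated; real code would access the ADC ports) ---- */
static const uint16_t FakeData[N] = {40, 10, 30, 20};  /* invented samples */
static int Index;                    /* private state of the simulated ADC */
void ADC_Init(void) { Index = 0; }                     /* Step1 */
uint16_t ADC_In(void) {                                /* Step3 */
  uint16_t sample = FakeData[Index];
  Index = (Index + 1) % N;
  return sample;
}

/* ---- SCI module (simulated; remembers the last value "transmitted") ---- */
static uint16_t LastOutput;
void SCI_Init(void) { LastOutput = 0; }                /* Step2 */
void SCI_Out(uint16_t value) { LastOutput = value; }   /* Step10 */
uint16_t SCI_LastOutput(void) { return LastOutput; }   /* hypothetical hook */

/* ---- Math module: Sort and Average are private (static) ---- */
static void Sort(uint16_t *buf, int n) {               /* Step4-Step6 */
  for (int i = 1; i < n; i++) {      /* insertion sort, ascending order */
    uint16_t key = buf[i];
    int j = i - 1;
    while (j >= 0 && buf[j] > key) {
      buf[j + 1] = buf[j];
      j--;
    }
    buf[j + 1] = key;
  }
}
static uint16_t Average(const uint16_t *buf, int n) {  /* Step7-Step8 */
  assert(n > 0);
  uint32_t sum = 0;
  for (int i = 0; i < n; i++) sum += buf[i];
  return (uint16_t)(sum / (uint32_t)n);
}
uint16_t Math_Calc(uint16_t *buf, int n) {             /* public entry point */
  Sort(buf, n);
  uint16_t average = Average(buf, n);
  /* Step9 would go here; the text does not specify what it does */
  Sort(buf, n);
  return average;
}

/* One pass through the body of the main loop: input, calculate, output. */
void Main_LoopOnce(void) {
  uint16_t buf[N];
  for (int i = 0; i < N; i++) buf[i] = ADC_In();
  SCI_Out(Math_Calc(buf, N));
}
```

In a real project each module would live in its own file with a header listing only its public functions. Porting to an LCD would then mean replacing the SCI functions with LCD_Init and LCD_Out while leaving the ADC and math modules untouched.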
5.1.4 How to Draw a Call-Graph
Recall from Figure 1.14 that a call-graph is a graphical representation of the organizational structure of the modules pieced together to construct a system. In this section, we will work through the process of drawing a call-graph. A software module is a collection of public functions, private functions and private global variables that together perform a complete task. Modular programming places multiple related subroutines into a single module. I/O devices are essential in all computers, but particularly relevant when developing software for an embedded system. Just like our software, it is appropriate to group I/O ports into hardware modules, which together perform a complete I/O task. The main program is at the top and the I/O ports are at the bottom. In a hierarchical system, the modules are organized both in a horizontal and vertical fashion. Modules at the same horizontal level perform similar but distinct functions (e.g., we could place all I/O modules at the same horizontal level in the call-graph hierarchy). From a vertical perspective, we place modules responsible for overall policy decisions at the top and modules performing implementations at the bottom of the call-graph hierarchy. Since one of the advantages of breaking a large software project into subtasks is concurrent development, it makes sense to consider concurrency when dividing the tasks. In other words, the modules should be partitioned in such a way that multiple programmers can develop and test the subtasks as independently as possible. On the other hand, careful and constant supervision is required as modules are connected together and tested.

An arrow represents a software linkage, i.e., one software module calling another. We draw the tail of the arrow in the software module that initiates the call and we point the head of the arrow at the software module it calls. When programming in C or C++, including a header file in the implementation file of a module defines an arrow in the call-graph. The exception to this rule is including a header file that just contains constants and has no corresponding implementation file, e.g., HC12.h. For example, if we place an #include “fileA.h” statement in our fileB.c code, we create a call-graph arrow from module B to module A, because software in module B can call the public functions of module A. In a large complex system, we will add call-graph arrows for situations where it can call rather than where it does call. It is easier in a larger system to draw the can-call arrows than the does-call arrows, because we just have to look at the header files each code file includes. This approach will also simplify maintaining a call-graph during project phases where the software is being written, debugged, or upgraded. Changes to the list of header files included by a module are much less frequent than changes to the list of functions actually called. On the other hand, most embedded systems are simple enough that it is more appropriate to show just the does-call arrows.

A global variable is one which is allocated in permanent RAM. These variables are a necessary and important component of an embedded system, because some information is permanent in nature. Good programming style however suggests we restrict access to these global variables to a single module. On the other hand, a public global variable is accessed by more than one module. Public globals represent poor programming style, because they add complexity to the system. Reading and writing public globals add arrows to the call-graph. If module A reads a global variable in module B, then we add an arrow from B to A, because activities in B cause changes in A. If module A writes to a global variable in module B, then we add an arrow from A to B, because activities in A cause changes in B. If there is an arrow from A to B, and a second arrow from B to A, then modules A and B must be tested together.
Typically, hardware modules are at the lowest level, because hardware responds to software. An arrow from an oval to a rectangle represents a hardware access, i.e., the software reads from or writes to an I/O port. The same notation signifies the usual read/write access to a hardware module or public global. We will study interrupts in detail in Chapter 9. With interrupts, a hardware triggering event causes the software interrupt service routine to execute. Therefore with interrupts, we add an arrow from the hardware module to the software module. The resulting two-way linkage can be drawn with two single-headed arrows or one double-headed arrow. Defining arrows between hardware and software modules allows us to identify problems such as conflict (two modules writing to the same I/O configuration registers), or race conditions (e.g., one module reading a port before another module initializes it).

Figure 5.3 shows a call-graph of the example presented in Figure 5.2. To draw a call-graph, we first represent all the software modules as ovals. Inside the oval live the functions and variables of that module. Normally, there is not space to list all the subroutines of each module inside the oval, but they are drawn here in this figure so you can see the details of how the graph is drawn. In this example, there is a main program and three software modules. Since this main program calls the Math module, the main program is at a higher level than the Math module, therefore the oval for main will be drawn above the oval for Math. The ADC, Math, and SCI modules do not call each other and each is called by main, so they exist at the same level. In this example, there are two hardware modules, and they are drawn as rectangles. To draw the arrows, we search for subroutine call instructions. The tail of an arrow is placed in the module containing the calling program, and the head of an arrow is placed in the module with the subroutine. If there are multiple calls from one module to another, only one arrow is needed. For example, there are two calls from main to ADC, but only one arrow is drawn. No arrows will be drawn to describe subroutine calls within a module. For example, we do not need to draw arrows representing the Math routine Math_Calc calling Sort and Average, because these three routines are all within the same module. Two arrows from software to hardware are drawn, because the ADC module accesses the ADC hardware, and the SCI module accesses the SCI hardware. We can develop and connect modules in a hierarchical manner, constructing new modules by combining existing modules. In general, to reduce complexity of the system we want to maximize the number of modules and minimize the number of arrows between them. More specifically, we want to minimize the bandwidth of data flowing from one module to the other.

Figure 5.3 A call-graph of the system of Figure 5.2.

Observation: If module A calls module B, and module B calls module A, then these two modules must be tested together.

Maintenance Tip: It is good practice to have one hardware module (e.g., the ADC or SCI) accessed by exactly one software module.

Checkpoint 5.3: In what way are I/O devices considered as public?

Checkpoint 5.4: How can you implement a system that considers I/O devices as private?
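A call-graph can be stored as a table of arrows, which makes it possible to check automatically for the situation described in the observation above, where modules call each other and must be tested together. The following C sketch is one possible way to do this. The representation (a square array of flags) and the names Reachable and CallGraph_HasCycle are assumptions made for illustration, and only the four software modules of Figure 5.2 are included.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_MODULES 4
enum { MAIN, ADC, MATH, SCI };      /* software modules of Figure 5.2 */

/* arrows[a][b] is true when there is an arrow from module a to module b.
   Returns true if 'to' can be reached from 'from' by following one or
   more arrows. visited[] prevents endless looping when cycles exist. */
static bool Reachable(bool arrows[NUM_MODULES][NUM_MODULES],
                      int from, int to, bool visited[NUM_MODULES]) {
  assert(from >= 0 && from < NUM_MODULES);
  for (int next = 0; next < NUM_MODULES; next++) {
    if (!arrows[from][next]) continue;
    if (next == to) return true;
    if (!visited[next]) {
      visited[next] = true;
      if (Reachable(arrows, next, to, visited)) return true;
    }
  }
  return false;
}

/* The call-graph has a cycle if some module can reach itself. */
bool CallGraph_HasCycle(bool arrows[NUM_MODULES][NUM_MODULES]) {
  for (int m = 0; m < NUM_MODULES; m++) {
    bool visited[NUM_MODULES] = {false};
    if (Reachable(arrows, m, m, visited)) return true;
  }
  return false;
}
```

For the system of Figure 5.2 the only arrows are from main to ADC, Math, and SCI, so no cycle is reported. If arrows were added from Math to SCI and from SCI back to Math, the function would report a cycle.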
5.1.5 How to Draw a Data Flow Graph
Recall from Figure 1.13 that a data flow graph is a graphical representation of the data as it traverses the system. Figure 5.4 shows the data flow graph for the example presented in Figure 5.2. In general, the data flow graph contains the same software and hardware modules as the call-graph. There are two fundamental differences, however. The first difference is that the arrows in a data flow graph specify the direction, data type, and rate of data transfer. The second difference is that in general the modules are drawn left to right, as data enters as inputs on the left and exits as outputs on the right. Assume in this example, the analog input contains frequency components from 0 to 50 Hz. We classify the signal as analog and specify the bandwidth of the analog signal to be 50 Hz. The output of the ADC hardware is 10-bit digital samples. If the 10-bit ADC is sampled 100 times a second, we define the bandwidth of the digital data out of the ADC software module as 100 samples/sec, 100 words/sec or 200 bytes/sec. Assume once a second, the main program fills a 100-element buffer and passes it to the math module. The math module takes in 100 samples and generates one 16-bit result. In this case, we define the output of the math module to be 1 word/sec. If once a second each result is printed as 10 ASCII characters using the SCI, then the bandwidth into and out of the SCI software module will be 10 characters/sec. On the physical level, we will see later in Chapter 8 that a 10-byte/sec SCI transmission will require a total of 100 bits/sec on the SCI channel, because there is the overhead of a start bit and a stop bit for each byte transferred.

Figure 5.4 A data flow graph of the system of Figure 5.2. (Arrow labels recoverable from the figure: 0 to 50 Hz analog signal; 10-bit digital every 10 ms; 16-bit data, 100 words/sec.)
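The bandwidth figures quoted above follow from two small calculations, sketched here in C. The function names are hypothetical, and the serial calculation assumes 8 data bits per byte in addition to the start and stop bits mentioned in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second for a stream of fixed-size samples. */
uint32_t BytesPerSecond(uint32_t samplesPerSecond, uint32_t bytesPerSample) {
  assert(bytesPerSample > 0);       /* a sample occupies at least one byte */
  return samplesPerSecond * bytesPerSample;
}

/* Bits per second on a serial channel that frames each byte with one
   start bit and one stop bit around 8 data bits (10 bits per byte). */
uint32_t SerialBitsPerSecond(uint32_t bytesPerSecond) {
  const uint32_t bitsPerFrame = 1 + 8 + 1;
  return bytesPerSecond * bitsPerFrame;
}
```

With 100 samples/sec stored as 2-byte words, BytesPerSecond(100, 2) gives the 200 bytes/sec out of the ADC software module, and SerialBitsPerSecond(10) gives the 100 bits/sec required on the SCI channel for 10 characters/sec.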
Hierarchical systems have tree-structured call-graphs, like system in Figure 5.3. Layered systems have call-graphs that group the modules into layers, such that the linkage arrows only go from a high level to a lower level or within the same level. A lower level module is not allowed to call a higher level. If at all possible, we should avoid cyclic graphs. A cycle in the call-graph will make testing difficult. Recall that we design top-down and test bottom-up. When there is a cycle in the call-graph, there is no good place to start debugging. There are two approaches to hierarchical programming. The top-down approach starts with a general overview, like an outline of a paper, and builds refinement into subsequent layers. Most engineers believe top-down is the proper approach to design. A top-down programmer was once quoted as saying, “Write no software until every detail is specified”
Top-down provides an excellent global approach to the problem. Managers like top-down because it gives them tighter control over their workers. The top-down approach works well when an existing operational system is being upgraded or rewritten. On the other hand, the bottom-up approach starts with the smallest detail and builds up the system “one brick at a time.” The bottom-up approach provides a realistic appreciation of the problem because we often cannot appreciate the difficulty or the simplicity of a problem until we have tried it. It allows programmers to start coding immediately, and gives programmers more input into the design. For example, a low level programmer may be able to point out features that are not possible and suggest other features that are even better. Some software projects are flawed from their conception. With bottom-up design, the obvious flaws surface early in the development cycle. Bottom-up is a better approach when designing a complex system and specifications are open-ended. For example, when researching new technologies or exploring new markets, you can’t perform a top-down design because there are no specifications or constraints with which to work. However, a bottom-up approach allows you to brainstorm putting pieces together in new and creative ways. In a bottom-up design, you ask questions that begin with “I wonder what would happen if . . .” On the other hand, top-down is better when you have a very clear understanding of the problem specifications and the constraints of your system.
Observation: The TExaS application was actually written twice. The first version was programmed bottom-up and served only to provide a clear understanding of the problem and the features and limitations of my hardware. I literally threw all the source code in the trash, and programmed the second version in a top-down manner.
5.2 Making Decisions
The previous section presented fundamental concepts and general approaches to solving problems on the computer. In the subsequent sections, detailed assembly language implementations will be presented.
5.2.1 Conditional Branch Instructions
Normally the computer executes one instruction after another in a sequential or linear fashion. In particular, the next instruction to execute is found immediately following the current instruction. We use branch instructions to deviate from this straight line path. The bcc bcs beq bne bmi bpl bra brn bvc bvs and jmp instructions were presented earlier in Chapter 3. The following unsigned branch instructions must follow a subtract, compare, or test instruction, such as suba subb sbca sbcb subd cba cmpa cmpb cpd tsta tstb and tst.
blo target ;Branch if unsigned less than                   if C = 1, same as bcs
bls target ;Branch if unsigned less than or equal to       if C + Z = 1
bhs target ;Branch if unsigned greater than or equal to    if C = 0, same as bcc
bhi target ;Branch if unsigned greater than                if C + Z = 0
To understand exactly how these four unsigned conditional branches work, let’s start with the blo instruction. As stated earlier, we bring the first unsigned number into a register, and then subtract a second unsigned number from the first. We call the first number ‘First’ and the second number ‘Second’. The blo instruction is supposed to branch if the first unsigned number is strictly less than the second. The two possibilities, branch or no branch, are illustrated in the number wheels drawn in Figure 5.5. Assume for a moment that the condition is true, meaning First < Second. Since First < Second, First-Second should be a negative number. That is, if we subtract a big unsigned number from a small unsigned number, an unsigned overflow must occur, because the correct result of the subtraction is negative, but there are no negative numbers in the unsigned format. Thus, the C bit must be set. The left side of Figure 5.5 shows the subtraction will always cross the 0-255 barrier because First < Second. Conversely, assume the condition is false, meaning the first unsigned number is greater than or equal to the second. The right side of Figure 5.5 shows that when we subtract the smaller second number from the bigger first number we get the correct result. In this case, the C bit will be clear. Thus, the blo instruction can be defined as branch if C = 1. The bhs instruction is the logical complement of blo, so the bhs instruction will branch if the C bit is clear. The bls instruction will branch if the first number is less than the second (C = 1) or if the two numbers are equal (Z = 1). Hence, the operation of the bls instruction can be defined as branch if C + Z = 1. Lastly, the bhi instruction is the logical complement of bls, so the bhi instruction will branch if C + Z = 0.
Figure 5.5 Number wheel on left shows the result of subtracting a big unsigned number from a little number, and the one on the right occurs when subtracting a small unsigned number from a large one.
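[Figure 5.5 number wheels, each marked 0, 64, 96, 128, 160, 192, 255. Left wheel: First < Second, so First-Second crosses the 0-255 boundary. Right wheel: First >= Second, so the boundary is not crossed.]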
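The four unsigned branch conditions can be checked with a small C model of the C and Z bits. This is only a sketch under stated assumptions: the Flags type and the function names Compare8, Blo, Bls, Bhs, and Bhi are hypothetical, and the model covers only the 8-bit First-Second case discussed above.

```c
#include <stdint.h>

// Hypothetical model of the C and Z bits after an 8-bit compare (First - Second).
typedef struct { uint8_t c; uint8_t z; } Flags;

Flags Compare8(uint8_t first, uint8_t second){
  Flags f;
  // Subtract in 16 bits; a borrow out of the low 8 bits appears in bit 8.
  uint16_t wide = (uint16_t)((uint16_t)first - (uint16_t)second);
  f.c = (uint8_t)((wide >> 8) & 1);   // C=1 when First < Second (unsigned)
  f.z = (uint8_t)((uint8_t)wide == 0); // Z=1 when First == Second
  return f;
}

// Branch conditions exactly as listed in the text ("+" is logical OR).
int Blo(Flags f){ return f.c == 1; }
int Bls(Flags f){ return (f.c | f.z) == 1; }
int Bhs(Flags f){ return f.c == 0; }
int Bhi(Flags f){ return (f.c | f.z) == 0; }
```

For example, Compare8(5,10) crosses the 0-255 boundary and sets C, so Blo reports a branch, while Compare8(7,7) sets only Z, so Bls and Bhs branch but Blo and Bhi do not.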
The following signed branch instructions must follow a subtract, compare, or test instruction, such as suba subb sbca sbcb subd cba cmpa cmpb cpd tsta tstb and tst.
blt target ;Branch if signed less than                     if (~N•V+N•~V)=1
bge target ;Branch if signed greater than or equal to      if (~N•V+N•~V)=0
bgt target ;Branch if signed greater than                  if (Z+~N•V+N•~V)=0
ble target ;Branch if signed less than or equal to         if (Z+~N•V+N•~V)=1
To understand exactly how these four signed conditional branches work, let’s start with the blt instruction. We bring the first signed number into a register, and then subtract a second signed number from the first. The blt instruction is supposed to branch if the first signed number is strictly less than the second. Assume for a moment that First < Second, thus the branch should occur. Since First < Second, First-Second should be a negative number. Let’s further dissect this case into two subcases. If the V bit is clear, the subtraction is correct and the N bit will be 1. This subcase defines the N•~V term. If the V bit is set, the subtraction is incorrect and the result will be incorrectly positive, making the N bit 0. This subcase defines the ~N•V term. Conversely, assume the condition is false, meaning First >= Second, and the branch should not occur. Since First >= Second, First-Second should be zero or a positive number. If the V bit is clear, the subtraction is correct and the N bit will be 0. If the V bit is set, the subtraction is incorrect and the result will be incorrectly negative, making the N bit 1. Thus, the blt instruction can be defined as branch if (~N•V+N•~V)=1. The bge instruction is the logical complement of blt, so the bge instruction will branch if (~N•V+N•~V)=0. The ble instruction will branch if the first number is less than the second ((~N•V+N•~V)=1) or if the two numbers are equal (Z=1). Combining the less than with the equal conditions, the operation of the ble instruction can be defined as branch if (Z+~N•V+N•~V)=1. Lastly, the bgt instruction is the logical complement of ble, so the bgt instruction will branch if (Z+~N•V+N•~V)=0.
The 9S12 has a pair of conditional branch instructions that make it easy to test individual bits. The address U can be direct, extended or indexed addressing, and specifies an
8-bit value in memory. The parameter w is an immediate bit mask selecting which bits to test. The address target is encoded as an 8-bit PC-relative branch address. The brclr instruction will branch if all selected bits are zero. The brset instruction will branch if all selected bits are one.
brclr U,#w,target ;branch if [U]&w is zero, branch if all selected bits are zero
brset U,#w,target ;branch if (~[U])&w is zero, branch if all selected bits are one
5.2.2 Conditional if-then Statements
Decision making is an important aspect of software programming. Two values are compared and certain blocks of program are executed or skipped depending on the results of the comparison. In assembly language it is important to know the precision (e.g., 8-bit, 16-bit) and the format of the two values (e.g., unsigned, signed). It takes three steps to perform a comparison. You begin by reading the first value into a register. For 8-bit values you can use either Register A or Register B. 16-bit values can be loaded into Register D, Register X or Register Y. The second step is to compare the first value with the second value. You can use either a subtract instruction (suba subb subd) or a compare instruction (cmpa cmpb cpd cpx cpy). These instructions set the condition code bits. The last step is a conditional branch.
Observation: Think of the three steps 1) bring first value into a register, 2) compare to second value, 3) conditional branch, bxx (where xx is eq ne lo ls hi hs gt ge lt or le). The branch will occur if (first is xx second).
In the following examples, we assume G1, G2 are 8-bit variables. Program 5.1 contains two separate if-then structures involving testing for equal or not equal. It will call isEqual if G1 equals G2, and isNotEqual if G1 does not equal G2. When testing for equal or not equal it doesn’t matter whether the numbers are signed or unsigned. However, it does matter if they are 8-bit or 16-bit. To convert these examples to 16 bits, use Register D, X, or Y instead of Register A.
Program 5.1 Conditional structures that test for equality.
Common error: It is an error to use an 8-bit comparison to test two 16-bit values.
Checkpoint 5.5: Assume you have an 8-bit global variable N. Write assembly code that implements if(N==25)isEqual();
Checkpoint 5.6: Assume H1 and H2 are two 16-bit variables. Write assembly code that implements if(H1==H2)isEqual(); Hint: bring H1 into Register D, perform a 16-bit subtraction H1-H2, then use a branch not equal to skip the subroutine call to isEqual.
When testing for greater than or less than, it does matter whether the numbers are signed or unsigned. Program 5.2 contains four separate unsigned if-then structures. It will call isGreater if G2 is greater than G1, isGreaterEq if G2 is greater than or equal to G1, isLess if G2 is less than G1, and isLessEq if G2 is less than or equal to G1. When comparing unsigned values, the instructions bhi blo bhs and bls should follow the subtraction or comparison instruction. A conditional if-then is implemented by bringing the first number into a register, subtracting the second number, then using the branch
instruction with complementary logic to skip over the body of the if-then. To convert these examples to 16 bits, use Register D, X, or Y instead of Register A.
Program 5.2 Unsigned conditional structures.
Assembly code                              C code
      ldaa G2                              if(G2 > G1){
      cmpa G1                                isGreater();
      bls  next1       ; skip if G2<=G1    }
      jsr  isGreater   ; G2>G1
next1
      ldaa G2                              if(G2 >= G1){
      cmpa G1                                isGreaterEq();
      blo  next2       ; skip if G2<G1     }
      jsr  isGreaterEq ; G2>=G1
next2
      ldaa G2                              if(G2 < G1){
      cmpa G1                                isLess();
      bhs  next3       ; skip if G2>=G1    }
      jsr  isLess      ; G2<G1
next3
      ldaa G2                              if(G2 <= G1){
      cmpa G1                                isLessEq();
      bhi  next4       ; skip if G2>G1     }
      jsr  isLessEq    ; G2<=G1
next4
Example 5.1 Assuming G1 is unsigned 8-bit, write software that sets G2=1 if G1 is greater than 100.
Solution First, we draw a flowchart describing the desired algorithm, see Figure 5.6. Next, we restate the conditional as “skip over if G1 is less than or equal to 100”. We will use Register A because the parameters are 8 bits. To implement the assembly code we bring G1 into Register A, subtract 100, then branch to next if G1 is less than or equal to 100, as presented in Program 5.3. We will use an unsigned conditional branch because the data format is unsigned.
Figure 5.6 Flowchart of an if-then structure.
[Figure 5.6 flowchart: test G1; if G1>100, set G2=1; if G1<=100, skip the assignment.]
Program 5.3 An unsigned if-then structure.
      ldaa G1                              if(G1>100){
      cmpa #100                              G2 = 1;
      bls  next        ; skip if G1<=100   }
      movb #1,G2
next
Program 5.4 contains four separate signed if-then structures. Similar to Program 5.2, Program 5.4 will call isGreater if G2 is greater than G1, isGreaterEq if G2 is greater than or equal to G1, isLess if G2 is less than G1, and isLessEq if G2 is less than or equal to G1. When comparing signed values, the instructions bgt blt bge and ble should follow the subtraction or comparison instruction. To convert these examples to 16 bits, use Register D, X, or Y instead of Register A.
Program 5.4 Signed conditional structures.
Assembly code                              C code
      ldaa G2                              if(G2 > G1){
      cmpa G1                                isGreater();
      ble  next1       ; skip if G2<=G1    }
      jsr  isGreater   ; G2>G1
next1
      ldaa G2                              if(G2 >= G1){
      cmpa G1                                isGreaterEq();
      blt  next2       ; skip if G2<G1     }
      jsr  isGreaterEq ; G2>=G1
next2
      ldaa G2                              if(G2 < G1){
      cmpa G1                                isLess();
      bge  next3       ; skip if G2>=G1    }
      jsr  isLess      ; G2<G1
next3
      ldaa G2                              if(G2 <= G1){
      cmpa G1                                isLessEq();
      bgt  next4       ; skip if G2>G1     }
      jsr  isLessEq    ; G2<=G1
next4
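The N and V reasoning behind the signed branches can be exercised with a small C model. This is a sketch under stated assumptions, not a description of the hardware: the SFlags type and the function names are hypothetical, and the term ~N•V+N•~V is written as the exclusive OR of N and V, which is the same Boolean function.

```c
#include <stdint.h>

// Hypothetical model of N, V, and Z after an 8-bit signed compare (First - Second).
typedef struct { uint8_t n; uint8_t v; uint8_t z; } SFlags;

SFlags CompareSigned8(int8_t first, int8_t second){
  SFlags f;
  int16_t exact = (int16_t)((int16_t)first - (int16_t)second); // true result, -255 to +255
  uint8_t result = (uint8_t)exact;                  // the 8 bits the CPU keeps
  f.n = (uint8_t)((result >> 7) & 1);               // sign bit of the 8-bit result
  f.v = (uint8_t)((exact > 127) || (exact < -128)); // signed overflow
  f.z = (uint8_t)(result == 0);
  return f;
}

// Branch conditions as listed in the text; (n ^ v) equals ~N•V + N•~V.
int Blt(SFlags f){ return (f.n ^ f.v) == 1; }
int Bge(SFlags f){ return (f.n ^ f.v) == 0; }
int Bgt(SFlags f){ return (f.z | (f.n ^ f.v)) == 0; }
int Ble(SFlags f){ return (f.z | (f.n ^ f.v)) == 1; }
```

With first = -100 and second = 100 the exact difference is -200, which overflows: the 8-bit result looks positive (N=0) while V=1, so Blt still reports that the first value is smaller.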
Checkpoint 5.7: When implementing if(N>25)isGreater(); why is it important to know if N is signed or unsigned?
Common Error: It is an error to use an unsigned conditional branch when comparing two signed values. Similarly, it is a mistake to use a signed conditional branch when comparing two unsigned values.
Example 5.2 Redesign the Example 5.1 code assuming G1 is 8-bit signed.
Solution We can use the same flowchart shown previously in Figure 5.6. Again we bring G1 into Register A, subtract 100, then branch to next if G1 is less than or equal to 100, as presented in Program 5.5. However, we will use a signed conditional branch because the data format is signed.
Program 5.5 A signed if-then structure.
      ldaa G1                              if(G1>100){
      cmpa #100                              G2 = 1;
      ble  next        ; skip if G1<=100   }
      movb #1,G2
next
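Examples 5.1 and 5.2 use the same comparison on the same 8 bits and differ only in the branch chosen. In C that choice is made by the declared type, as the following sketch suggests. The function names are hypothetical; the value -112 is the signed reading of the bit pattern $90, whose unsigned reading is 144.

```c
#include <stdint.h>

// Same source text, g1 > 100, under two different types.
// A compiler for the 9S12 would be expected to pick bls/bhi style branches here...
int GreaterThan100Unsigned(uint8_t g1){
  return g1 > 100;
}

// ...and ble/bgt style branches here.
int GreaterThan100Signed(int8_t g1){
  return g1 > 100;
}
```

The pattern $90 therefore satisfies the unsigned test (144 > 100) but fails the signed test (-112 > 100), which is why the data format must be known before a branch is chosen.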
Notice that the C code for Program 5.2 is identical to Program 5.4, and the C code for Program 5.3 is identical to Program 5.5. This is because the compiler knows the type of G1 and G2; therefore, it knows whether to utilize unsigned or signed branches. The ldaa, ldab, ldd, ldx, and ldy instructions set the N and Z bits if the value is negative or zero respectively. It is acceptable to follow one of these load instructions with
beq or bne to test if the value is zero or not zero. If the value is signed, it is appropriate to follow one of these load instructions with bpl or bmi to test if the value is positive or negative. Because the load and store instructions also clear the V bit, it is also acceptable to follow these instructions with one of the signed conditional branch instructions: bge, bgt, ble, and blt. After a load instruction (making V=0), blt performs the same function as bmi, and bge performs the same function as bpl.
Common Error: It is usually an error to follow a compare instruction with bpl or bmi.
Checkpoint 5.8: Assume you have a 16-bit signed global variable N. Write assembly code that implements if(N > 0) isPositive();
Checkpoint 5.9: Assume you have an 8-bit signed global variable M. Write assembly code that implements if(M > 0) isGreaterThan0();
5.2.3 Conditional if-then-else statements
Figure 5.7 Flowchart of an if-then-else structure.
Simple if-then structures were presented in the last section. We can use the unconditional branch to add an else clause to any of the previous if-then structures. A simple example of an unsigned conditional is illustrated in Figure 5.7 and presented in Program 5.6. The first three lines test the condition G1>G2. If G1>G2, the software branches to high. Once at high, the software calls the isGreater subroutine then continues. Conversely, if G1<=G2, the software does not branch; it falls through to low, calls the isLessEq subroutine, and then uses the unconditional branch to skip over high.
[Figure 5.7 flowchart: if G1>G2 call isGreater; if G1<=G2 call isLessEq.]
Program 5.6 An unsigned if-then-else structure.
      ldaa G1                              if(G1>G2){
      cmpa G2                                isGreater();
      bhi  high        ; branch if G1>G2   }
low   jsr  isLessEq    ; G1<=G2            else{
      bra  next                              isLessEq();
high  jsr  isGreater   ; G1>G2             }
next
Checkpoint 5.10: Assume you have a 16-bit signed global variable M. Write assembly code that implements if(M > 1000) isGreater(); else isLess();
5.2.4 While Loops
Quite often the microcomputer is asked to wait for events or to search for objects. Both of these operations are solved using the while or do-while structure. A simple example of a while loop is illustrated in Figure 5.8 and presented in Program 5.7. Assume G1 and G2 are unsigned 8-bit variables. The operation is defined by the C code while(G2 > G1){Body();}
Figure 5.8 Flowchart of a while structure.
[Flowchart: while G2>G1, execute Body and test again; exit when G2<=G1.]
The program begins with a test of G2>G1. If G2<=G1 then the body of the while loop is skipped. The unconditional branch after the body causes the G2>G1 condition to be tested again. In this way, the body is executed over and over again until G2<=G1.
Program 5.7 A while loop structure.
loop  ldaa G2                              while(G2 > G1){
      cmpa G1                                Body();
      bls  next        ; stop when G2<=G1  }
      jsr  Body        ; body of the loop
      bra  loop
next
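The branch structure of Program 5.7 can be mirrored line for line in C using labels and goto. The sketch below is for illustration only: the body shown (decrementing G2 and counting calls in the hypothetical variable BodyCount) is an assumption added so the loop terminates and its behavior can be observed.

```c
#include <stdint.h>

uint8_t G1, G2;      // unsigned 8-bit globals, as in the text
uint32_t BodyCount;  // hypothetical counter, for illustration only

// Placeholder body: makes progress toward the end condition.
void Body(void){
  G2--;
  BodyCount++;
}

// Same control flow as Program 5.7, written with explicit branches.
void WhileLoop(void){
loop:
  if(G2 <= G1) goto next;  // cmpa G1 / bls next : stop when G2<=G1
  Body();                  // jsr Body
  goto loop;               // bra loop
next:
  return;
}
```

With G1=3 and G2=10 the body runs seven times; with G1 equal to G2 it does not run at all, because the test precedes the body.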
Checkpoint 5.11: Assume you have a 16-bit global variable N. Write assembly code that implements while(N!=25){body();}
5.2.5 For Loops
Figure 5.9 Two flowcharts of a for-loop structure.
A for-loop control structure is a special case of the do-while loop. For-loops can iterate up or down. For-loops are a convenient way to perform repetitive tasks. As an example, we write code that calls Process() 100 times. Two possible solutions are illustrated in Figure 5.9. The solution on the left starts at 0 and counts up to 100, while the solution on the right starts at 100 and counts down to 0. The first field is the starting task (e.g., i=0). The next field specifies the conditions with which to continue execution (e.g., i<100), and the last field is the operation to perform after each iteration (e.g., i++).
[Figure 5.9, two flowcharts. Left: for(i=0; i<100; i++){ Process(); } sets i=0, calls Process and performs i=i+1 while i<100, and exits when i>=100. Right: for(i=100; i!=0; i--){ Process(); } sets i=100, calls Process and performs i=i-1 while i!=0, and exits when i==0.]
The count-up implementation places the loop counter in Register B, as shown in Program 5.8.
Program 5.8 A simple for-loop.
      ldab #0          ; i=0               for(i=0; i<100; i++){
loop  cmpb #100                              Process();
      bhs  done                            }
      jsr  Process
      incb             ; i=i+1
      bra  loop
done
Freescale has a set of instructions convenient for implementing for-loops. Register r can be A, B, D, X, Y, or SP. These are 3-byte instructions that employ a 9-bit PC-relative addressing mode for the target address. This means you can branch anywhere from -253 to +258 bytes from the address of the current instruction. These instructions do not set any condition code bits.
dbeq r,target ;decrement register r, and branch to target if r is equal to zero
dbne r,target ;decrement register r, and branch to target if r is not equal to zero
ibeq r,target ;increment register r, and branch to target if r is equal to zero
ibne r,target ;increment register r, and branch to target if r is not equal to zero
tbeq r,target ;test register r, and branch to target if r is equal to zero
tbne r,target ;test register r, and branch to target if r is not equal to zero
If you don’t really need the index to go up, it is more efficient to count down, as shown on the right side of Figure 5.9 and in Program 5.9.
Program 5.9 The dbne instruction optimizes this for-loop implementation.
      ldab #100        ; i=100             for(i=100; i!=0; i--){
L1    jsr  Process                           Process();
      dbne B,L1        ; i=i-1             }
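The two loop shapes can be compared in C. The sketch below is illustrative only; ProcessCount and the function names are hypothetical. The count-down version is written as a do-while because that matches the order of operations in Program 5.9, where the body runs before the counter is decremented and tested.

```c
#include <stdint.h>

uint32_t ProcessCount;  // hypothetical counter, for illustration only

void Process(void){ ProcessCount++; }

// Count-up form, as in Program 5.8: test first, then body, then increment.
void CountUp(void){
  uint8_t i;
  for(i = 0; i < 100; i++){
    Process();
  }
}

// Count-down form shaped like Program 5.9: body first, then decrement and test.
void CountDown(uint8_t n){
  uint8_t i = n;
  do{
    Process();
  } while(--i != 0);  // corresponds to dbne B,L1
}
```

Both forms call Process 100 times when the count is 100. One consequence of testing after the body is that a starting count of 0 does not skip the loop: in this model the 8-bit counter wraps to 255 and the body runs 256 times, so the count-down form suits loops known to execute at least once.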
Checkpoint 5.12: How do you modify Programs 5.8 and 5.9 to handle a 16-bit loop counter?
If a register is not available, the for-loop can be implemented by placing the counter in a variable. In general, temporary variables, like i, should be placed in a register or as a local variable on the stack. Checkpoint 5.13: Write assembly code that calls the subroutine Test 1000 times. You may assume Test does not modify Register Y. Checkpoint 5.14: Give an example C code that could be implemented with the ibne instruction.
5.3 *Macros
A macro is a template for a sequence of instructions. The macro definition includes a name and a code sequence. This name becomes the mnemonic by which the macro is subsequently invoked. Invoking a macro means replacing the macro name with its code sequence, analogous to a copy-paste operation in the editor. Macros are like subroutines in the sense that we use them to encapsulate operations we wish to perform as a single higher-level function. With subroutines, we call a function at run time by pushing the PC on the stack and jumping to the code that defines the function. The return from subroutine also occurs at run time, by popping the return address off the stack into the PC. Invoking a macro, on the other hand, occurs at assembly time. When the assembler sees a macro in your code, it performs a character substitution (copy/paste), replacing the macro name with the macro definition. The syntax used in this section is compatible with Metrowerks CodeWarrior.
The macro definition may also include macro arguments. A macro definition may contain any code or directive except nested macro definitions. Invoking previously defined macros is allowed inside a macro definition. The code sequence of the macro is inserted in the source file at the position where the macro is invoked. To invoke a macro, write the macro name in the operation field of a source statement. Place the arguments, if any, in the operand field. The macro may contain conditional assembly directives that cause the assembler to produce parameter-dependent variations of the macro definition. The definition of a macro consists of three parts:
䡲 The header statement, a MACRO directive with a label that names the macro
䡲 The code sequence, with argument placeholders as needed
䡲 The ENDM directive, terminating the macro definition
We can use macros to define what look like new instructions. For example, this macro creates a new assembly instruction that negates RegD.
negd: MACRO
      coma
      comb
      addd #1
      ENDM
We invoke this macro in our code simply by placing negd in the operation field of our program.
Our code              Result created by assembler
  ldd  voltage          ldd  voltage
  negd                  coma
                        comb
                        addd #1
  std  voltage          std  voltage
  ldd  gravity          ldd  gravity
  negd                  coma
                        comb
                        addd #1
  std  gravity          std  gravity
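Readers who know C may find it helpful to compare the negd macro with a preprocessor macro, which is also expanded by text substitution before translation. The sketch below is only an analogy; NEGD and NegdFunc are hypothetical names. Both compute the 16-bit two's complement the same way the assembly does, by complementing every bit and adding 1.

```c
#include <stdint.h>

// Expanded by text substitution at compile time, analogous to the negd macro:
// complement all 16 bits (coma, comb), then add 1 (addd #1).
#define NEGD(d) ((uint16_t)((uint16_t)~(d) + 1u))

// The same operation as a function, which is called at run time
// (analogous to a subroutine reached with jsr and left with rts).
uint16_t NegdFunc(uint16_t d){
  return (uint16_t)((uint16_t)~d + 1u);
}
```

For example, NEGD(5) yields $FFFB, the 16-bit representation of -5, and NEGD(0) yields 0. Every use of NEGD places a fresh copy of the expression in the code, while NegdFunc exists once and is reached through a call.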
Up to 36 parameters can be used in a macro definition. These parameters are replaced by the corresponding arguments in a subsequent call to that macro. In the code sequence portion of the macro definition we refer to a parameter with a backslash character (\), followed by a digit (0 - 9) or an uppercase letter (A - Z). During assembly, arguments from the macro invocation are substituted for parameter designators in the body of the macro as literal (string) substitutions. The string corresponding to a given argument is substituted literally wherever that parameter designator occurs in a source statement as the macro is expanded. Each statement generated in the execution is assembled in line. It is possible to specify a null argument in a macro invocation by a comma with no character between the comma and the preceding macro name or comma that follows an argument. Parameter designator \0 corresponds to a size argument that follows the macro name, separated by a period (.). The following macro can be used to define a rectangle given its height and width.
rect: MACRO
      dc.\0 \1,\2       ;height and width
      dc.\0 2*(\1+\2)   ;perimeter
      dc.\0 \1*\2       ;area
      ENDM
We invoke this macro in our code by placing rect in the operation field of our program and the dimensions in the operand field
big    rect.w 100,50
little rect.b 10,20
which will expand to
big    dc.w 100,50      ;height and width
       dc.w 2*(100+50)  ;perimeter
       dc.w 100*50      ;area
little dc.b 10,20       ;height and width
       dc.b 2*(10+20)   ;perimeter
       dc.b 10*20       ;area
Notice the parameters are not evaluated by the macro expansion. The evaluation will occur subsequently by the regular assembly process as the entire program is assembled. The size argument is optional. The following macro can be used to create a new pseudo-op that defines a null-terminated string.
str:  MACRO
      fcc \1
      fcb 0
      ENDM
We invoke this macro in our code simply by placing str in the operation field of our program and the string in the operand field, which will expand to the same definition seen in Program 6.6.
hello str "Hello world"
We can use conditional assembly to test the value of the parameter, selecting different code as needed. This macro defines a logical shift right on RegA. The macro parameter will specify the number of shifts. For simplicity we will assume the parameter is 1, 2, or 3. MEXIT causes the macro expansion to end. Because macro expansion is a character substitution (i.e., it does not evaluate the numerical value of the parameters), we use the conditional assembly that compares strings (IFC).
lsrn: MACRO
      IFC '\1','1'
      lsra
      MEXIT
      ENDC
      IFC '\1','2'
      lsra
      lsra
      MEXIT
      ENDC
      IFC '\1','3'
      lsra
      lsra
      lsra
      MEXIT
      ENDC
      ENDM
If we invoke this macro with lsrn 2, the assembler will substitute two lsra instructions in its place.
If you create a subroutine and use it at ten different places within your program, there will be one copy of the subroutine and ten jsr instructions used to call it. If during the execution of your system the subroutine is invoked 100 times/sec, the jsr and rts instructions are each executed 100 times/sec. If you create a macro and use it at ten different places within your program, there will be ten copies of the macro code in your system, and no jsr or rts instructions are needed. If during the execution of your system the macro is invoked 100 times/sec, no run-time overhead of the jsr and rts instructions occurs.
Observation: Subroutines optimize memory space with a moderate cost of execution speed.
Observation: Macros optimize execution speed with a large cost of memory space.
Example 5.3 Redesign the I/O driver for a single output pin, originally solved in Example 4.1.
Solution This software driver requires three operations: initialization, set and clear. The initialization will define PT0 as an output, Pin_Set will make it high, and Pin_Clr will make it low. The subroutine solution, shown on the left of Program 5.10, is repeated from Example 4.1.
Checkpoint 5.15: What two advantages does the macro solution have over the subroutine solution in Example 5.3?
Program 5.10 I/O port drivers using subroutines and macros.
; Make PT0 an output pin              Pin_Init: MACRO
Pin_Init bset DDRT,#$01                         bset DDRT,#$01 ; Make PT0 output
         rts                                    ENDM
; Make PT0 high                       Pin_Set:  MACRO
Pin_Set  bset PTT,#$01                          bset PTT,#$01  ; Make PT0 high
         rts                                    ENDM
; Make PT0 low                        Pin_Clr:  MACRO
Pin_Clr  bclr PTT,#$01                          bclr PTT,#$01  ; Make PT0 low
         rts                                    ENDM
5.4 *Recursion
A recursive program is one that calls itself. Each time the subroutine is started a new instantiation occurs. There is a unique set of parameters, registers, and local variables for each instantiation. The stack is a convenient way to separate the parameters and variables of one instantiation from another. In order for the recursive function to finish, there must be a situation where a direct result is generated, which is called the end condition. For example, the factorial has two possibilities
Fact(1) = 1                      end condition
Fact(n) = n*Fact(n-1) if n>1     recursion
Program 5.11 shows two assembly language implementations of factorial. The one on the top uses iteration, and the one on the bottom uses recursion. It is usually the case that a recursive algorithm can be rewritten in iterative form. Nevertheless, sometimes it is more convenient to implement the algorithm in recursive form.
Program 5.11 Iterative and recursive implementations of factorial.
; iterative implementation
; Input: RegA is n
; Output: RegA is Fact(n)
Fact  ldab #1      ; r=1
loop  cmpa #1      ; end condition
      beq  done
      psha
      mul          ; r=r*n
      pula
      deca         ; n=n-1
      bra  loop
done  tba          ; RegA=Fact(n)
      rts
; recursive implementation
; Input: RegA is n
; Output: RegA is Fact(n)
Fact  cmpa #1      ; end condition
      beq  done
      psha         ; save n
      deca         ; n-1
      bsr  Fact    ; RegA=Fact(n-1)
      pulb         ; RegB=n
      mul
      tba          ; RegA=n*Fact(n-1)
done  rts
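For comparison, the two assembly versions can be sketched in C. This is not taken from the original program; the names FactIter and FactRec are hypothetical. The sketch keeps the 8-bit precision of the assembly code, so it is meaningful only for inputs 1 through 5 (6! = 720 does not fit in 8 bits), and like the assembly it treats n equal to 1 as the only end condition.

```c
#include <stdint.h>

// Iterative: r accumulates the product while n counts down to 1.
uint8_t FactIter(uint8_t n){
  uint8_t r = 1;
  while(n != 1){            // end condition (cmpa #1, beq done)
    r = (uint8_t)(r * n);   // r=r*n, keeping the low 8 bits as mul/tba does
    n--;                    // n=n-1
  }
  return r;
}

// Recursive: Fact(1)=1, otherwise Fact(n)=n*Fact(n-1).
uint8_t FactRec(uint8_t n){
  if(n == 1){
    return 1;               // end condition
  }
  return (uint8_t)(n * FactRec((uint8_t)(n - 1)));
}
```

Each call of FactRec holds its own copy of n until the deeper call returns, which is the role played by psha and pulb in the assembly version.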
Table 5.1 shows the execution time in cycles for these two assembly implementations. Notice that the recursive implementation is slightly shorter, but the execution speed is slightly slower.
Table 5.1 Execution times in cycles for Program 5.11 (including bsr).
Input   9S12 iterative   9S12 recursive
1       16               13
2       30               35
3       44               59
4       58               79
5       72               101
Checkpoint 5.16: How many stack bytes are required for each instantiation of Fact? How much stack space is required to execute Fact(5)?
Example 5.4 You are given a subroutine, OutChar, that outputs one ASCII character. Design a function that outputs a 16-bit unsigned integer.
Solution We will solve this two ways, iteratively and recursively. As always, we ask “what is our starting point?”, “how do we make progress?”, and “when are we done?” The input, N, is a 16-bit unsigned number (in RegD), and we are done when one to five ASCII characters are displayed, representing the value of N. Figure 5.10 demonstrates the successive refinement approach to solving this problem iteratively. The iterative solution has three phases: initialization, creation of digits, and output of the ASCII characters. The digits are created from the remainders occurring by dividing the input, N, by 10. To get all the digits we divide by 10 until the quotient is 0. Because the digits are created in the opposite order, each digit will be pushed on the stack during the creation phase, and pulled off the stack during the output stage. The counter is needed so the output stage knows how many digits to pull from the stack.
Figure 5.10 Successive refinement method for the iterative approach.
[Figure 5.10 flowchart, refined in stages. Initialize: Cnt = 0. Create digits: N = N/10, R = remainder; push R; Cnt++; repeat while N != 0. Output digits: pull R; ch = R+$30; OutChar(ch); Cnt--; repeat while Cnt != 0.]
Figure 5.11 demonstrates the successive refinement approach to solving this problem recursively. Most recursive functions first check for the end condition. If the end condition is true, it handles the simple case directly. If the end condition is not true, it simplifies the problem (in this case N = N/10) and calls itself. Just like the iterative solution, the digits (calculated as R) are calculated in reverse order and the stack is used to save the intermediate results, so the digits are displayed in proper order.
Figure 5.11 Successive refinement method for the recursive approach.
[Figure 5.11 flowchart, refined in stages. Test the end condition (N<10). If N>=10, set up to call itself: N = N/10 (rest of it), R = remainder (units digit), then OutDec(N). Handle the end condition by outputting the units digit: ch = R+$30, OutChar(ch).]
Program 5.12 shows two implementations of this 16-bit output decimal function. The iterative solution actually has two loops; the first loop determines the digits in opposite order, and the second loop outputs the digits in proper order. The recursive solution, first presented as Program 3.12, also uses the stack to calculate the least significant digit first, but outputs the most significant digit first. There is no fundamental rule that states which is better, iteration or recursion. A good programmer has both in her toolbox and uses whichever is easier to understand and easier to debug.
Program 5.12 Iterative and recursive implementations of output decimal.
; iterative method
; Reg D is input, n
OutUDec ldy  #0       ;RegY= cnt
ODloop  ldx  #10      ;RegD= n
        idiv          ;RegB= R, digit
        pshb          ;Save for later
        iny           ;cnt++
        xgdx          ;RegD, n=n/10
        cpd  #0       ;Continue until
        bne  ODloop
ODout   pula          ;next digit
        adda #'0'     ;convert ASCII
        jsr  OutCh
        dbne Y,ODout
        rts
; recursive method
; Reg D is input, n
OutUDec cpd  #10      ;end condition
        blo  end
        ldx  #10      ;RegD n
        idiv          ;RegX n= n/10
        pshb          ;RegB R= n%10
        xgdx          ;RegD n= n/10
        bsr  OutUDec
        pulb          ;RegB R= n%10
end     tba
        adda #'0'     ;convert ASCII
        jsr  OutCh
out     rts
// iterative method
void OutUDec(unsigned short n){
  unsigned cnt=0;
  unsigned char buffer[5];
  do{
    buffer[cnt] = n%10; // digit
    n = n/10;
    cnt++;
  } while(n); // repeat until n==0
  for(; cnt; cnt--){
    OutCh(buffer[cnt-1]+'0');
  }
}
// recursive method
void OutUDec(unsigned short n){
  if(n >= 10){
    OutUDec(n/10); // ms digits
    n = n%10; // n is 0-9
  }
  OutChar(n+'0');
}
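To try both approaches on a desktop compiler, the output routine can be replaced with a stand-in that collects characters in memory. The sketch below is an assumption-laden test harness, not part of the original program: the capture buffer, OutReset, OutText, and the names OutUDecIter and OutUDecRec are hypothetical, introduced so that the two versions can coexist in one file.

```c
#include <stdint.h>

char Out[8];       // hypothetical capture buffer standing in for the display
unsigned OutLen;   // number of characters captured so far

// Stand-in for the OutChar subroutine: append one character to the buffer.
void OutChar(char ch){
  if(OutLen < sizeof(Out) - 1){
    Out[OutLen++] = ch;
    Out[OutLen] = 0;  // keep the buffer null-terminated
  }
}
void OutReset(void){ OutLen = 0; Out[0] = 0; }
const char *OutText(void){ return Out; }

// Iterative: create digits least significant first, then output in reverse.
void OutUDecIter(uint16_t n){
  unsigned cnt = 0;
  unsigned char buffer[5];  // a 16-bit value has at most 5 decimal digits
  do{
    buffer[cnt] = (unsigned char)(n % 10);
    n = n / 10;
    cnt++;
  } while(n);
  for(; cnt; cnt--){
    OutChar((char)(buffer[cnt-1] + '0'));
  }
}

// Recursive: output the most significant digits first, then the units digit.
void OutUDecRec(uint16_t n){
  if(n >= 10){
    OutUDecRec(n / 10);
    n = n % 10;
  }
  OutChar((char)(n + '0'));
}
```

Calling OutReset() and then either function with 12345 leaves the text 12345 in the buffer; an input of 0 produces the single character 0 in both versions, because each emits at least one digit.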
To illustrate the execution of the recursive implementation of OutUDec, we can place a ScanPoint on the first line and observe the stack and Register D, see Table 5.2. In this way, we can observe each instantiation. Let the initial input be 12345.
Table 5.2 ScanPoint results for recursive version of Program 5.12.
Observation: In general, recursive algorithms are shorter to write, but require additional stack space.
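The same observation can be made on a host computer. The following is a minimal sketch, not the TExaS ScanPoint mechanism: it assumes a hypothetical log array Seen that records n at the first line of each instantiation, and a hypothetical OutCh that captures characters in a buffer instead of sending them to a display.

```c
#include <assert.h>
#include <string.h>

static char Out[8];              /* hypothetical capture buffer standing in for the display */
static unsigned OutLen;
static unsigned short Seen[5];   /* value of n at each entry; a 16-bit input has at most 5 digits */
static unsigned Depth;           /* number of instantiations so far */

void Trace_Reset(void){
  memset(Out, 0, sizeof(Out));
  OutLen = 0;
  Depth = 0;
}

static void OutCh(char ch){
  assert(OutLen < sizeof(Out)-1);  /* leave room for the terminating zero */
  Out[OutLen++] = ch;
}

void OutUDec(unsigned short n){
  Seen[Depth++] = n;             /* plays the role of a scanpoint on the first line */
  if(n >= 10){
    OutUDec(n/10);               /* ms digits */
    n = n%10;                    /* n is 0-9 */
  }
  OutCh((char)(n+'0'));
}
```

Calling Trace_Reset() and then OutUDec(12345) leaves 12345, 1234, 123, 12, 1 in Seen, one entry per instantiation, and the string "12345" in Out. The five entries make the stack cost of the recursion visible: one instantiation per digit.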
5.5 Writing Quality Software
5.5.1 Assembly Language Style Guidelines
The objective of this section is to present style rules when developing assembly language. This set of rules is meant to guide, not control. In other words, they serve as general guidelines rather than fundamental law. Choosing names for variables and functions involves creative thought, and it is intimately connected to how we feel about ourselves as programmers. Of the policies presented in this section, naming conventions may be the hardest habit for us to break. The difficulty is that there are many conventions that satisfy the “easy to understand” objective. Good names reduce the need for documentation. Poor names promote confusion, ambiguity, and mistakes. Poor names can occur because code has been copied from a different situation and inserted into our system without proper integration (i.e., changing the names to be consistent with the new situation). They can also occur in the cluttered mind of a second-rate programmer, who hurries to deliver software before it is finished.
5.5.5.1 Names Should Have Meaning.
If we observe a name away from the place where it is defined, the meaning of the object should be obvious. The object TxFifo is clearly the transmit first in first out circular queue. The function LCD_OutString will output a string to the LCD display.
5.5.5.2 Avoid Ambiguities.
Don’t use variable names in our system that are vague or have more than one meaning. For example, it is vague to use temp, because there are many possibilities for temporary data; in fact, it might even mean temperature. Don’t use two names that look similar but have different meanings.
5.5.5.3 Give Hints About the Type.
We can further clarify the meaning of a variable by including phrases in the variable name that specify its type. For example, dataPt, timePt, and putPt are pointers. Similarly, voltageBuf, timeBuf, and pressureBuf are data buffers. Other good phrases include Flag, Mode, U, L, Index, and Cnt, which refer to a boolean flag, system state, unsigned 16-bit, signed 32-bit, index into an array, and a counter, respectively.
5.5.5.4 Use the Same Name to Refer to the Same Type of Object.
For example, everywhere we need a local variable to store an ASCII character we could use the name letter. Another common example is to use the names i, j, and k for indices into arrays. The names V1 and R1 might refer to a voltage and a resistance. The exact correspondence is not part of the policies presented in this section, just the fact that a correspondence should exist. Once another programmer learns which names we use for which types of object, understanding our code becomes easier.
5.5.5.5 Use a Prefix to Identify Public Objects.
An underline character will separate the module name from the function name. Public objects have the underline and private objects do not. As an exception to this rule, we can use the underline to delimit words in an all upper-case name (e.g., MIN_PRESSURE equ 10). Functions that can be accessed outside the scope of a module (i.e., public) will begin with a prefix specifying the module to which they belong. It is poor style to create public variables, but if they need to exist, they too would begin with the module prefix. The prefix matches the module name containing the object. For example, if we see a function call, jsr LCD_OutString, we know the public function belongs to the LCD module. Notice the similarity between this syntax (e.g., LCD_Init) and the corresponding syntax we would use if programming the module as a class in C++ (e.g., LCD.Init()). Using this convention, we can easily distinguish public and private objects.
5.5.5.6 Use Upper and Lower Case to Specify the Scope of an Object.
We will define I/O ports and constants using no lower-case letters, like typing with capslock on. In other words, names without lower-case letters refer to objects with fixed values. TRUE, FALSE, and NULL are good examples of fixed-valued objects. As mentioned earlier, constant names formed from multiple words will use an underline character to delimit the individual words, e.g., MAX_VOLTAGE, UPPER_BOUND, and FIFO_SIZE. Global objects will begin with a capital letter, but include some lower-case letters. Local variables will begin with a lower-case letter, and may or may not include upper-case letters. Since all functions are global, we can start function names with either an upper-case or lower-case letter. Using this convention, we can distinguish constants, globals and locals. An object’s properties (public/private, local/global, constant/variable) are always perfectly clear at the place where the object is defined. The importance of the naming policy is to extend that clarity also to the places where the object is used.
5.5.5.7 Use Capitalization to Delimit Words.
Names that contain multiple words should be defined using a capital letter to signify the first letter of each word. Recall that the case of the first letter specifies whether the object is local or global. Some programmers use the underline as a word-delimiter, but except for constants, we will reserve the underline to separate the module name from the name of a public object. Table 5.3 overviews the naming convention presented in this section.
Table 5.3 Examples of names.
Type                      Examples
constants
local variables
private global variable
public global variable
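The naming rules of this section carry over to C. The following is a minimal sketch in which every name is hypothetical: a made-up Pressure module with two constants, one private global, one public global (shown only to illustrate the prefix), one private function, and two public functions.

```c
#include <assert.h>

#define MAX_PRESSURE 100         /* constant: no lower-case letters, underline delimits words */
#define MIN_PRESSURE 10

static unsigned short ErrorCnt;  /* private global: begins with a capital letter, no underline */
unsigned short Pressure_Last;    /* public global: module prefix, then underline (poor style) */

/* private function: no module prefix, so it is not called from outside this file */
static int inRange(unsigned short pressure){
  return (pressure >= MIN_PRESSURE) && (pressure <= MAX_PRESSURE);
}

/* public function: the prefix tells the reader it belongs to the Pressure module */
int Pressure_Record(unsigned short newPressure){
  int okFlag;                    /* local: begins with a lower-case letter */
  assert(MIN_PRESSURE < MAX_PRESSURE);
  okFlag = inRange(newPressure);
  if(okFlag){
    Pressure_Last = newPressure;
  } else{
    ErrorCnt++;
  }
  return okFlag;
}

/* public access to the private counter */
unsigned short Pressure_ErrorCnt(void){
  return ErrorCnt;
}
```

A reader who sees Pressure_Record(50) in another file knows from the name alone that it is a public function of the Pressure module, and a reader who sees ErrorCnt knows it is a global that is private to its file.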
Checkpoint 5.17: Just by looking at its name, how can you tell if a function is private or public?
Checkpoint 5.18: Just by looking at its name, how can you tell if a variable is local or global?
5.5.5.8 The Single Entry Point is at the Top.
In assembly language, we place a single entry point of a subroutine at the first line of the code. This guarantees that registers will be saved and local variables will be properly allocated on the stack. By default, C functions have a single entry point. Placing the entry point at the top provides a visual marker for the beginning of the subroutine.
5.5.5.9 The Single Exit Point is at the Bottom.
Program 5.13 Examples that use comments to delineate its beginning and end.
Most programmers prefer to use a single exit point as the last line of the subroutine. Some programmers employ multiple exit points for efficiency reasons. In general, we must guarantee the registers, stack, and return parameters are at a similar and consistent state for each exit point. In particular, we must deallocate local variables properly. If you do employ multiple exit points, then you should develop a means to visually delineate where one subroutine ends and the next one starts. You could use one line of comments to signify the start of a subroutine and a different line of comments to show the end of it. Program 5.13 employs distinct visual markers to show the beginning and end of the subroutine.
;***************Abs************
; Input: RegA is signed 8-bit
; Output: Reg A is absolute value
Abs:    tsta          ; already positive?
        bpl  ok
        nega
ok      rts           ;0 to 128
; ......... end of Abs .......
//**************Abs************
// Input: signed 8-bit
// Output: absolute value
unsigned char Abs(char n){
  if(n<0){
    n = -n;
  }           // 0 to 128
  return (unsigned char) n;
}
Observation: Having the first and last lines of a subroutine be the entry and exit points makes it easier to debug, because it will be easy to place debugging instruments (like breakpoints).
Common error: If you place a debugging breakpoint on the last rts of a subroutine with multiple exit points, then sometimes the subroutine will return without generating the break.
5.5.5.10 Write Structured Programs.
A structured program is one that adheres to a strict list of program structures, previously defined in Section 1.4 and further elaborated in Section 5.2. When we program in C (with the exception of goto, which by the way you should never use) we are forced to write structured programs due to the syntax of the language. One technique for writing structured assembly language is to adhere to the program structures shown in Figure 1.6. In other words, restrict the assembly language branching to configurations that mimic the software behavior of if, if-else, do-while, while, for, and switch. Structured programs are much easier to debug, because execution proceeds only through a limited number of well-defined pathways. When we use well-understood assembly branching structures, then our debugging can focus more on the overall function and less on how the details are implemented.
5.5.5.11 The Registers Must Be Saved.
When working on a software team it is important to establish a rule whether or not subroutines will save/restore registers. Establishing this convention is especially important when a mixture of assembly and high-level language is being used, or if the software project remains active for long periods of time. It is safest to save and restore the registers that are modified, with the exception of the CCR (most programmers do not save/restore it) and any register used to return an output parameter. Exceptions to this rule can be made for those portions of the code where speed is most critical.
Common error: If the calling routine expects a subroutine to save/restore registers, and it doesn’t, then information will be lost.
Observation: If the calling routine does not expect a subroutine to save/restore registers, and it does, then the system executes a little slower and the object code is a little bigger than it could be.
Common error: When a mixture of C and assembly language programs are integrated, then an error may occur when the compiler is upgraded because there may be a change in whether registers are saved/restored, or how parameters are passed.
5.5.5.12 Use High-Level Languages Whenever Possible.
It may seem odd to have a rule about high-level languages in a section about assembly language programming. It is even odder to make this statement in a book devoted to assembly language programming. In general, we should use high-level languages when memory space and execution speed are less important than portability and maintenance. When execution speed is important, you could write the first version in a high-level language, run a profiler (that will tell you which parts of your program are executed the most), then optimize the sections of code using up the most execution time by writing them in assembly language. If a C language implementation just doesn’t run fast enough, you could consider a more powerful compiler or a faster microcomputer.
Observation: High-level language programmers who are well acquainted with the underlying assembly language of the machine have a better understanding of how their machine and software work.
5.5.5.13 Minimize Conditional Branching.
Every time software makes a conditional branch, there are two possible outcomes that must be tested (branch or not branch.) In the example shown in Program 5.14, assume we wish to set a software Flag if Port T bit 5 is true. A flag will be true if it is any nonzero value, and false if it is zero. A conditional branch could be avoided by solving the problem in another way.
Program 5.14 Sometimes we can remove a conditional branch and simplify the program.
; uses conditional branch
SetFlag brset PTT,#$20,set
        clr   Flag        ;PT5 is 0
        bra   done
set     movb  #$FF,Flag   ;PT5 is 1
done    rts
; no conditional branch
SetFlag ldaa  PTT
        anda  #$20        ;PT5
        staa  Flag        ;0 or $20
        rts
// uses conditional branch
void SetFlag(void){
  if(PTT&0x20){
    Flag = 0xFF;   // PT5 is 1
  }
  else{
    Flag = 0;      // PT5 is 0
  }
}
// no conditional branch
void SetFlag(void){
  Flag = PTT&0x20; // 0 or $20
}
Observation: Software can be made easier to understand by reworking the approach in order to reduce the number of conditional branches.
Checkpoint 5.19: If a system has 20 conditional branches, how many potential execution paths might there be through the software?
5.5.2 Comments
Discussion about comments was left for last, because they are the least important aspect involved in writing quality software. It is much better to write well-organized software with simple interfaces having operations so easy to understand that comments are not necessary. The beginning of every file should include the file name, purpose, hardware connections, programmer, date, and copyright. E.g.,
; filename adtest.rtf
; Test of 9S12DP512 10-bit ADC
; 1 Hz sampling and output to the serial port
; Last modified 8/14/08 by Jonathan W. Valvano
; Copyright 2008 by Jonathan W. Valvano, [email protected]
; You may use, edit, run or distribute this file
; as long as the above copyright notice remains
The beginning of every function should include a line delimiting the start of the function, purpose, input parameters, output parameters, and special conditions that apply. The comments at the beginning of the function explain the policies (e.g., how to use the function.)
These comments, which are similar to the comments for the prototypes in a header file, are intended to be read by the client. E.g.,
;............................SCI_InUDec...........................
; Accepts ASCII input from the SCI in unsigned decimal format
; and converts to a 16-bit unsigned number with a maximum of 65535
; If a number is above 65535, it truncates without reporting the error
; Backspace will remove last digit typed
; Inputs: none
; Outputs: Register D is the unsigned 16-bit value
Comments can be added to a variable or constant definition to clarify the usage. In particular, comments can specify the units of the variable or constant. For complicated situations, we can use additional lines and include examples. E.g.,
V1        rmb 2 ; voltage at node 1 in mV, range -5000 mV to +5000 mV
Fs        rmb 2 ; sampling rate in Hz
FoundFlag rmb 1 ; 0 if keyword not yet found, 1 if found
RunMode   rmb 1 ; 0, 1, 2, or 3 specifies system mode
; 0 means idle
; 1 means startup
; 2 means active run
; 3 means stopped
Comments can be used to describe complex algorithms. These types of comments are intended to be read by our coworkers. The purpose of these comments is to assist in changing the code in the future, or applying this code into a similar but slightly different application. Comments that restate the function provide no additional information, and actually make the code harder to read. Examples of bad comments include:
      inc time ; add one to time
      clr mode ; set mode to zero
Good comments explain why the operation is performed, and what it means:
      inc time ; maintain elapsed time in msec
      clr mode ; switch to idle mode because no more data is available
We can add spaces so the comment fields line up. We should avoid tabs because they often do not translate well from one computer to another. In this way, the software is on the left and the comments can be read on the right.
I taught a large programming class one semester, and being an arrogant and lazy fellow, I thought I could write a grading program that accepts the students’ programming assignments and automatically generates and records their grades. (The second step would have been to write a self-study book, then I could teach the masses without ever having to show up for work.) My grading program worked OK for the functional aspects of the students’ software. My program generated inputs, called the students’ program and compared the results with expected behavior. Where I utterly failed was in my attempts to automatically grade their software on style. I used the following three-part “quality” statistic. First, I measured the execution speed of the student’s software, $s_i$. Smaller times represent improved dynamic efficiency. Next, I measured the number of bytes in the object code, $b_i$. Again, a smaller number represents better static efficiency. Third, I used the number of ASCII characters in the source code, $c_i$, as a quantitative measure of documentation. For this parameter, bigger is better. In a typical statistical fashion, I used the average (written with a bar) and standard deviation ($\sigma$) of each measure to calculate the quality statistic
$$\text{Quality} = \frac{\bar{s} - s_i}{\sigma_s} + \frac{\bar{b} - b_i}{\sigma_b} + \frac{c_i - \bar{c}}{\sigma_c}$$
Halfway through the semester, I happened to look at some assignments and was horrified to find the all-time worst software ever written from both a style and content basis. To improve speed and reduce size, the students cut so many corners that their code didn’t really work anymore; it just appeared to work to my grading program. Then they took the ugly mess and filled it with nonsense comments, giving it the appearance of extensive documentation. To my students in that class that semester, I sincerely apologize. We should write comments for coworkers who must change our software, or clients who will use our software.
5.5.3 Inappropriate I/O and Portability
One of the biggest mistakes beginning programmers make is the inappropriate usage of I/O calls (e.g., screen output and keyboard input). An explanation for their foolish behavior is that they haven’t had the experience yet of trying to reuse software they have written for one project in another project. Software portability is diminished when it is littered with user input/output. To reuse software with user I/O in another situation, you will almost certainly have to remove the input/output statements. In general, we avoid interactive I/O at the lowest levels of the hierarchy, and instead return data and flags and let the higher-level program do the interactive I/O. Often we add keyboard input and screen output calls when testing our software. It is important to remove the I/O that is not directly necessary as part of the module function. This allows you to reuse these functions in situations where screen output is not available or appropriate. Obviously screen output is allowed if that is the purpose of the routine.
Common Error: Performing unnecessary I/O in a subroutine makes it harder to reuse at a later time.
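The split between a low-level routine that returns data and a flag and a higher-level routine that does the interactive I/O might look like the following. This is a minimal sketch; the names Average and Report are hypothetical, and printf stands in for whatever output device the higher level happens to have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Low level: no screen or keyboard calls, so it can be reused in any project.
   Returns 1 and stores the average if there is data, returns 0 otherwise. */
int Average(const short *dataPt, unsigned short size, short *resultPt){
  long sum = 0;
  unsigned short i;
  assert(resultPt != NULL);      /* a missing result pointer is a programming error */
  if((dataPt == NULL) || (size == 0)){
    return 0;                    /* flag: no result available */
  }
  for(i = 0; i < size; i++){
    sum += dataPt[i];
  }
  *resultPt = (short)(sum/size);
  return 1;                      /* flag: result is valid */
}

/* High level: the only place that performs user output */
void Report(const short *dataPt, unsigned short size){
  short avg;
  if(Average(dataPt, size, &avg)){
    printf("average = %d\n", avg);
  } else{
    printf("no data\n");
  }
}
```

If the next project has no screen, only Report needs to be replaced; Average moves over unchanged.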
5.6 *How Assemblers Work
Assemblers are development tools that process assembly language source program statements and translate them into executable machine language object files. The symbolic language used to code source programs to be processed by the assembler is called assembly language. The language is a collection of mnemonic symbols representing: operations (i.e., machine instruction mnemonics or directives to the assembler), symbolic names, operators, and special symbols. The assembly language provides mnemonic operation codes for all machine instructions in the instruction set. The assembly language also contains mnemonic directives that specify auxiliary actions to be performed by the assembler. These directives or pseudo-ops are not always translated into machine language.
Most assemblers require two passes. During the first pass, the source program is analyzed in order to develop the symbol table. A symbol table is a mapping between symbolic names (e.g., PTT) and their numeric values (e.g., $0240). During the second pass, the object file is created (assembled) using the symbol table developed in pass one. It is during the second pass that the source program listing is also produced. The symbol table is recreated in the second pass. A phasing error occurs if the symbol table values calculated in the two passes are different. Errors that occur during the assembly process (e.g., undefined symbol, illegal op code, branch destination too far, etc.) are explained in the listing file.
The source code is a file of characters usually created with an editor. Each line within the source code is processed completely before the next line is read. As each line is processed, the assembler examines the label, operation code, and operand fields. The operation code table is scanned for a match with a known opcode. During the processing of a standard operation code mnemonic, the standard machine code is inserted into the object file.
If an assembler directive is being processed, the proper action is taken. Any errors that are detected by the assembler are displayed after the actual line containing the error is printed. Object code is the binary values (instructions and data) that, when executed by the computer, perform the intended function. The listing file contains the address, object code, and a copy of the source code. The listing file also provides a symbol table describing where in memory the program and data will be loaded. The symbol table is a list of all the names used in the program along with the values. A symbol is created when you put a label starting in column 1. The symbol table value for this type is the absolute memory address where the instruction, variable or constant will reside in memory. The second type of label is created by the equ pseudo-op, e.g.,
PTT   equ $0240
The value for this type of symbol is simply the number specified in the operand field. When the assembler processes an instruction with a symbol in it, it simply substitutes the fixed value in place of the symbol. Therefore we will use symbols to clarify (make it easier to understand) our programs. A compiler converts high-level language source code into object code. A cross-compiler also converts source code into object code and creates a listing file, except that the object code is created for a target machine that is different from the machine running the cross-compiler. TExaS is a cross-assembler because it runs on an Intel computer but creates 9S12 object code. Metrowerks CodeWarrior includes both a cross-assembler and a cross-compiler because it runs on the Windows PC and creates 9S12 object code.
Checkpoint 5.20: What does the assembler do in pass 1?
Checkpoint 5.21: What does the assembler do in pass 2?
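The two-pass idea can be sketched in a few lines of C. This is a toy model under loudly stated assumptions, not how TExaS or any real assembler is written: source lines are already split into label, operation, and operand fields; equ and org create no object code; fdb is 2 bytes; and every instruction is assumed to be 3 bytes, which is not the real 9S12 encoding. All names (Line, Symbol, Asm_Pass1, Asm_Pass2, Asm_Lookup) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { const char *label, *op, *operand; } Line;    /* one source line */
typedef struct { const char *name; unsigned value; } Symbol;  /* one symbol table entry */

#define MAX_SYMBOLS 16
static Symbol Table[MAX_SYMBOLS];
static unsigned SymbolCnt;

/* toy size rule (assumption): pseudo-ops equ/org are 0 bytes, fdb is 2, instructions are 3 */
static unsigned sizeOf(const char *op){
  if((strcmp(op, "equ") == 0) || (strcmp(op, "org") == 0)) return 0;
  if(strcmp(op, "fdb") == 0) return 2;
  return 3;
}

/* "$0240" is hexadecimal, "10" is decimal */
static unsigned number(const char *text){
  if(text[0] == '$') return (unsigned)strtoul(text+1, NULL, 16);
  return (unsigned)strtoul(text, NULL, 10);
}

/* returns 1 and stores the value if the symbol is defined, 0 if undefined */
int Asm_Lookup(const char *name, unsigned *valuePt){
  unsigned i;
  for(i = 0; i < SymbolCnt; i++){
    if(strcmp(Table[i].name, name) == 0){
      *valuePt = Table[i].value;
      return 1;
    }
  }
  return 0;
}

/* pass 1: track the location counter and build the symbol table */
void Asm_Pass1(const Line *src, unsigned n){
  unsigned i, pc = 0;
  SymbolCnt = 0;
  for(i = 0; i < n; i++){
    int isEqu = (strcmp(src[i].op, "equ") == 0);
    if(strcmp(src[i].op, "org") == 0) pc = number(src[i].operand);
    if(src[i].label != NULL){
      assert(SymbolCnt < MAX_SYMBOLS);
      Table[SymbolCnt].name = src[i].label;
      Table[SymbolCnt].value = isEqu ? number(src[i].operand) : pc;
      SymbolCnt++;
    }
    pc += sizeOf(src[i].op);
  }
}

/* pass 2: substitute each symbolic operand with its value;
   returns the number of undefined symbols */
unsigned Asm_Pass2(const Line *src, unsigned n, unsigned *operandValue){
  unsigned i, errors = 0;
  for(i = 0; i < n; i++){
    const char *t = src[i].operand;
    operandValue[i] = 0;
    if(t == NULL) continue;
    if((t[0] == '$') || ((t[0] >= '0') && (t[0] <= '9'))){
      operandValue[i] = number(t);
    } else if(!Asm_Lookup(t, &operandValue[i])){
      errors++;
    }
  }
  return errors;
}
```

Pass 1 must finish before pass 2 starts because an instruction may refer to a label defined later in the file (a forward reference); only after the whole file has been scanned is every symbol value known.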
5.7 Functional Debugging
Functional debugging involves the verification of input/output parameters. It is a static process where inputs are supplied, the system is run, and the outputs are compared against the expected results. Four methods of functional debugging are presented in this section, and two more functional debugging methods are presented in the next chapter after indexed addressing mode is presented.
5.7.1 Stabilization
The first step of debugging is to stabilize the system. In the debugging context, we stabilize the problem by creating a test routine that fixes (or stabilizes) all the inputs. In this way, we can reproduce the exact inputs over and over again. Once stabilized, if we modify the program, we are sure that the change in our outputs is a function of the modification we made in our software and not due to a change in the input parameters. When a system has a small number of possible inputs (e.g., less than a million), it makes sense to test them all. When the number of possible inputs is large we need to choose a set of inputs. There are many ways to make this choice. We can select values:
• Near the extremes and in the middle
• Most typical of how our clients will properly use the system
• Most typical of how our clients will improperly use the system
• That differ by one
• You know your system will find difficult
• Using a random number generator
To stabilize the system we define a fixed set of inputs to test, run the system on these inputs, and record the outputs. Debugging is a process of finding patterns in the differences between recorded behavior and expected results. The advantage of modular programming is that we can perform modular debugging. We make a list of modules that might be causing the bug. We can then create new test routines to stabilize these modules and debug them one at a time. Unfortunately, sometimes all the modules seem to work, but the combination of modules does not. In this case we study the interfaces between the modules, looking for intended and unintended (e.g., unfriendly code) interactions.
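As an illustration of a stabilized test, the Abs function of Program 5.13 has only 256 possible inputs, so it makes sense to test them all. The following is a minimal sketch: Test_Abs is a hypothetical test routine, and the parameter of Abs is declared signed char here (an assumption) so the test behaves the same on compilers where plain char is unsigned.

```c
#include <assert.h>

/* function under test, after Program 5.13 */
unsigned char Abs(signed char n){
  if(n < 0){
    return (unsigned char)(-n);  /* 1 to 128 */
  }
  return (unsigned char)n;       /* 0 to 127 */
}

/* Stabilized test: the inputs are fixed (all 256 values), so every run is repeatable.
   Returns the number of inputs whose output differs from the expected result. */
unsigned Test_Abs(void){
  int in;
  unsigned failCnt = 0;
  for(in = -128; in <= 127; in++){
    unsigned expected = (in < 0) ? (unsigned)(-in) : (unsigned)in;
    if(Abs((signed char)in) != expected){
      failCnt++;
    }
  }
  return failCnt;
}
```

Because the input set never changes, a nonzero return value after an edit to Abs can only be due to the edit. A caller could simply check assert(Test_Abs() == 0) after each modification.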
5.7.2 Single Stepping
Many debuggers allow you to set the program counter to a specific address then execute one instruction at a time. The TExaS simulator provides four stepping commands: Step, Few, StepOver, and StepOut. Action->Step is the usual execute one assembly instruction. Action->Few will execute some instructions and stop (you can set how many “some” is). Action->StepOver will execute one assembly instruction, unless that instruction is a subroutine call, in which case the simulator will execute the entire subroutine and stop at the instruction following the subroutine call. Action->StepOut assumes the execution has already entered a subroutine and will finish execution of the subroutine and stop at the instruction following the subroutine call.
5.7.3 Breakpoints Without Filtering
A breakpoint is a mechanism to tag places in our software, which when executed will cause the software to stop. A scanpoint is similar to a breakpoint in that we place them at strategic places in our software. When that address is encountered, information is logged in a debugging file and the software continues to run. In TExaS, you can break/scan on any address.
5.7.4 Conditional Breakpoints
One of the problems with breakpoints is that sometimes we have to observe many breakpoints before the error occurs. One way to deal with this problem is the conditional breakpoint. To illustrate the implementation of conditional breakpoints, add a global variable called Count and initialize it to 32 in the ritual. Add the following conditional breakpoint to the appropriate location in your software. Using the debugger, set a regular breakpoint at bkpt, and run the system again (you can change the 32 to match the situation that causes the error).
// C
if(--Count==0) bkpt
; assembly
      dec  Count
      bne  skip
bkpt  nop
skip
Notice that the breakpoint occurs only on the 32nd time the break is encountered. Any appropriate condition can be substituted.
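The same idea can be tried on a host computer. The following is a minimal sketch with hypothetical names: bkpt is an ordinary function on which a regular breakpoint would be set, Process stands for the code being debugged, and BreakHits exists only so a test can confirm when bkpt was reached.

```c
#include <assert.h>

static unsigned char Count = 32; /* initialized in the ritual */
static unsigned BreakHits;       /* hypothetical: counts how often bkpt was reached */

static void bkpt(void){
  BreakHits++;                   /* set a regular debugger breakpoint on this line */
}

/* stands for the place in the software where the conditional breakpoint is added */
void Process(void){
  /* ... code being debugged ... */
  if(--Count == 0){
    bkpt();
  }
}

unsigned Process_BreakHits(void){
  return BreakHits;
}
```

With Count starting at 32, the first 31 calls to Process pass straight through and the 32nd reaches bkpt. Because Count is an 8-bit variable in this sketch, it wraps after reaching zero, so the next hit would come 256 calls later.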
5.7.5 Instrumentation: Print Statements
The use of print statements is a popular and effective means for functional debugging. One difficulty with print statements in embedded systems is that a standard “printer” may not be available. Another problem with printing is that most embedded systems involve time-dependent interactions with their external environment. The print statement itself may be so slow that the debugging process itself causes the system to fail. In this regard, the print statement is intrusive. Therefore, throughout this book we will utilize debugging methods that do not rely on the availability of a standard output device.
5.8 Tutorial 5a. Editing and Assembling
To illustrate the assembly process, we will hand assemble the software given in Program T5a.1.
Program T5a.1 Simple program used to study the assembly process.
PTT   equ
DDRT  equ
      org
main  lds
      bsr
loop  staa
      inca
      bra
init  ldaa
      staa
      clra
      rts
      org
      fdb
During pass 1, we need to create the symbol table. The symbol table is a mapping between symbols and their values. In this particular example, the symbol table values of PTT and DDRT are the addresses specified in the operand fields. The values of main, loop, and init will be the memory addresses of the corresponding lines. The org pseudo-op specifies the starting address of the subsequent lines. Until there is another org pseudo-op, the instructions will be contiguously allocated in order one after another. To determine the line addresses, we need to know the size of each line. The first step is to determine the addressing modes. Recall that direct addressing can be used when accessing information at addresses 0 to $00FF, while extended addressing is required for the other locations.
Question 5a.1 Determine the addressing mode for each instruction in Program T5a.1.
Question 5a.2 Determine the size of each instruction in Program T5a.1.
We can find the size of an instruction in the Freescale instruction manual. At this point, all we need to know is the number of bytes required to encode each instruction for that particular addressing mode. Some pseudo-ops, like equ and org, do not create object code, so these lines do not have a size. On the other hand, other pseudo-ops, like fdb, fcb, and fcc, do create object code, so they will have sizes.
Question 5a.3 Finish pass 1 by creating the symbol table. The value for the lines with the equ pseudo-op is simply the value of the operand field. The value of the other lines is the address of that line.
During pass 2, we create the object code for each line.
Question 5a.4 Determine the object code for each instruction in Program T5a.1.
Question 5a.5 What trouble would we have during pass 1 if the equ pseudo-ops were placed at the end of the program?
5.9 Tutorial 5b. Microcomputer-Based Lock
To illustrate the software development process, we will implement a simple digital lock. The lock system has seven SPST switches and a solenoid as shown in Figure T5b.1. The maximum output current for a 7406 is 40 mA, so this circuit will work for 5 V solenoids with an internal impedance greater than 125 Ω (5 V/40 mA). If the 7-bit binary pattern on Port T bits 6 to 0 becomes 0100011 for at least 1 ms, then the solenoid will activate. The 1-ms delay will compensate for the switch bounce. For information on switches see Section 2.8 and for information on solenoids see Chapter 10. All we really need to understand is that Port T bits 6 to 0 are input signals to the computer and Port T bit 7 is an output signal. Before we write assembly code, we need to develop a software plan. Software development is an iterative process. Even though we list the steps of the development process in 1, 2, 3, ... order, in reality we iterate these steps over and over.
Figure T5b.1 Hardware configuration for a microcomputer-controlled lock. (Circuit labels: +5 V, 9S12, PT7, 7406, solenoid, 10 kΩ, PT6 to PT0.)
Action: Start a fresh copy of TExaS. Create new program, microcomputer, and I/O files from within TExaS. Save these files as Tutor5b.rtf, Tutor5b.uc, and Tutor5b.io. Execute the Mode->Processor . . . command and select the processor you wish to use. You could short-cut through this tutorial by copying the Tutor5b.rtf, Tutor5b.uc, and Tutor5b.io files from the web instead of building them up from scratch.
Question 5b.1 What are the RAM and ROM locations for your microcomputer?
Action: Type the following assembly code into the Tutor5b.rtf file, replacing RAM with the first RAM address, and ROM with the first ROM address. For the 9S12 you set STCK to the last RAM address plus 1.
      org  RAM
; global variables will go here
      org  ROM
main  lds  #STCK
; program will go here
      stop
; constant data will go here
      org  $FFFE
      fdb  main
Action: We begin with a list of the inputs and outputs. We specify the range of values and their significance. In this example, we will use PTT, with bits 6-0 being inputs. The seven input signals represent an unsigned integer from 0 to 127. Port T bit 7 will be an output. If PT7 is 1 then the solenoid will activate and the door will be unlocked. Click on the Tutor5b.io window and add seven positive logic switches to PT6 to PT0, and one LED to PT7. The LED will simulate the solenoid. Figure T5b.2 shows the resulting I/O window. The switches in both Figures T5b.1 and T5b.2 are positive logic, meaning if the switch is pushed a logic high is seen at the corresponding input.
Figure T5b.2 I/O window for the microcomputer-controlled lock.
Action: Open the Port12.rtf file. Copy the PTT and DDRT lines, and paste them into your Tutor5b.rtf file.
Question 5b.2 What are the addresses of PTT and DDRT?
Action: Next, we make a list of the required data structures. Data structures are used to save information. If the data needs to be permanent, then it is allocated in global space. If the software will change its value then it will be allocated in RAM. In this example we need a 16-bit unsigned counter. Add this code to the global variable section. The rmb pseudo-op will reserve multiple bytes.
cnt   rmb  2          ; 16-bit counter
If a data structure can be defined at assembly time and will remain fixed, then it can be allocated in EEPROM. In this example, we will define an 8-bit fixed constant to hold the key code, which the operator needs to set to unlock the door. We will place these lines directly after the program so that they will be defined in ROM or EEPROM memory. The fcb pseudo-op defines an 8-bit constant. Add this code to the constant data section (after the bra loop and before the org $FFFE). This line also assigns the symbolic name key to the corresponding address of the information.
key   fcb  %00100011  ; key code
It is not really clear at this point exactly where in EEPROM this constant will be, but luckily for us, the assembler will calculate the exact address automatically. After the program is assembled, we can look at the line in the listing file or in the symbol table to see where in memory each structure is allocated.
Action: Next we develop the software algorithm, which is a sequence of operations we wish to execute. There are many approaches to describing the plan. Experienced programmers can develop the algorithm directly in assembly language. On the other hand, most of us need an abstractive method to document the desired sequence of actions. Flowcharts, pseudo-code, and high-level language code are three common descriptive formats. The TExaS application is unique in that if you draw the flowchart on the computer, you can paste it directly into the program as a comment.
There are no formal rules regarding pseudo-code, rather it is a shorthand for describing what to do and when to do it. We can place our pseudo-code as documentation into the comment fields of our program. Figure T5b.3 shows a flowchart on the left and pseudo-code and C code on the right for our digital lock example. The loop counter (400) is the number of times the loop must be executed to wait 1 ms.
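Figure T5b.3 Flowchart, pseudocode and C code for a microcomputer-controlled lock.
[Flowchart: main initializes the ports, sets Solenoid=off and cnt=400, then tests the switches. If they differ from the key, Solenoid=off and cnt=400. If they match the key, cnt=cnt-1; while cnt>0 the loop repeats, and when cnt reaches 0, Solenoid=on.]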
Pseudo Code
1) initialize ports: PT6-PT0 inputs, PT7 output
2) turn off solenoid
3) set counter to 400
4) repeat indefinitely
   if switch matches key
     a) decrement counter
     b) if counter is zero, turn on solenoid
   otherwise
     a) turn off solenoid
     b) set counter to 400
C Code
DDRT=0x80;
PTT=0;
cnt=400;
while(1){
  if((PTT&0x7F)==key){
    if((--cnt)==0) PTT |= 0x80;}
  else{
    PTT=0; cnt=400;}}
Next we write assembly code to implement the algorithm as illustrated in the above flowchart and pseudo code. In step 1), we initialize Port T so that PT7 is an output and PT6 to PT0 are inputs. ldaa #$80 staa DDRT ; PT6-PT0 input, PT7 output In step 2), we turn off the solenoid. Remember, writing to an input pin has no effect, so this operation only changes bit 7. clr PTT ; disable solenoid lock
In step 3), we initialize the counter to 400, which is the number of loops required to wait 1 ms. The 9S12 requires 20 cycles to execute the loop. ldx #400 stx cnt ; 1,000,000ns/(125*20) In step 4) we implement the indefinite loop. We place an assembly label at the program locations to which we wish to branch. The bra instruction is an unconditional branch. loop bra loop Inside the indefinite loop we test to see if the switch pattern matches the key code. In this implementation we branch to off if the switches do not match the key code. If they do match, we will execute the instruction immediately after the bne off. loop ldaa PTT ; [3] input from 7 switches anda #$7F ; [1] cmpa key ; [3] match key code? bne off ; [3] If the switches match the key code, then the 16-bit counter is decremented. ldx cnt ; [3] dex ; [1] stx cnt ; [3] If the counter becomes zero, then the door is unlocked. The bne instruction will go to loop if cnt is not equal to zero. bne loop ; [3]=20 cycles/loop ldaa #$80 staa PTT ; enable solenoid lock bra loop If the switches do not match the key code, then the solenoid is turned off and the cnt set back to 400. off ldx #400 stx cnt ; 1,000,000ns/(125*20) clr PTT ; disable solenoid lock bra loop We put the above pieces together to create the source code, as shown in Program T5b.1. The order of the instructions is very important because it determines the sequence of execution. The last two lines will define where the computer will start execution after a reset.
Program T5b.1 Lock program for Tutorial 5b.
; activate solenoid (PT7=1) if switches match key code
PTT   equ  $0240     ; PT6-PT0 switches, PT7 solenoid lock
DDRT  equ  $0242     ; specifies input or output
      org  $0800     ; RAM
cnt   rmb  2         ; 16-bit counter
      org  $4000     ; EEPROM
main  lds  #$4000
      ldaa #$80
      staa DDRT      ; PT6-PT0 input, PT7 output
      clr  PTT       ; disable solenoid lock
      ldx  #400
      stx  cnt       ; 1,000,000ns/(125*20)
loop  ldaa PTT       ; [3] input from 7 switches
      anda #$7F      ; [1]
      cmpa key       ; [3] match key code?
      bne  off       ; [3]
      ldx  cnt       ; [3]
      dex            ; [1]
      stx  cnt       ; [3]
      bne  loop      ; [3]=20 cycles/loop
; 7 switches match key code for more than 1 ms
      ldaa #$80
      staa PTT       ; enable solenoid lock
      bra  loop
off   ldx  #400
      stx  cnt       ; 1,000,000ns/(125*20)
      clr  PTT       ; disable solenoid lock
      bra  loop
key   fcb  %00100011 ; key code
      org  $FFFE
      fdb  main
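The loop count of 400 follows from the numbers quoted in the program comments: a 125 ns bus cycle and 20 cycles per pass through the loop. As a quick check, the arithmetic can be sketched in C. The function names below are illustrative only and are not part of the tutorial files.

```c
#include <assert.h>
#include <stdint.h>

/* Number of loop passes needed to wait delay_ns, given the bus cycle
   time and the cycles used by one pass (hypothetical helper). */
uint32_t LoopCount(uint32_t delay_ns, uint32_t cycle_ns,
                   uint32_t cycles_per_loop){
  return delay_ns / (cycle_ns * cycles_per_loop);
}

/* Worked example from the program comments: 1,000,000ns/(125*20). */
void LoopCount_Check(void){
  assert(LoopCount(1000000, 125, 20) == 400);
}
```

With the same bus timing, a longer hold time scales the count proportionally, as long as the result still fits in the 16-bit counter.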
Action: The last stage is debugging. You should run the system to verify its proper behavior. For a simple system like this, we could test all 128 possible input values, verifying that 0100011 is the only code that unlocks the door (turns on the LED).
Question 5b.3 What switch pattern activates the solenoid (turns on the LED)?
Question 5b.4 How do you change the program so the key is Sw6,5,4 off, Sw3,2,1,0 on?
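The exhaustive test described above can also be rehearsed on a PC before running the simulator. The following is a minimal sketch that models one pass of the lock loop on the C code of Figure T5b.3. The type Lock_t and the function names are assumptions made for this illustration; they do not appear in the tutorial files.

```c
#include <assert.h>
#include <stdint.h>

#define KEY    0x23   /* %00100011, the key code in Program T5b.1 */
#define PASSES 400    /* loop passes that make up the wait */

typedef struct { uint16_t cnt; uint8_t solenoid; } Lock_t;

void Lock_Init(Lock_t *lk){ lk->cnt = PASSES; lk->solenoid = 0; }

/* One pass of the indefinite loop for a given switch pattern. */
void Lock_Step(Lock_t *lk, uint8_t switches){
  if((switches & 0x7F) == KEY){
    if(--lk->cnt == 0) lk->solenoid = 1;  /* held long enough */
  } else {
    lk->solenoid = 0;
    lk->cnt = PASSES;
  }
}

/* Hold each of the 128 patterns for PASSES passes, count how many
   turn the solenoid on, and record the last pattern that did. */
int Lock_CountUnlocking(uint8_t *which){
  int count = 0;
  for(int sw = 0; sw < 128; sw++){
    Lock_t lk;
    Lock_Init(&lk);
    for(int i = 0; i < PASSES; i++) Lock_Step(&lk, (uint8_t)sw);
    if(lk.solenoid){ count++; *which = (uint8_t)sw; }
  }
  return count;
}

void Lock_Check(void){
  uint8_t which = 0xFF;
  assert(Lock_CountUnlocking(&which) == 1);
  assert(which == KEY);
}
```

The model deliberately ignores real time; it only confirms the logic that one pattern, held for the full count, turns the solenoid on.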
5.10 Homework Problems
Homework 5.1 Assume you have a 16-bit unsigned global variable H. Write assembly code that implements if(H > 1234)isGreater();
Homework 5.2 Assume you have a 16-bit signed global variable H. Write assembly code that implements if(H > -1234)isGreater();
Homework 5.3 Assume you have an 8-bit unsigned global variable G. Write assembly code that implements if(G < 50) isLess(); else isMore();
Homework 5.4 Assume you have a 16-bit signed global variable H. Write assembly code that implements if(H < -500) isLess(); else isMore();
Homework 5.5 Assume you have an 8-bit global variable G. Write assembly code that implements while(G&0x80)body();
Homework 5.6 Write assembly code that implements while(PTT&0x01)body();
Homework 5.7 You will write four assembly language versions of the following C code
n=100; while(n!=0){n--; body();}
a) Assume the variable n is implemented as a 16-bit global variable.
b) Assume the variable n is implemented as an 8-bit global variable.
c) Assume the variable n is implemented as a 16-bit variable in Register D.
d) Assume the variable n is implemented as an 8-bit variable in Register A.
Homework 5.8 You will write four assembly language versions of the following C code
n=0; while(n<100){n++; body();}
a) Assume the variable n is implemented as a 16-bit global variable.
b) Assume the variable n is implemented as an 8-bit global variable.
c) Assume the variable n is implemented as a 16-bit variable in Register D.
d) Assume the variable n is implemented as an 8-bit variable in Register A.
Homework 5.9 You will write three assembly language versions of the following C code
n=1000; while(n!=0){n--; body();}
a) Assume the variable n is implemented as a 16-bit global variable.
b) Assume the variable n is implemented as a 16-bit variable in Register D.
c) Assume the variable n is implemented as a 16-bit variable in Register X.
Homework 5.10 You will write three assembly language versions of the following C code
n=0; while(n<1000){n++; body();}
a) Assume the variable n is implemented as a 16-bit global variable.
b) Assume the variable n is implemented as a 16-bit variable in Register D.
c) Assume the variable n is implemented as a 16-bit variable in Register X.
Homework 5.11 There are two 16-bit unsigned variables, called Input and Output. Write assembly code that checks the Input, and if Input is less than 100, then the code sets the Output to 40. Conversely, if Input is greater than or equal to 100, then the code does not modify Output.
Homework 5.12 There are two 16-bit signed variables, called Input and Output. Write assembly code that checks the Input, and if Input is less than 100, then the code sets the Output to 200. Conversely, if Input is greater than or equal to 100, then the code does not modify Output.
Homework 5.13 Consider the following assembly language unstructured code
      ldaa PTT
      anda #$80 ;test PT7
      bne  error
      ldaa PTP
      anda #$40 ;test PP6
      beq  done
error ldaa PTH
      oraa #$20
      staa PTH  ;set PH5 if PT7==0 or PP6==1
done
a) Optimize this assembly code using instructions bset bclr brclr and brset.
b) Draw a flowchart of this unstructured code.
c) Redesign the algorithm using structured programming. In particular, draw a new flowchart that performs the same operation using structured components.
d) Rewrite the assembly code based on the structured flowchart from part c).
Homework 5.14 Assume RegA contains an ASCII character. Write assembly code that converts any lower case letters (a to z) to upper case (A to Z).
For example, if RegA is initially ‘g’, convert it to ‘G’. Leave all other characters unchanged.
Homework 5.15 Assume RegA contains an ASCII character. Write assembly code that converts any upper case letters (A to Z) to lower case (a to z). For example, if RegA is initially ‘G’, convert it to ‘g’. Leave all other characters unchanged.
Homework 5.16 Write an assembly subroutine that implements a median filter. The three 16-bit unsigned numbers are passed into the subroutine by value in Registers D, X, and Y. The median is the middle value of the three, sorted by size. The return parameter is passed back in Register D. If you need a temporary variable, you may define it in global RAM.
Homework 5.17 Write an assembly subroutine that finds the least common multiple of two numbers. The inputs are passed in as 8-bit unsigned numbers in Registers A and B. The result is returned as a 16-bit unsigned number in Register D.
Homework 5.18 You are given a stopwatch module with the following functions. Try to guess what each function does. Watch_SetTimerResolution, Watch_StartTimer, Watch_StopTimer, Watch_DisplayTime.
Homework 5.19 Write a subroutine SwapGT that takes two 8-bit unsigned inputs. The inputs are passed in using RegA and RegB. The subroutine swaps the contents of registers RegA and RegB only when (RegA) > (RegB), otherwise it does nothing. You may assume that you have access to an 8-bit variable in RAM labeled TMP that you can use for doing the swap.
Homework 5.20 Write a subroutine SwapGT that takes two 16-bit unsigned inputs. The inputs are passed in using RegX and RegY. The subroutine swaps the contents of registers RegX and RegY only when (RegX) > (RegY), otherwise it does nothing. You may assume that you have access to a 16-bit variable in RAM labeled TMP that you can use for doing the swap.
Homework 5.21 Write a main program and a subroutine Check that inputs from Port T bit 4 and outputs to Port T bit 3. The software system will continuously check the status of the input signal, setting the output high if the input ever becomes high. The system never sets the output low. Write code that is friendly. The main program first initializes the stack. Next it should make Port T bit 4 an input and Port T bit 3 an output. The body of the main program should call the subroutine, Check, over and over without stopping. The assembly language subroutine called Check first reads Port T bit 4. If Port T bit 4 is 1, then set Port T bit 3 equal to 1. If Port T bit 4 is 0, then Port T bit 3 is not changed.
Homework 5.22 Assume there is a switch attached to PT7, which will be an input. There is a 16-bit output on Ports P and H, with Port P being the most significant byte. The 16-bit output is initially 0. Write a main program that continuously checks the status of the switch, incrementing the 16-bit output every time the switch is pressed and released. After initialization, the body of the main program executes over and over.
Homework 5.23 Assume there are two 8-bit unsigned digital inputs attached to Ports P and H respectively.
Port P contains the measured room temperature in °F, and Port H contains the desired room temperature also in °F. There is an output on PT0 interfaced to a SSR that controls the air conditioner. After initialization the bang-bang temperature controller should implement this algorithm:
If the actual temperature is two degrees above desired, turn the AC on
If the actual temperature is two degrees below desired, turn the AC off
After initialization, the body of the main program executes over and over.
Homework 5.24 Assume there are two 8-bit signed digital inputs attached to Ports P and H respectively. Port P contains the measured room temperature in °C, and Port H contains the desired room temperature also in °C. There is an output on PT0 interfaced to a SSR that controls the heater. After initialization the bang-bang temperature controller should implement this algorithm:
If the actual temperature is two degrees below desired, turn the heater on
If the actual temperature is two degrees above desired, turn the heater off
After initialization, the body of the main program executes over and over.
Homework 5.25 Assume there are two 8-bit unsigned digital inputs attached to Ports P and H respectively. Port P contains the measured motor speed in rotations per second (rps), and Port H contains the desired motor speed also in rps. There is an 8-bit unsigned output on Port T interfaced to the motor that controls power to the motor. PTT=0 means no power, PTT=255 is full power. After initialization the incremental motor controller should implement this algorithm:
If the actual speed is less than the desired and if PTT<255, then increment PTT
If the actual speed equals the desired, then do not change PTT
If the actual speed is greater than the desired and if PTT>0, then decrement PTT
After initialization, the body of the main program executes over and over.
Homework 5.26 The goal is to design a one-wheeled balancing robot.
Assume there is an 8-bit signed digital input attached to Port P. Port P contains the measured angle of the robot with respect to the ground in degrees. The desired position is straight up, which is 0 degrees. There is an 8-bit
signed output on Port T interfaced to the wheel that controls torque to the wheel. PTT = -128 means full clockwise torque, PTT = 0 means no torque, and PTT = +127 means full counterclockwise torque. After initialization the incremental balancing controller should implement this algorithm:
If the angle is less than zero and if PTT < 127, then increment PTT
If the angle equals zero, then do not change PTT
If the angle is greater than zero and if PTT > -128, then decrement PTT
After initialization, the body of the main program executes over and over.
Homework 5.27 Write a debugging instrument (a subroutine) that first checks the value of Port T bit 0. If PT0 is 1, then it displays the value of Register D. If PT0 is 0, the instrument returns without performing any output. You may assume the SCI port is not used for the target system, and you can call any of the routines defined in tut2.rtf. Save and restore any registers that you modify including the CCR. The subroutine will be added to the original software using an editor, then the combination will be assembled and downloaded to the target.
Homework 5.28 Write a debugging instrument (a subroutine) that displays the value of the PTT. You may assume the SCI port is not used for the target system, and you can call any of the routines defined in tut2.rtf. Save and restore any registers that you modify including the CCR. The subroutine will be added to the original software using an editor, then the combination will be assembled and downloaded to the target.
Homework 5.29 We can use macros to add new instructions to the 9S12. If we inject an fcb $21 into our assembly source code, an interesting behavior is invoked. Before injection the computer executes L1, L2, L3, then L4. After injection of this new instruction, which lines are executed? Hint: type this into the simulator and execute both versions.
Describe in general the behavior caused by inserting an fcb $21 into an assembly program. Give this new instruction a name and write a macro definition for it.
Homework 5.30 We can use macros to add new instructions to the 9S12. If we inject an fcb $8F into our assembly source code, an interesting behavior is invoked. Before injection the computer executes L1, L2, L3, then L4. After injection of this new instruction, which lines are executed? Hint: type this into the simulator and execute both versions.
Describe in general the behavior caused by inserting an fcb $8F into an assembly program. Give this new instruction a name and write a macro definition for it.
5.11 Laboratory Assignments
Lab 5.1 Debugging with Print Statements
Purpose: The basic approach to this lab will be to first develop and debug your system using the simulator. During this phase of the project you will run with a short time delay. After the software is debugged, you will build your hardware and run your software on the real 9S12. During this phase of the project you will run with time delays long enough so you will be able to see the LED flash (slower than 8 Hz).
Description: You will first design a system, and then add debugging instruments to prove the system is functioning properly. The system has one input switch and one output LED. The basic function of the system is to respond to the input switch, causing certain output patterns on the LED. Interface a positive logic switch to PT3. This means the PT3 signal will be 0 (low, 0V) if the switch is not pressed, and the PT3 signal will be 1 (high, 5V) if the switch is pressed. Overall functionality of this system is described in the following rules. The system starts with the LED off (make PT2 0). The system will return to the off state if the switch is not pressed (PT3 is 0). If the switch is pressed (PT3 is 1), then the LED will flash on and off at about 4 Hz. During the first phase of this lab, you will simulate these hardware circuits in TExaS using positive logic mode for the switch and LED. During the second phase, you will interface a real switch and LED to your 9S12. When visualizing software running in real time on an actual microcomputer, it is important to use minimally intrusive debugging tools. However, this lab will use an intrusive debugging technique that prints results over the SCI channel. During the first phase of this lab, you will develop and test your program and debugging instruments on the TExaS simulator. In particular, you will write debugging instruments to print input and output information as your system runs in real time.
These software print statements will send information to the PC via the SCI channel. During the second phase of this lab, you will run your system on the real 9S12 with and without your debugging instruments.
a) Design the hardware interface of the switch and LED. First in TExaS, then on the real system.
b) Write a main program that implements the input/output system. To implement the 125ms delay, use the timer functions from Chapter 4. The basic steps for the main program are shown in Program L5.1.
       Initialize the stack pointer
       Enable interrupts for the Metrowerks debugger, cli
       Set the direction register so PT3 is an input and PT2 is an output
       Set PT2 so the LED is off
loop   delay about 125ms (any delay from 60 to 500 ms is OK)
       read the switch and go to flash if the switch is pressed
       Set PT2 so the LED is off
wait   read the switch and go to wait if the switch is not pressed
flash  toggle the LED (if on turn it off, if off turn it on)
       go to loop

DDRT &= ~0x08;   // PT3 input
DDRT |= 0x04;    // PT2 output
PTT &= ~0x04;    // PT2 off
while(1){
  Delay();       // you write this
  if((PTT&0x08)==0){
    PTT &= ~0x04;  // PT2 off
    while((PTT&0x08)==0){};
  }
  PTT = PTT^0x04;  // toggle
}
Program L5.1 Program used to develop debugging instruments.
c) Write a debugging subroutine that prints the value of Port T in hexadecimal. This is called functional debugging because you are capturing input/output data of the system, without information specifying when the input/output was collected. The subroutine (Debug_Capture) outputs one data-point (PT3 input data, and PT2 output data) to the PC. Since there are only two bits to save, pack the information into one 8-bit value for storage and ease of visualization. For example,
Input (PT3)   Output (PT2)   Transmitted data
0             0              0000,0000₂ or $00
0             1              0000,0001₂ or $01
1             0              0001,0000₂ or $10
1             1              0001,0001₂ or $11
In this way, you will be able to visualize the entire array in an efficient manner. Place a call to SCI_Init at the beginning of the system, and a call to Debug_Capture just after each time you output to PTT (there will be 3 or 4 places where your software writes to PTT). The basic steps involved in designing Debug_Capture are as follows:
Read PTT                        data = PTT
Mask capturing just bits 3,2    data = ((data&$08)<<1)|((data&$04)>>2)
Send information to PC          SCI_OutHex(data)
Send CR and LF                  SCI_OutChar(13); SCI_OutChar(10);
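The packing step can be checked on a PC against the table of transmitted data. This is a minimal sketch in C; the name Debug_Pack is a hypothetical helper introduced here, and the SCI output calls are left out so the block stands alone.

```c
#include <assert.h>
#include <stdint.h>

/* Move PT3 (input) to bit 4 and PT2 (output) to bit 0, so the two
   bits print as the two hexadecimal digits of one byte. */
uint8_t Debug_Pack(uint8_t ptt){
  return (uint8_t)(((ptt & 0x08) << 1) | ((ptt & 0x04) >> 2));
}

/* The four rows of the transmitted-data table. */
void Debug_Pack_Check(void){
  assert(Debug_Pack(0x00) == 0x00);  /* PT3=0, PT2=0 */
  assert(Debug_Pack(0x04) == 0x01);  /* PT3=0, PT2=1 */
  assert(Debug_Pack(0x08) == 0x10);  /* PT3=1, PT2=0 */
  assert(Debug_Pack(0x0C) == 0x11);  /* PT3=1, PT2=1 */
}
```

Bits of Port T other than PT3 and PT2 are masked away, so they do not disturb the printed value.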
The debugging routine should save and restore registers that it modifies (except CCR), so that the original program is not affected by the execution of the debugging instruments.
d) Using the baud rate of the SCI channel, estimate the execution time of the Debug_Capture subroutine. This time will be a quantitative measure of the intrusiveness of your debugging instrument.
There are many types of recursion. The factorial and the decimal output functions in Section 5.4 are examples of linear recursion, because only one call is made to the function within the function. A tail recursive function has the recursive call as the last action taken by the function. A tail recursive function can be implemented in an iterative manner by removing the recursive call and substituting it with a loop. A binary recursive function calls itself twice during the course of its execution. For Labs 5.2 and 5.3, pass parameters in registers and place local variables also in registers. Implement 16-bit unsigned arithmetic. Design a main program to test the functionality of your solution. Measure the execution speed and required stack space of both versions for 5 different input values. Generalize the results.
Lab 5.2 Tail Recursion
Implement the following recursive greatest common divisor function in assembly language. Convert the operation to a nonrecursive algorithm, and implement it also in assembly language.
unsigned short gcd(unsigned short m, unsigned short n){
unsigned short r;
  if(m < n) return gcd(n,m);
  r = m%n;
  if(r == 0)return(n);
  return(gcd(n,r));
}
Lab 5.3 Binary Recursion
nCk is the number of combinations of choosing k elements out of a set of n elements. Implement the following recursive function in assembly language. Convert the operation to a nonrecursive algorithm, and implement it also in assembly language.
unsigned short nCk(unsigned short n, unsigned short k){
  if((k == 0)||(n == k)) return(1);
  return(nCk(n-1,k) + nCk(n-1,k-1));}
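To illustrate the remark that a tail recursive function can be rewritten with a loop, here is a minimal C sketch that places the recursive gcd from Lab 5.2 next to one possible loop form. The name gcd_loop is an assumption made for this illustration, and the sketch is in C rather than the assembly language the lab asks for. Both versions assume nonzero inputs, since m%n is undefined when n is 0.

```c
#include <assert.h>

/* Recursive form, as given in Lab 5.2. */
unsigned short gcd(unsigned short m, unsigned short n){
  unsigned short r;
  if(m < n) return gcd(n,m);
  r = m%n;
  if(r == 0) return(n);
  return(gcd(n,r));
}

/* Loop form: each tail call becomes an update of m and n
   followed by another pass through the loop. */
unsigned short gcd_loop(unsigned short m, unsigned short n){
  unsigned short r;
  for(;;){
    if(m < n){ r = m; m = n; n = r; continue; }  /* gcd(n,m) */
    r = m%n;
    if(r == 0) return n;
    m = n; n = r;                                /* gcd(n,r) */
  }
}

void gcd_Check(void){
  assert(gcd(48,18) == 6);
  assert(gcd_loop(48,18) == gcd(48,18));
  assert(gcd_loop(18,48) == 6);
}
```

Because the recursive call is the last action, no state needs to survive it, which is why the loop form uses no additional stack space per step.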
6 Pointers and Data Structures
Chapter 6 objectives are to:
• Implement pointers using indexed addressing modes
• Use pointers to access arrays, strings, structures, tables, and matrices
• Present finite-state machines as an abstractive design methodology
• Describe how the paged memory system allows memory sizes above 64 KiB
• Present minimally intrusive methods for functional debugging
Data are brought into registers temporarily for manipulation and decision making. However, on a long-term basis, we store data in memory. If the data values are known at design time, and do not change, we place them in ROM. If we do not know the values at assembly time, or if the values vary with time, we need to place them in RAM. When we write software that manipulates the exact same data each time, then we can know its address at assembly time and use direct or extended addressing to access the data. For example, because the addresses of the I/O ports are fixed, we typically use direct or extended addressing to access I/O. Conversely, sometimes we write software that operates on different data at different times (e.g., calculating the average of a set of numbers in various buffers or outputting different strings on an LCD). In these cases, we need a way to access data, where the data we are operating on is determined at run time. For these situations, we use pointers. A pointer is simply an address. A pointer (e.g., Pt in Figure 6.1) is a variable where the contents of the variable is not data, but rather an address. Our software can be extremely flexible if we allow the address to change dynamically. On the 9S12, pointers are 16 bits wide and contain the address of the data of interest. Before we use a pointer, we must initialize it, so it points to an object. We can also change a pointer at run time, so it points to a different object, as shown in Figure 6.1. In this book, we will use the address 0 as the null pointer, meaning the pointer is not valid. The only limitation of this definition of null is that we will not be able to create a pointer to Port A, because Port A happens to be located at address $0000. In this chapter, the objects addressed by pointers will be data, but in Chapter 7, we will see an example of function pointers. A function pointer is an address pointing to a subroutine.
The reset vector, stored at $FFFE and $FFFF, is an example of a function pointer, because it contains a pointer to the main program.
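The three pointer states described above (null, pointing to one object, then pointing to another) can be sketched in C as follows. The objects, their values, and the helper ReadPt are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

uint8_t Object1 = 0x11;   /* illustrative data objects */
uint8_t Object2 = 0x22;
uint8_t *Pt = NULL;       /* null: not pointing to anything */

/* Read through Pt, returning fallback if the pointer is null. */
uint8_t ReadPt(uint8_t fallback){
  if(Pt == NULL) return fallback;
  return *Pt;
}

void Pt_Check(void){
  Pt = NULL;
  assert(ReadPt(0) == 0);        /* not valid, nothing is read */
  Pt = &Object1;                 /* initialize the pointer */
  assert(ReadPt(0) == Object1);
  Pt = &Object2;                 /* change it at run time */
  assert(ReadPt(0) == Object2);
}
```

The same ReadPt code operates on different data depending only on the address held in Pt, which is the flexibility the text refers to.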
Figure 6.1 Pointers are addresses pointing to objects. The objects may be data, functions, or other pointers.
[Figure 6.1 shows the pointer Pt in three states: not pointing to anything, pointing to Object1, and pointing to Object2.]
6.1 Indexed Addressing Modes Used to Implement Pointers
The 9S12 instruction set has 15 addressing modes. Six of the modes were presented earlier (Section 2.5), and the remaining modes are presented in this section. At the assembly level, we implement pointers using indexed addressing modes. Basically, we place the address into Register X or Y, then use indexed addressing mode to access the data. In this case, Register X or Y temporarily holds the pointer. Figure 6.2 illustrates three examples that utilize pointers. In this figure, Pt, SP, GetPt, and PutPt are pointers, where the arrows show to which memory they point, and the shaded boxes represent data. An array or string is a simple structure containing multiple equal-sized elements. We set a pointer to the address of the first element, then use indexed addressing mode to access the elements inside. We have introduced the stack previously, and will cover it in more detail in Chapter 7. The SP points to the top element on the stack. A linked list contains some elements that are pointers themselves. The pointers are used to traverse the data structure. Example linked lists will be presented in Section 6.8. The first-in-first-out (FIFO) queue is an important data structure for I/O programming. There is a GetPt that points to the oldest data (to be removed next) and a PutPt that points to an empty space (location to be stored into next). The FIFO queue will be presented in detail in Chapter 12.
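Figure 6.2 Examples of data structures that utilize pointers.
[Figure 6.2 shows four structures: an array or string addressed by Pt, a stack addressed by SP, a linked list addressed by Pt, and a FIFO queue addressed by GetPt and PutPt.]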
6.1.1 Indexed Addressing Mode
Indexed addressing mode uses a fixed offset with the 16-bit registers: X, Y, SP, or PC. The offset can be 5-bit (-16 to +15), 9-bit (-256 to +255), or 16-bit. Five-bit (-16 to +15) indexed mode requires one machine byte to encode the operand. In the first example, which uses 5-bit indexed mode, $6A is the staa instruction and $5C is the index mode operand. Refer to the Freescale CPU12 Reference Manual to see the machine codes for all the instructions.
Machine code   Opcode   Operand   Comment
$6A5C          staa     -4,Y      [Y-4] = RegA
In this example, assume RegA is $56 and RegY is $0823. The instruction staa -4,Y will store a copy of the value in Register A at $081F leaving Register Y unchanged, as shown in Figure 6.3. The effective address register (EAR) will contain the value $0823-4, which equals $081F. Let n,R be the indexed addressing mode with fixed offset n and index register R; then the effective address will be R+n.
Figure 6.3 Example of the 9S12 indexed addressing mode.
[Figure 6.3 shows RegY = $0823 and RegA = $56. The instruction staa -4,Y, stored in EEPROM at $F801 and $F802 as $6A $5C, writes $56 to RAM address $081F.]
Nine-bit (-256 to -17) or (+16 to +255) indexed mode requires two machine bytes to encode the operand.
Machine code   Opcode   Operand   Comment
$6AE840        staa     $40,Y     [Y+$40] = RegA
Again, assume RegA is $56 and RegY is $0823. The instruction staa $40,Y will store a copy of the value in Register A at $0863 leaving Register Y unchanged, as shown in Figure 6.4. The EAR will be Y+$40, which is $0823+$40 or $0863.
Figure 6.4 Another example of the 9S12 indexed addressing mode.
[Figure 6.4 shows RegY = $0823 and RegA = $56. The instruction staa $40,Y, stored in EEPROM at $F801 to $F803 as $6A $E8 $40, writes $56 to RAM address $0863.]
Sixteen-bit indexed mode requires three machine bytes to encode the operand.
Machine code   Opcode   Operand   Comment
$6AEA0200      staa     $200,Y    [Y+$200] = RegA
Again, assume RegA is $56 and RegY is $0823. The instruction staa $200,Y will store a copy of the value in Register A at $0A23 leaving Register Y unchanged, as shown in Figure 6.5. The EAR will be Y+$200, which is $0823+$200 or $0A23.
Figure 6.5 A third example of the 9S12 indexed addressing mode.
Due to the properties of 16-bit addition, the 16-bit offset can be interpreted either as unsigned (0 to 65535) or signed (-32768 to +32767). Indexed mode is useful when addressing the data structures and information on the stack. In each case presented so far, the 16-bit register used as a pointer (index) is not modified by the instruction.
Common Error: SP relative indexed addressing with a negative constant is usually defined as an illegal stack access.
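The effective-address calculations in the three examples, and the remark that a 16-bit offset may be read as signed or unsigned, can be checked with a few lines of C. This is only a sketch of the arithmetic; the function name EffAddr is an assumption, and no attempt is made to model instruction encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address for n,R: the 16-bit sum R+n, wrapping modulo
   65536 so that signed and unsigned offsets give the same result. */
uint16_t EffAddr(uint16_t reg, int32_t offset){
  return (uint16_t)(reg + (uint16_t)offset);
}

/* The three worked examples, each with RegY = $0823. */
void EffAddr_Check(void){
  assert(EffAddr(0x0823, -4)    == 0x081F);  /* staa -4,Y   */
  assert(EffAddr(0x0823, 0x40)  == 0x0863);  /* staa $40,Y  */
  assert(EffAddr(0x0823, 0x200) == 0x0A23);  /* staa $200,Y */
  /* -4 and 65532 ($FFFC) are the same 16-bit offset */
  assert(EffAddr(0x0823, 65532) == EffAddr(0x0823, -4));
}
```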
Accessing 16-bit data structures with indexed addressing is slightly different in assembly versus in C. For example, if we create an array of the first ten prime numbers stored as 16-bit integers, we could allocate the structure in ROM using the fdb pseudo-op. E.g.,
Prime fdb 1,2,3,5,7,11,13,17,19,23
The equivalent ROM-based definition in C would be
unsigned short const Prime[10]={1,2,3,5,7,11,13,17,19,23};
By convention, we define Prime[0] as the first element. In C, if we want element number 4, which is actually the fifth element, we use the expression Prime[4] to fetch the 7 out of the structure. In assembly, however, we are responsible for knowing each element is two bytes and element number 4 is actually at byte number 8. In particular, to read element number 4 into RegD we need to perform either
 ldx #Prime   ;pointer to the structure
 ldd 8,x      ;read Prime[4]
or we could have fetched it directly as
 ldd Prime+8  ;read Prime[4]
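The index-to-byte-offset arithmetic used in these fetches can be checked with a short C sketch that reads the array by byte address, the way the assembly code must. The helper names ByteOffset and FetchByByte are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

const uint16_t Prime[10] = {1,2,3,5,7,11,13,17,19,23};

/* Byte offset of element i; for i=4 this is the 8 in ldd 8,x */
size_t ByteOffset(size_t i){
  return i * sizeof(Prime[0]);
}

/* Fetch element i starting from the byte address of the array. */
uint16_t FetchByByte(size_t i){
  const uint8_t *base = (const uint8_t *)Prime;
  uint16_t value;
  memcpy(&value, base + ByteOffset(i), sizeof value);
  return value;
}

void Prime_Check(void){
  assert(ByteOffset(4) == 8);
  assert(FetchByByte(4) == 7);
  assert(FetchByByte(4) == Prime[4]);
}
```

In C the compiler performs the multiplication by the element size whenever Prime[4] is written; in assembly the programmer writes the 8 explicitly.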
Either way, manipulating addresses in assembly always involves the physical byte-address regardless of the precision of the data. Similarly, assume we have a pointer to Prime, and we want to increment the pointer to the next element. In C, we define the pointer as
unsigned short const *Pt;
and initialize it as
Pt = Prime;
Now, to increment the pointer to the next element in C, use the expression Pt++. Similarly in assembly, we can define the pointer in RAM as
Pt rmb 2      ;16-bit pointer to Prime
and initialize it as
 ldx #Prime
 stx Pt       ;pointer to Prime[0]
However, to increment the pointer to the next element we have to add 2 to the pointer. E.g.,
 ldx Pt       ;previous pointer
 inx
 inx          ;next element in the 16-bit structure
 stx Pt
6.1.2 Auto Pre/Post Decrement/Increment Indexed Addressing Mode
Auto pre/post decrement/increment indexed addressing uses the 16-bit registers: X, Y, or SP. The PC cannot be used with these index modes that modify the index register. In each case, the 16-bit register used as a pointer is modified either before (pre) or after (post) the memory access. These modes are useful when addressing the data structures sequentially. The 9S12 allows the programmer to specify the amount added to (subtracted from) the index register from 1 to 8. In each case, assume Reg Y is initially 2345.
Post-increment addressing first accesses the data then adds to the index register:
 staa 1,Y+ ;Store a copy of value in Reg A at 2345, then Reg Y=2346
 staa 4,Y+ ;Store a copy of value in Reg A at 2345, then Reg Y=2349
Pre-increment addressing first adds to the index register then accesses the data:
 staa 1,+Y ;Reg Y=2346, then store a copy of value in Reg A at 2346
 staa 4,+Y ;Reg Y=2349, then store a copy of value in Reg A at 2349
Post-decrement addressing first accesses the data then subtracts from the index register:
 staa 1,Y- ;Store a copy of value in Reg A at 2345, then Reg Y=2344
 staa 4,Y- ;Store a copy of value in Reg A at 2345, then Reg Y=2341
Pre-decrement addressing first subtracts from the index register then accesses the data:
 staa 1,-Y ;Reg Y=2344, then store a copy of value in Reg A at 2344
 staa 4,-Y ;Reg Y=2341, then store a copy of value in Reg A at 2341
Observation: Usually we would add/subtract one when accessing an 8-bit value and add/subtract two when accessing a 16-bit value.
Common Error: The improper use of these index modes with the SP can result in an illegal stack access or unbalanced stack.
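The order of the memory access and the register update in the four modes can be modeled in C. This is a minimal sketch under stated assumptions: Mem is a simulated 64 KiB memory, the function names are invented for the illustration, and the values are the same decimal numbers used in the examples above.

```c
#include <assert.h>
#include <stdint.h>

uint8_t Mem[65536];   /* simulated memory, not real 9S12 RAM */

/* staa n,Y+  store at Y, then Y = Y+n */
void StaaPostInc(uint16_t *y, uint8_t n, uint8_t a){
  Mem[*y] = a;  *y = (uint16_t)(*y + n);
}
/* staa n,+Y  Y = Y+n, then store at Y */
void StaaPreInc(uint16_t *y, uint8_t n, uint8_t a){
  *y = (uint16_t)(*y + n);  Mem[*y] = a;
}
/* staa n,Y-  store at Y, then Y = Y-n */
void StaaPostDec(uint16_t *y, uint8_t n, uint8_t a){
  Mem[*y] = a;  *y = (uint16_t)(*y - n);
}
/* staa n,-Y  Y = Y-n, then store at Y */
void StaaPreDec(uint16_t *y, uint8_t n, uint8_t a){
  *y = (uint16_t)(*y - n);  Mem[*y] = a;
}

void AutoIndex_Check(void){
  uint16_t y = 2345;
  StaaPostInc(&y, 4, 0x56);            /* staa 4,Y+ */
  assert(Mem[2345] == 0x56 && y == 2349);
  y = 2345;
  StaaPreDec(&y, 1, 0x57);             /* staa 1,-Y */
  assert(y == 2344 && Mem[2344] == 0x57);
}
```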
6.1.3 Accumulator-Offset Indexed Addressing Mode
Accumulator-offset indexed addressing uses two registers. The offset is located in one of the accumulators A, B, or D; the base address is placed in one of the 16-bit registers X, Y, SP, or PC. In each case, the accumulator used for the offset and the index register are not modified by the instruction. Examples:
        ldab #4
        ldy  #2345
        staa B,Y         ;Store a copy of value in Reg A at 2349 (B & Y unchanged)
Observation: Accumulator-offset indexed addressing is efficient for accessing arrays. We can place the index in an accumulator and the base address in Reg X or Reg Y.
6.1.4 Indexed-Indirect Addressing Mode
Indexed-indirect addressing mode uses a fixed offset with the 16-bit registers X, Y, SP, or PC. The fixed offset is always 16 bits. The fixed 16-bit value is added to the index register (X, Y, SP, or PC) and is used to fetch a second 16-bit big-endian address from memory. The load or store is performed at this second address, as shown in Figure 6.6. Indexed-indirect mode is useful when data structures contain pointers. In each case, the 16-bit index register and the memory pointer are not modified by the instruction. For example,
        ldy  #2345
        staa [-4,Y]      ;fetch 16-bit address from 2341, store 56 at 1234
Figure 6.6 Example of the 9S12 indexed-indirect addressing mode. (Reg A = 56, Reg Y = 2345; memory at 2341-2342 holds 12 34; the value 56 is stored at 1234.)
Accumulator-D offset indexed-indirect addressing uses two registers. The offset is located in accumulator D, and the base address is in one of the 16-bit registers X, Y, SP, or PC. The value in D is added to the index register (X, Y, SP, or PC) and used to fetch a second 16-bit big-endian address from memory, as shown in Figure 6.7. The load or store is performed at this second address. This mode is also useful when data structures contain pointers. In each case, accumulator D and the index register used as a pointer (index) are not modified by the instruction. For example,
        ldd  #4
        ldy  #2341
        stx  [D,Y]       ;Store copy of value in Reg X at 1234 (D & Y unchanged)
Figure 6.7 Example of the 9S12 accumulator-offset indexed-indirect addressing mode. (Reg D = 0004, Reg X = 5678, Reg Y = 2341, D+Y = 2345; memory at 2345-2346 holds 12 34; the value 56 78 is stored at 1234-1235.)
6.1.6 Post-Byte Machine Codes for Indexed Addressing
Table 6.1 The 9S12 indexed register code.
rr      Register
00      X
01      Y
10      SP
11      PC
Table 6.2 Postbyte values for the 9S12 indexed-addressing modes.
5-bit constant, n = 0
5-bit constant, n from -16 to +15
pre-increment, n from 1 to 8
pre-decrement, n from 1 to 8
post-increment, n from 1 to 8
post-decrement, n from 1 to 8
Reg A accumulator offset
Reg B accumulator offset
Reg D accumulator offset
9-bit constant, n from -256 to 255
16-bit constant, any 16-bit n
Reg D offset, indirect
16-bit constant, indirect
Table 6.1 shows the object code for rr, which specifies X, Y, SP, or PC. Table 6.2 gives the postbyte object code for the indexed mode instructions. The expression nnnnn for the 5-bit constant instructions specifies the 5-bit 2’s complement offset (-16 to 15). For example, if the addressing mode is -14,X, rr is 00 signifying Reg X. As a 5-bit 2’s complement number, -14 is 10010₂. Thus, for this addressing mode, the postbyte xb is rr0nnnnn = 00010010₂ = $12.
The expression nnn for the post/pre increment instructions determines the increment value according to n = nnn + 1. For example, if the addressing mode is 3,Y+, rr is 01 signifying Reg Y, and nnn = n - 1 or 010₂. Thus, for this addressing mode, the postbyte xb is rr110nnn = 01110010₂ = $72. The expression nnn for the post/pre decrement instructions determines the decrement value according to n = 8 - nnn. For example, if the addressing mode is 3,Y-, rr is 01 signifying Reg Y, and nnn = 8 - n or 101₂. Thus, for this addressing mode, the postbyte xb is rr111nnn = 01111101₂ = $7D.
The expression nnnnnnnnn for the 9-bit constant instruction specifies the 9-bit 2’s complement offset (-256 to 255). For this addressing mode, the postbyte requires two bytes of machine code. For example, if the addressing mode is -100,X, rr is 00 signifying Reg X, and -100 as a 9-bit 2’s complement number is 1,1001,1100₂. Thus, for this addressing mode, the postbyte xb is 111rr00nnnnnnnnn = 1110000110011100₂ = $E19C. The postbyte for the IDX2 instructions will require three bytes of machine code. ffee refers to the 16-bit offset needed for IDX2 modes.
Checkpoint 6.1: What is the xb postbyte for addressing mode -10,Y?
Checkpoint 6.2: What is the xb postbyte for addressing mode 1,-SP?
Checkpoint 6.3: What is the xb postbyte for addressing mode -1000,X?
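The bit patterns worked out above can be checked with a few lines of C. The following is a sketch that covers only the four patterns spelled out in the text (rr0nnnnn, rr110nnn, rr111nnn, and the two-byte 111rr00n nnnnnnnn). The function names and the REG_ constants are hypothetical and do not come from any assembler or header file.

```c
#include <stdint.h>

/* rr codes from Table 6.1 */
enum { REG_X = 0, REG_Y = 1, REG_SP = 2, REG_PC = 3 };

/* 5-bit constant offset, n from -16 to +15: xb = rr0nnnnn */
uint8_t xb_const5(unsigned rr, int n){
  return (uint8_t)((rr << 6) | ((unsigned)n & 0x1F));
}

/* post-increment, n from 1 to 8: xb = rr110nnn with nnn = n-1 */
uint8_t xb_post_inc(unsigned rr, int n){
  return (uint8_t)((rr << 6) | 0x30 | (unsigned)(n - 1));
}

/* post-decrement, n from 1 to 8: xb = rr111nnn with nnn = 8-n */
uint8_t xb_post_dec(unsigned rr, int n){
  return (uint8_t)((rr << 6) | 0x38 | (unsigned)(8 - n));
}

/* 9-bit constant offset, n from -256 to +255: two bytes 111rr00n nnnnnnnn */
uint16_t xb_const9(unsigned rr, int n){
  unsigned n9 = (unsigned)n & 0x1FF;          /* 9-bit 2's complement */
  return (uint16_t)(0xE000 | (rr << 11) | n9);
}
```

Masking the signed offset with 0x1F or 0x1FF keeps just the low 5 or 9 bits, which is the 2's complement encoding the text describes. The callers are assumed to pass values inside the stated ranges; no range checking is done.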
6.1.6 Load Effective Address Instructions
The 9S12 load-effective-address instructions can be used only with indexed addressing mode. These instructions are very useful for manipulating the 16-bit registers. There are three instructions, leax, leay, and leas, affecting X, Y, and SP respectively. They do not affect any condition code bits. Let idx represent one of the indexed addressing modes shown in Table 6.2.
        leax idx         ;RegX=EA
        leay idx         ;RegY=EA
        leas idx         ;RegS=EA
The basic idea is that the effective address is calculated in the usual manner. But rather than fetching the memory contents at that address, as a regular load instruction (ldaa, ldab, ldd, ldx, ldy, and lds) would, this instruction puts the effective address itself into the register. In each of the following cases, the effective address, EA, is loaded into Register X.
        leax m,r         ;IDX 5-bit index, X=r+m (-16 to 15)
        leax v,+r        ;IDX pre-increment, r=r+v, X=r (1 to 8)
        leax v,-r        ;IDX pre-decrement, r=r-v, X=r (1 to 8)
        leax v,r+        ;IDX post-increment, X=r, r=r+v (1 to 8)
        leax v,r-        ;IDX post-decrement, X=r, r=r-v (1 to 8)
        leax A,r         ;IDX Reg A offset, X=r+A, zero padded
        leax B,r         ;IDX Reg B offset, X=r+B, zero padded
        leax D,r         ;IDX Reg D offset, X=r+D
        leax q,r         ;IDX1 9-bit index, X=r+q (-256 to 255)
        leax W,r         ;IDX2 16-bit index, X=r+W (-32768 to 65535)
where r is Reg X, Y, SP, or PC. The fixed constants are
        m is any signed 5-bit value, -16 to 15
        q is any signed 9-bit value, -256 to 255
        v is any unsigned 3-bit value, 1 to 8
        W is any signed 16-bit value, -32768 to 32767, or any unsigned 16-bit value, 0 to 65535
Observation: The leas -4,sp instruction subtracts four from the stack pointer, allocating four bytes of uninitialized space on the stack.
Checkpoint 6.4: Write 9S12 assembly code that sets Register Y equal to X 10.
Checkpoint 6.5: Write 9S12 assembly code using the leas instruction to discard the top five bytes of the stack.
6.1.7 Call-By-Reference Parameter Passing
The subroutines thus far in the book have utilized call-by-value parameter passing. With an input parameter using call by value, the data itself is passed into the subroutine. For an output parameter using return by value, the result of the subroutine is a value, and the value itself is returned. The most efficient mechanism to pass parameters is the registers. In Chapter 7, we will learn a more flexible (but less efficient) technique to pass parameters using the stack. However, if we just use RegA, RegB, RegX, and RegY, the maximum number of bytes we can pass using call by value is six. Alternatively, if we pass a pointer to the data, rather than the data itself, we will be able to pass large amounts of data. Passing a pointer to data is classified as call by reference. For large amounts of data, call by reference is also very fast, because the data need not be copied from calling program to subroutine. In call by reference, only one copy of the data exists, in the calling program, and a pointer to it is passed to the subroutine. In this way, the subroutine actually performs read/write access to the original data. Call by reference is also a convenient mechanism to return data. Passing a pointer to an object allows this object to be both an input parameter and an output parameter.
As an example, consider the situation where we wish to pass 100 bytes into the subroutine Sort. In this case, we have one or more buffers, defined in RAM, which initially contain data in an unsorted fashion. The buffers shown here are uninitialized, but assume previously executed software has filled these buffers with corresponding voltage and pressure data.
VBuffer rmb  100         ;voltage data
PBuffer rmb  100         ;pressure data
Since 100 bytes is a lot larger than the six bytes available for storing data in registers, we will use call by reference. In this example, we use RegX as a pointer to the data. The calling sequence for sorting the voltage data in VBuffer could be
        ldx  #VBuffer
        bsr  Sort
The calling sequence for sorting the pressure data in PBuffer could be
        ldx  #PBuffer
        bsr  Sort
One advantage of call by reference in this example is that the same buffer can also be used as the return parameter. In particular, this sort routine could shuffle the data around in the same original buffer. Since RAM is a scarce commodity on most microcontrollers, not having to allocate two buffers will reduce RAM requirements for the system. From a security perspective, call by reference is more vulnerable than call by value. If we have important information, then a level of trust is required to pass a pointer to the original data to a subroutine. Since call by value creates a copy of the data at the time of the call, it is slower but more secure. With call by value, the original data is protected from subroutines that are called.
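In C, call by reference looks like the following sketch, in which the pointer argument plays the role of RegX. The insertion sort and the explicit length parameter n are choices made for this illustration; the text does not say which sorting algorithm Sort uses, and its buffers have a fixed length of 100.

```c
#include <stddef.h>
#include <stdint.h>

uint8_t VBuffer[100];   /* voltage data */
uint8_t PBuffer[100];   /* pressure data */

/* Sorts n bytes into ascending order, in place.
   Only the address of the buffer is passed; no data is copied. */
void Sort(uint8_t *buf, size_t n){
  for(size_t i = 1; i < n; i++){
    uint8_t key = buf[i];
    size_t j = i;
    while(j > 0 && buf[j-1] > key){
      buf[j] = buf[j-1];   /* shift larger elements up */
      j--;
    }
    buf[j] = key;
  }
}
```

A call such as Sort(VBuffer, 100) passes a 16-bit address rather than 100 bytes, and the sorted result comes back in the same buffer.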
6.2 Arrays
Random access means one can read and write any element in any order. Random access is allowed for all indexable data structures. An indexed data structure has elements of the same size and can be accessed knowing the name of the structure, the size of each element, and the element number. In C, we use the syntax [] to access an indexed structure. Arrays, matrices, and tables are examples of indexed structures presented in this chapter. Sequential access means one reads and writes the elements in order. Pointers are usually employed in these types of data structures. Strings, linked lists, stacks, queues, and trees are examples of sequential structures. The first-in-first-out circular queue (FIFO) is useful for data-flow problems, and it will be presented in Chapter 12.
An array is made of elements of equal precision and allows random access. The precision is the size of each element. Typically, precision is expressed in bits or bytes. The length is the number of elements. The origin is the index of the first element. A data structure with the index of the origin equal to zero is said to use zero-origin indexing. In C, zero-origin indexing is almost always used.
Example 6.1 Write a software module to control the read/write (R/W) head of an audio tape recorder. From the perspective shown in Figure 6.8, the stepper motor causes the R/W head to move up and down. This motion affects which audio track on the tape is under the head. The goal is to be able to move the motor one step at a time.
Solution This module requires three public functions: one for initialization, one to rotate one step clockwise, and one to rotate one step counterclockwise. By rotating the motor one step at a time, the software can control which audio track on the tape is under the R/W head. A stepper motor has four digital control lines. To make the stepper motor spin, we output the sequence 5, 6, 10, and 9 over and over on these four lines. To make it spin in the other direction, we output the sequence in the other direction. This motor has 24 steps per revolution; therefore one step will change the shaft angle by exactly 15°. To make the motor step once, we output just the next number in the sequence. For example, if the output is currently at 5 and we wish to rotate the shaft by 15°, we simply output a 6. In this solution, we will store the 5, 6, 10, and 9 data in an array, as shown in Figure 6.9. For more information on the hardware interfacing of stepper motors see Section 8.7.
Figure 6.8 A stepper motor is used in a cassette tape recorder to select the track. (Courtesy of Jonathan Valvano.)
Figure 6.9 A byte array with four elements (addresses are made up to illustrate the array is in ROM).
Address  Data
$4000    $05
$4001    $06
$4002    $0A
$4003    $09
In C, we can access an element of the array using its name and an index. Assume PTT bits 3 to 0 are connected to the stepper motor. The initialization function makes those pins outputs, and the Index is initialized to zero. In assembly, we can perform a similar function using indexed addressing, see Program 6.1. Assume Index is an 8-bit private global variable, defined in RAM, and initialized to zero. Index takes on the values 0, 1, 2, and 3. The instruction ldaa B,X adds the base address in RegX to the index in RegB, fetching the contents of the array at that index. Since the first output generated by Stepper_CW will be a 5, we will initialize the motor to 9; this way, the first call to Stepper_CW will move the motor. In this example, the subroutine is public but has no input or output parameters. Port T, the array, and the index are private to this module. This means if another module wishes to move the motor, it can call the public function Stepper_CW, but does not have access to PTT, Data, or Index. The third public function, Stepper_CCW, is left as Homework Problem 6.29.
const char static Data[4]={0x05,0x06,0x0A,0x09};
unsigned char static Index;
void Stepper_Init(void){
  DDRT |= 0x0F;            // PT3-0 are outputs
  PTT = 0x09;              // first data
  Index = 0;               // first index
}
void Stepper_CW(void){
  PTT = Data[Index];       // rotate 15deg
  Index = 0x03&(Index+1);  // next index
}

        org  $0800       ;RAM
Index   rmb  1           ;0,1,2,3
        org  $4000       ;ROM
Data    fcb  $05,$06,$0A,$09
Stepper_Init
        bset DDRT,#$0F
        movb #$09,PTT
        clr  Index
        rts
Stepper_CW
        ldab Index
        ldx  #Data
        ldaa B,X
        staa PTT         ;15 deg
        incb             ;CW
        andb #$03
        stab Index
        rts
Program 6.1 Stepper motor software that uses a byte array.
In general, let n be the precision of a zero-origin indexed array in bytes. If I is the index and Base is the beginning address of the array, then the address of the element at I is Base+n*I
The origin of an array is the index of the first element. The origin of a zero-origin indexed array is zero. In general, if origin is the origin of the array, then the address of the element at I is Base+n*(I-origin)
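The address calculation is small enough to check directly. The sketch below assumes 16-bit addresses, as on the 9S12; the function name element_address is made up for this illustration.

```c
#include <stdint.h>

/* Address of the element at index i: Base + n*(i - origin)
   base   = address of the first element
   n      = precision in bytes
   origin = index of the first element (0 for zero-origin indexing) */
uint16_t element_address(uint16_t base, uint16_t n, uint16_t i, uint16_t origin){
  return (uint16_t)(base + n * (i - origin));
}
```

With the byte array of Figure 6.9 (Base = $4000, n = 1, origin = 0), index 3 gives $4003, the address of the value $09.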
Checkpoint 6.6: What is the precision, length, and total size of long data[5];?
Example 6.2 Design an exponential function, y = 10^x, with a 16-bit output.
Solution Since the output must fit in 16 bits (at most 65535), the input must be between 0 and 4. One simple solution is to employ a constant word array, as shown in Figure 6.10, and implemented in Program 6.2. In assembly, we define a word constant using fdb.
Figure 6.10 A word array with five elements (addresses are made up to illustrate the array is in ROM).
Address  Data
$F950    1
$F952    10
$F954    100
$F956    1000
$F958    10000
In C, the syntax for accessing all array types is independent of precision. The compiler automatically performs the correct address calculation. We will assume the input is less than or equal to 4. If I is the index and Base is the base address of the array, then the address of the element at I is Base+2*I
In assembly, we can access the array using indexed addressing. We will assume the Register B input is less than or equal to 4.
const short static Powers[5]={1,10,100,1000,10000};
unsigned short power(unsigned char exp){
  return Powers[exp];      // look up answer
}
Program 6.2 Array implementation of a nonlinear function.
Observation: When using an array to implement a nonlinear function, one possible solution is to use a small table with linear interpolation between points (for an example, see the help information within TExaS about the etbl instruction).
In the previous examples, the length of the array was known. Sometimes, it is desirable to allow the length to vary dynamically. There are many mechanisms that allow for a variable length array. One simple mechanism saves the length of the array as the first element. In this way, we could add run-time checking to make sure the index bounds are not exceeded. The previous examples could be defined as
const char Data[5]={4,0x05,0x06,0x0A,0x09};
const short Powers[6]={5,1,10,100,1000,10000};
We could define these variable length arrays in assembly as
Data    fcb  4,$05,$06,$0A,$09
Powers  fdb  5,1,10,100,1000,10000
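The run-time check mentioned above might look like the following sketch. The function name power_checked and its return convention (1 for success, 0 for an out-of-range index) are assumptions made for this illustration.

```c
#include <stdint.h>

/* Powers[0] holds the length; the data start at Powers[1] */
const uint16_t Powers[6] = {5, 1, 10, 100, 1000, 10000};

/* Writes 10^exp to *result and returns 1 if exp is within bounds,
   otherwise leaves *result alone and returns 0 */
int power_checked(uint8_t exp, uint16_t *result){
  if(exp >= Powers[0]){
    return 0;                 /* index exceeds the stored length */
  }
  *result = Powers[exp + 1];  /* skip over the length entry */
  return 1;
}
```

Because the length travels with the data, the same function keeps working if the table is later lengthened or shortened.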
Another common mechanism to handle variable length is a termination code. Typical codes are shown in Table 6.3. This method can be used only if it is not possible for the termination code to be present in the data.
Table 6.3 Typical termination codes.
ASCII  Code  Name
NUL    $00   Null
ETX    $03   End of text
EOT    $04   End of transmission
FF     $0C   Form feed
CR     $0D   Carriage return
ETB    $17   End of transmission block
The software in Program 6.3 extends the stepper solution using an array with a null termination. We can use the termination code to determine when to reset the index back to zero.
Data    fcb  $05,$06,$0A,$09,0
Stepper_CW
        ldab Index
        ldx  #Data
        ldaa B,x
        staa PTT
        incb
        tst  B,x         ;null?
        bne  ok
        clrb             ;start over
ok      stab Index
        rts

const char Data[5]={0x05,0x06,0x0A,0x09,0};
unsigned char Index=0;
void Stepper_CW(void){
  PTT = Data[Index];       // move stepper
  Index++;                 // next index
  if(Data[Index] == 0){    // end?
    Index = 0;             // start over
  }
}
Program 6.3 Software to access a variable length byte array.
The stepper motor can be changed from full steps to half steps simply by changing the array, without modification to the program code. In particular, all we need to do is change the values to:
{0x05,0x04,0x06,0x02,0x0A,0x08,0x09,0x01,0}
Checkpoint 6.7: Why can’t you use a termination code to signify the end of a variable length data set where the data can be any binary value?
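The claim that only the array changes can be tried on a desktop compiler with the sketch below. It follows the logic of Program 6.3 but takes the table as a parameter so either table can be used. PTT_model is a hypothetical variable standing in for the Port T register, and the name Stepper_CW_table is invented for this example.

```c
#include <stdint.h>

const uint8_t FullStep[5] = {0x05,0x06,0x0A,0x09,0};
const uint8_t HalfStep[9] = {0x05,0x04,0x06,0x02,0x0A,0x08,0x09,0x01,0};

uint8_t PTT_model;      /* hypothetical stand-in for Port T */
uint8_t Index = 0;      /* position within the table */

/* Outputs the next value from a null-terminated table */
void Stepper_CW_table(const uint8_t *data){
  PTT_model = data[Index];   /* move stepper */
  Index++;                   /* next index */
  if(data[Index] == 0){      /* termination code? */
    Index = 0;               /* start over */
  }
}
```

Calling Stepper_CW_table(HalfStep) eight times walks through all eight half-step values and returns Index to zero; the function body never needs to know the table length.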
In the previous array examples, an index and base address were used together to access data in the array. An alternative approach is to use a single pointer. The pointer method will be more efficient than the index method (compare Program 6.3 to Program 6.4) when the data are accessed sequentially. Assume Pt is a private global pointer, defined in RAM, and initialized to point to the first element of Data.
Data    fcb  $05,$06,$0A,$09,0
Stepper_CW
        ldx  Pt
        ldaa 1,x+
        staa PTT
        tst  0,x         ;null?
        bne  ok
        ldx  #Data       ;start over
ok      stx  Pt
        rts
Program 6.4 Pointer method to access a variable length byte array.
Observation: The post-increment indexed addressing mode is convenient when using a pointer to access sequential data.
Checkpoint 6.8: When accessing sequential data using post-increment indexed mode, how do we select the 1, 2, 3, or 4 in 1,x+ 2,x+ 3,x+ 4,x+?
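A C version of the pointer method might look like the following sketch. The expression *Pt++ reads the data and then advances the pointer, which mirrors the post-increment load. PTT_model is a hypothetical variable standing in for the Port T register so the code can run without 9S12 hardware.

```c
#include <stdint.h>

const uint8_t Data[5] = {0x05,0x06,0x0A,0x09,0};
const uint8_t *Pt = Data;   /* points to the next value to output */
uint8_t PTT_model;          /* hypothetical stand-in for Port T */

void Stepper_CW(void){
  PTT_model = *Pt++;        /* read data, then advance the pointer */
  if(*Pt == 0){             /* null? */
    Pt = Data;              /* start over */
  }
}
```

No index variable and no base-plus-index addition are needed; the single pointer carries all the state.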
6.3 Strings
A string is a data structure with equal-sized elements that only allows sequential access. The bytes of the string are always read in order from the first to the last. In contrast, an array allows random access to any element in any order. The same mechanisms introduced for variable length arrays will apply also to strings. In general, we store the length of the string in the first position when the data can take on any value, negating the possibility of using a termination code.
Example 6.3 Write software to output a sequence of values to a digital-to-analog converter.
Solution In this system, the length of the string is stored in the first byte. This approach is appropriate when the data elements can take on all possible numeric values. Assume a DAC is connected to Port T. The function DAC, shown in Program 6.5, will output the string data to the DAC. The function uses call by reference, meaning a pointer to the data is passed. The main program calls this function twice, with different data strings.
Data1   fcb  4           ;length
        fcb  0,50,100,50 ;data
Data2   fcb  8
        fcb  0,25,50,75,100,75,50,25
*Reg X points to the string data
DAC     ldab 0,x         ;length
loop    inx              ;next element
        ldaa 0,x         ;data
        staa PTT         ;out to DAC
        decb
        bne  loop
        rts
main    lds  #$4000
        movb #$FF,DDRT
mloop   ldx  #Data1      ;first string
        bsr  DAC
        ldx  #Data2      ;second string
        bsr  DAC
        bra  mloop

unsigned const char Data1[5]={4,0,50,100,50};
unsigned const char Data2[9]={8,0,25,50,75,100,75,50,25};
void DAC(unsigned char *pt){
  unsigned int length;
  length = (*pt++);        // size
  do{
    PTT = (*pt++);
  } while(--length);
}
void main(void){
  DDRT = 0xFF;             // outputs to DAC
  while(1){
    DAC(Data1);            // first string
    DAC(Data2);            // second string
  }
}
Program 6.5 A variable length string contains DAC data.
In C, ASCII strings are stored with null termination. In C, the compiler automatically adds the zero at the end, but in assembly, the zero must be explicitly defined.
Example 6.4 Write software to output an ASCII string to the serial port.
Solution Because the string may be too long to place all the ASCII characters into the registers at the same time, call-by-reference parameter passing will be used. With call by reference, a pointer to the string will be passed. The function OutString, shown in Program 6.6, will output the string data to the serial port. The function SCI_OutChar will be developed later in Chapter 8 and shown as Program 8.2. For now all we need to know is that it outputs a single ASCII character to the serial port. The main program calls this function twice, with different ASCII strings.
Hello   fcc  "Hello World"
        fcb  0
CRLF    fcb  13,10,0
;Reg X points to the string data
OutString
        ldaa 1,x+        ;next data
        beq  done        ;0 means end
        jsr  SCI_OutChar
        bra  OutString
done    rts
main    lds  #$4000
        bsr  SCI_Init
mloop   ldx  #Hello      ;first string
        bsr  OutString
        ldx  #CRLF       ;second string
        bsr  OutString
        bra  mloop
Program 6.6 A variable-length string contains ASCII data.
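The same loop can be sketched in C. Since the real SCI_OutChar needs the serial hardware, the version below substitutes a hypothetical SCI_OutChar_model that appends each character to a RAM buffer named SerialLog; both names are inventions for this example.

```c
#include <stddef.h>

char SerialLog[64];        /* hypothetical capture buffer standing in for the serial port */
size_t SerialCount = 0;    /* number of characters captured so far */

/* Stand-in for SCI_OutChar: saves the character instead of transmitting it */
void SCI_OutChar_model(char c){
  if(SerialCount < sizeof(SerialLog) - 1){
    SerialLog[SerialCount++] = c;
    SerialLog[SerialCount] = 0;   /* keep the log null terminated */
  }
}

/* Outputs characters until the null termination is reached */
void OutString(const char *pt){
  while(*pt){
    SCI_OutChar_model(*pt++);
  }
}
```

Calling OutString("Hello World") followed by OutString("\r\n") corresponds to the two calls in the assembly main program. The C compiler appends the terminating zero to each string literal.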
6.4 *Matrices
A matrix is a two-dimensional data structure accessed by rows and columns. Each element of a matrix is the same type and precision. In C, we create matrices using two sets of brackets.
unsigned char M[2][3]; // byte matrix with 2 rows and 3 columns
Figure 6.11 shows this byte matrix with six 8-bit elements. The figure also shows two possible ways to map the two-dimensional data structure into the linear address space of memory.
Figure 6.11 A byte matrix with two rows and three columns.
Address  Row major  Column major
$0910    M[0,0]     M[0,0]
$0911    M[0,1]     M[1,0]
$0912    M[0,2]     M[0,1]
$0913    M[1,0]     M[1,1]
$0914    M[1,1]     M[0,2]
$0915    M[1,2]     M[1,2]
With row-major allocation, the elements of each row are stored together. Let I be the row index, J be the column index, n be the number of bytes in each row (equal to the number of columns), and Base be the base address of the byte matrix. Then the address of the element at I,J is Base+n*I+J
An assembly language subroutine that reads elements from this row-major matrix is shown in Program 6.7. The matrix data is passed using call by reference, and the indices are passed using call by value. The row index (0 or 1) is passed in Register A. The column index (0, 1, or 2) is passed in Register B. The base address of the matrix is passed in Register X. The subroutine returns the value in Register A. Notice the addressing mode 1,SP+ first reads the data at SP (J is on top of the stack), then adds 1 to SP (discarding the data from the stack).
Program 6.7 Assembly function to access a two by three row-major matrix.
;Column index J in RegB, Row index I in RegA
;RegX is the base address of M[I,J]
Matrix_Read
        pshb             ;Save J on stack
        ldab #3          ;number of columns
        mul              ;3*I
        addb 1,SP+       ;3*I+J
        ldaa B,X         ;read value at M[I,J]
        rts
Checkpoint 6.9: Program 6.7 seems to have one push and no pull instructions. Is the stack balanced?
With column-major allocation, the elements of each column are stored together. Let I be the row index, J be the column index, m be the number of bytes in each column (equal to the number of rows), and Base be the base address of the byte matrix. Then the address of the element at I,J is Base+m*J+I
An assembly language subroutine that reads elements from this column-major matrix is shown in Program 6.8. Again, the matrix data is passed using call by reference, and the indices are passed using call by value. The row index (0 or 1) is passed in Register A. The column index (0, 1, or 2) is passed in Register B. The base address of the matrix is passed in Register X. The subroutine returns the value in Register A.
Program 6.8 Assembly function to access a two by three column-major matrix.
;Column index J in RegB, Row index I in RegA
;RegX is the base address of M[I,J]
Matrix_Read
        aslb             ;Reg B = 2*J
        abx              ;Reg X = base + 2*J
        ldaa A,X         ;Read a byte from base + 2*J + I
        rts
Checkpoint 6.10: Why is column-major format more efficient than row-major for this particular matrix?
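The two mappings can be compared side by side in C. This sketch computes addresses only, using the made-up base address $0910 from Figure 6.11; the function names are hypothetical.

```c
#include <stdint.h>

#define ROWS 2   /* m, number of rows */
#define COLS 3   /* n, number of columns */

/* Row major: Base + n*I + J */
uint16_t row_major_address(uint16_t base, uint16_t i, uint16_t j){
  return (uint16_t)(base + COLS * i + j);
}

/* Column major: Base + m*J + I */
uint16_t column_major_address(uint16_t base, uint16_t i, uint16_t j){
  return (uint16_t)(base + ROWS * j + i);
}
```

For element M[1,1], the row-major mapping yields $0914 and the column-major mapping yields $0913, matching the two columns of Figure 6.11.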
With a word matrix, each element requires two bytes of storage. Let I be the row index, J be the column index, n be the number of words in each row (equal to the number of columns), and Base be the base address of the word matrix. Then the address of the element at I,J is Base+2*(n*I+J)
An assembly language subroutine that accesses elements of a word matrix defined in row-major format is shown in Program 6.9. The number of columns, n, is defined with an equ pseudo-op. The subroutine returns the address of the element in Register X.
Program 6.9 Assembly function to access a word matrix defined in row-major format.
* The matrix is m rows by n columns
* ROW MAJOR Address = Base+2*(n*I+J)
*Zero-origin indexing Each element is 2 bytes
n       equ  10          ;the number of columns (can be changed)
*Input: Reg A is the row index(I)
*       Reg B is the column index(J)
*       Reg X is Base, points to the first element
*Output: Reg X points to the (I,J) element
Matrix_Access
        pshb             ;save J
        ldab #n          ;Reg B is the number of columns
        mul              ;RegD=n*I
        addb 1,SP+
        adca #0          ;RegD= n*I+J
        lsld             ;RegD= 2*(n*I+J)
        leax D,X         ;RegX= Base + 2*(n*I+J)
        rts
Checkpoint 6.11: What is the purpose of the adca #0 instruction in Program 6.9?
An assembly language subroutine that reads elements from a word matrix defined in column-major format is shown in Program 6.10. The number of rows, m, is defined with an equ pseudo-op. The subroutine returns the address of the element in Register X.
Program 6.10 Assembly function to access a word matrix defined in column-major format.
; The matrix is m rows by n columns
;COLUMN MAJOR Address = Base+2*(I+m*J)
;Zero-origin indexing Each element is 2 bytes
m       equ  10          ;the number of rows (can be changed)
;Input: Reg A is the row index(I)
;       Reg B is the column index(J)
;       Reg X is Base, points to the first element
;Output: Reg X points to the (I,J) element
Access  psha             ;save I
        ldaa #m          ;Reg A is the number of rows
        mul              ;RegD= m*J
        addb 1,SP+
        adca #0          ;RegD = m*J+I
        lsld             ;RegD = 2*(m*J+I)
        leax D,X         ;Reg X = Base + 2*(m*J+I)
        rts
Checkpoint 6.12: What is the largest possible value for n in Program 6.9 and the largest m in Program 6.10?
Example 6.5 Develop a set of driver functions to manipulate a 32 by 12 bit graphics display.
Solution Bit arrays can be used to store pixel values for graphics displays, as in Figure 6.12. Placing a 0 into a bit location will display a blank. Placing a 1 into a bit location will display that pixel. In this display, the first byte bit 7 is the top-left corner of the display, and the last byte bit 0 is the bottom-right corner. The graphical image on this 32 by 12 display will be stored in the bit array called Video. Since there are a total of 384 bits and each byte can store 8 bits, we need 48 bytes to store the entire image. In C, we define
unsigned char Video[48];
Figure 6.12 A bit matrix with 12 rows and 32 columns.
In assembly, we define the following in global RAM,
Video   rmb  48
Let I be the row index, where I ranges from 0 to 11. There are four bytes required in each row. Therefore, the starting address of row I is
Video + 4*I
Let J be the column index, where J ranges from 0 to 31. The column index specifies which byte, as well as which bit within that byte. The address of the byte containing the information at (I,J) is
Video + 4*I + (J>>3)
where the shift J>>3 is a divide by 8 using integer math without rounding. Notice that if J is less than or equal to 31, then J divided by 8 will be less than or equal to 3. Let K be the bottom three bits of J.
K = J&0x07;
A mask will specify the bit location within the byte. In C, the following array can be used
unsigned const char Masks[8]={0x80,0x40,0x20,0x10,0x08,0x04,0x02,0x01};
In assembly, this array can be defined in ROM as
Masks   fcb  $80,$40,$20,$10,$08,$04,$02,$01
Recall that K is the bottom three bits of J. For example, if K is 010₂, then we use the bit mask of $20 to access the information stored in the appropriate byte of the Video buffer
mask = Masks[K];
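Putting the byte address and the mask together, bit-level access in C could be sketched as follows. The function names are chosen for this illustration, and the code assumes I is 0 to 11 and J is 0 to 31 without checking.

```c
#include <stdint.h>

uint8_t Video[48];   /* 12 rows, 4 bytes per row */
const uint8_t Masks[8] = {0x80,0x40,0x20,0x10,0x08,0x04,0x02,0x01};

/* Set the bit at row i, column j */
void Display_SetBit(uint8_t i, uint8_t j){
  Video[4*i + (j >> 3)] |= Masks[j & 0x07];
}

/* Clear the bit at row i, column j */
void Display_ClrBit(uint8_t i, uint8_t j){
  Video[4*i + (j >> 3)] &= (uint8_t)~Masks[j & 0x07];
}

/* Returns zero if the bit is 0, nonzero if the bit is 1 */
uint8_t Display_ReadBit(uint8_t i, uint8_t j){
  return Video[4*i + (j >> 3)] & Masks[j & 0x07];
}
```

For example, row 2 and column 10 select byte 4*2 + 1 = 9 and K = 2, so the mask is $20. The parentheses around j >> 3 matter in C, because + has higher precedence than >>.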
Program 6.11 takes the row and column index values and calculates the memory address and bit mask to access that bit in the Video matrix. Access is a private function for this module. A helper function is another name for a private function, which is used inside a module but is not called by software outside the module. Conversely, the other four functions of this module are public. Functions to clear, set, and toggle bits in the Video matrix are shown in Program 6.12. For all four public functions, the parameters I and J are passed by value, and the video buffer itself is a private global within this module. A function that tests the current value within the matrix is shown in Program 6.13. In order for the image to appear on the display, there must be a hardware interface that translates the data in the video buffer onto the graphics hardware. A typical way this translation occurs is for the video buffer to exist in the display hardware itself. The software reads and writes this buffer in a similar way as described in this example. The graphics hardware is then responsible for copying the data from the buffer onto the display.
Program 6.11 A helper function to access a bit matrix.
; ********* Access ***************
; Access the Video bit at (I,J)
;Input: Reg A is the row index(I is 0 to 11)
;       Reg B is the column index(J is 0 to 31)
;Output: Reg X points to the byte of interest
;        Reg A is the Mask to access that bit
Access  lsla
        lsla             ;4*I
        pshb             ;save a copy of J
        lsrb
        lsrb
        lsrb             ;Reg B = J>>3
        aba              ;Reg A = 4*I + (J>>3)
        ldx  #Video
        tab
        abx              ;Reg X = Video + 4*I + (J>>3)
        pulb             ;Reg B = J again
        andb #$07        ;Reg B = K (bottom three bits of J)
        ldy  #Masks
        ldaa B,Y         ;Reg A = mask = Masks[K]
        rts
Program 6.12 Functions that modify the bit matrix.
; Clear the Video bit at (I,J)
;Input: Reg A is the row index(I is 0 to 11)
;       Reg B is the column index(J is 0 to 31)
Display_ClrBit
        bsr  Access
        coma             ;Not(mask) zero in bit location
        anda 0,x         ;Clear bit
        staa 0,x
        rts
; Set the Video bit at (I,J)
;Input: Reg A is the row index(I is 0 to 11)
;       Reg B is the column index(J is 0 to 31)
Display_SetBit
        bsr  Access
        oraa 0,x         ;Set bit
        staa 0,x
        rts
; Invert the Video bit at (I,J)
;Input: Reg A is the row index(I is 0 to 11)
;       Reg B is the column index(J is 0 to 31)
Display_InvBit
        bsr  Access
        eora 0,x         ;Flip bit
        staa 0,x
        rts
Program 6.13 A function that reads the bit matrix.
; Read the Video bit at (I,J)
;Input: Reg A is the row index(I is 0 to 11)
;       Reg B is the column index(J is 0 to 31)
;Output: Reg CC zero bit is the value read from the array
;        Reg A is zero or not zero depending on the bit
Display_ReadBit
        bsr  Access
        anda 0,x         ;Z=1 if bit was zero, Z=0 if bit was one
        rts
6.5 Structures
A structure has elements with different types and/or precisions. In C, we use struct to define a structure. The const modifier causes the structure to be allocated in ROM. Without the const, the C compiler will place the structure in RAM, allowing it to be dynamically changed. In the example shown in Figure 6.13, Name is a variable length ASCII string, but as you can see, we have to specify its maximum size.
const struct port{
  unsigned char AndMask;   // bits that can change
  unsigned char OrMask;    // bits that must stay high
  unsigned char *Addr;     // Port Address
  unsigned char Name[10];  // ASCII string
};
typedef const struct port portType;
portType PortT={0x15,0x82,(unsigned char *)0x0240,"PTT"};
Figure 6.13 A structure collects objects of different sizes into one object.
$F950  $15                   AndMask
$F951  $82                   OrMask
$F952  $0240                 Addr
$F954  "PTT",0,0,0,0,0,0,0   Name
Checkpoint 6.13: Most C compilers will align 16-bit elements within structures to an even address. How would Figure 6.13 have been different if the positions of OrMask and Addr had been reversed?
In Program 6.14, we can use the equ pseudo-op to make our software more readable. The subroutine Port_Out uses call by reference for the port structure and call by value for the data written to the port.
Program 6.14 Assembly language example of a structure.
AndMask  equ  0
OrMask   equ  AndMask+1
Addr     equ  OrMask+1
Name     equ  Addr+2
; Reg A = data to output
; Reg X = pointer to Port structure
Port_Out psha
         anda AndMask,x  ;modify input with andmask
         oraa OrMask,x   ;modify input with ormask
         ldy  Addr,x     ;get Port address
         staa 0,y        ;output
         leax Name,x     ;pointer to string
         jsr  OutString  ;print string
         pula
         rts
;********************************
PortT    fcb  $15,$82    ;AndMask,OrMask
         fdb  $0240      ;pointer to PTT
         fcc  "PTT"      ;string
         fcb  0,0,0,0,0,0,0
main     lds  #$4000
         movb #$FF,DDRT
         ldaa #$00       ;data
loop     ldx  #PortT     ;pointer to structure
         bsr  Port_Out
         inca
         bra  loop
If the structure resides in RAM, then it must be initialized by software at run time. In assembly language we write this initialization explicitly, whereas most C compilers will implicitly initialize variable structures.
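For comparison, the Port_Out operation of Program 6.14 might look as follows in C. This is a sketch under stated assumptions: SimPort is a hypothetical variable standing in for the port register at $0240 so that the code can run on a desktop compiler, the Name field uses char instead of unsigned char, and the OutString call is left out.

```c
#include <stdint.h>

struct port {
  uint8_t AndMask;          /* bits that can change */
  uint8_t OrMask;           /* bits that must stay high */
  volatile uint8_t *Addr;   /* port address */
  char Name[10];            /* ASCII string, maximum 9 characters plus null */
};
typedef const struct port portType;

/* Hypothetical stand-in for the I/O register (assumption for host testing). */
static volatile uint8_t SimPort;

static portType PortT = {0x15, 0x82, &SimPort, "PTT"};

/* Call by reference for the structure, call by value for the data. */
void Port_Out(portType *pt, uint8_t data) {
  data &= pt->AndMask;      /* keep only the bits allowed to change */
  data |= pt->OrMask;       /* force the bits that must stay high */
  *pt->Addr = data;         /* write to the port */
}
```

Calling Port_Out(&PortT, 0xFF) writes 0x97, because 0xFF AND 0x15 is 0x15, and 0x15 OR 0x82 is 0x97.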
6.6
*Tables A table is a collection of identically sized structures. Program 6.15 and Figure 6.14 show a table containing a simple data base. Each entry in the table records the name, life span, and the year of inauguration. The names are variable length, but a fixed size will be allocated so that each table entry will be exactly 36 bytes. The C compiler will fill the unused bytes in the Name field with zeros.
Program 6.15 A simple data base with three entries.
Figure 6.14 A table collects structures of same size into one object.
const struct entry{
  unsigned char Name[30];   // null-terminated string
  unsigned short life[2];   // birth year, year died
  unsigned short year;      // year of inauguration
};
typedef const struct entry entryType;
entryType Presidents[3]={
  {"George Washington",{1732,1799},1789},
  {"John Adams",{1735,1826},1797},
  {"Thomas Jefferson",{1743,1826},1801}
};
To access the inauguration year of the second president in C, we could execute
theyear = Presidents[1].year;
This operation in assembly is
      ldd  Presidents+SIZE+YEAR
      std  theyear
If we wanted the year the third president died in C, we could execute
theyear = Presidents[2].life[1];
This operation in assembly is
      ldd  Presidents+2*SIZE+LIFE+2
      std  theyear
Program 6.17 shows an assembly language function that prints the name of the nth president. First it calculates the address of the nth entry (Presidents+n*SIZE). In general, the next step would be to add the offset of the field (in this case NAME is zero). This program assumes SIZE*n is less than 256.
Program 6.17 A subroutine that prints the name of a president.
;Print the name of the nth entry
;Reg A is the index n ranging from 0 to 2
OutPresident
      ldx  #Presidents ;Reg X points to the table
      ldab #SIZE       ;36 bytes in each entry
      mul              ;Reg D = SIZE*n
      abx              ;Reg X = base +SIZE*n
      jsr  OutString   ;Prints name
      rts
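The address arithmetic used by the assembly code (base + n*SIZE + offset) can be written out by hand in C to show what Presidents[n].field does behind the scenes. The sketch below is illustrative only: PresidentName and PresidentYear are hypothetical helper names, Name is declared as char rather than unsigned char for convenience, and sizeof and offsetof play the roles of the SIZE, NAME, and YEAR constants.

```c
#include <stddef.h>
#include <string.h>

struct entry {
  char Name[30];            /* null-terminated string */
  unsigned short life[2];   /* birth year, year died */
  unsigned short year;      /* year of inauguration */
};
typedef const struct entry entryType;

static entryType Presidents[3] = {
  {"George Washington", {1732, 1799}, 1789},
  {"John Adams",        {1735, 1826}, 1797},
  {"Thomas Jefferson",  {1743, 1826}, 1801}
};

/* Address of the Name field of entry n: base + n*SIZE + NAME. */
const char *PresidentName(unsigned char n) {
  const unsigned char *base = (const unsigned char *)Presidents;
  return (const char *)(base + n * sizeof(entryType)
                             + offsetof(struct entry, Name));
}

/* Value of the year field of entry n: read from base + n*SIZE + YEAR. */
unsigned short PresidentYear(unsigned char n) {
  const unsigned char *base = (const unsigned char *)Presidents;
  unsigned short y;
  memcpy(&y, base + n * sizeof(entryType) + offsetof(struct entry, year),
         sizeof y);
  return y;
}
```

In normal C code one would simply write Presidents[n].Name and Presidents[n].year; the compiler generates the same calculation.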
The table, shown in Program 6.18, contains five identically formatted structures. Each structure (e.g., PORTA) contains five entries: an 8-bit ASCII character, two pointers, and two byte values. Again, the equ pseudo-ops clarify access to the table. It could be used to write an I/O port driver, separating the high-level software from the low-level hardware.
Program 6.18 A table containing information about some 9S12 I/O ports.
;ASCII character specifying port name ;Pointer to port address ;Pointer to direction register ;8-bit initial value of direction reg ;8-bit initial value to output ;Port A ;Address of PortA ;DDRA ;Initially input ;Port B ;Address of PortB ;DDRB ;Initially output=$55 ;Port J ;Address of PortJ ;DDRJ ;Initially input ;Port M ;Address of PortM ;DDRM ;Initially input ;Port T ;Address of PortT ;DDRT ;Initially output=$00
*Trees A graph is a general linked structure without limitations, see Figure 6.15. An acyclic graph is a linked structure without loops. Although there may be multiple pathways to access a node in an acyclic graph, all paths have a finite length. A tree is an acyclic graph with a single root node from which a unique path to each node can be traced. The pointers in an acyclic graph do not form closed circles.
Figure 6.15 Graphs and trees have nodes and are linked with pointers. (Panels: graph, acyclic graph, tree.)
Figure 6.16 shows that an arbitrary tree can have a variable number of leaves, while a binary tree consists of nodes with exactly two pointers (i.e., links or branches). One way to implement an arbitrary tree is to place a null-terminated list of pointers in each node. For the binary tree, each node has exactly two links, and we use a null pointer to specify that a link is not valid.
Checkpoint 6.15: Neglecting shortcuts and the StartMenu for now, what type of organization best describes the file structure on the Windows OS?
Checkpoint 6.16: Shortcuts and the StartMenu on the Windows OS allow for files and programs to be accessed in multiple ways. Observe the properties of a shortcut on your computer. Does Windows OS implement an acyclic graph?
Checkpoint 6.17: If you made an electronic dictionary where each word in the definition portion of an entry was linked to its definition, what type of structure would you have?
Figure 6.16 A tree can be constructed with only down arrows, and there is a unique path to each node. (Panels: an arbitrary tree, in which each node holds its Info and a list with 0, 1, 2, ... links; and a binary tree, in which each node holds its Info and exactly 2 links, with null marking a link that is not valid.)
A null pointer signifies the end or leaf of the tree. Since each node of a tree has exactly one pointer to it, there is a unique path from the root to each node. One application of a tree is dictionary storage, as shown in Figure 6.17. Each word is stored as a node in the tree. The position of each word in the tree is determined from its alphabetical order. In this simple dictionary, each node contains a name that is a single letter and a value that is an 8-bit number. The binary tree is sorted by name, meaning elements alphabetically before this node can be found using the first link, and elements alphabetically after this node can be found using the second link.
Figure 6.17 A binary tree is constructed so that earlier elements are to the left and later ones to the right. (Root S, $88, has left son F, $84, and right son V, $86. F has left son A, $8B. V has left son T, $8A. All other links are null.)
Program 6.19 shows the definition of the tree structure drawn in Figure 6.17. If the dictionary is static, then we can define it in ROM. If it needs to be dynamic, then it must be allocated in RAM and initialized at run time. In Program 6.19, the tree is implemented as a constant structure.
Name   equ  0         ;name of the node
Data   equ  1         ;data for this node
Left   equ  2         ;pointer to son
Right  equ  4         ;pointer to son
NULL   equ  0         ;undefined address
Root   fdb  WS        ;Pointer to top
WS     fcb  'S',$88   ;name,data
       fdb  WF        ;Left son
       fdb  WV        ;Right son
WV     fcb  'V',$86
       fdb  WT        ;WT is a left son
       fdb  NULL      ;no right son
WT     fcb  'T',$8A
       fdb  NULL      ;no children
       fdb  NULL
WF     fcb  'F',$84
       fdb  WA        ;WA is a left son
       fdb  NULL      ;no right son
WA     fcb  'A',$8B
       fdb  NULL      ;no children
       fdb  NULL
Program 6.20 presents assembly and C functions that search the binary tree. To look up a word in this dictionary, one starts at the root. The following sequence is repeated until the entry is found (success) or a null pointer is reached (failure). If the current name matches, the search quits, returning the data (its definition) at that node. If the current name does not match, then we search left or right. If the lookup word is less than the current word, go left. If the lookup word is greater than the current word, go right. The program quits with a false result if the pointer becomes null.
;Inputs: Reg A = look up letter
;Outputs: Reg A=0 if not found,
;               =data if found
; If fails RegY=>last link
Look   ldy  #Root
       ldx  0,y       ;start at root
loop   cpx  #NULL
       beq  fail
       cmpa Name,x    ;Match
       beq  found     ;Skip if found
       blo  golft
       leay Right,x   ;letter>value
       ldx  0,y       ;go right
       bra  loop
golft  leay Left,x    ;letter<value
       ldx  0,y       ;go left
       bra  loop
found  ldaa Data,x    ;return the data
       rts
fail   clra           ;not found
       rts
unsigned char Look(unsigned char letter){
  NodePtr pt = Root;         // top
  while(pt!=NULL){           // done when null
    if(pt->Value == letter){
      return(pt->Data);      // good
    }
    if(pt->Value < letter){
      pt = pt->Right;
    }
    else{
      pt = pt->Left;
    }
  }
  return 0;                  /* not in tree */
}
Program 6.20 Binary tree search functions.
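The C function in Program 6.20 relies on a node type, a NodePtr type, and a Root that are not shown with it. One possible set of definitions, matching the tree in Figure 6.17, is sketched below. The field names Value, Data, Left, and Right come from the C code; the node variable names WS, WF, WV, WA, and WT and the layout of the struct are assumptions made for illustration.

```c
#include <stddef.h>

struct Node {
  unsigned char Value;         /* name of the node, a single letter */
  unsigned char Data;          /* 8-bit data for this node */
  const struct Node *Left;     /* names alphabetically before this one */
  const struct Node *Right;    /* names alphabetically after this one */
};
typedef const struct Node NodeType;
typedef NodeType *NodePtr;

/* Leaves are defined first so that parents can point to them. */
static NodeType WA = {'A', 0x8B, NULL, NULL};
static NodeType WT = {'T', 0x8A, NULL, NULL};
static NodeType WF = {'F', 0x84, &WA, NULL};
static NodeType WV = {'V', 0x86, &WT, NULL};
static NodeType WS = {'S', 0x88, &WF, &WV};
#define Root (&WS)

/* Same search as Program 6.20: returns the data, or 0 if not in the tree. */
unsigned char Look(unsigned char letter) {
  NodePtr pt = Root;
  while (pt != NULL) {
    if (pt->Value == letter) {
      return pt->Data;
    }
    if (pt->Value < letter) {
      pt = pt->Right;
    } else {
      pt = pt->Left;
    }
  }
  return 0;
}
```

With these definitions Look('T') visits S, V, and T and returns 0x8A, while Look('J') visits S and F, reaches a null link, and returns 0.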
In order to add and remove nodes at run time, the tree must be defined in RAM. Program 6.21 shows how to insert a new word into the dictionary. One first searches for the word (the search should fail), then changes the null pointer to point to the new node. If the search fails in the previous Look subroutine, Reg Y contains the address of the null pointer to be changed.
Program 6.21 Program to add a node to a binary tree.
; Inputs : Reg Y points to a new word to be added to the dictionary
; the new word is already somewhere in memory formatted e.g.,
;      fcb 'J',6
;      fdb NULL
;      fdb NULL
New    pshy
       ldaa 0,Y     ;Reg A is the name of the new word
       bsr  Look
       pulx         ;RegX points to new node to add
       tsta
       bne  ok      ;skip if already defined
       stx  0,Y     ;link into existing tree
ok     rts
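The same insertion can be sketched in C by keeping a pointer to the link being followed, which plays the role of Reg Y in the assembly version. This is an illustration under stated assumptions: FindLink is a hypothetical helper, the nodes live in RAM (no const), and the caller supplies the storage for the new node. Unlike the assembly code, which tests the returned data for zero, this version tests the link itself, so a node whose data is 0 can still be detected as present.

```c
#include <stddef.h>

struct Node {
  unsigned char Value;     /* name of the node, a single letter */
  unsigned char Data;      /* 8-bit data for this node */
  struct Node *Left;
  struct Node *Right;
};

static struct Node *Root = NULL;   /* RAM pointer to the top of the tree */

/* Returns the address of the link that points to the node named letter,
   or the address of the null link where such a node would be attached. */
static struct Node **FindLink(unsigned char letter) {
  struct Node **link = &Root;
  while (*link != NULL && (*link)->Value != letter) {
    link = (letter < (*link)->Value) ? &(*link)->Left : &(*link)->Right;
  }
  return link;
}

/* Adds node to the tree. Returns 1 if linked in, 0 if already defined. */
int New(struct Node *node) {
  struct Node **link = FindLink(node->Value);
  if (*link != NULL) {
    return 0;              /* name already in the dictionary */
  }
  node->Left = NULL;
  node->Right = NULL;
  *link = node;            /* change the null pointer to point to the node */
  return 1;
}

/* Returns the data for letter, or 0 if it is not in the tree. */
unsigned char Look(unsigned char letter) {
  struct Node *pt = *FindLink(letter);
  return (pt != NULL) ? pt->Data : 0;
}
```

Because the tree is never rebalanced, its shape depends on the order in which nodes are added.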
Figure 6.18 shows the binary tree as the nodes J, U, and G are added to the dictionary. Notice that after J and U are added, there is something inefficient about this tree of depth 4 and size 7. A binary tree of depth $n$ is capable of holding $2^n - 1$ nodes. A binary tree is full if it has depth $n$ and contains from $2^{n-1}$ to $2^n - 1$ nodes. The tree in Figure 6.18 after J and U are added is not full; the other three trees are full.
Figure 6.18 Nodes are added to a binary tree such that the alphabetical order is maintained. (Panels: initial tree with S, F, V, A, and T; after adding J; after adding U; after adding G.)
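The capacity and fullness rules can be checked with a few lines of C. The sketch below assumes the definitions used in the text (a tree of depth n holds at most 2^n - 1 nodes, and is full when it holds between 2^(n-1) and 2^n - 1 nodes); MaxNodes and IsFull are hypothetical helper names, and the depth is assumed to be smaller than the number of bits in an unsigned long.

```c
/* Largest number of nodes in a binary tree of the given depth: 2^n - 1. */
unsigned long MaxNodes(unsigned int depth) {
  return (1UL << depth) - 1UL;
}

/* Returns 1 if a tree with this depth and size is full, 0 otherwise. */
int IsFull(unsigned int depth, unsigned long size) {
  if (depth == 0) {
    return size == 0;                    /* the empty tree */
  }
  return size >= (1UL << (depth - 1))    /* at least 2^(n-1) nodes */
      && size <= MaxNodes(depth);        /* at most 2^n - 1 nodes */
}
```

Applied to Figure 6.18: the initial tree (depth 3, size 5) and the tree after J is added (depth 3, size 6) are full, the tree after U is added (depth 4, size 7) is not, and the tree after G is added (depth 4, size 8) is full again.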
This may seem like a lot of trouble for such a simple problem. However, the search time for a binary tree increases as $\log_2$ of the size of the dictionary (more precisely, the search time increases linearly with the depth of the tree). For a simple linear structure (e.g., table or linked list), the search time increases linearly with the dictionary size. When the dictionary is millions of words, the time savings can be extraordinary. There are similar savings in the insertion and deletion times. The dynamic efficiency (execution speed) is enhanced at the cost of static efficiency (memory storage).
Checkpoint 6.18: Consider the problem of designing a large address book where each entry has a first name, a last name, and an address field. You wish to be able to search the database both by first name and by last name. How do you organize the structure?
6.8 Finite-State Machines with Statically Allocated Linked Structures
6.8.1 Abstraction
Software abstraction allows us to define a complex problem with a set of basic abstract principles. If we can construct our software system using these abstract building blocks, then we have a better understanding of both the problem and its solution. This is because we can separate what we are doing (policies) from the details of how we are getting it done (mechanisms). This separation also makes it easier to optimize. Abstraction provides for a proof of correct function, and simplifies both extensions and customization.
The abstraction presented in this section is the Finite-State Machine (FSM). The abstract principles of FSM development are the inputs, outputs, states, and state transitions. The FSM state graph defines the time-dependent relationship between its inputs and outputs. If we can take a complex problem and map it into an FSM model, then we can solve it with simple FSM software tools. Our FSM software implementation will be easy to understand, debug, and modify. Other examples of software abstraction include Proportional Integral Derivative digital controllers, fuzzy logic digital controllers, neural networks, and linear systems of differential equations. In each case, the problem is mapped into a well-defined model with a set of abstract yet powerful rules. Then, the software solution is a matter of implementing the rules of the model. In our case, once we prove our software correctly solves one FSM, then we can make changes to the state graph and be confident that our software solution correctly implements the new FSM.
The FSM controller employs a well-defined model or framework with which we solve our problem. The state graph will be specified using either a linked or table data structure. An important aspect of this method is to create a one-to-one mapping from the state graph into the data structure.
The three advantages of this abstraction are (1) it can be faster to develop because many of the building blocks preexist; (2) it is easier to debug (prove correct) because it separates conceptual issues from implementation; and (3) it is easier to change. In a Moore FSM, the output value depends only on the current state, and the inputs affect the state transitions. On the other hand, the outputs of a Mealy FSM depend both on the current state and the inputs. When designing an FSM, we begin by defining what constitutes a state. In a simple system like a single intersection traffic light, a state might be defined as the pattern of lights (i.e., which lights are on and which are off). In a more sophisticated traffic controller, what it means to be in a state might also include information about traffic volume at this and other adjacent intersections. The next step is to make a list of the various states in which the system might exist. As in all designs, we add outputs so the system can affect the external environment and inputs so the system can collect information about its environment or receive commands as needed.
The execution of a Moore FSM repeats this sequence over and over:
1. Perform output, which depends on the current state
2. Wait a prescribed amount of time (optional)
3. Input
4. Go to next state, which depends on the input and the current state
The execution of a Mealy FSM repeats this sequence over and over:
1. Wait a prescribed amount of time (optional)
2. Input
3. Perform output, which depends on the input and the current state
4. Go to next state, which depends on the input and the current state
There are other possible execution sequences. Therefore, it is important to document the sequence before the state graph is drawn. The high-level behavior of the system is defined by the state graph. The states are drawn as circles. Descriptive state names help explain what the machine is doing. Arrows are drawn from one state to another and labeled with the input value causing that state transition.
Observation: If the machine is such that a specific output value is necessary "to be a state", then a Moore implementation will be more appropriate.
Observation: If the machine is such that no specific output value is necessary "to be a state", but rather the output is required to transition the machine from one state to the next, then a Mealy implementation will be more appropriate.
A linked structure consists of multiple identically structured nodes. Each node of the linked structure defines one state. One or more of the entries in the node is a pointer (or link) to other nodes. In an embedded system, we usually use statically allocated fixed-size linked structures, which are defined at compile time and exist throughout the life of the software. In a simple embedded system, the state graph is fixed, so we can store the linked data structure in nonvolatile memory. For complex systems where the control functions change dynamically (e.g., the state graph itself varies over time), we could implement dynamically allocated linked structures, which are constructed at run time and where the number of nodes can grow and shrink in time. We can also use a table structure to define the state graph, which consists of multiple contiguous, identically structured elements. Each element of the table defines one state. One or more of the entries is an index to other elements. An important factor when implementing FSMs is that there should be a clear, one-to-one mapping between the FSM state graph and the data structure; i.e., there should be one element of the structure for each state. If each state has four arrows, then each node of the linked structure should have four links.
6.8.2 Moore Finite-State Machines
In a Moore FSM, the outputs are a function of only the current state. In contrast, the outputs of a Mealy FSM are a function of both the input and the current state. Often, in a Moore FSM, the specific output pattern defines what it means to be in the current state. In the first example, the inputs and outputs are simple binary numbers read from and written to a parallel port.
Example 6.6 Design a traffic-light controller for the intersection of two equally busy one-way streets. The goal is to maximize traffic flow, minimize waiting time at a red light, and avoid accidents.
Solution The intersection has two one-way roads with the same amount of traffic: North and East, as shown in Figure 6.19. Controlling traffic is a good example, because we all know what is supposed to happen at the intersection of two busy one-way streets. We begin the design by defining what constitutes a state. In this system, a state describes which road has authority to cross the intersection. The basic idea, of course, is to prevent southbound cars from entering the intersection at the same time as westbound cars. In this system, the light pattern defines which road has right of way over the other. Since an output pattern to the lights is necessary to remain in a state, we will solve this system with a Moore FSM. It will have two inputs (car sensors on North and East roads) and six outputs (one for each light in the traffic signal). The six traffic lights are interfaced to Port T bits 7 to 2 and the two sensors are connected to Port T bits 1 to 0, such that
PT1=0, PT0=0 means no cars exist on either road
PT1=0, PT0=1 means there are cars on the East road
PT1=1, PT0=0 means there are cars on the North road
PT1=1, PT0=1 means there are cars on both roads
Figure 6.19 Traffic light interface. (The 9S12 reads the two car sensors on PT1 and PT0 and drives the red, yellow, and green lights of the North and East roads on PT7 to PT2.)
The next step in designing the FSM is to create some states. Again, the Moore implementation was chosen because the output pattern (which lights are on) defines which state we are in. Each state is given a symbolic name:
goN,   PT7 to 2 = 100001 makes it green on North and red on East
waitN, PT7 to 2 = 100010 makes it yellow on North and red on East
goE,   PT7 to 2 = 001100 makes it red on North and green on East
waitE, PT7 to 2 = 010100 makes it red on North and yellow on East
The output pattern for each state is drawn inside the state circle. The time to wait for each state is also included. How the machine operates will be dictated by the input-dependent state transitions. We create decision rules defining what to do for each possible input and for each state. For this design we can list heuristics describing how the traffic light is to operate:
If no cars are coming, we will stay in a green state, but which one doesn't matter.
To change from green to red, we will implement a yellow light of exactly 5 seconds.
Green lights will last at least 30 seconds.
If cars are only coming in one direction, we will move to and stay green in that direction.
If cars are coming in both directions, we will cycle through all four states.
Before we draw the state graph, we need to decide on the sequence of operations.
1. Initialize timer and direction registers
2. Specify initial state
3. Perform FSM controller
   a) Output to traffic lights, which depends on the state
   b) Delay, which depends on the state
   c) Input from sensors
   d) Change states, which depends on the state and the input
We implement the heuristics by defining the state transitions, as illustrated in Figure 6.20. Instead of using a graph to define the finite-state machine, we could have used a table, as shown in Table 6.4.
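Figure 6.20 Graphical form of a Moore FSM that implements a traffic light. (Each state circle shows the state name, the output pattern, and the wait time in seconds, e.g., goN, 100001, 30. Each arrow is labeled with the input values that cause that transition.)
Table 6.4 Tabular form of a Moore FSM that implements a traffic light.
State   Output   Wait time (s)   Next if input is 00   01      10      11
goN     100001   30              goN                   waitN   goN     waitN
waitN   100010   5               goE                   goE     goE     goE
goE     001100   30              goE                   goE     waitE   waitE
waitE   010100   5               goN                   goN     goN     goN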
The next step is to map the FSM graph onto a data structure that can be stored in EEPROM. Program 6.22 uses a linked data structure, where each state is a node, and state transitions are defined as pointers to other nodes. The four Next parameters define the input-dependent state transitions. The wait times are defined in the software as fixed-point decimal numbers with units of 0.01 seconds, giving a range of 10 ms to about 10 minutes. Using good labels makes the program easier to understand; in other words, goN is more descriptive than &fsm[0]. The main program begins by specifying the Port T bits 1 and 0 to be inputs. The initial state is defined as goN. The main loop of our controller first outputs the desired light pattern to the six LEDs, waits for the specified amount of time, reads the sensor inputs from Port T, then switches to the next state depending on the input data. The timer functions were presented earlier as Program 4.5. The function Timer_Wait10ms will wait 10 ms times the parameter in RegY, and not destroy Registers D or X. We could have eliminated the two shift-left instructions by storing the data in the structure already shifted.
;Linked data structure
       org  $4000    ;Put in ROM
OUT    equ  0        ;offset for output
WAIT   equ  1        ;offset for time
NEXT   equ  3        ;offset for next
goN    fcb  $21      ;North green, East red
       fdb  3000     ;30sec
       fdb  goN,waitN,goN,waitN
waitN  fcb  $22      ;North yellow, East red
       fdb  500      ;5sec
       fdb  goE,goE,goE,goE
goE    fcb  $0C      ;North red, East green
       fdb  3000     ;30 sec
       fdb  goE,goE,waitE,waitE
waitE  fcb  $14      ;North red, East yellow
       fdb  500      ;5sec
       fdb  goN,goN,goN,goN
Main   lds  #$4000   ;stack init
       bsr  Timer_Init ;enable TCNT
       ldaa #$FC     ;PT7-2 are lights
       staa DDRT     ;PT1-0 are sensors
       ldx  #goN     ;State pointer
FSM    ldab OUT,x    ;Output value
       lslb
       lslb          ;line up with 7-2
       stab PTT      ;set lights
       ldy  WAIT,x   ;Time delay
       bsr  Timer_Wait10ms
       ldab PTT      ;Read input
       andb #$03     ;just bits 1,0
       lslb          ;2 bytes/address
       abx           ;add 0,2,4,6
       ldx  NEXT,x   ;Next state
       bra  FSM
       org  $FFFE
       fdb  Main     ;reset vector
// Linked data structure
const struct State {
  unsigned char Out;
  unsigned short Time;
  const struct State *Next[4];};
typedef const struct State STyp;
#define goN   &FSM[0]
#define waitN &FSM[1]
#define goE   &FSM[2]
#define waitE &FSM[3]
STyp FSM[4]={
 {0x21,3000,{goN,waitN,goN,waitN}},
 {0x22, 500,{goE,goE,goE,goE}},
 {0x0C,3000,{goE,goE,waitE,waitE}},
 {0x14, 500,{goN,goN,goN,goN}}};
void main(void){
  STyp *Pt;  // state pointer
  unsigned char Input;
  Timer_Init();
  DDRT = 0xFC;   // lights and sensors
  Pt = goN;
  while(1){
    PTT = Pt->Out<<2;   // set lights
    Timer_Wait10ms(Pt->Time);
    Input = PTT&0x03;   // read sensors
    Pt = Pt->Next[Input];
  }
}
Program 6.22 Linked data structure implementation of the traffic-light controller.
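Because the policy lives entirely in the data structure, the state graph can be exercised on a desktop computer before it is run on the 9S12. The sketch below copies the linked structure of Program 6.22 and replaces the port accesses and the delay with two small helper functions. FSM_Lights and FSM_Next are hypothetical names introduced here; they are not part of the original program.

```c
#include <stdint.h>

struct State {
  uint8_t Out;                   /* light pattern, not yet shifted to PT7-2 */
  uint16_t Time;                 /* wait time in units of 10 ms */
  const struct State *Next[4];   /* next state for inputs 00, 01, 10, 11 */
};
typedef const struct State STyp;

#define goN   (&FSM[0])
#define waitN (&FSM[1])
#define goE   (&FSM[2])
#define waitE (&FSM[3])

static STyp FSM[4] = {
  {0x21, 3000, {goN, waitN, goN, waitN}},
  {0x22,  500, {goE, goE, goE, goE}},
  {0x0C, 3000, {goE, goE, waitE, waitE}},
  {0x14,  500, {goN, goN, goN, goN}}
};

/* Value the controller would write to PTT in this state. */
uint8_t FSM_Lights(STyp *pt) {
  return (uint8_t)(pt->Out << 2);
}

/* Next state, given the 2-bit sensor value read from PT1-0. */
STyp *FSM_Next(STyp *pt, uint8_t input) {
  return pt->Next[input & 0x03];
}
```

Starting in goN with cars on both roads (input 3), repeated calls to FSM_Next visit waitN, goE, waitE, and then goN again, which matches the heuristic that the machine cycles through all four states.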
Program 6.23 implements the same traffic-light controller using a table data structure. In the linked data structure implementation, the Next parameters contained 16-bit pointers to the next state. In the table implementation, the Next parameters contain 8-bit indices specifying the index of the next state. In this machine, the Next field will be 0, 1, 2, or 3. Although each state only requires 7 bytes of storage, 8 bytes will be allocated to simplify the address calculations (it is easier to multiply by 8 than to multiply by 7).
;Table structure
       org  $4000    ; Put in ROM
OUT    equ  1        ;offset for output
WAIT   equ  2        ;offset for time
NEXT   equ  4        ;offset for next
goN    equ  0        ;North green, East red
Fsm    fdb  $21,3000 ;30sec
       fcb  goN,waitN,goN,waitN
waitN  equ  1        ;North yellow, East red
       fdb  $22,500  ;5sec
       fcb  goE,goE,goE,goE
goE    equ  2        ;North red, East green
       fdb  $0C,3000 ;30 sec
       fcb  goE,goE,waitE,waitE
waitE  equ  3        ;North red, East yellow
       fdb  $14,500  ;5sec
       fcb  goN,goN,goN,goN
Main   lds  #$4000   ;stack init
       bsr  Timer_Init ;enable TCNT
       ldaa #$FC     ;PT7-2 are lights
       staa DDRT     ;PT1-0 are sensors
       ldab #goN     ;State number n
FSM    ldx  #Fsm
       tba
       lsla
       lsla
       lsla          ;8*n
       leax a,x      ;Fsm[n]
       ldaa OUT,x    ;Output value
       lsla
       lsla          ;line up with 7-2
       staa PTT      ;set lights
       ldy  WAIT,x   ;Time delay
       bsr  Timer_Wait10ms
       ldaa PTT      ;Read input
       anda #$03     ;just bits 1,0
       leax a,x      ;add 0,1,2,3
       ldab NEXT,x   ;Next state
       bra  FSM
       org  $FFFE
       fdb  Main     ;reset vector
// Table implementation
const struct State {
  unsigned char Out;
  unsigned short Time;
  unsigned char Next[4];};
typedef const struct State STyp;
#define goN   0
#define waitN 1
#define goE   2
#define waitE 3
STyp FSM[4]={
 {0x21,3000,{goN,waitN,goN,waitN}},
 {0x22, 500,{goE,goE,goE,goE}},
 {0x0C,3000,{goE,goE,waitE,waitE}},
 {0x14, 500,{goN,goN,goN,goN}}};
void main(void){
  unsigned char n;  // state number
  unsigned char Input;
  Timer_Init();
  DDRT = 0xFC;      // lights and sensors
  n = goN;
  while(1){
    PTT = FSM[n].Out<<2;   // set lights
    Timer_Wait10ms(FSM[n].Time);
    Input = PTT&0x03;      // read sensors
    n = FSM[n].Next[Input];
  }
}
Program 6.23 Table data structure implementation of the traffic light controller.
Observation: The table implementation requires less memory space for the FSM data structure, but the pointer implementation will run faster.
In order to make it easier to understand, which will simplify verification and modification, we have made a one-to-one correspondence between the state graph in Figure 6.20 and the fsm[4] data structure in Programs 6.22 and 6.23. Notice also how this implementation separates the civil engineering policies (the data structure specifies what the machine does) from the computer engineering mechanisms (the executing software specifies how it is done). Once we have proven the executing software to be operational, we can modify the policies and be confident that the mechanisms will still work. When an accident occurs, we can blame the civil engineer who designed the state graph.
On microcontrollers that have EEPROM, we can place the FSM data structure in EEPROM. This allows us to make minor modifications to the finite-state machine (add/delete states or change input/output values) by changing the data structure. In this way, small modifications/upgrades/options to the finite-state machine can be made by reprogramming the EEPROM while reusing the hardware.
The FSM approach makes it easy to change. To change the wait time for a state, we simply change the value in the data structure. To add more states (e.g., put a red/red state after each yellow state, which will reduce accidents caused by bad drivers running the yellow light), we simply increase the size of the fsm[] structure and define the Out, Time, and Next fields for these new states. To add more output signals (e.g., walk and left-turn lights), we simply increase the precision of the Out field. To add two more input lines (e.g., wait button, left-turn car sensor), we increase the size of the Next field to Next[16]. Because now there are four input lines, there are 16 possible combinations, where each input possibility requires a Next value specifying where to go if this combination occurs. In this simple scheme, the size of the Next[] field will be 2 raised to the power of the number of input signals.
Checkpoint 6.19: Why is it good to use labels for the states? E.g., goN is better than &fsm[0].
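As an illustration of adding states, the sketch below extends the linked structure with a red/red state after each yellow state. Several details are assumptions made for this example and do not come from the original design: the state names stopN and stopE, the 1-second duration of the red/red states, and the helper FSM_IsSafe. The red/red pattern 100100 (0x24) combines the red North and red East bits used by the four original output patterns.

```c
#include <stdint.h>

struct State {
  uint8_t Out;                   /* light pattern, not yet shifted to PT7-2 */
  uint16_t Time;                 /* wait time in units of 10 ms */
  const struct State *Next[4];   /* next state for inputs 00, 01, 10, 11 */
};
typedef const struct State STyp;

#define goN   (&FSM[0])
#define waitN (&FSM[1])
#define stopN (&FSM[2])   /* hypothetical red/red state after North yellow */
#define goE   (&FSM[3])
#define waitE (&FSM[4])
#define stopE (&FSM[5])   /* hypothetical red/red state after East yellow */

static STyp FSM[6] = {
  {0x21, 3000, {goN, waitN, goN, waitN}},
  {0x22,  500, {stopN, stopN, stopN, stopN}},
  {0x24,  100, {goE, goE, goE, goE}},        /* both red, 1 s (assumed) */
  {0x0C, 3000, {goE, goE, waitE, waitE}},
  {0x14,  500, {stopE, stopE, stopE, stopE}},
  {0x24,  100, {goN, goN, goN, goN}}         /* both red, 1 s (assumed) */
};

/* Checks the policy in the data: whenever one road shows green or yellow,
   the other road must show red. Returns 1 if every state passes. */
int FSM_IsSafe(void) {
  unsigned int i;
  for (i = 0; i < sizeof(FSM) / sizeof(FSM[0]); i++) {
    uint8_t out = FSM[i].Out;
    int northMoving = (out & 0x03) != 0;   /* North green or yellow */
    int eastMoving  = (out & 0x18) != 0;   /* East green or yellow */
    int northRed    = (out & 0x04) != 0;
    int eastRed     = (out & 0x20) != 0;
    if (northMoving && !eastRed) return 0;
    if (eastMoving && !northRed) return 0;
  }
  return 1;
}
```

The executing loop of Program 6.22 does not change at all; only the data structure grows from four entries to six.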
Observation: In order to make the FSM respond quicker, we could implement a time-delay function that returns immediately if an alarm condition occurs. If no alarm exists, it waits the specified delay.
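One way such a delay function could be structured is sketched below. Everything here is hypothetical: Timer_WaitOrAlarm is not one of the timer functions presented earlier, and the alarm check and the 10-ms wait are passed in as function pointers so that the logic can be tried on a desktop computer. SimAlarm and SimWait10ms are simulation stand-ins, not hardware code.

```c
#include <stdint.h>

typedef int  (*AlarmFn)(void);   /* returns nonzero if an alarm exists */
typedef void (*TickFn)(void);    /* waits one 10-ms period */

/* Waits up to ticks*10 ms, checking for an alarm before each period.
   Returns 0 if the full delay elapsed, or the number of 10-ms periods
   that were skipped because an alarm occurred. */
uint16_t Timer_WaitOrAlarm(uint16_t ticks, AlarmFn alarm, TickFn wait10ms) {
  while (ticks > 0) {
    if (alarm()) {
      break;                /* return immediately on alarm */
    }
    wait10ms();
    ticks--;
  }
  return ticks;
}

/* Simulation stand-ins (assumptions for host testing only). */
static uint16_t Elapsed;            /* simulated time in 10-ms periods */
static uint16_t AlarmAt = 0xFFFF;   /* simulated time at which the alarm starts */
static int  SimAlarm(void)    { return Elapsed >= AlarmAt; }
static void SimWait10ms(void) { Elapsed++; }
```

The FSM loop could use the return value to choose a special next state when the delay was cut short, which keeps the alarm policy in the data structure rather than in the delay code.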
6.8.3 Mealy Finite-State Machines
In a Mealy FSM, the outputs depend on both the input and the current state. The state transition arrows in a Mealy FSM specify both the output and the next state. In general, we employ a Mealy machine when the output is needed to cause the state to change.
Example 6.7 Design a four-legged trotting robot. The robot has four independent legs and a trot/stop switch. The machine should trot in a steady, straight line when the switch is pressed, and stop when the switch is released.
Solution The solution mimics the behavior of a trotting horse. Each leg is controlled by four motors, and the 16 motors are connected to Ports A and B. Figure 6.21 shows the eight motors for the right front and right rear legs connected to Port A. There is a second set of eight motors for the left front and left rear legs connected to Port B. Each motor affects the leg like a muscle group in the horse. For example, there is one muscle group that causes the horse's leg to move forward and a separate muscle group that causes the leg to move backward. Just like the horse, we will not activate the forward and backward motors at the same time. When the software outputs a digital high, the TIP120 Darlington transistor goes into its active state, and power is applied to the motor. We will assume it takes 0.5 second for a motor to cause its intended motion. For example, assume the front left leg is initially down. If the software makes PB5 equal to one, waits 0.5 second, then puts PB5 back to zero, the front left leg will rise. We begin the design by defining what constitutes a state. In this system, a state describes the position of the four legs. Each leg can be in one of four positions: up&forward, up&back, down&forward, and down&back. Since there are four independent legs, a total of 4^4 = 256 configurations are possible.
Figure 6.21 Each of the four legs has four motors that can move the leg back, forward, up, and down. The right legs are shown here connected to Port A. The left legs employ similar circuits connected to Port B. (Schematic: each port pin drives a TIP120 transistor through a 1 kΩ resistor to switch a +12 V motor. Front right leg: PA7 back, PA6 forward, PA5 up, PA4 down. Rear right leg: PA3 back, PA2 forward, PA1 up, PA0 down. The Trot switch is connected to PT0 with a 10 kΩ resistor and +5 V. Within each 4-bit group, 8 = back, 4 = forward, 2 = up, 1 = down.)
However, some of these configurations, like all four legs up, will not be needed. A very simple gait (modeled after a trotting horse) can be described with just four states, as shown in Figure 6.22. Since an output pattern to the motors is necessary to cause a change in state, we will solve this system with a Mealy FSM. It will have one input (Trot=1 and Stop=0) and 16 outputs (four motor commands for each leg). At any given time only two legs are touching the ground, so the machine will need to balance. Each leg repeats a 4-step cycle: up, forward, down, back. It is during the backward motion that force is applied to the ground, causing the robot to move forward. Before we draw the state graph, we need to decide on the sequence of operations:
1. Initialize timer and direction registers
2. Specify initial state
3. Perform FSM controller
   a) Input from sensors
   b) Output to motors, which depends on the input and the state
   c) Delay, which depends on the state
   d) Turn the motors off
   e) Change states, which depends on the state and the input
Figure 6.22 A Mealy FSM. (Each arrow is labeled input/output. In every state, 0/0000 stays in the same state. With input 1: Trot1 outputs 8484 and goes to Trot2, Trot2 outputs 2121 and goes to Trot3, Trot3 outputs 4848 and goes to Trot4, and Trot4 outputs 1212 and goes to Trot1. The four output digits are right front, right rear, left front, and left rear, where 8 = back, 4 = forward, 2 = up, 1 = down. The machine waits 0.5 sec after each output.)
The next step is to map the FSM graph onto a data structure that can be stored in EEPROM. Program 6.24 uses a linked data structure, where each state is a node, and state transitions are defined as pointers to other nodes. The two Next parameters define the input-dependent state transitions. The wait time in this machine is 0.5 seconds. The main program begins by specifying the Ports A and B to be outputs, and PT0 to be an input. Since Port A is at location $0000 and Port B is at location $0001, these
two 8-bit ports can be considered as one 16-bit port. This allows the software to set all 16 outputs at the same time. The initial state is defined as Trot1. Our controller software first inputs, then outputs depending on the input and state, waits for the specified amount of time, stops all motors, then switches to the next state depending on the input data. The function Timer_Wait1ms will wait 1 ms times the parameter in RegY and not destroy RegX.
        org  $4000          ;Put in ROM
PORTAB  equ  $0000          ;16-bit Port A and B
Time    equ  0              ;time to wait in ms
Out     equ  2              ;output if input=0,1
Next    equ  6              ;next if input=0,1
Trot1   fdb  500            ;Time to wait in ms
        fdb  0,$8484        ;Outputs if inputs=0,1
        fdb  Trot1,Trot2    ;Next if input=0,1
Trot2   fdb  500            ;Time to wait in ms
        fdb  0,$2121        ;Outputs if inputs=0,1
        fdb  Trot2,Trot3    ;Next if input=0,1
Trot3   fdb  500            ;Time to wait in ms
        fdb  0,$4848        ;Outputs if inputs=0,1
        fdb  Trot3,Trot4    ;Next if input=0,1
Trot4   fdb  500            ;Time to wait in ms
        fdb  0,$1212        ;Outputs if inputs=0,1
        fdb  Trot4,Trot1    ;Next if input=0,1
main    lds  #$4000
        movb #$FF,DDRA      ;Right legs
        movb #$FF,DDRB      ;Left legs
        bclr DDRT,#$01      ;Trot switch
        jsr  Timer_Init
        ldx  #Trot1         ;Reg X => state
loop    ldaa PTT
        anda #$01           ;0,1
        lsla                ;0,2
        ldy  Time,X         ;Time to wait
        leax a,x
        ldd  Out,x
        std  PORTAB         ;start motors
        jsr  Timer_Wait1ms  ;wait in ms
        movw #0,PORTAB      ;stop motors
        ldx  Next,X         ;next
        bra  loop
        org  $FFFE
        fdb  main
struct State{
  unsigned short Time;      // wait in ms
  unsigned short Out[2];    // if input=0,1
  struct State *Next[2];    // if input=0,1
};
typedef struct State StateType;
typedef StateType * StatePtr;
#define Trot1 &fsm[0]
#define Trot2 &fsm[1]
#define Trot3 &fsm[2]
#define Trot4 &fsm[3]
StateType fsm[4]={
  {500,{0,0x8484},{ Trot1, Trot2}},
  {500,{0,0x2121},{ Trot2, Trot3}},
  {500,{0,0x4848},{ Trot3, Trot4}},
  {500,{0,0x1212},{ Trot4, Trot1}}
};
void main(void){
  StatePtr Pt;              // Current State
  unsigned char Input;
  Pt = Trot1;               // Initial State
  DDRA = 0xFF;              // Right legs
  DDRB = 0xFF;              // Left legs
  DDRT &= ~0x01;            // Trot switch
  Timer_Init();
  while(1){
    Input = PTT&0x01;          // 0 or 1
    PORTAB = Pt->Out[Input];   // output
    Timer_Wait1ms(Pt->Time);   // wait
    PORTAB = 0;                // motors off
    Pt = Pt->Next[Input];      // next
  }
}
Program 6.24 Mealy FSM.
6.8.4 Functional Abstraction within Finite-State Machines
In the previous examples, the input was obtained by simply reading a parallel port. Similarly, the output was performed by writing to a parallel port. However, finite-state machines can be used in systems where the input and output processes are more complex. In this section, we will develop FSMs where the input is obtained by calling a function, which returns a number to be used by the FSM controller. Similarly, the output process will involve calling a function. The use of function calls adds a layer of abstraction between the high-level FSM and the low-level I/O occurring at the ports.
Example 6.8 Design a vending machine with two outputs (soda, change) and two inputs (dime, nickel).
Solution This vending machine example illustrates additional flexibility that we can build into our FSM implementations. In particular, rather than simple digital inputs, we will create an input subroutine that returns the current values of the inputs. Similarly, rather than simple digital outputs, we will implement general functions for each state. We could have solved this particular vending machine using the approach in the previous examples, but this approach provides an alternative mechanism when the input and/or output operations become complex. Our simple vending machine has two coin sensors: one for dimes and one for nickels. When a coin falls through a slot in the front of the machine, a sensor (modeled by an SPST switch) makes an electrical connection between 5 V and a Port A input, as in Figure 6.23. If the digital input is high (1), there is a coin currently falling through the slot. When a coin is inserted into the machine, the sensor goes high, then low. Because of the nature of vending machines, we will assume a nickel and a dime cannot be present at the same time. To create the soda and change dispensers, we will interface two solenoids to Port B. The coil current of the solenoids is less than 40 mA, so we can use the 7406 open collector driver. For example, if the software makes PB0 high, waits 10 ms, then makes PB0 low, one soda will be dispensed.
Figure 6.23 A simulated vending machine interfaced to a Freescale 9S12.
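[Circuit: on the 9S12 input port, PA1 is the dime sensor and PA0 is the nickel sensor, each a switch to +5 V with a 10 kΩ resistor. On the output port, PB1 drives the change solenoid and PB0 drives the soda solenoid through a 7406; each solenoid is connected to +12 V with a 1N914 diode.]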
We need to decide on the sequence of operations before we draw the state graph:
1. Initialize timer and direction registers
2. Specify initial state
3. Perform FSM controller
   a) Call an output function, which depends on the state
   b) Delay, which depends on the state
   c) Call an input subroutine to get the status of the coin sensors
   d) Change states, which depends on the state and the input
Figure 6.24 shows the Moore FSM that implements the vending machine. A soda costs 15 cents, and the machine accepts nickels and dimes. We have an input sensor to detect nickels (bit 0) and an input sensor to detect dimes (bit 1). We choose the wait time in each state to be 20 ms, which is smaller than the time it takes the coin to pass by the sensor. Waiting in each state will debounce the sensor, preventing multiple counting of a single event. Notice that we wait in all states, because the sensor may bounce both on touch and release. Each state also has a function to execute. The function Soda will trigger the Port B output so that a soda is dispensed. Similarly, the function Change will trigger the Port B output so that a nickel is returned. The M states refer to the amount of collected money. When we are in a W state, we have collected that much money, but we're still waiting for the last coin to pass the sensor. For example, we start with no money in state M0. If we insert a dime, the input will
go to 10₂, and our state machine will jump to state W10. We will stay in state W10 until the dime passes by the coin sensor. In particular, when the input goes to 00, we go to state M10. If we insert a second dime, the input will go to 10₂, and our state machine will jump to state W20. Again, we will stay in state W20 until this dime passes. When the input goes to 00, we go to state M20. Now we call the function Change and jump to state M15. Lastly, we call the function Soda and jump back to state M0.
Figure 6.24 This Moore FSM implements a vending machine.
[State graph: each state is labeled with its name, wait time in ms, and function. M0 20 none; W5 20 none; M5 20 none; W10 20 none; M10 20 none; W15 20 none; M15 20 soda; W20 20 none; M20 20 change. Arrows are labeled with the input values 00, 01, and 10 that cause each transition.]
Since this is a layered system, we will begin by designing the low-level input/output functions that handle the operation of the sensors and solenoid, as in Program 6.25.
Coin_Init
        bclr DDRA,#$03    ;PA1,0 sensor in
        rts
Coin_Input                ;0 means none
        ldaa PORTA        ;1 means nickel
        anda #$03         ;2 means dime
        rts
Solenoid_Init
        bset DDRB,#$03    ;PB1,0 solenoid out
        rts
Solenoid_None
        rts
Solenoid_Soda
        bset PORTB,#$01   ;activate solenoid
        ldd  #10000
        jsr  Timer_Wait   ;10 msec
        bclr PORTB,#$01   ;deactivate
        rts
Solenoid_Change
        bset PORTB,#$02   ;activate solenoid
        ldd  #10000
        jsr  Timer_Wait   ;10 msec
        bclr PORTB,#$02   ;deactivate
        rts
Program 6.25 Low-level input/output functions for the vending machine.
The main program, Program 6.26, begins by specifying the Port A bits 1 and 0 to be inputs. The initial state is defined as M0. Our controller software first calls the function for this state, waits for the specified amount of time, reads the sensor inputs from PORTA, then switches to the next state depending on the input data. The Timer_Wait function is defined previously. Notice again the one-to-one correspondence between the state graph in Figure 6.24 and the data structure in Program 6.26.
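The controller loop described above can be sketched in C as follows. This is a minimal sketch, not the book's Program 6.26: the transition table is an assumption inferred from the prose description of Figure 6.24 (a W state waits for input 00, M15 dispenses a soda, M20 returns a nickel), the output functions are hypothetical stubs that only count calls so the logic can run on a desktop compiler, and Fsm_Step is a hypothetical name for one pass through the loop with the sensor value passed in as a parameter.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the Program 6.25 routines: they only count
   calls, so the state logic can be exercised without 9S12 hardware. */
int SodaCount = 0, ChangeCount = 0;
void None(void){ }
void Soda(void){ SodaCount++; }      /* real system: pulse PB0 */
void Change(void){ ChangeCount++; }  /* real system: pulse PB1 */

struct State{
  void (*CmdPt)(void);          /* output function for this state */
  uint16_t Time;                /* wait time in ms */
  const struct State *Next[3];  /* next if input = 0 none, 1 nickel, 2 dime */
};
typedef const struct State StateType;

#define M0  (&Fsm[0])
#define W5  (&Fsm[1])
#define M5  (&Fsm[2])
#define W10 (&Fsm[3])
#define M10 (&Fsm[4])
#define W15 (&Fsm[5])
#define M15 (&Fsm[6])
#define W20 (&Fsm[7])
#define M20 (&Fsm[8])

/* Assumed transition table, one row per state of Figure 6.24 */
StateType Fsm[9] = {
  {None,   20, {M0,  W5,  W10}},  /* M0:  no money collected        */
  {None,   20, {M5,  W5,  W5 }},  /* W5:  wait for coin to pass     */
  {None,   20, {M5,  W10, W15}},  /* M5:  5 cents collected         */
  {None,   20, {M10, W10, W10}},  /* W10: wait for coin to pass     */
  {None,   20, {M10, W15, W20}},  /* M10: 10 cents collected        */
  {None,   20, {M15, W15, W15}},  /* W15: wait for coin to pass     */
  {Soda,   20, {M0,  M0,  M0 }},  /* M15: dispense soda, start over */
  {None,   20, {M20, W20, W20}},  /* W20: wait for coin to pass     */
  {Change, 20, {M15, M15, M15}}   /* M20: return a nickel           */
};

StateType *Pt = M0;               /* current state, initially M0 */

/* One pass of the controller loop for a given sensor reading */
void Fsm_Step(unsigned input){
  Pt->CmdPt();                /* a) output function of the current state */
  /* b) on the 9S12, a timer wait of Pt->Time ms would go here */
  if(input > 2) input = 0;    /* both coins at once is assumed impossible */
  Pt = Pt->Next[input];       /* d) next state from state and input */
}
```

Feeding the sequence dime, dime (still passing), none, dime, none, none, none through Fsm_Step walks the machine through W10, M10, W20, M20, M15, and back to M0, calling Change once and Soda once.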
[Program 6.26 listing: only the labels CmdPt, Time, Next, M0, W5, M5, W10, M10, W15, M15, W20, M20, and main are recoverable.]
The next example involves a Mealy FSM with both the input and output processes being performed using function calls. The example also abstracts the high-level FSM from the low-level I/O.
Example 6.9 Design a robot that sits, stands, and lies down (depending on its mood, which can be OK, tired, curious, or anxious).
Solution The goal of this section is to design a robot controller, as illustrated in Figure 6.25. We begin the design by defining what constitutes a state. In this system, a state describes the position of the robot: standing, sitting, or sleeping. Since the outputs are necessary to cause a change in state, we will solve this system with a Mealy FSM. Rather than generate the output as a simple write to a port, the outputs on this robot will be defined as abstract functions, which perform a sequence of operations as needed to complete the task. The output functions are
None, it performs no movement
SitDown, assuming the robot is standing, it will perform a sequence of moves to sit down
StandUp, assuming the robot is sitting, it will perform a sequence of moves to stand up
LieDown, assuming the robot is sitting, it will perform a sequence of moves to lie down
SitUp, assuming the robot is sleeping, it will perform a sequence of moves to sit up
This robot has mood sensors, which are read and processed at the low level. There is an abstract input function, called Sensor_Input, which returns one of four possible conditions:
00 OK, the robot is feeling fine
01 Tired, the robot energy levels are low
10 Curious, the robot senses activity around it
11 Anxious, the robot senses danger
Before we draw the state graph, we need to decide on the sequence of operations:
1. Initialize inputs and outputs
2. Specify initial state
3. Perform FSM controller
   a) Call the Sensor_Input function to determine the current mode
   b) Call the appropriate robot output function, which depends on the input and the state
   c) Change states, which depends on the state and the input
Figure 6.25 Robot interface.
[Block diagram: Inputs to the 9S12, Outputs from the 9S12.]
The outputs (which output function to call) depend on both the input and the current state. For this design, we can list heuristics describing how the robot is to operate:
If the robot is OK, we will stay in whichever state we are currently in.
If the robot's energy levels are low (tired), it will go to sleep.
If the robot senses activity around it (curious), it will awaken from sleep.
If the robot senses danger (anxious), it will stand up.
These rules are converted into a finite-state machine graph, as shown in Figure 6.26. Each arrow specifies both an input and an output. For example, the "Tired/SitDown" arrow from the Standing state to the Sitting state means that if we are in the Standing state and the input is Tired, then we will call the SitDown function and go to the Sitting state. Mealy machines can have time delays; this example simply does not need any. The next step is to define the FSM graph using a linked data structure. Program 6.27 shows the implementation of the Mealy FSM using abstract functions to perform the input and output. Pointers to the functions are stored in the output field of the data structure. Similar to the other FSM implementations, the four Next parameters define the input-dependent state transitions.
Figure 6.26 Mealy FSM for a robot controller.
[State graph, arrows labeled input/output:
Standing: OK/None, Curious/None, and Anxious/None stay in Standing; Tired/SitDown goes to Sitting.
Sitting: OK/None and Curious/None stay in Sitting; Tired/LieDown goes to Sleeping; Anxious/StandUp goes to Standing.
Sleeping: OK/None and Tired/None stay in Sleeping; Curious/SitUp and Anxious/SitUp go to Sitting.]

;Input/output defined as functions
         org  $4000          ;EEPROM
Out      equ  0              ;Pointers to functions
Next     equ  8              ;Next states
Standing fdb  None,SitDown,None,None
         fdb  Standing,Sitting,Standing,Standing
Sitting  fdb  None,LieDown,None,StandUp
         fdb  Sitting,Sleeping,Sitting,Standing
Sleeping fdb  None,None,SitUp,SitUp
         fdb  Sleeping,Sleeping,Sitting,Sitting
Main     lds  #$4000
         jsr  Robot_Init
         ldx  #Standing      ;current state
LL       jsr  Sensor_Input   ;0,1,2,3
         lsla                ;0,2,4,6
         leax a,x            ;Base+2*input
         jsr  [Out,x]        ;Call output function
         ldx  Next,x
         bra  LL             ;Infinite loop
         org  $FFFE
         fdb  Main           ;reset vector
// Input/outputs defined as functions
struct State{
  void (*CmdPt[4])(void);          // outputs
  const struct State *Next[4];     // Next
};
typedef const struct State StateType;
#define Standing &fsm[0]
#define Sitting  &fsm[1]
#define Sleeping &fsm[2]
StateType fsm[3]={
  {{&None,&SitDown,&None,&None},        //Standing
   {Standing,Sitting,Standing,Standing}},
  {{&None,&LieDown,&None,&StandUp},     //Sitting
   {Sitting,Sleeping,Sitting,Standing}},
  {{&None,&None,&SitUp,&SitUp},         //Sleeping
   {Sleeping,Sleeping,Sitting,Sitting}}
};
void main(void){
  StateType *Pt;                // Current State
  unsigned char Input;
  Robot_Init();                 // initialize hardware
  Pt = Standing;                // Initial State
  while(1){
    Input = Sensor_Input();     // Input=0-3
    (*Pt->CmdPt[Input])();      // function
    Pt = Pt->Next[Input];       // next state
  }
}
Program 6.27 A Mealy FSM implemented with functional abstraction.
6.9 *Dynamically Allocated Data Structures
In order to reuse memory and provide for efficient use of RAM, we need dynamic memory allocation. The previous examples in this chapter used fixed allocation, meaning the sizes of the data structures are decided in advance and specified in the source code. In addition, the location of these structures is determined by the assembler at assembly time. With dynamic allocation, the size and location will be determined at run time. To implement dynamic allocation, we will manage a heap. The heap is a chunk of RAM that is
1. Dynamically allocated by the program when it creates the data structure
2. Used by the program to store information
3. Dynamically released by the program when the structure is no longer needed
The heap manager provides the system with two operations:
  pt = malloc(size);   // returns a pointer to a block of size bytes
  free(pt);            // deallocates the block at pt
The implementation of this general memory manager is beyond the scope of this book. Instead, we will develop a very useful, but simple, heap manager with these two operations:
  pt = Heap_Allocate();   // returns a pointer to a block of fixed size
  Heap_Release(pt);       // deallocates the block at pt
6.9.1 *Fixed-Block Memory Manager
In general, the heap manager allows the program to allocate a variable block size, but in this section, we will develop a simplified heap manager that handles just fixed-size blocks. In this example, the block size is specified by SIZE. The initialization will create a linked list of all the free blocks (Figure 6.27).
Figure 6.27 The initial state of the heap has all of the free blocks linked in a list.
[Diagram: FreePt points to the first free block, each free block points to the next, and the last block holds a null pointer.]
Program 6.28 shows the global structures for the heap. These entries are defined in RAM. SIZE is the number of 8-bit bytes in each block. All blocks allocated and released with this memory manager will be of this fixed size. NUM is the number of blocks to be managed. FreePt points to the first free block.
Program 6.28 Private global structures for the fixed-block memory manager.
Initialization must be performed before the heap can be used. Program 6.29 shows the software that partitions the heap into blocks and links them together. FreePt points to a linear linked list of free blocks. Initially, these free blocks are contiguous and in order, but as the manager is used, the positions and order of the free blocks can vary. It will be the pointers that will thread the free blocks together.
To allocate a block, the manager just removes one block from the free list; see Program 6.30. The Heap_Allocate function will fail and return a null pointer when the heap becomes empty. The Heap_Release function returns a block to the free list. This system does not check to verify that a released block actually was previously allocated.
; returns RegX points to new block
; RegX=NULL if no more available
Heap_Allocate
        ldx  FreePt     ;pt=FreePt;
        cpx  #NULL
        beq  aDone      ;if (pt!=NULL)
        ldy  0,x
        sty  FreePt     ;FreePt=*pt;
aDone   rts
; RegX => block being released
Heap_Release
        ldy  FreePt     ;oldFreePt=FreePt;
        stx  FreePt     ;FreePt=pt;
        sty  0,x        ;*pt=oldFreePt;
        rts
Program 6.30 Functions to allocate and release memory blocks.
Checkpoint 6.20: Consider a system that needs variable-size memory allocation, where the size can range from 2 to a maximum of 20 bytes. How might this simple heap be used?
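For readers who prefer C, the fixed-block manager of this section might be sketched as shown below. This is not the book's listing: the values of SIZE and NUM, the union Block type, and the void pointer interface are assumptions, chosen so the code also runs on a desktop compiler where a pointer is wider than the two bytes of a 9S12 pointer. As in the assembly version, the link to the next free block is stored inside the free block itself.

```c
#include <stddef.h>
#include <stdint.h>

#define SIZE 16   /* bytes per block (assumed; must be >= sizeof(void *)) */
#define NUM  5    /* number of blocks in the heap (assumed) */

/* A block is either on the free list (next is valid) or allocated
   (data belongs to the caller); the two uses share the same storage. */
typedef union Block {
  union Block *next;
  uint8_t data[SIZE];
} Block;

static Block Heap[NUM];   /* the heap itself */
static Block *FreePt;     /* first free block, NULL when heap is empty */

/* Partition the heap into blocks and link them into the free list */
void Heap_Init(void){
  for(int i = 0; i < NUM - 1; i++){
    Heap[i].next = &Heap[i + 1];
  }
  Heap[NUM - 1].next = NULL;   /* last block ends the list */
  FreePt = &Heap[0];
}

/* Remove one block from the free list; returns NULL if none are left */
void *Heap_Allocate(void){
  Block *pt = FreePt;
  if(pt != NULL){
    FreePt = pt->next;         /* FreePt = *pt */
  }
  return pt;
}

/* Return a block to the front of the free list (no validity check) */
void Heap_Release(void *pt){
  Block *b = (Block *)pt;
  b->next = FreePt;            /* *pt = old FreePt */
  FreePt = b;                  /* FreePt = pt */
}
```

Because a released block goes to the front of the list, the next call to Heap_Allocate returns the block that was released most recently.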
6.9.2 *Linked List FIFO
An example application of a dynamically allocated data structure is a FIFO. In this structure, GetPt points to the oldest node (the one to get next), and PutPt points to the newest node: the place to add more data. The Next pointer of the newest node (if it exists) is null. The Fifo_Put operation fails (full) when the heap runs out of space. The Fifo_Get operation fails (empty) when GetPt equals NULL. Program 6.31 shows the global variables defined in RAM. Figure 6.28 shows an example FIFO with three elements (after running with lots of putting and getting). In this example, element 1 is the oldest because it was put first. This system uses Programs 6.28, 6.29, and 6.30 with SIZE equal to 4 bytes.
Next   equ  0    ;next
Data   equ  2    ;16-bit data for node
GetPt  rmb  2    ; GetPt is pointer to oldest node
PutPt  rmb  2    ; PutPt is pointer to newest node

struct Node{
  struct Node *Next;
  short Data;
};
typedef struct Node NodeType;
typedef NodeType *NodePtr;
NodePtr PutPt;   // place to put
NodePtr GetPt;   // place to get
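Program 6.31 Definition of the linked list structure.
Figure 6.28 A linked list FIFO after putting 1,2,3.
[Diagram: GetPt points to node 1, which links to node 2, which links to node 3, whose Next pointer is null. PutPt points to node 3. FreePt points to the list of free blocks, which ends in a null pointer.]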
Program 6.32 shows the three functions which implement the FIFO. Figure 6.29 is a flowchart of the Put and Get functions. The FIFO is full only when the heap is full
(Heap_Allocate returns a failure). The Put operation first allocates space for the new entry, then stores the new information into the Data field. Since this element will be last, its Next field is set to null. The last part of Put links this new node at the end of the linked list. The Get function first checks to make sure the FIFO is not empty. Next, the Data field is retrieved from the node. This node is then unlinked from the linked list, and the memory block is released to the heap. There is a special case that handles the situation where you get the one remaining node in the linked list. In this case both PutPt and GetPt point to this node. When you get this node, both PutPt and GetPt are set to null, signifying the FIFO is now empty.
Fifo_Init
        ldx  #NULL
        stx  GetPt      ;GetPt=NULL
        stx  PutPt      ;PutPt=NULL
        jsr  Heap_Init
        rts
; Inputs: RegD data to put
; Outputs: V=0 if successful
;          V=1 if unsuccessful
Fifo_Put
        jsr  Heap_Allocate
        cpx  #NULL
        beq  PFul       ;skip if full
        std  Data,x     ;store data
        ldy  #NULL
        sty  Next,x     ;next=NULL
        ldy  PutPt
        cpy  #NULL      ;previously MT?
        beq  PMT
        stx  Next,y     ;link to previous
        bra  PCon
PMT     stx  GetPt      ;Now one entry
PCon    stx  PutPt      ;points to newest
        clv             ;success
        bra  PDon
PFul    sev             ;failure, full
PDon    rts
; Inputs: none
; Outputs: RegD data removed
;          V=0 if successful
;          V=1 if empty
Fifo_Get
        ldx  GetPt
        cpx  #NULL
        beq  GMT        ;empty if NULL
        ldd  Data,x     ;read
        ldy  Next,x     ;pointer to next
        sty  GetPt
        cpy  #NULL
        bne  GCon
        sty  PutPt      ;Now empty
GCon    sty  GetPt      ;points to oldest
        jsr  Heap_Release
        clv             ;success
        bra  GDon
GMT     sev             ;failure, empty
GDon    rts
Program 6.32 Implementation of the linked list FIFO.
void Fifo_Init(void){
  GetPt = NULL;         // Empty when null
  PutPt = NULL;
  Heap_Init();
}
int Fifo_Put(short theData){
  NodePtr pt;
  pt = (NodePtr)Heap_Allocate();
  if(!pt){
    return(0);          // full
  }
  pt->Data = theData;   // store
  pt->Next = NULL;
  if(PutPt){
    PutPt->Next = pt;   // Link
  }
  else{
    GetPt = pt;         // first one
  }
  PutPt = pt;
  return(1);            // successful
}
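The Get operation appears above only in assembly, so a C counterpart may help. The sketch below is an assumption rather than the book's code: Fifo_Get returns 1 on success and 0 when empty (matching the C Fifo_Put), and delivers the data through a hypothetical pointer parameter datapt. To keep the example self-contained, the heap is a compact stand-in with an assumed NUM of 3 nodes, threading the free list through the Next field of each node.

```c
#include <stddef.h>

struct Node{
  struct Node *Next;
  short Data;
};
typedef struct Node NodeType;
typedef NodeType *NodePtr;

#define NUM 3                 /* assumed heap size, in nodes */
static NodeType Heap[NUM];    /* stand-in heap: one node per block */
static NodePtr FreePt;        /* first free block */
NodePtr PutPt;                /* newest node, place to put */
NodePtr GetPt;                /* oldest node, place to get */

static void Heap_Init(void){
  for(int i = 0; i < NUM - 1; i++) Heap[i].Next = &Heap[i + 1];
  Heap[NUM - 1].Next = NULL;
  FreePt = &Heap[0];
}
static NodePtr Heap_Allocate(void){
  NodePtr pt = FreePt;
  if(pt) FreePt = pt->Next;
  return pt;                  /* NULL when no blocks remain */
}
static void Heap_Release(NodePtr pt){
  pt->Next = FreePt;
  FreePt = pt;
}

void Fifo_Init(void){
  GetPt = NULL;               /* empty when null */
  PutPt = NULL;
  Heap_Init();
}
int Fifo_Put(short theData){
  NodePtr pt = Heap_Allocate();
  if(!pt) return 0;           /* full: heap has no free block */
  pt->Data = theData;
  pt->Next = NULL;            /* new node is last */
  if(PutPt) PutPt->Next = pt; /* link after previous newest */
  else      GetPt = pt;       /* first element */
  PutPt = pt;
  return 1;
}
int Fifo_Get(short *datapt){
  NodePtr pt = GetPt;
  if(!pt) return 0;           /* empty */
  *datapt = pt->Data;         /* fetch oldest data */
  GetPt = pt->Next;           /* unlink oldest node */
  if(!GetPt) PutPt = NULL;    /* removed the last node: now empty */
  Heap_Release(pt);           /* give the block back to the heap */
  return 1;
}
```

With NUM equal to 3, a fourth consecutive Fifo_Put fails, and the values come back from Fifo_Get in the order they were put.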
Figure 6.29 Flowcharts of a linked list FIFO Put and Get operations.
[Put: pt = Heap_Allocate(). If pt is NULL, the FIFO is full, return(0). Otherwise store data at pt->Data and set pt->Next = NULL. If PutPt is NULL (first element), GetPt = pt; otherwise PutPt->Next = pt. Then PutPt = pt and return(1).
Get: If GetPt is NULL, the FIFO is empty, return(0). Otherwise fetch data at GetPt->Data, pt = GetPt, GetPt = GetPt->Next. If GetPt is now NULL, the FIFO is empty, so PutPt = NULL. Release pt to the heap and return(1).]
Checkpoint 6.21: Draw a picture like Figure 6.28 of a doubly linked list. How might this more complicated structure be more efficient than the single linked list?
6.10 *9S12 Paged Memory
16-bit pointers can only access up to 64 KiB of memory. The 9S12 uses a paged memory system to access memory beyond this 64 KiB barrier. On most of the 9S12 microcontrollers, the extended address contains 20 bits and thus can access up to 1 MiB of memory. The paged memory system is organized into a maximum of 64 pages with a fixed page size of 16 KiB. The software must first write the page number into PPAGE, which is an 8-bit register located at $0030 (only the bottom 6 bits are used). On the 9S12, addresses in the $8000 to $BFFF window invoke the paged memory system. The top 6 bits of the 20-bit extended address are retrieved from the PPAGE register, and the bottom 14 bits come from the regular 16-bit address, as shown in Figure 6.30. In particular, when the software accesses any address in the $8000 to $BFFF window, the bottom 6 bits of PPAGE are concatenated to the bottom 14 bits of the window address to create the 20-bit extended address used to access memory. This logical-to-physical address translation occurs automatically whenever an address in the $8000 to $BFFF window is accessed. On the 9S12DP512, the full 512 KiB of flash EEPROM can only be accessed using this paged memory system. Only 32 pages are needed for the 512-KiB flash EEPROM; in particular, it utilizes page numbers $20 through $3F. Page $3E is actually the same as regular EEPROM at $4000 to $7FFF, and page $3F is the same as EEPROM at $C000 to $FFFF.
Figure 6.30 The address is comprised of two components.
Observation: If the software sets and leaves PPAGE at $20 (actually any constant value from $20 to $3D), then the EEPROM behaves like a simple 48 KiB memory from $4000 to $FFFF.
We will present two applications of paged memory. In this first application, the flash EEPROM on the 9S12DP512 will contain a 256 KiB data buffer. Because these data are located in EEPROM, we will consider them as constant and provide a function to access the data. The buffer will be accessed using a single 18-bit linear address and passed into the subroutine in registers B and X. In this example, we assume the system’s executable object code fits entirely in the 32 KiB space $4000 to $7FFF, $C000 to $FFFF. The 256 KiB buffer will be stored into 16 pages from $20 to $2F. The subroutine, shown as Program 6.33, first sets the PPAGE register to select the correct page, then reads from the $8000 to $BFFF window to retrieve the specified data.
;****Buf_Read*******
;Read byte from buffer
;Input B:X is 18-bit linear address
;Output A is data
Buf_Read
        pshx
        xgdx
        lsld
        xgdx
        rolb            ;addr<<1
        xgdx
        lsld
        xgdx
        rolb            ;B=(addr<<2)>>16
        andb #$0F       ;limit to 256K
        addb #$20       ;B=$20+(addr>>14)
        stab PPAGE
        puld
        anda #$3F       ;D=addr&$3FFF
        adda #$80       ;D=$8000+addr&$3FFF
        tfr  d,x        ;X=$8000+addr&$3FFF
        ldaa 0,x        ;A=data from buffer
        rts
Program 6.33 A 256 Kibibyte data buffer implemented in paged memory.
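The arithmetic that Program 6.33 performs with shifts and rotates can be written more directly in C. The sketch below only computes the page number and window address for a given 18-bit linear address; it does not touch hardware. The PagedAddr type and the Buf_Map name are hypothetical, introduced here for illustration.

```c
#include <stdint.h>

/* Result of mapping a linear buffer address into paged memory */
typedef struct {
  uint8_t  ppage;    /* value to write into PPAGE, $20 to $2F */
  uint16_t window;   /* address to read within $8000 to $BFFF */
} PagedAddr;

/* Map an 18-bit linear address (0 to $3FFFF) onto 16 pages of 16 KiB */
PagedAddr Buf_Map(uint32_t addr){
  PagedAddr p;
  addr &= 0x3FFFF;                                  /* limit to 256 KiB   */
  p.ppage  = (uint8_t)(0x20 + (addr >> 14));        /* $20 + (addr>>14)   */
  p.window = (uint16_t)(0x8000 + (addr & 0x3FFF));  /* $8000 + addr&$3FFF */
  return p;
}
```

For example, linear address $12345 lies 4 pages into the buffer, so it maps to PPAGE = $24 and window address $A345.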
The second application implements a system with a code size of more than 48 KiB. In this system, we will partition the code into separate 16 KiB pieces. The system will be most efficient if the partitioning is done according to access probability. In other words, if module A frequently calls module B, then A and B will be placed into the same 16 KiB page. We will place the most frequently used code and the starting location into the pages $4000 to $7FFF and $C000 to $FFFF. Accessing these locations is simple and uses standard 16-bit pointers. We place the remaining code into paged memory. Subroutine calls within the same page can utilize the standard bsr and jsr instructions. To call a subroutine located in a different page, the call instruction is used. Figure 6.31 shows the stack before and after the call instruction is executed on the 9S12DP512. The call instruction pushes the old PPAGE and PC values on the stack and then loads PPAGE and PC with the address of the subroutine. When op codes are fetched from the $8000 to $BFFF window, the 6-bit PPAGE is combined with the lower 14 bits of the PC to form a 20-bit address. The translation occurs automatically in hardware.
Figure 6.31 The call instruction is used to call a subroutine in paged memory.
[Before CALL: PPAGE = $20 and PC = $8101 (extended address $80101, the instruction call sub,#$21). After CALL: PPAGE = $21 and PC = $976C (extended address $8576C, the subroutine sub), and the stack holds the old PPAGE $20 and the return address $8105.]
Consider the case where the PPAGE register equals $20 and the PC is $8101 (left picture of Figure 6.31).
  PPAGE = $20 = 00100000
  PC = $8101 = 1000000100000001
  PPAGE + Lower 14 bits of PC = 100000+00000100000001 = $80101
After the call instruction, the PPAGE register equals $21, and the PC is $976C (right picture of Figure 6.31).
  PPAGE = $21 = 00100001
  PC = $976C = 1001011101101100
  PPAGE + Lower 14 bits of PC = 100001+01011101101100 = $8576C
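The two worked translations above follow one rule: the bottom 6 bits of PPAGE become the top of the extended address, and the bottom 14 bits of the 16-bit address fill the rest. A short C sketch of that rule is given below; the function name ExtendedAddress is hypothetical, and the result is meaningful only for addresses inside the $8000 to $BFFF window.

```c
#include <stdint.h>

/* Form the 20-bit extended address from PPAGE and a window address.
   Only the bottom 6 bits of ppage and bottom 14 bits of addr are used. */
uint32_t ExtendedAddress(uint8_t ppage, uint16_t addr){
  return ((uint32_t)(ppage & 0x3F) << 14) | (uint32_t)(addr & 0x3FFF);
}
```

Calling ExtendedAddress(0x20, 0x8101) gives $80101, and ExtendedAddress(0x21, 0x976C) gives $8576C, matching the binary calculations in the text.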
The rtc instruction will return to the program that called the subroutine. Both the PPAGE and PC values are pulled off the stack. Figure 6.32 shows the stack before and after execution of the rtc instruction.
Figure 6.32 The rtc instruction is used to return from a subroutine in paged memory.
[Before RTC: PPAGE = $21 and PC = $976D (extended address $8576D, the rtc instruction), and the stack holds $20, $81, $05. After RTC: PPAGE = $20 and PC = $8105 (extended address $80105, the instruction following call sub,#$21).]
Programs 6.34, 6.35, and 6.36 illustrate the use of call and rtc to create a paged memory system on the 9S12. Program 6.34 will be programmed into main EEPROM.
Program 6.34 Main memory programs for this paged memory system.
func1   equ  0            ; relative offset in paged memory
func2   equ  3            ; relative offset in paged memory
        org  $4000        ; main EEPROM memory
main    lds  #$4000       ; stack in main RAM
        clra
loop    call func1,#$21   ; call function 1 in page $21 (add 1)
        call func1,#$22   ; call function 1 in page $22 (add 3)
        call func2,#$21   ; call function 2 in page $21 (add 2)
        call func2,#$22   ; call function 2 in page $22 (add 4)
        bra  loop
Program 6.35 will be programmed into external page $21.
Program 6.35 Page $21 programs for this paged memory system.
        org  $0000        ; page $21 external memory
        lbra fun1         ; link to actual function
        lbra fun2         ; link to actual function
fun1    adda #1
        rtc
fun2    adda #2
        rtc
Program 6.36 will be programmed into external page $22.
Program 6.36 Page $22 programs for this paged memory system.
        org  $0000        ; page $22 external memory
        lbra fun1         ; link to actual function
        lbra fun2         ; link to actual function
fun1    adda #3
        rtc
fun2    adda #4
        rtc
The TExaS simulator does not support external paged memory, but it will execute the call and rtc instructions in the same way as a regular jsr and rts subroutine call and return.
6.11 Functional Debugging
6.11.1 Instrumentation: Dump Into Array Without Filtering
As mentioned in the last chapter, one of the difficulties with print statements is that they can significantly slow down the execution speed in real-time systems. Many times the bandwidth of the print functions cannot keep pace with the existing system. For example, our system may wish to call a function 1000 times a second (or every 1 ms). If we add print statements to it that require 50 ms to perform, the presence of the print statements will significantly affect the system operation. In this situation, the print statements would be considered extremely intrusive. Another problem with print statements
occurs when the system is using the same output hardware for its normal operation, as is required to perform the print function. In this situation, debugger output and normal system output are intertwined. To solve both these situations, we can add a debugger instrument that dumps strategic information into an array at run time. We can then observe the contents of the array at a later time. One of the advantages of dumping is that the 9S12 BDM debugger module allows you to visualize memory even when the program is running. So this technique will be quite useful in systems connected to a debugger. Assume happy and sad are strategic 8-bit variables. The first step when instrumenting a dump is to define a buffer in RAM to save the debugging measurements.
The Cnt will be used to index into the buffers. Cnt must be initialized to zero, before the debugging begins. The debugging instrument, shown in Program 6.37, saves the strategic variables into the Buffer.
Next, you add jsr Save statements at strategic places within the system. You can either use the debugger to display the results or add software that prints the results after the program has run and stopped. Observation: You should save registers at the beginning and restore them back at the end, so the debugging instrument itself doesn’t cause the software to crash.
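In C, the same kind of instrument might look like the sketch below. It is an illustration under assumptions, not the book's Program 6.37: the value of SIZE, the interleaved layout of Buffer (happy then sad for each sample), and the 8-bit unsigned types are choices made here, picked to agree with the way Cnt is compared against SIZE*2 in the assembly instrument of this section.

```c
#include <stdint.h>

#define SIZE 20                 /* number of samples to record (assumed) */

uint8_t happy, sad;             /* strategic variables being observed */
uint8_t Buffer[2 * SIZE];       /* recorded pairs: happy, sad, happy, sad, ... */
uint8_t Cnt = 0;                /* index of next free byte; 0 before debugging */

/* Debugging instrument: record one pair of values, stop when full */
void Save(void){
  if(Cnt < 2 * SIZE){
    Buffer[Cnt++] = happy;
    Buffer[Cnt++] = sad;
  }
}
```

A call to Save() costs only a compare and two stores, which is far less intrusive than a print statement, and once the buffer is full further calls leave the recorded data untouched.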
6.11.2 Instrumentation: Dump Into Array With Filtering
One problem with dumps is that they can generate a tremendous amount of information. If you suspect a certain situation is causing the error, you can add a filter to the instrument. A filter is a software/hardware condition that must be true in order to place data into the array. In this situation, if we suspect the error occurs when another variable gets large, we could add a filter that saves in the array only when the variable is above a certain value. In the example shown in Program 6.38, the instrument saves the strategic variables into the buffer only when sad is greater than 100.
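A filtered instrument differs from an unfiltered one only in the extra condition tested before storing. The C sketch below shows the idea under stated assumptions: SIZE, THRESHOLD, and the SaveFiltered name are introduced here for illustration, and sad is treated as an unsigned 8-bit value, so the comparison is unsigned.

```c
#include <stdint.h>

#define SIZE      20            /* number of samples to record (assumed) */
#define THRESHOLD 100           /* filter: record only when sad exceeds this */

uint8_t happy, sad;             /* strategic variables being observed */
uint8_t Buffer[2 * SIZE];       /* recorded pairs: happy, sad, ... */
uint8_t Cnt = 0;                /* index of next free byte */

/* Debugging instrument with filter: the condition must be true
   (and the buffer must have room) before anything is stored. */
void SaveFiltered(void){
  if(sad > THRESHOLD && Cnt < 2 * SIZE){
    Buffer[Cnt++] = happy;
    Buffer[Cnt++] = sad;
  }
}
```

Because uninteresting samples are discarded at run time, the buffer fills more slowly and holds only the cases surrounding the suspected error.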
Save    pshb              ;save
        pshx
        ldab sad
        cmpb #100         ;save only
        ble  done         ;when sad >100
        ldab Cnt
        cmpb #SIZE*2      ;full?
        beq  done
        ldx  #Buffer
        movb happy,B,X    ;save happy
        incb
        movb sad,B,X
        incb
        stab Cnt
done    pulx
        pulb
        rts
Program 6.38 Instrumentation dump with filter.

6.12 Tutorial 6 Software Abstraction
The purpose of this tutorial is to evaluate two stepper motor interfaces. Tutor6a.rtf spins a stepper motor using the switch statement. Tutor6b.rtf spins a stepper motor using a linked structure. You first will be asked to calculate the execution speed for each example. Then, you will study its ease of modification by adding additional states to the system.
Action: Open and assemble the switch statement program Tutor6a.rtf.
Question 6.1 What is the static efficiency of the step subroutine in the Tutor6a.rtf system in ROM bytes?
Action: Run the Tutor6a.rtf system and observe the stepper motor signals.
Question 6.2 Put a ScanPoint somewhere in the loop. Run the system and measure the minimum and maximum time (in cycles) to step the motor.
Question 6.3 Add four more output values to implement half-stepping. The new sequence should be $05,$04,$06,$02,$0A,$08,$09,$01.
Question 6.4 What is the static efficiency of the new system? Also, measure the minimum and maximum time (in cycles) to step the motor.
Action: Open and assemble the linked-structure program Tutor6b.rtf.
Question 6.5 What is the static efficiency of the linked structure and the step subroutine in the Tutor6b.rtf system in ROM bytes?
Action: Run the Tutor6b.rtf system and observe the stepper motor signals.
Question 6.6 Put a ScanPoint somewhere in the loop. Run the system and measure the minimum and maximum time (in cycles) to step the motor.
Question 6.7 Add four more output values to implement half-stepping. The new sequence should be $05,$04,$06,$02,$0A,$08,$09,$01.
Question 6.8 What is the static efficiency of the new system? Also, measure the minimum and maximum time (in cycles) to step the motor. Comment on the differences between the two approaches.
6.13 Homework Assignments
Homework 6.1 Assume Register X contains the address $2000, Register Y contains the address $2080, Register A contains $45, and Register B contains $67. For each of the following instructions, specify the effective address and the resulting operation. In particular, specify what value(s) is stored into what memory location(s). Give all your answers in hexadecimal.
  staa 40,x
  stab $40,x
  std  $66,y
  staa 25,y
  stab $FF,y
  std  $CD,x
Homework 6.2 Assume Register X contains the address $2000, and Register Y contains the address $2080. Assume memory contains the following initial values: $2000 = 0, $2001 = 1, ..., $20FF = $FF. For each of the following instructions, specify the effective address and the resulting operation. Give all your answers in hexadecimal.

      ldaa 40,x
      ldab $40,y
      ldaa $66,x
      ldaa 25,y
      ldd  $FE,x
      ldd  $0D,y

Homework 6.3 Assume Register X contains the address $0800, Register Y contains the address $0900, Register A contains $02, and Register B contains $67. Assume locations $0802 and $0803 contain the 16-bit value $0A00. For each of the following instructions, specify the effective address and the resulting operation. In particular, specify what value(s) is stored into what memory location(s). Give all your answers in hexadecimal.

      staa b,x
      stab -$40,y
      std  [2,x]
      stx  d,y
      stab 1,-y
      std  2,x+

Homework 6.4 Assume Register X contains the address $0800, Register Y contains the address $0900, Register A contains $03, and Register B contains $67. Assume locations $0804 and $0805 contain the 16-bit value $0B12. For each of the following instructions, specify the effective address and the resulting operation. In particular, specify what value(s) is stored into what memory location(s). Give all your answers in hexadecimal.

      stab a,x
      staa -1,y
      std  [4,x]
      sty  d,x
      staa 1,+x
      std  2,y-

Homework 6.5 Write assembly code that adds 10 to Register X and subtracts 100 from Register Y.

Homework 6.6 Write assembly code that sets Register X equal to Register Y plus 100.

Homework 6.5 Write assembly code that adds Register D to Register X and stores the sum in Register Y.
Homework 6.7 Look up the machine code created by the following instructions. Explain the basic function of each instruction. The first one is completed.

Machine Code   Instruction   Comment
$860A          ldaa #10      RegA = 10
               ldaa 10
               ldaa 10,x
               ldaa 10,y

Homework 6.8 Look up the machine code created by the following 9S12 instructions. Explain the basic function of each instruction. The first one is completed.

Machine Code   Instruction   Comment
$A602          ldaa 2,x      RegA = [X + 2]
               ldaa -2,x
               ldaa 2,+x
               ldaa 2,x+
               ldaa 2,-x
               ldaa 2,x-
               ldaa [2,x]
Homework 6.9 Write a subroutine that converts a null-terminated string to upper case. In particular, convert all lower case ASCII characters to upper case. The original data is in RAM, so this routine overwrites the string. The calling sequence is

      ldx #string     ; pointer to ASCII string
      jsr UpperCase

Homework 6.10 Write a subroutine that converts a null-terminated string to lower case. In particular, convert all upper case ASCII characters to lower case. The original data is in RAM, so this routine overwrites the string. The calling sequence is

      ldx #string     ; pointer to ASCII string
      jsr LowerCase

Homework 6.11 Write a subroutine that compares two null-terminated strings. Register A will be 0 if the strings do not match and will be nonzero if the strings match. The calling sequence is

      ldx #string1    ; pointer to first string
      ldy #string2    ; pointer to second string
      jsr StringCompare
Homework 6.12 Write a subroutine that adds two equal-sized arrays. Register A contains the size of the array, and Registers X and Y are call-by-reference pointers to the arrays. The first array, pointed to by RegX, should be added to the second array, pointed to by RegY, and the sum placed back in the second array. Assume the data is 8-bit unsigned, and implement a ceiling operation (set result to 255) on overflow.

Homework 6.13 Write a subroutine that implements the dot product of two equal-sized arrays. The arrays contain 8-bit unsigned numbers. Register A contains the size of the array, and Registers X and Y are call-by-reference pointers to the arrays. The return parameter is an unsigned 16-bit number in Reg D. For example, consider these two arrays:

Vector1 fcb 10,20,30    ; 3-D vector
Vector2 fcb 1,0,2       ; 3-D vector

The dot product is 10*1 + 20*0 + 30*2 = 70. The calling sequence is

      ldaa #3         ; size of arrays
      ldx  #Vector1   ; pointer to first array
      ldy  #Vector2   ; pointer to second array
      jsr  DotProduct
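Before writing the assembly for Homework 6.12 and 6.13, it can help to have a reference model to check results against. The following C sketch is one such model under stated assumptions: the name ArrayAddSat is hypothetical, DotProduct simply reuses the name from the calling sequence, and the 16-bit sum is allowed to wrap if the true dot product exceeds 65535, which the assignment does not address.

```c
#include <stdint.h>

/* Homework 6.12 model: y[i] = x[i] + y[i], clipped to 255 on overflow. */
void ArrayAddSat(uint8_t size, const uint8_t *x, uint8_t *y) {
  for (uint8_t i = 0; i < size; i++) {
    uint16_t sum = (uint16_t)(x[i] + y[i]);   /* 9-bit result fits in 16 bits */
    y[i] = (sum > 255) ? 255 : (uint8_t)sum;  /* ceiling operation */
  }
}

/* Homework 6.13 model: 8-bit unsigned elements, 16-bit unsigned result. */
uint16_t DotProduct(uint8_t size, const uint8_t *x, const uint8_t *y) {
  uint16_t sum = 0;
  for (uint8_t i = 0; i < size; i++) {
    sum = (uint16_t)(sum + x[i] * y[i]);      /* each product is at most 65025 */
  }
  return sum;
}
```

With Vector1 = 10,20,30 and Vector2 = 1,0,2, DotProduct returns 70, matching the example in the assignment.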
Homework 6.14 Write a subroutine that counts the number of characters in a string. The string is null-terminated. Register X is a call-by-reference pointer to the string. The number of characters in the string is returned in Reg B. For example, consider this string:

Name  fcc "Valvano"
      fcb 0

The size is 7. The calling sequence is:

      ldy #Name       ; pointer to string
      jsr Count

Homework 6.15 Write a subroutine that finds the maximum number in an array. The array contains 8-bit signed numbers. The first element of the array is the size. Register Y is a call-by-reference pointer to the array. The maximum value in the array is returned in Reg B. For example, consider this array:

Array fcb 8,-10,20,-30,40,-50,-60,-70,-80

The maximum value is 40. The calling sequence is

      ldy #Array      ; pointer to array
      jsr Maximum

Homework 6.16 Write a subroutine that finds the largest absolute value in an array. The array contains 8-bit signed numbers. The first element of the array is the size. Register Y is a call-by-reference pointer to the array. The maximum absolute value in the array is returned in Reg B. For example, consider this array:

Array fcb 8,-10,20,-30,40,-50,-60,-70,-80

The maximum absolute value is 80. The calling sequence is

      ldy #Array      ; pointer to array
      jsr Maximum

Homework 6.17 Write a subroutine that compares two equal-sized arrays. Register A contains the size of the array, and Registers X and Y are call-by-reference pointers to the arrays. The return parameter is in RegB. RegB is 1 if the arrays are equal and 0 if they are different. For example, consider these two arrays containing 8-bit numbers:

Array1 fcb 10,20,30,40,50,60,70,80
Array2 fcb 10,20,30,41,50,60,70,80

These arrays are different. The calling sequence is

      ldaa #8         ; size of arrays
      ldx  #Array1    ; pointer to first array
      ldy  #Array2    ; pointer to second array
      jsr  ArrayEqual
Homework 6.18 Write a subroutine that counts the frequency of occurrence of letters in a text buffer. Register X points to a null-terminated ASCII buffer. There is a 26-element array into which the frequency data will be entered. For example, the first element of Freq will contain the number of A's and a's. Count only the upper case and lower case letters.

Freq  ds.w 26         ;twenty six 16-bit counters

The calling sequence is

      ldx #buffer     ; pointer to text buffer
      jsr CalcFreq
Homework 6.19 Write three debugging subroutines that implement a debugging array dump. Assume there are two global 16-bit variables AA and BB that are strategic to the system under test. The first subroutine initializes your system. The second subroutine saves AA, BB, and TCNT in the array. Your system should be able to support up to ten measurements. You may assume the SCI port is not used for the target system, and you can call any of the routines defined in tut2.rtf. The last subroutine will display the collected data. These three subroutines will be added to the original system with the first being called at the beginning, the second placed at strategic places within the program under test, and the last one called at the end. Estimate the level of intrusiveness of this debugging process. In particular, how long does it take to call the second subroutine? These subroutines will be added to the original software using an editor, then the combination will be assembled and downloaded to the target.

Homework 6.20 Assume we have some 6-row by 8-column matrix data structures. The precision of each entry is 16 bits. The information is stored in column-major format (the data for each column is stored contiguously) with zero indexing. I.e., the row index, I, ranges 0 ≤ I ≤ 5, and the column index, J, ranges 0 ≤ J ≤ 7. Write the assembly language subroutine which accepts a pointer to the array and the I,J indices, and returns the 16-bit contents. Don't save/restore registers.

;Inputs   RegA  row index I=0,1,...,5
;         RegB  column index J=0,1,...,7
;         RegX  pointer to a 6 by 8 matrix
;Outputs  RegD  16-bit contents at matrix[I,J]

Homework 6.21 Assume we have some 5-row by 10-column matrix data structures. The precision of each entry is 16 bits. The information is stored in column-major format (the data for each column is stored contiguously) with zero indexing. I.e., the row index, I, ranges 0 ≤ I ≤ 4, and the column index, J, ranges 0 ≤ J ≤ 9. Write the assembly language subroutine which accepts a pointer to the array and the I,J indices, and returns the 16-bit contents. Don't save/restore registers.

;Inputs   RegA  row index I=0,1,...,4
;         RegB  column index J=0,1,...,9
;         RegX  pointer to a 5 by 10 matrix
;Outputs  RegD  16-bit contents at matrix[I,J]
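The address calculation behind Homework 6.20 and 6.21 follows directly from the column-major layout: each column occupies rows consecutive entries, and each entry is 2 bytes, so the byte offset of element [I,J] is 2*(rows*J + I). The C sketch below shows that calculation; the function names ColMajorOffset and MatrixGet are hypothetical, and the 16-bit entries are assumed to be stored most significant byte first, as on the 9S12.

```c
#include <stdint.h>
#include <stddef.h>

/* Byte offset of element [i,j] in a column-major matrix of 16-bit
   entries: skip j whole columns (rows entries each), then i entries. */
size_t ColMajorOffset(uint8_t rows, uint8_t i, uint8_t j) {
  return 2u * ((size_t)rows * j + i);
}

/* Fetch the 16-bit entry at [i,j], assuming big-endian storage. */
uint16_t MatrixGet(const uint8_t *base, uint8_t rows, uint8_t i, uint8_t j) {
  size_t k = ColMajorOffset(rows, i, j);
  return (uint16_t)((base[k] << 8) | base[k + 1]);
}
```

For the 6 by 8 matrix, the last element [5,7] is at byte offset 2*(6*7 + 5) = 94, the final entry of the 96-byte structure.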
Homework 6.22 Consider the following table structure:

const struct theRoom{
  unsigned char windows;   // number of windows
  unsigned char doors;     // number of doors
  unsigned short size[3];  // x,y,z dimensions
};
typedef const struct theRoom roomType;
roomType Building[4]={
  { 3,2,{16,16,8}},
  { 4,1,{20,20,10}},
  { 5,3,{32,16,12}},
  { 0,1,{18,10,8}}};

a) Show the assembly code required to define this structure in ROM. Use equ to make the code easier to understand.
b) Write an assembly program to return the number of windows of a room. The room number is passed by value in Register A, and the result is returned by value in Register A. For example, if the room number is 2, then the number of windows will be 5.
c) Write an assembly program to return the number of doors of a room. The room number is passed by value in Register A, and the result is returned by value in Register A. For example, if the room number is 0, then the number of doors will be 2.
d) Write an assembly program to return the volume of a room. The room number is passed by value in Register A, and the result is returned by value in Register D. For example, if the room number is 1, then the volume will be 20*20*10 = 4000.

Homework 6.23 Consider the following table structure:

const struct thedesk{
  unsigned char legs;      // number of legs
  unsigned char drawers;   // number of drawers
  unsigned short size[2];  // top x,y dimensions 0.1 feet
};
typedef const struct thedesk deskType;
deskType furniture[4]={
  { 4,5,{30,50}},          // 4 legs 5 drawers
  { 4,0,{45,45}},          // square table
  { 6,7,{40,65}},
  { 4,4,{35,55}}};

a) Show the assembly code required to define this structure in ROM. Use equ to make the code easier to understand.
b) Write an assembly program to return the number of legs of a desk. The desk number is passed by value in Register A, and the result is returned by value in Register A. For example, if the desk number is 2, then the number of legs will be 6.
c) Write an assembly program to return the number of drawers of a desk. The desk number is passed by value in Register A, and the result is returned by value in Register A. For example, if the desk number is 0, then the number of drawers will be 5.
d) Write an assembly program to return the area of a desk top with units in². The desk number is passed by value in Register A, and the result is returned by value in Register D. For example, if the desk number is 3, then the desk top area will be (35*55*144)/100 = 2772. Worry about accuracy (divide last) and overflow (use enough bits in the multiply stage to prevent overflow). You could factor the 144/100 terms to calculate (35*55*36)/25 = 2772. Your solution has to work for these four examples.

Homework 6.24 Write an assembly main program that implements this Mealy finite-state machine. The FSM data structure, shown below, is given and cannot be changed. The next-state links are defined as 16-bit pointers. Each state has eight outputs and eight next-state links. The input is on Port M bits 2, 1, and 0, and the output is on Port T bits 5, 4, 3, 2, 1, and 0. There are three states (S0, S1, and S2), and the initial state is S0. Show all assembly software required to execute this machine, including the reset vector. You need not be friendly, but do initialize the direction registers.
The repeating execution sequence is input, output (depends on input and current state), and next (depends on input and current state).

      org $4000                       ;EPROM
* Finite State Machine
S0    fcb 0,0,5,6,3,9,3,0             ; Outputs for inputs 0 to 7
      fdb S0,S0,S1,S1,S1,S2,S2,S2     ; Next states for inputs 0 to 7
S1    fcb 1,2,3,9,6,5,3,3             ; Outputs for inputs 0 to 7
      fdb S2,S0,S0,S0,S2,S2,S2,S1     ; Next states for inputs 0 to 7
S2    fcb 1,2,3,9,6,5,3,3             ; Outputs for inputs 0 to 7
      fdb S2,S2,S2,S2,S0,S0,S2,S1     ; Next states for inputs 0 to 7
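The input, output, next sequence of Homework 6.24 can be modeled in C to predict what the assembly program should produce. This is a sketch under stated assumptions: the State type, the Fsm array, and Fsm_Step are hypothetical names, the port read is passed in as a parameter, and the return value stands in for the write to Port T. The table contents are copied from the assembly data structure above.

```c
#include <stdint.h>

typedef struct State {
  uint8_t out[8];                 /* outputs for inputs 0 to 7 */
  const struct State *next[8];    /* next states for inputs 0 to 7 */
} State;

#define S0 (&Fsm[0])
#define S1 (&Fsm[1])
#define S2 (&Fsm[2])

static const State Fsm[3] = {
  {{0,0,5,6,3,9,3,0}, {S0,S0,S1,S1,S1,S2,S2,S2}},
  {{1,2,3,9,6,5,3,3}, {S2,S0,S0,S0,S2,S2,S2,S1}},
  {{1,2,3,9,6,5,3,3}, {S2,S2,S2,S2,S0,S0,S2,S1}}
};

static const State *Pt = S0;      /* initial state is S0 */

/* One pass of the Mealy loop: the output and the next state both
   depend on the current state and the current input. */
uint8_t Fsm_Step(uint8_t portM) {
  uint8_t in = portM & 0x07;      /* input is bits 2,1,0 */
  uint8_t out = Pt->out[in];      /* value for Port T bits 5 to 0 */
  Pt = Pt->next[in];
  return out;
}
```

Starting in S0, an input of 2 gives output 5 and moves to S1; a following input of 3 gives output 9 and returns to S0.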
Homework 6.25 Design a microcomputer-based controller using a linked-list finite-state machine. The system has one input and one output.

Figure Hw6.25 Electronic ignition. (Block diagram: the Angle input and the Spark output connect to the 9S12 on PT3 and PT2. Timing diagram: Angle has a period of about 1 ms, and each Spark pulse is exactly 50 μs wide.)

The input, Angle, is a periodic signal with a frequency of about 1 kHz (has a period of about 1 ms). The output, Spark, should be a positive pulse (exactly 50 μs wide) every time Angle goes from 0 to 1. The delay between the rising edge of Angle and the start of the Spark pulse should be as short as possible. The period of Angle can vary from 1 ms to 50 ms. Since Angle is an input you cannot control it, only respond to its rising edge.

a) Design the one-input, one-output finite-state machine for this system. Draw the FSM graph. Use descriptive state names (i.e., don't call them S0, S1, S2 . . .).
b) Show the assembly code to create the statically allocated linked list. Include org statement(s) to place it in the proper location on your microcomputer.
c) Show the assembly language controller. Include ORG statement(s) to place it in the proper location on a microcomputer. Assume this is the only task that the microcomputer executes. I.e., show ALL the instructions necessary. Make the program automatically start on a RESET.

Homework 6.26 Implement the following Mealy finite-state machine using linked lists. The initial state is Stop. Do not convert the finite-state machine to an equivalent Moore machine; rather, implement it as a Mealy machine. There is no wait parameter for the states.
Figure Hw6.26 Engine controller. (Block diagram: the 9S12 has input Control on PT0 and outputs Break on PT2 and Gas on PT1. State graph: states Stop (initial) and Go, with transitions labeled input/output: 0/10, 1/00, 0/00, and 1/01.)

There is one input, Control, connected to PT0. There are two outputs: Break connected to PT2, and Gas connected to PT1. Each state has two next states and two outputs which depend on the current input. The controller continuously repeats the sequence:

  Input from Control (PT0)
  Output to Break,Gas (PT2 and PT1) which depends on the input Control
  Next state which depends on the input Control

E.g., if the state is Stop and the Control is 0, then the Break output is 1, the Gas output is 0, and the next state is Stop. Show ALL the assembly language software required to implement this machine on a single-chip microcomputer. Use equ statements to clarify the data structure. Use org statements to implement the appropriate segmentation.

Homework 6.27 Write an assembly main program that implements this Moore finite-state machine. The FSM state graph, shown in Figure Hw6.27, is given and cannot be changed. The input is on Port T bits 1 and 0 and the output is on Port M bits 4, 3, 2, 1, and 0. There are three states (happy, hungry, and sleepy). The initial state is happy.
Figure Hw6.27 Finite state graph. (States happy, hungry, and sleepy, each labeled with its output; transitions labeled with input values 0 to 3.)
a) Show the ROM-based FSM data structure.
b) Show the initialization and controller software. Initialize the direction registers, making all code friendly. You may add variables in any appropriate manner (registers, stack, or global RAM). The repeating execution sequence is . . . output, input, next. . . . Please make your code that accesses Port M friendly.

Homework 6.28 Write an assembly main program that implements this Mealy finite-state machine. The FSM state graph, shown in Figure Hw6.28, is given and cannot be changed. The input is on Port T bit 0 and the output is on Port M bits 3, 2, 1, and 0. There are three states (happy, hungry, and sleepy). The initial state is happy.

Figure Hw6.28 Finite state graph. (States happy, hungry, and sleepy; transitions labeled input/output: 0/3, 0/7, 1/2, 1/8, 0/4, and 1/3.)

a) Show the ROM-based FSM data structure.
b) Show the initialization and controller software. Initialize the direction registers, making all code friendly. You may add variables in any appropriate manner (registers, stack, or global RAM). The repeating execution sequence is . . . input, output, next. . . . Please make your code that accesses Port M friendly.

Homework 6.29 Write the Stepper_CCW subroutine as described in Example 6.1.
6.14 Laboratory Assignments

Lab 6.1 Minimally Intrusive Debugging

Purpose. The basic approach to this lab will be to first develop and debug your system using the simulator. During this phase of the project you will run with a short time delay. After the software is debugged, you will build your hardware and run your software on the real 9S12. During this phase of the project you will run with time delays long enough so you will be able to see the LED flash (slower than 8 Hz).

Description. You will first design a system, and then add debugging instruments to prove the system is functioning properly. The system has one input switch and one output LED. The basic function of the system is to respond to the input switch, causing certain output patterns on the LED. Interface a positive logic switch to PT3. This means the PT3 signal will be 0 (low, 0V) if the switch is not pressed, and the PT3 signal will be 1 (high, 5V) if the switch is pressed. Overall functionality of this system is described in the following rules. The system starts with the LED off (make PT2 = 0). The system will return to the off state if the switch is not pressed (PT3 is 0). If the switch is pressed (PT3 is 1), then the LED will flash on and off at about 4 Hz. During the first phase of this lab, you will simulate these hardware circuits in TExaS using a positive logic mode for the switch and LED. During the second phase, you will interface a real switch and LED to your 9S12.

When visualizing software running in real time on an actual microcomputer, it is important to use minimally intrusive debugging tools. The objective of this lab is to develop debugging methods that do not depend on the simulator. During the first phase of this lab, you will develop and test your program and debugging instruments on the TExaS simulator. In particular, you will write debugging instruments to record input and output information as your system runs in real time. This software dump should store data into an array while it is running, and the information will be viewed at a later time. Software dumps are an effective technique when debugging software on an actual microcomputer. During the second phase of this lab, you will run your system on the real 9S12 with and without your debugging instruments.

a) Design the hardware interface of the switch and LED first in TExaS, then on the real system.
b) Write a main program that implements the input/output system. To implement the 125 ms delay, use the timer functions from Chapter 4. The basic steps for the main program are shown in Program L6.1.

        Initialize the stack pointer
        Enable interrupts for the Metrowerks debugger, cli
        Set the direction register so PT3 is an input and PT2 is an output
        Set PT2 so the LED is off
loop    delay about 125ms (any delay from 60 to 500 ms is OK)
        read the switch and go to flash if the switch is pressed
        Set PT2 so the LED is off
wait    read the switch and go to wait if the switch is not pressed
flash   toggle the LED (if on turn it off, if off turn it on)
        go to loop

        DDRT &= ~0x08;             // PT3 input
        DDRT |= 0x04;              // PT2 output
        PTT &= ~0x04;              // PT2 off
        while(1){
          Delay();                 // you write this
          if((PTT&0x08)==0){
            PTT &= ~0x04;          // PT2 off
            while((PTT&0x08)==0){};
          }
          PTT = PTT^0x04;          // toggle
        }

Program L6.1 Program used to develop minimally intrusive debugging instruments.

c) Write two debugging subroutines that implement a dump instrument. This is called functional debugging because you are capturing input/output data of the system without information specifying when the input/output was collected. The first subroutine (Debug_Init) initializes your debugging system. The initialization should initialize a 100-byte array (start it at $3880), initializing pointers and/or counters as needed. The second subroutine (Debug_Capture) saves one data point (PT3 input data, and PT2 output data) in the array. Since there are only two bits to save, pack the information into one 8-bit value for storage and ease of visualization. For example, if

Input (PT3)   Output (PT2)   Saved Data
0             0              0000,0000₂, or $00
0             1              0000,0001₂, or $01
1             0              0001,0000₂, or $10
1             1              0001,0001₂, or $11
In this way, you will be able to visualize the entire array in an efficient manner. Place a call to Debug_Init at the beginning of the system, and a call to Debug_Capture just after each time you output to PTT (there will be 3 or 4 places where your software writes to PTT). Within TExaS you can observe the debugging array using a Stack window.

The basic steps involved in designing the data structures for this debugging instrument are as follows:
  Allocate a 100-byte buffer starting at address $3880
  Allocate a 16-bit pointer, which will point to the place to save the next measurement

The basic steps involved in designing Debug_Init are as follows:
  Set all entries of the 100-byte buffer to $FF (meaning no data yet saved)
  Initialize the 16-bit pointer to the beginning of the buffer

The basic steps involved in designing Debug_Capture are as follows:
  Return immediately if the buffer is full (pointer past the end of the buffer)
  Read PTT                             data = PTT
  Mask capturing just bits 3,2         data = ((data&$08)<<1) | ((data&$04)>>2)
  Dump information into buffer         (*pt) = data
  Increment pointer to next address    pt = pt + 1

Both routines should save and restore the registers that they modify (except CCR), so that the original program is not affected by the execution of the debugging instruments. The temporary variable data may be implemented in a register. However, the 100-byte buffer and the 16-bit pointer, pt, should be permanently allocated in global RAM.

d) By counting cycles in the listing file, estimate the execution time of the Debug_Capture subroutine. Assuming the actual E clock speed, convert the number of cycles to time. This time will be a quantitative measure of the intrusiveness of your debugging instrument.
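The design steps for Debug_Init and Debug_Capture can be prototyped in C before committing to assembly. The sketch below is a model only: PTT is an ordinary variable standing in for the Port T register, the buffer is placed wherever the compiler chooses rather than at $3880, and the names Buffer, Pt, and DUMP_SIZE are hypothetical. Register save and restore, which the assembly versions must do, has no counterpart here.

```c
#include <stdint.h>

#define DUMP_SIZE 100

static uint8_t PTT;                 /* stands in for the Port T register */
static uint8_t Buffer[DUMP_SIZE];   /* the lab places this at $3880 */
static uint8_t *Pt;                 /* place to save the next measurement */

/* Mark every entry as empty and point to the start of the buffer. */
void Debug_Init(void) {
  for (int i = 0; i < DUMP_SIZE; i++) {
    Buffer[i] = 0xFF;               /* $FF means no data yet saved */
  }
  Pt = Buffer;
}

/* Save one packed data point: PT3 goes to bit 4, PT2 goes to bit 0. */
void Debug_Capture(void) {
  if (Pt >= Buffer + DUMP_SIZE) {
    return;                         /* buffer is full */
  }
  uint8_t data = PTT;
  data = (uint8_t)(((data & 0x08) << 1) | ((data & 0x04) >> 2));
  *Pt = data;
  Pt = Pt + 1;
}
```

With PT3 = 1 and PT2 = 1 the saved byte is $11, and with only PT3 set it is $10, in agreement with the table of saved data.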
Lab 6.2 Hand Assembly and Execution

Purpose. In this lab you will learn how to hand-assemble source code. During pass 1 you will create the symbol table. During pass 2 you will create the object code. Another objective is to understand how the microcomputer executes instructions. For each memory cycle during execution, you will predict the R/W line, the 16-bit address, and the 8-bit data bus.

Description. In preparation for this assignment, you should familiarize yourself with the format of the Microcomputer Programming Reference Manual. In particular, you should understand the addressing modes. You need to be able to look up op codes for each instruction. For each instruction, you need to determine the object code and CPU execution cycles. Many instructions have multiple addressing modes; each addressing mode has a distinct object code and execution cycles.

a) Pretend you are pass 1 of the cross-assembler and create the symbol table for Program L6.2. Labels start in column 1. A symbol table is a list of symbols and their 16-bit unsigned values. There will be an entry in the symbol table for each label. For all labels except equ or set, the value of a symbol is the beginning address of that line. For labels with equ or set, the value of the label is the 16-bit value of the operand.

         org  $900      ; RAM
Result   rmb  2
Index    rmb  1
         org  $F800     ; EEPROM
Main     lds  #$0C00
         ldy  #data
         ldaa #2
         bsr  Sum
         std  Result
         stop
Sum      pshy
         staa Index
         ldd  #0
SLoop    addd 2,y+
         dec  Index
         bne  SLoop
         puly
         rts
         org  $FC00
data     fdb  13,9
         org  $FFFE
         fdb  Main

Program L6.2 The assembly program for Lab 6.2.

b) Pretend you are pass 2 of the cross-assembler and create the object code for Program L6.2. Include four fields for each line of assembly code:
1. The address is the 16-bit unsigned hexadecimal location of the start of this line
2. The object code is a group of 8-bit unsigned hexadecimal values
3. The number of cycles to execute this line (called Cycles in the manual)
4. The execution pattern is called Access Detail in the CPU manual
Every line has an address. Some pseudo-ops will create object code (e.g., fcb and fdb). Since pseudo-ops are not executed, no pseudo-op will have values for the Cycles or Access Detail entries. For example, the 6812 yields

Address   Object Code(s)   Cycles   Access Detail   Source Code
$F800                                               org  $F800
$F800     B6 08 00         [3]      rOP             ldaa $0800

Note: In TExaS, the pseudo-ops fcb, dc.b, and dc are identical, and the pseudo-ops fdb and dc.w are identical.
c) Type the source code into the system and run the cross-assembler. Please correct your part b with a red pen. Please do parts a and b on paper first, then run the machine.

d) Pretend you are the microcomputer and hand-execute this program up until the stop instruction. Perform the pseudo-execution showing the R/W, 16-bit Address, and 8-bit Data in hexadecimal for each cycle. On the 6812 the pseudo-execution will not match the actual 6812 execution. This is because the 6812 has an instruction queue and can actually fetch 16 bits at a time. TExaS does not simulate the 6812 instruction queue and will always fetch 8 bits. TExaS properly simulates the software timing on all its microcomputers. For example, TExaS will show the 6812 instruction ldaa $0800 as four pseudo cycles

R/W    Address   Data   Comment
Read   $F800     B6     fetch opcode
Read   $F801     08     fetch operand
Read   $F802     00     fetch operand
Read   $0800     xx     memory read, xx is the contents at $800

but the simulated time will be correctly incremented by 3. In fact, all timing aspects of the simulation will be accurate. In the comment field at the start of each instruction, note which instruction is being executed.

e) Run this program with the simulator and verify your answers to part d. Correct any mistakes with a red pen. Please do part d on paper first, then run the machine.

Lab 6.3 Profiling

Purpose. The TExaS simulator provides a rich set of debugging tools, but eventually we will be asked to run programs on an actual microcomputer. The objective of this lab is to develop profiling tools that do not depend on the simulator. Even though we will still be using the simulator for this lab, these techniques can be used when debugging software on an actual microcomputer.

Procedure.

a) Write three debugging subroutines that implement profiling. The first subroutine (Debug_Init) initializes your system. The second subroutine (Debug_Capture) saves a profile point (time, data, and PC position) in an array. The time parameter is the current TCNT value, the data parameter is the hexadecimal value in Register D, and the PC position information can be obtained by reading the return address off the stack. You may assume the SCI port is not used for the target system, and you can call any of the routines defined in tut2.rtf. The last subroutine (Debug_Display) displays the profile on the SCI/CRT interface. Be careful to save and restore registers so the original subroutine will execute. Program L6.3 shows an example application of these debugging functions. Measure the execution time of the Debug_Capture subroutine. This time will be a quantitative measure of the intrusiveness of the debugging instrument.

b) In this part, you will instrument the original program with debugging code that outputs to a parallel port. The purpose of this debugging is to count the number of times sqrt is called. Modify the main program so sqrt is called exactly 15 times. Connect unused parallel port bits to an external device that will assist in the visualization (LED, LCD, etc.). Run your instrumented system and visualize that sqrt is called 15 times. Measure the execution times of your debugging instruments. These times will be a quantitative measure of their intrusiveness.

c) Again, you will instrument the original program with debugging code that outputs to a parallel port. The purpose of this debugging is to visualize the execution pattern within sqrt. Modify the main program so sqrt is called once with an input of Reg A = 100. Connect unused parallel port bits to a logic analyzer. Run your instrumented system and visualize the execution pattern. In particular, you should see the subroutine start, visualize how many times it loops, and see it finish. Measure the execution times of your debugging instruments. These times will be a quantitative measure of their intrusiveness.

Lab 6.4 MicroForth Interpreter

Purpose. In this lab, you will build a binary-tree data structure. You will design an interpreter that performs simple arithmetic operations. Your system must handle signed underflow and overflow conditions.
      org  $0800
t     rmb  1           transformed to sqrt(s)
cnt   rmb  1           loop counter
s16   rmb  2           16*input
      org  $F000
* binary fixed point squareroot, 2**-4
* Input:  Reg A is s (0 to 15.9375)
* Output: Reg B is t=sqrt(s) 0 to 4.00
sqrt  psha
      clrb
      tsta
      beq  done        ; test for Input==0
      ldab #16
      mul
      std  s16         ; 16*input
      ldaa #32
      staa t           ; t=2.0, initial
      ldaa #4
      staa cnt
next  ldab t           ; RegA=t
      clra
      xgdx             ; RegX=t
      ldaa t
      tab              ; RegB=t
      mul              ; RegD=t*t
      addd s16         ; RegD=t*t+16*s
      idiv             ; RegX=(t*t+16*s)/t
      xgdx             ; RegD=(t*t+16*s)/t
      lsrd             ; RegB=((t*t+16*s)/t)/2
      adcb #0          ; round up?
      stab t           ; t=((t*t+16*s)/t)/2
      dec  cnt
      bne  next
done  pula
      rts              ; RegB=sqrt(s)
main  lds  #$0900
      clra
loop  pshx
      bsr  sqrt
check nop
      inca
      pulx
      bne  loop
      stop
      org  $FFFE
      fdb  main

* with debugging added
      org  $0800
t     rmb  1           transformed to sqrt(s)
cnt   rmb  1           loop counter
s16   rmb  2           16*input
      org  $F000
* binary fixed point squareroot, 2**-4
* Input:  Reg A is s (0 to 15.9375)
* Output: Reg B is t=sqrt(s) 0 to 4.00
sqrt  jsr  Debug_Capture
      psha
      clrb
      tsta
      beq  done        ; test for Input==0
      ldab #16
      mul
      std  s16         ; 16*input
      ldaa #32
      staa t           ; t=2.0, initial
      ldaa #4
      staa cnt
next  ldab t           ; RegA=t
      clra
      xgdx             ; RegX=t
      ldaa t
      tab              ; RegB=t
      mul              ; RegD=t*t
      addd s16         ; RegD=t*t+16*s
      idiv             ; RegX=(t*t+16*s)/t
      xgdx             ; RegD=(t*t+16*s)/t
      lsrd             ; RegB=((t*t+16*s)/t)/2
      adcb #0          ; round up?
      stab t           ; t=((t*t+16*s)/t)/2
      jsr  Debug_Capture
      dec  cnt
      bne  next
done  pula
      rts              ; RegB=sqrt(s)
main  lds  #$0900
      clra
loop  pshx
      jsr  Debug_Init
      bsr  sqrt
      jsr  Debug_Display
check nop
      inca
      pulx
      bne  loop
      stop
      org  $FFFE
      fdb  main

Program L6.3 Profiling added to the squareroot program.
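When profiling sqrt it helps to know what values to expect in Register B at each capture. The C sketch below mirrors the arithmetic of the assembly listing: four Newton iterations of t = ((t*t + 16*s)/t)/2 starting from t = 32, with the lsrd and adcb #0 pair modeled as a divide by two that rounds up on a remainder. The name Sqrt_Fixed is hypothetical, and this is a model for checking results, not a replacement for the assembly.

```c
#include <stdint.h>

/* s and the result are unsigned fixed-point numbers with a resolution
   of 2^-4, so the integer 16 represents 1.0 and 32 represents 2.0. */
uint8_t Sqrt_Fixed(uint8_t s) {
  if (s == 0) {
    return 0;                               /* test for Input==0 */
  }
  uint16_t s16 = (uint16_t)(16u * s);       /* 16*input */
  uint16_t t = 32;                          /* t=2.0, initial */
  for (int cnt = 4; cnt > 0; cnt--) {
    uint16_t q = (uint16_t)((t * t + s16) / t);  /* idiv quotient */
    t = (uint16_t)((q >> 1) + (q & 1));     /* lsrd, then adcb #0 */
  }
  return (uint8_t)t;
}
```

For the input used in part c, s = 100 (6.25), the iterations give t = 41, 40, 40, 40, so the result is 40 (2.5).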
Description. In preparation for this assignment, review binary trees, command interpreters, and the last-in-first-out queue (stack). See the simple binary interpreter in TREE.rtf (installed with TExaS). The major advantage of a binary tree structure over a linear list is the speed of lookup. In the worst case, the maximum number of compares one must do to find an entry is the maximum depth of the tree. Let size be the number of entries and depth be the maximum distance from the root to any leaf. If the binary tree is full, the maximum depth is less than or equal to the next greatest integer of log2(size). For example, a full tree with 1023 entries requires only 10 searches to find an entry. A linear search on the same 1023 entries would take on average 512 searches. In this assignment, we will have only 15 entries, but still will implement a linked-list binary tree.

There are two basic approaches to binary searching: linked lists and indexed table. In the linked list, each entry contains a string called name, a pointer to the function to execute called command, and two pointers: left and right. If both left and right are null, then the node is a leaf.
"1" push1
root "–1" pshm1
"in" in
"+" add "*" mult null
null
"/" divide "–" sub
null
null
"–2" pshm2 null
"drop" drop
"0" push0
null
null
depth = 4 "out" out
"2" push2
null
null
null
"dup" dup null
"mod" mod
null
null
"over" over
null
null
null
Figure L6.4a Tree structure containing the names and function addresses. In this procedure, input is a string to find. We begin searching at the root. Figure L6.4b Flowchart for the interpreter.
pt = root;
input < pt->name
input
input == pt->name
input > pt->name pt = pt->left;
pt = pt->right;
execute pt->command(); success
pt != null
pt
pt == null
failure
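The linked-list search can be sketched in C. This is only an illustration under stated assumptions, not the lab solution (which is written in 9S12 assembly): the type Node, the function Lookup, the three sample commands, and the use of strcmp for the alphabetical compare are hypothetical; only the field names name, command, left, and right come from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One node of the command tree: a name, the function to run, two children. */
typedef struct Node {
  const char *name;            /* command string, e.g. "dup" */
  void (*command)(void);       /* function executed when the name matches */
  const struct Node *left;     /* names alphabetically before this one */
  const struct Node *right;    /* names alphabetically after this one */
} Node;

/* Hypothetical commands for illustration; each one records that it ran. */
int LastRun = 0;
void Dup(void)  { LastRun = 1; }
void In(void)   { LastRun = 2; }
void Over(void) { LastRun = 3; }

/* A three-node tree: "in" at the root, "dup" on the left, "over" on the right. */
const Node DupNode  = { "dup",  Dup,  NULL, NULL };
const Node OverNode = { "over", Over, NULL, NULL };
const Node Root     = { "in",   In,   &DupNode, &OverNode };

/* Returns 1 (success) after executing the matching command,
   or 0 (failure) when the search runs off the tree. */
int Lookup(const Node *root, const char *input) {
  const Node *pt = root;
  assert(input != NULL);
  while (pt != NULL) {
    int order = strcmp(input, pt->name);  /* compares by character code */
    if (order < 0) {
      pt = pt->left;                      /* input is alphabetically before */
    } else if (order > 0) {
      pt = pt->right;                     /* input is alphabetically after */
    } else {
      pt->command();                      /* input == pt->name */
      return 1;
    }
  }
  return 0;
}
```

Calling Lookup(&Root, "over") compares against "in", moves right, matches, and runs Over. A name that is not in the tree, such as "swap", ends at a null pointer and returns failure without running anything.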
If the input is less than the name of the current node, pt->name (alphabetically before), then the search will go left (pt = pt->left). If the input is greater than the name of the current node (alphabetically after), then the search will go right (pt = pt->right).
The second approach (which you will not be implementing, but is included for your consideration) is called an indexed table. In this scheme we start numbering at index 1. The table must be sorted alphabetically. Rather than storing the pointers explicitly as we did in the previous example, notice how the index number when viewed in binary provides the same information. If the size is not exactly a power of two, we must allocate additional entries and place them alphabetically at the beginning or the end.
Figure L6.4c Finite state graph. The 15 table entries in alphabetical order are: mult, add, sub, pshm1, pshm2, divide, push0, push1, push2, drop, dup, in, mod, out, over.
Again, input is the string that is used to match the name field of the table. This is also a binary search because the number of tests will be less than or equal to the next greatest integer of log2(size).
Figure L6.4d Flowchart for the indexed table interpreter.
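One way to realize the indexed table search is sketched below in C. The flowchart itself is not reproduced here, so the details are assumptions: the names Table, SIZE, and Find are hypothetical, entry 0 is left unused so numbering starts at 1, and the probe sequence (start in the middle, then move by half the previous step) is one common way to let the index play the role of the left and right pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIZE 15   /* 2^4 - 1 entries, numbered 1 to 15 */

/* Names sorted alphabetically (by character code). Entry 0 is unused
   so that numbering starts at index 1. */
const char *const Table[SIZE + 1] = {
  "",  "*", "+", "-", "-1", "-2", "/", "0",
  "1", "2", "drop", "dup", "in", "mod", "out", "over"
};

/* Returns the index (1 to SIZE) of input, or 0 if it is not in the table.
   The first probe is the middle entry (binary 1000). Each later probe moves
   by half the previous step, so at most 4 compares are needed. */
int Find(const char *input) {
  int i = (SIZE + 1) / 2;      /* 8, plays the role of the root */
  int step = (SIZE + 1) / 4;   /* 4, then 2, then 1, then 0 */
  assert(input != NULL);
  for (;;) {
    int order = strcmp(input, Table[i]);
    if (order == 0) return i;            /* success */
    if (step == 0) return 0;             /* failure, nothing left to probe */
    i = (order < 0) ? i - step : i + step;
    step = step / 2;
  }
}
```

With 15 entries the probes visit index 8, then 4 or 12, then an even index, then an odd index, which mirrors the four levels of the tree in the linked-list version.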
The linked-list lookup will be a little faster to execute, because it is quicker to access pt->name than it is to access Table[I].name. On the other hand, it is easy to make minor changes in the indexed table. If the space is already allocated, then adding a node at run time involves shifting the entries down so that the list remains alphabetical. Deleting a node simply involves shifting the nodes up. The only disadvantage is the size cannot increase so that it exceeds the next power of two.
The following table lists the 15 commands your FORTH interpreter will execute. Your software system will have two stacks. The return stack, pointed to by SP, will contain return addresses for the usual jsr rts subroutine call functions. The data stack, pointed to by RegY, will contain the input/output parameters for the functions. Commands will be separated by returns (ASCII 13). The idea is to input an entire line using InString, then look up the command in the tree and, if found, execute the function. You should display (without popping) the top data stack entry on the LCD display.
Command  Function
in       Input 8-bit signed number from CRT keyboard, push on data stack
out      Pop from data stack and output 8-bit signed number to CRT display
dup      Duplicates top of data stack
over     Duplicates next to top of data stack
drop     Pop and discard top of data stack
+        Pops two numbers from data stack, add, push result on data stack
-        Pops two numbers from data stack, subtract, push result on data stack
*        Pops two numbers from data stack, multiply, push result on data stack
/        Pops two numbers from data stack, divide, push quotient on data stack
mod      Pops two numbers from data stack, divide, push remainder on data stack
0        Pushes the constant 0 on the stack
1        Pushes the constant 1 on the stack
2        Pushes the constant 2 on the stack
-1       Pushes the constant -1 on the stack
-2       Pushes the constant -2 on the stack
We will create 10 bytes of space for the data stack (Y) separate from the hardware stack (SP). Register Y will always point into this space. You must explicitly test for data stack overflow and underflow. You must also implement ceiling and floor handling during the addition, subtraction, multiply, divide, and modulo functions.
datastack    rmb  8
penultimate  rmb  1
ultimate     rmb  1
bottom       rmb  0
Figure L6.4e Data and return stacks. The return stack, pointed to by SP, holds the subroutine return addresses. The data stack occupies the addresses from datastack to bottom (with antepenultimate, penultimate, and ultimate just above bottom). Y points to the top of stack: the valid data lies between Y and bottom, and the free area lies between datastack and Y.
Notice that we can determine how many bytes are on the data stack by comparing Y to the fixed addresses:
If Y Equals    This Many Bytes Are on the Data Stack
bottom         None (empty)
ultimate       One
penultimate    Two
datastack      Ten (full)
Use reverse-polish format for subtraction and division. E.g., 2 1 - is 1, and 2 1 / is 2. The usual stack rules apply to this data stack as well.
1. Stack accesses (PUSH or PULL) should not be performed outside the allocated area.
2. Stack reads should not be performed from the free area.
3. Stack PUSH should first decrement Y, then store the data (not vice versa).
4. Stack PULL should first read the data, then increment Y (not vice versa).
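The stack rules and the ceiling and floor handling can be modeled in C before writing the assembly. The following is a sketch under assumptions, not the required implementation: the array DataStack, the index Y, and the functions Count, Push, Pull, Saturate, and Mult are hypothetical names, and an array index stands in for Register Y (index 0 corresponds to datastack, and index 10 to bottom).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SIZE 10           /* 10 bytes, as in the lab */

int8_t DataStack[STACK_SIZE];   /* DataStack[0] plays the role of datastack */
int Y = STACK_SIZE;             /* index of the top; STACK_SIZE means empty */

/* Number of bytes currently on the data stack. */
int Count(void) { return STACK_SIZE - Y; }

/* PUSH: test for overflow, decrement Y first, then store. 1 on success. */
int Push(int8_t data) {
  if (Y == 0) return 0;            /* full, Y equals datastack */
  Y = Y - 1;
  DataStack[Y] = data;
  return 1;
}

/* PULL: test for underflow, read first, then increment Y. 1 on success. */
int Pull(int8_t *data) {
  assert(data != NULL);
  if (Y == STACK_SIZE) return 0;   /* empty, Y equals bottom */
  *data = DataStack[Y];
  Y = Y + 1;
  return 1;
}

/* Ceiling and floor handling: clamp a wider result into -128 to +127. */
int8_t Saturate(int32_t value) {
  if (value > 127) return 127;
  if (value < -128) return -128;
  return (int8_t)value;
}

/* Multiply the top two entries. The stack is left unchanged if it holds
   fewer than two entries. Returns 1 on success. */
int Mult(void) {
  int8_t a, b;
  if (Count() < 2) return 0;
  Pull(&a);
  Pull(&b);
  return Push(Saturate((int32_t)a * (int32_t)b));
}
```

For example, pushing 100 and 2 and then calling Mult leaves the single value 127 on the stack, because the true product 200 exceeds the ceiling.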
Here are a couple of the routines to get you started:
* duplicate next to top
over    cpy  #penultimate  check for at least 2 elements
        bhi  overend       skip if no data to duplicate
        cpy  #datastack    check for full
        bls  overend       skip if stack is already full
        ldaa 1,y           copy of next to top
        staa 1,-y          push on data stack
overend rts
* push a 2 on the data stack
push2   cpy  #datastack    check for full
        bls  psh2end       skip if stack is already full
        movb #2,1,-y       push 2 on data stack
psh2end rts
* multiply top two entries
mult    cpy  #penultimate  check for at least 2 elements
        bhi  overend       skip if fewer than 2 elements
        ldaa 1,y+          pop top of stack
        sex  a,x           X=multiplicand (-128 to +127)
        ldaa 1,y+          pop next to top
        sex  a,d           D=multiplicand (-128 to +127)
        exg  x,y           Y,D are multiplicands (X is stack pt)
        emuls              D=product (no overflow possible in 16-bit D)
        exg  x,y           Y is stack pt again, -16256 <= D <= 16384
        cpd  #127
        bgt  ceiling
        cpd  #-128
        bge  ok
floor   ldab #-128         since D<-128, set B = -128
        bra  ok
ceiling ldab #127          since D>127, set B = 127
ok      stab 1,-y          push result
        rts
a) One by one, write and debug the 15 individual commands. Use stabilization to test each routine.
b) Design the fixed binary tree containing the names and function addresses for all your commands. This structure will exist in EEPROM and cannot be modified unless the source code is edited and the program reassembled. Note, most FORTH interpreters place the binary tree
in RAM and allow commands to be added and removed at run time. Use binding (equ) to make the program more readable.
c) Write the main program that interprets input from the CRT keyboard and displays output back to the CRT display. Remember to display the top of the data stack on the LCD display.
Lab 6.5 Traffic Light Controller
Purpose. This lab has these major objectives: the usage of linked list data structures, the creation of a segmented software system, and real-time synchronization by designing an input-directed traffic light controller. In preparation for this assignment, review finite-state machines, linked lists, and memory allocation. You should also run and analyze the linked-list controllers found in example files moore.rtf and mealy.rtf.
Description. The basic approach to this lab will be to first develop and debug your system using the simulator. During this phase of the project, you will run with a fast TCNT clock (TSCR2 = 0). After the software is debugged, you will interface actual lights and switches to the 9S12 and run your software on the real 9S12. During this phase of the project you will run with a slow TCNT clock (TSCR2 = $07). As you have experienced, the simulator requires more actual time to simulate one cycle of the microcomputer. On the other hand, the correct simulation time is maintained in the TCNT register, which is incremented every cycle of simulation time. The simulator speed depends on the amount of information it needs to update into the windows. Unfortunately, even with the least amount of window updates, it would take a long time for the simulator to process the typical 3 minutes it might take for a “real” car to pass through a “real” traffic intersection. Consequently, the cars in this traffic intersection travel much faster than “real” cars. In other words, you are encouraged to adjust the time delays so that the operation of your machine is convenient for you to debug and for the TA to observe during demonstration.
You will create a segmented software system putting global variables into RAM, local variables into RAM, constants and fixed data structures into EEPROM, and program object code into EEPROM. Most microcontrollers have a rich set of timer functions. For this lab, you will need the ability to wait a prescribed amount of time. In general, cycle-counting (simple for loops) has the problem of conditional branches and data-dependent execution times. If an interrupt were to occur during a cycle-counting delay, then the delay would be inaccurate using the cycle-counting method. Using the TCNT timer, however, the timing will be very accurate, even if an interrupt were to occur while the microcomputer was waiting. In more sophisticated systems, other timer modes provide even more flexible mechanisms for microcomputer synchronization. A linked list solution may not run the fastest or occupy the fewest memory bytes, but it is a structured technique that is easy to understand, easy to implement, easy to debug, and easy to upgrade. Consider a typical four-corner intersection as shown in Figure L6.5. The two one-way streets are labeled South (cars travel North) and West (cars travel East). There are three inputs to your 9S12: two are car sensors, and one is a walk button. The South sensor will be true (1) if one or more cars are near the South intersection. Similarly, the West sensor will be true (1) if one or more cars are near the West intersection. The Walk sensor will be true (1) if a pedestrian wishes
Figure L6.5 Traffic light intersection. The figure shows the Walk button, the South and West traffic lights (each R, Y, G), and the pedestrian lights: Don't walk (R) and Walk (G).
to cross in any direction. There are eight outputs from your microcomputer that control the two Red/Yellow/Green traffic lights and the two walk/don’t walk lights. The simulator allows you to attach binary switches to simulate the three inputs and LED lights to simulate the eight outputs. Traffic should not be allowed to crash. I.e., there should not be a green or yellow on South at the same time there is a green or yellow on West. You should exercise common sense when assigning the length of time that the traffic light will spend in each state, so that the simulated system changes at a speed convenient for the TA (stuff changes fast enough so the TA doesn’t get bored, but not so fast that the TA can’t see what is happening). Cars should not be allowed to hit the pedestrians. The walk sequence should be realistic (walk, flashing don’t, continuous don’t). Your system should consider both the average and worst-case waiting time. You may assume the two car sensors remain active for as long as service is required. On the other hand, the walk button may be pushed and released, and the system must remember the walk has been requested.
a) Build an I/O system in TExaS with the appropriate names and colors on the lights and switches. Think about which ports you will be using in part d so that you simulate the exact system you will eventually plan to build.
b) Design a finite-state machine that implements a good traffic-light system. Include a graphical picture of your finite-state machine showing the various states, inputs, outputs, wait times, and transitions. Remember the wait function will return input data collected while it is waiting.
c) Write the assembly code that implements the traffic-light control system. There is no single, “best” way to implement your traffic light. However, your scheme must be segmented into RAM/EEPROM, and you must use a linked-list data structure.
There should be a one-to-one mapping between the FSM states and the linked list elements. A “good” solution has about 10 to 20 states in the finite-state machine and provides for input dependence. Try not to focus on the civil engineering issues. Rather, build a quality computer engineering solution that is easy to understand and easy to change. Do something reasonable, and have 10 to 20 states. A good solution has
1. One-to-one mapping between state graph and data structure
2. No conditional branches in program
3. The state graph defines exactly what it does in a clear and unambiguous fashion
4. The format of each state is the same
5. Good names and labels
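The properties in this list can be illustrated with a much smaller machine than the lab requires. The following C sketch is an assumption-laden example, not a lab solution: it has only four states, two car sensors, and no walk button, and the bit assignments, wait times, and the names State, Fsm, Step, and CheckSafe are hypothetical. Each array element corresponds to exactly one state of the graph, every state has the same format, and the engine has no conditional branches.

```c
#include <assert.h>
#include <stdint.h>

/* Every state has the same format: output, wait time, next-state links. */
typedef struct State {
  uint8_t out;                   /* light pattern while in this state */
  uint16_t wait;                 /* time to stay here, in 10 ms units */
  const struct State *next[4];   /* next state, indexed by the 2-bit input */
} State;

/* Hypothetical patterns: bits 5-3 = West R,Y,G and bits 2-0 = South R,Y,G */
#define SOUTH_GREEN  0x21        /* West red,    South green  */
#define SOUTH_YELLOW 0x22        /* West red,    South yellow */
#define WEST_GREEN   0x0C        /* West green,  South red    */
#define WEST_YELLOW  0x14        /* West yellow, South red    */

extern const State Fsm[4];
#define GO_SOUTH   (&Fsm[0])
#define WAIT_SOUTH (&Fsm[1])
#define GO_WEST    (&Fsm[2])
#define WAIT_WEST  (&Fsm[3])

/* Input (assumed): bit 1 = South sensor, bit 0 = West sensor. */
const State Fsm[4] = {
  /* input:              none      West        South      both       */
  { SOUTH_GREEN,  300, { GO_SOUTH, WAIT_SOUTH, GO_SOUTH,  WAIT_SOUTH } },
  { SOUTH_YELLOW,  50, { GO_WEST,  GO_WEST,    GO_WEST,   GO_WEST    } },
  { WEST_GREEN,   300, { GO_WEST,  GO_WEST,    WAIT_WEST, WAIT_WEST  } },
  { WEST_YELLOW,   50, { GO_SOUTH, GO_SOUTH,   GO_SOUTH,  GO_SOUTH   } }
};

/* Engine: the next state comes entirely from the data structure.
   On real hardware the caller would write pt->out to the port and
   wait pt->wait before reading the sensors. */
const State *Step(const State *pt, uint8_t input) {
  return pt->next[input & 0x03];
}

/* Debugging aid: no state may show green or yellow on both streets. */
void CheckSafe(void) {
  for (int i = 0; i < 4; i++) {
    int southMoving = (Fsm[i].out & 0x03) != 0;
    int westMoving  = (Fsm[i].out & 0x18) != 0;
    assert(!(southMoving && westMoving));
  }
}
```

Changing a wait time or moving a transition arrow means editing one row of Fsm; the Step function is untouched, which is the benefit the one-to-one mapping is meant to deliver.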
Typically in real applications using an embedded system, we put the executable instructions and the finite-state machine linked-list data structure into the nonvolatile memory (flash EEPROM). A good implementation will allow minor changes to the finite-state machine (adding states, modifying times, removing states, moving transition arrows, and changing the initial state) simply by changing the linked list controller, without changing the executable instructions. Making changes to executable code requires you to debug/verify the system again. If there is a one-to-one mapping from FSM to linked-list data structure, then if we just change the state graph and follow the one-to-one mapping, we can be confident our new system still operates properly. Obviously, if we add another input sensor or output light, it may be necessary to update the executable part of the software and re-assemble. During the debugging phase with the TExaS simulator, you can run with a fast TCNT clock (TSCR2 = $00).
d) After the software has been debugged on the simulator, you will implement it on the real board. The first step is to interface three pushbutton switches for the sensors. Do not place or remove wires on the protoboard while the power is on. Build the switch circuits and test the voltages using a digital voltmeter. You can also use the debugger to observe the input pin to verify the proper operation of the interface. The next step is to build six LED output circuits. You can use the two LEDs on the docking module (PT1, PT0) in addition to the six external LEDs you will build on your protoboard. Look up the pin assignments in the 7406 data sheet. Be sure to connect 5 V power to pin 14 and ground to pin 7. You can use the debugger to set the direction
register to output. Then, you can set the output high and low, and measure the three voltages (input to 7406, output from 7406 which is the LED cathode voltage, and the LED anode voltage).
e) Debug your combined hardware/software system on the actual 9S12 board. When using the real 9S12, you should run with a slow TCNT clock (TSCR2 = $07). An interesting question that may be asked during checkout is how you could experimentally prove your system works. In other words, what data should be collected and how would you collect it?
7
Local Variables and Parameter Passing
Chapter 7 objectives are to:
■ Explain how to implement local variables on the stack
■ Show how various C compilers implement local variables and pass parameters
■ Compare and contrast call-by-value versus call-by-reference parameter passing
Variables are an important component of software design, and there are many factors to consider when creating variables. Some of the obvious considerations are the size and format of the data. Another factor is the scope of a variable. The scope of a variable defines which software modules can access the data. Variables with an access that is restricted to one software module are classified as private, and variables shared between multiple modules are public. In general, a system is easier to design (because the modules are smaller and simpler), easier to change (because code can be reused), and easier to verify (because interactions between modules are well-defined) when we limit the scope of our variables. However, since modules are not completely independent, we need a mechanism to transfer information from one to another. In this chapter, we will develop parameter passing methodologies. Because their contents are allowed to change, all variables must be allocated in RAM and not ROM. On the one hand, global variables contain information that is permanent and are usually assigned a fixed location in RAM. On the other hand, local variables contain temporary information and are stored in a register or allocated on the stack. One of the important objectives of this chapter is to present design steps for creating, using, and destroying local variables on the stack. In summary, there are three types of variables: public globals (shared permanent), private globals (unshared permanent), and private locals (unshared temporary). Because there is no appropriate way to create a public local variable, we usually refer to private local variables simply as local variables, and the fact that they are private is understood.
7.1
Local Versus Global
A local variable contains temporary information. Since we will implement local variables on the stack or in registers, this information cannot be shared with other software modules. Therefore, under most situations, we can further classify these variables as private. Local variables are allocated, used, then deallocated, in this specific order. For speed reasons, we wish to assign local variables to registers. When we assign a
local variable to a register, we can do so in a formal manner. There will be a certain line in the assembly software at which the register begins to contain the variable (allocation), followed by lines where the register contains the information (access or usage), and a certain line in the software after which the register no longer contains the information (deallocation). As an example, consider the register allocation used in a finite-state machine controller, shown earlier as Program 6.22, and again here as Program 7.1. Register B is allocated for holding the Output value in Line 6, used in Lines 6 through 9, then deallocated, such that after Line 9, Register B can be used for other purposes. Registers B and Y are used in this program to temporarily hold information, and hence are classified as local variables. Contrast this to how Register X is used. This is a VERY simple program, and as such, the usage of Register X is unusual. This main program assigns Register X to hold the state pointer (Pt) in Line 5. From that point in time, Register X always contains Pt, and hence we classify this assignment of Register X as global (meaning permanent). It is appropriate to assign a register as a global only in the most simple situations (e.g., less than a 20-line program with no interrupts).
Program 7.1 Register assignments in a finite-state machine controller.
Line  Program                     Register B  Register X  Register Y
1     Main  lds  #$4000
2           bsr  Timer_Init
3           ldab #$FC             $FC
4           stab DDRT
5           ldx  #goN                         Pt
6     FSM   ldab OUT,x            Output      Pt
7           lslb                  Output      Pt
8           lslb                  Output      Pt
9           stab PTT              Output      Pt
10          ldy  WAIT,x                       Pt          Wait
11          bsr  Timer_Wait10ms               Pt          Wait
12          ldab PTT              Input       Pt
13          andb #$03             Input       Pt
14          lslb                  Input       Pt
15          abx                   Input       Pt
16          ldx  NEXT,x                       Pt
17          bra  FSM                          Pt
The information stored in a local variable is not permanent. This means if we store a value into a local variable during one execution of the module, the next time that module is executed the previous value is not available. Examples include loop counters and temporary sums. We use a local variable to store data that is temporary in nature. We can implement a local variable using the stack or registers. Some reasons why we choose local variables over global variables include:
■ Dynamic allocation/release allows for reuse of RAM memory.
■ Limited scope of access (making it private) provides for data protection; only the program that created the local variable can access it.
■ Since an interrupt will save registers and create its own stack frame, the code is reentrant.
■ Since absolute addressing is not used, the code is relocatable.
Some reasons why we place local variables on the stack rather than using registers include:
■ We can use symbolic names for the local variables, making the code easier to understand.
■ The number of variables is limited only by the size of the stack, which is much larger than the number of registers.
■ Because it is more general, it will be easier to add additional variables in the future.
Checkpoint 7.1: How do you create a local variable in C?
A global variable is allocated at a permanent and fixed location in RAM. A public global variable contains information that is shared by more than one program module. We must use global variables to pass data between the main program (i.e., foreground thread) and an ISR (i.e., background thread). If a function called from the foreground belongs to the same module as the ISR, then a global variable used to pass data between the function and the ISR is classified as a private global (assuming software outside the module does not directly access the data). Global variables are allocated at assembly time and never deallocated. Allocation of a global variable means the assembler assigns the variable a fixed location in RAM. The information they store is permanent. Examples include time of day, date, calibration tables, user name, temperature, FIFO queues, and message boards. We use absolute addressing (direct or extended) to access their information. When dealing with complex data structures like the ones presented in Chapter 6, pointers to the data structures are shared. In general, it is a poor design practice to employ public global variables. On the other hand, private global variables are necessary to store information that is permanent in nature.
Observation: Sometimes we store temporary information in global variables because it is easier to observe the contents using the debugger. This usage is appropriate during the early stages of development, but once the module is tested, temporary information should be converted to local, and the system should be tested again.
Checkpoint 7.2: How do you create a global variable in C?
In C, a static local has permanent allocation, which means it maintains its value from one call to the next. It is still local in scope, meaning it is only accessible from within the function. I.e., modifying a local variable with static changes its allocation (it is now permanent), but doesn’t change its scope (it is still private). In the following example, count contains the number of times MyFunction is called. The initialization of a static local occurs just once, during startup.
void MyFunction(void){
  static short count=0;
  count++;
}
In C, we create a private global variable using the static modifier. Modifying a global variable with static does not change its allocation (it is still permanent), but does reduce its scope. Regular globals can be accessed from any function in the system (public), whereas a static global can be accessed only by functions within the same file. Static globals are private. Functions can be static also, meaning they can be called only from other functions in the file. E.g.,
static short myPrivateGlobalVariable; // accessible by this file only
static void MyPrivateFunction(void){
}
In C, a const global is read-only. It is allocated in the ROM portion of memory. Constants, of course, must be initialized at compile time. E.g.,
const short Slope=21;
const char SinTable[8]={0,50,98,142,180,212,236,250};
Common Error: If you leave off the const modifier in the SinTable example, the table will be allocated twice: once in ROM containing the initial values and once in RAM
containing data to be used at run time. Upon startup, the system copies the ROM-version into the RAM-version.
Maintenance Tip: It is good practice to specify whether an assembly variable is signed or unsigned in the comments. If the information has units (e.g., volts, seconds, etc.) this should be included also.
7.2
Stack Rules
In the last section, we discussed the important issue of global versus local variables. One of the more flexible means to create local variables will be the stack. In this section, we define a set of rules for proper use of the stack. A last-in-first-out (LIFO) stack is implemented in hardware by most computers. The stack can be used for local variables (temporary storage), saving return addresses during subroutine calls, passing parameters to subroutines, and saving registers during the processing of an interrupt. The first advantage of placing local variables on the stack is that the storage can be dynamically allocated before usage and deallocated after usage. The second advantage is the facilitation of reentrant software. The stack pointer (SP) on the 9S12 points to the top entry of the stack, as shown in Figure 7.1. If it exists, we define the data immediately below the top (larger memory address) as next to top. To push a byte on the stack, we first decrement the stack pointer (SP), then we store the byte at the location pointed to by the SP. To pull a byte from the stack, first we read the byte from memory pointed to by SP, then we increment the SP. To push a 16-bit word on the stack, we first decrement the SP by 2, then we store the word into that location. To pull a 16-bit word from the stack, we first read the word from the location pointed to by SP, then we increment the SP by 2.
Figure 7.1 The 9S12 stack. The white boxes are free spaces, and the shaded boxes contain data. Two cases are shown: a stack with 3 elements, where SP points to the top entry with next just below it, and an empty stack.
Checkpoint 7.3: How do we push/pull a 16-bit word onto/from the stack?
The instruction tsx will transfer a copy of the stack pointer into Register X. The instruction causes Register X to point to the top element of the stack, as shown in Figure 7.2. The instruction tsy works in a similar manner with Register Y. The tsx and tsy instructions do not modify the stack pointer. Formally, only SP defines what data is on the stack. However, having a second pointer also point into the stack provides additional flexibility for accessing data.
Figure 7.2 The tsx instruction creates a stack frame pointer. Before tsx, SP points to the top element of the stack, with next just below it. After tsx, SP is unchanged and Register X also points to the top element.
We can read and write previously allocated locations on the stack using indexed mode addressing. For example, to read an 8-bit value from the next to the top byte:
      tsx          ;Reg X points to the top byte of the stack
      ldaa 1,X     ;Reg A = the next to the top byte
Stack pointer indexed mode also can be used to read any data on the stack:
      ldaa 1,SP    ;Reg A = the next to the top byte
The LIFO stack has a few rules (repeated from Chapter 5):
1. Program segments should have an equal number of pushes and pulls.
2. Stack accesses (push or pull) should not be performed outside the allocated area.
3. Stack reads and writes should not be performed within the free area.
4. Stack push should first decrement SP, then store the data.
5. Stack pull should first read the data, then increment SP.
Programs that violate rule number 1 will probably crash when a rts instruction pulls an illegal address off the stack at the end of a subroutine. The TExaS simulator will usually recognize this error as an illegal memory access when the processor tries to fetch an op code at this incorrect address. The backdump command will be useful to retrace the steps leading up to the crash. Figures 7.1 and 7.2 show the free area as white boxes. Violations of rule number 2 can be caused by a stack underflow or overflow. Stack underflow is caused when there are more pulls than pushes and is always the result of a software bug. The TExaS simulator will recognize this error as an illegal memory access when the processor tries to pull data from an address that doesn’t exist. A stack overflow can have two causes. If the software mistakenly pushes more than it pulls, then the stack pointer will eventually overflow its bounds. Even when there is exactly one pull for each push, a stack overflow can occur if the stack is not allocated large enough. Stack overflow is a very difficult bug to recognize, because the first consequence occurs when the computer pushes data onto the stack and overwrites data stored in a global variable. At this point, the local variables and global variables exist at overlapping addresses. Setting a breakpoint at the first address of the allocated stack area allows you to detect a stack overflow situation.
Checkpoint 7.4: How do you specify the size of the stack?
The following 9S12 assembly code violates rule 3, and will not work if interrupts are active. The objective is to save register A onto the stack. When an interrupt occurs, registers automatically will be pushed on the stack, destroying the data.
      staa -1,SP   ;Store Reg A onto the stack (***illegal***)
To use the stack, one first allocates, then saves. The following assembly code also violates rule 3, because it first stores the data on the stack, then allocates space. The objective is to push a zero onto the stack. If an interrupt were to occur between the clr and des instructions in the following example, the zero will be destroyed when registers are pushed on the stack by the interrupt context switch:
      tsx          ;Reg X points to the top of the stack
      clr  -1,X    ;Store zero onto the stack (***illegal***)
      des          ;Make space for the zero
The proper technique is to allocate first, then store:
      des          ;Allocate stack space first
      clr  0,SP    ;Store zero onto the stack
or
      clr  1,-SP   ;Store zero onto the stack
Constants can be pushed on the stack with the movb and movw instructions. For example, to push the byte 7:
      movb #7,1,-SP  ;push a 7 onto the stack
Checkpoint 7.5: Write an assembly instruction that pushes a 16-bit 1000 onto the stack.
7.3
Local Variables Allocated on the Stack
Stack implementation of local variables has four stages: binding, allocation, access, and deallocation.
1. Binding is the assignment of the address (not value) to a symbolic name. The symbolic name will be used by the programmer when referring to the local variable. The assembler binds the symbolic name to a stack index, and the computer calculates the physical location during execution. In the following example, the local variable will be at address SP+0, and the programmer will access the variable using sum,SP addressing:
sum   set  0       ;16-bit local variable, stored on the stack
Checkpoint 7.6: Why is set better than equ for binding?
2. Allocation is the generation of memory storage for the local variable. The computer allocates space during execution by decrementing the SP. In this first example, the software allocates the local variable by pushing a register on the stack. An 8-bit push (e.g., psha) creates an uninitialized 8-bit local variable, and a 16-bit push (e.g., pshx) creates an uninitialized 16-bit local variable. The value in the register is irrelevant; these instructions are used because they are a fast way to decrement the SP.
      pshx         ;allocate 16-bit sum
In this next example, the software allocates the local variable by decrementing the stack pointer. This local variable is also uninitialized. This method is most general, allowing the allocation of an arbitrary amount of data.
      leas -2,SP   ;allocate sum
Checkpoint 7.7: In what way is pshx better than leas -2,sp for allocating a 16-bit local? In what way is leas -2,sp better?
If you wished to allocate a 16-bit local and initialize it to zero, you could execute:
      ldx  #0
      pshx         ;allocate sum=0
or
      movw #0,2,-sp  ;allocate sum=0
Checkpoint 7.8: Assume Register A contains the size in bytes of an array, determined at run-time. Write assembly code to allocate the array on the stack.
3. The access to a local variable is a read or write operation that occurs during execution. In the next code fragments, the value of the local variable sum is initialized to 0. One way is
      tsx          ;X points to locals
      ldd  #0
      std  sum,x   ;sum=0
and another way is
      movw #0,sum,sp  ;sum=0
In the next code fragment, the local variable sum is incremented. We could use RegX to access the data
     tsx
     ldd  sum,x
     addd #1
     std  sum,x    ;sum=sum+1
or use the SP directly.
     ldd  sum,sp
     addd #1
     std  sum,sp   ;sum=sum+1
4. Deallocation is the release of memory storage for the local variable. The computer deallocates space during execution by incrementing SP. In this first example, the software deallocates the local variable by pulling a register from the stack.
     pulx          ;deallocate sum
Observation: When the software uses the “push-register” technique to allocate and the “pull-register” technique to deallocate, it looks like it is saving and restoring the register. Because most applications of local variables involve storing into the local, the value pulled will NOT match the value pushed.
In this next example, the software deallocates the 16-bit local variable by incrementing the stack pointer by two.
     leas 2,SP     ;deallocate sum
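The four stages can also be traced in a small C model, which may help when checking stack arithmetic by hand. This is only a sketch under stated assumptions: the array ram, the index sp, and the helpers read16, write16, local_demo, and model_sp are hypothetical names that stand in for 9S12 memory and the stack pointer; they are not part of any 9S12 tool.

```c
#include <stdint.h>

/* Hypothetical model of the stack: a small RAM array plus an index that
   plays the role of the stack pointer. All names are illustrative only. */
#define STACK_TOP 64u
static uint8_t ram[STACK_TOP];
static uint16_t sp = STACK_TOP;      /* empty stack: SP starts at the top */

enum { SUM = 0 };                    /* binding: sum set 0 */

/* 16-bit big-endian read at offset,SP (models ldd offset,SP) */
static uint16_t read16(uint16_t offset) {
    return (uint16_t)((ram[sp + offset] << 8) | ram[sp + offset + 1]);
}

/* 16-bit big-endian write at offset,SP (models std offset,SP) */
static void write16(uint16_t offset, uint16_t value) {
    ram[sp + offset]     = (uint8_t)(value >> 8);
    ram[sp + offset + 1] = (uint8_t)(value & 0xFF);
}

/* Walks through allocation, access, and deallocation; returns the final sum */
uint16_t local_demo(void) {
    uint16_t result;
    sp -= 2;                         /* allocation:   leas -2,SP     */
    write16(SUM, 0);                 /* access:       movw #0,sum,SP */
    write16(SUM, read16(SUM) + 1);   /* access:       sum = sum+1    */
    result = read16(SUM);
    sp += 2;                         /* deallocation: leas 2,SP      */
    return result;
}

/* Reports the model stack pointer so a caller can check the stack is balanced */
uint16_t model_sp(void) { return sp; }
```

Calling local_demo() once should return 1, and model_sp() should read 64 again afterward, which mirrors the rule that every allocation is matched by a deallocation of the same size.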
Checkpoint 7.9: Write a 9S12 subroutine that allocates then deallocates three 8-bit locals.
7.4 Stack Frames
Assume the SP is initialized to $4000. By definition, the SP points to the top of the stack. Therefore, all data on the stack exist at addresses between SP and $3FFF, i.e., SP ≤ address ≤ $3FFF. However, sometimes it is convenient to set up a second pointer into the stack, using either Register X or Y, called a stack frame pointer. For example, the stack frame pointer can point to a set of local variables and parameters of the function. It is important in this implementation that once the stack frame pointer is established (e.g., using the tsx instruction), the stack frame register (X) not be modified. The term frame refers to the fact that the pointer value is fixed. If Register X is a fixed pointer to the set of local variables, then a fixed binding (using the equ or set pseudo op) can be established between Register X and the local variables (even if additional information is pushed on the stack). Because the stack frame pointer should not be modified, every subroutine will save the old stack frame pointer of the function that called the subroutine (e.g., pshx at the top) and restore it before returning (e.g., pulx at the bottom). In some cases, the txs instruction can be used to deallocate the local variables. Local variable access uses the indexed addressing mode using Register X.
Observation: One advantage of using a stack frame is that you can push and pull within the body of the function and still be able to access local variables using their symbolic name.
Observation: One disadvantage of using a stack frame is that a register is dedicated as the frame pointer, and thus, it is unavailable for general use.
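The first observation can be checked with a short C model. The sketch below assumes a 64-byte array standing in for RAM; mem, wr16, rd16, FrameResult, and frame_demo are hypothetical names chosen for illustration. It binds n both ways (2 from SP, and -2 from the frame pointer X), pushes two more bytes inside the body, and reports which binding still finds n.

```c
#include <stdint.h>

#define RAM_SIZE 64
static uint8_t mem[RAM_SIZE];        /* hypothetical stand-in for RAM */

static void wr16(int addr, uint16_t v) {
    mem[addr]     = (uint8_t)(v >> 8);
    mem[addr + 1] = (uint8_t)(v & 0xFF);
}
static uint16_t rd16(int addr) {
    return (uint16_t)((mem[addr] << 8) | mem[addr + 1]);
}

typedef struct {
    int sp_binding_before_push;      /* n,SP finds n right after allocation  */
    int sp_binding_after_push;       /* n,SP still finds n after a push?     */
    int frame_binding_after_push;    /* n,X still finds n after a push?      */
    int stack_balanced;              /* SP back at its starting value?       */
} FrameResult;

FrameResult frame_demo(void) {
    FrameResult r;
    int sp = RAM_SIZE;               /* model stack pointer                  */
    int x;                           /* model frame pointer (Register X)     */
    const int n_from_x  = -2;        /* n set -2, relative to X              */
    const int n_from_sp = 2;         /* n set 2, relative to SP              */

    sp -= 2;                         /* pshx: save old X (value not modeled) */
    x = sp;                          /* tsx:  establish frame pointer        */
    sp -= 4;                         /* leas -4,sp: allocate sum and n       */
    wr16(x - 4, 0);                  /* sum = 0                              */
    wr16(x + n_from_x, 100);         /* n = 100                              */
    r.sp_binding_before_push = (rd16(sp + n_from_sp) == 100);

    sp -= 2;                         /* a push inside the function body      */
    wr16(sp, 0x1234);
    r.sp_binding_after_push    = (rd16(sp + n_from_sp) == 100);
    r.frame_binding_after_push = (rd16(x + n_from_x) == 100);

    sp += 2;                         /* matching pull                        */
    sp = x;                          /* txs:  deallocate locals              */
    sp += 2;                         /* pulx: restore old X                  */
    r.stack_balanced = (sp == RAM_SIZE);
    return r;
}
```

In this model the SP-relative offset lands on sum once the extra push has moved SP, while the X-relative offset is unaffected, which is the trade that costs one dedicated register.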
Programs 7.2, 7.3, and 7.4 all calculate the 16-bit sum of the first 100 numbers. The purpose of these simple programs is to demonstrate various implementations of local variables. In these programs, the result will be returned by value in Register D.
unsigned short calc(void){
  unsigned short sum,n;
  sum = 0;
  for(n=100;n>0;n--){
    sum=sum+n;
  }
  return sum;
}
Program 7.2 A simple function with two local 16-bit variables.
Program 7.3 shows two implementations using regular stack pointer addressing, as drawn in Figure 7.3 (left). The implementation on the left of Program 7.3 has no binding and is difficult to understand. In this version, the variable n is accessed using 2,SP addressing mode. The version on the right has exactly the same machine code as the left (same size and execution speed), but is easier to understand because the local variables are referred to by their symbolic names. Figure 7.3 Local variables on the stack, accessed with indexed addressing modes.
; *****binding phase***************
sum  set  0          ;16-bit number
n    set  2          ;16-bit number
; *******allocation phase *********
calc leas -4,sp      ;allocate n,sum
; ********access phase ************
     movw #0,sum,sp  ;sum=0
     movw #100,n,sp  ;n=100
loop ldd  n,sp       ;RegD=n
     addd sum,sp     ;RegD=sum+n
     std  sum,sp     ;sum=sum+n
     ldd  n,sp       ;n=n-1
     subd #1
     std  n,sp
     bne  loop
     ldd  sum,sp     ;RegD=sum (RegD holds n=0 when the loop exits)
; ********deallocation phase *****
     leas 4,sp       ;deallocation
     rts             ;RegD=sum
Program 7.3 Stack pointer implementation of a function with two local 16-bit variables. The program on the left is a poor style without binding, and the one on the right is a good style with binding.
Program 7.4 shows two implementations using stack frame pointer addressing. The one on the left has no binding and is difficult to understand. The one on the right has exactly the same machine code but is easier to understand. The program establishes the frame pointer, then allocates the variables. In Program 7.4, the variable n is accessed using -2,X addressing mode, as shown in Figure 7.3 (right). Notice in both cases of Figure 7.3 that valid data on the stack exists in memory at addresses greater than or equal to the stack pointer. In particular, one does not allocate/deallocate stack space by changing Registers X or Y. That is, decrementing SP allocates space, and incrementing SP deallocates space.
; *****binding phase***************
sum  set  -4         ;16-bit number
n    set  -2         ;16-bit number
; *******allocation phase *********
calc pshx            ;save old Reg X
     tsx             ;stack frame pointer
     leas -4,sp      ;allocate n,sum
; ********access phase ************
     movw #0,sum,x   ;sum=0
     movw #100,n,x   ;n=100
loop ldd  n,x        ;RegD=n
     addd sum,x      ;RegD=sum+n
     std  sum,x      ;sum=sum+n
     ldd  n,x        ;n=n-1
     subd #1
     std  n,x
     bne  loop
     ldd  sum,x      ;RegD=sum (RegD holds n=0 when the loop exits)
; ********deallocation phase *****
     txs             ;deallocation
     pulx            ;restore old X
     rts
Program 7.4 Stack frame pointer implementation of a function with two local 16-bit variables. The program on the left is a poor style without binding, and the one on the right is a good style with binding.
Example 7.1. Write an assembly subroutine with three 8-bit and one 16-bit local variables allocated on the stack. Name the variables cnt, n, flag, and pt.
Solution There are two general approaches for creating local variables on the stack. Stack pointer addressing is faster, but stack frame addressing is more flexible, allowing for additional stack pushes within the body of the subroutine. The solutions in Program 7.5 begin by allocating five bytes of storage. When using SP addressing, we simply decrement the stack pointer by 5. When using stack frame pointer addressing, we save the frame pointer, copy the SP into the frame pointer, and then decrement the stack pointer by 5. We then draw a picture of the stack at this point, and assign the four variables into the five bytes of storage, as shown in Figure 7.4. There is no particular advantage of one assignment over another, as long as the four variables exist contiguously. We label the addressing mode to be used to access each variable, and use these numbers to assign the bindings in our software.
; *****binding phase***************
cnt  set  0          ;8-bit number
n    set  1          ;8-bit number
flag set  2          ;8-bit number
pt   set  3          ;16-bit number
; *******allocation phase *********
func leas -5,sp      ;allocate cnt,n,flag,pt

; *****binding phase***************
cnt  set  -5         ;8-bit number
n    set  -4         ;8-bit number
flag set  -3         ;8-bit number
pt   set  -2         ;16-bit number
; *******allocation phase *********
func pshx            ;save old Reg X
     tsx             ;stack frame pointer
     leas -5,sp      ;allocate cnt,n,flag,pt
; ********access phase ************
; ********deallocation phase *****
     txs             ;deallocation
     pulx            ;restore old X
     rts
Program 7.5 Three 8-bit and one 16-bit local variables on the stack. The program on the left uses stack pointer addressing, and the one on the right uses a stack frame pointer.
Figure 7.4 Three 8-bit and one 16-bit local variables on the stack. [Stack pointer addressing: SP points to cnt (0,SP), followed by n (1,SP), flag (2,SP), pt (3,SP), and the return address. Stack frame pointer addressing: SP points to cnt (-5,X), followed by n (-4,X), flag (-3,X), pt (-2,X); X points to the old Reg X, which is followed by the return address. Each entry is 8 bits wide.]
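The numbers used for the bindings follow mechanically from the sizes of the variables, so they can be checked with a few lines of C. The helper names bind_sp and bind_frame are hypothetical, and the sketch assumes the variables are listed from the lowest address upward (cnt, n, flag, pt) with the frame pointer sitting immediately above the locals, as in Figure 7.4.

```c
#include <stddef.h>

/* Stack pointer addressing: the first variable sits at 0,SP and each
   following variable starts where the previous one ends. */
void bind_sp(const int size[], size_t count, int offset[]) {
    int next = 0;
    for (size_t i = 0; i < count; i++) {
        offset[i] = next;
        next += size[i];
    }
}

/* Stack frame addressing: X points just above the locals, so the first
   variable is at -(total size),X and offsets climb toward zero. */
void bind_frame(const int size[], size_t count, int offset[]) {
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += size[i];
    }
    int next = -total;
    for (size_t i = 0; i < count; i++) {
        offset[i] = next;
        next += size[i];
    }
}
```

With sizes {1, 1, 1, 2} these helpers produce 0, 1, 2, 3 for SP addressing and -5, -4, -3, -2 for frame addressing, matching the set pseudo-ops in Program 7.5.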
7.5 Parameter Passing Using Registers, Stack, and Global Variables
Up to this point in the book, we used registers to pass data into and out of subroutines. The input parameters (or arguments) are pieces of data passed from the calling routine into the subroutine during execution. The output parameter (or argument) is information returned from the subroutine back to the calling routine after the subroutine has completed its task. As previously defined in Chapter 6, there are two methods to pass parameters: call by reference and call by value. With call by reference, a pointer to the object is passed. In this way, the subroutine and the module that calls the subroutine have access to the exact same object. Call by reference can be used to pass a large quantity of data and can be used to implement a parameter that is both an input and an output parameter. With call by value, a copy of the data itself is passed. Using the stack to pass parameters provides much greater flexibility than is possible with the registers alone.
7.5.1 Parameter Passing in C
The call-by-reference method passes a pointer to the object. In other words, references (pointers) to the actual arguments are passed, instead of copies of the actual arguments themselves. In this scheme, assignment statements have implied side effects on the actual arguments; that is, variables passed to a function are affected by changes to the formal arguments. Sometimes side effects are beneficial, and sometimes they are not. As an example, consider a stepper motor program shown in Program 7.6. Both assembly and C versions are shown. With call-by-reference parameter passing, there is one copy of the information, and the calling program (e.g., main) passes an address (RegX in the assembly version) to the function. The read and write accesses to the parameter affect the original variable.
;RegX points to the angle
next  inc  0,x       ;(*pt)++
      ldaa 0,x       ;RegA=(*pt)
      cmpa #200
      bne  skip
      clr  0,x       ;(*pt) = 0
skip  rts
angle set  0         ;0 to 199
main  lds  #$4000
      clr  1,-SP     ;angle=0
      jsr  Stepper_Init
loop  jsr  Stepper_Step
      leax angle,sp  ;RegX=&angle
      bsr  next
      bra  loop
Program 7.6 An input/output parameter is implemented using call by reference.
Since C supports only one formal output parameter, we can implement additional output parameters using call by reference. The calling program passes pointers to empty objects (RegX and RegY in the assembly version), and the where function fills the objects with data. Program 7.7 shows a function that returns two parameters using call by reference. Assume global variables Xx and Yy are private to the where function and contain the true current position.
Xx    rmb  2         ; private to where
Yy    rmb  2
where movw Xx,0,X    ;RegX = xpt
      movw Yy,0,Y    ;RegY = ypt
      rts
myX   set  0         ;16-bit
myY   set  2
func  leas -4,sp     ;allocate
      leax myX,sp    ;RegX=&myX
      leay myY,sp    ;RegY=&myY
      bsr  where
;do something based on myX,myY
      leas 4,sp      ;deallocate
      rts

short Xx,Yy; /* position */
void where(short *xpt, short *ypt){
  (*xpt) = Xx; // return Xx
  (*ypt) = Yy; // return Yy
}
void func(void){
  short myX,myY;
  where(&myX,&myY);
  // do something based on myX,myY
}
Program 7.7 Multiple output parameters implemented using call by reference.
When we use the call-by-value scheme, the values (not references) are passed to functions. With call by value, copies are made of the parameters. Within a called function, references to formal arguments access the copied values, instead of the original objects from which they were taken. At the time when the computer is executing within next, as shown in Program 7.8, there will be two separate and distinct copies of the angle data. An important point to remember about passing arguments by value in C is that there is no connection between an actual argument and its source. Changes to the arguments made within a function have no effect whatsoever on the objects that might have supplied their values. They can be changed and the original values will not be affected. This removes a burden of concern from the programmer, since he may use arguments as local variables without side effects. It also avoids the need to define temporary variables just to prevent side effects. It is precisely because C uses call by value that we can pass expressions, not just variables, as arguments. The value of an expression can be copied, but it cannot be referenced since it has no existence in memory. Therefore, call by value adds important generality to the language. Since expressions may include assignment, increment, and decrement operators, it is possible for argument expressions to affect the values of arguments lying to their right. Consider, for example, func(y=x+1, 2*y);
;Input: RegA is theAngle
;Output:RegA is theAngle
next  inca           ;theAngle++
      cmpa #200
      bne  skip
      clra           ;theAngle=0
skip  rts
angle set  0         ;0 to 199
main  lds  #$4000
      clr  1,-SP     ;angle=0
      jsr  Stepper_Init
loop  jsr  Stepper_Step
      ldaa angle,sp  ;copy
      bsr  next
      staa angle,sp
      bra  loop
Program 7.8 Parameters are implemented using call by value.
where the first argument has the value x+1 and the second argument is intended to have the value 2*(x+1). The value actually passed as the second argument depends on whether the arguments are evaluated right-to-left or left-to-right. This kind of situation should be avoided, since the C language does not guarantee the order of argument evaluation. The safe way to write this is y=x+1; func(y, 2*y);
The value of the expression is calculated at the time of the call, and that value is passed into the subroutine. Checkpoint 7.10: What is the difference between call by value and call by reference?
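The two schemes can be placed side by side in C. The functions below are a sketch modeled on the assembly in Programs 7.6 and 7.8; the names next_by_value and next_by_reference are assumptions made here for illustration and are not taken from the book's listings.

```c
#include <stdint.h>

/* Call by value: the function works on its own copy of the angle and
   hands the new value back as the return value. */
uint8_t next_by_value(uint8_t theAngle) {
    theAngle++;
    if (theAngle == 200) {
        theAngle = 0;
    }
    return theAngle;
}

/* Call by reference: the function receives the address of the caller's
   variable and modifies the original in place. */
void next_by_reference(uint8_t *pt) {
    (*pt)++;
    if (*pt == 200) {
        *pt = 0;
    }
}
```

With angle equal to 199, calling next_by_value(angle) returns 0 but leaves angle at 199 until the caller stores the result, whereas next_by_reference(&angle) changes angle itself to 0.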
7.5.2 Parameter Passing in Assembly Language
Program 7.9 Multiple return parameters implemented with registers.
In contrast to C, it is easy to return multiple parameters in assembly language. If just a few parameters need to be returned we can use the registers. In Program 7.9, the values of ports A, B, T, and M are to be returned. Notice that it packs two 8-bit parameters into the 16-bit Register X.
; Reg A = Port A, Reg B= Port B
; Reg X = Ports T and M
GetPorts ldaa PTT
         ldab PTM
         xgdx
         ldaa PORTA
         ldab PORTB
         rts
********calling sequence******
         jsr  GetPorts
* Reg A,B,X have four results
         staa first
         stab second
         xgdx
         staa third
         stab fourth
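The packing step in Program 7.9 has a direct C counterpart. As a minimal sketch (the names pack16, high8, and low8 are hypothetical), Register A corresponds to the most significant byte and Register B to the least significant byte of the 16-bit value exchanged by xgdx:

```c
#include <stdint.h>

/* Combine two 8-bit values into one 16-bit value, high byte first,
   the way A:B form Register D before the xgdx exchange. */
uint16_t pack16(uint8_t high, uint8_t low) {
    return (uint16_t)(((uint16_t)high << 8) | low);
}

/* Recover the two 8-bit values from the 16-bit value. */
uint8_t high8(uint16_t packed) { return (uint8_t)(packed >> 8); }
uint8_t low8(uint16_t packed)  { return (uint8_t)(packed & 0xFF); }
```

In the calling sequence of Program 7.9, third corresponds to high8 of Register X (the Port T value) and fourth to low8 (the Port M value).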
If many parameters are needed, then the stack can be used. Program 7.10 also returns the values of ports A, B, T, and M. Space for the output parameters is allocated by the calling routine, and GetPorts stores the results into those stack locations.
Program 7.10 Multiple return parameters passed on the stack.
dataA    set  2
dataB    set  3
dataT    set  4
dataM    set  5
GetPorts movb PORTA,dataA,sp
         movb PORTB,dataB,sp
         movb PTT,dataT,sp
         movb PTM,dataM,sp
         rts
********calling sequence******
         leas -4,sp  ;allocate
         jsr  GetPorts
         pula        ;first
         staa first
         pula        ;second
         staa second
         pula        ;third
         staa third
         pula        ;fourth
         staa fourth
An input parameter is information passed from the calling program into the subroutine before the subroutine is executed. An output parameter is information passed out of the subroutine back to the calling program after the subroutine is executed. A parameter can be both an input and an output. The purpose of the next set of examples is to illustrate parameter passing. The subroutine Add8 calculates M = M + N and sets the flag P if there is an unsigned overflow. M is a 16-bit input/output parameter, N is an 8-bit input parameter, and P is a 1-bit output parameter. The simplest and fastest method to pass parameters uses registers. In this method, the information is contained in the registers. Because concurrent programs have “separate” registers and stack areas, the subroutine is reentrant. Program 7.11 shows the addition module. Reentrancy will be discussed in Chapter 12.
Program 7.11 Addition function that passes parameters call by value in registers.
; Subroutine Calling Sequence
;   place information in A,X
;   bsr Add8
;   use information in CC,X
; Subroutine Definition
;   N is an input parameter, an unsigned 8-bit byte, passed in Reg A
;   M is an input/output, a 16-bit number, passed/returned in Reg X
;   P is an output parameter, a Boolean flag,
;     returned in Reg CC carry bit
Add8 psha            ;Put N on the stack
     xgdx            ;Place M in Reg D
     addb 1,SP+      ;Add N to the LSByte of M
     adca #0         ;Reg D=M+N, CC(carry bit) = P
     xgdx            ;Return result in Reg X
     rts
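For comparison, the same arithmetic can be written in C. C has no direct access to the carry bit, so this sketch returns the flag in a small structure; the type Add8Result and the function add8 are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t m;   /* M after the addition (the value returned in Reg X) */
    bool p;       /* true if the 16-bit unsigned sum overflowed (carry) */
} Add8Result;

/* Computes M = M + N, where M is 16 bits and N is 8 bits, both unsigned. */
Add8Result add8(uint16_t m, uint8_t n) {
    Add8Result r;
    uint32_t sum = (uint32_t)m + n;   /* wide enough to hold the carry */
    r.m = (uint16_t)sum;
    r.p = (sum > 0xFFFFu);
    return r;
}
```

Because all of its data lives in parameters and the return value, this version shares the reentrancy property of the register-based assembly routine.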
A simple but completely inappropriate method is to pass parameters using global variables. In this method, the information is contained in global memory variables. Because of the writes to global memory M and P, the subroutine, shown in Program 7.12, is not reentrant. Many embedded systems use this approach because the processor has limited or no facilities for handling data on the stack.
Program 7.12 Addition function that passes call-by-value parameters in global variables.
; These three variables can be anywhere in RAM memory
N    rmb  1   ;N is an input parameter, an unsigned 8-bit number
M    rmb  2   ;M is an input/output parameter, 16 bits
P    rmb  1   ;P is an output parameter, a Boolean flag,
              ; 0 means no overflow, -1 means overflow
; Subroutine Calling Sequence
;   place information in N,M
;   bsr Add8
;   use information in M,P
; Subroutine Definition
Add8 clr  P   ;Assume no overflow, P=0
     ldd  M   ;Place M in Reg D
     addb N   ;Add N to the LSByte of M
     adca #0  ;Reg D=M+N, CC(carry bit) = P
     bcc  POK ;Skip if P should remain zero
     com  P   ;Overflow, P=-1
POK  std  M   ;Return result in M
     rts
A flexible and elegant method is to pass parameters using the stack. In this method, the information is placed on the system or user stack. As we will see later, most high-level languages generate code that passes the first parameter in a register but use the stack to pass additional parameters. However, most high-level languages have only a single output parameter, which is usually returned in a register. When interrupts are enabled, it is possible to have multiple threads active at the same time. There is still only one processor, so exactly one thread is actually running at a time, but we define concurrent programming as the state where multiple threads are “ready to run” at the same time. The interrupt hardware provides the mechanism to switch from one thread to the next. Because concurrent threads have “separate” registers and stack areas, software that uses the stack will operate properly in a concurrent environment. Conversely, extreme care is required when using global variables (including the I/O ports) in a concurrent environment. The other advantage of using the stack is that memory space is used temporarily, then deallocated. Program 7.13 passes both input and output parameters on the stack. Figure 7.5 shows the stack at the time while the subroutine is being executed.
; Subroutine Calling Sequence
;   des                Make room on the stack for P
;   push M (16 bits) onto the stack
;   push N (8 bits) onto the stack
;   bsr Add8
;   ins                Discard input only parameter, N
;   pop M (16 bits) off the stack
;   pop P (8 bits) off the stack
; Subroutine Definition
;   N is an input parameter, an unsigned 8-bit number, passed on the top of the stack
;   M is an input/output, a 16-bit number, passed/returned on top-1, top-2
;   P is an output parameter, a Boolean flag, returned on top-3
;           Access  Contents
;           0,SP    16-bit return address
N    set  2        ;N,SP    8-bit N
M    set  3        ;M,SP    16-bit M
P    set  5        ;P,SP    8-bit P
Add8 clr  P,SP     ;Assume no overflow, P=0
     ldd  M,SP     ;Place M in Reg D
     addb N,SP     ;Add N to the LSByte of M
     adca #0       ;Reg D=M+N, CC(carry bit) = P
     bcc  POK      ;Skip if P should remain zero
     com  P,SP     ;Overflow, P=-1
POK  std  M,SP     ;Return result in M
     rts           ;Return
Program 7.13 Addition function that passes call-by-value parameters on the stack.
Figure 7.5 Stack diagram showing the parameters as passed in Program 7.13. [SP points to the 16-bit return address (0,SP and 1,SP), followed by N (2,SP), the 16-bit M (3,SP), and P (5,SP). Each entry is 8 bits wide.]
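The offsets in Figure 7.5 can be checked by simulating the calling sequence in C. This is a sketch under stated assumptions: stk and ssp model the stack memory and stack pointer, the return address occupies two bytes whose contents are not modeled, and the names add8_on_stack, call_add8, and stack_balanced are hypothetical.

```c
#include <stdint.h>

#define STK_SIZE 32
static uint8_t stk[STK_SIZE];        /* model stack memory            */
static int ssp = STK_SIZE;           /* model stack pointer           */

enum { N = 2, M = 3, P = 5 };        /* bindings from Program 7.13    */

/* Body of Add8: reads and writes its parameters at N,SP  M,SP  P,SP */
static void add8_on_stack(void) {
    uint32_t m = ((uint32_t)stk[ssp + M] << 8) | stk[ssp + M + 1];
    uint32_t sum = m + stk[ssp + N];
    stk[ssp + P] = (sum > 0xFFFFu) ? 0xFF : 0x00;   /* P=-1 on overflow */
    stk[ssp + M]     = (uint8_t)((sum >> 8) & 0xFF);
    stk[ssp + M + 1] = (uint8_t)(sum & 0xFF);
}

/* Calling sequence: returns the new M and stores P through p */
uint16_t call_add8(uint16_t m, uint8_t n, uint8_t *p) {
    ssp -= 1;                                  /* des: room for P          */
    ssp -= 2;                                  /* push M (16 bits)         */
    stk[ssp] = (uint8_t)(m >> 8);
    stk[ssp + 1] = (uint8_t)(m & 0xFF);
    ssp -= 1;                                  /* push N (8 bits)          */
    stk[ssp] = n;
    ssp -= 2;                                  /* bsr pushes return addr   */
    add8_on_stack();
    ssp += 2;                                  /* rts pops return address  */
    ssp += 1;                                  /* ins: discard N           */
    m = (uint16_t)((stk[ssp] << 8) | stk[ssp + 1]);
    ssp += 2;                                  /* pop M                    */
    *p = stk[ssp];
    ssp += 1;                                  /* pop P                    */
    return m;
}

/* True when every push has been matched by a pop */
int stack_balanced(void) { return ssp == STK_SIZE; }
```

Each call allocates and releases exactly six bytes (one for P, two for M, one for N, two for the return address), so the model stack is balanced after every call.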
7.5.3 C Compiler Implementation of Local and Global Variables
One of the most important applications of learning assembly language involves analyzing assembly listings when programming in a high-level language. When one programs in a high-level language, there are many design decisions to be made affecting accuracy (e.g., overflow, dropout), reliability (e.g., buffer overflow, critical section, race condition), speed, and code size. Often, these decisions can be best understood at the assembly language level. In fact, one cannot tell if a section of high-level language code is critical without looking at the associated assembly language generated by the compiler. For another example, assume you are designing a finite-state machine in C. You could implement the FSM using a linked data structure like Program 6.22 or with a table like Program 6.23. If you compiled them both and observed the generated listing files, you could determine which version runs faster. Sometimes we have a high-level language program that we know doesn’t work, but we just can’t seem to find the bug. Often it is easier to visualize bugs by looking at the assembly listing in and around the bugged code. Another application of observing the assembly listing generated by the compiler involves proving program correctness. For example, we might ask if the following C code causes an overflow error (assuming both In and Out are 8-bit unsigned char). Out = (99*In)/100;
There are two ways to determine if overflow could occur. First, we could exhaustively test the software giving all possible inputs and verifying the correct output for each test case.
Second, knowing the architecture and assembly language of the machine, we could look at the compiler listing and prove that overflow cannot occur. The following assembly code was generated by the Metrowerks Codewarrior V4.6 compiler. Because the multiplication is 8 by 8 into 16 bits, the division is 16 by 16 into 16 bits, and Out will never be greater than In, this software cannot overflow. Furthermore, we see this code takes exactly 23 cycles to execute.
0006 c663    [1]   LDAB  #99
0008 b60000  [3]   LDAA  In
000b 12      [1]   MUL
000c ce0064  [2]   LDX   #100
000f 1815    [12]  IDIVS
0011 b751    [1]   TFR   X,B
0013 7b0000  [3]   STAB  Out
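The first approach, exhaustive testing, is practical here because there are only 256 possible inputs. The following C sketch is one way to do it; the function names scale99 and count_failures are hypothetical. It checks that the product fits in a signed 16-bit value (IDIVS is a signed divide), that the quotient fits in 8 bits, and that Out never exceeds In.

```c
#include <stdint.h>

/* The calculation under test: Out = (99*In)/100 with 8-bit In and Out */
uint8_t scale99(uint8_t in) {
    return (uint8_t)((99u * in) / 100u);
}

/* Tries all 256 inputs and counts those that would overflow:
   product too big for a signed 16-bit divide, quotient too big for
   8 bits, or a result larger than the input. */
int count_failures(void) {
    int failures = 0;
    for (unsigned in = 0; in < 256; in++) {
        uint32_t product = 99u * in;         /* 8 by 8 multiply        */
        uint32_t out = product / 100u;       /* 16 by 16 divide        */
        if (product > 0x7FFFu || out > 0xFFu || out > in) {
            failures++;
        }
    }
    return failures;
}
```

The largest product is 99*255 = 25245, comfortably below 32767, so count_failures() is expected to return 0, which agrees with the argument made from the listing.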
The specific goal of this section is to study how compilers implement local variables and pass parameters. However, in the big picture, we can improve our understanding of both the machine architecture and our high-level language programs by looking at the assembly code generated by the compiler. Program 7.14 shows a simple C program with a global variable G, two local variables both called z, and function parameters m and n. All three compilers analyzed in this section will pass one parameter in Register D and push the other parameter on the stack. If there were additional parameters, they too would have been pushed on the stack by the calling routine. Furthermore, all three compilers will push the one parameter initially passed in Register D onto the stack at the beginning of the subroutine. In this way, during the execution of the subroutine sub, the parameters are all on the stack. The first two compilers studied in this section will place the local variables on the stack. The third compiler will generate more efficient code by placing the local variables in registers as needed.
Program 7.14 An example used to illustrate the C compiler’s use of the stack.
short G;            // definition of a global variable
short sub(short n, short m){
  short z;
  z = n-m;
  return(z);
}
int main(void){
  short z;          // definition of a local variable
  G = 5;            // access global variable
  z = 6;            // access local variable
  G = sub(z,1);     // call function, pass parameter
  return(0);
}
Observation: Although the local variables of the main program are on the stack, and it IS possible to access them, the compiler will NOT allow the subroutine to access them. In C, there is a clear distinction between the parameters pushed on the stack that are supposed to be accessed by the subroutine and the local variables of the calling program, which are not supposed to be accessed. Common Error: It would be a grievous programming error to access the local variables of the main program from the subroutine. Therefore, in assembly language, it is essential to make the distinction between local variables and data passed on the stack to the subroutine.
Program 7.15 Assembly code generated for the 6812 by the GCC compiler.
z    set  2
n    set  0
m    set  8
sub  movw $0800,2,-SP ;1)save previous stack frame pointer
     pshx             ;allocate space for n,z
     pshx
     sts  $0800       ;2)establish stack frame pointer
     ldx  $0800
     std  n,X         ;place n on the stack
     ldx  $0800       ;3) use frame to access n
     ldd  n,X         ;RegD=n
     ldx  $0800       ;3) use frame to access m
     subd m,X         ;RegD=n-m
     ldx  $0800       ;3) use frame to access z
     std  z,X         ;z=n-m
     ldx  $0800       ;3) use frame to access z
     ldd  z,X         ;RegD=z
     pulx             ;deallocate n,z
     pulx
     movw 2,SP+,$0800 ;4)restore previous stack frame pointer
     rts
z    set  0
main movw $0800,2,-SP ;1)save previous stack frame pointer
     pshx             ;allocate z
     sts  $0800       ;2)establish stack frame pointer
     movw #5,G        ;G=5
     ldx  $0800       ;3) use frame to access z
     movw #6,z,X      ;z=6
     movw #1,2,-SP    ;push second parameter onto stack
     ldx  $0800       ;3) use frame to access z
     ldd  z,X         ;first parameter in RegD
     bsr  sub
     leas 2,SP        ;discard parameter
     std  G           ;G = sub(z,1)
     ldd  #0
     pulx             ;deallocate z
     movw 2,SP+,$0800 ;4)restore previous stack frame pointer
     rts
The first compiler we will study is GCC Release 3.1 for the 6812. The assembly listing, shown as Program 7.15, has been edited to be consistent with the syntax of this book. In particular, the set pseudo-ops were added to help see where information is stored on the stack. The sts instruction establishes a stack frame pointer, at global memory $0800. The use of the stack frame pointer follows the typical pattern: (1) save old frame, (2) establish a new frame, (3) use the frame whenever accessing data on the stack, and (4) restore the previous frame. The pshx instruction allocates local variables. The Register X indexing mode is used to access the data on the stack. The pulx instruction deallocates the local variables. The stack pictures for the three compilers at the time of the subd instruction are drawn in Figure 7.6. Although the local variable of main is on the stack, it will not be (and should not be) accessed by the subroutine.
The next compiler we will study is ImageCraft ICCV7 for the Freescale 6812. Again, the disassembled output has been edited to clarify its operation, and is shown as Program 7.16. The global symbol, G, will be assigned or bound by the linker/loader. The leas instruction allocates and deallocates local variables, and stack pointer addressing is used to access parameters and local variables. This compiler passes the first input parameter into the subroutine by placing it in Register D. The remaining parameters are pushed on the stack by the calling routine.
Figure 7.6 The stack contains local variables, parameters, and the return address. [GCC for the 6812: the global area holds the frame pointer and G; the stack holds n (0,X), z of sub (2,X), old Frame (4,X), return addr (6,X), m (8,X), z of main, and old Frame. ICCV7 for the 6812: the global area holds G; the stack holds z of sub (0,SP), n (2,SP), return addr (4,SP), m (6,SP), and z of main. Metrowerks Codewarrior 4.6: the global area holds G; the stack holds m (0,SP), return addr (2,SP), and n (4,SP). Each entry is 16 bits wide.]
Program 7.16 Assembly code generated for the 6812 by the ICCV7 compiler.
z    set  0
m    set  6
n    set  2
sub  pshd            ;place n on the stack
     leas -2,SP      ;allocate z
     ldd  n,SP       ;RegD = n
     subd m,SP       ;RegD = n-m
     tfr  D,Y
     sty  z,SP       ;z = n-m
     tfr  Y,D
     leas 4,SP       ;deallocate z,n
     rts
z    set  2
main leas -4,SP      ;allocate z,secondParameter
     movw #5,G       ;G=5
     movw #6,z,SP    ;z=6
     ldy  #1
     sty  0,SP       ;put second parameter on stack
     ldd  z,SP       ;first parameter in RegD
     jsr  sub
     tfr  D,X
     std  G          ;G = sub(z,1)
     ldd  #0
     leas 4,SP       ;deallocate z,secondParameter
     rts
The third compiler we will study is Metrowerks Codewarrior 4.6 for the Freescale 9S12. Again, the disassembled output has been edited to clarify its operation (see Program 7.17). This is a highly optimized compiler. The local variable in both main and sub was implemented in a register. For this compiler, the second (or last) parameter is passed in Register D and the remaining parameters are pushed on the stack.
Program 7.17 Assembly code generated for the 9S12 by the Metrowerks Codewarrior compiler.
m    set  0
n    set  4
sub  pshd            ;place m on the stack
     ldd  n,sp       ;RegD = n
     subd m,sp       ;RegD = n-m
     pulx            ;deallocate m
     rts
main ldab #5
     clra
     std  G          ;G=5
     incb            ;RegD=z=6
     pshd            ;put first parameter on stack
     ldab #1         ;second parameter in RegD
     bsr  sub
     leas 2,sp       ;discard parameter
     std  G          ;G = sub(z,1)
     clrb
     clra
     rts
Observation: Notice the difference in code efficiency between a free compiler (GCC), a compiler costing about $250 (ICCV7), and a compiler costing over $3000 (Metrowerks Codewarrior).
7.6 Tutorial 7 Debugging Techniques
The objective of this tutorial is to illustrate some debugging techniques. In particular, we will use TExaS to visualize stack overflow and stack underflow.
Action: Copy the Tutor7.rtf and Tutor7.uc files from the Web onto your hard drive. Start a fresh copy of TExaS and open these files from within TExaS. This should open the corresponding microcomputer window. This program contains an integer square root subroutine, based on Newton’s method. There is a bug in it that causes a stack overflow. The purpose of this main program is to exhaustively test this function by giving it all possible input patterns and manually checking the validity of all outputs. Being able to evaluate a subroutine with a known and repeatable sequence of inputs is called stabilization. Once a system is stabilized (the inputs are fixed and known), changes to the subroutine can be made with confidence that changes in the output are a result of software modification and not due to changes in the input.
Question 7.1 This is a very easy bug to spot, but it represents a typical programming error. By visual inspection of the main program, identify the programming error that causes the stack overflow, but don’t fix it.
Question 7.2 What’s the difference between a breakpoint and a ScanPoint?
Action: Assemble the program. Notice that Input and Output parameters with unsigned 8-bit decimal format are in the ViewBox. A breakpoint has been added at the location in the main program labeled check. You can add breakpoints in two ways. The first way is to left-click the line in the listing file, then right-click executing BreakAtCursor. The second way is to type the address (you should use the symbolic address check rather than its numerical value) into the Break/ScanPoints box and click the add button. You could have used its absolute address, but absolute addresses must be recalculated each time the software is modified.
The double red arrow («) points in the listing file to the breakpoint. Make check a ScanPoint by toggling the Mode->BreakMode command until the check mark is removed. Figure T7.1 shows the resulting configuration.
Figure T7.1 A ScanPoint is added to Tutorial 7.
Action: Run the system until the first ten outputs are calculated, then stop the simulation with a F12. You should see the following results in TheLog.rtf file. These results are correct. Input=0 Input=1 Input=2 Input=3 Input=4 Input=5 Input=6 Input=7 Input=8 Input=9
Question 7.3 Explain how these first ten results are correct. In particular, verify how the output is the square root of the input. Are there any minor errors?
Action: Run the system until TExaS gives the “Write to EEPROM address 0x07FF” error. Hit reset, run it again, and this time observe the memory box in the Stack window. Notice locations $0800 (Input) and $0801 (Output). The rest of the memory ($0802 to $0901) is the stack. In particular, watch in the memory box as the stack overflows.
Question 7.4 Look in TheList.rtf file and identify which instruction caused the error. The cursor arrow (») will point to the instruction after the one that caused the error.
Action: When a stack instruction causes a bug, observing the stack pointer makes sense. Add the SP to the ViewBox, hit reset, and run it again. The last few outputs are shown below
Input=59 Output=8 SP=$0813
Input=60 Output=8 SP=$080F
Input=61 Output=8 SP=$080B
Input=8 Output=8 SP=$0807
Write to EEPROM address 0x07FF.
Question 7.5 Stack errors can cause weird behavior. Why did input change from 61 to 8, when it should have been 62?
Action: Fix the bug (change the second pshx to a pulx), assemble, and run the debugged system.
Action: Sometimes a stack error results in the program branching to a location that is not part of your program. Remove the pshb instruction from the first line of the sqrt subroutine. Assemble the software with this new bug and run the system. This stack underflow will cause an error. You should get a Read from uninitialized RAM address error.
Question 7.6 You won’t be able to find the cursor arrow (») in TheList.rtf file. Add the PC to the ViewBox, hit reset, run the system again, and check the value of the PC at the time of error.
Question 7.7 There are two ways to find this bug. The first way is to execute Action-BackDump. What are the last five instructions to be executed just before the error? Where in the program are these five instructions?
Question 7.8 The second way to visualize the error is to activate Mode-FollowPC. Click this option, reset the computer, and run it again. The rts instruction is highlighted, showing you the last instruction to execute. What does the purple color on the pulb instruction mean?
7.7 Homework Problems
Homework 7.1 What does it mean to say a function is public versus private? Why is this distinction important?
Homework 7.2 What does it mean to say a variable is public versus private? Why is this distinction important?
Homework 7.3 What does it mean to say a variable is local versus global?
Homework 7.4 Write assembly code that finds the average value of a ten-element array. The two parameters are passed by reference on the stack. Local variables must be allocated on the stack.
  void average(unsigned short *pt, unsigned short *ave){
    unsigned short sum,n;
    sum = 0;
    for(n=0;n<10;n++){
      sum = sum+(*pt);
      pt = pt+1;
    }
    (*ave) = sum/10;
  }
A typical calling sequence is
  ldx #mydata ; pointer to 10-element structure
  pshx
  ldx #myave ; pointer to result
  pshx
  jsr average
  pulx ; balance stack
  pulx
Homework 7.5 Write assembly code that calculates the average of three numbers. The three parameters are passed by value on the stack. The return parameter is passed back in Register D. Local variables must be allocated on the stack.
  short average(short data1, short data2, short data3){
    short sum;
    sum = data1+data2+data3;
    return sum/3;
  }
A typical calling sequence is
  ldx var1 ; first parameter
  pshx
  ldx var2 ; second parameter
  pshx
  ldx var3 ; third parameter
  pshx
  jsr average
  std var4 ; var4=(var1+var2+var3)/3
  pulx ; balance stack
  pulx
  pulx
Homework 7.6 Write assembly code that finds the maximum value of a ten-element array. The two parameters are passed by reference on the stack. Local variables must be allocated on the stack.
  void max(unsigned short *pt, unsigned short *result){
    unsigned short n;
    (*result) = 0;
    for(n=0;n<10;n++){
      if((*result)<(*pt)) (*result) = (*pt);
      pt = pt+1;
    }
  }
A typical calling sequence is
  ldx #mydata ; pointer to 10-element structure
  pshx
  ldx #myresult ; pointer to result
  pshx
  jsr max
  pulx ; balance stack
  pulx
Homework 7.7 Write assembly code that implements a median filter. The three parameters are passed by value on the stack. The return parameter is passed back in Register D. Local variables must be allocated on the stack.
  short median(short data1, short data2, short data3){
    short temp;
    if(data1 > data2){
      temp = data1; data1 = data2; data2 = temp; // switch
    }
    if(data1 > data3){
      temp = data1; data1 = data3; data3 = temp; // switch
    } // data1 is now the smallest
    if(data2 < data3) return data2; // return the middle value
    else return data3;
  }
A typical calling sequence is
  ldx var1 ; first parameter
  pshx
  ldx var2 ; second parameter
  pshx
  ldx var3 ; third parameter
  pshx
  jsr median
  std var4 ; var4=median(var1,var2,var3)
  pulx ; balance stack
  pulx
  pulx
Homework 7.8 Write assembly code that converts temperature in Fahrenheit to temperature in Centigrade. The input parameter is passed by value on the stack. The return parameter is passed back in Register D.
  short FtoC(short tempF){
    return (5*(tempF-32))/9;
  }
A typical calling sequence is
  ldx myTempF ; temperature in F
  pshx
  jsr FtoC
  std myTempC ; temperature in C
  pulx ; balance stack
Homework 7.9 Write assembly code that converts temperature in Centigrade to temperature in Fahrenheit. The input parameter is passed by value on the stack. The return parameter is passed back in Register D.
  short CtoF(short tempC){
    return (9*tempC)/5+32;
  }
A typical calling sequence is
  ldx myTempC ; temperature in C
  pshx
  jsr CtoF
  std myTempF ; temperature in F
  pulx ; balance stack
Homework 7.10 Write assembly code that finds the median value of a five-element array. The input parameter is passed by reference on the stack. The return parameter is passed back in Register D. Local variables must be allocated on the stack. A typical calling sequence is
  ldx #mydata ; pointer to 5-element structure
  pshx
  jsr median
  std myMedian
  pulx ; balance stack
Homework 7.11 In this question, choose whether the stack activity is
  MUST: must be performed because not doing so leads to an illegal stack operation
  SHOULD: should be followed because it leads to clearer and easier to debug code
  ILLEGAL: may not be performed because doing so leads to an illegal stack operation
a) Reading RAM memory at an address less than or equal to the SP.
b) Performing all the stack operations which decrement the stack pointer at the beginning of a subroutine and performing all the stack operations that increment the stack pointer at the end.
c) Writing to RAM memory at an address greater than the SP.
d) Using set or equ to specify the stack relative position of local variables.
e) Having the same number of pulls as you have pushes.
Homework 7.12 Consider the concepts of local variables and promotion.
a) List the four stages of a local variable.
b) Write the assembly code for the following C function. Place the corresponding C code into the comment field of the assembly program. Also identify the four stages of the local variables. For each line of C code, show the corresponding assembly, without optimization.
  void fun(void){
    char small;    // creates an 8-bit signed local variable
    short large;   // creates a 16-bit signed local variable
    small = -100;
    large = small; // 8-bit signed is promoted to 16-bit signed
  }
Homework 7.13 Using recursion, write a subroutine that calculates the Fibonacci function. In particular,
  fib(0) = 1
  fib(1) = 1
  fib(n) = fib(n-1) + fib(n-2) for n > 1
The input is passed by value in Register D, and the result is also returned by value in Register D.
Homework 7.14 Write a subroutine called FUZZY which performs the following input/output function. In and Th are inputs and Out is the result. All parameters are 8-bit unsigned integers.
  if(In < Th){
    Out = (255*(Th-In))/Th;
  }
  else{
    Out = 0;
  }
Figure Hw7.12 Fuzzy logic response. [Plot of Out (0 to 255) versus In (0 to 255), with Th marked on the In axis.]
A typical calling sequence shows that the two inputs, In and Th, are passed on the stack. The return parameter, Out, is returned in Reg B.
  ldaa #150  ;value for Th
  psha       ;Th pushed on the stack
  ldaa #90   ;value for In
  psha       ;In on the stack
  jsr FUZZY  ;your function
  ins        ;pop off Th and In
  ins
; Reg B = (255*(150-90))/150 = 102
Homework 7.15 Consider the reasons why one chooses which technique to create a variable.
a) List three reasons why one would implement a variable using a register.
b) List three reasons why one would implement a variable on the stack and access it using Reg X indexed mode addressing.
c) List three reasons why one would implement a variable in RAM and access it using direct-mode addressing.
Homework 7.16 Consider reasons for implementing “call by value” versus “call by reference”.
a) List two reasons for implementing “call by value”.
b) List two reasons for implementing “call by reference”.
Homework 7.17 In this problem, you will implement three unsigned 8-bit local variables on the stack using Reg Y stack frame addressing and symbolic binding. The variables are called front, center, and back. The code in this question is part of a subroutine which ends in rts.
a) Show the assembly code that (in this order) saves Register Y, establishes the Register Y stack frame, and allocates the three 8-bit local variables.
b) Assume the stack pointer is equal to $3F0A just before the jsr instruction that calls this subroutine is executed. Draw a stack picture showing the return address, the three variables, Register Y, and the stack pointer SP.
c) Show the symbolic binding for front, center, and back.
d) Show code that implements center=100; using Reg Y stack frame addressing.
e) Show the assembly code that deallocates the local variables, restores Reg Y, and returns.
Homework 7.18 In this problem, you will implement three unsigned 16-bit local variables on the stack using Reg X stack frame addressing and symbolic binding. The variables are called front, center, and back. The code in this question is part of a subroutine which ends in rts.
a) Show the assembly code that (in this order) saves Register X, establishes the Register X stack frame, and allocates the three 16-bit local variables.
b) Assume the stack pointer is equal to $3F00 just before the jsr instruction that calls this subroutine is executed. Draw a stack picture showing the return address, the three variables, Register X, and the stack pointer SP.
c) Show the symbolic binding for front, center, and back.
d) Show code that implements center=100; using Reg X stack frame addressing.
e) Show the assembly code that deallocates the local variables, restores Reg X, and returns.
Homework 7.19 In this problem, you will implement three unsigned 8-bit local variables on the stack using Reg SP addressing and symbolic binding. The variables are called front, center, and back. The code in this question is part of a subroutine which ends in rts.
a) Show the assembly code that allocates the three 8-bit local variables.
b) Assume the stack pointer is equal to $3F0A just before the jsr instruction that calls this subroutine is executed. Draw a stack picture showing the return address, the three variables, and the stack pointer SP.
c) Show the symbolic binding for front, center, and back.
d) Show code that implements center=100; using Reg SP addressing.
e) Show the assembly code that deallocates the local variables and returns.
Homework 7.20 Write a debugging instrument (a subroutine) that first checks the value of Port T bit 0. If PT0 is 1, then it displays the value of Registers D, X, and Y. If PT0 is 0, the instrument returns without performing any output. You may assume the SCI port is not used for the target system, and you can call any of the routines defined in tut2.rtf. Save and restore any registers that you modify, including the CCR. The subroutine will be added to the original software using an editor, then the combination will be assembled and downloaded to the target.
Homework 7.21 Write a debugging instrument (a subroutine) that displays the value of the PC from which the subroutine was called. You may assume the SCI port is not used for the target system, and you can call any of the routines defined in tut2.rtf. Save and restore any registers that you modify, including the CCR. The jsr to this subroutine will be added to the original software using an editor, then the combination will be assembled and downloaded to the target.
7.8 Laboratory Assignments
Lab 7.1 Bubble Sort
Purpose. This lab has these major objectives:
• To evaluate the static and dynamic efficiency of software
• To learn how to pass subroutine parameters on the stack
  • By value, pushing the value onto the stack
  • By reference, pushing a pointer onto the stack
• To implement local variables on the stack
• To study the Bubble Sort algorithm
Description.
a) Write an assembly subroutine that implements the Bubble Sort algorithm. The input parameters are passed on the stack. Local variables must be allocated on the stack. The buffer size (Count) is 1 to 255.
  void Bubble(char *Buffer, unsigned char Count){
  // Count is the size of the byte array Buffer[i]
    unsigned char i,j; /* Indexes into Buffer */
    char temp;         /* Used for exchange */
    for(i=1;i<Count;i++){
      for(j=Count-1;j>=i;j--){
        if(Buffer[j-1]>Buffer[j]){
          temp = Buffer[j-1]; /* Exchange */
          Buffer[j-1] = Buffer[j];
          Buffer[j] = temp;
        }
      }
    }
  }
A typical calling sequence is
  ldx #mydata ; pointer to 20-byte structure (call by reference)
  pshx
b) Use this simple assembly code to debug your Bubble Sort algorithm.
         org  $0800
  mydata rmb  5
  main   lds  #$4000
         ldaa #$35
         staa mydata   ; initialize
         ldd  #$3433
         std  mydata+1
         ldd  #$3231
         std  mydata+3 ; mydata[]={'5','4','3','2','1'}
         ldx  #mydata  ; pointer to 5-byte structure (call by reference)
         pshx
         ldab #5       ; Count (call by value)
         pshb
         jsr  Bubble
         ins           ; balance stack
         pulx
         stop
c) Write assembly code that tests the Bubble Sort algorithm. Copy and paste the SCI device driver software from tut2.rtf. This main program will input an ASCII string from a SCI-CRT interface (call SCI_InString), calculate its length, call the bubble sort subroutine, and output the sorted string on the SCI-CRT (call SCI_OutString).
d) Add debugging code to the test software in part c) that measures the elapsed execution time for the sort subroutine. Plot the execution time versus buffer size (using worst-case initial data) for buffer sizes 10, 20, 30, and 40 bytes. Fit this data to a quadratic equation to derive a general solution for all sizes.
Lab 7.2 Heap Sort
Purpose. This lab has these major objectives:
• To evaluate the static and dynamic efficiency of software
• To learn how to pass subroutine parameters on the stack
  • By value, pushing the value onto the stack
  • By reference, pushing a pointer onto the stack
• To implement local variables on the stack
• To study the Heap Sort algorithm
Description.
a) Write assembly code that implements the Heap Sort algorithm. The input parameters are passed on the stack. Local variables must be allocated on the stack. The buffer size (Count) is 1 to 255.
  void HeapSort(char *Buffer, unsigned char Count){
  // Count is the size of the byte array Buffer[i]
    unsigned char i,j; // used when sifting
    unsigned char ir;
    unsigned char m;   // used in the hiring phase
    char z;            // temporary, used to sort
    m = (Count>>1)+1;  // initial value Count/2+1
    ir = Count;
    for(;;){
      if(m > 1){
        --m;
        z = Buffer[m];          // still hiring
      }
      else{                     // in retirement and promotion
        z = Buffer[ir];         // clear space at end
        Buffer[ir] = Buffer[1]; // Retire top of heap into it
        if(--ir == 1){          // Done with last promotion?
          Buffer[1] = z;        // least competent worker of all
          break;
        }
      }
      i = m;           // whether in the hiring or promotion phase
      j = m+m;         // we set up to sift down element z to
      while(j <= ir ){ // its proper level
        if((j
c) Write assembly code that tests the Heap Sort algorithm. Copy and paste the SCI device driver software from tut2.rtf. This main program will input an ASCII string from a SCI-CRT interface (call SCI_InString), calculate its length, call the heap sort subroutine, and output the sorted string on the SCI-CRT (call SCI_OutString).
d) Add debugging code to the test software in part c) that measures the elapsed execution time for the sort subroutine. Plot the execution time versus buffer size (using worst-case initial data) for buffer sizes 10, 20, 30, and 40 bytes. Fit this data to a quadratic equation to derive a general solution for all sizes.
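As a cross-check for the timing measurements in part d) of Lab 7.1, it can help to know how many comparisons Bubble Sort performs. The following is a minimal C sketch meant to run on a desktop computer, not on the 9S12. BubbleCount and FillWorstCase are hypothetical helper names, and the loop bounds (i<Count and j>=i) are an assumption taken from the usual form of this algorithm.

```c
#include <assert.h>

/* Hypothetical instrumented copy of the Bubble routine from Lab 7.1.
   Sorts Buffer in ascending order and returns the number of comparisons. */
unsigned long BubbleCount(char *Buffer, unsigned char Count){
  unsigned char i, j;            /* indexes into Buffer */
  char temp;                     /* used for exchange */
  unsigned long compares = 0;
  assert(Count >= 1);
  for(i = 1; i < Count; i++){
    for(j = Count-1; j >= i; j--){
      compares++;                /* one comparison per inner-loop pass */
      if(Buffer[j-1] > Buffer[j]){
        temp = Buffer[j-1];      /* exchange */
        Buffer[j-1] = Buffer[j];
        Buffer[j] = temp;
      }
    }
  }
  return compares;
}

/* Hypothetical helper: descending data, the worst case for Bubble Sort.
   Limited to 127 elements so the values fit in a signed char. */
void FillWorstCase(char *Buffer, unsigned char Count){
  unsigned char n;
  assert(Count <= 127);
  for(n = 0; n < Count; n++){
    Buffer[n] = (char)(Count - n);
  }
}
```

With these loop bounds the comparison count depends only on Count and equals Count*(Count-1)/2, which is why a quadratic fit is a reasonable model for the measured execution time.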
8 Serial and Parallel Port Interfacing
Chapter 8 objectives are to:
• Describe the SCI and SPI serial ports
• Discuss how to interface a keyboard using scanning
• Interface a liquid crystal display (LCD) and design a device driver for it
• Explain and interface electromechanical devices that are binary in nature
• Use pulse-width modulation (PWM) to control power delivered to a DC motor
• Interface and control a stepper motor
The common theme of this chapter is I/O interfacing connected to serial and parallel ports. The chapter begins with a discussion of SCI and SPI ports, which can be used to interface external devices to the 9S12 such as GPS, DAC, and ADC devices. Various I/O devices such as keyboards, optical sensors, LCDs, relays, solenoids, DC motors, and stepper motors will be interfaced. In addition, the pulse-width modulation feature of the 9S12 will be used when interfacing DC motors so that the software can control the power delivered to the motor. Advances in the number and sophistication of the I/O ports have contributed greatly to the long-term growth of applications of embedded systems. This book covers just some of the ports for the 9S12 microcomputers. For a complete list of I/O ports, refer to the respective data sheets.
8.1 General Introduction to Interfacing
There are three components to microcomputer interfacing. Since many external devices have physical characteristics, the first step is the mechanical design of the physical components. Often, the mechanical design is simply selecting the physical devices from a list of available components. The next step is the analog and digital electronics used to connect the physical devices to the computer. The voltage levels of the external device must be translated into values compatible with the microcontroller. The RS232 interface using the MAX232 chip in Figure 8.2 is a typical example of this translation. Some external devices need the interface to source or sink current, and the interfaces in Figures 8.16, 8.21, and 8.28 can be used for these applications. The input/output information may be encoded as simple digital signals or variable analog signals. Interfacing with analog signals will be presented in Chapter 11. More complex systems may use frequency, period, phase, or pulse width to represent the signals. Interfacing with time-based signals will be presented in Chapter 9. The third component of interfacing is the low-level software that transforms the mechanical and electrical devices into objects that perform the desired tasks. The group of these low-level functions is often designated as an I/O device driver. Since this book serves as an introduction to interfacing, most of the hardware circuits are given, and the software design is explained.
The 9S12 is built with CMOS logic. Interfacing with CMOS logic involves consideration of voltage, current, and capacitance. First, let’s consider a digital output, e.g., a port pin with its direction register equal to 1. IOH is the largest current a port pin can source when the output is high. VOH is the smallest voltage a port pin will be if the output is high and the current is less than IOH. If the 9S12 is powered by VDD, then the output high voltage will be between VOH and VDD. IOL is the largest current a port pin can sink when the output is low. VOL is the largest voltage a port pin will be if the output is low and the current is less than IOL. The output low voltage will be between 0 and VOL.
Next, let’s consider a digital input, e.g., a port pin with its direction register equal to 0. IIH is the current the input port pin will require when its input is high. VIH is the voltage above which the input will be considered high. IIL is the current the input port pin will require when its input is low. VIL is the voltage below which the input will be considered low. Table 8.1 shows current parameters for various digital logic families, and Table 8.2 shows voltage parameters. In summary, if the input is between 0 and VIL, it is considered a low. If the input is between VIH and VDD, it is considered a high.
Refer back to the transistor-level implementation in Figure 3.5. If the voltage on an input pin remains between VIL and VIH for a long time, both the p-type and n-type transistors will be active, causing a short circuit from +5 V to ground. With CMOS microcomputers one must either define unused I/O pins as outputs, or specify them as inputs and tie the pin high (or low) in hardware. Because of the extremely high impedance of CMOS inputs, an unconnected input pin may oscillate, dissipating power unnecessarily.
In order for the output to properly drive all the inputs of the next stage, the maximum available output current must be larger than the sum of all the required input currents for both the high and low conditions.
$$|I_{OL}| \ge \sum |I_{IL}| \quad\text{and}\quad |I_{OH}| \ge \sum |I_{IH}|$$
In order for the digital information to be properly transferred from the output of one module to the input of the next, we need the output high voltage to be more than the required input high voltage and the output low voltage to be less than the input low voltage.
$$V_{OH} > V_{IH} \quad\text{and}\quad V_{OL} < V_{IL}$$
Table 8.1 The input and output currents of various digital logic families and microcomputers.

| Family | Example | IOH | IOL | IIH | IIL |
|---|---|---|---|---|---|
| Standard TTL (5 V supply) | 7404 | 0.4 mA | 16 mA | 40 μA | 1.6 mA |
| Schottky TTL | 74S04 | 1 mA | 20 mA | 50 μA | 2 mA |
| Low Power Schottky TTL | 74LS04 | 0.4 mA | 8 mA | 20 μA | 0.4 mA |
| High speed CMOS | 74HC04 | 4 mA | 4 mA | 1 μA | 1 μA |
| Freescale microcomputer | MC9S12 | 10 mA | 10 mA | 1 μA | 1 μA |
Table 8.2 The input and output voltages of various digital logic families and microcomputers.

| Family | Example | VOH | VOL | VIH | VIL |
|---|---|---|---|---|---|
| Standard TTL (5 V supply) | 7404 | 2.4 V | 0.4 V | 2 V | 0.8 V |
| Schottky TTL | 74S04 | 2.7 V | 0.5 V | 2 V | 0.8 V |
| Low Power Schottky TTL | 74LS04 | 2.7 V | 0.5 V | 2 V | 0.8 V |
| High speed CMOS | 74HC04 | 4.44 V | 0.5 V | 3.5 V | 1.5 V |
| Freescale microcomputer | MC9S12 | VDD-0.8 | 0.8 V | 0.65*VDD | 0.35*VDD |
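The current and voltage rules above can be checked numerically. The following is a minimal C sketch, assuming that every driven input belongs to the same family and that only magnitudes matter. The LogicFamily type and the FanOut and LevelsCompatible functions are hypothetical names introduced for illustration; the numbers are copied from Tables 8.1 and 8.2.

```c
#include <assert.h>

/* Electrical parameters of one logic family, magnitudes only.
   Currents are in microamps and voltages are in millivolts. */
typedef struct {
  long ioh_uA, iol_uA, iih_uA, iil_uA;
  long voh_mV, vol_mV, vih_mV, vil_mV;
} LogicFamily;

/* Values copied from Tables 8.1 and 8.2 */
const LogicFamily TTL_7404  = { 400, 16000, 40, 1600, 2400, 400, 2000,  800};
const LogicFamily LS_74LS04 = { 400,  8000, 20,  400, 2700, 500, 2000,  800};
const LogicFamily HC_74HC04 = {4000,  4000,  1,    1, 4440, 500, 3500, 1500};

/* Largest number of inputs of family 'in' that one output of family
   'out' can drive, from |IOH| >= sum|IIH| and |IOL| >= sum|IIL| */
long FanOut(const LogicFamily *out, const LogicFamily *in){
  long high, low;
  assert(in->iih_uA > 0 && in->iil_uA > 0);
  high = out->ioh_uA / in->iih_uA;   /* limit when the output is high */
  low  = out->iol_uA / in->iil_uA;   /* limit when the output is low */
  return (high < low) ? high : low;
}

/* Returns 1 if VOH > VIH and VOL < VIL, otherwise 0 */
int LevelsCompatible(const LogicFamily *out, const LogicFamily *in){
  return (out->voh_mV > in->vih_mV) && (out->vol_mV < in->vil_mV);
}
```

For example, FanOut(&TTL_7404, &TTL_7404) evaluates to 10, and LevelsCompatible(&TTL_7404, &HC_74HC04) evaluates to 0 because the 2.4 V TTL output high is below the 3.5 V input high required by the 74HC04.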
The last consideration for interfacing with CMOS logic is capacitance. Capacitance loading occurs with each input, and with long cables. A 9S12 input port pin has a capacitive load of 6 pF. Consider a situation where the output of one circuit is attached to the input of another. If the output goes from 0 to 5 V, the voltage as perceived at the input of the next stage will be
$$V(t) = 5 - 5e^{-t/RC}$$
where R is the resistance in the circuit, and C is the capacitive load. R*C is called the time constant, τ. If the time constant is very small, the input goes from 0 to 5 V almost immediately after the output goes from 0 to 5 V. If the signal is a square wave with period T, the interface will only work for situations where the period T is large compared to the time constant τ.
I/O ports are the specific components of a microcomputer that allow it to interact with its environment. A device driver is a collection of software functions that allow higher level software to utilize an I/O device. In other words, the set of low-level functions that input/output directly with the hardware are grouped together in a single module and called a device driver.
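As a worked example of the time constant, suppose the circuit resistance is R = 1 kΩ (an assumed value chosen only for illustration) and the load is the single 6 pF input pin mentioned above. Using the MC9S12 threshold from Table 8.2 with VDD = 5 V, the input is recognized as high once V(t) reaches 0.65*VDD = 3.25 V:

```latex
\begin{align*}
\tau &= RC = (1000\ \Omega)(6\times 10^{-12}\ \mathrm{F}) = 6\ \mathrm{ns} \\
V(t) &= 5 - 5e^{-t/\tau} = 3.25
  \;\Rightarrow\; e^{-t/\tau} = 0.35 \\
t &= \tau \ln\!\left(\frac{1}{0.35}\right) \approx 1.05\,\tau \approx 6.3\ \mathrm{ns}
\end{align*}
```

Under these assumptions a square wave with a period of a few microseconds is hundreds of time constants long, so the interface easily satisfies the requirement that T be large compared to the time constant.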
8.2 Serial Communications Interface, SCI
Serial Communications Interface (SCI) is an old communication protocol originally used to connect input/output terminals to mainframe computers. The common name for this protocol is Universal Asynchronous Receiver Transmitter or UART. When microcomputers came onto the scene in the 1980s, this serial protocol was used to perform input/output. The protocol is neither fast nor reliable. However, it is simple and continues to exist in state-of-the-art microcontrollers because there is still a wide range of external devices using the protocol, such as GPS and LCD graphics displays.
8.2.1 RS232 Protocol
Serial transmission involves sending one bit at a time, where the data is spread out over time. Engineers have found that one can send data farther, faster, less expensively, and more reliably using serial versus parallel channels. This is because it is easier to control capacitive loading and added noise within a serial cable. The total number of bits transmitted per second is called the baud rate. Most of the Freescale embedded microcomputers support at least one Serial Communications Interface or SCI. Before discussing the detailed operation of particular devices, we will begin with general features common to all devices. Each SCI module has a baud rate control register, which we use to select the transmission rate. There is a mode bit, M, which selects 8-bit (M = 0) or 9-bit (M = 1) data frames. Each device is capable of creating its own serial port clock with a period that is an integer multiple of the E clock period. The programmer will select the baud rate by specifying the integer divide-by used to convert the E clock into the serial port clock.
A frame is the smallest complete unit of serial transmission. The difficulty with serial transmission is synchronizing the receiver with the transmitter. With the SCI protocol, a start bit that always has a 1 to 0 transition is sent by the transmitter signifying the start of the frame. Figure 8.1 plots the signal versus time on a serial port, showing a single frame, which includes a start bit (0), 8 bits of data (least significant bit first) and a stop bit (1). The stop bit is also required so that when a start bit occurs there will be a 1 to 0 transition. This protocol is used for both transmitting and receiving. The information rate, or bandwidth, is defined as the amount of data or usable information transmitted per second. From Figure 8.1, we see that 10 bits are sent for every byte of usable data. Therefore, the bandwidth of the serial channel (in bytes/second) is the baud rate (in bits/sec) divided by 10.
Figure 8.1 A serial data frame with M = 0. [One frame on the serial port, with levels of 5 V and 0 V: start bit, data bits b0 through b7, stop bit.]
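The frame format described above can be sketched in C. This is an illustrative model of what the SCI hardware does automatically, not code that would run in a device driver. The names SCI_BuildFrame and SCI_Bandwidth are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_BITS 10   /* start + 8 data + stop, for M = 0 */

/* Fill bits[0..9] with the logic levels of one frame, in time order */
void SCI_BuildFrame(uint8_t data, uint8_t bits[FRAME_BITS]){
  int i;
  assert(bits != 0);
  bits[0] = 0;                               /* start bit is always 0 */
  for(i = 0; i < 8; i++){
    bits[1+i] = (uint8_t)((data >> i) & 1);  /* least significant bit first */
  }
  bits[9] = 1;                               /* stop bit is always 1 */
}

/* Usable bytes per second for a given baud rate in bits per second */
uint32_t SCI_Bandwidth(uint32_t baud){
  return baud / FRAME_BITS;
}
```

For the ASCII character 'A' (0x41), the frame in time order is 0,1,0,0,0,0,0,1,0,1, and a 9600 bits/sec channel carries 960 bytes/sec.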
Common Error: If you change the E clock frequency without changing the baud rate register, the SCI will operate at an incorrect baud rate.
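The common error above can be made concrete with a short C sketch. It assumes the usual 9S12 relationship in which the baud rate equals the E clock frequency divided by 16 times the integer written to the baud rate register; check the data sheet for your device before relying on it. The function names and the clock frequencies in the example are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Integer divide-by for the baud rate register, rounded to nearest.
   Assumes baud = eclock_hz/(16*divider). */
uint32_t SCI_Divider(uint32_t eclock_hz, uint32_t baud){
  assert(baud > 0);
  return (eclock_hz + 8*baud) / (16*baud);
}

/* Baud rate actually produced by a given divider, truncated to an integer */
uint32_t SCI_ActualBaud(uint32_t eclock_hz, uint32_t divider){
  assert(divider > 0);
  return eclock_hz / (16*divider);
}
```

With an assumed 8 MHz E clock, a target of 9600 bits/sec gives a divider of 52 and an actual rate of about 9615 bits/sec. If the E clock is later raised to 24 MHz and the divider is left at 52, the port runs at about 28846 bits/sec, which is the error described above.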
Table 8.3 shows the three most commonly used RS232 signals. The RS232 standard uses a DB25 connector that has 25 pins. The EIA-574 standard uses RS232 voltage levels and a DB9 connector that has only 9 pins. The most commonly used signals of the full RS232 standard are available with the EIA-574 protocols. Only TxD, RxD and SG are required to implement a simple bidirectional serial channel, thus the other signals are not shown. We define the data terminal equipment (DTE) as the computer or a terminal and the data communication equipment (DCE) as the modem or printer.

Table 8.3 The commonly used signals on the RS232 and EIA-574 protocols.

| DB25 Pin | RS232 Name | DB9 Pin | EIA-574 Name | Signal | Description | True | DTE | DCE |
|---|---|---|---|---|---|---|---|---|
| 2 | BA | 3 | 103 | TxD | Transmit data | -12 V | Out | In |
| 3 | BB | 2 | 104 | RxD | Receive data | -12 V | In | Out |
| 7 | AB | 5 | 102 | SG | Signal ground | | | |

A Maxim converter chip is used to generate the +12 and -12 V RS232 voltage levels, as shown in Figure 8.2. The capacitors in this circuit are important, because they form a charge pump used to create the ±12 V voltages from the +5 V supply. The RS232 timing is generated automatically by the SCI in the 9S12. During transmission, the Maxim chip translates a digital high on the microcomputer side to -12 V on the RS232/EIA-574 cable, and a digital low is translated to +12 V. During receiving, the Maxim chip translates negative voltages on the RS232/EIA-574 cable to a digital high on the microcomputer side, and a positive voltage is translated to a digital low.
The computer is classified as DTE, so its serial output is pin 3 in the EIA-574 cable, and its serial input is pin 2 in the EIA-574 cable. When connecting a DTE to another DTE, we use a cable with pins 2 and 3 crossed. That is, pin 2 on one DTE is connected to pin 3 on the other DTE, and pin 3 on one DTE is connected to pin 2 on the other DTE. When connecting a DTE to a DCE, the cable passes the signals straight across. In all situations, the grounds are connected together using the SG wire in the cable. This channel is classified as full duplex, because transmission can occur in both directions simultaneously.
Figure 8.2 Hardware interface implementing an asynchronous RS232 channel.
[Schematic: the microcontroller SCI pins (RxD on PS0 or PS2, TxD on PS1 or PS3) connect through a MAX232A, powered from +5 V with 0.1 μF capacitors, to a female DB9 connector.]
Checkpoint 8.1: The Dragon12 board from Wytec and the Adapt9S12 boards from Technological Arts use a straight through DB9 cable to interface their boards to the PC. The PC is a computer, hence it is a DTE. Are these 9S12 boards DTE or DCE?
8.2.2 Transmitting in Asynchronous Mode
We will begin with transmission, because it is simpler than reception. The transmitter portion of the SCI includes a TxD data output pin with CMOS voltage levels (see Figure 8.3). The transmitter has a 10- or 11-bit shift register, which cannot be directly accessed by the programmer, and this shift register is separate from the receive shift register. To output data using the SCI, the software will write to the Serial Communications Data Register. On the 9S12C32 the data register is called SCIDRL; on the 9S12DP512 it is called SCI0DRL or SCI1DRL. The transmit data register is write only, which means the software can write to it (to start a new transmission) but cannot read from it. Even though the transmit data register is at the same address as the receive data register, the transmit and receive data registers are two separate registers. When using 9-bit data mode (M = 1), we first set the T8 bit, then we write to the transmit data register to start transmission.
Figure 8.3 Data and shift registers implement the serial transmission. [A write to the transmit data register SCI0DRL loads the shift register, which sends the start bit, data bits 0 to 7, T8, and the stop bit out the TxD pin (PS1 or PS3, depending on the device and SCI module) on each shift clock.]
Four control bits affect transmission. We initialize the Transmit Enable control bit, TE, to 1 to enable the transmitter. We set the Send Break control bit, SBK, to 1 to send blocks of 10 (or 11 if M = 1) zeros. We arm for interrupts by setting the Transmit Interrupt Enable control bit, TIE. The Transmit Complete Interrupt Enable control bit, TCIE, allows the TC flag to interrupt.
There are two status bits generated by transmitter activity. The Transmit Data Register Empty flag, TDRE, is set when the transmit SCI0DRL is empty. The TDRE bit is cleared by reading the TDRE flag (with it set) then writing to the SCI0DRL. The Transmit Complete flag, TC, is set when the transmit shift register is done shifting. The TC bit is cleared by reading the TC flag (with it set) then writing to the SCI0DRL.
When new data (8 bits) is written to the SCI0DRL, it is copied into the 10- or 11-bit transmit shift register. Next, the start bit, T8 (if M = 1), and stop bit are added. Then, the frame is shifted out one bit at a time at a rate specified by the baud rate register. If there is already data in the shift register when the SCI0DRL is written, the new data will wait until the previous frame is transmitted before it too is transferred. The serial port hardware is actually controlled by a clock that is 16 times faster than the baud rate.
The digital hardware in the SCI counts 16 times in between changes to the TxD output line. In essence, the SCI0DRL and transmit shift register behave together like a two element first in first out queue (FIFO). In other words the software can actually write two bytes to the SCI0DRL, and the hardware will send them both one at a time. In fact, the serial port interface chip used in most PC computers has a 16-byte hardware FIFO between the data register and the shift register. A PC that has a 16C550-compatible UART supports this hardware FIFO function. This FIFO reduces the software response time requirements of the operating system to service the serial port hardware.
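The two-element FIFO behavior of the data register and shift register can be modeled in software. The following C sketch is a simplified simulation for a desktop computer, assuming the flag behavior described above; it does not access real 9S12 registers, and the TxModel type and its functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  uint8_t dataReg;   /* models the transmit side of SCI0DRL */
  uint8_t shiftReg;  /* data byte held in the transmit shift register */
  int TDRE;          /* 1 when the data register is empty */
  int TC;            /* 1 when the shift register is idle */
} TxModel;

void Tx_Init(TxModel *tx){
  tx->dataReg = 0; tx->shiftReg = 0;
  tx->TDRE = 1; tx->TC = 1;
}

/* Software writes one byte. Returns 0 (and does nothing) if the data
   register is still full, which real software avoids by waiting for TDRE. */
int Tx_Write(TxModel *tx, uint8_t data){
  if(!tx->TDRE) return 0;
  tx->dataReg = data;
  tx->TDRE = 0;
  if(tx->TC){                     /* shifter idle: start sending at once */
    tx->shiftReg = tx->dataReg;
    tx->TC = 0;
    tx->TDRE = 1;                 /* room for a second byte */
  }
  return 1;
}

/* Hardware event: one frame has finished shifting out.
   Returns the byte that was just sent. */
uint8_t Tx_FrameDone(TxModel *tx){
  uint8_t sent;
  assert(!tx->TC);                /* a frame must be in progress */
  sent = tx->shiftReg;
  if(!tx->TDRE){                  /* another byte is waiting */
    tx->shiftReg = tx->dataReg;
    tx->TDRE = 1;
  } else {
    tx->TC = 1;                   /* nothing left to send */
  }
  return sent;
}
```

In this model two back-to-back calls to Tx_Write both succeed, a third fails until the first frame completes, and the bytes leave in the order they were written.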
8.2.3 Receiving in Asynchronous Mode
Receiving data frames is a little trickier than transmission because we have to synchronize the receive shift register with the incoming data. The receiver portion of the SCI includes an RxD data input pin with CMOS voltage levels (see Figure 8.4). There is also a 10- or 11-bit shift register, which cannot be directly accessed by the programmer. Again, the receive shift register is separate from the transmit shift register. The receiver has a Serial Communications Data Register. Again, this register is called SCIDRL on the 9S12C32, and SCI0DRL/SCI1DRL on the 9S12DP512. The receive data register is read only, which means write operations to this address have no effect on this register. When operating in 9-bit mode (M = 1), the ninth data bit is saved in the R8 bit. There are four control bits that affect the receiver. We will set the Receiver Enable control bit, RE, to 1 to enable the receiver. If we set the Receiver Wakeup control bit, RWU, to 1, then a receiver input can wake up the computer out of a low power sleep mode. There are two interrupt arm bits for the receiver. The Receiver Interrupt Enable control bit, RIE, enables the RDRF flag to request interrupts. The Idle Line Interrupt Enable control bit, ILIE, enables the IDLE flag to request interrupts.
Figure 8.4 Data and shift registers implement the receive serial interface. (The RxD pin, PS0 on the 9S12C32 and PS2 or PS0 on the 9S12DP512, feeds a shift register holding the stop bit, R8, data bits 7 to 0, and the start bit; reading data accesses the SCI0DRL receive data register.)
There are five status bits generated by receiver activity. The Receive Data Register Full flag, RDRF, is set when new input data is available. The RDRF bit is cleared by reading the RDRF flag (with it set) then reading the SCI0DRL. The Receiver Idle flag, IDLE, is set when the receiver line becomes idle. The IDLE bit is cleared by reading the IDLE flag (with it set) then reading the SCI0DRL. The Overrun flag, OR, is set when input data is lost because previous data frames had not been read. The OR bit is cleared by reading the OR flag (with it set) then reading the SCI0DRL. The Noise flag, NF, is set when the input is noisy. The NF bit is cleared by reading the NF flag (with it set) then reading the SCI0DRL. Each bit is sampled three times by the receiver, and the NF flag is clear if all bits yielded unanimous decisions. The NF bit is set when any of the groups of three samples did not all agree. NF errors can occur if there is indeed noise on the line, but they are more likely caused by a mismatch between the transmitter and receiver baud rates. The Framing Error flag, FE, is set when the stop bit is incorrect. The FE bit is cleared by reading the FE flag (with it set) then reading the SCI0DRL. Framing errors are also probably caused by a mismatch in baud rate. The receiver waits for the 1 to 0 edge signifying a start bit, then shifts in 10 or 11 bits of data one at a time from the RxD line. The start and stop bits are removed (checked for noise and framing errors), the 8 bits of data are loaded into the SCI0DRL, the ninth data bit is put in R8 (if M = 1), and the RDRF flag is set. If there is already data in the SCI0DRL when the shift register is finished, the new frame waits until the previous frame is read by the software before it is transferred. An overrun occurs when there is one receive frame in the SCI0DRL, one receive frame in the receive shift register, and a third frame comes into RxD.
In order to avoid overrun, we can design a real-time system, i.e., one with a maximum latency. The latency of a SCI receiver is the delay between the time when new data arrives in the receiver SCI0DRL, and the time the software reads the SCI0DRL. If the latency is always less than 10 (11 if M = 1) bit times, then overrun will never occur.
Observation: With a serial port that has a shift register and one data register (no additional FIFO buffering), the latency requirement of the input interface is the time it takes to transmit one data frame.
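To put a number on this latency requirement, the frame time can be computed from the baud rate. The helper below is a minimal sketch; the name SCI_MaxLatency_us and its parameters are illustrative assumptions and are not part of any Freescale header.

```c
// Maximum allowed receiver latency in microseconds, taken as one frame time.
// baud is in bits/sec; m is the M mode bit (0: 10-bit frame, 1: 11-bit frame).
double SCI_MaxLatency_us(unsigned long baud, int m){
  int bitsPerFrame = m ? 11 : 10;
  return 1.0e6 * bitsPerFrame / (double)baud;
}
```

For example, at 19200 bits/sec with M = 0 the software has 10/19200 sec, which is about 521 microseconds, to read SCI0DRL after RDRF is set.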
In the example illustrated in Figure 8.5, assume the SCI receive shift register and receive data register are initially empty. Three incoming serial frames occur one right after another, but the software does not respond. At the end of the first frame, the $31 goes into the receive SCI0DRL and the RDRF flag is set. In this scenario, the software is busy doing other things, and does not respond to the setting of RDRF. Next, the second frame is entered into the receive shift register. At the end of the second frame, there is the $31 in the SCI0DRL and the $32 in the shift register. If the software were to respond at this point, then both characters would be properly received. If the third frame begins before the first is read by the software, then an overrun error occurs and a frame is lost. We can see from this worst case scenario that the software must read the data from SCI0DRL within 10 bit times of the setting of RDRF in order to prevent overrun.
Figure 8.5 Three receive data frames result in an overrun (OR) error. (The frames "1"=$31, "2"=$32, and "3"=$33 arrive back to back, each as a start bit, data bits 0 to 7, and a stop bit; RDRF=1 is marked with $31 in SCI0DRL and $32 in the shift register, followed by OVRN=1.)
Next we will overview the specific SCI functions on particular Freescale microcomputers. This section is intended to supplement rather than replace the Freescale manuals. When designing systems with a SCI, please also refer to the reference manual of your specific Freescale microcomputer.
8.2.4 9S12 SCI Details
The 9S12C32 has one serial port using Port S bits 1,0. The 9S12DP512 has two serial ports: SCI1 uses Port S bits 3,2 and SCI0 uses Port S bits 1,0. Table 8.4 shows the I/O ports that implement the SCI functions. The SCI transmitter and receiver are independent, but use the same data format and bit rate. On the 9S12DP512, the SCI port names include a 0 or 1 to specify which SCI module; otherwise the SCI modules on the various 9S12 microcontrollers operate similarly. For example, the one baud rate register on the 9S12C32 is called SCIBD, but on the 9S12DP512 there are two SCI modules, so there are two baud rate registers called SCI0BD and SCI1BD.

Address  msb                                             lsb  Name
$00C8    –   –   –   12  11  10  9  8  7  6  5  4  3  2  1  0    SCI0BD

Address  Bit 7   6      5      4      3      2      1      Bit 0  Name
$00CA    LOOPS   SWAI   RSRC   M      WAKE   ILT    PE     PT     SCI0CR1
$00CB    TIE     TCIE   RIE    ILIE   TE     RE     RWU    SBK    SCI0CR2
$00CC    TDRE    TC     RDRF   IDLE   OR     NF     FE     PF     SCI0SR1
$00CD    0       0      0      0      0      BRK13  TXDIR  RAF    SCI0SR2
$00CE    R8      T8     0      0      0      0      0      0      SCI0DRH
$00CF    R7T7    R6T6   R5T5   R4T4   R3T3   R2T2   R1T1   R0T0   SCI0DRL
Table 8.4 9S12 SCI ports.
The least significant 13 bits of SCI0BD determine the baud rate for the SCI port. If BR is the value written to bits 12:0, and Mclk is the module clock (typically this is the same as the E clock), then the baud rate is
SCI Baud Rate = Mclk/(16*BR)
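Solving the baud rate relation for BR gives BR = Mclk/(16*baud), rounded to the nearest integer. The two helpers below are a sketch of that arithmetic; the function names are hypothetical, and the code does not check that the result fits in the 13-bit field.

```c
#include <stdint.h>

// Nearest integer BR such that baud is approximately mclk/(16*BR).
// Adding half the denominator before dividing rounds to the closest value.
uint16_t SCI_BaudDivisor(uint32_t mclk, uint32_t baud){
  return (uint16_t)((mclk + 8u*baud) / (16u*baud));
}

// Baud rate actually produced by a given BR value.
double SCI_ActualBaud(uint32_t mclk, uint16_t br){
  return (double)mclk / (16.0 * br);
}
```

With an 8 MHz module clock and a 19200 bits/sec target, the divisor is 26 and the actual rate is about 19231 bits/sec, an error of roughly 0.2 percent.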
The SCI0CR2 control register contains the bits that turn on the SCI, and contains the interrupt arm bits. TE is the Transmitter Enable bit, and RE is the Receiver Enable bit. We set both TE and RE equal to 1 in order to activate the SCI device. TIE is the Transmit Interrupt Enable bit. We set TIE = 1 to arm the transmitter so that an interrupt is requested when TDRE is set. We clear TIE = 0 to disarm the TDRE-triggered interrupts. TCIE is the Transmit Complete Interrupt Enable bit. We set TCIE = 1 to arm the transmitter so that an interrupt is requested when TC is set. We clear TCIE = 0 to disarm the TC-triggered interrupts. RIE is the Receiver Interrupt Enable bit. We set RIE = 1 to arm the receiver so that an interrupt is requested when RDRF is set. We clear RIE = 0 to disarm the RDRF-triggered interrupts. ILIE is the Idle Line Interrupt Enable bit. We set ILIE = 1 to arm the receiver so that an interrupt is requested when IDLE is set. We clear ILIE = 0 to disarm the IDLE-triggered interrupts. RWU is the Receiver Wake Up Control bit. We set RWU = 1 to enable wake up and inhibit receiver interrupts. SBK is the Send Break bit. We set SBK = 1 to send a break (continuous TxD low) for as long as SBK is 1.
The SCI0CR1 control register contains the bits that handle special modes of the SCI. LOOPS is the SCI Loop Mode/Single Wire Mode Enable bit. We set it to 1 to enable loop mode. When loop mode is active, the SCI receive section is disconnected from the RxD pin and the RxD pin is available as general purpose I/O. The receiver input is determined by the RSRC bit. The transmitter output is controlled by the associated DDRS bit. Both the transmitter and the receiver must be enabled to use the loop or the single wire mode. RSRC is the Receiver Source bit. When LOOPS = 1, the RSRC bit determines the internal feedback path for the receiver. If RSRC equals 0, the receiver input is connected to the transmitter internally (not to the TxD pin). If RSRC is 1, then the receiver input is connected to the TxD pin. M is the Mode bit. We set M = 0 to create a 10-bit frame with 1 start bit, 8 data bits, and 1 stop bit. We set M = 1 to create an 11-bit frame with 1 start bit, 9 data bits, and 1 stop bit (the ninth data bit is in T8/R8). WAKE selects the wake-up condition. If WAKE = 0, then the SCI will wake up by an idle line recognition. If WAKE = 1, then the SCI will wake up by address mark (most significant data bit set). ILT is the Idle Line Type bit, specifying which of two types of idle line detection will be used by the SCI receiver. ILT determines when the receiver starts counting logic 1s as idle character bits. The counting begins either after the start bit or after the stop bit. If the count begins after the start bit, then a string of logic 1s preceding the stop bit may cause false recognition of an idle character. Beginning the count after the stop bit avoids false idle character recognition, but requires properly synchronized transmissions. If ILT is 1, then the idle character bit count begins after the stop bit. If ILT is 0, then the idle character bit count begins after the start bit. To enable parity we set PE to 1. If parity is enabled, the SCI will insert a parity bit into the most significant position, and we specify the parity type with PT.
If PT is 0, then even parity is selected. An even number of ones in the data character causes the parity bit to be zero and an odd number of ones causes the parity bit to be one. If PT is 1, then odd parity is selected. An odd number of ones in the data character causes the parity bit to be zero and an even number of ones causes the parity bit to be one. If parity is enabled, the receiver will test the parity of each incoming frame. Typically, we set M = 1 along with PE = 1 to create an 11-bit frame (one start, eight data, one parity, and one stop). Alternatively, we could set M = 0 along with PE = 1 to create a 10-bit frame (one start, seven data, one parity, and one stop).
The flags in the SCI0SR1 register can be read by the software, but cannot be modified by writing to this register. TDRE is the Transmit Data Register Empty Flag. It is set by the SCI hardware if transmit data can be written to SCI0DRL. If TDRE is zero, the transmit data register contains previous data that has not yet been moved to the transmit shift register. Writing into the SCI0DRL when TDRE is zero will result in the loss of data. On the other hand, when this bit is set, the software can begin another output transmission by writing to SCI0DRL. This flag is cleared by first reading SCI0SR1 with TDRE set followed by a SCI0DRL write. TC is the Transmit Complete Flag. It is set if the transmitter is idle (no data, preamble, or break transmission in progress). We can clear TC by reading SCI0SR1 with TC set followed by writing to SCI0DRL. RDRF is the Receive Data Register Full bit. RDRF is set if a received character is ready to be read from the SCI data register. We clear the RDRF flag by reading SCI0SR1 with RDRF set followed by reading SCI0DRL. IDLE is the Idle Line Detected Flag. It is set if the RxD line is idle (10 or 11 consecutive logic ones). The IDLE flag is inhibited when RWU is set to one. It is cleared by reading SCI0SR1 with IDLE set followed by reading SCI0DRL.
Once cleared, IDLE is not set again until the RxD line has been active and becomes idle again. Four error conditions can occur during generation of SCI input/output. Four bits (OR, NF, FE, and PF) in the serial communications status register (SCI0SR1) indicate if one of these error conditions exists. The overrun error (OR) bit is set when the next byte is ready to be transferred from the receive shift register to the SCI data register and the SCI data register is already full (RDRF bit is set). When an overrun error occurs, the
data that caused the overrun is lost and the data that was already in SCI0DRL is not disturbed. The OR bit is cleared when the SCI0SR1 is read (with OR set), followed by a read of the SCI0DRL. The noise flag (NF) bit is set if there is noise on any of the received bits, including the start and stop bits. In particular, each data bit is sampled three times and the NF bit is set if the three samples are not all the same. The NF bit is not set until the RDRF flag is set. The NF bit is cleared when the SCI0SR1 is read (with NF equal to 1) followed by a read of the SCI0DRL. When no stop bit is detected in the received data character, the framing error (FE) bit is set. FE is set at the same time as the RDRF. If the byte received causes both framing and overrun errors, the processor only recognizes the overrun error. The framing error flag inhibits further transfer of data into the SCI0DRL until it is cleared. The FE bit is cleared when the SCI0SR1 is read (with FE equal to 1) followed by a read of the SCI0DRL. The parity flag (PF) bit is set when the parity enable bit (PE) is set and the parity of the received data does not match the parity type bit (PT). The PF bit is set during the same cycle as the RDRF flag but does not get set in the case of an overrun. We can clear PF by reading SCI0SR1 and then reading SCI0DRL.
The SCI0DRL register contains the data transmitted out and received in by the SCI device. Even though there are separate transmit and receive data registers, these two registers exist at the same I/O port address. Reads from SCI0DRL access the eight bits of the read-only SCI receive data register. Writes to SCI0DRL access the eight bits of the write-only SCI transmit data register. R8 is Receive Data Bit 8. It is a read-only bit that contains the ninth bit of the receive data when M = 1. T8 is Transmit Data Bit 8. If the M bit is set, T8 stores the ninth bit of the transmit data character. T8 can be set to 1 to implement a second stop bit.
When using 9-bit data, it is necessary to read and write the SCI data register as one 16-bit register. Common Error: If we read or write the SCI0DRL and SCI0DRH registers in two separate 8-bit accesses, it is possible to confuse the MSbyte and LSbyte between sequential frames.
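The parity rule given earlier for the PT bit can be written out as a short C function. On the 9S12 the SCI hardware generates and checks parity when PE = 1, so this sketch is only a way to check the rule by hand; the name SCI_ParityBit is a hypothetical helper.

```c
#include <stdint.h>

// Parity bit for one data character.
// pt = 0 selects even parity, pt = 1 selects odd parity (the PT bit).
uint8_t SCI_ParityBit(uint8_t data, uint8_t pt){
  uint8_t ones = 0;
  for(int i = 0; i < 8; i++){
    ones += (data >> i) & 1;               // count the ones in the data
  }
  return (uint8_t)((ones & 1) ^ (pt & 1)); // odd count gives 1 when pt = 0
}
```

For example, $31 contains three ones, so the parity bit is 1 with even parity (PT = 0) and 0 with odd parity (PT = 1).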
The SCI0SR2 register contains two mode control bits and one status bit. BRK13 is the Break Transmit character length bit, which determines the transmit break length. If BRK13 is 1, then the break character is 13 or 14 bits long, and if it is 0, then the break character is 10 or 11 bits long. TXDIR specifies the transmitter pin data direction in single-wire mode. It determines whether the TxD pin is going to be used as an input or output in the single-wire mode of operation. If TXDIR is 1, then the TxD pin is an output in single-wire mode. If it is 0, then the TxD pin is an input in single-wire mode. RAF is the Receiver Active Flag. This flag is read only, and controlled by the receiver front end. It is set during the detection of a start bit. It is cleared when an idle state is detected or when the receiver circuitry detects a false start bit (generally due to noise or baud rate mismatch). If RAF is one, then a frame is being received. It is useful in half-duplex systems to avoid a collision.
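When reading the SCI0SR1 flags in software, it helps to name the bit masks. The masks below follow the bit positions listed for SCI0SR1 in Table 8.4; the SCI_ prefix and the two helper functions are illustrative names and are not taken from a Freescale header. The helpers take a copy of the status value as a parameter, so they can be tried on any computer.

```c
#include <stdint.h>

// Bit masks for SCI0SR1, bit 7 down to bit 0
#define SCI_TDRE 0x80  // transmit data register empty
#define SCI_TC   0x40  // transmit complete
#define SCI_RDRF 0x20  // receive data register full
#define SCI_IDLE 0x10  // idle line detected
#define SCI_OR   0x08  // overrun
#define SCI_NF   0x04  // noise
#define SCI_FE   0x02  // framing error
#define SCI_PF   0x01  // parity error

// Nonzero if a received character is waiting in the data register.
int SCI_RxReady(uint8_t sr1){
  return (sr1 & SCI_RDRF) != 0;
}

// Nonzero if any of the four receive error flags is set.
int SCI_RxError(uint8_t sr1){
  return (sr1 & (SCI_OR | SCI_NF | SCI_FE | SCI_PF)) != 0;
}
```

A receive routine could read SCI0SR1 once into a variable, test it with these helpers, and then read SCI0DRL, which also completes the flag-clearing sequence described above.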
Example 8.1 Design an I/O device driver that allows ASCII input/output on the SCI at 19200 bits/sec.
Solution This module will require at least three public functions: one to turn the SCI on, a second to receive a byte from the SCI, and a third to transmit a byte out of the SCI. The initialization function should enable the SCI (TE = RE = 1) and set the baud rate (see Program 8.1). Without the PLL active, we assume the M clock is 8 MHz. Therefore, BR equals 8,000,000/16/19200 ≈ 26.
// Initialize 9S12DP512 (9S12C32) SCI
// 9S12C32 is 4 MHz
// 9S12DP512 is 8 MHz
// baud rate = Eclk/BR/16 = 19200 bps
void SCI_Init(void){
  SCI0BD = 26;     // (13 if 9S12C32)
  SCI0CR2 = 0x0C;  // enable RE TE
}
Program 8.1 Function that initializes the serial I/O.
Busy-waiting, gadfly, or polling are three equivalent names for the interfacing technique where the software continuously checks the hardware status waiting for it to be ready. Figure 8.6 is a flowchart of the busy-wait I/O synchronization technique. To receive a byte in from SCI, the software first waits for RDRF to be set, then it reads from the SCI data register. To transmit a byte out of the SCI, the software first waits for TDRE to be set, then it writes the data to the SCI data register.
Figure 8.6 Busy-wait I/O synchronization for the SCI serial port. (Two flowcharts: InChar loops while RDRF is 0, then reads SCI0DRL and returns; OutChar loops while TDRE is 0, then writes SCI0DRL and returns.)
Program 8.2 gives functions that input/output using the SCI serial port. The tut2 example in TExaS includes these three subroutines, plus additional functions for inputting and outputting strings and numbers.
;Wait for receiver input
;Register A returned with ASCII
RDRF        equ  $20
SCI_InChar  ldaa SCI0SR1
            anda #RDRF
            beq  SCI_InChar
            ldaa SCI0DRL
            rts
;Send data to transmitter
;Register A ASCII value to send
TDRE        equ  $80
SCI_OutChar ldab SCI0SR1
            andb #TDRE
            beq  SCI_OutChar
            staa SCI0DRL
            rts

#define RDRF 0x20
// Wait for new input
// then return ASCII code
char SCI_InChar(void){
  while((SCI0SR1&RDRF) == 0){};
  return(SCI0DRL);
}
#define TDRE 0x80
// Wait for buffer to be empty,
// then output
void SCI_OutChar(char data){
  while((SCI0SR1&TDRE) == 0){};
  SCI0DRL = data;
}
Program 8.2 Functions that implement serial I/O.
Checkpoint 8.2: How do you change Example 8.1 to run at 38400 bits/sec?
8.3 Synchronous Peripheral Interface, SPI
The SCI protocol is a low-bandwidth protocol, working well at speeds up to 100,000 bits/sec. It is also appropriate for connecting devices in separate enclosures, such as a PC and a microcontroller development board. However, there is a need for higher speed communication between modules within the same enclosure. One of the first protocols to fit this need is Serial Peripheral Interface or SPI. To increase speed over SCI, the protocol includes separate data and clock lines. Since the clock is shared by the transmitter and receiver, synchronization is greatly simplified, and there is no need for start or stop bits. The I2C and CAN protocols, presented in Chapter 12, also support high-speed communication.
8.3.1 SPI Fundamentals
Table 8.5 Synchronous serial port pins on various Freescale microcomputers.
Most of the Freescale embedded microcomputers include at least one SPI. Typically, we use the SPI to attach additional I/O devices, like a DAC or ADC, to the microcomputer. The fundamental difference between SCI, which implements an asynchronous protocol, and SPI, which implements a synchronous protocol, is the manner in which the clock is implemented. SCI is an asynchronous protocol, because the two devices communicating operate at the same frequency but have two separate clocks. The SCI transmitter on one side of the cable and the SCI receiver on the other side have clocks that are not synchronized. In fact, the SCI operates properly as long as the frequencies of these two clocks are within 5 percent of each other. Two devices communicating with synchronous serial interfaces, like SPI, operate using the same clock, and hence are synchronized. With SPI, the clock itself can be found in the interface connection between the 9S12 and its peripheral. Typically, the master device creates the clock, and the slave device(s) uses the clock to latch the data. Before discussing the detailed operation of particular devices, we will begin with general features common to all devices. The Freescale SPI includes four I/O lines. The slave select (SS) is an optional negative logic control signal from master to slave signifying the channel is active. The second line, SClk, is a 50 percent duty cycle clock generated by the master. The MOSI (master out slave in) is a data line driven by the master and received by the slave. The MISO (master in slave out) is a data line driven by the slave and received by the master. In order to work properly, the transmitting device uses one edge of the clock to change its output, and the receiving device uses the other edge to accept the data. Table 8.5 lists the I/O port locations of the synchronous serial ports for the various microcomputers discussed in this book.
Although it could be used to connect microcomputers together, we will use SPI to interface external devices. In this situation the SPI system in the microcomputer operates as the master, and the external devices are slaves. In the SPI system the 8-bit data register, SPI0DR, in the master and the 8-bit data register in the slave, also SPI0DR, are linked to form a distributed 16-bit register. Figure 8.7 illustrates communication between a microcomputer and a single external device. The SS signal is used to activate the external I/O device; the SClk shifts both shift registers. When a data transfer operation is performed, this 16-bit register is serially shifted eight bit positions by the SClk clock from the master so the data is effectively exchanged between the master and the slave. It takes four actions in the master to complete this transaction. First, the master waits for SPTEF to be set (meaning it is OK to write data). Second, the master writes data to the SPI0DR (which will be transmitted to the slave). Third, the master waits for SPIF to be set (meaning the input data from the slave is ready). Fourth, the master reads data from the SPI0DR (which came from the slave). There is a baud rate control register, which is used to select the transmission rate. It is possible to implement output only using just MOSI or to implement input only using just MISO. It is also possible to concatenate multiple slaves in a large loop. Even if you are implementing input only or output only communication, the master should perform all four actions listed above. For example, for input only communication, the master outputs dummy data (any value) during step two. Similarly, for output only communication, the master still reads data during step four (but ignores its value).
Observation: Because the clocks are shared, if you change the E clock frequency, the transfer rate will change, but the SPI should still operate properly.
Figure 8.7 A synchronous serial interface between a microcomputer and an I/O device. (The 9S12DP512 SPI master connects to the I/O device slave with PS7 as SS, PS6 as SClk, PS5 as MOSI, and PS4 as MISO.)
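The exchange through the distributed 16-bit register can be simulated in a few lines of C. This sketch models only the shifting, assuming the most significant bit is transferred first; the function name SPI_Exchange is hypothetical, and on the real hardware all of this happens inside the SPI module while the software performs the four actions listed above.

```c
#include <stdint.h>

// One 8-bit SPI transfer modeled as a 16-bit circular shift, msb first.
// On each clock the master shifts its msb out on MOSI into the slave's lsb,
// while the slave shifts its msb out on MISO into the master's lsb.
void SPI_Exchange(uint8_t *master, uint8_t *slave){
  for(int clk = 0; clk < 8; clk++){
    uint8_t mosi = (*master >> 7) & 1;           // bit leaving the master
    uint8_t miso = (*slave  >> 7) & 1;           // bit leaving the slave
    *master = (uint8_t)((*master << 1) | miso);  // master shifts in MISO
    *slave  = (uint8_t)((*slave  << 1) | mosi);  // slave shifts in MOSI
  }
}
```

After eight clocks the two bytes have traded places, which is why the master always receives a byte (possibly meaningless) whenever it sends one.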
The SPI timing is shown in Figure 8.8. The SPI transmits 8-bit data at the same time as it receives input. In all modes, the SPI changes its output on the opposite edge of the clock as it uses to shift data in. There are three mode control bits (MSTR, CPOL, CPHA) that affect the transmission protocol. If the device is a master (MSTR = 1), it generates the SClk, data is output on the MOSI pin, and data is input on the MISO pin. If the device is a slave (MSTR = 0), the SClk is an input, data is received on the MOSI pin, and data is transmitted on the MISO pin. The CPOL control bit specifies the polarity of the SClk. In particular, the CPOL bit specifies the logic level of the clock when data is not being transferred. The CPHA bit affects the timing of the first bit transferred and received. If CPHA is 0, then the device will shift data in on the first (and 3rd, 5th, 7th, etc.) clock edge. If CPHA is 1, then the device will shift data in on the second (and 4th, 6th, 8th, etc.) clock edge. In Figure 8.8, the data is shown with MSB transferred first, but the 9S12 has an option where the bits are transferred in the other order (LSB first).
Figure 8.8 Synchronous serial modes of the Freescale SPI interface. (Four panels, one for each combination of CPOL and CPHA, each showing SS, SClk, and data bits 7 down to 0 on MO/SO and on MI/SI.)
There is one SPI on the 9S12C32 using pins PM5, PM4, PM3, and PM2. There are three SPIs on the 9S12DP512. SPI0 on the 9S12DP512 uses the pins PS7, PS6, PS5, and PS4. On the other hand, its SPI1 and SPI2 use the pins PH3-0 and PH7-4, respectively (see Section 8.3.1). When the SPI is enabled (SPE = 1), all pins that are defined by the configuration as inputs will be inputs regardless of the state of the direction register bits for those pins. All pins that are defined as SPI outputs will be outputs only if the corresponding direction register bits for those pins are set. If the 9S12DP512 is the master, then we should set the DDRS register to make PS7, PS6, and PS5 outputs. PS4 will automatically be an input. A bidirectional serial pin is possible using BIDIROE as the direction control. Table 8.5 shows the ports for SPI0 on the 9S12DP512, and Table 8.6 shows the I/O registers.
For example, the baud rate register on the 9S12C32 is called SPIBR, but on the 9S12DP512 there are three SPI modules, so there are three baud rate registers called SPI0BR, SPI1BR, and SPI2BR. The three SPI ports on the 9S12DP512 work similarly, but use different I/O pins and different addresses for the I/O registers. The SPI functions in three modes: run, wait, and stop. Run mode is the basic mode of operation. Wait mode is a configurable low-power mode, controlled by the SPISWAI bit. In wait mode, if the SPISWAI bit is clear, the SPI operates as it does in run mode. If the SPISWAI bit is set, the SPI goes into a power conservative state, with the SPI clock generation turned off. If the SPI is configured as a master, any transmission in progress stops, but is resumed after the CPU goes into run mode. If the SPI is configured as a slave, reception and transmission of a byte continues, so that the slave stays synchronized to the master. The SPI is inactive in stop mode for reduced power consumption. If the SPI is configured as a master, any transmission in progress stops but is resumed after the CPU goes into run mode. If the SPI is configured as a slave, reception and transmission of a byte continues, so that the slave stays synchronized to the master.
8.3.2 SPI Details
Address  Bit 7   6      5      4       3        2      1        Bit 0   Name
$00D8    SPIE    SPE    SPTIE  MSTR    CPOL     CPHA   SSOE     LSBFE   SPI0CR1
$00D9    0       0      0      MODFEN  BIDIROE  0      SPISWAI  SPC0    SPI0CR2
$00DA    0       0      0      0       0        SPR2   SPR1     SPR0    SPI0BR
$00DB    SPIF    0      SPTEF  MODF    0        0      0        0       SPI0SR
$00DD    Bit 7   6      5      4       3        2      1        Bit 0   SPI0DR
$0248    PS7     PS6    PS5    PS4     PS3      PS2    PS1      PS0     PTS
$024A    DDRS7   DDRS6  DDRS5  DDRS4   DDRS3    DDRS2  DDRS1    DDRS0   DDRS
Table 8.6 9S12DP512 SPI0 ports.
The module clock (same frequency as the E clock) is input to a divider series and the resulting SPI clock rate may be selected to be divided by 2, 4, 8, 16, 32, 64, 128, or 256. Three bits in the SPIBR register control the SPI clock rate. The SPIBR register determines the transfer rate. Table 8.7 shows the possible transmission rates for two different module clocks.
Divide by    2        4         8         16        32       64       128      256
SClk period  83.3 ns  166.7 ns  333.3 ns  666.7 ns  1.33 µs  2.67 µs  5.33 µs  10.67 µs
Table 8.7 Bit rate selection for the synchronous serial port on the 9S12.
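The SClk period for each divider follows directly from the module clock. The sketch below assumes that a 3-bit select value n (0 to 7) picks divide-by-2^(n+1); that ordering is an assumption made for illustration and should be checked against the SPI0BR description in the Freescale manual. The function name is hypothetical.

```c
// SClk period in nanoseconds for a module clock given in Hz.
// n is a 3-bit select value; n = 0 is assumed to mean divide by 2,
// n = 1 divide by 4, and so on up to n = 7 for divide by 256.
double SPI_ClockPeriod_ns(unsigned long mclk, unsigned int n){
  unsigned long divisor = 2ul << (n & 7u);   // 2, 4, 8, ..., 256
  return 1.0e9 * divisor / (double)mclk;
}
```

For instance, a 24 MHz module clock divided by 2 gives a period of about 83.3 ns, and an 8 MHz module clock divided by 2 gives 250 ns.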
We use the SPI0CR1 register to specify the SPI mode of operations. SPE is the SPI System Enable bit. We will turn this bit on whenever we wish to use the SPI. We set the SSOE bit to enable the SS signal as shown in Figure 8.9. We clear the SSOE bit when we want to use the SS pin (PS7) as a regular I/O pin. The SPIE bit is the arm bit for SPIF, and the SPTIE bit is the arm bit for SPTEF. These arm bits will be cleared when interrupts are not needed. We will set MSTR to one so that the 9S12 becomes the master. CPOL and CPHA determine the SPI Clock Polarity and Clock Phase. These two bits are used to specify the protocol, as shown in Figure 8.8. All other bits not specifically mentioned will be cleared. SPIF is the SPI Interrupt Request bit. Even though we won’t be using interrupts, this bit gets set after each byte is transferred, and will be used to implement the busy-wait synchronization. In particular, SPIF is set after the eighth SClk cycle in a data transfer and it is cleared by reading the SPI0SR register (with SPIF set) followed by an access (read or write) to the SPI data register. SPTEF is the Transmit Empty Interrupt Flag. If set, this bit indicates that the transmit data register is empty. To clear this bit and place data into the transmit data register, SPI0SR has to be read with SPTEF = 1, followed by a write to SPI0DR. Any write to the SPI Data Register without first reading SPTEF = 1 is effectively ignored. The SPI0DR 8-bit register is both the input and output register. A write to SPI0DR allows a data byte to be queued and transmitted. For a SPI configured as a master, a queued data byte is transmitted immediately after the previous transmission has completed. The SPTEF in the SPI0SR register indicates when the SPI Data Register is ready to accept new data. Reading the data can occur anytime from after the SPIF is set to before the end of the next transfer.
If the SPIF is not serviced by the end of the successive transfers, those data bytes are lost and the data within the SPI0DR retains the first byte until SPIF is serviced. LSBFE is the LSB-First Enable bit. This bit does not affect the position of the MSB and LSB in the data register. Reads and writes of the data register always have the MSB in bit 7. In master mode, a change of this bit will abort a transmission in progress and force the SPI system into idle state. If LSBFE is 1, data is transferred least significant bit first. If LSBFE is 0, data is transferred most significant bit first.
Figure 8.9 Synchronous serial modes of the Freescale 9S12 SPI interface. (Four panels: master mode (MSTR=1) and slave mode (MSTR=0), each in normal mode (SPC0=0) and bidirectional mode (SPC0=1). In normal mode the master uses PS5 as MO and PS4 as MI, and the slave uses PS5 as SI and PS4 as SO. In bidirectional mode BIDIROE controls the direction of the single data pin, labeled MOMI for the master and SOSI for the slave.)
The control bits MODFEN and SSOE affect the operation of the PS7 pin as defined in Table 8.8. MODF is the Mode Error Interrupt Status Flag. This bit is set if the SS input becomes low while the SPI is configured as a master and mode fault detection is enabled (the MODFEN bit of the SPI0CR2 register is set). The flag is cleared automatically by a read of the SPI Status Register (with MODF set) followed by a write to the SPI0CR1.

MODFEN  SSOE  Master Mode (MSTR = 1)              Slave Mode (MSTR = 0)
0       0     PS7 not used with SPI               PS7 is SS input
0       1     PS7 not used with SPI               PS7 is SS input
1       0     PS7 is SS input with MODF feature   PS7 is SS input
1       1     PS7 is SS output                    PS7 is SS input
Table 8.8 Mode selection for the synchronous serial port on the 9S12DP512.
Bidirectional modes are controlled by the bits SPC0, BIDIROE, and MSTR. These control bits determine the input/output configuration of the MISO and MOSI as illustrated in Table 8.9 and Figure 8.9.

Pin Mode       MSTR  SPC0  BIDIROE  MISO           MOSI
Normal         1     0     X        Master In      Master Out
Bidirectional  1     1     0        MISO not used  Master In
Bidirectional  1     1     1        MISO not used  Master I/O
Normal         0     0     X        Slave Out      Slave In
Bidirectional  0     1     0        Slave In       MOSI not used
Bidirectional  0     1     1        Slave I/O      MOSI not used
Table 8.9 Bidirectional modes for the SPI on the 9S12.

8.3.3 9S12DP512 Module Routing Register
One of the confusing aspects of the 9S12 family is its module routing register. On the 9S12DP512, the MODRR register allows the software to dynamically configure which port pins are used for the CAN0, CAN4, SPI0, SPI1, and SPI2 ports (see Table 8.10). Table 4.7 back in Chapter 4 defines a left to right priority. For example, if both CAN4 and I2C are enabled and mapped to PJ7 to 6, then PJ7 to 6 will be CAN4 (not I2C) because it is higher priority (more to the left in Table 4.7). In general, the digital I/O is lowest priority and can be used only if all its special I/O functions are disabled.
Table 8.10 The MODRR register on the 9S12DP512 defines which pins implement CAN0, CAN4, SPI0, SPI1, and SPI2.
a Routing of CAN4 to PM4 and PM5 only if CAN2 is disabled, and CAN0 not active or not routed here
b Routing of CAN4 to PM6 and PM7 only if CAN3 is disabled
c Routing of CAN0 to PM2 and PM3 only if CAN1 is disabled
d Routing of CAN0 to PM4 and PM5 only if CAN2 is disabled
8.3.4 8-bit DAC Interface
A common application of SPI involves interfacing peripherals to the microcontroller. There is a wide range of sensors, actuators, inputs, outputs, memory, and displays available with a SPI interface. One class of I/O available with SPI interfaces includes digital to analog converters (DAC) and analog to digital converters (ADC).
Example 8.2 Design an 8-bit DAC with a range of 0 to 5 V.
Solution A digital to analog converter accepts a digital input (in our case a number between 0 and 255) and creates an analog output, Out, which in our case will be a voltage between 0 and 5 V. Since the 9S12 does not have a built-in DAC, we need to add circuits to solve the problem. Rather than build a DAC from discrete components, we will purchase a DAC chip and interface it to the 9S12. Many manufacturers produce ADC and DAC chips that are easily connected to the 9S12. This design creates a synchronous serial interface between the computer and a Maxim MAX550A digital to analog converter, as shown in Figure 8.10. The interface wiring diagram is included in the data sheet for the chip. The software module for this DAC interface will need two public functions, one to initialize, and one to output data.
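Figure 8.10 An 8-bit DAC interfaced to the SPI port.
[Figure: the 9S12 SPI signals (a regular output pin, SClk, MOSI, MISO) connect to the MAX550A, whose pins are CS, SClk, Din, LDAC, REF, Vdd, Gnd, and Out; MISO is not connected.]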
As with any SPI interface, there are basic interfacing issues to consider.
1. Word size. In this case we need to transmit two 8-bit packets to the DAC. The first will be a command code ($09), which will tell the DAC to output the following data right away. The second 8-bit packet will be the digital data, 0 to 255.
2. Bit order. The MAX550A requires the most significant bits first. This is the normal mode of the SPI.
3. Clock phase, clock polarity. There are two issues to resolve. Since the MAX550A samples its serial input data on the rising edge of the clock, the SPI must change the data on the falling edge. The mode with CPOL = CPHA = 0 satisfies this requirement. Many devices that employ SPI will tell you which CPOL and CPHA values you should use.
4. Bandwidth. We look at the timing specifications of the MAX550A, which can handle a clock period as short as 80 ns. So, the fastest SPI clock will be used.

The SPI/MAX550A timing is shown in Figure 8.11. First CS is made to fall, followed by two output frames (command and data), and then CS is made to rise again.

Figure 8.11 MAX550A DAC serial timing.
[Figure: CS (regular output pin), SClk, and Din (MOSI) versus time; while CS is low, the 8-bit command is clocked out, followed by the 8-bit data.]
8 䡲 Serial and Parallel Port Interfacing
Because we want the CS signal to remain low for the entire 16-bit transfer and then return high, we will implement it using the regular I/O pin functions. The ritual, shown in Program 8.3, initializes the direction register, SPI mode, and bandwidth.
Program 8.3 Initialization for a MAX550A DAC interface using the SPI.
To change the DAC output, we need to send two 8-bit transmissions, as shown in Program 8.4. It takes four steps to create a single 8-bit transmission: 1) wait for SPTEF, 2) write data to SPI0DR, 3) wait for SPIF, and 4) read data from SPI0DR. In this module, send is a private function, and DAC_Init and DAC_Out are public functions.
; Register B is code to transmit
send    brclr SPI0SR,#$20,*  ; 1) wait SPTEF
        stab  SPI0DR         ; 2) write data
        brclr SPI0SR,#$80,*  ; 3) wait SPIF
        ldab  SPI0DR         ; 4) read data
        rts
;Register A contains new data
DAC_Out bclr  PTS,#$80       ;PS7=CS=0
        ldab  #$09
        bsr   send           ;send command
        tab
        bsr   send           ;send DAC data
        bset  PTS,#$80       ;PS7=CS=1
        rts
Program 8.4 Program to send data to a MAX550A DAC interface using the SPI.
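For readers who prefer C, the same four-step transmission can be sketched as follows. This is our own illustration, not a listing from the text. As an assumption made only so the fragment compiles on a desktop, the three registers are declared as ordinary variables, and SPI0SR is preloaded with SPTEF and SPIF set so the two wait loops fall through. On a real 9S12 these names would come from the device header file and the flags would be set by the SPI hardware.

```c
/* Simulated registers (assumption): on a real 9S12 these names come from
   the device header file and refer to memory-mapped I/O. */
volatile unsigned char SPI0SR = 0xA0; /* SPIF (bit 7) and SPTEF (bit 5) preset */
volatile unsigned char SPI0DR;        /* SPI data register */
volatile unsigned char PTS = 0x80;    /* Port S, PS7 = CS, idle high */

/* Private helper: one 8-bit SPI transmission, returns the byte read back */
static unsigned char send(unsigned char code){
  while((SPI0SR & 0x20) == 0){}  /* 1) wait for SPTEF */
  SPI0DR = code;                 /* 2) write data */
  while((SPI0SR & 0x80) == 0){}  /* 3) wait for SPIF */
  return SPI0DR;                 /* 4) read data */
}

/* Public: output one 8-bit value to the MAX550A */
void DAC_Out(unsigned char data){
  PTS &= 0x7F;   /* PS7 = CS = 0 */
  send(0x09);    /* command code: output the following data right away */
  send(data);    /* 8-bit DAC data */
  PTS |= 0x80;   /* PS7 = CS = 1 */
}
```

A call such as DAC_Out(128) requests a mid-scale output. As in the assembly version, send stays private to the module and only DAC_Out would be given a prototype in the header file.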
Approximating the 8-bit DAC as linear, we can estimate the output Vout as Vout = 5*Din/256. Another similar equation is Vout = 5*Din/255, which is the same as Vout = Din/51. Analog Devices, Maxim, and Texas Instruments will send free sample parts of DACs and ADCs to students, so these SPI interface examples are inexpensive to build. The data sheets of most devices will assist you when interfacing them to the 9S12. I do suggest you get plastic dual in-line packages (PDIP), so you can plug the parts into a protoboard. Also beware of voltage levels, making sure the part operates on whatever voltage supply you have on your system.
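The linear approximation Vout = 5*Din/256 is easy to apply in integer arithmetic. The sketch below is our own illustration (the function names are hypothetical, not part of the DAC module); it works in millivolts to avoid floating point.

```c
#include <stdint.h>

/* Estimated DAC output in millivolts for an 8-bit code,
   using the linear approximation Vout = 5*Din/256 (result truncated). */
uint32_t DAC_CodeToMillivolts(uint8_t din){
  return (5000UL * din) / 256;
}

/* Largest 8-bit code whose estimated output does not exceed mv.
   Requests of 5000 mV or more saturate at code 255. */
uint8_t DAC_MillivoltsToCode(uint32_t mv){
  if(mv >= 5000){
    return 255;
  }
  return (uint8_t)((mv * 256UL) / 5000);
}
```

For example, code 128 gives an estimate of 2500 mV, and the full-scale code 255 gives 4980 mV, one step below 5 V.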
Scanned Keyboards
In a scanned interface, the switches are placed in a row/column matrix. In this way, many keys can be interfaced with just a few I/O pins. Figure 8.12 shows a matrix keyboard with 4 rows and 4 columns. In general, if there are n rows and m columns, there could be n*m switches, but we would need only n + m I/O pins. The × at the four outputs signifies open collector (an output with two states, HiZ and low).
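Figure 8.12 A matrix keyboard interfaced to the microcomputer.
[Figure: 9S12 open collector outputs PP7, PP6, PP5, PP4 drive row3, row2, row1, row0; inputs PP3, PP2, PP1, PP0 read column3, column2, column1, column0, each pulled up to +5 V. Keys by row, from column3 to column0: row3 = 1 2 3 +, row2 = 4 5 6 -, row1 = 7 8 9 *, row0 = c 0 = /.]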
The computer drives one row at a time to zero, while leaving the other rows at HiZ. By reading the column, the software can detect if a key is pressed in that row. The software “scans” the device by checking all rows one by one. For most microcontrollers, the open collector functionality can be implemented by toggling the direction register. Remember, open collectors have two states low and off. The output low state can be made by making the pin an output and setting the output data to zero. The output off state can be made by making the pin an input. Table 8.11 illustrates the sequence to scan the 4 rows.
Table 8.11 Scanning patterns for a 4 by 4 matrix keyboard.
DDRP   row 3   row 2   row 1   row 0   column 3   column 2   column 1   column 0
$80    0       hiZ     hiZ     hiZ     1          2          3          +
$40    hiZ     0       hiZ     hiZ     4          5          6          -
$20    hiZ     hiZ     0       hiZ     7          8          9          *
$10    hiZ     hiZ     hiZ     0       c          0          =          /
This configuration allows many switches to be interfaced with a small number of parallel I/O pins. In our example situation, the single 8-bit I/O port can handle 16 switches with only an 8-wire cable. Two 8-bit ports could interface 64 keys. The disadvantage of the scanned approach over the direct approach (e.g., one switch per I/O pin) is that the scanned keyboard can only handle situations where 0, 1, or 2 switches are simultaneously pressed. If three keys are pressed in an “L” shape, then the fourth key that completes the rectangle will appear to be pressed. This limitation is not a problem for most of the 103 keys on a standard computer keyboard. However, the shift, alt, and control keys must be interfaced with the direct method. Because of switch bounce, rising and falling edges may occur when any of the keys changes. The resistor pull-ups on the PP3-PP0 inputs could be configured internally.

There are two steps to scan a particular row:
1. Select that row by driving it low (make output 0), while the other rows are left not driven (make output hiZ). We set the direction register so that only one row is an output.
2. Read the columns to see if any keys are pressed in that row. A 0 means the key is pressed; a 1 means the key is not pressed.

When all rows are 0, one of the Port P key wakeup inputs will fall when any of the 16 keys is pressed. The scanned keyboard operates properly if
1. No key is pressed
2. Exactly one key is pressed
3. Exactly two keys are pressed

The software to scan this keyboard can be found in Program 8.5. The row is selected by setting the direction register to $80 (then $40, then $20, then $10). The output data is set to 0 once in the initialization. The Key_Scan function returns two parameters. One parameter is the number of keys pressed. If there is exactly one key pressed, the second parameter contains the ASCII code representing that key. A debounced interface is created by scanning the keyboard at a rate slower than the time of the bouncing. For example, if the bounce is less than 5 ms, then scan the keyboard every 10 ms. This way a bouncing key will not be seen as touched/released/touched.

struct Row{
  unsigned char direction;   // DDRP value that selects this row
  unsigned char keycode[4];  // ASCII codes, column 3 first
};
typedef const struct Row RowType;
RowType ScanTab[5]={
  { 0x80, "123+" },  // row 3
  { 0x40, "456-" },  // row 2
  { 0x20, "789*" },  // row 1
  { 0x10, "c0=/" },  // row 0
  { 0x00, "    " }}; // end of table
void Key_Init(void){
  DDRP = 0x00;  // PP3-PP0 are inputs, PP7-PP4 are oc out
  PTP = 0x00;   // zero when out
}
/* Returns ASCII code for key pressed,
   Num is the number of keys pressed,
   both equal zero if no key pressed */
unsigned char Key_Scan(int *Num){
  RowType *pt;
  unsigned char column,key;
  int j;
  (*Num) = 0;
  key = 0;     // default values
  pt = &ScanTab[0];
  while(pt->direction){
    DDRP = pt->direction;  // one output
    column = PTP;          // read columns
    for(j=3; j>=0; j--){
      if((column&0x01)==0){
        key = pt->keycode[j];
        (*Num)++;
      }
      column >>= 1;  // shift into position
    }
    pt++;
  }
  return key;
}
Program 8.5 Software interface of a matrix scanned keyboard.

Observation: An n by n matrix keypad has n² keys, but requires only 2n I/O pins. You can detect any 0, 1, or 2 key combinations, but it has trouble when 3 or more are pressed.

Checkpoint 8.3: What happens if the three keys ‘1’, ‘2’, and ‘5’ are all pressed?

Checkpoint 8.4: Why wouldn’t you use a matrix approach when creating a music keyboard for an electric piano?
The key wakeup and input capture will be presented in the next chapter. Either mechanism can be used to generate interrupts on touch and release. We can “arm” this interface for interrupts by driving all the rows to zero.
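The three-key limitation of a scanned matrix can be explored with a small desktop simulation. The sketch below is our own model, not part of the keyboard driver: it assumes ideal switches with no diodes, so a column reads low whenever any chain of closed switches connects it to the driven row. The name Key_ColumnsLow and the bit-mask encoding are hypothetical.

```c
/* pressed[r] is a 4-bit mask for row r: bit c set means the switch at
   row r, column c is closed. Returns the mask of columns that read low
   when row 'driven' is pulled low and the other rows are left at hiZ. */
unsigned char Key_ColumnsLow(const unsigned char pressed[4], int driven){
  unsigned char rowsLow = (unsigned char)(1u << driven);
  unsigned char colsLow = 0;
  int changed = 1;
  while(changed){            /* repeat until no new row or column goes low */
    changed = 0;
    for(int r = 0; r < 4; r++){
      if(rowsLow & (1u << r)){
        /* a low row pulls down every column it touches through a closed switch */
        if((colsLow | pressed[r]) != colsLow){
          colsLow |= pressed[r];
          changed = 1;
        }
      } else if(pressed[r] & colsLow){
        /* a hiZ row is dragged low through a closed switch on a low column */
        rowsLow |= (unsigned char)(1u << r);
        changed = 1;
      }
    }
  }
  return colsLow;
}
```

With keys ‘7’ and ‘8’ (row 1, columns 3 and 2) and key ‘0’ (row 0, column 2) closed, scanning row 0 returns columns 3 and 2 low, so the unpressed ‘c’ key appears pressed. With only ‘7’ and ‘0’ closed, scanning row 0 returns column 2 alone, as expected.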
8.5 Parallel Port LCD Interface with the HD44780 Controller
Microprocessor controlled LCD displays are widely used, having replaced most of their LED counterparts, because of their low power and flexible display graphics. This example will illustrate how a handshaked parallel port of the microcomputer will be used to output to the LCD display. The hardware for the display uses an industry standard HD44780 controller, as shown in Figure 8.13. The low-level software initializes and outputs to the HD44780 controller. The 9S12 simply writes ASCII characters to the HD44780 controller. Each ASCII character is mapped into a 5 by 8 bit pixel image, called a font. A 1 by 16 LCD display is 80 pixels wide by 8 pixels tall, and the HD44780 is responsible for refreshing the pixels in a raster-scanned manner similar to maintaining an image on a TV screen or computer monitor.
Figure 8.13 Interface of an HD44780 LCD controller.
[Figure: 1 by 16 LCD display with HD44780 controller and 5 by 8 bit font. LCD pins 1 to 14 are Vss (ground), Vdd (power), Vee (contrast, set by a 10 kΩ potentiometer), RS, R/W, E, and DB0 to DB7. The 9S12 drives RS, R/W, and E from PH0, PH1, and PH2, and DB0 to DB7 from PP0 to PP7.]
There are four types of access cycles to the HD44780 depending on RS and R/W, as shown in Table 8.12.

Table 8.12 Two control signals specify the type of access to the HD44780.
RS   R/W   Cycle
0    0     Write to Instruction Register
0    1     Read Busy Flag (bit 7)
1    0     Write data from µP to the HD44780
1    1     Read data from HD44780 to the µP
Normally, you write ASCII characters into the data buffer (called DDRAM in the data sheets) to have them displayed on the screen. However, you can create up to eight new characters on the LCD by writing to the CGRAM; notice the University of Texas (UT) symbol in Figure 8.14. These new characters exist as ASCII data 0 to 7.

Figure 8.14 HD44780-based LCD display interfaced to a 9S12. (Courtesy of Jonathan Valvano.)
Two types of synchronization can be used, blind-cycle and busy-waiting. Most operations require 40 µs to complete while some require 1.64 ms. This implementation uses the timer to create the blind-cycle wait. A busy-waiting interface would have provided feedback to detect a faulty interface, but has the problem of creating a software crash if the LCD never finishes. A better interface would have utilized both busy-waiting and blind-cycle, so that the software can return with an error code if a display operation does not finish on time (due to a broken wire or damaged display).

First we present a low-level private helper function; see Program 8.6. This function would not have a prototype in the LCD.H file.

E   equ 4    ;PH2
RW  equ 2    ;PH1
RS  equ 1    ;PH0
; Output command to LCD
; Inputs: RegA is command, Outputs: none
OutCmd staa PTP
       movb #0,PTH     ;E=0, RS=0, R/W=0
       movb #E,PTH     ;E=1, RS=0, R/W=0
       movb #0,PTH     ;E=0, RS=0, R/W=0
       ldd  #40
       jsr  Timer_Wait ;at least 37us
       rts

#define E  4  // on PH2
#define RW 2  // on PH1
#define RS 1  // on PH0
void OutCmd(unsigned char command){
  PTP = command;
  PTH = 0;         // E=0, R/W=0, RS=0
  PTH = E;         // E=1, R/W=0, RS=0
  PTH = 0;         // E=0, R/W=0, RS=0
  Timer_Wait(40);  // at least 37us
}
Program 8.6 Private functions for an HD44780 controlled LCD display.
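The combined approach mentioned above, busy-waiting with a bound on the wait, can be sketched in C as follows. This is our own illustration and not part of the LCD module. The function that reads the busy flag is passed in as a parameter, and SimReadBusy is a desktop stand-in for the display (both names are hypothetical). On hardware, reading the flag would use the RS=0, R/W=1 cycle of Table 8.12 with Port P made an input, which is not shown here.

```c
/* Simulated LCD (assumption): reports busy for the next SimBusyPolls reads. */
unsigned int SimBusyPolls;
int SimReadBusy(void){
  if(SimBusyPolls > 0){
    SimBusyPolls--;
    return 1;   /* still busy */
  }
  return 0;     /* ready */
}

/* Poll the busy flag through readBusy at most maxPolls times.
   Returns 0 as soon as the LCD reports ready, -1 if it never does. */
int LCD_WaitReady(int (*readBusy)(void), unsigned int maxPolls){
  while(maxPolls > 0){
    if(readBusy() == 0){
      return 0;   /* operation finished */
    }
    maxPolls--;
  }
  return -1;      /* timeout: broken wire or damaged display */
}
```

The caller can then propagate the -1 as an error code instead of hanging. The poll limit would be chosen from the longest documented operation time and the time each poll takes, which depends on the bus clock.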
Next, we show the high-level public functions, see Program 8.7. These functions would have prototypes in the LCD.H file. The initialization sequence is copied from the data sheet of the HD44780. Figure 8.15 shows a rough sketch of the E, RS, R/W and data signals as the LCD_OutChar function is executed.
; Initialize HD44780 LCD display
; Inputs: none, Outputs: none
LCD_Init
      movb #$FF,DDRP     ;LCD data
      movb #$FF,DDRH     ;PH2=E, PH1=R/W, PH0=RS
      jsr  Timer_Init    ;1us TCNT
      ldy  #15
      jsr  Timer_Wait1ms ;15ms
      ldaa #$38          ;first time
      jsr  OutCmd
      ldy  #4
      jsr  Timer_Wait1ms ;4ms
      ldaa #$38          ;second time
      jsr  OutCmd
      ldd  #100
      jsr  Timer_Wait    ;100us
      ldaa #$38          ;third time
      jsr  OutCmd
      ldaa #$38          ;N=1 two line, F=0 5x7
      jsr  OutCmd        ;DL=1 8-bit data
      ldaa #$08          ;display off
      jsr  OutCmd
      jsr  LCD_Clear
      ldaa #$0E          ;set D=1, C=1, B=0
      jsr  OutCmd        ;cursor on, no blink
      ldaa #$06          ;set I/D, S
      jsr  OutCmd        ;inc, no shift
      ldaa #$14          ;cursor move
      jsr  OutCmd        ;right (R/L=1)
      rts
; Output one character to LCD
; Inputs: RegA is ASCII, Outputs: none
LCD_OutChar
      staa PTP
      movb #RS,PTH       ;E=0, R/W=0, RS=1
      movb #E+RS,PTH     ;E=1, R/W=0, RS=1
      movb #RS,PTH       ;E=0, R/W=0, RS=1
      ldd  #40
      jsr  Timer_Wait    ;at least 40us
      rts
LCD_Clear
      ldaa #$01
      jsr  OutCmd        ;Clear Display
      ldd  #1600
      jsr  Timer_Wait    ;at least 1.52ms
      ldaa #$02
      jsr  OutCmd        ;Cursor to home
      ldd  #1600
      jsr  Timer_Wait    ;at least 1.52ms
      rts

void LCD_Clear(void){
  OutCmd(0x01);       // Clear Display
  Timer_Wait(1600);   // 1.6 ms wait
  OutCmd(0x02);       // Cursor to home
  Timer_Wait(1600);   // 1.6 ms wait
}
void LCD_Init(void){
  DDRH = 0xFF;
  DDRP = 0xFF;
  Timer_Init();       // 1us TCNT
  Timer_Wait1ms(15);  // 15 ms
  OutCmd(0x38);       // function set
  Timer_Wait1ms(4);   // 4 ms
  OutCmd(0x38);       // second time
  Timer_Wait(100);    // 100us
  OutCmd(0x38);       // third time
                      // now the busy flag could be read
  OutCmd(0x38);       // 8bit, N=1 2line, F=0 5by7
  OutCmd(0x08);       // D=0 displayoff
  LCD_Clear();
  OutCmd(0x0E);       // D=1 displayon,
                      // C=1 cursoron, B=0 blinkoff
  OutCmd(0x06);       // Entry mode
                      // I/D=1 Increment, S=0 nodisplayshift
  OutCmd(0x14);       // S/C=0 cursormove, R/L=1 shiftright
}
void LCD_OutChar(unsigned char letter){
  // letter is ASCII code
  PTP = letter;
  PTH = RS;           // E=0, R/W=0, RS=1
  PTH = E+RS;         // E=1, R/W=0, RS=1
  PTH = RS;           // E=0, R/W=0, RS=1
  Timer_Wait(40);     // 40 us wait
}
Program 8.7 Public functions for an HD44780 controlled LCD display.
Figure 8.15 Timing diagram of the LCD signals as data is sent to the HD44780 display.
Checkpoint 8.5: Assuming the 9S12 is running at 8 MHz, how many µs wide is the E pulse for the assembly language solution in Program 8.7? The movb instruction requires 4 cycles.
8.6 Binary Actuators
8.6.1 Interface
Relays, solenoids, and DC motors are grouped together because their electrical interfaces are similar. We can add speakers to this group if the sound is generated with a square wave. In each case, there is a coil, and the computer must drive (or not drive) current through the coil. To interface a coil, we consider voltage, current, and inductance.

The first consideration is voltage. We need a power supply at the desired voltage requirement of the coil. If the only available power supply is larger than the desired coil voltage, we use a voltage regulator (rather than a resistor divider) to create the desired voltage. We connect the power supply to the positive terminal of the coil, shown as +V in Figure 8.16. We will use a transistor device to drive the negative side of the coil to ground. The computer can turn the current on and off using this transistor.

The second consideration is current. In particular, we must select a power supply and an interface device that can support the coil current. The 7406 is a digital inverter with open collector outputs (hiZ and low). The 2N2222 is a bipolar junction transistor (BJT), NPN type, with moderate current gain. The TIP120 is a Darlington transistor, also NPN type, that can handle larger currents. The IRF540 is a MOSFET transistor that can handle even more current. BJT and Darlington transistors are current-controlled (meaning the output is a function of the input current), while the MOSFET is voltage-controlled (output is a function of input voltage). When interfacing a coil to the microcontroller, we use information like Table 8.13 to select an interface device capable of the current necessary to activate the coil. It is a good design practice to select a driver with a maximum current at least twice the required coil current. When the digital Port output is high, the interface transistor is active and current flows through the coil. When the digital Port output is low, the transistor is not active and no current flows through the coil.

Figure 8.16 Binary interface to EM relay, solenoid, DC motor or speaker.
[Figure: two circuits. In the first, a 9S12 Port pin drives a 7406, which sinks the coil current IOL at output voltage VOL. In the second, a 9S12 Port pin (VOH) drives, through resistor Rb (base current IB), a 2N2222, TIP120, or IRF540, which sinks the coil current IC (voltages VBE and VCE). In both, the coil is modeled as R, L, and emf in series, is connected to +V, and is protected by a 1N914 diode.]

Table 8.13 Four possible devices that can be used to interface a coil compared to the 9S12.
Device   Type             Maximum Current
9S12     CMOS             10 mA
7406     TTL logic        40 mA
2N2222   BJT NPN          500 mA
TIP120   Darlington NPN   5 A
IRF540   power MOSFET     28 A

Similar to the solenoid and EM relay, the DC motor has a frame that remains motionless, and an armature that moves. In this case, the armature moves in a circular manner (shaft rotation). A DC motor has an electromagnet as well. When current flows through the coil, a magnetic force is created causing a rotation of the shaft. Brushes positioned between the frame and armature are used to alternate the current direction through the coil, so that a DC current generates a continuous rotation of the shaft. When the current is removed, the magnetic force stops, and the shaft is free to rotate. The resistance in the coil (R) comes from the long wire that goes from the + terminal to the - terminal of the motor. The inductance in the coil (L) arises from the fact that the wire is wound into coils to create the electromagnet. The coil itself can generate its own voltage (emf) because of the interaction between the electric and magnetic fields. If the coil is a DC motor, then the emf is a function of both the speed of the motor and the developed torque (which in turn is a function of the applied load on the motor). Because of the internal emf of the coil, the current will depend on the mechanical load. For example, a DC motor running with no load might draw 50 mA, but under load (friction) the current may jump to 500 mA.

Observation: It is important to realize that many devices cannot be connected directly to the microcontroller. In the specific case of motors, we need an interface that can handle the voltage and current required by the motor.
The third consideration is inductance in the coil. The 1N914 diode in Figure 8.16 provides protection from the back emf generated when the switch is turned off, and the large dI/dt across the inductor induces a large voltage (on the negative terminal of the coil), according to V = L•dI/dt. For example, if you are driving 0.1 A through a 0.1 mH coil (Port output = 1) using a 2N2222, then disable the driver (Port output = 0), the 2N2222 will turn off in about 20 ns. This creates a dI/dt of at least 5•10^6 A/s, producing a back emf of 500 V! The 1N914 diode shorts out this voltage, protecting the electronics from potential damage. The 1N914 is called a snubber diode.

If you are sinking 16 mA (IOL) with the 7406, the output voltage (VOL) will be 0.4 V. However, when the IOL of the 7406 equals 40 mA, its VOL will be 0.7 V. 40 mA is not a lot of current when it comes to typical coils. However, the 7406 interface is appropriate to control small reed relays.

Checkpoint 8.6: A reed relay is interfaced with the 7406 circuit in Figure 8.16. The positive terminal of the coil is connected to +5 V and the coil requires 40 mA. What will be the voltage across the coil when active?
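The back emf estimate is a direct application of V = L•dI/dt. The short C function below is our own illustration (the name BackEmf is hypothetical); it treats the turn-off as a linear ramp, which is a simplification of the real transistor behavior.

```c
/* Magnitude of the back emf in volts, V = L*dI/dt, when a current of
   deltaI amps through an inductance of 'inductance' henries is
   interrupted in deltaT seconds (linear ramp assumed). */
double BackEmf(double inductance, double deltaI, double deltaT){
  return inductance * deltaI / deltaT;
}
```

Calling BackEmf(0.1e-3, 0.1, 20e-9) reproduces the 500 V figure from the example. Halving the turn-off time doubles the voltage, which is why a fast switch needs the snubber diode.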
There are lots of motor driver chips, but they are fundamentally similar to the circuits shown in Figure 8.16. For the 2N2222 and TIP120 NPN transistors, if the Port output is low, no current can flow into the base, so the transistor is off, and the collector current, IC, will be zero. If the Port output is high, current does flow into the base and VBE goes above VBEsat, turning on the transistor. The transistor is in the linear range if VBE ≤ VBEsat and Ic = hfe•Ib. The transistor is in the saturated mode if VBE ≥ VBEsat, VCE = 0.3 V and Ic < hfe•Ib. We select the resistor for the NPN transistor interfaces to operate right at the transition between linear and saturated mode. We start with the desired coil current, Icoil (the voltage across the coil will be +V − VCE, which will be about +V − 0.3 V). Next, we calculate the needed base current (Ib) given the current gain of the NPN (hfe); see Table 8.14.

$$I_b = I_{coil}/h_{fe}$$

Finally, given the output high voltage of the microcontroller (VOH is about 5 V) and the base-emitter voltage of the NPN (VBEsat) needed to activate the transistor, we can calculate the desired interface resistor.

$$R_b \le (V_{OH} - V_{BEsat})/I_b = h_{fe}\,(V_{OH} - V_{BEsat})/I_{coil}$$

The inequality means we can choose a smaller resistor, creating a larger Ib. Because the hfe of the transistors can vary a lot, it is a good design practice to make the Rb resistor about 1/2 the value shown in the above equation. Since the transistor is saturated, the increased base current produces the same VCE and thus the same coil current.
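The resistor selection can be written as two small C functions. This is a sketch under the assumptions stated in the text (operation at the edge of saturation, then a factor of two of margin); the function names are our own.

```c
/* Largest base resistor in ohms that still saturates the NPN:
   Rb <= hfe*(VOH - VBEsat)/Icoil, voltages in volts, current in amps. */
double Rb_Max(double hfe, double voh, double vbesat, double icoil){
  return hfe * (voh - vbesat) / icoil;
}

/* Design value: half of the maximum, to cover variation in hfe. */
double Rb_Design(double hfe, double voh, double vbesat, double icoil){
  return Rb_Max(hfe, voh, vbesat, icoil) / 2.0;
}
```

For a TIP120 (hfe = 1000, VBEsat = 2.5 V) driving a 1 A coil from a 5 V output, Rb_Max gives 2500 Ω and Rb_Design gives 1250 Ω. In practice we would then pick the nearest standard resistor value at or below the design value.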
Table 8.14 Design parameters for the 2N2222 and TIP120.
Parameter           2N2222 (IC = 150 mA)   2N2222 (IC = 500 mA)   TIP120 (IC = 3 A)
hfe                 100                    40                     1000
VBEsat              0.6 V                  2 V                    2.5 V
VCE at saturation   0.3 V                  1 V                    2 V
The IRF540 MOSFET is a voltage-controlled device. If the Port output is low, the MOSFET is off, and the coil current will be zero. If the Port output is high, the MOSFET is on, and the VCE will be very close to 0. No resistor is needed between the Port output and the gate of the MOSFET, but often we add a resistor (e.g., Rb = 1 kΩ) to limit current into and out of the 9S12 during the turn on/off transients.

Because of the resistance of the coil, there will not be significant dI/dt when the device is turned on. Consider a DC motor as shown in Figure 8.16 with +V = 12 V, R = 50 Ω, and L = 100 µH. Assume we are using a 2N2222 with a VCE of 1 V at saturation. Initially the motor is off (no current to the motor). At time t = 0, the digital port goes from 0 to +5 V and the transistor turns on. Assuming for this section that the emf is zero (the motor has no external torque applied to the shaft) and that the transistor turns on instantaneously, we can derive an equation for the motor current (Ic) as a function of time. The voltage across L and R together is 12 − VCE = 11 V at time 0+. At time 0+, the inductor is an open circuit. Conversely, at time ∞, the inductor is a short circuit. The Ic at time 0− is 0, and the current will not change instantaneously because of the inductor. Thus, the Ic is 0 at time 0+. The Ic is 11 V/50 Ω = 220 mA at time ∞.

$$11\ \mathrm{V} = I_c R + L\,\frac{dI_c}{dt}$$

The general solution to this differential equation is

$$I_c = I_0 + I_1 e^{-t/\tau} \qquad \frac{dI_c}{dt} = -\frac{I_1}{\tau}\,e^{-t/\tau}$$

We plug the general solution into the differential equation and boundary conditions.

$$11\ \mathrm{V} = \left(I_0 + I_1 e^{-t/\tau}\right) R - L\,\frac{I_1}{\tau}\,e^{-t/\tau}$$

To solve the differential equation, the time constant will be τ = L/R = 2 µs. Using initial conditions, we get

$$I_c = 220\ \mathrm{mA}\,\left(1 - e^{-t/2\,\mu\mathrm{s}}\right)$$
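As a quick check of this result (a worked evaluation we add here, using only the final value of 220 mA and the time constant of 2 µs derived above), we can evaluate the current after one and after five time constants:

```latex
I_c(2\,\mu\mathrm{s})  = 220\ \mathrm{mA}\,\left(1 - e^{-1}\right) \approx 139\ \mathrm{mA}
\qquad
I_c(10\,\mu\mathrm{s}) = 220\ \mathrm{mA}\,\left(1 - e^{-5}\right) \approx 218.5\ \mathrm{mA}
```

So the current reaches about 63% of its final value after 2 µs and is within 1% of it after 10 µs, which supports the statement that turning the device on does not create a significant dI/dt.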
Example 8.3 Design an interface for two 12 V 1A geared DC motors. These two motors will be used to propel a robot with two independent drive wheels as shown in Figure 8.17.
Figure 8.17 Geared DC motors provide a good torque and speed for light-weight robots. (Courtesy of Jonathan Valvano.)
Solution We will use two copies of the TIP120 circuit in Figure 8.16 because the TIP120 can sink at least three times the current needed for this motor. We select a +12 V supply and connect it to the +V in the circuit. The needed base current is

$$I_b = I_{coil}/h_{fe} = 1\ \mathrm{A}/1000 = 1\ \mathrm{mA}$$

The desired interface resistor is

$$R_b \le (V_{OH} - V_{BEsat})/I_b = (5 - 2.5)\ \mathrm{V}/1\ \mathrm{mA} = 2.5\ \mathrm{k\Omega}$$

To cover the variability in hfe, we will use a 1.24 kΩ resistor instead of the 2.5 kΩ. The actual voltage on the motor when active will be 12 − 2 = 10 V. The coils and transistors can vary a lot, so it is appropriate to experimentally verify the design by measuring the voltages and currents.
8.6.2 Electromagnetic and Solid-State Relays
A relay is a device that responds to a small current or voltage change by activating switches or other devices in an electric circuit. It is used to remotely switch signals or power. The input control is usually electrically isolated from the output switch. The input signal determines whether the output switch is open or closed. Relays are classified into three categories depending upon whether the output switches power (i.e., high currents through the switch) or electronic signals (i.e., low currents through the switch). Another difference is how the relay implements the switch. An electromagnetic (EM) relay uses a coil to apply EM force to a contact switch that physically opens and closes. The solid state relay (SSR) uses transistor switches made from solid state components to electronically allow or prevent current flow across the switch. The three types are:
• The classic general purpose relay has an EM coil and can switch AC power
• The reed relay has an EM coil and can switch low level DC electronic signals
• The solid state relay (SSR) has an input triggered semiconductor power switch

Two solid state relays are shown in Figure 8.18. Interfacing an SSR is identical to interfacing an LED, which was previously described in Section 2.8.3, Figure 2.17. An SSR interface was presented earlier as Figure 3.10. SSRs allow the microcontroller to switch AC loads from 1 to 30 A. They are appropriate in situations where the power is turned on and off many times.
Figure 8.18 Solid state relays can be used to control power to an AC appliance. (Courtesy of Jonathan Valvano.)
The input circuit of an EM relay is a coil with an iron core. The output switch includes two sets of silver or silver-alloy contacts (called poles). One set is fixed to the relay frame, and the other set is located at the end of leaf spring poles connected to the armature. The contacts are held in the “normally closed” position by the armature return spring. When the input circuit energizes the EM coil, a “pull in” force is applied to the armature and the “normally closed” contacts are released (called break) and the “normally open” contacts are connected (called make). The armature pull in can either energize or de-energize the output circuit depending on how it is wired. Relays are mounted in special sockets, or directly soldered onto a PC board.

The number of poles (e.g., single pole, double pole, 3P, 4P, etc.) refers to the number of switches that are controlled by the input. Single throw means each switch has two contacts that can be open or closed. Double throw means each switch has three contacts. The common contact will be connected to one of the other two contacts (but not both at the same time). The parameters of the output switch include maximum AC (or DC) power, maximum current, maximum voltage, on resistance, and off resistance. A DC signal will weld the contacts together at a lower current value than an AC signal; therefore the maximum ratings for DC are considerably smaller than for AC. Other relay parameters include turn on time, turn off time, life expectancy, and input/output isolation. Life expectancy is measured in number of operations. Figure 8.19 illustrates the various configurations available. The sequence of operation is described in Table 8.15.

Figure 8.19 Standard relay configurations.
[Figure: five relay configurations, each with a coil input (+, -) and numbered output contacts: Form A is SPST-NO (contact 1), Form B is SPST-NC (contact 1), Form C is SPDT (contacts 1 and 2), Form D is SPDT (contacts 1 and 2), and Form E is SPDT (B-M-B) (contacts 1, 2, and 3).]

Table 8.15 Standard definitions for five relay configurations.
Form   Activation Sequence         Deactivation Sequence
A      Make 1                      Break 1
B      Break 1                     Make 1
C      Break 1, Make 2             Break 2, Make 1
D      Make 1, Break 2             Make 2, Break 1
E      Break 1, Make 2, Break 3
8.6.3 Solenoids
Solenoids are used in discrete mechanical control situations such as door locks, automatic disk/tape ejectors, and liquid/gas flow control valves (on/off type). Much like an EM relay, there is a frame that remains motionless, and an armature that moves in a discrete fashion (on/off). A solenoid has an electromagnet. When current flows through the coil, a magnetic force is created causing a discrete motion of the armature. Each of the solenoids shown in Figure 8.20 has a cylindrically shaped armature that moves in the horizontal direction relative to the photograph. The solenoid on the top is used in a door lock, and the second from top is used to eject the tape from a video cassette player. When the current is removed, the magnetic force stops, and the armature is free to move. The motion in the opposite direction can be produced by a spring, gravity, or by a second solenoid.
Figure 8.20 Photo of four solenoids. (Courtesy of Jonathan Valvano.)
8.7 *Pulse-Width Modulation
In the previous interfaces the microcontroller was able to control electrical power to a device in a binary fashion: either all on or all off. Sometimes it is desirable for the microcontroller to be able to vary the delivered power over a range. One effective way to do this is to use pulse width modulation (PWM). The basic idea of PWM is to create a digital output wave of fixed frequency, but allow the microcontroller to vary its duty cycle. Figure 8.21 shows various waveforms that are high for H cycles and low for L cycles. The system is designed in such a way that H + L is constant (meaning the frequency is fixed). The duty cycle is defined as the fraction of time the signal is high:

$$\mathrm{Duty} = \frac{H}{H + L}$$

Hence, duty cycle varies from 0 to 1. We interface this digital output wave to an external actuator (like a DC motor), such that power is applied to the motor when the signal is high, and no power is applied when the signal is low. We purposely select a frequency high enough so the DC motor does not start/stop with each individual pulse, but rather responds to the overall average value of the wave. The average value of a PWM signal is linearly related to its duty cycle and is independent of its frequency. Let P (P = V*I) be the power to the DC motor, shown in Figure 8.21, when the PP0 signal is high. Notice the circuit in Figure 8.21 is one of the examples previously described in Figure 8.16. Under conditions of constant speed and constant load, the delivered power to the motor is linearly related to duty cycle.

$$\text{Delivered power} = \mathrm{duty} \cdot P = \frac{H}{H + L} \cdot P$$

Figure 8.21 Pulse width modulation used to vary power delivered to a DC motor.
[Figure: the 9S12 PWM output PP0 drives, through Rb, a 2N2222, TIP120, or IRF540 that switches a DC motor (modeled as R, L, and emf) connected to +V, with a 1N914 diode. Three PP0 waveforms are shown: H = 200 and L = 50, H = 125 and L = 125, H = 50 and L = 200.]
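The duty cycle and delivered-power relations translate directly into code. The two C functions below are a sketch of our own (the names are hypothetical), valid under the stated conditions of constant speed and constant load.

```c
/* Duty cycle as a fraction from 0 to 1, given the number of cycles the
   wave is high and low. The caller must ensure high + low > 0. */
double PWM_Duty(unsigned int high, unsigned int low){
  return (double)high / ((double)high + (double)low);
}

/* Average delivered power = duty * P, where p is the power delivered
   while the output is high. */
double PWM_Power(unsigned int high, unsigned int low, double p){
  return PWM_Duty(high, low) * p;
}
```

Using the three waveforms of Figure 8.21, PWM_Duty(200, 50), PWM_Duty(125, 125), and PWM_Duty(50, 200) give 0.8, 0.5, and 0.2. In all three cases H + L is 250, so the frequency is the same and only the average power changes.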
Unfortunately, as speed and torque vary, the developed emf will affect delivered power. Nevertheless, PWM is a very effective mechanism, allowing the microcontroller to adjust delivered power.

Appreciating the importance of pulse-width modulation, Freescale added dedicated hardware to handle PWM, not previously available in the 6811. The 9S12C32 has six channels, the 9S12DP512 has eight channels, and the 9S12E128 has 12 channels. This section will present the details on the 9S12DP512. With the exception of the MODRR register, the PWM operation on all 9S12 microcontrollers is identical. Table 8.16 shows the 9S12DP512 registers used to create pulse-width modulated outputs. There are eight 8-bit channels, but two 8-bit channels can be concatenated together to create one 16-bit channel. In particular, each of the 16-bit registers in Table 8.16 could be considered as two separate 8-bit registers. For example, the 16-bit register PWMPER01 could be considered as the two 8-bit registers PWMPER0 (at address $00B4) and PWMPER1 (at address $00B5).

On the 9S12DP512, the PWM channels always use outputs on Port P (PP7-PP0). Bits 4, 5, and 6 of the MODRR register are used to map the SPI channels onto Port P, Port H, or Port S, as described in Table 8.10. Since PWM has precedence over SPI (see Table 4.8), a Port P pin will become a PWM output if the corresponding bit in the PWME register is set (regardless of MODRR and SPI).

On the 9S12C32, the six PWM channels use outputs on Port P (PP5 to PP0) or on Port T (PT4 to PT0). PP5 is available on all 9S12C32 packages, but the other five channels can be connected to either Port P or Port T. If a bit in the MODRR register is 1, the corresponding Port T pin is connected to the PWM system (see Table 8.17). If the bit is 0, the corresponding Port T pin is connected to the timer system.

Table 8.17 9S12C32 MODRR register determines if PWM is on Port P or Port T.
Address   Bit 7   6   5   4        3        2        1        Bit 0    Name
$0247     0       0   0   MODRR4   MODRR3   MODRR2   MODRR1   MODRR0   MODRR
On the 9S12E128, six PWM channels can be created on Port P (PP5 to PP0) and six more on Port U (PU5-PU0). The MODRR register can be used to map the bottom four bits of Port U onto either PWM or a timer module. The PWME register allows you to enable/disable individual PWM channels. The PWMCTL register is used to concatenate two 8-bit channels into one 16-bit PWM. For example, if the CON23 bit is 1, then channels 2 and 3 become one 16-bit channel with the output generated on PP3. Concatenated channels are controlled using the higher of the two channels. For example, concatenated channel 23 is configured with bits PWME3, PPOL3, PCLK3, and CAE3. The PWMPOL register specifies the polarity of the output. Figure 8.22 shows a PWM output for the case when the PPOLx bit is 1. The output will be high for the number of counts in the PWMDTY register. The PWMPER register contains the number of counts in one complete cycle. The duty cycle, defined as the fraction of time the signal is high and expressed as a percent, depends on PWMPER and PWMDTY.
$$\text{Duty cycle} = 100\% \times \frac{\text{PWMDTYx}}{\text{PWMPERx}}$$
Figure 8.22 PWM output generated when PPOL = 1.
[Figure 8.22 diagram: PPx waveform, high for PWMDTYx counts of each PWMPERx-count cycle.]
If the PPOLx bit is 0, the output will be low for the number of counts in the PWMDTY register, as illustrated in Figure 8.23. The duty cycle, defined as the fraction of time the signal is high, is
$$\text{Duty cycle} = 100\% \times \frac{\text{PWMPERx} - \text{PWMDTYx}}{\text{PWMPERx}}$$
Figure 8.23 PWM output generated when PPOL = 0.
[Figure 8.23 diagram: PPx waveform, low for PWMDTYx counts of each PWMPERx-count cycle.]
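The two duty-cycle formulas can be checked on a host computer with a few lines of C. This is a minimal sketch, not 9S12 driver code: the function name pwm_duty_percent is hypothetical, the register values are passed in as ordinary integers, and the result is truncated to a whole percent.

```c
#include <stdint.h>

/* Duty cycle in percent (truncated toward zero) for one PWM channel.
   dty  : value of PWMDTYx
   per  : value of PWMPERx
   ppol : the channel's PPOLx bit (1 = high for dty counts, 0 = low for dty counts)
   Assumption of this sketch: per == 0 or dty > per is treated as invalid
   and reported as 0. */
uint32_t pwm_duty_percent(uint32_t dty, uint32_t per, int ppol){
  if(per == 0 || dty > per) return 0;
  uint32_t high = ppol ? dty : (per - dty);  /* counts spent high */
  return (100u * high) / per;
}
```

With PWMPERx equal to 250, a PWMDTYx of 50 gives 20% when PPOLx is 1 and 80% when PPOLx is 0.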
There are many possible choices for the clock. The base clock is derived from the E clock. Activating the PLL affects the E clock, and hence affects the PWM generation. Channels 0, 1, 4, and 5 use either clock A or clock SA. Channels 2, 3, 6, and 7 use either clock B or clock SB. The six bits in the PWMPRCLK register, as shown in Table 8.18, determine the relationship between clocks A and B and the E clock.

Table 8.18 Clock A and Clock B prescale in PWMPRCLK.

PCKB2  PCKB1  PCKB0  Clock B    PCKA2  PCKA1  PCKA0  Clock A
0      0      0      E          0      0      0      E
0      0      1      E/2        0      0      1      E/2
0      1      0      E/4        0      1      0      E/4
0      1      1      E/8        0      1      1      E/8
1      0      0      E/16       1      0      0      E/16
1      0      1      E/32       1      0      1      E/32
1      1      0      E/64       1      1      0      E/64
1      1      1      E/128      1      1      1      E/128
It is possible to divide the A and B clocks further using the PWMSCLA and PWMSCLB registers. The frequency of the SA clock is the frequency of the A clock divided by two times the value in the PWMSCLA register. Similarly, the frequency of the SB clock is the frequency of the B clock divided by two times the value in the PWMSCLB register. If the value in PWMSCLA(B) is 0, then a divide by 512 is selected. The clock used for each channel is determined by the PWMCLK register. The period of the PWM output is the period of the selected clock times the value in the PWMPER register.

PCLKn = 1  Clock SB is the clock source for PWM channel n, where n = 7, 6, 3, or 2
PCLKn = 0  Clock B is the clock source for PWM channel n
PCLKm = 1  Clock SA is the clock source for PWM channel m, where m = 5, 4, 1, or 0
PCLKm = 0  Clock A is the clock source for PWM channel m

Let n be the 3-bit value for PCKA2-0 in the PWMPRCLK register, and let the E clock period be PeriodE. Then if the A clock is selected for channel x, the periods of the A clock and PWM output will be
$$\text{PeriodA} = 2^n \times \text{PeriodE}$$
$$\text{PeriodPTx} = 2^n \times \text{PWMPERx} \times \text{PeriodE}$$
If the SA clock is selected for channel x, the periods of the SA clock and PWM output will be
$$\text{PeriodSA} = 2^n \times 2 \times \text{PWMSCLA} \times \text{PeriodE}$$
or
$$\text{PeriodSA} = 2^n \times 512 \times \text{PeriodE} \quad \text{(if PWMSCLA equals 0)}$$
$$\text{PeriodPTx} = 2^n \times 2 \times \text{PWMSCLA} \times \text{PWMPERx} \times \text{PeriodE}$$
or
$$\text{PeriodPTx} = 2^n \times 512 \times \text{PWMPERx} \times \text{PeriodE} \quad \text{(if PWMSCLA equals 0)}$$
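The period equations above can be collected into one small host-side C function, which is handy for checking a configuration before writing it to the registers. This is only a sketch under stated assumptions: the name pwm_period_cycles is hypothetical, the result is in units of E clock cycles (multiply by PeriodE to get time), and the arguments are plain integers rather than reads of the real registers.

```c
#include <stdint.h>

/* Period of the PWM output in E clock cycles.
   n      : 3-bit prescale field (PCKA2-0 or PCKB2-0), 0 to 7
   scl    : PWMSCLA or PWMSCLB value, 0 to 255 (0 selects divide by 512)
   scaled : 1 if the channel uses clock SA or SB, 0 if it uses clock A or B
   per    : PWMPERx value (8-bit or concatenated 16-bit) */
uint32_t pwm_period_cycles(unsigned n, unsigned scl, int scaled, uint32_t per){
  uint32_t prescale = 1u << (n & 7u);          /* 2^n */
  if(scaled){
    prescale *= (scl == 0) ? 512u : 2u * scl;  /* extra SA/SB divider */
  }
  return prescale * per;                       /* fits in 32 bits for all inputs */
}
```

For example, n = 5, PWMSCLA = 5, and PWMPERx = 250 on clock SA give 80000 E cycles, which is 10 ms when PeriodE is 125 ns.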
The design of a PWM system considers three factors. The first factor is the period of the PWM output. Most applications choose a period, initialize the waveform at that period, and adjust the duty cycle dynamically. The second factor is precision, which is the total number of duty cycles that can be created. An 8-bit PWM channel may have up to 256 different outputs, while a 16-bit channel can potentially create up to 65536 different duty cycles. More specifically, since the duty cycle register must be less than or equal to the period register (i.e., PWMDTYx ≤ PWMPERx), the precision of the system will equal PWMPERx + 1 alternatives. The last consideration is the number of channels. The 9S12DP512 supports up to eight 8-bit channels or four 16-bit channels. It is possible to mix and match, creating for example four 8-bit channels and two 16-bit channels. Different versions of the 9S12 will have different numbers of PWM channels.
Example 8.4 Implement a 10-ms 8-bit PWM.
Solution The software for this module will have two public functions, one function to turn it on, and a second function to set the duty cycle. In this design example, we will create the PWM output using channel 0 generated on the PP0 output, using the hardware shown in Figure 8.21. In order to maximize precision, it is best to create the 10 ms period using as large a value in PWMPER0 as possible. We have the limitation that the prescale and PWMPER0 factors must be integers. Since 10 ms/256 equals 39.0625 µs, we need a clock period just larger than 39 µs. The fastest clock that can be used has a period of 40 µs, resulting in PWMPER0 equal to 250. Assuming the E clock period is 125 ns, the prescale needs to be 40/0.125 or 320. There are a number of ways to make this happen, but one way is to select Clock A to be E/32, create SA = A/10, then select the SA clock for channel 0, as shown in Program 8.8.
Checkpoint 8.7: Give another way to create a prescale of 320 on channel 0.
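The search for the fastest usable clock can also be automated. The following host-side C sketch looks for the smallest prescale the hardware can produce such that the prescale times the period register equals the desired number of E cycles exactly. The names prescale_ok and pwm_choose are hypothetical, and the sketch assumes the largest legal period value is 255 for an 8-bit channel and 65535 for a 16-bit channel.

```c
#include <stdint.h>

/* Returns 1 if the PWM clock logic can divide E by exactly p:
   either 2^n (clock A or B) or 2^n * 2 * s with s = 1..256 (clock SA or SB,
   where s = 256 corresponds to a scale register value of 0). */
static int prescale_ok(uint32_t p){
  for(unsigned n = 0; n <= 7; n++){
    uint32_t a = 1u << n;
    if(p == a) return 1;
    if(p % (2u * a) == 0 && p / (2u * a) <= 256u) return 1;
  }
  return 0;
}

/* total   : desired PWM period in E cycles (for example 10 ms / 125 ns = 80000)
   max_per : largest allowed period register value
   Finds the smallest achievable prescale, which gives the largest period
   value and therefore the best precision. Returns 1 on success. */
int pwm_choose(uint32_t total, uint32_t max_per,
               uint32_t *prescale, uint32_t *per){
  if(total == 0 || max_per == 0) return 0;
  for(uint32_t p = 1; p <= 128u * 512u; p++){
    if(total % p == 0 && total / p <= max_per && prescale_ok(p)){
      *prescale = p;
      *per = total / p;
      return 1;
    }
  }
  return 0;
}
```

Called with 80000 cycles and a maximum of 255, the search settles on a prescale of 320 and a period value of 250, matching the hand calculation in the example.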
PWM_Init0               ;10ms PWM on PP0
      bset PWME,#$01    ;enable chan 0
      bset PWMPOL,#$01  ;high then low
      bset PWMCLK,#$01  ;Clock SA
      ldaa PWMPRCLK
      anda #$F8
      oraa #$05
      staa PWMPRCLK     ;A=E/32
      movb #5,PWMSCLA   ;SA=A/10
      movb #250,PWMPER0 ;10ms period
      clr  PWMDTY0      ;initially off
      rts
PWM_Duty0               ;RegA is duty cycle
      staa PWMDTY0      ;0 to 250
      rts

// 10ms PWM on PP0
void PWM_Init0(void){
  PWME |= 0x01;      // enable channel 0
  PWMPOL |= 0x01;    // PP0 high then low
  PWMCLK |= 0x01;    // Clock SA
  PWMPRCLK = (PWMPRCLK&0xF8)|0x05; // A=E/32
  PWMSCLA = 5;       // SA=A/10, 0.125*320=40us
  PWMPER0 = 250;     // 10ms period
  PWMDTY0 = 0;       // initially off
}
// Set the duty cycle on PP0 output
void PWM_Duty0(unsigned char duty){
  PWMDTY0 = duty;    // 0 to 250
}
Program 8.8 Implementation of an 8-bit PWM output.
Checkpoint 8.8: How would you modify Program 8.8 to have a period of 100 ms?
Example 8.5 Implement a 1-second 16-bit PWM.
Solution Again this module will have two public functions, one function to turn it on, and a second function to set the duty cycle. To create a 16-bit PWM we need to concatenate two 8-bit channels. We could have used channels 01, 23, 45, or 67. In this example, we choose to create the PWM output using concatenated channel 23 with its output generated on the PP3 output. In order to maximize precision, it is best to create the 1 s period using as large a value in PWMPER23 as possible. Since 1 s/65536 equals 15.2587890625 µs, we need a clock period just larger than 15 µs. The fastest clock that can be used has a period of 16 µs, resulting in PWMPER23 equal to 62500. Assuming the E clock period is 125 ns, the prescale needs to be 16/0.125 or 128. There are a number of ways to make this happen, but one way is to make Clock B equal to E/128, then select the B clock for channel 23, as shown in Program 8.9.
PWM_Init3               ;1s PWM on PP3
      bset PWME,#$08    ;enable chan 3
      bset PWMPOL,#$08  ;high then low
      bclr PWMCLK,#$08  ;Clock B
      bset PWMCTL,#$20  ;concat 2+3
      ldaa PWMPRCLK
      anda #$8F
      oraa #$70
      staa PWMPRCLK     ;B=E/128
      movw #62500,PWMPER23 ;1s period
      movw #0,PWMDTY23  ;off
      rts
PWM_Duty3               ;RegD is duty cycle
      std  PWMDTY23     ;0 to 62500
      rts

// 1s PWM on PP3
void PWM_Init3(void){
  PWME |= 0x08;      // enable channel 3
  PWMPOL |= 0x08;    // PP3 high then low
  PWMCLK &=~0x08;    // Clock B
  PWMCTL |= 0x20;    // Concatenate 2+3
  PWMPRCLK = (PWMPRCLK&0x8F)|0x70; // B=E/128
  PWMPER23 = 62500;  // 1s period
  PWMDTY23 = 0;      // initially off
}
// Set the duty cycle on PP3 output
void PWM_Duty3(unsigned short duty){
  PWMDTY23 = duty;   // 0 to 62500
}
Program 8.9 Implementation of a 16-bit PWM output.
Checkpoint 8.9: What would be the effect of creating the 1 s output using a 1 ms SB clock and a PWMPER23 value of 1000?
Checkpoint 8.10: Are programs 8.9 and 8.10 friendly enough to be used together?
8.8 *Stepper Motors
A motor can be evaluated in terms of its maximum speed (RPM), its torque (N-m), and the efficiency with which it translates electrical power into mechanical power. Sometimes, however, we wish to use a motor to control the rotational position (θ = motor shaft angle) rather than the rotational speed (ω = dθ/dt). Stepper motors are used in applications where precise positioning is more important than high RPM, high torque, or high efficiency. Stepper motors are very popular for microcontroller-based embedded systems because of their inherent digital interface. Figure 8.24 shows three stepper motors. The larger motors provide more
Figure 8.24 Photo of three stepper motors. (Courtesy of Jonathan Valvano.)
torque, but require more current. It is easy for a computer to control both the position and velocity of a stepper motor in an open-loop fashion. Although the cost of a stepper motor is typically higher than an equivalent DC permanent magnetic field motor, the overall system cost is reduced because stepper motors may not require feedback sensors. They are used in printers to move paper and print heads, tapes/disks to position read/write heads, and high-precision robots. For example, the stepper motor shown in Figure 6.8 moves the R/W head from one track to another on an audio tape recorder.
A bipolar stepper motor has two coils on the stator (the frame of the motor), labelled A and B in Figures 8.25 and 8.26. Typically, there is always current flowing through both coils. When current flows through both coils, the motor does not spin (it remains locked at that shaft angle). Stepper motors are rated by their holding torque, which is their ability to hold stationary against a rotational force (torque) when current is constantly flowing through both coils. To move a bipolar stepper, we reverse the direction of current through one (not both) of the coils, see Figure 8.25. To move it again we reverse the direction of current in the other coil. Remember, current is always flowing through both coils. Let the direction of the current be signified by up and down. To make the current go up, the microcontroller outputs a binary 01 to the interface. To make the current go down, it outputs a binary 10. Since there are 2 coils, four outputs will be required (e.g., 0101₂ means up/up). To spin the motor, we output the sequence 0101₂, 0110₂, 1010₂, 1001₂, . . .
Figure 8.25 A bipolar stepper has 2 coils, but a unipolar stepper divides the two coils into four parts.
[Figure 8.25 diagram: a bipolar stepper interface driving coils A and B, and a unipolar stepper interface driving coils A, A', B, and B' with +V connections; both are driven with the output sequence 0101, 0110, 1010, 1001.]
[Figure 8.26 diagram: stator coils A and B and the toothed North/South rotor, drawn for successive outputs starting at 0101, with one coil current flipped (Flip B, Flip A) between drawings.]
Figure 8.26 To rotate this stepper by 18°, the interface flips the direction of one of the currents.
over and over. Each output causes the motor to rotate a fixed angle. To rotate the other direction, we reverse the sequence (0101₂, 1001₂, 1010₂, 0110₂, . . .). There is a North and a South permanent magnet on the rotor (the part that spins). The amount of rotation caused by each current reversal is a fixed angle depending on the number of teeth on the permanent magnets. For example, the rotor in Figure 8.26 is drawn with 5 North teeth and 5 South teeth. If there are n teeth on the South magnet (also n teeth on the North magnet), then the stepper will move 90/n degrees per step. This means there will be 4n steps per rotation. Because moving the motor involves accelerating a mass (rotational inertia) against a load friction, after we output a value, we must wait an amount of time before we can output again. If we output too fast, the motor does not have time to respond. The speed of the motor is related to the number of steps per rotation and the time in between outputs. For information on stepper motors see the data sheets web page at http://users.ece.utexas.edu/~valvano/Datasheets.
The unipolar stepper motor provides for bi-directional currents by using a center tap, dividing each coil into two parts. In particular, coil A is split into coil A and A’, and coil B is split into coil B and B’. The center tap is connected to the +V power source and the four ends of the coils can be controlled with open collector drivers. Because only half of the electromagnets are energized at one time, a unipolar stepper has less torque than an equivalent-sized bipolar stepper. However, unipolar steppers are easier to interface. For example, you can use four copies of the circuit in Figure 8.16 to interface a unipolar stepper motor.
Figure 8.27 shows a circular linked graph containing the output commands to control a stepper motor. This simple FSM has no inputs, four output bits and four states. There is one state for each output pattern in the usual stepper sequence 5, 6, 10, 9, . . . The circular FSM is used to spin the motor in a clockwise direction. Notice the one-to-one correspondence between the state graph in Figure 8.27 and the fsm[4] data structure in Program 8.10.
Figure 8.27 This stepper motor FSM has four states. The 4-bit outputs are given in binary.
[Figure 8.27 diagram: circular state graph S5 (0101) -> S6 (0110) -> S10 (1010) -> S9 (1001) -> S5; each state is labeled with its name and output.]
Example 8.6 Design a stepper motor controller that spins the motor at 6 RPM.
Solution We choose a stepper motor according to the speed and torque requirements of the system. A stepper with 200 steps/rotation will provide a very smooth rotation while it spins. Just like the DC motor, we need an interface that can handle the currents required by the coils. We can use an L293 to interface either unipolar or bipolar steppers that require less than 1 A per coil. In general, the output current of a driver must be large enough to energize the stepper coils. We control the interface using an output port of the microcontroller, as shown in Figure 8.28. The circuit shows the interface of a unipolar stepper, but the bipolar stepper interface is similar except there is no +V connection to the motor. The main program, Program 8.10, begins by initializing the Port T output and the state pointer. Every 50 ms the program outputs a new stepper command. The function Timer_Wait1ms() from Program 4.5 uses the built-in timer to generate an appropriate delay between outputs to the stepper. For a 200 step/rotation stepper, we need to wait 50 ms between outputs to spin at 6 RPM.
$$\text{Speed} = \frac{1\ \text{rotation}}{200\ \text{steps}} \times \frac{1000\ \text{ms}}{1\ \text{s}} \times \frac{60\ \text{s}}{1\ \text{min}} \times \frac{1\ \text{step}}{50\ \text{ms}} = 6\ \text{RPM}$$
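The relationships among teeth, steps per rotation, speed, and wait time can be written as a few helper functions. This is a host-side sketch for checking the arithmetic, not part of the 9S12 program; the function names are hypothetical, and the divisions truncate toward zero.

```c
#include <stdint.h>

/* Steps per rotation for a rotor with n teeth on each magnet (4n). */
uint32_t steps_per_rotation(uint32_t teeth){
  return 4u * teeth;
}

/* Step angle in hundredths of a degree (90/n degrees per step). */
uint32_t step_angle_centideg(uint32_t teeth){
  if(teeth == 0) return 0;
  return 9000u / teeth;
}

/* Wait time between outputs, in ms, to spin at rpm rotations per minute.
   There are 60000 ms per minute and steps_per_rev * rpm steps per minute. */
uint32_t step_delay_ms(uint32_t steps_per_rev, uint32_t rpm){
  if(steps_per_rev == 0 || rpm == 0) return 0;
  return 60000u / (steps_per_rev * rpm);
}
```

For the 5-tooth rotor drawn in Figure 8.26 this gives 20 steps per rotation and 18.00 degrees per step, and for the 200-step motor at 6 RPM it gives the 50 ms wait used in the example.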
Figure 8.28 A unipolar stepper motor interfaced to a Freescale 9S12.
[Figure 8.28 diagram: 9S12 pins PT3, PT2, PT1, and PT0 drive L293 inputs 1A, 2A, 3A, and 4A; L293 outputs 1Y, 2Y, 3Y, and 4Y connect to stepper coils A, A', B, and B', each with a 1N914 diode; the L293 enables (1,2EN and 3,4EN) and logic supply are tied to +5, and the motor supply is +V.]
      org  $4000        ; in ROM
Out   equ  0
Next  equ  1
S5    fcb  5
      fdb  S6
S6    fcb  6
      fdb  S10
S10   fcb  10
      fdb  S9
S9    fcb  9
      fdb  S5
main  lds  #$4000
      jsr  Timer_Init
      movb #$FF,DDRT    ;output to stepper
      ldx  #S5          ;initial state
loop  movb Out,x,PTT    ;output
      ldy  #50
      jsr  Timer_Wait1ms
      ldx  Next,x       ;clockwise step
      bra  loop
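The linked structure in the assembly listing can also be expressed as an array of structures in C. The following is a minimal host-side sketch of that idea and should not be read as the book's own C listing: the type StepState and the functions stepper_output and stepper_next are hypothetical names, and the port write and delay are left to the caller.

```c
#include <stdint.h>

/* One FSM state: the 4-bit coil pattern and the index of the next state. */
typedef struct {
  uint8_t out;   /* value to write to the stepper port */
  uint8_t next;  /* index of the next state in the clockwise sequence */
} StepState;

/* Same four states as S5, S6, S10, and S9 in the assembly listing. */
static const StepState fsm[4] = {
  { 5, 1},  /* S5  -> S6  */
  { 6, 2},  /* S6  -> S10 */
  {10, 3},  /* S10 -> S9  */
  { 9, 0}   /* S9  -> S5  */
};

/* Output pattern for the given state index. */
uint8_t stepper_output(uint8_t state){
  return fsm[state & 3].out;
}

/* Index of the state that follows the given one. */
uint8_t stepper_next(uint8_t state){
  return fsm[state & 3].next;
}
```

On the 9S12, the main loop would write stepper_output(state) to PTT, call Timer_Wait1ms(50), and then set state to stepper_next(state), mirroring the assembly loop.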
To illustrate how easy it is to make changes to this implementation, let’s consider these three modifications. To make it spin in the other direction, we simply change pointers to sequence in the other direction. To make it spin at a different rate, we change the wait time. To implement an eight-step sequence (the half-stepping outputs are 5, 4, 6, 2, 10, 8, 9, 1, . . .), we add the four new states and link all eight states in the desired sequence. These changes can be easily made. Checkpoint 8.11: If the stepper motor were to have 36 steps per rotation, how fast would the motor spin using Program 8.10?
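As an illustration of the direction and half-stepping modifications described above, the eight-step sequence can be held in a table and indexed forward or backward. This is a sketch only; the names half_step, half_step_output, and half_step_next are hypothetical, and an index is used here in place of the linked pointers of the FSM.

```c
#include <stdint.h>

/* Half-stepping output sequence from the text: 5, 4, 6, 2, 10, 8, 9, 1. */
static const uint8_t half_step[8] = {5, 4, 6, 2, 10, 8, 9, 1};

/* Output pattern for a table index (wraps modulo 8). */
uint8_t half_step_output(uint8_t index){
  return half_step[index & 7];
}

/* Next table index: dir >= 0 moves forward through the sequence,
   dir < 0 moves backward, which reverses the rotation. */
uint8_t half_step_next(uint8_t index, int dir){
  return (uint8_t)((index + (dir >= 0 ? 1 : 7)) & 7);
}
```

Changing the rate is then only a matter of changing the wait between successive outputs.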
Checkpoint 8.12: What would you change in Program 8.10 to make the motor spin at 30 RPM?
Performance Tip: Use a DC motor for applications requiring high torque or high speed, and use a stepper motor for applications requiring accurate positioning at low speed.
Performance Tip: To get high torque at low speed, use a geared DC motor (the motor spins at high speed, but the shaft spins slowly).
8.9 Homework Problems
Homework 8.1 Assume the baud rate is 9600 bits/sec. Show the serial port output versus time waveform that occurs when the ASCII characters “ABC” are transmitted one right after another. What is the total time to transmit the three characters?
Homework 8.2 Assume the baud rate is 19200 bits/sec. Show the serial port output versus time waveform that occurs when the ASCII characters “125” are transmitted one right after another. What is the total time to transmit the three characters?
Homework 8.3 Assume the 9S12 E clock is 8 MHz. Write an assembly language subroutine that initializes the serial port to communicate at 9600 bits/sec, 8-bit data, 1 start bit, and 1 stop bit.
Homework 8.4 Sometimes it is important for the software to know when the SCI transmission is complete. The transmit complete (TC) flag is set after the data in the shift register has been transmitted. Rewrite the SCI_OutChar subroutine so that it first writes to the data register, then waits for the TC flag to be set. The TC flag is cleared by first reading the status register with TC set followed by writing into the transmit data register.
Homework 8.5 Design an interface for a 64-key keyboard, which is configured with eight rows and eight columns. Show the hardware interface to Ports H and J. Show the initialization ritual. Assume there is either no key or one key pressed. Write an input subroutine that returns the key number 0 to 63 if a key is pressed or –1 if no key is pressed. Assume the keys do not bounce.
Homework 8.6 Design an interface for a 20-key keyboard, which is configured with four rows and five columns. Show the hardware interface to Ports H and J. Show the initialization ritual. Assume there is either no key or one key pressed. The keys bounce with a maximum time of 1 ms. Use a periodic interrupt at a rate of 2 ms, and scan the keyboard in the ISR. Set a public global variable (called Key) equal to 0 to 19 if a key is pressed or –1 if no key is pressed.
Homework 8.7 Let P be the 16-bit unsigned period of a squarewave in cycles. Each cycle is 500 ns. Calculate the equivalent frequency, f, in Hz. In particular, f = 2000000/P. The input is passed by value in Register D, and the result is also returned by value in Register D.
Homework 8.8 Let P be the 16-bit unsigned period of a squarewave in cycles. Each cycle is 125 ns. Calculate the equivalent frequency, f, in Hz. In particular, f = 8000000/P. The input is passed by value in Register D, and the result is also returned by value in Register D.
Homework 8.9 Interface an electromagnetic relay (2 wires) to the 9S12 pin PP5. The coil requires 250 mA at 5 V. Write a ritual to initialize the interface. Write a subroutine, called On, that activates the relay, and a subroutine, called Off, that deactivates the relay.
Homework 8.10 Interface a solenoid (2 wires) to the 9S12 pin PP5. The coil requires 100 mA at 5 V. Write a ritual to initialize the interface. Write a subroutine, called Pulse, that activates the solenoid for 10 ms (then shuts off). No interrupts are needed; use Timer_Wait.
Homework 8.11 Interface a DC motor (2 wires) to the 9S12. The coil requires 500 mA at 12 V. In addition to the motor output, there are two inputs. When the Go input is high the motor spins (when Go is low, no power is delivered). When the motor is spinning, the other input (Direction) determines the CCW/CW rotational direction. Use an L293 H-bridge driver.
Homework 8.12 There is a 9S12 digital output connected to a 9S12 digital input across a long cable. The connection has an equivalent capacitance of 25 pF into a 10 MΩ resistance. The capacitance results from the long cable, and the resistance results from the input impedance of the 9S12. What is the time constant of this system? If we operate 10 times slower than the time constant, what is the maximum period allowed for this system? List two ways to speed up this transmission.
Homework 8.13 Considering the voltages shown in Table 8.2, prove that you can connect a 9S12 output (VDD = 5 V) to a 7404 input. Similarly, prove that you cannot connect a 7404 output to a 9S12 input. Which logic family types shown in Table 8.2 allow the output of the digital gate to be connected to a 9S12 input? (By the way, if you wanted to connect a 7404 output to a 9S12 input, you could add a 1 kΩ pull-up resistor on the 7404 output to 5 V, increasing the VOH of the output.)
Homework 8.14 Interface a 12-bit DAC, MAX539, to the 9S12 SPI port. Connect MAX539 pins 1, 2, and 3 to the 9S12 SPI. Leave pin 4 not connected. Use a REF03 to create a 2.5 V reference and connect it to the MAX539 pin 6 reference input. Pin 8 is 5 V power and pin 5 is ground. Write two functions, one to initialize and one to update the DAC analog output. Updating the DAC output will require three SPI transmissions.
Homework 8.15 Design an 8-bit PWM driver for Port P pin 5. Implement positive logic (PPOL5 equals 1) and left justified (CAE5 equals 0). There will be three functions: one to initialize the system at 1000 Hz 50% duty cycle, one to set the period, and a third function to set the duty cycle. You should fix the PWMPER5 to a constant value of 250, then allow the user to modify the clock using the second function. Add comments to your software that explain how the PWM driver can be used.
Homework 8.16 Interface a unipolar stepper motor (5 wires) to the 9S12 pins PM3 to 0. Each coil requires 500 mA at 12 V. There are 200 steps per revolution. Write software that spins the motor at 1 rps, using Timer_Wait.
Homework 8.17 Interface a unipolar stepper motor (5 wires) to the 9S12 pins PM3 to 0. Each coil requires 100 mA at 6 V. There are 36 steps per revolution. Write software that spins the motor at 10 rps, using Timer_Wait.
Homework 8.18 Interface a bipolar stepper motor (4 wires) to the 9S12 pins PT3 to 0. Each coil requires 500 mA at 12 V. There are 200 steps per revolution. Write software that spins the motor at 5 rps, using Timer_Wait.
Homework 8.19 Interface a 32 Ω speaker (2 wires) to the 9S12 PT0. To make a sound, output a 1 kHz squarewave to the interface, creating about 1 V peak-to-peak on the speaker (about 30 mA pulsed current). Use the 5 V supply and an NPN transistor. Write a main program to activate the sound.
Homework 8.20 Write open-loop software to control power to the robot shown in Figure 8.17. Assume the two copies of the TIP120 circuit from Figure 8.16 are connected to two 8-bit PWM channels. Write a Motor_Init subroutine to initialize the two PWM channels. Write a Motor_Left subroutine that adjusts delivered power to the left wheel. Write a Motor_Right subroutine that adjusts delivered power to the right wheel. Assume call by value parameters 0 to 250 in RegA for the left and right subroutines.
8.10 Laboratory Assignments
Lab 8.1 Keyboard Device Driver
Purpose: You will design the hardware interface between a keyboard and a microcomputer, create the low-level device driver, interface a single LED, and implement a keyboard security system.
Description: In this keyboard lab, you will design the keyboard interface using busy-wait synchronization. In the next chapter we will learn interrupts. Placing the key input task into a background thread frees the main program to execute other tasks while the software is waiting for the operator to type something. This security system doesn’t have anything else to do, but in a complex system, it is important to be able to perform multiple tasks. The second advantage of interrupts is the ability to create accurate time delays even with a complex software environment. In this implementation, you will use busy-wait. One way to solve the switch-bounce problem is to wait in between scanning the keyboard. The time in between scans must be longer than the bounce time of the switch, but shorter than the total time a key is touched or released. For example, if the switch has a bounce time of 500 µs, then you could scan every 1 ms. If there is exactly one key typed and this key is different from the pattern observed at the time of the scan, then you will return the ASCII code.
This experiment will illustrate how a parallel port of the microcomputer will be used to control a keyboard matrix. In each case your computer will drive the rows (output 0 or HiZ) and read the columns. The low level software (inputs, scans, debounces, and saves keys in a FIFO) runs in a background periodic interrupt thread. Your system must handle two-key rollover. For example, if the operator were to type “1,2,3”, they could push “1”, push “2”, release “1”, push “3”, release “2”, then release “3”.
Low level device drivers normally exist in the BIOS ROM and have direct access to the hardware. They provide the interface between the hardware and the rest of the software. Good low-level device drivers allow:
■ New hardware to be installed
■ New synchronization methods to be implemented (like changing busy-waiting to interrupts)
■ New algorithms to be added (error detection, data compression)
■ Higher level features to be built on top of the low level
and still maintain the same software interface. In larger systems like the workstation and IBM-PC, where the low level I/O software is compiled and burned in ROM separate from the code that will call it, it makes sense to implement the device drivers as software traps or software interrupts (swi) and specify the calling sequence in assembly language. In embedded systems like we use, it is OK to provide a source code file that the user can assemble into their application. Linking is the process of resolving addresses to code and programs that have been compiled separately. In this way, the routines can be called from any program without requiring complicated linking. In other words, when the device driver is implemented with an swi, the linking is built into the operation of the software interrupt instruction. In our embedded system, the assembler will perform the linking.
The concept of a device driver can be illustrated with a prototype device driver. You are encouraged to modify/extend this example, and define/develop/test your own format. A prototype keyboard device driver follows. The device driver software is grouped into four categories.
1. Data structures: global, private (accessed only by the device driver, not the user)
   openFlag   Boolean that is true if the keyboard port is open
              initially false, set to true by Key_Open, set to false by Key_Close
              static storage (or dynamically created at bootstrap time, i.e., when loaded into memory)
2. Initialization routines (called by user)
   Key_Open   Initialization of keyboard port
              Sets openFlag to true
              Initializes hardware
              Returns an error code in RegA if unsuccessful (already open)
              Input Parameters (none)
              Output Parameter (error code)
              Typical calling sequence
                 jsr  Key_Open
                 tsta            ; 0 if opened correctly
                 bne  error
   Key_Close  Release of keyboard port
              Sets openFlag to false
              Returns an error code in RegA if not previously open
              Input Parameters (none)
              Output Parameter (error code)
              Typical calling sequence
                 jsr  Key_Close
                 tsta            ; 0 if closed correctly
                 bne  error
3. Regular I/O calls (called by user to perform I/O)
   Key_In     Input an ASCII character from the keyboard port
              Waits for a key to be pressed, then waits for it to be released (there is bounce and two key rollover)
              Returns data in RegB if successful
              Returns an error code in RegA if unsuccessful: device not open, hardware failure (probably not applicable here)
              Output Parameters: RegB is data, RegA is error code
              Typical calling sequence
                 jsr  Key_In
                 tsta            ; 0 if input is OK
                 bne  error
                 stab data       ; save new key data
   Key_Status Returns the status of the keyboard port
              Returns a true in RegA if a call to Key_In would return with a key
              Returns a false in RegA if a call to Key_In would not return right away, but rather it would wait
              Returns a true if device not open, hardware failure (probably not applicable here)
              Typical calling sequence
           loop  jsr  work       ; perform work until key is typed
                 jsr  Key_Status
                 tsta            ; true if a key is typed
                 beq  loop
                 jsr  Key_In     ; read and process the key
4. Support software (private code). If you have any helper functions, these would be considered local to your driver and would be placed in this category. In C, these helper functions would be defined as private. In C, we could define the helper functions in the .c file, but not place a prototype in the .h file. In this way, the function could only be called from functions in the .c implementation, and not by the user. In assembly language we are very careful not to call a helper function from outside the device driver. An interrupt service routine is an example of support software.
a) Create an I/O window and build a keyboard similar to the one shown in Figure L8.1.
b) Write the low-level keyboard device driver. The main program will implement an access code based security system. Each access code will consist of four digits between 0 and 9.
[Figure L8.1 diagram: top and bottom views of a keypad with keys 0 to 9, 2nd, Clear, Help, and Enter, and nine wires on 0.1" centers.]
Figure L8.1 0-9 keyboard with up arrow, down arrow, 2nd, CLEAR, HELP, and ENTER.
8 䡲 Serial and Parallel Port Interfacing The security system can recognize up to five access codes. You will specify these codes in global memory. The keyboard will be used to enter access codes. If this access code is one of the valid codes, checked by searching the access code database, the single LED is turned on. The LED will remain on until the new key is typed. The main program will need its own data structure to hold the last four keys typed. Assume “1257” and “2222” are valid codes. Following example shows the LED status (0 off, 1 on) after each key hit. 1 2 1 2 5 7 8 9 2 2 2 2 2 2 6 1 2 5 7 4 0 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 1 0 Write a main program to test the keyboard device driver. Collect some latency data (time from key touch to return of Key-In) measurements. c) Build the LED display writing a simple device driver that allows you to turn the LED on and off. d) Write a main program that implements the security system. Lab 8.2 Input/Output Interface to a Stepper Motor Purpose: The purpose of this laboratory is to develop a microcomputer system that spins a stepper motor. Description: a) Design the interface between the stepper and the 9S12. Use the simulator to create three files. Stepper.rtf will contain the assembly source code. Stepper.uc will contain the microcomputer configuration. Stepper.io will define the external connections. You should specify the microcomputer and attach one switch and the four signals to the stepper motor controller. The four stepper motor signals are called B, B, A, and A. b) You will write assembly code that inputs from the switch, and outputs to the stepper. When the input switch is “off” or open position, Port A bit 0 will be “0”. For this situation, your software will not change the Port B stepper motor outputs. When the input switch is “on” or closed position Port A bit 0 will be “1”. In this case, your software will output the sequence 5,6,10,9,5,6,10,9, . . . over and over again to the stepper motor. 
The motor will turn 1.8° for every new output to Port B. Instead of a stepper motor, the four outputs will be connected to four LEDs. The following C program describes the software algorithm.
Program L8.2 The C program to illustrate Lab 8.2.
unsigned char Angle;  // ranges from 0 to 199
void main(void){
  Angle = 0;      // initialize global
  DDRA = 0;       // make Port A inputs
  DDRB = 0xFF;    // make Port B outputs
  while(1){
    while((PORTA&0x01)==0){};  // stop if PA0=0, continue if PA0=1
    PORTB = 5;  Angle++;
    PORTB = 6;  Angle++;
    PORTB = 10; Angle++;
    PORTB = 9;  Angle++;
    if(Angle==200) Angle = 0;
  }
}
The software variable Angle varies from 0 to 199 as the stepper motor angle varies from 0 to 358.2°.
c) During the demonstration, you will be asked to run the program to verify proper operation. Be prepared to use the debugger to determine how fast the simulated motor is spinning. Each output to Port B causes a 1.8° step.
Lab 8.3 Calculator
Purpose: The objectives of this lab are to:
■ Interface a matrix keyboard and HD44780 LCD display to the microcomputer
■ Write device drivers for the keyboard and HD44780 LCD display
■ Implement a four-function integer calculator
Description: In this lab you will design a four-function 8-bit unsigned integer calculator. The matrix keypad will include the numbers ‘0’ through ‘9’, and the keys ‘+’, ‘-’, ‘*’, ‘/’, ‘=’, and ‘C’. The HD44780 LCD display will show both an 8-bit global accumulator and an 8-bit temporary register. You are free to design the calculator functionality in any way you wish, but you must be able to: (1) clear the accumulator and temporary; (2) type numbers in using the matrix keyboard; (3) add, subtract, multiply, and divide; (4) display the results on the HD44780 LCD display. Recall that a device driver is a set of software functions that facilitate the use of an I/O port.
a) Create new program, microcomputer and I/O files. Attach a 16-key matrix keyboard and HD44780 display. You can assume the matrix keyboard does not bounce. During the initial debugging stages of the lab, you may disable the HD44780 busy flag, but your final demonstration will have to include the realistic timing for the LCD.
b) Write a device driver for the HD44780. You should be able to: (1) initialize the interface; (2) clear the display; (3) output a character; (4) output an 8-bit integer; and (5) output a string. The names of all the public driver subroutines should start with the letters “LCD_”. Draw flowcharts of these subroutines.
c) Write a device driver for the matrix keyboard. You should design subroutines as needed. All software that directly accesses the I/O ports connected to the keyboard must be included in this driver. The names of all the public driver subroutines should start with the letters “Key_”. Draw flowcharts of these subroutines.
d) Write the main program that implements the calculator functionality. Include a “call-graph” of the system.
Lab 8.4 Stepper Motor Controller
Purpose: The objectives of this lab are to:
■ Interface a matrix keyboard, an LCD display and a stepper motor to the microcomputer
■ Write device drivers for the keyboard, LCD display and stepper motor
■ Implement a stepper motor controller
Description: In this lab you will design a simple stepper motor controller. The matrix keypad will include the numbers ‘0’ through ‘9’, and the letters ‘c’ and ‘g’. To move the motor, the operator types in the desired angle (0 to 359), then hits the ‘g’ key. As the operator enters the numbers, the digits are displayed on the three-digit LCD. If the operator types ‘c’, the command is cleared, and no motion occurs. The system should move clockwise or counterclockwise, whichever takes fewer steps. While the motor is moving, the three-digit LCD display will show the current angle of the stepper motor (0 to 359). Recall that a device driver is a set of software functions that facilitate the use of an I/O port.
a) Create new program, microcomputer and I/O files. Attach a 12-key matrix keyboard, a three-digit LCD display and one stepper motor. You can assume the matrix keyboard does not bounce.
b) Write a device driver for the 3-digit LCD. You should be able to initialize the interface and output an angle as a number from 0 to 359. The names of all the public driver subroutines should start with the letters “LCD_”. Draw flowcharts of these subroutines.
c) Write a device driver for the matrix keyboard. You should design subroutines as needed. All software that directly accesses the I/O ports connected to the keyboard must be included in this driver. The names of all the public driver subroutines should start with the letters “Key_”. Draw flowcharts of these subroutines.
d) Write a device driver for the stepper interface. You should design subroutines as needed. All software that directly accesses the I/O ports connected to the stepper motor must be included in this driver. The names of all the public driver subroutines should start with the letters “Step_”. Draw flowcharts of these subroutines.
e) Write the main program that implements the stepper motor controller functionality. Include a “call-graph” of the system.
9
Interrupt Programming and Real-Time Systems
Chapter 9 objectives are to:
■ Explain the fundamentals of interrupt programming
■ Introduce interrupt-driven I/O, and implement periodic interrupts
■ Explain key wakeup interrupts and use them to interface individual switches
■ Present the timer-based modules needed for real-time systems
■ Use the pulse accumulator and input capture to measure period and pulse width
■ Develop methods to debug real-time events
An embedded system uses its input/output devices to interact with the external world. Input devices allow the computer to gather information, and output devices can display information. Output devices also allow the computer to manipulate its environment. The tight coupling between the computer and external world distinguishes an embedded system from a regular computer system. The challenge is that in most situations the software executes much faster than the hardware. For example, the software may ask the hardware to clear the LCD display, but within the hardware this action might take 1 ms to complete. During this time, the software could execute thousands of instructions. Therefore, the synchronization between the executing software and its external environment is critical for the success of an embedded system. This chapter begins with an overview of I/O synchronization. We then present general concepts about interrupts, and specific details for the 9S12. We will then use periodic interrupts to cause a software task to be executed on a periodic basis. This chapter also describes the timer-based modules used to design real-time embedded systems.
9.1
I/O Synchronization
Latency is the time between when the I/O device needs service, and the time when service is initiated. Latency includes hardware delays in the digital hardware plus computer software delays. For an input device, software latency (or software response time) is the time between new input data becoming ready and the software reading the data. For an output device, latency is the delay from when the output device becomes idle to when the software gives the device new data to output. In this book, we will also have periodic events. For example, in our data acquisition systems, we wish to invoke the analog to digital converter (ADC) at a fixed time interval. In this way we can collect a sequence of digital values that approximate the continuous analog signal. Software latency in this case is the time between when
the ADC is supposed to be started, and when it is actually started. The microcomputer-based control system also employs periodic software processing. Similar to the data acquisition system, the latency in a control system is the time between when the control software is supposed to be run, and when it is actually run. A real-time system is one that can guarantee a worst case latency. In other words, the software response time is small and bounded.
Throughput or bandwidth is the maximum data flow in bytes/second that can be processed by the system. Sometimes the bandwidth is limited by the I/O device, while other times it is limited by computer software. Bandwidth can be reported as an overall average or a short-term maximum.
Priority determines the order of service when two or more requests are made simultaneously. Priority also determines if a high priority request should be allowed to suspend a low priority request that is currently being processed. We may also wish to implement equal priority, so that no one device can monopolize the computer. In some computer literature, the term “soft real-time” is used to describe a system that supports priority.
The purpose of our interface is to allow the microprocessor to interact with its external I/O device. There are five mechanisms to synchronize the microprocessor with the I/O device. Each mechanism synchronizes the I/O data transfer to the busy to done transition. The methods are discussed in the following paragraphs.
Blind cycle is a method where the software simply waits a fixed amount of time and assumes the I/O will complete before that fixed delay has elapsed. For an input device, the software triggers (starts) the external input hardware, waits a specified time, then reads data from the device, see the left part of Figure 9.1. For an output device, the software writes data to the output device, triggers (starts) the device, then waits a specified time.
We call this method blind, because there is no status information about the I/O device reported to the computer software. It is appropriate to use this method in situations where the I/O delay is short and predictable. One appropriate application of blind cycle synchronization is an ADC. For example, we can ask the ADC to convert, wait exactly 7 µs, then read the digital result. This method works because the ADC conversion time is short and predictable. Another good example of blind cycle synchronization is spinning a stepper motor. If we repeat this 8-step sequence over and over (1) output a 0x05, (2) wait 1 ms, (3) output a 0x06, (4) wait 1 ms, (5) output a 0x0A, (6) wait 1 ms, (7) output a 0x09, (8) wait 1 ms, the motor will spin at a constant speed. The LCD interface developed in Section 8.5 utilized blind cycle synchronization.
Busy Waiting is a software loop that checks the I/O status waiting for the done state. For an input device, the software waits until the input device has new data, then reads it from the input device, see the middle part of Figure 9.1. For an output device, the software writes data, triggers the output device, then waits until the device is finished. Another approach to output device interfacing is for the software to wait until the output device has finished the previous output, write data, then trigger the device. Busy-wait synchronization will be used in situations where the software system is relatively simple and real-time response is not important. The ADC could also have been interfaced with busy-wait synchronization. For example, we can ask the ADC to convert, wait until the sequence conversion flag (SCF) in the ADC is set, then read the digital result.
An interrupt uses hardware to cause special software execution. With an input device, the hardware will request an interrupt when the input device has new data.
The software interrupt service will read from the input device and save the data in a global structure, see the right part of Figure 9.1. With an output device, the hardware will request an interrupt when the output device is idle. The software interrupt service will get data from a global structure, then write to the device. Sometimes we configure the hardware timer to request interrupts on a periodic basis. The software interrupt service will perform a special function. A data acquisition system needs to read the ADC at a regular rate. The 9S12 microcomputer will execute special software (trap) when it tries to execute an illegal instruction. Other computers can be configured to request an interrupt on an access to an illegal address or a
divide by zero. The Freescale microcomputers do not provide for a divide by zero trap, but many computers do. Interrupt synchronization will be used in situations where the system is fairly complex (e.g., a lot of I/O devices) or when real time response is important. Periodic Polling uses a clock interrupt to periodically check the I/O status. At the time of the interrupt the software will check the I/O status, performing actions as needed. With an input device, a ready flag is set when the input device has new data. At the next periodic interrupt after an input flag is set, the software will read the data and save them in a global structure. With an output device, a ready flag is set when the output device is idle. At the next periodic interrupt after an output flag is set, the software will get data from a global structure, and write it. Periodic polling will be used in situations that require interrupts, but the I/O device does not support interrupt requests directly. DMA, or direct memory access, is an interfacing approach that transfers data directly to/from memory. With an input device, the hardware will request a DMA transfer when input device has new data. Without the software’s knowledge or permission the DMA controller will read data from the input device and save it in memory. With an output device, the hardware will request a DMA transfer when the output device is idle. The DMA controller will get data from memory, then write it to the device. Sometimes we configure the hardware timer to request DMA transfers on a periodic basis. DMA can be used to implement a high-speed data acquisition system. DMA synchronization will be used in situations where high bandwidth and low latency are important. One can think of the hardware being in one of three states. The idle state is when the device is disabled or inactive. No I/O occurs in the idle state. When active (not idle) the hardware toggles between the busy and ready states. 
The interface includes a flag specifying either busy (0) or ready (1) status.
■ The hardware will set the flag when the hardware component of the I/O operation is complete.
■ The software can read the flag to determine if the device is busy or ready.
■ The software can clear the flag, signifying the software component is complete.
■ This flag serves as the hardware trigger event for an interrupt.
For an input device, a status flag is set when new input data is available. The “busy to ready” state transition will cause a busy-wait loop to complete, see the middle of Figure 9.1. Once the software recognizes the input device has new data, it will read the data and ask the input device to create more data. It is the busy to ready state transition that signals to the computer that service is required. When the hardware is in the done (ready) state, the I/O transaction is complete. Often the simple process of reading the data will clear the flag and request another input.
Figure 9.1 The input device sets a flag when it has new data.
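[Flowcharts for Figure 9.1. Blind Cycle: wait a fixed time, read data, return. Busy-Wait: loop while the status is busy, read data when ready, return. Interrupt: the interrupt service routine reads data, puts it in the Fifo, and returns from interrupt; the main program gets data from the Fifo once it is not empty.]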
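The blind-cycle and busy-wait input flowcharts of Figure 9.1 can be sketched in C. The following is only a host-side simulation under stated assumptions, not 9S12 driver code: the InputDevice structure, Device_Start, Device_Tick, In_BlindCycle, and In_BusyWait are hypothetical names invented for this illustration, and time is counted in abstract ticks rather than bus cycles.

```c
#include <stdint.h>

/* Hypothetical simulated input device (not a real 9S12 register map). */
typedef struct {
  uint8_t  flag;       /* 0 = busy, 1 = ready (new data available)     */
  uint8_t  data;       /* most recent input value                      */
  uint32_t remaining;  /* simulated ticks until the input is complete  */
  uint8_t  next;       /* value the device will produce next           */
} InputDevice;

/* Software triggers (starts) the device; it is busy for 'duration' ticks. */
void Device_Start(InputDevice *dev, uint32_t duration) {
  dev->flag = 0;
  dev->remaining = duration;
}

/* Advance simulated time by one tick; hardware sets the flag when done. */
void Device_Tick(InputDevice *dev) {
  if (dev->flag == 0) {
    if (dev->remaining > 0) dev->remaining--;
    if (dev->remaining == 0) {
      dev->data = dev->next++;
      dev->flag = 1;           /* busy-to-ready transition */
    }
  }
}

/* Blind cycle: start, wait a fixed time, read without checking status. */
uint8_t In_BlindCycle(InputDevice *dev, uint32_t duration, uint32_t fixedWait) {
  Device_Start(dev, duration);
  for (uint32_t t = 0; t < fixedWait; t++) Device_Tick(dev);
  return dev->data;            /* valid only if fixedWait >= duration */
}

/* Busy-wait: start, poll the flag until it is ready, then read. */
uint8_t In_BusyWait(InputDevice *dev, uint32_t duration) {
  Device_Start(dev, duration);
  while (dev->flag == 0) Device_Tick(dev);   /* wait for the done state */
  return dev->data;
}
```

In this sketch the blind-cycle version returns stale data whenever the fixed wait is shorter than the device time, which is why the method is only appropriate when the I/O delay is short and predictable. The busy-wait version adapts to the device because it reads the status flag.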
The problem with I/O devices is that they are usually much slower than software execution. Therefore, we need synchronization, which is the process of the hardware and software waiting for each other in a manner such that data is properly transmitted. A way to visualize this synchronization is to draw a state versus time plot of the activities of the hardware and software. For an input device, the software begins by waiting for new input. When the input
device is busy it is in the process of creating new input. When the input device is ready, new data is available. When the input device makes the transition from busy to ready, it releases the software to go forward. In a similar way, when the software accepts the input, it can release the input device hardware. The arrows in Figure 9.2 represent the synchronizing events. In this example, the time for the software to read and process the data is less than the time for the input device to create new input. This situation is called I/O bound, meaning the bandwidth is limited by the speed of the I/O hardware. Figure 9.2 The software must wait for the input device to be ready.
[Timing diagram for Figure 9.2. The input device alternates between Busy and Ready; the software cycles through Wait, Read, and Process; the horizontal axis is Time.]
If the input device were faster than the software, then the software waiting time would be zero. This situation is called CPU bound (meaning the bandwidth is limited by the speed of the executing software). From this figure we can see that the bandwidth depends on both the hardware and the software. The busy-wait method is classified as unbuffered because the hardware and software must wait for each other during the transmission of each piece of data. The interrupt solution (shown in the right part of Figure 9.1) is classified as buffered, because the system allows the input device to run continuously, filling a FIFO with data as fast as it can. In the same way, the software can empty the buffer whenever it is ready and whenever there is data in the buffer. We will implement a buffered interface for the serial port input in Chapter 12 using interrupts. For an output device, a status flag is set when the output is idle and ready to accept more data. The “busy to ready” state transition causes a busy-wait loop to complete, see the middle part of Figure 9.3. Once the software recognizes the output is idle, it gives the output device another piece of data to output. It will be important to make sure the software clears the flag each time new output is started. Figure 9.3 The output device sets a flag when it has finished outputting the last data.
[Flowcharts for Figure 9.3. Blind Cycle: write data, wait a fixed time, return. Busy-Wait: loop while the status is busy, write data when ready, return. Interrupt: the main program puts data in the Fifo once it is not full and returns; the interrupt service routine gets data from the Fifo if it is not empty, writes the data, and returns from interrupt.]
Figure 9.4 contains a state versus time plot of the activities of the output device hardware and software. For an output device, the software begins by generating data then sending it to the output device. When the output device is busy it is processing the data. Normally when the software writes data to an output port, that only starts the output process. The time it takes an output device to process data is usually longer than the software execution time. When the output device is done, it is ready for new data. When the output device makes the transition from busy to ready, it releases the software to go forward. In a similar way, when the software writes data to the output, it releases the output device hardware. The output
Figure 9.4 The software must wait for the output device to finish the previous operation.
[Timing diagram for Figure 9.4. The output device alternates between Busy and Ready; the software cycles through Generate, Wait, and Write; the horizontal axis is Time.]
interface illustrated in Figure 9.4 is also I/O bound because the time for the output device to process data is longer than the time for the software to generate and write it. The arrows in Figure 9.4 signify the synchronizing events. Again, I/O bound means the bandwidth is limited by the speed of the I/O hardware. The busy-wait solution for this output interface is also unbuffered, because when the hardware is done, it will wait for the software and after the software generates data, it waits for the hardware. On the other hand, the interrupt solution (shown as the right part of Figure 9.3) is buffered, because the system allows the software to run continuously, filling a FIFO as fast as it wishes. In the same way, the hardware can empty the buffer whenever it is ready and whenever there is data in the FIFO. We will implement a buffered interface for the serial port output in Chapter 12 using interrupts.
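The buffered interfaces described above depend on a FIFO shared between the interrupt service routine and the main program. The following is a minimal sketch of such a queue, assuming a fixed-size circular buffer of bytes; the names FIFO_SIZE, Fifo_Init, Fifo_Put, and Fifo_Get and the capacity of 8 are illustrative assumptions, not the implementation developed later in the book.

```c
#include <stdint.h>

#define FIFO_SIZE 8   /* hypothetical size; FIFO_SIZE-1 bytes are usable */

uint8_t  Fifo[FIFO_SIZE];
uint16_t PutI = 0;    /* index where the next byte will be stored */
uint16_t GetI = 0;    /* index of the oldest byte in the FIFO      */

void Fifo_Init(void) {
  PutI = 0;
  GetI = 0;
}

/* Store one byte. Returns 1 on success, 0 if the FIFO is full. */
int Fifo_Put(uint8_t data) {
  uint16_t next = (uint16_t)((PutI + 1) % FIFO_SIZE);
  if (next == GetI) return 0;   /* full: storing would overwrite data */
  Fifo[PutI] = data;
  PutI = next;
  return 1;
}

/* Remove the oldest byte. Returns 1 on success, 0 if the FIFO is empty. */
int Fifo_Get(uint8_t *data) {
  if (GetI == PutI) return 0;   /* empty */
  *data = Fifo[GetI];
  GetI = (uint16_t)((GetI + 1) % FIFO_SIZE);
  return 1;
}
```

For a buffered input, the interrupt service routine would call Fifo_Put and the main program would call Fifo_Get; for a buffered output the roles are reversed. Keeping one slot unused lets the sketch distinguish full from empty using only the two indices. On a real system the index updates would also have to be safe against interruption, which this host-side sketch does not address.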
9.2
Interrupt Concepts
9.2.1 Introduction
An interrupt is the automatic transfer of software execution in response to a hardware event that is asynchronous with the current software execution. This hardware event is called a trigger. The hardware event can be either a busy to ready transition in an external I/O device (like the SCI input/output) or an internal event (like an op code fault, memory fault, power failure, or a periodic timer). When the hardware needs service, signified by a busy to ready state transition, it will request an interrupt by setting its trigger flag.
A thread is defined as the path of action of software as it executes. The execution of the interrupt service routine is called a background thread. This thread is created by the hardware interrupt request and is killed when the interrupt service routine executes the rti instruction. A new thread is created for each interrupt request. It is important to consider each individual request as a separate thread because local variables and registers used in the interrupt service routine are unique and separate from one interrupt event to the next.
In a multithreaded system, we consider the threads as cooperating to perform an overall task. Consequently we will develop ways for the threads to communicate (e.g., FIFO) and synchronize with each other. Most embedded systems have a single common overall goal. On the other hand, general-purpose computers can have multiple unrelated functions to perform. A process is also defined as the action of software as it executes. Processes do not necessarily cooperate towards a common shared goal. Threads share access to I/O devices, system resources, and global variables, while processes have separate global variables and system resources. Processes do not share I/O devices.
The software has dynamic control over aspects of the interrupt request sequence. First, each potential interrupt trigger has a separate arm bit that the software can activate or deactivate. The software will set the arm bits for those devices it wishes to accept interrupts from, and will deactivate the arm bits within those devices from which interrupts are not to be allowed. In other words, it uses the arm bits to individually select which devices will and which devices will not request interrupts. The second aspect that the software controls is the interrupt enable bit, I, which is in the condition code register. The software can enable interrupts by making I = 0, or it can disable interrupts by setting I = 1. An interrupt occurs only when all three conditions are met: trigger, arm and enable. The disabled interrupt state
(I = 1) does not dismiss the interrupt requests; rather it postpones them until a later time, when the software deems it convenient to handle the requests. We will pay special attention to these enable/disable software actions. In particular we will need to disable interrupts when executing nonreentrant code, but disabling interrupts will have the effect of increasing the response time of software.
The interrupt service routine (ISR) is the software module that is executed when the hardware requests an interrupt. There may be one large ISR that handles all requests (polled interrupts), or many small ISRs specific for each potential source of interrupt (vectored interrupts). The design of the interrupt service routine requires careful consideration of many factors. Three conditions must be true for an interrupt to be generated. A device must be armed (e.g., RIE is set), interrupts must be enabled (I = 0), and an external event must occur setting a trigger flag (e.g., new SCI input ready sets RDRF).
An interrupt causes the following sequence of events. First, the current instruction is finished. There are exceptions to this rule: the 9S12 instructions rev, revw, and wav take a long time to execute, hence these three instructions can be interrupted in the middle of their execution. Second, the execution of the main program is suspended, pushing all the registers on the stack. Third, the PC is loaded with the address of the ISR (vector). Lastly, interrupts are disabled (I = 1). These four steps, called a context switch, occur automatically in hardware as the context is switched from foreground to background. Next, the software executes the ISR. When the ISR is done it executes an rti, causing the main program execution to be resumed. When the microcomputer accepts an interrupt request, it will automatically save the execution state of the main thread by pushing the registers (CCR, A, B, X, Y, and PC) on the stack.
After the ISR provides the necessary service, it will execute an rti instruction. This instruction pulls these registers from the stack, which returns control to the main program. Since all threads use the same stack pointer, it is imperative that the ISR software balance the stack before exiting via the rti instruction. Execution of the main program will then continue with the exact stack and register values that existed before the interrupt. Although interrupt handlers can create and use local variables, parameter passing between threads must be implemented using shared global memory variables. A private global variable can be used if an interrupt thread wishes to pass information to itself, e.g., from one interrupt instance to another. The execution of the main program is called the foreground thread, and the executions of the various interrupt service routines are called background threads.
An axiom with interrupt synchronization is that the interrupt program should execute as fast as possible. The interrupt should occur when it is time to perform a needed function, and the interrupt service routine should perform that function, and return right away. Placing backward branches (busy-waiting loops, iterations) in the interrupt software should be avoided if possible. The percentage of time spent executing interrupt software should be minimized. For an input device, the interface latency of an interrupt-driven input device is the time between when new input is available, and the time when the software reads the input data. We can also define device latency as the response time of the external I/O device. For example, if we request that a certain sector be read from a disk, then the device latency is the time it takes to find the correct track (seek) and spin the disk so the proper sector is positioned under the read head.
For an output device, the interface latency of an interrupt-driven output device is the time between when the output device is idle, and the time when the software writes new data. A real-time system is one that can guarantee a worst case interface latency.
Many factors should be considered when deciding the most appropriate mechanism to synchronize hardware and software. One should not always use busy-waiting because one is too lazy to implement the complexities of interrupts. On the other hand, one should not always use interrupts because they are fun and exciting. Busy-waiting synchronization is appropriate when the I/O timing is predictable, and when the I/O structure is simple and fixed. Busy-waiting should be used for dedicated single thread systems where there is
nothing else to do while the I/O is busy. Interrupt synchronization is appropriate when the I/O timing is variable, and when the I/O structure is complex. In particular, interrupts are efficient when there are I/O devices with different speeds. Interrupts allow for quick response times to important events. In particular, using interrupts is one mechanism to design real-time systems, where the interface latency must be short and bounded. They can also be used for infrequent but critical events like power failure, memory faults, and machine errors. Interrupts can be used to assist program development by triggering on stack overflow, invalid op code, and breakpoints. Periodic interrupts will be useful for real-time clocks, data acquisition systems, and control systems. For extremely high bandwidth and low latency interfaces, DMA should be used.
An atomic operation is a sequence that once started will always finish, and cannot be interrupted. Most instructions on the 9S12 are atomic. The exceptions are rev, revw, and wav, which can be suspended to process an interrupt. If we wish to make a section of code atomic, we can run that code with I = 1. In this way, interrupts will not be able to break apart the sequence. In particular, to implement an atomic operation we will (1) save the current value of the CCR, (2) disable interrupts, (3) execute the operation, and (4) restore the CCR back to its previous value.
Checkpoint 9.1: What three conditions must be true for an interrupt to occur?
Checkpoint 9.2: How do you enable interrupts?
Checkpoint 9.3: What are the steps that occur when an interrupt is processed?
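The four-step atomic sequence described in this section (save the CCR, disable interrupts, execute the operation, restore the CCR) can be sketched as follows. This is a host-side illustration under stated assumptions: SimCCR is a simulated stand-in for the condition code register, and StartCritical, EndCritical, AtomicIncrement, and SharedCount are hypothetical names. On a real 9S12 the save, disable, and restore steps would be performed with processor instructions rather than by writing a C variable.

```c
#include <stdint.h>

#define CCR_I 0x10          /* I bit is bit 4 of the 9S12 CCR */

uint8_t  SimCCR = 0;        /* simulated condition code register (assumption) */
uint16_t SharedCount = 0;   /* hypothetical global shared with an ISR         */

/* Steps 1 and 2: save the current CCR, then disable interrupts (I = 1). */
uint8_t StartCritical(void) {
  uint8_t saved = SimCCR;
  SimCCR |= CCR_I;
  return saved;
}

/* Step 4: restore the CCR to its previous value. */
void EndCritical(uint8_t saved) {
  SimCCR = saved;
}

/* A read-modify-write sequence made atomic with respect to interrupts. */
void AtomicIncrement(void) {
  uint8_t saved = StartCritical();
  SharedCount++;            /* step 3: the operation itself */
  EndCritical(saved);
}
```

Restoring the saved CCR, rather than unconditionally clearing the I bit, means that code which was already running with interrupts disabled stays disabled afterward.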
9.2.2 Essential Components of Interrupt Processing
In this section, we will present the specific details for the 9S12 microcomputers. As you develop experience using interrupts, you will come to notice a few common aspects that most computers share. The following paragraphs outline three essential mechanisms that are needed to utilize interrupts. Although every computer that uses interrupts includes all three mechanisms, there is a wide spectrum of implementation methods.
All interrupting systems must have the ability for the hardware to request action from the computer. The interrupt requests can be generated using a separate connection to the microprocessor for each device, or using shared negative logic wire-or requests with open collector logic. The shared interrupt request line on the 9S12 is IRQ, which is on the PE1 pin. The XIRQ line on the PE0 pin can also be shared, but XIRQ is usually reserved for catastrophic errors. The Freescale microcomputers support both types.
All interrupting systems must have the ability for the computer to determine the source. A vectored interrupt system employs separate connections for each device so that the computer can give automatic resolution. You can recognize a vectored system because each device has a separate interrupt vector address. With a polled interrupt system, the interrupt software must poll each device, looking for the device that requested the interrupt.
The third necessary component of the interface is the ability for the computer to acknowledge the interrupt. Normally there is a trigger flag in the interface that is set on the busy to ready state transition. In essence this trigger flag is the cause of the interrupt. Acknowledging the interrupt involves clearing this flag. It is important to shut off the request, so that the computer will not mistakenly request a second (and inappropriate) interrupt service for the same condition. Some Intel systems use a hardware acknowledgment that automatically clears the request.
Most Freescale microcomputers use a software acknowledge. So when designing an interrupting interface on the 9S12, it will be important to know exactly what hardware conditions will set the trigger flag (and request an interrupt) and how the software will clear it (acknowledge) in the ISR. There are no standard definitions for the terms mask, enable, and arm in the professional, Computer Science, or Computer Engineering communities. Nevertheless, in this book we will adhere to the following specific meanings. To arm (disarm) a device means to enable (shut off) the source of interrupts. Each potential interrupting device has a separate arm bit. One arms (disarms) a device if one is (is not) interested in interrupts from
this source. For example, the 9S12 TIE register has eight arm bits for the output compare and input capture interrupts. The Freescale literature calls the arm bit an “interrupt enable mask”. To enable (disable) means to allow interrupts at this time (postponing interrupts until a later time). On the 9S12 there is one interrupt enable bit for the entire interrupt system. We disable interrupts if it is currently not convenient to accept interrupts. In particular, to disable interrupts we set the I bit in the 9S12 condition code register using the sei instruction. The software interrupt (swi) instruction and illegal instruction trap cannot be disarmed or disabled. The XIRQ interrupt can be enabled by clearing the X bit in the CCR, but XIRQ interrupts cannot be disabled. In particular, once cleared, the software cannot set the X bit. The reset line will halt execution and load the PC with the 16-bit contents at $FFFE, but does not save the current state by pushing registers on the stack. Reset cannot be disarmed or disabled.
Common Error: The system will crash if the interrupt service routine doesn’t either acknowledge or disarm the device requesting the interrupt.
Common Error: The ISR software doesn’t have to explicitly disable interrupts at the beginning (sei) or explicitly reenable interrupts at the end (cli). The disabling and enabling occur automatically.
9.2.3 Sequence of Events
The sequence of events begins with the Hardware needs service (busy to done) transition. This signal is connected to an input of the microcomputer that can generate an interrupt. For example, the key wakeup, input capture, serial communication interface (SCI) and serial peripheral interface (SPI) systems support interrupt requests. Some interrupts are internally generated, like output compare, real-time interrupt (RTI), and timer overflow.
The second event is the setting of a trigger flag in one of the I/O status registers of the microcomputer. This is the same flag that a busy-waiting interface would be polling on. Examples include the key wakeup (KWIFJn), serial communication interface (RDRF and TDRE), output compare (CnF), real-time interrupt (RTIF), and timer overflow (TOF). In order for an interrupt to be requested the appropriate trigger flag bit must be armed. Examples include the key wakeup (KWIEJn), serial communication interface (RIE and TIE), output compare (CnI), real-time interrupt (RTII), and timer overflow (TOI). In summary, three conditions must be met simultaneously for an interrupt service to occur. These three conditions can occur in any order.
1. A device is armed, e.g., C3I = 1
2. The microcomputer interrupts are enabled, I = 0
3. An interrupting event occurs that sets the trigger flag, e.g., C3F = 1
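The three conditions can be sketched as a small host-side C model. This is an illustration only, not 9S12 register code: the type `InterruptSource_t` and the function `InterruptRequested` are hypothetical names, and the I bit of the CCR is passed in as an ordinary variable.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical model of one interrupt source (not a real 9S12 register map)
typedef struct {
  bool armed;    // arm bit, e.g., C3I
  bool trigger;  // trigger flag, e.g., C3F
} InterruptSource_t;

// Returns true when all three conditions hold at the same time.
// ccrI is the I bit of the condition code register (0 means enabled).
bool InterruptRequested(const InterruptSource_t *src, uint8_t ccrI){
  return src->armed && (ccrI == 0) && src->trigger;
}
```

Because the function only looks at the present state, it does not matter in which order the arm bit, the I bit, and the trigger flag reached their values.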
The third event in the interrupt processing sequence is the context switch, or thread-switch. The thread-switch is performed by the microcomputer hardware automatically. First, the microcomputer will finish the current instruction (rev, revw, and wav are interruptible). After the current instruction is complete, it takes 9 more bus cycles on the 9S12 to perform the thread-switch:
1. The 16-bit interrupt vector address is read (eventually this is loaded into the PC)
2. The PC is pushed (return address)
3. The first of three op code fetches is performed to fill the instruction queue
4. Register Y is pushed on the stack
5. Register X is pushed on the stack
6. The second of three op code fetches is performed to fill the instruction queue
7. Registers B and A are pushed on the stack (RegD is pushed little endian)
8. The CCR is pushed with the I bit still equal to 0, and then the I bit is set to 1
9. The third of three op code fetches is performed to fill the instruction queue (queue is full)
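The stacking order in the list above can be modeled on a host computer. The following is a minimal sketch under stated assumptions: `Cpu_t`, `EnterInterrupt`, and `ReturnFromInterrupt` are hypothetical names, the op code fetches are ignored, and only the register pushes, the setting of the I bit, and the vector load are modeled.

```c
#include <stdint.h>

#define CCR_I 0x10           // I bit is bit 4 of the CCR

// Hypothetical host-side model of the CPU registers and memory
typedef struct {
  uint8_t  ccr, a, b;
  uint16_t x, y, pc, sp;
  uint8_t  mem[0x10000];     // 64 KiB address space
} Cpu_t;

static void push8(Cpu_t *c, uint8_t v){ c->mem[--c->sp] = v; }
static void push16(Cpu_t *c, uint16_t v){   // high byte lands at the lower address
  push8(c, (uint8_t)(v & 0xFF));
  push8(c, (uint8_t)(v >> 8));
}
static uint8_t pull8(Cpu_t *c){ return c->mem[c->sp++]; }
static uint16_t pull16(Cpu_t *c){
  uint16_t hi = pull8(c);
  return (uint16_t)((hi << 8) | pull8(c));
}

// Thread-switch into the ISR: push PC, Y, X, A, B, CCR; set I; load the vector
void EnterInterrupt(Cpu_t *c, uint16_t vectorAddr){
  push16(c, c->pc);          // return address
  push16(c, c->y);
  push16(c, c->x);
  push8(c, c->a);            // B ends up at the lower address (little endian RegD)
  push8(c, c->b);
  push8(c, c->ccr);          // the stacked copy still has I = 0
  c->ccr |= CCR_I;           // postpone other maskable interrupts
  c->pc = (uint16_t)((c->mem[vectorAddr] << 8) | c->mem[vectorAddr + 1]);
}

// rti: pull CCR, B, A, X, Y, PC in that order
void ReturnFromInterrupt(Cpu_t *c){
  c->ccr = pull8(c);
  c->b = pull8(c);
  c->a = pull8(c);
  c->x = pull16(c);
  c->y = pull16(c);
  c->pc = pull16(c);
}
```

In this model the thread-switch moves the stack pointer down by nine bytes, and the later pull of the stacked CCR is what brings I back to 0.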
The fourth event is the software execution of the interrupt service routine (ISR). For a polled interrupt configuration, the ISR must poll each possible device and branch to the specific handler for that device. The polling order establishes device priority. For a vectored interrupt configuration, you could poll anyway to check for runtime hardware/software errors. The ISR must either acknowledge or disarm the interrupt. We acknowledge an interrupt by clearing the trigger flag that was set in the second event shown above. After we acknowledge a low-priority interrupt, we may re-enable interrupts (cli) to allow higher priority devices to go first. All ISRs must perform the necessary operations (read data, write data, etc.) and pass parameters through shared global memory (e.g., a FIFO queue).
The last event is another thread-switch in order to return control back to the thread that was running when the interrupt was processed. In particular, the software executes an rti at the end of the ISR, which will pull CCR, B, A, X, Y, and PC off the stack. At the beginning of the interrupt service the CCR was pushed on the stack with I = 0. Therefore, the execution of rti automatically re-enables interrupts. After the ISR executes rti the stack is restored to the state it was in before the interrupt. The ISR may change global variables or I/O ports, but the registers and stack are left unchanged by the ISR.
The interrupt hardware will automatically save all registers on the stack during the thread-switch, as shown in Figure 9.5. The thread-switch is the process of stopping the foreground (main) thread and starting the background (interrupt handler) thread. The “oldPC” value on the stack points to the place in the foreground thread to resume once the interrupt is complete. At the end of the interrupt handler, another thread-switch occurs as the rti instruction restores registers from the stack (including the PC).
Checkpoint 9.4: What would happen if the ISR forgot to acknowledge the interrupt?
Checkpoint 9.5: If you didn’t want to or couldn’t acknowledge, what else might the ISR do?
Figure 9.5 Stack before and after an interrupt. (Before interrupt: I = 0, the SP points into the RAM stack near $3FFF/$4000, and the PC points into main in EEPROM. After context switch: 1) finish instruction, 2) push registers, 3) PC = {Vector}, 4) I = 1; the stack holds old CC, old B, old A, old X, old Y, and old PC, and the PC points to the handler, which ends with rti.)
9.2.4 9S12 Interrupts
On the 9S12, exceptions include resets, software interrupts and hardware interrupts. Each exception has an associated 16-bit vector that points to the memory location where the ISR that handles the exception is located. Vectors are stored in the upper 128 bytes of the standard 64 kibibyte address map. As we have seen previously, the reset vector points to the main program, but the other vectors will point to interrupt service routines. A hardware priority hierarchy determines which exception is serviced first when simultaneous requests are made. Basically, the exception with the vector at a higher address has priority over an exception with a vector at a lower address. Since the reset vector is at $FFFE, it is the highest priority exception. Six exceptions are
nonmaskable, meaning there is no associated arm bit, and the exception is not affected by the I bit in the CCR. The remaining sources have an arm bit that can be activated (armed) or deactivated (disarmed). The priorities of the non-maskable sources are:
1. Power-On-Reset (POR) or regular hardware RESET pin
2. Clock monitor reset
3. Computer-Operating-Properly (COP) watchdog reset
4. Unimplemented instruction (trap)
5. Software interrupt instruction (swi)
6. XIRQ signal (if X bit in CCR = 0)
Maskable interrupt sources include on-chip peripheral systems and external interrupt service requests. Interrupts from these sources are recognized when the interrupt enable bit (I) in the CCR is cleared. The default state of the I bit out of reset is one, but it can be written at any time. The 9S12 has two external requests, XIRQ and IRQ, that are active at logic level zero. Many of the internal I/O devices can generate interrupt requests based on external events (e.g., key wakeup, input capture, SCI, SPI, etc.). Other than the six non-maskable sources listed above, the remaining interrupt requests will temporarily set the I bit in the CCR during the interrupt program to prevent other interrupts (including itself). On the other hand, the XIRQ request temporarily sets both the I and X bits in the CCR during the interrupt program to postpone all other interrupt sources. The interrupts have a fixed priority, but you can elevate one request to highest priority using the HPRIO, Hardware Priority Interrupt Register ($001F). The relative priorities of the other interrupt sources remain the same.
We typically use XIRQ to interface a single highest priority device. XIRQ has a separate interrupt vector ($FFF4) and a separate enable bit (X). Once the X bit is cleared (enabled) the software cannot disable it. An XIRQ interrupt is requested when the external XIRQ pin is low and the X bit in the CCR is 0. XIRQ processing will automatically set X = I = 1 (an IRQ cannot interrupt an XIRQ service) at the start of the XIRQ handler. Just like regular interrupts, the X and I bits will be restored to their original values by the rti instruction.
The priority is fixed in the order shown in Table 9.1, with Key Wakeup P having the lowest priority and Reset having the highest. Not all interrupt sources are available on every 9S12, but this list defines some of the interrupt sources. Any one particular application usually uses just a few interrupts.
In particular, those devices that need prompt service should be armed to request an interrupt. The software arms (specific for each possible source) and enables (I = 0 globally) interrupts. The external event triggers the interrupt by setting the trigger flag. The interrupt service routine (ISR) is executed in response to the trigger. The ISR acknowledges the interrupt by clearing the trigger flag.
For some interrupt sources, such as the SCI interrupts, flags are automatically cleared during the response to the interrupt requests. For example, the RDRF flag in the SCI system is cleared by the automatic clearing mechanism, consisting of a read of the SCI status register while RDRF is set, followed by a read of the SCI data register. The normal response to an RDRF interrupt request is to read the SCI status register to check for receive errors, then to read the received data from the SCI data register. These two steps satisfy the automatic clearing mechanism without requiring any special instructions.
On the other hand, many trigger flags employ a confusing but effective way for the software to acknowledge them. Flags such as RTIF, CnF, TOF, PIFJn, PIFHn, and PIFPn are cleared when the software writes a 1 into the bit position of that flag. Writing a zero to the flag register has no effect, and writing $FF clears all the flag bits in the register. Many of the potential interrupt requests share the
same interrupt vector. For example, there are 8 possible key wakeup interrupt sources (PH7 to PH0) that all use the vector at $FFCC. Therefore, when this request is processed the ISR software must determine which of the 8 possible signals caused the interrupt.
As we defined earlier, when more than one source of interrupt exists the computer must have a reliable method to determine which interrupt request has been made. There are two common approaches, and the Freescale microcomputers apply a combination of both methods. The first approach is called vectored interrupts. With a vectored interrupt system each potential interrupt source has a unique interrupt vector address. You simply place the correct handler address in each vector, and the hardware automatically calls the correct software when an interrupt is requested, see Table 9.1. The second approach is called polled interrupts. SCI, SPI, and key wakeup must be polled. With a polled interrupt system multiple interrupt sources share the same interrupt vector address (e.g., both RDRF and TDRE share the same vector). Once the interrupt has occurred, the ISR software must poll the potential devices to determine which device needs service. The 9S12 systems have a separate acknowledgment, so that if both interrupts are pending, acknowledging one will not satisfy the other, so the second device will request a second interrupt and get serviced. Common Error: If two interrupts were requested, it would be a mistake to service just one and acknowledge them both. Observation: External events are often asynchronous to program execution, so careful thought is required to consider the effect if an external interrupt request were to come in between each pair of instructions. Observation: The computer automatically sets the I bit during processing, so that an interrupt handler will not interrupt itself.
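The polled case, where RDRF and TDRE share one vector, can be sketched as a host-side C model. The names `SimStatus`, `SimArm`, `ServiceReceive`, `ServiceTransmit`, and `SciHandlerModel` are hypothetical stand-ins for the hardware registers and the device handlers; the sketch assumes the usual SCI bit positions (TDRE and TIE in bit 7, RDRF and RIE in bit 5).

```c
#include <stdint.h>

#define TDRE 0x80   // transmit data register empty (status bit 7)
#define RDRF 0x20   // receive data register full (status bit 5)

// Hypothetical stand-ins for hardware registers and handlers
uint8_t SimStatus;            // models the SCI status register
uint8_t SimArm;               // models the arm bits (TIE = 0x80, RIE = 0x20)
int RxServiced, TxServiced;   // counts of services performed

static void ServiceReceive(void){       // servicing acknowledges RDRF only
  RxServiced++;
  SimStatus &= (uint8_t)~RDRF;
}
static void ServiceTransmit(void){      // servicing acknowledges TDRE only
  TxServiced++;
  SimStatus &= (uint8_t)~TDRE;
}

// One shared vector: poll the armed sources; the polling order sets priority.
// Only one source is serviced per call, so a second pending source
// remains set and requests another interrupt.
void SciHandlerModel(void){
  uint8_t pending = SimStatus & SimArm;
  if(pending & RDRF){
    ServiceReceive();
  } else if(pending & TDRE){
    ServiceTransmit();
  }
}
```

Servicing one source per entry keeps each acknowledgment paired with its own service, which avoids the common error of acknowledging both requests while servicing only one.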
9.2.6 Pseudo-Interrupt Vectors
Table 9.2 Some pseudo-interrupt vectors for the 9S12. (Columns: real vector; MON12 pseudo vector; D-Bug12 pseudo vector; Serial Monitor pseudo vector; interrupt source or trigger flag.)
Some development boards do not allow you to erase and reprogram the interrupt vectors from $FF80 to $FFFF. In these development systems, the ROM at $FF80 to $FFFF has the interrupt vectors pointing to memory locations that can be set. The locations to which the real vectors point are called pseudo-interrupt vectors. Typically, the pseudo-interrupt vectors are defined in the same order as the real vectors. Pseudo vectors for three debuggers are shown in Table 9.2. In the old 6811 development boards each pseudo vector was in RAM and required 3 bytes. Three bytes were required to place a jmp instruction to your ISR. During a 6811 initialization, the program places jmp instructions into the pseudo vectors. In contrast, most 9S12 debuggers only require 2 bytes for each pseudo vector. The MON12 debugger on 9S12 boards from Axiom (http://www.axman.com) and the D-Bug12 debugger from Technological Arts implement 16-bit pseudo vectors in RAM. During initialization at run-time, your program must place pointers to your ISRs into the pseudo vectors. Every time an interrupt occurs the Axiom MON12 debugger requires 21 extra bus cycles to implement the indirect jump to your ISR.
The Freescale Serial Monitor used by Metrowerks CodeWarrior and TExaS also employs pseudo vectors. The difference is that the Serial Monitor pseudo vectors are in EEPROM. Your software does not have to perform any run-time initialization of the pseudo vector. Rather, the Serial Monitor will automatically translate a “Program ROM” command from $FF80-$FFFF down to $F780-$F7FF. For example, this code is the proper way to set the TC0 interrupt vector in a system without pseudo vectors, see Table 9.1.
      org  $FFEE
      fdb  TC0han
However, when your software is loaded into EEPROM, this vector transparently and automatically ends up being programmed at $F7EE. Every time an interrupt occurs the Serial Monitor requires 19 extra bus cycles to implement the pseudo vector. The actual Serial Monitor code for an interrupt is
uvector08:  bsr  ISRHandler   ;TC0 interrupt starts executing here
            ...
ISRHandler: pulx              ;pull bsr return address off stack
            ldy  -$0636,X     ;get value of pseudo vector
            cpy  #$FFFF       ;is it programmed?
            beq  BadVector
            jmp  ,Y           ;jump to your ISR
SCI interrupts with the Serial Monitor include an overhead longer than 19 cycles, because the SCI interrupts are used by the debugger itself to perform its actions. In particular, after an SCI interrupt, the debugger will check the LOAD/RUN switch to see if the debugger or user program should process the interrupt.
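The address translation performed by the Serial Monitor amounts to a fixed offset, since $FF80-$FFFF maps down to $F780-$F7FF. A minimal sketch follows; the function names `PseudoVectorAddress` and `IsProgrammed` are hypothetical and are not part of the Serial Monitor itself.

```c
#include <stdint.h>

// Address at which the Serial Monitor stores a vector that the program
// places in $FF80 to $FFFF (offset taken from the mapping $FF80 -> $F780)
uint16_t PseudoVectorAddress(uint16_t realVector){
  return (uint16_t)(realVector - 0x0800);
}

// The monitor compares the pseudo vector contents with $FFFF,
// the value of an unprogrammed 16-bit location
int IsProgrammed(uint16_t vectorContents){
  return vectorContents != 0xFFFF;
}
```

For the TC0 vector, `PseudoVectorAddress(0xFFEE)` gives 0xF7EE, which agrees with the address quoted in the text.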
9.3 Key Wakeup Interrupts
The basic idea of key wakeup is to connect an input to the 9S12 and configure the interface so an interrupt is requested on either the rising or falling edge of the input. Using key wakeup allows software to respond quickly to changes in the external world. The 9S12C32 has ten possible key wakeup interrupt sources, which are available on Ports J and P. The 9S12DP512 has twenty key wakeup interrupt sources, which are available on Ports H, J, and P. See Table 9.3. Any or all of these pins can be configured as a key wakeup interrupt. Each of the wakeup lines has a separate I/O pin (PTH, PTJ, PTP), a direction register bit (DDRH, DDRJ, DDRP), a trigger flag bit (PIFH, PIFJ, PIFP), an arm bit (PIEH, PIEJ, PIEP), and a polarity bit (PPSH, PPSJ, PPSP).
First we identify external digital signals containing strategic edges (rising or falling). In particular, strategic means we wish to execute software whenever one of these edges occurs. We connect these digital signals to individual key wakeup pins. To use key wakeup, we must make these lines inputs and configure the strategic edge to be active. Key wakeup interrupts can be configured to be active on either the rising or falling edge. If the corresponding bit in the PPSH/PPSJ/PPSP register is 0, then a falling edge will set the trigger flag. Conversely, if the bit in the PPSH/PPSJ/PPSP register is 1, then a rising edge will set the trigger flag. A key wakeup interrupt will be generated if the trigger flag bit is set, the arm bit is set, and the interrupts are enabled (I = 0).
Table 9.3 9S12 key wakeup ports (all twenty pins are available on the 9S12DP512, while just the ten shaded pins are available on the 9S12C32).
Another convenience of Ports H, J, and P is the available pull-up or pull-down resistors, as shown in Table 9.4. Each of the pins of Ports H, J, and P can be configured separately.
A typical application of pull-up is the interface of simple switches. Using pull-up or pull-down mode eliminates the need for an external resistor when interfacing a switch. The PJ6, PT6 interfaces in Figure 9.6a) implement negative logic switch inputs, and the PJ7, PT7 interfaces in Figure 9.6b) implement positive logic switch inputs. The Port J interfaces employ internal resistors.
Figure 9.6 Key wakeup or input capture can generate interrupts on a switch touch. (a) Pull-up interface: switches on PJ6 and PT6, with an external 10 kΩ resistor on PT6. (b) Pull-down interface: switches on PJ7 and PT7, with an external 10 kΩ resistor on PT7.
Checkpoint 9.6: What values do you write into DDRJ, PPSJ and PERJ to configure the switch interfaces of PJ6 and PJ7 in Figure 9.6?
Three conditions must be simultaneously true for a key wakeup interrupt to be requested:
䡲 The trigger flag bit is set
䡲 The arm bit is set
䡲 The I bit in the 9S12 CCR is 0
Even though there are twenty key wakeup lines, there are only three interrupt vectors, one for Port H, one for Port J and the other for Port P. So, if two or more wakeup interrupts are used on the same port, it will be necessary to poll. Interrupt polling is the software function to look and see which of the potential sources requested the interrupt. Each flag bit is cleared by writing a one to it. For example, to clear Port P trigger flag 7 in C we can execute
      PIFP = 0x80;        // clears flag bit 7 of Port P
In assembly, to clear Port P trigger flag 7
      movb #$80,PIFP      ; clears flag bit 7 of Port P
Example 9.1 You are asked to design a measurement system for the robot in Figure 8.17 that counts the number of times the wheel turns. This count will be a measure of the total distance travelled. The desired resolution is 1⁄32 of a turn and the desired range is 0 to 2047 31⁄32 revolutions.
Solution Whenever you measure something, it is important to consider the resolution and range. The basic idea is to use an optical sensor (QRB1134) to visualize the position of the wheel. A black/white striped pattern is attached to the wheel, and an optical reflective sensor is placed near the stripes. The sensor has an infrared LED output and a light sensitive transistor. The current to the 1.8 V LED is controlled by the R1 resistor. In this circuit, the LED current will be (5 - 1.8 V)/200 Ω, which is 16 mA. The R2 pull-up resistor on the transistor creates an output swing at V1 depending on whether the sensor sees a black stripe or white stripe. Unfortunately, the signal V1 is not digital. The rail-to-rail op amp, in open loop mode, creates a clean digital signal at V2, which has the same frequency as V1. The negative terminal is set to a voltage approximately in the center of V1, shown as 2 V in Figure 9.7. In general, we should select the threshold at the place in the wave where the slope is maximum. We then interface V2 to a key wakeup pin, and configure the system to trigger a key wakeup interrupt on each rising edge. This solution uses PP5, such that a rising edge triggers an interrupt on Port P key wakeup, see Program 9.1. Because there are 32 stripes on the wheel, there will be 32 interrupts each time the wheel rotates once. A 16-bit counter is used, because we expect no more than 65535 counts. The count is a binary fixed-point number with a resolution of 2^-5 revolutions. For example, if the count is 100, this means 100/32 or 3.125 revolutions. We also assume no other key wakeup channels on Port P will be used.
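Figure 9.7 An optical sensor is used to detect rotations on a wheel. (The QRB1134 LED is driven from +5 V through R1 = 200 Ω; the phototransistor output V1 is pulled up to +5 V through R2 = 5 kΩ; a TLC2274 op amp compares V1 against a 2 V threshold to produce the 0 to 5 V digital signal V2, which is connected to PP5.)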
        org  $0800         ;($3800 if 9S12C32)
Count   rmb  2             ;0.03125 revolutions
        org  $4000
;Rising edge on PP5 causes an interrupt
Key_Init
        movw #0,Count
        bclr DDRP,#$20     ; PP5 is input
        bclr PERP,#$20     ; no pull down on PP5
        bset PPSP,#$20     ; rising edge active
        bset PIEP,#$20     ; arm PP5
        movb #$20,PIFP     ; clear flag
        cli
        rts
Keyhandler
        movb #$20,PIFP     ; ack, clear flag
        ldx  Count
        inx
        stx  Count         ; units 1/32 revolution
        rti
        org  $FF8E
        fdb  Keyhandler

// Rising edge on PP5 causes an interrupt
unsigned short Count;      // 1/32 revolutions
void Key_Init(void){
  Count = 0;
  DDRP &= ~0x20;           // PP5 is input
  PERP &= ~0x20;           // no pull down on PP5
  PPSP |= 0x20;            // rising edge active
  PIEP |= 0x20;            // arm PP5
  PIFP = 0x20;             // clear flag
  asm cli                  // enable interrupts
}
interrupt 56 void Keyhandler(void){
  PIFP = 0x20;             // clear flag
  Count++;                 // 1/32 revolution
}
Program 9.1 Assembly and C implementations of an interrupting key wakeup.
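Because Count is a binary fixed-point number with a resolution of 1/32 revolution, the main program can separate it into whole and fractional parts with a shift and a mask. The helper functions below are a hypothetical sketch and are not part of Program 9.1.

```c
#include <stdint.h>

// Integer part of Count/32 (0 to 2047)
uint16_t Revolutions_Whole(uint16_t count){
  return (uint16_t)(count >> 5);
}

// Fractional part in units of 1/32 revolution (0 to 31)
uint16_t Revolutions_Frac32(uint16_t count){
  return (uint16_t)(count & 0x1F);
}

// Position in units of 0.001 revolution, truncated
uint32_t Revolutions_Milli(uint16_t count){
  return ((uint32_t)count * 1000u) / 32u;
}
```

With a count of 100, the whole part is 3 and the fractional part is 4/32, matching the 3.125 revolutions in the example.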
Because of the read, modify, write sequence, the following software clears all the flag bits (hence these are inappropriate ways to clear one flag.)
bset PIFP,#$04
PIFP |= 0x04;
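The effect of the read, modify, write sequence can be demonstrated with a host-side model of a write-one-to-clear flag register. The names `FlagReg`, `Flag_Write`, `Flag_Read`, `ClearOne_Good`, and `ClearOne_Bad` are hypothetical; the model assumes only the behavior stated in the text (writing a 1 clears the flag, writing a 0 has no effect).

```c
#include <stdint.h>

// Hypothetical model of a flag register with write-one-to-clear behavior
uint8_t FlagReg;   // current flag bits

// Writing a 1 clears that flag; writing a 0 leaves it unchanged
void Flag_Write(uint8_t value){ FlagReg &= (uint8_t)~value; }
uint8_t Flag_Read(void){ return FlagReg; }

// Appropriate: a single write of the mask clears only the selected flag
void ClearOne_Good(uint8_t mask){ Flag_Write(mask); }

// Inappropriate: models PIFP |= mask as a read followed by a write.
// Every flag that reads as 1 is written back as 1, so it is cleared too.
void ClearOne_Bad(uint8_t mask){ Flag_Write(Flag_Read() | mask); }
```

Starting with flags 5 and 2 set ($24), `ClearOne_Good(0x04)` leaves $20, while `ClearOne_Bad(0x04)` leaves $00 and the pending request on bit 5 is lost.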
Observation: All 8 key wakeup lines on Port P use the same interrupt vector, but they have separate polarity, arm, pullup/down, and flag bits.
Checkpoint 9.7: How do you modify Program 9.1 so it counts falling edges?
If a pin is configured as an input, then reads to PTH/PTJ/PTP return the same value as reads to PTIH/PTIJ/PTIP, which will be the digital value at the input. Conversely, if a pin is configured as an output, then reads to PTH/PTJ/PTP return the most recent value written to the output port, while reads to PTIH/PTIJ/PTIP will return the digital value at the output pin. The RDRH/RDRJ/RDRP register determines the drive strength of an output signal. If the bit is 1, then the corresponding output will have 1/3 drive current. This mode is used to reduce supply current to the 9S12.
9.4 Periodic Interrupt Programming
We will continue our interrupt examples with periodic interrupts. Periodic interrupts are both simple to understand and extremely useful for real-time embedded systems. A periodic interrupt is one that is requested on a fixed time basis. Periodic interrupts are required for data acquisition and control systems, because software execution must be performed periodically at accurate time intervals. For a data acquisition system, it is important to establish an accurate sampling rate. The time in between ADC samples must be equal (and known) in order for the digital signal processing to function properly. Similarly for microcomputer-based control systems, it is important to maintain both the timing with the sensors (inputs) and with the actuators (outputs).
One synchronization method that uses periodic interrupts is called “intermittent polling” or “periodic polling”. In regular busy-waiting, the main program polls the I/O devices continuously. With intermittent polling, the I/O devices are polled on a regular basis, established by a periodic interrupt, as shown in the flowchart of Figure 9.8. Assume for a moment that all n devices are simultaneously ready. It is an appropriate design constraint for the time it takes to service all n devices (maximum time to execute the ISR) to be small compared to the interrupt period used for the periodic polling. This constraint will prevent the periodic polling ISR from capturing all the available CPU time. Similarly, the time to execute this ISR will affect the response time of other interrupts in the system. On the other hand, the interrupt frequency used for the periodic polling should be large compared to the bandwidth of the I/O channel, so no data are lost. If no device needs service, then the interrupt simply returns. This method frees the main program from the I/O tasks. The original IBM-PC computer used an 18 Hz periodic interrupt to interface its keyboard.
Figure 9.8 An ISR flowchart that implements periodic polling. (On each periodic interrupt the ISR checks devices 1 through n in turn; if a device is ready its data are input or output, and if it is busy it is skipped. The ISR then acknowledges the interrupt and executes rti.)
It is appropriate to use periodic polling when the following two conditions apply:
1. The I/O hardware cannot generate interrupts directly.
2. We wish to perform the I/O functions in the background.
Observation: The average response time of an event interfaced with periodic polling is 1/2 the period.
Observation: The worst case response time of an event interfaced with periodic polling is the period.
There are three mechanisms on the 9S12 that generate periodic interrupts: real-time interrupt (RTI), timer overflow (TOF) and output compare (OC).
9.5 Real-Time Interrupt (RTI)
First, the real-time interrupt (RTI) mechanism can generate interrupts at a fixed rate. Seven bits (RTR6-0) in the RTICTL register specify the interrupt rate. The 7-bit value is composed of two parts:
Let RTR6, RTR5, RTR4 be n, which is a 3-bit number ranging from 0 to 7
Let RTR3, RTR2, RTR1, RTR0 be m, which is a 4-bit number ranging from 0 to 15
Table 9.5 shows the 9S12 registers used in RTI interrupts. The entries shown in bold will be used in this section.
Table 9.5 9S12 registers used to configure real time interrupts.
Address  Bit 7  6     5     4       3       2      1      Bit 0  Name
$0037    RTIF   PORF  0     LOCKIF  LOCK    TRACK  SCMIF  SCM    CRGFLG
$0038    RTIE   0     0     LOCKIE  0       0      SCMIE  0      CRGINT
$003B    0      RTR6  RTR5  RTR4    RTR3    RTR2   RTR1   RTR0   RTICTL
If n is zero, then the RTI system is off. A 9S12C32 with an 8 MHz crystal will have an OSCCLK frequency of 8 MHz and a default E clock frequency of 4 MHz. A 9S12DP512 with a 16 MHz crystal will have an OSCCLK frequency of 16 MHz and a default E clock frequency of 8 MHz. Let f_crystal be the crystal frequency; then the RTI interrupt frequency and period can be calculated using
\[ \text{RTI interrupt frequency} = \frac{f_{crystal}}{512\,(m+1)\,2^{n}} \]
\[ \text{RTI interrupt period} = \frac{512\,(m+1)\,2^{n}}{f_{crystal}} \]
Observation: The phase-lock-loop (PLL) on the 9S12 will not affect the RTI rates.
The interrupt rate is determined by the crystal clock and the RTICTL value. Table 9.6 shows the available interrupt periods, assuming an 8 MHz crystal. Table 9.7 shows the available interrupt periods, assuming a 16 MHz crystal. Basically, the RTIF trigger flag is set periodically. If armed (RTIE = 1), this trigger flag will request an interrupt. To clear the RTIF flag (acknowledge the interrupt), the software writes a one to it.
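The RTI period formula can be checked numerically with a short host-side function. This is a sketch, not 9S12 code; the function name `RTI_PeriodNs` is hypothetical, and it returns 0 to represent the case where the RTI system is off (n equal to zero).

```c
#include <stdint.h>

// RTI period in nanoseconds, 512*(m+1)*2^n/fcrystal.
// rtictl holds RTR6-0; fcrystalHz is the crystal frequency in Hz.
uint32_t RTI_PeriodNs(uint8_t rtictl, uint32_t fcrystalHz){
  uint32_t n = (rtictl >> 4) & 0x07;    // RTR6, RTR5, RTR4
  uint32_t m = rtictl & 0x0F;           // RTR3, RTR2, RTR1, RTR0
  if(n == 0){
    return 0;                           // RTI system is off
  }
  uint64_t cycles = 512ull * (m + 1) * (1ull << n);
  return (uint32_t)((cycles * 1000000000ull) / fcrystalHz);
}
```

For example, an RTICTL value of $77 with a 16 MHz crystal and a value of $73 with an 8 MHz crystal both evaluate to 32,768,000 ns, that is, 32.768 ms.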
Table 9.6 9S12 real-time interrupt period in ms, assuming an 8 MHz crystal.
Table 9.7 9S12 real-time interrupt period in ms, assuming a 16 MHz crystal.
Example 9.2 Write software that increments a global variable every 32.768 ms. Solution The solution will use a periodic RTI interrupt that occurs every 32.768 ms. RTI is simple, and accurate if the desired interrupt period matches one of the possibilities shown in Table 9.6 or 9.7. The main program executes RTI_Init to initialize the RTI
interrupts, as shown in Program 9.2. The RTI rate is determined by the crystal frequency and the RTICTL register. Bit 7 of the CRGINT register is set to arm the RTI system. The RTI_Init routine initializes the global variable and enables interrupts (cli). The ISR will acknowledge the interrupt and increment a global variable, Time. The ISR makes the trigger flag zero by writing a one to it.
; 9S12C32 4 MHz, 9S12DP512 8 MHz
        org  $0800         ;($3800 if C32)
Time    rmb  2
        org  $4000
RTI_Init
        sei                ;make atomic
        movb #$77,RTICTL   ;($73 if C32)
        movb #$80,CRGINT   ;arm RTI
        movw #0,Time
        cli                ;enable IRQ
        rts
; interrupts every 32.768ms
RTIHan  movb #$80,CRGFLG   ;ack
        ldd  Time
        addd #1
        std  Time
        rti
        org  $FFF0
        fdb  RTIHan        ;vector

// 9S12C32 4 MHz, 9S12DP512 8 MHz
unsigned short Time;
void RTI_Init(void){
  asm sei                  // make atomic
  RTICTL = 0x77;           // (0x73 if C32)
  CRGINT = 0x80;           // arm RTI
  Time = 0;                // initialize
  asm cli                  // enable interrupts
}
// interrupt 7 corresponds to the vector at $FFF0
interrupt 7 void RTIHan(void){
  CRGFLG = 0x80;           // acknowledge, clear RTIF
  Time++;
}
Program 9.2 Implementation of a periodic interrupt using the real time clock feature.
Checkpoint 9.8: How would you modify Program 9.2 to count every 10.24 ms?
9.6 Timer Overflow, Output Compare, and Input Capture
9.6.1 Timer Features and Timer Overflow
Table 9.8 shows the 9S12 registers used in timer overflow, input capture and output compare. The entries shown in bold will be used in this section. The timer overflow interrupt feature can also be used to generate interrupts at a fixed rate, as listed in Table 9.9. The 16-bit TCNT register is incremented at a fixed rate. The TOF trigger flag is set when the counter overflows and wraps back around (automatically) to zero. If armed, the TOF trigger flag will generate an interrupt. Three bits (PR2, PR1, and PR0) in the TSCR2 register determine the rate at which the counter will increment, and hence determine the TOF interrupt rate. To clear the TOF flag (acknowledge the interrupt), the software writes a one to it. To create a TOF periodic interrupt, we enable the timer (TEN = 1), arm the timer overflow (TOI = 1), and set the rate (PR2-0). Let n be the 3-bit number (0 to 7) formed from the least significant three bits of TSCR2. Let f_E be the frequency of the E clock (adjusted by the PLL). The TOF interrupt rate is
\[ \text{TOF interrupt frequency} = \frac{f_{E}}{2^{\,n+16}} \]
\[ \text{TOF interrupt period} = \frac{2^{\,n+16}}{f_{E}} \]
Table 9.8 9S12 registers used for timer overflow, input capture, and output compare. (8-bit registers: PTT, DDRT, TSCR1, TSCR2, TIOS, TIE, TFLG1, TFLG2, TCTL1, TCTL2, TCTL3, TCTL4. 16-bit registers: TCNT, TC0, TC1, TC2, TC3, TC4, TC5, TC6, TC7.)
                          E = 4 MHz                  E = 8 MHz                  E = 24 MHz
PR2 PR1 PR0  Divide by    TCNT period  TOF period    TCNT period  TOF period    TCNT period  TOF period
0   0   0    1            250 ns       16.384 ms     125 ns       8.192 ms      42 ns        2.73067 ms
0   0   1    2            500 ns       32.768 ms     250 ns       16.384 ms     83 ns        5.46133 ms
0   1   0    4            1 µs         65.536 ms     500 ns       32.768 ms     167 ns       10.9227 ms
0   1   1    8            2 µs         131.072 ms    1 µs         65.536 ms     333 ns       21.8453 ms
1   0   0    16           4 µs         262.144 ms    2 µs         131.072 ms    667 ns       43.6907 ms
1   0   1    32           8 µs         524.288 ms    4 µs         262.144 ms    1.33 µs      87.3813 ms
1   1   0    64           16 µs        1048.576 ms   8 µs         524.288 ms    2.67 µs      174.763 ms
1   1   1    128          32 µs        2097.152 ms   16 µs        1048.576 ms   5.33 µs      349.525 ms
Table 9.9 Timer overflow periods for various E clock frequencies.
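The entries of Table 9.9 follow from the TOF period of 2^(n+16) E clock cycles. The host-side function below is a minimal sketch for checking an entry; the name `TOF_PeriodNs` is hypothetical, and the 32-bit result is assumed to be large enough, which holds for the E clock frequencies in the table.

```c
#include <stdint.h>

// TOF period in nanoseconds, 2^(n+16)/fE, where n is PR2-0 of TSCR2.
// fEHz is the E clock frequency in Hz.
uint32_t TOF_PeriodNs(uint8_t tscr2, uint32_t fEHz){
  uint32_t n = tscr2 & 0x07;            // PR2, PR1, PR0
  uint64_t cycles = 1ull << (n + 16);   // TCNT counts per overflow times prescale
  return (uint32_t)((cycles * 1000000000ull) / fEHz);
}
```

With an 8 MHz E clock and n equal to 2 the result is 32,768,000 ns (32.768 ms), and with a 4 MHz E clock the same period is obtained with n equal to 1.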
Example 9.3 Write software that increments a global variable every 32.768 ms. Solution The solution will use a periodic timer overflow interrupt that occurs every 32.768 ms. The main program executes TOF_Init to initialize the periodic interrupts, as shown in
Program 9.3. The interrupt rate is determined by the crystal frequency, the PLL and the TSCR2 register. With an E clock of 8 MHz, 32.768 ms/125 ns is 2^18, so the bottom three bits of TSCR2 should be 2. Bit 7 of the TSCR2 register is set to arm the TOF system. The TOF_Init routine initializes the global variable and enables interrupts (cli). The ISR will acknowledge the interrupt and increment a global variable, Time. The ISR makes the trigger flag zero by writing a one to it.
; 9S12C32 4 MHz, 9S12DP512 8 MHz
        org  $0800         ;($3800 if C32)
Time    rmb  2
        org  $4000
TOF_Init
        sei                ;make atomic
        movb #$80,TSCR1    ;enable TCNT
        movb #$82,TSCR2    ;($81 if C32)
        movw #0,Time
        cli                ;enable IRQ
        rts
TOFHan  movb #$80,TFLG2    ;acknowledge
        ldd  Time
        addd #1
        std  Time
        rti
        org  $FFDE
        fdb  TOFHan        ;vector

// 9S12C32 4 MHz, 9S12DP512 8 MHz
unsigned short Time;
void TOF_Init(void){
  asm sei                  // make atomic
  TSCR1 = 0x80;            // enable counter
  TSCR2 = 0x82;            // (0x81 if C32)
  Time = 0;                // initialize
  asm cli                  // enable interrupts
}
// interrupt 16 corresponds to the vector at $FFDE
interrupt 16 void TOFHan(void){
  TFLG2 = 0x80;            // acknowledge, clear TOF
  Time++;
}
Program 9.3 Implementation of a periodic interrupt using timer overflow.
Checkpoint 9.9: How would you modify Program 9.3 to count approximately every 1 second?
9.6.2 Output Compare Interrupts
The third mechanism to generate periodic interrupts is output compare. There are 8 independent output compare channels, numbered 0 to 7. Let i be the channel number, 0 ≤ i ≤ 7. To enable output compare the corresponding bit in the TIOS register must be set. When the TCNT register matches TCi, the output compare flag, CiF, is set. If armed (CiI = 1), then it will request an interrupt. To clear the CiF flag (acknowledge the interrupt), the software writes a one to it. The ISR will acknowledge the interrupt and set TCi = TCi + PERIOD, where PERIOD is a constant specifying the time for the next interrupt. The interrupting period is determined by the TCNT period (set by TSCR2) multiplied by the constant PERIOD. Let n be the 3-bit number (0 to 7) formed from the least significant three bits of TSCR2. Let f_E be the frequency of the E clock (adjusted by the PLL). The output compare interrupt rate is
\[ \text{OC interrupt frequency} = \frac{f_{E}}{2^{n}\cdot \text{PERIOD}} \]
\[ \text{OC interrupt period} = \frac{\text{PERIOD}\cdot 2^{n}}{f_{E}} \]
TCTL1 and TCTL2 registers are also used for output compare. If OMi = OLi = 0 then an output compare event will not directly affect the output pin. If the pair (OMi,OLi) equals (0,1) then the output pin will toggle on each output compare. If the pair (OMi,OLi) equals (1,0) then the output pin will clear on each output compare. If the pair (OMi,OLi) equals (1,1) then the output pin will set on each output compare.
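Choosing n and PERIOD for a desired interrupt period is a small search, because PERIOD must fit in the 16-bit TCi register. The following host-side sketch assumes the desired period is given as a count of E clock cycles; the function name `OC_FindSetting` is hypothetical.

```c
#include <stdint.h>

// Finds the smallest prescale n (0 to 7) such that the requested number of
// E clock cycles is an exact multiple of 2^n and PERIOD fits in 16 bits.
// Returns 1 and fills *n and *period on success, 0 if no exact setting exists.
int OC_FindSetting(uint32_t eCycles, uint8_t *n, uint16_t *period){
  for(uint8_t i = 0; i <= 7; i++){
    uint32_t divisor = 1ul << i;           // TCNT increments every 2^i E cycles
    if((eCycles % divisor) == 0){
      uint32_t p = eCycles / divisor;      // candidate PERIOD in TCNT counts
      if((p >= 1) && (p <= 65535)){
        *n = i;
        *period = (uint16_t)p;
        return 1;
      }
    }
  }
  return 0;
}
```

For a 1 second period with an 8 MHz E clock, the request is 8,000,000 cycles, and the search finds n equal to 7 with PERIOD equal to 62500. Smaller values of n give the finest TCNT resolution, which is why the search starts at 0.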
Example 9.4 Write software that increments a global variable every 1 second.
Solution With an E clock of 8 MHz, 1 s/125 ns is 8,000,000 bus cycles. Because PERIOD must fit in 16 bits, the only possibility is to make n equal to 7 and PERIOD equal to 62500 (8,000,000/128). Program 9.4 shows a periodic interrupt using output compare 6, incrementing a global variable, Time, every 1 sec. During the initialization, bit 7 of TSCR1 is set to activate the timer system. The TCNT period is set to 16 µs in TSCR2. Bit 6 in TIOS is set to activate output compare on channel 6. The arm bit is set in TIE. The global variable is cleared in the initialization. The initial value of TC6 is set so the first interrupt occurs in 80 µs (subsequent interrupts will occur every 1 s). It is possible the C6F flag might already be set, due to activity occurring before the initialization is executed. Clearing the C6F trigger flag in the initialization guarantees the first interrupt will occur exactly 80 µs later.
; 9S12C32 4 MHz, 9S12DP512 8 MHz
PERIOD   equ  62500       ;in 16usec
         org  $0800       ;($3800 if C32)
Time     rmb  2
         org  $4000
OC6_Init sei              ;make atomic
         movb #$80,TSCR1  ;enable TCNT
         movb #$07,TSCR2  ;($06 if C32)
         bset TIOS,#$40   ;activate OC6
         bset TIE,#$40    ;arm OC6
         movw #0,Time
         ldd  TCNT        ;time now
         addd #5          ;first in 80us
         std  TC6
         movb #$40,TFLG1  ;clear C6F
         cli              ;enable IRQ
         rts
OC6Han   movb #$40,TFLG1  ;acknowledge
         ldd  TC6
         addd #PERIOD
         std  TC6         ;next in 1 s
         ldd  Time
         addd #1
         std  Time
         rti
         org  $FFE2
         fdb  OC6Han      ;vector
// 9S12C32 4 MHz, 9S12DP512 8 MHz
#define PERIOD 62500
unsigned short Time;
void OC6_Init(void){
  asm sei          // Make atomic
  TSCR1 = 0x80;    // 16us TCNT
  TSCR2 = 0x07;    // (0x06 if C32)
  TIOS |= 0x40;    // activate OC6
  TIE |= 0x40;     // arm OC6
  Time = 0;        // Initialize
  TC6 = TCNT+5;    // first in 80us
  TFLG1 = 0x40;    // clear C6F
  asm cli          // enable IRQ
}
interrupt 14 void OC6handler(void){
  TC6 = TC6+PERIOD; // next in 1 s
  TFLG1 = 0x40;     // acknowledge C6F
  Time++;
}
Program 9.4 Implementation of a periodic interrupt using output compare.
Checkpoint 9.10: How would you modify Program 9.4 to count at 100 Hz?
Observation: The phase-locked loop (PLL) on the 9S12 will affect the TOF and output compare rates.
Example 9.5 Design an interface for a 32 Ω speaker and use it to generate a loud 1 kHz sound.
Solution At 5 V, a 32 Ω speaker will require a current of about 150 mA. We will use the 2N2222 circuit in Figure 8.16 because it can sink at least three times the current needed for this speaker.
In this example the interface will be connected to PT6. We select a +5 V supply and connect it to the +V in the circuit. The needed base current is
Ib = Icoil/hfe = 150 mA/100 = 1.5 mA
The desired interface resistor is
Rb ≤ (VOH − Vbe)/Ib = (5 − 0.6)/1.5 mA = 2.9 kΩ
To cover the variability in hfe, we will use a 1.5 kΩ resistor instead of the 2.9 kΩ. The actual voltage on the speaker when active will be 5 − 0.3 = 4.7 V. We can make the sound quieter by using a larger resistor for Rb.
To generate the 1 kHz sound we need a 1 kHz squarewave. There are two good methods on the 9S12 to generate squarewaves. First, the output compare module can be used to create an interrupt every 0.5 ms and make the output toggle at each interrupt. The second method uses the pulse width modulator (PWM), which was previously presented in Section 8.6. The output compare method is used here (Program 9.4 adapted), but the PWM approach has the advantage of not requiring a periodic interrupt. The initialization of Program 9.5 selects toggle mode for output compare 6. Specifically, we set the bits (OM6,OL6) to (0,1) in TCTL1. To select the frequency of the sound we simply set the rate at which output compare interrupts are generated. To turn the sound off, we disarm OC6 interrupts. Notice that with toggle mode, the output compare hardware changes the PT6 output automatically. Using automatic mode (as compared to having the software set and clear the port) creates a squarewave with very low jitter (down to the stability of the crystal).
// 9S12C32 4 MHz, 9S12DP512 8 MHz
void OC6_Init(void){
  asm sei          // Make atomic
  TSCR1 = 0x80;    // 1 MHz TCNT
  TSCR2 = 0x03;    // (0x02 if C32)
  TIOS |= 0x40;    // activate OC6
  TIE |= 0x40;     // arm OC6
  TCTL1 = (TCTL1&0xCF)|0x10; // (OM6,OL6)=(0,1), toggle
  TC6 = TCNT+50;   // first in 50us
  TFLG1 = 0x40;    // clear C6F
  asm cli          // enable IRQ
}
interrupt 14 void OC6handler(void){
  TC6 = TC6+500;   // next in 0.5 ms
  TFLG1 = 0x40;    // acknowledge C6F
}
Program 9.5 Sound output using output compare.
Observation: To make a quieter sound, we could use a larger resistor between the 9S12 output and the 2N2222 base.
9.6.3 Input Capture Interrupts
We can use input capture to measure the period or pulse width of digital signals. The input capture system can also be used to trigger interrupts on rising or falling transitions of external signals. Table 9.8 shows the registers needed for input capture. TCNT is a 16-bit counter incremented at a fixed rate, determined by the E clock and the TSCR2 register. On most 9S12 microcontrollers, an input capture feature exists for each of the eight Port T inputs (let n be 0 to 7, representing the input PT0 to PT7 respectively). There is a separate 16-bit input capture register for each of the 8 input capture modules (TC0 to TC7). Each input capture module has
• A direction register bit, DDRTn
• An external input pin, PTn
• A flag bit, CnF
• Two edge control bits, EDGnB EDGnA
• An interrupt arm bit, CnI
• A 16-bit input capture register, TCn
In this book, we use the term arm to describe the bit that allows/denies a specific flag from requesting an interrupt. The Freescale manuals refer to this bit as a mask. That is, the device is armed when the mask bit is 1. Typically, there is a separate arm bit for every flag that can request an interrupt. An external input signal is connected to the input capture pin (PT0 to PT7). The EDGnB, EDGnA bits specify whether the rising, falling or both rising and falling edges of the external signal will trigger an input capture event, see Table 9.10. Two or three actions result from an input capture event:
1. The current TCNT value is copied into the input capture register, TCNT → TCn
2. The input capture flag is set, 1 → CnF
3. An interrupt is requested if CnI equals 1
This means an interrupt can be requested on a capture event. The input capture mechanism has many uses. Three common applications of input capture are:
1. An interrupt service routine is executed on the active edge of the external signal
2. Perform two rising edge input captures and subtract the measurements to get period
3. Perform rising edge then falling edge captures and subtract the measurements to get pulse width
The flag bits do not behave like a regular memory location. In particular, a flag cannot be set by software. Rather, an input capture or output compare hardware event will set the flag. The other peculiar behavior of the flag is that the software must write a one to the flag in order to clear it. If the software writes a zero to the flag, no change will occur. The pin is selected as input capture by placing a 0 in the corresponding bit of the TIOS register. There is a direction register, DDRT, and we should clear the corresponding bits for the input capture inputs. We specify the active edge (i.e., the edge that latches TCNT and sets the flag) by initializing the TCTL3 and TCTL4 registers, as described in Table 9.10.
We can arm or disarm the input capture interrupts by initializing the TIE register. Our software can determine if an input capture event has occurred by reading the TFLG1 register. Every time the TCNT register overflows from $FFFF to 0, the TOF flag in the TFLG2 register is set. The TOF flag will cause an interrupt if the mask TOI equals 1.
Checkpoint 9.11: When does an input capture event occur?
Table 9.10 Two control bits define the active edge used for input capture.
EDGnB  EDGnA  Active Edge
0      0      None
0      1      Capture on rising
1      0      Capture on falling
1      1      Capture on both rising and falling
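The edge codes in Table 9.10 occupy two adjacent bits per channel, so selecting an edge is a read-modify-write on one 2-bit field. The sketch below is hypothetical (IC_SetEdge and the enum names are not from the text) and operates on a plain byte rather than the real register. It assumes channel n sits at bit position 2n of TCTL4 for channels 0 to 3, which matches the PT1 setup used in this chapter. The equivalent layout for channels 4 to 7 in TCTL3 is an assumption that should be checked against the data sheet.

```c
#include <stdint.h>

/* Edge codes from Table 9.10, written as the 2-bit value (EDGnB,EDGnA). */
enum { IC_NONE = 0, IC_RISING = 1, IC_FALLING = 2, IC_BOTH = 3 };

/* Hypothetical helper: return the new value of an edge control register
   after selecting `edge` for the channel whose two bits start at bit
   2*slot (slot 0 to 3). The other six bits are preserved. */
uint8_t IC_SetEdge(uint8_t reg, unsigned slot, unsigned edge){
  unsigned shift = 2u * (slot & 3u);
  reg = (uint8_t)(reg & ~(3u << shift));          // clear EDGnB,EDGnA
  reg = (uint8_t)(reg | ((edge & 3u) << shift));  // insert the new code
  return reg;
}
```

On the target this would be used as TCTL4 = IC_SetEdge(TCTL4, 1, IC_RISING); which has the same effect as TCTL4 = (TCTL4&0xF3)|0x04;.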
Checkpoint 9.12: What happens during an input capture event?
Observation: The TCNT timer is very accurate because of the stability of the crystal clock. Therefore, measurements based on the clock will also be very accurate.
Observation: When measuring period or pulse-width, the measurement resolution will equal the TCNT period.
The flags in the TFLG1 and TFLG2 registers are cleared by writing a 1 into the specific flag bit we wish to clear. For example, writing a $FF into TFLG1 will clear all 8 flags. The following is a valid method for clearing C3F. That is, this acknowledge sequence clears the C3F flag without affecting the other 7 flags in the TFLG1 register.
TFLG1 = 0x08;
Checkpoint 9.13: Write assembly or C code to clear C6F.
Common Error: Executing TFLG1 |= 0x08; will mistakenly clear every flag that is currently set in the TFLG1 register, because the read-modify-write sequence writes a one back to each set flag.
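The write-one-to-clear behavior can be modeled on a desktop computer to see why the read-modify-write form is wrong. This is only an illustrative model: FlagReg, FlagReg_Write, Ack_Correct, and Ack_Wrong are hypothetical names, and the struct stands in for the hardware register.

```c
#include <stdint.h>

/* Host-side model of a write-one-to-clear flag register
   (an illustration, not the real hardware register). */
typedef struct { uint8_t flags; } FlagReg;

/* Writing a 1 clears that flag; writing a 0 leaves it unchanged. */
void FlagReg_Write(FlagReg *r, uint8_t value){
  r->flags = (uint8_t)(r->flags & ~value);
}

/* Correct acknowledge: write only the mask, like TFLG1 = 0x08; */
void Ack_Correct(FlagReg *r, uint8_t mask){
  FlagReg_Write(r, mask);
}

/* Common error: read-modify-write, like TFLG1 |= 0x08;
   The value read back has a 1 for every flag that is set,
   so writing it clears all of them. */
void Ack_Wrong(FlagReg *r, uint8_t mask){
  uint8_t readback = r->flags;
  FlagReg_Write(r, (uint8_t)(readback | mask));
}
```

Starting from flags equal to 0x4A (C6F, C3F, and C1F set), Ack_Correct with mask 0x08 leaves 0x42, while Ack_Wrong with the same mask leaves 0x00, so the pending C6F and C1F events are lost.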
Example 9.6 Design a system that measures period with a resolution of 1 µs.
Solution Period is defined as the time from one rising edge to the next rising edge. The input signal will be connected to PT1 (any Port T pin could have been used) and the input capture system will be used to measure period. The initialization function first sets the I bit, so interrupts do not occur until the entire initialization sequence is complete, see Program 9.6. TIOS bit 1 and DDRT bit 1 are cleared so PT1 will be an input capture. Input capture is part of the timer module, which is activated by setting the TEN bit. The resolution of the system is determined by the period of the TCNT, so TSCR2 is set to make the TCNT period equal to 1 µs, assuming the E clock is 8 MHz. Because the 9S12 must execute the ISR every rising edge, we should not try to use this solution to measure periods less than 50 µs. In particular, it takes 9 bus cycles to perform an interrupt context switch plus 31 cycles to execute this assembly language ISR (Metrowerks Codewarrior C ISR executes in 30 cycles), so 40 cycles or 5 µs are required for each edge. If the input wave has a period of 50 µs, then the ISR software consumes 10% of the available processor execution. On the other extreme, this solution will be incorrect for periods over 65.535 ms. The TCTL4 register is configured so PT1 captures on each rising edge. Global variables are initialized and interrupts are armed and enabled. The 16-bit subtraction in the ISR calculates the number of TCNT clocks between rising edges. Since the ritual does not wait for the first edge, the first period measurement will be incorrect and should be neglected.
Period rmb  2            ;resolution 1us
First  rmb  2            ;TCNT at first edge
Done   rmb  1            ;set each rising
Init   sei               ;make atomic
       bclr TIOS,#$02    ;PT1=input capture
       bclr DDRT,#$02    ;PT1 is input
       movb #$80,TSCR1   ;enable TCNT
       movb #$03,TSCR2   ;1us clk
       bclr TCTL4,#$08   ;EDG1BA =01
       bset TCTL4,#$04   ;on rise of PT1
       movw TCNT,First   ;init global
       clr  Done
       movb #$02,TFLG1   ;clear C1F
       bset TIE,#$02     ;Arm C1F
       cli               ;enable
       rts
TC1Han ldd  TC1          ;[3]
       subd First        ;[3]
       std  Period       ;1us resolution [3]
       movw TC1,First    ;setup [6]
       movb #$02,TFLG1   ;clear C1F [4]
       movb #$FF,Done    ;[4]
       rti               ;[8]
       org  $FFEC        ;timer channel 1
       fdb  TC1Han
// Range = 50 us to 65.535 ms,
// no overflow checking
unsigned short Period; // 1us units
unsigned short First;  // TCNT first edge
unsigned char Done;    // Set each rising
void Init(void){
  asm sei              // make atomic
  TIOS &=~0x02;        // PT1 input capture
  DDRT &=~0x02;        // PT1 is input
  TSCR1 = 0x80;        // enable TCNT
  TSCR2 = 0x03;        // 1us clock
  TCTL4 = (TCTL4&0xF3)|0x04; // rising
  First = TCNT;        // first will be wrong
  Done = 0;            // set on subsequent
  TFLG1 = 0x02;        // Clear C1F
  TIE |= 0x02;         // Arm IC1
  asm cli
}
void interrupt 9 TC1Han(void){
  Period = TC1-First;  // 1us resolution
  First = TC1;         // Setup for next
  TFLG1 = 0x02;        // ack by clearing C1F
  Done = 0xFF;
}
Program 9.6 A software system implementing 16-bit period measurement.
Because the input capture interrupt has a separate vector, the software does not poll. An interrupt is requested on each rising edge of the input signal. Figure 9.9 illustrates the period measurement for one situation with a period of 8192 µs. On the first interrupt, TCNT ($F000) is latched into TC1. The ISR will save the $F000 in the private global called First. On the second interrupt, TCNT ($1000) is latched into TC1. The ISR will perform a 16-bit subtraction of $1000 − $F000 = $2000 = 8192, and store the 8192 into the public global called Period. This method is accurate as long as the period is between 50 and 65535 µs.
Figure 9.9 Example measurement of an input with an 8192 µs period.
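The subtraction works across the TCNT rollover because 16-bit unsigned arithmetic is modulo 65536. The following host-side sketch shows this with the values from the example above. ElapsedCounts is a hypothetical name, and uint16_t is used here in place of unsigned short so the width is explicit.

```c
#include <stdint.h>

/* Elapsed TCNT counts between two captures, modulo 2^16.
   Correct only if fewer than 65536 counts passed between them. */
uint16_t ElapsedCounts(uint16_t first, uint16_t second){
  return (uint16_t)(second - first);
}
```

ElapsedCounts(0xF000, 0x1000) returns 0x2000, which is 8192, even though TCNT wrapped from $FFFF to $0000 between the two edges. The cast matters on a host where int is wider than 16 bits, because the intermediate difference is negative before it is truncated.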
Checkpoint 9.14: How would you modify Program 9.6 to implement a 2 µs measurement resolution?
The interface circuit in Figure 9.7 could be combined with Program 9.6 to measure the speed of a spinning motor by connecting V2 to PT1 and calculating Speed = constant/Period.
9.7 Pulse Accumulator
The pulse accumulator is a mechanism on the 9S12 to count events, measure frequency, or measure pulse width on a digital input signal. For example, if we wished to know how fast a motor is spinning, we could use a tachometer, which generates a squarewave with a frequency that is related to motor speed. We interface the tachometer output to the PT7 input and use the pulse accumulator to measure either frequency or pulse width. The software then converts the pulse accumulator measurements into motor speed. The 9S12 pulse accumulator is a 16-bit read/write counter that can operate in either of two modes. External event counting mode can be used for counting events or frequency measurement. We will use gated time accumulation mode for pulse width measurement. The I/O ports involved in the 9S12 pulse accumulator are shown in Table 9.11. The bits used in this section are shown in bold.
Address  Bit 7  6      5      4      3     2     1      Bit 0  Name
$0046    TEN    TSWAI  TSBCK  TFFCA  0     0     0      0      TSCR1
$0060    0      PAEN   PAMOD  PEDGE  CLK1  CLK0  PAOVI  PAI    PACTL
$0061    0      0      0      0      0     0     PAOVF  PAIF   PAFLG
$0240    PT7    PT6    PT5    PT4    PT3   PT2   PT1    PT0    PTT
$0242    DDRT7  6      5      4      3     2     1      Bit 0  DDRT

Address  msb                                        lsb  Name
$0062    15  14  13  12  11  10  9  8  7  6  5  4  3  2  1  0   PACNT
Table 9.11 9S12 I/O ports used by the pulse accumulator.
DDRT7 is the Data Direction bit for PT7. Normally, the DDRT7 bit is cleared so PT7 is an input, but even if it is configured for output, PT7 still drives the pulse accumulator. PAEN is the Pulse Accumulator System Enable bit. Turn this bit on to activate the pulse accumulator. The PAMOD and PEDGE bits select the operation mode, as shown in Table 9.12.
PAMOD  PEDGE  Mode                     Action on Clock                     Sets PAIF
0      0      event counting           PT7 falling edge increments PACNT  Falling edge
0      1      event counting           PT7 rising edge increments PACNT   Rising edge
1      0      gated time accumulation  Counts when PT7 = 1                 Falling edge
1      1      gated time accumulation  Counts when PT7 = 0                 Rising edge
Table 9.12 9S12 pulse accumulator operation modes on PT7.
In the event counting mode, the 16-bit counter (PACNT) is incremented on either the rising edge or falling edge of PT7. The maximum clocking rate for the external event counting mode is the E clock frequency divided by two. Event counting mode does not require the timer to be enabled. To use counting mode to measure frequency, we count the number of edges in a fixed time, T. We define frequency resolution as the smallest change in frequency the system can recognize. In this approach, the frequency resolution will be 1/T. The range of frequencies that can be measured will be 0 to 65535/T.
In the gated time accumulation mode, a free-running clock (E clock divided by 64) increments the 16-bit counter. In particular, the E clock divided by 64 increments PACNT while the PT7 input is active. Gated accumulation mode does require the TEN bit in the TSCR1 register to be set. We can use gated accumulation mode to measure pulse width. We define pulse width resolution as the smallest change in pulse width the system can recognize. Let tE be the period of the E clock. The pulse width resolution will be 64*tE. The range of pulse widths that can be measured will be 64*tE to 65535*64*tE.
The PAOVF status bit is set each time the pulse accumulator count rolls over from $FFFF to $0000. To clear this status bit, we write a one to the PAFLG register bit 1. The PAOVI bit will arm the device so that a pulse accumulator interrupt is requested when PAOVF is set. When PAOVI is zero, pulse accumulator overflow interrupts are disarmed. The PAIF status bit is automatically set each time a selected edge is detected at the PT7 pin (PEDGE = 0 means falling edge, and PEDGE = 1 means rising edge). To clear this status bit, we write a one to the PAFLG register bit 0. The PAI bit will arm the device so that a pulse accumulator interrupt is requested when PAIF is set. When PAI is zero, pulse accumulator input interrupts are disarmed.
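The resolution and range formulas above can be turned into a few lines of host-side arithmetic. This is a minimal sketch under stated assumptions: the function names are hypothetical, times are in nanoseconds and milliseconds, frequency is in millihertz to avoid floating point, and the expected count is a nominal value that ignores the one-count uncertainty caused by the phase of the input relative to the measurement window.

```c
#include <stdint.h>

/* Gated time accumulation: PACNT increments every 64*tE,
   so that is the pulse width resolution (tE in ns). */
uint32_t PA_PulseResolution_ns(uint32_t tE_ns){
  return 64u * tE_ns;
}

/* Convert a PACNT reading into a pulse width in ns. */
uint64_t PA_PulseWidth_ns(uint16_t pacnt, uint32_t tE_ns){
  return (uint64_t)pacnt * PA_PulseResolution_ns(tE_ns);
}

/* Event counting: nominal number of edges counted during a gate time
   of gate_ms for an input of freq_mHz (millihertz), truncated to an
   integer and saturated at the 16-bit limit of PACNT. */
uint16_t PA_ExpectedCount(uint32_t freq_mHz, uint32_t gate_ms){
  uint64_t edges = ((uint64_t)freq_mHz * gate_ms) / 1000000u;
  if(edges > 65535u) return 65535u;
  return (uint16_t)edges;
}
```

Assuming a 125 ns E clock (8 MHz), PA_PulseResolution_ns(125) is 8000 ns, and the largest measurable width, PA_PulseWidth_ns(65535, 125), is 524,280,000 ns or about 0.52 s. With a 1000 ms gate time the frequency resolution is 1 Hz, so a 22.1 Hz input (22100 mHz) gives a nominal count of 22.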
Observation: The PACNT input and timer channel 7 use the same pin PT7. To use the pulse accumulator, disconnect PT7 from the output compare logic by clearing bits OM7 and OL7. Also clear the channel 7 output compare 7 mask bit, OC7M7.
Example 9.7 Design a system that measures frequency with a resolution of 1 Hz.
Solution To establish the frequency resolution at 1 Hz, we count the number of falling edges that occur in one second. The signal to be measured will be connected to the pulse accumulator input, which is PT7 on the 9S12. The frequency measurement function, shown in Program 9.7, enables the pulse accumulator and selects event counting mode. When measuring frequency it usually doesn't matter whether we count rising or falling edges. In this case, falling edges will be counted. The approach will be to initialize the pulse accumulator to event counting, clear the count, wait 1 second, then read the counter. Since frequency is defined as the number of edges in one second, the value in the PACNT after the one second time delay will be frequency in Hz. The 9S12 can measure 0 to 65535 Hz. The frequency resolution (which is the smallest change in frequency that can be distinguished) will be 1 Hz. In general, the frequency resolution will be one divided by the fixed time during which counts are measured. The PAOVF bit will be set if the input frequency exceeds the measurement range. If the input signal has a frequency of 22.1 Hz (as illustrated in Figure 9.10), then the function will return a result of 22.
Program 9.7 Frequency measurement using the pulse accumulator.
Figure 9.10 Example measurement with an input with a 22 Hz frequency.
Freq_Init bclr DDRT,#$80  ;PT7 is input
       movb #$40,PACTL    ;count falling
       rts
;measures 0 to 65535 Hz
;returns Reg D = freq in Hz
Freq_Measure
       movw #0,PACNT
       movb #$02,PAFLG    ;clear PAOVF
       ldy  #1000
       bsr  Timer_Wait1ms
       brclr PAFLG,#$02,ok ;check PAOVF
bad    ldd  #65535        ;too big
       bra  out
ok     ldd  PACNT         ;units in Hz
out    rts
void Freq_Init(void){
  DDRT &= ~0x80;       // PT7 input
  PACTL = 0x40;        // count falling
}
// measures 0 to 65535 Hz
// returns result in Hz
unsigned short Freq_Measure(void){
  PACNT = 0;
  PAFLG = 0x02;        // clear PAOVF
  Timer_Wait1ms(1000);
  if(PAFLG&0x02){      // overflow, too big
    return(65535);
  }
  return PACNT;        // frequency
}
Checkpoint 9.15: What will be the output of Program 9.7 if the frequency is 1234.56 Hz?
Checkpoint 9.16: How do you modify Program 9.7 so that it measures frequency with a resolution of 1 kHz? What will the output then be if the frequency is 1234.56 Hz?
Example 9.8 Design a system that measures pulse width with a resolution of 8 µs.
Solution Pulse width is defined as the time the input signal is high. Again, the input signal will be connected to the pulse accumulator input, which is PT7 on the 9S12. The pulse width measurement function, shown in Program 9.8, enables the pulse accumulator and selects gated accumulation mode. In this case, PEDGE is set to zero, so the PACNT will accumulate when the input is high. With PEDGE equal to zero, the PAIF will be set on the falling edge of the input, signaling the pulse width measurement is complete. The approach will be to initialize the pulse accumulator to gated accumulation mode, clear the count, wait for PAIF to be set, then read the counter. Since PACNT counts while the input is high, the value in this counter will represent the width of the pulse. The pulse width resolution is the smallest change in pulse width that can be distinguished. In general, the pulse width resolution will be the period of the free-running clock used to increment the counter. Assuming the 9S12 E clock period is 125 ns, the pulse width resolution will be 8 µs. The 9S12 can measure 8 µs to 0.52 s. The PAOVF bit will be set if the input pulse width exceeds the measurement range. If the input signal has a pulse width of 152 µs (as illustrated in Figure 9.11), then the function will return a result of 152/8 or 19.
Pulse_Init movb #$80,TSCR1 ;TEN=1, needed for E/64 clock
       bclr DDRT,#$80     ;PT7 is input
       movb #$60,PACTL    ;measure high
       rts
;returns Reg D = pulse width in 8us
; measures 8us to 0.52s
Pulse_Measure
       movw #0,PACNT
       movb #$03,PAFLG    ;clear PAOVF and stale PAIF
loop   brclr PAFLG,#$01,loop ;wait for PAIF
       brclr PAFLG,#$02,ok ;check PAOVF
bad    ldd  #65535        ;too big
       bra  out
ok     ldd  PACNT         ;units in 8us
out    rts
void Pulse_Init(void){
  TSCR1 = 0x80;        // TEN=1, needed for E/64 clock
  DDRT &= ~0x80;       // PT7 input
  PACTL = 0x60;        // measure high
}
// measures 8us to 0.52 sec
// returns result in 8us
unsigned short Pulse_Measure(void){
  PACNT = 0;
  PAFLG = 0x03;        // clear PAOVF and stale PAIF
  while((PAFLG&0x01)==0){}; // wait for PAIF
  if(PAFLG&0x02){      // overflow, too big
    return(65535);
  }
  return PACNT;        // pulse width
}
Program 9.8 Pulse width measurement using the pulse accumulator.
Figure 9.11 Example measurement of an input with a 152 s pulse width.
Checkpoint 9.17: The pulse width resolution of the system in Program 9.8 is 8 µs. What does that mean?
Checkpoint 9.18: What will be the output of Program 9.8 if the pulse width is 1234.5 µs?
9.8 *Direct Memory Access
The purpose of this section is to introduce terminology of high-speed I/O interfacing. The bandwidth of an I/O device is the number of bytes/sec that can be transferred. Real-time systems have extremely tight requirements for both latency and bandwidth. The 9S12 has a 16-bit data bus. If it is executing at 24 MHz, the data bus bandwidth is 48 Mbytes/sec. A high speed SCI interface can achieve only 10,000 bytes/sec. The SPI clock can run at 12 Mbps, but peak bandwidth for an SPI interface will be limited by software speed. One of the limitations of software-based interfaces such as busy-wait and interrupts is that the data must be brought into the processor and manipulated before it can be transferred to memory. If you wish to transfer data from an SPI input device into RAM, you must first transfer it from SPIDR to Register A, then from Register A into RAM. In order to achieve high bandwidth, we need to be able to transfer data directly from input to RAM or RAM to output using Direct Memory Access, or DMA. Because DMA is faster, we will use this method to interface high bandwidth devices like disks and networks.
A key architecture component is the availability of a co-processor that can perform I/O functions in parallel with but separate from the processor execution. We program a co-processor in a similar manner to the way we program the regular processor. For example, there can be a program counter and general purpose registers. However, the instructions are usually very simple, explicitly defining an I/O operation to perform. An architecture with simple and explicit machine codes is called a Reduced Instruction Set Computer (RISC). Devices that support DMA include the hard drive controller on the PC, the video graphics controller on the PC, and the XGate peripheral co-processor on the 9S12X series of microcontrollers from Freescale. During a read DMA cycle (Figure 9.12) data flows directly from the memory to the output device.
During the DMA cycles the co-processor drives the address and control bus.
Figure 9.12 A DMA read cycle copies data from RAM, ROM or input device into an output device.
During a write DMA cycle (Figure 9.13) data flows directly from the input device to memory.
Figure 9.13 A DMA write cycle copies data from the input device into RAM, or output device.
Prediction: The need for I/O bandwidth will increase faster than the processor execution speed, therefore I/O co-processors will become more prevalent in embedded systems of the future.
Prediction: The power requirements increase linearly with bandwidth, so there will always be a place in the embedded systems market for low speed low power systems.
9.9 Hardware Debugging Tools
Microcomputer related problems often require the use of specialized equipment to debug the system hardware and software. Two very useful tools are the logic analyzer and in-circuit emulator (ICE). A logic analyzer is essentially a multiple channel digital storage scope with many ways to trigger (see Figure 9.14). As a troubleshooting aid, it allows the experimenter to observe numerous digital signals at various points in time and thus make decisions based upon such observations. As with any debugging process, it is necessary to select which information to observe out of a vast set of possibilities. Any digital signal in the system can be connected to the logic analyzer. Figure 9.14 shows an 8-channel logic analyzer, but real devices can support 128 or more channels. One problem with logic analyzers is the massive amount of information that they generate. With logic analyzers (similar to other debugging techniques) we must strategically select which signals in the digital interfaces to observe and when to observe them. In particular, the triggering mechanism can be used to capture data at appropriate times, eliminating the need to sift through volumes of output. Sometimes there are extra I/O pins on the microcontroller, not needed for the normal operation of the system (shown as the bottom two wires in Figure 9.14). In this case, we can connect the pins to a logic analyzer, and add software debugging instruments that set and clear these pins at strategic times within the software. In this way we can visualize the hardware/software timing.
Figure 9.14 A logic analyzer and example output.
Some microcontrollers have external pins carrying the address, R/W, and data signals, which contain the bus cycle information discussed in Section 4.2. In this case, we could connect address, R/W, and data to the logic analyzer. The logic analyzer must be synchronized to the processor, so that the analyzer knows which memory reads are op code fetches. This way the location and the data it calculates can be reconstructed from the bus cycles. This debugging method is nonintrusive. This process doesn't work on high-performance processors such as the Pentium because (1) there is an internal memory cache to contain data it needs most frequently, and (2) it fetches many op codes that are never actually executed, as it tries to prefetch machine codes it thinks the processor will need in the future.
An in-circuit emulator is a hardware debugging tool that recreates the input/output signals of the processor chip. To use an ICE, we remove the processor chip. One side of the cable is inserted into the vacated processor chip socket, and the other side is connected to the ICE. Figure 9.15 shows the microcomputer system with and without the ICE. Notice the cable between the debugging instrument (ICE) and the microcomputer socket on the target board. In most cases, the emulator/computer system operates at full speed. The emulator allows the programmer to observe and modify internal registers of the processor. Emulators are often integrated into a personal computer, so that its editor, hard drive, and printer are available for the debugging process.
Figure 9.15 In-circuit emulator and example output.
Observation: Many target microcomputer systems have the microcomputer chip soldered onto the circuit board, and thus it cannot be removed.
To debug a board-level system where the program is stored in an external ROM, we can use another class of emulator called the ROM-emulator (see Figure 9.16). This debugging tool replaces the ROM with a cable that connects to a dual-port RAM within the emulator. While the software is running, it fetches information from the emulator RAM just as if it were the ROM. While the software is halted, you can modify its contents.
Figure 9.16 In-circuit ROM emulator and example output.
Observation: An in-circuit ROM emulator can only be used in a microcomputer system that stores the program into an external ROM chip.
The only disadvantage of the in-circuit emulator is its cost. To provide some of the benefits of this high-priced debugging equipment, the 9S12 has a background debug module (BDM). The BDM hardware exists on the microcomputer chip itself and communicates with the debugging computer via a dedicated serial interface, as shown in Figure 9.17. Although not as flexible as an ICE, the BDM can provide the ability to observe software execution in real-time, the ability to set breakpoints, the ability to stop the computer and the ability to read and write registers, I/O ports and memory. The registers can only be observed when the computer is halted, but the memory and I/O ports are accessible while the program is executing.
Figure 9.17 P&E Microcomputer Systems Multilink BDM.
9.10 Profiling
Profiling is similar to performance debugging because both involve dynamic behavior. Profiling is a debugging process that collects the time history of strategic variables. For example, if we could collect the time-dependent behavior of the program counter, then we could see the execution patterns of our software. We can profile the execution of a multiple thread software system to detect reentrant activity. We can profile a software system to see which of two software modules is run first. For a real-time system, we need to guarantee the time between when software should be run and when it actually runs is short and bounded. Profiling allows us to measure when software is actually run, experimentally verifying the system is real-time.
9.10.1 Profiling Using a Software Dump to Study Execution Pattern
Program 9.9 Debugging instrument for profiling.
In this section, we will use a debugging instrument to study the execution pattern of our software. In order to collect information concerning execution we will define a debugging instrument that saves the time and location in an array (like a dump), as shown in Program 9.9. The debugging session will initialize the private global N to zero. In this profile, the place p will be an integer, uniquely specifying from which place in the software Profile is called. The assembly version of Profile requires 44 cycles to execute (including the ldy and jsr). If the 9S12 is running at 24 MHz, this debugging instrument consumes less than 2 µs per call. This amount of time would usually be classified as minimally intrusive.
Time   rmb  200
Place  rmb  200
N      rmb  1
Profile                  ;RegY contains p
       pshb
       pshx
       ldab N
       cmpb #100         ;full?
       bhs  Pdone
       lslb              ;16-bits each
       ldx  #Time
       movw TCNT,B,x     ;record time
       ldx  #Place
       sty  B,X          ;record place
       inc  N
Pdone  pulx
       pulb
       rts
unsigned short Time[100];
unsigned short Place[100];
unsigned char N;
void Profile(unsigned short p){
  if(N<100){
    Time[N] = TCNT;   // record time
    Place[N] = p;     // record place
    N++;
  }
}
Next, we add calls to the debugging instrument at strategic locations in the software, giving a different number for each place, as shown in Program 9.10. By observing these data, we can determine both a time profile (when, from TCNT) and an execution profile (where, from p) of the software execution.
Program 9.10 A time/position profile dumping into a data array.
;------t=sqrt(s)-----
; input  s RegA, resolution 1/16
; output t Reg B, 1/16
t     rmb  1         ;8-bit, res=1/16
cnt   rmb  1         ;loop counter
s16   rmb  2         ;16-bit 16*s
sqrt  ldy  #0
      jsr  Profile
      clrb           ;sqrt(0)=0
      tsta
      beq  done
      ldy  #1
      jsr  Profile
      ldab #16
      mul            ;16*s
      std  s16       ;s16=16*s
      movb #32,t     ;t=2.0
      movb #3,cnt
next  ldy  #2
      jsr  Profile
      ldaa t         ;RegA=t
      tab            ;RegB=t
      tfr  a,x       ;RegX=t
      mul            ;RegD=t*t
      addd s16       ;RegD=t*t+16*s
      idiv           ;RegX=(t*t+16*s)/t
      tfr  x,d
      lsrd           ;RegB=((t*t+16*s)/t)/2
      adcb #0
      stab t
      dec  cnt
      bne  next
done  ldy  #3
      jsr  Profile
      rts
Observation: Debugging instruments need to save and restore registers so the original function is not disrupted.
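Once a dump has been collected, the Time and Place arrays can be turned into elapsed times between successive places. The following is a minimal host-side sketch in C, not part of the original programs: FakeTCNT is a hypothetical stand-in for the 9S12 TCNT register, and Elapsed is a helper name introduced here for illustration. It assumes that less than one full TCNT rollover (65536 counts) occurs between consecutive calls to Profile.

```c
#include <assert.h>
#include <stdint.h>

#define PROFILE_SIZE 100

static uint16_t Time[PROFILE_SIZE];   // timer value at each call
static uint16_t Place[PROFILE_SIZE];  // place number passed to Profile
static uint8_t  N;                    // number of entries recorded so far

static uint16_t FakeTCNT;             // hypothetical stand-in for the TCNT register

// Same logic as the C version of Program 9.9, reading FakeTCNT instead of TCNT.
void Profile(uint16_t p){
  if(N < PROFILE_SIZE){
    Time[N] = FakeTCNT;   // record time
    Place[N] = p;         // record place
    N++;
  }
}

// Timer counts between entry i-1 and entry i (valid for 1 <= i < N).
// The subtraction is done modulo 65536, so one rollover of the timer is handled.
uint16_t Elapsed(uint8_t i){
  assert(i >= 1 && i < N);
  return (uint16_t)(Time[i] - Time[i-1]);
}
```

For example, calling Profile at FakeTCNT values of 65530, 20, and 70 gives elapsed counts of 26 and 50, even though the timer rolled over between the first two calls.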
9.10.2 Profiling Using an Output Port
In this section, we will discuss a hardware/software combination to visualize program activity. Our debugging instrument will set output port bits. We will place these instruments at strategic places in the software. If we are using a regular oscilloscope, then we must stabilize the system so that the function is called over and over. We connect the output pins to a scope or logic analyzer and observe the program activity. Program 9.11 uses an output port to profile.
Program 9.11 A time/position profile using two output bits.
;------t=sqrt(s)-----
; input  s RegA, resolution 1/16
; output t Reg B, 1/16
t     rmb  1         ;8-bit, res=1/16
cnt   rmb  1         ;loop counter
s16   rmb  2         ;16-bit 16*s
sqrt  movb #0,PTT
      clrb           ;sqrt(0)=0
      tsta
      beq  done
      movb #1,PTT
      ldab #16
      mul            ;16*s
      std  s16       ;s16=16*s
      movb #32,t     ;t=2.0
      movb #3,cnt
next  movb #2,PTT
      ldaa t         ;RegA=t
      tab            ;RegB=t
      tfr  a,x       ;RegX=t
      mul            ;RegD=t*t
      addd s16       ;RegD=t*t+16*s
      idiv           ;RegX=(t*t+16*s)/t
      tfr  x,d
      lsrd           ;RegB=((t*t+16*s)/t)/2
      adcb #0
      stab t
      dec  cnt
      bne  next
done  movb #3,PTT
      rts
//------t=sqrt(s)-----
unsigned char sqrt(unsigned char s){
  unsigned char t;      // resolution 1/16
  unsigned char cnt;    // loop counter
  unsigned short s16;
  PTT = 0;
  t = 0;                // secant method
  if(s>0) {
    PTT = 1;
    s16 = 16*s;
    t = 32;             // guess 2.0
    for(cnt=3; cnt; cnt--){
      PTT = 2;
      t = ((t*t+s16)/t)/2;
    }
  }
  PTT = 3;
  return t;
}
Checkpoint 9.19: Write two friendly debugging instruments, one that sets Port B bit 3 high, and the other makes it low.
9.10.3 *Thread Profile
When more than one thread is active, you could use the previous technique to visualize the thread that is currently running. For each thread, we assign an output pin. The debugging instrument would set the corresponding bit high when the thread starts and clear the bit when the thread stops. We would then connect the output pins to a multiple-channel scope or logic analyzer to visualize in real time the thread that is currently running. For an example of this type of profile, run one of the thread.* examples included with the TExaS simulator, and observe the logic analyzer. Program 9.12 shows a simple thread profile of a system with a foreground thread (main program) and a background thread (ISR). PT1 will be high when the software is running in the foreground and PT0 will be high when executing in the background. The debugging instruments are shown in bold. The ISR saves the previous PTT value at the beginning and restores it at the end. The results shown in Figure 9.18 demonstrate that the interrupt occurs every 128 µs and that, most of the time, the software is running in the foreground.
       org  $0800        ;($3800 if C32)
Time   rmb  2
       org  $4000
main   lds  #$4000
       bset DDRT,#$03    ;PT1,PT0 output
       movb #$20,RTICTL  ;($10 if C32)
       movb #$80,CRGINT  ;arm RTI
       movw #0,Time
       cli               ;enable IRQ
       movb #$02,PTT     ;foreground
loop   bra  loop
; interrupts every 128us
RTIHan ldab PTT          ;save
       movb #$01,PTT     ;background
       movb #$80,CRGFLG  ;ack
       ldx  Time
       inx
       stx  Time
       stab PTT          ;restore
       rti
       org  $FFF0
       fdb  RTIHan       ;vector
unsigned short Time;
void main(void){
  DDRT |= 0x03;    // PT1,PT0 output
  RTICTL = 0x20;   // (0x10 if C32)
  CRGINT = 0x80;   // Arm
  Time = 0;        // Initialize
  asm cli
  PTT = 0x02;      // foreground
  while(1){
  }
}
// interrupts every 128us
void interrupt 7 RTIHan(void){
  char oldPTT=PTT;
  PTT = 0x01;      // background
  CRGFLG = 0x80;   // Acknowledge
  Time++;
  PTT = oldPTT;
}
Program 9.12 Implementation of a periodic interrupt using the real time clock feature.
Figure 9.18 Real-time thread profile measured with a logic analyzer.
Observation: Notice in Figure 9.18 that the time to execute the ISR (when PT0 is high) is short compared to the time between interrupt requests (period of PT0). This represents a good interrupt design.
9.11 Tutorial 9. Profiling
In this tutorial we will profile a real-time system that uses four periodic output compare interrupts. The goal of the system is to periodically execute four separate tasks in the background. Each task is performed at a fixed rate; the four rates are similar but unequal, as shown in Table T9.1. As you can see from the table, in each 1 second the time to execute all four tasks is less than 200 ms. In other words, we plan to use only 20 percent of the available processor time. The TCNT period will be set to 16 µs.
Task     ISR code      Interrupt period   Time to execute task   Total time in 1 second
Task 0   TC0=TC0+79    1264 µs            50 µs                  39.6 ms
Task 1   TC1=TC1+73    1168 µs            50 µs                  42.8 ms
Task 2   TC2=TC2+67    1072 µs            50 µs                  46.6 ms
Task 3   TC3=TC3+50    800 µs             50 µs                  62.5 ms
Table T9.1 Real-time requirements of an embedded system.
Monitors and memory dumps are minimally intrusive techniques that collect strategic information without slowing down the system under test too much. At the start of each ISR, one bit in Port T will be set high, and at the end of the ISR, that bit will be cleared. In addition, the main program will toggle PT4. We profile this system by observing all five bits on a logic analyzer. This profile will allow us to see where and when our tasks are running. We will be studying Task 3 in particular, so we expect PT3 to go high for 50 µs every 800 µs.
The second debugging instrument used in this tutorial is a memory dump. It is a memory dump because the debugging information is not output or displayed, but rather is just dumped into a memory buffer. In particular, we will measure the time from one execution of Task3 until the next execution of Task3. These measurements are entered into a histogram so we can see the variability in the period. Let I be the difference in TCNT cycles, which we expect to be 50 each time. We calculate the time error or jitter as J = I - 50. Next, we make it unsigned (K = J + 8) and apply upper and lower bounds (if K < 0 then K = 0, if K > 16 then K = 16). A histogram is a count of the number of times an event occurs, so we perform Dbg_Hist[K]++. The first entry, Dbg_Hist[0], is the number of times the time between executing Task3 is more than 96 µs too early. The middle entry, Dbg_Hist[8], is the number of times it is perfect (50 cycles or 800 µs). Similarly, Dbg_Hist[16] is the number of times it is more than 96 µs too late.
Dbg_Hist rmb 34 ;16-bit counts
Question 9.1 What could cause a delay in executing the Task3 ISR?
The debugging instruments shown in Program T9.1 were used to profile the system. FirstFlag is a flag used to skip the first measurement, because there is no previous interrupt from which to measure the time delay. PreviousTCNT is the TCNT measurement from the previous execution of Task3.
The initialization is called once, and the measurement is called from the start of Task3.
Question 9.2 Why is it important to know the variability in the time between successive executions of a periodic task?
Question 9.3 Consider the situation when two interrupts are requested at the same time. Is one lost or just delayed? If both are executed, which one goes first?
Observation: Profiling is made easier if the subroutine has a single rts exit point at the bottom of the function.
Action: Copy the Tutor9.rtf, Tutor9.uc, and Tutor9.scp files from the web onto your hard drive. Start a fresh copy of TExaS and open these files from within TExaS. Assemble and run the system, observing the logic analyzer. You should see something like Figure T9.1.
Question 9.4 Observe Figure T9.1. PT3 signifies the execution of Task3. The time between the first and second PT3 pulses is noticeably longer than the time between the second and third PT3 pulses. Why?
Program T9.1 Debugging instruments to measure time jitter of a periodic task.
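The jitter measurement described above can be sketched in C as follows. This is an illustrative sketch under stated assumptions, not the original Program T9.1: the names Dbg_Hist, FirstFlag, PreviousTCNT, and Dbg_Measure come from the text, while Dbg_Init and the parameter now (the current timer value, passed in so the code can also run on a host computer) are assumptions introduced here.

```c
#include <assert.h>
#include <stdint.h>

#define EXPECTED  50   // expected timer counts between executions of Task3
#define HIST_SIZE 17   // entries 0 to 16; entry 8 means zero jitter

uint16_t Dbg_Hist[HIST_SIZE];    // 16-bit counts, 34 bytes
static uint8_t  FirstFlag;       // nonzero until the first call has been seen
static uint16_t PreviousTCNT;    // timer value at the previous execution of Task3

// Call once before the interrupts are armed.
void Dbg_Init(void){
  for(int k = 0; k < HIST_SIZE; k++){
    Dbg_Hist[k] = 0;
  }
  FirstFlag = 1;
}

// Call at the start of Task3 with the current timer value.
void Dbg_Measure(uint16_t now){
  if(FirstFlag){
    FirstFlag = 0;               // skip: there is no previous time to compare with
  }else{
    int32_t I = (uint16_t)(now - PreviousTCNT);  // counts since last run, modulo 65536
    int32_t J = I - EXPECTED;    // jitter in timer counts
    int32_t K = J + 8;           // offset so that zero jitter lands in entry 8
    if(K < 0)  K = 0;            // lower bound
    if(K > 16) K = 16;           // upper bound
    assert(K >= 0 && K < HIST_SIZE);
    Dbg_Hist[K]++;
  }
  PreviousTCNT = now;
}
```

On the 9S12 itself, the instrument would read TCNT directly at the start of the Task3 ISR instead of taking a parameter.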
Question 9.5 In the original design specification we expected the four tasks to occupy 20% of the available processor time. Does the data in Figure T9.1 support or reject this hypothesis?
Figure T9.1 Profile the system.
Question 9.6 Observing the listing file, estimate the intrusiveness of the Dbg_Measure instrument.
Action: Close the logic analyzer window (so the simulation runs faster). Start the system and let it run for a long time.
Question 9.7 The Dbg_Hist[8] entry will get very large, but does either Dbg_Hist[0] or Dbg_Hist[16] ever get incremented? What does that mean?
Action: If we were to activate the PLL, changing the E clock from 8 MHz to 24 MHz, then the tasks would run 3 times faster. We can quickly simulate this effect by changing the 400 in the Fiftyus function to 133, making all four tasks complete in about 17 µs instead of 50 µs. Changing the E clock does not change how often the tasks should be run. That is, Task 3 should still run every 800 µs, but it now takes only 17 µs to complete. Assemble the new system and let it run for a long time.
Question 9.8 Can you say this new system is real time?
Question 9.9 How would you prove Task 3 is now running in real time?
9.12 Homework Problems
Homework 9.1 Your job is to design a device driver for a computer mouse. Assuming it is to be written in C, give the Mouse.h header file that lists the prototypes for the public functions. You show just the header file, not the implementation file.
Homework 9.2 Your job is to design a device driver for a black and white text-based video screen. There are 24 lines and 80 columns. Assuming it is to be written in C, give the Video.h header file that lists the prototypes for the public functions.
Homework 9.3 In this problem you will write an assembly language subroutine that outputs data to the following printer using a busy-waiting handshake protocol.
Figure Hw9.3 Printer interface.
[Diagram: 9S12 pins PA4, PA0, and PB7-PB0 connected to the printer signals Start, Ack, and Data.]
The following sequence will print one ASCII character:
1. The microcomputer puts the 8-bit ASCII on the Data lines
2. The microcomputer issues a Start pulse (does not matter how wide)
3. The microcomputer waits for the Ack pulse (Printer is done)
a) Show the subroutine that outputs a character. You may assume the Ack pulse is larger than 10 µs. The 8-bit ASCII data to print is passed by value in Reg B. An example calling sequence is
      ldab #'V     ; ASCII 'V'
      jsr  Output
b) How long is your Start pulse? Explain your calculation.
Homework 9.4 Redesign the printer interface of Homework 9.3 using interrupt synchronization. Connect the printer to Ports H and J and use a key wakeup interrupt on the Ack signal. Write three routines: an initialization subroutine to turn it on, a public function that accepts a null-terminated ASCII string pointed to by register X, and an interrupt service routine triggered by the rising edge of Ack. The ASCII string to print is passed by reference in Reg X. An example calling sequence is
      ldx  #String ; pointer to null-terminated ASCII string
      jsr  OutString
After the string has been printed, the system should disarm.
Homework 9.5 What happens if you forget to execute cli in the initialization in a system using interrupts?
Homework 9.6 What happens if you execute cli as the first instruction in an ISR?
Homework 9.7 What happens if you execute sei as the last instruction in an ISR?
Homework 9.8 Write interrupting software that maintains the time of day. Give the initialization, the ISR, and the interrupt vector. The initial time of day is passed in when initialization is called. Register A contains the initial hour, and Register B contains the initial minute. Assume the initial seconds are 0. Implement military time, where the hour goes from 0 to 23.
Homework 9.9 Write interrupting software that counts a global variable at 1 Hz. Give the initialization, the ISR, and the interrupt vector.
Homework 9.10 Assuming the object code is running in RAM, write three debugging subroutines that implement a ScanPoint system. The first subroutine initializes your system. The second subroutine adds a ScanPoint at the address passed into it in Register D. You may assume that the ScanPoint address is the first byte of an op code. When the target program executes that scanned instruction, the values of the registers are displayed, the original instruction is executed, and the program continues execution. Your system should be able to support up to ten ScanPoints. You may assume the SCI port is not used for the target system, and you can call any of the routines defined in tut2.rtf. The last subroutine removes a ScanPoint at the address passed into it in Register D. For simplicity, you may assume ScanPoints are only placed at single-byte instructions.
Homework 9.11 Assuming the object code is running in RAM, write debugging subroutines that implement single stepping. In particular, write a subroutine that executes the target software at the address passed into it in Register D. You may assume that the starting address is the first byte of an op code. Your system should execute the target program one instruction at a time, showing the values of the registers, and pausing for SCI input after each instruction. You may assume the SCI port is not used for the target system, and you can call any of the routines defined in tut2.rtf. If the operator types 'q', then the debugging halts and control is returned to the program that called your subroutine. For any other input, you should execute the next instruction. This is an advanced topic and will require output compare interrupts to solve.
Homework 9.12 Create the repeating waveform on PT7 output as shown in Figure Hw9.12. Design the software system using RTI periodic interrupts. Show all the software for this system: direction registers, global variables, stack initialization, RTI initialization, main program, RTI ISR, RTI vector and reset vector. The main program initializes the system, then executes a do-nothing loop. The RTI ISR performs output to Port T. Please make your code that accesses Port T friendly. Variables you need should be allocated in the appropriate places.
Figure Hw9.12 Desired output.
[Waveform on PT7: repeating pattern with alternating intervals of 5.12 ms and 10.24 ms.]
Homework 9.13 Create the repeating waveform on PT1 output as shown in Figure Hw9.13. Design the software system using OC1 periodic interrupts. Show all the software for this system: direction registers, global variables, stack initialization, OC1 initialization, main program, OC1 ISR, OC1 vector and reset vector. The main program initializes the system, then executes a do-nothing loop. The OC1 ISR performs output to Port T. Please make your code that accesses Port T friendly. Variables you need should be allocated in the appropriate places. Figure Hw9.13 Desired output.
[Waveform on PT1: repeating pattern with alternating intervals of 5 ms and 10 ms.]
Homework 9.14 Redesign the FSM in Homework 6.24 to run in the background using RTI interrupts. Execute the FSM every 2.048 ms. There are no backward jumps in the ISR.
Homework 9.15 Assume the PLL is running so the E clock is 25 MHz. Redesign the FSM in Homework 6.25 to run in the background using input capture and output compare interrupts. The FSM is run whenever there is a rising edge on PT3. There are no backward jumps in the ISR.
Homework 9.16 Redesign the FSM in Homework 6.26 to run in the background using output compare interrupts. Execute the FSM every 10 ms. There are no backward jumps in the ISR.
Homework 9.17 Redesign the FSM in Homework 6.27 to run in the background using output compare interrupts. Execute the FSM every 5 ms. There are no backward jumps in the ISR.
Homework 9.18 Redesign the FSM in Homework 6.28 to run in the background using TOF interrupts. Execute the FSM every 16.384 ms. There are no backward jumps in the ISR.
Homework 9.19 Assume the PLL is running so the E clock is 24 MHz. Redesign the system in Homework 6.25 without using the FSM to run in the background using input capture and output compare interrupts. An input capture occurs on the rising edge on PT3. The pulse is created with output compare. There are no backward jumps in the ISR.
Homework 9.20 These seven events all occur during each output compare 7 interrupt.
1. The TCNT equals TC7 and the hardware sets the flag bit (e.g., C7F = 1)
2. The output compare 7 vector address is loaded into the PC
3. The I bit in the CCR is set by hardware
4. The software executes movb #$80,TFLG1
5. The CCR, A, B, X, Y, PC are pushed on the stack
6. The software executes something like
      ldd  TC7
      addd #2000
      std  TC7
7. The software executes rti
List one possible order in which the events occur.
Homework 9.21 There is a digital squarewave connected to input PT0. Use input capture on PT0 and output compare on channel 1 to measure the frequency on PT0. The range of values is 0 to 10000 Hz, and the desired resolution is 1 Hz. Have the input capture interrupt on every rising edge of the input signal. Within the input capture ISR, increment a private global called Count. Have the output compare interrupt every 1 second. Within the output compare ISR, copy the Count value to a public global variable called Frequency, then clear Count for the next measurement. For example, if the frequency is 1000 Hz, the variable will be written with 1000. Show the ritual, input capture ISR and output compare ISR. Assume the E clock is 8 MHz.
Homework 9.22 There is a digital squarewave connected to input PT2. Use input capture on PT2 and output compare on channel 3 to measure the frequency on PT2. Have the input capture interrupt on every rising edge of the input signal. Within the input capture ISR, increment a private global called Count. Have the output compare interrupt every 0.1 second. Within the output compare ISR, copy the Count value to a public global variable called Frequency, then clear Count for the next measurement. The range of values is 0 to 10000 Hz, and the desired resolution is 10 Hz. For example, if the frequency is 1000 Hz, the variable will be written with 100. Show the ritual, input capture ISR and output compare ISR. Assume the E clock is 8 MHz.
Homework 9.23 Interface a switch to PJ7. Use positive logic (switch pressed makes PJ7 = 1). The switch bounce time is 10 ms. Use key wakeup on PJ7 and output compare on channel 0 to count the number of times the switch is pressed. Interrupt on the rising edge of PJ7. In the PJ7 ISR, disarm PJ7 and arm OC0 to interrupt in 15 ms. In the OC0 ISR, if the switch is pressed, increment the global variable, Count. The OC0 ISR software should disarm OC0 and rearm PJ7 key wakeup. Show the ritual, key wakeup ISR and output compare ISR. Assume the E clock is 8 MHz. Each touch will cause one key wakeup and one OC interrupt (Count incremented). Similarly, each release will cause one key wakeup and one OC interrupt (Count not incremented).
9.13 Laboratory Assignments
Lab 9.1 Traffic Light Controller
Purpose. This lab has these major objectives: the usage of linked list data structures, the creation of a segmented software system, and interrupt synchronization, all by designing an input-directed traffic light controller.
Description. Design, implement, and test the traffic light system described in Lab 6.5 with the added constraint that the software runs in the background using periodic interrupts. In particular, there are three components: a data structure containing the state graph, an initialization function that is called once to start the machine, and a periodic interrupt service routine that executes the state machine. All the other specifications and constraints described in Lab 6.5 still apply.
10 Numerical Calculations
Chapter 10 objectives are to:
• Introduce fixed-point and use it to develop numerical solutions
• Develop extended precision mathematical calculations
• Define floating point formats
The overall theme of this chapter is numerical calculations. Non-integer values can be represented on the computer using either fixed-point or floating point. Without hardware support, floating point operations run many times slower than fixed-point. Therefore, on a microcontroller like the 9S12 without floating point hardware, we would rather employ fixed-point. In general, we can use fixed-point for situations where the range of values is known at design time, and this range is small.
10.1 Fixed-Point Numbers
We will use fixed-point numbers when we wish to express values in our software that have noninteger values. A fixed-point number contains two parts. The first part is a variable integer, called I. This integer may be signed or unsigned. An unsigned fixed-point number is one that has an unsigned variable integer. A signed fixed-point number is one that has a signed variable integer. The precision of a number system is the total number of distinguishable values that can be represented. The precision of a fixed-point format is determined by the number of bits used to store the variable integer. On the 9S12, we typically use 8 bits or 16 bits. Extended precision can be implemented, but the execution speed will be slower because the calculations will have to be performed using software algorithms rather than with hardware instructions. This integer part is saved in memory and is manipulated by software. These manipulations include but are not limited to add, subtract, multiply, divide, convert to BCD, and convert from BCD.
The second part of a fixed-point number is a fixed constant, called Δ. This value is fixed at design time and cannot be changed at run time. The fixed constant is not stored in memory. Usually we specify the value of this fixed constant using software comments to explain our fixed-point algorithm. The value of the fixed-point number is defined as the product of the two parts:
Fixed-point number ≡ I • Δ
The resolution of a number is the smallest difference that can be represented. In the case of fixed-point numbers, the resolution is equal to the fixed constant (Δ). Sometimes we express the resolution of the number as its units. For example, a decimal fixed-point number with a resolution of 0.001 volts is really the same thing as an integer with units of mV. When inputting numbers from a keyboard or outputting numbers to a display, it may be convenient to use decimal fixed-point. With decimal fixed-point the fixed constant is a power of 10.
Decimal fixed-point number = I • 10^m for some constant integer m
Again, the integer m is fixed and is not stored in memory. Decimal fixed-point will be easy to input or output to humans, while binary fixed-point will be easier to use when performing mathematical calculations. With binary fixed-point the fixed constant is a power of 2.
Binary fixed-point number = I • 2^m for some constant integer m
Observation: If the range of numbers is known and small, then the numbers can be represented in a fixed-point format.
Checkpoint 10.1: Give an approximation of π using the decimal fixed-point (Δ = 0.001) format.
Checkpoint 10.2: Give an approximation of π using the binary fixed-point (Δ = 2^-8) format.
In the first example, we will develop the equations that a 9S12 would need to implement a digital voltmeter. The 9S12 has a built-in analog to digital converter (ADC) that can be used to transform an analog signal into digital form. The 10-bit ADC analog input range is 0 to 5 V, and the ADC digital output varies from 0 to 1023, respectively. Let Vin be the analog voltage in volts and N be the digital ADC output; then the equation that relates the analog to digital conversion is
Vin = 5*N/1023 = 0.0048876*N
Resolution is defined as the smallest change in voltage that the ADC can detect. This ADC has a resolution of about 5 mV. In other words, the analog voltage must increase or decrease by 5 mV for the digital output of the ADC to change by at least one bit. It would be inappropriate to save the voltage as an integer, because the only integers in this range are 0, 1, 2, 3, 4, and 5. Since the 9S12 does not support floating point, the voltage data will be saved in fixed-point format. Decimal fixed-point is chosen because the voltage data for this voltmeter will be displayed. A fixed-point resolution of 0.001 V is chosen because it is slightly smaller (better) than the ADC resolution. Table 10.1 shows the performance of the system. The table shows us that we need to store the variable part of the fixed-point number in a 16-bit variable.
Table 10.1 Performance data of a microcomputer-based voltmeter.
Vin (V)         N                     I (0.001 V)
Analog input    ADC digital output    Variable part of the fixed-point data
0.000           0                     0
0.005           1                     5
1.000           205                   1000
2.500           512                   2500
5.000           1023                  5000
One possible software formula to convert N into I is as follows.
I = (5000*N + 512)/1023
It is very important to carefully consider the order of operations when performing multiple integer calculations. There are two mistakes that can happen. The first error is overflow, and it is easy to detect. Overflow occurs when the result of a calculation exceeds the range of the number system. The two solutions of the overflow problem were discussed earlier: promotion and ceiling/floor. The other error is called drop-out. Drop-out occurs after a right shift or a divide, and the consequence is that an intermediate result loses its ability to represent all of the values. To avoid drop-out, it is very important to divide last when performing multiple integer calculations. If you divided first, e.g., I = 5000*(N/1023), then the values of I would be only 0 or 5000. The addition of "512" has the effect of rounding to the closest integer. The value 512 is selected because it is about one half of the denominator. For example, the calculation (5000*N)/1023 = 4 for N = 1, whereas the "(5000*1 + 512)/1023" calculation yields the better answer of 5. The display algorithm is given as Program 10.1.
Program 10.1 Print unsigned 16-bit decimal fixed-point number to an output device.
n){ // fixed constant is 0.001
    // digits to the left of the decimal point
    // decimal point
    // tenths digit
    // hundredths digit
    // thousandths digit
    // units
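The conversion formula and the display step can be sketched together in C. This is an illustrative sketch, not the original Program 10.1: the function names ADC_ToFix, ADC_ToFixDivideFirst, and Fix_ToString are hypothetical, the product 5000*N is promoted to 32 bits because it can reach 5,115,000, and the output goes to a character buffer rather than to a specific output device.

```c
#include <assert.h>
#include <stdint.h>

// Convert the 10-bit ADC sample n (0 to 1023) to I, where Vin = I*0.001 V.
// Multiply first, add 512 to round, and divide last: I = (5000*N + 512)/1023.
uint16_t ADC_ToFix(uint16_t n){
  assert(n <= 1023);
  uint32_t numerator = (uint32_t)5000 * n + 512;   // 32-bit intermediate
  return (uint16_t)(numerator / 1023);
}

// Divide-first version, shown only to demonstrate drop-out:
// n/1023 is 0 for every n below 1023, so the result is either 0 or 5000.
uint16_t ADC_ToFixDivideFirst(uint16_t n){
  assert(n <= 1023);
  return (uint16_t)(5000 * (n / 1023));
}

// Write n (value = n*0.001) into buf as text such as "2.500".
// buf must have room for 7 characters: up to "65.535" plus the terminator.
void Fix_ToString(uint16_t n, char buf[7]){
  assert(buf != 0);
  uint16_t whole = n / 1000;   // digits to the left of the decimal point
  uint16_t frac  = n % 1000;   // 0 to 999
  int i = 0;
  if(whole >= 10){
    buf[i++] = (char)('0' + whole / 10);
  }
  buf[i++] = (char)('0' + whole % 10);
  buf[i++] = '.';                              // decimal point
  buf[i++] = (char)('0' + frac / 100);         // tenths digit
  buf[i++] = (char)('0' + (frac / 10) % 10);   // hundredths digit
  buf[i++] = (char)('0' + frac % 10);          // thousandths digit
  buf[i] = '\0';
}
```

For example, ADC_ToFix(1) returns 5, which Fix_ToString renders as "0.005", while ADC_ToFixDivideFirst(512) returns 0 because the divide was performed first.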
When adding or subtracting two fixed-point numbers with the same Δ, we simply add or subtract their integer parts. First, let x, y, z be three fixed-point numbers with the same Δ. Let x = I•Δ, y = J•Δ, and z = K•Δ. To perform z = x + y, we simply calculate K = I + J. Similarly, to perform z = x - y, we simply calculate K = I - J. When adding or subtracting fixed-point numbers with different fixed parts, we must first convert the two inputs to the format of the result before adding or subtracting. This is where binary fixed-point is more convenient, because the conversion process involves shifting rather than multiplication/division. In this next example, let x, y, z be three binary fixed-point numbers with different Δs. In particular, we define x to be I•2^-5, y to be J•2^-2, and z to be K•2^-3. To convert x to the format of z, we divide I by 4 (right shift twice). To convert y to the format of z, we multiply J by 2 (left shift once). To perform z = x + y, we calculate
K = (I >> 2) + (J << 1)
For the general case, we define x to be I•2^n, y to be J•2^m, and z to be K•2^p. To perform any general operation, we derive the fixed-point calculation by starting with the desired result. For addition, we have z = x + y. Next, we substitute the definitions of each fixed-point parameter
K•2^p = I•2^n + J•2^m
Lastly, we solve for the integer part of the result
K = I•2^(n-p) + J•2^(m-p)
For multiplication, we have z = x•y. Again, we substitute the definitions of each fixed-point parameter
K•2^p = I•2^n • J•2^m
Lastly, we solve for the integer part of the result
K = I•J•2^(n+m-p)
For division, we have z = x/y. Again, we substitute the definitions of each fixed-point parameter
K•2^p = (I•2^n)/(J•2^m)
Lastly, we solve for the integer part of the result
K = (I/J)•2^(n-m-p)
Again, it is very important to carefully consider the order of operations when performing multiple integer calculations. We must worry about overflow and drop-out. In particular, in the division example, if (n - m - p) is positive then the left shift (I•2^(n-m-p)) should be performed before the divide (/J). We can use these fixed-point algorithms to perform complex operations using the integer functions of our 9S12.
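The binary fixed-point operations derived above can be sketched in C as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, x and y use the formats from the example (x = I•2^-5, y = J•2^-2), sums and products are returned in the format 2^-3, and the quotient is returned in the format 2^-6, a choice made here so that n - m - p is positive. No overflow checking is performed.

```c
#include <assert.h>
#include <stdint.h>

// Formats: x = I*2^-5 (n = -5), y = J*2^-2 (m = -2).

// z = x + y with z = K*2^-3:  K = (I >> 2) + (J << 1)
uint16_t Fix_Add(uint16_t I, uint16_t J){
  return (uint16_t)((I >> 2) + ((uint32_t)J << 1));
}

// z = x * y with z = K*2^-3:  K = I*J*2^(n+m-p) = (I*J) >> 4
// The multiply is done first, in 32 bits, and the right shift last.
uint16_t Fix_Mul(uint16_t I, uint16_t J){
  uint32_t product = (uint32_t)I * J;
  return (uint16_t)(product >> 4);
}

// z = x / y with z = K*2^-6:  n - m - p = -5 + 2 + 6 = 3, which is positive,
// so the left shift is performed before the divide:  K = (I << 3) / J
uint16_t Fix_Div(uint16_t I, uint16_t J){
  assert(J != 0);
  return (uint16_t)(((uint32_t)I << 3) / J);
}

// Same division with the shift performed after the divide, to show drop-out.
uint16_t Fix_DivShiftLast(uint16_t I, uint16_t J){
  assert(J != 0);
  return (uint16_t)((uint32_t)(I / J) << 3);
}
```

With I = 48 (x = 1.5) and J = 9 (y = 2.25), Fix_Add returns 30 (3.75) and Fix_Mul returns 27 (3.375). Fix_Div returns 42 (about 0.656), while Fix_DivShiftLast returns 40 (0.625); the exact quotient is about 0.667, so shifting before the divide loses less information.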
Example 10.1 Rewrite the following digital filter using fixed-point calculations.
y = x - 0.0532672•x1 + x2 + 0.0506038•y1 - 0.9025•y2
Solution In this case, the variables y, y1, y2, x, x1, and x2 are all integers, but the constants will be expressed in binary fixed-point format. The value -0.0532672 can be approximated by -14•2^-8. The value 0.0506038 can be approximated by 13•2^-8. Lastly, the value -0.9025 can be approximated by -231•2^-8. The fixed-point implementation of this digital filter is
y = x + x2 + ((-14•x1 + 13•y1 - 231•y2) >> 8)
Common Error: Lazy or incompetent programmers use floating-point in many situations where fixed-point would be preferable.
Observation: As the fixed constant is made smaller, the accuracy of the fixed-point representation is improved, but the variable integer part also increases. Unfortunately, larger integers will require more bits for storage and calculations.
Checkpoint 10.3: Using a fixed constant of 2^-8, rewrite the equation F = 1.8•C + 32 in binary fixed-point format.
Checkpoint 10.4: Using a fixed constant of 10^-3, rewrite the digital filter y = x - 0.0532672•x1 + x2 + 0.0506038•y1 - 0.9025•y2 in decimal fixed-point format.
Checkpoint 10.5: Assume resistors R1, R2, R3 are the integer parts of 16-bit unsigned binary fixed-point numbers with a fixed constant of 2^-4 ohms. Write an equation to calculate R3 = R1 || R2 (parallel combination).
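The integer magnitudes 14, 13, and 231 in Example 10.1 come from scaling each constant by 2^8 and rounding to the nearest integer. The following C sketch shows that design-time calculation. It is meant to run on a development computer (it uses floating point), not on the 9S12, and the function name Fix_Coefficient is hypothetical.

```c
#include <assert.h>

// Return the integer part for constant c in the binary fixed-point format 2^-bits,
// rounded to the nearest integer.
long Fix_Coefficient(double c, int bits){
  assert(bits >= 0 && bits < 31);
  double scaled = c * (double)(1L << bits);   // c divided by the fixed constant
  if(scaled >= 0){
    return (long)(scaled + 0.5);
  }
  return -(long)(-scaled + 0.5);
}
```

For example, Fix_Coefficient(0.9025, 8) returns 231, since 0.9025 • 256 = 231.04.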
10.2 *Extended Precision Calculations
In this section, we will study various techniques to perform extended precision calculations. Sometimes complex calculations can be performed simply by combining simpler operations, while at other times, more sophisticated algorithms will be required. Three 32-bit local variables are used in the examples of this section. For most situations, local variables are more appropriate than globals, although using globals is often faster and easier to debug. Assume there are 12 bytes allocated on the stack pointed to by the stack pointer SP, and the following local variable binding.
N     set  0   ;32-bit local
M     set  4   ;32-bit local
P     set  8   ;32-bit local
10.2.1 Addition and Subtraction
Program 10.2 A 32-bit addition operation.
Program 10.2 gives a 32-bit addition algorithm. The approach starts with the least significant byte and uses the add-with-carry operation to combine the 8-bit additions to form the 32-bit operation.
; 32-bit addition P=N+M
; Input:  Two 32-bit numbers N,M
; Output: One 32-bit sum P
; Error:  C/V set for unsigned/signed overflow
add32 ldaa N+3,sp  ; start with least significant byte
      adda M+3,sp
      staa P+3,sp
      ldaa N+2,sp  ; next byte
      adca M+2,sp  ; carry from previous addition
      staa P+2,sp
      ldaa N+1,sp  ; next byte
      adca M+1,sp  ; carry from previous addition
      staa P+1,sp
      ldaa N,sp    ; last byte
      adca M,sp    ; carry from previous addition
      staa P,sp
; C bit set if unsigned overflow
; V bit set if signed overflow, Z bit is not correct
Checkpoint 10.6: Why isn’t the Z bit correct?
Program 10.3 gives a 32-bit subtraction algorithm. Again, the approach starts with the least significant byte and uses the subtract-with-borrow operation to combine the 8-bit subtractions to form the 32-bit operation. Similar to addition, the V and C bits are properly set, while the Z bit is incorrect.
Program 10.3 A 32-bit subtraction operation.
sub32 ldaa N+3,sp  ; start with least significant byte
      suba M+3,sp
      staa P+3,sp
      ldaa N+2,sp  ; next byte
      sbca M+2,sp  ; borrow from previous subtraction
      staa P+2,sp
      ldaa N+1,sp  ; next byte
      sbca M+1,sp  ; borrow from previous subtraction
      staa P+1,sp
      ldaa N,sp    ; last byte
      sbca M,sp    ; borrow from previous subtraction
      staa P,sp
; C bit set if unsigned overflow
; V bit set if signed overflow, Z bit is not correct
Program 10.4 presents functions that add (R = A+B) and subtract (R = A-B) two unsigned 8-bit values, using promotion to detect errors. The assembly language version implements the 16-bit local result in Register D. This C program was previously presented as Program 3.2.
Program 10.4 Using promotion to detect and compensate for unsigned overflow errors.
A ;promote to 16 bits B #0 ;A+B (16 bits) #255 aOK #255 ;ceiling R ;demote A ;promote to 16 bits B #0 #0 sOK #0 R
;A-B (16 bits)
;floor ;demote
unsigned char A,B,R;
void add(void){
  unsigned short result;
  result = A+B;      /* promote */
  if(result>255){    /* overflow ? */
    result = 255;    /* yes */
  }
  R = result;        /* demote */
}
void sub(void){
  short result;
  result = A-B;      /* promote */
  if(result<0){      /* underflow? */
    result = 0;      /* yes */
  }
  R = result;        /* demote */
}
Program 10.5 presents functions that add and subtract two signed 8-bit values, using promotion to detect for errors. The sex instruction is used to promote signed numbers. The C version was previously presented as Program 3.3. A 16-bit local variable holds the temporary result. Checkpoint 10.7: How do you force the C compiler to promote an intermediate calculation to signed 32-bits?
Program 10.5 Using promotion to detect and compensate for signed overflow errors.

add   ldab A
      sex  b,d     ;promote to 16 bits
      pshd
      ldab B
      sex  b,d     ;promote to 16 bits
      addd 2,s+    ;add A+B
      cpd  #127
      ble  aOK1
      ldd  #127    ;ceiling
aOK1  cpd  #-128
      bge  aOK2
      ldd  #-128   ;floor
aOK2  stab R       ;demote
      rts
sub   ldab B
      sex  b,d     ;promote to 16 bits
      pshd
      ldab A
      sex  b,d     ;promote to 16 bits
      subd 2,s+    ;subtract A-B
      cpd  #127
      ble  sOK1
      ldd  #127    ;ceiling
sOK1  cpd  #-128
      bge  sOK2
      ldd  #-128   ;floor
sOK2  stab R       ;demote
      rts

char A,B,R;
void add(void){
  short result = A+B;  /* promote */
  if(result>127){
    result = 127;      /* ceiling */
  }
  if(result<-128){
    result = -128;     /* floor */
  }
  R = result;          /* demote */
}
void sub(void){
  short result = A-B;  /* promote */
  if(result>127){
    result = 127;      /* ceiling */
  }
  if(result<-128){
    result = -128;     /* floor */
  }
  R = result;          /* demote */
}
The 32-bit shift left operation is described in Figure 10.1, and presented in Program 10.6. In particular, N = N<<1. When shifting left, you start from the least significant byte and proceed to the most significant byte. The operation can be used for signed or unsigned numbers. The C bit will be set if there was an unsigned overflow. The V bit will be set if there was a signed overflow.

Figure 10.1 A 32-bit shift left. (The least significant byte is shifted with ASL, which brings in a 0; each higher byte is shifted with ROL, with the C bit passing the bit between bytes.)

Program 10.6 A 32-bit shift left operation.

asl32 asl N+3,sp   ; start with least significant byte
      rol N+2,sp   ; next byte
      rol N+1,sp   ; next byte
      rol N,sp     ; last byte
; C bit set if unsigned overflow
; V bit set if signed overflow, the Z bit is incorrect
Observation: In assembly language it is critical to keep track of whether a number is signed or unsigned.
The signed 32-bit shift right operation is described in Figure 10.2, and presented in Program 10.7. In particular, N = N>>1. When shifting right, you start from the most significant byte and proceed to the least significant byte. The signed shift maintains the most significant bit, and overflow cannot occur. When you shift right the least significant bit is lost. The second shift program will round the result up if the input was originally odd. For example, in the first program 5>>1 is 2, and in the second program 5>>1 is 3.

Figure 10.2 A 32-bit signed shift right. (The most significant byte is shifted with ASR; each lower byte is shifted with ROR, with the C bit passing the bit between bytes.)

Program 10.7 Two 32-bit signed shift right operations.

; simple shift without rounding
asr32  asr N,sp    ; MSbyte
       ror N+1,sp  ; next byte
       ror N+2,sp  ; next byte
       ror N+3,sp  ; last byte
; C bit set if was odd
; V and N not correct

;shift right with rounding
asr32r asr N,sp    ; MSbyte
       ror N+1,sp  ; next byte
       ror N+2,sp  ; next byte
       ror N+3,sp  ; last byte
; C bit set if you should round up
       bcc Done
       inc N+3,sp  ; round up
       bne Done    ; inc does not change C, so test for rollover to zero
       inc N+2,sp  ; propagate into next byte
       bne Done
       inc N+1,sp  ; propagate into next byte
       bne Done
       inc N,sp    ; propagate into MSbyte
Done               ; no valid CCR flags
Checkpoint 10.8: Assuming the input is 100001 in decimal, how are the results of the two shifts in Program 10.7 different?
The unsigned 32-bit shift right operation is described in Figure 10.3, and presented in Program 10.8. In particular, N = N>>1. The unsigned shift right will clear the most significant bit, and overflow cannot occur. The second shift program will round the result up if the input was originally odd.

Figure 10.3 A 32-bit unsigned shift right. (The most significant byte is shifted with LSR, which brings in a 0; each lower byte is shifted with ROR, with the C bit passing the bit between bytes.)

Program 10.8 Two 32-bit unsigned shift right functions.

lsr32  lsr N,sp    ; MSbyte
       ror N+1,sp  ; next byte
       ror N+2,sp  ; next byte
       ror N+3,sp  ; last byte
; C bit set if was odd

;shift right with rounding
lsr32r lsr N,sp    ; MSbyte
       ror N+1,sp  ; next byte
       ror N+2,sp  ; next byte
       ror N+3,sp  ; last byte
; C bit set if you should round up
       bcc Done
       inc N+3,sp  ; round up
       bne Done    ; inc does not change C, so test for rollover to zero
       inc N+2,sp  ; propagate into next byte
       bne Done
       inc N+1,sp  ; propagate into next byte
       bne Done
       inc N,sp    ; propagate into MSbyte
Done               ; no valid CCR flags
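To make the rounding step of the unsigned shift concrete, here is a minimal C sketch that works one byte at a time, the way the assembly does. The function name lsr32r_bytes and the byte-array layout (most significant byte first) are assumptions for illustration. The point of the sketch is that the round-up increment only needs to ripple into the next higher byte when the current byte wraps from 255 to 0.

```c
#include <stdint.h>

/* Unsigned 32-bit shift right by one with rounding.
   n[0] is the most significant byte, n[3] the least significant. */
void lsr32r_bytes(uint8_t n[4])
{
    unsigned carry = 0;                       /* bit passed between bytes */
    for (int i = 0; i < 4; i++) {             /* start with MSbyte */
        unsigned next = n[i] & 1u;            /* bit shifted out of this byte */
        n[i] = (uint8_t)((n[i] >> 1) | (carry << 7));
        carry = next;
    }
    if (carry) {                              /* input was odd: round up */
        for (int i = 3; i >= 0; i--) {
            if (++n[i] != 0) {                /* stop unless the byte wrapped to 0 */
                break;
            }
        }
    }
}
```

For example, the 32-bit value 5 becomes 3, and 511 becomes 256.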
10.2.3 Mathematical Instructions on the 9S12

When designing the 9S12, Freescale added a few mathematical instructions not available on the 6811. These instructions are quite useful when implementing mathematical calculations. The emul instruction performs a 16-bit by 16-bit unsigned multiply RegY:D = RegY*RegD, as shown in Figure 10.4. The emuls instruction is a 16-bit by 16-bit signed multiply, using the same registers and generating the same condition code bits.

Figure 10.4 The emul and emuls instructions take two 16-bit inputs and generate a 32-bit product. (Inputs: Reg Y and Reg D, 16 bits each; output: Reg Y:Reg D, 32 bits.)

Condition Code Bits after R = Y*D
N: result is negative, N = R31
Z: result is zero, Z = not(R31)•not(R30)• . . . •not(R2)•not(R1)•not(R0)
C: R15, bit 15 of the result

The ediv instruction performs a 32-bit by 16-bit unsigned divide RegY = (Y:D)/RegX, RegD is remainder, as shown in Figure 10.5. The edivs instruction is a 32-bit by 16-bit signed divide, using the same registers. The overflow bit calculation is different, but the other three condition code bits are the same.
Figure 10.5 The ediv and edivs instructions perform extended precision division. (Inputs: 32-bit dividend in Reg Y:Reg D and 16-bit divisor in Reg X; outputs: 16-bit quotient in Reg Y and 16-bit remainder in Reg D.)

Condition Code Bits after R = (Y:D)/X
N: result is negative (undefined after an overflow or a divide by zero), N = R15
Z: result is zero (undefined after an overflow or a divide by zero), Z = not(R15)•not(R14)• . . . •not(R2)•not(R1)•not(R0)
V: overflow (undefined after a divide by zero), ediv result > $FFFF, edivs result > $7FFF or less than $8000 (-32768)
C: divide by zero, C = not(X15)•not(X14)• . . . •not(X2)•not(X1)•not(X0)
Example 10.2 Write software to execute the following equation, where I, Xd, and Xm are 16-bit signed integers. This equation implements the integral term of a proportional-integral-derivative (PID) controller. Xd is the desired speed and Xm is the measured speed.

I = I + 0.157658•(Xd - Xm)

Solution First search the space of all integers M and N, such that M and N are less than 1000 and M/N is very close to 0.157658. There are only 1,000,000 possibilities, so it doesn't take long to find the best possibility, which is 35/222 (0.157657658). Notice the accuracy of this approximation is good to more than 5 decimal places. In C, we could execute

I = I+(35*(Xd-Xm))/222;
Overflow could occur in the addition and subtraction, but not in the multiply or divide. The assembly solution is shown in Program 10.9. Program 10.9 Fixed-point calculation using the emuls and edivs instructions.
      ldd  Xd
      subd Xm      ; Xd-Xm
      bvs  error
      ldy  #35
      emuls        ;32-bit Y:D is 35*(Xd-Xm)
      ldx  #222
      edivs        ;16-bit Y is (35*(Xd-Xm))/222
      tfr  y,d
      addd I       ;I+(35*(Xd-Xm))/222
      bvs  error
      std  I
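The search described in the solution to Example 10.2 is easy to sketch in C. The names Ratio, best_ratio, and integral_step are hypothetical, and the code below is one plausible way to run the search, not necessarily how the author did it. The integral step promotes the product to 32 bits, mirroring the 32-bit intermediate that emuls and edivs provide.

```c
#include <stdint.h>

typedef struct { int m; int n; } Ratio;

/* Brute-force search for integers 0 <= m < limit, 1 <= n < limit
   such that m/n is as close as possible to target.
   Ties keep the first (smallest n) candidate found. */
Ratio best_ratio(double target, int limit)
{
    Ratio best = {0, 1};
    double best_err = target < 0 ? -target : target;
    for (int n = 1; n < limit; n++) {
        for (int m = 0; m < limit; m++) {
            double err = (double)m / n - target;
            if (err < 0) err = -err;
            if (err < best_err) {
                best_err = err;
                best.m = m;
                best.n = n;
            }
        }
    }
    return best;
}

/* One update of the integral term, I = I + (35*(Xd-Xm))/222,
   using a 32-bit intermediate so the multiply cannot overflow.
   Overflow of the final 16-bit result is not checked here. */
int16_t integral_step(int16_t I, int16_t Xd, int16_t Xm)
{
    int32_t diff = (int32_t)Xd - Xm;
    return (int16_t)(I + (35 * diff) / 222);
}
```

Calling best_ratio(0.157658, 1000) returns the pair 35 and 222 used in the example.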
The emacs instruction performs a 16-bit by 16-bit signed multiply, followed by a 32-bit signed addition. It uses indexed addressing to access the two 16-bit inputs and extended addressing to access the 32-bit sum. Recall that {X} and {Y} represent the 16-bit contents pointed to by Registers X and Y respectively. If we define <U> as the 32-bit contents of memory location U, then emacs U calculates

<U> = <U> + {X}*{Y}

Condition Code Bits, first P = X*Y, then R = M + P (M, P, R are 32 bits)
N: result is negative, N = R31
Z: result is zero, Z = not(R31)•not(R30)• . . . •not(R2)•not(R1)•not(R0)
V: signed overflow (after addition), V = P31•M31•not(R31) + not(P31)•not(M31)•R31
C: unsigned overflow (after addition), C = P31•M31 + M31•not(R31) + not(R31)•P31

The emacs instruction is quite useful for calculating fixed-point equations.
Example 10.3 Write assembly code to implement the following digital filter. The input parameters are all signed 16-bit integers.

yy = 0.902*xx[0] - 1.81*xx[1] + 0.045*xx[2]

Solution First, we can convert this equation to decimal fixed-point without introducing any error. In C, we could execute

yy = (902*xx[0]-1810*xx[1]+45*xx[2])/1000;

In this case the inputs are signed integers, but the solution will also apply to situations where xx and yy are fixed-point numbers as long as xx and yy have the same resolution. To use the emacs instruction, place the input variables consecutively in RAM, and the constants 902, -1810, 45 consecutively in ROM. Registers X and Y are not automatically incremented, so the program performs that task explicitly. The assembly code is shown in Program 10.10.
Program 10.10 Fixed-point calculation using the emacs instruction.
$0800 6 ;three 16-bit signed integers 2 ;filter output 4 ;temporary 32-bit result $4000 902,-1810,45 #xx ;pointer to data #cc ;pointer to coeficients #0,acc ;initially clear temporary result #0,acc+2 #3 ;number of terms acc ;acc=acc+{X}*{Y} 2,x 2,y A,loop acc acc+2 ;Y:D=902*xx[0]-1810*xx[1]+45*xx[2] #1000 yy
;(902*xx[0]-1810*xx[1]+45*xx[2])/1000
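The multiply-and-accumulate structure of Example 10.3 can be sketched in C as follows. The function name filter3 and the coefficient array cc are illustrative assumptions. Each product is promoted to 32 bits before it is added, which is the job emacs does in hardware.

```c
#include <stdint.h>

/* yy = (902*xx[0] - 1810*xx[1] + 45*xx[2]) / 1000
   using a 32-bit accumulator, one multiply-accumulate per term. */
int16_t filter3(const int16_t xx[3])
{
    static const int16_t cc[3] = {902, -1810, 45};   /* coefficients * 1000 */
    int32_t acc = 0;                                  /* temporary 32-bit result */
    for (int i = 0; i < 3; i++) {
        acc += (int32_t)cc[i] * xx[i];                /* acc = acc + cc[i]*xx[i] */
    }
    /* The accumulator cannot overflow 32 bits for 16-bit inputs, but the
       quotient can exceed 16 bits for large inputs; that case is not
       checked in this sketch. */
    return (int16_t)(acc / 1000);
}
```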
Program 10.11 presents assembly programs that multiply two 32-bit unsigned numbers (N,M), yielding a 64-bit product (P = N*M). The approach first considers each input as having two 16-bit components (i.e., N = 2^16*msN + lsN and M = 2^16*msM + lsM). The next step is to perform four executions of the 16 by 16-bit emul instruction. The final 64-bit product is the sum of the four partial products.

P = N*M = (2^16*msN + lsN)*(2^16*msM + lsM)
        = (2^32*msN*msM) + (2^16*lsN*msM) + (2^16*msN*lsM) + (lsN*lsM)

Program 10.12 presents assembly programs that divide a 64-bit unsigned dividend (N) by a 32-bit divisor (M), yielding a 32-bit quotient (Q = N/M) and a 32-bit remainder (R).
Program 10.11 A 32-bit multiply function.

; Input: Two 32-bit numbers N(SP+2),M(SP+4), call by reference
; Output: One 64-bit product P(SP+6), return by reference
; modifies Reg A,B,X,Y
mult32
; P=(2**32)*msN*msM
      ldx  2,sp    ; pointer to N
      ldy  0,x     ; msw of N
      ldx  4,sp    ; pointer to M
      ldd  0,x     ; msw of M
      emul         ; Y:D=msN*msM
      ldx  6,sp    ; pointer to P
      sty  0,x
      std  2,x     ; set high 32 bits of P
; P=P+lsN*lsM
      ldx  2,sp    ; pointer to N
      ldy  2,x     ; lsw of N
      ldx  4,sp    ; pointer to M
      ldd  2,x     ; lsw of M
      emul         ; Y:D=lsN*lsM
      ldx  6,sp    ; pointer to P
      sty  4,x
      std  6,x     ; set low 32 bits of P
; P=P+65536*lsN*msM
      ldx  2,sp    ; pointer to N
      ldy  2,x     ; lsw of N
      ldx  4,sp    ; pointer to M
      ldd  0,x     ; msw of M
      emul         ; Y:D=lsN*msM
      ldx  6,sp    ; pointer to P
      addd 4,x
      std  4,x     ; add to P
      tfr  y,d
      adcb 3,x
      adca 2,x
      std  2,x     ; add to P
      ldd  0,x
      adcb #0
      adca #0
      std  0,x
; P=P+65536*msN*lsM
      ldx  2,sp    ; pointer to N
      ldy  0,x     ; msw of N
      ldx  4,sp    ; pointer to M
      ldd  2,x     ; lsw of M
      emul         ; Y:D=msN*lsM
      ldx  6,sp    ; pointer to P
      addd 4,x
      std  4,x     ; add to P
      tfr  y,d
      adcb 3,x
      adca 2,x
      std  2,x     ; add to P
      ldd  0,x
      adcb #0
      adca #0
      std  0,x
      rts
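The four-partial-product idea behind Program 10.11 can be checked with a short C sketch. The function name mult32 is reused here only for readability; the C version takes its arguments by value and leans on the compiler's 64-bit type for the final sums, which is an assumption that does not reflect how the 9S12 code accumulates the result.

```c
#include <stdint.h>

/* 32-bit by 32-bit unsigned multiply built from four
   16-bit by 16-bit partial products (each fits in 32 bits). */
uint64_t mult32(uint32_t n, uint32_t m)
{
    uint32_t msN = n >> 16, lsN = n & 0xFFFFu;   /* N = 2^16*msN + lsN */
    uint32_t msM = m >> 16, lsM = m & 0xFFFFu;   /* M = 2^16*msM + lsM */

    uint64_t p = (uint64_t)(msN * msM) << 32;    /* 2^32*msN*msM */
    p += (uint64_t)(lsN * msM) << 16;            /* 2^16*lsN*msM */
    p += (uint64_t)(msN * lsM) << 16;            /* 2^16*msN*lsM */
    p += (uint64_t)(lsN * lsM);                  /* lsN*lsM */
    return p;
}
```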
Program 10.12 A 64-bit by 32-bit divide function.

N     rmb  4       ; 64-bit dividend
R     rmb  4       ; low part of dividend becomes remainder
M     rmb  4       ; 32-bit divisor
      rmb  4       ; pad
Q     rmb  4       ; quotient

; 64-bit by 32-bit into 32-bit divide Q=N/M, R=N%M
; Input: 64-bit dividend, global N (replaced with R)
;        32-bit divisor, global M (destroyed)
; Output: 32-bit quotient, global Q
;         32-bit remainder, global R
;         V bit set on overflow (R>=M)
;         C,V bits set on divide by zero (M=0)
; modifies Reg A,B,X,Y
i     set  0       ; loop counter
div32 leas -1,s    ; allocate i
      ldd  M
      bne  d32A    ; divisor not zero
      ldd  M+2
      bne  d32A    ; divisor not zero
      sev          ; divide by zero
      sec
      bra  d32E
d32A  movw #0,M+4
      movw #0,M+6  ; divisor is 64 bits, right justified
      movw #0,Q
      movw #0,Q+2  ; quotient=0
      movb #32,i,s ; i=32
d32B  ldx  #M
      jsr  lsr64   ; M=M>>1
      ldx  #Q
      jsr  lsl32   ; Q=Q<<1
      ldx  #N
      ldy  #M
      jsr  cmp64   ; N-M
      blo  d32C    ; skip if N<M
      ldx  #Q
      jsr  inc32
      ldx  #N
      ldy  #M
      jsr  sub64
d32C  dec  i,s
      bne  d32B
      ldx  #R
      ldy  #M+4    ; no overflow if R<M
      jsr  cmp32   ; R-M
      blo  d32D    ; skip if R<M
      clc
      sev          ; overflow, R>=M
      bra  d32E
d32D  clc
      clv          ; no error
d32E  leas 1,s
      rts
The implementations of the subroutines can be found in the math.rtf file installed as part of TExaS.
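The shift-and-subtract loop of Program 10.12 can be sketched in C to show the algorithm without the helper subroutines. This is a sketch under assumptions: the name div64by32 and the return codes are hypothetical, and overflow is detected before the loop (high half of the dividend not less than the divisor) rather than after it as the assembly does.

```c
#include <stdint.h>

/* Divides a 64-bit dividend by a 32-bit divisor.
   Returns 0 on success, -1 on divide by zero, -2 if the quotient
   does not fit in 32 bits. */
int div64by32(uint64_t n, uint32_t m, uint32_t *q, uint32_t *r)
{
    if (m == 0) return -1;                      /* divide by zero */
    if ((uint32_t)(n >> 32) >= m) return -2;    /* quotient needs more than 32 bits */

    uint64_t d = (uint64_t)m << 32;             /* divisor aligned above the dividend */
    uint32_t quo = 0;
    for (int i = 32; i > 0; i--) {
        d >>= 1;                                /* M = M>>1 */
        quo <<= 1;                              /* Q = Q<<1 */
        if (n >= d) {                           /* does the divisor fit? */
            quo++;                              /* set this quotient bit */
            n -= d;                             /* subtract it from the dividend */
        }
    }
    *q = quo;
    *r = (uint32_t)n;                           /* what is left is the remainder */
    return 0;
}
```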
10.2.5 Table Lookup and Interpolation
Many applications require the representation of complex waveforms in digital form. A typical application is a calibration curve that describes the input/output behavior of the system. The electronic hardware takes the measurand (y is position, pressure, temperature, etc.) as input and has the ADC conversion (x is 0 to 1023) as output. In this situation, the software algorithm is asked to reverse the process, taking as input (x) the ADC measurement and giving as output (y) the measurand. One of the most efficient, yet simple, techniques for describing nonlinear equations is to provide a small table of (x,y) points and then use linear interpolation between the points. In this way, the response is piecewise linear. This technique works only for single-valued data sets (a unique y for each x). There is a clear tradeoff between accuracy and software efficiency (static and dynamic). For example, you can add more points to improve accuracy, but the table then requires more memory and the search runs slower.
Example 10.4 Design a fixed-point sin() function using table lookup and interpolation. The input is an 8-bit unsigned fixed-point with a resolution of 2π/256, and the output is an 8-bit signed fixed-point with a resolution of 1/127.

Solution Rather than have a complete table of all 256 possibilities, we will store a subset of individual points and use interpolation between the points, as shown in Table 10.2 and Figure 10.6. Let x be the input (0 ≤ x < 2π) and Ix be the integer portion (0 to 255). Similarly, y is the output (-1 ≤ y ≤ 1) and Iy is the integer portion (-127 to 127). To use the tbl instruction, we need to create two unsigned 8-bit tables containing the specific (input, output) pairs for the conversion. To make the table unsigned, we will store Iy + 128 (offset binary).
Table 10.2 Fixed-point implementation of a sin function.
Figure 10.6 A table contains specific points and the software will use linear interpolation to fill in the gaps. (Plot of Iy versus Ix, both axes 0 to 256.)
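The lookup-and-interpolate idea of Example 10.4 can be sketched in C. The table values are the ones listed in Program 10.13; the function name fixed_sin is hypothetical. The sketch interpolates with ordinary integer division instead of the 8-bit fraction used by the tbl instruction, so results between table points may differ from the assembly by one count.

```c
#include <stdint.h>

static const uint8_t IxTab[21] = {
    0, 13, 26, 38, 51, 64, 77, 90, 102, 115, 128, 141,
    154, 166, 179, 192, 205, 218, 230, 243, 255};
static const uint8_t IyTab[21] = {       /* Iy + 128, offset binary */
    128, 167, 203, 231, 249, 255, 249, 231, 203, 167,
    128, 89, 53, 25, 7, 1, 7, 25, 53, 89, 128};

/* Input Ix is 0 to 255 (resolution 2*pi/256).
   Output Iy is -127 to +127 (resolution 1/127). */
int8_t fixed_sin(uint8_t ix)
{
    int i = 0;
    while (i < 19 && ix >= IxTab[i + 1]) {   /* find x1 <= Ix < x2 */
        i++;
    }
    int x1 = IxTab[i], x2 = IxTab[i + 1];
    int y1 = IyTab[i], y2 = IyTab[i + 1];
    int y = y1 + ((ix - x1) * (y2 - y1)) / (x2 - x1);   /* linear interpolation */
    return (int8_t)(y - 128);                /* offset binary to two's complement */
}
```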
The two tables consist of multiple unsigned (x,y) pairs, which define a piecewise linear function. The first x entry must be less than or equal to the minimum input, and the last x entry must be bigger than the maximum input. The table must be monotonic in x. See Program 10.13.

Program 10.13 Interpolation using the tbl instruction.

IxTab  fcb 0,13,26,38,51,64,77,90,102,115,128,141
       fcb 154,166,179,192,205,218,230,243,255
IyTab  fcb 128,167,203,231,249,255,249,231,203,167
       fcb 128,89,53,25,7,1,7,25,53,89,128
;**********Sin*******************
;Inputs: RegA is 0 to 254 Xdata point, Ix
;  RegA input must be greater than or equal to first IxTab point
;  RegA input must be less than last IxTab point
;Output: RegA is -127 to +127 Iy point
;Registers destroyed: X,Y,B,CCR
Sin    ldx  #IxTab  ;first find x1<=Ix<x2
       ldy  #IyTab
lookx1 cmpa 1,x     ;check Ix<x2
       blo  found   ;stops when X points to x1
       inx
       iny
       bra  lookx1
found  suba 0,x     ;Ix-x1
       clrb         ;D=256*(Ix-x1)
       pshd
       ldab 1,x     ;x2
       subb 0,x     ;x2-x1
       clra         ;D=x2-x1
       tfr  D,X     ;X=x2-x1
       puld         ;D=256*(Ix-x1)
       idiv         ;X=(256*(Ix-x1))/(x2-x1)
       tfr  X,D     ;B=(256*(Ix-x1))/(x2-x1), Y points to y1,y2
       tbl  0,y     ;A=Y1+B*(Y2-Y1)
       suba #-128   ;convert to 2's complement
       rts

10.3 Expression Evaluation

In this section, we will develop methods to implement mathematical and logical operations that have many terms and many parentheses. When the expression is simple, no formal approach is required, and registers can be used to store intermediate results. For example, Program 10.14 shows the assembly code to calculate P = 3*M + N, where P, M, and N are 8-bit unsigned global variables.
Program 10.14 A simple expression evaluation.

      ldaa #3
      ldab M
      mul          ; Reg D=3*M
      addb N       ; Reg B=3*M+N
      stab P
To increase the precision of the expression, we replace instructions with subroutine calls as needed. Program 10.15 shows the assembly code to calculate P = 3*M + N, where P, M, and N are 16-bit unsigned global variables.

Program 10.15 16-bit expression evaluation.

      ldd  #3
      ldy  M
      emul         ; Reg D=3*M
      addd N       ; Reg D=3*M+N
      std  P
As the complexity of the expression increases, we need a place to save intermediate results. The stack is a natural place to store temporary results. Polish notation is a prefix notation used in logic and arithmetic operations. The Polish logician Jan Łukasiewicz invented this notation around 1920 in order to simplify sentential logic. The following expression (prefix notation): * 2 3 evaluates to 6. This more complex expression: * + 1 2 + 3 4 can sometimes be written as (* (+ 1 2) (+ 3 4)) and evaluates to 21. Lisp s-expressions employ Polish notation. Reverse Polish notation (RPN) is a postfix notation, invented by Australian philosopher and computer scientist Charles Hamblin in the mid-1950s. Edsger Dijkstra invented the "shunting yard" algorithm, which converts from infix notation to RPN. The following rules are used when evaluating expressions in Reverse Polish Notation:

■ Numbers are pushed on the stack
■ Values of the variables are pushed on the stack
■ For unary functions the input is popped and result is pushed
■ For binary functions both inputs are popped and result is pushed
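The evaluation rules above can be illustrated with a small C sketch of a data stack. All names here (DataStack, ds_push, ds_pop, ds_add, ds_mul, eval_example) are hypothetical, and there is no bounds checking. The example evaluates the expression (1+2)*(3+4) from the text, written in RPN as 1 2 + 3 4 + *.

```c
#include <stdint.h>

#define STACK_SIZE 10

typedef struct {
    int16_t data[STACK_SIZE];
    int top;                       /* number of items on the stack */
} DataStack;

void ds_init(DataStack *s)            { s->top = 0; }
void ds_push(DataStack *s, int16_t v) { s->data[s->top++] = v; }
int16_t ds_pop(DataStack *s)          { return s->data[--s->top]; }

/* Binary functions: both inputs are popped and the result is pushed. */
void ds_add(DataStack *s)
{
    int16_t b = ds_pop(s), a = ds_pop(s);
    ds_push(s, (int16_t)(a + b));
}
void ds_mul(DataStack *s)
{
    int16_t b = ds_pop(s), a = ds_pop(s);
    ds_push(s, (int16_t)(a * b));
}

/* Evaluates 1 2 + 3 4 + * */
int16_t eval_example(void)
{
    DataStack s;
    ds_init(&s);
    ds_push(&s, 1);
    ds_push(&s, 2);
    ds_add(&s);          /* stack: 3 */
    ds_push(&s, 3);
    ds_push(&s, 4);
    ds_add(&s);          /* stack: 3 7 */
    ds_mul(&s);          /* stack: 21 */
    return ds_pop(&s);
}
```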
Table 10.3 illustrates the notation in which the parameters come first. Any expression that is written in regular format with parentheses can be rewritten in an equivalent Reverse Polish Notation. Compilers possess the ability to perform this translation automatically, but assembly language programmers can translate their expressions by hand.

Table 10.3 Examples of Reverse Polish notation.

Regular expression      Reverse Polish Notation
3*M + N                 3 M * N +
~(M|(N&P))              N P & M | ~
M*(5+P) - N/10          5 P + M * N 10 / -
wxyz4                   wxyz4
Program 10.16 shows the assembly code to calculate P = M*(5+P) - N/10, where P, M, and N are 8-bit unsigned global variables. Let buffer be a 10-byte RAM area, which will be used as a data stack. Register Y points to this data stack, which is used in a manner similar to Register SP and the regular stack. The subroutine Add pops two elements off the data stack, adds them, and pushes the result on the data stack. The subroutine Mult pops two elements off the data stack, multiplies them, and pushes the result on the data stack. Notice how easy it is to implement a data stack with the 9S12.
Program 10.16 16-bit expression evaluation of P = M*(5+P) - N/10.
#buffer+10 #5,-1,y ; P,-1,y ; Add M,-1,y ; Mult N,-1,y ; #10,-1,y ; Div Sub 1,y+,P ;
; empty push 5 push P push M push N push 10
pop
Add   ldaa 1,y+    ;add top two,pop
      adda 0,y
      staa 0,y
      rts
Mult  ldaa 1,y+    ;multiply top two,pop
      ldab 0,y
      mul
      stab 0,y
      rts
Checkpoint 10.9: Write an assembly subroutine to implement Sub, which pops two elements off the data stack, subtracts them, and pushes the result on the data stack. Write an assembly subroutine to implement Div, which pops two elements off the data stack, divides them, and pushes the result on the data stack.
10.4 *IEEE Floating-Point Numbers

Although we will not consider using floating-point operations on the 9S12 in this introductory text, it is appropriate to know the definition of floating-point. NASA believes that there are on the order of 10^21 stars in our Universe. Manipulating large numbers like these is not possible using integer or fixed-point formats. Another limitation of integer or fixed-point numbers is that there are some situations where the range of values is not known at the time the software is designed. In a physics research project, you might be asked to count the rate at which particles strike a sensor. Since the experiment has never been tried before, you do not know in advance whether there will be 1 per second or 1 trillion per second. Applications with numbers of large or unknown range can be solved with floating-point numbers. Floating-point is similar in format to fixed-point, except the exponent is allowed to change at run time. Consequently, both the exponent and the mantissa will be stored. Just like with fixed-point numbers, we will use binary exponents for internal calculations, and decimal exponents when interfacing with humans. This number system is called floating-point because as the exponent varies the binary point or decimal point moves.

Observation: If the range of numbers is unknown or large, then the numbers must be represented in a floating-point format.

Observation: Floating-point implementations on computers like the 9S12 that do not have hardware support are extremely long and very slow. So, if you really need floating point, a computer with hardware support is highly desirable.
The IEEE Standard for Binary Floating-Point Arithmetic or ANSI/IEEE Std 754-1985 is the most widely used format for floating point numbers. The three most common IEEE formats are presented in this section: single-precision (32-bit), double-precision (64-bit), and double-extended precision (80-bits). The floating-point format, f, for the single-precision data type is shown in Figure 10.7.

Bit 31       Mantissa sign, s = 0 for positive, s = 1 for negative
Bits 30:23   8-bit biased binary exponent, 0 ≤ e ≤ 255
Bits 22:0    24-bit mantissa, m, expressed as a binary fraction; a binary 1 as the most significant bit is implied. m = 1.m1m2m3 . . . m23

The value of a single-precision floating-point number is

f = (-1)^s • 2^(e-127) • m

The range of values that can be represented in the single-precision format is about 10^-38 to 10^+38. The 24-bit mantissa yields a precision of about seven decimal digits. The floating point value is zero if both e and m are zero. Because of the sign bit, there are two zeros, positive and negative, which behave the same during calculations. To illustrate floating-point, we will calculate the single-precision representation of the number 10.

Step 1, to find the binary representation of a floating point number, first extract the sign.
10 = (-1)^0 • 10
Step 2, multiply or divide by two until you get a number greater than or equal to 1, but less than 2.
10 = (-1)^0 • 2^3 • 1.25
Step 3, the exponent e is equal to the number of divide by twos plus 127.
10 = (-1)^0 • 2^(130-127) • 1.25
Step 4, separate the 1 from the mantissa. Recall that the 1 will not be stored.
10 = (-1)^0 • 2^(130-127) • (1 + 0.25)
Step 5, express the mantissa as a binary fixed-point number with a fixed constant of 2^-23.
10 = (-1)^0 • 2^(130-127) • (1 + 2097152•2^-23)
Step 6, convert the exponent and mantissa components to hexadecimal.
10 = (-1)^0 • 2^($82-127) • (1 + $200000•2^-23)
Step 7, extract the s, e, m terms and convert hexadecimal to binary.
10 = (0,$82,$200000) = (0,10000010,01000000000000000000000)

Sometimes this conversion does not yield an exact representation, as in the case of 0.1. In particular, 0.1 = 2^-4 • 1.6, and the binary fixed-point representation of 0.6 is only an approximation.
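On a computer whose float type is the IEEE single-precision format (an assumption that holds on most desktop compilers, but is not guaranteed by the C standard), the s, e, and m fields can be extracted directly and compared with the hand calculation for the number 10. The names FloatFields and float_fields are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t s;   /* bit 31: sign */
    uint32_t e;   /* bits 30:23: biased exponent */
    uint32_t m;   /* bits 22:0: stored mantissa bits (implied 1 not included) */
} FloatFields;

/* Assumes float is a 32-bit IEEE single-precision number. */
FloatFields float_fields(float f)
{
    uint32_t bits;
    FloatFields out;
    memcpy(&bits, &f, sizeof bits);       /* reinterpret the 32 bits safely */
    out.s = bits >> 31;
    out.e = (bits >> 23) & 0xFFu;
    out.m = bits & 0x7FFFFFu;
    return out;
}
```

For 10.0f the fields are s = 0, e = $82, and m = $200000, matching Step 7.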
There are some special cases for floating point numbers. When e is 255, the number is considered as plus or minus infinity, which probably resulted from an overflow during calculation. When e is 0, the number is considered as denormalized. The value of the mantissa of a denormalized number is less than 1. A denormalized short real number has the value

f = (-1)^s • 2^-126 • m where m = 0.m1m2m3 . . . m23

Observation: The floating point zero is stored in denormalized format.
The floating-point format for the double-precision data type is shown in Figure 10.8.

Bit 63       Mantissa sign, s = 0 for positive, s = 1 for negative
Bits 62:52   11-bit biased binary exponent, 0 ≤ e ≤ 2047
Bits 51:0    52-bit mantissa, m, expressed as a binary fraction; a binary 1 as the most significant bit is implied. m = 1.m1m2m3 . . . m52

The value of a double-precision floating-point number is

g = (-1)^s • 2^(e-1023) • m

The range of values that can be represented in the double-precision format is about 10^-308 to 10^+308. The 53-bit mantissa yields a precision of about 15 decimal digits. There are two zeros, positive and negative, which behave the same during calculations. The floating-point format, t, for the double-extended data type is shown in Figure 10.9. In actuality, double-extended refers to any floating point format with 79 or more bits.

Bit 79       Mantissa sign, s = 0 for positive, s = 1 for negative
Bits 78:64   15-bit biased binary exponent, 0 ≤ e ≤ 32767
Bits 63:0    entire 64-bit mantissa, m, expressed as a binary fraction. m = m0.m1m2m3 . . . m63

The value of a double-extended floating-point number is

f = (-1)^s • 2^(e-16383) • m

The range of values that can be represented in the double-extended format is 10^-4932 to 10^+4932. The 64-bit mantissa yields a precision of about 19 decimal digits. There are two zeros, positive and negative, which behave the same during calculations. Notice that the most significant mantissa bit, m0, is explicitly stored. Normalized float numbers have m0 equal to 1. When m0 equals 0, the number is unnormalized. The mantissas of unnormalized and denormalized numbers are both less than 1. The difference is that the exponent of an unnormalized number can be any value, whereas the exponent of a denormalized number is the smallest possible value.

When two floating point numbers are added or subtracted, the smaller one is first unnormalized. The mantissa of the smaller number is shifted right and its exponent is incremented until the two numbers have the same exponent. Then, the mantissas are added or subtracted. Lastly, the result is normalized. To illustrate floating point addition, consider the case of 10 + 0.1. First, we show the original numbers in floating point format. The mantissa is shown in binary format.

  10.0 = (-1)^0 • 2^3  • 1.01000000000000000000000
+  0.1 = (-1)^0 • 2^-4 • 1.10011001100110011001101
Every time the exponent is incremented the mantissa is shifted to the right. Notice that 7 binary digits are lost. The 0.1 number is unnormalized, but now the two numbers have the same exponent. Often the result of the addition or subtraction will need to be normalized.

  10.0 = (-1)^0 • 2^3 • 1.01000000000000000000000
+  0.1 = (-1)^0 • 2^3 • 0.00000011001100110011001 1001101
  10.1 = (-1)^0 • 2^3 • 1.01000011001100110011001

When two floating point numbers are multiplied, their mantissas are multiplied and their exponents are added. When dividing two floating point numbers, their mantissas are divided and their exponents are subtracted. After multiplication and division, the result is normalized. To illustrate floating point multiplication, consider the case of 10*0.1. Let m1, m2 be the values of the two mantissas. Since the range is 1 ≤ m1, m2 < 2, their product will vary from 1 ≤ m1*m2 < 4.

  10.0 = (-1)^0 • 2^3  • 1.01000000000000000000000
*  0.1 = (-1)^0 • 2^-4 • 1.10011001100110011001101
   1.0 = (-1)^0 • 2^-1 • 10.00000000000000000000000

The result needs to be normalized.

   1.0 = (-1)^0 • 2^0 • 1.00000000000000000000000

Checkpoint 10.10: Why can't you use your calculator to find the s, e, and m terms of a number in double-extended format?
Roundoff is the error that occurs as a result of an arithmetic operation. For example, the multiplication of two 64-bit mantissas yields a 128-bit product. The final result is normalized into a normalized floating point number with a 64-bit mantissa. Roundoff is the error caused by discarding the least significant bits of the product. Roundoff during addition and subtraction can occur in two places. First, an error can result when the smaller number is shifted right. Second, when two n-bit numbers are added the result is n+1 bits, so an error can occur as the n+1 bit sum is squeezed back into an n-bit result. Truncation is the error that occurs when a number is converted from one format to another. For example, when a double-extended number is converted to a short real format, 40 bits are lost as the 64-bit mantissa is truncated to fit into the 24-bit mantissa. Recall that the number 0.1 could not be exactly represented as a short real floating point number. This is an example of truncation, as the true fraction was truncated to fit into the finite number of bits available.

Observation: Computers use binary floating-point because it is faster to shift than it is to multiply/divide by 10.
10.5 Tutorial 10 Overflow and Dropout

The purpose of this tutorial is to study overflow and dropout errors during integer calculations. The objective of the software is to calculate the circumference of a circle given its radius.

c = 2πr

Assume r is an unsigned 8-bit fixed-point number with a resolution of 0.1 cm. c is also fixed-point, but we will consider both 8-bit and 16-bit implementations for c. I.e., c = C*0.1 cm and r = R*0.1 cm, where C and R are unsigned integers. The values of r range from 0.0 to 25.5 cm. We substitute the definitions of c and r into the equation to get an exact relationship between input R and output C,

C = 2*π*R

We need to convert this equation to a function with integer operations. One simple possibility is

C = 6283*R/1000

The difficulty with this equation is that the numbers are big. If we search the space of all integers (I1,I2) less than 255, such that I1/I2 is as close to 2π as possible, we find this possibility

C = 245*R/39

Action: Copy the Tutor10.rtf, Tutor10.uc, and Tutor10.xls files from the web onto your hard drive. Open the Excel spreadsheet Tutor10.xls. The first column, I1, is simply all the integers from 0 to 255. The second column, I2, is the closest integer such that I1/I2 is about 2π. The third column is I1/I2, and the fourth column shows the error between I1/I2 and 2π.

Question 10.1 What is the error between the approximation 245/39 and exact 2π? Compare this error to the resolution 0.1 cm. Do we need a better approximation for 2π?

Question 10.2 Search the space of all pairs of integers (I1,I2) less than 255, such that I1/I2 is as close to π as possible.

Action: Open, assemble and run the program Tutor10.rtf. There are four implementations of the circumference function. In each case, the multiply is 8-bit by 8-bit into 16-bits, and the divide is 16-bit by 16-bit into 16-bit. The first function implements C1 = (245*R)/39 where C1 is demoted back to 8 bits. The second function implements C2 = 245*(R/39) where C2 is 16 bits.
The third function implements C3 = (245*R)/39 where C3 is 16 bits. The fourth function implements C4 = (245*R + 19)/39 where C4 is 16 bits. The test cases and results are shown in Table T10.1.

r (cm)   c (cm)    C (0.1 cm)   C1 (0.1 cm)   C2 (0.1 cm)   C3 (0.1 cm)   C4 (0.1 cm)
0.0      0.0000    0            0             0             0             0
2.4      15.0796   151          150           0             150           151
4.8      30.1593   302          45            245           301           302
7.2      45.2389   452          196           245           452           452
9.6      60.3186   603          91            490           603           603
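The four implementations can be mimicked in C to reproduce the columns of Table T10.1. The function names circ1 through circ4 are hypothetical stand-ins for the four assembly functions in Tutor10.rtf, whose source is not shown here; the sketch only follows the equations given in the text.

```c
#include <stdint.h>

/* C1 = (245*R)/39, result demoted to 8 bits */
uint8_t circ1(uint8_t R)  { return (uint8_t)((245u * R) / 39u); }

/* C2 = 245*(R/39), divide performed first */
uint16_t circ2(uint8_t R) { return (uint16_t)(245u * (R / 39u)); }

/* C3 = (245*R)/39, 16-bit result */
uint16_t circ3(uint8_t R) { return (uint16_t)((245u * R) / 39u); }

/* C4 = (245*R + 19)/39, 16-bit result */
uint16_t circ4(uint8_t R) { return (uint16_t)((245u * R + 19u) / 39u); }
```

With R = 48 (r = 4.8 cm) these return 45, 245, 301, and 302, matching the third row of the table.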
Question 10.3 What is the name of the error that occurs in program Circumference1? Why did it occur? How do we solve this type of error?

Question 10.4 What is the name of the error that occurs in program Circumference2? Why did it occur?

Question 10.5 Notice that Circumference4 is a little better than Circumference3, because it properly handles the round to closest integer during the divide. Consider the general situation of unsigned integer division Q = I/J. Q is the quotient, I is the dividend and J is the divisor. How can this calculation be modified to round to the closest integer?
10.6 Homework Problems
Homework 10.1 Give an approximation of √2 using the decimal fixed-point (Δ = 0.001) format.

Homework 10.2 Give an approximation of √2 using the binary fixed-point (Δ = 2^-8) format.

Homework 10.3 Assume M and N are two integers, each less than 1000. Find the best set of M and N, such that M/N is approximately √2. (Like 7/5, but much more accurate). Use the Tutor10.xls spreadsheet described in Tutorial 10.

Homework 10.4 Assume M and N are two integers, each less than 1000. Find the best set of M and N, such that M/N is approximately π. (Like 22/7, but much more accurate). Use the Tutor10.xls spreadsheet described in Tutorial 10.

Homework 10.5 Give an approximation of √101 using the decimal fixed-point (Δ = 0.01) format.

Homework 10.6 Give an approximation of √99 using the binary fixed-point (Δ = 2^-4) format.

Homework 10.7 A signed 16-bit binary fixed-point number system has a resolution of 1/256. What is the corresponding value of the number if the integer part stored in memory is 384?

Homework 10.8 An unsigned 16-bit decimal fixed-point number system has a resolution of 1/100. What is the corresponding value of the number if the integer part stored in memory is 384?

Homework 10.9 First, rewrite the following digital filter using decimal fixed-point math. Assume the inputs are unsigned 8-bit values (0 to 255). Then, rewrite it so that it can be calculated with integer math using the fact that 0.11111 is about 1/9, 0.088889 is about 4/45, and 0.8 is 4/5. In both cases, the calculations are to be performed in 16-bit unsigned integer form without overflow.

  y = 0.11111•x + 0.08889•x1 + 0.80000•y1

Homework 10.10 Does the associative principle hold for signed integer multiply and divide? Assume Out1, Out2, A, B, and C are all the same precision (e.g., 16 bits). In particular, do these two C calculations always achieve identical outputs? If not, give an example.

  Out1 = (A*B)/C;
  Out2 = A*(B/C);

Homework 10.11 Does the associative principle hold for signed integer addition and subtraction? Assume Out3, Out4, A, B, and C are all the same precision (e.g., 16 bits). In particular, do these two C calculations always achieve identical outputs? If not, give an example.

  Out3 = (A+B)-C;
  Out4 = A+(B-C);

Homework 10.12 Write an assembly subroutine that implements an averaging filter. The three 16-bit unsigned numbers are passed into the subroutine by value in Registers D, X and Y. The average is (first + second + third)/3. The return parameter is passed back in Register D. If you need a temporary variable, you should use the stack.

Homework 10.13 Give the short real floating point representation of √2.
Homework 10.14 Give the short real floating point representation of 134.4.

Homework 10.15 Give the short real floating point representation of 0.0123.

Homework 10.16 Perform the operation 10 + π in short real floating point format. Determine the difference between what you got and what you should have gotten (10 + π). This error has two components: truncation error that results in the approximation itself and roundoff error that occurs during the addition.

Homework 10.17 Perform the operation 0.1*0.1 in short real floating point format. Determine the difference between what you got and what you should have gotten (0.01). This error has two components: truncation error that results in the approximation of 0.1 itself and roundoff error that occurs during the multiplication.

Homework 10.18 Perform the operation π*π in short real floating point format. Determine the difference between what you got and what you should have gotten (π²). This error has two components: truncation error that results in the approximation of π itself and roundoff error that occurs during the multiplication.

Homework 10.19 Write assembly code that calculates random numbers. The return parameter is passed back in Register D. Show all the code including subroutines:

  unsigned short Next; // 16-bit
  unsigned short random(void) {
    Next = Next*0x4E6D + 12345;
    return(Next); // ignore overflow
  }

Homework 10.20 Write assembly code that calculates random numbers. The return parameter is passed back in Register D. You may call any of the subroutines presented in this chapter.

  unsigned long Next; // 32-bit
  unsigned short random(void) {
    Next = Next*0x1A504E6D + 123456789L;
    return(Next); // return lower 16 bits
  }

Homework 10.21 Assume we have 10-dimensional vectors, stored as 10-element arrays. For example, let the vector X equal (x0, x1, . . . x9). Each value is a signed 8-bit integer. Write assembly code that finds the dot-product of two 10-element vectors.

  X•Y = x0*y0 + x1*y1 + . . . + x9*y9

The two parameters are passed by reference using registers. The result is to be returned as a 16-bit signed value in Register D. Local variables must be allocated on the stack. A typical calling sequence is

  ldx #vector1 ; pointer to the first vector
  ldy #vector2 ; pointer to the second vector
  jsr DotProduct

Homework 10.22 Write a subroutine to implement linear regression. The calling sequence is

  ldx #DataSet1 ; pointer to data structure
  jsr Regression
  ; Reg X = m = slope as a fixed point
  ; Reg Y = b = offset as a fixed point
  ; Reg D = e = average error as a fixed point

Input x,y numbers will be in signed twos complement 8-bit decimal fixed point with a resolution, Δ, of 0.01. These 8-bit numbers have effective values which range from -1.28 to +1.27. Outputs m,b,e will be in signed twos complement 16-bit decimal fixed point with a resolution, Δ, of 0.0001. Therefore, the 16-bit numbers have effective values which range from -3.2768 to +3.2766. Set all the results (m,b,e) equal to 3.2767 ($7FFF) if any data overflow or divide by zero occurs. A typical array structure looks like:

  DataSet1 fdb 3    ; number of data points
           fcb 0,1  ; (x,y) = (0 , 0.01)
           fcb 10,2 ; (x,y) = (0.1 , 0.02)
           fcb 20,3 ; (x,y) = (0.2 , 0.03)
For this example b = 0.01, m = 0.1, e = 0. With a resolution of 0.0001, Reg X (m) is returned as 1000, Reg Y (b) is 100, and Reg D (e) is 0. Let (x0,y0) and (x1,y1) be two points, then the slope and intercept of the "y = mx + b" line through those points are given by

$$m = \frac{y_1 - y_0}{x_1 - x_0}, \qquad b = \frac{y_0 x_1 - y_1 x_0}{x_1 - x_0}, \qquad e = 0$$

In general, let x(i) and y(i) be arrays of length n ≥ 2. Each of the following sums ranges from i = 0 to n-1.

$$m = \frac{n \sum x(i)\,y(i) - \sum x(i) \sum y(i)}{n \sum x(i)\,x(i) - \sum x(i) \sum x(i)}$$

$$b = \frac{\sum y(i) \sum x(i)\,x(i) - \sum x(i)\,y(i) \sum x(i)}{n \sum x(i)\,x(i) - \sum x(i) \sum x(i)}$$

For n > 2, the average error is defined as

$$e = \frac{\sum \left| y(i) - m\,x(i) - b \right|}{n}$$
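The regression sums can be checked against a C reference model before writing the assembly. The sketch below is one possible model, not the required solution. The `RegResult` type, the `REG_ERROR` name, and the use of 64-bit intermediates are assumptions made for illustration. Inputs are the raw 8-bit integers (resolution 0.01) and outputs are raw 16-bit integers (resolution 0.0001), so the slope is scaled by 10000 and the offset and error by 100.

```c
#include <stdint.h>
#include <stddef.h>

#define REG_ERROR 32767  /* $7FFF, returned on overflow or divide by zero */

typedef struct { int16_t m, b, e; } RegResult;  /* hypothetical result type */

/* True if v fits the legal output range (32767 is reserved for errors). */
static int fits(int64_t v) { return v >= -32768 && v <= 32766; }

/* xy holds n pairs of signed 8-bit values: x0,y0,x1,y1,... */
RegResult Regression(const int8_t *xy, uint16_t n) {
  RegResult bad = { REG_ERROR, REG_ERROR, REG_ERROR };
  int64_t sx = 0, sy = 0, sxx = 0, sxy = 0;
  for (size_t i = 0; i < n; i++) {
    int64_t x = xy[2*i], y = xy[2*i + 1];
    sx += x; sy += y; sxx += x*x; sxy += x*y;
  }
  int64_t den = (int64_t)n*sxx - sx*sx;   /* shared denominator */
  if (den == 0) return bad;               /* fewer than 2 distinct x values */
  int64_t numM = (int64_t)n*sxy - sx*sy;  /* numerator of m */
  int64_t numB = sy*sxx - sxy*sx;         /* numerator of b */
  int64_t sumErr = 0;                     /* sum of |y - m*x - b|, times den */
  for (size_t i = 0; i < n; i++) {
    int64_t r = xy[2*i + 1]*den - numM*xy[2*i] - numB;
    sumErr += (r < 0) ? -r : r;
  }
  int64_t M = 10000*numM/den;             /* slope in units of 0.0001 */
  int64_t B = 100*numB/den;               /* offset: 0.01 units to 0.0001 */
  int64_t E = 100*sumErr/(den*(int64_t)n);
  if (!fits(M) || !fits(B) || !fits(E)) return bad;
  RegResult ok = { (int16_t)M, (int16_t)B, (int16_t)E };
  return ok;
}
```

For the DataSet1 example, the pairs (0,1), (10,2), (20,3) give m = 1000, b = 100, and e = 0.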
Homework 10.23 Write a subroutine to implement determinant. The calling sequence is:

  ldx #DataSet1 ; pointer to matrix
  jsr Determinant
  ; Reg D = determinant of the matrix as a fixed point number

All matrix numbers will be in signed 16-bit decimal fixed point with a resolution, Δ, of 0.0001. Therefore, the 16-bit numbers have effective values which range from -3.2768 to +3.2767. A typical matrix structure looks like:

  DataSet1 fcb 4                       ; 4 by 4 matrix
           fdb 0,1000,2000,3000        ; (0.0, 0.1, 0.2, 0.3)
           fdb 4000,5000,6000,7000     ; (0.4, 0.5, 0.6, 0.7)
           fdb 8000,9000,10000,11000   ; (0.8, 0.9, 1.0, 1.1)
           fdb 12000,13000,14000,15000 ; (1.2, 1.3, 1.4, 1.5)

Set the result equal to 3.2767 ($7FFF) if a data overflow or divide by zero occurs. Let A be a square matrix of size n+1. Let Aij be the reduced matrix with the ith row and jth column removed.

$$A = \begin{bmatrix} a_{00} & a_{01} & \cdots & a_{0n} \\ a_{10} & a_{11} & \cdots & a_{1n} \\ \vdots & \vdots & a_{ij} & \vdots \\ a_{n0} & a_{n1} & \cdots & a_{nn} \end{bmatrix}$$

E.g., if A is a 4 by 4 matrix, then Aij is a 3 by 3 matrix (showing just the variable parts). If

$$A = \begin{bmatrix} 0.0 & 0.1 & 0.2 & 0.3 \\ 0.4 & 0.5 & 0.6 & 0.7 \\ 0.8 & 0.9 & 1.0 & 1.1 \\ 1.2 & 1.3 & 1.4 & 1.5 \end{bmatrix}$$

then

$$A_{21} = \begin{bmatrix} 0.0 & 0.2 & 0.3 \\ 0.4 & 0.6 & 0.7 \\ 1.2 & 1.4 & 1.5 \end{bmatrix}$$

The determinant of A is defined as

$$|A| = \sum_{i=0}^{n} (-1)^{i}\, a_{0i}\, |A_{0i}|$$

E.g.,

$$\begin{vmatrix} 0.0 & 0.1 & 0.2 & 0.3 \\ 0.4 & 0.5 & 0.6 & 0.7 \\ 0.8 & 0.9 & 1.0 & 1.1 \\ 1.2 & 1.3 & 1.4 & 1.5 \end{vmatrix} = 0.0 \cdot \begin{vmatrix} 0.5 & 0.6 & 0.7 \\ 0.9 & 1.0 & 1.1 \\ 1.3 & 1.4 & 1.5 \end{vmatrix} - 0.1 \cdot \begin{vmatrix} 0.4 & 0.6 & 0.7 \\ 0.8 & 1.0 & 1.1 \\ 1.2 & 1.4 & 1.5 \end{vmatrix} + 0.2 \cdot \begin{vmatrix} 0.4 & 0.5 & 0.7 \\ 0.8 & 0.9 & 1.1 \\ 1.2 & 1.3 & 1.5 \end{vmatrix} - 0.3 \cdot \begin{vmatrix} 0.4 & 0.5 & 0.6 \\ 0.8 & 0.9 & 1.0 \\ 1.2 & 1.3 & 1.4 \end{vmatrix}$$

The special case for n = 1 is

$$|A| = \begin{vmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{vmatrix} = a_{00} \cdot a_{11} - a_{10} \cdot a_{01}$$
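The expansion along row 0 maps directly onto a recursive function. The sketch below is a floating-point reference model in C, intended only for checking results and not as the fixed-point assembly solution. The name `Determinant`, the `MAXN` limit, and the row-major layout are assumptions for illustration. Note that the parameter `size` is the number of rows, which is n+1 in the notation above.

```c
#include <stddef.h>

#define MAXN 8  /* hypothetical size limit chosen for this sketch */

/* Determinant of a size-by-size row-major matrix by expansion along row 0.
   Returns 0.0 for unsupported sizes. */
double Determinant(const double *a, size_t size) {
  if (size == 0 || size > MAXN) return 0.0;
  if (size == 1) return a[0];
  if (size == 2) return a[0]*a[3] - a[2]*a[1];  /* a00*a11 - a10*a01 */
  double minor[(MAXN - 1)*(MAXN - 1)];
  double det = 0.0, sign = 1.0;
  for (size_t col = 0; col < size; col++) {
    size_t k = 0;                       /* build A0col: drop row 0 and col */
    for (size_t r = 1; r < size; r++) {
      for (size_t c = 0; c < size; c++) {
        if (c != col) minor[k++] = a[r*size + c];
      }
    }
    det += sign * a[col] * Determinant(minor, size - 1);
    sign = -sign;                       /* (-1)^i alternates the sign */
  }
  return det;
}
```

The rows of the 4 by 4 example above are equally spaced, so they are linearly dependent and the model returns a value that is zero to within floating-point roundoff.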
Homework 10.24 Normally, we use the microcomputer hardware stack (RAM pointed to by RegS) to:
1. Hold return addresses when we call subroutines and software interrupts
2. Store temporary data like our local variables
3. Pass data into functions

In this problem, we will develop the concept of a separate data stack (LIFO) to pass parameters between our functions. We will still use the microcomputer hardware stack for the return addresses and local variables. We begin with a fixed allocation of the data stack area. In RAM we define:

  Size  equ 10   ; Data stack size in bytes
  Stack rmb Size

If we dedicate index RegY as the data stack pointer (rather than defining a global memory pointer) then the program execution speed will be improved, at the expense of the inconvenience of dedicating RegY to this one purpose throughout all our programs. The application we will solve involves the calculation of logic equations. We will define True to be the 8-bit value $FF and False to be 0. Logic values (true or false) will be stored on the data stack. For example, if False,False,True is pushed on the data stack (with True on top):

Figure HW 10.24a Data stack before push. (RegY points to the top entry, $FF, with 0 and 0 below it.)

RegY always points to the top data stack entry. If another False is pushed on the data stack:

Figure HW 10.24b Data stack after push. (RegY points to the new top entry, 0, with $FF, 0, and 0 below it.)
In all of the subroutines of this problem, you do not need to save/restore registers. Also, you do not need to worry about data stack underflow or overflow. Assume we have global variables in RAM which represent logic variables. E.g.,
  P rmb 1 ; logic variable
  Q rmb 1 ; True=$FF and False=0
  R rmb 1
  S rmb 1
a) Show the assembly subroutine that initializes the data stack.

b) Write a subroutine Qpush, which pushes the value of the logic variable Q onto the data stack. Assume (but do not show) there are similar routines Ppush, Rpush, Spush.

c) Write a subroutine popQ, which pops the value off the top of the data stack and stores it in the logic variable Q. Assume (again do not show) there are similar routines popP, popR, popS.

d) Write the subroutine NOT, which performs the logical not of the top data stack entry. This routine simply changes the value of the top of the data stack without performing any data stack pushes or pops.

e) Write the subroutine AND, which performs the logical and of the top two data stack entries. This routine pops two values off the top of the data stack, performs the function, and pushes the result back. Assume (again do not show) there are routines OR, NOR, NAND, XOR with similar parameter passing which perform their respective logic functions.

f) Now comes the fun part. In order to develop a really powerful software solution, we need a one-to-one correlation between the mathematical expression and the software implementation. To do this, we will develop the concept of a command sequence. A command sequence is simply a list of subroutines to execute in order. The programming of logic expressions involves two steps. The first step is to rewrite the logic expressions in Reverse Polish Notation. For example,

  R = (P+Q)•S'
  S = R⊕Q

can be written as

  P Q or S not and StoreR
  R Q XOR StoreS

This is indeed a one-to-one translation and can be performed for any logic equation. The next step is to specify the command sequence as an assembly language data structure. This variable-length vector contains 16-bit subroutine pointers and is terminated with a 0. Each subroutine passes data using the data stack. E.g.,

  CmdSeq fdb Ppush,Qpush,OR,Spush,NOT,AND,popR,Rpush,Qpush,XOR,popS,0

Write the assembly code that executes the logic expressions defined by the CmdSeq data structure. Full credit will be given to the proper usage of local variables.
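The command-sequence idea can be prototyped in C, where the vector of 16-bit subroutine pointers becomes an array of function pointers. This is a sketch of the concept only, not the assembly answer. The helper names `push`, `pop`, and `Execute` are hypothetical, and a C pointer `sp` stands in for RegY. Like the problem statement, it does not check for data stack overflow or underflow.

```c
#include <stdint.h>
#include <stddef.h>

#define SIZE 10                         /* data stack size in bytes */

static uint8_t Stack[SIZE];
static uint8_t *sp = Stack + SIZE;      /* plays the role of RegY; stack empty */
uint8_t P, Q, R, S;                     /* logic variables, True=0xFF, False=0 */

static void push(uint8_t v) { *--sp = v; }
static uint8_t pop(void)    { return *sp++; }

void Ppush(void) { push(P); }
void Qpush(void) { push(Q); }
void Rpush(void) { push(R); }
void Spush(void) { push(S); }
void popR(void)  { R = pop(); }
void popS(void)  { S = pop(); }
void NOT(void)   { *sp = (uint8_t)~*sp; }           /* modify top in place */
void AND(void)   { uint8_t a = pop(); *sp &= a; }   /* pop two, push one */
void OR(void)    { uint8_t a = pop(); *sp |= a; }
void XOR(void)   { uint8_t a = pop(); *sp ^= a; }

typedef void (*Command)(void);

/* R = (P+Q)•S' followed by S = R xor Q, terminated by a null pointer. */
const Command CmdSeq[] = {
  Ppush, Qpush, OR, Spush, NOT, AND, popR,
  Rpush, Qpush, XOR, popS, NULL
};

/* Call each subroutine in the list, in order, until the terminator. */
void Execute(const Command *seq) {
  while (*seq != NULL) {
    (*seq)();
    seq++;
  }
}
```

Each command leaves the data stack balanced once its equation is stored, so `Execute(CmdSeq)` can be called repeatedly.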
10.7 Laboratory Problems

Lab 10.1 Fixed-Point LCD Driver

Purpose: This lab has these major objectives: interfacing an LCD used to display information on the embedded system, development of a device driver, allocation of local variables on the stack, and implementation of a fixed-point output.

Description: The basic approach to this lab will be to first develop and debug your system using the simulator. During this phase of the project you will use the TExaS debugger to observe your software operation. After the software is debugged, you will run your software on the real 9S12. You will need to interface the 9S12 to the LCD. 4-bit data mode requires fewer I/O pins but necessitates a more complex communication protocol. Many microcontrollers have a limited number of pins, therefore you will interface the LCD using 4-bit data mode in this lab, which requires only 6 output pins of the 9S12. This lab will use "blind cycle" synchronization, which means after the software issues an output command to the LCD, it will blindly wait a fixed amount of time for that command to complete. For 16-pin LCD devices, pins 15 and 16 should be left unconnected.

The objective of this lab is to develop a device driver for the LCD display. A device driver is a set of functions that facilitate the usage of an I/O port. In particular, there are three components of a device driver. The first component is the description of the driver. If the software were being developed in C, then this description would have been the function prototypes for the public functions, which would have been placed in the header file of the driver, e.g., LCD.H. Since this driver will be developed in assembly, your descriptions are placed in the comments before each subroutine. It is during the design phase of a project that this information is specified. In this lab, you are required to develop and test these seven public functions (notice that public functions include LCD_ in their names), as shown in Program L10.1a.
Program L10.1a Public functions of the LCD driver.

  ;-----------LCD_Open-----------
  ; initialize the LCD display, called once at beginning
  ; Input: none
  ; Output: none
  ; Registers modified: CCR

  ;-----------LCD_Clear-----------
  ; clear the LCD display, send cursor to home
  ; Input: none
  ; Outputs: none
  ; Registers modified: CCR

  ;-----------LCD_OutChar-----------
  ; sends one ASCII to the LCD display
  ; Input: RegA (call by value) letter is ASCII code
  ; Outputs: none
  ; Registers modified: CCR

  ;-----------LCD_GoTo-----------
  ; Move cursor (set display address)
  ; Input: RegA is display address is 0 to 7, or $40 to $47
  ; Output: none
  ; errors: it will check for legal address

  ;-----------LCD_OutString-----------
  ; Output character string to LCD display, terminated by a NUL(0)
  ; Inputs: RegX (call by reference) points to an ASCII string
  ; Outputs: none
  ; Registers modified: CCR

  ;-----------LCD_OutDec-----------
  ; Output a 16-bit number in unsigned decimal format
  ; Input: RegD (call by value) 16-bit unsigned number
  ; Output: none
  ; Registers modified: CCR

  ;-----------LCD_OutFix-----------
  ; Output characters to LCD display in unsigned decimal fixed-point format
  ; resolution 0.001, range 0.000 to 9.999
  ; Inputs: RegD is an unsigned 16-bit number
  ; Outputs: none
  ; Registers modified: CCR
  ; E.g., RegD=0,    then output "0.000 "
  ;       RegD=3,    then output "0.003 "
  ;       RegD=89,   then output "0.089 "
  ;       RegD=123,  then output "0.123 "
  ;       RegD=9999, then output "9.999 "
  ;       RegD>9999, then output "*.*** "
The second component of a device driver is the implementation of the functions that perform the I/O. If the driver were being developed in C, then the implementations would have been placed in the corresponding code file, e.g., LCD.C. When developing a driver in assembly, the implementations are the instructions and comments placed inside the body of the subroutines. In addition to public functions, a device driver can also have private functions. This interface will require a private function that outputs commands to the LCD (notice that private functions do not include LCD_ in their names), as shown in Program L10.1b.
Program L10.1b Private function of the LCD driver.

  ;-----------outCsr-----------
  ; sends one command code to the LCD control/status
  ; Input: RegA is 8-bit command to execute
  ; Output: none
  0) save any registers that will be destroyed by pushing on the stack
  1) E=0, RS=0
  2) 4-bit DB7,DB6,DB5,DB4 = most significant nibble of command
  3) E=1
  4) E=0 (latch 4-bits into LCD)
  5) 4-bit DB7,DB6,DB5,DB4 = least significant nibble of command
  6) E=1
  7) E=0 (latch 4-bits into LCD)
  8) blind cycle 90 us wait
  9) restore the registers by pulling off the stack
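The nibble-by-nibble handshake in steps 1 through 8 can be traced on a desktop computer before touching the hardware. The C sketch below records every value written to a simulated port. The pin assignment (E on bit 5, RS on bit 4, DB7-DB4 on bits 3-0), the `Log` array, and the helper names are all assumptions made for this illustration; the lab does not fix them, and a real driver would write to the 9S12 port and use a timer for the delay.

```c
#include <stdint.h>

#define E_BIT  0x20   /* hypothetical: E on bit 5 of the port */
#define RS_BIT 0x10   /* hypothetical: RS on bit 4 of the port */

static uint8_t Port;        /* stands in for the real output port */
static uint8_t Log[16];     /* record of the most recent port writes */
static unsigned LogCount;

static void write_port(uint8_t value) {
  Port = value;
  if (LogCount < 16) Log[LogCount++] = value;
}

/* Placeholder for the blind-cycle delay; a real driver waits here. */
static void wait_us(unsigned us) { (void)us; }

/* Present one nibble on DB7-DB4 and pulse E to latch it. */
static void send_nibble(uint8_t nibble, uint8_t rs) {
  uint8_t base = (uint8_t)(rs | (nibble & 0x0F));
  write_port(base);                     /* E=0, data valid */
  write_port((uint8_t)(base | E_BIT));  /* E=1 */
  write_port(base);                     /* E=0 latches 4 bits into LCD */
}

/* Command write: RS=0, most significant nibble first. */
void outCsr(uint8_t command) {
  send_nibble((uint8_t)(command >> 4), 0);
  send_nibble((uint8_t)(command & 0x0F), 0);
  wait_us(90);
}

/* Data write: same sequence with RS=1. */
void LCD_OutChar(uint8_t letter) {
  send_nibble((uint8_t)(letter >> 4), RS_BIT);
  send_nibble((uint8_t)(letter & 0x0F), RS_BIT);
  wait_us(90);
}
```

With this pin assignment, `outCsr(0x28)` produces the six port writes $02, $22, $02, $08, $28, $08.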
An important factor in device driver design is to separate the policies of the interface (how to use the programs, which is defined in the comments placed at the top of each subroutine) from the mechanisms (how the programs are implemented, which is described in the comments placed within the body of the subroutine). Possible algorithms for the seven functions are as follows.

LCD_Open
  0) save any registers that will be destroyed by pushing on the stack
  1) initialize timer Timer_Init()
  2) wait 100ms allowing the LCD to power up (skip this in TExaS)
  3) set DDRH so that PH5-0 are output signals to the LCD
  4) E=0, RS=0
  5) 4-bit DB7,DB6,DB5,DB4 = $02 (DL=0 4-bit mode)
  6) E=1
  7) E=0 (latch 4-bits into LCD)
  8) blind cycle 90 us wait
  9) outCsr($06) // I/D=1 Increment, S=0 no displayshift
  10) outCsr($0C) // D=1 displayon, C=0 cursoroff, B=0 blink off
  11) outCsr($14) // S/C=0 cursormove, R/L=1 shiftright
  12) outCsr($28) // DL=0 4bit, N=1 2 line, F=0 5by7 dots
  13) LCD_Clear() // clear display
  14) restore the registers by pulling off the stack

LCD_OutChar
  0) save any registers that will be destroyed by pushing on the stack
  1) E=0, RS=1
  2) 4-bit DB7,DB6,DB5,DB4 = most significant nibble of data
  3) E=1
  4) E=0 (latch 4-bits into LCD)
  5) 4-bit DB7,DB6,DB5,DB4 = least significant nibble of data
  6) E=1
  7) E=0 (latch 4-bits into LCD)
  8) blind cycle 90 us wait
  9) restore the registers by pulling off the stack

LCD_Clear
  0) save any registers that will be destroyed by pushing on the stack
  1) outCsr($01) // Clear Display
  2) blind cycle 1.64ms wait
  3) outCsr($02) // Cursor to home
  4) blind cycle 1.64ms wait
  5) restore the registers by pulling off the stack

LCD_OutString
  0) save any registers that will be destroyed by pushing on the stack
  1) read one character from the string
  2) increment the string pointer to the next character
  3) break out of loop (go to step 6) if the character is NUL(0)
  4) output the character to the LCD by calling LCD_OutChar
  5) loop back to step 1)
  6) restore the registers by pulling off the stack

LCD_GoTo
  0) save any registers that will be destroyed by pushing on the stack
  1) go to step 3 if DDaddr is $08 to $3F or $48 to $FF
  2) outCsr(DDaddr+$80)
  3) restore the registers by pulling off the stack

LCD_OutDec (recursive implementation)
  1) allocate local variable n on the stack
  2) set n with the input parameter passed in RegD
  3) if(n >= 10){ LCD_OutDec(n/10); n = n%10; }
  4) LCD_OutChar(n+$30); /* n is between 0 and 9 */
  5) deallocate variable

LCD_OutFix
  0) save any registers that will be destroyed by pushing on the stack
  1) allocate local variables letter and num on the stack
  2) initialize num to input parameter, which is the integer part
  3) if number is less than or equal to 9999, go to step 6
  4) output the string "*.*** " by calling LCD_OutString
  5) go to step 19
  6) perform the division num/1000, putting the quotient in letter, and the remainder in num
  7) convert the ones digit to ASCII, letter = letter+$30
  8) output letter to the LCD by calling LCD_OutChar
  9) output '.' to the LCD by calling LCD_OutChar
  10) perform the division num/100, putting the quotient in letter, and the remainder in num
  11) convert the tenths digit to ASCII, letter = letter+$30
  12) output letter to the LCD by calling LCD_OutChar
  13) perform the division num/10, putting the quotient in letter, and the remainder in num
  14) convert the hundredths digit to ASCII, letter = letter+$30
  15) output letter to the LCD by calling LCD_OutChar
  16) convert the thousandths digit to ASCII, letter = num+$30
  17) output letter to the LCD by calling LCD_OutChar
  18) output ' ' to the LCD by calling LCD_OutChar
  19) deallocate variables
  20) restore the registers by pulling off the stack
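The LCD_OutDec and LCD_OutFix algorithms above can be desk-checked with a small C model in which a character buffer stands in for the display. This is a sketch for checking expected output, not the assembly driver. The `Display` buffer and `Cursor` index are hypothetical stand-ins for the LCD hardware, and the register save/restore and stack allocation steps have no counterpart in C.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

static char Display[32];   /* stands in for the LCD contents */
static size_t Cursor;      /* next position to write */

void LCD_Clear(void) {
  memset(Display, 0, sizeof Display);
  Cursor = 0;
}

void LCD_OutChar(char letter) {
  if (Cursor < sizeof Display - 1) Display[Cursor++] = letter;
}

void LCD_OutString(const char *pt) {
  while (*pt != '\0') LCD_OutChar(*pt++);
}

/* Recursive unsigned decimal output: leading digits first, then the last. */
void LCD_OutDec(uint16_t n) {
  if (n >= 10) {
    LCD_OutDec((uint16_t)(n / 10));
    n = (uint16_t)(n % 10);
  }
  LCD_OutChar((char)(n + 0x30));      /* n is between 0 and 9 */
}

/* Fixed-point output, resolution 0.001, range 0.000 to 9.999. */
void LCD_OutFix(uint16_t num) {
  if (num > 9999) {
    LCD_OutString("*.*** ");
    return;
  }
  LCD_OutChar((char)(num / 1000 + 0x30));  /* ones digit */
  num = (uint16_t)(num % 1000);
  LCD_OutChar('.');
  LCD_OutChar((char)(num / 100 + 0x30));   /* tenths digit */
  num = (uint16_t)(num % 100);
  LCD_OutChar((char)(num / 10 + 0x30));    /* hundredths digit */
  num = (uint16_t)(num % 10);
  LCD_OutChar((char)(num + 0x30));         /* thousandths digit */
  LCD_OutChar(' ');
}
```

For the input 9876, the model writes "9876" for LCD_OutDec and "9.876 " for LCD_OutFix, which can be compared with a hand trace of the assembly version.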
The third component of a device driver is a main program that calls the driver functions, as shown in Program L10.1c. This software has two purposes. For the developer (you), it provides a means to test the driver functions. It should illustrate the full range of features available with the system. The second purpose of the main program is to give your client or customer (e.g., the TA) examples of how to use your driver. Here is a 9S12DP512 example test program, assuming a positive logic switch is connected to PORTAD0 bit 7 (PAD7).
Program L10.1c Main program used to test the LCD driver.

        org   $4000
  Entry lds   #$4000
        bset  ATDDIEN,#$80  ;PAD7 digital input
        jsr   LCD_Open      ;***Your function that initializes the LCD***
  start ldx   #Welcome
        jsr   LCD_OutString ;***Your function that outputs a string***
        ldx   #TestData
  loop  brset PTAD,#$80,*   ;wait for switch release
        brclr PTAD,#$80,*   ;wait for switch touch
        jsr   LCD_Clear     ;***Your function that clears the display***
        ldd   0,x
        jsr   LCD_OutDec    ;***Your function that outputs an integer***
        ldaa  #$40          ;Cursor location of the 8th position
        jsr   LCD_GoTo      ;***Your function that moves the cursor***
        ldd   2,x+
        jsr   LCD_OutFix    ;***Your function outputs a fixed-point***
        cpx   #TestDataEnd
        bne   loop
        jsr   LCD_Clear     ;***Your function that clears the display***
        bra   start
  Welcome  fcc "Welcome "
           fcc "                                " ;32 spaces
           fcc "to lab! "
           fcb 0
  TestData fdb 0,5,16,123,5432,9876,9999,10000,23456,65535
  TestDataEnd
You will use the TExaS simulator to develop and test your device driver, and then you will test your solutions on the real 9S12. There are many functions to write in this lab, so it is important to develop the device driver in small pieces. One technique you might find useful is desk checking. Basically, you hand-execute your functions with a specific input parameter. For example, using just a pencil and paper, think about the sequential steps that will occur when LCD_OutDec or LCD_OutFix processes the input 9876. Later, while you are debugging the actual functions on the simulator, you can single step the program and compare the actual data with your expected data.

a) One by one, each of the subroutines should be designed, implemented and tested. Successive refinement is a development approach that can be used to solve complex problems. If the problem is too complicated to envision a solution, you should redefine the problem and solve an easier problem. If it is still too complicated, redefine it again, simplifying it even more. You could simplify LCD_OutFix as follows:
  1. Implement the variables in global variables (rather than as local variables on the stack)
  2. Ignore special cases with illegal inputs
  3. Implement just one decimal digit
During the development phase, you implement and test the simpler problem, then refine it, adding back the complexity required to solve the original problem. You could simplify LCD_OutDec in a similar fashion.

b) Draw the hardware circuit diagram. Build the interface using the circuit diagram. Please double check your connections before applying power.

c) This lab is sufficiently complex that it should be first debugged on the TExaS simulator. Once the system is debugged on the simulator, download and debug it on the real 9S12. Each time a function is called, an activation record is created on the stack, which includes parameters passed on the stack (none in this lab), the return address, and the local variables. You will be asked to create a stack window and identify the activation records created during the execution of LCD_OutDec.
11 Analog I/O Interfacing

Chapter 11 objectives are to:
- Discuss sampling and the Nyquist Theorem
- Use the DAC to generate sounds and music
- Describe the internal ADC on the 9S12
The common theme of this chapter is analog I/O interfacing. The chapter begins with a discussion of representing continuous signals with digital approximations. A digital to analog converter will be used to generate waveforms and sound. This chapter covers some ADC modes built into the 9S12. The ADC is then used to design measurement systems. Finally, the chapter concludes with a control system, which includes both inputs and outputs.
11.1 Approximating Continuous Signals in the Digital Domain

An analog signal is one that is continuous in both amplitude and time. Neglecting quantum physics, most signals in the world exist as continuous functions of time in an analog fashion (e.g., voltage, current, position, angle, speed, force, pressure, temperature, and flow). In other words, the signal has an amplitude that can vary over time, but the value cannot change instantaneously. To represent a signal in the digital domain we must approximate it in two ways: amplitude quantizing and time quantizing.

From an amplitude perspective, we will first place limits on the signal, restricting it to exist between a minimum and maximum value (e.g., 0 to 5 V), and second, we will divide this amplitude range into a finite set of discrete values. The range of the system is the maximum minus the minimum value. The precision of the system defines the number of values from which the amplitude of the digital signal is selected. Precision can be given in number of alternatives, binary bits or decimal digits. The resolution is the smallest change in value that is significant. Figure 11.1 shows a temperature waveform (solid line), with a corresponding digital representation sampled at 1 Hz and stored as a 5-bit integer number with a range of 0 to 31°C. Because it is digitized in both amplitude and time, the digital samples (individual dots) in Figure 11.1 must exist at an intersection of grey lines. Because it is a time-varying signal (mathematically, this is called a function), we have one amplitude for each time, but it is possible for there to be 0, 1, or more times for each amplitude.

The second approximation occurs in the time domain. Time quantizing is caused by the finite sampling interval. For example, the data are sampled every 1 second in Figure 11.1.
In practice we will use a periodic interrupt to trigger an analog to digital convertor (ADC) to digitize information, converting from the analog to the digital domain. Similarly, if we are converting from the digital to the analog domain, we use the periodic interrupt to trigger a digital to analog convertor (DAC). The Nyquist Theorem states that if the signal is sampled with a frequency of fs, then the digital samples only contain frequency components from 0 to 1⁄2 fs. Conversely, if the analog signal does contain frequency components larger than 1⁄2 fs, then there will be an aliasing error during the sampling process. Aliasing is when the digital signal appears to have a different frequency than the original analog signal.
Figure 11.1 An analog signal is represented in the digital domain as discrete samples. (Plot of temperature in °C, from 0 to 32, versus time in s, from 0 to 10, showing the continuous analog signal and the discrete digital signal.)
Checkpoint 11.1: Why can’t the digital samples represent the little wiggles in the analog signal? Checkpoint 11.2: Why can’t the digital samples represent temperatures above 31°C?
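The aliasing effect described by the Nyquist Theorem can be illustrated with a short calculation. The C sketch below computes the apparent frequency of a single sine wave after sampling, by folding the true frequency into the range 0 to fs/2. The function name `AliasFrequency` is hypothetical, and the model assumes an ideal sampler and a pure tone with whole-number frequencies in Hz.

```c
#include <assert.h>
#include <stdint.h>

/* Apparent frequency (Hz) of a sine wave of frequency f after it is
   sampled at fs samples per second. Components at or below fs/2 are
   unchanged; higher components fold back into the range 0 to fs/2. */
uint32_t AliasFrequency(uint32_t f, uint32_t fs) {
  assert(fs > 0);                    /* sampling rate must be nonzero */
  uint32_t r = f % fs;               /* sampling cannot tell f from f - k*fs */
  return (r > fs / 2) ? fs - r : r;  /* reflect about fs/2 */
}
```

For example, a 900 Hz tone sampled at 1000 Hz appears as 100 Hz, while a 100 Hz tone sampled at the same rate is represented faithfully.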
11.2 Digital to Analog Conversion

A digital to analog convertor (DAC) converts digital signals into analog form, as shown in Figure 11.2. An embedded system uses a DAC to effect changes in its external world (e.g., sound, RF wave, pressure, force, or heat). When interfacing a DAC to the 9S12 we can use the SPI synchronous serial port, which was described previously in Section 8.3. The DAC output can be current or voltage. Additional analog processing may be required to filter, amplify, convert or modulate the signal. The DAC precision is the number of distinguishable DAC outputs. The DAC range is the span from the minimum to the maximum DAC output. The DAC resolution is the smallest distinguishable change in output. The units of resolution are in volts or amps depending on whether the output is voltage or current. The resolution is the change that occurs when the digital input changes by 1. The MAX550 interface in Figure 8.10 has a precision of 256 alternatives or 8 bits, a range of 0 to 5 V, and a resolution of about 20 mV.

  Range (volts) = Precision (alternatives) • Resolution (volts)
Figure 11.2 Input/output functions of a 10-bit DAC and a 10-bit ADC. (Plot of the 10-bit digital signal, from 0 to 1024, versus the analog signal in volts, from 0 to 5, with a zoomed-in region. The DAC maps the digital input to the analog output, and the ADC maps the analog input to the digital outputs.)
Let N be an m-bit digital output of the computer, and hence an input to the m-bit DAC. Let the range of the DAC be from Vmin to Vmax. From an overall perspective, the output of the DAC is a linear function of N

  Vout = (Vmax - Vmin) * N/2^m + Vmin

The DAC accuracy is (Actual - Ideal) / Ideal where Ideal is the desired output. A DAC is monotonic if an increase in digital value always causes an increase in analog value. This means if the digital signal increments slowly, then the analog output will never decrease.

Example 11.1 Design a 2-bit DAC with a range of 0 to 5 V using resistors.

Solution We begin the design by specifying the exact input/output relationship of the 2-bit DAC. There are two possible solutions depending upon whether we want a resolution of 1.25 V or 1.67 V, as shown as V1 and V2 in Table 11.1. The solution implements the V2 response.

Table 11.1 Specifications of the 2-bit DAC.
N    Q1   Q0   V1 (V)   V2 (V)
0    0    0    0.00     0.00
1    0    5    1.25     1.67
2    5    0    2.50     3.33
3    5    5    3.75     5.00
Assume the output high voltage (VOH) of the 9S12 is 5 V, and its output low voltage (VOL) is 0. We choose the resistor ratio to be 2/1 so the Q1 bit is twice as significant as the Q0 bit, as shown in Figure 11.3. If both Q1 and Q0 are 0, the output V2 is zero. If Q1 is 0 and Q0 is 5 V, the output V2 is determined by the resistor divider network 5 V - 20 kΩ - V2 - 10 kΩ - 0 V, which is 5 V * 10/(10+20) = 1.67 V. If Q1 is 5 V and Q0 is 0, the output V2 is determined by the resistor divider network 5 V - 10 kΩ - V2 - 20 kΩ - 0 V, which is 5 V * 20/(10+20) = 3.33 V. If both Q1 and Q0 are 5 V, the output V2 is 5 V.

Figure 11.3 A 2-bit DAC. (9S12 bit1 drives Q1 through a 10 kΩ resistor and bit0 drives Q0 through a 20 kΩ resistor; the two resistors join at the output V2.)
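Both columns of Table 11.1 can be reproduced numerically. The C sketch below computes the ideal linear response (the V1 column) from the Vout equation and the two-resistor network output (the V2 column) from the resistor divider. The function names and the choice of millivolt and ohm integer units are assumptions for illustration, and the model treats the output pins as ideal voltage sources with an unloaded V2 node.

```c
#include <stdint.h>

/* Ideal m-bit DAC: Vout = (Vmax - Vmin)*N/2^m + Vmin, in millivolts. */
int32_t DacIdeal_mV(uint32_t N, unsigned m, int32_t vmin_mV, int32_t vmax_mV) {
  int64_t span = (int64_t)vmax_mV - vmin_mV;
  return (int32_t)(span * (int64_t)N / ((int64_t)1 << m) + vmin_mV);
}

/* Two-resistor DAC: Q1 drives the output node through r1 and Q0 drives it
   through r0. With no load, V2 = (Q1*r0 + Q0*r1)/(r0 + r1).
   Voltages are non-negative millivolts; resistances are in ohms. */
int32_t DacResistor_mV(int32_t q1_mV, int32_t q0_mV,
                       int32_t r1_ohm, int32_t r0_ohm) {
  int64_t num = (int64_t)q1_mV * r0_ohm + (int64_t)q0_mV * r1_ohm;
  int64_t den = (int64_t)r1_ohm + r0_ohm;
  return (int32_t)((num + den / 2) / den);   /* round to nearest mV */
}
```

With r1 = 10000 and r0 = 20000, the four input combinations give 0, 1667, 3333, and 5000 mV, matching the V2 column. `DacIdeal_mV(N, 2, 0, 5000)` gives 0, 1250, 2500, and 3750 mV for the V1 column.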
You can realistically build a 4- or 5-bit DAC using this method. Checkpoint 11.3: How do you build a 3-bit DAC using this method?
11.3 Music Generation

Most digital music devices rely on high-speed DACs to create the analog waveforms required to produce high-quality sound. In this section, we will discuss a very simple sound generation system that illustrates this application of the DAC. The hardware consists of a DAC and a speaker interface. You can drive headphones directly from a DAC output, but to drive a regular speaker, you will need to add an audio amplifier, as illustrated in Figure 11.4. For more information on the audio amplifier, refer to the data sheet of the MC34119. The quality of the music will depend on both hardware and software factors. The precision of the DAC, external noise and the dynamic range of the speaker are some of the hardware factors. Software factors include the DAC output rate and the complexity of the stored sound data.

Figure 11.4 DAC allows the software to create music. (The DAC output drives headphones directly, or drives a speaker through an MC34119 audio amplifier circuit with 10 kΩ and 20 kΩ resistors and 0.1 μF capacitors.)
If you output a sequence of numbers to the DAC that form a sine wave, then you will hear a continuous tone on the speaker, as shown in Figure 11.5. The loudness of the tone is determined by the amplitude of the wave. The pitch is defined as the frequency of the wave. Table 11.2 contains frequency values for the notes in one octave.
Figure 11.5 The loudness and pitch are controlled by the amplitude and frequency. (The waveform is labeled with its period and its loudness; pitch = 1/period.)
Table 11.2 Fundamental frequencies of standard musical notes. The frequency for ‘A’ is exact.
A 4-bit DAC was made with four resistors of values 1.5, 3, 6, and 12 kΩ. The DAC had a range of 0 to 5 V and was interfaced to four output pins of the 9S12. The measured data in Figure 11.6 were collected using this DAC. The plot on the left was measured with a digital scope (without the headphones being attached). The plot on the right shows the frequency response of this data, plotting amplitude (in dB) versus frequency (in kHz). This measured waveform is approximately 2.7 + 2.3sin(2π440t) volts. The two peaks in the spectrum are at DC and 440 Hz (e.g., 20*log(2.3) = 7.2 dB).
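One common way to produce such a tone is to store one period of the sine wave in a table and output the next entry on each periodic interrupt. The C sketch below illustrates the idea and is not the code used to collect the measured data. The 32-entry table approximates 7.5 + 7.5 sin(2πi/32) for a 4-bit DAC, and the names `SineWave`, `Sound_Handler`, `Sound_Period`, and `DacPort` are hypothetical. On the 9S12 the handler would be an interrupt service routine writing to the four DAC pins.

```c
#include <stdint.h>

#define TABLE_SIZE 32

/* One period of a sine wave scaled to the 4-bit range 0 to 15. */
static const uint8_t SineWave[TABLE_SIZE] = {
  8, 9, 10, 12, 13, 14, 14, 15, 15, 15, 14, 14, 13, 12, 10, 9,
  7, 6,  5,  3,  2,  1,  1,  0,  0,  0,  1,  1,  2,  3,  5, 6
};

static uint8_t DacPort;   /* stands in for the four DAC output pins */
static unsigned Index;    /* position within the table */

/* Called once per periodic interrupt: output the next sample. */
void Sound_Handler(void) {
  DacPort = SineWave[Index];
  Index = (Index + 1) % TABLE_SIZE;
}

/* Interrupt period, in bus cycles, needed to play a tone of freq_Hz.
   The tone completes one table pass per period of the note. */
uint32_t Sound_Period(uint32_t busclock_Hz, uint32_t freq_Hz) {
  return busclock_Hz / (freq_Hz * TABLE_SIZE);
}
```

As a usage example, assuming an 8 MHz bus clock (an assumption, since the clock rate depends on the board configuration), a 440 Hz tone needs an interrupt about every 568 bus cycles, which is an output rate of 32*440 = 14080 samples per second.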
Figure 11.6 A 440 Hz sine wave generated with a 4-bit DAC. The plot on the right is the Fourier Transform (frequency spectrum, dB versus kHz) of the data plotted on the left.
The frequency of each note can be calculated by multiplying the previous frequency by $\sqrt[12]{2}$. You can use this method to determine the frequencies of additional notes above and below the ones in Table 11.2. There are twelve notes in an octave; therefore, moving up one octave doubles the frequency. Figure 11.7 illustrates the concept of an instrument. You can define the type of sound by the shape of the voltage versus time waveform. Brass instruments have a very large first harmonic frequency.
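As a rough illustration of the twelfth-root-of-2 rule, the following C sketch steps up or down from the A at 440 Hz one note at a time. The names `note_frequency` and `SEMITONE_RATIO` are illustrative assumptions rather than part of the book's software, and floating point is used only for clarity; a 9S12 program would more likely store precomputed values.

```c
/* Ratio between adjacent notes: the twelfth root of 2 (assumed name). */
#define SEMITONE_RATIO 1.0594630943592953

/* Frequency in Hz of the note that is 'semitones' above (positive) or
   below (negative) the A at 440 Hz. Hypothetical helper for illustration. */
double note_frequency(int semitones){
  double f = 440.0;
  while(semitones > 0){ f *= SEMITONE_RATIO; semitones--; }
  while(semitones < 0){ f /= SEMITONE_RATIO; semitones++; }
  return f;
}
```

Twelve steps up from 440 Hz gives 880 Hz, one octave higher, and nine steps down gives about 262 Hz, the low C used later in this section.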
Figure 11.7 A waveform shape that generates a trumpet sound. (One period of the waveform is marked.)
The tempo of the music defines the speed of the song. In 2/4, 3/4, or 4/4 music, a beat is defined as a quarter note. A moderate tempo is 120 beats/min, which means a quarter note has a duration of 1/2 second. A sequence of notes can be separated by pauses (silences) so that each note is heard separately. The envelope of the note defines the amplitude versus time. A very simple envelope is illustrated in Figure 11.8. The 9S12DP512 has plenty of processing power to create these types of waves.
Figure 11.8 You can control the amplitude, frequency and duration of each note (not drawn to scale). (Example sequence: 330 Hz for 0.5 s, 330 Hz for 0.5 s, 523 Hz for 1.0 s.)
The smooth-shaped envelope, as illustrated in Figure 11.9, causes a less staccato and more melodic sound. This type of sound can be generated in real time on the 9S12, but it uses most of the available processing capabilities.
Figure 11.9 The amplitude of a plucked string drops exponentially in time. (Example sequence: 330 Hz for 0.5 s, 330 Hz for 0.5 s, 523 Hz for 1.0 s.)
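The tempo and envelope ideas can be sketched in a few lines of C. This is only an illustration under stated assumptions: the function names and the decay constant `DECAY_Q15` (about 0.998 per update) are hypothetical, and the update rate at which `envelope_next` is called is left to the designer.

```c
#include <stdint.h>

/* Duration of one quarter note in ms at the given tempo (beats per minute). */
uint32_t quarter_note_ms(uint32_t beats_per_min){
  return 60000UL / beats_per_min;
}

/* Assumed decay factor per envelope update, in units of 1/32768. */
#define DECAY_Q15 32700UL

/* One step of an exponentially decaying envelope:
   new amplitude = amplitude * DECAY_Q15 / 32768, using integer math only. */
uint16_t envelope_next(uint16_t amplitude){
  return (uint16_t)(((uint32_t)amplitude * DECAY_Q15) >> 15);
}
```

Multiplying by a constant slightly less than one at every update produces the exponential drop of a plucked string without any floating point, which matters on a processor with limited spare capacity.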
A chord is created by playing multiple notes simultaneously. When two piano keys are struck simultaneously both notes are created, and the sounds are mixed arithmetically. You can create the same effect by adding two waves together in software, before sending the wave to the DAC. Figure 11.10 plots the mathematical addition of a 262 Hz (low C) and a 392 Hz sine wave (G), creating a simple chord.
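A minimal sketch of mixing two notes in software is shown here. It assumes a 16-point sine table, an 8 kHz output rate, and an 8-bit unsigned DAC, none of which come from the text; the names `SineTable`, `SAMPLE_RATE`, `phase_step`, and `chord_sample` are hypothetical. The two waves are averaged rather than simply added so the result stays within the DAC range.

```c
#include <stdint.h>

/* One cycle of a sine wave, 16 points, amplitude +/-127 (precomputed). */
static const int8_t SineTable[16] = {
  0, 49, 90, 117, 127, 117, 90, 49, 0, -49, -90, -117, -127, -117, -90, -49
};

#define SAMPLE_RATE 8000UL   /* assumed DAC output rate in Hz */

/* Phase increment per output sample for a tone of 'freq' Hz.
   The 16-bit phase wraps around once per cycle of the tone. */
uint16_t phase_step(uint16_t freq){
  return (uint16_t)(((uint32_t)freq * 65536UL) / SAMPLE_RATE);
}

/* Return the next 8-bit unsigned DAC value for two mixed tones
   (128 is the zero level) and advance both phases. */
uint8_t chord_sample(uint16_t *phase1, uint16_t step1,
                     uint16_t *phase2, uint16_t step2){
  int16_t a = SineTable[*phase1 >> 12];   /* top 4 bits index the table */
  int16_t b = SineTable[*phase2 >> 12];
  *phase1 = (uint16_t)(*phase1 + step1);
  *phase2 = (uint16_t)(*phase2 + step2);
  return (uint8_t)(128 + (a + b) / 2);    /* average keeps the sum in range */
}
```

Calling `chord_sample` once per output interrupt with steps from `phase_step(262)` and `phase_step(392)` would approximate the C and G chord described above.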
Figure 11.10 A simple chord mixing the notes C and G. (Plot of sound amplitude, from –2 to 2, versus time, from 0 to 0.02 sec.)
11.4
Analog to Digital Conversion An analog to digital converter (ADC) converts an analog signal into digital form, also shown in Figure 11.2. An embedded system uses the ADC to collect information about the external world (a data acquisition system). The input signal is usually an analog voltage, and the output is a binary number. The ADC precision is the number of distinguishable ADC inputs (e.g., 1024 alternatives, 10 bits). The ADC range is the maximum and minimum ADC input (e.g., 0 to 5 V). The ADC resolution is the smallest distinguishable change in input (e.g., 5 mV). The resolution is the change in input that causes the digital output to change by 1.
Range (volts) = Precision (alternatives) • Resolution (volts)
Normally we don’t specify accuracy for just the ADC, but rather we give the accuracy of the entire system (including transducer, analog circuit, ADC and software). An ADC is monotonic if it has no missing codes. This means if the analog signal is a slow rising voltage, then the digital output will hit all values one at a time. The merit of an ADC involves three factors: precision (number of bits), speed (how fast we can sample), and power (how much energy it takes to operate). How fast we can sample involves both the ADC conversion time (how long it takes to convert) and the bandwidth (what frequency components can be recognized by the ADC). The ADC cost is a function of the number and quality of internal components.
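The relationship between range, precision, and resolution can be checked with a short calculation. The helper name `adc_resolution_uV` is an assumption made for this sketch; it works in microvolts so that integer math is sufficient.

```c
#include <stdint.h>

/* Resolution in microvolts = range / precision, where precision = 2^bits. */
uint32_t adc_resolution_uV(uint32_t range_mV, uint8_t bits){
  uint32_t precision = 1UL << bits;        /* number of alternatives */
  return (range_mV * 1000UL) / precision;
}
```

For a 0 to 5 V range, 10 bits gives 4882 μV (about 5 mV) and 8 bits gives 19531 μV (about 20 mV).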
11.4.1 9S12 ADC Details
Most of the 9S12 microcontrollers have a built-in ADC, see Table 11.3. The 9S12C32 has one 8-channel 10-bit ADC using Port AD bits 7 to 0 (the pins are named PAD7 to PAD0). The 9S12DP512 has two 8-channel 10-bit ADC modules: ATD1 uses Port AD1 bits 7 to 0 (the pins are named PAD15 to PAD08) and ATD0 uses Port AD0 bits 7 to 0 (the pins are named PAD07 to PAD00). On the 9S12DP512, the ATD port names include a 0 or 1 to specify which ADC module; otherwise the ADC modules on the various 9S12 microcontrollers operate similarly. For example, the one control register 2 on the 9S12C32 is called ATDCTL2, but on the 9S12DP512 there are two ADC modules, so there are two control register 2s called ATD0CTL2 and ATD1CTL2. The ADC on the 9S12 can be operated in 8-bit mode or 10-bit mode. The 8 pins of 9S12C32 Port AD can be individually defined as analog input, digital output, or digital input. The 16 pins of 9S12DP512 Ports AD1 and AD0 can be individually defined as analog input or digital input (but not digital output). We set the corresponding bit in the ATD0DIEN register to be 1 for digital or 0 for analog input. On the 9S12C32, if we set the corresponding bit in the DDRAD register to 1, then that bit can be used as a digital output. There are no DDRAD registers on the 9S12DP512, because these pins cannot be digital outputs. The ADC digital output can be right- or left-justified within the 16-bit result register, and it can be in a signed or unsigned format.
Table 11.3 9S12 registers used for analog to digital conversion.
When the ADC is triggered, it performs a sequence of conversions, with the sequence length being any number from 1 to 8 conversions. When performing a sequence, it can convert the same channel multiple times or it can convert different channels during the sequence. We can trigger ADC conversions in three ways. The first way is to use an explicit software trigger (write to ATD0CTL5), and when the conversions are complete, the SCF flag is set. The examples in Programs 11.1 and 11.2 employ the explicit software trigger to start an ADC conversion. The second way to trigger the ADC is continuous mode. In this mode, the software starts it, but the ADC sample sequence is repeated over and over continuously. The third way is to connect an external trigger to the digital input on PAD07. With an external trigger we can use busy-wait synchronization (gadfly) on the SCF flag, or arm interrupts (ASCIE = 1) on the ASCIF flag.
The results of the ADC conversions can be found in the ATD0DR0 to ATD0DR7 result registers, where the register number refers to the sample sequence number. In other words, ATD0DR0 contains the result of the first conversion in the sequence, ATD0DR1 contains the result of the second conversion, . . . and ATD0DR7 contains the result of the eighth conversion.
The ATD0CTL2 register contains bits that activate the ADC module. The 9S12 ADC system is enabled by setting ADPU equal to 1. The ADC will request an interrupt on the completion of a conversion sequence if the arm bit ASCIE is set. ASCIF is the ATD Sequence Complete Interrupt Flag. If ASCIE = 1, the ASCIF flag equals the SCF flag; otherwise ASCIF reads zero. Write operations to ATD0CTL2 have no effect on ASCIF. ETRIGE is the External Trigger Mode Enable bit. This bit enables an external trigger using the digital input from Port AD bit 7. The external trigger allows us to synchronize the ATD conversion with external events. If external triggering is enabled, then the type of trigger is defined in the ETRIGLE and ETRIGP bits as specified in Table 11.4.
Table 11.4 External trigger modes for the 9S12 ADC.
ETRIGLE  ETRIGP  External trigger mode
0        0       Falling edge of PAD07 starts a conversion sequence
0        1       Rising edge of PAD07 starts a conversion sequence
1        0       Perform ADC conversions when PAD07 is low
1        1       Perform ADC conversions when PAD07 is high
The ATD0CTL3 and ATD0CTL4 registers contain bits that configure the ADC mode. The bits S8C, S4C, S2C, S1C control the number of conversions per sequence. Let n be the four-bit number specified by these bits. For values of n from 1 to 7, n specifies the sequence length. For values of n equal to 0 or 8 to 15, the sequence length is 8. At reset, the default sequence length is 4 (0100), maintaining software continuity with the HC12 family. This book will not discuss FIFO mode or freeze mode. SRES8 is the ADC Resolution Select bit. This bit selects the resolution of ADC conversion as either 8 bits (SRES8 = 1) or 10 bits (SRES8 = 0). The ADC has an accuracy of 10 bits; however, if low resolution is acceptable, selecting 8-bit resolution will reduce the conversion time, and may simplify software design. It takes about 10 μs to convert an analog signal into a digital number. The exact time to perform an ADC conversion is determined by the E clock and the ATD0CTL4 register. The ATD0CTL4 register selects the sample period and the PRS clock prescaler. SMP1, SMP0 are the Sample Time Select bits. These two bits select the length of the second phase of the sample time in units of ATD conversion clock periods, as listed in Table 11.5. The sample time consists of two phases. The first phase is two ATD conversion clock periods long and transfers the sample quickly (via the buffer amplifier) onto the ADC machine’s storage node. The second phase attaches the external analog signal directly to the storage node for final charging and high accuracy. Table 11.5 lists the lengths available for the second sample phase. Let m be the 5-bit number formed by bits PRS4-0. Let $f_E$ be the frequency of the E clock. The ATD conversion clock frequency is calculated as follows:
ATD clock frequency = $\tfrac{1}{2} f_E/(m + 1)$
The default (after reset) prescaler value is 5, which results in a default ATD conversion clock frequency that is the E clock divided by 12.
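The sequence-length rule described above can be written as a small decoding function. This sketch assumes that S8C, S4C, S2C, S1C occupy bits 6 to 3 of ATD0CTL3, which is consistent with Program 11.1 writing 0x08 to request one conversion; the function name `adc_sequence_length` is hypothetical.

```c
#include <stdint.h>

/* Number of conversions per sequence encoded in an ATD0CTL3 value.
   Assumes S8C,S4C,S2C,S1C are bits 6 to 3 of the register. */
uint8_t adc_sequence_length(uint8_t atdctl3){
  uint8_t n = (uint8_t)((atdctl3 >> 3) & 0x0F);
  if(n >= 1 && n <= 7){
    return n;        /* n = 1 to 7 is the length itself */
  }
  return 8;          /* n = 0 or 8 to 15 means eight conversions */
}
```

Under this assumption the reset pattern 0100 corresponds to a register value of 0x20 and decodes to a length of 4.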
The choice of these parameters involves a tradeoff between accuracy and speed. Freescale recommends the ADC clock frequency be restricted to the 500 kHz to 2 MHz range. For analog signals with white noise, we can essentially add an analog low pass filter by increasing the ADC sample time, s. To increase conversion speed, we wish to select a faster clock and shorter sample period. The last factor to consider is the slewing rate of the input signal. For signals with a high slope, dV/dt, we need to select a faster conversion time (i.e., shorter sample time). For a 4 MHz E clock, the possible m prescales range from 0 (ADCclock = 2 MHz) to 3 (ADCclock = 500 kHz). For an 8 MHz E clock, the possible m prescales range from 1 (ADCclock = 2 MHz) to 7 (ADCclock = 500 kHz). For a 24 MHz E clock, the possible m prescales range from 5 (ADCclock = 2 MHz) to 23 (ADCclock = 500 kHz). Other choices are not recommended. The time for one ADC conversion is equal to $2(m + 1)(s + n)/f_E$, where s is the total sample time (Table 11.5) and n is the number of ADC bits (e.g., 10).
Table 11.5 Sampling time for the 9S12 ADC.
SMP1  SMP0  First Sample Phase   Second Sample Phase   Total Sample Time, s
0     0     2 ADC clock periods  2 ADC clock periods   4 ADC clock periods
0     1     2 ADC clock periods  4 ADC clock periods   6 ADC clock periods
1     0     2 ADC clock periods  8 ADC clock periods   10 ADC clock periods
1     1     2 ADC clock periods  16 ADC clock periods  18 ADC clock periods
Observation: The ADC frequency does not determine the data acquisition sampling rate, rather it determines how fast one ADC conversion occurs. The sampling rate is determined by how often the software starts the ADC conversion.
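The clock and conversion-time formulas can be evaluated with a few helper functions. The function names are assumptions for this sketch, which is meant for checking a design on a desktop compiler rather than for running on the 9S12.

```c
#include <stdint.h>

/* ATD conversion clock in Hz for E clock fE (Hz) and prescale m (PRS4-0). */
uint32_t adc_clock_hz(uint32_t fE, uint8_t m){
  return fE / (2UL * (m + 1UL));
}

/* Time for one conversion in ns: 2(m+1)(s+n)/fE, where s is the total
   sample time in ADC clocks and n is the number of ADC bits. */
uint32_t adc_conversion_ns(uint32_t fE, uint8_t m, uint8_t s, uint8_t n){
  uint32_t e_cycles = 2UL * (m + 1UL) * ((uint32_t)s + n);
  return (uint32_t)((e_cycles * 1000000000ULL) / fE);
}

/* Returns 1 if prescale m keeps the ADC clock within 500 kHz to 2 MHz. */
int adc_prescale_ok(uint32_t fE, uint8_t m){
  uint32_t f = adc_clock_hz(fE, m);
  return (f >= 500000UL) && (f <= 2000000UL);
}
```

With a 24 MHz E clock, m = 5, s = 4, and n = 10, the conversion takes 168 E clock cycles, or 7000 ns.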
Writing to the ATD0CTL5 register will start an ADC conversion. To begin continuous conversions, we write to ATD0CTL5 with SCAN = 1. On the other hand, if we write to ATD0CTL5 with SCAN = 0, only one sequence occurs. CC, CB, CA select the analog input channel(s) whose signals are sampled and converted to digital numbers. Because the result registers (16 bits) are wider than the ADC digital code (8 or 10 bits), we must choose where in the result register to put the digital code. DJM is the Result Register Data Justification bit, where 1 means right-justified and 0 means left-justified data in the result registers. DSGN selects between signed and unsigned format. We set DSGN to 1 for signed data representation and we set it to 0 for unsigned data representation. Table 11.6 describes the four possible 10-bit data formats for the 9S12. When MULT is 0, the ATD sequence controller samples only from the specified analog input channel for an entire conversion sequence. When MULT is 1, the ATD sequence controller samples a sequence of different analog input channels. The number of channels sampled is determined by the sequence length value (S8C, S4C, S2C, and S1C). The first analog channel examined is determined by the channel selection code (CC, CB, and CA control bits); subsequent channels sampled in the sequence are determined by incrementing the channel selection code. The status register contains a status bit, SCF, which we use to poll for the ADC conversion completion. The SCF flag is cleared by writing data into ATD0CTL5 (i.e., starting a new conversion). The SCF flag can also be cleared by writing a 1 to it. The CC2, CC1, CC0 bits are the sequence counter as the ADC steps through a conversion sequence. The CCFn bits are individual flags for each of the conversions.
Performance Tip: If we are interested in a single conversion, we should initialize the ATD0CTL3 register to perform just one conversion.
Common Error: The ADC data register ATD0DR0 does not necessarily contain the result from ADC channel 0. The ADC data register ATD0DR0 contains the result of the first conversion, the ATD0DR1 contains the result of the second conversion, etc.
Observation: Voltages above 5 V or below 0 V will damage the ADC pin on a 9S12.
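To make the Common Error concrete, the following sketch lists which analog channel ends up in each result register for a given starting channel, sequence length, and MULT setting. The function name is hypothetical, and wrapping from channel 7 back to channel 0 is an assumption of this sketch that should be confirmed against the data sheet.

```c
#include <stdint.h>

/* Fill channels[i] with the analog channel whose result lands in ATD0DRi.
   start  = channel selection code CC,CB,CA (0 to 7)
   length = sequence length (1 to 8)
   mult   = MULT bit (0 = same channel each time, 1 = incrementing channels)
   Wrapping from channel 7 back to 0 is an assumption of this sketch. */
void adc_result_channels(uint8_t start, uint8_t length, uint8_t mult,
                         uint8_t channels[8]){
  uint8_t i;
  for(i = 0; i < length && i < 8; i++){
    channels[i] = mult ? (uint8_t)((start + i) & 0x07) : start;
  }
}
```

For example, starting at channel 2 with a length of 4 and MULT = 1 places channels 2, 3, 4, and 5 in ATD0DR0 through ATD0DR3, so ATD0DR0 holds channel 2, not channel 0.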
11.4.2 ADC Data Formats
The 9S12DP512 subroutine, ADC_In, will return a 10-bit value representing the analog input. Table 11.6 shows the 8-bit format and the four 10-bit formats available on 9S12.
Table 11.6 Binary formats used by the Freescale internal ADCs.
For the 9S12 running with 8-bit precision, the analog input range is 0 to 5 V and the analog input resolution is 5 V/256, which is about 20 mV. For the 9S12 running with 10-bit precision, the analog input range is 0 to 5 V and the analog input resolution is 5 V/1024, which is about 5 mV. When the 9S12 inserts 8- or 10-bit data into the 16-bit result register it will pad the extra bits with zeros, but the signed right-justified column in Table 11.6 is shown with bits 15 to 10 having been sign-extended from bit 9. The code to perform this sign extension is
     ldd  ATD0DR0   ; 10-bit
     bita #$02      ; test bit 9
     beq  ok        ; skip if positive
     oraa #$FC      ; sign extend, set bits 15 to 10
ok   std  Result

Result = ATD0DR0;     // 10-bit
if(Result&0x0200){    // test sign bit (bit 9)
  Result |= 0xFC00;   // sign extend, set bits 15 to 10
}
Example 11.2 Design a device driver for the internal ADC.
Solution In this simple solution, we will design two functions: one to initialize the ADC and one to sample a channel. The ADC_In function will perform one conversion and then return the 10-bit result, see Program 11.1. In this initialization routine, bit 7 of ATD0CTL2 is set to enable the ADC. In order to sample the ADC once, we set S8C, S4C, S2C, S1C to 0001. In this solution, we choose the fastest possible rate for the ADC clock (a period of 0.5 μs). For an 8 MHz E clock, we would set m equal to 1 to get a 2 MHz ADCclock. However, the solution assumes a 24 MHz E clock, and thus, we set m equal to 5 to get an ADCclock of 2 MHz. The solution chooses the shortest possible sampling time (s = 4), which is appropriate for situations of low noise. For noisier situations we can slow down the ADCclock and increase the sampling time. The time to make one ADC conversion is $2(m + 1)(s + n)/f_E = 2(5 + 1)(4 + 10)/24\ \text{MHz} = 7\ \mu\text{s}$.
; 9S12DP512, assuming a 24MHz E clock
; bit 7 DJM  1=right , 0=left justified
; bit 6 DSGN 1=signed, 0=unsigned
; bit 5 SCAN 0=single sequence
; bit 4 MULT 0=single channel
; bits 2-0 channel number 0 to 7
; Analog signal connected to PAD7-0
ADC_Init movb #$80,ATD0CTL2 ;power up
         movb #$08,ATD0CTL3 ;1 sample
         movb #$05,ATD0CTL4 ;10-bit
         rts
;In: RegA has channel Number
;RegA=$82 means right-justified channel 2
;Out: RegD has 10-bit ADC result
ADC_In   staa ATD0CTL5      ;Start ADC
Loop     brclr ATD0STAT1,$01,Loop
         ldd  ATD0DR0       ;10-bit result
         rts

// 9S12DP512, assuming a 24MHz E clock
// bit 7 DJM  1=right , 0=left justified
// bit 6 DSGN 1=signed, 0=unsigned
// bit 5 SCAN 0=single sequence
// bit 4 MULT 0=single channel
// bits 2-0 channel number 0 to 7
// Analog signal connected to PAD7-0
void ADC_Init(void){
  ATD0CTL2 = 0x80; // enable ADC
  ATD0CTL3 = 0x08; // 1 sample
  ATD0CTL4 = 0x05; // 10-bit, 2 MHz ADC clock
}
// chan=0x82 means right-justified channel 2
unsigned short ADC_In(unsigned char chan){
  ATD0CTL5 = chan;                // start
  while((ATD0STAT1&0x01)==0){};   // wait for CCF0
  return ATD0DR0;                 // 10-bit result
}
Program 11.1 Assembly and C software to sample data using the ADC.
Observation: To make an ADC driver for the 9S12C32, simply change all the ATD0 to ATD in Program 11.1. Checkpoint 11.4: Derive an equation for the digital value returned by the function ADC_In versus the analog input voltage.
11.4.3 ADC Resolution
The ADC resolution is the smallest change in input that can be reliably detected by the system. Figure 11.11 illustrates how ADC resolution should be measured. Because of the noise processes, if we set the ADC input to Vin and sample it 1000 times, we will get a distribution of digital outputs. We plot the number of times we got an output as a function of the output sample. The shape of this response is called a probability density function (pdf) characterizing the noise processes. For example, white noise has a Gaussian pdf. The standard deviation of repeated measurements (with units of volts) is a simple measure of ADC resolution (in volts). A better measure of resolution would be to repeat the 1000 measurements with an input slightly larger, Vin + ΔV. If we can demonstrate that the second data set is statistically different from the first (regardless of Vin), we claim the resolution is less than or equal to ΔV. For the 10-bit ADC on the 9S12, we have to increase the input by 5 mV to always be able to statistically recognize the change. Therefore we claim the ADC has a resolution of 5 mV.
Figure 11.11 Experimental determination of ADC resolution. (Plot for the 10-bit ADC of a 9S12: pdf, on a logarithmic axis from 1 to 1000, versus ADC output, from 504 to 512, for inputs from 2.500 V to 2.505 V in 1 mV steps.)
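The simpler of the two measures, the standard deviation of repeated samples expressed in volts, can be sketched as follows. The function names are assumptions, the sample standard deviation (dividing by n − 1) is one reasonable choice among several, and a small Newton's-method square root is used so the sketch does not depend on the math library.

```c
/* Newton's method square root, used to avoid depending on math.h. */
static double sqrt_newton(double x){
  double r;
  int i;
  if(x <= 0.0) return 0.0;
  r = (x > 1.0) ? x : 1.0;        /* starting guess at or above the root */
  for(i = 0; i < 60; i++){
    r = 0.5 * (r + x / r);
  }
  return r;
}

/* Standard deviation, in volts, of n repeated ADC samples taken with a
   fixed input. range_volts is the ADC range and bits is its precision. */
double adc_noise_volts(const unsigned short *samples, int n,
                       double range_volts, int bits){
  double sum = 0.0, sumsq = 0.0, mean, d;
  int i;
  if(n < 2) return 0.0;
  for(i = 0; i < n; i++) sum += samples[i];
  mean = sum / n;
  for(i = 0; i < n; i++){
    d = samples[i] - mean;
    sumsq += d * d;
  }
  /* standard deviation in ADC codes, then scaled to volts per code */
  return sqrt_newton(sumsq / (n - 1)) * range_volts / (double)(1L << bits);
}
```

In practice the array would hold the 1000 samples described above; the statistical comparison of two data sets is a separate step not shown here.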
11.5
*Multiple Access Circular Queues A multiple access circular queue (MACQ) is used for performing digital signal processing on a data acquisition system. A MACQ is a fixed length order preserving data structure, as shown in Figure 11.12. The source process (producer) places information into the MACQ. Once initialized, the MACQ is always full. The oldest data is discarded when the newest data is entered into a MACQ. The sink process (consumer) can read any of the data from the MACQ. Reading the data in the MACQ is non-destructive. This means that the MACQ is not changed by the read operation.
Figure 11.12 A multiple access circular queue stores the most recent set of measurements.
(Figure: before the update the MACQ holds v[0], v[1], v[2], and v[3]; after the update the new sample is in v[0], the other values have each moved one position, and the oldest value is lost.)
The MACQ is useful for implementing digital filters and digital controllers. The following equation illustrates a robust way to calculate the first derivative of a measured signal. Let v[0], v[1], v[2], and v[3] be the most recent data sampled at a fixed time period Δt. If each v has the units of mV, and Δt has the units of msec, then the derivative, d, will be in mV/msec.
$$d = \frac{v[0] + 3v[1] - 3v[2] - v[3]}{6\,\Delta t}$$
To measure the derivative, the following sequence of operations is executed every 1 ms:
v[3] = v[2]
v[2] = v[1]
v[1] = v[0]
v[0] = new voltage measurement (in mV)
d = (v[0] + 3*v[1] − 3*v[2] − v[3])/6
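The shift-and-calculate sequence can be packaged as two small functions. This is a sketch under the assumption that samples are 16-bit values in mV; the names `macq_put` and `macq_derivative` are hypothetical.

```c
#include <stdint.h>

static int16_t v[4];   /* v[0] is the newest sample, v[3] the oldest (mV) */

/* Enter a new measurement; the oldest value is discarded. */
void macq_put(int16_t new_mV){
  v[3] = v[2];
  v[2] = v[1];
  v[1] = v[0];
  v[0] = new_mV;
}

/* Derivative estimate (v[0]+3v[1]-3v[2]-v[3])/6 in mV per sample period.
   Reading the MACQ does not change it. */
int16_t macq_derivative(void){
  int32_t sum = (int32_t)v[0] + 3L * v[1] - 3L * v[2] - v[3];
  return (int16_t)(sum / 6);
}
```

Entering 50, 60, 70, and 80 mV in that order gives (80 + 210 − 180 − 50)/6 = 10 mV per sample period.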
Checkpoint 11.5: Write an equation to calculate acceleration using inputs v[0] v[1] and v[2]. Checkpoint 11.6: Write a more robust equation to calculate acceleration using inputs d[0] d[1] d[2] and d[3], where d is calculated using the above equation.
11.6
Real-Time Data Acquisition Whenever we wish to convert a continuous analog signal into discrete time digital samples, the rate at which the sampling process occurs is extremely important. Nyquist Theorem: If fmax is the largest frequency component of the analog signal, then you must sample at more than twice fmax in order to faithfully represent the signal in the digital samples. For example, if the analog signal is $A + B\sin(2\pi f t + \phi)$ and the sampling rate is greater than 2f, you will be able to determine A, B, f, and φ from the digital samples.
The goal of a data acquisition system is to sample the ADC at a regular rate. Let $f_s$ be the desired sampling rate, and let $t_i$ be the actual time the ADC creates sample number i. In a perfect world, we would like to have $t_i - t_{i-1} = 1/f_s$ for all i. When using periodic interrupts to establish the sampling rate (Programs 9.2, 9.3, and 9.4), there are two factors that lead to fluctuations in the sample period. We define time jitter, δt, as the maximum variation in the sample-to-sample time.
$$1/f_s - \delta t < t_i - t_{i-1} < 1/f_s + \delta t$$
We learned in Section 9.2 that it takes 9 cycles to process an interrupt (vector fetch, push registers). These 9 cycles, plus the execution of the ISR itself, are equal for every sample. Thus, the time between samples is not affected by this fixed delay. The first factor that does cause jitter is the instruction currently being executed at the time of the interrupt request. The time to execute an instruction on the 9S12 varies from 1 to 13 cycles. We can neglect the rev, revw, and wav instructions, because these three instructions can be suspended by an interrupt. We do not know which instruction will be executing or when during that instruction the interrupt will be requested. This uncertainty causes a maximum time jitter of at most 13 cycles, or 1.625 μs on an 8 MHz 9S12. This jitter is usually acceptable. The second source of jitter can be much larger. If there are any portions of the main program that disable interrupts (e.g., because of a critical section), then the time running with interrupts disabled will cause time jitter in the sampling. In a similar fashion, if there are other interrupts, then the time to execute the other ISR may cause a time jitter.
Observation: Good software places as little processing as possible in the ISR itself. Perform whatever functions must be done in the ISR, and shift the rest of the processing to the foreground.
Observation: Real-time systems must put an upper bound on the time the software is allowed to run with interrupts disabled.
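The jitter definitions can be explored with two helper functions, one that converts a worst-case cycle count into time and one that measures jitter from a list of recorded sample times. Both function names are assumptions for this sketch, and the timestamps would have to come from a timer or logic analyzer.

```c
#include <stdint.h>

/* Worst-case jitter in ns caused by finishing the current instruction:
   max_cycles bus cycles at an E clock of e_clock_hz. */
uint32_t instruction_jitter_ns(uint32_t max_cycles, uint32_t e_clock_hz){
  return (uint32_t)((max_cycles * 1000000000ULL) / e_clock_hz);
}

/* Largest deviation of any sample-to-sample interval from the ideal
   period. t[] holds 'count' sample times in the same units as period. */
uint32_t measured_jitter(const uint32_t *t, int count, uint32_t period){
  uint32_t worst = 0;
  int i;
  for(i = 1; i < count; i++){
    uint32_t dt = t[i] - t[i-1];
    uint32_t err = (dt > period) ? (dt - period) : (period - dt);
    if(err > worst) worst = err;
  }
  return worst;
}
```

For the 8 MHz case in the text, 13 cycles corresponds to 1625 ns.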
Checkpoint 11.7: Assuming the 9S12 is running at 24 MHz, what is the time jitter caused by the uncertainty in which instruction is being executed at the time of the interrupt?
Example 11.3 Design a real-time data acquisition system that samples a voltage signal, performs a digital differentiation, and displays the slope on an LCD. The range of inputs is 0 to 5 V with frequency components of 0 to 50 Hz.
Solution We will connect the input to PAD2 (but any ADC input could have been used). According to the Nyquist Theorem we need to sample at 100 Hz or faster. The time to output the result on the LCD is about 0.4 ms (using Program 8.7, 10 characters at 40 μs each), so the reasonable choices of sampling rates are 100 to 2000 Hz. This solution implements 100 Hz sampling using a periodic output compare interrupt, like Program 9.4. The main program will initialize the PLL, OC6 and ADC. The main loop of the foreground thread will wait for a sample, calculate the derivative, then output the slope, see Figure 11.13 and Program 11.2. The multiple access circular queue will contain the most recent four voltage samples. A flag will be used to synchronize data transfer from background to foreground. This sample mechanism can be used whenever the main program is guaranteed to finish every time within the 10 ms interval, before the next sample arrives. The first in first out queue, which will be presented in the next chapter, can be used in situations where the main program is only guaranteed on average to be finished. In this case, the periodic interrupt will trigger the ADC, calculate Voltage as a decimal fixed-point number (units 0.001 V, or mV) and send the data to the foreground. The system implements a heartbeat by toggling PT6 during each execution of the ISR. The PT6 output is not required for the correct operation of the system, but it is useful to see the program is running and the interrupt rate is 100 Hz. The function LCD_OutDec will display a signed integer on the LCD (not explicitly given in this book, left as an exercise to the reader). Although the original sample is unsigned (0 to 1023), and the voltage will be unsigned (0 to 5000), the calculations are performed in signed math because the slope may be negative. Since the units of voltage are mV and the sampling occurs every 10 ms, the units of slope will be 0.1 mV/ms, which is 0.1 V/sec.
For example, let the voltage slope be 1 V/s; a typical set of four voltage measurements might be 50, 60, 70, and 80 mV. The calculated slope would be (80 + 3*(70 − 60) − 50)/6 = 10, which means 1.0 V/sec.
Figure 11.13 Flowchart for a real-time data acquisition system. (Flowchart: main calls OC6_Init and ADC_Init, then repeatedly waits for Flag = 1, calculates Slope = dVoltage/dt, sets Flag = 0, and calls LCD_Out(Slope). The OC6 interrupt, every 10 ms, executes PTT ^= 0x40, Sample = ADC_In, Voltage = 625*sample/128, Flag = 1, and TC6 = TC6+15000, then acknowledges the interrupt.)
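The scaling step in the flowchart, Voltage = 625*sample/128, follows from 5000 mV/1024 reducing to 625/128. A hedged sketch of that conversion is given below; the function name `adc_to_mV` is an assumption. The intermediate product is computed in 32 bits because 625*1023 = 639375 does not fit in a 16-bit integer.

```c
#include <stdint.h>

/* Convert a 10-bit ADC sample (0 to 1023) to mV for a 0 to 5 V range.
   The multiply is done in 32 bits; only the quotient fits in 16 bits. */
int16_t adc_to_mV(uint16_t sample){
  return (int16_t)((625L * (int32_t)sample) / 128);
}
```

A full-scale sample of 1023 converts to 4995 mV, and a mid-scale sample of 512 converts to 2500 mV.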
; running at 24 MHz
        org  $0800      ;($3800 if C32)
V       rmb  8          ;last four samples
Voltage rmb  2          ;current sample
Flag    rmb  1          ;true if new data
Slope   rmb  2          ;derivative
        org  $4000
main    lds  #$4000
        jsr  PLL_Init   ;Program 4.4
        jsr  ADC_Init   ;Program 11.1
        jsr  OC6_Init
loop    ldaa Flag
        beq  loop       ;wait for data
        movw V+4,V+6    ;shift MACQ
        movw V+2,V+4
        movw V,V+2
        movw Voltage,V
        clr  Flag       ;done with Voltage
        ldd  V+2        ;V[1]
        subd V+4        ;V[1]-V[2]
        pshd
        asld            ;2*(V[1]-V[2])
        addd 2,SP+      ;3*(V[1]-V[2])
        addd V          ;V[0]+3V[1]-3V[2]
        subd V+6        ;V[0]+3V[1]-3V[2]-V[3]
        ldx  #6
        idivs
        stx  Slope      ;0.1V/sec
        jsr  LCD_OutDec ;to be written
        bra  loop
OC6_Init
        bset DDRT,#$40
        movb #$80,TSCR1 ;enable TCNT
        movb #$04,TSCR2 ;24MHz/16
        bset TIOS,#$40  ;activate OC6
        bset TIE,#$40   ;arm OC6
        clr  Flag
        ldd  TCNT       ;time now
        addd #50        ;first in 50 counts (about 33us)
        std  TC6
        movb #$40,TFLG1 ;clear C6F
        cli             ;enable IRQ
        rts
OC6Han  ldaa PTT
        eora #$40       ;heartbeat
        staa PTT
        ldaa #$82
        jsr  ADC_In     ;10 bit sample
        ldy  #625
        emuls
        ldx  #128
        edivs
        sty  Voltage    ;0 to 5000
        movb #1,Flag    ;signal
        ldd  TC6
        addd #15000
        std  TC6        ;next in 10 ms
        movb #$40,TFLG1 ;acknowledge
        rti
        org  $FFE2
        fdb  OC6Han     ;vector

// running at 24 MHz
short V[4];              // last four samples
volatile short Voltage;  // current sample, 0.001V, written by the ISR
volatile char Flag;      // true if new data, shared with the ISR
short Slope;             // derivative, 0.1V/sec
void OC6_Init(void);     // defined below
void main(void){
  PLL_Init();            // Program 4.4
  ADC_Init();            // Program 11.1
  OC6_Init();
  while(1){
    while(Flag == 0){};  // wait for data
    V[3] = V[2];         // shift MACQ
    V[2] = V[1];
    V[1] = V[0];
    V[0] = Voltage;
    Flag = 0;            // done with Voltage
    Slope = (V[0]+3*V[1]-3*V[2]-V[3])/6;
    LCD_OutDec(Slope);   // 0.1V/sec
  }
}
void OC6_Init(void){
  asm sei                // Make atomic
  DDRT |= 0x40;
  TSCR1 = 0x80;          // 1.5 MHz TCNT
  TSCR2 = 0x04;          // divide by 16
  TIOS |= 0x40;          // activate OC6
  TIE |= 0x40;           // arm OC6
  Flag = 0;              // no data
  TC6 = TCNT+50;         // first in 50 counts (about 33us)
  TFLG1 = 0x40;          // clear C6F
  asm cli                // enable IRQ
}
interrupt 14 void OC6handler(void){
  short sample;
  PTT ^= 0x40;           // heartbeat
  sample = (short)ADC_In(0x82);
  Voltage = (short)((625L*sample)/128); // 32-bit product avoids 16-bit overflow
  Flag = 1;              // new data ready
  TC6 = TC6+15000;       // next in 10 ms
  TFLG1 = 0x40;          // acknowledge C6F
}
Program 11.2 Implementation of a periodic interrupt using output compare.
Checkpoint 11.8: Estimate the time jitter of the assembly version of the real-time system in Program 11.2.
Example 11.4 Design an analog interface between a 30 kΩ thermistor and the 9S12, so that temperatures from 20 to 45°C can be measured.
Solution The software components of this system are left as laboratory exercises. A thermistor is a transducer, with its resistance being a function of temperature. A 30 kΩ thermistor has a resistance of 30 kΩ at room temperature (25°C). The basic approach is to use a bridge to convert the thermistor resistance into a voltage difference (V1 − V2), see Figure 11.14. A rail-to-rail instrumentation amplifier (AD623) converts the voltage difference into the 0 to 5 V range of the ADC. Rail-to-rail means the output can swing from one power supply rail (0 V) to the other (5 V). A rail-to-rail analog amplifier has three wonderful advantages over analog circuits that use the usual +12 V and −12 V supplies. First, with rail-to-rail analog chips, the entire embedded system can operate on a single supply. Second, these rail-to-rail devices are low power, and hence the system can be run on batteries. Third, the analog output is guaranteed to exist in the 0 to 5 V range; hence broken devices, unplugged transducers, or damaged cables will not destroy the ADC pin of the 9S12. One way to design a data acquisition system is to create a design table, as shown in Table 11.7. The table is presented with the columns from left to right as the signal traverses the system. However, during the design phase, we begin with the left-most column (expected input) and the right-most column (desired outputs) and work our way into the center. We calibrate the thermistor using an ohmmeter, a reference thermometer, and a water bath. This calibration gives us a thermistor resistance (RT) for various temperatures throughout the measurement range. The nonlinear resistance versus temperature response can be modeled with
$$R_T = R_0 e^{\beta/T}$$
where R0 and β are calibration coefficients and T is temperature in Kelvin.
The 10-bit ADC of the 9S12 will be used, so we can work backwards in the table (from right-most to second right-most column) using a linear mapping of the output T (in 0.01°C) into the digital sample expected from the ADC. Similarly, knowing how the ADC operates, we can calculate the expected V3 input (in V) that gives the corresponding ADC result. The bridge will be used to convert resistance to voltage. There are three design parameters for the bridge. The first is the bridge input voltage. One could use the 5 V supply as the input, but a less-noisy solution is to use an analog reference voltage of 2.50 V. This highly stable, low-noise voltage is generated by the REF03 chip (this reference is not used to power devices, but used as an analog constant). The second parameter is R1, which is chosen large enough to prevent the bridge from heating the thermistor, causing an error. An R1 value of 200 kΩ creates a power dissipation of about 0.005 mW in the thermistor. Given a dissipation constant of 2.5 mW/°C, the self-heating error will be about 0.002°C (insignificant). The resistor R2 sets the maximum temperature of the system. The value of 10 kΩ is chosen slightly smaller than the thermistor resistance at 45°C. Given the bridge input, R1 and R2, the design parameters V1 and V1 − V2 can be calculated using Ohm’s Law. The last step is to select the gain of the instrumentation amplifier so a V1 − V2 of 0.339 V is converted to a V3 of 5.0 V. The system needs a gain of about 14.7 (5/0.339). The AD623 and AD627 are low-power single supply rail-to-rail instrumentation amps,
V3 = Gain*(V1 − V2) + Vpin5
Gain = 1 + (100 kΩ/RG) (for the AD623)
Gain = 5 + (200 kΩ/RG) (for the AD627)
V3 = Vpin5 when V1 equals V2
For this system, we set pin 5 to ground, and select RG = 100 kΩ/(14.7 − 1) = 7.3 kΩ. If we choose a smaller gain, then the minimum temperature will decrease. After we build the circuit, we recalibrate the system, creating a table like Table 11.7, but filling in actual measured values. The last two columns of the measured response will be used by the software to convert the ADC sample into a fixed-point value to be displayed.
Figure 11.14 Analog interface between a thermistor and the ADC input of a 9S12. (Schematic: a REF03 powered from +5 V supplies +2.50 V to a bridge made of two 200 kΩ R1 resistors, the thermistor RT, and the 10 kΩ R2; the bridge outputs V1 and V2 feed an AD623 with RG = 7.3 kΩ, whose output V3 drives the 9S12 ADC; the temperature T is shown on the LCD, e.g., 25.12C.)
Table 11.7 Thermistor design table.
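The amplifier calculations in this example can be reproduced with a few one-line functions. The function names are assumptions made for this sketch; the formulas are the AD623 gain equation and the power-over-dissipation-constant rule quoted in the text.

```c
/* Gain needed so a bridge difference of dv_max volts maps to v3_max volts. */
double required_gain(double v3_max, double dv_max){
  return v3_max / dv_max;
}

/* Gain resistor in ohms for the AD623, from Gain = 1 + 100 kOhm/RG. */
double ad623_rg_ohms(double gain){
  return 100000.0 / (gain - 1.0);
}

/* Self-heating error in degrees C:
   power (mW) divided by dissipation constant (mW per degree C). */
double self_heating_error(double power_mW, double dissipation_mW_per_C){
  return power_mW / dissipation_mW_per_C;
}
```

Using 5.0 V and 0.339 V gives a gain of about 14.7 and a gain resistor of about 7.3 kΩ, and 0.005 mW with a 2.5 mW/°C dissipation constant gives a 0.002°C error.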
11.7
*Control Systems This book is an introduction to embedded systems, and thus control theory is beyond its scope. However, the simple example presented in this section knits together many of the topics of this chapter and provides the framework showing how more complex control systems might be implemented with a microcontroller. A control system is a collection of mechanical and electrical devices connected for the purpose of commanding, directing, or regulating a physical plant, as shown in Figure 11.15. An example physical plant is a DC motor. The real state variables are the properties of the physical plant that are to be controlled (e.g., motor speed in RPM). The sensor and state estimator comprise a data acquisition system. A tachometer is an example of a sensor, and its state estimator would be the associated hardware (Figure 9.7) and software (Program 9.6.) The goal of this data acquisition system is to estimate the state variables (e.g., measure speed). A closed loop
control system uses the output of the state estimator in a feedback loop to drive the errors to zero. The control system compares the estimated state variables, X’(t), to the desired state variables, X*(t), in order to decide the appropriate action, U(t). The actuator is a transducer that converts the control system commands, U(t), into driving forces, V(t), that are applied to the physical plant. An example actuator is the hardware and software associated with PWM (Figure 8.13 and Program 8.9).
Figure 11.15 A control system employs closed-loop negative feedback.
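[Figure 11.15 block diagram: the desired value X* and the estimated value X’ are subtracted to form the error e = X* - X’; the incremental controller produces U; the actuator (PWM circuit) applies V to the plant (DC motor, modeled with R, L, and emf); the sensor (tachometer) and state estimator (period measurement) convert the actual speed into the measured speed.]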
In general, the goal of the control system is to drive the real state variables to equal the desired state variables. In actuality though, the controller attempts to drive the estimated state variables to equal the desired state variables. It is important to have an accurate state estimator, because any differences between the estimated state variables and the real state variables will translate directly into controller errors. If we define the error as the difference between the desired and estimated state variables,
e(t) = X*(t) - X’(t)
then the control system will attempt to drive e(t) to zero. We evaluate the effectiveness of a control system by determining three properties: steady state controller error, transient response, and stability. The steady state controller error is the average value of e(t). The transient response is how long the system takes to reach 99% of the final output after X* is changed. A system is stable if steady state (smooth constant output) is achieved. An unstable system oscillates or saturates. In general control theory, X(t), X’(t), X*(t), U(t), V(t) and e(t) refer to multidimensional vectors (e.g., the speeds of multiple motors, speed and acceleration, (yaw, pitch, roll), or (x, y, z, ...)), but the simple examples in this book control only a single parameter.
Observation: Many control systems operate well when the control equations are executed about 10 times faster than the step response time of the physical plant.
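The first two properties defined above can be estimated from a log of sampled outputs. The following is a minimal C sketch under stated assumptions: the samples are integers taken at a fixed rate, the response rises from a non-negative starting value, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Steady state controller error: the average of e = desired - estimated
   over n samples logged after the output has settled. */
double steady_state_error(int desired, const int *estimated, size_t n) {
  assert(n > 0);
  long sum = 0;
  for (size_t i = 0; i < n; i++) {
    sum += (long)desired - (long)estimated[i];
  }
  return (double)sum / (double)n;
}

/* Transient response: index of the first sample that reaches 99% of the
   final output. Returns n if that level is never reached.
   Multiply the index by the sample period to get a time. */
size_t response_index(const int *output, size_t n, int final_output) {
  for (size_t i = 0; i < n; i++) {
    if ((long)output[i] * 100 >= (long)final_output * 99) {
      return i;
    }
  }
  return n;
}
```

Stability is not computed here; in this simple view it is judged by whether the logged output settles to a smooth constant value rather than oscillating or saturating.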
Example 11.5 Design a control system that controls a DC motor in the range of 1000 to 2000 RPM.
Solution The objective of this system is to control the speed of a motor, using the design approach shown in Figure 11.15. One input will be the desired speed, stored as a decimal integer (RPM) in a global variable called Desired. For example, to spin the motor at 1500 RPM, we set Desired to 1500. The second input will be the measured speed, stored as a decimal integer (RPM) in a global variable called Measured. Figure 9.7 and Program 9.6 supply the measured Period with a resolution of 1 μs and a range of 50 μs to 65.535 ms. Assuming there are 32 stripes on the wheel, we can use the following equation to measure speed (in RPM):
Measured = 1,875,000/Period
The units of Period are μs/stripe. The 1,875,000 constant is derived from 1,000,000 μs/sec, 60 sec/min, and 1 rotation/32 stripes. Given motor speeds of 1000 to 2000 RPM, we can expect Period measurements from 1875 down to 938. For example, if the speed is 1500 RPM, this is 25 rotations per second. Because there are 32 stripes, the frequency of the tachometer will be 800 Hz, giving a period of 1250 μs. As you can see, the above equation will take the 1250 period measurement and calculate Measured as 1500 RPM. The speed resolution is the smallest change in speed we can reliably detect. We calculate the speed resolution at 1500 RPM as 1875000/1249 - 1875000/1250 = 1.2 RPM. The output of the controller is U, which is an integer from 0 (no power) to 250 (full power), representing the duty cycle of the PWM actuator. Figure 8.13 and Program 8.9 comprise the actuator for this system. In this case, the controller calls PWM_Duty0 when it wants to adjust power to the motor. An incremental control algorithm simply adds or subtracts a constant from U depending on the sign of the error, as shown in Figure 11.16. In other words, if the error is positive (too slow) then U is incremented, and if the error is negative (too fast) then U is decremented. It is important to choose the proper rate at which the incremental control software is executed. If it is executed too many times per second, then the actuator will saturate, resulting in a Bang-Bang system (always on or always off). If it is not executed often enough, then the system will not respond quickly to changes in the physical plant or changes in desired speed. In this incremental controller we add or subtract “1” from the actuator, but a value larger than “1” would have a faster response at the expense of introducing oscillations.
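The speed calculation above can be sketched in C as follows. This is an illustration of the arithmetic only, assuming a 16-bit Period in microseconds and 32 stripes per rotation as stated in the text; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 1,000,000 us/sec * 60 sec/min / 32 stripes/rotation = 1,875,000 */
#define RPM_NUMERATOR (60UL * 1000000UL / 32UL)

/* Convert a tachometer period (us per stripe) into speed in RPM.
   The result is clamped to 65535 so it always fits in 16 bits. */
uint16_t speed_rpm(uint16_t period_us) {
  assert(period_us != 0);
  uint32_t rpm = RPM_NUMERATOR / period_us;
  return (rpm > 65535UL) ? 65535u : (uint16_t)rpm;
}

/* Speed resolution at a given period: the change in RPM caused by a
   one-count (1 us) change in the measured period. */
double speed_resolution_rpm(uint16_t period_us) {
  assert(period_us > 1);
  return (double)RPM_NUMERATOR / (period_us - 1)
       - (double)RPM_NUMERATOR / period_us;
}
```

A period of 1250 gives 1500 RPM, matching the worked example, and the resolution function reproduces the 1.2 RPM figure at that operating point.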
Figure 11.16 Incremental control used to run a DC motor at a constant speed.
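[Figure 11.16 flowchart. Input capture interrupt: Period = TC1-First; First = TC1; TFLG1 = 0x02. Periodic interrupt: Measured = 1875000/Period; Error = Desired-Measured; if Error < 0 (too fast) and U > 0, then U = U-1; if Error > 0 (too slow) and U < 250, then U = U+1; if Error = 0, U is unchanged.]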
Program 11.2 uses a periodic interrupt so that the incremental controller runs in the background. The interrupt rate is selected to be about 10 times faster than the time constant of the physical plant. Even though the variables Desired and Measured are unsigned, the error calculation Error will be signed.
Program 11.2 Incremental position control software.
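As a rough illustration of the flowchart in Figure 11.16 (a sketch of the logic, not the book's Program 11.2), one pass of the incremental controller can be written as a pure C function. The function name and the U_MIN/U_MAX macros are hypothetical; in the real system the periodic interrupt would compute Measured, call a step like this, and pass the new U to PWM_Duty0.

```c
#include <stdint.h>

#define U_MIN 0     /* no power */
#define U_MAX 250   /* full power */

/* One pass of the incremental controller.
   desired and measured are speeds in RPM, u is the current PWM duty.
   Returns the new duty, changed by at most 1 and kept within 0..250. */
uint8_t incremental_step(uint16_t desired, uint16_t measured, uint8_t u) {
  /* Both inputs are unsigned, so widen before subtracting
     to obtain a correctly signed error. */
  int32_t error = (int32_t)desired - (int32_t)measured;
  if (error > 0 && u < U_MAX) {
    u++;            /* too slow: add power */
  } else if (error < 0 && u > U_MIN) {
    u--;            /* too fast: remove power */
  }
  return u;         /* error of zero, or already at a limit: no change */
}
```

Keeping the step free of I/O makes the saturation limits and the signed error calculation easy to test on a desktop compiler before moving the logic into the interrupt service routine.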
Checkpoint 11.9: In what way would the controller behave differently if we added/subtracted 10 instead of 1?
Checkpoint 11.10: What happens if the TC5 interrupt executes too frequently?
Observation: It is a good debugging strategy to observe the assembly listing generated by the compiler when performing calculations on variables of mixed types (signed/unsigned, char/short).
Observation: Incremental control will work moderately well (accurate and stable) for an extremely wide range of applications. Its only shortcoming is that the controller response time can be quite slow.
11.8 Tutorial 11 Analog Input Programming
The objective of this tutorial is to illustrate I/O programming of the LCD and ADC. An analog signal is sampled using the ADC, and the results are displayed on an LCD. As our software becomes more complex, we need alternative methods to visualize its complexity. The first part of this tutorial will be to visualize the software in different formats.
Action: Copy the Tutor11.rtf, Tutor11.uc, Tutor11.scp, and Tutor11.io files from the web onto your hard drive. Start a fresh copy of TExaS and open the Tutor11.rtf program file from within TExaS. This should open the corresponding microcomputer, scope and IO Device windows.
Question 11.1 Flowcharts are a convenient way to describe computer algorithms. Look at the Tutor11.rtf file and draw flowcharts of the Main, ADC_In, and LCD_OutChar programs.
Question 11.2 How does the synchronization technique used in ADC_In differ from LCD_OutChar?
Question 11.3 Call-graphs are used to visualize software hierarchy. Look at the Tutor11.rtf file and draw a call-graph of the software system. First, begin by defining the three software modules
and two hardware devices, as shown in Figure T11.1. Ovals are software modules and rectangles are I/O devices. If there were any global data structures, they would also be shown as rectangles. Next, for each situation where one function calls another, draw a call arrow from the function that performs the call to the function it calls. If one function calls a second function more than once, show only one arrow. Finally, for each I/O device access (read or write) draw an arrow from the function to the I/O device. Figure T11.1 The first step when drawing a call-graph is to list the functions and modules.
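[Figure T11.1: ovals for the software functions Main, ADC_Init, ADC_In, LCD_OutHex, LCD_Init, and LCD_OutChar; rectangles for the hardware devices ADC (analog to digital converter) and LCD display.]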
Question 11.4 A data flow graph illustrates the data as it flows from input to output. Draw a data flow graph of this system, starting with the input hardware (ADC), going through each software function that handles the data, leading to the output hardware (LCD). Again, ovals are software modules and rectangles are I/O devices.
Action: Assemble and run Tutor11.rtf. Observe the I/O device registers in the microcomputer window, the analog voltage versus time signal in the scope window, and the status of the external hardware in the IO Device window. The results of the ADC sampling will be displayed on the LCD.
Question 11.5 What is the TCNT clock period? In this system, the default value is not used.
Action: You will measure the time from ADC sample to ADC sample. First, remove all entries in the Microcomputer ViewBox window except TCNT. Next, add a breakpoint at the assembly line that starts the ADC (first instruction of the ADC_In function). Next, switch the breakpoint system from Break mode to Scan mode. In particular, execute Mode->BreakMode so that the check mark is absent. Run the program and observe the timing results in the TheLog.RTF window. You should get results like the following:
TCNT=1748
TCNT=2047
TCNT=2346
TCNT=2645
TCNT=2942
TCNT=3239
TCNT=3536
Question 11.6 Using the results from Question 11.5, what is the average time in between ADC samples? Ignore the first couple of samples. Give your results in cycles and in μsec.
Question 11.7 Calculate the approximate sample rate in samples/sec. This is not a very appropriate method to sample an ADC. We can achieve very accurate timing using a periodic interrupt to sample the ADC.
Question 11.8 If we still wish to display each ADC sample on the LCD, can we use interrupt synchronization to sample faster?
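The timing measurement in this tutorial amounts to differencing consecutive TCNT readings. A minimal C sketch of that bookkeeping is shown below, assuming a 16-bit free-running TCNT and at most one rollover between readings; the function names are hypothetical, and the TCNT clock period is left as a parameter because it depends on how the timer is configured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Average number of TCNT cycles between consecutive log entries.
   The cast to uint16_t makes the subtraction wrap correctly when
   TCNT rolls over from 65535 to 0 between two readings. */
double average_delta_cycles(const uint16_t *tcnt, size_t n) {
  assert(n >= 2);
  uint32_t sum = 0;
  for (size_t i = 1; i < n; i++) {
    sum += (uint16_t)(tcnt[i] - tcnt[i - 1]);
  }
  return (double)sum / (double)(n - 1);
}

/* Sample rate in samples/sec, given the average spacing in cycles
   and the TCNT clock period in microseconds. */
double sample_rate_hz(double avg_cycles, double tcnt_period_us) {
  assert(avg_cycles > 0.0 && tcnt_period_us > 0.0);
  return 1.0e6 / (avg_cycles * tcnt_period_us);
}
```

Passing only the later log entries to average_delta_cycles is one way to ignore the first couple of samples, as the question suggests.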
11.9 Homework Problems
Homework 11.1 The Maxim MAX549 is a 2-channel 8-bit DAC similar to the MAX550. Search the http://www.maxim-ic.com/ web site for a data sheet for the MAX549. Show the circuit diagram connecting the DAC chip to an SPI port. Develop DACinit and DACout functions similar to the MAX550
example in the chapter, except the DACout function includes a channel number in Register B. If Register B equals 0, then output the value in Register A to DAC channel A. If Register B equals 1, then output the value in Register A to DAC channel B.
Homework 11.2 The Maxim MAX539 is a 1-channel 12-bit DAC similar to the MAX550. Search the http://www.maxim-ic.com/ web site for a data sheet for the MAX539. Show the circuit diagram connecting the DAC chip to an SPI port. Develop DACinit and DACout functions similar to the MAX550 example in the chapter, except the DACout function takes a 12-bit number in Register D.
Homework 11.3 The Maxim MAX5235 is a 2-channel 12-bit DAC similar to the MAX550. Search the http://www.maxim-ic.com/ web site for a data sheet for the MAX5235. Show the circuit diagram connecting the DAC chip to an SPI port. Develop DACinit and DACout functions similar to the MAX550 example in the chapter, except the DACout function includes a channel number in Register X. If Register X equals 0, then output the 12-bit value in Register D to DAC channel A. If Register X equals 1, then output the 12-bit value in Register D to DAC channel B.
Homework 11.4 Assume you have the MAX550 interface of Figure 8.10 and the software of 8.3. Write a main program and an output compare interrupt service routine that creates a 100 Hz sine wave analog output. The DAC outputs occur in the ISR, and after the main program initializes the DAC and output compare, it is free to perform other unrelated tasks.
Homework 11.5 Consider the 8-bit R-2R resistor ladder shown in Figure Hw11.5. Assume Port T is an output and the digital output voltages from PTT are 0 or 5 V. Derive a relationship between the 8-bit digital number output to PTT and the current flowing in the resistor labeled Rout. Hint: if one output pin is high and the other pins are low, calculate the current flowing up from the pin through the 20 kΩ resistor.
Show that this current is the same value regardless of which pin is high (assuming the other pins are low). When a current comes up to a node (drawn with the black dot), it can go one way or another. Again assuming exactly one digital output is high, what happens to currents at each node? I.e., how much goes left and how much goes right? Solve for the basis elements of the 8-bit digital number. I.e., what is Iout if the digital number is 1, 2, 4, 8, 16, 32, 64, and 128? Given the responses for these basis elements, use the law of superposition to derive a general relationship.
Figure Hw11.5 8-bit R-2R resistor ladder.
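[Figure Hw11.5 schematic: 9S12 pins PT0 through PT7 each drive the ladder through a 20 kΩ resistor; the ladder nodes are joined by seven 10 kΩ series resistors, with one 20 kΩ terminating resistor; Iout flows through the 20 kΩ resistor labeled Rout.]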
Homework 11.6 Assume you have a 12-bit signed ADC. Let Vin be the analog voltage in volts and N be the digital ADC output. The input range is -5 ≤ Vin ≤ +5 V. The ADC digital output range is -2048 ≤ N ≤ +2047. First, write a linear equation that relates Vin as a function of N. Next, rewrite the equation in fixed-point math assuming Vin is represented as a decimal fixed-point number with Δ of 0.001 V.
Homework 11.7 Assume you have an 11-bit signed ADC. Let Vin be the analog voltage in volts and N be the digital ADC output. The input range is -10 ≤ Vin ≤ +10 V. The ADC digital output range is -1024 ≤ N ≤ +1023. First, write a linear equation that relates Vin as a function of N. Next, rewrite the equation in fixed-point math assuming Vin is represented as a decimal fixed-point number with Δ of 0.01 V.
Homework 11.8 Write an assembly language subroutine that samples ADC channel 2 four times, calculates the average of the four samples, and returns the result in Register A.
Homework 11.9 Write an assembly language subroutine that samples all 8 ADC channels, calculates the average of the eight samples, and returns the result in Register A.
Homework 11.10 Write an assembly language subroutine that samples all 8 ADC channels, calculates the minimum and maximum of the eight samples, and returns the range (maximum-minimum) in Register A.
Homework 11.11 Write an assembly language subroutine that samples ADC channels 0, 1, 2, calculates the median of the three samples, and returns the result in Register A.
Homework 11.12 Write an assembly language subroutine that samples the ADC and returns a voltage in Register D using decimal fixed point with Δ of 0.001 V.
Homework 11.13 Write an assembly language subroutine that samples the ADC and returns a voltage in Register D using binary fixed point with Δ of 2^-8 V.
Homework 11.14 Assume an AC waveform is connected to analog channel 0. Write an initialization ritual. Write a subroutine that samples the analog input 256 times, and returns the DC amplitude (average) in Register A, and the AC amplitude (maximum-minimum) in Register B.
Homework 11.15 Assume an analog input signal is connected to the ADC channel 2 on computer 1. Assume the transmit serial output of computer 1 is connected to the receive serial input of computer 2. A MAX550A is connected to computer 2, like the interface shown in the previous chapter. Write the dedicated software in both systems, so that the analog input is sampled by the first computer, transmitted via the serial link, and converted back to analog form by the DAC on the second computer.
Homework 11.16 An embedded system will use an ADC to measure a parameter. The measurement system range is 0.0 to 9.99 with a resolution of 0.01. What is the smallest number of ADC bits that can be used?
Homework 11.17 An embedded system will use an ADC to measure a distance. The measurement system range is -10 to +10 cm with a resolution of 0.01 cm. What is the smallest number of ADC bits that can be used?
Homework 11.18 An embedded system will use an ADC to measure a force. The measurement system range is 0 to 100 N with a resolution of 0.01 N. What is the smallest number of ADC bits that can be used?
Homework 11.19 An 8-bit ADC (different from the 9S12) has an input range of 0 to 2 volts and an output range of 0 to 255 (called straight binary).
What digital value will be returned when an input of 1.5 volts is sampled?
Homework 11.20 A 12-bit ADC (different from the 9S12) has an input range of -2.5 to +2.5 volts and an output range of 0 to 4095 (called offset binary). What digital value will be returned when an input of 1.25 volts is sampled?
Homework 11.21 A 16-bit ADC (different from the 9S12) has an input range of 0 to 2.5 volts and an output range of 0 to 65535 (called straight binary). What digital value will be returned when an input of 0.625 volts is sampled?
Homework 11.22 Assume the ADC sequence length is 3, ATDCTL3 equals $18, and $95 is written into ATDCTL5. Which of the following happens?
a) Channel 5 is sampled and the result is placed in ATDDR0
b) Channel 5 is sampled and the result is placed in ATDDR5
c) Channel 5 is sampled three times and the results are placed in ATDDR0-ATDDR2
d) Channel 5 is sampled three times and the results are placed in ATDDR5-ATDDR7
e) Channels 5,6,7 are sampled and the results are placed in ATDDR0-ATDDR2
f) Channels 5,6,7 are sampled and the results are placed in ATDDR5-ATDDR7
Homework 11.23 The Maxim MAX1247 is a 4-channel 12-bit ADC with an SPI interface. Search the http://www.maxim-ic.com/ web site for a data sheet for the MAX1247. Show the circuit diagram connecting the ADC chip to an SPI port. Develop ADC_Init and ADC_In functions to initialize and sample the ADC. The ADC_In function takes the channel number (0-3) in RegA and returns the 12-bit ADC sample in RegD.
11.10 Laboratory Assignments
Lab 11.1 Voltmeter
Purpose: The purpose of this lab is to learn LCD interfacing, to use interrupts to perform real time sampling and to use fixed-point numbers to represent non-integer values.
Description: In this lab you will first develop an accurate way to establish 1 kHz sampling. In particular, the ADC should be started exactly every 1 ms. Second, you will convert the ADC sample into a decimal fixed-point number, with a Δ of 0.01 V. Lastly, you will develop a fixed-point display function, which will be used to display the sampled signal on the LCD.
a) Write initialization routines for the output compare interrupts, the ADC, and the LCD display.
b) Write a subroutine that converts an 8-bit binary ADC sample into unsigned fixed-point format. The input parameter to the subroutine will be passed in using Register D, and your subroutine will return the result in Register D. Table L11.1a shows some example results. Do not worry if your answers differ by 1 because of rounding.
Table L11.1a Example results of the conversion from ADC sample to fixed-point.
Analog input    ADC sample     Fixed-point output
0.000 V         %0000000000    0
1.234 V         %0011111100    123
3.456 V         %1011000000    346
5.000 V         %1111111111    500
c) Write a subroutine that outputs the fixed-point number to the LCD. The input parameter to the subroutine will be passed in using Register D. Table L11.1b shows some example results. Table L11.1b Example results of the fixed-point display subroutine.
Fixed-point    LCD display
0              0.00 V
123            1.23 V
346            3.46 V
500            5.00 V
d) Write the main program that first initializes the system. The real time measurements occur in the background, results are passed via a global variable to the foreground and the main program converts it to fixed point, then displays the results on the LCD. Lab 11.2 Distance Monitor Purpose: The purpose of this lab is to learn LCD interfacing, to use interrupts to perform real time sampling and to use fixed-point numbers to represent non-integer values. Description: In this lab you will use a Sharp GP2Y0A21YK0F infrared object detector to measure distance (http://www.sharpsma.com). This sensor creates a continuous analog voltage between 0 and 5 V that depends inversely on distance to object, see Figure L11.2. Figure L11.2 Response curve of the Sharp GP2Y0A21YK0F distance sensor.
[Figure L11.2 plot: output voltage (V), from 0.0 to 3.5, versus 1/Distance (1/cm), from 0.00 to 0.20, with data points labeled from 80 cm down to 5 cm, for white paper (reflectance ratio 90%) and gray paper (reflectance ratio 18%).]
The operational range of this sensor is 10 to 80 cm. Other Sharp sensors have other distance ranges. The response time is 39 ms, so you will use output compare interrupts to establish 10 Hz sampling. In particular, the ADC should be started exactly every 100 ms. Next, you will convert the ADC sample into a decimal fixed-point number, with a Δ of 0.1 cm. Lastly, you will develop a fixed-point display function, which will be used to display the sampled signal on the LCD.
a) Write initialization routines for the output compare interrupts, the ADC, and the LCD display.
b) Write a subroutine that converts the ADC sample into distance defined as an unsigned fixed-point number. The input parameter to the subroutine will be passed in using Register D, and your subroutine will return the result in Register D.
c) Write a subroutine that outputs the fixed-point distance to the LCD.
d) Write the main program that first initializes the system. The real time measurements occur in the background, results are passed using a MailBox to the foreground, and the main program converts each result to fixed point, then displays it on the LCD.
Lab 11.3 AC/DC Voltmeter
Purpose: The objectives of this lab are to
■ Interface a three-digit LCD display to the microcomputer
■ Write device drivers for a switch, an ADC, and an LCD
■ Implement functions for addition, multiplication, division, and square root
■ Implement an AC/DC voltmeter
Description: In this lab you will design an AC/DC voltmeter. The ADC will be used to sample an analog input. The DC amplitude is the simple average of multiple samples. Let v[n] be 256 sampled voltage values.
DC = (v[0] + v[1] + ... + v[255])/256
The AC amplitude is calculated as a root-mean-squared value.
AC = sqrt(((v[0] - DC)^2 + (v[1] - DC)^2 + ... + (v[255] - DC)^2)/256)
The LCD output will be in decimal fixed point with a Δ of 0.01 V. You will find it much simpler to perform the AC/DC calculations on the raw integer samples, then convert the integer AC/DC results to fixed-point voltages. A toggle switch will allow the operator to select either AC or DC mode. The three-digit LCD display will show either the calculated DC or AC value as a fixed-point number with Δ equal to 0.01 V. A device driver is a set of software functions that facilitate the use of an I/O port.
a) Create new program, microcomputer, and I/O files. Attach a toggle switch, an analog signal to the ADC, and a simple three-digit LCD display. You can assume the toggle switch does not bounce.
b) Write a device driver for the 3-digit LCD. You should be able to initialize the interface and output a fixed-point number. The names of all the public driver subroutines should start with the letters “LCD_”. Draw flowcharts of these subroutines.
c) Write a device driver for the ADC interface. You should design subroutines as needed. The names of all the public driver subroutines should start with the letters “ADC_”. Draw flowcharts of these subroutines.
d) Write a device driver for the switch interface. You should design subroutines as needed. All software that directly accesses the I/O ports connected to the switch must be included in this driver. The names of all the public driver subroutines should start with the letters “Switch_”. Draw flowcharts of these subroutines.
e) Write the main program that implements the voltmeter functionality. Sample the ADC as fast as possible, and use the TCNT timer to estimate the sampling rate. Calculate the DC and AC results independent of the switch position, so that the sampling rate will be approximately constant. Include a “call-graph” of the system.
f) Evaluate the accuracy of the meter, which is the difference between the true signal and the results measured with the system. Use a signal period that is about 16 times larger than the ADC sample period. In this way there will be about 16 ADC samples per wave, and about 16 waves per block of 256 ADC samples. The first two tests will be performed on a pure sine wave. Determine the mathematical relationship between the peak-to-peak sine wave amplitude and its RMS value. First, keeping the DC value fixed, evaluate the accuracy of the system for 5 different AC values.
Second, keeping the AC value fixed, evaluate the accuracy of the system for 5 different DC values. Lastly, keeping the minimum and maximum of the signal constant, test the voltmeter with each of the signal shapes. Explain the differences in the results.
Lab 11.4 Real-Time Position Measurement System
Purpose: This lab has these major objectives:
■ An introduction to sampling analog signals using the ADC interface;
■ Development of an ADC device driver;
■ Data conversion and calibration techniques;
■ Development of an interrupt-driven real-time sampling device driver.
Description: You will design a position meter with a range of about 3 cm. A linear slide potentiometer (Alpha RA300BF-10-20D1-B54) converts position into resistance (0 ≤ R ≤ 50 kΩ). The potentiometer has three leads. You will use an electrical circuit to convert resistance into voltage (Vin). The 9S12 ADC will convert voltage into a 10-bit digital number (0 to 1023). Your software will calculate position from the ADC sample as a decimal fixed-point number. The position measurements will be displayed on the LCD. A periodic interrupt will be used to establish the real-time sampling. The left of Figure L11.4a shows the data flow graph of this system. Dividing the system into modules allows for concurrent development and eases the reuse of code. The right of Figure L11.4a shows the call graph.
[Figure L11.4a diagram. Data flow graph blocks: position sensor (position 0 to 3 cm, voltage 0 to +5 V), ADC hardware, ADC driver (sample 0 to 1023), OC hardware, OC ISR (sample 0 to 1023), LCD driver (fixed-point 0 to 3.000), LCD display. Call graph blocks: main, OC init, OC ISR, ADC driver, LCD driver, OC hardware, ADC hardware, LCD hardware.]
Figure L11.4a Data flow graph and call graph of the position meter system.
You should make the position resolution and accuracy as good as possible. The position resolution is the smallest change in position that your system can reliably detect. In other words, if the resolution were 0.01 cm and the position were to change from 1.00 to 1.01 cm, then your device would be able to recognize the change. Resolution will depend on the amount of electrical noise, the number of ADC bits, and the resolution of the output display software. Considering just the errors due to the 10-bit ADC, we expect the resolution to be 3 cm/1024 or about 0.003 cm. Accuracy is defined as the absolute difference between the true position and the value measured by your device. Accuracy is dependent on the same parameters as resolution, but in addition it is also dependent on the stability of the transducer and the quality of the calibration procedure. In this lab, you will be measuring the position of the armature (the movable part) on the slide potentiometer. This signal has very few frequency components (0 to 2 Hz). According to the Nyquist Theorem, we need a sampling rate greater than 4 Hz. Consequently, you will create a system with a sampling rate of 5 Hz. You will sample the ADC exactly every 0.2 sec and calculate position using decimal fixed-point with Δ of 0.001 cm. You should display the results on the LCD, including units. An output compare interrupt will be used to establish the real-time periodic sampling. When a transducer is not linear, you could use a piece-wise linear interpolation to convert the ADC sample to position (Δ of 0.001 cm). The 9S12 assembly language etbl instruction is an efficient mechanism to perform the interpolation. The etbl.RTF assembly program
included with TExaS is an example of a piece-wise linear interpolation using the etbl instruction. There are two small tables, Xtable and Ytable. The Xtable contains the ADC results and the Ytable contains the corresponding positions. The ADC sample is passed into the lookup function. This function first searches the Xtable for two adjacent points that surround the current ADC sample. Next, the function uses the etbl instruction to perform a linear interpolation to find the position that corresponds to the ADC sample. You are free to implement the conversion in any acceptable manner, with the exception that you are not allowed to use the etbl instruction. The 10-bit ADC converters on the 9S12 are successive approximation devices with a short conversion time. You need to enable the ADC in ATD0CTL2. In particular, you will set ATD0CTL2 = $80. You can define the number of ADC conversions (1 to 8) in a sequence using ATD0CTL3. For this lab, you will only need a single conversion, so you can set the control bits S8C S4C S2C S1C in ATD0CTL3 equal to 0001 respectively. In particular, you will set ATD0CTL3 = $08. Bit 7 of ATD0CTL4 determines if the ADC operates with 8 bits or 10 bits. You will clear bit 7 to specify 10-bit precision. The remaining 7 bits of ATD0CTL4 specify the ADC clock, which will determine the time to perform an ADC conversion. If the 9S12DP512 were running at 8 MHz, you should set ATD0CTL4 = $03. At this setting, the ADC will be clocked at 1 MHz, and the ADC conversion time will be equal to 14 μs. However, in this lab we will be running the 9S12DP512 at 24 MHz, therefore you could set ATD0CTL4 = $05, so the ADC will be clocked at 2 MHz, and the ADC conversion time will be equal to 7 μs. In summary, the ADC initialization should set
ATD0CTL2=$80    turns on ADC
ATD0CTL3=$08    specifies ADC sequence will perform one conversion
ATD0CTL4=$05    specifies 10-bit mode, and 7 μs conversion time
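The table lookup described above for a nonlinear transducer (search Xtable for the two adjacent points that surround the sample, then interpolate between the matching Ytable entries) can be sketched in C without the etbl instruction. This is a minimal sketch, not the etbl.RTF program: the function name lookup_position is hypothetical, and the table contents are simply the example rows of Table L11.4, standing in for your own calibration data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical calibration tables. Xtable holds ADC samples in
   ascending order; Ytable holds the matching positions in 0.001 cm.
   The values are the example rows of Table L11.4. */
static const uint16_t Xtable[] = {0, 252, 512, 707, 1023};
static const uint16_t Ytable[] = {10, 741, 1500, 2074, 3000};
#define TABLE_SIZE (sizeof(Xtable) / sizeof(Xtable[0]))

/* Piece-wise linear interpolation from ADC sample to position.
   Samples outside the table are clamped to the end points.
   Assumes Xtable is strictly ascending and Ytable is ascending. */
uint16_t lookup_position(uint16_t sample) {
  assert(TABLE_SIZE >= 2);
  if (sample <= Xtable[0]) return Ytable[0];
  if (sample >= Xtable[TABLE_SIZE - 1]) return Ytable[TABLE_SIZE - 1];

  /* Find i such that Xtable[i-1] < sample <= Xtable[i]. */
  size_t i = 1;
  while (sample > Xtable[i]) {
    i++;
  }

  uint32_t dx = (uint32_t)Xtable[i] - Xtable[i - 1];
  uint32_t dy = (uint32_t)Ytable[i] - Ytable[i - 1];
  uint32_t offset = (uint32_t)sample - Xtable[i - 1];

  /* Adding dx/2 before dividing rounds to the nearest 0.001 cm. */
  return (uint16_t)(Ytable[i - 1] + (dy * offset + dx / 2) / dx);
}
```

The 32-bit intermediates keep the product dy * offset from overflowing, which is the main hazard when the same calculation is written with 16-bit arithmetic on the 9S12.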
Writing to the ADC Control register (ATD0CTL5) begins a conversion. The ADC chip clocks itself. To perform a right-justified ADC conversion of channel 4, you should write a $84 to ATD0CTL5. After the first sample is complete, CCF0 is set and the result can be read out of the first result register, ATD0DR0. After the entire sequence has been converted, the SCF bit is set. In summary, the ADC conversion of channel 4 requires the following actions:
1) ATD0CTL5=$84 starts the ADC
2) Read ATD0STAT1 and look at bit 0 (CCF0)
3) Loop back to step 2 over and over until CCF0 is set (7 μs)
4) Read 10-bit result in ATD0DR0
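The 7 μs wait in step 3 follows from the ATD0CTL4 setting. The sketch below reproduces the two cases quoted in this lab, assuming the prescale field (PRS) is the low five bits of ATD0CTL4, that the ADC clock is the bus clock divided by 2*(PRS+1), and that a 10-bit conversion takes 14 ADC clock periods (the figure implied by 14 μs at 1 MHz). These are assumptions to be checked against the 9S12 data sheet; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: PRS is bits 4..0 of ATD0CTL4 and
   ADC clock = bus clock / (2 * (PRS + 1)). */
uint32_t atd_clock_hz(uint32_t bus_hz, uint8_t atdctl4) {
  uint32_t prs = atdctl4 & 0x1Fu;
  return bus_hz / (2u * (prs + 1u));
}

/* Assumed: one 10-bit conversion takes 14 ADC clock periods.
   Returns the conversion time in microseconds. */
double atd_conversion_us(uint32_t adc_clock_hz) {
  assert(adc_clock_hz > 0);
  return 14.0e6 / (double)adc_clock_hz;
}
```

With a bus clock of 8 MHz and ATD0CTL4 = $03 this gives 1 MHz and 14 μs, and with 24 MHz and ATD0CTL4 = $05 it gives 2 MHz and 7 μs, in agreement with the values stated in the lab description.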
The analog signal connected to the microcomputer comes from a position sensor, such that the analog voltage ranges from 0 to 5 V as the position ranges from 0 to 3 cm. First, you will use output compare interrupts to establish 5 Hz sampling. In particular, the ADC should be started exactly every 0.2 s. Second, you will convert the ADC sample (0 to 1023) into a 16-bit unsigned decimal fixed-point number, with a Δ of 0.001 cm. Lastly, you will use your LCD_OutFix function from the previous lab to display the sampled signal on the LCD. Include units on your display.
a) You can create a scale by photocopying a metric ruler. There are many ways to build the transducer. One method requires cutting, gluing, and soldering. Start with a piece of wood or plastic a little larger than the potentiometer. Glue the frame (the fixed part) of the potentiometer to this solid object. Tape or glue the metric ruler on the frame but near the armature (the movable part) of the sensor. Attach or draw a hair-line on the armature, which will define the position measurement. Solder three solid wires to the slide potentiometer.
b) Write two subroutines: ADC_Init will initialize the ADC interface and ADC_In4 will sample ADC channel 4. Use the simulator to test these functions.
c) Write a simple version of the system, which you can use to collect calibration data. In particular, this system should first sample the ADC and then display the results as unsigned decimal numbers. You should use your LCD_OutDec developed in the previous lab. Collect five to ten calibration points and create a table showing the true position (as determined by reading the position of the hair-line on the ruler), the analog input measured with a digital voltmeter, and the ADC sample (like the first three columns of Table L11.4).
11 䡲 Analog I/O Interfacing
Table L11.4 Calibration results of the conversion from ADC sample to fixed-point.
Position    Analog Input    ADC Sample    Fixed-Point Output
0.010 cm    0.000 V         0             10
0.741 cm    1.234 V         252           741
1.500 cm    2.500 V         512           1500
2.074 cm    3.456 V         707           2074
3.000 cm    5.000 V         1023          3000
d) Use this calibration data to write a subroutine that converts a 10-bit binary ADC sample into a 16-bit unsigned fixed-point number. The input parameter (10-bit ADC sample) to the subroutine will be passed in using Register D, and your subroutine will return the result (integer portion of the fixed-point number) in Register D. Table L11.4 shows some example results. You are allowed to use a linear equation to convert the ADC sample into the fixed-point number.
e) Write a subroutine, OC_Init, that initializes the output compare system to interrupt at exactly 5 Hz (every 0.2 second). Use the simulator to test this function. When debugging your code in TExaS it will be more convenient to run with a shorter OC interrupt period, e.g., 10 to 50 ms. The initialization performs these tasks:
1. Disable interrupts to make the initialization atomic (set I bit in CCR)
2. Enable the timer and an output compare channel, make PT7 an output (interface an LED)
3. Arm output compare
4. Specify when the first output compare interrupt will be
5. Enable interrupts (clear I bit in CCR)
f) Write an output compare interrupt handler that samples the ADC and outputs the data to the LCD. Use the simulator to test this handler. Using interrupt synchronization, the ADC will be sampled at almost equal time intervals (see footnote 7). The interrupt service routine performs these tasks:
1. Acknowledge the output compare interrupt by clearing the flag that requested the interrupt
2. Specify the time for the next interrupt
3. Toggle PT7 (change from 0 to 1, or from 1 to 0)
4. Sample the ADC
5. Convert the sample into a fixed-point number (0 to 3000)
6. Output the fixed-point number on the LCD
7. Return from interrupt
g) Write a simple main program, which initializes the PLL, timer, LCD, ADC and output compare interrupts. After initialization, this main program (foreground) performs a do-nothing loop. All run-time operations occur in the output compare interrupt service routine (background).
h) Use the system to collect another five to ten data points, creating a table showing the true position ($x_{ti}$, as determined by reading the position of the hair-line on the ruler) and the measured position ($x_{mi}$, using your device). Calculate the average accuracy as the average absolute difference between truth and measurement:
Average accuracy (with units in cm) $= \frac{1}{n}\sum_{i=1}^{n} |x_{ti} - x_{mi}|$
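As one possible illustration of parts d) and h), the following C sketch converts a 10-bit sample to fixed-point position and computes the average accuracy. The constants come from the two end points of Table L11.4 only, so they are an assumption; your own calibration data should replace them, and a fit over all points will generally do better. The names Convert and AverageAccuracy are hypothetical, and the lab itself asks for an assembly subroutine using Register D.

```c
#include <stdint.h>

// Assumed calibration, taken from the end points of Table L11.4:
// ADC sample 0 -> 10 (0.010 cm), ADC sample 1023 -> 3000 (3.000 cm).
#define OFFSET    10u
#define SPAN      2990u   // 3000 - 10
#define FULLSCALE 1023u

// Convert a 10-bit ADC sample (0 to 1023) into fixed-point position,
// resolution 0.001 cm, using a linear equation.
uint16_t Convert(uint16_t sample){
  if(sample > FULLSCALE) sample = FULLSCALE;  // guard against bad input
  uint32_t scaled = (uint32_t)SPAN*sample;    // up to 3,058,770, needs 32 bits
  return (uint16_t)(OFFSET + (scaled + FULLSCALE/2)/FULLSCALE);  // rounded
}

// Average accuracy = (1/n) * sum of |xt[i] - xm[i]|,
// with truth xt and measurement xm both in units of 0.001 cm.
uint16_t AverageAccuracy(const uint16_t xt[], const uint16_t xm[], uint16_t n){
  uint32_t sum = 0;
  if(n == 0) return 0;                        // nothing to average
  for(uint16_t i = 0; i < n; i++){
    sum += (xt[i] > xm[i]) ? (uint32_t)(xt[i] - xm[i])
                           : (uint32_t)(xm[i] - xt[i]);
  }
  return (uint16_t)(sum/n);
}
```

The multiplication is done in 32 bits because 2990*1023 does not fit in 16 bits; on the 9S12 the emul and ediv instructions could play the same role.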
Lab 11.5 Music generation using a Digital to Analog Converter
Purpose: The purpose of this lab is to
• Build a DAC,
• Design a data structure to represent music,
• Develop a system to play sounds.
7. More precisely, the output compare flag is set at exact time intervals. There is some variability in when the ISR runs depending on which instruction is being executed at the time when the flag is set.
11.10 䡲 Laboratory Assignments
Description: Most digital music devices rely on high-speed DACs to create the analog waveforms required to produce high-quality sound. In this lab you will create a very simple sound generation system that illustrates this application of the DAC. Your goal is to create an embedded system that plays three notes (a digital piano with three keys). The first step is to design and test a 4-bit DAC, which converts 4 bits of digital output from the 9S12 to an analog signal, see Figure L11.5a. You are free to design your DAC with a precision of more than 4 bits. You will convert the binary bits (digital) to an analog output using a simple resistor network. During the static testing phase, you will connect the DAC analog output to your voltmeter and measure resolution, range, precision and accuracy. During the dynamic testing phase you will connect the DAC output to headphones, and listen to sounds created by your software. It doesn't matter what range the DAC has, as long as there is an approximately linear relationship between the digital data and the speaker current. The performance score of this lab is not based on loudness, but on sound quality. The quality of the music will depend on both hardware and software factors. The precision of the DAC, external noise and the dynamic range of the speaker are some of the hardware factors. Software factors include the DAC output rate and the complexity of the stored sound data. You can create a 3 kΩ resistor from two 1.5 kΩ resistors in series. You can create a 6 kΩ resistor from two 12 kΩ resistors in parallel.
Figure L11.5a DAC allows the software to create music. (Two panels: static testing, where 9S12 Bit3, Bit2 and Bit1 drive the DAC and Vout goes to a voltmeter; dynamic testing, where the DAC output current Iout drives a speaker.)
The second step is to design a low-level device driver for the DAC. Remember, the goal of a device driver is to separate what the device does (general descriptions of DAC_Init and DAC_Out) from how it does it (implementations of DAC_Init and DAC_Out). The third step is to design a data structure to store the sound waveform. You are free to design your own format, as long as it uses a formal data structure. Compressed data occupies less storage, but requires runtime calculation. The fourth step is to organize the digital piano software into a device driver. Although you will be playing only three notes, the design should allow additional notes to be added with minimal effort. For example, if your system plays C, D, E, then you will need public functions Piano_Stop, Piano_C, Piano_D and Piano_E. The Stop function makes it silent and the other functions activate a sound. A background thread implemented with output compare will fetch data out of your music structure and send them to the DAC. The last step is to write a main program that inputs from binary switches and performs the four public functions. If you output a sequence of numbers to the DAC that form a sine wave, then you will hear a continuous tone on the speaker.
a) Draw the circuit required to interface the DAC to the 9S12. Design the DAC using a simple resistor-adding technique. Use resistors in a 1/2/4/8 resistance ratio. Select values in the 1.5 kΩ to 12 kΩ range. For example, you could use 1.5 kΩ, 3 kΩ, 6 kΩ, and 12 kΩ. Notice that you could create double/half resistance values by placing identical resistors in series/parallel. It is a good idea to email your design to your TA and have him/her verify your design before you build it. You can solder 24 gauge solid wires to the jack to simplify connecting your circuit to the headphones.
b) Write a low-level device driver for the DAC interface. Include two functions that implement the DAC interface. The function DAC_Init initializes the DAC, and the function DAC_Out sends a new data value to the DAC. You can debug your software in TExaS using the DC motor I/O device. This module allows you to connect a DAC to an output port. You can select the precision of the DAC (4 bits in this case). You can visualize the generated waveform on the scope by selecting the D/A output (or DC motor power). Figure L11.5b shows the TExaS dialog to interface the DAC to PM3,2,1,0, and a sine wave generated by a 4-bit DAC simulated in TExaS.
Figure L11.5b The 4-bit DAC is used to create a sine wave using TExaS.
c) Write a couple of simple main programs that test the DAC interface. This main program can be used for static testing. You can single step this program using the debugger to test the static function of the DAC.
        org  $4000
Entry   lds  #$4000
        jsr  DAC_Init
        clra
loop    jsr  DAC_Out
        inca
        anda #$0F
        bra  loop
d) Using Ohm's law and the fact that the digital output voltages will be approximately 0 and 5 V, make a table of the theoretical DAC voltage as a function of digital value (without the speaker attached). Calculate resolution, range, precision and accuracy.
This main program can be used for dynamic testing. It creates a triangle waveform (adjust the 1000 to affect the frequency).
org Entry lds jsr jsr clra psha n equ loop ldd jsr ldaa inca jsr staa cmpa bne loop2 ldd jsr ldaa deca jsr staa cmpa bne bra
e) Design and write the piano device driver software. Add minimally intrusive debugging instruments to allow you to visualize when interrupts are being processed. f) Write a main program to run the entire system. Document clearly the operation of the routines. Figure L11.5c shows the data flow graph of the music player.
Figure L11.5c Data flows from the memory and the switches to the speaker. (Blocks: push buttons, switch interface, timer hardware, timer interface, main, sound interface, speaker hardware, and the music data.)
Figure L11.5d shows a possible call graph of the system. Dividing the system into modules allows for concurrent development and eases the reuse of code.
Figure L11.5d A call graph showing the three modules used by the music player. (Blocks: main program, switch driver, switch hardware, DAC driver, music, OC hardware, and speaker hardware.)
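For the theoretical table in part d), a short C sketch may help check your hand calculations. It assumes ideal digital outputs of exactly 0 and 5 V, no load, and four resistors in the 1/2/4/8 ratio joined at a single output node, with bit 3 on the smallest resistor. Under those assumptions the output is the conductance-weighted average of the bit voltages, which reduces to 5 V times the digital value divided by 15. The name DAC_TheoryMv is hypothetical.

```c
#include <stdint.h>

#define VHIGH_MV 5000u   // assumed logic-high output voltage, in mV
#define DAC_BITS 4u      // precision of the DAC in bits

// Theoretical unloaded output, in mV, of a binary-weighted resistor DAC.
// With conductances in the ratio 8:4:2:1 for bits 3..0, the node voltage
// is VHIGH*(8*b3 + 4*b2 + 2*b1 + b0)/15 = VHIGH*data/15.
uint16_t DAC_TheoryMv(uint8_t data){
  uint32_t fullScale = (1u << DAC_BITS) - 1u;   // 15 for a 4-bit DAC
  data &= (uint8_t)fullScale;                   // use only the low 4 bits
  return (uint16_t)((VHIGH_MV*(uint32_t)data + fullScale/2)/fullScale);
}
```

Calling DAC_TheoryMv for data values 0 through 15 gives the sixteen table entries. Under these assumptions the range is 0 to 5 V and the resolution is about 0.33 V, and the measured voltages can be compared against the same sixteen values to estimate accuracy.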
Extra Credit: Extend the system so that it plays your favorite song (a sequence of notes, set at a specific tempo, with an envelope like Figure 11.9). One possible approach is to use two output compare interrupts. A fast output compare ISR outputs the sine wave to the DAC (Figure 8.10). The rate of this interrupt is set to specify the frequency (pitch) of the sound. A second slow output compare ISR occurs at the tempo of the music. For example, if the song has just quarter notes at 120 beats per minute, then this interrupt occurs every 500 ms. If the song has eighth notes, quarter notes and half notes, then this interrupt occurs at 250, 500, and 1000 ms respectively. During this second ISR, the frequency of the first ISR is modified according to the note that is to be played next. Compressed data occupies less storage, but requires runtime calculation. On the other hand, a complete list of points will be simpler to process, but requires more storage than is available on the 9S12. The fourth step is to organize the music software into a device driver. Although you will be playing only one song, the song data itself will be stored in the main program, and the device driver will perform all the I/O and interrupts to make it happen. You will need public functions Play and Stop, which perform operations like a cassette tape player. The Play function has an input parameter that defines the song to play. If you complete the extra credit (with input switches that can be used to play and stop), then the piano functionality in parts e) and f) need not be completed. Either way, parts a), b), c) and d) are required.
Lab 11.6 Real-Time Temperature Data Acquisition
Purpose: The objective of this lab is to study analog to digital conversion, real-time sampling, digital filtering, foreground-background communication, table lookup, and LCD display.
Description: In preparation for this assignment, review fixed-point, passing/returning parameters, and ADC converters. Look up how TExaS simulates analog signals and the ADC using the on-line help. In particular, get help for the IO->Analog command. Example programs that apply to this lab include HD44780.rtf, LCD.rtf, tut3.rtf, and tut5.rtf. The ADC will be used to measure oral body temperature. The range is about 90 to 105 °F, and the resolution is 0.1 °F. Real-time data acquisition will use 0.5 ms periodic interrupts, so the sampling rate will be 2 kHz. 500 Hz analog noise will be added. A 500 Hz digital reject filter will remove the noise. Software will convert the ADC sample into a decimal fixed-point number, and the result will be displayed on an LCD. The system will implement a digital oral thermometer. The thermistor resistance is nonlinearly related to its temperature in Kelvin,
$R_T = R_0 \exp(\beta/T)$
where $R_0$ is 1.03947E-07 kΩ and $\beta$ is 5808.1 K for this device. An analog circuit, shown in Figure L11.6a, converts the resistance to voltage, and the ADC converts voltage to digital sample. R1 is 200 kΩ, R2 is 11.1 kΩ, and the gain is 28.6.
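Figure L11.6a Temperature data acquisition circuit. (Labels in the figure: +5 V supply, two R1 resistors, thermistor RT, R2, gain resistor RG, node voltages V1, V2 and V3, an AD623 or AD627 instrumentation amplifier, and the ADC input.)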
The overall response is shown in Table L11.6, and plotted in Figure L11.6b. The details of these calculations can be found in the spreadsheet Lab11_06.xls.
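The 500 Hz reject filter named in the description is specified in part c) of this lab as a difference equation with integer coefficients. A minimal C sketch of that equation follows, assuming the reading y(n) = (113*x(n) + 113*x(n-2) - 98*y(n-2))/128, which has unity gain at DC and a zero at one quarter of the sampling rate. The two three-element arrays stand in for the MACQs and are shifted on every sample, which is simpler than (and not the same as) a pointer-based MACQ; Filter_Init and Filter_Calc are hypothetical names.

```c
#include <stdint.h>

// Stand-ins for the two MACQs: index 0 is the newest sample,
// index 2 is the sample from two periods ago.
static int32_t x[3];   // filter inputs  x(n), x(n-1), x(n-2)
static int32_t y[3];   // filter outputs y(n), y(n-1), y(n-2)

// Clear the filter history.
void Filter_Init(void){
  for(int i = 0; i < 3; i++){ x[i] = 0; y[i] = 0; }
}

// Run one filter step: y(n) = (113*x(n) + 113*x(n-2) - 98*y(n-2))/128.
// 32-bit arithmetic is used because 113*1023 overflows 16 bits.
int32_t Filter_Calc(int32_t sample){
  x[2] = x[1]; x[1] = x[0]; x[0] = sample;   // shift in the new input
  y[2] = y[1]; y[1] = y[0];                  // make room for the new output
  y[0] = (113*(x[0] + x[2]) - 98*y[2])/128;
  return y[0];
}
```

When sampling at 2 kHz, a 500 Hz input repeats every four samples, so x(n) + x(n-2) cancels and the output decays toward zero, while a constant input passes through unchanged once the filter settles.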
a) Interface an LCD display (either the simple BCD or the Hitachi HD44780), a switch, and an analog signal to the microcomputer. You should select a slow-varying sine wave to simulate oral temperature. The peak temperature will be displayed on the LCD. The maximum voltage should be set to 5000 mV (90 °F) and the minimum voltage can be adjusted to change the oral body temperature. For example, 1910 mV (ADC 97) represents 98.6 °F. Make the sine-wave period large, e.g., 50000 µsec. In this way, it will take a lot of ADC samples to create an entire cycle. You should add 100 mV of 500 Hz (2000 µsec period) noise.
b) Enter the last two columns of data from Table L11.6 into a constant data structure. Write a subroutine that converts an ADC sample to decimal fixed-point temperature using table lookup and linear interpolation.
Table L11.6 Temperature calibration.
c) Write software that samples at 2 kHz, implements the following digital filter, and stores the digital filter output into a FIFO queue. You will need two MACQs, one for x(n) and one for y(n).
y(n) = (113*x(n) + 113*x(n-2) - 98*y(n-2))/128
Observation: If the sampling rate were to be 240 Hz, this filter rejects 60 Hz.
d) Write a main program that waits for the switch to be pressed. Once the switch is pressed, the interrupts are armed, data is removed from the FIFO, converted to decimal fixed point, and the maximum temperature is displayed on the LCD. If the switch is released, then the interrupts are disarmed, the display is blanked, and the software waits for the switch to be pressed again.
Observation: If you change the sampling rate, change the simulated noise period to be four times the ADC sample period.
Lab 11.7 AC Voltmeter
Purpose: In this lab you will learn how to convert, subtract, and display decimal fixed-point numbers. You will pass parameters on the stack. The analog to digital converter will be used to convert an analog signal into digital form.
Description: In preparation for this assignment, review fixed-point, passing/returning parameters on the stack, and ADC converters. Look up how TExaS simulates analog signals and the ADC using the on-line help. In particular, get help for the IO->Analog command. Run and analyze the TUT3.rtf example program. Five possible analog waveforms can be connected to the ADC. Your objective is to measure the analog signal 100 times (see footnote 8) and convert each ADC sample to fixed-point voltage. During the 100-sample sequence establish the minimum (min) and maximum (max). At the end of the cycle, you will measure the AC amplitude (max − min) and display it on the LCD display. Activate the appropriate decimal point in the LCD and add an appropriate label and units. Your software should reinitialize variables and continuously repeat the 100-sample cycle.
a) Create an I/O file and attach an analog waveform and an LCD display. You will adjust the number of samples (e.g., 100) in your main program loop, and the period of the analog wave so that about 2 to 5 waveform periods occur in each software loop. You will adjust the minimum, maximum, and noise level during the testing steps of this problem (part f).
b) Write a general purpose ADC sampling subroutine that accepts an 8-bit call by value input parameter, channel number (0 to 7), and returns an 8-bit unsigned binary digital result from the ADC. Both parameters will be on the stack. Typical calling sequences are shown below. Use binding (equ) to make the subroutine more readable.
***** ZZ=A2D(4); *****
        ldab #4
        pshb               ; push chan on the stack
        des                ; allocate space for result
        bsr  A2D
        pulb               ; get result
        stab ZZ
        ins                ; discard input
***** XX=A2D(chan); *****
        movb chan,1,-sp    ; push chan on the stack
        leas -1,sp         ; allocate space for result
        jsr  A2D
        movb 1,sp+,XX      ; get result
        leas 1,sp          ; discard input
c) Write an 8-bit unsigned binary to 16-bit unsigned decimal fixed-point conversion subroutine. The fixed-point constant is 0.01 V. You may choose to pass parameters any way you wish, but please document in the comments how parameters are passed. In the table below, the ADC converts from the first to the second column. This subroutine converts the 8-bit unsigned number shown in the second column to the value shown in the third column.
8. You can adjust this number to make sure you are observing at least two full periods of the waveform.
Analog Voltage    ADC Output    16-bit Unsigned Decimal Fixed Point    LCD Display
0.0 V             0             0                                      0.00
0.02 V            1             2                                      0.02
1.25 V            64            125                                    1.25
2.50 V            128           250                                    2.50
4.98 V            255           498                                    4.98
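One formula that reproduces every row of the table above is a rounded multiply by 500/256, rewritten as 125/64 so the intermediate result stays within 16 bits. The C sketch below is only an illustration of that arithmetic; the name ADC2Fixed is hypothetical, and the assumption that 256 ADC counts correspond to 5.00 V is inferred from the table rather than stated in the text.

```c
#include <stdint.h>

// Convert an 8-bit ADC sample (0 to 255) into fixed-point voltage with a
// resolution of 0.01 V, assuming 256 counts would correspond to 5.00 V.
// (125*sample + 32)/64 equals (500*sample + 128)/256, i.e., 500/256 with
// rounding, and the largest intermediate value is 31907, which fits in
// a 16-bit unsigned number.
uint16_t ADC2Fixed(uint8_t sample){
  uint16_t scaled = (uint16_t)(125u*sample + 32u);
  return (uint16_t)(scaled/64u);
}
```

Without the rounding term, an ADC output of 1 would convert to 1 instead of the 2 shown in the table.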
d) Write a subroutine that takes a 16-bit unsigned decimal fixed-point number (fixed constant is 0.01 V), and displays it on the LCD display. You may choose to pass parameters any way you wish, but please document in the comments how parameters are passed. In the above table, this subroutine converts the 16-bit integer shown in the third column to the LCD pattern shown in the fourth column.
e) Write the main program that calls the above three subroutines and performs the AC voltmeter measurements, updating the LCD at the end of each cycle.
f) Using the information entered in the IO->Analog command as truth, collect the following measurements. At the time of checkout be prepared to discuss why the last three measurements had more error than the first.
Waveform    Noise             True Max (volts)    True Min (volts)    True AC (volts)    Measured AC (volts)    Percent Error (%) 100•(true-measured)/true
Sine        None              4.000               1.000               3.000
Sine        None              1.005               1.000               0.005
EKG         None              4.000               1.000               3.000
Sine        100 mV, 500 µs    4.000               1.000               3.000
g) In addition to the operations described in part f), extend the main program to also measure the period of a sine wave in µsec. Display the AC amplitude on the LCD display and display the period by simply writing it to a global variable, and observing it in the ViewBox. To implement hysteresis, we define two thresholds at 25 percent and 75 percent (see footnote 9) depending on min and max:
High = (3*(max - min))/4 + min
Low = (max - min)/4 + min
First wait for the signal to go below low,
then wait for the signal to go above high,    first = TCNT;
then wait for the signal to go below low,
then wait for the signal to go above high,    period = (TCNT-first)/8; first = TCNT;
then wait for the signal to go below low,
then wait for the signal to go above high,    period = (TCNT-first)/8; first = TCNT;
Record the TCNT each time the signal goes above high; the period is the 16-bit unsigned difference between TCNT measurements. After subtracting, divide by 8 to get the answer in µsec. Discuss with the TA at the time of checkout what would happen if you divided first and then subtracted, and why two thresholds were used instead of one. In this first example, TCNT is $1000 the first time the signal goes above high, and $7000 the second time. The period is ($7000-$1000)/8 or 3072 µsec.
9. You may adjust these percentages so that you get only one trigger each period.
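(Figure: a waveform swinging between Min and Max with the Low and High thresholds marked; TCNT = $1000 at the first upward crossing of High and TCNT = $7000 at the next, one period later.)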
This second example illustrates that the system works even if the TCNT rolls over. In this example, the period is now 3584 µsec, but TCNT is $F000 the first time the signal goes above high, and $6000 the second time. The period is ($6000-$F000)/8 or 3584 µsec. This works because both TCNT and the subtraction are unsigned 16-bit values.
(Figure: the same waveform with TCNT = $F000 at the first upward crossing of High and TCNT = $6000 at the next, after the counter has rolled over.)
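The two examples above can be checked with a few lines of C. This is a sketch rather than the lab solution (the lab is written in assembly and reads the real TCNT register); the function names are hypothetical, and the divide by 8 assumes, as the text implies, that TCNT advances 8 counts per µsec.

```c
#include <stdint.h>

// Period in usec between two TCNT captures. The subtraction is done as a
// 16-bit unsigned value, so it stays correct when TCNT rolls over once.
uint16_t Period_us(uint16_t first, uint16_t second){
  uint16_t diff = (uint16_t)(second - first);  // modulo-65536 difference
  return (uint16_t)(diff/8u);                  // subtract first, then divide
}

// Hysteresis thresholds at 75 percent and 25 percent of the swing,
// following the High and Low equations in part g).
uint16_t Threshold_High(uint16_t min, uint16_t max){
  return (uint16_t)((3u*(uint32_t)(max - min))/4u + min);
}
uint16_t Threshold_Low(uint16_t min, uint16_t max){
  return (uint16_t)((uint32_t)(max - min)/4u + min);
}
```

For instance, with min = 100 and max = 400 (1.00 V and 4.00 V in 0.01 V units), the thresholds come out to 325 and 175.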
Lab 11.7 Microcomputer-Based Motor Controller
Purpose: The objective of this lab is to study analog to digital conversion, digital to analog conversion, real-time digital control, and LCD display.
Description: The objective of this problem is to design a microcomputer-based motor controller. The desired rotation speed x* will be selected interactively by the operator typing on a keyboard, either the matrix keyboard or the SCI interface. The output information will be displayed on either an LCD or the SCI interface. You will power the motor with an analog signal from the DAC. You will estimate the motor speed (x') by measuring the tachometer voltage using the ADC. The goal of the control software is to maintain the motor speed as close to x* as possible. You will implement two control systems: first an incremental control algorithm, then a proportional/integral (PI) control system. The first one is a simple incremental controller. Let u be the DAC output controlling the motor. The power to the motor is directly related to the DAC value. The value is increased by a fixed amount if the motor is spinning too slowly, and decreased by a fixed amount if it is spinning too fast. The incremental control algorithm executes the following at a regular rate:
u = min(255, u+1)    if x* > x' (too slow)
or
u = max(0, u-1)      if x* < x' (too fast)
The min and max operations maintain the DAC output within the valid range of 0 to 255. The disadvantage of incremental control is that it has a very slow response. The second system you will implement will be a PI controller. We can use linear control theory to develop the digital controller, see Figure L11.7. We will define the tachometer voltage as the actual motor speed. This speed will be measured with the ADC. Any error in the state estimator will lead to a nonremovable controller error. Just like the data acquisition and digital filter situations, t is the continuous time and n is the discrete time. We will assume the controller is executed at a fixed interval, Δt.
Figure L11.7 Block diagram of a linear control system in the frequency domain. (Blocks: the error e(n) between x* and x'(n) enters the PI controller kP + kI/s; its output u(n) drives the actuator c; the actuator output p(t) drives the DC motor m/(1 + sτ); the motor output f(t) returns through the state estimator, gain 1, as x'(n).)
Theoretically we can choose controller constants, kP and kI, to create the desired controller response. Unfortunately it can be difficult to estimate c, m and τ. If a load is applied to the motor, then m and τ will change. In addition, most motors do not follow a simple single pole relationship. The basic approach is presented in the following equations. Let x(n) be the current tachometer voltage represented as a fixed-point number. Let x* be the desired tachometer voltage also represented as a fixed-point number.
The error is
e(n) = (x* - x(n))
The proportional term is
up(n) = (kp*e(n))/100
The integral term is
ui(n) = ui(n-1) + (ki*e(n))/100
if (ui(n) < -50) then ui(n) = -50 (called anti-reset-windup)
if (ui(n) > 50) then ui(n) = 50
The DAC output is the combination of
u(n) = up(n) + ui(n)
if (u(n) > 255) then u(n) = 255
if (u(n) < 0) then u(n) = 0
All calculations are performed as 16-bit signed integers. A simple empirical method can be used to determine the controller constants. This empirical approach starts with just a proportional term (kp). This proportional controller will generate a smooth motor speed (actuator output achieves a constant value), but the speed will not be correct. Try different kp constants until the response time is fast enough. The response time is the delay after x* is changed for the motor to reach a new constant speed. kp is too big if the actuator saturates both at the maximum and minimum after x* is changed. Steady state controller accuracy is defined as the average difference between x* and x. The next step is to add some integral term (ki) a little at a time to improve the steady state controller accuracy without adversely affecting the response time. Don't change both kp and ki at once. Rather, you should vary them one at a time. Overshoot is defined as the maximum positive error that occurs when x* is increased. Similarly, undershoot is defined as the maximum negative error that occurs when x* is decreased. If the response time, overshoot, undershoot and accuracy are within acceptable limits, then a PI controller is adequate.
The foreground (main) process:
• Initializes I/O ports and data structures
• Explains the various interpreter commands
• Maintains a display of x(n)
The interpreter process (using interrupting keyboard or interrupting SCI):
• Can specify the desired motor speed, 0 ≤ x* ≤ 5000 mV with a resolution of 1 mV
The digital controller (periodic interrupt) process:
• Executes the controller every 1 ms
• Implements a PI control system
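To make the two control laws of this lab concrete, here is a hedged C sketch of one controller step for each. It follows the incremental rule and the PI equations given above, with two departures that are assumptions of this sketch: the arithmetic uses 32-bit integers rather than the 16-bit signed integers the lab specifies, so that kp*e(n) cannot overflow while experimenting, and the integral state is kept in a static variable. All function names are hypothetical.

```c
#include <stdint.h>

// Incremental controller: move the DAC output u one count toward the
// desired speed xstar, given the measured speed x, clamped to 0..255.
uint8_t Incremental_Step(uint8_t u, int32_t xstar, int32_t x){
  if(xstar > x){          // too slow
    if(u < 255) u++;      // u = min(255, u+1)
  } else if(xstar < x){   // too fast
    if(u > 0) u--;        // u = max(0, u-1)
  }
  return u;
}

static int32_t Ui;        // integral term ui(n-1), kept between calls

void PI_Init(void){ Ui = 0; }

// One PI controller step; kp and ki are scaled by 100 as in the text.
uint8_t PI_Step(int32_t xstar, int32_t x, int32_t kp, int32_t ki){
  int32_t e  = xstar - x;          // e(n) = x* - x(n)
  int32_t up = (kp*e)/100;         // proportional term up(n)
  Ui = Ui + (ki*e)/100;            // integral term ui(n)
  if(Ui < -50) Ui = -50;           // anti-reset-windup
  if(Ui >  50) Ui =  50;
  int32_t u = up + Ui;             // u(n) = up(n) + ui(n)
  if(u > 255) u = 255;             // keep within the DAC range
  if(u < 0)   u = 0;
  return (uint8_t)u;
}
```

In the lab, a periodic interrupt would call one of these steps with the latest ADC reading and write the returned value to the DAC.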
12
Communication Systems
Chapter 12 objectives are to:
• Present a general model for data flow problems
• Develop implementations for the first in first out queue
• Discuss methods to support interthread communication
• Design simple networks based on the SCI port
• Introduce the controller area network (CAN), and use it to connect 9S12's together
• Present the I2C protocol, and use it to interface peripheral devices
The goal of this chapter is to provide a brief introduction to communication systems. Communication theory is a richly developed discipline, and much of it is beyond the scope of this book. Nevertheless, the trend in embedded systems is to employ multiple intelligent devices; therefore, the interconnection will be a strategic factor in the performance of the system. A variety of different manufacturers are involved in the development of these devices, thus the interconnection network must be flexible, robust, and reliable. Because the emphasis of this book is on real-time embedded systems, this chapter focuses on implementing communication systems appropriate for embedded systems. The components of an embedded system typically combine to solve a common objective, thus the nodes on the communication network will cooperate towards that shared goal. In particular, requirements of an embedded system, in general, involve relatively low to moderate bandwidth, static configuration, and a low probability of corrupted data. On the other hand, reliability and latency are important for real-time systems.
12.1 Introduction
In Chapter 8, we presented the hardware and software interfaces for the SCI channel. At that time we connected the 9S12 to an I/O device, and used the SCI channel to exchange data with the human operator. In this chapter, we will build on those ideas and introduce the concepts of networks by investigating a couple of simple networks. In particular, we will use the SCI channel to connect multiple 9S12's together, creating a network. A communication network includes both the physical channel (hardware) and the logical procedures (software) that allow users or software processes to communicate with each other. The network provides the transfer of information as well as the mechanisms for process synchronization. It is convenient to visualize the network in a hierarchical fashion, as shown in Figure 12.1. At the lowest level, frames are transferred between I/O ports of the two (or more) computers along the physical link or hardware channel. Error detection and correction may be handled at this low level. At the next logical level, the operating system (OS) of one computer sends messages or packets to the OS on the other computer. The message protocol will specify the types and formats of these messages. Error detection and correction may also be handled at this level.
Figure 12.1 A layered approach to communication systems. (Layers, top to bottom: tasks on each computer exchanging communication; the operating systems OS1 and OS2 exchanging messages; the I/O ports of Computer 1 and Computer 2 exchanging frames over the physical link.)
Messages typically contain four fields:
1. Address information field
   Physical address specifying the destination/source computers
   Logical address specifying the destination/source processes (e.g., users)
2. Synchronization or handshake field
   Physical synchronization like shared clock, start and stop bits
   OS synchronization like request connection or acknowledge
   Process synchronization like semaphores
3. Data field
   ASCII text (raw or compressed)
   Binary (raw or compressed)
4. Error detection and correction field
   Vertical and horizontal parity
   Checksum
   Block correction codes (BCC)
Observation: Communication systems often specify bandwidth in total bits/sec, but the important parameter is the data transfer rate.
Observation: Often the bandwidth is limited by the software and not the hardware channel.
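The four message fields listed above can be pictured as a C structure. The layout below is purely illustrative: the field widths, the 8-byte payload limit, the additive 8-bit checksum, and every name in it are assumptions of this sketch, not a protocol defined in this chapter.

```c
#include <stdint.h>

#define MSG_MAX_DATA 8u          // hypothetical payload limit

typedef struct {
  uint8_t dest;                  // address field: destination computer
  uint8_t source;                // address field: source computer
  uint8_t type;                  // handshake field, e.g., request or acknowledge
  uint8_t length;                // number of valid bytes in data[]
  uint8_t data[MSG_MAX_DATA];    // data field (ASCII or binary)
  uint8_t checksum;              // error detection field
} Message_t;

// 8-bit additive checksum over the address, handshake, and data fields.
uint8_t Message_Checksum(const Message_t *m){
  uint8_t sum = (uint8_t)(m->dest + m->source + m->type + m->length);
  for(uint8_t i = 0; i < m->length && i < MSG_MAX_DATA; i++){
    sum = (uint8_t)(sum + m->data[i]);
  }
  return sum;
}

// Sender side: fill in the checksum before transmission.
void Message_Seal(Message_t *m){
  m->checksum = Message_Checksum(m);
}

// Receiver side: returns 1 if the length is legal and the checksum matches.
int Message_IsValid(const Message_t *m){
  return (m->length <= MSG_MAX_DATA) && (m->checksum == Message_Checksum(m));
}
```

A simple checksum like this detects any single corrupted byte but cannot correct it, and some multi-byte errors cancel out, which is why stronger codes appear in the error detection and correction field list.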
At the highest level, we consider communication between users or high-level software tasks. Many embedded systems require the communication of command or data information to other modules at either a near or a remote location. Because the focus of this book is embedded systems, we will limit our discussion to communication with devices within the same room. A full duplex channel allows data to transfer in both directions at the same time. In a half duplex system, data can transfer in both directions but only in one direction at a time. Half duplex is popular because it is less expensive (2 wires) and allows the addition of more devices on the channel without change to the existing nodes.
12.2 Reentrant Programming and Critical Sections
In general, if two threads access the same global memory and one of the accesses is a write, then there is a causal dependency between the threads. This means the execution order may affect the outcome. Shared global variables are very important in multithreaded systems because they are required to pass data between threads, but they are complicated, and their use can result in bugs that are hard to find.
A program segment is reentrant if it can be concurrently executed by two (or more) threads. To implement reentrant software, we place variables in registers or on the stack, and avoid storing into global memory variables. When writing in assembly, we use registers, or the stack for parameter passing to create reentrant subroutines. Typically each thread will have its own set of registers and stack. A nonreentrant subroutine will have a section of code called a vulnerable window or critical section. An error occurs if 1. One thread calls the nonreentrant subroutine 2. Is executing in the critical section when interrupted by a second thread 3. The second thread calls the same subroutine There are a number of scenarios that can happen next. In the most common scenario, the second thread is allowed to complete the execution of the subroutine, control is then returned to the first thread, and the first thread finishes the subroutine. This first scenario is the usual case with interrupt programming. In the second scenario, the second thread executes part of it, is interrupted and then re-entered by a third thread, the third thread finishes, the control is returned to the second thread and it finishes, lastly the control is returned to the first thread and it finishes. This second scenario can happen in interrupt programming if interrupts are reenabled during the execution of the ISR. A critical section may exist when two different subroutines access and modify the same memory-resident data structure. Program 12.1 shows an assembly and a C function that are nonreentrant because they use a global variable, num. The assembly language program accepts two 16-bit signed integers in Registers D and X and returns the average in Register D. These functions could have been made reentrant by implementing num as a local variable or in a register, but the purpose of the example is to illustrate what can go wrong when a nonreentrant function is re-entered. 
Program 12.1 This function is nonreentrant because of the read-modify-write access to a global.
num  rmb  2
Ave  stx  num
     addd num
     asrd
     rts

short num;
short Ave(short first, short second){
  num = first;
  return (num+second)/2;
}
Checkpoint 12.1: Rewrite Program 12.1 so that it is reentrant.
A critical section exists between the stx and the addd instructions. Assume there are two concurrent threads (the main program and a background ISR) that both call this subroutine. Concurrent means that both threads are ready to run. Because there is only one computer, exactly one thread will be running at a time. Typically, the operating system switches execution control back and forth using interrupts. For example, the main program might be executing when an interrupt causes the computer to switch over and execute the ISR. When the ISR is done it executes an rti and the control returns back to the main program. An error occurs if: 1. The main program calls Ave 2. The main program executes the stx instruction saving its second number in num 3. The OS halts the main program (using an interrupt) and starts the interrupt service routine 4. The ISR calls Ave The ISR executes the stx saving its second number in num The ISR finishes Ave 5. The OS returns control back to the main program 6. The main program executes the addd instruction but gets the wrong num An atomic operation is one that once started is guaranteed to finish. In most computers, once an instruction has begun, the instruction must be finished before the computer can
process an interrupt. Therefore, the following read-modify-write sequence is atomic because it cannot be halted in the middle of its operation.
     inc  counter    ;where counter is an 8-bit global variable
On the other hand, this read-modify-write sequence is not atomic because it can start, then be interrupted.
     ldx  counter    ;where counter is a 16-bit global variable
     inx
     stx  counter
In general, nonreentrant code can be grouped into three categories, all involving 1) nonatomic sequences, 2) writes, and 3) global variables. We will classify I/O ports as global variables for the consideration of critical sections. We will group registers into the same category as local variables because each thread will have its own registers and stack.
The first group is the read-modify-write sequence:
1. The software reads the global variable producing a copy of the data
2. The software modifies the copy (at this point the original variable is still unmodified)
3. The software writes the modification back into the global variable.
In the second group, we have a write followed by read, where the global variable is used for temporary storage:
1. The software writes to the global variable (this becomes the only copy of the information)
2. The software reads from the global variable expecting the original data to still be there.
In the third group, we have a non-atomic multi-step write to a global variable:
1. The software writes part of the new value to a global variable
2. The software writes the rest of the new value to a global variable.
Observation: When considering reentrant software and vulnerable windows we classify accesses to I/O ports the same as accesses to global variables.
Observation: Any variable larger than 16 bits on the 9S12 will require at least two instructions to read or write it.
Observation: Sometimes we store temporary information in global variables out of laziness. This practice is to be discouraged because it wastes memory and may cause the module to not be reentrant.
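The first group (read-modify-write) can be made concrete with a small host-side sketch in C. This is not 9S12 code: the interrupt is only simulated by calling a function pointer inside the vulnerable window, and the names Counter, IncrementNonatomic, and FakeISR are hypothetical, chosen just for this illustration.

```c
#include <stdint.h>
#include <stddef.h>

uint16_t Counter;                 // shared global variable (hypothetical name)

// Read-modify-write done in three separate steps, like ldx/inx/stx.
// 'interrupt' stands in for an ISR that fires inside the vulnerable window;
// pass NULL to model the case where no interrupt occurs.
void IncrementNonatomic(void (*interrupt)(void)){
  uint16_t copy = Counter;        // 1. read the global, producing a copy
  copy = copy + 1;                // 2. modify the copy (global still unmodified)
  if(interrupt != NULL){
    interrupt();                  // simulated interrupt between steps 2 and 3
  }
  Counter = copy;                 // 3. write the modification back
}

// Simulated ISR that performs its own complete increment.
void FakeISR(void){
  IncrementNonatomic(NULL);
}
```

Starting from Counter equal to 0, calling IncrementNonatomic(FakeISR) performs two increments but leaves Counter at 1, because the outer call writes back a stale copy and the increment made by the simulated ISR is lost.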
Sometimes we can have a critical section between two different software functions (one function called by one thread, and another function called by a different thread). In addition to the above three cases, a non-atomic multi-step read will be critical when paired with a multi-step write. For example, assume a data structure has multiple components (on the 9S12, any variable larger than 16 bits is considered to have multiple components). In this case, the write to the data structure will be atomic because it occurs in an ISR, running with I = 1. The critical section exists in the foreground between steps 1 and 2. In this case, a critical section exists even though no software has actually been reentered.
Foreground Thread                             Background Thread
1. The software reads some of the data        1. The ISR writes to the data structure
2. The software reads the rest of the data
In a similar case, a non-atomic multi-step write will be critical when paired with a multi-step read. Again, assume a data structure has multiple components. In this case, the read from the data structure will be atomic because it occurs in an ISR, running with I = 1. The critical section exists in the foreground between steps 1 and 2.
Foreground Thread                             Background Thread
1. The software writes some of the data       1. The ISR reads from the data structure
2. The software writes the rest of the data
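The two foreground/background pairings above can be demonstrated on a desktop compiler with the following sketch, shown here for the multi-step read paired with an ISR write. It assumes a 32-bit value held as two 16-bit components and simulates the interrupt with a function pointer; Time32, SharedTime, TimerISR, and ReadTime are hypothetical names, not part of the book's software.

```c
#include <stdint.h>
#include <stddef.h>

// A 32-bit value stored as two 16-bit components, as on a 16-bit machine.
typedef struct { uint16_t high; uint16_t low; } Time32;  // hypothetical type

Time32 SharedTime;                // written by the (simulated) ISR

// Simulated ISR: advances the shared time by one, carrying into the high half.
// On the real machine this runs with I = 1, so it is atomic.
void TimerISR(void){
  SharedTime.low++;
  if(SharedTime.low == 0){
    SharedTime.high++;
  }
}

// Foreground two-step read. 'interrupt' models an ISR arriving between
// step 1 and step 2; pass NULL for an uninterrupted read.
uint32_t ReadTime(void (*interrupt)(void)){
  uint16_t h = SharedTime.high;   // 1. read some of the data
  if(interrupt != NULL){
    interrupt();                  // vulnerable window
  }
  uint16_t l = SharedTime.low;    // 2. read the rest of the data
  return ((uint32_t)h << 16) | l;
}
```

If SharedTime holds 0x0000FFFF and the simulated ISR runs inside the window, ReadTime returns 0x00000000, which is neither the old value nor the new value 0x00010000, even though no function was reentered.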
When multiple threads are active, it is possible for two threads to be executing the same program. For example, the system may be running in the foreground and call Func. Partway through the execution of Func, an interrupt occurs. If the ISR also calls Func, two threads are simultaneously executing the function. To experimentally determine if a function has been reentered, we could use two output pins. We increment the port at the start of the function and decrement it at the end. The function has been reentered if the port value goes above 1, as shown in Program 12.2. In this example, Port T is not part of the original code, but rather is used just for the purpose of debugging. PT0 is 1 when one thread starts executing the function. However, if PT1 becomes 1, then the function has been reentered.
Program 12.2 Detection of re-entrant behavior using two output bits.
; subroutine to be tested
Func inc  PTT
     ; the function
     dec  PTT
     rts

// function to be tested
void Func(void){
  PTT++;
  // the function
  PTT--;
}
Checkpoint 12.2: What does it mean if both PT1 and PT0 are 1?
If critical sections do exist, we can either eliminate them by removing the access to the global variable or implement mutual exclusion, which simply means only one thread at a time is allowed to execute in the critical section. In general, if we can eliminate the global variables, then the subroutine becomes reentrant. Without global variables there are no “vulnerable” windows because each thread has its own registers and stack. Sometimes one must access global memory to implement the desired function. Remember that all I/O ports are considered global. Furthermore, global variables are necessary to pass data between threads. A simple way to implement mutual exclusion is to disable interrupts while executing the critical section. It is important to disable interrupts for as short a time as possible, so as to minimize the effect on the dynamic performance of the other threads. While we are running with interrupts disabled, time-critical events like power failure and danger warnings cannot be processed. Notice also that the interrupts are not simply disabled then enabled. Before the critical section, the interrupt status is saved, and the interrupts disabled. After the critical section, the interrupt status is restored. You cannot save the interrupt status in a global variable; rather, you should save it either on the stack or in a register. In assembly, we can use the following skeleton to implement mutual exclusion and eliminate the critical section:
     pshc   ;save CCR
     sei    ;disable interrupts
     ;execute the critical section
     pulc   ;restore I bit to its original value
In C, we can use a similar approach to implement mutual exclusion and eliminate the critical section:
void function(void){
  char saveCCR;
  asm tpa            // previous interrupt enable
  asm staa saveCCR   // save previous
  asm sei            // make atomic
  // execute the critical section
  asm ldaa saveCCR   // recall previous
  asm tap            // end critical section
}
Checkpoint 12.3: Consider the situation of nested critical sections. For example, a function with a critical section calls another function that also has a critical section. What would happen if you simply added a sei at the beginning and a cli at the end of each critical section?
Reentrant programming is very important when writing high-level language software too. Obviously, we minimize the use of global variables. But when global variables are necessary, we must be able to recognize potential sources of bugs due to nonreentrant code. We must study the assembly language output produced by the compiler. For example, we cannot determine whether the following read-modify-write operation is reentrant without knowing if it is atomic:
     time++;
If the compiler generates the following object code, then time++; is atomic (therefore not critical)
     inc  time
If the compiler generates the following object code, then time++; is not atomic (therefore critical)
     ldd  time
     addd #1
     std  time
Observation: A good compiler generates atomic code when setting or clearing individual bits in the I/O ports. E.g., PTT &= ~0x40; and PTH |= 0x20; will be compiled as bclr PTT,#$40 and bset PTH,#$20.
Another category of timing-dependent bugs, similar to critical sections, is called a race condition. A race condition occurs in a multi-threaded environment when there is a causal dependency between two or more threads. In other words, different behavior occurs depending on the order of execution of two threads. In this first example, thread1 initializes DDRT = 0xF0 because it uses PT7 to PT4 as outputs, and thread2 initializes DDRT = 0x0F because it uses PT3 to PT0 as outputs. In particular, if thread1 initializes first and thread2 initializes second, then PT7 to PT4 will be set to inputs. Conversely, if thread2 initializes first and thread1 initializes second, then PT3 to PT0 will be set to inputs. In a second example, assume two threads are trying to get data from the same input device. When data arrives at the input, the thread that executes first will capture the data.
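The DDRT example can be tried on a desktop compiler with the sketch below. DDRT_Sim is an ordinary variable standing in for the real direction register, and the four function names are hypothetical. The "friendly" versions, which modify only the bits a thread owns, are one common way to remove the order dependence; they are offered as an illustration rather than as the book's prescribed fix.

```c
#include <stdint.h>

uint8_t DDRT_Sim;   // stand-in for the DDRT register (hypothetical name)

// Unfriendly: each thread overwrites the whole register, so the thread
// that runs second undoes the work of the thread that ran first.
void Thread1_InitUnfriendly(void){ DDRT_Sim = 0xF0; }
void Thread2_InitUnfriendly(void){ DDRT_Sim = 0x0F; }

// Friendly: each thread changes only the bits it owns, so the final
// value does not depend on which thread initializes first.
void Thread1_InitFriendly(void){ DDRT_Sim |= 0xF0; }  // PT7 to PT4 outputs
void Thread2_InitFriendly(void){ DDRT_Sim |= 0x0F; }  // PT3 to PT0 outputs
```

On the real port, the |= operation is itself a read-modify-write, so it removes the race only if it is atomic (for example, compiled to a single bset) or is protected by disabling interrupts.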
12.3 Interthread Communication and Synchronization
For regular function calls we use the registers and stack to pass parameters, but interrupt threads have logically separate registers and stack. In particular, all registers are automatically saved by the microcomputer as it switches from main program (foreground thread) to interrupt service routine (background thread). The rti instruction will restore the registers
(including the interrupt enable bits and the PC) back to their previous values. Thus, all parameter passing must occur through shared global memory. One cannot pass data from the main program to the interrupt service routine using registers or the stack. The classic producer/consumer problem has two threads. One thread produces data and the other consumes data. For an input device, the background thread is the producer because it generates new data, and the foreground thread is the consumer because it uses the data up. For an output device, the data flows in the other direction so the producer/consumer roles are reversed.
12.3.1 Mailbox
Figure 12.2 A mailbox can be used to pass data between threads.
One simple interthread communication scheme is the mailbox. Figure 12.2 illustrates an input device interfaced using interrupt synchronization. The big arrow in this figure signifies the communication/synchronization link between the background and foreground. The mailbox structure is implemented with two global variables. RxMail contains data, and RxStatus is a flag specifying whether the mailbox is full or empty. The interrupt is requested when its trigger flag is set, signifying new data is ready from the input device. The ISR will read the data from the input device and store it in the global variable RxMail, then update its status as full. The main program will perform other calculations, while occasionally checking the status of the mailbox. When the mailbox has data, the main program will process it. This approach is adequate for situations where the input bandwidth is slow compared to the software processing speed. It is also possible to process the data within the ISR itself, and just report the results of the processing to the main program using the mailbox.
[Figure 12.2 flowcharts. Main program: perform other calculations, then test RxStatus; if empty, continue with other calculations; if full, process RxMail and set RxStatus=empty. ISR: read data from input, set RxMail=data and RxStatus=full, then rti.]
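The mailbox described above can be sketched in C as follows. RxMail and RxStatus are the two globals named in the text; the function names, the MAIL_EMPTY/MAIL_FULL constants, and the idea of passing the input data as a parameter (instead of reading a device register) are assumptions made so the sketch can run on any C compiler.

```c
#include <stdint.h>

#define MAIL_EMPTY 0              // hypothetical flag values
#define MAIL_FULL  1

volatile uint8_t RxMail;                 // data passed from the ISR to main
volatile uint8_t RxStatus = MAIL_EMPTY;  // flag: is the mailbox full or empty?

// Body of the input ISR. 'data' stands in for the value the real ISR
// would read from the input device.
void Mailbox_ISR(uint8_t data){
  RxMail = data;                  // RxMail=data
  RxStatus = MAIL_FULL;           // RxStatus=full
}

// Called occasionally by the main program between other calculations.
// Returns 1 and stores the data at *pt if the mailbox was full,
// returns 0 if the mailbox was empty.
int Mailbox_Check(uint8_t *pt){
  if(RxStatus == MAIL_EMPTY){
    return 0;
  }
  *pt = RxMail;                   // process RxMail
  RxStatus = MAIL_EMPTY;          // RxStatus=empty
  return 1;
}
```

If Mailbox_ISR runs twice before the main program calls Mailbox_Check, the first byte is overwritten and lost, which is why this scheme suits only inputs that are slow compared to the software processing speed.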
One way to visualize the interrupt synchronization is to draw a state versus time plot of the activities of the hardware, the mailbox, and the two software threads. Figure 12.3 shows that at time (a) the mailbox is empty, the input device is idle and the main program is performing other tasks, because mailbox is empty. When new input data is ready, the trigger flag will be set and an interrupt will be requested. At time (b) the ISR reads data from input device and saves it in RxMail, then it sets RxStatus to full. At time (c) the main program recognizes RxStatus is full. At time (d) the main program processes data from RxMail, sets RxStatus to empty. Notice that even though there are two threads, only one is active at a time. The interrupt hardware switches the processor from the main program to the ISR, and the rti instruction switches the processor back.
Figure 12.3 Hardware/software timing of an input interface using a mailbox.
[Timing diagram with traces for the input device (trigger set), the interrupt service routine (ending in rti), the main program, and RxStatus (empty/full), marked with times a through d.]
12.3.2 Producer Consumer Problem
The first in first out circular queue (FIFO) and double buffer are useful for data flow situations, as shown in Figure 12.4. These data structures can be used to link a source process (the producer is hardware/software that generates data) to a sink process (the consumer is hardware/software that consumes data). In both cases the data is order-preserving, such that the order in which data is saved equals the order in which it is retrieved. There are many producer-consumer applications. In Table 12.1 the activities on the left are producers that create or input data, while the activities on the right are consumers that process or output data.
Figure 12.4 FIFO queues and double buffers can be used to pass data from a producer to a consumer.
[Block diagram: source process (producer) -> Put -> FIFO or double buffer -> Get -> sink process (consumer).]
Table 12.1 Producer-consumer examples.
Source/Producer                Sink/Consumer
Keyboard input                 Program that interprets
Program with data              Printer output
Program sends message          Program receives message
Microphone and ADC             Program that saves sound data
Program that has sound data    DAC and speaker
The source process puts data into the FIFO or double buffer. If there is room, the Put operation saves data in the structure. If the data structure is full and the user tries to put, the Put routine will return a full error signifying the last (newest) data was not properly saved. The sink process removes data from the FIFO or double buffer. After a Get, the particular information returned from the Get routine is no longer saved. If the structure is empty and the user tries to get, the Get routine will return an empty error signifying no data could be retrieved. The FIFO and double buffer are order preserving, such that the information is returned by repeated calls of Get in the same order as the data was saved by repeated calls of Put. A FIFO typically can store many small chunks of data, whereas a double buffer can store two large fixed-size blocks of data.
Checkpoint 12.4: What conditions might cause the FIFO to become full?
The first in first out circular queue (FIFO) is quite useful for implementing a buffered I/O interface. It can be used for both buffered input and buffered output. The order preserving data structure temporarily saves data created by the source (producer) before it is processed by the sink (consumer). After initialization, the FIFO has two functions: Put (enters new data) and Get (removes the oldest data). You have probably already experienced the convenience of FIFOs. For example, when using an editor, you can continue to type characters while other processing is occurring. The ASCII codes are input from the keyboard as they are typed and put in a FIFO. When the editor is active again, it gets more keyboard data to process. A FIFO is also used when you ask the computer to print a file.
Rather than waiting for the actual printing to occur character by character, the print command will put the data in a FIFO. Whenever the printer is free, it will get data from the FIFO. The advantage of the FIFO is it allows you to continue to use your computer while the printing occurs in the background. To implement this magic of background printing we will need interrupts. Figure 12.5 shows a data flow graph with buffered input and buffered output. FIFOs used in this book will be statically allocated global structures. Because they are global variables, it means they will exist permanently and can be carefully shared by more than one program. The advantage of using a FIFO structure for a data flow problem is that we can decouple the producer and consumer threads. Without the FIFO we would have to produce one piece of data, then process it, produce another piece of data, then process it, as described in Figures 12.2 and 12.3. With the FIFO, the producer thread can continue to produce data without having to wait for the consumer to finish processing the previous data. This decoupling can significantly improve system performance.
Figure 12.5 A data flow graph showing two FIFOs that buffer data between producers and consumers.
[Data flow graph. Input path: SCI input -> RDRF ISR (producer) -> RxFifo_Put -> RxFifo -> RxFifo_Get -> SCI_InChar -> main (consumer). Output path: main (producer) -> SCI_OutChar -> TxFifo_Put -> TxFifo -> TxFifo_Get -> TDRE ISR (consumer) -> SCI output.]
Let $t_p$ be the time (in sec) between calls to Put, and $r_p$ be the arrival rate (in bytes/sec) into the system. Similarly, let $t_g$ be the time (in sec) between calls to Get, and $r_g$ be the service rate (in bytes/sec) out of the system.
$$r_g = \frac{1}{t_g} \qquad r_p = \frac{1}{t_p}$$
If the minimum time between Put’s is greater than or equal to the maximum time between Get’s,
$$\min t_p \ge \max t_g$$
then a FIFO is not necessary and the data flow problem could be solved with a simple mailbox. On the other hand, if the time between Put’s temporarily becomes less than the time between Get’s because either
䡲 The arrival rate temporarily increases
䡲 The service rate temporarily decreases
then information will be collected in the FIFO. For example, a person might type very fast for a while followed by a long pause. The FIFO could be used to capture without loss all the data as it comes in very fast. Clearly, on average the system must be able to process the data (the sink process) at least as fast as the average rate at which the data arrives. If the average input rate is larger than the average output rate,
$$r_p > r_g$$
then the FIFO will eventually overflow no matter how large the FIFO. If $r_p$ is temporarily high or $r_g$ is temporarily low, and that causes the FIFO to become full, then this problem can be solved by increasing the FIFO size.
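The difference between a temporary burst and a sustained overload can be explored with a small simulation. The sketch below is only a model under simple assumptions: time is divided into equal steps, the arrays give how many bytes arrive and how many can be serviced in each step, and the function name Fifo_Simulate and its parameters are hypothetical.

```c
#include <stdint.h>

// Simulate n time steps. In step i the producer tries to Put arrivals[i]
// bytes, then the consumer Gets up to services[i] bytes.
// Returns the number of bytes lost to full errors for a FIFO that can hold
// 'capacity' bytes; *peak receives the largest occupancy observed.
uint32_t Fifo_Simulate(const uint8_t arrivals[], const uint8_t services[],
                       uint32_t n, uint32_t capacity, uint32_t *peak){
  uint32_t count = 0;             // bytes currently in the FIFO
  uint32_t lost = 0;              // bytes rejected because the FIFO was full
  *peak = 0;
  for(uint32_t i = 0; i < n; i++){
    for(uint8_t k = 0; k < arrivals[i]; k++){
      if(count < capacity){
        count++;                  // Put succeeds
      }else{
        lost++;                   // full error
      }
    }
    if(count > *peak){
      *peak = count;
    }
    if(services[i] < count){
      count -= services[i];       // Get removes the oldest bytes
    }else{
      count = 0;                  // FIFO drained; further Gets would be empty errors
    }
  }
  return lost;
}
```

For a burst of 3, 3, 0, 0, 0, 0 arrivals against a steady service of 1 byte per step, a capacity of 8 loses nothing (the peak occupancy is 5), while a capacity of 4 loses one byte. With 2 arrivals in every step against the same service rate, the occupancy grows by one byte per step, so any finite capacity is eventually exceeded.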
There is a fundamental difference between an empty error and a full error. Consider the application of using a FIFO between your computer and its printer. This is a good idea because the computer generates data to be printed at a very high rate followed by long pauses. The printer is like a turtle. It can print at a slow but steady rate. The computer will Put a byte into the FIFO that it wants printed. The printer will Get a byte out of the FIFO when it is ready to print another character. A full error occurs when the computer calls Put at too fast a rate. A full error is serious; either data will be lost or the rest of the computer pauses, waiting for there to be room in the FIFO. On the other hand, an empty error occurs when the printer is ready to print but the computer has nothing in mind. An empty error is not serious. If the FIFO is empty the printer just shuts itself off and does nothing.
Checkpoint 12.5: If the FIFO becomes full, can the situation be solved by increasing the size?
An input interface using interrupts, but without a FIFO, isn’t any better than a busy-waiting solution. If the next input data arrives before the previous data is processed, then data will be lost. When the I/O bandwidth is fast or unpredictable, it is appropriate to pass data from the producer thread to the consumer thread using a first in first out queue (FIFO). The FIFO will buffer the data between the foreground and background. The presence of the FIFO placed between the producer and consumer greatly improves performance by reducing the time each waits for the other. Figure 12.6 shows an I/O system that uses interrupts for both input and output. When the main program wishes to output, it calls OutChar, which will put the data in the TxFifo and arm the output device. When the main program wishes to input, it calls InChar, which will get data from the RxFifo. This example has been implemented at tut4 as part of the TExaS system.
Figure 12.6 FIFO queues can be used to pass data between threads.
[Four flowcharts. InChar: wait while RxFifo is empty, then RxGetFifo and rts. Input ISR: read data from input; if RxFifo is full, ERROR; if not full, RxPutFifo; then rti. OutChar: wait while TxFifo is full, then TxPutFifo, arm output, and rts. Output ISR: if TxFifo is empty, disarm output; if not empty, TxGetFifo and write data to output; then rti.]
Observation: For systems with interrupt-driven I/O on multiple devices, there will be a separate FIFO for each device.
The incoming serial data will set the input trigger, requesting an interrupt. The ISR (background) will accept the data and put it in the RxFifo. The RxFifo buffers data between the input hardware and the main program that processes the data. If the RxFifo becomes full, then data will be lost. FIFO full errors will always occur if the average input rate (number of bytes arriving per second from the input hardware) exceeds the average processing rate (number of bytes processed per second by the main program). In this situation, either the processing rate must be increased (by using a faster computer or by writing a better software processing algorithm), or the input rate must be decreased (by slowing down the arrival rate of data). The second way the RxFifo could become full is if there is a temporary increase
in the arrival rate or a temporary decrease in the process rate. For this situation, the full errors could be eliminated by increasing the size of the RxFifo. The output trigger occurs when the output device is idle, ready to output more data. If there is data available in the TxFifo, the ISR will get it and write it to the output device. If the TxFifo is empty, the output device is disarmed. The main program puts data in TxFifo and gets data from the RxFifo as desired. If the TxFifo becomes full, then it is appropriate to wait for the interrupts to make room. It is inefficient, but not catastrophic for the main program to wait on a full TxFifo. Efficiency can be improved for the buffered output problem by increasing the TxFifo size. It is also inefficient, but not catastrophic for the main program to wait on an empty RxFifo. Efficiency can be improved for the buffered input problem by performing other tasks while waiting for data. It is important to study the timing behavior of the I/O hardware and software processing when designing an interrupting interface. One simple way to study a problem is to measure the number of elements in the RxFifo when new data is entered by the input ISR. If the time for the software to read and process the data is much faster than the time for the input device to create new input, then there will be very few elements in the RxFifo. For most systems, the producer and consumer rates fluctuate, but during the times when the software waits for the I/O hardware, the system is classified as I/O bound. For an I/O bound input interface the RxFifo has either 0 or 1 entry, and the use of interrupts does not enhance the bandwidth over the busy-waiting implementations. Even with an I/O-bound input device however, it may be more efficient to utilize interrupts because it provides a straightforward approach to servicing multiple devices. 
If the input device generates a burst of high bandwidth activity, then there will be many elements in the RxFifo. As long as the interrupt service routine is fast enough to keep up with the input device and as long as the RxFifo does not become full, no data is lost. Recall the ISR doesn’t have to process the input data, just read it and save it in the RxFifo. In this situation, the overall bandwidth is higher than it would be with a busy-waiting implementation, because the input device does not have to wait for each data byte to be processed. This is the classic example of a “buffered” input, because data enters the system (via the interrupts), is temporarily stored in a buffer (put into the RxFifo), and is processed later (by the main program, which gets it from the RxFifo). During the times when the I/O device is faster than the software, the system is called CPU-bound. A system will work if the producer rate only temporarily exceeds the consumer rate (a short burst of high bandwidth input). If the external device sustained the high bandwidth input rate, then the RxFifo would become full and data would be lost. For an output device, we will count the number of elements in the TxFifo when data is removed by the output ISR. If the rate for the software to generate new data is much slower than the rate for the output device to send data, then there will be very few elements in the TxFifo. During this time the system is called CPU-bound. In this situation, the TxFifo has either 0 or 1 entry, and the use of interrupts does not enhance the bandwidth over the busy-waiting implementations. Even with a CPU-bound output device, however, it may be more efficient to utilize interrupts because they provide a straightforward approach to servicing multiple devices. If the main program generates a burst of output activity, then there will be many elements in the TxFifo.
In this situation, the overall bandwidth is higher than it would be with a busy-waiting implementation, because the main program does not have to wait for each data byte to be output. This is the classic example of a “buffered” output, because data enters the system (via the main program), is temporarily stored in a buffer (put into the TxFifo), and is processed later (by the output ISR, which gets it from the TxFifo). During the time when the main program is faster than the output hardware, the system is called I/O-bound. Just like the input situation, a system will work if the producer rate only temporarily exceeds the consumer rate. If the main program sustained the high output rate, then the TxFifo would become full and the main program would then have to wait. Again, the output situation is most efficient if the TxFifo is big enough to avoid full errors.
12.3.3 FIFO Queue Implementation
There are many ways to implement a statically allocated FIFO. We can use either a pointer or an index to access the data in the FIFO. We can use either two pointers (or two indices), or two pointers (or two indices) and a counter. The counter specifies how many entries are currently stored in the FIFO. There are even hardware implementations of FIFO queues. If we were to have infinite memory, as shown in Figure 12.7, a FIFO implementation is easy. GetPt points to the data that will be removed by the next call to Fifo_Get, and PutPt points to the empty space where the data will be stored by the next call to Fifo_Put. Program 12.3 presents the basic idea of a pointer-based FIFO implementation. To put data in the FIFO, the new data is stored at PutPt, then this pointer is incremented. To get data from the FIFO, the value at GetPt is read, then this pointer is incremented.
Figure 12.7 The FIFO implementation with infinite memory.
[Diagram of RxFifo in memory: GetPt points to the oldest valid data and PutPt points to the first empty location after the valid data.]
Program 12.3 Code fragments showing the basic idea of a FIFO.
;Reg A is data to put into the FIFO
Fifo_Put ldx  PutPt
         staa 1,X+   ;store into FIFO
         stx  PutPt  ;update pointer
         rts
;Reg A returned with byte from FIFO
Fifo_Get ldx  GetPt
         ldaa 1,X+   ;read from FIFO
         stx  GetPt  ;update
         rts
There are three modifications that are required to these functions. If the FIFO is full when Fifo_Put is called then the subroutine should return a full error. Similarly, if the FIFO is empty when Fifo_Get is called, then the subroutine should return an empty error. There is never an infinite amount of memory, so a finite number of bytes will be permanently allocated to the FIFO. Figures 12.8 and 12.9 show an example with 10 bytes allocated. The PutPt and GetPt must be wrapped back up to the top when they reach the bottom. The shaded blocks in these two figures represent valid data saved in the FIFO. Figure 12.8 shows how the FIFO changes as four bytes are Put into it. Figure 12.9 shows the same FIFO as Fifo_Get is called four times. Observe the order-preserving nature of the FIFO.
Figure 12.8 The FIFO Put operation showing the pointer wrap.
[Five snapshots of the 10-byte FIFO separated by four Put operations; PutPt advances past the newest data after each Put and wraps from the bottom of the buffer to the top, while GetPt remains at the oldest data.]
Figure 12.9 The FIFO Get operation showing the pointer wrap.
[Five snapshots of the same FIFO separated by four Get operations; GetPt advances past the oldest data after each Get and wraps from the bottom of the buffer to the top, while PutPt remains after the newest data.]
There are two mechanisms to determine whether the FIFO is empty or full. A simple method is to implement a counter containing the number of bytes currently stored in the FIFO. Fifo_Get would decrement the counter and Fifo_Put would increment the counter. The second method, shown in Figure 12.10 and Program 12.4, is to prevent the FIFO from being completely full. For example, if the FIFO had 10 bytes allocated, then the Fifo_Put subroutine would allow a maximum of 9 bytes to be stored. If there were already 9 bytes in the FIFO and another Fifo_Put were called, then the FIFO would not be modified and a full error would be returned. In this way if PutPt equals GetPt at the beginning of Fifo_Get, then the FIFO is empty. Similarly, if PutPt+1 equals GetPt at the beginning of Fifo_Put, then the FIFO is full. Be careful to wrap the PutPt+1 before comparing it to GetPt. This second method does not require the length to be stored or calculated. The FIFO global structures must be allocated in RAM. PutPt and GetPt are private, and not accessible by programs outside the FIFO module.
Figure 12.10 Flowcharts of the put and get operations.
[Fifo_Put: tempPt = PutPt; store data at tempPt; tempPt++; if tempPt is beyond the buffer, reset tempPt; if tempPt = GetPt the FIFO is full, return(0); otherwise PutPt = tempPt and return(1). Fifo_Get: if GetPt = PutPt the FIFO is empty, return(0); otherwise retrieve data at GetPt; GetPt++; if GetPt is beyond the buffer, reset GetPt; return(1).]
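The first mechanism, the counter, is not the one used in Program 12.4, so a minimal sketch of it in C may help for comparison. It uses indices rather than pointers, and every name (CFifo, CPutI, CGetI, CCount, and the CFifo_ functions) is hypothetical, chosen to avoid confusion with the book's Fifo module.

```c
#include <stdint.h>

#define CFIFO_SIZE 10             // with a counter, all 10 bytes are usable

uint8_t  CFifo[CFIFO_SIZE];       // statically allocated storage
uint16_t CPutI;                   // index where the next Put will store
uint16_t CGetI;                   // index of the oldest data
uint16_t CCount;                  // number of bytes currently stored

void CFifo_Init(void){
  CPutI = 0;
  CGetI = 0;
  CCount = 0;
}

// Returns 1 if the data was saved, 0 if the FIFO was full.
int CFifo_Put(uint8_t data){
  if(CCount == CFIFO_SIZE){
    return 0;                     // full error, FIFO not modified
  }
  CFifo[CPutI] = data;
  CPutI = (CPutI + 1) % CFIFO_SIZE;   // wrap the index
  CCount++;                       // read-modify-write of a shared global
  return 1;
}

// Returns 1 and stores the oldest byte at *pt, or 0 if the FIFO was empty.
int CFifo_Get(uint8_t *pt){
  if(CCount == 0){
    return 0;                     // empty error
  }
  *pt = CFifo[CGetI];
  CGetI = (CGetI + 1) % CFIFO_SIZE;   // wrap the index
  CCount--;                       // read-modify-write of a shared global
  return 1;
}
```

The trade-off is that CCount is modified by both Put and Get. If one is called from an ISR and the other from the main program, the increment and decrement are read-modify-write accesses to a shared global and would need to be atomic or protected by disabling interrupts, whereas in the two-pointer method each pointer is written by only one of the two routines.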
The initialization function, Fifo_Init, is usually called once at the start of the system. The FIFO is empty if the PutPt equals the GetPt. Both pointers should always address locations within the 10-byte allocated area. The Fifo_Put routine enters new data in the FIFO. To check for FIFO full, the Fifo_Put routine attempts to put using a temporary pointer. If putting makes the FIFO look empty, then the temporary pointer is discarded and the routine is exited without saving the data. This is why a FIFO with 10 allocated bytes can only hold 9 data points. If putting doesn’t make the FIFO look empty, then the temporary pointer is stored into the actual PutPt saving the data as desired. The Fifo_Get routine removes the oldest data from the FIFO. To check for FIFO empty, the Fifo_Get routine simply checks to see if GetPt equals PutPt. If they match at the start of the routine, then Fifo_Get returns with the “empty” condition signified. Otherwise, the information is retrieved from the FIFO. The GetPt is incremented signifying that information is no longer in the FIFO. If incrementing GetPt makes the pointer go beyond the FIFO buffer, the pointer is wrapped back to the beginning.
Program 12.4 Implementation of a two-pointer FIFO.
FIFO_SIZE equ  10
PutPt     rmb  2
GetPt     rmb  2
Fifo      rmb  FIFO_SIZE
Fifo_Init movw #Fifo,PutPt
          movw #Fifo,GetPt
          rts
; Input  RegA data to put
; Output RegB 1=OK, 0=full
Fifo_Put  pshx
          ldx  PutPt     ;Temporary
          staa 1,x+      ;Try to put
          cpx  #Fifo+FIFO_SIZE
          bne  skip
          ldx  #Fifo     ;Wrap
skip      clrb
          cpx  GetPt     ;Full if same
          beq  ok
          incb           ;1 means OK
          stx  PutPt
ok        pulx
          rts
; Input  none
; Output RegA data from Get
;        RegB 1=ok, 0=empty
Fifo_Get  pshx
          clrb
          ldx  GetPt
          cpx  PutPt     ;Empty?
          beq  done
          incb           ;1=OK
          ldaa 1,x+      ;Data
          cpx  #Fifo+FIFO_SIZE
          bne  no        ;wrap?
          ldx  #Fifo     ;yes
no        stx  GetPt
done      pulx
          rts
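For readers who prefer C, the two-pointer scheme described above might look like the following sketch. It follows the same steps as the flowcharts (try to put with a temporary pointer, wrap, then compare against GetPt), but it is a translation written for this illustration, with return values and parameter passing chosen as assumptions rather than taken from the book.

```c
#include <stdint.h>

#define FIFO_SIZE 10              // holds at most FIFO_SIZE-1 bytes

uint8_t Fifo[FIFO_SIZE];          // statically allocated storage
uint8_t *PutPt;                   // where the next Put will store
uint8_t *GetPt;                   // oldest data, removed by the next Get

void Fifo_Init(void){
  PutPt = &Fifo[0];
  GetPt = &Fifo[0];               // PutPt equal to GetPt means empty
}

// Returns 1 if the data was saved, 0 if the FIFO was full.
int Fifo_Put(uint8_t data){
  uint8_t *tempPt = PutPt;        // temporary pointer
  *tempPt = data;                 // try to put
  tempPt++;
  if(tempPt == &Fifo[FIFO_SIZE]){
    tempPt = &Fifo[0];            // wrap
  }
  if(tempPt == GetPt){
    return 0;                     // would look empty, so it is full: discard
  }
  PutPt = tempPt;                 // commit, data is now in the FIFO
  return 1;
}

// Returns 1 and stores the oldest byte at *datapt, or 0 if the FIFO was empty.
int Fifo_Get(uint8_t *datapt){
  uint8_t *tempPt = GetPt;
  if(tempPt == PutPt){
    return 0;                     // empty
  }
  *datapt = *tempPt;              // retrieve the oldest data
  tempPt++;
  if(tempPt == &Fifo[FIFO_SIZE]){
    tempPt = &Fifo[0];            // wrap
  }
  GetPt = tempPt;                 // data is no longer in the FIFO
  return 1;
}
```

After Fifo_Init, nine calls to Fifo_Put succeed and the tenth returns 0, matching the statement that 10 allocated bytes hold at most 9 data points. Note that Fifo_Put writes only PutPt and Fifo_Get writes only GetPt.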
12.3.4 Double Buffer
A double buffer is two buffers of fixed size. One example that uses a double buffer is a disk. Consider the situation where a large amount of data is to be read from a disk. The disk is organized into fixed-size blocks. The size of each of the two buffers will match the block size of the disk. In the situation shown in Figure 12.11, the hardware is reading data from the disk, filling Buf1. The hardware is configured to read an entire block. During this time the software is reading the data previously stored in Buf2. The double buffer will preserve order. This means the order in which the characters are input from the disk is the same as the order in which they are processed by the software.
Figure 12.11 A double buffer allows you to store data into one buffer at the same time as retrieving data from the other buffer. [Figure: data read from the disk fills Buf1 while the software processes the data in Buf2.]
The differences between a FIFO queue and a double buffer are data size and queue length. The data size of a FIFO is typically one or two bytes. This means that one puts and gets single bytes into and out of the FIFO queue. The data size of the double buffer is typically large (e.g., 80, 256, or 1024 bytes). This means that one always saves and removes big blocks into and out of the double buffer. The FIFO queue length is large (typically ranging from 16 to 60000 bytes). The double buffer has exactly 2 buffers. When the software finishes processing Buf2 and the hardware finishes filling Buf1, the buffers are switched (the hardware fills Buf2 and the software processes Buf1). This means if the hardware finishes first, then the disk hardware will have to be paused. Maximum disk efficiency occurs only if the disk can continuously read data as the blocks pass under the read head.
I/O devices which manipulate data in fixed-size blocks are candidates for using double buffer data structures. Other examples of such devices include graphics displays, bar code scanners, UPC readers, credit card readers, and IR receivers. A graphics display uses two buffers called a front buffer and a back buffer. The graphics hardware uses the front buffer to create the visual image on the display, i.e., the front buffer contains the data that you see. The software uses the back buffer to create a new image, i.e., the back buffer contains the data that you will see next. When the new image is ready, and the time is right, the two buffers are switched (the front becomes the back and the back becomes the front).
In this way, the user never sees a partially drawn image.
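The buffer switch described above can be sketched in C by exchanging two pointers rather than copying data. This is only an illustration: BLOCK_SIZE, FillPt, ProcessPt, and the function names are hypothetical, and a real driver would also need flags to tell when filling and processing are both complete.

```c
#include <stdint.h>

#define BLOCK_SIZE 256                 /* assumed block size */

static uint8_t Buf1[BLOCK_SIZE];
static uint8_t Buf2[BLOCK_SIZE];
static uint8_t *FillPt = Buf1;         /* buffer being filled by the hardware */
static uint8_t *ProcessPt = Buf2;      /* buffer being processed by the software */

uint8_t *DoubleBuffer_FillBuffer(void) { return FillPt; }
uint8_t *DoubleBuffer_ProcessBuffer(void) { return ProcessPt; }

/* Exchange the roles of the two buffers. Call this only after the
   filling side and the processing side have both finished a block. */
void DoubleBuffer_Swap(void) {
  uint8_t *temp = FillPt;
  FillPt = ProcessPt;
  ProcessPt = temp;
}
```

Because only the pointers move, the switch takes the same short time regardless of the block size.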
12.4 Serial Port Interface Using Interrupt Synchronization
The objective of this section is to develop software to support bidirectional data transfer using interrupt synchronization, implementing the data flow graph shown previously in Figure 12.5. We could connect the microcontroller to another computer and use this channel to transfer data. For example, if we connect the DB9 cable to a serial port on a PC, we could run HyperTerminal on the PC and communicate with the microcontroller. The RS232 timing is generated automatically by the SCI. In order for data to be properly received, the baud rate must match that of the other module, which in this interface will be 9600 bits/sec. Initially, the two FIFOs are cleared, and just the receiver is armed; see Program 12.5. The transmitter will be armed within the SCI_OutChar routine when data is available. An interrupt occurs when new incoming data arrives in the receiver data register (RDRF = 1). An interrupt also occurs when the transmit data register is empty (TDRE = 1). TDRE is one when the output channel is idle and needs the software to supply additional data. Notice that the transmit channel is disarmed when the TxFifo is empty and rearmed when new data is put into the TxFifo. When the RxFifo becomes full, data is lost, but when the TxFifo becomes full, the main program simply waits for space to become available.
;9S12C32, 4MHz (9S12DP512 at 8 MHz)
;baud rate=9600
SCI_Init   jsr  RxFifo_Init   ;FIFO is empty
           jsr  TxFifo_Init   ;FIFO is empty
           movb #$2C,SCI0CR2  ;arm just RDRF
           movw #52,SCI0BD    ;(26 if 9S12C32)
           cli
           rts
* Inputs: none  Outputs: RegA is ASCII
SCI_InChar pshb
iloop      jsr  RxFifo_Get    ;B=0 if empty
           tbeq B,iloop
           pulb
           rts                ;A=character
* Inputs: RegA is ASCII  Outputs: none
SCI_OutCh  pshb               ;A=character
oloop      jsr  TxFifo_Put    ;save in FIFO
           tbeq B,oloop       ;B=0 if full
           movb #$AC,SCI0CR2  ;arm TDRE
           pulb
           rts
SCIhandler ldaa SCI0SR1
           bita #$20
           beq  CkTDRE        ;Not RDRF set
           ldaa SCI0DRL       ;ASCII character
           bsr  RxFifo_Put
CkTDRE     ldaa SCI0SR1
           bpl  sdone         ;Not TDRE set
           ldaa SCI0CR2       ;bit 7 is TIE
           bpl  sdone         ;disarmed?
           bsr  TxFifo_Get
           tbeq B,nomore
           staa SCI0DRL       ;start output
           bra  sdone
nomore     movb #$2C,SCI0CR2  ;disarm TDRE
sdone      rti
           org  $FFD6
           fdb  SCIhandler

// 9S12C32, (9S12DP512 at 8 MHz)
// 9600 bits/sec
void SCI_Init(void){
  RxFifo_Init();    // empty FIFOs
  TxFifo_Init();
  SCI0BD = 52;      // (26 if 9S12C32)
  SCI0CR1 = 0;      // M=0, no parity
  SCI0CR2 = 0x2C;   // enable, arm RDRF
  asm cli           // enable interrupts
}
// Input ASCII character from SCI
// spin if RxFifo is empty
char SCI_InChar(void){ char letter;
  while (RxFifo_Get(&letter) == 0){};
  return(letter);
}
// Output ASCII character to SCI
// spin if TxFifo is full
void SCI_OutChar(char data){
  while (TxFifo_Put(data) == 0){};
  SCI0CR2 = 0xAC;   // arm TDRE
}
#define TDRE 0x80
#define RDRF 0x20
// RDRF set on new receive data
// TDRE set on empty transmit register
interrupt 20 void SciHandler(void){ char data;
  if(SCI0SR1 & RDRF){
    RxFifo_Put(SCI0DRL);  // clears RDRF
  }
  if((SCI0CR2&0x80)&&(SCI0SR1&TDRE)){
    if(TxFifo_Get(&data)){
      SCI0DRL = data;     // clears TDRE
    }
    else{
      SCI0CR2 = 0x2c;     // disarm TDRE
    }
  }
}
Program 12.5 Assembly and C implementations of an interrupting SCI interface.
Observation: Data is lost when the RxFifo gets full.
Common Error: Notice that the above transmit device driver either acknowledges the interrupt by sending another character or disarms itself because the TxFifo is empty. The software will crash (infinite loop) if it returns from interrupt without acknowledging or disarming.
Checkpoint 12.6: Why didn’t the initialization software arm TDRE?
Checkpoint 12.7: What bad thing would happen if the RDRF ISR waited for there to be room in the RxFifo like SCI_OutChar waits for there to be room in the TxFifo?
Checkpoint 12.8: Modify Program 12.5 so the baud rate is 1200 bits/sec.
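The two SCI0BD values in Program 12.5 (52 with an 8 MHz clock, 26 with a 4 MHz clock) are consistent with a baud rate equal to the SCI clock divided by 16 times the divisor. Assuming that relationship holds, the divisor can be computed rather than looked up. The function name SCI_BaudDivisor is hypothetical and used only for this sketch.

```c
#include <stdint.h>

/* Divisor to write to SCI0BD, rounded to the nearest integer.
   Assumes baud rate = sciClockHz / (16 * divisor). */
uint16_t SCI_BaudDivisor(uint32_t sciClockHz, uint32_t baud) {
  return (uint16_t)((sciClockHz + 8u * baud) / (16u * baud));
}
```

For example, SCI_BaudDivisor(8000000, 9600) gives 52 and SCI_BaudDivisor(4000000, 9600) gives 26, matching the listing. The actual rate with a divisor of 52 is about 9615 bits/sec, which is within the tolerance that asynchronous serial links normally accept.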
Consider the interrupting serial port interface shown in Program 12.5, and the FIFO implementations in Program 12.4. Assume also that there is a TxFifo implementation separate from, but identical to, the RxFifo. When processing input, it is possible for the main program to start to execute RxFifo_Get, be interrupted, and have the ISR call RxFifo_Put. To verify correctness of our system, we notice nothing bad would happen (e.g., crash, lost data, extra data, etc.) if the RxFifo_Put subroutine were to execute in between any two assembly instructions of the RxFifo_Get routine. A similar consideration arises when processing outputs. In this situation, the main program may start to execute TxFifo_Put, be interrupted, and have the output interrupt routine call TxFifo_Get. Again, nothing bad happens if the TxFifo_Get subroutine is executed in between any two assembly instructions within the TxFifo_Put routine. If we are processing both input and output, then two FIFOs would be used. Each FIFO routine (RxFifo_Get, RxFifo_Put, TxFifo_Get, TxFifo_Put) is called from exactly one place in the software system. Even though the functions themselves are nonreentrant, the system has no critical sections because none of the individual functions will be reentered. They can interrupt each other, but not themselves. In conclusion, the FIFO routines as used in the system (you can run this system as tut4 on TExaS) have no critical sections. On the other hand, if the foreground and background threads both were to call the same FIFO function, then there would be a critical section.
12.5 *Distributed Systems
In this section, we will present three simple communication systems that utilize the SCI port. If the distances are short, half-duplex can be implemented with simple open collector or open-drain TTL-level logic. Open collector logic has two output states: low and off. In the off state the output is not driven high or low; it just floats. The 10 kΩ pull-up resistor will passively make the signal high if none of the open collector outputs are low. The 9S12 can make its TxD serial outputs be open collector. This mode allows a half-duplex network to be created without any external logic (although pull-up resistors are often used). Three factors will limit the implementation of this simple half-duplex network: (1) the number of nodes on the network, (2) the distance between nodes, and (3) the presence of corrupting noise. In these situations a half-duplex RS485 driver chip like the SP483 made by Sipex or Maxim can be used.
The first communication system is a master-slave configuration, where the master transmit output is connected to all slave receive inputs, as shown in Figure 12.12. This provides for broadcast of commands from the master. All slave transmit outputs are connected together using wire-or open collector logic, allowing the slaves to respond one at a time. The WOMS1 bit (WOMS bit 1) in the slaves should be set to activate open collector mode on PS1. The low-level device driver for this communication system was presented in the previous section. When the master performs SCI output it is broadcast to all the slaves. There can be no conflict when the master transmits, because a single output is connected to multiple inputs. When a slave receives input, it knows it is a command from the master. A potential problem exists because multiple slave transmitters are connected to the same signal. If the slaves only transmit after specifically being triggered by the master, no collisions can occur.
Figure 12.12 A master-slave network implemented with multiple microcomputers. [Figure: the master 9S12 TxD (PS1), a regular digital output, drives the RxD (PS0) inputs of all slaves; the slave TxD (PS1) open collector outputs are tied together with a 10 kΩ pull-up to +5 V and drive the master RxD (PS0); all grounds are connected.]
Checkpoint 12.9: What voltage level will the master RxD observe if two slaves simultaneously transmit, one making it a logic high and the other a logic low?
The next communication system is a ring network. This is the simplest distributed system to design, because it can be constructed using standard serial ports. In fact, we can build a ring network simply by chaining the transmit and receive lines together in a circle, as shown in Figure 12.13. Building a ring network is as simple as soldering an RS232 cable in a circle with one DB9 connector for each node. Messages will include source address, destination address, and information. If computer A wishes to send information to computer C, it sends the message to B. The software in computer B receives the message, notices it is not for itself, and resends the message to C. The software in computer C receives the message, notices it is for itself, and keeps the message. Although simple to build, this system has slow performance (response time and bandwidth), and it is difficult to add or remove nodes.
Figure 12.13 A ring network implemented with 3 microcomputers. [Figure: the TxD (PS1) of each 9S12 connects to the RxD (PS0) of the next, from A to B to C and back to A; all TxD are regular digital outputs and all grounds are connected.]
Checkpoint 12.10: Assume the ring network has 10 nodes, the baud rate is 100,000 bits/sec, and there are 10 bits/frame. What is the average time it takes to send a 10-byte message from one computer to another?
The third communication system is a very common approach to distributed embedded systems, called multi-drop, as shown in Figure 12.14. To transmit a byte to the other computers, the software activates the SP483 driver and outputs the frame. Since it is half-duplex, the frame is also sent to the receiver of the computer that sent it. This echo can be checked to see if a collision occurred (two devices simultaneously outputting). If more than two computers exist on the network, we usually send address information first, so that the proper device receives the data. The 6812 SCI has a status bit in the SCISR2 register called RAF that will be true if there is an incoming frame on the RxD line. Many collisions can be avoided by checking this bit before transmitting.
Figure 12.14 Two multi-drop networks implemented with 3 microcomputers. [Figure: in the first network, the open collector TxD (PS1) outputs and the RxD (PS0) inputs of computers A, B, and C share one line with a 10 kΩ pull-up to +5 V; in the second network, each computer connects TxD and RxD through an RS485 SP483 driver to a shared bus that continues to other nodes; all grounds are connected.]
Checkpoint 12.11: How can the transmitter detect that a collision has corrupted its output?
Checkpoint 12.12: How can the receiver detect that a collision has corrupted its input?
There are many ways to check for transmission errors. You could use a longitudinal redundancy check (LRC), also called horizontal even parity. This error check byte is simply the exclusive-or of all the message bytes (except the LRC itself). The receiver performs an exclusive-or on the message bytes together with the error check byte. The result will equal zero if the block has been transmitted successfully. Another popular method is a checksum, which is simply the modulo-256 (8-bit) or modulo-65536 (16-bit) sum of the data packet. In addition, each byte could (but doesn’t have to) include even parity.
There are two mechanisms that allow the transmission of variable amounts of data. Some protocols use start (STX $02) and stop (ETX $03) characters to surround a variable amount of data. The disadvantage of this “termination code” method is that binary data cannot be sent because a data byte might match the termination character (ETX). Therefore, this protocol is appropriate for sending ASCII characters. Another possibility is to use a byte count to specify the length of a message. Many protocols use a byte count. The S19 records, for example, have a byte count in each line.
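The LRC and the 8-bit checksum described above can be sketched in a few lines of C. The function names LRC_Compute and Checksum8 are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Exclusive-or of all bytes. The transmitter runs this over the message
   and appends the result; the receiver runs it over the message plus the
   appended byte and expects zero. */
uint8_t LRC_Compute(const uint8_t *msg, size_t len) {
  uint8_t lrc = 0;
  size_t i;
  for (i = 0; i < len; i++) {
    lrc ^= msg[i];
  }
  return lrc;
}

/* Modulo-256 sum of all bytes (carries out of bit 7 are discarded). */
uint8_t Checksum8(const uint8_t *msg, size_t len) {
  uint8_t sum = 0;
  size_t i;
  for (i = 0; i < len; i++) {
    sum = (uint8_t)(sum + msg[i]);
  }
  return sum;
}
```

As a usage example, for the three bytes $02, $41, $42 the LRC is $01, and running LRC_Compute over all four bytes (message plus LRC) returns zero. Flipping any single bit in the block makes the receiver's result nonzero.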
12.6 *Design and Implementation of a Controller Area Network (CAN)
12.6.1 The Fundamentals of CAN
In this section, we will design and implement a Controller Area Network (CAN). CAN is a high-integrity serial data communications bus that is used for real-time applications. It can operate at data rates of up to 1 Mbit/second and has excellent error detection and confinement capabilities. The CAN was originally developed by Robert Bosch for use in automobiles, and is now extensively used in industrial automation and control applications. The CAN protocol has been developed into an international standard for serial data communication, specifically ISO 11898. Figure 12.15 shows the block diagram of a CAN system, which can have up to 112 nodes. There are four components of a CAN system. The first part is the CAN bus, consisting of two wires (CANH, CANL) with 120 Ω termination resistors on each end. The second part is the transceiver, which handles the voltage levels and interfaces the separate receive (RxD) and transmit (TxD) signals onto the CAN bus. The third part is the CAN controller, which is hardware built into the 9S12, and it handles message timing, priority, error detection, and retransmission. The last part is software running within the 9S12 that handles the high-level functions of generating data to transmit and processing data received from other nodes.
Figure 12.15 Block Diagram of a 9S12-Based CAN communication system. [Figure: nodes 1 through 112, each a 9S12C32 whose CAN controller connects through PM0/RxD and PM1/TxD to an MCP2551 transceiver; every transceiver attaches to the shared CANH and CANL wires, which are terminated with 120 Ω at each end.]
Each node consists of a 9S12 microcontroller (with an internal CAN controller) and a transceiver that interfaces the CAN controller to the CAN bus. A transceiver is a device capable of transmitting and receiving on the same channel. The CAN is based on the “broadcast communication mechanism”, which follows a message-based transmission protocol rather than an address-based protocol. The CAN provides two communication services: the sending of a message (data frame transmission) and the requesting of a message (remote transmission request). All other services, such as error signaling and automatic retransmission of erroneous frames, are user-transparent, which implies that the CAN interface automatically performs these functions. The 9S12 has an integrated CAN interface (e.g., the 9S12C32 has one CAN channel and the 9S12DP512 has five CAN channels). The physical channel consists of two wires that carry one digital logic bit in differential mode. Because multiple outputs are connected together, there must be a mechanism to resolve simultaneous requests for transmission. In a manner similar to open collector logic, there are dominant and recessive states on the transmitter, as shown in Figure 12.16. The outputs follow a wired-AND mechanism in such a way that if one or more nodes are sending a dominant state, it will override any nodes attempting to send a recessive state.
Checkpoint 12.13: What are the dominant and recessive states in open collector logic?
The CAN transceiver is a high-speed, fault-tolerant device that serves as the interface between a CAN protocol controller (located in the 9S12) and the physical bus. The transceiver is capable of driving the large current needed for the CAN bus and has electrical protection against defective stations. Typically each CAN node must have a device to convert the digital signals generated by a CAN controller to signals suitable for transmission over the bus cabling. The transceiver also provides a buffer between the CAN controller and the high-voltage spikes that can be generated on the CAN bus by outside sources. Examples of CAN transceiver chips include the AMIS-30660 high speed CAN transceiver, Infineon Technologies TLE6250GV33 transceiver, ST Microelectronics L9615 transceiver, Philips Semiconductors AN96116 transceiver, and the Microchip MCP2551 transceiver. These transceivers have similar characteristics and would be equally suitable for implementing a CAN system.
Figure 12.16 Voltage specifications for the recessive and dominant states. [Figure: CANH and CANL versus time on a 0 V to 5 V axis marked at 1.5 V, 2.5 V, and 3.5 V, passing through recessive, dominant, and recessive states.]
In a CAN system, messages are identified by their contents rather than by addresses. Each message sent on the bus has a unique identifier, which defines both the content and the priority of the message. This feature is especially important when several stations compete for bus access, a process called bus arbitration. As a result of the content-oriented addressing scheme, a high degree of system and configuration flexibility is achieved. It is easy to add stations to an existing CAN network.
Four message types or frames can be sent on a CAN bus. These include the Data Frame, the Remote Frame, the Error Frame, and the Overload Frame. This section will focus on the Data Frame, where the parts in standard format are shown in Figure 12.17. The Arbitration Field determines the priority of the message when two or more nodes are contending for the bus. For the Standard CAN 2.0A, it consists of an 11-bit identifier. For the Extended CAN 2.0B, there is a 29-bit identifier. The identifier defines the type of data. The Control Field contains the DLC, which specifies the number of data bytes. The Data Field contains zero to eight bytes of data. The CRC Field contains a 15-bit checksum used for error detection. Any CAN controller that has been able to correctly receive this message sends an Acknowledgement bit at the end of each message. This bit is stored in the Acknowledge slot in the CAN data frame. The transmitter checks for the presence of this bit, and if no acknowledge is received, the message is retransmitted. To transmit a message, the software must set the 11-bit identifier, set the 4-bit DLC, and give the 0 to 8 bytes of data. The receivers can define filters on the identifier field, so only certain message types will be accepted. When a message is received the software can read the identifier, length, and data.
Figure 12.17 CAN Standard Format Data Frame. [Figure: between bus idle periods, the message frame consists of SOF, the arbitration field (11-bit identifier, RTR), the control field (IDE/r1, r0, DLC), the data field (0 to 8 bytes), the CRC field (15 bits plus delimiter), the ACK field (slot plus delimiter), EOF, and IFS.]
The Intermission Frame Space (IFS) separates one frame from the next. There are two factors that affect the number of bits in a CAN message frame. The ID (11 or 29 bits) and the Data fields (0, 8, 16, 24, 32, 40, 48, 56, or 64 bits) have variable length. The remaining components (36 bits) of the frame have fixed length, including SOF (1), RTR (1), IDE/r1 (1), r0 (1), DLC (4), CRC (15), and ACK/EOF/intermission (13). For example, a Standard CAN 2.0A frame with two data bytes has 11 + 16 + 36 = 63 bits. Similarly, an Extended CAN 2.0B frame with four data bytes has 29 + 32 + 36 = 97 bits.
If a long sequence of 0’s or a long sequence of 1’s is being transferred, the data line will be devoid of the edges that the receiver needs to synchronize its clock to the transmitter. In this case, measures must be taken to ensure that the maximum permissible interval between two signal edges is not exceeded. Bit stuffing accomplishes this by inserting a complementary bit after five bits of equal value. The number of stuff bits depends on the data transmitted. Assuming n is the number of data bytes (0 to 8), CAN 2.0A may add 3 + n stuff bits and CAN 2.0B may add 5 + n stuff bits. Of course, the receiver has to un-stuff these bits to obtain the original data.
The urgency of messages to be transmitted over the CAN network can vary greatly in a real-time system. Typically there are one or two activities that require high transmission rates or quick responses. Both bandwidth and response time are affected by message priority. Low priority messages may have to wait for the bus to be idle. There are two priorities occurring as the 9S12 CANs transmit messages. The first priority is the 11-bit identifier, which is used by all the CAN controllers wishing to transmit a message on the bus. Message identifiers are specified during system design and cannot be altered dynamically. The 11-bit identifier with the lowest binary number has the highest priority. In order to resolve a bus access conflict, each node in the network observes the bus level bit by bit, a process known as bit-wise arbitration. In accordance with the wired-AND mechanism, the dominant state overwrites the recessive state. All nodes with recessive transmission but dominant observation immediately lose the competition for bus access and become receivers of the message with the higher priority. They do not attempt transmission until the bus is available again. Transmission requests are hence handled according to their importance for the system as a whole. The second priority occurs locally, within each CAN node. When a node has multiple messages ready to be sent, it will send the highest priority messages first.
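The frame-length arithmetic and the bit-wise arbitration described above can be modeled in C. This is a software simulation for illustration only, since on the 9S12 the CAN hardware performs arbitration; the function names and the limit of 32 simulated nodes are assumptions. A dominant bit is modeled as 0 and a recessive bit as 1, so the bus level is the AND of every bit being transmitted.

```c
#include <stdint.h>

#define CAN_FIXED_BITS 36u  /* SOF, RTR, IDE/r1, r0, DLC, CRC, ACK/EOF/intermission */

/* Frame length in bits, not counting stuff bits.
   idBits is 11 (CAN 2.0A) or 29 (CAN 2.0B); dataBytes is 0 to 8. */
unsigned CAN_FrameBits(unsigned idBits, unsigned dataBytes) {
  return idBits + 8u * dataBytes + CAN_FIXED_BITS;
}

/* Simulate arbitration among 1 to 32 nodes that start sending at once.
   id[i] is the 11-bit identifier of node i. Returns the index of the
   node that wins the bus, or -1 if count is out of range. */
int CAN_Arbitrate(const uint16_t id[], int count) {
  uint32_t active;                    /* bit i set while node i still transmits */
  int bit, i;
  if (count < 1 || count > 32) return -1;
  active = (count == 32) ? 0xFFFFFFFFu : (((uint32_t)1 << count) - 1u);
  for (bit = 10; bit >= 0; bit--) {   /* most significant bit is sent first */
    uint32_t bus = 1u;                /* recessive unless some node sends dominant */
    for (i = 0; i < count; i++) {
      if ((active >> i) & 1u) bus &= (uint32_t)(id[i] >> bit) & 1u;  /* wired-AND */
    }
    for (i = 0; i < count; i++) {
      uint32_t sent = (uint32_t)(id[i] >> bit) & 1u;
      if (((active >> i) & 1u) && sent != bus) {
        active &= ~((uint32_t)1 << i);  /* sent recessive, saw dominant: lose */
      }
    }
  }
  for (i = 0; i < count; i++) {
    if ((active >> i) & 1u) return i;
  }
  return -1;
}
```

With identifiers $123, $0A5, and $0A7 competing, the node sending $0A5 wins because it has the lowest binary number. CAN_FrameBits(11, 2) and CAN_FrameBits(29, 4) reproduce the 63-bit and 97-bit examples in the text.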
12.6.2 Details of the 9S12 CAN
Table 12.2 shows the I/O registers used for the CAN. The 9S12 CAN receiver has a FIFO queue, which can hold up to five incoming messages, as shown in Figure 12.18. The 9S12 CAN transmitter uses a priority queue, which can hold up to three outgoing messages. To transmit a message the software writes the message into addresses $0170 to $017F. The software specifies the priority of the outgoing message (CANTXTBPR at $017F). High priority messages go to the front of the queue and are transmitted next. Low priority messages go to the back of the queue and are transmitted only when no higher priority messages are ready. Once a message is in the queue, the CAN hardware is responsible for handling the priority, timing, transmission of the message, error detection, and retransmission if an error occurs. To retrieve the contents of an incoming message the software reads from addresses $0160 to $016F.
Observation: When designing systems that use a sophisticated I/O interface like the CAN, it can be confusing to distinguish the activities handled automatically by the CAN hardware module from the activities your software must perform. The solution to this problem is to look at software examples to see exactly the kinds of tasks your software must perform.
Table 12.2 9S12 CAN ports.
Figure 12.18 Data flow through the 9S12 CAN controller. [Figure: within the 9S12C32 CAN controller, outgoing messages (identifier, DLC, data) pass through the priority queue to PM1/TxD, and incoming messages arrive from PM0/RxD into the FIFO queue; an MCP2551 transceiver connects TxD and RxD to CANH and CANL.]
The CANCTL0 and CANCTL1 registers contain flags and control bits. RXFRM is the Received Frame Flag. It is set when a receiver has received a valid message correctly, independently of the filter configuration. Once set, it remains set until cleared by software or reset. Clearing is done by writing a ‘1’ to the bit. RXACT is the Receiver Active Status flag. This read-only flag indicates the CAN is receiving a message. SYNCH is the Synchronized Status flag. This read-only flag indicates whether the CAN is synchronized to the CAN bus and, as such, can participate in the communication process. INITRQ is the Initialization Mode Request bit. When this bit is set by the CPU, the CAN skips to Initialization Mode. Any ongoing transmission or reception is aborted and synchronization to the bus is lost. The module indicates entry to Initialization Mode by setting INITAK=1. SLPRQ is the Sleep Mode Request bit. This bit requests the CAN to enter Sleep Mode, which is an internal power saving mode. The Sleep Mode request is serviced when the CAN bus is idle, i.e., the module is not receiving a message and all transmit buffers are empty. The module indicates entry to Sleep Mode by setting SLPAK=1. CANE is the CAN Enable bit, which we set to 1 to enable the CAN module. If it is 0, then the module is disabled. CLKSRC is the CAN Clock Source bit, which defines the clock source for the CAN module. We set it to 1 to use the Bus Clock, and to 0 to use the Oscillator Clock. The frequency of the Oscillator Clock is equal to the frequency of the external crystal. The Bus Clock is the frequency at which data is accessed on the bus and is a function of both the crystal and the PLL. We define the time quantum, Tq, as the period of the selected clock. LISTEN is the Listen Only Mode bit, which configures the CAN as a bus monitor. When the bit is set, all valid CAN messages with matching ID are received, but no acknowledgement or error frames are sent out.
The CANBTR0 and CANBTR1 registers provide for bus timing control and can only be written in initialization mode. SJW1, SJW0 are the Synchronization Jump Width bits, which we will set to zero for high speed communication. BRP[5-0] are the Baud Rate Prescaler bits; let x be the 6-bit number formed by these bits. The clock period used to create the individual bit timing is (x + 1)*Tq. SAMP is the Sampling bit, which determines the number of samples of the serial bus to be taken per bit time. If set, three samples per bit are taken: the regular one (sample point) and two preceding samples, using a majority rule. For higher bit rates, it is recommended that SAMP be cleared, which means that only one sample is taken per bit.
There are three time segments for each transmitted bit. Segment 0 is exactly one clock period, but the length of the other two segments is programmed using CANBTR1. The input bit is sampled at the time in between Segment 1 and Segment 2. TSEG22-TSEG20 are the three Time Segment 2 bits; let y be the 3-bit number formed by these bits. The length of Segment 2 will be y + 1 clock periods. TSEG13-TSEG10 are the four Time Segment 1 bits; let z be the 4-bit number formed by these bits. The length of Segment 1 will be z + 1 clock periods. The time for each bit includes all three segments:
Bit Time = Tq*(x + 1)*(3 + y + z)
Checkpoint 12.14: What is the relationship between y and z if we wish to sample the input in the middle of the bit interval?
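The bit time relationship above lends itself to a small calculation. The sketch below evaluates the formula with x, y, and z taken from the BRP, TSEG2, and TSEG1 fields. The function names and the example settings are hypothetical, and the code does not check which field combinations the hardware actually permits.

```c
#include <stdint.h>

/* Number of Tq per transmitted bit: (x + 1)*(3 + y + z), where
   x = BRP (6 bits), y = TSEG2 (3 bits), z = TSEG1 (4 bits). */
uint32_t CAN_TqPerBit(uint8_t brp, uint8_t tseg2, uint8_t tseg1) {
  uint32_t x = brp & 0x3Fu;
  uint32_t y = tseg2 & 0x07u;
  uint32_t z = tseg1 & 0x0Fu;
  return (x + 1u) * (3u + y + z);
}

/* Bit rate in bits/sec for a selected clock of clockHz (Tq = 1/clockHz). */
uint32_t CAN_BitRate(uint32_t clockHz, uint8_t brp, uint8_t tseg2, uint8_t tseg1) {
  return clockHz / CAN_TqPerBit(brp, tseg2, tseg1);
}
```

For instance, with an 8 MHz clock, x = 0, y = 2, and z = 3, each bit lasts 8 Tq and the bit rate is 1,000,000 bits/sec. Raising x to 3 with y = 1 and z = 4 stretches each bit to 32 Tq, or 250,000 bits/sec.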
CANRFLG is the Receiver Flag Register. The WUPIF, CSCIF, RSTAT1, RSTAT0, TSTAT1, TSTAT0, OVRIF, and RXF flags are cleared by writing a ‘1’ to the corresponding bit position. Every flag has an associated interrupt arm bit in the CANRIER register. For low power applications, we can place the system in Sleep Mode. WUPIF is the Wake-Up Interrupt Flag, which is used to detect bus activity while in Sleep Mode. This bit is 1 when it has detected activity on the bus and requested wake-up. CSCIF is the CAN Status Change Interrupt Flag. This flag is set when the CAN changes its current bus status as shown in the 4-bit (RSTAT[1:0], TSTAT[1:0]) status register. The coding for the bits RSTAT1, RSTAT0 is:
00 Rx OK
01 Rx Warning
10 Rx Error
11 Bus-Off
Excessive transmitter errors will turn off both the receiver and the transmitter. OVRIF is the Overrun Interrupt Flag, which is set when a data overrun condition occurs. In particular, an overrun occurs when five valid messages are in the receive FIFO, and a sixth message is received. RXF is the Receive Buffer Full Flag, which is set by the CAN when a new message is shifted into the receiver FIFO. This flag indicates whether the shifted buffer is loaded with a correctly received message (matching identifier, matching Cyclic Redundancy Code (CRC), and no other errors detected). After the CPU has read that message from locations $0160-$016F, the RXF flag must be cleared to release the buffer. If armed (RXFIE), this bit will request an interrupt.
The software can configure the 9S12 CAN to filter incoming messages. Accepted messages will set the RXF flag and will be available for processing. Dropped messages will not set the RXF flag and will be discarded. CANIDAR0-7 are the Identifier Acceptance Registers. CANIDMR0-7 are the corresponding Identifier Mask Registers. These registers can only be set in initialization mode. CANIDAC is the Identifier Acceptance Control Register. The two bits IDAM1, IDAM0 specify the Identifier Acceptance Mode. Binary 00 means the eight acceptance registers are configured as two 32-bit filters. Binary 01 means the eight acceptance registers are configured as four 16-bit filters. Binary 10 means the eight acceptance registers are configured as eight 8-bit filters. Binary 11 means the filter is closed, meaning no message will be accepted and the foreground buffer is never reloaded. On reception, each message is written into the background receive buffer. The CPU is only signaled to read the message if it passes the criteria in the identifier acceptance and identifier mask registers (accepted); otherwise, the message is overwritten by the next message (dropped).
The acceptance registers of the CAN are applied to the IDR0 to IDR3 registers of incoming messages in a bit-by-bit manner. A mask bit (AM7-AM0) of 0 specifies that the corresponding identifier bit must match the acceptance bit, and a mask bit of 1 means the corresponding bit will match (be acceptable) regardless of the ID bit value. AC7-AC0 comprise a user-defined sequence of bits with which the corresponding bits of the related identifier register (IDRn) of the receive message buffer are compared. The result of this comparison is then masked with the corresponding identifier mask register. The three bits IDHIT2, IDHIT1, and IDHIT0 specify which filter applied to the message currently available in the receive FIFO.
Observation: To enable the receiver to accept all messages, set the mask registers to 0xFF.
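The compare-then-mask rule can be written as a single expression. The following sketch models one 8-bit acceptance/mask register pair applied to one identifier register. The real hardware applies the same rule across IDR0 to IDR3 according to the selected filter mode, and the function name CAN_FilterAccepts is hypothetical.

```c
#include <stdint.h>

/* Returns 1 if identifier byte idr passes the filter, 0 if it is dropped.
   ac is the acceptance byte (AC7-AC0) and am is the mask byte (AM7-AM0).
   A mask bit of 0 means the bit must match; a mask bit of 1 means don't care. */
int CAN_FilterAccepts(uint8_t idr, uint8_t ac, uint8_t am) {
  uint8_t mismatch = (uint8_t)(idr ^ ac);      /* 1 where the bits differ */
  uint8_t checked = (uint8_t)~am;              /* 1 where a match is required */
  return (mismatch & checked) == 0;
}
```

With a mask of 0x00 only an exact match is accepted, and with a mask of 0xFF every identifier byte is accepted, which agrees with the observation above.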
CANTFLG is the Transmitter Flag Register. The flags are cleared by writing a ‘1’ to the corresponding bit position. Every flag has an associated interrupt arm bit in the CANTIER register. TXE2, TXE1, and TXE0 are the Transmitter Buffer Empty bits, which indicate that the associated transmit message buffer is empty, and thus not scheduled for transmission. The CPU must clear the flag after a message is set up in the transmit buffer and is due for transmission. The CAN sets the flag after the message is sent successfully. The flag is also set by the CAN when the transmission request is successfully aborted due to a pending abort request. There are three transmit buffers in the priority, but only one is accessible at addresses $0170-$017F. CANTBSEL is the Transmit Buffer Selection register, defining which buffer will be accessible. In particular, TX2, TX1, and TX0 are the Transmit Buffer Select bits. The lowest numbered bit places the respective transmit buffer in $0170-$017F space (e.g. if CANTBSEL is 0112, transmit buffer 0 is selected). Read and write accesses to the selected transmit buffer will be blocked, if the corresponding TXEx bit is cleared and the buffer is scheduled for transmission. IDE is the ID Extended bit, which indicates whether the extended or standard identifier format is applied in this buffer. In the case of a receive buffer, the flag is set as received and indicates to the CPU how to process the buffer identifier registers. In the case of a transmit buffer, the flag indicates to the CAN what type of identifier to send. IDE 1 means
Extended format (29 bit), and IDE = 0 means Standard format (11 bit). RTR is the Remote Transmission Request bit, which reflects the status of the Remote Transmission Request bit in the CAN frame. In the case of a receive buffer, it indicates the status of the received frame and supports the transmission of an answering frame in software. In the case of a transmit buffer, this flag defines the setting of the RTR bit to be sent. RTR = 1 means Remote frame and RTR = 0 means Data frame.
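The transmit buffer selection rule (the lowest numbered bit wins) can be modeled with a one-line bit trick. The sketch below is a host-side model only; CAN_SelectedBuffer is a hypothetical name, and the real selection is performed by the CAN hardware when software writes CANTBSEL.

```c
#include <stdint.h>

/* Hypothetical model of the selection rule: of the bits written to CANTBSEL,
   only the lowest numbered set bit remains set when the register is read back. */
uint8_t CAN_SelectedBuffer(uint8_t written){
  uint8_t w = (uint8_t)(written & 0x07);     /* only TX2-TX0 are implemented */
  return (uint8_t)(w & (uint8_t)(~w + 1));   /* isolate the lowest set bit */
}
```

For example, writing 011 in binary returns 001 (buffer 0), and writing 110 returns 010 (buffer 1).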
12.6.3 9S12 CAN Device Driver
Program 12.6 Initialization of the 9S12 CAN network.
The device driver for the 9S12-based CAN network is divided into three components: initialization, transmission, and reception. Although the 9S12 can handle standard and extended message formats, this software system will be configured to handle only the standard format. Program 12.6 gives the initialization code for the interface. The high-level software on all nodes of the network will call CAN_Open() to initialize the CAN modules. If a node wishes to send 0 to 8 bytes of data to the other nodes, it passes the information to CAN_Send(), which transmits the message via the CAN bus. The receiving nodes then retrieve this information by calling CAN_Receive(). The receiver will generate an interrupt when a new message is ready, and a FIFO queue will be used to pass the message from the background to the foreground. Each entry in the FIFO needs to be at least 11 bytes long: 2 bytes for the 11-bit ID, 1 byte for the 4-bit length (0 to 8), and 8 bytes for the data.
The CAN is enabled by setting the CANE bit. In order to set the configuration registers, the CAN must be in initialization mode. If the main program calls CAN_Open a second time, there may be transmit or receive messages in progress. In order to prevent errors, this ritual will first request a transfer into Sleep Mode. This request will allow incoming and outgoing messages to complete before acknowledging Sleep Mode has been entered. Once in Sleep Mode, this ritual can safely request the CAN enter Initialization Mode. The initialization sequence turns off Listen Mode, sets the clock, and establishes the acceptance filters. Setting all the acceptance masks to 0xFF means all messages will be accepted.
void CAN_Open(void){
  asm sei                      // make atomic
  CANFifo_Init();              // Initialize FIFO data structure
  CANCTL1 |= 0x80;             // CANE=1, Enable CAN
  CANCTL0 |= 0x02;             // SLPRQ=1, go to sleep first
  while((CANCTL1&0x02)==0){};  // SLPAK signifies Sleep Mode
  CANCTL0 &= ~0x02;            // SLPRQ=0, leave Sleep Mode
  CANCTL0 |= 0x01;             // INITRQ=1, Enter Initialization Mode
  while((CANCTL1&0x01)==0){};  // INITAK signifies Initialization Mode
  CANCTL1 &= ~0x10;            // LISTEN=0, get out of Listen-only mode
  CANCTL1 &= ~0x40;            // CLKSRC=0, use oscillator clock
  CANIDAC = 0x10;              // four 16-bit filters
  CANIDMR0 = 0xFF;
  CANIDMR1 = 0xFF;
  CANIDMR2 = 0xFF;
  CANIDMR3 = 0xFF;
  CANIDMR4 = 0xFF;
  CANIDMR5 = 0xFF;
  CANIDMR6 = 0xFF;
  CANIDMR7 = 0xFF;
  CANBTR0 = 0x03;              // (x+1)=4, assume oscillator is 8 MHz
  CANBTR1 = 0x23;              // (3+y+z)=8, divide by 32 gives 250,000 bits/sec
  CANCTL0 &= ~0x01;            // INITRQ=0, Leave Initialization mode
  while(CANCTL1&0x01){};       // wait for the end of initialization
  CANRIER |= 0x01;             // Arm RxF, interrupt on receive message
  asm cli                      // Enable interrupts
}
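The bit-rate arithmetic in the CANBTR0 and CANBTR1 comments can be checked with a short calculation. The sketch below runs on a PC rather than the 9S12. It assumes the usual field layout (prescaler minus one in CANBTR0 bits 5-0, time segment 1 minus one in CANBTR1 bits 3-0, time segment 2 minus one in CANBTR1 bits 6-4), and CAN_BitRate is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: bit rate produced by a given clock and CANBTR0/CANBTR1.
   Assumed layout: CANBTR0 bits 5-0 = prescaler-1,
                   CANBTR1 bits 3-0 = TSEG1-1, bits 6-4 = TSEG2-1. */
uint32_t CAN_BitRate(uint32_t clockHz, uint8_t btr0, uint8_t btr1){
  uint32_t prescaler = (uint32_t)(btr0 & 0x3F) + 1;        /* x+1 */
  uint32_t tseg1 = (uint32_t)(btr1 & 0x0F) + 1;            /* y+1 time quanta */
  uint32_t tseg2 = (uint32_t)((btr1 >> 4) & 0x07) + 1;     /* z+1 time quanta */
  uint32_t quantaPerBit = 1 + tseg1 + tseg2;               /* 1 = sync segment, total 3+y+z */
  return clockHz / (prescaler * quantaPerBit);
}
```

With an 8 MHz oscillator, CANBTR0 = 0x03 and CANBTR1 = 0x23, the prescaler is 4 and there are 8 time quanta per bit, so the clock is divided by 32, matching the 250,000 bits/sec in the listing.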
Program 12.7 shows the software used to transmit a message. It begins by waiting for an empty transmit buffer. After the first while loop, one or more bits in the CANTFLG register will be set. Each flag bit that is set means its corresponding buffer is free.
Program 12.7 Transmit a message on the 9S12 CAN network.
void CAN_Send(unsigned short id, char length, char *data, char priority) {
  char *pt=(char*)&_CANTXDSR0;   // points to transmit message buffer
  while((CANTFLG&0x07)== 0){};   // Wait for transmit buffer available
  CANTBSEL = CANTFLG;            // Request selection of empty xmt buf
  CANTXIDR0 = id>>3;             // Write Identifier into ID registers
  CANTXIDR1 = id<<5;             // with RTR and IDE=0
  CANTXDLR = length;             // 0 to 8 bytes
  while(length){
    *pt++ = *data++;             // copy data into data registers
    length--;
  }
  CANTXTBPR = priority;          // set priority of this message
  CANTFLG = CANTBSEL;            // flag buffer as ready for transmission
}
Writing into CANTBSEL selects which transmit buffer to use. If we write multiple bits into CANTBSEL when selecting which buffer to use, reading from it will return which buffer was selected. In standard format, the CANTXIDR0 and CANTXIDR1 registers contain the 11-bit identifier. IDE is set to 0 to create a standard format message with 11-bit identifier. RTR is set to 0 to create a data frame. The message length is copied into the CANTXDLR register. 0 to 8 bytes are copied into the data field of the message. The priority field is set to place this message into the 3-message priority queue maintained by the transmitter. The last step (writing into CANTFLG) causes this message to be flagged as ready to transmit.
Program 12.8 shows the software used to receive a message. Interrupt synchronization is used so the main program doesn’t have to be continuously checking for the presence of an incoming message. The CANFifo module implements a FIFO queue that puts and gets 13-byte messages. When a message is properly received, the RXF flag is set and an interrupt is requested. The CANFifo_Put function copies 13 bytes (the ID, Length and Data) from the CAN receive buffer into the FIFO queue. Using a 13-byte FIFO rather than just 11 bytes simplifies the function. After the information is copied, the receive buffer is released by writing a 1 to RXF. The high-level program can get the received message by calling CAN_Receive.
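The way CAN_Send splits an 11-bit identifier across the two identifier registers, and the inverse operation needed on the receive side, can be sketched as follows. The functions CAN_PackID and CAN_UnpackID are hypothetical helpers written for a PC; they only mirror the shifts used in Program 12.7.

```c
#include <stdint.h>

/* Hypothetical helper: split an 11-bit standard identifier into IDR0/IDR1. */
void CAN_PackID(uint16_t id, uint8_t *idr0, uint8_t *idr1){
  *idr0 = (uint8_t)(id >> 3);   /* ID10-ID3 */
  *idr1 = (uint8_t)(id << 5);   /* ID2-ID0 in bits 7-5, RTR=0 and IDE=0 below them */
}

/* Hypothetical helper: rebuild the 11-bit identifier from IDR0/IDR1. */
uint16_t CAN_UnpackID(uint8_t idr0, uint8_t idr1){
  return (uint16_t)(((uint16_t)idr0 << 3) | (uint16_t)(idr1 >> 5));
}
```

For an identifier of 50, the two register values are 0x06 and 0x40, and unpacking those two bytes returns 50 again. The bytes are treated as unsigned so that the right shift does not propagate a sign bit.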
Program 12.8 Receive a message from the 9S12 CAN network.
void CAN_Receive(char msg[13]) {
  while (CANFifo_Get(msg) == 0){};   // wait for incoming message
}
interrupt 38 void CANInterruptHandler(void){
  char *msgPtr = (char*)&_CANRXIDR0;
  if(CANRFLG & RXF){
    CANCTL0 |= RXFRM;      // clear Received frame flag
    CANFifo_Put(msgPtr);   // copies the 13-byte receive buffer into the FIFO
    CANRFLG |= RXF;        // clear RXF by writing a 1.
  }
}
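The CANFifo module itself is not listed in this section. One possible shape for it is sketched below, assuming a small circular buffer of 13-byte entries with one producer (the interrupt handler) and one consumer (the main program). The capacity FIFO_SLOTS and the return convention (1 for success, 0 for failure) are assumptions chosen to match how CAN_Receive uses CANFifo_Get.

```c
#include <stdint.h>
#include <string.h>

#define MSG_SIZE   13   /* 4 ID bytes + 8 data bytes + 1 length byte */
#define FIFO_SLOTS 4    /* assumed capacity; one slot is always left empty */

static char Fifo[FIFO_SLOTS][MSG_SIZE];
static volatile uint8_t PutI;   /* index of next free slot */
static volatile uint8_t GetI;   /* index of oldest message */

void CANFifo_Init(void){ PutI = 0; GetI = 0; }

/* Called by the interrupt handler; returns 0 if the FIFO is full. */
int CANFifo_Put(const char *msg){
  uint8_t next = (uint8_t)((PutI + 1) % FIFO_SLOTS);
  if(next == GetI) return 0;            /* full, the message is discarded */
  memcpy(Fifo[PutI], msg, MSG_SIZE);
  PutI = next;
  return 1;
}

/* Called by the main program; returns 0 if the FIFO is empty. */
int CANFifo_Get(char *msg){
  if(GetI == PutI) return 0;            /* empty */
  memcpy(msg, Fifo[GetI], MSG_SIZE);
  GetI = (uint8_t)((GetI + 1) % FIFO_SLOTS);
  return 1;
}
```

Because only the interrupt handler changes PutI and only the main program changes GetI, this arrangement needs no critical section in the single-producer, single-consumer case assumed here.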
Program 12.9 shows an example main program used to send and receive messages. If it receives a message with ID equal to 50, then it will respond with a message ID of 51 and data equal to the sum of the received data.
Program 12.9 Example main program that sends and receives messages on the 9S12 CAN network.
void main(void){            // example foreground program
  char msg[13];             // received message
  unsigned short id;        // ID of received message
  char length;
  char i;
  short sum;
  CAN_Open();               // activate CAN
  for(;;) {
    CAN_Receive(msg);       // wait for incoming message
    id = (((unsigned char)msg[0])<<3)+(((unsigned char)msg[1])>>5); // bytes 0,1 are CANRXIDR0-1
    if(id == 50){           // bytes 2,3 contain no information
      length = msg[12];     // byte 12 is CANRXDLR
      i = 4;
      sum = 0;
      while(length){
        sum += msg[i++];    // bytes 4-11 are data
        length--;
      }
      CAN_Send(51,2,(char*)&sum,0);
    }
  }
}
12.7 *Inter-Integrated Circuit (I2C) Interface
12.7.1 The Fundamentals of I2C
Ever since microcontrollers have been developed, there has been a desire to shrink the size of an embedded system, reduce its power requirements, and increase its performance and functionality. Two mechanisms to make systems smaller are to integrate functionality into the microcontroller and to reduce the number of I/O pins. The inter-integrated circuit I2C interface was proposed by Philips in the late 1980s as a means to connect external devices to the microcontroller using just two wires. The SPI interface has been very popular, but it takes 3 wires for simplex and 4 wires for full duplex communication. In 1998, the I2C Version 1 protocol became an industry standard and has been implemented into thousands of devices.
The I2C bus is a simple two-wire bi-directional serial communication system that is intended for communication between microcontrollers and their peripherals over short distances. This is typically, but not exclusively, between devices on the same printed circuit board, the limiting factor being the bus capacitance. It also provides flexibility, allowing additional devices to be connected to the bus for further expansion and system development. The interface will operate at bit rates of up to 100 kbps with maximum capacitive bus loading. The module can operate up to a baud rate of 400 kbps provided the I2C bus slew rate is less than 100 ns. The maximum interconnect length and the number of devices that can be connected to the bus are limited by a maximum bus capacitance of 400 pF in all instances. Version 2.0 supports a high speed mode with a data rate up to 3.4 Mbps. This section will focus on Version 1, because the 9S12 does not support Version 2.
Figure 12.19 shows a block diagram of a communication system based on the I2C interface found in many 9S12 microcontrollers. The master/slave network may consist of multiple masters and multiple slaves. The Serial Clock Line (SCL) and the Serial Data line (SDA) are both bidirectional.
Each line is open collector, meaning a device may drive it low or let it float. A logic high occurs if all devices let the output float, and a logic low occurs when at least one device drives it low. The value of the pull-up resistor depends on the speed of the bus. 4.7 kΩ is recommended for data rates below 100 kbps, 2.2 kΩ is recommended for standard mode, and 1 kΩ is recommended for fast mode.
Figure 12.19 Block diagram of an I2C communication network. [Two 9S12DP512 microcontrollers and three I2C devices share the SCL and SDA lines. Each 9S12 contains clock control on PJ7 (SCL) and an in/out shift register on PJ6 (SDA). Each line has a 2.2 kΩ pull-up resistor to +5 V.]
Checkpoint 12.15: Why is the recommended pullup resistor related to the bus speed?
The SCL clock is used in a synchronous fashion to communicate on the bus. Even though data transfer is always initiated by a master device, both the master and the slaves have control over the data rate. The master starts a transmission by driving the clock low, but if a slave wishes to slow down the transfer, it too can drive the clock low (called clock stretching). In this way, devices on the bus will wait for all devices to finish. Both address (from master to slaves) and information (bidirectional) are communicated in serial fashion on SDA. The bus is initially idle, with both SCL and SDA high. This means no device is pulling SCL or SDA low. The communication on the bus, which begins with a START and ends with a STOP, consists of five components:
START (S) is used by the master to initiate a transfer
DATA is sent in 8-bit blocks and consists of
  7-bit address and 1-bit direction from the master
  control code for master to slaves
  information from master to slave
  information from slave to master
ACK (A) is used by the receiver to respond to the sender after each 8-bit data transfer
RESTART (R) is used by the master to initiate additional transfers without releasing the bus
STOP (P) is used by the master to signal the transfer is complete and the bus is free
The basic timings for these components are drawn in Figure 12.20. For now we will discuss basic timing, but we will deal with issues like stretching and arbitration later. A slow slave uses clock stretching to give it more time to react, and masters will use arbitration when two or more masters want the bus at the same time. A transmission begins when the master pulls SDA low while SCL is high, causing a START (S) component. The timing of a RESTART is the same as a START. After a START or a RESTART, the next 8 bits will be an address (7-bit address plus 1-bit direction). There are 128 possible 7-bit addresses; however, 16 of them are reserved as special commands (see Table 12.3).
The address is used to enable a particular slave. All data transfers are 8 bits long, followed by a 1-bit acknowledge. During a data transfer, the SDA data line must be stable (high or low) whenever the SCL clock line is high. There is one clock pulse on SCL for each data bit, the MSB being transferred first. Next, the selected slave will respond with a positive acknowledge (Ack) or a negative acknowledge (Nack). If the direction bit is 0 (write), then subsequent data transmissions contain information sent from master to slave. For a write data transfer, the master drives the SDA data line for 8 bits, then the slave drives the acknowledge condition during the 9th clock pulse. If the direction bit is 1 (read), then subsequent data transmissions contain information sent from slave to master. For a read data transfer, the slave drives the SDA data line for 8 bits, then the master drives the acknowledge condition during the 9th clock pulse. The STOP component is created by the master to signify the end of transfer. A STOP begins with SCL and SDA both low, then it makes the SCL clock high, and ends by making SDA high. The rising edge of SDA while SCL is high signifies the STOP condition.
Figure 12.20 Timing diagrams of I2C components. [Panels, each showing SDA and SCL: Start (S) or Restart (R); data bits D7 through D0; Ack; Nack; Stop (P).]
Checkpoint 12.16: What happens if no device sends an acknowledgement?
Figure 12.21 illustrates the case where the master sends 2 bytes of data to a slave. The shaded regions mark signals driven by the master, and the white areas show those times when the signal is driven by the slave. Regardless of format, all communication begins when the master creates a START component followed by the 7-bit address and 1-bit direction. In this example, the direction is low, signifying a write format. The 1st through 8th SCL pulses are used to shift the address/direction into all the slaves. In order to acknowledge the master, the slave that matches the address will drive the SDA data line low during the 9th SCL pulse. During the 10th through 17th SCL pulses the master sends the data to the selected slave. The selected slave will acknowledge by driving the SDA data line low during the 18th SCL pulse. A second data byte is transferred from master to slave in the same manner. In this particular example, two data bytes were sent, but this format can be used to send any number of bytes, because once the master captures the bus it can transfer as many bytes as it wishes. If the slave receiver does not acknowledge the master, the SDA line will be left high (Nack). The master can then generate a STOP signal to abort the data transfer or a RESTART signal to commence a new transmission. The master signals the end of transmission by sending a STOP condition.
Figure 12.21 I2C transmission of two bytes from master to slave. [Frame sequence on SDA with SCL pulse numbers: S, Address A7-A1 plus W (1-8), A (9), Data D7-D0 (10-17), A (18), Data D7-D0 (19-26), A (27), P.]
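The first byte after a START and the SCL pulse count in Figure 12.21 follow directly from the frame format. The sketch below is an illustration only; the names I2C_AddressByte, I2C_ClockPulses, I2C_WRITE, and I2C_READ are hypothetical and do not belong to the 9S12 interface.

```c
#include <stdint.h>

#define I2C_WRITE 0   /* direction bit: master sends data to the slave */
#define I2C_READ  1   /* direction bit: slave sends data to the master */

/* Hypothetical helper: 7-bit address in bits 7-1, direction in bit 0. */
uint8_t I2C_AddressByte(uint8_t addr7, uint8_t direction){
  return (uint8_t)(((addr7 & 0x7F) << 1) | (direction & 0x01));
}

/* Hypothetical helper: SCL pulses for one address byte plus dataBytes bytes.
   Each byte takes 8 data pulses and 1 acknowledge pulse. */
unsigned I2C_ClockPulses(unsigned dataBytes){
  return 9u * (1u + dataBytes);
}
```

For the two-byte transfer of Figure 12.21, I2C_ClockPulses(2) gives 27, which matches the last pulse number in the figure.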
Figure 12.22 illustrates the case where a slave sends 2 bytes of data to the master. Again, the master begins by creating a START component followed by the 7-bit address and 1-bit direction. In this example, the direction is high, signifying a read format. During the 10th through 17th SCL pulses the selected slave sends the data to the master. The selected slave can only change the data line while SCL is low and must hold it stable while SCL is high. The master will acknowledge by driving the SDA data line low during the 18th SCL pulse. Only two data bytes are shown in Figure 12.22, but this format can be used to receive as many bytes as the master wishes. Except for the last byte, all data are transferred from slave to master in the same manner. After the last data byte, the master does not acknowledge the slave (Nack), signifying ‘end of data’ to the slave, so the slave releases the SDA line for the master to generate a STOP or RESTART signal. The master signals the end of transmission by sending a STOP condition.
Figure 12.22 I2C transmission of two bytes from slave to master. [Frame sequence on SDA with SCL pulse numbers: S, Address A7-A1 plus R (1-8), A (9), Data D7-D0 (10-17), A (18), Data D7-D0 (19-26), N (27), P.]
Figure 12.23 illustrates the case where the master uses the RESTART command to communicate with two slaves, reading one byte from one slave and writing one byte to the other. As always, the master begins by creating a START component followed by the 7-bit address and 1-bit direction. During the first start, the address selects the first slave and the direction is read. During the 10th through 17th SCL pulses the first slave sends the data to the master. Because this is the last byte to be read from the first slave, the master will not acknowledge, letting the SDA data line float high during the 18th SCL pulse, so the first slave releases the SDA line. Rather than issuing a STOP at this point, the master issues a repeated start or RESTART. The 7-bit address and 1-bit direction transferred in the 20th through 27th SCL pulses will select the second slave for writing. In this example, the direction is low, signifying a write format. The 28th pulse will be used by the second slave to acknowledge it has been selected. The 29th through 36th SCL pulses send the data to the second slave. During the 37th pulse the second slave acknowledges the data it received. The master signals the end of transmission by sending a STOP condition.
Figure 12.23 I2C transmission of one byte from the first slave and one byte to a second slave. [Frame sequence on SDA with SCL pulse numbers: S, Address1 A7-A1 plus R (1-8), A (9), Data D7-D0 (10-17), N (18), R, Address2 A7-A1 plus W (20-27), A (28), Data D7-D0 (29-36), A (37), P.]
Table 12.3 lists some addresses that have special meaning. A write to address 0 is a general call address, and is used by the master to send commands to all slaves. The 10-bit address mode gives two address bits in the first frame and 8 more address bits in the second frame. The direction bit for 10-bit addressing is in the first frame.
Address    R/W  Description
0000 000   0    General call address
0000 000   1    Start byte
0000 001   x    CBUS address
0000 010   x    Reserved for different bus formats
0000 011   0    Reserved
0000 1xx   x    High speed mode
1111 0xx   x    10-bit address
1111 1xx   x    Reserved
Table 12.3 Special addresses used in the I2C network.
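Every entry in Table 12.3 has its upper four address bits equal to 0000 or 1111, so a driver can test for a special address with one comparison. The sketch below is an illustration; I2C_IsSpecialAddress is a hypothetical name, and the test is derived only from the address column of the table.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the 7-bit address falls in one of the
   groups listed in Table 12.3 (upper four bits 0000 or 1111). */
int I2C_IsSpecialAddress(uint8_t addr7){
  uint8_t top4 = (uint8_t)((addr7 >> 3) & 0x0F);
  return (top4 == 0x00) || (top4 == 0x0F);
}
```

Counting the addresses from 0 to 127 for which this function returns 1 gives 16.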
12.7.2 I2C Synchronization
The I2C bus supports multiple masters. If two or more masters try to issue a START command on the bus at the same time, both clock synchronization and arbitration will occur. Clock synchronization is a procedure that makes the low period of SCL equal to the longest clock low period among the masters and the high period equal to the shortest clock high period among the masters. Figure 12.24 illustrates clock synchronization, where the top set of traces is generated by the first master, and the second set of traces is generated by the second master. Since the outputs are open collector, the actual signals will be the wired-AND of the two outputs. Each master repeats these steps when it generates a clock pulse:
1. Drive its SCL clock low for a fixed amount of time
2. Let its SCL clock float
3. Wait for the SCL to be high
4. Wait for a fixed amount of time, stop waiting if the clock goes low
It is during step 3 that the faster device will wait for the slower device.
Because the outputs are open collector, the signal will be pulled to a logic high by the 2 kΩ resistor only if all devices release the line (output a logic high). Conversely, the signal will be a logic low if any device drives it low. When masters create a START, they first drive SDA low, then drive SCL low. If a group of masters are attempting to create START commands at about the same time, then the wired-AND of their SDA lines has its 1 to 0 transition before the wired-AND of their SCL lines has its 1 to 0 transition. Thus, a valid START command will occur, causing all the slaves to listen to the upcoming address. In the example shown in Figure 12.24, Master #2 is the first to drive its clock low. In general, the SCL clock will be low from the time the first master drives it low (time 1 in this example) until the time the last master releases its clock (time 2 in this example). Similarly, the SCL clock will be high from the time the last master releases its clock (time 2 in this example) until the time the first master drives its clock low (time 3 in this example).
Figure 12.24 I2C timing illustrating clock synchronization and data arbitration. [Traces: SDA1 and SCL1 for Master #1, SDA2 and SCL2 for Master #2, and SDA and SCL of the actual I2C bus. Times 1 through 4 and address bits A7 and A6 are marked.]
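The four-step clock procedure can be tried out in a small discrete-time simulation. The sketch below is a simplified model for a PC, not 9S12 code. Time advances in ticks, each master reacts to the bus within the same tick, both masters start by driving SCL low together, and all times are assumed to be at least 2 ticks. The names I2CMaster, Master_Step, and I2C_SimulateSync are hypothetical.

```c
enum { DRIVE_LOW, WAIT_HIGH, COUNT_HIGH };

typedef struct {
  unsigned lowTime, highTime;  /* this master's own timing in ticks (assumed >= 2) */
  unsigned count;              /* ticks spent in the current phase */
  int state;
  int out;                     /* 0 = driving SCL low, 1 = released (floating) */
} I2CMaster;

static void Master_Step(I2CMaster *m, int scl){
  switch(m->state){
  case DRIVE_LOW:                       /* step 1: hold SCL low for a fixed time */
    if(++m->count >= m->lowTime){
      m->out = 1;                       /* step 2: let SCL float */
      m->state = WAIT_HIGH;
    }
    break;
  case WAIT_HIGH:                       /* step 3: wait for the bus SCL to be high */
    if(scl){ m->count = 1; m->state = COUNT_HIGH; }
    break;
  default:                              /* step 4: wait, stop if the clock goes low */
    if(!scl){                           /* another master already pulled SCL low */
      m->out = 0; m->count = 1; m->state = DRIVE_LOW;
    } else if(++m->count >= m->highTime){
      m->out = 0; m->count = 0; m->state = DRIVE_LOW;
    }
    break;
  }
}

/* Runs two masters on one bus and reports the length of the first complete
   high period and the low period that follows it. */
void I2C_SimulateSync(unsigned lowA, unsigned highA, unsigned lowB, unsigned highB,
                      unsigned *busLow, unsigned *busHigh){
  I2CMaster a = {lowA, highA, 0, DRIVE_LOW, 0};
  I2CMaster b = {lowB, highB, 0, DRIVE_LOW, 0};
  int prev = 0, edges = 0;
  unsigned t;
  *busLow = 0; *busHigh = 0;
  for(t = 0; t < 100000u && edges < 3; t++){
    int scl = a.out & b.out;            /* wired-AND of the open collector outputs */
    if(scl != prev){ edges++; prev = scl; }
    if(edges == 1) (*busHigh)++;        /* between first rise and first fall */
    else if(edges == 2) (*busLow)++;    /* between first fall and second rise */
    Master_Step(&a, scl);
    Master_Step(&b, scl);
  }
}
```

With one master using a low time of 3 and a high time of 5, and the other a low time of 6 and a high time of 2, the simulated bus clock is low for 6 ticks and high for 2 ticks, the longest low and the shortest high.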
The relative priority of the contending masters is determined by a data arbitration procedure. A bus master loses arbitration if it transmits logic “1” while another master transmits logic “0”. The losing masters immediately switch over to slave receive mode and stop driving the SCL and SDA outputs. In this case, the transition from master to slave mode does not generate a STOP condition. Meanwhile, a status bit is set by hardware to indicate loss of arbitration. In the example shown in Figure 12.24, master #1 is generating an address with A7 = 1 and A6 = 0, while master #2 is generating an address with A7 = 1 and A6 = 1. Between times 2 and 3, both masters are attempting to send A7 = 1, and notice the actual SDA line is high.
At time 4, master #2 attempts to make the SDA high (A6 = 1), but notices the actual SDA line is low. In general, the master sending a message to the lowest address will win arbitration.
The third synchronization mechanism occurs between master and slave. If the slave is fast enough to capture data at the maximum rate, the transfer is a simple synchronous serial mechanism. In this case the transfer of each bit from master to slave is illustrated by the following interlocked sequences:
Master Sequence
1. Drive its SCL clock low
2. Set the SDA line
3. Wait for a fixed amount of time
4. Let its SCL clock float
5. Wait for the SCL to be high
6. Wait for a fixed amount of time
7. Stop waiting if the clock goes low
Slave Sequence (no stretch)
6. Capture SDA data on low to high edge of SCL
If the slave is not fast enough to capture data at the maximum rate, it can perform an operation called clock stretching. If the slave is not ready for the rising edge of SCL, it will hold the SCL clock low itself until it is ready. Slaves are not allowed to cause any 1 to 0 transitions on the SCL clock, but rather can only delay the 0 to 1 edge. The transfer of each bit from master to slave with clock stretching is illustrated by the following sequences:
Master Sequence
1. Drive its SCL clock low
2. Set the SDA line
3. Wait for a fixed amount of time
4. Let its SCL clock float
5. Wait for the SCL clock to be high
6. Wait for a fixed amount of time
7. Stop waiting if the clock goes low
Slave Sequence (clock stretching)
1. Wait for the SCL clock to be low
2. Drive SCL clock low
3. Wait until it’s ready to capture
4. Let its SCL float
5. Wait for the SCL clock to be high
6. Capture the SDA data
Clock stretching can also be used when transferring a bit from slave to master:
Master Sequence
1. Drive its SCL clock low
2. Wait for a fixed amount of time
4. Let its SCL clock float
5. Wait for the SCL clock to be high
6. Capture the SDA input
7. Wait for a fixed amount of time
8. Stop waiting if the clock goes low
Slave Sequence (clock stretching)
1. Wait for the SCL clock to be low
2. Drive SCL clock low
3. Wait until next data bit is ready
4. Let its SCL float
5. Wait for the SCL clock to be high
Observation: Clock stretching allows fast and slow devices to exist on the same I2C bus. Fast devices will communicate quickly with each other, but slow down when communicating with slower devices.
Checkpoint 12.17: Arbitration continues until one master sends a zero while the other sends a one. What happens if two masters attempt to send data to the same address?
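The data arbitration rule for one byte can be expressed as a short loop over the bits, most significant bit first. The sketch below is a model for a PC only; I2C_Arbitrate is a hypothetical name, and the return convention (0 or 1 for the winner, -1 when neither master has lost during this byte) is an assumption made for the illustration.

```c
#include <stdint.h>

/* Hypothetical model: two masters each shift out one byte, MSB first.
   Returns 0 if the first master wins, 1 if the second master wins,
   and -1 if the bytes are identical so neither has lost during this byte. */
int I2C_Arbitrate(uint8_t byteA, uint8_t byteB){
  int bit;
  for(bit = 7; bit >= 0; bit--){
    int a = (byteA >> bit) & 1;
    int b = (byteB >> bit) & 1;
    int sda = a & b;               /* wired-AND: any 0 pulls the line low */
    if(a != sda) return 1;         /* first master sent 1 but saw 0, so it loses */
    if(b != sda) return 0;         /* second master sent 1 but saw 0, so it loses */
  }
  return -1;
}
```

For the situation in Figure 12.24, where master #1 sends A7 = 1, A6 = 0 and master #2 sends A7 = 1, A6 = 1, the call I2C_Arbitrate(0x80, 0xC0) returns 0, meaning master #1 wins.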
12.7.3 9S12 I2C Details
Many 9S12 microcontrollers have an I2C interface, but they implement just a subset of the standard. They support master and slave modes, can generate interrupts on start and stop conditions, and allow I2C networks with multiple masters. On the other hand, the 9S12
microcontrollers do not support general call, 10-bit addressing, or high speed mode. As shown in Figure 12.19, I/O pins PJ7 and PJ6 can be connected directly to an I2C network. Because I2C networks are intended to connect devices on the same PCB, no special hardware drivers are required. Stop mode and wait mode are two low power states. Stop mode occurs when the device is turned off (IBEN = 0), and wait mode is a general state entered when the software executes a wai instruction. Table 12.4 lists the I2C ports on the 9S12D64 and 9S12DP512.
Address  Bit 7  6     5      4      3     2     1     Bit 0   Name
$00E0    ADR7   ADR6  ADR5   ADR4   ADR3  ADR2  ADR1  0       IBAD
$00E1    IBC7   IBC6  IBC5   IBC4   IBC3  IBC2  IBC1  IBC0    IBFD
$00E2    IBEN   IBIE  MS/SL  Tx/Rx  TXAK  RSTA  0     IBSWAI  IBCR
$00E3    TCF    IAAS  IBB    IBAL   0     SRW   IBIF  RXAK    IBSR
$00E4    DB7    DB6   DB5    DB4    DB3   DB2   DB1   DB0     IBDR
Table 12.4 9S12 I2C ports.
IBAD is the Bus Address Register. This register contains the address the 9S12 will respond to when addressed as a slave; it is not the address sent on the bus during the address transfer.
IBCR is the I2C control register, containing many of the bits that configure the I2C interface. IBEN is the I2C enable bit, which must be set to activate the interface. IBIE is the shared interrupt arm bit for the three flags IAAS, TCF, and IBAL. MS/SL is the master/slave bit, where 1 means master and 0 means slave. When this bit is changed from 0 to 1, a START signal is generated. When this bit is changed from 1 to 0, a STOP signal is generated. A STOP signal should only be generated if the IBIF flag is set. When the master loses arbitration, MS/SL is cleared automatically without generating a STOP signal.
The Tx/Rx bit specifies whether the next data transfer will be an output (equals 1) or an input (equals 0). When operating the interface as a slave and an address match occurs, the Tx/Rx bit should be set to match the SRW flag received during the address match. When sending an address as a master, Tx/Rx should be 1. When a master sends data to a slave, it specifies the R/W bit (bit 0 of the address frame), and sets Tx/Rx in the IBCR.
TXAK specifies the value driven onto SDA during data acknowledge cycles for both master and slave receivers. The I2C module will always acknowledge address matches, provided it is enabled, regardless of the value of TXAK. TXAK is only used when the I2C module is a receiver, not a transmitter. When receiving data as a master or a selected slave, this bit determines if an acknowledgement will be sent during the 9th clock bit. 0 means an acknowledgement will be sent (Ack), and 1 means no acknowledgement will be sent (Nack).
A repeated start (RESTART) will be sent if the master writes a 1 to the RSTA bit, provided this 9S12 is the current bus master. RSTA is a write-only bit; reads from this bit always return 0.
Attempting a repeated start when the bus is owned by another master will result in loss of arbitration. If IBSWAI is 1, then the I2C device will halt during wait mode.
IBFD is the I2C Bus Frequency Divider Register, which determines the baud rate when operating as a master. The bit clock generator is implemented as a prescale divider followed by a prescaled shift register: IBC7-6 select the multiplier MUL, IBC5-3 select the prescaler divider, and IBC2-0 select the shift register tap point. The timing of the 9S12 I2C interface is derived from the bus clock. Table 12.5 presents the three fields of IBFD that define operating speed. Figure 12.25 defines the four timing intervals. tstart is the delay from fall of SDA data to fall of SCL clock during a START or RESTART. tbit is the time to transfer one bit. thold is the time after the fall of the clock that the data will remain valid. tstop is the delay from the rise of SCL clock to rise of SDA data during a STOP.
IBC7-6  MUL
00      1
01      2
10      4
11      reserved

IBC5-3  scl2start  scl2stop  scl2tap  tap2tap
000     2          7         4        1
001     2          7         4        2
010     2          9         6        4
011     6          9         6        8
100     14         17        14       16
101     30         33        30       32
110     62         65        62       64
111     126        129       126      128

IBC2-0  SCLTap  SDATap
000     5       1
001     6       1
010     7       2
011     8       2
100     9       3
101     10      3
110     12      4
111     15      4
Table 12.5 9S12 I2C timing components as specified by IBFD.
Figure 12.25 9S12 I2C timing intervals. [SDA and SCL traces from START or RESTART to STOP, with the intervals tstart, tbit, thold, and tstop marked.]
Table 12.6 gives the ratio of bus frequency to I2C frequency for all possible values of IBFD. For example, if the bus frequency is 8 MHz, and we wish to create an I2C clock frequency of 200 kHz, then we need a divider value of 40. From Table 12.6, we see IBFD could be chosen as $07, $0B, or $40. Let tE be the period of the bus clock, then the four timing intervals are
tbit = tE • MUL • {2 • (scl2tap + [(SCLTap-1) • tap2tap] + 2)}
thold = tE • MUL • {scl2tap + [(SDATap-1) • tap2tap] + 3}
tstart = tE • MUL • [scl2start + (SCLTap-1) • tap2tap]
tstop = tE • MUL • [scl2stop + (SCLTap-1) • tap2tap]
Table 12.6 9S12 I2C clock divider values as specified by IBFD.
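The divider ratio for any IBFD value can be computed from the fields in Table 12.5 and the tbit equation. The sketch below runs on a PC; IBFD_Divider is a hypothetical name, and the lookup arrays simply copy the scl2tap, tap2tap, and SCLTap columns of Table 12.5.

```c
#include <stdint.h>

static const uint8_t Mul[4]     = {1, 2, 4, 0};   /* 0 marks the reserved code */
static const uint8_t Scl2Tap[8] = {4, 4, 6, 6, 14, 30, 62, 126};
static const uint8_t Tap2Tap[8] = {1, 2, 4, 8, 16, 32, 64, 128};
static const uint8_t SclTap[8]  = {5, 6, 7, 8, 9, 10, 12, 15};

/* Hypothetical helper: ratio of bus clock frequency to I2C clock frequency,
   that is tbit/tE, for a given IBFD value. Returns 0 for the reserved MUL code. */
unsigned IBFD_Divider(uint8_t ibfd){
  unsigned mul = Mul[(ibfd >> 6) & 0x03];   /* IBC7-6 */
  unsigned pre = (ibfd >> 3) & 0x07;        /* IBC5-3 */
  unsigned tap = ibfd & 0x07;               /* IBC2-0 */
  return mul * 2u * (Scl2Tap[pre] + (SclTap[tap] - 1u) * Tap2Tap[pre] + 2u);
}
```

The three choices quoted in the text ($07, $0B, and $40) all evaluate to 40, so each gives 200 kHz from an 8 MHz bus clock. A loop over all 256 values of IBFD can be used to search for other target ratios.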
Checkpoint 12.18: Assuming a 24 MHz bus clock, what value can you program into IBFD to create a 100 kHz baud rate?
IBSR is the I2C Bus Status Register. This status register is read-only, except that IBIF, IAAS, and IBAL can be cleared if the software writes a 1 to the corresponding bit position.
TCF is the Data transferring bit. While one byte of data is being transferred, this bit is cleared. It is set by the falling edge of the 9th clock of a byte transfer. Note that this bit is only valid during or immediately following a transfer to or from the I2C module.
IAAS is the Addressed as a slave bit, which is set when its own specific address (IBAD) is matched with the calling address sent by another master. The IAAS flag will request an interrupt if the IBIE arm bit is set. After IAAS is set, the software should check the SRW bit and set its Tx/Rx mode accordingly. Writing IAAS clears this bit.
IBB is the Bus busy bit, indicating the status of the bus. When a START signal is detected, IBB is set. If a STOP signal is detected, IBB is cleared. A master should wait until IBB is 0, signifying the bus is idle, before initiating a transfer.
IBAL is the Arbitration Lost bit, which is set by hardware when the arbitration procedure is lost. Arbitration is lost in the following circumstances:
■ SDA sampled low when the master drives a high during an address or data transmit cycle.
■ SDA sampled low when the master drives a high during the acknowledge bit of a data receive cycle.
■ A START cycle is attempted when the bus is busy.
■ A RESTART cycle is requested in slave mode.
■ A STOP condition is detected when the master did not request it.
The IBAL bit must be cleared by software, by writing a one to it.
SRW is the Slave Read/Write bit, which indicates the value of the R/W command bit of the calling address sent from the master. This bit is only valid when the 9S12 is in slave mode, a complete address transfer has occurred with an address match, and no other transfers have been initiated.
By checking SRW, the CPU can select slave transmit/receive mode according to the command of the master. If SRW is 0, the master is writing data to this 9S12 as a slave. Conversely, if SRW is 1, the master wishes this 9S12 slave to transmit data back to the master.
IBIF is the I2C Interrupt bit, which is set when one of the following conditions occurs:
■ Arbitration lost (IBAL bit set)
■ Byte transfer complete (TCF bit set)
■ Addressed as slave (IAAS bit set)
These three conditions will request an interrupt if the IBIE arm bit is set. IBIF must be cleared by software, by writing a one to it.
RXAK is the Received Acknowledge bit, which is the value of SDA during the acknowledge bit of a bus cycle. If RXAK is low, an acknowledge signal has been received during the 9th clock. If RXAK is high, no acknowledge signal was detected at the 9th clock.
IBDR is the I2C Bus Data I/O Register. In master transmit mode, when data is written to the IBDR a data transfer is initiated. As shown in Figures 12.20 through 12.23, the most significant bit is sent first. In master transmit mode (MS/SL = 1 and Tx/Rx = 1), writing this register initiates a 9-bit transmission, outputting to both SDA data and SCL clock. In master receive mode (MS/SL = 1 and Tx/Rx = 0), reading this register initiates reception of the next byte, where the 8-bit data arrives on the SDA input and the SCL clock is an output. When either a master or a slave is receiving, the TXAK bit is sent during the 9th clock. In slave mode, input/output functions are available only after an address match has occurred. Note that the Tx/Rx bit must correctly reflect the desired direction of transfer in master and slave modes for the transmission to begin. Reading the IBDR will return the last byte received while the I2C is configured in either master receive or slave receive modes.
The IBDR does not reflect every byte that is transmitted on the I2C bus, nor can software verify that a byte has been written to the IBDR correctly by reading it back. In master transmit mode, the first byte of
data written to IBDR following assertion of MS/SL is used for the address transfer and should comprise the calling address (in positions D7 to D1) concatenated with the required R/W bit (in position D0).
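As a hedged illustration of the address byte and status bits described above, the following host-side C sketch builds the first byte written to IBDR and interprets two of the IBSR flags. The helper names are hypothetical, and every mask other than IBIF (0x02, the value used by the programs in this section) is an assumed bit position that should be checked against the 9S12 data sheet.

```c
#include <stdint.h>

// Assumed IBSR bit masks; only IBIF=0x02 is confirmed by the programs in this section.
#define IBSR_TCF  0x80  // transfer complete (assumed position)
#define IBSR_IAAS 0x40  // addressed as slave (assumed position)
#define IBSR_IBB  0x20  // bus busy (assumed position)
#define IBSR_IBAL 0x10  // arbitration lost (assumed position)
#define IBSR_SRW  0x04  // slave read/write (assumed position)
#define IBSR_IBIF 0x02  // interrupt flag
#define IBSR_RXAK 0x01  // received acknowledge (assumed position)

// Hypothetical helper: 7-bit calling address goes in D7-D1, R/W in D0 (1=read, 0=write).
uint8_t I2C_AddressByte(uint8_t addr7, int read){
  return (uint8_t)(((addr7 & 0x7F) << 1) | (read ? 1 : 0));
}

// Hypothetical helper: RXAK low means an acknowledge was received during the 9th clock.
int I2C_Acked(uint8_t ibsr){
  return (ibsr & IBSR_RXAK) == 0;
}

// Hypothetical helper: SRW=1 means the master wants this slave to transmit.
int I2C_SlaveMustTransmit(uint8_t ibsr){
  return (ibsr & IBSR_SRW) != 0;
}
```

For example, I2C_AddressByte(0x50, 0) yields 0xA0 for a write and I2C_AddressByte(0x50, 1) yields 0xA1 for a read. The programs in this section instead pass an 8-bit slave value and force D0 with a mask, which is equivalent.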
12.7.4 9S12 I2C Single Master Example
Program 12.10 9S12 I2C initialization in single master mode.
The objective of this example is to present a low-level device driver for an I2C network where this 9S12 is the only master, as shown in Program 12.10. This simple example will employ busy-wait synchronization. I2C_Open first enables the I2C interface, starting out in slave mode. Since this is the only master, it does not need a slave address (IBAD).
Program 12.11 contains the function I2C_Send that transmits two bytes to a slave, creating the transmission shown in Figure 12.21. In a system with multiple masters, it should first check whether the bus is idle. Because this system has just one master, the bus should be idle. By setting the MS/SL bit, the 9S12 will create a START condition. In a system with multiple masters, it should also check whether it lost bus arbitration (IBAL). The slave address (with bit 0 equal to 0) will be sent. The two data bytes are sent, then the STOP is issued. If there is a possibility the slave doesn’t exist, then this program could have checked RXAK after each transfer.
Program 12.11 9S12 I2C transmission in single master mode.
unsigned char data1, unsigned char data2){
  // send START
  // send address with D0=0 signifying write
  ; // wait for the address to be sent
  // clear IBIF
  // send first byte
  ; // wait for the data to be sent
  // clear IBIF
  // send second byte
  ; // wait for the data to be sent
  // clear IBIF
  // send STOP
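Only the comments of Program 12.11 are legible here. The following is a hedged sketch of what a matching I2C_Send might look like, written by pairing each surviving comment with the register operation that Program 12.12 uses for the same step. The signature, the STOP mask (clearing only MS/SL), and the plain variables standing in for IBCR, IBSR, and IBDR are assumptions, not the author's listing.

```c
#include <stdint.h>

// Host-side stand-ins for the 9S12 I2C registers (assumption: on the target these
// are memory-mapped registers supplied by the device header).
static volatile uint8_t IBCR, IBSR, IBDR;

// Sketch of a two-byte master transmit with busy-wait synchronization.
// On hardware, writing 1 to IBIF clears it. With the plain variables used here,
// the write leaves the bit set, so a host test must preset IBSR=0x02 and the
// wait loops then fall through.
void I2C_Send(uint8_t slave, uint8_t data1, uint8_t data2){
  IBCR |= 0x30;                 // MS/SL=1, Tx/Rx=1: send START
  IBDR = slave & 0xFE;          // send address with D0=0 signifying write
  while((IBSR & 0x02) == 0){}   // wait for the address to be sent
  IBSR = 0x02;                  // clear IBIF
  IBDR = data1;                 // send first byte
  while((IBSR & 0x02) == 0){}   // wait for the data to be sent
  IBSR = 0x02;                  // clear IBIF
  IBDR = data2;                 // send second byte
  while((IBSR & 0x02) == 0){}   // wait for the data to be sent
  IBSR = 0x02;                  // clear IBIF
  IBCR &= ~0x20;                // MS/SL=0: send STOP (assumed mask)
}
```

If the slave might be absent, RXAK could be tested after each wait loop and the transfer abandoned with a STOP when no acknowledge is seen.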
Program 12.12 contains the function I2C_Recv that receives two bytes from a slave, creating the transmission shown in Figure 12.22. By setting the MS/SL bit, the 9S12 will create a START condition. During the first transfer, the Tx/Rx bit is 1, so the slave address (with bit 0 equal to 1) will be sent. During the next two transfers, the Tx/Rx bit is 0, so data flows into the 9S12. To trigger the first data reception, the software performs a dummy read on the IBDR. During the first data transfer TXAK is 0, creating a positive acknowledgement. Conversely, during the second data transfer TXAK is 1, creating a negative acknowledgement, signaling to the slave that this is the last data to be transferred. The two data bytes are received, then the STOP is issued. Notice that the last byte is captured after the STOP is issued, so that a third data transfer is not initiated.
Program 12.12 9S12 I2C reception in single master mode.
unsigned short I2C_Recv(char slave){
  unsigned char data1,data2;
  IBCR |= 0x30;              // send START
  IBDR = slave|0x01;         // send address with D0=1 signifying read
  while((IBSR&0x02)==0){};   // wait for the address to be sent
  IBSR = 0x02;               // clear IBIF
  IBCR &= ~0x18;             // Tx/Rx=0, and TXAK=0
  data1 = IBDR;              // dummy read to initiate receiving
  while((IBSR&0x02)==0){};   // wait for the first byte to be received
  IBSR = 0x02;               // clear IBIF
  IBCR |= 0x08;              // TXAK=1
  data1 = IBDR;              // capture first byte, initiate second
  while((IBSR&0x02)==0){};   // wait for the second byte to be received
  IBSR = 0x02;               // clear IBIF
  IBCR &= ~0x38;             // send STOP
  data2 = IBDR;              // capture second byte
  return ((unsigned short)data1<<8)+data2;  // first byte is most significant
}

12.8 Wireless Communication
Wireless communication is beyond the scope of this introductory textbook. Nevertheless, this short section illustrates that the interfacing techniques presented in this book are sufficient to implement wireless communication between 9S12 systems. In general, one considers bandwidth, distance, topology and security when designing a wireless link. In this example, the goal is to communicate at 1000 bytes/sec between two 9S12 systems within the same room without security. This low bandwidth requirement can be met with a radio-frequency (RF) link, without the complexities necessary to support Bluetooth or 802.11. This short distance is classified as a Short Range Device (SRD). There are many RF communication modules that could have been used. Chipcon makes a low-power 2.4 GHz RF transceiver with a SPI interface. The CC2500 is intended for 2400–2483.5 MHz Industrial, Scientific and Medical (ISM) applications. As illustrated in Figure 12.26, the CC2500 interface has three parts. The computer interface uses a SPI protocol, the clock circuit is based on an external crystal, and the antenna circuit must be tuned for the 2.4 GHz frequency. The system will operate up to 500 kbits/sec and the chip implements dual 64-byte FIFOs for transmit and receive. Basically, the 9S12 on the left transmits data via its SPI, and the 9S12 on the right receives the data with its SPI. The CC2500 is a transceiver, meaning data can flow across the link in both directions.
Figure 12.26 Block diagram of a wireless link between two 9S12 systems.
[Figure 12.26 labels: on each side, a 9S12 SPI port connects to a Chipcon CC2500 with its crystal and dipole antenna; the two CC2500s communicate over a 2.4 GHz radio frequency link.]

12.9 Tutorial 12 Performance Debugging
This tutorial performs real-time profiling of the interrupting serial port program in Tutor12.rtf, which is a debugged version of tut4.rtf. Memory dumps are a minimally intrusive way to collect strategic information without greatly affecting the system we are testing. Observing the FIFO will allow us to determine if the system is CPU bound or I/O bound. It will also allow us to select
the optimal FIFO size. In particular, we will collect a histogram of the FIFO size at the time of a call to SCI_OutChar. This global variable was added to the RAM section. It will contain a histogram of the FIFO size during operation.
Dbg_Hist rmb TXFIFO_SIZE
The subroutines shown in Program T12.1 were added to tut4.rtf. Debug_Init clears the histogram, and Debug_Histogram is called at the beginning of SCI_OutChar.
Program T12.1 Debugging instruments to measure FIFO size during operation.
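The listing for Program T12.1 is assembly and is not reproduced here. As a rough C sketch of the same idea, the functions below keep one counter per possible FIFO size and increment the counter matching the size observed at each call. The value of TXFIFO_SIZE, the parameter passed to Debug_Histogram, and the saturating 8-bit counters are assumptions made for illustration.

```c
#include <stdint.h>

#define TXFIFO_SIZE 16   // assumed FIFO size; use the value from the real system

// One counter per FIFO size, mirroring "Dbg_Hist rmb TXFIFO_SIZE" (one byte each).
static uint8_t Dbg_Hist[TXFIFO_SIZE];

// Clear the histogram before the measurement begins.
void Debug_Init(void){
  for(int i = 0; i < TXFIFO_SIZE; i++){
    Dbg_Hist[i] = 0;
  }
}

// Record one observation. The caller passes the number of elements currently
// in the transmit FIFO (hypothetical interface). Counters saturate at 255 so
// that a long run cannot wrap a count back to zero.
void Debug_Histogram(uint8_t fifoCount){
  if(fifoCount < TXFIFO_SIZE && Dbg_Hist[fifoCount] < 255){
    Dbg_Hist[fifoCount]++;
  }
}
```

After a run, a histogram concentrated near index 0 suggests the FIFO is usually empty, while counts piled up near TXFIFO_SIZE-1 suggest it is usually close to full.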
Question 12.1 Observe the debugging instruments and run the system. Describe the FIFO situation. Why doesn’t the TxFifo ever become full?
Question 12.2 Reduce the baud rate to 9600 bits/sec. You will have to change SCI_Init and also click on the IO window and execute IO-CRT . . . Run the slower baud rate system and describe the FIFO. Will it crash if the TxFifo becomes full?
Action: Monitors are minimally intrusive debugging tools used to observe the execution pattern of the system. Program T12.2 shows debugging instruments that can be used to visualize when a subroutine is entered and when it is exited. The enter code is added as the first line of the profiled function and the exit code is executed as the last line.
Program T12.2 Debugging instruments to profile the system.
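The listing for Program T12.2 is not reproduced here. A minimal C sketch of enter/exit instruments, assuming one Port T output bit per profiled function, could look like the following. The macro names, the PT1 assignment, the stub function body, and the plain variable standing in for PTT are all hypothetical.

```c
#include <stdint.h>

// Host-side stand-in for the Port T data register (assumption: on the target
// PTT is a memory-mapped register and the profiled pins are outputs).
static volatile uint8_t PTT;

#define PT1_MASK 0x02   // bit assumed to be assigned to this function

// Set the bit on entry and clear it on exit, leaving the other bits alone.
#define PROFILE_ENTER(mask) (PTT |= (mask))
#define PROFILE_EXIT(mask)  (PTT &= (uint8_t)~(mask))

static char LastChar;   // placeholder for the work done by the real function

// Stub showing where the instruments go: first line and last line, with a
// single exit point so the exit instrument always runs.
void SCI_OutChar_Profiled(char letter){
  PROFILE_ENTER(PT1_MASK);
  LastChar = letter;      // the real function body would go here
  PROFILE_EXIT(PT1_MASK);
}
```

On a logic analyzer, the width of the pulse on the chosen pin then approximates the execution time of the function, and the spacing between pulses shows how often it is called.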
Debugging instruments like these were added to the system. Port T output bits PT3, PT2, PT1, and PT0 are attached to a logic analyzer. In particular, the following four functions will be profiled: SCI_InChar, SCI_OutChar, RxFifo_Put, and TxFifo_Get.
Observation: Profiling is made easier if the subroutine has a single rts exit point at the bottom of the function.
Question 12.3 Figure T12.1 shows the initial activity as the system displays the welcome message. PT3 is the call to TxFifo_Get, and PT1 is the call to SCI_OutChar. Describe this initial behavior of the system. What is the sequence of execution as one character is transmitted in this CPU bound system?
Figure T12.1 Profile the system during the initial “Welcome” message.
Question 12.4 Figure T12.2 shows the activity of the system after a character is typed. PT2 is the call to RxFifo_Put, and PT0 is the call to SCI_InChar. Describe this behavior.
Figure T12.2 Profile the system after a character is typed.
12.10 Homework Problems
Homework 12.1 How do you tell if the following C code is reentrant?
PTT |= 0x80; // set PT7
Homework 12.2 How do you tell if the following C code is reentrant?
Counter++;
Homework 12.3 Consider the memory manager programs of Section 6.9. Are any of the following programs reentrant: Heap_Init, Heap_Allocate, Heap_Release, Fifo_Init, Fifo_Put, and Fifo_Get? How can the problem be fixed?
Homework 12.4 Implement a FIFO system that uses indices instead of pointers to access the FIFO data. Each FIFO element is 1 byte. The following globals cannot be changed (i.e., no counter).
        org  $0800    ;place globals in RAM
MaxSize equ  10       ;size of FIFO (can hold up to MaxSize-1 bytes)
GetI    rmb  1        ;Index into FIFO where to get next
PutI    rmb  1        ;Index into FIFO where to put next
FIFO    rmb  MaxSize
; indices range from 0 to MaxSize-1
; FIFO[GetI] is the oldest data
; FIFO[PutI] is the free spot for next put
; If GetI==PutI, then the FIFO is empty
; If wrap(PutI+1)==GetI then the FIFO is full
The following subroutine initializes the FIFO
Initialize clr  GetI
           clr  PutI
           rts
a) Write the Put function. The data to Put is passed by value on the stack. Return Reg B = 1 if the FIFO was full at the time of the call, and the data could not be saved. Return Reg B = 0 if the data was properly saved. COMMENTS will be graded. Save and restore Registers A,X,Y if used. Here are a couple of typical calling sequences:
     jsr  InCh     ;keyboard data
     psha          ;by value
     jsr  Put
     ins           ;discard parameter
     tstb
     bne  Error

     ldab #55
     pshb          ;by value
     jsr  Put
     ins
     tstb
     beq  OK
b) Write the Get function. The data from Get is returned by reference using the stack. Return Reg B = 1 if the FIFO was empty at the time of the call, and the data could not be removed. Return Reg B = 0 if the data was properly removed. COMMENTS will be graded. Save and restore Registers A,X,Y if used. Here are a couple of calling sequences:
     ldx  #data    ;reference to global
     pshx          ;by reference
     jsr  Get      ;info goes in data
     ins           ;discard parameter
     ins
     tstb
     bne  Error

data set  0
     des           ;allocate data
     tsy           ;pointer to data
     pshy          ;by reference
     jsr  Get      ;info goes in data
     ins
     ins
     tstb
     beq  OK
Homework 12.5 Interrupts are a good method to create a real-time DAS. A typical approach is illustrated in the following software skeleton. Let fs be the sampling rate.
void interrupt 7 RTIHan(void){
  RTIFLG = 0x80;        // Acknowledge by clearing RTIF
  Fifo_Put(Adin());     // save new data
}
void main(void){
  Initialization();     // clear FIFO, arm RTI, enable interrupts
  while(1){
    while(Fifo_Get(&data)){};
    Process(data);
  }
}
The best case, average, and worst case execution times are given in the following table. Assume all other software times can be neglected. The goal of a real-time DAS is to execute Adin() every 1/fs.

Program          Best Case (min)   Typical (average)   Worst Case (max)
Initialization   912 μs            912 μs              912 μs
Adin             25 μs             25 μs               25 μs
Process          500 μs            1000 μs             5000 μs
Fifo_Put         15 μs             15 μs               15 μs
Fifo_Get         20 μs             20 μs               20 μs
What is the maximum sampling rate possible for this interrupt-based DAS? The system must be real-time, continuously processing data with no delayed or lost data points. Continuous means it runs without stopping (e.g., for years). Assume the FIFO is arbitrarily large, but not infinite size.
Homework 12.6 Consider the pointer-based Fifo_Put and Fifo_Get functions shown in Program 12.2. There is one foreground thread that calls Fifo_Get, and two interrupt threads that call Fifo_Put. In particular, the regular RDRF/SCI interrupt and a TOF periodic timer ISR both call Fifo_Put to enter data into the FIFO. Is there a critical section?
Homework 12.7 Consider the linked list Fifo_Put and Fifo_Get functions from Program 6.32. Assume exactly one is called from the foreground main program and one from the background ISR. Is there a critical section? I.e., do interrupts need to be disabled to use these two functions?
Homework 12.8 These six events occur during each RDRF interrupt. Specify the proper time sequence. Put your answer as a six letter sequence, e.g., F,E,D,C,B,A or A,B,C,D,E,F
a) The I bit in the CCR is set by hardware, and the SCI vector address is loaded into the PC
b) The hardware sets the flag bit (e.g., RDRF=1) when a new data frame is received
c) The software writes the data into a permanently allocated memory variable
d) The software reads the SCI0SR1 and reads the SCI0DRL
e) The rti instruction is executed
f) The CCR, A, B, X, Y, PC are pushed on the stack
Homework 12.9 These six events occur during each TDRE interrupt. Specify the proper time sequence. Put your answer as a six letter sequence, e.g., F,E,D,C,B,A or A,B,C,D,E,F
a) The rti instruction is executed
b) The I bit in the CCR is set by hardware, and the SCI vector address is loaded into the PC
c) The hardware sets the flag bit (e.g., TDRE=1) when the transmitter is idle
d) The software reads the SCI0SR1 and writes the SCI0DRL
e) The software reads the new data from a permanently allocated memory variable
f) The CCR, A, B, X, Y, and PC are pushed on the stack
Homework 12.10 The following code was used to acknowledge a receive data register full (RDRF) interrupt. The flag RDRF is bit 5 in the SCISR1 register. Which explanation best describes this code?
SCISR1 |= 0x20;
a) This software only makes the RDRF bit high. It is friendly.
b) This software only makes the RDRF bit low. It is friendly.
c) This software will make all flag bits high in the SCISR1 register. It is not friendly.
d) This software will make all flag bits low in the SCISR1 register. It is not friendly.
e) This will cause a compile error because the software cannot set flag bits in the SCISR1 register.
f) This will cause a run-time crash because the software does not clear RDRF
Homework 12.11 Consider the situation in which a FIFO queue is used to buffer data between a main program (e.g., SCI_OutChar that calls TxFifo_Put) and an output interrupt service routine (e.g., SCIhandler that calls TxFifo_Get and writes to SCIDRL). Experimental observations show this FIFO is usually empty, and at most contains three elements. What does it mean? Choose A-F.
a) The system is I/O bound
b) Bandwidth could be increased by increasing FIFO size
c) The system is CPU bound
d) The FIFO could be replaced by a global variable
e) The latency is small and bounded
f) Interrupts are not needed in this system
Homework 12.12 Consider the situation in which a FIFO queue is used to buffer data between a main program (e.g., SCI_OutChar that calls TxFifo_Put) and an output interrupt service routine (e.g., SCIhandler that calls TxFifo_Get and writes to SCIDRL). Experimental observations show this FIFO often becomes full and usually holds more than five elements. What does it mean? Choose A-F.
a) The system is I/O bound
b) Bandwidth could be decreased by increasing FIFO size
c) The system is CPU bound
d) The FIFO could be replaced by a global variable
e) The latency is small and bounded
f) Interrupts are not needed in this system
Homework 12.13 Consider the situation in which a FIFO queue is used to buffer data between a main program (e.g., SCI_InChar that calls RxFifo_Get) and an input interrupt service routine (e.g., SCIhandler that reads SCI0DRL and calls RxFifo_Put). Experimental observations show this FIFO often becomes full and usually holds more than five elements. What does it mean? Choose A-F.
a) The system is I/O bound
b) Bandwidth could be decreased by increasing FIFO size
c) The system is CPU bound
d) The FIFO could be replaced by a global variable
e) The latency is small and bounded
f) Interrupts are not needed in this system
Homework 12.14 Consider the situation in which a FIFO queue is used to buffer data between a main program (e.g., SCI_InChar that calls RxFifo_Get) and an input interrupt service routine (e.g., SCIhandler that reads SCI0DRL and calls RxFifo_Put). Experimental observations show this FIFO is usually empty, and at most contains three elements. What does it mean? Choose A-F.
a) The system is I/O bound
b) Bandwidth could be increased by increasing FIFO size
c) The system is CPU bound
d) The FIFO could be replaced by a global variable
e) The latency is small and bounded
f) Interrupts are not needed in this system
Homework 12.15 Consider the situation in which the interrupt-driven SCI device is used for debugging purposes. In other words, the basic system operation does not require SCI output, but the programmer adds calls to SCI_OutUDec in order to visualize strategic parameters in real-time during execution. Which statement best describes the intrusiveness of this debugging method?
a) If it is the case that the TxFifo becomes full, SCI_OutUDec will have to wait for the output to complete. In this situation, it is highly intrusive.
b) Because interrupts are used, the time to execute SCI_OutUDec will be short. Therefore, this debugging method is always minimally intrusive.
c) It is nonintrusive, because the SCI is not required for basic system operations.
d) It can be made minimally intrusive if the baud rate is set at the maximum rate.
e) It is minimally intrusive because output occurs in the background.
Homework 12.16 A serial transmission channel passes one bit at a time, whereas a parallel transmission channel passes multiple bits at the same time. It would seem that parallel format would be faster, but when comparing communication protocols used on the PC we notice that the serial channels (e.g., USB, Ethernet, SATA) are much faster than parallel channels (e.g., printer port, IDE). Why might this be true?
Homework 12.17 Design a simplex communication channel between two 9S12s using the SCI port, with FIFO queues and interrupts as appropriate. Assume each 9S12 runs a separate initialization routine at about the same time. Write a public function for the transmitter called by the main program to send a byte and a public function for the receiver called by its main program to accept a byte. Package it up into a module hiding the mechanisms from the policies. Estimate the maximum bandwidth of the channel.
Homework 12.18 Design a simplex communication channel between two 9S12s using the SPI port, with FIFO queues and interrupts as appropriate. Assume each 9S12 runs a separate initialization routine at about the same time. Write a public function for the transmitter called by the main program to send a byte and a public function for the receiver called by its main program to accept a byte. Package it up into a module hiding the mechanisms from the policies. Estimate the maximum bandwidth of the channel.
Homework 12.19 Design a simplex communication channel between two 9S12s using the CAN port, with FIFO queues and interrupts as appropriate. Assume each 9S12 runs a separate initialization routine at about the same time. Write a public function for the transmitter called by the main program to send a byte and a public function for the receiver called by its main program to accept a byte. Package it up into a module hiding the mechanisms from the policies. Estimate the maximum bandwidth of the channel.
Homework 12.20 Design a simplex communication channel between two 9S12s using Ports H and K, with FIFO queues and key wakeup interrupts as appropriate. Assume each 9S12 runs a separate initialization routine at about the same time. Write a public function for the transmitter called by the main program to send a byte and a public function for the receiver called by its main program to accept a byte. Package it up into a module hiding the mechanisms from the policies. Estimate the maximum bandwidth of the channel.
Homework 12.21 Assume you have three Port T pins 2, 3, and 4 available for this problem. Design a full-featured interrupt driven serial port on these lines using input capture and output compare. Implement the equivalent features as described in Program 12.5.
12.11 Laboratory Assignments
Lab 12.1 Distributed Data Acquisition System
Purpose: The purpose of this lab is to learn about real-time sampling, FIFOs, serial port interfacing and distributed systems.
Description: You will extend the system from Lab 11.4 to implement a distributed system. In particular, one 9S12 will sample the data at 5 Hz and a second 9S12 will display the results on its LCD. Basically the hardware/software components from Lab 11.4 will be divided and distributed across two 9S12 microcontrollers. Figure L12.1a shows the data flow graph of the distributed data acquisition system. The sensor is attached to computer 1, and the ADC (ADC_In4 function) in computer 1 generates a digital value from 0 to 1023. The output compare periodic interrupt in computer 1 establishes the real-time sampling at 5 Hz. You will send data from computer 1 to computer 2 using asynchronous serial communication (SCI1). Busy-wait synchronization on TDRE must be used in computer 1, and RDRF interrupt synchronization must be used on computer 2. You can choose any baud rate you wish, as long as both computers use the same rate. One way to send the 10-bit sample is to break it into two parts and transmit it as two 8-bit bytes. For example, let b9b8b7b6b5b4b3b2b1b0 be a 10-bit sample. One possibility is to transmit two bytes 011b9b8b7b6b5 and 010b4b3b2b1b0. This way the receiver can combine the two bytes back into a 10-bit sample without mistakenly switching the most significant and least significant parts. Also notice that all transmissions are printable ASCII, so the two parts of the system can be separately debugged
in TExaS. The time to transmit one bit is called the bit time, which is 1 divided by the baud rate. Every 200 ms, two bytes will be transferred (a total of 20 bits). Choose a baud rate so that the time to transmit 20 bits is short compared to 200 ms. This will guarantee that both the transmit data register and the transmit shift register will be empty at the time the OC ISR is executed. Therefore, calling SCI_OutChar twice (busy-wait synchronization) will not actually have to wait, because the first data will be moved immediately into the transmit shift register and the second data can be loaded into the transmit data register (to be transmitted after the first frame is done). With this protocol if you lose a transmission, then the receiver should discard the extra byte. In this scheme, it does not matter which computer starts first. An alternate scheme to transmit 10-bit data on an 8-bit channel is to encode the data as a signed 8-bit difference from the previous data. The receiver starts with a 16-bit 0, each 8-bit signed data received is promoted to 16-bit signed and added to the previous value. One flaw in this protocol is if you lose a transmission, then an error will exist in all subsequent samples. However since each 10-bit sample is transmitted as only one 10-bit frame, this protocol will be twice as fast as the previous. If you implement a scheme that requires 3 or more SCI transmissions per sample, then the busy-wait synchronization in the transmitter will actually have to wait. Therefore, if your protocol requires three or more SCI transmissions per ADC sample, you will need to arm transmit interrupts and use a transmit fifo. This system is simple enough that no data should be lost. Thus, you do not have to solve the lost data scenario. However, you might have to start computer 2 before starting computer 1. An RDRF interrupt will occur in computer 2 for every SCI frame received. 
The FIFO queue is used to pass data from the RDRF interrupt service routine (background on computer 2) to the main program running in the foreground on computer 2. If the rate at which the ISR puts data into the FIFO is slower than the rate at which data can be sent to the LCD, then the FIFO will never become full. You are free to implement either a 16-bit FIFO (every other RDRF interrupt puts into the FIFO) or an 8-bit FIFO (every RDRF interrupt puts). The main program in computer 2 will output the position on its LCD.
Figure L12.1a Data flows from the sensor through the two microcontrollers to the LCD. The output compare timer is used to trigger the real-time sampling. Use the special serial cable to connect the two SCI1 ports.
[Figure L12.1a labels. Computer 1: position sensor (position 0 to 3 cm, voltage 0 to +5V), ADC hardware, ADC driver (sample 0 to 1023), OC hardware, OC ISR, SCI driver (data 0 to 255), SCI1 hardware. Computer 2: SCI1 hardware, RDRF ISR (data 0 to 255), FIFO, main (data 0 to 1023), LCD driver (fixed-point 0 to 3.000), LCD display.]
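The two encodings described in this lab can be sketched in C as follows. This is a hedged illustration rather than the lab solution: the function and type names are hypothetical, and the clamping of the difference to the signed 8-bit range is an added assumption (the lab text does not say what to do when consecutive samples differ by more than one byte can express).

```c
#include <stdint.h>

// Two-byte scheme: high = 011 b9 b8 b7 b6 b5, low = 010 b4 b3 b2 b1 b0.
void Sample_Pack(uint16_t sample, uint8_t *high, uint8_t *low){
  *high = (uint8_t)(0x60 | ((sample >> 5) & 0x1F));
  *low  = (uint8_t)(0x40 | (sample & 0x1F));
}

typedef struct {
  uint8_t haveHigh;   // 1 after a high part arrives and before its low part
  uint8_t high;       // the saved five most significant bits
} SampleRx;

// Feed one received byte. Returns 1 and stores the 10-bit sample when a
// high/low pair completes; returns 0 otherwise. A low part without a
// preceding high part is discarded, so a lost frame costs one sample.
int Sample_Receive(SampleRx *rx, uint8_t byte, uint16_t *sample){
  if((byte & 0xE0) == 0x60){            // tag 011: most significant part
    rx->high = byte & 0x1F;
    rx->haveHigh = 1;
    return 0;
  }
  if((byte & 0xE0) == 0x40 && rx->haveHigh){   // tag 010: least significant part
    *sample = (uint16_t)((rx->high << 5) | (byte & 0x1F));
    rx->haveHigh = 0;
    return 1;
  }
  rx->haveHigh = 0;                     // orphaned or unknown byte: discard
  return 0;
}

// Difference scheme, transmitter side: send the signed 8-bit change from the
// value the receiver currently holds (tracked starts at 0 on both sides).
int8_t Delta_Encode(int16_t *tracked, uint16_t sample){
  int16_t diff = (int16_t)((int16_t)sample - *tracked);
  if(diff > 127)  diff = 127;           // assumed clamp: catch up over several frames
  if(diff < -128) diff = -128;
  *tracked = (int16_t)(*tracked + diff);
  return (int8_t)diff;
}

// Difference scheme, receiver side: promote to 16 bits and accumulate.
int16_t Delta_Decode(int16_t *value, int8_t delta){
  *value = (int16_t)(*value + delta);
  return *value;
}
```

With the two-byte scheme, the sample 1023 is sent as 0x7F followed by 0x5F. With the difference scheme, a step from 0 to 300 takes three frames (127, 127, 46) under the assumed clamp, and any lost frame leaves a permanent offset, as the text notes.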
Figure L12.1b shows a possible call graph of the system. Dividing the system into modules allows for concurrent development and eases the reuse of code.
Figure L12.1b A call graph showing the modules used by the distributed data acquisition system.
[Figure L12.1b labels. Computer 1: main1, OC init, OC hardware, OC ISR, ADC driver, ADC hardware, SCI driver, SCI1 hardware. Computer 2: main2, RDRF ISR, SCI driver, SCI1 hardware, FIFO, LCD driver, LCD hardware.]
Computer 1 Software Tasks
a) Write a subroutine: SCI_Init1 that will initialize the SCI1 transmitter in computer 1.
1. Enable SCI1 transmitter (no interrupts)
2. Set the baud rate
b) Write a subroutine: SCI_OutChar for computer 1 that sends one byte using busy-wait synchronization on TDRE.
1. Wait for TDRE in SCI1SR1 to be 1
2. Write a byte to SCI1DRL
c) Modify the output compare interrupt handler from Lab 11.4 so that it samples the ADC at 5 Hz and sends the data to the other computer using SCI. The interrupt service routine performs these tasks
1. acknowledge the output compare interrupt by clearing the flag that requested the interrupt
2. specify the time for the next interrupt
3. toggle PT7 (change from 0 to 1, or from 1 to 0)
4. sample the ADC
5. break the 10-bit sample into two parts and send two bytes to the other computer (calls SCI_OutChar twice)
6. return from interrupt
d) Write the main program for computer 1, which initializes the PLL, timer, ADC, SCI1, and output compare interrupts. After initialization, this main program (foreground) performs a do-nothing loop. The entire run-time operations in computer 1 occur in the output compare interrupt service routine (background).
Computer 2 Software Tasks
e) Design, implement and test a FIFO software module for computer 2 that operates on either 8-bit or 16-bit values. This module should operate in a similar manner as the FIFOs in the example tut4. I.e., write three subroutines Fifo_Init, Fifo_Put, Fifo_Get. The size of the queue can be about 4 to 6 elements. Use the simulator to test these functions. The software design steps are
1. Define the names of the functions, input/output parameters, and calling sequence. Type these definitions in as comments that exist at the top of the subroutines.
2. Write pseudo-code for the operations. Type the sequences of operations as comments that exist within the bodies of the subroutines.
3. Write assembly code to handle the usual cases. I.e., at first, assume the FIFO is not full on a put, not empty on a get, and the pointers do not need to be wrapped.
4. Write a main program to test the FIFO operations. Debug in the TExaS simulator.
5. Iterate steps 3 and 4 adding code to handle the special cases.
f) Write a subroutine: SCI_Init2 that will initialize the SCI1 receiver in computer 2.
1. Clear a global error count
2. Enable SCI1 receiver (arm interrupts for RDRF)
3. Set the baud rate to match computer 1
4. Enable interrupts
g) Write a RDRF interrupt handler that receives data from the other computer and puts them into a FIFO queue. The number of lost samples will be maintained in the global error count. The interrupt service routine performs these tasks
1. Acknowledge the interrupt by clearing the flag which requested the interrupt
2. Read the data received from SCI1DRL
3. Toggle PT7 (change from 0 to 1, or from 1 to 0)
4. Put the new data into the FIFO queue
5. Increment a global error count if the FIFO fills up (but don’t loop back)
6. Return from interrupt
h) Design a main program for computer 2 that reads data from the FIFO, converts it to fixed-point, and displays the measurement using the LCD routines. The main program in this data acquisition system performs these tasks
1. Initialize PLL, FIFO, LCD, and SCI
2. Try to remove a sample from the FIFO queue
3. Go back to step 2 if the FIFO was empty and no data is available
4. Convert sample to fixed-point
5. Output the result as a fixed-point number with units
6. Repeat steps 2,3,4,5 over and over
i) In this section you will estimate the maximum sampling rate. The limitation of computer 1 will be the time it takes to transmit 20 bits on the SCI. The limitation of computer 2 will be the time it takes to display one measurement on the LCD (e.g., if you move the LCD cursor rather than clearing the display, it will run faster). Change the output compare interrupt period in computer 1 so that it is close to but larger (slower) than the time for computer 1 to transmit one measurement and the time it takes computer 2 to display one measurement. Experimentally, verify the system can operate properly at this sampling rate (show the FIFO never gets full). Next, change the output compare sample period so that it is close to but smaller (faster) than these two times. Experimentally, determine what happens when you try to sample this fast (it doesn’t work; explain what happens and why). Without actually doing it, describe two changes to the system you could make so that the sampling rate could be increased beyond this limit. Hint: changing the size of the FIFO will not affect the maximum sampling rate.
Appendix 1 Embedded System Development Using TExaS
A1.1 Introduction to TExaS
The goal of this introduction is to overview the available information describing TExaS. TExaS is an educational product, to be used by the student as he or she learns how to design, implement and test embedded systems. During the normal development process, one simulates first and builds a real system second. If you are a first-time user of TExaS, I suggest you watch the movies. Each movie addresses a fundamental concept, so go to this web site http://users.ece.utexas.edu/~valvano/ and click on Instructional Movies. The first three lessons provide sufficient information, allowing you to design, implement and test simple systems. The second source of information is the help system within TExaS. The help system is especially useful for looking up information about the op codes, pseudo-op codes, and I/O device configurations. The third source of information is the large quantity of example systems, some of which are listed in Table A1.1. I suggest when developing a system, you start with an example similar to your intended target, then learn how the example works. This will give you great insight as to what steps must be taken, what steps need not be taken, and the sequencing of the steps. TExaS is an integrated application containing an editor, assembler, instruction set simulator, I/O port simulator, and external device simulator. TExaS is an educational product. It was developed as a learning tool for real-time embedded systems. TExaS simulates a wide range of external devices that can be configured and attached to the microcomputer. It simulates all the machine instructions and many of the I/O ports of the 9S12 microcomputer. The interactive editor/assembler assists the new programmer. Extensive runtime checking helps to identify tricky software bugs. Students can use this application for a laboratory course on microcomputer programming or microcomputer interfacing.
Since TExaS does not allow the user to create new external devices, its use for developing commercial products is inappropriate. Although you have full creative control over the software you write and the I/O ports you use, you are limited to specific predefined external devices. Like many applications, the windows are used to observe and modify the state of the simulated environment. Associated with each window is a file that contains the actual information. TExaS supports all five phases of software development:
■ Defining the microcomputer type and memory configuration
■ Writing the program source code using an editor
■ Assembling the source code and loading the object code into memory
■ Interfacing external components
■ Debugging the program by running it on the interactive simulator
Table A1.1 Example starter files for assembly development within TExaS.
File       Description
bsema4     Binary spinlock semaphore
etbl       Extended precision table interpolation
fact       Factorial example
fixmem     Fixed block size memory manager
for        Examples of for loops
fuz        Fuzzy logic controller, two examples
fuz8       8 bit fuzzy logic controller
FuzMotor   Fuzzy logic motor controller
hc12       Symbol definitions for all 9S12 ports
hd44780    Hitachi HD44780 LCD example
ic         Input capture example with interrupts
ir         IR remote control example with input capture interrupts
key        Simple 4 by 4 matrix keyboard interface
keywake    Key wakeup example with interrupts
Lcd        Liquid crystal display example
list       1-D list data structure example
LLfifo     Dynamic linked list FIFO with memory manager
math       32-bit multiply, 32-bit divide, 64-bit addition
mealy      Mealy finite state machine with a statically allocated linked list
moore      Moore finite state machine with a statically allocated linked list
Motor      DC motor, DAC actuator, ADC tachometer, software incremental control
OC         Squarewave generation using output compare, no interrupts
PIDMotor   DC motor, DAC actuator, ADC tachometer, software PID control
PLL        Software to activate the PLL
port12     Symbol definitions for 9S12 ports simulated by TExaS
PWMod      Pulse width modulation using output compare
Random     16 bit and 32 bit random number generators
robot      Stepper motor robot car
RTI        Real time interrupt examples with and without interrupts
scan       4 by 4 matrix scanned keyboard interface
sci        Serial Communications Interface example, no interrupts
sqrt       Fixed point square root
SqWave     Multiple squarewaves generated with output compare interrupts
stepper    Stepper motor output with a statically allocated linked list
table      2-D table data structure
tbl        8 bit table lookup with interpolation
Thread     Preemptive thread switcher with spinlock semaphores
TOF        Timer Overflow examples with and without interrupts
Trap       Software interrupt and illegal instruction trap
tree       Command interpreter based on a binary tree data structure
tut        Simple switch input/output
tut2       Busy-waiting SCI interface
tut3       ADC input with SCI output
tut4       Interrupting SCI interface
tut5       Periodic output compare interrupt, switch trigger input capture
uif        Unsigned if examples
wai        wai instruction example with interrupts

There are six types of files used by the simulator. The first type is a Program file (*.rtf). Program files contain assembly source code, which includes explicit instructions for the computer in human readable format. Files of this type can be opened, edited, and saved by any application that supports standard rich text format (RTF). Applications that support RTF include TExaS (of course), Microsoft Word, and WordPad. TExaS allows you to simultaneously edit multiple RTF files. When your program is assembled, TExaS puts detailed information in the assembly listing. A cathode ray tube (CRT) terminal is a common device used by computers to input and output data. The following three RTF files are not program files, but rather have specific functions:
TheLog.rtf    TExaS logs information into this file as the program is running
TheList.rtf   This file contains the assembly listing
TheCRT.rtf    This file contains the input/output data of a simulated remote CRT terminal
Appendix 1 䡲 Embedded System Development Using TExaS
All other RTF files are considered assembly source files that can be edited, assembled, and executed. TExaS includes a cross-assembler, which runs on one computer (the Windows platform) and converts assembly source code into object code for a different computer (Freescale 9S12). To develop C programs for the 9S12 you will need a cross-compiler. If you are developing C language programs with a C Cross-Compiler then you will not need any assembly source files. On the other hand, if you are developing assembly programs with this simulator, then you must have at least one assembly source file. It is into this file that you will type your assembly source code. The Windows operating system typically assumes the RTF files are WordPad or Microsoft Word files, so we won’t be able to start the simulator by double clicking the RTF icons. The file TheCRT.rtf will automatically open when simulating a remote terminal. Normally the two files TheLog.rtf and TheList.rtf will remain open when running the simulator, but they can be closed or minimized to increase the speed (and decrease the visibility) of the simulation. If closed, these two files can be opened again to restore the visibility. Observation: Hiding windows will improve the simulation speed (less human time as compared to microcontroller time). Checkpoint A1.1: What is an assembly source code?
The next file type is a Microcomputer file (*.uc). The details of the microcontroller are saved in this file. For example, if you are developing a system based on the 9S12DP512, this file contains the components inside the 9S12DP512 chip itself. In particular, the instruction set engine (e.g., CPU12), the microcomputer type (port structure), the microcomputer speed (crystal frequency), the simulation modes, the ViewBox entries, and the Break/ScanPoints entries and formats are saved in this file. You must have exactly one Microcomputer file open to run the simulation. You will be able to start the simulator by double clicking a microcomputer file.
The third file type is an I/O Device file (*.io). This is an optional file containing the details of the external I/O devices that are connected to the microcomputer ports. Again, if you are developing a system based on the 9S12DP512, this file contains the components outside the 9S12DP512 chip. In particular, the I/O file defines devices external to the microcomputer like switches, LEDs, LCDs, keyboard, the CRT remote terminal, IR remote control, motors, and sensors. You can only have one I/O Device file open at a time. You will be able to start the simulator by double clicking an I/O Device file.
The fourth file type is a Stack file (*.stk). The stack is a block of memory that the computer uses to hold temporary information. The top of the stack is defined by the register SP. In TExaS, the stack file (e.g., tut.stk) is used to create a window that you can use to observe the microcomputer stack during program execution. You can only have one Stack window open at a time. To view the stack, open an existing stack file, or create a new Stack file. The format for viewing the stack window is saved and recalled in the Microcomputer file. There are two options for defining the bottom of the stack, i.e., the value of the SP when the stack is empty. In manual mode, you explicitly define the location of the bottom of the stack by entering its address. In automatic mode, the simulator will set the bottom of the stack to the value loaded into the SP by the lds assembly instruction.
The fifth file type is a Scope file (*.scp). These are optional files used for debugging. You may have up to two scope files open at a time. This file is used to create a window that you can use to observe the time dependent behavior of digital and analog signals during program execution. The scope can be configured as an oscilloscope or a digital logic analyzer. An oscilloscope, or scope for short, is a hardware debugging tool that shows the value of an analog voltage (e.g., -12 to +12 V) versus time. A logic analyzer is also a hardware debugging tool, but it shows digital signals (e.g., true and false) versus time. To use a scope, open an existing scope file, or create a new Scope file.
Checkpoint A1.2: An oscilloscope graphically displays information about an electronic circuit. Which parameter is displayed on the y-axis and which parameter is displayed on the x-axis?
Checkpoint A1.3: How does a logic analyzer differ from an oscilloscope? Observation: Closing the scope window greatly improves the simulation speed (less human time as compared to microcontroller time).
The last file type is a Plot file (*.plt). Files of this type are used to display the map when simulating the stepper motor robot. The colors on the map represent levels of friction or elevations. Black lines are walls and the red dots show where the robot is. The dots are left on the field so you can see where it has been. For more information on the stepper robot, see the help system within the application.
When you open one of these six types of files, TExaS will automatically open files of the other five types if they exist in the same directory (same name but different extension). For example, double clicking the tut.uc icon will start the application and open the existing files tut.rtf, tut.uc, tut.io, tut.scp, and tut.stk (no tut.plt file exists). Remember there can be only one Microcomputer, one I/O Device, two Scope, and one Stack file open at a time. So if one of these types is already open when you open a program file (*.rtf), TExaS cannot open a second Microcomputer, I/O Device, or Stack file. Notice also that even though you can open two scope files, only one can be automatically opened. The other scope file will have to be manually opened.
There are two approaches when first starting to use TExaS. The first approach involves modifying an existing example. The second approach is to start from scratch. The various microcomputers have separate assemblers, instruction simulators, and I/O port simulators. Consequently, the first decision we must settle is the microcomputer type. If you were to change the microcomputer type in the middle of a software development project, then you must rebuild the I/O devices in the IO file. In summary, it is necessary to have at least one program file (*.rtf) and one Microcomputer file (*.uc) open in the application to develop assembly language software using the simulator. When developing software with a C compiler, only the Microcomputer file (*.uc) is needed.
If you need devices external to the microcomputer (like switches, LED, LCD, keyboard, or CRT terminal) then an I/O file will be required. Also, having the two files TheLog.rtf and TheList.rtf open will greatly enhance the interactivity of the application. In the virtual world of simulation all of these files (and their associated windows) together constitute the embedded system.
A1.2 Major Components of TExaS
Next, we will review the five major components of this application, and then we will outline the steps required to develop software using the simulator. The components allow you to write software, convert the software into machine code, simulate the execution by the processor, simulate the I/O ports of the microcomputer, and simulate external devices connected to the microcomputer.
Editor: The TExaS editor is used to create assembly source files. If you are developing C language programs with a C cross-compiler like ICC12 or Metrowerks CodeWarrior, then you will not need this editor, but rather you will use the editor associated with the compiler. On the other hand, if you are developing assembly language programs, you will create your assembly language source code using this editor. In some ways, this editor is very similar to WordPad. This editor supports fonts, styles, multiple files and importing/exporting text files. In addition to the usual editor functions, there are two special features that assist assembly language development. The first feature is its ability to embed figures into the assembly source files. These figures can be flowcharts or circuit diagrams, and they can be quite useful in documenting your
system. The tut2.rtf file is a good example of how to use embedded figures to document software. The second special feature is the automatic recolor feature. Enabling this automatic behavior provides real-time feedback while you are typing in the source window. Because a complete 2-pass assembly is performed to recolor the file after each change, it is recommended to enable this option only for short programs. For medium-sized programs, you can activate the "single-line recolor" mode, which performs a partial syntax check on a few lines above and below the editor focus. For long programs, the assembler will color the source code only when the program is explicitly assembled. The source colors are very helpful in identifying syntax errors.
Assembler: The TExaS assembler converts source code (human-readable instructions like ldaa $10) into object code (microcomputer-readable instructions like $96 $10). There is a simple and direct translation between assembly instructions given in the source code and machine instructions given in the object code. During the first pass, the symbol table is created. The symbol table is a mapping from a symbolic name to its corresponding 16-bit address (e.g., PORTB on the 9S12 is $0001). During the second pass, the object code and assembly listing are created. The machine code is automatically loaded into the simulated memory. You may also create Freescale-standard S19 object code files if desired. The simulated execution does not require S19 object files. On the other hand, S19 object code files would be needed if you were to use the editor and assembler to create software to export object code to another application. The listing file, created in the TheList.rtf file, contains both the source and object codes combined in a human-readable format.
Once again there are several unique features of this assembler. If enabled, the assembler will add color to the source code, showing individual items like labels, op codes, pseudo-op codes, numbers, strings, and comments (see the Assemble->Options... command). You can configure the assembler to identify illegal items with a special color or format. The specific colors for each item can be specified with the Assemble->TextFormat... command. The second unique feature is the extensive error reporting that is included when a syntax error is found. Rather than reporting the usual terse "syntax error" message, this assembler attempts to suggest ways to correct the error. The third unique feature is its expression evaluator, which applies the standard rules of precedence and parentheses. For example,
five    equ   '0'|(3+4*5/10)
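To see what this expression yields, the same arithmetic can be checked in C. This is only a sketch: it assumes the assembler uses C-like operator precedence and truncating integer division, which the text implies but does not spell out. The function names five_value and print_five are hypothetical.

```c
#include <stdio.h>

/* Mirrors the assembly line:  five equ '0'|(3+4*5/10)
   Assumption: C-like precedence and integer (truncating) division. */
int five_value(void) {
    /* 4*5 = 20, 20/10 = 2, 3+2 = 5, then '0' ($30) OR 5 gives $35 */
    return '0' | (3 + 4 * 5 / 10);
}

/* Prints the value in assembler-style hex and as an ASCII character. */
void print_five(void) {
    int v = five_value();
    printf("five = $%02X ('%c')\n", (unsigned)v, v);
}
```

Under these assumptions the symbol five equals $35, the ASCII code for the character '5', which is presumably why the label was chosen.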
The application will automatically create object code for the microcomputer you have selected. There are many assemblers on the market for the 9S12. This assembler attempts to support many of their syntax and commands. This means you can import software originally written for these other assemblers into TExaS. Checkpoint A1.4: What is object code?
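The S19 object files mentioned above are text files made of Motorola S-records. As an illustration, the sketch below formats one S1 record (16-bit address) and computes its checksum, which is the ones' complement of the low byte of the sum of the count, address, and data bytes. The function names are hypothetical, and the exact records TExaS writes are not described in the text, so treat this as a sketch of the general format rather than of the TExaS output.

```c
#include <stddef.h>
#include <stdio.h>

/* Checksum of an S-record: ones' complement of the low byte of the
   sum of the count byte, the address bytes, and the data bytes. */
unsigned char srec_checksum(unsigned char count, unsigned int addr,
                            const unsigned char *data, size_t n) {
    unsigned int sum = count + ((addr >> 8) & 0xFF) + (addr & 0xFF);
    for (size_t i = 0; i < n; i++) {
        sum += data[i];
    }
    return (unsigned char)(~sum & 0xFF);
}

/* Formats one S1 record into out and returns the number of characters.
   out must hold at least 2*n + 11 characters (including the terminator). */
int srec_format_s1(char *out, size_t outsize, unsigned int addr,
                   const unsigned char *data, size_t n) {
    unsigned char count = (unsigned char)(n + 3); /* 2 address + n data + 1 checksum */
    int len = snprintf(out, outsize, "S1%02X%04X",
                       (unsigned)count, addr & 0xFFFFu);
    for (size_t i = 0; i < n; i++) {
        len += snprintf(out + len, outsize - (size_t)len, "%02X",
                        (unsigned)data[i]);
    }
    len += snprintf(out + len, outsize - (size_t)len, "%02X",
                    (unsigned)srec_checksum(count, addr, data, n));
    return len;
}
```

For example, a reset vector at $FFFE holding the address $4000 (bytes $40, $00, since the 9S12 stores the high byte first) formats as S105FFFE4000BD.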
Instruction Set Simulator: This part of the TExaS application implements the basic central processing unit (CPU) of the 9S12. The program counter, or PC, is a special register located in the CPU used to control program execution. Software is a sequence of specific instructions to be executed. These instructions are fetched and executed using the current value of the program counter, PC. The simulated memory supports the appropriate amount of RAM, EEPROM, or ROM. The bus is a set of digital signals that connect the CPU, memory, and I/O devices. The unique aspects of the instruction set simulator are the bus cycle activity and the extensive error checking. The exact read/write cycles of the 6811 can be viewed. The three-element 16-bit instruction queue of the 9S12 will be explained in Chapter 3. TExaS simulates the 9S12 memory bus activity using simplified 8-bit memory read/write cycles similar to those of the 6811. Although the software/hardware timing is accurate when simulating a 9S12, the memory bus activity is shown in a simplified format. Information collected during simulation is recorded in the TheLog.rtf file. The Mode
menu commands are used to configure the instruction set simulator. There is a clear tradeoff between the simulation speed (your program runs faster) and the amount of information you can observe (your program runs slower.) Some Mode menu commands are illustrated in Table A1.2.
Table A1.2 TExaS Mode menu commands.
Command            Description
Processor          Specify which processor to simulate
Simulation Speed   Choose the number of instructions between screen updates
Run Mode           Specify various execution modes
Open S19 Mode      Specify parameters for loading externally-compiled S19 code
Follow PC          Cursor in program document is updated
Halt on Error      Halt execution on a program error
Break Mode         Toggle between breakpoints and scanpoints
Cycle View         Show bus cycles in TheLog.rtf during execution
Instruction View   Show instructions in TheLog.rtf during execution
Log Record         Record ViewBox data in TheLog.rtf during execution
The Mode->Processor command allows you to select the processor and memory configuration of the microcomputer. The Mode->SimulationSpeed command allows you to choose the number of instructions executed between screen updates. Many of the Mode menu commands can be set in one dialog box using the Mode->RunMode command. You use the Mode->OpenS19Mode command to specify how TExaS imports object code created with a cross-compiler. Toggle the Mode->FollowPC command to enable and disable highlighting the current instruction being executed in the TheList.rtf window. Toggle the Mode->CycleView command to enable and disable showing memory bus cycles in TheLog.rtf during execution. Toggle the Mode->InstructionView command to enable and disable displaying instructions in TheLog.rtf as they are executed. Toggle the Mode->LogRecord command to enable and disable recording strategic information in TheLog.rtf during execution. While the Mode menu is used to configure the simulation, the Action menu initiates activity. Some Action menu commands are shown in Table A1.3.
Table A1.3 TExaS Action menu commands.
Command         Key        Description
Reset           F9         Hardware reset
Step            F10        Single step, execute one instruction
StepOver        Shft F10   Step over, execute 1 instruction or 1 subroutine
StepOut         Alt F10    Finish subroutine
RuntoCursor     Alt 1      Run to cursor in listing file
Few             F11        Execute a few instructions and stop
Go              F12        Start/stop execution
OpenS19Again    Alt 2      Reload S19 object code and listing file
OpenS19         Alt 3      Load S19 object code and listing file
BackDump        Alt 4      Dump log data from most recent instructions executed
BreakatCursor   Alt 5      Place a breakpoint at listing cursor
The Action->Reset (F9) command performs a hardware reset on the microcomputer system. The Action->Step (F10) command executes one instruction. The Action->StepOver (Shft F10) command will execute one instruction. If that instruction is a subroutine call, then the entire subroutine will be executed. The Action->StepOut (Alt F10) will execute until the subroutine is finished then stop. Toggle the Action->Go (F12) command to start and stop simulation. You use the Action->OpenS19 command to
import object code created with a cross-compiler. Executing Action->BackDump (Alt 4) will display the activity generated by the most recent instructions executed. The second unique aspect of this simulator is the error checking. Examples of illegal activity include:
■ Execution of an illegal instruction
■ Read/write to an undefined address
■ Stack underflow (causing a read/write from unimplemented memory)
■ Write to ROM, EEPROM
■ Read from unprogrammed ROM, EEPROM
■ Read from RAM that has not yet been written to
■ Read from an unimplemented I/O port
These error-checking operations will catch many run-time programming errors. Real 9S12 microcomputers will execute an unimplemented instruction trap interrupt (like a software interrupt) when the processor attempts to execute an illegal instruction. The TExaS application gives you the option of halting simulation or executing the trap interrupt (like the real microcomputer). You select this option using the Mode->RunMode... command. Software bugs often result in one of the illegal activities shown above. Whereas a real computer gives garbage data then continues on when executing an illegal read or write, this simulator will report the error and stop. Debugging, the process of identifying and removing software errors, is an important aspect of embedded system development, and this simulator has many powerful debugging tools.
I/O Port Simulator: This part of the application simulates many of the I/O peripherals on the microcomputer. Simple peripherals include the parallel I/O ports with direction registers as appropriate. Other functions like the timer, timer overflow, input capture, output compare, key wakeup, serial communications interface, and ADC are available as supported by the actual microcomputer device. Most but not all peripherals on the 9S12 are supported. For a complete list of implemented features see the Port12.rtf file created when TExaS is installed. Interrupts (flags, masks, priority, and vectors) are accurately simulated. The application will automatically simulate the I/O ports for the specific microcomputer you have selected. For the latest implementation details see the readme.txt file. If your software accesses an unimplemented I/O port, a run-time error is generated. As we saw earlier, the command Mode->HaltonError will enable/disable the reporting of run-time errors.
External Device Simulator: This is one of the most complex yet important parts of the TExaS application.
What makes embedded system programming interesting is its interaction with physical devices external to the microcomputer. These devices are configured using the commands in the IO menu. An IO file must be open to create external I/O devices. The user interacts with these devices (e.g., toggling a switch) using the IO window. Logic probes and voltmeters are automatically attached as appropriate. A logic analyzer and oscilloscope can be added to provide visual information about signals outside the microcomputer chip.
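Several of the run-time checks listed earlier in this section (write to ROM, read from memory that was never written, access to an undefined address) can be pictured as bookkeeping wrapped around every simulated memory access. The sketch below is not the TExaS implementation; the memory map (RAM at $0800 to $3FFF, ROM from $4000 up) is a hypothetical one loosely based on Program A1.2, I/O registers are left out, and all names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    MEM_OK,
    ERR_UNDEFINED_ADDR, /* address is neither RAM nor ROM in this sketch */
    ERR_WRITE_TO_ROM,   /* program tried to store into ROM */
    ERR_UNINIT_READ     /* read of a byte never written or programmed */
} mem_status;

/* Hypothetical memory map: RAM $0800-$3FFF, ROM $4000-$FFFF. */
#define RAM_START 0x0800u
#define ROM_START 0x4000u

static uint8_t mem[0x10000];
static bool written[0x10000]; /* true once a byte holds defined data */

static bool is_ram(uint16_t a) { return a >= RAM_START && a < ROM_START; }
static bool is_rom(uint16_t a) { return a >= ROM_START; }

/* Loader path: programs one ROM byte, as when object code is loaded. */
void sim_program_rom(uint16_t addr, uint8_t value) {
    if (is_rom(addr)) {
        mem[addr] = value;
        written[addr] = true;
    }
}

/* Store issued by the running program. */
mem_status sim_write(uint16_t addr, uint8_t value) {
    if (is_rom(addr)) return ERR_WRITE_TO_ROM;
    if (!is_ram(addr)) return ERR_UNDEFINED_ADDR;
    mem[addr] = value;
    written[addr] = true;
    return MEM_OK;
}

/* Load issued by the running program; *value is set only on MEM_OK. */
mem_status sim_read(uint16_t addr, uint8_t *value) {
    if (!is_ram(addr) && !is_rom(addr)) return ERR_UNDEFINED_ADDR;
    if (!written[addr]) return ERR_UNINIT_READ;
    *value = mem[addr];
    return MEM_OK;
}
```

A simulator built this way can stop and report the error code at the offending access, whereas real hardware would return garbage and continue.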
A1.3 Embedded System Design Process
Figures A1.1 through A1.4 illustrate the design process of an embedded system with four switches and four light emitting diodes (LEDs) using a simulator like TExaS. The requirement of this system is to have each switch control an LED. If a switch is pressed, the corresponding LED comes on. Figure A1.1 shows the circuit diagram of the system, drawn during the early stages of the design.
Figure A1.1 Circuit diagram of an embedded system with four inputs and four outputs. (The diagram is divided into Inputs, Processing, and Outputs. Its labels include the 9S12 with port pins PH3-PH0 and PT3-PT0, 74HC14 and 7405 gates, +5V supplies, and 22Ω, 200Ω, 1kΩ, and 5μF components.)
Next, the system is "built" in the simulator, and Figure A1.2 shows the I/O window within TExaS.
Figure A1.2 TExaS simulation of an embedded system with four inputs and four outputs. (The I/O window shows the devices attached to PT3-PT0 and PH3-PH0.)
Next, we will discuss the process of software development within the simulator. One way to develop assembly software is to first write the software in a high level language like C, then convert the software by hand into assembly. Program A1.1 shows the C code for this simple system. The line numbers are not part of the program, but were added for this example in order to help with the explanation. Characters between /* and */ are comments, and are added as documentation. Lines 2, 3, and 4 are inserted by the compiler wizard when a new project is created. In particular, Line 3 defines symbols for all the I/O ports on the 9S12DP512. These symbols make the software easier to read. For example, line 11 could have been written as *(unsigned char volatile *)(0x0240) = Data;
Program A1.1 C language program for the 9S12DP512 written in Metrowerks CodeWarrior.
 1  /* ********ChapA1.c**********************/
 2  #include <hidef.h>         /* common defines and macros */
 3  #include <mc9s12dp512.h>   /* derivative information */
 4  #pragma LINK_INFO DERIVATIVE "mc9s12dp512"
 5  unsigned char Data;
 6  void main(void){
 7    DDRH = 0x00;     /* Port H is an input */
 8    DDRT = 0xFF;     /* Port T is an output */
 9    while(1){
10      Data = PTH;    /* read switch value into Data */
11      PTT = Data;    /* write value to LEDs */
12    }
13  }
Line 5 defines a global variable. Line 7, when executed, defines Port H as an input port. Similarly, the execution of Line 8 makes Port T an output. The while(1) code causes lines 10 and 11 to be executed over and over. Executing the code Data = PTH; will bring a copy of the 8 input pins of Port H into the global variable. The code PTT = Data; stores the value from the global out to the output Port T, changing the pattern on the LED lights.
Program A1.2 shows the assembly code for the simple system shown previously in Figures A1.1 through A1.4. The line numbers are not part of the program, but were added for this example in order to help with the explanation. Characters that follow a semicolon (;) are comments, and are added as documentation. A line of assembly code has four fields. The label field is optional and starts in the leftmost column. The next field contains the op code (like ldaa) or a pseudo-op code (like equ). Op codes contain actual instructions to be executed by the computer. For example, ldaa brings a value from memory or I/O port into Register A, staa sends a value from Register A out to memory or I/O port, and bra causes the program to branch. Pseudo-op codes give instructions to the assembler and are not executed by the computer. The third field is the operand field, which contains information needed by the instruction. The instruction ldaa #n will load the number n into Register A. For example,
Program A1.2 Assembly language program for the 9S12DP512.
 1  ; Appendix 1 tutorial program for the 9S12DP512
 2  PTH   equ  $0260   ; Port H I/O Register
 3  DDRH  equ  $0262   ; Port H Data Direction Register
 4  PTT   equ  $0240   ; Port T I/O Register
 5  DDRT  equ  $0242   ; Port T Data Direction Register
 6        org  $0800   ;globals go in RAM
 7  Data  ds   1       ;copy of Input from Port H switches
 8        org  $4000   ;object code goes in ROM
 9  main  ldaa #$00
10        staa DDRH    ;make all pins of Port H input
11        ldaa #$FF
12        staa DDRT
13  loop  ldaa PTH     ;read switch values
14        staa Data    ;save a copy in global variable
15        staa PTT     ;output to lights
16        bra  loop    ;repeat
17        org  $FFFE
18        fdb  main    ;starting address after a RESET
the #$00 operand field in line 9¹⁰ specifies that the data to be used in the instruction is the value $00. On the other hand, the instruction ldaa N will load the contents of memory location N into Register A. For example, the PTH operand field in line 13 specifies that the place to read the data will be Port H. The instruction staa N will store the contents of Register A out to memory location N. The last field, which is the rightmost field, contains comments. Comments are ignored by the computer, but used by the programmer to clarify the program operation. Tabs and/or spaces delimit the fields, which are nicely lined up in this example.
Lines 2 through 5 use the pseudo-op equ to define symbols for I/O ports used on the 9S12DP512. Just like symbols in the C code, these symbols make the software easier to read. For example, line 15 could have been written as staa $0240, but notice how much easier it is to understand staa PTT. Lines 6 and 8 use the pseudo-op org to place the variables in RAM, and the program in ROM. Line 7 defines a global variable. Lines 9 through 16 give the actual instructions the 9S12 will execute. Lines 9 and 10, when executed, define Port H as an input port. The instruction ldaa #$00 places the value $00 into Register A. The instruction staa DDRH stores the value from Register A into the I/O register DDRH, making Port H an input. Executing Lines 11 and 12 makes Port T an output. Executing the instruction ldaa PTH will bring a copy of the 8 input pins of Port H into Register A. The instruction staa PTT stores the value from Register A out to the output Port T, changing the pattern on the LED lights. The bra loop instruction causes lines 13-16 to be executed over and over. Lines 17 and 18 define the reset vector, which specifies where the software will begin when power is applied or when the reset button is pushed.
Figure A1.3 shows a prototype of this system, built on a breadboard with a commercially available development board.
The software would be tested again to verify its correctness. Figure A1.4 shows the final product. The final product must also be tested.
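The memory-mapped port access used in Programs A1.1 and A1.2 can also be tried on a desktop compiler before any hardware or simulator is involved. The sketch below replaces the real 9S12 address space with an ordinary array, so the REG8 macro, the sim_space array, and the helper functions are all hypothetical stand-ins; only the four register addresses come from Program A1.2. Data direction is not enforced here, which a real port or TExaS would do.

```c
#include <stdint.h>

/* Stand-in for the 9S12 address space so the sketch runs on a desktop.
   On the real chip the access would dereference the address itself,
   as in *(unsigned char volatile *)(0x0240). */
static uint8_t sim_space[0x10000];
#define REG8(addr) (*(volatile uint8_t *)&sim_space[(addr)])

/* Register addresses taken from Program A1.2. */
#define PTT  REG8(0x0240)
#define DDRT REG8(0x0242)
#define PTH  REG8(0x0260)
#define DDRH REG8(0x0262)

unsigned char Data; /* copy of the switch inputs, as in Program A1.1 */

/* Corresponds to lines 7 and 8 of Program A1.1. */
void ports_init(void) {
    DDRH = 0x00; /* Port H is an input */
    DDRT = 0xFF; /* Port T is an output */
}

/* One pass through the body of the while(1) loop. */
void copy_switches_once(void) {
    Data = PTH;  /* read switch values */
    PTT = Data;  /* write value to LEDs */
}

/* Test helpers: pretend to press switches and look at the LEDs. */
void sim_set_switches(uint8_t pattern) { PTH = pattern; }
uint8_t sim_get_leds(void) { return PTT; }
```

Calling ports_init(), then sim_set_switches(0x05), then copy_switches_once() leaves the same pattern on the simulated LEDs, which is the behavior the requirement asks for.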
Figure A1.3 Prototype of an embedded system with four inputs and four outputs. (Courtesy of Jonathan Valvano.)
¹⁰ The # means immediate mode (take this as data) and the $ means hexadecimal.
Figure A1.4 Final embedded system with four inputs and four outputs. (Courtesy of Jonathan Valvano.)
A1.4 Running and Modifying Existing Assembly Language Programs
The simplest place to begin using the TExaS simulator is to run an existing configuration, as listed in Table A1.1. In fact, the tutorials, located at the end of each chapter, will step you through this process in detail. In this section, however, the general approach to running an existing system is presented. First, you choose the general topic of interest. For example, if you were interested in microcomputer software that used the serial port, you could choose the sci example. Next you choose the microcomputer. If you choose the 9S12DP512, open the sci.rtf file located in the MC9S12DP subdirectory. When any of the sci.* files is opened, TExaS will automatically open the related files. For example, sci.rtf is the source code, sci.uc is the microcomputer file, and sci.io is the external I/O devices. On most computers double-clicking sci.rtf will incorrectly start WordPad or Microsoft Word. So to start TExaS with these files, you could double-click the sci.uc icon.
The program must be assembled before it can be executed. The assembly process involves converting the human readable assembly source code (sci.rtf) into the machine readable object code. Click on the source code (the sci.rtf window) and execute the Assemble->Assemble command (ctl-B). The TExaS assembler automatically loads the object code into simulated memory. Next, you run the simulation. You start the simulation by executing the Action->Go command (F12). There are many windows you can observe during execution. The ViewBox in the microcomputer window shows strategic information during execution. If the FollowPC mode is active (execute the Mode->FollowPC command to toggle this mode on and off), then the TheList.rtf window will show the current position of the executing software. The TheLog.rtf window can be configured to show a wide range of results during simulation.
If the CycleView mode is active (execute the Mode->CycleView command to toggle this mode on and off), then the address/data bus activity will be logged. If the InstructionView mode is active (execute the Mode->InstructionView command to toggle this mode on and off), then the executed instruction will be logged. If the LogRecord mode is active (execute the Mode->LogRecord command to toggle this mode on and off), then the parameters of the ViewBox will be dumped into TheLog.rtf during execution. The status of external devices is shown in the IO device window. If a serial port is active, then the input/output of this external device is shown in the TheCRT.rtf window.
If you plan to modify these files, the first step is to give them new names. If you didn't change the names, and were to upgrade to a newer version of TExaS, then the install process might overwrite your programs. Use the File->SaveAs command to change the names of all the files you will be using. You should maintain the appropriate extension (*.rtf, *.uc, *.io, *.stk, *.scp), place these new files all in the same directory (but the directory may be different than the original directory that contained the existing example), and give them all the same first part of the filename. For example, you could save the source file as My.rtf, save the microcomputer file as My.uc, and save the
external I/O file as My.io. The scope and stack view files, if needed, are saved as My.scp and My.stk respectively. Leave the TheList.rtf, TheLog.rtf, and TheCRT.rtf files alone, because these are special files that must maintain these exact file names.
The second step is to reconfigure the processor and external I/O ports as needed. For example, you may wish to switch from one type of 9S12 to another. Use the Mode->Processor... command to select the new microcomputer. To reconfigure the external devices, click on the I/O window and execute the appropriate command from the IO menu. If you are converting an example from one microcomputer to another, you will have to rebuild all the external I/O and scope connections to be compatible with the I/O port names of the new microcomputer, because the I/O devices are at different memory-mapped locations.
The third step is to write assembly code by editing the My.rtf file. For small programs you may wish to enable automatic recolor, and for large programs you may wish to disable it. The program must be assembled (ctl-B) before it can be executed.
The fourth step is configuring the simulation modes. A basic tradeoff exists between simulation speed and the ability to observe the system behavior. The commands that affect simulation are grouped in the Mode menu. These configuration settings are saved and restored with the microcomputer file (e.g., in the file My.uc).
The last step is running and debugging your software. The ViewBox, Stack Window, IO window, oscilloscope, and logic analyzer provide visualization of your running program. The action commands are grouped into the Action menu. The usual debugging commands of Action->Reset, Action->Go, Action->Step, Action->StepOver, and Action->StepOut are available. Breakpoints can be set on any address (even I/O ports).
When a read or write access occurs to a breakpoint address, the simulation can be configured to stop (halt mode) or simply copy the ViewBox parameters into the TheLog.rtf window (scan mode). Some special debugging features include Action->Few (executes a few instructions and stops) and Action->BackDump (displays the simulator state for the previous instructions). You can perform right-click commands in the listing window: Action->RuntoCursor and Action->BreakatCursor.
Sometimes it is more efficient to build a system from scratch. To develop assembly programs you need at least one source code file and one microcomputer file. If you have I/O devices, you will need an I/O file too. The first step is to create new files as needed. After creating a new microcomputer file, use the Mode->Processor... command to select the desired microcomputer and specify the clock period. The second step is to perform File->SaveAs operations on the files you will be developing. You should maintain the appropriate extension (*.rtf, *.uc, *.io, *.stk, *.scp), place these new files all in the same directory, and give them all the same first part of the file name. Next, follow the development cycle described in steps 3, 4, and 5 above.
A1.5 TExaS Editor
The editor is a simplified version of WordPad. You can specify fonts, sizes, and colors. Embedded figures can be added to clarify the software. For example, you can add circuit diagrams, flowcharts, and spreadsheet objects into your programs. These embedded objects are ignored by the assembler when creating the object code, but can be quite useful for documentation. The editor uses rich text format (RTF), so formatted text can be cut and pasted from other applications that support rich text format. The following lists the default color settings of the TExaS editor:
The labels are shown in purple
The op codes are shown in blue
The pseudo-op codes are shown in gray
The numbers are shown in dark blue
The strings are shown in magenta
The operands are shown in black
The comments are shown in green
The assembly errors are shown in bold red
These colors can be changed using the Assemble->TextFormat... command.
Checkpoint A1.5: How can you tell if an operation is an opcode or pseudo-op?
A1.6
Assembly Language Syntax
A1.6.1 Overall Structure
Programs written in assembly language consist of a sequence of source statements. Each source statement consists of a sequence of ASCII characters ending with a carriage return. Each source statement may include up to four fields: a label, an operation (instruction mnemonic or assembler directive), an operand, and a comment. We use pseudo-op codes in our source code to give instructions to the assembler itself. In the following example, equ is an assembly directive and ldaa is a regular machine instruction.
PORTA  equ   $0000    ; Assembly time constant
Inp    ldaa  PORTA    ; Read data from fixed address I/O data port
An assembly language statement contains the following fields.
Label Field can be used to define a symbol
Operation Field defines the operation code or pseudo-op
Operand Field specifies either the address or the data
Comment Field allows the programmer to document the software
Sometimes not all four fields are present in an assembly language statement. A line may contain just a comment. The entire line is considered a comment if the first character of the line is a star (*) or a semicolon (;). For example,
; This line is a comment
* This is a comment too
* This line is a comment
Instructions with inherent mode addressing do not have an operand field. For example,
label  clra    comment
       deca    comment
       cli     comment
       inca    comment
Recommendation: For small programs, you should enable automatic assembly colors. The editor will then color each field according to its type.
Recommendation: For large programs, you should disable automatic assembly colors, because the system will run too slowly. Instead, use the assembler to color the source code explicitly each time the program is assembled.
A1.6.2 Label Field
The label field occurs as the first field of a source statement. The label field can take one of the following three forms:
A. An asterisk (*) or semicolon (;) as the first character in the label field indicates that the rest of the source statement is a comment. Comments are ignored by the assembler, and are printed on the source listing only for the programmer’s information. Examples:
* This line is a comment
; This line is also a comment
B. A white-space character (blank or tab) as the first character indicates that the label field is empty. The line has no label and is not a comment. These assembly lines have no labels:
       ldaa  0
       rmb   10
C. A symbol character as the first character indicates that the line has a label. Symbol characters are the upper or lower case letters a to z, digits 0 to 9, and the special characters, period (.), dollar sign ($), and underscore (_). Symbols consist of at least one and at most 99 characters, the first of which must be alphabetic or the special characters period (.) or underscore (_). All characters are significant and upper and lower case letters are distinct.
A symbol may occur only once in the label field. If a symbol does occur more than once in a label field, then each reference to that symbol will be flagged with an error. The exception to this rule is the set pseudo-op that allows you to define and redefine the same symbol. We typically use set to define the stack offsets for the local variables in a subroutine. The set pseudo-op allows two separate subroutines to re-use the same name for their local variables.
With the exception of the equ and set directives, a label is assigned the value of the program counter of the first byte of the instruction or data being assembled. The value assigned to the label is absolute. Labels may optionally be ended with a colon (:). If the colon is used, it is not part of the label but merely acts to set the label off from the rest of the source line. Thus the following code fragments are equivalent:
here:  deca
       bne   here
and
here   deca
       bne   here
A label may appear on a line by itself. The assembler interprets this as setting the value of the label equal to the current value of the program counter. A label may occur on a line with a pseudo-op. The size of the symbol table depends on the available PC computer memory, but you are typically allowed to have thousands of labels.
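The use of set for local variables described above can be sketched as follows. This is a minimal illustration, not code from TExaS itself: the subroutine names Clear and Fill, the variable names, and the stack layouts are hypothetical, and it assumes the assembler applies the most recent set value to each reference that follows it.

```asm
; First subroutine: one 16-bit local variable named sum (hypothetical)
sum    set   0         ;sum is at offset 0 from SP
Clear  leas  -2,sp     ;allocate 2 bytes on the stack
       ldd   #0
       std   sum,sp    ;sum = 0
       leas  2,sp      ;deallocate the local variable
       rts
; Second subroutine: re-uses the name sum at a different offset
cnt    set   0         ;8-bit cnt is at offset 0 from SP
sum    set   1         ;16-bit sum is now at offset 1 from SP
Fill   leas  -3,sp     ;allocate 3 bytes on the stack
       clr   cnt,sp    ;cnt = 0
       ldd   #1000
       std   sum,sp    ;sum = 1000
       leas  3,sp      ;deallocate the local variables
       rts
```

Had sum been defined twice with equ instead of set, the assembler would report a label previously defined error.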
A1.6.3 Operation Field
The operation field occurs after the label field, and must be preceded by at least one white-space character. The operation field must contain a legal opcode mnemonic or an assembler directive. Upper case characters in this field are converted to lower case before being checked as a legal mnemonic. Thus nop, NOP, and NoP are recognized as the same mnemonic. Entries in the operation field may be opcodes or directives. Opcodes correspond directly to the machine instructions. The operation code includes any register name associated with the instruction. These register names must not be separated from the opcode with any white-space characters. Thus clra means clear accumulator A, but clr a means clear memory location identified by the label a. The available instructions depend on the microcomputer you are using. Directives or pseudo-ops are special operation codes known to the assembler that control the assembly process rather than being translated into machine instructions. The directives that TExaS supports are described in detail later in this chapter.
A1.6.4 Operand Field
The interpretation of the operand field is dependent on the contents of the operation field. The operand field, if required, must follow the operation field, and must be preceded by at least one white-space character. The operand field may contain a symbol, an expression, or a combination of symbols and expressions separated by commas. There can be no
white-space characters in the operand field. For example, the following two lines produce identical object code because of the space between data and + in the first line (everything after that space is treated as a comment):
       ldaa  data + 1
       ldaa  data
Observation: The Metrowerks assembler allows spaces within the operand field, but requires that a semicolon (;) be placed before each comment.
The operand field of machine instructions is used to specify the addressing mode of the instruction, as well as the operand of the instruction. Table A1.4 summarizes the operand field formats on the 9S12.
Table A1.4 Example operands for the 9S12.
Operand                 Format
no operand              inherent
expression              direct, extended, or relative
#expression             immediate
expression,idx          indexed with address register
expr,#expr              bit set or clear
expr,#expr,expr         bit test and branch
expr,idx,#expr,expr     bit test and branch
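As an illustration of the formats in Table A1.4, the fragment below uses one instruction per operand format. It is only a sketch of syntax, not a useful program: the symbol data and its address are hypothetical, PORTA is taken from the earlier equ example, and the choice between direct and extended mode is assumed to follow the size of the address.

```asm
PORTA  equ   $0000            ;I/O port address, as in the earlier equ example
data   equ   $0800            ;hypothetical RAM address
       clra                   ;no operand: inherent
       ldaa  PORTA            ;expression: direct (address fits in 8 bits)
       ldaa  data             ;expression: extended (16-bit address)
loop   ldaa  #36              ;#expression: immediate
       ldaa  4,x              ;expression,idx: indexed, address is X+4
       bset  PORTA,#$01       ;expr,#expr: set bit 0 of PORTA
       bclr  PORTA,#$01       ;expr,#expr: clear bit 0 of PORTA
       brset PORTA,#$80,loop  ;expr,#expr,expr: branch if bit 7 is set
       brclr 0,x,#$80,loop    ;expr,idx,#expr,expr: branch if bit 7 at X+0 is clear
       bra   loop             ;expression: relative
```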
The 9S12 assembly language includes some additional operand formats, as shown in Table A1.5. The accumulator offset, acc, is A, B, or D, and the index register, idx, is X, Y, SP, or PC. The PC is not allowed with any of the predecrement, postdecrement, preincrement, or postincrement addressing modes.
Table A1.5 Additional example operands for the 9S12.
Format
indexed, post increment
indexed, post decrement
indexed, pre increment
indexed, pre decrement
accumulator offset indexed
indexed indirect
RegD indexed indirect
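The fragment below shows one instruction for each format named in Table A1.5. The operand spellings follow the standard CPU12 notation and are assumed, not confirmed by the table above, to be the ones TExaS accepts; the particular registers and offsets are arbitrary.

```asm
       ldaa  1,x+      ;post increment: A = byte at X, then X = X+1
       ldd   2,y+      ;post increment: D = word at Y, then Y = Y+2
       ldaa  1,x-      ;post decrement: A = byte at X, then X = X-1
       staa  1,+x      ;pre increment: X = X+1, then store A at X
       std   2,-sp     ;pre decrement: SP = SP-2, then store D at SP
       ldaa  b,x       ;accumulator offset: address is X+B
       ldx   d,y       ;accumulator offset: address is Y+D
       ldaa  [4,x]     ;indexed indirect: pointer at X+4 holds the address
       ldaa  [d,y]     ;RegD indexed indirect: pointer at Y+D holds the address
```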
The valid syntax of the operand field depends on the microcomputer. For a detailed explanation of the instructions and their addressing modes, see the help system with the TExaS application.
A1.6.5 Expressions
An expression is a combination of symbols, constants, algebraic operators, and parentheses. The expression is used to specify a value that is to be used as an operand. Expressions may consist of symbols, constants, or the character '*' (denoting the current value of the program counter) joined together by one of the operators: + - * / % & | ^.
+     add
-     subtract
*     multiply
/     divide
%     remainder after division
&     bitwise and
|     bitwise or
^     bitwise exclusive or
Expressions may include parentheses and other expressions. Expressions are evaluated using the standard arithmetic precedence. Evaluation occurs left to right for multiple operations with the same precedence. The precedence follows standard mathematical conventions, as shown in Table A1.6. Arithmetic is carried out in signed 32-bit twos-complement integer precision at assembly time.
Table A1.6 Operator precedence.
Precedence    Operation
highest       parentheses
2             unary ~
3             binary * / % &
lowest        binary + - ^ |
Maintenance Tip: It is good programming practice to add parentheses, even when they are not necessary, in order to clarify the operation. E.g., (A&B)|(C&D) is clearer than A&B|C&D.
Each symbol is associated with a 16-bit integer value that is used in place of the symbol during the expression evaluation. The asterisk (*) used in an expression as a symbol represents the current value of the location counter (the first byte of a multi-byte instruction). Constants represent numbers that do not vary in value during the execution of a program. Constants may be presented to the assembler in one of four formats: decimal, hexadecimal, binary, or ASCII. The programmer indicates the number format to the assembler with the following prefixes:
0x     hexadecimal, C syntax
$      hexadecimal, assembly syntax
%      binary
'c'    ASCII code for a single letter 'c'
Unprefixed constants are interpreted as decimal. The assembler converts all constants to binary machine code, and they are displayed in the assembly listing as hexadecimal.
A decimal constant consists of a string of numeric digits. The value of an 8-bit decimal constant ranges from -128 to 255. The value of a 16-bit decimal constant must fall in the range from -32768 to 65535. Some valid decimal constants are 12, 1235, and 3200.
A hexadecimal constant consists of a maximum of four characters from the set of digits (0 to 9) and the alphabetic letters (A to F, in either case), and is preceded by a dollar sign ($). Hexadecimal constants must be in the range $0000 to $FFFF. Some valid hexadecimal constants are $12, $ABCD, and $001f.
A binary constant consists of a maximum of 16 ones or zeros preceded by a percent sign (%). Some valid binary constants are %00101, %1, and %10100.
A single ASCII character can be used as a constant in expressions. ASCII constants are surrounded by single quotes ('). Any character, except the single quote, can be used as a character constant. Some valid character constants are '*', 'a', and 'Q'. Invalid cases will be identified as syntax errors by the assembler.
Checkpoint A1.6: What is the value of 2 4*6/5 1?
Checkpoint A1.7: The following two expressions evaluate to exactly the same result: $0F&'A'|$F0&'0' and ($0F&'A')|($F0&'0'). Which is better and why?
Checkpoint A1.8: The following two assembly code sequences produce similar results: ldaa #5+6 and ldaa #5 adda #6. How are they different?
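The constant formats and expression rules of this section can be seen together in a short fragment. This is a hedged sketch: the symbol names are hypothetical, and the values in the comments are worked out by hand from the rules above, not taken from an assembler listing.

```asm
mask   equ   $0F         ;hexadecimal, assembly syntax
size   equ   0x10        ;hexadecimal, C syntax, value 16
bits   equ   %00101      ;binary, value 5
star   equ   '*'         ;ASCII code for *, value $2A
total  equ   (2+3)*4     ;evaluated at assembly time, value 20
       ldaa  #mask&'A'   ;'A' is $41, so the operand is $01
       ldab  #size-1     ;the operand is 15
       bra   *           ;* is the address of this instruction, so it branches to itself
```

All of the arithmetic here is done once by the assembler; none of it costs execution time in the microcomputer.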
A1.6.6 Comment Field
The last field of an assembler source statement is the comment field. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the operation field if no operand is required) by at least one white-space character. The comment field can contain any printable ASCII characters. Observation: The Metrowerks assembler requires that a semicolon (;) be placed before each comment.
As software developers, our goal is to produce code that not only solves our current problem, but can also serve as the basis for solving future problems. In order to reuse software we must leave our code in a condition such that future programmers (including ourselves) can easily understand its purpose, constraints, and implementation. Documentation is not something tacked onto software after it is done, but rather a discipline built into it at each stage of the development. We carefully develop a programming style providing appropriate comments. A comment that tells us why we perform certain functions is more informative than comments that tell us what the functions are. Examples of bad comments would be:
       clr   Flag     ;Flag=0
       sei            ;Set I=1
       ldaa  $0240    ;Read PTT
These are bad comments because they provide no information to help us in the future to understand what the program is doing. Examples of good comments would be:
       clr   Flag     ;Signifies no key has been typed
       sei            ;The following code will not be interrupted
       ldaa  $0240    ;Bit7=1 iff the switch is pressed
These are good comments because they make it easier to change the program in the future.
Self-documenting code is software written in a simple and obvious way, such that its purpose and function are self-apparent. To write wonderful code like this, we first must formulate the problem, organizing it into clear, well-defined subproblems. How we break a complex problem into small parts goes a long way toward making the software self-documenting. Both the concept of abstraction and modular code address this important issue of software organization.
Maintaining software is the process of fixing bugs, adding new features, optimizing for speed or memory size, porting to new computer hardware, and configuring the software system for new situations. It is the MOST IMPORTANT phase of software development. Flowcharts are effective in the design phase of a project. Flowcharts and software manuals are good mechanisms for documenting programs only when these types of documentation are kept up to date when modifications are made.
We should use careful indenting and descriptive names for variables, functions, labels, and I/O ports. Effective use of equ provides explanation of software function without cost of execution speed or memory requirements. A disciplined approach to programming is to develop patterns of writing that you consistently follow. Software developers are unlike short story writers. It is OK to use the same subroutine outline over and over again. In Program A1.3, notice the following style issues:
1. Begins and ends with a line of *s
2. States the purpose of the subroutine
3. Gives the input/output parameters, what they mean and how they are passed
4. Different phases (submodules) of the code delineated by a line of -s
Program A1.3 An example use of comments.
;****************** Max *******************************
; Purpose: returns the maximum of two 16-bit numbers
; Inputs: RegX and RegY are two 16-bit unsigned numbers
; Output: RegX is the maximum of the two inputs
; Destroyed: CCR
; Calling sequence
;      ldx  #100     ;first number
;      ldy  #200     ;second number
;      jsr  Max
Max     psha           ;Save registers, that will be modified
        pshb
        pshy
; - - - - - - - - - - - - - - - - - - - - - - - - - -
        pshx           ;first number on the stack
        xgdy           ;RegD is second number
        tsx            ;access the stack
        cpd  0,x       ;which is bigger
        bhs  second    ;go if second>=first
first   pulx           ;RegX = first
        bra  end
second  pulx
        xgdx           ;RegX = second
end
; - - - - - - - - - - - - - - - - - - - - - - - - - -
        puly           ;Restore registers
        pulb
        pula
        rts
;****************** End of Max *****************************
A1.6.7 Assembly Listing and Errors
The assembler output includes a listing containing the source program, the object code, and any assembly errors. The listing file is created when the TheList.rtf file is open. Each line of the listing contains a reference line number, the address and bytes assembled, and the original source input line. If an input line causes more than 8 bytes to be output (e.g., a long fcc directive), the additional bytes are included in the object code (S19 file or loaded into memory) but not shown in the listing. There are three assembly options, each of which can be toggled on or off using the Assemble->Options command.
(4)      cycles    shows the number of cycles to execute this instruction
[100]    total     gives a running cycle total since last org pseudo-op
{PPP}    type      gives the cycle type
The codes used in the cycle type are presented in Chapter 4. The end of the assembly listing contains a symbol table. The symbol table contains the name of each symbol, along with its defined value. Since the set pseudo-op can be used to redefine the symbol, the value in the symbol table is the last definition.
Programming errors fall into two categories. Simple typing/syntax errors will be flagged by the TExaS assembler when it tries to translate source code into machine code. The more difficult programming errors to find and remove are functional bugs that can be identified during execution, when the program does not perform as expected. Error messages are meant to be self-explanatory. The assembler has a verbose mode (see the Assemble->Options command) that provides more details about the error and suggests possible solutions. The assembler error types are listed below:
1. Label previously defined error: the same label occurs multiple times
   How to fix: check spelling of all the labels
2. Undefined opcode error: operation does not exist
   How to fix: check the spelling/availability of the instruction, verify the correct processor is being used
3. Operand error: syntax error within the operand
   How to fix expression error: check parentheses, start with a simpler expression
   How to fix undefined symbol: check spelling of both the definition and access
   How to fix addressing mode error: look up the addressing modes available for the instruction
4. Phasing error: the value of a symbol changes from pass 1 to pass 2
   How to fix: first remove any undefined symbols, then remove forward references
   If you really need a forward reference: use > and < to force extended or direct addressing
5. Can’t program address error
   How to fix: use the org pseudo-op to match available memory
6. Branch too far error: destination address is too far away to use 8-bit PC-relative addressing
   How to fix: switch to long branch version of the instruction
Error diagnostic messages are placed in the listing file just after the line containing the error. If there is no TheList.rtf file, then assembly errors are reported in the TheLog.rtf file. If neither TheList.rtf nor TheLog.rtf exists, then assembly errors are not reported.
A phasing error occurs during Pass 2 of the assembler when the address of a label is different than when it was previously calculated. The purpose of Pass 1 of the assembler is to create the symbol table. In order to calculate the address of each assembly line, the assembler must be able to determine the exact number of bytes each line will take during Pass 1. For most instructions, the number of bytes required is fixed and easy to calculate, but for other instructions, the number of bytes can vary. A phasing error occurs when the assembler calculates the size of an instruction differently in Pass 2 than it did in Pass 1. A phasing error is often reported on a line further down in the program than where the mistake occurs.
A phasing error usually results from the use of forward references. In this first example, the symbol “size” is not available at the time of assembling the ldaa size. The assembler incorrectly chooses the extended addressing mode version rather than the correct direct mode. One solution is to move the variables to the top, and a second solution is to force direct mode using ldaa <size.
       ldaa  size
       ...
       org   0
size   fcb   5
In this example, the symbol “index” is not available at the time of assembling the ldaa index,x. The assembler incorrectly chooses the 2-byte IDX addressing mode version rather than the correct 3-byte IDX1 mode.
       ldaa  index,x
index  equ   100
; ...
loop   ldaa  #0
The listing shows the phasing error
$0000 A6E064        ldaa  index,x
$0064        index  equ   100
; ...
$0003 8600   loop   ldaa  #0
##### Phasing error
This line was at address $0002 in pass 1, now in pass 2 it is $0003
When the assembler gets to loop, the Pass 1 and Pass 2 values are off by one, causing a phasing error at the loop ldaa #0 instruction. The solution here is simply to put the index equ 100 first.
Observation: The assembler must be able to accurately determine the object code size of each instruction during pass 1.
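The two fixes described above, defining each symbol before its first use, can be sketched as follows. The program origin $4000 is a hypothetical address chosen for illustration; the rest follows the two examples.

```asm
; Fix for the first example: the variable is defined before it is used
       org   0
size   fcb   5           ;size is known to be at a direct-page address
       org   $4000       ;hypothetical program address
       ldaa  size        ;pass 1 can select the 2-byte direct form
; Fix for the second example: the constant is defined before it is used
index  equ   100
       ldaa  index,x     ;pass 1 already knows 100 needs the 3-byte IDX1 form
loop   ldaa  #0          ;loop has the same address in pass 1 and pass 2
```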
A1.6.8 Assembler Pseudo-Ops
Pseudo-ops are specific commands to the assembler that are interpreted during the assembly process. An alternative name for pseudo-op is assembly directive. A few of them create object code, but most do not. There are many assemblers available for developing Freescale assembly code. Although they all use the standard Freescale op codes, the spelling of the pseudo-op codes varies. The TExaS assembler supports many of the various dialects. The pseudo-op codes supported by this assembler are shown in Table A1.7. If you plan to export software developed with TExaS to another application, then you should limit your use to only the pseudo-ops compatible with that application. Group A is supported by Freescale’s MCUez and Metrowerks. Group B is supported by Freescale’s DOS level AS05, AS08, AS11, and AS12. Group C is used by ImageCraft’s ICC11 and ICC12.
Table A1.7 Assembly directives supported by TExaS.
Groups A and B     Group C    Meaning
org                .org       Specify the absolute address at which to put subsequent object code
equ                           Define a constant symbol
set                           Define or redefine a constant symbol
dc.b db fcb        .byte      Allocate byte(s) of storage with initialized values
fcc                           Create an ASCII string (no termination character)
dc.w dw fdb        .word      Allocate word(s) of storage with initialized values
dc.l dl            .long      Allocate 32-bit long word(s) of storage with initialized values
ds ds.b rmb        .blkb      Allocate bytes of storage without initialization
ds.w               .blkw      Allocate 16-bit words of storage without initialization
ds.l               .blkl      Allocate 32-bit words of storage without initialization
end                .end       Signifies the end of the source code (TExaS ignores these)
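A short fragment using several of these pseudo-ops is sketched below. The symbol names and the two org addresses are hypothetical and must be matched to the memory map of the microcomputer being simulated; the double-quote delimiter on fcc is also an assumption, since other assemblers accept different string delimiters.

```asm
size   equ   10          ;constant symbol, creates no object code
       org   $0800       ;assumed RAM address
buf    rmb   size        ;10 bytes without initialization
count  rmb   2           ;room for one 16-bit variable
       org   $4000       ;assumed ROM address
primes fcb   2,3,5,7     ;four initialized bytes
msg    fcc   "Hello"     ;five ASCII bytes, no termination character
       fcb   0           ;explicit null termination
table  fdb   primes,msg  ;two 16-bit words holding addresses
       end
```

Only fcb, fcc, and fdb place bytes in the object code here; equ, org, rmb, and end direct the assembler without producing any.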