Using Your Sybex Electronic Book To realize the full potential of this Sybex electronic book, you must have Adobe Acrobat Reader with Search installed on your computer. To find out if you have the correct version of Acrobat Reader, click on the Edit menu; Search should be an option within this menu. If Search is not an option in the Edit menu, please exit this application and install Adobe Acrobat Reader with Search from this CD (double-click on rp500enu.exe in the Adobe folder).
Navigation Navigate through the book by clicking on the headings that appear in the left panel; the corresponding page from the book displays in the right panel.
Search
To search, click the Search Query button on the toolbar or choose Edit > Search > Query to open the Search window. In the Adobe Acrobat Search dialog’s text field, type the text you want to find and click Search. Use the Search Next button (Control+U) and Search Previous button (Control+Y) to go to other matches in the book. The Search command also has powerful tools for limiting and expanding the definition of the term you are searching for. Refer to Acrobat's online Help (Help > Plug-In Help > Using Acrobat Search) for more information.
Click here to begin using your Sybex Electronic Book!
(3) Read and sign the Candidate Agreement, which will be presented at the time of the exam(s). The text of the Candidate Agreement can be found at www.comptia.org/certification. (4) Take and pass the CompTIA certification exam(s).

For more information about CompTIA’s certifications, such as their industry acceptance, benefits, or program news, please visit www.comptia.org/certification. CompTIA is a non-profit information technology (IT) trade association. CompTIA’s certifications are designed by subject matter experts from across the IT industry. Each CompTIA certification is vendor-neutral, covers multiple technologies, and requires demonstration of skills and knowledge widely sought after by the IT industry. To contact CompTIA with any questions or comments, call 1-630-268-1818 or e-mail to [email protected].

TRADEMARKS: SYBEX has attempted throughout this book to distinguish proprietary trademarks from descriptive terms by following the capitalization style used by the manufacturer.

The author and publisher have made their best efforts to prepare this book, and the content is based upon final release software whenever possible. Portions of the manuscript may be based upon pre-release versions supplied by software manufacturer(s). The author and the publisher make no representation or warranties of any kind with regard to the completeness or accuracy of the contents herein and accept no liability of any kind including but not limited to performance, merchantability, fitness for any particular purpose, or any losses or damages of any kind caused or alleged to be caused directly or indirectly from this book.

Photographs and illustrations used in this book have been downloaded from publicly accessible file archives and are used in this book for news reportage purposes only to demonstrate the variety of graphics resources available via electronic access. Text and images available over the Internet may be subject to copyright and other rights owned by third parties.
Online availability of text and images does not imply that they may be reused without the permission of rights holders, although the Copyright Act does permit certain unauthorized reuse as fair use under 17 U.S.C. Section 107. Manufactured in the United States of America 10 9 8 7 6 5 4 3 2 1
To Our Valued Readers:

CompTIA’s i-Net+ certification program has established itself as one of the leading general Internet certifications in the IT industry. Sybex is proud to have helped thousands of i-Net+ candidates prepare for their exam, and we are excited about the opportunity to continue to provide people with the skills they’ll need to succeed in the highly competitive IT industry.

CompTIA recently revised the i-Net+ exam, updating the set of objectives and expanding the question pool, all in an effort to prevent the dreaded paper-certification syndrome, where individuals obtain a certification without a thorough understanding of the technology. Sybex supports this philosophy, as we have always advocated a comprehensive instructional approach to certification courseware. It has always been Sybex’s mission to teach exam candidates how new technologies work in the real world, not to simply feed them answers to test questions.

We’re especially excited about this second edition of our best-selling i-Net+ Study Guide, as it now sports the new CompTIA Authorized Quality Curriculum (CAQC) logo on the cover. CompTIA developed the CAQC program to help exam candidates make better decisions about which training materials to use, and has established rigorous standards that courseware developers must meet in order to display the CAQC logo. The book you hold in your hands went through a review process that checked for exam objective correlation and instructional design integrity, and we are happy to say that we passed with flying colors!

We’re confident that this book will help you, the i-Net+ exam candidate, succeed in your endeavors. Good luck in pursuit of your i-Net+ certification!
Neil Edde Associate Publisher—Certification Sybex, Inc.
SYBEX Inc. 1151 Marina Village Parkway, Alameda, CA 94501 Tel: 510/523-8233 Fax: 510/523-2373 http://www.sybex.com
Software License Agreement: Terms and Conditions The media and/or any online materials accompanying this book that are available now or in the future contain programs and/or text files (the "Software") to be used in connection with the book. SYBEX hereby grants to you a license to use the Software, subject to the terms that follow. Your purchase, acceptance, or use of the Software will constitute your acceptance of such terms. The Software compilation is the property of SYBEX unless otherwise indicated and is protected by copyright to SYBEX or other copyright owner(s) as indicated in the media files (the "Owner(s)"). You are hereby granted a single-user license to use the Software for your personal, noncommercial use only. You may not reproduce, sell, distribute, publish, circulate, or commercially exploit the Software, or any portion thereof, without the written consent of SYBEX and the specific copyright owner(s) of any component software included on this media. In the event that the Software or components include specific license requirements or end-user agreements, statements of condition, disclaimers, limitations or warranties ("End-User License"), those End-User Licenses supersede the terms and conditions herein as to that particular Software component. Your purchase, acceptance, or use of the Software will constitute your acceptance of such End-User Licenses. By purchase, use or acceptance of the Software you further agree to comply with all export laws and regulations of the United States as such laws and regulations may exist from time to time. Software Support Components of the supplemental Software and any offers associated with them may be supported by the specific Owner(s) of that material, but they are not supported by SYBEX. Information regarding any available support may be obtained from the Owner(s) using the information provided in the appropriate read.me files or listed elsewhere on the media. 
Should the manufacturer(s) or other Owner(s) cease to offer support or decline to honor any offer, SYBEX bears no responsibility. This notice concerning support for the Software is provided for your information only. SYBEX is not the agent or principal of the Owner(s), and SYBEX is in no way responsible for providing any support for the Software, nor is it liable or responsible for any support provided, or not provided, by the Owner(s). Warranty SYBEX warrants the enclosed media to be free of physical defects for a period of ninety (90) days after purchase. The Software is not available from SYBEX in any other form or media than that enclosed herein or posted to www.sybex.com. If you discover a defect in the media during
this warranty period, you may obtain a replacement of identical format at no charge by sending the defective media, postage prepaid, with proof of purchase to: SYBEX Inc. Customer Service Department 1151 Marina Village Parkway Alameda, CA 94501 e-mail: [email protected] After the 90-day period, you can obtain replacement media of identical format by sending us the defective disk, proof of purchase, and a check or money order for $10, payable to SYBEX. Disclaimer SYBEX makes no warranty or representation, either expressed or implied, with respect to the Software or its contents, quality, performance, merchantability, or fitness for a particular purpose. In no event will SYBEX, its distributors, or dealers be liable to you or any other party for direct, indirect, special, incidental, consequential, or other damages arising out of the use of or inability to use the Software or its contents even if advised of the possibility of such damage. In the event that the Software includes an online update feature, SYBEX further disclaims any obligation to provide this feature for any specific duration other than the initial posting. The exclusion of implied warranties is not permitted by some states. Therefore, the above exclusion may not apply to you. This warranty provides you with specific legal rights; there may be other rights that you may have that vary from state to state. The pricing of the book with the Software by SYBEX reflects the allocation of risk and limitations on liability contained in this agreement of Terms and Conditions. Shareware Distribution This Software may contain various programs that are distributed as shareware. Copyright laws apply to both shareware and ordinary commercial software, and the copyright Owner(s) retains all rights. If you try a shareware program and continue using it, you are expected to register it. Individual programs differ on details of trial periods, registration, and payment. 
Please observe the requirements stated in appropriate files. Copy Protection The Software in whole or in part may or may not be copy-protected or encrypted. However, in all cases, reselling or redistributing these files without authorization is expressly forbidden except as specifically provided for by the Owner(s) therein.
For my wife and daughter, without whom I could not write —David Groth I would like to dedicate this book to Tippy, whose ever-smiling face made me laugh even as my head hit the floor. —Dorothy McGee
It takes many people to put a book together and this book is no exception. Everyone who worked on this book should be proud of the work they have done here. First, I would like to thank my co-author, Dorothy McGee. Thanks also to my technical editors, Andrew Barkl and Scott Warmbrand. They were responsible for making sure the information in this book is technically accurate and as up-to-date as possible.

This book would not have existed if not for the efforts of Elizabeth Hurley, this book’s developmental and acquisitions editor at Sybex. Thank you for putting up with all my phone calls and e-mails! Additionally, thanks go out to Susan Hobbs and Judy Flynn for turning my collection of chicken-scratchings into a cohesive, useful study guide. The production department at Sybex also deserves my thanks. Thanks to Nila Nichols, electronic publishing specialist, for making this book look the way it does; to Teresa Trego, production editor, for making the production end of things run so smoothly.

I would also like to acknowledge my wife, family, and friends. My wife, Linda, tirelessly wrote and edited the appendices as well as kept me on the right track. She was a real trooper because she did it while taking care of our daughter, Alison. Thanks to Alison, too, for being fussy when I needed a break (or when she thought I needed a break) or cute when I needed a laugh. Thank you to my family and friends who understood when I couldn’t do something because I had to work on the book. I really appreciate that.

Finally, I thank you, the reader, for purchasing this book. I know that it has all the information in it to help you pass the test. If you have questions about the i-Net+ exam or this book, feel free to e-mail me at [email protected]. All of us worked very hard on this book to make it the best i-Net+ study guide available. I hope you agree that it is. —David Groth

So many people helped to bring this book to fruition, that it’s hard to know where to start.
I would like to extend my whole-hearted thanks to David Groth and Elizabeth Hurley for the confidence and encouragement that both have given me during the production of this book. Many thanks go to Susan Hobbs, who not only kept me on my toes during editing, but really lifted my spirits with her positive attitude throughout everything. I would also like to thank Teresa Trego for keeping me on track and ensuring that no stone was left unturned during the whole process.
In addition to the wonderful folks at Sybex, I must acknowledge my husband, family, and friends. Darin, my husband, went beyond patience and understanding when I had to pass on chores or going places. His encouragement and much-needed advice when my brain performed a core-dump helped me get through. Many thanks to my mother, Sarah, and in-laws (Connie and Bud), who put up with my absences and lack of phone calls as I constantly lost track of time. I must also acknowledge Andy and John, my supervisors at work, who told me “go for it!” when the opportunity arose to work on this book and were patient on those days that I absolutely couldn’t work one minute late. Last but never least, I must thank you, the reader, who plunked down your hard-earned money for this book. Remember, this is the information age and knowledge IS power. Never stop learning, and never stop growing. If you have any questions or comments about the i-Net+ exam or the content in this book, please feel free to e-mail me at [email protected]. —Dorothy McGee
If you are like the rest of the networking community, you’ve probably taken certification exams. Becoming certified is one of the best things you can do for your career in the computer or networking field. It proves that you are knowledgeable in the area in which you are certified, and provides benefits from the vendor that are not accessible by non-certified individuals. In this book, you’ll find out what the i-Net+ exam is all about. Each chapter covers part of the exam, and at the end of each chapter, there are review questions to help you prepare for the exam.
What Is the i-Net+ Certification? i-Net+ is a certification developed by the Computing Technology Industry Association (CompTIA). This organization exists to provide resources and education for the computer and technology community. It is the same body that developed the vendor-neutral A+ and Network+ exams for computer and networking technicians. In 1997, members of CompTIA convened to develop a new certification that tests skills for Internet professionals, and in 2001 revamped the exam to cover new technologies and concepts. To ensure industry-wide support, it is sponsored by many IT industry leaders, including:
The i-Net+ exam was designed to test the skills of Internet professionals who are responsible for implementing and maintaining Internet, intranet, and extranet infrastructure and services as well as development of related applications. The exam tests areas of Internet technologies such as the TCP/IP protocol, the various types of servers, and the concepts of Internet design and implementation, such as which items are required for an easy-to-read web site and the prerequisites for its installation. In addition, it covers troubleshooting concepts and various how-tos.
Why Become i-Net+ Certified? As this book is being written, the latest i-Net+ certification exam is brand-new. But i-Net+ is just one certification in a line of CompTIA certifications, starting with A+ certification and Network+ certification. Because CompTIA is a well-respected developer of industry vendor-neutral certifications, getting i-Net+ certified indicates that you are competent in the specific areas tested by the exam. Two major benefits are associated with becoming i-Net+ certified:
Proof of professional achievement
Opportunity for advancement
Proof of Professional Achievement Networking professionals are competing these days to see who can get the most certifications. Technicians want the i-Net+ certification because it is broad, covering the entire field of Internet-related technical knowledge, rather than only development or security, for example. Thus, it can be a challenge to prepare for the i-Net+ exam. Passing the exam, however, certifies that you have achieved a certain level of knowledge about vendor-independent Internet-related subjects.
Opportunity for Advancement We all like to get ahead in our careers. With advancement comes more responsibility, to be sure, but usually it means more money and greater opportunities. In the information technology area, this can usually be accomplished by obtaining multiple technology certifications, including i-Net+.
i-Net+, because of its wide-reaching industry support, is recognized as a baseline of Internet and networking information. Some companies will specify that i-Net+ certification will result in a pay raise at review time. And some companies will specify that i-Net+ certification, in conjunction with A+ Certification, is required as a condition of employment before an employee’s next review.
How to Become i-Net+ Certified The simplest way to find out how to become i-Net+ certified is to take the exam. It is administered by Sylvan Prometric, with which most of you are familiar if you have taken any other computer certification exams. The exam is administered by computer, and your results are displayed to you right after taking the exam. To register for the exam, call Sylvan (not the testing center) at 877-803-6867 and tell them you want to take the i-Net+ exam (IK0-002). You must pay for the exam at registration time with a major credit card (for example, Visa or MasterCard). Special incentive pricing may be in effect when you take the exam—check CompTIA’s web site for details.
You can also register on the Internet through Sylvan Prometric at www.sylvanprometric.com or www.2test.com.
At the end of the exam, your score report will be displayed on screen and printed so that you have a hard copy.
Who Should Buy This Book? If you are one of the many people who want to pass the i-Net+ exam, you should buy this book and use it to study for the exam. The i-Net+ exam is designed for Internet professionals with six months of experience in a variety of entry-level, Internet-related technical job functions. This book was written with one major goal in mind: to prepare you to pass the i-Net+ exam by describing in detail the concepts on which you’ll be tested. In addition, this book can also be used as a good reference for broad Internet concepts.
How to Use This Book and CD This book includes several features that will make studying for the i-Net+ exam easier. First, at the beginning of the book (right after this introduction, in fact) is an assessment test you can use to check your readiness for the actual exam. Take this test before you start reading the book. It will help you to determine the areas you may need to brush up on. You can then focus on those areas while reading the book. The answers to this test appear on a separate page after the last question. Each answer also includes an explanation and a note telling you in which chapter this material appears.

In addition, there are review questions at the end of each chapter. As you finish each chapter, answer the questions and then check your answers, which will appear on the page after the last question. If you answered any question(s) incorrectly, you’ll know that you may need some additional study in that particular area of the exam. You can go back and reread the section in the chapter that deals with each question you got wrong to ensure that you know your stuff.

On the CD-ROM that is included with this book, there are several extras you can use to bolster your exam readiness:

Electronic “flashcards” You can use these 150 flashcard-style questions to review your knowledge of i-Net+ concepts on your PC. Additionally, you can download the questions into your Palm device (if you own one) for reviewing anywhere, anytime, without a PC!

Practice Exams Take these exams when you have finished reading all the chapters, answered all the review questions, and you feel you are ready for the i-Net+ exam. Take the practice exams as if you were actually taking the i-Net+ exam (that is, without any reference material). The answers to the practice exams can be found at the end of the test. If you get more than 90 percent of the answers correct, you’re ready to go ahead and take the real exam.
Test engine This portion of the CD-ROM includes all of the questions that appear in the text of this book: the assessment questions at the end of this introduction, the chapter review questions, and the practice exams from this CD-ROM. The book questions will appear similar to the way they did in the book, and they will also be randomized. This random test will allow you to pick a certain number of questions and will simulate an actual exam. Combined, these test engine elements will allow you to test your readiness for the real i-Net+ exam.

Full text of the book If you are going to travel but still need to study for the i-Net+ exam, and if you have a laptop with a CD-ROM drive, you can take this entire book with you on the CD-ROM. The book is in PDF (Adobe Acrobat) format so it can be read easily on any computer.
Conventions Used in This Book To understand the way this book is put together, you must learn about a few of the special conventions we used. Following are some of the items you will commonly see. Italicized words indicate new terms. After each italicized term, you will find a definition. Lines formatted in this font refer to the output of a program. You will usually see several of these lines together indicating what the output of a text-based program usually looks like. This font is also used in URLs and e-mail addresses.
Tips will be formatted like so. A tip is a special piece of information that can make either your work or your test-taking experience easier.
Notes are formatted with this symbol and this box. When you see a note, it usually indicates some special circumstance to make note of. Notes usually include information that is somewhat out of the ordinary and relates to the exam.
Warnings are found within the text whenever a technical situation arises that may cause damage to a component or cause a system failure of some kind. Additionally, warnings are placed in the text to call particular attention to a potentially dangerous situation.
Keep a watchful eye out for these special items within the text as you read.
Sidebars This special formatting indicates a sidebar. Sidebars are entire paragraphs of information that, although related to the topic being discussed, aren’t actually on the exam. They are just what their name suggests: a sidebar discussion.
Exam Objectives The i-Net+ exam objectives were developed by a group of Internet industry professionals through the use of an industry-wide job task analysis. CompTIA asked groups of Internet professionals to fill out a survey rating the skills they felt were important in their jobs. The results were grouped into objectives for the exam. This section includes the outline of the exam objectives for the i-Net+ exam, and the weight of each objective category.
The objectives and weighting percentages given in this section can change at any time. Check CompTIA’s web site at www.comptia.org for a list of the most current objectives.
Internet Basics & Clients (30%) 1.1 Identify the issues that affect an Internet site. Content may include the following: Performance, including:
Bandwidth (both client and server)
Internet connection types (both client and server)
2.2 Understand and be able to describe differences between popular client-side and server-side programming languages. Content may include the following:
When to use the languages
When they are executed
Examples may include the following:
Java
JavaScript
XML
ASP
Extensible Stylesheet Language (XSL)
Document Type Definitions (DTD)
JSP
CGI script
Perl
Java Servlets
VBScript
PHP
2.3 Create HTML pages. Content may include the following:
3.3 Understand and be able to describe the use of Internet domain names and DNS. Content may include the following:
DNS entry types
Hierarchical structure
Role of root domain servers
Top-level or original domains
NSlookup
3.4 Understand and be able to describe the capabilities of popular remote access protocols. Content may include the following:
SLIP
PPP
PPTP
L2TP
PPPoE
Point-to-point/multipoint
3.5 Understand how various protocols or services apply to the function of their corresponding server, such as a mail server, a web server, or a file transfer server. Content may include the following:
5.2 Understand and be able to describe the differences between the following from a business standpoint. Content may include the following:
Private Network
Intranet
Extranet
Internet
5.3 Recognize and explain the current types of e-business models being applied today. Content may include the following:
Business-to-business models
Business-to-consumer models
Business-to-employee models
Business to Government
Consumer-to-business
Consumer-to-consumer
Storefront (bricks & mortar) vs. e-business
New and changing customer expectations
e-business and the Internet
Aggregator
5.4 Identify key factors relating to strategic marketing considerations as they relate to launching an e-business initiative. Content may include the following:
How to Contact the Authors If you have any questions while you are reading this book, feel free to contact any of the authors. David Groth can be reached via e-mail (the best way to reach him) at [email protected]. Dorothy McGee can be reached at [email protected]
Test-Taking Tips The i-Net+ exam is in its second revision (as this book is being written) and has gained wide acceptance among Internet professionals. Remember a few things when taking your test:
Get a good night’s sleep the night before.
Take your time on each question. Don’t rush it.
Arrive at the testing center a few minutes early so that you can review your notes.
Answer all questions, even if you don’t know the answer. (Unanswered questions are considered wrong.)
If you don’t know the answer to a question, mark it and then come back to it later.
Read each question twice and make sure you understand it.
Good luck on your i-Net+ exam and in your future in the Internet industry.
Assessment Test

1. Looking at the graphic below, the network topologies used (from left to right) are:

A. Star, bus, ring
B. Bus, ring, mesh
C. Bus, star, ring
D. Ring, bus, star

2. A hacker, posing as an employee of your company’s IT department, calls one of your employees, Dan, and asks for the company password. Your unsuspecting employee responds with the password. Which of the following has Dan just become a victim of?

A. DoS
B. Ping flood
C. Brute force attack
D. Social engineering
3. The test server is a Pentium-100 with 500MB of RAM running Apache on Linux. Each child of the web server uses 5MB of RAM. The server has a dedicated T1 line and serves an unlimited number of clients that download its 100K static pages at 10K/s. Given this test server, at what rate of incoming requests will the web server start increasing its queue length?

A. 2
B. 6
C. 11
D. 16

4. Your company, XYZ corporation, has decided that it wants to strengthen its security. You need a solution that will keep hackers out of your network, and be able to block employees from accessing questionable content. What should you deploy?

A. FTP server
B. Proxy server
C. Directory server
D. Firewall

5. Copyright law provides protection for software but allows for _______.

A. Limited use
B. Educational use
C. Nonprofit use
D. Fair use
6. You’re configuring an email client to send and receive email. The protocol you would configure to send email is _____ and the protocol to receive email is ______.

A. POP3, SMTP
B. SMTP, POP3
C. POP3, NNTP
D. NNTP, SMTP

7. You have just received 10 calls from users complaining that they cannot access the Internet. You try going to a web site that is almost never down, but you can’t get there. What is the most likely problem?

A. The connection to the ISP is down.
B. TCP/IP is not configured correctly.
C. There is a problem on the local network.
D. Your web server is malfunctioning.

8. Your network is configured in the following manner: 5 workstations and 1 server connected to a hub. Physically, your network is configured in a ________ topology.

A. Bus
B. Star
C. Ring
D. Mesh

9. True or False. You must have an IP address in order to browse the Internet.

A. True
B. False
10. You’ve just acquired an ADSL modem and have set up access through an xDSL provider. Which protocol would you be using to connect your ADSL modem to the Internet?

A. SLIP
B. PPP
C. PPPoE
D. HTTP

11. You have just received your IP address and domain name from a registrar. Eagerly, you get your Internet site up and are anticipating the orders for your product to roll in. The next day, you find that there are no hits on your web site. On a hunch, you use NSlookup to locate information on your domain and get an “Entry not found” response. Which type of DNS record did your ISP not add?

A. Common Name
B. Address
C. Mail Exchange
D. Domain Name

12. Which component of a Web browser contains all the menus for the program?

A. Button bar
B. Menu bar
C. Status bar
D. Activity indicator

13. What is the program that interprets Java programs called?

A. The Java Interpreter
B. The Java Compiler
C. The Java Virtual Machine
D. The Java Grinder
14. Which remote access protocol uses IPSec for encryption?

A. SLIP
B. PPP
C. PPTP
D. L2TP

Answer: D. L2TP was created as a remote access protocol to work in conjunction with IPSec, which allows for secure and encrypted tunneling over the Internet. The other three protocols were constructed before IPSec was conceived. See Chapter 3 for more information.

15. Which of the following is not necessary to support an Internet client?

A. Web browser
B. Client operating system
C. IP address
D. Internet connection

16. Public-key encryption uses which of the following?

A. A public key and a private key
B. A private key
C. Packet filtering
D. Packet filtering and a private key

17. An object is likely to be in a cache if _____.

A. The object was requested a long time ago
B. The object is fresh; it was requested just before the cache was last cleared
C. The object has high value because it has never before been requested
D. The object has been requested recently
18. True or False. You can use Telnet to access the console of a Unix host.

A. True
B. False

19. What is the default subnet mask for a Class B address?

A. 255.255.0.0
B. 0.0.0.0
C. 255.255.255.0
D. 255.0.0.0

20. When browsing the Internet with a web browser, what is the text that links you to another page on the web called?

A. Hypertext
B. Hyperlink
C. Hyperactive
D. Hyperbole

21. A bastion host is a ___________.

A. Type of firewall configuration
B. Denial of service attack
C. Encryption method
D. Proxy server

22. The process of establishing a relationship between two tables in a relational database is called doing a _____.

A. link
B. join
C. pivot
D. combo
23. Which of the following is not needed before transferring a file from an FTP server using an FTP client?

A. FTP server name
B. DNS MX record
C. Username
D. Password

24. Netscape’s server-side scripting language, LiveScript, is similar to _____.

A. VBScript
B. JavaScript
C. Perl
D. Java

25. A database management system (DBMS) is primarily responsible for what?

A. Storing data in a database and handling queries sent to it
B. Backing up a database
C. Integrating a database with the Internet
D. Providing Perl support

26. A company that receives income from products it sells to foreign nationals is _______.

A. Not liable for taxes on the income in its home country
B. Usually responsible for paying duties on services provided over the Internet
C. Liable for taxes on the income in its home country
D. Usually eligible for special tax benefits
27. When planning the content of a web site, it is most important to _____ . A. Use the latest technologies so the site won't need to get rein-
vented soon B. Only use technologies that won't be roadblocks for potential users C. Follow the local policy, based on the goals of the site and the audi-
ence profile D. Choose technology the development team enjoys and is good at 28. You need to remote into a Cisco router to perform some configuration
work. Your computer is connected to the same network as the router. What protocol is used to provide this connection? A. HTTP B. FTP C. PPPoE D. Telnet 29. Which components of virus scanning software need to be updated fre-
quently? (Choose two.) A. Virus application B. Definition file C. Scanning Engine D. NetShield 30. Which is true of full-text reverse indexes? A. They really shine for sites that update infrequently B. They are primarily governed by ROBOTS.TXT spider rules C. They speed queries D. They provide concept-based functionality
31. You have just configured your firewall to recognize a bug in Microsoft
Internet Explorer. Later that day, the firewall receives a request that uses the bug, and sends up an alert. The firewall has performed which of the following? A. Port filtering B. Packet filtering C. Application filtering D. Bug filtering 32. Of the people who view a banner ad, the percentage that follows its
link to an advertiser’s site is called _______. A. The clickthrough rate B. The passthrough rate C. The yield D. The drawing power 33. If you want to buy your sister either black high heels or purple tennis
shoes, which query will be most useful? A. A full-text search for “black high heels or purple tennis shoes” B. A keyword search for “black high heels or purple tennis shoes” C. A full-text search for (black and “high heels”) or (purple and “ten-
nis shoes”) D. A keyword search for “high heels or sneakers” 34. Which troubleshooting utility is used on a Microsoft Windows 98
machine to view TCP/IP configuration information? A. ipconfig B. ifconfig C. winipcfg D. tracert
35. How many host IP addresses are available with the CIDR
designation /21? A. 128 B. 1,024 C. 2,046 D. 9,128 36. One difference between patent and copyright is _______. A. Patent protection is available only in the United States B. There is no implicit patent; you must apply for and be granted one C. Two independent inventors of a product or process may both
enjoy patents on it D. Copyright does not allow for corporate ownership 37. Your network is experiencing severe slowdown problems. After doing
a bit of homework, you find that the Engineering group is transferring large graphics files amongst themselves. You decide to use a _______ to keep Engineering’s network traffic from interfering with the rest of the network. A. Bridge B. Router C. Hub D. Switch 38. Licensing always allows for _______. A. One party’s use of another party’s copyrighted material under cer-
tain terms B. A payment of money to the copyright holder C. A limited period of use D. Mandatory review of the terms by a judge
39. What is Microsoft’s database connector called? A. Java Database Connectivity (JDBC) B. SQL C. Paradox D. Open Database Connectivity (ODBC) 40. Which FTP command produces the output “Type set to A”? A. put B. ascii C. get D. ls
Answers to Assessment Test 1. C. Network topologies follow a shape and pattern that is logical to
its design. The bus topology follows a straight line, such as with a bus running a route. The star topology creates a star-burst pattern with a central clustering device. The ring topology connects each network client with its adjacent neighbors, creating a ring pattern. See Chapter 1 for more information. 2. D. Poor Dan wasn’t denied any network services from the phone call,
so he wasn’t hit with a denial of service (DoS) attack. A ping flood is a specific type of DoS attack that uses enormous amounts of pings against a specific computer. Although Dan may have been bullied into giving the password, it doesn’t count as a brute force attack, which simply runs a password-cracking program that attempts multiple combinations of username and password. See Chapter 5 for more information. 3. C. If there are 100 processes and each finishes serving a request every
10 seconds, 10 requests a second will be fulfilled. At 11 requests a second, one request is added to the queue every second. For more information, see Chapter 7. 4. D. An FTP server is used to receive files from the Intranet and Inter-
net, but it will not provide you with security. A proxy server is a server that acts on behalf of clients for Internet content and can provide some security. However, a proxy server will not provide enough blocking features to prevent employees from accessing questionable content, so choice B is wrong. A firewall will provide a solution to both problems. See Chapter 2 for more information. 5. D. The doctrine of fair use describes circumstances in which someone
may use copyrighted material without explicit permission. For more information, see Chapter 9.
6. B. NNTP is the Network News Transfer Protocol, and is used to
send and receive Internet news (also called Usenet), so answers C and D are incorrect. POP3 is the protocol used for receiving email, and SMTP is used to send email. Therefore, answer B is correct. See Chapter 3 for more information. 7. A. If none of your clients can get to the Internet, the first step is to
check your connection from your network to your ISP. Most likely, that is the root cause. See Chapter 8 for more information. 8. B. A star topology has 1 central device, in this case the hub, that the
other network devices attach to. A ring topology has each networking device connected to its neighboring network device. Don’t confuse the two topologies. See Chapter 1 for more details. 9. A. True. Every network device that connects to the Internet must
have an IP address. The IP address functions as a unique identifier for that machine. For more information on IP addresses, see Chapter 4. 10. C. SLIP and PPP are protocols that are used for remote access, but
they are not generally used to connect via xDSL technologies. Since the question pertains to the modem connection to the xDSL provider, HTTP can’t be correct since it is not a remote access protocol. PPPoE is predominantly used to connect broadband media, such as xDSL modems, to the Internet. See Chapter 3 for more information. 11. B. An address record must be included to resolve domain names to IP addresses. A mail exchange record is used to identify e-mail servers. Common name records are aliases that allow you to have more than one name for a single web site. See Chapter 2 for more information. 12. B. The menu bar is the only component of a Web browser that con-
tains the menus for the program. The button bar contains all the control buttons for browsing, the status bar shows the status of the browsing session, and the activity indicator shows that Internet activity is occurring. For more information see Chapter 4.
13. C. The Java Virtual Machine (JVM) interprets Java programs. For
more on Java, see Chapter 6. 14. D. L2TP was created as a remote access protocol to work in conjunc-
tion with IPSec, which allows for secure and encrypted tunneling over the Internet. The other three protocols were constructed before IPSec was conceived. See Chapter 3 for more information. 15. A. While a web browser is handy to have, it is actually an Internet cli-
ent and therefore unnecessary. Other Internet clients can still function without the browser. For more information see Chapter 4. 16. A. Public-key encryption uses both a public key and a private key; it is private-key encryption that uses only a private key. Packet filtering is not encryption at all, but a method used by firewalls to determine if the received data meets its expectations. See Chapter 5 for more information. 17. D. The object has been put into the cache by a recent request.
Because it is recent, it is likely it hasn't expired in the cache or been pushed out by other cached content. For more information, see Chapter 7. 18. A. True. The primary function of Telnet is to access the console of a
Unix host, although it is used to access routers to perform configuration work and some Internet sites. For more information see Chapter 4. 19. A. 255.255.255.0 is for a Class C address, and 255.0.0.0 is for a
Class A address. For more information on default masks, see Chapter 3. 20. B. A Hyperlink is a line of text that, when clicked, will take you to
another page. For more information see Chapter 4. 21. A. A bastion host is a firewall configuration on a computer that has
two network interface cards (NICs). One NIC connects directly to the external network (typically the Internet), and the other NIC connects directly to your internal network. It is the least expensive configuration for a firewall, and also the least secure. See Chapter 5 for more information.
22. B. To establish a relationship between two tables in a relational data-
base is to do a join. For more on relational databases, see Chapter 6. 23. B. An MX record is a mail exchange record that indicates a mail
server. FTP is a file transfer protocol used for transferring files, not email. For more information see Chapter 4. 24. B. LiveScript sometimes is called “server-side JavaScript.” For more
on scripting languages, see Chapter 6. 25. A. Though DBMSes may do the other tasks listed, their primary job
is to store data and allow users to access it. For more on DBMSes, see Chapter 6. 26. C. The income you receive from foreign customers is taxable income
like any other. For more information, see Chapter 9. 27. C. Although all enjoy some truth, B is most likely to lead to a suc-
cessful web site. For more information, see Chapter 7. 28. D. While you can perform some remote management with a simple
web browser, router management is usually not accomplished in this fashion. FTP is a file transfer protocol, but the question states that you will be performing configuration work. PPPoE is a remote access protocol that works over Ethernet, but the question states that you are connected to the same network as the router, so answer C is incorrect. Telnet is the correct choice, as it provides Terminal Emulation services. See Chapter 3 for more information. 29. B, C. Virus scanners consist of two major components: the virus def-
inition file and the scanning engine. Both parts must be updated on a regular basis or else the virus scanner itself is useless. See Chapter 8 for more information. 30. C. The table mapping words to files allows direct lookups on where
a word is used. For more information, see Chapter 7.
31. C. Application filtering works by configuring the firewall so that it
is aware of various bugs in applications, and will refuse to fulfill a request that makes use of the bug. See Chapter 5 for more information. 32. A. The percentage of ad viewers that choose to visit the underlying
site is called the clickthrough rate. For more information, see Chapter 9. 33. C. The full-text search will look through the details of more pages.
The search terms will not find purple high heels or black tennis shoes. For more information, see Chapter 7. 34. C. Ipconfig is used on Windows NT machines, and can be easily con-
fused with winipcfg. Ifconfig is the Unix version. See Chapter 8 for more information. 35. C. /21 indicates that there are 21 bits used for the network address
of an IP address and that there are 11 bits left over for host addresses, corresponding to a decimal number of 2,048 (subtract 2 for addresses with all 0s and all 1s, leaving 2,046). For more information on CIDR, see Chapter 3. 36. B. You must apply for and be granted a patent to be able to defend
an invention from infringement. For more information, see Chapter 9. 37. A. Hubs transmit data to all devices connected to them, so that wouldn’t
solve the problem here. A router is used to connect different networks together, and the question indicates a single network since all workstations are affected by the one group. A bridge is the best choice because it allows you to segment the network so that traffic meant for one segment stays on that segment and doesn’t cross over to the other side. See Chapter 1 for more information. 38. A. Licensing is one party’s use of another party’s copyrighted material
under certain terms, which may or may not be granted in exchange for payment. For more information, see Chapter 9.
39. D. Microsoft’s ODBC API is used to connect the client to a variety of
databases. JDBC is a Java-based connector, but it was not developed by Microsoft. Paradox is a database program and SQL is a database query language; neither is a database connector. For more on ODBC, see Chapter 6. 40. B. Put is used to send a file to an FTP server and Get retrieves a file,
so answers A and C are incorrect. Ls is used to list a directory, so answer D is wrong. The ascii command changes the file type to A and thus produces the specified output. For more information see Chapter 4.
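Two of the answers above rest on quick arithmetic. If you want to double-check them, here is a short Python sketch (purely illustrative, and not part of the exam; the 10.0.0.0/21 network is an arbitrary example, since any /21 block has the same host capacity):

```python
import ipaddress

# Answer 3: 100 processes, each completing one request every 10 seconds.
processes = 100
seconds_per_request = 10
capacity = processes / seconds_per_request   # requests fulfilled per second
backlog_growth = 11 - capacity               # at 11 req/s, requests queued per second

# Answer 35: a /21 prefix leaves 32 - 21 = 11 bits for host addresses.
net = ipaddress.ip_network("10.0.0.0/21")
host_bits = 32 - net.prefixlen               # 11 host bits
usable_hosts = 2 ** host_bits - 2            # 2,048 addresses minus network and broadcast

print(capacity, backlog_growth)              # 10.0 1.0
print(host_bits, usable_hosts)               # 11 2046
```

The same host count works for any prefix length: subtract the prefix from 32, raise 2 to that power, and subtract 2 for the network and broadcast addresses.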
i-Net+ Networking Basics
I-NET+ EXAM OBJECTIVES COVERED IN THIS CHAPTER:
3.7 Create a logic diagram of Internet components from the client to the server. Content may include the following:
Bridge
Brouter
Router
Switch
Hub
Repeater
Network adapter
Cable Modem
xDSL Modem
Modem
WAN Link
CSU/DSU
Firewall
Network Address Translation (NAT) server
Proxy Server
3.8 Describe various hardware and software connection devices and when to use them. Content could include the following:
By most accounts, the Internet is a big network. It contains many of the same components as any corporate network. To that end, before discussing the Internet, it is helpful to understand some of the basic components and concepts of a network. Many of the concepts involved in understanding networks will cross over to understanding the inner workings of the Internet. This chapter will introduce you to some of the more common networking topics you must understand when working with Internet technologies. Some of those topics include definitions of servers and protocols, hardware and software connection devices, and the various bandwidth technologies used to connect Internet sites to one another. This chapter will introduce you to these and other networking components and concepts so that you may have a better understanding of the Internet’s underpinnings.
What Is a Network?
In the computer world, the term network describes two or more connected computers that can share resources. A resource can include data, printers, applications, fax devices, scanners, or any computer device that can be shared. The type of network used depends on the number of computers (and people) who need access, the geographical and physical layout of the enterprise, and, of course, financial resources. Networks can be classified in two different ways:
Network Topology: The network topology comes in several forms, and is most generally defined by the physical layout of the network.
Network Type: The geographical location and physical size of the network generally define the type of network.
In this section, we’ll discuss each type of network and describe the situation that is most appropriate for its use.
Network Topologies From the computers themselves to the cables that connect them, the physical layout of a network is known as the network’s topology. Since the topology is determined by factors such as the building’s physical layout and the location of the network devices (also known as clients), you can think of a topology as a kind of network map. When choosing a topology for a network, you should choose the one that best facilitates connectivity between the network devices. FIGURE 1.1
A network with multiple topologies
Networks tend to follow one of four basic patterns, although mixtures are common in the real world. Figure 1.1 displays a network configured with several different topologies. The topologies that we discuss in this section are:
Bus Networks generally have a specific area that handles a majority of the network traffic, called the network’s backbone. The cables that connect the network devices to the backbone are called segments. In a bus topology, as depicted in Figure 1.1 on the left side of the graphic, the individual clients are directly connected to the backbone with their own cable. If you have relatively few devices in a small area to connect to the network, the bus topology works well; however, the major drawbacks to this configuration are that adding new devices directly into the backbone can take some skill, a cable break affects the entire network, and more than one device could talk on the network at the same time (causing a collision to occur). Bus networks are typically used in Ethernet networks.
Ring In a ring topology, every network device is attached directly to both of its neighboring devices, forming a ring pattern. Ring networks are primarily installed in Token Ring networks, and as we will discuss later in the chapter, have the advantage of preventing two devices from talking on the network at the same time; however, you still have issues with network failure if one of the cable segments fails. Looking back at Figure 1.1, the right side of the graphic displays a ring network.
Star In a star topology, each network device connects directly to a central clustering device, such as a hub or a switch. This method creates the starburst pattern that gives the topology its name. The advantages you have with a star topology are that if a cable breaks, finding and replacing it is easy, and that the other network devices are not affected by the broken cable. Star networks are typically seen in Ethernet networks. Figure 1.1 depicts a star network in the middle of the graphic.
Mesh The least common topology in use is the mesh topology. As its name implies, each networking device connects directly to every other network device. The main advantage of this topology is that even if one cable breaks, the device still has a number of other cables it can use to communicate with the network. Unfortunately, the number of connections increases dramatically with the number of network devices—n(n-1)/2, with n being the number of network devices. For example, if you had a network of eight devices, you would need 8(8-1)/2, or 28, connections. Not too bad, you may think, but if you increase that number to 15 devices, the connection count becomes 15(15-1)/2, or 105. Trying to find one broken connection becomes the proverbial needle in the haystack. Another drawback to the mesh topology—and the main reason that it isn’t commonly used—is that it can get expensive to set up and maintain.
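The full-mesh link count, n(n-1)/2, grows quickly, which is easy to see with a short Python function (a quick illustration, not exam material):

```python
def mesh_links(n: int) -> int:
    """Point-to-point links needed for a full mesh of n devices."""
    return n * (n - 1) // 2

# Every new device must be cabled to every existing device,
# so the link count grows roughly with the square of the device count.
for devices in (4, 8, 15, 30):
    print(devices, mesh_links(devices))
```

Running this shows why mesh cabling is rarely practical beyond a handful of devices: doubling the device count roughly quadruples the number of cables.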
Mixed Topologies A mixed topology is one where you combine two of the four topologies together to gain the advantages of both. While not on the exam, mixed topologies are common enough that you should at least know what they are. The two most common mixed topologies are:
FIGURE 1.2
Star-bus: Combines both the star and the bus topologies, hence its name. A central device, such as a hub or a switch, is used to connect network devices to form the “star” part of the topology. The individual central devices are then connected to the backbone, which in turn forms the “bus” part of the topology. Figure 1.2 shows an example of a star-bus network. As we see later in this chapter, this topology is commonly used in Ethernet networks.
A network using the star-bus topology
Star-ring: Offers a central device to cluster individual network components together; however, instead of having a bus topology for a backbone, a ring is used. Star-rings are most commonly used in Token Ring networks, which are discussed later in this chapter.
Physical or Logical? A network can also be described by two additional methods: physical topology or logical topology. The physical topology of a network is the actual, physical cabling of the network devices; however, the logical topology describes the manner in which the network really operates. For example, imagine that you had four workstations and one server connected to a hub running an Ethernet network. Physically, you have a star network; however, Ethernet works on a bus technology (more on Ethernet later), so logically you have a bus network.
The Main Network Types Defined Now that you know the basic network topologies, you can examine the different types of networks and when they are used. Networks generally fall into one of three main categories:
Local Area Network (LAN)
Metropolitan Area Network (MAN)
Wide Area Network (WAN)
Local Area Network (LAN) By definition, a local area network, or LAN, is limited to a specific area, usually an office, and cannot extend beyond the boundaries of a single building. The first LANs were limited to a range (from a central point to the most distant computer) of 185 meters (about 600 feet) and to no more than 30 computers. Today’s technology allows for a larger LAN, but practical administration limitations require dividing it into small, logical areas called workgroups. A workgroup is a collection of individual computers that share the same files and databases over the LAN, such as the sales department. Examples of LANs are shown in Figures 1.1 and 1.2. Theoretically, a LAN can connect a maximum of 1,024 computers at a maximum distance of 900 meters (around 2,950 feet, assuming thinnet cable is used). These figures are based on connecting the segments with special devices to extend the overall range of the network to the backbone, and very light network traffic. If you use a different type of cabling, these maximums can decrease to 30 computers, with the most distant computer connected at a maximum of 100 meters (about 300 feet) from a central point.
Metropolitan Area Network (MAN) A metropolitan area network is a network that generally does not leave the boundaries of a town or a city. The MAN is typically seen in college or university campuses that span one particular area of a town, or used by small companies that have offices in the same city but in different buildings. MANs typically use the local telephone company to connect individual LANs together, but can also use link types such as fiber-optic cabling and various forms of wireless technologies.
Wide Area Network (WAN) Chances are you are already an experienced WAN user and didn’t know it. If you have ever connected to the Internet, you have used the largest WAN on the planet. A wide area network, or WAN, is any network that crosses metropolitan, regional, or national boundaries. Most networking professionals define a WAN as any network that uses routers and public network links. The Internet fits both definitions. WANs differ from LANs in the following ways:
WANs cover greater distances than LANs or MANs.
WAN speeds are slower than LAN speeds.
LANs are limited in size and scope; WANs are not.
WANs can be connected on demand or can be permanently connected. LANs have permanent connections between stations.
WANs can use public or private network transports. LANs primarily use private network transports.
The Internet is actually a specific type of WAN. It is a collection of networks that are interconnected, and is therefore technically an internetwork. (Internet is short for the word internetwork.) The Internet will be discussed more fully in Chapter 2. A WAN can be centralized or distributed. A centralized WAN consists of a central computer (at a central site) to which other computers and dumb terminals connect. The Internet, on the other hand, consists of many interconnected central computers in many locations. Thus, it is a distributed WAN.
Networks are made up of many components, both hardware and software, and each hardware device on the network performs a different function. In this section, you will learn about some of these devices and their specific functions.
Network Adapter The network interface card (NIC), as its name suggests, is the device in your computer that connects (interfaces) your computer to the network. This device provides the physical, electrical, and electronic connections to the network media. It is responsible for converting the information your computer processes into the special electrical signals for the type of network technology your network uses. Also known as a network adapter, a NIC is either an expansion card (currently the most popular implementation) or built into the motherboard of the computer. If the NIC is built-in, it is called an integrated NIC, and is fast becoming one of the standard features offered to both the corporate and consumer markets for desktop computers. An example of a NIC is shown in Figure 1.3. FIGURE 1.3
In some cases, a NIC must be added to the computer. It is usually installed into an expansion slot on the computer’s motherboard. In notebook computers, NIC adapters can be connected through the printer port (known as a parallel port), through a built-in PC card slot (currently the most common method), or integrated into the machine itself (the latest trend in laptop computers).
To be used on a network, the NIC must have at least one protocol bound to it within the operating system. Binding a protocol means to logically associate a particular protocol with that instance of a NIC within an operating system so that the OS can communicate with the rest of the network using that protocol.
Regardless of which type of NIC you choose, the important thing to remember when buying a NIC for your computer is to buy one that matches the bus type in your computer and the type of network that you have. It sounds rather obvious, but you can’t get a Token Ring card to communicate on an Ethernet network, no matter how hard you try. It just won’t work because a Token Ring NIC wasn’t designed for an Ethernet network. The electrical signals are in a completely different format.
It is never a good idea to mix-and-match NICs from different vendors on the same network. While you can get most NICs from one vendor to work in harmony with another vendor’s product, sometimes the two “collide” and can cause problems throughout the entire network. Stick with one (two at the most) vendor’s product, and you will enjoy fewer network issues.
Cabling Although it is possible to use several forms of wireless networking, such as radio and infrared, most current networks communicate via some sort of cable. Although the i-Net+ exam doesn’t test you on cabling technologies, it is important that we at least cover some of the common network cabling because, without cabling, most networks have no pathway to transmit data. In this section, we’ll look at three types of cables commonly found in LANs:
Coaxial Cable Coaxial cable (or coax) contains a center conductor made of copper, and is surrounded by a plastic jacket, with a braided shield over the jacket (as shown in Figure 1.4). A plastic, such as either PVC or Teflon, covers this metal shield. The Teflon-type covering is frequently referred to as a plenumrated coating. That simply means that the coating does not produce toxic gas when burned and is rated for use in air plenums that carry breathable air. This type of cable is more expensive, but may be mandated by electrical code whenever cable is hidden in walls or ceilings. FIGURE 1.4
Construction of a coaxial cable
Plenum rating applies to all types of cabling.
Coaxial cable is available in different specifications that are rated according to the RG Type system. Different cables have different specifications and, therefore, different RG grading designations (according to the U.S. military specification MIL-C-17). Distance and cost are considerations when selecting coax cable. The thicker the copper, the farther a signal can travel—and you pay higher costs and receive a less-flexible cable. There are two main categories of coaxial cable, Thick Ethernet (or thicknet) and Thin Ethernet (or thinnet). The primary difference between the two is the diameter of the cable and the distance they can carry a signal in a single segment. Thinnet coaxial can carry a signal 185 meters in a single segment, and thicknet can carry a signal 500 meters in a single segment. Thicknet cable has approximately the same diameter as a small garden hose and is difficult to bend. Thinnet cable, on the other hand, has approximately the same diameter as a pencil, is much more flexible, and thus easier to install. Of the two, thinnet is much more common in newer installations. The main consideration with the installation of coaxial cable is the phenomenon of signal bounce. With coaxial cable, the signal travels up and down the entire length of the wire. When the signal reaches the end of the
wire, the electrical change from copper to air prevents the signal from simply falling out the end. So the signal bounces back down the wire it just traversed. This creates an echo, just as if you were yelling into a canyon. These additional signals on the wire make communication impossible. To prevent this, you must place a terminator on each end of the wire to absorb the unwanted echo.
Proper termination requires that one terminator be connected to a ground. Connecting both terminators to a ground can create a ground loop, which can produce all kinds of bizarre, ghostlike activity, for example, a network share that appears and disappears.
Coaxial cable primarily uses BNC connectors. BNC has many definitions in the computer world. Some think British Naval Connector, citing its origins. Others would say Bayonet Nut Connector, after its function. Still others would say Bayonet Neill Concelman, after its inventors. Suffice to say, it’s just easier to call it a BNC connector and know that it’s used on 10Base-2 Ethernet connections to RG-58 cable.
Twisted-Pair Cable Twisted-pair cable consists of multiple, individually insulated wires that are twisted together in pairs. Sometimes a metallic shield is placed around the twisted pairs, hence the name shielded twisted-pair (STP). (You might see this type of cabling in Token Ring installations.) More commonly, you see cable without the metallic shield, called unshielded twisted-pair (UTP). UTP is commonly used in 10BaseT, star-wired networks. The wires in twisted-pair cable are twisted to minimize electromagnetic interference. When electromagnetic signals are conducted on copper wires that are in close proximity (such as inside a cable), some electromagnetic interference occurs. In cabling parlance, this interference is called crosstalk. Twisting two wires together as a pair minimizes such interference and provides some protection against interference from outside sources. This cable type is the most common today, and is popular for several reasons:
It’s cheaper than other types of cabling.
It’s easy to work with.
It permits transmission rates considered impossible 10 years ago.
UTP cable, the more common type of twisted-pair cable, is rated in the following categories:
Category 1 Two twisted pairs (four wires). Voice grade (not rated for data communications). This is the oldest category of UTP, and it is frequently referred to as POTS, or Plain Old Telephone Service. Before 1983, this was the standard cable used throughout the North American telephone system. POTS cable still exists in parts of the Public Switched Telephone Network (PSTN).
Category 2 Four twisted pairs (eight wires). Suitable for up to 4Mbps. Typically used for telephone wiring and some older Token Ring networks.
Category 3 Four twisted pairs (eight wires), with three twists per foot. Acceptable for up to 10Mbps. The popular cable choice for a long time, and used in 10Base-T Ethernet networks.
Category 4 Four twisted pairs (eight wires), suitable for 16Mbps. Used for Token Ring and 10Base-T networks.
Category 5 Four twisted pairs (eight wires), acceptable for 100Mbps. Used in 100Base-T and 10Base-T Ethernet networks.
Category 6 Four twisted pairs (eight wires), rated for 155Mbps. Commonly used in fast Ethernet networks.
Category 7 Four twisted pairs (eight wires), rated for up to 1000Mbps (Gigabit). The latest specification.
Frequently, you will hear Category shortened to Cat. Today, any cable that you install should be a minimum of Cat 5. We say “a minimum” because some cable is now certified to carry a bandwidth signal of 350MHz or beyond. This allows unshielded twisted-pair cables to reach a speed of 1Gbps, which is fast enough to carry broadcast-quality video over a network. UTP cables use RJ (Registered Jack) connectors rather than BNC connectors. The connector used with UTP cable is called RJ-45, which is similar to the RJ-11 connector used on most telephone cables, except that RJ-45 is larger. The RJ-11 has four wires, or two pairs, and the network connector RJ-45 has eight wires, or four pairs.
Signaling Methods How much of a cable’s available bandwidth (overall capacity, such as 10Mbps) is used by each signal depends on whether the signaling method is baseband or broadband. Baseband uses the entire bandwidth of the cable for each signal (using one channel). It is typically used with digital signaling.
In broadband, multiple signals can be transmitted on the same cable simultaneously by means of frequency division multiplexing (FDM). Multiplexing is dividing a single medium into multiple channels. With FDM, the cable’s bandwidth is divided into separate channels (or frequencies), and multiple signals can traverse the cable on these frequencies simultaneously. FDM is typically used for analog transmissions, such as cable television. Another method, time division multiplexing (TDM), can also be used to further divide each individual FDM frequency into individual time slots. Additionally, TDM can be used on baseband systems.
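The time-slot idea behind TDM can be sketched as a round-robin scheduler that visits each channel in turn. This is a conceptual illustration only — real TDM systems use fixed framing, padding, and signal-level hardware, none of which is modeled here:

```python
def tdm_interleave(channels):
    """Interleave data from several channels into time slots on one medium.

    channels: list of byte strings, one per input channel. Each time slot
    carries one byte from one channel, visited round-robin (a simplified
    view of TDM).
    """
    slots = []
    longest = max(len(c) for c in channels)
    for t in range(longest):                 # one pass = one frame
        for ch_id, data in enumerate(channels):
            if t < len(data):
                slots.append((ch_id, data[t:t + 1]))
    return slots

def tdm_demultiplex(slots, n_channels):
    """Reassemble each channel's data from the shared time slots."""
    out = [b"" for _ in range(n_channels)]
    for ch_id, byte in slots:
        out[ch_id] += byte
    return out
```

Interleaving and then demultiplexing returns each channel's data intact, which is exactly what lets multiple signals share one cable.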
Fiber-Optic Cable If your data runs are measured in kilometers, or if you have gigabits of data to move each second, fiber-optic is your cable of choice because copper cannot reach more than 500 meters (around 1,600 feet—that’s six football fields to you and me) without electronics regenerating the signal. Additionally, fiber-optic is the only cabling technology that can support the high data transfer speeds that the backbone of the Internet requires. You may also want to opt for fiber-optic cable if an installation requires high security because it does not create a readable magnetic field. The most common use of fiber-optic cable these days is for high-speed telephone lines.
Ethernet running at 10Mbps over fiber-optic cable is normally designated 10BaseF; the 100Mbps version of this implementation is 100BaseFX.
Although fiber-optic cable may sound like the solution to many problems, it has pros and cons just like the other cable types. The pros are as follows:
It’s completely immune to electromagnetic interference (EMI) or radio frequency interference (RFI).
It can transmit up to 4 kilometers.
Here are the cons:
It’s difficult to install.
It requires a bigger investment in installation and materials.
Fiber-optic technology was initially expensive and difficult to work with, but it is now being installed in more places. Some companies with high bandwidth requirements plan to bring fiber-optic speeds to the desktop. At the
time this book is being written, the 10 Gigabit Ethernet Alliance (10GEA) is working on the 10G Ethernet standard for fiber optic cabling (which should be ratified in 2002), and fiber-optic networks will probably take off at an even greater rate when vendors begin shipping products by the end of 2001.
Servers In the truest sense, a server does exactly what its name implies: It services client requests for access to resources on the network. Servers are typically powerful computers that run the software that controls and maintains the network. This software is known as the network operating system, which you will learn about later in this chapter. Servers are often specialized for a single purpose. This is not to say that a single server can’t do many jobs, but more often than not, you’ll get better performance if you dedicate a server to a single task. Here are some examples of servers that are dedicated to a single task:

File server Allows for a central storage area that clients can use to share data.

Print server Controls and manages one or more printers for the network.

Proxy server Performs a function on behalf of other computers. Proxy means “on behalf of.”

Application server Hosts a network application.

Web server Holds and delivers web pages and other web content and uses the Hypertext Transfer Protocol (HTTP) to deliver them.

Mail server Hosts and delivers e-mail; is the electronic equivalent of a post office.

Fax server Sends and receives faxes (via a special fax board) for the entire network without the need for paper.

Remote access server Hosts modems, or VPN connections, for inbound requests to connect to the network; provides remote users (working at home or on the road) with a connection to the network.

Telephony server Functions as a “smart” answering machine for the network; can also perform call center and call routing functions.

Network Address Translation (NAT) server Translates a client’s network address into an Internet address.

Regardless of the specific role(s) each server plays, they all (should) have the following in common:
Hardware and/or software for data integrity (such as backup hardware and software)
The capability to support a large number of clients
Physical resources, such as hard drive space and memory, must be greater in a server than in a workstation because the server needs to provide services to many clients. Also, a server should be located in a physically secure area. Figure 1.5 shows a sample network that includes both workstations and servers. Note that there are more workstations than servers because a few servers can serve network resources to hundreds of users simultaneously.
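The NAT server role listed earlier can be sketched as a translation table that maps private address/port pairs onto ports of a single public Internet address. This is a minimal conceptual model with made-up example addresses — a real NAT implementation also rewrites packet checksums and expires idle entries:

```python
class NatTable:
    """Minimal sketch of Network Address Translation state.

    Maps (private_ip, private_port) pairs to ports on one public IP,
    so many internal clients can share a single Internet address.
    """
    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}   # (priv_ip, priv_port) -> public port
        self.inbound = {}    # public port -> (priv_ip, priv_port)

    def translate_out(self, priv_ip, priv_port):
        """Rewrite an outgoing client address to the shared public one."""
        key = (priv_ip, priv_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.outbound[key])

    def translate_in(self, public_port):
        """Map a reply arriving at the public IP back to the client."""
        return self.inbound.get(public_port)
```

Each internal client gets a stable mapping, so replies from the Internet can be delivered back to the right workstation.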
If the physical access to a server is not controlled, you don’t have security. Use this guideline: If anybody can touch it, it isn’t secure. The value of the company data far exceeds the investment in computer hardware and software.
FIGURE 1.5
A sample network including servers and workstations
Do You Protect Your Server? Most institutions list a server backup as a requirement in their disaster recovery plans or, failing a formal plan, at least somewhere in their operational policies; however, if money is tight, they generally forego the expense. This isn’t a good idea, because a company’s data (and its business) resides on the server(s) it maintains. In one case, a company’s employees failed to check that the backup program had completed successfully. This went on for months, until a server crash necessitated restoring the data from the backup tapes. The damage was about $5 million and several employees’ jobs. The moral of the story: always check your backups!
Repeater While cables connect networks together, it is the electronic signal that carries the information passed between two networking devices. Unfortunately, electronic signals can suffer interference on their way to their destination, and they degrade due to electrical resistance from the cable itself (a process called attenuation). Attenuation therefore affects how far away you can place a workstation. A repeater is a device placed between the workstation and the rest of the network to amplify the signal, thus allowing a workstation to be placed farther out on the network than a single cable would permit. However, you cannot use an unlimited number of repeaters, because most networks require a response on the network within a specific amount of time.
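Attenuation is usually quoted in decibels of loss per unit length, and the decision of where a repeater is needed is simple arithmetic. The sketch below illustrates that arithmetic; the loss figure and receiver threshold in the test are made-up example values, not specifications for any real cable:

```python
def received_power(tx_milliwatts, db_loss_per_100m, distance_m):
    """Power remaining after attenuation over a cable run.

    A loss of L decibels reduces power by a factor of 10 ** (L / 10).
    """
    total_db_loss = db_loss_per_100m * (distance_m / 100.0)
    return tx_milliwatts * 10 ** (-total_db_loss / 10.0)

def needs_repeater(tx_milliwatts, db_loss_per_100m, distance_m, rx_threshold_mw):
    """True if the attenuated signal falls below the receiver's threshold."""
    return received_power(tx_milliwatts, db_loss_per_100m, distance_m) < rx_threshold_mw
```

With a hypothetical 10dB of loss per 100 meters, a 100mW signal arrives at 10mW after 100 meters but only 1mW after 200 meters — at some distance the signal must be regenerated, which is the repeater's job.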
Bridge A bridge is a network device, operating at the Data Link layer of the Open Systems Interconnection (OSI) model, that logically separates a single network into two segments but enables the two segments to appear to be one network to connected workstations. The primary use for a bridge is to keep traffic meant for stations on one side on that side of the bridge and not let it pass to the other side. For example, if you have a group of workstations that constantly exchange data on the same network segment as a group of workstations that don’t use the network much, the busy group will slow down the performance of the network for the other users. If you put in a bridge to separate the two groups, only traffic destined for a workstation on the other side
of the bridge will pass to the other side. All other traffic stays local. Figure 1.6 shows a network before and after bridging. Notice how the network has been divided into two segments; traffic generated on one side of a bridge will never cross the bridge unless a transmission has a destination address on the opposite side of the bridge. FIGURE 1.6
In addition to segmenting a network, bridges also allow you to extend nonroutable protocols, such as NetBEUI, across segments. This can be desirable if your network requires it, but it can re-introduce any traffic problems that you previously experienced.
Hub After the NIC, a hub is probably the next most common device found on networks today. A hub (also called a concentrator) serves as a central connection point for several network devices. At its basic level, a hub simply repeats everything it receives on one port to all the other ports on the hub, and doesn’t care what stations are connected, thus it provides a communication pathway for all stations connected to it. Figure 1.7 shows an example of a hub. FIGURE 1.7
A standard hub
Hubs are found on every twisted-pair Ethernet network, including those found at ISPs. Hubs are used to connect multiple network devices together. ISPs may have several Internet servers connected to a hub, which is in turn connected to the ISP’s Internet connection, allowing the servers to communicate with each other as well as with the Internet. There are many classifications of hubs, but two of the most important are active and passive:
An active hub is electrically powered and actually amplifies and cleans up the signal it receives, thus doubling the effective segment distance limitation for the specific topology (for example, extending an Ethernet segment another 100 meters).
A passive hub typically is unpowered and makes only physical, electrical connections. Normally, the maximum segment distance of a particular topology is shortened because the hub takes some power away from the signal strength to be able to do its job.
Switch In the past few years, the switching hub has received a lot of attention as a replacement for the standard hub. The switching hub is more intelligent than a standard hub in that it can actually understand some of the traffic that passes through it. A switching hub (or switch for short) listens to all the stations connected to it and records their network cards’ hardware addresses (see Figure 1.8). Then, when one station on a switch wants to send data to a station on the same switch, the data gets sent directly from the sender to the receiver. This is different from the way hubs operate. As mentioned in the previous section, hubs don’t care what stations are connected and simply repeat anything they receive on one port out to all the other ports. Because of this difference, there is much less overhead on the transmissions and the full bandwidth of the network can be used between sender and receiver. Switches have received a lot of attention because of this capability. If a server and several workstations were connected to the same 100Mbps Ethernet switch, each workstation would receive a dedicated 100Mbps channel to the server, and there would never be any collisions.
A switch builds a table of all addresses of all connected stations
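The address-learning behavior just described can be sketched as follows. This is a simplified model using hypothetical port numbers and MAC address strings; a real switch also ages out stale entries and treats broadcast addresses specially:

```python
class LearningSwitch:
    """Sketch of a switch that learns which MAC address sits on which port."""
    def __init__(self, n_ports):
        self.n_ports = n_ports
        self.mac_table = {}   # MAC address -> port number

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the sender's port, then decide where to forward.

        Returns the list of ports the frame goes out on: a single port
        if the destination is already known, otherwise every port
        except the one it arrived on (flooding, hub-style).
        """
        self.mac_table[src_mac] = in_port
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        return [p for p in range(self.n_ports) if p != in_port]
```

The first frame to an unknown destination is flooded like a hub would; once both stations have spoken, traffic between them uses a dedicated path, which is why collisions disappear on a switched segment.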
Router Routers play a major part in the Internet. As a matter of fact, the structure of the Internet is made up of two major items: routers and phone connections. (Phone connections are discussed later in this chapter.) A router is a network device that connects multiple, often dissimilar, network segments into an internetwork. The router, once connected, can make intelligent decisions about how best to get network data to its destination based on network performance data that it gathers from the network itself. Because the router is somewhat intelligent, it is much more complex and thus more expensive than other types of network connectivity devices.
Router Ports A router is not much to look at. Most routers have metal cases and are roughly 19 inches wide, approximately 14 inches deep, and anywhere from 1.5 inches high to 2 feet high with the more complex models. A typical router has multiple ports, or connection points, so that it can connect to all kinds of different network segments and route traffic between them. At the bare minimum, though, most routers have at least three ports, and each has a different use. Each port connects to a different device. For example, the most common port found on a router (there may be many of these ports) is a high-speed serial port (usually labeled something like WAN 0 or Serial 0). This port usually connects to either a modem bank or a WAN connection device like a Channel Service Unit/Data Service Unit (CSU/DSU), which is used to connect a router to a T1 phone line, discussed later in this chapter. The second type of port is the port that connects the router to the LAN. It is usually an Ethernet port that you would connect to a hub so that the router can communicate with the rest of the LAN. It is usually labeled something like LAN 0 or Eth0 (for an Ethernet router). The third type of port that some routers have is what is called an out-of-band management port. This port is a serial port (most often using an RJ-45 connector) that you connect to a terminal or PC running terminal software so you can configure the router. Some routers forego this port in favor of in-band management, meaning that you run the management software on a PC connected to the network and configure the router over the network. Some routers have one or the other, but many high-end routers have both to allow you the most flexibility in configuration. Figure 1.9 shows an example of a router and some of the most common items found on routers today. Note that the router shown in Figure 1.9 has two serial ports, a LAN port, and an out-of-band management port.
Brouter A brouter is a cross between a bridge and a router. As discussed previously, a bridge segments a network to keep local traffic on the right side of the bridge while a router can connect multiple networks together and make intelligent decisions on where to forward traffic. A brouter combines the best of both worlds, but is not used as often as a switch.
Modems The device most commonly used to connect computers over a public medium is a modem (a contraction of modulator/demodulator). A modem changes digital signals, which are in the form of ones and zeros, from the computer into analog signals that can be transmitted over phone lines and other analog media. On the receiving end, the modem changes the analog signals back to digital signals. The pattern of these analog signals encodes the data for transmission to the receiving computer. The receiving modem then takes the analog signals and turns them back into ones and zeros. Using this method, which is slower than completely digital transmissions, data can travel over longer distances with fewer errors.
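The modulation just described can be illustrated with frequency-shift keying (FSK), one of the simplest modulation schemes: each bit selects one of two analog tones. The sketch below only maps bits to tone frequencies and back — it does not generate real audio, and the two frequencies are illustrative values, not a claim about any particular modem standard:

```python
# Simple frequency-shift keying (FSK) sketch: a digital 1 ("mark") and a
# digital 0 ("space") are sent as two different analog tone frequencies.
# The tone values below are illustrative.
MARK_HZ = 1270   # represents binary 1
SPACE_HZ = 1070  # represents binary 0

def modulate(bits):
    """Digital -> analog: map each bit to a tone frequency (the modulator)."""
    return [MARK_HZ if b else SPACE_HZ for b in bits]

def demodulate(tones):
    """Analog -> digital: recover the bits from the tones (the demodulator)."""
    return [1 if t == MARK_HZ else 0 for t in tones]
```

Running bits through `modulate` and then `demodulate` returns the original ones and zeros, which is the round trip every pair of modems performs across the phone line.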
A modem can be either internal or external. The key difference between the two is the amount of configuration required. You must configure internal modems with an IRQ and an I/O address as well as a virtual COM port address to ensure that they function properly. External modems simply connect to a serial port and don’t require nearly as much configuration. When deciding which type to purchase, consider how many IRQs and I/O addresses are free on the machine where you are installing the modem. This is a key factor if you are already running a significant number of peripherals on the computer. The three most common types of modems that are discussed in this chapter are:
Analog
Cable
xDSL
Cable and xDSL modems are not really modems, but rather adapters. The difference between a modem and an adapter is that a modem converts digital signals into analog, and vice versa, whereas cable and xDSL modems convert digital data from the computer into a digital format understood by the particular technology, and vice versa. Even though the i-Net+ exam lists them as modems, be aware of the difference.
Analog Modems Analog modems are the most common form of modem in use today and are rated to transmit at 56 Kbps; however, the best transmission rate that you can get with a standard analog modem is actually 52 Kbps due to FCC regulations. Analog modems use the standard public phone line from your home to send and receive data, and are typically a connect-on-demand device (although you could maintain a permanent connection, if you wanted).
Cable Modems Cable television has become rather popular in the United States because of the number of channels a subscriber can receive and the relatively few reception problems compared to the old “rabbit-ears” method. Cable systems have now evolved to include Internet services directly from your cable box.
A cable modem is designed to convert the digital signals from your computer and translate them into an acceptable format that the cable provider’s system can understand. Cable modems are easy to install and use. You can either lease one from the cable company for a small monthly charge, or you can save some money and buy one at your local computer store. External cable modems have one connection to your cable TV converter (or wall outlet) and one connection to your PC, while internal versions require only the cable connection. Typically, transmission rates can get up to 10Mbps, which is much higher than an analog modem and comparable to an xDSL modem.
Do not expect to receive a 1.5 Mbps data rate very frequently. The data rate is actually dependent upon how many clients in your area have subscribed to the same service and whether or not they are using the service at the same time. Cable Internet services are similar to a hub in that they are shared services.
Cable modems are becoming increasingly popular in metropolitan areas for two reasons:
They are easy to install and require very little waiting time for the cable company to turn on the service. This is because metropolitan areas already have cable lines installed for television, and the cable company basically just has to “flip a switch.”
They are inexpensive. For about the cost of a second (or third) phone line per month, cable offers a high-speed connection to the Internet without interfering with your phone line. (So who needs call waiting?)
xDSL Modems xDSL modems are based on digital subscriber line technology, which is discussed later in the chapter and is cable’s main competition. The x in xDSL can stand for several different versions of DSL technology, but xDSL modems all work the same way. All xDSL configurations require a modem, called an endpoint, and a NIC. Often, the modem and NIC are combined on a single expansion card, which helps cut down on the number of cards or peripherals attached to the computer. The modem is then hooked up to the phone line.
CSU/DSU A channel service unit / data service unit (CSU/DSU) is a LAN-to-WAN network device that converts the digital signals from your LAN into the format required by the WAN communications link, and vice versa. You can think of a CSU/DSU as a type of translator between two different technologies. The CSU/DSU typically has one port that is connected to the LAN, and one port that connects to your WAN modem or adapter.
Firewall Networks that are connected to the Internet are subject to possible attacks from malicious entities located elsewhere on the Internet. To protect a network against attacks, a device called a firewall is employed. Firewalls reside between a company’s LAN and the Internet and monitor all traffic going into and out of the network. Any suspicious or unwanted activity is monitored and, if necessary, quelled.

Firewalls are usually combinations of hardware and software with multiple NICs (one for the Internet side, another for the LAN side, and possibly a third for a DMZ, discussed in a moment). Some firewalls are stand-alone hardware devices; others consist of special software that turns the server into a firewall. Both types can be generalized as firewalls. The major difference between the two is that the latter may run a commercially available NOS, like NT, NetWare, or Unix, whereas the former runs its own highly specialized operating system.

Most firewalls in use today implement a concept called a demilitarized zone (DMZ) or screened subnet, which is a network segment that is neither public nor local, but halfway between. People outside your network primarily access your public Web servers, FTP servers, and mail-relay servers. Because hackers tend to go after these servers first, place them in the DMZ. A standard DMZ setup has three network cards in the firewall computer. The first goes to the Internet. The second goes to the network segment where the aforementioned servers are located, the DMZ. The third connects to your intranet.
Never put your intranet server into a DMZ. By doing so, you’re allowing a hacker access to your corporate information and thereby defeating the purpose of a DMZ.
When hackers break into the DMZ, they can see only public information. If they break into a server, they are breaking into a server that holds only public information. Thus, the entire corporate network is not compromised. In addition, no e-mail messages are vulnerable; only the relay server can be accessed. All actual messages are stored and viewed on e-mail servers inside the network. As you can see in Figure 1.10, the e-mail router, the FTP server, and the Web server are all in the DMZ, and all critical servers are inside the firewall. FIGURE 1.10
A firewall with a DMZ
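Firewall behavior like that described above is often expressed as an ordered rule list: the first rule a packet matches decides its fate, and anything matching no rule is dropped (deny-by-default). The sketch below uses hypothetical addresses and a made-up rule format purely for illustration:

```python
def first_match(rules, packet):
    """Return 'allow' or 'deny' for a packet; the first matching rule wins.

    rules: list of (action, src_prefix, dst_prefix, dst_port) tuples,
    where a prefix of "" matches any address and a port of None matches
    any port. Packets matching no rule are denied (deny-by-default).
    """
    for action, src, dst, port in rules:
        if (packet["src"].startswith(src)
                and packet["dst"].startswith(dst)
                and (port is None or packet["dst_port"] == port)):
            return action
    return "deny"

# Hypothetical policy: anyone may reach the DMZ web server on port 80;
# only the LAN may reach other LAN hosts; everything else is dropped.
RULES = [
    ("allow", "",      "192.168.100.10", 80),    # public -> DMZ web server
    ("allow", "10.0.", "10.0.",          None),  # LAN-to-LAN traffic
]
```

Under this policy an outside host can reach the DMZ web server but not the protected intranet, which is exactly the outcome the DMZ design aims for.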
In addition to all the hardware components, networks use some software components to tie together the functions of the different hardware components. Each software component has a different function on the network. In this section, you will learn about some of the software often found on a network. The most important network software components that you’ll learn about include:
Network operating system (NOS)
Protocols
Each software component runs on a computer and provides the network with some service.
Network Operating System Every network today uses some form of software to manage the resources of the network. This software runs on the servers and is called a network operating system (or NOS, for short). NOSes are, first and foremost, computer operating systems, which means they manage and control the functions of the computer they are running on. NOSes are more complex than computer operating systems because they manage and control the functions of the network as well. A NOS gives a network its “soul” because each NOS works a bit differently. Different NOSes will need to be administered differently. The three most popular network operating systems that you will need to know about are:
Microsoft Windows NT/Windows 2000
Novell NetWare
Unix
In the following sections, you will learn background information on each NOS, its current version, its applicability to the Internet/its strength as a NOS for an Internet server, and its system requirements.
Microsoft Windows NT/2000 There has been a buzz in the computer industry as of late about Windows 2000, produced by Microsoft Corporation. Everyone’s asking, “Should I be installing it or Windows NT?” With the same graphical interface as other versions of
Windows and simple administration possible from the server console, it is a force to be reckoned with. Microsoft has put its significant marketing muscle behind it, and Windows NT has become a viable alternative in the network operating system market, previously dominated by Novell NetWare and the various flavors of Unix. Currently, Windows 2000 is just beginning to catch up with Windows NT deployments.
Most companies already have a significant investment in Windows NT, and have not migrated to Windows 2000 because of the expense of migrating their current environments. In the business world, upgrades usually take a few years to occur. For more information on Windows NT and 2000 than is discussed in this chapter, visit Microsoft’s Web site at www.microsoft.com.
Microsoft’s Windows NT Server has become the predominant general-purpose server for the industry. Its versatility and familiar graphical user interface (it’s the same as Windows 95/98 in NT 4 and 2000) belie its complexity. Using TCP/IP and other protocols, Windows NT can communicate and be integrated with NetWare and Unix servers. Additionally, it is the preferred NOS for the intranet and Internet services of small companies because it’s easy to set up and manage for Internet services. Again, this ease comes from the familiarity people have with the client OS, Windows 95/98. Also, Internet services can be installed during NOS setup with a few mouse-clicks and a minimal amount of configuration. The only downside to Windows NT is that it’s sometimes unstable and it has much larger hardware requirements than the other NOSes discussed in this chapter (as listed in Table 1.1). TABLE 1.1
Windows NT Server 4 Hardware Requirements

Hardware        Minimum                                                   Recommended
Processor       Intel 80486 or higher (I386 architecture) or a
                supported RISC processor (MIPS R4x00, Alpha AXP,
                or PowerPC)
Memory          16MB                                                      32MB or greater
Network card    At least one that matches the topology of your network    At least one that matches the topology of your network
CD-ROM          None                                                      8x or greater
Mouse           Required                                                  Required
Windows 2000 is the latest version of Microsoft’s server suite, and it is designed to eventually replace both Windows 95/98 (known as Windows 9x in the industry) and Windows NT. Although Windows 2000 looks similar to Windows 9x, it is built on the Windows NT architecture, not the Windows 9x architecture. Windows 2000 is the biggest release of Windows to date and has the most features, including a new directory service, called Active Directory and based on X.500 technology, and Plug-and-Play support. The minimum hardware requirements for Windows 2000 are listed in Table 1.2. TABLE 1.2
Windows 2000 Server Hardware Requirements

Hardware         Minimum                                                  Recommended
Processor        133MHz or higher Pentium-compatible CPU                  (The faster the better)
Display          VGA                                                      SVGA
Hard disk space  2GB with a minimum of 1GB free space (additional
                 free hard drive space is required if you are
                 installing over a network)
Network card     At least one that matches the topology of your network   At least one that matches the topology of your network
CD-ROM           None                                                     8x or greater
Mouse            Required                                                 Required
Note that these are the minimum requirements for Windows 2000 Server, and not Windows 2000 Advanced Server.
Novell NetWare NetWare, made by Novell, Inc., was the first NOS developed specifically for use with PC networks. It was introduced in the late ’80s and quickly became the software people chose to run their networks. NetWare is one of the more powerful network operating systems on the market today. It is almost infinitely scalable and has support for multiple client platforms. Although it is most at home in companies larger than a few hundred stations, this NOS enjoys success in many different types of networks. Currently, NetWare is at version 5.1 and includes workstation management support, Internet connectivity, Web proxy, and native TCP/IP protocol support, as well as continued support for its award-winning X.500-based directory service, Novell Directory Services (NDS). At the time this book is being written, NetWare 6 is scheduled for release in the last quarter of 2001. As an Internet and intranet NOS, NetWare sees use in large networks for secure intranets. In our tests, with similarly configured servers, NetWare had the best Web server performance, ahead of NT and Unix (using the included Netscape Enterprise Server for NetWare). Plus, its web page security is integrated with Novell’s directory service (NDS). Hardware requirements are listed in Table 1.3.
For more information on NetWare, check out Novell, Inc.’s web site at www.novell.com.
TABLE 1.3 NetWare 5.1 Hardware Requirements and Recommendations

Hardware         Minimum                                    Recommended
Processor        Pentium II                                 None listed
Display          VGA                                        SVGA
Hard disk space  1.3GB                                      40GB or more
Memory           128MB for Standard, 384MB to include       256MB for Standard, 768MB to include
                 WebSphere                                  WebSphere
Network card     At least one                               As many as required
CD-ROM           Required                                   8x or greater
Mouse            Not required                               Recommended if using graphical
                                                            interface; PS/2 style is the best choice
Unix Of the network operating systems other than Windows NT and NetWare, the various forms of Unix are probably the most popular. It is also among the oldest of the network operating systems. Bell Labs developed Unix, in part, in 1969. We say “in part” because there are now so many iterations, commonly called flavors, of Unix that each is almost a completely different operating system. Although the basic architecture of all flavors is the same (32-bit kernel, command-line based, capable of having a graphical interface, as in X Windows), the subtle details of each make one flavor better suited than another to a particular situation.
Unix flavors incorporate a kernel, which constitutes the core of the operating system. The kernel can access hardware and communicate with various types of user interfaces. The two most popular user interfaces are the command-line interface (called a shell) and the graphical interface (X Windows). The Unix kernel is similar to the core operating system components of Windows NT and NetWare. In Unix, the kernel is typically simple and, therefore, powerful. Additionally, the kernel can be recompiled to include support for more devices. As a matter of fact, some flavors include the source code so that you can create your own flavor of Unix. As an Internet platform, Unix has many advantages, mainly because the Internet was first and foremost a Unix-based network. Many services available for the Internet (like Usenet news) work best on the Unix platform because these technologies were first developed on Unix. Additionally, Unix is powerful enough to scale to service hundreds of thousands of web requests per second. Many of the most popular web sites run on Unix. Each flavor of Unix has widely varied hardware requirements. Some flavors can run on any processor/hardware combination. Others can only run on certain combinations. As an example, hardware requirements for the common PC-based Unix flavor Red Hat Linux 7.1 are covered in Table 1.4. If you need to install any flavor of Unix onto a computer, check the software’s packaging or documentation for its respective hardware requirements.
Unix hardware requirements vary from vendor to vendor. As such, they are not currently tested for in the exam.
TABLE 1.4 Red Hat Linux 7.1 Hardware Requirements

Hardware         Minimum                                                  Recommended
Processor        Intel 80486 or higher (I386 architecture), 680x0,
                 or a supported RISC processor (MIPS, AP1000+,
                 Alpha AXP, SPARC, or PowerPC)
Hard disk space  450MB for Workstation, 1620MB for Server                 1.5GB free
Memory           16MB                                                     32MB or greater
Network card     None                                                     At least one that matches the topology
                                                                          of your network
Protocols All network entities must communicate to gain the benefits of being networked. To be able to communicate, each device on the network must understand the same basic rules of that communication. For example, each node must understand a common “language” and the types of “words” to use. Not to imply that computers speak English, but they do need a set of rules to communicate. These rules are called protocols. Multiple protocols operating together are called a protocol suite. Finally, a software implementation of a protocol is called a protocol stack. There is really only one protocol suite used on the Internet, the Transmission Control Protocol/Internet Protocol (TCP/IP) suite. It was developed at approximately the same time the Internet was developed. When it was being designed, its designers wanted a protocol that could reconfigure itself around possible breaks in the communication channel. Today, TCP/IP is almost ubiquitous because almost every operating system includes a TCP/IP protocol stack so that the operating system can communicate with the Internet. That feature, along with its relatively decent performance, makes TCP/IP a very popular protocol. We’ll discuss TCP/IP in more in detail in Chapter 3.
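The TCP/IP protocol stack mentioned above is exposed to applications through the socket interface found in virtually every operating system. The loopback echo below is a minimal sketch of two endpoints communicating through that stack on one machine; port 0 lets the stack pick a free port:

```python
import socket
import threading

def echo_once(server_sock):
    """Accept one connection and echo whatever it sends back."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)

def tcp_echo_demo(message):
    """Send a message to a loopback TCP server and return the echoed reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))            # port 0: let the stack choose
    server.listen(1)
    port = server.getsockname()[1]
    t = threading.Thread(target=echo_once, args=(server,))
    t.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        reply = client.recv(1024)
    t.join()
    server.close()
    return reply
```

Neither endpoint knows anything about the cabling, hubs, or routers underneath — that independence from the physical network is exactly what a layered protocol suite provides.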
Other Protocols In addition to TCP/IP, there are other protocols available for use on LANs. The protocol suite Internetwork Packet eXchange/Sequenced Packet eXchange (IPX/SPX), developed by Novell for use with NetWare, is probably the second most popular protocol. It is used with both NetWare and Windows NT and is a popular choice because of its ease of configuration. Some other protocol suites you may encounter are the NetBIOS Enhanced User Interface (NetBEUI), DEC Networking (DECNet), and Systems Network Architecture (SNA) protocols, but these see much more limited use in LANs today when compared to TCP/IP and IPX/SPX.
Local Area Network Link Types Local area networks (LANs) have many ways of delivering data from point A to point B. These “link types” include specifications that dictate how the stations will transmit their data, how the data will travel on the network, and how much data can be transmitted. The majority of networks installed today (including the ones at ISPs) use these link types. There are two popular LAN link types you will see on almost every network:
Ethernet
Token Ring
Most servers and workstations connect using one of these link types.
Ethernet Ethernet, the most popular network specification, was originally the brainchild of Xerox Corporation. Introduced in 1976, it quickly became the network of choice for small LANs. The Unix market was the first to embrace this easy-to-install network. Ethernet uses the CSMA/CD (Carrier Sense Multiple Access with Collision Detection) media access method, which means that only one workstation can send data across the network at a time. It functions much like the old party line telephone systems used in rural areas. If you wanted to use the telephone, you picked up the line and listened to see if anyone was already using it. If you heard someone on the line, you didn’t try to dial or speak; you simply hung up and waited a while before you picked up the phone to listen again. If you picked up the phone and heard a dial tone, you knew the line was free. You and your phone system operated by carrier sense. You sensed the dial tone or carrier, and if it was present, you used the phone. Multiple access means that more than one party shared the line. Collision detection means that if two people picked up the phone at the same time and dialed, they would “collide” and both would need to hang up the phone and try again. The first one back on the free line gains control and is able to make a call.
In the case of Ethernet, workstations send signals (frames) across the network. When a collision takes place, the workstations transmitting the frames stop transmitting and wait for a random period of time before retransmitting. Using the rules of this model, the workstations must contend for the opportunity to transmit across the network. For this reason, Ethernet is referred to as a contention-based system. Current implementations of Ethernet allow for connection speeds of either 10 or 100Mbps. There are, however, standards being developed for Gigabit Ethernet (one thousand megabits per second).
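The "random period of time" is formalized in Ethernet as truncated binary exponential backoff: after the nth collision, a station waits a random number of slot times between 0 and 2^min(n, 10) − 1. A sketch of that rule (the 512-bit slot time is the standard figure for classic 10Mbps Ethernet, not stated in the text):

```python
import random

SLOT_TIME_BITS = 512  # one slot = 512 bit times on classic 10Mbps Ethernet

def backoff_slots(collision_count, rng=random):
    """Pick a random backoff, in slot times, after the nth collision.

    The exponent is truncated at 10, so the window never grows beyond
    0..1023 slots no matter how many collisions have occurred.
    """
    exponent = min(collision_count, 10)
    return rng.randrange(2 ** exponent)

# After the first collision a station waits 0 or 1 slots; after the
# third, anywhere from 0 to 7 slots; after the tenth, 0 to 1023.
for n in (1, 3, 10, 16):
    print(n, backoff_slots(n))
```

Because each colliding station picks its delay independently, the odds of a repeat collision shrink as the window doubles, which is what lets a contention-based system settle down under load.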
Token Ring Token Ring was developed by IBM as a robust, highly reliable network. It is more complex than Ethernet because it has self-healing properties. Token Ring is an IEEE 802.5 standard whose topology is physically a star but logically a ring. Workstations connect to the ring by means of individual cables that connect to a multistation access unit (MSAU) or controlled-access unit (CAU). MSAUs and CAUs are similar to Ethernet hubs in that they exist at the center of the star, but they are for Token Ring networks. The difference between an MSAU and a CAU is that an MSAU is a passive device that has no power plug and no intelligence, whereas a CAU has intelligence and a power plug. A CAU can perform physical network management operations. The original Token Ring cards were 4Mbps. These were later replaced by 16Mbps cards. The 16Mbps cards are manufactured to work at 4Mbps (for compatibility), but the 4Mbps cards only run at 4Mbps. The 4Mbps version will allow only one token on the ring at a time. The 16Mbps version will allow a card to retransmit a new free token immediately after the last bit of a frame. The term for this is early token release.
When configuring a Token Ring network, you must remember that all Token Ring cards must be set to either 4Mbps or 16Mbps. You cannot mix the speeds on the same segment.
In a Token Ring, although the cards attach like a star to the MSAU or CAU, they function logically in a ring. A free token (a small frame with a special format) is passed around the ring in one consistent direction. A node receives the token from its nearest active upstream neighbor (NAUN) and passes it to its nearest active downstream neighbor (NADN). If a station receives a free token, it knows that it can attach data and send it on down the
ring. This is called media access. Each station is given an equal chance to have the token and take control to be able to pass data. Each station in the ring receives the data from the busy token and repeats the data, exactly as it received it, on to the next active downstream neighbor on the ring. The addressed station (the station the data is intended for) keeps the data and passes it on up to its upper-layer protocols. It then switches 2 bits of the frame before it retransmits the information back on to the ring to indicate that it received the data. The data is sent repeatedly until it reaches the source workstation, and then the process begins again. Each station in the ring basically acts as a repeater. The data is received and retransmitted by each node on the network until it has gone full circle. This is something like the party game called Rumor or Telephone, in which one person whispers something into one player’s ear, who in turn whispers it into someone else’s ear, and so on until it has gone full circle. The only difference is that, in the party game, when the person who initiated the message receives it back, it has usually undergone substantial permutations. When the originating node on the network receives the message, it is normally intact except that 2 bits have been flipped to show that the message made it to its intended destination.
Token Ring computers act as repeaters, in contrast to computers in an Ethernet network, where they are passive and therefore not relied on to pass data. This is why Token Ring networks can experience periods of latency when a computer fails and Ethernet networks will not. Also, the token-passing access method will not have collisions because only one token is on the cable at one time; Ethernet networks with CSMA/CD do have collisions.
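The token-passing sequence described above can be sketched as a short simulation. This is an illustration only, with hypothetical station names; the real protocol's frame format and the two flipped status bits are reduced to a simple flag:

```python
def token_ring_transmit(stations, sender, dst, data):
    """Simulate one trip of a busy token around a logical ring.

    Each station repeats the frame to its nearest active downstream
    neighbor (NADN); the addressed station keeps a copy and marks the
    frame as received, standing in for the two flipped status bits.
    """
    frame = {"src": sender, "dst": dst, "data": data, "received": False}
    path = []
    start = stations.index(sender)
    for i in range(1, len(stations) + 1):
        station = stations[(start + i) % len(stations)]
        path.append(station)
        if station == sender:
            break  # frame came full circle: sender strips it, frees the token
        if station == dst:
            frame["received"] = True  # destination keeps a copy, flags receipt
    return frame, path

ring = ["A", "B", "C", "D"]
frame, path = token_ring_transmit(ring, "A", "C", "payload")
print(path)               # every station repeats the frame until it returns to A
print(frame["received"])  # True
```

Note that every station between sender and destination appears in the path: each one acts as a repeater, exactly as the text describes.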
Internet Bandwidth Link Types An Internet bandwidth technology (or link) is the communications pathway between the various LANs that make up the Internet. These links are typically specific types of analog or digital telephone lines that carry data for a corporate WAN and for the Internet. They are leased from the telephone companies that serve the cities at the ends of the link. Hence, these WAN links are often called leased lines. In addition to connecting networks together, the same WAN link technologies are also used to connect entire networks to the Internet and to provide
the Internet with its structure by connecting multiple ISPs together. Wide area network links are commonly grouped into two main types:
Point-to-point
Public switched networks
Point-to-Point WAN Connections Point-to-point WAN connections are WAN links that exist directly between two locations. Point-to-point connections are typically used for WAN connections between a central office and a branch office or from these locations to an ISP for Internet connectivity. These connections come in a variety of connection speeds. The main advantage of point-to-point connections to the Internet is that there is only one “hop” between the two locations, thus much less latency in each transmission, which means more data can be transmitted. The main downside is that these connections are often more expensive than their switched counterparts. There are seven main point-to-point WAN connections in use today:
DDS/56Kbps
T1/E1
T3/E3
Asynchronous Transfer Mode (ATM)
Integrated Services Digital Network (ISDN)
Digital Subscriber Line (DSL)
Synchronous Optical Network (SONET)
Each connection type differs primarily in the data throughput rates offered and in the cost. In this section, you will learn about the most popular point-to-point WAN (and Internet) connection types. DDS/56Kbps The Dataphone Digital Service (DDS) line from AT&T is a dedicated, point-to-point connection with throughput anywhere from 2400bps to 56Kbps. The 56Kbps digital connection is the most common, and this type of line has since obtained the moniker 56K line. This type of line is used most often for small office connections to the central office. Some small companies may use this for their connection to their ISP for an Internet connection.
If a phone company other than AT&T provides this service, the line is known as a Digital Data Service line. The abbreviation is still DDS, however.
T1/E1 A T1 is a 1.544Mbps digital connection that is typically carried over two pairs of UTP wires. This 1.544Mbps connection is divided into 24 discrete, 64Kbps channels (called DS0 channels). Each channel can carry either voice or data. In the POTS world, T1 lines are used to bundle analog phone conversations over great distances, using much less wiring than would be needed if each pair carried only one call. This splitting into channels allows a company to combine voice and data over one T1 connection. You can also order a fractional T1 channel that uses fewer than the 24 channels of a full T1. An E1 is the same style channel, but it is a European standard and is made up of 32 64Kbps channels for a total throughput of 2.048Mbps. A T1 connection is used very often to connect a medium-size company (50 to 250 workstations) to the Internet. It is usually cost prohibitive to have a T1/E1 connection for any company smaller than that, and it doesn’t have the bandwidth that larger companies would require for high-speed WAN connections. Smaller ISPs that mainly provide residential dial-up connections may only have a T1 connection. T3/E3 A T3 line and a T1 connection work similarly, but a T3 line carries a whopping 44.736Mbps. This is equivalent to 28 T1 channels (or a total of 672 DS0 channels). E3 is a similar technology for Europe that uses 480 channels for a total bandwidth of 34.368Mbps. Currently these services require fiber-optic cable or microwave technology. Many local ISPs have T3 connections to the major ISPs, such as SprintNet, AT&T, and MCI. Also, very large, multinational companies use T3 connections to send voice and data between their major regional offices. Asynchronous Transfer Mode (ATM) Of the link types we have discussed so far, Asynchronous Transfer Mode (ATM) is one link type that is used on both LANs and WANs.
ATM uses cell-switching technology, which means that it works by dividing all data to be transmitted into special 53-byte packets called cells and sending them over a switched, permanent virtual circuit. Because all the packets are the same length, and because they are very small, ATM is a highly efficient, and
very fast, set of WAN standards. It can support transmissions of voice and video in addition to data at speeds from 1.5 to 2488Mbps. Additionally, ATM supports the ability to reserve bandwidth to ensure Quality of Service (QoS) so that voice and data transmissions won’t interfere with each other. Several Internet backbone ISPs use ATM to move massive amounts of Internet data quickly. ISDN ISDN is a digital, point-to-point network capable of maximum transmission speeds of about 1.4Mbps, although speeds of 128Kbps are more common. Because it is capable of much higher data rates, at a fairly low cost, ISDN is becoming a viable remote Internet connection method, especially for those who work out of their homes and require high-speed Internet access but can’t afford a T1 or higher. ISDN uses the same UTP wiring as your residential or business telephone wiring (also known as Plain Old Telephone Service, or POTS), but it can transmit data at much higher speeds. That’s where the similarity ends, though. What makes ISDN different from a regular POTS line is how it uses the copper wiring. Instead of carrying an analog (voice) signal, it carries digital signals. This is the source of several differences. A computer connects to an ISDN line via an ISDN terminal adapter (often incorrectly referred to as an ISDN modem). An ISDN terminal adapter is not a modem because it does not convert a digital signal to an analog signal; ISDN signals are digital. A typical ISDN line has two types of channels. The first type of channel is called a Bearer, or B, channel, which can carry 64Kbps of data. A typical ISDN line has two B channels. One channel can be used for a voice call while the other is being used for data transmissions, and this occurs on one pair of copper wires. The second type of channel is used for call setup and link management and is known as the Signal, or D, channel (also referred to as the Delta channel). This channel has only 16Kbps of bandwidth.
In many cases, to maximize throughput, the two Bearer channels are combined into one data connection for a total bandwidth of 128Kbps. This is known as bonding or inverse multiplexing. This still leaves the Delta channel free for signaling purposes. In rare cases, you may see user data such as e-mail on the D line. This was introduced as an additional feature of ISDN, but it hasn’t caught on. ISDN has two main advantages:
Fast connection.
Higher bandwidth than POTS. Bonding yields 128Kbps bandwidth.
ISDN also has two main disadvantages:
Specialized equipment is required at the phone company and at the remote computer.
Not all ISDN equipment can connect to each other.
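The Basic Rate Interface channel arithmetic above is easy to verify: two 64Kbps B channels plus one 16Kbps D channel give 144Kbps of raw line capacity, of which 128Kbps is usable for bonded user data. A quick check:

```python
B_CHANNEL_KBPS = 64   # Bearer channel: user voice or data
D_CHANNEL_KBPS = 16   # Delta channel: signaling and call setup

# Basic Rate Interface: 2B + D
bonded_data = 2 * B_CHANNEL_KBPS                  # both B channels inverse-multiplexed
total_line = 2 * B_CHANNEL_KBPS + D_CHANNEL_KBPS  # everything on the wire

print(bonded_data)  # 128
print(total_line)   # 144
```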
xDSL Digital Subscriber Line (DSL) is a hot topic for home Internet access because it is relatively cheap (less than $100/month in most areas), fast (greater than 128Kbps), and available in most major cities in the United States. xDSL is a general category of copper access technologies that is becoming popular because it uses regular, POTS phone wires to transmit digital signals and is extremely inexpensive compared with the other digital communications methods. xDSL implementations cost hundreds instead of the thousands of dollars that you would pay for a dedicated, digital point-to-point link (such as a T1). They include Digital Subscriber Line (DSL), High Data Rate Digital Subscriber Line (HDSL), Single Line Digital Subscriber Line (SDSL), Very High Data Rate Digital Subscriber Line (VDSL), and Asymmetric Digital Subscriber Line (ADSL), which is currently the most popular. It is beyond the scope of this book to cover all the DSL types. Ask your local telephone company which method they provide. ADSL is winning the race because it focuses on providing reasonably fast upstream transmission speeds (up to 640Kbps) and very fast downstream transmission speeds (up to 9Mbps). This makes downloading graphics, audio, video, or data files from any remote computer very fast. The majority of web traffic, for example, is downstream. The best part is that ADSL works on a single phone line without losing the ability to use it for voice calls. This is accomplished with what is called a splitter, which enables the use of multiple frequencies on the POTS line. xDSL modems were discussed earlier in this chapter. SONET Some of the fastest WAN connections are those employed in the Synchronous Optical Network (SONET). SONET is a high-speed, fiber-optic system that provides a standard method for transmitting digital signals over a fiber-optic network. Multiple transmission types (such as 64Kbps channels and T1/E1 channels) can be multiplexed together to provide SONET speeds.
SONET is able to achieve maximum transmission speeds of up to 2.488 gigabits per second. It does so by using a fixed frame size of 810 bytes. This fixed frame size makes transmissions very efficient, and thus they can carry more data. SONET speeds are rated as channels. They are designated with an OC (Optical Carrier) number. The OC lines are designated OC-1 through OC-768. OC-1 channels communicate at 51.84Mbps, OC-3 channels communicate at 155.52Mbps, and OC-768 channels communicate at 40Gbps.
Public Switched Network WAN Connections The other type of WAN link most commonly in use is the public switched network WAN connection. These connections use the telephone company’s analog switched network to carry digital transmissions. Your network traffic is combined with other network traffic from other companies. Essentially, you are sharing the bandwidth with all other companies. The upside to this type of WAN connection is that it is cheaper than point-to-point connections, but because you share the bandwidth with other traffic, it isn’t necessarily as efficient. Let’s take a brief look at some of the public switched network connections that companies use to connect to the Internet, including:
Public Switched Telephone Network (PSTN)
X.25
Frame Relay
Public Switched Telephone Network (PSTN) Almost everyone outside the phone companies themselves refers to PSTN (Public Switched Telephone Network) as POTS (Plain Old Telephone Service). This is the wiring system that runs from most people’s houses to the rest of the world. It is the most popular method for connecting to the Internet because of its low cost, ease of installation, and simplicity. The majority of the houses in the U.S. that have Internet connections connect to their ISP via PSTN and a modem. The phone company runs a UTP (unshielded twisted-pair) cable (called the local loop) from your location (called the demarcation point or demarc, for short) to a phone company building called the Central Office. All the pairs from all the local loop cables that are distributed throughout a small regional area come together at a central point, similar to a patch panel in a UTP-based LAN.
This centralized point has a piece of equipment called a switch attached. The switch functions almost exactly like the switches we mentioned earlier, in that a communications session, once initiated when the phone number of the receiver is dialed, exists until the conversation is closed. The switch can then close the connection. On one side of the switch is the neighborhood wiring. On the other side are lines that may connect to another switch or to a local set of wiring. The number of lines on the other side of the switch depends on the usage of that particular exchange. Figure 1.11 shows a PSTN system that utilizes these components. FIGURE 1.11
A local PSTN (POTS) network
Use caution when working with bare phone wires because they may carry a current. In POTS, the phone company uses a battery to supply power to the line, which is sometimes referred to as self-powered. It isn’t truly self-powered; the power comes from the phone system.
POTS has many advantages, including:
It is inexpensive to set up. Almost every home in the United States has or can have a telephone connection.
There are no LAN cabling costs.
Connections are available in many countries throughout the world.
POTS is the most popular remote access connection method for the Internet; its main disadvantage is limited bandwidth, and thus a limited maximum data transfer rate. At most, 64Kbps data transmissions are possible, though rarely achieved by anyone connecting from home to the Internet. X.25 X.25 was developed by the International Telecommunication Union (ITU) in 1974 as a standard interface for WAN packet switching. It does not specify anything about the actual data transmission, however. It only makes specifications about the access to the WAN and just assumes that a route from sender to receiver exists. The original X.25 specification supported transmission speeds of up to 64Kbps, but the 1992 revision supports transmission speeds of up to 2Mbps. It is currently one of the most widely used WAN interfaces. Frame Relay Similar to X.25, Frame Relay is a WAN technology in which packets are transmitted by switching. Packet switching involves breaking messages into chunks at the sending router. Each packet can be sent over any number of routes on its way to its destination. The packets are then reassembled in the correct order at the receiver. Because the exact path is unknown, a cloud is used when creating a diagram to illustrate how data travels throughout the service. Figure 1.12 shows a Frame Relay WAN connecting smaller LANs.
Frame Relay uses permanent virtual circuits (PVCs). PVCs allow virtual data communications circuits between sender and receiver over a packet-switched network. This ensures that all data that enters a Frame Relay cloud at one side comes out at the other over a similar connection. The beauty of using a shared network is that sometimes you can get much better throughput than you are paying for. When signing up for one of these connections, you specify and pay for a Committed Information Rate (CIR), or in other words, a minimum bandwidth. If the total traffic on the shared network is light, you may get much faster throughput without paying for it.
Frame Relay begins at the CIR speed and can reach as much as 1.544Mbps, the equivalent of a T1 line, which was discussed earlier. However, the major downside to Frame Relay is that you share traffic with all other people within the Frame Relay cloud. If you aren’t paying for a CIR, your performance can vary widely. Despite this disadvantage, Frame Relay is a popular Internet connection method because of its low cost. Table 1.5 shows all these point-to-point connections and their respective performance, availability, and cost. TABLE 1.5
Point-to-point WAN and Internet connection types

Connection   Max Throughput            U.S. Availability                            Relative Cost
56K/DDS      56Kbps                    Widely available                             Low
T1/E1        1.544Mbps/2.048Mbps       Widely available                             Medium
T3/E3        44.736Mbps/34.368Mbps     Widely available                             High
ATM          2488Mbps                  Moderately available                         Very high
ISDN         Around 2Mbps              Moderately available                         Low
DSL          Greater than 128Kbps      Available in larger cities, becoming more
                                       available in rural areas
There are many software applications available that help you draw your network, as well as some software that attempts to map the network for you.
Summary
In this chapter, you learned about some of the LAN and WAN networking technologies that apply to the business of the Internet and how networks are physically described by their topologies. Because most networks are connected to the Internet these days, the concepts contained in this chapter will be valuable to you as an Internet professional. You learned the definitions of a LAN, MAN, and WAN as well as the differences among them. You also learned about some of the hardware components that exist on the network, including workstations, servers, NICs, network cables, repeaters, hubs, switches, bridges, routers, brouters, firewalls, and modems. In addition to learning about the hardware components, you learned about some of the software components that work on the network to provide Internet (and other) services, including network operating systems (NOSes) and protocols. This chapter included a discussion about the link types that carry data from point A to point B on a network. LAN link types include Ethernet and Token Ring. Ethernet is the most common LAN link type. WAN link types include DDS/56Kbps, T1/E1, T3/E3, ATM, ISDN, DSL, and Frame Relay. The WAN link types can be used for connecting to the Internet and vary in speed and link cost.
Exam Essentials Understand network topologies. Know the four basic network topologies and be able to identify them graphically. Know the different network types. Understand the difference between a LAN, MAN, and WAN.
Know how to identify a network device and its function. Understand the difference between a network interface card, server, repeater, bridge, hub, switch, router, and brouter. Know the different server types and their functions. Understand that the name of a server explains the server’s function. Understand the different WAN connections. Identify each WAN connection type and the speeds that they can achieve.
Key Terms
Before you take the exam, be certain you are familiar with the following terms:
56K line
Review Questions 1. Which network hardware device connects different network topologies into an internetwork? A. Hub B. Bridge C. Switch D. Router 2. Which Internet bandwidth technology is the primary technology used
on the Internet backbone? A. Token Ring B. Ethernet C. X.25 D. ATM 3. Which of the following is a standard interface for Frame Relay? A. X.25 B. ISDN C. T1 D. xDSL 4. Which network hardware device is required for the computer to be
able to connect it to a network? A. Bridge B. NIC C. Router D. Firewall
5. Which network hardware device protects a LAN against malicious
attacks from the Internet? A. Bridge B. Switch C. NIC D. Firewall 6. Which of the following is the fastest possible Internet communications
technology? A. Ethernet B. ATM C. T1 D. T3 7. A T3 connection has a maximum bandwidth of _______ Mbps? A. 1.544 B. 2.048 C. 34.368 D. 44.736 8. An E1 connection has a maximum bandwidth of ______ Mbps? A. 1.544 B. 2.048 C. 34.368 D. 44.736
9. Of the following, which Internet connection type for home users is
taking off and offers fairly high speed (>128Kbps) for a fairly reasonable price? A. DSL B. ISDN C. Frame Relay D. ATM 10. You are running a Token Ring network with five clients and one
server on the same floor of an office building. What topology are you configured for? A. Bus B. Star C. Ring D. Mesh 11. The _____ server provides address translation services to network
clients accessing the Internet from a LAN. A. Windows 2000 Server B. Firewall C. Network Address Translation Server (NAT) D. Remote Access Server (RAS) 12. Which network hardware device is used to segment a single network
into multiple segments? A. Hub B. Firewall C. NIC D. Bridge
13. Which component of the network is responsible for providing network
services to the rest of the network? A. Server B. Bridge C. Workstation D. NIC 14. A WAN link is depicted in a logic diagram by a _________. A. Straight line B. Dashed line C. Zig-zagged line D. Straight line with an arrow 15. Which network hardware device will increase your web browsing
performance? A. Firewall B. Cache C. Bridge D. Router 16. Which NOS is the oldest NOS currently in use? A. Unix B. NetWare C. Windows NT D. OS/2
17. A ___________ is used in firewalls to provide a safe area for public
data that is not part of the public or private networks. A. Firewall B. Internet-in-a-box C. DMZ D. Router 18. Based on speed and cost, which Internet bandwidth link type would be
the best choice for a small ISP serving 100 dial-up users? A. 56K/DDS B. T1 C. T3 D. ATM 19. Which NOS was developed, in part, by Bell Labs and currently has
several hundred different “flavors?” A. OS/2 B. Windows NT C. NetWare D. Unix 20. Which device is also known as a concentrator? A. Router B. Switch C. Hub D. Brouter
Answers to Review Questions 1. D. Hubs, bridges, and switches connect only the same network topologies. Routers are the only devices that connect different topologies (such as Ethernet to Token Ring). 2. D. Although Token Ring and Ethernet are found in ISPs, ATM is the
primary WAN technology used on the Internet backbone. X.25 is only a WAN access technology. 3. A. ISDN, T1, and xDSL are all Internet bandwidth technologies; X.25
is the interface for Frame Relay. 4. B. The other devices (bridge, router, and firewall) are all different
network connectivity devices, but you absolutely must have a NIC installed in a computer to be able to connect the computer to a network. 5. D. Bridges, switches, and routers are all simply network connectivity
devices. Some routers can perform packet filtering, but firewalls are designed specifically to protect a network against malicious activity from the Internet. 6. B. ATM has maximum speeds of 2488Mbps. Ethernet has a maximum transmission speed of 100Mbps. T1 lines are 1.544Mbps, and T3s are 44.736Mbps. 7. D. A T1 connection has a maximum transmission speed of
1.544Mbps. The 2.048 is E1 speed, 34.368 is E3 speed, and 44.736 is T3 speed. 8. B. An E1 connection communicates at 2.048Mbps. T1 connections
are 1.544Mbps, E3 connections are 34.368Mbps, and T3 connections are 44.736Mbps. 9. A. Frame Relay and ATM normally aren’t for home users (unless you happen to be Bill Gates). ISDN is more expensive than DSL and offers less bandwidth.
10. C. Token Ring typically uses a ring or a star-ring topology. With only
six networking devices, you would not need a star-ring. 11. C. A remote access server is used for remote clients needing access to
your corporate network, but does not provide translation services. While Windows 2000 Server does have an NAT component, it is a network operating system. 12. D. A bridge is the only device of those listed that is used to segment a
network. Hubs and NICs are only connection devices and don’t divide a network. Firewalls perform security checks on network traffic but don’t do any segmenting. 13. A. A server provides network services to the rest of the network.
Bridges, workstations, and NICs do not. Bridges segment a network, workstations request the resources a server provides, and NICs allow a workstation to get access to a network. 14. C. The straight line with an arrow may sound correct, but a zig-zag
shows a WAN link that is not under company control. WAN links are leased from vendors, and therefore beyond the direct control of the company. 15. B. Of the devices listed, a cache is the only device that can increase a
network’s web browsing performance. All the others can actually introduce delay into Internet communications. 16. A. Although NetWare, NT, and OS/2 have been in use for some time,
Unix is, in fact, the oldest. 17. C. The demilitarized zone (DMZ) is the network segment connected
to a firewall where public data is placed so that it is available to both public and private networks. A, B, and D are incorrect because they are all examples of other Internet hardware and software technologies.
18. B. Because the maximum connection speed of today’s modems is
56Kbps and the ISP is serving a maximum of 100 users, the maximum throughput needed is 100 × 56, or 5600Kbps (5.6Mbps). A 56K/DDS link would be too slow and a T3 or ATM connection would be way too fast (and probably way too expensive for a small ISP). A T1 (at 1.544Mbps) would be slower than the throughput number figured above, but it is extremely unlikely that all 100 users would be on at the same time. Plus, you can buy multiple T1s for the cost of a single T3. 19. D. The only one of these listed that was developed in any part by Bell
Labs is Unix. The others were all developed by other companies, such as IBM (OS/2), Novell (NetWare), and Microsoft (NT). 20. C. A hub serves as a central connection point for several network
devices, and so it’s also known as a concentrator. Switches, which are also used as central connection points, were developed afterwards and are also known as switching hubs.
Internet Basics I-NET+ EXAM OBJECTIVES COVERED IN THIS CHAPTER: 3.1 Understand and be able to describe the core components of the Internet infrastructure. Content may include the following:
Network access points
Backbone
Hardware/software infrastructure knowledge
Internetworking devices such as routers, switches and bridges
3.2 Identify problems with Internet connectivity from source to destination for various types of servers. Content may include the following:
The Internet is a very complex entity. To understand the topics found in later chapters in this book, you must first understand the underlying layout and technologies of the Internet so that you have a common reference point for those discussions. In this chapter, you will learn the following:
What the Internet is
Internet layout
Domain Name Services (DNS)
Uniform Resource Locators (URLs)
Internet communications process
Throughout this chapter, you will also learn the terminology of the devices and processes used on the Internet. Let’s begin the discussion of these topics with the definition of the Internet.
What Is the Internet?
The simplest definition of the Internet is that it is a collection of local area networks connected together by high-speed public WAN connections. Servers on these LANs provide information to the rest of the Internet in the form of documents, images, and multimedia content. The information delivered by these servers is generally called Internet content. For a small fee, or in some cases free if you can deal with the pop-up ads, anyone with a computer and a modem can access the Internet and get access to this content.
Figure 2.1 shows a graphical representation of the Internet. Notice how individual users and LANs connect to Internet Service Providers (ISPs), which in turn can connect to other ISPs that connect to backbone ISPs. Backbone ISPs are ISPs with very high-speed connections between them (several hundred megabits per second). You will learn about ISPs in the sections to follow. FIGURE 2.1
A graphical representation of the Internet
History of the Internet The Internet started out as a project of the U.S. government’s Defense Advanced Research Projects Agency (DARPA) in 1973. They had two major goals in mind: interconnectivity and redundancy. The hardest to solve was interconnectivity because a different vendor manufactured each network. In those days, vendors had their own proprietary protocols that just didn’t talk well, if at all, to other vendors’ products. The second concern was to build a network that would always be available, so they had to design one that could reconfigure itself around breaks and faults in case one of its nodes was taken out during a war (the worst-case scenario). The architects of this network, called ARPAnet, took these factors into consideration and developed a suite of protocols (called TCP/IP) and a network that could do just that.
For more detailed information on TCP/IP, see Chapter 3.
Another network was developed in 1980 to connect IBM mainframes in university data centers. This network was called BITnet, and it allowed universities to communicate with one another, thus facilitating collaboration among professors at those universities with the first, primitive e-mail system. In 1983, the Internet Architecture Board (IAB) was formed to guide the development of the TCP/IP protocol suite (the protocol used on the Internet) and to provide research data for the Internet. The IAB consists of two organizations, the Internet Engineering Task Force (IETF) and the Internet Research Task Force (IRTF). The IETF is responsible for the ongoing development of the TCP/IP protocol. When a new TCP/IP protocol is proposed, the IETF issues a Request for Comments (RFC) that details the specifications of the new protocol and how it is to be used. The IRTF, on the other hand, is responsible for researching new Internet technologies and their possible implications on the Internet as a whole.
RFCs can be found at www.ietf.org on the Internet.
In 1986, the National Science Foundation (NSF) developed NSFnet as a backbone for the now-emerging Internet. It would connect the old ARPAnet, BITnet, and a bunch of other networks together to form the Internet. At this point, the Internet became far-reaching and powerful as thousands of people who were now connected to it could all communicate and collaborate.
The Internet Today Since the days of the NSFnet, ARPAnet, BITnet, and all the others, Internet use has grown exponentially. No longer do only geeks and professors know about it; it has become a part of popular culture. Every television commercial ends with the company name and the address of the company’s web site so you can visit it and get even more information. One measure of a company’s success is how many hits the company’s web site gets per day. While estimates vary depending on where you get your information, the best estimates placed more than 375 million people worldwide on the Internet, with about 765 million projected by 2005. Currently, more than 75 percent of all metropolitan areas in the United States have Internet access. Basically, any household with a phone line can get access to the
Internet (with either a local or a long distance phone call). With each passing year, Internet access technologies allow faster access to the Internet. Home access speeds are available from 33.6Kbps (modems) to 512Kbps (ISDN and DSL access) to 1.5 Mbps (cable modem). At these speeds, Internet content can include streaming audio and video. Unfortunately, high-speed Internet access has not spread as quickly as first believed, but is slowly working its way across the globe. Once fast Internet access has reached more marketplaces, the estimates given could jump even higher.
The Layout of the Internet
Although the Internet is a constantly evolving entity, its areas can be broken down into several basic classifications:
Access points (ISPs)
WAN connections
Backbone providers
Each classification deals with a particular section of the Internet, as shown in Figure 2.2. Notice how the Internet areas connect to each other and what types of connections are used between them. FIGURE 2.2
Layout of the Internet
In the following sections, you will learn the details of each Internet area and the responsibilities each area has within the Internet. You will also learn which areas end users interact with and the different types of ISPs.
Access Points (ISPs)
As previously mentioned, anyone can get access to the information found on the Internet, but first they must be connected to the Internet. The Internet has often been called the “Information Superhighway.” I’d actually describe it as an “Internet Tollway.” To get the benefits of the “highway,” you have to pay to get on. Thus, in order to get onto the Internet, you have to pay the people who have set up access points to it (similar to the on-ramps of the toll highways). These access points are called Internet Service Providers (ISPs). An ISP has a very high-speed connection (usually capable of transmitting several megabits per second) to the Internet. The ISP then sells slower (several kilobits per second) dial-up or dedicated connections.
You can find free services from a local ISP; however, you generally get little technical support, a slower connection, and have to deal with annoying popup ads. You should also be aware that some of the “free” services aren’t truly free and may charge you after so many hours of use. If you can afford to pay for Internet services, the fee is most likely well worth the money spent.
ISPs usually have a high-speed LAN, with a large, complex router to connect the LAN to the Internet. Then, on the ISP’s “backbone” (as shown in Figure 2.3) are the ISP’s mail, news, and web servers, as well as the routers that provide dial-up and dedicated leased-line access to the Internet for the ISP’s customers. Additionally, some ISPs sell “space” on their backbone to companies so that those companies can place their web servers directly on the ISP’s backbone for the best possible performance. This practice is called server hosting. In Figure 2.3, notice where the ISP’s backbone is and what items connect to it within an ISP. Also notice that the backbone connects to a router that, in turn, connects the ISP to its own ISP.
FIGURE 2.3
An ISP’s backbone and the devices that connect to it
ISPs can be found in every major city in the U.S. and in almost every rural area. In Europe and Asia, ISPs can be found in the larger cities. However, the Internet’s reach is expanding more and more every day. Soon it will be possible to get Internet access anywhere on (or off) Earth.
To find an ISP in your area, you can look in the Yellow Pages under “Internet providers,” or, if you can get to a machine connected to the Internet, check out The List of ISPs at http://thelist.internet.com.
WAN Connections
If the Internet were a living entity, the ISPs would be its appendages and the WAN connections would be the arterial connections between them. A
wide area network (WAN) connection is a special phone line that is leased from the local telephone company and used to carry data between two LANs. For our discussion, WAN connections connect two ISPs to provide the Internet with its structure. WAN connection speeds range from 9600bps to hundreds of megabits per second (Mbps). These WAN connections were covered in detail in Chapter 1, but to summarize, Table 2.1 illustrates some WAN technologies and their associated speeds. TABLE 2.1
Common WAN Connection Technologies

WAN Connection    Common Speed(s)
DDS               56Kbps
Frame Relay       56Kbps–1.544Mbps
T1                1.544Mbps
T3                44.736Mbps
ISDN              128Kbps–2Mbps
ATM               155Mbps
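To make the table’s speeds concrete, here is a rough transfer-time comparison for a 10MB file over each link. It ignores protocol overhead and line conditions, so treat the numbers as order-of-magnitude only.

```python
# Transfer time for a 10 MB file over each WAN link in Table 2.1.
speeds_kbps = {"DDS": 56, "T1": 1544, "T3": 44736, "ATM": 155000}
file_bits = 10 * 1024 * 1024 * 8  # 10 MB expressed in bits

for link, kbps in speeds_kbps.items():
    seconds = file_bits / (kbps * 1000)  # Kbps -> bits per second
    print(f"{link}: {seconds:,.1f} s")
```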
For more information on WAN technologies and their speeds, refer to The Network Press Encyclopedia of Networking by Werner Feibel (Sybex, 2000).
As mentioned in Chapter 1, the Internet is a web of interconnected networks. An ISP’s LAN is connected to a router, which is connected by some kind of leased telephone connection to the router at the ISP’s ISP (called an upstream provider). That ISP is connected to another ISP, and so on. Routers are also capable of providing multiple, redundant links between two routers. If one connection fails, the router will send all traffic over the other connection. In addition to providing LAN-to-Internet connectivity, a router can provide a way for dial-up clients to connect to the Internet. When you connect your home computer to the Internet via a modem, your modem is dialing another modem attached to a router of some kind. The router then routes the
requests from the connected computer to the Internet and routes the associated responses back to the original requesting computer. Figure 2.4 shows a sample router with two serial ports and one LAN port and how it might be used in an ISP. Notice which devices are connected to each port: the modems that customers dial into and the WAN connections attach to the router’s serial ports (because they are serial devices), and the LAN port connects to the rest of the LAN. FIGURE 2.4
Router use on the Internet
Backbone Providers Although the Internet is essentially a network of ISPs, there are a few select ISPs that connect to each other with high-speed WAN connections to provide the Internet with a “backbone.” These ISPs are known as backbone providers (as shown earlier in Figure 2.2) and connect to each other at speeds from 100Mbps to 1Gbps. The Internet backbone is the set of high-speed WAN connections, servers, and ISPs that provide the structure for the Internet. Many ISPs claim to be backbone providers, but this is usually a marketing gimmick; it means that they connect directly to an actual backbone provider but are not actually part of the Internet backbone. Most backbone providers are divisions of telephone companies and are called Network Access Points (NAPs). Sprint and Pacific Bell are examples of NAPs. Originally, there were four major NAPs that connected the Internet. Since that time, several new NAPs have been added, such as ICS and Worldcom.
Internet2: The Next Generation The Internet has seen many advancements, but even more Internet technologies are just around the corner or waiting to be developed, technologies such as IPv6, Quality of Service (QoS), telemedicine, video multicasting, and many others. A number of them will improve collaboration abilities and directly benefit higher education (as did the technologies of the current Internet). For this reason, a consortium of higher education institutions has formed the University Corporation for Advanced Internet Development (UCAID). One of UCAID’s projects is Internet2 (I2), the collection of next-generation Internet applications and technologies being developed for use with the Internet infrastructure in use today. Internet2 is not its own network, as some people incorrectly assume. It is only the name given to the ongoing research of these technologies and their possible applications. Just as the current Internet technologies have their roots in the collaboration efforts of education, it is the hope of UCAID that the work done with Internet2 will increase the Internet’s usability. For more information, visit UCAID at www.ucaid.edu.
Domain Name Services (DNS)
Domain Name Services (DNS) is a network service that associates alphanumeric host names with the TCP/IP address of a particular Internet host. When surfing the Web, you could refer to a host by its IP address (for example, 201.35.124.12), but it is more common to use a DNS host name (www.sybex.com). Internet host names are used because they are easier to remember than long, dotted-decimal IP addresses. In this section, you will learn what a domain is, how domains are organized within DNS, and the specifics of how to use DNS.
What Are Domains? Host names are typically the name of a device that has a specific IP address, and on the Internet, they are part of what is known as a fully qualified domain name. A fully qualified domain name consists of a host name and a domain name. Although you have a Social Security number and can remember it when you need it, life would be difficult if you had to remember the Social Security numbers of all your friends and associates. You might be able to remember the Social Security numbers of as many as 10 friends and relatives, but after that things would get a bit difficult. Likewise, it’s easier to remember www.microsoft.com than it is to remember 198.105.232.6. The process of finding the IP address for any given host name is known as name resolution, which can be performed in several ways, and we’ll look at all of them in the next few sections. First, you need to understand Internet domains and how they are organized.
Internet Domain Organization On the Internet, domains are arranged in a hierarchical tree structure. There are seven top-level domains currently in use:
com A commercial organization. Most companies will end up as part of this domain.
edu An educational establishment, such as a university.
gov A branch of the U.S. government.
int An international organization.
mil A branch of the U.S. military.
net A network organization.
org A nonprofit organization.
Unfortunately, the word domain is used in several ways, depending on the context. When the topic is the Internet, a domain refers to a collection of network host computers.
U.S. Domains Your local ISP is probably a member of the net domain, and your company is probably part of the com domain. The gov and mil domains are reserved strictly for use by the government and the military within the United States. The com domain is by far the largest, followed by the edu domain. Beyond the U.S. domains, well over 130 countries are represented on the Internet.
New U.S. Domains Because the com domain is so popular, almost every company has a web address that ends with .com. Additionally, there are no subdivisions within any of the domains, especially the com domain. For these reasons, the Internet Assigned Numbers Authority (IANA) has come up with some new top-level domains to further segment the U.S. Internet DNS space. The current proposals, listed below, are scheduled for release in the second or third quarter of 2001 and may already be in use by the time you read this book.
aero Domain for the air-transport industry
biz Additional domain for businesses
coop Domain for cooperatives
info Domain for web sites that provide useful information to a community (like a community billboard), with no restriction on use
museum Domain for various forms of museums
name Domain for registration by individuals
pro Domain for professionals, such as accountants, lawyers, and physicians
International Domains In other parts of the world, the final part of a domain name represents the country in which the server is located: ca for Canada, jp for Japan, uk for Great Britain, and ru for Russia, for example. Figure 2.5 shows an example of the layout of the Internet DNS hierarchy. Notice how the com, edu, and international domains are all at the same level.
If you want to contact someone within one of these domains by e-mail, you just add that person’s e-mail name to his domain name, separated by an at sign (@). For example, if you want to e-mail the president of the United States, send your e-mail to this address: [email protected] The InterNIC used to assign all Internet domain names and ensure that there were no duplicate names. Names are assigned on a first-come, first-served basis, but if you try to register a name that infringes on someone else’s registered trademark, your use of that name will be rescinded if the trademark holder objects. In October 1998, however, the Internet Corporation for Assigned Names and Numbers (ICANN) was formed to take over this task. ICANN, located at www.icann.org, is a non-profit organization composed of a 19-member Board of Directors and a staff of 14 at this writing. Instead of contacting ICANN directly to register your domain, you must go through a third party, called a registrar, who handles your paperwork and submission. Now that we have detailed how Internet domain names work and where they came from, we can return to our discussion of name-resolution methods.
Using DNS The abbreviation DNS stands for Domain Name Services. You use DNS to translate host names and domain names to IP addresses, and vice versa, by means of a standardized lookup table that the network administrator defines and configures. The system works just like a giant telephone directory.
Suppose you are using your browser to surf the Web and you enter the URL http://www.microsoft.com to go to the Microsoft home page. Your web browser then asks the TCP/IP protocol to ask the DNS server for the IP address of www.microsoft.com. When your web browser receives this address, it connects to the Microsoft web server and downloads the home page. DNS is an essential part of any TCP/IP network, simplifying the task of remembering addresses; all you have to do is simply remember the host name and domain name. The DNS tables that are used to resolve the host name to an IP address are composed of records. Each record is composed of a host name, a record type, and an address. There are several record types, including the address record, the mail exchange record, and the CNAME record. The address record, commonly known as the A record, directly maps a host name to an IP address. The example below shows the address record for a host called mail in the company.com domain:

mail.company.com.    IN    A    204.176.47.9
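The resolution step described above can be reproduced with Python’s standard socket module. This sketch looks up localhost so the result does not depend on any particular DNS server; a real host name would be resolved through whatever DNS servers the operating system is configured to use.

```python
# Minimal name resolution, as a browser performs before connecting.
import socket

def resolve(host):
    """Return the first IPv4 address found for host."""
    return socket.gethostbyname(host)

print(resolve("localhost"))  # 127.0.0.1 on most systems
```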
The mail exchange (MX) record points to the mail exchanger for a particular host. DNS is structured so that you can actually specify several mail exchangers for one host. This feature provides a higher probability that e-mail will actually arrive at its intended destination. The mail exchangers are listed in order in the record, with a priority code that indicates the order in which the mail exchangers should be accessed by other mail delivery systems. If the first priority doesn’t respond in a given amount of time, the mail delivery system tries the second one, and so on. Here are some sample mail exchange records:

host.company.com.    IN    MX    10 mail.company.com.
host.company.com.    IN    MX    20 mail2.company.com.
host.company.com.    IN    MX    30 mail3.company.com.
In this example, if the first mail exchanger, mail.company.com, does not respond, the second one, mail2.company.com, is tried, and so on. The CNAME record, or canonical name record, is also commonly known as the alias record and allows hosts to have more than one name. For example, your web server has the host name www, and you want that machine also to have the name ftp so that users can easily FTP in to manage web pages. You can accomplish this with a CNAME record. Assuming you already have
an address record established for the host name www, a CNAME record adding ftp as a host name would look something like this:

www.company.com.    IN    A        204.176.47.2
ftp.company.com.    IN    CNAME    www.company.com
When you put all these record types together in a file, it’s called a DNS table, and it might look like this:

mail.company.com.    IN    A        204.176.47.9
host.company.com.    IN    MX       10 mail.company.com.
host.company.com.    IN    MX       20 mail2.company.com.
host.company.com.    IN    MX       30 mail3.company.com.
www.company.com.     IN    A        204.176.47.2
ftp.company.com.     IN    CNAME    www.company.com
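The MX fallback order described above comes from the priority codes: a mail delivery system sorts the exchangers by priority (lowest value first) and tries them in that order. The hostnames here are the ones from the sample records.

```python
# Sketch of how a mail delivery system orders MX records by priority.
mx_records = [
    (20, "mail2.company.com"),
    (10, "mail.company.com"),
    (30, "mail3.company.com"),
]

def delivery_order(records):
    """Return exchanger hostnames, lowest priority value first."""
    return [host for _, host in sorted(records)]

print(delivery_order(mx_records))
# ['mail.company.com', 'mail2.company.com', 'mail3.company.com']
```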
You can establish other types of records for specific purposes, but we won’t go into those in this book. DNS can become very complex very quickly, and entire books are dedicated to the DNS system.
Obtaining Your Own Domain Name In .COM, .NET, or .ORG It seems like you can’t watch a television commercial these days without seeing at the bottom of the screen a domain name that matches the company name (for example, pizzahut.com for Pizza Hut, ibm.com for IBM, and so on). You may wonder how these names are obtained. It is easy, and almost anyone can do it. It costs around $100 for a single domain name, but the fee does vary from registrar to registrar. The steps are as follows:
1. Choose a domain name (such as bobsroom.com). 2. Using your web browser, go to www.internic.net/whois.html and use their search engine to see if the domain name you want has been taken. If the name you want is available, proceed to step 3. If not, go back to step 1 and start over.
3. Find a registrar that you would like to use. You can choose one from ICANN at www.internic.net/regist.html.
4. Go to the registrar’s web site and complete the information requested. Some registrars require you to telephone or mail your request, but most can take the information directly from their web site (and your credit card number!).
NSLookup NSLookup is a command-line utility, found on Windows NT/2000 and Unix systems, that allows you to look up information on a domain name, such as its IP address. Many web sites now offer NSLookup services directly from a web page with all of the options listed. To find one of these web sites, type NSLookup in your favorite search engine.
Uniform Resource Locators (URLs)
Everyone who has ever used the World Wide Web (WWW) has more than likely used a Uniform Resource Locator (URL). A URL is a standard way of referring to an Internet resource when making Internet connections and requests. You will primarily use URLs in web clients (like Navigator and Internet Explorer) and other Internet utilities. A URL consists of several components, including:
Protocol designation Specifies the type of protocol to use when making a connection to the Internet, such as HTTP, FTP, FILE, and TELNET. For more information on these protocols, see Chapter 3.
DNS name of host The actual DNS name of the host to which you are connecting. A TCP/IP address can be used in place of the DNS name.
Path The path on the host where the requested resource can be found. This path is relative to the hosting directory on the Internet server.
Resource name The actual name of the resource you are requesting from the server. For most web URLs, this name will be the name of an HTML file.
Figure 2.6 shows an example of a URL. Notice that there are different parts of the URL. FIGURE 2.6
A sample URL
http://www.sybex.com/test/index.html
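The same decomposition can be done with Python’s standard urllib.parse, which splits the sample URL into the components just described:

```python
# Breaking the sample URL into the components described above.
from urllib.parse import urlparse

url = "http://www.sybex.com/test/index.html"
parts = urlparse(url)

print(parts.scheme)  # 'http' -> protocol designation
print(parts.netloc)  # 'www.sybex.com' -> DNS name of host
print(parts.path)    # '/test/index.html' -> path plus resource name
```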
What the Heck Is a Tilde? Sometimes you will see a URL listed like www.somewhere.com/~dgroth/. Everything in that URL makes sense except the tilde (~) character. This character has a special purpose. In web servers that provide web hosting for multiple users’ home pages, the ~ indicates to the web server to get the web pages from the specified user’s (in our example, dgroth) home directory on that server. Whenever you sign up with an ISP, you are given a user account, a password, and a home directory on a web server. You can then set up your own home page by placing the HTML files in your home directory (or in a special subdirectory under your home directory). When this method is used, the web pages are kept relatively secure from other users, but the web server can still access them.
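As a sketch of the tilde convention, here is the kind of path mapping such a web server might perform. The /home layout and the public_html subdirectory name are common Unix web-server conventions assumed for illustration; they are not something the text specifies.

```python
# Hypothetical mapping from a ~user URL path to a home directory.
def userdir_path(url_path, docroot="/var/www"):
    """Map /~user/... to that user's web directory; else use docroot."""
    if url_path.startswith("/~"):
        user, _, rest = url_path[2:].partition("/")
        return f"/home/{user}/public_html/{rest}"
    return docroot + url_path

print(userdir_path("/~dgroth/index.html"))
# /home/dgroth/public_html/index.html
```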
Internet Communications Process
The i-Net+ exam tests your knowledge of the behind-the-scenes processes that happen during Internet communications from an Internet client to the various types of servers that exist. The Internet communication processes that you should understand are as follows:
HTTP (Web) Requests and Responses The most common type of communication on the Internet is that between a web browser and a web server. This communication is known as an HTTP communications session because the request is made using the HTTP protocol. An HTTP communication consists of both a request for data (also known as an HTTP GET) and a response that includes the requested data. Figure 2.7 illustrates the process that occurs when a web browser makes a request of an HTTP server. FIGURE 2.7
The HTTP request and response process
As shown in Figure 2.7, the HTTP communications session consists of four major processes:
1. The browser submits the URL request to the web server.
2. The browser communicates via TCP/IP and TCP port 80 to the web server.
3. The web server receives the request, decodes it, and locates the requested documents.
4. The server returns the requested documents, and the web browser displays them.
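The four steps above can be sketched with Python’s standard http.client. example.com is a placeholder host here, and a real browser sends many more headers than this bare request.

```python
# Steps 1-4 of an HTTP exchange, via the standard library.
import http.client

def fetch(host, path="/", port=80):
    """Send a GET request and return (status, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)  # step 2: TCP, port 80 by default
    conn.request("GET", path)        # step 1: the browser's GET request
    resp = conn.getresponse()        # step 3: server decodes and locates the file
    body = resp.read()               # step 4: the document comes back
    conn.close()
    return resp.status, body

# status, page = fetch("example.com")  # try against a live host
```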
Step 1: The Browser Request There are two entities involved in any HTTP web request: the client and the server. The client is most often a web browser, although other Internet utilities are starting to use HTTP as a request method. The server component is almost always a web (HTTP) server. With a web browser, you make a request of a web server by entering a URL in the address line and pressing Enter or by clicking a hypertext link in an HTML document. This process initiates a request to the web server. The request looks something like this: GET http://www.accn.com/index.html HTTP/1.0 The GET portion of this request is known as the HTTP request method. This can be one of several different options. Some options for the request method are detailed in Table 2.2. TABLE 2.2
HTTP Request Method Options Request Method
Explanation
GET
Primary method of retrieving data from a web server. This method requests a certain document or file from a web server.
PUT
A method by which a client can upload a file to a web server.
HEAD
A method that instructs an HTTP server to return only header information about a requested resource, not the actual resource itself.
Step 2: Browser Communication Because HTTP is part of the TCP/IP protocol suite, it uses part of the TCP/ IP protocol suite as a transport method. Specifically, HTTP uses the Transmission Control Protocol (TCP) as its main transport protocol. When a web browser makes an HTTP request of a web server, HTTP uses TCP port 80 during its communications. A TCP port identifies which TCP/IP process on
the server machine the request is destined for. TCP port 80 is the default port address that specifies that the request is destined for an HTTP server process. Other port addresses can be used, but both client and server must be set up specifically to use them.
TCP/IP and its protocols are covered in more detail in Chapter 3.
In addition, HTTP requests include information such as what (HTML document or multimedia content) is being requested as well as what version of HTTP is being used (HTTP 1.0 in most cases).
Step 3: Web Server Receives Request The third step in the communications process is when the web server receives the HTTP request and processes it. During this step, the web server decodes the request and tries to determine exactly what the browser is asking for. Once the server has determined what the request is, it locates the file(s) asked for in the request and proceeds to the next step, returning the requested information to the client.
Step 4: The Requested Document Is Returned Once the server has found the requested information, it can send it back to the client that requested it. The server sends the data back using TCP or its “cousin,” User Datagram Protocol (UDP). Which protocol is used depends on the type of content being sent. Most HTML documents are sent back using the TCP protocol.
Caching Server A caching server is responsible for storing frequently accessed web pages on the server and then returning them to the client. The benefit to the client is twofold: content is returned to the client faster than going out to the Internet every time the page is accessed, and the amount of traffic along the Internet connection is reduced. The drawback to caching is that if a web page is updated, old content may be returned to the client. Caching servers use one of two methods to return content to the client: passive caching or active caching.
Some documents, such as those from a paid subscription service or those requiring specific authentication, cannot be cached.
Passive Caching In passive caching, the server waits for a client to make a request, retrieves the document, and then decides whether or not to cache it. The decision process is usually a setting in the caching server’s software, and can tell the server to always cache web content or to keep a copy of the most frequently accessed web pages. When the caching server gets close to its storage limit, it looks at the number of requests (or hits) for each web page and then deletes the pages that have the lowest number of hits. Figure 2.8 illustrates the full process in passive caching. FIGURE 2.8
Passive caching at work
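The passive-caching decision flow can be sketched as a small class. The storage limit, hit counting, and lowest-hit eviction follow the description above; fetch_from_web stands in for a real HTTP retrieval.

```python
# A toy passive cache: serve from cache when possible; otherwise fetch,
# evicting the least-requested page when storage is tight.
class PassiveCache:
    def __init__(self, limit=3):
        self.limit = limit
        self.pages = {}  # url -> content
        self.hits = {}   # url -> request count

    def get(self, url, fetch_from_web):
        self.hits[url] = self.hits.get(url, 0) + 1
        if url in self.pages:              # document in cache: return it
            return self.pages[url]
        content = fetch_from_web(url)      # cache miss: go out to the Web
        if len(self.pages) >= self.limit:  # near the storage limit:
            coldest = min(self.pages, key=lambda u: self.hits[u])
            del self.pages[coldest]        # delete the lowest-hit page
        self.pages[url] = content          # store and return to the client
        return content
```

A request for a cached page is answered locally; a request for a new page triggers a fetch and, if the cache is full, an eviction of the least-popular entry.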
Active Caching In active caching, the server waits for periods of low activity to go out and retrieve any documents that it thinks will be requested by clients in the near future. Active caching is best if your clients must have information that is as up-to-date as possible, such as with a research and development department. The process is as follows:
1. The server detects a period of low activity.
2. The most frequently accessed documents are calculated by the server.
3. The web site for the first document is accessed, and the date/time stamp on the web page is checked against the document stored in the cache.
4. If the date/time stamp is the same as the cached document, the server pulls the next document in line and starts back at step 3. If not, it proceeds to step 5.
5. The older document is deleted, and the updated copy is saved.
6. The next document in line is accessed, and the server starts the process over again.
When the client makes a request for a web document, the server returns the document using the same process as described for passive caching. Some caching server software allows you to set specific web pages to always be retrieved from the web site, which helps negate the potential problem of a client receiving old content.
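The refresh pass in steps 3 through 5 can be sketched as follows. last_modified stands in for checking a page’s date/time stamp on the live web site, and the replacement content string is illustrative.

```python
# Sketch of the active-caching refresh pass: re-check each cached
# page's timestamp during idle time and replace stale copies.
def refresh_cache(cache, last_modified):
    """cache maps url -> (timestamp, content); stale entries are replaced."""
    for url, (stamp, _) in list(cache.items()):
        current = last_modified(url)
        if current != stamp:  # page changed on the site: refresh it
            cache[url] = (current, f"<fresh copy of {url}>")
    return cache
```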
Proxy Server A proxy server is one of several solutions to the problems associated with connecting your intranet or corporate network to the Internet. A proxy server is a program that handles traffic to external host systems on behalf of the client software running on the protected network. This means that clients access the Internet through the proxy server. It’s a bit like those one-way mirrors—you can see out of it, but a potential intruder cannot see in. Many proxy servers also function as a caching server, which allows you to have two servers in one. This is of particular value, as you only need to configure one software application as opposed to two separate products. It can also save your company a lot of money, since software applications for your server can be quite expensive.
A proxy server sits between a user on your network and a server out on the Internet. Instead of communicating with each other directly, each talks to the proxy (in other words, to a “stand-in”). From the user’s point of view, the proxy server presents the illusion that the user is dealing with a genuine Internet server. To the real server on the Internet, the proxy server gives the illusion that the real server is dealing directly with the user on the internal network. So depending on which way you are facing, a proxy server can be both a client and a server. The point to remember here is that the user is never in direct contact with the Internet server, as Figure 2.9 illustrates. FIGURE 2.9
How a proxy server works
However, the proxy server doesn’t just forward requests from your users to the Internet and back. Because it examines and makes decisions about the requests that it processes, it can control what your users can do. Depending on the details of your security policy, client requests can be approved and forwarded, or they can be denied. Rather than requiring that the same restrictions be enforced for all users, many advanced proxy server packages can offer different capabilities to different users.
There are two types of proxies: Winsock proxies and HTTP proxies. Winsock proxies make any kind of TCP/IP request (including FTP, HTTP, and so on) on behalf of client stations. Winsock proxies require a special piece of software on the client station. In exchange, the workstation can make TCP/IP requests while running any transport protocol locally; you don't even need TCP/IP installed or configured on the workstation to use a Winsock proxy. Most proxies fall into this category. HTTP proxies, on the other hand, simply make web requests on behalf of a web browser. Both types are often implemented on networks.
A proxy server can only be effective if it is the only connection between an internal network and the Internet. As soon as you allow another connection that does not go through a proxy server, your network is at risk.
E-mail (SMTP and POP3) E-mail, like the web, is almost everywhere these days. All corporate business cards now have e-mail addresses on them. Communications have been enhanced to the point where large amounts of information can be conveyed almost instantaneously as well as efficiently. E-mail is just a logical, digital version of the U.S. postal system. Digital messages are sent from a computer on one end to a recipient computer. But the message doesn’t go directly from sender to recipient; instead, it passes through several computers on its way to its destination. Internet e-mail is a store-and-forward messaging system, which means the message sits in one location (stored) until a server process moves it to the next location (forwarded). This process repeats until the message arrives at its destination. Figure 2.10 shows a sample Internet mail setup. FIGURE 2.10
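The store-and-forward behavior can be modeled in a few lines of Python. Real mail relays record each hop by prepending a Received: header to the message before forwarding it; the server names below are hypothetical:

```python
def relay(message, servers):
    """Pass a message through a chain of mail servers, store-and-forward style."""
    for server in servers:
        # Each hop stores the message, stamps it, and forwards it onward,
        # so the most recent hop ends up at the top of the header block.
        message = "Received: by " + server + "\r\n" + message
    return message
```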
Let’s discuss each protocol and how they work together within the Internet to provide the Internet with its messaging system.
SMTP The Simple Mail Transfer Protocol (SMTP), as its name suggests, is the TCP/IP suite protocol used to transfer mail between Internet hosts. SMTP is most commonly used between mail clients and mail servers as well as between mail servers. Like HTTP, SMTP uses TCP. SMTP initiates communications on TCP port 25. All SMTP conversations (either between client and server or between servers) work basically the same way. The sender opens a connection on TCP port 25, and the recipient responds that it is ready by sending back its name, address, and SMTP mail program version. The mail-sending process can then begin. During this process, SMTP uses special SMTP commands to send the mail. Each command has a special function within the SMTP communications process. To illustrate some of the most common SMTP commands, let's examine a simple SMTP communication:
220 mail.somewhere.net ESMTP Sendmail 8.9.3/8.9.3; Tue, 3 Aug 1999 08:52:14 -0500 (CDT)
HELO corpcomm.net
250 ns1.corpcomm.net Hello fgo1-a9.corpcomm.net [209.74.93.19], pleased to meet you
mail from:[email protected]
250 [email protected]... Sender ok
rcpt to:[email protected]
250 [email protected]... Recipient ok
data
354 Enter mail, end with "." on a line by itself
Test
Please ignore this message.
David G.
.
250 IAA22065 Message accepted for delivery
The first line (the line starting with 220) is the line that the SMTP server responds with, indicating that it is ready to start the conversation. As previously mentioned, this line includes the version of the SMTP service the recipient is running (in this case, Sendmail). The next line (starting with HELO) indicates what domain the sending computer is from. The receiving computer will then verify that the sending computer is actually at the domain it says it is from. This particular feature is fairly new. It was implemented to prevent unauthorized users from using an SMTP mail server to send mail without permission. Once the receiving computer has verified that the sender is who it says it is, the sender then uses the mail from command to indicate who sent the mail. The e-mail address that appears after the mail from command is the address that appears in the From line in the header of the sent e-mail. The rcpt to: line tells the receiving computer who the mail's intended recipient is. This line specifies the e-mail address that appears on the To line of an e-mail. The last part of this conversation begins with the data command, which indicates to the receiving computer that what follows is the actual body of the e-mail. After the data command, the sending system sends all the data that is part of that e-mail. To signify the end of the data, the sending computer sends a . on a line by itself. The final line indicates that the mail was sent successfully.
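The client side of the conversation above can be generated by a small Python helper. The addresses used here are hypothetical, and a real client would also wait for the server's numbered reply after each command:

```python
def smtp_commands(domain, sender, recipient, body_lines):
    """Build the client half of a minimal SMTP conversation."""
    commands = [
        "HELO " + domain,        # identify the sending domain
        "mail from:" + sender,   # the From address of the message
        "rcpt to:" + recipient,  # the intended recipient
        "data",                  # everything that follows is the body
    ]
    commands.extend(body_lines)
    commands.append(".")         # a period alone on a line ends the body
    return commands
```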
POP3 When an e-mail gets sent over the Internet, it uses SMTP until it reaches the mail server at its destination. The e-mail then is stored on the mail server until the client is ready to download it. From there, Post Office Protocol 3 (POP3) is the protocol used to download the mail from the server to the mail client. Most Internet e-mail clients today use SMTP for sending e-mail and the POP3 protocol for downloading received mail.
For more information on POP3, see RFC 1081, which can be found at www.cis.ohio-state.edu/htbin/rfc/rfc1081.html.
News Server Most ISPs today have at least one news server to allow their subscribers access to Internet news articles. News servers are those Internet servers that store and distribute Usenet news articles using the Network News Transfer
Protocol (NNTP). As discussed in Chapter 3, NNTP is one of the protocols in the TCP/IP protocol suite. Just like the other Internet servers, news servers use a daemon to respond to requests and to deliver news messages. NNTP clients communicate with the NNTP daemon to send and receive news articles. These news articles are simply text messages that are organized by subject into categories (called newsgroups). For example, there is a newsgroup called alt.autos.studebaker. The messages contained in this newsgroup pertain primarily to Studebaker automobiles. When you want to post a message to the Internet about Studebakers, you send the message to your local news server and designate that it belongs in the alt.autos.studebaker newsgroup. Then, those with similar interests can look at the alt.autos.studebaker newsgroup, see your message, and respond. They can respond by either posting a message back to the newsgroup or e-mailing you directly. The individual newsgroups are stored in directories on the news server. When a client first connects to the news server, it requests a list of all newsgroups stored on the server. The server responds with a complete list of the names of all the newsgroups it stores. This can take a while because, typically, thousands of individual newsgroups exist (a typical ISP lists over 50,000). Once the client chooses a newsgroup to view, it sends a request to the news server to retrieve all the headers (that is, the Subject line, the To line, and the From line) of the messages in the newsgroup. You might be asking, "Why not download all of the messages?" Well, you probably don't want to read every message, particularly if it's a very active newsgroup with thousands of messages, so the client downloads only the headers, letting you decide which message(s) you want to read. Then, when you click a particular message in the client, the client requests the remainder of the message. The server then locates the full message body and returns it to the client.
In addition to allowing clients to read messages, NNTP servers will send out messages that they receive, either from clients or from other servers. NNTP servers can be configured to send all their messages to other NNTP servers. Additionally, the servers that “push” news messages can also receive messages. In this way, messages that get posted to one news server will be propagated throughout the Internet. Because of this distribution mechanism, newsgroups are among the most powerful collaboration tools on the Internet.
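The header-first retrieval that news clients perform can be simulated with ordinary string handling, since a news article's headers are separated from its body by a blank line. The articles below are hypothetical:

```python
def list_headers(articles):
    """Return only the header block of each article, as a client requests
    when it first opens a newsgroup."""
    headers = []
    for article in articles:
        head, _sep, _body = article.partition("\n\n")  # blank line ends the headers
        headers.append(head)
    return headers
```

Only after the reader selects a particular message does the client come back for the full body.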
One of the biggest problems with news servers on the Internet is that practically anyone can access them. Virus writers love to get their particular virus on them, and sometimes even include a mechanism in the virus to post to certain newsgroups. If you are going to access a news server, ensure that you have a good virus scanner—just in case.
Media Servers Media servers provide audio and/or video content to clients using some form of streaming media. Before streaming media was available, you would have to download audio and video content to your hard drive and then play it later. However, for online presentations, this became cumbersome because it wasn't in real time. Streaming media allows the client to download the information to the computer in a buffer area. Once the information stored in the buffer has been played, it is discarded and more information is stored. The data retrieved can also be in real time; however, unless you have a high-speed connection to the Internet, you may get frustrated with the "start-and-stop" phenomenon. Media servers are special servers that service client requests for audio and visual content, such as online presentations, radio broadcasts, television broadcasts, music, online concerts, movies, and more.
FTP Server An FTP server is any server that provides files to clients using the File Transfer Protocol (FTP), a subset of the TCP/IP protocol suite discussed in Chapter 3. An FTP server is usually the next server that a company will set up once their web server is installed and operational. HTTP and FTP are complementary technologies. When you visit a computer company's web site and look for technical support information, there will often be an area where you can download support files (for example, patches, documentation, and so on). As a matter of fact, many web software companies have FTP server software that complements their HTTP server nicely. For example, when you install Microsoft's IIS, you have the option of installing the IIS FTP server as well. FTP servers are similar in function to HTTP servers in that FTP servers also use a daemon (called an FTP daemon) to respond to client requests. FTP servers are a bit more complex than HTTP servers, however. Whereas HTTP daemons respond to a very limited set of commands, FTP daemons respond to a wide array of commands. These commands will be covered in more detail in Chapter 4. FTP daemons run on a server and wait until they receive a request for an FTP connection from an FTP client. The FTP daemon then responds to the client and asks the user to log in. The user sends a username and password to tell the server who is requesting the file. Because most Internet sites don't need to regulate who can download files, many sites allow Anonymous or FTP as a username and will accept any text as the password. Anonymous users are generally not allowed to upload or delete files, but they can download files. Once the user logs in, they can request a file. When the FTP daemon receives this request, it sends the requested file back to the requesting client.
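The anonymous-login policy described above boils down to a simple rule, sketched here in Python. STOR, DELE, and RETR are the real FTP protocol verbs for upload, delete, and download; checking registered users' passwords is omitted from this toy version:

```python
def ftp_authorize(username, command):
    """Apply the usual anonymous FTP policy to one command."""
    anonymous = username.lower() in ("anonymous", "ftp")
    if anonymous and command in ("STOR", "DELE"):
        return False   # anonymous users may not upload or delete
    return True        # downloads (RETR) are open to everyone
```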
Certificate Server More and more Internet sales transactions are taking place. The number of sales increases exponentially every day, so it stands to reason that there are people who make it their business to intercept sales communications and (criminally) use the information they gain to their advantage. It is therefore necessary to provide a method for private, secure communications. Today, this is done on the Internet with either secure transmissions using Secure Sockets Layer (SSL) or public key/private key cryptography using digital certificates and certificate servers. Web sites that use SSL have addresses that begin with https://. Certificates, on the other hand, allow both the client and the server to prove their identity by presenting a digital version of an identity card. Using a special key, this digital identity card (called a digital certificate) is "signed" by a server that both the sender and receiver trust. The server is known as a certificate authority (also called a certificate server). The certificate server uses information from the requester and other third parties (like credit card companies) to verify the identity of the requester and to create the digital certificate. A server that provides e-commerce through certificate authorities or other third-party security is known as a Commercially Secured Server (CSS).
Directory (LDAP) Server The buzzword in the information technology business these days is directory. A directory, as it applies to networking, is a centralized repository of network resource information. This information can be used for network management purposes or other useful applications. A network directory contains information
such as what kind of users, servers, printers, and so on exist on a network. Each item (usually called an object) and its associated properties (for example, phone number, department, address, etc.) can be searched through a standard query language. For example, if you had a network of 250 people and you wanted to e-mail someone but didn’t know their e-mail address, you could use a program to query the directory and find the e-mail address. Directories have been around for years. Only recently have they been popular with the Internet crowd. The biggest use for directories on the Internet is repositories of personal information. A directory server is a server that stores directory information and makes it available to the Internet for searches. A great example of a directory server is Switchboard (www.switchboard.com). With it, you can look for anyone or any business. To look for people, for example, all you need to know is their last name (although you may want to supply the server with more information). There are many public directory servers on the Internet and each one is slightly different. For this reason, a standard request and access protocol was developed. This protocol is the Lightweight Directory Access Protocol (LDAP), and it is used to provide a common access method for the myriad of directories that exist. Additionally, there is a standard method of organizing and naming entries in these directories. This standard is known as the X.500 directory naming scheme, and many Internet directories conform to this naming standard.
It is important to note that LDAP is not a directory, but a directory access method. There is no such thing as an LDAP directory. A more plausible moniker would be an LDAP-compliant directory service. For more information on LDAP, refer to the University of Michigan’s (the developers of LDAP) web site, www.umich.edu/~dirsvcs/ldap/doc/. More information on X.500 can be found at www.nexor.com/info/info.htm
Most public directory servers use two daemons to provide directory services to the network and to the Internet. The first daemon (which has different names depending on the directory service used) is responsible for managing the directory itself. A directory is, for the most part, a large, relational database that requires maintenance to stay current. This first daemon indexes the entire directory so searches can be performed. It also responds to directory calls in its own (non-LDAP) query language and protocol. The second daemon is used to provide LDAP access to the directory. The LDAP daemon waits until an LDAP query is received. The LDAP daemon formats the query in the directory’s
native query language and passes the query on to the directory service daemon. The directory service daemon retrieves the requested information and passes it back to the LDAP daemon, which in turn returns the information to the requesting client.
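At its core, the directory service daemon is answering attribute-match queries against its database. That lookup can be illustrated with a toy Python search over hypothetical directory objects (a real server would first translate the LDAP query into its native query language, as described above):

```python
def directory_search(directory, **filters):
    """Return every directory object whose attributes match all the filters."""
    return [obj for obj in directory
            if all(obj.get(key) == value for key, value in filters.items())]
```

For example, `directory_search(directory, department="R&D")` plays the role of the e-mail lookup described earlier: find the object, then read its mail attribute.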
Connecting through a Firewall A firewall can be a hardware device, software, or a combination of both that prevents unauthorized access to your network from the Internet. Some firewalls are even used to prevent unauthorized access to Internet sites from company employees. While there are several types of firewalls, from servers running special software to built-in firewalls on routers, all work on the same basic principles. A firewall can deny access by protocol, port number, and specific web sites or IP addresses.
Firewalls are covered in detail in Chapter 5.
For example, let's say that your coworker gave you a great web address that contains some jokes. Unknown to you, your company has already determined that this particular site has some potentially objectionable material, and has denied access to the web site. Here's how the process would work:
1. You type the web address into your browser.
2. Your request for the document goes to the firewall.
3. The firewall checks the port number against its list of denied ports, and sees that it's a valid port.
4. The firewall then checks its list of denied web addresses and finds a match.
5. The firewall sends back an "Unauthorized access" page informing you that the web site is not allowed.
If the web address was not listed by name, the firewall could have checked the web site by IP address and still found it. Either way, you'll have to wait until you get home to view that great site.
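The decision sequence in Steps 3 through 5 amounts to two list checks, sketched below in Python with hypothetical rule sets and host names:

```python
def firewall_check(port, host, ip, denied_ports, denied_sites):
    """Return (allowed, reason) for an outbound web request."""
    if port in denied_ports:                        # Step 3: port filter
        return False, "port denied"
    if host in denied_sites or ip in denied_sites:  # Step 4: address filter
        return False, "site denied"
    return True, "ok"                               # request is forwarded
```

Note that the address filter checks both the name and the IP address, so renaming a denied site does not slip it past the firewall.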
In this chapter, you learned the fundamental principles and technology behind the Internet. The first thing we discussed is that the Internet is not some mysterious “thing,” but rather a collection of local area networks that are connected together by high-speed public WAN connections. The make-up of the Internet can be compared to a major city’s roadways. Each client (vehicle) that traverses the Internet must first go through an access point, or ISP, that is similar to an access ramp on a highway. In turn, the access points connect to other access points, changing roads along the way, and eventually connecting to the Internet backbone through a backbone provider. When you travel along a roadway looking for a particular destination, you are usually trying to locate a specific address. The Internet has its own addressing system that uses numbers to locate a specific web site. Because humans have a difficult time remembering many numbers (after all, we’re not computers), addresses on the Internet have been converted to character names that are more easily remembered. Domain Name Services (DNS) allows for the translation of the easily remembered character addresses into the numeric addresses that computers use. We also introduced you to the basic hierarchical structure of DNS, that it is formed from a root domain that splits into (currently) seven top-level domains. After we delved into how your computer communicates to a particular web site, we discussed several different forms of Internet servers and the types of services that they provide. You were able to see some of the “behind the scenes” communication that occurs whenever you request a web document, download e-mail, and so forth.
Exam Essentials Understand how the Internet is formed from interconnected networks. Identify the different roles of ISPs, from backbone ISPs to network access points, and how they interconnect to each other through routers and WAN links. Identify the different types of servers and their function. Distinguish between e-mail servers, web servers, FTP server, and so on. Know how each one works and what type of content or service it provides on the Internet.
Know the DNS hierarchical structure and be able to identify the top-level domains. Understand that DNS works from a hierarchical structure, with a root domain and seven (current) top-level domains. Identify the content that belongs in each of the seven top-level domains. Identify the different DNS entry types. Know the difference between an address record, a mail exchange record, and a common name (CNAME) record. Identify each record when it is presented to you. Understand the function of Nslookup. Know that Nslookup is a mechanism used to look up information regarding a specific domain.
Key Terms
Before you take the exam, be certain you are familiar with the following terms: record
Review Questions
1. Which protocol is used between a web server and a web browser when HTML documents are downloaded during a web browsing session?
A. HTML
B. HTTP
C. FTP
D. TELNET
2. NSLookup provides information on which of the following?
A. Person
B. Domain name
C. Web site
D. ISP
3. What TCP port do HTTP requests use by default?
A. 80
B. 25
C. 13
D. 8
4. What TCP port do SMTP requests use by default?
A. 13
B. 21
C. 25
D. 80
5. Which TCP/IP suite protocol is primarily used to download Internet e-mail from an e-mail server?
A. SMTP
B. HTTP
C. HTML
D. POP3
6. Which TCP/IP service resolves host names into IP addresses and vice versa?
A. HTTP
B. DNS
C. POP3
D. SMTP
7. Which command during an SMTP communications session indicates the actual e-mail recipient to the receiving computer?
A. rcpt to
B. data
C. helo
D. mail to
8. Which items are sent by the receiving server to the sending entity during an SMTP communications session?
A. IP address of receiving server
B. DNS name of receiving server
C. Name and version of receiving server
D. Name and version of sending entity
9. An ISP that connects directly to the Internet, rather than another ISP, is called a(n) _____.
A. top-level ISP
B. backbone provider
C. backbone ISP
D. Internet service
10. Which protocol(s) can be used to download files from an Internet server?
A. HTTP
B. TELNET
C. FTP
D. FILE
11. To which component of the Internet can individual users buy modem connections so that they can get on the Internet?
A. Backbone ISP
B. Access point ISP
C. WAN connection
12. Which of the following is a common name record?
A. mail.company.com IN
B. host.company.com.
13. Which HTTP request method allows a browser to indicate that it wants a specific file?
A. GET
B. PUT
C. HOLD
D. HEAD
14. Which HTTP request method allows a browser to upload a file to a web server?
A. GET
B. PUT
C. HOLD
D. HEAD
15. Which DNS root-level domain is classified for schools, colleges, and other educational institutions?
A. sch
B. col
C. edu
D. com
16. Which DNS root-level domain is classified for commercial entities?
A. com
B. edu
C. org
D. gov
Answers to Review Questions
1. B. HTTP is the protocol used for this process. Although FTP can be used for downloads, it is generally not used during the web browsing session to download HTML files to the browser. HTML and TELNET are invalid answers.
2. B. While a web site could have an entire domain name to itself, some domains host more than one web site. ISPs are entities that are looked up by browsing the Web, and not by a utility.
3. A. HTTP is used on TCP port 80 (by default). Port 25 is used for SMTP, and ports 13 and 8 are for other uses not discussed in this chapter.
4. C. SMTP uses TCP port 25, HTTP uses port 80, FTP uses port 21, and TCP port 13 is used for a special purpose called Daytime (not discussed in this chapter).
5. D. POP3 is the protocol used to download mail from an e-mail server. SMTP is used to send (upload) mail to an e-mail server. HTTP and HTML generally do not get involved in the client-server e-mail process.
6. B. Domain Name Services (DNS) resolves host names into IP addresses (and vice versa). HTTP is used for web requests, and POP3 and SMTP are used for receiving and sending e-mail.
7. A. data designates the body of the message, and helo starts the communication session. mail to is not an actual command.
8. A, C. The only items that are sent during an SMTP communications session are the IP address of the receiving server and the name and version of the receiving server.
9. B. While backbone ISP may sound correct, the technical term is backbone provider.
10. A, C. Both HTTP and FTP can be used to download files from an Internet server. TELNET is used to control a Unix host remotely, and FILE tells the browser to go and get a file stored either locally on the computer or over a LAN.
11. B. Backbone ISPs and WAN connections form the main structure of the Internet. Generally speaking, it is prohibitively expensive to get either a connection to a backbone ISP or your own WAN connection.
12. C. A record with an A in it is an address record, so A is the wrong choice. MX records belong to a mail exchange, so that is wrong as well. D is wrong, as a common name record is denoted by CNAME.
13. A. GET is used to request a specific file from a web server. PUT is used for uploading files, HEAD retrieves only the header information, and HOLD is not a valid method.
14. B. PUT is used for uploading files, HEAD is used to retrieve only the header information, and HOLD is not a valid answer.
15. C. edu is used for educational institutions, com is for commercial companies, and the other two don't exist.
16. A. com is used for commercial entities, edu is for educational institutions, org is generally for nonprofit organizations, and gov is for government institutions.
17. D. rcpt to indicates the recipient, HELO starts the communications session, and data indicates the body of the message.
18. A. POP3 is used to download new mail from an e-mail server. SMTP is used for sending Internet e-mail. LDAP is used for directory queries, and TCP is a transport protocol.
19. D. GET is for retrieving the entire body, PUT is for uploading files, and HOLD doesn't exist.
20. B. Modems and WAN links are communications lines to and from the Internet access point, so A and C are wrong. A router is a networking device used by networks to route information, so choice D is wrong. An ISP provides access to the Internet, so choice B is correct.
Protocols I-NET+ EXAM OBJECTIVES COVERED IN THIS CHAPTER: 3.4 Understand and be able to describe the capabilities of popular remote access protocols. Content could include the following:
SLIP
PPP
PPTP
L2TP
PPPOE
Point-to-point/multipoint
3.5 Understand how various protocols or services apply to the function of their corresponding server, such as a mail server, a web server, or a file transfer server. Content could include the following:
Now that you've seen the basic infrastructure of the Internet, and some of the different forms of communication between client and server, it's time to take a closer look at how information moves across its boundaries—the protocol. A protocol is nothing more than a set of rules that govern a particular operation. The Internet has many protocols, but the ones you're interested in pertain to network communications. Network communications take place using network communications protocols. A network communications protocol is a set of rules that govern network communications. If two computers are going to communicate, they both must be using the same protocol. The Internet uses many different protocols (most are a subset of the TCP/IP protocol suite, which is discussed below). Each protocol governs a specific function (like e-mail, web browsing, and file transfer). In this chapter, you will learn about protocols, their functions, and which protocols are used on the Internet.
The TCP/IP Protocol Suite
The Transmission Control Protocol/Internet Protocol (TCP/IP) is really a collection, or suite, of protocols that operate together to provide data transport services for the Internet. The TCP/IP protocol suite is the only protocol suite used on the Internet. Because TCP/IP is so central to working with the Internet and intranets, we'll discuss it in detail. Then we'll discuss some of the protocols that make up the TCP/IP protocol suite. We'll start with
some background on TCP/IP and how it came about and then describe the technical goals defined by the original designers.
An intranet is a private network that is based on Internet technology. It uses HTTP documents (web pages) to easily disseminate information—employee benefits, company documents, company events, and so on—to company employees. Intranets are not usually accessible to non-company employees for security reasons, and, until a recent move in the industry to allow employees to telecommute, were not accessible from the Internet. Because intranets use the same technologies as the Internet, most of the information found in this book applies to intranets as well.
A Brief History of TCP/IP The TCP/IP protocol was first proposed in 1973, but it was not until 1983 that a standardized version was developed and adopted for wide use. In that same year, TCP/IP became the official transport mechanism for all connections to ARPAnet, a forerunner of the Internet. Much of the original work on TCP/IP was done at the University of California, Berkeley, where computer scientists were also working on the Berkeley version of Unix (which eventually grew into the Berkeley Software Distribution [BSD] series of Unix releases). TCP/IP was added to the BSD releases, which in turn were made available to universities and other institutions for the cost of a distribution tape. Thus, TCP/IP began to spread in the academic world, laying the foundation for today's explosive growth of the Internet. During this time, the TCP/IP family continued to evolve and add new members. One of the most important aspects of this growth was the continuing development of the certification and testing program carried out by the U.S. government to ensure that the published standards, which were free, were met. Publication ensured that the developers did not change anything or add any features specific to their own needs. This open approach has continued to the present day; use of the TCP/IP family of protocols virtually guarantees a trouble-free connection between many hardware and software platforms.
TCP/IP Design Goals When the U.S. Department of Defense began to define the TCP/IP network protocols, their design goals included the following:
It had to be independent of all hardware and software manufacturers. Even today, this is fundamentally why TCP/IP makes such good sense in the corporate world; it is not tied to IBM, Novell, Microsoft, DEC, or any other specific company.
It had to have good built-in failure recovery. Because TCP/IP was originally a military proposal, the protocol had to be able to continue operating even if large parts of the network suddenly disappeared from view, say after an enemy attack.
It had to handle high error rates and still provide completely reliable end-to-end service.
It had to be efficient with a low data overhead. The majority of data packets using the IP protocol have a simple, 20-byte header, which means better performance when compared with other networks. A simple protocol translates directly into faster transmissions, giving more efficient service.
It had to allow the addition of new networks without any service disruptions.
As a result, TCP/IP was developed with each component performing unique and vital functions that allowed all the problems involved in moving data between machines over networks to be solved in an elegant and efficient way. The popularity that the TCP/IP family of protocols enjoys today did not arise just because the protocols were there or even because the U.S. government mandated their use. They are popular because they are robust, solid protocols that solve many of the most difficult networking problems and do so elegantly and efficiently. Let’s now examine the two major components of the TCP/IP protocol suite, the Transmission Control Protocol and the Internet Protocol, as well as their makeup and functions.
Benefits of Using TCP/IP Rather Than Other Networking Protocols for the Internet There are several reasons why TCP/IP was chosen as the primary protocol for the Internet:
TCP/IP is a widely published open standard and is completely independent of any hardware or software manufacturer. Of all the protocols in use today, it is the most ubiquitous and, because of its widespread availability, is a natural choice for the Internet.
TCP/IP can send data between different computer systems running completely different operating systems, from small PCs all the way to mainframes and everything in between.
TCP/IP is separated from the underlying hardware and will run over Ethernet, Token Ring, or X.25 networks and even over dial-up telephone lines. Because of this feature, the Internet can use many different types of physical media, including phone lines and network links.
TCP/IP is a routable protocol, which means it can send datagrams over a specific route, thus reducing traffic on other parts of the network.
TCP/IP has reliable and efficient data-delivery mechanisms. This is a major advantage on the Internet when links constantly go up and down.
TCP/IP uses a common addressing scheme. Therefore, any system can address any other system, even in a network as large as the Internet. (This addressing scheme will be covered in “Understanding IP Addressing” later in this chapter.)
The Transmission Control Protocol The Transmission Control Protocol (TCP) serves to ensure reliable, verifiable data exchange between hosts on a network. TCP breaks data into pieces, wraps each piece with the information needed to route it to its destination, and reassembles the pieces at the receiving end of the communications link. The wrapped and bundled pieces are called datagrams. TCP puts on the datagram a header that provides the information needed to get the data to its destination. The most important information in the header includes the source and destination port numbers, a sequence number for the datagram, and a checksum. Because it can ensure delivery, TCP is known as a connection-oriented protocol.
The source port number and the destination port number allow the data to be sent back and forth to the correct process running on each computer. The sequence number allows the datagrams to be rebuilt in the correct order in the receiving computer, and the checksum allows the protocol to check whether the data sent is the same as the data received. It does this by first totaling the contents of a datagram and inserting that number in the header. This is when IP enters the picture. Once the header is in the datagram, TCP passes the datagram to IP to be routed to its destination. The receiving computer then performs the same calculation, and if the two calculations do not match, an error occurred somewhere along the line and the datagram is resent. Figure 3.1 shows the layout of the datagram with the TCP header in place. FIGURE 3.1
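The "totaling" described above is, more precisely, the 16-bit ones'-complement Internet checksum defined in RFC 1071 (TCP actually computes it over the header, data, and a pseudo-header; this sketch shows just the core summing step):

```python
def inet_checksum(data: bytes) -> int:
    """Compute the 16-bit ones'-complement Internet checksum (RFC 1071)."""
    if len(data) % 2:                 # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # next 16-bit big-endian word
        total = (total & 0xFFFF) + (total >> 16)  # fold any carry back in
    return ~total & 0xFFFF

# The receiver repeats the sum over data plus checksum; a result of 0
# means the bytes arrived intact.
segment = b"\x45\x00\x00\x28"         # illustrative bytes only
cksum = inet_checksum(segment)
verify = inet_checksum(segment + cksum.to_bytes(2, "big"))
```

If any bit of the segment changes in transit, the receiver's recomputation no longer yields zero, and the datagram is discarded and resent.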
A datagram with its TCP header
In addition to the source and destination port numbers, the sequence number, and the checksum, a TCP header contains the following information: Acknowledgment Number Indicates that the data was received successfully. If the datagram is damaged in transit, the receiver throws the data away and does not send an acknowledgment back to the sender. After a predefined time-out expires, the sender retransmits data for which no acknowledgment was received. Offset Specifies the length of the header. Reserved Bits set aside for future use. Flags Indicates that this packet is the end of the data or that the data is urgent. Window Specifies how much data the receiver is currently prepared to accept, providing flow control between the two ends of the connection. Urgent Pointer Gives the location of urgent data.
Options A set of variables reserved for future use or for special options as defined by the user of the protocol. Padding Ensures that the header ends on a 32-bit boundary. The data in the packet immediately follows this header information.
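Laid end to end, these fields make up a fixed 20-byte header (plus any options). A sketch that unpacks them with Python's standard struct module, using an illustrative SYN segment built on the spot:

```python
import struct

def parse_tcp_header(header: bytes) -> dict:
    """Unpack the fixed 20-byte TCP header into its named fields."""
    (src_port, dst_port, seq, ack,
     offset_reserved, flags, window,
     checksum, urgent_ptr) = struct.unpack("!HHIIBBHHH", header[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "sequence": seq,
        "acknowledgment": ack,
        "offset": offset_reserved >> 4,   # header length in 32-bit words
        "flags": flags,                   # e.g., 0x02 = SYN
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent_ptr,
    }

# Build a sample header: source port 1025 -> destination port 80 (HTTP),
# sequence number 1, SYN flag set, 8192-byte window.
sample = struct.pack("!HHIIBBHHH", 1025, 80, 1, 0, 5 << 4, 0x02, 8192, 0, 0)
fields = parse_tcp_header(sample)
```

The port values here are illustrative; on a real network you would read these bytes from a raw socket or a packet capture.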
A Summary of TCP Communications You must remember a few things specifically about TCP communications:
Flow control allows two systems to cooperate in datagram transmission to prevent overflows and lost packets.
Acknowledgment lets the sender know that the recipient has received the information.
Sequencing ensures that packets arrive in the proper order.
Checksums allow easy detection of lost or corrupted packets.
Retransmission of lost or corrupted packets is managed in a timely way.
The Internet Protocol The network routing and addressing portion of TCP/IP is called Internet Protocol (IP). This protocol is what actually moves the data from point A to point B, in a process called routing. IP is referred to as a connectionless protocol; that is, it does not swap control information (or handshaking information) before establishing an end-to-end connection and starting a transmission. The Internet Protocol must rely on TCP to determine that the data arrived successfully at its destination and to retransmit the data if it did not. IP’s only job is to route the data to its destination. In this effort, IP inserts its own header in the datagram once it is received from TCP. The main contents of the IP header are the source and destination addresses, the protocol number, and a checksum.
You may sometimes hear IP described as unreliable because it contains only minimal error detection or recovery code.
Without the header provided by IP, intermediate routers between the source and destination, commonly called gateways, would not be able to determine where to route the datagram. Figure 3.2 shows the layout of the datagram with the TCP and IP headers in place. FIGURE 3.2
A datagram with TCP and IP headers
Take a look at the fields in the IP header: Version Defines the IP version number. Version 4 is the current standard, and higher values indicate that experimental or next-generation protocols are being used (6 is IPv6). IHL (Internet Header Length) Defines the length of the header information. The header length can vary; the default header is five 32-bit words, and the sixth word is optional. TOS (Type of Service) Indicates the kind or priority of the required service. Total Length Specifies the total length of the datagram, which can be a maximum of 65,535 bytes; every host must be prepared to accept datagrams of at least 576 bytes. Identification Provides information that the receiving system can use to reassemble fragmented datagrams. Flags The first flag bit specifies that the datagram should not be fragmented and therefore must travel over subnetworks that can handle the size without fragmenting it; the second flag bit indicates that the datagram is the last of a fragmented packet. Fragmentation Offset Indicates the original position of the data and is used during reassembly. Time to Live Originally, the time in seconds that the datagram could be in transit; if this time was exceeded, the datagram was considered lost. Now interpreted as a hop count and usually set to the default value 32 (for 32 hops), this number is decreased by each router through which the packet passes.
Protocol Identifies the protocol type, allowing the use of non-TCP/IP protocols. A value of 6 indicates TCP, and a value of 17 indicates User Datagram Protocol (UDP). Header Checksum An error-checking value that is recalculated at each stopover point; necessary because certain fields change. TCP Header The header added by the TCP part of the protocol suite. The data in the packet immediately follows this header information.
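The IP header fields can be unpacked the same way as the TCP header. A sketch for the 20-byte header with no options (all addresses are illustrative):

```python
import struct

def parse_ip_header(header: bytes) -> dict:
    """Unpack the 20-byte fixed IPv4 header (no options)."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s",
                                                     header[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl": ver_ihl & 0x0F,           # header length in 32-bit words
        "total_length": total_len,
        "ttl": ttl,
        "protocol": proto,               # 6 = TCP, 17 = UDP
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }

# Version 4, IHL 5, 40-byte datagram, TTL 32, protocol 6 (TCP).
sample = struct.pack("!BBHHHBBH4s4s", (4 << 4) | 5, 0, 40, 1, 0,
                     32, 6, 0, bytes([192, 168, 1, 1]), bytes([10, 0, 0, 7]))
fields = parse_ip_header(sample)
```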
Gateways and Routing As we mentioned, routing is the process of getting your data from point A to point B. Routing datagrams is similar to driving a car. Before you drive off to your destination, you determine which roads you will take to get there. Along the way, sometimes you encounter congestion on the road and have to alter your route. The IP portion of the TCP/IP protocol inserts its header in the datagram, but before the datagram can begin its journey, IP determines whether it knows the destination. If it does, IP sends the datagram on its way. If it doesn’t know and can’t find out, IP sends the datagram to the host’s default gateway. Each host on a TCP/IP network has a default gateway, an off-ramp for datagrams not destined for the local network. They’re going somewhere else, and the gateway’s job is to forward them to that destination if it knows where it is. Each gateway has a defined set of routing tables that tell the gateway the route to specific destinations. Because gateways don’t know the location of every IP address, they have their own gateways that act just like any TCP/IP host. In the event that the first gateway doesn’t know the way to the destination, it forwards the datagram to its own gateway. This forwarding, or routing, continues until the datagram reaches its destination. The entire path to the destination is known as the route. Datagrams intended for the same destination may actually take different routes to get there. Many variables determine the route. For example, overloaded gateways may not respond in a timely manner or may simply refuse to route traffic and time out. That time-out causes the sending gateway to seek an alternate route for the datagram.
Routes can be predefined and made static, and alternate routes can be predefined, providing a maximum probability that your datagrams travel via the shortest and fastest route.
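A gateway's routing decision, choose the most specific route you know and fall back to the default gateway otherwise, can be sketched with Python's standard ipaddress module (the table entries here are illustrative, not real routes):

```python
import ipaddress

# A toy routing table: (destination network, next-hop gateway).
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), "192.168.1.1"),
    (ipaddress.ip_network("10.0.0.0/8"), "10.0.0.1"),
    (ipaddress.ip_network("0.0.0.0/0"), "203.0.113.1"),  # default gateway
]

def next_hop(destination: str) -> str:
    """Pick the most specific (longest-prefix) route that matches."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, gw) for net, gw in ROUTES if dest in net]
    net, gw = max(matches, key=lambda m: m[0].prefixlen)
    return gw
```

The 0.0.0.0/0 entry matches everything, so any datagram not destined for a known network is handed to the default gateway, exactly the "off-ramp" behavior described above.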
Ports and Sockets Explained On a TCP/IP network, data travels from a port on the sending computer to a port on the receiving computer. A port is an address that identifies the application associated with the data. The source port number identifies the application that sent the data, and the destination port number identifies the application that receives the data. All ports are assigned unique 16-bit numbers in the range 0 through 65,535. Today, the very existence of ports and their numbers is more or less transparent to the users of the network because many ports are standardized. Thus, a remote computer knows which port it should connect to for a specific service. For example, most servers that offer Telnet services do so on port 23, and web servers normally run on port 80. This means that when you dial up the Internet to connect to a web server, you automatically connect to port 80, and when you use Telnet, you automatically connect to port 23. The TCP/IP protocol uses a modifiable lookup table to determine the correct port for the data type. Table 3.1 lists some of the well-known port numbers for common protocols.
Many companies are now allowing remote administration of their networks, including their firewalls, across the Internet; however, hackers are well aware of these standard port numbers. If you’re going to set up remote administration via the Internet, you do not want to use the standard port numbers. To keep your network safe, never use the standard ports for administration purposes.
TABLE 3.1 Well-Known Port Numbers for Common Protocols

Number    Protocol
79        Finger
80        Hypertext Transfer Protocol (HTTP)
110       Post Office Protocol 3 (POP3)
119       Network News Transfer Protocol (NNTP)
In multi-user systems, a program can define a port on-the-fly if more than one user requires access to the same service at the same time. Such a port is known as a dynamically allocated port and is assigned only when needed, for example, when two remote computers dial in to a third computer and simultaneously request Telnet services on that system. The combination of an IP address (more on IP addresses in a moment) and a port number is known as a socket. A socket identifies a single network process in terms of the entire Internet. You may hear or see the words socket and port used as if they were interchangeable terms, but they are not. Two sockets, one on the sending system and one on the receiving host, are needed to define a connection for connection-oriented protocols, such as TCP. Sockets were first developed as a part of the BSD Unix system kernel, in which they allow processes that are not running at the same time or on the same system to exchange information. You can read data from or write data to a socket just as you can with a file. Socket pairs are bi-directional so that either process can send data to the other.
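Both ideas, that a socket is an IP address plus a port, and that a connected socket pair can be read and written like a file, can be seen with Python's standard socket module. A minimal loopback echo sketch (binding to port 0 asks the operating system for a dynamically allocated port):

```python
import socket
import threading

# A one-shot echo server on the loopback interface.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))           # port 0 = any free ephemeral port
server.listen(1)
port = server.getsockname()[1]          # the server socket = 127.0.0.1 + port

def echo_once():
    conn, _addr = server.accept()       # second socket of the pair
    conn.sendall(conn.recv(1024))       # read from the socket, write it back
    conn.close()

threading.Thread(target=echo_once).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"hello")                # write to the socket...
reply = client.recv(1024)               # ...and read the echo back
client.close()
server.close()
```

The connection is fully described by its two sockets: the client's ephemeral address/port pair and the server's listening pair.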
Understanding IP Addressing
As you saw in “The Internet Protocol” section earlier in this chapter, IP moves data between computer systems in the form of a datagram, and each datagram is delivered to the destination IP address contained in the datagram header. This destination address is a standard 32-bit number that contains enough information to identify the receiving network as well as the specific host on that network for which the datagram is intended.
In this section, we’ll go over what IP addresses are, why they are so necessary, and how they are used in TCP/IP networking. But first, we need to clear up a possible source of confusion—Ethernet addresses and IP addresses.
Ethernet Addresses Explained You may remember that in an earlier section we mentioned that TCP/IP is independent of the underlying network hardware. If you are running on an Ethernet-based network, be careful not to confuse the Ethernet hardware address and the IP address required by TCP/IP. Each Ethernet network card (and any other NIC, for that matter) has its own unique hardware address, known as the media access control (MAC) address. This hardware address is predefined and preprogrammed on the NIC by the manufacturer of the board as a unique 48-bit number. The first three bytes of this address are called the OUI (Organizationally Unique Identifier) and are assigned by the Institute of Electrical and Electronics Engineers (IEEE). Manufacturers purchase OUIs in blocks and then assign the last three bytes of the MAC address, making each assignment unique. Remember, the Ethernet address is predetermined and is hard-coded onto the NIC. A MAC address is a Data Link layer address used in the header of an Ethernet frame. An IP address is a Network layer address. IP addresses are very different; let’s take a look.
IP Addresses Explained TCP/IP requires that each computer on a TCP/IP network have its own unique IP address. An IP address is a 32-bit number, usually represented as a four-part number with each of the four parts separated by a period or decimal point. You may also hear this method of representation called dotted decimal or quad decimal. In the IP address, each individual byte, or octet as it is sometimes called, can have a value in the range 0 through 255.
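The dotted-decimal form is simply the four octets of the underlying 32-bit number written out in decimal. A sketch of the conversion in both directions:

```python
def to_dotted_decimal(address: int) -> str:
    """Render a 32-bit IP address as four dot-separated octets."""
    return ".".join(str((address >> shift) & 0xFF) for shift in (24, 16, 8, 0))

def from_dotted_decimal(dotted: str) -> int:
    """Pack four octets back into a single 32-bit number."""
    value = 0
    for octet in dotted.split("."):
        value = (value << 8) | int(octet)
    return value

# 192.168.1.1 is the 32-bit value 0xC0A80101.
addr = from_dotted_decimal("192.168.1.1")
```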
The term octet is the Internet community’s own term for an 8-bit byte, and it came into common use because some of the early computers attached to the Internet had bytes of more than 8 bits; some of DEC’s early systems had bytes of 18 bits.
The way these addresses are used varies according to the class of the network, so all you can say with certainty is that the 32-bit IP address is divided in some way to create an address for the network and an address for each host. In general, though, the higher-order bits of the address make up the network part of the address, and the rest constitutes the host part of the address. In addition, the host part of the address can be divided further to allow for a subnet address. We’ll be looking at all this in more detail in the “IP Address Classifications” and “Understanding Subnets” sections later in this discussion. Some host addresses are reserved for special use. For example, in all network addresses, host numbers cannot be all 0s or all 255s. An IP host address with all host bits set to 0 identifies the network itself; so 52.0.0.0 refers to network 52. An IP address with all host bits set to 255 is known as a broadcast address. The broadcast address for network 204.176 is 204.176.255.255. A datagram sent to this address is automatically sent to every individual host on the 204.176 network.
When addressing nodes, you can never give a host an address where the host portion of the IP address is all zeros or all ones.
ICANN (Internet Corporation for Assigned Names and Numbers) assigns and regulates IP addresses on the Internet. Just as ICANN works with registrars to assign domain names, you can get an IP address from one of the registrars. Typically, your Internet Service Provider (ISP) is already a registrar and can secure an IP address on your behalf. Another strategy is to obtain your address from a registrar and only use it internally until you are ready to connect to the Internet.
Intranets and Private IP Addresses If you are setting up an intranet and you don’t want to connect to the outside world through the Internet, the IP addresses that you use on your intranet don’t need to be registered with ICANN. These IP addresses are called private IP addresses because they are only valid on the local intranet. Registering your addresses with ICANN simply ensures that the addresses you propose to use are unique over the entire Internet. These addresses are known as public IP addresses because they are known and valid across the entire Internet. If you never connect to the Internet, there's no reason to worry about whether your private addresses duplicate those used on some network other than your own.
The current TCP/IP addressing scheme version is known as IP version 4, or IPv4 (there is a new revision that hasn’t been universally accepted yet, called IPv6, but you’ll learn about that later in this chapter). In the 32-bit IPv4 address, the number of bits used to identify the network and the host varies according to the network class of the address. You’ll need to know the classes, which are as follows:
Class A is used for very large networks only. The high-order bit in a Class A network is always 0, which leaves 7 bits available for the network number; after reserving network 0 and the loopback network 127, this allows 126 usable Class A networks. The remaining 24 bits of the address allow each Class A network to hold as many as 16,777,214 hosts. Examples of Class A networks include General Electric, IBM, Hewlett-Packard, Apple, Xerox, DEC, Columbia University, and MIT. All the possible Class A networks are in use, and no more are available.
Class B is used for medium-sized networks. The two high-order bits are always 10, and the remaining bits are used to define 16,384 networks, each with as many as 65,534 hosts attached. Examples of Class B networks include Microsoft and Exxon. All the Class B networks are in use, and no more are available.
Class C is for smaller networks. The three high-order bits are always 110, and the remaining bits are used to define 2,097,152 networks, but each network can have a maximum of only 254 hosts. Class C networks are still available from some ISPs.
Class D is a special multicast address and cannot be used for networks. The four high-order bits are always 1110, and the remaining 28 bits allow access to more than 268 million possible addresses.
Class E is reserved for experimental purposes. The first four bits in the address are always 1111.
Figure 3.3 illustrates the relationships among these classes and shows how the bits are allocated by ICANN.
Because the bits used to identify the class are combined with the bits that define the network address, you can draw the following conclusions from the size of the first octet, or byte, of the address:
A value of 126 or less indicates a Class A address. The first octet is the network number; the next three, the host address.
A value of exactly 127 is reserved as a loopback test address. If you send a message to 127.0.0.1, it should get back to you unless something is wrong with your computer. Using this number as a special test address has the unfortunate effect of wasting nearly 17 million possible IP addresses.
A value of 128 through 191 is a Class B address. The first two octets are the network number, and the last two are the host address.
A value of 192 through 223 is a Class C address. The first three octets are the network address, and the last octet is the host address.
A value greater than 223 indicates a reserved address.
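These first-octet rules translate directly into a simple test. A sketch:

```python
def address_class(first_octet: int) -> str:
    """Classify an IPv4 address by the value of its first octet."""
    if first_octet <= 126:
        return "A"            # first octet = network; last three = host
    if first_octet == 127:
        return "loopback"     # reserved for loopback testing
    if first_octet <= 191:
        return "B"            # first two octets = network
    if first_octet <= 223:
        return "C"            # first three octets = network
    return "reserved"         # Class D multicast and Class E experimental
```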
Another special address is 192.168.xxx.xxx, an address specified in RFC 1918 as being available for anyone who wants to use IP addressing on a private network but does not want to connect to the Internet. If you fall into this category, you can use this address without the risk of compromising someone else’s registered network address. RFC 1918 also reserves the 10.xxx.xxx.xxx networks and the 172.16.xxx.xxx through 172.31.xxx.xxx networks.
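In prefix notation, the three RFC 1918 blocks are 10.0.0.0/8, 172.16.0.0/12 (which spans 172.16 through 172.31), and 192.168.0.0/16. A sketch of a membership check using the standard ipaddress module:

```python
import ipaddress

# The three private address blocks reserved by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.x.x - 172.31.x.x
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(address: str) -> bool:
    """True if the address falls in one of the RFC 1918 private blocks."""
    addr = ipaddress.ip_address(address)
    return any(addr in block for block in RFC1918_BLOCKS)
```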
IPv4 vs. IPv6: The Next Generation With the explosive growth of the Internet, very few public IP addresses are left. There are no Class A addresses left; few, if any, Class Bs; and the Class C addresses that are available are strictly regulated. We are experiencing this shortage of public IP addresses mainly because the current IP addressing scheme, IP version 4 (IPv4 for short), uses a 32-bit addressing scheme that allows for only around 4.3 billion node addresses. Because of this shortage of IP addresses, a new version of IP, designated IPv6, has been specified and is just beginning to take hold in the industry. It uses a 128-bit addressing scheme that will allow for about 3.4 × 10^38 addresses (34 followed by 37 zeros). That’s enough for each person on earth to have more than a million IP addresses. Should be enough, don’t you think? As this book was being written (Summer 2001), IPv6 had still not received widespread use, but vendors have begun shipping products that support the new version. Once the products have become widely available, the switch to IPv6 should happen rapidly.
Understanding Subnets
The current IP addressing scheme provides a flexible solution to the task of addressing thousands of networks, but it is not without problems. The original designers did not envision the Internet growing as large as it has; at that time, a 32-bit address seemed so large that they quickly divided it into different classes of networks to facilitate routing rather than reserving more bits to manage the growth in network addresses. (Who ever thought we would need a PC with more than 640KB of memory?) To solve this problem, and to create a large number of new network addresses, another way of dividing the 32-bit address, called subnetting, was developed.

An IP subnet modifies the IP address by using host address bits as additional network address bits. In other words, the dividing line between the network address and the host address is moved to the right, creating additional networks but reducing the number of hosts that can belong to each network. When IP networks are subnetted, they can be routed independently, which allows a much better use of address space and available bandwidth.

To subnet an IP network, you define a bit mask known as a subnet mask, in which a bit pattern cancels out unwanted bits so that only the bits of interest remain. Working out subnet masks is one of the most complex tasks in network administration and is not for the faint of heart. If your network consists of a single segment (in other words, there are no routers on your network), you will not have to use this type of subnetting. But if you have two or more segments (or subnets), you will have to make some sort of provision for distributing IP addresses appropriately. You can do just that by using a subnet mask.

The subnet mask is similar in structure to an IP address in that it has four parts, or octets, but now it defines three elements (network, subnet, and host) rather than two (network and host). It works a bit like a template that, when superimposed on top of the IP address, indicates which bits in the IP address identify the network and which bits identify the host. If a bit is on in the mask, that equivalent bit in the address is interpreted as a network bit. If a bit is off in the mask, the bit is part of the host address. The 32-bit value is then converted to dotted-decimal notation. In general, you will use only one subnet mask on your network. A subnet is only known and understood locally; to the rest of the Internet, the address is still interpreted as a standard IP address.
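The masking operation itself is just a bitwise AND, octet by octet. A minimal sketch:

```python
def split_address(address: str, mask: str) -> tuple:
    """Apply a subnet mask with bitwise AND to separate network and host bits."""
    addr_octets = [int(o) for o in address.split(".")]
    mask_octets = [int(o) for o in mask.split(".")]
    network = [a & m for a, m in zip(addr_octets, mask_octets)]         # mask on
    host = [a & ~m & 0xFF for a, m in zip(addr_octets, mask_octets)]    # mask off
    return (".".join(map(str, network)), ".".join(map(str, host)))

# With mask 255.255.255.128, host 130 sits in the upper half-subnet:
net, host = split_address("192.168.1.130", "255.255.255.128")
```

Here the on bits of the mask select the network portion (192.168.1.128) and the off bits leave the host portion (host number 2 within that subnet).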
Table 3.2 shows how this all works for the most commonly used standard IP address classes.

TABLE 3.2 Default Subnet Masks for Standard IP Address Classes

Class     Subnet Mask Bit Pattern                Subnet Mask
Class A   11111111 00000000 00000000 00000000    255.0.0.0
Class B   11111111 11111111 00000000 00000000    255.255.0.0
Class C   11111111 11111111 11111111 00000000    255.255.255.0
Routers then use the subnet mask to extract the network portion of the address so that they can send the data packets along the proper route on the network. Because all the Class A and Class B networks are taken, you are most likely to encounter subnet-related issues when working with a Class C network. In the next section, we’ll describe in detail how to subnet a Class C network.
The Advantages of Subnetting Although subnetting is complex, it does have some advantages:
It reduces the size of routing tables.
It minimizes network traffic.
It isolates networks from others.
It maximizes performance.
It optimizes IP address space.
It enhances the ability to secure a network.
How to Subnet a Class C Network
You can subnet any class IP address, but the most common practice today is to subnet a Class C IP address block, so we’ll discuss that process here. How do you find out the values you can use for a Class C network subnet mask? Remember from the previous discussion that ICANN defines the leftmost three octets in the address, leaving you with the rightmost octet for your own network addresses? If your network consists of a single segment, you have the following subnet mask:

11111111 11111111 11111111 00000000

When expressed as a decimal number, your address is as follows:

255.255.255.0

Because all of your addresses must match these leftmost 24 bits, you can do what you like with the last 8 bits, given a couple of exceptions that we’ll look at in a moment. You might decide to divide your network into two equally sized segments, say with the numbers 1 through 127 as the first subnet (00000001 through 01111111 in binary) and the numbers 128 through 255 as the second subnet (10000000 through 11111111 in binary). Now the number inside the subnets can vary only in the last seven places, and the subnet mask becomes:

255.255.255.128

In binary this is:

11111111.11111111.11111111.10000000
Use the Windows Calculator in scientific mode (choose View > Scientific) to look at binary-to-decimal and decimal-to-binary conversions. Click the Bin (binary) button and then type the bit pattern that you want to convert. Click the Dec (decimal) button to display its decimal value; you can also go the other way and display a decimal number in binary form.
Now let’s get back to the exceptions we mentioned. The network number is the first number in each range, so the first subnet’s network number is X.Y.Z.0, and the second’s is X.Y.Z.128 (X, Y, and Z are the octets assigned by ICANN). The default router address or default gateway is the second number in each range, X.Y.Z.1 and X.Y.Z.129, and the broadcast address is the last address, or X.Y.Z.127 and X.Y.Z.255. You can use all the other addresses within the range as you see fit on your network. Table 3.3 describes how you can divide a Class C network into four equally sized subnets with a subnet mask of 255.255.255.192. This gives you 61 IP addresses on each subnet once you have accounted for the network, router, and broadcast default addresses.

TABLE 3.3 Class C Network Divided into Four Subnets

Network Number   Router Address   Broadcast Address
X.Y.Z.0          X.Y.Z.1          X.Y.Z.63
X.Y.Z.64         X.Y.Z.65         X.Y.Z.127
X.Y.Z.128        X.Y.Z.129        X.Y.Z.191
X.Y.Z.192        X.Y.Z.193        X.Y.Z.255
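A table like this can be generated programmatically with the standard ipaddress module. A sketch using the 192.0.2.0/24 documentation prefix as a stand-in for X.Y.Z.0:

```python
import ipaddress

# Divide a Class C-sized block into four equal /26 subnets
# (a /26 corresponds to the mask 255.255.255.192).
network = ipaddress.ip_network("192.0.2.0/24")
subnets = list(network.subnets(new_prefix=26))

# One row per subnet: network number, conventional router address
# (first usable host), and broadcast address.
rows = [(str(s.network_address),
         str(s.network_address + 1),
         str(s.broadcast_address)) for s in subnets]
```

Each /26 contains 64 addresses; subtracting the network, router, and broadcast addresses leaves the 61 usable host addresses mentioned above.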
Table 3.4 describes how you can divide a Class C network into eight equally sized subnets with a subnet mask of 255.255.255.224. This gives you 29 IP addresses on each subnet once you have accounted for the network, router, and broadcast default addresses.

TABLE 3.4 Class C Network Divided into Eight Subnets

Network Number   Router Address   Broadcast Address
X.Y.Z.0          X.Y.Z.1          X.Y.Z.31
X.Y.Z.32         X.Y.Z.33         X.Y.Z.63
X.Y.Z.64         X.Y.Z.65         X.Y.Z.95
X.Y.Z.96         X.Y.Z.97         X.Y.Z.127
X.Y.Z.128        X.Y.Z.129        X.Y.Z.159
X.Y.Z.160        X.Y.Z.161        X.Y.Z.191
X.Y.Z.192        X.Y.Z.193        X.Y.Z.223
X.Y.Z.224        X.Y.Z.225        X.Y.Z.255
ICANN no longer gives out addresses under the Class A, B, or C designations. Instead, it uses a method called Classless Inter-Domain Routing (CIDR), usually pronounced “cider.” CIDR networks are described as “slash x” networks; the x represents the number of bits in the IP address range that ICANN controls. This allows ICANN to define networks that fall between the old classifications and means that you can get a range of addresses much better suited to your needs than in times past. In CIDR terms, a network classified as a Class C network under the old scheme becomes a slash 24 network because ICANN controls the leftmost 24 bits and you control the rightmost 8 bits. Table 3.5 shows some example slash x network types.
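The relationship between a slash x prefix and its dotted-decimal subnet mask is mechanical: set the leftmost x bits of a 32-bit value and render it in dotted decimal. A sketch:

```python
def prefix_to_mask(prefix: int) -> str:
    """Convert a CIDR 'slash x' prefix length to a dotted-decimal mask."""
    bits = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF   # top `prefix` bits set
    return ".".join(str((bits >> s) & 0xFF) for s in (24, 16, 8, 0))
```

So a slash 24 network carries the familiar Class C default mask, while a slash 26 gives the four-subnet mask used in Table 3.3.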
The most common way of accessing the Internet for the majority of the people connected is to use some kind of remote access protocol. A remote access protocol manages the connection between a remote computer and a remote access server. Each remote access protocol allows a remote computer to access a remote network in some fashion. In the case of Internet connections, the remote network is the Internet and the remote access protocols allow the remote computer to submit requests and receive data from the Internet. Five primary remote access protocols are in use today:
Serial Line Internet Protocol (SLIP)
Point-to-Point Protocol (PPP)
Point-to-Point Tunneling Protocol (PPTP)
Layer 2 Tunneling Protocol (L2TP)
Point-to-Point Protocol over Ethernet (PPPoE)
Serial Line Internet Protocol (SLIP) In 1984, students at the University of California, Berkeley, developed SLIP for Unix as a way to transmit TCP/IP over serial connections (such as modem connections over POTS). SLIP operates at both the Physical and Data Link layers of the OSI model. Today, SLIP is found in many network operating systems in addition to Unix. It is being used less frequently with each passing year, though, because it lacks features when compared with other protocols. Although a low overhead is associated with using SLIP and you can use it to transport TCP/IP over serial connections, it does no error checking or packet addressing, can be used only on serial connections, and does not support encrypted password methods. SLIP is used today primarily to connect a workstation to the Internet or to another network running TCP/IP. Setting up SLIP for a remote connection requires a SLIP account on the host machine and usually a batch file or a script on the workstation. When SLIP is used to log in to a remote machine, a terminal mode must be configured after login to the remote site so that the script can enter each parameter. If you don’t use a script, you will have to establish the connection and then open a terminal window to log in to the remote access server manually.
It is difficult to create a batch file that correctly configures SLIP. Our advice is to avoid SLIP when possible and use PPP instead.
Point-to-Point Protocol (PPP) PPP is used to implement TCP/IP over point-to-point connections (for example, serial and parallel connections). It is most commonly used for remote connections to ISPs and LANs. PPP uses the Link Control Protocol (LCP) to communicate between a PPP client and a host. LCP tests the link between a client and a PPP host and specifies PPP client configuration. PPP can support several network protocols, and, because it features error checking and flow control and can run over many types of physical media, PPP has almost completely replaced SLIP. In addition, PPP can automatically configure TCP/IP and other protocol parameters. On the downside, high overhead is associated with using PPP, and it is not compatible with some older configurations. From the technician’s standpoint, PPP is easy to configure. Once you connect to a router using PPP, the router assigns all other TCP/IP parameters. This is typically done with DHCP (Dynamic Host Configuration Protocol). Within the TCP/IP protocol stack, DHCP is the protocol that is used to assign TCP/IP addressing information, including host IP address, subnet mask, and DNS (Domain Name System) configuration. This information can be assigned over a LAN connection or a dial-up connection. When you connect to an ISP, you are most likely getting your IP address from a DHCP server. PPP has two built-in authentication methods: Password Authentication Protocol (PAP) and the Challenge Handshake Authentication Protocol (CHAP).
Password Authentication Protocol (PAP) A simple authentication method that sends the username and password in unencrypted text (clear text). Challenge Handshake Authentication Protocol (CHAP) A more complex authentication method than PAP in which the username and password are encrypted before transmission.
Point-to-Point Tunneling Protocol (PPTP) PPTP is the Microsoft-created sibling to PPP. It is used to create virtual connections across the Internet using TCP/IP and PPP so that two networks can use the Internet as their WAN link yet retain private network security. PPTP is both simple and secure. To use PPTP, set up a PPP session between the client and server, typically over the Internet. Once the session is established, create a second dial-up session that uses PPTP to dial through the existing PPP session. The PPTP session tunnels through the existing PPP connection, creating a secure session. In this way, you can use the Internet to create a secure session between the client and the server. Also called a virtual private network, this type of connection is very inexpensive when compared with a direct connection. PPTP is a good idea for network administrators who want to connect several LANs but don’t want to pay for dedicated leased lines. But, as with any network technology, there can be disadvantages, including the following:
PPTP is not available on all types of servers.
PPTP is not a fully accepted standard.
PPTP is more difficult to set up than PPP.
Tunneling can reduce throughput.
You can implement PPTP in two ways. First, you can set up a server to act as the gateway to the Internet and to do all the tunneling. The workstations will run normally without any additional configuration. You would usually use this method to connect entire networks. Figure 3.4 shows two networks connected using PPTP. Notice how the TCP/IP packets are tunneled through an intermediate TCP/IP network (in this case, the Internet).
FIGURE 3.4 A PPTP implementation connecting two LANs over the Internet
The second way to use PPTP is to configure a single, remote workstation to connect to a corporate network over the Internet. The workstation is configured to connect to the Internet via an ISP, and the VPN client is configured with the address of the VPN remote access server. The VPN then exists between the VPN client and VPN server. PPTP is often used to provide VPN functions to connect remote workstations to corporate LANs when a workstation must communicate with a corporate network over a dial-up PPP link through an ISP and the link must be secure. An example of this configuration is shown in Figure 3.5. FIGURE 3.5
A workstation connected to a corporate LAN over the Internet using PPTP
Windows 98, Windows Me, NT 4, and Windows 2000 include this functionality. You must add it to Windows 95.
Layer 2 Tunneling Protocol (L2TP) The Layer 2 Tunneling Protocol (L2TP) was created from Microsoft’s PPTP protocol and Cisco’s Layer 2 Forwarding (L2F) protocol; it combines the strengths of the two protocols while addressing some of their shortcomings. L2F is a protocol that provides tunneling services that are not dependent upon the IP protocol, and it is used for remote access to a network over the Internet. Primarily, L2F is used with VPN technology. Without the dependency upon IP, L2F can work directly with WAN links, such as ATM and frame relay. Another major benefit of L2F is that it defines multiple connections within a single tunnel. L2TP uses PPP to establish a dial-up connection that can be tunneled through the Internet to a particular site. Rather than adopting L2F’s original tunneling protocol unchanged, L2TP redefines it based upon L2F’s design. When a connection is established, the built-in authentication mechanisms from PPP (PAP and CHAP) are used to authenticate to the remote system; however, L2TP does not define its own encryption mechanism to completely protect the tunnel that is created. Instead, it relies on IP Security (IPSec) technology to provide the encryption. IPSec is a collection of protocols that support secure Internet services, such as VPNs and encryption.
Point-to-Point Protocol over Ethernet (PPPoE)
With the advent of broadband media (xDSL, cable modem, wireless, and so on), access to high-speed networks has become possible. Unfortunately, providing seamless integration from the client to the ISP’s network has been difficult because typical ISPs use PPP for dial-up connections, xDSL providers use Ethernet, and so on. Point-to-Point Protocol over Ethernet (PPPoE) carries PPP over an Ethernet connection, allowing a client to use the protocol to connect to an ISP or broadband access provider.
In addition to TCP and IP, the TCP/IP protocol suite has provisions for several different protocols; each has a different function. The i-Net+ exam will test your basic knowledge of these protocols. The other TCP/IP protocols include (but are not limited to):
HTTP
FTP
POP3
SMTP
NNTP
LDAP
TELNET
Many of these protocols are covered in more detail in other chapters, but in the following sections, you’ll gain a basic understanding of what each protocol does.
Hypertext Transfer Protocol (HTTP) The Hypertext Transfer Protocol (HTTP) is the command and control protocol used to manage communications between a web browser and a web server. When you access a web page on the Internet or on a corporate intranet, you see a mixture of text, graphics, and links to other documents or other Internet resources. HTTP is the protocol that initiates the transport of each of the components of a web page.
For more info on the HTTP communications process, see Chapter 2. Also, the HTTP client is covered in Chapter 4.
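To make HTTP’s role concrete, the following sketch shows the raw text a browser sends over its connection to a web server when it requests a page. Python is used here purely for illustration (it is not part of the i-Net+ material), and the host and path are made-up examples.

```python
# Illustrative sketch: the text an HTTP client sends on the wire.
# The host and path below are placeholders, not taken from the chapter.

def build_http_request(host, path="/"):
    """Return the raw bytes of a minimal HTTP GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, resource, version
        f"Host: {host}",          # required header in HTTP/1.1
        "Connection: close",      # ask the server to close after responding
        "",                       # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

request = build_http_request("www.example.com", "/index.html")
print(request)
```

The server answers with a status line, its own headers, and then the HTML, images, or other content that make up the page.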
File Transfer Protocol (FTP) The File Transfer Protocol (FTP) is a TCP/IP protocol that provides a mechanism for single or multiple file transfers between computer systems; FTP is also the name of the client software used to access the FTP server running on the
remote host. The FTP package provides all the tools needed to look at files and directories, change to other directories, and transfer text and binary files from one system to another. The File Transfer Protocol uses TCP port 21 for its control connection (commands and replies); the files themselves are moved over a separate data connection. We’ll look at how to transfer files using FTP in detail in Chapter 4.
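FTP’s control channel carries only commands and numbered replies; when a passive-mode transfer begins, the server’s 227 reply encodes where the client should open the data connection. The Python sketch below (illustrative only; the reply string is a made-up example) decodes such a reply, where the data port is the fifth number times 256 plus the sixth.

```python
import re

def parse_pasv(reply):
    """Parse an FTP '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' reply
    into the (host, port) of the data connection.  The data port is
    p1 * 256 + p2."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if not match:
        raise ValueError("not a PASV reply: " + reply)
    h1, h2, h3, h4, p1, p2 = (int(n) for n in match.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2

host, port = parse_pasv("227 Entering Passive Mode (192,168,1,20,7,138)")
print(host, port)   # 192.168.1.20 1930
```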
Simple Mail Transfer Protocol (SMTP) The Simple Mail Transfer Protocol (SMTP) is the protocol responsible for moving messages from one e-mail server to another. It is also the protocol used to send e-mail from a client to an e-mail server. The e-mail servers run either Post Office Protocol (POP3) or Internet Mail Access Protocol (IMAP) to distribute e-mail messages to users. All e-mail servers that send e-mail to the Internet must be using TCP/IP and an e-mail program that can send e-mail using SMTP.
The SMTP communications process is covered in more detail in Chapter 2. Also, SMTP clients are discussed in more detail in Chapter 4.
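As a sketch of what an SMTP client prepares before handing mail to a server, the following Python example composes a message with the standard library. Python, the addresses, and the server name are all illustrative assumptions, not part of the book’s material; the actual hand-off (shown commented out) would need a reachable SMTP server.

```python
from email.message import EmailMessage

# Compose a message the way an e-mail client would before sending it.
# The addresses below are placeholders.
msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.org"
msg["Subject"] = "i-Net+ study notes"
msg.set_content("SMTP moves mail between servers; POP3 downloads it.")

# Handing the message to a mail server would look like this
# (not executed here, since it needs a reachable SMTP server):
#
# import smtplib
# with smtplib.SMTP("mail.example.com", 25) as server:   # SMTP's default port
#     server.send_message(msg)

print(msg["Subject"])
```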
Post Office Protocol 3 (POP3) Post Office Protocol 3 (POP3) is the protocol used to download mail from an Internet (SMTP) mail server. POP3 servers provide a storage mechanism for incoming mail. When a client connects to a POP3 server, the messages addressed to that client are typically downloaded all at once rather than selectively. Once the messages are downloaded, the user can delete or modify messages without further interaction with the server. In some locations, POP3 is being replaced by another standard, the Internet Mail Access Protocol (IMAP).
Network News Transfer Protocol (NNTP) This is the protocol used to transport Internet news (also called Usenet news) between news servers. It is also the protocol used to transport these news articles between news servers and news clients. This protocol is often confused with the Network Time Protocol (NTP), which serves a different purpose.
Lightweight Directory Access Protocol (LDAP) This protocol is seeing increased use as network directories see increased use. The Lightweight Directory Access Protocol (LDAP) is the protocol used to make simple requests of a network directory (like NDS, X.500, or Active Directory). LDAP requests can consist of requests for names, locations, and other information like phone numbers and e-mail addresses. Many web browsers (including Navigator and Internet Explorer) contain LDAP clients so they can request information from directory servers.
TELNET This protocol is a terminal emulation protocol that allows a workstation to perform a remote logon to another host over the network. It is used primarily to allow users at workstations to access a Unix server and run commands just as if they were sitting at the server’s console.
Looking at Protocols Graphically Figure 3.6 shows how some of the components we’ve been discussing fit together within the TCP/IP protocol suite. The top layers are the various Internet server applications. Notice that the top layers rely on different bottom layers for transport and other functions. FIGURE 3.6
The components in a TCP/IP block diagram
Refer back to Figure 3.6 to help you remember where each protocol fits in the TCP/IP protocol suite.
The TCP/IP protocol suite is the protocol used for all communications on the Internet. In this chapter, you learned how the DoD’s ARPANET developed the first TCP/IP network to create a reliable network that could withstand segment failure and, at the same time, allow different networks to talk to each other. We then delved into the two portions of TCP/IP and discovered that TCP is a connection-oriented protocol, while IP is a connectionless protocol. The most important element to understand about the TCP/IP protocols is that they rely on an addressing mechanism similar to how the post office delivers your mail. Every network device that attaches to the Internet must have an address, just as your house has an address. An IP address is a dotted-decimal number that falls into one of several classes (Class A through E); the class determines how many networks and hosts can be used with any given address. You also learned that subnet masks are used in conjunction with IP addresses to further divide the number of networks and hosts used on a network. In addition to addressing, TCP/IP protocols are separated by function. You learned that remote access protocols—SLIP, PPP, PPTP, L2TP, PPPoE—are used to provide dial-up connectivity and tunneling services to a client. Web pages are sent and received via the Hypertext Transfer Protocol (HTTP). E-mail is handled by a sending protocol (SMTP) and a receiving protocol (POP3). Network directories are accessed via the Lightweight Directory Access Protocol (LDAP).
Exam Essentials Understand the nature, purpose, and operational essentials of TCP/IP. The Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite is the primary protocol suite used on the Internet. Stations that use TCP/IP are assigned (either manually or automatically) a 32-bit, dotted-decimal number called an IP address. It is written as four decimal numbers separated by periods (xxx.xxx.xxx.xxx), where each number can range from 0 to 255.
Know the classes of IP addresses. IP addresses are characterized by their class. Table 3.6 details the classes of IP addresses based on the range of the first octet. TABLE 3.6
IP Address Classes, Address Ranges, and Default Subnet Masks
Class    First Octet Address Range    Default Mask
A        0–126                        255.0.0.0
B        128–191                      255.255.0.0
C        192–223                      255.255.255.0
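The first-octet boundaries in Table 3.6 can be expressed as a small function. The Python sketch below is purely illustrative (Python is not exam material); it maps a dotted-decimal address to its class letter and default subnet mask using the ranges from the table.

```python
def ip_class_and_mask(address):
    """Return the class letter and default subnet mask for a dotted-decimal
    IP address, based on the first-octet ranges in Table 3.6."""
    first = int(address.split(".")[0])
    if first <= 126:
        return "A", "255.0.0.0"
    if 128 <= first <= 191:
        return "B", "255.255.0.0"
    if 192 <= first <= 223:
        return "C", "255.255.255.0"
    return None, None   # 127 is loopback; 224 and up is Class D/E (no default mask)

print(ip_class_and_mask("18.204.37.112"))   # ('A', '255.0.0.0')
print(ip_class_and_mask("176.58.24.1"))     # ('B', '255.255.0.0')
```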
Understand the remote access protocols and their use. You learned the details of the three most popular remote access TCP/IP protocols: SLIP, PPP, and PPTP. SLIP is the most primitive with the fewest features. PPP is the most often used remote access protocol because it supports protocols other than TCP/IP and it supports error checking and flow control. Finally, PPTP is the protocol used to provide VPN services over TCP/IP. Know the various protocols that make up the TCP/IP protocol suite and their function. The TCP/IP protocol suite is made up of many different individual protocols, each with a different purpose and use. Table 3.7 outlines each protocol and its function. TABLE 3.7
TCP/IP Suite Protocols

Protocol    Function/Use
HTTP        Transports requests for Internet content from browsers to web servers and carries content back to the requesting browser
Review Questions

1. What is the default subnet mask for a Class C IP address?
   A. 255.0.0.0
   B. 255.255.0.0
   C. 255.255.255.0
   D. 255.255.255.255

2. Which TCP/IP suite protocol is used to transfer text and multimedia content between a web browser and a web server?
   A. SMTP
   B. HTTP
   C. POP3
   D. LDAP

3. The subnet mask 255.255.255.0 corresponds to what CIDR designation?
   A. /8
   B. /24
   C. /29
   D. /30

4. Which TCP/IP suite protocol is used for reliable, point-to-point TCP/IP remote access connections?
   A. PPP
   B. PPTP
   C. CIDR
   D. TCP

5. If you were given an IP address of 176.58.24.1 for your machine, but no subnet mask, what subnet mask could you use by default?
   A. 255.0.0.0
   B. 255.255.0.0
   C. 255.255.255.0
   D. 255.255.255.255

6. Which of the following addresses is an invalid TCP/IP address to assign to a host?
   A. 204.67.129.1
   B. 7.21.1.1
   C. 170.200.1.1
   D. 191.260.42.1

7. Which TCP/IP protocol(s) can be used for Internet mail?
   A. LDAP
   B. SMTP
   C. NNTP
   D. FTP

8. Which subnet mask corresponds to a CIDR designation of /8?
   A. 255.0.0.0
   B. 255.240.0.0
   C. 255.255.0.0
   D. 255.255.240.0

9. Which remote access technology allows secure TCP/IP network connections over the Internet?
   A. PPP
   B. SLIP
   C. SMTP
   D. PPTP

10. Which of the following TCP/IP addresses are considered broadcast addresses?
    A. 201.123.45.255
    B. 34.1.0.0
    C. 107.28.94.1
    D. 79.0.0.0

11. What is the default TCP port number for a POP3 connection?
    A. 21
    B. 25
    C. 80
    D. 110

12. Which protocol is used for Internet directory queries?
    A. SMTP
    B. LDAP
    C. POP3
    D. NNTP

13. What is the default subnet mask for the IP address 18.204.37.112?
    A. 255.0.0.0
    B. 255.255.0.0
    C. 255.255.255.0
    D. 255.255.255.255

14. Which TCP/IP protocol was developed first to allow TCP/IP over point-to-point connections, such as serial and parallel connections?
    A. SLIP
    B. PPTP
    C. PPP
    D. PPPoE

15. Which of the following addresses could you assign to an Internet host?
    A. 192.168.10.2
    B. 208.34.109.255
    C. 67.22.22.22
    D. 255.12.37.109

16. What is the default subnet mask for a Class B IP address?
    A. 255.255.0.0
    B. 0.0.0.255
    C. 255.255.255.0
    D. 255.0.0.0

17. Which protocol combines the strengths of Microsoft’s PPTP protocol and Cisco’s L2F protocol for dial-up connections?
    A. SLIP
    B. L2TP
    C. PPPoE
    D. Telnet

18. What is the TCP/IP protocol used for sending and receiving
Answers to Review Questions

1. C. 255.0.0.0 is Class A, 255.255.0.0 is Class B, and 255.255.255.255 is a broadcast address.

2. B. SMTP is used for sending e-mail, POP3 is used for downloading e-mail, and LDAP is used for directory queries.

3. B. A designation of /24 means the leftmost 24 bits refer to the network portion of a TCP/IP address and the rightmost 8 bits (the remainder) are used to assign host addresses. The leftmost 24 bits out of 32 bits corresponds to a subnet mask of 255.255.255.0.

4. A. Of all the answers listed, PPP is the only remote access protocol used for reliable, TCP/IP, point-to-point communications.

5. B. Because the IP address begins with 176, your address would be a Class B address. The default subnet mask for a Class B address is 255.255.0.0.

6. D. The largest number you can use in a TCP/IP address is 255. 260 is larger than 255, so it is invalid.

7. B. LDAP is used for Internet directory queries, NNTP is used for Internet news, and FTP is used for file transfer.

8. A. A CIDR designation of /8 means that the leftmost 8 bits of an address refer to the network portion. This corresponds to a subnet mask of 255.0.0.0.

9. D. PPTP is used for secure communications over the Internet. PPP and SLIP are primarily used for point-to-point communications, and SMTP is used for sending e-mail.

10. A. 201.123.45.255 is the only broadcast address because all host bits are set to 1 (255 in decimal). 107.28.94.1 is a valid IP address, and 34.1.0.0 and 79.0.0.0 are IP addresses that refer to a specific network.

11. D. 110 is used for POP3 communications. Port 21 is used for FTP communications, 25 is used for SMTP, and 80 is used for HTTP.

12. B. SMTP is used for sending Internet e-mail, POP3 is used to download Internet e-mail, and NNTP is used to send and receive Internet news.

13. A. 18.204.37.112 is a Class A address. 255.0.0.0 is the default subnet mask for a Class A address. 255.255.0.0 is for a Class B, 255.255.255.0 is for a Class C, and 255.255.255.255 is a reserved address.

14. C. SLIP does not support parallel connections. PPPoE is used for broadband communications, such as xDSL and cable modems. PPTP was developed by Microsoft as a later extension of PPP. PPP was the original protocol developed to support both serial and parallel connections.

15. C. 208.34.109.255 is a broadcast address, and 255.12.37.109 uses the reserved value 255 in its first octet. 192.168.10.2 is a reserved address for intranets and cannot be routed on the Internet.

16. A. Of the answers listed, 255.255.0.0 is the default subnet mask for a Class B IP address. Default subnet masks are defined as 255.0.0.0 for Class A, 255.255.0.0 for Class B, and 255.255.255.0 for Class C.

17. B. SLIP is a serial line access protocol and was one of the first developed. PPPoE is a developing standard that allows for PPP connections over Ethernet. Telnet is a terminal emulation program that allows you to connect to a network device or system remotely, but doesn’t use PPP or L2F. L2TP combined PPTP and L2F for a much more robust protocol.

18. D. FTP is used for uploading and downloading files, HTTP is the protocol used for transferring HTML documents and images from a web server, and LDAP is used for making directory queries.

19. B. SMTP is used for sending Internet e-mail, PPPoE is a remote access protocol, and Telnet is used to log in to a system remotely over a network or Internet connection.
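The subnet arithmetic behind several of these answers (questions 3, 8, and 10) can be double-checked programmatically. The sketch below uses Python’s standard ipaddress module purely for illustration; the module and the example network are not part of the exam material.

```python
import ipaddress

# /24 corresponds to a mask of 255.255.255.0 (question 3).
net = ipaddress.IPv4Network("201.123.45.0/24")
print(net.netmask)             # 255.255.255.0

# The broadcast address sets every host bit to 1 (question 10).
print(net.broadcast_address)   # 201.123.45.255

# A /8 mask keeps only the first octet for the network (question 8).
print(ipaddress.IPv4Network("10.0.0.0/8").netmask)   # 255.0.0.0
```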
Internet Clients and Their Configuration

I-NET+ EXAM OBJECTIVES COVERED IN THIS CHAPTER:

1.4 Understand and be able to describe the infrastructure needed to support an Internet client. Content may include the following:
Knowledge of client operating systems
Knowledge of web server platforms
Operating System TCP/IP stack configuration
Network connection
Web browser
E-mail client
Hardware platform
DHCP
Client software configuration
1.5 Use/configure Web browsers and other Internet/intranet clients, and be able to describe their use to others. Content may include the following:
Web browsers
FTP clients
Telnet clients
E-mail clients
All-in-one/universal clients
When to use each type of client
The basic commands (e.g., get and put) for each type of client (e.g., FTP, Telnet, POP3)
In Chapter 2, you learned about the different types of server communication methods that are used on the Internet, how they work, and what they are used for. In Chapter 3, you learned about the protocols that carry information to and from the client and server. The server and protocols are useless, however, without an interface between the human and the computer. This interface of hardware and software components is known as an Internet client. The Internet client formats server requests, sends the requests to the server, and displays the results when they are received from the server. In this chapter, you’ll learn about the most common clients used on the Internet, the requirements for using them, and how to configure them.
Internet Client Requirements
For an Internet client to be able to request data—web pages, e-mail, Internet news, files, and so on—from the Internet, there are some minimum requirements that must be met. These requirements include:
Hardware
Operating system
TCP/IP
Internet connection
While the purpose of some of these requirements is obvious, you should at least be aware that each must be present.
Hardware Hardware can be defined as any physical item that you can touch. Internet clients do require some type(s) of hardware to be able to run. The following sections discuss all of the hardware issues relating to Internet clients, including the following:
Hardware requirements
Internet client hardware platforms
Connection hardware
You’ll learn the impact each item has on Internet client use.
Hardware Requirements Each client software package has its own hardware requirements, usually listed on the side of the box or on the manufacturer’s web site. If the minimum hardware requirements aren’t met, the software either won’t run at all or will run poorly. The following list includes some of the hardware requirements you’ll come across for client software: Minimum processor speed Specifies the slowest possible processor (CPU) on which the client will run. Although the software will run if the processor in your PC is the same as this value, to realize the best possible performance it is commonly recommended that you have a processor in your computer that is newer (faster) than the specified processor. Minimum RAM Specifies the minimum amount of memory (RAM) you must have installed in your PC for the client software to run correctly. The specification is usually given in megabytes (MB). However, for best performance, make sure the RAM configuration in your computer exceeds this requirement.
Operating systems do have a maximum limit on how much RAM they can access, and exceeding that limit can actually degrade a PC’s performance. For example, one client that I assisted ordered 512 MB of RAM on a Microsoft Windows 98 computer from a vendor, and found that the computer ran at almost a crawl. Windows 98 has an optimum performance rating at 192 MB, and anything over 256 MB begins to slow down the machine. For more information on your particular operating system, please visit the vendor’s web site.
Hard disk space required Signifies how much disk space (megabytes, or MB) the client will require to be installed on your system. This number is usually pretty accurate, but it’s never a bad idea to have more than the minimum requirement.
Because many software companies are realizing that software won’t run well at the “minimum” requirements, some are now releasing “suggested” configurations. When at all possible, ensure that your computer is at the suggested hardware level rather than the minimum.
Internet Client Hardware Platforms Internet clients have to run on some type of electronic hardware device. These devices fall into one of two categories, each with its own merits and disadvantages. We’ll describe two of the platforms: the personal computer (PC) and the Internet appliance. Personal Computer Today, many homes have personal computers. A personal computer (PC) is the most common Internet client hardware platform—mainly because it can perform many different functions in addition to serving as an Internet client. A PC can also be used to play games, view movies, listen to your favorite CDs, and run productivity applications (like a word processor or a spreadsheet program). A PC’s main advantage is its ability to perform a number of tasks; however, its main disadvantage is its cost. The cost of a PC, though, continues to drop, putting computer ownership within almost everyone’s reach. In fact, it’s now possible to buy a PC for less than $1,000, which includes the entire system—even a printer! Internet Appliance Those who don’t have a PC in their home may instead have an Internet appliance, such as Microsoft’s WebTV. An Internet appliance is a device that you connect to your television and to a phone line to provide Internet access without a computer. Internet appliances usually come with a keyboard so you can type information into forms and search engines. If your main reason for owning a PC is to search the web, an Internet appliance may be a better choice. However, there are a few drawbacks:
You are required to sign up with the Internet appliance manufacturer’s Internet Service Provider (ISP).
There is little support for JavaScript or other client-side scripting technologies.
It can’t be used for other applications (for example, word processing).
You can’t install third-party utilities on it, such as drive ghosting software (creates a mirror image of the drive), task and scheduling utilities, graphics programs, and so on. If it’s not built into the “box,” the box probably can’t run it.
Other Devices Many other devices can be used as Internet hardware platforms, including cellular phones, Internet phones, and handheld PCs. Many different hardware devices are being created to enable different ways of accessing the Internet.
Connection Hardware Another piece of equipment you need to consider when setting up an Internet client is the connection hardware. Connection hardware is the device(s) you use to connect your computer to your ISP. If you are connecting to the Internet via a regular phone line, you’ll need a modem. As discussed in Chapter 1, a modem is a device that converts the digital signals (electrical impulses) from your computer into analog signals (tones) that can be transmitted over the telephone. When these signals reach the other end, the receiving modem converts the analog signals back to digital signals so the computer can understand what’s being transmitted. Most new computers come with a modem and Internet connection software already installed. If you are connecting your computer to a LAN that is already connected to the Internet, you must install a network interface card (NIC) to be able to get your PC on the Internet. As discussed in Chapter 1, the NIC converts the signals from your computer into a format the network can understand. The network administrator has already installed the hardware (such as routers, CSU/DSUs, and so on) that are required to connect the LAN to the Internet, so the NIC just connects your PC to the LAN and, thus, to the Internet.
Operating System (OS) In addition to having a computer of some sort, you must have an operating system installed on your computer so the computer knows how to run applications and do “useful” things (like browsing the Internet). An operating
system controls and manages all the functions of the computer on which it is installed. Additionally, it provides the interface between the user and the computer and its applications. For the i-Net+ exam, you must know that the computer you are using to connect to the Internet must have an operating system installed on it (you can’t use the computer without an OS). Furthermore, you must understand that for any Internet clients you install, your computer must be running the required OS version or else the client won’t install properly (or at all). For example, if you are installing a web browser and the OS requirements say, “For Windows 95/98,” that means this client only runs on the Windows 95 or Windows 98 operating system. If you try to install it on a Macintosh, it won’t work (it actually won’t even install). The operating systems that are currently in use include:
Microsoft Windows 95/98
Microsoft Windows Millennium Edition (Me)
Microsoft Windows NT
Microsoft Windows 2000
Unix and Linux
MacOS
Currently, Microsoft Windows products dominate the market; however, the industry has been taking a closer look at the many varieties of Linux products that are available, including Linux office suites that enhance productivity. Because you may encounter multiple operating systems in the real world, it is a good idea to learn something about each type of operating system (not to mention that doing so makes you more marketable!).
TCP/IP Protocol Stack Another requirement that all Internet clients have in common is that the TCP/IP protocol must be installed and running. The TCP/IP protocol stack is one of several protocol stacks. As discussed in Chapter 3, a protocol stack is a suite of protocols that work together, and the TCP/IP protocol stack is the one used on the Internet. If the TCP/IP protocol is not installed and configured correctly, the Internet clients will be unable to send data to and receive data from the Internet. Thankfully, most operating systems include TCP/IP support.
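A quick, informal way to see a working TCP/IP stack in action is to send a datagram to the loopback address, which never leaves the local machine. The Python sketch below is purely illustrative (Python is not part of the exam material) and works without any network link.

```python
import socket

# Sanity-check the local TCP/IP stack by sending a UDP datagram to
# ourselves over the loopback address (no network link required).
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))   # port 0: let the stack pick a free port
addr = receiver.getsockname()

sender.sendto(b"stack ok", addr)
data, _ = receiver.recvfrom(1024)
print(data)

sender.close()
receiver.close()
```

If the datagram comes back, the TCP/IP stack is installed and functioning; if this fails, no Internet client on the machine will work either.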
Although there are many client platforms, for the i-Net+ exam, you will only have to know how to configure Windows 95/98 clients, so we will cover only those in this section. Take note, however, that in the “real world,” it is to your advantage to know about the many different client platforms available (including Windows 95/98, Windows NT/2000, Linux, MacOS, OS/2, and so on) and how to configure each to connect to the Internet. The important items that we will discuss in the sections that follow are:
IP address configuration
Name resolution
Winsock compliance
While dial-up connections are also required, we will cover them later in this chapter.
IP Address Configuration The first step to IP address configuration is to ensure that TCP/IP has been installed on the client. You can double-check that it has been installed by following these steps: 1. Open the Network Control Panel (found in Start > Settings > Control Panel in Windows 95/98), and see if TCP/IP is listed (as shown in the following screen shot).
2. If TCP/IP isn’t listed, click Add, and the Select Network Component
Type dialog box appears.
3. Select Protocol from the list of components and then click Add. The
Select Network Protocol window appears from which you can pick the manufacturer and the appropriate protocol. For TCP/IP, select Microsoft from the list on the left and TCP/IP from the list on the right. Click OK to install the protocol.
Once the TCP/IP protocol is installed, you can proceed to configure its properties. The three addresses that must be configured are the client IP address, the subnet mask, and the default gateway.
We will take an in-depth look at each one and how to configure it in the next few sections. Client TCP/IP Address The first address that needs to be configured is the client IP address, which you must assign to the client PC so it can send and receive data on the Internet. As discussed in Chapter 3, it is a 32-bit, dotted-decimal number that uniquely identifies the client PC on the Internet. All clients that will communicate with the Internet must have a client IP address. Client IP addresses are assigned from the IP Address tab of the TCP/IP protocol Properties window. You can assign them either manually or automatically. To assign an address manually, called a static IP address, select the Specify an IP Address radio button and type in the IP address you want to assign (as shown in Figure 4.1). You must ensure that the address you enter follows the IP addressing conventions we discussed in Chapter 3. To assign an IP address to a client PC automatically, called a dynamic IP address, select the Obtain an IP Address Automatically radio button on the IP Address tab of the TCP/IP Properties. This is the default setting. If TCP/IP is installed on the client PC and this option is enabled, the client PC will query a DHCP server for its TCP/IP address. If you set up a DHCP server on your network, you can give all your client computers (at least the ones with a TCP/IP stack that supports DHCP) IP address information automatically.
DHCP servers can assign to clients information other than TCP/IP addresses, such as subnet masks, default gateways, DNS information, and WINS server information.
The process by which a client PC requests its IP address begins when the client PC boots up. The TCP/IP stack has been configured to obtain its IP address automatically, so it sends out a broadcast on the local network segment, basically saying, “I need an IP address.” Any DHCP servers on the network segment will respond by saying, “I’ve got one for you.” The DHCP server will then assign an IP address (and any other pertinent information) to that client PC. This process is illustrated in Figure 4.2.
FIGURE 4.2   A DHCP server assigning an address. (In the figure, the client broadcasts, “I need an IP address. Who can give me one?” The DHCP server answers, “I can,” and replies, “Your IP address is 192.128.1.45 with a subnet mask of 255.255.255.0.”)
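The broadcast-and-reply exchange illustrated in Figure 4.2 can be modeled as a toy sketch (the class name, address pool, and client ID below are invented for illustration; a real DHCP exchange also involves offer/request/acknowledge messages, lease times, and the extra settings noted above):

```python
# Toy model of the DHCP exchange: the client asks, the server leases the
# next free address and remembers the lease so repeat requests get the
# same answer.
class ToyDhcpServer:
    def __init__(self, pool):
        self.pool = list(pool)          # addresses available to lease
        self.leases = {}                # client id -> leased address

    def handle_discover(self, client_id):
        """Respond to a client broadcast ("I need an IP address")."""
        if client_id in self.leases:
            return self.leases[client_id]
        address = self.pool.pop(0)      # offer the next free address
        self.leases[client_id] = address
        return address

server = ToyDhcpServer(["192.128.1.45", "192.128.1.46"])
print(server.handle_discover("client-pc"))   # 192.128.1.45
```
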
The decision whether to address your computer statically or to use DHCP is going to be based on the type of network you have. If you are using a connection to the Internet through an ISP, the majority of the time you will be using DHCP to get an address. If you are unsure, check with your ISP or network administrator. Also, many ISPs automatically install and configure these network settings on a PC as part of the installation of their software (for example, if you use AT&T WorldNet to access the Internet, when you install the WorldNet CD, it automatically configures the network settings).
If you want to check your TCP/IP configuration on a Windows 95 or 98 machine, use the winipcfg program. To start this program, choose Start > Run, type in winipcfg, and click OK. The utility that appears will allow you to view your entire TCP/IP configuration. The equivalent program on Windows NT is the command-line utility ipconfig.
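If you ever need to pull the same values out of ipconfig-style output in a script, a rough parser might look like the following (the sample text and field labels are illustrative only; real output varies by Windows version, so treat this as a sketch rather than a robust tool):

```python
import re

# Hypothetical sample of ipconfig-style output; the dotted leader style
# mimics what the real utility prints.
SAMPLE = """
IP Address. . . . . . . . . : 10.0.0.5
Subnet Mask . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . : 10.0.0.1
"""

def parse_ip_settings(text):
    """Extract the three core TCP/IP settings from ipconfig-style text."""
    settings = {}
    for label, key in [("IP Address", "ip"),
                       ("Subnet Mask", "mask"),
                       ("Default Gateway", "gateway")]:
        match = re.search(label + r"[ .]*: *(\S+)", text)
        if match:
            settings[key] = match.group(1)
    return settings

print(parse_ip_settings(SAMPLE))
```
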
Subnet Mask

If you selected Specify an IP Address and entered an IP address manually at the TCP/IP configuration screen, you must also enter the correct subnet mask (in the specified field) for the IP address you enter or the client won’t be able to communicate properly. However, if you selected Obtain an IP Address Automatically, the subnet mask is automatically supplied by a DHCP server. For a detailed explanation of subnet masks, refer back to Chapter 3.

Default Gateway

The default gateway is the address of the router to which the client will send all TCP/IP traffic that is not addressed to a specific station on the local network. The default gateway address should be entered on any client PC that is attached to a network that is connected to the Internet via a router. The address of the default gateway is another piece of configuration information that can be distributed using a DHCP server.
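The forwarding rule behind the default gateway (deliver traffic for the local network directly, hand everything else to the router) can be sketched as follows; the addresses and the /24 network here are examples of ours, derived from an IP address plus its subnet mask:

```python
import ipaddress

# Local network as computed from the client's IP address and subnet mask
# (10.0.0.x with mask 255.255.255.0 -> 10.0.0.0/24). Addresses illustrative.
LOCAL_NET = ipaddress.IPv4Network("10.0.0.0/24")
DEFAULT_GATEWAY = ipaddress.IPv4Address("10.0.0.1")

def next_hop(destination: str):
    """Decide where a packet goes first: the local segment or the router."""
    dest = ipaddress.IPv4Address(destination)
    if dest in LOCAL_NET:
        return dest                 # deliver directly on the local segment
    return DEFAULT_GATEWAY          # hand off to the default gateway

print(next_hop("10.0.0.2"))      # local destination, delivered directly
print(next_hop("198.51.100.7"))  # remote destination, sent via the router
```
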
Name Resolution

In addition to specifying the IP addresses for the client, you must specify how the client will resolve host names into IP addresses and vice versa. As you’ll remember from Chapter 2, host names are logical, alphanumeric names given to computers so they can be identified on a network without the cryptic sequences of numbers that a user would otherwise have to remember to access a host. Host names make accessing TCP/IP hosts more “friendly” because it is easier to remember www.sybex.com than it is to remember 10.45.89.129 (at least for most people). There are three ways to configure name resolution on a client PC: HOSTS files, the Domain Name Services (DNS), and the Windows Internet Name Service (WINS).
DNS has been covered in previous chapters, but in this chapter, we will discuss where to find the other name resolution methods and how to configure them properly.

HOSTS File Configuration

HOSTS file configuration is the name given to any file (usually named HOSTS. or HOSTS.TXT) that performs host name to IP address mapping. The user must edit the file manually to add or change hosts. For example, let’s say you have a network with five PCs on it, each with its own name and HOSTS file. When you add a sixth PC, you would have to edit the HOSTS file on each of the other PCs, adding the new PC’s host name, before you could refer to the new PC by name from any PC on the network. While this may not seem too bad, imagine running a network with 50 or 100 PCs. This file exists in various locations on different PCs. On Windows PCs, it can generally be found in the Windows directory (usually C:\WINDOWS); on Windows NT, it is found in C:\WINNT\SYSTEM32\DRIVERS\ETC. In both cases, it is named HOSTS. Figure 4.3 shows a sample HOSTS. file from a Windows 98 PC. This happens to be a hosts file from a PC on a home network. Notice that there are only two entries: 127.0.0.1 is mapped to the local PC (localhost), and the IP address 10.0.0.2 is mapped to the host name S1. This PC will translate the host name S1 back to the IP address 10.0.0.2. FIGURE 4.3
What happens, though, when a second server, S2 (with an IP address of 10.0.0.3), is added to the network? You must edit this HOSTS file (and all the HOSTS files on client PCs on the network) to include the information for the new server. In our example, then, you must start up a text editor (for example, MS-DOS EDIT.COM or Windows Notepad) and open the HOSTS. file. At the end of the file, insert an entry with the IP address of the new server (10.0.0.3) followed by a tab or a few spaces and then the host name you want to assign to that IP address (in this case, S2). Save the file and reboot the computer. After the reboot, the computer will be able to access server S2 by name. Figure 4.4 shows the edited HOSTS. file with the new entry. Notice that the new entry follows the pattern of the other entries. FIGURE 4.4
Updated HOSTS. file
You only have to edit HOSTS files if you are using them as your method of name resolution. If you are using one of the other methods (such as DNS or WINS), you don’t have to edit any HOSTS files; simply make the change at the DNS or WINS server.
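For illustration, here is how the name-to-address lookup a HOSTS file provides could be sketched in Python, using the entries from Figures 4.3 and 4.4 (the parser itself is our sketch, not a Windows component):

```python
# Contents matching the updated HOSTS. file described above.
HOSTS_TEXT = """\
127.0.0.1    localhost
10.0.0.2     S1
10.0.0.3     S2
"""

def parse_hosts(text):
    """Map each host name to its IP address, skipping blanks and comments."""
    table = {}
    for line in text.splitlines():
        line = line.split("#")[0].strip()   # strip trailing comments
        if not line:
            continue
        ip, *names = line.split()           # IP first, then one or more names
        for name in names:
            table[name] = ip
    return table

hosts = parse_hosts(HOSTS_TEXT)
print(hosts["S1"])   # 10.0.0.2
print(hosts["S2"])   # 10.0.0.3
```
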
Domain Name Services (DNS) The functions of Domain Name Services (DNS) were discussed in earlier chapters, but in this chapter, you’ll learn how to configure a client PC to make DNS requests. If you are using DNS (and not HOSTS files) to resolve host names to IP addresses and vice versa, you must tell your client PC’s TCP/IP stack the IP address of a DNS server to use to resolve these names. To start configuring DNS on a Windows 95/98 PC, use the following steps: 1. Open the Network Control Panel (as discussed earlier). 2. Open the TCP/IP Properties window (as discussed earlier). 3. Click the DNS Configuration tab. From this screen, you can configure
the IP address of the DNS server(s) that your client PC should use to resolve DNS names to IP addresses.
4. To configure DNS resolution on this Windows 95/98 client PC, you
must first enable DNS resolution by clicking the radio button labeled Enable DNS.
5. Once you click the Enable DNS radio button, the bottom half of the
property page will brighten and allow you to enter values for DNS configuration. There are four areas that can be configured on this tab: Host This field allows you to set the actual host name of the Windows 95/98 PC. The default name for this field is the actual name of the PC. This name is usually specified during the installation of Windows. Windows will, by default, make the name of your computer, as seen in the identification tab of the Network Control Panel applet, the same as your host name. It is recommended you keep these names the same. Domain In this field, enter the Internet DNS domain name that represents this entire network.
DNS Server Search Order This field is the most important field on this tab. This is where you specify the IP addresses of the DNS server(s) for the domain specified in the Domain field. More than one server IP address can be specified. If more than one IP address is specified, the client will query the DNS servers in order (from top to bottom). Domain Suffix Search Order If you type a host name in a web browser and leave out the somewhere.com domain name, the entries in this field will be appended to the host name and the client will try to make DNS queries with the new name. For example, suppose you type just “snoopy” in the address line of a web browser; that isn’t a DNS domain name, so the Windows TCP/IP stack will try to resolve the name by appending whatever domain names are in this list. If somewhere.com is in this list, the TCP/IP stack will append somewhere.com to snoopy and try to resolve snoopy.somewhere.com into an IP address. 6. At a minimum, you must enter a host name, a domain name, and at least
one DNS server IP address to configure DNS on a Windows 95/98 client. Simply type in the values in the appropriate fields. For the DNS server IP address, you must first type the IP address of the DNS server in the appropriate field; then click the Add button.
7. Once you have entered the appropriate values, you can click OK to
close the TCP/IP Properties window and then click OK to close the Network Control Panel. Windows will ask you to reboot the client PC. Once rebooted, the client PC will be able to access hosts by DNS name as well as by TCP/IP addresses.
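The Domain Suffix Search Order behavior described in step 5 can be sketched as a simplified model of ours (the real resolver also handles partially qualified names and other edge cases):

```python
# Generate the fully qualified names the stack will try, in order, when a
# user types a bare host name such as "snoopy".
def candidate_names(host, suffixes):
    if "." in host:
        return [host]               # already a full DNS name; try as-is
    return [f"{host}.{suffix}" for suffix in suffixes]

print(candidate_names("snoopy", ["somewhere.com", "example.net"]))
# ['snoopy.somewhere.com', 'snoopy.example.net']
print(candidate_names("www.sybex.com", ["somewhere.com"]))
# ['www.sybex.com']
```
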
The i-Net+ exam doesn’t cover the details of setting up a DNS server, and thus it is outside the scope of this book. For an excellent reference on DNS servers and their setup, check out DNS and BIND from O’Reilly & Associates.
Windows Internet Name Service (WINS)

The Windows Internet Name Service (WINS) is a name resolution service commonly found on Windows NT networks. WINS is used in conjunction with TCP/IP and maps NetBIOS names to IP addresses. For example, suppose you have a print server on your LAN that you have come to know as PrintServer1. In the past, to print to that server, you needed only to remember its name and to select that name from a list. However, TCP/IP is a completely different protocol and doesn’t understand NetBIOS names; therefore, it has no way of knowing the location of that server or its address. That’s where WINS comes in. Each time you access a network resource on a Windows NT network using TCP/IP, your system needs to know the host name or IP address. If WINS is installed, you can continue using the NetBIOS names that you have previously used to access the resources because WINS provides the cross-reference from name to address for you. Configuring WINS name resolution is also done through the TCP/IP Properties window. The WINS Configuration tab on the TCP/IP Properties window allows you to configure the addresses of WINS servers (shown in Figure 4.5). These addresses are stored with the configuration, and TCP/IP uses them to query for NetBIOS host names and addresses when necessary. WINS is similar to DNS in that it cross-references host names to addresses; however, as we mentioned earlier, WINS references NetBIOS names to IP addresses, whereas DNS references TCP/IP host names to IP addresses. To view the NetBIOS name of your Microsoft computer, go to the Identification tab of the Network Control Panel. Another major difference between WINS and DNS is that WINS builds its own reference tables dynamically, whereas you have to configure DNS manually. When a workstation running TCP/IP is booted and attached to the network,
it uses the WINS address settings in the TCP/IP configuration to communicate with the WINS server. The workstation gives the WINS server various pieces of information about itself, such as the NetBIOS host name, the actual username logged on to the workstation, and the workstation’s IP address. WINS stores this information for use on the network and periodically refreshes it to maintain accuracy. FIGURE 4.5
The WINS Configuration tab of the TCP/IP Properties window
Microsoft, however, has developed a new DNS record that allows the DNS server to work in perfect harmony with a WINS server. The Microsoft DNS Server software currently ships with Windows NT. Here’s how it works. When a DNS query returns a WINS record, the DNS server then asks the WINS server for the host name address. Thus, you need not build complex DNS tables to establish and configure name resolution on your server; Microsoft DNS relies entirely on WINS to tell it the addresses it needs to resolve. And because WINS builds its tables
automatically, you don’t have to edit the DNS tables when addresses change; WINS takes care of this for you. You can use both WINS and DNS on your network, or you can use one without the other. Your choice is determined by whether your network is connected to the Internet and whether your host addresses are dynamically assigned. When you are connected to the Internet, you must use DNS to resolve host names and addresses because TCP/IP depends on DNS service for address resolution. WINS is disabled by default (as shown previously in Figure 4.5). To configure WINS, follow these steps: 1. First select one of the radio buttons shown, either Enable WINS Resolution or Use DHCP for WINS Resolution. If you select Use DHCP for WINS Resolution, the client PC will get its WINS server information from a DHCP server, along with its IP address information. 2. If you select Enable WINS Resolution as shown in the following screen
shot, you can manually specify which WINS server(s) to use for NetBIOS host name to TCP/IP address resolution.
3. When you choose Enable WINS Resolution, configuration is much the
same as it is with DNS configuration. Simply enter the IP addresses of the WINS servers, one at a time, and click Add to add them to the list of WINS servers. 4. When you’re finished entering the IP addresses, click OK to close the
TCP/IP Properties window; then click OK to save the changes, and close the Network Control Panel. Windows will ask you to reboot. 5. After the reboot, the client PC will be able to perform WINS resolution.
We didn’t discuss the Scope ID field in this book because it is not often used. However, for your information, it is used to “group” NetBIOS entities together. All entities on a network with the same Scope ID value can send NetBIOS data (such as share lists and domain information) to one another. If you enter a scope ID of 12, this station can only communicate with other NetBIOS entities that have their scope ID set to 12. Most often, this field is left blank so that all computers can communicate with all other NetBIOS entities without restriction.
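To make the contrast with manually edited HOSTS files and DNS tables concrete, here is a toy model of WINS’s dynamic registration (the class name and address are invented; a real WINS server also records the logged-on username and periodically refreshes entries, as described above):

```python
# Toy WINS server: workstations register themselves at boot, so the table
# needs no manual editing, unlike a HOSTS file or a classic DNS zone.
class ToyWinsServer:
    def __init__(self):
        self.table = {}

    def register(self, netbios_name, ip_address):
        """Record a booting workstation's NetBIOS name and IP address."""
        self.table[netbios_name.upper()] = ip_address   # names are case-blind

    def resolve(self, netbios_name):
        """Cross-reference a NetBIOS name to an IP address, if registered."""
        return self.table.get(netbios_name.upper())

wins = ToyWinsServer()
wins.register("PrintServer1", "10.0.0.9")    # address is illustrative
print(wins.resolve("printserver1"))          # 10.0.0.9
```
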
Winsock Compliance

The software that provides TCP/IP support for Windows applications is known as WINSOCK.DLL. You may hear about commercial TCP/IP software that requires “Winsock compliance.” This just means that the software will use the Winsock DLL to connect to the Internet. Most (if not all) Windows Internet clients are Winsock clients. Winsock comes in two varieties: 16-bit and 32-bit. WINSOCK.DLL is a 16-bit DLL that runs with older, 16-bit applications and is the original Winsock. WSOCK32.DLL is a 32-bit DLL that runs with newer, 32-bit applications.
Internet Connection

This requirement for an Internet client almost goes without saying. After all, an Internet client is useless without a connection to the Internet. There is one exception, however: if you have your own intranet and you’re not going to provide Internet access, the clients will be used for intranet access only. Even so, many companies that have an intranet also have an Internet connection because it is a valuable tool to offer employees.
The type of Internet connection you should have varies depending on your Internet needs. If you are in charge of connecting your company to the Internet and you have hundreds of computers that need access, you may want a leased-line connection of some kind between your network and your ISP. If you are setting up your computer to connect to the Internet from home, it may only be feasible to have a slower-speed (and thus, cheaper) connection to the Internet like a Plain Old Telephone Service (POTS) dial-up, Integrated Services Digital Network (ISDN), or Digital Subscriber Line (DSL) connection.
Chapter 1 details the different types of Internet connections and their merits.
The most popular way to connect a client PC to the Internet is with a standard phone line and a modem (what is known as a dial-up Internet connection). Because a dial-up Internet connection is the most popular way of connecting clients to the Internet, the i-Net+ exam will test your knowledge of configuring a computer to make this type of connection. To connect your Windows 95/98 computer to the Internet over a regular modem connection, you must have a few essentials in place, including:
A modem
Windows Dial-Up Networking (DUN) software
A valid access account with an ISP
A configured Dial-Up Networking connection
In the following sections, we’ll cover each item in more detail. Once you get your client connected, you can install a web browser or another client and communicate with the Internet.
Modem To have a dial-up connection, you must have one critical piece of hardware installed on your computer: a modem. As mentioned in Chapter 1, a modem converts the digital signals that your computer uses into analog signals that can be sent over telephone lines. Dial-up connections can use either an internal or external modem. When installing a modem into a Windows 95/98 machine, you must have the correct Windows 95/98 driver for the modem. A modem driver is the software component that manages and controls the modem. Without the correct driver installed, the dial-up connection software would not be able to communicate with the modem and thus would not be able to dial up to the ISP.
Drivers include several embedded strings of characters called modem initialization commands, which are the commands sent to the modem by the communications program to “initialize” it. These commands tell the modem things like how many rings to wait before answering, how long to wait between detecting the last keystroke and disconnecting, and the speed at which to communicate. For a while, each manufacturer had its own set of commands, and every communications program had to have settings for every particular kind of modem available. In particular, every program had commands for the Hayes line of modems (mainly because Hayes made good modems and their command language was fairly easy to program). Eventually, other modem manufacturers began using the “Hayes-compatible” command set. This set of modem-initialization commands became known as the Hayes command set. It is also known as the “AT command set” because each Hayes modem command started with the letters AT (presumably calling the modem to ATtention). Each AT command does something different. The letters AT by themselves (when issued as a command) will ask the modem if it’s ready to receive commands. If it returns “ok,” that means that the modem is ready to communicate. If you receive “error,” it means there is an internal modem problem that may need to be resolved before communication can take place. Table 4.1 details some of the most common modem commands. Notice that we’ve included a couple of extra commands that aren’t AT commands. These items are characters used to affect how the phone number is dialed (including pauses and turning off call-waiting). TABLE 4.1
Common Modem Initialization Commands

Command: AT
Function: Tells the modem that what follows the letters AT is a command that should be interpreted
Usage: Used to precede most commands.

Command: ATDT nnnnnnn
Function: Dials the number nnnnnnn as a tone-dialed number
Usage: Used to dial the number of another modem if the phone line is set up for tone dialing.

Command: ATDP nnnnnnn
Function: Dials the number nnnnnnn as a pulse-dialed number
Usage: Used to dial the number of another modem if the phone line is set up for rotary dialing.

Command: ATA
Function: Answers an incoming call manually
Usage: Places the line off-hook and starts to negotiate communication with the modem on the other end.

Command: ATH0 (or +++ and then ATH0)
Function: Tells the modem to hang up immediately
Usage: Places the line on-hook and stops communication. (Note: The 0 in this command is a zero.)

Command: AT&F
Function: Resets modem to factory default settings
Usage: This command works as the initialization string when others don’t. If you have problems with modems hanging up in the middle of a session or failing to establish connections, use this string by itself to initialize the modem.

Command: ATZ
Function: Resets modem to power-up defaults
Usage: Almost as good as AT&F, but may not work if the power-up defaults have been changed with S-registers.

Command: ATS0=n
Function: Waits n rings before answering a call
Usage: Sets the number of rings that the modem will detect before taking the line off-hook and negotiating a connection. (Note: The 0 in this command is a zero.)

Command: ATS6=n
Function: Waits n seconds for a dial tone before dialing
Usage: If the phone line is slow to give a dial tone, you may have to set this register to a number higher than 2.

Command: comma (,)
Function: Pauses briefly
Usage: When placed in a string of AT commands, the comma causes a pause to occur. Used to separate the number for an outside line (many businesses use 9 to connect to an outside line) from the real phone number (for example, 9,555-1234).

Command: *70 or 1170
Function: Turns off call-waiting
Usage: The “click” you hear when you have call-waiting (a feature offered by the phone company) will interrupt modem communication and cause the connection to be lost. To disable call-waiting for a modem call, place these commands in the dialing string like so: *70,5551234. Call-waiting will resume after the call is terminated.
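Putting the dialing-related entries in Table 4.1 together, a helper that assembles a tone-dial command might look like the following (a sketch of ours; communications programs build these strings internally):

```python
# Build an ATDT dial string from Table 4.1's pieces: *70 disables
# call-waiting, and a comma inserts a brief pause (e.g. after dialing
# 9 for an outside line).
def build_dial_command(number, outside_line=None, disable_call_waiting=False):
    parts = []
    if disable_call_waiting:
        parts.append("*70,")        # turn off call-waiting, then pause
    if outside_line:
        parts.append(outside_line + ",")   # grab an outside line, then pause
    parts.append(number)
    return "ATDT " + "".join(parts)

print(build_dial_command("555-1234", outside_line="9",
                         disable_call_waiting=True))
# ATDT *70,9,555-1234
```
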
Dial-Up Networking Software

If you are going to connect your computer to the Internet via a modem and telephone line, aside from configuring the various aspects of the TCP/IP protocol, you will have to configure a Dial-Up Networking connection. The Windows Dial-Up Networking software is used to connect Windows 95/98 to various networked systems and is included as part of Windows 95/98. It is not installed by default unless you have a modem installed in the computer when Windows 95/98 is being installed. It is also installed whenever you install a modem in the computer. Bottom line: you cannot connect your Windows 95/98 PC to an ISP (and thus, to the Internet) unless the Windows 95/98 Dial-Up Networking software is installed.
ISP Account

In addition to the software and hardware components involved in a dial-up connection, you must have a valid access account with an ISP. An ISP account includes a username and password you can use to gain access to the ISP’s servers and to the Internet. ISPs charge a small fee (typically anywhere from $10–$30 per month) for access to the Internet through a modem connection.
Some long distance telephone carriers are now offering “packaged” deals that combine long distance service with Internet service. These deals can sometimes save you quite a bit of money; however, make sure that you research this type of service thoroughly to ensure that it meets your needs before signing up for one.
When you do get an ISP account, the ISP will give you a “configuration sheet” that contains all the information you will need to configure your Dial-Up Networking connection. Some ISPs have a preconfigured software installation disk with all this information already entered. In that case, all you need to do is install the software and your client PC will be configured. If your ISP doesn’t have a sheet or a disk like this, you can make a “cheat sheet” by asking them a few questions and writing down the answers:
What is the dial-up phone number?
What is my username and password?
What are the DNS names of your e-mail servers (outgoing and incoming)?
The answers to these questions will be needed in the next section, where you create a Dial-Up Networking connection; some of them will also be used in the sections that follow, where you configure the other clients, including web browsers.
Dial-Up Networking Connection The final component of a dial-up connection is a Dial-Up Networking connection script. This script is an icon that represents a collection of preconfigured settings for dialing up to a specific ISP. This Dial-Up Networking script is a function of Windows 95/98 Dial-Up Networking and includes settings like ISP phone number, ISP TCP/IP settings, username and password, and connection name. To create a Dial-Up Networking connection on a Windows 95/98 client, follow these steps: 1. Ensure that all the previously listed items are in place (such as modem,
dial-up networking software, and an ISP account). 2. To start the process, have your information sheet from your ISP (or
your “cheat sheet”) handy and open the Dial-Up Networking folder. You can access this folder either by opening My Computer, or by choosing Start > Programs > Accessories > Dial-Up Networking in Windows 95 (Start > Programs > Accessories > Communications > Dial-Up Networking in Windows 98). This folder normally lists any Dial-Up Networking connections you have already configured, but you haven’t configured one yet, so it should be blank.
3. Once you have this window open, you can start to configure a new
connection by double-clicking the Make New Connection icon. 4. There are two fields in the first screen (the following screen shot is
from Windows 98). The first asks you to give a name to this connection (the default is My Connection). You should type in the name of your ISP or some name that indicates to you that this is a Dial-Up Networking connection to your ISP. In this sample case, we’ll use TestISP. The second field asks you which modem this connection should use. This field has a drop-down list that includes more than one modem (if more than one modem is installed). The default for this field is the first modem that’s installed. In this case, the only modem that’s installed is a 56K U.S. Robotics and it’s already selected, so you don’t have to do anything with this field unless you have another modem installed and you want to use that modem. Once you have finished entering the connection name and selecting a modem, click the Next button.
You can click the Cancel button at any point during this process to cancel the configuration of this connection.
The Configure button displayed in the first screen allows you to configure the modem settings, like modem speaker volume, modem connection speed, and manual dialing capabilities. Most often, the defaults for the modem only need to be changed with the more troublesome connections. You will also have a chance to configure these options later after the connection has been made.
5. The next screen is where you’ll enter the phone number of the ISP’s
modem bank. The Area Code field should default to your area code (you should have entered it when you installed the modem). If not, you can change it on this screen. In the Telephone Number field, you should enter the telephone number given to you by the ISP for the ISP’s modem bank. You can also choose the dialing prefix for long distance numbers by selecting your country from the drop-down list labeled Country or Region Code. But you shouldn’t have to select anything because hopefully your ISP is a local phone call!
6. When you have finished entering the phone number for the ISP’s
modem bank, click Next to bring you to the next screen.
Using this method, you are accepting all TCP/IP defaults. The default TCP/IP configuration is for the client PC to use the Point-to-Point Protocol (PPP) for dial-up and to get all TCP/IP addresses (including modem IP address, default gateway, and DNS server addresses) from the machine your client dials in to. This is the configuration that 95 percent of all ISPs use, so it will be included on the i-Net+ exam.
Internet Clients Configuration and Use
Just as there are many types of Internet content servers, there are many different types of Internet clients. For the most part, each client allows access to a different type of server. In this section, you’ll learn about the more commonly used Internet clients, how they are configured, and how they are used. The clients that we will discuss are as follows:
Web Browser
FTP Client
Terminal Client (TELNET)
News Client
E-Mail Client
All-In-One/Universal Clients
Web Browser

When most people think of the Internet, they think of a graphical environment with lots of pictures, audio, and text. It wouldn’t be possible to display this content from web servers without the web client (more commonly called a web browser). A web browser is an application that you use to submit requests for Internet content (such as web pages, graphics, and so on) to a web server using the Hypertext Transfer Protocol (HTTP). The web browser also displays the responses to those requests on the screen. First, we will take a look at some of the components that each web browser has in common, and then follow up with a closer look at the two most popular web browsers in use today.
Web Browser Components

Although there are a few different web browsers available, they all share a similar “look.” Because web browsers today are based, in some way, on the work done by the National Center for Supercomputing Applications (NCSA), they all have at least a few items in common (as shown in Figure 4.6):

Browser window  This is the main part of the web browser, where the text and graphics of a web page are displayed.

Location bar  The location bar is the component that displays the location of the web page currently showing in the browser window. If you type the address of a web site into this area and press Enter, the web browser locates the web site and displays its home page.

Menu bar  As its name implies, this is the part of the browser that contains the menus. Click a word and a menu appears with choices that control the way you use the web browser.

Button bar  This bar contains buttons that help you navigate within the WWW. The buttons are normally user-friendly and usually perform the operation indicated by their label (for example, the Back button takes you back to the page that was displayed before the current page).

Activity indicator  In most web browsers, this indicator will be animated when a user has made a request and is waiting for the requested web page or Internet content to display.

Status bar  At the bottom of the browser window, there is an area called the status bar (see Figure 4.6). It shows what’s happening during the request-response sequence of a web browsing session. It will show whether the site has responded and the progress of the response to the original request (usually with an indication of the percentage downloaded).

Home Page  Web browsers allow you to set up a default page, called a home page, which is displayed every time you start the browser. The home page is usually one of your favorite (or bookmarked) web addresses that you visit frequently.
With some companies, the home page is usually set up to point to an intranet page with a legal warning on Internet use.

Preferences  Diversity is one of the greatest things about humanity, and people translate that diversity to their computers. Just walk by a few cubicles and you’ll see that everyone has their own screen savers, wallpaper, and so on. Web browsers take into account the fact that the people using them are individuals, and allow you to change a variety of preferences: colors for hyperlinks, text, background, and even language.
MIME Types  MIME is short for Multipurpose Internet Mail Extensions. The purpose of MIME is to allow files other than text files to be transmitted via e-mail (and HTTP). The first purpose of MIME was to allow binary attachments to e-mails without the need to encode the binary attachment into text (using a process known as uuencoding). With MIME-compliant e-mail, attachments are encoded and decoded automatically during transmission and reception. Most e-mail servers and clients in use today use MIME to send attachments.

Cookies  Cookies are special text messages given to a web browser by a web server. The browser stores each message on the local hard disk (Navigator stores the cookies in a file called COOKIES.TXT). The next time someone using that web browser on that computer visits the same web server, the web browser sends the message back to the web server (which created it). Cookies are used to provide customized web sites for users. The web server asks the user to fill out a form and records the information in the cookie; then, when the user returns, the web browser sends the cookie to the web server and returns the information. The web server then knows who is surfing because of the information contained in the cookie, and thus the web server can create a custom web page for that user. FIGURE 4.6
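The cookie exchange described above travels in HTTP headers as simple name=value pairs. As an illustration (the cookie names and values below are made up), Python’s standard library can parse such a header:

```python
from http.cookies import SimpleCookie

# A hypothetical cookie header as a browser might send it back to the
# server that created it.
header = "user=snoopy; theme=dark"

cookie = SimpleCookie()
cookie.load(header)

print(cookie["user"].value)    # snoopy
print(cookie["theme"].value)   # dark
```
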
Configuring a Web Browser

In the early days of the World Wide Web, there was only one web browser, NCSA Mosaic. It was a very basic web browser in that it could only display HTML text and GIF-formatted graphics. It was a free browser that you could download from the NCSA (although development rights were later sold to Spyglass). As the Internet grew, so did the number of browsers available. Every browser could display basic HTML and GIF graphics, but some could display the newer graphic format, JPEG (Joint Photographic Experts Group). Problems emerged when a web site designed for one browser couldn’t be displayed in another. Out of this chaos, two clear leaders emerged: Netscape Navigator and Microsoft Internet Explorer, both in some way based on NCSA Mosaic.

Netscape Navigator

Netscape Navigator was the first browser (apart from NCSA Mosaic) to gain widespread commercial acceptance. Navigator is extremely similar in both appearance and function to Mosaic. This is because it was created by some of the original developers of NCSA Mosaic, including Marc Andreessen. In 1994, Marc left NCSA and, together with James Clark (formerly of Silicon Graphics), started Netscape Communications Corporation. Their first major product was a “Mosaic-killer” called Netscape Navigator, nicknamed Mozilla (after the name of an animated dragon that appeared in the activity indicator). One of the features that made Netscape Navigator more popular than Mosaic was its support for document streaming. That is, Netscape Navigator would display items as it received them rather than waiting until it had received all the items on a page before displaying them (as Mosaic did). Figure 4.6 shows an example of what Netscape Navigator looks like (actually part of Communicator version 4). Notice the large N in the upper-right corner of the browser window (the activity indicator). This indicator is one characteristic that can help you identify which browser you are using.
Also, when you are sending and receiving data on the Internet, the N will be animated with stars moving in the background. Currently, Netscape Navigator has been incorporated into a full Internet communications suite known as Netscape Communicator. Communicator includes the standard Navigator component as well as components for reading and composing e-mail, reading and composing Internet news, a collaboration tool, and an instant messaging program called AOL Instant Messenger (AIM).
Netscape allows you to change the N graphic to any other bitmap that you create for this purpose. In fact, Microsoft Internet Explorer v4.0 and above allow you to do the same thing. For information on Netscape Navigator, check out www.netscape.com.
Netscape Navigator version 6.1 is configured through the Preferences menu option, which can be accessed through the Edit menu in the main window of Navigator. This will bring up the Preferences window shown in Figure 4.7. As you can see, there are several pages of preferences that you can set. The categories of preferences that you can change are listed on the left. Click the + sign next to a category to expand it so you can see its subcategories. Notice in Figure 4.7 that some of the categories have already been expanded. Additionally, when you want to view the individual preferences within each category and subcategory, simply click a category or subcategory in the left-hand pane. The specified collection of preferences will appear in the window on the right. FIGURE 4.7
In the Navigator category, you can change the home page, history, language, MIME types (Helper Applications), Smart Browsing, and Internet search engines. If you want to change your home page, simply type the new home page address in the field labeled Location in the section with the heading Home Page. You can also set the current web page as your new home page by simply clicking the button labeled Use Current Page. Navigator will place the address of the current web page in this field for you. You can choose to start Navigator with an HTML page stored on your local hard drive. Simply click the Choose File button and navigate to the HTML page you want to use as your home page; then, when Navigator starts, it will open this HTML file from your hard drive and display it.
You can also start Navigator with either a blank page or the last page you visited instead of a specific web page. Simply select the appropriate radio button in the Navigator Starts With section for the option you want.
Under the Navigator category, you can see that there is a Helper Applications sub-category. This is where you configure MIME types. Ninety-five percent of the time, you won’t have to configure the MIME types. This will be handled automatically by the web browser plug-in installation. The only time you will ever have to change MIME types for a web browser is when two helper applications want to take over responsibility for the same MIME type. For that reason, you must know how to configure MIME types manually when necessary. The four items that you need to configure are: Description of Type This is a brief description of what this particular MIME type is for. If you look back at the preceding screen shot, you’ll see several examples in the Description box. File Extension This field is used to specify what file extensions are to be associated with this MIME type. When Navigator opens a file with one of these extensions, it will open it with the application listed in the Application to Use field. MIME Type This field specifies the actual MIME type definition. Application to Use This field should display the path and executable name for the application that should be used to open a file with this MIME type. You can either type in the path to the executable or select the program using the Browse button.
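The four fields above amount to a table mapping file extensions to MIME type definitions, plus a handler for each type. A minimal sketch using Python's standard mimetypes module; the .sdr extension, the MIME type string, and the player path below are invented examples, not real registrations:

```python
import mimetypes

# A browser's MIME table maps file extensions to type definitions.
# mimetypes ships with a default table and lets you add entries,
# much like filling in the File Extension and MIME Type fields.
mimetypes.add_type("application/x-sdr-audio", ".sdr")  # hypothetical type

# The "Application to Use" field would live alongside the table;
# a plain dict stands in for it here (the player path is made up).
handlers = {"application/x-sdr-audio": "/usr/local/bin/sdrplay"}

mime, _encoding = mimetypes.guess_type("clip.sdr")
print(mime)                  # application/x-sdr-audio
print(handlers.get(mime))    # /usr/local/bin/sdrplay
```

A clash like the one described in the text — two helpers claiming one type — corresponds to two entries competing for the same key in the handler table, which is why the browser makes you resolve it manually.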
More information about MIME types can be found at either www.ltsw.se/knbase/internet/mime.htp or in RFCs 1341, 1521, and 1522 at www.cis.ohio-state.edu/hypertext/information/rfc.html.
The next category that we will look at is the Advanced category. When you click on it, you get a dialog box similar to Figure 4.8. You can see that you have several choices: Cookies, Images, Forms, Passwords, Cache, Proxies, Software Installation, Mouse Wheel, and Desktop Integration. The three most important ones that we will discuss are the cookies, cache, and proxy settings. FIGURE 4.8
Remember from earlier in the chapter that a cookie is a special text message given to a web browser by a web server. The problem with cookies is that this process can happen without the user’s intervention. This poses a security problem because a cookie can contain sensitive information (such as name and address or credit card number) that can be sent without the user’s knowledge. Thankfully, you can configure the browser to notify you about any cookies it receives, as well as their contents. The Advanced category, as shown in Figure 4.9, contains four options that you can configure for cookies: Enable All Cookies This setting will allow Navigator to accept all cookies, no matter where they’re coming from or going to. This is the least secure setting, but it gives the user the most flexibility. Enable for the Originating Web Site Only This setting is the best compromise between security and flexibility. With this setting active, the browser will only exchange a cookie with the server that sent it. It will never send a cookie created at one site to a server at another.
Disable Cookies This setting disables cookies altogether. No cookies will be sent or received. This is the most secure setting, but some web sites won’t work correctly because they require the use of cookies to function properly. Warn Me Before Accepting a Cookie This setting can be used in conjunction with any of the other settings. With this setting active, if a cookie is sent or received, a message will be displayed whenever it needs to be used. This setting, when activated, will help indicate to you which sites are using cookies. In addition to these options, you can view the cookies that are currently stored on your hard drive. It’s a good idea to periodically clear older ones that you won’t need out of your system, since you can gather quite a few without even realizing it.
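When you do view a stored cookie, what you see is the text the server handed the browser in a Set-Cookie header. A sketch of parsing one with Python's standard http.cookies module; the cookie name, value, and domain below are invented for illustration:

```python
from http.cookies import SimpleCookie

# A server hands the browser a cookie in a Set-Cookie header.
# The values below are invented for illustration.
header = "user=jsmith; Path=/; Domain=.example.com"
cookie = SimpleCookie()
cookie.load(header)

morsel = cookie["user"]
print(morsel.value)        # jsmith
print(morsel["domain"])    # .example.com
```

The Domain attribute is what the "Enable for the Originating Web Site Only" setting checks: the browser will only return the cookie to servers within that domain.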
Hackers are very familiar with the different types of information that cookies store, and have created programs that gather up cookies from your hard drive and send the information to them. These programs, called Trojan horses, have been seen in the past and will probably be seen again. Turning off all cookies may be the security-conscious thing to do, but some sites won’t display if you don’t accept a cookie. Setting the Warn Me Before Accepting a Cookie option is probably your best balance between reality and security.
Under the Advanced category is a sub-category called Cache, which allows you to set caching preferences. Remember from previous chapters that caching servers store web data locally to service client requests on the network. Web browsers also have this capability built-in. You can set the sizes of the memory and disk cache for the browser’s cache to obtain better performance. This is done through the Cache subcategory of the Advanced category in the Navigator Preferences dialog box (as shown in Figure 4.10). This screen shows you the sizes at which these two caches are currently set (1024KB for the memory cache and 7680KB for the disk cache—the default settings). You can increase the sizes of one or both to increase your web browsing performance by simply typing in a new number next to the appropriate cache and clicking OK.
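Conceptually, the disk cache is just a size-bounded store: once the configured limit is reached, older entries are evicted to make room for new ones. A toy model of that behavior (sizes in KB; real browsers use more elaborate eviction rules than the oldest-first policy sketched here):

```python
from collections import OrderedDict

class PageCache:
    """Toy model of a browser cache with a fixed size limit in KB.
    Evicts the oldest stored page first when space runs out."""

    def __init__(self, limit_kb: int):
        self.limit_kb = limit_kb
        self.pages = OrderedDict()   # url -> (size_kb, content)
        self.used_kb = 0

    def store(self, url: str, size_kb: int, content: str):
        # Evict oldest entries until the new page fits.
        while self.pages and self.used_kb + size_kb > self.limit_kb:
            _, (old_size, _) = self.pages.popitem(last=False)
            self.used_kb -= old_size
        if size_kb <= self.limit_kb:
            self.pages[url] = (size_kb, content)
            self.used_kb += size_kb

    def fetch(self, url: str):
        entry = self.pages.get(url)
        return entry[1] if entry else None

cache = PageCache(limit_kb=100)
cache.store("http://example.com/a.html", 60, "<html>A</html>")
cache.store("http://example.com/b.html", 60, "<html>B</html>")  # evicts a.html
print(cache.fetch("http://example.com/a.html"))  # None
print(cache.fetch("http://example.com/b.html"))  # <html>B</html>
```

This is why raising the cache sizes improves performance: more pages fit before eviction starts, so more requests are answered from disk instead of the network.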
As discussed in Chapter 2, a proxy server increases an entire network’s Internet performance by responding to Internet requests on behalf of the various Internet clients. A web browser must be configured to use or not use a proxy server. Navigator is no exception. Under the Advanced category, you can select the Proxies subcategory. This will display the preferences fields shown in Figure 4.11 (with similar settings). From this dialog box, you can configure Navigator to use a proxy server. FIGURE 4.11
There are three options for proxy configuration: Direct Connection to the Internet When this proxy setting is checked, the web browser will not use a proxy. This proxy setting is the default setting for Navigator. Manual Proxy Configuration With this setting selected, you must manually configure the proxy settings. To configure the actual settings, click the View button. This will change the greyed out options into editable text boxes. Type in the address of each proxy server you want to configure and the port that it operates on in the fields provided. Then click OK to close the Preferences dialog box. Automatic Proxy Configuration URL The manual configuration of proxies is somewhat complex for the novice user. To make configuration easier, Navigator supports the automatic configuration of proxy information via a special configuration URL. To make Navigator configure its own proxy information, simply select this option, and in the field labeled Configuration Location (URL), enter the URL that the proxy server uses to store the proxy server configuration. This URL is generated during the installation of the proxy server software. Once you have entered the URL, click Reload to download the configuration to the browser. Once you have configured the proxy settings, click OK to save the configuration and close the Preferences dialog box. After configuring the proxies for Navigator, you should notice a marked increase in your web surfing performance.
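All three modes boil down to a lookup the browser performs before each request: is a proxy configured for this protocol, and if so, which host and port? A sketch of that decision (the proxy addresses and ports below are invented examples, standing in for whatever a manual configuration would contain):

```python
from urllib.parse import urlsplit

# Proxy table as a manual configuration would fill it in; the
# addresses and port numbers are invented examples.
proxies = {
    "http": ("proxy.example.com", 8080),
    "ftp":  ("proxy.example.com", 8021),
}

def connect_target(url: str):
    """Return the (host, port) the browser should open a connection to.
    With no proxy configured for the scheme, connect directly."""
    parts = urlsplit(url)
    if parts.scheme in proxies:
        return proxies[parts.scheme]          # manual proxy configuration
    default_ports = {"http": 80, "ftp": 21, "telnet": 23}
    return (parts.hostname, parts.port or default_ports.get(parts.scheme, 80))

print(connect_target("http://www.novell.com/index.html"))  # ('proxy.example.com', 8080)
print(connect_target("telnet://host.example.com"))         # ('host.example.com', 23)
```

An empty proxy table corresponds to the Direct Connection to the Internet setting; the automatic configuration option simply downloads a ready-made table from the URL the proxy administrator provides.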
One problem with caching content is that sometimes you may not get the newest page, especially if you are accessing a web site that is frequently updated. If you suspect that you’re not getting the latest information, or if you have problems accessing a specific web page, try using the Clear Cache button (in Internet Explorer, click Delete Files under Temporary Internet Files).
If you are interested, information on the Microsoft antitrust case can be found at www.findlaw.com/01topics/01antitrust/microsoft.html or www.microsoft .com/freedomtoinnovate/default.htm.
Because it was late to the party, Microsoft had to put together a browser in a hurry. What it ultimately did was purchase the licensing rights to the majority of the original Mosaic code from Spyglass, add a few tweaks, and release the result as Internet Explorer 1. While Netscape Navigator dominated the browser market, Microsoft set out to make up for lost time by releasing an add-on to Windows 95 called the Windows 95 Plus Pack. This software package included a few neat utilities, some games, and the new browser, Internet Explorer (nicknamed IE). Additionally, Microsoft included IE in the OEM release of Windows 95 and NT for distribution to computer manufacturers. Figure 4.12 shows the Microsoft Internet Explorer window (version 5). Again, the distinguishing feature of this browser is the activity indicator. Note that it is now a Windows icon rather than a big N. FIGURE 4.12
You learned how to access and view the preferences for Navigator in the previous section. With IE, the web browser is integrated into the Windows 95/98 interface, so configuration is handled through a control panel. Microsoft has consolidated all Internet settings into the Internet Control Panel (as shown in Figure 4.13). Internet Properties can be opened by choosing Start > Settings > Control Panel and double-clicking Internet Options. As you can see, this dialog box has several tabs, one for each category of settings.
You can also access these settings by right-clicking on the Internet Explorer icon on the desktop and selecting Properties from the pop-up menu.
FIGURE 4.13
The Internet Properties dialog box
The home page is set from the General tab of Internet Properties (as shown in Figure 4.13). Simply type the address of the home page you want to use in the field labeled Address in the Home Page section. If you want to
use a blank page as your home page, click the Use Blank button. On the other hand, if Internet Explorer is running and the page you want to use as your home page is already displayed, you can click Use Current, and Internet Explorer will place the address of the current web page in the Address field for you. When you have finished configuring the home page, click OK to accept the configuration. Internet Explorer handles MIME types a bit differently. Because of IE’s integration with Windows 95/98, IE leaves the MIME type handling to the Windows 95/98 operating system. Configuring MIME types is a function of Windows Explorer. Additionally, Internet Explorer is intelligent enough that, when you try to view a file type for which IE doesn’t have a MIME type configured, IE will try to download the appropriate helper application automatically. If you have to manually create a MIME type, you can do so through Windows Explorer. Open the Windows Explorer program (Start > Programs > Windows Explorer). Once it’s open, choose Options from the View menu. In the Options window, choose the File Types tab. This will present a screen similar to the one in Figure 4.14. From this screen, you can add or edit a new MIME type or associate an existing MIME type with a helper application. The addition works exactly the same as it does under Navigator’s preferences. FIGURE 4.14
Adding and editing MIME Types for Internet Explorer
Choosing how Internet Explorer handles cookies is extremely similar to choosing how Navigator handles them. To set the cookie options for IE, open the Internet Properties dialog box, select the Advanced tab, scroll down to the Security section, and find the Cookies subsection (as shown in Figure 4.15). FIGURE 4.15
Configuring cookie settings for Internet Explorer
As you can see, there are three options that control how Internet Explorer handles cookies. You can only choose one option in this subcategory: Always Accept Cookies With this option enabled, IE will always accept cookies from any web site. Prompt Before Accepting Cookies With this option enabled, IE will always ask you before accepting any cookies from any web server. Before each cookie is accepted or rejected, a dialog box will pop up asking you if you want to accept or reject the cookie. If you accept it, the cookie will be saved and surfing will continue as normal. If you reject it, you may receive an error or just not be able to access that web site. Disable All Cookie Use With this setting enabled, IE will never accept a cookie from any web server.
Remember, just as with Navigator, you must balance usability with security. Choose your cookie settings appropriately. For the local cache within IE, only the disk cache is configurable. You can set the local disk cache preferences through the General tab of the Internet Properties dialog box. Once there, click the Settings button in the Temporary Internet Files section of the General tab to bring up the Settings window (shown in Figure 4.16). The section you should be interested in (for the iNet+ exam) is the Temporary Internet Files Folder section. Within this section, you can change the amount of disk space being used to cache HTML documents and images. To increase the amount of disk space being used, click and drag the slider to the right. To reduce it, drag the slider to the left. You can also change the location of the temporary files (indicated by the notation “Current folder”) by clicking the Move Folder button and specifying a new location. FIGURE 4.16
The Connection tab of the Internet Properties dialog box
If you need to configure more than just an HTTP proxy, you can click the Advanced tab and specify addresses for each type of server (as shown in Figure 4.18). You could also specify one address and use that same proxy server for all entities by checking the Use the Same Proxy Server for All Protocols check box. FIGURE 4.18
FTP Program As mentioned in earlier chapters, FTP utilities are used to upload and download files to and from FTP servers. Unlike web browsers, there are many different types of FTP clients. Some clients use text commands on a command line to transfer files, while other FTP clients display directories and files in a graphical interface and use mouse clicks and menu commands to perform the file transfer functions. Luckily, both forms of FTP client require little to no configuration apart from the initial installation. The only item that might have to be configured in a graphical client is the list of FTP sites that you are going to visit. In this section, you’ll first get a general overview of the types of FTP clients in use today and how they look. Because most FTP clients work in much the same fashion, we’ll then take a look at the different commands available for transferring files.
FTP Client Utilities There are three main FTP utilities in use today:
Unix FTP
Windows 95/98 FTP
Web browsers
We will take a look at each of the different types of FTP clients. Unix FTP The first FTP utility ever used was the Unix FTP utility. It’s a fairly simple program. The user starts the program by typing ftp at a Unix command prompt (in lowercase; Unix commands are case-sensitive). Once the program begins, a command line appears that usually looks something like this: FTP> At the command line, the user types commands to tell the FTP program which file to get, where to get it, and how to get it. Table 4.2 lists some of the popular commands you might use when you’re using the Unix command-line FTP utility to download or upload a file. Also, Figure 4.19 shows a command-line FTP utility in use.
mput      Specifies multiple files (through the use of a filter) to upload to the remote computer and then starts to upload them one at a time.
binary    Sets the transfer mode for files to binary. This must be set in order to transfer binary files (any file that is not composed of ASCII text) correctly.
ascii     Sets the transfer mode for files to ASCII text mode for transferring HTML and other ASCII text documents.
hash      Toggles the printing of hash marks for each 8K downloaded.
prompt    Toggles prompting between each file upload or download using the mput or mget commands, respectively.
quit      Ends the current FTP session and closes the FTP program.
Windows 95/98/NT FTP Since the release of Windows 95, every version of Windows has included a command-line FTP program that almost exactly duplicates the Unix FTP utility. The commands and their uses are the same. Additionally, the “look and feel” is almost identical (as shown in Figure 4.20). FIGURE 4.20
You can start the Windows FTP utility one of two ways. You can run the Windows command prompt (Start > Programs > MS-DOS Prompt or Start > Programs > Command Prompt) and type FTP to start the FTP utility. You could also start it by choosing Start > Run, typing FTP, and clicking OK. Once it starts, it will work almost exactly the same as its Unix counterpart. The main difference is that, in the Windows version, local path names are shown in DOS format instead of Unix format. Graphical FTP Utilities Although FTP was historically a command-line utility, many companies have made graphical interfaces to make the process of transferring files to and from the Internet easier. Of the FTP utilities available for purchase or download, graphical FTP utilities are the most popular. Figure 4.21 shows an example of one such FTP utility, WS_FTP by Ipswitch Software. Rather than using complex command-line commands, graphical FTP utilities such as these represent both the local and host systems on the screen and use buttons and icons for some of the commands you can perform. For example, remember the binary FTP command that changed the transfer mode to binary. Notice in Figure 4.21 that the graphical utility has a radio button for that function. FIGURE 4.21
An example of a graphic FTP utility (WS_FTP)
A graphical FTP utility that you have used but may not know it is your web browser. Most web browsers (including Netscape Navigator and Microsoft Internet Explorer) support transferring files using the FTP protocol. If you access an FTP server with a web browser (either by typing in an FTP URL or
clicking a link that leads to an FTP server), the browser will display the files in a list and allow you to navigate the FTP server’s directory structure as well as download the file by clicking it. Figure 4.22 shows what Netscape Navigator looks like during a typical FTP transaction. FIGURE 4.22
Using Netscape Navigator as an FTP client
Using FTP Clients In recent years, FTP has become a truly cross-platform protocol for file transfer. Because Internet (and, thus, TCP/IP) use has skyrocketed, almost every client (and server) platform has implemented FTP. Windows 95/98 and NT are no exception. Both of their TCP/IP stacks come with a command-line FTP utility (as a matter of fact, they’re basically the same utility). To start the FTP utility, type FTP at a command prompt. The result is an FTP command prompt: FTP>
From this command prompt, you can upload and download files, as well as change the way FTP operates. To display a list of all the commands you can use at the FTP command prompt, type help and press Enter. To get help on a specific command, type help, a space, and then the name of the command. The first step in starting an FTP session is to determine the address of the FTP site and start the FTP utility. The FTP site typically has the same name as the web site, except the first three characters are FTP instead of WWW. For example, Novell, Inc.’s web site is www.novell.com. Its FTP site, on the other hand, is ftp.novell.com. We’ll use this FTP site as an example for the rest of this section. First, start the FTP utility as discussed earlier, and then follow these steps: 1. At the FTP command prompt, type open, a space, and the name of the
FTP server. For example: FTP> open ftp.novell.com
You can also start an FTP session by typing FTP, a space, and the address of the FTP server (for example, FTP ftp.novell.com). This allows you to start the FTP utility and open a connection in one step.
If the FTP server is available and running, you will receive a response welcoming you to the server and asking you for a username, like so: ftp> open ftp.novell.com Connected to ftp.novell.com. 220 nemesis FTP server (Version wu-2.4.2-academ[BETA-14](4) Tue Oct 14 17:57:04 MDT 1997) ready. User (ftp.novell.com:(none)): 2. Enter a valid username, and press Enter.
Most Internet FTP servers that allow just about anyone to download files accept the username anonymous. Remember to type the username exactly and to double-check as you enter it, because usernames are case-sensitive. In addition to anonymous, you can use the username ftp to gain access to a public FTP server. They are both anonymous usernames. Again, remember that FTP (and Unix) usernames are case-sensitive.
If you are accessing a private FTP server, the administrator gave you your username and password. If you are accessing a public FTP server with a username such as anonymous, you can use your e-mail address as the password.
You don’t have to enter your entire e-mail address to log in with anonymous. Most FTP server software doesn’t verify the e-mail address, just that it is, in fact, an e-mail address. To do this, it checks for an @ sign and two words separated by a period. Just enter a very short e-mail address to bypass the password (like [email protected]). This is especially helpful if you have a long e-mail address. It also helps keep your real address out of junk e-mail lists.
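The check described above — an @ sign with something on each side, and a period in the part after the @ — can be sketched as a short function. This is only an approximation of the loose validation many anonymous FTP servers apply; real servers vary:

```python
def looks_like_email(password: str) -> bool:
    """Approximate the loose check an anonymous FTP server applies to
    the password: one @ sign, something before it, and a period
    separating two words after it."""
    if password.count("@") != 1:
        return False
    local, domain = password.split("@")
    # strip() guards against a bare trailing/leading dot counting
    return bool(local) and "." in domain.strip(".")

print(looks_like_email("me@a.com"))   # True
print(looks_like_email("anonymous"))  # False
```

Anything passing this test is accepted, which is exactly why a very short made-up address works as the password.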
If you enter the wrong username or password, the server will tell you so by displaying the following and leaving you at the FTP command prompt: 530 Login Incorrect Login failed. You must now start over with the login process. If you are successful, the FTP server will welcome you and drop you back at the FTP command prompt. You’re now ready to start uploading or downloading files. After you log in to the FTP server, you navigate to the directory that contains the files you want. Thankfully, the FTP command-line interface is similar to the DOS command-line interface. This is no surprise: the DOS command line borrowed many of its conventions from Unix, and FTP began as a Unix utility. Table 4.2 in the preceding section lists and describes the common navigation commands for FTP. Remember, these are also case-sensitive.
You won’t have to use these command-line commands if you are using a graphical utility. The graphical utility will most likely have buttons or icons to represent these commands.
After you navigate to the directory and find the file you want to download, you must set the parameters for the type of file. Files come in two types: ASCII (plain text, such as HTML documents) and binary (everything else).
If you set FTP to the wrong type, the file you download will contain gibberish. When in doubt, set FTP to download files as binary files. To set the file type to ASCII, type ascii at the FTP command prompt. FTP will respond by telling you that the file type has been set to A (ASCII), like so: FTP> ascii Type set to A To set the file type to binary, type binary at the FTP command prompt. FTP will respond by telling you that the file type has been set to I (binary), like so: FTP> binary Type set to I To download the file, you use the get command, like so: FTP> get scrsav.exe 200 PORT command successful. 150 Opening BINARY mode data connection for 'scrsav.exe' (567018 bytes). The file will start downloading to your hard drive. Unfortunately, the FTP utility doesn’t give you any indication of the progress of the transfer. When the file is done downloading, the FTP utility will display the following message and return you to the FTP command prompt: 226 Transfer complete. 567018 bytes received in 116.27 seconds (4.88 Kbytes/sec)
You can download multiple files using the mget command. Simply type mget, a space, and then a wildcard that specifies the files you want to get. For example, to download all the text files in a directory, type mget *.txt.
Uploading a file follows the same procedure as downloading files, but you must have rights on that server. These rights are assigned on a directory-by-directory basis. To upload a file, log in and then follow these steps: 1. At the FTP command prompt, type lcd to navigate to the directory on your local machine that contains the file you want to upload.
2. Type cd to navigate to the destination directory. 3. Set the file type to ASCII or binary. 4. Use the put command to upload the file.
The syntax of the put command is as follows: FTP> put <local file> <destination file> For example, if you want to upload a file that is called 1.txt on the local machine, but you want it to be called my.txt on the destination server, use the following command: FTP> put 1.txt my.txt You’ll see a response like the following: 200 PORT command successful. 150 Opening BINARY mode data connection for my.txt 226 Transfer complete. 743622 bytes sent in 0.55 seconds (1352.04 Kbytes/sec)
You can upload multiple files using the mput command. Simply type mput, a space, and then a wildcard that specifies the files. For example, to upload all the text files in a directory, type mput *.txt.
When you’re finished with the FTP utility, simply type quit to return to the command prompt.
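The whole session walked through above — open, anonymous login, binary mode, get, quit — can be sketched with Python's standard ftplib module. The host and file names are the examples from this section; the function names are ours, and the download function needs a live FTP server, so only the host-guessing helper runs here:

```python
from ftplib import FTP

def guess_ftp_host(web_host: str) -> str:
    """Apply the rule of thumb above: the FTP site typically has the
    same name as the web site, with ftp in place of www."""
    if web_host.startswith("www."):
        return "ftp." + web_host[len("www."):]
    return web_host

def anonymous_download(web_host: str, remote_path: str, local_path: str):
    """Sketch of the command-line session; requires network access."""
    ftp = FTP(guess_ftp_host(web_host))      # like: open ftp.novell.com
    ftp.login("anonymous", "me@a.com")       # anonymous login, short password
    with open(local_path, "wb") as out:      # like: binary, then get
        ftp.retrbinary("RETR " + remote_path, out.write)
    ftp.quit()                               # like: quit

print(guess_ftp_host("www.novell.com"))  # ftp.novell.com
```

Note that retrbinary corresponds to setting the binary type first and then issuing get; ftplib also provides storbinary for the put side of the conversation.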
Terminal (Telnet) Client
Telnet clients allow you to enter commands on a remote host without actually sitting at that host’s console. A Telnet client takes the keystrokes from the client’s keyboard and sends them to the Telnet daemon running on the host system. The Telnet daemon sends the screen displays back to the Telnet client. The Telnet client then displays the screen updates within the Telnet client window on the client computer. While Telnet is predominantly used to remotely access a Unix server or a router on the network, it is also frequently used to access Multi-User Dungeons (MUDs) that allow people to role-play over the Internet. Because
the I-Net+ exam focuses on accessing Unix servers, we will restrict our discussion in this section to remotely accessing Unix servers. The most popular Telnet client is the Windows 95/98/NT Telnet client, mainly because it comes free with all versions of Windows since (and including) Windows 95. Unlike some programs, the Telnet client isn’t found in the Programs submenu under the Start menu (unless someone has added it manually). To start the Windows 95/98 Telnet client, choose Start > Run and type telnet. The Windows Telnet program will appear (as shown in Figure 4.23). FIGURE 4.23
Opening the Windows 95/98 Telnet program
Before you can connect to a Unix host via Telnet, you should set the Telnet terminal preferences so that Telnet displays characters properly. These preferences can be accessed through the Preferences menu item on the Terminal menu. Once selected, this menu item will bring up the window shown in Figure 4.24. From this window, you set the following preferences: Local Echo Some Unix programs will not send back keystrokes that are initiated at the local terminal. For this reason, the Local Echo option exists. When Local Echo is selected, Telnet will display a copy of any keystroke on the local screen. You may want to disable this option unless an application needs it. If you don’t, you may see a duplicate of every character you type.
Blinking Cursor This option, when checked, causes the terminal cursor to blink in a steady on-off pattern. Block Cursor This option controls the appearance of the terminal cursor. When Block Cursor is checked, the cursor will be a solid block as big as one character. When the option is unchecked, the cursor will be an underscore ( _ ). VT100 Arrows This option controls how the arrow keys on your keyboard work. When the option is checked, the Telnet client will send the codes so that the arrow keys work like they do on an actual VT100 terminal. Buffer Size This field specifies how many screen lines the Telnet client will keep in memory so you can scroll back and see what you’ve done. Emulation This setting has two options: VT-52 or VT-100/ANSI. Click the radio button for the type of terminal emulation your Unix box requires. If you don’t know, try one. If it doesn’t work, try the other. If neither works, you may need to get a more flexible Telnet program. Fonts This button brings you to a screen where you can change the font (type style), size, and color of the text being displayed. Background Color This button allows you to pick the background color of the Telnet window. The default color is white, but you can change it to any color you prefer.
If you change your background color, make sure you don’t change it to the same color as the text font; if you do, you won’t be able to see the text being displayed.
Once you have set the terminal emulation preferences, you can initiate the connection with the Unix host by choosing Remote System from the Connect menu. This will bring up the window shown in Figure 4.25, which allows you to specify which host you are connecting to and how to connect. From this window, you can specify three different items to control how Telnet will connect to the host: Host Name This field is where you can specify the DNS name or IP address of the host you want to connect to. You can simply type in an address, or because Telnet keeps track of all the previous hosts you have connected to, you can select one from the drop-down list. Port This field allows you to use Telnet to connect to any other TCP port (Telnet uses the Telnet port, TCP port 23, by default). You can use Telnet to check the responsiveness of any service that responds to TCP port requests by typing the address of the host you want to connect to in the Host Name field and the appropriate port number in the Port field. (You can also select the port from the drop-down list provided.) TermType If your Telnet host daemon responds to the TermType setting (check your documentation to be sure), you can specify what terminal commands to use from this field. Select the appropriate setting from the drop-down list to configure this setting. Once you’re finished changing the settings, click Connect to connect to the remote host. FIGURE 4.25
Telnet connection settings
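The port-probing trick described above, pointing a TCP client at an arbitrary port to see whether a service answers, can be sketched in a few lines of modern Python. This is an illustrative sketch, not part of any Telnet client; the function name, hostnames, and timeout value are our own choices.

```python
import socket

def tcp_service_responds(host, port, timeout=3.0):
    """Return True if something accepts a TCP connection on host:port.

    This mimics using a Telnet client to probe an arbitrary TCP port:
    a successful connect means some service is listening there.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: probe the standard Telnet port (TCP 23) on a hypothetical host.
# tcp_service_responds("unixhost.example.com", 23)
```

Note that a successful connection only shows that something is listening; it says nothing about whether the service behind the port is healthy.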
When you’re connected, you can do anything you could do if you were sitting in front of that machine. Telnet will transmit the keystrokes to the server and return screen updates. When you are finished, click the Disconnect
option on the Connect menu. You can connect to a new address using the procedure just outlined, or you can choose Exit from the Connect menu to exit the Telnet program.
News Client
News clients (also called newsreaders) allow you to read and post Internet news messages from an Internet news server using the NNTP protocol. Using the news client, you can view a list of all the newsgroups that exist on a specified news server. If you like a particular newsgroup (alt.autos.studebaker, for example), you can configure your news client to show you all the headers (subject lines) of all the messages in that newsgroup. Then, if a particular message looks interesting, you can click that message to read it and, if you wish, to respond to it. Figure 4.26 shows one example of a newsreader, the Microsoft Outlook Express Newsreader, which comes free with Microsoft Internet Explorer and is embedded in Microsoft Outlook 2000 and 2002. As you can see, this client’s screen is divided into four main areas. Across the top of the window are the menu and navigation bars. Below that, along the left side of the window, is the list of news servers and newsgroups you are subscribed to on those servers. To the right of that, in the top pane, is the list of headers of all the messages for the newsgroup that is selected in the left pane. If you click one of these headers, the client downloads the message to your machine and displays the contents of the message in the message display pane (the largest portion of this client, immediately to the right of the newsgroup list). Even though news clients from different vendors may not look exactly the same, most news clients work in a similar fashion. Each client will have a list of news servers along with the list of newsgroups you subscribe to on each server, a window with a list of the message headers for each newsgroup, and a window where you can view the body of the message.
Some other examples of newsreaders include Netscape News and Free Agent. Again, like Microsoft’s Outlook Express Newsreader, they will function similarly.
Reading news is very much like reading e-mail. As a matter of fact, if you use Microsoft Outlook 2000 or Outlook Express version 5 or higher for your e-mail, you use the same application for news and e-mail. To start reading news, you must first configure the newsreader and subscribe to at least one newsgroup. To start the Outlook Express Newsreader, choose Start > Programs > Microsoft Outlook Express Newsreader. On the left side of the screen is a list of the news servers you have configured for the client. To view a list of the newsgroups you have subscribed to on a particular news server, click the + next to the news server. This will expand the list of newsgroups, as shown in Figure 4.27.
Displaying the list of newsgroups on a particular server
To download and view a particular newsgroup, click the name of the newsgroup you want to view. This will tell the Outlook Express Newsreader to contact the news server, download all the headers for all the messages in the selected newsgroup, and display the list of headers in the window on the upper right (as shown in Figure 4.28). The Newsreader downloads only the header of each message mainly for efficiency. Some messages can be very large, and downloading several hundred messages at a time could take a very long time. You’d spend more time downloading messages than reading them. FIGURE 4.28
Displaying the headers downloaded for a particular newsgroup
To view the body of a particular news message, you only need to click that message. The text of the message will appear in the window below the list of headers (as shown in Figure 4.29). Similar to Outlook, when you read a message, it changes from boldfaced to regular type. FIGURE 4.29
Viewing a particular news message
If you want to post your own message, click the New Post icon and fill out the New Post form (as shown in Figure 4.30) with the subject of the post and the message you want to share. Make sure you post information that is relevant to the newsgroup or you may get some nasty e-mail messages telling you to pay attention to the topic. When you’re ready, click Post to post the message.
Remember that your e-mail address is posted every time you post a message to a newsgroup unless you configure your newsreader to send a bogus e-mail address when you post. You do not send your e-mail address when you just read news articles.
E-mail Client
Internet e-mail clients are software programs used to send and receive e-mail. E-mail clients send mail to SMTP servers using the SMTP protocol, and they download e-mail from mail servers using the POP3 protocol. E-mail clients are the second most popular Internet client software installed on computers today (as you might imagine, web browsers are the most popular). E-mail clients typically include a few standard features:
Inbox The Inbox is a location where all incoming mail resides. Typically, new mail in the Inbox has some kind of designation to differentiate it from mail that has already been read. Some software packages use an unopened envelope icon next to the message to indicate a new message and an opened envelope to indicate a message that has been read.
Outbox The Outbox is a folder where all messages go as soon as you hit the Send button. The messages stay in this folder until they are sent to the server. This feature allows you to see what items are waiting to be sent. The main reason this feature exists is for people who send a lot of e-mail and may want to make a last-minute change after they click Send. However, once the message has been sent to the mail server, it disappears from this folder and can no longer be edited.
Sent items This feature is a folder that keeps a duplicate copy of all messages that have been sent. When you send a message, a copy gets sent to the SMTP server and another copy gets placed into this folder. The Sent items folder allows you to keep track of what you have sent and to whom.
Address book No e-mail program would be complete without an address book. This feature is a small database of the e-mail addresses of all the people you frequently send e-mail to. You can either type a person’s e-mail address in every time you send e-mail to him or simply select his name from the address book. The program will read the selected person’s e-mail address from the address book and put it in the To line of the new e-mail message, which is much more efficient.
There are two major examples of e-mail clients in use today: Microsoft Outlook/Outlook Express and Netscape Messenger. A version of each is available with the associated manufacturer’s web browser product (Internet Explorer and Navigator, respectively). In this section, we’ll cover these two popular e-mail clients and explain what their interfaces look like.
Microsoft Outlook
Microsoft Outlook is one of the most popular e-mail clients for several reasons. First, it is included with the Microsoft Office 97, Office 2000, and Office XP office productivity suites, so it is readily available on many computers. Also, it is easy to use: many people are familiar with the Microsoft Office suite of products, and because Outlook has a similar interface, using it is second nature. Finally, it includes many other features besides e-mail, including a powerful contact management feature and an integrated calendar and task list.
Figure 4.31 shows the screen of Outlook 2000. There are a few things to note about this graphic. The bar on the left of the screen is known as the Outlook bar, and it contains icons for your Inbox (where mail is received and stored until you move it to another folder), your Contacts folder (basically, a very powerful address book), your Calendar, and other folders. To the right of that is the display of whatever folder happens to be selected in the Outlook bar (in this case, the Inbox). If you select a different folder on the left, a different window will display on the right. Above these two windows are the menu bar and button bar. Items on these two bars are used to perform the various functions in Outlook, like creating new mail, checking for new mail, replying to mail, and organizing mail, to name a few. FIGURE 4.31
Microsoft Outlook window
To send or receive mail with Outlook, you must first start the Outlook program by choosing Start > Programs > Microsoft Outlook. This will display the screen shown in Figure 4.31. From this window, you will do all of the mail operations.
To send a new mail message, follow these steps:
1. From the main window, choose File > New > Mail Message. The new, blank message will display in its own window.
2. Just as in Netscape Messenger, you must type the recipient’s e-mail address in the To line. You can type in multiple e-mail addresses, but you must separate them with semicolons. You must then type a subject in the Subject field, and the body of the message goes into the large, white message area. A completed message is shown in the following screen shot. Notice the similarity to the Netscape Messenger mail messages shown in the preceding section.
3. Click the Send button to send the message. This process will cause the
message to be transferred to the Outbox (a folder where items waiting to be sent reside). To complete the send process, you can wait until Outlook performs an automatic “Send and Receive” cycle (which it does every few minutes—a configurable amount of time) or you can click the Send and Receive button on the main screen. If you do click the Send and Receive button, a window indicating that Outlook is connecting to the mail server will appear; then Outlook will display a progress bar indicating how many messages it has sent and how many messages are left to send.
Receiving a message, as with Netscape Messenger, is somewhat automatic. Again, you can either wait for the “Send and Receive” cycle or click the Send and Receive button to have Outlook query the mail server to see if there is any mail to download. If there is, Outlook will download the mail using either the POP3 or IMAP protocol. When a new message arrives, as in Netscape Messenger, the mail will appear in the Inbox boldfaced. Outlook, however, will display an icon of an envelope next to the message. Messages that have been read will have an open envelope next to them (as shown in Figure 4.32).
Notice also that Outlook displays the first few lines of all new messages so that you can decide if you want to read them. You can read the entire message by double-clicking it to bring up the message in its own window (as in Figure 4.33). From there you can read the entire body of the message and close the window when you’re finished. You can also reply to or forward the message from this screen by clicking the Reply or Forward button. If you want to delete the message, select it in the main screen and click the button with the trash can icon. FIGURE 4.33
Netscape Messenger
Unlike Microsoft Outlook, Netscape Messenger (also called Netscape Mail in older versions of Netscape) is just a mail program, although the folks at Netscape may not like us for saying so. It does include a basic address book and the ability to read HTML embedded in the body of an e-mail message. Plus, it is available for free download along with Netscape Navigator. Figure 4.34 shows what Netscape Messenger looks like. Notice that the layout of Netscape Messenger is somewhat similar to the layout of the Outlook Express Newsreader. The list of messages (such as the Inbox) is at the top of the screen, and the selected message is viewed in a pane below that. Additionally, Netscape Messenger can display messages in HTML format. Apart from those items, Netscape Messenger is pretty similar to most other mail programs in functionality. FIGURE 4.34
Netscape Messenger
When sending and receiving mail with Netscape Messenger, you must first run the Netscape Mail program. The program can be found under Start > Programs > Netscape Communicator > Netscape Messenger. Once you
start the program, you will see a window similar to the one in Figure 4.34. Within this window, you can do all client functions, including sending and receiving mail. To send a new e-mail, follow these steps:
1. You must first create an e-mail message. To do so, click the New Msg button on the button bar. This will cause a new message window to appear.
2. From this window, type in the e-mail address of the person you are
sending the mail to. In the message shown in the following screen shot, the recipient is [email protected]. You can type in more than one recipient e-mail address; just press Enter and enter the next address on the next line. 3. Make sure you type a subject in the Subject line so the recipient knows
what the message is about; then type in your message in the large white area below the Subject line.
Always enter a subject so the recipient can tell what the message is about without having to read the whole message. Also, keep the message short and to the point. It’s excruciatingly painful to read a three-page essay on the fact that you need more “sticky” notes! Finally, don’t attempt sarcasm in an e-mail message. It doesn’t work. “You stink” comes across as “You stink!”
4. To send the message you have composed, simply click the Send button. The message will be sent immediately. A window indicating the status of the sending process will appear. If the send is successful, a message indicating that the mail was sent successfully will appear in this window.
Mail is received pretty much automatically. Every few minutes (the interval is configurable), Netscape Messenger will contact the mail server and determine if there is any mail to download to the mail client. If mail is waiting, the mail client will automatically download it to the client using the POP3 (or IMAP) protocol and display the message headers (the Subject and From lines) of any new messages. Figure 4.35 shows the Inbox of a sample Netscape Messenger client. Notice the difference between a new message and messages that have been read. For new messages, the font is darker (boldfaced) than the font for the messages that have been read.
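The "Subject and From lines" a mail client shows for each downloaded message are just RFC 822 headers from the raw message text. As an illustrative sketch (the sample message is invented), Python's standard email parser can pull them out much the way a mail client does when building its Inbox list:

```python
from email.parser import Parser

# A tiny invented RFC 822 message, as it might arrive over POP3.
raw_message = (
    "From: [email protected]\r\n"
    "To: [email protected]\r\n"
    "Subject: Lunch on Friday?\r\n"
    "\r\n"
    "Are you free at noon?\r\n"
)

# headersonly=True mirrors a client that lists message summaries
# before (or without) rendering the full body.
headers = Parser().parsestr(raw_message, headersonly=True)
summary = (headers["From"], headers["Subject"])
```

Because headers and body are separated by a blank line, a client can build its message list from headers alone and fetch bodies lazily.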
If you want to force Netscape Messenger to “go and get the mail,” you can do so by clicking the Get Msg button.
FIGURE 4.35
Receiving a new e-mail message
To read the message you have received, simply click the new, boldfaced message you want to read. The text of the message will appear in the bottom
of the Messenger window (as shown in Figure 4.36). You can read any message by simply clicking it. There are a few options for what to do with the message once you have read it. You can keep it in your Inbox or you can delete it by clicking the Delete button on the button bar; when you delete a message, it will disappear from your Inbox. You can also reply to the original sender by clicking the Reply button, typing a reply, and clicking Send, just as you would send a new message. Finally, you can forward the message to someone else by clicking the Forward button, typing the e-mail address of the person you want to forward the message to in the To line, and clicking Send. FIGURE 4.36
Reading a new message
For more information on using Netscape Messenger, check out Netscape’s support page at help.netscape.com.
QUALCOMM Eudora
No discussion of e-mail products would be complete without a discussion of the most popular shareware e-mail client, Eudora, made by QUALCOMM. There are actually two separate versions of Eudora: Eudora Pro and Eudora Light. The Eudora Pro product is a full-featured, powerful, commercial version that you can buy. Eudora Light is a stripped-down version of Eudora Pro (although it still maintains many advanced features) and is available for free download from the QUALCOMM web site (eudora.qualcomm.com). As you can see in Figure 4.37, Eudora looks similar to the other Internet mail clients (especially Outlook). You can select a folder on the left and view its contents in the window on the right. Individual messages are listed, and you can view their contents by double-clicking them. FIGURE 4.37
QUALCOMM’s Eudora
Summary
In this chapter, you learned about the infrastructure needed to support an Internet client. The items needed to support an Internet client include not only hardware but software as well. Internet clients run on hardware platforms with some sort of operating system—any of the Microsoft Windows
operating systems, any of the Unix or Linux versions, and MacOS. Once you have these items in place, you must have some form of Internet connection. An Internet connection could be through a network card or any of the WAN links that we discussed in Chapter 1. You also learned about various Internet clients, their configuration, and their functions. We discussed how web browsers can be configured with various preferences, such as different colors for hyperlinks and language settings. E-mail clients perform sending and receiving services for e-mail. We also explored how FTP clients can be used as separate utilities and how they have become integrated into web browsers. As we saw, most clients do have a specific purpose, but the trend has been to combine most of these services in the web browser. By understanding the different clients, you will be prepared for our next endeavor: network security.
Exam Essentials
Know the different hardware platforms that can be used by Internet clients. To get on the Internet, you need to use a hardware device of some kind. You can use either a PC or an Internet appliance. A PC is the most common choice, but more and more you’ll find people using Internet appliances (like the WebTV device) to surf the Internet. Internet appliances are inexpensive, but they can perform only certain Internet tasks (such as browsing the Web and sending and receiving e-mail). If you use a PC, you may need additional hardware (for example, a modem or NIC) to support your connection to the Internet.
Identify the various operating systems used by hardware platforms. Every hardware platform must have an operating system. This isn’t so much an Internet requirement as simply a requirement of the hardware. If you want to do anything on a computer or appliance, you must have an operating system installed. Windows 95/98 is the most common Internet client operating system.
Know the types of Internet connections and how to configure them. An Internet connection is a requirement for an Internet client to access the Internet. Whether you use a modem in your PC or your entire network is connected to the Internet, you must have some kind of connection to an ISP to support your Internet clients.
Know the different web browsers and how to configure them. A web browser is the Internet client that creates HTTP requests and displays the Internet content it receives. It is the tool used to view all the content on the World Wide Web (WWW). The two most popular browsers in use today are Netscape Navigator and Microsoft Internet Explorer. Both are based on the Mosaic web browser developed at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, so they are similar in appearance and function. The main difference in their appearance is the activity indicator: IE has a Windows icon, and Navigator has an animated N.
Understand how e-mail clients function and how to configure them. An e-mail client is an Internet client that can send and receive e-mail using the SMTP and POP3 (or IMAP) TCP/IP protocols, respectively. The two most popular Internet e-mail clients in use today are Netscape Messenger (also called Netscape Mail) and Microsoft Outlook.
Key Terms
Before you take the exam, be certain you are familiar with the following terms:
client
IP address
Review Questions
1. Which component of a web browser indicates activity when animated?
A. Menu bar
B. Button bar
C. Activity indicator
D. Status bar
2. What transfer mode should your FTP client be set to in order to successfully download an EXE file?
A. ls
B. binary
C. exe
D. ascii
3. Which feature of an e-mail client allows you to keep track of the e-mail addresses of the people you commonly send e-mail to?
A. Send and Receive
B. Inbox
C. Outbox
D. Address book
4. Which network hardware device connects a computer to an ISP and the Internet via a standard phone line?
A. PC
B. Modem
C. Internet appliance
D. Processor
5. Which Internet client allows you to perform Unix commands on another computer just as if you were sitting at the console?
A. Web client
B. FTP client
C. Internet news client
D. Telnet client
6. Which Internet client(s) allows you to view Internet HTML and multimedia content?
A. FTP client
B. Web client
C. Mail client
D. News client
7. Microsoft Outlook is an example of what type of Internet client?
A. Internet mail client
B. FTP client
C. Telnet client
D. Web browser
8. What is the name of the name resolution file for Windows 98?
A. HOSTS
B. HOSTS.TXT
C. HOSTS.SAM
D. HOSTS.NAM
9. When configuring a dial-up connection to an ISP, what information do you need to get from your ISP before setting up the connection on your Windows 95/98 PC?
A. ISP mailing address
B. Web server name
C. ISP dial-up phone number
D. Username and password
10. What item(s) must you configure in an e-mail client before you can send and receive e-mail?
A. Your e-mail username and password
B. SMTP mail server DNS name
C. Mail server memory configuration
D. POP3 mail server DNS name
11. You try to telnet in to a Unix host, but every character is duplicated on screen when you type it (such as if you type what, it displays as wwhhaatt). What could be wrong?
A. Local Echo is enabled on the client.
B. Local Echo is disabled on the client.
C. Local Echo is enabled on the server.
D. Local Echo is disabled on the server.
12. When your Internet client PC is connected to the Internet via a router, in addition to a client IP address and a subnet mask, what other TCP/IP address must be configured in order for the client to communicate on the Internet?
A. Default gateway
B. SMTP server address
C. DNS server address
D. WINS client address
13. Which is an example of the proper syntax for uploading a file (file.txt) to an FTP server?
A. upload file.txt
B. put file.txt
C. move file.txt
D. copy file.txt
14. Which web browser setting controls how a web browser receives information and automatically downloaded content from web sites?
A. Cookies
B. Security
C. Proxy
D. Mail server
15. What item(s) must you configure in an FTP client before you can connect to an FTP server, log in, and download files?
A. FTP server name
B. Client IP address
C. All of the above
D. None of the above
16. How do you access the home page settings for Netscape Navigator?
A. Edit > Preferences from the main window
B. Internet Control Panel
C. Tools > Preferences from the main window
D. IE Control Panel
17. What item(s) must be configured before you can read news with Microsoft Outlook Newsreader?
A. Subscribe to a newsgroup
B. Mail server DNS name
C. News server DNS name
D. Download newsgroup list
18. Where in Windows 95/98 do you configure the TCP/IP address of that machine?
A. Properties of My Computer > TCP/IP address
B. Start > Settings > Devices > Network Interface cards > TCP/IP
C. Start > Settings > Control Panel > TCP/IP
D. Start > Settings > Control Panel > Network > TCP/IP > Properties
19. Which item(s) must be configured for a client PC to support Internet clients?
A. TCP/IP address
B. Internet connection
C. Web browser
D. DNS server IP address
20. Which modem initialization (AT) command will reset a modem back to its factory defaults?
A. ATA
B. ATDT
C. AT&F
D. ATFACT
Answers to Review Questions
1. C. The activity indicator animates when the web browser is either sending or receiving data.
2. B. The binary command sets binary transfer mode, which is required for program files such as EXE files. The ls command is used to list files, the ascii command sets ASCII mode for text files only, and exe isn’t a real command.
3. D. The address book allows you to keep track of people and their e-mail addresses. You can send e-mail to them by selecting their name rather than having to type in their e-mail address every time.
4. B. A modem connects a computer to the Internet via a standard phone line. An Internet appliance can connect to the Internet, but it won’t connect a computer to the Internet. A PC and a computer are the same thing, and a processor is part of a PC, so both A and D are incorrect.
5. D. Of the clients listed, the only one that allows you to perform Unix commands as if you were sitting at the console of the computer is a Telnet client.
6. B. A web client is the client primarily used to view HTML and Internet content, although some mail and news clients can display such content as well.
7. A. Microsoft Outlook is used to send, receive, and read Internet e-mail. It cannot perform the functions of the other three clients listed.
8. A. The only correct name resolution file listed for Windows 98 is HOSTS.
9. C, D. When setting up a dial-up connection on your Windows 95/98 PC, at some point you will be asked for the dial-up phone number of the ISP and your username and password.
10. A, B, D. When configuring an e-mail client, you must configure your e-mail username and password, your ISP’s SMTP mail server DNS name, and the POP3 mail server DNS name.
11. A. Local Echo is a setting on the client only, and when enabled, it will duplicate on the screen all characters typed at the keyboard. It is used for those Unix hosts that don’t respond properly to Telnet requests.
12. A. Answers B, C, and D are often configured, but they are actually optional. The only item that is required is the default gateway.
13. B. The only correct syntax for uploading the file file.txt is put file.txt.
14. A. The Cookies setting controls how a web browser receives cookies (special pieces of information from a web site). It is a security setting because you may not always want to automatically receive content from a web server; it could be malicious content.
15. D. Although you must enter the FTP server name during the connection process, there is really no configuration for an FTP client apart from installing it.
16. A. The home page settings are found in the main Preferences window, which is accessed by choosing Preferences from the Edit menu in the main Netscape Navigator browsing window.
17. A, C, D. When using Microsoft Outlook Newsreader, you must enter the DNS name of the news server, download the newsgroup list, and subscribe to at least one newsgroup before you can read any news.
18. D. The TCP/IP address of a machine is set through the Network Control Panel in the Properties dialog box for the TCP/IP protocol.
19. A, B, D. Although a web browser is a client, it is not required for a PC to support Internet clients.
20. C. ATA tells the modem to answer an incoming call, and ATDT causes the modem to dial a number using tone dialing. AT&F resets the modem to its factory defaults.
Network Security
I-NET+ EXAM OBJECTIVES COVERED IN THIS CHAPTER:
3.9 Understand when to use various site monitoring procedures. Content may include the following:
Viewing server log files
Monitoring network traffic
Monitoring server utilization
Monitoring server network bandwidth utilization
4.1 Understand and be able to describe various Internet security concepts. Content may include the following:
Access control
Authentication
Encryption—PKI
Secure socket layers (SSL)
Auditing
Secure Electronic Transactions (SET)
4.2 Identify suspicious network activities. Content may include the following:
Multiple login failures
Ping Floods
Denial of service attacks
Mail flooding
SYN floods
Spoofing
4.3 Identify various methods for performing intrusion detection. Content may include the following:
Computer networks are a huge blessing, making it possible for us to share databases, files, messages, and other information quickly and easily. But the blessing comes with several problems—information security chief among them. Think about it. If you have a stand-alone computer, the only way to get access to the information on it (the only way that doesn’t involve very expensive, James Bond–type bugging equipment that monitors the computer’s radio frequency emissions) is to somehow sit down at that computer and operate it. On the other hand, a computer connected to a network may be accessed via its network link. It may be possible to access files and other information on the computer over that connection—indeed, there probably are legitimate reasons for doing so. But that connection can be abused, and a connection to the Internet just widens the field of potential intruders.
Author Craig Hunt makes an apt analogy in his Network Security, a Hewlett-Packard Job Aid published by O’Reilly & Associates. Hunt says that computers connected to the Internet are like homes and businesses and that the Internet is like the network of roads and highways. Anyone who owns property wants to protect his or her possessions, but the way to do that is not by blocking the thoroughfares—after all, they’re useful for legitimate purposes as well as nefarious ones. Rather, people interested in keeping themselves and their stuff safe from intruders lock their doors, securing safe private zones from open public zones.
You’ll hear a lot about the nobility of hackers and crackers. Some say that people who break into secured systems are really helping the owners of those systems in the sense that they point out problems that need to be corrected. Perhaps there’s some truth to this, but others—notably John C. Dvorak of PC Magazine—point out that in many small towns, people leave their doors unlocked because they trust their neighbors.
Clearly, if someone were to take advantage of this trust by illegally entering a house, they would be damaging the community and serving no useful purpose whatsoever.
This chapter aims to bring you up to speed on the issues and technologies of network security. You’ll get definitions of terms you’ll hear as you start to work on a global network populated by bad guys as well as good.
Security Fundamentals
The security of networked computers is all about making sure that the right people have access to the right information, and that they get it intact without anyone listening in as the information is transmitted. To accomplish those goals, you have to be sure that people are who they claim to be. You also have to have a way of dealing with security breaches while—and after—they occur, so you can figure out what is (or was) going wrong in order to correct the problem. The difference between security on a local area network (LAN) and security on the Internet is largely one of scale. The threats to security on the Internet are more numerous and more geographically distributed (and therefore harder to detect in some cases), but the motivations and techniques of attackers are largely the same on large and small networks. In this section you will learn about the key elements of computer security:
Authentication
Access control
Encryption
Auditing
These are all vital elements to a secure Internet presence. A weakness in any of them could lead to a breach of the whole system. Let’s examine them individually.
Authentication
The first step in network security is authentication. Authentication is the process of verifying that a person (or a piece of software, in situations where programs share information without human intervention) is who they claim to be. If we don’t know who someone is, we can’t be sure if they deserve access to protected computing resources. In real life, we can look at someone and ascertain if he is known and trusted, known and distrusted, or unknown.
In operations that take place across networks, visual identification (usually) isn’t possible, so other means of identity verification need to be devised. There are several methods of authentication that can be used. The types that we will look at are:
Password authentication
Key and card authentication
Biometric authentication
Digital signatures
Digital certificates
Password Authentication The simplest sort of authentication is a password. Someone who wants access to a computer resource enters their username. On a very (dangerously) trusting system, a username would be all that was needed to get in. But for most systems, the logic process goes like this: “Here is a person claiming to be so-and-so. What is something only so-and-so would know, by which I can verify the claimed identity?” The answer is so-and-so’s password. In theory, only so-and-so knows their password and is the only one capable of correctly entering it. So the system prompts for a password, and if the password entered matches the one the system knows to be associated with so-and-so, the person is admitted to the system. The problem with passwords is that people are sloppy with them, and this leads to more problems. People write down their passwords, making them more easily stolen—see the beginning of the movie WarGames for an example of this. Other times, people choose passwords that may easily be guessed, such as the names of relatives or towns (going back to WarGames, Prof. Falken’s choice of his son’s name—Joshua—as his password was woefully obvious). When choosing a password, select one that’s not quite so obvious. Indeed, it’s best to choose a combination of letters and numbers. Sometimes, hackers will attack a system with a dictionary program that submits thousands of words as passwords in a methodical guessing approach—this is called a dictionary attack, a simple form of brute force attack. Choosing a secure password is not that difficult. One popular technique is to choose a sentence—say, “My cat Otis has two twitching ears”—and use the initial characters to make a password—McOh2te in this case. Another method uses an object that you probably see every day—vanity license plates (ever notice how creative people are with only six or seven letters?). A great example is 0T2BR1CH (“Ought To Be
Rich”). You can also use character-number substitution for real words. Character-number substitution swaps in a numeral that resembles the letter it replaces: 0 for an O, 1 for an L or an I, 2 for a Z, 5 for an S, 3 for an E, and so on. Make sure to use a mixture of uppercase letters, lowercase letters, numerals, and other characters if possible.
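The sentence-initials trick described above is easy to sketch. The helper below is illustrative only (the number-word table and the function name are our own invention), but it reproduces the chapter’s “My cat Otis has two twitching ears” example:

```python
# Hypothetical helper: build a password from the initials of a memorable
# sentence, turning number words into digits.
NUMBER_WORDS = {"zero": "0", "one": "1", "two": "2", "three": "3",
                "four": "4", "five": "5", "six": "6", "seven": "7",
                "eight": "8", "nine": "9"}

def sentence_to_password(sentence: str) -> str:
    parts = []
    for word in sentence.split():
        if word.lower() in NUMBER_WORDS:
            parts.append(NUMBER_WORDS[word.lower()])  # number word becomes a digit
        else:
            parts.append(word[0])                     # keep the original capitalization
    return "".join(parts)

print(sentence_to_password("My cat Otis has two twitching ears"))  # McOh2te
```

The result mixes uppercase letters, lowercase letters, and a numeral, yet remains easy for its owner to reconstruct from the sentence.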
If you are going to administer a network, the best defense against a brute force attack on client passwords is to place a “lock” on the accounts. With most network operating systems (NOSes), you can set a client account for x number of retries (lock-out number) before the account itself is locked out. At that point, even if the hacker guesses the right password the system still won’t allow them in. Most networks allow between three and five attempts before locking an account. The rule of thumb in setting a lock-out number is that the fewer attempts, the less chance of a hacker getting in. Unfortunately, too few attempts and you’ll be swamped with customer service calls when it’s time for a password change. Another good defense is to force your clients to change their passwords periodically—at least once per quarter. Clients usually won’t change their passwords on their own, and a static password can be just as dangerous as no password at all over time.
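The lock-out logic described above can be sketched in a few lines. The three-attempt threshold and the in-memory account store are illustrative assumptions, not any particular NOS’s implementation:

```python
# Sketch of account lock-out: after MAX_ATTEMPTS failures, even the
# correct password is refused until an administrator unlocks the account.
MAX_ATTEMPTS = 3

accounts = {"sandra": {"password": "McOh2te", "failures": 0, "locked": False}}

def try_login(username: str, password: str) -> bool:
    acct = accounts.get(username)
    if acct is None or acct["locked"]:
        return False                      # unknown or locked account: deny
    if password == acct["password"]:
        acct["failures"] = 0              # a good login resets the counter
        return True
    acct["failures"] += 1
    if acct["failures"] >= MAX_ATTEMPTS:
        acct["locked"] = True             # brute-force defense kicks in
    return False
```

Once the account locks, `try_login` returns False even for the right password, which is exactly the behavior that frustrates a dictionary program.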
Key and Card Authentication In situations in which passwords aren’t enough, some systems require users to have a physical thing they must present to gain access. Whether it’s an actual metal key to be inserted and twisted, or a magnetic card, physical authentication devices increase the likelihood that the person trying to gain access is who he claims to be. The logic is that it’s less likely for a password and a key to be stolen than just one or the other. Also, a legitimate user will know if his physical device has been lost or stolen and can report it as such. There’s no way to know if your password has been stolen—until it’s too late. A perfect example of key and card authentication is an ATM card. Without the card, you can’t “swipe in,” and without the PIN, the card is useless.
Biometric Authentication It may sound like something out of a James Bond movie (indeed, it is), but it’s becoming increasingly possible and economical to use biometric technology to authenticate users. Biometric authentication relies on the fact that all people have unique physical characteristics that may be used to absolutely distinguish them from one another. The practicalities of biometrics are still
being worked out, but it’s clear that there are going to be several viable solutions of varying accuracy and expense:
Fingerprints Everyone has fingerprints that are different from those of everyone else. There are devices that scan the ridges on the tip of your index finger and compare the scanned ridges to a database of fingerprint images. Fingerprinting systems that allow physical access to an area, as well as computer access, are currently available. Some fingerprinting systems will even allow you to lock down files and directories on a machine.
Voiceprints Everyone has a unique voice, which a computer can describe mathematically and use as a point of comparison for authentication purposes. Voiceprint systems are uncommon and still in their infancy.
Face recognition Computers and video cameras can look at a user and compare facial dimension data—such as eye separation, mouth width, and nose size—to a database. There are still some problems with face recognition systems, however, so it is unlikely that you will see this technology for a few more years.
Retinal scanning Similar to fingerprints, the pattern of capillaries on the inside rear of your eyeballs is unique to you. Machines can scan this pattern and compare it to a database of images. Retinal scanning equipment is usually used for physical access. The cost and computing power needed to use the system on a computer is still prohibitive.
There are two problems with biometrics. For one thing, people change over time (both long-term and short-term). What if you have a scratchy morning voice when you first try to log on via voiceprint? What happens if you go away for the holidays, gain 10 pounds, and then try to have your computer recognize your face? What happens if you wear makeup some days and not others? These are problems that biometrics companies are working on, and they’ve made real progress. The other, less serious problem is a biometric impostor. It’s sometimes possible to make a latex cast of someone’s finger, complete with ridges, and fool a fingerprint scanner that way. This problem has been pretty much licked by modern devices, though.
Digital Signatures Digital signatures are pieces of electronic information that serve to guarantee that an item—a document, a credit-card number, whatever—was not tampered with as it traveled over the Internet from sender to recipient. By
examining a digital signature that arrives with a piece of data, you can be sure that the data has arrived in exactly the same state it was in when it left its sender’s computer. Digital signatures rely on a mathematical algorithm called a one-way hashing algorithm, or one-way hash. A hash takes a piece of data—the characters in an e-mail message, say—and runs it through a series of operations. The operations yield a value that is unique to the information—even slightly different pieces of the original information will yield vastly different hash results. One-way hashes are so named because you can’t work them backwards. You can’t look at a hash result and figure out the original data from it. This sounds a lot like public-key encryption (a means of hiding messages from prying eyes, which we’ll get to soon), and indeed that process also relies on one-way mathematical functions. In digital signatures, the hash doesn’t alter the information itself. Rather, it generates a result that’s based on the information. How Digital Signatures Work Let’s walk through the process of sending a digitally signed message between two parties, Al and Sarah. The message might be an e-mail message, an attached document, the contents of a web form—the details of the information don’t matter. First, Al decides that he wants to send a digitally signed message to Sarah. Al generates his message in the usual way, typing it in plainly readable text; then, Al applies a hashing algorithm to the message, generating a result. Finally, Al puts the result in a package with a statement of the hashing algorithm he used and encrypts the package (but not necessarily the message itself) with Sarah’s public key. The encrypted package is called the digital signature, and it’s sent along with the message to Sarah. When Sarah gets the signed message, she uses her private key to decrypt the digital signature.
She then knows the hashing algorithm Al used and the results Al got when he applied that algorithm to the message. If Sarah can apply the algorithm to the message she received and get the same result that Al sent, then she knows that the message was not altered as it traveled across the Internet. Digital signatures only verify that information wasn’t tampered with during transmission—they don’t verify that the sender is who he claims to be. E-mail messages commonly use digital signatures. To use digital signatures, you need a digital certificate.
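The hash-comparison step Sarah performs can be illustrated with Python’s standard `hashlib`. This sketch omits the public-key encryption of the signature package and shows only the one-way hash check; SHA-256 stands in for whatever algorithm Al declares:

```python
import hashlib

def hash_of(message: str) -> str:
    # One-way hash: easy to compute, infeasible to reverse.
    return hashlib.sha256(message.encode()).hexdigest()

original = "Meet me at noon."
signature_hash = hash_of(original)           # what Al sends alongside the message

received = "Meet me at noon."
print(hash_of(received) == signature_hash)   # True: the message arrived intact

tampered = "Meet me at one."
print(hash_of(tampered) == signature_hash)   # False: even a small change is caught
```

Note that the check proves only integrity, not identity, which is exactly the limitation the text describes.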
Digital Certificates Certificates are proof that people (or things) are what they claim to be. Your passport is a good example of a certificate. It states some information about you—your name, your date of birth, your nationality, and so on—and associates it with a photograph of you. The idea is that someone can look at your passport, compare the photo in it with your actual appearance, and assume that, if the two match, the personal information in the passport applies to you. The other important element of a passport is that a government issues it. You can’t make your own passport. Rather, you have to prove your identity through procedures established by your government. If your government is well known and its identity-verification procedures are widely regarded as rigorous, your passport is more likely to be regarded as an accurate statement of who you are. Forging a passport is difficult because governments—particularly the well-regarded ones—use special paper, holograms, distinctive bindings, and other devices to make it hard for individuals to manufacture passports. You can think of a digital certificate as the electronic analogue of a passport in an environment that’s based on bits rather than on paper. Like a paper passport, a digital certificate—we’ll call them certificates from now on—meets several criteria:
It’s unique to its owner.
It’s easy to determine whether the person presenting the certificate is in fact its owner.
A universally recognized authority issues it.
It’s hard to forge.
How Digital Certificates Work Say you want a certificate for the purpose of proving your identity on the Internet. The first step in acquiring one is to generate a key pair for yourself—one public key and one private key. Most web browsers will do this for you, as will various e-mail clients and other Internet tools. You must take steps to protect your private key, either by encrypting and password-protecting it on your local machine, storing it on a smart card or other device you carry with you, or some combination of these. Your private key must remain secure. Then, you’d take your public key and approach a certification authority (CA). A certification authority is an organization that’s responsible for verifying the identity of people who come to it and issuing certificates to those whose identities can be verified. Typically, you’d go to the CA’s web site and enter
some information—including your public key—there. Then it’s up to the CA to verify that you are who you say you are and not someone who’s using your public key and personal information to gain a certificate in your name. Often, CAs issue different kinds of certificates to entities that have proven their identities with varying degrees of rigor. An applicant might get one kind of certificate if the CA merely verifies that e-mail sent to the applicant’s address is received. The applicant might get a better certificate if the personal information they provide matches some authoritative, secure database (such as a government list of Social Security numbers in the United States). They might get an even better certificate if they actually show up at the CA’s office for a fingerprint check. Different applications require different degrees of assurance that someone is who they claim to be. Once a CA has verified the applicant’s identity to the necessary degree, the CA generates a certificate for the applicant (usually one that complies with the X.509 certificate specification). The certificate includes several elements:
A serial number
Information about the certificate holder (name and affiliation)
The certificate holder’s public key
Information about the CA
The CA’s digital signature
An expiration date
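As a rough illustration, the certificate elements listed above map onto a simple record. Real X.509 certificates are ASN.1-encoded binary structures, so the class and field names below are a teaching aid only:

```python
from dataclasses import dataclass
from datetime import date

# Illustrative container mirroring the certificate fields listed above;
# an actual X.509 certificate is not a Python object.
@dataclass
class Certificate:
    serial_number: int
    holder_name: str          # information about the certificate holder
    holder_affiliation: str
    holder_public_key: str    # the certificate holder's public key
    ca_name: str              # information about the CA
    ca_signature: str         # the CA's digital signature over the contents
    expires: date             # the expiration date

    def is_expired(self, today: date) -> bool:
        return today > self.expires

cert = Certificate(1001, "Al Example", "Example Corp", "…public key…",
                   "Example CA", "…signature…", date(2002, 12, 31))
print(cert.is_expired(date(2003, 6, 1)))   # True: past the expiration date
```

Software that receives such a certificate checks the CA’s signature and the expiration date before trusting the public key inside.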
Bear in mind that a certificate can be hijacked, but doing so isn’t very useful unless the thief also steals the private key that corresponds to the public key incorporated into the certificate. If you send the thief an encrypted message based on the public key in the stolen certificate, he’d be unable to read it without the corresponding private key, and the game would be up. Together with digital signatures, certificates provide the basis for nonrepudiation technologies. A nonrepudiation technology is any means of providing absolute proof (where absolute is defined as strong enough to stand up in court) that something occurred. A nonrepudiation scheme might involve a system of receipts—you send me a certified, signed message and I send you another certified, signed message that says I received what you sent. The messages don’t have to be e-mail messages—they could just as well be stages in a transaction involving web forms.
Identity theft is one of the fastest growing crimes in the United States. Criminals are able to gather enough information on an individual, sometimes using only a name and address or even just a license plate number, to apply for new credit cards. They run up the credit cards to the limit, and then the credit card companies come after you for payment. Although most credit card holders are only liable for the first $50, and some card companies won’t even charge you that much, this can ruin your credit rating! The United States Congress passed a law in 2001 that makes a digital signature just as legal and binding as your handwritten one, an effort to make digital commerce, or e-commerce, easier and more secure. This means that identity theft has taken on a whole new aspect—digital identity theft. Consider the situation that Microsoft found itself in when, in May 2001, someone managed to obtain a digital certificate from VeriSign under the Microsoft name. Once discovered, Microsoft quickly issued a patch that would void the stolen certificate; however, before this error was discovered, the hacker could have sent out anything (including some nasty virus-infested programs) and people would have thought that Microsoft was to blame.
Access Control Say you’ve ascertained the identity of a person trying to gain entry to your system and agreed to let them in because the authentication information matches. You can then protect your resources by restricting users—both individually and as groups—to only the information they need to be able to do their jobs. It stands to reason that a user from the engineering department has no need to look at most internal financial statements and that accountants don’t need to see product drawings. Access control is any system that keeps people from accessing resources they don’t need. If they can’t access it, they can’t steal it. Access control is more complicated than it might seem at first because there’s overlap among the functional units of an organization. An engineer might not deserve to know how much money his colleagues are making, but it would be wrong to bar him from all human resources information because he has a legitimate need to see data about his health insurance. The trick is
figuring out who needs which information and granting them the needed access, but no more. There are a number of methods of controlling access to critical data on a LAN or over the Internet. Each method functions to secure data on an internal network and keep it from being accessed without authorization via the Internet. The most common methods that we will discuss include:
Firewalls
Proxy servers
Access Control Lists (ACLs)
Packet filtering
Application filtering
Firewalls A firewall is a computer or other network device that prevents unwelcome parties from gaining access to a secured network. Just as a conventional firewall prevents problems (such as fire) in an engine compartment from immediately spreading to the rest of a car or aircraft, a network firewall keeps the bad guys on one side of itself and the good guys (those whose information it protects) on the other. Firewalls use a variety of techniques to provide insulation from attacks. Access Control Lists (ACLs), dynamic packet filtering, and protocol switching are among the most popular techniques. We’ll examine them individually now.
A thorough understanding of the techniques used by firewalls to thwart hackers is not just an essential part of passing the I-Net+ exam, but your frontline defense against cyberterrorism.
Access Control Lists (ACLs) Access Control Lists (ACLs), as they apply to firewalls, are collections of rules about what resources on the secured network may be accessed by entities (people and machines) on the Internet, outside the secured network.
Routers may (indeed, should) be equipped with Access Control Lists that determine how they handle requests for service. An ACL might be set up, for example, to let people on the open Internet access the web server that contains the company’s public sales information. The same ACL might prohibit anyone from outside the local network from accessing the database server that keeps track of orders. Figure 5.1 shows how an ACL-equipped router might protect a LAN from Internet access. See how the router—acting as a firewall here—has a list of rules? Each time a packet comes through, it checks to see that its desired path is permissible according to the rules in its ACL. FIGURE 5.1
A router with an ACL protects a private network.
[The figure shows Network B, a “public” network, connected through an ACL-equipped router to Network A, a “private” network. The router’s ACL reads: A can access B; B cannot access A; B can access A if a secure, authenticated connection is detected.]
The problem with ACLs is that it’s possible for bad guys to pretend to be using computers they’re not—a practice known as IP spoofing. That’s why other protective measures remain necessary.
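The rule-checking a router performs against its ACL can be sketched as a first-match lookup. The rule format, network names, and default-deny policy below are assumptions for illustration:

```python
# Toy ACL in the spirit of Figure 5.1: each rule names a source network,
# a destination, and an action; the first matching rule wins.
ACL = [
    ("public",  "web_server", "permit"),   # outsiders may reach the web server
    ("public",  "db_server",  "deny"),     # but not the order database
    ("private", "any",        "permit"),   # inside hosts may go anywhere
]

def check(source: str, destination: str) -> str:
    for rule_src, rule_dst, action in ACL:
        if rule_src == source and rule_dst in (destination, "any"):
            return action
    return "deny"                          # default: deny anything unmatched

print(check("public", "web_server"))   # permit
print(check("public", "db_server"))    # deny
```

A default of “deny” for traffic no rule mentions is the conservative choice most firewall administrators make.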
Dynamic Packet Filtering When two computers are engaged in communications over the Internet, they communicate by sending packets back and forth. When one of the computers is on a network that’s protected by a firewall and the other is on the open Internet, the packets must pass through the firewall as they travel from one computer to the other. A firewall that keeps track of the packets traveling between two computers as part of a particular communications session is a powerful
security device. The process of monitoring the packets involved in an exchange and rejecting those that don’t fit is called dynamic packet filtering. You see, the packets involved in a communications session are numbered, and a firewall with dynamic packet filtering features keeps track of the number of the next packet it expects to receive as part of a transaction (the number is stored in a state list). If a packet with another number comes along, claiming to be part of the same transaction, the firewall knows not to let it into the secured network. Figure 5.2 shows how this works. The session the firewall is keeping track of involves the server on the left (A) and the client on the right (B). When the hacker attempts to impersonate the server, the firewall protects the client by recognizing that the hacker is attempting to use nonsequential packets. FIGURE 5.2
A firewall with dynamic packet filtering protecting a network
[The figure shows a session between server A and client B passing through the firewall. The firewall’s state list for the session reads: last packet #1238, next packet #1239. The server is sending packet #1239, which the client is expecting. A hacker attempts to get in using packet #1211 and is denied access: the state list says the firewall should expect packet #1239 next, so the firewall rejects packet #1211.]
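The state-list check from Figure 5.2 can be sketched as follows. The session key and packet numbers mirror the figure; the dictionary standing in for the state list is an illustrative assumption:

```python
# Sketch of dynamic packet filtering: the firewall tracks the next packet
# number it expects for each session and drops anything out of sequence.
state_list = {("A", "B"): 1239}            # session key -> expected packet number

def filter_packet(session, packet_number: int) -> bool:
    expected = state_list.get(session)
    if expected is None or packet_number != expected:
        return False                       # unknown session or stale number: reject
    state_list[session] = expected + 1     # advance the state list
    return True

print(filter_packet(("A", "B"), 1211))     # False: the hacker's packet is dropped
print(filter_packet(("A", "B"), 1239))     # True: the legitimate packet passes
```

After the legitimate packet passes, the state list advances to #1240, so a replay of #1239 would also be rejected.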
Application Filtering Software has come a long way in the past few years, but because of its complexity, there are bound to be bugs somewhere. Think of an error in a piece of software, like your web browser, that allows a hacker to gain entry into your computer. If you believe this to be a bit far-fetched, you may want to take a look at the security patches available from your web browser vendor (such as Microsoft for Internet Explorer). Firewalls and gateways can be configured to be aware of some of the more common bugs that could allow an outsider to gain entry into your network, and apply application filtering to block traffic that tries to exploit them.
Protocol Switching Some Internet attacks—notably SYN floods and Ping of Death attacks—take advantage of characteristics of the TCP/IP protocols on which the Internet operates. An easy way to stop such attacks is to base your secure networks on protocols other than TCP/IP. A TCP/IP attack won’t be effective against a secure network that’s based on AppleTalk, NetBEUI, or IPX. A firewall that provides protocol switching takes TCP/IP traffic from the Internet and translates it into some other protocol for transmission across the secure network. On the other hand, there’s a lot to be said for the TCP/IP protocols as the basis for a protected network. For one thing, they’re the standard network protocols for Windows NT, Novell v4.11 and up, and Unix machines. You can use protocol switching to protect networks like this by means of a dead zone. A dead zone is a network segment between two routers on which a non-TCP/IP network protocol runs. TCP/IP traffic runs into one router, which translates it into another protocol and passes it on to the other router. The second router translates the traffic in the intermediate protocol back into TCP/IP for use on the secured network.
Until Novell’s NetWare 4.11, also known as IntraNetWare, IPX/SPX was the protocol used in NetWare environments. Novell has realized that while IPX/ SPX is a viable protocol, they needed to integrate their operating systems with web applications. Supporting multiple protocols on a network is not desirable unless you are working in an environment, such as a military facility or a technical company wanting to secure its product lines, that warrants it.
Figure 5.3 shows how protocol switching works with and without a dead zone. The scheme on the left has a dead zone—the Internetwork Packet eXchange/Sequenced Packet eXchange (IPX/SPX) protocol is used between the two routers. The system on the right has no dead zone, but the whole secured network runs IPX/SPX and so is secure from TCP/IP-based attacks.
FIGURE 5.3 Protocol switching with and without a dead zone
[On the dead-zone side, TCP/IP-only traffic enters a router, crosses an IPX/SPX-only segment (the dead zone), and reaches the protected network through a second router or switch. On the other side, protocol switching occurs inside the firewall itself: the first NIC understands TCP/IP only and the second NIC understands IPX/SPX only, so the protected intranet (e-mail server, file & print server, and internal database and web server) runs IPX/SPX only.]
Demilitarized Zones (DMZ) A demilitarized zone (DMZ) is a special network segment that’s kept separate from networked resources that need to be held to a higher level of security. Resources in the DMZ—such as web servers and Internet-accessible database servers—are meant to be accessed from the open Internet. They should be protected from attack—no one wants their web pages altered by vandals—but they can’t be as heavily defended as, say, an internal mail server or a database server that holds proprietary sales data. With the publicly-accessible machines on one segment and the really sensitive resources on another, the DMZ can be given one level of protection and the rest of the network another, much higher, security level. Figure 5.4 shows how a DMZ fits into a firewall protection scheme. The protected intranet, on a network segment separate from the
Internet servers, can be made subject to different security rules than the public and semipublic resources. FIGURE 5.4
A firewall with Internet-accessible computers in a DMZ
[The figure shows traffic from the Internet entering a DMZ router. A switch on the DMZ segment connects the e-mail router, the web server, and the FTP server. Behind the firewall, a second switch connects the protected intranet: the e-mail server, the file & print server, and the internal database and web server.]
There are three types of DMZ configurations in use today:
Bastion host
Three-homed firewall
Back-to-back firewall
BASTION HOST
A bastion host configuration is the simplest, least expensive, and easiest to maintain DMZ configuration. The bastion host is a firewall with two network adapters. One NIC connects directly to the Internet, allowing Internet
traffic to and from the host. The second network adapter connects directly to your network, which provides Internet access to your clients. However, while this configuration offers a single point of defense, it also allows Internet traffic to get into your network.
A bastion host really doesn’t fall into the DMZ category; however, you need to know that this configuration exists and how it works.
THREE-HOMED FIREWALL
A three-homed firewall is similar to the bastion host, but it has three network adapters. The first NIC connects directly to the Internet and the second is connected to your network; the third network adapter connects to the DMZ and acts as the “middle-man” between the Internet and your network. For example, if someone were to connect to your site from the Internet, they would first be routed through the NIC attached to the Internet. The request would then pass to the NIC connected to your DMZ. The traffic wouldn’t cross over to your internal network. If your network clients wanted to reach the Internet, however, they would pass through all three NICs.
BACK-TO-BACK FIREWALLS
With a back-to-back firewall configuration, you are actually using two firewalls around your DMZ. One firewall leads from your internal network to the DMZ, and the second firewall goes from the DMZ to the Internet. It is considered to be the safest and most secure method of protecting your network from attacks; however, the major drawback to this configuration is the expense of purchasing, managing, and maintaining two separate firewalls. Remember, though, you get what you pay for.
Proxy Servers A proxy server is a computer that acts on behalf of the computers on a secured network when they want to access information on the public Internet. If you are on a secured network and want to access a web site, you’d send your request to the proxy server. It would then get the information you want and, in a separate transaction, send it to you. Think of proxy servers as clerks in an old-fashioned store, where you don’t pick the products off the shelves yourself but rather make requests of the clerk. Figure 5.5 shows how this works. The data traveling in each direction remains untouched, but the address header on each packet of data
is changed when the packet passes the proxy server. The remote resource is unable to send packets directly to the client. FIGURE 5.5
A proxy server assisting an Internet transaction
[The figure shows client A sending a request packet addressed “From A” to the HTTP proxy server. The proxy discards the original address header, readdresses the packet “From proxy,” and forwards it across the Internet to the web server. The server’s reply, addressed “From server,” returns to the proxy, which readdresses it “From proxy” and delivers the data to A.]
A proxy receives a request from a client and makes the request on behalf of the client. This example shows an HTTP proxy server.
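The address rewriting a proxy performs can be modeled conceptually. Here a packet is just a dictionary, and the proxy’s address is a made-up name; real proxies of course work on actual network traffic:

```python
# Conceptual sketch of proxying: the outbound packet carries the proxy's
# address, so the remote server never sees client A directly.
PROXY_ADDR = "proxy.example.net"           # hypothetical proxy address

def forward_to_server(packet: dict) -> dict:
    outbound = dict(packet)                # leave the client's packet untouched
    outbound["from"] = PROXY_ADDR          # the server sees the proxy, not A
    return outbound

request = {"from": "client_a", "to": "web_server", "data": "GET /"}
print(forward_to_server(request)["from"])  # proxy.example.net
```

Because the server only ever learns the proxy’s address, it cannot send packets directly to the client, which is the insulation the text describes.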
Encryption Encryption is any method of converting a readable message—known as a cleartext message—into unreadable ciphertext. Encryption is all about advanced mathematics—remember the one-way functions from our earlier discussion of digital signatures? To refresh your memory, here’s how encryption works. If Bud wants to send a message to Beth over the Internet and has reason to believe that Callie will intercept the message en route, Bud can encrypt the message before sending it. Bud takes his cleartext message and runs it through an encryption algorithm—a sequence of mathematical procedures that alters the text and renders it unreadable. Encryption algorithms depend on keys, which we’ll cover in greater depth soon. Having encrypted his text, Bud can send the ciphertext to Beth over the Internet. Beth can decrypt the message with a decryption algorithm and a particular key. If Callie intercepts the message, it doesn’t matter, because she doesn’t have the key that Beth has.
Differences among encryption systems center on the key. Keys can be long or short, public or private. Encryption and decryption algorithms can be the same or different. On the Internet, most transactions rely on public-key encryption, which is a system that involves encrypting messages with one key and decrypting them with another. Encryption comes in handy for all sorts of tasks, ranging from e-mail messages to electronic commerce transactions involving credit card numbers to securing sensitive data on a client’s hard drive. Encryption relies upon mathematical functions that use a key to convert cleartext into ciphertext. The key is a string of random characters, the longer the better. Longer keys result in ciphertext that’s harder to crack.
Key length is measured in bits. A 40-bit key is considered the minimum for even marginal security, whereas ciphertext created with a 128-bit key is very hard to decode.
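The difference those key lengths make is easy to quantify: every added bit doubles the number of keys a brute-force attacker must try.

```python
# Each bit doubles the keyspace, so the gap between key lengths is exponential.
keys_40 = 2 ** 40
keys_128 = 2 ** 128

print(keys_40)                 # 1099511627776 possible 40-bit keys
print(keys_128 // keys_40)     # a 128-bit key has 2**88 times as many
```

An attacker who could exhaust the entire 40-bit keyspace in one second would still need billions of billions of years for a 128-bit key.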
There are two basic kinds of encryption—public-key and private-key. Private-key encryption is, under certain circumstances, theoretically unbreakable, but it is not well suited to the kinds of transactions that take place on the Internet, which frequently involve people who have never met. Public-key encryption is extremely strong but theoretically breakable (though only with extraordinarily powerful computers and hundreds of years) and does not require participants in a transaction to meet. We’ll explore these applications further in the following sections.
Private-Key Encryption Private-key encryption, also known as symmetric encryption, relies upon a single, secret key. The sender uses the secret key to encrypt a message. The encrypted message is then sent to the recipient, who uses the same key to convert it back into cleartext. There’s one key, and both parties to the exchange must know what it is. For the exchange to be secure, no one else must know what the key is. Lots of military communications rely on private-key encryption. When there’s a message to be transmitted to the missile submarine or the B-52, the base uses a specific code to encrypt it. The encrypted message then is radioed to its recipient, where someone uses the same code key (often dramatically snapped from a plastic shell) to decode the message. The thing about private keys is that they must be synchronized—the sender and the recipient must have the same key. Further, the keys must be
synchronized securely, without a spy or other bad guy getting a copy with which to decode the messages. Private-key encryption works great for environments like the military, where keys may be synchronized between senders and recipients at (theoretically) secure locations like ports and air bases. But don’t forget that moles sometimes steal private keys. In any case, the Internet is not a military operation, and there’s rarely an opportunity for senders and receivers to synchronize their keys in a secure location. The Internet demands a different solution, and so private-key encryption isn’t used much on the Internet.
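A toy symmetric cipher makes the single-shared-key idea concrete. XOR is used here purely for illustration; real private-key systems use ciphers such as DES or AES:

```python
# Toy symmetric cipher: one shared secret key both encrypts and decrypts.
def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR each byte with the key (repeating the key as needed);
    # applying the same operation twice restores the original.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

shared_key = b"sekrit"                           # both parties must hold this
ciphertext = xor_cipher(b"LAUNCH AT DAWN", shared_key)
plaintext = xor_cipher(ciphertext, shared_key)   # the same key reverses it
print(plaintext)                                 # b'LAUNCH AT DAWN'
```

The weakness the text identifies is visible here: both parties must somehow obtain `shared_key` without anyone else seeing it.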
Public-Key Encryption In contrast to private-key encryption schemes, sending an encrypted message by a public-key, or asymmetric encryption, setup requires two keys. One key is used to encrypt the message; the other is used to convert it back into readable cleartext form. Let’s say Bill wants to send a message to Kate. To encrypt the message, Bill looks up Kate’s public key—a key that’s freely available and listed in a directory—and uses it to encrypt his message. He then sends the encrypted message to Kate via the open network. It’s true that anyone who wants to can snag the message along the way and also can get access to Kate’s public key the same way that Bill did. But the neat thing is that Kate’s public key is useful only for encrypting messages meant for her—you can’t use it to decrypt messages. This is called a one-way encryption scheme. To decrypt the message Bill has sent her, Kate must apply her private key. Her private key is a secret, known only to her and useful only for decoding messages encrypted with her public key. Unlike in a private-key encryption scheme, there’s less chance that the private key in a public-key system will leak because there’s no need to share it with anyone at all. Figure 5.6 shows how this works. In the illustration, X first sends a message to Y. That message is encrypted with Y’s public key and decrypted with Y’s private key. The reply message, which goes from Y to X, is encrypted with X’s public key and decrypted with X’s private key.
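The asymmetry can be demonstrated with textbook RSA using tiny primes. The numbers below are far too small to be secure, and real systems add padding, but the sketch shows the public key encrypting and only the private key decrypting:

```python
# Textbook RSA with tiny primes, for illustration only.
p, q = 61, 53
n = p * q                     # the modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                        # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent: modular inverse of e (Python 3.8+)

public_key = (e, n)           # freely published, like Kate's directory entry
private_key = (d, n)          # known only to the recipient

message = 65                  # a message encoded as a number smaller than n
ciphertext = pow(message, public_key[0], public_key[1])
recovered = pow(ciphertext, private_key[0], private_key[1])
print(recovered)              # 65: only the private key undoes the public key
```

Anyone can run the first `pow` with the public key, but without `d` there is no practical way to reverse it, which is exactly the one-way property the text describes.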
If you use an Internet access method where the Internet is “always on,” such as a cable modem or xDSL, it is more likely that a hacker can get into your hard drive and steal your private key. It’s a good idea to invest in personal firewall software for your home computer to shut down access to your hard drive from the Internet.
Political Restrictions on Encryption If you’ve ever downloaded a web browser in the United States, you were probably offered a choice of key lengths. You could download the 40-bit version of the browser without any further questions, but you had to make a lot of statements about your citizenship and location if you wanted to have a browser that could handle 128-bit encryption. The reason for this is legislative. The U.S. government has designated strong encryption technologies as munitions, along the lines of bombs and air-search radar units. The theory is that, if a hostile government (or an international criminal, or a terrorist group, or whatever) had strong encryption, they’d be free to communicate in secret.
Therefore, it is illegal to export from the United States encryption technologies based upon keys of more than 40 bits. If you’re going to design a piece of software for use outside the United States, either do your developing outside the country’s borders and use non-U.S. strong encryption technology or, if it’s developed in the United States, use 40-bit encryption.
Encryption Technologies As with any technology, there’s a difference between encryption in theory and encryption in practice. Although the principles of public-key encryption apply to many different situations—such as secure electronic mail and secure credit card transactions with web sites—the implementations of encryption differ in different circumstances. Here, we’ll explore Pretty Good Privacy (PGP) and S/MIME as e-mail encryption technologies, Secure Sockets Layer (SSL) as a way of encrypting web traffic, and Secure Electronic Transactions (SET) as a means of protecting credit card information. Pretty Good Privacy (PGP) Developed by mathematician Philip R. Zimmermann, Pretty Good Privacy is a multipurpose encryption scheme based on public-key architecture. PGP can use keys of various lengths, is simple to apply to many different kinds of information, and can be very powerful—so powerful that Zimmermann had to fight the government in court to defend his right to distribute and make money from PGP. It was the first widely available application of public-key encryption theory for the Internet. PGP later became the basis of a company—PGP, Inc.—that sold encryption software and technology. Network Associates eventually bought that company and now operates it as a division.
Read all about PGP at www.pgp.com.
Secure Sockets Layer (SSL) Secure Sockets Layer (SSL) provides an encrypted link between a client computer and a server computer over a TCP/IP connection. Most frequently, it’s seen as the mechanism by which web browsers establish secure connections with web servers for the purpose of exchanging confidential information like credit card numbers.
Here’s how an SSL transaction works. A client requests a secure connection between itself and a server by requesting a particular web page or other resource from that server. The server, which has to have a valid digital certificate in order for SSL to work, presents its certificate to the client. The client then has a reasonable level of certainty that the server is what it is supposed to be (say, that a computer presenting itself as Amazon.com’s e-commerce server really belongs to Amazon.com). Once the identity of the server has been validated, the client and server use a single transaction, encrypted with a public key, to agree on a private key that’s used for the rest of the transaction. Once there is a private key, the client and server can securely exchange sensitive information, such as credit card numbers. Secure Multipurpose Internet Mail Extension (S/MIME) Secure Multipurpose Internet Mail Extension (S/MIME) is a proposed standard for securing electronic mail. S/MIME builds on ordinary e-mail attachments, digital signatures, and public-key encryption. A message that is to be sent in S/MIME format is typed normally and then converted to S/MIME. That process encrypts the message text using the recipient’s public key and generates a digital signature for the message—see the section on digital signatures earlier in this chapter. The digital signature is attached to the message as a binary file, which is one of the problems with S/MIME—the binary attachment slows the message’s progress across the Internet. Secure Electronic Transactions (SET) Secure Electronic Transactions (SET) is a system, developed by Visa (a credit card company), that provides a framework for secure electronic financial transactions on the Internet, particularly at the consumer level. SET relies heavily on VeriSign certificate technology to validate the legitimacy of consumers, merchants, billing consolidators, and card-issuing banks.
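The SSL key-agreement step described above (one public-key transaction used to settle on a shared session key, after which fast symmetric encryption takes over) can be sketched in Python. The toy RSA numbers and XOR cipher are stand-ins for the real algorithms, not how any actual SSL library works.

```python
# Sketch of SSL-style key agreement: one public-key exchange settles on a
# session key, then symmetric encryption protects the rest of the session.
import secrets

p, q, e = 61, 53, 17                      # toy RSA parameters (illustration)
n, phi = p * q, (p - 1) * (q - 1)
d = pow(e, -1, phi)                       # server's private exponent

# Client: pick a random session key and send it under the server's PUBLIC key
session_key = secrets.randbelow(n - 2) + 2
wrapped = pow(session_key, e, n)

# Server: unwrap the session key with its PRIVATE key
server_session_key = pow(wrapped, d, n)
assert server_session_key == session_key  # both sides now share a secret

# From here on, both sides use the (symmetric) session key
def xor_crypt(data, key):
    kb = key.to_bytes(2, "big")
    return bytes(b ^ kb[i % 2] for i, b in enumerate(data))

card = b"4111 1111 1111 1111"
assert xor_crypt(xor_crypt(card, session_key), server_session_key) == card
```

The design point: public-key operations are slow, so SSL uses them once per session and lets a cheaper symmetric cipher carry the bulk of the traffic.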
Digital Signatures Digital signatures rely on encryption to guarantee that a document or other information transmitted over the Internet has not been tampered with. Refer back to the section on digital signatures for more information.
Auditing The final key element of security is auditing. Auditing is a system of record keeping, which basically means that software takes notes about what it does and why and allows administrators to look at those notes (called log files) later. Log files also can be examined by software that’s designed to detect problems. A system properly equipped with security auditing features will provide its administrators with some information with which to figure out the nature of security problems that are occurring (or have occurred). A good auditing scheme will help administrators ascertain the severity of a breach by identifying which information was compromised. It also will help them track down the bad guys.
Log Files Log files are text files that record things that happen as software operates, such as logons, file accesses, failed connection attempts, and more. Information recorded in log files might include the following:
All login attempts, successful and failed
All files copied
All files deleted
All programs launched
All remote logins to other computers
When examining log files, look for signs of suspicious activity, such as repeated and frequent failed passwords or rejected attempts to access a particular resource. You should monitor your log files regularly, just to get a feel for what’s normal. You’ll be better able to recognize unusual events when you’re familiar with what’s usual.
Auditing Logs Because, on a large system, such an auditing scheme might quickly generate log files of many gigabytes in size, logs typically are kept for a few days, analyzed, and deleted. When the logs are analyzed—usually automatically by an analysis program—the administrator gets an alert if suspicious activity shows up. Suspicious activity might include, for example, an extraordinary number of unsuccessful login attempts under a particular username. Log analyzers oriented toward detecting illegal entry into a system are called intrusion detection utilities.
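A minimal log analyzer of the kind described here can be sketched in Python. The log format and threshold are hypothetical; a real intrusion detection utility would parse your system's actual log format.

```python
# Minimal log analyzer: flag usernames with repeated failed logins.
# (Hypothetical log format; adapt the parsing to your own log files.)
from collections import Counter

log_lines = [
    "2001-05-01 02:14:07 LOGIN FAILED user=admin",
    "2001-05-01 02:14:09 LOGIN FAILED user=admin",
    "2001-05-01 02:14:11 LOGIN FAILED user=admin",
    "2001-05-01 08:30:00 LOGIN OK user=kate",
]

THRESHOLD = 3   # how many failures count as "suspicious"

failures = Counter(
    line.rsplit("user=", 1)[1]
    for line in log_lines
    if "LOGIN FAILED" in line
)
suspicious = [user for user, count in failures.items() if count >= THRESHOLD]
print(suspicious)   # ['admin']
```

An administrator (or an automated alerting job) would run something like this over each day's logs before they are rotated and deleted.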
In the game of computer security, your enemies are the people who want to break into your networked resources. To protect your network, you need to understand what motivates these people and how they go about their dastardly business (know thine enemy). In this section, you will learn about some of the common motivations behind attacks, the types of attacks you can expect, and the methods of attacking a network.
Attackers’ Motivations The first step in understanding network attacks is being aware of the factors that motivate such attacks. Motivations can vary depending on each situation and can run the gamut from relatively harmless entertainment to intentional destruction and monetary gain. Possibly the biggest threat to a company’s network security is its own employees. Employees have the easiest access to security information. Did you ever notice that, in the movies, if a person wanted access to a company’s information, they almost always got it through an employee?
Entertainment Playing with computers is fun, and breaking into real-world resources has elements of truth and danger that games lack. In fact, people have written and tried to sell computer games in which the object is to break into a computer across a network—these games haven’t done well. Real-life break-ins are much more fun.
Proving Competence What better way to prove your skill at network trickery than by doing something another expert has set out to prevent you from doing? An attacker might think that if you have wrapped a network with security measures to the best of your ability and she is able to get around them, she is more skilled than you are. So there. Although attacks like these are less likely to end in serious damage to your information or theft of anything of value, they may cause embarrassment when, say, your web site is altered. Don’t underestimate the attractiveness of a challenge.
Spite Some people are just plain mean and will go to considerable lengths to trash a system for no better reason than spite. It’s hard to identify these folks ahead of time. However, disgruntled insiders rank high among the attackers motivated by spite. Employees who have been passed over for a promotion, or who are about to defect to a competitor, might intentionally wreak havoc with your networked resources. This is one of the main reasons to control access on networks.
Political and Nationalistic Reasons Some folks will take on a network to promote a political cause. For example, when Congress was debating the Communications Decency Act—which the U.S. Department of Justice supported—intruders broke into the Justice Department’s web site and posted pornographic images there to register their opposition to the act. The most dramatic sort of politically motivated attack is so-called “cyber warfare,” in which agents of a government (or of some cause) make large-scale, coordinated attempts to disrupt the computing infrastructure of an opposing country. One example of this (although the whole story isn’t yet public) is when the Pentagon accused Serbia’s government of leading coordinated attacks against its computer networks during the 1999 war in the Balkans.
During the spring of 2001, a cyberwar involving American, Russian, and Chinese hackers occurred. Each side attempted to penetrate and deface as many of the opposing countries’ web sites as possible in an effort to “defeat” the other side. The goal was to prove which country’s hackers were best. During this cyberwar, several e-commerce and governmental web sites were penetrated and/or defaced. The damage is still being totaled.
Monetary Gain People who break into a network and grab the right kind of information can get rich. Whether they steal credit card information with which to make fraudulent charges or proprietary technical information they can use to defeat you in business, there’s money to be made by taking information.
Types and Methods of Attack There are several mechanisms that the bad guys, called hackers or crackers, employ to gain entry to a network or to bring it down. To protect an Internet resource from problems, you have to be aware of the guises an attack can take. In this section, we will examine how hackers can gain entry into a network and what methods of attack they use to bring that network down.
Additionally, there are viruses. Viruses can be a sort of denial of service attack when they tie up a processor or erase data, though more often they’re mere nuisances that generate messages. Although viruses are considered part of network security, we will discuss viruses in more detail in Chapter 8.
Social Engineering Many attackers get the passwords and other information they need by simply asking for it. Often posing as someone with a legitimate need for the information (such as a technician, an accountant, or a particular user), the bad guy will call someone on the phone, explain a made-up situation, and ask for the information that’s allegedly needed to fix the problem. For a long time, bad guys got free America Online accounts by posing as billing auditors and asking users for their passwords (the practice has been curtailed by an AOL education program). This, in the network security field, is called social engineering. The key to defeating social engineering attacks is education. Train people to be suspicious of anyone attempting to get information related to network access. Teach them that they should never, for any reason, reveal passwords or other security information to anyone (a legitimate administrator would never need to know an individual’s password, for example).
With the move toward new help desk technology, many companies are implementing remote management technologies. Remote management allows a technician to take control of your computer from their own machine, saving travel time. Always ensure that you know with whom you are dealing before allowing someone to remote into your computer.
Dumpster Diving Dumpster diving is one of the most common techniques a hacker uses to gain useful information about a network, and it is exactly what it sounds like. The hacker will drive by a company’s dumpster very late at night or early in the morning and load up their vehicle with the company’s garbage. While this sounds very unappealing and useless, you would be surprised what kinds of information an employee will just throw into the garbage. For example, a simple printer test page from a company network printer can give a hacker enough information to determine your network device naming structure. From there, they can jump from printers to workstation and server names. Sometimes, employees may scribble down an IP address and casually throw it away. This is a gold mine for a hacker.
Dumpster diving is just as lucrative for the identity thief. Remember those credit card offers you get in the mail? They usually have all of your information pre-printed on them. The identity thief just has to call the toll-free number on the form and perform a change of address. Ever throw away a bank statement? Well, you can guess what an identity thief can do with that when they know your account number, your spending habits, and so on. The bottom line is to invest in a cross-cut shredder for your company and your home. Ignore the straight-cut shredders; identity thieves have been caught with taped-together bank statements in the past.
Brute Force Brute force attacks rely on the capacity of attacking computers to tirelessly generate different combinations of characters and feed them to defended network resources. Typically, brute force attacks are used to find passwords and break encrypted messages. A bad guy will set up a program that tries thousands of different passwords until it finds one that works or will try many different keys in his attempt to crack a message. The thing about brute force attacks is that they rely on large amounts of computing power, and computing power costs money. Every processor cycle that’s devoted to a brute force attack is a cycle that can’t be used for other purposes. As the cost of these computing resources adds up, it becomes apparent that the value of the information you’re attempting to steal had better exceed the money you’re putting into the attack. Breaking a message encrypted with a long key might require $20,000 in computer time, hardly worth it for a credit card with a limit of $10,000. However, if a hacker went after a large e-commerce site, they could steal (and hackers have stolen) thousands of credit card numbers.
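A brute force password search can be sketched in a few lines of Python. The secret and alphabet here are hypothetical; the point is how quickly the search space, and therefore the cost, grows with key length.

```python
# Brute-force sketch: exhaustively try every candidate password.
# The search space grows exponentially with length, which is why
# attacks like this cost so much computing time against long keys.
from itertools import product
import string

def brute_force(check, alphabet=string.ascii_lowercase, max_len=4):
    attempts = 0
    for length in range(1, max_len + 1):
        for combo in product(alphabet, repeat=length):
            attempts += 1
            guess = "".join(combo)
            if check(guess):          # e.g. a login attempt or a trial decrypt
                return guess, attempts
    return None, attempts

secret = "dawn"                       # hypothetical weak password
guess, tries = brute_force(lambda pw: pw == secret)
assert guess == "dawn"
# Four lowercase letters give only 26 + 26**2 + 26**3 + 26**4 candidates;
# real password and key spaces are vastly larger.
```

Adding one more character to the alphabet or the length multiplies the work, which is exactly the economics argument made above.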
Attacks Upon Known or Detected Weaknesses Operating systems, server software, browsers, and other programs are incredibly complex and so are certain to have flaws, or bugs. Some of these flaws are of the type that allows intruders to access the system covertly. In the normal course of an operating system’s use in a variety of installations, security problems are found. Sometimes, they pop up as a result of administrators’ vigilance. Other times, it takes a break-in to make a problem known. Either way, the manufacturer of the operating system should take steps to correct the problem. Typically, the manufacturer will make a patch freely available. The patch is a little piece of software that plugs the security hole. The problem occurs when there’s a known bug for which a patch exists but the system administrator—out of laziness or whatever—doesn’t install the patch. Attackers are then free to exploit the hole by well-publicized means. To make their jobs easier, intruders use software that looks for unpatched holes in systems. The moral: Keep up-to-date on the flaws that have been found in your operating system and always install published patches immediately.
Several e-commerce sites were hit in 2001 by hackers who gained credit card information. Some of these sites were infiltrated because they didn’t apply security patches for known bugs. For more information on updates and patches, see Chapter 8.
Denial of Service Attacks A denial of service (DoS) attack is an attack in which the legitimate users of a computing resource are prevented from accessing it. Typically, a DoS attack relies upon overwhelming a computer or other resource with an extraordinary number of requests for its attention. Most systems will at least slow down under such conditions, although if hit hard enough the targeted computer will crash. Spam and Mail Flooding Everyone hates spam, which is the mass broadcast of unsolicited messages to hundreds of thousands of e-mail addresses. Aside from annoying its recipients, spam places an undue load upon mail servers. A mail server that has to
deliver the same (useless) message to each of its user accounts has to waste time getting the job done. If enough spam messages show up, a large proportion of the server’s time can go toward delivering them. Mail flooding is slightly different. It has to do with unleashing a barrage of mail on a particular user. Whether by subscribing the user to hundreds of mailing lists (the ones, meant for pilots, that distribute hourly airport wind condition reports are favorites) or by sending the user very large messages full of garbage characters, mail flooding can cause real trouble. Mail flooding also is called mail bombing. Ping Attacks Ping is a special-purpose program that’s supposed to be used to ascertain the “aliveness” of another computer on the network. For example, a ping packet might be sent to your computer, which would receive the packet and acknowledge that it had done so. Normally, the amount of processing power required by a ping transaction is negligible, but a ping flood ties up the processor with the need to respond to an enormous amount of pings in a short period of time—in effect creating a denial of service situation.
One of the most popular uses of ping floods is in online gaming. You can unleash a ping flood on an opponent’s machine just as you’re sneaking up on his character. When the other player detects you and attempts to defeat your attack, his machine’s response is slower than usual because it’s overloaded by the ping attack.
It’s sometimes possible to bring down a target computer by sending a Ping packet of more than 65,535 bytes in size, in excess of the maximum size allowed by the IP specification. This is known as the Ping of Death or the Ping o’ Death. Most operating systems and Ping utilities know about the Ping of Death now and have been patched to deal with it. Make sure your systems have the proper patches in place. SYN Floods A SYN flood attack takes advantage of a particular characteristic of the TCP/IP protocols and the operating system resources devoted to handling them. SYN floods can result in the inability of a computer to connect to other computers over the Internet. If the computer attacked with a SYN flood is a firewall, the effect can be that the network the firewall protects is isolated from the Internet.
Basically, network operating systems have a facility for handling TCP/IP connections in the process of being made. There’s a buffer for connections that are waiting for packets to come back over the network to complete their links. In a SYN flood, this buffer is filled to capacity by a malicious program, making it impossible for legitimate users to create any new TCP/IP connections. The computer is swamped by half-open connections. SYN floods work only under TCP/IP, so you can defeat SYN floods by implementing protocol switching on your network.
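The half-open connection buffer can be modeled with a short Python simulation. No real networking is involved, and the buffer size is an arbitrary stand-in for an operating system's connection backlog.

```python
# Simulation of the half-open connection buffer a SYN flood fills.
# Each SYN normally occupies a slot until the handshake completes; a flood
# of SYNs that never complete exhausts the buffer.

BUFFER_SIZE = 5
half_open = []                      # pending (half-open) connections

def receive_syn(source):
    if len(half_open) >= BUFFER_SIZE:
        return "dropped"            # buffer full: legitimate users locked out
    half_open.append(source)
    return "waiting for ACK"

# Attacker sends SYNs and never completes the handshake
for i in range(BUFFER_SIZE):
    receive_syn(f"attacker-{i}")

print(receive_syn("legitimate-client"))   # dropped
```

The attacker needs only a trickle of packets, which matches the observation that SYN floods consume very little bandwidth.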
You should be aware that a SYN flood requires less than 10 percent of the bandwidth going into a computer. SYN floods are an easy way to deny service without a huge volume of information passing back and forth between machines.
Information Theft and Destruction Many attackers will attempt to damage you or your company by stealing or destroying important information. This is called information theft or destruction. Although this form of damage is usually done for profit, it can be performed for any of the other reasons listed earlier. Some unscrupulous companies (and possibly some governments) will employ a hacker to steal another company’s valuable research and development or, if it can’t be stolen, to cause disruption during a critical phase. The trouble with an information theft attack is that you will probably not be aware when it’s taken place. If someone breaks into your house and steals your television, you know it’s gone. But if someone makes a copy of a critical file, you have no way of knowing about it unless you have an auditing program in place.
Identifying Security Requirements
Security requirements vary among different kinds of networks. Intranets, for example, exist for the purpose of sharing potentially secret information among people who have a right to see it. Internet sites, on the other hand, are meant to be accessed by everyone in the world and so should contain only information that’s appropriate for that kind of exposure. Other kinds of networks—extranets and virtual private networks—are hybrids and so require special treatment.
This section explores the differences among the different kinds of networks and helps you identify the special security requirements of each.
In large part, an organization’s security requirements depend upon the nature of the organization. If your information has inherent and obvious value—say, a list of credit card numbers—it’s attractive to intruders. If your company is well known, you’re attractive to bad guys looking to earn bragging rights. But if your networks store information of limited inherent value and your organization is not well known, your security strategy needs to be less elaborate.
Internet Security Internet security is a balancing act. You want everyone on the open Internet to have access to your public information without any trouble at all, and you want your organization’s employees to have unencumbered access to external resources on the Internet. Yet, you want to prevent attackers from vandalizing your web site, reading your inbound and outbound e-mail, and using your employees’ Internet link as a point of entry through which to attack resources you want to keep secret. Approach the problems separately. To keep intruders from attacking your secure network through its Internet connection, put a firewall in place and isolate public and semipublic resources in a DMZ. Make sure your firewall has an ACL and performs dynamic packet filtering. To prevent your employees’ legitimate Internet activities from falling victim to thieves and spies, make sure the employees properly secure their Internet transactions. They should encrypt Internet e-mail, use digital signatures to keep tampering in check, and apply encryption to files. They should use SSL to protect exchanges with remote Internet sites.
Intranet Security Intranets, in theory, are insulated from the Internet. Assuming that you’ve secured the intranet from external invasion, your concerns are with the damage an insider can do intentionally or accidentally. A disgruntled insider can steal or trash information. To limit the damage an individual can do, compartmentalize information. Give users access only to the portions of the intranet they need to see to be able to do their jobs.
Make sure you have a backup program in place and that it runs frequently enough to protect your organization from costly data loss in the event of internal vandalism. Careless insiders might unwittingly share their passwords with unauthorized users. For this reason, you should enforce rules requiring users to change their passwords every so often. You might also want to supplement password protection with smart cards or biometric measurements. Also, deploy antivirus software to prevent an intranet user from introducing a malicious program via a floppy disk, CD-ROM, or DVD.
Extranet Security Extranets vary in their purposes, and so vary in their implementation. They are kind of like virtual private networks (VPNs)—which are covered in the next section—in that they use the open Internet for communications between geographically separate entities, but they’re different because the connections between the parties are more sporadic. Extranets give certain parties access to special information over the Internet in addition to the information that’s published on the Internet for the world at large. Many organizations rely on their web server software to secure their extranets. Each remote extranet user—each supplier, or whatever—has a directory on the web server. Those directories are password-protected, and transactions involving their contents are secured with SSL. Databases that contain information that’s internal to the organization can also have information that’s to be shared across the extranet. You can assign different rights for internal clients and extranet users. Users with “extranet” rights can access only certain information, whereas “internal” users can access the entire set of data.
Virtual Private Network (VPN) Security A virtual private network (VPN) is a network of computers that acts like a bunch of machines on a LAN or private WAN but that uses the Internet to provide its long-distance links between sites. Because very sensitive information travels over LANs and private WANs, security is extremely important. Lately, some new technologies have begun to emerge. IPsec is a variation on the IP protocol that allows for better inherent security in transmissions between IP addresses. The Layer 2 Tunneling Protocol (L2TP), which is often paired with IPsec, is of particular interest to VPN developers. As we discussed in Chapter 3, L2TP provides for secure connections between points on an IPsec network. A VPN user connecting to a remote site with IPsec and L2TP enjoys a high level of security. There are two typical architectures under which VPNs exist. In one, there’s a regular link between two or more fixed sites. In the other, a roaming user logs in to the VPN from various locations. Let’s take a look at each architecture.
Regular Links between Fixed Sites In the case of a fixed VPN structure, there are several locations (offices or whatever) with LANs in each location. The individual LANs need to be linked as if they’re one big LAN so a person in location A can share files with someone in office B as easily as with someone who works in the next cubicle. Traditionally, this has been made possible with dedicated private telecommunications links—leased lines and the like. Those are secure, but very expensive. In this situation, using the Internet to carry traffic between the two locations makes a lot of sense; however, the traffic has to be protected from the bad guys on the open Internet. This is what tunneling protocols are for. A tunneling protocol—Microsoft’s Point-to-Point Tunneling Protocol (PPTP) is one—encapsulates encrypted data in an Internet-friendly wrapper. The packets move over the Internet. When they get where they’re going, the Internet wrapper is stripped off, the contents are decrypted automatically, and a virtual link—a secure one—exists.
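The wrap-and-unwrap cycle a tunneling protocol performs can be sketched in Python. The XOR cipher and the dictionary "envelope" are illustrative stand-ins for real encryption and packet headers, not an implementation of PPTP itself.

```python
# Sketch of tunneling: encrypt a private packet, wrap it in an
# Internet-friendly envelope, then unwrap and decrypt at the far end.

KEY = b"tunnel-key"    # assumed pre-shared between the two VPN gateways

def xor_crypt(data, key=KEY):
    # Toy cipher for illustration; a real tunnel uses strong encryption
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def encapsulate(private_packet, dest):
    # The encrypted payload rides inside an ordinary-looking outer packet
    return {"outer_dst": dest, "payload": xor_crypt(private_packet)}

def decapsulate(outer_packet):
    # The receiving gateway strips the wrapper and decrypts
    return xor_crypt(outer_packet["payload"])

lan_packet = b"file-share traffic from office A"
wrapped = encapsulate(lan_packet, "gateway-b.example.com")
assert wrapped["payload"] != lan_packet        # unreadable in transit
assert decapsulate(wrapped) == lan_packet      # restored at office B
```

Routers along the way see only the outer envelope, which is why the two LANs behave as one private network even though the traffic crosses the open Internet.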
Variable Links to Roaming Users VPNs also need to take care of the roving client, such as a salesperson or a telecommuting employee, who logs in to the network, usually via a slow connection, from a different place each time. These people can use tunneling protocols as well, or they can rely upon SSL and other general-purpose Internet security measures to protect the data they send and receive. They also can rely on signed and encrypted e-mail to protect data they send and receive.
A few years ago, telecommuting seemed a dream to many employees. Besides the lack of high-speed communications into a network, employers were afraid that their workers would “slack off” because the boss wasn’t around. Nowadays, employers are finding that employees are more willing to put in extra hours because of the flexibility that telecommuting provides, and are becoming more productive. Since the advent of xDSL and cable modem technologies, employees are able to access the company network at an acceptable speed.
Summary
In this chapter, you learned about many different kinds of network and Internet security issues. You learned that in the security world, it is important to be able to authenticate a person or a computer to your network. Authentication involves some form of check to ensure that the person or computer is who they claim to be, and involves such methods as passwords, smart cards and keys, biometric checks, digital certificates, and digital signatures. Digital certificates are like digital passports. They are issued by a certification authority that verifies an applicant’s identity and issues the certificate. There are several types of certificates, each offering a higher degree of certainty that the individual possessing the certificate is who they say they are. The degree depends on the type of verification performed, which ranges from a signed letter and a photocopy of a driver’s license to in-person verification. Digital signatures, like certificates, are another method of authentication, but in this case the signature ensures that a message wasn’t altered as it traveled from the sender to the recipient. While authentication verifies an individual, sometimes you need to ensure that information transmitted over the Internet is “unreadable.” Credit card transactions fall into this category; hackers love to get credit card account numbers through these transactions. Encryption technologies allow you to encode a message so that only the sender and the intended recipient can read it. We discussed several forms of encryption, such as private-key encryption, public-key encryption, and Secure Sockets Layer (SSL) technologies. Private-key encryption uses a single, private key that belongs to an individual. For the recipient to decrypt a message, they must have the sender’s private key.
We found that sending the private key over the Internet was a bad thing, so another mechanism—public-key encryption—was developed. Public-key encryption still uses a private key, but adds a freely available public key that is used to encrypt messages; only the matching private key can decrypt them. Using this technique, the private key is protected because it is never transmitted over the Internet. Secure Sockets Layer (SSL) technology allows a web browser to perform encrypted transactions with a web server. In addition to authentication methods and encryption techniques, firewalls are another major tool of network security. A firewall is composed of hardware and/or specialized software that prevents unauthorized access to your network. Firewalls come in three basic configuration types: bastion host, three-homed firewall, and back-to-back firewall. Each type uses a special network segment that’s kept separate from the internal network, called the demilitarized zone (DMZ). Along with the different configurations, firewalls also use a combination of access control lists (ACLs), dynamic packet filtering, application filtering, and protocol switching to ensure that only authorized clients can get inside the DMZ. We also took a look at some of the various kinds of networks and the security considerations that you need for each type. Because an intranet is mainly subject to internal attacks, you must configure user access rights so that clients can’t get to data that they don’t need to see. On the other hand, extranets allow certain outsiders—usually business partners or clients of your company—access to special resources. Secured directories on a web server, Secure Sockets Layer (SSL), and modifications to an ACL will help you to secure an extranet. Internet connections, because of their open nature, require specialized security that usually involves firewalls, proxy servers for outbound access, and scrutinizing system logs for any suspicious activity.
Exam Essentials Understand what authentication is and the different technologies used. Authentication is the process of ascertaining that an entity, a person or a network device, is who it claims to be. Methods of authentication include passwords, smart cards and keys, biometric checks, and digital certificates. Understand digital certificates and how they are supplied. A digital certificate, like a passport, provides some guarantee that its holder is who they claim to be. Digital certificates are issued by a certification
authority (CA) after some form of verification of identification has been performed. Know what a digital signature is and how it is used. A digital signature is a piece of electronic information that serves to guarantee that an item—a document, a credit-card number, a file, and so on—was not tampered with as it traveled over the Internet from sender to recipient. Know how encryption works and the various forms of encryption technologies used. Encryption is the process of rendering a message unintelligible to all parties but the sender and recipient. The types of encryption we discussed include private-key, public-key, and Secure Sockets Layer (SSL). Identify the different types of firewalls. A firewall is a computer or other network device that prevents unwelcome parties from gaining access to a secured network. There are three types of firewall configurations that you need to know: bastion host, back-to-back firewall, and three-homed firewall. Understand the different methods used by firewalls to prevent unauthorized access. Firewalls use a variety of methods to prevent hackers from getting into your system: dynamic packet filtering, application filtering, protocol switching, and ACLs. You need to know how each method works and recognize when to use them. Identify the different types of denial of service (DoS) attacks. A denial of service attack is a method used to deny clients access to computer resources. Forms of attack can include ping floods, SYN attacks, and mail flooding. Identify the security requirements for intranet, extranet, and Internet connections. Each type of network has different security requirements. Intranets are more susceptible to internal attack, while Internet connections are at risk from everyone on the Internet.
Review Questions 1. A denial of service (DoS) attack aims to do the following: A. Steal information B. Prevent a particular user from accessing a resource C. Prevent anyone from using a resource D. Render all passwords invalid 2. Biometrics is the term applied to methods of _______. A. Verifying a user’s identity through assessment of his or her physical
characteristics B. Using fingerprints as a means of security C. Comparing the physical traits of a machine’s various users D. Assessing the severity of security breaches by applying algorithms
that derive from genetics 3. Nonrepudiation technologies facilitate accountability by _______. A. Making it harder to acquire a public key B. Making private keys harder to copy C. Making it possible to say with some certainty that a particular
person or entity did something D. Guaranteeing that messages aren’t tampered with in transit
4. A firewall prevents unauthorized access to a secured network by
_______. A. Banning known bad guys from the network B. Adding an extra layer of encryption C. Monitoring and controlling packet flow D. Maintaining a list of users’ access privileges 5. An example of dynamic packet filtering is _______. A. A packet is received by a firewall that is out of sequential order and
then dropped B. A packet is received by a firewall that is in sequential order and
then dropped C. The firewall orders packets in random, but incrementing numbers
before sending them D. The firewall orders packets in sequential order before sending them 6. You should never choose a password _______. A. That includes hard-to-remember characters B. That is very long C. That lacks an underscore character D. That appears in any dictionary 7. A three-homed firewall is a firewall configuration with _______. A. Two network cards—one to the Internet and one to the internal
network B. Three network cards—one to the Internet, one to the DMZ, and
one to the intranet C. Three network cards—two to the Internet and one to the intranet D. Two firewalls—one on each end of the DMZ
8. A brute force attack ______. A. Involves smashing a computer with a heavy object B. Involves a bunch of rapidly generated random guesses against a
password, telephone number, or other security element C. Necessarily involves a flood of e-mail D. Exploits a characteristic of the TCP/IP protocol suite 9. What is the key to foiling social engineering attacks? A. Longer passwords B. Better firewalls C. More restrictive access controls D. Education 10. An attack motivated by money is worthwhile only when _______. A. The value of the attack’s results exceeds the cost of the attack B. Credit card information is stolen C. Money is transferred to an attacker’s account D. The attacker is a paid professional 11. Your risk of coming under attack by thrill-seekers is higher if ______. A. You post lots of Quake files B. Yours is a high-profile organization C. You keep quiet about your computing resources D. You use access control measures
12. What does a ping flood involve? A. A quirk of the SMTP protocol B. Exploitation of a Windows NT weakness C. Breaking a password D. Overloading a machine with ordinarily legitimate requests of a
certain kind 13. A SYN flood takes advantage of _____. A. A characteristic of the TCP/IP protocol suite B. A characteristic of the NNTP protocol C. Recompiling the Linux kernel D. The Ping utility 14. A Ping of Death can _____. A. Disable the Ping utility B. Crash a computer C. Harm an attacker’s computer as well as the victim’s D. Slow down a computer for a few seconds 15. What is authentication? A. A password challenge-and-response sequence B. A biometrics setup C. Any technique for guaranteeing the identity of someone or something D. Relevant in anonymous FTP connections
16. One problem with biometrics is that _____. A. It’s not certain that fingerprints are really unique B. Retinal scanning is painful C. People’s appearances change over time D. Sensing equipment is extremely expensive 17. What is encrypted text called? A. Codetext B. Spaghettitext C. Cryptotext D. Ciphertext 18. What is the advantage of authentication that uses a key or a card? A. Users are likely to know when their key or card has been stolen. B. Keys and cards are guaranteed to provide absolute security. C. Keys and cards are able to eliminate the need for passwords. D. Users’ public keys may be encoded on magnetic cards. 19. Minimally secure encryption requires a key that is _____ bits long. A. 32 B. 128 C. 256 D. 40
Answers to Review Questions 1. C. A DoS attack breaks or overwhelms a resource and prevents
anyone from using it. An attacker could bring about a DoS situation by disabling passwords, by the way. 2. A, B. Biometrics includes any method of recognizing a user, including
face recognition, fingerprinting, voiceprinting, and retinal scanning. 3. C. Nonrepudiation makes it harder for someone to question the
authenticity of a message, acknowledgment, or other piece of information that appears to have come from him. 4. C. Firewalls monitor and regulate the flow of packets in and out of a
network, and may use Access Control Lists (ACLs) and dynamic packet filtering to do the job. 5. A. Hackers sometimes intercept packets along the way and then mod-
ify them, which changes the sequential number in each packet. If the firewall receives a packet that doesn’t fit what it expects, it will discard it. 6. D. Bad guys sometimes try to find passwords by running through the
words in a dictionary. A good password should be at least six characters long and include obscure characters. 7. B. A firewall configuration that uses just two network cards is a bas-
tion host, while a two-firewall system is a back-to-back firewall configuration. With two network cards going to the Internet, and one to the intranet, you have no DMZ; therefore, answer B is correct. 8. B. Brute force attacks can take many forms, including floods of e-mail.
The term most often refers to an automated guessing program applied to a password.
9. D. Educating members of your organization about social engineering
attacks and how they work is critical. Tell people to never give their access information to anyone—legitimate technicians won’t need it. 10. A. The value of information stolen (which may or may not be directly
monetary in nature) must exceed the cost of the human and computational resources devoted to the attack.
11. B. People out to gain bragging rights by breaking into a system need a system that everyone will recognize. Microsoft, the Pentagon, and TV networks are favorite targets. 12. D. In a ping flood, a machine is sent lots of “pings” that it must
acknowledge to prove it is alive. Too much pinging saps the machine’s capability to process other jobs.
13. A. A SYN flood fills up a machine’s buffer for keeping track of TCP/IP connections that are in the process of being made.
14. B. The Ping of Death is a super-large ping that can crash the computer to which it’s sent. Most operating systems have patches that protect them from these attacks.
15. C. Authentication is any technique for guaranteeing the identity of
someone or something, including biometric schemes and passwords.
16. C. If users wear makeup, gain weight, lose weight, or speak with emotion, their biometric data may change.
17. D. Text to which an encryption algorithm has been applied is called
ciphertext.
18. A. Nothing can provide absolute security, and keys and cards are usually used in combination with passwords. Although passwords can be stolen without users’ knowledge, keys usually can’t be—at least not for long.
19. D. Keys shorter than 40 bits may be broken quickly. On the other
hand, 128-bit encryption is pretty much unbreakable.
20. A. Private keys have to be exchanged in private. This works for certain applications, but if you transmit your private key to an Internet user, it could be intercepted during transmission. A hacker would then be able to decrypt any message you send out, which defeats the purpose of the encryption.
Internet Development I-NET+ EXAM OBJECTIVES COVERED IN THIS CHAPTER: 2.1 Understand and be able to describe programming-related terms. Content could include the following:
API
CGI script
SQL
Client-side scripting
Server-side scripting
2.2 Understand and be able to describe differences between popular client-side and server-side programming languages. Content may include the following:
The state of the art in Internet software development is a slippery thing. New technologies are always emerging; others fade away as they’re recognized as too weak, too complicated, or too expensive. Your job as an Internet professional is to stay familiar, in a broad way, with what’s new in software development. Then, when you’re called upon to solve a particular problem, you should be able to consult books and Internet resources to figure out which technologies apply and how to use them. There are no authorities, only people willing to teach themselves quickly. This chapter endeavors to introduce you to some Internet software and media development concepts that seem built to last, at least for a while. Understanding these will help you grasp the new developments as they appear.
Network Software Concepts
Sun Microsystems notes in its advertisements that “the network is the computer.” Increasingly, computation has more to do with sharing information among machines than with processing it on any given machine. Clients request data from servers; servers make requests of other servers; processing jobs are shared among many machines and therefore accomplished faster. This section has to do with the design of software systems that are meant to operate in networked environments.
Client-Server Architecture Client-server architecture describes a particular kind of relationship between two computers. It’s easy to assume that a server is a big, powerful computer and that a client is an everyday personal computer, but that’s not necessarily true. The server in a client-server relationship is simply a computer on which a particular resource resides. The client is a computer that requests a copy of that information for its own use. Clients and servers don’t have to be thought of as items of hardware. They can be separate pieces of software, and indeed the two pieces of software in the equation can be running under the same processor. You can install a web server—Apache, Personal Web Server, whatever—on your machine and request web pages from it, using a web browser that’s running as a separate process. For this reason, you’ll hear web browsers referred to as “web clients,” just as programs like Apache are called “web servers,” independent of the hardware they run on. When you surf the Web, you’re acting as the client in a series of client-server relationships. The computers that are identified by domain names—yahoo.com, amazon.com, and so on—contain collections of data. When you request a document that’s identified by a particular URL, a copy of that document is sent from the computer on which it resides (the server) to your computer (the client). Similarly, when you use your computer to send a query to a database server and receive a particular set of data in response, your machine is the client in a client-server relationship. Client-server transactions need not involve web pages by any means; however, web servers and web clients illustrate many client-server concepts clearly, so we’ll focus on them for the present.
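The point that client and server can be two pieces of software on one machine is easy to demonstrate. The sketch below, an added illustration using Python’s standard library rather than any product named in the text, runs a tiny web server in a background thread and acts as its own web client in a single process:

```python
# One process, two roles: a web server (software) and a web client
# (software) talking over the loopback interface of one machine.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello from the server"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # silence per-request logging
        pass

# Bind to port 0 so the operating system picks a free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client half of the relationship: request the resource and read it.
with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as resp:
    text = resp.read().decode()

print(text)   # -> hello from the server
server.shutdown()
server.server_close()
```

The hardware never changes; only the role each piece of software plays in the transaction does.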
Client Software In the case of web surfing, a web browser is the client software in a client-server relationship. People use their web browsers to request information from other computers. The browsers then receive the requested information and are responsible for presenting it to their users properly. The process of interpreting and presenting information is not trivial because web pages can consist of several different kinds of data:
Text with embedded markup information
Special, embedded content that browsers can interpret without outside help
Code written in a scripting language, which must be interpreted and executed
Code written in Java or an ActiveX language
Content that must be interpreted by a plug-in
Let’s explore these different kinds of content in greater detail. Text and Markup Languages The simplest web pages are just text documents—sequences of characters, formatted as American Standard Code for Information Interchange (ASCII)—with special sequences of characters inserted here and there to indicate what the text means and how it should be displayed. You’re probably somewhat familiar with the most popular web markup language—Hypertext Markup Language (HTML). We’ll cover some of its details later in this chapter, but for now, just know that every web browser incorporates an HTML interpreter that interprets the HTML tags in documents. There are other markup languages; eXtensible Markup Language (XML) isn’t as concerned with how text is displayed as with the meaning of individual passages of text. XML is one way to share database information among machines. Native Embedded Content In addition to their capability to interpret and display HTML-formatted text, web browsers can display certain other kinds of data that are referred to in HTML documents. Graphics fall into this category. All web browsers can interpret JPEG and GIF graphics; various browsers can interpret other image formats as well (Microsoft Internet Explorer, for example, can display BMP images). The pieces of software that interpret these embedded files—the interpreters—are inseparable parts of the web browsers. Scripting Languages Sometimes, the markup tags that define web documents include passages of code in scripting languages like JavaScript or VBScript. You can use scripting languages to provide your web pages with a certain level of interactivity and animation. The code that makes up these programs—you’ll learn more about it later in this chapter—is never compiled; it always remains readable to human beings. Web browsers that are capable of running scripting-language programs interpret the code directly, without it first being compiled.
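The claim that script code remains readable to human beings right up to the moment it runs can be shown in any interpreted language. JavaScript is the text’s example; this added sketch uses Python’s exec as a stand-in, because the mechanism is the same: the interpreter executes the human-readable source directly, with no compilation step visible to the author.

```python
# A script is just readable text until an interpreter runs it.
source = """
greeting = "Hello from a script"
total = sum(range(1, 6))   # 1+2+3+4+5
"""

namespace = {}
exec(source, namespace)   # the interpreter executes the source directly

print(namespace["greeting"])  # -> Hello from a script
print(namespace["total"])     # -> 15
```

Anyone who can see the page source can read the script, which is exactly the property the paragraph above describes.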
Java and ActiveX Content Though they’re kind of like graphics in the sense that they’re assigned a rectangular region and interpreted separately from the HTML document in which they’re embedded, Java applets and ActiveX controls deserve special mention. Java applets and ActiveX Controls are little pieces of software that occupy portions of the browser windows in which they run. Plug-In Content Although browsers can make sense of and display certain kinds of content natively, other kinds of files—such as those containing certain kinds of sounds, three-dimensional graphics, and other special media—fall outside the browser’s built-in capabilities; however, the most popular browsers can take plug-ins, which are software modules that users add to their browsers to expand their capabilities. When a browser encounters a piece of embedded media that it can’t interpret natively, it checks its roster of plug-ins to see if one of them can handle the content. If so, the handling is seamless—it looks to the user as if the browser can handle the content by itself.
The problem with plug-ins is that no one has all of them, and you’re asking a lot of a user if you ask them to download and install a particular plug-in just to view your content. The basic rule is to stick to media that browsers can interpret natively. If you must use plug-in content to get a particular effect—chemists, for example, use special file formats to display images of complex molecules in three dimensions—make sure the benefits to users are great enough to warrant the trouble of downloading and installing the plug-in.
Server Software In the client-server equation as it applies to web publishing, the web server is both a piece of software (the program that doles out pages in response to client requests) and a unit of hardware (the physical machine on which the server software runs). Popular web server software includes Apache, Microsoft Internet Information Server (IIS), and Personal Web Server.
Each handles requests for web pages. They may also have the capacity to work with server-side scripting languages, extensions, Java servlets, and independent server-side programs that provide some sort of processing service. Server Extensions A server extension is sort of like a plug-in for web server software. It is a software module that expands the capabilities of the server software itself; a server extension may be published by the same company that put out the server software or by another publisher. The Microsoft FrontPage Extensions are perhaps the most popular server extensions around. They allow servers to support some of the interactive features Microsoft FrontPage lets developers embed in their pages. Java Servlets Java servlets are special Java programs that expand the capabilities of web server software. Similar to server extensions, the advantage of a Java servlet is that a single, properly designed servlet will run under any operating system for which there is a Java Virtual Machine (JVM)—the same as any Java program. Server-Side Scripting Languages Server-side scripting languages are programming languages designed for writing code that’s embedded in web pages and interpreted before those pages are sent out to the client. Active Server Pages (ASP) and Hypertext Preprocessor (PHP), both covered later in this chapter, are two of the most popular server-side scripting languages. Compiled Server-Side Programs and Other Servers In addition to exercising their native capabilities and those provided by add-in servlets and extensions, web servers may also cooperate with other kinds of server software, such as a database server. When a program written in a server-side scripting language requests data from a database server, the web server allows the script to access the database server and retrieve the information it needs.
Communication between Client and Server If you’re going to have software that takes advantage of client-server architecture, you must have mechanisms in place for transmitting information back and forth between the two elements in the equation.
Forms As far as the Web is concerned, client-server communications rely upon forms in web pages that collect information from the user. The data the user puts into those forms is then submitted (or posted) to the program on the server side. The form data is packaged and sent to the server in a format defined by the Common Gateway Interface (CGI) specification. Common Gateway Interface (CGI) The Common Gateway Interface (CGI) specification defines a way of packaging text data (such as the contents of a form) for transmission over a network. The CGI specification takes the contents of a form and the names of the form elements (as specified in HTML) and assembles them all into a long string of characters. That string can then be passed to a program on the server side for processing and a response. A CGI submission might tell a server-side database interface to search for something and return a results page, for example. Computers running Microsoft’s Internet Information Server (IIS) support the Internet Server Application Programming Interface (ISAPI), which can be used to carry out CGI-like tasks on IIS-equipped servers.
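The “long string of characters” assembled from a form is a URL-encoded series of name=value pairs. As an added illustration (Python’s standard library rather than any tool the text names, with hypothetical field names), both the packaging and the server-side unpacking look like this:

```python
from urllib.parse import urlencode, parse_qs

# Form fields as the browser collected them (hypothetical field names).
form = {"name": "Pat Smith", "query": "three-homed firewall"}

# Package the form for transmission, CGI-style: one long string.
packed = urlencode(form)
print(packed)   # -> name=Pat+Smith&query=three-homed+firewall

# On the server side, the CGI program unpacks the same string.
unpacked = parse_qs(packed)
print(unpacked["name"][0])    # -> Pat Smith
```

Note that the packed string is perfectly readable text, a point that matters for the security discussion below.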
Do not make the mistake of thinking CGI is a programming language. It is not. Although it is correct to speak of a “CGI program,” such programs are referred to that way because they accept CGI data as input (as from a web form), not because they are written in a language called CGI. There is no such language (at least, there is none of any significance).
Security The problem with CGI is that it involves the transmission of a text string—one that is by default readable to anybody who cares to intercept it—across a network. If you’re using CGI to move data over the Internet, you’re exposing yourself and your client-side users to security risks. You should never enter sensitive information, such as credit card data, into a form for CGI transmission unless you’re sure the transmission will be protected. How are CGI submissions protected? Typically, Secure Sockets Layer (SSL) encrypts data transmitted between a client and a server. Read all about SSL and other encryption technologies in Chapter 5.
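SSL’s modern descendant is TLS, and most languages expose it through a standard library. As an added, present-day sketch (Python’s ssl module, which the text itself does not cover), the client-side defaults show what “protected” means in practice: the server’s certificate must verify and its name must match the URL before any form data travels.

```python
import ssl

# A default client context: the kind of protection a CGI submission needs.
ctx = ssl.create_default_context()

print(ctx.verify_mode == ssl.CERT_REQUIRED)  # server must present a valid cert
print(ctx.check_hostname)                    # and its name must match the URL

# Wrapping a socket with ctx.wrap_socket(sock, server_hostname=...)
# would encrypt everything sent through it, including form data.
```

With such a context in place, an intercepted transmission yields ciphertext rather than the readable CGI string shown earlier.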
Trends in Network Computing For many years, the trend in computing technology was toward more storage space, faster processors, more capable software, and lower prices on everything. Now it seems that, although “more, faster, and cheaper” will always be appealing, computer technology is getting to be “good enough” for many everyday applications. Personal computers spend most of their processor cycles idling, waiting for their human users to do something. The trends in computer technology have mainly to do with network computing, which is computing performed by multiple computers linked together on some kind of network. Whether the collaborative computing has to do with sharing information or sharing processor power, it seems that a network often provides more computing power than its component machines could if they were acting alone. You can distinguish the two main trends in networking by determining what is shared:
In enterprise computing, it’s typically the data that’s shared. One machine can access and operate upon information from another or from a central repository.
In distributed computing, the processing work is spread over several machines.
Of course, there’s some overlap, and it’s not always easy to tell where processing begins and data-sharing ends. Let’s explore these trends further.
Enterprise Computing Enterprise computing has to do with sharing information among the applications with which an organization, such as a company or a unit of government, does its business. This kind of sharing can help the company realize efficiencies in its overall process of buying, making, processing, and selling goods and services and in the accounting, finance, human resources, and management infrastructures it maintains to facilitate those processes. An Enterprise Resource Planning (ERP) system, because of its purchasing features, might note that two different plants were buying the same part from different suppliers. Because of its integration with the manufacturing process, it might note that one supplier’s products had far fewer defects than the other supplier’s product did. The two purchasing agents at the plants might have had spotty communication without the ERP system, but the ERP system is much better able to spot situations like this and advise the buyer of the faulty parts to try the other plant’s supplier.
SAP (pronounced as three distinct letters), a German company, is generally recognized as the world leader in enterprise computing. Its flagship ERP product, SAP R/3, is standard equipment among very large organizations all over the world, and SAP consultants make very good coin planning, implementing, and expanding SAP R/3 installations. The trend is beginning to trickle down to smaller companies, as well, and some pundits expect that technologies like XML will make it easier to share information among small-business software products from many publishers.
SAP has several web sites, each of which highlights some aspect of its business. The main one, www.sap.com, will direct you to the information you need. Another site, www.mysap.com, focuses on SAP’s products and services for companies somewhat smaller than enormous.
Distributed Computing Distributed computing has to do with spreading portions of a computing job over several machines. This is not parallel processing, in which an operating system divides the running of some program over several CPUs, but rather a system of sharing discrete business tasks over several machines. The discrete tasks might include the following:
Collecting data
Managing a database that stores the data
Analyzing the data
Performing financial operations (such as billing) based on the data
Any given job might require work from several machines, each of which is responsible for one of the tasks above. The load is spread, and therefore the job is completed more quickly. JavaBeans JavaBeans are the epitome of componentization in the Java language. It’s possible to build a Bean with an elaborate feature set and considerable power, then use it as a sort of “black box” that carries out some particular duty in various software systems of which it is a part. You can distribute your Beans to others as well, either freely or, because Beans are compiled and have no visible source code, commercially. Beans have found applications in Sun’s own graphical user interface toolkit, as well as in other projects.
Component Object Model (COM) The Component Object Model (COM) is Microsoft’s answer to JavaBeans. It’s a means of writing independent code modules and having them communicate with one another. The idea is that you could have a single, specialized COM module that various other pieces of software (other COM modules and non-COM programs) refer to for different reasons. A single COM module might, for example, serve to perform queries on any specified database. One program might refer to that COM module to query an employee database; another COM module might refer to the query module for accessing a sales database. It’s an economy of scale: one COM module does double duty. Distributed Component Object Model (DCOM) Distributed Component Object Model (DCOM) components are really no different from COM components. DCOM is a subset of COM that includes COM modules running on different machines and referring to one another across a network.
Understanding Programming Languages
A programming language is any system of syntax and grammar that, when used to generate sets of instructions called programs, can have an effect on the behavior of a computer. Programming languages broadly include everything from HTML to C++. Here, we’ll focus on highly capable, full-featured development languages and slightly less capable (but simpler) scripting languages.
Full-Featured Development Languages A full-featured development language is one that can be used to write standalone programs. You can use a full-featured development language to write an elaborate word processor, a web browser, or a database front end. You could write a whole operating system with certain languages if you wanted to. Examples of such languages in the network software milieu include Java, Visual Basic, C, and C++.
These are compiled languages. When you have written source code, you must run it through a special processor called a compiler that translates the human-readable source code into machine code that processors and operating systems can understand. An operating system makes its resources available to programs running under it by means of an application programming interface (API). Java fans will tell you that their language is not really compiled, and that’s sort of true. When you write Java code, you must run it through a program that converts it to bytecodes, which are instructions that have meaning to an imaginary chip called the Java Virtual Machine (JVM).
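Java is not the only language that compiles source to bytecodes for a virtual machine; Python quietly does the same thing, which makes it a convenient way to see the concept (this is an added analogy, not a claim about the JVM’s own instruction set):

```python
import dis

def add(a, b):
    return a + b

# The human-readable source has already been compiled to bytecodes;
# a virtual machine (here, CPython's) executes these instructions.
bytecode = add.__code__.co_code        # the raw bytecode, as bytes
print(type(bytecode), len(bytecode) > 0)

dis.dis(add)   # prints instructions such as LOAD_FAST and a binary-add opcode
```

The bytecodes mean nothing to the hardware processor; they mean something only to the virtual machine, exactly as Java bytecodes mean something only to a JVM.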
Java The prime attraction of Java is platform neutrality—a concept that requires some background. Other programming languages, such as C++, require that programmers write different programs for different operating environments. There’s one version of Microsoft Word for Windows 95/98/Me/2000, another version for Windows 3.x, and another version for MacOS. That’s not acceptable on the Internet, where a server might have to provide the same program to many different kinds of computers—and might not be able to determine what kind of computer needed the program at a given time. Plus, it’s a pain—an expensive pain—for software developers, who must multiply the effort required to write a program by the number of platforms they want to support. The single biggest attraction of Java is that, theoretically, you can write one Java program, compile it, and expect it to run similarly under Windows 2000, Windows ME, Windows 95, MacOS, Solaris, and half a dozen other operating environments—a feature called platform independence or architecture neutrality. In practice, architecture neutrality doesn’t work as well as many programmers would like, especially with complicated graphical programs. But it works fine with simple programs, and future versions of Java will surely have even better architecture neutrality. And, most important, most other languages don’t even attempt platform independence.
Visual Basic Microsoft Corporation designed Visual Basic as a tool for fast development of programs for its Windows operating systems. The latest versions of Visual Basic incorporate a certain amount of sensitivity to network software development, but the primary use of Visual Basic is in writing software for use under Microsoft Windows 98, Windows Me, Windows NT, Windows 2000
and the soon-to-be-released Windows XP operating systems, and integration with the Microsoft Office suite. A large portion of the work done with Visual Basic involves the creation of user interfaces for databases. Lately, Visual Basic has become more oriented toward network programming. You can use Visual Basic to create ActiveX controls for embedding in web pages that will be interpreted by Microsoft Internet Explorer. You also can use Visual Basic to create Component Object Model (COM) components.
You’ll find the latest Microsoft news about Visual Basic at msdn.microsoft .com/vbasic/.
C and C++ Widely regarded as the top of the heap as far as general-purpose programming languages are concerned, C and C++ are the standards by which most other such languages are measured. You can do anything with C and C++, from manipulating the pixels on a video screen individually to reading the contents of memory one bit at a time. Programs written in the C languages are fast, too. What’s the difference between C and C++? The latter is object oriented, which means (to cite one important characteristic) it supports the creation of code modules that can inherit traits from other such modules. Say you’d created a module that performed a countdown operation. If you wrote that module generically enough, you could write other modules that exploited the countdown capability in different ways. You might, for example, write one program that displayed a countdown in Roman numerals and another that used Thai numerals, both based on the same abstract underpinnings. You’ll often note that C and C++ programs consist of more than one file. There’s usually a primary executable, then several libraries. Windows uses dynamic link libraries (DLLs) most of the time; other platforms use static linking arrangements. C and C++ both have large communities of developers using them, which means there’s a large body of prewritten C and C++ code out there (both for sale and in the public domain). An increasing amount of the prewritten code has to do with communicating information over networks, so it’s fair to say that the languages are network friendly. They don’t, however, support cross-platform development like Java does. If you compile a C program for MacOS, it’s a MacOS program forever.
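The countdown example reads naturally in any object-oriented language. Here is an added sketch of the same inheritance idea in Python rather than C++ (the class names and the five-numeral lookup table are invented for illustration):

```python
class Countdown:
    """Generic countdown: subclasses change only how a number is shown."""
    def __init__(self, start):
        self.start = start

    def render(self, n):
        return str(n)                 # default display: ordinary digits

    def run(self):
        # The counting logic lives here, once, for every subclass.
        return [self.render(n) for n in range(self.start, 0, -1)]

class RomanCountdown(Countdown):
    """Inherits the counting logic; overrides only the display."""
    NUMERALS = {1: "I", 2: "II", 3: "III", 4: "IV", 5: "V"}

    def render(self, n):
        return self.NUMERALS[n]

print(Countdown(3).run())        # -> ['3', '2', '1']
print(RomanCountdown(3).run())   # -> ['III', 'II', 'I']
```

A Thai-numeral variant would be one more small subclass; the abstract underpinnings are written exactly once.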
General-Purpose Scripting Languages A scripting language is any language whose code is not compiled. Instead, scripting languages rely on a host to interpret their programs and provide an environment for them to operate in. The host can be an operating system (most operating systems support at least one scripting language) or a standalone program (most programs’ macro languages are examples of scripting languages). Because they’re easy to learn (relative to full-featured programming languages) and well suited to quick-and-dirty solutions to problems, a great deal of server-side activity is coordinated by programs written in general-purpose scripting languages. Often, developers will use an operating system’s native scripting language (shell scripting) or install and use a more capable (or more attractive) scripting language like Perl or Tcl. There are two types of scripts: server-side scripts and client-side scripts. Server-side scripts are processed by the server. As you might have guessed, a client-side script is processed by the client.
Shell Scripting Because of the need of users and administrators to automate miscellaneous tasks, most operating systems have a shell scripting language. A shell scripting language is a programming language (usually a fairly simple one) that can issue command-line instructions, manipulate files and directories, do some text management, and invoke other programs. The MS-DOS batch language (which lives on in Windows NT 4 and elsewhere) is an example of a shell scripting language.
Fans of MS-DOS batch files will be severely disappointed with Microsoft Windows Me and Windows 2000. Microsoft has finally made the big push promised with Windows 98 toward MS-DOS’s demise. If you move your network applications to other scripting languages early, you can avoid this pitfall.
Users of Unix variants can typically install any of several shells on their machines to provide different sets of commands and different capabilities. These shells—csh, ksh, and bash are some examples of Unix shells—have their own shell scripting languages. Microsoft Windows 98 is the first Microsoft operating system in some time to have a decent shell scripting language. Its Windows Script Host (WSH), available as a retrofit for Windows NT 4 and standard equipment on all versions of Windows 2000, allows you to perform
shell scripting tasks with JavaScript, VBScript, or any other scripting language for which someone makes available a WSH language module. An updated version will be available on Microsoft’s new Windows XP system. For web developers, shell scripting languages are suitable for certain user interactivity tasks. You could use your shell scripting language to take information from a form and store it in a file or to assemble somewhat customized pages for users to view. Still, compared to Perl, shell scripting languages are usually pretty weak.
Perl The darling of system administrators and web developers everywhere (as well as one of the greatest triumphs of the open-source movement), Practical Extraction and Reporting Language (Perl) is like a good video game: easy to learn, but hard to master. The odds are excellent that if the task you have in mind can be accomplished at all, it can be accomplished with Perl. Server-side scripts written in Perl may not run as fast as compiled programs or exhibit the elegance of object-oriented programs, but they get the job done and usually can be written in a hurry. Perl excels at text processing, which means it’s great at picking information out of form submissions and very good at assembling custom web pages in response to surfer requests. Its various modules (there are many) provide Perl programmers with easy access to many different environments, including all popular server-side databases. Plus, it’s free. Larry Wall and a community of developers have released Perl under open-source licenses and continue to develop it for the public benefit. Though web developers are mainly concerned with command-line Perl programs that interface with remote web clients, the language also can be used in conjunction with the Tk graphical user interface (GUI) toolkit to create programs with graphical interfaces.
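Perl’s text-processing strength centers on regular expressions. As a rough sketch of the kind of one-liner a Perl programmer would use to pick a field out of a form submission, here is the same idea expressed in Python; the field names and submission string are invented for illustration.

```python
import re

# A raw form submission, in the key=value&key=value format that
# web forms use when they send data to a server-side script.
submission = "name=Ada&subject=physics&score=97"

# Pull out the "subject" field with a regular expression,
# capturing everything up to the next "&".
match = re.search(r"subject=([^&]*)", submission)
print(match.group(1))  # physics
```

In Perl the equivalent would be a single pattern match; the regular-expression syntax itself is nearly identical in the two languages.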
The best way to learn about Perl is to install it and play with it yourself. You can download the Perl tools (in source code form and in various compiled forms) from many sites, including the Comprehensive Perl Archive Network (CPAN) at www.cpan.org/.
Tcl The Tool Command Language (Tcl) is a scripting language designed as a sort of glue with which to share information among applications. Pronounced “tickle,” Tcl was developed by John Ousterhout. Like Perl, Tcl can work with
the Tk GUI toolkit to create windowed applications. You’ll often hear the two referred to collectively as Tcl/Tk. Tcl isn’t as big a deal for web development as Perl and shell scripting languages.
There’s a collection of Tcl FAQs on the Web at www.faqs.org/faqs/tcl-faq/. John Ousterhout started a Tcl-related company called Scriptics; its web site is www.scriptics.com. You can download the Tcl development environment there.
Web-Specialized Server-Side Scripting Languages A server-side scripting language is any language whose code is embedded in text documents and meant to be interpreted by the web server before it sends the documents (typically containing HTML as well as server-side scripting) out to the client. Server-side scripts can, for example, insert the current date and time in a document, customize a document with the user’s name, or insert a hit counter. Server-side scripting languages can sometimes work with arguments, which are extra pieces of data supplied with the client’s request for a document. A client might, for example, request this document:
results.php3
This is a Hypertext Preprocessor (PHP) program called results.php3. Presumably, this program yields output in a form the client can understand, typically HTML. A client might also specify an argument by requesting this:
results.php3?physics
That causes physics to be available to results.php3 as the value for one of its variables. Such a system might power a search engine, for example.
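The mechanics of such an argument can be seen with Python’s standard urllib.parse module: everything after the ? is the query string, which the server makes available to the script. The full URL here is invented for illustration.

```python
from urllib.parse import urlsplit

# A request like results.php3?physics carries its argument after the "?".
url = "http://www.example.com/results.php3?physics"
parts = urlsplit(url)

print(parts.path)   # the document being requested: /results.php3
print(parts.query)  # the argument: physics
```

The server hands the query-string portion to the script, which can then use it, for example, as a search term.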
Hypertext Preprocessor (PHP) Its name a recursive acronym (PHP: Hypertext Preprocessor), PHP is an open-source server-side scripting language. It originally was developed for Unix systems and continues to find most of its applications under Apache servers, but a version of the language for 32-bit Microsoft Windows machines is available, too. Get the details at www.php.net.
Active Server Pages (ASP) Active Server Pages (ASP) is the server-side scripting solution for Microsoft web servers—mainly, Internet Information Server (IIS). If you’re running a Windows NT server as your platform for web content and need the versatility of server-side scripting, ASP probably is the way to go. It’s based on VBScript and you can use Microsoft Visual InterDev to automate some of the development process. Get the details at msdn.microsoft.com.
LiveScript LiveScript is the server-side scripting solution for Netscape Communications Corporation’s web servers. Essentially, it’s a variant of JavaScript—often, you will in fact hear it called “server-side JavaScript.” Get the goods on this language at devedge.netscape.com.
Client-Side Markup Languages A client-side markup language is any system of adding tags (markup) to a text document to supply information about how the text should be rendered or what it means. Client-side markup languages are interpreted by the browser, so you’ll sometimes see subtle variations in how different browsers interpret the same markup.
Hypertext Markup Language (HTML) Hypertext Markup Language (HTML) is the most popular language for creating web documents. With fundamentals that are easy to learn and high-end capabilities that satisfy many advanced publishing requirements, HTML is the workhorse of web publishing. HTML code is not always elegant, it does not lend itself to searching (the way XML does), and different browsers will frequently interpret the same HTML document differently, but at present HTML is the standard language of web publishing.
A basic orientation to HTML coding appears later in this chapter.
Dynamic Hypertext Markup Language (DHTML) Dynamic Hypertext Markup Language (DHTML) is not another version of HTML, as you might think. Instead, it is a method of using a scripting language in combination with the Document Object Model (DOM). The DOM
contains standard routines for how documents, such as web pages, are handled. In essence, it is an Application Programming Interface (API) with various document functions. By using scripts that access the DOM, you can create web pages that change as the client makes different choices, such as color preferences.
Cascading Stylesheets (CSS) One of the great things about modern word processing applications is that they let you create stylesheets. A stylesheet is a set of tags, similar to HTML tags, that contains formatting information for text: font, color, position, indentation, and so on. If you create a document with a header style, say Header 1, and you decide that you want to change the font from Times New Roman 13 to Arial 10, all you have to do is change the Header 1 definition. Changing the definition changes the formatting of all the text tagged as Header 1 to the new font. Cascading stylesheets (CSS) perform the same function for HTML documents that stylesheets do for word processors; they add style to a web page and lend a consistent look to your document. There are three aspects of cascading stylesheets that you need to be aware of:
Style The style is the actual color, font, margin, and positioning of text.
Position Stylesheets can be placed in the <HEAD> tags, in an external file, or within an individual tag.
Cascading Both the web page and the web browser can have conflicting preferences about how some things, such as hyperlinks, appear. Cascading gives different stylesheets a precedence order; the stylesheet with the higher precedence wins.
Even though you might think that CSS is a wonderful idea, there are two basic pitfalls to keep in mind before using stylesheets. The first is that if you place a style within an individual tag, it affects only the text within that tag; revisions to your web page may not produce the desired effect, and without good documentation the style could take a while to locate in a larger page. Second, just as HTML tags can produce different results in different browsers, stylesheets are not standardized across all browser platforms. Always test your web pages with multiple browsers, and with multiple versions of the same vendor’s browser, before you publish them on the Web.
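As a small illustration of the style and position ideas, here is roughly what an embedded stylesheet looks like when placed in a document’s head section; the selectors and values are invented, not taken from any particular page:

```html
<STYLE TYPE="text/css">
  H1 { font-family: Arial; font-size: 18pt; color: navy; }
  P  { margin-left: 1em; }
  A  { color: green; }  /* overrides the browser's default link color */
</STYLE>
```

Every level-one heading, paragraph, and hyperlink in the document then picks up these rules, and changing a rule in one place restyles the whole page.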
eXtensible Markup Language (XML) While HTML describes how its surrounding text should appear in a browser window, eXtensible Markup Language (XML) describes what the text means. Rather than lock publishers into a universal set of tags that must be made to apply in all situations (as HTML does), XML allows users to define their own tags as needed. Remember, XML isn’t so much about the display of data. An XML document on its own, in fact, is not renderable; you have to supply stylesheets and other information in order to associate the data in an XML document with cosmetic characteristics a browser could use. For this reason, XML is as much a data-interchange tool for communicating data among applications as a language for displaying data to a person. It is also well suited for sending and receiving data over wireless devices.
Document Type Definition (DTD) While XML is a data-interchange tool for communicating data, it doesn’t define documents on its own; instead, a document type definition (DTD) defines the structural layout (attributes, elements, entities, comments, and notations) of the different components of a document. The idea is that, within a given group of XML users (people in a given industry, say, or people wanting to communicate corporate financial information to shareholders), there would be a common DTD. One party could publish data in conformance with that DTD, and another party, having access to the same DTD, could interpret it.
Extensible Stylesheet Language (XSL) Stylesheets are used by desktop publishers to define the physical layout of the document with which they are working and are, in essence, templates. For example, a desktop publisher may work on business cards for a company’s employees. Instead of creating a new file from scratch every time a new employee orders business cards, the desktop publisher will use a stylesheet and then simply modify the data. Stylesheets are also used in word processing applications for forms, memos, stationery, and so on. When XML was created, there wasn’t any thought of using templates for document layout; however, with the widespread use of the Internet for distributing documents, the need for stylesheets within XML was recognized. The extensible stylesheet language (XSL) was created specifically to address XML’s lack of physical layout definitions.
Wireless Markup Language (WML) The hottest trends in the information industry are wireless devices and networks. Watch TV for a while and you will see cell phone and pager ads that feature devices that connect to the Internet. The biggest problem with these devices isn’t receiving the information, but rather processing and displaying the data on a device that has less power, a smaller display screen, and less memory than a desktop. The Wireless Markup Language (WML), which is basically HTML for mobile wireless devices, was designed to address the limitations of wireless devices. In addition to Internet access, wireless devices are used heavily for scheduling and task-management features that empower individuals to move around without losing track of their daily routines. WML targets those event and tasking features, which makes it more powerful than HTML in this regard. One good resource for basic XML, DTD, WML, and XSL information is at about.com’s HTML site at html.about.com/mbody.htm. Simply type a topic in the Search box, and you can find many resources and basic tutorials.
Virtual Reality Modeling Language (VRML) Virtual Reality Modeling Language (VRML) is a way of describing three-dimensional space via tags in a text document. VRML includes means of describing shapes, relative positions, surface textures, and light sources. Although it’s technically astute, fun to play with, and the center of a considerable community on the Internet, VRML really hasn’t found a killer application yet. It’s a minor publishing language.
Client-Side Scripting Languages A client-side scripting language is a scripting language whose code is interpreted on the client side of the client-server relationship. Client-side scripting code typically is embedded in HTML documents. It’s not executed when the server runs the server-side code. Instead, it’s run by the client computer after the download is complete. Client-side scripting languages can provide animation and interactivity (such as form prevalidation and simple calculations) without adding to the server’s processing load. There are two main client-side scripting languages:
JavaScript and JScript (technically two separate but similar languages)
VBScript
JavaScript and JScript JavaScript is a powerful yet fairly simple scripting language you can use to add “intelligence” and interactivity to your web pages. With JavaScript, you can do such things as:
Make your browser’s status bar display a label when you pass your mouse pointer over a link
Design a registration form that makes sure visitors to your site send in valid information
Equip your pages with animated elements
JScript is Microsoft Internet Explorer’s version of JavaScript, and it can handle most JavaScript programs. Though JScript is mostly compatible with JavaScript, Microsoft Internet Explorer doesn’t interpret everything the same way Netscape’s browser does. The lesson to you: Test your pages with Microsoft Internet Explorer as well as Netscape Navigator. Here’s an HTML document that incorporates a simple JavaScript program, just to give you an idea of the basic syntax:
<HTML>
<HEAD>
<TITLE> Hello </TITLE>
</HEAD>
<BODY>
<SCRIPT LANGUAGE="JavaScript">
document.write("Hello!")
</SCRIPT>
</BODY>
</HTML>
Figure 6.1 shows what the code looks like when interpreted by a browser. Note that the document.write() statement results in text that looks like an integral part of the HTML code.
VBScript If JavaScript borrows much of its structure and syntax from Java, VBScript borrows a great deal from Visual Basic. As Microsoft’s preferred client-side scripting language, VBScript can do all sorts of automation, animation, verification, and organization jobs on the client side. If you know standard Visual Basic, you’ll probably find it easy to pick up a working knowledge of VBScript. In fact, you can use your VB development environment to do many VBScript development tasks. Here’s a simple VBScript program to give you a feel for the language’s appearance:
<HTML>
<HEAD>
<TITLE> Hello </TITLE>
</HEAD>
<BODY>
<FORM>
<INPUT TYPE="BUTTON" NAME="HelloButton" VALUE="Click Me">
</FORM>
<SCRIPT LANGUAGE="VBScript">
Sub HelloButton_OnClick
    MsgBox "Hello!"
End Sub
</SCRIPT>
</BODY>
</HTML>
Figure 6.2 shows what this code looks like in Microsoft Internet Explorer (which is the only browser, by the way, with native capacity to interpret VBScript). The subroutine executes only when the user clicks the button. FIGURE 6.2
A simple VBScript program that spawns a dialog box
Databases
A database is any collection of data. A card file is a kind of database; a list of telephone numbers on a piece of paper is a database. Computer databases are essentially the same sorts of things, but with the information (and the labels identifying it) stored digitally. Database Management Systems (DBMSes) are programs that add data to databases, extract data from them, organize the data for various applications, and generally attempt to guarantee that the database is fast, easily accessible, and secure from outsiders. Data mining applications attempt to analyze the data in databases and draw conclusions from it. Data mining applications at Wal-Mart Stores, Inc., for example, realized that most people who buy breakfast cereal at Wal-Mart also buy bananas during the same visit. The result of the data mining is that most Wal-Marts now stock
bananas in their cereal aisles, hoping to encourage even more cereal buyers to pick up some bananas on impulse.
Non-Relational Databases Non-relational databases (or flat-file databases) are essentially unadorned lists. A non-relational database might serve to correlate names with e-mail addresses or atomic symbols with atomic numbers. Non-relational databases play a big role in the operation of the Internet because they store the Domain Name Service (DNS) lists that associate domain names with IP addresses. The problem with non-relational databases is that they can’t keep track of sets of data with multiple relationships among the fields. Non-relational databases that are used to keep track of data with such complex relationships end up having multiple instances of each piece of data, an inefficiency that’s one of the cardinal sins of database design. Relational databases are better for applications that involve complicated relationships.
Relational Databases A relational database is a database with multiple tables, each having something in common with at least one other. Say, for example, that you work for your government’s automobile-registration organization. You might have one table that correlates license plate numbers to Vehicle Identification Numbers (VINs) and another table that correlates VINs with the registered owners of the vehicles those VINs represent. Still another table might contain contact information about the people who have registered vehicles with your government. So, if the police want to find the address of the person who just held up the liquor store in your town, they’d go to the motor vehicle authority with the license plate number of the getaway car. The spirited database administrator there would be able to find the robber’s name and address by associating the plate number with a VIN, the VIN with a name, and the name with an address. There are three tables involved, all related to one another by some common piece of information. This is a relational database.
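The three-table lookup described above can be sketched with Python’s built-in sqlite3 module. The table and column names here are invented for illustration; a real motor-vehicle schema would be far larger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Three related tables: plate -> VIN, VIN -> owner, owner -> address.
cur.execute("CREATE TABLE plates (plate TEXT, vin TEXT)")
cur.execute("CREATE TABLE vehicles (vin TEXT, owner TEXT)")
cur.execute("CREATE TABLE people (owner TEXT, address TEXT)")
cur.execute("INSERT INTO plates VALUES ('XYZ 123', 'VIN0001')")
cur.execute("INSERT INTO vehicles VALUES ('VIN0001', 'R. Robber')")
cur.execute("INSERT INTO people VALUES ('R. Robber', '14 Hideout Lane')")

# Follow the chain of relationships in a single query:
# getaway car's plate number in, suspect's address out.
cur.execute("""
    SELECT people.address
    FROM plates
    JOIN vehicles ON plates.vin = vehicles.vin
    JOIN people ON vehicles.owner = people.owner
    WHERE plates.plate = 'XYZ 123'
""")
print(cur.fetchone()[0])  # 14 Hideout Lane
```

Each table shares a column with at least one other, which is exactly what makes the database relational.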
Database Servers A database server is any program that’s responsible for maintaining one or more databases and responding to queries sent to them. Database servers
combine efficient storage and access with DBMS software. All major database servers comply with the Structured Query Language (SQL) specification—more on that later—though many of them supplement standard SQL with proprietary or otherwise nonstandard extensions to the language.
In addition, a computer that’s dedicated to running a database server program is called a database server.
Many companies publish database server software. They all have their fans. Some fit better into certain applications than others; some cost more than others. Evaluate the companies’ offerings next to your organization’s needs. Here are the big players, in alphabetical order:
Hughes Technologies puts out Mini SQL (mSQL), a server for small- and mid-size applications that’s free for use under certain circumstances. Read about it at www.hughes.com.au/.
Microsoft publishes Microsoft SQL Server 2000, which runs best under Windows NT. Details are at www.microsoft.com/sql/.
Oracle made its name with database servers. Its latest products are the servers of the Oracle8i family. Details appear at www.oracle.com/database/oracle8i/.
Sybase also has been selling database servers for years. Its latest ones compose the Adaptive Server family. Read all about them at www.sybase.com/products/databaseservers/.
T.c.X DataKonsult improved upon mSQL and came out with MySQL, which also is free under some circumstances. The server has a web site: www.mysql.com/.
For a client to be able to access a database, there must be some way to communicate with the database on the server. Scripting languages (PHP, Perl, ASP) provide one method of accessing a database; another method is to use a database connector from an Application Programming Interface (API) module. There are two popular connectors in use today:
Open Database Connectivity (ODBC) Used on Microsoft Windows operating systems to connect to various types of databases, such as SQL Server, FoxPro, dBase, Paradox, and many others.
Java Database Connectivity (JDBC) Similar to ODBC, but uses Java to provide connectivity to a database.
Structured Query Language (SQL) is the standard language for working with relational databases. It’s a complex, powerful language that’s far beyond the scope of this book and the i-Net+ exam, but its fundamentals are worth a mention here. SQL is a descriptive language, which is a kind of language different from the procedural and object-oriented languages that characterize most software development jobs. Here’s an analogy. Say you went to a delicatessen to buy a ham sandwich. If you were only able to speak in procedural languages, you’d have to tell the guy exactly what to do in order to yield the results you wanted. You’d have to say, “Cut the roll in half and place the halves side-by-side, cut side up. Spread mustard on the halves. Slice ham until you have a quarter-pound of it. Remove the ham from the slicer.” You get the idea. With a descriptive language, you could just describe the results you want. “I want a ham sandwich on a Kaiser roll with Dijon mustard and an olive,” you’d say. And then the counterman, being a skilled maker of sandwiches, delivers what you want. The process is a lot easier. In the case of databases, the database server knows how to extract information from the databases it knows about. Clients who make requests for data from the databases describe the data they want with SQL. These requests are called queries. Let’s take a look at an example of an SQL query:
SELECT firstName, lastName FROM employeeTable;
That query would yield a list (called a report) of all values in the firstName and lastName fields of every record in the employeeTable table. What if we didn’t want all of them? We could try this:
SELECT firstName, lastName FROM employeeTable WHERE wage > 8.5;
That would yield the contents of the firstName and lastName fields in every record in which the wage field contained a value greater than 8.5. Note that queries don’t have to extract data from databases.
The SQL instructions that create tables, insert data into them, and establish relationships (called joins) among tables are called queries, too.
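You can try the SELECT examples above against a toy employeeTable using Python’s built-in sqlite3 module; the names and wages below are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create and populate the employeeTable used in the example queries.
cur.execute(
    "CREATE TABLE employeeTable (firstName TEXT, lastName TEXT, wage REAL)")
cur.executemany(
    "INSERT INTO employeeTable VALUES (?, ?, ?)",
    [("Ann", "Lee", 9.25), ("Bob", "Ray", 7.50), ("Cara", "Diaz", 10.0)])

# Only records whose wage field exceeds 8.5 appear in the report.
cur.execute("SELECT firstName, lastName FROM employeeTable WHERE wage > 8.5")
print(cur.fetchall())  # [('Ann', 'Lee'), ('Cara', 'Diaz')]
```

Note that the CREATE TABLE and INSERT statements are queries too, even though they put data in rather than take it out.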
There’s a good SQL tutorial on the Web at w3.one.net/~jhoffman/sqltut.htm.
Most web publishing involves Hypertext Markup Language (HTML) to some degree or another. This section aims to orient you to the barest fundamentals of the language. Detailed coverage of HTML is beyond the scope of this book and the i-Net+ exam. For further information on the language and its uses, refer to a specialized tutorial or reference book.
How to Write an HTML Document If you’re going to write HTML, you need a tool for doing so. You have two options:
A text editor for editing the HTML code directly
A graphical editing environment that handles code generation for you
Editing Code Directly If you intimately understand HTML, you may find it preferable to work with the code directly. Because HTML files are simply ASCII text files with special tags added, you can edit HTML code with any text editor. Which text editor you use is more a matter of preference than anything else, though some editors ship with macros and other aids that speed the HTML writing process. Some popular text editors include:
Wilson WindowWare’s WinEdit (www.winedit.com)
Bare Bones Software’s BBEdit (www.bbedit.com/products/bbedit/bbedit.html)
Fookes Software’s NoteTab (www.notetab.com)
SimpleText (ships with MacOS)
Notepad or WordPad (ships with Microsoft Windows)
Emacs (ships with most versions of Unix)
There are also some hybrid editors on the market. These editors allow you to switch back and forth between a direct view of the code (as in a text editor) and a view of the formatted page as it would appear in a browser (as in a WYSIWYG editor, discussed next). Allaire HomeSite (available for trial download at www.allaire.com/products/homesite/index.cfm) is such a product.
Using a WYSIWYG Editor Most people prefer not to deal with HTML code directly and would rather use a development tool. When you edit HTML code by hand, you can’t see how your changes look without saving the file and opening it in a web browser. Another drawback is that, over the years, so many new tags have been added to HTML that keeping track of where everything is located in a document can be difficult, even for experienced professionals. The easiest method is to use a What-You-See-Is-What-You-Get (WYSIWYG) editor, which displays your changes immediately and takes care of the messy details of coding for you. There are several popular programs in this competitive market. The top players include:
Microsoft FrontPage (www.microsoft.com/frontpage/)
Allaire HomeSite (www.allaire.com)
Macromedia Dreamweaver (www.macromedia.com)
These WYSIWYG editors work a lot like word processing programs such as Lotus Word Pro. You enter, arrange, and format text and graphics visually, allowing the software to take care of the underlying code. Sometimes you’ll hear WYSIWYG editors called graphical user interface (GUI) editors because of their graphical appearance. Figure 6.3 shows a page under construction in DreamWeaver.
Viewing Your Code Web browsers allow you to view the HTML code on which a web page is based. You may want to do this if you want your own pages to include an effect you spotted on someone else’s web page, but you don’t want to copy code exactly without permission. Here’s how to view HTML: Under Microsoft Internet Explorer (IE), choose View > Source from the menu bar, or right-click the document and then choose View Source. The latter approach is particularly handy with multiframed documents. Versions 5 and above of IE also have a toolbar icon that opens the page for editing in FrontPage with a single click. Under Netscape Navigator, view the HTML by choosing View > Page Source from the menu bar, or by right-clicking the document and choosing View Source.
HTML Document Structure HTML documents are ASCII text files with special sequences of characters, called tags, inserted in the text the tags describe. The process of learning HTML is largely a process of learning the different tags, knowing what they do, and knowing how they interact with one another. Here’s the basic syntax for HTML tags:
<B>Text to which the tags apply</B>
In that passage of HTML code, the words “Text to which the tags apply” are formatted under the rules associated with the <B> and </B> tags. Those tags, as it happens, indicate bold formatting of text, and so a browser interpreting this passage would format the words in boldface, like this:
Text to which the tags apply
Simple enough? Well, those are probably the most straightforward tags around. It gets more complex from there. But that’s how tags work, in a nutshell. HTML is not case sensitive, but it is a good idea to be consistent in your use of uppercase letters for the sake of clarity. It’s also a good idea to make judicious use of indentation and to leave blank lines between larger blocks of code. For example, if you define a table, you will want to leave at least one blank line before and after a row definition (which we will discuss shortly in this section). This section provides only the barest introduction to HTML and the things you can do with it. Any serious HTML publisher needs a full-scale tutorial and reference book on the subject; such books typically run to more than 1,000 pages.
Basic Elements The simplest HTML document consists of a statement of the HTML specification being used (there are several), a head region, a body region, and some tags that define the whole document. You’ll notice that many tags come in pairs, with the first tag indicating the beginning of a section and the second tag, which includes a “/”, marking the end of that section. Here’s a skeletal HTML document. It contains nothing, but a browser will open it without objection:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
</HEAD>
<BODY>
</BODY>
</HTML>
This example begins with a declaration of the document type, which states that this document complies with the HTML 3.2 language specification as defined by the World Wide Web Consortium (W3C). You’ll learn more about DOCTYPE statements if you decide to explore XML in depth someday. Additionally, there are three sets of tags in the example, specifically:
<HTML> </HTML> define the portion of the document in HTML format (all of it).
<HEAD> </HEAD> define the head segment of the document, in which some scripting code and other invisible elements will go.
<BODY> </BODY> define the body segment of the document, in which visible elements will go.
You also can add some simple elements to provide your document with some content. Here’s another document:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE> A Very Simple HTML Document </TITLE>
</HEAD>
<BODY>
<H1> Venus and Adonis </H1>
<H2> By William Shakespeare </H2>
<P>
EVEN as the sun with purple-colour'd face <BR>
Had ta'en his last leave of the weeping morn, <BR>
Rose-cheek'd Adonis hied him to the chase; <BR>
Hunting he loved, but love he laugh'd to scorn; <BR>
Sick-thoughted Venus makes amain unto him, <BR>
And like a bold-faced suitor 'gins to woo him.
</BODY>
</HTML>
That’s the same document, outfitted with some text and some additional tags. Here’s a rundown of the added tags:
<TITLE> </TITLE> define the label for the browser window that’s displaying this page.
<H1> </H1> define text as a level-one (i.e., big) heading.
<H2> </H2> define text as a level-two (i.e., somewhat less big) heading.
<P> defines the beginning of a paragraph, used here to get some white space.
<BR> indicates a line break.
Figure 6.4 shows how the document looks when rendered by a browser.
Tables Often, you’ll want to include tables in your HTML documents. Tables are handy for organizing data and highlighting the relationships among variables. They’re also useful for implementing relatively complicated page layouts. Several tags come into play in the creation of a table. Here’s a summary:
<TABLE> </TABLE> tags surround the whole table.
<TR> </TR> tags surround each row of cells (think “table row”).
<TD> </TD> tags surround the contents of each individual cell (think “table data”).
In addition to using the basic tags—which are adequate to define a table by themselves—you can insert attributes into the tags to specify additional formatting information. The opening <TABLE> tag, for example, takes an attribute called BORDER. If you use the following syntax, you’ll get a border one pixel wide between all your cells and around the table’s exterior:

    <TABLE BORDER=1>

Here’s an HTML document that illustrates the use of table-making tags:
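A minimal sketch of such a document follows; the cell contents here are illustrative:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>A Document With a Table</TITLE>
</HEAD>
<BODY>
<TABLE BORDER=1>
<TR>
<TD>Row 1, Cell 1</TD>
<TD>Row 1, Cell 2</TD>
</TR>
<TR>
<TD>Row 2, Cell 1</TD>
<TD>Row 2, Cell 2</TD>
</TR>
</TABLE>
</BODY>
</HTML>
```

Each <TR> pair encloses one row, and each <TD> pair within it encloses one cell of that row.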
Forms Most of HTML has to do with rendering words and pictures in a browser window, but the most valuable Internet transactions are those in which the user sends information to your server. After all, that’s how you collect demographic information and credit card details. HTML includes tags that cause user interface elements to appear in the browser window. Called forms, these collections of user interface elements can include text fields, check boxes, radio buttons, selection lists, and several kinds of buttons. Detailed coverage of HTML forms is outside the scope of this book and the i-Net+ exam, but here’s a simple form that incorporates two text fields, a Reset button, and a Submit button. It could be used to collect registration information about site users:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
    <HTML>
    <HEAD>
    <TITLE>A Document With a Form</TITLE>
    </HEAD>
    <BODY>
    <H1>Registration</H1>
    <FORM METHOD="POST" ACTION="http://www.davidwall.com/cgi-bin/register.pl">
    Name: <INPUT TYPE="TEXT" NAME="name"><BR>
    E-mail: <INPUT TYPE="TEXT" NAME="email"><BR>
    <INPUT TYPE="RESET" VALUE="Reset">
    <INPUT TYPE="SUBMIT" VALUE="Submit">
    </FORM>
    </BODY>
    </HTML>

(The field names and the METHOD attribute in this listing are illustrative; the essential elements are the two text INPUTs, the Reset button, and the Submit button.)
Figure 6.6 shows how this form looks when it’s rendered by a browser. When the user clicks the Submit button, the contents of the form are assembled into a CGI-compliant text string and sent to the Perl program at www.davidwall.com/cgi-bin/register.pl (not a real program). That program could then process and respond to the CGI string. FIGURE 6.6
An HTML document that includes a form
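The CGI-compliant text string mentioned above is simply a series of name=value pairs joined by ampersands, with spaces and other special characters URL-encoded. For a registration form with two text fields (the field names and values here are illustrative), the submitted string might look like this:

```
name=Jane+Doe&email=jdoe%40example.com
```

The space in the name becomes a plus sign, and the @ character is encoded as %40.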
Compatibility with Different Browsers When writing HTML, you have to be concerned with how your pages will look when interpreted by each of the many browsers that Internet surfers use. At the very least, you should verify that your pages render properly when loaded by the most recent two or three releases of each of the two major browsers—Netscape Navigator and Microsoft Internet Explorer. Sometimes, it’s a good idea to test them with beta releases (pre-releases of software that vendors like people to test before production) if your company employees have a tendency to download betas even when they’re told not to. At the very least, you’ll be able to troubleshoot any problems that might arise from premature installations.
Never run beta software on your production machine! Bad installations or faulty beta software (after all, that’s why it’s in beta) have been known to necessitate a full reinstall of the computer’s operating system. Always keep a test machine around if you plan on beta testing.
Aside from testing, there are a couple of strategies you can employ when attempting to make your pages cross-browser compatible. The first is to stick with an HTML version a generation or two behind the state of the art. The browsers in common use typically have been out for a while, and even the most recent release may have shipped before the most recent HTML specification was announced (and don’t forget that many users don’t run the latest browser anyway). The second approach involves client-side scripting. You can use a client-side scripting language such as JavaScript to determine the publisher and version of the client’s browser. That script’s test of the page’s environment can trigger the generation of different code for different browsers or cause different pages to load under different conditions.
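As a sketch of the second approach, a client-side script can read the browser’s identity from the standard navigator object and branch accordingly (the branch bodies here are illustrative placeholders):

```html
<SCRIPT LANGUAGE="JavaScript">
<!--
// navigator.appName reports the browser's publisher;
// navigator.appVersion reports its version string.
if (navigator.appName == "Microsoft Internet Explorer") {
  document.write("Generate Internet Explorer-specific markup here.");
} else if (navigator.appName == "Netscape") {
  document.write("Generate Navigator-specific markup here.");
} else {
  document.write("Fall back to lowest-common-denominator HTML.");
}
// -->
</SCRIPT>
```

The HTML comment markers around the script body keep very old browsers, which don’t understand the SCRIPT tag, from rendering the code as text.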
Multimedia
Multimedia is kind of a strange word. Really, it ought to apply to all means of communicating ideas from one person to another, which is what media is. But in the jargon, multimedia has come to refer to media other than text and still graphics. Most commonly, it’s applied to audio, video, and certain kinds of client interfaces that allow users to click buttons, choose options, and see behaviors in response. Multimedia incorporates full-motion video with interlaced sound, silent video, animation (both with sound and without), and sound information (as from a radio). Communicating multimedia over the Internet poses special challenges. Most of the challenges have to do with the fact that multimedia presentations require lots of digital information, and the capacity for communicating data in such volume is limited.
Streaming versus Non-Streaming Media Media files are great in that they can appeal to senses that aren’t normally stimulated during web surfing. An audio file can run in the background while a user pays attention to work; a video can communicate feelings of
motion and excitement that static images cannot convey. But all this capability comes with a price. Media files—those containing sound, video, or both—tend to be huge. In a world that’s still dominated by slow modem connections, you can’t expect the members of your audience to wait patiently while a massive file dribbles in through a slow connection. As is often the case with problems that crop up on the Internet, there’s a solution to the difficulties presented by large media files and slow connectivity. Streaming media allow users to begin enjoying multimedia content before they’ve downloaded whole files. This section explores the differences between traditional, non-streaming multimedia and the improved, streaming variety.
Non-Streaming Media Non-streaming media relies on the concept of files. An article of media—a sound clip, a video, whatever—is electronically encoded in some format as a file (you’ll learn more about multimedia file formats shortly) that is stored on a server. When a client makes a request for the media file, it must be transferred to the client machine completely before the client can begin playing it. The trouble with non-streaming media has to do with time. If a user wants to see a video clip and must wait for the whole thing to download before they see anything, they are going to lose patience with the downloading process if the clip is more than a second or two long; and a clip short enough to download quickly is likely to disappoint them anyway. Popular non-streaming media formats include:
MOV, MPEG, and AVI videos
WAV and AU sounds
Streaming Media Compared to non-streaming media, streaming media allows for media presentations of theoretically infinite length with relatively tiny up-front download times. The idea behind streaming media is that use of the data that represents the sound, video, or whatever can begin before the entire file is downloaded. A client can request a very large audio file and begin presenting its contents after it has received only a fraction of the total file. Then, while it plays what it has received, the client continues to download more of the file. Indeed, in many cases, “files” are sort of nebulous concepts in streaming situations. Many radio stations send their live audio feeds to streaming media servers, which dole out the stream to users as it is requested. The files in these applications are more like buffers—they contain the most recent two minutes or so of the broadcast, but newly generated material is constantly
coming in and older material is being thrown away. So, a client that’s playing streaming media can be receiving an ongoing transmission from some server, “tuning in” long after the stream started and “tuning out” before the server stops sending information.
One of the major problems that a network administrator faces is bandwidth. While giving clients the ability to listen to radio stations and TV broadcasts may sound like a good thing (particularly in newer buildings that tend to block radio signals), the bandwidth requirements can greatly degrade your network’s performance, both on the local network and on your Internet connection.
Most streaming media formats implement a buffer on the client side. That is, they’ll download 10 seconds or so of the stream before they begin playing it, to allow for slowdowns in the transfer rate later in the download process. If such a slowdown occurs, the player can just draw down the contents of the buffer rather than interrupt what the user hears or sees. The two main streaming media technologies are RealPlayer, from RealNetworks, and Windows Media Player, from Microsoft. You’ll find details of RealPlayer on the Web at www.real.com/ and further information on Windows Media Player at www.microsoft.com.
Browser Plug-Ins Browser plug-ins attach themselves to the base browser code and expand its capabilities to interpret data. Plug-ins are code libraries, typically written in C or C++, that most often enable the browser to display new kinds of media. When a browser encounters a media file that it can’t interpret natively, it turns responsibility for that file over to the appropriate plug-in (assuming there’s an appropriate one installed). Some of the most popular plug-ins include:
Macromedia Shockwave Shockwave allows publishers to embed fairly elaborate animations—with sound and some interactivity—in their web pages. Macromedia Flash has superseded Shockwave because of its more compact files.
Macromedia Flash Like its predecessor, Shockwave, Flash provides a format for encoding animation and sound. Its files are much more compact than the Shockwave format, which makes it attractive when you consider download times. Flash files employ vector graphics, which means they’re composed of mathematically defined curves rather than huge catalogs of pixel details.
RealPlayer RealPlayer handles streaming audio and video files. Please refer to the discussion of RealPlayer as part of the streaming media coverage in the preceding section.
Windows Media Player Windows Media Player handles streaming audio and video files. Windows Media Player was also part of the streaming media coverage in the preceding section.
Apple QuickTime VR QuickTime VR allows the user to view a 360-degree panoramic image; the broader QuickTime format is used for audio and video files. Often, QuickTime VR files find application in showing off building interiors and natural scenery.
File Formats Anything you publish on the Internet must be in the form of a file. Files are sequences of bytes, organized into discrete units that can be managed by operating systems. Within those units, the bytes take on different patterns of organization depending upon what kind of data they represent.
Image File Formats Still graphics take the form of nontextual, nonexecutable binary files. There are several file formats, each with strengths and weaknesses with regard to color depth, file size, and suitability to the network environment. The following are most important to the Internet:
GIF and GIF89a
JPEG
PNG
Graphics Interchange Format (GIF) and GIF89a GIF files (pronounced either “giff” or “jiff”) can hold images containing two to 256 colors. The version of the GIF specification that came out in 1989 is known as GIF89a and has a couple of neat capabilities:
Transparency allows pixels in the GIF file to show whatever color is behind the image—this trait is handy for creating images that have apparently irregular borders. Animated GIFs are really a series of GIFs strung together and shown in sequence. They’re one way to create somewhat flickery animations that most browsers can interpret properly without using plug-ins. GIF files also are progressive, meaning that the data in a given file is organized in such a way that a client can download a blocky version of the image first, then gradually refine the image’s sharpness as more data comes through the pipe.
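Since browsers interpret animated GIFs natively, embedding one takes nothing more than the ordinary image tag; a minimal sketch (the filename here is illustrative):

```html
<IMG SRC="spinner.gif" ALT="An animated GIF">
```

No plug-in, SCRIPT, or other special markup is required; the browser simply cycles through the frames stored in the file.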
Technically, CompuServe invented GIF and could try to collect royalties from companies who use GIF technology in their software—or even people who use GIF images on their web sites. In practice, though, GIF is in the public domain. To be on the safe side of this issue (and to correct some of GIF’s technical shortcomings), the Portable Network Graphics (PNG) format has evolved as a replacement for GIF.
Joint Photographic Experts Group (JPEG) JPEG, pronounced “jay-peg,” files are distinguished by their capability to include the entire spectrum of perceptible color (16.7 million shades) while still being more compact than the GIF or GIF89a formats. The secret is a compression algorithm that exploits the fact that human eyes are more sensitive to subtle differences in brightness than to subtle differences in color. By simulating some color shading with brightness gradients, JPEG files can convey full color while still remaining relatively small. Because its compression algorithm results in changes to the image data, the algorithm is said to be lossy. A lossless algorithm, in comparison, would not change the data. JPEG files are not always progressive, though they can be. A progressive JPEG is one in which the information has been organized to allow progressive refinement of the image during the course of its downloading.
For a detailed JPEG FAQ on the Web, see http://www.faqs.org/faqs/jpeg-faq/.
Portable Network Graphics (PNG) The PNG format is a recently established image specification that provides lossless compression of 256-color images. In effect, it’s a replacement for the
GIF format, which isn’t easy for programmers to manipulate and has always been dogged by copyright issues (see the GIF section earlier).
The most comprehensive PNG web site is the one Greg Roelofs has put together to support his book PNG: The Definitive Guide (O’Reilly & Associates, 1999). The URL is www.sonic.net/~roelofs/. Scroll to the middle of that page for a list of PNG links.
BMP Windows Bitmap (BMP) files are capable of displaying 16.7 million colors—more than the human eye can distinguish. However, BMP files are notoriously large, so they usually don’t find their way into situations where they must be downloaded across slow networks; web browsers can’t interpret them without plug-ins anyway. As a result, BMP files are generally relegated to duty on Microsoft Windows machines.
Tagged Image File Format (TIFF) TIFF files support 16.7 million colors as well, but they are large and natively unsupported by web browsers. For these reasons, TIFF files usually are used in desktop publishing and scientific applications.
Encapsulated PostScript (EPS) EPS files are recorded in the PostScript printer language, which means it’s possible to generate a PostScript file on one computer and print that file on any PostScript printer, even if the computer to which the printer is connected contains no software that understands PostScript. Web browsers don’t support EPS, though, and so its application to Internet publishing is limited. In general, Adobe Acrobat (which will be discussed later in this chapter) has replaced the EPS format for Internet documents with graphics.
Video and Animation File Formats Like still images, moving images may be encoded as a series of bytes in a file. There are several such file formats, each with its own characteristics. Despite their differences, all video and animation files tend to be large. If you need to put video or animation on a web site, consider doing so with a streaming file format such as RealVideo or Windows Media Player.
QuickTime Developed by Apple Computer, the QuickTime format supports full-motion video and animation with synchronized sound. QuickTime files usually carry a .MOV filename extension.
Audio-Video Interlaced Like QuickTime, the Audio-Video Interlaced (AVI) format supports full-motion video and animation with sound. Developed for the Windows environment by Microsoft, AVI files usually have .AVI filename extensions.
Motion Picture Experts Group Motion Picture Experts Group (MPEG) files handle full-motion video, but the first version of the specification, MPEG-1, does not support sound. The newer MPEG-2 standard does. Developed by a consortium of telecommunications industry representatives, MPEG files carry .MPEG or .MPG filename extensions.
Compressed File Formats Because data-transmission capacity on the Internet is limited, there’s a lot to be said for making files as small as possible. If you’re going to attach a Lotus Word Pro document to an e-mail message, say, you might be concerned that the file was 700KB in size. If you compressed it—converted it to a compressed file format—the size might be reduced dramatically. After transmission over the Internet, the recipient could uncompress the file—convert it back to its original format—and use it as if nothing had happened. In addition to shrinking files for transmission, compression has the added benefit of combining multiple files into neat packages. You can combine several files into one compressed file, then convert them back to separate files during decompression. Some compression formats allow you to extract compressed files individually, as well. Here’s a rundown of some popular compressed file formats.
Zip The standard for Windows machines, zip files allow you to combine several files into one compressed unit and extract them individually. Zip files usually have a .ZIP filename extension. The two leading vendors of zip utilities are WinZip Computing’s WinZip and PKWare’s PKZIP. PKZIP was the first widely used zip utility on the market, and has evolved from a DOS-based application to a graphical version; however, WinZip has gained popularity in recent years.
You can download a version of WinZip at www.winzip.com and PKZIP at www.pkware.com.
BinHex The MacOS standard, the BinHex format allows you to combine several files into a single compressed unit and extract them individually. BinHex files usually have an .HQX filename extension.
Tape Archive Designed for the needs of tape backup machines connected to Unix boxes, the tape archive format strings multiple files together into a single unit but does not compress them. Tape archive files usually have a .TAR filename extension. Often, you’ll see tape archive used in conjunction with Gzip, covered next. When Gzip compresses a tape archive file, the resulting file usually has a .TAR.GZ or .TGZ extension.
Gzip As zip does for Windows files, Gzip compresses files under Unix. Gzip files usually have a .GZ filename extension. Gzip often is applied to tape archive files, as covered above.
Text and Layout File Formats Multimedia files are great, but most of the information that’s sent over the Internet is in the form of text. There are many ways to encapsulate text in files for transmission. The types that we will discuss are:
ASCII Text
Adobe Acrobat
Rich Text Format
Microsoft Word
ASCII Text The granddaddy of text formats, the American Standard Code for Information Interchange (ASCII) represents an efficient way to store text (though it’s not too good for representing non-Latin character sets). ASCII text files usually carry a .TXT filename extension.
Adobe Acrobat Adobe Corporation came out with Acrobat before the Internet revolution, but the technology languished until the Web took off. Acrobat allows users on diverse platforms to view (and, in its latest version, annotate) documents that incorporate text, graphics, hyperlinks, and complex layouts. It’s better than HTML because the publisher has absolute control over how the finished publication looks on users’ machines. Acrobat files carry a .PDF filename extension.
You can download the Adobe Acrobat Reader software at www.adobe.com.
Rich Text Format The baseline format for formatted text, Rich Text Format allows you to endow text with fairly elaborate formatting—including font information—and expect that almost any word processing program will be able to open your files. RTF files take an .RTF filename extension.
Microsoft Word Love it or hate it, Microsoft Word is the most popular word processing application out there. It exists in Windows and MacOS versions, and many other programs can interpret the Word file format; therefore, it’s a reasonable bet that any given user will be able to open a Word document. Word files usually carry .DOC filename extensions.
Summary
In this chapter, you learned about Internet development. First, we took a look at some network software concepts, such as client-server architecture. We learned that client software includes text and markup languages, native embedded content, scripting languages, Java and ActiveX content, and plug-in content. Each of the components that make up network software adds functionality to the content that is viewed. While client software is great for receiving and displaying data from the Internet, it must have a corresponding server component to deliver that content. Server software that provides files and documents to the client includes server extensions, Java servlets, server-side scripting languages, and compiled server-side programs. Communications between the client and server take place with HTML requests, forms, and scripts such as CGI. To understand these concepts in more detail, we looked at some of the development languages that are used—Java, Visual Basic, C and C++—before we delved into the different scripting languages themselves. Scripting languages come in two forms: server-side and client-side. The major difference between the two is that server-side scripts are processed on the web server, while client-side scripts are processed on the client. Some examples that we looked at included shell scripting, Perl, Tcl, PHP, ASP, LiveScript, JScript and JavaScript (from Microsoft and other vendors, respectively), and VBScript. Scripts are used in conjunction with various markup languages, which display web content and allow for standardized data. While there are a variety of markup languages available, we looked at some very basic HTML coding to get you started. Any functional web site is going to have data in some fashion, and we looked at how databases are used to store that information. You learned that there are non-relational databases, which are basically flat files of information, and that relational databases allow for more efficient use of your data because they relate information in one record to another. Finally, we looked at some of the various file formats and multimedia objects that are available to liven up your web site. You learned that plug-ins, which are special programs that work with your web browser to display content, have a specific file format and extension that relate the file to the correct plug-in. For example, Adobe Acrobat files use a .PDF extension.
Exam Essentials Describe some of the basic programming-related terms as they apply to the Internet. Ensure that you know the difference between an Application Programming Interface (API) and a CGI script. Also, be aware that client-side scripting languages are processed at the client, which is the reverse of server-side scripting languages, which are processed at the server. Identify the different HTML tags, their function, and how to use them properly. Be able to list an HTML document’s structure and identify the differences between the various tags in use. Know the difference between a Cascading Style Sheet (CSS) and Extensible Stylesheet Language (XSL).
Know the compatibility issues between different browsers and the importance of creating cross-browser coding in HTML. Understand that creating HTML requires that you test your code with different browsers and different versions of those browsers. Know that coding with browser-specific tags may not display properly on another vendor’s browser. Understand the different multimedia formats and when to use them. Identify the extensions that go with plug-ins, such as QTVR, Flash, Shockwave, RealPlayer, and Windows Media Player. Know which formats display which type of multimedia content. Know when to use various image and multimedia formats. Know the differences between BMP, GIF, GIF89a, JPEG, PNG, and PDF files that display images. Be able to identify multimedia files, such as MOV, MPEG, and AVI. Identify common formats used to deliver content to wireless devices. Know that XML is not only used to deliver content to desktop computers, but that it is also used with wireless devices, such as cell phones, pagers, and PDAs. Know that WML was created from HTML to be a more compact solution that requires fewer resources—memory, power—and has the capability to display content on smaller view screens. Identify which scripts are available to connect to a database. Be able to identify PHP, Perl, and ASP as popular scripts used to connect to databases. Identify the differences between ODBC and JDBC. Know that ODBC is primarily used on Microsoft clients to connect to various forms of databases. JDBC is a Java-based version that is used on a variety of platforms, including Unix.
Review Questions

1. Examples of server-side scripts include which of the following? (Choose two.)
A. Active Server Pages (ASP)
B. Hypertext Preprocessor (PHP)
C. JavaScript
D. VBScript

2. Which of the following were developed for use with XML? (Choose two.)
A. CGI Script
B. XSL
C. Perl
D. DTD

3. Which of the following HTML tags define a table row?
A.
B.
C.
D.

4. How does a Java servlet differ from a non-Java server extension?
A. A given servlet will run under any operating system for which there is a Java Virtual Machine (JVM).
B. Servlets run faster than server extensions.
C. Servlets occupy less disk space than server extensions.
D. Servlets do not support CGI.

5. Server-side scripting languages are which of the following?
A. Compiled
B. The same as client-side scripting languages
C. Handy for doing database queries
D. Not used much in web publishing

6. Headings are defined by which of the following tags?
A.
B.
C.
D.

7. What is a Common Gateway Interface (CGI)?
A. A programming language
B. A kind of web server
C. A specification for packaging data for the trip between client and server
D. A database query language

8. You can easily secure CGI submissions with what?
A. A password
B. Secure Sockets Layer (SSL) encryption
C. Java encryption
D. PHP

9. What is the program that translates human-readable source code into machine language called?
A. Interpreter
B. Compiler
C. Java Virtual Machine
D. Object packager

10. What is one of the most attractive aspects of the Java language?
A. The fact that an independent standards body defines it
B. Its gentle learning curve for novices
C. The cross-platform nature of Java programs
D. Its integration with MacOS

11. Visual Basic is often used in the creation of what?
A. Java applets
B. Operating systems
C. Perl modules
D. ActiveX controls

12. Which of the following multimedia extensions are used for streaming media? (Choose two.)
A. Flash
B. Shockwave
C. RealPlayer
D. Windows Media Player

13. Which of the following are used to deliver content to wireless devices? (Choose two.)
A. HTML
B. WML
C. XML
D. VBScript

14. What is Perl extraordinarily good for?
A. Text processing
B. Creating windowed applications
C. Optimizing memory usage
D. Programs written in languages that use a non-Latin character set

15. What is Microsoft’s preferred server-side scripting technology for web publishers called?
A. Internet Information Server (IIS)
B. VBScript
C. Active Server Pages (ASP)
D. ActiveX Server Extensions (ASE)

16. What does eXtensible Markup Language (XML) describe?
A. How documents should appear in browser windows
B. The meaning of the information in XML documents
C. A client-server relationship
D. Dynamic HTML information

17. JavaScript is _______ .
A. More or less supported in both major browsers
B. The same as Java
C. Exclusively a server-side language
D. Useful only for doing image rollovers

18. Why is a relational database frequently better than a non-relational database?
A. Non-relational databases require special software.
B. Complex relationships among data can be represented in relational databases without multiple instances of the same data.
C. Relational databases may be queried more speedily.
D. Oracle supports relational databases.

19. Streaming media formats allow for slowdowns in the data stream by ________.
A. Dividing the available bandwidth into “normal” and “backup” segments
B. Compressing data in the zip format
C. Building up a buffer of data before playback begins
D. Storing media information in a database

20. What kind of image file supports transparency?
A. GIF
B. GIF89a
C. GIF87a
D. JPEG
Answers to Review Questions

1. A, B. JavaScript and VBScript are both scripting languages that run on the client computer.

2. B, D. Extensible Markup Language (XML) defines the data, but doesn’t describe the document or the physical layout. Document Type Definition (DTD) was developed to define the document within XML. The Extensible Stylesheet Language (XSL) was developed to define the physical layout of the document.

3. C. <TABLE> </TABLE> is used to define the beginning and end of the entire table. <TD> </TD> are the tags that define the actual cell data. The tags in the remaining option do not exist.

4. A. Java is a cross-platform programming language, so servlets will run under almost any operating system. Non-Java server extensions are compiled for a particular environment.

5. C. You can do a database query with a server-side scripting language and incorporate the results into a document that’s sent to a client.

6. A. While answer C may seem correct, HTML tags follow the format of paired opening and closing tags to define the beginning and end of a block.

7. C. Common Gateway Interface (CGI) describes a way of packaging data for the trip between client and server.

8. B. Secure Sockets Layer (SSL) is the easiest way to securely transfer CGI data from the client to the server.

9. B. A compiler converts source code into machine code.

10. C. If you write a program in Java, you should be able to execute it on any computer that has a Java Virtual Machine (JVM) installed—including MacOS.

11. D. Visual Basic and its development environment make it easy to generate ActiveX controls.

12. C, D. Flash and Shockwave are used to display animations, but are not used with streaming media. RealPlayer and Windows Media Player were developed to provide access to audio and visual data using streaming technology.

13. B, C. Because of the limited capabilities of wireless devices—memory, viewing area, processing power—HTML is not suited for wireless devices. Instead, WML was developed from HTML to deliver content to wireless devices. XML’s capabilities allow it to be used as well.

14. A. Perl has strong text-processing features, such as its capability to work with regular expressions.

15. C. You can use ASP scripts to automate page generation tasks in Microsoft environments. The system relies heavily on VBScript.

16. B. XML has to do with the meaning, not so much with the appearance, of information.

17. A. Though Microsoft Internet Explorer’s implementation is called JScript, it’s similar to JavaScript and many programs run the same in both environments.

18. B. Relational databases excel at linking multiple data tables through common fields.

19. C. Streaming media applications store a few seconds’ worth of data in a buffer that can be drawn down in the event of a network problem.

20. B. Files of type GIF89a can include transparent regions.
Internet Site Functionality Design I-NET+ EXAM OBJECTIVES COVERED IN THIS CHAPTER: 1.1 Identify the issues that affect Internet site functionality. Content may include the following:
Performance, including:
Bandwidth (both client and server)
Internet connection types (both client and server)
Pages taking too long to load
Resolution and size of graphics
Security, including:
Authentication
Permissions
Data Encryption
1.2 Understand and be able to describe the concept of caching and its implications. Content may include the following:
Perhaps the most important aspect of implementing and maintaining a web site is making sure that it is accessible and usable by your audience. Regardless of how wonderful the content is, if users cannot access the site in a timely and reliable way, they will go elsewhere for the information they seek. Therefore, it is important to know enough about the technologies that run the Internet to ensure that your site will meet the demands of its users. In this chapter, you will learn about several critical topics that have an impact on a site’s functionality and usability:
Site functionality issues
Technology and content-type planning
Caching
Site indexing
Each of these major topics contributes to the overall usability of a web site.
Site Functionality Issues
Internet users are a fickle bunch. Technological glitches not only harm functionality, they often cost sites their reputation for usability and reliability. At some point, everyone has run into a web page that loaded too slowly or gave too many errors, and has gone to another site. What do we do if www.amazon.com/ goes down? We go to www.barnesandnoble.com/. What do we do if a site requires ActiveX and our corporate security policy is to disallow ActiveX? We go to a different site. It is important to know the most common errors users experience and why they occur.
Functionality errors manifest themselves in three ways:
Users can’t get to the site at all.
It takes too long to download and view a page.
The document they request is missing or appears to be broken.
In the following sections, we’ll take a look at the technological factors underneath each of these errors.
Connectivity Failure
The most basic web browser error occurs when a user fails to get any information from your web server. Such attempts generate a warning message in the browser such as “Host not found” or “Request timed out.” One such warning message is shown in Figure 7.1—the user is trying to go to the web site www.bahoozit.com, which does not exist. FIGURE 7.1
“DNS Not Found” error message
Not all error messages indicate a connectivity problem. If the server gives the dreaded “404—File not found” error, for example, your client is connecting to the server but the requested document cannot be found. If the host wasn't found or the request timed out, there was never a full-fledged connection between the client and the server. Because connectivity errors mean that the server never gets a full connection to the client, such problems are often never logged on the server. As explained in Chapter 2, several client queries and server responses need to succeed for a user to browse a web page—the server's domain name needs to be resolved into an IP address, and the client needs to make a successful
request to the web server at that address. If the user can’t get to a site at all, the problem could be caused by one of several factors:
The client’s network settings or DNS services are not working.
The client’s connection to the Internet is down.
The server’s hardware or software is malfunctioning or overwhelmed.
The server’s connection to the Internet is down or overwhelmed.
Available IP network connections between client and server are oversaturated.
The server’s DNS records are corrupt or unavailable.
Determining the exact cause of failure requires some troubleshooting. For more information on troubleshooting, see Chapter 8.
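The first steps of this troubleshooting can be automated. The sketch below (a minimal illustration, not a complete diagnostic tool; the host and port are placeholders) uses Python's standard socket module to separate DNS failures from connection failures, mirroring the checklist above:

```python
import socket

def diagnose(host, port=80, timeout=5):
    """Roughly classify why a site is unreachable: DNS resolution
    first, then a TCP connection attempt to the resolved address."""
    try:
        addr = socket.gethostbyname(host)  # fails -> "Host not found"
    except socket.gaierror:
        return "DNS failure"
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            # A connection succeeded; any further error (such as a 404)
            # now comes from the web server itself and will be logged there.
            return "connected"
    except socket.timeout:
        return "request timed out"
    except OSError:
        return "connection refused or unreachable"
```

Running `diagnose("www.bahoozit.com")` against a nonexistent domain would report a DNS failure, matching the “Host not found” message shown in Figure 7.1.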
Another common reason that users get an error message is that the domain name they entered is incorrect. The best way to counter this potential problem is to register a domain name that is short, descriptive of your organization, and easy to remember. If the domain name isn’t unique sounding, people can forget it and try similar names. Some organizations register multiple domain names that people might think of going to. George W. Bush, for his U. S. presidential campaign, registered domains like www.gwbush.com and www.bush.com. In addition to registering a primary domain name, some organizations will register common misspellings.
Download and View Time
One common reason a user doesn't return to a web site is that it is too slow. How slow is too slow? Researchers at Yale claim that 10 seconds is the threshold of frustration. Users may wait longer than that if the information cannot readily be found elsewhere or if they are particularly interested in a site, but then again, they might not. So depending on the patience level of the audience, pages should finish loading within 10 seconds of the time a user clicks a link. In the following sections, you'll learn:
The different stages of a request that can eat into those 10 seconds
How to estimate the time it takes to download a page
These sections will enable you to estimate whether your page is going to be too slow.
Some web designers fall into the trap of assuming that high-speed Internet communications technologies, such as cable modems and xDSL, are as widely available everywhere as they are in their own area. The flaw in this thinking is that the Internet is a global network. Most areas, even in the U.S., still don't have access to high-speed communications and probably won't for the next few years.
Stages of a Request
The 10 seconds a user will wait gets split up into several steps, and each step uses up a portion of that time. The major steps are as follows:
1. DNS lookup and initial connection from client to web server occurs.
2. Request sits in the web server queue, waiting to be serviced.
3. Server generates response to the request (gets a file, runs a script).
4. Server transmits the data to the client.
5. Client renders/displays the data.
Combined, steps 1 and 5, which are the ones most clearly out of the control of the server, generally take a second or two. The time required for steps 2 and 3 depends on the server configuration, although they can often also be reduced to less than a second (you’ll learn more about this in “Planning Robust Back-End Service” later in this chapter). The bulk of the time, therefore, is spent on step 4, transmitting the data from server to client.
Step 5 can sometimes take longer than one second. Slow computers may take several seconds to parse and render HTML documents. Even fast computers can get bogged down by complex HTML code, such as nested tables.
Determining Transmission Time
Because step 4 generally takes the longest amount of time, it has the most impact on the apparent speed of the web site. If the web page takes too long to load, the user will get impatient and go somewhere else. Therefore, it is important to be able to estimate how long a web page will take to download for different types of users. Transmission time is a function of the size of the page divided by the speed at which it is downloaded. The size of the page is measured in kilobytes for the HTML, graphics, and multimedia files. The standard way to express this is as follows: Time of Download = (Size of Page ÷ Available Bandwidth) For example, if a site has a 100K page, someone downloading it over a 5KB/s connection will wait 20 seconds: X seconds = (100 kilobytes ÷ 5 kilobytes per second) = 20 seconds
Be careful not to confuse bytes and bits. People write about file sizes and download speeds using the terms kilobytes and kilobits. Bytes are generally 8 bits. Also, kilobytes and kilobits refer to 1,024 bytes and bits respectively, not 1,000. Unfortunately, some folks, especially advertisers, represent kilobits and kilobytes with inconsistent symbols. Kilobits are referred to as K, k, kb, and Kb. Kilobytes are referred to as K, k, kB, and KB. The symbols K and k are ambiguous! When looking at a number like 14K or 14k, a good rule of thumb is that modem-like devices are generally measured in terms of kilobits per second, and file sizes are almost always measured in kilobytes. Lacking any other clues, KB is likely to be kilobytes and kb (or Kb) kilobits. When writing, choose clear notation, such as KB and kb.
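The formula and the bits-versus-bytes caveat above can be captured in a small Python sketch (the figures are the worked example from this section):

```python
def kbps_to_kBps(kilobits_per_second):
    """Convert a rated line speed in kilobits/s to kilobytes/s (8 bits per byte)."""
    return kilobits_per_second / 8

def download_seconds(page_size_kB, bandwidth_kBps):
    """Time of Download = Size of Page / Available Bandwidth."""
    return page_size_kB / bandwidth_kBps

# The worked example above: a 100KB page over a 5KB/s connection.
print(download_seconds(100, 5))  # 20.0 seconds

# A 56Kbps modem's theoretical throughput in kilobytes per second.
print(kbps_to_kBps(56))          # 7.0 KB/s
```

Keeping the unit conversion in its own helper makes it harder to accidentally divide a size in kilobytes by a speed in kilobits.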
Bandwidth Bottlenecks
When data is downloaded, it flows in a pipeline from the server to the server's Internet connection to the general Internet, then from the client's network connection to the client. So the available bandwidth is the speed of the slowest segment of the pipeline. In 2001 in the United States, the slowest segment is generally the client's network connection. If a U.S. browser is visiting a server in Kenya, however, the slowest segment is likely going to be the slow connection between the Kenyan and U.S. national backbones.
Theoretical and Practical Download Speeds
The goal of web designers should be to design pages that will look good and won't take too long to download. Network connections, however, rarely perform exactly as advertised; therefore, you should consider the following:
Know the theoretical speed of different devices.
Take these speeds with a grain of salt.
It is easy to determine the theoretical speed of any device. A 56Kbps modem, for example, should be able to download about 7KB per second. You can determine that with this formula: (56Kbps ÷ 8 bits per byte) = 7KB/s
As we discussed in Chapter 1, 56Kbps modems are prohibited from running higher than 52Kbps due to FCC regulations.
Table 7.1 lists the theoretical speeds of several types of network connections. TABLE 7.1
Real-world factors like initial connection times, intervening devices, and line noise slow downloads to below their advertised limits. Even with a fast server and a good ISP, a 56Kbps modem, for example, will never achieve that speed. 56Kbps modems operate at 33.6Kbps over analog phone lines. If an ISP has digital lines, there is a chance that their users will be able to get 52Kbps download speed, but uploads will stay at 33.6Kbps. A 14.4Kbps modem will often download at 1.5KB/s, a 28.8Kbps modem at 3KB/s, a 56Kbps modem will optimistically download at 5KB/s, and an unloaded T1 dedicated line will download at 180KB/s.
xDSL and cable modem users will notice large variances in their download speeds, anywhere from 384Kbps to 10Mbps. Even DSL services that are advertised at 384Kbps frequently get download speeds of 800Kbps (100KB/s) during unloaded times and 100Kbps or slower when the DSL network is saturated. For more information on benchmarking DSL and cable modems, see the links on home1.gte.net/awiner/.
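The theoretical-versus-practical gap described above can be tabulated in a short sketch. The practical figures are the rough estimates quoted in this section, not guarantees:

```python
# (advertised speed in Kbps, practical download speed in KB/s from the text)
connections = {
    "14.4Kbps modem": (14.4, 1.5),
    "28.8Kbps modem": (28.8, 3.0),
    "56Kbps modem":   (56.0, 5.0),
    "T1 line":        (1544.0, 180.0),
}

for name, (kbps, practical_kBps) in connections.items():
    theoretical_kBps = kbps / 8  # 8 bits per byte
    print(f"{name}: theoretical {theoretical_kBps:.1f}KB/s, "
          f"practical about {practical_kBps}KB/s")
```

Notice that even the theoretical numbers are well below what many designers assume, and the practical numbers are lower still.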
Example: A Page Viewers Might Abandon
Freshmeat (www.freshmeat.net), a popular Unix software directory, weighs in at 78K, almost all HTML. As you can see in Figure 7.2, modem users have a smaller Internet pipeline than DSL users do. It will take a 28.8Kbps modem user about 26 seconds to download a page this size, whereas a DSL modem running at the advertised 384Kbps would receive it in about 2 seconds. Downloading this page hovers on the threshold of frustration for 56Kbps modem users—fickle users might get bored with waiting for the page and jump over to see if linuxapps.com is loading any quicker (at a slightly slimmer 75K). FIGURE 7.2
Download times for freshmeat.net
Example: A Page Viewers Would Not Abandon
Google (www.google.com) has a highly functional search page of only 12K. As you can see in Figure 7.3, even a 14.4Kbps modem user can download the page in less than the 10-second threshold of frustration. It is unlikely that even impatient users would abandon the Google page in less than eight seconds to try another search engine like www.hotbot.com (a lean 30K). FIGURE 7.3
Download times for google.com
You can test out the probable download times of any page on the Internet with this free online tool: www2.imagiware.com.
Inability to Open or View Files
If people can't use the files on your site, they will often feel frustrated and give up. Files that cannot be opened are either corrupt or are somehow incompatible with certain software and hardware configurations. In this section, you will learn the following:
How a browser successfully recognizes a file
What stops a browser from opening a multimedia file
What stops a browser from opening an HTML file
How to identify and fix corrupt files
It is important for web site owners to fix the broken files and mark incompatible ones with warnings as to who can and cannot use them.
Many times when someone says a file “won’t open,” it is because the file is simply not there. Broken links and missing files are quite common on the Web. People move the files in their web site around a lot, and the links to their old files are not automatically updated. See Chapter 8 on how to set up a system to counter this potential source of errors.
How a Browser Recognizes a File
Browsers sometimes fail to display a file or display it in a mangled fashion. To understand why they fail, like good doctors we need to first understand what happens with our patient when everything goes right and the browser succeeds in displaying a file. The technology that makes this happen is MIME file types.
MIME is an acronym that stands for Multipurpose Internet Mail Extension. It allows web browsers and e-mail clients to recognize and view many different types of files. Servers that deliver pages tag these pages as being certain file types. Clients display these file types as best they can. Check out whatis.techtarget.com and do a search on MIME for details.
In a foreign culture, even people who know the language need to be told when something is a joke. They often don't pick up the subtle clues they need to change the context of their understanding from “serious” to “joke.” In a similar way, browsers need to be told explicitly what mode they should use to interpret each file. Browsers handle many different types of files. The first web browser was designed to display only HTML. Later browsers learned to understand files from Gopher servers, FTP servers, and WAIS index servers. The next generation of browsers learned to display inline images like GIF and JPEG files. More recently, browsers can open Adobe Acrobat portable documents, Java applets, XML documents, and others. When a browser downloads a file, the web server tells the browser exactly what type of file it is. The server uses a configuration file (MIME.TYPES in Apache and Netscape servers) to figure out what files should be marked as being which file types. As you can see in Figure 7.4, MIME.TYPES has two fields—the field on the left names a content-type. The field on the right contains all the file extensions that should trigger the web server to mark a file as the corresponding content-type in the field on the left.
The browser uses the MIME information to decide what to do with a particular type of file. The browser could try to parse and display the file, save the file, or launch an external program to open the file. The client uses a flexible lookup table mapping “MIME-type” to “what to do.” In Figure 7.4, you can see a list of different MIME types and what MIME types the browser knows belong to each extension. Figures 7.5 and 7.6 show the user configuring the exact mapping; the MIME type audio/x-pn-aiff is being mapped to run on a RealPlayer external program.
Web servers send a MIME header with each file, specifying what type of file it is. The web site administrator maintains a lookup table on the web server that matches file extension to MIME-type. If you are adding a new file type to your site, add it to this lookup table.
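Python's standard mimetypes module implements exactly this kind of extension-to-content-type lookup table, and it is a convenient way to check what type a server would likely send for a given file name:

```python
import mimetypes

# guess_type maps a file extension to a MIME content-type,
# just as a web server's MIME.TYPES file does.
print(mimetypes.guess_type("report.html")[0])  # text/html
print(mimetypes.guess_type("photo.jpg")[0])    # image/jpeg

# Adding a new file type to the lookup table, as a server
# administrator would when introducing a new content type:
mimetypes.add_type("application/msword", ".doc")
print(mimetypes.guess_type("resume.doc")[0])   # application/msword
```

The same two-field structure (content-type on one side, extensions on the other) appears in the MIME.TYPES file shown in Figure 7.4.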
Missing MIME-Types and Plug-Ins
After a browser uses MIME-types to recognize a file, it may use either an external program or a “plug-in” to open nonstandard file types. Plug-ins are mini-programs that work within the browser and add extra functionality, such as Shockwave or VRML browsing. If the browser comes upon a MIME-type that it doesn't have in its lookup table, it may be unable to display the file and ask you to save the file to the hard drive. Likewise, if the MIME-type requires a plug-in, the browser may lack that plug-in and be unable to read the file. It is always a good idea to include a link to a plug-in's web site on your web page to draw attention to the need for a plug-in, and make it easier for the client to obtain a plug-in that they may not already have installed on their machine. If the browser doesn't have a plug-in or external program capable of opening the file, a file can appear unreadable to the user. The file isn't really unreadable, it is just not “openable” for that particular user. If the user had a stand-alone application or plug-in that could read the file, then the file would be readable. To assist the user, Microsoft and Netscape browsers check to see if there are any downloadable plug-ins available that can be used to view a new MIME-type. Not all MIME-types have plug-ins for every platform. Some plug-ins only exist for Macintosh computers, others only for Windows. Therefore, users can be unable to open a special multimedia file because the plug-in needed to open that file simply does not exist for their platform.
Misconfigured MIME-Types
If the server sends the wrong MIME-type, the browser may try to use the wrong application to interpret the data. This will look to the user like a “broken” file. If a document is supposed to be a Microsoft Word document but the browser tries to open it as a plain text file, MIME is probably the reason. Check that the server is sending DOC files with the MIME header application/msword and that the browser is set to use WINWORD.EXE to open files of type application/msword. One of the most common problems clients experience is that the local machine has the wrong MIME type set for a particular file extension.
See Chapter 4 for more on configuring MIME on the client.
Malformed HTML
Browsers internally render documents with the MIME-type text/html, so users don't need any plug-ins for normal web documents; however, even HTML can be “not viewable” when one of the following conditions exists:
The HTML contains tags that the browser does not support.
The page includes an incompatible Java or JavaScript program.
There isn’t enough room to properly display the HTML.
Nonstandard Tags
If the HTML uses nonstandard HTML tags, such as newly introduced or browser-specific tags, and the browser doesn't support those tags, the page can be unusable—frame tags without a “no-frames” alternative are a good example of this problem. If the HTML is invalid (for example, if it is missing closing tags), the browser may not know how to render the page and may render nothing for the entire malformed item. In the case of a malformed table, the entire page could be blank.
One of the biggest problems with many Internet sites is that they develop their web pages specifically for one browser platform, even when they take into consideration the different versions of that platform. When composing your web site, try to stay with standard HTML tags.
Java and JavaScript
HTML pages now can also include client-side scripting using JavaScript and, for those who only use Microsoft Internet Explorer browsers, VBScript. They can also include small Java applications called applets. Both JavaScript and Java have different versions, and not all browsers support all versions. If a page that contains JavaScript works for the developers but fails to load properly for other users, check to see if the JavaScript is written so that it needs a recent browser. If you use JavaScript on your intranet and Netscape Navigator is your company standard, you may want to see if the client is using Microsoft Internet Explorer or the wrong version of the browser.
Graphic Resolution
HTML is usually viewable on monitors of many different sizes. Paragraphs wrap to fit the available space. Some HTML tags (including table and frame tags) can specify absolute widths in terms of pixels. If a web site uses a table with a fixed width wider than the screen, then a user with a monitor resolution of 640x480 pixels will not be able to view most of the web site. This can be even more destructive when the frame option is used and the ability to scroll horizontally is removed! Computer screens generally display between 72 and 96 pixels per inch, and there are still many monitors that only display 640x480 pixels. Therefore, when scanning in pictures, keep in mind that a web browser will display a high-resolution image (say, 300 dots per inch) at roughly 72 pixels per inch. This means a 3.5-inch photograph scanned at 300 dots per inch can end up displaying at 1,050 pixels wide, larger than the screen of a large number of browsers.
The terms dots per inch (dpi) and pixels per inch (ppi) are often used interchangeably when discussing screen resolution. This is not technically correct, however. Dots per inch is a printer resolution, whereas pixels per inch is a screen resolution.
Not only is the image larger than the viewable area of the browser window, it also requires extra bandwidth to download the larger graphic file. Sticking to a screen resolution image (72 to 96ppi) will help keep files small enough to transmit quickly.
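The arithmetic behind the 3.5-inch photograph example can be sketched as:

```python
def scanned_pixels(inches, scan_dpi):
    """Pixel dimensions of a scanned image: physical size times scan resolution."""
    return inches * scan_dpi

def onscreen_inches(pixels, screen_ppi=72):
    """Apparent on-screen size when the browser shows every pixel."""
    return pixels / screen_ppi

pixels = scanned_pixels(3.5, 300)
print(pixels)                              # 1050.0 pixels, wider than a 640x480 screen
print(round(onscreen_inches(pixels), 1))   # 14.6 inches on a 72ppi screen
```

Scanning the same photograph at 96dpi instead would yield 336 pixels, which fits comfortably on even a 640x480 display and produces a far smaller file.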
Corrupt Files
File corruption can also stop some files from being opened. Corruption means that a working file has been changed so that its application can no longer understand, or parse, the file. In the web server environment, files are rarely corrupted. Generally, “corrupt” files are really files that aren't being opened with the right program or that have been misnamed or otherwise mangled by the user. For example, suppose a user has a file called BIGDIARY.DOC and then puts this file in the compressed zip archive ARCHIVES.ZIP. To open ARCHIVES.ZIP, a user would need to have a program that could parse ZIP files. But if that user renames ARCHIVES.ZIP to ARCHIVES.DOC, Microsoft Word would claim that the file is corrupt.
The best way to fix file corruption is to try to open the original file on the original computer. If the file is not corrupted, replace the corrupted version on the client’s computer with the uncorrupted version and try again. If you’re transferring a file from one computer to another using FTP, set the FTP program to use ASCII when transferring text files (such as HTML, scripts, and files with the .TXT extension) and BINARY when transferring binary files (such as files with the extensions .EXE, .JPG, and .DOC).
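When scripting transfers with a library such as Python's ftplib, the ASCII/binary distinction described above corresponds to using storlines versus storbinary. A sketch of the mode selection (the extension list is an illustrative assumption, not an exhaustive one):

```python
import os

# Extensions that should travel in ASCII mode; everything else is binary.
TEXT_EXTENSIONS = {".html", ".htm", ".txt", ".cgi", ".pl"}

def transfer_mode(filename):
    """Pick the FTP transfer mode based on the file extension."""
    ext = os.path.splitext(filename)[1].lower()
    return "ASCII" if ext in TEXT_EXTENSIONS else "BINARY"

# With ftplib, the chosen mode maps to:
#   ftp.storlines("STOR " + name, fh)    for ASCII files
#   ftp.storbinary("STOR " + name, fh)   for BINARY files
print(transfer_mode("index.html"))  # ASCII
print(transfer_mode("photo.jpg"))   # BINARY
```

Transferring a binary file such as a JPG in ASCII mode mangles it, which is one of the most common causes of the apparent "corruption" described above.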
Security Issues
In Chapter 5, we discussed many aspects of network security. When publishing content over the Internet, it is difficult to maintain tight security around your web site and still allow the appropriate people to access the content they need. In this section, we will take a look at some of the pitfalls of security measures interfering with client access.
Authentication
As you will recall from Chapter 5, authentication is the process of verifying that a person (or a piece of software, in situations where programs share information without human intervention) is who they claim to be. Many Internet sites now require clients to log in, or authenticate, to the site before they can view most of the material. This allows the owner of the web site to track who is using the site and what they are doing. There are two basic problems with authentication. The first problem is the client who has forgotten their username and/or password. Most sites allow for this by providing an e-mail solution: the client clicks on a link that automatically e-mails the username and password to the client's registered e-mail account. This solution is efficient in the sense that password retrieval is totally automated, but it is highly insecure because the information is sent in clear text and can be easily intercepted, especially if a hacker has obtained someone's e-mail password. Requiring the client to answer a question that has been previously agreed on works in some cases, but again the information is sent in clear text. The second problem of authentication is stolen or “cracked” passwords. As explained in the “Types and Methods of Attack” section in Chapter 5, hackers use password-cracking software (called password crackers) that performs dictionary attacks. Most clients have a tendency to pick easy-to-remember names or words that password crackers have little problem breaking.
While you can't guard against a stolen password, you can enforce password restrictions on your web site that will thwart most attempts at penetrating your site. Password restrictions should require a length greater than six characters and at least one number.
Several studies have concluded that an eight-character password with a mix of numbers and letters (upper- and lowercase letters, if your server will support it) is the optimum length to defeat most password crackers and still allow clients to remember the password.
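A password policy like the one described (more than six characters, at least one number) can be enforced with a simple check. This is a minimal sketch of the stated restrictions, not a complete password policy:

```python
def acceptable_password(pw):
    """Enforce the restrictions suggested above: length greater than
    six characters and at least one digit; mixed case is encouraged
    but not required here."""
    if len(pw) <= 6:
        return False
    if not any(c.isdigit() for c in pw):
        return False
    return True

print(acceptable_password("secret"))     # False: too short, no digit
print(acceptable_password("Gr8tPass1"))  # True
```

A real deployment would also reject dictionary words outright, since dictionary attacks are exactly what such crackers automate.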
Permissions
Web server software allows you to set different permissions—read, write, execute—on the files and directories that are accessed by clients. One pitfall web administrators experience at some point is incorrectly setting permissions on Internet content. The client will usually receive a message, such as “Unauthorized to access web page,” in response. If all of the permissions are correct, it could be that the client hasn't successfully logged into the web site.
Data Encryption
E-commerce sites conduct a lot of credit card transactions, which makes these sites tempting targets for hackers. Data encryption is a great tool to keep hackers from reading intercepted credit card numbers, but it simply doesn't work unless the client and the server are both using the same encryption type and key length. For example, if your web site is set up to use 128-bit encryption and the client's web browser uses 56-bit, the site may not be able to encrypt/decrypt the transactions properly.
Some web browsers are backward compatible, meaning that a 56-bit web site can be accessed without any problems using a browser with 128-bit encryption.
Technology and Content Planning
The best way to ensure a well-functioning web site is to plan ahead. By planning ahead, administrators can address potential problems before their customers are screaming for blood. Also, comprehensive planning leads to optimal trade-offs between factors like high functionality and broad compatibility.
This section will address planning both the front end (what the users see) and the back end (what makes the site work behind the scenes). Specifically, it will consider the processes for the following:
Planning which content types (media) to use
Planning for what server and network resources may be needed
A well-thought-out and well-implemented plan for both the front and back end of a web site will minimize the problems discussed in the previous sections.
Audience-Appropriate Media
A web site's content is more than just the words in an HTML document. The content can also include the graphics, video, and other multimedia files on your site. Some people will appreciate these glitzy multimedia effects; others will be unable or unwilling to view nonstandard or large multimedia files. Choices to include or not include different content types will have consequences for who uses a web site. Keep the following in mind when choosing your content policies:
Determine the attitude and key technical attributes of your audience.
Given the goals of your web site, choose a content policy tailored to your audience.
A site that follows these methods will serve its viewers in a strategic way and is therefore more likely to achieve its goals.
Audience Profiles
One simple yet beautiful strategy for building up or maintaining an audience is to use technologies that work for them. Before you can do that, though, you need to know who your audience is. In terms of what content types to use, you will especially want to consider their desires and technical capacity to use different media. This section breaks these attributes into two areas:
Desire for multimedia content
Client performance levels
This information should help you make informed decisions about the appropriate content and capabilities for your web site.
Desire for Multimedia Content
It is difficult to know exactly what anyone wants without asking them directly. This section provides a rule of thumb for guessing when multimedia content would be desired. It also shows real-life examples of when such content is appropriate and presents a hypothetical example of when Shockwave multimedia would be a good idea. The rule of thumb for multimedia content is this: does the functionality of the file directly serve the central purpose of the web page and dramatically enhance the usability of the page? If the answer is yes, users will desire that content and may be willing to go through the effort to get to it. If the answer is no, the multimedia files will cause frustration if they delay users or ask them to modify their browsers in any way. For example, when NASA first released pictures from Galileo at galileo.jpl.nasa.gov/images/io/ioimages.html, people went to the site and waited a long time to download the pictures. Visitors to the NASA site went there especially to view the pictures, so their motivation to wait was high. But when users go to Yahoo!, they don't expect to be dazzled by a Flash graphic; they have come to find another web site. Yahoo! keeps multimedia delays to a minimum and focuses on its functionality as a category browser. Although their approaches differ, Yahoo! and NASA are each providing the content their viewers want.
For more on designing usable web sites, visit dmoz.org/Computers/Internet/WWW/Web_Usability/.
Think about who your audience is and why they come to your site. How much time are they willing to spend for multimedia content? Take, for example, Shockwave, which is a plug-in that allows users to play simple games and view animated pictures. Would your audience want to download a Shockwave plug-in to be able to view your site? If you have a news site, nice pictures would be appealing, but a Shockwave game might not be compelling. But if your site provides web-based tools for diagramming atoms, scientists would probably have enough motivation to download a plug-in. It really depends on how relevant the multimedia is to the purpose of the web page.
Client Performance Levels
To make the best possible site, you'll need some information about the Internet abilities of your audience. Audiences have different abilities and technological needs, depending on factors such as network and Internet connection speed, browser type and version, and operating system.
The speed of a user’s Internet connection affects their willingness to download big files. Users who get Internet connectivity by dialing in to an ISP with a modem have vastly slower Internet connections than those who have xDSL, cable modems, or T1s. Those with especially slow network connections need text-oriented navigation and content because they may surf “images off” (meaning that they turn off the capability for their browser to display images, thus making the page load faster). Those who have fast Internet connections are more likely to want to download extras like software and music.
Web servers can record how long it takes viewers to download each file. You can use the formula outlined in the section “Download and View Time” earlier in this chapter to compute the average network speed of your viewers.
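Given server log entries of transfer size and elapsed time, the download-time formula can be rearranged (bandwidth = size ÷ time) to estimate viewers' connection speeds. A sketch with made-up log values (the log format here is hypothetical; real server logs vary):

```python
# Hypothetical log entries: (bytes transferred, seconds taken)
log = [(78_000, 26.0), (12_000, 1.0), (100_000, 20.0)]

def average_speed_kBps(entries):
    """Bandwidth = Size / Time for each transfer, averaged, in KB/s."""
    speeds = [(size / 1024) / seconds for size, seconds in entries]
    return sum(speeds) / len(speeds)

print(round(average_speed_kBps(log), 1))  # average KB/s across the log
```

An average in the low single digits of KB/s suggests a modem-heavy audience, which should push the design toward smaller pages.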
BROWSER VERSION
It is important to determine what browsers your visitors are using so you know what capabilities they have. Browsers come in more options than just Netscape Navigator and Microsoft Internet Explorer (IE). Each browser can implement different features, and different versions of a browser possess differing capabilities. Although the latest browsers, like Netscape Navigator 6, implement cutting-edge features, using these features may break the web site for other browsers. There are actually hundreds of different brands and versions of browsers, and they can all differ from each other in terms of their capabilities:
The Lynx text-mode browser can’t view Java applets.
Only IE can use VBScript.
Only recent versions of IE can display raw XML.
Netscape Navigator 2 doesn’t format text according to <STYLE> tags.
For a more complete listing of browser features, see www.bdvnet.com/resources/browser_compare/.
It’s often easier to upgrade a browser than it is to get a faster network connection, but there are still a lot of older browsers in use. A web server can log the browser version used by each visitor to your web site.
Lynx is a text browser for the World Wide Web. It comes installed on most Linux machines and was widely used at universities in the early days of the WWW. It remains the browser of choice for tens of thousands of users. For more information, see lynx.browser.org/.
OPERATING SYSTEMS
People surf the Web on many flavors of Microsoft Windows, Macintosh computers, Linux and other Unix systems, BeOS boxes, Amigas, and more. Browsers often send along the name of their operating system to the web server so it too can be logged and analyzed. Browsers are highly cross-platform, so the operating system is not usually an issue; however, if the web site relies on special content types that need plug-ins, those plug-ins may not be available for all operating systems. Also, external plug-in programs may be limited to only a few operating systems.
EXAMPLE PERFORMANCE PROFILES
Suppose your audience consists of university computer science departments. How would you categorize their Internet capabilities and needs? Computer science labs typically have fast network connections, recent versions of web browsers, and a mix of operating systems. If your audience consists of Windows gamers, you can't assume that they have more than a 28.8Kbps modem, but you could assume they are running Windows with a recent browser and can download a Windows-only plug-in if there were a good reason to do so. Sites designed for these two different audiences might very well differ in their use of media.
Content-Technology Policies
After gathering information about your audience, the next step is to draft a content-type policy. The policy will guide the entire organization as to what content types to use on the web site. The goal of the content policy is to make sure the web site can inform, entertain, and supply the target audience with the tools they want without putting up roadblocks. As you saw in the preceding section, different people will consider different content types a roadblock, so no policy will satisfy everyone. An absence of policy can lead to confusion among the web site's developers and frustration among its audience, so it is worth considering the basic types of policies and how to implement one.
Types of Policies
Should the web site use only content types that everyone is able to view? Should you use a technology if only 80 percent of your visitors have access to it? Here are four of the most common policies. These basic policies can be modified to reflect an organization's goals and culture.
CAPTIVE AUDIENCE
With a captive audience policy, the web site creators produce content in the formats they want to use, and their audience must use browsers that work with the content types they've chosen. Captive audience policies usually rely on the content creators having control of the browsers people have on their desktops. This is generally only the case on corporate intranets, and even then, only when there is strict control and standardization of computing resources. Where it is feasible, such a policy can lead to the full use of leading-edge, money-saving technologies.
LOWEST COMMON DENOMINATOR
A lowest common denominator policy is the opposite of the captive audience policy—a web site designer creates a site that is functional for almost any browser, even those with extremely low capabilities. The idea here is to create the content so it looks good on the text-only Lynx browser, and it’ll work even better in everything else. To demonstrate how a lowest common denominator policy works in practice, Figures 7.7 and 7.8 show the same web page—a community site for Canadian activists—in both Lynx and Netscape Navigator. The site in Lynx, a text-mode browser, functions perfectly well, as you can see in Figure 7.7. Figure 7.8 shows the same content in Navigator, which offers all the Lynx features and more—including fonts, colors, and a background image.
There is a techno-political movement that supports the lowest common denominator approach: www.anybrowser.org/campaign/.
THE 85% POLICY
The 85% policy states that you should "use technologies that will reach many people, but don't let the stragglers drag functionality down for other viewers." A lot of sites don't care deeply about reaching everyone. For example, college students generally put up a home page for fun. Their home pages don't generate more fun for them if they are created so that the tiny fraction of text-only browsers can view them. But a Shockwave party invitation might increase the fun considerably, so they will design the site so that the majority of visitors will be able to visit it and take advantage of its features. Businesses generally put up a web site to sell something. A glitzy web site may sell more than a plain one, even if the glitzy site is theoretically not accessible to those with slower modems.
ADAPTIVE CONTENT
Using an adaptive content policy, web site developers don't necessarily have to choose between accessibility and glamour. Instead, sites can deliver advanced features to clients that can use them and deliver standard features to those who can't. This way, the whole audience is well served. Creating such a web site, however, adds complexity and often cost. There are two ways to create a web site that provides high functionality to advanced clients and also gracefully provides reduced functionality:
Differential content Servers identify which clients can use advanced features and send pages with those features only to them. For example, you could create a web page that recognizes older browsers and redirects them to a portion of the site that doesn't use frames.
Graceful degradation Like subtle irony in The Simpsons, web pages can sometimes include advanced features in a way that harmlessly passes over the heads of less-advanced browsers. This is called graceful degradation: if the browser can't use advanced features like frames or JavaScript, those extra features are simply ignored. The benefit of graceful degradation is that everyone can use the site as they would like to use it; in other words, "the user is always right." The cost is added complexity in maintaining multiple versions of documents or in creating documents that degrade well.
Implementing a Policy
Once you have put all that effort into researching your audience and choosing the type of policy to use, it would be a waste to ignore the policy. There are two main factors in the success of a policy:
Content policies serve as a style guide for content creators and web site designers. The policy should give these people specific guidelines to follow. Here are some examples of guidelines you might include:
Limit main navigation pages to a maximum size of 80K.
Mark links to pages that are larger than 150K.
Do not use HTML that requires a browser more recent than Netscape 3.
Your guidelines may be more restrictive or less restrictive than these samples, but they should be equally specific.
KNOWN AND ADOPTED
Many people contribute to the health of a site and play a role in creating its content. There needs to be agreement on what technologies to use. This policy might be handed down by the CEO, or it might be collaboratively developed. But it should be in writing, and new employees should be trained in its use.
Planning Robust Back-End Service
Plan the web site back end so that it is robust and can meet the demands placed upon it by good fortune. If you don't, the consequences can be quite severe. There is an amusing television commercial that illustrates this. In a self-help group, a man says, "I just can't get over my feeling of being stupid," and the group facilitator says, "Nobody is stupid, Bob." Bob then reports how he blew a multimillion-dollar marketing campaign by not warning the server staff, and the site couldn't take the hits. The closing comment is "That is stupid, Bob!" Don't be stupid like Bob. There are two ways in which the failure of server operations brings down a site:
Too many hits overload the server.
A critical component dies or malfunctions.
The following sections cover strategies for minimizing these two possibilities.
Abundance Equals Performance
Web servers are often no heavier or bulkier than a simple word processor, yet they can almost magically serve up millions of documents to people all over the world. Just as mysteriously, they can bog down and serve documents slowly. The actual performance of a web server depends on network speed, RAM, processor and hard-disk speed, software, and operating system. That said, there is a general theory that you can use to plan in advance how much of these resources your server(s) are going to need. In the following sections, we'll try to demystify web server performance by providing an overview of these topics:
Basic theory on web server performance
A strategy for high performance
Finding performance blocks
The bottom line is that an overloaded server will seem slower, so it is important to always operate servers with spare capacity. Given that surges of interest can generate demand spikes, it is desirable to have plenty of spare capacity.
Even with extremely fast computers that have enough RAM, an individual request can only be fulfilled as quickly as the client can receive the response (see the section on proxy servers later in this chapter for caveats to this).
Performance Theory
Web servers are built to handle many simultaneous requests, much like a busy restaurant is designed to handle the constant flow of dining traffic. People wait in the lobby until there is a free table. Then a waiter leads them to a table and services their requests until they are done and leave. Web servers generally have 10 to 200 semi-independent processes or threads that can each fulfill one request at a time. Each process or thread is called a child of the web server. Incoming requests cool their heels in a pool of unassigned requests (the lobby) called a queue. When a web server process (waiter) is unoccupied, it'll be assigned to handle a request in the queue.
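As a toy illustration of this lobby-and-waiters model, here is a sketch using Python's standard queue and threading modules. The numbers (three children, eight requests) are arbitrary, and a real server child would of course do network I/O rather than append to a list.

```python
# A minimal model of a web server's request queue and child workers.
import queue
import threading

requests = queue.Queue()   # the "lobby" of waiting requests
served = []
served_lock = threading.Lock()

def child(q):
    """One web server child: takes one queued request at a time and serves it."""
    while True:
        req = q.get()
        if req is None:        # sentinel value: shut this child down
            return
        with served_lock:
            served.append(req)

for i in range(8):             # eight requests arrive and wait in the queue
    requests.put(i)

children = [threading.Thread(target=child, args=(requests,)) for _ in range(3)]
for t in children:
    t.start()
for _ in children:
    requests.put(None)         # one shutdown sentinel per child
for t in children:
    t.join()

print(len(served))             # all eight requests were served
```

Because the queue is first-in, first-out, all eight requests are handed out before any shutdown sentinel, regardless of which child picks up which request.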
The following sections use the words threads and processes because the multithreaded, multiprocess model is the most widely used in today's software. Apache, for example, is the most widely used web server, and it uses threads or processes, depending on the underlying operating system. But not all web servers rely on the multithreaded or multiprocess model. See the World Wide Web Consortium's list of servers at www.w3.org/Servers.html for more information on the different types of servers.
Mathematicians have described the properties of these pools of waiting requests in queueing theory. (Queue is the British word for line.) One thing that queueing theory predicts is that the length of the queue depends on the relative sizes of the outgoing and incoming flows. Queueing theory uses the term utilization rate for quantities like the number of new requests per second divided by the number of requests that can be fulfilled each second:
Utilization Rate = Rate of New Requests ÷ Maximum Rate Fulfilled
For example, a web server that can finish 10 requests each second and gets eight requests per second will have a utilization rate of 0.8. This web server's queue will stay near zero because there will usually be no requests waiting in it. Even if there are occasionally more than 10 requests in a second, the server will quickly recover and bring the queue down to zero again. If the web server gets 11 requests a second, however, then in the long run the queue will grow by at least 1 per second. In 10 minutes, the queue will have more than 600 waiting requests, and the queue time will be a minute. So if our example web server goes from eight requests to 11 requests a second, the performance degradation is massive, not incremental.
Keep Utilization Rate Low
When the utilization rate approaches 1.0, the queue will grow and the performance of the server will start to degrade. The key is to ensure that the utilization rate (even at peak times) is well below 1. There are two ways to lower the utilization rate:
Reduce the number of incoming requests.
Increase the maximum rate of fulfilled requests.
Reducing the number of incoming requests is simple: immediately reject or discard requests after a threshold has been reached. This is often unacceptable and is used only as a safety measure to make sure overloaded web sites don't lock up. Therefore, it is usually necessary to increase the rate of fulfilled requests instead. This rate is the number of children actively fulfilling requests divided by the average length of time it takes each request to be fulfilled:
Rate of Fulfilled Requests = Active Children ÷ Time per Request
Note that the equation is only accurate when both the number of active children and the time per request are fairly constant. So if a web server generally has 20 busy children and the average request is fulfilled in four seconds, the web server fulfills about five requests per second. It may be possible to decrease the utilization rate by increasing the number of effective children.
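Both formulas are simple enough to check in a few lines of Python, using the chapter's own example figures (a server that can fulfill 10 requests per second, and a server with 20 busy children averaging four seconds per request):

```python
# The two capacity formulas from this section.

def utilization_rate(new_per_sec, max_fulfilled_per_sec):
    """Rate of new requests divided by the maximum rate fulfilled."""
    return new_per_sec / max_fulfilled_per_sec

def rate_of_fulfilled_requests(active_children, time_per_request):
    """Active children divided by the average time per request."""
    return active_children / time_per_request

print(utilization_rate(8, 10))             # 0.8: the queue stays near zero
print(utilization_rate(11, 10))            # above 1.0: the queue grows without bound
print(rate_of_fulfilled_requests(20, 4))   # about 5 requests per second
```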
Removing Blocks to High Performance
The preceding section indicated that increasing the number of effective children can increase the maximum capacity of the web server and thereby increase the performance of the system. Increasing the number of effective child processes requires a balance of resources. The word effective is very important; simply increasing the number of children may actually reduce the number of effective children. If there are more waiters than tables, they'll just stumble into each other and fewer people will get served. When planning your server environment, you need a balance of elements such as network bandwidth, RAM, disk I/O, and database connections. Any one of these can easily become a bottleneck. The most common limiting factor for web servers is a lack of network bandwidth to serve people at peak times. See the sidebar "Choosing the Right Amount of Bandwidth for a Server" later in this chapter for more on this. If there is plenty of bandwidth, it will take some experimentation to ascertain what is limiting the number of effective children. As an example of calculating the right balance, let's consider Apache, one of the most commonly used web server packages. After bandwidth, a lack of RAM is the most frequent limiting factor for Apache. Each child process uses some amount of memory (5MB of RAM, for example), so a server with 500MB of RAM available for web serving can only support 100 such children. Even doubling the CPU cycles will not significantly increase the speed of a system that is RAM bound, and vice versa.
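The RAM arithmetic above can be sketched as a one-line calculation. The 500MB and 5MB figures are the text's example numbers, not a recommendation for any particular server.

```python
# Back-of-the-envelope: how many children fit in RAM before the server swaps?

def max_children(ram_available_mb, ram_per_child_mb):
    """Upper bound on child processes a RAM-bound server can support."""
    return ram_available_mb // ram_per_child_mb

print(max_children(500, 5))   # the text's example: 100 children
```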
The use of a swap file or virtual memory is usually unacceptable for web servers. The time it takes to swap memory to and from the hard disk increases the average length of time it takes each request to be fulfilled.
There are other possible constraints on the number of effective children. If the children execute computationally intensive scripts or programs, the CPU may be the bottleneck. If each process consumes some other limited resource, such as database connections or disk I/O, it can reduce the usefulness of more children.
Redundancy Equals Reliability
Your back-end service is only as good as its weakest link. If your organization's name servers don't work, no one will be able to get to your site, and it won't matter that your site has plenty of bandwidth.
Choosing the Right Amount of Bandwidth for a Server
The formula for determining current bandwidth needs is the maximum request rate multiplied by the average download speed of each request. If a web server gets a maximum of five requests a second, with an average download speed of 3K/s per request, then it needs only 15K/s of bandwidth, such as a fast ISDN line. When a site is nearing its capacity, it is likely beginning to slow down, which causes users to give up, thereby reducing the bandwidth required. Adding bandwidth to a busy site reduces response time, so fewer people quit.
This means that, after increasing bandwidth, a site will often see its use jump and will need to increase bandwidth again. The exact utilization rate at which performance degrades depends on the network hardware and configuration. Request rates often increase in a linear fashion for a while and then spike sharply when the site is listed in a popular magazine or search engine or when a community site reaches a critical mass of users. The key is to be able to easily increase a site's capacity. How easy is it to get your ISP to add an extra T1 of bandwidth, or to change ISPs? If it takes a week of lead time, a week of bad performance may be unacceptable. In this case, the network planner should buy bandwidth in advance of possible marketing successes.
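The sidebar's bandwidth formula can be sketched directly in Python, using its own example figures of five requests per second at 3K/s each:

```python
# Peak bandwidth needed = peak request rate x per-request transfer speed.

def bandwidth_needed_kps(max_requests_per_sec, avg_k_per_sec_per_request):
    """Total sustained bandwidth the server must supply, in K/s."""
    return max_requests_per_sec * avg_k_per_sec_per_request

print(bandwidth_needed_kps(5, 3))   # the sidebar's example: 15 K/s
```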
Don’t forget to ensure that network services like DNS and e-mail are also redundant. Crackers can disable or clog poorly configured servers with denial of service (DoS) attacks. When DNS stops working, no one can find your site. When e-mail goes down, most organizations shut down even if they won’t admit it. In addition to the security measures covered in Chapter 5, consider getting redundant mail and DNS servers so you’ll be covered in case of crackers, earthquakes, or other emergencies.
It is necessary to plan what would happen if any resource suffered a breakdown and to make sure no crucial point can fail without a backup. There are three ways most organizations ensure that they can recover from a malfunctioning component:
Redundant components in the server itself, such as two NICs or two hard drives
Spare parts or entire spare servers kept on hand
Knowing someone will fix the broken component immediately
Whichever strategy or combination of strategies you use, be sure to weigh the cost of the strategy against the potential cost of downtime.
Owning Spares
Servers can fail, T1s can fail, routers can fail, ISPs can fail, and Internet backbones can fail. Redundancy is the safest strategy for coping with the possibility of failure. People who adhere to the mantra of redundancy keep spare hardware for their servers, have spare servers, have multiple network connections, and even keep two generators in their basement. Buying two of everything can be expensive! But for sites where downtime costs tens of thousands of dollars an hour, redundancy is a lot less expensive than downtime. Many sites compromise by standardizing on one type of processor, for example, and then keeping two spares for every 10 active computers. If they use a spare, they replace it immediately, so they always have a spare on hand. Some companies keep entire spare web servers: several copies of the same web site exist on different servers that are "hot" (active). These "spare" servers are queried, round-robin style, for web requests. If one web server goes down, the other server(s) take over all requests without interruption. The hardest-hit web sites (such as Amazon.com, Yahoo!, and so on) use web servers in this fashion. The problem is that this can be expensive, and a less expensive route is available: you can now purchase servers that have "hot swap" capabilities built in, with dual power supplies, hard drives, network cards, and so on. This allows the unit to keep functioning and gives you time to plan the replacement of the failed component.
Service Agreements
It can be burdensome to be ready to fix anything that could possibly go wrong. Organizations with money can buy their way out of this headache by arranging with other organizations to immediately fix any problems that might crop up.
People can purchase an on-site service agreement for their servers and networking equipment; however, make sure that you read any service agreement thoroughly and ensure that everything is in writing. Some companies may promise you a 24-hour turnaround time, but if they’re swamped the turnaround could be as long as a week. Because of this, many companies are now including a damages clause, in which the service provider must pay for each hour of downtime beyond the written turnaround time. This small paragraph will help to keep your company on their priority list.
Internet Connection Points and Your ISP
Even if your server is powered and has connectivity to your ISP, it could be isolated from the rest of the world if your ISP loses network connectivity. The Internet is an interconnected set of networks. As discussed in Chapter 2, packets of information often have to cross several networks, or backbones, whenever a web page is downloaded. These backbones are connected to each other at Network Access Points (NAPs). The largest backbones interconnect with the most other networks. Smaller regional networks often connect with only a few other networks, so packets from them rely on hopping from network to network. If your ISP connects at only a single NAP, then all its customers are vulnerable to a fault in that NAP. So although it is good to pick an ISP that has a history of avoiding or quickly resolving problems, it is also wise to see how many NAPs they are connected to and how redundant their network connections are.
Caching
The concept of caching is used increasingly throughout the Internet: by browsers, workgroups, ISPs, servers, and even operating systems. Caching is the prudent storing of information that may be needed again shortly. Systems use caching to avoid asking for the same data over and over from the same web site. Caching is used to speed up applications and to reduce expensive queries.
A dictionary definition of a cache is “a secure place of storage.” In the Internet realm, a cache is the local and/or fast place to temporarily store information, especially information that is likely to be needed again in the near term.
The effects and caveats of caching differ from one technology to another. The following sections start off with caching basics and then explore how caching is used in key areas of Internet technology.
Caching interrupts the normal request-response loop described in Chapter 2. Either by accident or on purpose, this can sometimes lead to using less than the most up-to-date version of a document. After reading the following sections, you should be able to recognize both where caching is causing unexpected behavior and where it can lead to a performance gain.
Caching Basics
Caching is similar to how you look up people's phone numbers. If you want to call a friend, you'll look first in your personal phone list. It is faster to look it up there than to call 411. If you don't have your friend's information, you'll call 411 or dig out your big phone book and look it up. After calling 411 once, you'll write down the phone number in your personal phone list so it'll be easier and faster to find the next time. Of course, this works best with information that isn't changing all the time. It won't help to have your own "private copy" of current events; by its very nature, this information is usually useless when old. Likewise, if you have someone's work telephone number from 45 years ago, it isn't likely to do you much good. In computer networking, caching takes into account many of the same considerations we use when looking up someone's phone number. Caching is generally used to save time, just as a personal phone list saves us from hauling out dozens of huge metropolitan phone books. Also, there are rules about when to use the cache and when to go to the original source, just as we have internal guidelines about when we won't even try our outdated phone list. The biggest danger in caching is using outdated information. Many times a cache will send a short query to the original source that asks, in effect, "Have you changed?" If the document has changed, the whole document is retrieved. If the document has not changed, the cached copy is used. This is similar to trying someone's phone number and looking it up again if the number turns out to be wrong.
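The "Have you changed?" exchange can be sketched as follows. This mirrors the idea behind HTTP's conditional requests (such as the If-Modified-Since header), but the "origin server" here is just a dictionary, and the path, body, and timestamps are invented for the example.

```python
# Toy model of cache validation: ask the origin whether the document changed.

origin = {"/weather.gif": {"body": b"<map bytes>", "last_modified": 100}}
cache = {}

def fetch(path):
    """Return (body, source): use the cached copy only if it is still current."""
    cached = cache.get(path)
    doc = origin[path]
    if cached and cached["last_modified"] >= doc["last_modified"]:
        return cached["body"], "from cache (not modified)"
    cache[path] = dict(doc)        # store or refresh the cached copy
    return doc["body"], "from origin"

body, how = fetch("/weather.gif")      # first request: cache miss
body2, how2 = fetch("/weather.gif")    # second request: validated cache hit
print(how, "/", how2)
```

Either way the caller gets the same bytes; the difference is whether the full document had to cross the network.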
Why Cache?
People use caching to save time and money. This section outlines how one organization could reap big dividends from caching and demonstrates a simple mathematical formula for estimating the possible gains.
Storing local copies of reusable information can be complicated, but it often pays big dividends. Consider an adventurous group of scientists stationed in Antarctica. These scientists share five computers and a single 64K ISDN line to the Internet. When it is their turn on a computer, each researcher checks a web page with an Antarctica weather map, a 200K graphic taken straight from a Russian satellite. They use this information to decide whether it is safe to visit different parts of the frozen tundra that day. Without caching, viewing this weather map consumes a lot of bandwidth, for which the scientists pay by the minute. If four scientists use each computer, the minimum time it will take can be figured as follows:
4 scientists per computer × 5 computers × 200K image = 4000K
4000K ÷ 5K/s (64Kbps ISDN line) = 800 seconds ≈ 13 minutes
The image only changes once a day, so if each computer cached the image for a day, only the first scientist at each computer would download it. As you can see in Figure 7.9, a cache can shorten a request: if the requested file is in the cache, it can be used directly instead of fetching the original file. In this case, if the document exists in the cache, the browser can immediately read the file from its RAM or hard disk cache rather than wait for a lengthy connection to the Internet. When the other three scientists check the page, they see the stored copy. This alone would shorten the total download time from 13 minutes to about four. It could get even better: these computers could effectively share a single cache. If they did, the whole group would download only a single copy of the image, in less than a minute, and all the scientists could use that file. Caching could reduce these scientists' connectivity charges by 92 percent, enough to buy them all earmuffs!
FIGURE 7.9
Caching shortens the loop. [The figure shows a request originator, a cache, and the original source: if the requested file is in the cache, it is delivered immediately; if not, the request continues on to the original source and the file is delivered from there.]
When a cache is queried, either it will find the document in the cache (called a cache “hit”) or it will not (a cache miss). A false hit is when the cache returns a document that should not be used, such as an out-of-date document.
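The scientists' arithmetic can be double-checked with a short script. The constants are the ones used in the example above (a 200K image and roughly 5K/s of throughput on the 64Kbps ISDN line).

```python
# The Antarctica example: 4 scientists per computer, 5 computers, one 200K map.

IMAGE_K = 200
K_PER_SEC = 5

def download_seconds(number_of_downloads):
    """Total time on the ISDN line for this many downloads of the map."""
    return number_of_downloads * IMAGE_K / K_PER_SEC

no_cache = download_seconds(4 * 5)   # every scientist downloads the map
per_computer = download_seconds(5)   # one download per computer's cache
shared = download_seconds(1)         # one download for a single shared cache

print(no_cache / 60)        # about 13 minutes
print(per_computer / 60)    # a few minutes
print(shared / 60)          # under a minute
```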
Web Caching
You saw in the sidebar "Download and View Time" that downloading files is often frustrating to users with slow modems. Therefore, it should come as no surprise that most browsers are configured to do some caching, called web caching, so they don't waste time unnecessarily redownloading large files. If you go to a new page, your browser downloads the images. It also stores these images in both its memory and disk caches. The cache is a "first in, first out" system; that is, the documents that have been in the cache the longest will be the first to be removed to make room for new documents. This way, the most recent items are generally cached, but the cache doesn't get bigger than the limits set in the browser. If a browser is spending a lot of time downloading images for a frequently visited web site, check to make sure you have a big cache and that caching is turned on.
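The first-in, first-out eviction described above can be sketched with a small Python class. Real browser caches are more sophisticated, so this is only a model of the eviction order, with invented URLs and a tiny two-item limit.

```python
# A minimal FIFO cache: when full, evict the document held the longest.
from collections import deque

class FIFOCache:
    def __init__(self, max_items):
        self.max_items = max_items
        self.order = deque()   # URLs in insertion order, oldest first
        self.store = {}

    def get(self, url):
        return self.store.get(url)

    def put(self, url, document):
        if url not in self.store:
            if len(self.order) >= self.max_items:
                oldest = self.order.popleft()   # evict the oldest entry
                del self.store[oldest]
            self.order.append(url)
        self.store[url] = document

cache = FIFOCache(max_items=2)
cache.put("/a.gif", b"a")
cache.put("/b.gif", b"b")
cache.put("/c.gif", b"c")      # /a.gif, the oldest entry, is evicted
print(cache.get("/a.gif"))     # no longer cached
print(cache.get("/c.gif"))     # newest entry is still cached
```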
Objects in the memory cache are in the computer’s RAM; objects in the disk cache are stored on the hard disk. A computer accesses its RAM much more quickly than its hard disk, although a hard disk is generally still faster than a T1 line. See lowendmac.com/tech/howfast.shtml for an interesting comparison between relative speeds of SCSI, Ethernet, and RAM.
For each request, the browser decides whether to use the cache. Because the cache might hold outdated information, these rules favor not using the cache. They are applied in roughly this order:
1. If there is no cached document, get the original.
2. If the browser is set to never use a cached document, get the original.
3. If the server has flagged content as "not cacheable," get the original.
4. If the user hits the Reload button, get the original.
5. If the document was selected by using the Back button or browser history, use the cache.
6. Otherwise, compare the cached document against the browser's update settings (described below) to decide whether the cached copy is still fresh enough to use.
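The rule order above can be expressed as a small function. Each parameter is a simplified stand-in for browser or server state, not an actual browser API.

```python
# The browser's cache-or-original decision, in the order the rules apply.

def use_cache(cached, never_cache_setting, marked_not_cacheable,
              reload_pressed, via_back_or_history, cache_is_fresh):
    if cached is None:           # rule 1: nothing cached
        return False
    if never_cache_setting:      # rule 2: user disabled caching
        return False
    if marked_not_cacheable:     # rule 3: server forbids caching
        return False
    if reload_pressed:           # rule 4: explicit Reload
        return False
    if via_back_or_history:      # rule 5: Back button or history
        return True
    return cache_is_fresh        # rule 6: freshness check

print(use_cache("doc", False, False, False, True, False))   # Back button: cache
print(use_cache("doc", False, False, True, False, True))    # Reload: original
```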
Shift+Reload also reloads all the images and other multimedia files on a page.
The user can force the issue: the browser's update settings specify how often it should reload even valid cached documents. In Netscape 6, the Document in Cache Is Compared against Document on Network setting can be found under Edit > Preferences > Advanced > Cache. Figure 7.10 shows how a user changes their caching preferences, in particular how big the cache is and how often the settings force the browser to reload cached images.
FIGURE 7.10
Editing caching preferences
The browser can be configured as to which cached documents it will try to use. The Every Time setting eliminates caching. The Once Per Session setting is the standard setting: the browser compares a cached document against the network version only once per browser session. The Never setting means that the browser will always attempt to use the cache, even for pages that were visited a long time ago.
When to Clear the Cache
There are generally three cases in which a user would want to empty the cache. The first is if they want to see the most up-to-date versions of many pages that are in their cache. They could click Reload on each page, but it may be faster to just clear the cache. The second reason to dump the cache is to conceal where the user has been. Even if the user exits Navigator, the disk cache is still there, containing information on which pages the user has visited. The last reason is if a web page used to load fine in the web browser, and other clients are able to access it, but it either doesn't display or generates a weird error message. Sometimes a retrieved file or a client-side script, for whatever reason, will get garbled during transmission to a client computer. The only way to resolve this issue is to clear the local cache. You can clear either the disk or the memory cache from Edit > Preferences > Advanced > Cache. You can perform the same operation in Internet Explorer by going to Tools > Internet Options and, on the General tab, using the Delete Files option under Temporary Internet Files.
Proxy Caching
Proxy servers are "middlemen" between web browsers and web servers. They are used to monitor and regulate a browser's use of the Internet and, less ominously, to act as a shared cache for the users of the proxy server. There are several different types of proxy servers:
LAN-based proxy servers
ISP-based proxy servers
Regional proxy servers
Millions of users use proxy servers. Some do so to provide a shared cache for their workgroup; others use one unknowingly as an invisible part of their ISP's service. In any case, proxy services should be configured to be as transparent to the user as possible.
Inktomi, www.inktomi.com/, is the market leader in advanced caching. On the lower end, Squid, squid.nlanr.net/, is a free and easy-to-use caching proxy server, available on Linux and funded by the National Science Foundation.
LAN-Based Proxy Servers
Remember those Antarctic scientists who could reduce their download time to less than a minute by effectively sharing a cache? They could do so by using a web proxy server on their LAN. A LAN-based proxy server is a computer in a local area network that serves as a middleman between the other computers and the Internet. There is typically a very fast connection between computers on the LAN and a slower connection from the LAN to the rest of the Internet. Therefore, it's quicker to retrieve files cached on the LAN than it is to retrieve files on the Internet. As you can see in Figure 7.11, the scientists would enjoy a 10Mbps connection to their proxy server and only a shared 64Kbps connection to the rest of the Internet. Thus, the LAN proxy server would speed their downloads.
FIGURE 7.11
LAN proxy. [The figure shows clients on a 10Mbps Ethernet LAN connecting to a LAN proxy, which in turn shares a single 64Kbps ISDN connection to the Internet.]
ISP Proxy Servers
Most people who use proxies do so through their ISP and never even know they are using a proxy. An ISP's dial-up customers all use the ISP's proxy server. As you can see in Figure 7.12, an ISP proxy server is similar to a LAN proxy server except that the dial-up customers are not connected to it by a fast LAN connection. AOL, for example, has all of its dial-up customers automatically use the AOL web proxy servers. This means that all AOL browsers connect to the AOL web proxy servers, and the web proxy servers connect to the rest of the Internet. The AOL servers cache a huge portion of the Internet!
Whenever a proxy isn’t located directly on the LAN, it has a mixed effect on speed. If the proxy servers are overloaded, this can increase the response time of the ISP’s users. AOL proxies generally speed up the access of their dial-up customers, especially where there is poor connectivity between AOL and the target server.
If people rely on proxies for browsing, it is important that the proxies are reliable.
Regional Proxy Servers Just as a collection of users can see performance gains by using a proxy server, a collection of proxy servers can use other servers as proxies. This way, groups of proxy servers can be nested in hierarchies. These groupings are especially useful where bandwidth is expensive or limited. For example, as we saw with the Antarctica situation, the proxy server will be most helpful for speed if it is placed right on the LAN with the other workstations. But if all of Antarctica shared a single ISDN line to other continents, it might make sense to use a second-tier proxy server. Figure 7.13 shows a two-tier proxy system. Clients have very fast connections with the LAN proxy closest to them, but they also benefit from a regional proxy. If the first-tier (LAN) proxy server had a miss, it could analyze the request. If the request was destined for an Antarctica site, it would query that site directly, but if it was for an international server, it would forward the request to a continental proxy server.
Problems with Proxy Servers There are plenty of times when different people should not get the same file, even if they request the same URL. my.yahoo.com, for example, serves personalized pages: if Sally views a page at my.yahoo.com, and Dave, who uses the same proxy as Sally, also goes to my.yahoo.com, he should see his own page, not hers. If the server at the other side isn't "aware" of proxies, it might mistake Dave for Sally.
You can configure a proxy exclusion list for some browsers. If a proxy server is causing trouble with a particular site only, the user can add that site to the browser’s proxy exclusion list. This means the browser will attempt to use the proxy for most connections but will connect directly to the sites on the proxy exclusion list.
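The browser-side decision might be sketched like this (the hostnames in the exclusion list are hypothetical):

```python
from urllib.parse import urlparse

# Use the proxy for everything except hosts on the exclusion list.
EXCLUSIONS = {"broken-with-proxy.example.com", "intranet.local"}

def use_proxy(url, exclusions=EXCLUSIONS):
    """Return True if this request should go through the proxy."""
    host = urlparse(url).hostname
    return host not in exclusions

print(use_proxy("http://www.yahoo.com/index.html"))        # True
print(use_proxy("http://broken-with-proxy.example.com/"))  # False
```

Real browsers apply more elaborate matching (wildcards, IP ranges), but the principle is the same: the exclusion list short-circuits the proxy for troublesome sites.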
A person's identity can be established in many ways. One way is to consider a stream of requests from a particular IP address within a certain time frame to be the same person. All requests to the Internet from a proxy server appear to come from the proxy server's IP address, so if several people use the same proxy server to visit the same web site, the web server might mistake those people for one person. The obvious way around this is not to rely on IP addresses to distinguish between people. Proxies generally don't confuse cookies or hidden fields containing special tokens.
Caching at the Server: Reverse Proxies Web servers use another type of caching to increase the number of clients they can serve. Reverse proxy servers are web servers that act as middlemen between an organization’s main web server and the general public.
Reverse proxy servers are also known as web server accelerators and sometimes as caching web servers.
When reverse proxies receive requests from the public, they first check their cache—if they have served the request before, they'll respond directly. If they don't have a "hit" in their cache, they will query the main web server, store the result in the cache, and then respond to the client. As you can see in Figure 7.14, there is a very fast network connection between the main server and the reverse proxy. This allows the main server to quickly respond to the proxy's requests. FIGURE 7.14
A reverse proxy. Clients reach the reverse proxy over relatively slow Internet links (28.8Kbps to 384Kbps); the reverse proxy reaches the main server over 100Mbps Ethernet.
Although reverse proxies add a step to many requests, in specialized situations they can increase a server’s maximum capacity and speed. In the
following sections, we’ll discuss when to use reverse proxies and how to calculate the benefits. After reading these sections, you will be able to recognize the potential benefit of adding a reverse proxy server between the teeming hordes and your main web server.
When to Use Reverse Proxies The main reason to use reverse proxies is to make economical use of server resources. It is seemingly strange that duplicating work is economical—two servers are now potentially receiving and responding to the same request, one after the other. It can be quite effective, however, because reverse proxy servers are typically much better at their specialized task: transmitting server responses while consuming as few resources as possible. Reverse proxy servers are most effective when used in conjunction with an application server. Most web server processes are "heavy"—each one uses a lot of RAM by carrying around instructions on how to do all sorts of things. The Apache web server can be built with mod_perl, for example, and then it has the full Perl interpreter built into each process. Like a tank, it uses a lot of fuel, but it's equipped to swiftly deal with any eventuality. Reverse proxies allow each type of process to specialize in what it does best—the heavy processes can run the complicated programs and the light "proxy" processes can transmit responses to browsers.
People use Apache more than any other web server because they would rather install a little bit more RAM than use a less functional web server. Reverse proxies can give you both: efficient use of RAM and a highly configurable web server.
Calculating Performance Increase Let’s apply our model of server performance to reverse proxies and see how effective they can be. Our test server will have 600MB of RAM. The real amount of RAM each Apache process consumes varies between 1MB and 20MB, depending on configuration options and whether mod_perl is installed. For the sake of argument, let’s consider a worst-case scenario and say each Apache process uses 20MB of RAM. With this limit in mind, the server can only have 30 children (600 ÷ 20 = 30). If each browser takes 10 seconds to download a request, the maximum response rate is three responses a second. During the 10 seconds of each response, the full power of the Perl-enabled process is
typically only used for less than a second—it effectively wastes its extra RAM by babysitting the data as it is transferred to the client. This is where the reverse proxy steps in. A reverse proxy server has lightweight processes, taking less than 1MB per process. If a server is configured to have 200 reverse proxy children and 20 Apache children, then the maximum response rate goes way up. The 20 Apache children can each handle a request a second. They then hand off the full response to the reverse proxy server, which slowly ekes it out over the network. The 200 reverse proxy children can likewise sustain a response rate of 20/s. So, for no additional server hardware, the server can suddenly handle six times as many clients.
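The arithmetic above can be checked with a quick sketch. All figures come from the example in the text; the 20/s rate for the mixed configuration rests on the stated assumption that each Apache child frees up in about a second once the proxy takes over the transfer.

```python
# Back-of-the-envelope capacity figures for the example server:
# 600MB of RAM, 20MB Apache children, 1MB proxy children, 10s downloads.

ram_mb = 600
seconds_per_download = 10       # how long each browser ties up a process

# All-Apache configuration: heavy children only.
apache_children = ram_mb // 20                             # 30 children
apache_only_rate = apache_children / seconds_per_download  # 3 responses/s

# Mixed configuration on the same hardware: 20 heavy children generate
# content; 200 lightweight proxy children babysit the slow transfers.
mixed_ram = 20 * 20 + 200 * 1            # still 600MB
mixed_rate = 20                          # each Apache child frees in ~1s
proxy_rate = 200 / seconds_per_download  # proxies can sustain 20/s too

print(apache_only_rate, mixed_rate, mixed_rate / apache_only_rate)
```

The ratio printed at the end is roughly the "six times as many clients" claimed above.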
Although the preceding example uses a single computer to host both the reverse proxy and the main server, in a high-bandwidth situation, the system architect would likely choose at least one machine to be the main server and the other to be the reverse proxy. It is also important to remember that bandwidth is more frequently a roadblock to a site’s speed than lack of RAM is, and so reverse proxies are rarely needed.
Searching and Indexing
The ability to find what you're looking for should not be underestimated: great content is useless if it can't be found. If users can't find the information they want on your site, they will leave and take their business with them. Searching in the general sense is looking for information that is not readily at hand. There are a variety of search technologies that you can use to make your site easier to find and more functional:
Basic searches
Pre-indexed searches
Advanced searches
Linguistic searches
The following sections provide an overview of the different types of search features you can put on your web site, as well as an overview of how this relates to your site being found by the large search engines.
Basic Searches Once a user is at a site, they are probably looking for something in particular. For example, different people would use a book on nuclear fission in different ways:
The novice would scan the chapter titles for an introduction.
The expert would check the index for very specific terms.
Web sites are similar—people use a combination of site maps (like using a table of contents) and search pages (like looking up a term in the index) to find information on large sites.
Site Maps A site map is like a table of contents—it gives an overview of what is available. Site maps and hierarchical indexes are useful when the amount of information available is small or when the user does not know the exact word she is looking for. Site maps help people learn about the scope of information that is available. For example, Eric S. Raymond, one of the spokespeople for the open source movement, has scripts generate his site map from meta tags in his HTML pages (see www.tuxedo.org/~esr/sitemap.html). The site map has a one-sentence description of each of his pages. A user can quickly browse this site map and see which page is useful. The site www.dmoz.org/ has an extremely high-quality, hierarchical, browseable index. Site maps have limited yet useful functionality. They require visitors to click down through a hierarchy of information. If you know an identifying set of words, searching for pages that match those words can be a faster way to find the information you need.
Searching On the Internet, a simple search query asks, "What documents do you have that contain this word?" The server will then return a clickable list of documents that contain the searched-for word. Behind the scenes, the local search engine knows two things: the search terms and a set of documents to search. It will compare the contents of these documents against the search terms. Very primitive search engines will compare this material by examining all the words in all the documents one-by-one for each search. This is a slow way to search. If a site has 1,000 documents (which is not unusual) and they each have 2,000 words, 2 million comparisons are performed for each
search. If there is one new search every minute, the server will swiftly use up all its CPU cycles and RAM.
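A minimal sketch of this naive, word-by-word scan (the two sample letters are invented) shows why the comparison count balloons with the size of the collection:

```python
# Naive full-text search: compare the search term against every word of
# every document, on every query.

documents = {
    "JANE.LTR": "jane took a summer vacation",
    "BOB.LTR": "bob took a winter vacation",
}

def naive_search(term, docs):
    """Scan every word of every document, counting comparisons made."""
    hits, comparisons = [], 0
    for name, text in docs.items():
        for word in text.split():
            comparisons += 1
            if word == term:
                hits.append(name)
                break               # stop scanning this document
    return hits, comparisons

print(naive_search("vacation", documents))
```

Even this tiny two-letter collection costs ten comparisons for one query; scale the same loop to 1,000 documents of 2,000 words and you get the 2 million comparisons described above.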
Search Syntax Search syntax is the language that people use to describe what search to perform. For example, if a user searches for “apples pears”, then the search engine will generally return a list of all the documents that have either the word apples or the word pears and put the documents that have both words at the top of the result list. Different search engines use different search syntaxes, but the rules in Table 7.2 are the most common. TABLE 7.2
Search Syntax Rules

Rule                    Explanation
+word                   word must exist
-word                   word must not exist
word otherword          word or otherword should exist
word -otherword         word should exist, otherword must not
word AND otherword      same as +word +otherword
You can use parentheses to group conditions. Consider this search expression: ice and ((man or woman) or (green and blue)). This will return pages with the words ice man, ice woman, and ice green blue, but it will not return pages with only ice green.
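The Table 7.2 rules (without parenthesized grouping) might be sketched like this; the scoring scheme and the sample pages are illustrative assumptions, not any particular engine's algorithm:

```python
# Scoring per Table 7.2: +word is required, -word is forbidden, and bare
# words are optional but raise a matching document's rank.

def matches(query, doc_words):
    """Return a relevance score, or None if the document is excluded."""
    score = 0
    for term in query.split():
        if term.startswith("+"):
            if term[1:] not in doc_words:
                return None        # required word missing
            score += 1
        elif term.startswith("-"):
            if term[1:] in doc_words:
                return None        # forbidden word present
        elif term in doc_words:
            score += 1             # optional word found
    return score

def search(query, docs):
    """Rank matching documents, best score first."""
    scored = []
    for name, words in docs.items():
        score = matches(query, words)
        if score is not None and score > 0:
            scored.append((score, name))
    return [name for score, name in sorted(scored, reverse=True)]

docs = {
    "a.html": {"apples", "pears"},
    "b.html": {"apples"},
    "c.html": {"plums"},
}
print(search("apples pears", docs))   # the both-word page ranks first
```

The query "apples pears" returns both matching pages with the two-word page on top, just as the text describes, while "apples -pears" would drop the page that mentions pears.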
Internet spiders don’t just search documents on the local machine. They perform three phases of their search—they request files, index them, and then use the information in those files to find more files to request.
Reverse Index Searches Searches can use the concept of caching for massive speed improvements. Most local web site search engines index all their files once a night (usually around 2:00 A.M.) and save a list of all the words someone might want to search for and what documents are good finds for each word. Then for the rest of the day, the search engine can just look up each incoming query and compare it to this list of pregenerated results. The indexing portion of search software stores the words and the documents that refer to those words in a file called a reverse index. On each line of the reverse index is a single word, and next to the word is a list of all the files that contain this word. You can often further optimize an index so that a number replaces the name of the file and is then used to look up the filename. So in this scenario, there are two files: WORDS.DB Bob: 2 Jane: 1 Summer: 1 Vacation: 1, 2 Winter: 2
FILENAMES.DB 1: JANE.LTR 2: BOB.LTR If the search query is for "Bob," the search software doesn't need to search through both JANE.LTR and BOB.LTR. It can quickly scan through WORDS.DB for the word Bob and see that this word exists in file number 2. If the query is "Bob or Vacation," then the search would return both files as matching. Using a reverse index is much faster! Our example of 1,000 documents, each with 2,000 words, would probably have fewer than 16,000 unique words because people tend to reuse the same words over and over. Using a binary search, it takes fewer than 17 comparisons to find a word in a file that contains a sorted list of 16,000 unique words. Therefore, the combined search of WORDS.DB and FILENAMES.DB would require fewer than 30 comparisons for each word. So the indexed full-text search is more than 60,000 times faster than the 2 million comparisons required by the naive full-text search.
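The WORDS.DB / FILENAMES.DB scheme can be sketched in a few lines; the in-memory dictionaries below stand in for the two database files, and the sample letters are invented:

```python
from collections import defaultdict

documents = {
    "JANE.LTR": "jane summer vacation",
    "BOB.LTR": "bob winter vacation",
}

# FILENAMES.DB: number -> filename
filenames = {i: name for i, name in enumerate(documents, start=1)}

# WORDS.DB: word -> set of file numbers containing that word
reverse_index = defaultdict(set)
for number, name in filenames.items():
    for word in documents[name].split():
        reverse_index[word].add(number)

def search(query):
    """OR together the file sets for each query word (an 'or' search)."""
    numbers = set()
    for word in query.lower().split():
        numbers |= reverse_index.get(word, set())
    return sorted(filenames[n] for n in numbers)

print(search("bob"))             # ['BOB.LTR']
print(search("bob vacation"))    # ['BOB.LTR', 'JANE.LTR']
```

The expensive work (scanning every document) happens once, at indexing time; each query afterward is just a couple of dictionary lookups.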
Why Search Engines Find Bogus Information Using an indexed search is a bit like having a cache—the index stands in for the real data. As with a cache, searches can find the wrong information (a false hit) and can fail to find the right information (a false miss). If the indexer ran on Monday and a file was deleted on Tuesday, then on Wednesday the search program will still report "hits" for the deleted file. To compensate, you can reindex your data every time you change any files, which may be computationally expensive. It is also sometimes possible to do "real-time" verification on search hits and remove any false hits from the cache or mark them as "unavailable." Real-time verification does not work for false misses. The only way to eliminate false misses is to do an incremental update to the cache by adding the new files.
Advanced Searches A full-text search is a great tool, but it can fail to be discerning enough to handle the enormous volume of information available on the Internet, or even on very large sites. There are probably millions of pages on the Internet that have the word bear in them. Some of them use it as in “I shot the bear,” some say, “You will bear the full cost of this,” others in still other ways. Maybe a logger in Oregon signs all his web pages “Bear of the North.” A full-text search by itself will not be able to find the highest-quality web pages focusing on bears. There are at least three main approaches to narrowing search results to a more manageable number:
Use explicit meta information.
Infer nonexplicit information.
Consider the relationship of the search result page to other sites.
Explicit Meta Information The first strategy is to use explicit meta information. Meta information is information about other information; a document's size, for example, is part of its meta information. HTML pages can carry explicit meta information in <META> tags, such as keyword and description tags. Indexes that only include words from these tags are more likely to return only pages that are really about bears and not pages that just use the word bear in an off-hand way.
Annotating pages with meta information is even more valuable when people searching use the same vocabulary as the people annotating the pages. This is a strategy library scientists have developed to a high degree.
Nonexplicit Information The second strategy is to infer information about what is most relevant. The simplest way to do this is to just count the number of times each word is used in a document and return the documents that use the word the most number of times. This way, casual references to bears are excluded, but it doesn’t require that all document authors remember to include <META> tags. Some people, of course, abuse this system by repeating words hundreds of times so their documents will show up first in search results. Many search engines are heavily optimized to avoid manipulation of this sort, and will even exclude such sites from appearing on a search.
Popularity Contests The third strategy uses the popularity of a web page in deciding where the web page should be placed in the search results. If the page is “popular” and matches the other search criteria, it will float to the top of the search results. In its simplest form, pages to which a lot of links lead are bumped up in search results—in a sense, other sites have voted for the usefulness of the popular site.
Google (www.google.com/) seems to be the search technology leader with weighted voting. In Google search results, votes are weighted in all sorts of ways. Google search results are determined by complex mathematical calculations, including the relative worth of each page and whether pages share similar keywords.
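In its simplest, unweighted form, the link-counting idea might look like the sketch below. The domain names are made up, and real engines such as Google weight the votes in far more elaborate ways.

```python
# A toy "popularity contest": pages with more inbound links rank higher.

links = {                        # page -> pages it links to (invented)
    "a.com": ["bears.org"],
    "b.com": ["bears.org", "c.com"],
    "c.com": ["bears.org"],
}

# Count inbound links: each link is one "vote" for its target.
inbound = {}
for source, targets in links.items():
    for target in targets:
        inbound[target] = inbound.get(target, 0) + 1

def rank(pages):
    """Order search matches by inbound-link count, most popular first."""
    return sorted(pages, key=lambda p: inbound.get(p, 0), reverse=True)

print(rank(["c.com", "bears.org"]))   # ['bears.org', 'c.com']
```

Here bears.org, with three inbound links, floats above c.com, which has one—exactly the "other sites have voted" effect described above.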
Linguistics Many search engines use their understanding of the relationships between words and the meaning of words in searching and indexing. Common words like a, or, and the are eliminated; these are called stop words or “noise” words, and they are usually filtered from an index for efficiency because people rarely
search on just those words. Words with the same base, like run, ran, and running, are consolidated into one word (this is called stemming). Words with multiple possible meanings are classified as having one meaning or another based on their neighboring words. When a query is run, a similar analysis is done on the search terms; documents with the same context are returned. Some search engines have a "concept search." Words that frequently appear next to each other are considered part of the same concept. Each of these words is linked to a concept cluster (another reverse index of words and clusters). The concept search finds pages that have the concept to which the search word is linked. If the search term is in several concept clusters, an example of each cluster may be shown. If the user chooses a "more like this result" icon, the search engine will return other results that ranked highly in that cluster. It may also use other "Query by Example" criteria, such as being from the same web site or being linked to by the same documents.
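A toy illustration of stop-word removal and stemming follows. The stop list and suffix rules are crude stand-ins for a real algorithm such as Porter's, and note that no suffix-stripper can reduce an irregular form like ran to run; that requires a dictionary-based approach.

```python
# Illustrative stop list and suffix rules (not a production stemmer).
STOP_WORDS = {"a", "an", "and", "or", "the"}
SUFFIXES = ("ning", "ing", "ed", "s")     # longest suffixes first

def stem(word):
    """Strip the first matching suffix, keeping at least two letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word

def normalize(text):
    """Lowercase, drop stop words, and stem what remains."""
    return [stem(w) for w in text.lower().split() if w not in STOP_WORDS]

print(normalize("Running and the runs"))   # ['run', 'run']
```

Both the indexer and the query processor run the same normalization, so "running" in a query matches "runs" in a document.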
Getting Noticed by Big Search Engines Big search engine companies such as Excite, Google, and AltaVista all have massive servers indexing the Internet, and if you’re lucky, they will index your site, too. Because so many people use the big search engines, sites that show up on the first page in search results get a lot more hits. There are two things that you can do to try to get top billing by the large search engines like www.google.com and www.excite.com:
Make sure the search engines know about your URL.
Prepare your site so that it appears prominently in the result list.
To speed up this process of being noticed by the spiders, you can submit your URLs to the search engines. For example, if you go to www.excite.com/info/add_url, you can submit your URL and give some contact information as well as some meta information about the site. That will put the URL in their queue. There are a lot of search engines nowadays, and it could take a long time to fill out all their web forms. One of the original tools to submit your link to many search engines was www.submitit.com/, which now is only partially free. To find similar, but free, services, see www.google.com/search?q=submitit+clone. Other commercial services offer the same thing but claim their human experts will personally analyze the link and help classify it in all the various schemes. See www.worldsubmit.com/meta-information for an example.
Don't expect to be listed right away in the big search engines. They have a large queue, and therefore new pages might not be indexed for a while. Plan ahead by submitting pages before you are ready to launch them.
Even once your site has been indexed by the big search engines, few searchers may wade through all the other search results to find your page. Search engines use the technologies mentioned in the previous sections to prioritize the relative values of their huge number of indexed pages. Different search engines use different combinations of explicit meta information, inferred meta information, and the popularity of a web site. You can prepare your site so it seems valuable by several different metrics. Some engines pay attention to how often your site is linked to, so you should ask other sites to link to yours. Other engines only pay attention to <META> tags, so make sure your web site has descriptive <META> tags. See www.searchenginewatch.com/ for details on which search engines use which prioritization systems.
Summary
You can be sure that the exam will cover issues that affect functionality, the concepts of caching and proxy servers, and different types of searching, especially search syntax. In this chapter, we started off with some potential issues that can affect your Internet site's functionality, such as bandwidth, Internet connection types, and security issues. You learned how to compute how long it will take to download a file using various connection speeds, and that a web page load time of more than 10 seconds will frustrate a user. We also took a look at the different MIME types and plug-ins that could cause problems for your clients, and the importance of ensuring that you use standard plug-ins or at least a link to the correct plug-in. One method of relieving an overloaded web server is to implement caching. Caching allows your server to store content locally and serve that content to the client instead of going out to an Internet site every time the data is requested. You learned that there are two basic forms of caching: client caching and server caching. Client caching occurs at the client level, while server caching is performed by a server, such as a proxy server. We also saw
that sometimes caching will produce the undesired result of returning old, stale content and that sometimes web pages won't load correctly because of a corrupt file in the cache. The way to defeat this problem is to clear out the cache, forcing the data to be retrieved from the Internet site. After we talked about caching, we took a look at the importance of search indexes. Finding data on a web site, particularly a large one, can be difficult without the ability to perform a search. We saw that site maps are useful enough to get the client to the correct portion of the web site, but that they aren't as good as a decent search engine. Different search engines are available to search your web site, just as there are different search engines available to search the Internet for content. Because most clients will use an Internet search engine to locate the information they want, it is a good idea to get yourself listed on several of the well-known engines. The use of <META> tags enables you to place keywords in your web pages for meta search engines, which home in on keywords. Spider search engines run small programs, called spiders, that crawl the Web for new content to index.
Exam Essentials Identify the issues that affect Internet site functionality. Know that you need to have the appropriate bandwidth available to service client requests and the different Internet connection types that will give you that bandwidth. Understand that the client “frustration level” peaks at about 10 seconds, and that a page laden with high-resolution graphics or too many graphics will cause a web page to load too slowly on a client machine. Know the different security issues that affect Internet site functionality. Know that authentication can fail, usually from a forgotten password, or a hacker can get around authentication and into your server. Be aware that permissions must be correctly set for clients to be able to view your web site and obtain the appropriate content. Understand that encryption plays an important role in secure transactions. Describe the concept of caching and know the different types of caching. Know the differences and features of web caching, file caching, and proxy caching. Be able to describe how client-side caching works as opposed to server-side caching.
Know how to properly index your site for a search and understand the different search engine methods. Identify the different search methods and search syntaxes. Be able to distinguish between different search engines, such as meta search engines and spider search engines.
Key Terms
Before you take the exam, be certain you are familiar with the following terms: (ppi)
Review Questions 1. Users will happily wait how many seconds for a web page to display? A. 10 B. 20 C. 40 D. 80 2. What is usually the longest step of a server's response? A. The request waiting in the server's queue B. The server generating or looking up content for the response C. Transmitting the response from server to client D. Displaying or rendering the content 3. A user with a 28.8Kbps modem will take about how many
seconds to download a 100K file? A. 15 B. 30 C. 45 D. 60 4. Which answer describes browser plug-ins to display special file formats? A. They are the mark of a hip site. B. They turn some users away. C. They are one application of MIME-encoding. D. A and C.
5. Some pages will display improperly unless the browser is at least a
certain width. Which of the following assertions is most true? A. There should be a rule strictly forbidding pages to have minimums. B. The “standard” minimum is 800 pixels. This works with smaller
resolution because GIFs are only 72dpi. C. There shouldn't be any rules about minimums because how the
page displays depends on the expert designers. D. A minimum of 400 pixels works for any computer with a graphical
browser and a 640x480 resolution. 6. Which answer describes a web log analysis? A. It is useless to determine usage patterns because HTTP is stateless. B. It is useful to determine the profile of a site's actual audience. C. Station address information is less accurate because of proxies. D. B and C. 7. The test web server is a Pentium-100 with 100MB of RAM running
Apache on Linux. Each child of the web server uses 5MB of RAM. The server has a dedicated T1 line and serves an unlimited number of clients that download its 100K static pages at 10K/s. What is the biggest limitation to the number of pages it serves a second? A. RAM B. CPU cycles C. Bandwidth D. DNS reliability 8. Which of the following statements is true? A. DNS is built into TCP/IP, so it is simply reliable. B. DNS is the responsibility of ISPs and the InterNIC. C. DNS is subject to failure, so it requires the same reliability audit as
other servers and network services. D. DNS can be speeded up by having enough web server children to
9. Which term means the document requested exists in the cache? A. Win B. Hit C. Score D. Cast 10. Who would get the most value out of a web proxy server on their LAN? A. Dull employees who visit the same pages every day B. Dutiful employees who browse the LAN intranet server C. Naughty employees who follow breaking news on remote discussion
servers D. Remote staffers who dial in to the LAN to get Internet connectivity 11. An ISP could cut their bandwidth costs by installing a web proxy
server for their dial-up customers if their customers ___. A. Were scientists B. Visited dynamic web pages that take a long time to generate C. Went to similar web sites D. Favored web sites in other countries 12. Reverse proxies are designed to do what? A. Correct flaws in regular server software B. Hide busy servers from slow international users C. Conserve bandwidth D. Use concepts of division of labor and specialization 13. Which of the following is true of caching? A. Caching is designed to make the Internet more reliable. B. Caching reduces the cost and latency of requests. C. Caching is efficient because it was built with the HTTP protocol
in mind. D. Caching seems efficient, but not because of the popularity of
14. Site indexes are _____. A. A holdover from Gopher B. Based on library science C. Only useful to experts who want to jump straight into content D. A good overview on what is available 15. Full-text searches are most useful in which situations? A. When the search query is strange and uncommon B. When the number of documents to search is less than 20 C. When the content is in a database D. When the number of documents to search grows to more than a
million 16. What is not a search technology? A. Stemming B. Stopping C. Starting D. Query by example 17. A client receives an “Unauthorized to view page” message from a web
site. Which of the following could be the problem? (Choose two.) A. The client is not logged in. B. The multiformat web page does not exist. C. Permissions on the web page are incorrect. D. The web page is cached improperly.
18. What search would find the most number of pages? A. Ice and cream or Sundays B. ice and cream and Sundays C. ice Sundays D. There isn’t enough information to say. 19. What is true about HTML? A. It is a form of hypertext. B. Because it is a standard, there is no cross-browser standard for dis-
playing HTML. C. HTML is generally rendered by a plug-in. D. HTML should be transferred via FTP in the BINARY mode. 20. A site with 10,000 documents has 5,000 terms in its index. If the site
grows to 20,000 documents, which of the following statements is true? A. The number of terms will grow to 10,000 terms. B. A search that used to find 10 hits is now likely to find 20 documents. C. Searching will take twice as long as before. D. It will no longer be possible to use keywords because the total size
of the index will surpass the allowed maximum size.
Answers to Review Questions 1. A. Although it depends on the user, research into “human factors” by
universities and software companies indicates that 10 seconds is the threshold of frustration for waiting for pages to display. 2. C. Clients are usually on modems, which are relatively slower than
most servers and the Internet. 3. B. To figure out how long it will take to download, use the following
equation: (100 kilobytes × 8 bits/byte) ÷ 28.8 kilobits/s = 27.78, or around 30, seconds. 4. B. Users without needed plug-ins leave a site more often than they
download a plug-in. 5. D. D is factually true. A and C are opinions and B is wrong (there is
no one standard). 6. D. Logs are not 100 percent accurate, but they indicate what browser
and operating system users are using. Proxies do frustrate attempts to log the exact IP address of every visitor because when a proxy is used, the address of the proxy server rather than the user’s IP address is recorded. 7. A. The server is relatively underpowered in RAM. It can only sustain
20 children, which could only upload at their client's capacity of 10K/s, for a total bandwidth usage of 200K/s. This means that at 100-percent utilization of RAM, the bandwidth usage is only 20 percent. 8. C. DNS is reliable, to a point. But it does require auditing to ensure its
continued reliability. 9. B. If a cache has a usable stored result, it is called a hit. If it doesn't, it
is a miss. 10. A. The graphics and static pages on these same pages will be quicker
11. C. Proxy servers are most effective in reducing bandwidth when mul-
tiple people get the cached responses to identical requests. 12. D. Reverse proxies are designed to off-load the plodding and easy
work from sophisticated web servers to lightweight ones. 13. B. Caching is important for more than just the Internet. 14. D. Site indexes help people who don’t know exactly which search
words to use to search on a particular topic. 15. A. Naive full-text searches often return too many hits. Strange words
eliminate false-positives. 16. C. A and B are techniques that use knowledge about what words are
different versions of each other and what words are too common to be used in searching. Query by example is used to group documents by certain characteristics, such as concept clusters, host, or a series of web sites that link to one another. 17. A, C. If the web page did not exist, the client would have received a “File not
found” error message. An improperly cached web page would produce garbled results. If the client is not logged in, or the permissions are incorrect, the user will get the “Unauthorized” error message. 18. C. Ice Sundays is interpreted as Ice or Sundays. This is less restrictive
than both A and B. 19. A. HTML allows linking between documents. B should be true, but
there are multiple versions of HTML, and browsers sometimes permit nonstandard HTML. 20. B. If the new documents are similar to the old documents, it is quite pos-
sible that doubling the number of documents will result in double the number of hits for any particular query. Due to the efficiency of indexing, and the shared vocabulary of most documents, doubling the number of indexed documents will not double either the number of words or the length of a search.
Bugs and errors infest the World Wide Web—expect them to crop up on your web site. Bugs stop people from using your web site and can permanently damage your reputation. Although some bugs are inevitable, it’s important to reduce the number and effect of bugs that sneak into your site. In the preceding chapter, you learned the importance of advance planning for the back and front ends of your web site. Careful planning will reduce the number of bugs and errors that your users experience, but bug detection and fixing is an ongoing process. This chapter discusses the important areas of troubleshooting and debugging:
Fixing bugs before your audience sees them
Resolving Internet problems
Virus protection
Software updates
Legacy clients
A clear understanding of these topics will limit the number of bugs you inflict on your users and can lead to quicker resolution of errors that you do find.
Web Site Pre-rollout Testing
Before you roll out a new web site, or change an existing site, you should make sure it has as few errors as possible. A new site is kept “private” until its launch date. When it is rolled out, everyone is allowed in to try it out.
You should only make your site public once you have resolved most of the issues. You probably have similar concerns for your existing sites—you don’t want to make your big mistakes in public. Changes to a web site can make the site unreadable or cause web applications to produce errors. Web sites are complicated and contain unexpected dependencies—so even changes to one page may break something on a different part of the site. Just as new sites are kept private until they are rolled out, modifications to existing sites should be tested in a private area. Changes can then be installed once they are verified as correct.
To “break” a web site is to make it unusable. There are many ways to break a web site, most of them unexpected. You might corrupt the graphic layout of a page, make a CGI script stop performing its duties, or even make the server freeze (hang).
There are two essentials you need before you make your site public:
A method for privately testing your changes
A checklist of what problems to look for
With these items in place, you can tinker with your web site and not be afraid that your changes will break the chairman of the board’s favorite page.
Develop a Testing Methodology If avoiding problems with your web site is important to you, you should use a standard testing methodology. A testing methodology is the set of procedures you perform each and every time you make a change. The goal is to check the validity of every page. With a written methodology, you will use the same procedure each time you check your site, which creates a standard procedure and reduces the possibility of missing a potential problem through carelessness, sloppy work, ignorance, or simply forgetting a step. Whether your web site is large or small, a testing methodology decreases the likelihood that you’ll seriously break it. By following a standard procedure, you’re less likely to forget to check for whatever broke your site the last time you modified something. As time goes on, you can fine-tune the extent of your testing, depending on how often things break and how effective your testing is in finding errors.
In the following sections, we’ll give you an example of a simple testing methodology and then discuss some components of a more advanced testing methodology—a location for testing and a storyboard to follow. Later, we’ll review the different types of problems you may discover in a web site. You can then proceed to develop a testing methodology that fits your needs.
Simple Error Checking Different-sized web sites require different-sized tests. Joe’s home page probably doesn’t need as much testing as Amazon.com’s home page; the cost of an uncaught bug is significantly smaller for Joe than it is for Amazon.com. Joe could probably get away with a simple testing methodology like the following: 1. Download the old page from a public web site, called the production site. 2. Edit the page on a local computer. 3. View the local page in a browser. 4. Upload the page to a public web site. 5. View the public page in a browser to make sure the changes worked.
These steps reduce the chance that Joe will break his home page and that the broken page will remain broken until someone e-mails Joe. For example, he’ll know when he performs step 5 if his upload failed. Notice two key points about Joe’s method:
Joe used his hard drive as a private area for development.
Joe double-checked his changes to make sure he didn’t break anything.
More advanced checking builds on these two fundamental aspects of Joe’s testing methodology.
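Joe’s double-check in step 5 can even be automated. The sketch below is a hypothetical Python illustration: the upload and download callables are stand-ins for a real FTP or scp transfer to the production server. It publishes a page, then fetches the public copy and confirms it matches what was sent.

```python
import hashlib

def publish_and_verify(local_html, upload, download):
    """Upload a page, then download the public copy and confirm it
    matches what we sent (Joe's step 5: double-check the change)."""
    upload(local_html)
    public_html = download()
    sent = hashlib.sha256(local_html.encode()).hexdigest()
    got = hashlib.sha256(public_html.encode()).hexdigest()
    return sent == got

# Stand-in transfer functions; a real script would talk to the
# production web server instead of a dictionary.
store = {}
ok = publish_and_verify("<h1>Joe's page</h1>",
                        upload=lambda page: store.update(page=page),
                        download=lambda: store["page"])
print(ok)   # True
```

Because the transfer functions are parameters, the same verification logic works whether the “server” is a hard drive, a private staging area, or the production site.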
Private Testing Area One component of a good testing methodology is the use of a private testing area. Joe had this component in his methodology—he tried out changes on his hard drive. He could experiment freely without worrying that people would view his rough drafts or cause a problem with his existing web site. Although Joe might be perfectly happy using a hard drive as a development area, this won’t work for many web sites.
More complex web sites require different types of private areas. If more than one person will be testing a web page, it needs to be published on a private site. Once off the production site, the testers can collaborate on the page. Depending on your needs, there are a couple of areas where you can try out your changes:
A private area on your main web server
An entirely separate test server
In the following sections, you’ll see the differences between these two testing areas. You’ll also learn about tools for moving your files from your testing area to the main server. Main Web Servers For most web sites, the best option for a private testing area is in a private section of the main web server. This is a common choice because it provides a more functional test environment than a stand-alone hard drive but doesn’t require many additional resources. By reusing an existing server, you can also save the cost of an additional server.
If hard drive space on your server is a problem, you can either buy a new server or upgrade your drives. Because the cost of hard drives has dropped significantly in the past few years, it’s less expensive to upgrade the drives.
You will certainly want to consider publishing your changes to a web server if you have multiple people looking at a site or if you use pages that need to be tested on a server. By publishing to a private section of a server, you can test server functionality before releasing your site to the public. If the pages being edited use CGI scripts, for example, they can’t be easily tested on a hard drive. Staging Servers If you find that the private version of your web site is interfering with your main site, you may need a staging server. A staging server is a separate web server on which you can put a private version of your public web site. It has almost all the functionality of the public web site, yet it won’t interfere with the public site. Staging servers are useful because tests on them will accurately predict how changes will affect the public site, while ensuring that the testing phase doesn’t affect the main site.
Staging servers are particularly useful if you are experimenting with new versions of an operating system, new databases, or anything that might crash a whole server. If you ever find yourself wondering if your actions on your test site are going to crash the server, you may want to consider getting a staging server.
Quality assurance is a job function in which applications are checked to make sure they work as expected. People performing quality assurance report the errors that they find.
Copying Files between Servers Whatever the complexity of your testing needs, it should be fairly simple to move files between the private testing area and the main area. If it is too complicated, errors can be introduced in the transfer process. HTML editors, such as NetObjects Fusion or Microsoft FrontPage, have publish/export/save as settings that explicitly support uploading development code to multiple servers or to different areas on a single server. These HTML helpers make sure the file location tags are rewritten to accommodate multiple servers. Figures 8.1 and 8.2 demonstrate the capabilities available in many HTML helpers. Figure 8.1 shows the Server Locations tab in the Publish Setup dialog box of NetObjects Fusion. In this dialog box, users can add, edit, or delete server locations. Notice that there are presently three server locations defined: Main Server, My Computer, and Remote Staging Server. Figure 8.2 shows the Publish Site dialog box. Here, the user can select which server to publish to, as well as whether to publish the entire site or only the changed assets. Once you set up the location of each server, it should be easy to try out and debug changes before your audience can find them.
Avoid Corrupting E-Commerce Web Sites Changes to e-commerce web sites require special care. Problems with e-commerce sites translate into lost revenue for the company, so it is especially important that these sites don’t suffer from errors. E-commerce web sites also typically rely on complex interactions between application servers, SQL databases like Microsoft SQL Server and Oracle, and legacy order-fulfillment systems. Whereas updating a normal web site is much like releasing the new version of a book, updating an e-commerce web site is like releasing a software upgrade. Changes in the user interface often accompany changes in the software that runs the site and talks to the database, credit card companies, and possibly an order-fulfillment system. Private testing areas for e-commerce web sites need to be carefully designed by the people who wrote the software for the e-commerce main site.
Use the Storyboard
Another crucial component to any testing methodology is verifying that the page does what it’s supposed to do. When Joe double-checks the modifications to his home page, he can just look at the new page and see if the changes are correct. If you happen to have a more complex site, it is more difficult to determine if a set of pages works correctly. A common way to ensure that changes are correct is to check the new pages against a comprehensive description of how the pages should look and function. This description is often called a storyboard.
A storyboard is a document that describes how the pages should look and function. It lists screen-by-screen how the pages will look and how the user might interact with them. It will often separate the different elements that go into each screen: what text, graphics, animations, and application code should be on each page. Also, it charts out the relationships between the pages, especially what actions occur when users choose each option on each page.
People who perform quality-assurance tests can follow each path on the storyboard (like a map) and make sure the text, graphics, and applications all behave as expected. They should make sure the new pages have all the functionality called for in the storyboard, as well as look for any errors.
In this section, we looked at two key elements you need to have in place to be able to properly test your web site—a private location to test the pages and a storyboard to check the pages against. We haven’t discussed what you should be testing—you need to decide what is important to you. The next section will serve as a pointer to some of the items you may want to put on your checklist.
What to Check
As quality assurance testers go through the site, they should look for everything that could appear to be a bug to the target audience. Basically, they ask the following questions:
Do all pages exist?
Do pages appear and function correctly?
Do new pages cause applications to generate error messages or use the wrong logic?
Do pages meet site policies, such as on content types and file size?
Can the server handle the new files?
In the following sections, we’ll cover each of these elements in more detail. Once all of these elements have been checked, you can safely roll out your site.
Link Checking (Checking Hyperlinks) Link checking solves a common problem with web sites: broken links. Broken links return the dreaded “404—File not found” message when you follow them. The server says, “File not found” because the link requests a file that doesn’t exist. Link checking ensures that all of the files are in place. To check links, follow the site map or storyboard and click every link. If you find a broken link, report it. You should also look for paths that the storyboard or site map indicates should exist yet do not.
Nothing beats the human eye, but there are some automated tools that visit every link on the site for you and report broken links and malformed HTML. See tucows.tierranet.com/htmlval95.html for a list of Windows utilities that do this.
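A minimal link checker of this kind can be sketched with Python’s standard library. This is an illustrative sketch, not one of the utilities mentioned above: the fetch_status callable stands in for a real HTTP request, and the sample page and status table are made-up test data.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def broken_links(html, fetch_status):
    """Return every link whose status code is 404. fetch_status is a
    callable so the checker can be pointed at a live server (e.g. via
    urllib) or, as here, at canned test data."""
    collector = LinkCollector()
    collector.feed(html)
    return [url for url in collector.links if fetch_status(url) == 404]

page = '<a href="/index.html">Home</a> <a href="/old.html">Old</a>'
statuses = {"/index.html": 200, "/old.html": 404}
print(broken_links(page, statuses.get))   # ['/old.html']
```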
Storyboard and Appearance Checking As you are testing your site, make sure it looks good and works well. Ensure the pages operate according to your plan. If each page has been detailed in a storyboard, you can check the storyboard against the actual operation of the page. For example, the storyboard may include detailed instructions for a login page. If the storyboard calls for a login page to verify a user’s password, then the page should really verify the password and take the user to either the “verified user page” or the “password incorrect page.” If the login page verifies people who have not registered through to the “verified user page,” then it is failing to work properly. Verify both the appearance and functionality of the site on more than one browser. If you expect visitors who use Internet Explorer 4, Internet Explorer 5, Navigator 4, and Navigator 6, test the site functionality with all of these browsers. This is especially important if your site relies on client-side scripting or Java applets because they are sensitive to different browser types and versions. Also, as mentioned in Chapter 6, web sites with nonstandard features like style sheets or frames might work according to the storyboard in Netscape 6 yet fail to work properly in Netscape 4. Similarly, if the site aims at supporting clients on modems, you may want to try out the site using a modem. You can often discover which graphics or features are painfully slow when downloading at 56Kbps.
The standard screen resolution is 72 dots per inch (dpi). If pictures were scanned at 300dpi, they will appear four times as large on the screen. You can override the natural size of images by specifying the image size in your HTML document. If you manually set the image size, your browser will shrink or stretch the image to fit the required size. It is most efficient to just scan an image at 72dpi. Check out www.lib.berkeley.edu/Web/imagesizetips.html for information on image resolution.
CGI Errors and File Corruption Moving files around can sometimes cause or expose problems with files and programs on the server. After you install your new files, you’ll want to double-check that your CGI scripts and other web applications work. You’ll also want to download any updated or new binary files to make sure they haven’t been corrupted. When you test your CGI scripts, don’t enter only “good” data. Enter all sorts of bogus data into the forms that run CGI scripts to see if the scripts return errors or do the wrong thing. You can be sure users will type garbage into these forms, so you might as well do it first and find out how your script will behave. Especially try the following:
Enter nothing at all in required fields.
Paste very long strings of text into fields meant for short answers.
Type special characters, such as & ; * - $ < >.
Type information in the wrong field, such as entering in a telephone number for the zip code.
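The checklist above can be packaged as a reusable list of hostile values. Everything in this sketch is illustrative; a real test run would POST each value into each form field of the CGI script and watch for errors or wrong behavior.

```python
def hostile_inputs(max_len=10000):
    """Bogus values to feed every form field, following the checklist:
    empty required fields, oversized strings, special characters, and
    data of the wrong kind for the field."""
    return [
        "",                           # nothing at all in a required field
        "A" * max_len,                # very long string in a short field
        "& ; * - $ < >",              # special characters
        "<script>alert(1)</script>",  # markup where plain text is expected
        "555-1212",                   # a telephone number in a zip-code field
    ]

# A tester would submit each value into each field; here we just
# list the cases that would be tried.
for value in hostile_inputs():
    print(repr(value[:40]))
```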
See “Debug Server Problems” later in this chapter for some tips on debugging errors in CGI scripts.
Policy-Compliant Pages Your site probably has a number of policies and guidelines. It’s a good idea to review the pages with these policies in mind. Here are just a few of the possible types of policies you may have:
Security policy for CGI scripts (see Chapter 5)
Policy on size of pages or maximum download times (see Chapter 7)
Policy on content types (see Chapter 7)
Copyright policy (see Chapter 9)
Although it isn’t good to straitjacket a site with rules, policies and guidelines can be valuable in building a knowledge base of successful practices. Your customer support engineers, for example, may have learned that your web site gets half as many complaints when page size is kept below 50KB. This is valuable knowledge, but it will be wasted if it doesn’t change the actions of the web site designers. By crystallizing the understanding of this facet of customer behavior as a site policy, you can translate knowledge into action. It is
often helpful, of course, to document the reason for the policy. Otherwise, the designers may feel unnecessarily constricted, or they may forget why the policy was adopted and revert to the old method.
Server’s Capability to Handle the New Files If you are about to release either a totally new or upgraded web site, it’s possible that many more people will start coming to it. Before you install a new version of your site, determine if your servers can withstand an increase in demand. Are your pages large, or do you expect them to be wildly popular? You may want to use the suggestions in Chapter 7 about determining your bandwidth needs to see if you need to add server capacity. Web sites that let people download software often face a dilemma here. Software vendors want to make their 2MB software package available online, but if 500 people each try to download the file, it could consume all available network resources. One way of handling this is to create a mirror site, which is another site that copies a portion of your site. For example, a software company may mirror documentation to sites all over the world. A user can then pick any of the mirror sites, reducing the load on any particular site. As you can see in Figure 8.3, software sites mirror their software around the world and then point people to the distributed download points. This helps them avoid bandwidth and latency problems.
Predict Server Response with Load-Testing Software Load-testing software simulates the activity of thousands of users visiting your web site. If you aren’t sure how well your web server will stand up to demand, you can run load-testing software to predict what will happen. See www.google.com/search?q=load+testing+website for a current list of commercial products that do load testing. Some load-testing products test a single component, like the ability of the web server to deliver thousands of pages a minute or the ability of the database engine to run many inserts, deletes, and selects. The other type of load testing pretends to be thousands of virtual users, each taking a path through the web site, trying out different options, and using different resources (see, for example, Portent software at www.loadtesting.com/). Both types of load testing can be useful.
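The virtual-user style of load testing can be sketched with threads. This is a toy illustration of the idea, not a commercial product: the fetch callable is a stand-in for a real HTTP request such as urllib.request.urlopen, and each thread plays one virtual user.

```python
import threading
import time

def load_test(n_users, requests_per_user, fetch):
    """Simulate n_users virtual users, each issuing requests_per_user
    requests via fetch(); return (total requests, elapsed seconds)."""
    lock = threading.Lock()
    counts = {"done": 0}

    def user():
        for _ in range(requests_per_user):
            fetch()
            with lock:
                counts["done"] += 1

    threads = [threading.Thread(target=user) for _ in range(n_users)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts["done"], time.time() - start

# Stand-in request that just sleeps briefly; a real run would hit
# the staging server's URL instead.
done, elapsed = load_test(20, 5, fetch=lambda: time.sleep(0.001))
print(done)   # 100
```

Dividing the request count by the elapsed time gives a rough requests-per-second figure to compare against expected peak demand.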
If you are unsure of whether you need to worry about the capacity of your server, keep an eye on your network and server utilization. If they approach 85 percent during peak usage, consider upgrading the server or adding mirrors for large static files.
Resolving Internet Problems
Despite your best efforts in checking your web site before rolling it out, users will still find errors in your system. Moreover, users will often mistake problems on their own computers or Internet connectivity problems for errors in your web site. At a minimum, you’ll have to be able to discern if a
bug report indicates that your site is broken. Sometimes you may want to help users figure out where their problem is. To resolve an Internet problem, you need to diagnose whether the problem lies in a misconfigured web browser or in your web site itself. In this section, you’ll learn some troubleshooting steps, including:
How to identify the exact error that the client reports
How to isolate the location of the error
How to debug server problems
How to use tools to debug connectivity problems
Users may not always describe problems they’re having in terms you understand, and they may not understand the technical terms you use. For a humorous look at this problem, see the Jargon File (www.tuxedo.org/~esr/jargon/), which includes a user/sysadmin translation phrase book.
Troubleshooting Steps When you’re troubleshooting, it’s helpful to follow a procedure so you don’t forget important steps. You can follow CompTIA’s Network+ troubleshooting model, for example, to debug your Internet problems. This model has eight steps: 1. Identify the exact issue. 2. Re-create the problem. 3. Isolate the cause. 4. Formulate a correction. 5. Implement the correction. 6. Test the solution. 7. Document the problem and the solution. 8. Give feedback.
To facilitate our discussion of the troubleshooting steps, let’s assume that a user has called you, a web site owner, to complain about not being able to connect to your web site.
Step 1: Identify the Exact Issue Obviously, if you can’t identify a problem, you can’t begin to solve it. Typically, you need to ask some questions to begin to clarify exactly what is happening; for example:
Which parts of the web site are you having trouble accessing?
Is it just this particular web site? Any web site?
Can you use your web browser?
You may find out that the user has trouble accessing all of your web site but can access any other site.
Be especially mindful of the simple stuff. It’s a good idea to make sure cables are plugged in or the modem connection is up before going on to other steps. In one case, a client had called to report a monitor problem; it turned out that the computer wasn’t even powered on.
Step 2: Re-create the Problem The next question to ask anyone who reports a web site problem is, “Can you show me what ‘not working’ looks like?” If you can reproduce the problem, you can identify the conditions under which it occurs. And if you can identify the conditions, you can start to determine the source. Unfortunately, not every problem can be reproduced. The hardest problems to solve are those that can’t be reproduced but instead appear randomly and with no obvious pattern. Computers and networks are fickle; they can work fine for months, suddenly malfunction horribly, and then continue to work fine for several more months, never again exhibiting that particular problem. And that’s why it’s important to be able to reproduce the problem. If you can’t reproduce the problem, you won’t be able to tell if your attempted solution actually fixes it.
It is a definite advantage to be able to watch the user try to reproduce the problem. That way, you can determine whether the user is performing the operation correctly. “Operator error” is one of the most common problems that you will encounter.
Step 3: Isolate the Cause If you can reproduce the problem, your next step is to attempt to determine the cause. Drawing upon your knowledge of the Internet, web servers, and web clients, you might ask yourself and your user questions such as the following: Were you ever able to do this? If not, maybe they are trying to do something they simply cannot do. If so, when did you become unable to do it? If the computer was able to perform the operation and then suddenly could not, the conditions that surround this change become extremely important. You may be able to discover the cause of the problem if you know what happened immediately before the change. It is likely that the cause of the problem is related to the conditions surrounding the change. Has anything changed since you were last able to do this? This question can give you insight into a possible source for the problem. Most often, what was changed before the problem started is the source of the problem. When you ask this question of a user, the answer is typically that nothing has changed, so you might need to rephrase it. For example, did anyone add anything to your computer? Or, are you doing anything differently from the way you normally proceed? Were any error messages displayed? This is one of the best indicators of the cause of a problem. Error messages are designed by programmers to help them determine what aspect of a computer system is not functioning correctly. Unfortunately, nine times out of ten, the client didn’t write down the error. Are other people experiencing this problem? This is one question you must ask yourself. That way, you might be able to narrow the problem down to a specific item that may be causing the problem. Try to duplicate the problem yourself from your own workstation. If you can’t duplicate
the problem on another workstation, it may be related to only one user or group of users (or possibly their workstations). If more than one user is experiencing this problem, you may know this already because several people will be calling in with the same problem. Is the problem always the same? Generally speaking, when problems crop up, they are almost always the same problem each time they occur. But the symptoms may change ever so slightly as conditions surrounding them change. A related question is, If you do x, does the problem get better or worse? For example, you might ask a user, “If you use a different file, does the problem get better or worse?” If the symptoms become less severe, it might indicate that the problem is related to the original file being used. These are just a few of the questions you can use to isolate the cause of the problem.
Step 4: Formulate a Correction After you observe the problem and isolate the cause, your next step is to formulate a solution. Trust us—this gets easier with time and experience. You must come up with at least one possible solution, even though it may not be correct. And you don’t always have to come up with the solution yourself. Someone else in the group may have the answer. Also, don’t forget to check online sources and vendor documentation. You might have determined, for example, that a problem was caused by an improperly configured DNS lookup on a workstation. The correction would be to reconfigure DNS on the workstation.
Step 5: Implement the Correction In this step, you implement your formulated correction. In our example, you would need to reconfigure DNS on the workstation.
Step 6: Test the Solution Now that you have made the changes, you must test your solution to see if it solves the problem. In our example, you would ask the user to try to access the site again. In general terms, ask the user to repeat the operation that previously did not work. If it works, great! The problem is solved. If it doesn’t, try the operation yourself. If the problem isn’t solved, you may have to go back to step 4, formulate a new correction, and redo steps 5 and 6. But it is important to make note of what worked and what didn’t so that you don’t make the same mistakes twice.
Step 7: Document the Problem and the Solution It is important to document the solution to a problem. If one person has a problem, other people are likely to have it. You definitely want to document problems and solutions so that you have the information at hand when a similar problem arises in the future. With documented solutions to documented problems, you can assemble your own database of information that you can use to troubleshoot other problems. If you suspect that other users are likely to have the same problem, such as an “operator error” situation, it may even be a good idea to pass along the information in e-mail.
Step 8: Give Feedback Of all the steps in the troubleshooting model, this is probably the most important. Give feedback to the people who need to know, especially the person experiencing the problem (so they know the problem is fixed). If a malfunctioning web site was a cause of the problem, you should also notify the person responsible for the broken part of the site. Web site developers can use feedback to improve their own debugging process, and most of the time they don’t get the needed information because everyone thinks that it has already been reported. They can learn what errors slipped through their debugging efforts, and improve their pre-rollout testing methodology.
Troubleshooting Tips These eight steps are a good overall methodology for troubleshooting. You can bolster this methodology with tips and advice from the following sections. Here’s a guide for when to use the tips in the following sections during the model troubleshooting process. Understanding Client Error Codes The tips in this section will prove to be helpful for step 1. Isolate the Cause of HTTP Errors Use these tips for steps 3 and 4. Debug Server Problems If you’ve narrowed a problem down to the server, this section will give you a tip that will help you determine the exact cause of a failed request. Network Connectivity Utilities This section documents when you would use different tools to test Internet connectivity. These tools can be used in both steps 3 and 6.
Understanding Client Error Codes Step 1 of the troubleshooting methodology is to identify the exact problem. In the realm of HTTP clients and servers, you can do this by paying attention to the error code. If you get an error when trying to access a web site, your browser will often report the error using a particular phrase.
See http://www.help.com/ for more information on helpful tips and solutions.
Let’s review how error messages are generated. Web requests use the HTTP protocol. As described in Chapter 2, the client issues a request, and the server responds to the request. Along with the response, the server sends a response code. A response code is information about the response, generally saying whether or not the server could fulfill the request and listing any problems it had in doing so.
Error messages are only clues to the underlying problem. If trying again doesn’t work, the next step will be to isolate and fix the problem. Error codes are a useful tool for fixing problems, but they only indicate the results of one person’s requests. Thorough troubleshooting includes testing the extent of the problem. See “Isolate the Cause of HTTP Errors” later in this chapter for more information on this topic.
When web browsers give error messages, they are saying one of three things:
I tried and failed to do something on the local computer.
I couldn’t get a clear response from the server.
Here is the response code from the server, which I’m passing on to you.
The first two types of messages will generally show up as pop-up windows in the browser. You can see the pop-up error message “The server does not have a DNS entry” in Figure 8.4. Notice that the error message shows up in a small window floating above the main browser window. The third type of error message will generally show up in the main browser window. You can see this type of error message in Figure 8.5. Figure 8.5 shows a “404—File not found” error message. The error code shows up in the title bar. A cryptic explanation appears in the main browser window.
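For the third type of message, the server’s response codes fall into families by their leading digit, which gives a quick first classification of what went wrong. A minimal sketch:

```python
def classify(code):
    """Group an HTTP response code by its leading digit."""
    families = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",   # e.g. 404 File not found, 401 Unauthorized
        5: "server error",   # e.g. 500 Internal server error
    }
    return families.get(code // 100, "unknown")

print(classify(404))   # client error
print(classify(200))   # success
```

A 4xx code points at the request (a bad URL, missing credentials), while a 5xx code points at the server itself, which is a useful distinction when isolating the cause.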
Browser Functionality Errors Many error messages that the browser generates may have nothing to do with the Internet. If a browser recognizes that it is being asked to do something it can’t do, it will report an error—for example, if it is being asked to launch a plug-in or helper that isn’t installed or is installed improperly. In Figure 8.6, you can see the browser reporting its failure to open a PDF file. Unless the PDF file is corrupt, this indicates a misconfigured browser. FIGURE 8.6
Navigator can’t open Acrobat
Similar internal error messages include “Helper application not found” and “Viewer not found.” These messages will also alert the user to local problems such as her hard drive running out of disk space or her computer lacking sufficient RAM to perform a task.
Connectivity Errors All of the following messages indicate that the browser could not establish a good connection with the server:

No DNS entry This error message indicates that the browser failed to find an IP address for the domain name. This could mean that some DNS server is either corrupt or inaccessible. If a DNS server is the problem, it could be either the DNS server that is authoritative for the domain or the client’s DNS server. This error also occurs when the client has no connectivity to its own DNS server. See Chapter 2 for more on DNS.

Server not responding This error occurs when the client can resolve an IP address for the host, but there doesn’t seem to be any network connectivity between the server and the client.

Connection reset by peer The server seems to have abruptly cancelled the connection. This could indicate a number of network connectivity problems, or a swamped server.
Server returned an invalid or unrecognized response The client received a response, but it didn’t conform to the HTTP protocol and was probably garbled.

File contains no data The response has an HTTP header, but no content. This probably indicates a server application error, such as a CGI script that fails without giving a good warning message. Although it looks like a connection error, in fact it is more likely to be a server error.
This is not a complete list of all error messages. Browsers may not use the same words for the same error. One browser might say, “No DNS entry.” Another might say, “Error resolving host www.anyhost.com.” They are the same basic error, but the error message will vary slightly.
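Under the hood, these messages correspond to different low-level failures. As a rough illustration (not how any particular browser is implemented), a Python sketch might classify socket-level exceptions into the message families above:

```python
import socket

def classify_fetch_error(exc):
    """Map a low-level exception to the family of browser-style
    connectivity message described above. Illustrative only; real
    browsers use their own wording."""
    if isinstance(exc, socket.gaierror):
        return "No DNS entry"            # the name could not be resolved
    if isinstance(exc, socket.timeout):
        return "Server not responding"   # name resolved, but no reply
    if isinstance(exc, ConnectionResetError):
        return "Connection reset by peer"
    return "Unknown network error"
```

Whatever the wording, the underlying exception type is what tells you which stage of the connection failed.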
Connectivity problems often go away if you try again in a minute or two. You can see Internet Explorer failing to find the DNS entry for www.google.com in Figure 8.7 and then finding the DNS entry and displaying the web page seconds later in Figure 8.8. FIGURE 8.7
Internet Explorer can’t find the DNS entry
If network problems persist, see “Isolate the Cause of HTTP Errors” later in this chapter. Isolating the problem is step 3 in the Network+ troubleshooting model.
HTTP Response Codes Even if the browser successfully received a response to its query, the response may indicate a problem. The HTTP response codes are grouped into four broad categories:
The 2xx codes represent success.
The 3xx codes indicate that the file has been moved.
The 4xx codes indicate an error, probably on the client.
The 5xx codes indicate an error, probably on the server.
Here are common HTTP error codes (you can read more about all the response codes at www.w3.org/Protocols/HTTP/HTRESP.html):

400—Bad request The request had bad syntax or was inherently impossible to fulfill.

401—Unauthorized The server has not yet certified the client’s request as authorized. Usually the client will immediately ask the user for a password and try again.

403—Forbidden The request is for something forbidden. Authorization will not help.
404—Not found The server has not found anything matching the URL given (see Figure 8.9). Usually this is because a file is missing or the user has typed in the wrong URL.

500—Internal error The server ran into a problem. Typically, this indicates that the server tried to run a broken CGI script.

You can see the dreaded 404 error code in Figure 8.9. The client tried to look up a file, called imaginary-file.html, that didn’t exist. The other common error code is 500—Internal error. Both 404 and 500 error messages can actually indicate either user error or web site malfunction.
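The four groupings can be captured in a tiny helper function. This is just a sketch of the classification above, not part of any server’s code:

```python
def describe_status(code):
    """Return the broad meaning of an HTTP response code, following
    the 2xx/3xx/4xx/5xx groupings described above."""
    meanings = {
        2: "success",
        3: "moved (redirection)",
        4: "client error",
        5: "server error",
    }
    return meanings.get(code // 100, "unknown")
```

For example, `describe_status(404)` and `describe_status(500)` fall into the client-error and server-error groups, respectively.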
Some web server software (e.g., Microsoft IIS) allows you to configure your own error messages to make a web site more “friendly.”
FIGURE 8.9
A Netscape “File not found” error
Isolate the Cause of HTTP Errors The first part of step 3 (of the troubleshooting steps discussed earlier) is to determine where on the network an Internet problem occurs. The previous sections on network troubleshooting describe the sorts of questions to ask to determine where a problem might be happening.
Once you have asked these questions and done some preliminary testing, you should be able to say what conditions cause the bug. At this point, you should be able to determine who is experiencing the problem:
Everyone looking at the site
Groups of users, but not everyone
The user experiencing problems with the Internet in general
The user experiencing problems with just this site
The user experiencing problems with just certain pages on this site
Many users experiencing problems with certain pages on this site
Determining which category of symptoms applies is important for two reasons. First, it suggests a particular network component to debug. Second, it reveals the scope of the outage and thus the relative priority for fixing it. In the following sections, we’ll give you some tips for formulating a correction depending on the sort of problems—it should be helpful as you consider step 4.
Keep accounts on several different networks. Then when you are trying to see if a site is really unreachable, you can try to connect to it from multiple places on the Internet. This allows you to get a better sense of where Internet connectivity problems are.
Server or Server’s Network Is Down If the problem occurs for everyone trying to look at a site, either the server is down or the target site’s network connection is down. If you have access to the server, see if it is on and if it has network connectivity to the Internet (see “Network Connectivity Utilities” later in this chapter). If a site is inaccessible to certain groups of users, it also generally indicates a connectivity problem. If the server is accessible within the firewall but not outside the firewall, check to see if network connectivity is broken between the firewall and the Internet. If a site is inaccessible only to people on certain Internet backbones, there is probably a connectivity problem between the site’s Internet access provider and other backbones. In this case, the only thing you can do is report the connectivity problem to the site’s Internet access provider and wait for them to fix the problem.
User Connectivity Problem If it turns out the entire Internet is not accessible to a user, then the problem most likely lies with that user and not with any particular web server. At this point, it is useful to determine who’s responsible for the user’s connectivity— usually it will be the user’s ISP or his corporate IS department. Whoever helps the user will look at his network connectivity and network settings. Even though the user may be complaining to the administrator of a certain web site, that web site may not be relevant to the debugging effort.
Web Site Bugs If the user gets an error message only when trying to use your site, or portions of your site, you should find out exactly what the error message is. It may indicate a bug with the pages of your site, or it may indicate that the user’s browser is incompatible or misconfigured. Duplicate the user’s actions and see if the page responds as it should. Refer to the storyboard if necessary. If the page works for you but not for the reporting user, see if different browser versions or browser settings make a difference. If so, refer to your content type policy to see whether the pages need to be changed. If the page doesn’t work for anyone, then there is surely a bug on the web server, which you’ll learn more about in the next section.
Debug Server Problems If isolating a problem points to the web server, the next step is to troubleshoot the web server. To debug the server problem, repeat the troubleshooting steps 1 through 8, but this time, focus on the web server. As you troubleshoot the server problem, you should keep in mind two topics that are especially relevant to the cause of errors and the source of resolution, respectively, for many web server problems: File permissions Misconfigured file permissions on the server are a common problem, so be sure to look for this information in troubleshooting step 3. Log files Web servers keep track of information about the requests they fulfill. You can use the log files to see what is happening with each request. This accurate information is especially helpful in troubleshooting steps 1, 3, and 6.
Bogus Bug Reports A user might say, “The help page on how to use the calendar is gone.” Perhaps this help page never existed, was never promised, and was never alluded to. A callous and foolish administrator might say, “There is no bug here; the user is mistaken.” But the user is not simply mistaken; she is unknowingly telling you something else. For every user who files a bug report or complaint, there are dozens, perhaps hundreds, who just ignore their problem. So although the people who write in are unusually cranky, they are also like the sensitive canary in a coal mine. When one of them croaks, it isn’t time to ignore them; it is time to pay attention to what ails them. Maybe there should be a help page for the calendar!
Here’s a quick review of how web servers work so that the discussions about file permissions and log files will make more sense.
How Web Servers Fulfill Requests A web server responds to almost every request it receives. If it can’t fulfill the request, it responds with an error message that has an error code. As it considers each request, it performs a series of tasks, which can be summarized as follows:

1. The server translates the URL to a local file on the web server.

2. The server sees if the user is allowed to make this request.

3. The main action occurs when the server tries to gather the content needed for the request (such as reading a file or executing a script) and sends it back to the client.

4. Finally, the server logs the request as either successful or unsuccessful.
You will note that the server may fail in step 3 if the file is missing. Also, the web server will log a request whether or not the request succeeds.
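The four steps above can be sketched in a few lines of Python. Everything here (the document-root path, the dict standing in for the filesystem) is hypothetical, but the order of operations mirrors the list:

```python
def handle_request(url_path, site_files, allowed_users, user):
    """Follow the four request-handling steps for one request.
    'site_files' is a dict standing in for the server's filesystem;
    all names here are hypothetical."""
    log = []
    # Step 1: translate the URL to a local file.
    local_file = "/var/www" + url_path
    # Step 2: see if the user is allowed to make this request.
    if user not in allowed_users:
        log.append((url_path, 403))      # step 4 still logs the failure
        return 403, None, log
    # Step 3: gather the content -- this fails if the file is missing.
    if local_file not in site_files:
        log.append((url_path, 404))
        return 404, None, log
    body = site_files[local_file]
    # Step 4: log the request as successful.
    log.append((url_path, 200))
    return 200, body, log
```

Note that the request is logged whether or not it succeeds, which is what makes the access log so useful for troubleshooting.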
File and Directory Permissions As you are trying to isolate the cause of web server error, don’t overlook permission settings. It is quite common for files to have the wrong permission settings and therefore be unreadable by the web server.
In a multiuser operating system, file permissions allow people to keep their files private. The web server isn’t really a person, but to the computer it is just another user. If someone sets the permission on their files so the web server can’t read them, the web server will be denied access to the files by the operating system.
In fulfilling requests, a web server will try to gather content and send it to the client. Gathering content for HTML or graphics files consists of reading the file. Gathering content for scripts or applications consists of executing the programs and reading the output. If the web server tries to read a file that doesn’t exist, it will fail and report a “404—File not found” error. Likewise, if the web server tries to read a file that it does not have permission to read, or if it tries to execute a file it does not have permission to execute, the server will fail and report a “403—Forbidden” error. To fix this, simply change the permission on the file so the web server user can access it.
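The decision between 404 and 403 can be expressed as a small rule. The sketch below assumes the web server runs as an unprivileged user whose access is governed by the Unix “other” permission bits; a real server’s checks are more involved:

```python
import stat

def status_for_file(exists, mode, want_execute=False):
    """Choose 404, 403, or 200 from a file's existence and its Unix
    permission bits, per the rules above. Assumes the server user
    falls under the 'other' permission class."""
    if not exists:
        return 404                           # file not found
    needed = stat.S_IXOTH if want_execute else stat.S_IROTH
    if not (mode & needed):
        return 403                           # exists, but forbidden
    return 200
```

For example, a CGI script left with mode 0o644 (readable but not executable by others) would draw a 403, and the fix is to change its permissions.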
Log Files Web servers log the problems they run into so you can debug and fix the errors that caused the problems. A server stores a history of its activity, especially its success and failure at fulfilling web requests, in log files. Log files can be an invaluable tool in debugging web server problems. There are at least two different log files on a web server—the access log and the error log.

Access Log The access log records each hit to the web server. For each hit, it may record the IP address of the client, the name of the browser, the URL of the request, the date and time of the request, the status (200, 403, 404, or otherwise) of the request, and other information of that sort. The access log can be very useful in tracing the path of someone who is having trouble. You can see which requests the web server thought were successful and which ones were not. You can also use this log to head off support calls if you check it periodically and fix any errors listed in the log.

Error Log The error log records errors, and its detailed messages can make it clear exactly what a problem is. For example, CGI scripts often fail to run for mysterious reasons. Because CGI scripts leave debugging information in the
error log when they fail, you can look in the error log to figure out why your CGI script isn’t working. You can see a broken CGI script in Figure 8.10. The HTML error message is not very instructive. The error log for this failed request is more helpful— Figure 8.11 shows that when the server tried to execute broken-cgi.pl, it couldn’t execute the script because of a syntax error on line 14. FIGURE 8.10
A broken-cgi.pl Web output
FIGURE 8.11
A broken-cgi.pl log output
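Many servers write their access log in the Common Log Format, which is easy to pick apart with a script. The log line below is made up, but its shape matches the fields listed earlier (client IP, date, request, status, size):

```python
import re

# A hypothetical access-log entry in the Common Log Format:
LINE = ('192.168.1.2 - - [12/Oct/1999:13:55:36 -0700] '
        '"GET /imaginary-file.html HTTP/1.0" 404 209')

def parse_access_log_line(line):
    """Pull the client IP, timestamp, request line, and status code
    out of a Common Log Format entry. Sketch only; the exact field
    layout varies by server."""
    m = re.match(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)', line)
    if m is None:
        return None
    ip, when, request, status, size = m.groups()
    return {"ip": ip, "when": when, "request": request,
            "status": int(status)}
```

Scanning for entries with a 4xx or 5xx status is a quick way to see which requests the server thought failed.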
See Tom Christiansen’s “The Idiot's Guide to Solving Perl CGI Problems” at www.perl.com/CPAN-local/doc/FAQs/cgi/idiots-guide.html for common solutions to fixing broken CGI scripts.
Network Connectivity Utilities As you hone your network debugging skills, your toolset should include utilities to test network connectivity. You learned that error messages like “Server not responding” indicate a connectivity problem. But to determine a problem’s scope and location, you need to do further debugging. There are many network utilities available, and each serves a particular purpose. In the following sections, you’ll learn when to use various diagnostic tools for identifying and resolving Internet problems. Specifically, you’ll learn about the following utilities:
ARP
Netstat
Ping
winipcfg
ipconfig
Trace Routing utility
Network analyzer utilities
The i-Net+ exam only covers when to use these utilities, not how to use them. If you want to learn more about the exact syntax for these utilities, see the Network+ Study Guide (Sybex, 2002), which covers them in more detail.
The ARP Utility The Address Resolution Protocol (ARP) translates TCP/IP addresses to the MAC (media access control) addresses used by local network devices such as Ethernet cards. The ARP utility is primarily useful for resolving duplicate IP addresses. For example, suppose a workstation receives its IP address from a DHCP (Dynamic Host Configuration Protocol) server and accidentally receives the same address as another workstation. When you try to ping that workstation, you get no response. The reason you fail to get a response is that your workstation is trying to determine the MAC address of the destination computer, and it can’t do so because two machines are reporting that they have the same IP address. To solve this problem, you can use the ARP utility to view your local ARP table and see which TCP/IP address is resolved to which MAC address.
netstat Using netstat is a great way to see the TCP/IP connections (both inbound and outbound) on your machine. You can also use it to view packet statistics (similar to the MONITOR.NLM utility on a NetWare server console), such as how many packets have been sent and received, the number of errors, and so on. When used without any options, netstat produces output similar to that in Figure 8.12, which shows all the outbound TCP/IP connections (in the case of Figure 8.12, a web connection). The netstat utility, used without any options, is particularly useful in determining the status of outbound web connections. FIGURE 8.12
Output of the netstat command without any switches
In Figure 8.12, the Proto column lists the protocol being used. Because this is a web connection, the protocol is TCP. The Local Address column lists the source address and the source port. In this case, the source address is ws and the source port is 3020. The foreign address is the web site Precision.GUESSWORK.Com, at the default HTTP port of 80. The state is shown to be closed—the network connection has finished.
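Because netstat output is plain columnar text, it is easy to post-process. The sketch below splits one line into its four columns; the sample line is hypothetical, and the exact layout and state names vary by operating system:

```python
def parse_netstat_line(line):
    """Split one line of netstat output into protocol, local address,
    foreign address, and state. Sketch; column layout varies by OS."""
    parts = line.split()
    if len(parts) != 4:
        return None
    proto, local, foreign, state = parts
    host, _, port = local.rpartition(":")
    return {"proto": proto, "local_host": host, "local_port": port,
            "foreign": foreign, "state": state}

# A hypothetical line resembling the connection in Figure 8.12:
row = parse_netstat_line("TCP ws:3020 Precision.GUESSWORK.Com:80 CLOSE_WAIT")
```

Pulling out the foreign address and state this way makes it simple to list, say, every outbound web connection still open on a machine.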
The Ping Utility Ping is the most basic TCP/IP utility and is included with most TCP/IP stacks for most platforms. Windows 95/98 and NT are no exception. In most cases,
Ping is a command-line utility (although there have been some GUI implementations). You use the Ping utility for two primary purposes:
To find out if you can reach a host
To find out if a host is responding
You can use both Ping and tracert to see if you can reach a host. You generally will use Ping if you just want to see if the host is responding to you. If you also want to see how your packets are routed over the Internet to the host, use tracert.
The winipcfg and ipconfig Utilities Of all the TCP/IP utilities that come with Windows 95/98 or NT, the IP configuration utilities are probably the most overlooked. These utilities display the current configuration of TCP/IP on a workstation, including the current IP address, DNS configuration, WINS configuration, and default gateway. winipcfg is the Windows 95/98 version of this utility, and ipconfig is the Windows NT version and is command-line driven. The winipcfg utility comes in handy when you’re resolving TCP/IP address conflicts and configuring a workstation. For example, if a workstation is experiencing Duplicate IP Address errors, you can run winipcfg to determine its IP address. Also, if the address was obtained from a DHCP server, you can release it and obtain a new IP address by clicking the Renew All button. You can see in Figure 8.13 that it is easy to view your basic TCP/IP information simply by running winipcfg. FIGURE 8.13
The winipcfg utility’s IP configuration dialog box
ifconfig In Unix, ifconfig is a command used to configure network interfaces during the boot process. Only a few of its forms are used for troubleshooting purposes:

ifconfig Displays the status of a particular network interface.

ifconfig -a Displays the status of all network interfaces.

ifconfig debug Enables debugging mode. Information is placed in the console error log.

ifconfig -debug Disables debugging mode.
The Trace Routing (tracert) Utility Have you ever wondered where the packets go when you send them over the Internet? The TCP/IP Trace Routing (tracert) command-line utility will show you every router interface a TCP/IP packet passes through on its way to a destination. To use tracert, at a Windows 95/98 or NT command prompt, type tracert, a space, and the DNS name or IP address of the host for which you want to find the route. The tracert utility responds with a list of all the DNS names and IP addresses of the routers the packet is passing through on its way. Additionally, tracert indicates the time it takes for each attempt. Figure 8.14 shows sample tracert output from a workstation connected through an ISP (PacBell Internet, CA, in this case) to the search engine Yahoo! As you can see, the packet bounces through several routers before arriving at its destination. This utility is useful if you are having problems reaching a web server on the Internet and you want to know if a WAN link is down or if the server just isn’t responding. You can use tracert to ascertain how many hops a particular host is from your workstation. This is useful in determining how fast a link should be. Usually, if a host is only a couple of hops away, access should be relatively quick.
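tracert output is also easy to mine for hop counts and timings. The hop line below is invented, but it follows the general shape of Windows tracert output (hop number, three round-trip times, router name):

```python
import re

def parse_tracert_hop(line):
    """Extract the hop number and the round-trip times (in ms) from
    one line of tracert output. Sketch; formats vary by platform."""
    m = re.match(r"\s*(\d+)\s", line)
    if m is None:
        return None
    hop = int(m.group(1))
    times = [int(t) for t in re.findall(r"(\d+) ms", line)]
    return hop, times

# A hypothetical hop line:
hop, times = parse_tracert_hop(
    "  3    12 ms    11 ms    14 ms  core1.example.net [206.13.28.1]")
```

Counting parsed hops tells you how many routers away the host is, and a hop where the times jump sharply suggests where a congested or slow WAN link lies.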
Network Analyzer Utilities Network analyzers, like Network Associates’s NetXRay, are tools that monitor and analyze network traffic. More specifically, they listen in on all the traffic on a certain network segment and provide sophisticated reporting tools that display the traffic in both an overall and a packet-by-packet level. Network analyzers are useful for seeing what is happening at the protocol level.
If all your computer networks are switched, a network analyzer can’t listen in on the network conversations of other computers. Switched networks route traffic only to their recipients. If you need to analyze what is going on with a certain computer, you could add a small hub to the network, with the hub uplinking to the switch and both the target computer and the computer running the network analyzer plugged into the hub.
Network analyzers allow you to filter and capture packets. This is great for debugging failed network connections—you can set up a filter to look at a conversation between two computers and then examine the exact content of this conversation. You can see two conversations being captured in Figure 8.16. The first two connections show the first conversation. The workstation (192.168.1.2) and chat.excite.com (198.3.98.70) are sending small packets to each other. This is probably Excite’s PAL client checking with the server to see if any of the workstation’s buddies are online. The rest of the connections signify the activity of checking Usenet newsgroups—although you can’t see all of it in the figure, the workstation is resolving the IP address for news.pacbell.net and then initiating an NNTP connection to news.pacbell.net. FIGURE 8.16
Viruses One of the most frustrating things to debug is a virus that is causing havoc with your systems. A virus is a program that causes malicious change in your computer and makes copies of itself. Sophisticated viruses encrypt and hide themselves to thwart detection. These stealthy viruses can be tricky to detect, and unfortunately, the Internet makes it easy to transmit them from one network to another. There are tens of thousands of viruses that your computer can catch. Viruses found spreading among the public are referred to as being “in the wild.” Research laboratories and universities study viruses for commercial and academic purposes. These viruses are known as being “in the zoo,” or not out in the wild. Every month the number of viruses in the wild increases. Viruses can be little more than hindrances, or they can shut down an entire corporation. The types vary, but the approach to handling them does not. You need to install virus protection software on all computer equipment. This is similar to vaccinating your entire family, not just the children who are going to summer camp. Workstations, personal computers, servers, and firewalls all must have virus protection, even if they never connect to your network. They can still get viruses from floppy disks or Internet downloads.
Types of Viruses Several types of viruses exist, but the two most popular are macro and boot sector. Each type differs slightly in the way it works and how it infects your system. Many viruses attack popular applications, such as Microsoft Word, Excel, and PowerPoint, which are easy to use and for which it is easy to create a virus. Because writing a unique virus is considered a challenge to a bored programmer, viruses are becoming more and more complex and harder to eradicate.
Macro Viruses A macro is a script of commonly enacted commands that are used to automatically perform operations without a user’s intervention. Macro viruses use the Visual Basic macro scripting language to perform malicious or mischievous functions in Microsoft Office products. Macro viruses are among the most harmless (but also the most annoying). Because macros are easy to write, macro viruses are among the most common viruses and are frequently found in Microsoft Word and PowerPoint. They affect the file you are working on. For example, you might be unable to save the file even though the Save function is working, or you might be unable to open a new document—you can only open a template. These viruses will not crash your system, but they are annoying. Cap and Cap A are examples of macro viruses.
Boot Sector Viruses Boot sector viruses get into the master boot record. This is track one, sector one on your hard disk. No applications are supposed to reside there. At boot up, the computer checks this section to find a pointer for the operating system. If you have a multi–operating system boot between Windows 95/98, Windows NT, and Unix, this is where the pointers are stored. A boot sector virus will overwrite the boot sector, thereby making it look as if there is no pointer to your operating system. When you power up the computer, you will see a “Missing operating system” or “Hard disk not found” error message. Monkey B, Stealth, and Stealth Boot are examples of boot sector viruses.
These are only a few of the types of viruses out there. For a more complete list, see your antivirus software manufacturer’s web site, or go to Symantec’s web site at www.symantec.com/.
The “False-Positive” Virus A “false-positive” virus is actually an alert given by antivirus software that a file may be infected when the file is actually clean. The problem occurs because newer engines try to detect even unknown viruses, and sometimes an application or the operating system performs tasks in just the right sequence to set off a warning. It’s usually a good idea to visit the antivirus vendor’s web site to see if it is a common issue. If not, try using another vendor’s product. If you are still not convinced that the file is virus free, you can send the file to the antivirus manufacturer for verification.
Updating Antivirus Components A typical antivirus program consists of two components:
Definition files The definition files list the various viruses, their types, and their footprints, and they specify how to remove specific viruses. More than 100 new viruses are found in the wild each month. An antivirus program would be useless if it did not keep up with all the new viruses on a weekly basis.

Engine The engine accesses the definition files (or database), runs the virus scans, cleans the files, and notifies the appropriate people and accounts. Eventually, viruses become so sophisticated that a new engine and new technology are needed to combat them effectively.
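At its simplest, the engine’s job is to look for each definition’s footprint in the files it scans. The sketch below uses invented byte signatures; real definition files and engines are far more sophisticated, and heuristic scanning goes beyond fixed footprints entirely:

```python
def scan_bytes(data, definitions):
    """Report which known signatures (byte 'footprints') appear in
    the data. The definitions used here are made up for
    illustration."""
    return [name for name, signature in definitions.items()
            if signature in data]

# Hypothetical definition file contents:
DEFS = {"Demo.Virus.A": b"\xde\xad\xbe\xef"}
```

This also shows why the engine and the definitions must stay in step: a new class of virus may need matching logic the old engine simply doesn’t have.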
Heuristic scanning is a technology that allows an antivirus program to search for a virus even if there is no definition for that specific virus. The engine looks for suspicious activity that might indicate a virus. Be careful if you have this feature turned on. A heuristic scan might detect more than viruses, like a software installation (that you actually want to happen).
For an antivirus program to be effective, you must upgrade, update, and scan in a specific order:

1. Upgrade the antivirus engine.

2. Update the definition files.

3. Create an antivirus emergency boot disk.

4. Configure and run a full on-demand scan.

5. Schedule monthly full on-demand scans.

6. Configure and activate on-access scans.

7. Update the definition files weekly.

8. Make a new antivirus emergency boot disk weekly.

9. Get the latest update when fighting a virus outbreak.

10. Repeat all steps when you get a new engine.
If you think this is a lot of work, you are right; however, not doing it can mean a lot more work and a lot more trouble.
To Run Antivirus Software or Not to Run One company thought that good antivirus software wasn’t necessary because of the cost involved. This company found out one day that a virus had not only infected one client machine, but that the client had sent out an infected e-mail to every person in the company. Everyone who received the e-mail opened it, every computer became infected, and because every computer connected to the network, the company servers were in turn infected. The net result was around three days of downtime; most work went undone because the staff couldn’t use their computers, and a substantial amount of revenue was lost.
Upgrading an Antivirus Engine A virus engine is the core program that runs the scanning process; virus definitions are keyed to an engine version number. For example, a 3.x engine will not work with 4.x definition files. When the manufacturer releases a new engine, consider both the cost to upgrade and the added benefits.
Before installing new or upgraded software, back up your entire computer system, including all data.
Updating Definition Files Every week, you need to update your list of known viruses—called the virus definition files. You can do this manually or automatically through the manufacturer’s web site. As you can see in Figure 8.17, the antivirus software goes out to a central server and grabs any new antivirus information. In this way, the antivirus folks don’t get too far behind the virus writers. If you have a very large network, consider upgrading your clients’ virus definition files through the login script or some software distribution software, such as Microsoft’s Systems Management Server (SMS) or Novell Application Launcher (NAL), to cut down on bandwidth usage. Antivirus updates don’t seem to cause many problems, so use them!
Scanning for Viruses An antivirus scan is the process in which an antivirus program examines the computer suspected of having a virus and eradicates any viruses it finds. There are two types of antivirus scans:
On-demand
On-access
An on-demand scan searches a file, a directory, a drive, or an entire computer. An on-access scan checks only the files you are currently accessing. To maximize protection, you should use a combination of both types.
On-Demand Scans An on-demand scan is a virus scan initiated by either a network administrator or a user; it can be started manually or on a schedule. Typically, you schedule a monthly on-demand scan for workstations and a weekly scan for servers, but you should also do an on-demand scan in the following situations:
Before you initiate an on-demand scan, be sure that you have the latest virus definitions.
When you encounter a virus, scan all potentially affected hard disks and any floppy disks that could be suspicious. Establish a cleaning station, and quarantine the infected area. The support staff will have a difficult time if a user continues to use the computer while it is infected. Ask all users in the infected area to stop using their computers. Suggest a short break. If it is lunchtime, all the better. Have one person remove all floppies from all disk drives. Perform a scan and clean at the cleaning station. For computers that are operational, update their virus definitions. For computers that are not operational or are operational but infected, boot to an antivirus emergency boot disk. Run a full scan and clean the entire system on all computers in the office space. With luck, you will be done before your users return from lunch.
On-Access Scans An on-access scan runs in the background when you open a file or use a program. For example, an on-access scan can run when you do any of the following:
Insert a floppy disk
Use FTP to download a file
Receive e-mail messages and attachments
View a web page
The scan slows the processing speed of other programs, but it is worth the inconvenience. On newer machines, the slowdown may not even be noticeable. A relatively new form of malicious attack makes its way to your computer through ActiveX and Java programs (applets). These are miniature programs that are embedded in web pages and that you download and run on your local machine. Most ActiveX and Java applets are safe, but some contain viruses or Trojan horses. A Trojan horse allows a hacker to look at everything on your hard drive from a remote location without your knowledge, and could actually send data from your hard drive to the hacker. Some Trojan horses have been found that cause a denial of service (DoS) attack against a particular site. Be sure that you properly configure the on-access component of your antivirus software to check and clean for all these types of attacks.
Many programs will not install unless you disable the on-access portion of your antivirus software, and some installations can become corrupted or cause problems with on-access scanning enabled. This is dangerous if the program has a virus. Your safest bet is to do an on-demand scan of the software before installation. Disable on-access scanning during installation, and then reactivate it when the installation is complete.
Emergency Scans In an emergency scan, only the operating system and the antivirus program are running. An emergency scan is called for after a virus has invaded your system and taken control of the machine. In this situation, insert your antivirus emergency boot disk and boot the infected computer from it. Then scan and clean the entire computer. If the computer connects to your network, it is imperative that you begin a scan of the server that could have been infected. If a virus is detected on that server, it’s time to run a scan of every server on the network or you could wind up with another infection of the same virus.
If you don’t have your boot disk, go to another computer and create one.
Software Patches
Patches, fixes, service packs, and updates are all the same thing—free software revisions. They are intermediary solutions until a new version of the product is released. A patch may solve a particular problem, as does a security patch, or change the way your system works, as does an update. You can apply a so-called hot patch without rebooting your computer; in other cases, applying a patch requires that the server go down. You should be aware of the reasons people use patches, the sorts of software that people patch, and the problems that buggy upgrades can cause. We’ll discuss these topics in the following sections:
What to consider when deciding if you should install a patch
After looking through these sections, you should have a good understanding of why people use patches—especially security, encryption, and browser updates—and what sorts of problems updates can cause.
Is It Necessary? Because patches are designed to fix problems, it would seem that you would want to download the most current patches and apply them immediately. That is not always the best thing to do. Patches can sometimes cause problems with existing, older software. Different philosophies exist regarding the application of the newest patches. The first philosophy is to keep your systems only as up-to-date as necessary to keep them running. This is the “If it ain’t broke, don’t fix it” approach. After all, the point of a patch is to fix your software. Why fix it if it isn’t broken? The other philosophy is to keep the software as up-to-date as possible because of the additional features that a patch will sometimes provide. You must choose the approach that is best for your situation. If you have little time to devote to chasing down and fixing problems, go with the first philosophy. If you always need the latest and greatest features, even at the expense of stability, or you work in an environment that places a high priority on security and a security patch has been released, go with the second.
Where to Get Patches Patches are available from several locations:
The manufacturer’s web site
The manufacturer’s CD or DVD
The manufacturer’s support subscriptions on CD or DVD
The manufacturer’s bulletin (less frequently an option)
You’ll notice that, in every case, the source of the patch, regardless of the medium being used to distribute it, is the manufacturer. You cannot be sure that patches available through online magazines, other companies, and shareware web sites are safe. Patches for the operating system are also sometimes included when you purchase a new computer.
RealJukebox Surreptitiously Monitors Users—A Patch in the Making The New York Times reported that “RealNetworks’ popular RealJukebox software for playing CD's on computers surreptitiously monitors the listening habits and certain other activities of people who use it and continually reports this information, along with the user’s identity, to RealNetworks.” After this was publicized, RealNetworks released a patch that would prevent this behavior. Microsoft has similar software that also monitors what songs people listen to, although Microsoft doesn’t link this feature with forced user registration.
Desktop Security The Internet has made it easier to communicate, and unfortunately, this means it has made it easier for our computers to be monitored or sabotaged by others. Users are naturally concerned about this, and so when a new security alert goes out, many download a patch that fixes the new security problem. Some desktop security patches change the basic core of the operating system, such as the way it handles network logons and permissions. In Windows, this usually means changes to central DLLs and Registry information. As anyone who has upgraded a number of systems knows, change the Registry at your own risk! At the very least, back up your configuration files before you update your core system. That way, if the new DLLs don’t work, you can revert to your old system.
Encryption Levels Encryption products use mathematical formulas to hide real information and render it useless without a secret electronic key. You rely on encryption to keep your data safe from unauthorized eyes. Those people who want to steal your data also want to be able to decrypt your information. Thus, when there are flaws in encryption software, or previous levels of security become insufficient, it is necessary for you to update your encryption software and hope your previously encrypted data isn’t in the hands of your enemies.
Any encryption scheme that is untested and unpublished is probably vulnerable to being broken. Trust peer-reviewed encryption like that used in PGP (www.pgp.com/).
For example, Allaire’s ColdFusion web development system used a private encryption system to encrypt users’ applications so the applications’ source code could not be viewed but would still run. This encryption allowed companies to distribute their ColdFusion applications without worrying that someone would look at their raw code. However, this encryption level proved insufficient and it was cracked. Allaire is studying new encryption options, but this won’t help secure all of the now-exposed commercial ColdFusion applications. The United States regulates the strength of encryption that U.S. companies export. Netscape was only allowed to export 40-bit encryption with Navigator 4.08, but due to a changing legal environment, it has been able to export 56-bit encryption since Communicator 4.7. International companies that use Netscape browsers will likely want the added encryption.
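A worked example of what those export limits meant: each additional key bit doubles the brute-force search space, so the gap between 40-bit export-grade keys and the 128-bit keys used in domestic U.S. browser versions is enormous.

```python
def keyspace(bits):
    """Number of possible keys for a key length in bits."""
    return 2 ** bits

# A 40-bit key can be brute-forced by trying at most 2**40
# (about 1.1 trillion) keys; adding bits multiplies the work.
ratio = keyspace(128) // keyspace(40)  # 2**88 times more keys to try
```

This is why a key length that was once adequate can become insufficient: attackers' hardware gets faster, but the keyspace of already-encrypted data stays fixed.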
Web Browsers and E-Mail Clients In the mad rush to market, software companies often release buggy software or software with missing functionality. They don’t want to, but they are afraid they’ll lose millions of dollars by releasing their products too late and increasing the market share of competing products. So many software companies would rather release a product, stay in business, and then issue a patch than release a product after their competitors do. You can see just a few of the patches Microsoft deems to be critical in Figure 8.18. There are multiple security patches for Internet Explorer.
IE is not alone in security problems. Netscape has also had its fair share of security bugs and patches.
Netscape and Microsoft have both often released buggy products (with their integrated web browsers/e-mail clients) and then released a series of patches to fix them. Typically they will add features without fixing all the potential bugs.
Screen shot reprinted by permission from Microsoft Corporation.
When upgrading your browser, you should take the same care as when you are upgrading your operating system. Back up mail, bookmarks/favorites, and preferences so you can go back to the old browser if the upgrade corrupts them, especially if you are updating client software across a network. If you think that bookmarks/favorites are unnecessary, just wait until a client loses them and you’ll have an irate client on your hands!
Often your ISP’s technical support line can assist you with upgrading your Internet client software.
Whenever you upgrade software, you are either going to upgrade to a newer version or you’re going to replace the current one with another vendor’s product. Usually, the first option is the easiest, because vendors typically provide some form of backward compatibility, or the capability to interpret the previous version’s format. In this section, we are going to take a look at the pitfalls that accompany both scenarios.
Upgrading to a Newer Version One of the most frustrating aspects for any network administrator is that software is upgraded on a regular basis. If it isn’t, technology just passes you by and it’s almost impossible to catch up. On some occasions, such as working with a client who stays on the leading edge of technology, you’re stuck performing upgrades much more frequently than you’d prefer; however, if you take your time in the testing and planning phases, your rollout will be a lot easier. The first step to any upgrade is to test the software on a standalone computer. Some of the questions that you need to answer are:

What are the minimum and suggested hardware requirements? You need to test the response at the minimum requirements as well as the suggested hardware requirements to get a feel for problems that may be encountered at your site. If necessary, you need to schedule hardware upgrades for your clients well before you intend to roll out the new software.

What problems can be encountered during installation? Install the product several times on the same machine, and document any problems you encounter. If you have to reinstall because of a faulty installation, what extra steps, such as manually editing the Windows Registry, did you have to take to get a good installation?

Does the software support previous versions? If the software claims it does, try importing a few older files into the new software. Did you have to reformat anything, or was it a clean import? If it doesn’t, be prepared to find some way of accommodating your clients (or safe passage out of the building). Some of the newer applications allow you to have two different versions on one machine.
Have you found any compatibility issues with other applications? Some software packages simply won’t work if you have a specific application loaded on your computer. Test with the company’s standard software suite and then test with other applications that you know your clients use frequently. Visit the vendor’s web site. Most vendors now have a technical support page where you can see some of the common problems and the associated resolutions. If you’re upgrading a year after the software was released, it’s a good bet that you’ll find some useful information and be better prepared when you encounter those problems.

After you’ve determined the pitfalls of upgrading a computer, you need to develop a plan for your deployment. The first step is to decide if you’re going to install the old-fashioned way and run around to each machine and install the software. The smarter route to go, if your company has this option, is to use some sort of software distribution product, such as Microsoft’s Systems Management Server (SMS) or Novell’s Network Application Launcher (NAL). If you have one of these available, you will need to test your distribution and document any problems beforehand. Once you’ve gone through these steps, it’s time to plan out the actual deployment. It’s not a good idea to do more than 20 clients a day unless your rollout is going extremely well. This allows you plenty of time to fix any issues before the next round of installations. It also helps to cut down on bandwidth usage if you’re deploying over the network, and it keeps a small problem during one installation from snowballing into a major one.
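The pacing advice above translates directly into a deployment calendar. A trivial sketch; the 20-clients-a-day cap is this book's suggestion, and you should adjust it to your own environment:

```python
import math

def rollout_days(total_clients, clients_per_day=20):
    """Days needed to deploy when installations are capped per day.

    Capping the daily count leaves time to fix problems before the
    next round and limits network bandwidth used by the deployment.
    """
    return math.ceil(total_clients / clients_per_day)
```

For example, a 250-seat office at the default cap needs 13 working days, which is worth knowing before you promise a completion date.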
Replacing with a Different Vendor Product Replacing software with a different vendor’s product incorporates all of the steps we listed in the previous section, but there are two more tasks that you need to perform: compatibility verification and client training. The most difficult part of switching vendor products is compatibility between the old software and the new software. If the application is similar to an office suite, more than likely your clients are going to need to import documents into the new product. For example, if your company is replacing Corel WordPerfect with Microsoft Word, you’ll need to see what problems are encountered during the import. Typically, formatting issues arise. If the software doesn’t allow you to import the previous product’s data, you’ll need to determine if the cost of re-inputting the information is worth the expense.
Client training is often an overlooked aspect of any software deployment. Unfortunately, it is also one of the most important issues if you want your users to be able to use the software efficiently. Some clients can learn by reading, but how many of them have time? Because you’re using a different vendor’s product, it may be more difficult for your clients to figure it out on their own.
Legacy Clients
As you debug and troubleshoot Internet problems, you may come across legacy clients. Legacy client is a term used to describe software or hardware that is quite old. In general, there are three aspects of legacy clients that you should know for the exam:
Compatibility issues
Troubleshooting and performance issues
Determining the version of a client
People use legacy software when it is more expensive to upgrade than to keep using the old systems. Although you may encounter whole rooms of Windows 3.11 legacy computer systems, this shouldn’t concern you. If the systems weren’t fairly stable, they wouldn’t have lasted this long.
On the other hand, if they weren’t patched with any Year 2000 (Y2K) patches, you should be concerned. Much of the software in use today required some form of Y2K patch or remedy; you should ensure that all of your legacy clients have been taken care of or strange things could happen.
Compatibility Issues Legacy software uses technology that is older than the latest trends coming out of Silicon Valley. If you want to use new software alongside your old software, you may have a compatibility issue, which means that different technologies may not work with each other. A good example is trying to run some old MS-DOS programs on a Windows Me machine. It simply won’t work. In terms of the Internet, the most common legacy client is the old Windows 3.1 TCP/IP Socket system. As you learned in Chapter 2, sockets are mechanisms clients and servers use to carry on multiple conversations
with each other over the TCP/IP protocol. In Microsoft Windows operating systems, a file called WINSOCK.DLL provides access to the TCP/IP protocol. This means that individual client programs, like Netscape, don’t have to understand anything about your modem; they just talk to the operating system and the operating system talks to the modem. If your client program isn’t compatible with the version of Winsock you are running, your client program won’t run. The flip side to legacy compatibility is that some programs written to use the legacy system will simply not work with newer systems.
Performance There are two aspects of legacy performance:

Speed gain: Software has bloated over time. Many new software versions are actually slower than their predecessors. Windows 3.1, for example, runs faster than Windows 98 on some hardware. Some newer browser versions are noticeably slower than older ones.

Speed loss: Newer technologies do increase the maximum speed of networks. Newer versions of Winsock come in both 16- and 32-bit flavors, for example, whereas 3.11 Winsock systems are all 16-bit. Obviously, 32-bit processing is faster than 16-bit processing.

Before asking people to give up their legacy system, find out if it is faster than the one you want them to use. If it isn’t, you’ll need to find some other benefits for using the newer system—usually compatibility issues with other software that they want to use will convince them.
Version Checking Legacy software is often not compatible with newer versions of other software; therefore, if you have legacy software, it is useful to know exactly what type of software you are running and exactly what version of other software you have installed. You can then determine if the legacy software and the new software will work together.
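Checking versions programmatically usually comes down to comparing dotted version strings numerically rather than as text. A short sketch of that comparison:

```python
def parse_version(version):
    """Turn a dotted version string like '1.10.2' into an integer tuple.

    Comparing raw strings gets the ordering wrong: as text, '1.10'
    sorts before '1.9', even though release 1.10 is newer than 1.9.
    """
    return tuple(int(part) for part in version.split("."))

def meets_minimum(installed, required):
    """True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)
```

Not every vendor numbers releases this way, so confirm the product's own numbering scheme before trusting a purely numeric comparison.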
Bugs are inevitable, so troubleshooting is important. We discussed some methods of ensuring that your web site has as few bugs as possible. The first step is to develop a written testing method that allows you to follow a standard procedure each time that you test a new implementation of your web site. You saw the importance of creating a private testing area that keeps your production web site functioning while you work on revisions. A private testing area could reside on your computer’s hard drive, in a special area on the main web server, or on a separate (staging) server.

Storyboards allow you to document your web site’s logical layout. This includes links, file locations, page design, scripts, and so forth. A storyboard is an excellent troubleshooting tool, especially if you haven’t looked at a site in a while. When you test your links and the appearance of your web pages, the storyboard will have everything listed. Any revisions that are made while you are designing modifications need to be incorporated into your storyboard. You are also able to ensure that your site meets your company’s design policies.

Pinpointing the cause of a problem and formulating a solution is similar to detective work: Take all of the information and piece it together to find the problem. Along the way, you will find many common problems and will have solutions, but when you can’t find the error, you need to arm yourself with a plan of attack. We discussed several troubleshooting steps that you can use to analyze the problem and formulate a solution. You may need to use one of the different troubleshooting tools that are available to you, such as network analyzers and IP diagnostic tools. Network analyzers provide you with a list of the actual data transmissions, and can be valuable when you are experiencing bandwidth issues.
IP diagnostic tools—winipcfg, ipconfig, ifconfig, tracert, netstat—are all useful in checking connectivity issues and confirming whether a problem is with your server and/or connection equipment, or with the ISP. Some other useful troubleshooting clues can be found in client error codes. There are a lot of codes that can’t possibly be memorized, but you can go to the vendor’s web site to see what information is available for fixing a problem. If it’s a problem with the browser, simple HTTP response codes can give you some insight (even if sometimes cryptic) into the problem. HTTP error codes usually indicate if a server (or the server’s network) is down, if the problem could be on the local machine, or possibly a problem with the web site itself. To assist in the latter, we took a look at how web servers fulfill requests,
how file and directory permissions can cause a problem, and the wealth of information found in log files.

One of the worst fears of any network administrator is having problems with their web site, so we would have been remiss if we hadn’t discussed virus protection. Viruses seem to have been in the news quite frequently in the past year, so it’s helpful to know the types of viruses that exist. You saw that macro viruses are written in Visual Basic and usually target office suites, such as Microsoft Office. Boot sector viruses actually destroy the boot sector, which is the first track on the hard drive and contains file location information. On occasion, you’ll run into a false positive, where the virus software issues a warning that you may have a virus, but you actually don’t. Antivirus software scans for viruses, and provides for an on-demand scan that you initiate manually, as well as an on-access scan that runs automatically whenever you access a file. Unfortunately, antivirus software is useless if you don’t update the engine and the definition files on a periodic basis, and it’s highly recommended that you do so every week.

In addition to updating antivirus software, you need to occasionally install patches on other applications—and even the operating system itself. A patch is a fix to a known bug in software, and some patches are quite necessary. Others may not apply to your situation, and you can disregard them; however, to ensure desktop security, it is wise to keep up with the latest security patches for your operating system and web browser. Viruses aren’t the only threats that take advantage of flaws in software; hackers have also been known to exploit them to steal valuable data such as credit card information. Sometimes a patch isn’t the answer, and you must then contemplate upgrading your software and/or operating system. We discussed several factors that should be considered when deciding whether or not to upgrade an application.
Testing the new software plays heavily into the equation, but compatibility issues are also of major concern. On occasion, you may need to replace an existing product with another vendor’s application, which may require client training; however, upgrading isn’t always the answer, as you will find a need to keep legacy clients around. A legacy client is an older operating system or application that still performs the task required. You cannot always find replacement software that will meet all of your requirements, so you must ensure that you can support the legacy clients you keep.
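As a quick reference for the HTTP response codes covered in this chapter, the first digit of the code alone tells you which party to suspect; a few lines of code make the mapping explicit:

```python
def status_class(code):
    """Map an HTTP status code to the meaning of its hundreds range."""
    meanings = {
        1: "informational",
        2: "success",         # e.g., 200 OK
        3: "redirection",     # e.g., 301 Moved Permanently
        4: "client error",    # e.g., 404 Not Found
        5: "server error",    # e.g., 500 Internal Server Error
    }
    return meanings.get(code // 100, "unknown")
```

A 4xx code points you toward the request (the URL, the client, permissions), while a 5xx code points you toward the server, its CGI scripts, or its applications.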
Exam Essentials

Know what is involved in pre-launch testing. Before you launch your site, check to make sure it is working. At its most basic, this testing should include checking to make sure all the links work and that pages load correctly. More advanced testing includes checking to make sure the pages conform to policies on download time and content type, and that the servers have enough capacity to serve pages.

Identify the different diagnostic tools and when to use them. You can use various diagnostic tools to identify and resolve Internet problems. For the exam, you should know when to use each tool. Use ping, for example, when you want to see if another computer is on the network, but use tracert to map the network topology to another host.

Understand the importance of patches and updates. Patches and updates are software releases that fix and upgrade existing software. For the exam, you should be able to recognize update-related problems, such as e-mail not working properly after a browser has been updated.

Know what legacy clients are and some of the drawbacks of using them. Legacy clients are old versions of software, such as the text-mode browser called lynx or Trumpet Winsock for Windows 3.11. For the exam, you should understand that although legacy clients do have drawbacks, they serve a purpose for the people who use them.
Key Terms
Before you take the exam, be certain you are familiar with the following terms: access log
Review Questions

1. Which utility can you use to find out how many hops it takes to get from your computer to another host on the network?
A. host
B. ARP
C. hop
D. tracert

2. What are network analyzers good for?
A. Seeing if your network is up
B. Monitoring the utilization of your network
C. Counting the number of hops through a network
D. Checking and modifying your Internet settings

3. What would you use winipcfg for?
A. To see if your network is up
B. To monitor the utilization of your network
C. To count the number of hops through a network
D. To check and modify your TCP/IP configuration

4. What type of web sites should have a pre-rollout methodology?
A. Small sites
B. Medium and large sites
C. All sites
D. Sites where the cost of failure is high

5. What is a storyboard?
A. A document that says what each page on the site should do
B. A cardboard display of the site
C. A version of the site used by graphic designers
D. B and C

6. What is a staging server?
A. A small server that developers use to try out ideas in private
B. The main server that acts as a stage for the world
C. A server that serves as a proxy between the main server and people outside the firewall
D. A backup server in case the main server goes down

7. What method of copying files from the staging server to the main server is least likely to introduce an error?
A. Manual FTP because it is simple and won’t break
B. Choosing Save As in your text editor (such as Notepad)
C. An automated publishing tool such as NetObjects Fusion
D. Symlinks

8. Human quality control is _____.
A. Vital to reduce the amount of bugs
B. Wasted effort if you have automated tools
C. A function of the salaries of your designers
D. Vital to network connectivity

9. What is the standard screen resolution?
A. 31 dots per inch
B. 72 dots per inch
C. 640 x 480
D. 300 dots per inch

10. If you are testing CGI scripts, where would you look for syntax errors?
A. In the access log
B. On the console
C. In the browser window
D. In the error log

11. What role does load-testing software play?
A. It extrapolates how your server will do under load.
B. It places demands on your server and measures the effect.
C. It examines your log files and flags peak usage.
D. It warns you when your web server is overloaded.

12. How can you find out what browser someone was using when they received an error message?
A. Look in the access log.
B. Look in the error log.
C. Ask the user.
D. A and C

13. You can telnet into your web server and change your files using command-line tools. Which of the following is generally a poor use of this method?
A. Checking file permissions
B. Checking scripts for syntax errors
C. Looking at log files
D. Editing documents

14. What can you do if you don’t have enough bandwidth to distribute your software over the Internet?
A. Nothing—send it by CD.
B. Add extra bandwidth.
C. Ask other sites to mirror your software.
D. Either B or C.

15. If you try to go to a web site and the browser gives you the message, “The server xxxx does not have a DNS entry,” what is the reason?
A. The URL you have typed is invalid.
B. The web server reset the connection.
C. Your name server couldn’t resolve the IP address of the web server.
D. The web site is overloaded.

16. You try to go to a web site and the browser gives you the error, “Server not responding.” What is going on?
A. The client sends off a request to the server’s IP address and specified port, but the server doesn’t send a response back.
B. Your name server is nonresponsive.
C. You have the wrong DNS information cached. Reboot your computer to clear the cache.
D. There is an ARP conflict between IP addresses.

17. You try to go to a web site and the browser gives you the error, “File contains no data.” What is the most likely culprit?
A. The file is missing on the web server.
B. Your client needs to support XML.
C. There is a CGI error.
D. There is no connectivity between you and the server.

18. What do HTTP response codes in the 3xx range mean?
A. There has been a server error.
B. There has been a client error.
C. The request has been redirected.
D. Everything went as planned.

19. What do HTTP response codes in the 5xx range mean?
A. There has been a server error.
B. There has been a client error.
C. The request has been redirected.
D. Everything went as planned.

20. Which device is also known as a concentrator?
A. Router
B. Switch
C. Hub
D. Brouter
Answers to Review Questions

1. D. Host and hop are not utilities, so these answers are incorrect. ARP is used to resolve IP addresses to hardware (MAC) addresses. tracert shows the path your TCP/IP packets take to go to another host.

2. B. Network analyzers display many things about a network segment, including how much of its bandwidth capacity has been utilized.

3. D. winipcfg will display your basic network settings and let you get a new IP address if you are using DHCP. Alternately, you can use ipconfig on Windows NT/2000 machines and ifconfig on Unix machines.

4. C. All sites benefit from some sort of pre-rollout methodology to find bugs. Different size sites will simply use different methodologies.

5. A. Everyone can use the storyboard to see what each page should do, and it allows a team to easily collaborate on a project.

6. A. People publish to a staging server to see how their changes work before publishing to the real server. Proxy servers act as a proxy between the main server and the people inside the firewall. Backup servers are an excellent solution to ensure that you can perform a restore of a file or the entire server should something happen.

7. C. Publishing tools will rewrite URLs and check to make sure every resource is published. Manual file copying means manual editing of URLs, which on larger web sites is almost impossible and extremely time-consuming. Choosing Save As only renames a file.

8. A. Humans see a wide range of problems that machines might miss, especially if they are supported with a checklist of errors to look for. Machines are also lousy at checking for appearance.

9. B. Screen resolution is generally 72 dots per inch.

10. D. Web servers generally store the full error message in the error log. Access logs will only show you who accessed the file and any access errors that occurred. The console won’t show you syntax errors in your scripts, only server errors.

11. B. Load-testing software simulates the activity of thousands of users visiting your web site. This form of testing is an important part of any web site since you need to know how many requests the server can handle at a time.

12. D. The most accurate thing to do is to just ask the user. If you’re at the client’s workstation, you could use the About option under Help. If you want to look for patterns, you can check the access log.

13. D. If you directly edit the documents that are being served to the world, there is no testing phase before the whole world can see your pages.

14. D. A lot of popular software is distributed by mirroring the download page around the world. Sending by CD would work, but you wouldn’t be distributing the software over the Internet.

15. C. For whatever reason, your DNS server could not resolve the hostname in the URL. This could be for a number of reasons, such as the site no longer exists or the DNS server doesn’t “know” the IP address and is unable to contact its DNS server.

16. A. If the client doesn’t get a response from the server, it warns the user. In this case, the server could be down or overwhelmed with requests.

17. C. While your first inclination might be that the file is missing, normally you’d get a “File not found” error. Here, the browser/server acknowledges that the file exists. The server sent a partial response but “forgot” to send the contents of the response.

18. C. The redirect could be either permanent or temporary. 4xx codes indicate an error on the client side. 3xx codes indicate that the file has been moved. 2xx codes represent success.

19. A. Generally, this is a CGI or application error on the server side.

20. C. A hub serves as a central connection point for several network devices, and so it’s also known as a concentrator. Switches, which are also used as central connection points, were developed afterwards and are also known as switching hubs.
Business Concepts I-NET+ EXAM OBJECTIVES COVERED IN THIS CHAPTER: 3.11 Understand and be able to describe the capabilities of application server providers. Content may include the following:
Providing Internet based services on an as needed basis, such as:
Custom Web Hosting
Providing e-mail services
Providing Fax services
Providing access to an application over the web
Providing shared access to expensive hardware, such as a mainframe computer
5.1 Understand and be able to describe e-commerce terms and concepts. Content may include the following:
Internet Service Providers
Portals
SET (Secure Electronic Transactions)
EFT (Electronic Funds Transfer)
EBT (Electronic Benefits Transfer)
EDI (Electronic Data Interchange)
OBI (Open Buying on the Internet)
OTP (Open Trading Protocol)
5.2 Understand and be able to describe the differences between the following from a business standpoint. Content may include the following:
5.3 Recognize and explain the current types of e-business models being applied today. Content may include the following:
Business-to-business models
Business-to-consumer models
Business-to-employee models
Business-to-government
Consumer-to-business
Consumer-to-consumer
Storefront (bricks & mortar) vs. e-business
New and changing customer expectations
e-business and the Internet
Meta-aggregator (aggregator)
5.4 Identify key factors relating to strategic marketing considerations as they relate to launching an e-business initiative. Content may include the following:
The Internet started out as the province of government agencies and academic institutions. Profit wasn’t an issue then. But as the Internet extended its reach into households around the world, businesses realized the potential for electronic commerce, known as e-commerce. Today, business facilitated by the Internet is a giant industry, and it continues to grow rapidly. As an Internet professional, you have to understand the business issues that (in many cases) drive technical innovations. From copyright to marketing to the design of Internet storefronts, you have to know what drives the decisions. This chapter explains some of the issues.
Intellectual Property on the Network
Intellectual property denotes any intangible product of a human being, a group of human beings, or another legal entity (such as a corporation). Intellectual property law aims to protect the rights of creative people to capitalize on the things they create. Practically speaking, intellectual property is any creative product—particularly one that has monetary value. Examples of intellectual property include the following:
What qualifies as intellectual property remains an open question—court cases come up all the time in which one party alleges that something previously unmentioned in law enjoys legal protection. The Harley-Davidson Motorcycle Company sued Honda over its bikes’ exhaust noise. Harley claimed that its bikes’ noise was a distinctive feature of their design and deserved protection as intellectual property. Harley lost, but this case gives an indication of the evolving nature of intellectual property law. The computer revolution has forced many tests of intellectual property protections, many of which came about in the days when making a copy of a work of music or literature was a difficult, expensive process. Should a piece of software, which may be duplicated perfectly, instantaneously, and for negligible cost, enjoy copyright? Should web publisher A be able to sue web publisher B when B “frames” A’s content and presents it as his own? These are open questions still in the process of being decided.
Copyright A copyright is the right of an author, artist, publisher, or other legal entity to collect money from the use of words, music, performance works, items of visual art, or other creative products. Facts and short phrases cannot be copyrighted (though certain short phrases may be protected under trademark law). Copyright, in the United States, attempts to guarantee the creator several benefits:
The right to reproduce the work and distribute the copies
The right to revise and improve the work
The right to perform or display the work publicly
The right to have some assurance that the work won’t be defaced or used in a way the author did not intend
The right to receive credit for others’ references to the work
A copyright depends on the ability of a person or entity that is claiming protection to prove original creation of the work in question and to prove that creation took place on a certain date. U.S. law actually allows two creators
to have copyright on identical creative works, provided they arrived at their respective creations independently of one another. There’s a good copyright FAQ on the Web at http://fairuse.stanford.edu/library/faq.html. A creative work whose author renounces a copyright or refuses to enforce it through infringement suits is said to be in the public domain. Public domain works may be used by anyone, for any purpose, without the user paying royalties or licensing fees to anyone. Other ways material may enter the public domain include the following:
Copyright protection can lapse, as it does after some time period (usually 50 or 75 years after the author dies, depending upon when it was first created or published).
Materials published by most governments (including that of the United States and its individual state governments) are automatically in the public domain.
Note that it is possible to sell public domain works. This is what the publishers of William Shakespeare’s plays do, for example.
Because text can be edited so easily, you can improve the legitimacy of the date from which you claim copyright protection by sealing your document in an envelope and mailing it to yourself. The sealed, postmarked envelope serves as stronger—but not absolute—proof that you had the intellectual property on the date you claim.
Registered Copyright You can achieve an extra level of legal protection for a creative work by registering the work with your government’s copyright office. Essentially, formal copyright registration provides a fairly unquestionable way of establishing when a work was created. The duration of copyright protection established this way varies among media. To cite one example, an author who registered a novel today would enjoy copyright protection for the remainder of their life, and their heirs could enjoy the benefits of copyright protection for 50 years after their death. In the United States, the Copyright Office handles copyright registrations. Its web site appears in Figure 9.1. Other governments have similar agencies. You request a registration form from the Copyright Office, fill it out, and send it in to the government with two copies of the work you’re registering. You can request the necessary forms from the Copyright Office’s voice-mail system at +1 202 707 9100 or get them on the Web in Adobe Acrobat format at www.loc.gov/copyright/forms/. You need particular forms for particular kinds of works. Here’s a list:
Form TX Books, manuscripts, software, and games
Form PA Music (in written form), plus films, video recordings, scripts, and plays
Form SR Music (recorded)
Form VA Drawings, photos, and cartoons
Under a treaty called the Berne Convention, copyrights registered in any signatory country are valid in all others. All major countries of the world are signatories.
Fair Use of Copyrighted Material Copyright law recognizes that a vibrant creative community relies, in part, on artists’ ability to use the creative products of others as starting points for their own creative work. Such applications of copyrighted material are known as fair use applications in the law. Here are some examples of fair use:
These aren’t cast in stone—the nature of fair use is constantly undergoing revision as those accused of copyright violations claim (and sometimes prove to a court) that their use was fair. Though no precise statement of what is not fair use exists, the determining factors seem to be the size of the excerpt and the profit motive of the party using the copyrighted material. Courts tend to favor fair-use claims presented by nonprofit organizations over those put forward by organizations that have made money from their use of copyrighted material. If you’re not making money (or causing the rightful copyright holder to lose money) as a result of your use of brief snippets, you’re probably okay. When in doubt, contact the owner of the copyright and ask for written permission.
Licensing Copyrighted Products If you want to use a copyrighted work in your own products—and remember, it does not matter whether the copyright is registered with a government—you must ask permission. The copyright holder is free to do three things:
Refuse permission to use the material
Allow you to use the material free of charge, provided you credit the copyright holder
Require you to pay a fee for the use of the copyrighted material
The last of the three options is called licensing, and it’s a big part of the intellectual property business. Licensing deals take many forms and usually state explicitly what rights are being granted and what compensation will be paid for them. A writer, for example, might write a story and grant a magazine the rights to publish the story once in its North American editions. The magazine would pay the writer a fee for that right. The writer would retain the rights to sell the story again for use as part of a compilation put out by a book publisher, without consideration to the magazine. The writer would also retain, for example, the ability to license the story to a movie studio for adaptation into a screenplay. Software licensing is an excellent example as well. When you purchase software, you aren’t actually buying the software itself. Instead, you are purchasing a license that entitles you to use the software. If you don’t believe it, get the license agreement from one of your software packages and read it over. Most license agreements are standard, but sometimes they do change,
and it’s a good idea to understand when you are allowed to use the software and when you can’t.
Securing the Entire Copyright If you’re a publisher and want to secure the copyright to a work created by someone else (such as a freelance writer), you can acquire the rights by either of two means. A work-made-for-hire agreement states that the creator of a work (the freelancer) created it because he was hired by the publisher to do so and paid accordingly (or paid something, anyway); therefore, the publisher has the copyright and the freelancer does not. A creator also can transfer the copyright on a work to another entity by assignment. Usually, assignment must involve an explicit, written statement that says the original creator is granting their copyright to someone else, such as a publisher.
Infringement Consequences Copyright, in the United States and most developed countries, is a matter of civil law. That is, a copyright holder cannot complain to the government that someone has committed a crime by infringing upon the copyright. Rather, a copyright holder can file a civil suit alleging infringement. If the civil suit goes to trial and the infringement is found to have taken place, the defendant may be made to pay damages to the copyright holder. In point of fact, civil suits are expensive and generally are the last resort of copyright holders who feel their rights have been infringed upon. Usually, those using copyrighted material for purposes perceived to be unfair by the copyright holders will receive stern letters from the copyright holder or their lawyer, asking that the use stop. If the perceived problem continues, the copyright holder can file suit and fight the matter in court. If an infringement is determined to have occurred, the entity using the material without permission may be judged responsible for damages and made to pay reparations to the copyright holder.
Trademarks A trademark is much like a copyright except that trademarks apply to words, phrases, and images used to describe products and services (technically, a word, phrase, or logo that describes a service is called a servicemark, but the
legal concepts are pretty much the same). The following are examples of trademark-protectable intellectual property:
A company’s name (such as Netscape Communications Corporation)
A product’s name (such as Diet Coke)
A logo (such as the AT&T globe image)
A graphic device (such as the Izod alligator)
Oppenheimer Wolff & Donnelly LLP, a law practice, has put together a neat FAQ on the topic. It’s on the Web at http://www.oppenheimer.com/intprop/trademark/faq/faq.shtml.
Registering a Trademark As is the case with larger creative works protected by copyright, U.S. law provides for trademark protection on words, phrases, and devices even if they’ve not been formally registered with the government. You can assert a trademark or servicemark right by always printing a ™ (for trademarks) or SM (for servicemarks) next to the device you want to protect. You can establish stronger legal protection for your trademark by registering it. Governments maintain registries of trademarked intellectual property. In the United States, the U.S. Patent and Trademark Office (USPTO) maintains the list of registered trademarks. To register a trademark, you must establish that it represents a unique way of denoting a product or service and is not in use by another entity. You must also be actively using the trademark—you can’t register a trademark in anticipation of applying it to a product or service you’ll develop in the future (though this used to be possible). Once you have registered a trademark, you can follow it with the ® symbol to denote the registration. You can get further information about registering a trademark at the USPTO web site, www.uspto.gov/.
Using Trademarked Material Trademarks run the risk of bringing about their own demise. If a trademarked word is heavily advertised and becomes synonymous with a product or service, it loses its protectability. This is why Xerox Corporation is so adamant that people not talk about “making xeroxes” or “xeroxing documents.” The correct phrases, acknowledging the trademark on the Xerox name, are “making Xerox copies” and “copying documents” (as with a Xerox copier). You have to be careful to use trademarked words as adjectives, not nouns.
Generally, it’s not possible to license trademarked material. The companies holding the trademarks usually are loath to share them with those selling products and services other than their own. There is protection, however, for the incidental appearance of trademarks in creative media. The producers of a movie that features a scene in Trafalgar Square—where a giant Virgin Records sign appears—probably would not infringe upon Virgin’s trademark protection by showing the sign in the film as an incidental part of the scenery.
Infringement Consequences Trademark protection is a matter of civil law, and so enforcement of trademark rights is similar to that of copyrights. Refer to the copyright section for information on legal enforcement of intellectual property.
Patents Patent law exists to protect physical devices and processes. You might patent a cleaner-running engine for automobiles, a faster kind of memory chip, a way of making harder steel, or a chemical formula for a more flexible plastic. Essentially, patent law is the same as the law governing other kinds of intellectual property. The difference is in the nature of the creative product. In applying for a patent, you agree to make the details of your product or process available to the public in exchange for a monopoly in profiting from your invention. The monopoly is limited in time—design patents (on the appearance of a product) last 14 years in the United States, while utility patents (on products and processes) last 20 years. David Kiewit, a patent lawyer, has posted a good patent FAQ on the Web at patent-faq.com/index.htm.
Securing a Patent Unlike other kinds of intellectual property, there are no implicit patents. You can’t sue someone for patent infringement unless you have formally registered your claim with your government’s patent office. Even if you and another party arrived at the same product independently, the right to sue for patent infringement goes to the party that first secures government registration. To be protectable by patent, a product or process must satisfy the following three requirements.
Useful The product or process must accomplish something desirable.
New It must not have been patented before.
Nonobvious The product or process must be the result of creative work, not something that would come naturally to someone skilled in the trade to which the product or process applies.
Of these, the newness and nonobviousness requirements are the hardest to prove. Novelty can be verified by a search of existing patents, which is something patent lawyers are trained to do. Expert witnesses can assert nonobviousness. In the United States, the U.S. Patent and Trademark Office (USPTO) handles patent applications. Its web site appears in Figure 9.2. You can apply for a patent by describing it and filing the proper paperwork with the USPTO. When the USPTO has examined your application and agrees that your idea is patentable, you are granted a patent. Between the time you file and the time you receive your patent, you can refer to your product as having “patent pending” status. Legally, the phrase means little, but it may discourage aspiring idea thieves. You can get further information about registering a patent at the USPTO web site, www.uspto.gov/. FIGURE 9.2 The USPTO web site
Licensing Patented Products Companies and individuals that have secured patents on their inventions often are eager to license their patents to others. Indeed, many companies exist for the sole purpose of securing patents that may later be licensed to production companies (this business model is common in the biotechnology industries). Often, a patent license may involve a compensation system that’s based on the number of products sold. At the end of each accounting period, the licensee pays the patent holder some royalty for each instance of the protected item it sold during the period.
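A per-unit royalty arrangement like the one just described reduces to simple arithmetic at the end of each accounting period. The sketch below uses invented figures purely for illustration:

```python
def period_royalty(units_sold, royalty_per_unit):
    """Royalty owed to the patent holder for one accounting period."""
    return units_sold * royalty_per_unit

# Hypothetical deal: 12,000 protected items sold at $0.75 royalty per unit
print(period_royalty(12_000, 0.75))  # 9000.0
```

Real license agreements often add minimum guarantees or tiered rates, but the per-unit core is this simple.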
Open Source and Public Licensing Most of the software industry is based on companies and individuals—software publishers—writing software, compiling it, and selling the compiled binary code to consumers. The publishers invest in the people and other resources needed to create their products’ source code; they then keep that source code secret. The idea is that a publisher deserves to profit from something in which it has invested money to produce. But an alternative model has long been a part of the hobbyist and academic communities. These groups espouse the idea of writing software and making the source code public, available for anyone to examine, modify, and redistribute. Such freely distributed programs are called open source software. The source code of certain Unix variants has always been available to the public, and lots of Unix utilities are open source, too. But the idea didn’t begin to translate into the world of Intel-standard processors until recently.
The Open Source Movement The open source movement—an informal group of software publishers, book publishers, academic institutions, and individuals—holds the belief that software for which source code is freely available inspires innovation, whereas closed source software stifles it. Opponents of the traditional microcomputer business model, and of Microsoft Corporation in particular, members of the open source movement champion free software like the Linux operating system, the Perl language, and the GNU utilities. Public-domain software has no copyright protections at all. This means that anyone may acquire, copy, and use the software without paying a licensing fee to anyone (there’s no copyright holder to pay). Publishers are free to charge for public-domain software, and they may get the price they ask if
they package it attractively and offer extra features, such as technical support. This is the business model behind Red Hat, Caldera, and other distributors of the Linux operating system, whose kernel is freely redistributable under the GNU General Public License. The concept of copyleft is part of the open source movement. The idea of copyleft is that an organization (usually a not-for-profit group like the Free Software Foundation) establishes a copyright to an item of intellectual property—source code for software, usually—then distributes it, free of charge. People who use the software must agree to its licensing agreement, which specifies that they may not make a profit on its distribution. The GNU utilities are covered by copyleft. There are details of copyleft on the Free Software Foundation’s web site, at www.gnu.ai.mit.edu/copyleft/copyleft.html.
Using Open Source Software You’re free to use open source software for any purpose you want without paying a licensing fee. You can modify the software to suit your particular needs, and you can redistribute the software as you want. You can even try to sell the software if it is in the public domain. If the software is copylefted, however, you may be prohibited from making a profit on it.
The Global Marketplace
The Internet spans the planet. It has the potential to bring about truly free, worldwide markets in which the most efficient providers of goods and services have the advantage over others, unencumbered by geography and politics. It’s a utopian vision, but one that’s beginning to come true. Your organization may want to get on board, but first, we need some perspective. Fewer than three percent of the world’s population has ever used the Internet in any way (a statistic that isn’t so shocking when you consider that only about half the people in the world have placed a telephone call). Regardless, the community of Internet users is a great market, comprising mostly people of greater-than-average income and education. If you’re going to sell to the world over the Internet, you must be aware of what you’re getting into. You have to be prepared to communicate with people who prefer many different languages and often only know one. You’ll
have to address varying customs and courtesies. You’ll also have to be sure that your company can deliver what it promises, get paid for its work, and comply with all relevant laws.
Language and Communication Communications technologies exist for the purpose of helping people talk to one another even when separated by time and distance. You can chat with someone anywhere in the world in real time, and your web site can sell your products while you’re asleep. But the best communications tools can’t help you unless you and your audience share a common human language. You have to be able to tell each other what you’re thinking.
Language English is the lingua franca on the Internet, probably because English-speaking Americans make up a huge proportion of the user population and Americans are notoriously reluctant to pick up other languages. Plenty of Internet resources exist in languages other than English, but it seems that two parties attempting to communicate across cultures default to English a lot of the time. Innovations such as the Unicode character set, which allows non-Latin characters to be incorporated into displayed text, are making it easier to cater to the needs of non-English speakers. English is an awful language. It’s loaded with irregularities, exceptions, special cases, and strange pronunciations. Many English-language conventions—ungendered nouns, say—are totally at odds with what’s normal in other tongues. Learning English as a secondary language is devilishly hard. So, when someone writes to you (assuming you’re an English speaker) and refers to a piece of software as “him” or talks about “the weather, which one is rainful,” recognize that the writer is going to considerable effort to accommodate you. Don’t disregard a message because the English is faulty. Do your best to interpret it, politely ask for clarification when you must, and reply as you would to a grammatically perfect message. As a web publisher, you may decide to publish your content in multiple languages. If you do so, make sure the translations are all good—have them done by someone who speaks both languages superbly and can catch all the idioms. Don’t assume that translation software is good enough—it almost never is, and it can be obvious when translation has been done that way. A bad translation is worse than no translation at all in many cases.
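The benefit Unicode brings is easy to demonstrate: one encoding, such as UTF-8, can carry Latin and non-Latin text in the same byte stream. A short Python sketch:

```python
# UTF-8 can carry mixed scripts in a single byte stream.
greeting = "Hello, \u4e16\u754c"   # "Hello, 世界" ("world" in Chinese)
encoded = greeting.encode("utf-8")

print(len(greeting))                        # 9 characters
print(len(encoded))                         # 13 bytes
print(encoded.decode("utf-8") == greeting)  # True
```

Note that the character count and the byte count differ: characters outside the basic Latin range occupy more than one byte in UTF-8, which is why older single-byte character sets could never mix scripts this way.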
Cultural Differences Everyone knows how hard it is to communicate emotion in electronic mail. A person who mentions a “brilliant idea” you had may be paying you a compliment, or may be using sarcasm to mock you. The problem grows when the sender and recipient of a communication are on different cultural wavelengths. In Brazil, it’s not unusual for a man, casually wishing another man goodbye, to pat his acquaintance on the stomach. An Englishman, subjected to belly-patting for the first time, may be insulted—the same gesture in the United Kingdom might mean that the patter thought the pattee was putting on weight. In fact, it’s just a friendly gesture that’s meant to convey nothing more than familiarity. It’s a cultural affectation that the Englishman must learn to recognize and interpret properly. The same sort of situation can arise in e-mail. Receiving an e-mail from a business contact in Australia, an American might be put off by the Aussie’s formal tone—such a stiff approach is adversarial, they might think. It’s not—it’s a cultural trait of many Australians to use a somewhat reserved tone in written business communications of all kinds, even e-mail. It’s not fair to generalize. There certainly are Australians who like to write casual e-mails for business and Americans who prefer a formal writing style. The point is, be slow to take offense at perceived oddities in communications from other countries. The odds are good that no hard feelings are meant and that you’re coming across just as strangely.
Delivering the Goods Sharing information is one thing, but business is based on actually providing customers with something they’re willing to pay for. The process of getting goods and services to consumers is easy enough in a geographical area with a good postal service and other package-delivery resources, but it’s more of a challenge when there are oceans between you and your customers. If you’re doing manufacturing work overseas or importing materials, you have to be concerned about the effect of shipping issues on your supply chain. Getting paid for the things you sell can prove challenging, as well.
Order Fulfillment Many Internet businesses, tasked with delivering physical products to customers scattered far and wide, set up distribution centers in various parts of
the world. Amazon.com, for example, has one warehouse in Delaware, one in Seattle, one in the United Kingdom, and one in Germany. The relative efficiencies of different snail-mail systems and routes still mean a lot. Australians, for example, report faster shipment of books from Amazon.co.uk in England than from Amazon.com in the United States.
Getting Paid Credit cards are magical things. If a German uses his Visa card to make a purchase in Hong Kong, the merchant gets paid in Hong Kong dollars and the German pays his bill in euros. The banks and the credit card companies handle the currency exchange behind the scenes (at bulk interbank rates favorable to everyone) and the transaction goes as smoothly as one at the German’s local grocery store. The same holds true on the Web. An American can buy a product from a British site without problem. The merchant is paid in British pounds and the American pays his bill in U.S. dollars. Assuming there’s a way to get the product from Britain to the United States, the transaction proceeds without difficulty. The moral: Use credit cards for cross-border transactions wherever possible. If you can’t use a credit card, you have other options.
Wire transfers between banks
Wire transfers of cash (such as American Express and Thomas Cook)
Personal delivery of cash
Payment in kind
These aren’t really applicable to retail operations on the Web, but they’re all workable for consulting relationships and other business-to-business transactions. You can try all kinds of strategies in businesses characterized by a low volume of high-value transactions. Bear in mind that many countries regulate cash outflows (the United States, for example, requires you to report movements of sums greater than $10,000 out of the country). Bank transfers in some countries don’t always go as smoothly as in others and may require some personal shepherding by a local citizen. Although there are many ways to ensure that you get paid for your products and services, there are several electronic methods that we will take a look at.
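Behind the credit-card convenience described under “Getting Paid” is, at bottom, a multiplication by an exchange rate that the card networks apply for you. A toy Python sketch (the rate below is invented for illustration, not a real interbank rate):

```python
def settle(amount, rate):
    """Convert a purchase amount into the cardholder's billing currency."""
    return round(amount * rate, 2)

# Hypothetical: an HK$780 purchase billed in euros at an assumed
# rate of 0.11 euros per Hong Kong dollar
print(settle(780, 0.11))  # 85.8
```

In practice the networks also add a small conversion margin on top of the interbank rate, which this sketch ignores.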
Portals
Portals are high-traffic web sites composed of a wide range of content, hyperlinks to vendor Internet sites, and services. While some portals—Yahoo!, Excite, Lycos, and MSN, just to name a few—offer a large variety of subjects, some portals offer specific services, such as computer-related information. In addition to content and web links, portals usually offer services such as e-mail, distribution lists, community (chat) rooms, and instant messaging.
Secure Electronic Transactions (SET)
Secure Electronic Transactions (SET) is an open standard created by MasterCard and Visa to perform electronic transactions between the consumer and the vendor. The specification arose from the need to verify both parties in an electronic transaction and facilitate e-commerce. To make this work, the vendor and the consumer exchange their digital certificates and digital signatures. These electronic documents are verified on both ends of the transaction to ensure that the vendor is actually a legitimate vendor and that the buyer is who they claim to be.
Electronic Funds Transfer (EFT)
If you’ve ever used a debit card to buy a meal or purchase that new CD, you’ve used Electronic Funds Transfer (EFT). EFT is used by financial institutions, such as banks, to transfer debits and credits (sometimes called electronic funds) between themselves. These debits and credits represent dollars that are deducted from your account (a debit) when you purchase something from a merchant, or added to your account (a credit) when you are given a refund or receive your paycheck.
Electronic Benefits Transfer (EBT)
Electronic Benefits Transfer (EBT) is similar to EFT in that it is a transfer of debits and credits to/from an account, but instead of bank accounts it deals with benefits delivered to public assistance recipients. These benefits are
accessed through a government-issued debit card that is not supposed to be transferred to anyone other than the authorized recipient.
Electronic Data Interchange (EDI)
Electronic Data Interchange (EDI) was developed to facilitate the exchange of invoices and orders before the Internet became popular. EDI relied on value-added networks (VANs) to transport data from one company to another in an efficient and cost-effective manner. But when Internet use became widespread, companies began to realize that they could move the data themselves over their existing Internet connections and avoid the fees charged by VANs.
Open Buying on the Internet (OBI)
The Open Buying on the Internet (OBI) standard was created by the Internet Purchasing Roundtable as a free, open standard to expedite the purchase, payment, and delivery of goods. Because the standard is free and easily obtainable, vendors are able to create specific products that can interact with other vendors’ products. For example, one vendor can create a purchase requisitioning system that Company A installs and uses for buying products and services. Company B may obtain another vendor’s product that handles electronic transactions. If Company A needs to order more paper from Company B, an employee (call him Bob) can log into the corporate intranet and access the requisitioning software. The software would bring up several paper vendors that Bob could choose from. Bob picks Company B and sees their web page magically appear. Bob picks the paper from the web site, not realizing that Company B’s electronic transaction software has already verified his digital signature, and goes to check out. While Company B’s system is working on processing the order, Bob sees a purchase requisition appear with all of the information needed. Bob then reviews the requisition, accepts it, and the information is stored in Company A’s financial system.
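The OBI round trip in the Company A/Company B example can be outlined in code. This is only a sketch of the sequence of steps described above; the function and field names are our own, not part of the OBI standard:

```python
def obi_order(buyer, vendor, item, qty):
    """Sketch of the OBI round trip described in the text (names are ours)."""
    # 1. The vendor's commerce system verifies the buyer's digital signature.
    if buyer["signature"] not in vendor["trusted_buyers"]:
        raise PermissionError("signature not recognized")
    # 2. The buyer picks goods from the vendor's catalog.
    line = {"item": item, "qty": qty, "unit_price": vendor["catalog"][item]}
    # 3. A purchase requisition comes back for the buyer to review and accept.
    requisition = {
        "buyer": buyer["name"],
        "vendor": vendor["name"],
        "lines": [line],
        "total": qty * line["unit_price"],
    }
    # 4. On acceptance, the requisition lands in the buyer's financial system.
    buyer["financials"].append(requisition)
    return requisition

company_a = {"name": "Company A", "signature": "sig-A", "financials": []}
company_b = {
    "name": "Company B",
    "catalog": {"paper": 4.50},
    "trusted_buyers": {"sig-A"},
}

req = obi_order(company_a, company_b, "paper", 10)
print(req["total"])  # 45.0
```

The real standard defines interchange formats for requisitions and uses digital certificates for the signature check; here a plain dictionary and a set-membership test stand in for both.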
Open Trading Protocol (OTP) The Open Trading Protocol (OTP) is actually not an individual protocol, but a set of standards that attempts to do for electronic transactions what TCP/IP does for communications. It strives to set a standard communication method between all forms of electronic payment systems from the consumer to merchant to financial institution. OTP is relatively new, and the work is still being performed by the Internet Engineering Task Force (IETF).
For more information on OTP, refer to the IETF web page at www.ietf.org/ids.by.wg/trade.html.
Obeying the Law The legal standing of companies physically located in one country (to the extent that they have a physical location at all) while doing business in another country is still up in the air. The matter seems to be reaching the courts piecemeal. The U.S. state of Wisconsin, for example, is attempting to prosecute several Caribbean companies for alleged violations of the state’s anti-gambling laws by Internet gambling sites. If the player was in Milwaukee, the server simulating a slot machine was in Jamaica, and the server’s owner was in Britain, where did the gambling take place? It’s a matter for the courts to decide. You have to be aware of legal and regulatory issues in all places you do business. Criminal law aside, it’s generally accepted that any organization is responsible for paying taxes on all its income wherever it is incorporated, regardless of where the money comes from. A Maryland company that receives cash from Argentina must pay state (Maryland) and federal (U.S.) taxes on that income. It’s also responsible for paying whatever import duties Argentina levies on the incoming goods. Legal and regulatory matters are always complicated, and you almost certainly pay for legal help in your own country. The logic applies even more strongly in other countries. Hire a local lawyer or other consultant. A trustworthy expert can be invaluable.
Online Marketing
It’s as the old saw (so to speak) about the tree falling in the woods says: You have to have an audience to be able to make a splash in Internet business. The raw numbers of people are out there, for sure, but you have to motivate enough of them to come to your site and keep returning. The first rule of marketing is to know your target market. What kind of people do you want to attract? Once you’ve figured that out, try to identify
things that appeal to people like that. Provide your audience with the things it likes, and make a point of advertising what’s available on your site.
Advertising
You can build the best web site in the world, but random chance and word-of-mouth will draw only so much traffic to your pages. If you want to bring people in, you have to tell them to come and why they should do so. You have several ways to advertise.
Pull and Push Technologies
A push technology is any system that causes information to appear on a user’s screen without the user having specifically requested it, at least not in the immediate sense. The most popular (and generally most effective) push technology is electronic mail. Certain web sites offer people the opportunity to sign up for mailing lists, which then send e-mail messages containing a mixture of useful information and advertising. Less popular push technologies include PointCast and other specialty software packages for getting news onto a user’s screen automatically. Push technology contrasts with pull technology, which requires users to specifically request information at the time they want it. Standard web sites are examples of pull technology. Both push and pull systems can make effective marketing tools. Push schemes keep your message in front of users, even long after they’ve departed your web site. Be careful not to annoy them or send out unsolicited e-mail (spam). The anti-spam departments of ISPs can and will turn off your account if enough complaints are received, and in some cases one complaint is enough to do it.
Paid Banner Ads
A banner ad is a rectangular graphic, usually about 500 pixels wide by 60 pixels tall, that’s placed on a web page where the people who look at that page can see it. When a surfer clicks a banner ad, they’re taken to the advertiser’s web site (and usually, the fact that they got there through a banner ad is noted). Figure 9.3 shows a banner ad. FIGURE 9.3
The banner ad industry has skyrocketed because some companies will pay individuals a set fee for each client received from the banner ad. High-traffic sites, such as CNN.com, get tens of thousands of dollars for an ad placed on their pages. Other sites take payment in kind, usually in the form of reciprocal advertising. Often, a banner ad deal will involve a combination of cash and in-kind payments. Usually, banner ad placement is priced in views. If you buy 10,000 views on a site, you’ve agreed to pay a fee in exchange for your ad being sent out to surfers, as part of a page, 10,000 times. There’s no guarantee that merely showing an ad to a surfer will motivate him to click through to your site. You might try negotiating a banner ad deal based on a certain number of clickthroughs—say, the site will show your ad until 500 people click it to see your site. Estimates of realistic clickthrough rates are generally low—one or two percent of the people shown a well-designed ad will click through to the advertiser’s site.
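The pricing arithmetic described above is easy to work out. Here is a minimal sketch; the $350 price for 10,000 views is an invented figure for illustration, and only the one-to-two-percent clickthrough range comes from the text:

```python
def cost_per_clickthrough(total_cost, views, clickthrough_rate):
    """Estimate the effective cost per visitor under a views-based ad deal."""
    expected_clicks = views * clickthrough_rate
    return total_cost / expected_clicks

# Suppose 10,000 views cost $350 (an assumed figure, not a quoted rate).
for rate in (0.01, 0.02):
    cpc = cost_per_clickthrough(350.0, 10_000, rate)
    print(f"At a {rate:.0%} clickthrough rate, each visitor costs ${cpc:.2f}")
```

Running the numbers this way shows why a deal priced per clickthrough can be worth negotiating: under a views-based deal, the real cost per visitor is the sticker price divided by a fairly small expected number of clicks.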
You can get creative with search sites, specifying that your ad appear at the top of results lists when people search for particular words.
Webring Banner Ads A webring is a community of sites that relate to some common interest. As a courtesy to their users, each site in the webring agrees to post banner ads for the other sites in the ring, increasing traffic to all the sites. Webrings usually are casual affairs that find application among fans of a particular musical group or participants in a specific hobby. They’re not usually a popular option among commercial publishers.
Non-Web Advertising Spending on web advertising remains tiny in comparison to the money organizations drop on traditional media, and for good reason: Many more people listen to the radio and watch television than surf the Web. There’s lots of overlap between the Web and traditional media. Hardly anyone interacts with the Web exclusively, never picking up a newspaper or watching a television show. The trick in choosing a non-web advertising venue is to figure out what your audience does. If you’re trying to appeal to people who read books for
pleasure, you might have good luck advertising on public radio and on billboards near large bookstores. If you want to attract expectant mothers, shun advertising on cocktail napkins and try distributing imprinted pens to obstetricians. Find out where your people are, and talk to them there.
What Happened to the “Dot-Commers?” The year 2000 saw a frenzy of Y2K activity as companies continued to monitor hardware and software for any sign of failure due to the “Y2K bug.” Unfortunately, few people were monitoring the .COM businesses that sprang up almost overnight, many of which soon fell by the wayside. Many factors were involved in their demise, and many companies are still feeling the backlash as the economy tries to stabilize, but there were a few major reasons for the massive failure: lack of good financial management, failure to provide goods and services when promised, and a lack of proper advertising. If you are going to put up an online business or storefront, make sure that you target the appropriate audience, offer something that isn’t easily found anywhere else (for example, don’t try to sell pens on the Internet unless they are highly specialized or personalized pens), and know how to use the appropriate advertising scheme.
Partnerships
Internet sites are loaded with partnerships, which are relationships between companies that can take any number of forms. On some sites, partnership is simply another word for an advertising relationship. On other sites, partnerships denote broader financial support of the company that’s publishing the site. On still other sites, partnerships denote vendor-customer relationships or a common parent company. Sometimes, a small company will describe a better-known company as its “partner,” no matter how slender the relationship, just to get some rub-off credibility from the name recognition. Some companies, such as Sun Microsystems, have formal rules about partnerships. Consultants who have passed certain certification tests can describe themselves as Sun partners for their consulting work.
Free Information Although talk of the Internet as a gift economy has faded, it’s easy to give information away on the global network. Indeed, web surfers have come to expect free information in many cases and can take umbrage at organizations that are stingy with data. In many cases, it’s in an organization’s interest to be free with information. If having product manuals on the Web can cut down on technical support calls from people who have lost their documentation, the web site saves some telephone expenses and a technician’s time. The same goes for all kinds of everyday information people once routinely called in to get, often using a toll-free number and an expensive call center in the process. Such information includes the following:
Credit card balances
Frequent-flyer statements
Stock quotes
Bank statements
Store hours, locations and phone numbers
Analyze your customer service calls. If the operators have stock answers to standard questions, put those answers on the Web in a Frequently Asked Questions (FAQ) document. Everyone will be happier—and it will improve public relations.
Many businesses today seem to forget the concept of goodwill. Goodwill isn’t something that you can really put a price on, but it does show up on the bottom line. For example, if your company makes contributions to charity based on its revenues, and you list this on your web site, people feel better about purchasing your products because part of the purchase price goes to help others in need. If you claim that your company makes those charitable contributions, however, and it’s later found out that it really doesn’t, people will stop visiting your site.
New Marketing Challenges With every blessing comes a curse, and there is a considerable dark side to the cheap marketing brought about by the Internet. For one thing, your organization’s competitors can promote themselves as easily as your organization can. It’s yet another arena for you to fight it out in your media campaigns, and indeed it’s a lot easier for customers to compare competing sites side-by-side than to compare radio commercials. There are other challenges:
Sites published by customers unhappy with your products and services
Communications media that allow rumors to spread very quickly
“Disgruntled Customer” Sites
It’s always been true that an unhappy customer tells more people about his bad experience (also called negative advertising) than a satisfied customer does. But traditionally, such customers could only share their feelings with a relatively small number of people. Now, though, it’s a simple matter for a customer who feels that he’s been wronged to post all sorts of vitriol on the Web. When someone searches for the name of your company or one of its products, guess what comes up in the results? Your company’s page, but also the page titled “Why ABC Company is a Scourge upon the Earth.” Surfers read pages like that and weigh what they say against other information they have about your company. Similarly, sites like Deja.com (www.deja.com) promote themselves as forums in which consumers can sound off for and against the things they buy. A bad review here can really hurt you. Some web sites, such as www.zdnet.com, offer price comparisons between registered vendors and provide a rating system based on customer satisfaction. Your best defense against negative advertising is to provide good products and services. If you take care of your customers, positive feedback should outweigh negative comments—something potential buyers will notice.
The High-Speed Rumor Mill Internet-enabled media like Usenet, e-mail, and chat provide customers with new, global forums for their opinions about your company and its products. True or not, information about you can circulate faster than ever before. You have two responsibilities.
First, be aware of what’s being said about your organization in the Internet media. Monitor the chat rooms in which your customers gather. Check in at the web sites that consumer groups and professional organizations maintain. Read relevant columnists’ work. Participate in forums, where appropriate, to build goodwill and to establish trust so your arguments against negative comments, when they arise, are taken seriously. Second, be prepared to counteract problems when they pop up. Recognize your products’ problems; note that you want to fix them and that you take customer comments seriously. Offer to compensate the complainer with freebies, if needed.
Changing Customer Expectations
In the days before the Internet, customers had to buy magazines or newspapers to get information and reviews on new products, watch advertisements, go down to a store to purchase an item and lug it home, or use mail-order catalogs. Because these things took time, customers were willing to wait six to eight weeks for delivery. Not anymore! In comes the wonderful world of instant gratification. Need to research a product or do a price comparison? Go to your favorite shopping site on the Internet, and there is the information. Don’t feel like running down to the bookstore to buy that new Tom Clancy novel? Log on to the Internet and purchase it online; some companies even offer free second-day shipping. It sounds like a dream come true, but in just a few short years society has gone from being afraid to purchase goods and services online to expecting to log on and buy anything with delivery the next day. This is a perfect example of how customer expectations can change dramatically in a relatively short period of time. To be effective, your company must be able to recognize when customer expectations have changed and react quickly. Surveys, customer feedback, marketing trends, and so on all hold clues as to what customers expect.
The Business Case for Networks
Setting up a network of computers isn’t cheap. Doing so requires specialized equipment (such as routers and switches), specialized services (such as dedicated telephone lines and Internet backbone services), and on-site
experts (such as yourself). These things are expensive, and they’re not always easy for the people who dispense money to understand. Part of your job is to explain the business cases for different kinds of networks. You have to be able to explain why they’re good investments.
Internet
Your company’s Internet site is its storefront for most of the world. It’s always available and should be used to provide nonstop marketing and customer-service functions that would cost too much to provide otherwise.
Brochureware Your site can contain basic information about your business—the sorts of things that would appear in a basic marketing brochure.
Self-service customer information You can reduce the load on a customer service department by enabling customers to help themselves. Computer-related services use a knowledge base or technical support database for troubleshooting problems with software.
Marketing materials Supplementing the basics of brochureware, marketing materials enable customers to get the information they want at the levels they want it. One customer might want general descriptions; another might want technical details.
Ordering facilities Internet sites can generate revenue through catalogs and credit card acceptance.
Intranet An intranet enables the people in your organization to collaborate efficiently, sharing information and files. Intranets use Internet standards, and so employees can conduct business on the intranet with the Internet tools they already know how to use. Applications might include the following:
Extranet An extranet involves granting certain outsiders limited access to your company’s internal resources. You might find it advantageous, for example, to allow a vendor to monitor your level of some raw material and automatically send you more when your supply drops to some prearranged limit. An extranet might also enable you to share information with an external service provider, such as a payroll company or a marketing house. You might implement the following technologies on an extranet:
Shared database access
Conferencing
Some extranets incorporate Electronic Data Interchange (EDI), which is the automated sharing of information among computers. A supplier’s database may automatically transfer data on shipping schedules to a customer’s machine, for example.
Virtual Private Network
A virtual private network (VPN) is functionally similar to a local area network (LAN), except that some or all of the network nodes are connected by communications channels established over the open Internet. Secure networking protocols make it possible to operate a VPN with a high degree of confidence that your data remains confidential. The business case for a VPN is strong. Your argument for a VPN should hinge on the fact that a distributed network would otherwise require a considerable investment in communications services. Here are some advantages:
Telecommunications savings Where linking geographically separated network resources once required dedicated lines, a VPN can provide the same connectivity at far less cost.
Flexibility Because a VPN doesn’t tie you to dedicated-line service contracts, you can reconfigure it more easily than a network based on traditional telecommunications links.
Internet commerce is any sort of business that is facilitated by the Internet. Usually, the term applies primarily to commerce that involves a web site of some kind. There are as many variations to the Internet commerce tune as there are companies on the Internet, but it’s fair to fit the business models into the following categories:
Storefront (bricks & mortar) vs. e-business
Business to consumer
Business to business
Business to employee
Business to government
Consumer to business
Consumer to consumer
Meta-aggregator
Storefront (Bricks & Mortar) versus e-business
A storefront (bricks & mortar) business has a physical location that customers can visit to purchase goods and services. Storefronts make good business sense if you are going to run a small, regional operation. Usually, you will need someone to stock the shelves and run the cash register, and you will need room to display your wares as well as keep a good inventory on hand. Some businesses, such as clothing or food, work better as storefront businesses than as e-businesses. If you want to expand your business’s range, or plan to operate a medium to large business, e-business may make sense for you. For one thing, your merchandise showroom takes up little space because it’s on the Web. Labor costs also decrease because you don’t need as many personnel as you would in a storefront business. On the other hand, you will have to deal with the cost of shipping to the customer.
Business to Consumer
The best-known Internet commerce sites conduct business by selling goods and services to individual people. Typically, so-called business-to-consumer sites involve presenting a catalog of products, a virtual shopping cart in which surfers can store the ones they want, and a credit card acceptance facility. Internet business-to-consumer sites offer opportunities to increase sales in ways conventional stores and paper catalogs do not. There are a couple of strategies you can try:
Cross-sell The process of selling the buyer of a given product accessories and other related products. You might design your e-commerce system to present the buyer of a computer printer, for example, with advertisements for toner cartridges, parallel cables, and paper trays.
Up-sell The process of encouraging a buyer who thinks she wants one product to buy another, more profitable product. Up-selling may involve pitching a larger package of the same stuff or a more feature-rich variant of the same model.
Amazon.com does a great job of cross-promoting its products. If you’re looking at the detail page for a particular book, you see a list of other products buyers of that book have purchased. If you’re looking at the detail page for a stereo amplifier, you see references to compatible speakers. Where making such pitches consistently in a bricks-and-mortar store would require a highly trained sales staff, the same pitches on the Internet require only a well-designed database. Not all business-to-consumer operations on the Internet take the form of storefronts. Some sites (such as the Wall Street Journal, www.wsj.com) charge a subscription fee for access to information. Others, such as the assortment of Internet stock brokerages, provide some information (such as portfolio tracking pages) free of charge while collecting a fee on other services (such as trades).
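The “buyers of this book also purchased” pitch boils down to counting co-occurrences in order data. Here is a minimal sketch, with the order history and item names invented for illustration (this is not Amazon.com’s actual method, just the general idea):

```python
from collections import Counter

# Toy order history (invented for illustration): each order is the set
# of items one customer bought together.
orders = [
    {"printer", "toner", "paper"},
    {"printer", "toner"},
    {"printer", "parallel cable"},
    {"amplifier", "speakers"},
]

def also_bought(item, orders):
    """Count how often other items appear in the same orders as `item`."""
    related = Counter()
    for order in orders:
        if item in order:
            related.update(order - {item})
    return related.most_common()

# Buyers of a printer most often also bought toner.
print(also_bought("printer", orders))
```

A real system would query a sales database rather than an in-memory list, but the cross-sell logic is the same: rank the items that co-occur most often with the one the customer is viewing.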
Business to Business
While retailers like Amazon.com and eToys.com get all the headlines at present, many experts predict that the Internet marketplace for business-to-business commerce will soon overshadow the retail market by a large
margin. Business-to-business commerce is exactly what it sounds like—businesses providing goods and services to other businesses. Business-to-business transactions can mimic business-to-consumer commerce closely. Businesses buy paper clips, computers, motor vehicles, and travel products all the time, just as individuals do. Companies like Dell Computer Corporation (www.dell.com) and the Internet Travel Network (www.itn.net) do good business selling to companies, governments, and other organizations this way. But a greater potential may lie in providing information services to businesses. Trucking companies, for example, typically waste a lot of money moving empty trucks around because the next hauling job usually starts some distance from where the previous one ended. There’s a market for a company that finds buyers for the hauling capacity that presently goes unused. The great thing about business-to-business work is that companies often buy more of what you’re selling, more frequently, than an individual consumer does. The volume of transactions might be lower, but their individual value is higher.
Business-to-Employee Models
Business-to-employee models involve providing goods and services to employees. In many cases, corporations have found that providing employee benefit information over the intranet has cut down on a lot of questions to the personnel department. Because some benefits can be changed during a set time of the year, such as health insurance, or multiple times a year, like 401(k) plans, companies now allow employees to make changes directly over the corporate intranet. Some companies even sell products like T-shirts and coffee mugs to their employees over the intranet. Business-to-employee services generally include benefits information, online benefits changes, and company merchandise sales.
Business to Government
In the United States, anyone living near the Washington, DC area can tell you about business-to-government models. Many businesses will sell goods and services to the government just as they would to any other client, and some companies even provide contract services to federal agencies on multiyear contracts. Selling to the government is much the same as selling to the consumer; however, U.S. government pricing is regulated by the General Services Administration (GSA), which doesn’t allow much room to haggle over prices.
Consumer to Consumer
Have you ever wanted to sell your car or held a yard sale? If so, you were conducting consumer-to-consumer commerce. There are various methods for selling that unwanted item: bulletin boards, newspaper ads, yard sales, and online auctions. Online auction sites, such as eBay, have become popular because the seller doesn’t have to worry about the details of payment. Generally, auction sites will handle the transaction for you and charge you a small percentage of the sale for their services.
Meta-aggregator
A meta-aggregator provides brokerage services for businesses and consumers. One of these services is opening markets to consumers that were traditionally closed to direct access. Expedia.com is a good example of a meta-aggregator in the travel industry, where previously you had to go to a travel agent to book your vacation. Meta-aggregators also lower costs to consumers by providing price-comparison services for products. An example of price comparison can be seen when you shop at MySimon.Com: you search for a product, pick the product type from the search results, and get a list of vendors and their prices.
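The price-comparison service just described amounts to collecting each vendor’s price for a product and ranking the results. A minimal sketch, with vendor names and prices invented for illustration (a real aggregator would fetch these from vendor feeds, and listed prices can be stale):

```python
# Hypothetical vendor price list for one product, as a price-comparison
# site might assemble it. Vendor names and prices are invented.
listings = {
    "Vendor A": 49.99,
    "Vendor B": 44.95,
    "Vendor C": 52.50,
}

def rank_by_price(listings):
    """Return (vendor, price) pairs sorted cheapest first."""
    return sorted(listings.items(), key=lambda pair: pair[1])

for vendor, price in rank_by_price(listings):
    print(f"{vendor}: ${price:.2f}")
```

Because the aggregator’s copy of each price may lag behind the vendor’s own site, the ranking is only as fresh as the last update, which is exactly why the tip about double-checking prices matters.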
If you use meta-aggregator services for price comparison, make sure that you double-check the results by visiting the vendor’s web site. Sometimes, the meta-aggregator may not have gotten the new information and you will wind up paying a lot more than you thought. One time, I even found the lowest price for a network interface card (NIC) from the vendor listed as having the highest price.
Summary
In this chapter, you learned about basic Internet business concepts. The first important concept we discussed was intellectual property law. Intellectual property is anything that is created, designed, or imagined by an individual or a company. Examples of intellectual property include works of literature, music, software, and so on. Intellectual property laws are designed to protect the rights of creators to profit from their creative works. Copyright laws protect written materials such as those listed previously, while trademark laws protect slogans, phrases, and visual devices. Patent laws are used to protect inventions, processes, and procedures. Intellectual property laws vary from country to country (and sometimes province to province), but they are a major part of doing business in the global marketplace. Other concerns must also be dealt with when conducting business on an international scale, such as international laws, language barriers, and cultural differences. What may seem acceptable to an American may be insulting to an Egyptian. Similarly, while certain drugs may be legal to purchase and use in European countries, they may be illegal in the United States. The bottom line is that you need to understand the laws and customs of a country, or hire a consultant or lawyer, to be able to do business there. While dealing with the global marketplace may seem a bit daunting, you need to bring customers to your web site if you want to sell your products and/or services. Internet marketing strategies include mailing lists, web ads, webrings, free information, and a platform for customer feedback. Customers have to know you exist before they can visit your site, but to keep them coming back you have to give them a reason—price alone just doesn’t work anymore. Online customer support, such as Frequently Asked Questions (FAQs) and technical support databases that allow quick and easy
access to solutions, makes a difference in client retention. Even adding customer feedback options, such as e-mail or online chats with company representatives, won’t work if you don’t follow up on the complaints. Different types of networks play different roles in reducing your company’s total costs. The networks we discussed in this chapter included the intranet, Internet, extranet, and virtual private network (VPN). Intranets provide a communications medium unheard of even five years ago, allowing your employees to share information and files, hold conferences, and collaborate on projects. Internet sites, when developed properly, can dramatically reduce your customer service calls by providing frequently asked questions (FAQs) and other product information online. Extranets allow your company to work closely with business partners and share information that each needs access to. VPNs allow you to use the Internet to connect your employees to the intranet from anywhere in the world. All of these options help reduce overhead costs. Just as there are different types of networks, there are different forms of business models. The storefront (bricks & mortar) is the oldest model around: you have a physical location that people can visit to browse your inventory. Businesses sell to consumers (business to consumer), to other companies (business to business), and to governments (business to government). Consumers can also sell goods to other consumers (consumer to consumer). Armed with the information presented in this chapter, you should be able to understand basic Internet business concepts.
Exam Essentials Understand intellectual property laws and identify the various types that exist. Intellectual property laws exist to protect the rights of creators to profit from their creative works. Copyright law protects works of literature, music, and other visual and performance arts, including software. Copyright law is generally interpreted to allow for limited, free use of protected materials under certain circumstances. This is called fair use. Trademark law protects slogans, phrases, and visual devices. Patent law protects inventions, including devices, processes, and procedures.
Understand the issues that arise when conducting business in the global marketplace. The Internet facilitates a global exchange of products, services, and money, which means buyers and sellers must be sensitive to language and cultural differences. While it’s relatively easy to collect money from anyone in the world through the use of credit cards, delivering physical goods can be more difficult. Taxes are another issue, as you may have to pay export duties to the country shipping the merchandise and import duties to the country receiving it.
Know different Internet marketing techniques. Web banner ads typically are sold based on the number of times they’re exposed to surfers. You may also choose to advertise your site as part of a webring or on media other than the Internet. There are advantages to be realized by putting lots of free information on the Internet, such as reducing calls to customer support lines and providing customers with a source of help that’s always available. The Internet provides customers with an efficient way to broadcast their feelings, good and bad, about your company and its products. Negative publicity needs to be resolved with the person(s) making the complaint.
Identify the different types of networks as they relate to business. An Internet site can reduce the load on an organization’s customer-service department and bring in cash through online sales. An extranet can make an organization’s interactions with its suppliers and subcontractors more efficient by providing them with easy, automated access to the information they need about your organization. An intranet can provide efficient file sharing, conferencing, and database access to the people in a building, all with familiar and easily supportable web tools. A virtual private network (VPN) cuts down on the expenses of the dedicated lines that were once required to connect geographically separate network nodes.
Identify the different business models used in Internet commerce. 
A business-to-consumer site sells goods and services to individuals, usually by presenting them with a catalog, allowing them to choose items and then pay for their selections with a credit card. A business-to-business site focuses on the needs of corporations and other organizations. Such sites sell the goods and information these entities need. Business-to-government transactions take place between a business and a government entity. Consumer-to-consumer transactions take place through web ads, distribution lists, and online auctions.
Review Questions 1. Copyright law applies to _______. A. Software only B. Works of literature only C. Slogans and logos D. Software, literature, and works of graphic and performing arts 2. Copyright law prohibits _______. A. All use of copyrighted material by anyone other than the copyright
holder B. All unlicensed use of copyrighted material by anyone other than
the copyright holder C. All but “fair use” of the copyrighted material by anyone other than
the copyright holder without permission D. Parody of copyrighted works 3. To be able to enjoy copyright protection, a work _______. A. Must be registered with a government copyright agency B. Must be an original creation C. Must have an individual author D. Must not include excerpts of other copyrighted works 4. In the United States, what government agency registers copyrights? A. The Department of Justice B. The Patent and Trademark Office C. The Copyright Office D. The individual states
5. The process of securing formal permission, perhaps in exchange for
money, to use copyrighted material is called _______. A. Licensing B. Rights management C. Permissioning D. Copyright contracting 6. Two ways to transfer a copyright permanently from the creator of a
piece of intellectual property to another party are _______. A. Work-made-for-hire agreements and assignment contracts B. Work-made-for-hire agreements and permanent licensing C. Assignment contracts and licensing D. Reregistration and work-made-for-hire agreements 7. Trademark protection applies to _______. A. Logos B. Words, phrases, logos, and representative devices C. Only words, phrases, logos, and representative devices that have
been registered with the Trademark Office D. Advertising slogans for as-yet-unreleased products 8. The ™ symbol denotes _______. A. A registered trademark B. A trademark for which registration is pending C. Something for which trademark protection is claimed, even if registration has not yet taken place D. A trademark whose validity has been upheld in court
9. A notable risk assumed by companies that hold trademarks is ______. A. That the trademarked term will become part of the common
vocabulary and therefore lose its protection B. That a judge will disallow the validity of the trademark C. That another company will come out with a similar slogan or
device D. That the value of the trademark will fade in the public
consciousness 10. A patent secures protection for _______. A. A physical product B. An idea for a product C. A software program D. A product or process 11. If two organizations invent the same product independently and
simultaneously, which enjoys patent protection on the product? A. Both B. Neither C. Whichever one applies for and receives a patent first D. Whichever sues the other and wins the right to apply for a patent 12. To qualify for patent protection, a product or process must
be _______. A. Useful B. Unique C. Useful, new, and nonobvious D. New and nonobvious
13. Open source software in the public domain _______. A. May not be sold at a profit B. May be modified by anyone C. May not be incorporated into for-profit software D. Must carry a copyright notice 14. The most common human language for interaction across borders on
the Internet is _______. A. English B. French C. Esperanto D. XML 15. Dealing with nonnative speakers of your language requires _______. A. An eagerness to correct grammar and spelling B. A tendency to use elaborate colloquialisms C. Patience and appreciation of their efforts D. A total reliance on translation software 16. You might consider establishing a warehouse in another country
so _____. A. Your tax standing in that country is clearer B. You can worry less about delivery times C. You can avoid customs duties D. You can apply for United Nations subsidies
17. A web site posted by a disgruntled customer is best dealt with
by _______. A. Filing a libel suit B. Providing honestly good products and services so the positive buzz
drowns out the complaints C. Attacking the site covertly D. Slandering the publisher of the site in newsgroups 18. Placements of banner ads typically are sold in terms of _______. A. Cost per hundred exposures B. Cost per thousand exposures C. Cost per hour of exposure D. Cost per byte 19. Lots of advertisers pay for their ads with reciprocal ad space or other
non-cash commodities. Such payments are called _______. A. Illegal B. Nontaxable C. Payments in kind D. Payment in viewer volume 20. The easiest way to handle intercurrency sales is with _______. A. Wire transfers B. Payment in kind C. Interbank transfers D. Credit cards
Answers to Review Questions 1. D. Copyright protection applies to software, literature, and works of
graphic and performing arts. 2. C. Fair use allows for academic excerpting, commentary, parody, and
other applications. 3. B. Though registration of a copyrighted work with a government
agency can help establish the date of its creation, it is not necessary in order to have copyright protection. 4. C. The United States Copyright Office handles copyright
registrations. 5. A. The process of securing permission is called licensing. 6. A. Work-made-for-hire agreements and assignment contracts can
transfer copyright from one holder to another. 7. B. Trademarks protect the words, phrases, logos, and representative devices used to identify products and services. 8. C. The ™ symbol marks a claim of trademark protection even before registration; the ® symbol
denotes a registered trademark. 9. A. A trademarked word that becomes generic loses its protection.
Aspirin, for example, once referred to a particular brand of painkiller. 10. D. Patents protect items (products) and processes. 11. C. To have patent protection, an inventor must apply for and be
granted a patent by a government. 12. C. The law specifies that patentable products and ideas must be useful, new, and nonobvious. 13. B. You can sell public-domain software for a profit, or at least try to, and anyone is free to modify it.
14. A. Because there are so many English speakers on the Internet, that
language is the default for many international communications. 15. C. You have to be patient with and appreciative of people who are
going to the trouble to accommodate you. 16. B. By establishing distribution centers in other countries with which
you do a lot of business, you eliminate the time and expense associated with lots of international shipments. 17. B. You can’t argue with the truth and expect to win. 18. B. Ad rates often are quoted in terms of some price per 1,000 expo-
sures of the ad. 19. C. A payment in kind is any payment made with something other than
cash. Such transactions are taxable. 20. D. Though their issuers collect fees, credit cards allow for easy conversion between currencies.
1. People are complaining that they don’t know what type of documents
your site has. What would improve things? A. A site map B. A more consistent user interface C. Full-text search D. A content type policy 2. People are complaining that there are too many documents listed on
your site, and they can’t find the documents they are looking for. What technology would best help them find the information they need? A. A site map B. A more consistent user interface C. Full-text search D. A content type policy 3. You don’t want your information to be indexed by the large search
engines. What can you do? A. Update the site infrequently. B. Add an entry in your ROBOTS.TXT file to ward off spidering robots. C. Avoid links from other sites. D. Move to a database-driven dynamic web site. 4. Which is not a factor in a client’s performance level? A. Network speed B. Browser version C. Color scheme D. Operating system
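The ROBOTS.TXT entry mentioned in option B of question 3 follows the robots exclusion convention. A minimal sketch that asks every compliant spider to stay out of the entire site:

```
# robots.txt, served from the web site's root directory.
# "User-agent: *" addresses all compliant spiders;
# "Disallow: /" asks them not to index anything on the site.
User-agent: *
Disallow: /
```

Note that the file is purely advisory: well-behaved search-engine spiders honor it, but nothing forces a robot to comply.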
5. Secure Sockets Layer (SSL) usually is used for what? A. To encrypt e-mail messages B. To secure the flow of credit card transaction data among banks C. To make transactions between web browsers and web servers
secure D. To encrypt usernames and passwords 6. Log files can contain information that indicates what? A. A stolen password B. A brute force attack on a password C. A disgruntled employee D. Whether bad guys are spoofing IP addresses to bypass your
firewall 7. What is public-key encryption’s big advantage on the Internet? A. There’s no need to exchange keys privately. B. It’s more secure than other forms of encryption. C. It’s especially suitable to e-mail encryption. D. It makes routers’ work easier. 8. One way to design a web page for compatibility with diverse browsers
is to _____. A. Include a JavaScript test of the browser version B. Design with the latest HTML specification C. Not use tables D. Avoid streaming media
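Option A in question 8 can be sketched like this; the version threshold and the messages are illustrative assumptions, not from the text:

```html
<!-- A hypothetical browser-version test. The browser reports its own
     version in navigator.appVersion; the page branches on it. -->
<SCRIPT LANGUAGE="JavaScript">
if (parseInt(navigator.appVersion) >= 4) {
  document.write("Full-featured page for 4.0-era browsers.");
} else {
  document.write("Simpler layout for older browsers.");
}
</SCRIPT>
```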
9. Multimedia content that requires clients to have plug-ins works best
_______. A. On public web sites B. On sites with an academic focus C. For streaming media D. In environments where the work at hand absolutely requires the
plug-in-enabled content 10. What is the emerging replacement for the GIF image? A. Tagged Image File Format (TIFF) B. Portable Network Graphics (PNG) C. Joint Photographic Experts Group (JPEG) D. Encapsulated PostScript (EPS) 11. The audio-video file format developed for Microsoft Windows is
called _____. A. QuickTime B. Gzip C. Audio Visual Interleaved (AVI) D. MPEG 12. What are the two most popular streaming media standards? A. RealPlayer and Windows Media Player B. RealPlayer and Audio Visual Interleaved (AVI) C. RealPlayer and RealPlayer G2 D. Windows Media Player and Motion Picture Experts Group (MPEG)
13. The tape archive format usually is used in conjunction with _____. A. The Zip compression format B. The Gzip compression format C. The BinHex compression format D. The Solaris archive format 14. Open source software that is protected by copyleft _____. A. Cannot be sold for profit B. May not be modified C. Must be attributed to the Free Software Foundation D. Includes the Linux operating system 15. The success of a banner-ad placement is usually measured by its _____. A. Cost per view B. Clickthrough rate C. User commentary D. Graphic design values 16. What is one business advantage of a virtual private network? A. The extra security it provides B. The reduction in telecommunications expenses C. The inherent capability to bring vendors onto your network D. The elimination of the need for an Internet site 17. What is the proper HTML syntax to use to make the copyright symbol
18. What are the best types of documents to distribute using Electronic
Data Interchange (EDI)? A. Orders and invoices B. Memos C. Reports D. E-mail 19. What is the best way to increase traffic to a web site? A. Bulk e-mail B. Search engine placement C. Banner ads D. Banner ads on a search engine site 20. A web site designed for use by a company’s inside sales staff is an
23. A network or web site designed specifically for business-to-business
transactions is called an ______. A. Internet B. Extranet C. E-commerce D. Intranet 24. Online sales transactions are best protected by which technology? A. HTML B. SSL C. HTTP D. PGP 25. Why is it important to obtain permission before linking to a graphic
on a third-party site (such as using HTML that displays a graphic or a portion of another web page in your web page)? A. It is illegal in some states. B. You are making a derivative work. C. It is a requirement of HTML. D. The owner of the originating web site must do something to the
server on his end to make it possible. 26. What is the most common mistake people make when designing an
international web site? A. Using HTML coding B. Using graphics C. Using English as the primary language D. Using colloquialisms
27. Which type of site has the capability to reach the largest audience? A. Internet B. Extranet C. Intranet D. Outernet 28. Point-to-Point Tunneling Protocol (PPTP) is most often used for pro-
viding ______ functions between a corporate office and a branch office. A. Remote access B. VPN C. Internet D. Telephone 29. When setting up a U.S. web site for international use, what is the high-
est level of encryption you can support? A. 40 bit B. 60 bit C. 80 bit D. 128 bit 30. Your web site about classic Studebaker automobiles is not showing up
when you search for “Studebaker” on your favorite web search engine. You can improve your web site’s chances of appearing in search results through the use of __________. A. HTML tags B. JavaScript C. Meta tags D. Search forms
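The meta tags named in option C of question 30 sit in a page's <HEAD> section; search engines of the period read them when indexing and describing a page. A sketch for the Studebaker site, with hypothetical keyword and description text:

```html
<HEAD>
<TITLE>Classic Studebaker Automobiles</TITLE>
<!-- Hypothetical META content for illustration -->
<META NAME="keywords" CONTENT="Studebaker, classic cars, automobiles">
<META NAME="description" CONTENT="A site about classic Studebaker automobiles.">
</HEAD>
```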
31. A merchant system will allow _______ on your web site. A. Secure access B. FTP access C. Credit card transactions D. The use of usernames and passwords 32. Push technology allows users to ________ . A. Subscribe to information and automatically receive updates of that
information B. Send information to other users without knowing their addresses C. Allow other users to access their computer’s hard disk D. Share their files via FTP and e-mail 33. The DIR attribute in HTML 4 indicates what? A. Forms order B. List formatting C. Picture formatting D. Arrow direction 34. Which graphics format requires a plug-in to be viewed by a web
35. Which scripting language is compiled before execution? A. XML B. Java C. VBScript D. VRML 36. What is the use of the "NAME=" attribute in the <FRAME> tag
within an HTML page? A. To assign a frame horizontal position B. To assign a frame vertical position C. To name an image frame D. To identify a target frame 37. Of those listed, which web graphics format allows an image to be
transparent (such as you can see part of the web page through it)? A. JPEG B. GIF89a C. GIF87a D. PNG 38. If you want to use 3-D effects within a web-based presentation, what
is the best technology for the task? A. RealPlayer B. Shockwave C. QuickTime VR D. VRML
39. What should you change in the following HTML web page code to make sure it displays properly?

<HTML>
<HEAD>
<TITLE>David's web page<TITLE>
</HEAD>
<BODY>
Welcome to my web page. Enjoy your visit
<A HREF="#">Click me</A>
</BODY>
</HTML>

A. Change all tags to lowercase. B. Change all tags to lowercase and correct the <TITLE> tag. C. Correct the <TITLE> tag. D. Nothing. 40. Portals are: A. High-traffic web sites composed of a wide range of content, hyperlinks to vendor Internet sites, and services B. An open standard created by MasterCard and Visa C. A standard created by the Internet Purchasing Roundtable as a
free, open standard to expedite the purchase, payment, and delivery of goods D. Rely on value-added networks (VANs) to transport data from one
company to another 41. If you want to play streaming video on your web site, which technol-
ogy would you implement? A. RealPlayer B. Flash C. Acrobat D. QuickTime VR
42. METHOD, ACTION, ENCTYPE, TARGET are all attributes of
which HTML tag? A. <TITLE> B. C. <FORM> D.

39. C. The closing </TITLE> tag was at fault. In the example given, the / was missing from the ending tag. 40. A. The open standard created by MasterCard and Visa is known as
Secure Electronic Transactions (SET). Open Buying on the Internet is the standard created by the Internet Purchasing Roundtable. EDI is the technology that uses VANs to transport data. 41. A. Of the formats listed, RealPlayer is the only streaming video
format. 42. C. The attributes only correspond to the <FORM> tag.
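As a closing illustration for question 42, all four attributes belong to the <FORM> tag. A minimal sketch, in which the CGI path and field names are hypothetical:

```html
<!-- METHOD selects GET or POST; ACTION names the server program that
     receives the data; ENCTYPE sets how the data is encoded; TARGET
     names the frame or window that displays the response. -->
<FORM METHOD="POST" ACTION="/cgi-bin/order.cgi"
      ENCTYPE="application/x-www-form-urlencoded" TARGET="_self">
  <INPUT TYPE="TEXT" NAME="quantity">
  <INPUT TYPE="SUBMIT" VALUE="Order">
</FORM>
```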