804 1066_05F9_c2
© 1999, Cisco Systems, Inc.
Establishing Best Practices for Network Management Session 804
Copyright © 1998, Cisco Systems, Inc. All rights reserved. Printed in USA. 1066_05F9_c2.scr
Agenda
• Introduction to Best Practices
• Preparing the Network for Management
• Managing Change
• Fault Management
• Summary
Introduction to Best Practices
Network Downtime Is Costly
• The Internet and e-commerce have significantly increased the availability stakes…
  - 24-hour banking
  - E-trade
  - Global economy
[Figure: bar chart, Infonetics Cost of WAN Downtime ’98; y-axis: average dollars per year ($000,000); bars labeled Productivity Loss, Revenue Loss, Costs*, and Enterprise Network Mgmt. Budget; data labels $3.6M, $4.2M, $3.6M]
*Due to hard downtime and service degradations
Best Practices Defined
• Applying what works well for others to improve overall network availability
  - Reduce the time required for planned outages (scheduled change), including changes with no associated outage
  - Reduce network downtime during unplanned outages (unscheduled change)
Lots of Practices: Some Truths
• Even the best NM products can be useless with “bad” practices
• Tools help you to do your job; they are NOT the job
• Communication and security are the “bread and butter” of best practices
Do What Works for You!
Preparing the Network
Congratulations! You’ve just Been Promoted to Manage the Entire Network for the Western Region...
What They’re Really Thinking…
• What am I getting into… how am I going to do this? Where do I begin?
• I sure hope he lasts longer than the last guy...
• What a loser! Does he have any idea what he’s in for? How come we don’t have legs?
Preparing the Network for Management
Best Practices
1. Selecting the “right” tools
2. Preparing the devices
3. Preparing the tools
4. Building a baseline
5. Maintaining “management”
Selecting the Right Tools
• How do I select the “right” set of management applications?
  - Understand the technologies and buzzwords
  - Understand your network and end-user requirements
  - Implement company standards
  - Many choices: evaluate and choose what’s right for your environment
Platforms and Vendor Specific Management
• NMS
  - SNMP-based, status map, and trap receiver
  - HP Openview, Tivoli Netview, CA UniCenter, SNMPc, etc.
  - MicroMuse, Seagate, Concord, Enterprise Pro, and MRTG
• Vendor Specific
  - Geared towards managing a specific vendor’s devices only
  - Optivity, Transcend, CiscoWorks2000
Integrating Enterprise Management
[Diagram: Helpdesk, Trouble-ticket, and Event MOM above the Application, DBMS, Server, Network, Desktop, and User domains, with Service and Device elements beneath them]
Understand Your Organization
• Roles and responsibilities
• Escalation policy
• Help desk vs. operations
• Planners vs. administrators
Preparing the Devices
• Security for Management
• Notification
• Baseline
Securing the Devices
• Identify scope of control
  - Who needs access to what?
• Secure and log access
  - Physical access (badge readers)
  - Telnet and console (AAA accounting, Syslog)
  - SNMP communities (ACL, SNMP traps)
Sample Security Configuration
aaa new-model
aaa authentication login test tacacs+ line
aaa authentication enable default tacacs+ enable
access-list 8 permit 161.44.34.157
logging 161.44.34.157
logging source-interface Loopback0
snmp-server community public RO
snmp-server community bitbuck RW 8
snmp-server contact Paul L. Della Maggiora
snmp-server chassis-id 071293
snmp-server system-shutdown
snmp-server trap-source Loopback0
snmp-server trap-authentication
snmp-server host 161.44.34.157 public frame-relay
Callouts (top to bottom): Tacacs+, SNMP Community ACL, Syslog, SNMP gets and sets, SNMP traps
Security Access Changes
• Password change policy
  - Quarterly
  - Every time an employee leaves
• Solution
  - Use RADIUS or TACACS+
  - Script the change
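As one way to picture "script the change", the sketch below builds the list of IOS configuration lines that would rotate an SNMP read-write community on each device. It is a minimal illustration, not a deployment tool: the device names, the new community string, and the helper names `build_rotation_commands` and `plan_rotation` are all hypothetical, and actually pushing the commands to devices is deliberately left out.

```python
# Minimal sketch: plan (but do not send) the commands that rotate an SNMP
# read-write community. All device names and community values are hypothetical.

def build_rotation_commands(old_community, new_community, acl_number):
    """Return IOS lines that replace one RW community string with another."""
    return [
        "configure terminal",
        f"no snmp-server community {old_community}",
        f"snmp-server community {new_community} RW {acl_number}",
        "end",
        "write memory",
    ]


def plan_rotation(devices, old_community, new_community, acl_number=8):
    """Map each device name to its own copy of the command list."""
    commands = build_rotation_commands(old_community, new_community, acl_number)
    return {device: list(commands) for device in devices}


plan = plan_rotation(["router-a", "router-b"], "bitbuck", "n3wstr1ng")
for device, commands in plan.items():
    print(device)
    for line in commands:
        print("  " + line)
```

Producing a reviewable plan first, and sending it in a separate step, keeps the rotation consistent across devices and leaves a record of what was intended.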
Notification
• SNMP Traps
  - Critical for NMS notification
• Syslog
  - Cisco-specific notification
Sample Notification Configuration
aaa new-model
aaa authentication login test tacacs+ line
aaa authentication enable default tacacs+ enable
access-list 8 permit 161.44.34.157
logging 161.44.34.157
logging source-interface Loopback0
snmp-server community public RO
snmp-server community bitbuck RW 8
snmp-server contact Paul L. Della Maggiora
snmp-server chassis-id 071293
snmp-server system-shutdown
snmp-server trap-source Loopback0
snmp-server trap-authentication
snmp-server host 161.44.34.157 public frame-relay
Callouts (top to bottom): Tacacs+, SNMP Community ACL, Syslog, SNMP gets and sets, SNMP traps
Building a Baseline
• Document the network
  - Maps
  - Spreadsheets/databases
• Track inventory
  - Identify equipment and who owns it
• Backup configurations
Building a Baseline
• Collect performance data
  - Snapshot of the network
  - Provides historical data for comparison
  - Useful for capacity planning and trending
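To illustrate how historical data supports comparison, the sketch below checks a new measurement against a baseline built from earlier samples. The utilization figures and the three-standard-deviation tolerance are assumptions chosen for the example, not values from this session.

```python
from statistics import mean, stdev


def outside_baseline(history, current, tolerance=3.0):
    """True if current is more than `tolerance` standard deviations from the mean."""
    baseline = mean(history)
    spread = stdev(history)
    return abs(current - baseline) > tolerance * spread


# Hypothetical daily link utilization samples, in percent.
history = [31.0, 29.5, 33.2, 30.8, 32.1, 28.9, 31.7]

print(outside_baseline(history, 32.0))  # within the normal range
print(outside_baseline(history, 71.0))  # far above the baseline
```

A real baseline would usually be kept per interface and per time of day, since a value that is normal at noon may be unusual at midnight.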
Discovering the Network
• Auto-discovery can make documentation easy… but the daemons must be tamed
  - Filters
  - Seedfiles
  - Discovery intervals
  - Exchange inventory among multiple autodiscovery tools
Layer 2 Autodiscovery
1. Query seed device via SNMP
2. Query CDP neighbor table (ciscoCdpMIBObjects)
3. Interrogate neighbors
Caveat: CDP only sees Cisco devices

c55k-26 (enable) sho cdp neigh
Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
                  S - Switch, H - Host, I - IGMP, r - Repeater
Port     Device-ID                 Port-ID           Platform           Capability
-------- ------------------------- ----------------- ------------------ ----------
4/1      002261261                 4/1               WS-C5000           T B S
4/1      002274433                 4/1               WS-C5000           T B S
4/1      069004796                 4/1               WS-C5500           T B S
4/1      Router_81.130             Ethernet0         cisco 4500         R
4/1      WBU_GATEWAY               Ethernet0         cisco 4500         R
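The three discovery steps above amount to a breadth-first walk outward from the seed device. The sketch below shows that walk over an in-memory table; `CDP_NEIGHBORS` and `get_cdp_neighbors` are hypothetical stand-ins for a real SNMP query of each device's CDP neighbor table, and the topology (including `edge-sw-1`) is invented for illustration.

```python
from collections import deque

# Hypothetical neighbor data; a real tool would fetch this per device via SNMP.
CDP_NEIGHBORS = {
    "c55k-26": ["Router_81.130", "WBU_GATEWAY", "069004796"],
    "Router_81.130": ["c55k-26"],
    "WBU_GATEWAY": ["c55k-26", "edge-sw-1"],
    "069004796": ["c55k-26"],
    "edge-sw-1": ["WBU_GATEWAY"],
}


def get_cdp_neighbors(device):
    """Stand-in for querying one device's CDP neighbor table."""
    return CDP_NEIGHBORS.get(device, [])


def discover(seed, max_hops=None):
    """Breadth-first discovery; returns {device: hop count from the seed}."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        device = queue.popleft()
        if max_hops is not None and hops[device] >= max_hops:
            continue  # hop limit reached: do not interrogate this device
        for neighbor in get_cdp_neighbors(device):
            if neighbor not in hops:
                hops[neighbor] = hops[device] + 1
                queue.append(neighbor)
    return hops


print(discover("c55k-26"))
print(discover("c55k-26", max_hops=1))
```

The `max_hops` limit plays the role of a discovery filter: it is one simple way to keep auto-discovery from wandering beyond the part of the network you intend to manage.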
Layer 3 Autodiscovery
1. Start with default router
2. Query MIB II ifTable, ipAddrTable, ipRouteTable
3. Interrogate neighbors
Special cases, e.g. IP unnumbered, HSRP

4500-4>sho ip rout
Codes: C - connected, S - static, I - IGRP, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, * - candidate default
       U - per-user static route

Gateway of last resort is not set

     100.0.0.0/8 is subnetted, 1 subnets
O       100.100.100.0 [110/70] via 172.16.11.1, 13:35:34, Serial0
     153.10.0.0/16 is subnetted, 1 subnets
C       153.10.1.0 is directly connected, Serial1
     172.16.0.0/16 is subnetted, 1 subnets
C       172.16.11.0 is directly connected, Serial0
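Step 3 depends on knowing which neighbors to interrogate next. The slide obtains them from ipRouteTable via SNMP; as a simplified stand-in, the sketch below pulls next-hop addresses out of the CLI route text shown above. The function name `next_hops` is hypothetical, and the pattern only covers the "via <address>" form seen in this output.

```python
import re

# Route lines copied from the sample output above.
ROUTE_OUTPUT = """\
O       100.100.100.0 [110/70] via 172.16.11.1, 13:35:34, Serial0
C       153.10.1.0 is directly connected, Serial1
C       172.16.11.0 is directly connected, Serial0
"""

# Matches a dotted-quad address following the word "via".
NEXT_HOP = re.compile(r"via (\d{1,3}(?:\.\d{1,3}){3})")


def next_hops(route_text):
    """Return the distinct next-hop addresses found in the route text, sorted."""
    return sorted(set(NEXT_HOP.findall(route_text)))


print(next_hops(ROUTE_OUTPUT))
```

Directly connected routes carry no next hop, so only the OSPF-learned route contributes a neighbor to visit.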
Inventory
• Typical NMS is not enough
  - IP address, community strings, and interfaces
• Third-party management suites and vendor-specific tools provide richer content
• MIBs are generally vendor specific, although the Entity MIB will change this
Inventory
• Items of interest
  - System information
  - Chassis information
  - Chassis cards
  - Interfaces
  - Storage and memory
  - Serial numbers
• All information available via IETF and Cisco MIBs
Configurations
• Collection repository
  - Useful for staging new configs
  - Version control helps with space and documentation
• How to automate
  - Scheduled backup
  - Watch Syslog
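"Watch Syslog" can be pictured as scanning incoming messages for the configuration-change notice and using it to trigger a backup. The sketch below looks for the Cisco IOS `%SYS-5-CONFIG_I` message; the exact wording after the mnemonic varies by software release, so the pattern, the sample line, and the function name `config_change` should be read as assumptions.

```python
import re

# Assumed message shape: "%SYS-5-CONFIG_I: Configured from <source> by <user> ..."
CONFIG_EVENT = re.compile(
    r"%SYS-5-CONFIG_I: Configured from (?P<source>\S+) by (?P<user>\S+)"
)


def config_change(syslog_line):
    """Return who changed the config and from where, or None if not a change notice."""
    match = CONFIG_EVENT.search(syslog_line)
    if match is None:
        return None
    return {"source": match.group("source"), "user": match.group("user")}


# Hypothetical syslog line for illustration.
sample = ("Mar  1 00:12:41 router-a 45: %SYS-5-CONFIG_I: "
          "Configured from console by vty0 (161.44.34.157)")

print(config_change(sample))
print(config_change("%LINK-3-UPDOWN: Interface Serial0, changed state to down"))
```

A backup job could call this on each received line and fetch the device configuration whenever the result is not `None`.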
Maintaining Management
• Adding new devices
• Keeping the management applications up-to-date
• New management products and standards
An Ongoing Process!
Change Management
Post Mortem Blues
• Unplanned outages may be the result of many factors. How do you explain and account for what occurred?
  - Fact based vs. hearsay
  - Who, what, and when was the change made?
  - Your job may be at stake
I Didn’t Do It
Some Facts
• 80% of all outages are due to human error*
  - When an airline’s reservation system went down, thousands of travel agents had to book flights manually. Estimated loss of reservations amounted to $36,000 a minute
*Based on Carnegie-Mellon Usability Study
Common Causes of Change
• Business growth or downsizing
• New applications or services
• Implementing new technology
• Deploying product fixes or upgrades
Change Management Defined
• Configuration, software and hardware changes
• Change tasks include:
  - Anticipating and planning for change
  - Controlling the introduction of change
  - Installing and implementing changes to software and hardware
Best Practices for Change
Best Practices
1. Implementing a change control process
2. Planning for change
3. Implementing change
4. Monitoring change
Change Control Process
1. Change request
  - End user request
  - New app, server
  - New network service
2. Change review board
  - Identify risk
  - Schedule change
  - Generate work order
3. Change or work order
  - Tracking #
  - Detailed change requests
4. Implementation
  - Net admin
  - Engineer/tech.
5. Validation
  - Change verification
  - Audit
6. Close work order, or resubmit if problems
Examples
Planning
• Hardware
  - Pre-configure, test prior to upgrade
• Software
  - Research release, defect support, new feature set, and device compatibility
• Configuration
  - Test prior to deployment
• Have a back-out plan
Implementing
• Make different types of changes one at a time
• Maker/checker model
• Understand contingency plan in event of failure
• Validate the change was successful
Monitoring
• Identifying change: who, what, when
• Audit trail
• Fault notification
Change Management Tools
Planning
  - SWIM: Defect, image analysis
  - CWSI: Layer 2/Layer 3 topo
  - Netsys: Impact of change
Deployment
  - SWIM: Download software images
  - CWConfig: Deploy config changes
  - CiscoView: Switch config changes
Monitor
  - CAS: Change audit and reporting service, logs software, config and hardware changes
  - CWSI: Topo and user tracking
Change Scenario
1. User telnets into device and makes a config change (shutdown int)
2. Device updated, Syslog generated
3. Change Agent identifies device change, notifies Archive
4. Archive gets config via transport, validates change w/DIFF
5. IF VALID, Archive gets config and logs details to ENCASE
[Diagram labels: Server, Change Agent, Archive, Audit Log, Syslog, Poll, Transport, Change, Network]
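Step 4 validates the change by diffing the newly fetched configuration against the archived copy. The sketch below shows that comparison with Python's standard `difflib`; it is an illustration of the idea rather than how any of the tools named in this session implement it, and the two configuration snippets are hypothetical.

```python
import difflib


def config_diff(archived, running):
    """Return unified-diff lines between two config texts (empty list if identical)."""
    return list(difflib.unified_diff(
        archived.splitlines(),
        running.splitlines(),
        fromfile="archived",
        tofile="running",
        lineterm="",
    ))


# Hypothetical before/after configs: the interface was shut down.
archived = "interface Serial0\n ip address 172.16.11.2 255.255.255.0\n"
running = "interface Serial0\n ip address 172.16.11.2 255.255.255.0\n shutdown\n"

changes = config_diff(archived, running)
for line in changes:
    print(line)
```

An empty result means the Syslog message was not accompanied by a real change, so nothing needs to be archived; a non-empty result is what would be written to the audit log along with who made the change and when.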
Fault Management
Scenario
• Virginia building-003 network goes down
• Your boss has bad breath
• Multiple people making changes
• Resolution takes nine hours
Scenario
• Result:
  - Network was down an additional four hours due to conflicting changes
  - No one seems to know how the problem occurred or how it was resolved
Best Practices for Fault Management
Best Practices
1. Preventive Measures
2. Coordination
3. Reacting to Faults
4. Escalation Policy
5. Become Proactive
Preventive Measures
• Maintain accurate documentation
  - Key to quick resolution
  - Includes maps, closets, connections, wiring, and servers
  - May require process/policy change
  - Only good if up to date, easy to maintain, and useful
  - Dump it if you can’t maintain it!
Preventive Measures
• Remove single points of failure
  - Alternate paths for mission-critical applications
  - Redundant equipment for critical junctures
  - Ensure appropriate bandwidth to avoid contention and overutilization
  - Permits network rerouting
Coordination
• Communication is KEY...
  - Understand roles and responsibilities
  - Place phones in closets; use cell phones, pagers
  - Publish policies and procedures
Say What You Do, Do What You Say
Coordination
• Establish base of operations
  - All efforts must go through one person
  - Prevents “who dropped the baby” and “slam management”
  - Conduct practice “scramble”
• Train staff on devices and technology
Determination of Faults
• Notification via:
  - NMS status change
  - Trap and event logs
  - Help desk
  - Phone call from tech (“whoops...”)
Determination of Faults
• Remove the “noise” factor
  1. Filter
  2. Prioritize
  3. Appropriately notify
  4. Correlate
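The four noise-reduction steps above can be sketched as a small event pipeline. Everything specific here is an assumption made for illustration: the severity names, the notification channels, and the choice to correlate events by site. The sketch also correlates before it prioritizes, so that each incident is ranked by its worst member event.

```python
from collections import defaultdict

# Hypothetical severity ranking (lower is more urgent) and notification channels.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "info": 3}
NOTIFY = {"critical": "page", "major": "page", "minor": "email", "info": "log"}


def reduce_noise(events, ignore_sources=(), min_severity="minor"):
    """Filter, correlate by site, prioritize, and attach a notification channel."""
    cutoff = SEVERITY_RANK[min_severity]
    # Filter: drop ignored sources and anything below the severity cutoff.
    kept = [e for e in events
            if e["source"] not in ignore_sources
            and SEVERITY_RANK[e["severity"]] <= cutoff]
    # Correlate: treat all events from one site as a single incident.
    by_site = defaultdict(list)
    for event in kept:
        by_site[event["site"]].append(event)
    incidents = []
    for site, members in by_site.items():
        worst = min(members, key=lambda e: SEVERITY_RANK[e["severity"]])
        incidents.append({
            "site": site,
            "severity": worst["severity"],
            "count": len(members),
            "notify": NOTIFY[worst["severity"]],  # Appropriately notify
        })
    # Prioritize: most severe incidents first.
    incidents.sort(key=lambda i: SEVERITY_RANK[i["severity"]])
    return incidents


events = [
    {"source": "sw1", "site": "bldg-003", "severity": "critical"},
    {"source": "sw2", "site": "bldg-003", "severity": "major"},
    {"source": "lab-rtr", "site": "lab", "severity": "critical"},
    {"source": "rtr9", "site": "hq", "severity": "info"},
    {"source": "rtr7", "site": "hq", "severity": "minor"},
]

for incident in reduce_noise(events, ignore_sources={"lab-rtr"}):
    print(incident)
```

Five raw events become two incidents, and only one of them results in a page.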
Reacting to Faults
• Determine fault domain
  - Which equipment, services, and users are affected?
• Determine level of response
  - What is the severity of the fault?
  - Can we kill the backbone?
  - Identify dispatch timeframe and number of people
Reacting to Faults (Severe)
• Determine escalation timeline
  - Criteria and time limits to escalate to next level
  - Opening a case with the TAC
  - Identifying the point of drastic action
Is It Time to Hit the Big Red Switch?
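An escalation timeline is easiest to follow under pressure when it is written down as explicit time limits. The sketch below encodes one as a lookup table; the time limits and owners are hypothetical placeholders, and each organization would substitute its own criteria.

```python
# Hypothetical escalation timeline: (minutes since fault opened, who owns it).
ESCALATION_STEPS = [
    (0, "on-call engineer"),
    (30, "senior engineer"),
    (60, "open a TAC case"),
    (120, "management decision on drastic action"),
]


def escalation_level(minutes_open, steps=ESCALATION_STEPS):
    """Return the last step whose time limit has been reached."""
    current = steps[0][1]
    for limit, owner in steps:
        if minutes_open >= limit:
            current = owner
    return current


print(escalation_level(10))
print(escalation_level(45))
print(escalation_level(200))
```

Publishing the table in advance means nobody has to decide in the middle of an outage whether it is time to escalate.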
Reacting to Faults (Severe)
• Coordinate, communicate, and document
• Debrief
  - Determine source of fault
  - Evaluate recovery efforts
  - Document resolution for continuous improvement process
  - To learn, avoid a CYA environment
Moving from Reactive to Proactive
• Automate fault notification, escalation and resolution via “triggers”
• React to data before it goes bad
• Learn device and network behavior
That doesn’t look right…
Active vs. Passive Polling
• Polling with thresholds vs. event-based polling
  - RMON events and alarms
• Conservation of network traffic vs. device CPU and memory
• Might be a combination of both
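Threshold-based alarms in the RMON style pair a rising threshold with a falling one, so that a value hovering near the limit does not raise the same alarm repeatedly. The sketch below shows that hysteresis on a list of samples; it simplifies real RMON behavior (for example, it ignores start-up alarm options), and the sample values and thresholds are hypothetical.

```python
def threshold_events(samples, rising, falling):
    """Return (index, kind) events using rising/falling thresholds with hysteresis."""
    events = []
    armed_rising = True  # a rising event may fire only while armed
    for index, value in enumerate(samples):
        if armed_rising and value >= rising:
            events.append((index, "rising"))
            armed_rising = False  # wait for a falling crossing before re-arming
        elif not armed_rising and value <= falling:
            events.append((index, "falling"))
            armed_rising = True
    return events


# Hypothetical utilization samples, in percent.
samples = [40, 82, 85, 79, 81, 55, 90]
print(threshold_events(samples, rising=80, falling=60))
```

The dip to 79 and the return to 81 generate no events, because the value never fell to the falling threshold in between. Evaluating this logic on the device and sending only the resulting events is what trades polling traffic for device CPU and memory.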
Fault Management Tools
Planning
  - CiscoView: Real-time monitoring
  - RME: Availability, Syslog and CCO tools
  - CWSI: User tracking, traffic director and topo
Deployment
  - SWIM: Defect analysis
  - CCO/TAC: Case tracking tools
  - Stack Decoder: Crash analysis
Monitor
  - Availability: Monitor key resources
  - Syslog: Reporting, automated recovery
  - 24-Hour Reports: Monitor reloads, Syslog, and change
  - Traffic Director: RMON config and report
Best Practices Can Improve Network Availability
• Prepare the network for management
  - Security, notification and maintenance
• Implement a change control process
  - Plan, deploy and monitor
• Reduce unplanned outage minutes through fault management
  - Prepare, coordinate and be proactive
For More Information
• General network management portal
  http://netman.cit.buffalo.edu/index.html
• Another good network management portal
  http://compnetworking.miningco.com/msubmanage.htm?terms=network+management&cob=home&TMog=5006366091143m&Mint=56534342191358&FFV=1
• “The Simple Times”
  http://www.simple-times.org/pub/simple-times/issues/
• SNMP FAQ
  http://www.cis.ohio-state.edu/hypertext/faq/usenet/snmp-faq/part1/faq.html
For More Information
• Sample Cisco device security configs
  http://www.cisco.com/warp/public/700/tech_configs.html#SECURITY
• Cisco device SNMP configuration tips
  http://www.cisco.com/warp/public/490/index.shtml
• White paper on threshold management
  http://www.ccci.com/product/papers/pete/papers/thresh.htm
• Public domain performance monitoring tool (MRTG)
  http://ee-staff.ethz.ch/~oetiker/webtools/mrtg/mrtg.html
Please Complete Your Evaluation Form Session 804