
Routing Stability Analysis
See http://www.ra.net/statistics for more.
Disclaimer
- Preliminary results
- Intent is not to single out specific providers or vendors
- Intent is to explore the problem space
Routing Instability
- Peaks of 8 million BGP announcements and withdrawals in a given day
- 100 BGP updates a second
- Major providers are not the major causes of instability
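As a rough consistency check on the first two figures above, a peak day of 8 million updates spread evenly over 24 hours comes to a little under 100 per second. The sketch below only does that arithmetic. The even-spread assumption is an illustration; real update traffic is bursty.

```python
# Rough check: convert a daily update count into an average per-second rate.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_rate_per_second(updates_per_day: int) -> float:
    """Average updates per second, assuming an even spread over the day."""
    return updates_per_day / SECONDS_PER_DAY

peak_rate = average_rate_per_second(8_000_000)
print(f"{peak_rate:.1f} updates/second")  # roughly 92.6
```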
Graph (MAE-EAST RSs)
Two months of BGP announcements and withdrawals. There are huge spikes,
and these spikes were caused by only two ISPs. The valleys fall
on the weekends.
Long-term Trends
- Instability is increasing (despite aggressive use of dampening algorithms).
- BGP traffic is a function of weekday/weekend.
- Individual ISPs and end-sites can have a disproportionate effect
on routing stability. The only way to control this appears to be
the size of the pipe.
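Dampening schemes commonly charge a route a penalty each time it flaps, let the penalty decay exponentially, and suppress the route while the penalty stays above a threshold. The following is a minimal sketch of that idea, not any vendor's implementation. The class name and every parameter value (penalty per flap, thresholds, half-life) are illustrative assumptions.

```python
import math

class FlapDamper:
    """Penalty-based dampening for a single route (illustrative parameters)."""

    def __init__(self, penalty_per_flap=1000.0, suppress_at=2000.0,
                 reuse_at=750.0, half_life_s=900.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress_at = suppress_at
        self.reuse_at = reuse_at
        self.decay = math.log(2) / half_life_s  # exponential decay constant
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay_to(self, now):
        # Decay the accumulated penalty for the time elapsed since last_update.
        self.penalty *= math.exp(-self.decay * (now - self.last_update))
        self.last_update = now
        if self.suppressed and self.penalty < self.reuse_at:
            self.suppressed = False  # penalty has decayed enough to reuse the route

    def flap(self, now):
        """Record one flap at time `now` (in seconds)."""
        self._decay_to(now)
        self.penalty += self.penalty_per_flap
        if self.penalty >= self.suppress_at:
            self.suppressed = True

    def is_suppressed(self, now):
        self._decay_to(now)
        return self.suppressed

# Three flaps within 20 seconds push the route over the suppress threshold.
damper = FlapDamper()
for t in (0, 10, 20):
    damper.flap(t)
suppressed_soon_after = damper.is_suppressed(30)
suppressed_an_hour_later = damper.is_suppressed(3600)
```

With these assumed values the route is suppressed shortly after the third flap and released again once about four half-lives have passed without further flaps.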
Graph (MAE-EAST RSs)
Generally shows an increase in announcements.
Graph (MAE-EAST RSs)
There is a huge spike between 6-7am, which may be an ISP
configuration window. There is also an increase over the work day
that appears to match the general trend of traffic.
Daily Trends
- Instability is a function of time (load sensitive?)
- Big spike at 6-7am (ET)
- Big drop-off at 5pm (ET)
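The daily pattern can be checked by bucketing update timestamps by hour of day. This is a minimal sketch. It assumes the timestamps are available as Unix seconds and treats Eastern Time as a fixed UTC-5 offset. Both are assumptions, and the sample data is made up.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

ET = timezone(timedelta(hours=-5))  # assumed fixed offset; ignores daylight saving

def updates_per_hour(timestamps):
    """Count updates in each hour of the day (0-23), in ET."""
    counts = Counter()
    for ts in timestamps:
        counts[datetime.fromtimestamp(ts, tz=ET).hour] += 1
    return counts

# Made-up sample: three updates in the 06:00 hour and one in the 17:00 hour.
midnight = datetime(2000, 1, 3, tzinfo=ET).timestamp()
sample = [
    midnight + 6 * 3600 + 5,
    midnight + 6 * 3600 + 900,
    midnight + 6 * 3600 + 3000,
    midnight + 17 * 3600 + 60,
]
hourly = updates_per_hour(sample)
busiest_hour, busiest_count = hourly.most_common(1)[0]
```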
Table of BGP Prefix Updates
Shows a large number of withdrawals.
Problems with BGP updates
- Most BGP updates are redundant or unnecessary
- A large percentage of BGP announcements are duplicates
- 99% of BGP traffic is withdrawals
- The majority of withdrawals appear to be looping
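One way to quantify the redundancy listed above is to replay an update stream and tally duplicate announcements and the share of withdrawals. This is a minimal sketch. It assumes each update is a (peer, kind, prefix, attributes) tuple, which is a hypothetical record layout and not the format of the data behind this talk. The sample records are made up.

```python
from collections import Counter

def tally_updates(updates):
    """Count announcements, duplicate announcements, and withdrawals."""
    last_seen = {}  # (peer, prefix) -> attributes last announced
    tally = Counter()
    for peer, kind, prefix, attrs in updates:
        key = (peer, prefix)
        if kind == "withdraw":
            tally["withdraw"] += 1
            last_seen.pop(key, None)
        elif key in last_seen and last_seen[key] == attrs:
            tally["duplicate_announce"] += 1  # identical route announced again
        else:
            tally["announce"] += 1
            last_seen[key] = attrs
    return tally

sample = [
    ("peer1", "announce", "192.0.2.0/24", (64500, 64501)),
    ("peer1", "announce", "192.0.2.0/24", (64500, 64501)),  # duplicate
    ("peer1", "withdraw", "192.0.2.0/24", None),
    ("peer1", "withdraw", "192.0.2.0/24", None),
]
tally = tally_updates(sample)
withdraw_share = tally["withdraw"] / sum(tally.values())
```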
More on withdrawals
Data suggests that routes are being withdrawn that were never
announced by that AS.
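A check for this pattern can be sketched by remembering which prefixes each AS has announced and flagging withdrawals with no earlier announcement. The (peer AS, kind, prefix) record layout and the function name are assumptions for illustration. A real analysis would also need to account for routes announced before the observation window began.

```python
def withdrawn_never_announced(updates):
    """Return (peer_as, prefix) pairs withdrawn with no earlier announcement."""
    announced = set()
    suspicious = set()
    for peer_as, kind, prefix in updates:
        key = (peer_as, prefix)
        if kind == "announce":
            announced.add(key)
        elif kind == "withdraw" and key not in announced:
            suspicious.add(key)
    return suspicious

# Made-up sample: AS 64501 withdraws prefixes it never announced.
sample = [
    (64500, "announce", "192.0.2.0/24"),
    (64500, "withdraw", "192.0.2.0/24"),
    (64501, "withdraw", "192.0.2.0/24"),
    (64501, "withdraw", "198.51.100.0/24"),
]
flagged = withdrawn_never_announced(sample)
```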
Causes of Excess BGP Traffic
- Problem seems specific to certain vendors
- Configuration Errors
- Vendor implementation decisions (allowed by the spec!) that still
contribute to many unnecessary withdrawals.
- Vendor bugs/BGP oscillation?
- Link/Router failures/congestion
- Dialup Customers
Vendor Implementation
This vendor's implementation does not keep state on withdrawals it has
already sent, so it will send them again. This is allowed by the
spec.
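The difference between keeping and not keeping this state can be sketched as follows. This illustrates the idea only; the class and method names are hypothetical and do not describe any vendor's code. A speaker that remembers what each peer currently holds can skip a withdrawal the peer has no use for, while a stateless one sends it regardless.

```python
class StatelessSpeaker:
    """Sends every withdrawal to the peer and remembers nothing."""

    def __init__(self):
        self.sent = []  # log of (peer, kind, prefix) messages sent

    def announce(self, peer, prefix):
        self.sent.append((peer, "announce", prefix))

    def withdraw(self, peer, prefix):
        self.sent.append((peer, "withdraw", prefix))


class StatefulSpeaker(StatelessSpeaker):
    """Tracks what each peer currently holds and skips redundant withdrawals."""

    def __init__(self):
        super().__init__()
        self.advertised = set()  # (peer, prefix) pairs currently announced

    def announce(self, peer, prefix):
        self.advertised.add((peer, prefix))
        super().announce(peer, prefix)

    def withdraw(self, peer, prefix):
        if (peer, prefix) in self.advertised:
            self.advertised.discard((peer, prefix))
            super().withdraw(peer, prefix)
        # Otherwise the peer holds no such route, so nothing is sent.


stateless, stateful = StatelessSpeaker(), StatefulSpeaker()
for speaker in (stateless, stateful):
    speaker.announce("peer1", "192.0.2.0/24")
    for _ in range(3):  # the same route is withdrawn three times
        speaker.withdraw("peer1", "192.0.2.0/24")
```

In this made-up run the stateless speaker sends four messages and the stateful one sends two. The stateful variant pays for that with one table entry per peer and prefix.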
Effects of excessive BGP Traffic
- Withdraws require minimal processing
- No change in Route Server memory or CPU usage
- Cisco routers (7000) cannot keep up, but seem to recover eventually
So, is this a real problem?
Other Trends
- Stability as a function of ASPath length
- There is a tremendous amount of instability when the AS path
length is 8. Most AS paths have a length of about 4.
- Stability as a function of prior stability
- Stability as a function of AS origin
- some origins are less stable than others
- A few ASs appear in paths that are less stable than others.
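To look at instability as a function of AS path length, updates can be grouped by the length of the path they carry. This is a minimal sketch. It assumes each update is available as a (prefix, AS path) pair with the path given as a tuple of AS numbers. The function name and the sample data are made up for illustration.

```python
from collections import Counter

def updates_by_path_length(updates):
    """Count updates per AS path length."""
    counts = Counter()
    for _prefix, as_path in updates:
        counts[len(as_path)] += 1
    return counts

# Made-up sample: one route with a 4-hop path, one flapping route with an 8-hop path.
long_path = (64500, 64501, 64502, 64503, 64504, 64505, 64506, 64507)
sample = [
    ("192.0.2.0/24", (64500, 64501, 64502, 64503)),
    ("198.51.100.0/24", long_path),
    ("198.51.100.0/24", long_path),
    ("198.51.100.0/24", long_path),
]
by_length = updates_by_path_length(sample)
```

Dividing each count by the number of distinct prefixes seen at that length would give a per-route figure, which is a fairer comparison when most paths cluster around one length.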
Recommendations
- Use BGP dampening.
- Aggregate -- use CIDR. Fewer prefixes, less flapping
- Filter, Filter, Filter (no /32s)
- Consider using the route servers
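The filtering and aggregation recommendations can be illustrated with Python's standard ipaddress module. This is a sketch of the idea, not router configuration. The function name is hypothetical, the prefixes come from documentation address ranges, and merging here ignores routing policy and path attributes, which a real aggregation decision would have to respect.

```python
import ipaddress

def filter_and_aggregate(prefixes):
    """Drop host routes (/32 for IPv4), then merge adjacent prefixes."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    kept = [n for n in nets if n.prefixlen < n.max_prefixlen]  # no /32s
    return [str(n) for n in ipaddress.collapse_addresses(kept)]

routes = [
    "198.51.100.0/25",
    "198.51.100.128/25",  # adjacent to the /25 above, so the two merge
    "192.0.2.1/32",       # host route, filtered out
    "203.0.113.0/24",
]
aggregated = filter_and_aggregate(routes)
print(aggregated)
```

Four input prefixes become two, so there are fewer routes that can flap independently.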
Future
- We need your help
- Peer with us
- Answer our email
- Provide more/better data. Consider registration in IRR.
- More data, from more views
- Simulation and Visualization
- Correlation with end-end performance
- Characterize behavior of instability and AS paths