Merit Network, Inc.

Routing Instability

Craig Labovitz

Stan Barber's Notes

This work was done by Robert Malan, Craig Labovitz, and Farnam Jahanian, who are part of the Internet Performance Measurement and Analysis Team.

Instability is bad (packet loss, slow convergence, load on routers). A lot of this happens because of route caching, which is also bad.

This measurement effort started with Curtis Villamizar during the NSFNET Project. He did this type of measurement.

During the RA project, Merit started to characterize instability in order to evaluate solutions, but never got that far because there were so many pathologies.

More information on ongoing work is available at http://www.merit.edu/ipma. RSNG-based statistics will continue to be collected (at all NAP locations except the Sprint NAP).

What do we understand?

There are three types of routing updates: Forwarding Instability (the network has to choose a new path), Policy Changes (BGP is announcing something new that does not really affect routing), and Redundant Information (junk).

The group has set up five categories of routing updates. Craig will concentrate on two of these. One is called AADup (duplicate announcements of policy information) and the other is called WWDup (duplicate withdrawal announcements).
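As a rough illustration of the two categories named above, the following sketch labels a stream of updates as AADup, WWDup, or "other". The tuple layout, the function name, and the rule that a withdrawal with no earlier history counts as "other" are assumptions made for this example, not the group's actual classification method.

```python
def classify_updates(updates):
    """Label each update as 'AADup', 'WWDup', or 'other'.

    `updates` is an iterable of (peer, prefix, kind, attrs) tuples, where
    kind is 'A' (announce) or 'W' (withdraw) and attrs is any hashable
    summary of the path attributes (None for withdrawals). This layout is
    hypothetical and chosen only for illustration.
    """
    last = {}    # (peer, prefix) -> (kind, attrs) most recently seen
    labels = []
    for peer, prefix, kind, attrs in updates:
        key = (peer, prefix)
        prev = last.get(key)
        if kind == "A" and prev == ("A", attrs):
            # Same peer re-announces the same prefix with identical attributes.
            labels.append("AADup")
        elif kind == "W" and prev is not None and prev[0] == "W":
            # Same peer withdraws a prefix it has already withdrawn.
            labels.append("WWDup")
        else:
            # Everything else, including the first update seen for a prefix.
            labels.append("other")
        last[key] = (kind, attrs if kind == "A" else None)
    return labels


if __name__ == "__main__":
    sample = [
        ("peer1", "10.0.0.0/24", "A", "path-x"),
        ("peer1", "10.0.0.0/24", "A", "path-x"),
        ("peer1", "10.0.0.0/24", "W", None),
        ("peer1", "10.0.0.0/24", "W", None),
    ]
    print(classify_updates(sample))
```

The state is kept per (peer, prefix) pair because a duplicate only makes sense relative to what the same peer last said about the same prefix.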

Most BGP information is redundant. WWDup is predominant (more than 99%). Some of this may be due to software that has since been changed. It probably does not have a significant impact.

Craig notes that most routing instability occurs at 10 a.m. EST every day, with some amount throughout US business hours.

Instability appears to relate to network usage and congestion. Craig suggests that this indicates a widespread failure of large parts of the Internet each day.

There may be a strong 30-second periodicity, which might be due to an interval timer on some routers. The vast majority of these are high frequency (most just 2 to 5 times). There are a few low frequencies. It also appears not to be vendor specific.
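One simple way to look for a periodicity like this is to examine the gaps between consecutive updates. The sketch below is not the group's analysis; the function names, the 1-second tolerance, and the input format (a list of arrival times in seconds) are assumptions for illustration.

```python
from collections import Counter


def interarrival_gaps(timestamps):
    """Return the gaps (in seconds) between consecutive update arrivals."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def gap_histogram(timestamps):
    """Count how often each whole-second gap occurs."""
    return Counter(int(gap) for gap in interarrival_gaps(timestamps))


def fraction_near_period(timestamps, period=30, tolerance=1):
    """Fraction of gaps within `tolerance` seconds of a nonzero multiple of `period`."""
    gaps = interarrival_gaps(timestamps)
    if not gaps:
        return 0.0
    near = 0
    for gap in gaps:
        multiple = round(gap / period)  # nearest multiple of the period
        if multiple >= 1 and abs(gap - multiple * period) <= tolerance:
            near += 1
    return near / len(gaps)


if __name__ == "__main__":
    arrivals = [0, 30, 60, 91, 100]
    print(gap_histogram(arrivals))
    print(fraction_near_period(arrivals))
```

A fraction close to 1 on real data would be consistent with a fixed 30-second timer; it would not by itself identify which routers are responsible.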

One possible explanation is the injection of IGP routes into BGP. It may also be due to CSU oscillation. BGP has a problem in that it will not always converge. There is also a problem in some routers that may cause them to self-synchronize.

More on self-synchronization: There is a router with a 30-second interval timer, but it is now being fixed. Unjittered timers are a BAD IDEA. Inter-arrival times suggest that self-synchronization is occurring.
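To make the point about unjittered timers concrete, here is a minimal sketch of a jittered interval. The 25% jitter range, the function names, and the parameters are assumptions chosen for illustration; they do not describe any particular vendor's fix.

```python
import random


def jittered_interval(base_seconds=30.0, jitter_fraction=0.25, rng=random):
    """Return the base interval shortened by a random amount.

    With jitter_fraction=0.25 the result lies between 75% and 100% of the
    base interval. With jitter_fraction=0 the timer is unjittered and always
    fires exactly every base_seconds.
    """
    factor = 1.0 - jitter_fraction * rng.random()
    return base_seconds * factor


def firing_times(count, base_seconds=30.0, jitter_fraction=0.25, rng=random):
    """Return the first `count` firing times of a timer started at t=0."""
    now = 0.0
    times = []
    for _ in range(count):
        now += jittered_interval(base_seconds, jitter_fraction, rng)
        times.append(now)
    return times


if __name__ == "__main__":
    # Two routers with unjittered timers fire in lockstep forever.
    print(firing_times(3, jitter_fraction=0.0))
    # With jitter, each router's firing times drift independently.
    print(firing_times(3, rng=random.Random(1)))
    print(firing_times(3, rng=random.Random(2)))
```

The design idea is that a fresh random draw on every cycle keeps routers that happen to start together from staying aligned.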

More on Timer Dampening: Because of these timers, updates tend to be bundled together. Artificial dampening will also occur.
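The bundling effect can be pictured with a toy model in which a router holds each update until its next timer tick. The model, the function name, and the 30-second period are assumptions for illustration only.

```python
import math
from collections import defaultdict


def bundle_by_tick(event_times, period=30):
    """Group event times by the timer tick at which they would be sent.

    An event at time t is held until the next multiple of `period` at or
    after t, so events generated at different moments leave together.
    """
    bundles = defaultdict(list)
    for t in sorted(event_times):
        tick = math.ceil(t / period) * period
        bundles[tick].append(t)
    return dict(bundles)


if __name__ == "__main__":
    print(bundle_by_tick([1, 5, 29, 31, 59, 61]))
```

In this model several changes to the same route within one period would leave as a single bundle, which is one way to picture the artificial dampening mentioned above.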

Most route instability comes from advertisements of /24s. Having every class C multihomed does not scale. Multihoming is growing and the quality of aggregation is decreasing. This is bad.
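As a small illustration of what aggregation buys, the sketch below uses Python's standard ipaddress module to collapse adjacent /24s into larger blocks. The prefixes and the function name are made up for the example. It only shows address adjacency; whether prefixes can really be aggregated also depends on who originates them and on routing policy, which is why multihomed /24s tend to stay in the table as separate entries.

```python
import ipaddress


def aggregation_summary(prefixes):
    """Return (count before, count after, collapsed networks) for IPv4 prefixes."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    collapsed = list(ipaddress.collapse_addresses(networks))
    return len(networks), len(collapsed), collapsed


if __name__ == "__main__":
    # Four contiguous /24s plus one isolated /24 (hypothetical addresses).
    sample = [
        "10.1.0.0/24",
        "10.1.1.0/24",
        "10.1.2.0/24",
        "10.1.3.0/24",
        "10.9.5.0/24",
    ]
    before, after, collapsed = aggregation_summary(sample)
    print(before, "prefixes collapse to", after)
    for network in collapsed:
        print(network)
```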

Source of Instability: There is no dominating contributor and no real pattern. Quality of aggregation is important.

Some folks show more stability than others, but no single ISP or AS is responsible for the instability.

The updates that we are confident show instability make up only about 10% of the announcements observed.

So, the Internet is mostly stable. However, we need to resolve the pathologies first; then we can characterize forwarding/routing instability. Both vendors and ISPs have been very helpful in this work.


Copyright © 1997 Stan Barber. Reproduction with attribution granted.
Academ Consulting Services
P.O. Box 300481
Houston, Texas 77230-0481
Comments via email to www@academ.com
Academ Consulting Services is a registered trademark.