Lawrence Berkeley National Laboratory

End-to-End Routing Behavior in the Internet

Vern Paxson

Stan Barber's Notes

An experiment to measure end-to-end routing behavior: what the users see.

Network Probe Daemon (NPD) -- runs on sites around the Internet. These sites are not "special" per se. The daemon takes authenticated requests to run traceroute and tcpdump and to source/sink a probe.

A large number of sites in the US, Asia and Europe. There is an N-squared effect when sites are added: each new site adds a path to and from every existing site.
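
To make the N-squared effect concrete: with N sites, every ordered pair of distinct sites is a measurable path, so there are N(N-1) directed paths. A minimal sketch follows; the function name is hypothetical and the site counts are illustrative, not the study's.

```python
def directed_paths(n_sites: int) -> int:
    """Number of ordered (source, destination) pairs among n_sites distinct sites."""
    return n_sites * (n_sites - 1)

# Doubling the number of sites roughly quadruples the number of paths.
for n in (10, 20, 40):
    print(n, "sites ->", directed_paths(n), "directed paths")
```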

Two runs were done. The first involved 6,459 traceroutes, sampled every 1-2 days in the last two months of 1994. The second was done in the last two months of 1995 and involved 35,109 traceroutes at two sampling rates: 60% of the paths were measured at a mean interval of 2 hours, and 40% at a mean interval of 2.75 days.

85 ASs were sampled (about 8% of the total active ASs). Of course, not all ASs are equal: since some ASs are bigger than others, the effective coverage is about 50%.

Routing pathologies

  1. Unresponsive router: rare in the first sample, none in the second. (There was one weird one at adv/icm-dante-e0.icp.net, since it can't get to AS690.)
  2. Rate limited: seen only in Solaris end-systems and the ucol router.
    Could not connect to some NPDs (at a rate of about 5-7%). This introduces a bias toward underestimation.
  3. Routing loops: There were a few loops: 12 in the first sample, 73 in the second. There were no inter-AS loops. A few lasted less than 20 seconds. The largest loop involved 5 routers.
  4. Route flaps: An example of a particularly weird traceroute illustrates how things change over the space of a few minutes.
  5. Bad routing: An example of particularly bad routing illustrated that sometimes you really don't know where your packets can go.
  6. Flutter: An example of flutter (which can occur when load balancing is flopping) was presented, showing one packet going one way and the next going another.
    1. Path properties are difficult to predict or characterize.
    2. Path can be partially asymmetric.
    3. Out-of-order packets can cause spurious TCP fast retransmit.
    4. This bodes ill for WAN deflection routing.
    5. None of these are a big deal for localized flutter.
  7. Skipping -- In this case, it shows how the Cisco does some of its load balancing.
  8. Internet diameter -- not really a pathology -- Lots of places in the Internet are more than 30 hops away.
  9. Outages -- Routers recover faster from outages shorter than 70 seconds than from outages longer than 70 seconds.
  10. Circuitous routing -- not really a pathology
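
As an illustration of the routing-loop pathology in the list above, here is one plausible way to flag a loop in a single traceroute. This is a sketch under stated assumptions, not the method used in the study: it assumes one router name per TTL (with `None` for an unanswered probe) and treats a router that reappears after a different router as a loop. The function name and the sample hop names are hypothetical.

```python
from typing import Dict, List, Optional

def find_loop(hops: List[Optional[str]]) -> Optional[List[str]]:
    """Return the distinct routers in the first detected loop, or None.

    hops: one router name per TTL, None where no reply was received.
    A loop is flagged when a router reappears after at least one
    *different* router has answered in between (e.g. A, B, A).
    Immediate repeats (A, A) are not treated as loops.
    """
    last_seen: Dict[str, int] = {}
    prev: Optional[str] = None  # most recent router that answered
    for i, hop in enumerate(hops):
        if hop is None:
            continue
        if hop in last_seen and hop != prev:
            start = last_seen[hop]
            cycle: List[str] = []
            for h in hops[start:i]:
                if h is not None and h not in cycle:
                    cycle.append(h)
            return cycle
        last_seen[hop] = i
        prev = hop
    return None

# Hypothetical trace in which routers r2 and r3 bounce the packet back and forth.
trace = ["r1", "r2", "r3", "r2", "r3", "r2"]
print(find_loop(trace))
```

A single trace cannot tell a persistent loop from a route change that happened mid-measurement, so a real analysis would need repeated measurements before calling it a loop.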

    Changes across the datasets (1994/1995):


    Total user-visible pathologies did get worse.

    Route stability

    Two Characteristics:
    1. Prevalence: likelihood of observing a particular route. Easy to assess. Affects the utility of caching.
    2. Persistence: how long a route remains unchanged. Hard to assess. Affects the utility of storing state in routers and the consistency of repeated network measurements.

    A route that is persistent is also prevalent. However, a prevalent route may not be persistent.

    Internet paths are strongly dominated by a single route.

    More than two-thirds of routes persist for more than a day.
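
The two stability measures can be sketched from a series of repeated route observations for one path. This is an illustrative sketch only, not the statistical analysis from the talk: the function names, the `Route` type, and the sample data are all hypothetical. Prevalence is taken as the fraction of observations showing the dominant route, and the change count is only a lower bound, since changes between samples go unseen (one reason persistence is hard to assess).

```python
from collections import Counter
from typing import List, Tuple

Route = Tuple[str, ...]  # a route as an ordered tuple of router names

def dominant_route_prevalence(observations: List[Route]) -> Tuple[Route, float]:
    """Return the most frequently observed route and the fraction of
    observations in which it appeared."""
    if not observations:
        raise ValueError("need at least one observation")
    counts = Counter(observations)
    route, hits = counts.most_common(1)[0]
    return route, hits / len(observations)

def observed_changes(observations: List[Route]) -> int:
    """Count how often consecutive observations differ.

    This is a lower bound on real route changes: anything that happens
    between two samples is invisible.
    """
    return sum(1 for a, b in zip(observations, observations[1:]) if a != b)

# Hypothetical samples for one path, alternating between two routes.
A = ("src", "r1", "dst")
B = ("src", "r2", "dst")
samples = [A, B, A, B, A, A, A]

route, prevalence = dominant_route_prevalence(samples)
print(route, round(prevalence, 2), observed_changes(samples))
```

In this made-up series, route A is seen in 5 of 7 samples yet the route changes at least four times, which is the prevalent-but-not-persistent case.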

    Route symmetry

    Is the path from A->B the same as B->A?

    About 30% of paths in 1994 were asymmetric.
    About 50% of paths in 1995 were asymmetric.

    The magnitude of the asymmetries was usually just one city, though many were two.
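
One way to picture an asymmetry measured in cities is sketched below. The definition here is an assumption made for illustration, not necessarily the one used in the study: each direction is reduced to the set of cities it visits, and the magnitude is the larger of the two "visited in only one direction" counts. The function name and the city lists are hypothetical.

```python
from typing import Sequence

def asymmetry_magnitude(forward: Sequence[str], reverse: Sequence[str]) -> int:
    """Size of the city-level difference between the two directions of a path.

    Assumed definition: the larger count of cities that appear in one
    direction but not in the other. Zero means the same cities are
    visited both ways.
    """
    f, r = set(forward), set(reverse)
    return max(len(f - r), len(r - f))

# Hypothetical path: one direction passes through Chicago, the other through Denver.
a_to_b = ["San Francisco", "Chicago", "New York"]
b_to_a = ["New York", "Denver", "San Francisco"]
print(asymmetry_magnitude(a_to_b, b_to_a))
```

Comparing sets of cities ignores the order of visits, which keeps the sketch short but would miss paths that visit the same cities in a different sequence.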

    Questions

    Yakov commented that traceroute is not an accurate tool. He wonders about the error in the data. --- It could be large. The data has been statistically analyzed to remove as much error as possible, but this is not an attempt to characterize the errors in traceroute, per se.

    Is the number of hops growing or shrinking? In the first study, there were no sites beyond 30 hops, and in the second, there were definitely some that were 30 or greater.

    It appears that most of the sites in the sampling pool were academic. Is the sample pool valid? Yes, the sample pool is valid, since anything specific to a site was statistically eliminated.

    Why consider cities as significant granularity? It was a way to represent specific geographical distances. It might not be a great approach.

    Since most sites were academic, what impact did the shutdown of NSFNET have? Surprisingly, very little.

    Enke is not surprised that there are more asymmetric routes. Enke thinks that as more new BGP features become available and are used, this may be moderated.

    Enke says that the data presented did show a large number of routing instabilities. He is hoping that Vern will rerun this in 1996. Vern said that he didn't know if he would be able to do that. However, Vern does suggest that some kind of mechanism be developed to make gathering this data more routine.


    Copyright © 1996 Stan Barber. Reproduction with attribution granted.
    Academ Consulting Services
    P.O. Box 300481
    Houston, Texas 77230-0481
    Comments via email to www@academ.com