Another difference between ATM-based networks and native IP
networks is the way they control congestion.
Because of the self-similar nature of data traffic, periods of congestion
cannot be eliminated by aggregating more traffic streams together
or by increasing the number of buffers in gateways.
Native IP networks feature so-called best-effort delivery;
i.e., packets are not guaranteed to be delivered, and gateways
simply drop packets when congestion occurs on a communication line.
When reliable delivery is needed, the discarded data packets are retransmitted.
Since retransmissions cause additional load on an already congested network,
the communicating hosts must slow down to avoid aggravating the congestion.
This is achieved with a technique known as exponential back-off, which
causes hosts to halve the effective transmission data rate every time
the transmitter learns about loss of a packet.
When all hosts cooperate by such voluntary reduction of transmission rates,
the congestion quickly abates.
After slowing down, hosts gradually increase the transmission rate again
until the saturation point is reached (as indicated by the loss of a packet),
so the network operates close to congestion; i.e., nearly optimal
resource utilization is achieved.
This technique was shown to control congestion effectively in a global
heterogeneous network, while achieving nearly 100% utilization.
Note that some packet loss is necessary for normal functioning of a
best-effort network even if it is not overloaded.
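
The halve-on-loss, grow-gradually behavior described above can be
illustrated with a minimal sketch. It assumes a single sender, a fixed link
capacity, and a loss signal that reaches the sender in the very next round;
the name `aimd_rates` and the parameters `capacity`, `start_rate` and
`increase` are illustrative and are not taken from any real TCP implementation.

```python
def aimd_rates(capacity, rounds, start_rate=1.0, increase=1.0):
    """Simulate one sender probing a link of fixed capacity.

    Each round the sender uses its current rate. If that rate exceeds
    `capacity`, a packet is assumed to be lost and the rate is halved;
    otherwise the rate grows by `increase`.
    Returns the list of rates used in each round.
    """
    rate = start_rate
    history = []
    for _ in range(rounds):
        history.append(rate)
        if rate > capacity:
            # Loss detected: halve the transmission rate.
            rate = rate / 2.0
        else:
            # No loss: increase the rate gradually.
            rate = rate + increase
    return history


if __name__ == "__main__":
    for round_no, rate in enumerate(aimd_rates(capacity=10.0, rounds=30)):
        print(round_no, rate)
```

The resulting rate oscillates just around the saturation point, which is why
occasional packet loss is part of normal operation rather than a sign of
overload.
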
Since communication of congestion indicators between transmitters and
receivers is not instantaneous, some time passes from the beginning
of congestion until transmitting hosts start slowing down.
That time is close to the characteristic round-trip time (RTT) in
the network.
Buffers in the gateways of a best-effort network should
be large enough to hold the packets that arrive after congestion
starts but before the transmitting hosts slow down.
Because congestion does not occur instantaneously, the maximal data rate of traffic
routed to an outgoing link is close to the link's bandwidth; so the
buffer size for that link should be at least as large as the product of RTT (delay)
and bandwidth.
Increasing buffer size beyond that allows the gateway to accommodate longer
transient congestion, but also causes packets to spend more time in
buffers (because the TCP congestion control algorithm tends to "fill
the pipe" and makes queues grow to the limit).
Therefore, further increasing buffer size only increases delays in the
network, effectively decreasing the quality of service.
In a network that has properly sized buffers, the maximal RTT
does not exceed the minimal RTT multiplied by the number of hops (the average
number of inter-POP hops in the modern Internet is about 5; the "small"
hops inside POPs are artifacts of clustering technology).
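
The buffer sizing rule above is simple arithmetic, sketched below. The
function names and the example figures (a 100 Mbit/s link and an 80 ms RTT)
are assumptions chosen for illustration only.

```python
def min_buffer_bytes(bandwidth_bps, rtt_ms):
    """Bandwidth-delay product in bytes, rounded up.

    bandwidth_bps is the link rate in bits per second and rtt_ms is the
    characteristic round-trip time in milliseconds.
    """
    bits_in_flight_times_1000 = bandwidth_bps * rtt_ms
    return (bits_in_flight_times_1000 + 7999) // 8000


def max_queueing_delay_ms(buffer_bytes, bandwidth_bps):
    """Time needed to drain a completely full buffer onto the link."""
    return buffer_bytes * 8000 / bandwidth_bps


if __name__ == "__main__":
    size = min_buffer_bytes(100_000_000, 80)
    print("buffer size, bytes:", size)
    print("worst-case queueing delay, ms:",
          max_queueing_delay_ms(size, 100_000_000))
    # Doubling the buffer doubles the worst-case time spent in the queue.
    print("with a doubled buffer, ms:",
          max_queueing_delay_ms(2 * size, 100_000_000))
```

A buffer of exactly one bandwidth-delay product adds at most one RTT of
queueing delay per hop; anything larger only adds to that delay.
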
An important modification of cooperative congestion control is
to have gateways drop packets randomly before the congestion actually
occurs, with a frequency dependent on the queue size.
This technique is known as Random Early Discard (RED), and was shown to
be effective in proactive prevention of congestion.
RED allows buffer sizes to be reduced by as much as 60%, thus significantly
decreasing maximal network latency.
Another benefit of RED is that if a packet source is not willing to cooperate
by reducing its packet rate, its packets will have a higher probability of
being discarded; i.e., RED gateways enforce fairness.
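
A rough sketch of a RED-style queue follows. It assumes the commonly
described form of the algorithm: a smoothed queue length compared against
two thresholds, with the drop probability rising linearly between them. The
class name `RedQueue` and the parameters `min_th`, `max_th`, `max_p` and
`weight` are illustrative choices, not values recommended by the text.

```python
import random
from collections import deque


class RedQueue:
    """Simplified Random Early Discard queue (illustrative only)."""

    def __init__(self, min_th, max_th, max_p, limit, weight=0.2, rng=None):
        self.min_th = min_th      # below this average length, never drop early
        self.max_th = max_th      # at or above this average length, always drop
        self.max_p = max_p        # drop probability just below max_th
        self.limit = limit        # hard buffer limit in packets
        self.weight = weight      # smoothing factor for the average length
        self.rng = rng or random.Random()
        self.queue = deque()
        self.avg = 0.0

    def drop_probability(self):
        """Drop probability as a function of the smoothed queue length."""
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        span = self.max_th - self.min_th
        return self.max_p * (self.avg - self.min_th) / span

    def enqueue(self, packet):
        """Return True if the packet was queued, False if it was dropped."""
        self.avg = (1 - self.weight) * self.avg + self.weight * len(self.queue)
        if len(self.queue) >= self.limit:
            return False          # buffer is full: forced drop
        if self.rng.random() < self.drop_probability():
            return False          # early random drop, before the buffer fills
        self.queue.append(packet)
        return True

    def dequeue(self):
        """Remove and return the oldest packet, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None
```

Because every arriving packet faces the same drop probability, a source that
sends more packets while the queue is long loses proportionally more of them,
which is the fairness effect mentioned above.
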
Virtual-circuit based networks usually employ a method of congestion
control known as back-pressure flow control.
Back-pressure flow control allows loss-free transmission by having
gateways verify that the next gateway has sufficient buffer space
available before sending data.
It works perfectly when buffering on different
virtual circuits is independent; i.e., when every virtual circuit has its own
pre-allocated buffer space.
However, this solution is not practical because most virtual circuits are
inactive most of the time, so the buffer space would be wasted.
It is also very expensive when the number of virtual circuits passing
through a switch reaches millions, as is the case in a pure-ATM global network.
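
The hop-by-hop check described above can be sketched as a small simulation
of a single virtual circuit passing through a chain of gateways. The model
is deliberately simplified and all names (`run_chain`, `buffer_sizes`,
`drain_every`) are hypothetical: time advances in ticks, each gateway moves
at most one packet per tick, and the last outgoing line is slower than the
source.

```python
from collections import deque


def run_chain(num_packets, buffer_sizes, drain_every=2, max_ticks=10_000):
    """Push packets through a chain of gateways using back-pressure.

    buffer_sizes[i] is the buffer capacity (in packets) of gateway i.
    The last gateway delivers one packet every `drain_every` ticks.
    Returns (delivered packets in order, peak occupancy of each buffer).
    """
    buffers = [deque() for _ in buffer_sizes]
    peak = [0] * len(buffer_sizes)
    pending = deque(range(num_packets))
    delivered = []
    for tick in range(max_ticks):
        if len(delivered) == num_packets:
            break
        # The slow final line delivers a packet only on some ticks.
        if buffers[-1] and tick % drain_every == 0:
            delivered.append(buffers[-1].popleft())
        # Walk upstream: forward only if the next gateway has free space.
        for i in range(len(buffers) - 2, -1, -1):
            if buffers[i] and len(buffers[i + 1]) < buffer_sizes[i + 1]:
                buffers[i + 1].append(buffers[i].popleft())
        # The source is held back by the first gateway in the same way.
        if pending and len(buffers[0]) < buffer_sizes[0]:
            buffers[0].append(pending.popleft())
        for i, buf in enumerate(buffers):
            peak[i] = max(peak[i], len(buf))
    return delivered, peak


if __name__ == "__main__":
    delivered, peak = run_chain(50, [4, 4, 4])
    print("delivered:", len(delivered), "peak occupancy:", peak)
```

No packet is ever dropped in this model: the slow line fills the last
buffer, which in turn stalls every gateway upstream and finally the source.
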
The size of the buffer for each virtual circuit should be at least equal to
the product of bandwidth and round-trip time on a communication line to
the next switch, to ensure that the full capacity of the line can be used by
the virtual circuit when other VCs are idle.
This means that the maximal delay in a properly tuned network with
back-pressure flow control is three times the minimal delay (for every
hop, the maximal delay equals the time spent in a buffer plus the propagation
time on the link; the buffer is sized to accommodate one round-trip
time on the link, i.e., twice the link delay).
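
The factor of three follows directly from the per-hop arithmetic, as the
short sketch below shows. The function names and the example link delays are
assumptions made for illustration.

```python
def hop_delay_bounds(link_delay_ms):
    """Return (minimal, maximal) delay for one hop, in milliseconds.

    The minimal delay is the propagation time alone (empty buffer).
    The maximal delay adds the time to drain a full buffer, which is
    sized for one link round trip, i.e. twice the link delay.
    """
    buffer_drain_ms = 2 * link_delay_ms
    return link_delay_ms, buffer_drain_ms + link_delay_ms


def path_delay_bounds(link_delays_ms):
    """Sum the per-hop bounds over every link on a path."""
    bounds = [hop_delay_bounds(d) for d in link_delays_ms]
    return sum(b[0] for b in bounds), sum(b[1] for b in bounds)


if __name__ == "__main__":
    low, high = path_delay_bounds([5, 10, 20])
    print("minimal delay, ms:", low)
    print("maximal delay, ms:", high)
    print("ratio:", high / low)
```
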
The difficulty of practical implementation of independent buffering means that
switches are designed to share buffer space between
virtual circuits.
When shared buffering is used, congestion on a line depletes
the buffer space available to other lines on the same switch,
which in the case of a completely loaded network causes the congestion
to propagate to neighboring gateways, and possibly to a large part of the
network.
This phenomenon of chain-reaction congestion collapse is not possible in best-effort
networks because they simply remove excessive packets.
To summarize, best-effort networks have latency parameters
similar to those of back-pressure flow control networks (if RED is used,
and backbones comprise large central-office routers),
but are not prone to chain-reaction congestion collapses.
In any case, the pure ATM architecture for global networks is not feasible
for the reasons outlined in previous chapters; so we will further discuss
congestion control in flattened networks.
The simplest flattened network would not perform any statistical multiplexing
at the ATM level, simply using a mesh of CBR permanent virtual circuits.
The congestion control is then performed by IP routers on the edges, and
is essentially the same best-effort scheme as in native IP networks.
This architecture, however, does not make much sense, because similar functions
can be successfully performed by a combination of IP routers and cheap
synchronous multiplexors (producing a native IP network with too many
circuits).
Therefore in a realistic flattened network, ATM switches will perform statistical
multiplexing.
When back-pressure flow control is used by the ATM backbone, the edge IP
routers will drop packets if the backbone cannot accept more data.
This is equivalent to best-effort delivery, but with the added "benefit" of
chain-reaction congestion collapses.
Another alternative is to have ATM switches drop cells, thus performing
best-effort delivery themselves.
A peculiarity of ATM is the small size of its cells (53 bytes), which makes it
infeasible to implement a sliding-window reliable transport protocol operating
on individual cells (for comparison, a minimal TCP/IP header is 40 bytes).
This means that a transport-level data packet must be split into several cells
(as is done by AAL5).
However, the switches operate at the cell level, so the loss of a single cell
makes it necessary to retransmit the entire packet.
In a typical case of a bulk file transfer using 1.5 kilobyte packets (which will
occupy about 32 cells), a cell loss of 1% will result in roughly 27% packet loss.
In other words, even minor congestion would cause a sudden loss of connectivity.
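
The amplification of cell loss into packet loss can be checked with a short
calculation. The sketch assumes that cells are lost independently of each
other, that each 53-byte cell carries 48 bytes of payload, and that AAL5
appends an 8-byte trailer before padding to a whole number of cells; the
function names are illustrative.

```python
CELL_PAYLOAD_BYTES = 48   # payload carried by one 53-byte ATM cell
AAL5_TRAILER_BYTES = 8    # AAL5 trailer appended to every packet


def cells_per_packet(packet_bytes):
    """Number of cells needed to carry one packet over AAL5."""
    total = packet_bytes + AAL5_TRAILER_BYTES
    return (total + CELL_PAYLOAD_BYTES - 1) // CELL_PAYLOAD_BYTES


def packet_loss(cell_loss, packet_bytes):
    """Probability that at least one cell of a packet is lost.

    Assumes independent cell losses with probability `cell_loss` each.
    """
    return 1.0 - (1.0 - cell_loss) ** cells_per_packet(packet_bytes)


if __name__ == "__main__":
    for size in (40, 576, 1500):
        print(size, "bytes:", cells_per_packet(size), "cells,",
              "packet loss at 1% cell loss:", round(packet_loss(0.01, size), 3))
```

For 1500-byte packets the result is over a quarter of all packets lost,
the same order of magnitude as the figure quoted above.
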
ATM switch vendors attempted to use strategies such as packet-tail drop (i.e., once
a cell is dropped, all subsequent cells of the same packet are discarded as well)
and variants of RED.
However, those methods only reduce the effect; they neither eliminate
it nor render it harmless.
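
A minimal sketch of packet-tail drop, as described above, shows why it only
reduces the damage. The representation is an assumption made for
illustration: the cell stream is a list of packet identifiers, one entry per
cell, and `congested` holds the positions at which the switch has no buffer
space.

```python
def tail_drop(cells, congested):
    """Forward a cell stream, discarding the tail of any damaged packet.

    cells: list of packet ids, one per cell, in arrival order.
    congested: set of positions in `cells` where the switch must drop.
    Returns (positions of forwarded cells, set of damaged packet ids).
    """
    damaged = set()
    forwarded = []
    for pos, packet_id in enumerate(cells):
        if pos in congested or packet_id in damaged:
            # Either the buffer is full, or an earlier cell of this
            # packet was already lost, so this cell is useless.
            damaged.add(packet_id)
            continue
        forwarded.append(pos)
    return forwarded, damaged


def wasted_cells(cells, forwarded, damaged):
    """Count forwarded cells that belong to packets that were damaged."""
    return sum(1 for pos in forwarded if cells[pos] in damaged)


if __name__ == "__main__":
    stream = [0] * 4 + [1] * 4 + [2] * 4   # three packets of four cells each
    forwarded, damaged = tail_drop(stream, congested={5})
    print("forwarded positions:", forwarded)
    print("damaged packets:", damaged)
    print("wasted cells:", wasted_cells(stream, forwarded, damaged))
```

The cells sent before the first drop have already consumed line capacity,
and the packet is lost all the same; the strategy only saves the capacity
that the tail would have used.
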
The only real solution is, therefore, to accumulate an entire packet before making
the decision about dropping it.
An ATM switch doing that becomes a best-effort delivery gateway with a
rather inefficient line encapsulation protocol.
The next logical step is to replace the inefficient line encapsulation protocol
with a framing method for entire data packets (to reclaim 15-20% of total
line capacity lost to ATM framing overhead), thus returning to native IP networking.
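
The size of the ATM framing overhead depends on the packet size, as the
following sketch illustrates. It assumes 53-byte cells with 48 bytes of
payload and an 8-byte AAL5 trailer, ignores any additional encapsulation
headers, and does not subtract the (smaller) overhead of the alternative
framing method; the function name is illustrative.

```python
CELL_BYTES = 53           # full ATM cell, including the 5-byte header
CELL_PAYLOAD_BYTES = 48   # payload carried by one cell
AAL5_TRAILER_BYTES = 8    # AAL5 trailer appended to every packet


def atm_overhead(packet_bytes):
    """Fraction of line capacity not used for packet data under AAL5."""
    total = packet_bytes + AAL5_TRAILER_BYTES
    cells = (total + CELL_PAYLOAD_BYTES - 1) // CELL_PAYLOAD_BYTES
    line_bytes = cells * CELL_BYTES
    return 1.0 - packet_bytes / line_bytes


if __name__ == "__main__":
    for size in (40, 576, 1500):
        print(size, "bytes:", round(100 * atm_overhead(size), 1), "% overhead")
```

Large packets lose somewhat over a tenth of the line capacity and small
packets considerably more, so the overhead for a real traffic mix depends on
its packet size distribution.
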