Why does packet loss seriously hurt application performance over the WAN? How do we address it?


TCP was never optimized for high-bandwidth WANs or for interactive applications over the WAN. By design, packet loss has a greater impact on the performance of most WAN applications than any other factor.

Why is packet loss such a killer? There are many reasons, most having to do with how TCP was designed, especially how it handles congestion control and congestion avoidance. The key issue is how to deal with contention for limited bandwidth.

TCP is designed to use all available bandwidth, and to use it “fairly” across flows on average. The difficulty is that no end station or TCP flow knows how much bandwidth is available: not when a single flow is the only one using the end-to-end path at the moment, and not in the more typical case of multiple flows, where the amount available changes from moment to moment. So the sender of the TCP data needs a way to know when “enough is enough.” Packet loss is the basic signal of this.

TCP and routers together are designed to control data flow to prevent over-utilization of the network and the potential of congestion. The goals of TCP’s design are to minimize the amount of time that the data flow grinds to a halt (congestion avoidance), and to react appropriately to reduce traffic at those times that it does (congestion control).

TCP packets received by the receiving station are acknowledged back to the sending station. TCP is a window-based protocol, meaning that it can have a certain amount of traffic “in flight” between sending station and receiving station. It is designed to back off and substantially reduce the amount of bandwidth offered (by half) when packet loss is observed. Further, until the lost packet is received and acknowledged by the receiver, only a limited number of additional packets will be offered. The same principle applies to applications that use multiple TCP flows: only so many new flows can be opened, or packets sent, until a lost packet is received at the other end and its receipt acknowledged.
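The back-off behavior described above can be sketched as a small simulation. This is a minimal illustration, not a real TCP stack: the function name `simulate_aimd`, the one-packet-per-round-trip growth, and the choice of which round trips see loss are all assumptions made for the example.

```python
def simulate_aimd(rounds, loss_rounds, start_window=1.0):
    """Return the window size (in packets) at the start of each round trip.

    Hypothetical model: the window grows by one packet per loss-free
    round trip and is cut in half whenever a loss is observed.
    """
    window = start_window
    history = []
    for rtt in range(rounds):
        history.append(window)
        if rtt in loss_rounds:
            window = max(1.0, window / 2)  # halve the offered load on loss
        else:
            window += 1.0                  # slow, additive recovery
    return history


if __name__ == "__main__":
    # One loss in round trip 5: the window drops from 15 to 7.5 packets.
    print(simulate_aimd(rounds=12, loss_rounds={5}, start_window=10.0))
```

In this sketch a single loss takes many round trips to recover from, and on a high-latency WAN each of those round trips is long.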

Packet loss is detected in one of two ways. For a longer transfer where just a packet or two is lost, the sender notices and reacts to the loss when subsequent packets are acknowledged by the receiver, but not the missing one. Alternatively – and more typically for new or short TCP flows – packet loss is detected by the occurrence of a “timeout”: the absence of receipt of an acknowledgement of the packet. The amount of time until a “timeout” is deemed to have occurred varies typically between a couple hundred milliseconds and three seconds.
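The two detection paths can be illustrated with a short sketch. The function `detect_loss` and its inputs are hypothetical, and the threshold of three duplicate acknowledgements is an assumption taken from common TCP practice rather than from this article.

```python
DUP_ACK_THRESHOLD = 3  # assumed threshold; common in TCP implementations


def detect_loss(acks, timeout_fired=False):
    """Classify how a sender would notice a loss.

    acks: cumulative acknowledgement numbers in arrival order.
    Returns ("dup_acks", ack_number), ("timeout", None) or (None, None).
    """
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates >= DUP_ACK_THRESHOLD:
                # Later packets keep arriving, but the receiver keeps
                # asking for the same missing one.
                return ("dup_acks", ack)
        else:
            last_ack = ack
            duplicates = 0
    if timeout_fired:
        # Not enough traffic in flight to generate duplicates:
        # the sender can only wait for its timer to expire.
        return ("timeout", None)
    return (None, None)
```

A long transfer produces enough follow-on packets to trigger the first path quickly. A short flow often has nothing left to send, so it falls through to the much slower timeout.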

TCP is an elegant protocol designed over 40 years ago, when CPU and memory were extremely expensive. It worked – and continues to work – fantastically well on high-bandwidth, low-latency LANs and on low-bandwidth, high-latency WANs. But TCP wasn’t designed to work optimally in the medium-to-high bandwidth, high-latency environment that characterizes most WAN use today. TCP also wasn’t designed optimally for running interactive applications (web browsing, remote desktop) across very long-distance WANs.

In particular, TCP was designed so that each end station could make its decisions completely independently of every other end station. This conservative approach contributes to network stability and minimization of congestion.

Because the amount of data offered into the network is reduced by half – and only increased slowly thereafter as packets received successfully are acknowledged – when a single packet loss is detected by the sending station, WAN packet loss can have a huge impact on large transfer performance.  This is why private networks, such as MPLS, VPLS or IEPL improve application performance so significantly: they nearly eliminate packet loss.

What else can be done about  packet loss? Well, at a standards-compliant end station, pretty much nothing. But for an intelligent device in the middle of the network, and especially one at a key WAN edge location, there are many possibilities. There are at least six different approaches to minimizing the impact of WAN packet loss on application performance:

- Drastically reduce the number of WAN packets transmitted.

- React differently to loss (when there is good knowledge of the network in between).

- Mitigate the effects of the loss and hide it from the end station.

- Enable the end stations to react more quickly to loss.

- Avoid much of the loss in the first place (think MPLS, VPLS, IEPL).

- Avoid the additional loss that often follows after a burst of loss.

Application layer solutions are the first, most obvious approach here.  Doing replicated file service avoids WAN packet loss in accessing files, delivering full LAN-speed performance, because all client access to the data is in fact done locally.

Similarly, “static” caching of objects via a local web (HTTP) object cache completely avoids WAN access for those objects, and thus any impact from packet loss.

Beyond these, drastically reducing the number of packets transmitted is an area where WAN Optimization offerings do a great job.  Now, since we’re talking about reducing the number of packets transmitted, you might think first of memory-based compression, which is one of the techniques almost every WAN Optimization solution offers. Memory-based compression can reduce the time it takes to do the first-time transmission of data – a factor of two for compressible data is typical – but in fact it doesn’t do proportionately better in the face of packet loss than when there is little or no loss. Reducing the amount of data sent by 50% doesn’t really help that much when it comes to packet loss and its impact on a window-based protocol like TCP. So while memory-based compression certainly doesn’t hurt here, it’s not really the answer when the problem is WAN packet loss.

There are two other technologies in most WAN Optimization products that do have a large performance impact in the face of packet loss: data deduplication, and CIFS-specific application proxy.

Data deduplication essentially does “dynamic” caching of data locally, and while this requires at least one round-trip across the WAN, it will always involve far fewer such round-trip transactions than when the data is not stored locally. Besides saving bandwidth and speeding up data transfers in the more typical case of little to no packet loss, the application speed-up is proportionately greater still in the face of any meaningful amount of packet loss. And data deduplication is usually applicable for any application, not just for file access.
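The idea behind deduplication can be sketched in a few lines. This is a simplified illustration and not how any particular WAN Optimization product works: the class `DedupPeer`, the fixed 4096-byte chunk size, and the use of SHA-256 digests as chunk references are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


class DedupPeer:
    """One end of a hypothetical deduplicating WAN link."""

    def __init__(self):
        self.store = {}  # digest -> chunk bytes already seen

    def encode(self, data):
        """Replace chunks the far end has already seen with short references."""
        messages = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            if digest in self.store:
                messages.append(("ref", digest))
            else:
                self.store[digest] = chunk
                messages.append(("raw", chunk))
        return messages

    def decode(self, messages):
        """Rebuild the original data, learning any new chunks on the way."""
        parts = []
        for kind, payload in messages:
            if kind == "raw":
                self.store[hashlib.sha256(payload).digest()] = payload
                parts.append(payload)
            else:
                parts.append(self.store[payload])
        return b"".join(parts)


def wire_bytes(messages):
    """Bytes that would cross the WAN (ignoring framing overhead)."""
    return sum(len(payload) for _, payload in messages)
```

In this sketch, sending the same 16 KB of data a second time costs four 32-byte references instead of four full chunks. Fewer packets on the wire means fewer chances for any one of them to be lost.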

For the very chatty Microsoft CIFS protocol, data deduplication is usually combined with an application-specific proxy that will reduce round-trip requests still further. By essentially doing local CIFS termination, a CIFS proxy provides much faster access to files on a remotely located file server even for the first access. The impact on application performance of the combination of data deduplication and CIFS proxy can be 10 to 40 times even when there is no packet loss; in the face of packet loss, the additional benefit can be another 2x to 10x, meaning a combined performance impact of anywhere from 20x to 400x or more. For files that have been previously accessed across the WAN, this is essentially full LAN-speed performance, versus the very slow, often unusable WAN performance under packet loss if accessing large files across a WAN completely unaided.

Andy Gottlieb is a twenty-five-year data networking veteran who founded Talari Networks, a pioneer in WAN Virtualization technology, and served as its first CEO. He is now leading product management at Aryaka Networks. Andy is the author of an upcoming book on Next-generation Enterprise WANs. His blog is located at http://www.networkworld.com/community/blog/26142

 

My network is slow. Why? Do you have network monitoring in place?

Network performance is always a hot topic to discuss.  When performance slows, it is easy to blame the carrier.  But often the problem is due to your own LAN or server applications.  How can you figure out what the problem is?

Unless you have centralized network monitoring installed on your network, you very likely will never resolve your performance issues.

Most people are familiar with SNMP (Simple Network Management Protocol) since nearly every network device supports it. SNMP is fine for keeping track of which devices are attached and operating, but beyond that, it places a great deal of overhead traffic on your network. It uses polling, sending information back and forth across the network. And SNMP won’t provide much troubleshooting information.

NetFlow, sFlow, J-Flow and IPFIX are common standards for flow records. A flow record identifies a flow of packets by source IP address, destination IP address, source port, destination port, layer 3 protocol type, type of service (TOS) byte, and input logical interface. Flow analysis collects and compiles samples of the packets entering the switches and routers, providing good data for analysis. Because it uses statistical sampling, not every packet is collected. There are some freeware applications that run on Linux that are worth investigating.
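A rough sketch of what a flow collector does with those seven fields follows. The names `FlowKey`, `aggregate_flows` and `top_talkers` are hypothetical, and the input is a plain list of already-sampled packets rather than a real NetFlow or IPFIX export.

```python
from collections import Counter, namedtuple

# The seven fields listed above, used together as the flow identity.
FlowKey = namedtuple(
    "FlowKey",
    "src_ip dst_ip src_port dst_port protocol tos input_if",
)


def aggregate_flows(packets):
    """Sum packets and bytes per flow.

    packets: iterable of (FlowKey, size_in_bytes) pairs.
    """
    packet_counts = Counter()
    byte_counts = Counter()
    for key, size in packets:
        packet_counts[key] += 1
        byte_counts[key] += size
    return packet_counts, byte_counts


def top_talkers(byte_counts, n=3):
    """Rank source addresses by total bytes sent across all their flows."""
    by_source = Counter()
    for key, total in byte_counts.items():
        by_source[key.src_ip] += total
    return by_source.most_common(n)
```

Note that nothing in the record holds packet contents, only who talked to whom and how much.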

Flow-based analysis relies heavily on the same hardware being used to control network traffic: the routers and switches themselves. On busy networks, contention for hardware resources like processing power and memory can result, and it is the flow analysis that loses when it does. (This is one reason why routers have options for additional memory.) Flow analysis does allow for some troubleshooting, such as identifying users who are hogging bandwidth, but it does not include any payload information, nor are the packets saved, which limits one’s ability to troubleshoot the network intelligently.

Packet-based monitoring is the most comprehensive tool. Commonly called “packet sniffing,” it works by capturing every packet traversing the network. The packets are then decoded and analyzed, allowing analysis right down to the application level. The server collecting your data can be accessed whenever a network problem arises, so you can see exactly what has happened. You can go back in time, which is especially helpful with intermittent problems that are difficult to reproduce. Finally, you will also want to collect payload information, which is the link between networking and application information. Then all the data you need is available. But this is also the most expensive approach.

Here are a few links worth visiting to learn about monitoring applications:

MRTG – Multi Router Traffic Grapher:   http://oss.oetiker.ch/mrtg/

NTOP Netflow Probe: http://www.ntop.org/solutions.html

WinPcap: http://www.winpcap.org/

PRTG Network Monitor: http://www.paessler.com/prtg/

EtherApe: http://etherape.sourceforge.net/

Wild Packets: http://www.wildpackets.com/products/network_analysis

Solarwinds:  http://www.solarwinds.com/products/

MPLS-Experts has the technical resources to help you resolve your network performance challenges.  Contact us for more information.

WAN Accelerators and MPLS – Important Facts

WAN Accelerators are wonderful tools in improving your network performance, provided your traffic can benefit from this technology.

If you obtain an MPLS network, your network performance will be better than a VPN over the internet.  But you need to select your Classes of Service appropriately.  Different CoS levels have different packet loss SLAs.  On a simple level, the SLAs might be:

  • Basic CoS: 99.9% packet delivery
  • Middle CoS: 99.99% packet delivery
  • Best CoS: 99.999% packet delivery

If you decide to subscribe to all Basic CoS, the SLA is 99.9% packet delivery.  That is typically the same as an uncongested internet access circuit, so you might not see any performance improvement.  But if you use your WAN Accelerator with the Middle CoS with 99.99% packet delivery, you will experience a more noticeable improvement.  Obviously, the Basic CoS will work better than the internet when the internet is congested, since the MPLS network avoids those bottlenecks.

When using a WAN Accelerator, you are using compression, so each lost packet costs more: if your compression ratio is 20:1 and you lose 1 packet, you are really losing the equivalent of 20 or more packets. So you maximize performance with a network that has less packet loss, that is, better packet delivery.
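The arithmetic behind the CoS levels and the 20:1 example can be worked through as follows. This is only an illustration using the example SLA figures listed above; the function names are hypothetical, and real SLAs and compression ratios vary.

```python
from fractions import Fraction

# Example SLA figures from the list above, kept as strings so the
# arithmetic stays exact.
COS_DELIVERY_PERCENT = {"Basic": "99.9", "Middle": "99.99", "Best": "99.999"}


def lost_per_million(delivery_percent):
    """Packets the SLA allows to be lost out of every million sent."""
    loss_rate = 1 - Fraction(delivery_percent) / 100
    return int(loss_rate * 1_000_000)


def equivalent_uncompressed_loss(lost_packets, compression_ratio):
    """Uncompressed packets' worth of data carried by the lost packets."""
    return lost_packets * compression_ratio


if __name__ == "__main__":
    for name, percent in COS_DELIVERY_PERCENT.items():
        lost = lost_per_million(percent)
        print(name, lost, equivalent_uncompressed_loss(lost, 20))
```

Each step up in CoS cuts the allowed loss by a factor of ten, and compression multiplies the cost of whatever loss remains.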

To reduce or eliminate the number of undelivered packets, select a higher CoS.

One thing you should be aware of, which is not widely publicized, is that the lower Class of Service levels will not provide the expected performance improvements when you use a WAN Accelerator. But if you design your network accordingly, you will be very pleased with the performance boost.

What Causes Packet Loss on the Internet?

Why Packet Loss?: Faced with the typical Global Internet ping time of 300 milliseconds and 1 out of 25 packets being lost (4 per cent), an Internet VPN user with a T1 connection who might expect 10 megabytes/minute throughput on a clean local connection will find he is getting less than 10 per cent of that, or under 1 megabyte/minute.
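The figures above can be sanity-checked with a little arithmetic. The first function converts the T1 line rate into megabytes per minute. The second applies the widely cited Mathis approximation for steady-state TCP throughput, which is not part of this article and is used here only as a rough, optimistic ceiling; the 1460-byte segment size is also an assumption.

```python
import math

T1_BITS_PER_SECOND = 1_544_000


def megabytes_per_minute(bits_per_second):
    """Convert a bit rate into megabytes transferred per minute."""
    return bits_per_second / 8 * 60 / 1_000_000


def mathis_bound_bps(mss_bytes, rtt_seconds, loss_rate, c=math.sqrt(1.5)):
    """Approximate ceiling on TCP throughput: (MSS / RTT) * C / sqrt(p).

    It ignores timeouts and restarts, so real throughput is lower.
    """
    return (mss_bytes * 8 / rtt_seconds) * c / math.sqrt(loss_rate)


if __name__ == "__main__":
    clean = megabytes_per_minute(T1_BITS_PER_SECOND)
    lossy = megabytes_per_minute(mathis_bound_bps(1460, 0.300, 0.04))
    print(f"clean T1: {clean:.1f} MB/min, lossy ceiling: {lossy:.1f} MB/min")
```

Even this optimistic ceiling comes out at a small fraction of the T1 line rate. Once timeouts are taken into account, a real transfer would do worse still.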

Congestion: The Internet Standards treat packet loss and congestion as synonyms. Routers discard incoming packets that can’t be stored or transmitted. Imagine an Ethernet (10 megabit/sec) pipe feeding a T1 (1.54 megabit/sec) router. Any time the average feed from the Ethernet exceeds 1.54 megabits/sec, packets will be lost. This is normal congestion (i.e., lost packets): the average sum of the inputs to a router exceeds the capacity of its output.
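The Ethernet-into-T1 example can be sketched as a simple queue model. This is an idealized illustration only: the function `simulate_bottleneck`, the 64-kilobyte buffer, and the assumption of a perfectly steady input rate are all choices made for the example.

```python
def simulate_bottleneck(in_rate_bps, out_rate_bps, buffer_bits, seconds, tick=0.001):
    """Fluid model of a router queue; returns the fraction of offered bits dropped."""
    queued = offered = dropped = 0.0
    for _ in range(round(seconds / tick)):
        arriving = in_rate_bps * tick
        offered += arriving
        queued += arriving
        queued -= min(queued, out_rate_bps * tick)  # send what the output link allows
        if queued > buffer_bits:                    # buffer full: discard the overflow
            dropped += queued - buffer_bits
            queued = buffer_bits
    return dropped / offered if offered else 0.0


if __name__ == "__main__":
    # Sustained 10 Mbit/s into a T1 with a 64 KB (512,000-bit) buffer.
    print(simulate_bottleneck(10_000_000, 1_544_000, 512_000, seconds=10))
```

The buffer only delays the loss. Once it fills, everything above the output rate is discarded, and when the input stays below the output rate nothing is dropped at all.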

Bit errors: As information packets move from place to place, there is always a chance that some bits will be modified. Each packet has a checksum, a mathematical sum computed over its contents, appended to it. When a receiving router receives a packet whose contents and appended checksum don’t agree, that packet is discarded. This can occur anywhere in the journey from source to destination.
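As a concrete illustration, here is a sketch of the 16-bit ones' complement checksum used in IP, TCP and UDP headers. Choosing this particular checksum is an assumption for the example; link layers such as Ethernet use a CRC instead, and the helper name `is_intact` is hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def is_intact(data: bytes, checksum: int) -> bool:
    """True when the data still matches the checksum computed by the sender."""
    return internet_checksum(data) == checksum


if __name__ == "__main__":
    payload = b"hello, WAN!!"
    checksum = internet_checksum(payload)
    corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]  # flip a single bit
    print(is_intact(payload, checksum), is_intact(corrupted, checksum))
```

A receiver that sees a mismatch cannot tell which bits changed, so the only safe action is to discard the whole packet.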

Deliberate Discard: ATM networks can guarantee that voice or video connections won’t lose bits. Internet packet traffic moves over these same networks, and if it looks like there are too many packets to get all the voice and video through without missing a bit, packets are discarded until there is room for the voice and video.  Similarly Cisco and Nortel backbone routers offer packet discard policies, so the operator of the router can decide which types of traffic will suffer lost packets as the router approaches congestion.

What is causing the delay? 80 milliseconds is a typical North American ping time when going between national backbones. Most of this delay cannot be avoided.

Speed of Light: The speed of light in a fiber optic cable works out to 10 milliseconds per thousand miles, for a ping time (due to the speed of light) of 60 milliseconds on a coast-to-coast (US) fiber link. Remember that long distance routes are not normally as-the-crow-flies, so the distance may be much further than you think, especially with overseas connections.

Router in and out time: Routers receive packets before forwarding them. If a router is sending a 1500 byte packet at T1 (1.536 megabits/sec) the time from the first bit to the last bit is 7.8 milliseconds. If the router is storing packets waiting to send them, then the delay time increases. We believe this is the reason the measured Global Internet ping time varies.
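The two delay figures quoted above can be reproduced with simple arithmetic. The function names are hypothetical, and the 3,000-mile coast-to-coast route length is an assumption implied by the 60-millisecond figure rather than stated in the text.

```python
FIBER_MS_PER_1000_MILES = 10.0  # one-way figure quoted in the text


def propagation_rtt_ms(route_miles):
    """Round-trip time due to the speed of light in fiber alone."""
    return 2 * route_miles / 1000 * FIBER_MS_PER_1000_MILES


def serialization_ms(packet_bytes, link_bps):
    """Time from the first bit to the last bit of a packet on a link."""
    return packet_bytes * 8 / link_bps * 1000


if __name__ == "__main__":
    print(propagation_rtt_ms(3000))           # assumed 3,000-mile route
    print(serialization_ms(1500, 1_536_000))  # 1500-byte packet on a T1
```

Propagation delay depends only on distance, while serialization delay shrinks as links get faster. That is why distance dominates on modern long-haul paths.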

Congestion Avoidance:  TCP assumes that all packet loss is caused by congestion and responds by reducing the transmission rate. 

Slow Start: When a TCP connection starts (or re-starts if more than one packet has been lost) it sends one packet, waits for the acknowledgment, then sends 2, then 4…and ramps up its transmission pace. Each step in the ramp consumes a round trip delay.

Data Acknowledgments:  The TCP receiver sends an acknowledgment to the sender whenever a segment of information is received. The sender does not assume any data is lost until a multiple of round trip time has elapsed without receiving an acknowledgment, or until it has received multiple duplicate acknowledgments.

Window Size: TCP can only send a certain amount of data before it must stop transmitting and wait for an acknowledgment. That amount of data is called the window size. The standard window size in TCP is limited to 64 kilobytes. RFC 1323 allows larger windows through window scaling, but at the time this was written it was not yet usable by applications running on Microsoft platforms.
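The slow start and window size points above translate into two small calculations. This is a simplified sketch: the function names are hypothetical, and the slow start model assumes the window doubles every round trip with no loss and no upper limit.

```python
def slow_start_round_trips(total_packets, initial_window=1):
    """Round trips needed to send total_packets when the window doubles each trip."""
    sent, window, rounds = 0, initial_window, 0
    while sent < total_packets:
        sent += window   # send a full window, then wait for the acknowledgments
        window *= 2
        rounds += 1
    return rounds


def window_limited_bps(window_bytes, rtt_seconds):
    """Throughput ceiling when only one window can be in flight per round trip."""
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    print(slow_start_round_trips(100))        # round trips to send 100 packets
    print(window_limited_bps(65535, 0.080))   # 64 KB window, 80 ms ping time
    print(window_limited_bps(65535, 0.300))   # 64 KB window, 300 ms ping time
```

With a fixed 64-kilobyte window, the ceiling falls in direct proportion to the round-trip time, no matter how much bandwidth the link itself offers.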

 

When you factor all of the above, it gives a new perspective on why VPN over the internet is subject to such variability in performance.  If you can experience zero packet loss, your performance rises.  This is what makes MPLS networks so attractive for any applications where performance is important.