Troubleshooting WAN Performance Issues


You have an MPLS or VPLS network and your clients in field offices are complaining.  Your network should be performing better and you can’t figure out what the problem is.  You can contact MPLS-Experts to have their engineers solve your problem, but you want to try to solve the problems yourself.

1. The first thing to check seems trivial, but you need to confirm that the connected ports on your routers and switches are configured for the same speed and duplex. Log into your switches and check the logs for speed or duplex mismatches.  Auto-negotiation sometimes does not work properly, so you might find a 10M port connected to a 100M port, or a half-duplex port connected to a full-duplex port.  Don’t assume that a 10/100/1000 port is auto-negotiating correctly!

2. Is your performance problem consistent?  Does it occur at roughly the same time of day?  Or is it completely random? If you don’t have the monitoring tools to measure this, you are at a big disadvantage in resolving the issues on your own.

3. Do you have Class of Service configured on your WAN?   Do you have DSCP configured on your LAN?  What is the mapping of your DSCP values to CoS?

4. What kind of applications are traversing your WAN?  Are there specific apps that work better than others?

5. Have you reviewed bandwidth utilization on your carrier’s web portal to determine whether you are saturating the MPLS port at any location?  Even brief peaks will be enough to generate complaints.  Large files, such as CAD drawings, can completely saturate a WAN link.

6. Are you backing up or synchronizing data over the WAN?  Have you confirmed 100% that this work is completed before the work day begins?

7. Might your routing be taking multiple paths and not the most direct path?  Look at your routing tables.

8. Next, you want to see long-term trend statistics.  This means collecting SNMP data from all your routers, using tools such as MRTG, NTOP or Cacti.  A two-week sampling should provide a very good picture of what is happening on your network.
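The utilization figures mentioned in steps 5 and 8 come from SNMP interface counters. Here is a minimal sketch of the arithmetic, assuming two polls of a 32-bit octet counter such as ifInOctets. The function names, the polling interval and the link speed are illustrative assumptions, not output from any particular tool.

```python
# Illustrative sketch: the sample values below are made up.
COUNTER32_MODULUS = 2 ** 32  # ifInOctets / ifOutOctets are 32-bit and wrap around


def octet_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Octets transferred between two polls, tolerating a single counter wrap."""
    return (current - previous) % modulus


def utilization_percent(previous: int, current: int,
                        interval_seconds: float, link_bps: float) -> float:
    """Average link utilization over the polling interval, as a percentage."""
    bits = octet_delta(previous, current) * 8
    return 100.0 * bits / (interval_seconds * link_bps)


if __name__ == "__main__":
    # Hypothetical 10 Mbit/s port polled twice, 300 seconds apart.
    print(utilization_percent(1_000_000, 188_500_000, 300, 10_000_000))
```

Keep in mind that this is an average over the polling interval. A five-minute average of 50% can hide the brief peaks that generate complaints, and a fast link can wrap a 32-bit counter more than once between polls, which this sketch cannot detect.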

NTOP allows you to

  • Sort network traffic according to many protocols
  • Show network traffic sorted according to various criteria
  • Display traffic statistics
  • Store on disk persistent traffic statistics in RRD format
  • Determine the identity (e.g. email address) of computer users
  • Passively (i.e. without sending probe packets) identify the host OS
  • Show IP traffic distribution among the various protocols
  • Analyse IP traffic and sort it according to the source/destination
  • Display IP Traffic Subnet matrix (who’s talking to who?)
  • Report IP protocol usage sorted by protocol type
  • Act as a NetFlow/sFlow collector for flows generated by routers (e.g. Cisco and Juniper) or switches (e.g. Foundry Networks)
  • Produce RMON-like network traffic statistics
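The subnet matrix idea in the list above (who is talking to whom) can be sketched in a few lines. This is not how NTOP is implemented. It is only an illustration, and the site names, subnets and traffic records are hypothetical.

```python
from collections import Counter
from ipaddress import ip_address, ip_network

# Hypothetical addressing plan; replace with your own sites and subnets.
SITES = {
    "HQ": ip_network("10.1.0.0/16"),
    "Branch": ip_network("10.2.0.0/16"),
}


def site_of(address: str) -> str:
    """Name of the site whose subnet contains the address, or 'other'."""
    ip = ip_address(address)
    for name, network in SITES.items():
        if ip in network:
            return name
    return "other"


def subnet_matrix(records):
    """Sum bytes per (source site, destination site) pair.

    records: iterable of (src_ip, dst_ip, byte_count) tuples.
    """
    matrix = Counter()
    for src, dst, size in records:
        matrix[(site_of(src), site_of(dst))] += size
    return matrix


if __name__ == "__main__":
    sample = [
        ("10.1.0.5", "10.2.0.9", 1500),
        ("10.1.3.7", "10.2.0.9", 500),
        ("10.2.0.9", "8.8.8.8", 100),
    ]
    for pair, total in subnet_matrix(sample).most_common():
        print(pair, total)
```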

MRTG (Multi-Router Traffic Grapher) provides easy to understand graphs of your network bandwidth utilization.
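Once you have utilization samples like the ones MRTG graphs, you can also check whether the problem occurs at roughly the same time of day. A small sketch, assuming you have exported (timestamp, utilization percent) pairs. The 80% threshold and the sample data are arbitrary assumptions.

```python
from collections import defaultdict
from datetime import datetime


def peak_by_hour(samples):
    """Highest utilization seen in each hour of the day.

    samples: iterable of (datetime, utilization_percent) pairs.
    """
    peaks = defaultdict(float)
    for timestamp, utilization in samples:
        peaks[timestamp.hour] = max(peaks[timestamp.hour], utilization)
    return dict(peaks)


def busy_hours(samples, threshold=80.0):
    """Hours of the day whose peak utilization reached the threshold."""
    return sorted(hour for hour, peak in peak_by_hour(samples).items()
                  if peak >= threshold)


if __name__ == "__main__":
    sample = [
        (datetime(2024, 3, 4, 9, 5), 92.0),
        (datetime(2024, 3, 4, 9, 10), 40.0),
        (datetime(2024, 3, 4, 14, 0), 30.0),
        (datetime(2024, 3, 5, 9, 0), 85.0),
    ]
    print(busy_hours(sample))
```

If the same hours show up day after day, look at what is scheduled then (backups, synchronization jobs, large file transfers) before blaming the carrier.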


Cacti requires a MySQL database.  It is a complete network graphing solution designed to harness the power of RRDTool’s data storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box. All of this is wrapped in an intuitive, easy to use interface that makes sense for LAN-sized installations up to complex networks with hundreds of devices.

Both NTOP and MRTG are freeware applications that will run on the freeware versions of Linux.  As a result, they can be installed on almost any desktop computer that has outlived its value as a Windows desktop machine.  If you are skilled with Linux and networking, and you have the time, you can install this monitoring system on your own. You will need to get your carrier to provide read-only SNMP access to your routers.

But you might find it more cost effective to have the engineers at MPLS-Experts do the work for you.  All you need to do is provide an available machine with a Linux install (Ubuntu, CentOS, RedHat, etc.) with remote access via a VPN.  Our engineers will then download all the software remotely, then install and configure the machine.  When we are done with the monitoring, besides understanding how to solve your problem (and solving it!) you will have your own network monitoring system installed for your use on a daily basis.  We’ll teach you how to use it, which is quite simple using the web based tools, so you can view it from any machine on your network.

If you need assistance in troubleshooting your wide area network, contact MPLS-Experts today!

 

Troubleshooting VPLS and Ethernet Tunnels over MPLS

LSP ping has limited effectiveness in troubleshooting both VPLS and point-to-point Ethernet tunnels over MPLS. There are a number of reasons:

1. Ethernet P2P or VPLS is implemented as an additional layer over the LSP, referred to as a pseudowire or Martini tunnel. This additional layer has its own headers and ID. In addition, the pseudowire has its own signaling procedure that takes place over the LSP. The mapping between a pseudowire and an LSP is not necessarily one-to-one: multiple pseudowires can ride the same LSP. The LSP might be up and passing traffic for some pseudowires but not others. Many things can cause this, for example an MTU mismatch between the pseudowire end-points on the ports facing the customer. In such a case the LSP ping will succeed while the pseudowire is down, and hence the customer service frames cannot get through. For this reason, some vendors (such as Cisco) came up with an additional “ping” that operates at the pseudowire level. In this ping a user can specify the individual pseudowire ID on which the ping traffic rides. This is a good solution to the problem; however, it has its own limitations (described in 2 below).

2. Since the pseudowire (and its underlying LSP) is implemented over MPLS, the reach of both the pseudowire and LSP ping tools in troubleshooting an Ethernet service is limited to the provider edge devices (where MPLS terminates). This means it does not include the access tail circuits (the local loops) all the way to the customer CPE. These access tails are more likely to fail than a provider’s MPLS backbone.

3. Hence, Ethernet Service OAM (SOAM) can provide better troubleshooting techniques. These are the loopback message/reply, LBM/LBR (also known as Ethernet ping or CFM ping), and the link trace message/reply, LTM/LTR (CFM link trace). Ethernet loopback and link trace are very useful in troubleshooting Ethernet services since they can cover the end-to-end service and they are guaranteed to ride the exact tunnel carrying the customer service frames. An LBM or LTM that is sent on an UP MEP covers the provider backbone (whether MPLS or any other encapsulation); likewise, an LBM or LTM that is sent on a DOWN MEP covers the access tail circuit all the way to the end customer CPE.

4. Another advantage of SOAM is that Y.1731 can provide service and SLA assurance through its Delay Measurement Message/Reply (DMM/DMR), which provides frame delay and frame delay variation measurements. A DMM/DMR pair can verify not only the continuity of the service but also its compliance with the SLA.

5. One carrier that has excellent troubleshooting tools that it puts in its customers’ hands at no charge is Inteliquent. Their EtherVision web tools implement all the mentioned techniques: MPLS PW ping, SOAM LBM/LBR, SOAM LTM/LTR, and DMM/DMR. All these tools and techniques are available on their EtherVision web portal so that customers can continuously monitor the continuity and performance of their services.
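To make the DMM/DMR measurement in item 4 concrete, here is a rough sketch of the two-way frame delay arithmetic using the four timestamps carried in a DMM/DMR exchange. The class and function names, the millisecond units and the SLA limits are assumptions for illustration. Frame delay variation is simplified here to the difference between consecutive delay measurements.

```python
from dataclasses import dataclass


@dataclass
class DmExchange:
    """Timestamps from one DMM/DMR exchange, in milliseconds."""
    tx_f: int  # TxTimeStampf: initiator sends the DMM (initiator clock)
    rx_f: int  # RxTimeStampf: responder receives the DMM (responder clock)
    tx_b: int  # TxTimeStampb: responder sends the DMR (responder clock)
    rx_b: int  # RxTimeb: initiator receives the DMR (initiator clock)


def two_way_delay(x: DmExchange) -> int:
    """Round-trip time minus the responder's processing time.

    Each difference uses one clock only, so the two ends do not
    need to be time-synchronized.
    """
    return (x.rx_b - x.tx_f) - (x.tx_b - x.rx_f)


def delay_variation(delays):
    """Absolute difference between consecutive delay measurements."""
    return [abs(later - earlier) for earlier, later in zip(delays, delays[1:])]


def meets_sla(exchanges, max_delay_ms, max_variation_ms):
    """True if every measurement is within the (hypothetical) SLA limits.

    exchanges must contain at least one measurement.
    """
    delays = [two_way_delay(x) for x in exchanges]
    return (max(delays) <= max_delay_ms
            and all(v <= max_variation_ms for v in delay_variation(delays)))


if __name__ == "__main__":
    # Made-up values; the responder clock is offset by about 1000 ms.
    runs = [DmExchange(0, 1000, 1002, 22), DmExchange(100, 1101, 1102, 126)]
    print([two_way_delay(x) for x in runs], meets_sla(runs, 30, 10))
```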

If you need consulting services to troubleshoot or design your network, contact us!

 

Troubleshooting MPLS Network Performance Issues

Are you having performance issues on your MPLS network? Does the carrier tell you that everything is OK on their end?  That the problem is on your internal network?  The fact is, troubleshooting these performance issues is complicated.  Our engineers are very good at this process, having worked in both carrier and enterprise environments.

Troubleshooting your problem requires access to your network.  So the steps are:

  • Obtain access to  your network via a VPN
  • Run tests
  • Collect results

This sounds easy… but it is definitely not.

The time required for the testing process will depend on how intermittent the performance issue is.

Then we’ll be able to do one of two things:

  1. fix the problem
  2. demonstrate in a provable way that the problem is in the provider’s infrastructure

If you have a network performance issue and need a third party analysis of the problem, please contact MPLS-Experts.

 

 

My network is slow. Why? Do you have network monitoring in place?

Network performance is always a hot topic to discuss.  When performance slows, it is easy to blame the carrier.  But often the problem is due to your own LAN or server applications.  How can you figure out what the problem is?

Unless you have centralized network monitoring installed on your network, you very likely will never resolve your performance issues.

Most people are familiar with SNMP (Simple Network Management Protocol) since nearly every network device supports it.  SNMP is fine for keeping track of which devices are attached and operating, but beyond that, it places a great deal of overhead traffic on your network because it relies on polling, sending requests and responses back and forth across the network.  And SNMP won’t provide much troubleshooting information.

NetFlow, sFlow, J-Flow and IPFIX are common formats for flow records.  A flow record identifies a flow of packets by source IP address, destination IP address, source port, destination port, layer 3 protocol type, type of service (ToS) byte, and input logical interface.  Flow analysis collects and compiles samples of the packets entering the switches and routers, providing good data for analysis.  Flow analysis often uses statistical sampling, in which case not every packet is collected.  There are some freeware applications that run on Linux that are worth investigating.
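As a rough illustration of what a flow collector does with those seven fields, the sketch below groups sampled packets by flow key and scales the byte counts by the sampling rate to estimate real volumes. The names, the sample data and the 1-in-N scaling are assumptions for the example, not the behavior of any specific collector.

```python
from collections import Counter
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The seven fields that identify a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int   # layer 3 protocol type, e.g. 6 = TCP, 17 = UDP
    tos: int        # type of service byte
    input_if: int   # input logical interface index


def aggregate(sampled_packets, sampling_rate):
    """Estimate bytes per flow from 1-in-N sampled packets.

    sampled_packets: iterable of (FlowKey, packet_bytes) pairs.
    sampling_rate: N, meaning one packet in every N was sampled.
    """
    totals = Counter()
    for key, size in sampled_packets:
        totals[key] += size * sampling_rate
    return totals


def top_talkers(totals, n=3):
    """Source addresses ranked by estimated bytes sent."""
    by_source = Counter()
    for key, estimated_bytes in totals.items():
        by_source[key.src_ip] += estimated_bytes
    return by_source.most_common(n)


if __name__ == "__main__":
    web = FlowKey("10.1.0.5", "10.2.0.9", 51000, 443, 6, 0, 1)
    voip = FlowKey("10.1.0.7", "10.2.0.20", 16384, 16384, 17, 184, 1)
    samples = [(web, 1500), (web, 1500), (voip, 200)]
    print(top_talkers(aggregate(samples, sampling_rate=100)))
```

Because the totals are estimates built from samples, short or small flows may be missed entirely, which is one reason flow data is better at finding bandwidth hogs than at diagnosing a single failed transaction.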

Flow-based analysis relies heavily on the same hardware being used to control network traffic: the routers and switches themselves.  On busy networks,  conflicts for hardware resources like processing power and memory can result. It is the flow analysis that loses when conflicts occur. While it does allow for some troubleshooting, like identifying users who are hogging bandwidth, for example, it does not include any payload information, nor are the packets saved, limiting one’s ability to troubleshoot the network intelligently. (This explains one reason why routers have options for additional memory.)

Packet-based monitoring is the most comprehensive tool.  Commonly called “packet sniffing”, it works by capturing every packet traversing the network.  The packets are then decoded and analyzed, allowing analysis right down to the application level.  The server collecting your data can be accessed whenever a network problem arises, so you can see exactly what has happened.  You can go back in time, which is especially helpful with intermittent problems that are difficult to reproduce.  Finally, you will also want to collect payload information, which is the linkage between networking and application information.  Then all the data you need is available.  But this approach is also the most expensive.

Here are a few links worth visiting to learn about monitoring applications:
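To give a feel for what “decoding” a captured packet means, here is a minimal sketch that unpacks the fixed 20-byte IPv4 header from raw bytes. It assumes the bytes start at the IP header (any Ethernet framing already removed) and it ignores IP options. The function name and the sample packet are made up for the example.

```python
import socket
import struct


def decode_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte portion of an IPv4 header."""
    if len(packet) < 20:
        raise ValueError("need at least 20 bytes for an IPv4 header")
    (version_ihl, tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": version_ihl >> 4,
        "header_length": (version_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "dscp": tos >> 2,                           # top six bits of the ToS byte
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,                       # 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


if __name__ == "__main__":
    # Hand-built sample header: ToS 0xB8 (DSCP 46), TTL 64, TCP.
    sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0xB8, 40, 0, 0, 64, 6, 0,
                         socket.inet_aton("10.1.0.5"), socket.inet_aton("10.2.0.9"))
    print(decode_ipv4_header(sample))
```

Real analyzers go on to decode the transport and application layers in the same way, which is what makes analysis down to the application level possible.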

MRTG – Multi Router Traffic Grapher:   http://oss.oetiker.ch/mrtg/

NTOP Netflow Probe: http://www.ntop.org/solutions.html

WinPcap: http://www.winpcap.org/

PRTG Network Monitor: http://www.paessler.com/prtg/

EtherApe: http://etherape.sourceforge.net/

Wild Packets: http://www.wildpackets.com/products/network_analysis

Solarwinds:  http://www.solarwinds.com/products/

MPLS-Experts has the technical resources to help you resolve your network performance challenges.  Contact us for more information.

MPLS troubleshooting with LSP Ping, not "regular" Ping

Everyone is familiar with the Ping command to measure latency and packet loss on networks.  When troubleshooting MPLS networks, your familiar Ping command can provide some information.  But you really should be using LSP Ping.

LSP Ping encapsulates a UDP packet in an MPLS header.  The TTL of the label stack is set to 255 and the TTL of the encapsulated UDP packet is set to 1. This ensures that the labeled packet reaches the destination if the end-to-end path is not broken. If there is no continuous Label Switched Path between the originating router and the egress router for the Forwarding Equivalence Class (FEC), an intermediate router receives an exposed IP packet, decrements the TTL to zero and sends an error as the reply, exposing the flaw in the network.  This would not be the case with the more conventional Ping command.
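The TTL behavior described above can be illustrated with a toy model. The first function packs a single MPLS label stack entry (20-bit label, 3-bit traffic class, bottom-of-stack bit, 8-bit TTL). The second walks a probe along a list of routers and shows who answers when the label is lost partway. The router names, the label value and the way the fault is modeled are assumptions for illustration only. This is not an implementation of LSP Ping.

```python
import struct


def mpls_label_entry(label, ttl, traffic_class=0, bottom_of_stack=True):
    """Pack one 32-bit label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    word = (label << 12) | (traffic_class << 9) | (int(bottom_of_stack) << 8) | ttl
    return struct.pack("!I", word)


def probe(path, lsp_break_at=None, ip_ttl=1):
    """Return (router that answers, kind of answer) for a probe sent along path.

    path: router names, originating router first, destination last.
    lsp_break_at: index of a router that, because of a fault, forwards the
        probe without its label, exposing the inner IP packet.
    ip_ttl: TTL of the inner IP packet (1 for LSP Ping, e.g. 64 for regular Ping).
    """
    labeled = True
    for i in range(len(path) - 1):
        if i == lsp_break_at:
            labeled = False  # the label is lost at this router
        receiver = path[i + 1]
        if i + 1 == len(path) - 1:
            return receiver, "echo reply"  # the probe reached the destination
        if not labeled:
            ip_ttl -= 1  # a transit router forwards by IP and decrements the TTL
            if ip_ttl == 0:
                return receiver, "TTL expired error"
    raise ValueError("path needs at least two routers")


if __name__ == "__main__":
    routers = ["PE1", "P1", "P2", "PE2"]
    print(mpls_label_entry(label=16, ttl=255).hex())
    print(probe(routers))                              # healthy LSP
    print(probe(routers, lsp_break_at=1))              # LSP Ping exposes the break
    print(probe(routers, lsp_break_at=1, ip_ttl=64))   # regular Ping hides it
```

In the broken case the error comes from the first router that receives the probe as plain IP, which points you at the hop where the label was lost. With a regular Ping TTL the same packet is simply routed on by IP and the destination answers as if nothing were wrong.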

Click here for information on troubleshooting VPLS and Ethernet tunnels over MPLS.

For more information, visit  http://www.mpls-experts.com/technical-resources-about-mpls/  and click on Troubleshooting MPLS Networks with LSP Ping.

MPLS-Experts provides consulting engineering services to troubleshoot networks where carriers have failed.  Our engineers have a wealth of experience working on the carrier side with all the major global MPLS carriers.