How does traceroute work?

This one likes to make an appearance now and then, so let’s dive into the lab and get the answer. First of all… my own definition of traceroute: “To identify all Layer 3 network devices in a given path to a destination.”

Topology

4 routers in the lab running OSPF in area 0. Everyone is advertising their networks into Area 0, so full end to end reachability is possible.

Scenario

Let’s perform a traceroute from vIOS1 to vIOS4:

  • Source: 10.1.0.1/24 (vIOS1)
  • Destination: 10.3.0.4/24 (vIOS4)
  • ***This traceroute on Cisco IOS will use UDP for its transport

Exercise

Ok so here we go… let’s perform the standard traceroute and get the basic output:

vIOS1#traceroute 10.3.0.4
Type escape sequence to abort.
Tracing the route to 10.3.0.4
VRF info: (vrf in name/id, vrf out name/id)
  1 10.1.0.2 6 msec 5 msec 4 msec
  2 10.2.0.3 7 msec 5 msec 5 msec
  3 10.3.0.4 7 msec *  18 msec

We have a path shown from this source to the destination. Observations… why do we have 3 responses on each hop? This is down to the probe count (3 by default), which can be set using the extended traceroute command as per below:

In this example we change the probe count to 1 and sure enough we only see 1 response now on each hop

So with this in place.. let’s now dive deeper into how exactly traceroute is working. Let’s stick with the probe count of 1 as it will make our lives easier in the packet captures. Here we go..

Packet Captures

So if you do a quick Google on how traceroute works, you will more than likely start reading about TTL values and how they change along the way. Before we go any further, let’s check this in the capture and then go from there with the how and why.

So I have set up captures on each router’s interface and will set off a traceroute and then talk through the results. Let’s go..

vIOS1 Gig0/0 to vIOS2 Gig0/0

vIOS1#traceroute
Protocol [ip]: ip
Target IP address: 10.3.0.4
Ingress traceroute [n]:
Source address: 10.1.0.1
Numeric display [n]:
Timeout in seconds [3]:
Probe count [3]: 1
Minimum Time to Live [1]:
Maximum Time to Live [30]:
Port Number [33434]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Type escape sequence to abort.
Tracing the route to 10.3.0.4
VRF info: (vrf in name/id, vrf out name/id)
  1 10.1.0.2 7 msec
  2 10.2.0.3 9 msec
  3 10.3.0.4 9 msec

What did we see in Wireshark?

vIOS1

Here is the entire capture from the point of starting the traceroute:

PCAP of traceroute

What is exactly going on?

Packet 112 and 113

A UDP packet (112) is encapsulated and the IP header confirms the source and destination IP addresses as expected:

In the same IP header we also see the TTL (Time to Live) field set as 1:

Wireshark ties 1 ICMP packet (113) to this particular UDP stream. Let’s take a look at this in more detail:

The IP header now contains different Source and Destination addresses

The Source of this ICMP packet is 10.1.0.2, which is in fact vIOS2, and the Destination of this packet is 10.1.0.1, which is vIOS1. Remember that 10.1.0.2 is the 1st hop that we see in the traceroute.

What else is going on here? Well we can also see the original Src and Dst IP header (10.1.0.1 to 10.3.0.4) encapsulated in the ICMP packet along with the original UDP header details:
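
To make the "original header inside the ICMP packet" idea concrete, here is a minimal Python sketch that builds a stand-in for the quoted bytes (an IPv4 header followed by the 8-byte UDP header) and reads the addresses and ports back out. The helper names, the source port 49152 and the zeroed checksums are assumptions for illustration only; the addresses and the destination port 33434 come from the traceroute above.

```python
import socket
import struct


def build_ipv4_udp(src, dst, ttl, sport, dport):
    """Build a 20-byte IPv4 header plus an 8-byte UDP header (checksums left at 0)."""
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                   # version 4, header length 5 x 32-bit words
        0,                      # DSCP / ECN
        28,                     # total length: 20 (IP) + 8 (UDP)
        0,                      # identification
        0,                      # flags / fragment offset
        ttl,                    # time to live
        17,                     # protocol 17 = UDP
        0,                      # header checksum (not computed in this sketch)
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )
    udp_header = struct.pack("!HHHH", sport, dport, 8, 0)
    return ip_header + udp_header


def parse_quoted_datagram(quoted):
    """Pull the fields traceroute cares about out of the quoted datagram."""
    header_len = (quoted[0] & 0x0F) * 4
    sport, dport = struct.unpack("!HH", quoted[header_len:header_len + 4])
    return {
        "src": socket.inet_ntoa(quoted[12:16]),
        "dst": socket.inet_ntoa(quoted[16:20]),
        "protocol": quoted[9],
        "sport": sport,
        "dport": dport,
    }


# Hypothetical probe: source port 49152 is made up, 33434 is the default port.
quoted = build_ipv4_udp("10.1.0.1", "10.3.0.4", 1, 49152, 33434)
info = parse_quoted_datagram(quoted)
print(info)
```

This quoted header is how the sender can match an ICMP reply to the probe that caused it: the reply comes from a router, but the bytes inside still name 10.1.0.1, 10.3.0.4 and the UDP ports of the original probe.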

What about this TTL stuff? Well let’s take a look at the ICMP packet and what it is telling us about TTL:

TTL of 0 = expired in transit = drop packet!
  • vIOS2 has decremented the TTL of 1 by 1, which leaves 0.
  • A TTL of 0 means the packet is dropped.
  • This is communicated from vIOS2 back to vIOS1 within this ICMP packet.

So what next? How does that help us? Well in fact vIOS1 has now discovered the 1st hop in the path to our desired destination. That hop is 10.1.0.2 which we do see within the traceroute. So again… what next?!

vIOS1 now wants to discover the router detail for the 2nd hop in the path. So guess what? The TTL is increased to a count of 2. So we now move to the next UDP stream for this information.

Packet 114 and 115

Let’s take a look at that UDP packet..

The Src and Dst IP addresses are still the same as per our traceroute. (vIOS1 and vIOS4)

But wait we can now see the TTL has increased to 2. Knowing what we know so far, we are expecting an ICMP response from vIOS2 right? Maybe! Maybe not! Let’s take a look at packet 115 for this detail:

The ICMP response this time is in fact from Source IP 10.2.0.3 which is actually the L3 interface on vIOS3:

Let’s not forget that we are seeing this from the perspective of vIOS1 at this stage.

The Destination address here is also important. This is targeted at vIOS1 on 10.1.0.1, which is in fact always going to be the Destination address for these ICMP responses along the way:

The Destination IP is always the same with the ICMP packet targeted each time at the source of the traceroute on vIOS1

But wait, the Source of this ICMP packet is also the 2nd hop address in our traceroute:

So we are starting to learn more about the path and have now discovered hop 1 and hop 2. But wait what about the concept with the TTL? What happened here with this 2nd ICMP packet?

The response was sure enough ‘Time to live exceeded in transit’, which must mean the reporting router decremented the TTL to 0, dropped the packet and is reporting that back. Let’s expand on this…

The reporting router is 10.2.0.3, so we need to shift our attention to this router and what it has done in order to help discover the path.

Pcaps on vIOS2

Here are captures on vIOS2. Let’s start with packet 104:

Packet 104 shows the UDP packet arriving from vIOS1 / 10.1.0.1 destined for 10.3.0.4. Note the TTL is 1!

Packet 105 shows the ICMP packet sent back to vIOS1 / 10.1.0.1:

  • TTL is 0, therefore expired in transit.
  • The original IP header is encapsulated in the ICMP packet as before.
  • The ICMP response is a Type 11 (Time Exceeded)

So really this is just confirmation around the 1st UDP / ICMP exchange. Let’s shift our attention back to that 2nd UDP stream and ICMP response..

Packet 106 and 107

Ok so here things are really heating up on this whole TTL concept. Let’s break this down clearly:

  • 106 – vIOS2 has received the UDP packet from vIOS1 with our original Src and Dest IP addresses as per the trace
  • 106 – The TTL field is now set to 2. Why is this set to 2? Because vIOS1 set it as 2! Remember this packet is from vIOS1 who is attempting to discover the network path to 10.3.0.4. vIOS2 will therefore pass this onto the next hop router and will NOT drop the packet. So we should be expecting a response from vIOS3 right?
  • 107 – ICMP packet from 10.2.0.3 / vIOS3 – Correct! The ICMP response shows the TTL of 0 and therefore expired in transit with the response going back to 10.1.0.1 being vIOS1.

So let’s come back down to earth and summarise this before we go any deeper:

  • Each time vIOS1 / Source of the traceroute learns about a next hop in the path it will increment the TTL by 1.
  • Packet 108, the 3rd UDP stream, confirms this in the IP header with a TTL of 3:
  • For each UDP packet that we send, we will always expect the ICMP response to come back from the next hop in the path
  • The destination IP for each ICMP response will always be the Source IP of the traceroute (We are the ones who need to know the path remember!)
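
The points above can be sketched as a tiny simulation. This is not a real traceroute (no packets are sent); it is a minimal model, assuming the three-hop lab path from the captures, in which each router decrements the TTL and the one that reaches 0 answers with ICMP Type 11. The function and variable names are made up for illustration.

```python
# Routers on the lab path from vIOS1, with the interface each one replies from.
PATH = [("vIOS2", "10.1.0.2"), ("vIOS3", "10.2.0.3"), ("vIOS4", "10.3.0.4")]
DESTINATION = "10.3.0.4"


def send_probe(ttl, path=PATH, destination=DESTINATION):
    """Walk one probe along the path and return (responder_ip, icmp_type)."""
    for _name, ip in path:
        if ip == destination:
            # The probe reached the target itself: it replies with Type 3.
            return ip, 3
        ttl -= 1                # a transit router decrements the TTL
        if ttl == 0:
            return ip, 11       # expired in transit: Type 11 back to the source
    return None, None           # ran off the end of the modelled path


def traceroute(max_ttl=30):
    """Raise the TTL by 1 per probe until the destination answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, icmp_type = send_probe(ttl)
        hops.append((ttl, responder, icmp_type))
        if icmp_type == 3:
            break
    return hops


hops = traceroute()
for ttl, responder, icmp_type in hops:
    print(ttl, responder, icmp_type)
```

The responder addresses come out in the same order as the hops in the traceroute output, and the loop stops as soon as a reply arrives that is not a Time Exceeded.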

Final Response / Last Hop

Back to vIOS1 now and let’s look at ICMP packet 117 from 10.3.0.4:

Sure enough as it was the 3rd hop we are attempting to discover, we sent a TTL of 3.

The ICMP response this time is a little different. This time the response is from 10.3.0.4 which is the final hop in the traceroute and in fact our destination in the traceroute. What is going on here then..

Well the ICMP response this time states ‘Destination Unreachable’ with a Type 3 ICMP response:

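‘The ICMP Destination Unreachable message is sent by a router in response to a packet which it cannot forward because the destination (or next hop) is unreachable or a service is unavailable’ In our case it is the ‘service is unavailable’ part that applies: the probe has reached 10.3.0.4 itself, but nothing is listening on the high UDP port that traceroute targets (33434 by default, as seen in the extended traceroute prompts), so vIOS4 answers with a Type 3 message, normally Code 3 (Port Unreachable). My definition of this is.. ‘There is nowhere else to go!’ and that Type 3 ICMP packet is how vIOS1 knows the trace is complete.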
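
From the sender's side, the ICMP type and code are what separate "another hop" from "we have arrived". A minimal sketch of that decision, assuming only the two standard replies discussed here (Type 11 Code 0 for TTL exceeded in transit, Type 3 Code 3 for port unreachable); the function name and return strings are made up for illustration:

```python
def interpret_reply(icmp_type, icmp_code):
    """Classify an ICMP reply to a UDP traceroute probe."""
    if icmp_type == 11 and icmp_code == 0:
        return "hop"            # TTL exceeded in transit: an intermediate router
    if icmp_type == 3 and icmp_code == 3:
        return "destination"    # port unreachable: the target host itself
    return "other"              # anything else needs a closer look


print(interpret_reply(11, 0))
print(interpret_reply(3, 3))
```

Other Type 3 codes (for example a filtered or unroutable destination) would fall into the "other" bucket in this sketch rather than being treated as the end of the path.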

TTL by design

Outside of traceroute.. TTL exists within the IP Header for what purpose?

The point of the TTL/hop limit is to keep streams of undeliverable packets stuck in routing loops (perhaps due to incorrect routing tables) from circulating forever and clogging up the networks in question

How does it normally operate outside of a traceroute?

‘An IP TTL is set initially by the system sending the packet. It can be set to any value between 1 and 255; different operating systems set different defaults. Each router that receives the packet subtracts at least 1 from the count; if the count remains greater than 0, the router forwards the packet, otherwise it discards it and sends an Internet Control Message Protocol (ICMP) message back to the originating host, which may trigger a resend.’
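
To see why this protects against routing loops, here is a small sketch that counts how many times a packet gets forwarded when two routers keep handing it back to each other. It assumes every router subtracts exactly 1; the function name is made up for illustration.

```python
def forwards_in_loop(initial_ttl):
    """Count how often a looping packet is forwarded before a router discards it."""
    ttl = initial_ttl
    forwards = 0
    while True:
        ttl -= 1                # the receiving router subtracts 1
        if ttl <= 0:
            return forwards     # discarded here; ICMP Time Exceeded goes back
        forwards += 1           # still above 0, so it is passed on again


print(forwards_in_loop(64))
print(forwards_in_loop(255))
```

However high the starting value, the count is finite, so a looping packet always dies instead of circulating forever.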

In fact we can actually see this happening in this lab example with the IP Header in the ICMP responses:

Packet 113 with a received TTL of 255 from 10.1.0.2:

Packet 115 with a received TTL of 254 from 10.2.0.3:

Packet 117 with a received TTL of 253 from 10.3.0.4:
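
The arithmetic behind those three values can be written out. This sketch assumes each router sends its ICMP reply with an initial TTL of 255, which is consistent with the directly connected reply from 10.1.0.2 arriving at 255; the names are made up for illustration.

```python
def routers_on_return_path(received_ttl, initial_ttl=255):
    """How many routers decremented the reply on its way back to vIOS1."""
    return initial_ttl - received_ttl


# Received TTLs as seen on vIOS1 in packets 113, 115 and 117.
observed = {"10.1.0.2": 255, "10.2.0.3": 254, "10.3.0.4": 253}
return_hops = {ip: routers_on_return_path(ttl) for ip, ttl in observed.items()}
print(return_hops)
```

The reply from vIOS2 crosses no other router, the one from vIOS3 is forwarded by vIOS2, and the one from vIOS4 is forwarded by both vIOS3 and vIOS2, which matches the 0, 1 and 2 the function returns.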

Summary

Traceroute is essentially taking advantage of the existing TTL functionality in the IP Header (In combination with ICMP) to discover each next hop router in the path to the destination.

Starting with a TTL of 1, each time we discover a next hop we increment the TTL by 1, so the next probe gets past the 1st known router to the 2nd, then onto the 3rd, 4th, 5th etc… Each time, the TTL is decremented along the way until it reaches 0 at the router we want to discover, which drops the packet and responds back to the originator via ICMP.