The end of the router as we know it

This article was first published in Capacity Media on September 29, 2020. 


Hannes Gredler, CTO at RtBrick, reflects on the evolution of the not-so-humble router

Since their first commercial deployments, network routers have evolved significantly. For the past two decades, network equipment vendors have had a difficult time keeping pace with the rapid adoption of internet services and the demand for bandwidth-hungry applications. Service providers have also struggled to capitalise on their investment, and have consistently asked vendors to help them build value-added services. Critics of router-based network services argue that adding but never removing features has driven up the cost of routers to the point where the advantages and cost savings achieved through technological advances are no longer passed on to customers. Here are some of the functions that have become redundant during these technology developments, but that operators are still paying for.

Hardware buffers
In their 2004 paper Sizing Router Buffers, researchers at Stanford University wrote that buffers as deep as 2000ms served well in the early days of the internet, with low-bandwidth services and a flow diversity on the order of hundreds of flows. However, with speeds of 10 Gbit/s and a diversity of 10 million flows on a typical internet backbone link, the value of buffering becomes highly doubtful. The benefits are even more questionable considering that there is no signalling to the Transmission Control Protocol (TCP) layer during “buffering”. Most routers, however, still support a buffer depth of 100ms or more on 100 Gbit/s circuits, which means that you need 1.25 GB of DDR4 RAM for every 100 Gbit/s port in a router.
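The arithmetic behind that figure is simply the bandwidth-delay product: link rate multiplied by buffer depth. A minimal sketch in Python, using the port speed and buffer depth quoted above:

    # Buffer sizing as a bandwidth-delay product (illustrative figures only,
    # matching the numbers quoted in the text).
    link_rate_bps = 100e9     # 100 Gbit/s port
    buffer_depth_s = 0.1      # 100 ms of buffering

    buffer_bytes = link_rate_bps * buffer_depth_s / 8
    print(f"Buffer per 100 Gbit/s port: {buffer_bytes / 1e9:.2f} GB")
    # -> Buffer per 100 Gbit/s port: 1.25 GB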

This DDR4 RAM is sometimes the most expensive form of RAM, and it makes the buffer the biggest driver of today’s router hardware cost. Not only is it needed for a function that rarely works with today’s internet backbone usage patterns, it must also be implemented as off-chip memory, which increases the cost of external I/O, power consumption, and cooling.

Hardware forwarding tables

The next biggest cost factor in a router’s data plane is the size of its forwarding table. Contemporary hardware can store approximately two million forwarding entries in its IPv4, IPv6, and MPLS forwarding tables. The design of those forwarding engines has been driven by two assumptions:

Firstly, a single forwarding entry can attract a large amount of traffic; a single prefix may even consume the entire bandwidth of a link. This remains a valid assumption today, as content delivery networks (CDNs) and Web 2.0 companies attract a large share of internet traffic to only a few IP prefixes.

Secondly, any forwarding entry may carry the full link bandwidth. This assumption is clearly invalid today. Like all “organic” systems, the internet shows an exponential distribution in its “traffic-per-prefix” curve, which means that the chip design for data forwarding must be radically revised. Rather than treating every IP forwarding entry equally, a memory cache hierarchy is much more practical. This is comparable to today’s computer designs: a tiered memory hierarchy with Level-1, Level-2, and Level-3 caches at varying speeds and costs.
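To make the idea concrete, the sketch below is a deliberately simplified, hypothetical model of such a tiered forwarding table: a small “hot” tier stands in for fast on-chip memory holding the few heavy-hitter prefixes, while the full table stands in for larger, slower off-chip memory. The class and method names are illustrative, not any vendor’s API.

    # Hypothetical two-tier forwarding table: a small fast tier for heavy-hitter
    # prefixes, and a large slow tier for everything else (illustrative only).
    import ipaddress

    class TieredFib:
        def __init__(self, hot_capacity=1000):
            self.full = {}                  # stands in for large, slow off-chip memory
            self.hot = set()                # stands in for small, fast on-chip memory
            self.hot_capacity = hot_capacity

        def add_route(self, prefix, next_hop):
            self.full[ipaddress.ip_network(prefix)] = next_hop

        def promote(self, prefix):
            # Move a heavy-hitter prefix into the fast tier if space allows.
            net = ipaddress.ip_network(prefix)
            if net in self.full and len(self.hot) < self.hot_capacity:
                self.hot.add(net)

        def lookup(self, address):
            # Longest-prefix match over the full table; report which tier served it.
            addr = ipaddress.ip_address(address)
            best = None
            for net, next_hop in self.full.items():
                if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                    best = (net, next_hop)
            if best is None:
                return None
            tier = "fast" if best[0] in self.hot else "slow"
            return best[1], tier

    fib = TieredFib()
    fib.add_route("203.0.113.0/24", "next-hop-A")
    fib.add_route("198.51.100.0/24", "next-hop-B")
    fib.promote("203.0.113.0/24")             # heavy-hitter lives in the fast tier
    print(fib.lookup("203.0.113.7"))          # -> ('next-hop-A', 'fast')
    print(fib.lookup("198.51.100.7"))         # -> ('next-hop-B', 'slow')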

Modern IP routers still work with only one storage level, on the assumption that every forwarding entry must be fast. However, analysis of real backbone traffic data shows that this is no longer the case. In fact, the forwarding tables are now ten times oversized for practical use. The good news is that the hardware can easily be optimized. Customers only need to articulate clearly what is needed so that the next generation of forwarding hardware can be adapted accordingly. Network software, on the other hand, is a different class of problem.

Software features
It is difficult to say which software functions are actually redundant and which are not, since this depends on the individual needs of each telecommunications company. Over time, manufacturers have developed a wide range of features at the request of network operators and added them directly into the code. However, this approach makes it impossible to deactivate individual functions once they have been implemented. This quickly becomes a cost factor, as network operators have to pay for these features even when they are not being used. At the same time, whenever a new function is introduced, interference tests must be carried out against all existing functions, even those that are not required.

The reason for this is that router software has traditionally been programmed as a monolithic system, in which new functions are tightly coupled to the underlying infrastructure. Removing such functions from the code base can be as complex as their original development. At the same time, the barrier to adding new functionality rises with each new feature.

In today’s highly competitive telecommunications market, however, service providers rely on systems that are agile, easy to maintain, and tailored to their needs and those of their customers. As such, router system manufacturers have to take a new, more cost-effective approach that enables network operators to manage, update and, if necessary, remove functions flexibly and smoothly.

The solution: Disaggregated Systems, Distributed SDNs & Modular Code
Disaggregation of hardware and software allows network operators to choose freely among bare-metal switches and validated network software. They can take advantage of the latest chip generations and thereby strengthen their capacity to innovate. However, this also means that the responsibility for managing functions lies entirely with network operators. A distributed software-defined network (SDN) offers the ideal conditions for this: it combines the advantages of SDN with the benefits of a distributed control plane and thus enables smooth management of the software.

To ensure that functions can be added and removed smoothly, the code should be structured so that it can be put together from individual blocks of code, so-called “composable” code. These blocks should be able to be swapped out or removed as needed, with no interdependencies between them. A “cloud-native” design like this brings many advantages, with independent microservices deployed and running in containers. If a new function or an update is required, the software developer supplies a corresponding container, which adds or updates the respective feature within milliseconds, without interrupting the service. This way, route processing, updating, and restarting are 20 times faster than with conventional router operating systems. If open interfaces are also available, network operators can even develop and implement their own functions.
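As a purely hypothetical illustration of the principle (the names and structure below are not RtBrick’s implementation), a composable system keeps each feature as an independent block that can be registered or removed at runtime without touching any other block:

    # Hypothetical sketch of composable feature blocks: each feature is an
    # independent unit that can be added or removed at runtime without any
    # change to the others (illustrative only, not a vendor implementation).
    class FeatureRegistry:
        def __init__(self):
            self.features = {}

        def add(self, name, handler):
            # Adding a feature never requires changes to existing ones.
            self.features[name] = handler

        def remove(self, name):
            # Removing a feature is just as simple as adding it.
            self.features.pop(name, None)

        def handle(self, name, *args):
            handler = self.features.get(name)
            return handler(*args) if handler else None

    registry = FeatureRegistry()
    registry.add("bgp", lambda route: f"processing BGP route {route}")
    registry.add("pppoe", lambda session: f"terminating PPPoE session {session}")

    print(registry.handle("bgp", "192.0.2.0/24"))
    registry.remove("pppoe")    # an unused feature is dropped; nothing else is affected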

Traditional routers and their control systems are being challenged by new concepts such as disaggregation and distributed SDNs, which promise significantly faster implementation, automated control, and a shorter time to market. For future router designs to meet these challenges, fundamentally new router hardware and software must be developed, and modern software architectures and paradigms introduced.