Before proceeding with this use case, the reader should be familiar with basic SDN concepts.
Modern data center networks rely on multi-rooted topologies that offer many alternative data paths between any pair of hosts. As an example, a fat-tree topology is shown below.
Fat-tree topology with 4 core switches
A network flow is classified as an elephant flow when it is long-lived and bandwidth-demanding. The remaining short-lived flows are called mice flows. Even a small number of elephant flows can congest the network and hurt latency-sensitive mice flows. To avoid congestion, we need to identify elephant flows and apply congestion-avoidance or traffic-engineering strategies, such as rerouting a flow onto a new path or using separate paths for mice and elephant flows.
Elephant flow detection can be done in the following ways:
- Continuous polling of edge switches in the network
- Early detection on the host side (the motivation for this use case)
Hosts can use the netfilter_queue library to tag elephant flow packets before they leave the host. The SDN controller can then calculate and install the best route for these tagged flows.
To simulate an SDN-enabled topology, we used Mininet with lightweight Docker containers as hosts. A tagging agent resides on every host and intercepts its outgoing traffic.
The agent keeps per-flow state in a hashed table, where each flow is identified by the key
(SRC_IP, DST_IP, SRC_PORT, DST_PORT, PROTOCOL).
Accordingly, a node in the hashed table has the following structure:

    struct node {
        struct flow_key key;            // same 5-tuple as defined above (type name assumed)
        unsigned long flowSizeCounter;  // number of bytes transferred over an arbitrary time interval
        struct node *next;              // next node in the chain
    };
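To make the node structure concrete, here is a minimal sketch of a chained hash table keyed by the 5-tuple. The names `flow_key`, `TABLE_SIZE`, `hash_key`, and `flow_account`, as well as the hash function itself, are assumptions made for illustration and are not taken from the original agent.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SIZE 1024  /* assumed bucket count */

/* 5-tuple flow key; field names are illustrative. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  protocol;
};

struct node {
    struct flow_key key;
    uint64_t flowSizeCounter;  /* bytes seen in the current interval */
    struct node *next;         /* next node in the same bucket */
};

/* Compare field by field so struct padding never affects the result. */
static int key_equal(const struct flow_key *a, const struct flow_key *b) {
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->protocol == b->protocol;
}

/* Simple multiplicative hash (an assumption; any reasonable hash works). */
static unsigned hash_key(const struct flow_key *k) {
    uint32_t h = k->src_ip;
    h = h * 31u + k->dst_ip;
    h = h * 31u + k->src_port;
    h = h * 31u + k->dst_port;
    h = h * 31u + k->protocol;
    return h % TABLE_SIZE;
}

/* Add `bytes` to the flow's counter, inserting a node if the key is new.
 * Returns the updated counter, or 0 if allocation fails. */
uint64_t flow_account(struct node *table[TABLE_SIZE],
                      const struct flow_key *k, uint32_t bytes) {
    assert(table != NULL && k != NULL);
    unsigned idx = hash_key(k);
    for (struct node *n = table[idx]; n != NULL; n = n->next) {
        if (key_equal(&n->key, k)) {
            n->flowSizeCounter += bytes;
            return n->flowSizeCounter;
        }
    }
    struct node *n = calloc(1, sizeof *n);
    if (n == NULL)
        return 0;
    n->key = *k;
    n->flowSizeCounter = bytes;
    n->next = table[idx];   /* prepend to the bucket's chain */
    table[idx] = n;
    return n->flowSizeCounter;
}
```

Returning the updated counter lets the caller compare it against the threshold without a second lookup.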
The idea is to modify the Differentiated Services (DS) field in the IPv4 header once the flowSizeCounter (over a time interval) exceeds a particular threshold. Keep in mind that the IP header checksum must be recalculated every time the header is modified.
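As an illustration of the tagging step, the sketch below writes a DS value into the second byte of an IPv4 header and refreshes the header checksum with the ones' complement sum described in RFC 1071. It works on a raw byte buffer; the function names `inet_checksum` and `tag_ipv4_ds` are hypothetical and not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over `len` bytes, read as big-endian
 * 16-bit words. Returns the checksum as a host-order value. */
uint16_t inet_checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)                       /* odd trailing byte, zero-padded */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)                   /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Set the DS byte (offset 1) of an IPv4 header and rewrite the header
 * checksum (offsets 10-11). `ip` must point at the start of the header. */
void tag_ipv4_ds(uint8_t *ip, uint8_t ds) {
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;   /* header length in bytes */
    assert((ip[0] >> 4) == 4 && ihl >= 20);
    ip[1] = ds;
    ip[10] = 0;                                /* checksum field must be 0 */
    ip[11] = 0;                                /* while the sum is computed */
    uint16_t csum = inet_checksum(ip, ihl);
    ip[10] = (uint8_t)(csum >> 8);
    ip[11] = (uint8_t)(csum & 0xff);
}
```

A quick sanity check for any implementation: running the checksum over a header that already contains a valid checksum yields 0.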
Callback function implementation
Once we get a packet for processing, we cast the packet data to a local structure representing an IPv4 header. From there we locate the TCP header and build the key that identifies the flow. Note that the SRC_PORT and DST_PORT fields of the key come from the TCP header and the rest from the IPv4 header. We then either insert a new key or update the flowSizeCounter for the existing key in the hashed table.
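The key extraction described above can be sketched as follows. Instead of casting to a header structure, this version reads the fields at their fixed byte offsets, which avoids alignment and byte-order pitfalls. The `flow_key` layout and the `extract_flow_key` name are assumptions for illustration. In a libnetfilter_queue callback, the packet pointer and length would typically come from `nfq_get_payload()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 5-tuple flow key in host byte order; field names are illustrative. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  protocol;
};

static uint16_t be16(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Fill `out` from a raw packet that starts at the IPv4 header.
 * Returns 0 on success, -1 if the packet is not IPv4/TCP or is too short. */
int extract_flow_key(const uint8_t *pkt, size_t len, struct flow_key *out) {
    assert(pkt != NULL && out != NULL);
    if (len < 20 || (pkt[0] >> 4) != 4)
        return -1;
    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;  /* IPv4 header length */
    if (ihl < 20 || len < ihl + 4)             /* need both TCP ports */
        return -1;
    if (pkt[9] != 6)                           /* 6 = TCP */
        return -1;

    const uint8_t *tcp = pkt + ihl;            /* TCP header follows IPv4 */
    out->src_ip   = be32(pkt + 12);
    out->dst_ip   = be32(pkt + 16);
    out->protocol = pkt[9];
    out->src_port = be16(tcp);
    out->dst_port = be16(tcp + 2);
    return 0;
}
```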
If the flowSizeCounter for the key exceeds the threshold number of bytes, we set the DS field of the IPv4 header to 192 (the value is arbitrary). Since the packet has been modified, the IP header checksum is recalculated using the algorithm in RFC 1071. The modified packet is then released back to the network by setting the verdict to NF_ACCEPT.
Note: The hashed table is flushed at regular intervals by a timer thread. To ensure mutual exclusion, a mutex is acquired every time the hashed table is modified.
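The flushing scheme in the note can be sketched with POSIX threads as shown here. The node is reduced to its counter to keep the example short, and `FLUSH_INTERVAL_SEC`, `record_bytes`, `flush_table`, and `flush_loop` are hypothetical names; the original agent's interval and function layout are not given in the text.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define TABLE_SIZE 1024
#define FLUSH_INTERVAL_SEC 5   /* assumed flush period */

struct node {                  /* key omitted for brevity */
    uint64_t flowSizeCounter;
    struct node *next;
};

struct node *table[TABLE_SIZE];
pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Packet-path side: prepend a node to a bucket while holding the lock. */
int record_bytes(size_t bucket, uint64_t bytes) {
    assert(bucket < TABLE_SIZE);
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->flowSizeCounter = bytes;
    pthread_mutex_lock(&table_lock);
    n->next = table[bucket];
    table[bucket] = n;
    pthread_mutex_unlock(&table_lock);
    return 0;
}

/* Free every node under the lock. Returns the number of nodes freed. */
size_t flush_table(void) {
    size_t freed = 0;
    pthread_mutex_lock(&table_lock);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        struct node *n = table[i];
        while (n != NULL) {
            struct node *next = n->next;
            free(n);
            n = next;
            freed++;
        }
        table[i] = NULL;
    }
    pthread_mutex_unlock(&table_lock);
    return freed;
}

/* Timer thread entry point: flush the table once per interval. */
void *flush_loop(void *arg) {
    (void)arg;
    for (;;) {
        sleep(FLUSH_INTERVAL_SEC);
        flush_table();
    }
    return NULL;
}
```

The agent would start the timer once at startup, for example with `pthread_create(&tid, NULL, flush_loop, NULL)`. Because both the packet path and the flusher take `table_lock`, a flush can never free a node while the callback is updating it.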