Wednesday, March 28, 2018

AWS Instances Network Throughput

Enhanced Networking
Network Burst
NIC RSS Feature
AWS Instance Network Throughput

Enhanced Networking

AWS offers instance types with varying network capabilities. Instance network throughput is advertised as: Low, Moderate, High, Up to 10 Gbps, 10 Gbps, and 20 Gbps. To help Netflix teams know the true network throughput of AWS instances, internal micro-benchmarks were run using the netperf tool.
Newer instance types support the Enhanced Networking feature, which allows even smaller instances to achieve higher network throughput and lower latencies. Enhanced Networking lets a native driver run on an instance with direct access (DMA) to a subset of NIC hardware resources via the PCIe SR-IOV extension, which reduces virtualization overhead and thus network latency.
To check if an instance is configured correctly for Enhanced Networking, run: $ sudo ethtool -i eth0. If the driver field shows ena or ixgbevf, then Enhanced Networking is properly configured.
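The driver check above is easy to script. A minimal sketch, assuming only that `ethtool -i` prints a `driver:` line (the sample output below is illustrative, not captured from a real instance):

```python
# Sketch: classify an interface as Enhanced Networking based on the
# `driver:` field of `ethtool -i <iface>` output.
ENHANCED_DRIVERS = {"ena", "ixgbevf"}

def driver_from_ethtool(output: str) -> str:
    """Extract the `driver:` field from `ethtool -i` output."""
    for line in output.splitlines():
        if line.startswith("driver:"):
            return line.split(":", 1)[1].strip()
    return ""

def is_enhanced_networking(output: str) -> bool:
    return driver_from_ethtool(output) in ENHANCED_DRIVERS

# Illustrative sample output, as `ethtool -i eth0` might print on an
# ENA-backed instance:
sample = """driver: ena
version: 1.5.0
bus-info: 0000:00:05.0"""

print(is_enhanced_networking(sample))  # True
```

In practice you would feed the function the captured output of `sudo ethtool -i eth0`.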

Network Burst

In addition to Enhanced Networking, instance families like I3 and R4 offer a network burst feature on smaller instance types (l|xl|2xl|4xl). These instances use a network credit model, similar to the CPU credit model of the T2 instance family and the IO credit model of EBS GP2 and ST1/SC1 storage. An instance accumulates network credits during periods of low or no network traffic. A larger instance gets more credits and thus can sustain a network burst of 6 Gbps for a longer period. Once all network credits are consumed, network throughput drops to the baseline level. Both the network and IO credit systems work best for bursty workloads, such as Hadoop jobs or large file transfers, that need higher network throughput only for short periods of activity. If your application is still on an old instance type, it is recommended to upgrade to a newer one to achieve better network throughput and lower latency at the same price point.
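The credit mechanism described above behaves like a token bucket. The sketch below is illustrative only: the baseline rate, burst ceiling, and bucket size are made-up parameters, since AWS does not publish the actual numbers.

```python
# Illustrative token-bucket model of network burst credits. All parameters
# (1 Gbps baseline, 6 Gbps burst, 300 Gbit bucket) are assumptions for the
# sketch, not published AWS values.
def simulate_burst(seconds: int, demand_gbps: float,
                   base_gbps: float = 1.0, burst_gbps: float = 6.0,
                   bucket_gbit: float = 300.0) -> list:
    """Return the achieved throughput (Gbps) for each second of a run."""
    credits = bucket_gbit          # start with a full credit bucket
    achieved = []
    for _ in range(seconds):
        # Bursting above baseline is allowed only while credits remain.
        rate = min(demand_gbps, burst_gbps if credits > 0 else base_gbps)
        # Credits accrue at the baseline rate and drain at the send rate.
        credits = max(0.0, min(bucket_gbit, credits + base_gbps - rate))
        achieved.append(rate)
    return achieved

rates = simulate_burst(120, demand_gbps=6.0)
# With these made-up parameters the instance bursts at 6 Gbps for 60 s
# (300 Gbit / (6 - 1) Gbps of net drain), then falls back to 1 Gbps.
```

The model shows the qualitative behavior you see in practice: a sustained burst, a cliff when the bucket empties, and a baseline several times lower than the burst rate.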

NIC RSS Feature

Instance NICs (SR-IOV ixgbevf, ena) support the Receive Side Scaling (RSS) feature, which divides the card into multiple logical receive and transmit descriptor queues, making it a multi-queue device. When packets are received, the NIC hardware fans them out to the various receive queues (the AWS ena NIC can have up to 8), which are bound to different cpus. This allows packet processing to scale across multiple cpus, achieving higher throughput and lower latencies. The NIC hardware applies a filter to each packet that assigns it to a logical flow and steers it to the assigned receive queue. The filter is typically a hash function over the network and transport layer headers (the 4-tuple: src ip, src port, dst ip, dst port).
Thus, to achieve peak throughput, traffic should be spread across all NIC queues. Each queue is then served by a different cpu, which offers better performance and scalability.
See RSS in action below: incoming traffic is spread across multiple logical queues on the same physical NIC of an m5.24xl instance.
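The hash-based steering described above can be sketched in a few lines. Real NICs compute a Toeplitz hash over the 4-tuple in hardware and index an indirection table with the low bits of the result; in this sketch the modulo step stands in for the indirection table, and the key is the widely published default Toeplitz test key (an assumption, not necessarily what the ena hardware ships with).

```python
import socket
import struct

# Widely published default Toeplitz RSS key (assumption for this sketch).
RSS_KEY = bytes([
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
])

def toeplitz_hash(data: bytes, key: bytes = RSS_KEY) -> int:
    """Toeplitz hash: for every set bit of the input, XOR in the 32-bit
    window of the key starting at that bit position."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                result ^= (key_int >> (key_bits - 32 - (i * 8 + b))) & 0xFFFFFFFF
    return result

def rss_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              n_queues: int = 8) -> int:
    """Map a TCP/IPv4 4-tuple to one of n_queues receive queues."""
    data = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack(">HH", src_port, dst_port))
    # A real NIC indexes an indirection table with the low hash bits;
    # a plain modulo illustrates the same idea.
    return toeplitz_hash(data) % n_queues
```

Because the hash is deterministic, all packets of one flow land on the same queue (and cpu), preserving in-order delivery, while different flows spread across queues.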

AWS Instance Network Throughput

The network throughput listed below was measured internally by running the netperf tool. To achieve 23 Gbit/s of throughput on an instance, network load was generated in parallel across 18 8xl instances with jumbo frames enabled (the BaseAMI is hardwired to a 1500-byte MTU due to lack of jumbo frame support in older AWS instances, since mixing MTUs on the network results in unpredictable performance).
For burst-capable families (marked *), cells show burst / baseline throughput; x means the size does not exist in that family.

| Type | L | XL | 2XL | 4XL | 8XL | 9XL | 12XL | 16XL | 18XL | 24XL | 32XL |
|------|---|----|-----|-----|-----|-----|------|------|------|------|------|
| T2   | 500 Mbps | 700 Mbps | 960 Mbps | x | x | x | x | x | x | x | x |
| I2   | x | 700 Mbps | 960 Mbps | 2 Gbps | 9.4 Gbps | x | x | x | x | x | x |
| I3*  | 6 Gbps / 700 Mbps | 6 Gbps / 1 Gbps | 7 Gbps / 2 Gbps | 9 Gbps / 4 Gbps | 9.4 Gbps | x | x | 15 Gbps** | x | x | x |
| R3   | 500 Mbps | 700 Mbps | 960 Mbps | 2 Gbps | 5 Gbps | x | x | x | x | x | x |
| R4*  | 6 Gbps / 700 Mbps | 6 Gbps / 1 Gbps | 7 Gbps / 2 Gbps | 9 Gbps / 4 Gbps | 9.4 Gbps | x | x | 15 Gbps** | x | x | x |
| M4   | x | 700 Mbps | 1 Gbps | 2 Gbps | x | x | x | 12 Gbps** | x | x | x |
| M5*  | 6 Gbps / 700 Mbps | 6 Gbps / 1 Gbps | 7 Gbps / 2 Gbps | 9 Gbps / 4 Gbps | x | x | 9.4 Gbps | x | x | 16 Gbps** | x |
| C4   | 500 Mbps | 700 Mbps | 2 Gbps | 4 Gbps | 9.4 Gbps | x | x | x | x | x | x |
| C5*  | 6 Gbps / 700 Mbps | 6 Gbps / 1 Gbps | 7 Gbps / 2 Gbps | 9 Gbps / 4 Gbps | x | 9 Gbps | x | x | 16 Gbps** | x | x |
| D2   | x | 700 Mbps | 2 Gbps | 4 Gbps | 9.4 Gbps | x | x | x | x | x | x |
| X1   | x | x | x | x | x | x | x | 9.4 Gbps | x | x | 15 Gbps** |
| X1e* | x | 6 Gbps / 600 Mbps | 7 Gbps / 1 Gbps | 9 Gbps / 2 Gbps | 9.4 Gbps | x | x | 9.4 Gbps | x | x | 15 Gbps** |
*New instance families (I3, R4, M5, C5, X1e) offer burstable network throughput on lower instance types (l, xl, 2xl, 4xl). Selecting jumbo frames (9001 MTU) offers even higher bursts, up to 9.4 Gbps, on these types. Once burst credits are exhausted, network throughput drops to the baseline level, several times lower than the burst level.
**Selecting jumbo frames (9001 MTU) with $ sudo ip link set dev eth0 mtu 9001 on instance types i3.16xl, r4.16xl, m4.16xl, m5.24xl, c5.18xl, c5.36xl, x1.32xl, x1e.32xl, x1.64xl can offer network throughput of up to 23 Gbps.
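A rough back-of-the-envelope calculation shows why jumbo frames matter at these speeds: larger frames mean far fewer packets to process per second. The 40-byte TCP/IP header figure below is a simplifying assumption, and the numbers are illustrative arithmetic, not benchmark results.

```python
# Rough arithmetic: packets per second required to sustain a throughput at
# a given MTU. Assumes 40 bytes of TCP/IP headers per packet (simplified).
def packets_per_second(throughput_gbps: float, mtu: int, headers: int = 40) -> float:
    payload_bits = (mtu - headers) * 8
    return throughput_gbps * 1e9 / payload_bits

standard = packets_per_second(23, 1500)  # ~1.97M packets/s at 1500 MTU
jumbo = packets_per_second(23, 9001)     # ~0.32M packets/s at 9001 MTU
# Jumbo frames cut the per-packet processing load roughly six-fold,
# which is why the 23 Gbps figure is only reachable with 9001 MTU.
```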

Thursday, March 15, 2018

Public Cloud Computing Workshop

I was invited by OPEN (Organization of Pakistani Entrepreneurs) Silicon Valley to deliver a "Public Cloud Computing Workshop". Sharing the slides and labs.