Systems & Networking
building next-generation datacenter network systems

Advances in datacenter networking over the past decade have driven a sea change in the way datacenters are organized and managed. We are exploring datacenter networking issues from a systems perspective and rethinking datacenter network systems for new hardware and software trends. Our current focus areas include networking for resource disaggregation and RDMA networks.



Network Task Disaggregation and Consolidation with NetPool

Servers in today's data centers host software and hardware resources for processing network packets. Managing and executing network tasks at each end host can be costly in both capital expense and engineering effort. We propose to disaggregate network resources from end hosts into a separate network resource pool. To this end, we propose NetPool, a distributed SmartNIC platform that pools network resources together at rack scale. Each SmartNIC in NetPool consolidates network functionalities from multiple endpoints by fairly sharing limited hardware resources, and it achieves its performance goals with an auto-scaled, highly parallel data plane and a scalable control plane.


FPGA-Based Multi-Tenant SmartNIC

With CPU scaling slowing down in today's data centers, more functionalities are being offloaded from the CPU to auxiliary devices. One such device is the SmartNIC, which is being increasingly adopted in data centers. In today's cloud environment, VMs on the same server can each have their own network computation (or network tasks) or workflows of network tasks to offload to a SmartNIC. These network tasks can be dynamically added and removed as VMs come and go, and they can be shared across VMs. Such dynamism demands that a SmartNIC not only schedule and process packets but also manage and execute offloaded network tasks for different users. Although software solutions like an OS exist for managing software-based network tasks, such software-based SmartNICs cannot keep up with quickly increasing data-center network speeds.

We built a new SmartNIC platform called SuperNIC that allows multiple tenants to efficiently and safely offload FPGA-based network computation DAGs. For efficiency and scalability, our core idea is to group network tasks into chains that are connected and scheduled as one unit. We further propose techniques to automatically scale network task chains with different types of parallelism. Moreover, we propose a fair-share mechanism that considers both fair space sharing and fair time sharing of different types of hardware resources. Our FPGA prototype of SuperNIC achieves high-bandwidth, low-latency performance while efficiently utilizing and fairly sharing resources.
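
To make the idea of fairly sharing a limited hardware resource concrete, here is a minimal sketch of textbook max-min (water-filling) fair allocation of a single resource among tenants. It is an illustration only and is not SuperNIC's mechanism, which considers both space and time sharing across several resource types; the function name, tenant names, and units are hypothetical.

```python
def max_min_fair_share(capacity, demands):
    """Max-min fair allocation of one resource (illustrative, not SuperNIC's algorithm).

    capacity: total units available (hypothetical unit, e.g. Gbps)
    demands:  dict mapping tenant name -> requested units
    """
    allocation = {}
    remaining = float(capacity)
    pending = dict(demands)

    while pending:
        # Equal split of what is left among tenants still waiting.
        share = remaining / len(pending)
        fits = [t for t, d in pending.items() if d <= share]
        if not fits:
            # Nobody can be fully satisfied: everyone left gets the equal share.
            for t in pending:
                allocation[t] = share
            break
        # Fully satisfy small demands and return their unused share to the pool.
        for t in fits:
            allocation[t] = float(pending.pop(t))
            remaining -= allocation[t]
    return allocation


if __name__ == "__main__":
    print(max_min_fair_share(100, {"vm_a": 20, "vm_b": 50, "vm_c": 70}))
```

In this example the small tenant receives its full demand, and the two larger tenants split the remainder equally.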


RDMA Side-Channel Attack

RDMA is a technology that allows direct access from the network to a machine’s main memory without involving its CPU. While RDMA provides massive performance boosts and has thus been adopted by several major cloud providers, security concerns have so far been neglected.

The need for RDMA NICs (RNICs) to bypass the CPU and directly access memory results in them storing various metadata, such as page table entries, in their on-board SRAM. When the SRAM is full, RNICs swap metadata to main memory across the PCIe bus. We exploited the resulting timing difference to establish side channels and demonstrated that these side channels can leak access patterns of victim nodes to other nodes.
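
The timing difference can be illustrated with a toy simulation of an evict-and-reload probe against a small metadata cache. This is a sketch under stated assumptions, not a model of any real RNIC: the LRU policy, the cache capacity, the two latency constants, and all names are hypothetical.

```python
from collections import OrderedDict


class ToyNicCache:
    """Toy LRU model of on-board metadata SRAM (all parameters are assumptions)."""

    HIT_US = 1.0   # assumed latency when the entry is already in SRAM
    MISS_US = 3.0  # assumed latency when the entry must be fetched over PCIe

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def access(self, key):
        """Touch a metadata entry and return the simulated latency."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return self.HIT_US
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return self.MISS_US


def victim_touched(cache, target, victim_accesses):
    """Evict the target, let the victim run, then time a reload of the target."""
    # Evict: fill the cache with the attacker's own entries.
    for i in range(cache.capacity):
        cache.access(("attacker", i))
    # The victim may or may not access the target entry.
    if victim_accesses:
        cache.access(target)
    # Reload: a fast access implies the victim brought the entry back in.
    latency = cache.access(target)
    return latency < (cache.HIT_US + cache.MISS_US) / 2


if __name__ == "__main__":
    cache = ToyNicCache(capacity=4)
    print(victim_touched(cache, "page_7", victim_accesses=True))
    print(victim_touched(cache, "page_7", victim_accesses=False))
```

In the toy model, a fast reload tells the attacker that the victim touched the entry between the eviction and the probe.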

Pythia is a set of RDMA-based remote side-channel attacks that allow an attacker on one machine to learn how victims on other machines access the server’s exported in-memory data. We reverse engineered the memory architecture of the most widely used RDMA NIC and used this knowledge to improve the efficiency of Pythia. We further extended Pythia to build side-channel attacks on Crail, a real RDMA-based key-value store application. Pythia is fast (57μs), accurate (97% accuracy), and can hide all its traces from the victim or the server.


Datacenter Approximate Transmission Protocol

Many datacenter applications such as machine learning and streaming systems do not need the complete set of data to perform their computation. Current approximate applications in datacenters run on a reliable network layer like TCP and either sample data before sending or drop data after receiving to improve performance. These approaches are network oblivious and transmit (and retransmit) more data than necessary, affecting both application runtime and network bandwidth usage.

We propose to run approximate applications on a lossy network and to allow packet loss in a controlled manner. We designed a new network protocol called Approximate Transmission Protocol, or ATP, for datacenter approximate applications. ATP opportunistically exploits available network bandwidth as much as possible, while running a loss-based rate control algorithm to avoid bandwidth waste and retransmission. It also ensures fair bandwidth sharing across flows and improves accurate applications’ performance by leaving more switch buffer space to accurate flows.
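
As a rough illustration of loss-based rate control with a tolerated loss level, the sketch below uses a generic additive-increase, multiplicative-decrease rule. It is not ATP's actual algorithm, which this page does not specify; the function names, the parameters (`step`, `backoff`, `min_rate`), and the toy link model are all assumptions.

```python
def next_rate(rate, sent, lost, max_loss_rate, step=1.0, backoff=0.5, min_rate=1.0):
    """Pick the next sending rate from the loss observed in the last interval."""
    loss = lost / sent if sent else 0.0
    if loss > max_loss_rate:
        # Loss exceeds what the application tolerates: back off multiplicatively.
        return max(min_rate, rate * backoff)
    # Loss is tolerable: keep probing for spare bandwidth.
    return rate + step


def simulate(capacity, max_loss_rate, rounds, rate=1.0):
    """Toy link that drops everything sent above `capacity` per interval."""
    history = []
    for _ in range(rounds):
        sent = rate
        lost = max(0.0, sent - capacity)
        history.append(rate)
        rate = next_rate(rate, sent, lost, max_loss_rate)
    return history


if __name__ == "__main__":
    # Rates climb past the link capacity of 10 while loss stays within 10%,
    # then halve once the observed loss exceeds the tolerated level.
    print(simulate(capacity=10, max_loss_rate=0.1, rounds=14))
```

Unlike a reliable transport, the sender here never retransmits; it only uses the observed loss to decide how aggressively to send next.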


Indirection Layer for RDMA

Recently, there has been increasing interest in building datacenter applications with RDMA because of its low-latency, high-throughput, and low-CPU-utilization benefits. However, RDMA is not readily suitable for datacenter applications. It lacks a flexible, high-level abstraction; its performance does not scale; and it does not provide resource sharing or flexible protection. Because of these issues, it is difficult to build RDMA-based applications and to exploit RDMA’s performance benefits.

To solve these issues, we built LITE, a Local Indirection TiEr for RDMA that virtualizes native RDMA into a flexible, high-level, easy-to-use abstraction and allows applications to safely share resources.
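
The general idea of an indirection layer can be sketched as a table that maps opaque handles to memory regions and checks permissions on every access, so applications never handle raw addresses or keys. The class below is a toy in-process illustration only; its names and methods are hypothetical and are not LITE's API.

```python
import itertools


class ToyIndirectionTier:
    """Toy handle-based indirection layer (hypothetical; not LITE's actual API)."""

    def __init__(self):
        self._regions = {}  # handle -> bytearray backing store
        self._grants = {}   # (handle, app) -> set of permissions
        self._next_handle = itertools.count(1)

    def alloc(self, app, size):
        """Allocate a region and return an opaque handle owned by `app`."""
        handle = next(self._next_handle)
        self._regions[handle] = bytearray(size)
        self._grants[(handle, app)] = {"read", "write"}
        return handle

    def share(self, owner, handle, other_app, perms):
        """Let an app with write access grant permissions to another app."""
        self._check(owner, handle, "write", 0, 0)
        self._grants[(handle, other_app)] = set(perms)

    def write(self, app, handle, offset, data):
        self._check(app, handle, "write", offset, len(data))
        self._regions[handle][offset:offset + len(data)] = data

    def read(self, app, handle, offset, length):
        self._check(app, handle, "read", offset, length)
        return bytes(self._regions[handle][offset:offset + length])

    def _check(self, app, handle, perm, offset, length):
        # Every access goes through the table, so protection is enforced centrally.
        if perm not in self._grants.get((handle, app), set()):
            raise PermissionError(f"{app} lacks {perm} on handle {handle}")
        if offset < 0 or offset + length > len(self._regions[handle]):
            raise IndexError("access outside region bounds")


if __name__ == "__main__":
    tier = ToyIndirectionTier()
    h = tier.alloc("app_a", 16)
    tier.write("app_a", h, 0, b"hello")
    tier.share("app_a", h, "app_b", {"read"})
    print(tier.read("app_b", h, 0, 5))
```

Because both applications name the region only by handle, the layer can share it between them while still rejecting a write from the read-only application.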

Find out more about LITE and get it here.


Related Publications


Conferences and Journals

SuperNIC: An FPGA-Based, Cloud-Oriented SmartNIC
Will Lin*, Yizhou Shan*, Ryan Kosta, Arvind Krishnamurthy, Yiying Zhang (* equal contribution)
to appear at the 32nd ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '24) (Best Paper Runner-Up Award)

Pythia: Remote Oracles for the Masses
Shin-Yeh Tsai, Mathias Payer, Yiying Zhang
Proceedings of the 28th USENIX Security Symposium (USENIX SEC '19)

Exploiting Network Loss for Distributed Approximate Computing with NetApprox
Ke Liu, Jinmou Li, Shin-Yeh Tsai, Theophilus Benson, Yiying Zhang
arXiv:1901.01632 (arXiv '19)

LITE Kernel RDMA Support for Datacenter Applications
Shin-Yeh Tsai, Yiying Zhang
Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP '17)

Workshop

Towards a Fully Disaggregated and Programmable Data Center
Yizhou Shan, Will Lin, Zhiyuan Guo, Yiying Zhang
to appear at the 13th ACM Asia-Pacific Workshop on Systems (APSys '22)

User-Defined Cloud
Yiying Zhang, Ardalan Amiri Sani, Guoqing Harry Xu
The 18th Workshop on Hot Topics in Operating Systems (HotOS '21)

A Double-Edged Sword: Security Threats and Opportunities in One-Sided Network Communication
Shin-Yeh Tsai, Yiying Zhang
11th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '19)

Building Atomic, Crash-Consistent Data Stores with Disaggregated Persistent Memory
Shin-Yeh Tsai, Yiying Zhang
The 10th Annual Non-Volatile Memories Workshop (NVMW '19)

MemAlbum: an Object-Based Remote Software Transactional Memory System
Shin-Yeh Tsai, Yiying Zhang
The 2018 Workshop on Warehouse-scale Memory Systems (WAMS '18) (co-located with ASPLOS '18)