Disaggregation & Serverless
Building next-generation serverless datacenters

Datacenters have used a “monolithic” server model for decades, where each server hosts a set of hardware devices such as CPUs and DRAM and runs an OS or hypervisor on top to manage those hardware resources. In recent years, cloud providers have begun offering a type of service called “serverless computing” in response to cloud users’ desire to avoid managing clusters and to auto-scale their applications to the right size. Serverless computing quickly gained popularity, but today’s serverless offerings still run on servers. The monolithic server model is not the best fit for serverless computing, and it fundamentally restricts datacenters from achieving efficient resource packing, hardware rightsizing, and broad heterogeneity.

We propose to fully or partially “disaggregate” monolithic servers into network-attached hardware components that host different hardware resources and offer different functionalities (e.g., a computation pool for running application logic, a memory pool for enlarged and consolidated memory spaces, a persistent-memory pool for fast access to key-value data). With such a “serverless” datacenter, hardware resources can be allocated and scaled to exactly the amount applications use and can be individually managed and customized for different application needs. WukLab is making pioneering efforts in building end-to-end, multi-layer solutions for the next-generation serverless datacenter, including new hardware platforms, a new operating system, new distributed systems, a new virtualization platform, new networking systems, and new security solutions.

Serverless Computing on Disaggregated Resources

We are building a new serverless-computing platform on disaggregated resource clusters. Users can write functions just like traditional POSIX-compatible programs, with additional support from our platform for managing application "state", IPC-like communication across functions, and improved security and QoS guarantees.
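A minimal sketch of this programming model, under stated assumptions: StateStore and Channel below are illustrative names, not the platform's actual API, and a real implementation would back them with remote, platform-managed resources rather than in-process objects. The sketch shows the two pieces the paragraph above names: state that outlives a single function invocation, and IPC-like messaging between functions.

```python
# Hypothetical sketch of the serverless programming model described above.
# StateStore and Channel are illustrative stand-ins, not the real API.

class StateStore:
    """Platform-managed state that persists across function invocations."""
    def __init__(self):
        self._data = {}
    def get(self, key, default=None):
        return self._data.get(key, default)
    def put(self, key, value):
        self._data[key] = value

class Channel:
    """IPC-like, ordered message passing between two functions."""
    def __init__(self):
        self._queue = []
    def send(self, msg):
        self._queue.append(msg)
    def recv(self):
        return self._queue.pop(0)

def producer(state, chan):
    # State persists across invocations, so the counter keeps growing.
    count = state.get("count", 0) + 1
    state.put("count", count)
    chan.send({"seq": count})

def consumer(chan):
    return chan.recv()["seq"]

state, chan = StateStore(), Channel()
producer(state, chan)
producer(state, chan)
assert consumer(chan) == 1 and consumer(chan) == 2
```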

Stay tuned for more details.

Hardware-Based Disaggregated Memory Services

We are building a hardware platform that provides disaggregated and distributed memory services to both existing datacenter servers and disaggregated processors. Our hardware-based disaggregated memory services can be accessed through a virtual-memory interface, a key-value-store interface, and other customized interfaces.
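The two interfaces can be pictured with a toy model; MemoryPool and KVView below are illustrative names, not the hardware's real interface. The point of the sketch is that one backing pool can be exposed both as flat memory (read/write at an address) and as a key-value store layered on top of it.

```python
# Toy model of one memory pool exposed through two interfaces.
# Names and layout are illustrative, not the actual hardware design.

PAGE = 4096

class MemoryPool:
    def __init__(self, size):
        self.mem = bytearray(size)
    # Virtual-memory-style interface: read/write at a flat address.
    def write(self, addr, data: bytes):
        self.mem[addr:addr + len(data)] = data
    def read(self, addr, length) -> bytes:
        return bytes(self.mem[addr:addr + length])

class KVView:
    """Key-value interface layered on the same pool: one page per key."""
    def __init__(self, pool):
        self.pool, self.index, self.next_page = pool, {}, 0
    def put(self, key, value: bytes):
        assert len(value) <= PAGE - 4
        if key not in self.index:           # allocate a page on first put
            self.index[key] = self.next_page * PAGE
            self.next_page += 1
        addr = self.index[key]
        # Store a 4-byte length header followed by the value.
        self.pool.write(addr, len(value).to_bytes(4, "little") + value)
    def get(self, key) -> bytes:
        addr = self.index[key]
        n = int.from_bytes(self.pool.read(addr, 4), "little")
        return self.pool.read(addr + 4, n)

pool = MemoryPool(16 * PAGE)
kv = KVView(pool)
kv.put("greeting", b"hello")
assert kv.get("greeting") == b"hello"
```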

Stay tuned for more details.

Network for Resource Disaggregation

Accessing remote resources, and accessing them fast, demands a new network system for future disaggregated datacenters. We are building LegoNET, a new network system designed for adding disaggregated resources to existing datacenters in a non-disruptive way, while delivering low-latency, high-throughput performance and a flexible, easy-to-use interface.

Disaggregated Operating System

The monolithic server model where a server is the unit of deployment, operation, and failure is meeting its limits in the face of several recent hardware and application trends. To improve resource utilization, elasticity, heterogeneity, and failure handling in datacenters, we believe that datacenters should break monolithic servers into disaggregated, network-attached hardware components. Despite the promising benefits of hardware resource disaggregation, no existing OSes or software systems can properly manage it.

We propose a new OS model called the splitkernel to manage disaggregated systems. Splitkernel disseminates traditional OS functionalities into loosely-coupled monitors, each of which runs on and manages a hardware component. A splitkernel also performs resource allocation and failure handling of a distributed set of hardware components. Using the splitkernel model, we built LegoOS, a new OS designed for hardware resource disaggregation. LegoOS appears to users as a set of distributed servers. Internally, a user application can span multiple processor, memory, and storage hardware components. We implemented LegoOS on x86-64 and evaluated it by emulating hardware components using commodity servers. Our evaluation results show that LegoOS’ performance is comparable to monolithic Linux servers, while largely improving resource packing and reducing failure rate over monolithic clusters.
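The splitkernel idea can be illustrated with a toy model; all names below are hypothetical, and real monitors communicate over the network rather than by local function calls. The sketch shows the essential structure: each monitor owns and manages only its own hardware component, and monitors interact purely by exchanging messages.

```python
# Toy sketch of the splitkernel model: OS functionality is split into
# loosely coupled monitors, one per hardware component, interacting
# only through messages. Names are illustrative, not LegoOS internals.

class MemoryMonitor:
    """Runs on a memory component; owns allocation for that component."""
    def __init__(self, pages):
        self.free = list(range(pages))
        self.owner = {}                    # page -> pid
    def handle(self, msg):
        if msg["op"] == "alloc":
            page = self.free.pop(0)
            self.owner[page] = msg["pid"]
            return {"ok": True, "page": page}
        if msg["op"] == "free":
            del self.owner[msg["page"]]
            self.free.append(msg["page"])
            return {"ok": True}

class ProcessMonitor:
    """Runs on a processor component; obtains memory via messages."""
    def __init__(self, mem_monitors):
        self.mem = mem_monitors
    def alloc_page(self, pid):
        # Pick the memory component with the most free pages
        # (one possible global-allocation policy among many).
        target = max(self.mem, key=lambda m: len(m.free))
        return target.handle({"op": "alloc", "pid": pid})

mems = [MemoryMonitor(pages=2), MemoryMonitor(pages=4)]
proc = ProcessMonitor(mems)
reply = proc.alloc_page(pid=1)
assert reply["ok"] and reply["page"] == 0   # came from the larger pool
```

A single application can thus span several memory monitors transparently; the process monitor never touches another component's state directly.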

Find out more about LegoOS and get it here.

Disaggregated Persistent Memory

One viable approach to deploying persistent memory (PM) in datacenters is to attach PM as self-contained devices to the network as disaggregated persistent memory, or DPM. DPM requires no changes to existing servers in datacenters; without the need to include a processor, DPM devices are cheap to build; and by sharing DPM across compute servers, they offer great elasticity and efficient resource packing.

We propose three architectures of DPM: 1) compute nodes directly access DPM (DPM-Direct); 2) compute nodes send requests to a coordinator server, which then accesses DPM to complete a request (DPM-Central); and 3) compute nodes directly access DPM for data operations and communicate with a global metadata server for the control plane (DPM-Sep). Based on these architectures, we built three atomic, crash-consistent data stores.
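The data-path differences among the three architectures can be sketched by counting the network hops a read takes. The model below is for intuition only; hop names are illustrative, and it ignores operation size, contention, and the metadata protocol's details.

```python
# Illustrative hop-count model of the three DPM architectures.
# CN = compute node, Coord = coordinator, MDS = metadata server.

def dpm_direct():
    # Compute nodes talk to DPM directly.
    return ["CN->DPM", "DPM->CN"]

def dpm_central():
    # Every request funnels through a coordinator server.
    return ["CN->Coord", "Coord->DPM", "DPM->Coord", "Coord->CN"]

def dpm_sep(cached_metadata=True):
    # Control plane goes to a global metadata server (only on a
    # metadata miss); data still moves directly between CN and DPM.
    hops = [] if cached_metadata else ["CN->MDS", "MDS->CN"]
    return hops + ["CN->DPM", "DPM->CN"]

assert len(dpm_direct()) == 2
assert len(dpm_central()) == 4
assert len(dpm_sep()) == 2          # common case matches DPM-Direct
```

The sketch captures the basic trade-off: DPM-Central pays extra hops for centralized coordination, while DPM-Sep keeps the direct data path and moves coordination off the critical path.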

Indirection Layer for RDMA

Recently, there is an increasing interest in building datacenter applications with RDMA because of its low-latency, high-throughput, and low-CPU-utilization benefits. However, RDMA is not readily suitable for datacenter applications. It lacks a flexible, high-level abstraction; its performance does not scale; and it does not provide resource sharing or flexible protection. Because of these issues, it is difficult to build RDMA-based applications and to exploit RDMA’s performance benefits.

To solve these issues, we built LITE, a Local Indirection TiEr for RDMA that virtualizes native RDMA into a flexible, high-level, easy-to-use abstraction and allows applications to safely share resources.
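The indirection idea can be sketched as follows; IndirectionTier and its methods are hypothetical stand-ins, not LITE's real API. Applications see only opaque handles, while the tier owns the mapping from handles to underlying memory and enforces protection on every access.

```python
# Hedged sketch of an indirection tier over remote memory: applications
# use opaque handles; the tier hides addresses/keys and checks permissions.
# This mimics the concept only; it is not LITE's actual interface.

class IndirectionTier:
    def __init__(self):
        self._table = {}       # handle -> (backing region, permissions)
        self._next = 1
    def register(self, size, perm="rw"):
        h = self._next; self._next += 1
        self._table[h] = (bytearray(size), perm)
        return h               # the application sees only this handle
    def write(self, handle, offset, data: bytes):
        region, perm = self._table[handle]
        if "w" not in perm:    # protection enforced inside the tier
            raise PermissionError("handle is read-only")
        region[offset:offset + len(data)] = data
    def read(self, handle, offset, length) -> bytes:
        region, _ = self._table[handle]
        return bytes(region[offset:offset + length])

tier = IndirectionTier()
h = tier.register(4096)
tier.write(h, 0, b"hello")
assert tier.read(h, 0, 5) == b"hello"
```

Because all accesses go through the tier, memory can be re-registered, shared, or protected without changing application code, which is the flexibility native RDMA lacks.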

Find out more about LITE and get it here.

Related Publications

Conferences and Journals

Disaggregating Persistent Memory and Controlling Them Remotely: An Exploration of Passive Disaggregated Key-Value Stores
Shin-Yeh Tsai, Yizhou Shan, Yiying Zhang
2020 USENIX Annual Technical Conference (USENIX ATC '20)

Storm: A Fast Transactional Dataplane for Remote Data Structures
Stanko Novakovic, Yizhou Shan, Aasheesh Kolli, Michael Cui, Yiying Zhang, Haggai Eran, Liran Liss, Michael Wei, Dan Tsafrir, Marcos Aguilera
Proceedings of the 12th ACM International Systems and Storage Conference (SYSTOR '19) (Best Paper Award)

LegoOS: A Disaggregated, Distributed OS for Hardware Resource Disaggregation
Yizhou Shan, Yutong Huang, Yilun Chen, Yiying Zhang
Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI '18) (Best Paper Award)

Workshops

Challenges in Building and Deploying Disaggregated Persistent Memory
Yizhou Shan, Yutong Huang, Yiying Zhang
The 10th Annual Non-Volatile Memories Workshop (NVMW '19)

Building Atomic, Crash-Consistent Data Stores with Disaggregated Persistent Memory
Shin-Yeh Tsai, Yiying Zhang
The 10th Annual Non-Volatile Memories Workshop (NVMW '19)

Disaggregating Memory with Software-Managed Virtual Cache
Yizhou Shan, Yiying Zhang
The 2018 Workshop on Warehouse-scale Memory Systems (WAMS '18) (co-located with ASPLOS '18)

MemAlbum: An Object-Based Remote Software Transactional Memory System
Shin-Yeh Tsai, Yiying Zhang
The 2018 Workshop on Warehouse-scale Memory Systems (WAMS '18) (co-located with ASPLOS '18)

Split Container: Running Containers beyond Physical Machine Boundaries
Yilun Chen, Yiying Zhang
The 2018 Workshop on Warehouse-scale Memory Systems (WAMS '18) (co-located with ASPLOS '18)

Disaggregated Operating System
Yiying Zhang, Yizhou Shan, Sumukh Hallymysore
The 17th International Workshop on High Performance Transaction Systems (HPTS '17)