Building Hardware for Systems and Building Systems in Hardware
Systems and hardware closely interact with each other. WukLab is taking non-traditional approaches to "building systems" and "building hardware" in response to new hardware and software trends. We build systems on new types of hardware like non-volatile memory; we design new hardware architectures for new software trends; we build systems like OSes in hardware; and we improve hardware experiences with software flexibility.
FPGA-Based Multi-Tenant SmartNIC
With CPU scaling slowing down in today's data centers, more functionality is being offloaded from the CPU to auxiliary devices. One such device is the SmartNIC, which is being increasingly adopted in data centers. In today's cloud environment, VMs on the same server can each have their own network computation (or network tasks), or workflows of network tasks, to offload to a SmartNIC. These network tasks can be dynamically added and removed as VMs come and go, and they can be shared across VMs. Such dynamism demands that a SmartNIC not only schedule and process packets but also manage and execute offloaded network tasks for different users. Although software solutions like an OS exist for managing software-based network tasks, such software-based SmartNICs cannot keep up with rapidly increasing data-center network speeds.
We built a new SmartNIC platform called SuperNIC that allows multiple tenants to efficiently and safely offload FPGA-based network computation DAGs. For efficiency and scalability, our core idea is to group network tasks into chains that are connected and scheduled as one unit. We further propose techniques to automatically scale network task chains with different types of parallelism. Moreover, we propose a fair-share mechanism that considers both fair space sharing and fair time sharing of different types of hardware resources. Our FPGA prototype of SuperNIC achieves high-bandwidth, low-latency performance while efficiently utilizing and fairly sharing resources.
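The idea of grouping tasks in a DAG into chains that are scheduled as one unit can be illustrated with a small sketch. The summary above does not specify how SuperNIC actually forms its chains, so the rule used here (extend a chain only across a link whose source has one successor and whose target has one predecessor) is an assumption chosen for illustration, and the function name and task names are hypothetical.

```python
from collections import defaultdict

def group_into_chains(nodes, edges):
    """Split a task DAG into maximal linear chains (illustrative rule only).

    A chain grows from task u to task v only when u has exactly one
    successor and v has exactly one predecessor, so no other task can
    enter or leave the chain in the middle.
    """
    succ = defaultdict(list)
    pred = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    def starts_chain(n):
        # A task starts a chain unless it is the only successor of its
        # only predecessor (in which case it belongs to that chain).
        if len(pred[n]) != 1:
            return True
        return len(succ[pred[n][0]]) != 1

    chains = []
    for n in nodes:
        if not starts_chain(n):
            continue
        chain = [n]
        cur = n
        # Follow one-to-one links until the DAG branches or merges.
        while len(succ[cur]) == 1 and len(pred[succ[cur][0]]) == 1:
            cur = succ[cur][0]
            chain.append(cur)
        chains.append(chain)
    return chains

# Hypothetical network tasks: a linear prefix, a fork, then a join.
nodes = ["parse", "firewall", "nat", "encrypt", "compress", "checksum"]
edges = [
    ("parse", "firewall"), ("firewall", "nat"),
    ("nat", "encrypt"), ("nat", "compress"),
    ("encrypt", "checksum"), ("compress", "checksum"),
]
chains = group_into_chains(nodes, edges)
```

In this example the linear prefix collapses into a single chain, while the two branches and the join point each remain their own unit. A scheduler then has four units to place instead of six individual tasks.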
Disaggregating Memory: A Hardware Approach with Clio
Memory disaggregation has attracted great attention recently because of its benefits in efficient memory utilization and ease of management. So far, memory disaggregation research has taken one of two approaches: building or emulating memory nodes with either regular servers or raw memory devices that have no processing power. The former incurs higher monetary cost and faces tail-latency and scalability limitations, while the latter introduces performance, security, and management problems.
We seek a sweet spot in the middle of these two extremes by proposing, for the first time, a hardware-based memory disaggregation solution that has the right amount of processing power at memory nodes. We built a hardware-based disaggregated memory system called Clio, which virtualizes and manages disaggregated memory at the memory node. Clio includes a new hardware-based virtual memory system, a customized network system, and a framework for computation offloading. In building Clio, we not only co-design OS functionalities, hardware architecture, and the network system, but also co-design the compute node and memory node. We prototyped Clio’s memory node on an FPGA and implemented its client-node functionalities in a user-space library. Clio achieves 100 Gbps throughput and an end-to-end latency of 2.5 µs at the median and 3.2 µs at the 99th percentile. Clio scales much better and has orders of magnitude lower tail latency than RDMA. It also achieves 1.1× to 3.4× energy savings compared to CPU-based and SmartNIC-based disaggregated memory systems and is 2.7× faster than software-based SmartNIC solutions.
Distributed Shared Persistent Memory
Non-volatile memories (NVMs) have the potential to greatly improve the performance and reliability of large-scale applications in datacenters. However, it is still unclear how best to utilize them in distributed datacenter environments.
We introduce Distributed Shared Persistent Memory (DSPM), a new framework for using persistent memories in distributed datacenter environments. DSPM provides a new abstraction that allows applications to both perform traditional memory load and store instructions and to name, share, and persist their data. We built Hotpot, a kernel-level DSPM system that provides low-latency, transparent memory accesses, data persistence, data reliability, and high availability.
Get Hotpot here.
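To give a feel for an abstraction that combines plain loads and stores with naming and persisting data, here is a single-machine analogy built on a memory-mapped file. It is not Hotpot's interface: Hotpot is a kernel-level, distributed system, and the helper name `open_named_region`, the region name, and the sizes below are all hypothetical.

```python
import mmap
import os
import tempfile

def open_named_region(path, size):
    """Map a named, file-backed region, creating and zero-filling it if needed."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        if os.fstat(fd).st_size < size:
            os.ftruncate(fd, size)
        return mmap.mmap(fd, size)
    finally:
        # The mapping keeps its own reference to the file, so the
        # descriptor can be closed right away.
        os.close(fd)

# "Name" the data by choosing a path (a temporary one for this demo).
path = os.path.join(tempfile.mkdtemp(), "dataset")

region = open_named_region(path, 4096)
region[0:5] = b"hello"   # ordinary store into the mapped memory
region.flush()           # explicit persistence point
region.close()

# Any later opener of the same name sees the persisted bytes.
again = open_named_region(path, 4096)
data = bytes(again[0:5])  # ordinary load from the mapped memory
again.close()
```

The analogy covers only naming, load/store access, and an explicit persist step on one host. Sharing across machines, reliability, and availability, which the DSPM description lists as goals, are outside what this sketch shows.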
Reliable and Highly-Available NVMM
Non-volatile main memory (NVMM) would be especially useful in large-scale data center environments, where reliability and availability are critical. However, providing reliability and availability to NVMM is challenging, since the latency of data replication can squander the low latency that NVMM can provide.
Mojim is a system that provides the reliability and availability that large-scale storage systems require, while preserving the performance of NVMM. Mojim achieves these goals by using a two-tier architecture in which the primary tier contains a mirrored pair of nodes and the secondary tier contains one or more secondary backup nodes with weakly consistent copies of data. Mojim uses highly optimized replication protocols, software, and networking stacks. Our evaluation results show that, surprisingly, Mojim provides replicated NVMM with similar or even better performance than un-replicated NVMM (reducing latency by 27% to 63% and delivering between 0.4× and 2.7× the throughput). Mojim also outperforms MongoDB's replication system by 3.4× to 4×.
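The two-tier layout can be pictured with a toy in-process model: a write reaches both nodes of the mirrored pair before it returns, while backup nodes receive it later and may therefore lag. This is only a sketch of the structure described above, not Mojim's replication protocol or networking stack, and the class and method names are hypothetical.

```python
from collections import deque

class TwoTierStore:
    """Toy model: a mirrored primary pair plus lazily updated backups."""

    def __init__(self, num_backups=1):
        self.primary = {}
        self.mirror = {}
        self.backups = [{} for _ in range(num_backups)]
        self.pending = deque()  # updates not yet applied to the backups

    def write(self, key, value):
        # Primary tier: both nodes of the pair hold the update
        # before the write returns to the caller.
        self.primary[key] = value
        self.mirror[key] = value
        # Secondary tier: the update is only queued here, so backup
        # copies are weakly consistent until the queue is drained.
        self.pending.append((key, value))

    def sync_backups(self):
        # Apply queued updates to every backup in write order.
        while self.pending:
            key, value = self.pending.popleft()
            for backup in self.backups:
                backup[key] = value

store = TwoTierStore(num_backups=2)
store.write("x", 1)
mirror_after_write = dict(store.mirror)       # already holds the update
backup_after_write = dict(store.backups[0])   # still empty (lagging)
store.sync_backups()
backup_after_sync = dict(store.backups[0])    # caught up
```

Keeping the synchronous step limited to a single mirror is what bounds the work done on the write path in this model. The backups trade freshness for staying off that path.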
Related Publications
Conferences and Journals
SuperNIC: An FPGA-Based, Cloud-Oriented SmartNIC
Will Lin*, Yizhou Shan*, Ryan Kosta, Arvind Krishnamurthy, Yiying Zhang (* equal contribution)
to appear at the 32nd ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
(FPGA '24)
(Best Paper Runner-Up Award)
Clio: A Hardware-Software Co-Designed Disaggregated Memory System
Zhiyuan Guo*, Yizhou Shan*, Xuhao Luo, Yutong Huang, Yiying Zhang (* equal contribution)
Proceedings of the 27th International Conference on Architectural Support for Programming Languages and Operating Systems
(ASPLOS '22)
Disaggregating Persistent Memory and Controlling Them Remotely: An Exploration of Passive Disaggregated Key-Value Stores
Shin-Yeh Tsai, Yizhou Shan, Yiying Zhang
2020 USENIX Annual Technical Conference
(USENIX ATC '20)
Distributed Shared Persistent Memory
Yizhou Shan, Shin-Yeh Tsai, Yiying Zhang
Proceedings of the ACM Symposium on Cloud Computing 2017
(SoCC '17)
Workshops
Challenges in Building and Deploying Disaggregated Persistent Memory
Yizhou Shan, Yutong Huang, Yiying Zhang
the 10th Annual Non-Volatile Memories Workshop
(NVMW '19)
Building Atomic, Crash-Consistent Data Stores with Disaggregated Persistent Memory
Shin-Yeh Tsai, Yiying Zhang
the 10th Annual Non-Volatile Memories Workshop
(NVMW '19)
Disaggregating Memory with Software-Managed Virtual Cache
Yizhou Shan, Yiying Zhang
the 2018 Workshop on Warehouse-scale Memory Systems
(WAMS '18)
(co-located with ASPLOS '18)
Distributed Shared Persistent Memory
Yizhou Shan, Shin-Yeh Tsai, Yiying Zhang
the 9th Annual Non-Volatile Memories Workshop
(NVMW '18)