Better Programming for Better Systems Building
Writing correct programs is the foundation of building reliable software systems. Programming language designers constantly seek new approaches to improve the efficiency, reliability, and flexibility of software programming. However, little is known about how effective these new approaches are in the real world. We take an empirical view of new programming-language features by studying real-world usages and issues (i.e., bugs) and by building tools that can help programmers avoid or detect bugs.
Empirical Study on Solidity Smart Contract Gas Issues
The execution of smart contracts on Ethereum, a public blockchain system, incurs a fee, called a gas fee, for the computation and data storage it consumes. When programmers develop smart contracts (e.g., in the Solidity programming language), they can unknowingly write code that incurs unnecessarily high gas fees. These issues, which we call gas wastes, can lead to significant monetary waste for users. Yet there has been no systematic examination of them and no effective tool for detecting them. This paper takes the initiative in helping Ethereum users reduce their gas fees in two important steps. First, we conduct the first empirical study on gas wastes in popular smart contracts written in Solidity by understanding their root causes and fixing strategies. Second, based on our study findings, we develop a static tool, PeCatch, to effectively detect gas wastes with simple fixes in Solidity programs. Overall, we derive seven insights and four suggestions from our gas-waste study, which could foster future tool development, language improvement, and programmer awareness, and we develop eight gas-waste checkers, which pinpoint 383 previously unknown gas wastes in famous Solidity libraries.
Disaggregation and Program Behavior: A Static-Runtime-Codesign Approach with Mira
Far memory, where memory accesses go to memory on remote servers, has become more popular in recent years as a solution to expand memory size and avoid memory stranding. Prior far-memory systems have taken two approaches: transparently swapping memory pages between local and far memory, and using new programming models to move fine-grained data between local and far memory. The former requires no program changes but incurs a performance penalty, while the latter performs better but requires significant program changes.
We propose Mira, a far-memory system that co-designs static program analysis and compilation with run-time systems. Mira uses program analysis results, profiled execution information, and system environments together to guide code compilation and system configurations for far memory. Our evaluation shows that Mira outperforms prior swap-based and programming-model-based systems by up to 18 times.
Co-Optimizing Compilers and Execution Systems for Far Memory
Far memory, where memory accesses go to memory on remote servers, has become more popular in recent years as a solution to expand memory size and avoid memory stranding. Prior far-memory systems have taken two approaches: transparently swapping memory pages between local and far memory, and using new programming models to move fine-grained data between local and far memory. The former requires no program changes but incurs a performance penalty, while the latter performs better but requires significant program changes.
We propose a compiler-system co-designed far-memory system called Cocas. Cocas statically generates code for far-memory accesses and computation offloading. It monitors the execution of compiled applications on our customized far-memory execution platform. Based on profiling results and far-memory system environments, Cocas dynamically optimizes both the compiler and the execution system. Cocas outperforms prior swap-based and programming-model-based systems because it considers program semantics, system environments, and profiled application execution together, as our results demonstrate.
Empirical Study on Rust Safety Bugs
Rust is a young programming language designed for systems software development. It aims to provide safety guarantees like high-level languages and performance efficiency like low-level languages. The core design of Rust is a set of strict safety rules enforced by compile-time checking. To support more low-level controls, Rust allows programmers to bypass these compiler checks to write unsafe code.
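The distinction between checked and unchecked code can be sketched briefly. The example below is an illustration only and is not taken from the studied projects; the function names `first_safe` and `first_unchecked` are hypothetical. It contrasts a safe accessor, where emptiness is handled through `Option`, with an `unsafe` function that dereferences a raw pointer and leaves the non-empty precondition to the caller.

```rust
/// Safe accessor: an empty slice yields `None` instead of reading
/// invalid memory.
fn first_safe(values: &[i32]) -> Option<i32> {
    values.first().copied()
}

/// Unchecked accessor (hypothetical, for illustration only).
/// It reads through a raw pointer, which the compiler cannot verify.
///
/// # Safety
/// The caller must guarantee that `values` has at least one element.
unsafe fn first_unchecked(values: &[i32]) -> i32 {
    // SAFETY: relies on the caller's guarantee stated above.
    unsafe { *values.as_ptr() }
}

fn main() {
    let data = vec![10, 20, 30];
    let empty: Vec<i32> = Vec::new();

    // Safe code: both cases are well defined.
    println!("{:?}", first_safe(&data));
    println!("{:?}", first_safe(&empty));

    // Unsafe code: well defined here only because `data` is non-empty.
    // Passing `&empty` would be undefined behavior.
    let value = unsafe { first_unchecked(&data) };
    println!("{}", value);
}
```

The `unsafe` keyword does not disable all checking; it marks the places where the programmer, rather than the compiler, is responsible for upholding the safety rules.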
It is important to understand what safety issues exist in real Rust programs and how Rust safety mechanisms impact programming practices. We performed the first empirical study of Rust by close, manual inspection of 850 unsafe code usages and 170 bugs in five open-source Rust projects, five widely used Rust libraries, two online security databases, and the Rust standard library. Our study answers three important questions: how and why programmers write unsafe code, what memory-safety issues real Rust programs have, and what concurrency bugs Rust programmers make. Our study reveals interesting real-world Rust program behaviors and new kinds of mistakes Rust programmers make.
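As a hedged illustration of how a concurrency bug can still arise in safe Rust, consider lock-guard lifetimes. This sketch is not an example drawn from the study, and `guard_lifetime_demo` is a hypothetical name. A `std::sync::Mutex` stays locked for as long as its guard is alive, so a second `lock()` call on the same thread while the guard is still in scope may deadlock or panic. The code uses `try_lock` to observe the lock state without blocking.

```rust
use std::sync::Mutex;

/// Returns whether the mutex was still locked (a) while the guard
/// was alive and (b) after the guard was explicitly dropped.
fn guard_lifetime_demo() -> (bool, bool) {
    let counter = Mutex::new(0);

    // Acquiring the lock returns a guard; the mutex is released
    // only when this guard goes out of scope or is dropped.
    let guard = counter.lock().unwrap();

    // Calling `counter.lock()` here could deadlock or panic.
    // `try_lock` fails immediately instead of blocking.
    let held_while_guard_alive = counter.try_lock().is_err();

    // Dropping the guard releases the mutex.
    drop(guard);
    let held_after_drop = counter.try_lock().is_err();

    (held_while_guard_alive, held_after_drop)
}

fn main() {
    let (during, after) = guard_lifetime_demo();
    println!("locked while guard alive: {}", during);
    println!("locked after drop: {}", after);
}
```

Because the unlock is implicit, the length of a critical section depends on where the guard's lifetime ends, which is easy to misjudge when reading the code.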
Empirical Study on Go Concurrency Bugs
Go is a statically typed programming language that aims to provide a simple, efficient, and safe way to build multithreaded software. Since its creation in 2009, Go has matured and gained significant adoption in production and open-source software. Go advocates the use of message passing as the means of inter-thread communication and provides several new concurrency mechanisms and libraries to ease multithreaded programming. It is important to understand the implications of these new mechanisms and how message passing compares with shared-memory synchronization in terms of program errors, or bugs. Unfortunately, as far as we know, there has been no study on Go’s concurrency bugs.
We performed the first systematic study on concurrency bugs in real Go programs. We studied six popular Go software systems, including Docker, Kubernetes, and gRPC. We analyzed 171 concurrency bugs in total, with more than half of them caused by non-traditional, Go-specific problems. Our study provides a better understanding of Go’s concurrency models and can guide future researchers and practitioners in writing better, more reliable Go software and in developing debugging and diagnosis tools for Go.
Related Publications
Conferences and Journals
DRust: Language-Guided Distributed Shared Memory with Fine Granularity, Full Transparency, and Ultra Efficiency
Haoran Ma, Yifan Qiao, Shi Liu, Shan Yu, Yuanjiang Ni, Qingda Lu, Jiesheng Wu, Yiying Zhang, Miryung Kim, Harry Xu
The 18th USENIX Symposium on Operating Systems Design and Implementation
(OSDI '24)
How to Save My Gas Fees: Understanding and Detecting Real-world Gas Issues in Solidity Programs
Mengting He, Shihao Xia, Boqin Qin, Nobuko Yoshida, Tingting Yu, Linhai Song, Yiying Zhang
arXiv preprint arXiv:2403.02661
(arXiv 2024)
Mira: A Program-Behavior-Guided Far Memory System
Zhiyuan Guo, Zijian He, Yiying Zhang
Proceedings of the 29th ACM Symposium on Operating Systems Principles
(SOSP '23)
Understanding Memory and Thread Safety Practices and Issues in Real-World Rust Programs
Boqin Qin*, Yilun Chen*, Zeming Yu, Linhai Song, Yiying Zhang (* co-first authors)
The ACM SIGPLAN Conference on Programming Language Design and Implementation 2020
(PLDI '20)
Understanding Real-World Concurrency Bugs in Go
Tengfei Tu, Xiaoyu Liu, Linhai Song, Yiying Zhang
Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems
(ASPLOS '19)
(Rated second-most visited URL related to Golang in 2019)