A few that caught my particular eye...
- The Linux scheduler: a decade of wasted cores
Cores may stay idle for seconds while ready threads are waiting in runqueues. In our experiments, these performance bugs caused many-fold performance degradation for synchronization-heavy scientific applications, 13% higher latency for kernel make, and a 14-23% decrease in TPC-H throughput for a widely used commercial database. The main contribution of this work is the discovery and analysis of these bugs and providing the fixes.
- Hold 'em or fold 'em?: aggregation queries under performance variations
Due to large performance variations in clusters, some processes are slower. Therefore, aggregators are faced with the question of how long to wait for outputs from processes before combining and sending them upstream. Longer waits increase the response quality as it would include outputs from more processes. However, it also increases the risk of the aggregator failing to provide its result by the deadline.
- A high performance file system for non-volatile main memory
In this paper, we propose HiNFS, a high performance file system for non-volatile main memory. Specifically, HiNFS uses an NVMM-aware Write Buffer policy to buffer the lazy-persistent file writes in DRAM and persists them to NVMM lazily to hide the long write latency of NVMM. However, HiNFS performs direct access to NVMM for eager-persistent file writes, and directly reads file data from both DRAM and NVMM as they have similar read performance, in order to eliminate the double-copy overheads from the critical path.
- pVM: persistent virtual memory for efficient capacity scaling and object storage
pVM extends the OS virtual memory (VM) instead of building on the VFS and abstracts NVM as a NUMA node with support for NVM-based memory placement mechanisms. pVM inherits benefits from the cache and TLB-efficient VM subsystem and augments these further by distinguishing between persistent and nonpersistent capacity use of NVM. Additionally, pVM achieves fast persistent storage by further extending the VM subsystem with consistent and durable OS-level persistent metadata.
- zExpander: a key-value cache with both high performance and fewer misses
In this paper, we show that, by leveraging highly skewed data access pattern common in real-world KV cache workloads, we can both reduce miss ratio through improved memory efficiency and maintain high performance for a KV cache. Specifically, we design and implement a KV cache system, named zExpander, which dynamically partitions the cache into two sub-caches. One serves frequently accessed data for high performance, and the other compacts data and metadata for high memory efficiency to reduce misses.
- A study of modern Linux API usage and compatibility: what to support when you're supporting
This paper presents a study of Linux API usage across all applications and libraries in the Ubuntu Linux 15.04 distribution. We propose metrics for reasoning about the importance of various system APIs, including system calls, pseudo-files, and libc functions. Our metrics are designed for evaluating the relative maturity of a prototype system or compatibility layer, and this paper focuses on compatibility with Linux applications.
- POSIX abstractions in modern operating systems: the old, the new, and the missing
Little has been done to measure how and to what extent traditional POSIX abstractions are being used in modern OSes, and whether new abstractions are taking form, dethroning traditional ones. We explore these questions through a study of POSIX usage in modern desktop and mobile OSes: Android, OS X, and Ubuntu. Our results show that new abstractions are taking form, replacing several prominent traditional abstractions in POSIX.
- Efficient queue management for cluster scheduling
Job scheduling in Big Data clusters is crucial both for cluster operators' return on investment and for overall user experience. In this context, we observe several anomalies in how modern cluster schedulers manage queues, and argue that maintaining queues of tasks at worker nodes has significant benefits. On one hand, centralized approaches do not use worker-side queues. Given the inherent feedback delays that these systems incur, they achieve suboptimal cluster utilization, particularly for workloads dominated by short tasks. On the other hand, distributed schedulers typically do employ worker-side queuing, and achieve higher cluster utilization. However, they fail to place tasks at the best possible machine, since they lack cluster-wide information, leading to worse job completion time, especially for heterogeneous workloads.
You'll notice I've been particularly interested in Non Volatile Memory papers recently. It's a trendy thing, and I'm trying to get my head around it...