Wednesday, October 19, 2016

VLDB 2016

So I finally got around to looking at the full program from last month's 42nd International Conference on Very Large Data Bases (VLDB) 2016.

What a massive conference this has become!

Anyway, a few things that particularly appealed to me:

  • Multi-Version Range Concurrency Control in Deuteronomy
    In this paper, we enhance our multi-version timestamp order technique to handle range concurrency and prevent phantoms.
  • Not for the Timid: On the Impact of Aggressive Over-booking in the Cloud
    In this paper, we examine policies that inherently tune the system’s idle sensitivity. Increased sensitivity to idleness leads to aggressive over-booking, while the converse leads to conservative reclamation and lower utilization levels. Aggressive over-booking also incurs a “reserve” capacity cost (for when we suddenly “owe” capacity to previously idle databases). We answer these key questions in this paper: (1) how to find a “good” resource reclamation policy for a given DBaaS cluster of users; and (2) how to forecast the needed near-term reserve capacity. (I've sketched this idle-sensitivity trade-off in toy form after the list.)
  • Incremental Computation of Common Windowed Holistic Aggregates
    This paper provides the first in-depth study of how to efficiently implement the three most common holistic windowed aggregates (count distinct, mode and quantile) by reusing the aggregate state between consecutive frames. (A naive version of the state-reuse idea is sketched after the list.)
  • Aerospike: Architecture of a Real-Time Operational DBMS
    In this paper, we describe the solutions developed to address key technical challenges encountered while building a distributed database system that can smoothly handle demanding real-time workloads and provide a high level of fault tolerance. Specifically, we describe schemes for efficient clustering and data partitioning that automatically scale out processing across multiple nodes, and for optimizing the usage of CPUs, DRAM, SSDs, and networks to efficiently scale up performance on one node. (A generic partitioning sketch appears after the list.)
  • Comdb2: Bloomberg’s Highly Available Relational Database System
    Comdb2 is a distributed database system designed for geographical replication and high availability. In contrast with the latest trends in this field, Comdb2 offers full transactional support, a standard relational model, and the expressivity of SQL. Moreover, the system allows for rich stored procedures using a dialect of Lua. Comdb2 implements a serializable system in which reads from any node always return current values. Comdb2 provides transparent High Availability through built-in service discovery and sophisticated retry logic embedded in the standard API.
  • How Good Are Query Optimizers, Really?
    Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. We investigate the quality of industrial-strength cardinality estimators and find that all estimators routinely produce large errors. We further show that while estimates are essential for finding a good join order, query performance is unsatisfactory if the query engine relies too heavily on these estimates. Using another set of experiments that measure the impact of the cost model, we find that it has much less influence on query performance than the cardinality estimates. (There's a toy demonstration of the independence-assumption problem after the list.)
  • The End of Slow Networks: It’s Time for a Redesign
    The next generation of high-performance networks with remote direct memory access (RDMA) capabilities requires a fundamental rethinking of the design of distributed in-memory DBMSs. These systems are commonly built under the assumption that the network is the primary bottleneck and should be avoided at all costs, but this assumption no longer holds.
  • Accelerating Analytics with Dynamic In-Memory Expressions
    With Oracle Database 12.2, Database In-Memory is further enhanced to accelerate analytic processing through a novel lightweight mechanism known as Dynamic In-Memory Expressions (DIMEs). The DIME mechanism automatically detects frequently occurring expressions in a query workload, and then creates highly optimized, transactionally consistent, in-memory columnar representations of these expression results. (A sketch of the expression-caching idea follows the list.)
  • Distributed Data Deduplication
    In this paper, we show how to further speed up data deduplication by leveraging parallelism in a shared-nothing computing environment. Our main contribution is a distribution strategy, called Dis-Dedup, that minimizes the maximum workload across all worker nodes and provides strong theoretical guarantees. (The balancing objective is sketched in simplified form after the list.)
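
To make the over-booking trade-off concrete, here's a toy model of my own (not the paper's), where the only knob is an idle threshold: reclaim memory from any database idle longer than the threshold, and accept that in the worst case every reclaimed gigabyte is suddenly owed back at once.

```python
# Toy model (mine, not the paper's) of idle-sensitive reclamation.
# Each database is (allocated_gb, seconds_idle). Memory of any database
# idle longer than the threshold is reclaimed and over-booked; in the
# worst case every reclaimed GB is "owed" back simultaneously, so the
# reclaimed total doubles as the worst-case reserve capacity.

fleet = [(16, 5), (32, 3600), (8, 86400), (64, 10)]  # (GB, seconds idle)

def reclaimable(databases, idle_threshold_s):
    return sum(gb for gb, idle in databases if idle > idle_threshold_s)

for threshold in (60, 6 * 3600):  # aggressive vs. conservative policy
    gb = reclaimable(fleet, threshold)
    print(f"idle > {threshold:>6}s: reclaim {gb:>2} GB "
          f"(worst-case reserve owed: {gb} GB)")
```

The aggressive setting frees more memory but carries a proportionally larger reserve risk, which is exactly the tension the paper's forecasting question addresses.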
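
The state-reuse idea behind the windowed-aggregates paper is easy to illustrate for mode, even in a form far more naive than their algorithms: keep a count map for the current frame and apply only the rows that enter and leave as the frame slides, instead of rescanning every frame from scratch.

```python
# Naive illustration (much simpler than the paper's algorithms) of
# reusing aggregate state between consecutive frames: maintain a count
# map and update it with only the entering and leaving rows. Finding
# the mode still scans the map; the paper does better than this.
from collections import Counter

def windowed_mode(values, frame):
    counts = Counter(values[:frame])
    result = [counts.most_common(1)[0][0]]
    for i in range(frame, len(values)):
        counts[values[i]] += 1           # row entering the frame
        counts[values[i - frame]] -= 1   # row leaving the frame
        if counts[values[i - frame]] == 0:
            del counts[values[i - frame]]
        result.append(counts.most_common(1)[0][0])
    return result

print(windowed_mode([1, 2, 2, 3, 3, 3, 1], frame=3))  # [2, 2, 3, 3, 3]
```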
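
Aerospike's clustering story builds on fixed-partition hashing; here's a generic sketch of that family (the partition count of 4096 matches Aerospike, but the hash function and the partition-to-node rule below are stand-ins, not theirs).

```python
# Generic fixed-partition hashing for scale-out: keys hash into a fixed
# number of partitions, and partitions -- not individual keys -- are
# assigned to nodes, so membership changes move whole partitions.
import hashlib

N_PARTITIONS = 4096  # Aerospike's fixed partition count

def partition_of(key):
    digest = hashlib.sha1(key).digest()  # stand-in; Aerospike uses RIPEMD-160
    return int.from_bytes(digest[:2], "big") % N_PARTITIONS

def node_of(key, nodes):
    # Illustrative partition-to-node rule only; the paper's assignment
    # also handles replicas and node arrivals/departures.
    return nodes[partition_of(key) % len(nodes)]

cluster = ["node-a", "node-b", "node-c"]
print(partition_of(b"user:42"), node_of(b"user:42", cluster))
```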
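
The cardinality-estimation finding is easy to reproduce in miniature: the textbook independence assumption multiplies per-predicate selectivities, and correlated predicates break it. A synthetic example (the paper shows the same effect, much amplified, on real IMDB data):

```python
# Toy demonstration of the independence assumption going wrong on
# correlated predicates. Not from the paper; data and columns are made up.
import random

random.seed(42)
n = 100_000
rows = []
for _ in range(n):
    genre = random.choice(["drama", "comedy", "action", "horror"])
    # country is correlated with genre: every drama is German here
    country = "DE" if genre == "drama" else random.choice(["DE", "US", "FR", "IN"])
    rows.append((genre, country))

sel_genre = sum(g == "drama" for g, _ in rows) / n
sel_country = sum(c == "DE" for _, c in rows) / n
actual = sum(g == "drama" and c == "DE" for g, c in rows) / n

print(f"independence estimate: {sel_genre * sel_country:.3f}")  # ~0.11
print(f"actual selectivity:    {actual:.3f}")                   # ~0.25
```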
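
As I read it, the DIME mechanism is workload-driven expression materialization. Here's an illustrative sketch of that general idea only (emphatically not Oracle's implementation; the table, the eval-based evaluator, and the hotness rule are all mine):

```python
# Illustrative sketch of workload-driven expression materialization:
# count how often an expression recurs, and once it is "hot", cache its
# per-row results so later queries read a precomputed column.
from collections import Counter

table = {"price": [10.0, 20.0, 5.0], "qty": [3, 1, 7]}
n_rows = len(table["price"])
seen = Counter()
materialized = {}  # expression -> precomputed result column

def evaluate(expr, hot_after=2):
    if expr in materialized:
        return materialized[expr]  # reuse the cached column
    col = [eval(expr, {}, {name: column[i] for name, column in table.items()})
           for i in range(n_rows)]  # eval() is a stand-in evaluator
    seen[expr] += 1
    if seen[expr] >= hot_after:    # expression is hot: materialize it
        materialized[expr] = col
    return col

for _ in range(3):                 # the same expression recurs in queries
    print(evaluate("price * qty"))
```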
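
Finally, the balancing objective in Dis-Dedup can be shown in simplified form: a block of n candidate records costs n(n-1)/2 pairwise comparisons, and the goal is to minimize the maximum load across workers. The greedy heuristic below is a stand-in for the paper's strategy, which additionally splits large blocks and comes with theoretical guarantees.

```python
# Greedy longest-processing-time assignment of deduplication blocks to
# workers, heuristically minimizing the maximum per-worker load.
import heapq

def assign(block_sizes, n_workers):
    loads = [(0, w, []) for w in range(n_workers)]  # (load, id, blocks)
    heapq.heapify(loads)
    for size in sorted(block_sizes, reverse=True):
        cost = size * (size - 1) // 2  # pairwise comparisons in the block
        load, w, blocks = heapq.heappop(loads)
        heapq.heappush(loads, (load + cost, w, blocks + [size]))
    return sorted(loads, key=lambda t: t[1])

for load, w, blocks in assign([1000, 400, 400, 300, 50, 50], n_workers=3):
    print(f"worker {w}: load {load:>7}, block sizes {blocks}")
```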

I'm not lacking for fascinating things to read!
