Thursday, July 19, 2012

Postcards from the high end...

These two items are not really related, but consider them two postcards from the high end of database-y things:

  • Firstly, Amazon rolled out this week what they call their "High I/O" instance configuration: New High I/O EC2 Instance Type - hi1.4xlarge - 2 TB of SSD-Backed Storage:
    we are introducing a new family of EC2 instances that are designed to run low-latency, I/O-intensive applications, and are an exceptionally good host for NoSQL databases such as Cassandra and MongoDB.
    The instance is certainly impressive, as described by Werner Vogels: Expanding The Cloud – High Performance I/O Instances for Amazon EC2
    The hi1.4xlarge has 8 cores and 60.5GB of memory. Most importantly it has 2 SSDs of 1 TB each and a 10 Gb/s Ethernet NIC that using placement groups can be directly connected to other High I/O instances.

    The SSDs will give you very high IO performance: for 4k random reads you can get 120,000 IOPS when using PV and 90,000 when using HVM or Windows. Write performance on SSDs is more variable depending on, among other things, the free space on the disk, the fragmentation and the type of filesystem. With PV virtualization we are seeing between 10,000 and 85,000 IOPS for 4k random writes and with HVM between 9,000 and 75,000.

    Distributed filesystem guru Jeff Darcy gives the new instance a spin, concluding that it does indeed deliver tremendous performance, but noting that it is also quite costly to provision compared to lower-end instances.

  • Meanwhile, I happened to see some fascinating information about a company I've been intermittently following, Clustrix, who are rolling out their latest high-end system, Clustrix 4.0. Curt Monash covers the announcement: Clustrix 4.0 and other Clustrix stuff
    Clustrix technical notes include:
    • Clustrix is MVCC (Multi-Version Concurrency Control).
    • Clustrix exploits MVCC to allow online, lockless schema changes. Clustrix says these changes are typically single-column, for example an add or a widening/datatype change.
    • Clustrix indexes are a mix of b-trees and log-structured merge files.
    • Clustrix sounds like it’s paid attention to being multi-core. For example, DR replication is via parallel, multi-core log streaming, going single-core only when transactions have the potential to influence each other.
    • MySQL features Clustrix lacks include triggers and XML support.
    • Clustrix uses MLC flash.
    Note that Clustrix is way not cheap: their pricing sheet starts at $150,000, although I believe that price includes both the software and the hardware.
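For a sense of scale, here is a rough back-of-the-envelope sketch that turns the 4k random IOPS figures quoted above for the hi1.4xlarge into approximate throughput. It assumes "4k" means 4096-byte I/Os, which the quoted text does not spell out, and the helper name `iops_to_mib_per_s` is just something I made up for illustration.

```python
# Assumption: "4k" means 4096-byte I/Os (the quoted text does not say 4000 vs 4096).
BLOCK_BYTES = 4096


def iops_to_mib_per_s(iops, block_bytes=BLOCK_BYTES):
    """Convert an IOPS figure to approximate throughput in MiB/s."""
    return iops * block_bytes / (1024 * 1024)


# IOPS figures as quoted in the post for 4k random I/O on the hi1.4xlarge.
quoted_iops = {
    "random read, PV": 120_000,
    "random read, HVM/Windows": 90_000,
    "random write, PV (low)": 10_000,
    "random write, PV (high)": 85_000,
    "random write, HVM (low)": 9_000,
    "random write, HVM (high)": 75_000,
}

for label, iops in quoted_iops.items():
    print(f"{label:>26}: {iops:>7,} IOPS ~ {iops_to_mib_per_s(iops):6.1f} MiB/s")
```

By this arithmetic, the headline 120,000 IOPS read figure works out to a bit under 470 MiB/s of 4 KiB reads.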
Anyway, enjoy some o' that Big Data info!

1 comment:

  1. Clustrix pricing starts at $4,400/mo for a 3 node cluster (3yr lease) or $5,495/mo for the high performance Database-as-a-Service with 1TB SSD storage (via Rackspace or GoGrid - both featuring a high speed connect to Amazon EC2). See more details on the Clustrix Pricing web page.