Friday, June 7, 2013

What I'm reading

Well, Dell Service has visited and the new computer appears to be booting successfully. Time to get back to surfing the web!

  • The Cray Files
    These pages contain my project notes on how to build a Cray supercomputer at home. It was – and is – a fascinating journey to me into the history of computers. I’m writing my story up in hopes that others (well, you, dear reader) will find it interesting. It is somewhat of a living document, a journal of sorts. Though the story starts several years ago – and that part I recount from memory – I intend to add new pages to the files as I progress. When I tell this story I leave out some of the dead-ends and my wanderings in the digital desert, so the story-line feels more linear than it actually was. I don’t think you want to hear about all the dozens of failed attempts at solving something. I’ll only leave some of the most spectacular failures in for entertainment value.
  • Practical HTTP Host header attacks
    There are two main ways to exploit this trust in regular web applications. The first approach is web-cache poisoning; manipulating caching systems into storing a page generated with a malicious Host and serving it to others. The second technique abuses alternative channels like password reset emails where the poisoned content is delivered directly to the target. In this post I'll look at how to exploit each of these in the presence of 'secured' server configurations, and how to successfully secure applications and servers.
  • Bobby Tables: A guide to preventing SQL injection
    There is only one way to avoid Bobby Tables attacks
    • Do not create SQL statements that include outside data.
    • Use parameterized SQL calls.
    That's it. Don't try to escape invalid characters. Don't try to do it yourself. Learn how to use parameterized statements. Always, every single time.
  • How tcmalloc Works
    tcmalloc is a memory allocator that's optimized for high concurrency situations. The tc in tcmalloc stands for thread cache — the mechanism through which this particular allocator is able to satisfy certain (often most) allocations locklessly. It's probably the most well-conceived piece of software I've ever had the pleasure of reading, and although I can't realistically cover every detail, I'll do my best to go over the important points.
  • How to Build a Highly Available System Using Consensus
    This paper explains how to build efficient highly available systems out of replicas, and it gives a careful specification and an informal correctness proof for the key algorithm. Nearly all of the ideas are due to Leslie Lamport: replicated state machines [5], the Paxos consensus algorithm [7], and the methods of specifying and analyzing concurrent systems [6]. I wrote the paper because after I had read Lamport’s papers, it still took me a long time to understand these methods and how to use them effectively. Surprisingly few people seem to know about them in spite of their elegance and power.
  • Distributed Locking: When to use it? How?
    An obvious question to ask yourself before venturing into any of the approaches above is: Do you need a lock service after all? Sometimes the answer is you don’t need to as executing the same workflow twice may not be an issue. Whereas there are many other cases where you need such an abstraction. So, please make this decision carefully.
  • Building a Distributed Messaging System
    MessageBus has become an important component in Groupon’s infrastructure, with over 100 topics, 200 subscribers deployed in production. 5 million messages are published to two clusters with 9 nodes daily, and spiky traffic up to 2 million per hour.
  • Network Congestion and Web Browsing
    Despite the rejection of this patch, Google’s servers have been leveraging this kernel modification to experiment with increasing the initial congestion window for SPDY connections. Beyond just experimenting with statically setting the initial congestion window, Google has also experimented with using SPDY level cookies (see SETTINGS_CURRENT_CWND) to cache the server’s congestion window at the browser for reuse later on when the browser re-establishes a SPDY connection to the server.
  • Scaling Storage Is Hard To Do
    But clustering only goes as far as the interconnect will allow. Systems that relied on Fibre Channel, IP/Ethernet, and iSCSI for inter-node communication could only scale to a handful of nodes before node coordination latency got in the way.
  • Atomic I/O operations
    Quite a few workloads — including a lot of database workloads — are especially sensitive to the latency imposed by waits in the filesystem. If the number of waits could be somehow reduced, latency would improve. Fewer waits would also make it possible to send larger I/O operations to the device, with a couple of significant benefits: performance would improve, and, since large chunks are friendlier to a flash-based device's garbage-collection subsystem, the lifetime of the device would also improve. So reducing the number of wait operations executed in a filesystem transaction commit is an important prerequisite for getting the best performance out of contemporary drives.
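
As a quick illustration of the Bobby Tables advice above, here is a minimal sketch of a parameterized query using Python's standard sqlite3 module. The students table, the find_student helper, and the sample names are assumptions made up for this example; none of them come from the linked guide.

```python
import sqlite3

def find_student(conn, name):
    # The ? placeholder passes `name` to the database as data;
    # it is never spliced into the SQL text itself.
    cur = conn.execute("SELECT id, name FROM students WHERE name = ?", (name,))
    return cur.fetchall()

# Hypothetical in-memory table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")

hostile = "Robert'); DROP TABLE students;--"
conn.execute("INSERT INTO students (name) VALUES (?)", (hostile,))
conn.execute("INSERT INTO students (name) VALUES (?)", ("Alice",))

rows = find_student(conn, hostile)
remaining = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
```

The hostile string is stored and matched as an ordinary name, and the table is still there afterwards. No escaping code was written by hand at any point, which is the guide's whole argument.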

That's all for now. Have fun!
