Tuesday, October 5, 2010

Urban Airship and the C500K problem

If you've been around web server implementations for 15 years or so (as I have -- wow I'm getting old!), then you're undoubtedly familiar with the C10K problem. The C10K problem was an attempt, about 7-8 years ago, to structure the discussion of server implementation strategies, and, in particular, to expose some of the basic choices that could be made, and what their impacts were. The C10K paper describes this quite clearly:

Designers of networking software have many options. Here are a few:

  • Whether and how to issue multiple I/O calls from a single thread

    • Don't; use blocking/synchronous calls throughout, and possibly use multiple threads or processes to achieve concurrency

    • Use nonblocking calls (e.g. write() on a socket set to O_NONBLOCK) to start I/O, and readiness notification (e.g. poll() or /dev/poll) to know when it's OK to start the next I/O on that channel. Generally only usable with network I/O, not disk I/O.

    • Use asynchronous calls (e.g. aio_write()) to start I/O, and completion notification (e.g. signals or completion ports) to know when the I/O finishes. Good for both network and disk I/O.

  • How to control the code servicing each client

    • one process for each client (classic Unix approach, used since 1980 or so)

    • one OS-level thread handles many clients; each client is controlled by:

      • a user-level thread (e.g. GNU state threads, classic Java with green threads)

      • a state machine (a bit esoteric, but popular in some circles; my favorite)

      • a continuation (a bit esoteric, but popular in some circles)

    • one OS-level thread for each client (e.g. classic Java with native threads)

    • one OS-level thread for each active client (e.g. Tomcat with apache front end; NT completion ports; thread pools)

  • Whether to use standard O/S services, or put some code into the kernel (e.g. in a custom driver, kernel module, or VxD)

The following five combinations seem to be popular:

  1. Serve many clients with each thread, and use nonblocking I/O and level-triggered readiness notification

  2. Serve many clients with each thread, and use nonblocking I/O and readiness change notification

  3. Serve many clients with each server thread, and use asynchronous I/O

  4. serve one client with each server thread, and use blocking I/O

  5. Build the server code into the kernel
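To make the first of those combinations concrete, here is a minimal sketch of a single thread serving many clients with nonblocking sockets and level-triggered readiness notification. It is written in Python purely for brevity and is not taken from the C10K paper or from Urban Airship's code; the names `EchoServer` and `demo` are hypothetical, and the echo protocol is just a stand-in for real work.

```python
import selectors
import socket
import time


class EchoServer:
    """One thread, many clients: nonblocking sockets plus readiness notification."""

    def __init__(self, host="127.0.0.1", port=0):
        # DefaultSelector picks epoll/kqueue/poll/select; all are level-triggered here.
        self.sel = selectors.DefaultSelector()
        self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.listener.bind((host, port))
        self.listener.listen(128)
        self.listener.setblocking(False)
        self.sel.register(self.listener, selectors.EVENT_READ, data=None)
        self.address = self.listener.getsockname()
        self.pending = {}  # client socket -> bytes still waiting to be echoed

    def _accept(self):
        try:
            conn, _ = self.listener.accept()
        except BlockingIOError:
            return  # spurious wakeup; nothing to accept after all
        conn.setblocking(False)
        self.pending[conn] = bytearray()
        self.sel.register(conn, selectors.EVENT_READ, data="client")

    def _close(self, conn):
        self.sel.unregister(conn)
        self.pending.pop(conn, None)
        conn.close()

    def _service(self, conn, events):
        buf = self.pending[conn]
        try:
            if events & selectors.EVENT_READ:
                chunk = conn.recv(4096)
                if chunk == b"":  # peer closed the connection
                    self._close(conn)
                    return
                buf += chunk
            if events & selectors.EVENT_WRITE and buf:
                sent = conn.send(buf)
                del buf[:sent]
        except BlockingIOError:
            pass  # not actually ready; level-triggered select() will report it again
        except ConnectionError:
            self._close(conn)
            return
        # Ask for write readiness only while there is something left to write.
        wanted = selectors.EVENT_READ | (selectors.EVENT_WRITE if buf else 0)
        self.sel.modify(conn, wanted, data="client")

    def poll(self, timeout=0.1):
        """Run one pass of the event loop; return how many sockets were ready."""
        ready = self.sel.select(timeout)
        for key, events in ready:
            if key.data is None:
                self._accept()
            else:
                self._service(key.fileobj, events)
        return len(ready)

    def close(self):
        for conn in list(self.pending):
            self._close(conn)
        self.sel.unregister(self.listener)
        self.listener.close()
        self.sel.close()


def demo(num_clients=20):
    """Connect several clients from the same thread and collect their echoes."""
    server = EchoServer()
    clients = [socket.create_connection(server.address) for _ in range(num_clients)]
    expected = [f"hello {i}".encode() for i in range(num_clients)]
    for client, message in zip(clients, expected):
        client.sendall(message)
        client.setblocking(False)
    replies = [b""] * num_clients
    deadline = time.monotonic() + 5.0
    while replies != expected and time.monotonic() < deadline:
        server.poll(0.05)
        for i, client in enumerate(clients):
            try:
                replies[i] += client.recv(4096)
            except BlockingIOError:
                pass  # this client's echo has not arrived yet
    for client in clients:
        client.close()
    server.close()
    return replies
```

Calling `demo(20)` drives twenty connections through one event loop with no extra threads. A real server would of course sit in `poll()` forever rather than interleave client reads in the same loop; that part exists only so the sketch can exercise itself.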

Well, time has passed, and, frankly, 10,000 simultaneous connections just doesn't seem all that scary any more. At my day job, we have a number of customers who approach these levels routinely, and a few who are solidly pushing beyond them.

So, what's the next step? The folks at Urban Airship have recently published a fascinating pair of blog posts talking about their own internal efforts to prototype, benchmark, and study a C500K server.

Yes, that's right: they are trying to support 500,000 simultaneous TCP/IP connections to a single server!

Moreover, they're trying to do this in Java (actually, I suspect, in Scala)!

Still moreover, they're trying to do this in the Amazon EC2 cloud!

As should probably not be surprising, the biggest issue is memory.
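A quick back-of-the-envelope calculation shows why. The sketch below multiplies the connection count by a per-connection footprint; the footprints used (4, 16, and 64 KiB) are purely illustrative assumptions of mine, not figures from Urban Airship's posts, and the function name `connection_memory_gib` is hypothetical.

```python
def connection_memory_gib(connections: int, bytes_per_connection: int) -> float:
    """Total memory, in GiB, if every connection costs the same number of bytes."""
    return connections * bytes_per_connection / 2**30


if __name__ == "__main__":
    connections = 500_000
    # Hypothetical per-connection costs (socket buffers, heap objects, and so on).
    for kib in (4, 16, 64):
        total = connection_memory_gib(connections, kib * 1024)
        print(f"{kib:>3} KiB per connection -> {total:.2f} GiB")
```

Even at a few kilobytes per connection, half a million connections adds up to gigabytes, so every byte of per-connection state matters.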

At any rate, if you're still reading at this point, you'll definitely want to head over to Urban Airship's site and read through their report.

It's quite interesting, and much thanks to Urban Airship for sharing their findings.

1 comment:

  1. Hi Bryan,

    Thanks for the write-up! The service that is designed to handle 500k connections is straight Java+NIO. In our original blog post we mention testing Scala+Netty, but we were only able to get a fraction of the connections before we ran into problems.

    Scala+NIO (or even Netty) may have been adequate with a bit more work, but for a relatively simple edge service Java sufficed quite well.

    Michael Schurter
    Urban Airship