Tuesday, March 16, 2010

The Data Deluge

The February 27th, 2010 issue of The Economist has an interesting special section:

Data, data everywhere: A special report on managing information


The special section contains about ten articles, looking at the subject from various angles. From the first article:

The world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account.

But they are also creating a host of new problems.


The companies and organizations working with this "big data" include the obvious ones: Google, IBM, Microsoft, Oracle, etc. But they aren't the only ones. Here's an example:

Wal-Mart's inventory-management system, called RetailLink, enables suppliers to see the exact number of their products on every shelf of every store at that precise moment. The system shows the rate of sales by the hour, by the day, over the past year, and more. Begun in the 1990s, RetailLink gives suppliers a complete overview of when and how their products are selling, and with what other products in the shopping cart. This lets suppliers manage their stocks better.

The article also describes a clever use of such data mining:

In 2004 Wal-Mart peered into its mammoth databases and noticed that before a hurricane struck, there was a run on flashlights and batteries, as might be expected; but also on Pop-Tarts, a sugary American breakfast snack. On reflection it is clear that the snack would be a handy thing to eat in a black-out, but the retailer would not have thought to stock up on it before a storm.
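The article doesn't say how Wal-Mart actually spotted the pattern. One simple way to surface this kind of "run" on a product is to compare each product's share of sales in the days before a storm with its share in ordinary weeks (its "lift"). Here's a minimal Python sketch of that idea; the function names, the 1.5 threshold, and the sales figures are all hypothetical, not Wal-Mart's data or method.

```python
def sales_shares(units_sold):
    """Convert raw unit counts into each product's fraction of total units."""
    total = sum(units_sold.values())
    return {product: units / total for product, units in units_sold.items()}


def sales_lift(pre_storm, baseline):
    """Lift = (share of sales before a storm) / (share of sales in normal weeks).

    Shares rather than raw counts are compared, which factors out the general
    rise in store traffic before a storm. Products with no baseline sales are
    skipped, since there is nothing to compare them against.
    """
    pre = sales_shares(pre_storm)
    base = sales_shares(baseline)
    return {p: pre[p] / base[p] for p in pre if base.get(p, 0) > 0}


def flag_runs(lift, threshold=1.5):
    """Return products whose share of sales rose by at least `threshold` times,
    highest lift first. The threshold is an arbitrary illustrative choice."""
    ranked = sorted(lift.items(), key=lambda kv: kv[1], reverse=True)
    return [product for product, value in ranked if value >= threshold]


# Hypothetical unit sales for one store (illustrative numbers only).
baseline = {"milk": 400, "bread": 300, "flashlights": 50, "batteries": 150, "pop-tarts": 100}
pre_storm = {"milk": 500, "bread": 300, "flashlights": 300, "batteries": 500, "pop-tarts": 400}

lift = sales_lift(pre_storm, baseline)
print(flag_runs(lift))  # → ['flashlights', 'pop-tarts', 'batteries']
```

Comparing shares matters here: milk sells more units before the storm too, but its share of the basket actually drops, so only the products whose share jumps get flagged.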


The articles touch on many of the problems in dealing with big data:

  • Size/cost/time to process such enormous amounts of data

  • False conclusions and confusing issues of interpretation

  • Security and privacy concerns

  • Ownership issues

  • Fair access to data

  • Environmental issues


You might be surprised to see "Environmental issues" in the list.

Another concern is energy consumption. Processing huge amounts of data takes a lot of power. "In two to three years we will saturate the electric cables running into the building," says Alex Szalay at Johns Hopkins University. "The next challenge is how to do the same things as today, but with ten to 100 times less power."

Both Google and Microsoft have had to put some of their huge data centers next to hydroelectric plants to ensure access to enough energy at a reasonable price.


The articles also discuss many of the fascinating areas of technology being driven or created by these new big-data efforts:

  • Cloud computing

  • Query processing

  • Statistical algorithms, such as

    • collaborative filtering

    • statistical spelling correction

    • statistical translation

    • Bayesian spam filtering

    • predictive analytics


  • Storage and networking advancements

  • Data visualization

  • Flash trading
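Of the statistical techniques listed, Bayesian spam filtering is probably the easiest to see in miniature. Here's a minimal Python sketch of a naive Bayes classifier with add-one (Laplace) smoothing. The class and method names and the six-message training set are made up for illustration; a real filter would need far more training data and a better tokenizer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Record one labeled message ("spam" or "ham")."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, words, label, vocab):
        # Log prior: fraction of training messages with this label.
        # Assumes both labels have at least one training message.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        counts = self.word_counts[label]
        denominator = sum(counts.values()) + len(vocab)
        for word in words:
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denominator)
        return score

    def classify(self, text):
        """Return the label with the higher log-probability score."""
        words = tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        scores = {label: self._log_score(words, label, vocab) for label in ("spam", "ham")}
        return max(scores, key=scores.get)


# Hypothetical training messages.
spam_filter = NaiveBayesSpamFilter()
for msg in ["win cash now", "cheap pills win big", "claim your free prize now"]:
    spam_filter.train(msg, "spam")
for msg in ["meeting notes for tuesday", "can we reschedule lunch", "draft of the report attached"]:
    spam_filter.train(msg, "ham")

print(spam_filter.classify("free cash prize"))        # → spam
print(spam_filter.classify("can we meet for lunch"))  # → ham
```

Scores are summed in log space because multiplying many small word probabilities together would quickly underflow to zero.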



It's a long list, and a lot of exciting areas.

As is often true of Economist special reports, the writing is fairly dry, and the presentation offers a high-level overview of many related areas without providing much in the way of resources for digging into any of them more deeply.

But overall it was intriguing and well worth reading.
