Monday, February 20, 2012

Predictive analytics

Like many others, I was fascinated by the article in this Sunday's New York Times Magazine: Psst, you in aisle 5. (Curiously, in the online version of the article, the title is: How Companies Learn Your Secrets, which is more accurate if not as colorful.)

The juicy tidbits about the effectiveness of modern advertising and marketing campaigns were intriguing, but I was mostly interested in the underlying computing aspects, and the article says rather little about those. Perhaps I was attracted by the sweet words:

“It’s like an arms race to hire statisticians nowadays,” said Andreas Weigend, the former chief scientist at Amazon.com. “Mathematicians are suddenly sexy.”

The basic idea, of course, is to start with simply massive amounts of data:

Whenever possible, Target assigns each shopper a unique code — known internally as the Guest ID number — that keeps tabs on everything they buy. “If you use a credit card or a coupon, or fill out a survey, or mail in a refund, or call the customer help line, or open an e-mail we’ve sent you or visit our Web site, we’ll record it and link it to your Guest ID,” Pole said. “We want to know everything we can.”

The retailer collects this sort of data itself, then extends it with related data that it acquires externally:

Target can buy data about your ethnicity, job history, the magazines you read, if you’ve ever declared bankruptcy or got divorced, the year you bought (or lost) your house, where you went to college, what kinds of topics you talk about online, whether you prefer certain brands of coffee, paper towels, cereal or applesauce, your political leanings, reading habits, charitable giving and the number of cars you own.
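To make the "link everything to one ID" idea concrete, here is a minimal sketch of what such a merged profile might look like. The field names, IDs, and records are all invented for illustration; the article doesn't describe Target's actual schema or systems.

```python
from collections import defaultdict

# Hypothetical in-house events, each tagged with a guest ID (invented data).
events = [
    {"guest_id": 1001, "type": "purchase", "item": "unscented lotion"},
    {"guest_id": 1002, "type": "email_open", "item": None},
    {"guest_id": 1001, "type": "coupon", "item": "cotton balls"},
]

# Hypothetical externally purchased attributes, keyed by the same ID.
external = {1001: {"cars_owned": 2}, 1002: {"cars_owned": 1}}


def build_profiles(events, external):
    """Group events per guest and attach any external attributes."""
    profiles = defaultdict(lambda: {"events": [], "attributes": {}})
    for event in events:
        profiles[event["guest_id"]]["events"].append(event)
    for guest_id, attrs in external.items():
        profiles[guest_id]["attributes"].update(attrs)
    return dict(profiles)


profiles = build_profiles(events, external)
```

The point is only that once every record carries the same key, combining in-house and purchased data is a simple grouping step; the hard part comes afterward.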

Once you have all that data, however, the trick is to use it successfully. Since essentially all retailers have access to similar data, the primary differentiator is how effectively they can locate patterns in the data and deduce what those patterns mean.

In Target's case, one technique is to identify correlations, which they can do in a broadly statistical manner, similar to the "signals" that search engines use to deduce your intent from your actions:

Target has a baby-shower registry, and Pole started there, observing how shopping habits changed as a woman approached her due date, which women on the registry had willingly disclosed. He ran test after test, analyzing the data, and before long some useful patterns emerged. Lotions, for example. Lots of people buy lotion, but one of Pole’s colleagues noticed that women on the baby registry were buying larger quantities of unscented lotion around the beginning of their second trimester. Another analyst noted that sometime in the first 20 weeks, pregnant women loaded up on supplements like calcium, magnesium and zinc. Many shoppers purchase soap and cotton balls, but when someone suddenly starts buying lots of scent-free soap and extra-big bags of cotton balls, in addition to hand sanitizers and washcloths, it signals they could be getting close to their delivery date.

As Pole’s computers crawled through the data, he was able to identify about 25 products that, when analyzed together, allowed him to assign each shopper a “pregnancy prediction” score. More important, he could also estimate her due date to within a small window, so Target could send coupons timed to very specific stages of her pregnancy.
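The article doesn't say how the score is actually computed. As a rough sketch only, here is one simple way a handful of product signals could be combined into a single number: a weighted sum squashed through a logistic function. The model choice, the product names, and every weight below are my own assumptions, not Target's method.

```python
import math

# Hypothetical weights: how strongly each product category hints at pregnancy.
# Names and numbers are invented for illustration only.
WEIGHTS = {
    "unscented_lotion": 1.2,
    "calcium_supplement": 0.9,
    "magnesium_supplement": 0.8,
    "zinc_supplement": 0.8,
    "cotton_balls_large": 0.6,
    "scent_free_soap": 0.7,
}
BIAS = -3.0  # assumed low baseline: most shoppers are not pregnant


def pregnancy_score(basket):
    """Return a score in (0, 1) from the product categories in a basket."""
    # Each distinct category counts once; unknown categories contribute nothing.
    z = BIAS + sum(WEIGHTS.get(item, 0.0) for item in set(basket))
    return 1.0 / (1.0 + math.exp(-z))


low = pregnancy_score(["deodorant", "t_shirt"])
high = pregnancy_score(list(WEIGHTS))
```

No single item moves the score much on its own; it only climbs when several signals show up together, which matches the article's description of products being "analyzed together".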

These analyses are evidently sophisticated, and appear to do a good job of separating false patterns from true ones, as the article's author found when he tried to bait the system:

I stopped at a Target to pick up some deodorant, then also bought some T-shirts and a fancy hair gel. On a whim, I threw in some pacifiers, to see how the computers would react. Besides, our baby is now 9 months old. You can’t have too many pacifiers.

When I paid, I didn’t receive any sudden deals on diapers or formula, to my slight disappointment. It made sense, though: I was shopping in a city I never previously visited, at 9:45 p.m. on a weeknight, buying a random assortment of items. I was using a corporate credit card, and besides the pacifiers, hadn’t purchased any of the things that a parent needs. It was clear to Target’s computers that I was on a business trip. Pole’s prediction calculator took one look at me, ran the numbers and decided to bide its time.

But as many people, including the marketing departments at Target, realized, this sort of analysis can get rather disturbing:

Pole applied his program to every regular female shopper in Target’s national database and soon had a list of tens of thousands of women who were most likely pregnant.

...

At which point someone asked an important question: How are women going to react when they figure out how much Target knows?

Indeed, that is a very important question, and it's not clear how well Target is dealing with it. The article describes one attempt, which involved camouflaging their results to try to fool their customers:

“With the pregnancy products, though, we learned that some women react badly,” the executive said. “Then we started mixing in all these ads for things we knew pregnant women would never buy, so the baby ads looked random. We’d put an ad for a lawn mower next to diapers. We’d put a coupon for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance.

“And we found out that as long as a pregnant woman thinks she hasn’t been spied on, she’ll use the coupons. She just assumes that everyone else on her block got the same mailer for diapers and cribs. As long as we don’t spook her, it works.”

This "fool your own customers" technique is a poor one. Hopefully the company quickly learns that this is the wrong direction to be heading, and instead moves toward greater transparency and honesty, learning to trust its customers and work with them rather than manipulating them.

Unlikely, I suppose, but we can hope!
