Dustin Hoffman in The Graduate.
If you're of a certain age, you will remember this scene from The Graduate.
Dustin Hoffman plays the character of a young man graduating from college. At his graduation party, a neighbour takes him aside confidentially and says, "I have just one word for you -- one word -- plastics".
What he means is that he is "confidentially" telling him what the "next big thing" is, which he should choose as his field of study and employment. It's hard to remember why plastics would be a big thing, but in 1967, when this movie came out, plastics were not ubiquitous. There were paper bags, not plastic bags, at the supermarket. Fewer things were shrink-wrapped in plastic; they just came as they were. Toothpaste tubes were made of aluminum, not plastic. Children's toys were made of rubber, an older product, not plastic. There were bed springs and cotton stuffing, not memory foam. Cottage cheese came in paper tubs, not plastic (yoghurt wasn't as ubiquitous). There simply weren't designer water bottles at all. And so on. Plastics, with their "More Science High" and chemicals-for-better-living, took over the scene and nobody thought about how they had once hardly existed. Indeed, it was a booming field for a young person looking to start a career in a big company.
So today, if you're at that graduation party your parents have thrown for you and you're feeling alienated and at loose ends, surely someone will come up and take you aside and say, "I have just one word for you -- actually two -- Big Data!"
Big Data is now the latest or the Next Big Thing and it's just everywhere. Colleges are scrambling to make majors around data mining; businesses, of course, have been touting it for years already; and it's now going to rule our lives. Big Data will show us why we're wrong if we object. Nate Silver is all about Big Data.
I thought this paragraph was really telling as to what Big Data was all about -- finding, like within a Rorschach blot, what you'd like to see and making the numbers fit your story:
Rachel Schutt, a senior research scientist at Johnson Research Labs, taught “Introduction to Data Science” last semester at Columbia (its first course with “data science” in the title). She described the data scientist this way: “a hybrid computer scientist software engineer statistician.” And added: “The best tend to be really curious people, thinkers who ask good questions and are O.K. dealing with unstructured situations and trying to find structure in them.”
Somehow, a separate science or critical industry has to spring up alongside these "computer scientists" that will question how they form their narratives to see the structure in those unstructured situations. Who will provide this necessary service?
Of course, other things go along with Big Data -- the Cloud, where it all has to be housed; neuroscience, which is going to purport to duplicate the brain's functions, only make them better, despite what critics say; and then online governance (the wired state) in which Big Data will increasingly be used -- and maybe deployed so as even to obviate the need for voting at all. The geeks who want to eliminate political parties and voting would be happy to see data-scraping and strategic deployment of its results replace organic -- and messy -- democracy so that they can control it as coders.
Most of all, the forced majoritarian democracy of Big Data results will be deployed as "science" to prevent deliberation and voting at all -- or even as a way to selectively choose votes (likes) to make a point. I would see this sort of thing in Second Life all the time on the JIRA, the public board where you could vote for a bug to get attention from the developers, or suggest a feature. So, only two people would vote for some obvious bug that affected businesses, and the open source cultists who hated business would flash mob 100 people to comment against it (or start another bug entry and vote for that), and the devs would say, "oh, you're in the minority, sorry". These people, who in fact were never ruled by democracy and never really even consulted democracy, would suddenly invoke "democracy" in a narrow, manipulated setting to make their point. They avoided a real show-down, which would have meant a properly-notified vote with the issues framed properly and a one-person, one-vote system to avoid alts or sock-puppets -- and of course the board didn't even have the ability to vote "no," which automatically skewed the results.
So Big Data is above all about who frames the data and who is telling the story, and how they tell it. It's also about the "garbage-in" problem -- if Wikipedia is forming the basis for new data-gathering and Big Data drilling projects, small wonder that we will have political skewing.
It's not really about the data. It's about the story. I would see this with Nate Silver time and again. There was never anything wrong with "the math" -- how could there be, it was math! But it was about what was selected, which polls, which story to go with them, where the cuts were made. This goes in, that doesn't go in.
I remember one of the big discoveries I made in Second Life was that there was a very different real popularity list than the "top 20 popular sites" list that the company kept featuring. I had this instinctual sense that where I saw my customers going and where the company said they were going were different.
The "top 20" was manipulated, of course, by several factors. First, the company itself would pick out sites it liked for ideological or business reasons, and feature them in various ways -- sometimes with a blog post, or even something as seemingly innocuous as the CEO putting it in his own personal "picks" on his avatar profile. There could even be guided paths to those sites -- like Cubey Terra's Aerodrome -- for the newbie when he landed in the world and was looking for things to do. So those "top 20" sites had some steerage -- and then it was a self-fulfilling prophecy -- once boosted, the site would stay in the high ranks as people came to it because people came to it... Having had such a windfall a few times for my own sites -- when, unbeknownst to me, my sites were picked when I hadn't even applied -- I could see the deluge of traffic that would come to them, no doubt the way traffic comes to the sites featured right now on the Linden blog, like Eshi Otatawa's dress store.
The other factor would be the manipulation of traffic statistics in Second Life through "camping" as it was called -- paying people to physically remain at your store or hangout or dance club to artificially drive up the numbers. Soon people deployed bots, hiding them in boxes up in the sky so their artificial nature wouldn't turn people off, and that way they could make it appear as if hundreds of people were coming and staying at the site.
So I suggested that instead of looking at those fake and manipulated numbers, the company should tabulate another source, which amounted to the "likes" on Facebook -- before that system even existed, and before Facebook was widely used. These were the "picks" that you put on your profile. When you visited a store, you could "like" it by clicking on the button to have it show up in your picks, and then the picture and name and URL would automatically render on your profile. To be sure, there were companies that were willing to buy people's picks, but really not that many of them, because most people found the few spots on their profile to show their favourite places to be so precious they didn't want to sell them to some store. In fact, they used them as a story board, and put up moments of their Second Life, like their first home or their virtual weddings or a party with friends, and then in a sense the place where their story had occurred became a "favourite".
So I challenged the company to track those "picks" and put up the results, and Philip Rosedale was finally persuaded to do this once, and even Hamlet Au, who loathed me because I challenged his house-organ style paid writing for the company at the time, was forced to write about this great idea of mine, which was called a "folksonomy". Naturally, the pristine state of the "folksonomy" couldn't last long, as companies might then game it. But it was less gamed than the other means which was determined by traffic -- itself a category that then got discarded and manipulated with other secret algorithms a la Google (and even the Google Search Appliance was used).
I thought of all this when I read Socialbakers' report on the top brands "liked" in America based on what people clicked on Facebook. Sure, those companies may be hiring bands of "likers". But by and large, I think they have pretty much genuine material. I see hundreds of my friends "liking" those brands. Once, I heard someone complain that a brand he didn't think he "liked," really, was showing up in his feed. Did he accidentally click when he was hovering? Is it sometimes put in without your liking? (Right now, I see things in my Facebook feed I can't believe I could have ever accidentally clicked, even, like "Lower My Bills," so it's right to ask the question).
Even so, if pressed, I think these analytical firms could show through representative samples that most of the "likes" are organic.
Wal-mart, the store that the left loves to hate, is America's favourite store. They love it. They "like" it on Facebook. For real. Because they are not like you. We are not like you. I shop at Wal-mart when I can, which isn't so often as we don't have one in New York City, which is too urbane and hipster and lefty for Wal-mart -- it would never get approved. We don't have very many big stores beyond Macy's and a few like Bed Bath & Beyond anyway -- be thankful we have at least a K-mart.
Note that Apple doesn't even appear on this list; Samsung is the favourite brand for gadgets.
Despite the left's best efforts, Chick-fil-A's picture of their founder enjoying a birthday was the most interacted-with picture. The second-most interacted-with, i.e. reposted, was a critique of Michelle Obama's birthday party at a time when White House tours for kids were cancelled. The third is a demand that McCain apologize to Rand Paul.
Now, you might conclude that these are just organized conservative or libertarian flashmobs, and maybe they are, but they are interesting, because they are so different from the flashmobs we seem to see on Twitter.
Here's another piece of "crowd-sourced" evidence of the "liking" of Wal-mart, from Dorothy Gambrell. This is the place where people are when they spot someone they'd like to date, but either don't get up enough nerve to meet, or else lose in the crowd before they can say anything. This romantic "missed connection" idea is very popular on Craigslist, and this map of the country pining in their place of "missed connection" is quite telling about culture -- the 24-hour fitness gym in California, the subway in New York City where people have to take long rides underground -- and Wal-mart in so many other places!
The fact is, while the left keeps trying to demonize Wal-mart as part of their assault on capitalism as a system, because they think it exemplifies its evils, people keep shopping there because it's convenient, helpful, and the prices are cheap but the goods not too low quality. The left prefers to bash Wal-mart for buying goods manufactured in China, even as they tweet arrogantly on their iPhones, also made in China by Silicon Valley companies like Apple.
Whenever I see one of these anti-Wal-mart stories, I always have to wonder where they come from. I saw one, two, three of them the other day, and they all seemed to come from the same source at the same time, or are recycled. It seems to me Business Insider, despite its name, consistently runs a leftist or "progressive" line that is anti-business of the sort that Silicon Valley hates culturally or finds competitive.
So I asked, using that self-same social media, whether other people were finding that the Wal-mart shelves were empty, the sales people were depleted, and the lines were long. Yes, there was a picture of an empty shelf at Wal-mart in these news articles, but was that what people were randomly discovering in their particular Wal-mart?
Not surprisingly, some people in different locations across the country reported that they weren't seeing any of this.
Some reported that they saw less sales help and longer lines, but not empty shelves. I will have to come back and post my findings when I hear from some more people, but even with just a half dozen answers where no one found any empty shelves, I had to question what this story was about.
Now, a brisk Silicon Valley Big Data manager might tell me that the numbers were showing empty or emptying shelves over the entire system. Or the journalists trying to make a point maybe selected a geographical sample, or an illustrative sample, to make their point. And how will we know, as journalists are increasingly babbling about how they are crowd-sourcing and using Big Data now, and there is even an implication that we need "journalist-coders" for the future?
But if they cannot faithfully render what might only be the minority experience, it's misleading. And as we keep encountering these seeming "Big Data" pronouncements, dripping with mathematical certitude and self-righteousness, how can we dare combat them? The media could be selective or it could really use "Big Data" from across a system, but come to tendentious readings of it. Telling the story their way.
And there will really only be one way of combating this: if strong enough social movements rise up to keep insisting on their own narratives. One side will always find the other's to be "false," but what's important is that there be multiple narratives so that there's a running critique for the ordinary man to try to make up his own mind.
As for that job you're trying to land? I'm not sure there will be that many six-figure data-mining jobs so available just yet. And despite what you read in the papers, Wal-mart is always hiring and is massively popular, so try there.