Date: 11/29/16

Welcome to Season 2 of Raw Data! We’re kicking off our second season by exploring a big question: Just how valuable is our data, really? Is the big data revolution just getting started, or are we in a sort of data bubble that could soon burst? Are we data mining for gold...or pyrite?

Many major tech companies have built their entire business models on the premise that big data is enormously powerful. Google, Facebook, and Amazon are worth hundreds of billions of dollars, in part because investors drool over their most valuable asset – their user data. 

To better understand how newer companies are trying to turn data into dollars, we visited a startup in Menlo Park called Quantifind. Quantifind is an analytics company that looks for hidden signals in data from all over the web, partnering with companies to help them identify opportunities that can shape their strategy. 

Quantifind co-founder Ari Tuchman told Raw Data that when sifting through data to find a useful trend or metric, about 85% of the data is useless noise. If there is gold, it takes a lot of digging – and knowing where to mine.  

Still, Quantifind is a good example of a company that is innovating in the new big data ecosystem. Successful companies in this space must adopt a big data mindset, in which a growing number of decisions, whether that’s investing in new products, tapping into new markets, or recruiting new talent, are driven by data. 

“I see big data as a new perspective on reality, a new way of looking at the world and understanding the world in which we live in,” said Viktor Mayer-Schönberger, an Oxford professor and co-author of the book Big Data: A Revolution that Will Transform How We Live, Work, and Think. “And driven by that better understanding of the world we live in are better decisions, and decisions are key to successes of businesses.”

In other words, a big data mindset isn’t about relying on data to solve some huge unknown problem. Rather, it’s the accumulation of many smaller decisions that ultimately leads to a lucrative strategy. 

Despite the potential payoff, companies that promise to deliver big data insights still require a lot of investment. Among other things, you have to pay an army of data scientists to help find the little gold nuggets. So the question remains: is big data really that valuable?

To help us think through what the future of big data may look like, we turned to an unexpected place. There’s a group at UC Berkeley called the Center for Long-Term Cybersecurity (CLTC), and a few months ago they released a report called “Cyberfutures 2020.” When we spoke with Executive Director Betsy Cooper, she told us that the CLTC came up with five scenarios for the future of the digital world. Basically, each scenario takes current trends in technology and extrapolates out into the future. “The idea is, if this one trend goes to the nth degree, how should we prepare and respond to it,” said Betsy. 

We were particularly struck by two scenarios. 

The “Omega” scenario imagines a world in which predictive analytics have gotten so sophisticated that we’ve figured out the last, or “omega,” algorithm – an algorithm so powerful it can tell what a specific individual will do in a given situation. On the other end of the spectrum, “Bubble 2.0” explores the possibility that huge tech giants land in major financial trouble because data isn’t as powerful as we think it is today. You can learn more about the scenarios by checking out the original report.

In this episode, we’ve been thinking about data’s value as a business asset, but data can also be a liability for a company. Chances are your own data has been compromised in one major data breach or another – Target, Sony, Snapchat, or some other company. And data can be a liability in a totally different way: government agencies are pressuring companies like Apple to give law enforcement access to their users’ data when investigating criminal and terrorist activity. But do these demands pose a risk to our civil liberties?  

We’ll be talking about the dark side of big data on the next episode of the podcast. We’ll dive into our current surveillance state, as well as infringements on constitutional rights like those protected by the First Amendment. We’ll be back in two weeks, so stay tuned!

–Isha Salian

Additional episode credits:

Voice of cyberfutures – Nicholas Weiler
Raw Data theme music – “From the Outset,” Nick Carlozzi
Other music – “Gentle Whispering (radio edit),” Lee Rosevere; “White Lotus,” Kevin MacLeod


MICHAEL C. OSBORNE: From Stanford, this is Raw Data. I’m Mike Osborne.

LESLIE CHANG: And I’m Leslie Chang. This episode is the launch of season TWO of our podcast! We’re so excited to be back. Today’s episode: Gold... or Pyrite?

[theme continues, fades under]

LC: There’s a cultural mindset here in Silicon Valley that all the data tech companies are creating and collecting holds immense value. This data is the raw material being fed into ever more sophisticated algorithms.

MCO: The excitement around big data is fueling the rise of companies both big and small. It’s like the gazillion bits of information filling up our servers are a gold mine, and everyone is looking to hit paydirt.

LC: So for the past few months, Mike and I have been wondering: is that actually true? Are we just at the very beginning of the big data revolution? Will our lives be radically different five or ten years from now because of all the information being captured today?

MCO: Or is big data just a bunch of hype? Are we overestimating its value? And how exactly do you take data and turn it into real dollars? These are all big questions, so it seemed like a good place to start this new season.

LC: There are lots of investors and entrepreneurs betting that big data will bring in huge rewards. A few weeks ago, we heard about a startup called Quantifind. So, like a mash-up of quantify and find. Quantifind. Their offices are located in Menlo Park, and the inside looks like a pretty typical startup. Open floor plan, unfinished ceilings, mid-century furniture, lots of whiteboards. In the lobby, we meet one of the co-founders, Ari Tuchman.


LC: Hey, I’m Leslie.

MCO: Mike Osborne.

MCO: Ari led us to a conference room, and we talked for a few minutes about Quantifind. But I was having a hard time getting clarity on what they actually do.

MCO: I actually do want to back up and ask it that simply. How do you describe Quantifind to somebody who has no idea what big data is? What is your company all about?

AT: The simplest thing, which is a little bit of a catchphrase, is "We help find signal in noise." I think, to be a little more refined on that, we help find signal in noise that actually informs a decision to drive a business metric.

MCO: Okay.

MCO: Basically Quantifind is like a consulting firm. So pretend I’m a company that sells beer, and I wanna know what’s driving sales of my new craft ale. People are tweeting about my new craft ale and posting photos of themselves drinking my new craft ale at bars. But what do they really think about my beer? What makes them buy it or not?

LC: With these questions, Mike could go to Ari, who would help him gather up all the snippets of conversation happening on social media about the new beer. Ari and his team would start by filtering out all the irrelevant stuff. And whatever’s left – all the bits of chatter that might help answer Mike’s questions – that gets funneled into Quantifind’s complicated statistical models. From there, they would help Mike figure out what’s driving sales.

MCO: And if I, as a beer company actually hire Quantifind, then I also might give them access to my internal data.

AT: The most valuable data for a lot of our consumer-facing types of marketing engagements is enterprise data. Think about transcripts from call logs, email correspondence, voice of customer surveys. What's incredible is how much money companies spend to generate and collect that data versus how little effort they spend on actually making sense of it.

LC: To give you a real-world example of how all this data can help inform business decisions, Ari told us about a telecommunications company that Quantifind has worked with in the past. They started with a bunch of conversational data showing that a million customers were really frustrated about dropped calls.

AT: We're able to see that actually those frustrations, those conversations, have no impact on switching behavior. A million people are really cursing out, they go to remote areas, there's no signal, but they stay loyal to the carrier. Whereas a much smaller fraction of that, just by volume, are complaining they got hit with a $5 overage charge in their data usage plan, and within 11 weeks, they switch carriers.

LC: So imagine you’re this telephone company. Which group of customers matters more?

AT: The million people that are complaining about signal in remote areas of the wilderness or dropped calls – they're not leaving anyway. Whereas the other group is telling you what you need to know. Again, you can do something about it: you can change your data plan.

MCO: This is a typical example of how it works. It’s not that big data necessarily leads to HUGE radical decisions – this telecommunications company didn’t completely overhaul their entire business model or anything because of this one insight. Instead, it’s about helping to inform a whole bunch of smaller decisions. These add up to a larger data-driven way for businesses to think about the world. One of the points Ari impressed upon us is that the really clever work comes from posing the right kinds of questions.

AT: Big data can often be a philosophical question, like wow, here's all my data, what's interesting? That's a terrible question. Anytime anyone asks us that, we run. You're never going to succeed, because what's interesting to me is not to you, so that's a very subjective claim. Big data solves problems that are more objectively defined. I don't want to touch that, “what's interesting.” What I can touch is what drives sales? What drives revenue? What's a predictor of viral outbreaks? What's a predictor of fraud events? If you can define the question as, here's something that's been happening, that we can measure, what drives that? Now it's a math problem, it's not a philosophy problem.

LC: And right now, many people are betting that once you turn big data into those kinds of math problems, it can lead to big returns and benefits. We talked to someone who falls firmly into this camp:

VIKTOR MAYER-SCHÖNBERGER: Hello. My name is Viktor Mayer-Schönberger. I am a co-author of the book Big Data: A Revolution that Will Transform How We Live, Work, and Think.

LC: And broadly, how do you describe the ways in which big data is really transforming business and the economy?

VMS: Well a lot of people think that big data has to do with new, digital tools. New technological advancements that enable us to do what we have been doing but just faster or cheaper. I see the thing differently. I see big data as a new perspective on reality, a new way of looking at the world and understanding the world in which we live in. And driven by that better understanding of the world we live in are better decisions, and decisions are key to successes of businesses. What new product to design, what new product line or revenue stream to invest in.

MCO: Of course, a major reason for the current obsession with data is the success of tech giants like Google, Facebook, and Amazon. Today, though, there’s a whole second generation of companies like Quantifind that are built on data science and analytical know-how.

VMS: The Big Data age is an age where entry into the market is incredibly easy and available at low cost. You don't have to build manufacturing plants anymore like you did in the industrial age, and you don't even have to build huge server plants anymore as you had to in the first dot com age. The barrier to entry is very low. All that you need is some way of getting access to data, and of course a really good idea, and that of course means a Big Data mindset.

LC: But here’s the thing: this is a whole industry propped up by something intangible. It’s not exactly clear how to put a dollar value on data. What’s it all worth? Data isn’t like factories and assembly-line machines, the kind of capital whose value can be estimated in real dollars.

VMS: Data is not a physical asset, and because it's a relatively new asset class, it is something that over time the market only needs to understand better and therefore be able to price better. What we see with the valuations of Twitter and Facebook and many other of the internet companies is that the market is actually valuing the data and adding that value and incorporating that into the stock price.

MCO: So... is the market right? Is our data actually as valuable as investors and companies here and around the world believe?

LC: There’s a group over at the University of California, Berkeley that, in a way, has been exploring this exact question. The group is called the Center for Long-Term Cybersecurity, and a few months ago, they released a report called “Cyberfutures 2020.” Basically, they created a set of stories about what the future might look like.

MCO: To develop their stories, the group used a technique called scenario planning. If you haven't heard of it before, scenario planning is a process that organizations use to make strategic decisions. It forces them to question their assumptions about the future. You start with uncertainties you see in the world today, combine them in different ways, and create stories about the future by playing them out. At the end of the process, you’ve created several plausible scenarios that often challenge the status quo.

LC: In one of the Berkeley cyberfuture scenarios, they imagine that we’re only at the earliest stages of the big data revolution. In this story, we become exponentially better at analyzing data, and organizations can extract far more value from it than they can today. Imagine predictive algorithms so good they can tell what a specific individual will do in any given situation.

MCO: What might that world feel like?

[music: “Gentle Whispering (radio edit)”, Lee Rosevere]

NICK WEILER: Imagine this: it’s the year 2020, and as you sip your morning coffee, you boot up your laptop. Your computer seems to know everything you’re about to do the exact moment before you actually do it, from bringing up the hourly weather forecast, to asking whether you want to finally buy that 3-pack of Batman underwear you’ve been dreaming about – at 20% off.

This is one potential world that could arise out of today’s big data trends. This scenario is called... OMEGA.

With accelerated developments in machine learning, algorithms, and sensors that track human actions, the internet of 2020 will have embedded within it profoundly powerful models capable of predicting—and manipulating—a wide range of human behavior.

These new models don’t just “guess” what you want by comparing you to groups with similar characteristics; they will have the ability to make truly individualized predictions. Steve only buys clothing online after 10 pm, after he’s looked at 6 cat videos. Here’s another cat video, Steve. Now check out this new color-changing t-shirt. The algorithms will know you better than you know yourself.

In this new world, high-tech firms and sophisticated criminals alike will be able to identify, and in some circumstances influence or control, your future behavior at an incredibly granular level. With barely a second thought, you’ll be nudged towards consuming certain foods, medicines, movies, and music.

Predictive models will play an increasingly significant role in day-to-day life. Some aspects of this world would appear mundane: algorithms route global air traffic and choose which ads to display on a billboard. Other instances are more problematic – the federal government uses algorithms to decide where border police should be stationed in order to intercept immigrants.

In the private sector, every company will invest in sophisticated analytical prediction teams. Weight-loss companies will be able to make precise diet and behavior recommendations based on entire medical histories. Retail fashion brands will correctly forecast the total sales generated from the European rollout of their spring line. In 2020, CVS Health will begin pre-filling shopping carts.

Many will regard these capabilities as a signal of the last, most powerful algorithm—the “omega” algorithm. The power of data science to predict individual behavior at this...precise...level will become the most polarizing debate of the decade. Optimists will claim that dynamic individualized predictions can solve problems that humans had almost given up on, like world hunger, poverty and disease. Pessimists will see it as the final step before humanity hands over all power to omniscient technologies—and, indeed, as an end to free will itself.

BETSY COOPER: We, of course, do not expect any of these scenarios to come true, but we may see elements of each of them in the real world in 2020.

LC: This is Betsy Cooper, the executive director of the Berkeley cybersecurity group that developed these stories.

BC: But the idea is if this one trend goes to the nth degree, how should we prepare and respond to it.

LC: Even though some of the scenarios address the value of data, Betsy’s group is primarily concerned with what each of the stories means for cybersecurity. In the Omega future…

BC: We know a lot more about you, and of course there are commercial implications, but there are also implications for the way cybersecurity would function in a world in which algorithms can be predictive. So you can imagine, for instance, that people will try to attack the algorithms, or they might try to use social manipulation techniques based on these algorithms. If I know your political tendencies, or that you're a one-issue voter on the issue of abortion, and if I can use algorithms to try to make you see that a particular political candidate will vote the way you want them to vote, you might end up making a decision based on that.

MCO: Omega is just one scenario, and in the same cyberfutures report, there’s another story that in some ways is the exact opposite. Rather than making a quantum leap in our ability to use data and algorithms to predict behavior, we hit a limit. In this story, it turns out that the current way of thinking about data is misguided, and all the investment in the data economy was based on questionable assumptions. This scenario is titled…

NW: ...Bubble 2.0.

[music: “White Lotus,” Kevin MacLeod]

NW: This is a world in which many of today’s data-intensive internet companies, along with the platforms and advertising revenue underpinning them, collapse as a result of perceived overvaluation. While consumers have assumed that handing over their data would lead to better, more personalized products, it turns out that companies are not able to deliver on that promise. In other words, we’re in a bubble propped up by data.

In the “Bubble 2.0” world, slow-moving trends, already underway, will set the stage for an internet business model crash. Engineers will abandon high-priced Silicon Valley firms for alternative clusters in Singapore, China, South Korea, and elsewhere. The exodus will be driven in part by brewing ideological disillusionment within the tech community. People will ask, “When did we stop trying to change the world, and instead just make indulgence products for 30-year-old, urban-dwelling singles?”

A slew of earnings reports shows a decline in digital advertising revenue among major firms, such as Google and Facebook. Within a short period, the market capitalization of big and small technology companies alike collapses.

Skype announces, with no warning, that its services will be discontinued as of a particular Tuesday afternoon. Following the lead set by General Motors in 2009, Google seeks a government bailout because the company is too big to fail. Freemium becomes a word of the past, and many of the free apps that had been iconic symbols of Web 2.0 will be no more.

As the valuations of major tech firms plummet, they realize that their databases of user behavior are assets they can sell to raise capital. A new market opens as tech firms begin selling off their data in bulk. Consumers suddenly realize that their personal information is now up for sale on the open market, completely out of their control. Today we might not care that Amazon’s database knows our entire purchasing history. But what happens when that data is sold to the highest bidder? Does it reveal something we’d much prefer to keep private? What embarrassing items have you purchased recently?

BC: The fear for me is not just about the implications of what cybersecurity looks like. It's the overall structure of Silicon Valley and the way our world works. Clearly, the resale of data would be a key part of that, but there would also be a huge shift in just the overall economic structure of the United States. I think that's a story both of changes in cybersecurity that are extremely important and of changes to our worldview out here that are also important to keep in mind.

MCO: These scenarios are not meant to be predictions. Obviously the future is unpredictable. And one way or another, the year 2020 will not look the way we expect it to look today. The point of scenario planning is to help us expand our thinking.

BC: What you often find first is that people say, no, that can never happen. With many of the scenarios, we sit down with someone and they say, "That's an absolutely preposterous idea." Then, if you get them to engage and really sift through the scenarios and think beyond the implications, they start to come around and say, "Oh, that could probably actually happen."

MCO: In the Omega future, companies like Quantifind are big winners. People like Viktor Mayer-Schönberger are proven right, and our digital data is a gold mine that we are just beginning to tap.

LC: But in the Bubble 2.0 future, all our server farms are actually just full of pyrite, fool’s gold. The bubble bursts, and everyone who has invested in the promise of big data loses out. The market crash has ripple effects throughout the entire economy.

MCO: Regardless of what actually happens in the next five years, there’s another angle that we haven’t even talked about yet. Data is in some sense an asset, but it can also be a liability. Just look at Target, or Sony, or any other consumer company that’s had a data breach. More and more, these events are becoming commonplace, and they come with huge costs.

LC: And... there’s another way in which data can be a liability. A lot of the companies that built the big data ecosystem have had to deal with the fact that government and law enforcement are often interested in the information they’re collecting. While we were working on this episode and wrestling with these questions, Mike and I visited the Electronic Frontier Foundation and talked to their executive director, Cindy Cohn. Cindy spends a lot of her energy thinking about how the current state of affairs may threaten our civil liberties.

CINDY COHN: There's a lot of big data fetishism going on in government.

I do worry. I worry not only that the value proposition for the rest of us isn't really that great, but also that data sets are misused, and about the way that algorithms get deployed.

I'm a little less bullish that big data could make so many things so much better. I think that the jury's still out on that, and I worry that the [reality] is not going to live up to the hype.

MCO: EFF’s mission is to protect our rights in the digital age, including our First Amendment rights as Americans. Free speech is one of the core principles that defines American democracy. It is the ability to express our views, even if they’re controversial, and to criticize those in power. But free speech is not just about the right to express opinions in public – it’s also about our right to a private conversation and to find other people like us who want to have those conversations.

CC: I don't have any problem explaining this to my friends in Black Lives Matter. I don't have any problem explaining this to my friends, frankly, who are gun activists. It doesn't matter what side of the spectrum you are on. If you're outside of the mainstream of thought, and you're trying to organize things, you're trying to make the world a different place, you understand the need to have a private conversation, and...

That, to me, is why the Constitution cares about this. It's not really about individualized lives. It's about the necessary conditions for a functioning democracy.

MCO: Today there’s growing evidence that we live in a surveillance state. This is the dark side of what it means to collect big data and track behavior. The ability to capture communications poses serious challenges for journalists, whistleblowers, activists, and anyone else who wants to challenge the status quo.

LC: So what protections do we have in place around free speech, and how are those safeguards being tested? And given the new realities, how are people fighting today to protect our constitutional rights?

MCO: All of that, next time on Raw Data.

[music: “From the Outset,” Nick Carlozzi]

MCO: Raw Data is produced by me, Mike Osborne, and Leslie Chang, with help from Isha Salian. Special thanks to Nick Weiler, voice of the future.

LC: There are a few things you might want to check out from this episode, and we’ll link to all of it in our blog post. First, Viktor Mayer-Schönberger’s book is called Big Data: A Revolution that Will Transform How We Live, Work, and Think.

MCO: Ari Tuchman’s company is called Quantifind, and their website is quantifind DOT com. Thanks to Josh Anish for helping coordinate our visit.

LC: Betsy Cooper’s group at Berkeley is called the Center for Long-Term Cybersecurity, and you can find their Cyberfutures 2020 report and learn about the other scenarios online at c-l-t-c DOT berkeley DOT edu SLASH scenarios.

MCO: And finally, Cindy Cohn is the executive director of the Electronic Frontier Foundation. They’re online at e-f-f DOT org.

LC: We want to thank Allison Berke and Jackson Roach for editing help this week. Our show is a production of Worldview Stanford, and we’d also like to thank everyone on team Worldview for their edits and their support. You guys are the best.

MCO: Raw Data is made possible with support from the Stanford Cyber Initiative, whose mission is to produce research and frame debates on the future of cyber-social systems. You can learn more about the Cyber Initiative and their projects at cyber DOT stanford DOT edu.

LC: Once again, this is the first episode of season 2! If you haven’t listened to season 1 yet, you can find all our episodes on iTunes, SoundCloud, and Stitcher.

MCO: We’ll be back in two weeks with our next episode. Thanks so much for listening.