Photo: TenSafeFrogs/CC by 2.0

Big Data: Star Trek style.

There's been a flood of news over the past couple days about the scope of the National Security Agency's data mining—shocking, or not, depending on how paranoid you are and/or how much James Bamford you've read. Which, well:

The heavily fortified $2 billion center should be up and running in September 2013. Flowing through its servers and routers and stored in near-bottomless databases will be all forms of communication, including the complete contents of private emails, cell phone calls, and Google searches, as well as all sorts of personal data trails—parking receipts, travel itineraries, bookstore purchases, and other digital “pocket litter.” It is, in some measure, the realization of the “total information awareness” program created during the first term of the Bush administration—an effort that was killed by Congress in 2003 after it caused an outcry over its potential for invading Americans’ privacy.

This reminded me of a couple things at the civic level. One is how electronic communications will interface with local law enforcement. For instance:

After that catchy intro, Richard uses the second half of his presentation to give us a demo of how they used the EID engine to help the Chicago police monitor social traffic around the Nato summit, by running a realtime social application powered by Salience that the police could watch all weekend. The engine is interesting in that Endeca developed it specifically to be able to address any type of data, be it structured, semi-structured, or completely unstructured.

Richard goes on to explain that they continued the job by scraping data from the National Counter Terrorism website. They dropped the data in Endeca, processed it with Salience, and used the results to expose the relationships between the scraped data and current mentions of them in social media. In this way, many relationships are exposed between two completely different datasets.

The discussion starts around the 14-minute mark. The part where they sort NATO protest twitterers by Klout score is pretty funny; more interesting is the demonstration of how "negative sentiment" tweets sync up as the protests get more intense. I'm certain my data is in there, because I was tweeting during the protest and I have geolocation enabled. "Interesting for the police were the most popular articles and links being sent out." I hope I was in there too. It's like the CPD watching a Chartbeat dashboard for the whole city.

"We were testing the real-time capability of DataSift and we were finding that it was anywhere from a quarter of a second to half a second between hitting send on Twitter and it showing up in this application." That includes "POLICE STATE" showing up very large in the word cloud the police were looking at. Sometimes we're having conversations about civil liberties without really realizing it.

The other was an interview I did with Rayid Ghani, chief scientist of the Obama campaign. One of Ghani's main complaints about media coverage of the campaign's vaunted data analytics was its emphasis on proprietary commercial data: the old saw about how what you buy can predict how you'll vote for president. Ghani told me that the really useful data they had was given voluntarily, which makes a certain amount of sense. Why settle for a vague mathematical inference based on whether someone bought a Subaru and a subscription to Garden & Gun when you can just ask?

I asked Ghani whether cities could supplement the vast but patchy information they already have with harder-to-obtain but more useful information, the kind the Obama campaign got directly and voluntarily:

I think so. People are not inherently against giving data if you are able to explain to them what you are doing with that data, and if you are able to give them something in return.

If you wanted to give them a particular service, [you ask] "where do I send you that service?" What you're asking for is information that allows you to give me that service. And that happens all the time, right? If you're looking at students, they fill out the FAFSA to get financial aid. You fill out your income information to get a credit card.

And the city is not any different as a service organization. As long as the city uses that data to improve the service they provide… it's an empirical question, but other organizations have shown that if you're improving the service you're giving people, they're willing to give you more, because they're getting a benefit in return.

We voluntarily give this kind of information all the time to receive things that benefit us, whether it's a free, robust email service or a president. The commercial aspect of big data has normalized this, and one of the most interesting questions that will come out of this is whether these NSA revelations will actually be embraced by the American public. Take David Simon, former Baltimore Sun crime reporter and creator of The Wire:

But those planes really did hit those buildings.  And that bomb did indeed blow up at the finish line of the Boston marathon.  And we really are in a continuing, low-intensity, high-risk conflict with a diffuse, committed and ideologically-motivated enemy.  And for a moment, just imagine how much bloviating would be wafting across our political spectrum if, in the wake of an incident of domestic terrorism, an American president and his administration had failed to take full advantage of the existing telephonic data to do what is possible to find those needles in the haystacks.  After all, we as a people, through our elected representatives, drafted and passed FISA and the Patriot Act and what has been done here, with Verizon and assuredly with other carriers, is possible under that legislation.  Indeed, one Republican author of the law, who was quoted as saying he didn’t think the Patriot Act would be so used, has, in this frantic little moment of national overstatement, revealed himself to be either a political coward or an incompetent legislator.  He asked for this.  We asked for this.  We did so because we measured the reach and possible overreach of law enforcement against the risks of terrorism and made a conscious choice.

Foreign-policy and counterterrorism journalist Joshua Foust makes a similar point from a more skeptical angle: "How do you fix a problem like the NSA?" One of Foust's fixes is to give up what we receive in return for generating all this data: "Restrict the capacity of the government to spy on us. That means restricting their capacity to see threats early on. It also means, again, accepting more risk."

Today the blame is being laid at the feet of the Obama and Bush administrations, but as Foust points out, that's unfortunate, because Congress has to reauthorize these programs. Congress is also a more malleable target. But Americans have to make their objections known every two years, not just every four. Whether they will is something maybe even the NSA can't figure out.