The canary in the data mine is dead

Even well-intentioned data mining can leave a damaging trail, so what's a developer to do?

Recently, I read a tweet from one of my favorite journalists and activists, Asher Wolf, about Samaritans Radar, an app that mines Twitter for keywords indicating someone might be suicidal or “struggling to cope.”

Concerns have been raised about privacy issues with Samaritans Radar, though these should be taken with a grain of salt, because the app mostly mines public tweets. You could say that being upset about this is the digital equivalent of yelling in the town square, then grousing the next day that someone quoted you in the paper.

More accurately, Samaritans Radar is like putting a recording device in every town square and monitoring it for catchphrases, sort of like what the NSA now does with ... everything. Samaritans Radar is a bit creepy, but in the wrong hands, it could also be destructive.

What if I’m a cretin and decide that rather than help those with psychological issues, I’d like to urge them along? If you've ever read Reddit or the comments on YouTube or 4chan, you know that beneath all that great stuff brought to you by the Internet is a sewer teeming with toxic trolls who revel in berating vulnerable naifs, racial minorities, and women.

I'm sure that Samaritans.org is a group of well-meaning people who happened to lack a critical thinker at the helm. Which brings up an important issue: Simply because we can, should we?

Consider this snippet from a recent post by Joe Ferns, executive director of policy, research, and development at Samaritans.org:

We condemn any behavior which would constitute bullying or harassment of anyone using social media. If people experience this kind of behavior as a result of Radar or their support for the App, we would encourage them to report this immediately to Twitter, who take this issue very seriously. 

Nice sentiment, and to be sure, what is yelled in the town square is public. But with the technology to mine the data, correlate it, and republish it, what are the ethical and liability concerns? Samaritans Radar crosses local, provincial, and national boundaries, so any number of laws could apply, from copyright to privacy and anti-stalking statutes. The Samaritans organization may be partly protected, but what about your company?

Hoovering everything for fun and profit

Mining social networks is already commonplace. Say anything nasty about any U.S. airline, and it will respond. The company still won’t fix its crap service, but it'll respond to tweets with canned expressions of sympathy. Some companies even sue people for trashing their brand (which inevitably backfires).

I know how to create a social graph, track the original bad sentiment back to the source, and intervene if necessary using open source tools and technologies. It isn’t even hard. However, are there cases where I should refuse? Are there situations where I’m both ethically and potentially legally obligated to say, “I’m sorry, that is a bad idea”?
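To give a sense of how little it takes, here is a minimal Python sketch, assuming a toy in-memory list of posts where each post records its author and the post it reshared, if any. The `Post` fields, the keyword list, and the sample data are all hypothetical; a real pipeline would pull posts from a platform API and use an actual sentiment model rather than crude keyword matching.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Post:
    # Hypothetical post record: id, author, text, and the id of the post it reshares.
    post_id: str
    author: str
    text: str
    reshare_of: Optional[str] = None

# Hypothetical keyword list standing in for a real sentiment model.
NEGATIVE_KEYWORDS = {"awful", "worst", "never again"}

def is_negative(text: str) -> bool:
    """Crude keyword check: True if any negative keyword appears in the text."""
    lowered = text.lower()
    return any(kw in lowered for kw in NEGATIVE_KEYWORDS)

def build_reshare_graph(posts: List[Post]) -> Tuple[Dict[str, Post], Dict[str, List[str]]]:
    """Index posts by id and map each post id to the ids of posts that reshared it."""
    by_id = {p.post_id: p for p in posts}
    children: Dict[str, List[str]] = defaultdict(list)
    for p in posts:
        if p.reshare_of is not None:
            children[p.reshare_of].append(p.post_id)
    return by_id, children

def trace_to_source(post_id: str, by_id: Dict[str, Post]) -> Post:
    """Follow reshare links back to the original post, guarding against cycles."""
    current = by_id[post_id]
    visited = {current.post_id}
    while current.reshare_of in by_id and current.reshare_of not in visited:
        current = by_id[current.reshare_of]
        visited.add(current.post_id)
    return current

def reach(post_id: str, children: Dict[str, List[str]]) -> int:
    """Count every post downstream of post_id in the reshare graph (iterative DFS)."""
    stack, seen = [post_id], set()
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return len(seen)

if __name__ == "__main__":
    # Hypothetical sample data: one complaint reshared twice, plus an unrelated post.
    posts = [
        Post("1", "alice", "Worst flight ever, never again"),
        Post("2", "bob", "RT: Worst flight ever, never again", reshare_of="1"),
        Post("3", "carol", "RT", reshare_of="2"),
        Post("4", "dave", "Lovely weather today"),
    ]
    by_id, children = build_reshare_graph(posts)
    for p in posts:
        if is_negative(p.text):
            source = trace_to_source(p.post_id, by_id)
            print(f"{p.author} -> source {source.author}, reach {reach(source.post_id, children)}")
```

Swap the keyword check for a classifier and the toy list for a live feed, and the same few functions identify who started a complaint and how far it spread, which is exactly the capability the ethical question is about.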

I bet most of you don’t even use Twitter. You probably do your capitalist rendition of Maoist self-reporting via Facebook (and, if you’re geeky enough, maybe via Google Plus). But what about Verizon, AT&T, and their permacookies? Every unencrypted request you send across the Web from your phone, and possibly from devices tethered to it, gets an extra header that uniquely identifies you. All it takes is one piece of identifying information tied to that header anywhere on the Internet, and everything you do can be tracked by Facebook, Google, and their affiliates.

I realize this isn’t new. Browser cookies have been doing it for almost two decades, but you can delete your cookies and start anew. You can go into “incognito mode” or turn off cookies. Verizon is doing this further up the chain, in its network, where none of those browser controls reach.
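The difference between a cookie and a carrier-injected header is easy to see from the receiving server’s side. The Python sketch below is illustrative only: it assumes a hypothetical log of requests, each carrying the browser’s cookie plus whatever headers arrived. The header name `X-UIDH` is the one widely reported for Verizon’s identifier and is treated here as an assumption; the URLs and ID values are made up.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption: name of the carrier-injected identifier header (reported for Verizon).
TRACKING_HEADER = "X-UIDH"

@dataclass
class Request:
    url: str
    cookie: str  # set by the browser; the user can clear or block it
    headers: Dict[str, str] = field(default_factory=dict)

def profile_by_cookie(requests: List[Request]) -> Dict[str, List[str]]:
    """Group visited URLs by browser cookie; clearing cookies starts a new profile."""
    profiles: Dict[str, List[str]] = defaultdict(list)
    for r in requests:
        profiles[r.cookie].append(r.url)
    return dict(profiles)

def profile_by_injected_header(requests: List[Request]) -> Dict[str, List[str]]:
    """Group visited URLs by the injected ID, which survives cookie resets."""
    profiles: Dict[str, List[str]] = defaultdict(list)
    for r in requests:
        # HTTP header names are case-insensitive, so normalize before lookup.
        lowered = {k.lower(): v for k, v in r.headers.items()}
        uid = lowered.get(TRACKING_HEADER.lower())
        if uid is not None:
            profiles[uid].append(r.url)
    return dict(profiles)

if __name__ == "__main__":
    # Hypothetical log: the user clears cookies between every request.
    log = [
        Request("http://news.example/a", "cookie-1", {"X-UIDH": "u-42"}),
        Request("http://shop.example/b", "cookie-2", {"x-uidh": "u-42"}),
        # HTTPS traffic is encrypted end to end, so the carrier cannot inject the header.
        Request("https://bank.example/c", "cookie-3", {}),
    ]
    print(profile_by_cookie(log))
    print(profile_by_injected_header(log))
```

Clearing cookies splits the cookie-based view into three unrelated visitors, while the injected header still ties the two unencrypted requests to one person. Only the encrypted request escapes.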

What are the downsides of such constant exposure? Recently, right after I had some dental work done, I announced I was signing off Twitter for a bit and taking a Percocet. I did this to let family, friends, and casual followers know that I wouldn’t be posting, or, if I did, that they shouldn’t worry if I posted something strange. I didn’t intend to be added to a government or corporate database of potential drug offenders, possibly cataloged for risk, and potentially subjected to ads for other pain medications or medical treatments. Do you have any doubt that at least some of that happened?

When it comes to data about you that you didn’t directly intend to communicate, we have only the “terms of service,” which protect a company’s right to collect your data and use it however it pleases. Nothing really protects you.

Should any legal protections or regulations be put into place? Could they be enforced? Do we all have to start throwing CryptoParties using Tor and TrueCrypt alternatives such as CipherShed?

That seems like a lot more work than this Internet thing is worth.

This story, "The canary in the data mine is dead," was originally published by InfoWorld.
