De-Identification

This blog post responds to Daniel Barth-Jones’ tweet to me about de-identification. Given Twitter’s space limitations, a blog post seemed the best place to reply.

Daniel Barth-Jones @dbarthjones

@PrivacyMaverick @GarrettCobarr “De-identification is not enough. Need k-anonymity with a large k” Really? Check out: http://www.vldb2005.org/program/paper/fri/p901-aggarwal.pdf 
11:25 PM – 25 Nov 12 
 
 
I agree. K-anonymity may not be useful, and it may not be practical. My point in the statement above, which responded to a tweet Garrett posted about re-identification in Big Data, was that the simplistic removal of identifiers and quasi-identifiers, which many people equate with de-identification, is insufficient. The article Daniel references concludes that the information removed to achieve k-anonymity may render the data useless from a data-mining perspective. That is clearly true, and I didn’t mean to suggest otherwise. As Daniel notes in his slide deck, perfect information and perfect privacy are mutually exclusive.
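To make the trade-off concrete, here is a minimal sketch of what "k" measures, assuming a toy table of made-up records. The column names, the records themselves, and the generalization rules (truncating ZIP codes to three digits, bucketing age by decade) are hypothetical, chosen only for illustration. k is the size of the smallest group of records that share the same quasi-identifier values. With names already stripped, every record is still unique (k = 1). Coarsening the quasi-identifiers raises k to 2, but only by discarding precision, which is the cost the referenced article describes.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    identical values for every quasi-identifier column."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalize(record):
    """Hypothetical generalization rules: keep only the first three ZIP
    digits and replace exact age with a ten-year range."""
    g = dict(record)
    g["zip"] = record["zip"][:3] + "**"
    decade = record["age"] // 10 * 10
    g["age"] = f"{decade}-{decade + 9}"
    return g


# Made-up records with direct identifiers (names) already removed.
records = [
    {"zip": "02138", "age": 34, "sex": "F", "diagnosis": "flu"},
    {"zip": "02139", "age": 37, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02141", "age": 52, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02142", "age": 58, "sex": "M", "diagnosis": "flu"},
]
qi = ["zip", "age", "sex"]  # assumed quasi-identifiers

print(k_anonymity(records, qi))      # → 1: every record is unique
generalized = [generalize(r) for r in records]
print(k_anonymity(generalized, qi))  # → 2: but ZIP and age are now coarse
```

A "large k" requires ever coarser generalization (or suppressing records outright), so each step up in k removes more of the detail an analyst might want.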