Privacy engineering and transparency

One of the principal tenets of privacy engineering is transparency. Data subjects should be able to see exactly how their data is being used. However, a complex and time-intensive tool, à la Google Dashboard, is not as good as an intuitive and obvious design at the forefront.

Consider the following scenarios (which I stole from my own comment on Kashmir Hill’s Forbes blog).

(1) A patron enters a club where an employee sits with two clickers. When a female customer enters, the employee clicks the clicker in his right hand; when a male customer enters, the one in his left. The clickers transmit data to the internet, where a mobile app publicizes the club’s male/female ratio.

(2) A patron enters a club where a camera takes a picture of the customer’s face. A computer hooked to the camera analyzes the picture and makes a call as to whether the person is male or female. The computer then transmits the data to the internet, where a mobile app publicizes the club’s male/female ratio.

What’s the difference? Even in my description, it’s unclear what the data is in the second example. The male and female count? The customer’s photo? To a customer, the mere fact that a picture was taken raises a concern as to what is being done with the image. The company (and the club) may say the data is aggregated, but how can the customer trust them? In the first example, the method by which the collection takes place makes it obvious to the customer that nothing beyond the male/female information was collected. In other words, the design of the system in the first example provides obvious cues to the customer. The customer doesn’t need to audit the app company’s computer system to know that their information isn’t being inadvertently used or purposefully repurposed, and isn’t susceptible to hacks, leaks or legal subpoenas. The design of the system is privacy protective by default.
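To make the point concrete, here is a minimal sketch of the first design expressed as software. The class name `ClickerTally` and its methods are my own hypothetical names, not anything a real club app uses. The idea is that the data structure itself has no place to put a photo, a timestamp, or an identity, so there is nothing extra to leak, repurpose, or subpoena.

```python
from collections import Counter


class ClickerTally:
    """Hypothetical stand-in for the two clickers: it holds counts and nothing else."""

    ALLOWED = ("female", "male")

    def __init__(self):
        self._counts = Counter()

    def click(self, category: str) -> None:
        # Only a category label is accepted; there is no parameter
        # through which an image or an identifier could enter the system.
        if category not in self.ALLOWED:
            raise ValueError(f"unknown category: {category!r}")
        self._counts[category] += 1

    def published_ratio(self) -> dict:
        # The only thing that ever leaves the club: two integers.
        return {c: self._counts[c] for c in self.ALLOWED}


tally = ClickerTally()
for patron in ["female", "male", "female"]:
    tally.click(patron)
print(tally.published_ratio())  # {'female': 2, 'male': 1}
```

The camera design could publish exactly the same two numbers, but only if the photo is discarded somewhere the customer can’t see. Here the limit is visible in the interface.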

Control versus Efficiency (i.e. privacy transaction costs)

While working on my speech for this weekend’s Software Architecture Symposiums International conference on Cloud Architecture, I’ve come to the curious realization that control and efficiency sit at opposite ends of a sliding scale. I’ve previously talked about the notion that privacy is not about secrets but about control.

Consider a very simple game that allows you to play with friends and associates. The pinnacle of control (i.e. privacy) would give each player a random string of characters that the player could hand to the friends they want to play with. I say this is the pinnacle because the player must decide proactively with whom to share two bits of information: first, that they play the game, and second, how to contact them to play.
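A rough sketch of that random-string scheme, using Python’s standard `secrets` module, might look like this. The `Game` class, `register`, and `lookup` are hypothetical names I’ve made up for illustration. The code is unguessable and unlisted, so it reveals nothing until the player chooses to hand it to someone.

```python
import secrets


def new_friend_code(nbytes: int = 16) -> str:
    # token_urlsafe returns an unguessable, URL-safe random string.
    return secrets.token_urlsafe(nbytes)


class Game:
    """Hypothetical game server: players are reachable only by their code."""

    def __init__(self):
        self._players_by_code = {}

    def register(self, player: str) -> str:
        code = new_friend_code()
        self._players_by_code[code] = player
        return code  # given to the player, who decides whom to tell

    def lookup(self, code: str):
        # No search by name and no directory: without the code, a friend
        # cannot even learn that the player is registered.
        return self._players_by_code.get(code)


game = Game()
my_code = game.register("alice")
print(game.lookup(my_code))        # alice
print(game.lookup("wrong-guess"))  # None
```

Sharing the code answers both questions at once: that you play, and how to reach you.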

Now for someone with lots of friends, this is an endeavor with very high transaction costs. If I’m looking for friends to play with, I have to contact each one of them individually and ask “Do you play game XYZ? If so, here is my contact in the game.” A less costly activity would be for me to broadcast that information to all my friends (say on my Facebook wall or via Twitter), or perhaps to use an interface provided by the game’s creator. While my transaction costs have now plummeted, they’ve done so at the cost of control. If I want to broadcast to all my friends except my ex, who is still on my friends list but whom I don’t really want to interact with, I now have to take the time and effort to exclude her. In other words, my transaction costs have just increased.

This is the conundrum faced by many a web 2.0 company. They create efficiencies by allowing you to connect with other people but then have to layer on control features to maintain privacy. Sure, I could email all my friends that funny video, but isn’t it easier to post it to my Facebook wall or Twitter account? But wait, I don’t want my grandmother seeing it; it’s too risqué. This is where Google has tried with Circles to find a middle ground. What might be appropriate for your high school friends might not be appropriate for family or business associates. Humans naturally segregate information. I don’t tell my co-workers about my health problems and I don’t bother my wife with a new identity management solution. However, what Google, Facebook, and their ilk fail to appreciate is that the mental energy it takes to manage multiple communities of consumers of my information from one interface, together with the inherent risk of leakage and spillage, often outweighs the benefit. What most consumers do (and this is based on anecdotal experience, not any scientific study) is segregate people by platform or by username. Teens have been found to be using Twitter to communicate with other teens because parents don’t use Twitter. Business professionals use LinkedIn because they don’t need their business associates looking at their drunken escapades in the Caribbean.

The control/efficiency dichotomy can also be looked at as defaulting to opt in or opt out, with the requirement of an affirmative opt in being the control option and defaulting in but allowing for opt out being the efficient option. It bears noting that transaction costs really do play a role here. When people are required to opt in, the opt-in rate is similar to the opt-out rate when people are enrolled by default and given the option to opt out. Why should this be the case? If people were truly expressing their desire, it shouldn’t matter whether they were given an option to get in or automatically put in and given an option to get out. Only if the group’s true preference were 50% in and 50% out should the opt-in and opt-out rates be equal. But no, give people the option to opt in and 20% say yes. Give people the option to opt out and 20% say yes. Really, it comes down to this: only 20% of people are willing to take the step of expressing a preference.
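The arithmetic is worth spelling out. Here is a small sketch that takes the 20% figure above at face value and makes the simplifying assumption that everyone who acts flips the default while everyone else stays where they were put. The function name `enrollment_rate` is hypothetical.

```python
def enrollment_rate(default_in: bool, act_rate: float) -> float:
    """Share of people who end up enrolled when only `act_rate` of them
    ever bother to change whatever default they were given."""
    if not 0.0 <= act_rate <= 1.0:
        raise ValueError("act_rate must be between 0 and 1")
    # Default out: only those who act are enrolled.
    # Default in: everyone except those who act is enrolled.
    return 1.0 - act_rate if default_in else act_rate


opt_in_design = enrollment_rate(default_in=False, act_rate=0.20)
opt_out_design = enrollment_rate(default_in=True, act_rate=0.20)
print(f"opt-in design:  {opt_in_design:.0%} enrolled")   # 20% enrolled
print(f"opt-out design: {opt_out_design:.0%} enrolled")  # 80% enrolled
```

Same people, same preferences, and the outcome swings from 20% to 80% enrolled depending on which default the designer picked.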

I’ll post more on this topic later.

What is privacy? Again.

I’ve been trying lately, in preparation for the speech I’m giving at the SASI conference on Saturday (there is still time to register!), to distill a definition of privacy that is palatable and easy to understand. I keep returning to the notion that privacy is ultimately about control of information, not about keeping things secret (a popular misconception). This is really an “aha” moment that I’ve had and that most people haven’t had yet. Hopefully, this post will give you that aha moment.

On one of the mailing lists I’m on, a question was posted about ownership of data. Who owns certain data about you? As a list participant pointed out, ownership is really not the operative term for data. Data, unlike creative works, can’t be copyrighted. Data are facts. Data are opinions. Data could consist of copyrightable material but need not. Data is not owned so much as it is possessed and controlled. Ultimately, control over data is what privacy is all about.

Often companies try to give users putative control over their data. Google, for instance, has an entire suite of tools and settings at your disposal to ostensibly give you control of the information that they collect and maintain about you. However, control here is illusory. It only serves as an instruction to Google of your preferences and wishes, but it’s up to Google to abide by that preference. Google could, at its discretion, ignore your desire. In layman’s terms we would call that a violation of one’s privacy. I’m not suggesting Google does this, merely illustrating the illusory nature of their privacy “controls.”

A company may make certain statements about your privacy and the lengths they will go to in order to adhere to your wishes. Some of those statements may be legally binding. The company may have additional obligations under the law, or perhaps moral obligations implied by the context of your relationship. Legally binding or not, there is no real recourse once those bonds are broken: the information is released; it escapes or is set free; the genie is out of the bottle. While you may take solace in seeking monetary compensation, the information cannot be contained. You have NO real control.

Now, let me turn my attention to the third-party doctrine. The third-party doctrine essentially provides that any information you provide to a third party (i.e. not a party to the communication) is fair game for a government subpoena. In other words, despite any promises or obligations of the third party to adhere to your wishes regarding the release of the information, a government request trumps all. Again, you have NO real control.

Real privacy comes from real control. If past, present and future decisions about information are yours and yours alone, then and only then do you have real privacy.