Euclid – Online Analytics for the Offline World

H/T to Vince for sending me this one.

Clearly the writing is on the wall with mobile tracking.  Euclid aims to do for “brick and mortar” stores what Google does for online stores.  I won’t comment on their business model; it does seem interesting and possibly viable in metropolitan areas with a high percentage of mobile phone users, but whether that turns into a successful business is another question.

On the issue of privacy, they clearly seem to at least put privacy front and center.  They, at least, know it’s going to be an issue.  Their privacy statement, while lengthy, is fairly clear and straightforward, not legalese, and puts three principles at its core: limited data collection, anonymous and aggregate data, and easy opt-out.  I’m intrigued by this product, so I’m going to analyze the privacy statement here in stream-of-consciousness form, so please forgive me if it seems rambling.

As mentioned, the overview cites their three privacy principles.
1) Limited data collection.  I’m wondering how much is a result of the limitations of the technology as opposed to self-imposed limitations.  They identify users by MAC address and say they don’t collect information such as who is using the phone or any content on the phone.  Again, that seems like a technical limitation.  I suppose they could use cameras to take pictures of people as they walk by to correlate with the signal, and don’t think that’s not coming.

2) Only share aggregate and anonymous information.  Fair enough, but what if the merchant somehow correlates the aggregate data with other data?  Also, do they aggregate across stores?  If Joe walks into Starbucks 1 and doesn’t buy a coffee but then crosses the street and buys a coffee there, can they tell the merchant that 1% of customers bought from a different store than the one they initially walked into?

3) Opt-out. I’m not crazy about this one.  Yes, you can opt out by giving them your MAC address on their website, but what about at the store?  I understand their desire to go opt-out, but wouldn’t it be more privacy-friendly if it were opt-in?  Let stores give away coupons for “scanning” your phone and tracking you, or something of that nature. What if I don’t mind one store tracking me (Starbucks) but don’t want another doing it (Hustler)?


What exactly do we collect and why?
A repeat of the statements that they only collect information from phones with Wi-Fi enabled, and then anonymize and aggregate it.


How do we collect information?


“Euclid’s sensors passively detect these MAC addresses and encrypt them, then transmit the data to our collection server”  Obviously I am not privy to their exact technology, but I’m curious if they encrypt, transmit, and decrypt, or if they hash the MAC address.  Better yet, how about hashing it with a unique code for the store (thereby making each store or chain’s data independent)?  They could send a generic hash too, to compare against the opt-out database, although here again it would be easier if it were opt-in per store or chain; then they could easily compare what to include, not what to exclude.
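To make the idea concrete, here’s a rough sketch of what per-chain salted hashing might look like. The salts and the MAC address below are made up by me for illustration; this is not Euclid’s actual scheme:

```python
import hashlib

def store_specific_hash(mac: str, chain_salt: str) -> str:
    """Hash a MAC address together with a per-store (or per-chain) salt,
    so one chain's records can't be joined against another chain's."""
    return hashlib.sha256(f"{chain_salt}:{mac.lower()}".encode()).hexdigest()

def generic_hash(mac: str) -> str:
    """Unsalted hash, usable only for checking a global opt-out list."""
    return hashlib.sha256(mac.lower().encode()).hexdigest()

# Hypothetical device and chain keys
mac = "00:1A:2B:3C:4D:5E"
h_chain_a = store_specific_hash(mac, "chain-key-starbucks")
h_chain_b = store_specific_hash(mac, "chain-key-other")
assert h_chain_a != h_chain_b  # same device, unlinkable across chains
```

The same device produces a different identifier for each chain, while the generic hash still lets the collector check the opt-out database.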


If your data does not contain my name or any other personal information, how is it useful?


“Do more people usually tend to grab a coffee or an ice cream after going to the dentist?”  Ahh… is this the smoking gun showing that they correlate the data across merchants?  It’s like the ultimate HTTP referrer, but with your complete history (again, aggregated).  I would think some merchants might be concerned, for their own competitive purposes, about allowing access to this information.  What if their customers are leaving their store and going to a competitor to make their purchase?


“Euclid may augment the records in its database with information it infers from user activity, such as whether a device owner is male or female, income bracket, etc”  Here is another statement that concerns me.  How exactly does one infer income bracket from a MAC address?  What other information are they correlating it against? Have you ever been discriminated against at a car dealer because you didn’t fit their profile for who purchases their cars?  Consider a system like this that informs a store owner that on Tuesdays, 90% of visitors are from a low-income part of town and are likely to be window shoppers (or thieves) rather than purchasers.  What are the ramifications of this?


With whom do you share information?


“Euclid does not combine data from multiple retail locations into one report EXCEPT when all the locations in question are owned by the same company”  How, then, do they tell the ice cream store that 50% of their visitors came directly from the dentist’s office next door?  Or was that just a really bad example that isn’t an accurate depiction of the type of information they provide?


“We only use qualified, respected 3rd-party providers (such as Amazon Web Services) where for data exchange, aggregation, and storage, which is necessary to provide Euclid services”  The grammarian in me is coming out; is this a sentence?


How do you store data?


“Once collected, it is transferred securely (using SSL) and is anonymized (hashed) before it is stored on Amazon Web Services.”  AH! They do hash it.  Given the above statement that it can only be correlated by chain, I’m wondering if they concatenate the hash with a code specific to that chain?


Is there any way to link data about your device’s physical location back to you?
“If someone already knows both your name and your MAC address, they could potentially legally require that we provide them with information in our database about your mobile device including the locations of the sensors where we recorded your signal.”  Hmm, one of the limitations of hashing.  If they were to take it to the next level, they could use M of N secret splitting or some other technique to only store the data in aggregate form, thereby not storing the hashes in a way that could be subpoenaed. One of the benefits of not having such data (but still having the data to function) is that you avoid the inevitable cost of responding to subpoenas and court orders.




All in all, I must say I’m impressed that they took privacy seriously.  I only have two primary concerns:
(1) whether adequate notice is given in-store about the ability to opt out, or even notice of the collection of information, and
(2) whether an easier way of showing their commitment is possible.  I can tell, by reading and by their talking about hashes, that they aren’t just paying lip service to “we value your privacy” but really are embedding it into their service.  The question is how they can demonstrate this to someone who is not going to take the time to read the privacy statement.  Going back to my voting analogy that I use often: when I drop my ballot in the ballot box, I know they can’t tie my vote back to me.






Surveilling the Surveillers

This article brings up an interesting point: as avoiding surveillance and maintaining privacy become more difficult for the average citizen, it also becomes increasingly difficult for the spooks who need to remain anonymous in their detective work. Tor, the anonymity network, was originally a project of the U.S. Navy, which saw the benefits of anonymity in cyberwarfare.

User Insecurity

Two recent articles highlight what is probably old news to most security professionals: the weakest security link is the user. This is why it’s important to help users help themselves; they aren’t security (or privacy) experts. This is especially true when circumventing what users trust is a secure connection (i.e. a supposedly helpful man in the middle). I find it especially interesting that most users, who view going to a webpage as a very solitary and private experience, aren’t even aware of all the other users who can go to the same website as them. This is why they choose “password” as their password: for them, they are alone in going to the site, and no one is around to see them type in “password.” They just aren’t cognizant that other people might try to type in their username and “password.”

By Analogy

Humans often learn by analogy. You take something you don’t understand and try to analogize it to something you do understand. This is one of the reasons wave-particle duality is so difficult to understand: light is analogous to two incongruous models we already know, waves and particles.

Cryptography is also very difficult to understand because people have a hard time bridging the gap between things they know and are familiar with and things they don’t know. Some cryptographic techniques just don’t have very good real world analogies. It’s like quantum science, unfathomable to most people.

M of N secret splitting is just such a creature: something that really has no real-world analogy and is hard for people to grasp. Even harder, it seems, is grasping the multitude of applications such a neat technique has for us.

Say you have a sentence, “Now the right to life has come to mean the right to enjoy life, — the right to be let alone.”, which you want to split up and store in N locations. The most obvious option, storing the entire sentence in each location, isn’t very secure, since any one of those locations could be compromised and give up your secret. You could split the sentence into N parts and store one part in each location; then it would take someone compromising every location to reconstruct your sentence. That’s better, but now we suffer from another problem: what happens if we lose one of our locations (due to corruption or destruction)? Now we can’t reconstruct the sentence, because we’ve lost data. What’s a happy medium?

M of N secret splitting allows us to split the sentence into N parts such that any M of them will reconstruct the entire sentence. For example, we could split it into 10 parts where it takes 5 to reconstruct the sentence. This gives us the security of an attacker needing to compromise 5 locations AND the resilience of being able to lose 50% of the locations and still recover the secret. Without cryptography it’s difficult to understand how this could be done. What’s even more interesting is what else can be accomplished with this technique, beyond the simple idea of secret sharing. I gave one example last month, but I’m going to give another example this month that I thought about extensively during the Privacy Academy in Dallas.
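For the curious, here’s a toy sketch of Shamir’s secret sharing, the classic way to do M of N splitting. This is my own illustrative code, not production crypto (it needs Python 3.8+ for the modular-inverse form of pow):

```python
import random

# A Mersenne prime large enough to hold a short byte-string secret.
PRIME = 2**521 - 1

def split_secret(secret: int, n: int, m: int):
    """Shamir's scheme: the secret is the constant term of a random
    degree-(m-1) polynomial; shares are points on that polynomial."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(m - 1)]
    def poly(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x=0 recovers the constant term (the secret)."""
    total = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

secret = int.from_bytes(b"the right to be let alone", "big")
shares = split_secret(secret, n=10, m=5)
assert reconstruct(random.sample(shares, 5)) == secret  # any 5 suffice
```

With fewer than 5 shares, every possible secret is equally consistent with what you hold, which is why losing half the locations costs you nothing and compromising four of them gains the attacker nothing.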

Assume your customers have your mobile phone application and you want to be able to alert them to crowded conditions so they can avoid them (like traffic) or go to them (like hot nightclubs), but you don’t want to worry about tracking customers’ location information. You could have some sort of polling system that just ticked off where customers are without storing the information, but the customers’ phones still have to let you know where they are, right? This leads to problems of hacks, leaks, or employee malfeasance. Wouldn’t it be better to have a system that gave you the information you needed without customers needing to tell you where they were? Enter M of N secret splitting.

Let each customer be identified by a unique customer name (email address or username). Take the user identifier and compress it down to one of 100 slots; in other words, take the numeric equivalent of the identifier and reduce it modulo 100. [100 is an arbitrary number here.] That way each customer falls into an essentially random bucket out of 100 buckets.

Now take the location description, say “El Gaucho Inca Restaurant” (where I recently had a delicious Peruvian meal). Perform M of N secret splitting on the location, where N is 100 and M is 5. This way, you have 100 parts and any 5 will give you the location. Now pick the piece whose number corresponds to the slot computed for this customer and upload that.

The company now has one piece of data it can’t do anything with. If it collects 4 others, it can reconstruct the location, but not until then. Because the split has 100 parts, there is a 99/100 chance that the next piece it gets won’t be in the same bucket. We have met the criteria of absolutely not knowing where each person is, yet we can tell, in the aggregate, if at least 5 people are at a location.
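Here’s a rough sketch of just the bucketing half of this scheme. The customer emails are made up, and I’ve omitted the actual secret-splitting reconstruction; the point is that the server only ever sees share indices, never identities or raw locations:

```python
import hashlib

NUM_BUCKETS = 100  # N: total shares made per location
THRESHOLD = 5      # M: distinct shares needed to reconstruct

def bucket(user_id: str) -> int:
    """Map a user identifier to one of NUM_BUCKETS pseudo-random slots."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest, "big") % NUM_BUCKETS

# Each phone would upload only share #bucket(user) of an M-of-N split of
# the location name. The server just collects distinct share indices.
received = set()
for i in range(200):  # hypothetical stream of check-ins at one location
    received.add(bucket(f"customer{i}@example.com"))
    if len(received) >= THRESHOLD:
        break  # only now could the server reconstruct the location

assert len(received) == THRESHOLD
```

Until 5 distinct buckets report in, the server holds shares that reveal nothing; once they do, it learns only the aggregate fact “at least 5 people are here.”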

While this may not seem like a major revelation, it seems that many system designers and business people have difficulty grasping the ability to build a system that meets its needs without storing or collecting data that would seem critical to the system. I’ll be talking more about this in future posts.

Cloud Computing contracts

As many others have pointed out, cloud computing is really nothing new. Before it was called cloud computing, application service providers (ASPs) provided software not as a downloadable product but as an online service. Really, what has changed is the acceleration of software (or infrastructure, data, or platforms) as a much more modular and turnkey service. Service providers have minimized the transaction costs of software (or hardware). Whereas before, purchasing new or additional services took time and effort (i.e. transaction costs) on the part of both the seller and the buyer, now it can be requisitioned and provisioned with a few clicks of a mouse: the so-called utility model, where one just increases demand by adding more consuming devices and the utility provides.

However, shrinking transaction costs for efficiency means that there is no longer room for substantial negotiation between provider and consumer. This leads to a gap between the needs of the consumer for certain protections (e-discovery, retention, security, privacy, etc.) and the desire of the provider to limit liability and provide a one-size-fits-all service. Bigger clients, which may command attention and have some bargaining power, make it more difficult for service providers to offer a simple, cheap service because of the need for negotiation. I’m suggesting the end result is probably a stratification of service providers by industry (or geography) in order to limit the need for negotiation with clients who have differing needs.

Privacy Academy 2011

I attended Privacy Academy 2011 in Dallas last week, and it was quite interesting. I met a lot of people and have been contacting them furiously this week (while still trying to catch up on 2 weeks of missed work). While the seminars and lectures were thought-inspiring (especially the one on the law of obscurity), the gap between the legal privacy types and the mathematical/computer science community is still problematic. I was inspired, though, seeing Marc Rotenberg of EPIC give the headlining speech at the Friday luncheon. He mentioned many people that I admire, such as Phil Zimmermann (PGP), David Chaum (DigiCash), and others. He spoke about the need for PETs and Privacy by Design, which, as I’ve mentioned, is sorely needed in the privacy professional community.

I did submit a proposal to give a speech on privacy engineering for non-engineers at the Global Privacy Summit next year in Washington. Crossing my fingers that it gets accepted.