I’ll be speaking at the Software Architecture Symposiums International conference on May 18-19th, 2012. The conference will be about software architecture in the cloud and my speech in particular will be about architecting for privacy (what else?!).
Month: November 2011
Talk by Ann Cavoukian
Here is a YouTube video of Ann Cavoukian speaking at Web 2.0. Zero sum is not the solution; privacy will always lose. Positive sum is the way to go.
Crypto Class
Consider signing up for this online class in Cryptography taught by Stanford University’s Dan Boneh.
Euclid – Online Analytics for the Offline World
H/T to Vince for sending me this one.
Clearly the writing is on the wall with mobile tracking. Euclid aims to do for “brick and mortar” stores what Google does for online stores. I won’t comment about their business model. It does seem interesting and possibly viable in metropolitan areas with a high percentage of mobile phone users, but whether that turns into a successful business is another question.
On the issue of privacy, they clearly seem to put privacy front and center. They at least know it’s going to be an issue. Their privacy statement, while lengthy, is fairly clear and straightforward, not legalese, and puts three principles at its core: limited data collection, anonymous and aggregate data, and easy opt-out. I’m intrigued by this product, so I’m going to analyze the privacy statement here in stream-of-consciousness form, so please forgive me if it seems rambling.
As mentioned, the overview cites their three privacy principles.
1) Limited data collection. I’m wondering how much of this is a result of the limitations of the technology as opposed to self-imposed limitations. They identify users by MAC address and say they don’t collect information such as who is using the phone or any content on the phone. Again, that seems like a technical limitation. I suppose they could use cameras to take pictures of people as they walk by to correlate with the signal, and don’t think that’s not coming.
2) Only share aggregate and anonymous information. Fair enough, but what if the merchant somehow correlates the data with the aggregate data? Also, does it aggregate across stores? If Joe walks into Starbucks 1 and doesn’t buy a coffee but then crosses the street and buys a coffee there, can they tell me that 1% of customers bought from a different store than the one they initially walked into?
3) Opt-out. I’m not crazy about this one. Yes, you can opt out by giving them your MAC address on their website, but what about at the store? I understand their desire to go opt-out, but wouldn’t it be more privacy friendly if it were opt-in? Let stores give away coupons for “scanning” your phone and tracking you, or something of that nature. What if I don’t mind one store tracking me (Starbucks) but don’t want another doing it (Hustler)?
What exactly do we collect and why?
A repeat of the statements that they only collect information from phones with Wi-Fi enabled and then anonymize and aggregate it.
How do we collect information?
“Euclid’s sensors passively detect these MAC addresses and encrypt them, then transmit the data to our collection server”  Obviously I am not privy to their exact technology, but I’m curious whether they encrypt, transmit and decrypt, or whether they hash the MAC address.  Better yet, how about hashing it with a unique code for the store (thereby making each store’s or chain’s data independent)?  They could send a generic hash too, to compare against the opt-out database, although here again it would be easier if it were opt-in per store or chain. Then they could easily do a comparison of what to include, not what to exclude.
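To make the per-store hashing idea concrete, here is a minimal sketch in Python. It is not Euclid's implementation, which I have no visibility into. I've used a keyed hash (HMAC-SHA256) in place of plain concatenation with a store code, and every name here (`store_scoped_id`, `generic_id`, `record_visit`, the store keys) is hypothetical.

```python
import hashlib
import hmac


def normalize_mac(mac: str) -> bytes:
    """Lower-case the MAC and drop ':' / '-' so equivalent spellings hash alike."""
    return mac.lower().replace(":", "").replace("-", "").encode("ascii")


def store_scoped_id(mac: str, store_key: bytes) -> str:
    """Keyed hash of the MAC; a different key per store (or chain) gives
    identifiers that cannot be matched across stores without the keys."""
    return hmac.new(store_key, normalize_mac(mac), hashlib.sha256).hexdigest()


def generic_id(mac: str) -> str:
    """Unkeyed hash, used only to check the (assumed) opt-out database."""
    return hashlib.sha256(normalize_mac(mac)).hexdigest()


def record_visit(mac: str, store_key: bytes, opt_out_ids: set):
    """Return the store-scoped identifier to store, or None if opted out."""
    if generic_id(mac) in opt_out_ids:
        return None
    return store_scoped_id(mac, store_key)
```

With this arrangement the same phone shows up under unrelated identifiers at two different chains, while the opt-out check still works everywhere because it uses the generic hash.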
If your data does not contain my name or any other personal information, how is it useful?
“Do more people usually tend to grab a coffee or an ice cream after going to the dentist?”  Ahh… is this the smoking gun in terms of showing that they correlate the data across merchants?  It’s like the ultimate HTTP referrer but with your complete history (again aggregated).  I would think some merchants might be concerned, for their own competitive purposes, about allowing others access to this information.  What if their customers are leaving their store and going to a competitor to make their purchase?
“Euclid may augment the records in its database with information it guesses infers from user activity, such as whether a device owner is male or female, income bracket, etc”  Here is another statement that concerns me.  How exactly does one infer income bracket from a MAC address?  What other information are they correlating it against? Have you ever been discriminated against at a car dealer because you didn’t fit their profile of who purchases their cars?  Consider a system like this that informs a store owner that on Tuesdays, 90% of visitors are from a low-income part of town and are likely to be window shoppers (or thieves) rather than purchasers.  What are the ramifications of this?
With whom do you share information?
“Euclid does not combine data from multiple retail locations into one report EXCEPT when all the locations in question are owned by the same company”  Then how do they tell the ice cream store that 50% of their visitors came directly from the dentist’s office next door?  Or was that just a really bad example that doesn’t accurately reflect the types of information they provide?
“We only use qualified, respected 3rd-party providers (such as Amazon Web Services) where for data exchange, aggregation, and storage, which is necessary to provide Euclid services”  The grammarian in me is coming out; is this a sentence?
How do you store data?
“Once collected, it is transferred securely (using SSL) and is anonymized (hashed) before it is stored on Amazon Web Services.”  AH! They do hash it.  Given the above statement that it can only be correlated by chain, I’m wondering if they concatenate the hash with a code specific to that chain?  
Is there any way to link data about your device’s physical location back to you?
“If someone already knows both your name and your MAC address, they could potentially legally require that we provide them with information in our database about your mobile device including the locations of the sensors where we recorded your signal.”  Hmm, one of the limitations of hashing.  If they were to take it to the next level, they could use M of N secret splitting or some other technique to only store the data in aggregate form thereby not storing the hashes in a way that could be subpoenaed. One of the benefits of not having such data (but still having the data to function) is that you avoid the inevitable cost of responding to subpoenas and court orders.  
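The limitation of hashing mentioned above is easy to see in a few lines of Python. This is only an illustration under assumptions: I'm assuming an unkeyed SHA-256 over the MAC address, and the `visits` table and sensor names are made up for the example.

```python
import hashlib


def hashed_id(mac: str) -> str:
    """Assumed anonymization step: an unkeyed SHA-256 of the normalized MAC."""
    normalized = mac.lower().replace(":", "").replace("-", "")
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


# Hypothetical stored records: hashed device id -> sensors that saw the device.
visits = {
    hashed_id("AA:BB:CC:DD:EE:FF"): ["sensor-12", "sensor-40"],
}


def lookup_by_mac(known_mac: str, records: dict) -> list:
    """Anyone who knows the MAC can recompute the hash and pull the history."""
    return records.get(hashed_id(known_mac), [])
```

The database never holds the MAC itself, yet `lookup_by_mac` recovers the location history for any MAC supplied. That is why hashed records can still be the target of a subpoena, and why storing only aggregates would be the stronger design.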
All in all, I must say I’m impressed that they took privacy seriously.  I only have two primary concerns:
(1) whether adequate notice is given in-store about the ability to opt out, or even notice of the collection of information, and
(2) whether an easier way of showing their commitment is possible.  I can tell, from reading the statement and from the way they talk about hashes, that they aren’t just paying lip service to “we value your privacy” but really are embedding it into their service.  The question is how they can demonstrate this to someone who is not going to take the time to read the privacy statement.  Going back to the voting analogy that I use often, when I drop my ballot in the ballot box, I know they can’t tie my vote back to me.
