Privacy implications of Local Storage in web browsers

Privacy professionals often have a hard time keeping track of technology and how it affects privacy. This post is meant to help explain the technology of local/web storage.

With the ability to track users across domains, cookies have earned a bad reputation in the privacy community. This became particularly acute with the passing of the EU Cookie law. The law requires affirmative consent when local storage on a user’s computer is used in a way that is not “strictly necessary for the delivery of a service requested by the user.” In other words, if you’re using it to complete a shopping cart at an online store, you need not get consent. If you’re using it to track the user for advertising purposes, then you need to get consent.

Originally part of the HTML5 standard, web storage was split into its own specification. For more history on the topic, see this article. Web storage is meant to be accessed locally (by JavaScript) and can store roughly 5MB per origin in most browsers, compared to cookies, which hold a maximum of about 4KB each. Cookies are natively accessible by the server; the purpose of a cookie is to be read by server-side scripts. Web storage is not immediately accessible by the server, but its contents can be sent to the server by JavaScript.
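To make the difference from cookies concrete, here is a minimal TypeScript sketch. It assumes a storage object with getItem/setItem/removeItem (in a real page this would be window.localStorage). The InMemoryStore class and the buildUploadPayload helper are hypothetical stand-ins I made up for illustration so the example runs outside a browser; they are not part of any standard.

```typescript
// Minimal subset of the Web Storage API (the real browser object is window.localStorage).
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical in-memory stand-in so the sketch runs outside a browser.
class InMemoryStore implements KeyValueStore {
  private items: Record<string, string> = {};

  getItem(key: string): string | null {
    if (!Object.prototype.hasOwnProperty.call(this.items, key)) return null;
    return this.items[key];
  }

  setItem(key: string, value: string): void {
    this.items[key] = value;
  }

  removeItem(key: string): void {
    delete this.items[key];
  }
}

// Hypothetical helper: builds the body a script would have to send to a server itself.
// Only the keys named in keysToShare end up in the payload.
function buildUploadPayload(store: KeyValueStore, keysToShare: string[]): string {
  const shared: Record<string, string> = {};
  for (const key of keysToShare) {
    const value = store.getItem(key);
    if (value !== null) shared[key] = value;
  }
  return JSON.stringify(shared);
}

const demoStore = new InMemoryStore();
demoStore.setItem("cart", JSON.stringify(["sku-123"]));
demoStore.setItem("lastSearch", "fishing rods");

// Unlike a cookie, nothing here travels with a request automatically.
// The script chooses what to include; lastSearch stays on the client.
const uploadBody = buildUploadPayload(demoStore, ["cart"]);
```

The point for a privacy review is the last line: whether stored data reaches the server depends entirely on what the page’s scripts choose to send, so that is the code to ask your developers about.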

CONS

The con here is that, as a privacy professional, you should be aware of what your developers are doing with web/local storage. Simply asking your developers if they are using cookies may elicit a negative response even though they are using an alternative technology that isn’t cookies. Later revelations and returning to your developers may result in the response, “Well, you asked about cookies, not local storage!” There are also proposals for a local, browser-accessible database, but as of the time of this writing this is not an internet standard (see Mozilla Firefox’s IndexedDB for an example).

Web storage is not necessarily privacy invasive, but two things need to be addressed. First, whether that local data is transmitted back to the server, or used to derive results that are then transmitted back to the server. Second, whether the data stored in local storage is accessible to third parties and represents a risk of exposure to the user. As of this writing, I’m not sure if third-party JavaScript running through a first-party domain has the ability to access local storage or if it is restricted by a content security policy. The other risk is that a local user can access local storage through a JavaScript console. Ideally, data on the client should be encrypted.

PROS

Local storage also has the potential to increase privacy. Decentralization is a key technique for architecting for privacy, and having access to 5MB of local storage allows enough room to keep most, if not all, client data on the client. Instead of developing rich customer profiles for personalization on the server, keeping this data on the client reduces the risks to the user because the server becomes less of a target. Of course, care must be taken to deal with multi-tenancy (more than one person on an end client). This may be especially difficult for shared machines, such as those used by library patrons, where one person could access the data of other local users.
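As a rough illustration of keeping the profile on the client, here is a TypeScript sketch. Everything named in it (MemoryProfileStore, the "profile:" key scheme, savePreferences, loadPreferences, personalize) is hypothetical and only stands in for whatever a real site would build on top of window.localStorage. Note that giving each local user a separate key only keeps the profiles apart; it does not protect them from another person who can open the same browser profile.

```typescript
// Minimal subset of the Web Storage API used by this sketch.
interface ProfileStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical in-memory stand-in for window.localStorage.
class MemoryProfileStore implements ProfileStore {
  private items: Record<string, string> = {};
  getItem(key: string): string | null {
    if (!Object.prototype.hasOwnProperty.call(this.items, key)) return null;
    return this.items[key];
  }
  setItem(key: string, value: string): void {
    this.items[key] = value;
  }
  removeItem(key: string): void {
    delete this.items[key];
  }
}

type Preferences = { favoriteCategories: string[] };
type CatalogItem = { title: string; category: string };

// Assumed key scheme: one entry per local user of the shared browser.
function profileKey(localUser: string): string {
  return "profile:" + localUser;
}

function savePreferences(store: ProfileStore, localUser: string, prefs: Preferences): void {
  store.setItem(profileKey(localUser), JSON.stringify(prefs));
}

function loadPreferences(store: ProfileStore, localUser: string): Preferences {
  const raw = store.getItem(profileKey(localUser));
  return raw === null ? { favoriteCategories: [] } : (JSON.parse(raw) as Preferences);
}

// Personalization runs on the client: the server sends everyone the same catalog,
// and the browser moves the preferred categories to the front.
function personalize(catalog: CatalogItem[], prefs: Preferences): CatalogItem[] {
  const isPreferred = (item: CatalogItem) => prefs.favoriteCategories.indexOf(item.category) !== -1;
  const preferred = catalog.filter(isPreferred);
  const others = catalog.filter((item) => !isPreferred(item));
  return preferred.concat(others);
}

const sharedStore = new MemoryProfileStore();
savePreferences(sharedStore, "alice", { favoriteCategories: ["dogs"] });

const catalogFromServer: CatalogItem[] = [
  { title: "Golf clubs", category: "golf" },
  { title: "Dog leash", category: "dogs" },
  { title: "Tennis balls", category: "sports" },
];

const alicesView = personalize(catalogFromServer, loadPreferences(sharedStore, "alice"));
const bobsView = personalize(catalogFromServer, loadPreferences(sharedStore, "bob"));
```

In this design the server never learns that Alice likes dogs, so it holds no profile worth stealing.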

Thoughts on DNT

Yesterday I attended the Atlanta IAPP KnowledgeNet on DNT. The panelists were Peter Swire (@peterpswire) and Brooks Dobbs. Peter is co-chair of the W3C working group on DNT. He is trying to find consensus amongst nearly 100 participants in the process. Up until recently, at least one point of contention was whether DNT stood for Do Not Track or Do Not Target. However, the dispute isn’t over the acronym so much as whether the meaning is to not send people targeted ads, or to not track them as they surf the internet in order to compile dossiers or at least gauge the correlated interests of segments of the population. I would suspect that some of the confusion is related to the historical provenance of the DNT initiative, which came out of the wildly successful Do Not Call registry. While there are several distinctions, one that must be clearly understood is the underlying privacy harm. In Do Not Call, no one is decrying marketing firms’ dossier building; rather, it is the intrusion into the personal space of the called party that is deemed the privacy harm. Contrast this with web surfing and the ubiquity of advertising on free websites. The intrusion into the personal space of the user is not the privacy harm at issue. Rather, the harm lies in the desire by firms to amalgamate information about people to create more targeted or more effective advertising (thus increasing the cost effectiveness of the advertising). Before we can begin designing a solution, it’s important to identify the actual harm we’re trying to prevent.

Peter framed it rather nicely during the panel by asking two questions. The first question he asked the audience was whether they wanted to be tracked without their knowledge as they surfed the internet. Only one hand was raised, and that one was in jest. The second question, to which many people, myself included, answered affirmatively, was whether they wanted a more personal experience on the internet. Those two questions framed the debate as a conflict between something people don’t want (tracking) and something people do want (personalization). Though I personally haven’t engineered a system to do it, my gut reaction is that these two concepts need not be at odds. Certainly, you can design a system (as most are) where tracking supports personalization, but I suspect that just because tracking is sufficient for personalization does not mean it is necessary.

Consider, for instance, a decentralized design where a person’s dossier was kept browser side. When an ad network wanted to serve an ad, the browser could request an ad targeted at sports enthusiasts who like dogs. If someone visited a website for fishing, the browser may add that tag to the person’s dossier. An intrusive one might actually say “Hey, this website wants to identify you as a fisherman.” Later, the person could modify or even wipe out their personal dossier. Family members could switch dossiers depending on who is using the browser, or individuals could even maintain multiple personas (the dog-loving sports enthusiast and the business professional into golfing). This would solve one problem the ad industry is struggling with, which is transparency. Some ad networks may reject this idea because they won’t be able to throttle ads (not true, the browser could tell the ad network not to send some ads) or because they can’t resell their data about customers. As to the second point, just because it’s more cost effective for the business (and thus the consumer) doesn’t make it an acceptable practice. Cold calling is an effective sales technique (even with only a fraction of a percent acceptance rate), but we as a society reject it because the benefit to the consumers is not apparent. Under that argument, any privacy-invasive technique that saves a company money could be argued to be beneficial to the consumer, no matter how privacy invasive.
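To show that this is at least plausible, here is a small TypeScript sketch of such a browser-side dossier. It is only a thought experiment under my own assumptions: the LocalDossier class, its method names, and the shape of the ad request are all hypothetical, and no browser or ad network exposes an interface like this today.

```typescript
// A persona is a named list of interest tags that never leaves the browser
// except as the tag list in an ad request.
class LocalDossier {
  private personas: Record<string, string[]> = { default: [] };
  private activePersona = "default";

  private tags(): string[] {
    return this.personas[this.activePersona];
  }

  // Family members, or one person with several personas, switch here.
  switchPersona(personaName: string): void {
    if (!Object.prototype.hasOwnProperty.call(this.personas, personaName)) {
      this.personas[personaName] = [];
    }
    this.activePersona = personaName;
  }

  // A site proposes a tag; `approve` stands in for the browser asking the user.
  // Returns true if the tag is in the dossier afterwards.
  proposeTag(tag: string, approve: (tag: string) => boolean): boolean {
    if (this.tags().indexOf(tag) !== -1) return true;
    if (!approve(tag)) return false;
    this.tags().push(tag);
    return true;
  }

  removeTag(tag: string): void {
    this.personas[this.activePersona] = this.tags().filter((t) => t !== tag);
  }

  // Wipes the active persona only.
  wipe(): void {
    this.personas[this.activePersona] = [];
  }

  // All the ad network would see: interests, with no identifier or browsing history.
  buildAdRequest(): { interests: string[] } {
    return { interests: this.tags().slice() };
  }
}

const dossier = new LocalDossier();
const alwaysApprove = (_tag: string) => true;

dossier.proposeTag("sports", alwaysApprove);
dossier.proposeTag("dogs", alwaysApprove);
dossier.proposeTag("fishing", () => false); // the user declines this tag
const sportsRequest = dossier.buildAdRequest();

dossier.switchPersona("work");
dossier.proposeTag("golf", alwaysApprove);
const workRequest = dossier.buildAdRequest();

dossier.switchPersona("default");
dossier.wipe();
const wipedRequest = dossier.buildAdRequest();
```

The ad network still gets enough to target an ad (sports and dogs, or golf), but the record of how those interests were learned stays with the user, who can inspect, edit, or erase it.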

Tracking clearly falls within the umbrella of a surveillance system. Many types of surveillance could bring about more cost efficiencies, but that doesn’t make them legitimate. The question is, are there other business models that bring the cost benefits of targeted advertising but don’t carry the privacy harms of tracking?

Before the W3C working group can come to an agreement, they must realize that personalization and protection from tracking are not at odds and that creative design solutions are possible. Industry must be willing to explore all technological options and fully understand the privacy risks and tradeoffs of different solutions. I’d love the opportunity to work with any ad network interested in exploring the options.