Artificial Motivations

Recently I had the pleasure of working with two start-ups in the AI space to help them consider privacy in their designs. Mr Young, a Canadian start-up, wants to use an intelligent agent to allow individuals to find resources available to them to improve their mental well-being. Eventus AI, a US based company, hopes to use AI to optimize the sales funnel from leads collected at events. Both recognized the potential privacy implications of their services and wanted to not only ensure compliance with legal obligations, but showcase privacy as an important aspect of their brand.

At the outset of my engagements, I had to think about how the intelligent agents driving these services could threaten individuals’ privacy. In my previous work, threat actors were persons, organizations or governments, each with distinctive motives. People can be curious, seek revenge, try to make money, or exert control. Organizations are generally driven by making money or creating competitive advantage. Governments invade privacy for law enforcement or espionage purposes. Less angelic governments may invade privacy out of a desire to control or repress their citizens.

Figure 1: From the book Strategic Privacy by Design, chapter on Actors

Typically, when I think of software, it isn’t a “threat actor” in my privacy model. Software programs are tools made by developers; they have no motives of their own. The question arises, though: does AI represent a different beast? Does AI have “motives” independent of its creator? Clearly, we haven’t reached a stage where HAL 9000 refuses Dave’s command or Skynet deems humanity a threat to its existence, but could something slightly less sentient manifest motive?

Still, I would argue that AI is not like other software. It can, in a sense, present a privacy threat beyond the intent of its creator. While not completely autonomous, AI does exhibit an ends-justify-the-means approach to achieving its objective. The difference between AI and, say, a human employee is that the human can put their business objective in the context of other social norms, whereas AI lacks this contextual understanding. I liken it to the dystopian analogy of robots being programmed to prevent humans from harming one another and determining that the best way is to exterminate all humans. Problem solved! No more humans harming other humans. Like the genie granting a wish, they do exactly what they are told, sometimes with unintended and far-reaching consequences.

The motivation that I would ascribe to AI then is “programmatic goal-seeking.” It is not that AI seeks to invade privacy for independent purposes; rather it seeks whatever it’s been programmed to seek (such as ‘increasing engagements’). Privacy is the beautiful pasture bulldozed on AI’s straight-line path to its destination.

The question now becomes: from the perspective of a developer trying to build AI into a system, how do you prevent privacy from becoming a casualty of that relentless pursuit? I make no claim that my suggestion below supplants the many efforts to consider ethics in AI development (failed or successful); rather, the approach I take complements them, and I think it gets us far along in a pragmatic and systematic way.

Before looking at tactics in the AI context (or anywhere, really), there is a fundamental construct the reader must understand: the difference between data and information. Consider a photo of a person. The data is the photo, the bits and bytes and the interpretations of how color should be rendered. But a photo is rich with information. It probably displays the gender of the individual, their hair color, their age, their ethnic background, perhaps their economic or social status. Even without geotagging, if the photograph has a distinctive background it could reveal the person’s location. The subject’s hairstyle and dress, and the quality and makeup of the photo, might suggest the decade in which it was taken. Handing that photo to someone gives them not only the bits and bytes but also all of that rich information.

In general, for privacy by design, I use Jaap-Henk Hoepman’s strategies and tactics to reduce privacy risks. Just as they can be applied to other threat actors, I think they are equally applicable here. To see how Hoepman’s strategies can be used against an AI threat actor, consider the following example:

Your company has been tasked with designing an AI-based solution to sort through thousands of applicants to find the one best suited for a job. You’re concerned the solution might adversely discriminate against candidates from ethnic minority populations. If you’re questioning whether this is even a “privacy” issue, I’d point you to the concept of Exclusion under the Solove taxonomy. We (well, the AI) are potentially using information, namely ethnicity, without the knowledge and participation of the individuals, which is an Exclusion violation.

How then can we seek to prevent this potential privacy violation?

Two immediate tactics come to mind. These are by no means the only tactics that could or should be employed, but they are illustrative. The first is stripping, which falls under the Minimize strategy and my ARCHITECT supra-strategy. Stripping refers to removing unnecessary attributes. Here, the attribute we need to remove is ethnicity. This isn’t as simple as removing ethnicity as a data point given to the AI. Rather, returning to the distinction between data and information, we need to examine any instance where ethnicity could be inferred from data, such as a name or cultural distinctions in the way candidates may respond to certain questions. This also includes ensuring that training data doesn’t contain hidden biases in its collection.
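
To make the stripping tactic concrete, here is a rough sketch in TypeScript. The record shape and field names are hypothetical; the point is that the sensitive attribute and its likely proxies are dropped before a candidate record ever reaches the model.

```typescript
// Sketch of the "stripping" tactic: remove the sensitive attribute and its
// known proxies before the record reaches the scoring model.
// All field names here are illustrative, not from any real system.
interface CandidateRecord {
  name: string;               // can act as a proxy for ethnicity
  ethnicity?: string;         // the attribute the model must never see
  yearsExperience: number;
  skills: string[];
  freeTextAnswers: string[];  // may leak cultural markers; needs separate review
}

type ModelInput = Pick<CandidateRecord, "yearsExperience" | "skills">;

function stripForScoring(record: CandidateRecord): ModelInput {
  // Keep only attributes with a documented, job-related purpose. Everything
  // else (name, ethnicity, raw free text) is dropped, not masked, so it
  // cannot be re-joined downstream.
  return {
    yearsExperience: record.yearsExperience,
    skills: record.skills,
  };
}
```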

The second tactic is auditing, which falls under the Demonstrate strategy and my SUPERVISE supra-strategy. AI development already employs validation data to ensure that the model is properly goal-seeking (i.e., achieving its primary purpose). Review of this validation process should also be used to continually ensure that the AI isn’t somehow inferring ethnicity (which we failed to strip out) and using that information inappropriately in pursuit of its goal. If it turns out it is, then, much like a human employee, the AI might need retraining with new, further sanitized data.
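
Here is a rough sketch of what such an audit might look like, assuming (hypothetically) that you can attach self-reported ethnicity labels to a validation sample for audit purposes only. The field names and the 80% threshold are illustrative, not prescriptive.

```typescript
// Audit sketch: compare the model's selection rates across groups on the
// validation set. Group labels are used for auditing only and are never fed
// to the model. A large gap suggests the model has found a proxy.
interface AuditRow {
  group: string;      // self-reported, audit-only label
  selected: boolean;  // the model's decision for this candidate
}

function selectionRates(rows: AuditRow[]): Map<string, number> {
  const counts = new Map<string, { selected: number; total: number }>();
  for (const row of rows) {
    const c = counts.get(row.group) ?? { selected: 0, total: 0 };
    c.total += 1;
    if (row.selected) c.selected += 1;
    counts.set(row.group, c);
  }
  const rates = new Map<string, number>();
  for (const [group, c] of counts) rates.set(group, c.selected / c.total);
  return rates;
}

// Flag for retraining if any group's rate falls below, say, 80% of the
// highest group's rate (a common rule of thumb; the threshold is illustrative).
function needsRetraining(rates: Map<string, number>, threshold = 0.8): boolean {
  const values = [...rates.values()];
  const max = Math.max(...values);
  return values.some((rate) => rate < threshold * max);
}
```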

While AI represents a new and potentially scary future, with proper design considerations and a strategic, systematic approach, we can reduce the privacy risks it would otherwise create.

Data Fetishism, FIPPs and the Intel Privacy Proposal

fetish – ‘An excessive and irrational devotion or commitment to a particular thing.’

Basing its legislative proposal on the Fair Information Practice Principles (FIPPs), Intel looks to the past, not the future, of privacy. The FIPPs emerged in the 1970s and were later codified by the OECD to help harmonize international regulation of the protection of personal data. Though they have evolved and morphed, those basic principles have served as the basis for privacy frameworks, regulations and legislation worldwide. Intel’s proposal borrows heavily from the FIPPs: collection limitation, purpose specification, data quality, security, transparency, participation and accountability. But the FIPPs’ age is showing. In crafting a new law for the United States, we need to address the privacy issues of the next 50 years, not the last 50.

When I started working several years ago for NCR Corporation, I was a bit miffed at my title of “Data Privacy Manager.” Why must I be relegated to data privacy? There is much more to privacy than data, and often controls around data are merely a proxy for combating underlying privacy issues. If the true goal is to protect “privacy” (not data), then shouldn’t I be addressing those privacy issues directly? The EU’s General Data Protection Regulation similarly evidences this tension between goals and mechanisms. What the regulators and enactors sought to rein in with the GDPR was abusive practices by organizations that affected people’s fundamental human rights, but they constrained themselves to the language of “data protection” as the means to do this, leading to often contorted results. The recitals to the regulation mention “rights and freedoms” no fewer than 35 times. Article 1, Paragraph 2 even states: “This Regulation protects fundamental rights and freedoms of natural persons and in particular their right to the protection of personal data.” Clearly the goal is not to protect data for its own sake; the goal is to protect people.

Now, many people whose careers revolve around data-focused privacy issues may question why data protection fails at the task. Privacy concerns existed long before “data” and our information economy; amassing data just exacerbated the power imbalances that are often the root cause of privacy invasions. For those still unpersuaded that “data protection” is indeed insufficient, I provide four quick examples where a data-driven regulatory regime fails to address known privacy issues. They come from either end of Prof. Dan Solove’s taxonomy of privacy.

Surveillance – Though Solove classifies surveillance and interrogation under the category of Information Collection, the concern around surveillance isn’t about the information collected. The issue with surveillance is that it invites behavioral changes and causes anxiety in the subject being watched. It’s not the use of the information collected that’s concerning, though that may give rise to separate privacy issues, but rather the act and method of collection. The mere awareness of unconsented surveillance (the unwanted perception of observation, in Ryan Calo’s model) triggers the violation. No information need be collected. Consider store surveillance by security personnel where no “data” is stored. Inappropriate surveillance, such as targeting ethnic minorities, causes consequences (unease, anxiety, fear, avoiding certain normal actions that might merely invite suspicion) in the surveilled population.

Interrogation – Far from the stereotypical suspect in a darkened room with one light glaring, interrogation is about any contextually inappropriate questioning or probing for personal information. Take my favorite example of a hiring manager interviewing a female candidate and asking if she was pregnant. Inappropriate given the context of a job interview; that’s interrogation. It’s not about the answer (the “data”) or the use of the answer. The candidate needn’t answer to feel “violated” by the mere asking of the question, which raises consequences of anxiety, trepidation, embarrassment or more. Again, we find the act and method of questioning is the invasion, irrespective of any data.

Intrusion – When Pokémon Go came out, fears abounded about what information was being collected about players, but one privacy issue hardly on anyone’s radar was the game’s use of churches as places for individuals to train their characters. It turned out some of the churches on the list had been converted into people’s residences, thus inviting players to intrude upon those residents’ tranquility and peaceful enjoyment of their homes. I defy any privacy professional to say that asking a developer about the personal data they are processing, even under the most liberal definition of personal data, would have uncovered this privacy invasion.

Decisional Interference – Interfering with private decisions strikes at the heart of personal autonomy. The classic examples are laws that affect family decisions, such as China’s one-child policy or restrictions on contraception in the United States. But there are many ways to interfere with individuals’ decisions. Take the recent example of Cambridge Analytica. Yes, the researcher who collected the initial information shared people’s information with Cambridge Analytica, and that was bad. Yes, Cambridge Analytica developed psychographic profiles, and that was problematic. But what really got the press, the politicians and others so upset was Cambridge Analytica’s manipulation of individuals. It was their attempt, successful or otherwise, to alter people’s perceptions and manipulate their decisions about whether to vote and for whom.

None of the above privacy issues is properly covered by a FIPPs-based data protection regime without enormous contortion. They deal with interactions between persons and organizations, or among persons, not with personal data. Some may claim that, while true, any of these invasions at scale must involve data, not one-off security guards. I invite readers to do a little Gedanken experiment. Imagine a web interface with a series of questions, each reliant on the previous answers. Are you a vegetarian? No? What is your favorite meat: chicken, fish or beef? And so on. I may never store your answers (no “data” collection), but ultimately the questioning leads you to one specific page where I offer you a product or service based on your selections, perhaps with discriminatory pricing. Here user interface design essentially captures and profiles users, but without that pesky data collection that would invite scrutiny from the privacy office. Some companies might be advanced enough in their thinking to catch this, but in my years of practice most privacy assessments begin with “what personal data are you collecting?”
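
To make the thought experiment concrete, here is a rough sketch of such a branching questionnaire. The questions, URLs and pricing tiers are made up; the point is that nothing is ever written to a database, yet the final landing page encodes everything the answers revealed.

```typescript
// Sketch of the Gedanken experiment: a branching questionnaire that never
// persists answers but still routes each visitor to a profile-specific
// (and potentially price-discriminated) landing page. All names are invented.
interface Question {
  prompt: string;
  // Maps an answer either to the next question or to a final landing page URL.
  next: Record<string, Question | string>;
}

const flow: Question = {
  prompt: "Are you a vegetarian?",
  next: {
    yes: "/offers/vegetarian-premium",
    no: {
      prompt: "What is your favorite meat?",
      next: {
        chicken: "/offers/chicken-standard",
        fish: "/offers/fish-premium",
        beef: "/offers/beef-standard",
      },
    },
  },
};

// Walk the tree with the visitor's answers; nothing is stored server-side,
// but the URL the visitor lands on reflects everything they revealed.
function route(question: Question, answers: string[]): string {
  const [head, ...rest] = answers;
  const next = question.next[head];
  return typeof next === "string" ? next : route(next, rest);
}

console.log(route(flow, ["no", "fish"])); // "/offers/fish-premium"
```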

Now, I’ll admit I haven’t spent the time to develop a regulatory proposal, but I’d at least suggest looking at Woody Hartzog’s Privacy’s Blueprint for one possible path to follow. Hartzog’s notions of obscurity, trust and autonomy as guiding privacy goals encapsulate more than a data-centric world. But Hartzog doesn’t just leave these goals sitting out there with no way to accomplish them. He presents two controls that would help: signaling and increasing transaction costs. Hartzog’s proposal for signaling is that, in determining the relationship between individuals and organizations and the potential for unfairness and asymmetries (in information and power), judges should look not to the legalese of the privacy notice, terms and conditions or contracts, but to the entirety of the interaction and its interfaces. This would do more to determine whether a reasonable user fully understood the context of their interactions.

Hartzog’s other control, transaction costs, goes to making it more expensive for organizations to commit privacy violations. One prominent example of legislation that increases transaction costs is the US Telephone Consumer Protection Act (TCPA), which restricts robocalls. Robocalling technology significantly decreases the cost of calling thousands or millions of households. The TCPA doesn’t ban solicitation, but it significantly increases the costs to solicitors by requiring a paid human caller to make the call. In this way, it reduces the incidence of intrusion. Similarly, the GDPR’s restrictions on automated decision making increase transaction costs by requiring human intervention. This significantly reduces the scale and speed at which a company can commit privacy violations and the size of the population affected. Many would counter, as commenters on any legislative proposal do, that this harms innovation and small companies. True, increasing transaction costs in the way the TCPA does will increase costs for small firms. That is, after all, the purpose of increasing transaction costs. But the counter-argument is: do you want a two-person firm in a garage somewhere adversely affecting the privacy of millions of individuals? Would you want a small firm without any engineers thinking about safety building a bridge over which thousands of commuters travel daily? One could argue the same for Facebook: they’ve made it so efficient to connect billions of individuals that they simply don’t have the resources to deal with the scale of the problems they’ve created.

The one area where I agree with the Intel proposal is FTC enforcement. As our de facto privacy enforcer, the FTC already has institutional knowledge to build on, but its enforcement needs real teeth, not ineffectual consent decrees. When companies analyze compliance risk, if the cost of non-compliance is comparable to the cost of compliance, they are incentivized to reduce the likelihood of getting caught, not to actually comply with the regulation. The fine (impact) multiplied by the likelihood of getting fined must exceed the cost of compliance to drive compliance. This is what, at least in theory, the GDPR’s 4% fines seek to accomplish. Criminal sanctions on individual actors, if enforced, may have similar results.
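
A back-of-the-envelope sketch of that compliance calculus, with purely hypothetical numbers:

```typescript
// Every number here is invented for illustration.
const costOfCompliance = 2_000_000; // what it costs to actually comply
const fine = 5_000_000;             // penalty if caught out of compliance
const probabilityOfFine = 0.1;      // perceived likelihood of enforcement

// Expected cost of staying out of compliance.
const expectedPenalty = fine * probabilityOfFine; // 500,000

// With weak enforcement, paying the occasional fine is the cheaper strategy,
// so the rational firm invests in not getting caught rather than in complying.
console.log(expectedPenalty > costOfCompliance); // false with these numbers
```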

There are other problems with the FIPPs. They mandate controls without grounding them in their ultimate effectiveness. I can easily comply with the FIPPs technically without manifestly improving privacy. Mandating transparency (openness, in the Intel proposal) without giving judges the ability to consider the entirety of the user experience and expectations only yields lengthy privacy notices. Even shortened notices convey less about what’s actually going on than users’ reliance on their interactions with the company.

In high school, I participated in a mock constitution exercise where we were supposed to develop a new constitution for a new society. Unfortunately, we failed and lost the competition; our new constitution was merely the US Constitution with a few extra amendments. As others have said, we don’t need GDPR-lite; we need something unique to the US. I don’t claim Hartzog’s model is the total solution, but rather than looking back at the FIPPs, Intel and others proposing legislation should be looking for solutions for the future, not the past.


Kafka’s “The Trial” finds new life on Amazon

In the early years of the aughts, Dan Solove, then an assistant professor of law at Seton Hall Law School, presaged that the new paradigm for privacy in the Internet age wasn’t George Orwell’s 1984 but rather Franz Kafka’s The Trial. In the book, the protagonist is subject to a secret trial in which he knows neither the charges against him nor the evidence used. He is not allowed to participate in the proceedings or dispute the evidence. In the end, the character is executed having never learned what the charges were. A few years later, when Professor Solove developed his taxonomy of privacy, which categorized cognizable privacy violations, this form of violation became known as Exclusion: the use of information about an individual without the individual’s knowledge or ability to participate. Solove’s taxonomy encompasses 16 distinct privacy violations spanning four broad categories: information collection, information processing, information dissemination and the non-information privacy issues of invasion. Exclusion is a particularly pernicious violation because it combines two fundamental forces that underlie many privacy issues, namely imbalance in information and imbalance in power between the organization and the individual. In The Trial, the government is the perpetrator of this type of violation. Historically, government actors are where one mostly finds it, because government’s monopoly on power prevents individual choice and the impact, such as loss of liberty, can be devastating. The unfairness of exclusion was crucial in criticisms of the US government’s no-fly list, which contained names of individuals who didn’t know they were on the list and who, even once they knew, were unable to learn or dispute the information used against them. Concern over exclusion also underpins the original formation of the Fair Information Practices (which were primarily aimed at government databases in the 1970s) to prevent secret databases in which individuals couldn’t dispute inaccuracies. In the commercial realm, these concepts first took hold in the credit reporting industry and the Fair Credit Reporting Act’s requirements for transparency and a dispute mechanism.

I’m a heavy user of Professor Solove’s taxonomy in my privacy by design training because it helps participants categorize different types of privacy violations. It highlights, for most participants, that insecurity of data (which seems to be almost a fetishistic focus for many) is really only the tip of the privacy iceberg. I like to joke in my training that I’ve had every single privacy violation foisted on me in some fashion or another. Using anecdotes helps my students relate to the violations in a way that defining and explaining doesn’t. Personal stories convey even more than sensational news stories can. In one instance, I talk about how a cashier at a fast food restaurant identified my then-girlfriend to send her a Facebook friend request. I show students an Asian dating website where my picture was appropriated in advertising. Exclusion really hits home when I ask students if they’ve ever been put on hold and told their call will be answered by the next available operator. I discuss how some companies use secret profiles of their “problem” customers to constantly kick them to the back of the queue without the customer being able to know or dispute this status.

My most recent example comes from Amazon and is detailed below.

Last month I was traveling in Europe to attend a few privacy-related events and conduct some training for Deloitte Romania. In Bucharest, I taught a CIPT course as well as my Privacy by Design class, relabeled Data Protection by Design for the EU audience. While in Romania I attempted to place an order on Amazon for an electronic gift card. Apparently this raised red flags within Amazon’s systems (an electronic gift card ordered by an American consumer but from Romania, which they probably inferred from my IP address). The order was canceled, and Amazon blocked access to my account for 5 hours.

After the 5 hours, I followed the instructions, which included resetting my password. I was back in and, figuring it was safe since I had re-authenticated access through my email, re-placed my order. Of course, it wasn’t safe. Amazon repeated its routine, but this time required that I call to reinstate my account.

I called Amazon, spoke with a customer service agent and, after answering quite a few questions (like how long I had had my Amazon account, whether I had any linked devices, etc.), my account access was reinstated and I was able to reset my password and log in. I decided NOT to try to re-place my order until I returned to the US the following week, which I did. At this point my order went through and I thought everything was behind me. I even subsequently ordered Woody Hartzog’s new book Privacy’s Blueprint without issue. [Side note: excellent book, highly recommended, just don’t order it from Amazon ;-)]

About two weeks later I was in D.C. attending the Privacy Law Scholars Conference and ordered a present for a friend off their wish-list. BAM! My account was again shut down. I didn’t realize it at first, since I was off doing actual work, but when I tried to check the status of the order I saw that they had shut my account down again. I called customer service, but the best they could offer was to submit an email to the “Account Specialist” team. At 1:30 AM later that morning I received the following email:

Thank you for letting us know about the unauthorized activity on your Amazon.com account. For your security, the credit card information stored on your Amazon.com account cannot be accessed via our website. Your full credit card number is also not displayed in your account.

Due to this activity, your Amazon.com account has been closed and all open orders have been canceled. To continue shopping with Amazon.com, we ask that you open a new Amazon account. Your order history and other features such as Wishlists cannot be transferred to your new account.

Are you really sorry for the inconvenience this has caused, Amazon?

Note the email starts out “Thank you for letting us know about unauthorized activity.” I quickly responded that there was no unauthorized activity (and I certainly didn’t let them know of any). Oddly enough, there was another email at 9:30 AM that morning (AFTER the account closure) suggesting I needed to call to reset my password again. I did call, but the customer service agent was only able to offer to send another email to the Account Specialist team.

Frustrated at this point, and not wanting my friend’s present further delayed, I dutifully followed the instructions about using another Amazon account; after all, Amazon specifically told me that “To continue shopping with Amazon.com, we ask that you open a new Amazon account.” I went into my business account, which I had previously used to host AWS instances for various business projects. However, my attempt to complete my order was similarly thwarted. I initially received an email asking me to confirm my billing address, to which I quickly replied. I was then presented with this sternly worded email:

“Any new accounts you open will be closed.”

So, despite having told me previously to open a new account to continue shopping, Amazon has decided they will close any account I open in the future, with no discussion of why this was occurring, no information as to what led to this decision and no opportunity to dispute or appeal it. Some faceless, nameless, bureaucratic Account Specialist had declared me persona non grata. Assuming, that is, the Account Specialist is even a person and not a bot.

My virtual death as an Amazon customer.

Quite a few of the most recent complaints on the BBB site for Amazon are about inexplicable account closures. Luckily, my dependence on Amazon is negligible. Though I’ve been a customer for 10 years, I wasn’t using AWS actively and canceled Prime just a few months ago because it wasn’t worth the cost. However, I can only imagine how this could have affected me otherwise. When I had dinner the next day with a friend in the privacy/security industry, he was shocked and concerned because he orders virtually everything off Amazon, as I know many do. What if I used Kindle? Would I lose all my purchases? I’d be more concerned about those whose livelihoods depend on the Amazon service. What if I were an associate and depended on Amazon referrals for this blog’s revenue? Or a reseller on Amazon? What if I had made extensive use of AWS for my business? Perhaps if I were in one of these categories they would have been more circumspect in closing my account. Perhaps I would have had another avenue of escalation, but as a regular consumer I certainly didn’t.

As for my particular circumstances, first and foremost, I’m a bit miffed that I can’t get my order history and wish-lists. I have frequently referred to my order history when doing my year-end taxes, as some of my purchases are business related. Also, I’ve spent years adding things to my wish-list, mostly books that I’ll probably never get around to ordering, though relatives have occasionally ordered them for birthdays in the past. There is no way for me to reconstruct this information. Account closure seems particularly vindictive. Of course, I don’t know why they closed my account, so it’s hard to say. I can only guess that they suspected some sort of fraudulent activity, despite all the charges to my account having always gone through and never been disputed. If potential fraud is the reason, they could have suspended ordering privileges but allowed the account to still access historical data. One problem with behemoths such as Amazon is that, by providing multiple services on which people may become dependent, a problem in one area can have an outsized impact on the individual. Segmentation of services is important in reducing risks. A problem with my personal account shouldn’t affect my business account. A problem with ordering physical products shouldn’t affect my Prime video membership or Kindle e-books. A problem with my use of the affiliate program shouldn’t affect my personal use as a shopper. Etc.

Maybe my tweet complaining that they were recording my call without informing me up front was the reason they closed my account.

Were I in the EU, I might have some additional recourse under the newly effective GDPR. Namely:

  • Under Article 15 I could receive the personal information they have on me, including not only my wish-list and order history, but potentially whatever information was the basis for the account closure.
  • Under Article 16, I could potentially rectify any inaccurate data about me which may have led to the decision to close my account.
  • Under Article 20, the right to data portability. While this probably doesn’t apply to my order history, since it’s arguable whether I “supplied” the information, it would apply to my wish-list, since that subjective preference data is something I supplied when I clicked “Add to my wish-list.”
  • Under Article 22, the right not to be subject to automated decision making. More specifically:
    the data controller shall implement suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests, at least the right to obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision.


    Of course, I don’t know if the account closure determination was based on an automated decision, but I suspect that, at least in part, they have some AI making a determination at some point to suspend or restrict use of my account.

Once it’s available, I’ve been considering applying for Estonia’s digital nomad visa. If I still care at that point, when I’m in the EU I might subject Amazon to some of these rights, which I’m not afforded as someone in the US. It’s highly likely I won’t care by then. As I stated, my use of Amazon, unlike that of many associates of mine, has been limited. Maybe my account closure is a good thing, as it pushes me closer to removing my reliance on the big tech firms. I’ve closed my Facebook account and haven’t used Facebook in almost 6 months (though, to be transparent, I still use Instagram). I moved several years ago to a Librem 15 by Purism running PureOS (a Debian-based Linux distribution) and away from Windows, though again in full transparency, I purchased a Microsoft Surface Pro for presentations (on Newegg, btw, where I’ll obviously be shopping more from now on). At least Microsoft is more in the pro-privacy camp. I’m still trying to extricate myself from Google. I purchased an iPhone (again on Newegg) for the first time, having been an Android user for many years. I’m still, unfortunately, on Google Fi because I like that it works when I travel internationally. I still use Gmail for my personal email, though not for business anymore, having my own domain. One reason I still use Gmail is that when I have to give my email address to someone orally, I know they aren’t going to misspell gmail.com the way they might privacymaverick.com.

Trust by Design

I’ve fashioned myself a Privacy and Trust Engineer/Consultant for over a year now, and I’ve focused on the trust side of privacy for a few years, with my consultancy really focused on brand trust development rather than privacy compliance. Illana Westerman’s talk about trust back at the 2011 Navigate conference is what really opened my eyes to the concept. Clearly, trust in relationships requires more than just privacy; privacy is a necessary condition but not necessarily a sufficient one. It is a building block of a trusted relationship, be it interpersonal, commercial or between government and citizen.

More recently, Woody Hartzog and Neil Richards’ “Taking Trust Seriously in Privacy Law” has fortified my resolve to push trust as a core reason to respect individual privacy. To that end, I thought about refactoring the 7 Foundational Principles of Privacy by Design in terms of trust, which I present here.

  1. Proactive not reactive; Preventative not remedial  → Build trust, not rebuild
    Many companies take a reactive approach to privacy, preferring to bury their head in the sand until an “incident” occurs and then trying to mitigate the effects of that incident. Being proactive and preventative means you actually make an effort before anything occurs. Relationships are built on trust. If that trust is violated, it is much harder (and more expensive) to regain. As the adage goes “once bitten, twice shy.”
  2. Privacy as the default setting → Provide opportunities for trust
    Users shouldn’t have to work to protect their privacy. Building a trusted relationship occurs one step at a time. When the other party has to work to ensure you don’t cross the line and breach their trust, that doesn’t build the relationship; rather, it stalls its growth.
  3. Privacy embedded into the design → Strengthen trust relationship through design
    Being proactive (#1) means addressing privacy issues up front. Embedding privacy considerations into product/service design is imperative to being proactive. The way an individual interfaces with your company will affect the trust they place in the company. The design should engender trust.
  4. Full functionality – positive sum, not zero sum →  Beneficial exchanges
    Privacy, in the past, has been described as a feature killer, preventing the ability to fully realize technology’s potential. Full functionality suggests this is not the case and that you can achieve your aims while respecting individual privacy. Viewed through the lens of trust, commercial relationships should constitute beneficial exchanges, with each party benefiting from the engagement. What often happens is that, because of asymmetric information (the company knows more than the individual) and various cognitive biases (such as hyperbolic discounting), individuals do not realize what they are conceding in the exchange. Beneficial exchange means that, regardless of either party’s knowledge or beliefs, the exchange should still be beneficial for all involved.
  5. End to end security – full life-cycle protection → Stewardship
    This principle was informed by the notion that sometimes organizations were protecting one area (such as collection of information over SSL/TLS) but were deficient in protecting information in others (such as storage or proper disposal). Stewardship is the idea that you’ve been entrusted, by virtue of the relationship, with data and you should, as a matter of ethical responsibility, protect that data at all times.
  6. Visibility and transparency – keep it open → Honesty and candor
    Consent is a bedrock principle of privacy and in order for consent to have meaning, it must be informed; the individual must be aware of what they are consenting to. Visibility and transparency about information practices are key to informed consent. Building a trusted relationship is about being honest and candid. Without these qualities, fear, suspicion and doubt are more prevalent than trust.
  7. Respect for user privacy – keep it user-centric  → Partner not exploiter
    Ultimately, if an organization doesn’t respect the individual, trying to achieve privacy by design is fraught with problems. The individual must be at the forefront. In a relationship built on trust, the parties must feel they are partners, both benefiting from the relationship. If one party is exploitative, because of deceit or because the relationship is achieved through force, then trust is not achievable.

I welcome comments and constructive criticism of my analysis. I’ll be putting on another Privacy by Design workshop in Seattle in May and, of course, am available for consulting engagements to help companies build trusted relationships with consumers.


Price Discrimination

This post is not an original thought (do we truly even have “original thoughts,” or are they all built upon the thoughts of others? I leave that for others to blog about). I recently read a decade-old paper on price discrimination and privacy by Andrew Odlyzko. It was a great read, and it got me thinking about many of the motivations for privacy invasions, particularly this one.

Let me start out with a basic primer on price discrimination. The term refers to pricing items based on the valuation of the purchaser; in other words, discrimination in the pricing of goods and services between individuals. Sounds a little sinister, doesn’t it? Perhaps downright wrong, unethical: charging one price for one person and a different price for another. But price discrimination can be a fundamental necessity in many economic situations.

Here’s an example. Let’s say I am bringing cookies to a bake sale. For simplicity, let’s say there are three consumers at this sale (A, B and C). Consumer A just ate lunch, so he isn’t very interested in a cookie but is willing to buy one for $0.75. Consumer B likes my cookies and is willing to pay $1.00. Consumer C hasn’t eaten and loves my cookies, but only has $1.50 on him at the time. Now, excluding my time, the ingredients for the cookies cost $3.00. At any single price point, I end up losing money:

Sale price $0.75 -> total is 3 x $0.75 = $2.25
Sale price $1.00 -> total is 2 x $1.00 = $2.00 (Consumer A is priced out, as the price is more than he is willing to pay)
Sale price $1.50 -> total is 1 x $1.50 = $1.50 (here both A and B are priced out)

However, if I was able to charge each Consumer their respective valuation of my cookies, things change.

$0.75 + $1.00 + $1.50 = $3.25

Now, not only does everyone get a cookie for what they were willing to pay, I cover my costs and earn some money to compensate my labor in baking the cookies. Everybody is happier as a result, something that could not have occurred had I not been able to price discriminate.
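
For those who like to see the arithmetic generalized, here is the same bake sale in a few lines of TypeScript (the figures are just the ones above):

```typescript
const valuations = [0.75, 1.0, 1.5]; // what A, B and C are each willing to pay
const cost = 3.0;                    // cost of the ingredients

// Only buyers whose valuation meets or exceeds the price actually buy.
function revenueAtUniformPrice(price: number): number {
  return valuations.filter((v) => v >= price).length * price;
}

console.log(revenueAtUniformPrice(0.75)); // 2.25 (everyone buys, still below cost)
console.log(revenueAtUniformPrice(1.0));  // 2.00 (A is priced out)
console.log(revenueAtUniformPrice(1.5));  // 1.50 (only C buys)

// Perfect price discrimination: charge each buyer exactly their valuation.
const discriminatedRevenue = valuations.reduce((sum, v) => sum + v, 0);
console.log(discriminatedRevenue > cost); // true: $3.25 covers the $3.00 cost
```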

What does this have to do with privacy? The more I know about my consumers, the more I’m able to discover their price points and price sensitivity. If I know that A just ate, or that C only has $1.50 in his pocket, or that B likes my cookies, I can home in on what to charge them.

Price discrimination, it turns out, is everywhere, and so are mechanisms to discover personal valuation. Think of movie discounts for students, seniors and military personnel. While some movie chains may believe they offer these out of being good members of society, the real reason is that they are price discriminating. All of those groups tend to have less disposable income and thus are more sensitive to where they spend that money. Movie theaters rarely fill up, and an extra sale is a marginal income boost to the theater. This is typically where you find price discrimination: where the fixed costs are high (running the theater) but the marginal cost per unit sold is low. Where there is limited supply and higher demand, the seller will simply sell to those willing to pay the highest price.

But what do the movie patrons have to do to obtain these cheaper tickets? They have to reveal something about themselves: their age, their student status or their military service.

Other forms of uncovering consumer value also have privacy implications. Most of them are very crude groupings of consumers into buckets, simply because our tools are crude, but some can be very invasive. Take the FAFSA, the Free Application for Federal Student Aid. This form is not only needed for U.S. federal loans and grants; many universities also rely on it to determine scholarships and discounts. This extremely probing look into someone’s finances is used to perform price discrimination on students (and their parents), allowing those with lower income, and thus higher price sensitivity, to pay less for the same education than another student from a wealthier family.

Not all methods of price discrimination affect privacy; consider bundling. Many consumers bemoan the bundling done by cable companies that don’t offer an à la carte selection of channels. The reason for this is price discrimination. If they offered each channel at $1 per month, they would forgo revenue from those willing to pay $50 a month for the golf channel or those willing to pay $50 a month for the Game Show Network. By bundling a large selection of channels, many of which most consumers don’t want, they are able to maximize revenue from those with high price points for certain channels as well as those with low price points for many channels.

I don’t have any magic solution (at this point). However, I hope that by exposing this issue more broadly we can begin to look for patterns of performing price discrimination without privacy invasions. One of the things that has had me thinking about this subject is a new app I’ve been working on for privacy-preserving tickets and tokens for my start-up Microdesic. Ticket sellers have a hard time price discriminating, and tickets often end up on the secondary market as a result.

[I’ll take the bottom of this post to remind readers of two upcoming Privacy by Design workshops I’ll be conducting. The first is in April in Washington, D.C. immediately preceding the IAPP Global Summit. The second is in May in Seattle. Note, the tickets ARE price discriminated, so if you’re a price sensitive consumer, be sure to get the early bird tickets. ]

Pokemon Goes to Church

In case you haven’t read enough about Pokemon Go and Privacy

In the past, you knew you’d arrived on the national scene if Saturday Night Live parodied you. While SNL remains a major force in television, the Onion has taken its place for the Internet set. Just as privacy issues have graced the covers of major news sites around the world, so too have they made their way into plenty of Onion stories. The latest faux news story involves the Pokemon Go craze sweeping the nation like that insidious game in Star Trek: The Next Generation that took over the crew’s brains on the Enterprise.

“What is the object of Pokemon Go?” asks the Onion in its article. Its response: “To collect as much personal data for Nintendo as possible.” That may or may not have been part of Nintendo’s intent, but the Onion found humor in its potential for truth. Oftentimes comedians create humor from uncomfortable truths. In a world of flashlight apps collecting geolocation, intentions for collecting data are not always clear, as was the case with the potential collection in this game. Much has already been written about this. So much attention was focused on the game’s data collection that it stirred Senator Al Franken, a frequent privacy advocate, to write a letter. I’d like to focus, though, on something that another news story picked up.

The privacy issue I’m talking about isn’t the collection of information by Pokemon Go or even the use of the information that was collected. The privacy issue I want to relay is something even the most astute privacy professional might overlook in an otherwise thorough privacy impact assessment. As mentioned by Beth Hill in her previous post on the IAPP about Pokemon Go, a man who lived in a church found players camped outside his house. The app uses churches, among other locations, as gyms where players converge to train. While this wouldn’t normally be problematic, one particular church had been converted years ago into a private residence. The privacy issue at play here is one of invasion, defined by Dan Solove as “an invasive act that disturbs one’s tranquility or solitude.” We more commonly see invasion issues crop up in relation to spam emails, browser pop-ups or telemarketing.

This isn’t the first time we’ve seen this type of invasion. In order to personalize services, many companies subscribe to IP address geolocation services, which translate an IP address into a geographic location. Twenty years ago, the best one could do was a country or region based on assigned IP address space in ARIN (the American Registry for Internet Numbers): if your IP address was registered to a California ISP, you were probably in California. The advent of smartphones and geolocation has added a wealth of granularity to these systems. Now, if you connect your smartphone to your home WiFi, the IP address associated with that WiFi can be tied to your exact longitude and latitude. Who do you think that “Flashlight” application was selling your geolocation information to? The next time you go online with your home computer (without GPS), services still know where you are by virtue of the previously associated IP address and geolocation. Among the subscribers to these services are law enforcement, lawyers and a host of others trying to track people down. Behind on your child support payments? Let them subpoena Facebook, get the IP address from which you last logged in, and then geolocate that to your house to serve you with a warrant. Now that’s personalization by the police department! No need to be inconvenienced by going down to the station to be arrested. But what happens when your IP address has never been geolocated? Many address translation services just pick the geographic center of whatever area they can determine, be that city, state or country. Read about a Kansas farm owner’s major headaches because he’s located at the geographic center of the U.S. at http://fusion.net/story/287592/internet-mapping-glitch-kansas-farm/
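
As a purely illustrative sketch (not any particular vendor’s system), the association works something like this:

```typescript
// A GPS-equipped device (say, a "flashlight" app) reports its coordinates
// while on home WiFi; the fix is then reused for any later visitor from the
// same IP address, GPS or not. The structure and values are hypothetical.
interface GeoFix {
  lat: number;
  lon: number;
  observedAt: Date;
}

const ipToLocation = new Map<string, GeoFix>();

function recordFix(ip: string, lat: number, lon: number): void {
  ipToLocation.set(ip, { lat, lon, observedAt: new Date() });
}

function locate(ip: string): GeoFix | undefined {
  return ipToLocation.get(ip);
}

recordFix("203.0.113.7", 47.61, -122.33); // phone on home WiFi phones home
console.log(locate("203.0.113.7"));       // later visit from a GPS-less desktop
```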

Many privacy analysts wouldn’t pick up on these types of privacy concerns, for no fewer than four reasons. First, it doesn’t involve information privacy, but intrusion into an individual’s personal space. Second, even when looking for intrusion-type risks, an analyst is typically thinking of marketing issues (through spamming or solicitation) in violation of CAN-SPAM, CASL, the telephone sales solicitation rules or other similar laws. Third, the invasion didn’t involve Pokemon Go users’ privacy but rather that of another, distinct party. This isn’t something that could be disclosed in a privacy policy or adequately addressed by app permission settings. Finally, the data in question didn’t involve “personal data”; it was the addresses of churches that were at issue. If you haven’t been told by a system owner, developer or other technical resource that no “personal data” is being collected, stored or processed, then you clearly haven’t been doing privacy long enough. In this case, they would be more justified than most. Now, this isn’t to excuse the developers for using churches as gyms. An argument could easily be made that people are just as deserving of “tranquility and solitude” in their religious observances as in their homes.

Ignoring the physical invasion of religious institutions’ space for one moment, one overriding problem in identifying this issue is that it is rare. Most churches simply aren’t people’s homes. A search on the Internet reveals a data broker selling a list of 110,000 churches in the US (including geolocation coordinates). If the one news story represents the only affected individual, this means that only approximately 1 in 100,000 churches was actually someone’s home. If you’re looking for privacy invasions, this is probably not high on your list based on a risk-based analysis.

There are two reasons this is the wrong way to think about it. First, if your company has millions of users (or is encouraging millions of users to go to church), even very rare circumstances will happen. Ten million users with a one-in-a-million chance of a particular privacy invasion means it is going to happen, on average, to ten users. The second reason this is extremely important to business is that these very rare circumstances are newsworthy. It is the one Kansas farm that makes the news. It is the one pregnant teenager you identify through big data that gets headlines. The local auto fatality doesn’t make the front page, but if one person poisons a few bottles of pills out of the billions sold, then your brand name is forever tied to that tragedy. Corporations can’t take advantage of the right to be forgotten.

Assuming you can identify the issue, what do you do? Given the rarity of the situation, and the fact that it doesn’t involve information, isn’t about marketing, isn’t about your customers or users of your service, and, on its face, doesn’t involve personal data, is all hope lost? What controls are at your disposal to mitigate the risks? The Pokemon Go developers were clearly cognizant enough not to include personal residences as gyms; they chose locations that were primarily identified as public. At a minimum, then, they could have done more to validate the quality of the data and confirm that their list of churches didn’t actually contain people’s residences. Going a step further, they could have considered excluding churches from the list of public places altogether. This avoids not only the church-converted-to-residence issue but also the invasion of religious practitioners’ solitude. Of course, the other types of locations chosen as gyms still need to be scrubbed for accuracy as public spaces. However, even this isn’t sufficient. Circumstances change over time. What is a church or a library today may be someone’s home tomorrow. Data ages. Having a policy of aging information and constantly updating it is important even when it is not, on its face, personal data. A really integrated privacy analyst, or a development team that was privacy aware, could even have turned this into a form of game play: getting users to subtly report back through an in-game mechanism that something is no longer a gym (i.e., no longer a public space) would keep the data fresh and mitigate privacy invasions.
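
As a rough sketch of what treating the location list as aging data might look like (the fields, age limit and flag threshold are invented for illustration):

```typescript
// Treat points of interest as data that ages: drop entries that haven't been
// re-verified recently or that players have flagged as no longer public.
interface PointOfInterest {
  id: string;
  category: "church" | "library" | "landmark";
  lastVerified: Date;   // when it was last confirmed to be a public space
  playerFlags: number;  // in-game "this isn't a public place" reports
}

const MAX_AGE_MS = 1000 * 60 * 60 * 24 * 365; // re-verify at least yearly
const FLAG_THRESHOLD = 3;

function eligibleAsGym(poi: PointOfInterest, now: Date = new Date()): boolean {
  const fresh = now.getTime() - poi.lastVerified.getTime() <= MAX_AGE_MS;
  const unflagged = poi.playerFlags < FLAG_THRESHOLD;
  return fresh && unflagged;
}

// Run before publishing the list into the game:
// const gyms = allPois.filter((poi) => eligibleAsGym(poi));
```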

No one ever said the job of a privacy analyst was easy, but with the proper analysis, the proper toolset and the proper support of the business, you can keep your employer out of the news and keep your customers (and non-customers) happy and trusting your brand.

Essentialism and Privacy

I first learned about essentialism while listening to an audiobook of The Greatest Show on Earth by Richard Dawkins. Essentialism has its roots in Plato’s idealism, though I would suggest that our being drawn to it may be a result of the way the human brain functions. For those unfamiliar, essentialism, simply put, is the notion that “things” have an essential form behind them. Thus, in Plato’s world, a circle is defined by a perfect ideal of a circle, and while real-world circles may have variations, bumps and such, a circle is essentially a line drawn around a point, at all times equidistant from that point.

There are a large variety of geometric shapes (triangles, squares, dodecagons) to which humans have assigned monikers. However, there are an infinite number of shapes that defy such simplistic definition. While a line equidistant from a point is the perfect circle, a random squiggle is the perfect whatever-it-is, despite us not having a name for it. Now, I don’t claim to be a neurobiologist, but in my rudimentary understanding, our brains store things in a way that provides simple categorization. Language is built on defining things we can relate to. We see something round and our brain fires off the neurons that represent a circle. We can also abstract by grouping things together: we see a 12-sided shape and may know it is a polygon but not a dodecagon. Our brains are really good at analogizing as well. We learn by analogy. We see something big and strong, with fangs, baring its teeth; we may not know what it is, but we can recognize that it’s probably a predator.

Dawkins discussed essentialism in the context of evolution. Prior to Charles Darwin, living creatures were broken into a taxonomy. In 1735, Carl Linnaeus, in his seminal work Systema Naturae, started with three kingdoms of nature (only two of them, animals and plants, being living), divided into classes, then orders, genera and species. We still use a form of this taxonomy today when we talk about life, only now, thanks to Thomas Cavalier-Smith, we have six kingdoms. Dawkins’ beef with essentialism is that through categorization we make it more difficult to see evolutionary change. Take a rabbit, defined as a furry creature with fluffy ears, a bushy tail and strong hind legs. But that’s the ideal; every rabbit is different, and if you go back in the ancestry of rabbits, when does it cease to be a rabbit? In the future, as generations are born, when does the descendant of a modern-day rabbit cease to be a rabbit? Humans have a hard time conceptualizing large spans of time, so we can analogize (again, using that great learning technique) to relatives and aging. My brother is clearly my relative, as are my first and second cousins. Though I don’t know them, I know I have third cousins and more distant relations. At what point, though, are we no longer “relatives”? One young girl even claimed to show that all but one of the US presidents were related, tracing their lineage back to an English king. When I meet someone on the street, do I fail to put them in the “relative” bucket in my brain only because nobody has done the analysis? Aging provides a similar means of clearly showing the continuity of life and the breakdown of our taxonomy of age. We are born as babies, grow to be infants, then toddlers, next children, then young adults, then adults, then we’re labeled old, and perhaps elderly after that. But what defines those classifications? When do I become old? Do we wake up one day and we’re suddenly “elderly”? Isn’t 60 the new 30?

Once I learned about essentialism, I started seeing the dichotomy everywhere: the breakdown between where people try to classify or categorize things and the reality that there is a continuous line. One of my first epiphanies occurred when I was trying to clean up my vast MP3 collection. Many of the songs had no associated genre, or the genre was way off. I set about to correct that and started labeling all my music. But then I ran into a clear conundrum. Was Depeche Mode “new wave” or “80s pop”? Was Billy Bragg punk, folk or some crossover folk punk? Clearly the simplistic labeling system provided by Windows was the problem, as it only allowed me to pick one genre. I needed something more akin to modern-day tagging, where I could tag a song with one or more related genres. But was that really the problem?

I started realizing this problem (though not in the way I’ve characterized it now) about 20 years ago in relation to techno music. There seemed to be all sorts of subgenres: jungle, synth, ambient, acid, trance, industrial. It seemed every time I turned around there was a new subgenre: darkwave, dubstep, trap, the list goes on. Wikipedia lists over a hundred genres of electronic music. I couldn’t keep up, and I have trouble distinguishing between many of them. SoundCloud has millions upon millions of songs, many of which defy categorization. What we’re learning from this is that we can like a song without pegging it into a specific category, and with the power of suggestion, SoundCloud can find other songs we like without us needing to search the “Pop-Country” section of the local record store.

So now I come to privacy. You may be thinking that I’m going to talk about personalization and privacy, and how in order to suggest an uncategorizable song, I have to know about your musical taste. While that is a valid topic for conversation, I’ll leave it for another post. What I want to talk about today is privacy’s taxonomy. I’ve been a big fan of Dan Solove’s privacy taxonomy for quite some time. I think it really does a good job of pinpointing privacy issues that people don’t normally think about, and it gives me room to explore when talking with others. Going through the taxonomy allows me to illustrate types of privacy invasion that aren’t just about insecurity and identity theft. Talking about surveillance allows me to discuss how it can have a chilling effect, even if you’re not the target of the surveillance or “doing anything wrong.” I can talk about how interrogation, even if the subject doesn’t answer, may make them uncomfortable.

But I’ve also been thinking about the taxonomy and essentialism. What are we missing in the gaps between the categories? I’ve been working on a book, hopefully to be published later this year, on a theory of privacy that I hope will fill those gaps: a unified field theory of privacy, if you will. Stay tuned.

Privacy implications of Local Storage in web browsers

Privacy professionals often have a hard time keeping track of technology and how it affects privacy. This post is meant to help explain the technology of local/web storage.

With their ability to track users across domains, cookies have earned a bad reputation in the privacy community. This became particularly acute with the passage of the EU Cookie Law, which requires affirmative consent when storage on a user’s computer is used in a way that is not “strictly necessary for the delivery of a service requested by the user.” In other words, if you’re using it to complete a shopping cart at an online store, you need not get consent; if you’re using it to track the user for advertising purposes, then you need to get consent.

Originally part of the HTML5 standard, web storage was split into its own specification. For more history on the topic, see this article. Web storage is meant to be accessed locally (by JavaScript) and can store around 5MB per domain, compared to cookies, which store a maximum of about 4KB of data. Cookies are natively accessible by the server; the purpose of a cookie is to be sent with requests and read by server-side code. Web storage is not automatically sent to the server, but its contents can be transmitted there by JavaScript.
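
A minimal browser-side sketch of the difference (the /collect endpoint and keys are made up): cookies ride along with every request automatically, while web storage stays put unless a script explicitly ships it somewhere.

```typescript
// Cookies are sent to the server with every subsequent HTTP request.
document.cookie = "session=abc123";

// Web storage stays on the device...
localStorage.setItem("profile", JSON.stringify({ theme: "dark" }));

// ...but nothing stops a script (first- or third-party) from sending it back.
fetch("/collect", {
  method: "POST",
  body: localStorage.getItem("profile"),
});
```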

CONS

The con here is that, as a privacy professional, you should be aware of what your developers are doing with web/local storage. Simply asking your developers whether they are using cookies may elicit a negative response when they are using an alternative technology that isn’t cookies. Discovering this later and returning to your developers may get you the response, “Well, you asked about cookies, not local storage!” There are also proposals for a local, browser-accessible database, but as of this writing this is not an internet standard (see Mozilla Firefox’s IndexedDB for an example).

Web storage is not necessarily privacy-invasive, but two things need to be addressed. First, whether that local data is transmitted back to the server, or used in a way whose results are transmitted back to the server. Second, whether the data in local storage is accessible to third parties and represents a risk of exposure to the user. As of this writing, I’m not sure whether third-party JavaScript running through a first-party domain can access local storage or whether it is restricted by a content security policy. The other risk is that a local user can access local storage through the JavaScript console. Ideally, data stored on the client should be encrypted.
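On that last point, here is a minimal sketch of what client-side encryption before storage could look like, using the standard Web Crypto API. This is illustrative only, not a recommendation of a particular scheme; in particular, key management is glossed over (the key here is generated per page load, which is not realistic).

```ts
// Sketch: AES-GCM via the Web Crypto API (crypto.subtle). In practice the
// key would be derived from a user secret or managed elsewhere.

async function encryptForStorage(key: CryptoKey, plaintext: string): Promise<string> {
  const iv = crypto.getRandomValues(new Uint8Array(12)); // AES-GCM nonce
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(plaintext)
  );
  // Store the IV alongside the ciphertext, both base64-encoded.
  const toB64 = (bytes: Uint8Array) => btoa(String.fromCharCode(...bytes));
  return `${toB64(iv)}.${toB64(new Uint8Array(ciphertext))}`;
}

async function demo(): Promise<void> {
  const key = await crypto.subtle.generateKey(
    { name: "AES-GCM", length: 256 },
    false, // not extractable
    ["encrypt", "decrypt"]
  );
  const sealed = await encryptForStorage(key, JSON.stringify({ email: "user@example.com" }));
  localStorage.setItem("profile", sealed); // ciphertext, not plaintext, lands on disk
}

demo();
```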

PROS

Local storage also has the potential to increase privacy. Decentralization is a key technique for architecting for privacy, and 5MB of local storage is enough room to keep most, if not all, client data on the client. Instead of developing rich customer profiles for personalization on the server, keeping this data on the client reduces the risk to the user because the server becomes less of a target. Of course, care must be taken to deal with multi-tenancy (more than one person using the same client), which may be especially difficult for machines used by many people, such as library computers, where one local user could read another’s data.
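As a rough illustration of that decentralized approach (a sketch of my own; the profile shape and user IDs are hypothetical), a personalization profile can live entirely in the browser:

```ts
interface Profile {
  favoriteGenres: string[];
  recentlyPlayed: string[];
}

// Namespace the key per local user as a small nod to multi-tenancy; on a
// truly shared machine this data should also be encrypted.
function profileKey(localUserId: string): string {
  return `profile:${localUserId}`; // e.g. "profile:alice"
}

function saveProfile(localUserId: string, profile: Profile): void {
  localStorage.setItem(profileKey(localUserId), JSON.stringify(profile));
}

function loadProfile(localUserId: string): Profile | null {
  const raw = localStorage.getItem(profileKey(localUserId));
  return raw ? (JSON.parse(raw) as Profile) : null;
}

// The server never sees the profile; recommendations can be computed
// client-side against a catalogue fetched from the server.
saveProfile("alice", { favoriteGenres: ["ambient", "dubstep"], recentlyPlayed: [] });
console.log(loadProfile("alice"));
```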

Balancing privacy and societal benefit

One common retort to claims of privacy infringement from government and industry is that it prevents or impairs a societal benefit. The most oft-cited example is the alleged dichotomy of security versus privacy. Another incarnation is the debate over black box recorders in consumer vehicles and whether public policy should favor detailed tracking to reduce auto fatalities over the lost privacy of drivers. However, not every public policy debate has settled in favor of reducing social harm to the detriment of privacy.

Consider sexually transmitted diseases. Health information is one of the special classes of personal information that the US has deemed fit to grace with one of its sectoral laws, HIPAA. Public policy in the United States favors strong protection for health data. However, HIPAA only covers health care providers and their business associates. What about private individuals who come into knowledge of health information about another person? May they disclose a person’s health condition to the public at large? Generally, the answer is yes, as long as it is truthful. However, this is not always the case. Many states have a privacy tort for public disclosure of embarrassing private facts. The elements of proving such a tort are:

  • There must be public disclosure of the facts
  • The facts must be private facts, not public ones
  • The matter made public must be one which would be offensive and objectionable to a reasonable person of ordinary sensibilities.

In some cases, the revelation must also not be newsworthy or a matter of public interest. One’s having an STD would seem like the prototypical private fact that one would seek to keep from disclosure. But from a public policy perspective, there is an argument to be made that, in order to prevent the spread of communicable diseases, infection with an STD should be public knowledge so potential partners are not unknowingly exposed. However, to my knowledge, no US state requires mandatory disclosure to the public at large. States may require reporting to a government agency by health providers, or require individuals to disclose their status to a potential sexual partner; failure to do so may be criminally or civilly punishable. Still, the decision to reveal one’s status publicly remains in the control of the individual, even when the law requires revelation to those at risk (an infected individual’s sexual partners). This is by no means a closed debate.

Privacy is often pitted against societal benefits, with the debate framed as an individual’s right to privacy versus a particular social good. Privacy rarely comes out ahead, because the societal benefits of privacy are hard to quantify in terms of lives, money, or some other enumerable figure that can be directly compared. But protection of privacy does have societal benefits. Selective disclosure allows people to build trusting relationships. Financial privacy prevents targeting and theft, making society more productive. Anonymity is critical to a society’s free speech, lest speakers be judged for controversial or unpopular thoughts. A lack of privacy impedes risk-taking and chills people’s activities. Ultimately, privacy is about decision making and the autonomy of the individual to make those decisions for themselves. Without that autonomy, one does not have a free society and all the benefits that liberty brings.
We cannot allow ourselves to fall into the trap of assuming that any social good which implicates privacy interests automatically outweighs the privacy harms. If we are going to take an economic approach, we have to find a way to quantify privacy harms as social costs.

Please note that this blog post is not meant to be a treatise on the intricacies of privacy law as they relate to health data in general, or sexually transmitted diseases in particular. For some additional information, see http://www.iom.edu/~/media/Files/Activity%20Files/Research/HIPAAandResearch/PrittsPrivacyFinalDraftweb.ashx

KnowledgeNet

The KnowledgeNet speech in Boca Raton went really well; I got some great positive feedback. In fact, it was suggested that I propose giving the speech (or a similar one) at one of the IAPP’s national conferences. In addition, my preparation for the speech spurred my interest in several areas which I hope to explore, both within this blog and outside of it.

First, in trying to develop a simple PbD (Privacy by Design) example, I ran into the issue of protecting email addresses while still supplying the system with contact information. Some people use one-time email services (like Mailinator). However, these have several potential downfalls, the primary one being that the email service can read your email; also, with some, you can only receive email once. I’m going to return to this subject when I have time to investigate further. I know there are other services out there that might fit the bill; I just need to find an innovative solution to this problem.

Another issue I found is that privacy professionals really need to be versed in cryptography. They don’t need to know how the cryptography works; they just need to know about the capabilities so they can demand them of their product development teams. Things like zero-knowledge proofs, homomorphic encryption, and hashing. I’m going to try to write an e-book about this, but I think I’ll first write each chapter (each on a different technology) as a blog post.
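As a tiny taste of the hashing capability, here is a minimal sketch using the standard Web Crypto API (available in modern browsers and recent Node versions via globalThis.crypto); the email address is made up for illustration:

```ts
// SHA-256 via crypto.subtle.digest, returned as a hex string.
async function sha256Hex(input: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(input));
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

// e.g. store the hash as a join key instead of the raw address. Note that an
// unsalted hash of a low-entropy value like an email is still guessable, so
// hashing alone is pseudonymization, not anonymization.
sha256Hex("user@example.com").then((h) => console.log(h));
```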

Still another issue that raised its head is the concept of provable auditability. Most auditors just have to take the IT professional’s word that certain information or systems are secure. Take, for example, a developer who makes a backup of production data on an orphan server. Nobody knows about it except the developer. Nobody audits the access controls on that box because they don’t even know about it. How is an audit supposed to find it? Provable auditability is about proving, with mathematical certainty, that nobody has tampered with data or accessed it without authorization. It’s doable, if organizations are willing to consider privacy by design rather than privacy by accident. Currently auditors say “we think we’re secure,” but they really don’t know, and they can’t know until a breach occurs and it’s too late.
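Provable auditability is bigger than any single mechanism, but for illustration, here is a minimal sketch of one building block for tamper evidence: a hash-chained log, where each entry commits to the previous entry’s hash so that an auditor who holds the latest hash can detect after-the-fact edits.

```ts
// Each entry commits to the hash of the previous one; rewriting history means
// recomputing every later hash. Uses the standard Web Crypto API for SHA-256.

interface LogEntry {
  event: string;
  prevHash: string;
  hash: string;
}

async function sha256Hex(input: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(input));
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

async function appendEntry(log: LogEntry[], event: string): Promise<void> {
  const prevHash = log.length ? log[log.length - 1].hash : "genesis";
  const hash = await sha256Hex(`${prevHash}|${event}`);
  log.push({ event, prevHash, hash });
}

async function verifyChain(log: LogEntry[]): Promise<boolean> {
  let prevHash = "genesis";
  for (const entry of log) {
    if (entry.prevHash !== prevHash) return false;
    if ((await sha256Hex(`${prevHash}|${entry.event}`)) !== entry.hash) return false;
    prevHash = entry.hash;
  }
  return true;
}

// Usage: append access events as they happen, publish the latest hash to the
// auditor, and let them re-verify the chain later.
(async () => {
  const log: LogEntry[] = [];
  await appendEntry(log, "developer copied production backup to server X");
  console.log(await verifyChain(log)); // true, until someone edits an entry
})();
```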

Giving this speech has put a lot on my mind, and there are many more blog posts to come in the next few weeks. Let’s hope I can find time to put pen to paper, so to speak.